
Read an interesting interview with John Barlow. He's
worried that the sky is falling WRT control of intellectual property and
the shrinking decentralization of the net in general. Trouble ahead, trouble
behind. [ Thanks to Marcia Blake for this link. ]

Yesterday, I read a pretty interesting piece in The Register by Thomas C Greene
arguing that MPAA President Jack Valenti's ranting about adding content
controls to PCs is over the top. Of course it is, but Greene follows up with
an attractive vision of Hollywood movies stripped of most of the income
stream that copyright laws give them. MPAA's Valenti
pushes for copy-control PCs.
Reading about YouServ
(or uServ), I found a reference to XDegrees. This outfit provides a
uniform namespace for resources that is location independent. Based on
Andy Oram's writeup,
it has many of the other features of a useful file sharing system:
caching, security, etc. Users rely on XDegrees servers, so this isn't a
pure software package. There's a technology whitepaper
on the XDegrees web site, but I haven't read it.
I've read Plan 9 papers before, but it seems like there's been a lot of discussion about its uniform treatment of namespaces lately, so I followed a link provided by Jeff Bone to "The Organization of Networks in Plan 9". In addition to making every resource a file or directory, Plan 9 also advocates ASCII formatting for commands and status data. I hadn't realized that the Linux /proc file system is a partial realization of these ideas.
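The "resources as files with ASCII contents" idea is easy to see from user space. Here's a tiny Python sketch (my own illustration, Linux-only) that reads a process's status out of /proc with nothing but ordinary file operations and plain-text parsing:

    # Read process status from Linux /proc: no special API, just a file
    # whose contents are ASCII key/value lines.
    def read_proc_status(pid="self"):
        fields = {}
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        return fields

    status = read_proc_status()
    print(status["Name"], status["State"], status.get("VmRSS", "n/a"))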
Bob Frankston rants about
various problems facing the Internet today. Three key things are needed:
a separation of connectivity from content providers, decoupling DNS
names from semantics (such as trademarks) to make them more like phone
numbers, and deployment of encrypted IPv6. All of these are obvious at
some level, but caught up in how hard these problems are to fix, we may
forget how necessary the fixes really are. [ Thanks to
Monty Solomon for this pointer. ]
Kevin Kelly argues
in the New York Times Magazine that we should accept the technological
inevitability of free music in the wake of Napster and its followers.
There are still plenty of ways for most members of the music industry to
continue to make money. Everyone should just get over it and move on. Of
course, I'm paraphrasing a bit. [ I got this pointer from Eric
S. Johansson. ]
An interesting comparison of SOAP
and REST
from a security perspective by Paul Prescod. I've been reading about
REST and the alternatives in the Decentralization
mailing list, though I'm still reading messages from August of last
year. [ The pointer to the Prescod piece came from Bruce Schneier's Crypto-Gram. ]
The InterMezzo paper "Removing
Bottlenecks in Distributed Filesystems: Coda & InterMezzo as
examples" by Peter J. Braam and Philip A. Nelson describes a
reimplementation and extension of Coda that gets better performance by
integrating more smoothly with the native kernel file system that
implements the persistent cache. InterMezzo is organized with a small
kernel component, a user space cache manager (as in Coda and early versions
of AFS), and a remote server. As with later versions of Coda, InterMezzo
obtains a write permit from the server, providing better consistency for
writes than AFS has. In fact, this is closer to the token management
used by DFS and could enable single-site semantics. To address the
performance problems caused by frequent communication with the user space
cache manager, the kernel also obtains permission from the cache manager
to operate directly on the cache. Thus, InterMezzo uses a two-level
synchronization mechanism to keep all components coherent while
providing fast operation for the most common operations. The original
cache manager, called Lento, was
implemented in Perl. More recently, it has been reimplemented (and
redesigned) as InterSync.
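To make the two-level synchronization concrete, here's a minimal Python sketch of how I read the design; it is my own simplification, not InterMezzo code, and all of the class and method names are invented:

    # Two-level permission scheme: the kernel component only upcalls to the
    # user-space cache manager when it lacks a local grant, and the cache
    # manager only contacts the server when it lacks a write permit.
    class Server:
        def grant_write_permit(self, path):
            # a real server would first revoke conflicting permits held by others
            return True

    class CacheManager:                  # user-space role played by Lento/InterSync
        def __init__(self, server):
            self.server = server
            self.write_permits = set()

        def get_permission(self, path):
            if path not in self.write_permits:        # slow path: network round trip
                if self.server.grant_write_permit(path):
                    self.write_permits.add(path)
            return path in self.write_permits

    class KernelFS:                      # small in-kernel component
        def __init__(self, cache_manager):
            self.cm = cache_manager
            self.local_grants = set()

        def write(self, path, data):
            if path not in self.local_grants:         # upcall only on the first write
                if not self.cm.get_permission(path):
                    raise PermissionError(path)
                self.local_grants.add(path)
            # fast path: later writes go straight to the local cache file
            return f"wrote {len(data)} bytes to cached copy of {path}"

    fs = KernelFS(CacheManager(Server()))
    fs.write("/intermezzo/foo", b"hello")   # upcall and permit fetch
    fs.write("/intermezzo/foo", b"world")   # handled entirely by the "kernel"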

A discussion of high-speed Internet
worms makes pretty alarming reading. Past attacks like Code Red and
Nimda are likely to be weak precursors of what we may expect in the not
too distant future. These worms, often referred to as Warhol worms,
could infect all vulnerable hosts on the Internet in 15 minutes. Far
from scaremongering, the infection speedup techniques described here
seem entirely plausible. The paper constructively discusses ideas for a
CDC-like entity to enhance defenses against these attacks. [ Thanks to
Bruce Schneier's Crypto-Gram
for the pointer to this article. ]
[KL96] The idea is to use techniques from text compression algorithms, which predict events based on their recent predecessors. These predictions are used to prefetch data into the cache. Using file traces, they claim a 15% improvement over LRU and the ability to match LRU's hit rate with a much smaller cache, e.g. 4MB predictive vs. 90MB LRU.
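Here's a minimal Python sketch of that flavor of prediction; it's my own toy (a first-order successor model), not the paper's algorithm:

    # Prefetching by prediction: remember which file tends to follow which,
    # and on each access prefetch the most frequent successor of that file.
    from collections import defaultdict, Counter

    class PredictivePrefetcher:
        def __init__(self):
            self.successors = defaultdict(Counter)   # file -> counts of what came next
            self.last = None

        def access(self, name):
            """Record an access; return a file worth prefetching, if any."""
            if self.last is not None:
                self.successors[self.last][name] += 1
            self.last = name
            counts = self.successors[name]
            if counts:
                prediction, _ = counts.most_common(1)[0]
                return prediction                    # hand this to the cache
            return None

    p = PredictivePrefetcher()
    for f in ["a.h", "a.c", "a.o", "a.h", "a.c", "a.o", "a.h"]:
        hint = p.access(f)                           # soon predicts a.c after a.h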
[AAM+02] The paper doesn't describe specific results but proposes a scheme that combines the estimates of multiple caching policies using machine learning techniques. The learner rewards cache replacement algorithms that perform well and punishes those that don't, which lets it adjust the weights used to actually manage the cache. These weights can change over time, leading to behavior that adapts to changing conditions such as workload. Their system also splits criteria from policies. Criteria are the metrics (e.g. size, frequency of access, time of last access) upon which the policies are based. This allows several policies to share criteria and reduces the cost of evaluating multiple policies. I had a similar idea for competing caching policies in connection with applying autonomic computing principles to a distributed file system, and I am glad that it is getting attention.
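A minimal Python sketch of the competing-policies idea, assuming a toy voting scheme of my own rather than the paper's learning machinery (the policies, weights, and penalty factor are all invented for illustration):

    # Each policy nominates an eviction victim from shared criteria; a weighted
    # vote picks the block to evict; a policy loses weight whenever its nominee
    # turns out to be requested again soon after eviction.
    def lru_victim(cache):       # cache: name -> {"last_access": t, "size": n}
        return min(cache, key=lambda k: cache[k]["last_access"])

    def largest_victim(cache):
        return max(cache, key=lambda k: cache[k]["size"])

    policies = {"lru": lru_victim, "largest": largest_victim}
    weights = {name: 1.0 for name in policies}

    def choose_victim(cache):
        nominations = {name: pick(cache) for name, pick in policies.items()}
        votes = {}
        for name, victim in nominations.items():
            votes[victim] = votes.get(victim, 0.0) + weights[name]
        return max(votes, key=votes.get), nominations

    def learn(nominations, re_requested):
        # punish policies whose nominee was needed again, then renormalize
        for name, victim in nominations.items():
            if victim in re_requested:
                weights[name] *= 0.9
        total = sum(weights.values())
        for name in weights:
            weights[name] /= total

    cache = {"a": {"last_access": 1, "size": 10},
             "b": {"last_access": 5, "size": 90},
             "c": {"last_access": 3, "size": 20}}
    victim, noms = choose_victim(cache)
    learn(noms, re_requested={victim})     # pretend the evicted block was needed again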

Read "Tangler: A
Censorship-Resistant Publishing System Based On Document Entanglements"
by Marc Waldman and David Mazières dated December 8, 2001. This system
contains an interesting combination of features that could make a very
useful publishing paradigm. The name comes from the idea of using
Shamir secret sharing to entangle each data block being with two other
randomly selected blocks from the storage pool. They propose using 3 of
4 sharing so that each data block is represented by 4 server blocks, any
three of which are needed to reconstruct the original data. Each block
of data appears completely random in isolation. Server blocks are
indexed by the SHA-1 hash of their contents. Each data block is then
identified by a set of four SHA-1 hash values. Each file consists of a
data block similar to an inode consisting of the 4 hashes that identify
each data block in the file. A collection consists of a tree of such
files and directories assembled recursively using this entanglement
process. The collection root is signed and labeled with the publisher's
public key.
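The threshold sharing at the heart of the entanglement is ordinary Shamir secret sharing. Here's a minimal 3-of-4 sketch in Python over a prime field; it's a toy (the "block" is a single integer and all the shares are freshly generated), whereas Tangler fixes two of the points to be existing pool blocks:

    import random

    P = 2**127 - 1                       # a prime large enough for this toy

    def eval_poly(coeffs, x):
        """Evaluate a polynomial (lowest coefficient first) at x, mod P."""
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % P
        return acc

    def make_shares(secret, k=3, n=4):
        """Split secret into n shares; any k of them reconstruct it."""
        coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
        return [(x, eval_poly(coeffs, x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange interpolation at x = 0 recovers the secret."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = (num * -xj) % P
                    den = (den * (xi - xj)) % P
            secret = (secret + yi * num * pow(den, P - 2, P)) % P
        return secret

    shares = make_shares(123456789)
    assert reconstruct(shares[:3]) == 123456789   # any 3 of the 4 suffice
    assert reconstruct(shares[1:]) == 123456789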
A similar scheme described by David Madore,
called Random
Pads, involves XORing multiple large blocks of data with several
existing random pads and storing the result as another random pad. This
approach is considerably cheaper than Shamir secret sharing, the main
difference being that all of the random pads must be located to reconstruct the
original. Because of this there is no threshold of tolerable loss, so
this faster method is considerably more fragile. My recollection is
that I saw this suggested on the Freenet mailing list. Mojonation and
other systems also use n-of-m sharing, but I don't know whether the
performance or other characteristics are similar to those of
Tangler.
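The Random Pads construction is simple enough to show in a few lines. This is my own illustration of the idea, not Madore's code:

    import os

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def entangle(data, pads):
        """Publish the XOR of the data with every pad; it looks random alone."""
        out = data
        for pad in pads:
            out = xor(out, pad)
        return out

    def recover(published, pads):
        """XOR is its own inverse, so recovery reapplies the same pads."""
        return entangle(published, pads)

    data = b"sixteen byte msg"
    pads = [os.urandom(len(data)) for _ in range(3)]
    published = entangle(data, pads)
    assert recover(published, pads) == data       # lose any one pad and it's gone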
An interesting consequence of this entanglement is that each publisher
has an interest in the preservation of the blocks needed to reconstruct the
content of other publishers. A file cannot be removed from the system
without also removing blocks needed by other files. This furthers the
cause of censorship resistance.
Unlike other peer-to-peer systems with very many nodes, Tangler is
designed to operate with a modest number of server block storage nodes,
each of which knows of the others. The storage network uses credits and
receipts to validate the behavior of the servers. A server's operation
is implicitly audited during ordinary use, and it can be ejected from the
system for non-performance. I must say, however, that I found the
paper's description of the server algorithm difficult to understand.

Over the last few days I've read the research summary for
the IRIS project. This outlines
a comprehensive plan to develop decentralized infrastructure based on
distributed hash tables (DHTs) for supporting large-scale distributed
applications. I've long thought that using DHTs to store data blocks
keyed by the hash of their contents was an important idea. Tangler,
mentioned yesterday, is clearly an application of
this model.
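The core pattern is small enough to sketch. In this Python toy the dict stands in for a real DHT (Chord, Kademlia, etc.); the point is just that the lookup key is the hash of the block, which makes every retrieval self-verifying:

    import hashlib

    dht = {}                                     # stand-in for a distributed hash table

    def put_block(data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        dht[key] = data
        return key                               # the key doubles as the block's name

    def get_block(key: str) -> bytes:
        data = dht[key]
        assert hashlib.sha1(data).hexdigest() == key   # detect corrupted or forged blocks
        return data

    block_id = put_block(b"hello, decentralized world")
    print(block_id, get_block(block_id))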
It is hard to criticize the whole proposal, but the effort has received
substantial ($12M) funding from NSF. The proposal/summary provides
a good overview of decentralized systems and has a comprehensive
bibliography.
I read "Linked:
The New Science of Networks", by Albert-László Barabási and recently
read "SYNC:
The Emerging Science of Spontaneous Order", by Steven Strogatz.
I don't remember many specifics about "Sync" except that I liked it well enough to talk my son into reading it, which was no mean feat. I was a little worried about "Linked" in the beginning because it started with some of the same small-world anecdotes, such as the Kevin Bacon game and the Erdös numbers of published mathematicians. However, it soon veered onto its own trajectory and even cites Strogatz's research on several occasions (I don't recall whether Strogatz returned the favor).
Barabási covers the early work of Erdös on random networks and the clustered random nets of Strogatz, both of which exhibit the small-world effect but aren't very realistic models of most real networks, which are far from random. He surveys a wide variety of real networks and analyzes their organization, observing that it is at odds with the predictions of the random network models.
Then Barabási goes off on power law, or scale-free, networks, where the number of nodes with k links is proportional to k^-γ, where γ is some constant that characterizes the connectivity of the network. This is often expressed informally as an 80/20 relation, such as "80% of the money is made by 20% of the people". Barabási clearly believes that power law relationships are very important to understanding complex systems. He describes numerous examples of physical systems undergoing a phase transition from chaos to order (or the reverse) where power laws crop up. He organizes one chapter around a scale-free network model in which each node is assigned a fitness that governs its affinity for links. A transformation of fitness to energy has the same mathematical structure as a Bose gas, and the correspondence predicts that networks like this can undergo the analog of a Bose-Einstein condensation. The best-known example of this winner-take-all scenario, in which the network takes on a star-like topology, is Microsoft Windows, where Windows users are the nodes (in the network analog) or the particles (in the Bose gas analog).
Power laws show up whenever physical systems undergo chaos/order transitions. Studies of computer networks, the Internet, actors and authors, cellular metabolism, gene regulation, and others show the same sorts of characteristics. The compelling implication is that there are deep and important laws governing the formation and behavior of these complex systems. While it is too early to formulate these laws precisely, Barabási argues convincingly for the importance of gaining a deep understanding of networks.
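For the curious, the scale-free degree distribution is easy to reproduce with a few lines of preferential attachment; this is my own illustration of the kind of model Barabási describes, not code from the book:

    import random
    from collections import Counter

    def preferential_attachment(n_nodes, links_per_node=2):
        """New nodes link to existing nodes with probability proportional to degree."""
        edges = [(0, 1), (1, 2), (0, 2)]              # small connected seed
        endpoints = [v for e in edges for v in e]     # sampling this list ~ degree
        for new in range(3, n_nodes):
            targets = set()
            while len(targets) < links_per_node:
                targets.add(random.choice(endpoints))
            for t in targets:
                edges.append((new, t))
                endpoints.extend((new, t))
        return edges

    degrees = Counter(v for e in preferential_attachment(10000) for v in e)
    histogram = Counter(degrees.values())
    for k in sorted(histogram)[:10]:
        print(k, histogram[k])      # roughly a straight line on a log-log plot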