
Read an interesting interview with John Barlow. He's
worried that the sky is falling WRT control of intellectual property and
the shrinking decentralization of the net in general. Trouble ahead, trouble
behind. [ Thanks to Marcia Blake for this link. ]

Yesterday, I read a pretty interesting piece in The Register by Thomas C Greene
arguing that MPAA President Jack Valenti's ranting about adding content
controls to PCs is over the top. Of course it is, but Greene follows up with
an attractive vision of Hollywood movies stripped of most of the income
stream that copyright laws give them. MPAA's Valenti
pushes for copy-control PCs.
Reading about YouServ
(or uServ), I found a reference to XDegrees. This outfit provides a
uniform namespace for resources that is location independent. Based on
Andy Oram's writeup,
it has many of the other features of a useful file sharing system:
caching, security, etc. Users rely on XDegrees servers, so this isn't a
pure software package. There's a technology whitepaper
on the XDegrees web site, but I haven't read it.
I've read Plan 9 papers before, but it seems like there's been a lot of discussion about its uniform treatment of namespaces lately, so I followed a link provided by Jeff Bone to "The Organization of Networks in Plan 9". In addition to making every resource a file or directory, Plan 9 also advocates ASCII formatting for commands and status data. I hadn't realized that the Linux /proc file system is a partial realization of these ideas.
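The "resources as files with ASCII contents" idea is easy to see from user space. Here's a tiny Python sketch (my own illustration, Linux-only) that reads a process's status out of /proc with nothing but ordinary file operations and plain-text parsing:

    # Read process status from Linux /proc: no special API, just a file
    # whose contents are ASCII key/value lines.
    def read_proc_status(pid="self"):
        fields = {}
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        return fields

    status = read_proc_status()
    print(status["Name"], status["State"], status.get("VmRSS", "n/a"))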
Bob Frankston rants about
various problems facing the Internet today. Three key things are needed:
a separation of connectivity from content providers, decoupling DNS
names from semantics (such as trademarks) to make them more like phone
numbers, and deployment of encrypted IPv6. All of these are obvious at
some level, but caught up in how hard these problems are to fix, we may
forget how necessary the fixes really are. [ Thanks to
Monty Solomon for this pointer. ]
Kevin Kelly argues
in the New York Times Magazine that we should accept the technological
inevitability of free music in the wake of Napster and its followers.
There are still plenty of ways for most members of the music industry to
continue to make money. Everyone should just get over it and move on. Of
course, I'm paraphrasing a bit. [ I got this pointer from Eric
S. Johansson. ]
An interesting comparison of SOAP
and REST
from a security perspective by Paul Prescod. I've been reading about
REST and the alternatives in the Decentralization
mailing list, though I'm still reading messages from August of last
year. [ The pointer to the Prescod piece came from Bruce Schneier's Crypto-Gram. ]
The InterMezzo paper "Removing
Bottlenecks in Distributed Filesystems: Coda & InterMezzo as
examples" by Peter J. Braam and Philip A. Nelson describes a
reimplementation and extension of Coda that gets better performance by
integrating more smoothly with the native kernel file system that
implements the persistent cache. InterMezzo is organized with a small
kernel component, a user space cache manager (as in Coda and early versions
of AFS), and a remote server. As with later versions of Coda, InterMezzo
obtains a write permit from the server, providing better consistency for
writes than AFS has. In fact, this is closer to the token management
used by DFS and could enable single-site semantics. To address the
performance problems caused by frequent communication with the user space
cache manager, the kernel also obtains permission from the cache manager
to operate directly on the cache. Thus, InterMezzo uses a two-level
synchronization mechanism to keep all components coherent while
providing fast operation for the most common operations. The original
cache manager, called Lento, was
implemented in Perl. More recently, it has been reimplemented (and
redesigned) as InterSync.
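To make the two-level synchronization concrete, here's a minimal Python sketch of how I read the design; it is my own simplification, not InterMezzo code, and all of the class and method names are invented:

    # Two-level permission scheme: the kernel component only upcalls to the
    # user-space cache manager when it lacks a local grant, and the cache
    # manager only contacts the server when it lacks a write permit.
    class Server:
        def grant_write_permit(self, path):
            # a real server would first revoke conflicting permits held by others
            return True

    class CacheManager:                  # user-space role played by Lento/InterSync
        def __init__(self, server):
            self.server = server
            self.write_permits = set()

        def get_permission(self, path):
            if path not in self.write_permits:        # slow path: network round trip
                if self.server.grant_write_permit(path):
                    self.write_permits.add(path)
            return path in self.write_permits

    class KernelFS:                      # small in-kernel component
        def __init__(self, cache_manager):
            self.cm = cache_manager
            self.local_grants = set()

        def write(self, path, data):
            if path not in self.local_grants:         # upcall only on the first write
                if not self.cm.get_permission(path):
                    raise PermissionError(path)
                self.local_grants.add(path)
            # fast path: later writes go straight to the local cache file
            return f"wrote {len(data)} bytes to cached copy of {path}"

    fs = KernelFS(CacheManager(Server()))
    fs.write("/intermezzo/foo", b"hello")   # upcall and permit fetch
    fs.write("/intermezzo/foo", b"world")   # handled entirely by the "kernel"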

A discussion of high-speed Internet
worms makes pretty alarming reading. Past attacks like Code Red and
Nimda are likely to be weak precursors of what we may expect in the not
too distant future. These worms, often referred to as Warhol worms,
could infect all vulnerable hosts on the Internet in 15 minutes. Far
from scaremongering, the infection speedup techniques described here
seem entirely plausible. The paper constructively discusses ideas for a
CDC-like entity to enhance defenses against these attacks. [ Thanks to
Bruce Schneier's Crypto-Gram
for the pointer to this article. ]
[KL96] The idea is to use techniques from text compression algorithms, which predict events based on their recent predecessors. These predictions are used to prefetch data into the cache. Using file traces, they claim a 15% improvement over LRU and the ability to match LRU's hit rate with a much smaller cache, e.g. 4MB predictive vs. 90MB LRU.
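Here's a minimal Python sketch of that flavor of prediction; it's my own toy (a first-order successor model), not the paper's algorithm:

    # Prefetching by prediction: remember which file tends to follow which,
    # and on each access prefetch the most frequent successor of that file.
    from collections import defaultdict, Counter

    class PredictivePrefetcher:
        def __init__(self):
            self.successors = defaultdict(Counter)   # file -> counts of what came next
            self.last = None

        def access(self, name):
            """Record an access; return a file worth prefetching, if any."""
            if self.last is not None:
                self.successors[self.last][name] += 1
            self.last = name
            counts = self.successors[name]
            if counts:
                prediction, _ = counts.most_common(1)[0]
                return prediction                    # hand this to the cache
            return None

    p = PredictivePrefetcher()
    for f in ["a.h", "a.c", "a.o", "a.h", "a.c", "a.o", "a.h"]:
        hint = p.access(f)                           # soon predicts a.c after a.h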
[AAM+02] The paper doesn't describe specific results but proposes a scheme that combines the estimates of multiple caching policies using machine learning techniques. The learner rewards cache replacement algorithms that perform well and punishes those that don't, which lets it adjust the weights used to actually manage the cache. These weights can change over time, leading to behavior that adapts to changing conditions such as workload. Their system also splits criteria from policies. Criteria are the metrics (e.g. size, frequency of access, time of last access) upon which the policies are based. This allows several policies to share criteria and reduces the cost of evaluating multiple policies. I had a similar idea for competing caching policies in connection with applying autonomic computing principles to a distributed file system, and I am glad that it is getting attention.
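A minimal Python sketch of the competing-policies idea, assuming a toy voting scheme of my own rather than the paper's learning machinery (the policies, weights, and penalty factor are all invented for illustration):

    # Each policy nominates an eviction victim from shared criteria; a weighted
    # vote picks the block to evict; a policy loses weight whenever its nominee
    # turns out to be requested again soon after eviction.
    def lru_victim(cache):       # cache: name -> {"last_access": t, "size": n}
        return min(cache, key=lambda k: cache[k]["last_access"])

    def largest_victim(cache):
        return max(cache, key=lambda k: cache[k]["size"])

    policies = {"lru": lru_victim, "largest": largest_victim}
    weights = {name: 1.0 for name in policies}

    def choose_victim(cache):
        nominations = {name: pick(cache) for name, pick in policies.items()}
        votes = {}
        for name, victim in nominations.items():
            votes[victim] = votes.get(victim, 0.0) + weights[name]
        return max(votes, key=votes.get), nominations

    def learn(nominations, re_requested):
        # punish policies whose nominee was needed again, then renormalize
        for name, victim in nominations.items():
            if victim in re_requested:
                weights[name] *= 0.9
        total = sum(weights.values())
        for name in weights:
            weights[name] /= total

    cache = {"a": {"last_access": 1, "size": 10},
             "b": {"last_access": 5, "size": 90},
             "c": {"last_access": 3, "size": 20}}
    victim, noms = choose_victim(cache)
    learn(noms, re_requested={victim})     # pretend the evicted block was needed again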

Read "Tangler: A
Censorship-Resistant Publishing System Based On Document Entanglements"
by Marc Waldman and David Mazières dated December 8, 2001. This system
contains an interesting combination of features that could make a very
useful publishing paradigm. The name comes from the idea of using
Shamir secret sharing to entangle each data block being with two other
randomly selected blocks from the storage pool. They propose using 3 of
4 sharing so that each data block is represented by 4 server blocks, any
three of which are needed to reconstruct the original data. Each block
of data appears completely random in isolation. Server blocks are
indexed by the SHA-1 hash of their contents. Each data block is then
identified by a set of four SHA-1 hash values. Each file consists of a
data block similar to an inode consisting of the 4 hashes that identify
each data block in the file. A collection consists of a tree of such
files and directories assembled recursively using this entanglement
process. The collection root is signed and labeled with the publisher's
public key.
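The threshold sharing at the heart of the entanglement is ordinary Shamir secret sharing. Here's a minimal 3-of-4 sketch in Python over a prime field; it's a toy (the "block" is a single integer and all the shares are freshly generated), whereas Tangler fixes two of the points to be existing pool blocks:

    import random

    P = 2**127 - 1                       # a prime large enough for this toy

    def eval_poly(coeffs, x):
        """Evaluate a polynomial (lowest coefficient first) at x, mod P."""
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % P
        return acc

    def make_shares(secret, k=3, n=4):
        """Split secret into n shares; any k of them reconstruct it."""
        coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
        return [(x, eval_poly(coeffs, x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange interpolation at x = 0 recovers the secret."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = (num * -xj) % P
                    den = (den * (xi - xj)) % P
            secret = (secret + yi * num * pow(den, P - 2, P)) % P
        return secret

    shares = make_shares(123456789)
    assert reconstruct(shares[:3]) == 123456789   # any 3 of the 4 suffice
    assert reconstruct(shares[1:]) == 123456789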
A similar scheme described by David Madore,
called Random
Pads, involves XORing multiple large blocks of data with several
existing random pads and storing the result as another random pad. This
approach is considerably cheaper than Shamir secret sharing, the main
difference being that all of the random pads must be located to reconstruct the
original. Because of this there is no threshold of tolerable loss, so
this faster method is considerably more fragile. My recollection is
that I saw this suggested on the Freenet mailing list. Mojonation and
other systems also use n-of-m sharing, but I don't know whether the
performance or other characteristics are similar to those of
Tangler.
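The Random Pads construction is simple enough to show in a few lines. This is my own illustration of the idea, not Madore's code:

    import os

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def entangle(data, pads):
        """Publish the XOR of the data with every pad; it looks random alone."""
        out = data
        for pad in pads:
            out = xor(out, pad)
        return out

    def recover(published, pads):
        """XOR is its own inverse, so recovery reapplies the same pads."""
        return entangle(published, pads)

    data = b"sixteen byte msg"
    pads = [os.urandom(len(data)) for _ in range(3)]
    published = entangle(data, pads)
    assert recover(published, pads) == data       # lose any one pad and it's gone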
An interesting consequence of this entanglement is that each publisher
has an interest in the preservation of the blocks needed to reconstruct the
content of other publishers. A file cannot be removed from the system
without also removing blocks needed by other files. This furthers the
cause of censorship resistance.
Unlike other peer-to-peer systems with very many nodes, Tangler is
designed to operate with a modest number of server block storage nodes,
each of which knows of the others. The storage network uses credits and
receipts to validate the behavior of the servers. A server's operation
is implicitly audited during ordinary use, and it can be ejected from the
system for non-performance. I must say, however, that I found the
paper's description of the server algorithm difficult to understand.

Over the last few days I've read the research summary for
the IRIS project. This outlines
a comprehensive plan to develop decentralized infrastructure based on
distributed hash tables (DHTs) for supporting large-scale distributed
applications. I've long thought that using DHTs to store data blocks
keyed by the hash of their contents was an important idea. Tangler,
mentioned yesterday, is clearly an application of
this model.
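The core pattern is small enough to sketch. In this Python toy the dict stands in for a real DHT (Chord, Kademlia, etc.); the point is just that the lookup key is the hash of the block, which makes every retrieval self-verifying:

    import hashlib

    dht = {}                                     # stand-in for a distributed hash table

    def put_block(data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        dht[key] = data
        return key                               # the key doubles as the block's name

    def get_block(key: str) -> bytes:
        data = dht[key]
        assert hashlib.sha1(data).hexdigest() == key   # detect corrupted or forged blocks
        return data

    block_id = put_block(b"hello, decentralized world")
    print(block_id, get_block(block_id))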
It is hard to criticize the whole proposal, but the effort has received
substantial ($12M) funding from NSF. The proposal/summary provides
a good overview of decentralized systems and has a comprehensive
bibliography.
I read "Linked:
The New Science of Networks", by Albert-László Barabási and recently
read "SYNC:
The Emerging Science of Spontaneous Order", by Steven Strogatz.
I don't remember many specifics about "Sync" except that I liked it well enough to talk my son into reading it, which was no mean feat. I was a little worried about "Linked" in the beginning because it started with some of the same small-world anecdotes, such as the Kevin Bacon game and the Erdös numbers of published mathematicians. However, it soon veered onto its own trajectory and even cites Strogatz's research on several occasions (I don't recall whether Strogatz returned the favor).
Barabási covers the early work of Erdös on random networks and the clustered random nets of Strogatz, both of which exhibit the small-world effect but aren't very realistic models of most real networks, which are far from random. He surveys a wide variety of real networks and analyzes their organization, observing that it is at odds with the predictions of the random network models.
Then Barabási goes off on power law, or scale-free, networks, where the number of nodes with k links is proportional to k^-γ, where γ is some constant that characterizes the connectivity of the network. This is often expressed informally as an 80/20 relation, such as "80% of the money is made by 20% of the people". Barabási clearly believes that power law relationships are very important to understanding complex systems. He describes numerous examples of physical systems undergoing a phase transition from chaos to order (or the reverse) where power laws crop up. He organizes one chapter around a scale-free network model in which each node is assigned a fitness that governs its affinity for links. A transformation of fitness to energy has the same mathematical structure as a Bose gas, and the correspondence predicts that networks like this can undergo the analog of a Bose-Einstein condensation. The best-known example of this winner-take-all scenario, in which the network takes on a star-like topology, is Microsoft Windows, where Windows users are the nodes (in the network analog) or the particles (in the Bose gas analog).
Power laws show up whenever physical systems undergo chaos/order transitions. Studies of computer networks, the Internet, actors and authors, cellular metabolism, gene regulation, and others show the same sorts of characteristics. The compelling implication is that there are deep and important laws governing the formation and behavior of these complex systems. While it is too early to formulate these laws precisely, Barabási argues convincingly for the importance of gaining a deep understanding of networks.
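For the curious, the scale-free degree distribution is easy to reproduce with a few lines of preferential attachment; this is my own illustration of the kind of model Barabási describes, not code from the book:

    import random
    from collections import Counter

    def preferential_attachment(n_nodes, links_per_node=2):
        """New nodes link to existing nodes with probability proportional to degree."""
        edges = [(0, 1), (1, 2), (0, 2)]              # small connected seed
        endpoints = [v for e in edges for v in e]     # sampling this list ~ degree
        for new in range(3, n_nodes):
            targets = set()
            while len(targets) < links_per_node:
                targets.add(random.choice(endpoints))
            for t in targets:
                edges.append((new, t))
                endpoints.extend((new, t))
        return edges

    degrees = Counter(v for e in preferential_attachment(10000) for v in e)
    histogram = Counter(degrees.values())
    for k in sorted(histogram)[:10]:
        print(k, histogram[k])      # roughly a straight line on a log-log plot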