March
1-Mar-2002
Read an interesting interview with John Barlow. He's
worried that the sky is falling WRT control of intellectual property and
the shrinking state of decentralization in the net in general: Trouble ahead, trouble
behind. [ Thanks to Marcia Blake for this link. ]
Yesterday, I read a pretty interesting piece in The Register by Thomas C Greene
arguing that MPAA President Jack Valenti's ranting about adding content
controls to PCs is over the top. Of course it is, but then Greene followed up with
an attractive vision of Hollywood movies stripped of most of the income
stream copyright laws give them: MPAA's Valenti
pushes for copy-control PCs.
4-Mar-2002
Reading about YouServ
(or uServ) I found a reference to XDegrees. This outfit provides a
uniform namespace for resources that is location independent. Based on
Andy Oram's writeup,
it has many of the other features of a useful file sharing system:
caching, security, etc. Users rely on XDegrees servers, so this isn't a
pure software package. There's a technology whitepaper
on the XDegrees web site, but I haven't read it.
I've read Plan 9 papers before, but it seems like there's been a lot
of discussion about its uniform treatment of namespaces lately, so I
followed a link provided by Jeff Bone to "The
Organization of Networks in Plan 9". In addition to every resource
being a file or directory, Plan 9 also advocates ascii formatting for
commands and status data. I hadn't realized that the Linux /proc file
system is a partial realization of these ideas.
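To make the "everything is a text file" idea concrete, here's a tiny Python
sketch (assuming a Linux machine, where /proc/loadavg is an ordinary ascii
status file); no special system calls are needed, just plain file I/O:

    # Read scheduler load averages from the plain-text /proc interface.
    # Status data is just a small ascii file that any tool can parse.
    def read_loadavg(path="/proc/loadavg"):
        with open(path) as f:
            fields = f.read().split()
        # The first three fields are the 1, 5 and 15 minute load averages.
        return tuple(float(x) for x in fields[:3])

    if __name__ == "__main__":
        print(read_loadavg())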
18-Mar-2002 20:25
Bob Frankston rants about
various problems facing the Internet today. Three key things are needed:
a separation of connectivity providers from content providers, decoupling DNS
names from semantics (such as trademarks) to make them more like phone
numbers, and deployment of encrypted IPv6. All of these are obvious at
some level, but being caught up in how hard these problems are to fix,
we may forget how necessary the fixes really are. [ Thanks to
Monty Solomon for this pointer. ]
Kevin Kelly argues
in the New York Times Magazine that we should accept the technological
inevitability of free music in the wake of Napster and its followers.
There are still plenty of ways for most members of the music industry to
continue to make money. Everyone should just get over it and move on. Of
course, I'm paraphrasing a bit. [ I got this pointer from Eric
S. Johansson. ]
An interesting comparison of SOAP
and REST
from a security perspective by Paul Prescod. I've been reading about
REST and the alternatives in the Decentralization
mailing list, though I'm still reading messages from August of last
year. [ The pointer to the Prescod piece came from Bruce Schneier's Crypto-Gram. ]
2-April-2002 10:23
A review from the New Yorker of "The
Social Life of Paper", a book by Sellen and Harper, makes interesting
reading for advocates of the paperless office and telecommuting. It
makes the, perhaps obvious, point that people use paper to organize
their thoughts and augment their memories. What is less obvious is that
these properties of paper do not translate directly into online forms.
The conclusion seems to be that while computers can superbly replace
filing cabinets for storage and archiving, they have a tougher time with
desktops and the piles of paper they support.
30-May-2002 7:46
I read the InterMezzo paper "Removing
Bottlenecks in Distributed Filesystems: Coda & InterMezzo as
examples" by Peter J. Braam and Philip A. Nelson. It describes a
reimplementation and extension of Coda that achieves better performance by
integrating more smoothly with the native kernel file system that
implements the persistent cache. InterMezzo is organized as a small
kernel component, a user space cache manager (as in Coda and early versions
of AFS) and a remote server. As with later versions of Coda, InterMezzo
obtains a write permit from the server, providing better consistency for
writes than AFS has. In fact, this is closer to the token management
used by DFS and could enable single-site semantics. To address the
performance problems caused by frequent communication with the user space
cache manager, the kernel also obtains permission from the cache manager
to operate directly on the cache. Thus InterMezzo uses a two-level
synchronization mechanism that keeps all components coherent while
making the most common operations fast. The original
cache manager, called Lento, was
implemented in Perl. More recently, it has been reimplemented (and
redesigned) as InterSync.
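Here is how I picture the two-level scheme, as a small Python sketch; the
class and method names are my own invention, not InterMezzo's actual
interfaces:

    class Server:
        """Grants at most one write permit per file (stands in for the remote server)."""
        def __init__(self):
            self.permit_holder = {}                 # path -> cache manager holding permit
        def grant_permit(self, path, manager):
            holder = self.permit_holder.get(path)
            if holder is None or holder is manager:
                self.permit_holder[path] = manager
                return True
            return False                            # held elsewhere; a real server would revoke

    class CacheManager:
        """User-space manager: fetches write permits from the server on demand."""
        def __init__(self, server):
            self.server = server
        def request_direct_access(self, path):
            return self.server.grant_permit(path, self)

    class KernelFS:
        """Kernel module: once granted, it writes to the local cache without upcalls."""
        def __init__(self, manager):
            self.manager = manager
            self.direct = set()                     # paths with delegated permission
            self.cache = {}                         # stands in for the native FS cache
        def write(self, path, data):
            if path not in self.direct:             # slow path: one upcall plus a server trip
                if not self.manager.request_direct_access(path):
                    raise PermissionError("write permit held by another client")
                self.direct.add(path)
            self.cache[path] = data                 # fast path: purely local from now on

    # Example: two writes, only the first involves the cache manager and the server.
    fs = KernelFS(CacheManager(Server()))
    fs.write("/intermezzo/foo", b"first")
    fs.write("/intermezzo/foo", b"second")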
18-June-2002 10:37
A discussion of high-speed Internet
worms makes pretty alarming reading. Past attacks such as Code Red and
Nimda are likely to be weak precursors of what we may expect in the not
too distant future. These worms, often referred to as Warhol worms,
could infect all vulnerable hosts on the Internet in 15 minutes. Far
from scare mongering, the infection speedup techniques described here
seem entirely plausible. The paper constructively discusses ideas for a
CDC-like entity to enhance defenses against these attacks. [ Thanks to
Bruce Schneier's Crypto-Gram
for the pointer to this article. ]
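To see why such short times are plausible, here's my own back-of-the-envelope
logistic model of a random-scanning worm in Python (the numbers are made up
but in a plausible range; the hit-list and permutation-scanning tricks
described in the paper are essentially about eliminating the slow start of
this curve):

    import math

    def minutes_to_infect(fraction, n_vuln=300_000, scan_rate=100.0,
                          addr_space=2**32, seed=1):
        """Minutes for a random-scanning worm to reach `fraction` of the
        vulnerable population.  Logistic model: di/dt = K*i*(1-i), where
        K = scan_rate * n_vuln / addr_space is the rate at which one
        infected host finds new victims."""
        K = scan_rate * n_vuln / addr_space
        i0 = seed / n_vuln
        t = math.log((fraction / (1 - fraction)) * ((1 - i0) / i0)) / K
        return t / 60.0

    if __name__ == "__main__":
        # e.g. 300,000 vulnerable hosts, 100 probes/second per infected host
        print(f"95% infected after about {minutes_to_infect(0.95):.0f} minutes")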
Read a couple of interesting papers on adaptive and predictive caching.
[KL96] Thomas M. Kroeger and Darrell D. E. Long. Predicting file-system
actions from prior events. In Proceedings of the Winter 1996 USENIX
Technical Conference, pages 319-328, San Diego, January 1996.
The idea is to use techniques from text compression
algorithms, which predict events based on their recent predecessors.
These predictions are used to prefetch data into the cache. Using file
traces they claim a 15% improvement over LRU and the ability to match
LRU's hit rate with a much smaller cache, e.g. a 4 MB predictive cache vs
90 MB for LRU.
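A much-simplified Python sketch of the flavor of the idea, using a plain
last-successor predictor rather than the paper's compression-derived model
(the names here are mine):

    from collections import defaultdict, Counter

    class Cache:
        """Stand-in for a block cache; prefetch just marks data as resident."""
        def __init__(self):
            self.resident = set()
        def prefetch(self, name):
            self.resident.add(name)

    class SuccessorPredictor:
        """Remember what usually follows each file and prefetch it on access."""
        def __init__(self):
            self.successors = defaultdict(Counter)      # file -> Counter of next files
            self.last = None
        def access(self, name, cache):
            if self.last is not None:
                self.successors[self.last][name] += 1   # learn from the access stream
            if self.successors[name]:
                predicted, _ = self.successors[name].most_common(1)[0]
                cache.prefetch(predicted)               # speculatively load the prediction
            self.last = name

    if __name__ == "__main__":
        cache, pred = Cache(), SuccessorPredictor()
        for f in ["make", "cc", "ld", "make", "cc", "ld", "make"]:
            pred.access(f, cache)
        print(cache.resident)                           # {'cc', 'ld', 'make'}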
[AAM+02] Ismail Ari, Ahmed Amer, Ethan Miller, Scott Brandt, and Darrell
Long. Who is more adaptive? ACME: adaptive caching using multiple experts.
In Workshop on Distributed Data and Structures (WDAS 2002), Paris, France,
March 2002.
The paper doesn't describe specific results but proposes a
scheme that combines the estimates of multiple caching policies using
machine learning techniques. The learning rewards cache replacement
algorithms that perform well and punishes those that perform badly,
which allows it to change the weights used to actually manage the cache.
These weights can change over time, leading to behavior that adapts to
changing conditions such as workload. Their system also splits criteria
from policies. Criteria are the metrics (e.g. size, frequency of access,
last time of access, etc.) upon which the policies are based. This
allows several policies to share criteria and reduces the cost of
evaluating multiple policies. I had a similar idea for competing caching
policies in connection with applying autonomic computing principles to a
distributed file system, and I am glad that it is getting attention.
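A toy Python sketch of the weighting idea, which is my own simplification
rather than the ACME algorithm itself: two classic policies act as experts,
the cache evicts according to a weighted choice among them, and an expert is
penalized when a block it evicted turns out to be needed again:

    import random

    def lru_victim(blocks):
        return min(blocks, key=lambda b: blocks[b]["last"])    # least recently used

    def lfu_victim(blocks):
        return min(blocks, key=lambda b: blocks[b]["count"])   # least frequently used

    class WeightedCache:
        """Toy cache that blends two replacement 'experts' with multiplicative weights."""
        def __init__(self, capacity, beta=0.9):
            self.capacity = capacity
            self.beta = beta                       # penalty factor for a bad eviction
            self.experts = {"lru": lru_victim, "lfu": lfu_victim}
            self.weights = {name: 1.0 for name in self.experts}
            self.blocks = {}                       # block -> {"last": t, "count": n}
            self.evicted_by = {}                   # evicted block -> expert to blame
            self.t = 0
        def access(self, block):
            self.t += 1
            if block in self.evicted_by:           # an evicted block was needed again:
                bad = self.evicted_by.pop(block)   # penalize the expert that chose it
                self.weights[bad] *= self.beta
            entry = self.blocks.setdefault(block, {"last": 0, "count": 0})
            entry["last"] = self.t
            entry["count"] += 1
            if len(self.blocks) > self.capacity:
                self._evict()
        def _evict(self):
            # Pick an expert with probability proportional to its weight ...
            total = sum(self.weights.values())
            r, acc, chosen = random.uniform(0, total), 0.0, None
            for name, w in self.weights.items():
                acc += w
                if r <= acc:
                    chosen = name
                    break
            # ... and evict that expert's victim, remembering whom to blame.
            victim = self.experts[chosen](self.blocks)
            del self.blocks[victim]
            self.evicted_by[victim] = chosen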
Read "Tangler: A
Censorship-Resistant Publishing System Based On Document Entanglements"
by Marc Waldman and David Mazières, dated December 8, 2001. This system
contains an interesting combination of features that could make a very
useful publishing paradigm. The name comes from the idea of using
Shamir secret sharing to entangle each data block with two other
randomly selected blocks from the storage pool. They propose using 3 of
4 sharing so that each data block is represented by 4 server blocks, any
three of which suffice to reconstruct the original data. Each block
of data appears completely random in isolation. Server blocks are
indexed by the SHA-1 hash of their contents. Each data block is then
identified by a set of four SHA-1 hash values. Each file has an
inode-like block that lists, for each data block in the file, the four
hashes that identify it. A collection consists of a tree of such
files and directories assembled recursively using this entanglement
process. The collection root is signed and labeled with the publisher's
public key.
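The naming side of this is easy to sketch in Python (the Shamir entanglement
itself is omitted, and the class names are mine):

    import hashlib

    def block_id(data: bytes) -> str:
        """Server blocks are named by the SHA-1 of their contents."""
        return hashlib.sha1(data).hexdigest()

    class BlockPool:
        """Toy content-addressed pool of server blocks."""
        def __init__(self):
            self.blocks = {}
        def put(self, data: bytes) -> str:
            bid = block_id(data)
            self.blocks[bid] = data
            return bid
        def get(self, bid: str) -> bytes:
            data = self.blocks[bid]
            assert block_id(data) == bid        # any fetched block can be verified
            return data

    # A data block's name is just the list of the four server-block hashes that
    # (via the 3-of-4 sharing, omitted here) encode it; an inode-like block is a
    # list of such four-hash names, one per data block in the file.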
A similar scheme described by David Madore,
called Random
Pads, involves XORing multiple large blocks of data with several
existing random pads and storing the result as another random pad. This
approach is considerably cheaper than Shamir secret sharing, the main
difference being that all random pads must be located to reconstruct the
original. Because of this there is no threshold of tolerable loss, so
this faster method is considerably more fragile. My recollection is
that I saw this suggested on the Freenet mailing list. Mojonation and
other systems also use n-of-m sharing, but I don't know if the
performance or other characteristics are similar to that used by
Tangler.
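The XOR variant is simple enough to sketch in a few lines of Python
(hypothetical helper names; all pads must have the same length):

    import os, functools

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def publish(data: bytes, existing_pads: list) -> bytes:
        """Store data as a new 'pad': the XOR of the data with existing pads."""
        return functools.reduce(xor, existing_pads, data)

    def recover(new_pad: bytes, existing_pads: list) -> bytes:
        """XOR is its own inverse, but *every* pad must be present to recover."""
        return functools.reduce(xor, existing_pads, new_pad)

    if __name__ == "__main__":
        pads = [os.urandom(16), os.urandom(16)]
        stored = publish(b"a sixteen byte m", pads)
        assert recover(stored, pads) == b"a sixteen byte m"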
An interesting consequence of this entanglement is that each publisher
has an interest in preservation of the blocks needed to reconstruct the
content of other publishers. A file cannot be removed from the system
without also removing blocks needed by other files. This furthers the
cause of censorship-resistance.
Unlike other peer-to-peer systems with very many nodes, Tangler is
designed to operate with a modest number of server block storage nodes
each of which knows of the others. The storage network uses credits and
receipts to validate the behavior of the servers. A server's operation
is implicitly audited during ordinary use and it can be ejected from the
system for non-performance. I must say, however, that I found the
paper's description of the server algorithm difficult to understand.
Over the last few days I've read the research summary for
the IRIS project. This outlines
a comprehensive plan to develop decentralized infrastructure based on
distributed hash tables (DHT) for supporting large-scale distributed
applications. I've long thought that using DHTs to store data blocks
based on the hash of their contents was an important idea. Tangler,
mentioned yesterday, is clearly an application of
this model.
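The basic pattern is small enough to sketch in Python: a toy
consistent-hashing ring, not Chord or any of the actual IRIS DHTs, where a
block's key is the hash of its contents and the key alone determines which
node stores it:

    import hashlib
    from bisect import bisect_right

    def h(data: bytes) -> int:
        return int.from_bytes(hashlib.sha1(data).digest(), "big")

    class ToyDHT:
        """Content-addressed storage over a consistent-hashing ring of nodes."""
        def __init__(self, node_names):
            # Each node gets a position on the ring derived from its name.
            self.ring = sorted((h(n.encode()), n) for n in node_names)
            self.stores = {n: {} for n in node_names}
        def _node_for(self, key: int) -> str:
            points = [p for p, _ in self.ring]
            i = bisect_right(points, key) % len(self.ring)   # next node around the ring
            return self.ring[i][1]
        def put(self, block: bytes) -> int:
            key = h(block)                                   # key is the content hash
            self.stores[self._node_for(key)][key] = block
            return key
        def get(self, key: int) -> bytes:
            return self.stores[self._node_for(key)][key]

    if __name__ == "__main__":
        dht = ToyDHT(["node%d" % i for i in range(8)])
        key = dht.put(b"some immutable block")
        assert dht.get(key) == b"some immutable block"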
It is hard to criticize the whole proposal, but the effort has received
substantial ($12M) funding from the NSF. The proposal/summary provides
a good overview of decentralized systems and has a comprehensive
bibliography.
I just finished reading "We've
Got Blog: How Weblogs are Changing Our Culture", edited by John
Rodzvilla of Perseus Publishing. I stumbled on this book in the local
public library and was curious enough to check it out.
The book consists of small chapters which are essays about weblogs,
interviews with bloggers, or excerpts from weblogs. This provided a lot
of background, history, personalities and something of the flavor of
weblogs. Overall, though, I'd have to say I was not very impressed.
There was very little I'd call serious analysis of the subject, its
development or its future, which is what I was hoping for. On the other
hand, there were some fairly interesting and amusing details of the
phenomenon. Since I knew next to nothing about the subject going into
it, I found this useful. If you've been following blogs for a while, it
probably isn't worthwhile unless you're a collector. I'm afraid my
overall reaction was disappointment.
I read "Linked:
The New Science of Networks", by Albert-László Barabási and recently
read "SYNC:
The Emerging Science of Spontaneous Order", by Steven Strogatz.
I don't remember many specifics about "Sync" except that I liked it
well enough to talk my son into reading it, which was no mean feat. I
was a little worried about "Linked" in the beginning because it started
with some of the same small-world anecdotes, such as the Kevin Bacon game and the
Erdős
numbers of published mathematicians. However, it soon veered onto
its own trajectory and even cites Strogatz's research on several
occasions (I don't recall whether Strogatz returned the favor).
Barabási covers the early work of Erdős on random networks and the
clustered random nets of Strogatz, both of which exhibit the small-world
effect but aren't very realistic models of most real networks, which are
far from random. He surveys a wide variety of real networks and analyzes
their organization, observing that it is at odds with the predictions of
the random network models.
Then Barabási goes off on power law, or scale-free, networks where
the number of nodes with k links is proportional to
k^(-γ), where γ is some constant that characterizes
the connectivity of the network. This is often expressed informally as
an 80/20 relation, such as "80% of the money is made by 20% of the
people". Barabási clearly believes that power law relationships are
very important to understanding complex systems. He describes numerous
examples of physical systems undergoing a phase transition from chaos to
order (or the reverse) where power laws crop up. He organized one
chapter around a scale-free network model in which each node is assigned
a fitness that governs its affinity for links. A transformation of
fitness to energy gives the model the same mathematical structure as a
Bose gas, and the correspondence predicts that such networks can undergo
the analog of a Bose-Einstein condensation. The best-known example of
this winner-take-all scenario, in which the network takes on a star-like
topology, is Microsoft Windows, with Windows users as the nodes (in the
network analogy) or the particles (in the Bose gas analogy).
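The growth-plus-preferential-attachment mechanism behind these scale-free
networks is easy to simulate; here's a bare-bones Python version of the
Barabási-Albert model (the parameters are arbitrary):

    import random
    from collections import Counter

    def barabasi_albert(n_nodes=10_000, m=2):
        """Grow a network by preferential attachment: each new node links to m
        existing nodes chosen with probability proportional to their degree."""
        degree = Counter()
        endpoints = []                        # every node appears once per link end
        for i in range(m):                    # tiny seed network
            j = (i + 1) % m
            degree[i] += 1
            degree[j] += 1
            endpoints += [i, j]
        for new in range(m, n_nodes):
            # choosing uniformly from `endpoints` is a degree-proportional choice
            chosen = {random.choice(endpoints) for _ in range(m)}
            for old in chosen:
                degree[new] += 1
                degree[old] += 1
                endpoints += [new, old]
        return degree

    if __name__ == "__main__":
        hist = Counter(barabasi_albert().values())
        for k in (2, 4, 8, 16, 32, 64):
            print(k, hist.get(k, 0))          # counts fall off roughly as k^(-3)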
Power laws show up whenever physical systems are undergoing chaos /
order transitions. Studies of networks in computers, the Internet,
actors and authors, cellular metabolism, gene regulation and others show
the same sorts of characteristics. The compelling implication is that
there are some deep and important laws that govern the formation and
behavior of these complex systems. While it is too early to precisely
formulate these laws, Barabási argues convincingly for the importance of
gaining a deep understanding of networks.
I'm sure I need a better rating system, but here's what I'm using now:
- Important but not too original.
- Important and interesting or clever.
- An area of study.