I posted this note to cypherpunks on 23 March 1995 and got little feedback, so I'm still looking for comments. Let me know what you think.

I have been pondering the future of the net from the perspective of a file service provider. Here is a scenario for providing caching and replication for the WWW and similar applications. I follow the sketchy outline with some questions that have been plaguing me for a while.

ASSUMPTIONS

The following scenario depends on a few assumptions about the world of the near future, say five years from now:

1. Intellectual property rights are dead. No one bothers to try to collect royalties or license fees for published or public materials. Copyright and patent protections are unenforceable and ignored. Contracts can still be made for intellectual property, and trade secrets remain important (as long as they stay secret).

2. Really private data will use conventional access control and be kept off distributed systems, or be encrypted. This type of data is not the primary focus of this scenario.

3. Data is read-only, or can be made to look that way by using a version control system in which each version is immutable. Frequently changing data is outside this scenario's domain.

4. Digital money is widely deployed and allows efficient charging of small amounts for a wide range of network services.

5. Network bandwidth is very widely available and inexpensive, but not "too cheap to meter". Small amounts of medium-latency (30-300 msec) bandwidth are continuously available for use as a control channel; at minimum, something like digital cellular or Iridium connections can be used.

6. Disk space is plentiful but also not "too cheap to meter".

7. A global name space exists for identifying files, and some digital signature system is used to ensure integrity. A file reference consists of a universal unique identifier, a primary location/name, and a digital signature. Perhaps the digital signature acts as (part of) the identifier.
Perhaps the identifier has some small amount of structure allowing limited sorting of files according to some criteria.

SCENARIO

Consider a large collection of file users distributed throughout the network. They communicate with each other using a protocol that allows them to trade files for money. Some of these users (clients) mostly fetch files of interest into a small cache (e.g. a 10M disk partition); imagine this as the transport layer for a WWW application. Other users (servers) manage large amounts of disk space as replication sites for an assortment of files. Servers primarily respond to requests for files from clients, but also act as clients when fetching files they want to cache. The protocol is symmetric; all users are sometimes clients and sometimes servers.

Servers are mostly money-making operations. They charge enough for files to recoup disk and network costs plus make a small profit. Clients are generally money-losing operations. However, the clients of popular authors will make some profit as servers of files which are in demand by other clients and servers. This is an example of Hal Finney's Micro-Capitalism.

Clearly there are a lot of clever strategies for managing a server's cache and network bandwidth. Because the file transfer protocol is symmetric, servers can fetch files from clients, and clients can share files among themselves. For instance, a server might track requests from clients that it could not fulfill. If a file seems to be getting popular, the server may decide it would be advantageous to cache it. To locate the file it could call back to the clients that had contacted it, see if any of them had successfully obtained it from other sources, and buy it back. Clients, of course, could anticipate this and offer newly acquired files to the servers they had unsuccessfully contacted.

I assume that a global database of all replicas of all files is hopelessly unscalable. However, one distinguished location seems useful and easy.
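The file reference of assumption 7 and the buy-back strategy just described might be sketched together as below. This is only my own illustration, not part of the original note: it assumes the digital signature is a bare content hash that doubles as the identifier, and that a server charges a fixed resale price per file; all class and function names are invented for the example.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class FileRef:
    identifier: str        # universal unique id, here the content hash
    primary_location: str  # server of last resort for this file
    signature: str         # lets any holder check integrity

def make_ref(content: bytes, primary_location: str) -> FileRef:
    # The content hash serves as both identifier and signature, so
    # each immutable version (assumption 3) gets a distinct reference.
    digest = hashlib.sha256(content).hexdigest()
    return FileRef(digest, primary_location, digest)

class Server:
    """A replication site that tracks misses and buys popular files."""

    def __init__(self, resale_price: float, storage_cost: float):
        self.cache = {}          # identifier -> content
        self.missed = Counter()  # unfulfilled requests per identifier
        self.resale_price = resale_price
        self.storage_cost = storage_cost

    def request(self, ref: FileRef):
        """Serve a client, or record the miss for a later buy-back."""
        if ref.identifier in self.cache:
            return self.cache[ref.identifier]
        self.missed[ref.identifier] += 1
        return None

    def should_buy(self, ref: FileRef, purchase_price: float) -> bool:
        # Naive heuristic: buy a file back once expected resale
        # revenue from repeat demand covers purchase plus storage.
        expected = self.missed[ref.identifier] * self.resale_price
        return expected > purchase_price + self.storage_cost

    def buy(self, ref: FileRef, content: bytes) -> bool:
        # Verify integrity against the signature before paying.
        if hashlib.sha256(content).hexdigest() != ref.signature:
            return False
        self.cache[ref.identifier] = content
        return True
```

For instance, a server charging 5 cents per copy that has turned away six requests for a file would buy it back at a 10-cent asking price plus 10 cents of storage, since the 30 cents of expected demand exceeds the 20-cent cost. A real strategy would also weigh latency, competition from other replicas, and cache eviction.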
This location gives a financial advantage to the server at that location, as it will be the server of last resort when a client tries to access the file. For very popular data that server will be heavily loaded, exhibit large latencies, and charge hefty premiums. This incentivizes other servers to pick up the data and advertise themselves as an alternate (cheaper) source. I assume that very popular files, like SL-9 comet impact images, Olympic Game scores, and the like, will flood out quite rapidly and efficiently. I assume that unpopular files will always be fetched from the primary source, and no server will bother to cache them.

QUESTIONS

1. Will a stable network of providers develop to replicate reasonably popular data? What about the broad and fuzzy range of medium-popular files, like CERT advisories, RFCs, source releases of esoteric programs, essays by fringe authors, etc.? These files may "only" be accessed by a few thousand users, but will economic constraints allow them to be cached or not?

2. What strategies are possible to improve the locality of reference seen by servers? I imagine adding topical labels to the file identifier (in addition to or instead of the primary location). This would allow servers to advertise themselves as specializing in particular topics. Are there other good possibilities?

3. Even in the absence of enforceable intellectual property rights, authors may still desire to make a living. For files not widely cached, the advantage of being the primary location will return a fair income to the author. But what are Danielle Steel and Steven Spielberg to do? Two possibilities come to mind. They can enter into contracts with a number of servers that will pay for the privilege of being the first to have the file available at the outset. A second idea is to serialize a popular work into a number of files; for each installment the author can extract some additional profit. How well will these work? Are there better ideas?

4.
I assume that pricing strategies for maximizing profit in a range of market and competitive environments are well known. Does this highly dynamic environment map onto a well-studied situation? Are there analogs in traditional markets that would be worth studying? References?

5. How reasonable are the initial assumptions?

6. What papers are there that address these issues?

Your comments are solicited,
Ted Anderson