I posted this note to cypherpunks on 23 March 1995 and got little feedback, so I'm still looking for comments. Let me know what you think.

I have been pondering the future of the net from the perspective of a file service provider. Here is a scenario for providing caching and replication for the WWW and similar applications. I follow the sketchy outline with some questions that have been plaguing me for a while.

ASSUMPTIONS

The following scenario depends on a few assumptions about the world of the near future, say five years from now:

1. Intellectual property rights are dead. No one bothers to try to collect royalties or license fees for published or public materials. Copyright and patent protections are unenforceable and ignored. Contracts can still be made for intellectual property, and trade secrets remain important (as long as they stay secret).

2. Really private data will use conventional access control and be kept off distributed systems, or be encrypted. This type of data is not the primary focus of this scenario.

3. Data is read-only, or can be made to look that way by using a version control system in which each version is immutable. Frequently changing data is outside this scenario's domain.

4. Digital money is widely deployed and allows efficient charging of small amounts for a wide range of network services.

5. Network bandwidth is very widely available and inexpensive, but not "too cheap to meter". Small amounts of medium-latency (30-300 msec) bandwidth are continuously available for use as a control channel; at minimum, something like digital cellular or Iridium connections can be used.

6. Disk space is plentiful but also not "too cheap to meter".

7. A global name space exists for identifying files, and some digital signature system is used to ensure integrity. A file reference consists of a universal unique identifier, a primary location/name, and a digital signature. Perhaps the digital signature acts as (part of) the identifier.
Perhaps the identifier has some small amount of structure allowing limited sorting of files according to some criteria.

SCENARIO

Consider a large collection of file users distributed throughout the network. They communicate with each other using a protocol that allows them to trade files for money. Some of these users (clients) mostly fetch files of interest into a small cache (e.g. a 10M disk partition); imagine this as the transport layer for a WWW application. Other users (servers) manage large amounts of disk space as replication sites for an assortment of files. Servers primarily respond to requests for files from clients, but also act as clients when fetching files they want to cache. The protocol is symmetric; all users are sometimes clients and sometimes servers.

Servers are mostly money-making operations. They charge enough for files to recoup disk and network costs plus make a small profit. Clients are generally money-losing operations. However, the clients of popular authors will make some profit as servers of files which are in demand by other clients and servers. This is an example of Hal Finney's Micro-Capitalism.

Clearly there are a lot of clever strategies for managing a server's cache and network bandwidth. Because the file transfer protocol is symmetric, servers can fetch files from clients, and clients can share files among themselves. For instance, a server might track requests from clients that it could not fulfill. If a file seems to be getting popular, the server may decide it would be advantageous to cache it. To locate the file it could call back to the clients that had contacted it, see if any of them had successfully obtained it from other sources, and buy it back. Clients, of course, could anticipate this and offer newly acquired files to the servers they had unsuccessfully contacted.

I assume that a global database of all replicas of all files is hopelessly unscalable. However, one distinguished location seems useful and easy.
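The file reference of assumption 7 and the buy-back strategy just described might be sketched together as below. This is only my own illustration, not part of the original note: it assumes the digital signature is a bare content hash that doubles as the identifier, and that a server charges a fixed resale price per file; all class and function names are invented for the example.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class FileRef:
    identifier: str        # universal unique id, here the content hash
    primary_location: str  # server of last resort for this file
    signature: str         # lets any holder check integrity

def make_ref(content: bytes, primary_location: str) -> FileRef:
    # The content hash serves as both identifier and signature, so
    # each immutable version (assumption 3) gets a distinct reference.
    digest = hashlib.sha256(content).hexdigest()
    return FileRef(digest, primary_location, digest)

class Server:
    """A replication site that tracks misses and buys popular files."""

    def __init__(self, resale_price: float, storage_cost: float):
        self.cache = {}          # identifier -> content
        self.missed = Counter()  # unfulfilled requests per identifier
        self.resale_price = resale_price
        self.storage_cost = storage_cost

    def request(self, ref: FileRef):
        """Serve a client, or record the miss for a later buy-back."""
        if ref.identifier in self.cache:
            return self.cache[ref.identifier]
        self.missed[ref.identifier] += 1
        return None

    def should_buy(self, ref: FileRef, purchase_price: float) -> bool:
        # Naive heuristic: buy a file back once expected resale
        # revenue from repeat demand covers purchase plus storage.
        expected = self.missed[ref.identifier] * self.resale_price
        return expected > purchase_price + self.storage_cost

    def buy(self, ref: FileRef, content: bytes) -> bool:
        # Verify integrity against the signature before paying.
        if hashlib.sha256(content).hexdigest() != ref.signature:
            return False
        self.cache[ref.identifier] = content
        return True
```

For instance, a server charging 5 cents per copy that has turned away six requests for a file would buy it back at a 10-cent asking price plus 10 cents of storage, since the 30 cents of expected demand exceeds the 20-cent cost. A real strategy would also weigh latency, competition from other replicas, and cache eviction.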
This location gives a financial advantage to the server at that location, as it will be the server of last resort when a client tries to access the file. For very popular data that server will be heavily loaded, exhibit large latencies, and charge hefty premiums. This incentivizes other servers to pick up the data and advertise themselves as an alternate (cheaper) source. I assume that very popular files, like SL-9 comet impact images, Olympic Game scores, and the like, will flood out quite rapidly and efficiently. I assume that unpopular files will always be fetched from the primary source, and no server will bother to cache them.

QUESTIONS

1. Will a stable network of providers develop to replicate reasonably popular data? What about the broad and fuzzy range of medium-popular files, like CERT advisories, RFCs, source releases of esoteric programs, essays by fringe authors, etc.? These files may "only" be accessed by a few thousand users, but will economic constraints allow them to be cached or not?

2. What strategies are possible to improve the locality of reference seen by servers? I imagine adding topical labels to the file identifier (in addition to or instead of the primary location). This would allow servers to advertise themselves as specializing in particular topics. Are there other good possibilities?

3. Even in the absence of enforceable intellectual property rights, authors may still desire to make a living. For files not widely cached, the advantage of being the primary location will return a fair income to the author. But what are Danielle Steel and Steven Spielberg to do? Two possibilities come to mind. They can enter into contracts with a number of servers that will pay for the privilege of being the first to have the file available at the outset. A second idea is to serialize a popular work into a number of files; for each installment the author can extract some additional profit. How well will these work? Are there better ideas?

4.
I assume that pricing strategies for maximizing profit in a range of market and competitive environments are well known. Does this highly dynamic environment map onto a well-studied situation? Are there analogs in traditional markets that would be worth studying? References?

5. How reasonable are the initial assumptions?

6. What papers are there that address these issues?

Your comments are solicited,
Ted Anderson