Historical Archive: This is a conceptual design document from 2000 describing an anonymous file exchange system. Never implemented due to time and moral considerations.

SafeX = Secure and Anonymous File EXchange.

The Idea

To get on SafeX, you download a small client with which you then construct a list of your friends on SafeX. A small dialog box pops up on each friend's box, asking if you are to be trusted. A PGP fingerprint might be presented to be verified over the phone.
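To make the approval step concrete, here is a minimal sketch (not from the original design) of how the client might fingerprint a friend's public key and ask the local user to confirm it over the phone; the key exchange and the actual dialog box are assumed to exist elsewhere.

    import hashlib

    def fingerprint(public_key_bytes: bytes) -> str:
        """Return a human-readable fingerprint of a friend's public key."""
        digest = hashlib.md5(public_key_bytes).hexdigest()
        # Group into 4-character chunks so it is easy to read over the phone.
        return " ".join(digest[i:i + 4] for i in range(0, len(digest), 4))

    def approve_friend(name: str, public_key_bytes: bytes) -> bool:
        """Console stand-in for the 'is this person to be trusted?' dialog."""
        print(f"{name} wants to join your SafeX trust list.")
        print(f"Fingerprint: {fingerprint(public_key_bytes)}")
        answer = input("Does this match what they read to you? [y/N] ")
        return answer.strip().lower() == "y"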

Your computer then records the IPs of the SafeX nodes with which it has established a trust relationship and securely writes them to disk. A private key, used to encrypt all transactions and corresponding disk storage, could be uploaded to a Rio player or any such flash-memory based device for secure, portable storage, itself encrypted with a passphrase. (This is actually the exact purpose for which I wrote the first piece of code to upload a file on the Rio to a computer.) The PGP protocol would ensure that the message was not only encrypted but also authenticated as coming from a given node, preventing man-in-the-middle attacks and snooping. A quick "panic!" command from a given node would immediately erase all local transactions, flush the secure cache (more on this later), and inform all connected nodes to erase the transmitting node's information from their repositories.
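A rough sketch of what the "panic!" handler might look like, assuming a hypothetical SafexNode class whose transport layer (send_forget_me) is stubbed out; the important part is the ordering: wipe local state first, then ask the neighbours to forget you.

    import os

    class SafexNode:
        """Hypothetical node object; only the panic path is sketched here."""

        def __init__(self, state_dir: str, peers: list):
            self.state_dir = state_dir   # encrypted transaction log + secure cache
            self.peers = peers           # trusted peers: (ip, public_key) pairs

        def panic(self) -> None:
            # 1) Erase all local transactions and flush the secure cache.
            for name in os.listdir(self.state_dir):
                path = os.path.join(self.state_dir, name)
                if os.path.isfile(path):
                    os.remove(path)
            # 2) Ask every connected node to erase our records, then forget them.
            for peer in self.peers:
                self.send_forget_me(peer)
            self.peers.clear()

        def send_forget_me(self, peer) -> None:
            # Placeholder: would send an authenticated "erase my information"
            # message over the real transport.
            pass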

Information on SafeX would be presented as a giant, hierarchical list of files and (anonymized or authenticated) discussions. You would provide your client with a list of files and/or directories that you wished to post on SafeX. The client would run a secure hash (e.g. MD5) on each file to fingerprint it and publish the filenames and hashes to all connected nodes. The other nodes would, in turn, broadcast their "accessible" lists (about to be explained!). The client then checks for hash collisions, sorts, and creates a concatenated list of all of the files that it can access, locally or through a connected client. This "accessible" list ("You can call me AL") is then rebroadcast to all listening hosts. Each computer stores locally an identifier that tells it which PC it heard about each file from. A rough metric of "network distance" is appropriately summed and attached to each item, thus allowing for a self-optimizing route through SafeX to the "closest" copy of a file.
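Something like the following sketch could build and merge the "accessible" list; MD5 is used as the fingerprint as in the text, and the one-hop increment stands in for whatever network-distance metric a real client would actually measure.

    import hashlib
    import os

    def local_entries(paths):
        """Fingerprint local files: hash -> {name, distance 0, via 'local'}."""
        entries = {}
        for path in paths:
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            entries[digest] = {"name": os.path.basename(path),
                               "distance": 0, "via": "local"}
        return entries

    def merge(accessible, remote_entries, peer, hop_cost=1):
        """Fold a peer's list into ours, keeping the closest copy of each file."""
        for digest, info in remote_entries.items():
            candidate = {"name": info["name"],
                         "distance": info["distance"] + hop_cost,
                         "via": peer}
            # A hash collision means we already know of this file: keep the
            # nearer copy and silently discard the duplicate.
            known = accessible.get(digest)
            if known is None or candidate["distance"] < known["distance"]:
                accessible[digest] = candidate
        return accessible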

Some of you might point out that this will involve the transmission of redundant information: this is true. However, it A) simplifies the protocol, B) is harmless, since duplicates simply produce a hash collision and are discarded, and C) makes the traffic less prone to probabilistic analysis.

With regard to probabilistic analysis, some of you may point out that having lists of differing size would leak some information to an observer. This could easily be avoided by first sending an encrypted broadcast request for the sizes of the lists and then sending the encrypted request for the lists themselves: the servers would pad their lists to be as long as the longest one, revealing nothing about who knew how much. Back to the system...
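For illustration, the padded exchange might look roughly like this: the real length is framed inside the (to-be-encrypted) payload so the receiver can strip the random padding, while the message an observer sees is always the same size. The framing format here is an assumption, not part of the original design.

    import os
    import struct

    def pad_list(serialized_list: bytes, target_length: int) -> bytes:
        """Length-prefix the list, then pad with random bytes to target_length."""
        framed = struct.pack(">I", len(serialized_list)) + serialized_list
        if len(framed) > target_length:
            raise ValueError("target_length is too small for this list")
        return framed + os.urandom(target_length - len(framed))

    def unpad_list(padded: bytes) -> bytes:
        """Recover the original list; everything after the framed length is junk."""
        (length,) = struct.unpack(">I", padded[:4])
        return padded[4:4 + length]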

When a request for a given file is made, the request is encrypted with the public key of the machine that originally advertised the file's availability and broadcast to attached SafeX nodes. All of the nodes but the target discard the junk packet: the target node decrypts the packet and deciphers the request. If the file is local to the target, it feeds the data back to the client. If the data is not local to the target, the target repeats this "pseudo-broadcast" to the "nearest" node to the file. When the response is received it is conditionally cached (depending on user preferences) and retransmitted to the client.
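Here is a toy rendering of that pseudo-broadcast, with an addressed-envelope tuple standing in for PGP encryption (real SafeX would of course use actual public-key crypto); relaying the answer back through the proxy chain and caching are omitted.

    class Node:
        def __init__(self, name, files=None, peers=None, sources=None):
            self.name = name
            self.files = files or {}      # hash -> file bytes held locally
            self.peers = peers or []      # attached (trusted) Node objects
            self.sources = sources or {}  # hash -> nearest Node advertising it

        def envelope(self, target, file_hash):
            # Stand-in for encrypting the request with the target's public key.
            return (target.name, file_hash)

        def pseudo_broadcast(self, packet):
            for peer in self.peers:
                peer.receive(packet, sender=self)

        def receive(self, packet, sender):
            target_name, file_hash = packet
            if target_name != self.name:
                return                    # unreadable junk: discard it
            if file_hash in self.files:
                print(f"{self.name} serves {file_hash} back toward {sender.name}")
            else:
                # Not local: re-wrap for the nearest advertiser and repeat the
                # pseudo-broadcast, acting as a proxy.
                next_hop = self.sources[file_hash]
                self.pseudo_broadcast(self.envelope(next_hop, file_hash))

Calling something like a.pseudo_broadcast(a.envelope(c, some_hash)) then mirrors steps 8 and 9 of the walk-through below.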

It is quickly seen that such a protocol could potentially reveal the true transmitter of the file if the network were under strict supervision. The solution is for answering nodes to feed *all* attached nodes the encrypted information. Given a large number of attached nodes, the linearly scaling nature of this solution makes it relatively impractical to implement, except on extremely high-speed networks (i.e., University ethernet and not 28.8!). The interesting alternative is to implement the broadcast through multicast, if available. While this would certainly still not solve the problem of non-target nodes receiving an excessive number of packets that they would have to discard, it would significantly reduce network and server load.

Let's do a quick walk-through: 'A' downloads SafeX. 'A' claims 'B' and 'C' as friends. Both accept. 'C' is connected to 'D' and 'E' who both have massive file repositories. 'C,' 'D,' and 'E,' are all friends with 'F,' 'G,' and 'H.'

1) 'A' compiles a local list of files, computes their checksums, and transmits the list in separate transmissions to B and C -- encrypted with each node's respective keys.

2) B and C each append A's list to their own, adding the "network distance" from each to A and recording that the file was obtained from A.

3) B and C both respond to A by announcing the length of their lists.

4) A calculates the greater of the two and transmits a list request to B and C.

5) B and C send their lists to A, padding the result if necessary.

6) A appends B and C's list to its own, appropriately adding the network distance and noting which files came from who.

7) User on A selects a file in the list for downloading. This file resides on D.

8) A broadcasts a file request to both B and C, encrypted with C's key, since the file was in C's transmitted access list.

9) B ignores the packet and C retransmits the request to D, E, F, G, and H, but encrypted with D's key, since D registers the shortest network path to the file. (F could potentially get the file from D and then C could get it from F, but getting it straight from D will likely be fastest.)

10) D encrypts the file with C's key and transmits it (perhaps after a random delay of between 1 and 10 minutes) to C, F, G, and H. A random amount of padding is added to the file to prevent guessing at the contents from the length of the transmission (see the sketch after this walk-through).

11) C decrypts the file and stores it locally. F, G, and H discard the unreadable packets. (They don't know who requested the file, or even what it was!)

12) C encrypts the file with A's key and transmits it to A, D, E, F, G, and H after a random amount of time and with a random amount of padding.

13) D, E, F, G, and H ignore the unreadable packets. A decrypts her file and stores it locally!
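A sketch of the relay step in items 10-12: length-frame the file so the recipient can strip the junk, pad by a random amount, wait a random 1-10 minutes, then send the same ciphertext to every attached node so only the intended next hop can read it. encrypt_for and send are hypothetical stand-ins for the PGP and transport layers.

    import os
    import random
    import struct
    import time

    def relay(file_bytes, next_hop_key, neighbours, encrypt_for, send,
              max_pad=64 * 1024):
        # Length-prefix so the recipient can strip the junk, then pad randomly
        # so the transmission length says nothing about the file's size.
        framed = struct.pack(">Q", len(file_bytes)) + file_bytes
        padded = framed + os.urandom(random.randint(0, max_pad))
        ciphertext = encrypt_for(next_hop_key, padded)  # hypothetical PGP call
        time.sleep(random.uniform(60, 600))             # the 1-10 minute delay
        for peer in neighbours:
            send(peer, ciphertext)                      # all but one discard it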

A simple variant on this could involve passing A's public key along with the request -- the originating server would apply a second (inner) encryption with A's key. This would prevent caching but would also keep intermediate (even though admittedly trusted) stations from observing the unencrypted content.
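In code, the variant amounts to one extra layer of wrapping: the serving node first seals the file for the requester, then for the next hop, so trusted intermediaries relay a blob they cannot read (or usefully cache). encrypt_for is again a hypothetical stand-in for the PGP layer.

    def wrap_for_relay(file_bytes, requester_key, next_hop_key, encrypt_for):
        """Inner layer for the requester, outer layer for the next hop."""
        inner = encrypt_for(requester_key, file_bytes)  # only A can open this
        return encrypt_for(next_hop_key, inner)         # only C can peel this layer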

Note that D only knows that C is requesting a file and doesn't know whether C is a proxy or a client. Note also that A doesn't know who C is getting the file from (although A might be able to deduce some information about this from the "network distance" metric -- while this parameter could be left out to increase (superanal!) security it might make managing the SafeX topology considerably more difficult).

As for bandwidth efficiency in the protocol...well, let's just say that without multicast it's quite difficult to have a bandwidth-efficient algorithm that can defeat probabilistic analysis of the network. And a key fact is brutally exposed: which computers are connected to SafeX and to which others. Other than dumping bytes randomly to random IPs (further increasing unnecessary bandwidth!) I'm not sure of a good way around this. Any very good & complete setup will make the network look like nothing but a hell of a lot of white noise: sort of like building a giant man-made waterfall so that one can stand under it and whisper without being heard.

Yes, for anybody who was wondering, I *am* in the middle of Cryptonomicon. A heartily recommended read for any of you. I hope the detail I have outlined in SafeX (no, I have not implemented it, partly for time and partly for moral issues) makes it clear that it is possible to create a system capable of quietly and anonymously serving a large quantity of files and messages. (I have some preliminary ideas for an anonymous and/or authenticated Instant Messaging system that sits on top of SafeX.) There may well be significant shortcomings, holes, and unthought-out portions in this concept, but the groundwork is sufficient to show that such a project is plausible and relatively bullet-proof. (If the RIAA thought they were being high-tech by looking sites up with Yahoo! and Network Solutions, they have seen nothing yet.)

The Response

It seems pretty well thought out. I'm wondering if you've read the paper Ron Rivest did on (I believe he called it) chaffing and winnowing, which, if I'm remembering correctly, would have a similar result but with much smaller bandwidth usage. The analogy would be: instead of building a giant waterfall, you would make it look like there was a giant waterfall so you couldn't see the real trickle of data behind it.

The Reply

Thanks for the response and the pointer to Rivest's article (online at http://theory.lcs.mit.edu/~rivest/chaffing.txt -- found it with Yahoo!). It's a very interesting paper, but it would appear that chaffing & winnowing are quite different from SafeX. While SafeX does hope to make transmissions unreadable (via encryption), SafeX also tries to make it unclear even who is transmitting to whom, and difficult to link an A→B transmission with a B→C transmission.

One thing that I left out of SafeX was the possibility for machines to also randomly send out complete junk packets, especially after any other packet has been received. Such a (semi-)continuous stream of packets would make it very difficult to discern when an actual SafeX packet was being sent, let alone to whom, let alone what the request might contain!
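A sketch of that cover-traffic idea, sending random-length junk to a small random subset of peers (a fanout of about three matches the estimate further down); send is a hypothetical transport call, and the recipients simply fail to decrypt the junk and drop it.

    import os
    import random

    def emit_cover_traffic(peers, send, fanout=3, max_len=32 * 1024):
        """Send random-length junk to a few random peers after real traffic."""
        for peer in random.sample(peers, min(fanout, len(peers))):
            send(peer, os.urandom(random.randint(1, max_len)))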

The time may come soon to start building SafeX. I don't think I'll put it together unless I believe there to be a dire need...

I think that with SafeX it should be possible to sufficiently mask data transfers by sending bogus packets to only ~3 other computers (not all attached nodes!) -- this would make for a 4x increase in data over what was required, but this is a reasonable goal for a high-speed network.

Note also with SafeX that the distributed, hierarchical caching built into the system delegates the job of serving files to the entire network, easing load on originators of content.

UPDATE: looks like FreeNet is implementing a subset of SafeX's features. Cool!

Historical Context

SafeX was a conceptual design for an anonymous peer-to-peer file sharing system that anticipated many features later seen in networks like Freenet, Tor, and BitTorrent. The design emphasized cryptographic privacy, plausible deniability, and resistance to traffic analysis - concepts that became increasingly important as governments and corporations began monitoring internet traffic.

The "network distance" metrics and hierarchical caching design prefigured many modern P2P optimization techniques. The reference to Cryptonomicon reflects the influence of Neal Stephenson's novel on thinking about cryptography and privacy in the early 2000s.

Design Document Only: This system was never implemented. The author noted concerns about "time and moral issues" regarding the creation of tools that could be used to circumvent copyright enforcement, demonstrating the ethical considerations faced by technologists during the early P2P era.