3 years ago · b80ff71fde
--- a/doc/Paper/Paper.tex
+++ b/doc/Paper/Paper.tex
@@ -18,20 +18,57 @@ solving problems germane to their application.
 
				 \end{abstract}
			
 
				 
			
 
				 \section{Introduction}
			
 
				-(Do I need this?)
			
 
				+The online services that users currently have access to are incredible. They handle all
			
 
				+the details of backing up user data and implementing access controls to facilitate
			
 
				+safe sharing. However, because these are closed systems, users are forced to trust that
			
 
				+the operators are benevolent, and they lack any real way of ensuring that the access
			
 
				+control they prescribe will actually be enforced. There have been several systems proposed
			
 
				+as an alternative to the conventional model, but these systems suffer from several shortcomings.
			
 
				+They either assume the need for cloud storage providers (Blockstack) or implement all operations
			
 
				+using a global blockchain, limiting performance (FileCoin). Blocktree takes a different approach.
			
 
				+
			
 
				+The idea behind blocktree is to organize a user's computers into a cooperative unit, called a
			
 
				+blocktree. The user is said to own the blocktree, and they wield sovereign authority over it.
			
 
				+The artifact granting them this authority is the root private key for the blocktree. Measures for protecting
			
 
				+this key and delegating its authority are important design considerations of the system.
			
 
				+The owners of blocktrees are encouraged to collaborate with each other to store data by
			
 
				+means of a cryptocurrency known as blockcoin. The blockchain implementing this cryptocurrency
			
 
				+is the source of global state for the system, and allows for the creation of global paths.
			
 
				+
			
 
				+All data stored in blocktree is contained in units called blocks. The blocks in a blocktree, of
			
 
				+course, form a tree. Each block has a path corresponding to its location in the tree. The first
			
 
				+component of a fully qualified blocktree path is the fingerprint of the root public key of the
			
 
				+blocktree. Thus a blocktree path can globally specify a block. If a block is not a leaf,
			
 
				+then it is called a directory, and the data it contains is managed by the system,
			
 
				+including the list of blocks which are children of the block. In addition to its payload of data,
			
 
				+each block has a header containing cryptographic access control mechanisms. These mechanisms ensure
			
 
				+that only authorized users can read or optionally write to the block.
			
 
				+
			
 
				+Users and nodes in the blocktree system are identified by hashes of their public keys. These hashes
			
 
				+are referred to as principals, as they are the units used for setting access control policy.
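The relationship between public keys, principals, and fully qualified paths can be sketched in a few lines of Python. This is only an illustration: the choice of SHA-256, the hexadecimal encoding, the `/` separator, and the function names `principal`, `qualified_path`, and `split_path` are assumptions made for the example and are not specified by this paper.

```python
import hashlib


def principal(public_key: bytes) -> str:
    """Derive a principal from a public key.

    Assumption: the hash is SHA-256 and it is written in hex.
    """
    return hashlib.sha256(public_key).hexdigest()


def qualified_path(root_public_key: bytes, *components: str) -> str:
    """Build a fully qualified path.

    The first component is the fingerprint of the root public key,
    which is what makes the path globally unique.
    """
    return "/" + "/".join((principal(root_public_key),) + components)


def split_path(path: str):
    """Split a fully qualified path into (root fingerprint, remaining components)."""
    parts = path.strip("/").split("/")
    return parts[0], parts[1:]


if __name__ == "__main__":
    root_key = b"example root public key bytes"  # placeholder, not a real key
    path = qualified_path(root_key, "documents", "notes")
    fingerprint, rest = split_path(path)
    print(path)
    print(fingerprint == principal(root_key), rest)
```

Because the fingerprint is the first component, any node that holds a path can tell which blocktree it belongs to without consulting any other state.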
			
 
				+
			
 
				+The remainder of this paper is organized as follows:
			
 
				+\begin{itemize}
			
 
				+\item A description of the operations of a single blocktree.
			
 
				+\item The definition of a blockchain which provides global state and links individual blocktrees
			
 
				+together.
			
 
				+\item The programming interface for interacting with blocktrees and sending messages.
			
 
				+\item An exploration of applications that could be written using this platform.
			
 
				+\item Conclusion.
			
 
				+\end{itemize}
			
 
				 
			
 
				 \section{An Individual Blocktree}
			
 
				-The atomic unit of data storage and privacy and authenticity guarantees is called a block. A
			
 
				+The atomic unit of data storage, confidentiality and authenticity is called a block. A
			
 
				 block contains a payload of data. Confidentiality of this data is achieved by encrypting it using 
			
 
				 a symmetric cipher using a random key. This random key is known as the block key.
			
 
				-The block key is encapsualated using
			
 
				-a public key cipher and the resulting cipher text is stored in the header of the block. Thus
			
 
				-only the person possessing the corresponding private key will be able to access the contents of
			
 
				+The block key is encapsulated using the public key of the principal who is being given access.
			
 
				+The resulting cipher text is stored in the header of the block. Thus
			
 
				+the person possessing the corresponding private key will be able to access the contents of
			
 
				 the block. Blocks are arranged into trees, and the parent of the block also has a block key.
			
 
				 The child's block key is always encapsulated using the parent's key and stored in the block
			
 
				 header. This ensures that if a principal is given access to a block, they automatically have
			
 
				 access to every child of that block. The encapsulated block key is known as a read capability,
			
 
				-or readcap, as it grants the holder the ability to the block.
			
 
				+or readcap, as it grants the holder the ability to read the block.
			
 
				 
			
 
				 Authenticity guarantees are provided using a digital signature scheme. In order to change the
			
 
				 contents of a block a data structure called a write capability, or writecap, is needed. A
			
@@ -57,7 +94,7 @@ chain.
 
				 \item The principal corresponding to the public key used to sign the last writecap in the chain
			
 
				 is the owner of the blocktree.
			
 
				 \end{itemize}
			
 
				-The intiution behind these rules is that a writecap is only valid if there is a chain of trust
			
 
				+The intuition behind these rules is that a writecap is only valid if there is a chain of trust
			
 
				 that leads back to the owner of the blocktree. The owner may delegate their trust to any number
			
 
				 of intermediaries by issuing them writecaps. These writecaps are scoped based on the path
			
 
				 specified when they are issued. These intermediaries can then delegate this trust as well.
			
@@ -75,9 +112,54 @@ the system to scale. When more than one node is attached to the same block they
 
				 Each node in the cluster contains a copy of the data that the cluster is responsible for. They
			
 
				 maintain consistency of this data by running the Raft consensus protocol.
			
 
				 
			
 
				-TODO: Nodes have keys. These keys are signed by the root key, creating a tree of trust.
			
 
				+Every blocktree requires at least one node attached to the root block to function. The nodes
			
 
				+in the root block contain the user's private key. For security, it is highly recommended that
			
 
				+this key be stored in a Trusted Platform Module (TPM), and that the TPM be configured to disallow
			
 
				+unauthenticated key use. Since multiple nodes are expected to run on a single computer,
			
 
				+thus sharing a single TPM, this last point is particularly important. Even though these nodes
			
 
				+contain the root key, they do not use it for most operations, and instead use the scheme described
			
 
				+in the next paragraph to obtain their own credentials.
			
 
				 
			
 
				-TODO: Symbolic links and readonly models (views?).
			
 
				+When a new node is created, it generates a new public-private key pair. The public key of this
			
 
				+node then needs to be transmitted to another node that's already part of the user's blocktree. The
			
 
				+mechanism used will depend on the nature of the device on which the node is running, and is
			
 
				+outside the scope of this description. For example, a phone could scan a QR code which contains
			
 
				+the IP address of the user's root node, and then transmit its public key to that internet host.
			
 
				+In order for the new node to be added to the user's blocktree, it needs to be issued a writecap
			
 
				+and the block where it will attach needs to have a readcap added.
			
 
				+This could be accomplished
			
 
				+by providing a user interface on the node which received the public key from the new node.
			
 
				+This interface would show the user the requests that have been received from new nodes attempting
			
 
				+to join their blocktree. The user can then choose to approve or deny the request, and can specify
			
 
				+the path where the node will attach. If the user chooses to approve the request, they are
			
 
				+prompted for the root password. This is used to send an authenticated signing request to the TPM on
			
 
				+the node containing the user's root key. If the password is correct, the TPM will sign the requested
			
 
				+data, producing a valid writecap, which the node can then send back to the new node.
			
 
				+
			
 
				+The ability to cope with key compromise is an important design consideration in any real-world
			
 
				+cryptosystem. In blocktree the compromise of a node key is handled by re-keying every block under
			
 
				+the block where the node was attached. Specifically, this means that a new block key is generated for
			
 
				+each block, and the readcap for the compromised node is removed. This ensures that new writes to
			
 
				+these blocks will not be visible to the holder of the compromised key. To ensure that writes
			
 
				+will not be allowed, the root directory contains a revocation list containing the public keys
			
 
				+which have been revoked by the tree. These only need to be maintained until their writecaps
			
 
				+expire, after which time the list can be cleaned. Note that if the root private key is compromised or lost,
			
 
				+then the blocktree must be abandoned; there is no recovery. This is real security: the artifact
			
 
				+which grants control over the blocktree is the root private key. That is why storing the root
			
 
				+private key in multiple secure cryptographic co-processors is so important.
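The bookkeeping for the revocation list can be illustrated with a small Python sketch. It assumes that each revoked key is recorded together with the expiry time of its latest writecap, expressed in seconds since the epoch, so that entries can be dropped once no valid writecap can exist for them. The class name `RevocationList`, its methods, and the use of a string identifier for each key are hypothetical and chosen only for this example.

```python
import time


class RevocationList:
    """Tracks revoked keys until their writecaps can no longer be valid."""

    def __init__(self):
        # key identifier -> latest writecap expiry (seconds since epoch)
        self._expiry = {}

    def revoke(self, key_id: str, writecap_expires_at: float) -> None:
        """Record a revoked key, keeping the latest expiry seen for it."""
        current = self._expiry.get(key_id, 0.0)
        self._expiry[key_id] = max(current, writecap_expires_at)

    def is_revoked(self, key_id: str) -> bool:
        """Writes signed by a listed key must be rejected."""
        return key_id in self._expiry

    def clean(self, now=None) -> int:
        """Drop entries whose writecaps have expired; return how many were dropped."""
        now = time.time() if now is None else now
        expired = [k for k, t in self._expiry.items() if t <= now]
        for k in expired:
            del self._expiry[k]
        return len(expired)


if __name__ == "__main__":
    revoked = RevocationList()
    revoked.revoke("node-a", writecap_expires_at=100.0)
    revoked.revoke("node-b", writecap_expires_at=200.0)
    print(revoked.clean(now=150.0), revoked.is_revoked("node-b"))
```

Once an entry is cleaned, rejection of the old key no longer depends on the list, because its writecap fails validation on expiry alone.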
			
 
				+
			
 
				+A concept that has proven to be very useful in the world of filesystems is the symbolic link.
			
 
				+This is a short file that contains the path to another file, and is interpreted by most programs
			
 
				+as being a ``link'' to that file. Blocktree supports a similar system, where a block can be
			
 
				+marked as a symbolic link when its body contains a blocktree path. This also provides us with
			
 
				+a convenient way of storing readcaps for data that a node would otherwise not have access to.
			
 
				+For instance, a symbolic link could be created which points to a block in another user's blocktree.
			
 
				+The other user only knows the public key of the owner of our blocktree, so they issue
			
 
				+a readcap to it. But the root nodes, when given the user's password, can open this readcap and extract
			
 
				+the block key. This key can then be encapsulated using the public key of the node which
			
 
				+requires access, and placed in the symbolic link. When the node needs to read the data
			
 
				+in the block, it opens the readcap in the symbolic link, follows the link to the block (how
			
 
				+that actually happens will be discussed below) and decrypts its contents.
			
 
				 
			
 
				 While the consistency of an individual block can be maintained using Raft, in order to enable
			
 
				 transactions which span multiple blocks a distributed locking mechanism is employed. This is
			
@@ -131,7 +213,7 @@ remain, the original data can be recovered.
 
				 
			
 
				 Once these fragments have been computed an event is created for each one and published to the
			
 
				 blockchain. This event indicates to other nodes that this node wishes to store a fragment and
			
 
				-states the amount of blockcoin the node will pay to the first node that accepts the offer. When
			
 
				+states the amount of blockcoin it will pay and the frequency with which it will make these payments. When
			
 
				 another node wishes to accept the offer, it directly contacts the first node, which then sends
			
 
				 it the fragment and publishes an event stating that the fragment is stored with the second
			
 
				 node. This event includes the path of the block the fragment was computed from, the fragment's 
			
@@ -139,14 +221,35 @@ ID (the sequence number from the erasure code), and the principal of the node wh
 
				 Thus any other node in the network can use the information contained in these events to
			
 
				 determine the set of nodes which contain the fragments of any given path.
			
 
				 
			
 
				-In order for nodes to be able to conntact other nodes, a mechanism is required for associating
			
 
				+In order for the node which stored a fragment to receive its next payment, it has to pass
			
 
				+a time-bound challenge-response protocol initiated by the node that owns the fragment.
			
 
				+The owning node selects a leaf in the Merkle tree of the fragment and sends the index of
			
 
				+this leaf to the storing node. The storing node then walks the path from this leaf back to
			
 
				+the root of the Merkle tree, and updates a hash value using the data in each node it traverses.
			
 
				+It sends this result back to the owning node who then verifies that this value matches its
			
 
				+own computation. If it does then the owning node signs a message indicating that the challenge
			
 
				+passed and that the storing node should be paid. The storing node receives this message and uses
			
 
				+it to construct an event, which it signs and publishes to the blockchain. This event causes
			
 
				+the blockcoin amount specified to be withdrawn from the owning node's account and deposited
			
 
				+into the storing node's account.
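The challenge-response exchange described above can be sketched as follows. This is a minimal illustration under stated assumptions: the fragment is split into chunks that form the leaves, SHA-256 is the hash function, an odd node at any level is paired with itself, and the function names `build_merkle_tree` and `answer_challenge` are hypothetical. The time bound, the signed approval message, and the payment event are omitted.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle_tree(chunks):
    """Return the tree as a list of levels, leaves first and root last."""
    if not chunks:
        raise ValueError("a fragment must contain at least one chunk")
    level = [_h(chunk) for chunk in chunks]
    levels = [level]
    while len(level) > 1:
        # Assumption: an unpaired last node is paired with itself.
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [_h(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def answer_challenge(levels, leaf_index: int) -> bytes:
    """Walk from the chosen leaf to the root, folding each node into one hash."""
    running = hashlib.sha256()
    index = leaf_index
    for level in levels:
        running.update(level[index])
        index //= 2  # move to the parent in the next level
    return running.digest()


if __name__ == "__main__":
    fragment = [b"chunk-0", b"chunk-1", b"chunk-2"]
    owner_tree = build_merkle_tree(fragment)       # owning node's own computation
    storer_tree = build_merkle_tree(fragment)      # rebuilt from the stored data
    challenge = 2                                  # leaf index chosen by the owner
    passed = answer_challenge(storer_tree, challenge) == answer_challenge(owner_tree, challenge)
    print("challenge passed:", passed)
```

Because the walk ends at the root, the response changes if any chunk of the fragment has been altered, not only the challenged leaf.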
			
 
				+
			
 
				+The fact that payments occur over time provides a simple incentive for nodes to be honest and
			
 
				+store the data they agree to. In banking terms, the storing node views the fragment as an
			
 
				+asset: it is a loan of its disk space which provides a series of payments over time.
			
 
				+On the other hand, the owning node views the fragment as a liability: it requires payments to
			
 
				+be made over time. In order for a blocktree owner to remain solvent, it must balance its
			
 
				+liabilities with its assets, incentivizing it to store data for others so that its own data
			
 
				+will be stored.
			
 
				+
			
 
				+In order for nodes to be able to contact other nodes, a mechanism is required for associating
			
 
				 an internet protocol (IP) address with a principal. This is done by having nodes publish events
			
 
				 to the blockchain when their IP address changes. This event includes their new IP address,
			
 
				 their public key, and a digital signature computed using their private key. Other nodes can
			
 
				 then verify this signature to ensure that an attacker cannot bind the wrong
			
 
				 IP address to a principal in order to receive messages it was not meant to have.
			
 
				 
			
 
				-While this event ledger is useful for appending new events, and ensuring that previous events
			
 
				+While this event ledger is useful for appending new 
			
 
				+events, and ensuring that previous events
			
 
				 cannot be changed, another data structure is required to ensure that queries on this data can
			
 
				 be performed efficiently. In particular, it's important to be able to quickly perform the
			
 
				 following queries:
			
@@ -248,12 +351,145 @@ is discussed. It's important to note that blocktree does not try to force all co
 
				 to be local to a user's device, it merely tries to enable this for applications where it
			
 
				 is possible.
			
 
				 
			
 
				-\subsection{A contacts application.}
			
 
				+\subsection{Contacts and Mail}
			
 
				+The first application we'll consider is one which manages a user's contacts. This would expose
			
 
				+the usual CRUD operations, allowing a user to input the name of a person they know and associate
			
 
				+that name with their public key. Once the principal of a person is known, then their public
			
 
				+key can be looked up in the global blocktree. This principal needs to be communicated to the
			
 
				+user via some out-of-band method. They could receive it in an email, a text message, or embedded
			
 
				+in a QR code. Of course, this out-of-band communication needs to be authenticated, otherwise
			
 
				+it would be easy to fool the user into associating an attacker's key with the person.
			
 
				+
			
 
				+The user now has a way of associating a blocktree with the name of this person. However, the
			
 
				+root public key of this blocktree is not enough to establish secure communications, because
			
 
				+the root private key is not available to every node in the person's blocktree. In particular,
			
 
				+it would be inadvisable for the root private key to be stored on a user's mobile device,
			
 
				+so a message encrypted using the root public key would not be readable on this device. To
			
 
				+address this, mailbox blocks are created.
			
 
				 
			
 
				-\subsection{A distributed social network.}
			
 
				+For each contact two blocks are created: the inbox and the outbox. The user creates a readcap
			
 
				+for the person and adds it to the outbox. The inbox is a symbolic link to the user's outbox in
			
 
				+the blocktree of the person. Thus each person can read messages sent to them using their readcap,
			
 
				+and they can write messages into their own blocktree where the other party knows how to find them.
			
 
				+Now to solve the problem outlined above, the person needs to give permission to a node in their
			
 
				+blocktree in order for it to read messages from the user. It does this by creating a new readcap
			
 
				+for the node, containing the block key in the readcap it was issued. It then stores that
			
 
				+readcap in the symbolic link in its blocktree (the inbox for the user). When the person uses
			
 
				+their mobile to read messages from the user, it looks at the union of the readcaps in the
			
 
				+symbolic link and the inbox. Once it finds the one for its principal, it decrypts the block key
			
 
				+and uses it to decipher the contents of the inbox.
			
 
				+
			
 
				+In addition to being able to check its inbox for messages, the person also receives a blocktree
			
 
				+message from the user when a new message is sent. This means that the person doesn't need to
			
 
				+constantly poll the inbox to see if it has new messages; they can be assured they will be
			
 
				+notified.
			
 
				+
			
 
				+\subsection{Social Network}
			
 
				+Building a social network on top of the contacts app is fairly straightforward. Once
			
 
				+a contacts entry has been created for a person, most interactions between the user and that
			
 
				+person can be implemented via messages passed between their mailboxes. For example, when the
			
 
				+user sends a direct message to the person, a message is placed in the outbox for that person
			
 
				+and a blocktree message is sent to the root cluster of their blocktree. If this message was
			
 
				+meant for a group of people it could be placed in the outboxes of each one. These messages need
			
 
				+not be restricted to text; images, video, or any other kind of data could be included in them.
			
 
				+If a large amount of data is to be shared this way (for instance a video), then it makes sense
			
 
				+to only include the blocktree path to it in the notification messages sent to the recipients.
			
 
				+
			
 
				+This same mechanism can be used to implement status updates. When a user updates their status,
			
 
				+they send mail to each of their ``friends''. The social networking app running on each of the contacts'
			
 
				+devices will then display the latest status update from the user as their current status. This
			
 
				+setup allows the user to ``unfriend'' anyone by simply omitting them from the list of recipients
			
 
				+of these updates. To allow comments and likes on the status update to be visible to everyone
			
 
				+that it was shared with, the block created in the outbox of each of the friends
			
 
				+is a symbolic link to a single status update block. This symbolic link contains the
			
 
				+readcap for the status update block.
			
 
				+
			
 
				+Comments and likes on status updates are implemented by sending mail to the user who posted the
			
 
				+update. When one of the user's nodes receives this mail, it then updates the block containing
			
 
				+the status update with the new comment, or increments the like counter (in my opinion, dislikes should not be supported).
			
 
				+It then sends a blocktree message to the people this status update was shared with notifying them
			
 
				+that the data has changed.
			
 
				 
			
 
				 \subsection{An Ecommerce Website}
			
 
				+The previous two examples show how to build decentralized versions of existing web applications,
			
 
				+but blocktree also excels at building traditional web applications. Imagine an ecommerce website,
			
 
				+with multiple warehouses, all of whose inventory is to be visible to customers of the site. Part
			
 
				+of the design consideration for the site is that the warehouses need to be able to update their
			
 
				+inventory even when their internet service goes down, and that these updates need to be visible
			
 
				+on the website once connectivity is restored.
			
 
				+
			
 
				+To accomplish this a designer could create a directory for each warehouse. This directory would have
			
 
				+nodes attached to it that are physically located at each warehouse. The inventory of the warehouse
			
 
				+is then maintained in the warehouse's directory, satisfying the first requirement. Now, in order to
			
 
				+enable efficient queries of the overall available inventory, the data from each of the warehouses
			
 
				+needs to be merged. This is accomplished by creating another directory containing the merged data.
			
 
				+In event sourcing terms this is called a read-model. This directory will have another cluster of nodes attached
			
 
				+which will act as web servers. These nodes will subscribe to events published by the warehouse
			
 
				+clusters, events indicating changing levels of inventory, and digest this information into a format
			
 
				+that can be efficiently queried. When a warehouse goes offline, its previous levels of inventory are
			
 
				+still recorded in the read-model, so queries can still be answered using the cluster's best knowledge
			
 
				+of the warehouse. Once the warehouse comes back online, then the events that were recorded by the
			
 
				+warehouse cluster while it was offline can be replayed to the web server cluster, and their read-model
			
 
				+can be brought up to date.
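The read-model described above can be sketched in Python. This is an illustration only: the event fields, the per-warehouse sequence numbers starting at 1, and the names `InventoryEvent` and `InventoryReadModel` are assumptions made for the example and are not part of the blocktree API. Tracking the last sequence number applied for each warehouse makes a replay after an outage safe even when it includes events the web server cluster has already seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEvent:
    warehouse: str   # which warehouse cluster published the event
    seq: int         # assumed per-warehouse sequence number, starting at 1
    sku: str         # item identifier
    quantity: int    # new stock level for this item at this warehouse


class InventoryReadModel:
    """Merged view of all warehouses, optimized for queries by item."""

    def __init__(self):
        self._stock = {}     # (warehouse, sku) -> quantity
        self._last_seq = {}  # warehouse -> last applied sequence number

    def apply(self, event: InventoryEvent) -> bool:
        """Apply an event once; return False if it was already applied."""
        if event.seq <= self._last_seq.get(event.warehouse, 0):
            return False
        self._stock[(event.warehouse, event.sku)] = event.quantity
        self._last_seq[event.warehouse] = event.seq
        return True

    def available(self, sku: str) -> int:
        """Total stock of an item across all warehouses, as last known."""
        return sum(q for (_, s), q in self._stock.items() if s == sku)


if __name__ == "__main__":
    model = InventoryReadModel()
    model.apply(InventoryEvent("east", 1, "widget", 5))
    model.apply(InventoryEvent("west", 1, "widget", 3))
    # "east" goes offline, keeps recording events locally, then replays its log.
    replay = [
        InventoryEvent("east", 1, "widget", 5),
        InventoryEvent("east", 2, "widget", 4),
    ]
    for event in replay:
        model.apply(event)
    print(model.available("widget"))
```

While a warehouse is offline, `available` keeps answering from the last levels the read-model recorded for it, which matches the best-knowledge behavior described in the text.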
			
 
				+
			
 
				+One of the advantages of this approach is that the cluster of webservers need not be physically
			
 
				+close to each other. In fact they could be spread over several datacenters, or even in different
			
 
				+cloud providers. This allows for greater fault tolerance and reliability. Of course, running a
			
 
				+consensus cluster over a larger area means more latency and thus reduced performance, but if
			
 
				+the workload is read-heavy, and writes can be handled in the warehouse clusters, then this
			
 
				+tradeoff may be worthwhile.
			
 
				+
			
 
				+I hope this example shows that having a standard format for data and the federation of servers
			
 
				+can provide designers with much greater flexibility, even if they do not care about decentralization
			
 
				+or their users' privacy.
			
 
				 
			
 
				 \subsection{The Open Metaverse}
			
 
				+As a final example I'd like to consider a platform for recording spatial information. The key insight
			
 
				+that enables this is very general: blocktree enables the creation of distributed tree-like data structures.
			
 
				+For instance, it's straightforward to imagine creating a distributed hashtable implemented as a red-black tree.
			
 
				+This impedance match between efficient query structures and the design of blocktree is one
			
 
				+of the reasons why I believe it is so useful. The particular data structure germane to building the metaverse
			
 
				+is the Binary Space Partition (BSP) tree.
			
 
				+
			
 
				+I don't see the metaverse as being one world, but rather many. Each world would be hosted by its own blocktree.
			
 
				+The world will have a directory in the blocktree; that directory will be the root of a BSP tree. Thinking of worlds
			
 
				+like ours, those that can be reasonably approximated as the surface of a sphere, we can impose the usual latitude
			
 
				+and longitude coordinate system. We can then define parcels in the world by specifying the polygonal boundary as
			
 
				+a set of these coordinates (more precisely, the parcel is the convex hull of this set of points). These parcels
			
 
				+are then recorded as blocks in the world's directory, whose path is determined by its virtual location. If a parcel
			
 
				+is owned by the same user who owns the world, then the data contained in the parcel is stored in
			
 
				+the same blocktree. However, if a parcel is to be owned by another user, then a symbolic link is created
			
 
				+pointing to a block in the owner's blocktree. They can then write whatever data they want into this block, defining
			
 
				+the contents of their parcel. Collaboration on a single parcel is accomplished by issuing a writecap to
			
 
				+another user. 
			
 
				+
			
 
				+It's easy to imagine that one world would be more important than the rest and that the creation of a metaverse
			
 
				+representation of Earth will be an important undertaking. The hierarchical nature of permissions in blocktree
			
 
				+makes such a shared world possible. National blocktrees could be given ownership of their virtual territory;
			
 
				+this would then
			
 
				+be delegated down to the state and municipal levels. Finally municipalities would delegate individual parcels
			
 
				+to their actual owners. Owners could then use these as they see fit, including leasing them to third parties.
			
 
				+The ending date of such a lease would be enforced technically by the writecap issued to the lessee; when it
			
 
				+expires so too does the lease.
			
 
				+
			
 
				+\section{Conclusion}
			
 
				+In this paper I have given the outline of a decentralized method of organizing information, computation,
			
 
				+and trust in a way that I hope will be useful and easy to use. The use of cryptographic primitives for
			
 
				+implementing access control was discussed, as well as methods of protecting private keys. A blockchain
			
 
				+and corresponding cryptocurrency were proposed as a means of incentivizing the distribution of data.
			
 
				+Erasure coding was used to ensure that distributed data could be resilient to the loss of nodes and
			
 
				+reconstructed efficiently. A programming environment based on WASM and WASI was proposed as a way
			
 
				+of providing an interface to this data. APIs for defining protocol contracts and efficient web servers
			
 
				+were indicated. APIs for constructing supervision trees were mentioned as a means for building reliable
			
 
				+systems.
			
 
				+
			
 
				+When Sir Tim Berners-Lee invented HTTP he could not have anticipated the applications
			
 
				+that his invention would bring about. It has been said that the key to the success of his system is
			
 
				+that it made network programming so easy that anyone could do it. I don't know what the future will
			
 
				+bring, but I hope that this system, or one like it, will enable anyone to fearlessly build distributed
			
 
				+systems.
			
 
				+One thing is certain, however: it is a moral imperative that we provide users with viable alternatives
			
 
				+to online services which harvest their data and weaponize it against them. Only then will the web
			
 
				+become the place it was meant to be.
			
 
				 
			
 
				 \end{document}