+\chapter{System Overview}
+The development of the internet was undoubtedly one of the greatest achievements of the 20th
+century, and the internet's killer app, the web, has reshaped our lives and the way we do
+business. But for all the benefits we have received from these technologies, there have been
+corresponding costs. It's now possible to cheaply surveil entire populations and discern their
+preferences and responses so that propaganda can be effectively created and distributed. The
+surveillance is not surreptitious; it's quite overt. Consumers hand over this data willingly to
+tech companies because of the benefits they receive in return. But why should people be forced to
+choose between privacy and convenience?
+Computing power, storage, and bandwidth are all very cheap. A single-board computer
+can provide more than enough computing power for the average web-browsing consumer. A classic rotating magnetic hard drive can hold terabytes of data, more than enough to hold an individual's
+social media output. Unmetered broadband internet access measured in hundreds of megabits per
+second is common, and becoming more so all the time. So with all these resources available,
+why is it that consumers do not have more control over the computing infrastructure upon which
+they rely?
+The issue is that consumers don't want to manage servers, networking, routing, and
+backup strategies. Each of these can be a full-time job by itself. So in order for a consumer to
+be in control of their own infrastructure, the infrastructure needs to be able to take care of
+itself.
+Perhaps as important as consumer agency over their computing infrastructure is the security of
+that infrastructure. Time and time again consumer trust has been violated by data breaches and
+service disruptions. These incidents show the need for a more secure and standardized system for
+storing application data and building web services. Such a system must provide both
+confidentiality of consumer data, and the ability to authenticate consumers without the
+need for insecure techniques, such as passwords.
+This document proposes a potential solution. It describes a system for organizing information into
+trees of blocks, the distribution of those blocks over a network of nodes, and a programming interface
+to access this information. Because no one piece of hardware
+is infallible, the system also includes mechanisms for nodes to contract with one another to store
+data. This allows data to be backed up and later restored in the case a node is lost. In order to
+ensure the free exchange of data amongst nodes, a digital currency is used to account for the
+available storage capacity of the network.
+The rest of this chapter gives an overview of the system; the remainder of the
+document goes into the specific details of each of the system's components.
+User data stored in this system is organized into structures called \emph{block trees}. Every block tree
+is identified with a public key. The private key that corresponds to a block tree's public key is
+required to control that tree. Any person who has the private key for a block tree is called that
+tree's owner.
+Computers participating in this system are called \emph{nodes}. Nodes are also identified by public keys, but
+these keys are not directly tied to block trees. Nodes that have access to a block tree's data are said
+to be \emph{attached} to that block tree. Nodes can be attached to multiple block trees at once, or none
+at all.
+Block trees are of course trees of \emph{blocks}. Every block is identified by a string called a \emph{path},
+which describes its location in the tree. The root of this path is a hash (hex encoded)
+of the tree's public key, allowing blocks from any tree to be referred to. A block consists of three segments:
+a header, a payload and a signature.
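As a rough illustration of how a path could be rooted in a tree's public key, the sketch below builds a path string from a key and a list of components. The use of SHA-256, the \texttt{/} separator, and the function names \texttt{tree\_root} and \texttt{block\_path} are assumptions made for this example; the text only states that the root is a hex-encoded hash of the public key.

```python
import hashlib


def tree_root(public_key: bytes) -> str:
    """Hex-encoded hash of the tree's public key (SHA-256 is an assumption)."""
    return hashlib.sha256(public_key).hexdigest()


def block_path(public_key: bytes, *components: str) -> str:
    """Join the tree root and the block's location into one path string.

    The "/" separator is a hypothetical choice for this sketch.
    """
    return "/".join((tree_root(public_key),) + components)


# Placeholder bytes standing in for a real encoded public key.
example_key = b"example public key bytes"
example_path = block_path(example_key, "photos", "2022")
```

Because the root component is derived from the key alone, two nodes that know the same public key will compute the same root without any coordination.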
+The payload is encrypted by a symmetric cipher using a randomly generated key. This randomly
+generated key is called the \emph{block's key}. To allow access to the payload, the block's key is encapsulated
+using other keys and the resulting ciphertexts are stored in the block's header. These encapsulated keys
+are referred to as read capabilities, or \emph{read caps} for short.
+The root block of every block tree contains a read cap for the block tree's public key.
+Every non-root block contains a read cap
+for the block's parent, which is to say the block's key is encapsulated using its parent's block key.
+So when one has a read cap for a block, they can read the data in all blocks descended from that
+block. Because the owner of a block tree has a read cap for the root block, they can read all data
+stored in the tree. Other people (or nodes) can be given access to a subtree by granting them a read
+cap for the subtree's root. A block which contains public data is stored as cleartext with no read caps.
+While read caps provide for confidentiality, write caps provide for integrity. A \emph{write cap}
+for a block is a certificate chain which terminates at a certificate signed by the block tree's
+owner. Thus a self-signed certificate made using the tree's private key is a valid write cap
+for any block in the tree. By allowing a chain of certificates to be used, it's possible for
+the owner to give other people or nodes the ability to write data into their tree. The scope of
+this access is controlled by specifying in the certificate the path under which writing is allowed.
+A write cap for a block is only valid if the path of the block is contained in the path
+specified in every certificate in the chain.
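The path-scoping rule can be sketched as a component-wise prefix check. This is only an illustration: the \texttt{/} separator and the names \texttt{path\_contains} and \texttt{chain\_permits} are hypothetical, and signature checks are deliberately left out.

```python
from typing import Sequence


def path_contains(scope: str, path: str) -> bool:
    """True if `path` equals `scope` or lies beneath it.

    Comparison is done per component so that "abc/docs" does not
    accidentally contain "abc/docs2".
    """
    scope_parts = [part for part in scope.split("/") if part]
    path_parts = [part for part in path.split("/") if part]
    return path_parts[: len(scope_parts)] == scope_parts


def chain_permits(scopes: Sequence[str], block_path: str) -> bool:
    """True if the block's path is contained in every certificate's scope."""
    return all(path_contains(scope, block_path) for scope in scopes)
```

Comparing whole components rather than raw string prefixes is a design choice of this sketch; it avoids granting access to sibling paths that merely share leading characters.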
+Both the header and the payload of a block are protected by a digital signature. The writer
+of the block computes this signature using the private key which corresponds to the write cap
+for the block they're trying to write. In order to validate a block, this signature is validated, then
+the write cap is validated, and finally the hash of the public key of
+the last signer in the write cap chain is compared to the root of the block's path. If these match,
+then the block is valid, as this means that the owner has given permission for the writer to write
+into their tree at this path.
+Accessing the data in a block requires several cryptographic operations, both for validation and
+for decryption. Because of this, it's important that blocks are relatively large, on the order of
+4 MB, to amortize the cost of these operations.
+By itself this block structure would be useful for building a secure filesystem, but in order to
+be a durable storage system we need an efficient way of distributing data for redundancy and
+availability. This is the purpose of fragments.
+Blocks are distributed amongst nodes in the network using a fountain code. The output symbols
+of this code are referred to as \emph{fragments}. A code with a high performance implementation and good
+coding efficiency is an important design consideration for the system. For these reasons the
+RaptorQ code was chosen.
+In order to preserve the data in a newly created block, a node will need to distribute
+fragments to other nodes. It does this by advertising its desire to trade [currency]
+in its block tree for the storage of these fragments. \emph{[currency]} is a fungible
+token for the exchange of computing resources between nodes. Every block tree has
+some non-negative value for the amount of [currency] it controls. Nodes that are attached
+to a tree spend the tree's [currency] when paying other nodes for the storage of fragments.
+If another node is interested in making the exchange, it contacts the advertising node
+and both sign a contract. A \emph{contract} is a data structure signed by both nodes which
+states the hash of the fragment being stored and the amount of [currency] being exchanged
+for its storage. The contract is then stored in the public block tree (to be discussed below),
+so that [currency] can be transferred between nodes and to create an accountability mechanism
+that prevents the storing node from acting in bad faith and deleting the fragment.
+When a node needs to retrieve a block that was previously distributed in fragments, it connects to a
+subset of nodes containing the fragments and downloads enough to reconstruct 
+the block. These downloads can be performed concurrently for greater speed. This same mechanism
+can be used to distribute public blocks to unaffiliated nodes. This mechanism facilitates load balancing
+and performance, as concurrent downloads
+spread the load over multiple nodes and are not limited by the bandwidth between any pair of nodes.
+The list of nodes containing the fragments of a block is called the block's \emph{node list}.
+A block's node list is stored in its parent. This allows any non-root block to be retrieved.
+To allow the root block to be retrieved, its node list is stored in the public block tree.
+\section{The Public Blocktree}
+\emph{The Public Block Tree} is a block tree which is known to all nodes. This is accomplished by
+providing all nodes with a hardcoded list of nodes that are attached to the public block tree.
+This is similar to the list of root DNS servers distributed with any networked operating system.
+Because the public block tree is only used for storing information that should be known to all
+nodes in the network, the payload of every block in it is cleartext. The public block tree serves
+only to facilitate the communication and exchange of data between nodes.
+One way that it does this is by containing a database of nodes and their IP addresses. A node
+which has a write cap to this database will only store an entry for a node if that node can provide
+a valid signed request. This signed request is stored in the database verbatim, so that other nodes can
+independently verify its validity. Thus the nodes in the network can use this database to securely resolve
+the IDs of other nodes to their IP addresses.
+The other function of the public block tree is to contain a list of transactions and disputes.
+This list is referred to as the \emph{public log}.
+When a node is created, an event is logged detailing the amount of [currency] the node is worth.
+When a node is first attached to a block tree, this [currency] is then removed from the node
+and added to the block tree. When a node signs a contract with another node, the contract is stored in the
+log and [currency] is removed from the sending node's block tree and added to the receiving node's.
+In order to discourage nodes from receiving payment for the storage of a fragment, then deleting
+the fragment to reclaim disk space, a reporting mechanism exists. If a node is unable to retrieve
+a fragment that it previously stored with another node, then it sends an event to the log
+indicating this. The other node can then respond by sending an event which contains the actual
+fragment which was requested. This allows all the nodes in the network to view the log and
+see if a node that they are considering signing a contract with is trustworthy. If they
+are not the defendant in any disputes, then they should be safe. If they are in one, but responded
+quickly with the fragment, then it could have been a transient network issue. If they never
+responded, then they are risky and should perhaps receive a lower payment for the storage
+of the fragment.
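One way a node might turn the dispute history into a decision is sketched below. The \texttt{Dispute} record, the one-hour threshold for a ``quick'' response, and the extra \texttt{"slow to respond"} outcome are all assumptions of this example; the text only distinguishes the no-dispute, quick-response, and no-response cases.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Dispute:
    """Hypothetical summary of one dispute read from the public log."""
    defendant: str                       # node ID accused of losing a fragment
    opened_at: float                     # time the complaint was logged
    answered_at: Optional[float] = None  # None if the fragment was never produced


def assess(node_id: str, disputes: Iterable[Dispute],
           quick_seconds: float = 3600.0) -> str:
    """Classify a prospective storage partner from its dispute history."""
    mine = [d for d in disputes if d.defendant == node_id]
    if not mine:
        return "safe"
    if any(d.answered_at is None for d in mine):
        return "risky"
    if all(d.answered_at - d.opened_at <= quick_seconds for d in mine):
        return "probably transient"
    # Not covered by the text: answered, but slowly (assumed extra category).
    return "slow to respond"
```

A node could then scale the payment it offers by this classification, for instance offering less to partners classified as risky.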
+Finally, the public block tree stores node lists for the root blocks of every block tree.
+This ensures that even if every node that participates in a block tree fails, the block
+tree can still be recovered from its fragments, provided its private key is known.
+\section{Nodes and the Network}
+Each node in the network has a public-private keypair. The string formed by hex encoding the
+hash of a node's public key is referred to as the \emph{node ID} of the node. When nodes
+are manufactured they are issued a certificate trusted by the
+public block tree. New nodes are claimed by issuing them a certificate and then writing
+that certificate into the public log. When a new node is claimed, currency is credited to
+the block tree which claimed it. This currency is to account for the storage capacity
+that the new node brings to that block tree. This mechanism is the reason why the node must have a
+certificate trusted by the public block tree, otherwise there would be no way to control the
+creation of currency.
+Nodes are identified by their node ID in the public block tree and in node lists. Nodes
+are responsible for updating their IP address in the public block tree whenever it changes.
+When a node is attached to a block tree it is issued a certificate containing a path
+under which its data will be stored. We say that the node is attached to the block tree at that path.
+The node which issues the node its certificate creates a read cap for it and stores it in the
+block where the node is attached.
+The data created by a node may optionally be replicated in its parent node. This would be suitable
+for a lightweight or mobile device which needs to ensure its data is replicated immediately and
+doesn't have time to negotiate contracts for the storage of fragments. For larger 
+block trees, having non-replicating nodes is essential for scalability.
+More than one node can be attached at the same path, and when this happens a \emph{cluster} is formed.
+Each node in the cluster stores
+copies of the same data and they coordinate with each other to ensure the consistency of this
+data. This is accomplished by electing a leader. All writes to blocks under the attachment point are
+sent to the leader. The leader then serializes these writes and sends them to the rest of the
+nodes. By default writes to blocks use optimistic concurrency, with the last write known to the
+leader being the winner. But if a node requires exclusive access to a block it can
+make a request to the leader to lock it. Writes from nodes other than the locking node are rejected until
+the lock is released. The leader will release the lock on its own if no messages are received from
+the locking node after a timeout.
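The locking behaviour of the leader can be sketched as a small lock table. This is a single-process illustration under stated assumptions: the class name \texttt{BlockLocks}, its method names, and the choice to pass the current time in explicitly are hypothetical, and leader election and networking are omitted.

```python
from typing import Dict, Optional, Tuple


class BlockLocks:
    """Leader-side lock table; a lock expires if its holder stays silent."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        # Maps block path -> (holder node ID, time of holder's last message).
        self._locks: Dict[str, Tuple[str, float]] = {}

    def _holder(self, path: str, now: float) -> Optional[str]:
        entry = self._locks.get(path)
        if entry is None:
            return None
        holder, last_seen = entry
        if now - last_seen > self.timeout:
            del self._locks[path]  # holder timed out, release on its behalf
            return None
        return holder

    def lock(self, path: str, node: str, now: float) -> bool:
        """Grant the lock unless another node currently holds it."""
        if self._holder(path, now) not in (None, node):
            return False
        self._locks[path] = (node, now)
        return True

    def unlock(self, path: str, node: str, now: float) -> bool:
        """Release the lock if `node` is the current holder."""
        if self._holder(path, now) != node:
            return False
        del self._locks[path]
        return True

    def may_write(self, path: str, node: str, now: float) -> bool:
        """Accept a write unless a different node holds the lock."""
        holder = self._holder(path, now)
        if holder is None:
            return True
        if holder == node:
            self._locks[path] = (node, now)  # any message refreshes the timeout
            return True
        return False
```

Passing \texttt{now} as an argument keeps the sketch deterministic; a real leader would presumably read a clock instead.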
+If the attachment point is configured to be replicated to its parent, then the leader will maintain
+a connection to the leader in the parent cluster. Note that the parent cluster need not be
+housed in the parent block, just at some ancestor block. Then, writes will be propagated through
+this connection to the parent cluster, where this process may continue if that cluster is also
+configured for replication. Distributed locking is similarly communicated to the parent cluster,
+where the lock is only acquired with the parent's approval.
+\section{Programmatic Access to Data}
+No designer can hope to envision all the potential applications that a person would want to have
+access to their data. That's why an important component of the system is the ability to run
+programs that can access data and provide services to other internet hosts, whether they are
+blocktree nodes or not. This is accomplished by providing a WebAssembly based execution
+environment with a system interface based on WASI. Information in the blocktree is available
+to programs using standard filesystem system calls that specify paths in a special directory.
+While some programs may wish to access blocks directly in this manner, others may wish to use
+an API at a higher level of abstraction. Thus there is also an API for creating arbitrarily sized
+files that will get mapped to fixed-size blocks, freeing the programmer from having to implement
+this themselves.
+Data provided by these filesystem APIs will be the most up-to-date versions known to the node.
+There's the possibility that conflicts with other nodes may cause writes made by programs on the
+node to be rolled back or overwritten. Of course locks can be taken on files and blocks if a
+program must ensure exclusive access to data. Finally, an inotify-like API is provided for programs to be notified when changes to blocks occur.
+An important consideration for the design of this system was to facilitate the creation of web
+servers and other types of internet hosts which can serve data stored in a blocktree. For this
+reason there is a high level callback based API for declaring HTTP handlers, as well as handlers
+for blocktree specific messages.
+In order to provide the consumer with control over how their data is used, a permissions system
+exists to control which blocks and APIs programs have access to. For instance a consumer would
+have to grant special permission for a program to be able to access the Berkeley sockets API.
+Programs are installed by specifying a blocktree path. This is a secure and convenient method of
+distribution as these programs can be downloaded from the nodes associated with the root of their
+path and the downloaded blocks can be cryptographically verified to be trusted by the root 
+key. Authors wishing to distribute their programs in this manner will of course need to make the
+blocks containing them public (unencrypted), or else provide some mechanism for selective access.
+\chapter{Data Structures}
+\chapter{The Network}
+\chapter{Application Programming Interface}
+\chapter{Example Applications}

+\title{Blocktree \\
+\large A platform for distributed computing.}
+\author{Matthew Carr}
+\date{May 28, 2022}
+Blocktree is a platform which aims to make developing reliable distributed systems easy and safe. 
+It does this by defining a format for data stored in the system, a mechanism for replicating
+that data, and a programming interface for accessing it. It aims to solve the difficult problems
+of cryptographic key management, consensus, and global addressing so that user code can focus on
+solving problems germane to their application.
+\section{An Individual Blocktree}
+The atomic unit of data storage and privacy and authenticity guarantees is called a block. A
+block contains a payload of data. Confidentiality of this data is achieved by encrypting it with
+a symmetric cipher under a random key. This random key is known as the block key.
+The block key is encapsulated using
+a public key cipher and the resulting ciphertext is stored in the header of the block. Thus
+only the person possessing the corresponding private key will be able to access the contents of
+the block. Blocks are arranged into trees, and the parent of the block also has a block key.
+The child's block key is always encapsulated using the parent's key and stored in the block
+header. This ensures that if a principal is given access to a block, they automatically have
+access to every child of that block. The encapsulated block key is known as a read capability,
+or readcap, as it grants the holder the ability to read the block.
+Authenticity guarantees are provided using a digital signature scheme. In order to change the
+contents of a block a data structure called a write capability, or writecap, is needed. A
+writecap is approximately an X.509 certificate chain. A writecap contains the following data:
+\begin{itemize}
+\item The path the writecap can be used under.
+\item The principal that the writecap was issued to.
+\item The timestamp when the writecap expires.
+\item The public key of the principal who issued the writecap.
+\item A digital signature produced by the private key corresponding to the above public key.
+\item Optionally, the next writecap in the chain.
+\end{itemize}
+The last item is only excluded in the case of a self-signed writecap, i.e. one that was signed by
+the same principal it was issued to. A writecap is considered valid for use on a block if all
+of the following conditions are true:
+\begin{itemize}
+\item The signature on every writecap in the chain is valid.
+\item The signing principal matches the principal the next writecap was issued to for every
+writecap in the chain.
+\item The path of the block is contained in the path of every writecap in the chain.
+\item The current timestamp is strictly less than the expiration of all the writecaps in the
+chain.
+\item The principal corresponding to the public key used to sign the last writecap in the chain
+is the owner of the blocktree.
+\end{itemize}
+The intuition behind these rules is that a writecap is only valid if there is a chain of trust
+that leads back to the owner of the block tree. The owner may delegate their trust to any number
+of intermediaries by issuing them writecaps. These writecaps are scoped based on the path
+specified when they are issued. These intermediaries can then delegate this trust as well.
+A block is considered valid if it contains a valid writecap, it was signed using the key
+corresponding to the first writecap's public key, and this signature is valid.
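The validity conditions for a writecap chain can be sketched as a walk over a linked list of writecaps. Several things here are assumptions rather than part of the design: a principal is taken to be the hex SHA-256 hash of a public key, paths are \texttt{/}-separated, and signature checking is delegated to a caller-supplied \texttt{verify} function because no signature scheme is named. The names \texttt{Writecap}, \texttt{principal\_of}, and \texttt{writecap\_is\_valid} are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Writecap:
    path: str                  # path the writecap can be used under
    issued_to: str             # principal the writecap was issued to
    expires: float             # expiration timestamp
    signing_key: bytes         # public key of the issuer
    signature: bytes           # issuer's signature (checked by `verify`)
    next_cap: Optional["Writecap"] = None  # next writecap in the chain


def principal_of(public_key: bytes) -> str:
    """Assumed mapping from public key to principal (hex SHA-256)."""
    return hashlib.sha256(public_key).hexdigest()


def path_contains(scope: str, path: str) -> bool:
    scope_parts = [p for p in scope.split("/") if p]
    path_parts = [p for p in path.split("/") if p]
    return path_parts[: len(scope_parts)] == scope_parts


def writecap_is_valid(cap: Writecap, block_path: str, now: float, owner: str,
                      verify: Callable[[Writecap], bool]) -> bool:
    """Check every link of the chain against the listed conditions."""
    link = cap
    while True:
        if not verify(link):                          # signature is valid
            return False
        if not path_contains(link.path, block_path):  # block lies in scope
            return False
        if not now < link.expires:                    # strictly before expiry
            return False
        signer = principal_of(link.signing_key)
        if link.next_cap is None:
            return signer == owner                    # last signer is the owner
        if signer != link.next_cap.issued_to:         # chain links up
            return False
        link = link.next_cap


# Placeholder key material and a two-link chain for illustration.
OWNER_KEY = b"owner public key (placeholder)"
OWNER = principal_of(OWNER_KEY)
root_cap = Writecap(OWNER, OWNER, 100.0, OWNER_KEY, b"")
delegated = Writecap(OWNER + "/docs", "alice", 50.0, OWNER_KEY, b"", root_cap)
```

In this example the owner issues \texttt{delegated} to a principal called \texttt{alice}, scoped to the \texttt{docs} subtree, so a write below that subtree is accepted while a write elsewhere in the tree is rejected.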
+Blocks are used for more than just organizing data; they also organize computation. A program
+participating in the blocktree network is referred to as a node. Multiple nodes may be run on
+a single computer. Every node is attached to the blocktree at a specific path. This information
+is recorded in the block where the node is attached. A node is responsible for the storage of
+the block where it is attached and the blocks that are descended from this block, unless there
+is another node attached to a descendant block.
+In this way data storage can be delegated, allowing
+the system to scale. When more than one node is attached to the same block they form a cluster.
+Each node in the cluster contains a copy of the data that the cluster is responsible for. They
+maintain consistency of this data by running the Raft consensus protocol.
+\section{Connecting Blocktrees}
+In order to allow nodes to access blocks in other blocktrees, a global ledger of events is used.
+This ledger is implemented using a proof of work (PoW) blockchain and a corresponding 
+cryptocurrency known as blockcoin. Nodes mine chain blocks (not to be confused with the tree 
+blocks we've been discussing up till now) in the same way they do in other PoW blockchain
+systems such as Bitcoin. The node which manages to mine the next chain block receives a reward,
+which is the sum of the fees for each event in the chain and a variable amount of newly minted
+blockcoin. The amount of new blockcoin created by a chain block is directly proportional to the
+number of data storage events contained in the chain block. Thus the total amount of blockcoin
+in circulation has a direct relationship to the amount of data stored in the system, reflecting
+the fact that blockcoin exists to provide an accounting mechanism for data stored in the system.
+When a node writes data to a tree block, and it wishes this block to be globally accessible, then
+it produces what are called fragments. Fragments are the output symbols of an erasure coding
+algorithm (such as the RaptorQ fountain code). These codes have
+the property that only $m$ out of $n$ (where $m < n$) symbols are needed to reconstruct the
+original data. Such a code ensures that even if some of the fragments are lost, as long as $m$
+remain, the original data can be recovered.
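The $m$ out of $n$ property can be illustrated with a much simpler code than RaptorQ. The sketch below is not a fountain code: it is a single XOR parity scheme with $m = 2$ and $n = 3$, chosen only because it fits in a few lines. The function names \texttt{encode} and \texttt{decode} are hypothetical.

```python
from typing import Dict, List


def _xor(left: bytes, right: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(left, right))


def encode(data: bytes) -> List[bytes]:
    """Split data into two halves and add one XOR parity fragment (n = 3)."""
    half = (len(data) + 1) // 2
    first = data[:half]
    second = data[half:].ljust(half, b"\0")  # pad so both halves match in size
    return [first, second, _xor(first, second)]


def decode(fragments: Dict[int, bytes], length: int) -> bytes:
    """Rebuild the data from any two fragments, keyed by fragment ID (m = 2)."""
    if 0 in fragments and 1 in fragments:
        first, second = fragments[0], fragments[1]
    elif 0 in fragments and 2 in fragments:
        first = fragments[0]
        second = _xor(first, fragments[2])
    elif 1 in fragments and 2 in fragments:
        second = fragments[1]
        first = _xor(second, fragments[2])
    else:
        raise ValueError("at least two distinct fragments are required")
    return (first + second)[:length]  # drop the padding
```

Losing any one of the three fragments still allows recovery. A code such as RaptorQ generalizes this idea to many more fragments and much larger blocks.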
+Once these fragments have been computed an event is created for each one and published to the
+blockchain. This event indicates to other nodes that this node wishes to store a fragment and
+states the amount of blockcoin the node will pay to the first node that accepts the offer. When
+another node wishes to accept the offer, it directly contacts the first node, which then sends
+it the fragment and publishes an event stating that the fragment is stored with the second
+node. This event includes the path of the block the fragment was computed from, the fragment's 
+ID (the sequence number from the erasure code), and the principal of the node which stored it.
+Thus any other node in the network can use the information contained in these events to
+determine the set of nodes which contain the fragments of any given path.
+In order for nodes to be able to contact other nodes, a mechanism is required for associating
+an internet protocol (IP) address with a principal. This is done by having nodes publish events
+to the blockchain when their IP address changes. This event includes their new IP address,
+their public key, and a digital signature computed using their private key. Other nodes can
+then verify this signature to ensure that an attacker cannot bind the wrong
+IP address to a principal in order to receive messages it was not meant to have.
+While this event ledger is useful for appending new events, and ensuring that previous events
+cannot be changed, another data structure is required to ensure that queries on this data can
+be performed efficiently. In particular, it's important to be able to quickly perform the
+following queries:
+\begin{itemize}
+\item Find the set of nodes storing the fragments for a given path.
+\item Find the IP address of a node or owner given a principal.
+\item Find the public key associated with a principal.
+\end{itemize}
+These are enabled by creating a read model from the data in the event ledger. One possible
+implementation of this read model is the following SQL database.
+This database contains two tables:
+\begin{itemize}
+\item \emph{Fragments}: one column containing the path, one for the fragment ID, and another for the principal. Indexed on the path column.
+\item \emph{Principals}: one column containing the principal, one column for the public key,
+	one column for the IP address, and one column for the amount of blockcoin the principal has.
+	Indexed on the principal column.
+\end{itemize}
+This index is built from the event ledger by iterating over it, and modifying the database
+appropriately for each event. For instance, when a fragment-stored event is encountered,
+a row is added to the \emph{Fragments} table containing the path of the block, the fragment ID,
+and the principal which were recorded in the event. The reader who is experienced with software
+patterns will recognize that this is an event sourced architecture.
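A minimal sketch of such a read model, using an in-memory SQLite database, might look like the following. The event dictionaries, the type names \texttt{fragment\_stored} and \texttt{address\_changed}, and the column types are assumptions made for this example. Blockcoin balance updates are left out because the events that change balances are not specified here.

```python
import sqlite3
from typing import Iterable, Mapping


def build_read_model(events: Iterable[Mapping]) -> sqlite3.Connection:
    """Replay ledger events, in order, into the two query tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE fragments (
            path TEXT NOT NULL,
            fragment_id INTEGER NOT NULL,
            principal TEXT NOT NULL
        );
        CREATE INDEX fragments_by_path ON fragments (path);
        CREATE TABLE principals (
            principal TEXT PRIMARY KEY,
            public_key BLOB,
            ip_address TEXT,
            blockcoin INTEGER NOT NULL DEFAULT 0
        );
    """)
    for event in events:
        if event["type"] == "fragment_stored":
            db.execute(
                "INSERT INTO fragments (path, fragment_id, principal) VALUES (?, ?, ?)",
                (event["path"], event["fragment_id"], event["principal"]),
            )
        elif event["type"] == "address_changed":
            # Create the row on first sight, then record the latest address.
            db.execute(
                "INSERT OR IGNORE INTO principals (principal) VALUES (?)",
                (event["principal"],),
            )
            db.execute(
                "UPDATE principals SET public_key = ?, ip_address = ? WHERE principal = ?",
                (event["public_key"], event["ip_address"], event["principal"]),
            )
    db.commit()
    return db


# Hypothetical events; the addresses come from documentation-only ranges.
EXAMPLE_EVENTS = [
    {"type": "address_changed", "principal": "node1",
     "public_key": b"key1", "ip_address": "203.0.113.5"},
    {"type": "fragment_stored", "path": "tree/a", "fragment_id": 0, "principal": "node1"},
    {"type": "fragment_stored", "path": "tree/a", "fragment_id": 1, "principal": "node2"},
    {"type": "address_changed", "principal": "node1",
     "public_key": b"key1", "ip_address": "203.0.113.9"},
]
```

Because the tables are derived entirely from the ledger, they can be discarded and rebuilt at any time by replaying the events from the beginning.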
+\section{Programming Interface}