3 years ago · 5d93068b73
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,6 @@
 
				+*.aux
			
 
				+*.log
			
 
				+*.out
			
 
				+*.pdf
			
 
				+*.synctex.gz
			
 
				+*.toc
			
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@
 
				    

			
 
				 *Node Definition

			
 
				   Describe the roles and responsibilities of nodes:

			
 
				-   -API provided to programs

			
 
				+   -They provide an API to programs

			
 
				    -Persitence of data locally and over the network.

			
 
				    -Forwarding BlockTree messages and HTTP requests to authorized Programs.

			
 
				    -Handling BlockTree messages from the network that are for the node

			
@@ -27,13 +27,20 @@
 
				    -Resolving node IDs to IP addresses.

			
 
				     

			
 
				 *Network Definition

			
 
				-  -Overlay network definition.

			
 
				   -Mechanism to resolve node IDs to IP addresses.

			
 
				   -Node hierarchy and leader election among peers.

			
 
				   -Distributed Block locking for exclusive writes.

			
 
				   -Opportunistic concurrency by default, with the rule

			
 
				    that the last write known to the leader is

			
 
				    the winner.

			
 
				+  -In order to participate in the network a node must have a valid TPM.

			
 
				+   The issuance of these TPMs is controlled to prevent inflation. When a

			
 
				+   TPM is first connected to the network it is claimed by the party who

			
 
				+   first registers it. The currency associated with that TPM is then transfered

			
 
				+   to the party.

			
 
				+  -Nodes store fragments for other nodes when they are paid. Two nodes create a contract

			
 
				+   store it in the public blocktree when they agree to do this.

			
 
				+  -Nodes can also create contracts to forward traffic for each other.

			
 
				 

			
 
				 *Program API

			
 
				   -Provide file creation API. Blocks can be created which are

			
--- a/doc/paper/BlockTree.tex
+++ b/doc/paper/BlockTree.tex
@@ -0,0 +1,194 @@
 
				+\documentclass{book}
			
 
				+\usepackage{amsfonts,amssymb,amsmath,amsthm}
			
 
				+\usepackage[scale=0.80]{geometry}
			
 
				+\usepackage{hyperref}
			
 
				+
			
 
				+\begin{document}
			
 
				+\tableofcontents
			
 
				+\chapter{System Overview}
			
 
				+The development of the internet was undoudedtly one of the greatest achievements in the 20th 
			
 
				+century, and the internet's killer app, the web, has reshaped our lives and the way we do
			
 
				+business. But, for all the benefits we have received from these technologies there have
			
 
				+corresponding costs. It's now possible to cheaply surveil entire popluations and discern their
			
 
				+preferences and responses so that propaganda can be effectively created and distributed. The
			
 
				+surveilence it not surepticious, it's quite overt. Consumers hand over this data willingly to
			
 
				+tech companies because of the benefits they receive in return. But why should people be forced to
			
 
				+choose between privacy and convenience?
			
 
				+
			
 
				+The cost of computing power, storage, and bandwidth are not very cheap. A single board computer
			
 
				+can provide more than enough computing power for the average web-browsing consumer. A classic rotating magnetic hard drive can hold terrabytes of data, more than enough to hold an individuals
			
 
				+social media output. Broadband internet access is not universal, but it is very common, and
			
 
				+becoming more so all the time. So with all these resources available, why is it that consumers do
			
 
				+not more control over the computing infrastructure upon which they rely?
			
 
				+
			
 
				+The answer is of course obvious. Consumers' don't want to manage servers, networking, routing, and
			
 
				+backup strategies. Each of these can be a full time job by iteself, so in order for a consumer to
			
 
				+be in control of their own infrastructure, the infrastructure needs to be able to take care of
			
 
				+itself.
			
 
				+
			
 
				+Perhaps as important as consumer agency over their computing infrastructure, is the security of
			
 
				+that infrastructure. Time and time again consumer trust has been violated by data breaches and
			
 
				+service disruptions. These incidents show the need for a more secure and standardized system for
			
 
				+storing application data and building web services. Such a system must provide both
			
 
				+confidentiality of consumer data, and the ability to authenticate consumers without the
			
 
				+need for insecure techniques, such as passwords.
			
 
				+
			
 
				+This document proposes a potential solution. It describes a system for organizing information into
			
 
				+trees of blocks, distributing those blocks over a network of nodes, and a programming interface
			
 
				+to access this information in a convenient way for a programmer. Because no one piece of hardware
			
 
				+is infallible, the system also includes mechanisms for nodes to contract with one another to store
			
 
				+data. This allows data to be backed up and later restored in the case a node is lost. In order to
			
 
				+ensure the free exchange of data amongst nodes, a currency will be used to account for the
			
 
				+available computing capacity of the network.
			
 
				+
			
 
				+The remainder of this chapter will give an overview of the different system areas.
			
 
				+
			
 
				+\section{Blocks}
			
 
				+The basis of all trust in the system is a public-private keypair. The secrecy of the private key
			
 
				+is the linchpin of all security in the system. If this key is compromised then all confidentiality
			
 
				+and authenticity assurances go out the window. Further, if this key is lost, then control of the
			
 
				+resources over which it has agency over is also lost.
			
 
				+
			
 
				+Because of this, it's very important to take protect this key by storing it in a secure location.
			
 
				+It is envisioned that a TPM will be used for this purpose. A TPM will be important for other
			
 
				+system features, as we'll see later.
			
 
				+
			
 
				+All data stored in the system is put into data structures called blocks. Each block has a path
			
 
				+which describes it's location in the tree. The root of this path is always a hash (hex encoded)
			
 
				+of the public key corresponding to the private key which is the root of trust for the tree. The
			
 
				+root block is encrypted by a symmetric cipher using a randomly generated key, which is referred
			
 
				+to as the block key. This key is then
			
 
				+encrypted using the root public key and added to the block. This is a standard key encapsulation
			
 
				+scheme. For all non-root blocks in the tree, their block key is encrypted using the block key of
			
 
				+their parent, and then stored in the block. This ensures that we can allow a party access to all
			
 
				+blocks in a subtree by simply encrypting the block key at the root of that subtree using the
			
 
				+party's public key. This mechanism will be used later to give node's that are controlled by the
			
 
				+root selective access to data stored in the tree.
			
 
				+
			
 
				+Integrity assurance of the block's content is assured by a digital signature. A certificate chain
			
 
				+starting with the root key and ending with the key used by make this signature is included with 
			
 
				+the block. In this way any node which has been issued a certificate using the root key is able to
			
 
				+write data into the tree. All data in the block is covered by the signature (except the signature itself of course). In particular the path of the block is covered, which means that all
			
 
				+information needed to verify the authenticity of a block is included in it.
			
 
				+
			
 
				+The size of each block has yet to be determined. It's envisioned that they will be fairly large,
			
 
				+on the order of 4 MB, so as to amortize the overhead of storing a certificate chain, and 
			
 
				+encapsulated keys as well as the cost of cryptographic operations.
			
 
				+
			
 
				+\section{Fragments}
			
 
				+By itself this block structure would be useful for building a secure filesystem, but in order to
			
 
				+be a durable storage system it needs an efficient way of backing up, or distributing data. This
			
 
				+is the purpose of fragments.
			
 
				+
			
 
				+Blocks are distributed amongst nodes in the network by using a fountain code. The output symbols
			
 
				+of this code are referred to as fragments. A code with a high performance implementation and good
			
 
				+coding efficiency is an important design consideration for the system. For these reasons it's
			
 
				+envisioned that the RaptorQ code will be used.
			
 
				+
			
 
				+After a block is created the creating node will need to distribute the data in the block to other
			
 
				+nodes to ensure it's persistence in case it fails. It will create fragments as needed and solicit
			
 
				+other nodes for their storage. Currency controlled by the root key will be exchanged with these
			
 
				+other nodes in exchange for contracts to store the fragments.
			
 
				+
			
 
				+When a node needs to rebuild data that was previously distributed in fragments, it conects to a
			
 
				+subset of nodes containing fragments and downloads in parallel enough fragments to reconstruct the
			
 
				+block. This same mechanism can be used to distribute block data to unaffiliated nodes in the
			
 
				+network. It is a convenient load balancing and performance improvement, as the parallel downloads
			
 
				+spread the load over multiple nodes and are not limited by the bandwidth between any pair.
			
 
				+
			
 
				+Much attention will be given to the system used to keep track of which node has the fragments
			
 
				+of a given block, and how nodes contract with each other for the storage and distribution of
			
 
				+fragments.
			
 
				+
			
 
				+\section{Nodes and the Network}
			
 
				+Each node in the network is identified by a public-private keypair and is issued a certificate
			
 
				+trusted by the public root key. Nodes can be claimed by issuing them a certificate and
			
 
				+then writing
			
 
				+that certificate into the public blocktree. When a new node is claimed, currency is deposited into
			
 
				+the account of the root key which claimed it. This currency is to account for the storage capacity
			
 
				+that the new node brings to the network. This mechanism is the reason why the node must have a
			
 
				+certificate trusted by the public root key, otherwise there would be no way to control the
			
 
				+creation of currency.
			
 
				+
			
 
				+Nodes are identified by the hex encouding of the hash of their public key. This string is written
			
 
				+into directory blocks to keep track of which nodes contain fragments of blocks in that directory.
			
 
				+In order for this information to be useful, a mechanism is needed to resolve node IDs to IP
			
 
				+addresses. This is accomplished by writing a block into the public blocktree with the node's 
			
 
				+IP address which is signed by the node's private key. Anytime the node receives a new IP address,
			
 
				+it updates this block to inform the network of this change. Because the node's ID is derived from
			
 
				+its private key, and the block containing its IP address is signed with this key, it's possible
			
 
				+for a third party to independently verify that this IP address is authentic.
			
 
				+
			
 
				+When a node is given access to a blocktree, by issuing it a certificate, it is assigned a path
			
 
				+under which its data will be stored. This path is referred to as the node's home. This path is
			
 
				+written into the node's certificate. Thus a node's home is cryptographically verifiable and must
			
 
				+be chosen when the node joins the blocktree. The node which issues the new node it's certificate
			
 
				+grants the new node access to the block at its home path by encapsulating the block's key using
			
 
				+the node's public key. Thus the new node can recover the block key and read the contents of its
			
 
				+home block. If a node has its home at a block then its ID is written into the block.
			
 
				+
			
 
				+The data created by a node may optionally replicated to its parent node. This would be suitable
			
 
				+for a lightweight or mobile device which needs to ensure its data is replicated immediately and
			
 
				+doesn't have time to negotiate contracts for the storage fragments. However, for larger 
			
 
				+blocktrees, having non-replicating nodes is essential for scalability.
			
 
				+
			
 
				+More than one node can be housed at the same path, such nodes are called a cluster.
			
 
				+Each node in the cluster stores
			
 
				+copies of the same data and they coordinate with each other to ensure the consitency of this
			
 
				+data. This is accomplished by electing a leader. All writes to blocks under the nodes home are
			
 
				+sent to the leader. The leader then serializes these writes and sends them to the rest of the
			
 
				+nodes. By default writes to blocks use optimistic concurrency, with the last write known to the
			
 
				+leader being the winner. But if a node requires exclusive access to the data in a block it can
			
 
				+request to the leader to lock it. Writes from nodes other than the locking node are rejected until
			
 
				+the lock is released. The leader will release the lock on its own if no messages are received from
			
 
				+the locking node after a timeout.
			
 
				+
			
 
				+If a path is configured to be replicated to its parent, then the leader at that path will maintain
			
 
				+a connection to the the leader in a parent cluster. Note that the parent cluster need not be
			
 
				+housed in the parent block, just at some ancestor block. Then, writes will be propagated through
			
 
				+this connection to the parent cluster, where this process may continue if that cluster is also
			
 
				+configured for replication. Distributed locking is similarly comunicated to the parent cluster,
			
 
				+where the lock is only aquired with the parent's approval.
			
 
				+
			
 
				+\section{Programmatic Access to Data}
			
 
				+No designer can hope to envsion all the potential applications that a consumer would want to have
			
 
				+access to their data. That's why an important component of the system is the ability to run
			
 
				+programs that can access data and provide services to other internet hosts, whether they are
			
 
				+blocktree nodes or not. This is accomplished by providing a WebAssembly based execution
			
 
				+environment with a system interface based on WASI. Information in the blocktree is available
			
 
				+to programs using standard filesystem system calls that specify paths in a special directory.
			
 
				+While some programs may wish to access blocks directly in this manner, others may wish to use
			
 
				+an API at a higher level of abstraction. Thus there is also an API for creating arbitrarily sized
			
 
				+files that will get mapped to fixed sized blocks, freeing the programmer from having to implment
			
 
				+this themselves.
			
 
				+
			
 
				+Data provided by these filesystem APIs will be the most up-to-date versions known to the node.
			
 
				+There's the possiblity that conflicts with other nodes may cause writes made by programs on the
			
 
				+node to be rolled back or overwritten. Of course locks can be taken on files and blocks if a
			
 
				+program must ensure exclusive access to data. Finally, an inotify-like API is provided for programs to be notified when changes to blocks occur.
			
 
				+
			
 
				+An important consideration for the design of this system was to facilitate the creation of web
			
 
				+servers and other types of internet hosts which can serve data stored in a blocktree. For this
			
 
				+reason there is a high level callback based API for declaring HTTP handlers, as well as handlers
			
 
				+for blocktree specific messages.
			
 
				+
			
 
				+In order to provide the consumer with control over how their data is used a permissions system
			
 
				+exists to control which blocks and APIs programs have access to. For instance a consumer would
			
 
				+have to grant special permission for a program to be able to access the Berkeley sockets API.
			
 
				+
			
 
				+Programs are installed by specifing a blocktree path. This is a secure and convenient method of
			
 
				+distribution as these programs can be downloaded from the nodes associated with the root of their
			
 
				+path and the downloaded blocks can be cryptographically verified to be trusted by the root 
			
 
				+key. Authors wishing to distribute their programs in this manner will of course need to make the
			
 
				+blocks containing them public (unencrypted), or else provide some mechanism for selective access.
			
 
				+
			
 
				+\chapter{Data Structures}
			
 
				+
			
 
				+\chapter{Nodes}
			
 
				+
			
 
				+\chapter{The Network}
			
 
				+
			
 
				+\chapter{Application Programming Interface}
			
 
				+
			
 
				+\chapter{Example Applications}
			
 
				+\end{document}