|
@@ -1,11 +1,12 @@
|
|
|
\documentclass{book}
|
|
|
\usepackage{amsfonts,amssymb,amsmath,amsthm}
|
|
|
-\usepackage[scale=0.80]{geometry}
|
|
|
+\usepackage[scale=0.75]{geometry}
|
|
|
\usepackage{hyperref}
|
|
|
|
|
|
\begin{document}
|
|
|
\tableofcontents
|
|
|
\chapter{System Overview}
|
|
|
+% I should replace "consumer" with "user".
|
|
|
The development of the internet was undoubtedly one of the greatest achievements in the 20th
|
|
|
century, and the internet's killer app, the web, has reshaped our lives and the way we do
|
|
|
business. But, for all the benefits we have received from these technologies there have
|
|
@@ -35,8 +36,8 @@ confidentiality of consumer data, and the ability to authenticate consumers with
|
|
|
need for insecure techniques, such as passwords.
|
|
|
|
|
|
This document proposes a potential solution. It describes a system for organizing information into
|
|
|
-trees of blocks, distributing those blocks over a network of nodes, and a programming interface
|
|
|
-to access this information in a convenient way. Because no one piece of hardware
|
|
|
+trees of blocks, the distribution of those blocks over a network of nodes, and a programming interface
|
|
|
+to access this information. Because no one piece of hardware
|
|
|
is infallible, the system also includes mechanisms for nodes to contract with one another to store
|
|
|
data. This allows data to be backed up and later restored in the case a node is lost. In order to
|
|
|
ensure the free exchange of data amongst nodes, a digital currency is used to account for the
|
|
@@ -46,126 +47,169 @@ The remainder of this chapter will give an overview of the system, with the rema
|
|
|
document going into specific details of each of the system's components.
|
|
|
|
|
|
\section{Blocks}
|
|
|
-The basis of all trust in the system is a public-private keypair. The secrecy of the private key
|
|
|
-is the linchpin of all security in the system. If this key is compromised then all confidentiality
|
|
|
-and authenticity assurances are void. Further, if this key is lost, then control of the
|
|
|
-resources over which it has agency is also lost.
|
|
|
-
|
|
|
-% Should I remove this paragraph? Seems like this is an implementation details that should
|
|
|
-% be saved for later.
|
|
|
-Because of this, it's very important to protect this key by storing it in a secure location.
|
|
|
-It is envisioned that a TPM will be used for this purpose. A TPM will be important for other
|
|
|
-system features, as we'll see later.
|
|
|
-
|
|
|
-All data stored in the system is put into data structures called blocks. Each block has a path
|
|
|
-which describes it's location in the tree. The root of this path is always a hash (hex encoded)
|
|
|
-of the public key corresponding to the private key which is the root of trust for the tree. Each block is encrypted by a symmetric cipher using a randomly generated key, which is referred
|
|
|
-to as the block's key. The block key for the root block is
|
|
|
-encrypted using the root public key and the resulting cipher text is stored in the block itself,
|
|
|
-along with a hash of the public key for later identification. For all non-root blocks in the tree,
|
|
|
-their block key is encrypted using the block key of
|
|
|
-their parent, and the resulting ciphertext is stored in the block.
|
|
|
-This ensures that we can allow a party access to all
|
|
|
-blocks in a subtree by simply encrypting the block key at the root of that subtree using the
|
|
|
-party's public key. This mechanism is used to give node's that are controlled by the
|
|
|
-root selective access to data stored in the tree, by encrypting the block key of the root of the
|
|
|
-subtree using the node's public key.
|
|
|
-
|
|
|
-Integrity assurance of the block's content is achieved by a digital signature which covers all of
|
|
|
-the block's contents (except the signature itself of course). A certificate chain
|
|
|
-starting with the root key and ending with the key used by make this signature is included with
|
|
|
-the block. This ensures that any node which has been issued a certificate using the root key is
|
|
|
-able to write data into the tree. In particular the path of the block is covered, and since the
|
|
|
-path includes a hash of the root key, this means that anyone is able to independently verify
|
|
|
-the authenticity of the block by checking that the certificate chain was indeed signed by
|
|
|
-the root, that all other signatures in the chain are valid, and finally that the signature on
|
|
|
-the block itself is valid.
|
|
|
-
|
|
|
-The size of each block has yet to be determined. It's envisioned that they will be fairly large,
|
|
|
-on the order of 4 MB, so as to amortize the overhead of storing a certificate chain, and
|
|
|
-encapsulated keys as well as the cost of cryptographic operations.
|
|
|
+User data stored in this system is organized into structures called \emph{block trees}. Every block tree
|
|
|
+is identified with a public key. The private key that corresponds to a block tree's public key is
|
|
|
+required to control that tree. Any person who has the private key for a block tree is called that
|
|
|
+tree's owner.
|
|
|
+
|
|
|
+Computers participating in this system are called \emph{nodes}. Nodes are also identified by public keys, but
|
|
|
+these keys are not directly tied to block trees. Nodes that have access to a block tree's data are said
|
|
|
+to be \emph{attached} to that block tree. Nodes can be attached to multiple block trees at once, or none
|
|
|
+at all.
|
|
|
+
|
|
|
+Block trees are of course trees of \emph{blocks}. Every block is identified by a string called a \emph{path},
|
|
|
+which describes its location in the tree. The root of this path is a hash (hex encoded)
|
|
|
+of the tree's public key, allowing blocks from any tree to be referred to. A block consists of three segments:
|
|
|
+a header, a payload and a signature.
|
|
|
+
|
|
|
+The payload is encrypted by a symmetric cipher using a randomly generated key. This randomly
|
|
|
+generated key is called the \emph{block's key}. To allow access to the payload, the block's key is encapsulated
|
|
|
+using other keys and the resulting ciphertexts are stored in the block's header. These encapsulated keys
|
|
|
+are referred to as read capabilities, or \emph{read caps} for short.
|
|
|
+The root block of every block tree contains a read cap for the block tree's public key.
|
|
|
+Every non-root block contains a read cap
|
|
|
+for the block's parent, which is to say the block's key is encapsulated using its parent's block key.
|
|
|
+So when one has a read cap for a block, they can read the data in all blocks descended from that
|
|
|
+block. Because the owner of a block tree has a read cap for the root block, they can read all data
|
|
|
+stored in the tree. Other people (or nodes) can be given access to a subtree by granting them a read
|
|
|
+cap for the subtree's root. A block which contains public data is stored as cleartext with no read caps.
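The read cap chain described above can be sketched as follows. This is a minimal illustration, not the system's implementation: `keystream_xor` is a toy stand-in for a real symmetric cipher and is NOT secure, and the names `make_block` and `read_descendant` as well as the dictionary layout are assumptions made for the example.

```python
import hashlib
import os


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (NOT secure): XOR data with a SHA-256 keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def make_block(payload: bytes, parent_key: bytes):
    """Create a block whose key is encapsulated under its parent's block key."""
    block_key = os.urandom(32)
    block = {
        "read_cap": keystream_xor(parent_key, block_key),  # stored in the header
        "payload": keystream_xor(block_key, payload),
    }
    return block, block_key


def read_descendant(start_key: bytes, chain: list) -> bytes:
    """Walk from a block whose key is known down to the last block in chain.

    chain lists the blocks below the starting block, from child to target.
    """
    key = start_key
    for block in chain:
        key = keystream_xor(key, block["read_cap"])  # recover the child's key
    return keystream_xor(key, chain[-1]["payload"])


root_key = os.urandom(32)
child, child_key = make_block(b"child data", root_key)
grandchild, _ = make_block(b"grandchild data", child_key)
print(read_descendant(root_key, [child, grandchild]))
```

Holding the key of any block is enough to recover the keys of everything beneath it, which is why granting a read cap for a subtree's root grants access to the whole subtree.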
|
|
|
+
|
|
|
+While read caps provide for confidentiality, write caps provide for integrity. A \emph{write cap}
|
|
|
+for a block is a certificate chain which terminates at a certificate signed by the block tree's
|
|
|
+owner. Thus a self-signed certificate made using the tree's private key is a valid write cap
|
|
|
+for any block in the tree. By allowing a chain of certificates to be used, it's possible for
|
|
|
+the owner to give other people or nodes the ability to write data into their tree. The scope of
|
|
|
+this access is controlled by specifying, in the certificate, the path under which writing is allowed.
|
|
|
+A write cap for a block is only valid if the path of the block is contained in the path
|
|
|
+specified in every certificate in the chain.
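The path scoping rule for write caps can be sketched as a simple containment check. The example assumes paths are slash-separated strings with the root component first; the text does not fix a separator, and the function names are hypothetical.

```python
def path_components(path: str) -> list[str]:
    """Split a block path into components, ignoring empty segments."""
    return [part for part in path.split("/") if part]


def path_contains(scope: str, block_path: str) -> bool:
    """Return True if block_path is at or below scope in the tree."""
    scope_parts = path_components(scope)
    block_parts = path_components(block_path)
    return block_parts[: len(scope_parts)] == scope_parts


def write_cap_covers(cert_paths: list[str], block_path: str) -> bool:
    """A write cap covers a block only if every certificate's path contains it.

    An empty chain covers nothing.
    """
    return bool(cert_paths) and all(
        path_contains(scope, block_path) for scope in cert_paths
    )


print(write_cap_covers(["abc123", "abc123/docs"], "abc123/docs/notes"))
```

Comparing whole components rather than string prefixes keeps a certificate scoped to `abc123/docs` from also covering `abc123/documents`.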
|
|
|
+
|
|
|
+Both the header and the payload of a block are protected by a digital signature. The writer
|
|
|
+of the block computes this signature using the private key which corresponds to the write cap
|
|
|
+for the block they're trying to write. In order to validate a block, this signature is validated, then
|
|
|
+the write cap is validated, and finally the hash of the public key of
|
|
|
+the last signer in the write cap chain is compared to the root of the block's path. If these match,
|
|
|
+then the block is valid, as this means that an owner has given permission for the writer to write
|
|
|
+into their tree at this path.
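The final step of block validation, comparing the last signer's key hash to the root of the block's path, might look like the sketch below. It covers only that comparison; verifying the block signature and the certificate chain is omitted. SHA-256 and slash-separated paths are assumptions, since the text names neither a hash function nor a separator.

```python
import hashlib


def key_fingerprint(public_key: bytes) -> str:
    """Hex-encoded hash of a public key (SHA-256 is an assumption)."""
    return hashlib.sha256(public_key).hexdigest()


def path_root(block_path: str) -> str:
    """Return the first component of a slash-separated block path."""
    return block_path.strip("/").split("/", 1)[0]


def root_matches_signer(block_path: str, last_signer_public_key: bytes) -> bool:
    """Check that the path is rooted at the key that anchors the write cap."""
    return path_root(block_path) == key_fingerprint(last_signer_public_key)


owner_key = b"example owner public key bytes"  # placeholder, not a real key
block_path = key_fingerprint(owner_key) + "/docs/notes"
print(root_matches_signer(block_path, owner_key))
```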
|
|
|
+
|
|
|
+Accessing the data in a block requires several cryptographic operations, both for validation and
|
|
|
+for decryption. Because of this, it's important that blocks are relatively large, on the order of
|
|
|
+4 MB, to amortize the cost of these operations.
|
|
|
|
|
|
\section{Fragments}
|
|
|
By itself this block structure would be useful for building a secure filesystem, but in order to
|
|
|
-be a durable storage system it needs an efficient way of backing up, or rather distributing data.
|
|
|
-This is the purpose of fragments.
|
|
|
-
|
|
|
-Blocks are distributed amongst nodes in the network by using a fountain code. The output symbols
|
|
|
-of this code are referred to as fragments. A code with a high performance implementation and good
|
|
|
-coding efficiency is an important design consideration for the system. For these reasons it's
|
|
|
-envisioned that the RaptorQ code will be used.
|
|
|
-
|
|
|
-After a block is created the creating node will need to distribute the data in the block to other
|
|
|
-nodes to ensure its persistence in case the node fails. It will create fragments as needed
|
|
|
-and advertise to other node's its desire to store them. Currency controlled by the root key is
|
|
|
-exchanged with these other nodes in exchange for contracts to store the fragments.
|
|
|
-
|
|
|
-When a node needs to rebuild data that was previously distributed in fragments, it connects to a
|
|
|
-subset of nodes containing fragments and, in parallel, downloads enough fragments to reconstruct
|
|
|
-the block. This same mechanism can be used to distribute block data to unaffiliated nodes in the
|
|
|
-network. It is a convenient load balancing and performance improvement, as the parallel downloads
|
|
|
-spread the load over multiple nodes and are not limited by the bandwidth between any pair.
|
|
|
-
|
|
|
-We keep track of which nodes hold fragments of a block by storing the IDs of these nodes in the
|
|
|
-block's parent. This list of node IDs can then be resolved to a list of IP addresses by looking
|
|
|
-up data in a shared data structure called the Public Blocktree.
|
|
|
+be a durable storage system we need an efficient way of distributing data for redundancy and
|
|
|
+availability. This is the purpose of fragments.
|
|
|
+
|
|
|
+Blocks are distributed amongst nodes in the network using a fountain code. The output symbols
|
|
|
+of this code are referred to as \emph{fragments}. A code with a high performance implementation and good
|
|
|
+coding efficiency is an important design consideration for the system. For these reasons the
|
|
|
+RaptorQ code was chosen.
|
|
|
+
|
|
|
+In order to preserve the data in a newly created block, a node will need to distribute
|
|
|
+fragments to other nodes. It does this by advertising its desire to trade [currency]
|
|
|
+in its block tree for the storage of these fragments. \emph{[currency]} is a fungible
|
|
|
+token for the exchange of computing resources between nodes. Every block tree has
|
|
|
+some non-negative value for the amount of [currency] it controls. Nodes that are attached
|
|
|
+to a tree spend the tree's [currency] when paying other nodes for the storage of fragments.
|
|
|
+
|
|
|
+If another node is interested in making the exchange, it contacts the advertising node
|
|
|
+and both sign a contract. A \emph{contract} is a data structure signed by both nodes which
|
|
|
+states the hash of the fragment being stored and the amount of [currency] being exchanged
|
|
|
+for its storage. The contract is then stored in the public block tree (to be discussed below),
|
|
|
+so that [currency] can be transferred between nodes and to create an accountability mechanism
|
|
|
+to prevent the storing node from acting in bad faith and deleting the fragment.
|
|
|
+
|
|
|
+When a node needs to retrieve a block that was previously distributed in fragments, it connects to a
|
|
|
+subset of nodes containing the fragments and downloads enough to reconstruct
|
|
|
+the block. These downloads can be performed concurrently for greater speed. This same mechanism
|
|
|
+can be used to distribute public blocks to unaffiliated nodes. This mechanism facilitates load balancing
|
|
|
+and performance, as concurrent downloads
|
|
|
+spread the load over multiple nodes and are not limited by the bandwidth between any pair of nodes.
|
|
|
+
|
|
|
+The list of nodes containing the fragments of a block is called the block's \emph{node list}.
|
|
|
+A block's node list is stored in its parent. This allows for any non-root block to be retrieved.
|
|
|
+To allow the root block to be retrieved, its node list is stored in the public block tree.
|
|
|
|
|
|
\section{The Public Blocktree}
|
|
|
-The Public blocktree is just another block tree, but one which is controlled by a distinguished
|
|
|
-private key, whose public key is hard-coded into the other node's in the network. This blocktree
|
|
|
+\emph{The Public Block Tree} is a block tree which is known to all nodes. This is accomplished by
|
|
|
+providing all nodes with a hardcoded list of nodes that are attached to the public block tree.
|
|
|
+This is similar to the list of root DNS servers distributed with any networked operating system.
|
|
|
+Because the public block tree is only used for storing information that should be known to all
|
|
|
+nodes in the network, the payload of every block in it is cleartext. The public block tree serves
|
|
|
+only to facilitate the communication and exchange of data between nodes.
|
|
|
+
|
|
|
+One way that it does this is by containing a database of nodes and their IP addresses. A node
|
|
|
+which has a write cap to this database will only store an entry for a node if that node can provide
|
|
|
+a valid signed request. This signed request is stored in the database verbatim, so that other nodes can
|
|
|
+independently verify its validity. Thus the nodes in the network can use this database to securely resolve
|
|
|
+the IDs of other nodes to their IP addresses.
|
|
|
+
|
|
|
+The other function of the public block tree is to contain a list of transactions and disputes.
|
|
|
+This list is referred to as the \emph{public log}.
|
|
|
+When a node is created, an event is logged detailing the amount of [currency] the node is worth.
|
|
|
+When a node is first attached to a block tree, this [currency] is then removed from the node
|
|
|
+and added to the block tree. When a node signs a contract with another node, it is stored in the
|
|
|
+log and [currency] is removed from the sending node's block tree and added to the receiving node's.
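The effect of a logged contract on block tree balances can be sketched as below. The `Contract` fields, the integer amounts, and the balance dictionary are all assumptions for illustration; the text only says that a contract states the fragment's hash and the amount, and that balances are non-negative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    fragment_hash: str  # hash of the fragment being stored
    amount: int         # [currency] paid for its storage
    payer_tree: str     # block tree of the node storing its fragment elsewhere
    payee_tree: str     # block tree of the node accepting the fragment


def apply_contract(balances: dict[str, int], contract: Contract) -> dict[str, int]:
    """Return new balances after the contract; balances never go negative."""
    if contract.amount < 0:
        raise ValueError("amount must be non-negative")
    payer_balance = balances.get(contract.payer_tree, 0)
    if payer_balance < contract.amount:
        raise ValueError("payer tree cannot cover the contract")
    updated = dict(balances)
    updated[contract.payer_tree] = payer_balance - contract.amount
    updated[contract.payee_tree] = updated.get(contract.payee_tree, 0) + contract.amount
    return updated


balances = {"tree_a": 10, "tree_b": 3}
contract = Contract("deadbeef", 4, "tree_a", "tree_b")
print(apply_contract(balances, contract))
```

Returning a new dictionary rather than mutating the input mirrors the idea that balances are derived by replaying the public log.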
|
|
|
+
|
|
|
+In order to discourage nodes from receiving payment for the storage of a fragment, then deleting
|
|
|
+the fragment to reclaim disk space, a reporting mechanism exists. If a node is unable to retrieve
|
|
|
+a fragment that it previously stored with another node, then it sends an event to the log
|
|
|
+indicating this. The other node can then respond by sending an event which contains the actual
|
|
|
+fragment which was requested. This allows all the nodes in the network to view the log and
|
|
|
+see if a node that they are considering signing a contract with is trustworthy. If they
|
|
|
+are not the defendant in any disputes, then they should be safe. If they are in one, but responded
|
|
|
+quickly with the fragment, then it could have been a transient network issue. If they never
|
|
|
+responded, then they are risky and should perhaps receive a lower payment for the storage
|
|
|
+of the fragment.
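The way a node might read the public log before signing a contract can be sketched as a small classifier. The `Dispute` record, the one-hour threshold for a "quick" response, and the label strings are assumptions; the text does not define them, and it does not say how a slow response should be treated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dispute:
    defendant: str                       # node ID accused of losing a fragment
    reported_at: float                   # time the dispute was logged, in seconds
    responded_at: Optional[float] = None  # time the fragment was produced, if ever


def assess_node(node_id: str, disputes: list, quick_seconds: float = 3600.0) -> str:
    """Classify a prospective storage partner from its dispute history."""
    mine = [d for d in disputes if d.defendant == node_id]
    if not mine:
        return "safe"
    if any(d.responded_at is None for d in mine):
        return "risky"  # at least one dispute was never answered
    if all(d.responded_at - d.reported_at <= quick_seconds for d in mine):
        return "likely transient"
    return "unclear"  # answered, but slowly; the text leaves this case open


log = [
    Dispute("node_b", 100.0, 160.0),
    Dispute("node_c", 200.0),
]
print(assess_node("node_a", log), assess_node("node_b", log), assess_node("node_c", log))
```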
|
|
|
+
|
|
|
+Finally, the public block tree stores node lists for the root blocks of every block tree.
|
|
|
+This ensures that even if every node that participates in a block tree fails, the block
|
|
|
+tree can still be recovered from its fragments, provided its private key is known.
|
|
|
|
|
|
\section{Nodes and the Network}
|
|
|
-Each node in the network is identified by a public-private keypair and is issued a certificate
|
|
|
-trusted by the public root key. Nodes can be claimed by issuing them a certificate and
|
|
|
-then writing
|
|
|
-that certificate into the public blocktree. When a new node is claimed, currency is deposited into
|
|
|
-the account of the root key which claimed it. This currency is to account for the storage capacity
|
|
|
-that the new node brings to the network. This mechanism is the reason why the node must have a
|
|
|
-certificate trusted by the public root key, otherwise there would be no way to control the
|
|
|
+Each node in the network has a public-private keypair. The string formed by hex encoding the
|
|
|
+hash of a node's public key is referred to as the \emph{node ID} of the node. When nodes
|
|
|
+are manufactured they are issued a certificate trusted by the
|
|
|
+public block tree. New nodes are claimed by issuing them a certificate and then writing
|
|
|
+that certificate into the public log. When a new node is claimed, currency is credited to
|
|
|
+the block tree which claimed it. This currency is to account for the storage capacity
|
|
|
+that the new node brings to that block tree. This mechanism is the reason why the node must have a
|
|
|
+certificate trusted by the public block tree; otherwise there would be no way to control the
|
|
|
creation of currency.
|
|
|
|
|
|
-Nodes are identified by the hex encouding of the hash of their public key. This string is written
|
|
|
-into directory blocks to keep track of which nodes contain fragments of blocks in that directory.
|
|
|
-In order for this information to be useful, a mechanism is needed to resolve node IDs to IP
|
|
|
-addresses. This is accomplished by writing a block into the public blocktree with the node's
|
|
|
-IP address which is signed by the node's private key. Anytime the node receives a new IP address,
|
|
|
-it updates this block to inform the network of this change. Because the node's ID is derived from
|
|
|
-its private key, and the block containing its IP address is signed with this key, it's possible
|
|
|
-for a third party to independently verify that this IP address is authentic.
|
|
|
-
|
|
|
-When a node is given access to a blocktree, by issuing it a certificate, it is assigned a path
|
|
|
-under which its data will be stored. This path is referred to as the node's home. This path is
|
|
|
-written into the node's certificate. Thus a node's home is cryptographically verifiable and must
|
|
|
-be chosen when the node joins the blocktree. The node which issues the new node it's certificate
|
|
|
-grants the new node access to the block at its home path by encapsulating the block's key using
|
|
|
-the node's public key. Thus the new node can recover the block key and read the contents of its
|
|
|
-home block. If a node has its home at a block then its ID is written into the block.
|
|
|
-
|
|
|
-The data created by a node may optionally replicated to its parent node. This would be suitable
|
|
|
+Nodes are identified by their node ID in the public block tree and in node lists. Nodes
|
|
|
+are responsible for updating their IP address in the public block tree whenever it changes.
|
|
|
+
|
|
|
+When a node is attached to a block tree it is issued a certificate containing a path
|
|
|
+under which its data will be stored. We say the node is attached to the block tree at that path.
|
|
|
+The node which issues the node its certificate creates a read cap for it and stores it in the
|
|
|
+block where the node is attached.
|
|
|
+
|
|
|
+The data created by a node may optionally be replicated in its parent node. This would be suitable
|
|
|
for a lightweight or mobile device which needs to ensure its data is replicated immediately and
|
|
|
-doesn't have time to negotiate contracts for the storage fragments. However, for larger
|
|
|
-blocktrees, having non-replicating nodes is essential for scalability.
|
|
|
+doesn't have time to negotiate contracts for the storage of fragments. For larger
|
|
|
+block trees, having non-replicating nodes is essential for scalability.
|
|
|
|
|
|
-More than one node can be housed at the same path, such nodes are called a cluster.
|
|
|
+More than one node can be attached at the same path, and when this happens a \emph{cluster} is formed.
|
|
|
Each node in the cluster stores
|
|
|
copies of the same data and they coordinate with each other to ensure the consistency of this
|
|
|
-data. This is accomplished by electing a leader. All writes to blocks under the nodes home are
|
|
|
+data. This is accomplished by electing a leader. All writes to blocks under the attachment point are
|
|
|
sent to the leader. The leader then serializes these writes and sends them to the rest of the
|
|
|
nodes. By default writes to blocks use optimistic concurrency, with the last write known to the
|
|
|
-leader being the winner. But if a node requires exclusive access to the data in a block it can
|
|
|
-request to the leader to lock it. Writes from nodes other than the locking node are rejected until
|
|
|
+leader being the winner. But if a node requires exclusive access to a block it can
|
|
|
+make a request to the leader to lock it. Writes from nodes other than the locking node are rejected until
|
|
|
the lock is released. The leader will release the lock on its own if no messages are received from
|
|
|
the locking node after a timeout.
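The leader's handling of a block lock can be sketched as follows. This is a simplified model under stated assumptions: a single lock per block, time passed in explicitly as a number of seconds, and any accepted write from the holder counting as a message that refreshes the timeout. The class and method names are hypothetical.

```python
from typing import Optional


class BlockLock:
    """Leader-side lock for one block, released automatically after a timeout."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.holder: Optional[str] = None
        self.last_heard = 0.0

    def _expire(self, now: float) -> None:
        # Release the lock if the holder has been silent for too long.
        if self.holder is not None and now - self.last_heard > self.timeout:
            self.holder = None

    def acquire(self, node_id: str, now: float) -> bool:
        self._expire(now)
        if self.holder in (None, node_id):
            self.holder = node_id
            self.last_heard = now
            return True
        return False

    def release(self, node_id: str, now: float) -> bool:
        self._expire(now)
        if self.holder == node_id:
            self.holder = None
            return True
        return False

    def accept_write(self, node_id: str, now: float) -> bool:
        """Unlocked blocks accept any write; locked blocks only the holder's."""
        self._expire(now)
        if self.holder is None:
            return True
        if self.holder == node_id:
            self.last_heard = now  # a write counts as a message from the holder
            return True
        return False


lock = BlockLock(timeout=30.0)
lock.acquire("node_a", now=1.0)
print(lock.accept_write("node_b", now=5.0), lock.accept_write("node_b", now=40.0))
```

Passing `now` explicitly keeps the sketch deterministic; a real leader would presumably read a monotonic clock instead.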
|
|
|
|
|
|
-If a path is configured to be replicated to its parent, then the leader at that path will maintain
|
|
|
-a connection to the the leader in a parent cluster. Note that the parent cluster need not be
|
|
|
+If the attachment point is configured to be replicated to its parent, then the leader will maintain
|
|
|
+a connection to the leader in the parent cluster. Note that the parent cluster need not be
|
|
|
housed in the parent block, just at some ancestor block. Then, writes will be propagated through
|
|
|
this connection to the parent cluster, where this process may continue if that cluster is also
|
|
|
configured for replication. Distributed locking is similarly communicated to the parent cluster,
|
|
|
where the lock is only acquired with the parent's approval.
|
|
|
|
|
|
\section{Programmatic Access to Data}
|
|
|
-No designer can hope to envsion all the potential applications that a consumer would want to have
|
|
|
+No designer can hope to envision all the potential applications that a person would want to have
|
|
|
access to their data. That's why an important component of the system is the ability to run
|
|
|
programs that can access data and provide services to other internet hosts, whether they are
|
|
|
blocktree nodes or not. This is accomplished by providing a WebAssembly based execution
|