|
@@ -8,7 +8,7 @@
|
|
|
\title{Blocktree \\
|
|
|
\large A platform for distributed computing.}
|
|
|
\author{Matthew Carr}
|
|
|
-\date{May 28, 2022}
|
|
|
+%\date{May 28, 2022}
|
|
|
\begin{document}
|
|
|
\maketitle
|
|
|
\begin{abstract}
|
|
@@ -25,91 +25,116 @@ the details of backing up user data and implementing access controls to facilita
|
|
|
safe sharing. However, because these are closed systems, users are forced to trust that
|
|
|
the operators are benevolent, and they lack any real way of ensuring that the access
|
|
|
controls they prescribe will actually be enforced. There have been several systems proposed
|
|
|
-as alternatives to the conventional model, but these systems suffer from their own shortcomings.
|
|
|
-They either assume the need for cloud storage providers (Blockstack \cite{blockstack}) or implement
|
|
|
-all operations using a global blockchain, limiting performance (Filecoin \cite{filecoin}).
|
|
|
-Blocktree takes a different approach.
|
|
|
-
|
|
|
-The idea behind blocktree is to organize a user's computers into a cooperative unit, called a
|
|
|
-blocktree. The user is said to own the blocktree, and they wield sovereign authority over it.
|
|
|
-The artifact granting them this authority is the root private key of the blocktree. Measures for protecting
|
|
|
-this key and delegating its authority are important design considerations of the system.
|
|
|
+as alternatives to the conventional model \cite{blockstack, filecoin}, but these systems
|
|
|
+suffer from their own shortcomings which Blocktree aims to address.
|
|
|
+
|
|
|
+The idea behind blocktree is to organize a user's computing devices and data into a single
|
|
|
+tree, called a blocktree. The user is said to own the blocktree, and they wield sovereign authority
|
|
|
+over it.
|
|
|
+The artifacts granting them this authority are their private keys, one for use in an encryption
|
|
|
+scheme and the other in a signing scheme.
|
|
|
+Measures for managing these keys and delegating their authority are important design considerations
|
|
|
+of the system.
|
|
|
The owners of blocktrees are encouraged to collaborate with each other to replicate data by
|
|
|
-means of a cryptocurrency known as blockcoin. The blockchain implementing this cryptocurrency
|
|
|
-is the source of global state for the system, and allows for the creation of global paths.
|
|
|
-
|
|
|
-All data stored in blocktree is contained in units called blocks. As the name would suggest, the blocks
|
|
|
-in a blocktree form a tree. Each block has a path corresponding to its location in the tree. The first
|
|
|
-component of a fully qualified blocktree path is the fingerprint of the root public key of the
|
|
|
-blocktree. Thus a blocktree path can globally specify a block. If a block is not a leaf,
|
|
|
-then it is called a directory, and the data it contains is managed by the system.
|
|
|
-This information includes the list of blocks which are children of the directory as well as the
|
|
|
-list of nodes which are attached to the block tree at this directory. In addition
|
|
|
-to its payload of data,
|
|
|
-each block has metadata containing cryptographic access control mechanisms. These mechanisms ensure
|
|
|
+means of a cryptocurrency known as blockcoin.
|
|
|
+The blockchain implementing this cryptocurrency is the source of global state for the system,
|
|
|
+and allows for the creation of a name resolution mechanism.
|
|
|
+
|
|
|
+A blocktree consists of four different types of blocks: files, directories, servers and processes.
|
|
|
+Each block has a path corresponding to its location in the tree.
|
|
|
+A server is responsible for storing the files and directories that are children of the directory
|
|
|
+it's contained in.
|
|
|
+When multiple servers are contained in the same directory, they form a cluster, and run
|
|
|
+the Raft consensus protocol \cite{raft} to ensure consistency of the data they store.
|
|
|
+A process is either the child of a server, or the child of the process which started it.
|
|
|
+Processes operate under the actor model and exchange messages that are addressed using paths.
|
|
|
+System calls are implemented using this same messaging facility.
|
|
|
+
|
|
|
+The first component of a fully qualified path is the fingerprint of the blocktree owner's public
|
|
|
+signing key, allowing a block to be globally specified.
|
|
|
+The contents of directories are managed by the system,
|
|
|
+and they may contain links to files, other directories and servers.
|
|
|
+In addition to its payload of data, each file and directory has system managed metadata.
|
|
|
+These metadata are used to implement cryptographic access control mechanism which ensure
|
|
|
that only authorized users can read and optionally write to the block.
|
|
|
-Users and nodes in the blocktree system are identified by hashes of their public keys. These hashes
|
|
|
-are referred to as principals, and they are used for setting access control policy.
|
|
|
+Access control for a given security principal is set using a hash of the principal's public signing
|
|
|
+key.
|
|
|
|
|
|
This paper is intended to be a short introduction to the ideas of blocktree. A book is planned which
|
|
|
will specify the system in greater detail. In keeping with the agile software methodology, this
|
|
|
book is being written concurrently with an open source implementation of the system.
|
|
|
The remainder of this paper is organized as follows:
|
|
|
-\begin{itemize}
|
|
|
+\begin{enumerate}
|
|
|
\item A description of the operations of a single blocktree.
|
|
|
\item The definition of a blockchain which provides global state and links individual blocktrees
|
|
|
together.
|
|
|
\item The programming interface for interacting with blocktrees and sending messages.
|
|
|
\item An exploration of applications that could be written using this platform.
|
|
|
\item Conclusion.
|
|
|
-\end{itemize}
|
|
|
+\end{enumerate}
|
|
|
|
|
|
\section{An Individual Blocktree}
|
|
|
-The atomic unit of data storage, confidentiality and authenticity is the block. A
|
|
|
-block contains a payload of data. Confidentiality of this data is achieved by encrypting it using
|
|
|
-a symmetric cipher using a random key. This random key is known as the block key.
|
|
|
-The block key can be encapsulated using the public key of a principal whose is to be given access.
|
|
|
-The resulting ciphertext is stored in the metadata of the block. Thus
|
|
|
-the person possessing the corresponding private key will be able to access the contents of
|
|
|
-the block. Blocks are arranged into trees, and the parent block also has a block key.
|
|
|
-The child's block key is always encapsulated using the parent's key and stored in the block
|
|
|
-metadata. This ensures that when a principal is given read access to a block, it automatically has
|
|
|
-access to every child of that block. The encapsulated block key is known as a read capability,
|
|
|
-or readcap, as it grants the holder the ability to read the block. In the case of the root block
|
|
|
-there is no parent block key to issue a readcap to. Instead, the root block always contains a
|
|
|
-readcap for the the root key.
|
|
|
-
|
|
|
-A note on visualizing a blocktree. In this paper I will describe the root block as being at the
|
|
|
-bottom of the tree, and describe the descendants of a block as being above it. The reader may
|
|
|
-recognize this as the ``trees grow up" model of formal logic, and it was chosen because such trees
|
|
|
-appear visually similar to real world trees when rendered in a virtual environment.
|
|
|
-
|
|
|
-Authenticity guarantees are provided using a digital signature scheme. In order to change the
|
|
|
+Files and directories are collectively known as data blocks. Data blocks have three components:
|
|
|
+\begin{enumerate}
|
|
|
+ \item Their body, which is a sequence of bytes. This is managed by user code in the case of files
|
|
|
+ and by the system in the case of directories.
|
|
|
+ \item Their metadata, which is managed by the system.
|
|
|
+ \item The log of events which have been committed, or are in the process of being committed.
|
|
|
+ This is managed by the system.
|
|
|
+\end{enumerate}
|
|
|
+
|
|
|
+The body of a data block and its metadata are both covered by integrity protection.
|
|
|
+The log is not directory covered, but the events in it are.
|
|
|
+The body of the block may be optionally encrypted to provide it with confidentiality protection.
|
|
|
+
|
|
|
+Confidentiality of the body is achieved by encrypting it using a symmetric cipher using a random
|
|
|
+key.
|
|
|
+This random key is known as the block key.
|
|
|
+When a principal is to be given access to a data block, it's public encryption key is used to
|
|
|
+encrypt the block key.
|
|
|
+The resulting ciphertext is referred to as a read capability, or readcap, for short.
|
|
|
+It is stored in the metadata of the block in a dictionary under the key which is computed by hashing
|
|
|
+the principal's public signing key.
|
|
|
+We say that the principal has been \emph{issued} a readcap for the block.
|
|
|
+When the principal issued the readcap wishes to read the block it looks for a hash of its public
|
|
|
+signing key in the block's metadata, and if it find a value, uses it's private encryption key
|
|
|
+to decrypt the block key.
|
|
|
+
|
|
|
+If a block is not the root, then its parent's block key is used to encrypt its block key and the
|
|
|
+result is stored in the block's metadata.
|
|
|
+This enables anyone with a readcap for a block to read all blocks which are descended from it.
|
|
|
+This also enables the contents of any block to be decrypted using only a single private key
|
|
|
+decryption operation.
|
|
|
+The root always contains a readcap for the owner, ensuring they can grant a readcap to any block
|
|
|
+in their blocktree.
|
|
|
+
|
|
|
+Integrity protection is provided using a digital signature scheme. In order to change the
|
|
|
contents of a block a data structure called a write capability, or writecap
|
|
|
\footnote{The names readcap and writecap were taken from the Tahoe Least-Authority Filesystem
|
|
|
\cite{tahoe}. The access control mechanism described in the Tahoe system heavily influenced the
|
|
|
design of Blocktree.},
|
|
|
-is needed. A
|
|
|
-writecap is approximately an x509 certificate chain. A writecap contains the following data:
|
|
|
+is used.
|
|
|
+A writecap is a certificate chain and it contains the following data:
|
|
|
\begin{itemize}
|
|
|
\item The path the writecap can be used under.
|
|
|
\item The principal that the writecap was issued to.
|
|
|
\item The timestamp when the writecap expires.
|
|
|
\item The public key of the principal who issued the writecap.
|
|
|
-\item A digital signature produced by the private key corresponding to the above public key.
|
|
|
-\item Optionally, the next writecap in the chain.
|
|
|
+\item A digital signature produced by the private key corresponding to the public key above.
|
|
|
+\item Optionally, the next certificate in the chain.
|
|
|
\end{itemize}
|
|
|
The last item is only excluded in the case of a self-signed writecap, i.e. one that was signed by
|
|
|
the same principal it was issued to. A writecap is considered valid for use in a block if all
|
|
|
of the following conditions are met:
|
|
|
\begin{itemize}
|
|
|
-\item The signature on every writecap in the chain is valid.
|
|
|
-\item The signing principal matches the principal the next writecap was issued to for every
|
|
|
-writecap in the chain.
|
|
|
-\item The path of the block is contained in the path of every writecap in the chain.
|
|
|
-\item The current timestamp is strictly less than the expiration of all the writecaps in the
|
|
|
-chain.
|
|
|
-\item The principal corresponding to the public key used to sign the last writecap in the chain,
|
|
|
+\item The signature on every certificate in the chain is valid.
|
|
|
+\item The signing principal matches the principal the next certificate was issued to for every
|
|
|
+certificate in the chain.
|
|
|
+\item The path in every certificate is contained in the path of the next certificate for each
|
|
|
+certificate.
|
|
|
+\item The path of the block is contained in the path of every certificate in the chain.
|
|
|
+\item The current timestamp is strictly less than the expiration of all the certificates.
|
|
|
+\item The principal corresponding to the public key used to sign the last certificate,
|
|
|
is the owner of the blocktree.
|
|
|
\end{itemize}
|
|
|
The intuition behind these rules is that a writecap is only valid if there is a chain of trust
|
|
@@ -117,55 +142,55 @@ that leads back to the owner of the blocktree. The owner may delegate their trus
|
|
|
of intermediaries by issuing them writecaps. These writecaps are scoped based on the path
|
|
|
specified when they are issued. These intermediaries can then delegate this trust as well.
|
|
|
A block is considered valid if it contains a valid writecap, it was signed using the key
|
|
|
-corresponding to the first writecap's public key, and this signature is valid. Note that because
|
|
|
+corresponding to the first certificate's public key, and this signature is valid. Note that because
|
|
|
the first component of the path is the fingerprint
|
|
|
-\footnote{By \emph{fingerprint} I mean the base64url encoding of the root principal, which is
|
|
|
-itself a hash of the root public key.}
|
|
|
-of the root public key, and the path is contained
|
|
|
-in the block, a block can be verified using only information contained within it.
|
|
|
+\footnote{By \emph{fingerprint} I mean the base64url encoding of a hash of the owner's public
|
|
|
+signing key.}
|
|
|
+of the owners public signing key, anyone who receives a block and knowns the block's path can verify
|
|
|
+that the block has not been altered.
|
|
|
|
|
|
Blocks are used for more than just organizing data, they also organize computation. A program
|
|
|
-participating in a blocktree is referred to as a node. Multiple nodes may be run on
|
|
|
-a single computer. Every node is attached to the blocktree at a specific path. This information
|
|
|
-is recorded in the directory where the node is attached. A node is responsible for the storage of
|
|
|
-the directory where it is attached and all of the blocks that are recursively contained with in it.
|
|
|
-If there is a child node attached to a subdirectory contained in the directory the node is
|
|
|
-responsible for, then the child node is responsible for the subdirectory.
|
|
|
+participating in a blocktree is referred to as a server. Multiple servers may be run on
|
|
|
+a single computer. Every server is contained in a directory in the blocktree.
|
|
|
+A server is responsible for the storage of
|
|
|
+the directory where it is attached and all of the blocks that are recursively contained within it.
|
|
|
+If there is a child server attached to a subdirectory contained in the directory the server is
|
|
|
+responsible for, then the child server is responsible for the subdirectory.
|
|
|
In this way data storage can be delegated, allowing the system to scale. When more than one
|
|
|
-node is attached to the same directory they form a cluster.
|
|
|
-Each node in the cluster contains a copy of the data that the cluster is responsible for. They
|
|
|
+server is attached to the same directory they form a cluster.
|
|
|
+Each server in the cluster contains a copy of the data that the cluster is responsible for. They
|
|
|
maintain consistency of this data by running the Raft \cite{raft} consensus protocol.
|
|
|
|
|
|
-When a new blocktree is created a node generates a key pair to serve as the root keys.
|
|
|
+When a new blocktree is created a server generates a key pair to serve as the root keys.
|
|
|
It is imperative for the security of the system that the root private key is protected, and it
|
|
|
is highly recommended that it be stored in a Trusted Platform Module (TPM) \cite{tpm} and that
|
|
|
the TPM
|
|
|
-be configured to disallow unauthenticated use of this key. The node then generates its own key pair
|
|
|
+be configured to disallow unauthenticated use of this key. The server then generates its own key pair
|
|
|
and uses the root private key to issue itself a writecap for the root of the tree. Once it has
|
|
|
this writecap, it creates the root block and generates a block key for it. A readcap is added to
|
|
|
-this block for the root public key and the node's public key. Additional
|
|
|
-cryptographic operations are performed using the node's key pair, and only when a new writecap
|
|
|
-needs to be created for an addition root node is the root private key used.
|
|
|
+this block for the root public key and the server's public key. Additional
|
|
|
+cryptographic operations are performed using the server's key pair, and only when a new writecap
|
|
|
+needs to be created for an addition root server is the root private key used.
|
|
|
|
|
|
-When a new node comes online and wishes to join the blocktree, it generates its own key pair.
|
|
|
+When a new server comes online and wishes to join the blocktree, it generates its own key pair.
|
|
|
The public key of this
|
|
|
-node then needs to be transmitted to another node that's already part of the user's blocktree. The
|
|
|
-mechanism used will depend on the nature of the device on which the new node is running.
|
|
|
+server then needs to be transmitted to another server that's already part of the user's blocktree. The
|
|
|
+mechanism used will depend on the nature of the device on which the new server is running.
|
|
|
For example, a phone could scan a QR code which contains
|
|
|
-the IP address of the user's root node, and then transmit its public key to that internet host.
|
|
|
-In order for the new node to be added to the user's blocktree, it needs to be issued a writecap
|
|
|
+the IP address of the user's root server, and then transmit its public key to that internet host.
|
|
|
+In order for the new server to be added to the user's blocktree, it needs to be issued a writecap
|
|
|
and a readcap must be added to the directory where it will be attached.
|
|
|
This could be accomplished
|
|
|
-by providing a user interface on the node which received the public key from the new node.
|
|
|
-This interface would show the requests that have been received from nodes attempting
|
|
|
+by providing a user interface on the server which received the public key from the new server.
|
|
|
+This interface would show the requests that have been received from servers attempting
|
|
|
to join the blocktree. The user can then choose to approve or deny the request, and can specify
|
|
|
-the path where the new node will attach. If the user chooses to approve the request, then the writecap
|
|
|
-is signed using the node's key and transmitted to then new node.
|
|
|
+the path where the new server will attach. If the user chooses to approve the request, then the writecap
|
|
|
+is signed using the server's key and transmitted to then new server.
|
|
|
|
|
|
The ability to cope with key compromise is an important design consideration in any real-world
|
|
|
-cryptosystem. In blocktree the compromise of a node key is handled by re-keying every block under
|
|
|
-the directory where the node was attached. Specifically, this means that a new block key is generated for
|
|
|
-each block, and the readcap for the compromised node is removed. This ensures that new writes to
|
|
|
+cryptosystem. In blocktree the compromise of a server key is handled by re-keying every block under
|
|
|
+the directory where the server was attached. Specifically, this means that a new block key is generated for
|
|
|
+each block, and the readcap for the compromised server is removed. This ensures that new writes to
|
|
|
these blocks will not be visible to the holder of the compromised key. To ensure that writes
|
|
|
will not be allowed, the root directory contains a revocation list containing the public keys
|
|
|
which have been revoked by the tree. These only need to be maintained until their writecaps
|
|
@@ -174,72 +199,72 @@ then the blocktree must be abandoned, there is no recovery. This is real securit
|
|
|
which grants control over the blocktree is the root private key which is why storing the root
|
|
|
private key in multiple secure cryptographic co-processors is so important.
|
|
|
|
|
|
-Nodes in a blocktree communicate with one another by sending blocktree messages. These messages
|
|
|
+servers in a blocktree communicate with one another by sending blocktree messages. These messages
|
|
|
are used for implementing consensus and distributed locking,
|
|
|
-as well as notifying parent nodes when a
|
|
|
+as well as notifying parent servers when a
|
|
|
child has written new data. The later mechanism allows the parent to replicate the data stored in
|
|
|
a child for redundancy. User code is also able to initiate the sending of messages. Messages are
|
|
|
-addressed using blocktree paths. When a node receives a message that is not addressed to it,
|
|
|
-but is addressed to its blocktree, it forwards it to the closest node to the recipient that it
|
|
|
-is connected to. In order to enable efficient low-latency message transfers, nodes maintain open
|
|
|
-TCP connections to the other nodes in their cluster, and the cluster leader maintains a connection to
|
|
|
+addressed using blocktree paths. When a server receives a message that is not addressed to it,
|
|
|
+but is addressed to its blocktree, it forwards it to the closest server to the recipient that it
|
|
|
+is connected to. In order to enable efficient low-latency message transfers, servers maintain open
|
|
|
+TCP connections to the other servers in their cluster, and the cluster leader maintains a connection to
|
|
|
its parent. Diffie-Hellman key exchange is used to exchange a key for use in an AEAD cipher, and
|
|
|
-once this cipher context is established, the two nodes mutually authenticate each other using their
|
|
|
-respective key pairs. When a node comes online, it uses the global blocktree (described later)
|
|
|
-to find the other nodes in its cluster. If it is not part of a cluster, or this information is
|
|
|
+once this cipher context is established, the two servers mutually authenticate each other using their
|
|
|
+respective key pairs. When a server comes online, it uses the global blocktree (described later)
|
|
|
+to find the other servers in its cluster. If it is not part of a cluster, or this information is
|
|
|
not stored in the global blocktree, then it instead looks up the
|
|
|
-IP address of a root node and connects to it. The root node may direct the node to connect to
|
|
|
-one of root's children, and this process repeats until the new node is connected to its parent.
|
|
|
+IP address of a root server and connects to it. The root server may direct the server to connect to
|
|
|
+one of root's children, and this process repeats until the new server is connected to its parent.
|
|
|
|
|
|
A concept that has proven to be very useful in the world of filesystems is the symbolic link.
|
|
|
This is a short file that contains the path to another file, and is interpreted by most programs
|
|
|
as being a "link" to that file. Blocktree supports a similar system, where a block can be
|
|
|
marked as a symbolic link and a blocktree path placed in its body. This also provides us with
|
|
|
-a convenient way of storing readcaps for data that a node would otherwise not have access to.
|
|
|
+a convenient way of storing readcaps for data that a server would otherwise not have access to.
|
|
|
For instance a symbolic link could be created which points to a block in another user's blocktree.
|
|
|
The other user only knows the public key of the owner of our blocktree, so they issue
|
|
|
-a readcap to it. But the root nodes can open this readcap and extract
|
|
|
-the block key. This key can then be encapsulated using the public key of the node which
|
|
|
-requires access, and placed in the symbolic link. When the node needs to read the data
|
|
|
+a readcap to it. But the root servers can open this readcap and extract
|
|
|
+the block key. This key can then be encapsulated using the public key of the server which
|
|
|
+requires access, and placed in the symbolic link. When the server needs to read the data
|
|
|
in the block, it opens the readcap in the symbolic link, follows the link to the block (how
|
|
|
that actually happens will be discussed below) and decrypts its contents.
|
|
|
|
|
|
While the consistency of individual blocks can be maintained using Raft, a distributed locking
|
|
|
mechanism is employed to enable transactions which span multiple blocks. This is
|
|
|
-accomplished by exploiting the hierarchical arrangement of nodes in the tree. In order to describe
|
|
|
-this, its first helpful to define a new term. The \emph{nodetree} of a blocktree is tree obtained
|
|
|
-from the blocktree by collapsing all the blocks that a node (or cluster of nodes) is responsible
|
|
|
-for into a single logical block representing the node itself. Thus we can talk about a node having
|
|
|
-a parent, and by this we mean its parent in the nodetree. In terms of the blocktree, the parent
|
|
|
-of a node is the first node encountered when the path back to the root is traversed.
|
|
|
+accomplished by exploiting the hierarchical arrangement of servers in the tree. In order to describe
|
|
|
+this, its first helpful to define a new term. The \emph{servertree} of a blocktree is tree obtained
|
|
|
+from the blocktree by collapsing all the blocks that a server (or cluster of servers) is responsible
|
|
|
+for into a single logical block representing the server itself. Thus we can talk about a server having
|
|
|
+a parent, and by this we mean its parent in the servertree. In terms of the blocktree, the parent
|
|
|
+of a server is the first server encountered when the path back to the root is traversed.
|
|
|
Now, distributed locking works as follows:
|
|
|
\begin{itemize}
|
|
|
-\item A node sends a request to lock a block to the current consensus leader in its cluster.
|
|
|
- If the node is not part of a cluster, then it is the leader. This request contains a timestamp
|
|
|
+\item A server sends a request to lock a block to the current consensus leader in its cluster.
|
|
|
+ If the server is not part of a cluster, then it is the leader. This request contains a timestamp
|
|
|
for when the lock expires.
|
|
|
\item If the leader is responsible for the block then it moves on to the next step. Otherwise
|
|
|
it contacts its parent and forwards the lock request and this step is repeated for the parent.
|
|
|
-\item The responsible node checks to see if there is already a lock for this block. If there is
|
|
|
+\item The responsible server checks to see if there is already a lock for this block. If there is
|
|
|
then the request fails. Otherwise the request succeeds and a lock is placed on the block. A
|
|
|
- message indicating the result is then passed back up the tree ending at the original node. This
|
|
|
- message includes the principal of the node enforcing the lock.
|
|
|
-\item Once the locking node is done making its updates it sends a message directly to the node
|
|
|
+ message indicating the result is then passed back up the tree ending at the original server. This
|
|
|
+ message includes the principal of the server enforcing the lock.
|
|
|
+\item Once the locking server is done making its updates it sends a message directly to the server
|
|
|
enforcing the lock, causing it to be removed.
|
|
|
\end{itemize}
|
|
|
Locking a block locks the subtree rooted at that block. Thus no writes to any path contained in
|
|
|
-the path of the locked block will be allowed, unless they come from locking node. If the locking
|
|
|
-node does not send the message
|
|
|
+the path of the locked block will be allowed, unless they come from locking server. If the locking
|
|
|
+server does not send the message
|
|
|
unlocking the block before the lock expires, then the modifications which had been performed by
|
|
|
-it are dropped and the block reverts to its prior state. Since the locking node is the leader
|
|
|
+it are dropped and the block reverts to its prior state. Since the locking server is the leader
|
|
|
of the consensus cluster that is responsible for the block, this guarantees that
|
|
|
-writes from other nodes will not be accepted.
|
|
|
+writes from other servers will not be accepted.
|
|
|
|
|
|
\section{Connecting Blocktrees}
|
|
|
-In order to allow nodes to access blocks in other blocktrees, a global ledger of events is used.
|
|
|
+In order to allow servers to access blocks in other blocktrees, a global ledger of events is used.
|
|
|
This ledger is implemented using a proof of work (PoW) blockchain and a corresponding
|
|
|
-cryptocurrency known as blockcoin. Nodes mine chain blocks (not to be confused with the tree
|
|
|
+cryptocurrency known as blockcoin. Servers mine chain blocks (not to be confused with the tree
|
|
|
blocks we've been discussing up till now) in the same way they do in other PoW blockchain
|
|
|
-systems such as Bitcoin \cite{bitcoin}. The node which manages to mine the next chain block receives
|
|
|
+systems such as Bitcoin \cite{bitcoin}. The server which manages to mine the next chain block receives
|
|
|
a reward,
|
|
|
which is the sum of the fees for each event in the chain and a variable amount of newly minted
|
|
|
blockcoin. The amount of new blockcoin created by a chain block is directly proportional to the
|
|
@@ -247,7 +272,7 @@ amount of data storage events contained in the chain block. Thus the total amoun
|
|
|
in circulation has a direct relationship to the amount of data stored in the system, reflecting
|
|
|
the fact that blockcoin exists to provide an accounting mechanism for data.
|
|
|
|
|
|
-When a node writes data to a tree block, and it wishes this block to be globally accessible or
|
|
|
+When a server writes data to a tree block, and it wishes this block to be globally accessible or
|
|
|
replicated for redundancy,
|
|
|
it produces what are called fragments. Fragments are the output symbols of the RaptorQ code
|
|
|
\cite{raptorq}. This algorithm is an example of an Erasure Code, which is a class of fountain codes
|
|
@@ -256,41 +281,41 @@ the original data. Such a code ensures that even if some of the fragments are lo
|
|
|
remain, the original data can be recovered.
|
|
|
|
|
|
Once these fragments have been computed an event is created for each one and published to the
|
|
|
-blockchain. This event indicates to other nodes that this node wishes to store a fragment and
|
|
|
+blockchain. This event indicates to other servers that this server wishes to store a fragment and
|
|
|
states the amount of blockcoin it will pay for each of the fragment's maintenance payments. When
|
|
|
-another nodes wishes to accept the offer, it directly contacts the first node, which then sends
|
|
|
+another servers wishes to accept the offer, it directly contacts the first server, which then sends
|
|
|
it the fragment and publishes an event stating that the fragment is stored with the second
|
|
|
-node. This event includes the path of the block the fragment was computed from, the fragment's
|
|
|
-ID (the sequence number from the erasure code), and the principal of the node which stored it.
|
|
|
-Thus any other node in the network can use the information contained in this event to
|
|
|
-determine the set of nodes which contain the fragments of any given path.
|
|
|
-
|
|
|
-In order for the node which stored a fragment to receive its next payment, it has to pass
|
|
|
-a time-bound challenge-response protocol initiated by the node that owns the fragment.
|
|
|
-The owning node select a leaf in the Merkle tree of the fragment and sends the index of
|
|
|
-this leaf to the storing node. The storing node then walks the path from this leaf back to
|
|
|
+server. This event includes the path of the block the fragment was computed from, the fragment's
|
|
|
+ID (the sequence number from the erasure code), and the principal of the server which stored it.
|
|
|
+Thus any other server in the network can use the information contained in this event to
|
|
|
+determine the set of servers which contain the fragments of any given path.
|
|
|
+
|
|
|
+In order for the server which stored a fragment to receive its next payment, it has to pass
|
|
|
+a time-bound challenge-response protocol initiated by the server that owns the fragment.
|
|
|
+The owning server select a leaf in the Merkle tree of the fragment and sends the index of
|
|
|
+this leaf to the storing server. The storing server then walks the path from this leaf back to
|
|
|
the root of the Merkle tree, and updates a hash value using the data in each node it traverses.
|
|
|
-It sends this result back to the owning node which verifies that this value matches its
|
|
|
-own computation. If it does then the owning node signs a message indicating that the challenge
|
|
|
-passed and that the storing node should be paid. The storing node receives this message and uses
|
|
|
+It sends this result back to the owning server which verifies that this value matches its
|
|
|
+own computation. If it does, then the owning server signs a message indicating that the challenge
|
|
|
+passed and that the storing server should be paid. The storing server receives this message and uses
|
|
|
it to construct an event, which it signs and publishes to the blocktree. This event causes
|
|
|
-the blockcoin amount specified to be transferred from the owning node's to the storing
|
|
|
-node's blocktree.
|
|
|
+the blockcoin amount specified to be transferred from the owning server's to the storing
|
|
|
+server's account.
|
|
|
|
|
|
-The fact that payments occur over time provides a simple incentive for nodes to be honest and
|
|
|
-store the data they agree to. In banking terms, the storing node views the fragment as an
|
|
|
+The fact that payments occur over time provides a simple incentive for servers to be honest and
|
|
|
+store the data they agree to. In banking terms, the storing server views the fragment as an
|
|
|
asset, it is a loan of its disk space which provides a series of payments over time.
|
|
|
-On the other hand the owning node views the fragment as a liability, it requires payments to
|
|
|
+On the other hand the owning server views the fragment as a liability, it requires payments to
|
|
|
be made over time. In order for a blocktree owner to remain solvent, it must balance its
|
|
|
liabilities with its assets, incentivizing it to store data for others so that its own data
|
|
|
will be stored.
|
|
|
|
|
|
-In order for nodes to be able to contact other nodes, a mechanism is required for associating
|
|
|
-an internet protocol (IP) address with a principal. This is done by having nodes publish events
|
|
|
+In order for servers to be able to contact other servers, a mechanism is required for associating
|
|
|
+an internet protocol (IP) address with a principal. This is done by having servers publish events
|
|
|
to the blockchain when their IP address changes. This event includes their new IP address,
|
|
|
-their public key, and a digital signature computed using their private key. Other nodes can
|
|
|
+their public key, and a digital signature computed using their private key. Other servers can
|
|
|
then verify this signature to ensure that an attacker cannot bind the wrong
|
|
|
-IP address to a principal in order to receive messages not meant for it.
|
|
|
+IP address to a principal in order to receive messages meant for it.
|
|
|
|
|
|
While this event ledger is useful for appending new
|
|
|
events, and ensuring previous events
|
|
@@ -298,78 +323,82 @@ cannot be changed, another data structure is required to enable efficient querie
|
|
|
In particular, it's important to be able to quickly perform the
|
|
|
following queries:
|
|
|
\begin{itemize}
|
|
|
-\item Find the set of nodes storing the fragments for a given path.
|
|
|
-\item Find the IP address of a node or owner given a principal.
|
|
|
+\item Find the set of servers storing the fragments for a given path.
|
|
|
+\item Find the IP address of a server or owner given a principal.
|
|
|
\item Find the public key associated with a principal.
|
|
|
\end{itemize}
|
|
|
-To enable these queries a special blocktree is maintained by each node in the network: the global
|
|
|
+To enable these queries a special blocktree is maintained by each server in the network: the global
|
|
|
blocktree. This tree does not support the usual writing and locking semantics of local blocktrees.
|
|
|
In functional programming terms, it can be thought of as a left fold over all of the events in the
|
|
|
-blockchain.
|
|
|
+blockchain starting from the empty state.
|
|
|
The above queries are facilitated by the following blocks:
|
|
|
\begin{itemize}
|
|
|
\item \emph{/global/fragments}: this block contains a hashtable where the key is a path and the
|
|
|
-value is the list of nodes storing the fragments for the block at that path.
|
|
|
+value is the list of servers storing the fragments for the block at that path.
|
|
|
\item \emph{/global/principals}: contains a hashtable where the key is the a principal and the value
|
|
|
is the tuple containing the public key of that principal, its current IP address, and its current
|
|
|
blockcoin balance.
|
|
|
\end{itemize}
|
|
|
-To compute the entries in these tree blocks, the nodes in the network iterate over all the chain blocks, updating
|
|
|
-their local copy of each tree block appropriately. The experienced reader will recognize that this is an event
|
|
|
+To compute the entries in these tree blocks, the servers in the network iterate over all the chain blocks, updating
|
|
|
+their local copy of each tree block appropriately. The reader may recognize this as an event
|
|
|
sourced architecture. Currently only these two tree blocks are known to be needed, but if new events are
|
|
|
added to the system it can be easily extended to enable queries that have yet to be envisioned.
|
|
|
|
|
|
\section{Programming Environment}
|
|
|
Enabling an excellent developer experience is one of the primary goals of this system (the others being security
|
|
|
-and scalability). Nodes execute user code that has been compiled into WebAssembly modules \cite{wasm}. Such code
|
|
|
-running on a blocktree node is referred to as an "app". An app
|
|
|
-executes in a sandbox that isolates it from other code, as well as the security critical operations of the node
|
|
|
-itself. The sandbox provides the code with an extension of the WebAssembly System Interface (WASI), with extra
|
|
|
-system calls to interact with the particulars of the blocktree system.
|
|
|
-The extra system calls fall into these categories:
|
|
|
+and performance). servers execute user code that has been compiled into WebAssembly modules \cite{wasm}. Such code
|
|
|
+running on a blocktree server is referred to as a process. A process
|
|
|
+executes in a sandbox that isolates it from other processes, as well as the security critical operations of the server
|
|
|
+itself. The sandbox provides the code with an extension of the WebAssembly System Interface (WASI),
|
|
|
+with extra functions to send and receive blocktree messages.
|
|
|
+The standard POSIX filesystem APIs are used to interact with the contents of blocktrees.
|
|
|
+For instance, a file descriptor for a block can be obtained by calling path\_open.
|
|
|
+Additional non-POSIX functionality is implemented by adding
|
|
|
+messages which are handled by the server. This functionality includes the following:
|
|
|
\begin{itemize}
|
|
|
\item Distributed Locking
|
|
|
\item Messaging
|
|
|
\item Supervision Trees
|
|
|
\item Protocol Contracts (Session Types)
|
|
|
\end{itemize}
|
|
|
-The standard WASI filesystem APIs are used
|
|
|
-to interact with the contents of blocktrees. For instance a file descriptor for a block can be obtained
|
|
|
-by calling path\_open. Writes and reads of blocks are performed using the privileges of the node on which
|
|
|
-the app is running, but the node may limit the app's access depending on its permissions.
|
|
|
-
|
|
|
-When an app is installed it is given a directory under which it can store data that is shared between all nodes
|
|
|
-in the blocktree. The path of this block is formed by prefixing the path the app was published at
|
|
|
-with the string ``/apps". When an app is runs on a node, it is confined to a block contained
|
|
|
-in the node's directory. It is only allowed read and write blocks in this block, but to allow it to
|
|
|
-access shared data, a symbolic link is created to the app's shared directory in ``/apps".
|
|
|
-
|
|
|
-% App Publishing
|
|
|
-Apps are published by writing them into a blocktree. The path of the directory used to publish an app is used to
|
|
|
-identify it. Only one app per directory is allowed. This directory contains the WebAssembly module itself as
|
|
|
-well as a JSON manifest. This manifest defines the app's user-friendly name as well as the list
|
|
|
-of permissions it requires. This list of permissions is used to determine which APIs the app has access to.
|
|
|
+This additional functionality is described later in this section.
|
|
|
+
|
|
|
+% Package Publishing
|
|
|
+These modules are distributed in packages stored in a blocktree. A package contains one or more
|
|
|
+Wasm modules and a TOML manifest file which describes the package.
|
|
|
+This manifest defines the package's user-friendly name as well as the list
|
|
|
+of permissions it requires.
|
|
|
+This list of permissions is used to determine which APIs the package has access to.
|
|
|
+These artifacts are then
|
|
|
+placed in a zip file and stored in a blocktree file. This ensures the integrity of the code in the
|
|
|
+package, as all blocks in a blocktree are integrity protected.
|
|
|
+
|
|
|
+When a package is installed it's given a directory under which it can store data that is shared between all servers
|
|
|
+in a blocktree. The path of this block is formed by prefixing the path the package was published at
|
|
|
+with the string ``/apps". When a package is runs on a server, it is confined to a block contained
|
|
|
+in the server's directory. It is only allowed read and write blocks in this block, but to allow it to
|
|
|
+access shared data, a symbolic link is created to the package's shared directory in ``/apps".
|
|
|
|
|
|
% Privacy Safe vs Unsafe
|
|
|
-Apps are broken into two categories: those that are privacy safe and those that are not.
|
|
|
-An app is privacy unsafe if it requests any permissions which allow it to send data
|
|
|
+Packages are broken into two categories: those that are privacy safe and those that are not.
|
|
|
+A package is privacy unsafe if it requests any permissions which allow it to send data
|
|
|
outside of the blocktree it's part of. Thus requesting the ability to
|
|
|
-open a TCP socket would cause an app to be privacy unsafe. Similarly, the creation of a protocol
|
|
|
-handler for HTTP would also be privacy unsafe. Privacy unsafe apps can limit the scope of their
|
|
|
+open a TCP socket would cause a package to be privacy unsafe. Similarly, the creation of a protocol
|
|
|
+handler for HTTP would also be privacy unsafe. Privacy unsafe packages can limit the scope of their
|
|
|
unsafety by
|
|
|
-imposing limits on the unsafe APIs that they request. For instance an app which needs to send
|
|
|
+imposing limits on the unsafe APIs that they request. For instance a package which needs to send
|
|
|
blocktree message back to the blocktree it was published in can request the messaging permission
|
|
|
-for a path in this tree. Similarly, an app which only wants to open a TCP socket listening on the
|
|
|
+for a path in that tree. Similarly, a package which only wants to open a TCP socket listening on the
|
|
|
local network, can limit the scope of its requested permission.
|
|
|
|
|
|
% Protocol Contracts
|
|
|
-In order to make it easy to write apps which interface with existing systems, most notably
|
|
|
-those using HTTP, apps are able to define protocols using a contract language and then register
|
|
|
-call backs for these protocols. This works by providing a system call which the app can
|
|
|
-supply a protocol contract to. This contract is then compiled into a state machine by the node
|
|
|
-and a handle is returned to the app. This handle is then used to register callbacks for different
|
|
|
-parts of the protocol. This ensures that the protocol is handled by the node itself and that
|
|
|
-protocols can be shared between many different apps as a library. For instance, the HTTP protocol
|
|
|
+In order to make it easy to write packages which interface with existing systems, most notably
|
|
|
+those using HTTP, packages are able to define protocols using a contract language and then register
|
|
|
+call backs for these protocols. This works by providing a system call which the package can
|
|
|
+supply a protocol contract to. This contract is then compiled into a state machine by the server
|
|
|
+and a handle is returned to the package. This handle is then used to register callbacks for different
|
|
|
+parts of the protocol. This ensures that the protocol is handled by the server itself and that
|
|
|
+protocols can be shared between many different packages as a library. For instance, the HTTP protocol
|
|
|
would be compiled to a particularly simply state machine, with only one state: the listening state.
|
|
|
This state would expose a hook that where a callback can be registered to handle a request.
|
|
|
This state also defines the record format used to pass the request information to the callback
|
|
@@ -377,30 +406,30 @@ and the record format of the return value that is expected in order to produce t
|
|
|
More complicated (stateful) protocols would have more states, each defining their own request and
|
|
|
response records, as well as hooks. One nice thing about this setup is that it will enable
|
|
|
optimizations where the state machine and the user callbacks can be compiled into a program
|
|
|
-which can be safely run in the node itself, or even in a SmartNIC. This would require that
|
|
|
+which can be safely run in the server itself, or even in a SmartNIC. This would require that
|
|
|
the callbacks only use an approved set of APIs, but could enable much higher performance.
|
|
|
|
|
|
% Supervision Trees
|
|
|
-Apps can arrange themselves into supervision trees, in the same way that Erlang
|
|
|
-processes are arranged \cite{armstrong}. In this scheme, when a child app crashes, or the node its
|
|
|
-running on dies (which is detected by other nodes), then the app receives a message.
|
|
|
+Processes can arrange themselves into supervision trees, in the same way that Erlang
|
|
|
+processes are arranged \cite{armstrong}. In this scheme, when a child package crashes, or the server its
|
|
|
+running on dies (which is detected by other servers), then the process receives a message.
|
|
|
In the
|
|
|
-simplest case this can be used to implement a logging system, where crashes and node
|
|
|
+simplest case this can be used to implement a logging system, where crashes and server
|
|
|
deaths are recorded. More interestingly, this can be used to integrate with a control
|
|
|
plane. For instance, if a blocktree were running in AWS, when a message is received indicating
|
|
|
-that a node has died, a new EC2 instance could be started to replace it. The reliability of Erlang
|
|
|
+that a server has died, a new EC2 instance could be started to replace it. The reliability of Erlang
|
|
|
and other system employing the Actor Model have shown the robustness of this approach.
|
|
|
-Apps can form this relationship after they've started running, provided that both of the apps
|
|
|
-have permission to send messages to each other. Apps, with the appropriate permissions, can also
|
|
|
-spawn other apps on descendent nodes. This can be used to implement map-reduce workloads, where
|
|
|
-an app spawns mapping jobs on descendent nodes containing the data of interest, and it processes
|
|
|
+Processes can form this relationship after they've started running, provided that both of the processes
|
|
|
+have permission to send messages to each other. Processes, with the appropriate permissions, can also
|
|
|
+spawn other processes on descendent servers. This can be used to implement map-reduce workloads, where
|
|
|
+a process spawns mapping jobs on descendent servers containing the data of interest, and it processes
|
|
|
their messages to compute the final reduction. Due to the tiny size of most programs, this is a much
|
|
|
-more efficient approach than moving the data to the nodes performing the computation.
|
|
|
+more efficient approach than moving the data.
|
|
|
|
|
|
-\section{A Brave New Web}
|
|
|
+\section{Potential Applications}
|
|
|
In order to explore how blocktree can be used, the design of several hypothetical systems
|
|
|
is discussed. It's important to note that blocktree does not try to force all computation
|
|
|
-to be local to a user's device, it merely tries to enable this for applications where it
|
|
|
+to be local to a user's device, but it tries to enable this for applications where it
|
|
|
is possible.
|
|
|
|
|
|
\subsection{Contacts and Mail}
|
|
@@ -415,7 +444,7 @@ it would be easy to fool the user into associating an attacker's key for the per
|
|
|
|
|
|
The user now has a way of associating a blocktree with the name of this person. However, the
|
|
|
root public key of this blocktree is not enough to establish secure communications, because
|
|
|
-the root private key is not available to every node in the person's blocktree. In particular
|
|
|
+the root private key is not available to every server in the person's blocktree. In particular
|
|
|
it would be inadvisable for the root private key to be stored on a user's mobile device. To
|
|
|
address this mailbox directories are created.
|
|
|
|
|
@@ -423,11 +452,11 @@ For each contact two directories are created: the inbox and the outbox. The user
|
|
|
for another user's root key and adds it to their outbox. The inbox for the other user is a symbolic
|
|
|
link to the user's outbox in the blocktree of the other user. Thus each user
|
|
|
can write messages into their own blocktree at a location where the other party knows how
|
|
|
-to find them. But, in order for a node to read these messages it requires a its own readcap.
|
|
|
+to find them. But, in order for a server to read these messages it requires a its own readcap.
|
|
|
Only the root
|
|
|
-nodes can issue this readcap as only they have access to the root key. Once permission has been
|
|
|
-granted to a node, a root node can use the root key to decrypt the readcap issued to it, and then
|
|
|
-encrypt it using the public key of the node. The resulting readcap is then stored in the metadata
|
|
|
+servers can issue this readcap as only they have access to the root key. Once permission has been
|
|
|
+granted to a server, a root server can use the root key to decrypt the readcap issued to it, and then
|
|
|
+encrypt it using the public key of the server. The resulting readcap is then stored in the metadata
|
|
|
of the inbox.
|
|
|
|
|
|
In addition to being able to check the inbox for mail, a blocktree message is sent to the receiving
|
|
@@ -453,7 +482,7 @@ is a symbolic link to a single status update block. This symbolic link contains
|
|
|
readcap for the status update block.
|
|
|
|
|
|
Comments and likes on status updates are implemented by sending mail to the user who posted the
|
|
|
-update. When one of the user's nodes receives this mail, it then updates the block containing
|
|
|
+update. When one of the user's servers receives this mail, it then updates the block containing
|
|
|
the status update with the new comment, or increments the like counter.
|
|
|
It then sends mail to the people this status update was shared with, notifying them
|
|
|
that the data has changed.
|
|
@@ -467,14 +496,14 @@ inventory even when their internet service goes down, and that these updates nee
|
|
|
on the website once connectivity is restored.
|
|
|
|
|
|
To accomplish this a designer could create a directory for each warehouse. This directory would have
|
|
|
-nodes attached to it that are physically located at each warehouse. The inventory of the warehouse
|
|
|
+servers attached to it that are physically located at each warehouse. The inventory of the warehouse
|
|
|
is then maintained in the warehouse's directory, satisfying the first requirement. Now, in order to
|
|
|
enable efficient queries of the overall available inventory, the data from each of the warehouses
|
|
|
needs to be merged. This is accomplished by creating another directory containing the merged data.
|
|
|
In event sourcing terms this is called a read-model. This directory will have another cluster of
|
|
|
-nodes attached which will act as web servers. These nodes will subscribe to events published by the
|
|
|
+servers attached which will act as web servers. These servers will subscribe to events published by the
|
|
|
warehouse clusters, events indicating changing levels of inventory, and digest this information into
|
|
|
-a format that can be efficiently queried. These nodes will also publish events to the warehouses
|
|
|
+a format that can be efficiently queried. These servers will also publish events to the warehouses
|
|
|
when orders are placed, cancelled or amended.
|
|
|
|
|
|
When a warehouse goes offline, its previous inventory counts are
|
|
@@ -494,7 +523,7 @@ I hope this example shows that having a standard format for data and the federat
|
|
|
can provide designers with much greater flexibility, even if they do not care about decentralization
|
|
|
or their user's privacy.
|
|
|
|
|
|
-\subsection{The Open Metaverse}
|
|
|
+\subsection{A Metaverse}
|
|
|
As a final example I'd like to consider a platform for recording spacial information. The key insight
|
|
|
that enables this is very general: blocktree enables the creation of distributed tree-like data structures.
|
|
|
For instance, its straight forward to imagine creating a distributed hashtable implemented as a red-black tree.
|
|
@@ -519,8 +548,8 @@ pointing to a block in the owner's blocktree. They can then write whatever data
|
|
|
the contents of their parcel. Collaboration on a single parcel is accomplished by issuing a read and writecap to
|
|
|
another user.
|
|
|
|
|
|
-It's easy to imagine that one world would be more important than the rest and that he creation of a metaverse
|
|
|
-representation of Earth will be an important undertaking. The hierarchical nature of permissions in blocktree
|
|
|
+It's easy to imagine that one world would be more important than the rest and that the creation of a metaverse
|
|
|
+representation of the Earth will be an important undertaking. The hierarchical nature of permissions in blocktree
|
|
|
make such a shared world possible. National blocktrees could be given ownership of their virtual territory,
|
|
|
this would then
|
|
|
be delegated down to the state and municipal levels. Finally municipalities would delegate individual parcels
|
|
@@ -533,7 +562,7 @@ In this paper I have given the outline of a decentralized method of organizing i
|
|
|
and trust in a way that I hope will be useful and easy to use. The use of cryptographic primitives for
|
|
|
implementing access control were discussed, as well as methods of protecting private keys. A blockchain
|
|
|
and corresponding cryptocurrency was proposed as a means of incentivizing the distribution of data.
|
|
|
-Erasure coding was used to ensure that distributed data is resilient to the loss of nodes and can be
|
|
|
+Erasure coding was used to ensure that distributed data is resilient to the loss of servers and can be
|
|
|
reconstructed efficiently. A programming environment based on WASM and WASI was proposed as a way
|
|
|
of providing an interface to this data. APIs for defining protocol contracts and efficient web servers
|
|
|
were indicated. APIs for constructing supervision trees were mentioned as a means for building reliable
|