123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259 |
- \documentclass{article}
- \usepackage{amsfonts,amssymb,amsmath}
- \usepackage[scale=0.75]{geometry}
- \usepackage{hyperref}
- \title{Blocktree \\
- \large A platform for distributed computing.}
- \author{Matthew Carr}
- \date{May 28, 2022}
- \begin{document}
- \maketitle
- \begin{abstract}
- Blocktree is a platform which aims to make developing reliable distributed systems easy and safe.
- It does this by defining a format for data stored in the system, a mechanism for replicating
- that data, and a programming interface for accessing it. It aims to solve the difficult problems
- of cryptographic key management, consensus, and global addressing so that user code can focus on
- solving problems germane to their application.
- \end{abstract}
- \section{Introduction}
- (Do I need this?)
- \section{An Individual Blocktree}
- The atomic unit of data storage and privacy and authenticity guarantees is called a block. A
- block contains a payload of data. Confidentiality of this data is achieved by encrypting it using
- a symmetric cipher using a random key. This random key is known as the block key.
- The block key is encapsualated using
- a public key cipher and the resulting cipher text is stored in the header of the block. Thus
- only the person possessing the corresponding private key will be able to access the contents of
- the block. Blocks are arranged into trees, and the parent of the block also has a block key.
- The child's block key is always encapsulated using the parent's key and stored in the block
- header. This ensures that if a principal is given access to a block, they automatically have
- access to every child of that block. The encapsulated block key is known as a read capability,
- or readcap, as it grants the holder the ability to the block.
- Authenticity guarantees are provided using a digital signature scheme. In order to change the
- contents of a block a data structure called a write capability, or writecap, is needed. A
- writecap is approximately an x509 certificate chain. A writecap contains the following data:
- \begin{itemize}
- \item The path the writecap can be used under.
- \item The principal that the writecap was issued to.
- \item The timestamp when the writecap expires.
- \item The public key of the principal who issued the writecap.
- \item A digital signature produced by the private key corresponding to the above public key.
- \item Optionally, the next writecap in the chain.
- \end{itemize}
- The last item is only excluded in the case of a self-signed writecap, i.e. one that was signed by
- the same principal it was issued to. A writecap is considered valid for use on a block if all
- of the following conditions are true:
- \begin{itemize}
- \item The signature on every writecap in the chain is valid.
- \item The signing principal matches the principal the next writecap was issued to for every
- write cap in the chain.
- \item The path of the block is contained in the path of every writecap in the chain.
- \item The current timestamp is strictly less than the expiration of all the writecaps in the
- chain.
- \item The principal corresponding to public key used to sign the last writecap in the chain,
- is the owner of the blocktree.
- \end{itemize}
- The intiution behind these rules is that a writecap is only valid if there is a chain of trust
- that leads back to the owner of the block tree. The owner may delegate their trust to any number
- of intermediaries by issuing them writecaps. These writecaps are scoped based on the path
- specified when they are issued. These intermediaries can then delegate this trust as well.
- A block is considered valid if it contains a valid writecap, it was signed using the key
- corresponding to the first writecap's public key, and this signature is valid.
- Blocks are used for more than just orgnaizing data, they also organize computation. A program
- participating in the blocktree network is referred to as a node. Multiple nodes may be run on
- a single computer. Every node is attached to the blocktree at a specific path. This information
- is recorded in the block where the node is attached. A node is responsible for the storage of
- the block where it is attached and the blocks that are descended from this block, unless there
- is another node attached to a descendent block.
- In this way data storage can be delegated, allowing
- the system to scale. When more than one node is attached to the same block they form a cluster.
- Each node in the cluster contains a copy of the data that the cluster is reponsible for. They
- maintain consistency of this data by running the Raft consensus protocol.
- TODO: Nodes have keys. These keys are signed by the root key, creating a tree of trust.
- TODO: Symbolic links and readonly models (views?).
- While the consistency of an individual block can be maintained using Raft, in order to enable
- transactions which span multiple blocks a distributed locking mechanism is employed. This is
- accomplished by exploiting hierarchical arrangement of nodes in the tree. In order to describe
- this, its first helpful to define a new term. The \emph{nodetree} of a blocktree is tree obtained
- from the blocktree by collapsing all the blocks that a node (or cluster of nodes) is responsible
- for into a single logical block representing the node itself. Thus we can talk about the node
- which is the parent of another node, and by these we mean it is the parent of the node in the
- nodetree. What this means in terms of the blocktree is that it is the first node encountered
- when one traverses the path from the current block back to the root. Now, distributed locking
- works as follows:
- \begin{itemize}
- \item A node sends a request to lock a block to the current concensus leader in its cluster.
- If the node is not part of a cluster, then it is the leader. This request contains a timestamp
- for when the lock expires, preventing the situation where a lock is never released.
- \item If the leader is responsible for the block then it moves on to the next step. Otherwise
- it contacts its parent and forwards the lock request and this step is repeated for the parent.
- \item The responsible node checks to see if there is already a lock for this block. If there is
- then the request fails. Otherwise the request succeeds and a lock is placed on the block. A
- message indicating the result is then pass back up the tree ending at the orignal node. This
- message includes the principal of the node enforcing the lock.
- \item Once the locking node is done making its updates it sends a message directly to the node
- enforcing the lock, causing it to remove the lock.
- \end{itemize}
- Locking a block locks the subtree rooted at that block. Thus no writes to any path contained in
- the path of the locked block will be allowed, unless they come from locking node. If the locking
- node does not send the message
- unlocking the block before the lock expires, then the modifications which had been performed by
- it are dropped and the block reverts to its prior state. Since the locking node is the leader
- of the consensus cluster that is responsible for the block's, this guarantees that
- writes from other nodes will not be accepted.
- \section{Connecting Blocktrees}
- In order to allow nodes to access blocks in other blocktrees, a global ledger of events is used.
- This ledger is implemented using a proof of work (PoW) blockchain and a corresponding
- cryptocurrency known as blockcoin. Nodes mine chain blocks (not to be confused with the tree
- blocks we've been discussing up till now) in the same way they do in other PoW blockchain
- systems such as BitCoin. The node which manages to mine the next chain block receives a reward,
- which is the sum of the fees for each event in the chain and a variable amount of newly minted
- blockcoin. The amount of new blockcoin created by a chain block is directly proportional to the
- amount of data storage events contained in the chain block. Thus the total amount of blockcoin
- in circulation has a direct relationship to the amount of data stored in the system, reflecting
- the fact that blockcoin exists to provide and accounting mechanism for data stored in the system.
- When a node writes data to a tree block, and it wishes this block to be globally accessible, then
- it produces what are called fragments. Fragments are the output symbols from an Erasure Coding
- algorithm (such as the RaptorQ code). These algorithms are a class of fountain codes which have
- the property that only $m$ out of $n$ (where $m < n$) symbols are needed to reconstruct the
- original data. Such a code ensures that even if some of the fragments are lost, as long as $m$
- remain, the original data can be recovered.
- Once these fragments have been computed an event is created for each one and published to the
- blockchain. This event indicates to other nodes that this node wishes to store a fragment and
- states the amount of blockcoin the node will pay to the first node that accepts the offer. When
- another nodes wishes to accept the offer, it directly contacts the first node, who then sends
- it the fragment an publishes and event stating that the fragment is stored with the second
- node. This event includes the path of the block the fragment was computed from, the fragment's
- ID (the sequence number from the erasure code), and the principal of the node which stored it.
- Thus any other node in the network can use the information contained in these events to
- determine the set of nodes which contain the fragments of any given path.
- In order for nodes to be able to conntact other nodes, a mechanism is required for associating
- an internet protocol (IP) address with a principal. This is done by having nodes publish events
- to the blockchain when their IP address changes. This event includes their new IP address,
- their public key, and a digital signature computed using their private key. Other nodes can
- then verify this signature to ensure that an attacker cannot bind the wrong
- IP address to a principal in order to receive messages it was not meant to have.
- While this event ledger is useful for appending new events, and ensuring that previous events
- cannot be changed, another data structure is required to ensure that queries on this data can
- be performed efficiently. In particular, it's important to be able to quickly perform the
- following queries:
- \begin{itemize}
- \item Find the set of nodes storing the fragments for a given path.
- \item Find the IP address of a node or owner given a principal.
- \item Find the public key associated with a principal.
- \end{itemize}
- To enable these queries a special blocktree is maintained by each node in the network: the global blocktree.
- This tree does not support the usual writing and locking sematics of local blocktrees. It can be thought of as
- a left fold of all of the events in the blockchain, where each event is processed by updating blocks in the
- global tree appropriately. The above queries are facilitate by the following blocks:
- \begin{itemize}
- \item \emph{/global/fragments}: this block contains a hashtable where the key is a path and the value is the list
- of nodes storing the fragments for the block at that path.
- \item \emph{/global/principals}: contains a hashtable where the key is the a principal and the value is the tuple
- contining the public key of that principal, its current IP address, and its current blockcoin balance.
- \end{itemize}
- To compute the entries in these tree blocks, the nodes in the network iterate over all the chain blocks, updating
- their local copy of each tree block approriately. The experienced reader will recognize that this is an event
- sourced architecture. At this time only the two tree blocks are known to be needed, but if new events need to be
- added to the system it's easy to see that this system can be used for creating other data structures enabling
- queries that we have yet to envision. One such extension is the registration of globally unique names, which will
- be the focus of future work.
- \section{Programming Environment}
- Enabling an excellent developer experience is one of the primary goals of this system (the others being security
- and scaleability). Nodes execute user code that has been compiled into WebAssembly modules. Such code
- running on a blocktree node is referred to as an "app". An app
- executes in a sandbox that isolates it from other code, as well as the security critical operations of the node
- itself. The sandbox provides the code with an extension of the WebAssembly System Interface (WASI), with extra
- system calls to interact with the particulars of the blocktree system. The stadard WASI filesystem APIs are used
- to interact with the contents of blocktrees. For instance a file descriptor for a remote block can be obtained
- by calling path\_open. Writes and reads of blocks are performed using the privledges of the node on which
- the app is running. The extra system calls fall into three categories:
- \begin{itemize}
- \item Distributed Locking
- \item Messaging
- \item Supervision Trees
- \item Protocol Contracts (Session Types)
- \end{itemize}
- When an app is installed it is given a block under which it can store data that is shared between all nodes
- in the blocktree. The path of this block is formed by prefixing the path the app was published at
- with the string "/apps". When an app is installed on a particular node, it is run in a block contained
- in the node's block. It can read and write blocks in this block, and to allow it to access shared data,
- a symbolic link is created to the app's block in "/apps".
- % App Publishing
- These apps are, of course, distributed in a blocktree. The path of the block used to publish the app is used to
- identify the app, and must be unique. The block containing the app contains the WebAssembly module itself as
- well as JSON file containing the app's manifest. This manifest defines the app's name as well as the list
- of permissions it requires. This list of persmissions is used to determine which APIs the app will have access to.
- % Privacy Safe vs Unsafe
- Apps are broken into two large categories: those that are privacy safe and those that are not.
- An app is privacy unsafe if it requests any permissions which allow it to send data
- to any node that is not part of the blocktree in which it's running. Thus request the ability to
- open a TCP socket would cause an app to be privacy unsafe. Similarly the creation of a protocol handler for
- HTTP would also be privacy unsafe. Privacy unsafe apps can limit the scope of their unsafety by
- imposing limits on the unsafe APIs that they request. For instance an app which needs to send
- blocktree message back to the blocktree it was published in can request the messaging permission
- for a path in this tree. Similarly, an app which only wants to open a TCP socket listening on the
- local network, can limit the scope of its requested permission.
- % Protocol Contracts
- In order to make it easy to write apps which interface with existing systems, most notably
- those using HTTP, apps are able to define protocols using a contract language and then register
- call backs for these protocols. This works by providing a system call which the app can
- supply a protocol contract to. This contract is then compiled into a state machine by the node
- and a handle is returned to the app. This handle is then used to register callbacks for different
- parts of the protocol. This ensures that the protocol is handled by the node itself and that
- protocols can be shared between many different apps as a library. For instance, the HTTP protocol
- would be compiled to a particularly simply state maching, with only one state: the listening state.
- This state would expose a hook that where a callback can be registered to handle the request.
- This state also defines the record format used to pass the request information to the callback
- and the record format of the return value that is expected in order to produce the response.
- More complicated (stateful) protocols would more states, each defining their own request and
- response records, as well as hooks. One nice thing about this setup is that it will enable
- optimizations where the state machine and the user callbacks can be compiled into a program
- which can be safely run in the node itself, or even in a SmartNIC. This would require that
- the callbacks only use an approved set of APIs, but it can enable much higher performance
- network services.
- % Supervision Trees
- Apps can also arrange themselves into supervision trees, in the same way that Erlang
- processes are arranged. In this scheme, when a child app crashes, or the node its
- running on dies (which is detected by the node), then the app receives a message. In the
- simplest case this can be used to implement a logging system, where crashes and node
- deaths are recorded. More interestingly, this could be used to integrate with a control
- plan. For instance, if a blocktree were running in AWS, when a message is recevied indicating
- that a node has died a new EC2 instance could be started to replace it. Of course these
- are just two of the potential applications of this mechanism. The reliability of Erlang
- and other system employing the Actor Model have shown the robustness of this approach.
- \section{A Brave New Web}
- In order to explore how this system can be used, the design of several hypothetical systems
- is discussed. It's important to note that blocktree does not try to force all compuatation
- to be local to a user's device, it merely trys to enable this for applications where it
- is possible.
- \subsection{A contacts application.}
- \subsection{A distributed social network.}
- \subsection{An ecomerce website.}
- \subsection{The Open Metaverse}
- \end{document}
|