Procházet zdrojové kódy

Added more sections to the paper.

Matthew Carr před 2 roky
rodič
revize
396dc8a7a2
2 změnil soubory, kde provedl 133 přidání a 266 odebrání
  1. 133 14
      doc/Paper/Paper.tex
  2. 0 252
      doc/paper/BlockTree.tex

+ 133 - 14
doc/Paper/Paper.tex

@@ -75,6 +75,41 @@ the system to scale. When more than one node is attached to the same block they
 Each node in the cluster contains a copy of the data that the cluster is reponsible for. They
 Each node in the cluster contains a copy of the data that the cluster is reponsible for. They
 maintain consistency of this data by running the Raft consensus protocol.
 maintain consistency of this data by running the Raft consensus protocol.
 
 
+TODO: Nodes have keys. These keys are signed by the root key, creating a tree of trust.
+
+TODO: Symbolic links and readonly models (views?).
+
+While the consistency of an individual block can be maintained using Raft, in order to enable
+transactions which span multiple blocks a distributed locking mechanism is employed. This is
+accomplished by exploiting hierarchical arrangement of nodes in the tree. In order to describe
+this, its first helpful to define a new term. The \emph{nodetree} of a blocktree is tree obtained
+from the blocktree by collapsing all the blocks that a node (or cluster of nodes) is responsible
+for into a single logical block representing the node itself. Thus we can talk about the node
+which is the parent of another node, and by these we mean it is the parent of the node in the
+nodetree. What this means in terms of the blocktree is that it is the first node encountered
+when one traverses the path from the current block back to the root. Now, distributed locking
+works as follows:
+\begin{itemize}
+\item A node sends a request to lock a block to the current concensus leader in its cluster.
+  If the node is not part of a cluster, then it is the leader. This request contains a timestamp
+  for when the lock expires, preventing the situation where a lock is never released.
+\item If the leader is responsible for the block then it moves on to the next step. Otherwise
+  it contacts its parent and forwards the lock request and this step is repeated for the parent.
+\item The responsible node checks to see if there is already a lock for this block. If there is
+  then the request fails. Otherwise the request succeeds and a lock is placed on the block. A
+  message indicating the result is then pass back up the tree ending at the orignal node. This
+  message includes the principal of the node enforcing the lock.
+\item Once the locking node is done making its updates it sends a message directly to the node
+  enforcing the lock, causing it to remove the lock.
+\end{itemize}
+Locking a block locks the subtree rooted at that block. Thus no writes to any path contained in
+the path of the locked block will be allowed, unless they come from locking node. If the locking
+node does not send the message
+unlocking the block before the lock expires, then the modifications which had been performed by
+it are dropped and the block reverts to its prior state. Since the locking node is the leader
+of the consensus cluster that is responsible for the block's, this guarantees that
+writes from other nodes will not be accepted.
+
 \section{Connecting Blocktrees}
 \section{Connecting Blocktrees}
 In order to allow nodes to access blocks in other blocktrees, a global ledger of events is used.
 In order to allow nodes to access blocks in other blocktrees, a global ledger of events is used.
 This ledger is implemented using a proof of work (PoW) blockchain and a corresponding 
 This ledger is implemented using a proof of work (PoW) blockchain and a corresponding 
@@ -120,21 +155,105 @@ following queries:
 \item Find the IP address of a node or owner given a principal.
 \item Find the IP address of a node or owner given a principal.
 \item Find the public key associated with a principal.
 \item Find the public key associated with a principal.
 \end{itemize}
 \end{itemize}
-These are enabled by creating a read model from the data in the event ledger. One possible
-implementation of this read  model is the following SQL database.
-This database contains two tables:
+To enable these queries a special blocktree is maintained by each node in the network: the global blocktree.
+This tree does not support the usual writing and locking sematics of local blocktrees. It can be thought of as
+a left fold of all of the events in the blockchain, where each event is processed by updating blocks in the
+global tree appropriately. The above queries are facilitate by the following blocks:
 \begin{itemize}
 \begin{itemize}
-\item \emph{Fragments}: one column containing path, one for the fragment ID, and another for the principal. Indexed on the path column.
-\item \emph{Principals}: one column containing principal, one column for the public key, and
-	one column for the IP address, one columns for the amount blockcoin the principal has.
-	Indexed on the principal column.
+\item \emph{/global/fragments}: this block contains a hashtable where the key is a path and the value is the list
+of nodes storing the fragments for the block at that path.
+\item \emph{/global/principals}: contains a hashtable where the key is the a principal and the value is the tuple
+contining the public key of that principal, its current IP address, and its current blockcoin balance.
 \end{itemize}
 \end{itemize}
-This index is built from the event ledger by iterating over it, and modifying the database
-appropriately for each event. For instance when a fragment stored event is encountered then
-a row is added to the \emph{Fragments} table containing the path of the block, the fragment ID,
-and the principal which were recorded in the event. The reader who is experienced with software
-patterns will recognize that this is an event sourced architecture.
+To compute the entries in these tree blocks, the nodes in the network iterate over all the chain blocks, updating
+their local copy of each tree block approriately. The experienced reader will recognize that this is an event 
+sourced architecture. At this time only the two tree blocks are known to be needed, but if new events need to be
+added to the system it's easy to see that this system can be used for creating other data structures enabling
+queries that we have yet to envision. One such extension is the registration of globally unique names, which will
+be the focus of future work.
+
+\section{Programming Environment}
+Enabling an excellent developer experience is one of the primary goals of this system (the others being security
+and scaleability). Nodes execute user code that has been compiled into WebAssembly modules. Such code
+running on a blocktree node is referred to as an "app". An app
+executes in a sandbox that isolates it from other code, as well as the security critical operations of the node
+itself. The sandbox provides the code with an extension of the WebAssembly System Interface (WASI), with extra 
+system calls to interact with the particulars of the blocktree system. The stadard WASI filesystem APIs are used
+to interact with the contents of blocktrees. For instance a file descriptor for a remote block can be obtained
+by calling path\_open. Writes and reads of blocks are performed using the privledges of the node on which
+the app is running. The extra system calls fall into three categories:
+\begin{itemize}
+\item Distributed Locking
+\item Messaging
+\item Supervision Trees
+\item Protocol Contracts (Session Types)
+\end{itemize}
+
+When an app is installed it is given a block under which it can store data that is shared between all nodes
+in the blocktree. The path of this block is formed by prefixing the path the app was published at
+with the string "/apps". When an app is installed on a particular node, it is run in a block contained
+in the node's block. It can read and write blocks in this block, and to allow it to access shared data,
+a symbolic link is created to the app's block in "/apps". 
+
+% App Publishing
+These apps are, of course, distributed in a blocktree. The path of the block used to publish the app is used to
+identify the app, and must be unique. The block containing the app contains the WebAssembly module itself as
+well as JSON file containing the app's manifest. This manifest defines the app's name as well as the list
+of permissions it requires. This list of persmissions is used to determine which APIs the app will have access to.
+
+% Privacy Safe vs Unsafe
+Apps are broken into two large categories: those that are privacy safe and those that are not.
+An app is privacy unsafe if it requests any permissions which allow it to send data
+to any node that is not part of the blocktree in which it's running. Thus request the ability to
+open a TCP socket would cause an app to be privacy unsafe. Similarly the creation of a protocol handler for
+HTTP would also be privacy unsafe. Privacy unsafe apps can limit the scope of their unsafety by
+imposing limits on the unsafe APIs that they request. For instance an app which needs to send
+blocktree message back to the blocktree it was published in can request the messaging permission
+for a path in this tree. Similarly, an app which only wants to open a TCP socket listening on the
+local network, can limit the scope of its requested permission.
+
+% Protocol Contracts
+In order to make it easy to write apps which interface with existing systems, most notably
+those using HTTP, apps are able to define protocols using a contract language and then register
+call backs for these protocols. This works by providing a system call which the app can
+supply a protocol contract to. This contract is then compiled into a state machine by the node
+and a handle is returned to the app. This handle is then used to register callbacks for different
+parts of the protocol. This ensures that the protocol is handled by the node itself and that
+protocols can be shared between many different apps as a library. For instance, the HTTP protocol
+would be compiled to a particularly simply state maching, with only one state: the listening state.
+This state would expose a hook that where a callback can be registered to handle the request.
+This state also defines the record format used to pass the request information to the callback
+and the record format of the return value that is expected in order to produce the response.
+More complicated (stateful) protocols would more states, each defining their own request and
+response records, as well as hooks. One nice thing about this setup is that it will enable
+optimizations where the state machine and the user callbacks can be compiled into a program
+which can be safely run in the node itself, or even in a SmartNIC. This would require that
+the callbacks only use an approved set of APIs, but it can enable much higher performance
+network services.
+
+% Supervision Trees
+Apps can also arrange themselves into supervision trees, in the same way that Erlang
+processes are arranged. In this scheme, when a child app crashes, or the node its
+running on dies (which is detected by the node), then the app receives a message. In the
+simplest case this can be used to implement a logging system, where crashes and node
+deaths are recorded. More interestingly, this could be used to integrate with a control
+plan. For instance, if a blocktree were running in AWS, when a message is recevied indicating
+that a node has died a new EC2 instance could be started to replace it. Of course these
+are just two of the potential applications of this mechanism. The reliability of Erlang
+and other system employing the Actor Model have shown the robustness of this approach.
+
+\section{A Brave New Web}
+In order to explore how this system can be used, the design of several hypothetical systems
+is discussed. It's important to note that blocktree does not try to force all compuatation
+to be local to a user's device, it merely trys to enable this for applications where it
+is possible.
+
+\subsection{A contacts application.}
+
+\subsection{A distributed social network.}
+
+\subsection{An ecomerce website.}
 
 
-\section{Programming Interface}
+\subsection{The Open Metaverse}
 
 
-\end{document}
+\end{document}

+ 0 - 252
doc/paper/BlockTree.tex

@@ -1,252 +0,0 @@
-\documentclass{book}
-\usepackage{amsfonts,amssymb,amsmath,amsthm}
-\usepackage[scale=0.75]{geometry}
-\usepackage{hyperref}
-
-\begin{document}
-\tableofcontents
-\chapter{System Overview}
-% I should replace "consumer" with "user".
-The development of the internet was undoudedtly one of the greatest achievements in the 20th 
-century, and the internet's killer app, the web, has reshaped our lives and the way we do
-business. But, for all the benefits we have received from these technologies there have
-corresponding costs. It's now possible to cheaply surveil entire popluations and discern their
-preferences and responses so that propaganda can be effectively created and distributed. The
-surveilence it not surepticious, it's quite overt. Consumers hand over this data willingly to
-tech companies because of the benefits they receive in return. But why should people be forced to
-choose between privacy and convenience?
-
-The cost of computing power, storage, and bandwidth are very cheap. A single board computer
-can provide more than enough computing power for the average web-browsing consumer. A classic rotating magnetic hard drive can hold terrabytes of data, more than enough to hold an individual's
-social media output. Umetered broadband internet access measured in hudreds of megabits per
-second is common, and becoming more so all the time. So with all these resources available,
-why is it that consumers do not more control over the computing infrastructure upon which
-they rely?
-
-The issue is that consumers' don't want to manage servers, networking, routing, and
-backup strategies. Each of these can be a full time job by itself. So in order for a consumer to
-be in control of their own infrastructure, the infrastructure needs to be able to take care of
-itself.
-
-Perhaps as important as consumer agency over their computing infrastructure, is the security of
-that infrastructure. Time and time again consumer trust has been violated by data breaches and
-service disruptions. These incidents show the need for a more secure and standardized system for
-storing application data and building web services. Such a system must provide both
-confidentiality of consumer data, and the ability to authenticate consumers without the
-need for insecure techniques, such as passwords.
-
-This document proposes a potential solution. It describes a system for organizing information into
-trees of blocks, the distribution of those blocks over a network of nodes, and a programming interface
-to access this information. Because no one piece of hardware
-is infallible, the system also includes mechanisms for nodes to contract with one another to store
-data. This allows data to be backed up and later restored in the case a node is lost. In order to
-ensure the free exchange of data amongst nodes, a digital currency is used to account for the
-available storage capacity of the network.
-
-The remainder of this chapter will give an overview of the system, with the remainder of the
-document going into specific details of each of the system's components.
-
-\section{Blocks}
-User data stored in this system is organized into structures called \emph{block trees}. Every block tree
-is identified with a public key. The private key that corresponds to a block tree's public key is
-required to control that tree. Any person who has the private key for a block tree is called that
-tree's owner.
-
-Computers participating in this system are called \emph{nodes}. Nodes are also identified by public keys, but
-these keys are not directly tied to block trees. Nodes that have access to a block trees data are said
-to be \emph{attached} to that block tree. Nodes can be attached to multiple block trees at once, or none
-at all.
-
-Block trees are of course trees of \emph{blocks}. Every block is identified by a string called a \emph{path},
-which describes its location in the tree. The root of this path is a hash (hex encoded)
-of the tree's public key, allowing blocks from any tree to be referred to. A block consists of three segments:
-a header, a payload and a signature.
-
-The payload is encrypted by a symmetric cipher using a randomly generated key. This randomly
-generated key is called the \emph{block's key}. To allow access to the payload, the block's key is encapsulated
-using other keys and the resulting cipher texts are stored in the block's header. These encapculated keys
-are referred to as read capabilities, or \emph{read caps} for short.
-The root node of every block tree contains a read cap for the block tree's public key.
-Every non-root block contains a read cap
-for the block's parent, which is to say the block's key is encapsulated using the its parent's block key.
-So when one has a read cap for a block, they can read the data in all blocks descended from that
-block. Because the owner of a block tree has a read cap for the root block, they can read all data
-stored in the tree. Other people (or nodes) can be given access to a subtree by granting them a read
-cap for the subtree's root. A block which contains public data is stored as cleartext with no read caps.
-
-While read caps provide for confidentiality, write caps provide for integrity. A \emph{write cap}
-for a block is a certificate chain which terminates at a certificate signed by the block tree's
-owner. Thus a self-signed certificate made using the tree's private key is a valid write cap
-for any block in the tree. By allowing a chain of certificates to be used, it's possible for
-the owner to give other people or nodes the ability to write data into their tree. The scope of
-this access is controlled by specifying the path under which writing is allowed to the certificiate.
-A write cap for a block is only valid if the path of the block is a contained in the path
-specified in every certificate in the chain.
-
-Both the header and the payload of a Block are protected using a private key signature. The writer
-of the block computes this signature using the private key which corresponds to the write cap 
-for the block they're trying to write. In order to validate a block, this signature is validated, then
-the Write Cap is validated, and finally the hash of the public key of
-the last signer in the Write Cap chain is compared to the root of the Block's path. If these match,
-then the block is valid, as this means that an owner has given permission for the writer to write
-into their tree at this path.
-
-Accessing the data in a block requires several cryptographic operations, both for vaidation and
-for decryption. Because of this its important that blocks are relatively large, on the order of
-4 MB, to amortize the cost of these operations.
-
-\section{Fragments}
-By itself this block structure would be useful for building a secure filesystem, but in order to
-be a durable storage system we need an efficient way of distributing data for redundancy and
-availability. This is the purpose of fragments.
-
-Blocks are distributed amongst nodes in the network using a fountain code. The output symbols
-of this code are referred to as \emph{fragments}. A code with a high performance implementation and good
-coding efficiency is an important design consideration for the system. For these reasons the
-RaptorQ code was chosen.
-
-In order to preserve the data in a newly created block, a node will need to distribute
-fragments to other nodes. It does this by advertising its desire to trade [currency]
-in its block tree for the storage of these fragments. \emph{[currency]} is a fungible
-token for the exchange of computing resources between nodes. Every block tree has
-some non-negative value for the amount of [currency] it controls. Nodes that are attached
-to a tree spend the tree's [currency] when paying other nodes for the storage of fragments.
-
-If another node is interested in making the exchange, it contacts the advertising node
-and both sign a contract. A \emph{contract} is a data structure signed by both nodes which
-states that hash of the fragment being stored and the amount of [currency] being exchanged
-for its storage. The contract is then stored in the public block tree (to be discussed below),
-so that [currency] can be transerfed between nodes and to create an accountability mechanism
-to prevent the storing node from acting in bad faith and deleting the fragment.
-
-When a node needs to retreive a block that was previously distributed in fragments, it connects to a
-subset of nodes containing the fragments and downloads enough to reconstruct 
-the block. These downloads can be performed concurrently for greater speed. This same mechanism
-can be used to distribute public blocks to unaffiliated nodes. This mechanism facilitates load balancing
-and performance, as concurrent downloads
-spread the load over multiple nodes and are not limited by the bandwidth between any pair of nodes.
-
-The list of nodes containing the fragments of a block is called the block's \emph{node list}.
-A block's node list is stored in it's parent. This allows for any non-root block to be retreived.
-To allow the root block to be retrieved its node list is stored in the public block tree.
-
-\section{The Public Blocktree}
-\emph{The Public Block Tree} is a block tree which is known to all nodes. This is accomplished by
-providing all nodes with a hardcoded list of nodes that are attached to the public block tree.
-This is similar to the list of root DNS servers distributed with any networked operating system.
-Because the public block tree is only used for storing information that should be known to all
-nodes in the network, the payload of every block in it is cleartext. The public block tree serves
-only to facilitate the communication and exchange of data between nodes.
-
-One way that it does this is by containing a database of nodes and their IP addresses. A node
-which has a write cap to this database will only store an entry for a node if that node can provide
-a valid signed request. This signed request is stored in the database verbatim, so that other nodes can
-independently verify its validity. Thus the nodes in the network can use this database to securely resolve
-the IDs of other node's to their IP addresses.
-
-The other function of the public block tree is to contain a list of transactions and disputes.
-This list is referred to as the \emph{public log}.
-When a node is created, an event is logged detailing the amount of [currency] the node is worth.
-When a node is first attached to a block tree, this [currency] is then removed from the node
-and added to the block tree. When a node signs a contract with another node, it is stored in the
-log and [currency] is removed by the sending node's block tree and added to the receiving node's.
-
-In order to discourage nodes from receive payment for the storage of a fragment, then deleting
-the fragment to reclaim disk space, a reporting mechanism exists. If a node is unable to retrieve
-a fragment that it previously stored with another node, then it sends an event to the log
-indicating this. The other node can then respond by sending an event which contains the actual
-fragment which was requested. This allows all the nodes in the network to view the log and
-see if a node that they are considering signing a contract with is trustworthy. If they
-are not the defendant in any disputes, then they should be safe. If they are in one, but responded
-quickly with the fragment, then it could have been a transient network issue. If they never
-responded, then they are risky and should perhaps receive a lower payment for the storage
-of the fragment.
-
-Finally, the public block tree stores node lists for the root blocks of every block tree.
-This ensures that even if every node that participates in a block tree fails, the block
-tree can still be recovered from its fragments, provided its private key is known.
-
-\section{Nodes and the Network}
-Each node in the network has a public-private keypair. The string formed by hex encoding the
-hash of a node's public key is referred to as the \emph{node ID} of the node. When nodes
-are manufactured they are issued a certificate trusted by the
-public block tree. New nodes are claimed by issuing them a certificate and then writing
-that certificate into the public log. When a new node is claimed, currency is credited to
-the block tree which claimed it. This currency is to account for the storage capacity
-that the new node brings to that block tree. This mechanism is the reason why the node must have a
-certificate trusted by the public block tree, otherwise there would be no way to control the
-creation of currency.
-
-Nodes are identified by their node ID in the public block tree and in node lists. Nodes
-are responsible for updating their IP address in the public block tree whenever it changes.
-
-When a node is attached to a block tree it is issued a certificate containing a path
-under which its data will be stored. We say the the node is attached to the block tree at that path.
-The node which issues the node its certificate creates a read cap for it and stores it in the
-block where the node is attached.
-
-The data created by a node may optionally be replicated in its parent node. This would be suitable
-for a lightweight or mobile device which needs to ensure its data is replicated immediately and
-doesn't have time to negotiate contracts for the storage of fragments. For larger 
-block trees, having non-replicating nodes is essential for scalability.
-
-More than one node can be attached at the same path, and when this happens a \emph{cluster} is formed.
-Each node in the cluster stores
-copies of the same data and they coordinate with each other to ensure the consitency of this
-data. This is accomplished by electing a leader. All writes to blocks under the attachment point are
-sent to the leader. The leader then serializes these writes and sends them to the rest of the
-nodes. By default writes to blocks use optimistic concurrency, with the last write known to the
-leader being the winner. But if a node requires exclusive access to a block it can
-make a request to the leader to lock it. Writes from nodes other than the locking node are rejected until
-the lock is released. The leader will release the lock on its own if no messages are received from
-the locking node after a timeout.
-
-If the attachment point is configured to be replicated to its parent, then the leader will maintain
-a connection to the the leader in the parent cluster. Note that the parent cluster need not be
-housed in the parent block, just at some ancestor block. Then, writes will be propagated through
-this connection to the parent cluster, where this process may continue if that cluster is also
-configured for replication. Distributed locking is similarly comunicated to the parent cluster,
-where the lock is only aquired with the parent's approval.
-
-\section{Programmatic Access to Data}
-No designer can hope to envsion all the potential applications that a person would want to have
-access to their data. That's why an important component of the system is the ability to run
-programs that can access data and provide services to other internet hosts, whether they are
-blocktree nodes or not. This is accomplished by providing a WebAssembly based execution
-environment with a system interface based on WASI. Information in the blocktree is available
-to programs using standard filesystem system calls that specify paths in a special directory.
-While some programs may wish to access blocks directly in this manner, others may wish to use
-an API at a higher level of abstraction. Thus there is also an API for creating arbitrarily sized
-files that will get mapped to fixed sized blocks, freeing the programmer from having to implment
-this themselves.
-
-Data provided by these filesystem APIs will be the most up-to-date versions known to the node.
-There's the possiblity that conflicts with other nodes may cause writes made by programs on the
-node to be rolled back or overwritten. Of course locks can be taken on files and blocks if a
-program must ensure exclusive access to data. Finally, an inotify-like API is provided for programs to be notified when changes to blocks occur.
-
-An important consideration for the design of this system was to facilitate the creation of web
-servers and other types of internet hosts which can serve data stored in a blocktree. For this
-reason there is a high level callback based API for declaring HTTP handlers, as well as handlers
-for blocktree specific messages.
-
-In order to provide the consumer with control over how their data is used a permissions system
-exists to control which blocks and APIs programs have access to. For instance a consumer would
-have to grant special permission for a program to be able to access the Berkeley sockets API.
-
-Programs are installed by specifing a blocktree path. This is a secure and convenient method of
-distribution as these programs can be downloaded from the nodes associated with the root of their
-path and the downloaded blocks can be cryptographically verified to be trusted by the root 
-key. Authors wishing to distribute their programs in this manner will of course need to make the
-blocks containing them public (unencrypted), or else provide some mechanism for selective access.
-
-\chapter{Data Structures}
-
-\chapter{Nodes}
-
-\chapter{The Network}
-
-\chapter{Application Programming Interface}
-
-\chapter{Example Applications}
-\end{document}