@@ -240,24 +240,190 @@ path and the downloaded blocks can be cryptographically verified to be trusted b
key. Authors wishing to distribute their programs in this manner will of course need to make the
blocks containing them public (unencrypted), or else provide some mechanism for selective access.

-\chapter{Data Structures}
+\chapter{Concepts}

\section{Blocks}
-The fundamental cryptographic operations that can be performed on blocks are:
-\begin{itemize}
-\item Encrypt body.
-\item Decrypt body.
-\item Add a writecap.
-\item Sign
-\item Verify
-\end{itemize}
+A block is a sequence of bytes together with a sequence of events. The sequence of events defines
+how the sequence of bytes came to be in its current state: at any time, the bytes can be recreated
+by replaying the events in order, starting from an empty sequence of bytes. The sequence of events
+is therefore the canonical form of the block, and the sequence of bytes exists only to enable
+efficient reads.
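+
+The following Python sketch illustrates replaying. It is not part of the specification: it assumes,
+hypothetically, that every event is a write of some bytes at a byte offset (no event format is
+defined here), and that a write past the current end zero-fills any gap. The names
+\texttt{WriteEvent} and \texttt{replay} are illustrative.
```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class WriteEvent:
    """Hypothetical event: write `data` starting at byte `offset`."""
    offset: int
    data: bytes


def replay(events: Iterable[WriteEvent]) -> bytes:
    """Rebuild a block's bytes by applying its events, in order, to an empty buffer."""
    buf = bytearray()
    for ev in events:
        end = ev.offset + len(ev.data)
        if end > len(buf):
            buf.extend(bytes(end - len(buf)))  # assumption: gaps are zero-filled
        buf[ev.offset:end] = ev.data
    return bytes(buf)


log = [WriteEvent(0, b"hello world"), WriteEvent(6, b"there"), WriteEvent(11, b"!")]
print(replay(log))  # b'hello there!'
```
+Because the events are canonical, the bytes produced by \texttt{replay} behave like a cache which can
+be discarded and rebuilt at any time.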
+
+Blocks are hierarchical, with every block having at most one parent and zero or more children.
+If a block has children then it is called a directory, and its children are called its directory
+entries. This hierarchical structure allows us to identify blocks using their position in the
+hierarchy.
+
+\section{Paths}
+A path is a globally unique identifier assigned to a block. The path of the block defines its
+position in the hierarchy of blocks. The syntax of a path is as follows:
+\begin{verbatim}
+ COMP ::= '[-\w.!]'+
+ RelPath ::= COMP ('/' COMP)*
+ AbsPath ::= '/' RelPath?
+\end{verbatim}
+In other words, a path is a sequence of components, represented textually as `/'-separated
+fields. The empty sequence of components is called the root path.
+The root path is a directory, and its entries are the blocks of the global blocktree and links
+to every private blocktree.
+A path to the root of a private blocktree has exactly one component, which consists of a Standard
+Hash of the private blocktree's root credentials. Since any Standard Hash of the root credentials
+may be used, a private blocktree can have more than one such path; these are called the root paths
+of the private blocktree. A path that begins with one of them is called a private path, and a path
+that does not begin with a root path of any private blocktree is called public.
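+
+A minimal Python sketch of parsing the textual form into components follows. It reads the component
+class of the grammar above as word characters (which already include `\_'), `-' and `.', written
+as \verb|[-\w.!]| to avoid the ambiguous range \verb|\w-_|, and, as an assumption, it also admits
+`!' so that the principal syntax defined under Principals can appear as a component. The names
+\texttt{parse\_path} and \texttt{format\_path} are hypothetical.
```python
import re

# One path component. Assumption: besides word characters (which include '_'),
# '-' and '.', the class also admits '!' so that principal hashes written as
# <index>!<Base64Url> can be components.
COMP_RE = re.compile(r"[-\w.!]+")


def parse_path(text: str) -> tuple[str, ...]:
    """Parse an absolute path such as '/a/b' into ('a', 'b'); '/' parses to ()."""
    if not text.startswith("/"):
        raise ValueError(f"not an absolute path: {text!r}")
    if text == "/":
        return ()  # the root path: the empty sequence of components
    comps = tuple(text[1:].split("/"))
    for comp in comps:
        # Empty components (from '//' or a trailing '/') fail the match.
        if not COMP_RE.fullmatch(comp):
            raise ValueError(f"invalid component {comp!r} in {text!r}")
    return comps


def format_path(comps: tuple[str, ...]) -> str:
    """Inverse of parse_path."""
    return "/" + "/".join(comps)


print(parse_path("/apps/editor-1.2"))  # ('apps', 'editor-1.2')
print(format_path(()))                 # /
```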
+
+If the components of one path are a prefix of the components of a second, then we say the first
+path contains the second; in particular, every path contains itself. Thus every path is contained
+in the root path, and every private path is contained in a root path of a private blocktree.
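+
+Containment compares whole components rather than characters, so /doc does not contain /docs. A
+minimal sketch, representing a path as a tuple of its components (the helper name \texttt{contains}
+is hypothetical):
```python
Path = tuple[str, ...]  # a path as its sequence of components; () is the root path


def contains(outer: Path, inner: Path) -> bool:
    """True if `outer` contains `inner`, i.e. outer's components are a prefix of inner's."""
    return inner[: len(outer)] == outer


print(contains((), ("docs", "a.txt")))         # True: the root path contains every path
print(contains(("docs",), ("docs", "a.txt")))  # True
print(contains(("doc",), ("docs", "a.txt")))   # False: components are compared whole
print(contains(("docs", "a.txt"), ("docs",)))  # False: a longer path never contains a shorter one
```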
+
+In addition to identifying blocks, paths are used to scope capabilities and to address messages
+to nodes and to the processes they are running.
+
+\section{Principals}
+A principal is any entity which can be authenticated.
+All authentication in the blocktree system is performed using digital signatures, and as such
+principals are identified by a cryptographic hash of their public signing key. Support
+for the following hash algorithms is required:
+\begin{enumerate}
+ \item SHA2-256
+ \item SHA2-512
+\end{enumerate}
+These are referred to as the Standard Hash Algorithms, and a digest computed using one of them is
+referred to as a Standard Hash.
+
+When a principal is identified in a textual representation, such as the textual representation of
+a path, the following syntax is used:
+\begin{verbatim}
+ PrincipTxt ::= <hash algo index> '!' <Base64Url(hash)>
+\end{verbatim}
+where ``hash algo index'' is the base-10 string representation of the index of the hash algorithm
+in the above list, and ``Base64Url(hash)'' is the Base64Url encoding of the hash identifying the
+principal.
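+
+A minimal Python sketch of this syntax, and of the root paths of a private blocktree defined under
+Paths, follows. Two details are assumptions: the hash algorithm index is counted from 0 in the
+order of the list above, and the trailing `=' padding of the Base64Url encoding is dropped, since
+`=' is not a valid path character. The function names are hypothetical, and the key bytes in the
+usage line are a placeholder rather than a real public key.
```python
import base64
import hashlib

# Standard Hash Algorithms by index. Assumption: indices start at 0, in list order.
STANDARD_HASHES = {0: "sha256", 1: "sha512"}


def principal_text(public_signing_key: bytes, algo_index: int = 0) -> str:
    """Render a principal as <hash algo index>!<Base64Url(hash)>.

    Assumption: trailing '=' padding is stripped so the result is a valid path component.
    """
    digest = hashlib.new(STANDARD_HASHES[algo_index], public_signing_key).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{algo_index}!{encoded}"


def root_paths(root_public_key: bytes) -> set[str]:
    """Every root path of the private blocktree whose root principal owns this key."""
    return {"/" + principal_text(root_public_key, i) for i in STANDARD_HASHES}


def is_private_path_of(path: str, root_public_key: bytes) -> bool:
    """True if `path` begins with one of the blocktree's root paths."""
    comps = path.split("/")
    if len(comps) < 2 or comps[0] != "":
        return False  # not an absolute path
    return "/" + comps[1] in root_paths(root_public_key)


key = b"placeholder public signing key"  # stand-in bytes, not a real key
print(len(principal_text(key, 0)), len(principal_text(key, 1)))  # 45 88
```
+The lengths follow from the digest sizes: 32 bytes encode to 43 Base64Url characters without
+padding, and 64 bytes to 86, each preceded by the index and `!'.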
+
+Principals can be issued capabilities which allow them to read and write to blocks.
+These capabilities are scoped by paths, which limit the set of blocks the capability can be used
+for: a capability can only be used on a block whose path is contained in the path of the
+capability. This access control mechanism is enforced cryptographically, as described below.
+
+A principal can grant a capability to another principal so long as the path of the granted
+capability is contained in the path of a capability possessed by the granting principal. The
+specific mechanism differs depending on whether the capability is for reading or writing a block.
+
+Every private blocktree is associated with a root principal. The path containing only one
+component, which consists of a Standard Hash of the root principal's public signing key, is a root
+path to the private blocktree. The root principal has read and write capabilities for the private
+blocktree's root, and so can grant subordinate capabilities scoped to any block in the private
+blocktree.
+
+\section{Readcaps}
+In order to protect the confidentiality of the data stored in a block, a symmetric cipher is used.
+The key for this cipher is called the block key, and by controlling access to it we can control
+which principals can read the block.
+
+The metadata of every block contains a dictionary of zero or more entries called read capabilities,
+or readcaps for short. Each key in this dictionary is the hash identifying the principal to which
+the readcap was issued, and the corresponding value is the block key encrypted using that
+principal's public encryption key.
+
+The block key is also encrypted using the block key of the parent block, and the resulting
+ciphertext is stored in the block's metadata. This is referred to as the inherited readcap.
+Hence, a principal which has a read capability for a given path can read every block whose path is
+contained in it: it decrypts the block key at that path, then uses each block key in turn to
+decrypt the inherited readcap of the child below it. Further, a principal can use its readcap
+to decrypt the block key and then re-encrypt it using the public encryption key of another
+principal, thus granting that principal a subordinate readcap.
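+
+The following Python sketch traces how a principal recovers a block key from its deepest readcap
+on the path, and how it grants a subordinate readcap. It performs no cryptography:
+\texttt{seal} and \texttt{unseal} are hypothetical stand-ins which only record the key a value was
+encrypted under, and the block keys, principals and paths are illustrative strings. The names
+\texttt{BlockMeta}, \texttt{block\_key} and \texttt{grant\_readcap} are likewise hypothetical.
```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-in for encryption: it records which key sealed a value and
# refuses to unseal it with any other key. It provides NO confidentiality.
@dataclass(frozen=True)
class Sealed:
    key: str
    payload: str


def seal(key: str, payload: str) -> Sealed:
    return Sealed(key, payload)


def unseal(key: str, box: Sealed) -> str:
    if box.key != key:
        raise PermissionError("sealed under a different key")
    return box.payload


@dataclass
class BlockMeta:
    # principal -> block key sealed to that principal's public encryption key
    readcaps: dict[str, Sealed] = field(default_factory=dict)
    # block key sealed under the parent's block key (the inherited readcap)
    inherited: Optional[Sealed] = None


Path = tuple[str, ...]


def block_key(principal: str, path: Path, blocks: dict[Path, BlockMeta]) -> str:
    """Recover the block key for `path` from the deepest readcap held on it or an ancestor."""
    for depth in range(len(path), -1, -1):
        meta = blocks.get(path[:depth])
        if meta is not None and principal in meta.readcaps:
            key = unseal(principal, meta.readcaps[principal])
            # Walk back down: each block key unlocks its child's inherited readcap.
            for d in range(depth + 1, len(path) + 1):
                child = blocks[path[:d]]
                assert child.inherited is not None, "plaintext blocks are not modelled"
                key = unseal(key, child.inherited)
            return key
    raise PermissionError(f"{principal} holds no readcap covering {path}")


def grant_readcap(granter: str, grantee: str, path: Path, blocks: dict[Path, BlockMeta]) -> None:
    """Decrypt the block key using the granter's access and re-seal it to the grantee."""
    blocks[path].readcaps[grantee] = seal(grantee, block_key(granter, path, blocks))


# A hypothetical private blocktree rooted at ("T",): T -> docs -> report.
blocks = {
    ("T",): BlockMeta(readcaps={"root": seal("root", "k-T")}),
    ("T", "docs"): BlockMeta(inherited=seal("k-T", "k-docs")),
    ("T", "docs", "report"): BlockMeta(inherited=seal("k-docs", "k-report")),
}
grant_readcap("root", "alice", ("T", "docs"), blocks)
print(block_key("alice", ("T", "docs", "report"), blocks))  # k-report
```
+Alice can now read \texttt{docs} and everything below it, but not \texttt{T} itself, because no
+readcap or inherited readcap leads her to the block key of \texttt{T}.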
+
+A block that does not require confidentiality protection need not be encrypted. In this case, the
+table of readcaps is empty and the inherited readcap is set to a flag value indicating that the
+block is stored as plaintext. Blocks in the global blocktree are never encrypted.

\section{Writecaps}
-The cryptographic operations that can be performed on a writecap are:
-\begin{itemize}
-\item Sign
-\item Verify
-\end{itemize}
+A write capability, or writecap for short, is a certificate chain which extends from a node's
+credentials back to the root signing key of a private blocktree.
+Each certificate in this chain contains the public key of the principal who issued it and the path
+to which it grants write access. The path of each certificate must be contained in the path of the
+next certificate in the chain, and the chain must terminate in a certificate which is self-signed
+using the root signing credentials. Each certificate also contains an expiration time, represented
+by an Epoch value, and a node shall consider a writecap invalid if any of its certificates has an
+expiration time which the node judges to be in the past.
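+
+A minimal Python sketch of the validity check follows. The \texttt{Cert} fields, the ordering of
+the chain (the node's certificate first), the \texttt{issued\_to} field naming each certificate's
+holder, and the requirement that the root key match the first component of the block's path are all
+assumptions. So that it runs on the standard library alone, the sketch uses an HMAC as a stand-in
+for a digital signature; an HMAC is not a signature scheme, and a real implementation would verify
+public-key signatures instead. Checking the signature on the metadata itself is omitted.
```python
from __future__ import annotations

import hashlib
import hmac
from dataclasses import dataclass, replace

Path = tuple[str, ...]


# Stand-in so the sketch runs on the standard library. An HMAC is NOT a digital
# signature: here the "public key" is also the secret used to produce the tag.
def sign(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key: bytes, msg: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(key, msg), sig)


def principal_of(key: bytes) -> str:
    # A principal is identified by a hash of its public signing key (hex for brevity).
    return hashlib.sha256(key).hexdigest()


def contains(outer: Path, inner: Path) -> bool:
    return inner[: len(outer)] == outer


@dataclass(frozen=True)
class Cert:
    issued_to: str      # hypothetical field: principal holding this certificate
    issuer_key: bytes   # public signing key of the principal who issued it
    path: Path          # path to which it grants write access
    expires: int        # expiration time as an Epoch value (seconds)
    signature: bytes = b""

    def body(self) -> bytes:
        return repr((self.issued_to, self.issuer_key, self.path, self.expires)).encode()


def issue(issued_to: str, issuer_key: bytes, path: Path, expires: int) -> Cert:
    cert = Cert(issued_to, issuer_key, path, expires)
    return replace(cert, signature=sign(issuer_key, cert.body()))


def writecap_is_valid(chain: list[Cert], block_path: Path, now: int) -> bool:
    """chain[0] is held by the writing node; chain[-1] is self-signed by the root key."""
    if not chain or not contains(chain[0].path, block_path):
        return False
    for cert, nxt in zip(chain, chain[1:] + [None]):
        if not verify(cert.issuer_key, cert.body(), cert.signature):
            return False
        if cert.expires < now:  # expiration time is in the past
            return False
        if nxt is not None:
            # The issuer of this certificate must hold the next one, and this
            # certificate's path must be contained in the next one's path.
            if nxt.issued_to != principal_of(cert.issuer_key) or not contains(nxt.path, cert.path):
                return False
    root = chain[-1]
    root_principal = principal_of(root.issuer_key)
    # Self-signed, and (assumption) anchored at the root path of the block's blocktree.
    return root.issued_to == root_principal and block_path[:1] == (root_principal,)


root_key, node_key = b"root signing key", b"node signing key"  # placeholder bytes
tree = principal_of(root_key)  # hypothetical first component of the blocktree's paths
chain = [
    issue(principal_of(node_key), root_key, (tree, "docs"), expires=2_000),
    issue(tree, root_key, (tree,), expires=3_000),
]
print(writecap_is_valid(chain, (tree, "docs", "a.txt"), now=1_000))  # True
print(writecap_is_valid(chain, (tree, "music"), now=1_000))         # False: outside granted path
print(writecap_is_valid(chain, (tree, "docs"), now=2_500))          # False: first cert expired
```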
+
+A block's metadata contains a writecap, which allows the metadata to be verified: its digital
+signature is checked using the first certificate in the writecap, and then the writecap itself is
+checked for validity. This ensures that the chain of trust from the root signing credentials
+extends to the block's metadata. To extend this chain to the block's data, a Merkle Tree is used,
+as described below. A separate mechanism is used to ensure the integrity of blocks in the global
+blocktree.
+
+\section{Sectors}
+A block's data is logically broken up into fixed-size sectors, and these are the units of
+confidentiality protection, integrity protection and consensus. When a process performs writes on
+an open block, those writes are buffered until a sector is filled (or until the process calls
+flush). Once this happens, the contents of the buffer are encrypted using the block key and then
+hashed. This hash, along with the offset at which the write occurred, is then used to update the
+block's Merkle Tree. Once the new root of the Merkle Tree is computed, it is copied into the
+block's metadata. After ensuring its writecap is copied into the metadata, the node then signs
+the metadata using its private signing key.
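+
+The write path can be sketched in Python as follows, under several simplifying assumptions: writes
+are sequential appends; a sector's position in the Merkle Tree is its index, that is, the write
+offset divided by the sector size; the leaf level is padded to a power of two with the SHA2-256
+hash of the empty string; and encryption is passed in as a function, for which the usage below
+substitutes the identity function (no confidentiality). Signing the metadata is omitted, and the
+names \texttt{SectorWriter} and \texttt{merkle\_root} are hypothetical.
```python
import hashlib
from typing import Callable

EMPTY_LEAF = hashlib.sha256(b"").digest()  # assumption: padding value for unused leaves


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a binary Merkle Tree whose leaf level is padded to a power of two."""
    level = list(leaves) or [EMPTY_LEAF]
    while len(level) & (len(level) - 1):  # not yet a power of two
        level.append(EMPTY_LEAF)
    while len(level) > 1:
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


class SectorWriter:
    """Buffers sequential writes and commits them a sector at a time."""

    def __init__(self, sector_size: int, encrypt: Callable[[bytes], bytes]) -> None:
        self.sector_size = sector_size
        self.encrypt = encrypt          # stand-in for encryption with the block key
        self.buf = bytearray()          # bytes of the current, not yet full, sector
        self.index = 0                  # sector index = write offset // sector_size
        self.leaves: list[bytes] = []   # hash of each encrypted sector
        self.root = merkle_root([])     # value that would be copied into the metadata

    def write(self, data: bytes) -> None:
        self.buf += data
        while len(self.buf) >= self.sector_size:
            self._commit(bytes(self.buf[: self.sector_size]))
            del self.buf[: self.sector_size]
            self.index += 1

    def flush(self) -> None:
        # A partial sector is committed but stays buffered, so later writes
        # extend it and it is committed again once it fills.
        if self.buf:
            self._commit(bytes(self.buf))

    def _commit(self, plaintext: bytes) -> None:
        leaf = sha256(self.encrypt(plaintext))  # encrypt, then hash
        if self.index < len(self.leaves):
            self.leaves[self.index] = leaf       # replaces a previously flushed sector
        else:
            self.leaves.append(leaf)
        # The metadata would now be updated with this root and signed (omitted).
        self.root = merkle_root(self.leaves)


w = SectorWriter(sector_size=4, encrypt=lambda b: b)  # identity: NO confidentiality
w.write(b"abcdefg")  # commits sector 0 ("abcd"); "efg" stays buffered
w.flush()            # commits the partial sector 1 ("efg")
w.write(b"h")        # sector 1 fills ("efgh") and is committed again
print(len(w.leaves), w.root == sha256(sha256(b"abcd") + sha256(b"efgh")))  # 2 True
```
+For brevity the root is recomputed from every leaf after each commit; updating only the nodes on
+the path from the changed leaf to the root would cost a number of hashes logarithmic in the number
+of sectors.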
+
+When the block is later opened, its metadata is verified using the process described in the
+previous section. Then the value of the root Merkle Tree node in the metadata is compared to the
+root of the Merkle Tree stored for the block. If they match, then reads at any offset into the
+block can be verified using the Merkle Tree. Otherwise, the block is rejected as invalid.
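+
+Verifying a read can be sketched as recomputing the root from the sector being read and the hashes
+of its siblings on the way up the tree, then comparing the result with the root taken from the
+verified metadata. The Python below assumes a binary tree over the SHA2-256 hashes of the encrypted
+sectors, with the leaf level padded to a power of two, and locates the sector holding a byte offset
+by dividing the offset by the sector size. The tree layout and the function names are hypothetical.
```python
import hashlib

EMPTY_LEAF = hashlib.sha256(b"").digest()  # assumption: padding value for unused leaves


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Every level of the tree, leaves first; the leaf level is padded to a power of two."""
    level = list(leaves) or [EMPTY_LEAF]
    while len(level) & (len(level) - 1):
        level.append(EMPTY_LEAF)
    levels = [level]
    while len(level) > 1:
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def sibling_path(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes from leaf `index` up to, but not including, the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # the other child of the same parent
        index //= 2
    return path


def verify_sector(encrypted_sector: bytes, index: int, path: list[bytes], root: bytes) -> bool:
    """Hash the sector and combine it with its siblings; the result must equal the root."""
    node = sha256(encrypted_sector)
    for sibling in path:
        node = sha256(node + sibling) if index % 2 == 0 else sha256(sibling + node)
        index //= 2
    return node == root


sector_size = 4
sectors = [b"sec0", b"sec1", b"sec2"]  # stand-ins for encrypted sectors
levels = build_levels([sha256(s) for s in sectors])
root = levels[-1][0]                   # must equal the root in the verified metadata
i = 9 // sector_size                   # sector holding byte offset 9, i.e. index 2
print(verify_sector(b"sec2", i, sibling_path(levels, i), root))      # True
print(verify_sector(b"tampered", i, sibling_path(levels, i), root))  # False
```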
+
+The sector size can be configured for each block at creation time. The sector size cannot be
+changed after creation, but the same effect can be achieved by creating a new block with the
+desired sector size, copying the data from the old block into it, deleting the old block and
+renaming the new one. The choice of sector size for a given block represents a tradeoff between the
+amount of space the Merkle Tree over the block's contents will occupy and the latency experienced
+by the initial read of a sector. With a larger sector size, the Merkle Tree is smaller, but more
+data has to be hashed and decrypted when each sector is read. The optimal choice therefore depends
+on the size of the block and its intended usage.
+
+Note that the entire Merkle Tree for a block is held in memory for as long as the block is open.
+So, assuming the 32-byte SHA2-256 hash and the default 4 KiB sector size are used, if a 3 GiB block
+is opened then its Merkle Tree will occupy approximately 64 MiB of memory, but will enable fast
+random access. Conversely, if 64 KiB sectors are used, the Merkle Tree for the same 3 GiB block will
+occupy approximately 4 MiB, but its random access performance will suffer from increased latency.
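+
+These figures can be checked with a short calculation. The Python below assumes the Merkle Tree is
+a binary tree whose leaf level, one hash per sector, is padded to the next power of two, so that a
+tree with $L$ leaves stores $2L - 1$ hashes; under that assumption it reproduces both
+approximations. The function name is hypothetical.
```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30


def merkle_tree_bytes(block_size: int, sector_size: int, hash_size: int = 32) -> int:
    """Memory for every node of a Merkle Tree over a block's sectors.

    Assumption: the leaf level is padded to a power of two, so L leaves give 2L - 1 nodes.
    """
    sectors = -(-block_size // sector_size)         # ceiling division
    leaves = 1 << max(sectors - 1, 0).bit_length()  # smallest power of two >= sectors
    return (2 * leaves - 1) * hash_size


print(f"{merkle_tree_bytes(3 * GiB, 4 * KiB) / MiB:.1f} MiB")   # 64.0 MiB
print(f"{merkle_tree_bytes(3 * GiB, 64 * KiB) / MiB:.1f} MiB")  # 4.0 MiB
```
+A 3 GiB block has 786,432 sectors of 4 KiB, which pads to $2^{20}$ leaves, or 49,152 sectors of
+64 KiB, which pads to $2^{16}$. Without the padding the same trees would need about 48 MiB and
+3 MiB respectively, so the figures above presumably count a padded tree.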
+
+\section{Nodes}
+The independent computing systems participating in the blocktree system are called nodes. A node
+need not run on hardware distinct from other nodes, but it is logically distinct from all other
+nodes. On contemporary operating systems, a node can be implemented as a process.
+A node possesses its own unique credentials, which consist of two public and private key pairs,
+one used in an encryption scheme and the other in a signing scheme. A node is a principal,
+and a hash of its public signing key is used to identify it. In a slight abuse of language, this
+hash is referred to as the node's principal.
+
+Nodes are identified by paths. We say that a node is attached to the blocktree at the directory
+containing it. A node is responsible for storing the blocks contained in the directory where it is
+attached. This allows data storage to scale as more nodes are added to a blocktree.
+
+When a directory contains more than one node, a cluster is formed.
+The nodes in the cluster run the Raft consensus protocol in order to agree on the sequence
+of events which constitutes each of the blocks contained in the directory where they are attached.
+This allows for redundancy, load balancing, and increased performance, as different sectors
+of a block can be read from multiple nodes in the cluster concurrently.
+
+\section{Processes}
+Nodes run code, and that running code is called a process. Processes are spawned by the node on
+which they are running, based on configuration stored in the node's directory. The code from which
+they are spawned is also stored in a block, though that block need not be contained in the
+directory of the node running it. So, for example, a software developer can publish their code in
+their blocktree, and a user can run it by configuring a node with the path it was published to.
+Code which is stored in its own directory with a manifest describing it is called an app.
+
+There are two types of apps: portable and native. A portable app is a collection of WebAssembly
+(Wasm) modules which are executed by the node in a special Wasm runtime. This runtime exposes a
+messaging API which can be used to perform block IO and to communicate with other processes and
+nodes. Native apps, on the other hand, are container images containing binaries compiled for a
+specific target architecture. These containers can access blocktree messaging services via a
+native library using the C ABI, and they have access to blocks in their node's directory via a
+POSIX-compatible filesystem API. The software which runs the node itself is distributed using this
+mechanism.
+
+Regardless of their type, all apps require permissions to communicate with the outside world, and
+by default they are only allowed to communicate with their parent.

\chapter{Nodes}