@@ -267,19 +267,20 @@ fields. The empty sequence of components is called the root path.
The root path is a directory, and its entries are the blocks of the global blocktree and links
to every private blocktree.
A path to a private blocktree has only
-one component, consisting of the Hash of the private blocktree's root credentials. Any path with
-only one components which consists of a Standard Hash
-of the blocktree's root credentials is a valid path to the blocktree. These paths are
+one component, consisting of a hash of the private blocktree's root public signing key. Any path
+with only one component which consists of a fingerprint
+of the blocktree's root public signing key is a valid path to the blocktree. These paths are
called the root paths of the private blocktree, and any path that begins with one of them
-is simply called a private path. Conversely, paths that do not begin with them are called public.
+is simply called a private path. Conversely, paths that do not begin with them are called public
+paths.
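The following is a minimal sketch of the private/public classification. It models a path as a tuple of component strings. It also assumes that a component is recognized as the root path of a private blocktree purely by matching the `PrincipTxt` fingerprint syntax from the Principals section. The regular expression and the function names are hypothetical.

```python
import re
from typing import Tuple

# A path is modelled as a tuple of components; () is the root path.
Path = Tuple[str, ...]

# Hypothetical matcher for the fingerprint syntax
#   <hash algo index> '!' <Base64Url(hash)>
# (Base64Url alphabet, optional '=' padding).
FINGERPRINT_RE = re.compile(r"[0-9]+![A-Za-z0-9_-]+=*")


def is_fingerprint(component: str) -> bool:
    """True if the component is syntactically a fingerprint."""
    return FINGERPRINT_RE.fullmatch(component) is not None


def is_private(path: Path) -> bool:
    """A private path begins with a root path of some private blocktree."""
    return len(path) > 0 and is_fingerprint(path[0])


def is_public(path: Path) -> bool:
    """Every path that is not private is public."""
    return not is_private(path)
```

Under this model the root path `()` is public, because it does not begin with the root path of any private blocktree.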

-If one path is a prefix of a second, then we say the first path contains the second. Thus every path
-is contained in the root path, and every private path is contained in the root path of a private
-blocktree.
+If one path is a prefix of a second, then we say the first path contains the second and that the
+second is nested in the first.
+Thus every path is contained in the root path, and every private path is contained in the root path
+of a private blocktree.
+Note that if two paths are equal, then each contains the other.
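Containment is a prefix test on the sequence of components. Here is a minimal sketch, again modelling a path as a tuple of strings. The function names are illustrative and are not part of the specification.

```python
from typing import Tuple

Path = Tuple[str, ...]


def contains(first: Path, second: Path) -> bool:
    """True if `first` is a prefix of `second`, i.e. `first` contains `second`."""
    return len(first) <= len(second) and second[: len(first)] == first


def nested_in(second: Path, first: Path) -> bool:
    """`second` is nested in `first` exactly when `first` contains `second`."""
    return contains(first, second)
```

The comparison is made on whole components, not characters. For example, `("a",)` does not contain `("ab",)`, even though the string `"a"` is a prefix of `"ab"`.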

-In addition to
-identifying blocks, paths are used to scope capabilities and to address
-messages to nodes and to the processes they're running.
+In addition to identifying blocks, paths are used to scope capabilities and to address messages.

\section{Principals}
A principal is any entity which can be authenticated.
@@ -293,29 +294,31 @@ for the following hash algorithms is required:
These are referred to as the Standard Hash Algorithms and a digest computed using one of them is
referred to as a Standard Hash.

-When a principal is identified in a textural representation, such as the textural representation of
-a path, then the following syntax is used:
+When a principal is identified in a textual representation, such as in a path,
+the following syntax is used:
\begin{verbatim}
PrincipTxt ::= <hash algo index> '!' <Base64Url(hash)>
\end{verbatim}
where ``hash algo index'' is the base 10 string representation of the index of the hash algorithm in
the above list, and ``Base64Url(hash)'' is the Base64Url encoding of the hash data identifying the
-principal.
+principal. Such a textual representation is referred to as a fingerprint of the public key from
+which the hash was computed. In a slight abuse of language, we sometimes refer to the fingerprint or
+hash data as a principal, even though these data merely identify a principal.
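Here is a sketch of producing and parsing a fingerprint. The list of Standard Hash Algorithms is not reproduced in this excerpt, so the index used for SHA-256 below is a placeholder. Keeping the `=` padding in the Base64Url output is also an assumption.

```python
import base64
import hashlib
from typing import Tuple

# Placeholder value: the actual index of SHA-256 in the list of Standard
# Hash Algorithms is defined by the specification and not shown here.
HYPOTHETICAL_SHA256_INDEX = 0


def fingerprint(public_key: bytes) -> str:
    """Render PrincipTxt ::= <hash algo index> '!' <Base64Url(hash)> using SHA-256."""
    digest = hashlib.sha256(public_key).digest()
    # Base64Url encoding; '=' padding is kept here, which is an assumption.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    return f"{HYPOTHETICAL_SHA256_INDEX}!{encoded}"


def parse_fingerprint(text: str) -> Tuple[int, bytes]:
    """Split a fingerprint into (hash algorithm index, raw hash bytes)."""
    index, sep, encoded = text.partition("!")
    if not sep or not index.isdecimal():
        raise ValueError(f"not a fingerprint: {text!r}")
    return int(index), base64.urlsafe_b64decode(encoded)
```

Different Standard Hash Algorithms produce different digests of the same key. As a result, one principal can have several distinct fingerprints.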

Principals can be issued capabilities which allow them to read and write to blocks.
These capabilities are scoped by paths, which limit the set of blocks the capability can be used
-for. A capability can only be used on a block whose path is contained in the path of the
-capability. This access
-control mechanism is enforced cryptographically, as described below.
+on. A capability can only be used on a block whose path is contained in the path of the
+capability. This access control mechanism is enforced cryptographically.

A principal can grant a capability to another principal so long as the granted capability has a
-path which is contained in the capability possessed by the granting node. The specific mechanisms
-for how this is done differs depending on whether the capability is reading or writing to a block.
+path which is contained in the path of the capability possessed by the granting principal. The
+specific mechanisms for how this is done differ depending on whether the capability is for
+reading or writing to a block.
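The path-scoping rules for using and granting capabilities both reduce to the containment test. The minimal sketch below checks only those rules and omits the cryptographic enforcement. The `Capability` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Path = Tuple[str, ...]


def contains(first: Path, second: Path) -> bool:
    """`first` contains `second` when `first` is a prefix of `second`."""
    return second[: len(first)] == first


@dataclass(frozen=True)
class Capability:
    holder: str  # fingerprint of the principal holding the capability (hypothetical field)
    path: Path   # the path that scopes the capability


def can_use(cap: Capability, block_path: Path) -> bool:
    """A capability may only be used on blocks whose path it contains."""
    return contains(cap.path, block_path)


def can_grant(granter: Capability, granted_path: Path) -> bool:
    """A granted capability's path must be contained in the granter's path."""
    return contains(granter.path, granted_path)
```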

Every private blocktree is associated with a root principal. The path containing only one
-component which consists of a hash of the root principal's public key is a root path to the private
-block tree. The root principal has read and write capabilities for the private blocktree's root,
-and so can grant subordinate capabilities scoped to any block in the private blocktree.
+component which consists of a fingerprint of the root principal's public signing key is a root path
+to the private blocktree. The root principal has read and write capabilities for the private
+blocktree's root, and so can grant subordinate capabilities scoped to any block in that blocktree.

\section{Readcaps}
In order to protect the confidentiality of the data stored in a block, a symmetric cypher is used.
@@ -323,16 +326,16 @@ The key for this cipher is called block key, and by controlling access to it we
principals can read the block.

The metadata of every block contains a dictionary of zero or more entries called read capabilities,
-or readcaps for short. A key in this dictionary is a hash of the principal for which the
-readcap was issued and the value is the encryption of the block key using the principal's public
-encryption key.
+or readcaps for short. A key in this dictionary is a hash of the public signing key of the principal
+for which the readcap was issued, and the value is the block key encrypted using the
+principal's public encryption key.

The block key is also encrypted using the block key of the parent block, and the resulting cipher
-text is stored in the block's metadata. This is referred to as the inherited readcap.
+text is also stored in the block's metadata. This is referred to as the inherited readcap.
Hence, a principal which has a read capability for a
-given path can read all paths contained in it as well. Further, a principal can use it's readcap
-to decrypt the block key, and then re-encrypt it using the public encryption key of another
-principal, thus allowing the principal to grant a subordinate readcap.
+given path can read all paths contained in it as well. Further, a principal can use its
+private encryption key to decrypt the block key, and then re-encrypt it using the public encryption
+key of another principal, thereby granting that principal a subordinate readcap.
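This minimal sketch shows why a readcap for a directory gives read access to everything nested in it. Each block key can be recovered from its parent's block key by decrypting the inherited readcap, one level at a time. `toy_cipher` is a deliberately insecure XOR stand-in for the unspecified symmetric cipher. `BlockMeta`, `new_child` and `descend` are hypothetical names.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """INSECURE stand-in for the block cipher: XOR 32-byte data with SHA-256(key).

    XOR is its own inverse, so the same call both encrypts and decrypts.
    """
    pad = hashlib.sha256(key).digest()
    return bytes(a ^ b for a, b in zip(data, pad))


@dataclass
class BlockMeta:
    # Readcaps: hash of a principal's public signing key -> block key encrypted
    # with that principal's public encryption key (not modelled in this sketch).
    readcaps: Dict[str, bytes] = field(default_factory=dict)
    # Inherited readcap: this block's key encrypted with the parent's block key.
    inherited_readcap: bytes = b""


def new_child(parent_key: bytes) -> Tuple[bytes, BlockMeta]:
    """Generate a random 32-byte block key and record its inherited readcap."""
    key = secrets.token_bytes(32)
    return key, BlockMeta(inherited_readcap=toy_cipher(parent_key, key))


def descend(ancestor_key: bytes, chain: List[BlockMeta]) -> bytes:
    """Recover the block key of the last block in `chain` from an ancestor's key.

    `chain` holds the metadata of each block on the path below the ancestor,
    ordered from the ancestor's child down to the target block.
    """
    key = ancestor_key
    for meta in chain:
        key = toy_cipher(key, meta.inherited_readcap)
    return key
```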

A block that does not require confidentiality protection need not be encrypted. In this case, the
table of readcaps is empty and the inherited readcap is set to a flag value to indicate the block
@@ -343,11 +346,11 @@ A write capability, or writecap for short, is a certificate chain which extends
credentials back to the root signing key of a private blocktree.
Each certificate in this chain contains
the public key of the principal who issued it and the path that it grants write capabilities to.
-The path of a certificate must be contained in the path of the next certificate in the chain, and
-the chain must terminate in a certificate which is self-signed by the root signing credentials.
+The path of a certificate must be contained in the path of the next certificate in the chain,
+and the chain must terminate in a certificate which is self-signed by the root signing credentials.
Each certificate also contains an expiration time, represented by an Epoch value, and a writecap
-shall be considered invalid by a node if any certificate has an expiration time which is judged by
-the node to be in the past.
+shall be considered invalid by a node if any certificate has an expiration time which it judges
+to be in the past.
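Below is a sketch of the structural checks a node might perform on a writecap. It assumes the chain runs from the node's own certificate, which is used to verify the metadata signature, to the root certificate. Under that ordering, each certificate's path must be contained in the path of the certificate after it. A hypothetical `self_signed_by_root` flag replaces signature verification, and a Unix timestamp in seconds stands in for the Epoch value. Both are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Path = Tuple[str, ...]


@dataclass(frozen=True)
class Certificate:
    issuer_key: bytes  # public key of the principal who issued the certificate
    path: Path         # path the certificate grants write capability to
    expires: float     # expiration time (Unix seconds stand in for an Epoch value)
    self_signed_by_root: bool = False  # hypothetical stand-in for signature checks


def contains(first: Path, second: Path) -> bool:
    """`first` contains `second` when `first` is a prefix of `second`."""
    return second[: len(first)] == first


def writecap_is_valid(chain: List[Certificate], root_key: bytes, now: float) -> bool:
    """Structural checks on a writecap ordered from the node's certificate to the root."""
    if not chain:
        return False
    # The chain must terminate in a certificate self-signed by the root.
    last = chain[-1]
    if not (last.self_signed_by_root and last.issuer_key == root_key):
        return False
    # Each certificate's path must be contained in the path of the one after it.
    for cert, nxt in zip(chain, chain[1:]):
        if not contains(nxt.path, cert.path):
            return False
    # Any certificate judged to have expired invalidates the whole writecap.
    return all(cert.expires > now for cert in chain)
```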

A block's metadata contains a writecap, which allows it to be verified by checking the digital
signature on the metadata using the first certificate in the writecap, and then checking that the
@@ -361,7 +364,7 @@ A block's data is logically broken up into fixed sized sectors, and these are th
confidentiality protection, integrity protection and consensus. When a process performs writes on
an open block, those writes are buffered until a sector is filled (or until the process calls
flush). Once this happens, the contents of the buffer are encrypted using the block key and then
-hashed. This hash, along with the offset at which the write occurred is then used to update the
+hashed. This hash, along with the offset at which the write occurred, is then used to update the
Merkle Tree for the block. Once the new root of the Merkle Tree is computed, it is copied into the
-block's metadata. After ensuring it's writecap is copied into the metadata, the node then signs
+block's metadata. After ensuring its writecap is copied into the metadata, the node then signs
the metadata using its private signing key.
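The write path just described can be sketched as a small buffer that seals a sector whenever it fills or when the process flushes. In this hypothetical `SectorWriter`, `encrypt` stands in for encryption under the block key. The per-offset SHA-256 hashes stand in for the Merkle Tree update. The flush behaviour is an assumption and is not specified by the text: a partial sector is sealed but kept buffered, so that later writes re-seal it at the same offset.

```python
import hashlib
from typing import Callable, Dict


class SectorWriter:
    """Hypothetical sketch of the buffered write path for one open block."""

    def __init__(self, sector_size: int, encrypt: Callable[[bytes], bytes]) -> None:
        self.sector_size = sector_size
        self.encrypt = encrypt             # stands in for encryption under the block key
        self.buffer = bytearray()          # plaintext of the sector being filled
        self.offset = 0                    # block offset of the sector being filled
        self.sector_hashes: Dict[int, bytes] = {}  # stands in for the Merkle Tree

    def write(self, data: bytes) -> None:
        """Buffer data, sealing every sector that becomes full."""
        self.buffer += data
        while len(self.buffer) >= self.sector_size:
            self._seal(bytes(self.buffer[: self.sector_size]))
            del self.buffer[: self.sector_size]
            self.offset += self.sector_size

    def flush(self) -> None:
        """Seal a partial sector now; it stays buffered so later writes re-seal it."""
        if self.buffer:
            self._seal(bytes(self.buffer))

    def _seal(self, plaintext: bytes) -> None:
        # Encrypt, then hash the ciphertext and record it against the sector's offset.
        ciphertext = self.encrypt(plaintext)
        self.sector_hashes[self.offset] = hashlib.sha256(ciphertext).digest()
```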
@@ -371,24 +374,25 @@ section. Then the value of the root Merkle Tree node in the metadata is compared
Merkle Tree for the block. If it matches, then reads to any offset into the block can be verified
using the Merkle Tree. Otherwise, the block is rejected as invalid.
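Here is a minimal Merkle Tree sketch showing how a single sector read can be checked against the root stored in the metadata. It relies on several assumptions that the text above does not fix:

- SHA-256 is used for both leaves and inner nodes.
- An inner node hash is `H(left || right)`.
- The leaf count is padded to a power of two with all-zero hashes.
- The sector index is `offset // sector_size`.

The function names are hypothetical.

```python
import hashlib
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(sectors: List[bytes]) -> List[List[bytes]]:
    """All tree levels, leaves first and the single root last.

    Leaves are hashes of the (encrypted) sectors, padded with all-zero hashes
    up to a power of two.
    """
    level = [h(s) for s in sectors]
    size = 1
    while size < len(level):
        size *= 2
    level += [bytes(32)] * (size - len(level))
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def proof(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes needed to recompute the root from leaf `index`."""
    siblings = []
    for level in levels[:-1]:
        siblings.append(level[index ^ 1])
        index //= 2
    return siblings


def verify_sector(sector: bytes, index: int, siblings: List[bytes], root: bytes) -> bool:
    """Check one sector read against the root stored in the block's metadata."""
    node = h(sector)
    for sibling in siblings:
        # An even index means the current node is a left child.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root
```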

-The sector size can be configured for each block at creation time. The sector size cannot be changed
-after creation, but the same effect can be achieved by creating a new block with the desired sector
-size, copying the data from the old block into, deleting the old block and renaming the new one.
+The sector size can be configured for each block individually at creation time,
+but cannot be changed afterwards.
+However, the same effect can be achieved by creating a new block with the desired sector
+size, copying the data from the old block into it, deleting the old block and renaming the new one.
The choice of sector size for a given block represents a tradeoff between the amount of space the
-Merkle Tree over the blocks contents will occupy and the latency experienced by the initial read to
+Merkle Tree occupies and the latency experienced by the initial read to
a sector. With a larger sector size, the Merkle Tree size is reduced, but more data has to be
hashed and decrypted when each sector is read. Depending on the size of the block and its intended
usage, the optimal choice will vary.

It's important to note that the entire Merkle Tree for a block is held in memory for as long as the
-block is open. So, assuming the 32 byte sha2 256 hash and the default 4 KiB sector size are used,
+block is open. So, assuming the 32-byte SHA-256 hash and the default 4 KiB sector size are used,
-if a 3 GiB file is opened then its MerkleTree will occupy approximately 64 MiB of memory, but will
+if a 3 GiB file is opened, then its Merkle Tree will occupy approximately 64 MiB of memory, but will
enable fast random access. Conversely, if 64 KiB sectors are used the Merkle Tree for the same
-3 GiB block will occupy approximately 4 MiB, but it's random access performance will suffer from
+3 GiB block will occupy approximately 4 MiB, but its random access performance may suffer from
increased latency.
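The figures above can be reproduced with some simple arithmetic. This assumes the tree is a complete binary tree with one leaf per sector, the leaf count rounded up to a power of two, and one 32-byte hash per node. Without the rounding, the 4 KiB case would come to roughly 48 MiB instead, so the padding assumption matters. The helper name is hypothetical.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30


def merkle_tree_bytes(block_size: int, sector_size: int, hash_len: int = 32) -> int:
    """Approximate memory held by a block's Merkle Tree.

    Assumes one leaf per sector, the leaf count rounded up to a power of two,
    and a complete binary tree of 2 * leaves - 1 nodes of `hash_len` bytes each.
    """
    sectors = -(-block_size // sector_size)  # ceiling division
    leaves = 1
    while leaves < sectors:
        leaves *= 2
    return (2 * leaves - 1) * hash_len
```

For the 3 GiB block, this gives 64 MiB with 4 KiB sectors and 4 MiB with 64 KiB sectors, each less 32 bytes.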

\section{Nodes}
-The independent computing systems participating in the blocktree system are called nodes. A node
+The independent computing systems participating in the Blocktree system are called nodes. A node
need not run on hardware distinct from other nodes, but it is logically distinct from all other
nodes. In all contemporary operating systems a node can be implemented as a process.
-A node posses it own unique credentials, which consist of two public and private key pairs,
+A node possesses its own unique credentials, which consist of two public and private key pairs,
@@ -396,9 +400,13 @@ one used in an encryption scheme and another in a signing scheme. A node is a Pr
and a hash of its public signing key is used to identify it. In a slight abuse of language, this
hash is referred to as the node's principal.

-Nodes are identified by paths. We say that a node is attached to the blocktree at the directory
-containing it. A node is responsible for storing the blocks contained in the directory where it is
-attached. This allows data storage to scale as more nodes are added to a blocktree.
+Nodes are also identified by paths, and we say that a node is attached to the blocktree at its
+parent directory. A node is responsible for storing the blocks contained in that directory, unless
+there is a second node whose parent directory is contained in that of the first and itself contains
+the block's path.
+In other words, responsibility for a block falls to the node whose parent directory is closest to
+it, that is, the node attached to the longest directory path that contains the block's path.
+This allows data storage to scale as more nodes are added to a blocktree.
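Choosing the responsible node is a longest-prefix match over the directories where nodes are attached. The sketch below assumes a hypothetical mapping from each node's principal to the directory where it is attached. When several nodes share the winning directory they form a cluster, so the sketch returns all of them.

```python
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


def contains(first: Path, second: Path) -> bool:
    """`first` contains `second` when `first` is a prefix of `second`."""
    return second[: len(first)] == first


def responsible_nodes(block: Path, attached: Dict[str, Path]) -> List[str]:
    """Nodes responsible for storing `block`.

    `attached` maps each node's principal to the directory where it is attached.
    Only directories that contain the block are candidates; the deepest one wins,
    and if several nodes share it they form a cluster and are all returned.
    """
    candidates = {node: d for node, d in attached.items() if contains(d, block)}
    if not candidates:
        return []
    deepest = max(len(d) for d in candidates.values())
    return sorted(node for node, d in candidates.items() if len(d) == deepest)
```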

When a directory contains more than one node a cluster is formed.
The nodes in the cluster run the Raft consensus protocol in order to agree on the sequence
@@ -408,18 +416,20 @@ of a block can be read from multiple nodes in the cluster concurrently.

\section{Processes}
Nodes run code and that running code is called a process. Processes are spawned by the node on which
-their running based on configuration stored in the node's directory. The code which they're spawned
+they're running, based on configuration stored in the node's parent directory.
+The code which they're spawned
from is also stored in a block, though that block need not be contained in the directory of the
-node running it. So, for example, a software developer can publish their code in their
-blocktree, and a user can run it by configuring a node with the path it was published to. Code
-which is stored in it's own directory with a manifest describing it is called an app.
+node running it. So, for example, a software developer can publish code in their
+blocktree, and a user can run it by updating the configuration of a node with the path it was
+published to.
+Code which is stored in its own directory with a manifest describing it is called an app.

There are two types of apps, portable and native. A portable app is a collection of WebAssembly
(Wasm) modules which are executed by the node in a special Wasm runtime which exposes a
-messaging API which can be used to perform block IO and communicate with other processes and nodes.
+messaging API which can be used to perform block IO and communicate with other processes.
Native apps on the other hand are container images containing binaries compiled to a specific
-target architecture. These containers can access blocktree messaging services via native library
-using the C ABI, and they have access to blocks in their node's directory via a POSIX-compatible
+target architecture. These containers can access blocktree messaging services via a native library
+using the C ABI, and they have access to blocks via a POSIX-compatible
filesystem API. The software which runs the node itself is distributed using this mechanism.

Regardless of their type, all apps require permissions to communicate with the outside world, with