|
@@ -86,13 +86,6 @@ which allows anyone to verify that their contents were written by an authorized
|
|
|
Encryption can be optionally applied to sectors,
|
|
|
with the system handling key management.
|
|
|
The cryptographic mechanisms used to implement these protections are described in section 3.
|
|
|
-To reduce load on the sector service, and to allow the system to scale to a larger number of users,
|
|
|
-a peer-to-peer distribution system is implemented in the filesystem service.
|
|
|
-This system allows filesystem actors to download sectors from other filesystem actors
|
|
|
-that have the sectors in their local cache.
|
|
|
-The threat of malicious actors serving bad sector data is mitigated by the strong integrity
|
|
|
-protections applied to sectors.
|
|
|
-By using peer-to-peer distribution, the system can serve as a content delivery network.
|
|
|
|
|
|
|
|
|
One of the design goals of Blocktree is to facilitate the creation of composable distributed
|
|
@@ -113,6 +106,8 @@ communication protocol.
|
|
|
|
|
|
|
|
|
Blocktree is implemented in the Rust programming language.
|
|
|
+It currently only supports running on Linux,
|
|
|
+though porting it to other Unix-like operating systems should be straightforward.
|
|
|
Its source code is licensed under the Affero GNU Public License Version 3.
|
|
|
It can be downloaded at the project homepage at \url{https://blocktree.systems}.
|
|
|
Anyone interested in contributing to development is welcome to submit a pull request
|
|
@@ -541,11 +536,18 @@ they will be safe to communicate over any network.
|
|
|
This network encapsulation system could be used in other actors as well,
|
|
|
allowing a lightweight and secure VPN system to be built.
|
|
|
|
|
|
+
|
|
|
+Any modern computer system must include a GUI,
|
|
|
+because users expect one.
|
|
|
+For this reason Blocktree includes a web-based GUI called \texttt{btconsole} that can
|
|
|
+monitor the system, provision runtimes, and configure access control.
|
|
|
+\texttt{btconsole} is itself implemented as an actor in the runtime,
|
|
|
+and so has access to the same facilities as any other actor.
|
|
|
|
|
|
|
|
|
\section{Filesystem}
|
|
|
|
|
|
-The responsibility for serving data in the system is shared between the filesystem and sector
|
|
|
+The responsibility for serving data in Blocktree is shared between the filesystem and sector
|
|
|
services.
|
|
|
Most actors will access the filesystem through the filesystem service,
|
|
|
which provides a high-level interface that takes care of the cryptographic operations necessary to
|
|
@@ -600,6 +602,21 @@ The \texttt{SectorId} type is used to identify a sector.
|
|
|
}
|
|
|
\end{verbatim}
|
|
|
|
|
|
+
|
|
|
+The sector service persists sectors in a directory in its local filesystem,
|
|
|
+with each sector stored in a different file.
|
|
|
+The scheme used to name these files involves security considerations,
|
|
|
+and is described in the next section.
|
|
|
+When a sector is updated,
|
|
|
+a new local file is created with a different name containing the new contents.
|
|
|
+Rather than deleting the old sector file,
|
|
|
+it is overwritten by the creation of a hardlink to the new file,
|
|
|
+and the name that was used to create the new file is unlinked.
|
|
|
+This method ensures that the sector file is updated in one atomic operation
|
|
|
+and is used by other Unix programs.
|
|
|
+The sector service also uses the local filesystem to persist the replicated log it uses for Raft.
|
|
|
+This file serves as a journal of sector operations.
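The atomic update of a sector file can be illustrated with a minimal sketch. It is not the sector service's actual code: the function name \texttt{replace\_sector\_file} and the \texttt{.new} suffix are hypothetical, and the sketch uses \texttt{rename}, which on POSIX systems replaces an existing destination atomically, rather than the link-then-unlink sequence described above.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Atomically replaces the file at `dest` with `contents`.
///
/// Assumption: `rename` is used here as a stand-in for the hard link and
/// unlink steps described in the text. The temporary name is hypothetical.
pub fn replace_sector_file(dest: &Path, contents: &[u8]) -> io::Result<()> {
    // Write the new contents under a different name in the same directory,
    // so that the final rename stays within one filesystem.
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".new");
    let tmp = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(contents)?;
    // Flush the data to stable storage before the new name becomes visible.
    file.sync_all()?;
    drop(file);

    // Readers of `dest` observe either the old contents or the new contents,
    // never a partially written file.
    fs::rename(&tmp, dest)
}
```

Calling \texttt{replace\_sector\_file} twice on the same path leaves only the second contents on disk, with no temporary file left behind.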
|
|
|
+
|
|
|
|
|
|
Communication with the sector service is done by passing it messages of type \texttt{SectorMsg}.
|
|
|
\begin{verbatim}
|
|
@@ -747,19 +764,270 @@ then reading new sectors as soon as notifications are received.
|
|
|
These sectors can then be written into replica files in a different directory.
|
|
|
This ensures that the contents of the replicas will be updated in near real-time.
|
|
|
|
|
|
-\section{Cryptography}
|
|
|
-
|
|
|
-
|
|
|
-
|
|
|
-
|
|
|
-
|
|
|
-
|
|
|
-
|
|
|
+
|
|
|
+Because of the strong integrity protection afforded to sectors,
|
|
|
+it is possible for peer-to-peer distribution of sector data to be done securely.
|
|
|
+Implementing this mechanism is planned as a future enhancement to the system.
|
|
|
+The idea is to base the design on BitTorrent,
|
|
|
+where the sector service responsible for a file acts as a tracker for that file,
|
|
|
+and the file actors accessing the file communicate with one another directly using the information
|
|
|
+provided by the sector service.
|
|
|
+This could allow the system to scale to a much larger number of concurrent reads by reducing
|
|
|
+the load on the sector service.
|
|
|
+
|
|
|
+
|
|
|
+Being able to access the filesystem from actors allows a programmer to implement new applications
|
|
|
+using Blocktree,
|
|
|
+but there is an entire world of existing applications which only know how to access the local
|
|
|
+filesystem.
|
|
|
+To allow these applications access to Blocktree,
|
|
|
+a FUSE daemon is included which allows a Blocktree directory to be mounted to a directory in the
|
|
|
+local filesystem.
|
|
|
+This daemon can directly access the sector files in a local directory,
|
|
|
+or it can connect over the network to a filesystem or sector service provider.
|
|
|
+This FUSE daemon could be included in a system's initrd to allow it to mount its root filesystem
|
|
|
+from Blocktree,
|
|
|
+opening up many interesting possibilities for hosting machine images in Blocktree.
|
|
|
+A planned future enhancement is to develop a Blocktree filesystem driver which actually runs in
|
|
|
+kernel space.
|
|
|
+This would reduce the overhead associated with context switching from user space, to kernel space,
|
|
|
+and back to user space, for every filesystem interaction,
|
|
|
+making the system more practical to use for a root filesystem.
|
|
|
|
|
|
-
|
|
|
|
|
|
-
|
|
|
+\section{Cryptography}
|
|
|
+This section describes the cryptographic mechanisms used to protect the integrity and
|
|
|
+confidentiality of files.
|
|
|
+These mechanisms are based on well-established cryptographic constructions.
|
|
|
+
|
|
|
+
|
|
|
+File integrity is protected by a digital signature over its metadata.
|
|
|
+The metadata contains an integrity field, which holds the root node of a Merkle tree over
|
|
|
+the file's contents.
|
|
|
+This allows any sector in the file to be verified with a number of hash function invocations that
|
|
|
+is logarithmic in the size of the file.
|
|
|
+It also allows the sectors of a file to be verified in any order,
|
|
|
+enabling random access.
|
|
|
+The hash function used in the Merkle tree can be configured when the file is created.
|
|
|
+Currently, SHA-256 is the default, and SHA-512 is supported.
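The logarithmic, order-independent verification described above can be sketched as follows. This is an illustration under stated assumptions, not Blocktree's implementation: the function names are hypothetical, an odd node is paired with itself, and the standard library's \texttt{DefaultHasher} stands in for SHA-256 so that the sketch compiles without external crates. \texttt{DefaultHasher} is not cryptographically secure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in for SHA-256 (assumption: NOT secure, for illustration only).
fn h(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Hashes the concatenation of two child hashes into a parent hash.
fn h2(left: u64, right: u64) -> u64 {
    let mut buf = [0u8; 16];
    buf[..8].copy_from_slice(&left.to_be_bytes());
    buf[8..].copy_from_slice(&right.to_be_bytes());
    h(&buf)
}

/// Builds every level of the tree, leaves first, root last.
/// Assumes at least one sector; an odd node is paired with itself.
pub fn build_levels(sectors: &[&[u8]]) -> Vec<Vec<u64>> {
    let mut levels = vec![sectors.iter().map(|s| h(s)).collect::<Vec<u64>>()];
    while levels.last().unwrap().len() > 1 {
        let next: Vec<u64> = levels
            .last()
            .unwrap()
            .chunks(2)
            .map(|pair| h2(pair[0], *pair.get(1).unwrap_or(&pair[0])))
            .collect();
        levels.push(next);
    }
    levels
}

/// Collects the sibling hashes on the path from leaf `index` to the root.
pub fn proof(levels: &[Vec<u64>], mut index: usize) -> Vec<u64> {
    let mut path = Vec::new();
    for level in &levels[..levels.len() - 1] {
        path.push(*level.get(index ^ 1).unwrap_or(&level[index]));
        index /= 2;
    }
    path
}

/// Verifies one sector against the root using only its sibling path,
/// so the number of hash invocations is logarithmic in the sector count.
pub fn verify(root: u64, sector: &[u8], mut index: usize, path: &[u64]) -> bool {
    let mut acc = h(sector);
    for sibling in path {
        acc = if index % 2 == 0 { h2(acc, *sibling) } else { h2(*sibling, acc) };
        index /= 2;
    }
    acc == root
}
```

Because each sector carries its own sibling path, sectors can be checked in any order, which is what makes random access possible.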
|
|
|
+A file's metadata also contains a certificate chain,
|
|
|
+and this chain is used to authenticate the signature over the metadata.
|
|
|
+In Blocktree, the certificate chain is referred to as a \emph{writecap}
|
|
|
+because it grants the capability to write to files.
|
|
|
+The certificates in a valid writecap are ordered by their paths:
|
|
|
+the initial certificate contains the longest path,
|
|
|
+the path in each subsequent certificate must be a prefix of the one preceding it,
|
|
|
+and the final certificate must be signed by the root principal.
|
|
|
+These rules ensure that there is a valid delegation of write authority at every
|
|
|
+link in the chain,
|
|
|
+and that the authority is ultimately derived from the root principal specified by the absolute path
|
|
|
+of the file.
|
|
|
+By including all the information necessary to verify the integrity of a file in its metadata,
|
|
|
+it is possible for a requestor who only knows the path of a file to verify that the contents of the
|
|
|
+file are authentic.
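The ordering rules for a writecap can be sketched as a small check. The \texttt{Cert} struct and \texttt{check\_writecap} function below are hypothetical and heavily simplified: signature verification is reduced to a boolean flag, and a path is allowed to equal the one preceding it, which the text does not specify.

```rust
/// Hypothetical, simplified certificate: only the fields the rules refer to.
#[derive(Debug, Clone)]
pub struct Cert {
    /// Path components over which this certificate delegates write authority.
    pub path: Vec<String>,
    /// Stand-in for "the signature verifies under the root principal's key".
    pub signed_by_root: bool,
}

/// Checks the ordering rules stated in the text:
/// each subsequent path is a prefix of the one preceding it,
/// and the final certificate is signed by the root principal.
pub fn check_writecap(chain: &[Cert]) -> Result<(), &'static str> {
    let last = chain.last().ok_or("writecap contains no certificates")?;
    for pair in chain.windows(2) {
        // pair[0] precedes pair[1]; the later path must be a prefix of the earlier.
        if !pair[0].path.starts_with(&pair[1].path) {
            return Err("path is not a prefix of the preceding certificate's path");
        }
    }
    if !last.signed_by_root {
        return Err("final certificate is not signed by the root principal");
    }
    Ok(())
}
```

A real verifier would also check each certificate's signature against the key of the next certificate in the chain; that step is omitted here.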
|
|
|
|
|
|
+
|
|
|
+Confidentiality protection of files is optional, but when it is enabled,
|
|
|
+a file's sectors are individually encrypted using a symmetric cipher.
|
|
|
+The key to this cipher is randomly generated when a file is created.
|
|
|
+A different IV is generated for each sector by hashing the index of the sector with a
|
|
|
+randomly generated IV for the entire file.
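A minimal sketch of deriving a per-sector IV follows. The function name \texttt{sector\_iv}, the 16-byte IV width, and the use of the standard library's \texttt{DefaultHasher} in place of a cryptographic hash are all assumptions made so that the sketch is self-contained; the actual hash function and IV size depend on the configured cipher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Derives an IV for one sector from the file-wide IV and the sector index.
///
/// Assumption: `DefaultHasher` is a non-cryptographic stand-in, and the
/// 16-byte output is produced by hashing twice with a block counter.
pub fn sector_iv(file_iv: &[u8; 16], sector_index: u64) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (block, chunk) in out.chunks_mut(8).enumerate() {
        let mut hasher = DefaultHasher::new();
        hasher.write(file_iv);
        hasher.write(&sector_index.to_be_bytes());
        // The block counter makes the two 8-byte halves differ.
        hasher.write(&[block as u8]);
        chunk.copy_from_slice(&hasher.finish().to_be_bytes());
    }
    out
}
```

The derivation is deterministic, so only the file-wide IV needs to be stored, while every sector is still encrypted under a distinct IV.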
|
|
|
+A file's key and IV are encrypted using the public keys of the principals to whom read access is
|
|
|
+to be allowed.
|
|
|
+The resulting ciphertext is referred to as a \emph{readcap}, as it grants the capability to read the
|
|
|
+file.
|
|
|
+These readcaps are stored in a table in the file's metadata.
|
|
|
+Each entry in the table is identified by a byte string that is derived from the public key of the
|
|
|
+principal who owns the entry's readcap.
|
|
|
+The byte string is computed by calculating an HMAC of the principal's public key.
|
|
|
+The HMAC is keyed with a randomly generated salt that is stored in the file's metadata.
|
|
|
+An identifier for the hash function that was used in the HMAC is included in the byte string so
|
|
|
+that the HMAC can be recomputed later.
|
|
|
+When the filesystem service accesses the file,
|
|
|
+it recomputes the HMAC using the salt, its public key, and the hash function specified in each entry
|
|
|
+of the table.
|
|
|
+It can then identify the entry which contains its readcap,
|
|
|
+or that such an entry does not exist.
|
|
|
+This mechanism was designed to prevent offline correlation attacks on file metadata,
|
|
|
+as metadata is stored in plaintext in local filesystems.
|
|
|
+The file key and IV are also encrypted using the keys of the file's parents.
|
|
|
+Note that there may be multiple parents of a file because it may be hard linked to several
|
|
|
+directories.
|
|
|
+Each of the resulting ciphertexts is stored in another table in the file's metadata.
|
|
|
+The entries in this table are identified by an HMAC of the parent's generation and inode numbers,
|
|
|
+where the HMAC is keyed using the file's salt.
|
|
|
+By encrypting a file's key and IV using the key and IV of its parents,
|
|
|
+it is possible to traverse a directory tree using only a single public key decryption.
|
|
|
+The file where this traversal begins must contain a readcap owned by the accessing principal,
|
|
|
+but all subsequent accesses can be performed by decrypting the key and IV of a child using the
|
|
|
+key and IV of a parent.
|
|
|
+Not only does this allow traversals to use more efficient symmetric key cryptography,
|
|
|
+but it also means that it suffices to grant a readcap on a single directory in order to grant
|
|
|
+access to the entire tree rooted at that directory.
|
|
|
+
|
|
|
+
|
|
|
+Because it is not possible to change the key used by a file after it is created,
|
|
|
+a file must be copied in order to rotate the key used to encrypt it.
|
|
|
+Similarly, revoking a readcap is accomplished by creating a copy of the file
|
|
|
+and adding all the readcaps from the original's metadata except for the one being revoked.
|
|
|
+While it is certainly possible to remove a readcap from the metadata table,
|
|
|
+this is not supported because the readcap holder may have used custom software to save the file's
|
|
|
+key and IV while it had access to them,
|
|
|
+so data written to the same file after revocation could potentially be decrypted by it.
|
|
|
+By forcing the user to create a new file,
|
|
|
+they are forced to re-encrypt the data using a fresh key and IV.
|
|
|
+
|
|
|
+
|
|
|
+From an attacker's perspective,
|
|
|
+not every file in your domain is equally interesting.
|
|
|
+They may be particularly interested in reading your root directory,
|
|
|
+or they may have identified the inode of a file containing kompromat.
|
|
|
+To make it more difficult to identify offline which files the sectors in the local filesystem belong to,
|
|
|
+an obfuscation mechanism is used.
|
|
|
+This works by generating a random salt for each generation of the sector service,
|
|
|
+and storing it in the generation's superblock.
|
|
|
+It is hashed along with the inode and the sector ID to produce the file name of the sector file
|
|
|
+in the local filesystem.
|
|
|
+These files are arranged into different subdirectories according to the value of the first two
|
|
|
+digits in the hex encoding of the resulting hash,
|
|
|
+the same way git organizes object files.
|
|
|
+This simple method makes it more difficult for an attacker to identify the files each sector belongs
|
|
|
+to
|
|
|
+while still allowing the sector service efficient access.
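The naming scheme can be sketched as follows. The function name \texttt{sector\_file\_path} and its parameters are hypothetical, the standard library's \texttt{DefaultHasher} stands in for a cryptographic hash, and using the remaining hex digits as the file name follows git's convention rather than anything stated above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::path::PathBuf;

/// Computes the relative path of a sector file from the generation's salt,
/// the inode, and the sector identifier.
///
/// Assumption: `DefaultHasher` is a non-cryptographic stand-in, and the
/// sector identifier is modeled as a plain `u64`.
pub fn sector_file_path(salt: &[u8], inode: u64, sector: u64) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    hasher.write(salt);
    hasher.write(&inode.to_be_bytes());
    hasher.write(&sector.to_be_bytes());
    let hex = format!("{:016x}", hasher.finish());

    // The first two hex digits select the subdirectory, as git does for
    // object files; the remaining digits form the file name.
    let mut path = PathBuf::from(&hex[..2]);
    path.push(&hex[2..]);
    path
}
```

The sector service, which knows the salt, can recompute a sector's path directly, whereas someone who has only a directory listing sees names that reveal neither the inode nor the sector.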
|
|
|
+
|
|
|
+
|
|
|
+Processes need a way to securely store their credentials.
|
|
|
+They accomplish this by using a credential store,
|
|
|
+which is a type that implements the trait \texttt{CredStore}.
|
|
|
+A credential store provides methods for using a process's credentials to encrypt, decrypt,
|
|
|
+sign, and verify data,
|
|
|
+but it does not allow them to be exported.
|
|
|
+A credential store also provides a method for generating new root credentials.
|
|
|
+Because root credentials represent the root of trust for an entire domain,
|
|
|
+it must be possible to securely back them up from one credential store to another.
|
|
|
+Root credentials can also be used to perform cryptographic operations without exporting them.
|
|
|
+A password is set when the root credentials are generated,
|
|
|
+and this same password must be provided to use, export, and import them.
|
|
|
+When root credentials are exported from a credential store they are confidentiality protected
|
|
|
+using multiple layers of encryption.
|
|
|
+The outermost layer is encryption by a symmetric key cipher whose key is derived from the
|
|
|
+password.
|
|
|
+A public key of the receiving credential store must also be provided when root credentials are
|
|
|
+exported.
|
|
|
+This public key is used to perform the inner encryption of the root credentials,
|
|
|
+ensuring that only the intended credential store is able to import them.
|
|
|
+Currently there are two \texttt{CredStore} implementors in Blocktree,
|
|
|
+one which is used for testing and one which is more secure.
|
|
|
+The first is called \texttt{FileCredStore},
|
|
|
+and it uses a file in the local filesystem to store credentials.
|
|
|
+A symmetric cipher is used to protect the root credentials, if they are stored,
|
|
|
+but it relies on the security of the underlying filesystem to protect the process credentials.
|
|
|
+For this reason it is not recommended for production use.
|
|
|
+The other credential store is called \texttt{TpmCredStore},
|
|
|
+and it uses a Trusted Platform Module (TPM) 2.0 on the local machine to store credentials.
|
|
|
+The TPM is used to generate the process's credentials in such a way that they can never be
|
|
|
+exported from the TPM (this is a feature of TPM 2.0).
|
|
|
+A randomly generated cookie is needed to use these credentials.
|
|
|
+The cookie is stored in a file in the local filesystem with its permissions set to prevent
|
|
|
+others from accessing it.
|
|
|
+Thus this type also relies on the security of the local filesystem.
|
|
|
+However, an attacker would need to steal both the TPM and this cookie in order to steal a process's
|
|
|
+credentials.
|
|
|
+
|
|
|
+
|
|
|
+The term provisioning is used in Blocktree to refer to the process of acquiring credentials.
|
|
|
+A command line tool called \texttt{btprovision} is provided for provisioning credential stores.
|
|
|
+This tool can be used to generate new process or root credentials, create a certificate request
|
|
|
+using them, issue a new certificate, and finally to import the new certificate chain.
|
|
|
+When setting up a new domain,
|
|
|
+\texttt{btprovision} can create a new sector storage directory in the local filesystem
|
|
|
+and write the new process's files to it.
|
|
|
+It is also capable of connecting to the filesystem service if it is already running.
|
|
|
+
|
|
|
+
|
|
|
+While manual provisioning is necessary to bootstrap a domain,
|
|
|
+an automatic method is needed to make this process more ergonomic.
|
|
|
+When a runtime starts it checks its configured credential store to find the certificate chain to
|
|
|
+use for authenticating to other runtimes.
|
|
|
+If no such chain is stored,
|
|
|
+the runtime can choose to request a certificate from the filesystem service.
|
|
|
+This is done by dispatching a message with \texttt{call} to the filesystem service without
|
|
|
+specifying a scope.
|
|
|
+Because the message specifies no path, there is no root directory to begin discovery at.
|
|
|
+So, the runtime resorts to using link-local discovery to find other runtimes.
|
|
|
+Once one is discovered,
|
|
|
+the runtime connects to it anonymously
|
|
|
+and sends it a certificate request.
|
|
|
+This request includes a copy of the runtime's public key and, optionally, a path where the
|
|
|
+runtime would like to be located.
|
|
|
+This path is purely advisory;
|
|
|
+the filesystem service is free to place the runtime in any directory it sees fit.
|
|
|
+The filesystem service creates a new process file containing the public key and marks it as
|
|
|
+pending.
|
|
|
+The reply to the runtime contains the path of the file created for it.
|
|
|
+The operators of the domain can then use the web GUI or \texttt{btprovision} to view the request
|
|
|
+and approve it at their discretion.
|
|
|
+Assuming an operator approves the request,
|
|
|
+it uses its credentials and the public key in the new process's file to issue a certificate
|
|
|
+and then stores it in the file.
|
|
|
+Authorization attributes (e.g. UID and GID) are also assigned to the process and written into its
|
|
|
+file.
|
|
|
+Note that a process's file is normally not writeable by the process itself,
|
|
|
+so as to prevent it from setting its own authorization attributes.
|
|
|
+Once these data have been written to the process file,
|
|
|
+the runtime can read them to retrieve its new certificate chain.
|
|
|
+It stores this chain in its credential store for later use.
|
|
|
+The runtime can avoid polling its file for changes if it subscribes to write notifications.
|
|
|
+The runtime must close the anonymous connections it made
|
|
|
+and reconnect using the new certificate chain.
|
|
|
+Once new connections are established,
|
|
|
+it can read and write files using the authorization attributes specified in its file.
|
|
|
+Note that this procedure only works when the runtime is on the same LAN as another runtime.
|
|
|
+
|
|
|
+
|
|
|
+The procedure for creating a new domain is straightforward,
|
|
|
+and all the steps can be performed using \texttt{btprovision}.
|
|
|
+\begin{enumerate}
|
|
|
+ \item Generate the root credentials for the new domain.
|
|
|
+ \item Generate the credentials for the first runtime.
|
|
|
+ \item Create a certificate request using the runtime credentials.
|
|
|
+ \item Approve the request using the root credentials.
|
|
|
+ \item Import the new certificate into the credential store of the first runtime.
|
|
|
+\end{enumerate}
|
|
|
+The first runtime is configured to host the sector and filesystem services,
|
|
|
+so that subsequent runtimes will have access to the filesystem.
|
|
|
+After that, additional runtimes on the same LAN can be provisioned using the automatic process.
|
|
|
+
|
|
|
+
|
|
|
+To illustrate how these mechanisms can be used,
|
|
|
+consider a situation where two companies wish to partner in the development of a product.
|
|
|
+To facilitate their collaboration,
|
|
|
+they wish to have a way to securely exchange data with each other.
|
|
|
+One of the companies is selected to host the data
|
|
|
+and accepts the cost and responsibility of serving it.
|
|
|
+The host company creates a directory which will be used to store all of the data created during
|
|
|
+development.
|
|
|
+The other company will connect to the filesystem service in the host company's domain to access
|
|
|
+data in the shared directory.
|
|
|
+Each principal in the other company that wishes to connect requests to be credentialed in the
|
|
|
+shared directory.
|
|
|
+The hosting company manually reviews these requests and approves them,
|
|
|
+assigning each of the principals authorization attributes appropriate for its domain.
|
|
|
+This may involve issuing UID and GID values to each of the principals, or perhaps SELinux contexts.
|
|
|
+The actual set of attributes supported is determined by the \texttt{Authorization} type used
|
|
|
+by the filesystem service in the host company's domain.
|
|
|
+Once the principals have their credentials,
|
|
|
+they can dispatch messages to the filesystem service using the shared directory as the scope and
|
|
|
+setting the rootward field to true.
|
|
|
+This allows actors authenticating with the credentials of these principals to perform all filesystem
|
|
|
+operations authorized by the hosting company.
|
|
|
+This situation gives the hosting company a lot of control over the data.
|
|
|
+If the other company wishes to protect its investment in the R\&D effort,
|
|
|
+it should subscribe to write events on the shared directory and the files in it so that it can
|
|
|
+copy new sectors out of the host company's domain as soon as they are written.
|
|
|
+Note that although it is not possible to directly subscribe to writes on the contents of a
|
|
|
+directory, by monitoring a directory for changes,
|
|
|
+one can begin monitoring files as soon as they are created.
|
|
|
|
|
|
|
|
|
\section{Examples}
|