|
@@ -1,6 +1,7 @@
|
|
\documentclass{article}
|
|
\documentclass{article}
|
|
\usepackage[scale=0.8]{geometry}
|
|
\usepackage[scale=0.8]{geometry}
|
|
\usepackage{hyperref}
|
|
\usepackage{hyperref}
|
|
|
|
+\usepackage{graphicx}
|
|
|
|
|
|
\title{The Blocktree Cloud Orchestration Platform}
|
|
\title{The Blocktree Cloud Orchestration Platform}
|
|
\author{Matthew Carr}
|
|
\author{Matthew Carr}
|
|
@@ -151,6 +152,11 @@ This handshake cryptographically verifies the credentials of each runtime.
|
|
These credentials contain the filesystem path where each runtime is located,
|
|
These credentials contain the filesystem path where each runtime is located,
|
|
which ensures that messages addressed to a specific path will only be delivered to the runtime
|
|
which ensures that messages addressed to a specific path will only be delivered to the runtime
|
|
at that path.
|
|
at that path.
|
|
|
|
+Because QUIC supports the concurrent use of many different streams,
|
|
|
|
+it serves as an ideal transport for a message-oriented system.
|
|
|
|
+\texttt{bttp} uses different streams for independent messages,
|
|
|
|
+ensuring that head-of-line blocking will not occur.
|
|
|
|
+However, replies are sent over the same stream as the original message.
|
|
|
|
|
|
% Delivering messages locally.
|
|
% Delivering messages locally.
|
|
When a message is sent between actors in the same runtime it is delivered into the queue of the recipient without any copying,
|
|
When a message is sent between actors in the same runtime it is delivered into the queue of the recipient without any copying,
|
|
@@ -202,25 +208,33 @@ One or more actors may be register as providing a service.
|
|
Services are resolved to actor names by the runtime.
|
|
Services are resolved to actor names by the runtime.
|
|
The service resolution method takes the path of a service and a scope path.
|
|
The service resolution method takes the path of a service and a scope path.
|
|
The scope path defines the filesystem path where service resolution will begin.
|
|
The scope path defines the filesystem path where service resolution will begin.
|
|
-Resolution produces the name of an actor which is registered in a runtime which is closest to the
|
|
|
|
|
|
+Resolution produces the name of an actor which is registered in a runtime which is ``closest'' to the
|
|
scope, or \texttt{None} if no service provider can be found.
|
|
scope, or \texttt{None} if no service provider can be found.
|
|
-To be more precise, consider the following cases:
|
|
|
|
|
|
+Here ``closest'' means that it is the name returned by the following recursive procedure:
|
|
\begin{enumerate}
|
|
\begin{enumerate}
|
|
- \item If the scope is the path of a runtime, and there are service providers registered in the
|
|
|
|
- runtime, then one of their names if returned. Otherwise, service resolution is retried using a
|
|
|
|
|
|
+ \item If the scope is the path of a runtime, and there are providers of the service registered in the
|
|
|
|
+ runtime, then one of their names is returned. Otherwise, service resolution is retried using a
|
|
new scope which is obtained by removing the last path component of the current scope.
|
|
new scope which is obtained by removing the last path component of the current scope.
|
|
\item If a directory is specified, then all of the runtimes in the directory are checked for
|
|
\item If a directory is specified, then all of the runtimes in the directory are checked for
|
|
- registered service providers, and if one is found its name is returned. Otherwise, service
|
|
|
|
- resolution is retried using a new scope which is obtained by removing the last path component of
|
|
|
|
- the current scope.
|
|
|
|
|
|
+ registered service providers, and the name of the first one found is returned.
|
|
|
|
+ Otherwise, service resolution is retried using a new scope which is obtained by removing the
|
|
|
|
+ last path component of the current scope.
|
|
\item If the scope is the empty string, then \texttt{None} is returned.
|
|
\item If the scope is the empty string, then \texttt{None} is returned.
|
|
\end{enumerate}
|
|
\end{enumerate}
|
|
|
|
+When there are multiple names which could be returned as providers for a given service,
|
|
|
|
+the one which is actually returned is unspecified,
|
|
|
|
+which allows the runtime to balance load.
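The recursive procedure above can be sketched as follows. This is a minimal illustration and not the runtime's actual implementation: the \texttt{Registry} type, its in-memory map, and the \texttt{parent} helper are hypothetical names introduced here, paths are assumed to be \texttt{/}-separated strings, and the authorization list is ignored. Because the choice among multiple providers is unspecified, the sketch simply takes the first one it finds.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the service registrations of all runtimes.
/// Maps runtime path -> service path -> names of registered actors.
#[derive(Default)]
pub struct Registry {
    runtimes: BTreeMap<String, BTreeMap<String, Vec<String>>>,
}

impl Registry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Records that `actor` provides `service` in the runtime at path `runtime`.
    pub fn register(&mut self, runtime: &str, service: &str, actor: &str) {
        self.runtimes
            .entry(runtime.to_string())
            .or_default()
            .entry(service.to_string())
            .or_default()
            .push(actor.to_string());
    }

    /// Returns one provider of `service` in `runtime`, if any (here: the first).
    fn provider_in(&self, runtime: &str, service: &str) -> Option<&String> {
        self.runtimes.get(runtime)?.get(service)?.first()
    }

    /// Resolves `service` starting at `scope`, following the three cases in the text.
    pub fn resolve(&self, service: &str, scope: &str) -> Option<String> {
        // Case 3: the empty scope terminates the recursion.
        if scope.is_empty() {
            return None;
        }
        if self.runtimes.contains_key(scope) {
            // Case 1: the scope is the path of a runtime.
            if let Some(name) = self.provider_in(scope, service) {
                return Some(name.clone());
            }
        } else {
            // Case 2: the scope is a directory; check the runtimes directly inside it.
            for runtime in self.runtimes.keys() {
                if parent(runtime) == scope {
                    if let Some(name) = self.provider_in(runtime, service) {
                        return Some(name.clone());
                    }
                }
            }
        }
        // Nothing found: retry with the last path component removed.
        self.resolve(service, parent(scope))
    }
}

/// Removes the last path component; a path with no `/` becomes the empty string.
fn parent(path: &str) -> &str {
    match path.rfind('/') {
        Some(idx) => &path[..idx],
        None => "",
    }
}
```

In this sketch a lookup that starts at a runtime which lacks the service walks up one directory at a time, so a provider in a sibling runtime of an ancestor directory is found before any provider further from the scope.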
|
|
In order to contact other runtimes and query their service registrations,
|
|
In order to contact other runtimes and query their service registrations,
|
|
their IP addresses need to be known.
|
|
their IP addresses need to be known.
|
|
To enable this a file with the runtime's IP address is maintained in the same directory as the
|
|
To enable this a file with the runtime's IP address is maintained in the same directory as the
|
|
runtime.
|
|
runtime.
|
|
The runtime is granted write permissions on the file,
|
|
The runtime is granted write permissions on the file,
|
|
and it is updated by the transport layer when it begins listening on a new endpoint.
|
|
and it is updated by the transport layer when it begins listening on a new endpoint.
|
|
|
|
+The services which are allowed to be registered in a given runtime are specified in the runtime's
|
|
|
|
+file.
|
|
|
|
+The runtime reads this list and uses it to deny service registrations for unauthorized services.
|
|
|
|
+The list is also read by other runtimes when they are searching a directory for service providers.
|
|
|
|
+Only runtimes which are authorized to run the service will be searched for service providers.
|
|
|
|
|
|
% The sector and filesystem service.
|
|
% The sector and filesystem service.
|
|
The filesystem is itself implemented as a service.
|
|
The filesystem is itself implemented as a service.
|
|
@@ -238,22 +252,219 @@ and thus maintaining the persistent state of the system.
|
|
It stores sector data in the local filesystem of each computer on which it is registered.
|
|
It stores sector data in the local filesystem of each computer on which it is registered.
|
|
The details of how this is accomplished are deferred to the next section.
|
|
The details of how this is accomplished are deferred to the next section.
|
|
|
|
|
|
-% protocol contracts, and runtime checking of protocol adherence. Emphasize the benefits to
|
|
|
|
-% system composability that this enables, where errors can be traced back to the actor which
|
|
|
|
-% violated the contract.
|
|
|
|
|
|
+% Overview of protocol contracts and runtime checking of protocol adherence.
|
|
To facilitate the creation of composable systems,
|
|
To facilitate the creation of composable systems,
|
|
a protocol contract checking system based on session types has been designed.
|
|
a protocol contract checking system based on session types has been designed.
|
|
-This system operates on a state transition model of a communications protocol.
|
|
|
|
|
|
+This system models a communication protocol as a directed graph representing state transitions
|
|
|
|
+based on types of received messages.
|
|
|
|
+The protocol author defines the states that the actors participating in the protocol can be in using
|
|
|
|
+Rust traits.
|
|
|
|
+These traits define handler methods for each message type the actor is expected to handle in that
|
|
|
|
+state.
|
|
|
|
+A top-level trait which represents the entire protocol is defined that contains the types of the
|
|
|
|
+initial state of every actor in the protocol.
|
|
|
|
+A macro is used to generate the message handling loop for each of the parties to the protocol,
|
|
|
|
+as well as enums to represent all possible states that the parties can be in and the messages that
|
|
|
|
+they exchange.
|
|
|
|
+The generated code is responsible for ensuring that errors are generated when a message of an
|
|
|
|
+unexpected type is received,
|
|
|
|
+eliminating the need for ad-hoc error handling code to be written by application developers.
|
|
|
|
+
|
|
|
|
+% Example of a protocol contract.
|
|
|
|
+Let us explore the use of this system through a simple example.
|
|
|
|
+Consider the HTTP/1.1 protocol.
|
|
|
|
+It is a stateless client-server protocol,
|
|
|
|
+essentially just an RPC from client to server.
|
|
|
|
+We can model this for the contract checker by defining a trait representing the protocol:
|
|
|
|
+\begin{verbatim}
|
|
|
|
+ pub trait Http {
|
|
|
|
+ type Server: ServerInit;
|
|
|
|
+ }
|
|
|
|
+\end{verbatim}
|
|
|
|
+The job of this top-level trait is to specify the initial state of every party to the communications
|
|
|
|
+protocol.
|
|
|
|
+In this case we are only modeling the state of the server,
|
|
|
|
+as the client will just \texttt{call} a method on the server.
|
|
|
|
+The initial state for the server is defined as follows:
|
|
|
|
+\begin{verbatim}
|
|
|
|
+ pub trait ServerInit {
|
|
|
|
+ type AfterActivate: Listening;
|
|
|
|
+ type Fut: Future<Output = Result<Self::AfterActivate>>;
|
|
|
|
+ fn handle_activate(self, msg: Activate) -> Self::Fut;
|
|
|
|
+ }
|
|
|
|
+\end{verbatim}
|
|
|
|
+\texttt{Activate} is a message sent by the generated code to give the actor access to the
|
|
|
|
+runtime and its ID.
|
|
|
|
+It is defined as follows:
|
|
|
|
+\begin{verbatim}
|
|
|
|
+ pub struct Activate {
|
|
|
|
+ rt: &'static Runtime,
|
|
|
|
+ act_id: Uuid,
|
|
|
|
+ }
|
|
|
|
+\end{verbatim}
|
|
|
|
+We represent the statelessness of HTTP by having the requests to the \texttt{Listening} state
|
|
|
|
+return another \texttt{Listening} state.
|
|
|
|
+\begin{verbatim}
|
|
|
|
+ pub trait Listening {
|
|
|
|
+ type AfterRequest: Listening;
|
|
|
|
+ type Fut: Future<Output = Result<Self::AfterRequest>>;
|
|
|
|
+ fn handle_request(self, msg: Envelope<Request>) -> Self::Fut;
|
|
|
|
+ }
|
|
|
|
+\end{verbatim}
|
|
|
|
+The \texttt{Envelope} type is a wrapper around a message which contains information about who sent
|
|
|
|
+it and a method which can be used to send a reply.
|
|
|
|
+In general a new type could be returned after each message received,
|
|
|
|
+with the returned type being dependent on the type of the message.
|
|
|
|
+The state graph of this protocol can be visualized as follows:
|
|
|
|
+\begin{center}
|
|
|
|
+ \includegraphics[height=1.5in]{HttpStateGraph.pdf}
|
|
|
|
+\end{center}
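The state graph above can also be written out as a plain transition function, which may help show the kind of checking the generated code performs. This is a deliberately simplified, synchronous sketch: the \texttt{State}, \texttt{Msg}, and \texttt{UnexpectedMsg} types and the \texttt{step} function are hypothetical names chosen for illustration, and the code actually produced by the macro (which works with the traits and futures shown earlier) is not reproduced here.

```rust
/// Hypothetical enum of the states the server can be in.
#[derive(Debug, PartialEq)]
pub enum State {
    ServerInit,
    Listening,
}

/// Hypothetical enum of the messages the server can receive.
/// The request payload is reduced to a string for this sketch.
#[derive(Debug, PartialEq)]
pub enum Msg {
    Activate,
    Request(String),
}

/// Error produced when a message arrives in a state that does not handle it.
#[derive(Debug, PartialEq)]
pub struct UnexpectedMsg {
    pub state: State,
    pub msg: Msg,
}

/// One step of the message handling loop: consume the current state and a
/// message, and produce either the next state or a protocol error.
pub fn step(state: State, msg: Msg) -> Result<State, UnexpectedMsg> {
    match (state, msg) {
        // Activation moves the server from its initial state to Listening.
        (State::ServerInit, Msg::Activate) => Ok(State::Listening),
        // Statelessness: handling a request returns to Listening.
        (State::Listening, Msg::Request(_)) => Ok(State::Listening),
        // Every other combination violates the protocol contract.
        (state, msg) => Err(UnexpectedMsg { state, msg }),
    }
}
```

Only the first two match arms correspond to edges in the graph; the catch-all arm is where a centralized protocol error would be raised instead of ad-hoc handling in each actor.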
|
|
|
|
+
|
|
|
|
+% Implementing actors in languages other than Rust.
|
|
|
|
+Today the actor runtime only supports executing actors implemented in Rust.
|
|
|
|
+A WebAssembly (Wasm) plugin system is planned to allow any language which can compile to Wasm to be
|
|
|
|
+used to implement an actor.
|
|
|
|
+This work is blocked pending the standardization of the WebAssembly Component Model,
|
|
|
|
+which promises to provide an interface definition language which will allow type safe actors to be
|
|
|
|
+defined in many different languages.
|
|
|
|
+
|
|
|
|
+% Running containers using actors.
|
|
|
|
+Blocktree allows containers to be run by encapsulating them using a supervising actor.
|
|
|
|
+This actor is responsible for starting the container and managing the container's kernel namespace.
|
|
|
|
+Logically, it owns any kernel resources created by the container, including all spawned operating
|
|
|
|
+system processes.
|
|
|
|
+When the actor halts,
|
|
|
|
+all of these resources are destroyed.
|
|
|
|
+All network communication to the container is controlled by the supervising actor.
|
|
|
|
+The supervisor can be configured to bind container ports to host ports,
|
|
|
|
+as is commonly done today,
|
|
|
|
+but it can also be used to encapsulate traffic to and from the container in Blocktree messages.
|
|
|
|
+These messages are routed to other actors based on the configuration of the supervisor.
|
|
|
|
+This essentially creates a VPN for containers,
|
|
|
|
+ensuring that regardless of how well hardened their own communications are,
|
|
|
|
+they will be safe to communicate over any network.
|
|
|
|
+This network encapsulation system could be used in other actors as well,
|
|
|
|
+allowing a lightweight and secure VPN system to be built.
|
|
|
|
|
|
\section{Filesystem}
|
|
\section{Filesystem}
|
|
-% Benefits of using a distributed filesystem as the sole source of persistent state for the system,
|
|
|
|
-% including secure software delivery.
|
|
|
|
|
|
+% The division of responsibilities between the sector and filesystem services.
|
|
|
|
+The responsibility for storing data in the system is shared between the filesystem and sector
|
|
|
|
+services.
|
|
|
|
+Most actors will access the filesystem through the filesystem service,
|
|
|
|
+which provides a high-level interface that takes care of the cryptographic operations necessary to
|
|
|
|
+read and write files.
|
|
|
|
+The filesystem service relies on the sector service for actually persisting data.
|
|
|
|
+The individual sectors which make up a file are read from and written to the sector service,
|
|
|
|
+which stores them in the local filesystem of the computer on which it is running.
|
|
|
|
+A sector is the atomic unit of data storage.
|
|
|
|
+The sector service only supports reading and writing entire sectors at once.
|
|
|
|
+File actors spawned by the filesystem service buffer reads and writes until there is enough
|
|
|
|
+data to fill a sector.
|
|
|
|
+Because cryptographic operations are only performed on full sectors,
|
|
|
|
+the cost of providing these protections is amortized over the size of the sector.
|
|
|
|
+Thus there is a tradeoff between latency and throughput when selecting the sector size of a file.
|
|
|
|
+A smaller sector size means less latency while a larger one enables more throughput.
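The write-side buffering described above can be illustrated with a small sketch. The \texttt{SectorBuffer} type and its methods are hypothetical and are not taken from the Blocktree code; the sketch only assumes that bytes are accumulated until a whole sector is available, and it omits encryption and the actual call to the sector service.

```rust
/// Hypothetical write buffer which only releases whole sectors.
pub struct SectorBuffer {
    sector_size: usize,
    pending: Vec<u8>,
}

impl SectorBuffer {
    /// Creates a buffer for a file with the given sector size (must be non-zero).
    pub fn new(sector_size: usize) -> Self {
        assert!(sector_size > 0, "sector size must be non-zero");
        SectorBuffer { sector_size, pending: Vec::new() }
    }

    /// Appends `data` and returns every sector that became full as a result.
    /// Bytes that do not yet fill a sector stay buffered.
    pub fn write(&mut self, data: &[u8]) -> Vec<Vec<u8>> {
        self.pending.extend_from_slice(data);
        let mut full = Vec::new();
        while self.pending.len() >= self.sector_size {
            // Keep the first `sector_size` bytes as a full sector and
            // carry the remainder over as the new pending buffer.
            let rest = self.pending.split_off(self.sector_size);
            full.push(std::mem::replace(&mut self.pending, rest));
        }
        full
    }

    /// Number of bytes waiting for the current sector to fill.
    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

With a small sector size \texttt{write} returns sectors after only a few bytes, which keeps latency low, whereas a large sector size means fewer and larger units reach the sector service, which is where the per-sector costs are amortized.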
|
|
|
|
+
|
|
|
|
+% Types of sectors: metadata, integrity, and data.
|
|
|
|
+A file has a single metadata sector, a Merkle sector, and zero or more data sectors.
|
|
|
|
+The sector size of a file can be specified when it is created,
|
|
|
|
+but cannot be changed later.
|
|
|
|
+Every data sector contains the ciphertext of a number of bytes equal to the sector size,
|
|
|
|
+but the metadata and Merkle sectors contain a variable amount of data.
|
|
|
|
+The metadata sector contains all of the filesystem metadata associated with the file.
|
|
|
|
+In addition to the usual metadata present in any Unix filesystem (the contents of the \texttt{stat} struct),
|
|
|
|
+cryptographic information necessary to verify and decrypt the contents of the file is also stored.
|
|
|
|
+The Merkle sector of a file contains a Merkle tree over the data sectors of that file.
|
|
|
|
+The hash function used by this tree can be configured at file creation,
|
|
|
|
+but cannot be changed after the fact.
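To make the role of the Merkle sector concrete, the following sketch computes the root of a binary Merkle tree over a list of data sectors. Several things here are assumptions made only for illustration: the tree layout (pairwise hashing, with an unpaired node promoted unchanged) is not specified in the text, the function names are hypothetical, and \texttt{DefaultHasher} from the Rust standard library is used purely as a stand-in because it needs no external crates. It is not a cryptographic hash and would not be suitable for the real system.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hashes the contents of one sector. Stand-in for a configurable cryptographic hash.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Hashes two child nodes into their parent node.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write_u64(left);
    hasher.write_u64(right);
    hasher.finish()
}

/// Computes the root over the given data sectors, or `None` if there are none.
pub fn merkle_root(sectors: &[Vec<u8>]) -> Option<u64> {
    if sectors.is_empty() {
        return None;
    }
    // Leaves are the hashes of the individual sectors.
    let mut level: Vec<u64> = sectors.iter().map(|s| hash_bytes(s)).collect();
    // Repeatedly combine adjacent nodes until a single root remains.
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [left, right] => hash_pair(*left, *right),
                // Assumption: an unpaired node is promoted unchanged.
                [single] => *single,
                _ => unreachable!("chunks(2) yields one or two items"),
            })
            .collect();
    }
    level.first().copied()
}
```

Changing any byte of any data sector changes its leaf hash and therefore the root, which is why a signature over the root is enough to cover the whole file.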
|
|
|
|
+
|
|
|
|
+% How sectors are identified.
|
|
|
|
+When sector service providers are contained in the same directory they connect to each other to form
|
|
|
|
+a consensus cluster.
|
|
|
|
+This cluster is identified by a \texttt{u64} called the cluster's \emph{generation}.
|
|
|
|
+Every file is identified by a pair of \texttt{u64} values: its generation and its inode.
|
|
|
|
+The sectors within a file are identified by an enum which specifies which type they are,
|
|
|
|
+and in the case of data sectors, their index.
|
|
|
|
+\begin{verbatim}
|
|
|
|
+ pub enum SectorKind {
|
|
|
|
+ Meta,
|
|
|
|
+ Merkle,
|
|
|
|
+ Data(u64),
|
|
|
|
+ }
|
|
|
|
+\end{verbatim}
|
|
|
|
+The offset in the plaintext of the file at which each data sector begins can be calculated by
|
|
|
|
+multiplying the sector's index by the sector size of the file.
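As a small worked example of this calculation, the sketch below maps between data sector indices and plaintext offsets. The \texttt{SectorKind} enum is repeated from the listing above with derives added so it can be compared and copied; the two functions are hypothetical helpers written for illustration and are not part of the sector service's interface.

```rust
/// Repeated from the text, with derives added for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectorKind {
    Meta,
    Merkle,
    Data(u64),
}

/// Returns the plaintext offset at which a data sector begins.
/// Returns `None` for metadata and Merkle sectors, which hold no file
/// plaintext, and when the multiplication would overflow a `u64`.
pub fn plaintext_offset(kind: SectorKind, sector_size: u64) -> Option<u64> {
    match kind {
        SectorKind::Data(index) => index.checked_mul(sector_size),
        SectorKind::Meta | SectorKind::Merkle => None,
    }
}

/// Inverse mapping: the data sector containing `offset` and the position
/// of that offset within the sector. Returns `None` for a zero sector size.
pub fn sector_for_offset(offset: u64, sector_size: u64) -> Option<(SectorKind, u64)> {
    if sector_size == 0 {
        return None;
    }
    Some((SectorKind::Data(offset / sector_size), offset % sector_size))
}
```

For instance, with a sector size of 4096 bytes, data sector 3 begins at plaintext offset $3 \times 4096 = 12288$, and offset 12300 falls 12 bytes into that same sector.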
|
|
|
|
+
|
|
|
|
+% Scaling horizontally: using Raft to create consensus cluster. Additional replication methods.
|
|
|
|
+When multiple sector service providers are contained in the same directory,
|
|
|
|
+the sector service providers connect to each other to form a consensus cluster.
|
|
|
|
+This cluster uses the Raft protocol to synchronize the state of the sectors it stores.
|
|
|
|
+The system is currently designed to replicate all data to each of the service providers in the
|
|
|
|
+cluster.
|
|
|
|
+Additional replication methods are planned for implementation,
|
|
|
|
+such as consistent hashing and erasure coding,
|
|
|
|
+which allow for different tradeoffs between data durability and storage utilization.
|
|
|
|
+
|
|
|
|
+% Scaling vertically: how different generations are stitched together.
|
|
|
|
+The creation of a new generation of the sector service is accomplished in several steps.
|
|
|
|
+First, a new directory is created in which the generation will be located.
|
|
|
|
+Next, one or more processes are credentialed for this directory,
|
|
|
|
+using a procedure which is described in the next section.
|
|
|
|
+The credentialing process produces files for each of the processes stored in the new directory.
|
|
|
|
+The sector service provider in each of the new processes uses service discovery to establish
|
|
|
|
+communication with its peers in the other processes.
|
|
|
|
+Finally, the service provider which is elected leader contacts the cluster in the root directory
|
|
|
|
+and requests a new generation number.
|
|
|
|
+Once this number is known it is stored in the superblock for the generation,
|
|
|
|
+which is the file identified by the new generation number and inode 2.
|
|
|
|
+Note that the superblock is not contained in any directory and cannot be accessed by actors
|
|
|
|
+outside of the sector service.
|
|
|
|
+The superblock also contains information used to assign inodes when files are created.
|
|
|
|
+
|
|
|
|
+% The filesystem service is responsible for cryptographic operations. Client-side encryption.
|
|
|
|
+The sector service is relied upon by the filesystem service to read and write sectors.
|
|
|
|
+Filesystem service providers communicate with the sector service to open files, read and write
|
|
|
|
+their contents, and update their metadata.
|
|
|
|
+These providers are responsible for verifying and decrypting the information contained in sectors
|
|
|
|
+and providing it to downstream actors.
|
|
|
|
+They are also responsible for encrypting and integrity protecting data written by downstream actors.
|
|
|
|
+Most of the complexity of implementing a filesystem is handled in the filesystem service.
|
|
|
|
+Most messages sent to the sector service only specify the operation (read or write), the identifier
|
|
|
|
+for the sector, and the sector contents.
|
|
|
|
+Every time a data sector is written an updated metadata sector is required to be sent in the same
|
|
|
|
+message.
|
|
|
|
+This requirement exists because a signature over the root of the file's Merkle tree is contained in
|
|
|
|
+the metadata,
|
|
|
|
+and since this root changes with every modification, it must be updated during every write.
|
|
|
|
+When the sector service commits a write it hashes the sector contents,
|
|
|
|
+updates the Merkle sector of the file, and updates the metadata sector.
|
|
|
|
+In order for the filesystem service to produce a signature over the root of the file's Merkle tree,
|
|
|
|
+it maintains a copy of the tree in memory.
|
|
|
|
+This copy is loaded from the sector service when the file is opened.
|
|
|
|
+While this does mean duplicating data between the sector and filesystem services,
|
|
|
|
+this design was chosen to reduce the network traffic between the two services,
|
|
|
|
+as the entire Merkle tree does not need to be transmitted on every write.
|
|
|
|
+Encapsulating all cryptographic operations in the filesystem service allows the computer storing
|
|
|
|
+data to be different from the computer encrypting it.
|
|
|
|
+This approach allows client-side encryption to be done on more capable computers
|
|
|
|
+and allows low-powered devices to delegate this task to a storage server.
|
|
|
|
|
|
-% Accessing data at two different levels of abstraction: sectors and files.
|
|
|
|
|
|
+% Sector service discovery. Paths.
|
|
|
|
|
|
-% Concurrency semantics at the sector layer, and their implementation using Raft.
|
|
|
|
|
|
+% Description of how the filesystem layer: opens a file, reads, and writes.
|
|
|
|
|
|
\section{Cryptography}
|
|
\section{Cryptography}
|
|
|
|
+% The underlying trust model: self-certifying paths.
|
|
|
|
+
|
|
|
|
+% Verifying sector contents on read and certifying on write.
|
|
|
|
+
|
|
|
|
+% Confidentiality protecting files with readcaps. Single pubkey operation to read a dir tree.
|
|
|
|
+
|
|
|
|
+% Give example of how these mechanisms allow data to be shared without any prior federation.
|
|
|
|
+
|
|
|
|
+% Description of bttp handshake and the authentication data which is provided by both parties.
|
|
|
|
+
|
|
|
|
+% Requesting and issuing credentials. Multicast link-local network discovery.
|
|
|
|
|
|
\section{Examples}
|
|
\section{Examples}
|
|
This section contains examples of systems built using Blocktree. The hope is to illustrate how this
|
|
This section contains examples of systems built using Blocktree. The hope is to illustrate how this
|
|
@@ -271,5 +482,14 @@ implement systems which are currently out of reach.
|
|
% Explain my vision of the metaverse.
|
|
% Explain my vision of the metaverse.
|
|
|
|
|
|
\section{Conclusion}
|
|
\section{Conclusion}
|
|
|
|
+% Blocktree serves as the basis for building a cloud-level distributed operating system.
|
|
|
|
+
|
|
|
|
+% The system enables individuals to self-host the services they rely on.
|
|
|
|
+
|
|
|
|
+% It also gives businesses a freer choice of whether to own or lease computing resources.
|
|
|
|
+
|
|
|
|
+% The system advances the status quo in secure computing.
|
|
|
|
+
|
|
|
|
+% Composability leads to emergent benefits.
|
|
|
|
|
|
\end{document}
|
|
\end{document}
|