1 year ago · 3abccf92d6
--- a/doc/BlocktreeCloudPaper/BlocktreeCloudPaper.tex
+++ b/doc/BlocktreeCloudPaper/BlocktreeCloudPaper.tex
@@ -1,6 +1,7 @@
 
				 \documentclass{article}

			
 
				 \usepackage[scale=0.8]{geometry}

			
 
				 \usepackage{hyperref}

			
 
				+\usepackage{graphicx}

			
 
				 

			
 
				 \title{The Blocktree Cloud Orchestration Platform}

			
 
				 \author{Matthew Carr}

			
@@ -151,6 +152,11 @@ This handshake cryptographically verifies the credentials of each runtime.
 
				 These credentials contain the filesystem path where each runtime is located,

			
 
				 which ensures that messages addressed to a specific path will only be delivered to the runtime

			
 
				 at that path.

			
 
				+Because QUIC supports the concurrent use of many different streams,

			
 
				+it serves as an ideal transport for a message-oriented system.

			
 
				+\texttt{bttp} uses different streams for independent messages,

			
 
				+ensuring that head-of-line blocking will not occur.

			
 
				+However, replies are sent over the same stream as the original message.

			
 
				 

			
 
				 % Delivering messages locally.

			
 
				 When a message is sent between actors in the same runtime it is delivered into the queue of the recipient without any copying,

			
@@ -202,25 +208,33 @@ One or more actors may be register as providing a service.
 
				 Services are resolved to actor names by the runtime.

			
 
				 The service resolution method takes the path of a service and a scope path.

			
 
				 The scope path defines the filesystem path where service resolution will begin.

			
 
				-Resolution produces the name of an actor which is registered in a runtime which is closest to the

			
 
				+Resolution produces the name of an actor which is registered in a runtime which is ``closest'' to the

			
 
				 scope, or \texttt{None} if no service provider can be found.

			
 
				-To be more precise, consider the following cases:

			
 
				+Here ``closest'' means that it is the name returned by the following recursive procedure:

			
 
				 \begin{enumerate}

			
 
				-  \item If the scope is the path of a runtime, and there are service providers registered in the

			
 
				-    runtime, then one of their names if returned. Otherwise, service resolution is retried using a

			
 
				+  \item If the scope is the path of a runtime, and there are providers of the service registered in the

			
 
				+    runtime, then one of their names is returned. Otherwise, service resolution is retried using a

			
 
				     new scope which is obtained by removing the last path component of the current scope.

			
 
				   \item If a directory is specified, then all of the runtimes in the directory are checked for

			
 
				-    registered service providers, and if one is found its name is returned. Otherwise, service

			
 
				-    resolution is retried using a new scope which is obtained by removing the last path component of

			
 
				-    the current scope.

			
 
				+    registered service providers, and the name of the first one found is returned.

			
 
				+    Otherwise, service resolution is retried using a new scope which is obtained by removing the

			
 
				+    last path component of the current scope.

			
 
				   \item If the scope is the empty string, then \texttt{None} is returned.

			
 
				 \end{enumerate}

			
 
				+When there are multiple names which could be returned as providers for a given service,

			
 
				+the one which is actually returned is unspecified,

			
 
				+which allows the runtime to balance load.
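The recursive procedure above can be sketched in Rust as follows. This is a minimal illustration, not the runtime's implementation: the `Registry` type, its in-memory maps, the use of plain strings for paths and actor names, and the `parent` helper are all assumptions made for the example. The real runtime queries other runtimes over the network and is free to choose any matching provider.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory view of service registrations, keyed by runtime path.
/// Each runtime maps a service path to the names of the actors providing it.
pub struct Registry {
    runtimes: BTreeMap<String, BTreeMap<String, Vec<String>>>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { runtimes: BTreeMap::new() }
    }

    /// Record that `actor` provides `service` in the runtime at path `runtime`.
    pub fn register(&mut self, runtime: &str, service: &str, actor: &str) {
        self.runtimes
            .entry(runtime.to_string())
            .or_default()
            .entry(service.to_string())
            .or_default()
            .push(actor.to_string());
    }

    /// One provider of `service` registered in `runtime`, if any.
    /// This sketch always picks the first; the paper leaves the choice unspecified.
    fn provider_in(&self, runtime: &str, service: &str) -> Option<String> {
        self.runtimes.get(runtime)?.get(service)?.first().cloned()
    }

    /// Resolve `service` starting at `scope`, walking up one path component per retry.
    pub fn resolve(&self, service: &str, scope: &str) -> Option<String> {
        // Case 3: the empty scope terminates the recursion.
        if scope.is_empty() {
            return None;
        }
        let found = if self.runtimes.contains_key(scope) {
            // Case 1: the scope is the path of a runtime.
            self.provider_in(scope, service)
        } else {
            // Case 2: the scope is a directory, so check every runtime directly inside it.
            self.runtimes
                .keys()
                .filter(|rt| parent(rt) == scope)
                .find_map(|rt| self.provider_in(rt, service))
        };
        // Otherwise retry with the last path component removed.
        found.or_else(|| self.resolve(service, parent(scope)))
    }
}

/// Remove the last component of a `/`-separated path (assumed path syntax).
fn parent(path: &str) -> &str {
    match path.rfind('/') {
        Some(idx) => &path[..idx],
        None => "",
    }
}
```

With providers registered in `/org/rt1` and `/org/site/rt2`, a lookup scoped to `/org/site/rt2` first checks that runtime, then the directory `/org/site`, then `/org`, and gives up once the scope becomes empty.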

			
 
				 In order to contact other runtimes and query their service registrations,

			
 
				 their IP addresses need to be known.

			
 
				 To enable this a file with the runtime's IP address is maintained in the same directory as the

			
 
				 runtime.

			
 
				 The runtime is granted write permissions on the file,

			
 
				 and it is updated by the transport layer when it begins listening on a new endpoint.

			
 
				+The services which are allowed to be registered in a given runtime are specified in the runtime's

			
 
				+file.

			
 
				+The runtime reads this list and uses it to deny service registrations for unauthorized services.

			
 
				+The list is also read by other runtimes when they are searching a directory for service providers.

			
 
				+Only runtimes which are authorized to run the service will be searched for service providers.

			
 
				 

			
 
				 % The sector and filesystem service.

			
 
				 The filesystem is itself implemented as a service.

			
@@ -238,22 +252,219 @@ and thus maintaining the persistent state of the system.
 
				 It stores sector data in the local filesystem of each computer on which it is registered.

			
 
				 The details of how this is accomplished are deferred to the next section.

			
 
				 

			
 
				-% protocol contracts, and runtime checking of protocol adherence. Emphasize the benefits to

			
 
				-% system composability that this enables, where errors can be traced back to the actor which

			
 
				-% violated the contract.

			
 
				+% Overview of protocol contracts and runtime checking of protocol adherence.

			
 
				 To facilitate the creation of composable systems,

			
 
				 a protocol contract checking system based on session types has been designed.

			
 
				-This system operates on a state transition model of a communications protocol.

			
 
				+This system models a communication protocol as a directed graph representing state transitions

			
 
				+based on types of received messages.

			
 
				+The protocol author defines the states that the actors participating in the protocol can be in using 

			
 
				+Rust traits.

			
 
				+These traits define handler methods for each message type the actor is expected to handle in that

			
 
				+state.

			
 
				+A top-level trait which represents the entire protocol is defined that contains the types of the

			
 
				+initial state of every actor in the protocol.

			
 
				+A macro is used to generate the message handling loop for each of the parties to the protocol,

			
 
				+as well as enums to represent all possible states that the parties can be in and the messages that

			
 
				+they exchange.

			
 
				+The generated code is responsible for ensuring that errors are generated when a message of an

			
 
				+unexpected type is received,

			
 
				+eliminating the need for ad-hoc error handling code to be written by application developers.

			
 
				+

			
 
				+% Example of a protocol contract.

			
 
				+Let us explore the use of this system through a simple example.

			
 
				+Consider the HTTP/1.1 protocol.

			
 
				+It is a stateless client-server protocol,

			
 
				+essentially just an RPC from client to server.

			
 
				+We can model this for the contract checker by defining a trait representing the protocol:

			
 
				+\begin{verbatim}

			
 
				+  pub trait Http {

			
 
				+    type Server: ServerInit;

			
 
				+  }

			
 
				+\end{verbatim}

			
 
				+The job of this top-level trait is to specify the initial state of every party to the communications

			
 
				+protocol.

			
 
				+In this case we are only modeling the state of the server,

			
 
				+as the client will just \texttt{call} a method on the server.

			
 
				+The initial state for the server is defined as follows:

			
 
				+\begin{verbatim}

			
 
				+  pub trait ServerInit {

			
 
				+    type AfterActivate: Listening;

			
 
				+    type Fut: Future<Output = Result<Self::AfterActivate>>;

			
 
				+    fn handle_activate(self, msg: Activate) -> Self::Fut;

			
 
				+  }

			
 
				+\end{verbatim}

			
 
				+The \texttt{Activate} message is sent by the generated code to give the actor access to the

			
 
				+runtime and its ID.

			
 
				+It is defined as follows:

			
 
				+\begin{verbatim}

			
 
				+  pub struct Activate {

			
 
				+    rt: &'static Runtime,

			
 
				+    act_id: Uuid,

			
 
				+  }

			
 
				+\end{verbatim}

			
 
				+We represent the statelessness of HTTP by having the requests to the \texttt{Listening} state

			
 
				+return another \texttt{Listening} state.

			
 
				+\begin{verbatim}

			
 
				+  pub trait Listening {

			
 
				+    type AfterRequest: Listening;

			
 
				+    type Fut: Future<Output = Result<Self::AfterRequest>>;

			
 
				+    fn handle_request(self, msg: Envelope<Request>) -> Self::Fut;

			
 
				+  }

			
 
				+\end{verbatim}

			
 
				+The \texttt{Envelope} type is a wrapper around a message which contains information about who sent

			
 
				+it and a method which can be used to send a reply.

			
 
				+In general a new type could be returned after each message received,

			
 
				+with the returned type being dependent on the type of the message.

			
 
				+The state graph of this protocol can be visualized as follows:

			
 
				+\begin{center}

			
 
				+  \includegraphics[height=1.5in]{HttpStateGraph.pdf}

			
 
				+\end{center}
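To make the state graph concrete, the following is a simplified sketch of the kind of checking a generated message loop could perform. It is not the output of the macro: it is synchronous (no \texttt{Future}s or \texttt{Envelope}s), and the `Msg`, `ServerState`, `ProtocolError`, and `run` names, along with the `handled` counter, are hypothetical and exist only for illustration.

```rust
/// Messages the server can receive (hypothetical, simplified payloads).
#[derive(Debug)]
pub enum Msg {
    Activate,
    Request(String),
}

/// The states from the graph:
/// ServerInit --Activate--> Listening --Request--> Listening.
#[derive(Debug, PartialEq)]
pub enum ServerState {
    Init,
    /// `handled` counts the requests served so far; it is only here for the demo.
    Listening { handled: u32 },
}

/// Raised when a message arrives that the current state has no handler for.
#[derive(Debug, PartialEq)]
pub struct ProtocolError {
    pub state: &'static str,
    pub msg: &'static str,
}

impl ServerState {
    /// Consume the current state and one message, yielding the next state
    /// or an error if the protocol does not allow that message here.
    pub fn deliver(self, msg: Msg) -> Result<ServerState, ProtocolError> {
        match (self, msg) {
            (ServerState::Init, Msg::Activate) => Ok(ServerState::Listening { handled: 0 }),
            (ServerState::Listening { handled }, Msg::Request(_)) => {
                Ok(ServerState::Listening { handled: handled + 1 })
            }
            (ServerState::Init, Msg::Request(_)) => {
                Err(ProtocolError { state: "ServerInit", msg: "Request" })
            }
            (ServerState::Listening { .. }, Msg::Activate) => {
                Err(ProtocolError { state: "Listening", msg: "Activate" })
            }
        }
    }
}

/// A stand-in for the message loop: feed messages in order, stopping at the first violation.
pub fn run(msgs: Vec<Msg>) -> Result<ServerState, ProtocolError> {
    let mut state = ServerState::Init;
    for msg in msgs {
        state = state.deliver(msg)?;
    }
    Ok(state)
}
```

Because the unexpected-message cases are handled in one place, an actor author only writes the two valid transitions. This is the property the contract checker is meant to provide.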

			
 
				+

			
 
				+% Implementing actors in languages other than Rust.

			
 
				+Today the actor runtime only supports executing actors implemented in Rust.

			
 
				+A WebAssembly (Wasm) plugin system is planned to allow any language which can compile to Wasm to be

			
 
				+used to implement an actor.

			
 
				+This work is blocked pending the standardization of the WebAssembly Component Model,

			
 
				+which promises to provide an interface definition language which will allow type safe actors to be

			
 
				+defined in many different languages.

			
 
				+

			
 
				+% Running containers using actors.

			
 
				+Blocktree allows containers to be run by encapsulating them using a supervising actor.

			
 
				+This actor is responsible for starting the container and managing the container's kernel namespace.

			
 
				+Logically, it owns any kernel resources created by the container, including all spawned operating

			
 
				+system processes.

			
 
				+When the actor halts,

			
 
				+all of these resources are destroyed.

			
 
				+All network communication to the container is controlled by the supervising actor.

			
 
				+The supervisor can be configured to bind container ports to host ports,

			
 
				+as is commonly done today,

			
 
				+but it can also be used to encapsulate traffic to and from the container in Blocktree messages.

			
 
				+These messages are routed to other actors based on the configuration of the supervisor.

			
 
				+This essentially creates a VPN for containers,

			
 
				+ensuring that however weakly the containers secure their own traffic,

			
 
				+they will be safe to communicate over any network.

			
 
				+This network encapsulation system could be used in other actors as well,

			
 
				+allowing a lightweight and secure VPN system to be built.

			
 
				 

			
 
				 \section{Filesystem}

			
 
				-% Benefits of using a distributed filesystem as the sole source of persistent state for the system,

			
 
				-% including secure software delivery.

			
 
				+% The division of responsibilities between the sector and filesystem services.

			
 
				+The responsibility for storing data in the system is shared between the filesystem and sector

			
 
				+services.

			
 
				+Most actors will access the filesystem through the filesystem service,

			
 
				+which provides a high-level interface that takes care of the cryptographic operations necessary to

			
 
				+read and write files.

			
 
				+The filesystem service relies on the sector service for actually persisting data.

			
 
				+The individual sectors which make up a file are read from and written to the sector service,

			
 
				+which stores them in the local filesystem of the computer on which it is running.

			
 
				+A sector is the atomic unit of data storage.

			
 
				+The sector service only supports reading and writing entire sectors at once.

			
 
				+File actors spawned by the filesystem service buffer reads and writes until there is enough

			
 
				+data to fill a sector.

			
 
				+Because cryptographic operations are only performed on full sectors,

			
 
				+the cost of providing these protections is amortized over the size of the sector.

			
 
				+Thus there is a tradeoff between latency and throughput when selecting the sector size of a file.

			
 
				+A smaller sector size means less latency while a larger one enables more throughput.
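The buffering behavior described above can be sketched as follows. This is a minimal illustration for the write path only, under the assumption that a file actor holds pending bytes in memory. The `SectorBuffer` type and its methods are hypothetical names, and encryption of the emitted sectors is omitted.

```rust
/// Accumulates written bytes and emits only complete sectors.
pub struct SectorBuffer {
    sector_size: usize,
    pending: Vec<u8>,
}

impl SectorBuffer {
    pub fn new(sector_size: usize) -> Self {
        assert!(sector_size > 0, "sector size must be non-zero");
        SectorBuffer { sector_size, pending: Vec::new() }
    }

    /// Buffer `data` and return every sector that became full as a result.
    /// Bytes that do not yet fill a sector stay in the buffer.
    pub fn write(&mut self, data: &[u8]) -> Vec<Vec<u8>> {
        self.pending.extend_from_slice(data);
        let mut full = Vec::new();
        while self.pending.len() >= self.sector_size {
            // Keep the first `sector_size` bytes as a sector; the rest stays pending.
            let rest = self.pending.split_off(self.sector_size);
            full.push(std::mem::replace(&mut self.pending, rest));
        }
        full
    }

    /// Number of bytes waiting for enough data to complete a sector.
    pub fn buffered(&self) -> usize {
        self.pending.len()
    }
}
```

With a larger `sector_size`, more bytes must arrive before anything is emitted (higher latency), but each emitted sector amortizes the per-sector cryptographic cost over more data (higher throughput).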

			
 
				+

			
 
				+% Types of sectors: metadata, integrity, and data.

			
 
				+A file has a single metadata sector, a Merkle sector, and zero or more data sectors.

			
 
				+The sector size of a file can be specified when it is created,

			
 
				+but cannot be changed later.

			
 
				+Every data sector contains the ciphertext of a number of plaintext bytes equal to the sector size,

			
 
				+but the metadata and Merkle sectors contain a variable amount of data.

			
 
				+The metadata sector contains all of the filesystem metadata associated with the file.

			
 
				+In addition to the usual metadata present in any Unix filesystem (the contents of the \texttt{stat} struct),

			
 
				+cryptographic information necessary to verify and decrypt the contents of the file is also stored.

			
 
				+The Merkle sector of a file contains a Merkle tree over the data sectors of a file.

			
 
				+The hash function used by this tree can be configured at file creation,

			
 
				+but cannot be changed after the fact.

			
 
				+

			
 
				+% How sectors are identified.

			
 
				+When sector service providers are contained in the same directory they connect to each other to form

			
 
				+a consensus cluster.

			
 
				+This cluster is identified by a \texttt{u64} called the cluster's \emph{generation}.

			
 
				+Every file is identified by a pair of \texttt{u64}, its generation and its inode.

			
 
				+The sectors within a file are identified by an enum which specifies which type they are,

			
 
				+and in the case of data sectors, their index.

			
 
				+\begin{verbatim}

			
 
				+  pub enum SectorKind {

			
 
				+    Meta,

			
 
				+    Merkle,

			
 
				+    Data(u64),

			
 
				+  }

			
 
				+\end{verbatim}

			
 
				+The offset in the plaintext of the file at which each data sector begins can be calculated by

			
 
				+multiplying the sector's index by the sector size of the file.
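As a small worked example of this calculation, the sketch below repeats the \texttt{SectorKind} enum so that it is self-contained. The `plaintext_offset` function name and the choice to return `None` for non-data sectors and on overflow are assumptions of this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectorKind {
    Meta,
    Merkle,
    Data(u64),
}

/// Byte offset in the file's plaintext at which a data sector begins.
/// Returns `None` for metadata and Merkle sectors, which hold no file plaintext,
/// and `None` if the multiplication would overflow a `u64`.
pub fn plaintext_offset(kind: SectorKind, sector_size: u64) -> Option<u64> {
    match kind {
        SectorKind::Data(index) => index.checked_mul(sector_size),
        SectorKind::Meta | SectorKind::Merkle => None,
    }
}
```

For instance, with a sector size of 4096 bytes, the data sector with index 3 begins at plaintext offset $3 \times 4096 = 12288$.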

			
 
				+

			
 
				+% Scaling horizontally: using Raft to create consensus cluster. Additional replication methods.

			
 
				+When multiple sector service providers are contained in the same directory,

			
 
				+the sector service providers connect to each other to form a consensus cluster.

			
 
				+This cluster uses the Raft protocol to synchronize the state of the sectors it stores.

			
 
				+The system is currently designed to replicate all data to each of the service providers in the

			
 
				+cluster.

			
 
				+Additional replication methods are planned for implementation,

			
 
				+such as consistent hashing and erasure coding,

			
 
				+which allow for different tradeoffs between data durability and storage utilization.

			
 
				+

			
 
				+% Scaling vertically: how different generations are stitched together.

			
 
				+The creation of a new generation of the sector service is accomplished with several steps.

			
 
				+First, a new directory is created in which the generation will be located.

			
 
				+Next, one or more processes are credentialed for this directory,

			
 
				+using a procedure which is described in the next section.

			
 
				+The credentialing process produces files for each of the processes stored in the new directory.

			
 
				+The sector service provider in each of the new processes uses service discovery to establish

			
 
				+communication with its peers in the other processes.

			
 
				+Finally, the service provider which is elected leader contacts the cluster in the root directory

			
 
				+and requests a new generation number.

			
 
				+Once this number is known it is stored in the superblock for the generation,

			
 
				+which is the file identified by the new generation number and inode 2.

			
 
				+Note that the superblock is not contained in any directory and cannot be accessed by actors

			
 
				+outside of the sector service.

			
 
				+The superblock also contains information used to assign inodes when files are created.

			
 
				+

			
 
				+% The filesystem service is responsible for cryptographic operations. Client-side encryption.

			
 
				+The sector service is relied upon by the filesystem service to read and write sectors.

			
 
				+Filesystem service providers communicate with the sector service to open files, read and write

			
 
				+their contents, and update their metadata.

			
 
				+These providers are responsible for verifying and decrypting the information contained in sectors

			
 
				+and providing it to downstream actors.

			
 
				+They are also responsible for encrypting and integrity-protecting data written by downstream actors.

			
 
				+Most of the complexity of implementing a filesystem is handled in the filesystem service.

			
 
				+Most messages sent to the sector service only specify the operation (read or write), the identifier

			
 
				+for the sector, and the sector contents.

			
 
				+Every time a data sector is written an updated metadata sector is required to be sent in the same

			
 
				+message.

			
 
				+This requirement exists because a signature over the root of the file's Merkle tree is contained in

			
 
				+the metadata,

			
 
				+and since this root changes with every modification, it must be updated during every write.
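The rule that a data sector write must carry an updated metadata sector can be expressed as a simple check on the incoming message. The sketch below is illustrative only: the `WriteMsg` layout, the `updated_meta` field, the `WriteError` type, and the `check_write` function are hypothetical and are not taken from the sector service's actual message definitions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectorKind {
    Meta,
    Merkle,
    Data(u64),
}

/// Hypothetical shape of a write request sent to the sector service.
pub struct WriteMsg {
    pub kind: SectorKind,
    pub contents: Vec<u8>,
    /// Updated metadata sector carrying the new signature over the Merkle root.
    /// Required whenever `kind` is a data sector.
    pub updated_meta: Option<Vec<u8>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    /// A data sector was written without the accompanying metadata sector.
    MissingMeta,
}

/// Reject data sector writes that do not include updated metadata.
pub fn check_write(msg: &WriteMsg) -> Result<(), WriteError> {
    match (msg.kind, &msg.updated_meta) {
        (SectorKind::Data(_), None) => Err(WriteError::MissingMeta),
        _ => Ok(()),
    }
}
```

Bundling the two sectors in one message lets the data and the signed Merkle root be committed together, so a reader never observes data whose root signature is stale.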

			
 
				+When the sector service commits a write it hashes the sector contents,

			
 
				+updates the Merkle sector of the file, and updates the metadata sector.

			
 
				+In order for the filesystem service to produce a signature over the root of the file's Merkle tree,

			
 
				+it maintains a copy of the tree in memory.

			
 
				+This copy is loaded from the sector service when the file is opened.

			
 
				+While this does mean duplicating data between the sector and filesystem services,

			
 
				+this design was chosen to reduce the network traffic between the two services,

			
 
				+as the entire Merkle tree does not need to be transmitted on every write.

			
 
				+Encapsulating all cryptographic operations in the filesystem service allows the computer storing

			
 
				+data to be different from the computer encrypting it.

			
 
				+This approach allows client-side encryption to be done on more capable computers

			
 
				+and allows low-powered devices to delegate this task to a storage server.

			
 
				 

			
 
				-% Accessing data at two different levels of abstraction: sectors and files.

			
 
				+% Sector service discovery. Paths.

			
 
				 

			
 
				-% Concurrency semantics at the sector layer, and their implementation using Raft.

			
 
				+% Description of how the filesystem layer: opens a file, reads, and writes.

			
 
				 

			
 
				 \section{Cryptography}

			
 
				+% The underlying trust model: self-certifying paths.

			
 
				+

			
 
				+% Verifying sector contents on read and certifying on write.

			
 
				+

			
 
				+% Confidentiality protecting files with readcaps. Single pubkey operation to read a dir tree.

			
 
				+

			
 
				+% Give example of how these mechanisms allow data to be shared without any prior federation.

			
 
				+

			
 
				+% Description of bttp handshake and the authentication data which is provided by both parties.

			
 
				+

			
 
				+% Requesting and issuing credentials. Multicast link-local network discovery.

			
 
				 

			
 
				 \section{Examples}

			
 
				 This section contains examples of systems built using Blocktree. The hope is to illustrate how this

			
@@ -271,5 +482,14 @@ implement systems which are currently out of reach.
 
				 % Explain my vision of the metaverse.

			
 
				 

			
 
				 \section{Conclusion}

			
 
				+% Blocktree serves as the basis for building a cloud-level distributed operating system.

			
 
				+

			
 
				+% The system enables individuals to self-host the services they rely on.

			
 
				+

			
 
				+% It also gives businesses a freer choice of whether to own or lease computing resources.

			
 
				+

			
 
				+% The system advances the status quo in secure computing.

			
 
				+

			
 
				+% Composability leads to emergent benefits.

			
 
				 

			
 
				 \end{document}
			
--- a/doc/BlocktreeCloudPaper/HttpStateGraph.gv
+++ b/doc/BlocktreeCloudPaper/HttpStateGraph.gv
@@ -0,0 +1,8 @@
 
				+// This can be regenerated with the following command:
			
 
				+// dot -Tpdf -o HttpStateGraph.pdf HttpStateGraph.gv
			
 
				+digraph {
			
 
				+  server_init[label = "ServerInit"];
			
 
				+  listening[label = "Listening"];
			
 
				+  server_init -> listening [label = "  Activate"];
			
 
				+  listening -> listening [label = "  Request"];
			
 
				+}