							\documentclass{article}
\usepackage[scale=0.8]{geometry}
\usepackage{hyperref}

\title{The Blocktree Cloud Orchestration Platform}
\author{Matthew Carr}

\begin{document}
\maketitle
\begin{abstract}
This document is a proposal for a novel cloud platform called Blocktree.
The system is described in terms of the actor model,
where tasks and services are implemented as actors.
The platform is responsible for orchestrating these actors on a set of native operating system processes.
A service is provided to actors which allows them access to a highly available distributed file system,
which serves as the only source of persistent state for the system.
High availability is achieved using the Raft consensus protocol to synchronize the state of files between processes.
All data stored in the filesystem is secured with strong integrity and optional confidentiality protections.
A network-block-device-like interface allows for fast, low-level read and write access to the encrypted data,
with full support for client-side encryption.
Well-known cryptographic primitives and constructions are employed to provide this protection;
the system does not attempt to innovate in terms of cryptography.
The system's trust model allows for mutual TLS authentication between all processes in the system,
even those which are controlled by different owners.
By integrating these ideas into a single platform,
the system aims to advance the status quo in the security and reliability of software systems.
\end{abstract}

\section{Introduction}
% Describe paths, actors, and files. Emphasize the benefit of actors and files sharing the same
% namespace.
Blocktree is an attempt to extend the Unix philosophy that everything is a file
to the entire distributed system that comprises modern IT infrastructure.
The system is organized around a global distributed filesystem which defines security
principals, resources, and their authorization attributes.
This filesystem provides a language for access control that can be used to securely grant principals
access to resources from different organizations, without the need to set up federation.
The system provides an actor runtime for orchestrating tasks and services.
Resources are represented by actors, and actors are grouped into operating system processes.
Each process has its own credentials which authenticate it as a unique security principal,
and which specify the filesystem path where the process is located.
A process has authorization attributes which determine the set of processes that may communicate with it.
Every connection between processes is established using mutual TLS authentication,
which is accomplished without the need to trust any third-party certificate authorities.
The cryptographic mechanisms which make this possible are described in detail in section 4.
Messages addressed to actors in a different process are forwarded over these connections,
while messages addressed to actors in the same process are delivered without copying.

One of the major challenges in distributed systems is managing persistent state.
Blocktree solves this issue using its distributed filesystem.
Files are broken into segments called sectors.
The sector size of a file can be configured when it is created,
but cannot be changed after the fact.
Reads and writes of individual sectors are guaranteed to be atomic.
The sectors which comprise a file and its metadata are replicated by a set of processes running
the sector service.
This service is responsible for storing the sectors of files which are contained in the directory
containing the process in which it is running.
The actors providing the sector service in a given directory coordinate with one another using
the Raft protocol to synchronize the state of the sectors they store.
This method of partitioning the data in the filesystem based on directory
allows the system to scale beyond the capabilities of a single consensus cluster.
Sectors are secured with strong integrity protection,
which allows anyone to verify that their contents were written by an authorized principal.
Encryption can be optionally applied to sectors,
with the system handling key management.
The cryptographic mechanisms used to implement these protections are described in section 4.

To reduce load on the sector service, and to allow the system to scale to a larger number of users,
a peer-to-peer distribution system is implemented in the filesystem service.
This system allows filesystem actors to download sectors from other filesystem actors
that have the sectors in their local cache.
The threat of malicious actors serving bad sector data is mitigated by the strong integrity
protections applied to sectors.
By using peer-to-peer distribution, the system can serve as a content delivery network.

One of the design goals of Blocktree is to facilitate the creation of composable distributed
systems.
A major challenge to building such systems is the difficulty in pinning down bugs when they
inevitably occur.
Research into session types (a.k.a. behavioral types) promises to bring the safety benefits
of type checking to actor communication.
Blocktree integrates a session typing system that allows protocol contracts to be defined that
specify the communication patterns of a set of actors.
This model allows the state space of the set of actors participating in a computation to be defined,
and the state transitions which occur to be specified based on the types of received messages.
These contracts are used to verify protocol adherence statically and dynamically.
This system is implemented using compile time code generation,
making it a zero-cost abstraction.
Freeing developers from handling the numerous failure modes that occur in a communication protocol
lets them focus on the functionality of their system.

Blocktree is implemented in the Rust programming language.
Its source code is licensed under the GNU Affero General Public License, version 3.
It can be downloaded at the project homepage at \url{https://blocktree.systems}.
Anyone interested in contributing to development is welcome to submit a pull request
to \url{https://gogs.delease.com/Delease/Blocktree}.
If you have larger changes or architectural suggestions,
please submit an issue for discussion prior to spending time implementing your idea.

% Describe the remainder of the paper.
The remainder of this paper is structured as follows:
\begin{itemize}
  \item Section 2 describes the actor runtime, service and task orchestration, and service
    discovery.
  \item Section 3 discusses the filesystem, its concurrency semantics and implementation.
  \item Section 4 details the cryptographic mechanisms used to secure communication between
    actor runtimes and to protect sector data.
  \item Section 5 is a set of examples describing ways that Blocktree can be used to build systems.
  \item Section 6 provides some concluding remarks.
\end{itemize}

\section{Actor Runtime}
% Motivation for using the actor model. 
Building scalable, fault-tolerant systems requires us to distribute computation over
multiple computers.
Rather than switching to a different programming model when an application scales beyond the
capacity of a single computer,
it is beneficial in terms of programmer time and program simplicity to begin with a model that 
enables multi-computer scalability.
Fundamentally, all communication over an IP network involves the exchange of messages,
namely IP packets.
So if we wish to build scalable fault-tolerant systems,
it makes sense to choose a programming model built on message passing,
as this minimizes the impedance mismatch with the underlying networking technology.

% Overview of message passing interface.
That is why Blocktree is built on the actor model
and why its actor runtime is at the core of its architecture.
The runtime can be used to spawn new actors, register services, and dispatch messages.
Messages can be dispatched in two different ways: with \texttt{send} and \texttt{call}.
A message is dispatched with the \texttt{send} method when no reply is required,
and with \texttt{call} when exactly one is.
The \texttt{Future} returned by \texttt{call} can be awaited to obtain the reply.
If a timeout occurs while waiting for the reply,
then the \texttt{Future} completes with an error.
The name \texttt{call} was chosen to bring to mind a remote procedure call,
which is the primary use case this method was intended for.
Awaiting replies to messages serves as a simple way to synchronize a distributed computation.
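
The two dispatch styles can be illustrated with a minimal sketch that uses only the Rust standard library.
It is not the Blocktree API: the \texttt{Envelope} type, the free functions, and the echo actor are hypothetical,
and a blocking \texttt{recv\_timeout} on a thread stands in for awaiting the \texttt{Future} described above.

\begin{verbatim}
```rust
use std::sync::mpsc::{channel, RecvTimeoutError, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical message envelope. The presence of a reply channel is what
/// distinguishes a `call` from a `send` in this sketch.
struct Envelope {
    body: String,
    reply_to: Option<Sender<String>>,
}

/// Dispatches a message for which no reply is required.
fn send(mailbox: &Sender<Envelope>, body: &str) {
    // A send error only means the recipient has stopped; it is ignored here.
    let _ = mailbox.send(Envelope { body: body.to_string(), reply_to: None });
}

/// Dispatches a message and waits for exactly one reply, or fails on timeout.
fn call(
    mailbox: &Sender<Envelope>,
    body: &str,
    timeout: Duration,
) -> Result<String, RecvTimeoutError> {
    let (reply_tx, reply_rx) = channel();
    let _ = mailbox.send(Envelope {
        body: body.to_string(),
        reply_to: Some(reply_tx),
    });
    reply_rx.recv_timeout(timeout)
}

/// Spawns a toy actor that echoes the body of every message expecting a reply.
fn spawn_echo_actor() -> Sender<Envelope> {
    let (tx, rx) = channel::<Envelope>();
    thread::spawn(move || {
        for msg in rx {
            if let Some(reply_to) = msg.reply_to {
                let _ = reply_to.send(format!("echo: {}", msg.body));
            }
        }
    });
    tx
}

fn main() {
    let actor = spawn_echo_actor();
    send(&actor, "fire and forget");
    match call(&actor, "ping", Duration::from_secs(1)) {
        Ok(reply) => println!("reply: {}", reply),
        Err(err) => println!("call failed: {}", err),
    }
}
```
\end{verbatim}

In this sketch every \texttt{call} creates a fresh reply channel,
so a reply cannot be confused with the reply to a different request.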

% Delivering messages over the network.
Messages can be forwarded between actor runtimes using a secure transport layer called
\texttt{bttp}.
Messages are addressed using \emph{actor names}.
An actor name is a pair consisting of the filesystem path of the runtime
and a UUID specifying an actor in that runtime.
Every message has a header containing the name of the sender and receiver.
The transport is implemented using the QUIC protocol, which integrates TLS for security.
The TLS handshake between runtimes is performed using mutual TLS authentication.
This handshake cryptographically verifies the credentials of each runtime.
These credentials contain the filesystem path where each runtime is located,
which ensures that messages addressed to a specific path will only be delivered to the runtime
at that path.
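
One possible shape for actor names and message headers is sketched below.
The type and field names are hypothetical, and a \texttt{u128} stands in for the UUID so that the
sketch depends only on the standard library.
The \texttt{is\_addressed\_to} check is a local illustration only;
in the design above the binding between a path and a runtime is enforced by the TLS handshake.

\begin{verbatim}
```rust
use std::fmt;

/// Hypothetical actor name: the path of the runtime plus an actor identifier.
/// A `u128` stands in for a UUID (an assumption made for this sketch).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ActorName {
    runtime_path: String,
    actor_id: u128,
}

impl fmt::Display for ActorName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render the identifier as 32 hexadecimal digits after the path.
        write!(f, "{}#{:032x}", self.runtime_path, self.actor_id)
    }
}

/// Hypothetical message header carrying the sender and receiver names.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Header {
    from: ActorName,
    to: ActorName,
}

impl Header {
    /// Returns true if the receiver lives in the runtime at `local_path`.
    fn is_addressed_to(&self, local_path: &str) -> bool {
        self.to.runtime_path == local_path
    }
}

fn main() {
    let from = ActorName { runtime_path: "/org/site/rt1".to_string(), actor_id: 1 };
    let to = ActorName { runtime_path: "/org/site/rt2".to_string(), actor_id: 2 };
    let header = Header { from, to };
    println!("{} -> {}", header.from, header.to);
    println!("for rt2: {}", header.is_addressed_to("/org/site/rt2"));
}
```
\end{verbatim}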

% Delivering messages locally.
When a message is sent between actors in the same runtime,
it is delivered into the queue of the recipient without any copying.
Because the message is moved rather than shared, the sender can no longer modify it.
This is possible thanks to the Rust ownership system,
because the message sender gives ownership to the runtime when it dispatches the message,
and the runtime gives ownership to the recipient when it delivers the message.
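
The effect of this ownership hand-off can be observed with a small sketch.
A standard library channel stands in for the recipient's queue, and the function name is hypothetical.
The heap buffer of the message has the same address before dispatch and after delivery,
which shows that only ownership moved and the payload was not copied.

\begin{verbatim}
```rust
use std::sync::mpsc::channel;

/// Moves `payload` through a mailbox and returns the address of its heap
/// buffer before dispatch and after delivery.
fn deliver_locally(payload: Vec<u8>) -> (usize, usize) {
    let (mailbox_tx, mailbox_rx) = channel::<Vec<u8>>();
    let before = payload.as_ptr() as usize;
    // Ownership passes to the channel, which stands in for the runtime.
    mailbox_tx.send(payload).expect("receiver is alive");
    // Using `payload` past this point would be a compile-time error.
    let delivered = mailbox_rx.recv().expect("a message was queued");
    // Ownership now rests with the recipient.
    (before, delivered.as_ptr() as usize)
}

fn main() {
    let (before, after) = deliver_locally(vec![0u8; 4096]);
    println!("same buffer: {}", before == after);
}
```
\end{verbatim}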

% Security model based on filesystem permissions.
A runtime is represented in the filesystem as a file.
This file contains the authorization attributes which are associated with the runtime's security
principal.
The credentials used by the runtime specify the file, so other runtimes are able to locate it.
The metadata of the file contains authorization attributes just like any other file
(e.g. UID, GID, and mode bits).
In order for a principal to be able to send a message to an actor in the runtime,
it must have execute permissions for this file.
Thus communication between runtimes can be controlled using simple filesystem permissions.
Permissions checking is done during the \texttt{bttp} handshake.
Note that it is possible for messages to be sent in one direction in a \texttt{bttp} connection
but not in the other.
In this situation replies are permitted but unsolicited messages are not.
An important trade-off which was made when designing this model was that messages which are
sent between actors in the same runtime are not subject to any authorization checks.
This was done for two reasons: performance and security.
By eliminating authorization checks, messages can be more efficiently delivered between actors in the
same process,
which helps to reduce the performance penalty of the actor runtime over directly using threads.
Security is enhanced by this decision because it forces the user to separate actors with different
security requirements into different operating system processes,
which ensures all of the process isolation machinery in the operating system will be used to
isolate the different security domains.
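
A minimal sketch of the execute-permission check is given below.
The struct names, the numeric principal identifiers, and the POSIX-style rule that only the first
matching permission class is consulted are all assumptions made for illustration;
the text above specifies only that execute permission on the runtime's file is required.

\begin{verbatim}
```rust
/// Hypothetical subset of the metadata stored with a runtime's file.
struct RuntimeFileMeta {
    uid: u32,
    gid: u32,
    mode: u32,
}

/// Hypothetical view of an authenticated principal.
struct Principal {
    uid: u32,
    gids: Vec<u32>,
}

const OWNER_EXEC: u32 = 0o100;
const GROUP_EXEC: u32 = 0o010;
const OTHER_EXEC: u32 = 0o001;

/// Returns true if `principal` holds execute permission on the runtime's file.
/// Assumes POSIX-style evaluation: only the first matching class is consulted.
fn may_send_to(principal: &Principal, meta: &RuntimeFileMeta) -> bool {
    let required = if principal.uid == meta.uid {
        OWNER_EXEC
    } else if principal.gids.contains(&meta.gid) {
        GROUP_EXEC
    } else {
        OTHER_EXEC
    };
    (meta.mode & required) != 0
}

fn main() {
    let runtime = RuntimeFileMeta { uid: 1000, gid: 50, mode: 0o750 };
    let owner = Principal { uid: 1000, gids: vec![50] };
    let stranger = Principal { uid: 2000, gids: vec![60] };
    println!("owner may send: {}", may_send_to(&owner, &runtime));
    println!("stranger may send: {}", may_send_to(&stranger, &runtime));
}
```
\end{verbatim}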

% Representing resources as actors.
As in other actor systems, it is convenient to represent resources in Blocktree using actors.
This allows the same security model used to control communication between actors to be used for
controlling access to resources,
and for resources to be shared by many actors.
For instance, a Point-to-Point Protocol connection could be owned by an actor.
This actor could forward traffic delivered to it in messages over this connection.
The set of actors which are able to access the connection is controlled by setting the filesystem
permissions on the file for the runtime executing the actor with the connection.

% Service discovery.
In addition to spawning actors, the runtime can also be used to register actors as service
providers.
A service is identified by a filesystem path.
One or more actors may be registered as providing a service.
Services are resolved to actor names by the runtime.
The service resolution method takes the path of a service and a scope path.
The scope path defines the filesystem path where service resolution will begin.
Resolution produces the name of an actor which is registered in a runtime which is closest to the
scope, or \texttt{None} if no service provider can be found.
To be more precise, consider the following cases:
\begin{enumerate}
  \item If the scope is the path of a runtime, and there are service providers registered in the
    runtime, then one of their names is returned. Otherwise, service resolution is retried using a
    new scope which is obtained by removing the last path component of the current scope.
  \item If a directory is specified, then all of the runtimes in the directory are checked for
    registered service providers, and if one is found its name is returned. Otherwise, service
    resolution is retried using a new scope which is obtained by removing the last path component of
    the current scope.
  \item If the scope is the empty string, then \texttt{None} is returned.
\end{enumerate}
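
The three cases above can be sketched as a loop that walks from the scope toward the root.
The \texttt{Registry} type is a hypothetical in-memory stand-in for querying runtimes over the network,
a \texttt{u128} stands in for the actor's UUID,
and when several providers match the sketch simply returns the first one it finds.

\begin{verbatim}
```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
struct ActorName {
    runtime_path: String,
    actor_id: u128,
}

/// Hypothetical view of all registrations: runtime path -> service -> actors.
#[derive(Default)]
struct Registry {
    runtimes: BTreeMap<String, BTreeMap<String, Vec<u128>>>,
}

/// Removes the last path component; "/a" becomes the empty string.
fn parent(path: &str) -> &str {
    match path.rfind('/') {
        Some(idx) => &path[..idx],
        None => "",
    }
}

impl Registry {
    fn register(&mut self, runtime: &str, service: &str, actor_id: u128) {
        self.runtimes
            .entry(runtime.to_string())
            .or_default()
            .entry(service.to_string())
            .or_default()
            .push(actor_id);
    }

    fn provider_in(&self, runtime: &str, service: &str) -> Option<ActorName> {
        let actor_id = *self.runtimes.get(runtime)?.get(service)?.first()?;
        Some(ActorName { runtime_path: runtime.to_string(), actor_id })
    }

    fn resolve(&self, service: &str, scope: &str) -> Option<ActorName> {
        let mut scope = scope;
        // Case 3: an empty scope ends the search.
        while !scope.is_empty() {
            let found = if self.runtimes.contains_key(scope) {
                // Case 1: the scope is a runtime.
                self.provider_in(scope, service)
            } else {
                // Case 2: the scope is a directory; check the runtimes in it.
                self.runtimes
                    .keys()
                    .filter(|rt| parent(rt.as_str()) == scope)
                    .find_map(|rt| self.provider_in(rt, service))
            };
            if found.is_some() {
                return found;
            }
            scope = parent(scope);
        }
        None
    }
}

fn main() {
    let mut registry = Registry::default();
    registry.register("/org/site/rt1", "/svc/fs", 1);
    registry.register("/org/rt0", "/svc/log", 3);
    println!("{:?}", registry.resolve("/svc/log", "/org/site/rt1"));
}
```
\end{verbatim}
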
In order to contact other runtimes and query their service registrations,
their IP addresses need to be known.
To enable this a file with the runtime's IP address is maintained in the same directory as the
runtime.
The runtime is granted write permissions on the file,
and it is updated by the transport layer when it begins listening on a new endpoint.

% The sector and filesystem service.
The filesystem is itself implemented as a service.
A filesystem service provider can be passed messages to delete files, list directory contents,
open files, or perform several other standard filesystem operations.
When a file is opened,
a new actor is spawned which owns the newly created file handle and its name is returned to the
caller in a reply.
Subsequent read and write messages are sent to this actor.
The filesystem service does not persist any data itself;
its job is to function as an integration layer,
conglomerating sector data from many different sources into a single unified interface.
The sector service is what is ultimately responsible for storing data,
and thus maintaining the persistent state of the system.
It stores sector data in the local filesystem of each computer on which it is registered.
The details of how this is accomplished are deferred to the next section.

% protocol contracts, and runtime checking of protocol adherence. Emphasize the benefits to
% system composability that this enables, where errors can be traced back to the actor which
% violated the contract.
To facilitate the creation of composable systems,
a protocol contract checking system based on session types has been designed.
This system operates on a state transition model of a communications protocol.
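
To give a flavor of what a state transition model looks like in Rust,
the following sketch hand-writes a hypothetical ping-pong protocol using the typestate pattern.
It is not Blocktree's contract system, which relies on code generation;
the message types, state types, and sequence-number check are assumptions made for illustration.
Each state is a distinct type and each transition consumes the current state,
so out-of-order transitions are rejected at compile time,
while the check on message contents happens at run time.

\begin{verbatim}
```rust
/// Messages of a hypothetical ping-pong protocol.
struct Ping {
    seq: u32,
}
struct Pong {
    seq: u32,
}

/// Client states. Each transition consumes the current state.
struct ClientStart;
struct ClientAwaitingPong {
    seq: u32,
}
struct ClientDone {
    acked: u32,
}

impl ClientStart {
    fn send_ping(self, seq: u32) -> (ClientAwaitingPong, Ping) {
        (ClientAwaitingPong { seq }, Ping { seq })
    }
}

impl ClientAwaitingPong {
    /// The type system cannot see message contents, so the sequence number
    /// is verified dynamically.
    fn recv_pong(self, msg: Pong) -> Result<ClientDone, String> {
        if msg.seq == self.seq {
            Ok(ClientDone { acked: msg.seq })
        } else {
            Err(format!("expected seq {}, got {}", self.seq, msg.seq))
        }
    }
}

/// Server states.
struct ServerListening;
struct ServerDone;

impl ServerListening {
    fn recv_ping(self, msg: Ping) -> (ServerDone, Pong) {
        (ServerDone, Pong { seq: msg.seq })
    }
}

/// Runs one complete session between a client and a server.
fn run_session(seq: u32) -> Result<ClientDone, String> {
    let (client, ping) = ClientStart.send_ping(seq);
    let (_server, pong) = ServerListening.recv_ping(ping);
    client.recv_pong(pong)
}

fn main() {
    match run_session(7) {
        Ok(done) => println!("acknowledged seq {}", done.acked),
        Err(err) => println!("protocol violation: {}", err),
    }
}
```
\end{verbatim}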

\section{Filesystem}
% Benefits of using a distributed filesystem as the sole source of persistent state for the system,
% including secure software delivery.

% Accessing data at two different levels of abstraction: sectors and files.

% Concurrency semantics at the sector layer, and their implementation using Raft.

\section{Cryptography}

\section{Examples}
This section contains examples of systems built using Blocktree. The hope is to illustrate how this
platform can be used to implement existing applications more easily and to make it possible to
implement systems which are currently out of reach.

\subsection{A personal cloud for a home user.}
% Describe my idealized home Blocktree setup.

\subsection{An ecommerce website.}
% Describe a blocktree which runs a cluster of webservers, a manufacturing process, a warehouse
% inventory management system, and an order fulfillment system.

\subsection{A real-time geospatial environment.}
% Explain my vision of the metaverse.

\section{Conclusion}

\end{document}