- \documentclass{article}
- \usepackage[scale=0.8]{geometry}
- \usepackage{hyperref}
- \title{The Blocktree Cloud Orchestration Platform}
- \author{Matthew Carr}
- \begin{document}
- \maketitle
- \begin{abstract}
- This document is a proposal for a novel cloud platform called Blocktree.
- The system is described in terms of the actor model,
- where tasks and services are implemented as actors.
- The platform is responsible for orchestrating these actors on a set of native operating system processes.
- A service is provided to actors which allows them access to a highly available distributed file system,
- which serves as the only source of persistent state for the system.
- High availability is achieved using the Raft consensus protocol to synchronize the state of files between processes.
- All data stored in the filesystem is secured with strong integrity and optional confidentiality protections.
- An interface resembling a network block device allows for fast low-level read and write access to the encrypted data,
- with full support for client-side encryption.
- Well-known cryptographic primitives and constructions are employed to provide this protection;
- the system does not attempt to innovate in terms of cryptography.
- The system's trust model allows for mutual TLS authentication between all processes in the system,
- even those which are controlled by different owners.
- By integrating these ideas into a single platform,
- the system aims to advance the status quo in the security and reliability of software systems.
- \end{abstract}
- \section{Introduction}
- % Describe paths, actors, and files. Emphasize the benefit of actors and files sharing the same
- % namespace.
- Blocktree is an attempt to extend the Unix philosophy that everything is a file
- to the entire distributed system that comprises modern IT infrastructure.
- The system is organized around a global distributed filesystem which defines security
- principals, resources, and their authorization attributes.
- This filesystem provides a language for access control that can be used to securely grant principals
- access to resources from different organizations, without the need to set up federation.
- The system provides an actor runtime for orchestrating tasks and services.
- Resources are represented by actors, and actors are grouped into operating system processes.
- Each process has its own credentials which authenticate it as a unique security principal,
- and which specify the filesystem path where the process is located.
- A process has authorization attributes which determine the set of processes that may communicate with it.
- Every connection between processes is established using mutual TLS authentication,
- which is accomplished without the need to trust any third-party certificate authorities.
- The cryptographic mechanisms which make this possible are described in detail in section 4.
- Messages addressed to actors in a different process are forwarded over these connections,
- while messages addressed to actors in the same process are delivered without copying.
- One of the major challenges in distributed systems is managing persistent state.
- Blocktree solves this issue using its distributed filesystem.
- Files are broken into segments called sectors.
- The sector size of a file can be configured when it is created,
- but cannot be changed after the fact.
- Reads and writes of individual sectors are guaranteed to be atomic.
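As a rough illustration of fixed-size sectors, the following sketch maps a byte offset within a file to the index of the sector holding it and the offset inside that sector. The names \texttt{SectorPos} and \texttt{locate} are hypothetical and are not taken from the Blocktree source.

```rust
/// Position of a byte inside a file that is split into fixed-size sectors.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct SectorPos {
    /// Zero-based index of the sector containing the byte.
    pub index: u64,
    /// Offset of the byte from the start of that sector.
    pub offset: u64,
}

/// Maps a byte offset in a file to the sector that holds it.
/// `sector_size` is fixed when the file is created and must be non-zero.
pub fn locate(byte_offset: u64, sector_size: u64) -> SectorPos {
    assert!(sector_size > 0, "sector size must be non-zero");
    SectorPos {
        index: byte_offset / sector_size,
        offset: byte_offset % sector_size,
    }
}
```

For example, with an assumed sector size of 4096 bytes, byte 10000 falls in sector 2 at offset 1808.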
- The sectors which comprise a file and its metadata are replicated by a set of processes running
- the sector service.
- This service is responsible for storing the sectors of files which are contained in the directory
- containing the process in which it is running.
- The actors providing the sector service in a given directory coordinate with one another using
- the Raft protocol to synchronize the state of the sectors they store.
- This method of partitioning the data in the filesystem based on directory
- allows the system to scale beyond the capabilities of a single consensus cluster.
- Sectors are secured with strong integrity protection,
- which allows anyone to verify that their contents were written by an authorized principal.
- Encryption can be optionally applied to sectors,
- with the system handling key management.
- The cryptographic mechanisms used to implement these protections are described in section 4.
- To reduce load on the sector service, and to allow the system to scale to a larger number of users,
- a peer-to-peer distribution system is implemented in the filesystem service.
- This system allows filesystem actors to download sectors from other filesystem actors
- that have the sectors in their local cache.
- The threat of malicious actors serving bad sector data is mitigated by the strong integrity
- protections applied to sectors.
- By using peer-to-peer distribution, the system can serve as a content delivery network.
- One of the design goals of Blocktree is to facilitate the creation of composable distributed
- systems.
- A major challenge to building such systems is the difficulty in pinning down bugs when they
- inevitably occur.
- Research into session types (a.k.a. Behavioral Types) promises to bring the safety benefits
- of type checking to actor communication.
- Blocktree integrates a session typing system that allows protocol contracts to be defined that
- specify the communication patterns of a set of actors.
- This model allows the state space of the set of actors participating in a computation to be defined,
- and the state transitions which occur to be specified based on the types of received messages.
- These contracts are used to verify protocol adherence statically and dynamically.
- This system is implemented using compile time code generation,
- making it a zero-cost abstraction.
- Freed from handling the numerous failure modes that occur in a communication protocol,
- developers are able to focus on the functionality of their system.
- Blocktree is implemented in the Rust programming language.
- Its source code is licensed under the GNU Affero General Public License, version 3.
- It can be downloaded at the project homepage at \url{https://blocktree.systems}.
- Anyone interested in contributing to development is welcome to submit a pull request
- to \url{https://gogs.delease.com/Delease/Blocktree}.
- If you have larger changes or architectural suggestions,
- please submit an issue for discussion prior to spending time implementing your idea.
- % Describe the remainder of the paper.
- The remainder of this paper is structured as follows:
- \begin{itemize}
- \item Section 2 describes the actor runtime, service and task orchestration, and service
- discovery.
- \item Section 3 discusses the filesystem, its concurrency semantics and implementation.
- \item Section 4 details the cryptographic mechanisms used to secure communication between
- actor runtimes and to protect sector data.
- \item Section 5 is a set of examples describing ways that Blocktree can be used to build systems.
- \item Section 6 provides some concluding remarks.
- \end{itemize}
- \section{Actor Runtime}
- % Motivation for using the actor model.
- Building scalable, fault-tolerant systems requires us to distribute computation over
- multiple computers.
- Rather than switching to a different programming model when an application scales beyond the
- capacity of a single computer,
- it is beneficial in terms of programmer time and program simplicity to begin with a model that
- enables multi-computer scalability.
- Fundamentally, all communication over an IP network involves the exchange of messages,
- namely IP packets.
- So if we wish to build scalable fault-tolerant systems,
- it makes sense to choose a programming model built on message passing,
- as this minimizes the impedance mismatch with the underlying networking technology.
- % Overview of message passing interface.
- That is why Blocktree is built on the actor model
- and why its actor runtime is at the core of its architecture.
- The runtime can be used to spawn new actors, register services, and dispatch messages.
- Messages can be dispatched in two different ways: with \texttt{send} and \texttt{call}.
- A message is dispatched with the \texttt{send} method when no reply is required,
- and with \texttt{call} when exactly one reply is required.
- The \texttt{Future} returned by \texttt{call} can be awaited to obtain the reply.
- If a timeout occurs while waiting for the reply,
- then the \texttt{Future} completes with an error.
- The name \texttt{call} was chosen to bring to mind a remote procedure call,
- which is the primary use case this method was intended for.
- Awaiting replies to messages serves as a simple way to synchronize a distributed computation.
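The difference between the two dispatch methods can be sketched with standard library channels. This is a simplified, blocking stand-in: the real runtime returns a \texttt{Future} from \texttt{call}, whereas this sketch waits on a channel with a timeout. The \texttt{Doubler} actor and \texttt{Envelope} type are hypothetical.

```rust
use std::sync::mpsc::{channel, RecvTimeoutError, Sender};
use std::thread;
use std::time::Duration;

/// Envelope placed in an actor's mailbox. `reply_to` is present only for calls.
struct Envelope {
    body: u64,
    reply_to: Option<Sender<u64>>,
}

/// Handle to a toy actor which doubles every number it receives.
pub struct Doubler {
    mailbox: Sender<Envelope>,
}

impl Doubler {
    /// Spawns the actor on its own thread and returns a handle to its mailbox.
    pub fn spawn() -> Doubler {
        let (mailbox, inbox) = channel::<Envelope>();
        thread::spawn(move || {
            // The loop ends when every handle to the mailbox has been dropped.
            for envelope in inbox {
                let result = envelope.body * 2;
                if let Some(reply_to) = envelope.reply_to {
                    // The caller may have timed out and gone away; ignore that.
                    let _ = reply_to.send(result);
                }
            }
        });
        Doubler { mailbox }
    }

    /// Fire-and-forget dispatch: no reply is expected.
    /// Returns false if the actor is no longer running.
    pub fn send(&self, body: u64) -> bool {
        self.mailbox.send(Envelope { body, reply_to: None }).is_ok()
    }

    /// Request-reply dispatch: waits for exactly one reply or a timeout.
    pub fn call(&self, body: u64, timeout: Duration) -> Result<u64, RecvTimeoutError> {
        let (reply_to, reply) = channel();
        self.mailbox
            .send(Envelope { body, reply_to: Some(reply_to) })
            .map_err(|_| RecvTimeoutError::Disconnected)?;
        reply.recv_timeout(timeout)
    }
}
```

In this sketch a timed-out \texttt{call} yields an error value, mirroring the way the \texttt{Future} described above completes with an error.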
- % Delivering messages over the network.
- Messages can be forwarded between actor runtimes using a secure transport layer called
- \texttt{bttp}.
- Messages are addressed using \emph{actor names}.
- An actor name is a pair consisting of the filesystem path of the runtime
- and a UUID specifying an actor in that runtime.
- Every message has a header containing the name of the sender and receiver.
- The transport is implemented using the QUIC protocol, which integrates TLS for security.
- The TLS handshake between runtimes is performed using mutual TLS authentication.
- This handshake cryptographically verifies the credentials of each runtime.
- These credentials contain the filesystem path where each runtime is located,
- which ensures that messages addressed to a specific path will only be delivered to the runtime
- at that path.
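A minimal data model for actor names and message headers might look as follows. The type and field names are assumptions made for illustration, and a \texttt{u128} stands in for a real UUID type.

```rust
/// Name of an actor: the filesystem path of its runtime plus a UUID that
/// identifies the actor within that runtime. A `u128` stands in for a UUID.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ActorName {
    pub runtime_path: String,
    pub actor_id: u128,
}

/// Header carried by every message: the names of the sender and receiver.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Header {
    pub from: ActorName,
    pub to: ActorName,
}

impl Header {
    /// True when sender and receiver live in the same runtime, in which case
    /// the message does not need to be forwarded over the network.
    pub fn is_local(&self) -> bool {
        self.from.runtime_path == self.to.runtime_path
    }
}
```

Because the runtime path is part of the name, a runtime can decide from the header alone whether a message is delivered locally or forwarded to another runtime.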
- % Delivering messages locally.
- When a message is sent between actors in the same runtime, it is delivered into the queue of the recipient without any copying,
- while ensuring that the sender can no longer modify it (move semantics).
- This is possible thanks to the Rust ownership system,
- because the message sender gives ownership to the runtime when it dispatches the message,
- and the runtime gives ownership to the recipient when it delivers the message.
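The effect of move semantics on local delivery can be observed with a standard library channel standing in for an actor's queue. The function \texttt{deliver} is hypothetical; it records the address of the payload's heap buffer before sending and after receiving.

```rust
use std::sync::mpsc::channel;

/// Sends `payload` through an in-process channel and reports the address of
/// its heap buffer before sending and after receiving.
pub fn deliver(payload: Vec<u8>) -> (usize, usize, Vec<u8>) {
    let (mailbox, inbox) = channel::<Vec<u8>>();
    let before = payload.as_ptr() as usize;
    // Ownership moves into the channel; `payload` can no longer be used here.
    mailbox.send(payload).expect("receiver is alive");
    // Ownership moves out of the channel to the recipient.
    let received = inbox.recv().expect("a message was sent");
    let after = received.as_ptr() as usize;
    (before, after, received)
}
```

The two addresses are equal: only the small \texttt{Vec} handle is moved, while the bytes of the message stay where they were allocated.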
- % Security model based on filesystem permissions.
- A runtime is represented in the filesystem as a file.
- This file contains the authorization attributes which are associated with the runtime's security
- principal.
- The credentials used by the runtime specify the file, so other runtimes are able to locate it.
- The metadata of the file contains authorization attributes just like any other file
- (e.g. UID, GID, and mode bits).
- In order for a principal to be able to send a message to an actor in the runtime,
- it must have execute permissions for this file.
- Thus communication between runtimes can be controlled using simple filesystem permissions.
- Permissions checking is done during the \texttt{bttp} handshake.
- Note that it is possible for messages to be sent in one direction in a \texttt{bttp} connection
- but not in the other.
- In this situation replies are permitted but unsolicited messages are not.
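The permission check can be sketched as follows, assuming that Blocktree selects the owner, group, or other class in the conventional Unix manner and that principals are identified by numeric IDs. Both are assumptions made for this example, as are the names \texttt{FileAttrs}, \texttt{Principal}, and \texttt{may\_send}.

```rust
/// Authorization attributes stored in the metadata of a runtime's file.
pub struct FileAttrs {
    pub uid: u32,
    pub gid: u32,
    pub mode: u32,
}

/// A principal attempting to send messages to the runtime.
pub struct Principal {
    pub uid: u32,
    pub gids: Vec<u32>,
}

const OWNER_EXEC: u32 = 0o100;
const GROUP_EXEC: u32 = 0o010;
const OTHER_EXEC: u32 = 0o001;

/// Returns true if `principal` has execute permission on the runtime's file.
/// Exactly one permission class applies: owner, then group, then other.
pub fn may_send(principal: &Principal, attrs: &FileAttrs) -> bool {
    let bit = if principal.uid == attrs.uid {
        OWNER_EXEC
    } else if principal.gids.contains(&attrs.gid) {
        GROUP_EXEC
    } else {
        OTHER_EXEC
    };
    attrs.mode & bit != 0
}
```

With mode bits \texttt{0o750}, for instance, the owner and members of the file's group may send messages to the runtime, while all other principals are refused.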
- An important trade-off which was made when designing this model was that messages which are
- sent between actors in the same runtime are not subject to any authorization checks.
- This was done for two reasons: performance and security.
- By eliminating authorization checks messages can be more efficiently delivered between actors in the
- same process,
- which helps to reduce the performance penalty of the actor runtime over directly using threads.
- Security is enhanced by this decision because it forces the user to separate actors with different
- security requirements into different operating system processes,
- which ensures all of the process isolation machinery in the operating system will be used to
- isolate the different security domains.
- % Representing resources as actors.
- As in other actor systems, it is convenient to represent resources in Blocktree using actors.
- This allows the same security model used to control communication between actors to be used for
- controlling access to resources,
- and for resources to be shared by many actors.
- For instance, a Point-to-Point Protocol connection could be owned by an actor.
- This actor could forward traffic delivered to it in messages over this connection.
- The set of actors which are able to access the connection is controlled by setting the filesystem
- permissions on the file for the runtime executing the actor with the connection.
- % Service discovery.
- In addition to spawning actors, the runtime can also be used to register actors as service
- providers.
- A service is identified by a filesystem path.
- One or more actors may be registered as providing a service.
- Services are resolved to actor names by the runtime.
- The service resolution method takes the path of a service and a scope path.
- The scope path defines the filesystem path where service resolution will begin.
- Resolution produces the name of an actor which is registered in a runtime which is closest to the
- scope, or \texttt{None} if no service provider can be found.
- To be more precise, consider the following cases:
- \begin{enumerate}
- \item If the scope is the path of a runtime, and there are service providers registered in the
- runtime, then one of their names is returned. Otherwise, service resolution is retried using a
- new scope which is obtained by removing the last path component of the current scope.
- \item If the scope is the path of a directory, then all of the runtimes in the directory are checked for
- registered service providers, and if one is found its name is returned. Otherwise, service
- resolution is retried using a new scope which is obtained by removing the last path component of
- the current scope.
- \item If the scope is the empty string, then \texttt{None} is returned.
- \end{enumerate}
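The three cases above can be sketched with an in-memory registry. This toy version keeps all registrations in one map instead of querying remote runtimes, and the names \texttt{Registry}, \texttt{resolve}, and \texttt{parent} are hypothetical. A \texttt{u128} stands in for a UUID.

```rust
use std::collections::BTreeMap;

/// Toy registry: runtime path -> service path -> UUIDs of provider actors.
#[derive(Default)]
pub struct Registry {
    runtimes: BTreeMap<String, BTreeMap<String, Vec<u128>>>,
}

impl Registry {
    /// Records that the actor `actor_id` in `runtime` provides `service`.
    pub fn register(&mut self, runtime: &str, service: &str, actor_id: u128) {
        self.runtimes
            .entry(runtime.to_string())
            .or_default()
            .entry(service.to_string())
            .or_default()
            .push(actor_id);
    }

    /// First provider of `service` registered in `runtime`, if any.
    fn provider_in(&self, runtime: &str, service: &str) -> Option<(String, u128)> {
        let id = self.runtimes.get(runtime)?.get(service)?.first()?;
        Some((runtime.to_string(), *id))
    }

    /// Resolves `service` starting at `scope` and walking toward the root.
    pub fn resolve(&self, service: &str, scope: &str) -> Option<(String, u128)> {
        let mut scope = scope;
        // Case 3: an empty scope ends the search.
        while !scope.is_empty() {
            // Case 1: the scope is the path of a runtime.
            if let Some(found) = self.provider_in(scope, service) {
                return Some(found);
            }
            // Case 2: the scope is a directory; check the runtimes inside it.
            for runtime in self.runtimes.keys() {
                if parent(runtime) == scope {
                    if let Some(found) = self.provider_in(runtime, service) {
                        return Some(found);
                    }
                }
            }
            // Retry with the last path component removed.
            scope = parent(scope);
        }
        None
    }
}

/// Removes the last component of a `/`-separated path.
fn parent(path: &str) -> &str {
    path.rsplit_once('/').map_or("", |(head, _)| head)
}
```

Starting from a scope such as \texttt{/org/site/b}, which contains no providers, this sketch falls back to \texttt{/org/site} and returns a provider registered in a runtime located there.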
- In order to contact other runtimes and query their service registrations,
- their IP addresses need to be known.
- To enable this a file with the runtime's IP address is maintained in the same directory as the
- runtime.
- The runtime is granted write permissions on the file,
- and it is updated by the transport layer when it begins listening on a new endpoint.
- % The sector and filesystem service.
- The filesystem is itself implemented as a service.
- A filesystem service provider can be passed messages to delete files, list directory contents,
- open files, or perform several other standard filesystem operations.
- When a file is opened,
- a new actor is spawned which owns the newly created file handle, and its name is returned to the
- caller in a reply.
- Subsequent read and write messages are sent to this actor.
- The filesystem service does not persist any data itself;
- its job is to function as an integration layer,
- conglomerating sector data from many different sources into a single unified interface.
- The sector service is what is ultimately responsible for storing data,
- and thus maintaining the persistent state of the system.
- It stores sector data in the local filesystem of each computer on which it is registered.
- The details of how this is accomplished are deferred to the next section.
- % protocol contracts, and runtime checking of protocol adherence. Emphasize the benefits to
- % system composability that this enables, where errors can be traced back to the actor which
- % violated the contract.
- To facilitate the creation of composable systems,
- a protocol contract checking system based on session types has been designed.
- This system operates on a state transition model of a communications protocol.
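One way to see how a state transition model can be enforced at compile time in Rust is the typestate pattern, shown below for a hypothetical request and reply protocol. This is a hand-written illustration of the general idea and not the code that Blocktree generates; all type and method names are assumptions.

```rust
/// Initial state of the client side of a hypothetical two-step protocol:
/// a request must be sent before a reply can be received.
pub struct Idle;

/// State entered after the request has been sent.
pub struct AwaitingReply {
    request: String,
}

/// Terminal state holding the outcome of the exchange.
pub struct Done {
    pub reply: String,
}

impl Idle {
    /// Sending consumes `Idle`, so the same value cannot send a second request.
    pub fn send_request(self, request: &str) -> AwaitingReply {
        AwaitingReply { request: request.to_string() }
    }
}

impl AwaitingReply {
    /// Only this state offers `receive_reply`; calling it on `Idle` fails to
    /// compile because the method does not exist on that type.
    pub fn receive_reply(self, reply: &str) -> Done {
        Done { reply: format!("{} -> {}", self.request, reply) }
    }
}
```

Each state is a distinct type and each transition consumes the previous state, so a program that attempts the steps out of order is rejected by the compiler.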
- \section{Filesystem}
- % Benefits of using a distributed filesystem as the sole source of persistent state for the system,
- % including secure software delivery.
- % Accessing data at two different levels of abstraction: sectors and files.
- % Concurrency semantics at the sector layer, and their implementation using Raft.
- \section{Cryptography}
- \section{Examples}
- This section contains examples of systems built using Blocktree. The hope is to illustrate how this
- platform can be used to implement existing applications more easily and to make it possible to
- implement systems which are currently out of reach.
- \subsection{A personal cloud for a home user.}
- % Describe my idealized home Blocktree setup.
- \subsection{An ecommerce website.}
- % Describe a blocktree which runs a cluster of webservers, a manufacturing process, a warehouse
- % inventory management system, and an order fulfillment system.
- \subsection{A real-time geospatial environment.}
- % Explain my vision of the metaverse.
- \section{Conclusion}
- \end{document}