Added citations to the paper.

Matthew Carr 1 year ago
parent
commit
ca841bcece

+ 59 - 36
doc/BlocktreeDce/BlocktreeDce.tex

@@ -1,7 +1,10 @@
 \documentclass{article}
+\usepackage{amsfonts,amssymb,amsmath}
 \usepackage[scale=0.8]{geometry}
 \usepackage{hyperref}
 \usepackage{graphicx}
+\usepackage{biblatex}
+\bibliography{../citations.bib}
 
 \title{Blocktree: A Distributed Computing Environment}
 \author{Matthew Carr}
@@ -17,7 +20,7 @@ The persistent state for the system is stored in a global distributed filesystem
 this actor runtime.
 High availability is achieved using the Raft consensus protocol to synchronize the state of files between processes.
 All data stored in the filesystem is secured with strong integrity and optional confidentiality protections.
-Well-known cryptographic constructions are used to provide this protection,
+Well-known cryptographic constructions are used to provide these protections,
 the system does not attempt to innovate in terms of cryptography.
 A network block device interface allows for fast low-level read and write access to file sectors,
 with full support for client-side encryption.
@@ -34,12 +37,13 @@ to the entire distributed system that comprises modern IT infrastructure.
 The system is organized around a global distributed filesystem which defines security
 principals, resources, and their authorization attributes.
 This filesystem provides a language for access control that can be used to securely grant
-access to resources, even those owned by different organizations.
+access to resources,
+even those owned by different organizations.
 The system provides an actor runtime for orchestrating services.
 Resources are represented as actors
 and actors are executed by runtimes in different operating system processes.
 Each process has its own credentials which authenticate it as a unique security principal,
-and which specify the filesystem path where it is located.
+and which specify the filesystem path where it's located.
 A process has authorization attributes which determine the set of processes that it may communicate
 with.
 TLS authentication is used to secure connections between processes.
@@ -65,6 +69,8 @@ and consist of certificates with correctly scoped authority in order for the fil
 Given the path of a file and the file's contents,
 this allows the file to be validated by anyone without the need to trust a third-party.
 Blocktree paths are called self-certifying for this reason.
+This construction was independently discovered by the author,
+but a similar system was previously used in the Self-certifying File System (SFS) \cite{sfs}.
 
 % Persistent state provided by the filesystem.
 One of the major challenges in distributed systems is managing persistent state.
@@ -78,7 +84,7 @@ the sector service.
 These service providers are responsible for storing the sectors of files that are contained in the
 directory containing the runtime in which it's running.
 The actors providing the sector service in a given directory coordinate with one another using
-the Raft protocol to synchronize the state of the sectors they store.
+the Raft protocol \cite{raft} to synchronize the state of the sectors they store.
 By partitioning the data in the filesystem based on directory,
 the system can scale beyond the capabilities of a single consensus cluster.
 Associated with every file is a Merkle tree of sector hashes,
@@ -92,7 +98,7 @@ systems.
 A major challenge to building such systems is the difficulty of locating the cause of bugs when they
 inevitably occur.
 Research into session types (a.k.a. Behavioral Types) promises to bring the safety benefits
-of type checking to actor communication.
+of type checking to actor communication (\cite{armstrong} chapter 9).
 Blocktree integrates a session typing system that allows protocol contracts to be defined that
 specify the communication protocol of a set of actors.
 This model allows the state space of the actors participating in a computation to be defined,
@@ -106,16 +112,16 @@ communication protocol.
 Blocktree is implemented in the Rust programming language.
 It is currently tested on Linux,
 but running it on other Unix-like operating systems should be straightforward.
-FUSE support is required to mount the filesystem.
+FUSE support from the host kernel is required to mount the filesystem.
 The system's source code is licensed under the GNU Affero General Public License Version 3.
 The project's homepage is \url{https://blocktree.systems}.
 Anyone interested in contributing to development is welcome to submit a pull request
 to \url{https://gogs.delease.com/Delease/Blocktree}.
 If you have larger changes or architectural suggestions,
-please submit an issue for discussion prior to spending time implementing your idea.
+please submit an issue for discussion prior to investing your time in an implementation.
 
 % Outline of the rest of the paper.
-The remainder of this paper is structured as follows:
+The remainder of this document is structured as follows:
 \begin{itemize}
   \item Section 2 describes the actor runtime, services, and runtime discovery.
   \item Section 3 discusses the filesystem, its concurrency semantics and implementation.
@@ -132,8 +138,8 @@ Building scalable fault tolerant systems requires us to distribute computation o
 multiple computers.
 Rather than switching to a different programming model when an application scales beyond the
 capacity of a single computer,
-it's beneficial in terms of programmer time and program simplicity,
-to begin with a model that enables multi-computer scalability.
+it's beneficial in terms of programmer time and program simplicity to begin with a model that
+enables multi-computer scalability.
 Fundamentally, all communication over a network involves the exchange of messages.
 So if we wish to build scalable fault-tolerant systems,
 it makes sense to choose a programming model built on message passing,
@@ -145,9 +151,11 @@ and why its actor runtime is at the core of its architecture.
 The runtime can be used to spawn actors, register services, dispatch messages immediately,
 and schedule messages to be delivered in the future.
 Messages can be dispatched in two ways: with \texttt{send} and \texttt{call}.
-A message is dispatched with the \texttt{send} method when no reply is required,
+A message is dispatched with \texttt{send} when no reply is required,
 and with \texttt{call} when exactly one is.
-The \texttt{Future} returned by \texttt{call} can be awaited to obtain the reply.
+The Rust
+\href{https://doc.rust-lang.org/std/future/trait.Future.html}{\texttt{Future}}
+returned by \texttt{call} can be awaited to obtain the reply.
 If a timeout occurs while waiting for the reply,
 the \texttt{Future} completes with an error.
 The name \texttt{call} was chosen to bring to mind a remote procedure call,
@@ -157,7 +165,7 @@ Awaiting replies to messages serves as a simple way to synchronize a distributed
 % Scheduling messages for future delivery.
 Executing actions at some point in the future or at regular intervals are common tasks in computer
 systems.
-Blocktree facilitates this by allows messages to be scheduled for future delivery.
+Blocktree facilitates this by allowing messages to be scheduled for future delivery.
 The schedule may specify a one time delivery at a specific instant in time,
 or a repeating delivery with a given period.
 These scheduling modes can be combined so that you can specify an anchoring instant
@@ -175,13 +183,14 @@ But, if a message is periodic,
 any messages which were missed due to a runtime not being active will never be sent.
 This is because the runtime only persists the message's schedule,
 not every delivery.
-This mechanism is intended for periodic tasks or delaying work to a later time.
-It is not for building hard realtime systems.
+This mechanism is intended for periodic tasks or delaying work to a later time,
+not for building hard realtime systems.
 
 % Description of virtual actor system.
 One of the challenges in building actor systems is supervising and managing actors' lifecycles.
-This is handled in Erlang through the use of supervision trees,
-but Blocktree takes a different approach, one inspired by Microsoft's Orleans framework.
+This is handled in Erlang \cite{armstrong} through the use of supervision trees,
+but Blocktree takes a different approach, one inspired by Microsoft's Orleans framework
+\cite{orleans}.
 Orleans introduced the concept of virtual actors,
 which are purely logical entities that exist perpetually.
 In Orleans, one does not need to spawn actors nor worry about respawning them should they crash,
@@ -231,7 +240,8 @@ file.
 Messages are then dispatched to the file actor using its actor name to read and write to the file.
 
 % The runtime is implemented using tokio.
-The actor runtime is implemented using the Rust asynchronous runtime tokio.
+The actor runtime is implemented using the Rust asynchronous runtime tokio
+[\url{https://tokio.rs}].
 Actors are spawned as tasks in the tokio runtime,
 and multi-producer single consumer channels are used for message delivery.
 Because actors are just tasks,
@@ -247,7 +257,8 @@ and is ideal for a system focused on orchestrating services which may be used by
 
 % Delivering messages over the network.
 Messages can be forwarded between actor runtimes using a secure transport called \texttt{bttp}.
-This transport is implemented using the QUIC protocol, which integrates TLS for security.
+This transport is implemented using the QUIC protocol \cite{quic}, which integrates TLS for
+security.
 A \texttt{bttp} client may connect anonymously or using credentials.
 If an anonymous connection is attempted,
 the client has no authorization attributes associated with it.
@@ -284,8 +295,8 @@ A runtime is represented in the filesystem as a file.
 Among other things,
 this file contains the authorization attributes associated with the runtime's security
 principal.
-The certificate used by the runtime to authenticate contain the to this file,
-so other runtimes are able to locate it.
+The certificate used by the runtime to authenticate is also contained in this file,
+so other runtimes are able to locate it and the public key contained within it.
 The metadata of the file contains authorization attributes just like any other file
 (e.g. UID, GID, and mode bits).
 In order for a principal to be able to send a message to an actor in the runtime,
@@ -300,8 +311,8 @@ sent between actors in the same runtime are not subject to any authorization che
 This was done for two reasons: performance and security.
 By eliminating authorization checks messages can be more efficiently delivered between actors in the
 same process,
-which helps to reduce the performance penalty of the actor runtime over directly using
-\texttt{tokio::Task}s.
+which helps to reduce the performance penalty of the actor runtime over directly using a
+\href{https://docs.rs/tokio/latest/tokio/task/index.html}{\texttt{tokio::Task}}.
 Security is enhanced by this decision because it forces the user to separate actors with different
 security requirements into different operating system processes,
 which ensures all of the process isolation machinery in the operating system will be used to
@@ -319,7 +330,7 @@ permissions on the file for the runtime executing the actor owning the connectio
 
 % Actor ownership.
 The concept of ownership in programming languages is very useful for ensuring that resources are
-properly freed when the type using them dies.
+properly released when the object using them dies.
 Because actors are used for encapsulating resources in Blocktree,
 a similar system of ownership is employed.
 An actor is initially owned by the actor that spawned it.
@@ -387,7 +398,7 @@ The list is also read by other runtime's when they're searching for service prov
 % The sector and filesystem service.
 The filesystem is itself implemented as a service.
 A filesystem service provider can be passed messages to delete files, list directory contents,
-open files, or perform several other standard filesystem operations.
+open files, or perform other standard filesystem operations.
 When a file is opened,
 a new actor is spawned which owns the newly created file handle and its name is returned to the
 caller in a reply.
@@ -405,7 +416,7 @@ While it's possible to resolve runtime paths to network endpoints when the files
 another mechanism is needed to allow the filesystem service providers to be discovered.
 This is accomplished by allowing runtimes to query one another to learn of other runtimes.
 Because queries are intended to facilitate message delivery,
-the query fields and their meanings mirror those used for addressing messages:
+the query fields and their semantics mirror those used for addressing messages:
 \begin{enumerate}
   \item \texttt{service} The path of the service whose providers are sought.
     Only runtimes with this service registered will be returned.
@@ -456,13 +467,13 @@ These runtimes would also need to be configured with static IP addresses,
 and the NS records for the search domain would need to point to them.
 It is also possible to build such a system without hosting DNS inside of Blocktree,
 by using a dynamic DNS service.
-The downside of using DNS is that it couples Blocktree with a centralized,
+The downside of using DNS is that it couples Blocktree with a centrally administered,
 albeit distributed, system.
 
 % Using link-local multicast datagrams to find runtimes.
 Because this mechanism requires knowledge of the root principal of a domain to perform
 discovery,
-it will not work if a runtime does not know its own root principal because it's starting up for the
+it will not work if a runtime doesn't know its own root principal because it's starting up for the
 first time and has no credentials.
 This runtime needs a way to discover other runtimes so it can connect to the filesystem and sector
 services.
@@ -592,11 +603,15 @@ The definition of \texttt{Activate} is as follows:
     act_id: Uuid,
   }
 \end{verbatim}
+A static reference can be given to a runtime because a runtime is required to live for the
+entire lifetime of a process.
+This allows simple references to be passed around,
+avoiding the complexity of lifetimes and the overhead of reference counting.
 The \texttt{Envelope} type is a wrapper around a message which contains information about who sent
 it and a method that can be used to send a reply.
 In general a new actor state, represented by a new type, can be returned by a message handling
 method.
-The protocol itself is also represented by a trait:
+The protocol itself is represented by the trait:
 \begin{verbatim}
   pub trait PubSubProtocol {
     type Server: ServerInit;
@@ -615,6 +630,9 @@ Wasm.
 This work is blocked pending the standardization of the WebAssembly Component Model,
 which promises to provide an interface definition language which will allow type safe actors to be
 defined in many different languages.
+Once Wasm support is added,
+it will make sense to use the filesystem to distribute compiled actor modules,
+as the strong integrity protection it provides makes it an ideal way to securely distribute software.
 
 % Running containers using actors.
 While the actor runtime can be a convenient way of implementing new systems,
@@ -913,11 +931,11 @@ increasing the performance of the system.
 
 \section{Cryptography}
 This section describes the cryptographic mechanisms used to integrity and confidentiality protect
-files.
+files as well as procedures for obtaining credentials.
 These mechanisms are based on well-established cryptographic constructions.
 
 % Integrity protection.
-File integrity is protected by a digital signature over its metadata.
+A file is integrity protected by a digital signature over its metadata.
 The metadata contains an integrity field which contains the root node of the Merkle tree over
 the file's contents.
 This allows any sector in the file to be verified with a number of hash function invocations that
@@ -930,6 +948,7 @@ A file's metadata also contains a certificate chain,
 and this chain is used to authenticate the signature over the metadata.
 In Blocktree, the certificate chain is referred to as a \emph{writecap}
 because it grants the capability to write to files.
+This term comes from the Tahoe Least-Authority Filesystem \cite{tahoe}.
 The certificates in a valid writecap are ordered by their paths,
 the initial certificate contains the longest path,
 the path in each subsequent certificate must be a prefix of the one preceding it,
@@ -952,6 +971,7 @@ A file's key and IV are encrypted using the public keys of the principals to who
 allowed.
 The resulting ciphertext is referred to as a \emph{readcap}, as it grants the capability to read the
 file.
+This term is also from Tahoe \cite{tahoe}.
 These readcaps are stored in a table in the file's metadata.
 Each entry in the table is identified by a byte string that is derived from the public key of the
 principal who owns the entry's readcap.
@@ -1042,7 +1062,7 @@ A symmetric cipher is used to protect the root credentials, if they are stored,
 but it relies on the security of the underlying filesystem to protect the process credentials.
 For this reason it is not recommended for production use.
 The other credential store is called \texttt{TpmCredStore},
-and it uses a Trusted Platform Module (TPM) 2.0 to store credentials.
+and it uses a Trusted Platform Module (TPM) 2.0 \cite{tpm} to store credentials.
 The TPM is used to generate the process's credentials in such a way that they can never be
 exported from the TPM (this is a feature of TPM 2.0).
 A randomly generated cookie is needed to use these credentials.
@@ -1119,7 +1139,8 @@ Up till now the focus has been on authentication and authorization of processes,
 but it bears discussing how user based access control can be accomplished with Blocktree.
 Because credentials are locked to the device on which they're created,
 a user will be associated with at least as many principals as they have devices.
-But, all of these principals can be configured to have the same authorization attributes (UID, GID),
+But, all of these principals can be configured to have the same authorization attributes
+(UID, GID, SELinux context, etc.),
 giving them the same permissions.
 It makes sense to provision all of the runtimes associated with a user in one place
 and the natural place is the user's home directory.
@@ -1560,7 +1581,7 @@ which will make it more resistent to disruption and censorship.
 Cloud computing has also driven changes in the way businesses acquire computing resources.
 It's common for startups to rent all of their computing resources from one large cloud provider
 and there are compelling economic and technical reasons to do this.
-But, as a firm grows they often experience growing pains as their cloud bills also grow.
+But, as a firm grows they often experience growing pains as their cloud bills grow with them.
 If the firm has not developed their software with a multi-cloud, or hybrid approach in mind,
 they may face the prospect of major changes in order to bring their application on-prem or to a
 rival cloud.
@@ -1580,7 +1601,7 @@ There are many reasons for this,
 from the reliance on passwords for authentication, to the complexity of the software supply chain,
 but it's clear that as IT professionals we need to do more to keep the systems under our
 protection safe.
-Blocktree helps us to do this by solving many of the difficult problems involved with securing
+Blocktree helps us do this by solving many of the difficult problems involved with securing
 communication on a hostile network.
 It takes a true zero-trust approach,
 ensuring that all communication between processes is authenticated using public key cryptography.
@@ -1602,8 +1623,10 @@ it is hoped that low overhead communication between distributed components can b
 By using this system to provide a global distributed filesystem,
 it is hoped that the interoperable sharing of data can be achieved.
 And by using protocol contracts to constrain actor communication,
-it is hoped that the structure and safety can bring order to distributed computation.
+it is hoped that structure and safety can bring order to distributed computation.
 While it's possible to see some of the applications that can be built from these abstractions,
 their composability and the creativity of developers will lead to systems that cannot be foreseen.
 
+\printbibliography
+
 \end{document}

+ 35 - 1
doc/citations.bib

@@ -101,7 +101,7 @@ series = {PLDI 2017}
     doi = {10.1145/3140587.3062363},
     abstract = { The maturation of the Web platform has given rise to sophisticated and demanding Web applications such as interactive 3D visualization, audio and video software, and games. With that, efficiency and security of code on the Web has become more important than ever. Yet JavaScript as the only built-in language of the Web is not well-equipped to meet these requirements, especially as a compilation target.  Engineers from the four major browser vendors have risen to the challenge and collaboratively designed a portable low-level bytecode called WebAssembly. It offers compact representation, efficient validation and compilation, and safe low to no-overhead execution. Rather than committing to a specific programming model, WebAssembly is an abstraction over modern hardware, making it language-, hardware-, and platform-independent, with use cases beyond just the Web. WebAssembly has been designed with a formal semantics from the start. We describe the motivation, design and formal semantics of WebAssembly and provide some preliminary experience with implementations. },
     journal = {SIGPLAN Not.},
-    month = {jun},
+    month = {6},
     pages = {185–200},
     numpages = {16},
     keywords = {assembly languages, type systems, virtual machines, just-in-time compilers, programming languages}
@@ -113,3 +113,37 @@ series = {PLDI 2017}
     school    = {Royal Institute of Technology, Stockholm, Sweden},
     year      = {2003}
 }
+
+@techreport{orleans,
+    author = {Bernstein, Phil and Bykov, Sergey and Geller, Alan and Kliot, Gabriel and Thelin, Jorgen},
+    title = {Orleans: Distributed Virtual Actors for Programmability and Scalability},
+    year = {2014},
+    month = {3},
+    abstract = {High-scale interactive services demand high throughput with low latency and high availability, difficult goals to meet with the traditional stateless 3-tier architecture. The actor model makes it natural to build a stateful middle tier and achieve the required performance. However, the popular actor model platforms still pass many distributed systems problems to the developers.
+    The Orleans programming model introduces the novel abstraction of virtual actors that solves a number of the complex distributed systems problems, such as reliability and distributed resource management, liberating the developers from dealing with those concerns. At the same time, the Orleans runtime enables applications to attain high performance, reliability and scalability.
+    This paper presents the design principles behind Orleans and demonstrates how Orleans achieves a simple programming model that meets these goals. We describe how Orleans simplified the development of several scalable production applications on Windows Azure, and report on the performance of those production systems.},
+    url = {https://www.microsoft.com/en-us/research/publication/orleans-distributed-virtual-actors-for-programmability-and-scalability/},
+    number = {MSR-TR-2014-41},
+}
+
+@inproceedings{sfs,
+    author = {Mazi\`{e}res, David and Kaashoek, M. Frans},
+    title = {Escaping the Evils of Centralized Control with Self-Certifying Pathnames},
+    year = {1998},
+    isbn = {9781450373173},
+    publisher = {Association for Computing Machinery},
+    address = {New York, NY, USA},
+    url = {https://doi.org/10.1145/319195.319213},
+    doi = {10.1145/319195.319213},
+    booktitle = {Proceedings of the 8th ACM SIGOPS European Workshop on Support for Composing Distributed Applications},
+    pages = {118–125},
+    numpages = {8},
+    location = {Sintra, Portugal},
+    series = {EW 8}
+}
+
+@inproceedings{quic,
+    title	= {The QUIC Transport Protocol: Design and Internet-Scale Deployment},
+    author	= {Adam Langley and Al Riddoch and Alyssa Wilk and Antonio Vicente and Charles 'Buck' Krasic and Cherie Shi and Dan Zhang and Fan Yang and Feodor Kouranov and Ian Swett and Janardhan Iyengar and Jeff Bailey and Jeremy Christopher Dorfman and Jim Roskind and Joanna Kulik and Patrik Göran Westin and Raman Tenneti and Robbie Shade and Ryan Hamilton and Victor Vasiliev and Wan-Teh Chang},
+    year	= {2017}
+}