@@ -151,7 +151,7 @@ as this will ensure low impedance with the underlying networking technology.
% Overview of message passing interface.
That is why Blocktree is built on the actor model
and why its actor runtime is at the core of its architecture.
-The runtime can be used to register services and dispatch messages.
+The runtime can be used to spawn actors, register services, and dispatch messages.
Messages can be dispatched in two different ways: with \texttt{send} and \texttt{call}.
A message is dispatched with the \texttt{send} method when no reply is required,
and with \texttt{call} when exactly one is.
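+As a rough illustration of the difference,
+the sketch below dispatches one message of each kind;
+the \texttt{Runtime} methods and message types shown here are hypothetical, not the actual API.
+\begin{verbatim}
+  // Hypothetical sketch: fire-and-forget vs. request/reply dispatch.
+  async fn dispatch_examples(runtime: &Runtime, svc: ServiceName) -> Result<(), Error> {
+      // send: no reply is expected.
+      runtime.send(svc.clone(), Msg::LogLine("started".into()));
+      // call: exactly one reply is awaited.
+      let status = runtime.call(svc, Msg::Status).await?;
+      println!("{:?}", status);
+      Ok(())
+  }
+\end{verbatim}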
@@ -172,18 +172,49 @@ In Orleans, one does not need to spawn actors nor worry about respawing them sho
the framework takes care of spawning an actor when a message is dispatched to it.
This model also gives the framework the flexibility to deactivate actors when they are idle
and to load balance actors across different computers.
-In Blocktree a similar system is used,
-which is possible because messages are only addressed to services.
+In Blocktree a similar system is used when messages are dispatched to services.
The Blocktree runtime takes care of routing these messages to the appropriate actors,
spawning them if needed.

+% Message addressing modes.
+Messages can be addressed to services or specific actors.
+When addressing a specific actor,
+the message contains an \emph{actor name},
+which is a pair consisting of the path of the runtime hosting the actor and the \texttt{Uuid}
+identifying the specific actor in that runtime.
+When addressing a service,
+the message is dispatched using a \emph{service name},
+which contains the following fields:
+\begin{enumerate}
+  \item \texttt{service}: The path identifying the receiving service.
+  \item \texttt{scope}: A filesystem path used to specify the intended recipient.
+  \item \texttt{rootwards}: A boolean describing whether message delivery is attempted towards or
+    away from the root of the filesystem tree. A value of
+    \texttt{false} indicates that the message is intended for a runtime directly contained in the
+    scope. A value of \texttt{true} indicates that the message is intended for a runtime contained
+    in a parent directory of the scope and should be delivered to a runtime which has the requested
+    service registered and is closest to the scope.
+  \item \texttt{id}: An identifier for a specific service provider.
+\end{enumerate}
+The ID can be a \texttt{Uuid} or a \texttt{String}.
+It is treated as an opaque identifier by the runtime,
+but a service is free to associate additional meaning to it.
+Every message has a header containing the names of the sender and receiver.
+The receiver name can be an actor name or a service name,
+but the sender name is always an actor name.
+For example, to open a file in the filesystem,
+a message is dispatched with \texttt{call} using the service name of the filesystem service.
+The reply contains the name of the file actor spawned by the filesystem service which owns the opened
+file.
+Messages are then dispatched to the file actor using its actor name to read and write to the file.
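+To make the addressing scheme concrete,
+the sketch below shows one way the two kinds of names could be modeled in Rust;
+the type and field names (including \texttt{BlockPath} for the runtime's path type)
+are illustrative rather than the actual API.
+\begin{verbatim}
+  // Illustrative sketch only; names and exact shapes are hypothetical.
+  pub struct ActorName {
+      runtime: BlockPath, // path of the runtime hosting the actor
+      actor: Uuid,        // identifies the actor within that runtime
+  }
+
+  pub enum ServiceId {
+      Uuid(Uuid),
+      String(String),
+  }
+
+  pub struct ServiceName {
+      service: BlockPath, // path identifying the receiving service
+      scope: BlockPath,   // filesystem path specifying the intended recipient
+      rootwards: bool,    // deliver towards (true) or away from (false) the root
+      id: ServiceId,      // opaque to the runtime
+  }
+
+  pub enum Name {
+      Actor(ActorName),
+      Service(ServiceName),
+  }
+\end{verbatim}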
+
% The runtime is implemented using tokio.
The actor runtime is currently implemented using the Rust asynchronous runtime tokio.
Actors are spawned as tasks in the tokio runtime,
and multi-producer single consumer channels are used for message delivery.
Because actors are just tasks,
they can do anything a task can do,
-including awaiting other futures.
+including awaiting other \texttt{Future}s.
Because of this, there is no need for the actor runtime to support short-lived worker tasks,
as any such use-case can be accomplished by awaiting a set of \texttt{Future}s.
This allows the runtime to focus on providing support for services.
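+This arrangement can be sketched in a few lines of tokio code;
+the message type and channel capacity below are arbitrary,
+and the actual runtime's types are more elaborate.
+\begin{verbatim}
+  // Minimal sketch: an actor is a tokio task draining an mpsc channel.
+  use tokio::sync::mpsc;
+
+  enum Msg { Ping, Stop }
+
+  fn spawn_actor() -> mpsc::Sender<Msg> {
+      let (tx, mut rx) = mpsc::channel(32);
+      tokio::spawn(async move {
+          while let Some(msg) = rx.recv().await {
+              match msg {
+                  Msg::Ping => { /* handle, possibly awaiting other futures */ }
+                  Msg::Stop => break,
+              }
+          }
+      });
+      tx
+  }
+\end{verbatim}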
@@ -195,23 +226,6 @@ and is ideal for a system focused on orchestrating services which may be used by
% Delivering messages over the network.
Messages can be forwarded between actor runtimes using a secure transport layer called
\texttt{bttp}.
-Messages are addressed using \emph{actor names}.
-An actor name consists of the following fields:
-\begin{enumerate}
-  \item \texttt{service}: The path identifying the receiving service.
-  \item \texttt{scope}: A filesystem path used to specify the intended recipient.
-  \item \texttt{rootwards}: An enum describing whether message delivery is attempted towards or
-    away from the root of the filesystem tree. A value of
-  \texttt{false} indicates that the message is intended for a runtime directly contained in the
-  scope. A value of \texttt{true} indicates that the message is intended for a runtime contained
-  in a parent directory of the scope and should be delivered to a runtime which has the requested
-  service registered and is closest to the scope.
-  \item \texttt{id}: An identifier for a specific service provider.
-\end{enumerate}
-The ID can be a \texttt{Uuid} or a \texttt{String}.
-It is treated as an opaque identifier by the runtime,
-but a service is free to associate additional meaning to it.
-Every message has a header containing the name of the sender and receiver.
The transport is implemented using the QUIC protocol, which integrates TLS for security.
A \texttt{bttp} client may connect anonymously or using credentials.
If an anonymous connection is attempted,
@@ -230,6 +244,8 @@ Because QUIC supports the concurrent use of many different streams,
it serves as an ideal transport for a message oriented system.
\texttt{bttp} uses different streams for independent messages,
ensuring that head of line blocking does not occur.
+Note that although data from separate streams can arrive in any order,
+the protocol does provide reliable in-order delivery of data in a given stream.
The same stream is used for sending the reply to a message dispatched with \texttt{call}.
Once a connection is established,
message may flow both directions (provided both runtimes have execute permissions for the other),
@@ -277,6 +293,27 @@ This actor could forward traffic delivered to it in messages over this connectio
The set of actors which are able to access the connection is controlled by setting the filesystem
permissions on the file for the runtime executing the actor owning the connection.

+% Actor ownership.
+The concept of ownership in programming languages is very useful for ensuring that resources are
+properly freed when the value using them is destroyed.
+Because actors are used for encapsulating resources in Blocktree,
+a similar system of ownership is employed for this reason.
+An actor is initially owned by the actor that spawned it.
+An actor can only have a single owner,
+but the owner can grant ownership to another actor.
+An actor is not allowed to own itself,
+though it may be owned by the runtime.
+When the owner of an actor returns,
+the actor is sent a message instructing it to return.
+If it does not return after a timeout,
+it is interrupted.
+This is the opposite of how supervision trees work in Erlang:
+instead of the parent receiving a message when the child returns,
+the child receives a message when the parent returns.
+Service providers spawned by the runtime are owned by it.
+They continue running until the runtime chooses to reclaim their resources,
+which can happen because they are idle or the runtime is overloaded.
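+The resulting lifecycle can be sketched as follows;
+the message, method, and constant names here are hypothetical and stand in for the runtime's
+actual interface.
+\begin{verbatim}
+  // Hypothetical sketch: cleanup of an owned actor after its owner returns.
+  async fn reap_owned(runtime: &Runtime, owned: ActorName) {
+      // Ask the owned actor to return.
+      runtime.send(owned.clone(), ControlMsg::OwnerReturned);
+      // If it has not returned after a timeout, interrupt it.
+      if runtime.wait_for_return(&owned, OWNER_TIMEOUT).await.is_err() {
+          runtime.interrupt(&owned);
+      }
+  }
+\end{verbatim}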
+
% Message routing to services.
A service is identified by a Blocktree path.
Only one service implementation can be registered in a particular runtime,
@@ -508,7 +545,7 @@ allowing a lightweight and secure VPN system to built.

\section{Filesystem}
% The division of responsibilities between the sector and filesystem services.
-The responsibility for storing data in the system is shared between the filesystem and sector
+The responsibility for serving data in the system is shared between the filesystem and sector
services.
Most actors will access the filesystem through the filesystem service,
which provides a high-level interface that takes care of the cryptographic operations necessary to
@@ -516,14 +553,14 @@ read and write files.
The filesystem service relies on the sector service for actually persisting data.
The individual sectors which make up a file are read from and written to the sector service,
which stores them in the local filesystem of the computer on which it is running.
-A sector is the atomic unit of data storage.
-The sector service only supports reading and writing entire sectors at once.
-File actors spawned by the filesystem service buffer reads and writes so until there is enough
+A sector is the atomic unit of data storage
+and the sector service only supports reading and writing entire sectors at once.
+File actors spawned by the filesystem service buffer reads and writes until there is enough
data to fill a sector.
Because cryptographic operations are only performed on full sectors,
the cost of providing these protections is amortized over the size of the sector.
-Thus there is tradeoff between latency and throughput when selecting the sector size of a file.
-A smaller sector size means less latency while a larger one enables more throughput.
+Thus there is a tradeoff between latency and throughput when selecting the sector size of a file:
+a smaller sector size means less latency while a larger one enables more throughput.

% Types of sectors: metadata, integrity, and data.
A file has a single metadata sector, a Merkle sector, and zero or more data sectors.
@@ -544,7 +581,7 @@ a consensus cluster.
This cluster is identified by a \texttt{u64} called the cluster's \emph{generation}.
Every file is identified by a pair of \texttt{u64}, its generation and its inode.
The sectors within a file are identified by an enum which specifies which type they are,
-and in the case of data sectors, their index.
+and in the case of data sectors, their 0-based index.
\begin{verbatim}
    pub enum SectorKind {
        Meta,
@@ -552,17 +589,49 @@ and in the case of data sectors, their index.
        Data(u64),
    }
\end{verbatim}
-The offset in the plaintext of the file at which each data sector begins can be calculated by
-multiplying the sectors offset by the sector size of the file.
+The byte offset in the plaintext of the file at which each data sector begins can be calculated by
+multiplying the sector's index by the sector size of the file.
+The \texttt{SectorId} type is used to identify a sector.
+\begin{verbatim}
+  pub struct SectorId {
+      generation: u64,
+      inode: u64,
+      sector: SectorKind,
+  }
+\end{verbatim}
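+Since each data sector begins at its index multiplied by the sector size,
+the data sector holding a given plaintext byte offset can be found by integer division;
+the helper below is purely illustrative and not part of the actual interface.
+\begin{verbatim}
+  // Illustrative: locate the data sector containing a plaintext byte offset.
+  fn sector_for_offset(generation: u64, inode: u64,
+                       offset: u64, sector_size: u64) -> SectorId {
+      SectorId {
+          generation,
+          inode,
+          sector: SectorKind::Data(offset / sector_size),
+      }
+  }
+\end{verbatim}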
+
+% Types of messages handled by the sector service.
+Communication with the sector service is done by passing it messages of type \texttt{SectorMsg}.
+\begin{verbatim}
+  pub struct SectorMsg {
+      id: SectorId,
+      op: SectorOperation,
+  }
+
+  pub enum SectorOperation {
+      Read,
+      Write(WriteOperation),
+  }
+
+  pub enum WriteOperation {
+      Meta(Box<FileMeta>),
+      Data {
+          meta: Box<FileMeta>,
+          contents: Vec<u8>,
+      }
+  }
+\end{verbatim}
+Here \texttt{FileMeta} is the type used to store metadata for files.
+Note that updated metadata is required to be sent when a sector's contents are modified.
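+As an illustration of how these types fit together,
+the function below builds a message that writes the first data sector of a file;
+it is a sketch only, and the surrounding dispatch machinery is omitted.
+\begin{verbatim}
+  // Illustrative: build a write message for data sector 0 of a file.
+  fn write_first_sector(generation: u64, inode: u64,
+                        meta: FileMeta, contents: Vec<u8>) -> SectorMsg {
+      SectorMsg {
+          id: SectorId { generation, inode, sector: SectorKind::Data(0) },
+          op: SectorOperation::Write(WriteOperation::Data {
+              meta: Box::new(meta), // updated metadata covering the new contents
+              contents,             // exactly one sector of data
+          }),
+      }
+  }
+\end{verbatim}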

% Scaling horizontally: using Raft to create consensus cluster. Additional replication methods.
-When multiple multiple sector service providers are contained in the same directory,
-the sector service providers connect to each other to form a consensus cluster.
-This cluster uses the Raft protocol to synchronize the state of the sectors it stores.
-The system is currently designed to replicate all data to each of the service providers in the
-cluster.
-Additional replication methods are planned for implementation,
-such as consisting hashing and erasure encoding,
+A generation of sector service providers uses the Raft protocol to synchronize the state of the
+sectors it stores.
+The message passing interface of the runtime enables this implementation
+and the sector service's requirements were important considerations in designing this interface.
+The system currently replicates all data to each of the service providers in the cluster.
+Additional replication methods are planned for future implementation
+(e.g. erasure encoding and distribution via consistent hashing),
which allow for different tradeoffs between data durability and storage utilization.

% Scaling vertically: how different generations are stitched together.
@@ -571,53 +640,112 @@ First, a new directory is created in which the generation will be located.
Next, one or more processes are credentialed for this directory,
using a procedure which is described in the next section.
The credentialing process produces files for each of the processes stored in the new directory.
-The sector service provider in each of the new processes uses service discovery to establish
-communication with its peers in the other processes.
-Finally, the service provider which is elected leader contacts the cluster in the root directory
+The sector service provider in each of the processes uses the filesystem service
+(which connects to the parent generation of the sector service)
+to find the other runtimes hosting the sector service in the directory and messages them to
+establish a fully-connected cluster.
+Finally, the service provider which is elected leader contacts the generation in the root directory
and requests a new generation number.
Once this number is known it is stored in the superblock for the generation,
which is the file identified by the new generation number and inode 2.
-Note that the superblock is not contained in any directory and cannot be accessed by actors
-outside of the sector service.
-The superblock also contains information used to assign a inodes when a files are created.
-
-% Sector service discovery. Paths.
+The superblock is not contained in any directory and cannot be accessed outside the sector service.
+The superblock also keeps track of the next inode to assign to a new file.
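+A plausible shape for this bookkeeping
+(purely illustrative; the actual on-disk layout is not described here)
+is a small record such as the following.
+\begin{verbatim}
+  // Illustrative only: the state the superblock is described as holding.
+  pub struct Superblock {
+      generation: u64, // assigned by the generation in the root directory
+      next_inode: u64, // next inode to assign to a newly created file
+  }
+\end{verbatim}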

-% The filesystem service is responsible for cryptographic operations. Client-side encryption.
+% Authorization logic of the sector service.
+To prevent malicious actors from writing invalid data,
+the sector service must cryptographically verify all write messages.
+The process it uses to do this involves several steps (sketched in code below):
+\begin{enumerate}
+  \item The certificate chain in the metadata that was sent in the write message is validated.
+    It is considered valid if it ends with a certificate signed by the root principal
+    and the paths in the certificates are correctly nested,
+    indicating valid delegation of write authority at every step.
+  \item Using the last public key in the certificate chain,
+    the signature in the metadata is validated.
+    This signature covers all of the fields in the metadata.
+  \item The new sector contents in the write message are hashed using the digest function configured
+    for the file and the resulting hash is used to update the file's Merkle tree in its Merkle
+    sector.
+  \item The root of the Merkle tree is compared with the integrity value in the file's metadata.
+    The write message is considered valid if and only if there is a match.
+\end{enumerate}
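+The sketch below compresses these steps into one function;
+the helper names are hypothetical and stand in for the real routines.
+\begin{verbatim}
+  // Hypothetical sketch of the write check; helper names are illustrative.
+  fn verify_write(meta: &FileMeta, sector: SectorKind,
+                  contents: &[u8], tree: &mut MerkleTree) -> bool {
+      // 1. The certificate chain must end at the root principal,
+      //    with correctly nested paths at every delegation step.
+      if !meta.cert_chain_is_valid() {
+          return false;
+      }
+      // 2. The metadata signature must verify under the last public key in the chain.
+      if !meta.signature_is_valid() {
+          return false;
+      }
+      // 3. Hash the new contents and update the file's Merkle tree.
+      let digest = meta.digest_fn().hash(contents);
+      tree.set_leaf(sector, digest);
+      // 4. The new root must match the integrity value in the metadata.
+      tree.root() == meta.integrity()
+  }
+\end{verbatim}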
+This same logic is used by file actors to verify the data they read from the sector service.
+Only once a write message is validated is it shared with the sector service provider's peers in
+its generation.
+Although the data in a file is encrypted,
+it is still beneficial for security to prevent unauthorized principals from gaining access to a
+file's ciphertext.
+To prevent this, a sector service provider checks a file's metadata to verify that the requesting
+principal actually has a readcap (to be defined in the next section) for the file.
+This ensures that only principals that are authorized to read a file can gain access to the file's
+ciphertext, metadata, and Merkle tree.
+
+% File actors are responsible for cryptographic operations. Client-side encryption.
The sector service is relied upon by the filesystem service to read and write sectors.
-Filesystem service providers communicate with the sector service to open files, read and write
-their contents, and update their metadata.
-These providers are responsible for verifying and decrypting the information contained in sectors
-and providing it to downstream actors.
-They are also responsible for encrypting and integrity protecting data written by downstream actors.
-Most of the complexity of implementing a filesystem is handled in the filesystem service.
-Most messages sent to the sector service only specify the operation (read or write), the identifier
-for the sector, and the sector contents.
-Every time a data sector is written an updated metadata sector is required to be sent in the same
-message.
-This requirement exists because a signature over the root of the file's Merkle tree is contained in
-the metadata,
-and since this root changes with every modification, it must be updated during every write.
-When the sector service commits a write it hashes the sector contents,
-updates the Merkle sector of the file, and updates the metadata sector.
-In order for the filesystem service to produce a signature over the root of the file's Merkle tree,
+Filesystem service providers communicate with the sector service to open files and perform
+filesystem operations.
+These providers spawn file actors that are responsible for verifying and decrypting the information
+contained in sectors and providing it to other actors.
+They use the credentials of the runtime they are hosted in to decrypt sector data using
+information contained in file metadata.
+File actors are also responsible for encrypting and integrity protecting data written to files.
+In order for a file actor to produce a signature over the root of the file's Merkle tree,
it maintains a copy of the tree in memory.
-This copy is loaded from the sector service when the file is opened.
+This copy is read from the sector service when the file is opened.
While this does mean duplicating data between the sector and filesystem services,
this design was chosen to reduce the network traffic between the two services,
as the entire Merkle tree does not need to be transmitted on every write.
-Encapsulating all cryptographic operations in the filesystem service allows the computer storing
-data to be different from the computer encrypting it.
+Encapsulating all cryptographic operations in the filesystem service and file actors allows the
+computer storing data to be different from the computer encrypting it.
This approach allows client-side encryption to be done on more capable computers
-and for this task to be delegated to a storage server on low powered devices.
-
-% Description of how the filesystem layer: opens a file, reads, and writes.
-
-% Peer-to-peer data distribution in the filesystem service.
+and low powered devices to delegate this task to a storage server.
+
+% Prevention of resource leaks through ownership.
+A major advantage of using file actors to access file data is that they can be accessed over the
+network from a different runtime as easily as they can be from the same runtime.
+One complication arising from this approach is that file actors must not outlive the actor which
+caused them to be spawned.
+This is handled in the filesystem service by making the actor that opened the file the owner of the
+file actor.
+When a file actor receives notification that its owner returned,
+it flushes any buffered data in its cache and returns,
+ensuring that a resource leak does not occur.
+
+% Authorization logic of the filesystem service.
+The filesystem service uses an \texttt{Authorizer} type to make authorization decisions.
+It passes this type the authorization attributes of the principal accessing the file, the
+attributes of the file, and the type of access (read, write, or execute).
+The \texttt{Authorizer} returns a boolean indicating if access is permitted or denied.
+These access control checks are performed for every message processed by the filesystem service,
+including opening a file.
+A file actor only responds to messages sent from its owner,
+which allows it to avoid the overhead of performing access control checks,
+as these were carried out by the filesystem service when the file actor was spawned.
+The file actor is configured when it is spawned to allow read only, write only, or read write
+access to a file,
+depending on what type of access was requested by the actor opening the file.
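+A minimal sketch of such an interface is given below;
+the trait definition and the \texttt{AuthzAttrs} type are assumptions made for illustration,
+not the actual declarations.
+\begin{verbatim}
+  // Minimal sketch; actual names and attribute types may differ.
+  pub enum Access { Read, Write, Execute }
+
+  pub trait Authorizer {
+      fn is_authorized(
+          &self,
+          principal: &AuthzAttrs, // attributes of the accessing principal
+          file: &AuthzAttrs,      // attributes of the file being accessed
+          access: Access,
+      ) -> bool;
+  }
+\end{verbatim}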

% Streaming replication.
-
-
+Often when building distributed systems it is convenient to alert any interested party that an event
+has occurred.
+To facilitate this pattern,
+the sector service allows actors to subscribe for notification of writes to a file.
+The sector service maintains a list of actors which are currently subscribed
+and when it commits a write to its local storage,
+it sends each of them a notification message identifying the sector written
+(but not the written data).
+By using different files to represent different events,
+a simple notification system can be built.
+Because the contents of a directory may be distributed over many different generations,
+this system does not support the recursive monitoring of directories.
+Although this system lacks the power of \texttt{inotify} in the Linux kernel,
+it does provide some of its benefits without incurring much of a performance overhead
+or implementation complexity.
+For example, this system can be used to implement streaming replication.
+This is done by subscribing to writes on all the files that are to be replicated,
+then reading new sectors as soon as notifications are received.
+These sectors can then be written into replica files in a different directory.
+This ensures that the contents of the replicas will be updated in near real-time.
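+A rough sketch of such a replicator for a single file is shown below;
+the client type and method names are hypothetical stand-ins for the sector service's
+actual message interface.
+\begin{verbatim}
+  // Hypothetical sketch: streaming replication driven by write notifications.
+  async fn replicate_file(svc: &SectorClient, src: FileId, dst: FileId) -> Result<(), Error> {
+      // Subscribe to write notifications for the source file.
+      let mut writes = svc.subscribe(src).await?;
+      // Each notification identifies the sector written, but not its contents.
+      while let Some(sector) = writes.recv().await {
+          let bytes = svc.read_sector(src, sector).await?; // fetch the new contents
+          svc.write_sector(dst, sector, bytes).await?;     // copy into the replica
+      }
+      Ok(())
+  }
+\end{verbatim}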

\section{Cryptography}
% The underlying trust model: self-certifying paths.