Delease/Blocktree: A platform for self-hosting internet services.

TODO

Replace references to "process" with "runtime". Because the runtime is required to route messages, it will be present in all practical Blocktree processes.
Apply the new terminology I've used in this paper to the codebase.

Actor runtime
Messages securely forwarded over the network.
Distributed network storage system.
Sector-level access to data.
File-level access to data.

Process of delegating storage in a directory.

A new directory is created. This directory has the generation number of the original sector cluster.
A process credential file is created in the directory. It is marked to indicate that the process will host the sector service. This mark means that the process will be responsible (jointly, along with all other such processes in the directory) for storing the sectors in the directory.
The new process starts and initializes a new directory in its local filesystem to store sector data. It knows to create this directory because it is configured to run the sector service, which creates a new storage directory if one does not already exist. As part of the creation process a new superblock is created, which is the file with inode 1 and which is not contained in any directory. This new superblock contains the generation number which identifies the sector service in this directory. The generation number is determined by contacting the sector service in the root directory, which has the knowledge and authority to assign unique numbers to every sector service.
The filesystem service in the directory will discover the sector service actor running inside the new process. When it creates new files in the directory it will store their sectors using the sector service in the process. These new files will use the generation number defined in the superblock stored in the sector service in the directory, which is different from the generation number of the directory itself.
When new processes configured to run the sector service are added to the directory, they automatically replicate sectors marked with their generation number, and use Raft to ensure the consistency of sector data.
Note that the sectors of the directory itself are actually stored by the parent sector service. Only the files created within the directory after its sector service becomes active are stored by the child sector service.
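The generation bookkeeping in the steps above can be sketched as follows. This is only an illustration: the type names (RootSectorService, SectorService, Superblock), the counter-based assignment, and the starting value are all assumptions made for the example, not the actual Blocktree types.

```rust
/// Hypothetical stand-in for the sector service in the root directory,
/// which hands out a unique generation number to every sector service.
struct RootSectorService {
    next_generation: u64,
}

impl RootSectorService {
    /// Returns a generation number that has not been handed out before.
    /// A simple counter is assumed here.
    fn assign_generation(&mut self) -> u64 {
        let generation = self.next_generation;
        self.next_generation += 1;
        generation
    }
}

/// The superblock is the file with inode 1 and is not contained in any directory.
#[derive(Debug, Clone, PartialEq)]
struct Superblock {
    inode: u64,
    generation: u64,
}

/// Hypothetical sector service running in one directory.
struct SectorService {
    superblock: Superblock,
}

impl SectorService {
    /// Initializes a fresh storage directory: asks the root for a generation
    /// number and records it in a new superblock.
    fn init(root: &mut RootSectorService) -> Self {
        SectorService {
            superblock: Superblock {
                inode: 1,
                generation: root.assign_generation(),
            },
        }
    }

    /// New files stored by this service carry the generation in its superblock.
    fn generation_for_new_file(&self) -> u64 {
        self.superblock.generation
    }
}
```

With this model, a directory created while only the parent sector service exists gets the parent's generation, while files created after the child sector service starts get the child's generation, so the two numbers differ.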

Filesystem discovery

There are four cases to consider, depending on what permissions the discovering runtime has for the file being accessed:

The discoverer hosts the sector service responsible for the file.
The discoverer hosts the filesystem service because it has a readcap for the file.
The discoverer does not host the filesystem service for the file but has read permissions for the file.
The discoverer is attempting to read the file anonymously.

In the first case, the sector service needs to discover all of the other sector service providers in its directory. Once it has connected to all of them, sectors can be reconstructed and written to the cluster. It makes sense to have the filesystem service registered in such a runtime, because this would allow all filesystem operations to happen locally (at least, they would access the local sector service; the sector service may still need to communicate with its peers in the directory when data is written). In this case the runtime needs to be able to find all of the runtimes hosting the sector service in its directory.

In the second case the runtime needs to be able to discover the correct sector service provider to connect to. It seems that it needs to find a runtime hosting the sector service contained in one of its parent directories. Once such a runtime is found, messages can be delivered to it to access the sectors of the file, and their contents will be decrypted locally.

In the third case, the runtime must locate the closest runtime hosting the filesystem service which is contained in one of the runtime's parent directories. This should be the same query as in case 2, just used for the filesystem service instead of the sector service.

In case four, the process must discover a filesystem service hosting the file. This case doesn't actually seem any different from case 3; it's just performed with no authorization attributes. So in terms of FS permissions, a file can only be accessed in this way if it allows others to read it and all of its parent directories can also be read by others. This requirement, that all parent directories can also be read by others, would be too strict for non-anonymous access. It's important to allow credentialed access to a file when a process has permission to that specific file, even if the process can't access one or more of the file's parents. This helps to keep the system flexible.

There seem to be two queries which are needed to locate the appropriate runtimes. A query is executed with respect to a scope and only considers runtimes with a given service registration.

Find all runtimes directly contained in the scope.
Find a runtime which is contained in a parent directory which is closest to the scope. Closest means that there are no relevant runtimes contained in any of the subdirectories of the directory containing the query result.
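A minimal sketch of the two queries, run against an in-memory list of runtimes rather than the real filesystem. The names (Runtime, Directory, find_all, find_closest) are hypothetical, and the sketch assumes that the closest-runtime search starts at the scope itself before walking up toward the root.

```rust
use std::collections::BTreeSet;

/// Splits "/apps/mail" into ["apps", "mail"]. The empty path is the root.
fn path(s: &str) -> Vec<String> {
    s.split('/').filter(|c| !c.is_empty()).map(String::from).collect()
}

/// Hypothetical record of a runtime: the directory directly containing it
/// and the services registered in it.
#[derive(Debug, Clone, PartialEq)]
struct Runtime {
    name: String,
    dir: Vec<String>,
    services: BTreeSet<String>,
}

fn runtime(name: &str, dir: &str, services: &[&str]) -> Runtime {
    Runtime {
        name: name.to_string(),
        dir: path(dir),
        services: services.iter().map(|s| s.to_string()).collect(),
    }
}

/// Stand-in for the information the filesystem would provide.
struct Directory {
    runtimes: Vec<Runtime>,
}

impl Directory {
    /// Query 1: all runtimes directly contained in `scope` with `service` registered.
    fn find_all(&self, scope: &[String], service: &str) -> Vec<&Runtime> {
        self.runtimes
            .iter()
            .filter(|r| r.dir.as_slice() == scope && r.services.contains(service))
            .collect()
    }

    /// Query 2: a runtime with `service` registered in the directory closest to
    /// `scope`, checking the scope first and then each parent up to the root.
    fn find_closest(&self, scope: &[String], service: &str) -> Option<&Runtime> {
        for len in (0..=scope.len()).rev() {
            let dir = &scope[..len];
            let hit = self
                .runtimes
                .iter()
                .find(|r| r.dir.as_slice() == dir && r.services.contains(service));
            if hit.is_some() {
                return hit;
            }
        }
        None
    }
}
```

Because the search walks upward one directory at a time and stops at the first hit, no directory between the result and the scope can contain a relevant runtime, which is the "closest" condition described above.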

These queries correspond to the two ways that messages can be dispatched by an actor.

There are three cases to consider when defining the security model for runtime queries:

The process has a readcap for the scope of the query.
The process has read permission for the scope of the query.
The process is issuing the query anonymously.

In the first two cases the query should be allowed. In the third case it should only be allowed if every file on the path from the scope to the root permits others to read.
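The rule above is small enough to write down directly. In this sketch the Access enum and the query_allowed function are hypothetical names, and the path from the scope to the root is reduced to a slice of flags saying whether each file on it permits others to read.

```rust
/// How the querying process is authorized for the scope of the query.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Access {
    /// The process has a readcap for the scope.
    Readcap,
    /// The process has read permission for the scope.
    ReadPermission,
    /// The query carries no credentials.
    Anonymous,
}

/// Decides whether a runtime query is allowed.
/// `others_can_read` has one entry per file on the path from the scope to
/// the root, true when that file permits others to read it.
fn query_allowed(access: Access, others_can_read: &[bool]) -> bool {
    match access {
        Access::Readcap | Access::ReadPermission => true,
        Access::Anonymous => others_can_read.iter().all(|&readable| readable),
    }
}
```

Note that only the anonymous case looks at the parent directories, which matches the point made earlier that credentialed access should not depend on the permissions of a file's parents.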

When a runtime receives a query it should use the filesystem to answer it. If, as it navigates to the scope, it encounters a directory which it is not responsible for storing, it will return a redirection to the querier with the IP address of a runtime where the query should be retried. This process repeats until the query is answered, either successfully with one or more runtimes or with an error and no runtimes.
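The redirect loop on the querier's side might look like the sketch below. The Reply enum, the resolve function, and the map standing in for the network are all hypothetical. The hop limit is also an assumption added for the example, to keep a redirect cycle from looping forever.

```rust
use std::collections::HashMap;

/// What a runtime can answer when it receives a query.
#[derive(Debug, Clone, PartialEq)]
enum Reply {
    /// The query was answered with the addresses of one or more runtimes.
    Found(Vec<String>),
    /// The runtime does not store the next directory; retry at this address.
    Redirect(String),
    /// The query was answered with an error and no runtimes.
    NotFound,
}

/// Follows redirections until the query is answered.
/// `network` maps a runtime's address to the reply it would give, standing
/// in for actually sending the query over the network.
fn resolve(
    network: &HashMap<String, Reply>,
    start: &str,
    max_hops: usize,
) -> Result<Vec<String>, String> {
    let mut addr = start.to_string();
    for _ in 0..max_hops {
        match network.get(&addr) {
            Some(Reply::Found(runtimes)) => return Ok(runtimes.clone()),
            Some(Reply::Redirect(next)) => addr = next.clone(),
            Some(Reply::NotFound) => return Err(format!("{} found no runtimes", addr)),
            None => return Err(format!("{} is unreachable", addr)),
        }
    }
    Err("too many redirects".to_string())
}
```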

Queries are issued automatically by processes as part of the message routing procedure. Each process maintains a trie keyed using message scope. It uses this trie to find the longest prefix match with the scope. The value contained in the trie is a hash table of service registrations. This allows a process to quickly determine if it already knows the correct runtime to deliver the message to. If the process does not know the correct recipient, it performs discovery using one of the queries above, with the query being determined by how the message was dispatched. If no other runtimes are known, the process uses DNS to find a runtime in the root directory, remembers the runtime in its trie, and issues the query to it. There will need to be a cache control mechanism for determining how long entries in the trie can be kept.
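A minimal sketch of the routing trie, keyed by the components of the message scope, with a hash table of service registrations at each node. The names (RouteTrie, Node, insert, lookup) are hypothetical, addresses are plain strings, and cache expiry is left out. One simplifying assumption: lookup returns the longest prefix at which the requested service is known, rather than taking the dispatch mode into account.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Node {
    /// Child nodes keyed by the next path component of the scope.
    children: HashMap<String, Node>,
    /// Service name mapped to the address of a runtime providing it.
    services: HashMap<String, String>,
}

#[derive(Default)]
struct RouteTrie {
    root: Node,
}

impl RouteTrie {
    /// Remembers that `addr` hosts `service` for the directory `scope`.
    fn insert(&mut self, scope: &[&str], service: &str, addr: &str) {
        let mut node = &mut self.root;
        for part in scope {
            node = node.children.entry(part.to_string()).or_default();
        }
        node.services.insert(service.to_string(), addr.to_string());
    }

    /// Finds the runtime registered for `service` at the longest known
    /// prefix of `scope`, or None if discovery is needed.
    fn lookup(&self, scope: &[&str], service: &str) -> Option<&str> {
        let mut node = &self.root;
        let mut best = node.services.get(service);
        for part in scope {
            match node.children.get(*part) {
                Some(child) => {
                    node = child;
                    if let Some(addr) = node.services.get(service) {
                        best = Some(addr);
                    }
                }
                None => break,
            }
        }
        best.map(String::as_str)
    }
}
```

A None result is the signal to fall back to discovery, and ultimately to the DNS lookup of a runtime in the root directory described above.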

Firewall traversal

Blocktree requires a mechanism which allows runtimes to connect to each other even if one or both of them is behind a firewall. I don't yet know how to do this in the case where both are behind a firewall, but in the case where only one is, we can handle it by having a runtime contained in a parent directory send a control plane message to the runtime which can't be reached, telling it to initiate a connection to the runtime attempting to reach it. If the runtime attempting to reach it has a public IP address, this will allow the two to connect, after which messages can be sent in either direction. This requires that at least one runtime in the root directory has a public IP address, and that a connection is maintained between a child runtime and one of its parents.
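The cases above reduce to a small decision table. In this sketch the ConnectPlan enum and the plan_connection function are hypothetical names, and "public" simply means reachable at a public IP address.

```rust
/// How a connection between two runtimes can be established.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConnectPlan {
    /// The target is reachable, so the initiator connects to it directly.
    Direct,
    /// The target is behind a firewall: a runtime in a parent directory sends
    /// it a control plane message telling it to connect back to the initiator.
    ReverseViaParent,
    /// Both runtimes are behind firewalls; no mechanism exists for this yet.
    Unsupported,
}

/// Chooses a plan from whether each side has a public IP address.
fn plan_connection(initiator_public: bool, target_public: bool) -> ConnectPlan {
    match (initiator_public, target_public) {
        (_, true) => ConnectPlan::Direct,
        (true, false) => ConnectPlan::ReverseViaParent,
        (false, false) => ConnectPlan::Unsupported,
    }
}
```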

Because the sector clusters are fully connected, we only need to send a connection request message to one member of a cluster, provided the runtimes forward these connection requests. Then, if at least one of the sector hosts in the root has a public IP and one runtime in each cluster is connected to one runtime in each of its child clusters, the message should eventually be delivered to the correct runtime.

This means that the sector hosts will form a single connected component of the connection graph.

Representation of files by the filesystem service.

My idea of using actors to own file handles has a significant drawback. If an actor which opened a file crashes, the file will remain open forever, resulting in a resource leak. An alternative would be to issue file handle structs to actors in local messages, but this will not work when the filesystem service is being accessed by a remote runtime. I could keep a table of file handles (integers) in the filesystem service, and access it similarly to how the filesystem struct is used today. This approach brings the overhead of an RwLock on the table and of searching it for a specific file on every read or write. Perhaps I could have the file actor poll its owner periodically to see if it's still alive? Then it would be able to halt if the owning actor has crashed. To get this to work I'll need to reintroduce the ability to send messages to a specific actor, and solve the issue of handling undeliverable messages. This approach has the advantage of working over the network, and it does not introduce any overhead from maintaining a table.
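The polling idea can be sketched as a small state machine, leaving out the actual message passing. Everything here is hypothetical: the FileActor type, the on_poll method, and in particular the choice to halt only after several consecutive missed polls, which is an assumption made so that a single undeliverable message does not close the file.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum FileActorState {
    /// The file is open and the owner is believed to be alive.
    Open,
    /// The owner is presumed crashed; the file has been closed.
    Halted,
}

/// Hypothetical actor that owns one open file on behalf of another actor.
struct FileActor {
    state: FileActorState,
    missed_polls: u32,
    max_missed_polls: u32,
}

impl FileActor {
    fn new(max_missed_polls: u32) -> Self {
        FileActor {
            state: FileActorState::Open,
            missed_polls: 0,
            max_missed_polls,
        }
    }

    /// Records the outcome of one periodic poll of the owner and returns the
    /// resulting state. `owner_replied` is false when the poll message was
    /// undeliverable or went unanswered.
    fn on_poll(&mut self, owner_replied: bool) -> FileActorState {
        if self.state == FileActorState::Halted {
            return self.state;
        }
        if owner_replied {
            self.missed_polls = 0;
        } else {
            self.missed_polls += 1;
            if self.missed_polls >= self.max_missed_polls {
                self.state = FileActorState::Halted;
            }
        }
        self.state
    }
}
```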
