
Ironed out some open questions about runtime network discovery.

Matthew Carr 1 year ago
parent
commit
c05ad2ed5d

+ 1 - 0
.vscode/settings.json

@@ -17,6 +17,7 @@
         "pkey",
         "readcap",
         "readcaps",
+        "runtimes",
         "writecap",
         "writecaps",
         "Xsalsa"

+ 231 - 59
doc/BlocktreeCloudPaper/BlocktreeCloudPaper.tex

@@ -28,15 +28,14 @@ the system aims to advance the status quo in the security and reliability of sof
 \end{abstract}
 
 \section{Introduction}
-% Describe paths, actors, and files. Emphasize the benefit of actors and files sharing the same
-% namespace.
+% The "Big" Picture.
 Blocktree is an attempt to extend the Unix philosophy that everything is a file
 to the entire distributed system that comprises modern IT infrastructure.
 The system is organized around a global distributed filesystem which defines security
 principals, resources, and their authorization attributes.
 This filesystem provides a language for access control that can be used to securely grant principals
 access to resources from different organizations, without the need to set up federation.
-The system provides an actor runtime for orchestrating tasks and services.
+The system provides an actor runtime for orchestrating services.
 Resources are represented by actors, and actors are grouped into operating system processes.
 Each process has its own credentials which authenticate it as a unique security principal,
 and which specify the filesystem path where the process is located.
@@ -47,6 +46,27 @@ The cryptographic mechanisms which make this possible are described in detail in
 Messages addressed to actors in a different process are forwarded over these connections,
 while messages delivered to actors in the same process are delivered with zero-copying.
 
+% Self-certifying paths and the chain of trust.
+The single global Blocktree filesystem is partitioned into disjoint domains of authority.
+Each domain is controlled by a root principal.
+As is the case for all principals,
+a root principal is authenticated by a public-private key pair,
+and is identified by a hash of its public key.
+The domain of authority for a given absolute path is determined by its first component,
+which is the identifier of the root principal who controls the domain.
+Because there is no meaning to the directory "/",
+a directory consisting of only a single component equal to a root principal's identifier is
+referred to as the root directory of that root principal.
+The root principal delegates its authority to write files to subordinate principals by issuing
+them certificates which specify the path that the authority of the subordinate is limited to.
+File data is signed for authenticity and a certificate chain is contained in its metadata.
+This certificate chain must lead back to the root principal
+and consist of certificates with correctly scoped authority in order for the file to be authentic.
+Given the path of a file and the file's contents,
+this system allows the file to be validated by anyone without the need to trust a third-party.
+Blocktree paths are referred to as self-certifying for this reason.
+
+% Persistent state provided by the filesystem.
 One of the major challenges in distributed systems is managing persistent state.
 Blocktree solves this issue using its distributed filesystem.
 Files are broken into segments called sectors.
@@ -66,7 +86,6 @@ which allows anyone to verify that their contents were written by an authorized
 Encryption can be optionally applied to sectors,
 with the system handling key management.
 The cryptographic mechanisms used to implement these protections are described in section 3.
-
 To reduce load on the sector service, and to allow the system to scale to a larger number of users,
 a peer-to-peer distribution system is implemented in the filesystem service.
 This system allows filesystem actors to download sectors from other filesystem actors
@@ -75,6 +94,7 @@ The threat of malicious actors serving bad sector data is mitigated by the stron
 protections applied to sectors.
 By using peer-to-peer distribution, the system can serve as a content delivery network.
 
+% Protocol contracts.
 One of the design goals of Blocktree is to facilitate the creation of composable distributed
 systems.
 A major challenge to building such systems is the difficulty in pinning down bugs when they
@@ -88,9 +108,10 @@ and the state transitions which occur to be specified based on the types of rece
 These contracts are used to verify protocol adherence statically and dynamically.
 This system is implemented using compile time code generation,
 making it a zero-cost abstraction.
-By freeing the developer from dealing with the numerous failure modes that occur in a communication protocol,
-they are able to focus on the functionality of their system.
+This frees the developer from dealing with the numerous failure modes that can occur in a
+communication protocol.
 
+% Implementation language and project links.
 Blocktree is implemented in the Rust programming language.
 Its source code is licensed under the GNU Affero General Public License Version 3.
 It can be downloaded at the project homepage at \url{https://blocktree.systems}.
@@ -99,7 +120,7 @@ to \url{https://gogs.delease.com/Delease/Blocktree}.
 If you have larger changes or architectural suggestions,
 please submit an issue for discussion prior to spending time implementing your idea.
 
-% Describe the remainder of the paper.
+% Outline of the rest of the paper.
 The remainder of this paper is structured as follows:
 \begin{itemize}
   \item Section 2 describes the actor runtime, service and task orchestration, and service
@@ -111,6 +132,8 @@ The remainder of this paper is structured as follows:
   \item Section 6 provides some concluding remarks.
 \end{itemize}
 
+
+
 \section{Actor Runtime}
 % Motivation for using the actor model. 
 Building scalable fault tolerant systems requires us to distribute computation over
@@ -128,39 +151,93 @@ as this will ensure low impedance with the underlying networking technology.
 % Overview of message passing interface.
 That is why Blocktree is built on the actor model
 and why its actor runtime is at the core of its architecture.
-The runtime can be used to spawn new actors, register services, and dispatch messages.
+The runtime can be used to register services and dispatch messages.
 Messages can be dispatched in two different ways: with \texttt{send} and \texttt{call}.
 A message is dispatched with the \texttt{send} method when no reply is required,
 and with \texttt{call} when exactly one is.
 The \texttt{Future} returned by \texttt{call} can be awaited to obtain the reply.
 If a timeout occurs while waiting for the reply,
-then the \texttt{Future} completes with an error.
+the \texttt{Future} completes with an error.
 The name \texttt{call} was chosen to bring to mind a remote procedure call,
 which is the primary use case this method was intended for.
 Awaiting replies to messages serves as a simple way to synchronize a distributed computation.
 
+% Description of virtual actor system.
+One of the challenges when building actor systems is supervising and managing actors' lifecycles.
+This is handled in Erlang through the use of supervision trees,
+but Blocktree takes a different approach inspired by Microsoft's Orleans framework.
+Orleans introduced the concept of virtual actors,
+which are purely logical entities that exist perpetually.
+In Orleans, one does not need to spawn actors nor worry about respawning them should they crash;
+the framework takes care of spawning an actor when a message is dispatched to it.
+This model also gives the framework the flexibility to deactivate actors when they are idle
+and to load balance actors across different computers.
+In Blocktree a similar system is used,
+which is possible because messages are only addressed to services.
+The Blocktree runtime takes care of routing these messages to the appropriate actors,
+spawning them if needed.
+
+% The runtime is implemented using tokio.
+The actor runtime is currently implemented using the Rust asynchronous runtime tokio.
+Actors are spawned as tasks in the tokio runtime,
+and multi-producer single consumer channels are used for message delivery.
+Because actors are just tasks,
+they can do anything a task can do,
+including awaiting other futures.
+Because of this, there is no need for the actor runtime to support short-lived worker tasks,
+as any such use-case can be accomplished by awaiting a set of \texttt{Future}s.
+This allows the runtime to focus on providing support for services.
+Using tokio also means that we have access to a high performance multi-threaded runtime with
+evented IO.
+This asynchronous programming model ensures that resources are efficiently utilized,
+and is ideal for a system focused on orchestrating services which may be used by many clients.
+
 % Delivering messages over the network.
 Messages can be forwarded between actor runtimes using a secure transport layer called
 \texttt{bttp}.
 Messages are addressed using \emph{actor names}.
-An actor name is a pair consisting of the filesystem path of the runtime
-and a UUID specifying an actor in that runtime.
+An actor name consists of the following fields:
+\begin{enumerate}
+  \item \texttt{service}: The path identifying the receiving service.
+  \item \texttt{scope}: A filesystem path used to specify the intended recipient.
+  \item \texttt{rootward}: A flag describing whether message delivery is attempted towards or
+    away from the root of the filesystem tree. A value of
+    \texttt{false} indicates that the message is intended for a runtime directly contained in the
+    scope. A value of \texttt{true} indicates that the message is intended for a runtime contained
+    in a parent directory of the scope and should be delivered to a runtime which has the requested
+    service registered and is closest to the scope.
+  \item \texttt{id}: An identifier for a specific service provider.
+\end{enumerate}
+The ID can be a \texttt{Uuid} or a \texttt{String}.
+It is treated as an opaque identifier by the runtime,
+but a service is free to associate additional meaning to it.
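As a rough illustration of the fields listed above, an actor name could be modeled in Rust along the following lines. This is only a sketch: the type names `ActorName` and `ActorId`, the use of `String` for paths, and the raw 16-byte array standing in for a `Uuid` are assumptions made to keep the example dependency-free. They are not the definitions used in the Blocktree source.

```rust
/// Hypothetical identifier for a specific service provider.
/// The runtime treats it as opaque; a service may attach meaning to it.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ActorId {
    /// Stand-in for a UUID, stored as raw bytes to avoid external crates.
    Uuid([u8; 16]),
    String(String),
}

/// Hypothetical model of an actor name with the four fields described above.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ActorName {
    /// Path identifying the receiving service.
    service: String,
    /// Filesystem path used to specify the intended recipient.
    scope: String,
    /// `false`: deliver to a runtime directly contained in `scope`.
    /// `true`: deliver to the closest runtime found in a parent directory.
    rootward: bool,
    /// Identifier for a specific service provider.
    id: ActorId,
}

impl ActorName {
    fn new(service: &str, scope: &str, rootward: bool, id: ActorId) -> Self {
        ActorName {
            service: service.to_owned(),
            scope: scope.to_owned(),
            rootward,
            id,
        }
    }
}
```

Deriving `Eq` and `Hash` lets a name be used as a key, for example when a runtime looks up the provider associated with an ID.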
 Every message has a header containing the name of the sender and receiver.
 The transport is implemented using the QUIC protocol, which integrates TLS for security.
-The TLS handshake between runtimes is performed using mutual TLS authentication.
-This handshake cryptographically verifies the credentials of each runtime.
-These credentials contain the filesystem path where each runtime is located,
-which ensures that messages addressed to a specific path will only be delivered to the runtime
-at that path.
+A \texttt{bttp} client may connect anonymously or using credentials.
+If an anonymous connection is attempted,
+the client has no authorization attributes associated with it.
+Only runtimes which grant others the execute permission allow connections from such clients.
+If these permissions are not granted in the runtime's file,
+anonymous connections are rejected.
+When a client connects with credentials,
+mutual TLS authentication is performed as part of the connection handshake,
+which cryptographically verifies the credentials of each runtime.
+These credentials contain the filesystem paths where each runtime is located,
+which ensures that messages addressed to a specific path will only be delivered to that path.
+The \texttt{bttp} server is always authenticated during the handshake,
+even when the client is connecting anonymously.
 Because QUIC supports the concurrent use of many different streams,
 it serves as an ideal transport for a message oriented system.
 \texttt{bttp} uses different streams for independent messages,
-ensuring that head of line blocking will not occur.
-However, replies are sent over the same stream as the original message.
+ensuring that head of line blocking does not occur.
+The same stream is used for sending the reply to a message dispatched with \texttt{call}.
+Once a connection is established,
+messages may flow in both directions (provided both runtimes have execute permissions for the other),
+regardless of which runtime is acting as the client or the server.
 
 % Delivering messages locally.
 When a message is sent between actors in the same runtime it is delivered into the queue of the recipient without any copying,
-while ensuring immutability (move semantics).
+while ensuring immutability (i.e. move semantics).
 This is possible thanks to the Rust ownership system,
 because the message sender gives ownership to the runtime when it dispatches the message,
 and the runtime gives ownership to the recipient when it delivers the message.
@@ -188,7 +265,7 @@ which helps to reduce the performance penalty of the actor runtime over directly
 Security is enhanced by this decision because it forces the user to separate actors with different
 security requirements into different operating system processes,
 which ensures all of the process isolation machinery in the operating system will be used to
-isolate the different security domains.
+isolate them.
 
 % Representing resources as actors.
 As in other actor systems, it is convenient to represent resources in Blocktree using actors.
@@ -198,43 +275,45 @@ and for resources to be shared by many actors.
 For instance, a Point-to-Point Protocol connection could be owned by an actor.
 This actor could forward traffic delivered to it in messages over this connection.
 The set of actors which are able to access the connection is controlled by setting the filesystem
-permissions on the file for the runtime executing the actor with the connection.
-
-% Service discovery.
-In addition to spawning actors, the runtime can also be used to register actors as service
-providers.
-A service is identified by a filesystem path.
-One or more actors may be register as providing a service.
-Services are resolved to actor names by the runtime.
-The service resolution method takes the path of a service and a scope path.
-The scope path defines the filesystem path where service resolution will begin.
-Resolution produces the name of an actor which is registered in a runtime which is "closest" to the
-scope, or \texttt{None} if no service provider can be found.
-Here "closest" means the that it is the name returned by the following recursive procedure:
-\begin{enumerate}
-  \item If the scope is the path of a runtime, and there are providers of the service registered in the
-    runtime, then one of their names is returned. Otherwise, service resolution is retried using a
-    new scope which is obtained by removing the last path component of the current scope.
-  \item If a directory is specified, then all of the runtimes in the directory are checked for
-    registered service providers, and the first one which is found has its name is returned.
-    Otherwise, service resolution is retried using a new scope which is obtained by removing the
-    last path component of the current scope.
-  \item If the scope is the empty string, then \texttt{None} is returned.
-\end{enumerate}
-When there are multiple names which could be returned as providers for a given service,
-the one which is actually returned is unspecified,
+permissions on the file for the runtime executing the actor owning the connection.
+
+% Message routing to services.
+A service is identified by a Blocktree path.
+Only one service implementation can be registered in a particular runtime,
+though this implementation may be used to spawn many actors as providers for the service,
+each associated with a different ID.
+The runtime spawns a new actor when it finds no service provider associated with the ID in the
+message it is delivering.
+Some services may only have one service provider in a given runtime,
+as is the case for the sector and filesystem services.
+Services are reactive:
+they don't do anything until they receive a message to process.
+The \texttt{scope} and \texttt{rootward} fields in an actor name specify the set of runtimes to
+which a message may be delivered.
+They allow the sender to express their intended recipient,
+while still affording enough flexibility to the runtime to route messages as needed.
+If \texttt{rootward} is \texttt{false},
+the message is delivered to a service provider in a runtime that is directly contained in
+\texttt{scope}.
+If \texttt{rootward} is \texttt{true},
+the parent directories of scope are searched,
+working towards the root of the filesystem tree,
+and the message is delivered to the first provider of \texttt{service} which is found.
+When there are multiple service providers to which a given message could be delivered,
+the one to which it is actually delivered is unspecified,
 which allows the runtime to balance load.
-In order to contact other runtimes and query their service registrations,
+Delivery will occur for at most one recipient,
+even in the case that there are multiple potential recipients.
+In order to contact other runtimes and deliver messages to them,
 their IP addresses need to be known.
-To enable this a file with the runtime's IP address is maintained in the same directory as the
+This is achieved by maintaining a file with a runtime's IP address in the same directory as the
 runtime.
 The runtime is granted write permissions on the file,
-and it is updated by the transport layer when it begin listening on a new endpoint.
+and it is updated by \texttt{bttp} when it begins listening on a new endpoint.
 The services which are allowed to be registered in a given runtime are specified in the runtime's
 file.
 The runtime reads this list and uses it to deny service registrations for unauthorized services.
 The list is also read by other runtimes when they are searching a directory for service providers.
-Only runtimes which are authorized to run the service will be searched for service providers.
 
 % The sector and filesystem service.
 The filesystem is itself implemented as a service.
@@ -252,6 +331,84 @@ and thus maintaining the persistent state of the system.
 It stores sector data in the local filesystem of each computer on which it is registered.
 The details of how this is accomplished are deferred to the next section.
 
+% Runtime network discovery.
+While it is possible to resolve runtime paths to IP addresses when the filesystem is available,
+a different mechanism is needed to allow the filesystem and sector services to discover service
+providers.
+To facilitate this,
+runtimes are able to query one another to learn about other runtimes.
+Because queries are intended to facilitate message delivery,
+the query fields and their meanings mirror those used for addressing messages:
+\begin{enumerate}
+  \item \texttt{service} The path of the service whose providers are sought.
+    Only runtimes with this service registered will be returned.
+  \item \texttt{scope} The filesystem path relative to which the query will be processed.
+  \item \texttt{rootward} Indicates if the query should search for runtimes from \texttt{scope}
+    toward the root.
+\end{enumerate}
+The semantics of \texttt{scope} and \texttt{rootward} in a query are identical to their use in an
+actor name.
+As long as at least one other runtime is known,
+a query can be issued to learn of more runtimes.
+A runtime which receives a query may not be able to answer it directly.
+If it cannot,
+it returns the IP address of the next runtime to which the query should be sent.
+In order to bootstrap the discovery process,
+another mechanism is needed to find the first peer to query.
+There were several possibilities explored for doing this.
+One way is to use a blockchain to store the IP addresses of the runtimes hosting the sector service
+in the root directory.
+As long as these runtimes could be located,
+then all others could be found using the filesystem.
+This idea may be worth revisiting in the future,
+but the author wanted to avoid the complexity of implementing a new proof of work blockchain.
+Another idea was to use multicast link-local addressing to discover other runtimes,
+similar to how mDNS operates.
+This approach has several advantages.
+It avoids any dependency on centralized internet infrastructure
+and keeps network load local to the segment on which the runtimes are connected.
+However, it will not work over a wide area network,
+making it unsuitable for the general case.
+Instead, the design which was decided on was to use DNS to resolve a fully qualified domain name
+(FQDN) derived from the root principal's identifier.
+This FQDN is expected to resolve to the public IP addresses of the runtimes hosting the
+sector service in the root directory of the root principal.
+Each process is configured with a search domain which is used as a suffix of the FQDN.
+The leading labels in the FQDN are computed by base32 encoding a hash of the root
+principal's public key.
+If the encoded string is longer than 63 bytes (the limit for each label in a hostname),
+it is separated into the fewest number of labels possible,
+working from left to right along the string.
+A dot followed by the search domain is concatenated onto the end of this string to form the FQDN.
+This method has the advantages of being simple to implement
+and allowing runtimes to discover each other over the internet.
+Implementing this system would be facilitated by hosting DNS servers in actors in the same
+runtimes as the root sector service providers.
+Then, A or AAAA records could be served which point to these runtimes.
+These runtimes would also need to be configured with static IP addresses,
+and the NS records for the search domain would need to point to them.
+Of course it is also possible to build such a system without hosting DNS inside of Blocktree.
+The downside of using DNS is that it couples Blocktree with a centralized,
+albeit distributed, system.
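The FQDN derivation described above can be sketched in Rust as follows. Several details are assumptions made for illustration: the hash of the root principal's public key is supplied as raw bytes (the hash function is not chosen here), the base32 alphabet is the RFC 4648 one in lowercase without padding, and the function names `base32_encode` and `root_fqdn` are hypothetical.

```rust
/// RFC 4648 base32 alphabet, lowercased (an assumption; DNS names are case-insensitive).
const ALPHABET: &[u8; 32] = b"abcdefghijklmnopqrstuvwxyz234567";

/// Maximum length in bytes of a single label in a hostname.
const MAX_LABEL_LEN: usize = 63;

/// Encodes `data` as unpadded base32.
fn base32_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() * 8 + 4) / 5);
    let mut buffer: u32 = 0; // holds bits not yet emitted
    let mut bits: u32 = 0; // number of valid bits in `buffer`
    for &byte in data {
        buffer = (buffer << 8) | u32::from(byte);
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            let index = (buffer >> bits) & 0x1f;
            out.push(ALPHABET[index as usize] as char);
        }
        // Discard the bits that were already emitted.
        buffer &= (1 << bits) - 1;
    }
    if bits > 0 {
        // Pad the final group with zero bits on the right.
        let index = (buffer << (5 - bits)) & 0x1f;
        out.push(ALPHABET[index as usize] as char);
    }
    out
}

/// Builds the FQDN for a root principal from the hash of its public key.
/// The encoded hash is cut into labels of at most 63 bytes, working from
/// left to right, and the search domain is appended after a dot.
fn root_fqdn(key_hash: &[u8], search_domain: &str) -> String {
    let encoded = base32_encode(key_hash);
    let labels: Vec<&str> = encoded
        .as_bytes()
        .chunks(MAX_LABEL_LEN)
        .map(|chunk| std::str::from_utf8(chunk).expect("base32 output is ASCII"))
        .collect();
    format!("{}.{}", labels.join("."), search_domain)
}
```

With a 32-byte hash the encoded string is 52 characters long and fits in a single label. A 64-byte hash encodes to 103 characters and is split into labels of 63 and 40 characters, so `root_fqdn(&hash, "example.com")` would then produce a name with four labels (`example.com` being a placeholder search domain).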
+
+% Security model for queries.
+To allow runtimes which are not permitted to execute the root directory to query for other runtimes,
+authorization logic which is specific to queries is needed.
+If a process is connected with credentials
+and the path in the credentials contains the scope of the query,
+the query is permitted.
+If a process is connected anonymously,
+its query will only be answered if the query scope
+and all of its parent directories,
+grant others the execute permission.
+Queries from authenticated processes can be authorized using only the information in the query,
+but anonymous queries require knowledge of filesystem permissions,
+some of which may not be known to the answering runtime.
+When authorizing an anonymous query,
+an answering runtime should check that the execute permission is granted on all directories
+that it is responsible for storing.
+If all these checks pass, it should forward the querier to the next runtime as usual.
+
 % Overview of protocol contracts and runtime checking of protocol adherence.
 To facilitate the creation of composable systems,
 a protocol contract checking system based on session types has been designed.
@@ -271,8 +428,9 @@ unexpected type is received,
 eliminating the need for ad-hoc error handling code to be written by application developers.
 
 % Example of a protocol contract.
-Let us explore the use of this system through a simple example.
-Consider the HTTP/1.1 protocol.
+% TODO: I don't find this example very compelling. It would be more impressive to show a pub-sub
+% protocol, that would look cool.
+Let us explore the use of this system through a simple example using the HTTP/1.1 protocol.
 It is a state-less client-server protocol,
 essentially just an RPC from client to server.
 We can model this for the contract checker by defining a trait representing the protocol:
@@ -281,9 +439,9 @@ We can model this in for the contract checker by defining a trait representing t
     type Server: ServerInit;
   }
 \end{verbatim}
-The job of this top-level trait is to specify the initial state of every party to the communications
-protocol.
-In this case were only modeling the state of the server,
+The purpose of this top-level trait is to specify the initial state of every party to the
+communications protocol.
+In this case we're only modeling the state of the server,
 as the client will just \texttt{call} a method on the server.
 The initial state for the server is defined as follows:
 \begin{verbatim}
@@ -293,8 +451,8 @@ The initial state for the server is defined as follows:
     fn handle_activate(self, msg: Activate) -> Self::Fut;
   }
 \end{verbatim}
-The \texttt{Activate} is a message sent by the generated code to allow the actor access to the
-runtime and its ID.
+\texttt{Activate} is a message sent by the generated code to allow the actor access to the
+runtime and the actor's ID.
 It is defined as follows:
 \begin{verbatim}
   pub struct Activate {
@@ -341,11 +499,13 @@ as is commonly done today,
 but it can also be used to encapsulate traffic to and from the container in Blocktree messages.
 These messages are routed to other actors based on the configuration of the supervisor.
 This essentially creates a VPN for containers,
-ensuring that regardless of the security hardness of their communications,
+ensuring that regardless of well secured their communication is,
 they will be safe to communicate over any network.
 This network encapsulation system could be used in other actors as well,
 allowing a lightweight and secure VPN system to be built.
 
+
+
 \section{Filesystem}
 % The division of responsibilities between the sector and filesystem services.
 The responsibility for storing data in the system is shared between the filesystem and sector
@@ -421,6 +581,8 @@ Note that the superblock is not contained in any directory and cannot be accesse
 outside of the sector service.
 The superblock also contains information used to assign inodes when files are created.
 
+% Sector service discovery. Paths.
+
 % The filesystem service is responsible for cryptographic operations. Client-side encryption.
 The sector service is relied upon by the filesystem service to read and write sectors.
 Filesystem service providers communicate with the sector service to open files, read and write
@@ -449,10 +611,14 @@ data to be different from the computer encrypting it.
 This approach allows client-side encryption to be done on more capable computers
 and for this task to be delegated to a storage server on low powered devices.
 
-% Sector service discovery. Paths.
-
 % Description of how the filesystem layer: opens a file, reads, and writes.
 
+% Peer-to-peer data distribution in the filesystem service.
+
+% Streaming replication.
+
+
+
 \section{Cryptography}
 % The underlying trust model: self-certifying paths.
 
@@ -466,6 +632,8 @@ and for this task to be delegated to a storage server on low powered devices.
 
 % Requesting and issuing credentials. Multicast link-local network discovery.
 
+
+
 \section{Examples}
 This section contains examples of systems built using Blocktree. The hope is to illustrate how this
 platform can be used to implement existing applications more easily and to make it possible to
@@ -478,9 +646,13 @@ implement systems which are currently out of reach.
 % Describe a blocktree which runs a cluster of webservers, a manufacturing process, a warehouse
 % inventory management system, and an order fulfillment system.
 
+\subsection{A smart home.}
+
 \subsection{A realtime geospatial environment.}
 % Explain my vision of the metaverse.
 
+
+
 \section{Conclusion}
 % Blocktree serves as the basis for building a cloud-level distributed operating system.
 

+ 108 - 1
doc/BlocktreeCloudPaper/notes.md

@@ -31,4 +31,111 @@
    consistency of sector data.
 6. Note that the sectors of the directory itself are actually stored by the parent sector service.
    It is just the files created within it which are created after the sector
-   service in the directory becomes active which are stored by the child sector service.
+   service in the directory becomes active which are stored by the child sector service.
+
+## Filesystem discovery
+There are four cases to consider, depending on what permissions the discovering runtime has for the
+file being accessed:
+1. The discoverer hosts the sector service responsible for the file.
+2. The discoverer hosts the filesystem service because it has a readcap for the file.
+3. The discoverer does not host the filesystem service for the file but has read permissions for the
+   file.
+4. The discoverer is attempting to read the file anonymously.
+
+In the first case, the sector service needs to discover all of the other sector service providers
+in its directory. Once it has connected to all of them, sectors can be reconstructed and written
+to the cluster.
+It makes sense to have the filesystem service registered in such a runtime,
+because this would allow all filesystem operations to happen locally (at least it would access the
+local sector service, the sector service may need to communicate with its peers in the directory
+when data is written).
+In this case the runtime needs to be able to find all of the runtimes hosting the sector service
+in its directory.
+
+In the second case the runtime needs to be able to discover the correct sector service provider to
+connect to.
+It seems that it needs to find a runtime hosting the sector service contained in one of its parent
+directories.
+Once such a runtime is found, messages can be delivered to it to access the sectors of the file,
+and their contents will be decrypted locally.
+
+In the third case,
+the runtime must locate the closest runtime hosting the filesystem service which is contained in one
+of the runtime's parent directories.
+This should be the same query as in case 2, just used for the filesystem service instead of the
+sector service.
+
+In case four, the process must discover a filesystem service hosting the file. This case
+actually doesn't seem any different from case 3, it's just performed with no authorization
+attributes.
+So in terms of FS permissions, only files which allow others to read them,
+and all of whose parent directories can be read by others, can be accessed in this way.
+The requirement that all parent directories can also be read by others
+would be too strict for non-anonymous access.
+It's important to allow credentialed access to a file when a process has permission to that
+specific file, even if the process can't access one or more of the file's parents.
+This helps to keep the system flexible.
+
+There seem to be two queries which are needed to locate the appropriate runtimes. A query is
+executed with respect to a scope and only considers runtimes with a given service registration.
+1. Find all runtimes directly contained in the scope.
+2. Find a runtime which is contained in a parent directory which is closest to the scope.
+   Closest means that there are no relevant runtimes contained in any of the subdirectories
+   of the directory containing the query result.
+
+These queries correspond to the two ways that messages can be dispatched by an actor.
+
+There are three cases to consider when defining the security model for runtime queries:
+1. The process has a readcap for the scope of the query.
+2. The process has read permission for the scope of the query.
+3. The process is issuing the query anonymously.
+
+In the first two cases the query should be allowed.
+In the third case it should only be allowed if every file on the path from the scope to the root
+permits others to read.
+
+When a runtime receives a query it should use the filesystem to answer it.
+If, as it navigates to the scope, it encounters a directory which it is not responsible for
+storing,
+it will return a redirection to the querier with the IP address of a runtime where the
+query should be retried.
+This process repeats until the query is answered,
+either successfully with one or more runtimes or with an error and no runtimes.
+
+Queries are issued automatically by processes as part of the message routing procedure.
+Each process maintains a trie keyed using message scope.
+It uses this trie to find the longest prefix match with the scope.
+The value contained in the trie is a hash table of service registrations.
+This allows a process to quickly determine if it already knows the correct runtime to deliver the
+message to.
+If the process does not know the correct recipient,
+it performs discovery using one of the queries above,
+with the query being determined by how the message was dispatched.
+If no other runtimes are known,
+the process uses DNS to find a runtime in the root directory,
+remembers the runtime in its trie,
+and issues the query to it.
+There will need to be a cache control mechanism for determining how long entries in the trie can
+be kept.
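
The scope-keyed trie described above could look roughly like the following Rust sketch. All names here (`ScopeTrie`, `Node`, `Registrations`) are hypothetical, and mapping each service path to a single `SocketAddr` is a simplifying assumption. Cache expiry is left out because its design is still open.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical value type: service path -> address of a runtime known to
/// have that service registered.
type Registrations = HashMap<String, SocketAddr>;

#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    registrations: Option<Registrations>,
}

/// Trie keyed by the components of a message scope.
#[derive(Default)]
struct ScopeTrie {
    root: Node,
}

/// Splits a path such as "/a/b" into its non-empty components.
fn components(path: &str) -> impl Iterator<Item = &str> {
    path.split('/').filter(|c| !c.is_empty())
}

impl ScopeTrie {
    /// Remembers that a runtime at `addr`, located at `scope`, has `service` registered.
    fn insert(&mut self, scope: &str, service: &str, addr: SocketAddr) {
        let mut node = &mut self.root;
        for component in components(scope) {
            node = node.children.entry(component.to_owned()).or_default();
        }
        node.registrations
            .get_or_insert_with(HashMap::new)
            .insert(service.to_owned(), addr);
    }

    /// Finds the longest prefix of `scope` that has registrations recorded.
    /// Returns the number of matched components and the registrations found there.
    fn longest_prefix(&self, scope: &str) -> Option<(usize, &Registrations)> {
        let mut node = &self.root;
        let mut best = node.registrations.as_ref().map(|r| (0, r));
        for (depth, component) in components(scope).enumerate() {
            match node.children.get(component) {
                Some(child) => node = child,
                None => break,
            }
            if let Some(regs) = node.registrations.as_ref() {
                best = Some((depth + 1, regs));
            }
        }
        best
    }
}
```

A `None` result from `longest_prefix`, or a result whose registrations lack the requested service, is the point at which a process would fall back to issuing a discovery query.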
+
+## Firewall traversal
+Blocktree requires a mechanism which allows runtimes to connect to each other even if one or both
+of them is behind a firewall.
+I don't yet know how to do this in the case where both are behind a firewall,
+but in the case where only a single one is,
+we can handle it by having a runtime contained in a parent directory send a control plane message
+to the runtime which can't be reached telling it to initiate a connection to the runtime attempting
+to reach it.
+If the runtime that initiated the connection has a public IP address,
+this will allow the two to connect,
+after which messages can be sent in either direction.
+This requires that at least one runtime in the root directory has a public IP address,
+and that a connection is maintained between a child runtime and one of its parents.
+
+Because the sector clusters are fully connected, we only need to send a connection request message
+to one of them, provided the runtime forwards these connection requests.
+Then, if at least one of the sector hosts in the root has a public IP
+and one runtime in each cluster is connected to one runtime in each of its child clusters,
+the message should eventually be delivered to the correct runtime.
+
+This means that the sector hosts will form a single connected component of the connection graph.