
\documentclass{article}
\usepackage{amsfonts,amssymb,amsmath}
\usepackage[scale=0.75]{geometry}
\usepackage{hyperref}
\usepackage{biblatex}
\bibliography{../citations.bib}
\title{Blocktree \\
\large A platform for distributed computing.}
\author{Matthew Carr}
\date{May 28, 2022}
\begin{document}
\maketitle
\begin{abstract}
Blocktree is a platform which aims to make developing reliable distributed systems easy and safe.
It does this by defining a format for data stored in the system, a mechanism for replicating
that data, and a programming interface for accessing it. It aims to solve the difficult problems
of cryptographic key management, consensus, and global addressing so that user code can focus on
solving problems germane to their application.
\end{abstract}
\section{Introduction}
The online services that users currently have access to are incredible. They handle all
the details of backing up user data and implementing access controls to facilitate
safe sharing. However, because these are closed systems, users are forced to trust that
the operators are benevolent, and they lack any real way of ensuring that the access
controls they prescribe will actually be enforced. There have been several systems proposed
as alternatives to the conventional model, but these systems suffer from their own shortcomings.
They either assume the need for cloud storage providers (Blockstack \cite{blockstack}) or implement
all operations using a global blockchain, limiting performance (Filecoin \cite{filecoin}).
Blocktree takes a different approach.

The idea behind blocktree is to organize a user's computers into a cooperative unit, called a
blocktree. The user is said to own the blocktree, and they wield sovereign authority over it.
The artifact granting them this authority is the root private key of the blocktree. Measures for protecting
this key and delegating its authority are important design considerations of the system.
The owners of blocktrees are encouraged to collaborate with each other to replicate data by
means of a cryptocurrency known as blockcoin. The blockchain implementing this cryptocurrency
is the source of global state for the system, and allows for the creation of global paths.

All data stored in blocktree is contained in units called blocks. As the name would suggest, the blocks
in a blocktree form a tree. Each block has a path corresponding to its location in the tree. The first
component of a fully qualified blocktree path is the fingerprint of the root public key of the
blocktree. Thus a blocktree path can globally specify a block. If a block is not a leaf,
then it is called a directory, and the data it contains is managed by the system.
This information includes the list of blocks which are children of the directory as well as the
list of nodes which are attached to the blocktree at this directory. In addition
to its payload of data,
each block has a header containing cryptographic access control mechanisms. These mechanisms ensure
that only authorized users can read, and optionally write, the block.

Users and nodes in the blocktree system are identified by hashes of their public keys. These hashes
are referred to as principals, and they are used for setting access control policy.

This paper is intended to be a short introduction to the ideas of blocktree. A book is planned which
will specify the system in greater detail. In keeping with the agile software methodology, this
book is being written concurrently with an open source implementation of the system.
The remainder of this paper is organized as follows:
\begin{itemize}
\item A description of the operations of a single blocktree.
\item The definition of a blockchain which provides global state and links individual blocktrees
together.
\item The programming interface for interacting with blocktrees and sending messages.
\item An exploration of applications that could be written using this platform.
\item Conclusion.
\end{itemize}
\section{An Individual Blocktree}
The atomic unit of data storage, confidentiality and authenticity is the block. A
block contains a payload of data. Confidentiality of this data is achieved by encrypting it using
a symmetric cipher with a random key. This random key is known as the block key.
The block key can be encapsulated using the public key of a principal who is to be given access.
The resulting ciphertext is stored in the header of the block. Thus
the person possessing the corresponding private key will be able to access the contents of
the block. Blocks are arranged into trees, and the parent block also has a block key.
The child's block key is always encapsulated using the parent's key and stored in the block
header. This ensures that when a principal is given read access to a block, it automatically has
access to every child of that block. The encapsulated block key is known as a read capability,
or readcap, as it grants the holder the ability to read the block. In the case of the root block
there is no parent block key to issue a readcap to. Instead, the root block always contains a
readcap for the root key.
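To make the layering of keys concrete, the sketch below models a block header holding readcaps.
It is written in Python purely for illustration: the names and the stand-in for public-key
encapsulation (a hash-based keystream) are assumptions of the sketch, not part of the Blocktree
specification.
\begin{verbatim}
# Illustrative sketch only: models how a block key is encapsulated for each
# principal (and under the parent's block key) to form readcaps. A real
# implementation would use an actual public-key scheme; here encapsulation
# is faked with XOR against a hashed keystream so the sketch is self-contained.
import os
import hashlib

def _keystream(secret: bytes, length: int) -> bytes:
    # Placeholder "encryption": stretch the secret with SHA-256 and XOR.
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(secret + counter.to_bytes(4, "big")).digest()
        counter += 1
    return stream[:length]

def encapsulate(recipient_key: bytes, block_key: bytes) -> bytes:
    """Stand-in for public-key encapsulation of the block key."""
    return bytes(a ^ b for a, b in
                 zip(block_key, _keystream(recipient_key, len(block_key))))

def open_readcap(recipient_key: bytes, readcap: bytes) -> bytes:
    """Stand-in for decapsulation with the recipient's private key."""
    return bytes(a ^ b for a, b in
                 zip(readcap, _keystream(recipient_key, len(readcap))))

class Block:
    def __init__(self, path, parent=None):
        self.path = path
        self.block_key = os.urandom(32)   # random symmetric key for the payload
        self.readcaps = {}                # principal -> encapsulated block key
        if parent is not None:
            # A child's block key is always encapsulated under the parent's
            # block key, so read access to the parent extends to the child.
            self.readcaps["parent"] = encapsulate(parent.block_key, self.block_key)

    def grant_read(self, principal: str, principal_key: bytes):
        self.readcaps[principal] = encapsulate(principal_key, self.block_key)

# Example: the root block carries a readcap for the root key itself.
root_key = os.urandom(32)
root = Block("/", parent=None)
root.grant_read("root", root_key)
child = Block("/docs", parent=root)
assert open_readcap(root_key, root.readcaps["root"]) == root.block_key
assert open_readcap(root.block_key, child.readcaps["parent"]) == child.block_key
\end{verbatim}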
A note on visualizing a blocktree: in this paper I will describe the root block as being at the
bottom of the tree, and describe the descendants of a block as being above it. The reader may
recognize this as the ``trees grow up'' model of formal logic, and it was chosen because such trees
appear visually similar to real world trees when rendered in a virtual environment.

Authenticity guarantees are provided using a digital signature scheme. In order to change the
contents of a block a data structure called a write capability, or writecap
\footnote{The names readcap and writecap were taken from the Tahoe Least-Authority Filesystem
\cite{tahoe}. The access control mechanism described in the Tahoe system heavily influenced the
design of Blocktree.},
is needed. A
writecap is approximately an X.509 certificate chain. A writecap contains the following data:
\begin{itemize}
\item The path the writecap can be used under.
\item The principal that the writecap was issued to.
\item The timestamp when the writecap expires.
\item The public key of the principal who issued the writecap.
\item A digital signature produced by the private key corresponding to the above public key.
\item Optionally, the next writecap in the chain.
\end{itemize}
The last item is only excluded in the case of a self-signed writecap, i.e. one that was signed by
the same principal it was issued to. A writecap is considered valid for use in a block if all
of the following conditions are met:
\begin{itemize}
\item The signature on every writecap in the chain is valid.
\item The signing principal matches the principal the next writecap was issued to for every
writecap in the chain.
\item The path of the block is contained in the path of every writecap in the chain.
\item The current timestamp is strictly less than the expiration of all the writecaps in the
chain.
\item The principal corresponding to the public key used to sign the last writecap in the chain
is the owner of the blocktree.
\end{itemize}
The intuition behind these rules is that a writecap is only valid if there is a chain of trust
that leads back to the owner of the blocktree. The owner may delegate their trust to any number
of intermediaries by issuing them writecaps. These writecaps are scoped based on the path
specified when they are issued. These intermediaries can then delegate this trust as well.
A block is considered valid if it contains a valid writecap, it was signed using the key
corresponding to the first writecap's public key, and this signature is valid.
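These validity rules translate directly into a short checking procedure. The Python sketch below
applies each rule in turn; the field names and the placeholder signature check are illustrative
only, not the specified serialization.
\begin{verbatim}
# Illustrative sketch of the writecap validity rules. Field names and the
# signature check are placeholders; real writecaps carry actual digital
# signatures over a defined serialization.
from dataclasses import dataclass
from typing import Optional
import time

@dataclass
class Writecap:
    path: str                      # path the writecap is valid under
    issued_to: str                 # principal the writecap was issued to
    expires: float                 # expiration timestamp (seconds)
    issuer_key: str                # public key (principal) of the issuer
    signature: str                 # placeholder signature
    next: Optional["Writecap"]     # next writecap in the chain, or None

def signature_ok(cap: Writecap) -> bool:
    # Placeholder: a real implementation verifies cap.signature against a
    # canonical serialization of the other fields using cap.issuer_key.
    return cap.signature == f"signed-by:{cap.issuer_key}"

def path_contains(outer: str, inner: str) -> bool:
    return inner == outer or inner.startswith(outer.rstrip("/") + "/")

def writecap_valid(cap: Writecap, block_path: str, tree_owner: str,
                   now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    while True:
        if not signature_ok(cap):
            return False                        # rule 1: every signature valid
        if not path_contains(cap.path, block_path):
            return False                        # rule 3: block path contained
        if now >= cap.expires:
            return False                        # rule 4: not yet expired
        if cap.next is None:
            # rule 5: the last (self-signed) writecap must come from the owner
            return cap.issuer_key == tree_owner
        if cap.issuer_key != cap.next.issued_to:
            return False                        # rule 2: issuer matches next holder
        cap = cap.next

# A two-link chain: the owner delegates "/photos" to a node.
root_cap = Writecap("/", "root-pub", 2e9, "root-pub", "signed-by:root-pub", None)
node_cap = Writecap("/photos", "node-pub", 2e9, "root-pub",
                    "signed-by:root-pub", root_cap)
assert writecap_valid(node_cap, "/photos/2022", "root-pub", now=1e9)
\end{verbatim}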
Blocks are used for more than just organizing data; they also organize computation. A program
participating in a blocktree is referred to as a node. Multiple nodes may be run on
a single computer. Every node is attached to the blocktree at a specific path. This information
is recorded in the directory where the node is attached. A node is responsible for the storage of
the directory where it is attached and all of the blocks that are recursively contained within it.
If there is a child node attached to a subdirectory contained in the directory the node is
responsible for, then the child node is responsible for the subdirectory.
In this way data storage can be delegated, allowing the system to scale. When more than one
node is attached to the same directory they form a cluster.
Each node in the cluster contains a copy of the data that the cluster is responsible for. They
maintain consistency of this data by running the Raft \cite{raft} consensus protocol.
When a new blocktree is created a node generates a key pair to serve as the root keys.
It is imperative for the security of the system that the root private key is protected, and it
is highly recommended that it be stored in a Trusted Platform Module (TPM) \cite{tpm} and that
the TPM
be configured to disallow unauthenticated use of this key. The node then generates its own key pair
and uses the root private key to issue itself a writecap for the root of the tree. Once it has
this writecap, it creates the root block and generates a block key for it. A readcap is added to
this block for the root public key and the node's public key. Additional
cryptographic operations are performed using the node's key pair, and only when a new writecap
needs to be created for an additional root node is the root private key used.
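The order of operations when a blocktree is created can be sketched as follows. Key generation,
encapsulation, and signing are all placeholders here; only the sequence of steps mirrors the
description above.
\begin{verbatim}
# Illustrative order of operations when a new blocktree is created. Key
# generation, encapsulation, and writecap signing are all placeholders.
import os

def new_keypair(label):
    # Placeholder key pair; a real node would generate asymmetric keys,
    # ideally with the root private key held in a TPM.
    return {"public": f"{label}-pub", "private": f"{label}-priv"}

def create_blocktree():
    root_keys = new_keypair("root")      # sovereign authority of the tree
    node_keys = new_keypair("node-0")    # this node's own key pair

    # The root key issues the first node a writecap for the whole tree ("/").
    writecap = {
        "path": "/",
        "issued_to": node_keys["public"],
        "issuer": root_keys["public"],
        "signature": f"signed-by:{root_keys['private']}",
    }

    # The root block is created with a fresh block key, and readcaps are
    # added for both the root public key and the node's public key.
    block_key = os.urandom(32)
    root_block = {
        "path": "/",
        "block_key": block_key,
        "readcaps": {
            root_keys["public"]: ("encapsulated", block_key),
            node_keys["public"]: ("encapsulated", block_key),
        },
        "writecap": writecap,
    }
    # From here on, day-to-day signing uses node_keys; root_keys are only
    # needed again to issue writecaps to additional root nodes.
    return root_block, root_keys, node_keys

root_block, root_keys, node_keys = create_blocktree()
\end{verbatim}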
When a new node comes online and wishes to join the blocktree, it generates its own key pair.
The public key of this
node then needs to be transmitted to another node that's already part of the user's blocktree. The
mechanism used will depend on the nature of the device on which the new node is running.
For example, a phone could scan a QR code which contains
the IP address of the user's root node, and then transmit its public key to that internet host.
In order for the new node to be added to the user's blocktree, it needs to be issued a writecap
and a readcap must be added to the directory where it will be attached.
This could be accomplished
by providing a user interface on the node which received the public key from the new node.
This interface would show the requests that have been received from nodes attempting
to join the blocktree. The user can then choose to approve or deny the request, and can specify
the path where the new node will attach. If the user chooses to approve the request, then the writecap
is signed using the node's key and transmitted to the new node.
The ability to cope with key compromise is an important design consideration in any real-world
cryptosystem. In blocktree the compromise of a node key is handled by re-keying every block under
the directory where the node was attached. Specifically, this means that a new block key is generated for
each block, and the readcap for the compromised node is removed. This ensures that new writes to
these blocks will not be visible to the holder of the compromised key. To ensure that writes
will not be allowed, the root directory contains a revocation list containing the public keys
which have been revoked by the tree. These only need to be maintained until their writecaps
expire, after which time the list can be cleaned. Note that if the root private key is compromised or lost,
then the blocktree must be abandoned; there is no recovery. This is real security: the artifact
which grants control over the blocktree is the root private key, which is why storing the root
private key in multiple secure cryptographic co-processors is so important.
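A minimal sketch of the re-keying walk is given below; the block representation and the
revocation list are illustrative, not an on-disk format.
\begin{verbatim}
# Illustrative re-keying walk after a node key compromise: every block under
# the directory gets a fresh block key, the compromised principal's readcap
# is dropped, and the revoked key is recorded until its writecap expires.
import os

def rekey_subtree(directory, compromised_principal, revocation_list):
    """directory: dict with 'block_key', 'readcaps' (principal -> cap) and
    'children' (a list of blocks of the same shape)."""
    stack = [directory]
    while stack:
        block = stack.pop()
        block["block_key"] = os.urandom(32)              # new block key
        block["readcaps"].pop(compromised_principal, None)
        # Re-encapsulate the new key for the remaining principals
        # (placeholder: store a marker instead of a real readcap).
        for principal in block["readcaps"]:
            block["readcaps"][principal] = ("encapsulated", block["block_key"])
        stack.extend(block["children"])
    # Writes signed by the revoked key are rejected until its writecap expires.
    revocation_list.add(compromised_principal)

tree = {"block_key": b"", "readcaps": {"node-a": None, "node-b": None}, "children": []}
revoked = set()
rekey_subtree(tree, "node-b", revoked)
assert "node-b" not in tree["readcaps"] and "node-b" in revoked
\end{verbatim}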
Nodes in a blocktree communicate with one another by sending blocktree messages. These messages
are used for implementing consensus and distributed locking,
as well as notifying parent nodes when a
child has written new data. The latter mechanism allows the parent to replicate the data stored in
a child for redundancy. User code is also able to initiate the sending of messages. Messages are
addressed using blocktree paths. When a node receives a message that is not addressed to it,
but is addressed to its blocktree, it forwards it to the closest node to the recipient that it
is connected to. In order to enable efficient low-latency message transfers, nodes maintain open
TCP connections to the other nodes in their cluster, and the cluster leader maintains a connection to
its parent. Diffie-Hellman key exchange is used to exchange a key for use in an AEAD cipher, and
once this cipher context is established, the two nodes mutually authenticate each other using their
respective key pairs. When a node comes online, it uses the global blocktree (described later)
to find the other nodes in its cluster. If it is not part of a cluster, or this information is
not stored in the global blocktree, then it instead looks up the
IP address of a root node and connects to it. The root node may direct the node to connect to
one of the root's children, and this process repeats until the new node is connected to its parent.
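Routing can be pictured as choosing, among the peers a node is connected to, the one attached
closest to the destination path. The sketch below takes the longest shared path prefix as the
measure of closeness, which is an assumption made for illustration rather than a normative rule.
\begin{verbatim}
# Illustrative routing rule: forward a message to the connected peer whose
# attachment path is "closest" to the destination. Closeness is taken to be
# the longest shared path prefix, an assumption made for this sketch.
def path_components(path):
    return [c for c in path.split("/") if c]

def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(path_components(a), path_components(b)):
        if x != y:
            break
        n += 1
    return n

def next_hop(destination, connected_peers):
    """connected_peers: mapping of peer id -> path where that peer is attached."""
    return max(connected_peers,
               key=lambda peer: shared_prefix_len(connected_peers[peer], destination))

peers = {"cluster-a": "/photos", "cluster-b": "/mail", "parent": "/"}
assert next_hop("/mail/inbox/1", peers) == "cluster-b"
\end{verbatim}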
A concept that has proven to be very useful in the world of filesystems is the symbolic link.
This is a short file that contains the path to another file, and is interpreted by most programs
as being a ``link'' to that file. Blocktree supports a similar system, where a block can be
marked as a symbolic link and a blocktree path placed in its body. This also provides us with
a convenient way of storing readcaps for data that a node would otherwise not have access to.
For instance a symbolic link could be created which points to a block in another user's blocktree.
The other user only knows the public key of the owner of our blocktree, so they issue
a readcap to it. But the root nodes can open this readcap and extract
the block key. This key can then be encapsulated using the public key of the node which
requires access, and placed in the symbolic link. When the node needs to read the data
in the block, it opens the readcap in the symbolic link, follows the link to the block (how
that actually happens will be discussed below) and decrypts its contents.
While the consistency of individual blocks can be maintained using Raft, a distributed locking
mechanism is employed to enable transactions which span multiple blocks. This is
accomplished by exploiting the hierarchical arrangement of nodes in the tree. In order to describe
this, it's first helpful to define a new term. The \emph{nodetree} of a blocktree is the tree obtained
from the blocktree by collapsing all the blocks that a node (or cluster of nodes) is responsible
for into a single logical block representing the node itself. Thus we can talk about a node having
a parent, and by this we mean its parent in the nodetree. In terms of the blocktree, the parent
of a node is the first node encountered when the path back to the root is traversed.
Now, distributed locking works as follows:
\begin{itemize}
\item A node sends a request to lock a block to the current consensus leader in its cluster.
If the node is not part of a cluster, then it is the leader. This request contains a timestamp
for when the lock expires.
\item If the leader is responsible for the block then it moves on to the next step. Otherwise
it contacts its parent and forwards the lock request, and this step is repeated for the parent.
\item The responsible node checks to see if there is already a lock for this block. If there is
then the request fails. Otherwise the request succeeds and a lock is placed on the block. A
message indicating the result is then passed back up the tree ending at the original node. This
message includes the principal of the node enforcing the lock.
\item Once the locking node is done making its updates it sends a message directly to the node
enforcing the lock, causing it to be removed.
\end{itemize}
Locking a block locks the subtree rooted at that block. Thus no writes to any path contained in
the path of the locked block will be allowed, unless they come from the locking node. If the locking
node does not send the message
unlocking the block before the lock expires, then the modifications which had been performed by
it are dropped and the block reverts to its prior state. Since the locking node is the leader
of the consensus cluster that is responsible for the block, this guarantees that
writes from other nodes will not be accepted.
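The locking walk can be sketched as follows. Message passing is collapsed into direct method
calls and cluster leadership is ignored, so this only illustrates how a request travels toward
the responsible node and how the reply names the enforcing node.
\begin{verbatim}
# Illustrative sketch of the lock request walk: the request travels toward
# the root until it reaches the node responsible for the block, which either
# grants or refuses the lock. Messaging is collapsed into direct calls.
import time

class Node:
    def __init__(self, name, responsible_prefix, parent=None):
        self.name = name
        self.responsible_prefix = responsible_prefix   # path this node stores
        self.parent = parent
        self.locks = {}                                # path -> (holder, expiry)

    def responsible_for(self, path):
        return path == self.responsible_prefix or \
               path.startswith(self.responsible_prefix.rstrip("/") + "/")

    def request_lock(self, path, holder, expiry):
        if not self.responsible_for(path):
            # Not responsible: forward the request to the parent node.
            return self.parent.request_lock(path, holder, expiry)
        current = self.locks.get(path)
        if current is not None and current[1] > time.time():
            return (False, self.name)      # already locked
        self.locks[path] = (holder, expiry)
        return (True, self.name)           # reply names the enforcing node

    def release_lock(self, path, holder):
        if self.locks.get(path, (None, 0))[0] == holder:
            del self.locks[path]

root = Node("root", "/")
leaf = Node("photos", "/photos", parent=root)
ok, enforcer = leaf.request_lock("/mail/inbox", "photos", time.time() + 30)
assert ok and enforcer == "root"
root.release_lock("/mail/inbox", "photos")
\end{verbatim}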
\section{Connecting Blocktrees}
In order to allow nodes to access blocks in other blocktrees, a global ledger of events is used.
This ledger is implemented using a proof of work (PoW) blockchain and a corresponding
cryptocurrency known as blockcoin. Nodes mine chain blocks (not to be confused with the tree
blocks we've been discussing up till now) in the same way they do in other PoW blockchain
systems such as Bitcoin \cite{bitcoin}. The node which manages to mine the next chain block receives
a reward,
which is the sum of the fees for each event in the chain block and a variable amount of newly minted
blockcoin. The amount of new blockcoin created by a chain block is directly proportional to the
number of data storage events contained in the chain block. Thus the total amount of blockcoin
in circulation has a direct relationship to the amount of data stored in the system, reflecting
the fact that blockcoin exists to provide an accounting mechanism for data.
When a node writes data to a tree block, and it wishes this block to be globally accessible or
replicated for redundancy,
it produces what are called fragments. Fragments are the output symbols of the RaptorQ code
\cite{raptorq}. This algorithm is an example of a fountain code, a class of erasure codes
which have the property that only $m$ out of $n$ (where $m < n$) symbols are needed to reconstruct
the original data. Such a code ensures that even if some of the fragments are lost, as long as $m$
remain, the original data can be recovered.

Once these fragments have been computed, an event is created for each one and published to the
blockchain. This event indicates to other nodes that this node wishes to store a fragment and
states the amount of blockcoin it will pay for each of the fragment's maintenance payments. When
another node wishes to accept the offer, it directly contacts the first node, which then sends
it the fragment and publishes an event stating that the fragment is stored with the second
node. This event includes the path of the block the fragment was computed from, the fragment's
ID (the sequence number from the erasure code), and the principal of the node which stored it.
Thus any other node in the network can use the information contained in this event to
determine the set of nodes which contain the fragments of any given path.
In order for the node which stored a fragment to receive its next payment, it has to pass
a time-bound challenge-response protocol initiated by the node that owns the fragment.
The owning node selects a leaf in the Merkle tree of the fragment and sends the index of
this leaf to the storing node. The storing node then walks the path from this leaf back to
the root of the Merkle tree, and updates a hash value using the data in each node it traverses.
It sends this result back to the owning node, which verifies that this value matches its
own computation. If it does, then the owning node signs a message indicating that the challenge
passed and that the storing node should be paid. The storing node receives this message and uses
it to construct an event, which it signs and publishes to the blockchain. This event causes
the specified amount of blockcoin to be transferred from the owning node's blocktree to the storing
node's blocktree.
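The challenge-response exchange can be sketched concretely. Both parties hold the fragment, so
both can rebuild its Merkle tree; the particular tree layout and hash fold below are assumptions
of the sketch.
\begin{verbatim}
# Illustrative Merkle challenge over a stored fragment. The storing node
# proves possession by folding a hash along the path from the challenged
# leaf to the root; the owning node repeats the computation on its own copy.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(chunks):
    """Return the tree bottom-up: levels[0] are leaf hashes, levels[-1] the root."""
    level = [sha256(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]           # duplicate last node if odd
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def respond(fragment_chunks, leaf_index):
    """Storing node: fold hashes along the path from the leaf to the root."""
    levels = merkle_levels(fragment_chunks)
    acc = levels[0][leaf_index]
    index = leaf_index
    for level in levels[1:]:
        index //= 2
        acc = sha256(acc + level[index])          # mix in each ancestor node
    return acc

def verify(fragment_chunks, leaf_index, response):
    """Owning node: repeat the computation on its own copy and compare."""
    return respond(fragment_chunks, leaf_index) == response

chunks = [b"chunk-0", b"chunk-1", b"chunk-2", b"chunk-3", b"chunk-4"]
challenge = 3
assert verify(chunks, challenge, respond(chunks, challenge))
\end{verbatim}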
The fact that payments occur over time provides a simple incentive for nodes to be honest and
store the data they agree to. In banking terms, the storing node views the fragment as an
asset: it is a loan of its disk space which provides a series of payments over time.
On the other hand, the owning node views the fragment as a liability: it requires payments to
be made over time. In order for a blocktree owner to remain solvent, it must balance its
liabilities with its assets, incentivizing it to store data for others so that its own data
will be stored.

In order for nodes to be able to contact other nodes, a mechanism is required for associating
an internet protocol (IP) address with a principal. This is done by having nodes publish events
to the blockchain when their IP address changes. This event includes their new IP address,
their public key, and a digital signature computed using their private key. Other nodes can
then verify this signature to ensure that an attacker cannot bind the wrong
IP address to a principal in order to receive messages not meant for it.
While this event ledger is useful for appending new
events, and ensuring previous events
cannot be changed, another data structure is required to enable efficient queries.
In particular, it's important to be able to quickly perform the
following queries:
\begin{itemize}
\item Find the set of nodes storing the fragments for a given path.
\item Find the IP address of a node or owner given a principal.
\item Find the public key associated with a principal.
\end{itemize}
To enable these queries a special blocktree is maintained by each node in the network: the global
blocktree. This tree does not support the usual writing and locking semantics of local blocktrees.
In functional programming terms, it can be thought of as a left fold over all of the events in the
blockchain.
The above queries are facilitated by the following blocks:
\begin{itemize}
\item \emph{/global/fragments}: this block contains a hashtable where the key is a path and the
value is the list of nodes storing the fragments for the block at that path.
\item \emph{/global/principals}: contains a hashtable where the key is a principal and the value
is a tuple containing the public key of that principal, its current IP address, and its current
blockcoin balance.
\end{itemize}
To compute the entries in these tree blocks, the nodes in the network iterate over all the chain blocks, updating
their local copy of each tree block appropriately. The experienced reader will recognize that this is an
event-sourced architecture. Currently only these two tree blocks are known to be needed, but if new events are
added to the system it can be easily extended to enable queries that have yet to be envisioned.
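The left fold can be written down directly. In the sketch below the event shapes are guesses at
the minimum fields each event would need; only the folding structure is the point.
\begin{verbatim}
# Illustrative left fold of chain events into the two global query blocks.
# The event field names are assumptions for the sketch, not a wire format.
def apply_event(state, event):
    fragments, principals = state
    kind = event["kind"]
    if kind == "fragment_stored":
        fragments.setdefault(event["path"], []).append(
            {"fragment_id": event["fragment_id"], "node": event["node"]})
    elif kind == "address_changed":
        entry = principals.setdefault(event["principal"],
                                      {"public_key": event["public_key"],
                                       "ip": None, "balance": 0})
        entry["ip"] = event["ip"]
    elif kind == "payment":
        for who in (event["from"], event["to"]):
            principals.setdefault(who, {"public_key": None, "ip": None, "balance": 0})
        principals[event["from"]]["balance"] -= event["amount"]
        principals[event["to"]]["balance"] += event["amount"]
    return fragments, principals

def global_blocktree(chain):
    """Fold every event in every chain block into /global/fragments and
    /global/principals."""
    state = ({}, {})
    for chain_block in chain:
        for event in chain_block["events"]:
            state = apply_event(state, event)
    return {"/global/fragments": state[0], "/global/principals": state[1]}
\end{verbatim}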
\section{Programming Environment}
Enabling an excellent developer experience is one of the primary goals of this system (the others being security
and scalability). Nodes execute user code that has been compiled into WebAssembly modules \cite{wasm}. Such code
running on a blocktree node is referred to as an ``app''. An app
executes in a sandbox that isolates it from other code, as well as the security-critical operations of the node
itself. The sandbox provides the code with an extension of the WebAssembly System Interface (WASI), with extra
system calls to interact with the particulars of the blocktree system.
The extra system calls fall into these categories:
\begin{itemize}
\item Distributed Locking
\item Messaging
\item Supervision Trees
\item Protocol Contracts (Session Types)
\end{itemize}
The standard WASI filesystem APIs are used
to interact with the contents of blocktrees. For instance a file descriptor for a block can be obtained
by calling path\_open. Writes and reads of blocks are performed using the privileges of the node on which
the app is running, but the node may limit the app's access depending on its permissions.
When an app is installed it is given a directory under which it can store data that is shared between all nodes
in the blocktree. The path of this block is formed by prefixing the path the app was published at
with the string ``/apps''. When an app runs on a node, it is confined to a block contained
in the node's directory. It is only allowed to read and write blocks within this block, but to allow it to
access shared data, a symbolic link is created to the app's shared directory in ``/apps''.

% App Publishing
Apps are published by writing them into a blocktree. The path of the directory used to publish an app is used to
identify it. Only one app per directory is allowed. This directory contains the WebAssembly module itself as
well as a JSON manifest. This manifest defines the app's user-friendly name as well as the list
of permissions it requires. This list of permissions is used to determine which APIs the app has access to.
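As an illustration, a manifest for a hypothetical app might carry fields like the following.
The schema shown here, including the permission identifiers, is an assumption; it is expressed
as the Python value that would be serialized to JSON.
\begin{verbatim}
# Hypothetical app manifest, expressed as the Python dict that would be
# serialized to JSON alongside the WebAssembly module. All field names and
# permission identifiers here are assumptions for illustration.
import json

manifest = {
    "name": "Photo Album",                    # user-friendly name
    "module": "photo_album.wasm",             # the WebAssembly module
    "permissions": [
        {"api": "messaging", "path": "/apps/photo-album"},   # scoped messaging
        {"api": "locking",   "path": "/apps/photo-album"},
        # Requesting a TCP listener, even scoped to the local network,
        # would mark this app as privacy unsafe.
    ],
}
print(json.dumps(manifest, indent=2))
\end{verbatim}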
% Privacy Safe vs Unsafe
Apps are broken into two categories: those that are privacy safe and those that are not.
An app is privacy unsafe if it requests any permissions which allow it to send data
outside of the blocktree it's part of. Thus requesting the ability to
open a TCP socket would cause an app to be privacy unsafe. Similarly, the creation of a protocol
handler for HTTP would also be privacy unsafe. Privacy unsafe apps can limit the scope of their
unsafety by
imposing limits on the unsafe APIs that they request. For instance an app which needs to send
blocktree messages back to the blocktree it was published in can request the messaging permission
for a path in this tree. Similarly, an app which only wants to open a TCP socket listening on the
local network can limit the scope of its requested permission.

% Protocol Contracts
In order to make it easy to write apps which interface with existing systems, most notably
those using HTTP, apps are able to define protocols using a contract language and then register
callbacks for these protocols. This works by providing a system call which the app can
supply a protocol contract to. This contract is then compiled into a state machine by the node
and a handle is returned to the app. This handle is then used to register callbacks for different
parts of the protocol. This ensures that the protocol is handled by the node itself and that
protocols can be shared between many different apps as a library. For instance, the HTTP protocol
would be compiled to a particularly simple state machine, with only one state: the listening state.
This state would expose a hook where a callback can be registered to handle a request.
This state also defines the record format used to pass the request information to the callback
and the record format of the return value that is expected in order to produce the response.
More complicated (stateful) protocols would have more states, each defining their own request and
response records, as well as hooks. One nice thing about this setup is that it will enable
optimizations where the state machine and the user callbacks can be compiled into a program
which can be safely run in the node itself, or even in a SmartNIC. This would require that
the callbacks only use an approved set of APIs, but could enable much higher performance.
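From the app's point of view the API has roughly the shape sketched below: a contract is compiled
into a handle and callbacks are attached to named states. Everything in the sketch, including the
single-state HTTP contract, is illustrative.
\begin{verbatim}
# Illustrative shape of the protocol-contract API as seen from an app.
# In the real system the contract would be compiled by the node behind a
# system call; here a tiny in-process state machine stands in for it.
class ProtocolHandle:
    def __init__(self, states):
        self.states = set(states)
        self.callbacks = {}

    def register(self, state, callback):
        if state not in self.states:
            raise ValueError(f"unknown state: {state}")
        self.callbacks[state] = callback

    def deliver(self, state, request):
        # The node drives the state machine and invokes the app's callback,
        # passing a request record and expecting a response record back.
        return self.callbacks[state](request)

def compile_contract(contract_text):
    # Placeholder "compiler": a contract for a stateless protocol such as
    # HTTP would produce only the listening state.
    return ProtocolHandle(states=["listening"])

http = compile_contract("protocol http { state listening { ... } }")
http.register("listening",
              lambda request: {"status": 200, "body": b"hello from a blocktree app"})
response = http.deliver("listening", {"method": "GET", "path": "/"})
assert response["status"] == 200
\end{verbatim}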
% Supervision Trees
Apps can arrange themselves into supervision trees, in the same way that Erlang
processes are arranged \cite{armstrong}. In this scheme, when a child app crashes, or the node it's
running on dies (which is detected by other nodes), then the supervising app receives a message.
In the
simplest case this can be used to implement a logging system, where crashes and node
deaths are recorded. More interestingly, this can be used to integrate with a control
plane. For instance, if a blocktree were running in AWS, when a message is received indicating
that a node has died, a new EC2 instance could be started to replace it. The reliability of Erlang
and other systems employing the Actor Model has shown the robustness of this approach.
Apps can form this relationship after they've started running, provided that both of the apps
have permission to send messages to each other. Apps, with the appropriate permissions, can also
spawn other apps on descendant nodes. This can be used to implement map-reduce workloads, where
an app spawns mapping jobs on descendant nodes containing the data of interest, and it processes
their messages to compute the final reduction. Due to the tiny size of most programs, this is a much
more efficient approach than moving the data to the nodes performing the computation.
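A supervisor reacting to these messages might look like the following sketch; the message fields
and the spawn helper are hypothetical.
\begin{verbatim}
# Illustrative supervisor loop: when a child app crashes or the node it runs
# on dies, the supervising app receives a message and can respond, for
# example by logging or by respawning the child elsewhere. The message
# fields and the spawn_on() helper are hypothetical.
def spawn_on(node, app_path):
    # Placeholder for the system call that spawns an app on a descendant node.
    print(f"spawning {app_path} on {node}")

def supervise(messages, child_app_path, log):
    for message in messages:
        if message["kind"] == "child_exit":
            log.append(f"child on {message['node']} exited: {message['reason']}")
            spawn_on(message["replacement_node"], child_app_path)
        elif message["kind"] == "node_down":
            log.append(f"node {message['node']} is unreachable")

log = []
supervise([{"kind": "child_exit", "node": "worker-3", "reason": "panic",
            "replacement_node": "worker-4"}],
          "/apps/mapper", log)
\end{verbatim}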
\section{A Brave New Web}
In order to explore how blocktree can be used, the design of several hypothetical systems
is discussed. It's important to note that blocktree does not try to force all computation
to be local to a user's device; it merely tries to enable this for applications where it
is possible.
\subsection{Contacts and Mail}
The first application we'll consider is one which manages a user's contacts. This would expose
the usual create, read, update and delete operations, allowing a user to input the name
of a person they know and associate
that name with their public key. Once the principal of a person is known, then their public
key can be looked up in the global blocktree. This principal needs to be communicated to the
user via some out-of-band method. They could receive it in an email, a text message, or embedded
in a QR code. Of course this out-of-band communication needs to be authenticated, otherwise
it would be easy to fool the user into associating an attacker's key with the person.
The user now has a way of associating a blocktree with the name of this person. However, the
root public key of this blocktree is not enough to establish secure communications, because
the root private key is not available to every node in the person's blocktree. In particular
it would be inadvisable for the root private key to be stored on a user's mobile device. To
address this, mailbox directories are created.

For each contact two directories are created: the inbox and the outbox. The user creates a readcap
for another user's root key and adds it to their outbox. The other user's inbox is a symbolic
link in their own blocktree which points to this user's outbox. Thus each user
can write messages into their own blocktree at a location where the other party knows how
to find them. But in order for a node to read these messages, it requires its own readcap.
Only the root
nodes can issue this readcap as only they have access to the root key. Once permission has been
granted to a node, a root node can use the root key to decrypt the readcap issued to it, and then
encrypt it using the public key of the node. The resulting readcap is then stored in the header
of the inbox.

In addition to being able to check the inbox for mail, a blocktree message is sent to the receiving
blocktree when mail is sent. This message may contain the entire contents of the mail, if the
contents are short enough. But if the mail contains a lot of data, for instance a video, then
the message just serves as a notification that new mail is available.
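The mailbox arrangement, seen from one side of the relationship, can be sketched as follows;
blocks and symbolic links are plain dictionaries and the readcap is a placeholder.
\begin{verbatim}
# Illustrative setup of the mailbox directories for a contact. Blocks and
# symbolic links are plain dictionaries; readcap encapsulation is faked.
def make_directory(path, readcaps=None):
    return {"path": path, "kind": "directory", "readcaps": dict(readcaps or {})}

def make_symlink(path, target):
    return {"path": path, "kind": "symlink", "target": target, "readcaps": {}}

def setup_contact(my_fingerprint, their_fingerprint, their_root_pub):
    # My outbox holds mail I write for them; it carries a readcap issued to
    # their root public key so their root nodes can open it.
    outbox = make_directory(f"/contacts/{their_fingerprint}/outbox",
                            readcaps={their_root_pub: "encapsulated-block-key"})
    # My inbox is a symbolic link into their blocktree, pointing at the
    # outbox they keep for me; a fully qualified path starts with the
    # fingerprint of their root public key.
    inbox = make_symlink(f"/contacts/{their_fingerprint}/inbox",
                         target=f"{their_fingerprint}/contacts/{my_fingerprint}/outbox")
    return outbox, inbox

outbox, inbox = setup_contact("my-tree", "their-tree", "their-root-pub")
\end{verbatim}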
\subsection{Social Network}
Building a social network on top of the contacts app is fairly straightforward. Once
a contacts entry has been created for a person, most interactions between the user and that
person can be implemented by sending mail. For example, when the
user sends a direct message to a person, mail is placed in their outbox
and a blocktree message is sent to the root cluster of their blocktree. If the user wishes to
send a message to a group of people, mail is sent to each one.

This same mechanism can be used to implement status updates. When a user updates their status,
they send mail to each of their ``friends''. The social networking app running on each of the contacts'
devices will then display the latest status update from the user as their current status. This
setup allows the user to ``unfriend'' anyone by simply omitting them from the list of recipients
of these updates. To allow comments and likes on the status update to be visible to everyone
that it was shared with, the block created in the outbox of each of the friends
is a symbolic link to a single status update block. This symbolic link contains the
readcap for the status update block.
Comments and likes on status updates are implemented by sending mail to the user who posted the
update. When one of the user's nodes receives this mail, it then updates the block containing
the status update with the new comment, or increments the like counter.
It then sends mail to the people this status update was shared with, notifying them
that the data has changed.
\subsection{An E-commerce Website}
The previous two examples show how to build decentralized versions of existing web applications,
but blocktree also excels at building traditional web applications. Imagine an e-commerce website
with multiple warehouses, all of whose inventory is to be visible to customers of the site. Part
of the design consideration for the site is that the warehouses need to be able to update their
inventory even when their internet service goes down, and that these updates need to be visible
on the website once connectivity is restored.

To accomplish this a designer could create a directory for each warehouse. This directory would have
nodes attached to it that are physically located at each warehouse. The inventory of the warehouse
is then maintained in the warehouse's directory, satisfying the first requirement. Now, in order to
enable efficient queries of the overall available inventory, the data from each of the warehouses
needs to be merged. This is accomplished by creating another directory containing the merged data.
In event sourcing terms this is called a read-model. This directory will have another cluster of
nodes attached which will act as web servers. These nodes will subscribe to events published by the
warehouse clusters, events indicating changing levels of inventory, and digest this information into
a format that can be efficiently queried. These nodes will also publish events to the warehouses
when orders are placed, cancelled or amended.

When a warehouse goes offline, its previous inventory counts are
still recorded in the read-model, so queries can still be answered using the cluster's best
knowledge of the warehouse. Once the warehouse comes back online, the events that were recorded by the
warehouse cluster while it was offline can be replayed to the web server cluster, and its read-model
can be brought up to date.

One of the advantages of this approach is that the cluster of web servers need not be physically
close to each other. In fact they could be spread over several data centers, even in different
cloud providers. This allows for greater fault tolerance and reliability. Of course, running a
consensus cluster over a larger area means more latency and thus reduced performance, but if
the workload is read-heavy, and writes can be handled mostly in the warehouse clusters, then this
tradeoff may be worthwhile.
I hope this example shows that having a standard format for data and a federation of servers
can provide designers with much greater flexibility, even if they do not care about decentralization
or their users' privacy.
\subsection{The Open Metaverse}
As a final example I'd like to consider a platform for recording spatial information. The key insight
that enables this is very general: blocktree enables the creation of distributed tree-like data structures.
For instance, it's straightforward to imagine creating a distributed hashtable implemented as a red-black tree.
This impedance match between efficient query structures and the design of blocktree is one
of the reasons why I believe it is so useful. The particular data structure germane to building the metaverse
is the Binary Space Partition (BSP) tree.

I don't see the metaverse as being one world, but rather many. Each world would be hosted by its own blocktree.
The world will have a directory in the blocktree, and that directory will be the root of a BSP tree. Read
access to the world is controlled by adding readcaps to this directory. If the world is to be
publicly accessible, then its blocks can be stored in plaintext.

Thinking of worlds
like ours (those that can be reasonably approximated as the surface of a sphere) we can impose the usual latitude
and longitude coordinate system. We can then define parcels in the world by specifying the polygonal boundary as
a set of these coordinates (more precisely, the parcel is the convex hull of this set of points). These parcels
are then recorded as blocks in the world's directory, with each parcel's path determined by its virtual location
using the BSP algorithm. If a parcel
is owned by the same user who owns the world, then the data contained in the parcel is stored in
the same blocktree. However, if a parcel is to be owned by another user, then a symbolic link is created
pointing to a block in the owner's blocktree. They can then write whatever data they want into this block, defining
the contents of their parcel. Collaboration on a single parcel is accomplished by issuing a readcap and writecap to
another user.
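The mapping from a parcel's location to its block path can be sketched with a simple axis-aligned
BSP over latitude and longitude. Treating the parcel's centroid as its location and fixing the
depth of the split are assumptions of this sketch.
\begin{verbatim}
# Illustrative derivation of a parcel's block path from its location. The
# world region is split recursively (alternating longitude and latitude),
# and each split contributes one component to the path.
def centroid(points):
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return lat, lon

def bsp_path(parcel_points, depth=8):
    lat, lon = centroid(parcel_points)
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    components = []
    for level in range(depth):
        if level % 2 == 0:                      # split on longitude
            mid = (lon_lo + lon_hi) / 2
            if lon < mid:
                components.append("0"); lon_hi = mid
            else:
                components.append("1"); lon_lo = mid
        else:                                   # split on latitude
            mid = (lat_lo + lat_hi) / 2
            if lat < mid:
                components.append("0"); lat_hi = mid
            else:
                components.append("1"); lat_lo = mid
    return "/world/" + "/".join(components)

# A small parcel near Greenwich lands in a deterministic directory.
print(bsp_path([(51.48, -0.01), (51.48, 0.01), (51.47, 0.0)]))
\end{verbatim}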
It's easy to imagine that one world would be more important than the rest and that the creation of a metaverse
representation of Earth will be an important undertaking. The hierarchical nature of permissions in blocktree
makes such a shared world possible. National blocktrees could be given ownership of their virtual territory;
this would then
be delegated down to the state and municipal levels. Finally, municipalities would delegate individual parcels
to their actual owners. Owners could then use these as they see fit, including leasing them to third parties.
The ending date of such a lease would be enforced technically by the writecap issued to the lessee; when it
expires so too does the lease.
\section{Conclusion}
In this paper I have given the outline of a decentralized method of organizing information, computation,
and trust in a way that I hope will be useful and easy to use. The use of cryptographic primitives for
implementing access control was discussed, as well as methods of protecting private keys. A blockchain
and corresponding cryptocurrency was proposed as a means of incentivizing the distribution of data.
Erasure coding was used to ensure that distributed data is resilient to the loss of nodes and can be
reconstructed efficiently. A programming environment based on WASM and WASI was proposed as a way
of providing an interface to this data. APIs for defining protocol contracts and efficient web servers
were indicated. APIs for constructing supervision trees were mentioned as a means for building reliable
systems.

When Sir Tim Berners-Lee invented HTTP he could not have anticipated the applications
that his inventions would bring about. It has been said that the key to the success of his system is
that it made network programming so easy that anyone could do it. I don't know what the future will
bring, but I hope that this system, or one like it, will enable anyone to fearlessly build distributed
systems.

One thing is certain, however: it is a moral imperative that we provide users with viable alternatives
to online services which harvest their data and weaponize it against them. Only then will the web
become the place it was meant to be.
\printbibliography
\end{document}