State of the Art

Build caching isn't an entirely new concept. make does a great job of caching built artifacts, and that's been around forever! So, what does Buildless bring to the table, and how does it make builds faster?

Local caching is good

For a long time, tools like make, Gradle, and so many others have performed local caching. In this operating mode, built artifacts are kept on-disk somewhere, and recalled as needed during a developer's build. These techniques are great and have saved oodles of time for programmers over the years.
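As a rough illustration of the idea, the sketch below keeps built artifacts on disk and recalls them when the same inputs appear again. It keys artifacts by a hash of the step's inputs, which is one common approach and not how every tool works (make, for example, compares file timestamps). The names `LocalCache` and `build_step` are hypothetical, and the "compiler" is simulated.

```python
import hashlib
from pathlib import Path
from typing import Optional, Tuple


class LocalCache:
    """Toy on-disk cache keyed by a hash of a build step's inputs."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def key_for(*inputs: bytes) -> str:
        # Identical inputs always produce the same key. Each chunk is
        # length-prefixed so (b"ab", b"c") and (b"a", b"bc") differ.
        digest = hashlib.sha256()
        for chunk in inputs:
            digest.update(len(chunk).to_bytes(8, "big"))
            digest.update(chunk)
        return digest.hexdigest()

    def get(self, key: str) -> Optional[bytes]:
        path = self.root / key
        return path.read_bytes() if path.exists() else None

    def put(self, key: str, artifact: bytes) -> None:
        (self.root / key).write_bytes(artifact)


def build_step(cache: LocalCache, source: bytes, flags: bytes) -> Tuple[bytes, bool]:
    """Return (artifact, was_cache_hit). The build itself is simulated."""
    key = LocalCache.key_for(source, flags)
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    artifact = b"compiled:" + source  # stand-in for real compiler output
    cache.put(key, artifact)
    return artifact, False
```

Running `build_step` twice with the same source and flags should report a miss and then a hit; changing either input produces a new key, and therefore a miss.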

But a local cache can't easily be shared with your colleagues. The obvious challenges--correctness, synchronization, freshness, flush mechanisms, and so on--multiply in the presence of more than one machine. In most scenarios, local caches simply are not designed to be shared, and they make assumptions accordingly.

Remote caching is better

Remotely caching built artifacts changes this equation, because a central server has a chance to see this traffic and optimize it better than a local cache can. With a sufficiently robust server-side agent, caching effectiveness can reach 80% to 90%, measured as the share of build steps skipped (that is, cache hits).
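To make the effectiveness figure concrete, the ratio can be computed as hits divided by total cacheable steps. The function below is a minimal sketch; the name `hit_rate` and the sample counts in the usage note are illustrative, not measurements.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Share of cacheable build steps served from the cache."""
    total = hits + misses
    if total == 0:
        return 0.0  # no cacheable steps ran, so nothing was skipped
    return hits / total
```

For example, a build with 850 steps skipped and 150 steps executed would have `hit_rate(850, 150)` equal to 0.85, inside the range mentioned above.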

Server agents which operate in tandem with local builds can offload time-consuming tasks like compression, replication, and signature verification, yielding even more gains.

"Just Right" Doesn't Exist

Because a build cache has lax durability guarantees but global replication needs, no easy solution exists. In-memory datastores like Redis only solve the problem one region or server at a time, and some objects (but not all) should be persisted for a long period of time. That necessitates a coordination layer, which is the architecture presented here for Buildless.
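The trade-off can be pictured with a toy two-tier store: a fast in-memory tier that may lose data, plus a longer-lived tier that only some objects are written to. This is purely a sketch of the general idea and not a description of Buildless internals; `CoordinatedCache`, its `long_lived` flag, and the plain dictionaries standing in for real datastores are all assumptions made for illustration.

```python
from typing import Dict, Optional


class CoordinatedCache:
    """Toy coordination layer over a volatile tier and a durable tier."""

    def __init__(self) -> None:
        self.memory: Dict[str, bytes] = {}   # stand-in for an in-memory datastore
        self.durable: Dict[str, bytes] = {}  # stand-in for long-lived storage

    def store(self, key: str, value: bytes, long_lived: bool = False) -> None:
        # Every object lands in memory; only selected objects are persisted.
        self.memory[key] = value
        if long_lived:
            self.durable[key] = value

    def evict(self, key: str) -> None:
        # Memory pressure or a restart can drop objects from the fast tier.
        self.memory.pop(key, None)

    def fetch(self, key: str) -> Optional[bytes]:
        if key in self.memory:
            return self.memory[key]
        value = self.durable.get(key)
        if value is not None:
            self.memory[key] = value  # re-warm the fast tier on a durable hit
        return value
```

After an eviction, an object stored with `long_lived=True` is still retrievable, while a short-lived object is simply a cache miss.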

Cloud + Edge

Buildless partners with Cloudflare to provide best-of-breed networking and edge services. Buildless runs on servers in 300 cities around the world, and holds the most useful cached objects at the edge of our network, within Cloudflare's datacenters.

At the Edge

TBD: Diagram of Edge

Our edge network on Cloudflare runs in every datacenter, and enjoys optimized anycast routes to the internet.

Buildless uses top-notch encryption and compression technologies to deliver the best protocol and transport experience possible. Our edge network maintains active status of higher-order components and can route around issues as they occur.

Cache clients are automatically routed to the lowest-latency region available for use.

TLSv1.3 supported and encouraged!
Brotli at level 11 supported
Many other optimizations, through Cloudflare and our own services

At the Origin

TBD: Diagram of Origin

Our origin systems

Once traffic arrives at the Buildless network, it moves to our origin servers, which run our API, event processing, queueing, and related services.

Buildless partners with DragonflyDB for our in-memory datastore. Our adapters, data structures, and techniques at this level are proprietary, but, in general, we optimize for a very specific use case:

Pseudo-Durability: Best-effort persistence guarantees. Read more in our Durability guide.
Overwrite-Only: Upserts, updates, diff-based changes, and similar operations are not supported; a write always replaces the whole object.
Async Write-path: Write paths are heavily optimized to defer the work of compressing, encrypting, and replicating cached objects. This keeps build steps which contribute data to the cache fast, and build speeds fair.
Aggressive Read-path: Read paths are heavily pre-optimized during background write processing. Replication is immediate, but no consensus requirement exists before written objects are served.
Never Fail: The cache is designed to never, ever fail your build. At worst, failures by Buildless should result in a lack of acceleration that would normally be enjoyed by the developer, but at no time should your build fail because of an error, outage, billing issue, or any other issue on our end. Read more in our Reliability guide.
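The "Never Fail" property above can be mirrored on the client side: any cache error degrades to a normal, uncached build. The sketch below assumes a hypothetical `RemoteCache` interface with `fetch` and `store` methods; it is not the Buildless client API.

```python
import logging
from typing import Callable, Optional, Protocol

log = logging.getLogger("cache-client")


class RemoteCache(Protocol):
    """Hypothetical cache interface, assumed for this example."""

    def fetch(self, key: str) -> Optional[bytes]: ...

    def store(self, key: str, artifact: bytes) -> None: ...


def cached_build(cache: RemoteCache, key: str, build: Callable[[], bytes]) -> bytes:
    """Use the cache when it works; never let it break the build."""
    try:
        hit = cache.fetch(key)
    except Exception as err:  # outage, timeout, auth or billing error, ...
        log.warning("cache fetch failed, building locally: %s", err)
        hit = None
    if hit is not None:
        return hit

    artifact = build()  # errors from the build itself still propagate
    try:
        cache.store(key, artifact)
    except Exception as err:
        log.warning("cache store failed, continuing: %s", err)
    return artifact
```

The only cost of a cache failure in this design is lost acceleration: the step runs as it would have without a cache.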

Data Architecture

Buildless is inherently a data-handling platform: data is either moving in (cache stores) or out (cache hits). This section describes how Buildless treats cache data at-rest (while held for use), and in-flight (as it moves through our systems).

At Rest

TBD: Diagram of Data At Rest

Data at rest is always encrypted and compressed, in addition to signing steps applied by clients.

Data is semi-persistent in Buildless by default, held mostly in-memory, and always compressed and encrypted. Data is encrypted at several levels:

Buildless maintains service-level symmetric encryption mechanisms
Each Buildless account maintains account-scope symmetric encryption mechanisms
Underlying services used by Buildless apply their own symmetric encryption, in many cases

Buildless leverages best-of-breed symmetric encryption technologies and algorithms, and keeps systems aggressively up-to-date, with configurations that express current best practices.

🔑
Private Keying
Buildless accounts with custom encryption keying skip the middle step above, with their keys replacing the keys normally used by the Buildless service.

In Flight

TBD: Diagram of Data In Flight

Data in-flight is always encrypted, with perfect forward secrecy and mutual certificate verification.

Buildless leverages end-to-end mTLS within our network core, and supports the latest transport encryption technologies at our edge.

Customer data is encrypted at all times in transit, even internally within our network
TLSv1.3 is supported and used internally and externally, with PFS (Perfect Forward Secrecy) active
Systems specifically withhold support for broken, weak, or flawed algorithms
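On the client side, a connection policy in the same spirit can be expressed with Python's standard `ssl` module. This is a sketch of a strict client configuration, not Buildless's own setup; the function name `strict_tls_context` is made up for the example.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses anything older than TLS 1.3."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Decline to negotiate older protocol versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context
```

The resulting context can be passed to standard-library clients, for example through the `context` argument of `urllib.request.urlopen`.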

🌐
Replication is broadcast, wait-free, and best effort.
Buildless doesn't make clients wait for replication. This process happens in the background, because the same object is unlikely to be fetched (or stored) within two regions in a narrow time period.
Replication always overwrites, just like a normal write operation.
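The replication behavior described above can be sketched as follows: a store completes locally, then fans out to peer regions in the background, ignoring peers that cannot be reached. Everything here is a simplified assumption for illustration; `Region`, `ReplicatingCache`, and the thread pool are not Buildless's actual mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class Region:
    """Stand-in for one region's object store."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.objects: Dict[str, bytes] = {}

    def overwrite(self, key: str, value: bytes) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} unreachable")
        self.objects[key] = value  # always replaces, never merges


class ReplicatingCache:
    def __init__(self, local: Region, peers: List[Region]) -> None:
        self.local = local
        self.peers = peers
        self._pool = ThreadPoolExecutor(max_workers=4)

    def store(self, key: str, value: bytes) -> None:
        self.local.overwrite(key, value)
        # Broadcast to every peer, then return without waiting on any of them.
        for peer in self.peers:
            self._pool.submit(self._replicate, peer, key, value)

    @staticmethod
    def _replicate(peer: Region, key: str, value: bytes) -> None:
        try:
            peer.overwrite(key, value)
        except ConnectionError:
            pass  # best effort: an unreachable peer is skipped, not retried

    def drain(self) -> None:
        """Block until queued replication finishes (useful in tests only)."""
        self._pool.shutdown(wait=True)
```

A caller of `store` sees the local write complete immediately; whether a given peer eventually holds the object depends on that peer being reachable at the time.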
