Build caching isn't an entirely new concept.
make does a great job of caching built artifacts, and that's been around forever! So, what does Buildless bring to the table, and how does it make builds faster?
For a long time, tools like make, Gradle, and so many others have performed local caching. In this operating mode, built artifacts are kept on disk somewhere, and recalled as needed during a developer's build. These techniques are great and have saved oodles of time for programmers over the years.
But a local cache can't easily be shared with your colleagues. The obvious challenges (correctness, synchronization, freshness, flush mechanisms, and so on) multiply in the presence of more than one machine. Local caches are simply not designed to be shared in most scenarios, and they make assumptions accordingly.
Remotely caching built artifacts changes this equation, because a central server has a chance to see this traffic, and optimize it better than a local cache can muster. With a sufficiently robust server-side agent, caching effectiveness can reach 80% - 90%, in terms of build-steps skipped (cache hits, in this case).
Server agents which operate in tandem with local builds can offload time-consuming tasks like compression, replication, and signature verification, yielding even more gains.
Because a build cache combines lax durability guarantees with global replication needs, no easy solution exists. In-memory datastores like Redis only solve the problem one region or server at a time, and some objects (but not all) should be persisted for a long period of time, which necessitates a coordination layer. That is the architecture presented here for Buildless.
Buildless partners with Cloudflare to provide best-of-breed networking and edge services. Buildless runs on servers in 300 cities around the world, and holds the most useful cached objects at the edge of our network, within Cloudflare's datacenters.
TBD: Diagram of Edge
Our edge network on Cloudflare runs in every datacenter, and enjoys optimized anycast routes to the internet.
Buildless uses top-notch encryption and compression technologies to deliver the best protocol and transport experience possible. Our edge network maintains active status of higher-order components and can route around issues as they occur.
Cache clients are automatically routed to the lowest-latency region available for use.
- TLSv1.3 supported and encouraged!
- Brotli at level 11 supported
- Many other optimizations, through Cloudflare and our own services
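As a rough client-side illustration of the transport features listed above, the sketch below builds a TLS context that refuses anything older than TLSv1.3 and prepares a request that advertises Brotli support. It uses only the Python standard library. The URL and function names are hypothetical placeholders, not documented Buildless endpoints or APIs, and the request is constructed but never sent.

```python
import ssl
import urllib.request

# Hypothetical endpoint, for illustration only; not a documented Buildless URL.
CACHE_URL = "https://cache.example.invalid/v1/objects/abc123"


def make_tls13_context() -> ssl.SSLContext:
    """Build a client context that refuses anything older than TLS 1.3."""
    ctx = ssl.create_default_context()  # certificate and hostname checks stay on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def make_cache_request(url: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET that advertises Brotli support."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "br, gzip"})


ctx = make_tls13_context()
req = make_cache_request(CACHE_URL)
```

Note that `urllib` does not decode Brotli responses on its own; a real client would need a Brotli decoder from a third-party package, or would rely on the build tool's HTTP client to handle it.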
TBD: Diagram of Origin
Our origin systems
Once traffic arrives at the Buildless network, it moves to our origin servers, which run our API, event processing, queueing, and related services.
Buildless partners with DragonflyDB for our in-memory datastore. Our adapters, datastructures, and techniques at this level are proprietary, but, in general, we optimize for a very specific use case:
- Pseudo-Durability: Best-effort persistence guarantees. Read more in our Durability guide.
- Overwrite-Only: Upserts, updates, diff-based changes, and similar operations are not supported.
- Async Write-path: Write paths are heavily optimized to defer the work of compressing, encrypting, and replicating cached objects. This keeps build steps which contribute data to the cache fast, and build speeds fair.
- Aggressive Read-path: Read paths are heavily pre-optimized during background write processing. Replication is immediate, but no consensus requirement exists before written objects are served.
- Never Fail: The cache is designed to never, ever fail your build. At worst, failures by Buildless should result in a lack of acceleration that would normally be enjoyed by the developer, but at no time should your build fail because of an error, outage, billing issue, or any other issue on our end. Read more in our Reliability guide.
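Seen from the client's side, the "Overwrite-Only" and "Never Fail" properties above might look like the minimal sketch below. Every name here (`OverwriteOnlyStore`, `BrokenStore`, `FailOpenCache`, `run_step`) is hypothetical and exists only for illustration. The sketch does not describe Buildless's proprietary implementation; it only shows the general fail-open pattern, where any cache error degrades to a cache miss rather than a failed build.

```python
from typing import Callable, Dict, Optional


class OverwriteOnlyStore:
    """In-memory stand-in for a cache backend: a put replaces the whole value."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._objects[key] = value  # no merge, no diff: last write wins

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)


class BrokenStore:
    """Stand-in for an unreachable backend: every call raises."""

    def put(self, key: str, value: bytes) -> None:
        raise ConnectionError("cache unavailable")

    def get(self, key: str) -> Optional[bytes]:
        raise ConnectionError("cache unavailable")


class FailOpenCache:
    """Wraps a store so that cache errors never propagate to the build."""

    def __init__(self, store) -> None:
        self._store = store

    def fetch(self, key: str) -> Optional[bytes]:
        try:
            return self._store.get(key)
        except Exception:
            return None  # treat any failure as a cache miss

    def save(self, key: str, value: bytes) -> bool:
        try:
            self._store.put(key, value)
            return True
        except Exception:
            return False  # the build continues without caching


def run_step(cache: FailOpenCache, key: str, build: Callable[[], bytes]) -> bytes:
    """Return the cached artifact if present; otherwise build and try to store it."""
    cached = cache.fetch(key)
    if cached is not None:
        return cached
    artifact = build()
    cache.save(key, artifact)
    return artifact
```

With a healthy store, a second call to `run_step` with the same key returns the cached artifact without invoking `build`. With `BrokenStore`, every call falls through to `build`, so the step is slower but still succeeds.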
Buildless is inherently a data-handling platform: data is either moving in (cache stores) or out (cache hits). This section describes how Buildless treats cache data at-rest (while held for use), and in-flight (as it moves through our systems).
TBD: Diagram of Data At Rest
Data at rest is always encrypted and compressed, in addition to signing steps applied by clients.
- Buildless maintains service-level symmetric encryption mechanisms
- Each Buildless account maintains account-scope symmetric encryption mechanisms
- Underlying services used by Buildless apply their own symmetric encryption, in many cases
Buildless leverages best-of-breed symmetric encryption technologies and algorithms, and keeps systems aggressively up-to-date, with configurations that express current best practices.
Buildless accounts with custom encryption keying skip the middle step above, with their keys replacing the keys normally used by the Buildless service.
TBD: Diagram of Data In Flight
Data in-flight is always encrypted, with perfect forward secrecy and mutual certificate verification.
Buildless leverages end-to-end mTLS within our network core, and supports the latest transport encryption technologies at our edge.
- Customer data is encrypted at all times in transit, even internally within our network
- TLSv1.3 is supported and used internally and externally, with PFS (Perfect Forward Secrecy) active
- Systems specifically withhold support for broken, weak, or flawed algorithms
Replication is broadcast, wait-free, and best effort.
Buildless doesn't make clients wait for replication. This process happens in the background, because the same object is unlikely to be fetched (or stored) within two regions in a narrow time period.
Replication always overwrites, just like a normal write operation.
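The replication behavior described above (broadcast, wait-free, best effort, overwrite-only) can be approximated with the small sketch below. It is an in-process simulation using threads. `Region`, `ReplicatingCache`, and `drain` are hypothetical names introduced for illustration, and nothing here reflects how Buildless actually moves data between regions.

```python
import threading
from typing import Dict, List


class Region:
    """Stand-in for one regional cache; an unhealthy region rejects writes."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.objects: Dict[str, bytes] = {}

    def write(self, key: str, value: bytes) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} unreachable")
        self.objects[key] = value  # replication overwrites, like any other write


class ReplicatingCache:
    """Writes locally, then broadcasts to peers in the background."""

    def __init__(self, local: Region, peers: List[Region]) -> None:
        self.local = local
        self.peers = peers
        self._pending: List[threading.Thread] = []

    def store(self, key: str, value: bytes) -> None:
        self.local.write(key, value)  # the caller only waits for this write
        for peer in self.peers:  # broadcast: one background task per peer
            task = threading.Thread(
                target=self._replicate, args=(peer, key, value), daemon=True
            )
            task.start()
            self._pending.append(task)

    @staticmethod
    def _replicate(peer: Region, key: str, value: bytes) -> None:
        try:
            peer.write(key, value)
        except Exception:
            pass  # best effort: a failed peer is skipped, with no retry here

    def drain(self) -> None:
        """Wait for background replication; only needed for tests and demos."""
        for task in self._pending:
            task.join()
        self._pending.clear()


local = Region("local")
peer_ok = Region("peer-a")
peer_down = Region("peer-b", healthy=False)

cache = ReplicatingCache(local, [peer_ok, peer_down])
cache.store("artifact", b"v1")
cache.drain()
cache.store("artifact", b"v2")  # a later write simply replaces the earlier one
cache.drain()
```

The unreachable peer never receives the object, and the store call never learns about it. That is the trade-off of best-effort replication: a later read in that region is simply a cache miss.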