
Why your model regressed at 03:14 AM (and what to do about it)
A walkthrough of three real incidents — and the eval, observability, and rollback patterns that would have caught each of them before a single user noticed.
Essays on infrastructure, evaluation, governance, and the unglamorous work that separates demos from products.

How we reduced LLM serving cost by 41% without changing a single model weight — and the three places it bit us in production.

An opinionated take on what belongs in your evaluation pipeline before you ship anything to a paying customer.

Policy-as-code is now inline at inference — and configurable per tenant, route, and audit class.

A measured look at where investing in retrieval quality pays back — and where it doesn't.

A customer engineering story about moving a multi-tenant LLM workload from a generic provider onto Xelvoraa.

Boring is reliable. Reliable is fast. Fast is what your customers actually wanted.