Journal

Field notes from applied AI.

Essays on infrastructure, evaluation, governance, and the unglamorous work that separates demos from products.

Featured · Dec 02, 2025 · 12 min read

Why your model regressed at 03:14 AM (and what to do about it)

A walkthrough of three real incidents — and the eval, observability, and rollback patterns that would have caught each of them before a single user noticed.

Recent

Latest writing.

Engineering · Nov 24, 2025 · 8 min

Speculative decoding without the regrets

How we reduced LLM serving cost by 41% without changing a single model weight — and the three places it bit us in production.

Evals · Nov 11, 2025 · 6 min

The eval harness we wish we'd built first

An opinionated take on what belongs in your evaluation pipeline before you ship anything to a paying customer.

Product · Oct 28, 2025 · 4 min

Introducing Guardrails 2.0

Policy-as-code is now inline at inference — and configurable per tenant, route, and audit class.

Research · Oct 14, 2025 · 11 min

On the cost of "good enough" retrievers

A measured look at where investing in retrieval quality pays back — and where it doesn't.

Customers · Sep 30, 2025 · 7 min

How Northwind cut inference cost 3.1×

A customer engineering story about moving a multi-tenant LLM workload from a generic provider onto Xelvoraa.

Notes · Sep 18, 2025 · 3 min

The case for boring AI infrastructure

Boring is reliable. Reliable is fast. Fast is what your customers actually wanted.