Infrastructure-as-Code for Latency-Critical Bare Metal Systems

Infrastructure

Leanne Fok is a Senior Infrastructure Engineer in Amsterdam with 15+ years at Optiver. She previously led the Infrastructure Platform team responsible for our internal Infrastructure-as-Code stack and is now leading the Infrastructure Kubernetes project.

Infrastructure-as-Code is straightforward in the cloud. But what about on bare metal?

In her PyLadies talk, “Infrastructure-as-Code in a Latency-Critical Bare Metal World,” Senior Infrastructure Engineer Leanne Fok shares how we brought declarative IaC principles to physical, latency-sensitive trading infrastructure where nanoseconds matter and every cable is intentional.

Managing thousands of bare metal servers across global data centers introduces a different class of problems. There is no managed control plane. No provider APIs to rely on. Just hardware, strict performance constraints, and very little tolerance for configuration drift.

So how do you:

  • Define infrastructure intent clearly enough to automate it
  • Enforce standards across physical devices, ports, power, and networking
  • Detect when real-world state quietly diverges from what you declared
  • Build trust in automation while migrating from legacy processes

In this talk, Leanne walks through the architectural patterns we adopted, including declarative intent modeling, reconciliation loops, truth collectors, and a custom Terraform provider that translates infrastructure definitions into enforceable standards.
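To make the reconciliation idea concrete, here is a minimal Python sketch of a single reconciliation pass. It is not the implementation from the talk. The `ServerIntent` schema, the two-PDU power rule, and the stub truth collector are hypothetical stand-ins, and the sketch assumes the collector reports values in the same shape as the declaration. Declared intent is first checked against a standard. It is then compared, attribute by attribute, with the observed state, and any mismatch is reported as drift rather than silently corrected.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field


@dataclass
class ServerIntent:
    """Declared intent for one bare metal server (hypothetical schema)."""
    hostname: str
    rack: str
    power_feeds: tuple[str, ...]  # "pdu:outlet" strings, e.g. ("pdu-a:12", "pdu-b:12")
    ports: dict[str, str] = field(default_factory=dict)  # NIC port -> switch port


@dataclass
class Drift:
    """One attribute where observed state differs from declared intent."""
    hostname: str
    attribute: str
    declared: object
    observed: object


def check_standards(intent: ServerIntent) -> list[str]:
    """Hypothetical standard: every server draws power from two distinct PDUs."""
    pdus = {feed.split(":", 1)[0] for feed in intent.power_feeds}
    if len(pdus) < 2:
        return [f"{intent.hostname}: power feeds must span 2 distinct PDUs, got {sorted(pdus)}"]
    return []


def diff(intent: ServerIntent, observed: dict) -> list[Drift]:
    """Compare declared attributes with what a (hypothetical) truth collector reported.

    Assumes the collector uses the same value shapes as the declaration
    (tuple for power feeds, dict for ports).
    """
    declared = {
        "rack": intent.rack,
        "power_feeds": tuple(intent.power_feeds),
        "ports": dict(intent.ports),
    }
    drift = []
    for attr, want in declared.items():
        have = observed.get(attr)  # None if the collector could not see it
        if have != want:
            drift.append(Drift(intent.hostname, attr, want, have))
    return drift


def reconcile_once(
    intents: Iterable[ServerIntent],
    collect_truth: Callable[[str], dict],
) -> dict[str, list]:
    """One pass of a reconciliation loop: validate intent, then detect drift."""
    report: dict[str, list] = {"violations": [], "drift": []}
    for intent in intents:
        report["violations"].extend(check_standards(intent))
        report["drift"].extend(diff(intent, collect_truth(intent.hostname)))
    return report


if __name__ == "__main__":
    intents = [
        ServerIntent("srv-01", "ams-r12", ("pdu-a:12", "pdu-b:12"), {"eth0": "sw1:Ethernet5"}),
    ]
    # Stub truth collector: pretend eth0 was re-cabled to a different switch port.
    observed = {
        "srv-01": {
            "rack": "ams-r12",
            "power_feeds": ("pdu-a:12", "pdu-b:12"),
            "ports": {"eth0": "sw1:Ethernet7"},
        },
    }
    report = reconcile_once(intents, observed.__getitem__)
    for d in report["drift"]:
        print(f"{d.hostname}.{d.attribute}: declared={d.declared} observed={d.observed}")
```

A real reconciliation loop would repeat this pass on a schedule and then decide whether to alert or remediate. Reporting drift without acting on it is one conservative way to build trust in automation while migrating from legacy processes.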

Watch the full 15-minute talk

Interested in latency-critical systems?

Explore and learn more about the infrastructure challenges behind them.

Related articles


We sat down with Pat Cooney, Head of Platform Engineering, to talk about what Platform Engineering means here and where agentic AI fits into the picture. Pat has spent over a decade at Optiver across markets, regions, and roles.

When people talk about developer productivity, they often jump straight to tools: powerful coding agents, faster compilers, smarter automation. These things matter, but they are not the whole story.

Pushing Postgres beyond storage

Software

In most systems, the database acts as a boundary. You write data into it, and other systems read from it. If you need something more dynamic, like reacting to changes as they happen, you usually introduce something alongside it, whether that is a service layer, a queue, or a stream.

Large language models (LLMs) are getting surprisingly good at learning the basics of trading. The latest models can price simple scenarios, reason through rules, and even outline basic strategies.

UI as a Systems Problem

Data Engineering

If a UI doesn’t feel instant, it feels broken, and users start to question what they’re seeing. A grid lags, values don’t update when expected, or a filter that used to feel instant starts to slow down. In high-demand systems like a trader’s workstation, a single desktop may be running many latency-sensitive applications at once, all competing for CPU, memory, and network bandwidth, so issues can show up quickly.

When Speed and Scale Collide

Data Engineering

Data systems are often described along two axes: speed and scale. In practice, “speed” usually means some combination of latency and throughput, and systems are often optimized for one at the expense of the other, sometimes by trading efficiency for raw capacity. Those distinctions tend to break down quickly once systems move beyond simple use cases. Once a system is both data-heavy and interactive, speed and scale stop being independent variables. Decisions made to improve one almost always affect the other, sometimes in ways that are not immediately obvious and only surface under real usage.
