
Software Reliability at Optiver: Design

When errors occur in our environment, we aim to learn everything we can from them so we can improve our processes, systems, and culture. However, a common response to errors reveals a misunderstanding of where they come from: why wasn’t this caught in testing?

Pointing to a lack of testing presupposes that the proximate problem is the one to be solved, and that exhaustive testing can guarantee it will not happen again.

Today we examine a different philosophy: architecture and design form the foundation of correctness, not testing.

Design correctness into software rather than test errors out of it

It is better to build quality into a system from the start than to “test errors out” at the end. Test-Driven Development (TDD), which exemplifies this mindset, requires that tests be written before the code they exercise, with the aim of keeping errors from entering the code in the first place. Tests and code co-evolve at a very fine granularity: testing and thinking about correctness become part of the design and development process itself.

Perhaps the most important feature of this style of software engineering is that it inevitably leads to lightweight systems with simple designs. Simplicity of design is, without doubt, the most important factor in assuring the correctness of software. Conversely, complex designs are the most error-prone, and they are also the designs that benefit least from testing.

Architecture underpins correctness

We strive to build logical checks and protections into the software systems themselves. This mindset is best reflected in our automation risk controls, such as perimeter limits* and trade reconciliation**: components that give assurance that the system is working as intended, or at least within tolerable bounds. This way of thinking constitutes an architectural ideal, not a software development methodology. It encourages developers to build tools and systems whose sole purpose is to reduce our vulnerability to errors in other parts of the system.

A principal benefit of this approach is that it facilitates decoupling the parts that need to be correct from those that can tolerate some errors. In the ideal case, the latter parts are precisely those that we need to change frequently (e.g. the strategy-level logic); the former are the more static parts. Being static, they lend themselves to very rigorous reliability assurance techniques, and, being isolated, they allow us to focus our attention on the parts where correctness is paramount.

* Perimeter Limits: These are a variety of price, quantity, and rate limits we check just before an order is sent from our system. These checks are contained in separate, heavily tested, tightly controlled libraries that our automated trading strategies use.

** Trade Reconciliation: We compare trades booked by our automated trading system against trades reported by an independent source to check that our record of the trades we have made matches the exchange’s record.
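To make the idea of perimeter limits more concrete, here is a minimal Python sketch of a pre-send check. It is not Optiver’s implementation: the `Order`, `PerimeterLimits`, `PerimeterCheck` and `LimitBreach` names, the static price band, and the one-second sliding window used for the rate limit are all hypothetical choices made for illustration.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass(frozen=True)
class Order:
    instrument: str
    price: float
    quantity: int


@dataclass(frozen=True)
class PerimeterLimits:
    # Hypothetical limit set: a static price band, a maximum order size,
    # and a maximum number of orders per one-second window.
    min_price: float
    max_price: float
    max_quantity: int
    max_orders_per_second: int


class LimitBreach(Exception):
    """Raised when an order falls outside the perimeter limits."""


class PerimeterCheck:
    """Illustrative last gate an order must pass before leaving the system."""

    def __init__(self, limits: PerimeterLimits,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limits = limits
        self._clock = clock
        self._sent: Deque[float] = deque()  # times of orders accepted in the window

    def check(self, order: Order) -> None:
        lim = self._limits
        if not lim.min_price <= order.price <= lim.max_price:
            raise LimitBreach(f"price {order.price} outside [{lim.min_price}, {lim.max_price}]")
        if not 0 < order.quantity <= lim.max_quantity:
            raise LimitBreach(f"quantity {order.quantity} outside (0, {lim.max_quantity}]")
        now = self._clock()
        # Slide the window: forget orders accepted one second or more ago.
        while self._sent and now - self._sent[0] >= 1.0:
            self._sent.popleft()
        if len(self._sent) >= lim.max_orders_per_second:
            raise LimitBreach("order rate limit exceeded")
        self._sent.append(now)  # only orders that pass every check count towards the rate
```

A strategy would call `check` on every order immediately before sending it and refuse to send on `LimitBreach`. Because the check has no dependency on strategy logic, it is the kind of small, static component that can be tested and reviewed far more rigorously than the strategies that call it.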
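Trade reconciliation can likewise be sketched as a multiset comparison between our booked trades and an independent record. This sketch assumes, hypothetically, that both sources have already been normalised into identical `Trade` records with integer prices (in ticks) so that equality is exact; real feeds would need field mapping and matching rules that are out of scope here.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Trade(NamedTuple):
    # Hypothetical normalised trade record; the price is in integer ticks
    # so that comparisons are exact.
    trade_id: str
    instrument: str
    side: str  # "BUY" or "SELL"
    price_ticks: int
    quantity: int


def reconcile(booked: Iterable[Trade],
              reported: Iterable[Trade]) -> Tuple[List[Trade], List[Trade]]:
    """Compare our booked trades with an independent source.

    Returns (missing, unexpected):
      missing    - reported by the independent source but absent from our books
      unexpected - in our books but not confirmed by the independent source
    """
    ours, theirs = Counter(booked), Counter(reported)
    # Counter subtraction keeps only positive counts, i.e. the unmatched excess.
    missing = sorted((theirs - ours).elements())
    unexpected = sorted((ours - theirs).elements())
    return missing, unexpected
```

A trade booked with, say, the wrong quantity shows up in both lists (once as booked, once as reported), which is exactly the kind of discrepancy reconciliation exists to surface. In a sketch like this, any non-empty result is the signal to alert a human.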

A great example from our system’s evolution can be seen in “trade booking”, the process we use to guarantee every trade we make is recorded. Ensuring correctness and reliability in trade booking is a key part of our foundation for protecting against automation risk. Many catastrophic trading losses have stemmed from trading firms being unaware of trades made by their automated systems.

Our system was originally designed such that a trading application could not trade until the “trade booking” component had connected to it. To use client-server parlance, the trading application was the server, accepting a connection from its client, the trade booking component. This design, an artifact of the system’s evolutionary history, made it difficult to guarantee that every trade made by our system was recorded. The trading application had to treat as a normal, expected mode of operation a state in which it was running but unable to trade because its client had not yet connected. This extra state was at odds with the purpose of the system, which is to submit orders and make trades. While this design can be tested for correctness, it made the system unnecessarily complex.

We decided to simplify by inverting the client-server relationship. If the trading application (now a client) could not establish an initial connection to the trade booking component (now a server), it would crash. If the trade booking component disconnected, the trading component would also crash. If an error occurred while sending a trade to the trade booking component, again, the trading component would crash. In short, the system was designed to only run if it could book trades.
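The inverted, fail-fast relationship can be sketched in a few lines of Python. This is a hypothetical illustration, not Optiver’s code: `TradeBookingClient`, `run`, the newline-delimited byte records and the 2-second timeout are all assumptions. The point it shows is structural: there is no “waiting for booking” state and no retry loop, so any booking failure ends the process.

```python
import socket
import sys
from typing import Iterable


class TradeBookingClient:
    """Hypothetical sketch: the trading application is the client of the booking server."""

    def __init__(self, host: str, port: int, timeout: float = 2.0) -> None:
        # No retry loop and no idle "not yet connected" state: if the initial
        # connection fails, create_connection raises OSError.
        self._sock = socket.create_connection((host, port), timeout=timeout)

    def book(self, record: bytes) -> None:
        # Errors are deliberately not caught here. A broken connection surfaces
        # as an OSError (e.g. BrokenPipeError or ConnectionResetError).
        self._sock.sendall(record + b"\n")


def run(host: str, port: int, fills: Iterable[bytes]) -> None:
    """Trade only while trades can be booked; otherwise exit with a non-zero status."""
    try:
        booking = TradeBookingClient(host, port)
        for record in fills:  # stand-in for trades produced by a strategy
            booking.book(record)
    except OSError as exc:
        print(f"trade booking unavailable, shutting down: {exc}", file=sys.stderr)
        sys.exit(1)
```

One design caveat: over TCP, a send can appear to succeed after the peer has gone away, because the data has only reached the local send buffer; the failure typically surfaces on a later send. That is one reason the loss in such a design is bounded rather than zero. A production system might add acknowledgements or heartbeats to tighten that bound, but that is beyond this sketch.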

This change eliminated needless complexity, reduced the number of states to be tested, and provided guarantees about trade booking, thus improving correctness and reliability. Any potential damage due to a system error is now contained, deterministic, and made known to the operations team immediately. In the case of a system error, at most a limited number of trades can go unbooked before the system shuts down, and a human-driven process is then used to recover.

A final note: this design decision is only effective in practice because we ensure it is universally applied throughout Optiver’s systems. The key to that universal application is the subject of our next post: people.

David Kent, Chief of Staff – Technology

David is a Stanford Computer Science alum and spent several years as a developer at Amazon.com. He joined Optiver as a Software Engineering Lead in 2009 and has led many of Optiver’s software development teams. He is presently Chief of Staff for the Optiver US Technology Group.
