
Software Reliability at Optiver: People

A “perfect” process and pithy principles only go so far. The pursuit of software reliability must be lived and breathed by the entire organization, especially when the work is as complex as software engineering and the environment as dynamic as financial technology. It cannot be encoded in a strict process or delegated to a single team.

The software development team is responsible for reliability

Assuring the reliability of software is best achieved by injecting reliability concerns directly into the software development process itself, not by establishing software verification as an isolated step at the end of the development cycle. This is because software reliability and system design are inextricably linked; decoupling the two fundamentally undermines the former. As a result, we believe that assuring software reliability is the responsibility of the development team, not that of a distinct person, team or role.

Creating a distinct Quality Assurance or Software Tester role (or team) devalues the contribution that developers can make. They are the people best placed to understand what can go wrong with a system and how errors might arise.

Distinguishing development from testing roles also undermines the importance the former group ought to place on correctness. This runs contrary to one of an engineer’s core obligations, which is to ensure the system behaves correctly. It also opens the door to the evolution of designs that are simultaneously error-prone and less amenable to rigorous testing.

One of the more common reasons for establishing a separate testing team or testing role is that a second, independent person thinking about and checking a system can improve the chances of finding problems. While we acknowledge that involving more than one person can improve software quality, we feel this is best achieved at the design and build stage, by means of intensive code and design review from other developers.

Culture matters 

We depend critically on a team process and an office culture that understands and facilitates continued improvement on the software reliability front. This applies not just to the development team, but to other stakeholders such as traders, operations engineers, and other users of our software. Those stakeholders need to understand the importance of software reliability concerns, and know those concerns are an integral part of the development process. Indeed, they need to encourage and proactively support software engineers in this area.

In our day-to-day work we try to record every unexpected occurrence in our environment as a “Production Event”. These run the gamut from a bad keyboard to a major outage of our trading system. We dive into each of these incidents and try to understand the root cause. We consciously avoid chalking events up to bad luck, a perfect storm, or user error.
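As a rough illustration of the idea, a Production Event record might be sketched as below. This is a minimal sketch under assumptions, not a description of Optiver's actual tooling: the `ProductionEvent` class, its fields, the `NON_CAUSES` set, and the sample event are all hypothetical. The only point it makes is that an event stays open until someone records a root cause that is not one of the explanations the team avoids.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Explanations the team consciously avoids accepting as a root cause.
NON_CAUSES = {"bad luck", "perfect storm", "user error"}


@dataclass
class ProductionEvent:
    """Hypothetical record of one unexpected occurrence, large or small."""
    summary: str
    reported_by: str
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    root_cause: Optional[str] = None

    def set_root_cause(self, cause: str) -> None:
        """Record a root cause, rejecting empty or hand-waving explanations."""
        normalized = cause.strip().lower()
        if not normalized or normalized in NON_CAUSES:
            raise ValueError(f"{cause!r} is not an acceptable root cause")
        self.root_cause = cause.strip()

    @property
    def resolved(self) -> bool:
        """An event counts as resolved only once a root cause is recorded."""
        return self.root_cause is not None


# Illustrative data only: a small event is recorded just like a large one.
event = ProductionEvent(summary="Keyboard stopped responding",
                        reported_by="trader")

rejected = None
try:
    event.set_root_cause("Bad luck")  # refused: not a real explanation
except ValueError as exc:
    rejected = str(exc)

event.set_root_cause("Damaged cable at the desk")  # illustrative root cause
```

Rejecting the easy explanations in code is only a stand-in for the cultural habit described here; the real work is the investigation behind each entry.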

A key element of this process is that every person involved must have high expectations for our system’s performance and the operational excellence of all who use, run, and maintain it.

As an example, one of our operations engineers was recently surprised when a normally benign user error caused more damage than expected. She dug into the full timeline of events in our system and realized this user error was only possible because of a design flaw. She suggested a design change to our software engineers and asked probing questions about its implications. In response, our software engineers researched the answers to her questions and found her proposal was a positive step forward. It would make our system safer with no functional downside, so they went ahead and implemented it.

A culture of correctness must permeate the entire organization for this approach to software reliability to succeed. In this case, the operations engineer thought about how the system should work, realized it was more broken than it appeared, proposed a change, and pushed the software engineers to respond. The software engineers recognized that good system design ideas and questions can come from anywhere within the organization, respected the opinion of their colleague, and changed the system accordingly.

David Kent, Chief of Staff – Technology

David is a Stanford Computer Science alum and spent several years as a developer at Amazon.com. He joined Optiver as a Software Engineering Lead in 2009 and has led many of Optiver’s software development teams. He is presently Chief of Staff for the Optiver US Technology Group.
