Software Reliability at Optiver

October 19, 2018

Our next series of posts is drawn from a document written by our CTO, Pierre Salverda, three years ago: Software Reliability at Optiver: Guiding Principles. We introduce this series with an examination of the problem.

Errors are inherent in the building of software systems. Therefore, a strategy for mitigating the risk of error is paramount. As we mentioned in our first post, Optiver is a proprietary trading firm operating in a domain characterized by intense competition among technically sophisticated firms. In building and operating trading systems, we primarily encounter two broad classes of risk:

1. Automated Trading Risk (ATR): This is the risk that our Automated Trading Systems malfunction in a way which results in ongoing financial loss to Optiver or even market disruption more broadly. The former case is worrisome for obvious reasons. The latter exposes Optiver to reputational damage and possibly even regulatory sanctions, and it erodes public faith in the markets as a whole.

2. Erosion of Trust: Collaboration between Optiver’s trading and technology groups is key to our long-term success. We trade confidently when our traders trust the system to behave as they expect. Both groups are excited to tackle new problems and release modifications to our system given a track record of rapid, solid releases. System errors erode this foundation of trust, resulting in nervous trading and fractured collaboration.

It is overly simplistic, and counterproductive, to make a complete absence of errors the goal. Development productivity can grind to a halt in pursuit of perfection, and an unproductive technology group can erode trust in much the same way as a sloppy one. For this reason, we must have a nuanced understanding of what is acceptable to Optiver and what is not. Disrupting the market is completely unacceptable, and we must guard against it as much as possible. Large financial loss is also unacceptable, as a catastrophic error could lead to the end of Optiver. Minor financial loss, on the other hand, is the beginning of a gray area: while regrettable, it may be acceptable in pursuit of a corresponding longer-term or expected-value gain. Productivity loss, while also regrettable, is often preferable to the “analysis paralysis” which typically accompanies attempts at perfection.

In the coming weeks we will examine six principles which form the foundation of our approach to software reliability at Optiver: 

1. Not all errors are equal. The extent to which errors in different parts of the infrastructure can expose the firm to real risk is highly varied.

2. Software testing is just part of a reliability strategy. The correctness problem is best tackled on many fronts, the most important of which is system design.

3. Design correctness into software rather than test errors out. System design and correctness are inextricably linked – because design determines complexity, and complexity induces errors.

4. Architecture underpins correctness. It is often easier to contain the damage caused by errors than it is to avoid the errors outright.

5. The software development team is responsible for reliability. Reliability is best achieved by injecting these concerns directly into the software development process itself, not by establishing software verification as an isolated step at the end of the development cycle.

6. Culture matters. Everyone must understand and care about continuous improvement of software reliability.

Each week we will present a pair of these principles as excerpts from that document, coupled with a discussion of how we have put them into practice. Finally, we will close this series with an examination of Test Driven Development by Joseph Fourness, the lead of our Architecture team here in Chicago.

David Kent, Chief of Staff – Technology

David is a Stanford Computer Science alum and spent several years as a developer at Amazon.com. He joined Optiver as a Software Engineering Lead in 2009 and has led many of Optiver’s software development teams. He is presently Chief of Staff for the Optiver US Technology Group.