In our previous post, we described how CME’s iLink upgrade would require us to rethink our trading systems. This shuffling of design priorities would force us to embrace technology new to Optiver and reconsider our previous priorities. To survive this change, we would have to innovate. Technological innovation is often associated with places like Xerox PARC, Microsoft Research, and Google X: unbridled creativity manifested in exciting visions of the future. But at Optiver we aim for a different type of innovation.
We have no specialized R&D team at Optiver. We have only dipped our toes into hot, “new” technologies like Machine Learning and Cryptocurrency. Rather, we view innovation as a natural outcome of a problem-driven engineering process. We try to deeply understand and embrace the constraints of the problem at hand and then apply a disciplined, rigorous, scientific process to solve it. We aim for best-in-class technology used every day by real traders on real exchanges to improve the market. Through the years we have seen that:
Constraints + Discipline = Innovation + Flexibility
Constraints + Discipline
By constraints we mean, in a nutshell, an exhaustive understanding of the problem. We aim to know everything from trading requirements to physical limitations to architectural guidelines. We let the problem guide our decision-making rather than the latest technical fads. As we examined the CME’s changes, some of the constraints we looked into were:
- Trading Constraints: What were the most important signals from the market which indicated we needed to change prices or respond to an opportunity? How did the CME’s different “matching algorithms” affect those signals and the latency envelope in which we needed to respond?
- Physical Constraints: Given a 10 Gb/s line to the exchange, what is the latency impact of each meter of cable? How many ports are on our switches? What is the distance between the switch and the servers in our rack? How short can we make a fiber optic cable before it can no longer bend enough to connect a server to a switch?
- Architectural Constraints: How fast can our current system update “theoretical prices” (our opinion of the current fair trading price for an option)? Why? Can we trade accuracy for speed? How much speed do we need?
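As a rough worked example for the cable-length question in the physical constraints above: a signal in optical fiber travels at the speed of light divided by the refractive index of the glass. The sketch below assumes an index of 1.47, a typical textbook value for single-mode fiber rather than a measured one, and the function name `fiber_delay_ns` is purely illustrative.

```python
# Rough one-way propagation delay of light in optical fiber.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the meter
ASSUMED_REFRACTIVE_INDEX = 1.47       # assumption: typical single-mode fiber


def fiber_delay_ns(length_m: float,
                   refractive_index: float = ASSUMED_REFRACTIVE_INDEX) -> float:
    """Return the one-way propagation delay in nanoseconds."""
    speed_in_fiber = SPEED_OF_LIGHT_M_PER_S / refractive_index
    return length_m / speed_in_fiber * 1e9


if __name__ == "__main__":
    # Print the delay for a few illustrative cable lengths.
    for length in (0.5, 1.0, 3.0):
        print(f"{length:>4} m -> {fiber_delay_ns(length):.2f} ns")
```

Under that assumption, each meter of fiber adds a little under 5 nanoseconds one way, which is why cable length earns a place on the list of constraints.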
We also embraced constraints not imposed by the problem itself. Some were self-imposed. For example, at the edge of our systems we have a number of “perimeter limits”, which are safety checks that occur just before sending a message to the exchange. These ensure that our system is behaving as we expect. Our CTO, Pierre, challenged the designers of the system to boil down these checks to a single counter in hardware, rather than a counter for every limit.
A self-imposed constraint like that can often simplify decision-making. By starting from a single counter, we were forced to prove we needed more. We found that one counter was possible, but two resulted in a far less complicated system, and still left the risk-checking system much simpler than one with a dozen or more counters.
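To make the idea of a counter-based perimeter check concrete, here is a minimal software sketch. It is not the actual hardware design. The post does not say what the two counters track, so the choice here (messages sent and total order quantity), along with every name in the code, is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """A running total with a hard upper limit."""
    limit: int
    value: int = 0


class PerimeterGate:
    """Hypothetical last-line safety check run just before a message is sent."""

    def __init__(self, message_limit: int, quantity_limit: int) -> None:
        # Assumption: exactly two counters. What they count is illustrative.
        self.messages = Counter(message_limit)
        self.quantity = Counter(quantity_limit)

    def allow(self, order_quantity: int) -> bool:
        """Approve the message only if neither limit would be exceeded.

        The counters are updated only when the message is approved, so a
        rejected message leaves the gate's state unchanged.
        """
        if self.messages.value + 1 > self.messages.limit:
            return False
        if self.quantity.value + order_quantity > self.quantity.limit:
            return False
        self.messages.value += 1
        self.quantity.value += order_quantity
        return True
```

With only two counters, the whole check is two comparisons and two additions, which gives a feel for why a small number of counters keeps the risk logic easy to reason about.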
In other cases, we realized we were needlessly butting our heads against self-imposed constraints that had nothing to do with the problem. One was our decision to implement the system using C++, a language we use quite extensively. We quickly realized that, because of the importance of the latency constraint, we were having to get very smart – too smart – about nuanced features of modern C++. For us, that’s a red flag. It was clear that the language itself was imposing constraints on our thinking, distracting us from the problem at hand. We realized we could eliminate all those worries simply by working in C. Instantly our focus shifted back to the real problem and the constraints it imposed on us – the intrinsic constraints.
By discipline we mean the rigorous, scientific process we apply once we understand our problem. We come up with a potential solution and a theory about how we expect that solution to behave in the market. We carefully construct that solution, verify its behavior with our traders, and then release it and carefully analyze the results. Based on the comparison of those results to our expectations, we then decide on the next step of iteration and repeat the entire process. At every step we let results evolve our understanding of the problem.
For this project, some areas of focus were:
- The first release of our FPGA system did almost nothing. It connected to the exchange and served as a TCP pass-through. This allowed us to validate, test, and harden our deployment, configuration, and operational processes before worrying about trading strategy nuances.
- We tightly defined expectations of the latency of our new system and monitored it closely to ensure it met those expectations.
- We built our new system to initially handle only the most profitable subset of signals and responses. This allowed us to focus on a small slice of the problem.
- We put in place a number of metrics and audits and watched the performance and behavior of our new system versus our old system.
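The latency point in the list above can be sketched as a simple check of measured samples against a stated expectation. This is an illustrative sketch only: the post does not describe how latency was actually measured or judged, so the nearest-rank percentile, the 99th-percentile default, and the names `percentile` and `meets_expectation` are all assumptions.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_expectation(samples_ns: Sequence[float],
                      budget_ns: float,
                      pct: float = 99.0) -> bool:
    """Return True if the chosen percentile of latency is within budget."""
    return percentile(samples_ns, pct) <= budget_ns
```

Checking a high percentile rather than the mean keeps occasional slow responses visible instead of letting them average away.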
Innovation + Flexibility
As a technology group in a trading company, our goal is to solve trading problems. We only want innovation that facilitates this goal. We are not unique in aiming for this type of innovation. The very first touch screen was viewed as wildly innovative. So was the first cell phone with a vague semblance of “internet” connectivity. While these were inventive and innovative in a sense, the market-impacting innovation came as the result of creating a product: the iPhone. For the first time a group of engineers and designers figured out how to combine a usable and compelling touch screen with connectivity to the full internet in a palm-sized device with all-day battery life. To top it off, everyone from my two-year-old son to my ninety-year-old grandma could use it with no explanation.
With the upgrade to iLink, we found that a number of business problems were immediately solved. Some improvements we saw upon release of iLink 2.0 were:
- Our most profitable and latency-sensitive trading strategies were able to leverage FPGA technology seamlessly.
- Our standard metrics for evaluating system performance noticeably improved.
- New strategic avenues were opened to trading. The increased determinism gave our traders the ability to pursue strategies which were previously impossible to execute profitably.
- Nailing our deployment process enabled us to rapidly and reliably update our FPGA-based trading systems.
The final two bullet points touch on flexibility. Our hope is that our technical solutions, in addition to solving trading problems, increase our flexibility. The software itself is not highly configurable. But we see a flexible system as one that is easily modifiable to meet emerging needs. We embrace the “soft” aspect of software and the “field-programmable” aspect of FPGAs and build systems which are specific enough that we can rapidly change the code and hardware itself, test it, and deploy it in a fast and iterative way.
Conventional wisdom holds that FPGA development timelines are twice as long as software timelines. In practice we have seen the opposite: our software engineers are often the ones racing to catch up to the progress of our hardware engineers. A key reason is that we embrace the inherent constraints of our problems, technology, and infrastructure, and are disciplined in the engineering process we apply along the way.
David Kent, Chief of Staff – Technology
David is a Stanford Computer Science alum and spent several years as a developer at Amazon.com. He joined Optiver as a Software Engineering Lead in 2009 and has led many of Optiver’s software development teams. He is presently Chief of Staff for the Optiver US Technology Group.