Digging into market data from before the Nasdaq blackout at 12:20 EDT on August 22, 2013, we came
across several periods of extremely high quote rates. When we plotted
quote rates by individual multicast line (the Tape C Securities Information Processor,
or SIP, uses 6), we discovered that each quote burst mapped directly to a single multicast line.
Furthermore, the 3 quote bursts at 11:48, 11:50 and 11:54, which came just minutes before
Nasdaq decided to shut down the SIP, each consisted of a loop of the last 50 minutes' worth of quotes.
In other words, the previous 50 minutes' worth of quotes in hundreds of stocks was blasted,
with fresh new timestamps, to millions of subscribers over a 3 second period.
To make matters worse, when these thousands of stale quotes
were resent, they were grouped by reporting exchange: the last 50 minutes of EDGE quotes were sent (all at once),
then about 5 seconds later, the last 50 minutes of BATS quotes were sent (again, all at once),
and so on for up to a dozen exchanges (the order of exchanges was inconsistent among
different stocks). Worse, during the few seconds between the batches of
stale quotes, real, current quotes were mixed in. Since each stale quote was given
a fresh new timestamp, it was impossible for any person or computer to distinguish the stale quotes from
real-time quotes. Even the SIP was confused, because with each bogus stale quote, a
new (incorrect) NBBO was calculated and sent! That means thousands of bogus NBBOs were also blasted to
millions of subscribers in many bellwether stocks, such as eBay, Microsoft, Tesla Motors and Yahoo. No wonder
Nasdaq decided to pull the plug.
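To illustrate why a stale quote with a fresh timestamp corrupts the NBBO, here is a minimal sketch. It assumes a simplified NBBO (highest bid and lowest ask across the latest quote from each exchange); the Quote class, the function names and all prices are hypothetical and are not taken from the SIP's actual implementation or from the data.

```python
from dataclasses import dataclass

@dataclass
class Quote:
    exchange: str
    bid: float
    ask: float
    timestamp: float  # seconds since midnight, as stamped on the outbound feed

def nbbo(book):
    """Return (best bid, best ask) across the latest quote from each exchange."""
    best_bid = max(q.bid for q in book.values())
    best_ask = min(q.ask for q in book.values())
    return best_bid, best_ask

def apply_quote(book, quote):
    """Replace the exchange's current quote and return the recomputed NBBO."""
    book[quote.exchange] = quote
    return nbbo(book)

book = {}
apply_quote(book, Quote("BATS", 100.00, 100.02, 42480.0))
live = apply_quote(book, Quote("EDGE", 100.01, 100.03, 42480.1))

# Hypothetical EDGE quote from much earlier, resent with a fresh timestamp.
# Nothing in the message marks it as old, so it simply replaces the live quote.
stale = apply_quote(book, Quote("EDGE", 98.50, 98.52, 42480.2))

print("NBBO before stale quote:", live)
print("NBBO after stale quote: ", stale)
```

In this made-up example the stale ask (98.52) falls below the live BATS bid (100.00), so the recomputed NBBO is crossed, and a subscriber has no field to tell it which side is wrong.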
Conclusion
One thing we can conclude: this burst of quotes had to originate with the SIP
itself. It is highly unlikely that anyone or any process external to the SIP environment
could have caused or triggered the resending of old quotes, grouped by exchange no less.
This means that neither ARCA nor anyone else external to the SIP could have caused the quote bursts
which probably led to Nasdaq deciding to shut down the SIP. Furthermore, based on comments
by officials in the press, the exchange was ready to go back online just 20
minutes later. That is not nearly enough time to assess the problem and take corrective measures
if the events were indeed triggered by an unknown, external issue. Lastly, when trading
resumed 30 minutes before the close, the severe quote updating
problems from ARCA persisted, but there were no more bursts of stale
quotes.
Update
This Reuters article strongly implies that TCP
is the inbound protocol for a contributor (an exchange, ARCA in this case) publishing (sending data) to
the SIP (run by Nasdaq):
Nasdaq believes Arca's connectivity problems ultimately led to a freeze
in the SIP, prompting the exchange to shut down the connection just before the processor
froze, they said.
Because of Arca's repeated attempts to connect to the SIP, the processor's
memory reached capacity, its servers were overwhelmed, and it was unable to revert to
backup systems.
TCP connections require "memory" (available ports actually, but close enough). Latency
sensitive, mission critical data feeds use UDP (multicast); they would never use TCP,
because it adds an inherent delay and requires a connection that could fail.
UDP doesn't require a "connection" or memory and avoids that delay, but this is a topic
for another day. The takeaway is that TCP is used, and there are limits to the number
of TCP connections.
If this is true, then the following theory fits the available facts.
A Theory
ARCA's connection to the SIP breaks, so it retries and connects for a short period of time.
That connection breaks, it connects again, and that breaks too, over and over in quick succession.
Each connection reduces the total number of available connections
(temporarily, for a few minutes), so that eventually any new connection fails. If Nasdaq is
monitoring the health of the SIP by polling over TCP, it won't be able to connect either (all connections
are exhausted) and will think the
SIP is down. But they probably see the SIP is still sending quotes from the outbound
side (which, by the way, uses UDP/multicast). The engineers get the back-up SIP ready,
but the back-up SIP doesn't know where the production SIP (the one not accepting connections)
left off, because they can't connect to it either.
The back-up SIP starts making requests to each of the 12 or so exchanges for
the last 50 or so minutes of quotes (probably from the last known feed positions recorded before
connections were exhausted).
The back-up SIP requests 50 minutes of quotes from EDGE, and transmits those, then requests and
transmits 50 minutes from BATS, and so on. Sound familiar? It should, because that is exactly the
pattern we see in the data.
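The reconnect loop at the start of this theory can be sketched as a small simulation. This is only an illustration under stated assumptions: every number below (capacity, how long a broken connection ties up a slot, retry interval) is hypothetical, and we do not know whether the real limit was ports, memory or something else.

```python
from collections import deque

def first_refused_attempt(capacity, hold_ms, retry_ms, max_ms):
    """Return the time (ms) of the first refused connection, or None.

    capacity: hypothetical number of connection slots on the receiving side
    hold_ms:  hypothetical time a broken connection keeps its slot tied up
    retry_ms: interval between reconnect attempts by the contributor
    max_ms:   how long to run the simulation
    """
    held = deque()  # release times of slots still tied up by earlier attempts
    t = 0
    while t <= max_ms:
        # Free every slot whose hold period has expired.
        while held and held[0] <= t:
            held.popleft()
        if len(held) >= capacity:
            return t  # no slot left: this attempt (or a health check) fails
        held.append(t + hold_ms)
        t += retry_ms
    return None

# 1000 slots, each tied up for 4 minutes, reconnecting 10 times per second.
fast = first_refused_attempt(1000, 240_000, 100, 600_000)
# Same limits, but reconnecting only once per second.
slow = first_refused_attempt(1000, 240_000, 1000, 600_000)
print("fast retries refused at (ms):", fast)
print("slow retries refused at (ms):", slow)
```

With these made-up numbers, rapid retries exhaust the slots after 100 seconds, while one retry per second never does, because slots are released as fast as they are consumed.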
Details
The SIP sends quotes for all Nasdaq listed symbols over 6 multicast lines: each line carries
a subset of stocks grouped by symbol (for example, multicast line #1 carries symbols between
"A" and "CDZZZZ") as shown in the following table:
Multicast Line    Symbol Start    Symbol End
1                 A               CDZZZZ
2                 CE              FDZZZZ
3                 FE              LKZZZZ
4                 LL              PBZZZZ
5                 PC              SPZZZZ
6                 SQ              ZZZZZZ
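As a minimal sketch of how the symbol ranges in the table above map a stock to its multicast line, the following assumes plain alphabetical (ASCII) comparison of uppercase symbols. The function name is ours, and symbols containing suffixes or special characters may be assigned differently in practice.

```python
import bisect

# Upper symbol bound of each multicast line, taken from the table above.
LINE_ENDS = ["CDZZZZ", "FDZZZZ", "LKZZZZ", "PBZZZZ", "SPZZZZ", "ZZZZZZ"]

def multicast_line(symbol):
    """Return the multicast line (1-6) whose symbol range contains `symbol`."""
    symbol = symbol.upper()
    # The first line whose upper bound is >= symbol is the one that carries it.
    index = bisect.bisect_left(LINE_ENDS, symbol)
    if index == len(LINE_ENDS):
        raise ValueError(f"symbol {symbol!r} is beyond the last range")
    return index + 1

for sym in ["ORLY", "EBAY", "MSFT", "TSLA", "YHOO"]:
    print(sym, "-> line", multicast_line(sym))
```

Because each line carries a contiguous alphabetical range, a burst confined to one line affects only the symbols in that range.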
The first chart below plots the number of messages for each multicast line between 10:50
and 12:10 EDT. Note there are several message surges, each of which is confined to an individual multicast
line (single color). The surge on line #6 at 10:55 (red line) is from zeroed bids and asks from
ARCA (this is detailed in another chart below). The 3 surges at 11:48 (blue), 11:50
(red) and 11:54 (green) are actually from a resending of
the previous 50 minutes' worth of quotes as if they were new quotes. Each quote had a new timestamp
and was marked as if it were real-time, which caused these quotes to update the NBBO!
We detail this for the stock ORLY in chart 2 below.
1. Tape C message count by multicast line.
2. Bids and Asks for Symbol ORLY.
Note how the darker pink and blue lines (right side) are really just duplicates of the
lighter pink and blue lines (left side). Also note how the duplicate stream of quotes (right side) was sent over a few seconds,
while the original quotes took almost a full hour. The blast was from resending about
an hour's worth of quotes as fast as possible!
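The duplication visible in the ORLY chart can be checked programmatically. The sketch below is a simplified illustration, not the method we used: it assumes quotes are available as (timestamp, exchange, bid, ask) tuples, requires an exact in-order match per exchange, and does not handle real quotes mixed into the burst. All sample values are hypothetical.

```python
from collections import defaultdict

def replay_report(quotes, burst_start):
    """Compare quotes at/after burst_start with those before it, per exchange.

    quotes: list of (timestamp_seconds, exchange, bid, ask) in feed order.
    Returns {exchange: (matched_count, original_span_s, burst_span_s)} for
    exchanges whose burst quotes repeat their most recent earlier quotes.
    """
    before = defaultdict(list)
    after = defaultdict(list)
    for ts, exch, bid, ask in quotes:
        (after if ts >= burst_start else before)[exch].append((ts, bid, ask))

    report = {}
    for exch, burst in after.items():
        original = before.get(exch, [])
        n = len(burst)
        if n < 2 or len(original) < n:
            continue
        tail = original[-n:]  # the n most recent quotes before the burst
        # Compare prices only; the timestamps differ by construction.
        if [q[1:] for q in tail] == [q[1:] for q in burst]:
            report[exch] = (n, tail[-1][0] - tail[0][0], burst[-1][0] - burst[0][0])
    return report

# Hypothetical data: three quotes spread over ~47 minutes, then repeated in 2 s.
quotes = [
    (39600, "EDGE", 100.00, 100.05),
    (41000, "EDGE", 101.00, 101.05),
    (42400, "EDGE", 102.00, 102.05),
    (42480, "EDGE", 100.00, 100.05),
    (42481, "EDGE", 101.00, 101.05),
    (42482, "EDGE", 102.00, 102.05),
]
report = replay_report(quotes, burst_start=42480)
print(report)
```

A large gap between the original span and the burst span for the same sequence of prices is the signature of a replay rather than genuine quoting activity.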
3. The stack of charts below shows ORLY and examples of other symbols with the
same quote blast loop.
Note that the NBBO was affected by these quote blasts - software processing these quotes
would think the quotes were new. And the wide price range (the last hour's range) would
be sure to cause mass confusion.