Introduction
Bandwidth
Packet Size
Speed
Comparison to WinZip
Conclusion
Press Release
Contact
Sales
Jerry
Chandler
800-652-2291
402-255-8229
NxCoreSales@dtn.com
Name
|
Bytes |
Kilobyte |
1,000
|
Megabyte |
1,000,000
|
Gigabyte |
1,000,000,000
|
Time
Unit
|
Per second
|
Millisecond
|
1,000
|
Microsecond
|
1,000,000
|
Nanosecond
|
1,000,000,000
|
|
Introduction
One day of data from a direct OPRA feed, October 18,
2006, was compressed with nanex compression
technology and analysed in
detail using a PowerEdge 1750 with dual 3.2GHz Xeon processors, 2 GB of
RAM and Fujitsu 146GB SCSI hard disks on a 533MHz front side bus.
Results are shown for the time period 9:30am to 4:00pm which is the
open and closing of trading. Data before and after this period is not
shown so that the most important time periods can be scaled to show
important details. In general, the off-hours period has similar
characteristics, but at much much lower and negligible values.
Compression Bandwidth
Figure 1 below illustrates the incredible size reduction
achieved by nanex compression.
Note how the uncompressed OPRA data peaks in the 90mbps range,
while the nanex compressed data is easily contained in 5 mbps. A standard T-1
Telco circuit is 1.5 mbps. Typical cable modems are capable of between
2 and 6 mbps.
Figure 1.
OPRA Direct Bandwidth with and without nanex compression in 1 second intervals. |
Figure
2 below shows the same data in Figure 1, but plotted
on a logarithmic scale
to show compression details. Note how closely the compressed stream
tracks the source data even though the data from opening rotation
differs significantly from data after that period. This is due to
the fact that nanex
compression automatically and quickly adapts to characteristic changes in the input
stream; an important
feature for real-time financial data compression.
Figure 2.
Same data from Figure 1, but plotted in Logarithmic scale for
comparison details. |
Figure
3 below shows the Compression Ratio which is simply the ratio of the
two lines in Figure 1. The compression ratio plotted over time
clearly illustrates the compression engine's ability to adapt to changing market
conditions. The inverse
of the
compression ratio subtracted from 1.0 and expressed as a percentage is
another common way to describe the amount of compression. A Ratio of 20 to 1, shown simply as 20 in the
graph below, would become 95% (1.0 - 1/20)
Figure
3.
The ratio of uncompressed message size to compressed message size |
Packet Size
The majority of OPRA packets during trading hours are 991
bytes in length. The second most common size is 941 bytes. The
distribution of packets sizes with significant occurrences is shown in
Figure 4 below. In this chart, note that there were approximately
42 million OPRA packets with a length of 991 bytes. Each OPRA packet
contains multiple messages -- typically 14 to 15 quotes. A regular
quote update is 65 bytes. If the NBBO (National Best Bid/Offer) changes and
that information is not part of the quote message, there will be one or
two 16 byte appendages for a total message size of 81 or 97 bytes
respectively. Because OPRA packets cannot exceed 1000 bytes by
definition, and messages are never split between packets, there is
always some wasted transport space. As a general rule of thumb, the
average OPRA quote message size is 66 bytes.
Figure 4.
Distribution of OPRA Direct Packet Sizes |
The distribution of nanex compressed
packet sizes is shown in Figure 5 below. The
most common size is 52 bytes. The very smooth bell shaped distribution
is a characteristic of a well balanced compression algorithm.
Figure 5.
Distribution of nanex compressed
OPRA Direct packet sizes |
Compression Speed
Pay close attention to the units of time chosen to describe latency or
the speed of a critical algorithm. For real-time financial latency
measurements, the term "milliseconds" should have disappeared years
ago, yet it's common to find articles today describing "low latency" in
terms of milliseconds, or worse the term "sub-second" is used.
In one millisecond, there are 1,000 microseconds or 1 million nanoseconds. A 1 GHz machine has an
instruction clock that ticks once per nanosecond or 1 million times per
millisecond. In terms of cpu processing, a millisecond is a very, very
long period of time and many software instructions (a few hundred
thousand) can be executed during this interval.
Notes from the developer of nanex:
Speed was the most important feature during the development of the [nanex] compression algorithm. It was expected
that there would be the usual trade-off between size and speed, that
is,
you can have a fast algorithm but with average compression -- or you
can have great compression but at slow speeds. During the first few
thousand hours of developement, this was indeed the case.
But then a
point came, around the 32nd major code branch, where both speed and
compression started improving together almost in a lockstep fashion.
When the compression first exceeded 90% [10 to 1], and the speed was still well
within acceptable levels, there was considerable skepticism and a
strong belief that something was wrong: either the compression was
missing something, or the tools measuring the results were flawed (both
of which had occurred many years prior). Even
after verifying the results several times using different methods, with
the algorithm continuing to improve to the point of dropping the output
size almost in half yet again, and speed even faster than before, I did
not accept the results completely.
It was only after months of
consistent use and testing from that point, that I came to accept it
for what it was -- an algorithm that produced results far beyond what I
ever imagined or hoped for -- to the extent that I became
comfortable using the word "magic" when describing it.
Eric Scott Hunsader
July 2003
|
The time to
compress one OPRA packet is a remarkably steady 12 microseconds. The chart in
figure 6 shows the average compression time per packet in 1 second
intervals. A typical OPRA packet contains 15 quote messages.
Figure 6.
Average time in microseconds to compress one OPRA packet at once second
intervals
|
Nanex compression can be used at
the message level (it does not need to read the entire packet to begin
compressing). However, since most existing software will processes
OPRA data at the packet level, the analysis for this paper focused on
the packet level for easier comparison and understanding. Nonetheless,
figure 7 is presented below, showing the time spent compressing on a
message level. Note the scale is in nanoseconds.
Figure 7. Average
time in nanoseconds to compress 1 OPRA message at 1 second
intervals |
Figure 8 below, compares cpu times other common operations for
reference and context. All operations were compiled for maximum
performance and executed on the same hardware. The exception is the
entry for the Interrupt response time of the newly released SLERT, Suse
Linux Realtime, product which is included for context. Note that nanex
compression time is a little over two inline memory copy operations.
Figure 8. Timings of common operations on a 991 byte OPRA packet. |
Figure 9 shows the distribution of compression timings (in
microseconds) for one day of OPRA packets. Note the surprisingly narrow
range of timing results which is a rare and unexpected characteristic
of compression algorithms in general. Nanex
compression time is remarkably predictable: an
invaluable benefit for time sensitive tasks. For
example, a task that accumulates multiple OPRA streams into one could
better determine whether there was enough time to compress and include
another packet in the outbound stream or not.
Figure 9.
Distribution of time to compress one OPRA Packet. |
Another task that benefits from
consistent and predictable compression
processing time is the determination of the maximum compression rate
for a given processor. Figure 10, shows the estimated maximum
processing rate in messages per second compared to the actual messages
per second on October 18, 2006. The estimated maximum sustained rate is
over 6 times the actual peak rate.
Figure 10.
Estimated maximum sustained processing rate in messages per second
on a 3.2 GHz Xeon processor (one thread).
|
Figure
11 presents a pie chart view of the distribution of
packet compression times. Note that 99.9% of packets are
compressed in 20 microseconds or less, and 25% of packets are
compressed in less than 10 microseconds.
Figure 11.
Pie chart showing the distribution of packet compression times for the
first 1 million OPRA packets on October 18, 2006.
|
As expected, the time to uncompress a packet is
much shorter than the time to compress it. Furthermore, uncompression
time is even more constant and predictable than compression time.
Figure 12 shows the distribution of uncompression times for one trading
day. The most frequent uncompression time is just 7 microseconds.
The uncompression time would smaller by 2 microseconds or more if the
packets were uncompressed into a more efficient and ready to use binary
format rather than the ASCII format used in the OPRA specification.
Figure 12. Distribution of time to uncompress one OPRA packet |
Comparison to WinZip
Compression
algorithms used by programs such as Winzip, zlib,
and gzip, are not written for real-time streaming applications. They
require a large "window" or sample of data before compression begins.
Having a large sample of data to scan for redundancy is a significant
advantage
to compression so it's not really a fair comparison. Nonetheless,
Figure 13 shows the results of compressing all the OPRA packets as if
it were a single file compared to nanex compression. The raw OPRA Direct packets would require
a file over 83 gigabytes. Winzip's compresses the file to just under 20
gigabytes (20 billion bytes), yielding a ratio of just over 4 to 1.
Nanex compression produces a file that is just 4.4 gigabytes, a ratio
of 19 to 1.
Figure 13.
Comparison to Winzip in size (bytes). Smaller is better.
|
Figure 14 shows the comparison of compression times between
WinZip and nanex compression on a day's worth of OPRA packets in a
file. Not only is nanex compression 4 times smaller than WinZip, it's
also nearly 4 times faster.
Figure 14.
Comparison to Winzip in time (minutes). Smaller is better.
|
Figure 15.
Comparison of archived packet file sizes.
|
Conclusion
Nanex
compression produces very small packet sizes quickly and consistently.
It can easily handle OPRA's rapid growth in message update rates. With
it's ability to automatically adapt to changing input data, it is
likely to continue to run for many years without maintenance or changes
to the algorithm.
|