Why custom number systems?
Historically, real-time systems have used custom number systems to satisfy latency or resource constraints. For example, data acquisition systems for high-performance instrumentation, satellite, and wireless communication systems have used specialized fixed-point numbers to optimize power efficiency while still meeting real-time requirements.
As computing generalized to minicomputers and personal computers, the IEEE-754 floating-point standard evolved into a robust number system that offered more flexibility and portability than custom fixed-point systems, addressing the growing needs of computational science and engineering software. Careful consideration of precision and dynamic range produced a number system that was useful across a broad range of use cases.
However, IEEE-754 has a couple of pitfalls that high-performance applications have tripped over consistently through the years. Because every floating-point operation rounds its result, the arithmetic is neither associative nor distributive, which creates problems for parallel execution. Parallel schedules tend to be dynamic, so the order of the rounding operations becomes dynamic as well, and computational results are no longer reproducible from run to run.
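The effect of per-operation rounding can be seen in a few lines. The following is a minimal sketch, assuming Python floats are IEEE-754 double precision (true on mainstream platforms); the helper `sum_left_to_right` is a hypothetical name introduced only for this illustration, and the two orderings stand in for two schedules of a parallel reduction.

```python
# Assumption: Python's float is IEEE-754 binary64 (double precision).

def sum_left_to_right(values):
    """Accumulate in list order; every addition rounds its result."""
    total = 0.0
    for v in values:
        total += v
    return total

# Grouping changes the result, so addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
grouping_differs = left != right

# The same three terms visited in two different orders.
order_a = sum_left_to_right([1e16, 1.0, -1e16])  # 1.0 is absorbed by 1e16
order_b = sum_left_to_right([1e16, -1e16, 1.0])  # cancellation happens first

print("grouping changes the sum:", grouping_differs)
print("order a:", order_a)
print("order b:", order_b)
```

Both orderings add exactly the same values, yet one yields 0.0 and the other 1.0. A runtime that assigns work to threads dynamically can switch between such orderings from one run to the next.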
More egregiously, the limited number of formats has caused serious performance problems for a key modern application: deep learning. The energy consumption and memory bandwidth of single-precision floating-point were limiting factors in inference applications, and over the past decade the large AI efforts have moved away from the standard IEEE-754 formats in favor of new, custom floating-point formats such as Google's bfloat16 and NVIDIA's TensorFloat-32 (TF32).
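To make the trade-off concrete: bfloat16 keeps the sign bit and the 8 exponent bits of single precision but only 7 fraction bits, so it corresponds to the upper 16 bits of a float32 encoding. The sketch below emulates that with the standard `struct` module. It truncates the low bits for simplicity, whereas real implementations typically round to nearest, and the function names are hypothetical helpers, not part of any library.

```python
import math
import struct

def float32_bits(x):
    """Return the 32-bit IEEE-754 single-precision encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float32(bits):
    """Decode a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def to_bfloat16_truncate(x):
    """Emulate bfloat16 by keeping only the upper 16 bits of float32.

    Simplification: truncation instead of round-to-nearest.
    """
    return bits_to_float32(float32_bits(x) & 0xFFFF0000)

pi_bf16 = to_bfloat16_truncate(math.pi)  # precision drops to about 2-3 digits
big_bf16 = to_bfloat16_truncate(1e38)    # float32's dynamic range is retained

print("pi as bfloat16:", pi_bf16)
print("1e38 as bfloat16:", big_bf16)
```

The format gives up precision but keeps the dynamic range of single precision while halving the storage and memory traffic per value.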
For energy-constrained systems, ranging from autonomous vehicles and robots to edge and cloud infrastructure and even supercomputers, redesigning applications around mixed-precision algorithms can offer a 100x performance benefit. Such algorithms select, for each computational stage, a number system tailored to that stage's precision and dynamic range requirements.
The Universal Number Library was designed to give application and algorithm designers a high-performance, productive platform for pursuing these efficiencies.