Transcript for:
Understanding DSP Number Formats

When we’re looking at number formats for DSP, we have to ask ourselves two questions: what’s the dynamic range of the data format used, and what’s the accuracy, or rather the degree of error, that results from calculations? The dynamic range is just the range of distinct numbers that can be represented in an n-bit number format. For fixed point representation, that’s always 2^n distinct values. The dynamic range also hints at how much headroom is available. What’s headroom? Well, headroom is the amount by which a signal can exceed the nominal limit you’ve constrained it to, while the system or number format still represents the excess faithfully, without clipping it in any way.

Let’s say, for example, we have a 16-bit fixed point format, and we’re using the entire dynamic range for representing audio sample data. That could be −2^15 to 2^15 − 1 for pure integers, or −1 to 1 for pure fractions. Of course it’s not really −1 to 1; I’ve rounded the positive end, which actually falls just short of 1. Audio signals that swing from −1 to 1 will use the entire dynamic range available within the fixed point format. So we’re saying that the nominal range for audio is −1 to 1. To find the headroom that’s available, we need to ask what happens when the audio signal exceeds this range. Well, there is no more space beyond the range. As soon as the signal dips below −1 or goes beyond 1, an overflow occurs. Previously I said an overflow would result in the value wrapping back around to the other end of the range. Strictly speaking, this is not guaranteed: the result of overflow is undefined behaviour and varies from system to system. So overflow has to be accounted for and dealt with at every step of the calculation, usually by clipping the signal, holding the values at the extreme ends of the range whenever the signal exceeds it.
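The clipping behaviour just described can be sketched in C. Here’s a minimal example, assuming Q15 (16-bit) samples; `sat_add_q15` is an illustrative name, not a standard API:

```c
#include <stdint.h>

/* Saturating add for 16-bit (Q15) samples: instead of letting the sum
 * wrap or trigger undefined behaviour, clamp it to the extremes of the
 * representable range. Widening to 32 bits first makes the overflow
 * test trivial, since a 32-bit sum of two 16-bit values cannot overflow. */
int16_t sat_add_q15(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;   /* clip at +full scale */
    if (sum < INT16_MIN) return INT16_MIN;   /* clip at -full scale */
    return (int16_t)sum;
}
```

This is exactly the “hold the signal at the extreme end of the range” strategy: the clipped result is wrong, but far less wrong than a wrapped one.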
So technically, there is no headroom when dealing with fixed point audio, and that is a problem in signal processing. A lot of the time, intermediate stages in a signal processing pipeline can push a signal above its nominal range, and gain stages afterwards bring the signal back into the nominal range for audio. In situations like these, headroom is really important so the signal can grow and expand through the intermediate stages. Without headroom, the signal is clipped in the intermediate stages whenever it grows, and when a gain stage later brings the level down, you’re just attenuating an already-clipped signal; the distortion stays. So yes, we can be diligent and check for overflows when they happen, but there is no good way to deal with an overflow, since there is no headroom to represent the values that flowed over, unless of course you can save the intermediate results in a larger register. That is pretty much what all fixed point DSP processors do: to avoid the problem, they use larger registers when performing calculations internally. As an example, Motorola’s 56k family of DSP chips is called 56k because the accumulator that holds and accumulates values during operations on 24-bit audio is 56 bits long. So it’s fair to assume there is enough dynamic range and headroom within these processors in the intermediate stages.

Let’s look at accuracy now. We know that in fixed point numbers, the difference between consecutive numbers is constant. This is perfect for addition and subtraction: adding or subtracting two fixed point numbers is perfectly accurate, with no error that can accumulate. Obviously, and I’m going to sound like a broken record now, overflow is still a concern and has to be checked, but I’ll assume for the rest of the video that this is handled. But what about division? There, not all is well and good.
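The wide-accumulator trick can be sketched in C too. This is a hypothetical Q23 dot product (the core of an FIR filter) that mimics the idea behind the 56k: a 64-bit software accumulator standing in for the 56-bit hardware one, so intermediate sums get headroom and only the final result is saturated:

```c
#include <stdint.h>
#include <stddef.h>

/* Multiply-accumulate 24-bit (Q23) samples in a 64-bit accumulator.
 * Each 24x24 product fits in 48 bits, so many products can be summed
 * before the accumulator could possibly overflow; intermediate growth
 * never clips. Saturation happens once, at the very end. */
int32_t fir_q23(const int32_t *x, const int32_t *h, size_t n)
{
    int64_t acc = 0;                      /* headroom lives here */
    for (size_t i = 0; i < n; i++)
        acc += (int64_t)x[i] * h[i];      /* 24x24 -> 48-bit product */
    acc >>= 23;                           /* renormalise Q23*Q23 back to Q23 */
    if (acc >  0x7FFFFF) acc =  0x7FFFFF; /* saturate once, on output */
    if (acc < -0x800000) acc = -0x800000;
    return (int32_t)acc;
}
```

Note that real fixed point DSPs do this in hardware, in a single cycle; the C version only illustrates the principle.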
We already saw earlier that integer division that produces a fractional component is error prone: the fractional component is discarded, and that discarded part is the error. If the numbers are large, the error relative to the number is quite small. But when the numbers themselves are small, the relative error is quite high. This is very detrimental for audio, where signal levels are often low; the relative error will be much higher in these signals, which is not ideal. If you want to quantify it, the error can be as high as 0.5 LSB if you’re rounding to the nearest integer, and as high as 1 LSB if you’re truncating, discarding the fractional part completely. Take the number 3. Dividing it by 2, we expect 1.5. But in integer representation, the fractional part is either discarded, giving 1, or rounded to the nearest value, giving 2. The difference between the expected value and the actual value is the error. Here’s a graph with the x axis representing integers from 0 to about 300, and the y axis representing the relative error when a number is divided by its adjacent number. As we can see, as the numbers approach 0, the relative error increases. The same is true on the negative axis as well. For a single calculation, this error, which could be anywhere between 0 and half an LSB, is not very significant; you could get away with inaudible error components. But when a signal goes through a calculation and is then fed back again, and this feedback loop runs many times per sample, a small error in the first step can easily accumulate and grow over time. Even though multiply-accumulate operations probably occur in larger registers, the error that can arise in recursive filter paths may be unacceptable if you need pro quality audio, or when your signal path is non-deterministic and you don’t know how many more calculation steps lie ahead.
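You can see the truncation-versus-rounding difference directly in C, where the `/` operator always truncates toward zero. Here’s a small sketch with an illustrative round-to-nearest helper (`div_round` is not a standard function):

```c
#include <stdlib.h>

/* C's integer division truncates toward zero: worst-case error is
 * just under 1 LSB. */
int div_trunc(int a, int b) { return a / b; }

/* Round-to-nearest division: worst-case error drops to 0.5 LSB.
 * Halves round away from zero. Works for negative operands too. */
int div_round(int a, int b)
{
    int q = a / b;
    int r = a % b;
    if (abs(r) * 2 >= abs(b))
        q += ((a < 0) == (b < 0)) ? 1 : -1;  /* nudge toward the true quotient */
    return q;
}
```

With the example from above: `div_trunc(3, 2)` gives 1 (error 0.5, a full half LSB of the true 1.5), while `div_round(3, 2)` gives 2. The smaller the operands, the larger that fixed absolute error is relative to the result.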
Fixed point numbers have traditionally been associated with storing digital audio, and for playback this is still the preferred format. The DSP processors in some cheaper or low-powered devices are fixed point processors, and that’s all they can handle. During playback you generally don’t perform any complex signal processing, so the fixed point format is perfectly fine and efficient. But today, in computer audio, we rarely use fixed point for signal processing; we rely primarily on floating point numbers, in either single precision or double precision form. The reasons for this are numerous, but there are also several nuances and pitfalls that make floating point a hairy format to work with. With floating point, the devil is in the details. We’ll take a deep dive into it and uncover all the advantages and drawbacks of floating point audio starting from the next video.
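That handoff between formats, fixed point for storage and playback, floating point for processing, usually happens through a pair of conversions like the following sketch (the function names and the 1/32768 scale convention are illustrative; libraries differ on whether they divide by 32768 or 32767):

```c
#include <stdint.h>

/* Map a Q15 sample onto roughly [-1.0, 1.0) for float processing. */
float q15_to_float(int16_t s)
{
    return (float)s / 32768.0f;
}

/* Map back for storage/playback. Float has headroom beyond +/-1;
 * Q15 does not, so out-of-range values must be clipped here. */
int16_t float_to_q15(float f)
{
    if (f >=  1.0f) return INT16_MAX;
    if (f <= -1.0f) return INT16_MIN;
    return (int16_t)(f * 32768.0f);
}
```

The clip in `float_to_q15` is the same story as before: the moment the signal re-enters a fixed point format, all the headroom the floating point stage enjoyed is gone.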