To simplify our understanding of the conversion from analog to digital signals, we'll start with the simplest periodic analog signal, the humble sine wave. We had previously derived the sine wave function, and it's represented by this equation. The equation itself is not really important in this context. What is important is that the sine wave is, in essence, a one-dimensional function. Though I may have drawn the sine wave as a squiggle on a two-dimensional plane, it's actually a single value at any given instant, if you don't count time as a dimension of course. As time progresses, the sine wave takes on a different value. How many unique values can it have? Infinitely many, of course. The sine wave is said to be a continuous function over time, meaning that no matter how close two instants in time are, you wouldn't find a single break in the signal, nor any loss of resolution, nor staggering of the shape of the wave. The signal is continuous.

A clear advantage of digital signals over analog signals is in their storage capabilities. How, then, are we supposed to store a signal which has infinitely many points? All the hard drives in the world could not accommodate the number of unique and distinct values that this one sine wave can produce. Obviously, for sanity's sake, we wouldn't want to store all the values, only enough values to closely resemble the wave. So what we want to do is break apart the wave and make it discontinuous, producing values only at certain points in time. To convert it into a discrete signal. The process of transforming an analog signal to a digital one begins here, by asking the question: how often do we want to measure the signal? Or, how many samples do we want within a given interval of time? In this example, we have 30 samples within a 1 second interval of this 1Hz sine wave. We can also say that we have sampled this wave roughly every 0.033 seconds. This seems like a fairly accurate representation of the sine wave, right? We could also go lower, sampling it 20 times a second, or 8, or even 2, or go much higher, say 44100 times a second. What's the right number? How many samples are enough to accurately represent this wave?

Thankfully, it's not as arbitrary as you might think. Continuous-to-discrete signal conversion is governed by the Nyquist-Shannon sampling theorem, which states that any band-limited continuous-time signal can be accurately converted to and from a digital signal when sampled at a rate at least twice as high as the highest frequency component of the waveform. Sounds complicated, but let's break it down. The sampling theorem states two important facts. 1) To represent an analog signal in the digital domain, the number of samples we take for each second of the signal must be more than twice the highest frequency present in the signal. If a signal has a 10kHz frequency component, the sampling rate needs to be more than 20kHz. 2) The analog signal needs to be band-limited to that highest frequency and cannot contain any frequencies above it. Let's put a pin in the second point and just talk about the first. In our case, we had a 1Hz sine wave. It's a constant 1Hz wave, and 1Hz is the highest frequency this wave will ever produce. The sampling theorem states that, to digitally represent this wave, we need more than 2 sample points per second. Let's say 3. Any more is fine as well, but the minimum required is 3.
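To make the idea of sampling concrete, here's a minimal sketch in Python with NumPy. It's my own illustration, not part of the video's toolchain, and it simply evaluates a 1Hz sine at two of the rates mentioned above: 30 samples per second and the bare-minimum 3 samples per second.

```python
import numpy as np

def sample_sine(freq_hz, sample_rate_hz, duration_s=1.0):
    """Return the sample instants and sampled values of a sine wave."""
    t = np.arange(0.0, duration_s, 1.0 / sample_rate_hz)  # the sample instants
    return t, np.sin(2.0 * np.pi * freq_hz * t)

# 30 samples per second: looks like a smooth sine when plotted.
t30, x30 = sample_sine(1.0, 30.0)

# 3 samples per second: the minimum allowed for a 1 Hz wave, since the
# sample rate must exceed twice the highest frequency (2 Hz).
t3, x3 = sample_sine(1.0, 3.0)

print(len(x30), len(x3))  # 30 samples versus 3 samples for the same one second
```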
We can say that the sample rate is 3Hz. Hertz is an overloaded term here, not to be confused with the frequency of the wave itself. The reason that both cycles per second (the frequency) and samples per second (the sample rate) share the derived unit of Hertz is that they both reduce to the same basic unit of 1 over seconds. This is how a 1Hz wave could be sampled using 3 sample points. You might wonder at this point whether 3 points are really enough to represent this wave without losing a lot of resolution. I mean, if this sampled waveform had to be converted back into an analog signal, there would be room for interpretation, right? The signal could be rendered in several different ways. Surely if the 1Hz wave were sampled at, say, a 40Hz sampling rate, there would be less room for divergence, and you would get a more accurate representation, right? Well, the short answer is no! This is where a lot of the misconception that a higher sampling rate equals better audio quality comes from. The absolute beauty of the sampling theorem is that it proves mathematically that both the 3Hz-sampled waveform and the 40Hz-sampled waveform of a 1Hz wave, when converted back into analog form, don't just produce the same output signal, but produce a signal which is indistinguishable from the original analog input signal before conversion. The input and output signals are essentially carbon copies of each other, with no loss in resolution.

This seems really strange, because you are essentially stripping out information and discarding it when converting to a digital signal, so when you convert back, how is the missing information magically reappearing? Well, this is where the second point comes into effect: the signal needs to be band-limited. This means that an input signal going into the conversion process is stripped of all frequency components above the maximum of 1Hz, and the digital signal, once reconverted, is again stripped of all frequency components above the maximum of 1Hz. This can be accomplished by using low pass filters. A theoretical low pass filter would only allow the passage of frequencies lower than a set threshold and would block all frequencies above that threshold. I say theoretical here, since practical filters can't simply cut off abruptly at a certain frequency but rather ramp down smoothly. More about this later.

But let's now look at a more audible example. I'm going to use Audacity to demonstrate some of the examples here. If you haven't heard of Audacity before, it's a free and open source digital audio editor, and you can download it from the link below. I'm going to change the project sample rate to a low value: 8000Hz. According to the sampling theorem, we can only hope to represent frequencies below half that rate, so we can represent frequencies up to 4000Hz and no more. Let's try to generate a sine wave. We'll go to Generate and Tone, and then we can choose a frequency here. You can see that Audacity will not let me generate a tone higher than 4000Hz. We'll bypass this failsafe later and see what happens when we choose a frequency over half the sampling rate, but for now we'll choose something more sensible, say about 1kHz. Sounds like 1kHz. We can quickly select a bit of the audio signal, go to Analyze and Plot Spectrum, and we can indeed see that it is a 1kHz pure tone. I'll admit, this is a bit of a cheat.
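If you'd rather poke at this without Audacity, the same experiment fits in a few lines of Python. This is my own sketch, and it suffers from exactly the same cheat I'm about to describe, since the "tone" is born digital rather than recorded from an analog source.

```python
import numpy as np

fs = 8000                                    # project sample rate from the example
t = np.arange(0.0, 1.0, 1.0 / fs)            # one second of sample instants
tone = np.sin(2 * np.pi * 1000 * t)          # the generated 1 kHz tone

# A quick spectrum check, roughly equivalent to Analyze -> Plot Spectrum.
spectrum = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(len(tone), 1.0 / fs)
print(freqs[np.argmax(spectrum)])            # 1000.0: a single peak at 1 kHz
```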
It's a cheat because we are not generating an analog 1kHz signal and digitizing it, but rather building the sine wave in the digital domain and then sampling it. But it's fair to assume that the end product is the same, and without a physical analog signal generator sitting beside me at the moment, this will do fine. Now, let's explore the fringe of possible frequencies: 4000Hz, exactly half the sampling rate. Audacity lets us do this, but where's the signal? You can play it, and you can plot it, but the signal is nulled out; nothing exists. This is easy to illustrate. Let's take a 4kHz signal and sample it only twice a cycle. This is the current situation: all sampled points unfortunately sit at the zero-valued points of the signal, so naturally this is interpreted as no signal at all. But increase the phase offset of the signal by a bit, and suddenly the wave comes to life. This is why the sampling theorem says that all frequencies lower than half the sampling rate can be represented accurately, and says nothing about exactly half the sample rate: at exactly half, the representation isn't accurate for all scenarios.

So, what about 3999Hz? That's lower than half the sampling rate. Let's find out. In Audacity, we'll generate a sine tone of 3999Hz. And you get this bizarre output where the amplitude modulates from high to low and back constantly. Sure enough, you can hear the 3999Hz signal, and the frequency plot says so as well, except that the amplitude is messed up. I'll try to explain this in a little while. But let's push on and push up the sampling rate. We need to create a new project in Audacity, because you shouldn't have audio signals of different sample rates in the same project. In this project, we'll set our sample rate to the next available value of 11025Hz, and we'll produce the same 3999Hz sine signal. This signal is comfortably below half the sample rate, and it's reproduced quite well: no weird amplitude modulation, and the plot shows the correct frequency as well. Let's zoom in. Eeesh, that looks pretty bad. In no way does that look like a sine wave, or a periodic function for that matter. But the strange part is that it still produces a pure sinusoidal tone of 3999Hz.

Hmmm, let's repeat this exercise for a sample rate of 44100Hz. Generate the same tone, and zoom in. That looks more presentable, with better resolution because of more data points, but it produces the same tone. No perceptible change whatsoever. What advantage do we have, then, in changing the sample rate from 11025 to 44100? We got more data points, but no general increase in quality. Of course, with a sample rate of 44100Hz you can represent more frequencies than with a sample rate of 11025Hz, but for the specific frequency of 3999Hz it offers no advantage. I'll repeat the same exercise for much higher sample rates, first at 96000Hz and then at the preposterously high 192000Hz, and I'll lay them all together so you can hear the difference for yourself. Convinced yet that there is no difference? Audibly there is no difference. But looking at the waveform, it's dodgy alright. Now, let's explore why this is the case. For one thing, we've been viewing the waveform all wrong. We don't just linearly interpolate between two data points and put a straight line between them; that's an inaccurate representation. Let's fix that. We're back in the project with 11025Hz as the sample rate. We'll go to Edit, Preferences, Tracks, and we'll change the sample display to Stem Plot.
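Here's a tiny numerical sketch of why a tone at exactly half the sample rate is a degenerate case. It's my own illustration in Python, mirroring the 8000Hz / 4000Hz situation above: with the wrong phase, every sample instant lands on a zero crossing.

```python
import numpy as np

fs = 8000.0                                   # sample rate from the Audacity example
t = np.arange(0.0, 0.002, 1.0 / fs)           # sixteen sample instants

on_the_zeros = np.sin(2 * np.pi * 4000 * t)              # samples land on zero crossings
with_offset  = np.sin(2 * np.pi * 4000 * t + np.pi / 2)  # shift the phase by 90 degrees

print(np.round(on_the_zeros, 6))  # all (numerically) zero: "no signal"
print(np.round(with_offset, 6))   # alternating +1, -1: the wave comes to life
```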
The stem plot is much better, because this is exactly what we have in the digital domain: discrete data points and nothing else. Now, when the time comes for this digital signal to be converted into an analog signal for us to listen to, it goes through a series of electronic circuits. In the simplest case, it goes through a resistor ladder circuit and, without unnecessarily complicating this video, produces a stair-step pattern as an intermediate analog signal. So our data points are turned into an analog signal which looks like this. This isn't the final output signal yet. In this form, the analog signal has a lot of very high frequency components, some of them ultrasonic frequencies which we can't even hear. How can you tell? Well, it's not clearly evident from the waveform what the exact high frequencies are, but whenever a time domain signal shifts almost instantaneously like this, it means the speaker would have to follow the signal and move its cone really fast. When events like that occur once in a while, they are usually perceived as clicks and noise. But if they repeat periodically, you are likely to get a high frequency tone, and the frequency depends on the slope and height of such shifts. In a complex real world input signal, you would end up with a complex spectrum of high frequency components in this intermediate signal.

The obvious thing to do now is to cut off these excess frequencies and discard them, and the most obvious place to start is all the frequencies above half the sampling rate, which from this point on will be referred to as the Nyquist frequency. This is because we promised that the input signal would be band limited when it first arrived at the analog to digital conversion stage, by putting a low pass filter on it with its cutoff at the Nyquist frequency. Clearly the output signal needs to adhere to the same rules, with the same low pass cutoff at the Nyquist frequency. And when the filter is applied, the intermediate signal is transformed into this, which is essentially a smoothed out version that intersects all the points that were previously sampled. And this is where things get interesting. The signal that goes through all these points is a mathematically unique solution. There can only be one band limited signal that passes through all these points, and it is the input signal itself. So the input signal passed through the analog to digital conversion step is the only possible signal that can be produced at the output stage of the digital to analog conversion. If the output signal were to deviate even slightly from the input signal, it would have to contain frequencies higher than the Nyquist frequency, and it would fail to meet the band limiting criterion put forth by the sampling theorem. The conversion is mathematically lossless. Of course, analog components are never noise free, and the whole process adds a little bit of noise into the signal, but if the number of conversions is kept to a minimum, it's never noise that can be heard by our ears. It would be stellar to see and hear this intermediate signal with all its high frequency noise, but it's not possible, because the filter right after it is baked into the DAC circuitry, and there's no switch to turn the filtering off. But what you can see is the output analog waveform.
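To see the "only one band-limited signal fits these points" idea numerically, here's a rough sketch of Whittaker-Shannon (sinc) reconstruction in Python. This is my own illustration rather than what a DAC literally does, and it assumes an idealised setting: a long run of samples, evaluated away from the edges so that truncating the infinite sum barely matters.

```python
import numpy as np

def sinc_reconstruct(samples, sample_rate_hz, t_eval):
    """Rebuild a band-limited signal at times t_eval from its samples."""
    T = 1.0 / sample_rate_hz
    n = np.arange(len(samples))
    # One shifted sinc kernel per sample; np.sinc(x) = sin(pi*x)/(pi*x).
    return np.sum(samples[None, :] * np.sinc((t_eval[:, None] - n * T) / T), axis=1)

fs = 3.0                                  # 3 samples per second
t_n = np.arange(0.0, 20.0, 1.0 / fs)      # 20 seconds' worth of samples
x_n = np.sin(2.0 * np.pi * 1.0 * t_n)     # the 1 Hz sine, sampled

t = np.linspace(9.0, 11.0, 2000)          # evaluate away from the edges
error = sinc_reconstruct(x_n, fs, t) - np.sin(2.0 * np.pi * t)
print(np.max(np.abs(error)))              # small, and it shrinks as the window grows
```

Swap fs for 40.0 and the reconstruction still lands on the same 1Hz sine, which is the sense in which the extra samples buy you nothing for this particular signal.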
I can't demonstrate that output waveform to you here, because anything visual I can reproduce on a computer screen is still well within the digital domain; you need specialist analog equipment to view analog waveforms, and I don't have an analog oscilloscope lying around my house. But I can redirect you to this excellent video demonstration by Monty from Xiph.org, where he walks through a full signal path that you can view, which can help solidify the concepts I'm putting forth here. The video is in the description below.

Do you remember that I mentioned that low pass filters cannot cut off frequencies exactly at the Nyquist frequency, but instead ramp down smoothly? For this reason we need a bit of a buffer between the highest frequency that we want to represent and the Nyquist frequency we need to cut off at. This gives us a clue about what could have gone wrong with the 8kHz sampling of the 3999Hz signal that was amplitude modulating. Let's imagine the cutoff frequency implemented by Audacity is at 4kHz, the Nyquist frequency. Practical limitations would not allow it to simply cut off all frequencies above that, and there will be some spill over. We know there will be extra high frequency content, because the intermediate signal will look like this. There will be a gradual reduction of frequency content above the Nyquist, but it will still be there. This is in direct violation of the sampling theorem, which requires there to be no frequencies above the Nyquist. So what happens when there are? We get the folding of the frequencies at and above the Nyquist back onto the original spectrum, due to a phenomenon called aliasing. Clearly aliasing puts additional frequencies into the output signal which didn't exist to begin with; this is a bad thing, and we want to avoid aliasing artifacts at all costs. These aliased components, folded back very close to the original frequency, are what cause the interfering amplitude changes.

Let me demonstrate this. Let's go back to Audacity and create a new project with a pretty high sample rate. Let's generate two sine tones, one a 400Hz tone and another a 401Hz tone. The difference between the two frequencies is 1Hz. Now, let's select both tracks, head on over to Tracks, Mix, and Mix and Render to New Track. What we can see and hear is an acoustical phenomenon called beating. It's an interference pattern between two sounds of slightly different frequencies, perceived as a periodic variation in volume. More about this in a module on acoustics. But this is essentially what's happening in our example, with aliasing ruining our output signal. So it's well worth having a bit of a buffer between the highest frequency we want to represent and the Nyquist frequency, above which no frequencies should exist. This is why the beating effect and any aliasing vanished when we represented the 3999Hz signal at the next higher sampling rate of 11025Hz. There was a considerable buffer between the highest frequency content we wanted to represent, which is 3999Hz, and the Nyquist frequency, which is half the sampling rate at 5512.5Hz.
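The beating itself is easy to reproduce outside Audacity. Here's a short sketch of my own that sums a 400Hz and a 401Hz sine, exactly as in the demonstration above; the 1Hz difference shows up as one swell in loudness per second, which is the same mechanism by which an aliased component folded back next to 3999Hz modulates the amplitude of the intended tone.

```python
import numpy as np

fs = 44100
t = np.arange(0.0, 3.0, 1.0 / fs)             # three seconds of audio

beat = np.sin(2 * np.pi * 400 * t) + np.sin(2 * np.pi * 401 * t)

# sin(a) + sin(b) = 2 * sin((a+b)/2) * cos((a-b)/2), so the loudness envelope
# of the sum is |2 * cos(pi * 1 * t)|: one rise and fall every second.
envelope = np.abs(2 * np.cos(np.pi * t))
```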
So, for all practical intents and purposes, we want to modify the sampling theorem to suggest that instead of sampling at 2 times the highest representable frequency, we sample at 2.5 times the highest representable frequency, a rule of thumb sometimes called the engineering Nyquist, not because of mathematical limitations, but because of practical electronic limitations. I think this is a good time to step away from this video before it becomes any longer. We've barely scratched the surface here; we haven't even talked about sampling actual audio, we've only looked at sampling simple control signals. But this is a good theoretical start, because it enables you to make informed decisions when choosing a sample rate for your project. In the next video, we'll look at some of the more commonly used sample rates for audio processing and for representing high fidelity audio. We'll study the advantages and drawbacks associated with these sample rates, look at choosing a sample rate for a specific situation or application, check out hearing tests to solidify our concepts, and explore what aliasing actually sounds like and how to eliminate it for good. See you in the next one.