In this video we're going to take a look at how we code text in binary. We'll describe the ASCII and Unicode coding systems for coding character data and explain why the Unicode system was introduced. In most computers, bits are grouped together into 8-bit bytes. A byte can hold 256 different possible combinations of zeros and ones.
This means 256 different characters can be represented: we use one byte to code one unique character. Here are the first 128 characters of the ASCII character set, along with their unique binary codes.
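To make this concrete, here's a small Python sketch (my own illustration, not something shown in the video) that prints a few characters alongside their ASCII codes and 8-bit binary patterns:

```python
# One byte per character: print some ASCII characters with their
# decimal codes and 8-bit binary patterns.
for ch in ["A", "Z", "a", "z", "0", "9", "%", "#"]:
    code = ord(ch)                        # numeric ASCII code, e.g. 'A' -> 65
    print(ch, code, format(code, "08b"))  # e.g. A 65 01000001
```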
As you can see from this picture, the standard ASCII character set doesn't actually use the eighth, most significant bit; in this diagram it's always zero. This spare leading digit is often used to carry error-checking information.
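The video doesn't spell out how that error checking works, but one common convention is a parity bit; here's a rough Python sketch of even parity, purely as an illustration:

```python
def add_even_parity(seven_bit_code: int) -> int:
    """Set the eighth (most significant) bit so the byte holds an even number of 1s.

    This assumes a 7-bit ASCII code (0-127) as input; even parity is just one
    common error-checking convention, not part of the ASCII standard itself.
    """
    ones = bin(seven_bit_code).count("1")
    parity_bit = ones % 2                 # 1 only if the count of ones is odd
    return (parity_bit << 7) | seven_bit_code

print(format(add_even_parity(ord("A")), "08b"))  # 'A' = 1000001 has two 1s, so 01000001
print(format(add_even_parity(ord("C")), "08b"))  # 'C' = 1000011 has three 1s, so 11000011
```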
Even so, by using seven bits we still have 128 unique codes, from seven zeros through to seven ones, and this is enough to store the lower-case and upper-case English alphabet, the digits 0 to 9, punctuation and special characters such as percent, dollar sign and hash, and also certain special non-display signals such as null, escape and acknowledge. Over the years, different computer designers have used different sets of codes for representing the same characters.
Two computers will need to have the same character set installed for documents produced on one to be readable on the other. These days most personal computers use ASCII, the American Standard Code for Information Interchange. Computers can, however, have more than one character set installed.
So why do we need other character sets such as Unicode? Taking a look at this ASCII character set again, you can start to see a problem: it's already pretty full. What happens when we need to code and represent other special characters that exist in various European languages? The problem becomes much worse when we take into account languages with ideographic writing systems, such as Chinese and Korean scripts.
There is simply no room left in the ASCII character set, no more unique values to represent all these different characters. The solution is to create a character set which is larger than one byte, or eight bits. There are various versions of the Unicode character set available, but each one is at least 16 bits long.
This gives us a vastly increased number of combinations of zeros and ones. The latest version of Unicode has enough combinations of zeros and ones to contain over 120,000 characters covering 129 modern and historic scripts, as well as various symbol sets. This is why it is called Unicode.
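As a rough illustration (my own Python sketch, not taken from the video), each of these characters gets a single Unicode code point, and many of them need well over eight bits:

```python
# Unicode code points for characters from a few different scripts.
for ch in ["A", "é", "中", "한"]:
    code_point = ord(ch)                  # the character's Unicode code point
    print(ch, hex(code_point), code_point.bit_length(), "bits")
# 'A' (0x41) fits in 7 bits, '中' (0x4e2d) needs 15 bits, '한' (0xd55c) needs 16 bits.
```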
It's a character set which can be used to universally encode the text of almost all modern and ancient languages. The downside, however, to using a universal character set such as Unicode is that it now requires over twice as much space to represent each character as ASCII did.
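To see that cost in practice, here's one more small sketch (again my own example, not from the video): the same five-letter word stored as ASCII and as 16-bit Unicode (UTF-16):

```python
text = "Hello"
ascii_bytes = text.encode("ascii")         # one byte per character
utf16_bytes = text.encode("utf-16-le")     # two bytes per character, no byte-order mark
print(len(ascii_bytes), len(utf16_bytes))  # 5 bytes versus 10 bytes
```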