Computer Science is Easy

January 6, 2021

How is sound encoded digitally?

In the real world, sound is made up of varying pressure waves, typically travelling through the air. Sound can also be considered as a form of analogue data, in that it is continually varying. Unlike digital data, which is broken into discrete values and discrete parts, sound varies continuously over time[1]. Analogue sound has two properties that are continuously varying: frequency[2] and amplitude[3].

Since these two properties are continuously varying, it’s impossible to capture all the data contained in analogue sound. Digital sound is encoded by capturing a reasonable approximation of the sound by taking samples of the amplitude at regular intervals. While the basic process is very simple, there are still a number of concepts and variables that must be understood to achieve the right balance of quality and file size:

Sampling: The process of taking digital measurements of the amplitude of a sound wave at regular intervals (generally tens of thousands of times per second), which are encoded as a series of binary values. The sound is captured by a microphone as an analogue electrical signal, and an Analogue to Digital Converter (ADC) measures that signal at each interval to produce the samples. Samples are stored in the order they are taken, and the result is a digital recording that is a good approximation of the original.
The diagram below shows the relationship between the original sound wave (dotted line), the samples (white X symbols with corresponding digital values), and the resulting approximation (solid line):
Sample: A numeric value, encoded in binary, that represents the amplitude of the sound wave at a specific point in time.
Sampling rate: The number of samples taken per second, measured in Hertz. The sampling rate is one of two factors that affect the overall quality and file size of the digital recording. A higher sampling rate allows the sound wave to be captured more accurately over time, giving a better frequency range for the recording. This allows for more detailed sound to be recorded more accurately but also increases the file size.
Sampling resolution: The number of bits per sample, which determines the number of discrete values that a sample can take. This is measured in bits (normally 8 bits, 16 bits, or 24 bits per sample) and is sometimes called the sampling depth. Higher sampling resolutions result in a higher quality sound, as sampling resolution affects:
- Dynamic range: The difference between the loudest and softest sounds that can be recorded. A higher sampling resolution allows a wider dynamic range to be captured accurately.
- Quantization: When an amplitude value is sampled, it must be assigned the digital value closest to it. The difference between the original amplitude and the digital value of the sample is called the quantization error. When there are only a small number of values to choose from, this will be significant and will result in a noticeable loss of quality.
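
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sound is a pure sine wave with amplitudes between -1 and 1, and the function names (quantize, dequantize, sample_sine, worst_error) are invented for this example rather than taken from any audio library.

```python
import math

def quantize(amplitude, bits):
    """Map an amplitude in the range -1.0..1.0 to the nearest of 2**bits integer codes."""
    levels = 2 ** bits
    return round((amplitude + 1) / 2 * (levels - 1))

def dequantize(code, bits):
    """Convert an integer code back to an approximate amplitude."""
    levels = 2 ** bits
    return code / (levels - 1) * 2 - 1

def sample_sine(frequency_hz, sampling_rate_hz, bits, count):
    """Take `count` samples of a sine wave at regular intervals."""
    samples = []
    for n in range(count):
        t = n / sampling_rate_hz  # time of the n-th sample, in seconds
        amplitude = math.sin(2 * math.pi * frequency_hz * t)
        samples.append(quantize(amplitude, bits))
    return samples

def worst_error(frequency_hz, sampling_rate_hz, bits, count):
    """Largest quantization error found across the samples."""
    worst = 0.0
    for n in range(count):
        t = n / sampling_rate_hz
        amplitude = math.sin(2 * math.pi * frequency_hz * t)
        approximation = dequantize(quantize(amplitude, bits), bits)
        worst = max(worst, abs(amplitude - approximation))
    return worst

# Compare the worst quantization error at three sampling resolutions
for bits in (4, 8, 16):
    print(bits, worst_error(440, 44_100, bits, 1000))
```

Running it shows the worst-case quantization error shrinking as the sampling resolution increases.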

As always, we must choose the most appropriate sampling resolution and sampling rate to meet our needs. A higher-quality recording will require a correspondingly larger file size, which requires more storage space, more memory, and more time to transmit, as well as potentially being too big to send as an email attachment.

One widespread standard for digital sound recording, CD-quality sound, is designed to match the capabilities of the human ear. It uses a sampling rate of 44.1 kHz and a 16-bit sampling resolution. These values have been specifically chosen to accurately represent any sound the human ear can perceive, making CD-quality sound as near to the original as possible while not wasting storage space (which would limit the amount of music that could be stored). The extra 4.1 kHz in the sampling rate covers recording losses and possible artefacts that must be filtered out.

No matter how much detail we choose to encode, a digital recording will always contain less information than the analogue sound it represents. However, humans can only hear sounds within a specific frequency range (20 Hz to 20 kHz), over a dynamic range of around 90 dB. The Nyquist Theorem states that, to accurately record a sound, the sampling rate must be at least double the highest frequency required, which gives us a minimum sampling rate of 40 kHz. A 16-bit sampling resolution is sufficient to cover a dynamic range of 90 dB. This means it’s possible to produce digital recordings that are theoretically indistinguishable from the original analogue sound with a sampling rate of 40 kHz and a sampling resolution of 16 bits.
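
Both figures can be checked with a short calculation. The sketch below assumes the common rule of thumb that the theoretical dynamic range of an n-bit sample is 20 × log10(2^n) decibels; the function names are made up for the example.

```python
import math

def min_sampling_rate(highest_frequency_hz):
    """Nyquist: sample at least twice the highest frequency to be captured."""
    return 2 * highest_frequency_hz

def dynamic_range_db(bits):
    """Theoretical dynamic range of an n-bit sample, in decibels (rule of thumb)."""
    return 20 * math.log10(2 ** bits)

print(min_sampling_rate(20_000))       # 40000
print(round(dynamic_range_db(16), 1))  # 96.3
print(round(dynamic_range_db(8), 1))   # 48.2
```

Under that assumption, 16 bits gives a little over 96 dB, which comfortably covers the 90 dB mentioned above, while 8 bits falls well short.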

Calculating file size for digital sound recordings

As has been mentioned above, file size is affected by the sampling rate and sampling resolution, but there are two more factors to consider:

The duration of the recording: Longer recordings use more samples and require more storage.
The number of audio channels: A stereo recording contains two separate channels (one for the right speaker and another for the left), and requires twice as much data as a mono recording, which only contains data for one speaker.

To calculate the file size of a digital recording, use this formula:

FILE SIZE = DURATION IN SECONDS × SAMPLING RATE × SAMPLING RESOLUTION IN BYTES × NUMBER OF CHANNELS

It is a fairly common-sense formula when you understand the variables involved. As always, read the question carefully.
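
As a worked example, here is the formula applied to one minute of CD-quality stereo sound. The function name and the choice of a 60-second recording are assumptions made for illustration; note the division by 8 to convert the sampling resolution from bits to bytes.

```python
def sound_file_size_bytes(duration_s, sampling_rate_hz, resolution_bits, channels):
    """FILE SIZE = duration x sampling rate x resolution in bytes x channels."""
    bytes_per_sample = resolution_bits / 8  # convert bits to bytes
    return duration_s * sampling_rate_hz * bytes_per_sample * channels

size = sound_file_size_bytes(60, 44_100, 16, 2)
print(size)              # 10584000.0 bytes
print(size / 1_000_000)  # 10.584 (megabytes, using 1 MB = 1,000,000 bytes)
```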

Features of sound editing software

A major advantage of digital sound is that it can be edited again and again, without any risk of generation loss[4], making it the standard for producing industry-quality sound recordings. Digital sound editing software offers a lot of features that give the user a great deal of control over the end product, including:

Cut, copy, and paste: As with any editing software, the ability to perform these basic operations is of great importance.
Multiple tracks: Several digital recordings can be combined into one final product, with each recording held on its own track. This allows for complex ‘layering’ of sounds to create a sophisticated result.
Timeline adjustments: The sound in each track can be moved along the time axis so that it starts sooner or later.
Pan: Every track can be ‘panned’ so that it is biased toward the left speaker, the right speaker, or balanced between the two.
Gain: The ‘loudness’ of each track can be set using the gain control, allowing for different sounds to be blended together in a balanced fashion.
Filters: From simple amplification to equalisation and distortion, filters can be applied to tracks to gain the desired result.
Exporting: The final sound can be exported as any of a number of digital sound formats.

[1] It might help to consider a non-computing based example: A light switch can be either on, or off, with nothing in between. It has two discrete states. A dimmer switch can be on, off, or anything in between, giving it an infinite range of possible settings. The former is analogous to digital data, and the latter is a form of analogue data.

[2] Frequency refers to the number of complete sound waves (cycles) per second. Low-frequency sounds are responsible for the ‘bass’ end of our hearing range, while high-frequency sounds give us the more detailed sounds. Frequency also determines the pitch of the sound, with higher frequencies giving us higher-pitched sounds or musical notes. Frequency is measured in Hertz (Hz), with 1 Hz corresponding to one complete sound wave per second, and 1 kHz representing 1000 cycles per second.

[3] A basic understanding of this can be reached by considering amplitude to be the ‘loudness’ of the sound at a given point in time.

[4] Generation loss occurs when an analogue copy is taken of an original. The copying process is not perfect, as the analogue signal cannot be read perfectly from the original or written perfectly to the copy. The more times this occurs, taking a copy of a copy of a copy, the more pronounced this effect becomes.

January 6, 2021

What is Binary Coded Decimal (BCD)?

One of the limitations of binary is its inability to store certain fractions accurately. For example, it is not possible to store 0.1 exactly in binary (in that radix it is a recurring fraction, much like 1/3 in denary). This leads to rounding errors when using floating-point arithmetic, which can cause significant problems, especially when the arithmetic is performed within a loop, compounding the issue.
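
The effect is easy to reproduce. This minimal Python sketch uses the language's built-in binary floating-point type and adds 0.1 ten times in a loop (the helper name is invented for the example):

```python
def add_repeatedly(value, times):
    """Add `value` to a running total `times` times using binary floating point."""
    total = 0.0
    for _ in range(times):
        total += value
    return total

total = add_repeatedly(0.1, 10)
print(total)         # 0.9999999999999999
print(total == 1.0)  # False
```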

Binary Coded Decimal (BCD) solves this problem, among others, by encoding each denary digit in its own nibble. Consider the 8-bit binary integer encoding for 204: 11001100

This value is completely different in BCD, where the priority is not to encode data as efficiently as possible, but to represent each denary digit as a separate unit: 0010 0000 0100 (one nibble each for the digits 2, 0, and 4)
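
The two encodings of 204 can be compared with a short Python sketch. The function names are invented for this example, and to_bcd assumes a non-negative whole number.

```python
def to_binary(value, width=8):
    """Ordinary binary integer encoding, padded to `width` bits."""
    return format(value, f"0{width}b")

def to_bcd(value):
    """Encode each denary digit of a non-negative integer in its own nibble."""
    return " ".join(format(int(digit), "04b") for digit in str(value))

print(to_binary(204))  # 11001100
print(to_bcd(204))     # 0010 0000 0100
```

The binary integer fits in 8 bits, while the BCD form needs 12 bits (three nibbles).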

Of course, this is inefficient; a nibble can store 16 values, and BCD only uses each one for a value between 0 and 9. But the separation of digits provides some advantages:

No rounding errors: When digits are stored individually, they can be stored perfectly accurately, eliminating rounding errors.
Simpler electronics can be used: Some devices are very simple, and it is not worth including the circuitry required for them to interact with binary integers. Using BCD allows these devices to be created with much simpler electronics[1].

Common applications of BCD, predictably, include devices that benefit from one or both of these advantages:

Calculators: Using BCD avoids rounding errors. Also, the user inputs numeric values one digit at a time. Using BCD allows the calculator to support this input style without adding unnecessary complexity.
Alarm clocks: While it is very possible to store the current time, and the user’s chosen alarm time, as binary integers, the circuitry involved is quite complex. It is much simpler to store the times in BCD and only sound the alarm when all four digits in the current time match the four digits in the alarm time.
Simple embedded systems: Temperature alarms, air conditioners, microwaves, and more. These devices might not use BCD in reality (it depends on the choices made by the designers), but they are good candidates. For this reason, they might be featured in an exam question, so it’s worth bearing them in mind.

BCD has its disadvantages. The main issue is that it is a less efficient way of storing denary values, requiring roughly 20% more memory.

[1] This isn’t universally the case. General arithmetic using BCD requires 10-20% greater complexity when compared to using binary. Nonetheless, for simple devices, BCD is simpler.

January 2, 2021

Bitmap graphics vs. vector graphics: What’s the difference?

The differing approaches to encoding an image as a bitmap or vector result in different properties, advantages, disadvantages, and applications. These are shown in the table below.

Bitmap graphics	Vector graphics
Suitable for photographs, scans, and images with continuous tones.	Suitable for line drawing, diagrams, logos, and other images that use blocks of colour and geometric shapes.
Quality degrades when image is enlarged or user zooms in, as pixels become noticeable (the image becomes pixelated).	Quality remains perfect when image is enlarged, as the shapes are simply rendered again at the new size. A vector image can be printed and displayed at any size, with full accuracy and quality.
File size is larger than an equivalent vector image.	Smaller file size than equivalent bitmap image, as long as the image is suitable to be encoded as a vector. This is because the vector format only stores data about shapes, rather than needing to store the colour of every pixel in the final image.
Bitmap images are suitable for compression.	Vector images can be compressed, but they are already encoded in a very minimalistic format and compression will not result in a noticeable reduction in file size.
Bitmap graphics is simple to render (display or print), as the computer only needs to read and display/print the pixel data.	Vector graphics must be interpreted by the computer and each pixel must be rendered from scratch before it can be displayed/printed, which is a more complex process requiring more processing power.
Bitmap images do not store layers or separate objects; everything is flattened down into a raster of pixels upon saving.	Vector images include layer data and all shapes are kept separate, allowing for editing later.

January 2, 2021

What is vector graphics and how does it work?

In contrast to bitmap graphics, vector graphics encodes images as a set of geometric shapes. Each shape has specific properties (size, fill colour, outline colour, location, etc.) and is called a drawing object.

Here are some quick definitions of vector graphics terms:

Drawing object: A geometric shape that forms part of a vector image.
Property: A variable that is used to define a characteristic of a drawing object, such as its colour.
Drawing list: The set of commands that define the drawing objects and their properties for a vector image.

A vector image is created one drawing object at a time (normally manually, by the user). The final drawing list will be stored as a file, encoded using an appropriate vector graphics format. SVG (Scalable Vector Graphics) is one such format, and a simple file is shown below to illustrate the data it stores and the structure it uses:
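
As a rough illustration of how a drawing list maps onto SVG, the Python sketch below builds a tiny SVG document from two drawing objects. The shapes, sizes, and colours are invented for the example, and build_svg is a hypothetical helper; only the standard library's xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

# A hypothetical drawing list: each entry is (shape, properties)
drawing_list = [
    ("rect", {"x": "10", "y": "10", "width": "80", "height": "50",
              "fill": "yellow", "stroke": "black"}),
    ("circle", {"cx": "150", "cy": "35", "r": "25",
                "fill": "red", "stroke": "black"}),
]

def build_svg(objects, width, height):
    """Turn a drawing list into SVG markup, one element per drawing object."""
    svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg",
                             "width": str(width), "height": str(height)})
    for shape, properties in objects:
        ET.SubElement(svg, shape, properties)
    return ET.tostring(svg, encoding="unicode")

svg_text = build_svg(drawing_list, 200, 70)
print(svg_text)
```

Each drawing object becomes one element in the file, and each property becomes an attribute of that element, which is why the file stays small regardless of the size at which the image is displayed.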

Properties of vector graphics editing software

Software for editing vector graphics will provide facilities for managing layers, aligning objects with each other or spacing them equal distances apart, applying gradient fills, applying colours, and so on. Most applications also provide facilities for more advanced operations, such as combining shapes, subtracting one shape from another, or intersecting two shapes to give only the area they both have in common.

January 2, 2021

What is bitmap graphics and how does it work?

A bitmap image is made up of small dots called pixels (short for picture elements). Each pixel has a colour and a location, and they are arranged in a grid (called the raster). The pixel is the smallest part of an image that can be stored and displayed by the computer. When displayed together in the proper arrangement, these pixels form an image.

Each pixel’s colour is stored using the intensity (i.e. brightness) value for three channels, which correspond with the three primary colours of light: red, green, and blue. Any colour can be made by mixing these three colours in differing intensities. This colour encoding scheme is known as RGB.

The most common standard is to use 8 bits to give us a range of 0-255 for each of the red, green, and blue values. This means that we require 24 bits per pixel. This can be seen in the colour picker included in software such as Microsoft Office:

An RGB value of 0, 0, 0 would correspond to black, and a value of 255, 255, 255 would be white.

To encode an entire image in binary, the RGB values for each pixel are converted to binary and stored together in the order that they will be displayed, starting with the top-left pixel and proceeding row by row until the entire image has been stored.
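
A minimal Python sketch of this process is shown here. It assumes 24-bit colour (8 bits per channel) and a made-up 2 × 2 image; the function names are invented for the example, and real bitmap file formats also store a header and may order or pad the pixel data differently.

```python
def encode_pixel(red, green, blue):
    """Return the 24-bit binary string for one RGB pixel (8 bits per channel)."""
    return format(red, "08b") + format(green, "08b") + format(blue, "08b")

def encode_image(rows):
    """Encode pixels row by row, starting with the top-left pixel."""
    return "".join(encode_pixel(*pixel) for row in rows for pixel in row)

image = [
    [(255, 0, 0), (0, 255, 0)],      # top row: red, green
    [(0, 0, 255), (255, 255, 255)],  # bottom row: blue, white
]

bits = encode_image(image)
print(len(bits))   # 96
print(bits[:24])   # 111111110000000000000000
```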

Before we move on, it’s worth covering the key terms used in bitmap images:

Pixel: Short for picture element, the smallest addressable part of a bitmap image. It has a colour, which is encoded in binary, and a location, which is determined by its order in the bitmap file (i.e. the first pixel in the bitmap file is displayed in the top-left of the image, and the last pixel is displayed in the bottom-right).
Bitmap graphic: A digital image made up of rows of pixels, displayed together in the correct order.
Colour depth: The number of possible colours that could be assigned to a pixel in the image. Colour depth is expressed in terms of either the number of bits per pixel, or the number of colours in the palette. A higher colour depth results in a higher quality image where the pixels are encoded with colours very close to the original image. Lower colour depth will reduce the image quality, with noticeable bands of colour where similar colours in the original image have been assigned the same colour during the encoding process, but the image will require less storage.
Bitmap header: An area at the start of a bitmap file containing the height and width of the image (in pixels), the colour depth, and a declaration that the image is a bitmap image (rather than a different file format).
Image resolution: The width and height of the image, measured in pixels (e.g. 1024 x 768). Higher resolution images contain more data and are sharper and/or can be displayed at a larger size. Lower resolution images require less storage but at a lower quality.
Screen resolution: The number of physical pixels available in a display. This can refer to one of two standards:
- Total number of pixels: In these cases, the resolution is expressed as the width in pixels and the height in pixels. For example, a Full HD screen is 1920 x 1080 pixels.
- Sharpness: In these cases, the resolution will be expressed in terms of the number of pixels per inch (PPI). Larger numbers of pixels per inch result in a sharper image that can display more fine detail. Lower PPI values result in a screen with visible divisions between pixels and a less sharp image.

The most common colour depths

There are many colour depth standards available. Here are the ones you should learn as a starting point:

1-bit (monochrome): Two colours (normally black and white) are available
4-bit: Sixteen colours (most games in the ‘80s used 4-bit graphics)
8-bit: 256 colours (most PC games in the early ‘90s)
16-bit: 65536 colours
24-bit: 16.7 million colours
32-bit: 16.7 million colours, plus an 8-bit value for transparency

It’s worth noting that colour depth can be expressed in any number of colours, or in any number of bits. For example, a 13-colour bitmap image will require 4 bits per pixel (as this is the minimum number of bits required to store one of 13 distinct values). While these don’t come up in the real world any more, questions have appeared in past papers using arbitrary colour depths. As always, read the question carefully.
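
The minimum number of bits for an arbitrary number of colours can be worked out as follows. This is a small sketch with an invented function name; it relies on Python's built-in int.bit_length method.

```python
def bits_per_pixel(colour_count):
    """Minimum number of bits needed to store one of `colour_count` distinct values."""
    return max(1, (colour_count - 1).bit_length())

print(bits_per_pixel(13))   # 4
print(bits_per_pixel(256))  # 8
print(bits_per_pixel(2))    # 1
print(2 ** 24)              # 16777216 colours available at 24 bits per pixel
```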

Estimating the file size of a bitmap

The file size of a bitmap image can be estimated using the following calculation:

FILE SIZE = IMAGE WIDTH × IMAGE HEIGHT × COLOUR DEPTH IN BYTES

Be careful: colour depth is generally expressed in bits, but the calculation requires you to convert the colour depth to bytes first. A monochrome image uses only one-eighth of a byte per pixel. If you fail to account for this, you will get the wrong answer.
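
Here is the calculation as a short Python sketch, using the 1024 x 768 resolution mentioned earlier as an example. The function name is an assumption, and the estimate ignores the bitmap header.

```python
def bitmap_size_bytes(width, height, colour_depth_bits):
    """FILE SIZE = width x height x colour depth in bytes (header not included)."""
    return width * height * colour_depth_bits / 8  # divide by 8: bits to bytes

print(bitmap_size_bytes(1024, 768, 24))  # 2359296.0
print(bitmap_size_bytes(1024, 768, 1))   # 98304.0
```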

Properties of bitmap editing software

Bitmap graphics packages provide tools that are tailored to the capabilities of the image format, including adjustments for brightness and contrast, colour correction, sharpen/soften, background removal tools, and filters (e.g. sepia tone and blur).

January 1, 2021

An introduction to ASCII and Unicode

ASCII is a character encoding standard where every character (letter, number, or symbol) in the text is assigned a numeric value in the ASCII character set (the set of characters that can be stored, used, and understood by the computer). Encoding is a simple process: look up the character’s value in the ASCII table and store its binary equivalent.

The standard ASCII character set includes 128 characters with numeric values from 0-127. Some of these are special characters that are no longer in use[1]. Each character requires 7 bits of storage (which leaves the most significant bit available to be used as the parity bit for data transmission). There is also an extended character set which uses all eight bits available to give 256 characters, with numeric codes from 0-255. Part of the ASCII table is shown below (the SPACE character, which is number 32 in the ASCII table, and non-printing characters, are not shown):

33	!	53	5	73	I	93	]	113	q
34	"	54	6	74	J	94	^	114	r
35	#	55	7	75	K	95	_	115	s
36	$	56	8	76	L	96	`	116	t
37	%	57	9	77	M	97	a	117	u
38	&	58	:	78	N	98	b	118	v
39	'	59	;	79	O	99	c	119	w
40	(	60	<	80	P	100	d	120	x
41	)	61	=	81	Q	101	e	121	y
42	*	62	>	82	R	102	f	122	z
43	+	63	?	83	S	103	g	123	{
44	,	64	@	84	T	104	h	124	|
45	-	65	A	85	U	105	i	125	}
46	.	66	B	86	V	106	j	126	~
47	/	67	C	87	W	107	k
48	0	68	D	88	X	108	l
49	1	69	E	89	Y	109	m
50	2	70	F	90	Z	110	n
51	3	71	G	91	[	111	o
52	4	72	H	92	\	112	p

Encoding text into binary using the ASCII table is a very simple process:

Split the text into characters
Find the denary equivalent of every character by looking it up in the ASCII table
Convert each denary value into its binary equivalent

For example, the text “Hello” would be encoded like so:

To decode the binary back into plain text, you would reverse the process.
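
The steps can be sketched in Python, where the built-in ord and chr functions play the role of the ASCII table lookup. The function names are invented for the example, and each character is padded to 8 bits (a leading 0 followed by the 7-bit ASCII code).

```python
def ascii_encode(text):
    """Split text into characters and convert each ASCII value to 8-bit binary."""
    return [format(ord(character), "08b") for character in text]

def ascii_decode(codes):
    """Reverse the process: binary to denary to character."""
    return "".join(chr(int(code, 2)) for code in codes)

encoded = ascii_encode("Hello")
print(encoded)
# ['01001000', '01100101', '01101100', '01101100', '01101111']
print(ascii_decode(encoded))  # Hello
```

These correspond to the denary values 72, 101, 108, 108, and 111 in the table above.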

ASCII is very simple, but has some limitations:

Only 128 characters can be represented in ASCII (or 256 if using the extended character set)
The extended character set is not always encoded the same way for all systems (for example, Microsoft does not use the same character set as other publishers, which can lead to problems).
128 characters is plenty for the Latin alphabet, but nowhere near enough to include characters for languages that utilise different alphabets

As computer hardware decreased in cost, and international support became more and more of an issue, Unicode was developed to allow for a wider range of characters to be encoded in binary.

The first significant difference between Unicode and ASCII is that Unicode was originally designed around 16 bits per character, which allows for a character set containing up to 65536 symbols. The modern standard goes further and divides its characters into 17 planes of 65536 code points each: the Basic Multilingual Plane, which contains the characters used for most languages and purposes, and 16 additional planes for less common characters (such as the rarer symbols of Chinese, which contains around 50000 symbols).

Even though Unicode exists to support the massive number of characters used in alphabets and other applications worldwide, it has also been designed to be as efficient as possible by employing some tricks:

Unicode is a superset of ASCII: This means that Unicode supports every ASCII character, and much more besides. The standard ASCII character set runs from 0 to 127, requiring only 7 bits per character. As a result, the MSB is always 0 for an ASCII character. The UTF-8 encoding of Unicode is designed so that every byte of a character outside the standard ASCII character set starts with a ‘1’. This makes it possible to recognise ASCII characters and encode them in only one byte (rather than the 16 bits used by the UTF-16 encoding).
Basic Multilingual Plane characters require 16 bits: The developers of Unicode know that most characters will be either present in the ASCII character set or come from the Basic Multilingual Plane. Characters from the latter require 16 bits each in UTF-16 (and 16 or 24 bits in UTF-8).
Characters from other planes require 32 bits: Uncommon characters and those from less common languages require 32 bits each in both UTF-8 and UTF-16. This maximises support for many alphabets while keeping storage and transmission requirements as low as possible.
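
A quick way to check how many bytes a character actually occupies is to encode it in Python. This sketch assumes the UTF-8 and UTF-16 encodings, which are the most common ways of storing Unicode text; the sample characters and the function name are arbitrary choices for illustration.

```python
def encoded_sizes(character):
    """Return (code point, bytes in UTF-8, bytes in UTF-16) for one character."""
    utf8 = character.encode("utf-8")
    utf16 = character.encode("utf-16-be")  # big-endian, no byte order mark
    return ord(character), len(utf8), len(utf16)

# 'A' (ASCII), e-acute and the euro sign (Basic Multilingual Plane), an emoji (another plane)
for character in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    code_point, utf8_bytes, utf16_bytes = encoded_sizes(character)
    print(f"U+{code_point:04X}", utf8_bytes, utf16_bytes)
# U+0041 1 2
# U+00E9 2 2
# U+20AC 3 2
# U+1F600 4 4

# In UTF-8, an ASCII byte starts with 0; every byte of a non-ASCII character starts with 1
print("A".encode("utf-8")[0] < 128)                        # True
print(all(byte >= 128 for byte in "\u00e9".encode("utf-8")))  # True
```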

Unicode is also completely standardised, meaning that every character is encoded in the same way on every computer that uses the standard, no matter where it is or which manufacturer made it.

[1] For example, the LF or Line Feed character was originally used to move paper through an electric typewriter by one line. At the time that ASCII was created this was necessary, as almost all output from a computer was sent to an electric typewriter rather than a screen. LF itself survives as the character that marks the end of a line of text, but many of the other control characters from that era are now rarely used.

Simplifying Computer Science for everybody