S4C2: Digital Data Storage


a) Computer architecture

In Chapter 1, we established the merits of the binary format, both for data and logic, and covered the operating principles and manufacturing of microprocessors. It may feel like this is all digital computing is about, but that would be wrong: data processing is indeed core to the system, yet it is only one of its pillars. In isolation, the ability to transform inputs into outputs would not carry us far if we could not get our hands on the inputs and then make the outputs available, either for immediate further processing (where they effectively become inputs) or for later retrieval. This notion of data storage and retrieval is the topic of this second chapter, and it is part hardware, part software. Which leads us to the third pillar, the instructions. Without instructions, themselves packets of data in need of storage, the hardware would not know which data it should be using, where to get it, what operation to carry out and where to store the output. All this calls for a proper architecture and for languages that can speak to the hardware in bits, though it is possible to layer more human-readable and easier-to-program languages on top. We will tackle this in Chapter 3, and by the end of it we will have a good understanding of how computers work. For more details on screen technology you can refer to Chapter 8 on television, and for batteries you will need to wait or jump, depending on when you read this, to S6 Section 5.a.

Right from the start it was clear that the flow of data within a computer had to be controlled by a dedicated unit. Hence, in the ubiquitous von Neumann architecture devised by the famed namesake mathematician in 1945, the central processing unit (CPU) consists of registers, an arithmetic logic unit (ALU) and a control unit. This CPU interfaces with the rest of the computer via a bus, essentially the data highway, which can be linked to several types of memory hardware as well as input and output devices (“I/O devices”) such as a keyboard, a mouse to direct a pointer, a screen for visual display and perhaps a printer.

The ALU can be thought of as the number-crunching powerhouse; it carries out operations on two integer binary numbers appropriately named operands and outputs a third value based on the logic gates involved for the arithmetic or bitwise operation that was called for. Bitwise operations are essentially simple bit manipulation: for example 0101 XOR 0101 returns 0000, since XOR-ing a value with itself is a smart way to reset a counter, and NOT 11110000 returns 00001111, so for an 8-bit unsigned integer NOT x will always be equal to 255 − x (if you are not familiar with logical operations such as NOT and XOR you may want to check out S4 Section 1.b). For non-integers, the control unit will instead call on a floating-point unit (FPU), another type of logic unit that can also carry out arithmetic operations. In base-10, a floating-point number is a decimal number turned into an integer followed by 10 raised to the power of another integer; for example, 333.54 would become 33354 × 10⁻². Floating-point numbers work best to deal with numbers of extreme magnitude, be they small or large.
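To make this concrete, here is a minimal Python sketch of the two bitwise examples above and of the significand-plus-exponent idea behind floating-point numbers; the variable names are purely illustrative.

```python
x = 0b0101
print(bin(x ^ x))               # 0b0 -> XOR-ing a value with itself resets it to zero

y = 0b11110000
print(bin(~y & 0xFF))           # 0b1111 -> NOT on an 8-bit unsigned value
print((~y & 0xFF) == 255 - y)   # True: NOT x equals 255 - x for 8-bit unsigned integers

# Base-10 floating-point intuition: 333.54 stored as significand 33354 and exponent -2
significand, exponent = 33354, -2
print(significand * 10**exponent)   # 333.54 (up to tiny binary rounding)
```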

After loading them into its instruction register, the control unit (CU) converts the next set of instructions into timely signals dispatched to other parts of the CPU, the memory and the I/O devices. This decoding and interaction include, among other steps, identifying the location of the operands in the overall system and retrieving them so they can be processed. The execution of a particular program is itself kept orderly thanks to a program counter, which indicates both where the CPU is at in the program sequence and where in memory the next instruction is stored, known as the memory address.
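As a rough illustration of this fetch-decode-execute rhythm, here is a toy Python model; the instruction names, addresses and register labels are invented for the example and do not correspond to any real instruction set.

```python
# Toy fetch-decode-execute loop driven by a program counter (pc).
memory = {
    0: ("LOAD", "A", 100),    # load the value stored at address 100 into register A
    1: ("LOAD", "B", 101),
    2: ("ADD", "A", "B"),     # A <- A + B (the ALU step)
    3: ("STORE", "A", 102),   # write the result back to memory
    4: ("HALT",),
    100: 40, 101: 2, 102: 0,  # data region
}
registers = {"A": 0, "B": 0}
pc = 0                        # program counter: memory address of the next instruction

while True:
    instruction = memory[pc]              # fetch
    op = instruction[0]                   # decode
    pc += 1
    if op == "LOAD":
        registers[instruction[1]] = memory[instruction[2]]
    elif op == "ADD":
        registers[instruction[1]] += registers[instruction[2]]
    elif op == "STORE":
        memory[instruction[2]] = registers[instruction[1]]
    elif op == "HALT":
        break

print(memory[102])   # 42
```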

As for the bus (the now standard name for what was originally called the databus), it is the system enabling the diverse elements of the computer to communicate. There is of course a hardware aspect to it but, as one should expect by now, this data transfer needs to obey certain rules called communication protocols, so there is also a software aspect. To clarify, the Cambridge dictionary defines software as “the instructions that control what a computer does; computer programs” and hardware as “the physical and electronic parts of a computer or other piece of equipment.” Buses can be internal or external, as in the peripheral bus acronym USB, which stands for Universal Serial Bus and sets the industry standard for connecting external devices to a host, directly or indirectly via a hub. Nowadays, the same bus can be used for two-way data transfer and for addresses, which correspond to the physical locations where the required data needs to be written or fetched from.

b) Processor registers and cache

We now have a broad understanding of the overall architecture of, and data flow within, a computer and recognize that memory is an equally integral aspect of its performance, in addition to its ability to carry out arithmetic or logical operations. Imagine having a pad and working through a set of instructions; whenever you have a result, this result needs to be written down somewhere before it can be reused. To us, the operating and the data manipulation may seem like one and the same, but they aren’t; in fact, that is why we write things down. If you are solving an equation, you read the operands from the previous line (that’s the fetching), you process them, and then you write down the result before using it for your next step, which corresponds to executing the next set of instructions in a program.

However, bear in mind there is only so much paper you can have accessible on your desk at any point in time; it is a scarce resource, whether because of space constraints or, in the case of a CPU, because it is expensive to manufacture. Consequently, you would want to store data you have an immediate need for (or can reasonably expect to have an immediate need for) in a very accessible location, even on the notepad itself, whereas other data such as a story or geography lessons can be stored in a less costly medium taking more time to access. Thus, there exists a data storage hierarchy that can be divided into 4 categories: #1 the processor registers contained within the CPU itself, #2 the cache memory, #3 the main memory, and #4 mass storage, whether on-device or external.
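To give a feel for why this hierarchy exists, the sketch below lists rough, order-of-magnitude access times for each tier; the exact figures vary widely between systems and are only indicative.

```python
# Indicative access times for the four storage tiers (orders of magnitude only).
hierarchy = [
    ("1. Processor registers", "sub-nanosecond"),
    ("2. Cache (L1 to L3)",    "roughly 1-20 ns"),
    ("3. Main memory (DRAM)",  "on the order of 100 ns"),
    ("4. Mass storage",        "~0.1 ms for an SSD, ~10 ms for an HDD"),
]
for tier, access_time in hierarchy:
    print(f"{tier:<25} {access_time}")
```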

We have already talked about the instruction register in the CPU but registers can also hold values such as numbers and addresses. The size of these registers, such as 32-bit, effectively becomes the data-size unit of a computer system and this is where the names “32-bit computing” or “32-bit computer” originate from.
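Two quick numbers show what a 32-bit register implies in practice:

```python
bits = 32
print(2**bits - 1)     # 4294967295: largest unsigned integer a 32-bit register can hold
print(2**bits / 2**30) # 4.0: a 32-bit address space covers 4 GiB of byte-addressable memory
```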

Physically, the registers hold the binary information, the bits, in circuits with two possible stable states called latches or flip-flops. Being electrical in nature, this information is lost when powered off, so the memory is called volatile, which is also the case for the cache and the main memory but not for the hard drive of your computer or other external memory devices. Some data may not be immediately required for the current processing cycle, and thus not held in the registers, but it may be called upon frequently or with high likelihood. For this subset of data, it makes sense to have dedicated memory space located within the CPU itself or close to it, relying on a technology with faster read and write access than the main memory – think nanoseconds instead of microseconds. This buffer is called the cache memory and it is generally tiered: L1 cache pertains to the CPU itself and is the fastest, L2 is a little slower than L1 and can be inside or outside the CPU depending on the design, and L3 is larger and slower than L2 and is typically shared by all cores of the CPU. For example, the CPU within my laptop has 80 KB of L1 per core, 2 MB of L2 per core and a shared L3 of 18 MB, and the main cores run at a frequency of 2.6 GHz. KB stands for kilobyte and MB for megabyte, that is respectively one thousand and one million bytes, where a byte is the digital information unit consisting of 8 bits.

It should be noted that almost all electronic devices require some compact software providing immediate control over the hardware. This is called firmware and this type of control is designated as “low-level” because there is essentially no abstraction embedded in the code, unlike in a mid-level or high-level programming language – more on this in S4 Section 3.a. This type of data needs to be readily accessible no matter what, so it needs to be non-volatile, though it does not need to be updated or, more broadly, to be reprogrammable. Historically, the preferred mode of storage has thus been ROM, which stands for read-only memory. This can be manufactured relatively simply on integrated circuits by representing the binary data in the form of a fixed arrangement of transistors standing for 1s and 0s and linking them to specific addresses. When an address is referred to by a set of instructions, the corresponding bits are transferred for further processing; thus it works like a baked-in index table.
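A ROM can thus be pictured as a fixed mapping from address to word, with no write path; the addresses and contents below are of course made up for illustration.

```python
# A baked-in index table: address in, word out, nothing can be written back.
ROM = {
    0x00: 0b10110001,
    0x01: 0b01001110,
    0x02: 0b11111111,
}

def rom_read(address: int) -> int:
    return ROM[address]       # read-only: there is deliberately no rom_write

print(bin(rom_read(0x01)))    # 0b1001110
```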

c) RAM and Flash memory

Compared to the cache, the main working memory is physically further away from the processor and potentially not as fast for reads and writes. The reason for this is not so much the additional distance as the technology used. For cache, SRAM is used, which stands for static RAM, and RAM in turn means random-access memory. RAM, also called direct access memory, should be contrasted with sequential access and merely means any addressable datum can be accessed as efficiently as any other, as opposed to having to progress through an entire list until reaching the right address, like on an old magnetic tape.

Static RAM relies on flip-flops, analogous to what we described in the previous section for the in-processor registers: get the address and read the information, it is quick. The main memory however, somewhat confusingly called “RAM”, is actually DRAM, where the D means dynamic. Unlike SRAM, DRAM technology involves less expensive and more compact circuitry consisting of one transistor and one capacitor which either holds some tiny electric charge or not, thus storing the 1 or 0 bit in the form of its charge status. There are two main drawbacks to DRAM. The first is the need for refreshing because the capacitors slowly leak electricity. The second is a much slower reading process involving the electrical isolation of the relevant bits (called the bit line), the half-charging of another set of capacitors electrically linked to the bit line, and the amplification of deviations from this mid-charge level depending on whether the corresponding bit-line capacitor held a 1 or a 0. So the read-out takes time, and so does the resetting of the value after it, while the writing process is a little faster. If you wish to know more, I have inserted the link to the relevant Wikipedia entry in the last section of this chapter.
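The toy model below captures the gist of this description – a leaky capacitor, a comparison against a mid-level reference and a mandatory rewrite after each read – with purely illustrative charge values.

```python
FULL, EMPTY, REFERENCE = 1.0, 0.0, 0.5   # illustrative charge levels

class DramCell:
    def __init__(self, bit: int):
        self.charge = FULL if bit else EMPTY

    def leak(self, amount: float = 0.1):
        # Capacitors slowly lose their charge, hence the need for periodic refresh.
        self.charge = max(EMPTY, self.charge - amount)

    def read(self) -> int:
        bit = 1 if self.charge > REFERENCE else 0   # compare against the mid-charge level
        self.charge = FULL if bit else EMPTY        # the read is destructive, so rewrite
        return bit

    def refresh(self):
        self.read()   # a refresh is essentially a read followed by a rewrite

cell = DramCell(1)
for _ in range(3):
    cell.leak()
print(cell.read())   # 1, provided the refresh interval outpaces the leakage
```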

When time is less of the essence, i.e. counting in tenths of seconds or even full seconds to transfer files rather than process data in the CPU, it is fine to trade off speed for price and non-volatility. The bulk of the memory in a computer in terms of capacity, or even in a smartphone for that matter, belongs to this mass storage category, the fourth in the hierarchy laid out earlier. Nowadays mass often means hundreds of gigabytes or even terabytes, enough to store entire music libraries and many movies in high definition, as well as entire personal administrative and picture archives.

The current technology of choice for these devices, which can take the form of solid-state drives, removable USB sticks or memory cards, is called Flash. There are two types of flash memory: NAND Flash is the more popular and cheaper one, whereas NOR Flash has direct access capability and a few other practical properties making it the preferred choice for program execution. The Flash technology is quite astute and consists of embedding a bit value in a transistor by injecting, or not, electrons into an intermediary gate called the floating gate, located between the control gate and the n-p-n semiconductor (refer to S4 Section 1.c if this sounds unfamiliar). The injection process bears the name of Fowler-Nordheim tunnelling and the gate is called floating because it isn’t in direct contact with the semiconductor, being separated by an oxide layer which prevents the electrons from leaking into said silicon. If electrons have not been injected, the threshold voltage of the control gate remains unaltered, but if electrons have been pumped in, then the floating gate creates an electric charge obstacle that needs to be overcome, thereby increasing the threshold voltage of the control gate. By applying an intermediate voltage somewhere between the standard and the increased threshold, it is possible to deduce whether electrons have been added to the floating gate, which reveals the value of the bit, 1 for electrons and 0 for no electrons. Decidedly very smart.
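A small sketch of the read operation just described: apply an intermediate gate voltage and check whether the transistor conducts. The voltage values are purely illustrative and the 1-for-electrons convention follows the paragraph above.

```python
V_THRESHOLD_ERASED = 2.0       # threshold voltage with an empty floating gate
V_THRESHOLD_PROGRAMMED = 5.0   # threshold raised by the trapped electrons
V_READ = 3.5                   # intermediate voltage applied to the control gate

def read_cell(electrons_injected: bool) -> int:
    threshold = V_THRESHOLD_PROGRAMMED if electrons_injected else V_THRESHOLD_ERASED
    conducts = V_READ > threshold
    # A cell that still conducts at the intermediate voltage has no trapped electrons.
    return 0 if conducts else 1

print(read_cell(True), read_cell(False))   # 1 0
```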

It should be noted that the process of erasing these bits does wear down the oxide layer, though for single-level cell devices (the standard ones with 1 bit per cell) the limit is in excess of 100,000 cycles, plenty for the average consumer.

d) Magnetic data storage

Until flash memory was invented, commercialized in the 1980s and eventually competitively priced, most mass storage devices installed in desktop or laptop computers were hard disk drives (HDD) using digital magnetic storage technology. The overall notion is easy to grasp: use a magnetic coil to polarize magnetic material in a north-south or south-north orientation, with a 1 bit encoded as a change of orientation and a 0 corresponding to no transition. In practice, it gets pretty high-tech for two main reasons: information density and write or read speed.

The magnetic material is a thin layer of iron oxide grains or, nowadays, a cobalt-based alloy less than 20 nm thick resting on a circular platter of non-magnetic material. In fact, resting has not been the appropriate word for a while because the magnetic domains are no longer horizontal or longitudinal but vertical or perpendicular, thus delivering much higher information density. The hard drives of computers typically have three platters and either 3 or 6 heads depending on whether the read and write heads are combined or separate. The platter being read spins and the air flow lifts the head just a few nanometres above the surface, close enough to detect the transitions in magnetic fields.

So far so good, except that, considering the number of bits that need to be read, the platter spins really fast; the industry standard for consumer HDDs is 5,400 rpm or 7,200 rpm, which translates into 90 or 120 revolutions per second. At these speeds, and with the compounding effects of dilation through heat and mechanical deformation over time, it is difficult for the read system to be sure exactly how many zeros have passed under the head, since the speed at which the magnetic domains of a particular region of the platter pass under the head at a particular time can vary by a small fraction – enough to run the risk of bits being missed when there are too many zeros in a row. This challenge is shared by other telecommunications systems where the reading of each bit value cannot be exactly synchronized to a clock. Furthermore, due to the nature of magnetic fields, there is a limit to how closely changes in polarity can be packed over very narrow spaces.
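A back-of-the-envelope calculation gives a sense of the speeds involved; the 3 cm track radius below is an assumed, representative value.

```python
import math

rpm = 7200
radius_m = 0.03                      # assumed track radius of 3 cm
revolutions_per_s = rpm / 60         # 120 revolutions per second
surface_speed = 2 * math.pi * radius_m * revolutions_per_s
print(f"{revolutions_per_s:.0f} rev/s, ~{surface_speed:.1f} m/s under the head")   # ~22.6 m/s
# A 1% timing error accumulated over a run of a hundred identical bits amounts to a
# whole bit period, which is why long runs of zeros are risky without extra coding.
```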

Nothing a coding convention called “run-length limited” (RLL) can’t sort out though. RLL specifies 2 parameters called “d” and “k”, the former being the minimum number of zeros before a change of polarity representing a 1 can be written and the latter being the maximum number of zeros that can be written consecutively. The idea is to ensure the 1s are given enough space on the platter but are never too far apart. Consequently, sometimes a 1 needs to be forced and sometimes a 0 does, so the magnetic 1s and 0s can no longer match the bits one wants to encode. Not a problem: these just need to be mapped onto a binary code that meets the convention requirements; in other words, the binary information needs to be encoded as a slightly different binary information, still made of 1s and 0s. For example, in (1,7) RLL where d=1 and k=7, if we want to encode the 2 bits 10, these would be translated as 001 on the magnetic storage, and when four consecutive bits have two zeros in the middle they are mapped onto a 6-bit code, e.g. 1000 would become 001000. If you want to learn more about this, the link for the Wikipedia entry on run-length limited coding is included at the end of the chapter.
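The constraint itself is easy to express in code. The sketch below checks that an encoded bit string respects given d and k values, and records the two example mappings quoted above – which are only a fragment of the full (1,7) RLL translation table.

```python
def respects_rll(encoded: str, d: int, k: int) -> bool:
    zero_runs = encoded.split("1")
    interior_runs = zero_runs[1:-1]                 # runs of zeros between two consecutive 1s
    if any(len(run) < d for run in interior_runs):  # 1s must be separated by at least d zeros
        return False
    if any(len(run) > k for run in zero_runs):      # never more than k zeros in a row
        return False
    return True

# The two mappings quoted above (a fragment of the full table, not the whole code)
partial_table = {"10": "001", "1000": "001000"}

print(respects_rll(partial_table["10"] + partial_table["1000"], d=1, k=7))   # True
print(respects_rll("0110", d=1, k=7))   # False: the two 1s are adjacent, violating d=1
```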

Before the current HDD became mainstream, some of us may recall the quaint but slow floppy disks, a real icon still present as the symbol for “saving” your work in many software applications, such as the one I am using to type these words. The main consumer version, which became ubiquitous, had a disk diameter of 3.5 inches (just under 9 cm) and a capacity of around 800 KB, later 1.44 MB in the high-density version, and one would load a game, a piece of software or even an operating system using one or several of them. Different information density, different read and write speed, only one platter, but the same overarching data storage technology.

While magnetic storage remained the norm for use cases requiring the ability to erase and rewrite information regularly, for write-once or read-only memory it is optical data storage that made the floppy disk obsolete, as it did the tape cassette for both audio and video content.

e) Optical data storage

In the initial read-only versions of the digital compact disc, the CD-ROM, the readout of the binary information suffers from the same potential speed variation and data throughput issues as the HDD platter, so an RLL code is also used to translate between two sets of binary data. In the case of magnetic storage, a 1 was represented by a change in the orientation of the magnetic field, whereas in a CD-ROM it is a change in the depth of the track. This means the information is embedded in the form of “pits” interspersed with the non-pit flat surface called “land”, with the length of each pit or land reflecting a certain number of zeros.
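Under this scheme, reading the disc amounts to turning a sequence of pit and land lengths back into bits: every transition marks a 1 and every position without a change is a 0. The decoder below is a simplification of that principle (the actual disc format layers a specific RLL code and framing on top of it), with run lengths expressed in channel-bit units.

```python
def runs_to_bits(run_lengths: list[int]) -> str:
    # Each pit or land of length n contributes a transition (1) followed by n-1 "no change" positions.
    return "".join("1" + "0" * (length - 1) for length in run_lengths)

print(runs_to_bits([3, 4, 3]))   # 1001000100
```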

The readout of the data thus hinges on the ability to detect changes in depth, and this is accomplished by focusing a laser on the track, which is coated with a reflective surface. You may think one just needs to time the light, since it would take a little longer to hit the deeper reflective surface and be sensed, but we are talking literal light speed and the indents are a fraction of a micrometre in depth, so this would be both challenging and impractical if there were any wobbles during the readout. It is much more clever than that…

The main property of a laser lies in its coherence: the photons emitted have the same frequency, direction of travel and polarization, meaning the electromagnetic waves keep a fixed phase relationship with one another. If you wish to understand more about how this is achieved, you can refer to the link for the Wikipedia entry on stimulated emission in the last section. Through a bit of engineering, the laser beam is split into several beams and passed through filters converting the main three beams (one central beam framed by 2 secondary ones) into circularly polarized light. The trick here is that the depth of the pit is one quarter of the radiation wavelength (after adjusting for the refraction index), so if the central beam hits a pit, it travels two quarter-wavelengths further than a side beam reflecting on land (one quarter on the way down the pit and one quarter on the way back up after being reflected), and this half-phase difference means some of the waves cancel each other out and less light, or none, is received by the sensor. The same happens if the central beam hits the land while one or both of the side beams are reflected by a pit. Thus a sensed change in light intensity announces a 1 and no change is interpreted as a 0, before going through the relevant RLL translation.
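The quarter-wavelength figure can be checked with a quick calculation, using the typical values of a 780 nm infrared CD laser and a polycarbonate refractive index of about 1.55.

```python
wavelength_nm = 780          # CD infrared laser
refractive_index = 1.55      # approximate value for polycarbonate
pit_depth_nm = wavelength_nm / (4 * refractive_index)
print(f"~{pit_depth_nm:.0f} nm")   # ~126 nm, indeed a small fraction of a micrometre
# Down and back up adds two quarter-wavelengths of path, i.e. a half-wave shift,
# which is what makes the reflected beams partly cancel at the sensor.
```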

This matter of wavelength also has implications in terms of data density because the shorter the wavelength, the shorter the pits can be and the higher the information density on a given medium. And so we went from infrared lasers for CD-ROMs storing up to 700 MB (B is for bytes, not bits) to red lasers for DVDs offering 4.7 GB and blue-violet light for Blu-ray discs achieving an impressive capacity of 25 GB, enough for entire movies in high definition.

Compared to the reading technology, the writing of CD-ROMs is less complex and can be done at scale by pressing a mold featuring the negative image of the pits and lands into the main CD material (polycarbonate resin) before covering the tracks with a thin layer of reflective aluminium. As for the mold, it is created by replicating the profile of a master CD, applying a thin layer of metal and then coating it in plastic. And the master? It relies on a manufacturing process analogous to the creation of microchips described in S4 Section 1.d, with the application of a photoresist layer and the flashing of a laser to expose selected areas. The exposed photoresist is then removed using a developer and washed away, thus creating the desired pits.

Obviously, this doesn’t quite work for individual users looking to write data on their own CDs, and thus there are differences in terms of both materials and technologies between ROM, recordable (CD-R) and rewritable (CD-RW) discs.

A CD-R features a spiral groove covered with an extremely thin layer of dye and, on top of this, a reflecting layer made of metal (aluminium, gold, silver or a silver alloy) then a protective one made of lacquer – with the laser beamed from underneath, thus crossing the dye layer before reaching the reflective one. Therefore, instead of changes in depth via the use of pits, the 1 bits are encoded by changes in reflectivity detectable by the laser sensor, and these changes are created by heating up the dye with a higher-powered writing laser. Hence the process is colloquially known as burning data onto a CD. However, this process is irreversible, so the data cannot be erased and the burnt areas cannot be rewritten.

For the CD-RW, a different material is layered which can be altered back and forth between two states: it is deposited with a polycrystalline structure (not as ordered as crystal but still very structured) and, when liquefied, it loses much of its order and becomes amorphous, losing its reflective quality in the process. Unlike the burning of the CD-R, this process is reversible by heating up the amorphous area without melting it. Conceptually, this means the 1 bit is instantiated as phase changes of matter. Clever.

Finally, a couple of remarks on the rotation speed of discs. The first is that, to maintain a constant rate of data being read (also known as constant linear velocity), the rotation speed has to slow down for data located closer to the edge (the track starts inside and spirals outward); the ratio on a standard disc with a diameter of 12 cm is about 2.4x. The second has to do with the writing and reading speed, which is shown as a multiplier such as 52x in the modern versions. This multiple is benchmarked against the original CD-ROM speed of 150 kB/s. However, for DVDs the 1x standard is ~1.39 MB/s.
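The numbers behind both remarks can be worked out in a few lines; the 25 mm and 58 mm program-area radii are typical published values for a 12 cm disc, and the 16x DVD figure is just a common drive rating used for illustration.

```python
inner_radius_mm, outer_radius_mm = 25, 58
print(f"rotation ratio ~{outer_radius_mm / inner_radius_mm:.1f}x")   # ~2.3x, in line with
# the ratio quoted above: the disc spins that much faster on the innermost tracks

cd_1x_kB_s = 150     # 1x CD-ROM transfer rate
dvd_1x_MB_s = 1.39   # 1x DVD transfer rate
print(f"52x CD drive: ~{52 * cd_1x_kB_s / 1000:.1f} MB/s")   # ~7.8 MB/s
print(f"16x DVD drive: ~{16 * dvd_1x_MB_s:.1f} MB/s")        # ~22.2 MB/s
```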

f) Trivia – The vinyl record and the cassette

A bit of time travel to conclude this information-heavy chapter; let’s revisit data storage media from yesteryear, though it must be acknowledged both still have niche markets.

Whilst we are still in the groove, we can start with the vinyl record. These are made of PVC (polyvinyl chloride) and come in various sizes, the main ones being 7 inches, used for two singles (one per side), and 12 inches for full albums. Unlike a traditional (but more modern) CD-ROM, a vinyl record rotates at a constant speed – it has a constant angular velocity rather than a constant linear velocity – and this rate of rotation is often a byword for the size of the disk: the 7-inch is called a 45, meaning it spins at a rate of 45 revolutions per minute, and the 12-inch is known as a 33 (its rpm is actually 33⅓) or alternatively as an LP (for long play). Accordingly, the data needs to be packed more densely as the needle spirals inward along the groove, starting from the outer edge.

If RAM, HDDs and CDs rely on electric, magnetic and optical storage respectively, the nature of the data writing and reading for vinyl records can best be described as mechanical, fittingly so since sound propagates as pressure waves, also mechanical in nature. So the vibrations that need to be generated by the device can be read as physical oscillations of the needle moving sideways along the groove. We were not in the digital age then, so no discretization was necessary: it is possible, and obviously preferable, to have an analog encoding of the data with smooth and progressive movements. Except that sideways motion works well for a single data channel, whereas for a stereo player two channels need to be encoded. One would think sideways and vertical; unfortunately vertical oscillation would put significant strain on the needle, be subject to interference from dust, and behave slightly differently for mechanical reasons.

Not to worry: a sprinkle of engineering, a dash of math and voila – rotate these vertical and sideways axes by 45° to form a V-cut and the problem is solved. By summing the movements across both channels we can derive the vertical movement, and by subtracting them we end up with the sideways movement, with 0 representing the centre of the groove.
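In code, this sum-and-difference trick is a one-liner each way; the sketch below follows the geometric convention of the paragraph above, with the two channel axes sitting at 45° on either side of the vertical (the 1/√2 factor simply normalises the rotation).

```python
import math

def stylus_motion(channel_1: float, channel_2: float) -> tuple[float, float]:
    sideways = (channel_2 - channel_1) / math.sqrt(2)   # difference of the two channels
    vertical = (channel_1 + channel_2) / math.sqrt(2)   # sum of the two channels
    return sideways, vertical

print(stylus_motion(1.0, 1.0))    # (0.0, ~1.41): identical signals move the needle vertically
print(stylus_motion(1.0, -1.0))   # (~-1.41, 0.0): opposite signals move it sideways
```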

Vinyl offered very satisfying sound quality but it was neither portable nor convenient for recording sounds. For this, magnetic tape was the preferable technology and, in its cassette format, it proved widely popular with music enthusiasts who could load their Walkman and listen to their favourite artists on the go, and later do the same with the tape players in cars. Production of the Walkman by Sony ceased in 2010, though cassettes are still produced for some markets. These cassettes featured capacities of 60 to 120 minutes in their most popular formats, half of it on each side, and one would have to flip the tape manually in the most compact players. The write and read technology is essentially similar to the one described for the HDD except that the data stored is analog rather than binary. The thin tape is layered with a mix of cobalt and iron oxide, ferromagnetic materials whose polarization can be read and changed by the electromagnet coil head (more sophisticated versions have dedicated heads for reading, writing and erasing). For stereo, 2 mono tracks are lined side by side across the width of the tape for each direction of play, and consumer tapes are recorded and read at a constant linear speed of just under 5 cm/s (4.76 cm/s being the compact cassette standard) – it would be significantly higher for professional audio recording.
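A last worked number: the length of tape a C60 cassette needs at the standard speed.

```python
tape_speed_cm_s = 4.76        # standard compact cassette speed
minutes_per_side = 30         # a C60 holds 30 minutes per side
tape_length_m = tape_speed_cm_s * minutes_per_side * 60 / 100
print(f"~{tape_length_m:.0f} m of tape")   # ~86 m, close to the roughly 90 m in an actual C60
```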

g) Further reading (S4C2)

Suggested reads:

Previous Chapter: From logic gates to microchips

Next Chapter: Programming and operating computers
