CS Ramble — Set 1b - memory, text, numbers

This post is part of set 1 of A Ramble Around CS.

Computer memory

You can imagine all your computer’s memory as a series of little boxes, numbered from 0 (because we’re computer people), and going up, and up, and up. The laptop I’m typing this on has 32GB of memory, or 34,359,738,368 little boxes! [1]

Boxes: 0 1 2 3 4 5 6 7 8 9 10 11 · · · 34,359,738,367

We’ll get into “bits” and “bytes” and counting in “binary” later, but for now, let’s just take it as given that each of those boxes holds a single “byte”: a number from 0 to 255.
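If you'd like to poke at some boxes yourself, here is a minimal sketch in Python. It assumes that a `bytearray` is a fair stand-in for a small stretch of memory; the name `memory` and the choice of 12 boxes are just for illustration.

```python
# A tiny stand-in for memory: 12 boxes, each holding one byte (0 to 255).
memory = bytearray(12)      # every box starts out holding 0

memory[0] = 42              # put 42 in box number 0
memory[11] = 255            # 255 is the largest value a box can hold

print(list(memory))

# A value outside 0..255 does not fit in a box.
try:
    memory[1] = 256
except ValueError as err:
    print("box 1 rejected 256:", err)
```

Real memory has no way to "reject" a value; the error here is just Python being helpful about the 0 to 255 limit.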

Text

If we want to store letters in the boxes, we have to come up with some kind of “Character Encoding”, assigning a number to each letter.

I’d use 1 for “A”, 2 for “B”, …, except it’s probably better to just jump straight to the actual numbers your computer (usually) uses (called “ASCII” [3]) so we get used to seeing the real numbers:

“Hello, world”

Box  Number  Character
0    72      H
1    101     e
2    108     l
3    108     l
4    111     o
5    44      ,
6    32      (space)
7    119     w
8    111     o
9    114     r
10   108     l
11   100     d
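You can check those numbers without looking anything up. This is a small sketch using Python's built-in ASCII encoding; the variable names `text` and `codes` are mine.

```python
text = "Hello, world"

# Encode the text as ASCII: one byte (one box) per character.
codes = list(text.encode("ascii"))
print(codes)

# Show which number landed in which box.
for box, (character, number) in enumerate(zip(text, codes)):
    print(box, number, repr(character))

# Going the other way turns the numbers back into text.
round_trip = bytes(codes).decode("ascii")
print(round_trip)
```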

Numbers

Storing numbers in the boxes is easy if they’re between 0 and 255. Just stick them in there!

Value   42   17    0  255
Box      0    1    2    3

For negative numbers, we pretend some of the numbers are negative:

Value            0  1  2  · · ·  126  127   128   129   130  · · ·  253  254  255
Interpretation   0  1  2  · · ·  126  127  -128  -127  -126  · · ·   -3   -2   -1

Probably, it’s better to think of it in modular (clock) arithmetic:

[Figure: Signed interpretation of byte values, drawn as a clock face. Each position shows a value and its interpretation: 0 is 0, 1 is 1, 2 is 2, 3 is 3, 4 is 4, · · ·, 126 is 126, 127 is 127, 128 is -128, 129 is -127, 130 is -126, · · ·, 252 is -4, 253 is -3, 254 is -2, 255 is -1.]

Since there are 256 spots in a full circle, going 255 spaces clockwise (adding 255) is the same as going one space counter-clockwise (subtracting 1).
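Here is the clock idea as a quick sketch in Python. The helper names `as_signed` and `as_byte` are made up for this example, and the rule "128 and up counts as negative" is the one from the table above.

```python
def as_signed(value):
    """Read a byte value (0..255) as a signed number (-128..127)."""
    return value - 256 if value >= 128 else value

def as_byte(number):
    """Wrap any whole number onto the 256-spot clock face (0..255)."""
    return number % 256

print(as_signed(127), as_signed(128), as_signed(255))

# Adding 255 and subtracting 1 land on the same spot of the clock.
start = 5
print(as_byte(start + 255), as_byte(start - 1))

# Storing -1 in a box really stores 255.
print(as_byte(-1))
```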

But that still only gives us 256 different values.

Bigger numbers

For bigger numbers, we’ll have to combine pairs of bytes, or groups of 4 or 8 (or more) bytes.

In normal Arabic numerals, each digit has ten choices, 0–9, and then we spill over to the next space, whose value is multiplied by ten.

We can do the same with bytes: we have 256 choices, 0–255, and then spill over to the next space, whose value is multiplied by 256.

Value                          0      1      2  · · ·    254    255    256    257    258  · · ·  65534  65535
Representation as 2 bytes      0      0      0  · · ·      0      0      1      1      1  · · ·    255    255
                               0      1      2  · · ·    254    255      0      1      2  · · ·    254    255

So, 255 = 0×256 + 255, and 258 = 1×256 + 2. I’ve put the “×256” byte first, to match how in the number “12”, the “×10” digit goes first. Which byte you put first is a choice, and most current computers actually put the littlest byte first and the “×256” byte second. This is called “little-endian”, because the littlest byte comes first. The opposite “endianness” is of course “big-endian”. When you’re storing things on disk, or sending numbers to a Raspberry Pi with a laser, you can pick your own endianness!
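To see both byte orders side by side, here is a short sketch in Python. `split_two_bytes` is a helper I made up for this example; `int.to_bytes` and `int.from_bytes` are Python built-ins that take the byte order as an argument.

```python
def split_two_bytes(number):
    """Split a number from 0 to 65535 into its "×256" byte and the leftover byte."""
    return divmod(number, 256)

print(split_two_bytes(255))   # 255 = 0×256 + 255
print(split_two_bytes(258))   # 258 = 1×256 + 2

# The same number, stored with either byte first.
big = (258).to_bytes(2, "big")
little = (258).to_bytes(2, "little")
print(list(big), list(little))

# Reading the same two bytes with the wrong endianness gives a different number.
right = int.from_bytes(bytes([1, 2]), "big")
wrong = int.from_bytes(bytes([1, 2]), "little")
print(right, wrong)
```

That last pair is why the two ends of a file format or a wire protocol have to agree on endianness up front.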

Just like with individual bytes, you can use half the space for negative numbers, and turn the range 0…65535 into -32768…32767.
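The same trick as for single bytes, sketched for pairs of bytes. `as_signed_16` is a hypothetical helper name, and the example assumes the big-endian order used in the table above.

```python
def as_signed_16(value):
    """Read a two-byte value (0..65535) as a signed number (-32768..32767)."""
    return value - 65536 if value >= 32768 else value

print(as_signed_16(32767), as_signed_16(32768), as_signed_16(65535))

# Python can do the same interpretation straight from the bytes.
minus_two = int.from_bytes(bytes([255, 254]), "big", signed=True)
print(minus_two)
```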

A note on names for things

All of these things have multiple names you might run into:

Bytes  Signed?  Min     Max     Common names
1      no       0       255     (unsigned) byte, (unsigned) char, uint8
1      yes      -128    127     (signed) char, byte, int8
2      no       0       65535   unsigned int, unsigned short, uint16
2      yes      -32768  32767   int, short, int16
4      no       0       2³²-1   unsigned int, unsigned long, uint32
4      yes      -2³¹    2³¹-1   int, long, int32, rune
8      no       0       2⁶⁴-1   unsigned long, uint64, uint
8      yes      -2⁶³    2⁶³-1   long, int64, int

Note that many of the terms are ambiguous, between programming languages, or even between computers. In Go, for example, “int” can mean 32 or 64 bits, depending on whether your computer is running in 32- or 64-bit mode. I’ve marked ambiguous names in italics.
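The Min and Max columns all follow from the number of bytes, so you never need to memorize them. A minimal sketch, with made-up function names, assuming the "half the space for negative numbers" scheme described earlier:

```python
def unsigned_range(n_bytes):
    """Smallest and largest value that fit in n_bytes, with no negatives."""
    return 0, 256 ** n_bytes - 1

def signed_range(n_bytes):
    """Same number of values, but half of them are treated as negative."""
    half = 256 ** n_bytes // 2
    return -half, half - 1

for n_bytes in (1, 2, 4, 8):
    print(n_bytes, unsigned_range(n_bytes), signed_range(n_bytes))
```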

In the next part, we’ll discuss how your program’s variables point to things in memory.


  1. That’s 32×1024×1024×1024. 1024 bytes is a “kilobyte”, because 1024 is close to 1000, 1024×1024=1048576 bytes is a “megabyte”, because it’s close to a million bytes, and 1024×1024×1024=1073741824 bytes is a “gigabyte”. [2]

  2. Technically, according to the SI prefixes, a “kilobyte” (kB) is exactly 1000 bytes, and you should use “kibibyte” (KiB) to refer to 1024 bytes. Since “kibibyte” sounds more like a Pokémon or a kind of dog food for small, yappy dogs, nobody uses it. Except hard disk manufacturers, who insist on using exact powers of 10 to make their hard disks sound 10% bigger (for terabytes) than they really are.

  3. If you’re on Linux or macOS, you can type man ascii in your terminal to see a list of ASCII codes. Ignore the hexadecimal (for now) and octal (forever) sections; the decimal section at the bottom is useful.