
Byte Definition


A byte (represented by the upper-case letter B) is a contiguous sequence of a fixed number of bits that is used as a unit of memory, storage and instruction execution in computers.

A bit (represented by a lower case b) is the most basic unit of information in computing and communications. Every bit has a value of either zero or one. Although computers usually provide ways to test and manipulate single bits, they are almost always designed to store data and execute instructions in terms of bytes.

In the early days of computing, the number of bits in a byte varied according to the model of computer and its operating system. For example, the PDP-7, for which the first version of UNIX was written, was organized around 18-bit words rather than eight-bit bytes. Today, however, a byte virtually always consists of eight bits.

Whereas a bit can have only one of two values, an eight-bit byte (also referred to as an octet) can have any of 256 possible values, because there are 256 (i.e., 2⁸) possible patterns of zeros and ones for eight successive bits. Thus, an eight-bit byte can represent any unsigned integer from 0 through 255 or any signed integer from -128 through 127. It can also represent any character (i.e., letter, number, punctuation mark or symbol) in a seven-bit or eight-bit character encoding system, such as ASCII (the default character encoding used on most computers).
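
As a concrete illustration, the short C program below (a minimal sketch; the file name and output wording are arbitrary, and a reasonably recent C compiler is assumed) prints the number of bits in a byte, the resulting value ranges, and the wrap-around that occurs when a single byte is pushed past its 256th value:

    /* byte_ranges.c -- the 256 possible values of an eight-bit byte */
    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        printf("bits per byte:        %d\n", CHAR_BIT);        /* 8 on modern systems */
        printf("distinct byte values: %d\n", 1 << CHAR_BIT);   /* 2^8 = 256 */
        printf("unsigned char range:  0 to %d\n", UCHAR_MAX);  /* 0 to 255 */
        printf("signed char range:    %d to %d\n", SCHAR_MIN, SCHAR_MAX);  /* -128 to 127 */

        unsigned char b = UCHAR_MAX;  /* 255, the largest value one byte can hold */
        b = b + 1;                    /* arithmetic wraps around to 0 */
        printf("255 + 1 in one byte:  %d\n", b);
        return 0;
    }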

Multiple bytes are used to represent larger numbers and to represent characters from larger character sets. For example, two bytes (i.e., 16 bits) can store any one of 65,536 (i.e., 2¹⁶) possible values, that is, the unsigned integers from 0 through 65,535 or the signed integers from -32,768 through 32,767. Likewise, the range of integer values that can be stored in 32 bits (i.e., four bytes) is 0 through 4,294,967,295 for unsigned integers, or -2,147,483,648 through 2,147,483,647 for signed integers.
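
These two- and four-byte ranges can be verified with the fixed-width integer types of standard C, as in the following sketch (assuming a C99 or later compiler; long is used for printing only because it is guaranteed to be at least 32 bits wide):

    /* multibyte_ranges.c -- value ranges of two-byte and four-byte integers */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* two bytes (16 bits): 2^16 = 65,536 possible values */
        printf("16-bit unsigned: 0 to %lu\n", (unsigned long)UINT16_MAX);
        printf("16-bit signed:   %ld to %ld\n", (long)INT16_MIN, (long)INT16_MAX);

        /* four bytes (32 bits): 2^32 = 4,294,967,296 possible values */
        printf("32-bit unsigned: 0 to %lu\n", (unsigned long)UINT32_MAX);
        printf("32-bit signed:   %ld to %ld\n", (long)INT32_MIN, (long)INT32_MAX);
        return 0;
    }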

As many as 32 bits (i.e., four bytes) can be required to represent a character encoded in Unicode, which is an attempt to provide a unique code point (i.e., identification number) for every character currently or historically used by the world's languages. However, many of the world's languages can be written with a single-byte character encoding because they use alphabetic scripts, which generally have fewer than 256 characters.
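
For example, the fixed-width UTF-32 encoding of Unicode uses four bytes for every character, whereas the variable-width UTF-8 encoding uses between one and four bytes per character. The sketch below (assuming a C99 or later compiler; the hexadecimal escape sequences are the standard UTF-8 byte values for the characters shown) counts the bytes occupied by a few sample characters:

    /* utf8_bytes.c -- bytes per character in the UTF-8 encoding of Unicode */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* each string holds one character, written as its UTF-8 byte sequence */
        const char *latin_a = "A";                /* U+0041, plain ASCII letter A     */
        const char *e_acute = "\xC3\xA9";         /* U+00E9, Latin small e with acute */
        const char *euro    = "\xE2\x82\xAC";     /* U+20AC, euro sign                */
        const char *smiley  = "\xF0\x9F\x98\x80"; /* U+1F600, emoji face              */

        printf("U+0041  uses %zu byte(s)\n", strlen(latin_a));   /* 1 */
        printf("U+00E9  uses %zu byte(s)\n", strlen(e_acute));   /* 2 */
        printf("U+20AC  uses %zu byte(s)\n", strlen(euro));      /* 3 */
        printf("U+1F600 uses %zu byte(s)\n", strlen(smiley));    /* 4 */
        return 0;
    }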

The word byte can also refer to a datatype (i.e., category of data) in certain programming languages and database systems. The C programming language, for example, defines a byte as the amount of storage occupied by a char object, which must be at least eight bits wide; its unsigned char datatype is accordingly an integer datatype capable of holding at least 256 different values.
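
Because of this, unsigned char is routinely used in C as a generic byte type for examining the raw bytes of other objects, as in the following sketch (the variable names and the sample value are arbitrary):

    /* byte_type.c -- unsigned char used as a generic byte datatype in C */
    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        /* sizeof is measured in bytes, and sizeof(char) is 1 by definition */
        printf("sizeof(char) = %zu, bits per byte = %d\n", sizeof(char), CHAR_BIT);

        /* any object can be inspected byte by byte through an unsigned char pointer */
        unsigned int value = 0x12345678u;
        const unsigned char *bytes = (const unsigned char *)&value;

        printf("the %zu bytes of 0x12345678:", sizeof value);
        for (size_t i = 0; i < sizeof value; i++)
            printf(" %02X", bytes[i]);   /* byte order depends on the machine (endianness) */
        printf("\n");
        return 0;
    }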


Kilobytes, Megabytes, Gigabytes, Terabytes, Petabytes

Because bytes represent a very small amount of data, for convenience they are commonly referred to in multiples, particularly kilobytes (represented by the upper-case letters KB or just K), megabytes (represented by the upper-case letters MB or just M) and gigabytes (represented by the upper-case letters GB or just G).

A kilobyte is 1,024 bytes, although it is often used loosely as a synonym for 1,000 bytes. A megabyte is 1,048,576 bytes, but it is frequently used as a synonym for one million bytes. For example, a computer that has a 256MB main memory can store approximately 256 million bytes (or characters) in memory at one time. A gigabyte is equal to 1,024 megabytes.

One terabyte (TB) is equal to 1,024 gigabytes, or roughly one trillion bytes. One petabyte (PB) is equal to 1,024 terabytes, or about one million gigabytes. Some supercomputers now have a petabyte of hard disk drive (HDD) capacity and multiple petabytes of tape storage capacity. The prefix peta is an alteration of penta, the Greek word for five.

An exabyte (EB) is 1,024 times larger than a petabyte. The prefix exa is an alteration of hexa, the Greek word for six. As of 2005, exabytes of data were rarely encountered in a practical context. For example, the total amount of printed material in the world has been estimated at around a fifth of an exabyte. However, the total amount of digital data that is now created, captured and replicated worldwide might be several hundred exabytes per year.
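
Each of these binary multiples is simply the previous unit multiplied by 1,024, as the following small C program illustrates (a sketch; the unit abbreviations in the output follow the usage described above):

    /* byte_multiples.c -- binary multiples of the byte as powers of 1,024 */
    #include <stdio.h>

    int main(void)
    {
        const char *units[] = { "kilobyte (KB)", "megabyte (MB)", "gigabyte (GB)",
                                "terabyte (TB)", "petabyte (PB)", "exabyte (EB)" };
        unsigned long long bytes = 1;

        for (int i = 0; i < 6; i++) {
            bytes *= 1024;   /* each unit is 1,024 times the previous one */
            printf("1 %-13s = %llu bytes\n", units[i], bytes);
        }
        return 0;
    }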


Origins

The term byte was coined by Werner Buchholz, a researcher at IBM, in 1956 during the early design phase for the IBM Stretch, the company's first supercomputer. It was a deliberate respelling of the word bite, intended to prevent it from being accidentally shortened to bit. In 1962 Buchholz described a byte as "a group of bits used to encode a character, or the number of bits transmitted in parallel to and from input-output units."

Byte is also sometimes considered a contraction of BinarY digiT Eight. IBM used to teach that a Binary Yoked Transfer Element (BYTE) was formed by a series of bits joined together "like so many yoked oxen." Binary refers to the fact that computers perform all their computations with the base 2 numbering system (i.e., only zeros and ones), in contrast to the decimal system (i.e., base 10), which is commonly used by humans.

The movement toward an eight-bit byte began in late 1956. A major reason that eight was considered the optimal number was that seven bits can define 128 characters (as opposed to only 64 characters for six bits), which is sufficient for the roughly 100 unique codes needed for the upper- and lower-case letters of the English alphabet together with punctuation marks and special characters, while the eighth bit could be used as a parity check (i.e., to confirm the accuracy of the other bits).

This size was later adopted by IBM's highly popular System/360 series of mainframe computers, announced in April 1964, and this was a key factor in the eight-bit byte eventually becoming the industry-wide standard.

If computers were used for nothing other than binary calculations, as some once were, there would be no need for bytes. However, because they are extensively used to manipulate character-based information, encodings for those characters are necessary, and thus so are bytes.




Created May 14, 2005. Updated October 3, 2007.
Copyright © 2005 The Linux Information Project. All Rights Reserved.