LINFO

Database Definition



A database is a set of data that has a regular structure and that is organized in such a way that a computer can easily find the desired information.

Data is a collection of distinct pieces of information, particularly information that has been formatted (i.e., organized) in some specific way for use in analysis or making decisions.

A database can generally be looked at as being a collection of records, each of which contains one or more fields (i.e., pieces of data) about some entity (i.e., object), such as a person, organization, city, product, work of art, recipe, chemical, or sequence of DNA. For example, the fields for a database that is about people who work for a specific company might include the name, employee identification number, address, telephone number, date employment started, position and salary for each worker.
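
The record-and-field idea can be illustrated with a minimal Python sketch of the employee example. The class name Employee, the field names and all of the sample values are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Employee:
    """One record: each attribute is a field describing a single worker."""
    name: str
    employee_id: int
    address: str
    telephone: str
    start_date: date
    position: str
    salary: float


# A tiny collection of records (all values are made up).
employees = [
    Employee("Ada Example", 1001, "1 Main St", "555-0100",
             date(2001, 3, 15), "Engineer", 72000.0),
    Employee("Bob Sample", 1002, "2 Oak Ave", "555-0101",
             date(2004, 9, 1), "Analyst", 58000.0),
]

# Every record shares the same set of field names.
field_names = [f.name for f in fields(Employee)]
print(field_names)

# Finding the desired information: look up a record by one of its fields.
matches = [e for e in employees if e.employee_id == 1002]
print(matches[0].name)
```

Because every record has the same fields, a program can locate a worker by any of them, which is the regular structure that the definition at the top of this page refers to.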

Several basic types of database models have been developed, including flat, hierarchical, network and relational. Such models describe not only the structure of the conforming databases but also the operations that can be performed on them. Typically, a database has a schema, which is a description of its structure in terms of the model, including the types of entities that are in it and the relationships among them.

Flat databases are the simplest type. They were long the dominant type, and they can still be useful, particularly for very small-scale and simple applications. An example is a single table on paper or in a computer file that contains a list of companies with information about each, such as name, address, product category and contact name. A flat database can also exist in the form of a set of index cards, each containing the information for one of the entities.
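
A flat database in a computer file can be sketched as a single table of comma-separated values. The following is only an illustration: the column names and company data are invented, and the file contents are held in a string so that the example is self-contained.

```python
import csv
import io

# Hypothetical contents of a flat file: one table, one line per company.
FLAT_FILE = """name,address,product_category,contact_name
Acme Widgets,12 Hill Rd,hardware,J. Doe
Blue Sky Foods,9 Lake St,groceries,R. Roe
Crest Tools,44 Pine Ave,hardware,M. Poe
"""

# Each row becomes a dict that maps a column name to its value.
rows = list(csv.DictReader(io.StringIO(FLAT_FILE)))

# There is no index, so a lookup has to scan every row in turn.
hardware = [row["name"] for row in rows if row["product_category"] == "hardware"]
print(hardware)
```

Scanning every row is perfectly adequate for a short list, but the cost grows with the size of the table.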

The development and subsequent rapid advance of electronic computers in the second half of the twentieth century led to the development of database models that are far more efficient for dealing with large volumes of information than flat databases. The most notable is the relational model, which was proposed by E. F. Codd in 1970. Codd, a researcher at IBM, criticized existing data models for their inability to distinguish between the abstract descriptions of data structures and descriptions of the physical access mechanisms.

A relational database is a way of organizing data such that it appears to the user to be stored in a series of interrelated tables. Interest in this model was initially confined to academia, perhaps because the theoretical basis is not easy to understand, and thus the first commercial products, Oracle and DB2, did not appear until around 1980. Subsequently, relational databases became the dominant type for high-performance applications because of their efficiency, ease of use, and ability to perform a variety of useful tasks that had not been originally envisioned.
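
The notion of interrelated tables can be made concrete with a small sketch that uses the sqlite3 module from the Python standard library. The table names, column names and rows are assumptions made for this example, and SQLite was chosen only because it requires no separate server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employee VALUES
        (1001, 'Ada Example', 1),
        (1002, 'Bob Sample', 2),
        (1003, 'Cy Placeholder', 1);
""")

# The shared dept_id column is what relates the two tables.
joined = conn.execute("""
    SELECT employee.name, department.name
    FROM employee
    JOIN department ON employee.dept_id = department.dept_id
    ORDER BY employee.emp_id
""").fetchall()
print(joined)
conn.close()
```

Each department name is stored only once, and the join reassembles the combined view on request, so that the user sees tables rather than the way in which the data is physically stored.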

Object-oriented databases became a new focus of research during the 1990s, in part because of the great success that the object-oriented concept was having in programming languages (e.g., C++ and Java). Such databases have had some success in fields in which it is necessary to accommodate bulkier and more complex data than relational systems can easily cope with, such as multimedia and engineering data, and some object-oriented concepts were thus integrated into leading commercial relational database products.

During the past few years, considerable attention has been devoted to XML (Extensible Markup Language) databases because of their ability to eliminate the traditional division between documents and data by breaking down the former into more atomistic, machine-searchable units. Some XML concepts are likewise being integrated into the mainstream relational database products.
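
The way in which XML breaks a document down into machine-searchable units can be sketched with the xml.etree.ElementTree module from the Python standard library. This is not an XML database product, only an illustration of the underlying idea, and the recipe document with its element and attribute names is invented.

```python
import xml.etree.ElementTree as ET

# A hypothetical document marked up so that its parts are addressable.
DOCUMENT = """
<recipe>
  <title>Plain Pancakes</title>
  <ingredient quantity="2" unit="cup">flour</ingredient>
  <ingredient quantity="1" unit="cup">milk</ingredient>
  <ingredient quantity="2" unit="each">egg</ingredient>
  <step>Mix everything.</step>
  <step>Fry in a hot pan.</step>
</recipe>
"""

root = ET.fromstring(DOCUMENT.strip())

# Individual units of the document can be retrieved by element name ...
title = root.findtext("title")

# ... or by attribute value, much as fields are searched in a database.
cup_items = [el.text for el in root.findall("ingredient[@unit='cup']")]

print(title)
print(cup_items)
```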

Hypermedia can be considered to be a type of network database. Hypermedia is a computer-based information retrieval system that enables a user to gain or provide access to text, images (both still and moving) and sound via hyperlinks. Most hypermedia consists of hypertext, and the largest example by far is, of course, the web.

Such databases differ greatly from relational databases in their structure and in the means of accessing and maintaining them, and in some ways they can appear to be highly anarchic. One of the most important advances in the past few years has been the development of high-performance search mechanisms, most notably Google, that greatly increase the usefulness and convenience of such databases.

A database management system (DBMS) is software that has been created to allow the efficient use and management of databases, including ensuring that data is consistent and correct and facilitating its updating. For small, single-user databases, all functions are often managed by a single program; for larger and multi-user databases, multiple programs are usually involved and a client-server architecture is generally employed. The first DBMSs were developed in the 1960s in an attempt to make more effective use of the new direct access storage devices (i.e., hard disk drives) that were becoming available as supplements and eventual replacements for punched cards and magnetic tape. The word database is commonly used in a broad sense to refer not only to structured data but also to the DBMS that is used with it.
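
One way in which a DBMS helps to keep data consistent and correct is by enforcing declared constraints. The following sketch again uses SQLite through the sqlite3 module in Python; the table, the constraints and the deliberately faulty rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL NOT NULL CHECK (salary >= 0)
    )
""")
conn.execute("INSERT INTO employee VALUES (1001, 'Ada Example', 72000)")
conn.commit()

bad_rows = [
    (1001, "Duplicate Id", 50000),  # repeats an existing primary key
    (1002, None, 50000),            # the name is missing
    (1003, "Negative Pay", -1),     # violates the CHECK constraint
]

rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        # The DBMS refuses the row instead of storing inconsistent data.
        rejected.append(row[0])
        print("rejected:", exc)

count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(count)
conn.close()
```

All three faulty rows are refused, and the table still contains only the one valid record.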

Some types of databases, particularly relational databases, can be easily manipulated, and information can be obtained from them in an extremely flexible manner by using queries, which are statements in specialized languages. The dominant query language is the semi-standardized SQL (Structured Query Language), which differs slightly according to the specific DBMS. Although critics claim that SQL is not consistent with the relational model, it works extremely well in practice and no replacement is on the horizon.
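
The flexibility of queries can be seen by asking two different questions of the same table. This sketch uses the SQL dialect of SQLite through the sqlite3 module in Python; the table and its contents are invented, and other DBMSs might require slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, position TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ada Example", "Engineer", 72000),
        ("Bob Sample", "Analyst", 58000),
        ("Cy Placeholder", "Engineer", 65000),
    ],
)

# Query 1: select the rows that satisfy a condition.
well_paid = conn.execute(
    "SELECT name FROM employee WHERE salary > ? ORDER BY name", (60000,)
).fetchall()

# Query 2: summarize the same data in a different way.
avg_by_position = conn.execute(
    "SELECT position, AVG(salary) FROM employee "
    "GROUP BY position ORDER BY position"
).fetchall()

print(well_paid)
print(avg_by_position)
conn.close()
```

Neither question had to be anticipated when the table was designed; each query simply states what is wanted and leaves it to the DBMS to work out how to retrieve it.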

Databases can be stand-alone programs, or they can be built into other programs, including those that are considered part of the operating system. The typical computer contains numerous databases of the latter type. They include the many plain text (i.e., human-readable characters) configuration files that allow users to easily modify system behavior on Linux and other Unix-like operating systems, such as /etc/fstab, /etc/hosts and /etc/passwd. Likewise, web browsers contain simple databases listing the most recently visited web sites and user preferences, and e-mail programs contain simple databases listing e-mail addresses, recent communications and user preferences.
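
A plain text configuration file such as /etc/passwd can be read as a small database in which each line is a record and each colon-separated value is a field. The Python sketch below parses sample text in that format rather than the real file; the accounts shown, the dictionary keys and the function name parse_passwd are made up for illustration.

```python
# Sample text in the seven-field, colon-separated passwd format.
SAMPLE_PASSWD = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice Example:/home/alice:/bin/bash
"""

# Labels for the seven fields (the key names are chosen for this example).
FIELDS = ("name", "password", "uid", "gid", "gecos", "home", "shell")


def parse_passwd(text):
    """Turn passwd-style text into a list of records (one dict per line)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        records.append(dict(zip(FIELDS, line.split(":"))))
    return records


records = parse_passwd(SAMPLE_PASSWD)

# A simple query: which accounts use /bin/bash as their login shell?
bash_users = [r["name"] for r in records if r["shell"] == "/bin/bash"]
print(bash_users)
```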

The term database was originally written as data base, and it may have been first used in 1963 at a symposium sponsored by the System Development Corporation of Santa Monica, California. The use of the term database (single word) became popular in some European countries in the early 1970s, and it subsequently spread to the U.S.






Created June 22, 2006.
Copyright © 2006 The Linux Information Project. All Rights Reserved.