LINFO

HTML Definition



Hypertext markup language (HTML) is the basic language used to create documents for the Web and, along with HTTP (hypertext transfer protocol) and URLs (uniform resource locators), is one of the three core technologies of the Web.

Hypertext is text that contains hyperlinks. A hyperlink is an automated cross-reference to another location in the same document or to another document which, when selected by a user, causes the computer to display the linked location or document almost immediately.

A markup language is a set of tags that can be embedded in digital text to provide additional information about it, including its content, structure and appearance. This information facilitates automated operations on the text, including formatting it for display, searching it and even modifying it. Some type of markup language is employed by every word processing program and by nearly every other program that displays text, although such languages and their tags are typically hidden from the user.
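As a rough illustration of how tags facilitate automated operations on text, the following Python sketch separates the text of a small marked-up fragment from its tags and then searches the result. It relies on the standard library's html.parser module; the class name TextExtractor, the function plain_text and the sample sentence are assumptions made for this example only.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gather the text of a marked-up document, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.pieces.append(data)


def plain_text(marked_up):
    """Return the text content of marked_up with all tags removed."""
    extractor = TextExtractor()
    extractor.feed(marked_up)
    extractor.close()
    return "".join(extractor.pieces)


# Hypothetical sample fragment, used only for illustration.
sample = "<p>Markup makes <b>automated</b> searching easier.</p>"
text = plain_text(sample)
print(text)
print("automated searching" in text)
```

Because the tags are distinguishable from the text itself, a program can search across a phrase even when part of it is marked up differently from the rest.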

HTML consists of a set of predefined tags that can be embedded in text by web site designers in order to indicate the details of how web pages are rendered (i.e., converted into a final, easily usable, form) by web browsers. These details include paragraphing, margins, fonts (including style and size), columns, colors (background and text), links, the location of images, text flow around images, tables and user input form elements (such as spaces for adding text and submit buttons).

The tags are enclosed in angle brackets. For example, the tag that is used to indicate the start of a new paragraph is <p>, and the tag that is used to indicate the end of a paragraph is </p>. Likewise, the tag set for indicating bold text is <b> and </b>; and thus the coding for the word bold that appears earlier in this sentence is merely <b>bold</b>.
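To make the pairing of opening and closing tags concrete, here is a minimal Python sketch that walks through a short fragment and reports each tag and text run in the order in which it occurs. It uses the standard library's html.parser module; the names TagLogger and tag_events and the sample fragment are hypothetical and chosen only for this illustration.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each opening tag, closing tag and text run in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def tag_events(html_text):
    """Return the list of (kind, value) events found in html_text."""
    logger = TagLogger()
    logger.feed(html_text)
    logger.close()
    return logger.events


# Hypothetical sample: a paragraph containing one bold word.
for event in tag_events("<p>This word is <b>bold</b>.</p>"):
    print(event)
```

The output lists the paragraph's opening tag first and its closing tag last, with the bold pair nested inside it around the single word that it encloses.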

Tags that are not used to enclose text consist of only a single tag rather than a pair of tags. An example is the tag <br />, which stands for break and is used to start a new line.

In the XHTML-compatible style used here, a forward slash closes single tags and marks the closing tag in tag sets. In a single tag, the slash follows the tag name and a single space, as in <br />. In a closing tag, it immediately precedes the tag name, as in </p>.
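The placement of the slash can be checked mechanically. The following Python sketch classifies a tag according to where the slash appears; it is a simplified illustration rather than a real HTML parser, and the function name classify_tag is an assumption made for this example.

```python
def classify_tag(tag):
    """Classify a tag string as 'opening', 'closing' or 'single'.

    A simplified sketch based only on the position of the forward slash.
    """
    if not (tag.startswith("<") and tag.endswith(">")):
        raise ValueError("not a tag: " + repr(tag))
    inner = tag[1:-1]  # the text between the angle brackets
    if inner.startswith("/"):
        return "closing"  # slash precedes the tag name, e.g. </p>
    if inner.endswith(" /"):
        return "single"   # tag name, a space, then the slash, e.g. <br />
    return "opening"


for tag in ["<p>", "</p>", "<br />"]:
    print(tag, classify_tag(tag))
```

A real parser must also cope with attributes, comments and malformed input, all of which this sketch deliberately ignores.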

HTML is not a programming language, because it does not have any conditionals (e.g., if statements) that allow logic operations, and thus it cannot by itself apply program logic in response to user input. This is not a problem, however, because a variety of programming languages that have such capabilities (such as PHP, Perl and JavaScript) can easily be used together with HTML.
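The division of labor between HTML and a programming language can be sketched as follows. Python is used here purely as a stand-in for the languages named above, and the function name greeting_page and its visitor_name parameter are hypothetical. The program supplies the conditional logic, while HTML merely describes the page that results.

```python
from html import escape


def greeting_page(visitor_name=None):
    """Return a small HTML page whose content depends on a condition."""
    if visitor_name:
        # Escape the name so that any angle brackets in it are not read as tags.
        message = "Welcome back, <b>" + escape(visitor_name) + "</b>."
    else:
        message = "Welcome, guest."
    return "<html><body><p>" + message + "</p></body></html>"


print(greeting_page("Ada"))
print(greeting_page())
```

The if statement lives entirely in the program; the HTML that reaches the browser contains only the branch that was selected.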

The most common filename extension for files written in HTML is .html. However, older operating systems and filesystems, such as MS-DOS, limited file extensions to three characters, and thus an .htm extension is still supported by browsers and other programs.

The first published specification for a language called HTML was drafted by Tim Berners-Lee, the inventor of the Web, with Dan Connolly, and it was published in 1993 by the IETF (Internet Engineering Task Force) as a formal application of SGML (Standard Generalized Markup Language). There was never an official HTML 1.0 standard, because multiple informal HTML standards existed at the time. The first formal standard was HTML 2.0, which was published in November 1995 as IETF RFC 1866.

HTML 3.0, which was proposed in March 1995, was designed to be compatible with HTML 2.0 while providing many new capabilities, including support for tables, text flow around images and the display of mathematical equations. However, it was too complex for the browsers then available and was thus soon followed by HTML 3.2, which discarded most of its new capabilities and instead adopted many of the features that had already been implemented in the then-dominant Netscape and Mosaic browsers.

Since 1996, the HTML specifications have been developed and maintained by the World Wide Web Consortium (W3C), which was founded at the Massachusetts Institute of Technology (MIT) by Berners-Lee in 1994.

HTML 4.0 was first released as a W3C Recommendation in December 1997. It was then superseded by HTML 4.01, which was published in December 1999 and was intended to be the final version of HTML.

Following the publication of HTML 4.0, the W3C's HTML Working Group has increasingly focused on the development of XHTML (extensible hypertext markup language) as HTML's successor. XHTML is a reformulation of HTML as an application of XML (extensible markup language) that is much stricter and cleans up some of the ambiguities and irregularities of HTML, thereby allowing browsers to be simplified (which is particularly good for mobile devices). It is also more flexible and powerful than HTML, particularly in that it allows the use of user-defined tags, and it has been intended as a transitional step towards replacing HTML by XML languages (of which XHTML is just one) as the standard way to write web pages.
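One aspect of this strictness is that every tag must be closed. The following Python sketch passes two fragments to a generic XML parser from the standard library (xml.etree.ElementTree) to show the difference; the function name is_well_formed_xml is an assumption for this example, and well-formedness is only one of the requirements that XHTML imposes.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Loose HTML habit: the line-break tag is never closed.
print(is_well_formed_xml("<p>first line<br>second line</p>"))

# XHTML style: the single tag closes itself with a slash.
print(is_well_formed_xml("<p>first line<br />second line</p>"))
```

A strict XML parser rejects the first fragment outright rather than guessing what the author meant, which is part of what allows software that processes XHTML to be simpler.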

Nevertheless, HTML remains in widespread use and is expected to dominate web page creation for years to come. Consequently, all web browsers continue to support HTML 4.01 and earlier versions. HTML's popularity is so great, in fact, that there has even been considerable discussion about bringing out a new version, which would likely be called HTML 5.0.





Created January 28, 2007.
Copyright © 2007 The Linux Information Project. All Rights Reserved.