Internationalization: A Brief Introduction

Internationalization[1] is the process of designing a product (i.e., a good or a service) so that it can be localized without major engineering changes.

This contrasts with localization, which is the adapting of a product to a specific country, region, language, dialect, culture, etc.

Internationalization and localization are important because of the numerous differences that exist among countries, regions and cultures with respect to (1) languages (not only distinct languages but also dialects and other differences within a single language), (2) weights and measures, (3) currency, (4) date and time formats, (5) names and titles, (6) citizen identification numbering systems, (7) telephone numbers, addresses and postal codes, (8) religious, cultural and political sensitivities, (9) profanity and (10) legal systems.
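As a small illustration of item (4), the same calendar date is written quite differently under different conventions. The following Python sketch is only illustrative: the three pattern entries and the format_date helper are hypothetical and hard-coded here, whereas real software would normally obtain such patterns from a locale database rather than from its own source code.

```python
from datetime import date

# Hypothetical, hand-written patterns for illustration only.
DATE_PATTERNS = {
    "en_US": "%m/%d/%Y",  # month/day/year
    "de_DE": "%d.%m.%Y",  # day.month.year
    "ja_JP": "%Y/%m/%d",  # year/month/day
}


def format_date(d: date, locale_name: str) -> str:
    """Format d using the pattern registered for locale_name."""
    return d.strftime(DATE_PATTERNS[locale_name])


if __name__ == "__main__":
    d = date(2006, 4, 30)
    for name in DATE_PATTERNS:
        print(name, format_date(d, name))
```

Note that a string such as 04/05/2006 is ambiguous unless the reader knows which convention produced it, which is one reason such details cannot safely be left until late in development.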

As is the case with quality control (and security in the case of computer software), internationalization is something which needs to be considered at all stages of a product's life cycle (i.e., from initial planning through delivery through servicing and maintenance) in order for it to be truly effective. However, like quality control (and security), it often gets ignored or neglected until products have already been developed because of other priorities, including tight budgets and deadlines, and because of lack of knowledge about or understanding of it.

Internationalization and Languages

The issues of internationalization and localization with regard to human languages are particularly complex. This is in large part because thousands of languages and dialects are in use throughout the world, and they differ not only in their vocabularies but also in their structures and writing systems.

Many people, particularly in English-speaking countries, assume that English is the international language and that most people around the world can, or should be able to, use products that are designed for English speakers. Thus, they feel that it is sufficient to design products for use by people who can speak, or at least read, English.

However, this is neither correct nor wise. In fact, only about five percent of the world's population use English as their primary language. The number of native speakers (i.e., people who were brought up speaking a language as their primary language) of English is far smaller than the number of native speakers of Chinese. Some estimates place English third or fourth, after Chinese, Spanish and possibly Hindi, e.g., 937 million just for Mandarin, the main dialect of Chinese, and 322 million for all dialects of English. Moreover, English is in decline as a first language, according to some studies, due to faster population growth rates in non-English speaking countries.

The number of writing systems in use today is far smaller than the number of languages, but those systems differ enormously from one another. For example, the number of distinct characters used varies widely, from a few dozen for some languages to tens of thousands for Chinese.

Characters are the basic symbols that are used to write or print a language. For example, the characters used by the English language consist of the letters of the alphabet, numerals, punctuation marks and a variety of symbols (e.g., the ampersand, the dollar sign and the arithmetic symbols).

But the differences in writing systems are far greater than just the number of characters, and this further increases the complexity of internationalizing computer software. For example, while most languages are written from left to right, some are written from right to left. A few, particularly Asian languages such as Mongolian and Japanese, can be written vertically; Japanese, for example, is often written from top to bottom, starting with the right-most column. Moreover, in some languages the shapes of the letters can change according to their locations in words and other factors.
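The difference between left-to-right and right-to-left text can be detected programmatically. The following Python sketch uses the standard unicodedata module to guess the base direction of a string from its first strongly directional character. The base_direction helper is a hypothetical name, and the approach is a deliberate simplification: it does not implement the full Unicode bidirectional algorithm and says nothing about vertical text.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return "rtl"
        if bidi == "L":  # Latin, Cyrillic, CJK and other left-to-right letters
            return "ltr"
    # Assumption: fall back to left-to-right when no strong character is found.
    return "ltr"


if __name__ == "__main__":
    print(base_direction("Hello"))                         # English
    print(base_direction("\u05e9\u05dc\u05d5\u05dd"))      # Hebrew
    print(base_direction("\u0645\u0631\u062d\u0628\u0627"))  # Arabic
```

Digits and spaces are not strongly directional, so a string that begins with a number followed by Hebrew text is still classified as right-to-left.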

One of the most important advances with regard to internationalization of software has been the development of Unicode. Unicode is a system that attempts to provide a unique encoding (i.e., identification number) for every character used by the world's languages, both living and historical. Although it is an excellent project, it is not without controversy, particularly with regard to Han unification, that is, how the multiple versions of Chinese characters that are used in East Asian nations are dealt with.
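The idea of one identification number per character can be seen directly in most modern programming languages. The following Python sketch prints the Unicode number (conventionally written as U+ followed by hexadecimal digits) of a few characters, together with the bytes used to store each one in UTF-8, which is one common way of storing Unicode text. The describe helper is a hypothetical name chosen for this illustration.

```python
def describe(ch: str) -> str:
    """Return the Unicode number and the UTF-8 bytes of a single character."""
    code_point = ord(ch)        # the character's Unicode identification number
    utf8 = ch.encode("utf-8")   # the bytes that represent it in UTF-8
    return f"U+{code_point:04X} utf8={utf8.hex(' ')}"


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\u4e2d"):  # A, e-acute, euro sign, a Chinese character
        print(ch, describe(ch))
```

The letter A occupies a single byte, while the Chinese character occupies three, so software that assumes one byte per character will mishandle most of the world's text.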

Convergence and Divergence

Some developers of products have observed that there is a trend towards reducing differences among countries, and thus they have concluded that it is less important to allow for national and regional variations in products than in the past. An example of such convergence is the spread of the metric system. Another is the introduction of the euro as the common currency of much of the European Union.

However, at the same time, there has also been a trend in the opposite direction. This has been the result of smaller countries and minorities (e.g., linguistic and ethnic groups) within countries striving to preserve or regain their historical languages, dialects, religions and cultures.

Formerly the argument was often made that the world should strive towards greater standardization because it was necessary for economic efficiency and in order to obtain the maximum benefits from free trade. However, critics claim that this argument was shallow and overlooked several important things. One was that there are also costs to standardization, including the adverse effect on traditional cultures and values. Although difficult to quantify in monetary terms, such items are no less important to many people than items that are easy to quantify, such as units of product produced and sold. Also, some critics point out that the emphasis was put on imitating the economically and militarily dominant country or countries rather than on the true merits of the standards.

For example, it had been seriously suggested that the Chinese should abandon their 4,000-year-old traditional writing system, because it was not compatible with typewriters, and replace it with Roman letters (i.e., the system used by English and other Western European languages). Even some Chinese intellectuals started to believe this.

However, advances in computer technology have made it very easy to write Chinese electronically and have completely eliminated arguments that it is an obstacle to advancing technology. Moreover, recent studies have shown that there are other advantages in the traditional writing system. For example, although it is certainly a chore for students to learn thousands of complex characters, such mental training at a young age can have certain psychological benefits that are difficult to obtain just by learning the few dozen letters in the Roman alphabet.

Some proponents of maintaining the Chinese writing system have also asked why it is that the literacy rate (i.e., the percentage of the population that can read and write) is far higher in some countries that use Chinese characters (e.g., Taiwan and Japan) than it is in the U.S.

Advances in technology are not only allowing differences among countries, regions, languages, dialects, religions, etc. to continue, but they are also allowing them to flourish. Internationalization as used in this article is about how to design products so that they can be localized without putting any pressure on countries to suppress their differences for some real or imagined benefits of uniformity.

Well-crafted internationalization does not necessarily lead to uniformity. In fact, it can promote diversity by making it easier to localize products.

Internationalization and Software

Internationalization and localization are particularly important considerations for computer software. One reason for this is that, in contrast to some products, software is highly portable. That is, it is extremely easy to transport around the world, e.g., via the Internet, via disks sent by conventional mail or via retail sale boxes shipped by air.

A second reason is that there is usually a large demand from around the world for useful and good quality software. It is often the case that a single application program will dominate, or at least have a very large share in, its particular market niche. Among the many examples that can be cited are the Apache web server, which is used to host the majority of web sites on the Internet, Photoshop, which dominates high-end image processing, and Google, which is by far the most popular web search engine.

Internationalization and localization are important not only for the software itself but also for the documentation for the software. Documentation is any communicable material (such as text, audio or video or some combinations thereof) that is used to describe, explain or instruct regarding some attributes of an object, system or procedure, such as its parts, assembly, installation, maintenance and use. As is the case with internationalization, documentation is ideally developed along with a product and fully integrated into it rather than being written at a later stage almost as an afterthought.

Free Software and Internationalization

Some types of software are more suitable for internationalization and localization than others. In particular, free software (i.e., software which is free both in a monetary sense and with regard to use) is much better suited to localization than proprietary (i.e., closed source) software. There are several reasons for this.

The most important is that its source code (i.e., the original version in human-readable form) is freely available. This makes it easy for programmers in any country or region to modify the software. It is virtually impossible to modify proprietary software because the source code is kept secret and also because there are usually legal restrictions on doing so.

Another reason that free software is more suitable for localization is that there is little incentive for volunteers to develop localized versions of proprietary software for their country or region because they have no ownership and little control of what happens to their efforts.

Free software provides the users, i.e., the native speakers, access to the localization process (i.e., translation and other customization), and thus the product can evolve to meet the requirements of the local community. It is frequently claimed that the localized versions of free software are superior to localized versions of proprietary software, the reason being that the former were created by the community actually using the software rather than by paid workers who may or may not have the same knowledge and viewpoints as members of the local community of users.

Approaches to Internationalization

There are two general approaches to internationalization and localization. One is to create a world-ready, single-binary (i.e., a single compiled executable) version of an application program that is ready for use in all markets. The other is to create, or allow for users to create, dedicated editions of the product for specific markets.

The former can be useful and economical for catering to a limited number of markets. However, it has the disadvantage that it can be difficult to coordinate the great amount of expertise that would be required to produce such a version for a large number of distinctive markets. For example, it is necessary to have access to people who understand each of the foreign languages and cultures and who also have a technical background.

The second approach offers much greater flexibility, as it allows local versions to be created when possible and as desired. It also makes it easier to update specific versions. Moreover, it can help keep the source code smaller and less complex. A common practice to facilitate localization when using this approach is to keep the textual data and other locale-dependent components separate from the main body of the source code.
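The practice of separating textual data from the program logic can be sketched as follows in Python. Everything here is hypothetical and chosen only for illustration: the CATALOGS table, the message keys and the translate helper. In a real project the catalogs would normally live in separate files maintained by translators, and established tools such as GNU gettext follow the same principle.

```python
# Hypothetical message catalogs. They are shown inline for brevity; in
# practice they would be stored outside the program's source code.
CATALOGS = {
    "en": {"greeting": "Hello, {name}!", "farewell": "Goodbye."},
    "de": {"greeting": "Hallo, {name}!"},
    "ja": {"greeting": "こんにちは、{name}さん。"},
}
DEFAULT_LANGUAGE = "en"


def translate(key: str, language: str, **params: str) -> str:
    """Look up key for language, falling back to the default language."""
    template = CATALOGS.get(language, {}).get(key)
    if template is None:
        # Missing language or missing message: use the default catalog.
        template = CATALOGS[DEFAULT_LANGUAGE][key]
    return template.format(**params)


if __name__ == "__main__":
    print(translate("greeting", "de", name="Anna"))
    print(translate("greeting", "ja", name="Ken"))
    print(translate("farewell", "de"))  # not yet translated, falls back to English
```

Because the program refers to messages only by key, a new language can be added by supplying another catalog, and a partially translated catalog still produces usable output through the fallback.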

[1] The word internationalization is sometimes abbreviated as I18N (or i18n or I18n), particularly where the word is being used numerous times, because of its length. The number 18 refers to the number of letters omitted between the first and last letters. Likewise, localization is sometimes abbreviated as L10N. However, some people object to these abbreviations because they can be difficult to understand or remember for people not familiar with the subject area.

Created April 30, 2006.
Copyright © 2006 The Linux Information Project. All Rights Reserved.