Internet and the alphabets of Russia's minority languages

At present, the overwhelming majority of Russia's national minorities use Cyrillic-based writing systems. Some languages have adopted the Russian alphabet without any modifications, but in most cases new characters and letters with diacritics have been added. The total number of additional Cyrillic characters in the languages of the former Soviet Union reaches 70.

The Cyrillic encodings widely used on the Internet (KOI8, Windows CP 1251 and others) were created on the basis of the Latin and Russian alphabets. Later on, special letters used in Ukrainian, Belorussian, Serbian and Macedonian were added. The creators of these standards were apparently not interested in the needs of such large nations as the Azeris, Kazakhs or Uzbeks - partly due to the fact that the code pages had no room left for additional letters.
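The gap can be checked directly. The following is a minimal sketch in Python, assuming that the standard-library codecs `cp1251` and `koi8_r` correspond to the code pages discussed above; the three sample letters and the helper name `encodable` are chosen here for illustration only.

```python
# Which letters can a legacy 8-bit Cyrillic code page represent?
# The sample letters are illustrative, written as escapes to avoid look-alikes.
SAMPLES = {
    "Ukrainian": "\u0404",  # Є
    "Kazakh": "\u049a",     # Қ
    "Tatar": "\u04d8",      # Ә
}


def encodable(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in the code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


for language, letter in SAMPLES.items():
    for codec in ("cp1251", "koi8_r"):
        print(language, letter, codec, encodable(letter, codec))
```

In this sketch the Ukrainian letter is accepted by `cp1251`, while the Kazakh and Tatar letters are rejected by both code pages.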

As a result, many nations with Russian-based writing systems were left at the mercy of commercial firms and local computer specialists, who started to create fonts for individual languages. This approach has many drawbacks. The creation of high-quality fonts requires professional skills, hard work and considerable amounts of money. The quality of national fonts is usually much lower than the quality of, for example, the Microsoft fonts distributed for free. Large companies are not interested in creating a wide selection of fonts for comparatively small and poor markets. Moreover, because of the lack of standardization, the fonts end up with different encodings. A text typed with one font cannot be read or printed with another if the encoding is different. Fonts with different encodings also require special keyboard layouts. Everyone suffers because of the lack of standardization.
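The effect of mismatched encodings is easy to reproduce. The sketch below uses two standard code pages (`cp1251` and `koi8_r` in Python) as stand-ins for two incompatible font encodings; this is an assumption made for illustration, since the home-made font encodings themselves are not specified here.

```python
# The same stored bytes, read under two different Cyrillic code pages.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"  # "Привет"
raw = text.encode("cp1251")        # bytes as written by a CP 1251 system

as_cp1251 = raw.decode("cp1251")   # read with the matching code page
as_koi8 = raw.decode("koi8_r")     # read with a different code page

print(as_cp1251)  # the original word
print(as_koi8)    # the same number of characters, but different letters
```

Nothing is lost in the bytes themselves; the text becomes unreadable only because the reader's table assigns other letters to the same codes.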

There are three basic solutions to the problem of the national alphabets. The most promising one appears to be Unicode - an international standard that supports all existing writing systems of the world, including the alphabets of almost all languages of Russia. The Cyrillic part of the Unicode 3.0 standard currently lacks additional letters of only a few Arctic and Far Eastern languages. At the time of writing (December 2000), however, Unicode remains a thing of the future for most of us. The operating systems Windows 95 and 98, for example, do not support Unicode as such, and there are only a few Unicode programmes and fonts with the necessary additional letters for these platforms.
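As a small illustration of what Unicode offers, the sketch below looks up four additional Cyrillic letters in Python's `unicodedata` module and encodes them as UTF-8. The choice of letters and of UTF-8 as the byte encoding are assumptions made for the example.

```python
import unicodedata

# Additional Cyrillic letters used, for example, in Kazakh, Tatar and Bashkir.
letters = "\u04d8\u049a\u04ae\u04ba"  # Ә Қ Ү Һ

for ch in letters:
    utf8 = ch.encode("utf-8")
    # Code point, official character name, and UTF-8 bytes in hexadecimal.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), utf8.hex())

# Russian and non-Russian letters coexist in one string and survive
# a round trip through UTF-8 unchanged.
mixed = letters + "\u0416\u0404"  # plus Russian Ж and Ukrainian Є
round_trip = mixed.encode("utf-8").decode("utf-8")
print(round_trip == mixed)
```

Each letter has its own code point, so no letter of one language has to displace a letter of another.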

Another option is the creation of alternative multilingual Cyrillic encodings for individual platforms (particularly the Windows and Macintosh operating systems). An example of such an encoding is Cyrillic Asian, created by the Paratype company. It includes additional letters used in the languages of Central Asia, Kazakhstan, Azerbaijan and some republics of the Russian Federation. In this encoding the additional letters occupy the Unicode positions of Ukrainian, Serbian and Macedonian special letters, as well as of some lesser-used mathematical and other symbols. In other words, Cyrillic Asian replaces the standard Cyrillic encoding and can be regarded only as a temporary solution to the problem. The author of this paper has proposed a similar system for a number of Uralic languages.
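The substitution principle can be sketched as follows. The table below is hypothetical and is not the actual Cyrillic Asian layout, which is not given here; it only shows how text stored at the positions of Ukrainian letters looks different depending on whether the reader has the substituting font.

```python
# HYPOTHETICAL substitution table (NOT the real Cyrillic Asian layout):
# positions of Ukrainian letters are reused for other letters.
SUBSTITUTIONS = str.maketrans({
    "\u0404": "\u04d8",  # Є -> Ә
    "\u0454": "\u04d9",  # є -> ә
    "\u0407": "\u049a",  # Ї -> Қ
    "\u0457": "\u049b",  # ї -> қ
})


def view_with_substituting_font(stored: str) -> str:
    """Show how text stored at standard positions appears in such a font."""
    return stored.translate(SUBSTITUTIONS)


# What is actually stored: Ukrainian letters standing in for Kazakh ones.
stored = "\u0407\u0430\u0437\u0430\u0457\u0441\u0442\u0430\u043d"
shown = view_with_substituting_font(stored)

print(shown)   # what the writer intended ("Қазақстан")
print(stored)  # what a reader with a standard Cyrillic font sees
```

The stored text still fits into the standard code page, which is the attraction of the approach, but the same property makes it impossible to use the displaced Ukrainian letters in the same document.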

The creation of national standards should be regarded as another temporary solution. So far, the only example is the Cyrillic Tatar encoding adopted by the government of Tatarstan in 1996. There are freely distributed keyboard layouts and fonts for this encoding, in which Tatar letters replace special letters of the Serbian and Macedonian languages. From a technical point of view such a solution is quite acceptable. Naturally, national standards cannot be created without an active role played by local governments. The lack of such standards in other republics testifies to the local leaders' attitudes towards the mother tongues.

Due to technical problems or simple laziness, many of us still continue to use Latin transcription, particularly when typing e-mail. This is the case not only with minority languages, but with Russian as well. The Latin alphabet has been used to create web sites, for example, in the Bashkir, Ossetian and Tatar languages. Every author uses his own system of transcription, which creates great confusion.
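The confusion is easy to picture with a small sketch. Both transcription tables below are hypothetical examples of personal habits, not published systems, and they cover only the letters needed for one Russian word.

```python
# Letters that both HYPOTHETICAL schemes happen to treat the same way.
BASE = {"\u043e": "o", "\u0440": "r"}          # о, р

# Two ad hoc personal habits for the remaining letters.
SCHEME_A = {"\u0445": "kh", "\u0448": "sh"}    # х, ш
SCHEME_B = {"\u0445": "h", "\u0448": "sch"}    # х, ш

WORD = "\u0445\u043e\u0440\u043e\u0448\u043e"  # "хорошо"


def transcribe(word: str, scheme: dict) -> str:
    """Replace each Cyrillic letter using BASE plus one personal scheme."""
    table = {**BASE, **scheme}
    return "".join(table.get(ch, ch) for ch in word)


print(transcribe(WORD, SCHEME_A))
print(transcribe(WORD, SCHEME_B))
```

The same word comes out as two different Latin strings, so a reader or a search tool cannot tell that they are the same word without knowing each writer's table.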

Since the beginning of the 1990s, some former Soviet republics (Azerbaijan, Moldavia, Turkmenistan, Uzbekistan) have officially switched to the Latin alphabet. A Latin-based Chechen alphabet was adopted in 1992, and in the autumn of 2000 the transition to Latin started in Tatarstan. The need to adopt new information technologies is frequently named as one of the major reasons for switching to the Latin alphabet. At the same time, however, numerous problems caused by this switch are being ignored, as well as the fact that the Latin alphabet as such has practically no advantages compared with the Cyrillic. As for the Internet, the practicality of new national Latin-based alphabets depends on whether they are supported by the existing standard encodings (Latin 1, Latin 2, etc.).

For the Baltic states, for example, a special Baltic encoding has been created. Moldavia, with the switch to the Romanian alphabet, became a part of the Central European zone (Latin 2). In theory, technical support for these languages is guaranteed, though some problems arise with the parallel use of Russian. The alphabet adopted in Azerbaijan differs from the Turkish by only one letter, as a result of which the Azeris cannot use standard Turkish encodings. A similar situation arose in Uzbekistan, where the new alphabet was originally supplemented with a number of additional letters. Later on, all special letters were rejected - a decision reminiscent of the worst times of the forced transfer to Cyrillic at the end of the 1930s. The Chechen alphabet contains 15 additional letters, some of which, it seems, are not represented even in standard Unicode. At the moment, the Azeri, Chechen and Tatar Latin alphabets are not covered by widely used international code pages.
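Coverage of this kind can be tested mechanically. The sketch below is an illustration under several assumptions: that the ISO 8859 codecs shipped with Python stand for the "standard encodings" meant above (ISO 8859-9 for Turkish, ISO 8859-13 for the Baltic zone), that only the lowercase non-ASCII letters need checking, and that the Romanian letters are taken in their cedilla forms.

```python
# Legacy 8-bit Latin encodings, by the codec names Python uses for them.
LEGACY_CODECS = {
    "Latin 1 (ISO 8859-1)": "latin_1",
    "Latin 2 (ISO 8859-2)": "iso8859_2",
    "Turkish (ISO 8859-9)": "iso8859_9",
    "Baltic (ISO 8859-13)": "iso8859_13",
}

# Only the non-ASCII lowercase letters of each alphabet are listed.
ALPHABETS = {
    "Turkish": "\u00e7\u011f\u0131\u00f6\u015f\u00fc",              # ç ğ ı ö ş ü
    "Azeri (Latin)": "\u00e7\u0259\u011f\u0131\u00f6\u015f\u00fc",  # ç ə ğ ı ö ş ü
    "Romanian": "\u0103\u00e2\u00ee\u015f\u0163",                   # ă â î ş ţ
}


def covering_codecs(letters: str) -> list:
    """List the legacy encodings that contain every letter in `letters`."""
    covered = []
    for label, codec in LEGACY_CODECS.items():
        try:
            letters.encode(codec)
            covered.append(label)
        except UnicodeEncodeError:
            pass
    return covered


for name, letters in ALPHABETS.items():
    print(name, covering_codecs(letters))
```

Under these assumptions the Turkish letters fit the Turkish encoding and the Romanian letters fit Latin 2, while the single extra Azeri letter (ə) leaves the Azeri alphabet without any covering code page in the list.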

So, what should be done? First of all, let us not spoil our mother tongues. Whenever possible, existing and officially accepted alphabets and orthographies should be used. If this is not possible for technical reasons, we should try to unify systems of practical transcription on a Latin or Cyrillic basis. Here we would like to point out, for example, the system of Cyrillic Kazakh transcription proposed by Andrei Sergeev, which can also be used for a number of other Turkic languages. Interestingly, this transcription closely resembles the orthographies of the related Kumyk and Nogai languages.

Secondly, let us stop daydreaming of a switch to the Latin alphabet, which could lead to new and even more serious problems. What we need is standardization of fonts and the creation of national and, ideally, multilingual (for language groups or geographic areas) standards based on the widely used Russian encodings. This, however, should only be regarded as a temporary solution. Eventually, let us hope, problems with the national alphabets will disappear together with a complete transition to the Unicode standard.


Created by [email protected]