Unicode – Characters (Part 4)

In Unicode, hexadecimal codes are even more common than decimal ones. You may wonder why one cannot simply keep working with decimal codes. The reason is that the world of computers is a world of bits and bytes: it works mainly with binary numbers.

Unicode in the hexadecimal system

In binary code, each digit, called a bit or “binary digit”, can have only one of two values, ‘0’ or ‘1’, “corresponding to the electrical values of off or on, respectively” (Indiana University, 2018). Four of these positions, each filled by a zero or a one, are combined into a half-byte, called a “nibble”, and eight such binary digits are combined into a byte. Historically, eight binary digits (1 byte) have been used to represent one character. (In the beginning, 8-bit coding per character, which can encode up to 256 characters, was still sufficient. Later, when more characters needed to be encoded, computer scientists resorted to using more than 1 byte per character.) To this day, the byte is the smallest addressable unit of storage in most computer systems.
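The arithmetic behind nibbles and bytes can be checked in a few lines of Python. This is only a minimal sketch; the letter "A" is an arbitrary sample character chosen for illustration.

```python
# Number of distinct values that fit into 4 bits (a nibble) and 8 bits (a byte).
nibble_values = 2 ** 4
byte_values = 2 ** 8

# The letter "A" has the decimal code 65; written as one byte it takes 8 bits.
code = ord("A")
bits = format(code, "08b")  # pad with leading zeros to 8 binary digits

print(nibble_values, byte_values)  # 16 256
print(code, bits)                  # 65 01000001
```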

The preference for the eight-bit byte also explains why computer scientists often prefer the hexadecimal system. As the name (literally, “system of sixteen”) implies, it is based on powers of the number 16. Whereas the decimal system uses 10 digits (0-9) before moving to the next higher position, the hexadecimal system uses 16 digit values (0-15, see video). To avoid ambiguity in multi-digit numbers, the values 10-15 are represented by the letters A-F.
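The conversion from decimal to hexadecimal can be sketched as repeated division by 16. The helper name `to_hex` is hypothetical and used here only for illustration; Python's built-in `format(number, "X")` gives the same result.

```python
# The sixteen hexadecimal digits: 0-9 for the values 0-9, A-F for the values 10-15.
HEX_DIGITS = "0123456789ABCDEF"

def to_hex(number: int) -> str:
    """Convert a non-negative integer to hexadecimal by repeated division by 16."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, 16)
        digits.append(HEX_DIGITS[remainder])
    # The remainders come out lowest position first, so reverse them.
    return "".join(reversed(digits))

print(to_hex(10))      # A
print(to_hex(15))      # F
print(to_hex(16))      # 10
print(to_hex(255))     # FF
print(int("FF", 16))   # 255 (back from hexadecimal to decimal)
```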

The hexadecimal system offers a very practical way of writing sequences of binary digits: one hexadecimal digit stands for four binary digits (1 nibble), and two hexadecimal digits stand for eight binary digits (1 byte). For computer scientists it is therefore often easier and more transparent to convert binary codes into hexadecimal rather than decimal codes.
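The nibble-by-nibble correspondence can be made visible with a short Python sketch. The function name `binary_to_hex` is an assumption for illustration, not an established API.

```python
def binary_to_hex(bits: str) -> str:
    """Translate a string of binary digits into hexadecimal, one nibble at a time."""
    # Pad with leading zeros so that the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    # Cut the string into groups of four binary digits (nibbles).
    nibbles = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each nibble corresponds to exactly one hexadecimal digit.
    return "".join(format(int(nibble, 2), "X") for nibble in nibbles)

print(binary_to_hex("01000001"))  # 41 (0100 -> 4, 0001 -> 1)
print(binary_to_hex("11111111"))  # FF (1111 -> F, 1111 -> F)
```

The same byte would be 65 or 255 in decimal, where the boundary between the two nibbles is no longer visible.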

In fact, the hexadecimal system is so common and widespread that in the character code charts published by the Unicode Consortium, all information is given in hexadecimal coding only.

Hexadecimal codes for characters in African writing systems

The website of the Unicode Consortium (unicode.org) contains character code charts that give the hexadecimal codes for the various characters, arranged in blocks by writing system. The Unicode Consortium has also put together explanatory specifications specifically for African writing systems (for Unicode version 7.0.0, as of 2014; for the now-released Unicode version 11.0.0, see the 2018 specification, pp. 741-764).

The easiest way to get at the codes for characters that you would like to include in your text is to look at the Unicode page containing the character code charts.
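Once a code has been read off a chart, it can also be checked programmatically. The following Python sketch uses the standard `unicodedata` module; the helper name `describe` and the sample letter ɛ (the open e used in a number of African orthographies) are choices made here for illustration.

```python
import unicodedata

def describe(character: str) -> str:
    """Return the decimal code, the U+ hexadecimal notation and the Unicode name."""
    code = ord(character)
    return f"{code} U+{code:04X} {unicodedata.name(character)}"

# From character to code ...
print(describe("\u025b"))  # 603 U+025B LATIN SMALL LETTER OPEN E

# ... and from a hexadecimal code taken from a chart back to the character.
print(chr(0x025B))         # ɛ
```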

For characters in African languages written in Latin script or in writing systems modified from Latin letters, it is best to search in one of the “Latin” blocks:

**Latin scripts in Unicode 11.0.0, 2018: decimal and hexadecimal ranges, based on unicode.org 2018. CC: AN 2018, BY-NC-SA.**
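To find out which block a given letter belongs to, its code can be compared against the block ranges. The sketch below hard-codes only a selection of blocks that contain Latin letters (IPA Extensions is included because it holds letters such as ɛ and ɔ); the list and the helper name `find_block` are assumptions for illustration and should be checked against the current code charts.

```python
# Selected blocks with Latin letters: (block name, first code point, last code point).
LATIN_BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("IPA Extensions", 0x0250, 0x02AF),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
]

def find_block(character):
    """Return the name of the listed block containing the character, or None."""
    code = ord(character)
    for name, first, last in LATIN_BLOCKS:
        if first <= code <= last:
            return name
    return None

# Sample letters: a, eng, capital open E, small open e, e with dot below.
for ch in "a\u014b\u0190\u025b\u1eb9":
    print(ch, f"U+{ord(ch):04X}", find_block(ch))
```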

Other writing systems, however, each have their own blocks:

**European and Middle Eastern Scripts in Unicode 11.0.0, 2018 (selection): decimal and hexadecimal ranges, based on unicode.org 2018. CC: AN 2018, BY-NC-SA.**

The unicode.org website also lists several specifically African writing systems, such as Adlam (for Fula in the Sahel and West Africa), Bamum (for Bamum in Cameroon), Bassa Vah (for Bassa in Liberia and Sierra Leone), Mende Kikakui (the Kikakui script for Mende in Sierra Leone), N’Ko (for Manding languages in West Africa), Osmanya (for Somali in Northeast Africa), Tifinagh (for Berber languages in Morocco, Algeria, Mali and Niger, for example Tuareg), Vai (for Vai in Liberia and Sierra Leone) and Ethiopic (e.g. for Ge’ez and Amharic in Ethiopia, or Tigrinya in Ethiopia and Eritrea). With Unicode version 11.0.0, released in June 2018, Medefaidrin (“used for modern liturgical purposes in Africa” by Christian Ibibio congregations in Nigeria, unicode.org 2018) has also been added:

**African scripts in Unicode 11.0.0, 2018: decimal and hexadecimal ranges, based on unicode.org 2018. CC: AN 2018, BY-NC-SA.**
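Decimal and hexadecimal ranges of this kind can be derived from each other. The Python sketch below does so for a small selection of blocks; the ranges are entered by hand as an assumption for illustration and should be verified against the code charts on unicode.org, and `block_row` is a hypothetical helper name.

```python
# Selected blocks of African scripts: name -> (first code point, last code point).
AFRICAN_BLOCKS = {
    "NKo": (0x07C0, 0x07FF),
    "Ethiopic": (0x1200, 0x137F),
    "Tifinagh": (0x2D30, 0x2D7F),
    "Vai": (0xA500, 0xA63F),
    "Osmanya": (0x10480, 0x104AF),
    "Adlam": (0x1E900, 0x1E95F),
}

def block_row(name):
    """Return the decimal range, the hexadecimal range and the size of a block."""
    first, last = AFRICAN_BLOCKS[name]
    return (f"{first}-{last}", f"U+{first:04X}-U+{last:04X}", last - first + 1)

for name in AFRICAN_BLOCKS:
    decimal_range, hex_range, size = block_row(name)
    print(f"{name:10} {decimal_range:15} {hex_range:18} {size}")
```

The size counts the positions reserved in a block, not the characters actually assigned, since blocks usually contain some unassigned code points.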

So far, the number of Egyptian hieroglyphs recorded in Unicode is limited (and their use becomes a complex matter if glyphs are to be arranged in a particular way). However, the consortium has already announced that Unicode will be expanded in the near future by a substantial number of further Egyptian hieroglyphs. Nor are all ancient Egyptian writing systems included in Unicode yet; for example, Hieratic (the “priestly writing”) and Demotic are still lacking.

Numerous other African writing systems are also not yet recorded in Unicode, such as the Borama (Gadabuursi) and Kaddare alphabets (for Somali), the Luo alphabet (“Luo Lakeside Script” for Luo languages, especially in Kenya), Zaghawa (for Zaghawa in Chad and Sudan), or Mandombe (for Bantu languages of the two Congos and of Angola). Hopefully, more of these writing systems will find their way into Unicode in the future. For Mandombe, for example, a new proposal for Unicode was submitted in 2016.

Continue to Unicode Characters Part 5.

VAD
