Unicode – Characters (Part 2)



Researchers often need to create and edit documents in foreign languages. It is therefore convenient if they can enter their texts in the original languages, using the appropriate characters. The character codes of the Unicode standard provide the means to do so. But where, for example, do scholars of African studies find the right codes for the characters they need?

Decimal codes for characters in African writing systems

African languages written in the Latin script can usually be typed using characters from one of the Latin character blocks. In Unicode, almost all of these blocks (except for “Latin Extended Additional”, with the decimal codes 7680-7935) are found at the lower decimal positions 32-879.
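As a rough illustration, the two decimal ranges named above can be checked with a few lines of Python. This is only a sketch: the helper name `in_latin_ranges` is hypothetical, and the sample letters (hooked letters of the kind used in Hausa, and a dotted vowel of the kind used in Igbo) were chosen here for illustration rather than taken from any of the tables cited.

```python
# Decimal ranges named in the text: the lower Latin positions and
# "Latin Extended Additional".
LATIN_RANGES = [(32, 879), (7680, 7935)]


def in_latin_ranges(char):
    """Return True if the character's decimal code lies in one of the ranges."""
    code = ord(char)  # ord() gives the decimal Unicode code of a character
    return any(low <= code <= high for low, high in LATIN_RANGES)


# Sample letters, written as escapes so the code points are unambiguous:
# U+0253 (b with hook), U+0257 (d with hook), U+0199 (k with hook),
# U+1ECD (o with dot below).
for char in "\u0253\u0257\u0199\u1ecd":
    print(char, ord(char), in_latin_ranges(char))
```

A character from a non-Latin block, such as a Bamum letter, would fall outside both ranges and return `False`.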

The Penn State University website (2016) also lists some African languages for which digital typing conventions already exist. For Fula, Hausa, Igbo and Wolof, for example, it recommends specific Unicode decimal codes, which can be used as so-called “Alt codes” for keyboard input under Windows. The website also suggests some alternative solutions for Mac users:

Unicode (decimal) for African languages, (Penn State University 2016). CC: AN 2018, BY-NC-SA.

For input in other writing systems, you can search for the corresponding characters in one of the various Unicode character tables. The University of Wisconsin-Madison, for example, offers a Unicode table (2005), organized in blocks by writing system, that provides a large number of decimal codes for different languages and their writing systems. The table covers most of the 97,655 Unicode characters defined by 2005 (out of the 137,374 characters in Unicode version 11.0.0 of 2018):

Unicode (decimal), (University of Wisconsin-Madison 2015). CC: AN 2018, BY-NC-SA.

In addition, a Unicode character table created in 2012 by the St. Petersburg web studio “Sa•design” (with updates from 2015 to 2017) lets you browse the entire repertoire of Unicode characters. If you move the cursor over a letter in this table, a small field opens beneath it, showing the hexadecimal notation (“U+”) and, to its right, the decimal value (“Dec:”, followed by the numeric decimal code). The table contains all Unicode characters up to and including Unicode version 10.0.0 of 2017. You can also search specifically for the blocks of particular writing systems:

Unicode (decimal), (Sa•design 2017). CC: AN 2018, BY-NC-SA.
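The hexadecimal “U+” notation and the decimal code are two spellings of the same number, so one can be converted into the other. The following Python sketch shows the conversion in both directions; the function names are hypothetical and not part of any of the tools mentioned here.

```python
def hex_to_decimal(notation):
    """Convert a notation such as 'U+0253' (or 'U +0253') to its decimal code."""
    digits = notation.upper().replace(" ", "").replace("U+", "")
    return int(digits, 16)  # interpret the remaining digits as base 16


def decimal_to_hex(code):
    """Convert a decimal code to 'U+' notation with at least four hex digits."""
    return f"U+{code:04X}"


print(hex_to_decimal("U+0253"))
print(decimal_to_hex(595))
```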

The software “BabelMap”, which can be downloaded free of charge, also lets you browse the entire Unicode character repertoire and find the associated code values: when you select a character, its decimal code is displayed at the bottom right, directly after the label “Decimal”. According to its description, the 2018 version of BabelMap already incorporates the data of the latest Unicode version, 11.0.0 (2018):

Unicode (decimal) in BabelMap Version, 2018. CC: AN 2018, BY-NC-SA.

Alternatively, you may also use the online tool “BabelMap Online (Unicode 11.0)”.
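A similar lookup of a character's decimal code and name can be approximated with Python's standard `unicodedata` module. This is a sketch for comparison, not a description of how BabelMap works; the helper name `describe` is hypothetical, and which characters are known depends on the Unicode version bundled with the Python installation.

```python
import unicodedata


def describe(char):
    """Return decimal code, hexadecimal notation and Unicode name of a character."""
    code = ord(char)
    # The second argument is returned if the bundled database has no name.
    name = unicodedata.name(char, "<no name in this Unicode version>")
    return code, f"U+{code:04X}", name


# The Unicode version of the database shipped with this Python installation.
print("Unicode data version:", unicodedata.unidata_version)
print(describe("\u0257"))  # d with hook
```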

“Unicode Lookup” (2009) is another useful search tool, especially because you can type in codes to search for. The decimal codes are listed in the second column, under “Dec”. However, characters from newer Unicode versions, such as Adlam (2016) or Bamum (2009), are not yet included in the table:

Unicode (decimal) in Unicode Lookup 2009, CC: AN 2018, BY-NC-SA.
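Searching a character table can also be sketched in Python. The example below searches by a fragment of the character name, which is an assumption made for illustration and not necessarily how “Unicode Lookup” searches. The function name `search_by_name` and the default range 32-879 are likewise illustrative choices. As with any table, characters missing from the underlying database simply do not appear in the results.

```python
import unicodedata


def search_by_name(fragment, first=32, last=879):
    """List (decimal code, character, name) for names containing the fragment."""
    fragment = fragment.upper()  # Unicode character names are upper case
    hits = []
    for code in range(first, last + 1):
        name = unicodedata.name(chr(code), "")  # "" if the character has no name
        if fragment in name:
            hits.append((code, chr(code), name))
    return hits


# Show the first few Latin letters whose name contains "WITH HOOK".
for code, char, name in search_by_name("with hook")[:5]:
    print(code, char, name)
```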

If you also want to add phonetic transcriptions in the International Phonetic Alphabet (IPA) to your text, you can likewise enter these characters via Unicode. The decimal codes for IPA can be found, for instance, on the website (2017). The corresponding code is shown in the second column of the table, following the respective glyph:

Unicode (decimal) for International Phonetic Alphabet (IPA), 2017. CC: AN 2018, BY-NC-SA.
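A table of this kind, with the glyph followed by its decimal code, can be generated for the Unicode block “IPA Extensions” (decimal 592-687). The Python sketch below is an illustration only: the function name `ipa_rows` is hypothetical, and IPA transcriptions also draw on characters outside this one block, which the sketch does not cover.

```python
IPA_FIRST, IPA_LAST = 592, 687  # IPA Extensions block, U+0250 to U+02AF


def ipa_rows(first=IPA_FIRST, last=IPA_LAST):
    """Return (glyph, decimal code) pairs for the given decimal range."""
    return [(chr(code), code) for code in range(first, last + 1)]


# Print the first few rows: glyph in the first column, decimal code in the second.
for glyph, code in ipa_rows()[:8]:
    print(f"{glyph}\t{code}")
```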

The decimal codes are then entered on the computer via the so-called “Windows Alt codes” to produce the desired characters.
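Outside of keyboard input, the same decimal codes can be turned into characters programmatically. The following Python sketch shows the idea; it does not describe how Windows Alt codes work internally, and the helper name `from_decimal_codes` is hypothetical.

```python
def from_decimal_codes(codes):
    """Build a string from a sequence of decimal Unicode codes."""
    return "".join(chr(code) for code in codes)  # chr() is the inverse of ord()


# 595 = b with hook, 599 = d with hook, 409 = k with hook
print(from_decimal_codes([595, 599, 409]))
```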

Continue to Unicode Characters Part 3.