ISO/IEC 8859

ISO 8859 encoding family
Standard: ISO/IEC 8859
Extends: US-ASCII
Preceded by: ISO 646
Succeeded by: ISO 10646 (Unicode)
Other related encoding(s): Windows-125x

ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12. The ISO working group maintaining this series of standards has been disbanded.


ISO/IEC 8859 parts 1, 2, 3, and 4 were originally Ecma International standard ECMA-94.




Contents





  • 1 Introduction


  • 2 Characters


  • 3 The parts of ISO/IEC 8859

    • 3.1 Table



  • 4 Relationship to Unicode and the UCS


  • 5 Development status


  • 6 See also


  • 7 Notes


  • 8 References




Introduction


While the bit patterns of the 95 printable ASCII characters are sufficient to exchange information in modern English, most other languages that use Latin alphabets need additional symbols not covered by ASCII. ISO/IEC 8859 sought to remedy this problem by utilizing the eighth bit in an 8-bit byte to allow positions for another 96 printable characters. Early encodings were limited to 7 bits because of restrictions of some data transmission protocols, and partially for historical reasons. However, more characters were needed than could fit in a single 8-bit character encoding, so several mappings were developed, including at least ten suitable for various Latin alphabets.
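
The position counts quoted above can be checked with a few lines of arithmetic. This is only an illustrative sketch; the variable names are ours and do not come from the standard.

```python
# Printable ASCII occupies 0x20 (space) through 0x7E (tilde).
ascii_printable = range(0x20, 0x7F)

# With the eighth bit set, ISO/IEC 8859 assigns printable characters
# to the positions 0xA0 through 0xFF.
high_printable = range(0xA0, 0x100)

print(len(ascii_printable))  # 95
print(len(high_printable))   # 96
```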


The ISO/IEC 8859-n encodings only contain printable characters, and were designed to be used in conjunction with control characters mapped to the unassigned bytes. To this end a series of encodings registered with the IANA add the C0 control set (control characters mapped to bytes 0 to 31) from ISO 646 and the C1 control set (control characters mapped to bytes 128 to 159) from ISO 6429, resulting in full 8-bit character maps with most, if not all, bytes assigned. These sets have ISO-8859-n as their preferred MIME name or, in cases where a preferred MIME name is not specified, their canonical name. Many people use the terms ISO/IEC 8859-n and ISO-8859-n interchangeably. ISO/IEC 8859-11 did not get such a charset assigned, presumably because it was almost identical to TIS 620.
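
As a rough illustration of the difference, Python's built-in `iso-8859-1` codec maps every byte value, so it behaves like the IANA charset with the C0 and C1 sets added rather than like the bare standard. The sketch below assumes only the Python standard library and checks that the two control ranges decode to control characters.

```python
import unicodedata

# Decode the C0 range (bytes 0-31) and the C1 range (bytes 128-159).
c0 = bytes(range(0, 32)).decode("iso-8859-1")
c1 = bytes(range(128, 160)).decode("iso-8859-1")

# "Cc" is the Unicode general category for control characters.
all_controls = all(unicodedata.category(ch) == "Cc" for ch in c0 + c1)
print(len(c0), len(c1), all_controls)
```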



Characters


The ISO/IEC 8859 standard is designed for reliable information exchange, not typography; the standard omits symbols needed for high-quality typography, such as optional ligatures, curly quotation marks, dashes, etc. As a result, high-quality typesetting systems often use proprietary or idiosyncratic extensions on top of the ASCII and ISO/IEC 8859 standards, or use Unicode instead.


As a rule of thumb, if a character or symbol was not already part of a widely used data-processing character set and was also not usually provided on typewriter keyboards for a national language, it did not get in. Hence the directional double quotation marks « and » used for some European languages were included, but not the directional double quotation marks “ and ” used for English and some other languages. French did not get its œ and Œ ligatures because they could be typed as 'oe'. Likewise, Ÿ, needed for all-caps text, was dropped as well.[1][2][3] Albeit under different code points, these three characters were later reintroduced with ISO/IEC 8859-15 in 1999, which also introduced the new euro sign character €. Likewise, Dutch did not get the ĳ and Ĳ letters, because Dutch speakers had become used to typing these as two letters instead. Romanian did not initially get its Ș/ș and Ț/ț (with comma) letters, because these letters were initially unified with Ş/ş and Ţ/ţ (with cedilla) by the Unicode Consortium, considering the shapes with comma beneath to be glyph variants of the shapes with cedilla. However, the letters with explicit comma below were later added to the Unicode standard and are also in ISO/IEC 8859-16.


Most of the ISO/IEC 8859 encodings provide diacritic marks required for various European languages using the Latin script. Others provide non-Latin alphabets: Greek, Cyrillic, Hebrew, Arabic and Thai. Most of the encodings contain only spacing characters, although the Thai, Hebrew, and Arabic ones also contain combining characters. However, the standard makes no provision for the scripts of East Asian languages (CJK), as their ideographic writing systems require many thousands of code points. Although it uses Latin-based characters, Vietnamese does not fit into 96 positions either (without using combining diacritics). Each Japanese syllabic alphabet (hiragana or katakana, see Kana) would fit, but like several other alphabets of the world they are not encoded in the ISO/IEC 8859 system.
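
The inclusions and omissions described above can be probed with Python's standard codecs. This is a minimal sketch: the helper `can_encode` is a hypothetical name introduced here, and the codec names `iso8859_1`, `iso8859_15` and `iso8859_16` are Python's names for the corresponding charsets.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Characters discussed above: guillemets, curly quotes, French ligatures,
# capital Y with diaeresis, the euro sign, Romanian comma-below letters.
for ch in "«»“”œŒŸ€ȘșȚț":
    row = [can_encode(ch, f"iso8859_{n}") for n in (1, 15, 16)]
    print(f"U+{ord(ch):04X}", ch, row)
```

Each printed row shows whether the character is available in parts 1, 15 and 16, in that order.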



The parts of ISO/IEC 8859


ISO/IEC 8859 is divided into the following parts. Each entry gives the part, its name, its revision years, and a description:

  • Part 1: Latin-1, Western European (1987, 1998). Perhaps the most widely used part of ISO/IEC 8859, covering most Western European languages: Danish (partial),[nb 1] Dutch (partial),[nb 2] English, Faeroese, Finnish (partial),[nb 3] French (partial),[nb 3] German, Icelandic, Irish, Italian, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Catalan, and Swedish. Languages from other parts of the world are also covered, including Eastern European Albanian, Southeast Asian Indonesian, and the African languages Afrikaans and Swahili. The missing euro sign and capital Ÿ are in the revised version ISO/IEC 8859-15 (see below). The corresponding IANA character set is ISO-8859-1.

  • Part 2: Latin-2, Central European (1987, 1999). Supports those Central and Eastern European languages that use the Latin alphabet, including Bosnian, Polish, Croatian, Czech, Slovak, Slovene, Serbian, and Hungarian. The missing euro sign is available in ISO/IEC 8859-16.

  • Part 3: Latin-3, South European (1988, 1999). Turkish, Maltese, and Esperanto. Largely superseded by ISO/IEC 8859-9 for Turkish.

  • Part 4: Latin-4, North European (1988, 1998). Estonian, Latvian, Lithuanian, Greenlandic, and Sami.

  • Part 5: Latin/Cyrillic (1988, 1999). Covers mostly Slavic languages that use a Cyrillic alphabet, including Belarusian, Bulgarian, Macedonian, Russian, Serbian, and Ukrainian (partial).[nb 4]

  • Part 6: Latin/Arabic (1987, 1999). Covers the most common Arabic language characters. Does not support other languages using the Arabic script. Text requires bidirectional (BiDi) and cursive-joining processing for display.

  • Part 7: Latin/Greek (1987, 2003). Covers the modern Greek language (monotonic orthography). Can also be used for Ancient Greek written without accents or in monotonic orthography, but lacks the diacritics for polytonic orthography. These were introduced with Unicode.

  • Part 8: Latin/Hebrew (1988, 1999). Covers the modern Hebrew alphabet as used in Israel. In practice two different encodings exist: logical order (needs BiDi processing for display) and visual (left-to-right) order (in effect, after BiDi processing and line breaking).

  • Part 9: Latin-5, Turkish (1989, 1999). Largely the same as ISO/IEC 8859-1, replacing the rarely used Icelandic letters with Turkish ones.

  • Part 10: Latin-6, Nordic (1992, 1998). A rearrangement of Latin-4, considered more useful for Nordic languages. Baltic languages use Latin-4 more.

  • Part 11: Latin/Thai (2001). Contains characters needed for the Thai language. Virtually identical to TIS 620.

  • Part 12: Latin/Devanagari (revisions: N/A). The work on a part of 8859 for Devanagari was officially abandoned in 1997. ISCII and Unicode/ISO/IEC 10646 cover Devanagari.

  • Part 13: Latin-7, Baltic Rim (1998). Added some characters for Baltic languages which were missing from Latin-4 and Latin-6.

  • Part 14: Latin-8, Celtic (1998). Covers Celtic languages such as Gaelic and the Breton language.

  • Part 15: Latin-9 (1999). A revision of 8859-1 that removes some little-used symbols, replacing them with the euro sign and the letters Š, š, Ž, ž, Œ, œ, and Ÿ, which completes the coverage of French, Finnish and Estonian.

  • Part 16: Latin-10, South-Eastern European (2001). Intended for Albanian, Croatian, Hungarian, Italian, Polish, Romanian and Slovene, but also Finnish, French, German and Irish Gaelic (new orthography). The focus lies more on letters than symbols. The currency sign is replaced with the euro sign.

Each part of ISO/IEC 8859 is designed to support languages that often borrow from each other, so the characters needed by each language are usually accommodated by a single part. However, there are some characters and language combinations that are not accommodated without transcriptions. Efforts were made to make conversions as smooth as possible. For example, German has all of its seven special characters at the same positions in all Latin variants (1–4, 9, 10, 13–16), and in many positions the characters only differ in the diacritics between the sets. In particular, variants 1–4 were designed jointly, and have the property that every encoded character appears either at a given position or not at all.
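
The claim about the German letters can be checked against Python's standard codecs, which is all the following sketch assumes (the constant names are illustrative). It encodes the seven letters in every Latin variant and compares the resulting bytes, then shows a position that does differ between parts.

```python
GERMAN = "ÄÖÜßäöü"
LATIN_PARTS = [1, 2, 3, 4, 9, 10, 13, 14, 15, 16]

# Encode the same seven letters with each Latin variant.
encoded = {n: GERMAN.encode(f"iso8859_{n}") for n in LATIN_PARTS}
same_everywhere = len(set(encoded.values())) == 1
print(same_everywhere, encoded[1].hex(" "))

# By contrast, byte 0xA1 stands for different characters in different parts.
variants = {n: b"\xa1".decode(f"iso8859_{n}") for n in LATIN_PARTS}
print(variants)
```

A byte stream therefore cannot be interpreted correctly without knowing which part was used to produce it.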



Table

Comparison of the various parts (1–16) of ISO/IEC 8859
Binary  Oct  Dec  Hex  1  2  3  4  5  6  7  8  9  10  11  13  14  15  16
1010 0000240160A0

Non-breaking space (NBSP)
1010 0001241161A1
¡ĄĦĄЁ  ¡Ą¡
Ą
1010 0010242162A2
¢˘ĸЂ ¢¢Ē¢¢
ą
1010 0011243163A3
£Ł£ŖЃ £Ģ£
Ł
1010 0100244164A4
¤Є¤¤Ī¤Ċ

1010 0101245165A5
¥Ľ ĨЅ ¥Ĩċ¥

1010 0110246166A6
¦ŚĤĻІ ¦Ķ¦
Š
1010 0111247167A7
§Ї §
§
1010 1000250168A8
¨Ј ¨ĻØ
š
1010 1001251169A9
©ŠİŠЉ ©Đ
©
1010 1010252170AA
ªŞĒЊ ͺ×ªŠŖª
Ș
1010 1011253171AB
«ŤĞĢЋ «Ŧ«
«
1010 1100254172AC
¬ŹĴŦЌ،¬Ž¬¬
Ź
1010 1101255173AD

soft hyphen (SHY)

SHY
1010 1110256174AE
®Ž ŽЎ  ®Ū®
ź
1010 1111257175AF
¯Ż¯Џ ¯ŊÆŸ¯
Ż
1011 0000260176B0
°А °°
°
1011 0001261177B1
±ąħąБ ±ą±
±
1011 0010262178B2
²˛²˛В ²ē²Ġ²
Č
1011 0011263179B3
³ł³ŗГ ³ģ³ġ³
ł
1011 0100264180B4
´Д ΄´ī
Ž
1011 0101265181B5
µľµĩЕ ΅µĩµµ

1011 0110266182B6
śĥļЖ Άķ

1011 0111267183B7
·ˇ·ˇЗ ··
·
1011 1000270184B8
¸И Έ¸ļø
ž
1011 1001271185B9
¹šıšЙ Ή¹đ¹¹
č
1011 1010272186BA
ºşēК Ί÷ºšŗº
ș
1011 1011273187BB
»ťğģЛ؛»ŧ»
»
1011 1100274188BC
¼źĵŧМ Ό¼ž¼
Œ
1011 1101275189BD
½˝½ŊН ½½
œ
1011 1110276190BE
¾ž žО Ύ¾ū¾
Ÿ
1011 1111277191BF
¿żŋП؟Ώ ¿ŋæ¿
ż
1100 0000300192C0
ÀŔÀĀР ΐ ÀĀĄ
À
1100 0001301193C1
ÁСءΑ ÁĮ
Á
1100 0010302194C2
ÂТآΒ ÂĀ
Â
1100 0011303195C3
ÃĂ ÃУأΓ ÃĆÃ
Ă
1100 0100304196C4
ÄФؤΔ Ä
Ä
1100 0101305197C5
ÅĹĊÅХإΕ ÅÅ
Ć
1100 0110306198C6
ÆĆĈÆЦئΖ ÆĘ
Æ
1100 0111307199C7
ÇĮЧاΗ ÇĮĒ
Ç
1100 1000310200C8
ÈČÈČШبΘ ÈČČ
È
1100 1001311201C9
ÉЩةΙ É
É
1100 1010312202CA
ÊĘÊĘЪتΚ ÊĘŹ
Ê
1100 1011313203CB
ËЫثΛ ËĖ
Ë
1100 1100314204CC
ÌĚÌĖЬجΜ ÌĖĢ
Ì
1100 1101315205CD
ÍЭحΝ ÍĶ
Í
1100 1110316206CE
ÎЮخΞ ÎĪÎ
1100 1111317207CF
ÏĎÏĪЯدΟ ÏĻÏ
Binary  Oct  Dec  Hex  1  2  3  4  5  6  7  8  9  10  11  13  14  15  16
1101 0000320208D0
ÐĐ ĐаذΠ ĞЊŴÐ
1101 0001321209D1
ÑŃÑŅбرΡ ÑŅŃÑ
Ń
1101 0010322210D2
ÒŇÒŌвز  ÒŌŅÒ
1101 0011323211D3
ÓĶгسΣ ÓÓ
1101 0100324212D4
ÔдشΤ ÔŌÔ
1101 0101325213D5
ÕŐĠÕеصΥ Õ
Ő
1101 0110326214D6
ÖжضΦ ÖÖ
1101 0111327215D7
×зطΧ ×Ũ××
Ś
1101 1000330216D8
ØŘĜØиظΨ ØŲØ
Ű
1101 1001331217D9
ÙŮÙŲйعΩ ÙŲŁÙ
1101 1010332218DA
ÚкغΪ ÚŚÚ
1101 1011333219DB
ÛŰÛл Ϋ Û ŪÛ
1101 1100334220DC
Üм ά Ü Ü
1101 1101335221DD
ÝŬŨн έ İÝ ŻÝ
Ę
1101 1110336222DE
ÞŢŜŪо ή ŞÞ ŽŶÞ
Ț
1101 1111337223DF
ßп ίß฿
ß
1110 0000340224E0
àŕàāрـΰאàāąà
1110 0001341225E1
áсفαבáįá
1110 0010342226E2
âтقβגâāâ
1110 0011343227E3
ãă ãуكγדãćã
ă
1110 0100344228E4
äфلδהää
1110 0101345229E5
åĺċåхمεוåå
ć
1110 0110346230E6
æćĉæцنζזæęæ
1110 0111347231E7
çįчهηחçįē
ç
1110 1000350232E8
èčèčшوθטèčč
è
1110 1001351233E9
éщىιיé
é
1110 1010352234EA
êęêęъيκךêęź
ê
1110 1011353235EB
ëыًλכëė
ë
1110 1100354236EC
ìěìėьٌμלìėģ
ì
1110 1101355237ED
íэٍνםíķ
í
1110 1110356238EE
îюَξמîīî
1110 1111357239EF
ïďïīяُοןïļï
1111 0000360240F0
ðđ đِπנğðšŵð
đ
1111 0001361241F1
ñńñņёّρסñņńñ
ń
1111 0010362242F2
òňòōђْςעòōņò
1111 0011363243F3
óķѓ σףóó
1111 0100364244F4
ôє τפôōô
1111 0101365245F5
õőġõѕ υץõ
ő
1111 0110366246F6
öі φצöö
1111 0111367247F7
÷ї χק÷ũ÷÷
ś
1111 1000370248F8
øřĝøј ψרøųø
ű
1111 1001371249F9
ùůùųљ ωשùųłù
1111 1010372250FA
úњ ϊתúśú
1111 1011373251FB
ûűûћ ϋ ûūû
1111 1100374252FC
üќ ό ü ü
1111 1101375253FD
ýŭũ§ ύLRMıý żý
ę
1111 1110376254FE
þţŝūў ώRLMşþ žŷþ
ț
1111 1111377255FF
ÿ˙џ   ÿĸ ÿ
Binary  Oct  Dec  Hex  1  2  3  4  5  6  7  8  9  10  11  13  14  15  16

Position 0xA0 always holds the non-breaking space, and 0xAD mostly holds the soft hyphen, which is shown only at line breaks. Other empty fields are either unassigned or could not be displayed by the system used.
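
The two fixed positions can be inspected programmatically. The sketch below assumes that Python's `iso8859_n` codecs follow the published parts; it tests 0xA0 in every part and lists the parts in which 0xAD decodes to the soft hyphen.

```python
ALL_PARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16]

# 0xA0 should decode to U+00A0 (no-break space) in every part.
nbsp_everywhere = all(
    b"\xa0".decode(f"iso8859_{n}") == "\u00a0" for n in ALL_PARTS
)

# Collect the parts in which 0xAD decodes to U+00AD (soft hyphen).
shy_parts = [
    n for n in ALL_PARTS
    if b"\xad".decode(f"iso8859_{n}", errors="replace") == "\u00ad"
]
print(nbsp_everywhere, shy_parts)
```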


The table includes the characters added in the ISO/IEC 8859-7:2003 and ISO/IEC 8859-8:1999 revisions. LRM stands for left-to-right mark (U+200E) and RLM stands for right-to-left mark (U+200F).



Relationship to Unicode and the UCS


Since 1991, the Unicode Consortium[nb 4] has been working with ISO and IEC to develop the Unicode Standard and ISO/IEC 10646, the Universal Character Set (UCS), in tandem. Newer editions of ISO/IEC 8859 express characters in terms of their Unicode/UCS names and the U+nnnn notation, effectively making each part of ISO/IEC 8859 a Unicode/UCS character encoding scheme that maps a very small subset of the UCS to single 8-bit bytes. The first 256 characters in Unicode and the UCS are identical to those in ISO/IEC 8859-1 (Latin-1).
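
The identity between Latin-1 and the first 256 Unicode code points is easy to demonstrate. The following sketch assumes Python's `iso-8859-1` codec, which also maps the control ranges, and compares every byte value with the code point it decodes to.

```python
# Decode all 256 byte values as ISO-8859-1.
latin1_text = bytes(range(256)).decode("iso-8859-1")

# Each byte value should equal the code point of the decoded character.
identical = all(ord(ch) == i for i, ch in enumerate(latin1_text))
print(len(latin1_text), identical)
```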


Single-byte character sets, including the parts of ISO/IEC 8859 and their derivatives, were favoured throughout the 1990s because they were well established and easier to implement in software: equating one byte with one character is simple and adequate for most single-language applications, and most parts have no combining characters or variant forms. As Unicode-enabled operating systems became more widespread, ISO/IEC 8859 and other legacy encodings became less popular. While remnants of ISO/IEC 8859 and single-byte character models remain entrenched in many operating systems, programming languages, data storage systems, networking applications, display hardware, and end-user application software, most modern computing applications use Unicode internally and rely on conversion tables to map to and from other encodings when necessary.
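
A conversion of this kind might look like the following sketch, which uses Python's built-in codec tables. The sample bytes are an illustrative assumption: they spell the Polish place name "Łódź" in ISO-8859-2.

```python
# Legacy bytes, assumed to have been produced with ISO-8859-2.
legacy = b"\xa3\xf3d\xbc"

# Step 1: map the bytes to Unicode text using the correct table.
text = legacy.decode("iso8859_2")

# Step 2: re-encode the text in a Unicode encoding form such as UTF-8.
utf8 = text.encode("utf-8")

# Decoding with the wrong table silently yields different characters.
wrong = legacy.decode("iso8859_1")
print(text, utf8, wrong)
```

Because a single-byte decode with the wrong table raises no error, the source encoding has to be known from metadata or context.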



Development status


The ISO/IEC 8859 standard was maintained by ISO/IEC Joint Technical Committee 1, Subcommittee 2, Working Group 3 (ISO/IEC JTC 1/SC 2/WG 3). In June 2004, WG 3 disbanded, and maintenance duties were transferred to SC 2. The standard is not currently being updated, as the Subcommittee's only remaining working group, WG 2, is concentrating on development of Unicode's Universal Coded Character Set.



See also


  • List of computer character sets


  • RPL character set (An ISO 8859-1 superset on HP calculators, referred to as "ECMA-94" as well)


  • DEC Multinational Character Set (MCS)


  • DEC National Replacement Character Set (NRCS)


Notes




  1. Missing several accented vowels including Ǿ and ǿ. These can be replaced with non-accented vowels at the cost of increased ambiguity.


  2. Only the Ĳ/ĳ (letter IJ) is missing, which is usually represented as IJ.


  3. Missing characters are in ISO/IEC 8859-15.


  4. ISO/IEC 8859-5 lacks the letter Ґ/ґ, which was reintroduced into the Ukrainian alphabet in 1990.




References




  1. Haralambous, Yannis (September 2007). Fonts & Encodings. Translated by Horne, P. Scott (1st ed.). Sebastopol, California, USA: O'Reilly Media, Inc. pp. 37–38. ISBN 978-0-596-10242-5. ISBN 0-596-10242-9. […] According to a urban legend, the French delegate was out sick the day when the standard came up for a vote and had to have his Belgian counterpart act as his proxy. In fact, the French delegate was an engineer, who was convinced that this ligature was useless, and the Swiss and German representatives pressed hard to have the mathematical symbols × and ÷ included at the positions where Œ and œ would logically appear. […]


  2. André, Jacques (2003-10-15) [2003-10-02]. André, Bernard; Baron, Georges-Louis; Bruillard, Éric, eds. "Histoire d'Œ, histoire d'@ des rumeurs typographiques et de leurs enseignements". Traitement de texte et production de documents INRP/GEDIAPS (in French): 19–34. Archived from the original on 2016-12-08. Retrieved 2016-12-09.


  3. André, Jacques (November 1996). "ISO Latin-1, norme de codage des caractères européens? trois caractères français en sont absents!" (PDF). Cahiers GUTenberg (in French) (25): 65–77. Archived from the original (PDF) on 2008-11-30.



  • Published versions of each part of ISO/IEC 8859 are available, for a fee, from the ISO catalogue site and from the IEC Webstore.

  • PDF versions of the final drafts of some parts of ISO/IEC 8859 as submitted to the ISO/IEC JTC 1/SC 2/WG 3 for review & publication are available at the WG 3 web site:

    • ISO/IEC 8859-1:1998 - 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1 (draft dated February 12, 1998, published April 15, 1998)


    • ISO/IEC 8859-4:1998 - 8-bit single-byte coded graphic character sets, Part 4: Latin alphabet No. 4 (draft dated February 12, 1998, published July 1, 1998)


    • ISO/IEC 8859-7:1999 - 8-bit single-byte coded graphic character sets, Part 7: Latin/Greek alphabet (draft dated June 10, 1999; superseded by ISO/IEC 8859-7:2003, published October 10, 2003)


    • ISO/IEC 8859-10:1998 - 8-bit single-byte coded graphic character sets, Part 10: Latin alphabet No. 6 (draft dated February 12, 1998, published July 15, 1998)


    • ISO/IEC 8859-11:1999 - 8-bit single-byte coded graphic character sets, Part 11: Latin/Thai character set (draft dated June 22, 1999; superseded by ISO/IEC 8859-11:2001, published 15 December 2001)


    • ISO/IEC 8859-13:1998 - 8-bit single-byte coded graphic character sets, Part 13: Latin alphabet No. 7 (draft dated April 15, 1998, published October 15, 1998)


    • ISO/IEC 8859-15:1998 - 8-bit single-byte coded graphic character sets, Part 15: Latin alphabet No. 9 (draft dated August 1, 1997; superseded by ISO/IEC 8859-15:1999, published March 15, 1999)


    • ISO/IEC 8859-16:2000 - 8-bit single-byte coded graphic character sets, Part 16: Latin alphabet No. 10 (draft dated November 15, 1999; superseded by ISO/IEC 8859-16:2001, published July 15, 2001)



  • ECMA standards, which in intent correspond exactly to the ISO/IEC 8859 character set standards, can be found at:

    • Standard ECMA-94: 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4 2nd edition (June 1986)


    • Standard ECMA-113: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Cyrillic Alphabet 3rd edition (December 1999)


    • Standard ECMA-114: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Arabic Alphabet 2nd edition (December 2000)


    • Standard ECMA-118: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Greek Alphabet (December 1986)


    • Standard ECMA-121: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Hebrew Alphabet 2nd edition (December 2000)


    • Standard ECMA-128: 8-Bit Single-Byte Coded Graphic Character Sets - Latin Alphabet No. 5 2nd edition (December 1999)


    • Standard ECMA-144: 8-Bit Single-Byte Coded Character Sets - Latin Alphabet No. 6 3rd edition (December 2000)


  • ISO/IEC 8859-1 to Unicode mapping tables as plain text files are at the Unicode FTP site.

  • Informal descriptions and code charts for most ISO/IEC 8859 standards are available in ISO/IEC 8859 Alphabet Soup (Mirror)








