HTML Character Sets

Complete reference for character encodings in HTML documents

Recommended: UTF-8

UTF-8 is the recommended character encoding for HTML documents. It can encode every Unicode character, is backward compatible with ASCII, and is the most widely used character encoding on the web. Always use UTF-8 for new documents, and declare it in the document's head:

<meta charset="UTF-8">
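The declaration only helps if a browser can find it early: the HTML Standard requires the meta element to fall entirely within the first 1024 bytes of the document. The following is a minimal Python sketch of that check. The function name `sniff_meta_charset` and the regular expression are illustrative assumptions, and the pattern is a simplification, not the full prescan algorithm that browsers implement.

```python
import re
from typing import Optional

# Simplified, hypothetical pattern: finds charset=... inside a <meta> tag.
# It also matches the older http-equiv form (content="text/html; charset=...").
_META_CHARSET = re.compile(
    rb'<meta\b[^>]*?\bcharset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)

def sniff_meta_charset(document: bytes) -> Optional[str]:
    """Return the declared charset label in lowercase, or None if none is found."""
    head = document[:1024]  # only the start of the file is examined
    match = _META_CHARSET.search(head)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()

page = (
    '<!DOCTYPE html><html><head><meta charset="UTF-8">'
    "<title>Café</title></head></html>"
).encode("utf-8")

print(sniff_meta_charset(page))  # utf-8
```

A declaration that appears after a long block of comments or scripts may fall outside the scanned range, which is one reason to place it first in the head.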

Common Character Sets

Character Set   Description
UTF-8           Variable-width Unicode encoding that uses 1 to 4 bytes per character and can represent every Unicode character. Recommended for all HTML documents.
UTF-16          Unicode Transformation Format, 16-bit. Variable-width encoding that uses 2 or 4 bytes per character. Used internally by Windows and Java.
UTF-32          Unicode Transformation Format, 32-bit. Fixed-width encoding using 32 bits (4 bytes) per character.
ISO-8859-1      Latin alphabet No. 1, also known as Latin-1. Covers Western European languages.
Windows-1252    Windows Latin-1. Matches ISO-8859-1 except that bytes 0x80 to 0x9F hold printable characters (such as the euro sign and curly quotes) instead of control codes.
ASCII           American Standard Code for Information Interchange. 7-bit character set covering basic English letters, digits, and punctuation.
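To make the differences between these encodings concrete, the sketch below measures how many bytes each one needs for a few sample characters. The helper name `encoded_length` and the sample characters are illustrative choices; the codec names are the ones Python's standard library uses (`cp1252` is Python's name for Windows-1252).

```python
from typing import Optional

SAMPLES = ["A", "é", "€", "😀"]
ENCODINGS = ["utf-8", "utf-16-le", "utf-32-le", "iso-8859-1", "cp1252", "ascii"]

def encoded_length(text: str, encoding: str) -> Optional[int]:
    """Return the number of bytes needed, or None if the text cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

for char in SAMPLES:
    sizes = {name: encoded_length(char, name) for name in ENCODINGS}
    print(repr(char), sizes)
```

The euro sign is a useful probe: it takes three bytes in UTF-8, one byte in Windows-1252, and cannot be represented at all in ISO-8859-1 or ASCII.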

ISO Character Sets

Character Set   Description
ISO-8859-1      Latin alphabet No. 1 (Western European languages)
ISO-8859-2      Latin alphabet No. 2 (Central and Eastern European languages)
ISO-8859-3      Latin alphabet No. 3 (South European languages)
ISO-8859-4      Latin alphabet No. 4 (North European languages)
ISO-8859-5      Latin/Cyrillic alphabet
ISO-8859-6      Latin/Arabic alphabet
ISO-8859-7      Latin/Greek alphabet
ISO-8859-8      Latin/Hebrew alphabet
ISO-8859-9      Latin alphabet No. 5 (Turkish)
ISO-8859-10     Latin alphabet No. 6 (Nordic languages)
ISO-8859-15     Latin alphabet No. 9 (Western European languages with Euro sign)
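The ISO-8859 parts are all single-byte encodings that share the ASCII range and differ only in the upper half (bytes 0xA0 to 0xFF). The sketch below decodes the same byte under several parts to show how the meaning changes. The helper name `decode_byte` is an illustrative assumption; the codec names are Python's.

```python
def decode_byte(value: int, encoding: str) -> str:
    """Decode a single byte value under the given single-byte encoding."""
    return bytes([value]).decode(encoding)

# The same byte, 0xE4, read under different ISO-8859 parts.
for name in ["iso-8859-1", "iso-8859-5", "iso-8859-7"]:
    print(name, decode_byte(0xE4, name))

# ISO-8859-15 replaces the generic currency sign at 0xA4 with the euro sign.
print(decode_byte(0xA4, "iso-8859-1"))   # ¤
print(decode_byte(0xA4, "iso-8859-15"))  # €
```

Because the bytes themselves carry no indication of which part was used, the declared character set is the only thing that tells a browser how to read them.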

Windows Character Sets

Character Set   Description
Windows-1250    Central and Eastern European languages
Windows-1251    Cyrillic languages
Windows-1252    Western European languages
Windows-1253    Greek
Windows-1254    Turkish
Windows-1255    Hebrew
Windows-1256    Arabic
Windows-1257    Baltic languages
Windows-1258    Vietnamese
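Declaring the wrong Windows code page produces garbled text (often called mojibake) rather than an error, because most byte values are valid in every code page. The sketch below reproduces the effect in Python by encoding Russian text as Windows-1251 and decoding it as Windows-1252. The helper name `misdecode` is hypothetical, and `cp1251` and `cp1252` are Python's names for these code pages.

```python
def misdecode(text: str, actual: str, assumed: str) -> str:
    """Encode text with its actual encoding, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed, errors="replace")

original = "Привет"
garbled = misdecode(original, actual="cp1251", assumed="cp1252")
print(garbled)  # Ïðèâåò

# When no bytes were lost, reversing the two steps recovers the original text.
repaired = garbled.encode("cp1252").decode("cp1251")
print(repaired)  # Привет
```

The repair step only works when every byte survived the wrong decoding, so it is better treated as a diagnostic aid than as a general fix.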

Other Character Sets

Character Set   Description
GB2312          Simplified Chinese characters
Big5            Traditional Chinese characters
Shift_JIS       Japanese characters
EUC-JP          Japanese characters (Extended Unix Code)
EUC-KR          Korean characters
KOI8-R          Russian Cyrillic
TIS-620         Thai characters
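Content stored in one of these legacy character sets can usually be converted to UTF-8 by decoding with the original encoding and re-encoding. The following Python sketch shows the idea. The function name `to_utf8` and the sample strings are illustrative assumptions, and the codec names are Python's spellings of the character sets listed above.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Transcode bytes from a legacy encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")

# Sample text for each legacy character set (chosen only for illustration).
LEGACY_SAMPLES = {
    "gb2312": "中文",
    "big5": "中文",
    "shift_jis": "日本語",
    "euc_jp": "日本語",
    "euc_kr": "한국어",
    "koi8_r": "Русский",
    "tis_620": "ไทย",
}

for encoding, text in LEGACY_SAMPLES.items():
    legacy_bytes = text.encode(encoding)
    utf8_bytes = to_utf8(legacy_bytes, encoding)
    # Print the byte counts before and after conversion.
    print(encoding, len(legacy_bytes), len(utf8_bytes))
```

The conversion depends on knowing the source encoding. If the wrong one is supplied, `decode` may raise `UnicodeDecodeError` or silently produce the wrong characters.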