Text Encoding

From Leo's Notes
Last edited on 30 December 2021, at 02:01.

The contents of a file containing 'Hello', saved in various encodings:

Description | Binary Representation
The traditional ANSI encoding. | 48 65 6C 6C 6F
Unicode (UTF-16, little-endian) with no BOM. | 48 00 65 00 6C 00 6C 00 6F 00
Unicode (UTF-16, little-endian) with BOM. The BOM (FF FE) serves two purposes: first, it tags the file as a Unicode document, and second, the order in which the two bytes appear indicates that the file is little-endian. | FF FE 48 00 65 00 6C 00 6C 00 6F 00
Unicode (UTF-16, big-endian) with no BOM. Notepad does not support this encoding. | 00 48 00 65 00 6C 00 6C 00 6F
Unicode (UTF-16, big-endian) with BOM. Notice that this BOM (FE FF) is in the opposite order from the little-endian BOM. | FE FF 00 48 00 65 00 6C 00 6C 00 6F
UTF-8. The first three bytes are the UTF-8 encoding of the BOM. | EF BB BF 48 65 6C 6C 6F
UTF-7. The first five bytes are the UTF-7 encoding of the BOM. Notepad does not support this encoding. | 2B 2F 76 38 2D 48 65 6C 6C 6F
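
The byte sequences in the table above can be reproduced with a short Python sketch. It is only an illustration under a few assumptions: "ANSI" is taken to mean Windows code page 1252, the UTF-7 BOM is hard-coded as the five bytes listed in the table, and the names `hex_bytes`, `SAMPLES`, `BOMS` and `sniff_bom` are made up for this example. The `sniff_bom` helper shows how a reader might use the BOM to tag a file and determine its byte order; it ignores UTF-32 and is not how Notepad itself detects encodings.

```python
import codecs

TEXT = "Hello"


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex, e.g. '48 65'."""
    return " ".join(f"{b:02X}" for b in data)


# One entry per row of the table. "ANSI" is assumed to be cp1252 here;
# on a real Windows system it depends on the active code page.
SAMPLES = {
    "ANSI": TEXT.encode("cp1252"),
    "UTF-16 LE, no BOM": TEXT.encode("utf-16-le"),
    "UTF-16 LE, BOM": codecs.BOM_UTF16_LE + TEXT.encode("utf-16-le"),
    "UTF-16 BE, no BOM": TEXT.encode("utf-16-be"),
    "UTF-16 BE, BOM": codecs.BOM_UTF16_BE + TEXT.encode("utf-16-be"),
    "UTF-8, BOM": TEXT.encode("utf-8-sig"),
    # UTF-7 BOM hard-coded as the bytes 2B 2F 76 38 2D ("+/v8-").
    "UTF-7, BOM": b"+/v8-" + TEXT.encode("utf-7"),
}

# BOM signatures for the encodings in the table (UTF-32 is ignored).
BOMS = [
    (b"+/v8-", "utf-7"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


for label, data in SAMPLES.items():
    print(f"{label:18} {hex_bytes(data):36} BOM: {sniff_bom(data)}")
```

Without a BOM, `sniff_bom` returns `None`, so the ANSI and BOM-less UTF-16 rows cannot be told apart by signature alone; a reader has to guess from the content or be told the encoding.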

See Also: