Encoding

  • Encoding = how each character is represented by one or more bytes
    • 1 byte = 8 bits (an octet, base-2) = 2 hex digits (base-16)
  • Old way = multiple standards
    • Byte 0xB1 in “Latin1” (ISO 8859-1) = “±”
    • Byte 0xB1 in “Latin2” (ISO 8859-2) = “ą”
  • Modern = UTF-8 (covers every Unicode character, using 1 to 4 bytes each)
    • Seriously, ALL: "\uA66E" = ꙮ (“multiocular O”)
      • From a single 15th-century manuscript
      • As “o” in a word meaning “many-eyed”
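
The points above can be checked with a minimal Python sketch (Python is an assumption here; the outline does not name a language). It decodes the single byte 0xB1 under both legacy encodings, then counts how many bytes UTF-8 needs for a few characters. The variable names are illustrative only.

```python
# One byte, two legacy standards: the meaning depends on the encoding chosen.
raw = bytes([0xB1])
latin1_char = raw.decode("iso-8859-1")  # Latin-1 reading of 0xB1
latin2_char = raw.decode("iso-8859-2")  # Latin-2 reading of 0xB1

# UTF-8 is variable-width: count the bytes used for each sample character.
samples = ["o", "±", "ą", "\uA66E"]
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

print(latin1_char, latin2_char)
for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the character, and its UTF-8 bytes in hex.
    print(f"U+{ord(ch):04X} {ch} -> {encoded.hex(' ')} ({len(encoded)} bytes)")
```

Plain ASCII "o" takes one byte, the two accented Latin characters take two each, and the multiocular O takes three (ea 99 ae).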