51. Tokenization Beyond Text

Learning objectives

  • Explain how tokenization extends beyond text to images and audio

Tokenization Beyond Text

Text

tokenization of text

Images

Vectorization

tokenization of images

  • images are partitioned into patches

  • each patch is flattened into a vector

  • image source: Shusen Wang
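
The two steps in the bullets (partition into patches, flatten each patch) can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions not stated on the slide: the function name `image_to_patch_vectors`, the non-overlapping square patches, and the toy 4x4 RGB image are all hypothetical choices.

```python
import numpy as np

def image_to_patch_vectors(image, patch_size):
    """Split an (H, W, C) image into non-overlapping square patches
    and flatten each patch into one vector (hypothetical helper)."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Reshape to (rows, p, cols, p, C), then reorder to (rows, cols, p, p, C)
    # so that each patch's pixels are contiguous.
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, one column per pixel value in that patch.
    return patches.reshape(rows * cols, patch_size * patch_size * c)

# Toy input (assumption): a 4x4 image with 3 channels, cut into 2x2 patches.
image = np.arange(4 * 4 * 3).reshape(4, 4, 3)
tokens = image_to_patch_vectors(image, patch_size=2)
print(tokens.shape)  # (4, 12): 4 patches, each flattened to 2*2*3 values
```

Each row of `tokens` plays the role that a token plays in text: one element of the input sequence.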

Positional Encoding

positional encoding of patches
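
The slide does not say which positional encoding is used. As one possible sketch, the fixed sinusoidal scheme below assigns each patch index a distinct vector that is added to the patch vector. The function name `sinusoidal_positions`, the base of 10000, and the placeholder patch vectors are assumptions made for illustration; learned position embeddings are an equally common alternative.

```python
import numpy as np

def sinusoidal_positions(num_patches, dim):
    """Return a (num_patches, dim) array of sinusoidal position vectors.
    Hypothetical helper; assumes dim is even."""
    if dim % 2:
        raise ValueError("dim must be even")
    positions = np.arange(num_patches)[:, None]                     # (N, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)   # (dim/2,)
    enc = np.zeros((num_patches, dim))
    enc[:, 0::2] = np.sin(positions * freqs)  # even columns: sine
    enc[:, 1::2] = np.cos(positions * freqs)  # odd columns: cosine
    return enc

# Placeholder patch vectors (assumption): 4 patches of dimension 12.
patch_vectors = np.zeros((4, 12))
position_vectors = sinusoidal_positions(4, 12)
encoded = patch_vectors + position_vectors  # position is added, not concatenated
print(encoded.shape)  # (4, 12)
```

Without this step, a model that treats the patches as a set could not tell which patch came from which part of the image.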

Audio

Abstraction

audio abstraction

Music

Fourier

signal domain
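
As a rough illustration of moving a signal from the time domain to the frequency domain with a Fourier transform, the sketch below builds a synthetic two-tone signal and recovers its dominant frequency. The sample rate, duration, and tone frequencies are assumptions chosen for the example, not values from the slides.

```python
import numpy as np

sample_rate = 8000   # samples per second (assumed)
duration = 1.0       # seconds (assumed)
t = np.arange(int(sample_rate * duration)) / sample_rate

# Synthetic signal: a 440 Hz tone plus a quieter 880 Hz overtone.
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

# Real-input FFT: magnitude of each frequency component.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)

dominant = freqs[np.argmax(spectrum)]
print(dominant)  # 440.0
```

In the frequency domain the two tones appear as two separate peaks, which is a more compact description than the 8000 raw samples.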