53. Vision Transformers

Learning objectives

Conclude the discussion of multimodal models

Multimodal Models

CLIP

Contrastive Language-Image Pre-training

CLIP architecture

image source: Chip Huyen
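
As a rough illustration of the contrastive objective behind CLIP, here is a minimal sketch in plain Python. It assumes a batch of already-computed image and text embeddings in which pair i belongs together; the function names, the toy 2-D vectors, and the fixed temperature of 0.07 are assumptions for illustration, not the actual CLIP implementation.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    # Numerically stable cross-entropy for a single row of logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def clip_loss(image_embs, text_embs, temperature=0.07):
    # temperature is an assumed fixed value here; in practice it may be learned.
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j] = scaled cosine similarity between image i and text j.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    # Image-to-text direction: each row should pick out its own caption.
    loss_images = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each column should pick out its own image.
    loss_texts = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (loss_images + loss_texts) / 2

# Toy embeddings (hypothetical): matched pairs vs. swapped pairs.
aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each image is most similar to its own caption and high when the pairs are mismatched, which is the signal that pulls matching image and text embeddings together.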

Vision Neurons

vision neurons

image source: distill.pub

Quo Vadimus?

Here are some more current applications of multimodal models.

home

receipt bookkeeping

image source: Recycle This Pittsburgh
scan grocery receipts
OCR
AI text decoding
code expense report
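
The last two steps above can be sketched roughly as follows. This is a minimal sketch that assumes the OCR step has already produced one text string per receipt line, and that "code" means assigning each item an expense category. The sample lines, the category keywords, and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical OCR output: one string per printed receipt line.
OCR_LINES = [
    "MILK 2% GAL      3.49",
    "BANANAS          1.27",
    "DISH SOAP        4.99",
    "TOTAL            9.75",
]

# Hypothetical keyword-to-category map used to code each item.
CATEGORIES = {"MILK": "groceries", "BANANAS": "groceries", "SOAP": "household"}

# An item line is assumed to be a name followed by a price with two decimals.
LINE_RE = re.compile(r"^(?P<name>.+?)\s+(?P<price>\d+\.\d{2})$")

def parse_receipt(lines):
    """Turn OCR text lines into (item name, price in cents) pairs."""
    items = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like "name price"
        name = match.group("name").strip()
        if name.upper().startswith("TOTAL"):
            continue  # the total is a summary line, not an item
        cents = round(float(match.group("price")) * 100)
        items.append((name, cents))
    return items

def expense_report(items):
    """Sum item prices (in cents) per expense category."""
    totals = defaultdict(int)
    for name, cents in items:
        category = next(
            (cat for key, cat in CATEGORIES.items() if key in name.upper()),
            "other",
        )
        totals[category] += cents
    return dict(totals)

items = parse_receipt(OCR_LINES)
report = expense_report(items)
print(report)
```

Prices are kept as integer cents to avoid floating-point drift when summing. A multimodal model could replace the regular expression and keyword map with direct extraction from the receipt image.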

pedagogy

COPUS

image source: COPUS

software dev

Visual Question Answering

VQA

image source: paper

comp bio

transfer learning between RNA and ATAC sequencing

scButterfly

image source: scButterfly

medicine

data types

image source: Science Direct