26. Sampling and Structured Outputs

Learning objectives

  • Understand how sampling from the token probability distribution shapes the model's responses

Sampling

From Decoding Strategies in Large Language Models by Maxime Labonne.

Why is sampling important?

LLMs don’t directly produce text.

They compute logits: raw scores assigned to every possible token, which a softmax turns into a probability distribution.

Logits in LLMs

How do these probabilities become text? Through decoding methods.
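As a minimal sketch of that first step, the softmax below turns a hypothetical vector of logits into the probability distribution that decoding methods sample from (the logit values are made up for illustration):

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    # Subtract the max logit for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate tokens
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
print(probs)  # four probabilities that sum to 1; the first is the largest
```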

Top-k sampling

It uses the probability distribution to select a token from the k most likely options.

It introduces randomness in the selection process.

Top-k distribution

The temperature parameter can be applied to the logits before sampling: values below 1 sharpen the distribution, values above 1 flatten it.


Top-k

Generated text: I have a dream job and I want to
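A small sketch of the idea, under the assumption of a toy 6-token vocabulary with made-up logits: only the k highest-scoring tokens are kept, and one of them is drawn at random according to its (temperature-scaled) probability.

```python
import math
import random

def top_k_sample(logits, k, temperature=1.0):
    """Sample a token index from the k highest-scoring logits.

    Temperature < 1 sharpens the distribution; > 1 flattens it.
    """
    # Scale logits by the temperature before converting to weights
    scaled = [x / temperature for x in logits]
    # Keep only the indices of the k most likely candidates
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Unnormalized softmax weights over the top-k candidates
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    # Randomly pick among the top-k, weighted by probability
    return random.choices(top, weights=weights, k=1)[0]

# Hypothetical logits over a 6-token vocabulary
logits = [3.0, 2.5, 1.0, 0.2, -1.0, -2.0]
print(top_k_sample(logits, k=3))  # always one of the indices 0, 1, or 2
```

Note that the randomness stays inside the k best candidates, which is how top-k introduces variety without drifting to very unlikely tokens.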

Nucleus sampling or Top-p sampling

It chooses a cutoff value p and keeps the smallest set of tokens whose cumulative probability exceeds p.

It forms a nucleus of tokens from which to randomly choose the next token.

Top-p distribution

The number of tokens in the nucleus can vary from step to step.

If the probability distributions vary considerably between steps, the selected tokens will not always be among the most probable ones.

This produces unique and varied sequences.

Top-p

Generated text: I have a dream. I'm going to
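The mechanism can be sketched as follows, assuming a toy distribution over five tokens (the probabilities are invented for illustration): tokens are sorted by probability, accumulated until the total exceeds p, and the next token is drawn from that nucleus.

```python
import random

def top_p_sample(probs, p):
    """Sample from the smallest set of tokens whose cumulative probability exceeds p."""
    # Sort token indices by descending probability
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # stop once the nucleus covers p
            break
    # Draw from the nucleus, weighted by the original probabilities
    weights = [probs[i] for i in nucleus]
    return random.choices(nucleus, weights=weights, k=1)[0]

# Hypothetical distribution: 0.5 + 0.2 + 0.15 = 0.85 >= 0.8,
# so the nucleus is tokens 0, 1 and 2
probs = [0.5, 0.2, 0.15, 0.1, 0.05]
print(top_p_sample(probs, p=0.8))  # always one of 0, 1, or 2
```

With a flatter distribution the same p would admit more tokens, which is exactly the step-to-step variation in nucleus size described above.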

Structured Outputs

If the LLM is a component of a larger system or pipeline, we need a way to pass its output as input to another component.

We need to work with schemas such as:

  • JSON
  • Pydantic: a data validation library for Python.
    • It ensures data from the LLM is accurate and valid
  • Outlines: make the LLMs speak the language of every application.
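To make the idea concrete, here is a minimal stdlib-only sketch of validating an LLM's JSON output against a schema before passing it downstream; in practice a library like Pydantic would replace the manual type checks with a declarative model. The `llm_output` string and the `Person` schema are hypothetical examples, not part of any real API.

```python
import json
from dataclasses import dataclass

# Hypothetical raw text returned by an LLM asked to emit JSON
llm_output = '{"name": "Ada Lovelace", "age": 36}'

@dataclass
class Person:
    name: str
    age: int

def parse_person(raw: str) -> Person:
    """Parse and validate the model's JSON output before passing it on."""
    data = json.loads(raw)
    # Reject output that does not match the expected schema
    if not isinstance(data.get("name"), str) or not isinstance(data.get("age"), int):
        raise ValueError("LLM output does not match the expected schema")
    return Person(name=data["name"], age=data["age"])

person = parse_person(llm_output)
print(person.name)  # prints Ada Lovelace
```

Tools like Outlines go one step further: instead of validating after generation, they constrain the sampling process itself so the model can only produce tokens that fit the schema.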