4. Online Learning and Regret Minimization

Learning objectives

Introduce terminology about optimization

I hope that you like math

Online Convex Optimization

OCO

Regret

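$\text{Regret}_{T}(A) = \sup_{\{f_{1},\ldots,f_{T}\}}\left[\sum_{t=1}^{T}f_{t}(x_{t}^{A}) - \min_{x\in\mathcal{K}}\sum_{t=1}^{T}f_{t}(x)\right]$

$x_{t}^{A}$ : action played by algorithm $A$ at iteration $t$, chosen from the decision set $\mathcal{K}$
$f_{t}$ : loss function revealed at iteration $t$
$T$ : number of game iterations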
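A minimal sketch of the bracketed quantity in the regret definition, for one fixed loss sequence over a finite decision set. The function name `regret` and the loss-table layout are illustrative assumptions, not from the slides. The supremum in the definition would take the worst case of this value over all admissible loss sequences.

```python
def regret(losses, actions):
    """Regret of a played action sequence against the best fixed action in hindsight.

    losses[t][x] is the loss f_t(x) of action x at round t (finite decision set).
    actions[t] is the action x_t that the algorithm played at round t.
    """
    num_actions = len(losses[0])
    # Total loss the algorithm actually incurred.
    incurred = sum(losses[t][a] for t, a in enumerate(actions))
    # Total loss of the single best fixed action, chosen with hindsight.
    best_fixed = min(sum(row[x] for row in losses) for x in range(num_actions))
    return incurred - best_fixed


# Hypothetical 3-round game with two actions.
losses = [
    [0.0, 1.0],
    [1.0, 0.0],
    [0.0, 1.0],
]
print(regret(losses, [1, 0, 1]))  # algorithm picks the worse action every round -> 2.0
```

Playing the best fixed action (action 0) in every round gives a regret of zero on this sequence.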

Applications

spam filtering
path finding
portfolio selection
recommendation systems

Experts and Adversaries

Theorem 1.2 Let $\epsilon\in(0,0.5)$. Suppose that the best of $N$ experts makes $L$ mistakes. Then:

$\exists$ an efficient deterministic algorithm that makes $< 2(1+\epsilon)L + \frac{2\log N}{\epsilon}$ mistakes
$\exists$ an efficient randomized algorithm that makes $\leq (1+\epsilon)L + \frac{\log N}{\epsilon}$ mistakes in expectation
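To get a feel for the two bounds, they can be evaluated for concrete values. This is a small sketch; the function name `mistake_bounds` is hypothetical, and it assumes $\log$ denotes the natural logarithm.

```python
import math


def mistake_bounds(num_experts, best_mistakes, epsilon):
    """Return (deterministic_bound, randomized_bound) from Theorem 1.2.

    Assumes log is the natural logarithm and 0 < epsilon < 0.5.
    """
    if not 0 < epsilon < 0.5:
        raise ValueError("epsilon must lie in (0, 0.5)")
    log_term = math.log(num_experts) / epsilon
    deterministic = 2 * (1 + epsilon) * best_mistakes + 2 * log_term
    randomized = (1 + epsilon) * best_mistakes + log_term
    return deterministic, randomized


# Hypothetical setting: 10 experts, the best one errs 20 times.
det, rnd = mistake_bounds(num_experts=10, best_mistakes=20, epsilon=0.25)
print(f"deterministic: < {det:.1f} mistakes, randomized: <= {rnd:.1f} expected mistakes")
```

The randomized bound is exactly half the deterministic one, which is the factor of 2 that randomization saves.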

Weighted Majority Algorithm

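predict according to the weighted majority of experts

$a_{t} = \begin{cases} A, & W_{t}(A) \geq W_{t}(B) \\ B, & \text{otherwise}\end{cases}$

$W_{t}(A)$, $W_{t}(B)$ : total weight of the experts predicting $A$ or $B$ at iteration $t$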

update weights

$W_{t+1}(i) = \begin{cases}W_{t}(i), & \text{if expert } i \text{ was correct} \\ W_{t}(i)(1-\epsilon), & \text{if expert } i \text{ was wrong}\end{cases}$
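The predict and update steps above can be sketched as follows. This is a minimal illustration, assuming binary predictions labeled "A" and "B" and initial weights of 1; the function name `weighted_majority` and the toy data are hypothetical.

```python
def weighted_majority(expert_predictions, outcomes, epsilon=0.25):
    """Run the weighted majority rule and return (mistakes, final_weights).

    expert_predictions[t][i] is expert i's prediction ("A" or "B") at round t.
    outcomes[t] is the correct answer at round t.
    """
    num_experts = len(expert_predictions[0])
    weights = [1.0] * num_experts  # assumption: every expert starts with weight 1
    mistakes = 0
    for preds, outcome in zip(expert_predictions, outcomes):
        # Predict with the side that holds more total weight (ties go to A).
        weight_a = sum(w for w, p in zip(weights, preds) if p == "A")
        weight_b = sum(w for w, p in zip(weights, preds) if p == "B")
        guess = "A" if weight_a >= weight_b else "B"
        if guess != outcome:
            mistakes += 1
        # Shrink the weight of every expert that was wrong by a factor (1 - epsilon).
        weights = [
            w if p == outcome else w * (1 - epsilon)
            for w, p in zip(weights, preds)
        ]
    return mistakes, weights


# Toy run: expert 0 is always right, experts 1 and 2 are always wrong.
outcomes = ["A", "B", "A", "B"]
expert_predictions = [
    ["A", "B", "B"],
    ["B", "A", "A"],
    ["A", "B", "B"],
    ["B", "A", "A"],
]
mistakes, weights = weighted_majority(expert_predictions, outcomes)
print(mistakes, weights)
```

In this toy run the two bad experts outvote the good one for the first three rounds, after which their combined weight drops below 1 and the algorithm follows the good expert.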

Hedging

$W_{t+1}(i) = W_{t}(i)e^{-\epsilon \ell_{t}(i)}$

$\epsilon$ : learning rate
$\ell_{t}(i)$ : loss by expert $i$ at iteration $t$
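A minimal sketch of the exponential update above. It assumes the usual Hedge setup, which the slide does not spell out: weights start at 1 and the player follows expert $i$ with probability proportional to $W_{t}(i)$. The function name `hedge` and the toy losses are hypothetical.

```python
import math


def hedge(loss_rounds, epsilon=0.5):
    """Run the Hedge update and return (expected_loss, final_weights).

    loss_rounds[t][i] is the loss of expert i at round t.
    """
    num_experts = len(loss_rounds[0])
    weights = [1.0] * num_experts  # assumption: uniform initial weights
    expected_loss = 0.0
    for losses in loss_rounds:
        # Assumption: pick expert i with probability W_t(i) / sum_j W_t(j).
        total = sum(weights)
        probs = [w / total for w in weights]
        expected_loss += sum(p * loss for p, loss in zip(probs, losses))
        # Multiplicative update: W_{t+1}(i) = W_t(i) * exp(-epsilon * loss_t(i)).
        weights = [w * math.exp(-epsilon * loss) for w, loss in zip(weights, losses)]
    return expected_loss, weights


# Toy run: expert 0 never incurs loss, expert 1 always incurs loss 1.
expected_loss, weights = hedge([[0.0, 1.0], [0.0, 1.0]])
print(expected_loss, weights)
```

Unlike the weighted majority update, the penalty here scales with the size of the loss, so the same rule works for real-valued losses and not only for right-or-wrong predictions.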

DSLC.io/genai | DSLC.io
