https://www.lesswrong.com/posts/JKj5Krff5oKMb8TjT/imitative-generalisation-aka-learning-the-prior-1
LW post
about transferring supervision from tasks we can oversee to tasks we can't
Imitative Generalization: only using ML for IID tasks, imitating the way humans generalize
“how a human would answer if they'd learnt from all the data the model has learnt from”
aka “learning the prior”
correspondence with Bayesian inference
Imitative generalization with distributional shift:
labeled dataset: D
humans can't reliably identify all breeds unaided; want to classify a shifted, unlabeled dataset D'
in D', some huskies are not on snow
a human prior favors the assumption “husky is a fluffy dog” over “husky is a bunch of white pixels/snow”
jointly learning:
- z = string of text instructions to label images ("Husky: large fluffy dog looking like a wolf. Greyhound: tall, very skinny dog. …")
- H_prior(z): log probability a human assigns to instructions z
- train M_prior to approximate H_prior
- H^L(y|x,z): probability a human assigns to label y (e.g. “husky”) for image x given instructions z
- train M^L(y|x,z) to approximate H^L
find z^* maximizing M_prior(z) + \sum_{(x,y) \in D} \log M^L(y|x,z).
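A toy sketch of this search step. Assumptions (not from the post): the candidate instruction strings form a small finite set so exhaustive search works, and `m_prior` / `m_label` are hypothetical stand-ins for the learned models M_prior (returning a log-prior) and M^L (returning a probability); in practice z would be optimized with gradient-based or amortized methods, not enumeration.

```python
import math

def score(z, dataset, m_prior, m_label):
    """Objective from the post: M_prior(z) + sum over (x, y) in D of log M^L(y | x, z).

    m_prior returns a log probability; m_label returns a probability, so we
    take its log before summing.
    """
    return m_prior(z) + sum(math.log(m_label(y, x, z)) for x, y in dataset)

def find_z_star(candidates, dataset, m_prior, m_label):
    """Exhaustive search over a (toy) finite set of candidate instruction strings."""
    return max(candidates, key=lambda z: score(z, dataset, m_prior, m_label))
```

E.g. with a prior that favors "fluffy dog"-style instructions and a labeler that fits D better under them, `find_z_star` picks that z over the "white pixels" hypothesis.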
then give z^* to humans and have them use it to predict labels for images in D' (i.e. query H^L(y'|x',z^*))
use these human labels to train a model M^L_test to approximate H^L(·|·,z^*)
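A minimal sketch of that final distillation step, under loose assumptions: `human_label` is a hypothetical oracle standing in for querying H^L(·|x',z^*), and a lookup-table closure stands in for actually training M^L_test (a real implementation would fit a generalizing model, not memorize pairs).

```python
def distill(unlabeled_inputs, z_star, human_label):
    """Collect (x', human answer) pairs: humans label D' using instructions z*."""
    return [(x, human_label(x, z_star)) for x in unlabeled_inputs]

def fit_lookup_model(pairs, default="unknown"):
    """Stand-in for training M^L_test on the human-labeled pairs.

    Returns a callable model; here just a dict lookup with a default.
    """
    table = dict(pairs)
    return lambda x: table.get(x, default)
```

The point of the step: ML is only ever fit to an IID supervised task (predicting human answers given z^*), so generalization to D' comes from the humans, not the model.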

https://www.lesswrong.com/posts/SL9mKhgdmDKXmxwE4/learning-the-prior