https://www.lesswrong.com/posts/JKj5Krff5oKMb8TjT/imitative-generalisation-aka-learning-the-prior-1
LW post
about transferring supervision from tasks we can oversee to tasks we can't
Imitative Generalization: only using ML for IID tasks, imitating the way humans generalize
“how a human would answer if they'd learnt from all the data the model has learnt from”
aka “learning the prior”
correspondence with Bayesian inference
Imitative generalization with distributional shift:
labeled dataset: D
humans can't reliably identify all breeds unaided; want to classify a shifted, unlabeled dataset D'
in D', some huskies are not on snow
a human prior favors the assumption “husky is a fluffy dog” over “husky is a bunch of white pixels/snow”
jointly learning:
- z = string of text instructions to label images ("Husky: large fluffy dog looking like a wolf. Greyhound: tall, very skinny dog. …")
- H_prior(z): log probability a human assigns to instructions z
- train M_prior to approximate H_prior
- H^L(y|x,z): probability a human assigns to label y (e.g. “husky”) for image x given instructions z
- train M^L(y|x,z) to approximate H^L
find z^* maximizing M_prior(z) + \sum_{(x,y) \in D} \log M^L(y|x,z).
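A toy sketch of this search step. Assumptions (not from the post): the candidate instruction strings form a small finite set so exhaustive search works, and `m_prior` / `m_label` are hypothetical stand-ins for the learned models M_prior (returning a log-prior) and M^L (returning a probability); in practice z would be optimized with gradient-based or amortized methods, not enumeration.

```python
import math

def score(z, dataset, m_prior, m_label):
    """Objective from the post: M_prior(z) + sum over (x, y) in D of log M^L(y | x, z).

    m_prior returns a log probability; m_label returns a probability, so we
    take its log before summing.
    """
    return m_prior(z) + sum(math.log(m_label(y, x, z)) for x, y in dataset)

def find_z_star(candidates, dataset, m_prior, m_label):
    """Exhaustive search over a (toy) finite set of candidate instruction strings."""
    return max(candidates, key=lambda z: score(z, dataset, m_prior, m_label))
```

E.g. with a prior that favors "fluffy dog"-style instructions and a labeler that fits D better under them, `find_z_star` picks that z over the "white pixels" hypothesis.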
then give z^* to humans and have them use it to predict labels for images in D' (i.e. query H^L(y'|x',z^*))
use these human labels to train a model M^L_test to approximate H^L(·|·,z^*)
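A minimal sketch of that final distillation step, under loose assumptions: `human_label` is a hypothetical oracle standing in for querying H^L(·|x',z^*), and a lookup-table closure stands in for actually training M^L_test (a real implementation would fit a generalizing model, not memorize pairs).

```python
def distill(unlabeled_inputs, z_star, human_label):
    """Collect (x', human answer) pairs: humans label D' using instructions z*."""
    return [(x, human_label(x, z_star)) for x in unlabeled_inputs]

def fit_lookup_model(pairs, default="unknown"):
    """Stand-in for training M^L_test on the human-labeled pairs.

    Returns a callable model; here just a dict lookup with a default.
    """
    table = dict(pairs)
    return lambda x: table.get(x, default)
```

The point of the step: ML is only ever fit to an IID supervised task (predicting human answers given z^*), so generalization to D' comes from the humans, not the model.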

https://www.lesswrong.com/posts/SL9mKhgdmDKXmxwE4/learning-the-prior