Matthew A. Clarke
AI Alignment
Selective Generalization: Improving Capabilities While Maintaining Alignment
Training to improve capabilities may cause undesired changes in model behavior. For example, training models on oversight protocols or …
Ariana Azarbal, Matthew A. Clarke, Jorio Cocolla, Cailley Factor, Alex Cloud