Announcement_2
🚨 Our new preprint, Weight Updates as Activation Shifts, is out! We move beyond trial-and-error by deriving a principled framework for activation steering. Code here.