Trafo Mech Int on the web!
We built a simple web app that lets users generate some standard mechanistic interpretability plots (based on Stefan’s explainer) for arbitrary prompts.
By Stefan Heimersheim and Jonathan Ng.
If the page shows an error ("Oh no"), pull out the menu at the bottom right, select the three dots, and click "Reboot app". The app can break if too many people use it at the same time.
Edit: There is now a Hugging Face-hosted version (16 GB memory, crashes less, same code) here: https://huggingface.co/spaces/StefanHex/simple-trafo-mech-int
*Trafo is the German abbreviation for Transformer :)
Cover image by DALL·E, OpenAI.
Download
Alignment Jam 4 Stefan & Jonathan.pdf 449 kB