We built a simple web app that lets users generate some standard mechanistic interpretability plots (based on Stefan’s explainer) for arbitrary prompts.

By Stefan Heimersheim and Jonathan Ng.

If the page shows an error ("Oh no"), open the menu at the bottom right, select the three dots, and click "Reboot app". The app can break if too many people use it at the same time.

Edit: A Hugging Face-hosted version is now available (16 GB memory, crashes less often, same code): https://huggingface.co/spaces/StefanHex/simple-trafo-mech-int

*Trafo is the German abbreviation for Transformer :)

Cover image by DALL·E, OpenAI.

Download

Alignment Jam 4 Stefan & Jonathan.pdf 449 kB
