About

Apertus Claritas is a platform for collecting and sharing interpretability research on Apertus, Switzerland's open multilingual language model. It brings together researchers, students and independent contributors to document what we are learning: what works, what fails and where understanding remains incomplete.

Apertus Claritas offers both an inside view and an open view. Rather than showcasing only polished results, it makes space for exploratory findings, intermediate insights, negative results and technically grounded reflections that help others understand the model more deeply.


Topics

Topics we cover include, but are not limited to:

  • Features, circuits, latent representations and geometry
  • Sparse autoencoders and transcoders
  • Probing, activation steering and interventions
  • Training dynamics across checkpoints and parameter scales
  • Safety monitoring, including hallucinations, anthropomorphic concepts and behavioural drift
  • Agentic interpretability and automated monitoring
  • Tools, datasets and interactive interpretability interfaces

Get to know us

Team

No team members yet.

Advisors

No advisors yet.

Reviewers

No reviewers yet.

Contributors

No contributors yet.

Supporting labs
ivia-lab, LAS
Hosted within the Swiss AI ecosystem
EPFL AI Center, ETH AI Center, CSCS