Info about me
Hi there!
I’m Niklas, currently a Postdoc @Meta AI (FAIR), where I work on memory in language models, reasoning with structured data, cryptography, and various forms of interpretability.
Previously I was an AI and Fundamental Physics Research Associate @MIT, where I worked on interpretability and robustness, both doing research in the AI space as well as using the gained insights for more trustworthy data collection in collider physics.
In 2021 I completed a PhD in Physics at CERN,
where I developed trigger & core software for the upgrade of the LHCb experiment.
Python and C++ are the languages I know best.
See my CV for more details.
When not at work, you might find me playing video games, at the gym, outdoors, playing laser tag, skiing, or biking.
Papers
Last update: Mar 25, 2024
While at CERN and MIT, I was part of a large experimental physics collaboration, LHCb, which publishes everything jointly. I did not contribute significantly to most of those papers individually, but I was part of the team that developed the software and algorithms needed to collect the data in the first place. This makes my Scholar page a little wonky, so I keep track here of the papers where I actually did a significant chunk of the work presented.
- Memory Mosaics (Preprint)
- Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory (Preprint)
- The cool and the cruel: separating hard parts of LWE secrets (AfricaCrypt)
- From Neurons to Neutrons: A Case Study in Interpretability (ICML)
- Salsa Fresca: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors (Preprint)
- DiSK: A Diffusion Model for Structured Knowledge (Preprint)
- NuCLR: Nuclear Co-Learned Representations (ICML SynS&ML)
- Development of the Topological Trigger for LHCb Run 3 (ACAT)
- Finding NEEMo: Geometric Fitting using Neural Estimation of the Energy Mover's Distance (ML4PS NeurIPS)
- Towards Understanding Grokking: An Effective Theory of Representation Learning (NeurIPS)
- Robust and Provably Monotonic Networks (ML: Sci. Technol.) & Expressive Monotonic Neural Networks (ICLR) (once fleshed out a little more for a physics audience, once for an ML audience)
- A Comparison of CPU and GPU Implementations for the LHCb Experiment Run 3 Trigger (Comp. Soft. Big Sci.)
- Thesis: A Selection Framework for LHCb’s Upgrade Trigger
- Evolution of the energy efficiency of LHCb’s real-time processing (CHEP)
- Configuration and scheduling of the LHCb trigger application (CHEP)
- A new scheduling algorithm for the LHCb upgrade trigger application (ACAT)
- New approaches for track reconstruction in LHCb’s Vertex Locator (CHEP)