Mechanistic interpretability
Software Engineering Undergraduate
I study how modern AI systems make decisions, where those decisions become fragile, and how interventions can turn mechanistic clues into reliable understanding.
Research Directions
Localizing sparse model components, causal features, and internal routing paths that explain downstream behavior (see the sketch after this list).
Studying how persuasive evidence can redirect model choices, and how compact interventions can monitor or block that shift.
Building on competitive programming experience to design careful experiments, efficient tooling, and robust evaluation pipelines.
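Concretely, "localizing causal components" usually means interventions in the style of activation patching: cache a component's activation on a clean run, splice it into a corrupted run, and measure how much of the original behavior recovers. Below is a minimal, model-agnostic sketch using PyTorch forward hooks; the toy network and the "block0" key are illustrative stand-ins, not code from any specific project.

```python
import torch

def caching_hook(cache: dict, key: str):
    """Record a module's output during the clean forward pass."""
    def hook(module, inputs, output):
        cache[key] = output.detach()
    return hook

def patching_hook(cache: dict, key: str):
    """Replace the module's output with the cached clean activation.
    Returning a value from a forward hook overrides the output."""
    def hook(module, inputs, output):
        return cache[key]
    return hook

if __name__ == "__main__":
    net = torch.nn.Sequential(
        torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2)
    )
    cache = {}

    # Clean run: cache the first block's activation.
    handle = net[0].register_forward_hook(caching_hook(cache, "block0"))
    clean_out = net(torch.randn(1, 4))
    handle.remove()

    # Corrupted run with the clean activation patched in. Because the
    # patched block fully determines the rest of this toy network, the
    # output recovers the clean behavior exactly.
    handle = net[0].register_forward_hook(patching_hook(cache, "block0"))
    patched_out = net(torch.randn(1, 4))
    handle.remove()

    print(torch.allclose(clean_out, patched_out))  # True
```

In a real model the same two hooks are attached per attention head or MLP block, and the recovery metric ranks which components carry the behavior.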
Publication
The paper identifies a compact causal mechanism behind persuasion-induced factual errors: a small group of mid-layer attention heads routes answer options through a low-dimensional choice geometry. Persuasion redirects attention toward a target option, and an intervention on a rank-one evidence-routing feature can steer or block the effect.
The mechanism appears across open-source LLMs and realistic poisoning settings such as Generative Engine Optimization, suggesting persuasion can be framed as a narrow, monitorable circuit rather than a diffuse loss of belief.
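As a rough illustration of what a rank-one intervention looks like in practice, the sketch below projects a chosen direction out of a module's output with a PyTorch forward hook. The direction and the hooked module are hypothetical stand-ins; this is the generic pattern, not the paper's released code.

```python
import torch

def make_ablation_hook(direction: torch.Tensor):
    """Return a forward hook that removes the component of the hidden
    state along `direction`: h <- h - (h . d) d for unit vector d."""
    d = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        coeff = hidden @ d                        # (batch, seq)
        hidden = hidden - coeff.unsqueeze(-1) * d
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return hook

if __name__ == "__main__":
    # Toy stand-in for a transformer block; a real use would hook a
    # decoder layer (e.g. model.model.layers[i] in a HuggingFace-style
    # model, path hypothetical) with an estimated routing direction.
    block = torch.nn.Linear(16, 16)
    direction = torch.randn(16)
    block.register_forward_hook(make_ablation_hook(direction))
    out = block(torch.randn(2, 5, 16))
    # The ablated output has (numerically) no component along `direction`.
    print((out @ (direction / direction.norm())).abs().max())  # ~0
```

Scaling the projected component instead of zeroing it would steer rather than block, which is the same knob the paper's steer/block experiments turn.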
Honors
ICPC Regional Contest (Shanghai), Silver Medal
CCPC Invitational Contest (Northeast), Gold Medal
RoboCom Developer Competition, National First Prize
Toolkit