Hi, I'm Ian.
I love deep problems in CS and AI. I build systems that make AI feel like magic. Doing that @ Salesforce.
Research: Three Functional Roles of the Per-Layer Embedding Gate in Gemma 4 E2B
I ran polysemy tests, magnitude decompositions, and a full causal ablation battery across all 35 layers of Gemma 4 E2B's Per-Layer Embedding gate. The gate contains at least three independent mechanisms with different causal signatures: Layer 6 carries word-sense information correlationally, but its causal contribution is syntactic and lexical; Layers 13-14 inject a high-magnitude token-identity signal that is net-harmful on English and German but net-helpful on Chinese; Layer 33 is a late-stage output prior whose removal is severely damaging (+1.59 NLL). The primary evidence for the L13/14 finding is mean-ablation (−0.159 nats, P=1.000 on 500k tokens), not zero-ablation. The Chinese sign flip argues for domain-conditioned analysis rather than uniform pruning. Treating PLE as a single mechanism is the wrong unit of analysis.
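The mean- vs zero-ablation distinction matters here: zero-ablation removes both the token-specific signal and its average contribution, while mean-ablation removes only the token-specific part. A minimal NumPy sketch of the two interventions (toy activations; not the actual Gemma hook code):

```python
import numpy as np

def zero_ablate(acts):
    """Zero-ablation: delete the signal entirely, including its mean."""
    return np.zeros_like(acts)

def mean_ablate(acts, running_mean):
    """Mean-ablation: replace each token's activation with the dataset mean,
    removing token-identity information but preserving the average signal."""
    return np.broadcast_to(running_mean, acts.shape).copy()

# toy gate activations: (tokens, d_model), deliberately high-magnitude
rng = np.random.default_rng(0)
acts = rng.normal(loc=3.0, scale=1.0, size=(500, 8))
mu = acts.mean(axis=0)

za = zero_ablate(acts)
ma = mean_ablate(acts, mu)
# zero-ablation shifts the downstream input distribution; mean-ablation does not
assert np.allclose(ma.mean(axis=0), mu)
assert np.allclose(za, 0.0)
```

If a layer's mean contribution is load-bearing, zero-ablation conflates "the token-specific signal is useful" with "the layer's average output is useful", which is why mean-ablation is the cleaner test for the L13/14 claim.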
Research: Cracking Open Gemma 3 4B Part 2: Transcoders and Generation-Time Behavioral Circuits
Transcoders decompose MLP computation rather than residual stream state. The headline generation-time overconfidence result (Cohen's d=3.22) was topic-confounded: with topic-matched pairs and 95% bootstrap CIs, L29 Feature 216 lands at d=2.50 CI [1.79, 3.74] and L22 Feature 314 at d=1.72 CI [1.09, 2.73], while L17's signal drops to d=1.22. Over-refusal L17 F3109 (d=4.41) is the cleanest encoding-time feature, and steering it across a 0→300 clamp sweep produces clean dose-response modulation without coherence collapse — which reframes the earlier 'transcoder features are polysemantic and unsteerable' conclusion as an artifact of zero-ablating multiple features at once, not a property of transcoder features themselves. Sycophancy transcoder features failed a 4-class OOD probe (10 of 20 fire at 0.0 on all probe classes); they are format-specific to the discovery dataset. Hallucination stays null at both 16k and 65k transcoder widths — wider decomposition disperses rather than sharpens the signal.
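The clamp-sweep intervention is simple to state: pin one transcoder feature's activation to a fixed value at every position, sweep that value, and measure the behavior. A toy sketch of the sweep mechanics (feature index 3109 and the 0→300 range come from the post; the activations here are random stand-ins):

```python
import numpy as np

def clamp_feature(feature_acts, feature_idx, value):
    """Clamp one feature to a fixed value at every position,
    leaving all other features untouched."""
    out = feature_acts.copy()
    out[:, feature_idx] = value
    return out

rng = np.random.default_rng(1)
acts = np.abs(rng.normal(size=(16, 4096)))   # stand-in feature activations
sweep = [0, 50, 100, 150, 200, 250, 300]     # clamp values for the sweep
clamped = [clamp_feature(acts, 3109, v) for v in sweep]

# dose-response precondition: only the clamped feature varies with the sweep
assert all(np.all(c[:, 3109] == v) for c, v in zip(clamped, sweep))
assert np.allclose(clamped[0][:, :3109], acts[:, :3109])
```

Clamping one feature at a time is exactly what distinguishes this from the earlier zero-ablate-many-features protocol: the intervention stays a single-variable dose, so a monotone behavioral response is interpretable.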
Research: Large audio deepfake detection models perform well on academic benchmarks but fail in the real world compared to smaller models
A 2M-parameter model with no pretrained backbone beat my 350M-parameter WavLM pipeline by 24 percentage points on out-of-distribution data. I ran 50 experiments across four architectures, multiple datasets, and different audio codecs. The results inverted every assumption I had.
Research: Cracking Open Gemma 3 4B Part 1: Finding Behavioral Circuits with Sparse Autoencoders
I ran contrastive feature discovery across six model behaviors, four layers, and hundreds of prompts to find SAE features that reliably detect sycophancy resistance, over-refusal, hallucination, and more. What I initially called 'sycophancy features' turned out to be the opposite: the circuit the model uses to resist agreeable pressure. Hallucination at encoding time produced almost nothing, but shifting the monitor to generation time found real candidates. The difference comes down to where in the forward pass each behavior lives.
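Contrastive feature discovery here means ranking SAE features by how differently they activate on behavior-positive vs behavior-negative prompt sets. A minimal sketch of that ranking using Cohen's d as the effect size (synthetic activations; the real pipeline runs on recorded SAE activations):

```python
import numpy as np

def cohens_d(a, b):
    """Effect size between two samples of per-prompt feature activations."""
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled

def contrastive_ranking(pos_acts, neg_acts):
    """Rank features by |Cohen's d| between the positive and negative
    prompt sets; returns (indices largest-effect first, all d values)."""
    d = np.array([cohens_d(pos_acts[:, j], neg_acts[:, j])
                  for j in range(pos_acts.shape[1])])
    return np.argsort(-np.abs(d)), d

rng = np.random.default_rng(2)
pos = rng.normal(0, 1, size=(100, 50))   # activations on positive prompts
neg = rng.normal(0, 1, size=(100, 50))   # activations on negative prompts
pos[:, 7] += 3.0                         # plant one discriminative feature

order, d = contrastive_ranking(pos, neg)
assert order[0] == 7                     # the planted feature ranks first
```

Note the sign of d is informative: a feature that fires *more* on sycophancy-resistant completions than sycophantic ones is a resistance feature, which is exactly the inversion described above.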
Project: AI-SPY Text Detection: AI Writing Detection That Shows Its Work
A multi-level AI text detection system built on DeBERTa and a custom dual-path architecture — sentence, paragraph, and document analysis with attention-based attribution so you can see exactly which parts triggered the verdict.
Blog: Building AI text detection that explains itself
Most AI detectors give you a percentage and call it a day. We built one that shows you which sentences triggered the verdict, how much each one mattered, and why. It uses attention-based attribution on a sliding window transformer.
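The attribution idea can be sketched independently of the model: each sliding window produces an AI-probability plus attention weights over the sentences it covers; distribute each window's score across its sentences by attention share, then average overlapping windows. A toy version (all function names and numbers here are illustrative, not the AI-SPY API):

```python
import numpy as np

def sentence_attribution(window_scores, attn, window_sents):
    """Distribute each window's AI-probability across its sentences using
    that window's attention weights, then average overlapping windows."""
    n = max(s for sents in window_sents for s in sents) + 1
    totals, counts = np.zeros(n), np.zeros(n)
    for score, weights, sents in zip(window_scores, attn, window_sents):
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()                      # normalize attention per window
        for s, wt in zip(sents, w):
            totals[s] += score * wt          # sentence's share of the verdict
            counts[s] += 1
    return totals / np.maximum(counts, 1)    # average over covering windows

scores = [0.9, 0.2]             # window-level AI probabilities (made up)
attn = [[0.7, 0.3], [0.5, 0.5]] # attention over each window's sentences
sents = [[0, 1], [1, 2]]        # sentence indices covered by each window
per_sentence = sentence_attribution(scores, attn, sents)
assert per_sentence.shape == (3,)
assert per_sentence[0] > per_sentence[2]   # sentence 0 carries more blame
```

The averaging over overlapping windows is what makes the per-sentence scores stable: a sentence only gets a high score if every window that sees it agrees.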