About
As an undergrad at Cornell, I worked on LLM interpretability and truthfulness, and was a primary contributor to the papers Representation Engineering and Localizing Lying in Llama. I have also worked on semantic representations in the brain and on LLM robustness.
Papers
Localizing Lying in Llama: Understanding Instructed Dishonesty on True-False Questions Through Prompting, Probing, and Patching
Accepted at NeurIPS 2023 SoLaR Workshop
Summary: We prompt Llama-2-70B-chat to lie, then localize the mechanisms involved using activation patching and linear probing.
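A minimal sketch of the linear-probing step, assuming you already have residual-stream activations from one layer and honest/lying labels (the activations and labels below are random stand-ins, not data from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical setup: `acts` holds residual-stream activations from one layer
# of Llama-2-70B-chat (n_prompts x d_model), and `labels` marks whether the
# model was instructed to answer honestly (0) or to lie (1).
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 8192))    # stand-in for real activations
labels = rng.integers(0, 2, size=200)  # stand-in for honest/lying labels

X_train, X_test, y_train, y_test = train_test_split(
    acts, labels, test_size=0.25, random_state=0
)

# Fit a linear probe; high held-out accuracy suggests the layer
# linearly encodes whether the model is lying.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
```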
ArXiv | Code | Thread
Considerations of Biological Plausibility in Deep Learning
Published on the front page of the Cornell Undergraduate Research Journal (CURJ)
Winner of the $300 James E. Rice Award
A literature review of learning algorithms and their biological plausibility
Paper
Projects
ProctorAI
ProctorAI is a multimodal AI companion that watches your screen and yells at you if it sees you being unproductive. Within a few days of release, it gained over 250 stars on GitHub.
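A minimal sketch of the core loop, assuming a periodic screenshot is sent to a vision-language model via the OpenAI API (the model name, prompt, and interval are illustrative, not necessarily what ProctorAI uses):

```python
import base64, io, time
from PIL import ImageGrab
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def screenshot_b64() -> str:
    """Capture the screen and return it as a base64-encoded PNG."""
    buf = io.BytesIO()
    ImageGrab.grab().save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()

while True:
    # Ask a multimodal model whether the screen looks productive.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Is the user being productive? If not, scold them."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{screenshot_b64()}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)
    time.sleep(60)  # check once a minute
```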
Code
Theoretical Bound on the Weights of a Neural Network under Nesterov's Accelerated Gradient Flow
I solved an "open question" posed in a previous paper by deriving a bound on the weights of a neural network under Nesterov's accelerated gradient flow. I did this work in the summer of 2021 during an REU at Johns Hopkins under Rene Vidal.
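For context, "Nesterov's accelerated gradient flow" here refers to a continuous-time limit of Nesterov's method; a standard formulation is the Su-Boyd-Candes ODE below (I am assuming this formulation; the PDF has the exact setting and the derived bound):

```latex
% Nesterov's accelerated gradient flow: a second-order ODE whose
% trajectory X(t) plays the role of the network weights, with loss f.
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0,
\qquad X(0) = w_0,\quad \dot{X}(0) = 0.
```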
PDF
CereBERTo: Improving Distributional Robustness with Brain-Like Language Representations
Improving the out-of-distribution robustness of BERT and GPT-2 by pretraining them to predict brain fMRI data.
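A minimal sketch of the pretraining objective, assuming a linear head regressing pooled BERT features onto fMRI voxel responses (the sentences, voxel data, and voxel count are random stand-ins; the real pipeline is in the repo):

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

n_voxels = 1000  # hypothetical number of recorded fMRI voxels
head = nn.Linear(bert.config.hidden_size, n_voxels)
optimizer = torch.optim.AdamW(
    list(bert.parameters()) + list(head.parameters()), lr=1e-5
)

# Stand-ins for (sentence, voxel-response) pairs from an fMRI dataset.
sentences = ["The cat sat on the mat.", "Neurons fire in patterns."]
voxels = torch.randn(len(sentences), n_voxels)

inputs = tokenizer(sentences, return_tensors="pt", padding=True)
features = bert(**inputs).last_hidden_state[:, 0]  # [CLS] representation

# Backprop the brain-prediction loss through BERT as an auxiliary
# pretraining signal, nudging its representations toward brain-like ones.
loss = nn.functional.mse_loss(head(features), voxels)
loss.backward()
optimizer.step()
```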
Code