Data distributional properties drive emergent in-context learning in transformers S Chan, A Santoro, A Lampinen, J Wang, A Singh, P Richemond, ... Advances in Neural Information Processing Systems 35, 18878-18891, 2022 | 187 | 2022 |
The transient nature of emergent in-context learning in transformers A Singh, S Chan, T Moskovitz, E Grant, A Saxe, F Hill Advances in Neural Information Processing Systems 36, 2024 | 8 | 2024 |
Confronting reward model overoptimization with constrained RLHF T Moskovitz, AK Singh, DJ Strouse, T Sandholm, R Salakhutdinov, ... arXiv preprint arXiv:2310.04373, 2023 | 7 | 2023 |
Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs AK Singh, DJ Strouse arXiv preprint arXiv:2402.14903, 2024 | 4 | 2024 |
Decoding data quality via synthetic corruptions: Embedding-guided pruning of code data Y Yang, AK Singh, M Elhoushi, A Mahmoud, K Tirumala, F Gloeckle, ... arXiv preprint arXiv:2312.02418, 2023 | 4 | 2023 |
Know your audience: specializing grounded language models with listener subtraction AK Singh, D Ding, A Saxe, F Hill, AK Lampinen arXiv preprint arXiv:2206.08349, 2022 | 4 | 2022 |
Social alarms and reminders P Singh, A Deopura, V Gupta, P Singh, V Jeyachandran, A Singh, K Singh US Patent 10,382,613, 2019 | 2 | 2019 |
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation AK Singh, T Moskovitz, F Hill, SCY Chan, AM Saxe arXiv preprint arXiv:2404.07129, 2024 | | 2024 |
Training a speaker neural network using one or more listener neural networks AK Singh, F Ding, FG Hill, AK Lampinen US Patent App. 18/199,896, 2023 | | 2023 |
Deep Attentional Modulation for Zero-shot Learning in Object Recognition A Singh Massachusetts Institute of Technology, 2021 | | 2021 |