| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Convolutional 2D Knowledge Graph Embeddings | T Dettmers, P Minervini, P Stenetorp, S Riedel | AAAI 2018 | 3079 | 2018 |
| QLoRA: Efficient Finetuning of Quantized LLMs | T Dettmers, A Pagnoni, A Holtzman, L Zettlemoyer | NeurIPS 2023 (Oral) | 2000 | 2023 |
| BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ... | | 1626 | 2023 |
| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | T Dettmers, M Lewis, Y Belkada, L Zettlemoyer | NeurIPS 2022 | 853* | 2022 |
| Sparse Networks from Scratch: Faster Training without Losing Performance | T Dettmers, L Zettlemoyer | arXiv preprint arXiv:1907.04840 | 379 | 2019 |
| BASE Layers: Simplifying Training of Large, Sparse Models | M Lewis, S Bhosale, T Dettmers, N Goyal, L Zettlemoyer | ICML 2021 | 234 | 2021 |
| 8-bit Approximations for Parallelism in Deep Learning | T Dettmers | ICLR 2016 | 228 | 2016 |
| 8-bit Optimizers via Block-wise Quantization | T Dettmers, M Lewis, S Shleifer, L Zettlemoyer | ICLR 2022 (Spotlight) | 218 | 2022 |
| The Case for 4-bit Precision: k-bit Inference Scaling Laws | T Dettmers, L Zettlemoyer | ICML 2023 | 170 | 2023 |
| SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression | T Dettmers, R Svirschevski, V Egiazarian, D Kuznedelev, E Frantar, ... | arXiv preprint arXiv:2306.03078 | 162 | 2023 |
| Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models | M Li, S Gururangan, T Dettmers, M Lewis, T Althoff, NA Smith, ... | arXiv preprint arXiv:2208.03306 | 123 | 2022 |
| Petals: Collaborative Inference and Fine-tuning of Large Models | A Borzunov, D Baranchuk, T Dettmers, M Ryabinin, Y Belkada, ... | ACL 2022 (Demonstration) | 82* | 2022 |
| Stable and Low-Precision Training for Large-Scale Vision-Language Models | M Wortsman, T Dettmers, L Zettlemoyer, A Morcos, A Farhadi, L Schmidt | NeurIPS 2023 | 29 | 2023 |
| SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient | M Ryabinin, T Dettmers, M Diskin, A Borzunov | NeurIPS 2023 | 20 | 2023 |
| Jack the Reader - A Machine Reading Framework | D Weissenborn, P Minervini, T Dettmers, I Augenstein, J Welbl, ... | arXiv preprint arXiv:1806.08727 | 12 | 2018 |
| Training Transformers Together | A Borzunov, M Ryabinin, T Dettmers, Q Lhoest, L Saulnier, M Diskin, ... | NeurIPS 2021 (Demonstration) | 9 | 2022 |
| High Performance Natural Language Processing | G Ilharco, C Ilharco, I Turc, T Dettmers, F Ferreira, K Lee | EMNLP 2020 (Tutorial) | 7 | 2020 |
| MatFormer: Nested Transformer for Elastic Inference | S Kudugunta, A Kusupati, T Dettmers, K Chen, I Dhillon, Y Tsvetkov, ... | arXiv preprint arXiv:2310.07707 | 6 | 2023 |
| Towards a Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model | LZ Liu, T Dettmers, XV Lin, V Stoyanov, X Li | EMNLP 2023 | 3 | 2023 |
| OLMoE: Open Mixture-of-Experts Language Models | N Muennighoff, L Soldaini, D Groeneveld, K Lo, J Morrison, S Min, W Shi, ... | arXiv preprint arXiv:2409.02060 | 2 | 2024 |