On the variance of the adaptive learning rate and beyond L Liu, H Jiang, P He, W Chen, X Liu, J Gao, J Han arXiv preprint arXiv:1908.03265, 2019 | 1828 | 2019 |
DeBERTa: Decoding-enhanced BERT with disentangled attention P He, X Liu, J Gao, W Chen arXiv preprint arXiv:2006.03654, 2020 | 1506 | 2020 |
LoRA: Low-rank adaptation of large language models EJ Hu, Y Shen, P Wallis, Z Allen-Zhu, Y Li, S Wang, L Wang, W Chen arXiv preprint arXiv:2106.09685, 2021 | 1503 | 2021 |
Multi-task deep neural networks for natural language understanding X Liu, P He, W Chen, J Gao arXiv preprint arXiv:1901.11504, 2019 | 1251 | 2019 |
What Makes Good In-Context Examples for GPT-3? J Liu, D Shen, Y Zhang, B Dolan, L Carin, W Chen arXiv preprint arXiv:2101.06804, 2021 | 535 | 2021 |
DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing P He, J Gao, W Chen arXiv preprint arXiv:2111.09543, 2021 | 381 | 2021 |
SMART: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization H Jiang, P He, W Chen, X Liu, J Gao, T Zhao arXiv preprint arXiv:1911.03437, 2019 | 357 | 2019 |
ReasoNet: Learning to stop reading in machine comprehension Y Shen, PS Huang, J Gao, W Chen Proceedings of the 23rd ACM SIGKDD international conference on knowledge …, 2017 | 322 | 2017 |
Short text conceptualization using a probabilistic knowledgebase Y Song, H Wang, Z Wang, H Li, W Chen Proceedings of the twenty-second international joint conference on …, 2011 | 286 | 2011 |
Understanding the difficulty of training transformers L Liu, X Liu, J Gao, W Chen, J Han arXiv preprint arXiv:2004.08249, 2020 | 203 | 2020 |
FusionNet: Fusing via fully-aware attention with application to machine comprehension HY Huang, C Zhu, Y Shen, W Chen arXiv preprint arXiv:1711.07341, 2017 | 196 | 2017 |
Improving multi-task deep neural networks via knowledge distillation for natural language understanding X Liu, P He, W Chen, J Gao arXiv preprint arXiv:1904.09482, 2019 | 177 | 2019 |
Document transformation for multi-label feature selection in text categorization W Chen, J Yan, B Zhang, Z Chen, Q Yang Seventh IEEE International Conference on Data Mining (ICDM 2007), 451-456, 2007 | 169 | 2007 |
Adversarial training for large neural language models X Liu, H Cheng, P He, W Chen, Y Wang, H Poon, J Gao arXiv preprint arXiv:2004.08994, 2020 | 133 | 2020 |
Check your facts and try again: Improving large language models with external knowledge and automated feedback B Peng, M Galley, P He, H Cheng, Y Xie, Y Hu, Q Huang, L Liden, Z Yu, ... arXiv preprint arXiv:2302.12813, 2023 | 128 | 2023 |
A novel click model and its applications to online advertising ZA Zhu, W Chen, T Minka, C Zhu, Z Chen Proceedings of the third ACM international conference on Web search and data …, 2010 | 128 | 2010 |
User-click modeling for understanding and predicting search-behavior Y Zhang, W Chen, D Wang, Q Yang Proceedings of the 17th ACM SIGKDD international conference on Knowledge …, 2011 | 127 | 2011 |
Generation-augmented retrieval for open-domain question answering Y Mao, P He, X Liu, Y Shen, J Gao, J Han, W Chen arXiv preprint arXiv:2009.08553, 2020 | 122 | 2020 |
Few-shot named entity recognition: A comprehensive study J Huang, C Li, K Subudhi, D Jose, S Balakrishnan, W Chen, B Peng, ... arXiv preprint arXiv:2012.14978, 2020 | 121* | 2020 |
X-SQL: reinforce schema representation with context P He, Y Mao, K Chakrabarti, W Chen arXiv preprint arXiv:1908.08113, 2019 | 109* | 2019 |