|RoBERTa: A robustly optimized BERT pretraining approach|
Y Liu, M Ott, N Goyal, J Du, M Joshi, D Chen, O Levy, M Lewis, ...
arXiv preprint arXiv:1907.11692, 2019
|BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension|
M Lewis, Y Liu, N Goyal, M Ghazvininejad, A Mohamed, O Levy, ...
arXiv preprint arXiv:1910.13461, 2019
|SpanBERT: Improving pre-training by representing and predicting spans|
M Joshi, D Chen, Y Liu, DS Weld, L Zettlemoyer, O Levy
Transactions of the Association for Computational Linguistics 8, 64-77, 2020
|Chromatographic peak alignment using derivative dynamic time warping|
C Bork, K Ng, Y Liu, A Yee, M Pohlscheidt
Biotechnology Progress 29 (2), 394-402, 2013
|Multilingual denoising pre-training for neural machine translation|
Y Liu, J Gu, N Goyal, X Li, S Edunov, M Ghazvininejad, M Lewis, ...
arXiv preprint arXiv:2001.08210, 2020
|Recipes for building an open-domain chatbot|
S Roller, E Dinan, N Goyal, D Ju, M Williamson, Y Liu, J Xu, M Ott, ...
arXiv preprint arXiv:2004.13637, 2020
|Mask-predict: Parallel decoding of conditional masked language models|
M Ghazvininejad, O Levy, Y Liu, L Zettlemoyer
arXiv preprint arXiv:1904.09324, 2019
|Cloze-driven pretraining of self-attention networks|
A Baevski, S Edunov, Y Liu, L Zettlemoyer, M Auli
arXiv preprint arXiv:1903.07785, 2019
|Hierarchical learning for generation with long source sequences|
T Rohde, X Wu, Y Liu
arXiv preprint arXiv:2104.07545, 2021