Haiyang Xu
Alibaba Group, DIDI AI LABS, SEU
Verified email at seu.edu.cn
Title
Cited by
Year
mPLUG-Owl: Modularization empowers large language models with multimodality
Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang, A Hu, P Shi, Y Shi, C Li, ...
arXiv preprint arXiv:2304.14178, 2023
Cited by: 631 · Year: 2023
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration
Q Ye, H Xu, J Ye, M Yan, H Liu, Q Qian, J Zhang, F Huang, J Zhou
CVPR2024 Highlight, 2023
Cited by: 184 · Year: 2023
Learning alignment for multimodal emotion recognition from speech
H Xu, H Zhang, K Han, Y Wang, Y Peng, X Li
InterSpeech2019, 2019
Cited by: 171 · Year: 2019
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
C Li, H Xu, J Tian, W Wang, M Yan, ...
EMNLP2022, 2022
Cited by: 134* · Year: 2022
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
H Xu, M Yan, C Li, B Bi, S Huang, W Xiao, F Huang
ACL2021 Oral, 2021
Cited by: 112 · Year: 2021
mPLUG-2: A modularized multi-modal foundation model across text, image and video
H Xu, Q Ye, M Yan, Y Shi, J Ye, Y Xu, C Li
ICML2023, 2023
Cited by: 106 · Year: 2023
Neural Topic Modeling with Bidirectional Adversarial Training
R Wang, X Hu, D Zhou, Y He, Y Xiong, C Ye, H Xu
ACL2020, 2020
Cited by: 90 · Year: 2020
mPLUG-DocOwl: Modularized multimodal large language model for document understanding
J Ye, A Hu, H Xu, Q Ye, M Yan, Y Dan, C Zhao, G Xu, C Li, J Tian, Q Qi, ...
arXiv preprint arXiv:2307.02499, 2023
Cited by: 78 · Year: 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
J Ye, A Hu, H Xu, Q Ye, M Yan, G Xu, C Li, J Tian, Q Qian, J Zhang, Q Jin, ...
EMNLP2023, 2023
Cited by: 71 · Year: 2023
Evaluation and analysis of hallucination in large vision-language models
J Wang, Y Zhou, G Xu, P Shi, C Zhao, H Xu, Q Ye, M Yan, J Zhang, J Zhu, ...
arXiv preprint arXiv:2308.15126, 2023
Cited by: 71 · Year: 2023
HiTeA: Hierarchical temporal-aware video-language pre-training
Q Ye, G Xu, M Yan, H Xu, Q Qian, J Zhang, F Huang
ICCV2023, 2022
Cited by: 58 · Year: 2022
An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation
J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia, H Xu, M Yan, J Zhang, ...
arXiv preprint arXiv:2311.07397, 2023
Cited by: 48 · Year: 2023
mPLUG: Effective and efficient vision-language learning by cross-modal skip-connections
C Li, H Xu, J Tian, W Wang, M Yan
EMNLP2022 1 (2), 2022
Cited by: 44* · Year: 2022
mPLUG-DocOwl 1.5: Unified structure learning for OCR-free document understanding
A Hu, H Xu, J Ye, M Yan, L Zhang, B Zhang, C Li, J Zhang, Q Jin, F Huang, ...
EMNLP2024, 2024
Cited by: 35 · Year: 2024
Mobile-Agent: Autonomous multi-modal mobile device agent with visual perception
J Wang, H Xu, J Ye, M Yan, W Shen, J Zhang, F Huang, J Sang
ICLR2024 Workshop on Large Language Model (LLM) Agents, 2024
Cited by: 32 · Year: 2024
Hallucination augmented contrastive learning for multimodal large language model
C Jiang, H Xu, M Dong, J Chen, W Ye, M Yan, Q Ye, J Zhang, F Huang, ...
CVPR2024, 2023
Cited by: 32 · Year: 2023
An unsupervised Bayesian modelling approach for storyline detection on news articles
D Zhou, H Xu, Y He
EMNLP2015, 1943-1948, 2015
Cited by: 30 · Year: 2015
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Y Shi, X Yang, H Xu, C Yuan, B Li, W Hu, ZJ Zha
CVPR2022, 2021
Cited by: 29 · Year: 2021
SemVLP: Vision-language pre-training by aligning semantics at multiple levels
C Li, M Yan, H Xu, F Luo, W Wang
arXiv preprint arXiv:2103.07829, 2021
Cited by: 26 · Year: 2021
Unsupervised Storyline Extraction from News Articles
D Zhou, H Xu, XY Dai, Y He
IJCAI2016, 3014-3021, 2016
Cited by: 25 · Year: 2016