Non-Autoregressive Coarse-to-Fine Video Captioning B Yang, Y Zou, F Liu, C Zhang In Proceedings of AAAI 2021, 2021 | 93 | 2021 |
A medical multimodal large language model for future pandemics F Liu, T Zhu, X Wu, B Yang, C You, C Wang, L Lu, Z Liu, Y Zheng, X Sun, ... NPJ Digital Medicine 6 (1), 226, 2023 | 46 | 2023 |
O2NA: An object-oriented non-autoregressive approach for controllable video captioning F Liu, X Ren, X Wu, B Yang, S Ge, Y Zou, X Sun In Findings of ACL 2021, 2021 | 42 | 2021 |
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation Y Li, B Yang, X Cheng, Z Zhu, H Li, Y Zou In Proceedings of ICCV 2023, 2023 | 32 | 2023 |
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter B Yang, T Zhang, Y Zou In Proceedings of PRCV 2022 (Oral), 2022 | 29* | 2022 |
Retrieve, reason, and refine: Generating accurate and faithful patient instructions F Liu*, B Yang*, C You, X Wu, S Ge, Z Liu, X Sun, Y Yang, D Clifton In Proceedings of NeurIPS 2022, 2022 | 17 | 2022 |
Concept-aware video captioning: Describing videos with effective prior information B Yang, M Cao, Y Zou IEEE Transactions on Image Processing, 2023 | 14 | 2023 |
PCLmed at ImageCLEFmedical 2023: Customizing General-Purpose Foundation Models for Medical Report Generation B Yang, A Raza, Y Zou, T Zhang In Proceedings of CLEF 2023, 2023 | 12* | 2023 |
Adaptive curriculum learning for video captioning S Li, B Yang, Y Zou IEEE Access 10, 31751-31759, 2022 | 12 | 2022 |
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning B Yang*, F Liu*, X Wu, Y Wang, Y Wang, Y Zou In Proceedings of ACL 2023, 2023 | 11 | 2023 |
WorldGPT: a Sora-inspired video AI agent as rich world models from text and image inputs D Yang, L Hu, Y Tian, Z Li, C Kelly, B Yang, C Yang, Y Zou arXiv preprint arXiv:2403.07944, 2024 | 10 | 2024 |
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation B Yang*, F Liu*, Y Zou, X Wu, Y Wang, DA Clifton IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (8), 5712-5724, 2024 | 9 | 2024 |
VisionGPT: Vision-language understanding agent using generalized multimodal framework C Kelly*, L Hu*, B Yang*, Y Tian, D Yang, C Yang, Z Huang, Z Li, J Hu, ... arXiv preprint arXiv:2403.09027, 2024 | 6 | 2024 |
Graph-in-graph network for automatic gene ontology description generation F Liu, B Yang, C You, X Wu, S Ge, A Woicik, S Wang In Proceedings of KDD 2022 (Oral), 2022 | 6 | 2022 |
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels B Yang*, F Liu*, Z Li, Q Yin, C You, B Yin, Y Zou In Findings of ACL 2023, 2023 | 5 | 2023 |
Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning B Yang, Y Dai, X Cheng, Y Li, A Raza, Y Zou In Proceedings of AAAI 2024, 2024 | 4 | 2024 |
Visual oriented encoder: Integrating multimodal and multi-scale contexts for video captioning B Yang, Y Zou In Proceedings of ICPR 2020, 188-195, 2021 | 3 | 2021 |
MAKEN: Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models S Wu, B Yang, Z Ye, H Wang, H Zheng, T Zhang 2024 IEEE International Symposium on Biomedical Imaging (ISBI), 1-5, 2024 | 2 | 2024 |
KC-Prompt: End-To-End Knowledge-Complementary Prompting for Rehearsal-Free Continual Learning Y Li, Y Liu, X Cheng, Z Zhu, HX Li, B Yang, Z Huang ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 2 | 2024 |
Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models S Wu, B Yang, Z Ye, H Wang, H Zheng, T Zhang arXiv preprint arXiv:2312.03970, 2023 | 2 | 2023 |