Xize Cheng(成曦泽)
Verified email at zju.edu.cn
Title
Cited by
Year
Connecting multi-modal contrastive representations
Z Wang, Y Zhao, H Huang, J Liu, A Yin, L Tang, L Li, Y Wang, Z Zhang, ...
Advances in Neural Information Processing Systems 36, 22099-22114, 2023
Cited by: 32, Year: 2023
Chat-3d v2: Bridging 3d scene and large language models with object identifiers
H Huang, Z Wang, R Huang, L Liu, X Cheng, Y Zhao, T Jin, Z Zhao
arXiv preprint arXiv:2312.08168, 2023
Cited by: 25, Year: 2023
Mixspeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition
X Cheng, T Jin, R Huang, L Li, W Lin, Z Wang, Y Wang, H Liu, A Yin, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 23, Year: 2023
Wavtokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling
S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo, Q Yang, X Cheng, Z Wang, ...
arXiv preprint arXiv:2408.16532, 2024
Cited by: 22, Year: 2024
3drp-net: 3d relative position-aware network for 3d visual grounding
Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao
arXiv preprint arXiv:2307.13363, 2023
Cited by: 16, Year: 2023
Opensr: Open-modality speech recognition via maintaining multi-modality alignment
X Cheng, T Jin, L Li, W Lin, X Duan, Z Zhao
arXiv preprint arXiv:2306.06410, 2023
Cited by: 16, Year: 2023
Distilling coarse-to-fine semantic matching knowledge for weakly supervised 3d visual grounding
Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 16, Year: 2023
Av-transpeech: Audio-visual robust speech-to-speech translation
R Huang, H Liu, X Cheng, Y Ren, L Li, Z Ye, J He, L Zhang, J Liu, X Yin, ...
arXiv preprint arXiv:2305.15403, 2023
Cited by: 14, Year: 2023
Chat-scene: Bridging 3d scene and large language models with object identifiers
H Huang, Y Chen, Z Wang, R Huang, R Xu, T Wang, L Liu, X Cheng, ...
The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
Cited by: 12, Year: 2024
TAVT: Towards Transferable Audio-Visual Text Generation
W Lin, T Jin, W Pan, L Li, X Cheng, Y Wang, Z Zhao
Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023
Cited by: 12, Year: 2023
Exploring group video captioning with efficient relational approximation
W Lin, T Jin, Y Wang, W Pan, L Li, X Cheng, Z Zhao
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 10, Year: 2023
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
H Liu, R Huang, Y Liu, H Cao, J Wang, X Cheng, S Zheng, Z Zhao
arXiv preprint arXiv:2406.00356, 2024
Cited by: 9, Year: 2024
Rethinking Missing Modality Learning from a Decoding Perspective
T Jin, X Cheng, L Li, W Lin, Y Wang, Z Zhao
Proceedings of the 31st ACM International Conference on Multimedia, 4431-4439, 2023
Cited by: 8, Year: 2023
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion
Z Wang, Z Zhang, X Cheng, R Huang, L Liu, Z Ye, H Huang, Y Zhao, T Jin, ...
Forty-first International Conference on Machine Learning
Cited by: 8
Controlspeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec
S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen, Z Jiang, H Huang, ...
arXiv preprint arXiv:2406.01205, 2024
Cited by: 7, Year: 2024
Weakly-supervised spoken video grounding via semantic interaction learning
Y Wang, W Lin, S Zhang, T Jin, L Li, X Cheng, Z Zhao
Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023
Cited by: 7, Year: 2023
Diffusion denoising process for perceptron bias in out-of-distribution detection
L Liu, Y Ren, X Cheng, R Huang, C Li, Z Zhao
arXiv preprint arXiv:2211.11255, 2022
Cited by: 7, Year: 2022
Wavchat: A survey of spoken dialogue models
S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang, Z Jiang, L Zhou, S Liu, ...
arXiv preprint arXiv:2411.13577, 2024
Cited by: 6, Year: 2024
Omnibind: Large-scale omni multimodal representation via binding spaces
Z Wang, Z Zhang, H Zhang, L Liu, R Huang, X Cheng, H Zhao, Z Zhao
arXiv preprint arXiv:2407.11895, 2024
Cited by: 6, Year: 2024
Molecule-Space: Free Lunch in Unified Multimodal Space via Knowledge Fusion
Z Wang, Z Zhang, X Cheng, R Huang, L Liu, Z Ye, H Huang, Y Zhao, T Jin, ...
arXiv preprint arXiv:2405.04883, 2024
Cited by: 6, Year: 2024
Articles 1–20