Kaisi Guan (关开思)


I am a Master's student at the Gaoling School of Artificial Intelligence (GSAI), Renmin University of China (RUC), advised by Prof. Ruihua Song. Prior to this, I received my bachelor’s degree from GSAI in 2024.

My research centers on omni understanding and generation: building multimodal systems that perceive the world across modalities and learn to generate it in turn.

Main Research Interests:

  • Omni Generation: Building controllable, high-fidelity generative models for video and audio, exploring how their modeling paradigms converge toward a unified framework, along with post-training methods for better alignment and controllability.

  • Omni Understanding: Investigating vision–language–audio interplay and building omni-modal models with stronger understanding of video and audio.

I will graduate in 2027 and am seeking job opportunities in video / audio / image generation. Feel free to contact me at guankaisi@ruc.edu.cn.

News

June, 2026 Our work Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction was accepted by ECCV 2026!
June, 2026 Our work ChronusOmni: Improving Time Awareness of Omni Large Language Models was accepted by ECCV 2026!
April, 2026 Our work HuM-Eval: A Coarse-to-Fine Framework for Human-Centric Video Evaluation was accepted by ICME 2026!
June, 2025 Our work ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering was accepted by ICCV 2025!
Sep, 2024 Our work BSharedRAG: Backbone Shared Retrieval-Augmented Generation for the E-commerce Domain was accepted by EMNLP 2024!

Experiences

2026.4 - Present Weixin @ Tencent. Advised by Wenjing Wang. Intern
2025.1 - 2025.10 AIML @ Apple. Advised by Jeff Lai and Kieran Liu. Intern
2023.1 - 2023.7 ModelBest. Intern
2024.9 - Present AIMind Lab @ Gaoling School of Artificial Intelligence. Advised by Ruihua Song. Master's Student
2020.9 - 2024.06 Gaoling School of Artificial Intelligence, RUC. Undergraduate Student

Selected Publications

  1. Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
    Kaisi Guan, Xihua Wang, Zhengfeng Lai, Xin Cheng, Peng Zhang, Xiaojiang Liu, Ruihua Song, Meng Cao
    arXiv preprint arXiv:2510.03117 2026
  2. ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
    Kaisi Guan, Zhengfeng Lai, Yuchong Sun, Peng Zhang, Wei Liu, Kieran Liu, Meng Cao, and Ruihua Song
    International Conference on Computer Vision, ICCV 2025, Oct 2025
  3. BSharedRAG: Backbone Shared Retrieval-Augmented Generation for the E-commerce Domain
    Kaisi Guan, Qian Cao, Yuchong Sun, Xiting Wang, and Ruihua Song
    In Findings of the Association for Computational Linguistics: EMNLP 2024, Nov 2024
  4. SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning
    Xin Cheng, Xihua Wang, Ying Ba, Yuyue Wang, Kaisi Guan, Yinbo Wang, Wenpu Li, Ruihua Song
    arXiv preprint arXiv:2605.12179 2026
  5. HuM-Eval: A Coarse-to-Fine Framework for Human-Centric Video Evaluation
    Bingzi Zhang, Kaisi Guan, Ruihua Song
    IEEE International Conference on Multimedia and Expo (ICME 2026).
  6. VSSFlow: Unifying Video-conditioned Sound and Speech Generation via Joint Learning
    Xin Cheng, Yuyue Wang, Xihua Wang, Yihan Wu, Kaisi Guan, Yijing Chen, Peng Zhang, Xiaojiang Liu, Meng Cao, Ruihua Song
    arXiv preprint arXiv:2509.24773 2026
  7. ChronusOmni: Improving Time Awareness of Omni Large Language Models
    Yijing Chen, Yihan Wu, Kaisi Guan, Yuchen Ren, Yuyue Wang, Ruihua Song, Liyun Ru
    arXiv preprint arXiv:2512.09841 2026