ACG Creation Made Easy for Everyone.
I am a Ph.D. student at Show Lab, National University of Singapore, advised by Prof. Mike Zheng Shou.
Previously, I received my B.Eng. in Artificial Intelligence from the School of Intelligent Systems Engineering, Sun Yat-sen University, where I worked in HCPLab advised by Prof. Xiaodan Liang (梁小丹) and co-supervised by Prof. Shengcai Liao.
Beyond research, I'm passionate about gaming and collect credit cards as a hobby. I believe in fostering open communication within the research community. Whether you'd like to chat about academic pursuits, share experiences, or explore collaborative opportunities, I'm always happy to connect.
My research focuses on Generative Models for Vision, particularly Video World Models for video understanding and generation. Representative papers are highlighted. * denotes equal contribution.
FramePrompt: In-context Controllable Animation with Zero Structural Changes
Image-to-3D: Let AI do the heavy lifting so 3D professionals can do the storytelling
ECCV, 2024
A new large-scale benchmark (AbHuman) for anatomical anomalies in humans, and a plug-and-play method (HumanRefiner) for refining abnormal human generations with pose-reversible guidance.
LREC-Coling, 2024
ChartThinker leverages chain-of-thought reasoning and context retrieval to generate accurate and coherent chart summaries, outperforming previous methods on a diverse benchmark.
An open-source toolkit supporting pretraining, finetuning, and deployment of large language and multimodal models, making LLM development more accessible and efficient.
TNNLS, 2023
RealignDiff introduces a two-stage semantic re-alignment strategy to significantly improve the consistency between generated images and text prompts in diffusion models.
I feel incredibly fortunate to have collaborated with so many remarkable individuals, who have generously offered me their mentorship.