Mingkang Dong | 董明康

Hi there! I am Mingkang Dong, an undergraduate student majoring in Software Engineering at Universiti Malaya. My research and engineering interests lie at the intersection of Vision-Language Models (VLMs), Large Language Models (LLMs), and Embodied AI. Recently, I have been working on Vision-Language-Action (VLA) systems for robotic understanding and multimodal reasoning, as well as medical VLMs that enable visual-textual understanding in radiology and clinical imaging.

I have hands-on experience fine-tuning and evaluating large multimodal models. I am currently a research assistant at Shanghai Jiao Tong University, where I focus on integrating multimodal representation learning with embodied intelligence and interactive perception.

Understanding how intelligence emerges from perception, language, and action — and turning that understanding into working systems — is the challenge that keeps me excited every single day.

I am currently looking for an MPhil position or a funded RA position, available starting March 2027. On-site positions are preferred.

Preprint / Publications

Projects

Evo-1 Embodied Deployment: Full-Stack VLA Implementation on the SO101 Robot Arm

During my internship at Shanghai Jiao Tong University, I led the deployment of our team's lightweight Evo-1 VLA model. Through optimization and adaptation for the SO101 robotic arm, the system achieved strong performance on complex, real-world tasks. Key achievements include picking up trash, precision placement of tennis balls and golf balls, placing rectangular objects into a box, fine grasping of delicate items such as snacks, and grasping shuttlecocks.