Mingkang Dong | 董明康

Hi there! I am Mingkang Dong, an undergraduate student majoring in Software Engineering at Universiti Malaya. My current research and engineering interests lie at the intersection of Vision-Language Models (VLMs), Large Language Models (LLMs), and Embodied AI. Recently, I have been working on Vision-Language-Action (VLA) systems for robotic understanding and multimodal reasoning, as well as medical VLMs that enable visual-textual understanding in radiology and clinical imaging.

I have hands-on experience in fine-tuning and evaluating large multimodal models. I am currently a research assistant at Shanghai Jiao Tong University, where I focus on integrating multimodal representation learning with embodied intelligence and interactive perception.

Understanding how intelligence emerges from perception, language, and action, and turning that understanding into working systems, is the challenge that keeps me excited every single day.

I am currently looking for an MPhil position or a funded RA position, available starting March 2027. On-site roles are preferred.

Preprint / Publications

Projects

Evo-1 Embodied Deployment: Full-Stack VLA Implementation on the SO101 Robot Arm

During my internship at Shanghai Jiao Tong University, I led the deployment of our team's lightweight Evo-1 VLA model. Through optimization and adaptation for the SO101 robotic arm, I achieved strong performance on complex, real-world tasks. Key achievements include picking up trash, precision placement of tennis balls, placing rectangular objects into a box, and fine-grained grasping of delicate items such as snacks.