Synthetic-Data-Driven MLLM for 3D Spatial Reasoning

Advisor: Jigang Wu    Creator: 姜荣臻

Vision-language models (VLMs) are advanced AI systems designed to process and understand information from both visual and textual data simultaneously. By integrating deep learning techniques, these models can interpret the content of images in a way that aligns with human understanding of language, making them crucial for tasks that require nuanced interpretation of multimedia content.


Spatial reasoning refers to the capability of vision-language models to interpret and comprehend the spatial relationships among objects in images. For instance, when presented with an image of a soccer field teeming with players during a match, a model equipped with spatial reasoning can provide responses such as “player 7 is closest to player 10” or “the distance from a player to the goal post is 8 meters.” This skill is vital for tasks requiring precise analysis of spatial information, such as robotics.
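
To make the target output concrete, the following minimal sketch shows how such a spatial question-answer exchange might be represented as structured data. The schema, field names, and example values are illustrative assumptions made for this write-up, not a fixed standard.

```python
from dataclasses import dataclass

@dataclass
class SpatialQAPair:
    """One spatial question-answer pair grounded in a single image (illustrative schema)."""
    image_path: str  # source image the question refers to
    question: str    # question about spatial relations among objects
    answer: str      # grounded answer: a qualitative relation or a metric estimate

# Hypothetical examples matching the soccer scene described above.
examples = [
    SpatialQAPair(
        image_path="match_frame_0421.jpg",
        question="Which player is closest to player 10?",
        answer="Player 7 is closest to player 10.",
    ),
    SpatialQAPair(
        image_path="match_frame_0421.jpg",
        question="How far is the nearest player from the goal post?",
        answer="About 8 meters.",
    ),
]

for ex in examples:
    print(f"Q: {ex.question}\nA: {ex.answer}")
```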


However, the spatial reasoning performance of current VLMs remains unsatisfactory. This limitation is due not to the architecture of the VLMs themselves but to the quality and quantity of the training data. The majority of 3D datasets rely heavily on human annotation, which is inefficient and labor-intensive. Additionally, the scale of these datasets remains limited, with over half containing fewer than 100,000 samples. These challenges significantly impede the spatial reasoning capabilities of vision-language models.


What is required is a much larger quantity of training data in the form of visual question-answer (VQA) pairs. Therefore, this project aims to construct a pipeline leveraging the recent framework ‘VQASynth’ to automatically generate a large-scale dataset of spatial VQA pairs, with the goal of enhancing the spatial reasoning capabilities of vision-language models.
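
As a rough illustration of what such a pipeline produces, the sketch below shows only the final templating step: once upstream perception stages (e.g. object detection, segmentation, and metric depth estimation) have attached 3D positions to labeled objects in an image, pairwise spatial relations can be turned into QA pairs automatically. The function names and data layout here are assumptions for illustration and do not reflect VQASynth's actual interface.

```python
import math

def generate_spatial_qa(objects_3d: list[dict]) -> list[dict]:
    """Turn pairwise metric 3D distances between labeled objects into templated QA pairs."""
    qa_pairs = []
    for i, a in enumerate(objects_3d):
        for b in objects_3d[i + 1:]:
            dist = math.dist(a["xyz"], b["xyz"])  # Euclidean distance, assumed in meters
            qa_pairs.append({
                "question": f"How far is the {a['label']} from the {b['label']}?",
                "answer": f"About {dist:.1f} meters.",
            })
    return qa_pairs

# Toy scene with hand-written coordinates, standing in for the output of the
# upstream perception stages (detection + depth estimation).
scene = [
    {"label": "player 7",  "xyz": (2.0, 0.0, 14.0)},
    {"label": "player 10", "xyz": (3.5, 0.0, 15.0)},
    {"label": "goal post", "xyz": (0.0, 0.0, 22.0)},
]

for pair in generate_spatial_qa(scene):
    print(pair["question"], "->", pair["answer"])
```

Applying this kind of templating over large numbers of automatically perceived scenes is what would remove the dependence on manual annotation described above.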