The Intelligence Bottleneck: X Square Robot's Push for Embodied AI
The robotics industry is currently focused on spectacle, showcasing humanoids that can perform backflips or dance on stage. However, the real difficulty lies in teaching machines to function within the unpredictable environments of human life and work. X Square Robot is addressing this gap by targeting the intelligence deficit rather than the hardware.
According to CEO Wang Qian, the hardware foundations—including humanoid locomotion, dexterous hands, and force-control systems—are largely in place. The bottleneck is the brain. To bridge this gap, X Square Robot has open-sourced three specific technologies over the past several weeks: Wall-OSS-0.5 (a Vision-Language-Action model), WALL-WM (a World Action Model for understanding physical events), and XRZero-G0 (a framework for robot-free data collection designed to reduce training costs).
With Wall-OSS-0.5, the focus shifts from task-specific fine-tuning to how far pretraining alone can go. The company deployed the pretrained model directly on physical robots and tested it across 17 real-world tasks. It reported strong zero-shot performance in areas such as object sorting, ring stacking, and deformable-object manipulation.
This progress relies on a "gradient-bridged" training framework. Wall-OSS-0.5 does not separate perception from control; instead, it converts robot actions into action tokens that are learned alongside language and visual representations. This unified approach allows perception, language understanding, and action generation to evolve together. The company found that this method of action training improves manipulation ability and enhances visual grounding performance.
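To make the idea of action tokens concrete, here is a minimal sketch of one common way to turn continuous robot actions into discrete tokens that share a vocabulary with language. It is an illustration under stated assumptions, not X Square Robot's published method: the uniform binning scheme, the vocabulary size, the bin count, the normalized action range, and every function name below are hypothetical.

```python
# Illustrative sketch only. Uniform binning is an assumed scheme, not the
# documented Wall-OSS-0.5 tokenizer. All constants and names are hypothetical.
TEXT_VOCAB_SIZE = 32_000   # assumed size of the language vocabulary
NUM_BINS = 256             # assumed number of bins per action dimension
LOW, HIGH = -1.0, 1.0      # assumed normalized range for action values


def action_to_tokens(action):
    """Map each continuous action value to a discrete token ID."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)           # keep value in range
        fraction = (clipped - LOW) / (HIGH - LOW)      # scale to [0, 1]
        bin_index = min(int(fraction * NUM_BINS), NUM_BINS - 1)
        # Offset past the text vocabulary so action IDs never collide with it.
        tokens.append(TEXT_VOCAB_SIZE + bin_index)
    return tokens


def tokens_to_action(tokens):
    """Recover approximate action values using the center of each bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (t - TEXT_VOCAB_SIZE + 0.5) * width for t in tokens]


def build_sequence(image_tokens, text_tokens, action):
    """Concatenate visual, language, and action tokens into one sequence.

    image_tokens and text_tokens are placeholder ID lists; a real model
    would produce them with its own vision encoder and text tokenizer.
    """
    return list(image_tokens) + list(text_tokens) + action_to_tokens(action)


if __name__ == "__main__":
    gripper_action = [-1.0, 0.0, 1.0]
    print(action_to_tokens(gripper_action))
    print(build_sequence([7, 8], [101, 102], gripper_action))
```

Because the action IDs sit in the same sequence as the visual and language tokens, a single model can be trained on all three with one objective, which is the general property the paragraph above describes. The cost of binning is precision: each recovered value can differ from the original by up to half a bin width.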
The strategic implication is clear: physical interaction can strengthen a model's fundamental understanding of the world. While many VLA systems can repeat learned trajectories, they often lack an understanding of physical cause and effect. X Square Robot’s move to open-source these models suggests a shift toward building foundational world models that prioritize physical reasoning over mere imitation.
The open question is whether the industry can achieve true autonomy without first cutting the cost of data collection, the problem that frameworks like XRZero-G0 are designed to address.