Quick Facts
- Category: Science & Space
- Published: 2026-05-17 19:25:21
Introduction: The Promise and Challenge of World Models
World models, learned simulations that predict future states from current observations and actions, have advanced dramatically. They now forecast long sequences in high-dimensional visual spaces and generalize across tasks, resembling universal simulators rather than task-specific tools. Yet harnessing these models for control, especially over extended horizons, remains difficult: gradient-based optimization becomes ill-conditioned, objectives that require non-greedy behavior create stubborn local minima, and high-dimensional latents introduce subtle failure modes. To address these issues, we developed GRASP (Gradient-based Adaptive Stochastic Planner), a planner that makes long-horizon decision-making practical through three key innovations.

The Growing Power of World Models
Recent world models can predict hundreds of steps into the future, learning dynamics from raw pixels or latent representations. As they scale, they exhibit emergent capabilities—generalizing to unseen tasks and environments that were previously impossible to model. This progress suggests world models could serve as cheap, reusable simulators for planning, reinforcement learning, and robotics. But a powerful predictor doesn't automatically translate to effective control; the optimization machinery needs to be equally robust.
The Fragility of Long-Horizon Planning
Planning with learned dynamics is intrinsically harder than planning with ground-truth simulators. Gradients from the world model can be poorly conditioned, especially over many time steps. The optimization landscape becomes riddled with shallow valleys and sharp ridges, making gradient-based methods stall or diverge. Furthermore, when the world model is a deep neural network, often with a visual encoder, the gradient of a predicted state with respect to the model's state input (the state-input gradient) becomes brittle. Backpropagating through high-dimensional vision layers can turn small changes in the plan into erratic updates, breaking the planning process.
Long horizons amplify these problems: errors accumulate, and the planner must navigate increasingly non-convex terrain. Greedy or short-horizon methods fail to capture delayed rewards, while optimizing full trajectories at once overwhelms naive optimizers.
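To make the conditioning problem concrete, here is a minimal sketch under a strong simplifying assumption: a toy linear model s_{t+1} = A s_t + B a_t standing in for a learned world model (real world models are nonlinear networks). In this toy setting, the sensitivity of the final state to the first action is A^(T-1) B, so its magnitude shrinks or grows geometrically with the horizon T. The function name and matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def first_action_sensitivity(A, B, horizon):
    """Jacobian d s_T / d a_0 for the toy dynamics s_{t+1} = A s_t + B a_t."""
    jac = B.copy()                 # d s_1 / d a_0
    for _ in range(horizon - 1):   # each later step multiplies by A
        jac = A @ jac
    return jac

B = np.eye(2)
for horizon in (10, 50, 100):
    # Spectral norm of the sensitivity for a contracting and an expanding model
    shrink = np.linalg.norm(first_action_sensitivity(0.9 * np.eye(2), B, horizon), 2)
    grow = np.linalg.norm(first_action_sensitivity(1.1 * np.eye(2), B, horizon), 2)
    print(f"T={horizon:3d}  contracting: {shrink:.2e}  expanding: {grow:.2e}")
```

Even with dynamics only 10% away from the identity, the gradient signal reaching the first action spans many orders of magnitude by 100 steps, which is one way vanishing or exploding updates can arise in sequential planning.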
Introducing GRASP: Three Key Innovations
GRASP tackles these challenges head-on with three design principles that together stabilize and accelerate long-horizon planning in learned world models.
1. Lifting Trajectories into Virtual States
Conventional planning optimizes actions directly over a sequence of time steps, which couples the optimization across the horizon—updating early actions requires recomputing later dynamics. GRASP instead introduces virtual states: intermediate optimization variables at each time step that are constrained to match the world model's predicted states. This lifting decouples the temporal chain, allowing parallel optimization across all time steps. The planner can simultaneously adjust multiple points in the trajectory, dramatically speeding convergence.
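The contrast between sequential rollouts and lifted optimization can be sketched as follows. This is an illustration under stated assumptions, not the GRASP implementation: the world model is replaced by a toy linear map, the matching constraint is relaxed into a quadratic penalty with a hypothetical weight `rho`, and all function names are invented for this example. The point to notice is that `lifted_loss_and_grads` evaluates every transition in one batched call, whereas `rollout` must step through time.

```python
import numpy as np

def dynamics(states, actions, A, B):
    """Toy stand-in for a world model: one batched call handles every step."""
    return states @ A.T + actions @ B.T

def rollout(s0, actions, A, B):
    """Conventional shooting: each state depends on all earlier actions."""
    states, s = [], s0
    for a in actions:
        s = dynamics(s[None, :], a[None, :], A, B)[0]
        states.append(s)
    return np.array(states)

def lifted_loss_and_grads(virtual, actions, s0, goal, A, B, rho=1.0):
    """Penalty form of the lifted problem; virtual[t] plays the role of s_{t+1}."""
    inputs = np.vstack([s0[None, :], virtual[:-1]])        # s_0 .. s_{T-1}
    residual = virtual - dynamics(inputs, actions, A, B)   # all T steps at once
    goal_err = virtual[-1] - goal
    loss = rho * np.sum(residual ** 2) + np.sum(goal_err ** 2)
    g_virtual = 2.0 * rho * residual                       # state as a prediction target
    g_virtual[:-1] -= 2.0 * rho * residual[1:] @ A         # state as the next step's input
    g_virtual[-1] += 2.0 * goal_err
    g_actions = -2.0 * rho * residual @ B
    return loss, g_virtual, g_actions

def plan(s0, goal, A, B, horizon=10, steps=500, lr=0.05):
    """Plain gradient descent on virtual states and actions together."""
    virtual = np.zeros((horizon, s0.size))
    actions = np.zeros((horizon, B.shape[1]))
    history = []
    for _ in range(steps):
        loss, g_v, g_a = lifted_loss_and_grads(virtual, actions, s0, goal, A, B)
        history.append(loss)
        virtual -= lr * g_v      # every time step is updated simultaneously
        actions -= lr * g_a
    return virtual, actions, history

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # hypothetical position/velocity system
B = np.array([[0.0], [0.1]])
virtual, actions, history = plan(np.zeros(2), np.array([1.0, 0.0]), A, B)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

When the virtual states exactly equal a rollout of the actions, the penalty term vanishes and the lifted loss reduces to the ordinary terminal cost, so the two formulations agree on feasible trajectories.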
2. Injecting Stochasticity for Exploration
Gradient-based planners often converge to poor local minima because they lack exploration. GRASP injects controlled stochasticity directly into the state iterates during optimization. By adding noise to the virtual states, the planner can escape shallow basins and discover better trajectories. This noise is carefully scheduled to reduce as optimization progresses, balancing exploration and exploitation. The result is a more robust search over the action space, especially in non-convex landscapes.
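A minimal sketch of such an update is shown below. It assumes Gaussian noise and a linear annealing schedule, neither of which is specified above; `noise_scale`, `noisy_update`, and the value of `sigma0` are hypothetical. The gradients in the usage loop come from a placeholder quadratic, so the loop only demonstrates the mechanics (noise enters the state iterates, not the actions, and fades over time) rather than escape from a local minimum.

```python
import numpy as np

def noise_scale(step, total_steps, sigma0=0.1):
    """Decay linearly from sigma0 at step 0 to zero at step == total_steps."""
    return sigma0 * max(0.0, 1.0 - step / total_steps)

def noisy_update(virtual, actions, g_virtual, g_actions, lr, sigma, rng):
    """Gradient step on both variable groups; noise only on the state iterates."""
    noise = sigma * rng.standard_normal(virtual.shape)
    return virtual - lr * g_virtual + noise, actions - lr * g_actions

# Toy usage with placeholder gradients; a real planner would obtain
# g_v and g_a from its world model and cost function.
rng = np.random.default_rng(0)
virtual = np.zeros((10, 2))
actions = np.zeros((10, 1))
total_steps = 200
for step in range(total_steps):
    g_v = 2.0 * (virtual - 1.0)   # gradient of a simple quadratic bowl
    g_a = 2.0 * actions
    sigma = noise_scale(step, total_steps)
    virtual, actions = noisy_update(virtual, actions, g_v, g_a, 0.05, sigma, rng)
```

Setting `sigma` to zero recovers plain gradient descent, which makes it easy to compare the two behaviors on the same problem.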

3. Reshaping Gradients to Bypass Vision Models
The most delicate part of planning with vision-based world models is the gradient flow through the visual encoder. GRASP reshapes gradients so that actions receive clean, well-conditioned signals without passing through high-dimensional image features. Instead of relying on the brittle state-input gradient, it computes an alternative gradient that directly ties action changes to future state changes, bypassing the vision model's internal complexity. This prevents the planner from being misled by irrelevant visual details and improves numerical stability.
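The exact form of the reshaped gradient is not spelled out here, so the following is only one plausible reading, labeled as an assumption: in a lifted objective, treat the state fed into the dynamics as a constant (a stop-gradient), so that no gradient flows through the model's state input. The sketch uses a toy linear model f(s, a) = A s + B a, where the matrix A stands in for the Jacobian that, in a vision-based model, would run through the encoder. All names are hypothetical.

```python
import numpy as np

def lifted_grads(virtual, actions, s0, goal, A, B, use_state_input_grad):
    """Gradients of sum_t ||s_{t+1} - f(s_t, a_t)||^2 + ||s_T - goal||^2
    for the toy model f(s, a) = A s + B a."""
    inputs = np.vstack([s0[None, :], virtual[:-1]])
    residual = virtual - (inputs @ A.T + actions @ B.T)
    g_actions = -2.0 * residual @ B          # uses only df/da
    g_virtual = 2.0 * residual               # state as a prediction target
    g_virtual[-1] += 2.0 * (virtual[-1] - goal)
    if use_state_input_grad:
        # df/ds term: in a vision-based model this path runs through the encoder
        g_virtual[:-1] -= 2.0 * residual[1:] @ A
    return g_virtual, g_actions

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
virtual = rng.standard_normal((6, 2))
actions = rng.standard_normal((6, 1))
s0, goal = np.zeros(2), np.array([1.0, 0.0])

full_v, full_a = lifted_grads(virtual, actions, s0, goal, A, B, True)
cut_v, cut_a = lifted_grads(virtual, actions, s0, goal, A, B, False)
print("action gradients identical:", np.array_equal(full_a, cut_a))
print("max change in state gradients:", np.abs(full_v - cut_v).max())
```

Under this assumption, the action gradient depends only on how each action affects the next predicted state, while the term that would pass through the state input is simply dropped. Whether GRASP adds further correction terms is not stated in this article.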
How GRASP Works in Practice
In experiments on continuous control tasks with visual observations, GRASP consistently outperformed prior gradient-based planners. It achieved lower cumulative cost over long horizons (up to 100 steps) and required fewer gradient steps. The parallelization from virtual states enabled efficient GPU utilization, and the stochasticity prevented premature convergence. The gradient reshaping proved crucial in tasks where visual inputs contained distracting information, such as texture variations or background changes—common in real-world scenarios.
Conclusion and Future Directions
GRASP demonstrates that gradient-based planning with large world models can be made reliable, even for lengthy trajectories. Its three innovations—virtual state lifting, stochastic exploration, and gradient reshaping—address the core weaknesses of existing approaches. As world models continue to scale, robust planners like GRASP will be essential to unlock their full potential for autonomous decision-making. Future work will extend GRASP to discrete action spaces, incorporate learned uncertainty estimates, and integrate with model-based reinforcement learning algorithms. The code and experiments are available as part of our broader research effort, supporting reproducible science.
This work was done with Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar.