Robotics
WAM
2025, Experimental, Published
Key innovation
Unified training of world prediction and action generation in a single autoregressive transformer. The model jointly learns physical dynamics (future visual observations) and the robot policy (action sequences), enabling richer embodied representations without separate world-model and policy networks.
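As a rough illustration of the joint objective, the sketch below computes one weighted next-token loss over a sequence that mixes video (future-observation) tokens and action tokens. It is a minimal sketch, not the reference implementation: the function names `cross_entropy` and `joint_loss`, the `is_action` mask, and the two weight parameters are all assumptions introduced here for illustration.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(logits_seq, targets, is_action,
               action_weight=1.0, video_weight=1.0):
    """Weighted sum of mean action loss and mean video loss.

    logits_seq: per-position logits (list of lists), one shared vocabulary
    targets:    target token id per position
    is_action:  True where the target is an action token (hypothetical mask)
    """
    action_losses, video_losses = [], []
    for logits, tgt, act in zip(logits_seq, targets, is_action):
        bucket = action_losses if act else video_losses
        bucket.append(cross_entropy(logits, tgt))

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    l_action, l_video = mean(action_losses), mean(video_losses)
    total = action_weight * l_action + video_weight * l_video
    return total, l_action, l_video


# Toy usage: three positions, vocabulary of four tokens.
logits = [[0.1, 0.2, 0.3, 0.4]] * 3
total, l_action, l_video = joint_loss(
    logits, targets=[3, 0, 1], is_action=[True, False, False],
    action_weight=2.0, video_weight=0.5,
)
```

Averaging each token type separately before weighting keeps the far more numerous video tokens from dominating the total by count alone, which is one reason the action-to-video weighting is treated as a sensitive setting.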
Category
Robotics
Abstraction level
Pattern
Components
Visual tokenizer
Action head
Future-frame decoder
Language conditioning
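One way the four components could feed a single autoregressive sequence is sketched below. The segment order (language, observation, action, future frame), the `Segment` type, and the choice to supervise only action and future-frame tokens are assumptions made for illustration. The source names the components but does not specify their layout.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str     # "language", "obs", "action", or "future" (hypothetical labels)
    tokens: list  # discrete token ids produced by the matching component


def build_sequence(segments):
    """Flatten segments into one token stream plus a per-token kind label."""
    ids, kinds = [], []
    for seg in segments:
        ids.extend(seg.tokens)
        kinds.extend([seg.kind] * len(seg.tokens))
    return ids, kinds


def loss_mask(kinds, supervised=("action", "future")):
    """True at positions whose tokens contribute to the training loss."""
    return [k in supervised for k in kinds]


# Toy usage with made-up token ids.
ids, kinds = build_sequence([
    Segment("language", [1, 2]),     # language conditioning
    Segment("obs", [10, 11, 12]),    # visual tokenizer output
    Segment("action", [100]),        # action head target
    Segment("future", [20, 21]),     # future-frame decoder target
])
mask = loss_mask(kinds)
```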
Implementation
Reference implementations
Implementation pitfalls
Collapse onto the easier task (Critical)
Poor action tokenization (High)
High computational cost of training (High)
Sim-to-real gap in rollouts (Medium)
Evolution
Technical details
Hyperparameters (configurable axes)
Action tokenization scheme (High)
Future prediction horizon (High)
Action vs video loss weighting (Critical)
Decoder architecture (Medium)
Pretraining data mix (High)
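To make the action tokenization axis concrete, here is a minimal sketch of one common scheme, uniform per-dimension binning of continuous actions. It is shown as an example of the design space, not as the scheme any particular WAM uses. The range `[-1, 1]`, the 256 bins, and the function names are assumptions.

```python
def tokenize_action(action, low=-1.0, high=1.0, n_bins=256):
    """Map each continuous action dimension to a bin index in [0, n_bins)."""
    tokens = []
    for a in action:
        a = min(max(a, low), high)                    # clip to the valid range
        idx = int((a - low) / (high - low) * n_bins)  # uniform bin index
        tokens.append(min(idx, n_bins - 1))           # a == high lands in the last bin
    return tokens


def detokenize_action(tokens, low=-1.0, high=1.0, n_bins=256):
    """Map bin indices back to bin centers."""
    width = (high - low) / n_bins
    return [low + (t + 0.5) * width for t in tokens]


# Toy usage: a 3-DoF action; round-trip error is at most half a bin width.
tokens = tokenize_action([-1.0, 0.0, 1.0])
recovered = detokenize_action(tokens)
```

The bin count trades reconstruction precision against vocabulary size: fewer bins shorten the vocabulary but coarsen control, which is one way action tokenization can go wrong.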
Execution paradigm
Primary mode
dense
Activation pattern
all_paths_active
Parallelism
Parallelism level
partially_parallel
Scope
training, across_tokens
Hardware requirements
Primary
Good fit