Afford-VLA: Grounding End-to-End Vision-Language-Action Models via Internalized Geometric Affordance Alignment
Traditional VLA models fail at fine manipulation because they encode spatial information only implicitly, in 2D. This technical review evaluates Afford-VLA, a framework that projects action spaces onto explicit 3D contact probability fields, while...
Robotopian Team