𝗜'𝘃𝗲 𝗵𝗲𝗮𝗿𝗱 𝘁𝗵𝗶𝘀 𝗮 𝗹𝗼𝘁 𝗿𝗲𝗰𝗲𝗻𝘁𝗹𝘆: "𝗪𝗲 𝘁𝗿𝗮𝗶𝗻𝗲𝗱 𝗼𝘂𝗿 𝗿𝗼𝗯𝗼𝘁 𝗼𝗻 𝗼𝗻𝗲 𝗼𝗯𝗷𝗲𝗰𝘁 𝗮𝗻𝗱 𝗶𝘁 𝗴𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘀𝗲𝗱 𝘁𝗼 𝗮 𝗻𝗼𝘃𝗲𝗹 𝗼𝗯𝗷𝗲𝗰𝘁 - 𝘁𝗵𝗲𝘀𝗲 𝗻𝗲𝘄 𝗩𝗟𝗔 𝗺𝗼𝗱𝗲𝗹𝘀 𝗮𝗿𝗲 𝗰𝗿𝗮𝘇𝘆!"
Let's talk about what's actually happening in that "A" (Action) part of your VLA model.
The Vision and Language components? They're incredible. Pre-trained on internet-scale data, they understand objects, spatial relationships, and task instructions better than ever.
But the Action component? That's still learned from scratch on your specific robot demonstrations.
𝗛𝗲𝗿𝗲'𝘀 𝘁𝗵𝗲 𝗿𝗲𝗮𝗹𝗶𝘁𝘆: Your VLA model has internet-scale understanding of what a screwdriver looks like and what "tighten the screw" means. But the actual motor pattern for "rotating the wrist while applying downward pressure"? That comes from your 500 robot demos.
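To make that split concrete, here's a minimal, illustrative sketch (PyTorch-style, not any particular VLA codebase; the module names, sizes, and frozen-encoder stand-ins are all placeholders): the "V" and "L" are reused pre-trained encoders, while the "A" is a randomly initialised head whose weights come entirely from your demonstrations.

```python
import torch
import torch.nn as nn

class ToyVLAPolicy(nn.Module):
    def __init__(self, vl_dim=512, vocab_size=32_000, action_dim=7):
        super().__init__()
        # Stand-ins for the internet-scale pre-trained "V" and "L"
        # (in a real system: e.g. a CLIP-style vision tower and an LLM).
        self.vision_encoder = nn.Linear(3 * 224 * 224, vl_dim)
        self.language_encoder = nn.Embedding(vocab_size, vl_dim)
        for p in self.vision_encoder.parameters():
            p.requires_grad = False  # reused, not re-learned
        for p in self.language_encoder.parameters():
            p.requires_grad = False

        # The "A": a small head with no pre-training at all,
        # fitted only on your robot's demonstrations.
        self.action_head = nn.Sequential(
            nn.Linear(2 * vl_dim, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),  # e.g. a 7-DoF end-effector command
        )

    def forward(self, image, instruction_tokens):
        v = self.vision_encoder(image.flatten(1))                   # (B, vl_dim)
        l = self.language_encoder(instruction_tokens).mean(dim=1)   # (B, vl_dim)
        return self.action_head(torch.cat([v, l], dim=-1))          # (B, action_dim)

# Only the action head receives gradient updates, so the motor behaviour
# can only reflect whatever patterns appear in the demo data.
policy = ToyVLAPolicy()
optimizer = torch.optim.Adam(
    (p for p in policy.parameters() if p.requires_grad), lr=1e-4
)
```

However much the frozen encoders "know" about screwdrivers or bottle caps, the only motor mappings this policy can produce are the ones the demonstration gradients put into that last head.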
𝗪𝗵𝗮𝘁 𝘁𝗵𝗶𝘀 𝗺𝗲𝗮𝗻𝘀 𝗳𝗼𝗿 "𝗴𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘀𝗮𝘁𝗶𝗼𝗻":
• 𝗩𝗶𝘀𝗶𝗼𝗻 𝗴𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘀𝗮𝘁𝗶𝗼𝗻: Recognises novel objects instantly (thanks to pre-training)
• 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗴𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘀𝗮𝘁𝗶𝗼𝗻: Understands new task instructions (thanks to pre-training)
• 𝗔𝗰𝘁𝗶𝗼𝗻 𝗴𝗲𝗻𝗲𝗿𝗮𝗹𝗶𝘀𝗮𝘁𝗶𝗼𝗻: Still limited to motor patterns seen during robot training
Ask that same robot to "unscrew the bottle cap" and it fails because:
• Vision: Recognises the bottle and cap
• Language: Understands "unscrew"
• Action: Never learned the "twist while pulling" motor pattern
𝗧𝗵𝗲 𝗵𝗮𝗿𝗱 𝘁𝗿𝘂𝘁𝗵 𝗮𝗯𝗼𝘂𝘁 𝗩𝗟𝗔 𝗺𝗼𝗱𝗲𝗹𝘀:
The "VL" gives you incredible zero-shot understanding. The "A" still requires task-specific demonstrations.
We've cracked the perception and reasoning problem. We haven't cracked the motor generalisation problem.