Let me paint you a picture.
Say you want a humanoid robot to cook you a nice dinner while you sit on the couch watching Netflix.
How would that actually work on a technical level, given that a humanoid needs multiple brains to pull it off?
The robot's intelligence isn't monolithic. It's a team of AI modules combining slow deliberation with fast reaction (a System 2 + System 1 design).
Its vision-language-action (VLA) model splits cognition into a reasoning module and a reactive control policy.
Because the robot runs on a multi-brain cognitive architecture, it spins up a dedicated "chef" Operator to handle your request: scanning the kitchen with its cameras, looking up a recipe, then directing its limbs to start chopping.
These brains break down into the following Operators.
Brain #1:
To cook a good dinner you need an executive planner. It interprets your command ("prepare dinner") to determine the goal (make pasta). Using natural language understanding, it decides which subtasks are needed (find ingredients, cook the pasta, set the table, etc.) and which of the other brains should handle each one.
It orchestrates the multi-agent system, activating the specialist modules for vision, knowledge, and motion. This deliberative brain (System 2) makes the high-level decisions, sets the approach, and assigns responsibilities before any physical action begins.
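To make the planner's job concrete, here's a minimal sketch of that decomposition step. Everything here (the task table, the module names, `plan_dinner`) is invented for illustration; it is not Codec's actual API.

```python
# Hypothetical executive planner (Brain #1): map a high-level command
# to subtasks and assign each one to a specialist module.
TASK_LIBRARY = {
    "make pasta": [
        ("locate ingredients", "perception"),   # Brain #2
        ("fetch recipe",       "knowledge"),    # Brain #3
        ("boil pasta",         "motion"),       # Brains #5/#6
        ("set the table",      "motion"),
    ],
}

def plan_dinner(command: str) -> list[tuple[str, str]]:
    """Interpret a command and return (subtask, assigned_module) pairs."""
    # A real system would use an LLM here; a lookup stands in for it.
    goal = "make pasta" if "dinner" in command.lower() else command
    return TASK_LIBRARY.get(goal, [])

plan = plan_dinner("Prepare dinner")
```

The point is the shape of the output: a list of subtasks, each tagged with which brain owns it, produced before anything physical happens.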
Brain #2:
Now that you have a recipe, you need the robot's eyes and spatial awareness. This brain processes camera feeds to identify ingredients, tools, and their locations in the kitchen. Using advanced computer vision, it spots the cutting board, the vegetables in the fridge, the knife on the counter, and so on.
It builds a 3D map of the environment and tracks relevant objects (say, where the salt or the pot is). This perception brain (System 2) runs slower than the reflexes but supplies the accurate scene context that planning depends on. By recognizing everything that matters, it grounds the robot in the real world.
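The output of that perception brain can be pictured as a queryable object registry. This is a toy sketch, with made-up labels and coordinates, of the scene map the planner would consult:

```python
# Illustrative scene map (Brain #2): camera detections accumulate into
# a simple 3D object registry that other brains can query.
class SceneMap:
    def __init__(self):
        self.objects: dict[str, tuple[float, float, float]] = {}

    def update(self, label: str, xyz: tuple[float, float, float]) -> None:
        self.objects[label] = xyz  # latest known position wins

    def locate(self, label: str):
        return self.objects.get(label)  # None if never seen

scene = SceneMap()
scene.update("cutting board", (0.4, 0.1, 0.9))
scene.update("knife", (0.5, 0.2, 0.9))
```

A real pipeline would fuse depth and pose estimates; the interface ("where is X?") is the part that matters to the other brains.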
Brain #3:
This brain acts as the robot's knowledge base and memory (System 2). It retrieves and parses the information the task requires, in this case a suitable recipe and cooking instructions. It might query online recipes or its internal database for a pasta recipe, then interpret the steps (boil water, mince garlic, etc.).
It recalls facts about the kitchen (like where the spices are kept) and past cooking experience, essentially providing semantic understanding and world knowledge. It also resolves abstract instructions (like "caramelize the onions") into concrete parameters (temperature, timing) the robot can execute, and makes sure the plan matches your preferences.
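That "abstract instruction to concrete parameters" step is easy to sketch. The grounding table and its numbers below are invented for the example, not real cooking constants:

```python
# Toy grounding step (Brain #3): resolve an abstract cooking instruction
# into concrete, executable parameters. Values are illustrative only.
GROUNDING = {
    "caramelize onions": {"temp_c": 120, "minutes": 25, "stir_every_s": 60},
    "boil water":        {"temp_c": 100, "minutes": 8,  "stir_every_s": 0},
}

def ground(instruction: str) -> dict:
    """Map an abstract step to motor-level parameters, or raise if unknown."""
    params = GROUNDING.get(instruction.lower())
    if params is None:
        raise KeyError(f"no grounding for {instruction!r}")
    return params
```

Downstream, Brain #5 and Brain #6 never see the phrase "caramelize"; they only see temperatures, durations, and stir intervals.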
Brain #4:
With the goal and the environment established, this brain produces the detailed plan. It decomposes the high-level goal into ordered actions and conditional steps. It schedules tasks (sometimes in parallel, like preheating the oven while chopping vegetables) and sets milestones (water boiling, sauce ready).
It also tracks progress and can replan on the fly if something changes (say, an ingredient is missing). It then hands the action sequence to the motion-layer brains for execution. Another System 2 brain.
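The replanning idea can be shown in a few lines. The plan format ("verb ingredient" strings) and the substitution rule are assumptions made for this sketch:

```python
# Sketch of Brain #4's replanning: when a step's ingredient is missing,
# swap in a substitute instead of aborting the whole plan.
def replan(steps: list[str], pantry: set[str],
           substitutes: dict[str, str]) -> list[str]:
    """Return the plan with unavailable ingredients substituted."""
    out = []
    for step in steps:
        verb, _, item = step.partition(" ")
        if item and item not in pantry and item in substitutes:
            out.append(f"{verb} {substitutes[item]}")
        else:
            out.append(step)
    return out

plan = ["chop garlic", "boil pasta", "grate parmesan"]
new_plan = replan(plan, pantry={"garlic", "pasta", "pecorino"},
                  substitutes={"parmesan": "pecorino"})
```

A production planner would re-run dependency scheduling after a substitution; the sketch only shows the trigger-and-patch idea.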
Brain #5:
Time to move from the System 2 stack to System 1 and turn the plan into concrete robot motion. For each action (like "walk to the fridge" or "chop the carrot"), it generates suitable trajectories for the robot's body and limbs.
This module handles path planning and inverse kinematics, computing joint paths and angles so the robot moves smoothly without collisions. It typically applies learned motion policies (such as diffusion transformer policies) to produce fluid movement for complex tasks.
If Brain #4 says to fetch a pot from the fridge, Brain #5 works out how to get the robot there and how to grasp it. Where needed, it coordinates multiple limbs (using both hands to lift a heavy pot, for example). High-level intent becomes coordinated motion across hardware and software.
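Inverse kinematics is the most textbook part of this layer. Here is the classic closed-form solution for a planar two-link arm, the simplest instance of the "target position in, joint angles out" computation Brain #5 performs (link lengths are arbitrary example values; a humanoid arm has many more joints and uses numerical solvers):

```python
import math

def two_link_ik(x: float, y: float, l1: float = 1.0, l2: float = 1.0):
    """Return (shoulder, elbow) angles in radians reaching target (x, y)."""
    d2 = x * x + y * y
    # Law of cosines gives the elbow angle from the target distance.
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)
    # Shoulder angle: direction to target minus the wedge the elbow adds.
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow
```

With both links of length 1, reaching (1, 1) gives a straight shoulder and a 90-degree elbow, which you can verify by forward kinematics.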
Brain #6:
Once the motion plan is set, it's time to execute. This low-level System 1 control brain drives the robot's actuators (motors and joints). It continuously reads the sensors (joint angles, forces, balance) and sends control signals to follow the trajectory.
It uses control loops (PID controllers, model predictive control, etc.) to maintain precision; if the robot starts to tip or the knife drifts off track, it corrects immediately. These are the reflexes and fine motor skills operating at millisecond speeds.
When the robot slices a carrot, Brain #6 modulates the force and adjusts the blade angle to get even slices without slipping. It's like the system's subconscious "muscle memory," handling the low-level details automatically.
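Since the text names PID control, here is a bare-bones PID loop of the kind Brain #6 runs at high rates to hold a joint on its setpoint. The gains and the one-line plant model are illustrative, not tuned for any real actuator:

```python
# Minimal PID controller driving a toy first-order joint model.
class PID:
    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a joint angle toward 1.0 rad; the plant integrates the command.
pid = PID(kp=2.0, ki=0.2, kd=0.1)
angle = 0.0
for _ in range(200):
    angle += pid.step(1.0, angle, dt=0.01) * 0.01
```

The proportional term does the bulk of the correction, the integral removes steady-state offset, and the derivative damps overshoot; this is the millisecond-scale reflex layer under every "chop" and "stir."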
Brain #7:
The last piece is focused on continuous improvement. During and after dinner prep, it analyzes performance. Did it spill anything? Was it too slow while stirring?
This module uses reinforcement learning and self-calibration to update the robot's models over time. The robot's core skills were initially trained on massive amounts of human demonstrations and trial and error, but they need constant fine-tuning.
If it discovers a more efficient dicing technique or a better grip on the spatula, it updates its policies so the next dinner goes even smoother. This adaptive brain is what makes a humanoid grow more skilled with experience.
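A toy version of that improvement loop: plain hill-climbing standing in for the reinforcement learning and self-calibration the text describes. The "slice thickness" objective and all numbers are invented for the example:

```python
import random

def improve(param: float, score_fn, trials: int = 200,
            step: float = 0.2) -> float:
    """Keep a random perturbation only when it scores better."""
    best = score_fn(param)
    rng = random.Random(0)  # seeded so the sketch is reproducible
    for _ in range(trials):
        candidate = param + rng.uniform(-step, step)
        s = score_fn(candidate)
        if s > best:            # ratchet: never accept a worse skill
            param, best = best_param, best_score = candidate, s
    return param

# Pretend the ideal slice thickness is 3 mm; score is negative squared error.
tuned = improve(5.0, lambda p: -(p - 3.0) ** 2)
```

Real systems update whole policy networks from reward signals rather than one scalar, but the shape is the same: try a variation, keep it only if the outcome improved.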
Codec: Operators in action
How does Codec's architecture tie all these brains together? Each "brain" runs as an independent Operator module within the robot's AI system. Codec's Fabric orchestration gives every Operator its own secure, sandboxed environment.
That means the vision module, the language/logic module, the planning module, and so on all run in isolation yet communicate through defined interfaces.
If one module crashes or errors out, it doesn't take down the whole robot; the others keep running safely. This modular design also makes it easy to update or swap a single brain without disturbing the rest, and to add new specialist Operators as needed.
This Operator approach directly supports the multi-brain framework. When you ask for dinner, the robot's executive brain (Brain #1) can spin up a dedicated "chef" Operator for the task while other Operators handle perception and control in parallel.
Each Operator only gets access to the resources it needs (the recipe agent might have internet access to fetch instructions, while the control agent interfaces only with the hardware), which improves safety.
Codec's modular, sandboxed design is the glue that lets all these diverse skills work together, much like microservices in software, enabling a humanoid to reliably handle complex tasks like cooking dinner from scratch.
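The fault-isolation claim is the key property, and it can be demonstrated with a toy message broker. In-process queues stand in for Codec's Fabric here; the actual Fabric API is not shown, and every name below is illustrative:

```python
import queue

class Operator:
    """One sandboxed 'brain': a name, a handler, and a private inbox."""
    def __init__(self, name, handler):
        self.name, self.handler = name, handler
        self.inbox = queue.Queue()

class Broker:
    def __init__(self):
        self.operators = {}

    def register(self, op: Operator) -> None:
        self.operators[op.name] = op

    def send(self, to: str, msg) -> None:
        self.operators[to].inbox.put(msg)

    def run_once(self) -> dict:
        """Deliver one message per operator; a crash stays contained."""
        results = {}
        for name, op in self.operators.items():
            if op.inbox.empty():
                continue
            try:
                results[name] = op.handler(op.inbox.get())
            except Exception as exc:
                results[name] = f"failed: {exc}"  # others keep running
        return results

broker = Broker()
broker.register(Operator("vision", lambda m: f"saw {m}"))
broker.register(Operator("chef", lambda m: 1 / 0))  # deliberately buggy
broker.send("vision", "carrot")
broker.send("chef", "start")
out = broker.run_once()
```

The buggy "chef" Operator fails, the "vision" Operator still delivers its result, and the broker keeps running: that's the microservices-style containment in miniature.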
That's why $CODEC is set to become core infrastructure for robotics.


August 23, 2025
You'll consistently see foundation models for humanoids using a System 2 + System 1 style architecture, which is directly inspired by human cognition.
Most vision-language-action (VLA) models today are built as centralized multimodal systems that handle perception, language, and action within a single network.
Codec’s infrastructure is perfect for this as it treats each Operator as a sandboxed module. Meaning you can spin up multiple Operators in parallel, each running its own model or task, while keeping them encapsulated and coordinated through the same architecture.
Robots and Humanoids in general typically have multiple brains, where one Operator might handle vision processing, another handling balance, another doing high level planning etc, which can all be coordinated through Codec’s system.
Nvidia's foundation model Isaac GR00T N1 uses the two-module System 2 + System 1 architecture. System 2 is a vision-language model (a pretrained multimodal VLM) that observes the world through the robot's cameras and listens to instructions, then makes a high-level plan.
System 1 is a diffusion transformer policy that takes that plan and turns it into continuous motions in real time. You can think of System 2 as the deliberative brain and System 1 as the instinctual body controller. System 2 might output something like “move to the red cup, grasp it, then place it on the shelf,” and System 1 will generate the detailed joint trajectories for the legs and arms to execute each step smoothly.
System 1 was trained on tons of trajectory data (including human teleoperated demos and physics simulated data) to master fine motions, while System 2 was built on a transformer with internet pretraining (for semantic understanding).
This separation of reasoning vs. acting is very powerful for NVIDIA. It means GR00T can handle long horizon tasks that require planning (thanks to System 2) and also react instantly to perturbations (thanks to System 1).
If a robot is carrying a tray and someone nudges the tray, System 1 can correct the balance immediately rather than waiting for the slower System 2 to notice.
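That tray example is worth making concrete. In the toy simulation below, the fast System 1 loop damps a mid-run nudge every tick, so by the time a slow System 2 replan would even fire, the disturbance is gone. The numbers and the one-line dynamics are invented for the illustration:

```python
def simulate(nudge_at: int = 10, steps: int = 100) -> float:
    """Return the tilt (radians) remaining after a mid-run nudge."""
    tilt = 0.0
    for i in range(steps):
        if i == nudge_at:
            tilt += 0.3        # someone bumps the tray
        tilt -= 0.2 * tilt     # System 1 reflex correction, every tick
        # System 2 would only replan every ~100 ticks; it never needs to.
    return tilt

residual = simulate()
```

Ninety reflex ticks at 20% correction each shrink a 0.3 rad bump to effectively nothing, which is the whole argument for keeping a fast policy underneath the slow planner.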
GR00T N1 was one of the first openly available robotics foundation models, and it quickly gained traction.
Out of the box, it demonstrated skill across many tasks in simulation, it could grasp and move objects with one hand or two, hand items between its hands, and perform multi step chores without any task specific programming. Because it wasn’t tied to a single embodiment, developers showed it working on different robots with minimal adjustments.
The same is true for Helix (Figure's foundation model), which uses this type of architecture. Helix lets two robots (or multiple skills) operate off one model; Codec could enable a similar multi-agent brain by running several Operators that share information.
This “isolated pod” design means each component can be specialized (just like System 1 vs System 2) and even developed by different teams, yet they can work together.
It's a one-of-a-kind approach in the sense that Codec is building the deep software stack to support this modular, distributed intelligence, whereas most others focus only on the AI model itself.
Codec also leverages large pre-trained models. If you're building a robot application on it, you might plug in an OpenVLA or a Pi Zero foundation model as part of your Operator. Codec provides the connectors: easy access to camera feeds or robot APIs, so you don't have to write the low-level code to get images from a robot's camera or send velocity commands to its motors. It's all abstracted behind a high-level SDK.
One of the reasons I’m so bullish on Codec is exactly what I outlined above. They’re not chasing narratives, the architecture is built to be the glue between foundation models, and it frictionlessly supports multi brain systems, which is critical for humanoid complexity.
Because we’re so early in this trend, it’s worth studying the designs of industry leaders and understanding why they work. Robotics is hard to grasp given the layers across hardware and software, but once you learn to break each section down piece by piece, it becomes far easier to digest.
It might feel like a waste of time now, but this is the same method that gave me a head start during AI szn and why I was early on so many projects. Stay disciplined and learn which components can coexist and which don't scale.
It’ll pay dividends over the coming months.
Deca Trillions ( $CODEC ) coded.
