Глеб Макаревичсотрудник Центра Индоокеанского региона ИМЭМО РАН
Given the complexity of home scenarios and their long-tail distribution, today’s mainstream technical approaches are still evolving. On the data side, training data often relies on lab demonstrations, limited real-world trajectories, and publicly available videos, leaving significant room to improve generalization to unknown environments and novel task combinations. On the objective and representation side, traditional VLA systems are typically optimized around aligning vision–language–action and reproducing behaviors; deeper modeling of the semantic structure behind actions and a composable skill space is still needed. As a result, models behave more like they are “matching/reusing” existing action fragments rather than generating feasible new strategies based on goals and constraints, making it difficult to handle the highly long-tailed and constantly changing task demands found in real homes.
,推荐阅读PDF资料获取更多信息
对于她们而言,这个选择远不止“再拿一个文凭”那么简单。,详情可参考heLLoword翻译官方下载
Фото: Essam Al-Sudani / Reuters,推荐阅读91视频获取更多信息