Latest AI Research

Discover Cutting-Edge AI Resources

Explore the latest research papers, models, applications, and projects from arXiv, HuggingFace, and GitHub. Your comprehensive AI navigation hub.

Latest Papers

The latest research papers from arXiv


ShortCoder: Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation

Jan 14, 2026
9 authors

Code generation tasks aim to automate the conversion of user requirements into executable code, significantly reducing manual development effort and enhancing software productivity. The emergence of large language models (LLMs) has greatly advanced code generation, though their efficiency is still limited by inherent architectural constraints: generating each token requires a complete inference pass, which demands persistent retention of contextual information in memory and escalates resource consumption. While existing research prioritizes inference-phase optimizations such as prompt compression and model quantization, the generation phase remains underexplored. To tackle these challenges, we propose a knowledge-infused framework named ShortCoder, which optimizes code generation efficiency while preserving semantic equivalence and readability. In particular, we introduce: (1) ten syntax-level simplification rules for Python, derived from AST-preserving transformations, achieving an 18.1% token reduction without functional compromise; (2) a hybrid data synthesis pipeline integrating rule-based rewriting with LLM-guided refinement, producing ShorterCodeBench, a corpus of validated, semantically consistent pairs of original and simplified code; (3) a fine-tuning strategy that injects conciseness awareness into the base LLMs. Extensive experiments demonstrate that ShortCoder consistently outperforms state-of-the-art methods on HumanEval, improving generation efficiency by 18.1%-37.8% over previous methods while preserving code generation quality.

cs.SE · cs.AI · cs.CL
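
The abstract names ten Python simplification rules but does not list them. As a rough illustration only, the sketch below implements one plausible syntax-level rewrite of the same flavor using Python's standard ast module: turning assignments of the form x = x + expr into the shorter x += expr, with a crude whitespace-token count standing in for the paper's token-reduction metric. The AugAssignRewriter class and the rule itself are hypothetical assumptions, not taken from ShortCoder.

    import ast

    class AugAssignRewriter(ast.NodeTransformer):
        # Rewrite "name = name <op> expr" into "name <op>= expr".
        def visit_Assign(self, node):
            self.generic_visit(node)
            if (
                len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.BinOp)
                and isinstance(node.value.left, ast.Name)
                and node.targets[0].id == node.value.left.id
            ):
                # Build the augmented assignment. For mutable types "+="
                # mutates in place, so a real rule set would need type-aware
                # guards before rewriting.
                return ast.copy_location(
                    ast.AugAssign(
                        target=ast.Name(id=node.targets[0].id, ctx=ast.Store()),
                        op=node.value.op,
                        value=node.value.right,
                    ),
                    node,
                )
            return node

    def simplify(source):
        # Parse, rewrite, and unparse (ast.unparse requires Python 3.9+).
        tree = ast.fix_missing_locations(AugAssignRewriter().visit(ast.parse(source)))
        return ast.unparse(tree)

    if __name__ == "__main__":
        original = "total = total + price * quantity"
        shorter = simplify(original)
        print(shorter)  # total += price * quantity
        # Whitespace tokens as a stand-in for the paper's token metric: 7 -> 5.
        print(len(original.split()), "->", len(shorter.split()))

Because x = x + y and x += y can differ for mutable values (for lists, += mutates in place), any production version of such a rule would need the kind of semantic-consistency validation the abstract attributes to the ShorterCodeBench pipeline.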

Featured Blog Posts

The latest insights and tutorials from our team


Popular AI Tools

Discover the best AI tools and alternatives

Browse by Category

Explore AI research by topic