ArXiv AI: Weekly Top Picks

Coverage: 2026-02-01 → 2026-02-08

This week in AI papers

We track new AI papers on arXiv, pick the one or two that matter most each day, and explain the key ideas clearly, without the hype.

Unpacked by our trio: Alex, the plain-language host; Marc, the hands-on power user; and Jamie, the senior ML engineer.

LLM Daily – PhysicsAgentABM: Physics-Guided Generative Agent-Based Modeling

2026-02-08 · 10 min · 19.1 MB

Excerpt — Large language model (LLM)-based multi-agent systems enable expressive agent reasoning but are expensive to scale and poorly calibrated for timestep-aligned state-transition simulation, while classical agent-based…

LLM Daily – Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

2026-02-07 · 10 min · 14.6 MB

Excerpt — Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may…

LLM Daily – Learning to Share: Selective Memory for Efficient Parallel Agentic Systems

2026-02-06 · 10 min · 12.4 MB

Excerpt — Agentic systems solve complex tasks by coordinating multiple agents that iteratively reason, invoke tools, and exchange intermediate results. To improve robustness and solution quality, recent approaches deploy multiple…

LLM Daily – AgenticPay: A Multi-Agent LLM Negotiation System for Buyer-Seller Transactions

2026-02-06 · 10 min · 13.4 MB

Excerpt — Large language model (LLM)-based agents are increasingly expected to negotiate, coordinate, and transact autonomously, yet existing benchmarks lack principled settings for evaluating language-mediated economic…

LLM Daily – DyTopo: Dynamic Topology Routing for Multi-Agent Reasoning via Semantic Matching

2026-02-06 · 10 min · 14.5 MB

Excerpt — Multi-agent systems built from prompted large language models can improve multi-round reasoning, yet most existing pipelines rely on fixed, trajectory-wide communication patterns that are poorly matched to the stage-…

LLM Daily – PersoPilot: An Adaptive AI-Copilot for Transparent Contextualized Persona Classi…

2026-02-05 · 10 min · 16.0 MB

Excerpt — Understanding and classifying user personas is critical for delivering effective personalization. While persona information offers valuable insights, its full potential is realized only when contextualized, linking user…

LLM Daily – AIANO: Enhancing Information Retrieval with AI-Augmented Annotation

2026-02-05 · 10 min · 15.1 MB

Excerpt — The rise of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) has rapidly increased the need for high-quality, curated information retrieval datasets. These datasets, however, are currently created…

LLM Daily – WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents

2026-02-04 · 10 min · 14.1 MB

Excerpt — Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing such attacks achieve limited…

LLM Daily – Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity

2026-02-04 · 10 min · 15.8 MB

Excerpt — LLM-based multi-agent systems (MAS) have emerged as a promising approach to tackle complex tasks that are difficult for individual LLMs. A natural strategy is to scale performance by increasing the number of agents;…

LLM Daily – Drift-Bench: Diagnosing Cooperative Breakdowns in LLM Agents under Input Faults

2026-02-03 · 10 min · 14.9 MB

Excerpt — As Large Language Models transition to autonomous agents, user inputs frequently violate cooperative assumptions (e.g., implicit intent, missing parameters, false presuppositions, or ambiguous expressions), creating…

LLM Daily – Avenir-Web: Human-Experience-Imitating Multimodal Web Agents with Mixture of Gro…

2026-02-03 · 10 min · 14.6 MB

Excerpt — Despite advances in multimodal large language models, autonomous web agents still struggle to reliably execute long-horizon tasks on complex and dynamic web interfaces. Existing agents often suffer from inaccurate…

LLM Daily – AgentRx: Diagnosing AI Agent Failures from Execution Trajectories

2026-02-03 · 10 min · 14.2 MB

Excerpt — AI agents often fail in ways that are difficult to localize because executions are probabilistic, long-horizon, multi-agent, and mediated by noisy tool outputs. We address this gap by manually annotating failed agent…

LLM Daily – CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Re…

2026-02-01 · 10 min · 14.3 MB

Excerpt — Existing benchmarks for Large Language Model (LLM) agents focus on task completion under idealistic settings but overlook reliability in real-world, user-facing applications. In domains, such as in-car voice assistants,…

LLM Daily – Optimizing Agentic Workflows using Meta-tools

2026-02-01 · 10 min · 13.8 MB

Excerpt — Agentic AI enables LLM to dynamically reason, plan, and interact with tools to solve complex tasks. However, agentic workflows often require many iterative reasoning steps and tool invocations, leading to significant…

LLM Daily – G²-Reader: Dual Evolving Graphs for Multimodal Document QA

2026-02-01 · 10 min · 13.4 MB

Excerpt — Retrieval-augmented generation is a practical paradigm for question answering over long documents, but it remains brittle for multimodal reading where text, tables, and figures are interleaved across many pages. First,…

Available on Spotify and via RSS, in English (EN) and French (FR).