ArXiv AI: Weekly Top Picks

Coverage: 2026-01-10 → 2026-01-17

This week in AI papers

We keep an eye on new AI papers on arXiv, pick one or two that really matter each day, and share the key ideas — no hype, just clear explanations.

Unpacked by our trio: Alex the plain-language host, Marc the hands-on power user, and Jamie the senior ML engineer.

LLM Daily – AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior

2026-01-17 · 10 min · 15.1 MB

Excerpt — Artificial intelligence (AI) agents are increasingly used in a variety of domains to automate tasks, interact with users, and make decisions based on data inputs. Ensuring that AI agents perform only authorized actions…

LLM Daily – ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack

2026-01-16 · 10 min · 12.9 MB

Excerpt — Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields. However, these systems are highly vulnerable to indirect prompt…

LLM Daily – Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scal

2026-01-16 · 10 min · 14.7 MB

Excerpt — The rise of AI agent frameworks has introduced agent skills, modular packages containing instructions and executable code that dynamically extend agent capabilities. While this architecture enables powerful…

LLM Daily – PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent

2026-01-15 · 10 min · 13.6 MB

Excerpt — While GUI agents have shown strong performance under explicit and completion instructions, real-world deployment requires aligning with users' more complex implicit intents. In this work, we highlight Hierarchical…

LLM Daily – DeepResearchEval: An Automated Framework for Deep Research Task Construction and

2026-01-15 · 10 min · 12.5 MB

Excerpt — Deep research systems are widely used for multi-step web research, analysis, and cross-source synthesis, yet their evaluation remains challenging. Existing benchmarks often require annotation-intensive task…

LLM Daily – RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthes

2026-01-14 · 10 min · 14.4 MB

Excerpt — Agentic Retrieval-Augmented Generation (RAG) empowers large language models to autonomously plan and retrieve information for complex problem-solving. However, the development of robust agents is hindered by the…

LLM Daily – To Retrieve or To Think? An Agentic Approach for Context Evolution

2026-01-14 · 10 min · 14.5 MB

Excerpt — Current context augmentation methods, such as retrieval-augmented generation, are essential for solving knowledge-intensive reasoning tasks. However, they typically adhere to a rigid, brute-force strategy that executes…

LLM Daily – Is Agentic RAG worth it? An experimental comparison of RAG approaches

2026-01-13 · 10 min · 13.8 MB

Excerpt — Retrieval-Augmented Generation (RAG) systems are usually defined by the combination of a generator and a retrieval component that extracts textual context from a knowledge base to answer user queries. However, such…

LLM Daily – Beyond Single-Shot: Multi-step Tool Retrieval via Query Planning

2026-01-13 · 10 min · 13.8 MB

Excerpt — LLM agents operating over massive, dynamic tool libraries rely on effective retrieval, yet standard single-shot dense retrievers struggle with complex requests. These failures primarily stem from the disconnect between…

LLM Daily – VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Comm

2026-01-12 · 10 min · 14.3 MB

Excerpt — LLM agents operating in open environments face escalating risks from indirect prompt injection, particularly within the tool stream where manipulated metadata and runtime feedback hijack execution flow. Existing…

LLM Daily – Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the

2026-01-12 · 10 min · 15.9 MB

Excerpt — On December 4, 2025, Anthropic released Anthropic Interviewer, an AI tool for running qualitative interviews at scale, along with a public dataset of 1,250 interviews with professionals, including 125 scientists, about…

LLM Daily – Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic

2026-01-12 · 10 min · 12.9 MB

Excerpt — Recent advancements in Large Language Model (LLM) agents have enabled complex multi-turn agentic tasks requiring extensive tool calling, where conversations can span dozens of API calls with increasingly large context…

LLM Daily – Distilling Feedback into Memory-as-a-Tool

2026-01-12 · 10 min · 13.7 MB

Excerpt — We propose a framework that amortizes the cost of inference-time reasoning by converting transient critiques into retrievable guidelines, through a file-based memory system and agent-controlled tool calls. We evaluate…
