ArXiv AI: Weekly Top Picks

Coverage: 2026-02-22 → 2026-03-01

This week in AI papers

We keep an eye on new AI papers on arXiv, pick one or two that really matter each day, and share the key ideas — no hype, just clear explanations.

Unpacked by our trio: Alex the plain-language host, Marc the hands-on power user, and Jamie the senior ML engineer.

LLM Daily – General Agent Evaluation

2026-03-01 · 10 min · 18.3 MB

Excerpt — The promise of general-purpose agents - systems that perform tasks in unfamiliar environments without domain-specific engineering - remains largely unrealized. Existing agents are predominantly specialized, and while…

📝 Article 📄 PDF

LLM Daily – CiteLLM: An Agentic Platform for Trustworthy Scientific Reference Discovery

2026-03-01 · 10 min · 17.3 MB

Excerpt — Large language models (LLMs) have created new opportunities to enhance the efficiency of scholarly activities; however, challenges persist in the ethical deployment of AI assistance, including (1) the trustworthiness of…

📝 Article 📄 PDF

LLM Daily – ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence o

2026-03-01 · 10 min · 19.6 MB

Excerpt — Multimodal large language models (MLLMs) have made significant progress in mobile agent development, yet their capabilities are predominantly confined to a reactive paradigm, where they merely execute explicit user…

📝 Article 📄 PDF

LLM Daily – On Data Engineering for Scaling LLM Terminal Capabilities

2026-02-28 · 10 min · 18.7 MB

Excerpt — Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a…

📝 Article 📄 PDF

LLM Daily – AIDG: Evaluating Asymmetry Between Information Extraction and Containment in Mul

2026-02-26 · 10 min · 13.1 MB

Excerpt — Evaluating the strategic reasoning capabilities of Large Language Models (LLMs) requires moving beyond static benchmarks to dynamic, multi-turn interactions. We introduce AIDG (Adversarial Information Deduction Game), a…

📝 Article 📄 PDF

LLM Daily – WarpRec: Unifying Academic Rigor and Industrial Scale for Responsible, Reproduci

2026-02-26 · 10 min · 13.0 MB

Excerpt — Innovation in Recommender Systems is currently impeded by a fractured ecosystem, where researchers must choose between the ease of in-memory experimentation and the costly, complex rewriting required for distributed…

📝 Article 📄 PDF

LLM Daily – Agentic Problem Frames: A Systematic Approach to Engineering Reliable Domain Age

2026-02-24 · 10 min · 13.3 MB

Excerpt — Large Language Models (LLMs) are evolving into autonomous agents, yet current "frameless" development--relying on ambiguous natural language without engineering blueprints--leads to critical risks such as scope creep…

📝 Article 📄 PDF

LLM Daily – Proximity-Based Multi-Turn Optimization: Practical Credit Assignment for LLM Age

2026-02-24 · 10 min · 12.7 MB

Excerpt — Multi-turn LLM agents are becoming pivotal to production systems, spanning customer service automation, e-commerce assistance, and interactive task management, where accurately distinguishing high-value informative…

📝 Article 📄 PDF

LLM Daily – Decomposing Retrieval Failures in RAG for Long-Document Financial Question Answe

2026-02-23 · 10 min · 12.2 MB

Excerpt — Retrieval-augmented generation is increasingly used for financial question answering over long regulatory filings, yet reliability depends on retrieving the exact context needed to justify answers in high stakes…

📝 Article 📄 PDF

Read more

AI Signals Report — Control planes, not just models

Your Bankers Are Ready. Your Bank Isn't.

One Line in Shanghai: What Xi's AI Speech Tells European Banks Betting on Chinese Open Models

Article 50 Goes Live in Five Days — and It Stopped Being a Legal Problem