ArXiv AI: Weekly Top Picks

Coverage: 2026-03-01 → 2026-03-08

This week in AI papers

We keep an eye on new AI papers on arXiv, pick one or two that really matter each day, and share the key ideas — no hype, just clear explanations.

Unpacked by our trio: Alex the plain-language host, Marc the hands-on power user, and Jamie the senior ML engineer.

LLM Daily – LifeBench: A Benchmark for Long-Horizon Multi-Source Memory

2026-03-08 · 10 min · 7.5 MB

Excerpt — Long-term memory is fundamental for personalized agents capable of accumulating knowledge, reasoning over user experiences, and adapting across time. However, existing memory benchmarks primarily target declarative…

📝 Article 📄 PDF

LLM Daily – VRM: Teaching Reward Models to Understand Authentic Human Preferences

2026-03-07 · 10 min · 17.2 MB

Excerpt — Large Language Models (LLMs) have achieved remarkable success across diverse natural language tasks, yet the reward models employed for aligning LLMs often encounter challenges of reward hacking, where the approaches…

📝 Article 📄 PDF

LLM Daily – From Threat Intelligence to Firewall Rules: Semantic Relations in Hybrid AI Agen

2026-03-05 · 10 min · 20.1 MB

Excerpt — Web security demands rapid response capabilities to evolving cyber threats. Agentic Artificial Intelligence (AI) promises automation, but the need for trustworthy security responses is of the utmost importance. This…

📝 Article 📄 PDF

LLM Daily – RUMAD: Reinforcement-Unifying Multi-Agent Debate

2026-03-04 · 10 min · 20.8 MB

Excerpt — Multi-agent debate (MAD) systems leverage collective intelligence to enhance reasoning capabilities, yet existing approaches struggle to simultaneously optimize accuracy, consensus formation, and computational…

📝 Article 📄 PDF

LLM Daily – Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedur

2026-03-04 · 10 min · 17.0 MB

Excerpt — Large Language Model (LLM)-based agents are increasingly adopted in high-stakes settings, but current benchmarks evaluate mainly whether a task was completed, not how. We introduce Procedure-Aware Evaluation (PAE), a…

📝 Article 📄 PDF

LLM Daily – MED-COPILOT: A Medical Assistant Powered by GraphRAG and Similar Patient Case Re

2026-03-03 · 10 min · 19.5 MB

Excerpt — Clinical decision-making requires synthesizing heterogeneous evidence, including patient histories, clinical guidelines, and trajectories of comparable cases. While large language models (LLMs) offer strong reasoning…

📝 Article 📄 PDF

LLM Daily – A Novel Hierarchical Multi-Agent System for Payments Using LLMs

2026-03-02 · 10 min · 19.0 MB

Excerpt — Large language model (LLM) agents, such as OpenAI's Operator and Claude's Computer Use, can automate workflows but are unable to handle payment tasks. Existing agentic solutions have gained significant attention; however,…

📝 Article 📄 PDF

LLM Daily – General Agent Evaluation

2026-03-01 · 10 min · 18.3 MB

Excerpt — The promise of general-purpose agents - systems that perform tasks in unfamiliar environments without domain-specific engineering - remains largely unrealized. Existing agents are predominantly specialized, and while…

📝 Article 📄 PDF

LLM Daily – CiteLLM: An Agentic Platform for Trustworthy Scientific Reference Discovery

2026-03-01 · 10 min · 17.3 MB

Excerpt — Large language models (LLMs) have created new opportunities to enhance the efficiency of scholarly activities; however, challenges persist in the ethical deployment of AI assistance, including (1) the trustworthiness of…

📝 Article 📄 PDF

LLM Daily – ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence o

2026-03-01 · 10 min · 19.6 MB

Excerpt — Multimodal large language models (MLLMs) have made significant progress in mobile agent development, yet their capabilities are predominantly confined to a reactive paradigm, where they merely execute explicit user…

📝 Article 📄 PDF
