ArXiv AI: Weekly Top Picks

Coverage: 2026-03-29 → 2026-04-05

This week in AI papers

We keep an eye on new AI papers on arXiv, pick one or two that really matter each day, and share the key ideas — no hype, just clear explanations.

Unpacked by our trio: Alex the plain-language host, Marc the hands-on power user, and Jamie the senior ML engineer.

LLM Daily – Reliable Control-Point Selection for Steering Reasoning in Large Language Models

2026-04-05 · 10 min · 7.0 MB

Excerpt — Steering vectors offer a training-free mechanism for controlling reasoning behaviors in large language models, but constructing effective vectors requires identifying genuine behavioral signals in the model's hidden…

📝 Article 📄 PDF

LLM Daily – RetailBench: Evaluating Long-Horizon Autonomous Decision-Making and Strategy Sta

2026-04-05 · 10 min · 7.1 MB

Excerpt — Large Language Model (LLM)-based agents have achieved notable success on short-horizon and highly structured tasks. However, their ability to maintain coherent decision-making over long horizons in realistic and dynamic…

📝 Article 📄 PDF

LLM Daily – De Jure: Iterative LLM Self-Refinement for Structured Extraction of Regulatory R

2026-04-04 · 10 min · 12.1 MB

Excerpt — Regulatory documents encode legally binding obligations that LLM-based systems must respect. Yet converting dense, hierarchically structured legal text into machine-readable rules remains a costly, expert-intensive…

📝 Article 📄 PDF

LLM Daily – To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining

2026-04-03 · 10 min · 14.0 MB

Excerpt — Retrieval-augmented generation (RAG) improves language model (LM) performance by providing relevant context at test time for knowledge-intensive situations. However, the relationship between parametric knowledge…

📝 Article 📄 PDF

LLM Daily – Optimizing RAG Rerankers with LLM Feedback via Reinforcement Learning

2026-04-03 · 10 min · 11.0 MB

Excerpt — Rerankers play a pivotal role in refining retrieval results for Retrieval-Augmented Generation. However, current reranking models are typically optimized on static human-annotated relevance labels in isolation,…

📝 Article 📄 PDF

LLM Daily – Multi-Layered Memory Architectures for LLM Agents: An Experimental Evaluation of

2026-04-02 · 10 min · 8.3 MB

Excerpt — Long-horizon dialogue systems suffer from semantic drift and unstable memory retention across extended sessions. This paper presents a Multi-Layer Memory Framework that decomposes dialogue history into working, episodic,…

📝 Article 📄 PDF

LLM Daily – Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents

2026-04-02 · 10 min · 6.7 MB

Excerpt — Existing benchmarks measure capability -- whether a model succeeds on a single attempt -- but production deployments require reliability -- consistent success across repeated attempts on tasks of varying duration. We…

📝 Article 📄 PDF

LLM Daily – Distilling Human-Aligned Privacy Sensitivity Assessment from Large Language Mode

2026-04-01 · 10 min · 10.5 MB

Excerpt — Accurate privacy evaluation of textual data remains a critical challenge in privacy-preserving natural language processing. Recent work has shown that large language models (LLMs) can serve as reliable privacy…

📝 Article 📄 PDF

LLM Daily – GNNVerifier: Graph-based Verifier for LLM Task Planning

2026-04-01 · 10 min · 14.0 MB

Excerpt — Large language models (LLMs) facilitate the development of autonomous agents. As a core component of such agents, task planning aims to decompose complex natural language requests into concrete, solvable sub-tasks.…

📝 Article 📄 PDF

LLM Daily – LLM-Augmented Release Intelligence: Automated Change Summarization and Impact An

2026-04-01 · 10 min · 8.4 MB

Excerpt — Cloud-native software delivery platforms orchestrate releases through complex, multi-stage pipelines composed of dozens of independently versioned tasks. When code is promoted between environments -- development to…

📝 Article 📄 PDF

LLM Daily – LABSHIELD: A Multimodal Benchmark for Safety-Critical Reasoning and Planning in

2026-03-30 · 10 min · 6.2 MB

Excerpt — Artificial intelligence is increasingly catalyzing scientific automation, with multimodal large language model (MLLM) agents evolving from lab assistants into self-driving lab operators. This transition imposes…

📝 Article 📄 PDF

LLM Daily – ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning

2026-03-30 · 10 min · 7.3 MB

Excerpt — Reinforcement Learning from Human Feedback (RLHF) has become the standard for aligning Large Language Models (LLMs), yet its efficacy is bottlenecked by the high cost of acquiring preference data, especially in low-…

📝 Article 📄 PDF

LLM Daily – HeartAgent: An Autonomous Agent System for Explainable Differential Diagnosis in

2026-03-29 · 10 min · 13.0 MB

Excerpt — Heart diseases remain a leading cause of morbidity and mortality worldwide, necessitating accurate and trustworthy differential diagnosis. However, existing artificial intelligence-based diagnostic methods are often…

📝 Article 📄 PDF

LLM Daily – Natural-Language Agent Harnesses

2026-03-29 · 10 min · 11.8 MB

Excerpt — Agent performance increasingly depends on harness engineering, yet harness design is usually buried in controller code and runtime-specific conventions, making it hard to transfer, compare, and study as a scientific…

📝 Article 📄 PDF
