ArXiv AI: Weekly Top Picks

Coverage: 2025-11-30 → 2025-12-07

This week in AI papers

We track new AI papers on arXiv, pick the one or two that matter most each day, and explain the key ideas: no hype, just clear explanations.

Unpacked by our trio: Alex the plain-language host, Marc the hands-on power user, and Jamie the senior ML engineer.

LLM Daily – SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over

2025-12-06 · 10 min · 16.5 MB

Excerpt — Knowledge-based conversational question answering (KBCQA) confronts persistent challenges in resolving coreference, modeling contextual dependencies, and executing complex logical reasoning. Existing approaches, whether…

📝 Article 📄 PDF

LLM Daily – ASTRIDE: A Security Threat Modeling Platform for Agentic-AI Applications

2025-12-06 · 10 min · 14.3 MB

Excerpt — AI agent-based systems are becoming increasingly integral to modern software architectures, enabling autonomous decision-making, dynamic task execution, and multimodal interactions through large language models (LLMs).…

📝 Article 📄 PDF

LLM Daily – Context-Aware Hierarchical Learning: A Two-Step Paradigm towards Safer LLMs

2025-12-05 · 10 min · 12.8 MB

Excerpt — Large Language Models (LLMs) have emerged as powerful tools for diverse applications. However, their uniform token processing paradigm introduces critical vulnerabilities in instruction handling, particularly when…

📝 Article 📄 PDF

LLM Daily – In-Context Representation Hijacking

2025-12-05 · 10 min · 15.0 MB

Excerpt — We introduce Doublespeak, a simple in-context representation hijacking attack against large language models (LLMs). The attack works by systematically replacing a harmful keyword (e.g., bomb) with a benign token (e.g.,…

📝 Article 📄 PDF

LLM Daily – Think in Parallel, Answer as One: Logit Averaging for Open-Ended Reasoning

2025-12-04 · 10 min · 14.5 MB

Excerpt — Majority voting has proven effective for close-ended question answering by aggregating parallel reasoning traces. However, it is not directly applicable to open-ended reasoning, such as code generation and web-based…

📝 Article 📄 PDF
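
The baseline this excerpt starts from, majority voting over parallel reasoning traces, fits in a few lines of Python. This is a generic illustration of why voting works for short, close-ended answers and breaks down for open-ended outputs such as code. It is not the paper's logit-averaging method, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common final answer and its share of the votes.

    Hypothetical helper for illustration: answers are compared as exact
    strings after trimming whitespace, which is what makes plain voting
    fragile for open-ended outputs.
    """
    if not answers:
        raise ValueError("need at least one answer to vote on")
    counts = Counter(a.strip() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Close-ended: short final answers from parallel traces often coincide.
closed = ["42", "42", "41", "42", "40"]
print(majority_vote(closed))  # ('42', 0.6)

# Open-ended: equivalent programs rarely match character-for-character,
# so every candidate gets a single vote and the "winner" is arbitrary.
open_ended = [
    "def add(a, b): return a + b",
    "def add(x, y): return x + y",
    "def add(a, b):\n    return a + b",
]
print(majority_vote(open_ended))  # first candidate, with a 1/3 share
```

With no answer receiving more than one vote, the aggregate carries no signal, which is the gap the excerpt says majority voting leaves for tasks like code generation.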

LLM Daily – IACT: A Self-Organizing Recursive Model for General AI Agents: A Technical White

2025-12-03 · 10 min · 13.1 MB

Excerpt — This technical white paper introduces the Interactive Agents Call Tree (IACT), a computational model designed to address the limitations of static, hard-coded agent workflows. Unlike traditional systems that require…

📝 Article 📄 PDF

LLM Daily – Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions

2025-12-03 · 10 min · 16.2 MB

Excerpt — This paper examines why safety mechanisms designed for human-model interaction do not scale to environments where large language models (LLMs) interact with each other. Most current governance practices still rely on…

📝 Article 📄 PDF

LLM Daily – An Empirical Study of Agent Developer Practices in AI Agent Frameworks

2025-12-02 · 10 min · 12.6 MB

Excerpt — The rise of large language models (LLMs) has sparked a surge of interest in agents, leading to the rapid growth of agent frameworks. Agent frameworks are software toolkits and libraries that provide standardized…

📝 Article 📄 PDF

LLM Daily – Agentic Policy Optimization via Instruction-Policy Co-Evolution

2025-12-02 · 10 min · 16.4 MB

Excerpt — Reinforcement Learning with Verifiable Rewards (RLVR) has advanced the reasoning capability of large language models (LLMs), enabling autonomous agents that can conduct effective multi-turn and tool-integrated…

📝 Article 📄 PDF
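
RLVR trains a model against rewards that a program can check, rather than scores from a learned reward model. A minimal sketch of such a checker, assuming math-style tasks whose completions end with a LaTeX `\boxed{...}` answer (an assumed convention; the excerpt does not specify one). The names `extract_final_answer` and `verifiable_reward` are hypothetical, and this is not the paper's instruction-policy co-evolution method.

```python
import re
from typing import Optional

# Assumed answer format: the last \boxed{...} in the completion (no nested braces).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, or None if absent."""
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 on an exact match with the reference answer, else 0.0."""
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


print(verifiable_reward("3 * 4 = 12, so the answer is \\boxed{12}.", "12"))  # 1.0
print(verifiable_reward("I think it is \\boxed{13}.", "12"))  # 0.0
print(verifiable_reward("The answer is 12.", "12"))  # 0.0, no boxed answer
```

Exact matching keeps the reward cheap and hard to game, at the cost of rejecting correct answers written in a different form (for example `12.0`).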

LLM Daily – Does Self-Evaluation Enable Wireheading in Language Models?

2025-12-01 · 10 min · 12.9 MB

Excerpt — Self-evaluation is increasingly central to language model training, from constitutional AI to self-refinement. We investigate whether coupling self-evaluation to reward signals creates incentives for wireheading, where…

📝 Article 📄 PDF

LLM Daily – MCP vs RAG vs NLWeb vs HTML: A Comparison of the Effectiveness and Efficiency of

2025-12-01 · 10 min · 15.2 MB

Excerpt — Large language model agents are increasingly used to automate web tasks such as product search, offer comparison, and checkout. Current research explores different interfaces through which these agents interact with…

📝 Article 📄 PDF
