Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search Paper • 2508.15884 • Published 12 days ago
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published 13 days ago
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published 9 days ago
Provable Benefits of In-Tool Learning for Large Language Models Paper • 2508.20755 • Published 5 days ago
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper • 2508.20453 • Published 5 days ago
MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark Paper • 2508.07575 • Published 22 days ago
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools? Paper • 2508.01780 • Published 30 days ago
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published 27 days ago
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published 13 days ago
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published 17 days ago
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published 13 days ago
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries Paper • 2508.15760 • Published 12 days ago
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published 11 days ago
Autonomous Evaluation and Refinement of Digital Agents Paper • 2404.06474 • Published Apr 9, 2024
Why Cannot Large Language Models Ever Make True Correct Reasoning? Paper • 2508.10265 • Published 19 days ago