Yedidia AGNIMO
YedsonUQ
AI & ML interests
Uncertainty Quantification and "Hallucinations" in LLMs, Federated Learning
Organizations
None yet
Understanding LLM Representation
Test-Time Scaling (TTS)
Long-context
AI-Automated Scientific Research
SurveyX: Academic Survey Automation via Large Language Models
Paper • 2502.14776 • Published • 101
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Paper • 2408.06292 • Published • 127
Towards an AI co-scientist
Paper • 2502.18864 • Published • 52
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper • 2504.17192 • Published • 115
Distributed Training and Federated Learning
Findings
Large Language Models Think Too Fast To Explore Effectively
Paper • 2501.18009 • Published • 24
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123
Intuitive physics understanding emerges from self-supervised pretraining on natural videos
Paper • 2502.11831 • Published • 20
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity
Paper • 2502.13063 • Published • 73
Hallucination
Models Series
EuroBERT: Scaling Multilingual Encoders for European Languages
Paper • 2503.05500 • Published • 81
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 418
Qwen2.5-1M Technical Report
Paper • 2501.15383 • Published • 73
Baichuan-Omni-1.5 Technical Report
Paper • 2501.15368 • Published • 64
Reinforcement Learning (RL)
Uncertainty Quantification
Evolution and The Knightian Blindspot of Machine Learning
Paper • 2501.13075 • Published • 6
From Aleatoric to Epistemic: Exploring Uncertainty Quantification Techniques in Artificial Intelligence
Paper • 2501.03282 • Published
Efficient Test-Time Scaling via Self-Calibration
Paper • 2503.00031 • Published • 15
Investigating Human-Aligned Large Language Model Uncertainty
Paper • 2503.12528 • Published • 4
Hallucination Frameworks Ideas
Query decomposition, ambiguity,
Efficient Inference
Agents AI
Foundational Deep Learning - Architecture
Forgetting Transformer: Softmax Attention with a Forget Gate
Paper • 2503.02130 • Published • 32
L^2M: Mutual Information Scaling Law for Long-Context Language Modeling
Paper • 2503.04725 • Published • 21
Transformers without Normalization
Paper • 2503.10622 • Published • 169
I-Con: A Unifying Framework for Representation Learning
Paper • 2504.16929 • Published • 30
Benchmark and Evaluation
Humanity's Last Exam
Paper • 2501.14249 • Published • 76
Benchmarking LLMs for Political Science: A United Nations Perspective
Paper • 2502.14122 • Published • 2
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval
Paper • 2503.04644 • Published • 21
ExpertGenQA: Open-ended QA generation in Specialized Domains
Paper • 2503.02948 • Published
Explainable AI - Interpretable AI
Theory, Conceptualization, Paradigms
Learning Paradigm/Scheme
Reasoning - Chain-of-Thought
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 116
Reasoning Language Models: A Blueprint
Paper • 2501.11223 • Published • 34
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong
Paper • 2501.09775 • Published • 34
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41
Retrieval Augmented Generation (RAG)
Chain-of-Retrieval Augmented Generation
Paper • 2501.14342 • Published • 60
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Paper • 2502.01142 • Published • 24
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
Paper • 2501.18636 • Published • 31
UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities
Paper • 2504.20734 • Published • 63
Survey
Fine-Tuning, PEFT