
Ksenia Se

Kseniase

https://www.turingpost.com/

replied to their post 2 days ago

Adobe Firefly → https://www.adobe.com/products/firefly.html
Notably, it's trained on ethically sourced data with C2PA provenance. It integrates into Creative Cloud tools like Photoshop Generative Fill, Express, and Firefly Boards. Recent updates add partner AI models (like Google Imagen, OpenAI) and a new Firefly mobile app for iOS and Android. Price: $9.99-29.99/month
Runway Gen-4 (images and videos) → https://runwayml.com/research/introducing-runway-gen-4
A still-image base model tuned for stylistic control and consistency. Its References feature allows users to input up to 3 images, helping preserve visual identity across outputs. Now fully accessible via the Runway API. Price: $12-76/month
Ideogram 3.0 → https://ideogram.ai/features/3.0
The current leader for clean, controllable text in images with Style Reference and strong layout/typography. Great for posters, logos, marketing, etc. Price: ~$0.03-0.09 per output image
Leonardo Phoenix (Leonardo AI) → https://leonardo.ai/phoenix/
Leonardo’s first foundation model emphasizing prompt adherence + readable text. It offers Style Reference for visual control, and Character Reference for consistent characters across shots. Price: $10-48/month
Freepik Mystic → https://www.freepik.com/ai/mystic
Delivers Full‑HD photorealism including lifelike portraits and accurate in‑image text without requiring post-processing. Built in collaboration with Magnific AI, it's integrated into the Freepik AI Image Generator suite. Price: € 5-143.75/month
PixArt-Σ (open-source) → https://pixart-alpha.github.io/PixArt-sigma-project/
A DiT-based T2I model that directly generates up to 4K, showing strong prompt following with a compact footprint. It's a great OSS alternative for researchers/builders. Freely available

posted an update 2 days ago

11 Powerful Image Models

Everyone is buzzing around image generation this week, or more specifically, Google's Nano-Banana. So today we want to share a list of models that can be your great toolkit for image generation + editing + multi-turn refinement.

1. Gemini 2.5 Flash Image, or Nano-Banana →
https://deepmind.google/models/gemini/image/
Google’s newest image model with conversational editing, character consistency, and multi-image fusion. Available in AI Studio and the Gemini API. Price: $2.50 per 1M tokens

2. FLUX (Black Forest Labs) → https://bfl.ai/
A family of models known for rich detail, excellent prompt adherence, and fast iterative generation. Offered in several variants, from Pro to open-source, it's accessible via Hugging Face, Replicate, Azure AI Foundry, etc., and used as a base in many pipelines. Price: $0.025-0.08 per image

3. Midjourney v7 → https://www.midjourney.com/
Enhanced image fidelity, prompt comprehension, and anatomical coherence (hands, bodies, objects) + provides a smart lightbox editor. The Omni-reference tool improves character and object consistency in your images. It remains accessible via Discord with a supporting web interface. Price: $10-60/month

4. Stable Diffusion 3.5 (Stability AI) → https://stability.ai/stable-image
Open-weights line with improved text rendering, photorealism, and prompt adherence compared to earlier versions. It introduces technical innovations through its MMDiT architecture. Price: $0.025-0.065 per image

5. OpenAI GPT-Image-1 → https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1
It's the same multimodal model that powers ChatGPT's image capabilities, offering high-fidelity image generation, precise edits, including inpainting, and accurate text rendering. Available via the Images API. Price: $40 per 1M tokens

Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

posted an update 23 days ago

6 Must-read books about AI and Machine Learning:

Sharing some free, useful resources for you. In this collection, we’ve gathered the most recent books to give you up-to-date information on key fundamental topics. Hope this helps you master AI and machine learning:

1. Machine Learning Systems by Vijay Janapa Reddi → https://www.mlsysbook.ai/
Provides a framework for building effective ML solutions, covering data engineering, optimization, hardware-aware training, inference acceleration, architecture choice, and other key principles

2. Generative Diffusion Modeling: A Practical Handbook by Zihan Ding, Chi Jin → https://arxiv.org/abs/2412.17162
Offers a unified view of diffusion models: probabilistic, score-based, consistency, rectified flow, pre/post-training. It aligns notations with code to close the “paper-to-code” gap.

3. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges → https://arxiv.org/abs/2104.13478
Explores unified geometric principles for analyzing neural network architectures (CNNs, RNNs, GNNs, Transformers) and for guiding the design of future ones

4. Mathematical Foundations of Geometric Deep Learning by Haitz Saez de Ocariz Borde and Michael Bronstein → https://arxiv.org/abs/2508.02723
Dives into the key math concepts behind geometric deep learning: geometric and analytical structures, vector calculus, differential geometry, etc.

5. Interpretable Machine Learning by Christoph Molnar → https://github.com/christophM/interpretable-ml-book
Practical guide to simple, transparent models (e.g., decision trees) and model-agnostic methods like LIME, Shapley values, permutation importance, and accumulated local effects.

6. Understanding Deep Learning by Simon J.D. Prince → https://udlbook.github.io/udlbook/
Explores core deep learning concepts: models, training, evaluation, RL, and architectures for images, text, and graphs, while addressing open theoretical questions

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

2 replies

replied to their post 30 days ago

iVideoGPT → https://huggingface.co/papers/2405.15223
Unifies visual observations, actions, and rewards into a single token sequence, enabling scalable, interactive world modeling of high-dimensional environments
MaskGWM → https://huggingface.co/papers/2502.11663
It's used for autonomous driving. It improves long-horizon and multi-view prediction by combining video generation with MAE-style feature-level context learning. Its innovations include: scalable Diffusion Transformers, diffusion-aware mask tokens, and spatial-temporal masking.
World-model-augmented (WMA) web agent → https://huggingface.co/papers/2410.13232
This mix of a world model and LLM-based web agents enables agents to simulate future outcomes in natural language and avoid mistakes in long-horizon tasks. The world model's transition-focused abstraction allows for efficient policy improvement
Navigation World Models from Meta →
https://huggingface.co/papers/2412.03572
Allows agents to simulate and evaluate navigation trajectories before acting. Powered by a large Conditional Diffusion Transformer, NWM adapts to dynamic constraints and generalizes to unfamiliar environments with a single image
Cosmos World Foundation Models by NVIDIA →
https://huggingface.co/papers/2501.03575
Includes 3 model families: 1) Cosmos-Predict1 simulates how the visual world evolves over time, learning physical world dynamics from video clips; 2) Cosmos-Transfer1 lets users guide world generation with multiple spatial control signals: segmentation, depth, edge maps, blurred visual inputs, etc.; 3) Cosmos-Reason1 reasons about what is happening, what will happen next, and what actions are feasible.
DreamerV3, Google DeepMind → https://arxiv.org/abs/2301.04104
A single, general-purpose world model-based RL algorithm. It demonstrates robust, farsighted planning in complex environments without human data or reward shaping, and excels in tasks like collecting diamonds in Minecraft from scratch.
Genie 2, Google DeepMind →
https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/
Generates diverse training environments for embodied agents. From a single image prompt, it creates playable virtual worlds that are controllable via keyboard and mouse and usable by both humans and AI systems.

posted an update 30 days ago

12 Powerful World Models

World models are one of the most challenging areas in AI, pushing the boundaries of reasoning, perception, and planning. They're gen AI systems that help models and agents learn internal representations of real-world environments.

Today, we invite you to take a look at 12 standout examples:

1. WorldVLA → WorldVLA: Towards Autoregressive Action World Model (2506.21539)
This autoregressive world model integrates action prediction and visual world modeling in a single framework, allowing each to enhance the other. It introduces an attention masking strategy to reduce action prediction errors

2. SimuRA → https://arxiv.org/abs/2507.23773
A generalized agent architecture that uses a language-based world model to simulate and plan actions before execution, enabling more general and flexible reasoning

3. PAN (Physical, Agentic, and Nested) world models → Critiques of World Models (2507.05169)
Has a hybrid architecture that combines discrete concept-based reasoning (via LLMs) with continuous perceptual simulation (via diffusion models), enabling rich multi-level, multimodal understanding and prediction

4. MineWorld by Microsoft Research → MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft (2504.08388)
Enables real-time, interactive world modeling in Minecraft by combining visual and action tokenization within an autoregressive Transformer. It uses parallel decoding for fast scene generation (4–7 FPS)

5. WorldMem → WORLDMEM: Long-term Consistent World Simulation with Memory (2504.12369)
Uses a memory bank with attention over time-stamped frames and states to maintain long-term and 3D spatial consistency in scene generation. This lets it reconstruct past scenes and simulate dynamic world changes across large temporal gaps

Read further below ⬇️

If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

Plus explore this article for a comprehensive overview of the history and current evolution of world models: https://www.turingpost.com/p/topic-35-what-are-world-models

1 reply

replied to their post about 1 month ago

CISPO: Clipped Importance Sampling Policy Optimization →
https://huggingface.co/papers/2506.13585
This RL algorithm from the MiniMax-M1 project clips importance-sampling weights instead of per-token updates. This lets all tokens (even rare but crucial ones) contribute to learning, avoiding the token-level clipping. CISPO also avoids KL penalties and uses group relative advantage like GRPO.
PAPO: Perception-Aware Policy Optimization → https://huggingface.co/papers/2507.06448
Enhances RL in vision-language tasks by adding a KL-based perception loss to the GRPO objective for better visual alignment during training. It boosts accuracy by 4–8% and reduces perception errors by ~30%.
OPO: On-Policy RL with Optimal Baseline → https://huggingface.co/papers/2505.23585
A simplified RL algorithm from Microsoft that enforces strict on-policy training by using freshly sampled outputs from the current policy for every update, minimizing off-policy drift. It minimizes gradient variance, avoiding auxiliary models and regularization.
EXPO: Expressive Policy Optimization → https://huggingface.co/papers/2507.07986
Trains complex policies by pairing a large base model with a lightweight edit policy that suggests better actions, selecting the best of both without backpropagating through the base.
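To make the CISPO idea above more concrete, here is a minimal NumPy sketch, not the MiniMax-M1 implementation. It assumes GRPO-style group-normalized advantages and compares the weight each token's log-probability gradient receives under PPO-style clipping versus a clipped importance-sampling weight. The function names and the epsilon values are hypothetical, chosen only for illustration.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses (GRPO-style)."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def ppo_clip_coeff(ratio, adv, eps=0.2):
    """Weight on grad(log pi) under PPO-style clipping.

    When the clipped branch of min(r*A, clip(r)*A) is selected,
    the token contributes no gradient, so its weight is 0.
    """
    ratio = np.asarray(ratio, dtype=float)
    adv = np.asarray(adv, dtype=float)
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    unclipped_active = ratio * adv <= clipped * adv
    return np.where(unclipped_active, ratio * adv, 0.0)

def cispo_coeff(ratio, adv, eps_low=0.2, eps_high=0.2):
    """Weight on grad(log pi) when only the importance weight is clipped.

    The clipped ratio is treated as a constant (stop-gradient),
    so every token keeps a nonzero, bounded weight.
    """
    ratio = np.asarray(ratio, dtype=float)
    adv = np.asarray(adv, dtype=float)
    return np.clip(ratio, 1 - eps_low, 1 + eps_high) * adv

# Three tokens with the same positive advantage but different ratios.
ratios = np.array([0.5, 1.0, 2.0])
advs = np.array([1.0, 1.0, 1.0])

ppo_w = ppo_clip_coeff(ratios, advs)    # last token is dropped
cispo_w = cispo_coeff(ratios, advs)     # last token still contributes
group_adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In this toy case the token with ratio 2.0 gets zero weight under PPO-style clipping but a bounded weight of 1.2 under the clipped importance weight, which is the "all tokens contribute" behavior described above.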

posted an update about 1 month ago

9 new policy optimization techniques

Reinforcement Learning (RL) isn't stuck in the same old PPO loop - in the last two months alone, researchers have introduced a new wave of techniques, reshaping how we train and fine-tune LLMs, VLMs, and agents.

Here are 9 fresh policy optimization techniques worth knowing:

1. GSPO: Group Sequence Policy Optimization → Group Sequence Policy Optimization (2507.18071)
Shifts from token-level to sequence-level optimization, clipping, and rewarding to capture the full picture and increase stability compared to GRPO. GSPO-token variation also allows token-level fine-tuning.

2. LAPO: Length-Adaptive Policy Optimization → LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization (2507.15758)
A two-stage RL framework that trains models to adaptively control reasoning length by learning typical solution lengths for shorter and more efficient reasoning.

3. HBPO: Hierarchical Budget Policy Optimization → Hierarchical Budget Policy Optimization for Adaptive Reasoning (2507.15844)
This one trains models to adapt reasoning depth based on problem complexity. It divides training samples into subgroups with different token budgets, using budget-aware rewards to align reasoning effort with task difficulty.

4. SOPHIA: Semi-off-policy reinforcement learning → Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning (2507.16814)
Combines on-policy visual understanding from the Vision Language Models (VLMs) with off-policy reasoning from an LM, assigning outcome-based rewards and propagating visual rewards backward through the reasoning steps.

5. RePO: Replay-Enhanced Policy Optimization → RePO: Replay-Enhanced Policy Optimization (2506.09340)
Introduces a replay buffer into on-policy RL for LLMs, retrieving diverse off-policy samples for each prompt to broaden the training data per prompt

Read further below ⬇️
If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

posted an update about 1 month ago

6 Essential Reads on core AI/ML topics:

Time to look at some free useful resources that can help you upgrade your knowledge of AI and machine learning!
Today we offer you these 6 must-read surveys that can be your perfect guides to the major fields and techniques:

1. Foundations of Large Language Models by Tong Xiao and Jingbo Zhu → https://arxiv.org/abs/2501.09223
Many recommend this 270-page book as a good resource to focus on fundamental concepts, such as pre-training, generative models, prompting, alignment, and inference

2. Large Language Models Post-Training: Surveying Techniques from Alignment to Reasoning -> A Survey on Post-training of Large Language Models (2503.06072)
Read this to master policy optimization (RLHF, DPO, GRPO), supervised and parameter-efficient fine-tuning, reasoning, integration, and adaptation techniques

3. Agentic Large Language Models, a survey by Leiden University → https://arxiv.org/abs/2503.23037
Surveys agentic LLMs across reasoning, tools, and multi-agent collaboration, highlighting their synergy. It also explores their promise, risks and applications in medicine, finance, science.

4. A Survey of Context Engineering for Large Language Models → A Survey of Context Engineering for Large Language Models (2507.13334)
Defines Context Engineering as systematic info design for LLMs beyond prompting, covering retrieval, processing, management, and architectures like RAG and multi-agent systems

5. A Survey of Generative Categories and Techniques in Multimodal Large Language Models → https://arxiv.org/abs/2506.10016
Covers multimodal models, exploring six generative modalities, key techniques (SSL, RLHF, CoT), architectural trends, and challenges

6. Large Language Models for Time Series Analysis: Techniques, Applications, and Challenges → https://arxiv.org/abs/2506.11040
Explains how LLMs transform time series analysis by enhancing pattern recognition and long-term dependency handling + shows how to build them

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

replied to their post about 2 months ago

FreeLoRA → https://huggingface.co/papers/2507.01792
Enables training-free image generation with multiple subjects by fine-tuning each LoRA module on one subject. During inference, subject-aware activation applies modules only to their target tokens, ensuring clean, interference-free fusion.
LoRA-Augmented Generation (LAG) → https://huggingface.co/papers/2507.05346
Uses large collections of task-specific LoRA adapters without needing extra training or data. It selects and applies the most relevant adapters at each layer and token, excelling in knowledge-intensive tasks.
ARD-LoRA (Adaptive Rank Dynamic LoRA) → https://huggingface.co/papers/2506.18267
Adjusts the rank of LoRA adapters dynamically across transformer layers and heads by learning per-head scaling factors through a meta-objective. It balances performance and efficiency, using fewer parameters and reducing memory use.
WaRA → https://huggingface.co/papers/2506.24092
Designed for vision tasks, it uses wavelet transforms and decomposes weight updates into multiple resolutions, capturing both coarse and detailed patterns.
BayesLoRA → https://huggingface.co/papers/2506.22809
Adds uncertainty estimation to LoRA adapters using MC-Dropout, helping models gauge confidence in unfamiliar situations. It detects variance outside fine-tuned distributions, supporting more cautious and adaptive behavior of models.
Dual LoRA Learning (DLoRAL) → https://huggingface.co/papers/2506.15591
Trains two LoRA branches: C-LoRA captures temporal coherence from degraded input, while D-LoRA improves visual detail. It's used for video super-resolution that enhances both spatial detail and temporal consistency.
Safe Pruning LoRA (SPLoRA) → https://huggingface.co/papers/2506.18931
Improves the safety of LoRA-tuned LMs by selectively removing LoRA layers that reduce alignment, using a new E-DIEM metric to detect safety-related shifts without relying on data labels.
PLoP (Precise LoRA Placement) → https://huggingface.co/papers/2506.20629
A lightweight method that automatically selects optimal LoRA adapter placement during fine-tuning based on the model and task
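As an illustration of the MC-Dropout idea mentioned for BayesLoRA above, here is a small NumPy sketch under stated assumptions: dropout is applied on the low-rank path only, and the spread across several stochastic forward passes is read as an uncertainty signal. Where the paper actually places dropout is not stated in this post, so treat the placement, the names, and the sizes here as hypothetical.

```python
import numpy as np

def mc_dropout_lora(x, W, A, B, p=0.1, n_samples=200, seed=0):
    """Run several stochastic passes with dropout on the adapter path.

    Returns the per-output mean and variance across passes.
    """
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_samples):
        h = A @ x                          # low-rank projection, shape (r,)
        keep = rng.random(h.shape) >= p    # keep each unit with prob 1 - p
        h = h * keep / (1.0 - p)           # inverted dropout scaling
        outputs.append(W @ x + B @ h)      # frozen path + adapter path
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.var(axis=0)

rng = np.random.default_rng(1)
d_out, d_in, r = 8, 8, 4                   # hypothetical toy sizes
W = rng.normal(size=(d_out, d_in))         # frozen base weights
A = rng.normal(size=(r, d_in))             # adapter down-projection
B = rng.normal(size=(d_out, r))            # adapter up-projection
x = rng.normal(size=d_in)

mean_out, var_out = mc_dropout_lora(x, W, A, B, p=0.2)
_, var_no_dropout = mc_dropout_lora(x, W, A, B, p=0.0)
```

With `p=0.0` every pass is identical, so the variance collapses to zero; a larger variance with dropout enabled is the kind of signal a model could use to behave more cautiously.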

posted an update about 2 months ago

13 New types of LoRA

LoRA (Low-Rank Adaptation) is a popular lightweight method for fine-tuning AI models. Instead of updating the full model, it adds small trainable components, low-rank matrices, while keeping the original weights frozen. Only these adapters are trained.
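Here is a minimal NumPy sketch of that idea, assuming a single linear layer. The sizes, the `alpha` scaling value, and the function name are illustrative choices, not taken from any specific library.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4                   # hypothetical layer size and rank

W = rng.normal(size=(d_out, d_in))           # original weights, kept frozen
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero init
alpha = 8.0                                  # hypothetical scaling factor

def lora_forward(x, W, A, B, alpha, r):
    """Frozen path plus the scaled low-rank update B @ A."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x, W, A, B, alpha, r)

full_params = W.size                         # what full fine-tuning would train
lora_params = A.size + B.size                # what LoRA trains instead
```

Because `B` starts at zero, the adapted layer initially matches the frozen one, and in this toy setup the adapter trains 512 parameters instead of 4096.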

Recently, many interesting new LoRA variations came out, so it’s a great time to take a look at these 13 clever approaches:

1. T-LoRA → T-LoRA: Single Image Diffusion Model Customization Without Overfitting (2507.05964)
A timestep-dependent LoRA method for adapting diffusion models with a single image. It dynamically adjusts updates and uses orthogonal initialization to reduce overlap, achieving better fidelity–alignment balance than standard LoRA

2. SingLoRA → SingLoRA: Low Rank Adaptation Using a Single Matrix (2507.05566)
Simplifies LoRA by using only one small matrix instead of the usual two, and multiplying it by its own transpose (like A × Aᵀ). It uses half the parameters of LoRA and avoids scale mismatch between different matrices

3. LiON-LoRA → LiON-LoRA: Rethinking LoRA Fusion to Unify Controllable Spatial and Temporal Generation for Video Diffusion (2507.05678)
Improves control and precision in video diffusion models when training data is limited. It builds on LoRA, adding 3 key principles: linear scalability, orthogonality, and norm consistency. A controllable token and modified self-attention enable smooth adjustment of motion

4. LoRA-Mixer → LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing (2507.00029)
Combines LoRA and mixture-of-experts (MoE) to adapt LLMs for multiple tasks. It dynamically routes task-specific LoRA experts into linear projections of attention modules, supporting both joint training and frozen expert reuse

5. QR-LoRA → QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation (2507.04599)
Separates content and style when combining multiple LoRA adapters. It implements QR decomposition to structure parameter updates, where the orthogonal Q matrix reduces interference between features, and the R matrix captures specific transformations
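The single-matrix update described for SingLoRA can be sketched in a few lines of NumPy. This is only an illustration for a square weight matrix; the sizes and variable names are hypothetical, and the paper's scaling and initialization details are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                                     # hypothetical width and rank

W = rng.normal(size=(d, d))                      # frozen weights
A_single = rng.normal(scale=0.01, size=(d, r))   # the only trainable matrix

delta = A_single @ A_single.T                    # update A × Aᵀ, shape (d, d)
W_adapted = W + delta

singlora_params = A_single.size                  # one d × r matrix
standard_lora_params = 2 * d * r                 # two matrices in standard LoRA
```

The update is symmetric and has rank at most `r`, and it needs half the trainable parameters of a standard two-matrix adapter of the same rank.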

Read further in the comments 👇

If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

replied to their post about 2 months ago

AllVoiceLab MCP Server -> https://github.com/allvoicelab/AllVoiceLab-MCP
Enables AI agents to access advanced text-to-speech, voice conversion, and video translation APIs, powering use cases like global content localization, AI audiobooks, and voice-driven media production.
MCP Email Server -> https://github.com/Shy2593666979/mcp-server-email
For email functionality: write and send emails with multiple recipients, add and search files within specified directories.
Google Admin MCP Server -> https://github.com/securityfortech/google-admin-mcp
Manage Google Workspace users through the Admin Directory API (list, create, get info about users, etc.)
Android MCP Server -> https://github.com/minhalvp/android-mcp-server
Provides programmatic control over Android devices through ADB (Android Debug Bridge).
DeepView MCP -> https://github.com/ai-1st/deepview-mcp
Enables IDEs (Cursor, Windsurf, etc.) to analyze large codebases using Gemini's extensive context window.
Calculator MCP Server -> https://github.com/githejie/mcp-server-calculator
It may sound simple, but it's essential for precise numerical calculations within LLMs
MCP Aggregator -> https://github.com/nazar256/combine-mcp
Combines multiple MCP servers into a single interface for more convenient use
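For a feel of what a client sends to servers like these, here is a small sketch of an MCP tool call. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The tool name `calculate` and its `expression` argument are hypothetical, not taken from any server listed above.

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical tool name and argument schema, for illustration only.
request = make_tool_call(1, "calculate", {"expression": "12.5 * 8"})
wire = json.dumps(request)   # what would travel over stdio or HTTP
```

A real client would first list the server's tools to learn their actual names and argument schemas before building a call like this.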

posted an update about 2 months ago

13 Outstanding MCP Servers

MCP is redefining how AI assistants connect to the world of data and tools, so no wonder MCP servers are in high demand now. That’s why we’ve curated 13 cool MCP servers to upgrade your workflow:

1. Hugging Face Official MCP Server -> https://github.com/evalstate/hf-mcp-server
Provides access to and interaction with Hugging Face models, datasets, and Gradio Spaces for dynamic tool integration and configuration across environments.

2. Browser MCP -> https://browsermcp.io/
An MCP server + Chrome extension. It lets AI apps like VS Code, Claude, Cursor, and Windsurf automate your browser.

3. Bright Data MCP -> https://github.com/brightdata/brightdata-mcp
This one is for working with data in real-time: searching the web, navigating websites, taking action and retrieving data.

4. JSON MCP -> https://github.com/VadimNastoyashchy/json-mcp
Interact with JSON files: split, merge, find specific data, and validate content within them.

5. Octagon Deep Research MCP -> https://github.com/OctagonAI/octagon-deep-research-mcp
Allows for deep research via AI agents, integrating seamlessly with MCP clients like Claude Desktop and Cursor for powerful, unlimited research capabilities.

6. VLM Run MCP Server -> https://docs.vlm.run/mcp/introduction
Gives an agent the ability to see, understand, and process visual content.

Read further in the comments 👇

P.S.:
Our most read explanation of MCP on Hugging Face https://huggingface.co/blog/Kseniase/mcp

Our first list of 13 awesome MCP servers: https://huggingface.co/posts/Kseniase/204958200717570

If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

replied to their post 2 months ago

DeepResearcher -> https://github.com/GAIR-NLP/DeepResearcher
An RL framework for training deep research agents end-to-end in real-world environments with web search, exhibiting emergent behaviour like planning, multi-source validation, self-reflection, and honestly admitting when the agent doesn't know the answer
Search-R1 -> https://github.com/PeterGriffinJin/Search-R1
Features interleaved search access and an open-source RL training pipeline supporting various algorithms (PPO, GRPO, etc.), LLMs (LLaMA3, Qwen2.5, etc.), and search engines (online, local, retrievers)
ReCall -> https://github.com/Agent-RL/ReCall
Trains LLMs to reason with tools via RL, with no supervised tool-use data needed. It enables agentic tool use in the style of OpenAI o3 and supports synthetic data generation across diverse environments and multi-step tasks
OWL -> https://github.com/camel-ai/owl
A framework built on CAMEL-AI that enables dynamic multi-agent collaboration for task automation across diverse domains

Here's an awesome study exploring the entire roadmap of Deep Research assistants. Don't forget to check it out -> https://huggingface.co/papers/2506.18096

posted an update 2 months ago

10 Open-source Deep Research assistants

Deep Research agents are quickly becoming our daily co-workers — built for complex investigations, not just chat. With modular architecture, advanced tool use and real web access, they go far beyond typical AI. While big-name agents get the spotlight, we want to highlight some powerful recent open-source alternatives:

1. DeerFlow -> https://github.com/bytedance/deer-flow
A modular multi-agent system combining LMs and tools for automated research and code analysis. It links a coordinator, planner, team of specialized agents, and reporter, and converts reports to speech via Text-to-Speech (TTS)

2. Alita -> https://github.com/CharlesQ9/Alita
Uses a single problem-solving module for scalable reasoning through simplicity. It self-evolves by generating and reusing Model Context Protocols (MCPs) from open-source tools to build external capabilities for diverse tasks

3. WebThinker -> https://github.com/RUC-NLPIR/WebThinker
Lets reasoning models autonomously search the web and navigate pages. Deep Web Explorer allows interaction with links and follow-up searches. Through a Think-Search-and-Draft process, models generate and refine reports in real time. RL training with preference pairs improves the workflow

4. SimpleDeepSearcher -> https://github.com/RUCAIBox/SimpleDeepSearcher
A lightweight framework showing that supervised fine-tuning is a real alternative to complex RL, using simulated web interactions and multi-criteria curation to generate high-quality training data

5. AgenticSeek -> https://github.com/Fosowl/agenticSeek
A private, on-device assistant that picks the best agent expert for browsing, coding, or planning—no cloud needed. Includes voice input via speech-to-text

6. Suna -> https://github.com/kortix-ai/suna
Offers web browsing, file and doc handling, CLI execution, site deployment, and API/service integration—all in one assistant

Subscribe to the Turing Post: https://www.turingpost.com/subscribe
Read further ⬇️

2 replies

upvoted 2 articles 2 months ago

Article

Accidentally Building an AI Reasoning Research Ecosystem (Or: Can AI Stop Thinking?)

•

Jun 26

• 3

Article

What Coding Agent Wins?

and 1 other •

Jun 26

• 7

published an article 2 months ago

Article

What Coding Agent Wins?

and 1 other •

Jun 26

• 7

replied to their post 2 months ago

6. Constraint-Based Decoding -> https://huggingface.co/papers/2502.05111
Guide generation using hard constraints, like context-free grammar (CFG) rules. This keeps outputs aligned with task goals, especially in structured prediction or planning. Can be combined with symbolic solvers or logic-checking agents

7. Exploration Prompts (Explore-then-Pick) -> https://huggingface.co/papers/2506.09014
Generate multiple diverse responses via sampling, then use a learned Sample Set Aggregator (SSA), trained with reinforcement learning, to pick the best answer. Similar to “draft → verify” strategies, but the final selection is done via a trained model, not heuristics

8. Prompt Perturbation Sampling for Inference -> https://huggingface.co/papers/2502.11027
From a pool of diverse model responses sampled with prompt perturbation, distill only the most elegant, logically consistent outputs to improve metrics like Pass@10. This is a post-generation inference technique

9. Prompt Ordering via Embedding Clustering -> https://openreview.net/pdf?id=1Iu2Yte5N6
Shows that few-shot prompt permutations form clusters in the model’s embedding space, grouped mainly by the first demonstration, and uses this to design a cluster-based ordering method for generating strong in-context example sequences

10. Controlled Prompting Variations -> https://huggingface.co/papers/2504.02111
Controlled “bad” prompts (such as irrelevant info or misleading framing) expose fragilities in model reasoning, so use light adversarial prompting in evaluations to find breaking points. Also remove irrelevant info to reduce confusion and improve focus, standardize the format to minimize inconsistency and hallucination, and explicitly prompt for reasoning to boost accuracy and transparency
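
To make the constraint-based decoding idea above concrete, here is a minimal sketch, not taken from the linked paper. It assumes a toy three-token vocabulary, a hypothetical `toy_logits` function standing in for a model, and a hand-written balanced-parentheses rule in place of a full CFG. At each step, tokens that would break the rule are masked out before the best-scoring token is picked.

```python
import math

def allowed_tokens(prefix, max_len):
    """Return the tokens that keep `prefix` a valid start of a balanced string."""
    depth = prefix.count("(") - prefix.count(")")
    remaining = max_len - len(prefix)
    allowed = set()
    if remaining >= depth + 2:   # room to open one more and still close everything
        allowed.add("(")
    if depth > 0:                # there is an open parenthesis to close
        allowed.add(")")
    if depth == 0:               # the string is balanced, so stopping is legal
        allowed.add("<eos>")
    return allowed

def toy_logits(prefix):
    """Hypothetical stand-in for a model: it always prefers ')' regardless of context."""
    return {"(": 1.0, ")": 2.0, "<eos>": 0.5}

def constrained_greedy_decode(logits_fn, max_len=4):
    """Greedy decoding where disallowed tokens are masked to -inf at every step."""
    prefix = []
    while True:
        logits = logits_fn(prefix)
        allowed = allowed_tokens(prefix, max_len)
        masked = {tok: (score if tok in allowed else -math.inf)
                  for tok, score in logits.items()}
        best = max(masked, key=masked.get)
        if best == "<eos>":
            return "".join(prefix)
        prefix.append(best)

print(constrained_greedy_decode(toy_logits, max_len=4))  # prints ()()
```

Without the mask, this toy model would start with ")" and never produce a valid string. With it, every output is balanced by construction.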

posted an update 2 months ago

Post

5430

10 Techniques for Boosting LLM Reasoning in 2025

Everyone’s chasing top-tier reasoning, yet it's still the bottleneck in many real-world tasks. This week, let's spotlight some powerful techniques that have shown promise in helping LLMs achieve more consistent logic, planning, and depth:

1. Retrieval-Augmented CoT Chaining (RAG+CoT) -> CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models (2504.13534)
Combines Chain-of-Thought prompting with retrieval augmentation at intermediate steps. Relevant documents are fetched after each reasoning subgoal, updating context dynamically. Great for open-domain QA, math, logic and multi-hop fact-checking

2. Tool-use by example injection -> Self-Training Large Language Models for Tool-Use Without Demonstrations (2502.05867)
Injects few-shot tool interaction examples during training to implicitly teach calling patterns. Helps in plug-and-play tool use without training new architectures

3. Visual Scratchpads, or multimodal reasoning support -> Imagine while Reasoning in Space: Multimodal Visualization-of-Thought (2501.07542)
Using structured visual inputs or sketchable intermediate steps (diagrams, grids, trees) boosts performance in tasks like planning, geometry, and multi-agent simulation. In practice, GPT-4o, Claude, and Gemini show marked improvement with this approach

4. System 1 vs System 2 Prompt switching -> Adaptive Deep Reasoning: Triggering Deep Thinking When Needed (2505.20101)
Switching from a fast, intuitive response prompt to a slow, deliberate reasoning mode is among the most popular AI trends. E.g., models tend to respond more reliably when explicitly instructed to “think like a researcher.” This can also reduce hallucinations in open-ended generation and debate tasks

5. Adversarial Self-Chat Fine-Tuning -> Self-playing Adversarial Language Game Enhances LLM Reasoning (2404.10642)
Generate debates between model variants or model vs human, then fine-tune on the winner’s response. It helps models learn to better defend their reasoning. Used in Claude’s Constitutional AI and SPPO-style tuning
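
As an illustration of technique 1 (retrieval at each reasoning subgoal), here is a rough sketch rather than the CoT-RAG paper's implementation. The word-overlap `retrieve` function, the `answer_step` stub, and the sample corpus are all assumptions made for the example. A real system would use an LLM for the reasoning step and a proper retriever.

```python
def retrieve(query, corpus, k=1):
    """Rank documents by word overlap with the query (a stand-in for a real retriever)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer_step(subgoal, context):
    """Hypothetical stub for the LLM call that reasons over the current context."""
    return f"{subgoal} -> reasoned over {len(context)} document(s)"

def rag_cot(subgoals, corpus, k=1):
    """Fetch documents for each subgoal and grow the context before reasoning on it."""
    context, trace = [], []
    for subgoal in subgoals:
        for doc in retrieve(subgoal, corpus, k):
            if doc not in context:      # update the context dynamically, no duplicates
                context.append(doc)
        trace.append(answer_step(subgoal, context))
    return context, trace

corpus = [
    "Paris is the capital of France",
    "The Seine flows through Paris",
    "Berlin is the capital of Germany",
]
subgoals = ["capital of France", "river that flows through Paris"]
context, trace = rag_cot(subgoals, corpus)
for line in trace:
    print(line)
```

The point of the loop is that the second subgoal sees both its own retrieved document and whatever earlier steps pulled in, which is what makes multi-hop questions tractable.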

Read further below👇

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

2 replies

reacted to their post with 👍 3 months ago

Post

3579

11 Types of JEPA

Since Meta released the newest V-JEPA 2 this week, we thought it's a good time to revisit a few other interesting JEPA variants. JEPA, or Joint Embedding Predictive Architecture, is a self-supervised learning framework that predicts the latent representation of a missing part of the input.
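
The core idea can be sketched in a few lines. This is a toy illustration, not any of the models listed here: the encoders and predictor are small hand-written linear maps (real JEPA models use learned deep networks, and the target encoder is often a slowly updated copy of the context encoder), and all names and values are assumptions. The loss compares the predicted latent with the latent of the masked part, never the raw input itself.

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def jepa_loss(x, mask, enc_context, enc_target, predictor):
    """Mean squared error between predicted and actual latents of the masked part."""
    visible = [0.0 if m else xi for xi, m in zip(x, mask)]   # what the model sees
    hidden = [xi if m else 0.0 for xi, m in zip(x, mask)]    # the missing part
    z_context = matvec(enc_context, visible)
    z_target = matvec(enc_target, hidden)
    z_pred = matvec(predictor, z_context)    # prediction happens in latent space
    return sum((p - t) ** 2 for p, t in zip(z_pred, z_target)) / len(z_target)

x = [1.0, 2.0, 3.0, 4.0]
mask = [False, False, True, True]            # the second half of the input is masked
encoder = [[1, 0, 1, 0], [0, 1, 0, 1]]       # toy 4 -> 2 linear encoder
identity = [[1, 0], [0, 1]]                  # predictor that changes nothing
good_predictor = [[3, 0], [0, 2]]            # predictor that happens to fit this input

print(jepa_loss(x, mask, encoder, encoder, identity))        # prints 4.0
print(jepa_loss(x, mask, encoder, encoder, good_predictor))  # prints 0.0
```

Training would adjust the encoder and predictor weights to drive this loss down across many inputs and masks.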

Here are 11 JEPA types that you should know about:

1. V-JEPA 2 -> V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning (2506.09985)
Trained on 1M+ hours of internet videos and a little bit of robot interaction data, V-JEPA 2 can watch, understand, answer questions, and help robots plan and act in the physical world

2. Time-Series-JEPA (TS-JEPA) -> Time-Series JEPA for Predictive Remote Control under Capacity-Limited Networks (2406.04853)
It's a time-series predictive model that learns compact, meaningful representations. A self-supervised semantic actor then uses them to generate control commands without raw data

3. Denoising JEPA (D-JEPA) -> Denoising with a Joint-Embedding Predictive Architecture (2410.03755)
Combines JEPA with diffusion techniques. By treating JEPA as masked image modeling and next-token prediction, D-JEPA generates data auto-regressively, incorporating diffusion and flow-matching losses

4. CNN-JEPA -> CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture (2408.07514)
This SSL approach applies the JEPA idea to CNNs using a sparse encoder, depthwise separable convolutions, and improved masking. On ImageNet-100, CNN-JEPA outperforms I-JEPA with 73.3% accuracy

5. Stem-JEPA -> Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation (2408.02514)
Identifies instrument stems by mapping mixes and stems into a shared space using an encoder and predictor. It captures timbre, harmony, and rhythm for tasks like stem retrieval, alignment, and genre or key estimation

6. DMT-JEPA (Discriminative Masked Targets JEPA) -> DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture (2405.17995)
Improves discriminative power by generating masked targets from semantically similar neighboring patches and uses lightweight cross-attention for aggregation

Read further below👇

Also, subscribe to the Turing Post -> https://www.turingpost.com/subscribe

1 reply
