I recently added a recipe to ellora that improves the reasoning capabilities of Gemma-3-1B using self-supervised learning. The model now shows step-by-step thinking in <think> tags before answering.
Logic puzzle accuracy: 61% → 84%. 3 hours of training on a single GPU. 🧠
I used GRPO, where the model generates multiple responses per prompt and learns to prefer the ones with better reasoning. It works surprisingly well for making smaller models more transparent.
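As a rough illustration, here's a minimal sketch of this kind of setup using TRL's GRPOTrainer. The base model ID, the placeholder dataset, and the format-based reward are assumptions for illustration, not the exact ellora recipe:

```python
# Minimal GRPO sketch with TRL: the model samples several responses per
# prompt and is updated to prefer the higher-scoring ones within each group.
# Dataset and reward below are illustrative stand-ins, not the ellora recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    # Hypothetical reward: favor completions that expose their reasoning
    # inside <think>...</think> tags before the final answer.
    return [1.0 if "<think>" in c and "</think>" in c else 0.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder prompt dataset

config = GRPOConfig(
    output_dir="gemma3-1b-grpo",
    num_generations=8,          # responses sampled per prompt, scored as a group
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="google/gemma-3-1b-it",
    reward_funcs=format_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

In practice the reward would also score the quality of the reasoning itself, not just the presence of the tags.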
I'm excited to announce that I've just released the newest versions of my Kuvera models and the expanded Personal Finance Reasoning dataset on Hugging Face!
What's new: I've expanded the Personal Finance Reasoning dataset, which now includes 18.9k samples of real-world financial questions paired with detailed, empathetic answers. I also streamlined the generation pipeline with better psychological context and response validation.
I've also released new Kuvera models trained on this improved dataset:
- Kuvera-4B & 8B: upgraded non-reasoning models, fine-tuned to provide practical financial advice. I specifically trained the 8B model to better understand the user's emotional context.
- Kuvera-12B: a first experimental reasoning model focused on query resolution.
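If you want to try the release, here's a minimal sketch of loading one of the models and the dataset with transformers and datasets; the repo IDs below are hypothetical placeholders, so check the actual model and dataset cards for the exact names:

```python
# Minimal sketch of trying a Kuvera model and the Personal Finance Reasoning
# dataset. Repo IDs are placeholders, not confirmed Hugging Face names.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kuvera/Kuvera-8B"  # hypothetical repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Expanded Personal Finance Reasoning dataset (placeholder repo ID).
dataset = load_dataset("kuvera/personal-finance-reasoning", split="train")

messages = [{"role": "user", "content": "I feel overwhelmed by my credit card debt. Where do I start?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```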
As the sole person working on this project, I see this release as a noticeable step forward from my previous work, offering more powerful and nuanced tools for financial AI.
I am actively looking to collaborate with others who are passionate about analyzing and improving the quality of personal finance advice generated by large language models. If this sounds like you, please reach out!
P.S. The paper on the framework used to build these models, along with a detailed evaluation of the main 8B model's responses, will be released soon!
Time to look at some free, useful resources that can help you upgrade your knowledge of AI and machine learning! Today we offer you six must-read surveys that can serve as guides to the major fields and techniques:
1. Foundations of Large Language Models by Tong Xiao and Jingbo Zhu → https://arxiv.org/abs/2501.09223 Many recommend this 270-page book as a good resource for fundamental concepts such as pre-training, generative models, prompting, alignment, and inference.
2. A Survey on Post-training of Large Language Models → https://arxiv.org/abs/2503.06072 Read this to master policy optimization (RLHF, DPO, GRPO), supervised and parameter-efficient fine-tuning, reasoning, integration, and adaptation techniques.
3. Agentic Large Language Models, a survey by Leiden University → https://arxiv.org/abs/2503.23037 Surveys agentic LLMs across reasoning, tools, and multi-agent collaboration, highlighting their synergy. It also explores their promise, risks, and applications in medicine, finance, and science.
4. A Survey of Context Engineering for Large Language Models → https://arxiv.org/abs/2507.13334 Defines context engineering as systematic information design for LLMs beyond prompting, covering retrieval, processing, management, and architectures like RAG and multi-agent systems.
5. A Survey of Generative Categories and Techniques in Multimodal Large Language Models → https://arxiv.org/abs/2506.10016 Covers multimodal models, exploring six generative modalities, key techniques (SSL, RLHF, CoT), architectural trends, and challenges.
6. Large Language Models for Time Series Analysis: Techniques, Applications, and Challenges → https://arxiv.org/abs/2506.11040 Explains how LLMs transform time series analysis by enhancing pattern recognition and long-term dependency handling, and shows how to build such models.