Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning • Paper • arXiv:2508.09726 • Published Aug 2025 • 13 upvotes
SONAR-LLM: Autoregressive Transformer that Thinks in Sentence Embeddings and Speaks in Tokens • Paper • arXiv:2508.05305 • Published Aug 2025 • 45 upvotes
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance • Paper • arXiv:2507.22448 • Published Jul 30, 2025 • 65 upvotes
Skywork-Reward-V2 • Collection • Scaling preference data curation to the extreme • 9 items • Updated Jul 4, 2025 • 23 upvotes
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models • Paper • arXiv:2506.06751 • Published Jun 7, 2025 • 72 upvotes
Exploring the Latent Capacity of LLMs for One-Step Text Generation • Paper • arXiv:2505.21189 • Published May 27, 2025 • 62 upvotes
Quartet: Native FP4 Training Can Be Optimal for Large Language Models • Paper • arXiv:2505.14669 • Published May 20, 2025 • 78 upvotes
Falcon-H1 • Collection • Falcon-H1 family of hybrid-head language models (Transformer-SSM), including 0.5B, 1.5B, 1.5B-Deep, 3B, 7B, and 34B (pretrained & instruction-tuned) • 38 items • Updated Jul 31, 2025 • 53 upvotes
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures • Paper • arXiv:2505.09343 • Published May 14, 2025 • 69 upvotes
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers • Paper • arXiv:2504.20752 • Published Apr 29, 2025 • 93 upvotes
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning • Paper • arXiv:2504.17192 • Published Apr 24, 2025 • 115 upvotes
RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts • Paper • arXiv:2504.06947 • Published Apr 9, 2025 • 4 upvotes
Gemma 3 QAT INT4 (from Flax) • Collection • Converted from the official QAT INT4 Flax checkpoints on Kaggle; supported formats: AutoAWQ, GGUF • 12 items • Updated Apr 6, 2025 • 6 upvotes
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback • Paper • arXiv:2406.09279 • Published Jun 13, 2024 • 3 upvotes
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs • Paper • arXiv:2503.01307 • Published Mar 3, 2025 • 39 upvotes
Slam • Collection • All resources for the SpeechLMs from "Slamming: Training a Speech Language Model on One GPU in a Day". We provide the tokenizer, LM, and datasets • 7 items • Updated May 22, 2025 • 13 upvotes