rbiswasfc's Collections

Synthetic Data Generation

Updated Mar 11, 2024

A curated list of papers focusing on synthetic data generation

  • Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

    Paper • 2402.13064 • Published Feb 20, 2024 • 50

  • Textbooks Are All You Need II: phi-1.5 technical report

    Paper • 2309.05463 • Published Sep 11, 2023 • 87

  • DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

    Paper • 2402.10379 • Published Feb 16, 2024 • 32

  • Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

    Paper • 2312.06585 • Published Dec 11, 2023 • 29

  • HuggingFaceTB/cosmopedia

    Viewer • Updated Aug 12, 2024 • 31.1M • 6.09k • 632

  • OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

    Paper • 2402.10176 • Published Feb 15, 2024 • 38

  • Efficient Exploration for LLMs

    Paper • 2402.00396 • Published Feb 1, 2024 • 23

  • Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

    Paper • 2401.16380 • Published Jan 29, 2024 • 51

  • Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI

    Paper • 2401.14019 • Published Jan 25, 2024 • 24