---
title: README
emoji: π
colorFrom: red
colorTo: indigo
sdk: static
pinned: false
---
# UV Scripts
**Ready-to-run ML tools powered by UV - zero setup, maximum power**
Run state-of-the-art ML workflows with a single command. From OCR to classification, all scripts work instantly with `uv run`.
## What are UV scripts?
UV scripts are self-contained Python scripts that use [inline metadata](https://docs.astral.sh/uv/guides/scripts/) to declare their dependencies. Run `uv run script.py` and UV resolves those dependencies, installs them into an isolated environment, and executes the script.
Perfect for:
- **GPU workflows** on [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- **Local processing** on your machine
- **Reproducible pipelines** that work anywhere
## Quick Example
```bash
# Extract text from images with state-of-the-art OCR (no local GPU needed!)
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
your-images your-extracted-text
```
## Browse Scripts
| Script Collection                                                               | Description                                               | GPU Required |
| ------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------ |
| [ocr](https://huggingface.co/datasets/uv-scripts/ocr)                           | Extract text from images with VLMs (LaTeX, tables, forms) | ✅           |
| [classification](https://huggingface.co/datasets/uv-scripts/classification)     | Text classification with guaranteed valid outputs         | ✅           |
| [dataset-creation](https://huggingface.co/datasets/uv-scripts/dataset-creation) | Create datasets from PDFs and files                       | ❌           |
| [vllm](https://huggingface.co/datasets/uv-scripts/vllm)                         | High-performance inference with vLLM                      | ✅           |
| [synthetic-data](https://huggingface.co/datasets/uv-scripts/synthetic-data)     | Generate high-quality synthetic data with CoT reasoning   | ✅           |
| [deduplication](https://huggingface.co/datasets/uv-scripts/deduplication)       | Remove duplicates using semantic similarity               | ❌           |
| [openai-oss](https://huggingface.co/datasets/uv-scripts/openai-oss)             | Generate responses with visible reasoning traces          | ✅           |
## Why UV Scripts?
### Zero Setup
No manual virtual environments, no dependency conflicts, no installation steps. UV reads the dependencies declared in the script, builds an isolated environment for it, and runs it.
### GPU Optimized
Run on local GPUs or scale to the cloud with [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs). Same script, different compute.
## Featured Scripts
### OCR Any Document Dataset
Extract text from images with state-of-the-art accuracy:
```bash
# Handles LaTeX, tables, forms, handwriting
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
your-images extracted-text
```
### Deduplicate Datasets (CPU-Friendly!)
Remove duplicates using semantic similarity - no GPU needed:
```bash
# Fast semantic deduplication on CPU
uv run https://huggingface.co/datasets/uv-scripts/deduplication/raw/main/semantic-dedupe.py \
your-dataset text your-dataset-clean \
--method duplicates --threshold 0.9
```
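As a rough illustration of what a similarity threshold such as `0.9` means, the sketch below removes near-duplicates from a handful of toy vectors. It is not the code of `semantic-dedupe.py`: the hand-made 2-D vectors stand in for real model embeddings, and both the cosine measure and the greedy keep-first strategy are assumptions made for this example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedupe(embeddings: list[list[float]], threshold: float = 0.9) -> list[int]:
    """Return indices of rows to keep.

    A row is dropped when its similarity to any already-kept row
    reaches the threshold, so the first occurrence always survives.
    """
    kept: list[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy "embeddings": rows 0 and 1 point in almost the same direction.
toy = [
    [1.0, 0.0],
    [0.99, 0.05],
    [0.0, 1.0],
]
kept = dedupe(toy, threshold=0.9)
print(kept)  # [0, 2]
```

Lowering the threshold treats more loosely related rows as duplicates; raising it toward 1.0 removes only near-identical ones.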
### Generate Synthetic Training Data
Create high-quality synthetic data with chain-of-thought reasoning:
```bash
# Generate synthetic math problems with reasoning
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \
--seed-dataset math-examples --output-dataset synthetic-math \
--task-type reasoning --num-samples 1000
```
## Getting Started with HF Jobs
Run any UV script on GPU infrastructure:
```bash
hf jobs uv run --flavor l4x1 \
https://huggingface.co/datasets/uv-scripts/[collection]/raw/main/[script].py \
[args]
```
Choose your GPU flavor:
- `l4x1` - Good balance for most tasks
- `a10g-large` - More memory for larger models
- `a100-large` - Maximum performance
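The command template above can also be assembled programmatically, for example when launching several scripts in a row. The helper below is a small sketch that only builds the argument list; `build_job_command` and `BASE_URL` are hypothetical names introduced here, not part of the `hf` CLI or any library.

```python
import shlex

# URL prefix taken from the template above.
BASE_URL = "https://huggingface.co/datasets/uv-scripts"


def build_job_command(
    collection: str,
    script: str,
    args: list[str],
    flavor: str = "l4x1",
) -> list[str]:
    """Build the `hf jobs uv run` argument list for a script in a collection."""
    url = f"{BASE_URL}/{collection}/raw/main/{script}.py"
    return ["hf", "jobs", "uv", "run", "--flavor", flavor, url, *args]


cmd = build_job_command(
    "ocr",
    "nanonets-ocr",
    ["your-images", "your-extracted-text"],
)
# Print the shell-quoted command instead of running it.
print(shlex.join(cmd))
```

To actually submit the job, the list could be passed to `subprocess.run(cmd, check=True)`, assuming the `hf` CLI is installed and authenticated on the machine.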
## Learn More
- [UV Documentation](https://docs.astral.sh/uv/)
- [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- [Script Examples](https://github.com/astral-sh/uv/tree/main/scripts)
---
_UV Scripts is a community project showcasing the power of [UV](https://github.com/astral-sh/uv) for ML workflows._