---
title: README
emoji: 📚
colorFrom: red
colorTo: indigo
sdk: static
pinned: false
---

# UV Scripts

**Ready-to-run ML tools powered by UV - zero setup, maximum power**

Run state-of-the-art ML workflows with a single command. From OCR to classification, all scripts work instantly with `uv run`.

## What are UV scripts?

UV scripts are self-contained Python scripts that use [inline metadata](https://docs.astral.sh/uv/guides/scripts/) to declare their dependencies at the top of the file. Run `uv run script.py` and UV installs the declared dependencies automatically before executing the script.

Perfect for:

- 🚀 **GPU workflows** on [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- 💻 **Local processing** on your machine
- 🔄 **Reproducible pipelines** that work anywhere

## 🚀 Quick Example

```bash
# Extract text from images with state-of-the-art OCR (no local GPU needed!)
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
  your-images your-extracted-text
```

## 📚 Browse Scripts

| Script Collection                                                               | Description                                               | GPU Required |
| ------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------ |
| [ocr](https://huggingface.co/datasets/uv-scripts/ocr)                           | Extract text from images with VLMs (LaTeX, tables, forms) | ✅           |
| [classification](https://huggingface.co/datasets/uv-scripts/classification)     | Text classification with guaranteed valid outputs         | ✅           |
| [dataset-creation](https://huggingface.co/datasets/uv-scripts/dataset-creation) | Create datasets from PDFs and files                       | ❌           |
| [vllm](https://huggingface.co/datasets/uv-scripts/vllm)                         | High-performance inference with vLLM                      | ✅           |
| [synthetic-data](https://huggingface.co/datasets/uv-scripts/synthetic-data)     | Generate high-quality synthetic data with CoT reasoning   | ✅           |
| [deduplication](https://huggingface.co/datasets/uv-scripts/deduplication)       | Remove duplicates using semantic similarity               | ❌           |
| [openai-oss](https://huggingface.co/datasets/uv-scripts/openai-oss)             | Generate responses with visible reasoning traces          | ✅           |

## 🎯 Why UV Scripts?

### Zero Setup

No virtual environments to manage, no dependency conflicts, no installation steps. UV creates an isolated environment and installs the declared dependencies automatically when you run the script.

### GPU Optimized

Run on local GPUs or scale to the cloud with [HF Jobs](https://huggingface.co/docs/huggingface_hub/guides/jobs). Same script, different compute.

## 🌟 Featured Scripts

### OCR Any Document Dataset

Extract text from images with state-of-the-art accuracy:

```bash
# Handles LaTeX, tables, forms, handwriting
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
  your-images extracted-text
```

### Deduplicate Datasets (CPU-Friendly!)

Remove duplicates using semantic similarity - no GPU needed:

```bash
# Fast semantic deduplication on CPU
uv run https://huggingface.co/datasets/uv-scripts/deduplication/raw/main/semantic-dedupe.py \
  your-dataset text your-dataset-clean \
  --method duplicates --threshold 0.9
```
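
The `--threshold 0.9` flag points to a similarity cutoff. As a rough sketch of the idea, assuming cosine similarity over embedding vectors and a greedy keep-first policy (the actual `semantic-dedupe.py` script may work differently), a row is dropped when it is at least as similar as the threshold to a row already kept. The toy vectors and the function names below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(embeddings, threshold=0.9):
    """Return indices of rows to keep.

    A row is dropped when its cosine similarity to any already kept
    row is greater than or equal to `threshold`.
    """
    kept = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy 2-D "embeddings"; real ones would come from an embedding model.
toy_embeddings = [
    [1.0, 0.0],
    [0.99, 0.05],  # near-duplicate of row 0
    [0.0, 1.0],
]

kept_rows = deduplicate(toy_embeddings, threshold=0.9)
print(kept_rows)  # [0, 2]
```

Under this policy, raising the threshold keeps more rows and lowering it removes looser matches.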

### Generate Synthetic Training Data

Create high-quality synthetic data with chain-of-thought reasoning:

```bash
# Generate synthetic math problems with reasoning
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \
  --seed-dataset math-examples --output-dataset synthetic-math \
  --task-type reasoning --num-samples 1000
```

## 🚀 Getting Started with HF Jobs

Run any UV script on GPU infrastructure:

```bash
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/[collection]/raw/main/[script].py \
  [args]
```

Choose your GPU flavor:
- `l4x1` - Good balance for most tasks
- `a10g-large` - More memory for larger models
- `a100-large` - Maximum performance
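
If you launch jobs from Python rather than a shell, the template above can be filled in programmatically. This is a minimal sketch: `build_hf_jobs_command` is a hypothetical helper, and it only accepts the three flavors listed here, although other flavors may exist.

```python
import shlex

# Flavors named in this README; the helper rejects anything else.
FLAVORS = ("l4x1", "a10g-large", "a100-large")
BASE_URL = "https://huggingface.co/datasets/uv-scripts"


def build_hf_jobs_command(collection, script, args=(), flavor="l4x1"):
    """Build the `hf jobs uv run` argument list for a script in a collection."""
    if flavor not in FLAVORS:
        raise ValueError(f"Unknown flavor {flavor!r}; expected one of {FLAVORS}")
    url = f"{BASE_URL}/{collection}/raw/main/{script}"
    return ["hf", "jobs", "uv", "run", "--flavor", flavor, url, *args]


command = build_hf_jobs_command(
    "ocr",
    "nanonets-ocr.py",
    ["your-images", "your-extracted-text"],
)
print(shlex.join(command))
# To launch the job, pass the list to subprocess.run(command, check=True).
# This assumes the `hf` CLI is installed and authenticated.
```

Building the command as a list avoids shell quoting problems when arguments contain spaces.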

## 📖 Learn More

- [UV Documentation](https://docs.astral.sh/uv/)
- [HF Jobs Guide](https://huggingface.co/docs/huggingface_hub/guides/jobs)
- [Script Examples](https://github.com/astral-sh/uv/tree/main/scripts)

---

_UV Scripts is a community project showcasing the power of [UV](https://github.com/astral-sh/uv) for ML workflows._