---
title: README
emoji: 🚀
colorFrom: pink
colorTo: red
sdk: static
pinned: true
---

FuriosaAI develops data center AI accelerators. Our RNGD (pronounced “Renegade”) accelerator, currently sampling, excels at high-performance inference for LLMs and agentic AI.

Get started fast with common inference tasks on RNGD using these pre-compiled popular Hugging Face models – no manual conversion or quantization needed. Requires Furiosa SDK 2025.2 or later on a server with an RNGD accelerator.

Need a model with custom configurations? Compile it yourself using our [Model Preparation Workflow](https://developer.furiosa.ai/latest/en/furiosa_llm/model-preparation-workflow.html) on Furiosa Docs. Visit [Supported Models](https://developer.furiosa.ai/latest/en/overview/supported_models.html) in the SDK documentation for more information, and learn more about RNGD at https://furiosa.ai/rngd.

## Pre-compiled models

| Pre-compiled Model | Description | Base Model | Supported Version |
| ------------------ | ----------- | ---------- | ----------------- |
| [furiosa-ai/bert-large-uncased-INT8-MLPerf](https://huggingface.co/furiosa-ai/bert-large-uncased-INT8-MLPerf) | INT8 quantized, optimized for MLPerf | [google-bert/bert-large-uncased](https://huggingface.co/google-bert/bert-large-uncased) | 2025.2 |
| [furiosa-ai/gpt-j-6b-FP8-MLPerf](https://huggingface.co/furiosa-ai/gpt-j-6b-FP8-MLPerf) | FP8 quantized, optimized for MLPerf | [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | 2025.2 |
| [furiosa-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Llama-8B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | >= 2025.3 |
| [furiosa-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/furiosa-ai/DeepSeek-R1-Distill-Llama-70B) | BF16 | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | >= 2025.3 |
| [furiosa-ai/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/furiosa-ai/EXAONE-3.5-7.8B-Instruct) | BF16 | [LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct) | >= 2025.2 |
| [furiosa-ai/EXAONE-3.5-32B-Instruct](https://huggingface.co/furiosa-ai/EXAONE-3.5-32B-Instruct) | BF16 | [LGAI-EXAONE/EXAONE-3.5-32B-Instruct](https://huggingface.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.1-8B-Instruct](https://huggingface.co/furiosa-ai/Llama-3.1-8B-Instruct) | BF16 | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.1-8B-Instruct-FP8](https://huggingface.co/furiosa-ai/Llama-3.1-8B-Instruct-FP8) | FP8 quantized | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | >= 2025.2 |
| [furiosa-ai/Llama-3.3-70B-Instruct](https://huggingface.co/furiosa-ai/Llama-3.3-70B-Instruct) | BF16 | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | >= 2025.3 |
| [furiosa-ai/Llama-3.3-70B-Instruct-INT8](https://huggingface.co/furiosa-ai/Llama-3.3-70B-Instruct-INT8) | INT8 weight quantization | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | >= 2025.3 |
| [furiosa-ai/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/furiosa-ai/Qwen2.5-Coder-32B-Instruct) | BF16 | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) | >= 2025.3 |

## Examples

First, install the prerequisites by following [Installing Furiosa-LLM](https://developer.furiosa.ai/latest/en/getting_started/furiosa_llm.html#installing-furiosa-llm). Then, run the following command to start the Furiosa-LLM server with the Llama-3.1-8B-Instruct-FP8 model:

```
furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct-FP8
```

For reasoning models like DeepSeek-R1-Distill-Llama-8B, you can enable reasoning mode with the appropriate reasoning parser:

```
furiosa-llm serve furiosa-ai/DeepSeek-R1-Distill-Llama-8B \
  --enable-reasoning --reasoning-parser deepseek_r1
```

Once your server has launched, you can query the model with input prompts:

```sh
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "EMPTY",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }' \
  | python -m json.tool
```

You can learn more about usage in [Quick Start with Furiosa-LLM](https://developer.furiosa.ai/latest/en/getting_started/furiosa_llm.html#installing-furiosa-llm).
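Because the server exposes a `/v1/chat/completions` endpoint as in the curl example above, any OpenAI-compatible client should also work. Below is a minimal sketch using the `openai` Python package, assuming the server from the earlier example is running locally on the default port 8000; the `api_key` value is a placeholder, since the local server requires no authentication.

```python
# Minimal sketch: query the Furiosa-LLM server through its
# OpenAI-compatible chat completions endpoint.
# Assumes the server is running on localhost:8000 (default);
# api_key is a placeholder because no auth is configured locally.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="EMPTY",  # matches the "model" field in the curl example
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```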