HF Inference
All supported HF Inference models can be found here
HF Inference is the serverless Inference API powered by Hugging Face. Before the launch of Inference Providers, this service was called “Inference API (serverless)”. If you want to deploy models to dedicated, autoscaling infrastructure managed by Hugging Face, check out Inference Endpoints instead.
As of July 2025, hf-inference focuses mostly on CPU inference (e.g. embedding, text-ranking, text-classification, or smaller LLMs of historical importance, such as BERT or GPT-2).
Supported tasks
Automatic Speech Recognition
Find out more about Automatic Speech Recognition here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

output = client.automatic_speech_recognition("sample1.flac", model="openai/whisper-large-v3")
Chat Completion (LLM)
Find out more about Chat Completion (LLM) here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="HuggingFaceTB/SmolLM3-3B",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

print(completion.choices[0].message)
Feature Extraction
Find out more about Feature Extraction here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

result = client.feature_extraction(
    "Today is a sunny day and I will get some ice cream.",
    model="intfloat/multilingual-e5-large",
)
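Embeddings such as the one returned above are commonly compared with cosine similarity. The following is a minimal sketch, assuming each embedding is available as a flat sequence of floats; the helper name `cosine_similarity` and the toy vectors are illustrative and not part of `huggingface_hub`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (hypothetical values).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

With real embeddings, the same helper could be applied to two `feature_extraction` results once each has been flattened to a one-dimensional list; the exact shape of the returned array depends on the model, so check it before comparing.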
Fill Mask
Find out more about Fill Mask here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

# The input must contain the model's mask token ([MASK] for BERT models).
result = client.fill_mask(
    "The answer to the universe is [MASK].",
    model="google-bert/bert-base-uncased",
)
Image Classification
Find out more about Image Classification here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

output = client.image_classification("cats.jpg", model="Falconsai/nsfw_image_detection")
Image Segmentation
Find out more about Image Segmentation here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

output = client.image_segmentation("cats.jpg", model="jonathandinu/face-parsing")
Object Detection
Find out more about Object Detection here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

output = client.object_detection("cats.jpg", model="facebook/detr-resnet-50")
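Detection results usually include low-confidence boxes that are worth discarding before further use. The sketch below assumes each returned element exposes `label` and `score` attributes; the helper `filter_detections`, the 0.9 threshold, and the stand-in objects are hypothetical and only illustrate the post-processing step.

```python
from types import SimpleNamespace


def filter_detections(detections, min_score=0.9):
    """Keep detections scoring at least min_score, highest score first."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Stand-in objects with made-up values; real ones would come from
# client.object_detection(...).
fake_detections = [
    SimpleNamespace(label="cat", score=0.98),
    SimpleNamespace(label="remote", score=0.42),
    SimpleNamespace(label="couch", score=0.93),
]

confident = filter_detections(fake_detections, min_score=0.9)
labels = [d.label for d in confident]
```

A suitable threshold depends on the model and the images, so treat 0.9 as a starting point rather than a recommendation.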
Question Answering
Find out more about Question Answering here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

answer = client.question_answering(
    question="What is my name?",
    context="My name is Clara and I live in Berkeley.",
    model="deepset/roberta-base-squad2",
)
Summarization
Find out more about Summarization here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

result = client.summarization(
    "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.",
    model="facebook/bart-large-cnn",
)
Table Question Answering
Find out more about Table Question Answering here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

answer = client.table_question_answering(
    query="How many stars does the transformers repository have?",
    table={
        "Repository": ["Transformers", "Datasets", "Tokenizers"],
        "Stars": ["36542", "4512", "3934"],
        "Contributors": ["651", "77", "34"],
        "Programming language": ["Python", "Python", "Rust, Python and NodeJS"],
    },
    model="google/tapas-base-finetuned-wtq",
)
Text Classification
Find out more about Text Classification here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

result = client.text_classification(
    "I like you. I love you",
    model="tabularisai/multilingual-sentiment-analysis",
)
Text Generation
Find out more about Text Generation here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

# The chat completion API expects a list of role/content messages.
completion = client.chat.completions.create(
    model="HuggingFaceTB/SmolLM3-3B",
    messages=[
        {
            "role": "user",
            "content": "Can you please let us know more details about your ",
        }
    ],
)

print(completion.choices[0].message)
Text To Image
Find out more about Text To Image here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

# image is a PIL.Image object
image = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-dev",
)
Token Classification
Find out more about Token Classification here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

result = client.token_classification(
    "My name is Sarah Jessica Parker but you can call me Jessica",
    model="dslim/bert-base-NER",
)
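Named-entity results are often easier to work with once grouped by entity type. The sketch below assumes each returned element exposes `entity_group` and `word` attributes, which holds when the server aggregates sub-word tokens; the helper `group_entities` and the stand-in objects are hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities(entities):
    """Map each entity group (e.g. "PER") to the words tagged with it."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity.entity_group].append(entity.word)
    return dict(grouped)


# Stand-in objects with made-up values; real ones would come from
# client.token_classification(...).
fake_entities = [
    SimpleNamespace(entity_group="PER", word="Sarah Jessica Parker"),
    SimpleNamespace(entity_group="PER", word="Jessica"),
]

by_group = group_entities(fake_entities)
```

If the elements carry a per-token `entity` field instead of `entity_group`, the grouping key would need to change accordingly.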
Translation
Find out more about Translation here.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

# Input (Russian): "My name is Wolfgang and I live in Berlin"
result = client.translation(
    "Меня зовут Вольфганг и я живу в Берлине",
    model="google-t5/t5-small",
)
Zero Shot Classification
Find out more about Zero Shot Classification here.
import os
import requests

API_URL = "https://router.huggingface.co/hf-inference/models/facebook/bart-large-mnli"
headers = {
    "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!",
    "parameters": {"candidate_labels": ["refund", "legal", "faq"]},
})
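The request body above has two parts: the text under `inputs` and the label set under `parameters`. A small builder can validate these before anything is sent. This is a sketch only: the helper `build_zero_shot_payload` is hypothetical, and the `multi_label` flag is assumed to be accepted under `parameters` by the zero-shot classification task, so confirm it on the task page before relying on it.

```python
def build_zero_shot_payload(text, candidate_labels, multi_label=False):
    """Build a JSON-serializable body for a zero-shot classification request."""
    labels = list(candidate_labels)
    if not text:
        raise ValueError("text must be a non-empty string")
    if not labels:
        raise ValueError("at least one candidate label is required")
    return {
        "inputs": text,
        "parameters": {
            "candidate_labels": labels,
            # Assumed parameter: lets several labels be true at once.
            "multi_label": multi_label,
        },
    }


payload = build_zero_shot_payload(
    "I would like to get reimbursed!",
    ["refund", "legal", "faq"],
)
```

The resulting dictionary has the same shape as the one passed to `query` above, so it could be sent in its place.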