---
title: Spanish Embeddings Api
emoji: 🐨
colorFrom: green
colorTo: green
sdk: docker
pinned: false
---

# Multilingual & Legal Embeddings API

A high-performance FastAPI application providing access to **5 specialized embedding models** for Spanish, Catalan, English, and multilingual text. Each model has its own dedicated endpoint for optimal performance and clarity.

🌐 **Live API**: [https://aurasystems-spanish-embeddings-api.hf.space](https://aurasystems-spanish-embeddings-api.hf.space)  
📖 **Interactive Docs**: [https://aurasystems-spanish-embeddings-api.hf.space/docs](https://aurasystems-spanish-embeddings-api.hf.space/docs)

## 🚀 Quick Start

### Basic Usage
```bash
# Test jina-v3 endpoint (multilingual, loads at startup)
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/jina-v3" \
     -H "Content-Type: application/json" \
     -d '{"texts": ["Hello world", "Hola mundo"], "normalize": true}'

# Test Catalan RoBERTa endpoint
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/roberta-ca" \
     -H "Content-Type: application/json" \
     -d '{"texts": ["Bon dia", "Com estàs?"], "normalize": true}'
```

## 📚 Available Models & Endpoints

| Endpoint | Model | Languages | Dimensions | Max Tokens | Loading Strategy |
|----------|--------|-----------|------------|------------|------------------|
| `/embed/jina-v3` | jinaai/jina-embeddings-v3 | Multilingual (30+) | 1024 | 8192 | **Startup** |
| `/embed/roberta-ca` | projecte-aina/roberta-large-ca-v2 | Catalan | 1024 | 512 | On-demand |
| `/embed/jina` | jinaai/jina-embeddings-v2-base-es | Spanish, English | 768 | 8192 | On-demand |
| `/embed/robertalex` | PlanTL-GOB-ES/RoBERTalex | Spanish Legal | 768 | 512 | On-demand |
| `/embed/legal-bert` | nlpaueb/legal-bert-base-uncased | English Legal | 768 | 512 | On-demand |

### Model Recommendations

- **🌍 General multilingual**: Use `/embed/jina-v3` - Best overall performance
- **🇪🇸 Spanish general**: Use `/embed/jina` - Excellent for Spanish/English
- **🇪🇸 Spanish legal**: Use `/embed/robertalex` - Specialized for legal texts
- **🏴󠁧󠁢󠁣󠁡󠁴󠁿 Catalan**: Use `/embed/roberta-ca` - Best for Catalan text
- **🇬🇧 English legal**: Use `/embed/legal-bert` - Specialized for legal documents
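
The recommendations above can be expressed as a small client-side lookup. This is a minimal sketch: `pick_endpoint`, its `language` and `legal` parameters, and the two-letter language codes are hypothetical names chosen for illustration and are not part of the API. Anything not listed falls back to `jina-v3`.

```python
# Hypothetical helper: maps (language code, legal?) to the endpoints recommended above.
RECOMMENDED = {
    ("es", False): "jina",
    ("es", True): "robertalex",
    ("ca", False): "roberta-ca",
    ("en", True): "legal-bert",
}

def pick_endpoint(language: str, legal: bool = False) -> str:
    """Return the endpoint name for a language code, defaulting to jina-v3."""
    return RECOMMENDED.get((language.lower(), legal), "jina-v3")

print(pick_endpoint("ca"))        # roberta-ca
print(pick_endpoint("es", True))  # robertalex
print(pick_endpoint("fr"))        # jina-v3
```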

## 🔗 API Endpoints

### Model-Specific Embedding Endpoints

Each model has its own dedicated endpoint:

```
POST /embed/jina-v3      # Multilingual (startup model)
POST /embed/roberta-ca   # Catalan
POST /embed/jina         # Spanish/English
POST /embed/robertalex   # Spanish Legal
POST /embed/legal-bert   # English Legal
```

### Utility Endpoints

```
GET /         # API information
GET /health   # Health check and model status
GET /models   # List all models with specifications
```

## 📖 Usage Examples

### Python

```python
import requests

API_URL = "https://aurasystems-spanish-embeddings-api.hf.space"

# Example 1: Multilingual with Jina v3 (startup model - fastest)
response = requests.post(
    f"{API_URL}/embed/jina-v3",
    json={
        "texts": [
            "Hello world",      # English
            "Hola mundo",       # Spanish
            "Bonjour monde",    # French
            "こんにちは世界"     # Japanese
        ],
        "normalize": True
    }
)
result = response.json()
print(f"Jina v3: {result['dimensions']} dimensions")  # 1024

# Example 2: Catalan text with RoBERTa-ca
response = requests.post(
    f"{API_URL}/embed/roberta-ca",
    json={
        "texts": [
            "Bon dia, com estàs?",
            "Barcelona és una ciutat meravellosa",
            "M'agrada la cultura catalana"
        ],
        "normalize": True
    }
)
catalan_result = response.json()
print(f"Catalan: {catalan_result['dimensions']} dimensions")  # 1024

# Example 3: Spanish legal text with RoBERTalex
response = requests.post(
    f"{API_URL}/embed/robertalex",
    json={
        "texts": [
            "Artículo primero de la constitución",
            "El contrato será válido desde la fecha de firma",
            "La jurisprudencia establece que..."
        ],
        "normalize": True
    }
)
legal_result = response.json()
print(f"Spanish Legal: {legal_result['dimensions']} dimensions")  # 768

# Example 4: English legal text with Legal-BERT
response = requests.post(
    f"{API_URL}/embed/legal-bert",
    json={
        "texts": [
            "This agreement is legally binding",
            "The contract shall be governed by English law",
            "The party hereby agrees and covenants"
        ],
        "normalize": True
    }
)
english_legal_result = response.json()
print(f"English Legal: {english_legal_result['dimensions']} dimensions")  # 768

# Example 5: Spanish/English bilingual with Jina v2
response = requests.post(
    f"{API_URL}/embed/jina",
    json={
        "texts": [
            "Inteligencia artificial y machine learning",
            "Artificial intelligence and machine learning",
            "Procesamiento de lenguaje natural"
        ],
        "normalize": True
    }
)
bilingual_result = response.json()
print(f"Bilingual: {bilingual_result['dimensions']} dimensions")  # 768
```

### JavaScript/Node.js

```javascript
const API_URL = 'https://aurasystems-spanish-embeddings-api.hf.space';

// Function to get embeddings from specific endpoint
async function getEmbeddings(endpoint, texts) {
    const response = await fetch(`${API_URL}/embed/${endpoint}`, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
        },
        body: JSON.stringify({
            texts: texts,
            normalize: true
        })
    });
    
    if (!response.ok) {
        throw new Error(`Error: ${response.status}`);
    }
    
    return await response.json();
}

// Usage examples (top-level await requires an ES module, e.g. a .mjs file)
try {
    // Multilingual embeddings
    const multilingualResult = await getEmbeddings('jina-v3', [
        'Hello world',
        'Hola mundo',
        'Ciao mondo'
    ]);
    console.log('Multilingual dimensions:', multilingualResult.dimensions);
    
    // Catalan embeddings
    const catalanResult = await getEmbeddings('roberta-ca', [
        'Bon dia',
        'Com estàs?'
    ]);
    console.log('Catalan dimensions:', catalanResult.dimensions);
    
} catch (error) {
    console.error('Error:', error);
}
```

### cURL Examples

```bash
# Multilingual with Jina v3 (startup model)
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/jina-v3" \
     -H "Content-Type: application/json" \
     -d '{
       "texts": ["Hello", "Hola", "Bonjour"],
       "normalize": true
     }'

# Catalan with RoBERTa-ca
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/roberta-ca" \
     -H "Content-Type: application/json" \
     -d '{
       "texts": ["Bon dia", "Com estàs?"],
       "normalize": true
     }'

# Spanish legal with RoBERTalex
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/robertalex" \
     -H "Content-Type: application/json" \
     -d '{
       "texts": ["Artículo primero"],
       "normalize": true
     }'

# English legal with Legal-BERT
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/legal-bert" \
     -H "Content-Type: application/json" \
     -d '{
       "texts": ["This agreement is binding"],
       "normalize": true
     }'

# Spanish/English bilingual with Jina v2
curl -X POST "https://aurasystems-spanish-embeddings-api.hf.space/embed/jina" \
     -H "Content-Type: application/json" \
     -d '{
       "texts": ["Texto en español", "Text in English"],
       "normalize": true
     }'
```

## 📋 Request/Response Schema

### Request Body

```json
{
    "texts": ["text1", "text2", "..."],
    "normalize": true,
    "max_length": null
}
```

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `texts` | array[string] | ✅ Yes | - | 1-50 texts to embed |
| `normalize` | boolean | No | `true` | L2-normalize embeddings |
| `max_length` | integer/null | No | `null` | Max tokens (model-specific limits) |
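
Validating the request body on the client avoids a round trip for requests the server would reject. The sketch below builds the JSON body from the fields in the table; `build_payload` and `MAX_TEXTS` are hypothetical names, and treating a non-positive `max_length` as invalid is an assumption rather than documented server behavior.

```python
from typing import List, Optional

MAX_TEXTS = 50  # per-request limit from the table above

def build_payload(texts: List[str], normalize: bool = True,
                  max_length: Optional[int] = None) -> dict:
    """Validate inputs client-side and return the JSON request body."""
    if not 1 <= len(texts) <= MAX_TEXTS:
        raise ValueError(f"expected 1-{MAX_TEXTS} texts, got {len(texts)}")
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("every text must be a string")
    # Assumption: a non-positive token limit is never meaningful
    if max_length is not None and max_length <= 0:
        raise ValueError("max_length must be a positive integer or None")
    return {"texts": list(texts), "normalize": normalize, "max_length": max_length}

print(build_payload(["Hola mundo"]))
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.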

### Response Body

```json
{
    "embeddings": [[0.123, -0.456, ...], [0.789, -0.012, ...]],
    "model_used": "jina-v3",
    "dimensions": 1024,
    "num_texts": 2
}
```
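
A typed wrapper makes the response easier to work with and catches malformed payloads early. This is a sketch only: `EmbeddingResponse` and `parse_response` are hypothetical names, and the consistency checks assume that `num_texts` equals the number of vectors and `dimensions` equals the length of each vector, which the field names suggest but the schema does not state.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EmbeddingResponse:
    embeddings: List[List[float]]
    model_used: str
    dimensions: int
    num_texts: int

def parse_response(body: dict) -> EmbeddingResponse:
    """Build a typed object and check that the counts agree with the vectors."""
    parsed = EmbeddingResponse(
        embeddings=body["embeddings"],
        model_used=body["model_used"],
        dimensions=body["dimensions"],
        num_texts=body["num_texts"],
    )
    if len(parsed.embeddings) != parsed.num_texts:
        raise ValueError("num_texts does not match the number of embeddings")
    if any(len(vec) != parsed.dimensions for vec in parsed.embeddings):
        raise ValueError("dimensions does not match the vector length")
    return parsed

# Made-up 3-dimensional sample for illustration; real vectors have 768 or 1024 values.
sample = {"embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
          "model_used": "jina-v3", "dimensions": 3, "num_texts": 2}
print(parse_response(sample).model_used)  # jina-v3
```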

## ⚡ Performance & Limits

- **Maximum texts per request**: 50
- **Startup model**: `jina-v3` loads at startup (fastest response)
- **On-demand models**: Load on first request (~30-60s first time)
- **Typical response time**: 100-300ms after models are loaded
- **Memory optimization**: Automatic cleanup for large batches
- **CORS enabled**: Works from any domain
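
Because a request accepts at most 50 texts, larger collections have to be split across several requests. The sketch below shows one way to do that; `chunked`, `embed_all`, and the `embed_batch` callback are hypothetical names, and the callback stands in for whatever function sends a single request to one of the endpoints.

```python
from typing import Callable, Iterator, List

MAX_TEXTS = 50  # maximum texts per request, from the list above

def chunked(texts: List[str], size: int = MAX_TEXTS) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` texts."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]

def embed_all(texts: List[str],
              embed_batch: Callable[[List[str]], List[List[float]]]) -> List[List[float]]:
    """Embed any number of texts by calling `embed_batch` once per chunk, in order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts):
        vectors.extend(embed_batch(batch))
    return vectors

# 120 texts are split into requests of 50, 50, and 20
print([len(batch) for batch in chunked([f"text {i}" for i in range(120)])])  # [50, 50, 20]
```

In practice `embed_batch` could wrap `requests.post(...)` with a generous `timeout` (for example 120 seconds, an assumed value) so that the first call to an on-demand model has time to finish loading.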

## 🔧 Advanced Usage

### LangChain Integration

```python
from langchain_core.embeddings import Embeddings  # older releases: from langchain.embeddings.base import Embeddings
from typing import List
import requests

class MultilingualEmbeddings(Embeddings):
    """LangChain integration for multilingual embeddings"""
    
    def __init__(self, endpoint: str = "jina-v3"):
        """
        Initialize with specific endpoint
        
        Args:
            endpoint: One of "jina-v3", "roberta-ca", "jina", "robertalex", "legal-bert"
        """
        self.api_url = f"https://aurasystems-spanish-embeddings-api.hf.space/embed/{endpoint}"
        self.endpoint = endpoint
    
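    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        embeddings: List[List[float]] = []
        # The API accepts at most 50 texts per request, so send them in batches
        for start in range(0, len(texts), 50):
            response = requests.post(
                self.api_url,
                json={"texts": texts[start:start + 50], "normalize": True}
            )
            response.raise_for_status()
            embeddings.extend(response.json()["embeddings"])
        return embeddings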
    
    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]

# Usage examples
multilingual_embeddings = MultilingualEmbeddings("jina-v3")
catalan_embeddings = MultilingualEmbeddings("roberta-ca")
spanish_legal_embeddings = MultilingualEmbeddings("robertalex")
```

### Semantic Search

```python
import numpy as np
import requests
from typing import List, Tuple

def semantic_search(query: str, documents: List[str], endpoint: str = "jina-v3", top_k: int = 5) -> List[Tuple[int, float]]:
    """Semantic search using specific model endpoint"""
    
    # The query and documents share one request, so pass at most 49 documents
    response = requests.post(
        f"https://aurasystems-spanish-embeddings-api.hf.space/embed/{endpoint}",
        json={"texts": [query] + documents, "normalize": True}
    )
    response.raise_for_status()
    
    embeddings = np.array(response.json()["embeddings"])
    query_embedding = embeddings[0]
    doc_embeddings = embeddings[1:]
    
    # Calculate cosine similarities (already normalized)
    similarities = np.dot(doc_embeddings, query_embedding)
    top_indices = np.argsort(similarities)[::-1][:top_k]
    
    return [(int(idx), float(similarities[idx])) for idx in top_indices]

# Example: Multilingual search
documents = [
    "Python programming language",
    "Lenguaje de programación Python",
    "Llenguatge de programació Python",
    "Langage de programmation Python"
]

results = semantic_search("código en Python", documents, "jina-v3")
for idx, score in results:
    print(f"{score:.4f}: {documents[idx]}")
```
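
The search above uses a plain dot product because the embeddings are requested with `normalize: true`: for unit-length vectors, the dot product equals cosine similarity. The following standard-library check illustrates this with two made-up 2-dimensional vectors; the helper names are hypothetical.

```python
import math
from typing import List

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity for vectors of any length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
# Both values agree up to floating-point rounding (about 0.96)
print(cosine(a, b))
print(dot(l2_normalize(a), l2_normalize(b)))
```

When requesting embeddings with `normalize: false`, use a full cosine computation like `cosine` above instead of the bare dot product.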

## 🚨 Error Handling

### HTTP Status Codes

| Code | Description |
|------|-------------|
| 200 | Success |
| 400 | Bad Request (validation error) |
| 422 | Unprocessable Entity (schema error) |
| 500 | Internal Server Error (model loading failed) |
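
One way a client might act on these codes is sketched below. The policy is an assumption, not documented API behavior: 400 and 422 are treated as problems with the request that a retry will not fix, while 500 is treated as possibly transient. `STATUS_ACTIONS` and `action_for` are hypothetical names.

```python
# Assumed client-side policy for the status codes in the table above
STATUS_ACTIONS = {
    200: "ok",
    400: "fix_request",   # validation error: change the input before resending
    422: "fix_request",   # schema error: the JSON body has the wrong shape
    500: "retry_later",   # model loading failed: may succeed on a later attempt
}

def action_for(status_code: int) -> str:
    """Return a suggested client action; unlisted codes are surfaced to the caller."""
    return STATUS_ACTIONS.get(status_code, "raise")

print(action_for(422))  # fix_request
print(action_for(500))  # retry_later
```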

### Common Errors

```python
import requests

# Handle errors properly
try:
    response = requests.post(
        "https://aurasystems-spanish-embeddings-api.hf.space/embed/jina-v3",
        json={"texts": ["text"], "normalize": True}
    )
    response.raise_for_status()
    result = response.json()
except requests.exceptions.HTTPError as e:
    # Raised by raise_for_status() for 4xx/5xx responses
    print(f"HTTP error: {e}")
    print(f"Response: {e.response.text}")
except requests.exceptions.RequestException as e:
    # Connection failures, timeouts, and other transport errors
    print(f"Request error: {e}")

## 📊 Model Status Check

```python
import requests

# Check which models are loaded
health = requests.get("https://aurasystems-spanish-embeddings-api.hf.space/health")
status = health.json()

print(f"API Status: {status['status']}")
print(f"Startup model loaded: {status['startup_model_loaded']}")
print(f"Available models: {status['available_models']}")
print(f"Models loaded: {status['models_count']}/5")

# Check endpoint status
for model, endpoint_status in status['endpoints'].items():
    print(f"{model}: {endpoint_status}")
```
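
The `startup_model_loaded` field can also be polled before sending the first embedding request, for example right after the Space restarts. This is a minimal sketch: `wait_for_startup_model`, the `fetch_health` callback, and the default attempt count and delay are all assumptions chosen for illustration.

```python
import time
from typing import Callable

def wait_for_startup_model(fetch_health: Callable[[], dict],
                           attempts: int = 10, delay: float = 5.0,
                           sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the health payload until `startup_model_loaded` is truthy."""
    for attempt in range(attempts):
        if fetch_health().get("startup_model_loaded"):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next poll, but not after the last one
    return False
```

Here `fetch_health` could be something like `lambda: requests.get(f"{API_URL}/health", timeout=10).json()`; passing it in as a callback keeps the polling logic independent of the HTTP client.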

## 🔒 Authentication & Rate Limits

- **Authentication**: None required (open API)
- **Rate limits**: Generous limits on Hugging Face Spaces
- **CORS**: Enabled for all origins
- **Usage**: Free for research and commercial use

## 🏗️ Architecture

### Endpoint-Per-Model Design
- **Startup model**: `jina-v3` loads at application startup for fastest response
- **On-demand loading**: Other models load when first requested
- **Progressive loading**: Loading only `jina-v3` at startup shortens startup time and keeps initial memory use low
- **Model caching**: Once loaded, models remain in memory for fast inference
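
The loading strategy described above can be illustrated with a small registry that loads one model eagerly and the rest on first use. This is a sketch of the general pattern, not the Space's actual code; `LazyModelRegistry` and the loader callables are hypothetical, and plain strings stand in for real models.

```python
import threading
from typing import Any, Callable, Dict, List

class LazyModelRegistry:
    """Loads one model eagerly and the rest on first use, then keeps them cached."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]], startup: str):
        self._loaders = loaders
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()
        self.get(startup)  # eager load, mirroring the startup model

    def get(self, name: str) -> Any:
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name}")
        with self._lock:  # serialize loading so each model is built only once
            if name not in self._models:
                self._models[name] = self._loaders[name]()
            return self._models[name]

    def loaded(self) -> List[str]:
        return sorted(self._models)

# Stand-in loaders; a real service would load the model weights here
registry = LazyModelRegistry(
    {"jina-v3": lambda: "jina-v3 model", "roberta-ca": lambda: "roberta-ca model"},
    startup="jina-v3",
)
print(registry.loaded())       # ['jina-v3']
registry.get("roberta-ca")     # loaded on first request
print(registry.loaded())       # ['jina-v3', 'roberta-ca']
```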

### Technical Stack
- **FastAPI**: Modern async web framework
- **Transformers**: Hugging Face model library
- **PyTorch**: Deep learning backend
- **Docker**: Containerized deployment
- **Hugging Face Spaces**: Cloud hosting platform

## 📄 Model Licenses

- **Jina models**: Apache 2.0
- **RoBERTa models**: MIT/Apache 2.0
- **Legal-BERT**: Apache 2.0

## 🤝 Support & Contributing

- **Issues**: [Hugging Face Discussions](https://huggingface.co/spaces/AuraSystems/spanish-embeddings-api/discussions)
- **Interactive Docs**: [FastAPI Swagger UI](https://aurasystems-spanish-embeddings-api.hf.space/docs)
- **Model Papers**: Check individual model pages on Hugging Face

---

Built with ❤️ using **FastAPI** and **Hugging Face Transformers**