# ATLAS-Teach-8B-Instruct

A supervised fine-tuned (SFT) teaching model that forms the foundation for Reinforcement Collaborative Learning (RCL). This checkpoint represents the initial teaching capability before reinforcement learning optimization.
## Model Details

### Architecture

- Base Model: Qwen/Qwen3-8B
- Parameters: 8B
- Context Length: 16,384 tokens
- Training Stage: Supervised Fine-Tuning (SFT)
### Training Framework

- Method: Reinforcement Collaborative Learning (RCL), SFT phase
- Hardware: 4x H100 GPUs
- Optimization: DeepSpeed ZeRO-3
- Precision: BF16
### Dataset

Arc-Intelligence/Arc-ATLAS-Teach-v0

- Custom dataset designed for adaptive teaching scenarios
- Formatted with RCL-specific teaching protocols
- Includes reasoning traces and solution demonstrations
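To inspect the data locally, the dataset can be pulled with the `datasets` library. The split name used below is an assumption for illustration; check the dataset card for the actual splits and schema.

```python
from datasets import load_dataset

# Pull the ATLAS teaching dataset from the Hugging Face Hub.
# NOTE: the "train" split is an assumption; consult the dataset card for the real schema.
dataset = load_dataset("Arc-Intelligence/Arc-ATLAS-Teach-v0", split="train")

print(dataset)      # number of rows and column names
print(dataset[0])   # one example, including its reasoning trace and solution
```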
## Adaptive Teaching Approach

The model follows a structured teaching protocol:

### Two-Pass System

- Student Diagnostic: a brief capability assessment (≤500 tokens)
- Adaptive Response: teaching tailored to the diagnosed understanding level (a runnable two-pass sketch follows this list)
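A minimal, runnable sketch of the two passes. The prompt wording, the example problem, and the `generate` helper are illustrative assumptions, not the canonical RCL templates:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Arc-Intelligence/ATLAS-Teach-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

problem_text = "Solve for x: 2x + 3 = 11"  # illustrative problem


def generate(prompt: str, max_new_tokens: int) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             temperature=0.7, do_sample=True)
    # Drop the prompt tokens so only the newly generated text is returned.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)


# Pass 1: short diagnostic of the student's approach, kept within the 500-token budget.
student_approach = generate(
    f"Question: {problem_text}\nYour initial approach:", max_new_tokens=500
)

# Pass 2: teaching adapted to the diagnosed approach, ending in <solution> tags.
teaching = generate(
    f"Question: {problem_text}\nStudent approach: {student_approach}\n"
    "Provide guidance adapted to this approach and give the final answer "
    "inside <solution></solution> tags.",
    max_new_tokens=2048,
)
```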
### Key Features

- Asymmetric reward structure (2x penalty for performance degradation; a reward sketch follows this list)
- Efficiency-aware teaching generation
- Solution tag enforcement (`<solution></solution>`)
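A minimal sketch of how such an asymmetric reward might be computed, assuming the reward is based on the change in student performance with versus without teaching. The function name, score range, and scaling are assumptions, not the actual RCL reward definition:

```python
def asymmetric_teaching_reward(baseline_score: float, taught_score: float,
                               degradation_penalty: float = 2.0) -> float:
    """Reward improvement in student performance; penalize degradation twice as hard.

    baseline_score: student accuracy without the teacher's guidance (0..1)
    taught_score:   student accuracy after following the teacher's guidance (0..1)
    """
    delta = taught_score - baseline_score
    if delta >= 0:
        return delta
    return degradation_penalty * delta  # negative deltas are doubled


# Example: a +0.1 improvement scores ≈ +0.1, but a -0.1 degradation scores ≈ -0.2.
print(asymmetric_teaching_reward(0.6, 0.7))  # ≈  0.1
print(asymmetric_teaching_reward(0.6, 0.5))  # ≈ -0.2
```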
## Usage

### Basic Generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")

# Example prompt following the RCL format.
# {problem_text} is a placeholder to be filled with the actual problem statement.
prompt = """Question: {problem_text}
Briefly describe:
1. What type of problem this is
2. The key concepts or steps needed
3. Any potential challenges you see
Your initial approach:"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
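Because final answers are wrapped in `<solution>` tags, they can be pulled out of the decoded text with a regular expression. This post-processing snippet is illustrative rather than part of an official API:

```python
import re

# Extract the final answer between <solution> ... </solution>, if present.
match = re.search(r"<solution>(.*?)</solution>", response, flags=re.DOTALL)
final_answer = match.group(1).strip() if match else None
print(final_answer)
```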
### Teaching Format

The model expects structured input for optimal teaching generation:

- A problem statement with a clear question
- An optional student approach for adaptive guidance
- Responses include `<solution>` tags for final answers (see the example prompt below)
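An illustrative way to assemble such an input with the optional student approach included. Field labels like "Student approach:" are assumptions based on the format above, not a verbatim RCL template:

```python
problem_text = "A train travels 120 km in 1.5 hours. What is its average speed?"
student_approach = "I think I should multiply 120 by 1.5."  # optional; may be omitted

prompt = (
    f"Question: {problem_text}\n"
    f"Student approach: {student_approach}\n"
    "Provide teaching adapted to this approach and give the final answer "
    "inside <solution></solution> tags."
)
```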
## Training Configuration

Key hyperparameters from the SFT phase:

- Learning rate: 1e-5
- Batch size: 1 per device
- Mixed precision: BF16
- Gradient accumulation: tuned for the 4x H100 setup
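As a rough orientation, the listed values map onto a Hugging Face `TrainingArguments` object roughly as below. The gradient-accumulation value, output path, and DeepSpeed config filename are placeholders; the authoritative settings live in the linked training repository:

```python
from transformers import TrainingArguments

# Rough mapping of the listed SFT hyperparameters onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="atlas-teach-8b-sft",     # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,       # per-device batch size of 1
    gradient_accumulation_steps=8,       # placeholder value for the 4x H100 setup
    bf16=True,                           # BF16 mixed precision
    deepspeed="ds_zero3_config.json",    # hypothetical ZeRO-3 config file
)
```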
## Limitations
- Pre-RL Checkpoint: This model has not undergone reinforcement learning optimization
- Domain Scope: Primarily trained on mathematical and reasoning problems
- Token Limits: Student diagnostic capped at 500 tokens for efficiency
- Evaluation: Full benchmark results pending RL phase completion
## Future Development
This SFT checkpoint serves as the foundation for:
- Reinforcement learning with adaptive teaching rewards
- Student model capability assessment integration
- Multi-turn teaching dialogue optimization
## License
Apache 2.0
## Repository
Training code and implementation details: GitHub - RCL