ATLAS-Teach-8B-Instruct

A supervised fine-tuned teaching model that forms the foundation for Reinforcement Collaborative Learning (RCL). This checkpoint represents the initial teaching capability before reinforcement learning optimization.

Model Details

Architecture

  • Base Model: Qwen/Qwen3-8B
  • Parameters: 8B
  • Context Length: 16,384 tokens
  • Training Stage: Supervised Fine-tuning (SFT)

Training Framework

  • Method: Reinforcement Collaborative Learning (RCL) - SFT phase
  • Hardware: 4x H100 GPUs
  • Optimization: DeepSpeed ZeRO-3
  • Precision: BF16

Dataset

Arc-Intelligence/Arc-ATLAS-Teach-v0

  • Custom dataset designed for adaptive teaching scenarios
  • Formatted with RCL-specific teaching protocols
  • Includes reasoning traces and solution demonstrations

Adaptive Teaching Approach

The model follows a structured teaching protocol:

Two-Pass System

  1. Student Diagnostic: Brief capability assessment (≤500 tokens)
  2. Adaptive Response: Tailored teaching based on diagnosed understanding level

Key Features

  • Asymmetric reward structure (2x penalty for performance degradation)
  • Efficiency-aware teaching generation
  • Solution tag enforcement (<solution></solution>)
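The asymmetric reward structure can be sketched as follows. Only the 2x degradation penalty comes from this card; the linear shaping around the student's before/after scores is an assumption for illustration.

```python
def teaching_reward(score_before: float, score_after: float,
                    penalty_factor: float = 2.0) -> float:
    """Sketch of an asymmetric teaching reward: improvement in the
    student's score is credited at 1x, while degradation is penalized
    at penalty_factor (2x per the card)."""
    delta = score_after - score_before
    return delta if delta >= 0 else penalty_factor * delta
```

The asymmetry biases the teacher toward conservative interventions: making a student worse costs twice as much as an equal-sized improvement earns.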

Usage

Basic Generation

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in BF16 to match the training precision
model = AutoModelForCausalLM.from_pretrained(
    "Arc-Intelligence/ATLAS-Teach-8B-Instruct",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")

# Example prompt following RCL format
prompt = """Question: {problem_text}

Briefly describe:
1. What type of problem this is
2. The key concepts or steps needed
3. Any potential challenges you see

Your initial approach:"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

Teaching Format

The model expects structured input for optimal teaching generation:

  • Problem statement with clear question
  • Optional student approach for adaptive guidance
  • Responses include <solution> tags for final answers

Training Configuration

Key hyperparameters from SFT phase:

  • Learning rate: 1e-5
  • Batch size: 1 per device
  • Mixed precision: BF16
  • Gradient accumulation: tuned for the 4-GPU setup
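The effective batch size follows from these settings, though the gradient accumulation step count is not stated on the card, so the value below is assumed purely for illustration:

```python
# Effective batch size = per-device batch x GPUs x accumulation steps.
per_device_batch = 1   # from the card
num_gpus = 4           # 4x H100, from the card
grad_accum_steps = 8   # ASSUMED; the card does not state this value

effective_batch = per_device_batch * num_gpus * grad_accum_steps
```

A per-device batch of 1 is typical when fitting an 8B model in BF16 with DeepSpeed ZeRO-3; accumulation then recovers a usable effective batch.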

Limitations

  • Pre-RL Checkpoint: This model has not undergone reinforcement learning optimization
  • Domain Scope: Primarily trained on mathematical and reasoning problems
  • Token Limits: Student diagnostic capped at 500 tokens for efficiency
  • Evaluation: Full benchmark results pending RL phase completion

Future Development

This SFT checkpoint serves as the foundation for:

  • Reinforcement learning with adaptive teaching rewards
  • Student model capability assessment integration
  • Multi-turn teaching dialogue optimization

License

Apache 2.0

Repository

Training code and implementation details: GitHub - RCL
