# ATLAS-Teach-8B-Instruct

A supervised fine-tuned (SFT) teaching model that forms the foundation for Reinforcement Collaborative Learning (RCL). This checkpoint represents the initial teaching capability before reinforcement learning optimization.
## Model Details

### Architecture

- Base Model: Qwen/Qwen3-8B
- Parameters: 8B
- Context Length: 16,384 tokens
- Training Stage: Supervised Fine-Tuning (SFT)
### Training Framework

- Method: Reinforcement Collaborative Learning (RCL), SFT phase
- Hardware: 4x H100 GPUs
- Optimization: DeepSpeed ZeRO-3
- Precision: BF16
### Dataset

Arc-Intelligence/Arc-ATLAS-Teach-v0

- Custom dataset designed for adaptive teaching scenarios
- Formatted with RCL-specific teaching protocols
- Includes reasoning traces and solution demonstrations
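To inspect the data locally, the dataset can be pulled with the `datasets` library. The split name used below is an assumption for illustration; check the dataset card for the actual splits and schema.

```python
from datasets import load_dataset

# Pull the ATLAS teaching dataset from the Hugging Face Hub.
# NOTE: the "train" split is an assumption; consult the dataset card for the real schema.
dataset = load_dataset("Arc-Intelligence/Arc-ATLAS-Teach-v0", split="train")

print(dataset)      # number of rows and column names
print(dataset[0])   # one example, including its reasoning trace and solution
```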
## Adaptive Teaching Approach

The model follows a structured teaching protocol:

### Two-Pass System

- Student Diagnostic: a brief capability assessment (≤500 tokens)
- Adaptive Response: teaching tailored to the diagnosed understanding level (a runnable two-pass sketch follows this list)
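A minimal, runnable sketch of the two passes. The prompt wording, the example problem, and the `generate` helper are illustrative assumptions, not the canonical RCL templates:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Arc-Intelligence/ATLAS-Teach-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

problem_text = "Solve for x: 2x + 3 = 11"  # illustrative problem


def generate(prompt: str, max_new_tokens: int) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens,
                             temperature=0.7, do_sample=True)
    # Drop the prompt tokens so only the newly generated text is returned.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)


# Pass 1: short diagnostic of the student's approach, kept within the 500-token budget.
student_approach = generate(
    f"Question: {problem_text}\nYour initial approach:", max_new_tokens=500
)

# Pass 2: teaching adapted to the diagnosed approach, ending in <solution> tags.
teaching = generate(
    f"Question: {problem_text}\nStudent approach: {student_approach}\n"
    "Provide guidance adapted to this approach and give the final answer "
    "inside <solution></solution> tags.",
    max_new_tokens=2048,
)
```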
### Key Features

- Asymmetric reward structure (2x penalty for performance degradation; a reward sketch follows this list)
- Efficiency-aware teaching generation
- Solution tag enforcement (`<solution></solution>`)
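A minimal sketch of how such an asymmetric reward might be computed, assuming the reward is based on the change in student performance with versus without teaching. The function name, score range, and scaling are assumptions, not the actual RCL reward definition:

```python
def asymmetric_teaching_reward(baseline_score: float, taught_score: float,
                               degradation_penalty: float = 2.0) -> float:
    """Reward improvement in student performance; penalize degradation twice as hard.

    baseline_score: student accuracy without the teacher's guidance (0..1)
    taught_score:   student accuracy after following the teacher's guidance (0..1)
    """
    delta = taught_score - baseline_score
    if delta >= 0:
        return delta
    return degradation_penalty * delta  # negative deltas are doubled


# Example: a +0.1 improvement scores ≈ +0.1, but a -0.1 degradation scores ≈ -0.2.
print(asymmetric_teaching_reward(0.6, 0.7))  # ≈  0.1
print(asymmetric_teaching_reward(0.6, 0.5))  # ≈ -0.2
```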
## Usage

### Basic Generation
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Arc-Intelligence/ATLAS-Teach-8B-Instruct")

# Example prompt following the RCL format.
# {problem_text} is a placeholder to be filled with the actual problem statement.
prompt = """Question: {problem_text}
Briefly describe:
1. What type of problem this is
2. The key concepts or steps needed
3. Any potential challenges you see
Your initial approach:"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.7,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
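Because final answers are wrapped in `<solution>` tags, they can be pulled out of the decoded text with a regular expression. This post-processing snippet is illustrative rather than part of an official API:

```python
import re

# Extract the final answer between <solution> ... </solution>, if present.
match = re.search(r"<solution>(.*?)</solution>", response, flags=re.DOTALL)
final_answer = match.group(1).strip() if match else None
print(final_answer)
```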
### Teaching Format

The model expects structured input for optimal teaching generation:

- A problem statement with a clear question
- An optional student approach for adaptive guidance
- Responses include `<solution>` tags for final answers (see the example prompt below)
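An illustrative way to assemble such an input with the optional student approach included. Field labels like "Student approach:" are assumptions based on the format above, not a verbatim RCL template:

```python
problem_text = "A train travels 120 km in 1.5 hours. What is its average speed?"
student_approach = "I think I should multiply 120 by 1.5."  # optional; may be omitted

prompt = (
    f"Question: {problem_text}\n"
    f"Student approach: {student_approach}\n"
    "Provide teaching adapted to this approach and give the final answer "
    "inside <solution></solution> tags."
)
```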
## Training Configuration

Key hyperparameters from the SFT phase:

- Learning rate: 1e-5
- Batch size: 1 per device
- Mixed precision: BF16
- Gradient accumulation: tuned for the 4x H100 setup
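As a rough orientation, the listed values map onto a Hugging Face `TrainingArguments` object roughly as below. The gradient-accumulation value, output path, and DeepSpeed config filename are placeholders; the authoritative settings live in the linked training repository:

```python
from transformers import TrainingArguments

# Rough mapping of the listed SFT hyperparameters onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="atlas-teach-8b-sft",     # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,       # per-device batch size of 1
    gradient_accumulation_steps=8,       # placeholder value for the 4x H100 setup
    bf16=True,                           # BF16 mixed precision
    deepspeed="ds_zero3_config.json",    # hypothetical ZeRO-3 config file
)
```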
## Limitations
- Pre-RL Checkpoint: This model has not undergone reinforcement learning optimization
- Domain Scope: Primarily trained on mathematical and reasoning problems
- Token Limits: Student diagnostic capped at 500 tokens for efficiency
- Evaluation: Full benchmark results pending RL phase completion
## Future Development
This SFT checkpoint serves as the foundation for:
- Reinforcement learning with adaptive teaching rewards
- Student model capability assessment integration
- Multi-turn teaching dialogue optimization
## License
Apache 2.0
## Repository
Training code and implementation details: GitHub - RCL