Explanation of files made available

#15 opened by Alphag0

Could anyone explain what the different safetensors and other model files in the repo are for, please? There are some in the root directory of this repo, some in the original/ folder, and a model.bin in the metal/ folder. It's a similar situation for the 20b repo as well.

What is each of these for, and which files are the actual model? Many thanks!

[EDIT] I'm aware of how to run a model using the safetensors; I'm just unsure which ones to actually use.

The .safetensors files are split into multiple parts mainly for practical reasons: some file systems like FAT32 have a 4 GB file size limit, and even on modern systems, dealing with a single 60+ GB file isn't ideal. It also allows inference engines (like vLLM, Transformers, etc.) to load the shards in parallel, which speeds up model startup. The mapping between model weights and the corresponding shard files is defined in model.safetensors.index.json.
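You can inspect that mapping yourself. A minimal Python sketch, assuming you've downloaded the repo locally (the gpt-oss-120b/ path is illustrative):

```python
import json

# model.safetensors.index.json contains a "weight_map" dict that maps each
# tensor name to the shard file holding it (standard Hugging Face
# sharded-checkpoint format).
with open("gpt-oss-120b/model.safetensors.index.json") as f:
    index = json.load(f)

# Print the first few tensor -> shard assignments.
for name, shard in list(index["weight_map"].items())[:5]:
    print(f"{name} -> {shard}")
```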

The model.bin in the metal/ folder is a precompiled version of the model designed to run with Metal on Macs (M1/M2/M3). It includes both the weights and the inference graph, so there’s no need to rebuild the model dynamically.

Ah, that makes sense with regards to the metal/ folder. What's the reason for having a set of 7 safetensors in the original/ folder and then a set of 14 in the repo root directory?

The root directory contains a version of the model that is ready for immediate use with inference frameworks such as vLLM, Transformers, or the gpt_oss chat interface. This version is already formatted for those systems and does not require manual conversion.
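For instance, a minimal Transformers sketch that loads from the root-level safetensors (openai/gpt-oss-120b is the Hub repo id; a local path to the repo root also works, and device_map="auto" assumes accelerate is installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-120b"  # or a local path to the repo root
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # spread the model across available devices
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that Transformers resolves the individual shards automatically via the index file, so you never need to point at a specific .safetensors part.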

In contrast, the original/ subdirectory holds the reference checkpoint, as published by OpenAI on the Hugging Face Hub, typically corresponding to a standard PyTorch export (e.g., via transformers’ save_pretrained method). This version is required for the lower-level torch and triton backends provided in the official implementation.

As clarified in the OpenAI GitHub repository:

The torch and triton implementation requires original checkpoint under gpt-oss-120b/original/ and gpt-oss-20b/original/ respectively. While vLLM uses the Hugging Face converted checkpoint under gpt-oss-120b/ and gpt-oss-20b/ root directory respectively.

This confirms the distinction between the two formats and their intended usage contexts.

So in summary:

  • original/ = raw, reference weights used for low-level or educational backends (torch, triton)

  • root = converted format for inference engines (vLLM, Transformers, etc.); see the vLLM sketch below
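To make the second path concrete, here's a minimal vLLM sketch; per the quoted README, it reads the converted checkpoint from the repo root, not original/. The repo id is illustrative, and this assumes a machine with enough GPU memory for the model:

```python
from vllm import LLM, SamplingParams

# vLLM consumes the Hugging Face-converted checkpoint in the repo root.
llm = LLM(model="openai/gpt-oss-120b")  # or a local path to the repo root

params = SamplingParams(max_tokens=20)
outputs = llm.generate(["Hello"], params)
print(outputs[0].outputs[0].text)
```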

Understood, thank you very much for clarifying - much appreciated!

Alphag0 changed discussion status to closed
