---
base_model: EleutherAI/gpt-j-6b
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: furiosa-llm
tags:
- furiosa-ai
---

# Model Overview

- **Model Architecture:** GPT-J
- **Input:** Text
- **Output:** Text
- **Model Optimizations:**
  - Beam search optimization (beam=4) for MLPerf (greedy search, top-k, and top-p sampling are not supported)
- **Maximum Context Length:** 2k tokens
  - Maximum Prompt Length: 1920 tokens
  - Maximum Generation Length: 2048 tokens
- **Intended Use Cases:** Intended for commercial and non-commercial use. Like [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b), this model is intended for text summarization.
- **Release Date:** 04/12/2025
- **Version:** v2025.2
- **License(s):** [Apache License 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md)
- **Supported Inference Engine(s):** Furiosa LLM
- **Supported Hardware Compatibility:** FuriosaAI RNGD
- **Preferred Operating System(s):** Linux
- **Fine-tunes:** This model is fine-tuned for text summarization. More details can be found at [Datasets & Models in mlcommons/inference/language/gpt-j/README.md](https://github.com/mlcommons/inference/blob/7bf59976b5f4eb7c5b8f30a88af832e028028446/language/gpt-j/README.md#datasets--models).
- **Quantization:**
  - Tool: Furiosa Model Compressor v0.6.2, included in Furiosa SDK 2025.2
  - Weight: float8, Activation: float8, KV cache: float8
  - Calibration: [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) ([instructions](https://github.com/mlcommons/inference/blob/7bf59976b5f4eb7c5b8f30a88af832e028028446/language/gpt-j/README.md#download--process-dataset))

## Description

This is a pre-compiled model of a fine-tuned and quantized version of [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b). The [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) dataset was used for calibration, and the model was fine-tuned for text summarization.
Details about how this model was fine-tuned and calibrated can be found in [mlcommons/inference/language/gpt-j/README.md](https://github.com/mlcommons/inference/blob/7bf59976b5f4eb7c5b8f30a88af832e028028446/language/gpt-j/README.md). As mentioned above, this model is fine-tuned for the text summarization task. Please use the following prompt when using this model, replacing the {INPUT} part accordingly:

```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Summarize the following news article:

### Input:
{INPUT}

### Response:
```

## Usage

### Furiosa-LLM

Follow the example command below after [installing Furiosa-LLM and its prerequisites](https://developer.furiosa.ai/latest/en/getting_started/furiosa_llm.html#installing-furiosa-llm).

```sh
furiosa-llm serve furiosa-ai/gpt-j-6b-FP8-MLPerf
```

### MLPerf Benchmark using RNGD

Follow the example command below after [installing furiosa-mlperf and its prerequisites](https://developer.furiosa.ai/latest/en/getting_started/furiosa_mlperf.html).

```sh
furiosa-mlperf gpt-j-offline furiosa-ai/gpt-j-6b-FP8-MLPerf ./mlperf-result
```
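As a minimal client-side sketch, the summarization prompt template above can be wrapped in a small Python helper and sent to the running server. This assumes `furiosa-llm serve` exposes an OpenAI-compatible completions endpoint on `http://localhost:8000/v1`; the base URL, port, and response shape are assumptions here, so check the Furiosa-LLM documentation for the exact serving details.

```python
import json
import urllib.request

# Prompt template from this model card; the {input} field below corresponds
# to the {INPUT} placeholder in the template.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\nSummarize the following news article:\n\n"
    "### Input:\n{input}\n\n"
    "### Response:"
)


def build_prompt(article: str) -> str:
    """Fill the summarization template with a news article."""
    return PROMPT_TEMPLATE.format(input=article)


def summarize(article: str, base_url: str = "http://localhost:8000/v1") -> str:
    """Query the served model; endpoint path and port are assumptions."""
    payload = json.dumps({
        "model": "furiosa-ai/gpt-j-6b-FP8-MLPerf",
        "prompt": build_prompt(article),
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]


if __name__ == "__main__":
    print(summarize("(CNN) -- Your news article text goes here."))
```

Note that generation parameters are deliberately omitted: as stated above, this model is compiled for beam search (beam=4), so sampling options such as top-k or top-p should not be passed.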