arXiv:2507.14256

Impact of Code Context and Prompting Strategies on Automated Unit Test Generation with Modern General-Purpose Large Language Models

Published on Jul 18, 2025
AI-generated summary

A chain-of-thought prompting strategy enhances the quality of unit tests generated by large language models, improving branch coverage and mutation scores.

Abstract

Generative AI is gaining increasing attention in software engineering, where testing remains an indispensable reliability mechanism. According to the widely adopted testing pyramid, unit tests constitute the majority of test cases and are often schematic, requiring minimal domain expertise. Automatically generating such tests under the supervision of software engineers can significantly enhance productivity during the development phase of the software lifecycle. This paper investigates the impact of code context and prompting strategies on the quality and adequacy of unit tests generated by various large language models (LLMs) from several model families. The results show that including docstrings notably improves code adequacy, while further extending the context to the full implementation yields markedly smaller gains. Notably, the chain-of-thought prompting strategy -- applied even to 'reasoning' models -- achieves the best results, with up to 96.3% branch coverage, a 57% average mutation score, and a near-perfect compilation success rate. Among the evaluated models, M5 (Gemini 2.5 Pro) demonstrated superior performance in both mutation score and branch coverage while remaining among the top models in compilation success rate. All the code and resulting test suites are publicly available at https://github.com/peetery/LLM-analysis.
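
As a concrete illustration of the setup the abstract describes, the sketch below assembles prompts at three code-context levels (signature only, signature plus docstring, full implementation) and prepends chain-of-thought instructions. All identifiers here (build_prompt, CONTEXT_LEVELS, COT_INSTRUCTIONS) are hypothetical and not taken from the paper's repository; the actual prompts live in the linked GitHub project.

```python
# Hypothetical sketch of the experimental setup: the same target function is
# shown to an LLM at three context levels, with chain-of-thought instructions
# asking it to reason about edge cases and branches before emitting tests.
import textwrap

CONTEXT_LEVELS = ("signature", "signature+docstring", "full_implementation")

COT_INSTRUCTIONS = textwrap.dedent("""\
    Think step by step before writing any code:
    1. Enumerate the input domains and edge cases of the function.
    2. Identify every branch that a test suite must exercise.
    3. Only then emit a complete pytest test module covering each case.
    """)

def build_prompt(signature: str, docstring: str, body: str, level: str) -> str:
    """Assemble the code context shown to the model at the chosen level."""
    if level == "signature":
        code_context = f"{signature}\n    ..."
    elif level == "signature+docstring":
        code_context = f'{signature}\n    """{docstring}"""'
    elif level == "full_implementation":
        code_context = f'{signature}\n    """{docstring}"""\n{body}'
    else:
        raise ValueError(f"unknown context level: {level}")
    return (
        "Generate unit tests for the following Python function.\n\n"
        f"{code_context}\n\n{COT_INSTRUCTIONS}"
    )

if __name__ == "__main__":
    sig = "def clamp(x: float, lo: float, hi: float) -> float:"
    doc = "Return x limited to the closed interval [lo, hi]."
    body = "    return max(lo, min(x, hi))"
    for level in CONTEXT_LEVELS:
        print(f"--- {level} ---")
        print(build_prompt(sig, doc, body, level))
```

The adequacy metrics the abstract reports could be reproduced with standard tooling; pytest-cov yields branch coverage and mutmut a mutation score. The abstract does not name the paper's exact tools, so the invocation below is an assumption:

```python
import subprocess

# Assumed tooling: pytest-cov for branch coverage, mutmut for mutation score.
# The paper's actual measurement pipeline may differ.
subprocess.run(["pytest", "generated_tests/", "--cov=target_module", "--cov-branch"], check=False)
subprocess.run(["mutmut", "run", "--paths-to-mutate=target_module.py"], check=False)
```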
