Ensuring quality in distributed, living, constantly evolving systems is one of the core commitments of engineering at Nubank. In this environment, where Clojure is widely adopted and teams work with code that changes continuously, Raíssa Barreira, Software Developer at Nubank, found a starting point for bringing practice and research together: understanding how artificial intelligence can responsibly support the generation of unit tests.

The project addresses a very real need in day-to-day development and ventures into territory still largely unexplored within functional programming languages. The research is being conducted at LABES (Software Engineering Lab) at ICMC/USP, under the guidance of Prof. José Carlos Maldonado and in collaboration with Prof. Auri Vincenzi, currently a researcher at the University of Porto, further strengthening the bridge between academia and industry.

Where the research begins

Raíssa joined Nubank in 2023, and it was here that she had her first contact with Clojure. From that experience came the curiosity to understand how to improve testing quality in the language that powers much of Nu’s systems.

In recent years, there has been significant progress in using language models to generate tests automatically, but Clojure rarely appears in those investigations. This gap inspired the question that guides her research: how can AI be integrated into the creation of unit tests in Clojure in a thoughtful, incremental way, producing tests that can be directly compared to human-written ones?

How the incremental strategy works

The central idea is to build tests progressively. Rather than creating a single test suite and analyzing it all at once, the process evolves in small layers. Each layer helps reveal gaps, refine existing tests, and prepare the environment for more rigorous evaluations. This approach makes it possible to identify fragile points before they become bigger issues.

In Clojure, this strategy becomes especially useful, since the language favors expressiveness and composition, which increases the need for careful verification across multiple execution paths.
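A minimal sketch of what a first layer might look like, using a hypothetical `discount` function whose composed, conditional threading creates several execution paths (all names here are invented for illustration):

```clojure
(ns example.discount-test
  (:require [clojure.test :refer [deftest is testing]]))

;; Hypothetical function under test: short, composed, several paths.
(defn discount [customer total]
  (cond-> total
    (:loyal? customer) (* 0.9)
    (> total 100)      (- 5)))

;; First layer: pin down the obvious paths; later layers refine edges.
(deftest discount-first-layer
  (testing "no condition applies"
    (is (= 50 (discount {} 50))))
  (testing "loyalty and volume combine"
    (is (= 103.0 (discount {:loyal? true} 120)))))
```

A later layer would then target what this first pass leaves out, such as the behavior at the boundary of exactly 100.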

The role of coverage testing and mutation testing

Coverage testing measures how much of the code is actually exercised by the tests. It helps reveal forgotten paths, rare branches, and parts of the system that never get executed.

Although it does not directly guarantee the logical quality of the tests, it quickly shows where the gaps are. In the incremental strategy, this metric works as an initial thermometer to guide what needs to be created or revised.
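As an illustration, consider a hypothetical `safe-div` with a guard branch. A coverage run (for example with the Cloverage tool for Clojure) would flag the untested branch:

```clojure
(ns example.coverage-test
  (:require [clojure.test :refer [deftest is]]))

;; Hypothetical function with a guard branch that is easy to forget.
(defn safe-div [a b]
  (if (zero? b)
    :undefined
    (/ a b)))

;; Exercises only the happy path; a coverage report would flag the
;; `zero?` branch as never executed.
(deftest safe-div-happy-path
  (is (= 2 (safe-div 4 2))))
```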

After reaching a solid level of coverage, the strategy moves to a more demanding stage: mutation testing. It creates slightly altered versions of the original code and checks whether the tests can detect that something is wrong. When a test fails against the altered version, the mutant is said to be killed. When everything passes silently, it signals fragility in the suite. This technique provides a deeper view of how capable the tests truly are of catching incorrect behavior.
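A small illustration of the idea, with a hypothetical predicate and a hand-written mutant standing in for what a mutation tool would generate:

```clojure
(ns example.mutation-test
  (:require [clojure.test :refer [deftest is]]))

(defn eligible? [age]
  (<= 18 age))          ;; original predicate

(defn eligible?-mutant [age]
  (< 18 age))           ;; mutant: `<=` replaced by `<`

;; This boundary test kills the mutant: it passes against the
;; original but would fail if run against the mutated version.
(deftest eligible-at-the-boundary
  (is (true? (eligible? 18))))
```

A suite without that boundary test would pass against both versions, leaving the mutant alive and exposing a blind spot.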

The computational cost is much higher, but the gain in confidence is as well.

How artificial intelligence fits into this process

Language models are especially helpful at the beginning of test suite construction. They follow textual instructions, infer intended behavior from function definitions, and produce Clojure test code quickly. This reduces manual effort and speeds up the creation of an initial test base.
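For illustration, this is the kind of output a model might produce when prompted with a description of a hypothetical `slugify` function; everything below is invented for the example:

```clojure
(ns example.slug-test
  (:require [clojure.test :refer [deftest is]]
            [clojure.string :as str]))

;; Hypothetical function the prompt would describe to the model.
(defn slugify [s]
  (-> s str/lower-case (str/replace #"\s+" "-")))

;; The kind of test a model typically produces from that description.
(deftest slugify-basic-behavior
  (is (= "hello-world" (slugify "Hello World")))
  (is (= "a-b-c" (slugify "a  b c"))))
```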

At the same time, AI brings challenges that cannot be ignored. Models can hallucinate nonexistent behaviors, produce superficial tests, and rely heavily on the clarity of the input. They also lack formal mechanisms to guarantee correctness. This means the generated tests must be analyzed, reviewed, and continuously compared.
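For example, a model might hallucinate a function that was never defined. The test below is entirely hypothetical and intentionally broken: `billing/parse-amount!` is an invented name, so the code does not even compile and has to be caught in review:

```clojure
;; Hypothetical hallucinated test: the model invented
;; `billing/parse-amount!`, which does not exist in the codebase,
;; so this test fails at load time.
(deftest parse-amount-handles-negatives
  (is (= -10.5M (billing/parse-amount! "-10.50"))))
```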

The incremental strategy emerges as a way to balance this scenario. Instead of relying entirely on AI, the process creates room for validation, direct comparison with human-written tests, and successive iterations that elevate the overall quality of the suite.

Research pipeline

The cycle begins with the creation of two test suites: one written by developers and another generated by AI. Both are evaluated through coverage. If the coverage falls below expectations, new tests are created and the process continues until an acceptable level is reached.
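A minimal sketch of that coverage loop, with stub helpers standing in for the real measurement and generation steps (all names here are assumptions, not the project's actual code):

```clojure
(ns example.pipeline)

;; Stubs so the sketch compiles; names and behavior are assumptions.
(defn measure-coverage [suite]
  ;; would delegate to a real coverage tool such as Cloverage
  1.0)

(defn generate-tests [suite]
  ;; would prompt a model (or a developer) for tests on uncovered code
  [])

(defn grow-suite
  "Expand `suite` until coverage reaches `target`."
  [suite target]
  (loop [suite suite]
    (if (>= (measure-coverage suite) target)
      suite
      (recur (into suite (generate-tests suite))))))
```

Each iteration feeds the coverage result back into the generation step, so every new layer of tests targets code the previous layers missed.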

Once that stage matures, the cycle moves to mutation testing, where both suites are evaluated and expanded again. Over time, it becomes possible to observe how each suite evolves and understand where AI approaches or diverges from human quality.

What the initial results show

In the first scenarios, the AI-generated tests outperformed the manually written ones, both in generation speed and in the number of mutants killed. The conclusion is promising but still limited, since this early experiment does not yet represent the real complexity of production systems.

The next stage of the study involves applying the strategy to a larger and more diverse set of Clojure programs. The goal is to evaluate the volume of tests produced, the coverage achieved, the number of mutants killed, the computational effort involved, and the token consumption of the AI models. These indicators will help determine the actual cost and benefit of adopting this form of automation in real environments.

Raíssa’s research reinforces the importance of bringing academia and industry closer together. Nubank operates large-scale, distributed, and mission-critical systems. For this reason, any advancement that increases quality and reduces risk is meaningful.

The incremental AI-supported strategy opens pathways to evolve legacy test bases, reduce effort in major refactorings, and strengthen team confidence in every change deployed to production.

More broadly, the study enriches the conversation about adopting AI responsibly in the development lifecycle. Not as a replacement, but as part of a system that enables validation, comparison, and continuous improvement.

The strategy is still under development, but it already shows potential. With more experiments, a broader codebase, and additional evaluation cycles, it may become a relevant tool for improving software quality in Clojure.

Check our job opportunities