At Engineering Meetup #13, a packed audience gathered to learn about one of the most critical systems protecting Nubank and its customers: the Defense Platform.

This session was led by three experts deeply involved in building and supporting this technology: Alessandro Bottmann, a Staff Software Engineer with over 30 years in tech, known for leading engineering teams and delivering high-impact solutions; Jairo Júnior, a Senior Software Engineer passionate about architecture and scalable systems; and Rafael Rodrigues, a seasoned Solutions Architect at AWS with more than 15 years of experience in cloud computing.

Together, they walked us through the architecture, evolution, and future of the system that powers fraud prevention across all of Nubank’s products and regions.

A platform born from complexity

In the early days, fraud detection at Nubank was decentralized. Each product team had to integrate with separate systems and services to detect and respond to suspicious activity. It worked, but it didn’t scale. There were inefficiencies, inconsistencies, and long delays to get a new defense up and running.

That’s where the idea for the Defense Platform was born. Rather than continuing to patch together fragmented solutions, the team envisioned a unified system, one that could handle millions of events per day, integrate easily with new products, and operate reliably across multiple regions. Over the last five years, this vision became reality.

Today, the platform processes hundreds of millions of events every day with over 99.98% availability. It’s used in Brazil, Mexico, and Colombia, and is ready to expand even further. All of this is made possible by a design focused on reliability, scalability, low-latency execution, and cost efficiency: a system built for long-term evolution.


What happens when a transaction hits the platform?

Let’s say a PIX transaction comes in. The Defense Platform receives the event with all its context: origin, destination, amount, and more. From there, a component called the Flow Orchestrator takes over. It determines which checks that type of event must go through. In the case of PIX, over 40 different processes might run in parallel, including both business rules and machine learning models.

These components pull data from what Nubank calls features, a flexible system that can access internal databases, third-party providers, and other services. Once the rules and models evaluate the risk, the orchestrator makes a decision: is the transaction safe to proceed, or should it trigger an action?

Actions can happen in real time, like blocking the transaction or displaying a warning in the app, or they can run asynchronously, such as opening an internal case for investigation. Either way, the event is logged and pushed through Nubank’s distributed ETL system — which aggregates over 100 terabytes of logs every day — for continuous analysis and improvement.
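To make the flow above more concrete, here is a minimal Python sketch of an orchestrator that runs detectors in parallel and turns their scores into a decision. It is an illustration only, not the Flow Orchestrator's actual code: the `Event` fields, the two rules, the `FLOWS` registry, and the thresholds are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    origin: str
    destination: str
    amount: float


# Hypothetical detectors: each returns a risk score between 0 and 1.
def large_amount_rule(event):
    return 1.0 if event.amount > 10_000 else 0.0


def new_destination_rule(event):
    return 0.6 if event.destination.startswith("new-") else 0.0


# Hypothetical registry: which detectors each event type must go through.
FLOWS = {"pix": [large_amount_rule, new_destination_rule]}


def decide(event, block_at=0.9, warn_at=0.5):
    """Run every detector for this event type in parallel and pick an action."""
    detectors = FLOWS[event.kind]
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda detector: detector(event), detectors))
    risk = max(scores)
    if risk >= block_at:
        return "block"
    if risk >= warn_at:
        return "warn"
    return "allow"
```

In this sketch, a small transfer to a known destination comes back as "allow", while a transfer above the assumed amount threshold comes back as "block". Asynchronous actions, such as opening a case, are left out to keep the example short.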

Built to handle millions, designed to adapt

The numbers behind the platform are massive. It processes around 450 million events per day, generating about 5 million internal requests per minute. Internal requests far outnumber incoming events because a single event, like a PIX transaction, may trigger dozens of downstream processes.
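The two figures use different time units, so a quick back-of-the-envelope conversion helps compare them. The calculation below uses only the numbers quoted above and treats them as flat daily averages, which is a simplification: real traffic has peaks.

```python
EVENTS_PER_DAY = 450_000_000
INTERNAL_REQUESTS_PER_MINUTE = 5_000_000

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

# Average incoming events per second (about 5,200).
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# Internal requests per day (7.2 billion).
internal_requests_per_day = INTERNAL_REQUESTS_PER_MINUTE * MINUTES_PER_DAY

# Average internal requests triggered by one event (16).
requests_per_event = internal_requests_per_day / EVENTS_PER_DAY
```

On average, then, each event fans out into about 16 internal requests. Heavier flows such as PIX sit above that average.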

To support this load, Nubank relies on a highly distributed architecture with 20 “shards” in Brazil alone, essentially full replicas of the entire system that help spread the traffic and maintain low latency for millions of users.
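The talk did not cover how customers are assigned to shards. One common approach, shown here purely as an assumption, is to hash a stable customer identifier so that the same customer always lands on the same shard. The `shard_for` function and the identifier format are hypothetical.

```python
import hashlib

NUM_SHARDS = 20  # the figure quoted for Brazil


def shard_for(customer_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a customer id to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), whose output for
    strings changes between Python processes.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

A plain modulo like this reshuffles most customers if the number of shards changes, so a production system would likely need a more careful assignment scheme.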

This architecture is powered by a tech stack rooted in Clojure, Datomic (on top of DynamoDB), and Kafka. Machine learning models are built in Python, and observability is deeply embedded at every layer through logs, traces, and real-time metrics.

Optimizing the heart of risk detection

At its core, the platform revolves around a dual structure: detection and action. Detection can be powered by hand-written rules or machine learning models. Models tend to be slower to execute, so the platform runs them only when they are needed: if a rule can already confidently mark a transaction as high-risk, the model is skipped entirely, saving both time and resources.
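The short-circuit idea can be sketched in a few lines of Python. This is a simplified illustration, not the platform's implementation: the `assess` function, the 0 to 1 score scale, and the `high_risk` threshold are assumptions.

```python
def assess(event, rules, model, high_risk=0.9):
    """Return (risk_score, model_was_called).

    Cheap rules run first. The slower model is only invoked when no rule
    has already flagged the event as high-risk.
    """
    rule_score = max((rule(event) for rule in rules), default=0.0)
    if rule_score >= high_risk:
        # A rule is already confident: skip the model entirely.
        return rule_score, False
    return max(rule_score, model(event)), True
```

The second element of the result makes it easy to measure how often the model is skipped in practice.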

All defenses go through a shadow testing phase before they’re fully released. In this stage, new rules and models run in parallel with the production environment, using real inputs but not affecting real users. This allows the team to validate accuracy and performance in live conditions without introducing risk.
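A shadow phase like this can be pictured as a wrapper that always returns the production decision and only records what the candidate would have done. The sketch below is a hypothetical illustration. The `evaluate` function and the log format are assumptions, and a real system would likely run the shadow defense off the request path.

```python
def evaluate(event, live_defense, shadow_defense, shadow_log):
    """Return the live decision; record the shadow decision for later comparison."""
    decision = live_defense(event)
    try:
        shadow_decision = shadow_defense(event)
        shadow_log.append({"event": event, "live": decision, "shadow": shadow_decision})
    except Exception as exc:
        # A failing candidate must never affect real users.
        shadow_log.append({"event": event, "live": decision, "error": repr(exc)})
    return decision
```

Comparing the "live" and "shadow" entries in the log afterwards shows where the new defense would have disagreed with production.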

The orchestrator, reimagined

The first version of the orchestrator was simple but inefficient. It executed components layer by layer, which meant even low-latency processes had to wait unnecessarily for others to finish. Recently, the team refactored this into a model based on a directed acyclic graph (DAG), using an open-source library developed at Nubank called Nodely.

In the new model, each component waits only for the data it depends on and nothing more. As a result, processing time in complex flows dropped from 550 milliseconds to around 350, a reduction of roughly 36% and a significant improvement when you’re dealing with millions of transactions every day.
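The difference between layer-by-layer and dependency-driven execution is easy to see in a small example. The sketch below uses Python's asyncio rather than Nodely (a Clojure library, whose API is not shown here). The `run_dag` helper, the node names, and the delays are all hypothetical.

```python
import asyncio


async def run_dag(nodes):
    """Run a graph where each node waits only for its own dependencies.

    nodes maps a name to (dependency names, async function of the
    dependency results). The graph is assumed to be acyclic.
    """
    tasks = {}

    async def run(name):
        deps, fn = nodes[name]
        inputs = await asyncio.gather(*(tasks[dep] for dep in deps))
        return await fn(dict(zip(deps, inputs)))

    # Tasks only start running once this coroutine yields, so every task
    # is registered before any node looks up its dependencies.
    for name in nodes:
        tasks[name] = asyncio.create_task(run(name))
    return {name: await task for name, task in tasks.items()}


def step(name, delay, finished):
    """Build a fake node that sleeps, then records when it finished."""
    async def node(inputs):
        await asyncio.sleep(delay)
        finished.append(name)
        return name
    return node


async def demo():
    finished = []
    nodes = {
        "fast_feature": ([], step("fast_feature", 0.01, finished)),
        "slow_feature": ([], step("slow_feature", 0.20, finished)),
        "rule": (["fast_feature"], step("rule", 0.01, finished)),
        "model": (["slow_feature"], step("model", 0.01, finished)),
    }
    await run_dag(nodes)
    return finished
```

Running `asyncio.run(demo())` returns the nodes in completion order, and the rule finishes before the slow feature does. With layered execution, the rule would have had to wait for every feature in the first layer, including the slow one it never reads.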

Making defense more accessible

Right now, writing rules still requires engineering work. But the team is working to change that. By moving more of the platform to a declarative, configuration-based model, the goal is to let fraud analysts and other non-engineering roles contribute directly. That means faster deployments, greater ownership, and better responsiveness to emerging threats.
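As a rough picture of what a declarative, configuration-based rule could look like, the sketch below describes a rule as plain data and evaluates it with a small generic interpreter. The rule schema, the field names, and the `evaluate_rule` function are hypothetical. The talk did not show the platform's actual format.

```python
import operator

# Comparison operators a rule author may reference by name.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "==": operator.eq,
    "in": lambda value, options: value in options,
}

# A rule expressed as data: no code changes are needed to add or tune one.
RULE = {
    "name": "large-night-transfer",
    "all": [
        {"field": "amount", "op": ">", "value": 5000},
        {"field": "hour", "op": "in", "value": [0, 1, 2, 3, 4, 5]},
    ],
    "action": "warn",
}


def evaluate_rule(rule, event):
    """Return the rule's action if every condition matches, else None."""
    matched = all(
        OPERATORS[cond["op"]](event[cond["field"]], cond["value"])
        for cond in rule["all"]
    )
    return rule["action"] if matched else None
```

Because the rule is data, an analyst could change a threshold or add a condition without touching the interpreter.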

The platform is also evolving to give product teams more visibility into the operational cost of each fraud defense. With that, teams will be able to make smarter decisions not just about risk, but about efficiency too.

Insights from AWS: Taking fraud detection further

Following the walkthrough of Nubank’s platform, Rafael Rodrigues from AWS gave a practical look at how financial institutions are using cloud tools to fight fraud. He showed how Amazon Rekognition can handle document and facial verification, including liveness detection. 

He also demonstrated how Textract can extract data from ID documents, and how SageMaker can be used to train fraud detection models, whether through traditional supervised learning or more advanced techniques like anomaly detection and graph-based modeling.

One highlight was AWS Clean Rooms, which allows companies to collaborate securely on shared datasets without exposing sensitive data, opening the door to joint anti-fraud efforts across institutions.

Why graphs change the game

Graph-based modeling was a standout topic. Unlike traditional fraud detection, which looks at transactions in isolation, graph models reveal the relationships between users, devices, IPs, and more.

Rafael showed how modeling these connections can quickly expose fraud rings, identify stolen identities being reused across multiple accounts, and spot suspicious behavior that’s otherwise hard to detect. With Amazon Neptune as the graph database and SageMaker for training, the potential for more powerful, contextual defenses is clear.
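The core intuition can be shown without any graph database. The sketch below uses plain Python, not Neptune or SageMaker: it links accounts that share a device or IP and groups them into connected components with a union-find structure. The `find_rings` function, the identifier format, and the minimum group size are assumptions made for illustration.

```python
from collections import defaultdict


def find_rings(links, min_accounts=3):
    """Group accounts connected through shared identifiers.

    links is an iterable of (account, identifier) pairs, for example
    ("acc-1", "device:abc"). Returns sorted groups of at least
    min_accounts accounts.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    accounts = set()
    for account, identifier in links:
        accounts.add(account)
        # Merge the account's component with the identifier's component.
        parent[find(("account", account))] = find(("id", identifier))

    groups = defaultdict(set)
    for account in accounts:
        groups[find(("account", account))].add(account)
    return sorted(sorted(group) for group in groups.values() if len(group) >= min_accounts)
```

For example, if account "a" shares a device with "b", and "b" shares an IP with "c", the three end up in one group even though "a" and "c" have nothing directly in common. That is the kind of link a transaction-by-transaction view misses.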

Final thoughts

Fraud never stops evolving — and neither does our platform. With an architecture that combines performance, flexibility, and observability, the Defense Platform is constantly improving. Whether it’s adopting new models, rethinking orchestration, or empowering more people to build defenses, the mission stays the same: keep customers safe, at scale.

And by combining in-house expertise with external collaboration — like the tools and services from AWS — we’re building a security ecosystem that grows stronger with every transaction.
