At Nubank, innovation in machine learning (ML) and data science (DS) drives our mission to build the Purple Future. Recently, we hosted the 92nd edition of our DS & ML Meetup, themed “Practices to Scale Machine Learning Operations.” This event featured a deep dive into the technical challenges and solutions of building real-time ML systems, with a focus on fraud detection—a domain where speed, accuracy, and scalability are critical.
Led by Otávio Vasques, Lead Machine Learning Engineer, the session explored the architecture, optimization strategies, and deployment practices behind Nubank’s real-time ML models. Key topics included the differences between batch and real-time models, the role of the Model Server, techniques to reduce latency and infrastructure costs, and best practices for testing and deploying models using shadow mode.
In this article, we’ll unpack these insights, offering a behind-the-scenes look at how Nubank scales machine learning operations to protect millions of customers. Whether you’re a data scientist, ML engineer, or simply curious about real-time ML, this post provides actionable lessons for building robust, scalable systems. Let’s dive in!
What are real-time models?
Real-time models differ from traditional batch models in one crucial way: they operate within the infrastructure of services, not just data pipelines. While batch models process large datasets overnight and generate predictions for the next day, real-time models respond to events as they happen.
This is essential for use cases like fraud detection, where delaying a decision by even a few seconds could mean the difference between stopping a fraudulent transaction and letting it go through.
For example, if someone steals your credit card and tries to make a purchase, a real-time model can flag the transaction immediately, whereas a batch model would only catch it the next day—long after the damage is done.
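The contrast can be sketched in a few lines. This is a toy scoring rule, not a real fraud model: the point is only that a batch job scores an accumulated dataset offline, while a real-time model scores a single event inside the request path and must decide immediately.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    account_id: str
    amount: float

def score(tx: Transaction) -> float:
    """Stand-in model: flags unusually large amounts (illustrative only)."""
    return min(tx.amount / 10_000.0, 1.0)

# Batch: score yesterday's transactions all at once, e.g. in a nightly job.
def batch_score(transactions: list[Transaction]) -> list[float]:
    return [score(tx) for tx in transactions]

# Real-time: score a single event in the request path and decide now.
def handle_authorization(tx: Transaction, threshold: float = 0.8) -> str:
    return "decline" if score(tx) >= threshold else "approve"
```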
The architecture of real-time models at Nubank
At Nubank, our real-time models are built on a robust architecture designed for low latency and high reliability, with the Model Server at its core.
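The talk's full architectural details aren't reproduced here, but the request path of a real-time scoring service typically looks like: fetch precomputed features for the entity, run the model, and return a decision within a latency budget. The sketch below is a rough illustration of that flow; the `FEATURE_STORE` dict, feature names, and scoring rule are hypothetical stand-ins, not Nubank's actual implementation.

```python
import time

# Hypothetical in-memory feature store; in production this would be a
# low-latency feature service keyed by entity (e.g. account).
FEATURE_STORE = {"acct-1": {"txn_count_7d": 42, "avg_amount_30d": 55.0}}

def fetch_features(account_id: str) -> dict:
    return FEATURE_STORE.get(account_id, {})

def predict(features: dict, amount: float) -> float:
    """Stand-in for a trained model loaded by the Model Server."""
    baseline = features.get("avg_amount_30d", 0.0) or 1.0
    return min(amount / (10 * baseline), 1.0)

def serve(account_id: str, amount: float) -> dict:
    """One scoring request: fetch features, predict, report latency."""
    start = time.perf_counter()
    features = fetch_features(account_id)
    fraud_score = predict(features, amount)
    latency_ms = (time.perf_counter() - start) * 1000
    return {"score": fraud_score, "latency_ms": latency_ms}
```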
Optimizing for scale and speed
Real-time models are resource-intensive, especially when operating at Nubank’s scale. Here are some of the techniques we use to optimize performance:
1. Fragmented vs. Global Deployment
2. Pre-Policy Filtering
3. Parallelizing Feature Retrieval
4. Monitoring and Timeouts
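Techniques 3 and 4 above can be sketched together: fetch independent feature groups concurrently, and fall back to safe defaults for any source that misses its latency budget rather than letting one slow dependency stall the whole request. All feature sources, names, and latencies below are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical feature sources with different latencies.
def device_features():
    time.sleep(0.01)
    return {"new_device": 1}

def velocity_features():
    time.sleep(0.01)
    return {"txn_count_1h": 3}

def slow_graph_features():
    time.sleep(0.5)  # simulates a dependency that is too slow right now
    return {"linked_accounts": 2}

DEFAULTS = {"new_device": 0, "txn_count_1h": 0, "linked_accounts": 0}

def gather_features(budget_s: float = 0.1) -> dict:
    """Fetch independent feature groups in parallel; keep defaults for
    any source that exceeds the latency budget."""
    sources = [device_features, velocity_features, slow_graph_features]
    features = dict(DEFAULTS)
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(src) for src in sources]
        for fut in futures:
            try:
                features.update(fut.result(timeout=budget_s))
            except FutureTimeout:
                pass  # keep defaults; in production, alerting would fire here
    return features
```

The per-source timeout is what keeps tail latency bounded: a degraded score computed from partial features is usually better than no score at all.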
Building reliable feature pipelines
Feature engineering is a critical part of any ML model, but it's especially challenging in real-time systems, where features must be computed consistently and reliably across training and serving.
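One common pattern for keeping offline and online features consistent (a general technique, not necessarily Nubank's exact setup) is to define each feature transformation once and reuse the same function in both the batch training pipeline and the online serving path, which eliminates train/serve skew by construction. The feature below is hypothetical.

```python
# Single source of truth for the feature definition.
def amount_over_average(amount: float, avg_amount_30d: float) -> float:
    """Ratio of the current amount to the 30-day average (hypothetical feature)."""
    return amount / avg_amount_30d if avg_amount_30d > 0 else 0.0

# Offline: applied over historical rows to build the training set.
def build_training_rows(rows: list[dict]) -> list[dict]:
    return [
        {**row, "amount_over_average": amount_over_average(row["amount"], row["avg_amount_30d"])}
        for row in rows
    ]

# Online: the exact same function runs at serving time.
def online_features(amount: float, avg_amount_30d: float) -> dict:
    return {"amount_over_average": amount_over_average(amount, avg_amount_30d)}
```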
Testing and shadow mode
Deploying real-time models requires rigorous testing to ensure they perform as expected, which is where shadow mode comes in.
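The core idea of shadow mode can be sketched as follows: the candidate (shadow) model scores the same live traffic as the current champion, but only the champion's score drives the decision; the shadow's output is logged for offline comparison. Both models here are trivial stand-ins, and the wiring is an illustration of the pattern, not Nubank's production code.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

def champion_model(features: dict) -> float:
    return 0.2 if features.get("txn_count_1h", 0) < 5 else 0.9

def shadow_model(features: dict) -> float:
    return 0.1 if features.get("txn_count_1h", 0) < 3 else 0.8

def decide(features: dict, threshold: float = 0.5) -> str:
    champion_score = champion_model(features)
    # The shadow model scores the same traffic, but its output is only
    # logged for later comparison -- it never affects the decision, and
    # a failure in it must never break the request.
    try:
        shadow_score = shadow_model(features)
        log.info("champion=%s shadow=%s", champion_score, shadow_score)
    except Exception:
        log.exception("shadow model failed; ignoring")
    return "decline" if champion_score >= threshold else "approve"
```

Once logged shadow scores match expectations against real traffic, the candidate can be promoted to champion with far more confidence than offline evaluation alone provides.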
Final thoughts
Building real-time ML models is a complex but rewarding challenge. At Nubank, we’ve learned that success depends on a combination of robust architecture, careful optimization, and collaboration across teams. While the techniques we’ve developed are tailored to fraud detection, many of the principles—like parallelizing feature retrieval, monitoring dependencies, and testing rigorously—can be applied to other real-time use cases.
As was emphasized during the meetup, not every model needs every optimization. The key is to be critical about your use case, understand your constraints, and focus on the improvements that will deliver the most value. And remember: real-time ML is a team effort. It takes data scientists, engineers, and analysts working together to build systems that are both fast and reliable.
Check our job opportunities