Building Foundation Models into Nubank’s AI Platform

Author: Hiroto Udagawa

The work described here is a collaborative effort by many engineers and product partners at Nubank (alphabetical): Abhishek Shivanna, Cassiano Abreu, Daniel Seerig, Denis Reis, Guilherme Peixoto, Gustavo Millen, Helder Dias, Henrique Lopes, Ivanildo Santos, José Mora, Juliana Forlin, Louise Farias, Lucas Ikeda, Luiz Coelho, Matias Roqueta, and Rick Slangen. We also thank Rohan Ramanath, Daniel Silva, and Guilherme Tanure for their support.

Nubank acquired Hyperplane in July 2024 to integrate our small team's Foundation Model technology into one of the world's largest digital banks quickly and effectively. While our startup has expertise in developing large models that understand financial behavior, Nubank has built a decade-long culture of overcoming the complexities of productionizing technological innovations in the financial industry. By combining our efforts, we had an opportunity to accelerate Nubank's advancement into an AI-first bank.

This blog post outlines how platform and product teams worked together over the first eight months of this effort to train and deploy billion-parameter models across several predictive modeling use cases. We break down our journey into three sections:

Foundation Model Project Design: How we productionize Foundation Models using Nubank’s existing AI Platform and run projects to measure impact compared to Tabular ML baselines
Technologies and System Architecture: The technologies and tools we develop to drive these Foundation Model projects
Progress in the First Eight Months: Our progress deploying Foundation Models in Nubank over the first eight months of this effort

After these initial successes, our approach remains unchanged:

Ingest and validate more unstructured data sources
Train more advanced Foundation Models
Deploy them to more critical use cases

For more details about our modeling approach, check out our previous blog series.

Foundation Model Project Design

We pattern our Foundation Model approach after Big Tech AI leaders: we build large, generalizable models that better understand customer behavior and then deploy these models across critical decision engines. Nubank's business presents incredible opportunities for AI. Its digital-first, multi-country approach generates vast amounts of rich transactional data, which can drive better financial experiences for diverse customers worldwide. Furthermore, as at many Big Tech companies, predictive modeling underpins Nubank's products, so incremental model improvements generate outsized value for the business.

Nubank, like most financial institutions, has historically relied on linear models, gradient-boosted trees, and aggregated tabular features as the backbone of its predictive AI decisions. To introduce Foundation Models into Nubank efficiently, we made a few key initial choices:

Components where Nubank already had expertise would not be touched. Nubank's years of experience in data infrastructure, model governance, and model deployment are essential guardrails for productionizing these complex models in a highly regulated environment.
We would identify the minimum set of components needed to build and deploy Foundation Models, and establish a clear interface between these new components and the existing ones across the AI Platform's layers.
Once these interfaces were decided, the required components would be developed freely, from scratch, using a revamped tech stack.
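The following minimal Python sketch illustrates what a "clear interface" between new and existing components can look like. It assumes, hypothetically, that an existing decision engine only needs a score per customer. The names `Scorer`, `TabularBaseline`, `FoundationModelChallenger`, and `run_decision` are illustrative only and are not Nubank's actual APIs. Anything behind `score` can be rebuilt from scratch as long as the contract holds.

```python
from typing import Callable, Protocol, Sequence, runtime_checkable


@runtime_checkable
class Scorer(Protocol):
    """Hypothetical contract that an existing decision engine expects from any model."""

    def score(self, customer_ids: Sequence[str]) -> dict[str, float]:
        ...


class TabularBaseline:
    """Stand-in for an incumbent model that scores from precomputed tabular features."""

    def __init__(self, features: dict[str, float]):
        self.features = features

    def score(self, customer_ids: Sequence[str]) -> dict[str, float]:
        return {cid: self.features.get(cid, 0.0) for cid in customer_ids}


class FoundationModelChallenger:
    """Stand-in for a sequence model; its internals can change freely behind the interface."""

    def __init__(self, score_fn: Callable[[str], float]):
        self.score_fn = score_fn  # hypothetical: maps a customer id to a model score

    def score(self, customer_ids: Sequence[str]) -> dict[str, float]:
        return {cid: self.score_fn(cid) for cid in customer_ids}


def run_decision(model: Scorer, customer_ids: Sequence[str], threshold: float) -> dict[str, bool]:
    """Downstream decision logic depends only on the Scorer contract, not on the model type."""
    scores = model.score(customer_ids)
    return {cid: s >= threshold for cid, s in scores.items()}
```

Because `run_decision` depends only on the protocol, replacing the incumbent with a challenger leaves the downstream logic unchanged.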

We narrowed our development focus to three components:

Sequence Data Preprocessing: Existing feature ETL pipelines were built to convert raw sequence data into tabular features. We develop new workflows to transform, validate, and monitor sequence data, making it available for modeling.
GPU/Heterogeneous Clusters: Large-scale linear and gradient-boosted tree models can run on CPU clusters. However, training and deploying transformer-based architectures requires large heterogeneous clusters. As models scale up, optimizing these pipelines becomes critical to controlling costs.
Transformers & DNNs: Large-scale transformers and DNNs are the backbones of our newly proposed Foundation Model architectures.
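The sketch below illustrates the sequence preprocessing step. It groups raw transactions per customer, orders them by time, maps each transaction to a token, and keeps only the most recent events. The `Transaction` fields, the amount buckets, and the one-token-per-transaction scheme are all assumptions made for this example. The post does not describe Nubank's actual tokenization.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    customer_id: str
    timestamp: int           # e.g. seconds since epoch
    amount: float
    merchant_category: str


def amount_bucket(amount: float) -> str:
    """Hypothetical coarse discretization of the amount into a token (refunds are not handled)."""
    for upper, name in ((10, "xs"), (100, "s"), (1_000, "m")):
        if amount < upper:
            return f"amt_{name}"
    return "amt_l"


def build_sequences(transactions, max_len: int) -> dict[str, list[str]]:
    """Group raw transactions per customer, sort by time, tokenize, keep the last max_len events."""
    by_customer = defaultdict(list)
    for tx in transactions:
        by_customer[tx.customer_id].append(tx)
    sequences = {}
    for cid, txs in by_customer.items():
        txs.sort(key=lambda t: t.timestamp)
        tokens = [f"{t.merchant_category}|{amount_bucket(t.amount)}" for t in txs]
        # Truncate to the most recent events; max(0, ...) makes max_len=0 yield an empty list.
        sequences[cid] = tokens[max(0, len(tokens) - max_len):]
    return sequences
```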

Below is an overview of how Nubank's AI Platform now supports both Tabular and Foundation Model projects:

With this structure in place, we onboard product use cases and train Foundation Model-based challengers against Tabular ML baselines. Building a tight, reliable experiment iteration loop is paramount when introducing significant modeling changes to an existing product. These projects require great care to ingest data, run evaluations, and deploy models in the same way as each incumbent. By changing as few levers as possible, we replicate baselines fairly and measure the performance delta. To further isolate the impact, these engagements initially use only the raw sequence data sources that already feed the baseline model, where they appear as aggregated tabular features. This ensures that all model inputs are already validated and monitored in production.
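Evaluating a challenger the same way as the incumbent starts with evaluating both on the same examples. One common way to guarantee this is to assign customers to the holdout deterministically by hashing their IDs, so that independently run baseline and challenger pipelines always agree. The sketch below assumes a customer-level random holdout; a production setup may instead split by time. The function `is_holdout` and its `salt` value are hypothetical.

```python
import hashlib


def is_holdout(customer_id: str, holdout_fraction: float, salt: str = "eval-v1") -> bool:
    """Deterministically assign a customer to the evaluation holdout.

    Hashing instead of random sampling means separate pipeline runs
    (baseline and challenger) select exactly the same holdout customers.
    """
    digest = hashlib.sha256(f"{salt}:{customer_id}".encode()).digest()
    # Keep 53 bits so the division is exact and the result is uniform in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < holdout_fraction
```

Changing the salt reshuffles the holdout, so it effectively versions the evaluation set.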

As models get larger and data less structured, it is easy to get lost in the new complexity. This approach lets us build and measure progress incrementally and in a principled manner. Moreover, by leveraging Nubank's years of expertise in governing and productionizing model innovations, these new projects can focus on optimizing metrics within a constrained yet ambitious modeling scope.

Technologies and System Architecture

With this interface and experiment design established, we develop our Foundation Model System with a revamped tech stack. Below is an overview of our system architecture:

Each Foundation Model Project leverages technologies developed across the following areas:

AI Research

We are building a world-class research organization to study how AI can be uniquely deployed to drive better experiences at Nubank. While we are inspired by applied AI research across Big Tech (e.g., Generative Recommenders), financial institutions have a unique role in advancing research areas like User Behavior Modeling and Causal Modeling. Our first research efforts primarily focus on feeding raw sequential data sources to large-scale transformer model architectures to capture behavioral signals that are impossible to detect using traditional aggregated feature approaches.
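The toy example below shows the kind of signal that aggregation can hide. Two hypothetical customers have identical count, total, mean, and max spend, yet only one shows a run of steadily increasing amounts. The hand-written `has_escalation` rule is purely illustrative. The point of feeding raw sequences to a transformer is to let the model learn such order-dependent patterns rather than hand-coding them.

```python
from statistics import mean


def aggregate_features(amounts: list[float]) -> dict[str, float]:
    """Typical order-insensitive tabular aggregates (expects a non-empty list)."""
    return {"count": len(amounts), "total": sum(amounts), "mean": mean(amounts), "max": max(amounts)}


def has_escalation(amounts: list[float], window: int = 3) -> bool:
    """Toy order-dependent signal: `window` strictly increasing amounts in a row."""
    run, prev = 0, None
    for cur in amounts:
        run = run + 1 if prev is not None and cur > prev else 1
        if run >= window:
            return True
        prev = cur
    return False


customer_a = [20.0, 50.0, 80.0, 120.0]   # steadily increasing spend
customer_b = [120.0, 20.0, 80.0, 50.0]   # same values, different order
```

Any model that sees only `aggregate_features` cannot tell `customer_a` from `customer_b`, while the raw sequence keeps the difference.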

Sequence Data Processing

As with all ML innovations, immense effort is required behind the scenes to ensure that the data fed into our models is of high quality and free of leakage. We formed a dedicated team to ingest, validate, and enrich Nubank's enormous trove of rich transaction (and non-transaction) data sources. While Nubank already has monitoring in place, this team also develops additional tooling to ensure that sequence data can be deployed safely in production.
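With sequence data, leakage often means that an input sequence contains events that happened at or after the moment the prediction is made. Below is a minimal point-in-time check. It assumes a hypothetical schema in which each training example carries a `reference_time` and the timestamps of its input events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    customer_id: str
    reference_time: int               # moment the decision (and label) is made
    event_times: tuple[int, ...]      # timestamps of events in the input sequence


def truncate_to_reference(example: Example) -> Example:
    """Keep only events strictly before the reference time (point-in-time correctness)."""
    kept = tuple(t for t in example.event_times if t < example.reference_time)
    return Example(example.customer_id, example.reference_time, kept)


def find_leaks(examples) -> list[str]:
    """Return ids of examples whose input sequence has events at or after the reference time."""
    return [ex.customer_id for ex in examples if any(t >= ex.reference_time for t in ex.event_times)]
```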

Core & Custom Pipelines

We are building a data processing and model pipeline stack to handle these new data and model workloads. We leverage Ray to enable our small infra team to scale out heterogeneous clusters, allowing ML Engineers to train billion-parameter models on all 100+ million Nubank customers and their transaction histories. Many decisions at Nubank are made across all users on a monthly basis, so our models ingest O(billions) of labels and O(thousands) of transactions per label. As a result, our final Foundation Models process O(trillions) of tokens during training.
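The O(trillions) figure follows from multiplying the orders of magnitude above. The back-of-envelope sketch below uses hypothetical round numbers:

- 100 million customers
- 12 monthly snapshots per customer, one label each
- 1,000 transactions of context per label
- one token per transaction

A real tokenizer may emit several tokens per transaction, which would scale the total up proportionally.

```python
def training_tokens(num_customers: float, snapshots_per_customer: float,
                    events_per_label: float, tokens_per_event: float) -> float:
    """Tokens processed in one pass over the data: labels x context events x tokens per event."""
    labels = num_customers * snapshots_per_customer
    return labels * events_per_label * tokens_per_event


# Hypothetical inputs: 100M customers, 12 monthly labels each,
# 1,000 transactions of context per label, 1 token per transaction.
tokens = training_tokens(100e6, 12, 1_000, 1)
print(f"{tokens:.1e}")  # 1.2e+12
```

Even these conservative assumptions give about 1.2 billion labels and 1.2 trillion tokens per pass.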

We wrap reusable data preparation, training, and inference pipelines into core components that any ML Engineer at Nubank can run off-the-shelf. Furthermore, we add the capability to plug custom components into the modeling stack, enabling modelers to build pipelines tailored to their specific problems.

Internal Tools

Traditionally, model development can be isolated within smaller initiatives where one or a few ML engineers train relatively lightweight models for their specific problem. However, building large, horizontally deployable foundation models requires tight coordination among dozens of researchers and engineers. We are developing additional tooling across model tracking, cataloging, and reporting to ensure that data and models progress in a unified direction. Below, we highlight a few examples of the tooling we have built:

The Model Catalog allows us to view all models trained in AI Core’s platform, compare results, and filter them based on specific criteria.
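At its core, a catalog like this is a query over model metadata. The sketch below assumes a hypothetical `ModelRecord` schema with a task name, parameter count, and evaluation AUC. The post does not describe the actual Model Catalog's fields.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ModelRecord:
    name: str
    task: str
    params_millions: int
    auc: float


def filter_models(catalog: Iterable[ModelRecord], task: Optional[str] = None,
                  min_auc: float = 0.0, min_params_millions: int = 0) -> List[ModelRecord]:
    """Return models matching all given criteria, sorted by AUC (best first)."""
    hits = [
        m for m in catalog
        if (task is None or m.task == task)
        and m.auc >= min_auc
        and m.params_millions >= min_params_millions
    ]
    return sorted(hits, key=lambda m: m.auc, reverse=True)
```

For example, `filter_models(catalog, task="task-1", min_params_millions=1000)` would list the billion-parameter models for that task, best first.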

In the Model View, a modeler can see all artifacts of a model and analyse their input data, inference outputs, and any parameters used to train that model.

In the Reporting Tool, one can drag and drop common analytics plots and visualizations, speeding up the evaluation and comparison of models.

Progress in the First Eight Months

Over the first eight months, we have steadily progressed on all components of the System Architecture and on our mission of introducing Foundation Models across product areas.

At the beginning of this post, we outlined the following approach to our effort:

Ingest and validate more unstructured data sources
Train more advanced Foundation Models
Deploy them to more critical use cases

We highlight our progress in these three areas.

1. Ingest and validate more unstructured data sources

We show our data team’s progress in ingesting sequence data sources and testing their impact on our models. We track progress across the following three metrics:

Ingested: Data source has been ingested and is ready for models to consume
Modeled: Data source has passed quality checks, and impact has been assessed in at least one task
Productionized: Data source has all the monitoring required for model productionization
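Suppose the three stages are cumulative, so a productionized source has also been modeled and ingested; this is an assumption. Tracking progress then reduces to counting the sources at or beyond each stage. The `Stage` enum and the `stage_counts` helper below are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    INGESTED = 1        # ready for models to consume
    MODELED = 2         # passed quality checks; impact assessed on at least one task
    PRODUCTIONIZED = 3  # has all monitoring required for production


def stage_counts(sources: dict[str, Stage]) -> dict[str, int]:
    """Cumulative counts: a source counts toward its own stage and every earlier one."""
    return {s.name: sum(1 for st in sources.values() if st >= s) for s in Stage}
```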

While the current productionized data sources only include transaction sources, we plan to experiment soon with incorporating app event and product usage signals into our transformer models.

2. Train more advanced Foundation Models

We measure incremental model performance across use cases over time. These lifts stem from three main efforts: more data sources, improved model architectures, and model scale. Below, we present the average AUC lift of our models across four benchmark tasks over the first several months.

Note that a +1.20% AUC lift is two to three times the lift of a mature model's typical annual release. Furthermore, we achieve this lift without adding any new data source, relying solely on signals already present in the baseline models but represented there as tabular features.
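For reference, AUC is the probability that a randomly chosen positive example scores above a randomly chosen negative one. The sketch below makes two assumptions. First, it treats the reported lift as relative to the baseline AUC, although it could also be read as absolute points. Second, it uses an O(n²) pairwise count, which only suits small illustrative inputs.

```python
def auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: P(score_pos > score_neg), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_lift_pct(baseline_auc: float, challenger_auc: float) -> float:
    """Relative AUC lift in percent (assumed definition; absolute points is the alternative)."""
    return 100.0 * (challenger_auc - baseline_auc) / baseline_auc
```

Under the relative definition, a baseline AUC of 0.750 would need to rise to about 0.759 to show a +1.20% lift.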

3. Deploy them to more critical use cases

We measure adoption over four metrics:

Problems Onboarded: Ingest features and labels, and align on evaluations and key metrics for the engagement.
Baselines Replicated: Replicate baseline metrics using the same labels and tabular features as those in production models to ensure that all data is correct and evaluations are fair.
Challengers Built: Train challenger models by feeding sequence data into our model architectures to achieve sufficient metric improvement to launch a new model.
Models in Production: Get model approval and deploy them into production pipelines to serve customers.
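The Baselines Replicated stage can be written as an acceptance check: the team's reimplementation of the baseline must reproduce the production metric before any challenger is trusted. The sketch below assumes AUC as the metric and uses hypothetical tolerance and minimum-gain thresholds. It compares the challenger against the replicated baseline rather than the production figure, so both use the same data and evaluation code.

```python
import math


def baseline_replicated(production_auc: float, replicated_auc: float, abs_tol: float = 0.002) -> bool:
    """True if the replicated baseline matches production within abs_tol (hypothetical threshold)."""
    return math.isclose(production_auc, replicated_auc, rel_tol=0.0, abs_tol=abs_tol)


def challenger_ready(replicated_auc: float, challenger_auc: float, min_gain: float = 0.005) -> bool:
    """True if the challenger beats the replicated baseline by at least min_gain AUC (hypothetical)."""
    return challenger_auc - replicated_auc >= min_gain
```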

In this blog post, we shared our progress during the first eight months of introducing Predictive Foundation Models into Nubank’s AI Platform. This work has enabled the deployment of large-scale transformer-based sequence models in several key decision engines. Achieving Nubank’s AI-First vision will require a sweeping shift in technology, culture, and product thinking. The investment and impact from these Foundation Models mark our early efforts to accelerate this transformation.
