Reviewed by Felipe Yukio

In today’s digital age, data plays a pivotal role in driving business strategies and decisions. As a trailblazer in the fintech industry, Nubank understands the power of data and seeks to maximize its potential. This blog post explores the concept of Core Datasets and how they are proving to be a game-changer for Nubank.

We’ll also shed light on the practice of data self-service and why it’s an asset to modern businesses. Keep reading!

Understanding data self-service

Imagine a workspace where every employee, regardless of their department, has the ability to access and analyze necessary data whenever required. That’s precisely what data self-service is all about. It democratizes data, allowing for a smoother flow of information across various departments, promoting a culture of data-driven decision-making.

Benefits of data self-service

  1. Empowerment and speed: with data at their fingertips, teams can rapidly address problems, generate new datasets, and execute analytical tasks without depending on a centralized data team.
  2. Enhanced problem solving: immediate data access leads to quick identification and resolution of issues, enabling a proactive approach.

However, this powerful tool doesn’t come without its set of challenges.

Challenges

  1. Reprocessing: each department often has a unique perspective on how data should behave within its realm. This can lead to multiple teams creating similar datasets. Instead of having redundant datasets, it’s imperative to reuse and repurpose them. Failure to do so can lead to repetitive data processing by different departments, incurring unnecessary costs.
  2. Interdependency: when multiple teams rely on a single dataset, alterations made for one team can inadvertently impact others. Both challenges call for a unified ‘source of truth’, which is where Core Datasets come in.

Check our job opportunities

Nubank’s Core Datasets in action

Core Datasets are the bedrock of reliable data management built on best practices. They mitigate common issues like reprocessing and pave the way for consistent and trustworthy data streams. These datasets act as the reference point, ensuring uniformity and reducing discrepancies.

To truly appreciate the value of Core Datasets, let’s explore two use cases from Nubank’s operations:

Customer data challenges

Different business units within Nubank once had to grapple with specialized business rules, which made a consolidated customer analysis incredibly complex. Recognizing the complications this could bring, Nubank looked to Core Datasets as the solution.

By utilizing these datasets, we’ve been able to present a comprehensive view of our customers. What resulted was not just a harmonized customer analysis process but a centralized source of truth. This shift simplified both the maintenance and evolution of our customer data, fostering greater efficiency and clarity in our operations.

Data discrepancies in credit card products

With a diverse range of credit card products under Nubank’s banner, we found ourselves navigating a labyrinth of metrics, each with its unique set of business rules. The breadth of data sources meant that reconciling them became a meticulous task.

To address this, we initiated the unification of business rules for corporate indicators. This process required a deep-rooted collaboration between stakeholders and business units. By clearly defining ownership both functionally and technically, we achieved a cohesive corporate view. This new perspective respected specialized views, ensuring that while we had a comprehensive overview, the unique nuances of each unit were not lost. 

Furthermore, this shift bolstered our governance processes, especially in areas concerning data quality, usability, and integrity.

Core Datasets Reference

In practice, Core Datasets have stricter documentation, called design specs. Nubank’s Analytics Engineering team uses the following article as a reference:

Data Quality at Airbnb 

Operational dynamics

No two problems are identical, and thus, their solutions might vary. When working with Core Datasets, the essence is to achieve the properties listed below, ensuring the final dataset is:

  • The definitive source for particular use cases.
  • Scalable with the company’s organic growth.
  • Characterized by clear business rules, either via code or documentation.
  • Thoroughly documented.

At Nubank, two distinct approaches have been implemented to achieve this: Tabular Modelling and EAVT Modelling. Let’s dive into both while understanding the theoretical motivation behind them.

Kimball’s Dimensional Modeling concepts

Imagine a fact table that records transactions. Its ‘grain’ defines what a single row represents, and the table’s primary key identifies each row at that level of detail. From there, ‘dimension tables’ describing the characteristics of these events are linked to it, culminating in what’s termed the ‘star schema’. Following these principles, Nubank ensures operational efficiency.

This involves mapping business processes, defining the grain, identifying dimensions, and detailing the actual event, leading to a structured database with flexible and scalable properties.
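The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Nubank’s implementation: the table names, columns, and values (`fact_transactions`, `dim_customer`, `dim_product`, `segment`) are hypothetical, and the grain is assumed to be one row per transaction.

```python
# Hypothetical star schema: one fact table plus two dimension tables.
# Grain (assumed): one row per card transaction.
fact_transactions = [
    {"transaction_id": 1, "customer_id": 10, "product_id": 100, "amount": 25.0},
    {"transaction_id": 2, "customer_id": 11, "product_id": 100, "amount": 40.0},
    {"transaction_id": 3, "customer_id": 10, "product_id": 101, "amount": 15.5},
]

# Dimension tables keyed by the ids that the fact table refers to.
dim_customer = {10: {"segment": "retail"}, 11: {"segment": "business"}}
dim_product = {
    100: {"product_name": "credit_card_a"},
    101: {"product_name": "credit_card_b"},
}


def enrich(fact_rows, customers, products):
    """Attach dimension attributes to each fact row via its foreign keys."""
    return [
        {**row, **customers[row["customer_id"]], **products[row["product_id"]]}
        for row in fact_rows
    ]


def total_by(rows, key):
    """Sum the transaction amount for each distinct value of `key`."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0.0) + row["amount"]
    return totals


enriched = enrich(fact_transactions, dim_customer, dim_product)
amount_by_segment = total_by(enriched, "segment")
print(amount_by_segment)
```

Because the facts stay at a fixed grain, a new way of slicing the data only needs a new dimension attribute, not a new fact table.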

However, technological advancements and changing paradigms have shifted the focus from storage to processing concerns.

The EAVT approach

Given how costly it is to add more information to a table, a more column-rich dataset becomes appealing. From this line of thinking emerged the EAVT (Entity, Attribute, Value, Timestamp) model. EAVT can be visualized as a table where columns are stacked up as rows, ready to be pivoted into the desired tabular format when necessary.
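To make the pivot concrete, here is a minimal sketch in plain Python. The rows, attribute names, and the `pivot_latest` helper are hypothetical and only illustrate the idea; timestamps are assumed to be ISO-formatted strings, so they sort chronologically.

```python
from collections import defaultdict

# Hypothetical EAVT rows: (entity, attribute, value, timestamp).
eavt_rows = [
    ("customer-1", "status", "active", "2023-01-01"),
    ("customer-1", "credit_limit", 1000, "2023-01-01"),
    ("customer-2", "status", "active", "2023-01-02"),
    ("customer-1", "credit_limit", 1500, "2023-02-01"),
]


def pivot_latest(rows):
    """Pivot EAVT rows into one record per entity, keeping the latest value per attribute."""
    table = defaultdict(dict)
    # Processing rows in timestamp order lets later values overwrite earlier ones.
    for entity, attribute, value, _timestamp in sorted(rows, key=lambda r: r[3]):
        table[entity][attribute] = value
    return dict(table)


wide = pivot_latest(eavt_rows)
print(wide)
```

Each attribute becomes a column in the pivoted result, and entities that never received a given attribute simply lack that key.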

The EAVT model, with its emphasis on Entity, Attribute, Value, and Timestamp, presents a refreshing perspective in the realm of data handling. One of its most pronounced advantages is the reduced need for schema modifications. This, in turn, provides a greater degree of modularity and facilitates simpler iterations. When working with large datasets, such a model proves invaluable, allowing data handlers to adapt swiftly to changes.
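The point about schema modifications can be illustrated with a short sketch. The data and the `preferred_language` attribute are hypothetical; the example only shows that a new attribute arrives as a new row while the four-field row shape stays the same.

```python
# Hypothetical EAVT rows: (entity, attribute, value, timestamp).
eavt_rows = [
    ("customer-1", "status", "active", "2023-01-01"),
    ("customer-2", "status", "blocked", "2023-01-02"),
]


def attributes(rows):
    """Return the distinct attribute names present in the dataset."""
    return sorted({attribute for _entity, attribute, _value, _timestamp in rows})


before = attributes(eavt_rows)

# A new attribute is appended as a row; no column is added to the table.
eavt_rows.append(("customer-1", "preferred_language", "pt-BR", "2023-03-01"))

after = attributes(eavt_rows)
row_widths = {len(row) for row in eavt_rows}
print(before, after, row_widths)
```

In a wide tabular model, the same change would mean adding a column and backfilling it for every existing row.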

However, every silver lining has a cloud. While the EAVT model is revolutionary in many ways, it may not be the perfect fit for all scenarios. For instance, when dealing with smaller tables, implementing the EAVT model can be seen as excessive, perhaps even cumbersome. Another challenge arises when handling intricate business logic. In such situations, the model demands complex manipulation, which can be daunting for those unacquainted with its details.
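As a rough illustration of that extra manipulation, consider a rule that combines two attributes at a point in time. The rule, data, and helper names below are hypothetical; the sketch assumes ISO-formatted timestamps and shows that the state has to be rebuilt per entity before the rule can be evaluated.

```python
# Hypothetical EAVT rows: (entity, attribute, value, timestamp).
eavt_rows = [
    ("customer-1", "status", "active", "2023-01-01"),
    ("customer-1", "credit_limit", 1000, "2023-01-01"),
    ("customer-2", "status", "active", "2023-01-02"),
    ("customer-2", "credit_limit", 3000, "2023-01-02"),
    ("customer-1", "credit_limit", 2500, "2023-02-01"),
    ("customer-2", "status", "blocked", "2023-02-15"),
]


def state_as_of(rows, as_of):
    """Rebuild each entity's attributes as they were on the `as_of` date."""
    state = {}
    for entity, attribute, value, timestamp in sorted(rows, key=lambda r: r[3]):
        if timestamp <= as_of:
            state.setdefault(entity, {})[attribute] = value
    return state


def active_high_limit(rows, as_of, min_limit):
    """Hypothetical rule: active customers whose limit is at least `min_limit`."""
    state = state_as_of(rows, as_of)
    return sorted(
        entity
        for entity, attrs in state.items()
        if attrs.get("status") == "active" and attrs.get("credit_limit", 0) >= min_limit
    )


january = active_high_limit(eavt_rows, "2023-01-31", 2000)
march = active_high_limit(eavt_rows, "2023-03-01", 2000)
print(january, march)
```

In a wide table the same rule would be a single filter over two columns, which is why the stacked form can feel heavy for logic that spans many attributes.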

Nubank has developed a robust framework for EAVT manipulation, equipped with monitoring tools, alert systems, and business rule trackers.

In conclusion, our journey within the intricate world of data has been both challenging and enlightening. Despite the milestones achieved, it feels as if we’re just starting, and we eagerly look forward to the endless possibilities ahead!
