Written by: Felipe Almeida
Reviewed by: Luiz Felix, Fernando Santos, Hellen Lima

Recruitment processes are always a hot topic in any field, as they can be lengthy and intense. At Nu, we constantly discuss how to streamline them and focus on the essential skills for each role. Our interviews aim to gather as much information as possible about candidates to make the best hiring decisions. However, we know it’s impossible to summarize all of a person’s experiences and skills in just a few questions, answers, and exercises during a limited number of interviews.

Nubank has been using data science since its founding in 2013, which means we’ve interviewed many candidates for various positions, such as Data Scientists, Machine Learning Engineers, and Data Science Managers.

If you’d like to know more about the hiring process for Data Science and Machine Learning Engineering roles, this is the right article!

Here are our tips:

1. Understand the problem first

In many situations, candidates may be asked how they would solve a specific problem using machine learning. In Nu’s hiring processes, it’s common for the questions to address real business problems, requiring candidates to consider different possibilities, their pros and cons, and the implications of these decisions for the business as a whole.  

Often, understanding the problem is more important than the solution itself. When the problem is clearly defined, identifying and validating potential solutions becomes significantly easier.  

The main point here is to ensure that candidates ask questions and understand the problem and the possibilities involved before starting to work on the solution. At some point, for example, it may be pertinent to use data exploration techniques to identify patterns in past events and, from that, propose the development of a machine learning model. In summary, the candidate should explore these decisions and reflect on them.

Check our job opportunities

2. Recognize when not to use Data Science or Machine Learning solutions

It’s essential to focus more on business outcomes than on how those outcomes are achieved. Data science is not an end in itself. Here are some examples of when machine learning-based solutions may not be appropriate:

  • When the business needs a quick solution to an urgent problem, and model training and development would take too long.
  • When there isn’t enough data available. In this case, a manually created heuristic may be sufficient until a suitable volume of data is collected.
  • When the cost of implementing a machine learning-based solution outweighs the benefit of solving the problem, making this approach unfeasible.

3. Understand how and where machine learning applications differ from traditional software

Machine learning systems share many characteristics with traditional software, but they also present unique challenges. Being aware of these challenges helps you decide whether to use machine learning in the first place, and it matters just as much once you do.

Machine learning production systems need to be connected to the underlying infrastructure. The code needs to be reproducible, version-controlled, and covered by rigorous unit and integration tests. Moreover, reusable pieces should be extracted into libraries and shared with the team. Code complexity must be kept under control so the code is easy to analyze and refactor later.

In summary, everything you’ve learned about basic software engineering also applies to machine learning code.  

Among the main differences between machine learning systems and regular software are:

Data tracking: To reproduce the behavior of a machine learning system, you not only need the exact code used but also the data employed during training and prediction. If you want to know more about this topic, this post introduces it.
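As a minimal sketch of the data tracking idea, the snippet below fingerprints a training set and stores the fingerprint next to a code version, so a trained model can later be tied back to both. The function names, the manifest fields, and the toy rows are all hypothetical and only illustrate the principle; they are not how any particular tool does it.

```python
import hashlib
import json


def fingerprint_rows(rows):
    """Return a SHA-256 hex digest of the rows, independent of row order."""
    # Serialize each row deterministically, then sort so order does not matter.
    serialized = sorted(json.dumps(row, sort_keys=True) for row in rows)
    digest = hashlib.sha256()
    for line in serialized:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def build_training_manifest(rows, code_version):
    """Bundle what is needed to tie a trained model back to its inputs."""
    return {
        "code_version": code_version,  # assumed to be something like a commit hash
        "n_rows": len(rows),
        "data_sha256": fingerprint_rows(rows),
    }


# Hypothetical toy dataset, for illustration only.
training_rows = [
    {"customer_id": 1, "income": 3200.0, "defaulted": 0},
    {"customer_id": 2, "income": 1500.0, "defaulted": 1},
]
manifest = build_training_manifest(training_rows, code_version="abc1234")
print(json.dumps(manifest, indent=2))
```

If either the code version or the data fingerprint changes between two training runs, you know the runs are not directly comparable.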

Silent failures: A machine learning system can produce unexpected results without a clear warning. For example, if an important feature for your model fails due to data issues, the system will continue making predictions, unaware that it’s receiving inadequate data.
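One way to make such failures loud is to validate inputs before calling the model. The sketch below assumes a model that is just a callable taking a dictionary of features; the feature names and the plausible ranges are made up for illustration.

```python
import math


# Hypothetical expectations for each feature: (min, max) of plausible values.
EXPECTED_RANGES = {
    "income": (0.0, 1_000_000.0),
    "age": (18.0, 120.0),
}


def find_feature_problems(features, expected_ranges=EXPECTED_RANGES):
    """Return a list of readable problems; an empty list means inputs look sane."""
    problems = []
    for name, (low, high) in expected_ranges.items():
        value = features.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif isinstance(value, float) and math.isnan(value):
            problems.append(f"{name}: NaN")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} outside [{low}, {high}]")
    return problems


def predict_with_checks(model, features):
    """Refuse to predict on suspicious inputs instead of failing silently."""
    problems = find_feature_problems(features)
    if problems:
        raise ValueError("Invalid model inputs: " + "; ".join(problems))
    return model(features)
```

Whether to raise an error, fall back to a default decision, or only log a warning is a business choice; the point is that the bad input gets noticed.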

4. Find the balance between model performance and complexity

More complex models usually offer superior performance but require more hardware resources and a more sophisticated deployment environment. In practice, this translates into higher execution costs, increased maintenance needs, more challenging monitoring, and greater difficulty in understanding how the data fed into the model influences its predictions.

Therefore, it’s worth considering the following aspects when evaluating whether a simpler or more complex model is appropriate to solve a given problem:

  • What is the practical gain from replacing a simple model, like a decision tree, with a neural network, for instance?
  • How should you weigh the number of explanatory variables to be added to the model?
  • What’s the cost-benefit of optimizing the model’s performance?

The answer to these questions is usually “it depends on the business demand.” The job of a Data Scientist is to help stakeholders understand where to draw the line and where diminishing returns begin.
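To make the idea of diminishing returns concrete, here is a rough sketch that picks the cheapest model whose validation score is within a tolerance of the best one. The `Candidate` class, the scores, the costs, and the tolerance are all invented for illustration and are not real benchmark results; in practice the tolerance is exactly what you would agree on with stakeholders.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float         # validation metric, higher is better
    monthly_cost: float  # estimated cost to run and maintain


def simplest_good_enough(candidates, tolerance):
    """Pick the cheapest candidate whose score is within `tolerance` of the best."""
    best_score = max(c.score for c in candidates)
    good_enough = [c for c in candidates if best_score - c.score <= tolerance]
    return min(good_enough, key=lambda c: c.monthly_cost)


# Made-up numbers, purely illustrative.
candidates = [
    Candidate("decision_tree", score=0.81, monthly_cost=100.0),
    Candidate("gradient_boosting", score=0.86, monthly_cost=400.0),
    Candidate("neural_network", score=0.87, monthly_cost=2500.0),
]

choice = simplest_good_enough(candidates, tolerance=0.02)
print(choice.name)
```

With these numbers, a tolerance of 0.02 selects the gradient boosting model: the neural network scores slightly higher but costs far more, while the decision tree falls too far behind.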

5. Consider how models drive business outcomes

The higher your seniority level, the greater the expectation that you’ll be able to identify how machine learning models help solve business problems. Therefore, it’s important to be able to explain how using a more complex model can translate into customer satisfaction, reduced wait times, or higher success rates, and how these aspects can be evaluated.  

Contextualizing the role of a machine learning model within the business flow may depend on your role. Here are some questions that can help during this process:  

Data Science

  • “How would you measure the business impact of this model?”  
  • “If I told you that non-technical users will rely on this model’s results, how would that change your model choice?”  

Machine Learning Engineering

  • “How would an application use the predictions produced by this model?”  
  • “Thinking about scenario X, what are the pros and cons of running this model as a real-time service versus a daily task?”

6. Don’t be afraid to say “I don’t know”

It’s almost certain that a candidate will be asked about topics they don’t have experience with, so don’t be afraid to be honest in these situations. No one expects you to know everything, just that you can learn what’s needed as part of your daily work.  

Pretending you know something you don’t know is much worse than admitting your lack of knowledge. The interviewer will likely ask more detailed questions and notice your lack of understanding. It’s perfectly okay to admit inexperience in a particular subject and try to learn more about it from the interviewer.  

Still, you may have some related knowledge that helps you make an educated guess, as long as you make it clear that it’s speculation. When asked about an unknown topic, you can respond with something like, “I’m not exactly sure what you mean by X, but based on the context and my past experience, I think it’s something like… Is that correct?”

7. Expect to be asked about areas where you claim expertise

It’s worth highlighting experience in specific areas related to data science and machine learning, especially when it connects with the position you’re applying for. If you’re interviewing with a computer vision company, for example, mention that you worked with convolutional neural networks and image processing in your previous job. Naturally, this will attract specific questions about those areas. So it’s important to keep your resume updated and make sure you can answer questions on those topics.

8. Consider the lifecycle of a machine learning application

It’s important to understand how a machine learning system evolves from an idea to a real production system. Let’s briefly go over the three main stages of a system’s life cycle. Our blog has posts dedicated to the issues relevant to each of these stages:

  • Before: Before having a functional model, you need to understand the problem to be solved and make sure DS/ML is the way to go. At this point, you’ll probably talk to several stakeholders, like product managers, business analysts, and software engineers. Make sure all parties involved understand the limitations and how the model will be used.
  • During: During data analysis and modeling, many assumptions are tested, and once again, there will be much communication to validate assumptions and keep stakeholders updated as the project progresses. At this stage, you usually need to think of a business rules layer that will turn the model’s output into final decisions for the application. The first deployment is a crucial project step where many things can go wrong. This is when the model is integrated into the infrastructure it will be part of. Tests and adjustments will be needed before it’s fully ready for production use.
  • After: After the model is deployed, it’s essential to set up monitoring routines to ensure that inputs and outputs behave as expected. Finally, you need to consider if and when the model should be retrained, and whether the model itself will affect future training sets.
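For the monitoring part of the "After" stage, one possible check is to compare the distribution of a feature at training time with its recent distribution in production. The sketch below uses the Population Stability Index (PSI) as one such measure; the bin edges, sample values, and function names are assumptions chosen for illustration, and any alerting threshold would need to be tuned for your own case.

```python
import math


def bin_proportions(values, edges, floor=1e-6):
    """Share of values falling into each bin defined by consecutive edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Values outside the edges are clamped into the first or last bin.
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    total = len(values)
    # A small floor avoids log(0) when a bin is empty.
    return [max(c / total, floor) for c in counts]


def population_stability_index(expected, actual, edges):
    """PSI between a reference sample and a recent sample of one feature."""
    e = bin_proportions(expected, edges)
    a = bin_proportions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical samples of a single feature.
edges = [0, 10, 20, 30]
reference = [5, 15, 25, 5, 15, 25]  # seen at training time
recent = [25, 25, 25, 15, 25, 25]   # seen in production

psi_same = population_stability_index(reference, reference, edges)
psi_shifted = population_stability_index(reference, recent, edges)
print(psi_same, psi_shifted)
```

A PSI of zero means the two samples fall into the bins in identical proportions; larger values indicate a stronger shift and can be used to trigger an investigation or a retraining discussion.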

All information here is to be taken as a rough guideline only. While we have tried to make the text as widely applicable as possible, what works for us at Nubank may not necessarily be suitable for every situation.
