Written by: Felipe Almeida
Contributors: Arthur Kamienski, Caique Lima, Sarah Malaman, Vitor Pinheiro

Introduction

Data Science (DS) and Machine Learning (ML) practitioners rely on many tools, even though this is not always immediately obvious. Whenever we import a library into a Python notebook or run commands such as grep, find and sed on UNIX-like systems, we are using and benefiting from a tool somebody else wrote.

But we can also write our own tools. Tools usually emerge when somebody detects repeated functionality across multiple products or projects. The natural programmer instinct is to extract the common behavior into a separate place to avoid duplication and make future changes easier. Thus a software library is born.

The use of tools such as libraries (but also UI-based Applications, Platforms and command-line Applications) across an entire organization is commonplace in modern technology companies. This strategy is usually termed inner-sourcing. While it offers considerable advantages, it also comes with challenges that need careful consideration.

The creation of organization-wide software tools for internal consumption has been termed inner-sourcing in technical media

In this post we’ll list lessons we learned over the years using internal tools at Nubank. These lessons cover not only technical points, but also subjective issues one must heed to ensure the success of these tools, treating them as internal products.

The article is aimed at teams that build and maintain internal tools to be used by other teams, especially those with DS and ML use-cases in mind.

We’ll start by summarizing the whys of building internal tools with a brief overview of the main types, then list the lessons we deem most important, in no particular order.

Check our job opportunities

Why build internal tools?

We build tools because they make us more effective and efficient. This is true at the individual and especially at the organizational level, as there are many emergent and nonlinear phenomena that only take place once you reach a certain scale.

Tools drive standardization 

  • If teams use a common set of libraries, it’s easier to make organization-wide changes. Once you ship a new library version and all teams update it, boom: changes are applied everywhere. This is often the case with security and logging features.
  • If all teams use the same set of tools, people can change teams and quickly get productive.

Tools increase the impact of local solutions

  • Enhancements and optimizations created by small teams (or even individuals) can be quickly shared among everyone in the organization. This is a massive amplifier effect that grows with the size of the company.

Tools encode tacit knowledge

  • Tools encode knowledge that would otherwise be scattered across multiple teams or, even worse, only live tacitly in the heads of a few key people. Once they leave the company, the knowledge is lost as well.

Tools reduce risk and mistakes

  • Using a common set of robust, time-tested functions/libraries is much less risky than having each team build their own. 
  • Tools can be used to enforce organization-wide principles and rules, by incorporating those into the code. These include access control and logging functionality, for example.

Tools decrease time-to-market (TTM)

  • No analysis paralysis: there’s no need to spend time researching what is the best way to handle each problem when there is a default tool built for that.

Types of internal tools

At the end of the day, software tools are just another piece of code. But if we look closer, there are different types of tools with respect to the interface they expose to users and the capabilities they provide. Each of them plays a separate role in a tech organization.

A key distinction is that of tactical vs strategic tools: those that are usually used within at most 1-2 teams and those that can be shared by the whole organization — potentially becoming a key asset for a company.

Figure 1 below summarizes the main types: UI-based and Command-line Applications, Libraries and Platforms.

Figure 1: Types of internal tools. UI-based and CLI applications  are usually smaller in scope and complexity. Libraries and platforms are built specifically to be reused by multiple teams.

It’s easy to see the difference between UI-based and command-line applications, but this is not the case with libraries versus platforms: One key difference is that platforms usually require one to go all-in on them. You can’t normally pick and choose which parts of a platform you want to use — you either use it all or you don’t. Compare this with libraries: you can use multiple libraries at the same time, mixing and combining them in multiple ways, but once you choose a platform you stick with it.

Libraries are collections of software that allow you to pick and choose what you want to use. Platforms are usually opinionated enforcing design decisions that you can’t opt out of.

Another difference is that a platform is usually opinionated; clear design choices are made and you need to embrace them if you choose to use it. For this reason, platforms are great to enforce standardization and organization-wide patterns that all teams should adhere to.

With the intro out of the way, let’s dive into the actual meat of the article.

Internal tools are products and should be managed as such

Internal tools are products and must be managed as such. You must think of internal users as customers — which is sometimes hard due to the proximity and informality of day-to-day operations.

Ideally you should have dedicated product managers understanding and interacting with users — and setting up priorities based on what they need. If that’s not possible, there should at least be people with product management skills in the team.

The bare minimum you need to manage an internal product are: a clear roadmap, structured feedback from users, and a good support channel.

  • Roadmaps: A clear, public roadmap with milestones helps users and the developing team understand what is planned and which tasks have priority over others.
  • Structured feedback routines: You need to survey users from time to time, to get positive and negative feedback. A combination of short, objective questions with an open-ended feedback space usually works well. Prefer anonymous feedback to make sure clients don’t refrain from criticizing due to office politics. Make it clear that any feedback (especially constructive criticism) is welcomed.
  • Support channels: A Slack channel will do at first, but it’s hard to scale as the number of users increases: you may soon need ticket or issue-tracking systems to handle the load, which enable you to quantify and measure how good support is. 

Examples and good documentation help reduce the need for 1:1 support. Also, make sure users know how to search Slack for answers to frequent questions.

Evangelization is key

Hardly anyone wants to learn yet another tool — especially if the advantages aren’t immediately apparent. We must actively practice evangelization, promoting the tool to prospective users.

In some instances, the adoption of a tool might be mandated through top-down approaches, such as incorporating it into Objectives and Key Results (OKRs) or other corporate goal-setting frameworks. In these scenarios, there’s probably not a lot of need for evangelizing. But that’s not our focus here: we’d rather build tools users want to use, because they see value in it — not because those are forced upon them.

A good tool that “just works” and is liked will always need less marketing than one hated by users. 

From the point of view of customers, using a new tool means having to learn it and maybe change some aspect of their work. 

This can be a significant barrier, because people generally resist change, especially when it involves modifying a workflow they are accustomed to and proficient in. Several factors are at play here: perceptions about job security, ego issues and perceived loss of influence. People don’t want to let go of something they spent 2 years mastering, especially if they think it confers status on them within the organization.

In our experience, it’s better to show value first. If you do it right, the decision to adopt your tool will be obvious and easy. Here is how:

  • Pilot projects: The development team themselves can execute initial projects using the tool to generate early success stories that encourage adoption.
  • Support channels: We believe that around 30% of a tool’s success comes from good support. A dedicated Slack channel for prompt, judgment-free assistance significantly enhances user willingness to engage with the tool.
  • Accessible resources: Providing ample examples and maintaining a repository with clear, well-written source code allows users to easily understand and utilize the tool. The support channel (e.g. Slack) doubles as a searchable repository of solutions, further facilitating user learning.
  • Quantifying benefits: Demonstrating the tool’s potential impact in terms of time and monetary savings helps make a compelling case for its adoption. Use approximate dollar amounts and time estimates (e.g. man-hours saved) to make the impact more concrete.

Logging, instrumentation and user feedback

When building internal tools, there is a real risk we’ll just assume we know what users need and how they’re using our tools.

Use data instead. Without data, product management is reduced to hunch-based decision-making, rendering it a purely political endeavor where the highest-paid person’s opinion (HiPPO) wins.

Collecting information enables the maintainer team to understand how users are using the tools: it helps gauge customer satisfaction but also detect focus areas you should concentrate efforts on — making maintenance more effective.

We see two main ways of collecting information: implicitly and explicitly, via system instrumentation and surveys, respectively. Each provides insights from a different angle and both have advantages and disadvantages, as we’ll see next.

Two ways of collecting tool usage data: implicitly (logs, instrumentation, dashboards) and explicitly (user surveys)

Collect data implicitly via logging and instrumentation. Every programming language supports logging, which can then be visualized in tools such as Splunk or Grafana. The main advantage of implicit feedback is that it’s impossible to fake — but it takes work to set up.

Here are some examples of what information can be obtained via logging:

  • How many customers are using the tool over time?
  • Which specific functionalities/components are used the most?
  • Error messages / exceptions;
  • Detect users who started using the tool but eventually churned;
  • Measure loading times, response times.
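As a minimal sketch of instrumentation (the logger name and the `load_dataset` function are hypothetical), a decorator can record every call’s name, outcome and duration, producing log lines that tools such as Splunk or Grafana can later aggregate into the usage metrics listed above:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("mytool.usage")


def instrumented(func):
    """Log each call's name, status and duration for later aggregation."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "error"  # overwritten on success; errors surface in logs too
        try:
            result = func(*args, **kwargs)
            status = "ok"
            return result
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("call=%s status=%s elapsed_ms=%.1f",
                        func.__name__, status, elapsed_ms)
    return wrapper


@instrumented
def load_dataset(name):
    # Stand-in for a real tool function being instrumented.
    return [name]
```

Grepping these structured `call=... status=...` lines over time answers questions like "which functions are used most?" and "which users churned?" without ever asking users directly.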

On the other hand, user surveys are good for explicitly obtaining feedback, as long as users feel comfortable giving it. The main risk is that internal customers may avoid giving honest and sometimes negative feedback due to fear of office politics and retaliation. User surveys should be anonymous to avoid that.

User surveys should be anonymous. People will never be comfortable delivering negative feedback if they feel comments may hurt their career in the organization.

When choosing surveys à la Google Forms, be mindful of your customers’ time: Pack as much as possible into a short form — people have better things to do than answering a 20-page survey on your product.

Prefer questions with numerically-ranked answers (e.g. Likert scale) to enable comparisons over time and be as specific as possible: “On a scale of 1 to 5, how much would you say tool X has increased your productivity when doing task Y?”.

Finally, make sure to include 1-2 open-ended questions to allow general opinions and advice from users.

Examples are the best type of documentation

Having documentation is a common requirement for internal tools — especially when you need to cater to nontechnical users. 

However, good documentation costs a lot to write and especially to maintain. Adding features to tools will always have priority over updating the docs. So the tools evolve and documentation lags behind.

The usual path for tool documentation is to grow stale over time and die a slow death, eventually getting to the point where nobody trusts it anymore. Sometimes it even becomes outright misleading: instructions that don’t make sense anymore, references to features that have been changed, etc.

Also, nobody wants to read documentation. Users want to get things done as quickly as possible. The fastest way is usually by looking at examples and trying to figure out how to adapt those to their own use case. Make it easy for them by having a set of canonical examples as part of the official documentation.

People nowadays have short attention spans and nobody has the time – or the inclination – to read written documentation to learn how something works, except as a last resort. Focus on examples instead.

In addition to the benefits above, a good suite of examples also helps reduce support effort. It saves the support teams from having to address questions that could be answered by pointing at an example.

Examples can even double as integration tests for tools — thus making sure the examples are always validated against new builds, as part of your CI/CD flow. Your documentation is now testable — which is the best of both worlds, preventing it from becoming stale.
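One lightweight way to do this in Python is `doctest`: the example lives in the docstring, so it is documentation users read first, and running it in CI fails the build the moment the example drifts out of sync with the code (the `normalize` function below is a hypothetical illustration):

```python
def normalize(values):
    """Scale a list of numbers to the 0-1 range.

    The example below is both documentation and a test; running
    `python -m doctest` on this file (e.g. in CI) fails if the
    example and the implementation ever disagree.

    >>> normalize([0, 5, 10])
    [0.0, 0.5, 1.0]
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    import doctest
    doctest.testmod()  # exits nonzero output on failure, wiring docs into CI
```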

Finally, examples help prevent “bad practices” from propagating among the user base. You’ll be surprised by how many users copy and paste other people’s code in an attempt to make things work.

Examples are the best type of documentation, especially for internal tools to be used by technical folks such as data practitioners. They are easier to create, validate and maintain than written text; they are closer to the source of truth (the source code) and quicker to consume. Examples are second only to the code itself.

Maintain Consistency

When someone uses a tool they need to build a mental model of how it works. A simple example is the pedal setup in manual cars. 

Car brands vary widely with respect to colors, sizes, and even which side the steering wheel is on, but they are consistent with respect to the pedal layout: once you’ve built a mental model of how to use the pedals, you can rely on it no matter what car you drive. This can be seen in Figure 2 below.

Figure 2: The vast majority of manual (i.e. non-automatic) cars follow this pedal layout. Once you have learned how it works, every car looks the same.

Now imagine if every different car brand had a different pedal setup. When driving a new car, you would never know what to expect and you’d have to build a whole new mental model and re-learn how to drive it.

Cars differ widely with respect to colors, styles and sizes, but they are consistent when it comes to the pedal setup: the mental model (clutch, brake and gas) doesn’t usually vary.

Internal tools such as libraries and platforms are also like that. The more consistent a tool is — using the same patterns, names, structure and conventions everywhere — the faster users can build a mental model, and the more intuitive the tool will feel.

This helps prevent errors, reduces the need for support and, most importantly, decreases the cognitive burden of your tool. Using a consistent tool means that once you’ve learned how to accomplish one specific task, you’ve learned them all – as is the case with cars and pedal setups.

We all know tools that feel easy and intuitive, and those you can never quite remember how to use (and need to constantly google for examples). Pandas and Matplotlib are examples that come to mind here: although very good libraries, there’s some inconsistency in names, argument order and function semantics.

A good rule of thumb to keep in mind is the principle of least surprise, whereby you, the tool maintainer, always choose to structure things in a way that feels more natural, or less surprising to the user.

The principle of least surprise: In interface design, always do the least surprising thing.

The biggest advantage of consistency is to decrease the mental burden of users, but we have found it even helps in discoverability: users are better able to explore your tools if things are consistently named and structured.

Here are some practical tips on how to achieve consistency in internal tools:

  • Convention over configuration: Made famous by the Ruby on Rails framework, it states that things should follow some convention by default. For example, classes should be named FooBar (in CamelCase), and the corresponding database table should be called foo_bar (snake case).
  • Consistent parameter ordering: Keep the order of arguments in functions consistent! If you have multiple functions where each takes a name and a file_path, make it so that parameter order is kept the same! E.g. name is always the first argument and file_path the second.
  • Consistent structure: If you structure a library in folders, make sure they all follow the same patterns, for example: countries/business_areas/squads. This helps users find things more easily.
  • Consistent ordering: When listing anything, order things in the same way. Finding things is made much easier if one knows they are ordered alphabetically for instance.
  • Use the same names for the same things: If your tool uses visual elements and you’ve decided to call them “elements”, always call them “elements”, in the source code, in the documentation, everywhere. Not “visual components”, not “widgets”. Using different names for the same things (or, conversely, the same name for different things) makes it harder for users to understand complex tools.
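The parameter-ordering and naming tips above can be sketched in a few lines (the `save_element`/`load_element` functions and the “element” vocabulary are hypothetical): every function takes `name` first and `file_path` second, and the same noun is used in code, docstrings and error messages alike.

```python
def save_element(name, file_path):
    """Write the element called `name` to `file_path`."""
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(name)


def load_element(name, file_path):
    """Read the element called `name` back from `file_path`."""
    with open(file_path, encoding="utf-8") as f:
        contents = f.read()
    if contents != name:
        # Same vocabulary ("element") in the error message, too.
        raise ValueError(f"element {name!r} not found in {file_path}")
    return contents
```

Because the argument order never flips between functions, a user who has called one of them can call the other without rechecking the docs.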

Beware of Dependencies

As people start using tools you build, they will inevitably start to depend on those to get their job done. If the tools change in unexpected ways, this will impact users and prevent them from working.

This is actually a good problem to have; it means that people are extracting enough value from your tools to make it part of their workflows. The only tools without dependency concerns are those that nobody uses.

But dependencies are a problem because they limit the ability to change and enhance the tools: updates may break workflows that currently depend on the tools.

Like any software, internal tools need to change and evolve as more features are added. However, the more tightly users have coupled their workflows with your tools the harder it’ll be to add features and make the necessary changes to combat entropy and software rot.

You must limit coupling between users and tools, so that when the time comes to add new features and refactor, you can do it safely.

More specifically:

  • You need extensive integration testing so you can validate that the user-facing API stays stable as you add more features and make changes. Without integration testing, maintainers won’t ever be confident enough to make big changes to the codebase, leading to degradation and code rot. Extra points if you add these to CI/CD workflows.
  • Use versioning from the get go. For libraries, use semver (Semantic Versioning), which helps users choose specific versions of tools to depend upon, in a predictable way. For other types of tools, make sure that users know what version (v1, v2, etc) of the tool they are working with. This enables you, the tool maintainer, to create new versions while keeping the old ones still active.
  • In libraries, limit the number of public members such as functions and classes; keep the rest private. In dynamic languages such as Python, users are still able to call private members, but they are aware that they are operating at their own risk.
  • Decouple the Public API from the internal implementation. This includes creating syntactic sugar and nicer import structures to call the core tool functionality. This allows you to make whatever changes you need to the tool internals; everything will work as long as the Public API is kept the same. Figure 3 below shows an example.
Figure 3: By restricting user access to a carefully thought out Public API, you create an indirection layer, which reduces coupling between users and your tool. All changes will be contained in the Public API and will be much easier to manage.


Hyrum’s law is an extreme interpretation of the dependency problem; it states that clients will depend, not only on the public API your tool exposes, but on any observable behavior that can be inferred or seen from outside. 

According to Hyrum’s law, every public detail that can be seen or inferred from outside will eventually be depended upon by someone as the number of clients increases.

The law was originally about dependencies between software modules, but the analogy is still valid; it’s just impossible to guarantee that there’ll never be breaking changes as you update your tools. But still, there’s a lot we, the tool maintainers, can do to reduce the risks by keeping APIs stable while making sure the tool is updated as needed.

80/20 Thinking

Good tools do not try to embrace the world. They strike a balance between how much freedom and how much constraint they expose to users: too few constraints make tools too unstructured and hard to use; too many, and advanced use-cases will be left out.

You should focus on making sure that the 80% of common tasks just work in a natural, intuitive, and efficient way. But you must also leave space for the advanced use-cases — the remaining 20%. This means supporting some level of customization to cater to advanced users who will have complex and sometimes unorthodox use-cases you had never previously considered.

With a good tool, the 80% of the use-cases are made easy, while the remaining 20% are made possible.

Users will always find a way to get their job done. If they cannot adapt your tool to their use-case, they will not use it. Full stop. It is your job as the tool maintainer to write a tool that just works for the simple use-cases but can still be customized for advanced ones.

Inner-source tools are open-source within the organization; so the first way to encourage customization is to follow good SE practices and write clean code, so that users can, themselves, understand it and eventually contribute Pull Requests (PRs) to the codebase. 

Three strategies have been particularly useful for us:

Object-orientation enables easy overriding of default functionality

Object-oriented programming isn’t suitable for every use-case, but it works well when you want to expose components that can be extended by users if need be. This is even easier in dynamic languages such as Python (the lingua franca of the DS/ML world) where you can extend core classes and override default behavior at runtime by simply adding methods.

This has proved useful on multiple occasions; many advanced users can code so they hack their way into achieving whatever they want by creating slightly modified subclasses in their own code.
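As a small illustration (the exporter classes are hypothetical), an advanced user can subclass a component the tool ships and override just the piece they need, entirely in their own code:

```python
class CsvExporter:
    """Default exporter shipped with the (hypothetical) tool."""
    delimiter = ","

    def format_row(self, row):
        return self.delimiter.join(str(v) for v in row)

    def export(self, rows):
        return "\n".join(self.format_row(r) for r in rows)


# An advanced user overrides one attribute in their own codebase;
# no changes to the tool itself are required.
class PipeExporter(CsvExporter):
    delimiter = "|"
```

`PipeExporter().export([[1, 2], [3, 4]])` reuses all of the parent’s logic while swapping the delimiter, which is exactly the kind of “slightly modified subclass” escape hatch described above.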

Sensible defaults

Most tools have configuration options: a config file, “settings” screens on an app, or something like that. Configuration options enable user customization, so you definitely need them. But you don’t want users to have to set configuration options for every single task.

Configuration options should have sensible defaults that cater to the majority of cases — while enabling customization when appropriate. Deciding which default values to use requires you to understand your average user and how they use your tool. Find these out by interviewing them and studying their needs.
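In code, sensible defaults often take the shape of keyword arguments tuned to the average case (the `read_table` function below is a hypothetical sketch): the common call needs no configuration at all, while every knob remains overridable.

```python
def read_table(path, *, delimiter=",", encoding="utf-8", skip_blank=True):
    """Load a delimited text file into a list of rows.

    The defaults cover the most common case for our (assumed) average
    user -- a UTF-8 CSV with blank lines ignored -- so `read_table(path)`
    just works; power users override only the options they need.
    """
    with open(path, encoding=encoding) as f:
        lines = (line.rstrip("\n") for line in f)
        return [line.split(delimiter) for line in lines
                if line.strip() or not skip_blank]
```

Note the keyword-only arguments (after `*`): callers who do customize must spell the option out by name, which keeps call sites readable and lets you reorder options later without breaking anyone.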

Ease of use over code elegance, every time

When you need to choose between ease of use and internal code elegance, favor ease of use — even at the cost of some technical debt.

Losing users is harder to recover from than a temporary loss of code quality, which should be easy to fix if you isolate user-facing glue code from the core business rules, as we suggested in the Beware of Dependencies section of this blog post.

Conclusion

As we saw in the post, there are many ways to avoid — or at least mitigate — problems when building and maintaining internal tools, especially for technical users such as Data Scientists and Machine Learning Engineers.

If we had to summarize the lessons as a short TL;DR, we’d say: 

  • (a) Treat internal tools as products and users as customers. Everything starts here;
  • (b) Understand incentives. You want to go for carrots, not sticks;
  • (c) Have ways to track how your tool is being used, both implicitly and explicitly.
