Written by: Felipe Almeida
Reviewed by: Gustavo Durães, Juliano Garcia, Renata Andrade, Moisés Rojo, Mateus Magalhães
Introduction
Real-time machine learning (ML) models are frequently used to drive decisions in business processes. While they are more complex to implement and operate than batch models, they are often more effective because they leverage real-time data as features and unlock use cases where immediate user feedback is essential.
Traditional (i.e. non-AI) ML-enabled real-time systems usually work as follows: An external action (e.g. a user interacting with a digital bank mobile app) sets off a chain of events that causes services to send data to a real-time model, which then outputs a prediction based on that data. That prediction is passed on to another service (sometimes called the decision layer) to determine the appropriate business decision (e.g. issue or reject a loan request, accept or deny an insurance policy request). A simple example can be seen in Figure 1:
Features used in real-time ML models often come from user behavior patterns and signals, such as geolocation data, social media engagement, or past user actions. This means that some features may directly be under the user’s control.
Most people are aware that ML models are used in digital systems today (e.g., estimating risk, recommending content, ranking items). Some may attempt to manipulate ML feature values to influence the system’s behavior. Such attempts — usually by ill-intentioned actors — to manipulate feature values are called adversarial attacks.
Models may be prone to adversarial attacks if two conditions are met: (a) features can be controlled in some way by users and (b) there exists an incentive for people to manipulate the model. Real-time ML models are even more vulnerable; the learning cycles are much shorter and decisions made upon the models’ output may happen immediately, with no human intervention.
Let’s explore some examples where real-time ML models may be vulnerable to adversarial attacks:
Example 1: Real-time credit underwriting
The decision of whether or not a retail bank should give out a loan to a customer is often made by ML models – sometimes by real-time models. The reason to prefer real-time over batch models is that quick decisions provide better UX and increase revenue, all else being equal.
This presents a clear conflict; banks want to lend money to people who will repay it, but some customers may want to take out loans even if they are unsure they can repay them. Therefore, there is a financial incentive for some people to try to increase their chances of being approved for a loan by manipulating the ML model.
Example 2: Real-time Fraud Detection in CC purchases
Many organizations use ML to detect fraud. As in the previous example, this creates an incentive for bad actors to try and fool models, as criminals want to carry out their fraudulent operations without getting caught.
One particular example is when a bad actor uses leaked information to try and make purchases using somebody else’s credit card. To protect customers, many payment processors employ real-time ML to detect and prevent such fraudulent transactions.
Example 3: Search Engine Ranking
Search engines use information from web pages to gauge their quality and determine how well they should rank in search results. This is usually done using ML/NLP ranking models.
The proper way to improve a webpage’s ranking is (as one would expect) to produce quality content that users want to click on and consume. The field of Search Engine Optimization (SEO) is built around this.
However, since ad revenue is a function of the number of views a website receives, there is a clear financial incentive for website managers to adversarially attack search engines to inflate their own rankings. This is called Spamdexing and includes many tactics such as using invisible text, link farms, and more.
As an ML practitioner using real-time ML in real-world scenarios, you want to make models less vulnerable to adversarial attacks and manipulation in general.
There are many ways to make them more robust (as we’ll see below) but the only sustainable long-term solution is to embrace the fact that user-facing models will be gamed eventually and have a frequent retraining schedule — so blind spots are quickly incorporated into the next retrain cycle.
Before we start with the tips, let’s address two basic prerequisites: 1) understanding feature importance and 2) having some kind of monitoring setup in place.
Prerequisite 1: Understand feature importance and directionality
You need to know which features are the most important and which of those are vulnerable to user attacks and manipulation. This will help you identify where the majority of the risk lies and which features you need to focus on; it will also enable you to estimate the “blast radius” if a feature is exploited.
At a minimum, you need to know: (1) the absolute importance of each feature in the model; (2) which features are vulnerable to user attacks and manipulation; and (3) how the values of a specific feature affect the overall model output.
Knowing the absolute importance of features in a model (1) is usually easy; most implementations provide some way of identifying which features carry the most signal. There are also “model-agnostic” strategies such as SHAP. Counterfactual explanations may also be useful, especially to probe situations where interactions (rather than individual features) may be exploitable.
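As a rough sketch of this step, the snippet below ranks features in two ways on a public census dataset: using the model's built-in importances and using mean absolute SHAP values. The dataset, model, and hyperparameters are illustrative assumptions, not a description of any particular production setup.

```python
import pandas as pd
import shap
import xgboost as xgb

# Adult census income data, a common public benchmark.
X, y = shap.datasets.adult()
model = xgb.XGBClassifier(n_estimators=100).fit(X, y)

# Built-in importances: quick to obtain, but direction-agnostic.
print(pd.Series(model.feature_importances_, index=X.columns)
        .sort_values(ascending=False))

# Model-agnostic alternative: mean absolute SHAP value per feature.
shap_values = shap.TreeExplainer(model).shap_values(X)
print(pd.Series(abs(shap_values).mean(axis=0), index=X.columns)
        .sort_values(ascending=False))
```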
Knowing which features are vulnerable to attacks (2) is 100% dependent on the model implementation. You may need to collaborate with different team members (software engineers, data engineers, etc.) to understand exactly what information sources are used to build the model features at inference time – and whether end-users have control over those.
The two subsections below explain how the values for a specific feature affect the overall model output (3).
Understand directional feature importance
Feature importance for a trained model is usually presented as a list of features ranked by how much each feature contributed to the output, averaged over the training set.
But to protect models against abuse, it’s not enough to know that a feature is important; you also need to understand in what direction that feature impacts the output score. Knowing this is crucial because it will tell you where the incentive lies: attackers will naturally want to manipulate features in the direction that benefits them – to the detriment of the organization.
One of the best ways to see this effect is via the SHAP Beeswarm Summary plot, as seen in Figure 5 below. In this example, the model output is the likelihood that a person made over US$50k in the 1990s. Features on the Y-axis are ranked by their “absolute” importance, while the X-axis shows the direction of each value's impact on the final model score.
The Beeswarm summary plot is also useful for debugging features during model development: you can check whether the chart matches your intuition of how each feature should impact the model score – helping you spot errors such as faulty features.
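Reusing the model, SHAP values, and dataset from the sketch above, a single call draws a Figure-5-style summary plot:

```python
# Beeswarm / summary plot: features ranked on the y-axis, per-instance SHAP
# values on the x-axis, coloured by the underlying feature value.
shap.summary_plot(shap_values, X)
```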
Understand value-dependent impact for key features
Understanding the rough correlation between features and the model output is useful, but knowing which values to watch out for matters even more. This is especially true in nonlinear models such as gradient-boosted trees or neural networks, where relationships between features and the score may be highly nonlinear and counterintuitive.
Attackers usually learn these relationships through trial and error and they will eventually discover not only which features affect the model score, but also which values of those features move the score up or down.
To stay ahead of attackers, you need to understand which feature values are particularly “dangerous”. Then you can build defenses (such as special monitoring) focused on those values or even decide to remove the feature from the model altogether.
Partial Dependence Plots show the effect of different values of a given feature on the final model score
For example, Figure 6 shows the impact of the “age” feature in a model trained to predict Heart Disease. Now suppose a Health Insurance provider uses this model to give financial benefits to people deemed at a high risk of having heart disease. If “age” is self-reported by customers, ill-intentioned actors trying to receive undue benefits will eventually learn that misreporting their age – especially above 54 years – significantly increases the chances of being awarded those benefits.
Figure 6 above uses Python’s PDPBox library, but PD plots are also available in frameworks such as scikit-learn.
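As a minimal sketch of the scikit-learn route (using a public census dataset rather than the heart-disease data behind Figure 6), one could plot the partial dependence of the score on “age” like this:

```python
from sklearn.datasets import fetch_openml
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)
X_num = X.select_dtypes("number")        # numeric columns only, including "age"
model = HistGradientBoostingClassifier().fit(X_num, y)

# One curve showing how the predicted score moves as "age" changes.
PartialDependenceDisplay.from_estimator(model, X_num, features=["age"])
```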
Prerequisite 2: Have at least a basic monitoring setup
Model monitoring is crucial to observe how the model is working and, most importantly, how it’s impacting business decisions. Without at least some basic model monitoring, you can’t even detect whether adversarial attacks are going on, let alone do anything about them.
You should at the very least have feature monitoring and decision layer monitoring.
Feature monitoring
By feature monitoring we mean charts and plots showing the feature values over time, for every feature. For example, in Figure 7 below we can see what a monitoring asset for a given feature in a model would look like. Although basic, charts such as this are useful to assess the overall health of your model and help detect sudden changes in the distribution, which could be indicative of exploit attempts.
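As a minimal sketch of what backs a chart like Figure 7, assume scored requests are logged with a timestamp column and one column per feature; the path and column names below are hypothetical.

```python
import pandas as pd

scores_log = pd.read_parquet("s3://my-bucket/scoring-logs/")   # hypothetical path

daily = (scores_log
         .set_index("scored_at")                  # assumed timestamp column
         .resample("1D")["reported_income"]       # one vulnerable feature
         .agg(["mean", "std", "count"]))
daily.plot(subplots=True)   # Figure-7-style charts of the feature over time
```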
Decision layer monitoring
Decision layer monitoring could also be called “business monitoring”; this is where one monitors the business decisions that are made as a result of the model output. If we go back to Examples 1 and 2 in the introduction of this blog post, it would mean monitoring the rate of loans denied over time and the rate of CC purchases denied over time, respectively.
Decision layer monitoring answers the question: “How are business decisions being impacted by model predictions?” This is crucial because this is where the actual impact of attacks shows up. Attackers exploit models as a means to an end – the end is always to force the system into a decision that is favorable to them.
This is just the bare minimum monitoring setup. Other dimensions to consider include operational metrics (CPU, RAM, latency), performance metrics (F1, AUC, etc.), and train-serve skew; you should also add alerts on top.
For more monitoring considerations, see the last section in this post.
With the prerequisites out of the way, let’s dive into the tips for making ML models more robust against adversarial attacks.
Tip: Simple alerts help you detect attacks
While the goal is to prevent adversarial attacks, detection is crucial when prevention fails. For real-time ML, you need real-time detection systems that trigger alerts when certain conditions are met. Tools like Splunk, New Relic, and Datadog can be used to create alerts based on log queries.
Attackers know that they have to be quick; once they find a vulnerability they need to exploit it quickly before the targeted organization can defend itself against them. Therefore, alerts aimed at detecting adversarial attacks should try to detect sudden changes that could indicate an exploit is going on.
Also, you want to avoid alert fatigue – i.e. a large number of false positives – so you should base alerts on actual business decisions rather than modeling artifacts: instead of alerts based on raw model scores or features, prefer alerts based on the downstream actions that are impacted by the models, such as loans being issued, fraud incidents detected, customer messages routed incorrectly, and so on.
Here is one example of an alert that matches these criteria: “Raise an alert if the rate of denied CC purchases in the last 6 hours is more than 20% below the same figure last week.” This alert detects a sudden change (within 6 hours) in a business metric (denied CC transactions) and uses a comparison with the previous week to account for seasonality.
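A minimal sketch of that rule, assuming a hypothetical denial_rate helper that queries the decision logs for a given time window:

```python
from datetime import datetime, timedelta
from typing import Callable

def denial_rate_alert(
    denial_rate: Callable[[datetime, datetime], float],  # hypothetical query helper
    now: datetime,
) -> bool:
    """Alert if the 6h denial rate is more than 20% below the same window last week."""
    current = denial_rate(now - timedelta(hours=6), now)
    last_week = denial_rate(now - timedelta(days=7, hours=6), now - timedelta(days=7))
    return current < 0.8 * last_week
```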
One example of what to do once a true-positive alert goes off is to turn off a feature flag, as explained below. For more on alerting best practices, see Best Practices for Real-time Machine Learning: Alerting.
Tip: Frequent model retraining is the only sustainable solution at scale
User-facing ML-enabled systems will always be targets for attackers, especially in cases where there’s a possibility of financial gain. At the end of the day, attackers run a cost-benefit analysis when planning attacks, and your job as a practitioner is to increase the cost so much that it stops being worthwhile for most people to exploit your system; or else hope attackers turn away from a life of crime and become upstanding members of the community instead. Right.
However, trying to fight each attack vector individually with handcrafted conditions eventually makes large-scale systems brittle and hard to maintain, defeating the whole purpose of letting data drive business decisions. It can turn into a whack-a-mole of sorts: you are constantly trying to defend systems against new exploits, attackers soon find new ways to circumvent defenses, and the cycle never ends.
After you have some guardrails to prevent disasters it’s probably better to accept that some exploits will take place — then use those as training data. Retraining models frequently allows you to incorporate exploit attempts into the training set for the next model iterations – making the model learn from these cases, growing more robust over time.
One especially useful practice is to automate model retraining, as it reduces the likelihood of human error and frees up resources to enhance the systems.
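As a minimal sketch (column names, paths, and the model class are illustrative assumptions), an automated retrain can simply fold recent labelled production traffic, including any attempts flagged as exploits, back into the training set:

```python
import pandas as pd
import xgboost as xgb

def retrain(history_path: str, recent_path: str) -> xgb.XGBClassifier:
    history = pd.read_parquet(history_path)
    recent = pd.read_parquet(recent_path)     # last N days of labelled traffic,
                                              # including flagged exploit attempts
    data = pd.concat([history, recent], ignore_index=True)
    X, y = data.drop(columns=["label"]), data["label"]
    model = xgb.XGBClassifier(n_estimators=300, learning_rate=0.05)
    model.fit(X, y)
    return model   # schedule this with your orchestrator (cron, Airflow, etc.)
```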
Tip: Make trial-and-error cycles costlier for attackers
Attackers learn how to game ML-enabled systems through trial and error; they interact with the system repeatedly to learn its “quirks” and idiosyncrasies. Making it harder for them to learn increases the effort needed, and therefore decreases the relative incentive to engage in adversarial attacks.
We will now list examples of how to make these learning cycles costlier; you should, however, keep in mind that these strategies can have unintended consequences, hurting the experience of legitimate users. Your specific use-case will dictate whether or not the tradeoff is worthwhile.
Example 1: Limit the frequency of interactions
You can limit the number of times users can interact with your system in a given time period. For example, customers who want to request an insurance policy may only make one attempt per month. This forces ill-intentioned users to wait much longer between learning cycles, farm multiple accounts, and so on.
This is also why operating systems force you to wait a few seconds before trying again after you type in the wrong password: it reduces the chance of brute-force attacks.
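A minimal sketch of such a limit, assuming a hypothetical per-customer record of the last attempt:

```python
from datetime import datetime, timedelta

ATTEMPT_COOLDOWN = timedelta(days=30)   # e.g. one policy request per month

def may_attempt(customer_id: str, last_attempt: dict[str, datetime]) -> bool:
    previous = last_attempt.get(customer_id)
    if previous is not None and datetime.now() - previous < ATTEMPT_COOLDOWN:
        return False              # too soon: slows down trial-and-error cycles
    last_attempt[customer_id] = datetime.now()
    return True
```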
Example 2: Delay the model decision
The closer in time the user input is to the model response, the clearer the causation links will be, and the easier it will be for people to find patterns. Mitigate this by delaying the model feedback, instead of providing an immediate response.
For example, when customers apply for a mortgage via a mobile application, choose to inform them about the outcome after some hours (even if the outcome is immediately available).
Tip: You may need to trade off robustness for performance
Sometimes the potential impact of adversarial attacks is so high that the only available solution is to purposefully “dumb down” a model by actively reducing its sensitivity — effectively trading off some performance for enhanced robustness.
Here are some ways to achieve this: remove the most vulnerable features from the model altogether, increase regularization (see the next tip), or decrease the granularity of sensitive features (covered further below).
Tip: Regularization is still your friend
Regularization is the process of making models less sensitive to feature changes. This is usually done to prevent overfitting, enhance generalization, and extend the shelf life of ML models, as regularized models tend to decay more slowly. But increasing regularization also helps make your model more robust against adversarial attacks.
There are at least two approaches to regularization with adversarial robustness in mind: the usual model-specific approaches (i.e. hyperparameters available for tuning in ML algorithms) and data-centric approaches (e.g. adding noise to your training dataset to make the model less sensitive to vulnerable features).
Model-specific regularization
By model-specific regularization, we refer to techniques that can only be used for specific ML algorithms. Most often these are model hyperparameters that decrease the model capacity and therefore its ability to overfit. In Table 1 below we list some examples of hyperparameters we have found to be particularly useful:
Gradient-boosted trees (e.g. XGBoost): Gamma, Learning Rate, Maximum Tree Depth, Minimum Child Weight, Number of Trees
Random Forests: Number of Trees
Neural Networks: Dropout, Early Stopping, L1/L2 Weight Regularization, Learning Rate
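As a rough sketch of what turning these knobs looks like for a gradient-boosted model (the values below are illustrative, not recommendations):

```python
import xgboost as xgb

model = xgb.XGBClassifier(
    n_estimators=200,        # number of trees
    learning_rate=0.05,      # smaller steps -> less aggressive fitting
    max_depth=3,             # shallower trees -> lower model capacity
    min_child_weight=10,     # ignore tiny, easily-gamed leaf populations
    gamma=1.0,               # require a minimum gain before splitting
    reg_alpha=0.1,           # L1 penalty on leaf weights
    reg_lambda=1.0,          # L2 penalty on leaf weights
)
```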
Data-centric regularization
There are also regularization techniques that involve data preprocessing of some kind. The main advantage of these is that they can be shared across multiple models and they can be made to be feature-specific; you can choose to only regularize some features, while keeping others untouched. This (together with hyperparameters as seen above) should force the model to trust these features less, without outright dropping them.
One simple way to achieve data-centric regularization is to add uncorrelated noise to your training set – particularly to the vulnerable features that you want the model to grow more robust to.
There are also “domain-specific” ways one can restrict the values of a feature. For example, if “age” is a feature in your model, you can enforce a valid range for the values during preprocessing time by using floor and ceiling functions. Even if attackers find out how to modify these values, the damage will be somewhat limited because they will be within the predefined range.
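A minimal sketch of both ideas, applied only to vulnerable features (the column names and the noise scale are assumptions for illustration):

```python
import numpy as np
import pandas as pd

def regularize_features(X: pd.DataFrame, rng: np.random.Generator) -> pd.DataFrame:
    X = X.copy()
    # Add uncorrelated noise to a user-controllable feature (training time only).
    noise = rng.normal(0.0, X["reported_income"].std() * 0.05, size=len(X))
    X["reported_income"] = X["reported_income"] + noise
    # Enforce a domain-specific valid range for "age" (train and inference time).
    X["age"] = X["age"].clip(lower=18, upper=100)
    return X
```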
Don’t forget to test this approach, however, to see if it works for you.
Tip: Use feature flags to quickly handle attacks
Even though we ultimately want to avoid adversarial attacks, one must be prepared to act quickly to limit damages when and if they happen. Feature flags (sometimes called feature toggles) are one way to handle attacks quickly.
Feature flags are simple IF/ELSE conditions in the code that control whether runtime execution takes one path or the other, essentially turning parts of the code on or off. This is shown in Figure 8 below:
Feature flags are usually controlled from outside the normal CI/CD flow — so that one can update them without the need for a full service deployment, which may take time and/or fail. A common strategy is to store feature flags as files in a remote storage system such as S3.
Here are some examples of how to use feature flags to react to an ongoing adversarial attack:
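One such example, as a minimal sketch: a kill-switch flag stored in S3 that, while an attack is under way, bypasses the model in favor of a deliberately conservative constant score. The bucket, flag name, and fallback logic are hypothetical.

```python
import json
import boto3

def load_flags(bucket: str = "my-config-bucket", key: str = "flags.json") -> dict:
    """Read the current flag values from remote storage (names are hypothetical)."""
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)

def score_request(features, model, flags: dict) -> float:
    if flags.get("bypass_credit_model", False):
        # Kill switch: ignore the model and return a conservative (deny-leaning) score.
        return 0.0
    return model.predict_proba([features])[0][1]
```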
Feature flags are very useful in all sorts of other situations unrelated to ML too. They are an interesting topic in and of themselves.
Tip: Favor coarse-grained encoding for especially sensitive features
Sometimes you just cannot afford to drop vulnerable features from a model. This may happen due to regulatory requirements or with features that are so crucial to the model that removing them would render it useless.
One way to keep potentially exploitable features in a model (while reducing the attack surface a little bit) is to decrease their granularity. This usually makes the model slightly less accurate but provides a level of protection against attacks.
For numerical (continuous) features, decreasing granularity usually means grouping raw values into bins, or buckets; several software packages support binning or “bucketization” as a preprocessing step in a feature engineering pipeline. For categorical features, it means reducing dimensionality: either by grouping values together or by using methods such as PCA after encoding. A minimal sketch of both is shown below.
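The snippet below illustrates both cases with pandas; the column names and groupings are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [19, 23, 35, 47, 62],
    "device": ["iphone_13", "iphone_14", "pixel_7", "galaxy_s22", "fairphone_4"],
})

# Numeric: replace the raw value with a handful of buckets.
df["age_bucket"] = pd.cut(df["age"],
                          bins=[0, 25, 40, 60, 120],
                          labels=["<=25", "26-40", "41-60", "60+"])

# Categorical: collapse fine-grained levels into broader groups.
device_groups = {"iphone_13": "ios", "iphone_14": "ios",
                 "pixel_7": "android", "galaxy_s22": "android"}
df["device_group"] = df["device"].map(device_groups).fillna("other")
print(df)
```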
Decreasing feature granularity helps in two ways: manipulations that stay within the same bucket no longer move the score, and it becomes harder for attackers to learn exactly which values push the score in their favor.
It’s important, however, to verify (with PD Plots for instance) that this approach works for your specific scenario. In some cases, the modified features still present an attack surface, particularly if ill-intentioned users can infer where the boundaries are.
Monitoring considerations
Monitoring is essential for detecting adversarial attacks in real-time ML, as explained above. Here are some practical tips to enhance your monitoring setup.
Monitor percentiles for feature values
Most monitoring tools focus on the average values of numerical features. Though useful, averages hide information, particularly when decisions are driven by predictions in the extremes, as is the case when working with risk predictions.
To avoid potential blind spots, you need to monitor feature percentiles too, especially those related to the extremes (percentiles 1, 10, 90 and 99), so you can visually detect weird deviations, which may indicate adversarial attacks are taking place.
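A minimal sketch, reusing the hypothetical scoring log from the feature-monitoring example above:

```python
import pandas as pd

scores_log = pd.read_parquet("s3://my-bucket/scoring-logs/")   # hypothetical path

percentiles = (scores_log
               .assign(day=scores_log["scored_at"].dt.date)
               .groupby("day")["reported_income"]
               .quantile([0.01, 0.10, 0.50, 0.90, 0.99])
               .unstack())
percentiles.plot()   # the extreme percentiles often move before the mean does
```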
Monitor business decisions
Although attackers work by trying to influence model features, their ultimate goal is to impact business decisions. So these must be monitored separately. Very often, attacks are subtle but they are just strong enough to move the score over the decision boundary, affecting business decisions. In other words, you want to monitor the distribution of the decisions driven by the model at the end of the funnel. This is what we called “decision layer monitoring” in Prerequisite 2.
Figure 10 shows the claim approval rates for a fictitious car insurance company. The decision to approve or deny an insurance claim is made using an ML model. The peak in April 2024 is cause for concern: it could be the result of attackers trying to engage in insurance fraud. However, the average prediction of the model driving the decisions doesn’t seem out of the ordinary; without decision layer monitoring, we would be blind to a potential attack.
Monitor by segment
Segmentation refers to dividing instances scored by a model into meaningful groups, for monitoring purposes. The main idea is that viewing aggregate metrics for all instances as a combined group hides important details: since the sample size is larger, metrics tend to be smoother, and it’s harder to detect issues that only affect a subset of the full population, especially if those are less common.
Examples of segments include:
Viewing metrics by segment is also useful to detect adversarial attacks, as they may be targeted at specific subgroups rather than the full instance space as a whole. Figure 11 below shows how visualizing metrics by segment allows one to detect issues that would otherwise go unnoticed if only looking at the global aggregate values.
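A minimal sketch of segmented decision-layer monitoring, assuming a hypothetical decision log with a customer_segment column:

```python
import pandas as pd

decisions = pd.read_parquet("s3://my-bucket/decision-logs/")   # hypothetical path

by_segment = (decisions
              .assign(day=decisions["decided_at"].dt.date)
              .groupby(["day", "customer_segment"])["approved"]
              .mean()                                 # approval rate per segment/day
              .unstack("customer_segment"))
by_segment.plot()   # one line per segment, as in Figure 11
```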
Input attributes vs features
Many model pipelines include feature engineering or pre-processing on top of raw user-provided inputs before feeding them into the learning algorithm. It may be hard to detect directed attacks if one only monitors the features, because there may be multiple processing steps between an input attribute and the feature it eventually becomes.
You should have separate monitoring assets for the input attribute values — in addition to monitoring features and predictions over time.
Building effective, resilient real-time ML models
Real-time ML models are powerful but vulnerable to adversarial attacks, where malicious actors manipulate features to influence outcomes. To protect your models, focus on understanding feature importance with tools like SHAP and Partial Dependence Plots, and set up active monitoring to detect unusual patterns. Use real-time alerts based on business metrics to catch attacks early.
Frequent model retraining helps your system learn from adversarial attempts, while strategies like limiting interactions, delaying decisions, or adding noise make exploitation harder. In some cases, trading off performance for robustness—by removing vulnerable features or increasing regularization—can significantly reduce risks. Tools like feature flags let you respond quickly, and reducing feature granularity minimizes the attack surface.
By combining these strategies, you can build real-time ML models that are both effective and resilient. The goal isn’t to stop all attacks but to make them so costly that attackers move on.
If you want to strengthen your ML systems, keep reading our blog for more insights. And if you’re passionate about building cutting-edge ML solutions, we’re hiring! Join us in building the Purple Future.
Check our job opportunities