In machine learning, understanding and predicting customer behavior over time is essential for applications such as fraud detection, product recommendation, and credit risk analysis. Traditionally, tabular models have been used for this task, aggregating behavioral information into static features.
However, this approach presents significant limitations in capturing temporal dynamics and the order of events. In this article, we’ll explore how sequential architectures, especially Long Short-Term Memory (LSTM) neural networks, can overcome these limitations and offer deeper, more accurate insights.
The Importance of Behavior in Machine Learning Models
Behavior refers to information that depends on actions taken over time and that can characterize something or someone. Examples include:
- The sequence of purchases, withdrawals, and deposits a customer makes
- The timing and frequency of those transactions
- The amounts involved in each event
Unlike static characteristics like age or gender, behaviors are dynamic and can vary significantly among individuals, offering a rich source of data for predictive analysis.
Limitations of Tabular Models in Behavior Modeling
Classical tabular models rely on aggregated features that summarize behavior over specific periods—for example, calculating the sum or average of purchases made in the last 24 hours. While functional, this approach has its drawbacks:
Loss of Event Order
Aggregating data causes tabular models to ignore the temporal sequence of actions. Two customers who made the same transactions but in different orders would be treated identically by the model, despite their distinct behaviors.
Ignoring Temporal Interactions
Tabular models fail to capture how different events interact over time. For instance, if a customer makes a large purchase followed by an unusual withdrawal, the order of these events could be significant for detecting suspicious activities.
Independent Predictions
Each prediction in tabular models is independent of previous ones. The model doesn’t consider past events when making a new prediction, missing the opportunity to capture behavioral patterns that develop over time.
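To make these drawbacks concrete, here is a minimal sketch of the aggregation approach described earlier, assuming pandas and hypothetical column names. Once the groupby runs, the order of events inside the window is gone, which is exactly the first limitation above.

```python
import pandas as pd

# Hypothetical event log (column names are assumptions for illustration).
events = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "amount": [120.0, 35.5, 60.0, 80.0, 400.0],
    "timestamp": pd.to_datetime([
        "2024-05-02 09:00", "2024-05-02 10:30", "2024-05-02 18:00",
        "2024-05-02 11:00", "2024-05-02 23:00",
    ]),
})

# Typical tabular feature engineering: aggregate the last 24 hours per customer.
cutoff = events["timestamp"].max() - pd.Timedelta(hours=24)
recent = events[events["timestamp"] >= cutoff]
features = recent.groupby("customer_id")["amount"].agg(["sum", "mean", "count"])

# At this point the order of events within the window is lost: two customers
# with the same transactions in a different order get identical feature rows.
print(features)
```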
Sequential Architectures as an Alternative
To overcome these limitations, sequential architectures like LSTM neural networks offer an effective solution. These networks are designed to process sequential data, taking into account the order and intervals between events.
How LSTMs Work
LSTMs contain components called memory cells that retain information over time. They process event sequences one step at a time, allowing information from previous events to influence future predictions.
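A minimal sketch of this step-by-step processing, assuming PyTorch; the per-event feature encoding here is invented for illustration.

```python
import torch
import torch.nn as nn

# One customer's sequence of 5 events, each described by 3 features
# (e.g. amount, event type, time since previous event; an assumed encoding).
sequence = torch.randn(5, 3)

cell = nn.LSTMCell(input_size=3, hidden_size=8)
h = torch.zeros(1, 8)  # hidden state: the cell's working memory
c = torch.zeros(1, 8)  # cell state: longer-term memory

# Events are processed one step at a time; h and c carry information from
# earlier events forward, so each step is influenced by what came before.
for event in sequence:
    h, c = cell(event.unsqueeze(0), (h, c))

print(h.shape)  # torch.Size([1, 8]): a learned summary of the whole sequence
```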
Practical Implementation of LSTMs in Behavior Modeling
Data Preparation
Instead of aggregating data, events are kept in their original sequential form. Each event can contain relevant features like transaction amount, event type (purchase, withdrawal, deposit), and timestamp.
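A minimal sketch of this preparation, again assuming PyTorch and invented event features; shorter sequences are zero-padded so a batch has a uniform shape, a point revisited below for customers with few events.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical per-customer sequences, kept in chronological order.
# Each event: [amount, event_type_id, hours_since_previous_event].
customer_a = torch.tensor([[120.0, 0.0, 0.0],
                           [35.5, 1.0, 9.5]])
customer_b = torch.tensor([[80.0, 0.0, 0.0],
                           [15.0, 0.0, 1.0],
                           [400.0, 2.0, 12.0]])

# Pad the shorter sequence with zeros so the batch has a uniform shape.
batch = pad_sequence([customer_a, customer_b], batch_first=True)
lengths = torch.tensor([2, 3])  # true lengths, useful for masking/packing

print(batch.shape)  # torch.Size([2, 3, 3]): (customers, max_events, features)
```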
Model Building
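A minimal sketch of one possible model, assuming PyTorch and the padded batch from the data-preparation step: a single LSTM layer summarizes each event sequence into its final memory state, and a linear head maps that summary to a prediction (for instance, a fraud logit).

```python
import torch
import torch.nn as nn

class BehaviorLSTM(nn.Module):
    """Sketch of a sequence model: LSTM encoder plus a binary prediction head."""

    def __init__(self, n_features: int = 3, hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, max_events, n_features); h_n is the final memory state.
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])  # one logit per customer (e.g. fraud risk)

model = BehaviorLSTM()
```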
Training and Prediction
The LSTM is trained to learn patterns in event sequences that lead to specific outcomes like fraud or default. During prediction, the network processes new events, updating its internal memories and adjusting predictions according to observed behavior.
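Continuing the same sketch, here is an illustrative training loop and a scoring call; `model` and `batch` come from the previous steps, and the labels are made up for illustration.

```python
import torch
import torch.nn as nn

labels = torch.tensor([[0.0], [1.0]])  # hypothetical targets (1.0 = fraud)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    optimizer.zero_grad()
    logits = model(batch)             # memories update as events are read
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()

# Prediction: as a customer's history grows, re-score the updated sequence.
with torch.no_grad():
    risk = torch.sigmoid(model(customer_b.unsqueeze(0)))
```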
Advantages of Sequential Architectures
Capturing Temporal Patterns
By preserving the order and intervals between events, LSTMs can identify patterns that would be lost in tabular models. For example, a series of small transactions in a short period might indicate suspicious behavior that a tabular model wouldn’t detect.
Flexibility in Including Features
In addition to sequential features, it’s possible to incorporate static or aggregated features into the model, enriching the analysis without losing the ability to capture temporal dynamics.
Reduction in the Need for Feature Engineering
Sequential modeling reduces the need to create multiple aggregated features for different time windows, as the LSTM automatically learns which temporal patterns are relevant.
Challenges and Considerations
Computational Complexity
Sequential neural networks require more computational resources and training time. Using GPUs can accelerate the process but increases costs.
Hyperparameter Optimization
Finding good values for hyperparameters, such as the sequence length (how many past events to consider) or the processing direction (from the most recent event to the oldest, or vice versa), is crucial for model performance.
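For illustration, here is how these two choices might appear against the earlier sketches; the names and values are assumptions, not recommendations.

```python
# Continuing the sketches above (`batch` comes from the data-preparation step).

# Sequence length: keep only the N most recent events per customer.
MAX_EVENTS = 50
truncated = batch[:, -MAX_EVENTS:, :]

# Processing direction: flip the time axis to read newest-to-oldest.
reversed_batch = truncated.flip(dims=[1])
```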
Evaluation of Necessity
It’s important to assess whether the additional complexity is justified. If event order or temporality isn’t significant for the problem, tabular models might be more appropriate.
Case Studies and Practical Results
During the implementation of LSTMs for fraud detection, improvements over the purely tabular approach were observed in practice.
Integration with Traditional Features
Even when using sequential architectures, it is possible, and often advisable, to combine traditional features into the model. For example:
- Static attributes such as age or gender can be supplied alongside the event sequence
- Aggregated features, such as the sum of purchases in the last 24 hours, can be added as extra inputs
This integration allows the model to leverage the best of both worlds: capturing temporal patterns and utilizing proven static features.
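One common way to do this, sketched here under the same PyTorch assumptions as before, is to concatenate the LSTM's final memory state with a vector of static or aggregated features before the prediction head.

```python
import torch
import torch.nn as nn

class HybridBehaviorModel(nn.Module):
    """Sketch: join the LSTM's sequence summary with static features."""

    def __init__(self, n_seq_features: int = 3, n_static: int = 4,
                 hidden_size: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_seq_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size + n_static, 1)

    def forward(self, events: torch.Tensor, static: torch.Tensor):
        _, (h_n, _) = self.lstm(events)
        combined = torch.cat([h_n[-1], static], dim=1)  # both worlds, one vector
        return self.head(combined)
```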
Frequently Asked Questions and Clarifications
How to Handle New Customers or Those with Few Events?
For customers with few or no events, the model can fill sequences with null values or zeros, and the network is trained to handle these situations. Additionally, static features may carry more weight in these circumstances.
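Continuing the hybrid sketch above, a hypothetical brand-new customer might look like this: an all-zero event sequence, with invented static values doing most of the work.

```python
import torch

# No events yet, so the padded sequence is all zeros.
empty_history = torch.zeros(1, 5, 3)            # (1 customer, 5 slots, 3 feats)
static = torch.tensor([[0.5, 1.0, 0.0, 0.3]])   # hypothetical static vector

model = HybridBehaviorModel()  # from the previous sketch
with torch.no_grad():
    score = torch.sigmoid(model(empty_history, static))
```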
Are LSTMs Always the Best Option?
Not necessarily. If the event order isn’t crucial to the problem or computational resources are limited, traditional models may be more efficient.
How to Optimize LSTM Performance?
Tune the hyperparameters discussed above, especially the sequence length and the processing direction, and use GPUs to accelerate training when the added cost is justified. It also helps to first confirm that event order matters for the problem at all.
Conclusion
Sequential architectures like LSTMs represent a significant advancement in behavior modeling within machine learning. By capturing temporal dynamics and event order, they offer more accurate predictions and deep insights into behavioral patterns.
Despite challenges—particularly regarding computational resources and complexity—the benefits can be substantial in applications where time and event sequence are critical.
For more insights like these, watch the recording of the Data Science & Machine Learning meetup.