In the first part of this series, we talked about fklearn’s principles and how it can be used to build machine learning pipelines out of pure functions. However, the most significant benefits of using fklearn come when analyzing and validating models, as we’ll see next.
Extensible Validation
For models that go into production, validation often requires a lot more than just finding out the value of some metric on a holdout set, or comparing a metric to a benchmark. We often want to answer questions such as:
These may all sound like very different questions, but they can be answered by the same generalized model validation algorithm.
In fklearn, this algorithm is implemented by the validator function:
The validator function in fklearn (inside fklearn.validation.validator) implements the generalized validation algorithm described here.
By merely swapping the splitting and evaluation functions, we can simulate and evaluate many different real-life scenarios, which helps us answer those questions.
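To make the shape of this algorithm concrete, here is a minimal, dependency-free Python sketch of a generalized validation loop. It is not fklearn's implementation. The names toy_validator, mean_learner, mse_evaluator and two_fold_splitter are hypothetical, and plain lists of dicts stand in for pandas DataFrames. Two further conventions are assumptions of the sketch: a learner returns a (predict function, scored data, log) triple, and each fold pairs one training set with a list of test sets. What it does show is the property described above: the loop itself never changes, only the splitting and evaluation functions passed into it.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
PredictFn = Callable[[List[Row]], List[Row]]
Learner = Callable[[List[Row]], Tuple[PredictFn, List[Row], dict]]
Fold = Tuple[List[int], List[List[int]]]  # (train indices, list of test index sets)


def toy_validator(data: List[Row],
                  split_fn: Callable[[List[Row]], List[Fold]],
                  train_fn: Learner,
                  eval_fn: Callable[[List[Row]], dict]) -> dict:
    """Split the data, train on each training set, evaluate on each test set."""
    train_logs, validator_log = [], []
    for fold_num, (train_idx, test_idx_sets) in enumerate(split_fn(data)):
        predict_fn, _, train_log = train_fn([data[i] for i in train_idx])
        train_logs.append(train_log)
        eval_results = [eval_fn(predict_fn([data[i] for i in test_idx]))
                        for test_idx in test_idx_sets]
        validator_log.append({"fold_num": fold_num, "eval_results": eval_results})
    return {"train_logs": train_logs, "validator_log": validator_log}


def mean_learner(rows: List[Row]):
    """Toy pure learner: always predicts the mean target of its training rows."""
    mean_y = sum(r["y"] for r in rows) / len(rows)

    def predict_fn(new_rows: List[Row]) -> List[Row]:
        # Build new dicts instead of mutating the caller's rows.
        return [{**r, "prediction": mean_y} for r in new_rows]

    return predict_fn, predict_fn(rows), {"mean_learner": {"mean_y": mean_y}}


def mse_evaluator(scored: List[Row]) -> dict:
    return {"mse": sum((r["y"] - r["prediction"]) ** 2 for r in scored) / len(scored)}


def two_fold_splitter(data: List[Row]) -> List[Fold]:
    half = len(data) // 2
    first, second = list(range(half)), list(range(half, len(data)))
    return [(first, [second]), (second, [first])]


data = [{"x": float(i), "y": float(i)} for i in range(4)]
result = toy_validator(data, two_fold_splitter, mean_learner, mse_evaluator)
print(result["validator_log"])  # both folds report {'mse': 4.25}
```

Swapping two_fold_splitter for a time-based splitter, or mse_evaluator for a different metric, changes the scenario being simulated without touching toy_validator or the learner.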
Let’s look at an example, using the learner we defined in Part 1, in which we try to answer these two questions:
A full example of training and evaluation.
Let’s go over this example in more detail.
Splitting Functions
In the example above, after we define the learner, we define two splitting functions:
Fklearn comes pre-packaged with many splitting functions for common use cases (you can find a full list, with descriptions of when each one is useful, here). Most splitters are designed to simulate real-life situations, in which a model is usually trained on data from one time period and then applied to data from a later one. Simple cross-validation, which shuffles observations without regard to time, is frequently insufficient for real models, because it lets a model train on data from after the period it is evaluated on.
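The idea behind a time-aware splitter fits in a few lines. The function below is a hypothetical, simplified stand-in written for illustration, not one of fklearn's built-in splitters. For each cutoff it trains on every row strictly before the cutoff and tests on the rows up to the next cutoff, so every fold evaluates the model on data from its own future. Returning folds as (training indices, list of test index sets) is an assumption of this sketch.

```python
from typing import Dict, List, Sequence, Tuple

Fold = Tuple[List[int], List[List[int]]]


def expanding_window_splitter(data: List[Dict[str, float]],
                              cutoffs: Sequence[float],
                              time_col: str = "time") -> List[Fold]:
    """Hypothetical out-of-time splitter with an expanding training window."""
    folds = []
    bounds = list(cutoffs) + [float("inf")]
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Train on the entire past; test on the next window only.
        train_idx = [i for i, r in enumerate(data) if r[time_col] < start]
        test_idx = [i for i, r in enumerate(data) if start <= r[time_col] < end]
        if train_idx and test_idx:  # skip windows with nothing to train on or test
            folds.append((train_idx, [test_idx]))
    return folds


data = [{"time": t, "y": float(t)} for t in [1, 1, 2, 2, 3, 3, 4]]
for train_idx, (test_idx,) in expanding_window_splitter(data, cutoffs=[2, 3, 4]):
    print(train_idx, test_idx)
# [0, 1] [2, 3]
# [0, 1, 2, 3] [4, 5]
# [0, 1, 2, 3, 4, 5] [6]
```

By contrast, a shuffled k-fold split of the same rows would routinely put time-3 rows in the training set and time-2 rows in the test set. That is exactly the leakage an out-of-time split avoids.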
Customizable Evaluation
This comes up often in model evaluation: a single, global metric rarely tells the full story. We might want to isolate the effects of model rank ordering and model calibration, or look into performance in specific subgroups of our population. We might also be interested in the evolution of a particular metric, rather than just a point-in-time estimate.
For this, we need to simultaneously evaluate multiple metrics, split across dimensions (e.g., time, customer segment). Fklearn enables this by allowing us to define “evaluation trees”, combining individual evaluation functions. Looking back at our example, here’s how we defined the evaluation function:
Example of defining an evaluation tree.
This code snippet leads to the evaluation tree shown below:
Tree representation of the evaluation function defined above.
Once final_eval_fn is applied to data, the entire tree of evaluators runs, and all results are returned in the log. This means both R² and Spearman correlation are computed first for the whole population and then separately for each user segment. These “evaluation trees” can be very powerful, allowing Data Scientists to automate recurring analyses.
Analyzing results
As you may have noticed, most operations in fklearn return logs. These logs concentrate valuable information, be it model parameters, dataset metadata, or validation results. Fklearn can be very verbose with logging, because we have often found ourselves regretting not having saved some piece of information.
Fklearn also provides helpful functions to extract data from these logs (which can become quite large), and it is easy to create evaluation plots using only the logs. This allows us to build generic evaluation code that takes the logs from training runs and generates dashboards of model performance, speeding up the iteration process. Here’s an example of extracting data from the logs:
Example of using fklearn’s extractors to get results from logs.
Sample stability curve plot, after extracting the data from the logs. It shows model performance (Spearman correlation between prediction and target) over time, split by segment. It would also be possible to plot R2 from the same data.
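Conceptually, an extractor walks the nested logs and flattens one metric into plot-ready rows. The sketch below is a hypothetical, dependency-free illustration of that idea rather than fklearn's extractor API. The log is written by hand, its structure (folds with eval_results and a split_by_segment key) is an assumption of the sketch, and its numbers are made-up placeholders, not results from a real model.

```python
from typing import Any, Dict, List, Tuple

# Hand-written placeholder log; the structure and values are illustrative only.
validator_log: List[Dict[str, Any]] = [
    {"fold_num": 0, "eval_results": [{"split_by_segment": {"a": {"spearman": 0.6},
                                                           "b": {"spearman": 0.4}}}]},
    {"fold_num": 1, "eval_results": [{"split_by_segment": {"a": {"spearman": 0.5},
                                                           "b": {"spearman": 0.3}}}]},
]


def extract_split_metric(log: List[Dict[str, Any]], metric: str,
                         split_key: str = "split_by_segment") -> List[Dict[str, Any]]:
    """Flatten one metric into rows of (fold, segment, value)."""
    rows = []
    for fold in log:
        for eval_result in fold["eval_results"]:
            for segment, seg_log in eval_result.get(split_key, {}).items():
                rows.append({"fold_num": fold["fold_num"], "segment": segment,
                             metric: seg_log[metric]})
    return rows


def to_series(rows: List[Dict[str, Any]], metric: str) -> Dict[str, List[Tuple[int, float]]]:
    """Group rows into one (fold, value) series per segment, i.e. one plot line each."""
    series: Dict[str, List[Tuple[int, float]]] = {}
    for row in rows:
        series.setdefault(row["segment"], []).append((row["fold_num"], row[metric]))
    return series


series = to_series(extract_split_metric(validator_log, "spearman"), "spearman")
print(series)
# {'a': [(0, 0.6), (1, 0.5)], 'b': [(0, 0.4), (1, 0.3)]}
```

With a time-based splitter, fold_num would correspond to successive time periods, so each series becomes one line of a stability curve. Plotting R² instead would only change the metric argument, provided the log contains it.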
Functional bliss
As a final note on validation, notice that the learner we defined will be used again and again inside these validator calls, training several models. We get the peace of mind of knowing it is a pure function – so nothing that happens inside validation can change our model definition – and that, when validating our model, all the steps in our pipeline are being applied consistently to all the data folds. Ultimately, this means that our final model, which goes to production, accurately matches the models going through all these validation scenarios.
The same goes for tuning and feature selection. For both, fklearn provides functions that are similar in spirit to the validator: you define how to split the data and how to evaluate model performance, and you reuse your learner function.
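As a hypothetical, dependency-free sketch of that pattern (fklearn's own tuning and feature-selection functions are not reproduced here), a grid search can reuse the same splitting and evaluation functions and differ only in which learner it trains. make_learner, time_folds, validate and grid_search are illustrative names. The learner is a 1-D ridge regression through the origin, chosen only to keep the arithmetic simple. Because each candidate learner is a pure function, every hyperparameter value is scored on identical folds with identical preprocessing.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def make_learner(lam: float):
    """Hypothetical learner factory: 1-D ridge regression through the origin."""
    def learner(rows: List[Row]):
        w = sum(r["x"] * r["y"] for r in rows) / (sum(r["x"] ** 2 for r in rows) + lam)

        def predict_fn(new_rows: List[Row]) -> List[Row]:
            return [{**r, "prediction": w * r["x"]} for r in new_rows]

        return predict_fn, predict_fn(rows), {"ridge": {"lam": lam, "w": w}}
    return learner


def mse_eval(scored: List[Row]) -> dict:
    return {"mse": mean((r["y"] - r["prediction"]) ** 2 for r in scored)}


def time_folds(data: List[Row]) -> List[Tuple[List[int], List[List[int]]]]:
    """Train on the first k rows and test on row k, for k = 3 .. len(data) - 1."""
    return [(list(range(k)), [[k]]) for k in range(3, len(data))]


def validate(data: List[Row], split_fn: Callable, train_fn: Callable, eval_fn: Callable) -> List[dict]:
    """Compact version of the generalized validation loop."""
    logs = []
    for train_idx, test_sets in split_fn(data):
        predict_fn, _, _ = train_fn([data[i] for i in train_idx])
        for test_idx in test_sets:
            logs.append(eval_fn(predict_fn([data[i] for i in test_idx])))
    return logs


def grid_search(data: List[Row], lams: Sequence[float], split_fn: Callable,
                eval_fn: Callable, metric: str = "mse"):
    """Score every candidate with the same splitter and evaluator; lower is better."""
    scores = {lam: mean(log[metric] for log in validate(data, split_fn, make_learner(lam), eval_fn))
              for lam in lams}
    return min(scores, key=scores.get), scores


data = [{"x": float(x), "y": 2.0 * x} for x in range(1, 7)]
best_lam, scores = grid_search(data, [0.0, 1.0, 10.0], time_folds, mse_eval)
print(best_lam)  # 0.0 (the data is exactly y = 2x, so no shrinkage fits best)
```

Feature selection follows the same shape: instead of looping over hyperparameters, the loop would range over candidate feature subsets while split_fn and eval_fn stay fixed.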
What’s next?
This post concludes our brief introduction to fklearn. For more examples of fklearn’s model validation capabilities and other powerful tools, check the documentation here. We also hope this gets you excited about trying fklearn yourself.