Fraud Prediction with Machine Learning in the Financial Industry: A Data Scientist’s Experience
Insights and experiences from a data scientist on the frontlines
Hello, fellow data enthusiasts! Over a few articles, I’d love to share what I have learned from three years of developing machine learning models to predict fraud in the financial industry. If you work on a fraud detection project as a project manager, data scientist, ML engineer, data engineer, MLOps engineer, fraud analyst, or product manager, you may find this article helpful.
In this first article of the series, I want to address the following points:
- The business problem to solve
- High-level steps of the project
Business Problem
Every day, millions of people use money transfer services worldwide. These services help us send money to loved ones and make purchases easier. But fraudsters exploit these systems to trick people into sending them money or to take over their accounts. This hurts both the victims and the companies involved, causing financial losses and damaging reputations. There are also regulatory and compliance implications for the companies and liable parties in the system (for instance, Western Union agreed to forfeit $586 million in 2017 for failing to maintain an effective anti-money laundering and consumer fraud program). Predicting fraudulent transactions before the funds fall into the hands of fraudsters is therefore vital for these companies. This is where AI/ML-driven fraud management tools come into play.
The companies’ main goals are to minimize operational costs, improve the customer experience, and reduce fraud and losses.
There are various types of fraud in this context such as:
- Elderly abuse
- Good Samaritan
- Romance scam
- Consumer scam
- Account warming
- Identity theft
- Account takeover (ATO)
- Money laundering
If you would like to learn more about each specific fraud type, here are some useful links: Six Types of Payment Fraud, Money Transfer Scams
Steps of project
ML/AI projects are often done iteratively, but in my experience the nine steps below have been a good starting point.
1. Understanding the Existing System
The existing system involves people, processes, and systems.
People: Identify the key individuals with domain expertise in managing fraud. Determine their roles and how they can contribute to the project. For example, expert fraud analysts can significantly contribute by defining fraud factors and identifying trends.
Processes: Analyze how the company currently identifies fraud and how it measures its effectiveness.
Systems: Evaluate the systems currently used to detect fraud. Many companies may have an existing rule-based expert system in place.
2. Defining Stakeholders’ Goals
It is crucial to understand the different goals of stakeholders to align them and clarify expectations from the beginning. For example, from the compliance team’s perspective, a high detection rate of fraud is desirable, while the marketing team may be more concerned about the impact of false positives on customer experience. Meanwhile, the operations team may require a specific SLA for the timing of predictions to ensure smooth operations. It is inefficient to optimize all these potentially conflicting objectives in one phase of the project. Therefore, leadership support is essential for setting priorities and finding common ground.
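The tension between the compliance and marketing goals can be made concrete with a decision threshold on the model score. The sketch below is only an illustration: the scores and labels are made up, and `rates_at_threshold` is a hypothetical helper, not part of any project described here. It assumes both fraud and legitimate examples are present.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (detection_rate, false_positive_rate) when flagging scores >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    n_fraud = sum(labels)
    n_legit = len(labels) - n_fraud
    return tp / n_fraud, fp / n_legit

# Made-up model scores and true labels (1 = fraud, 0 = legitimate).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

strict = rates_at_threshold(scores, labels, 0.70)   # fewer alerts
lenient = rates_at_threshold(scores, labels, 0.35)  # more alerts
print("strict :", strict)
print("lenient:", lenient)
```

With these toy numbers, the lenient threshold catches every fraud case but also flags one legitimate customer, while the strict threshold flags no legitimate customers and misses one fraud case. Which point to operate at is the kind of priority that leadership has to set.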
3. Data Understanding
You have probably heard the famous phrase: “garbage in, garbage out.” To avoid feeding poor-quality data into the ML model, we need to analyze the data sources and their quality to ensure they meet both experimentation requirements and online streaming standards. Identify constraints in the existing data and articulate their impact on the quality of predictions. This step is crucial for maintaining the integrity and accuracy of the model’s outputs.
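A first pass at data understanding is often a simple per-column profile. The sketch below assumes transactions arrive as a list of dictionaries; the column names (`amount`, `receiver_country`) and the helper `profile_column` are hypothetical and chosen only for illustration.

```python
def profile_column(rows, column):
    """Summarize the missing-value rate and distinct-value count for one column."""
    values = [row.get(column) for row in rows]
    missing = sum(1 for v in values if v is None or v == "")
    distinct = len({v for v in values if v is not None and v != ""})
    return {"missing_rate": missing / len(values), "distinct": distinct}

# Made-up sample with one missing amount and one empty country.
transactions = [
    {"txn_id": "t1", "amount": 120.0, "receiver_country": "US"},
    {"txn_id": "t2", "amount": None, "receiver_country": "GB"},
    {"txn_id": "t3", "amount": 75.5, "receiver_country": ""},
    {"txn_id": "t4", "amount": 75.5, "receiver_country": "US"},
]

report = {col: profile_column(transactions, col)
          for col in ("amount", "receiver_country")}
print(report)
```

Running the same profile on the offline training data and on a sample of the online stream is one way to spot a field that is populated in one environment but not in the other.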
4. Red-Flag Definition
The building blocks of an ML model are features. In the context of fraud prediction, these features primarily represent fraudulent behaviors or red flags. At this stage, we extract the tacit knowledge of fraud experts and translate it into a list of red flags, which are then developed into features to feed into the model.
Examples of red flags include the number of transactions a customer sends to high-risk countries, or a high number of distinct customers sending money to one person within a short time period.
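The second red flag, many distinct senders paying one person in a short period, can be expressed as a small function. This is a minimal sketch under assumed field names (`sender`, `receiver`, `time`); the helper `max_distinct_senders` and the one-hour window are illustrative choices, not the definition used in any real system.

```python
from datetime import datetime, timedelta

def max_distinct_senders(transactions, receiver, window):
    """Largest number of distinct senders paying `receiver` within any span of length `window`."""
    events = sorted((t["time"], t["sender"])
                    for t in transactions if t["receiver"] == receiver)
    best = 0
    # A window that maximizes the count can always be shifted to start at an event.
    for i, (start, _) in enumerate(events):
        senders = {s for ts, s in events[i:] if ts - start <= window}
        best = max(best, len(senders))
    return best

t0 = datetime(2024, 1, 1, 12, 0)
transactions = [
    {"sender": "a", "receiver": "r1", "time": t0},
    {"sender": "b", "receiver": "r1", "time": t0 + timedelta(minutes=10)},
    {"sender": "c", "receiver": "r1", "time": t0 + timedelta(minutes=50)},
    {"sender": "a", "receiver": "r1", "time": t0 + timedelta(minutes=55)},
    {"sender": "d", "receiver": "r1", "time": t0 + timedelta(hours=3)},
    {"sender": "e", "receiver": "r2", "time": t0 + timedelta(minutes=5)},
]

r1_flag = max_distinct_senders(transactions, "r1", timedelta(hours=1))
r2_flag = max_distinct_senders(transactions, "r2", timedelta(hours=1))
print(r1_flag, r2_flag)  # 3 1
```

The quadratic loop is fine for a sketch; a production version would more likely use a sliding window over a stream.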
5. Feature Creation / Engineering
At this stage, the identified red flags are coded into features. Various feature groups can be defined, such as remittance features, transaction patterns, and user behavior metrics. Feature engineering is a crucial step in deriving the most informative features that distinguish fraud from non-fraud. This process involves selecting, modifying, and creating new features to improve the model’s accuracy and predictive power.
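As a rough illustration of coding red flags into features, the sketch below aggregates raw transactions into one feature row per sending customer. The feature names, the field names, and the `HIGH_RISK_COUNTRIES` set are all assumptions made for this example; the country codes are placeholders, not a real risk list.

```python
from collections import defaultdict

HIGH_RISK_COUNTRIES = {"XX", "YY"}  # placeholder codes, not a real risk list

def build_customer_features(transactions):
    """Aggregate transactions into one feature dictionary per sender."""
    feats = defaultdict(lambda: {"txn_count": 0,
                                 "total_amount": 0.0,
                                 "high_risk_txn_count": 0})
    for t in transactions:
        f = feats[t["sender"]]
        f["txn_count"] += 1
        f["total_amount"] += t["amount"]
        if t["receiver_country"] in HIGH_RISK_COUNTRIES:
            f["high_risk_txn_count"] += 1
    # Derived feature: share of a customer's transactions going to high-risk countries.
    for f in feats.values():
        f["high_risk_share"] = f["high_risk_txn_count"] / f["txn_count"]
    return dict(feats)

transactions = [
    {"sender": "a", "amount": 100.0, "receiver_country": "XX"},
    {"sender": "a", "amount": 50.0, "receiver_country": "US"},
    {"sender": "a", "amount": 25.0, "receiver_country": "YY"},
    {"sender": "b", "amount": 10.0, "receiver_country": "US"},
]

features = build_customer_features(transactions)
print(features["a"])
```

Each dictionary corresponds to one row of the feature table that would be fed to the model.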
6. Model Training and Testing
In this step, the goal is to fit a machine learning model, or models, to predict fraud with reasonable accuracy. The desired accuracy level depends on business requirements and the extent of improvement needed over the baseline system (this is where the objectives defined in step 2 come in).
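One way to frame "improvement over the baseline" is to score the existing rule-based system and the new model on the same labeled test set. Because fraud is usually rare, this sketch reports precision and recall rather than plain accuracy; that choice, the made-up predictions, and the helper `precision_recall` are assumptions for illustration only.

```python
def precision_recall(predictions, labels):
    """Return (precision, recall) for binary predictions, where 1 means fraud."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up test set: 3 fraud cases among 10 transactions.
labels         = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
baseline_rules = [1, 0, 0, 1, 1, 0, 0, 0, 0, 0]
model          = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

baseline_pr = precision_recall(baseline_rules, labels)
model_pr = precision_recall(model, labels)
print("baseline:", baseline_pr)
print("model   :", model_pr)
```

In this toy case the model doubles both precision and recall relative to the rules; whether such a gain is "enough" depends on the objectives agreed with stakeholders.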
7. Real-Time Operationalization
All previous steps were conducted in an offline, batch environment. Once the model is ready, it must be deployed in production so that its predictions can serve downstream systems in real-time (less than one second in our projects). The MLOps team is responsible for this step, optimizing the runtime of the pipeline and ensuring seamless integration with other systems.
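To check a latency target like the sub-second one mentioned above, teams commonly look at percentiles of end-to-end prediction time rather than the average. The sketch below uses a nearest-rank percentile on made-up latencies; the numbers, the `SLA_MS` constant, and the helper `percentile` are illustrative assumptions.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

SLA_MS = 1000  # "less than one second", expressed in milliseconds

# Made-up end-to-end latencies for ten predictions, in milliseconds.
latencies_ms = [120, 95, 300, 150, 1100, 180, 210, 130, 160, 140]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
share_within_sla = sum(1 for l in latencies_ms if l < SLA_MS) / len(latencies_ms)
print(p50, p90, share_within_sla)  # 150 300 0.9
```

Here the median looks healthy, but one request in ten misses the target, which an average alone could hide.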
8. Real-Time Monitoring
Once the model’s predictions are integrated into real-time systems and utilized by the operations team, it is crucial to closely monitor performance. The goal is to ensure that the real-time performance aligns with the expected results tested in the batch environment. If discrepancies arise, it is essential to identify and address the underlying issues. For example, monitoring should include tracking the number of transactions processed by the model, the number of transactions predicted as fraud, and the subsequent journey of these transactions. Additionally, the performance of the pipeline itself must be monitored to ensure the service is up and running as expected.
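A simple monitoring check along these lines compares the share of transactions flagged in production with the share expected from batch testing. The sketch below is one possible form; the helper `flag_rate_alert`, the expected rate of 2%, and the 50% relative tolerance are assumptions, not values from the project.

```python
def flag_rate_alert(n_processed, n_flagged, expected_rate, tolerance):
    """Return (observed_rate, alert).

    alert is True when the observed flag rate deviates from expected_rate
    by more than `tolerance`, measured relative to expected_rate.
    """
    if n_processed == 0:
        return 0.0, True  # no traffic reaching the model is itself worth an alert
    observed = n_flagged / n_processed
    deviation = abs(observed - expected_rate) / expected_rate
    return observed, deviation > tolerance

EXPECTED_RATE = 0.02  # assumed flag rate seen in offline testing
TOLERANCE = 0.5       # assumed: alert if the rate is off by more than 50%

normal_day = flag_rate_alert(10_000, 210, EXPECTED_RATE, TOLERANCE)
odd_day = flag_rate_alert(10_000, 450, EXPECTED_RATE, TOLERANCE)
print(normal_day, odd_day)
```

An alert does not say what went wrong; it only signals that the online behavior no longer matches the batch results and needs investigation.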
9. Setting Up the Feedback Loop Process
Establishing a feedback loop process is essential to continuously evaluate the model’s performance and refine it accordingly. This process involves incorporating actual labels back into the system, along with any additional pertinent information. For example, if transactions were flagged as fraud by the model, it is important to track how many of these were investigated and the outcomes of those investigations. Similarly, insights from a quality assurance team, including potential reasons for false positives, should be incorporated back into the system to enhance the feedback loop process. This iterative approach ensures ongoing improvement and optimization of the fraud detection model.
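The feedback described above can be summarized with a few counts. This sketch assumes each flagged transaction carries an investigation `outcome` and, for false positives, a free-text `reason` from the quality assurance team; the field names, outcome values, and the helper `summarize_feedback` are hypothetical.

```python
from collections import Counter

def summarize_feedback(flagged):
    """Summarize investigation outcomes for transactions the model flagged."""
    outcomes = Counter(f["outcome"] for f in flagged)
    confirmed = outcomes["confirmed_fraud"]
    false_pos = outcomes["false_positive"]
    investigated = confirmed + false_pos
    reasons = Counter(f["reason"] for f in flagged
                      if f["outcome"] == "false_positive")
    return {
        "flagged": len(flagged),
        "investigated": investigated,
        "confirmed_fraud": confirmed,
        "precision_on_investigated": confirmed / investigated if investigated else None,
        "false_positive_reasons": dict(reasons),
    }

# Made-up investigation results for five flagged transactions.
flagged = [
    {"txn_id": "t1", "outcome": "confirmed_fraud", "reason": None},
    {"txn_id": "t2", "outcome": "false_positive", "reason": "one-off large gift"},
    {"txn_id": "t3", "outcome": "false_positive", "reason": "one-off large gift"},
    {"txn_id": "t4", "outcome": "confirmed_fraud", "reason": None},
    {"txn_id": "t5", "outcome": None, "reason": None},  # not yet investigated
]

summary = summarize_feedback(flagged)
print(summary)
```

The confirmed outcomes become new labels for retraining, and recurring false-positive reasons can point to features or red flags that need refinement.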
In the next article, we will look at the various roles involved in this project. Let me know how your experience has been. What are the similarities and differences between your experience and mine?
Fraud Prediction with Machine Learning in financial industry was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.