This is a follow-up to the "Big Data Strategy & Roadmap - Our Data Journey" blog post. It focuses on use cases that Treselle Systems' engineering team identified to demonstrate the basic power of Machine Learning (ML) and predictive analytics that Credit Unions (CUs) can apply to their members' data. The models are deliberately kept simple but insightful so that they are easy for different CU stakeholders to understand.

Please note that this is mock data that was slightly altered to make it suitable for ML purposes.

Use Case 1: Member Segmentation by Product


  • The objective is to understand how members are associated with different products (Checking, Savings, Auto, Home, Personal, Credit), and to apply clustering to see how members are distributed and find clusters suitable for targeted marketing.
  • The K-means clustering algorithm was applied and identified 6 clusters based on member attributes such as the products currently held and the balance amount.
  • For each cluster, the product probability is calculated from the product counts.
  • Members in clusters 4 and 5 have a high percentage of Savings and Checking products but very few Loan products. These clusters can be used for promoting loan products.
  • Members in cluster 6 are strong in Savings accounts and can be targeted for Home Loans.
  • Members in cluster 1 have more loan products but fewer Savings accounts than other clusters. It's important to keep an eye on this cluster for any defaults or delinquencies.
  • Click here for Demo URL
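
The clustering and per-cluster product probability steps above can be sketched roughly as follows. This is a minimal illustration, not the implementation behind the demo: the member records, the scaled balance values, and the helper names `kmeans` and `product_share_by_cluster` are all hypothetical, and a hand-written K-means loop stands in for whatever library the team used. It also uses 2 clusters rather than 6 to keep the toy data small.

```python
import math
import random
from collections import Counter

PRODUCTS = ["Checking", "Savings", "Auto", "Home", "Personal", "Credit"]

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels

def product_share_by_cluster(member_products, labels):
    """Share of each product among all products held within a cluster."""
    counts = {}
    for products, lab in zip(member_products, labels):
        counts.setdefault(lab, Counter()).update(products)
    shares = {}
    for lab, counter in counts.items():
        total = sum(counter.values()) or 1  # avoid dividing by zero
        shares[lab] = {p: counter[p] / total for p in PRODUCTS}
    return shares

# Hypothetical mock members: (products held, balance scaled to roughly 0..1).
members = [
    ({"Checking", "Savings"}, 0.9),
    ({"Checking", "Savings"}, 1.0),
    ({"Savings"}, 0.8),
    ({"Auto", "Personal", "Credit"}, 0.1),
    ({"Auto", "Home"}, 0.2),
    ({"Home", "Personal", "Credit"}, 0.1),
]
# Feature vector: one 0/1 flag per product plus the scaled balance.
points = [tuple(float(p in prods) for p in PRODUCTS) + (bal,) for prods, bal in members]
labels = kmeans(points, k=2)
shares = product_share_by_cluster([prods for prods, _ in members], labels)
for lab in sorted(shares):
    print(lab, {p: round(s, 2) for p, s in shares[lab].items() if s > 0})
```

A cluster whose share is dominated by Savings and Checking with little or no loan share is the kind of segment the observations above suggest for loan promotions. Scaling the balance to the same range as the product flags is a design choice made here so that no single feature dominates the distance.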

Use Case 2: Member Segmentation by Lifetime Value (LTV)


  • Member Lifetime Value (LTV) is calculated on the dataset using the acquisition cost per member, Savings account interest, Loan interest, and the sum of yearly transaction and other service charges.
  • Members were categorized by age group: (0-39) Millennials, (39-53) Generation Xers, (53-72) Baby Boomers, (72-92) Silent Generation.
  • The K-means clustering algorithm was applied and identified 4 clusters. Some clusters have members from multiple generations, and a few have members from a single generation.
  • Observations:
    • 1st cluster members have high income and average LTV.
    • 2nd cluster members have high income and high LTV.
    • 3rd cluster members have low income and average LTV.
    • 4th cluster members have high income and low LTV.
    • Members in clusters 1 and 4 are good segments to target to increase profit.
  • Click here for Demo URL
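
The post lists the inputs to Member Lifetime Value but not the exact formula, so the sketch below is only one plausible reading: yearly loan interest and service charges count as income, yearly Savings interest paid to the member counts as cost, and the acquisition cost is subtracted once. That formula, the 5-year horizon, the sample members, and the choice to assign the overlapping boundary ages (39, 53, 72) to the younger group are all assumptions.

```python
# Age bands from the post, as (inclusive upper bound, label).
# Assumption: a boundary age such as 39 falls in the younger group.
GENERATION_UPPER_BOUNDS = [
    (39, "Millennials"),
    (53, "Generation X"),
    (72, "Baby Boomers"),
    (92, "Silent Generation"),
]

def generation(age):
    """Map an age to one of the generation labels used in the post."""
    if age < 0 or age > 92:
        raise ValueError(f"age {age} is outside the 0-92 range covered by the bands")
    for upper, name in GENERATION_UPPER_BOUNDS:
        if age <= upper:
            return name

def lifetime_value(loan_interest, service_charges, savings_interest,
                   acquisition_cost, years=1):
    """Assumed formula: (yearly income - yearly cost) * years - acquisition cost."""
    yearly_margin = loan_interest + service_charges - savings_interest
    return years * yearly_margin - acquisition_cost

# Hypothetical mock members with yearly amounts in dollars.
members = [
    {"id": "A", "age": 31, "loan_interest": 900, "service_charges": 120,
     "savings_interest": 40, "acquisition_cost": 300},
    {"id": "B", "age": 47, "loan_interest": 0, "service_charges": 60,
     "savings_interest": 150, "acquisition_cost": 300},
    {"id": "C", "age": 68, "loan_interest": 2400, "service_charges": 200,
     "savings_interest": 500, "acquisition_cost": 300},
]

for m in members:
    m["generation"] = generation(m["age"])
    m["ltv"] = lifetime_value(m["loan_interest"], m["service_charges"],
                              m["savings_interest"], m["acquisition_cost"],
                              years=5)
    print(m["id"], m["generation"], m["ltv"])
```

The resulting `ltv` value, together with income and generation, would then be the kind of feature set passed to the K-means step described above.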

Use Case 3: Advanced Targeting: Propensity Scoring of Auto Loans


  • The objective of this use case is to train a machine learning model on the members who have an Auto Loan and then apply it to the members who don't, to predict which of them will opt for an Auto Loan.
  • A logistic regression model was trained on the data.
  • Once the model was trained, the members without an Auto Loan were used as test data, and a propensity score was calculated for each of them using the trained classification model. The propensity score estimates each member's probability of taking an Auto Loan.
  • Observation: Members who have Personal Loans have a high propensity to opt for Auto Loans.
  • Click here for Demo URL
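
A rough sketch of propensity scoring with logistic regression is shown below. A classifier needs examples of both outcomes, so this sketch assumes a labeled history in which some members took an Auto Loan and others did not; that training set, the two features (`has_personal_loan`, `has_savings`), the member ids, and the hand-written gradient descent are all hypothetical stand-ins for the data and tooling used in the demo.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def propensity(w, b, x):
    """Predicted probability of taking an Auto Loan for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical labeled history: [has_personal_loan, has_savings] -> took Auto Loan?
X_train = [[1, 1], [1, 1], [1, 0], [1, 0], [1, 1],
           [0, 1], [0, 1], [0, 0], [0, 0], [0, 1]]
y_train = [1, 1, 1, 1, 0,
           0, 0, 0, 0, 1]

w, b = train_logistic(X_train, y_train)

# Hypothetical members without an Auto Loan, scored and ranked for targeting.
non_holders = {"m101": [1, 1], "m102": [0, 1], "m103": [1, 0], "m104": [0, 0]}
scores = {mid: propensity(w, b, x) for mid, x in non_holders.items()}
for mid in sorted(scores, key=scores.get, reverse=True):
    print(mid, round(scores[mid], 3))
```

In this toy data most Personal Loan holders also took an Auto Loan, so the fitted weight on `has_personal_loan` is positive and such members rank higher, which mirrors the observation above.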

Use Case 4: Model Comparison for Loan Eligibility


  • The objective of this use case is to use the DTI (Debt-to-Income ratio) and Credit Score features to compare multiple ML models and identify the best-fitting one.
  • 8 different machine learning models were applied to the dataset, which was split 70%/30% into training and test sets.
  • Once each model was trained, the test data was used to make predictions, and the models' accuracy was compared based on the confusion matrix output.
  • Among all the models, Logistic Regression and Decision Tree both produced an accuracy of 88%.
  • Click here for Demo URL
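
The evaluation procedure above (70/30 split, predictions on the test set, accuracy read off the confusion matrix) can be sketched as follows. The post does not list all eight models, so two one-feature decision stumps stand in for them here purely to show the comparison loop; the synthetic data, the eligibility rule used to label it, and all function names are assumptions.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = eligible)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts

def accuracy(cm):
    """Fraction of correct predictions: (TP + TN) / all."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())

def stump_predict(row, feature, threshold, eligible_if_high):
    """Predict 1 when the feature is on the 'eligible' side of the threshold."""
    high = row[feature] >= threshold
    return int(high == eligible_if_high)

def fit_stump(train, feature, eligible_if_high):
    """Pick the threshold with the best accuracy on the training rows."""
    actual = [r["eligible"] for r in train]
    def train_accuracy(t):
        preds = [stump_predict(r, feature, t, eligible_if_high) for r in train]
        return accuracy(confusion_matrix(actual, preds))
    return max(sorted({r[feature] for r in train}), key=train_accuracy)

# Synthetic applicants; the labeling rule below is an assumption for illustration.
rng = random.Random(0)
rows = []
for _ in range(200):
    dti = round(rng.uniform(0.10, 0.60), 2)
    score = rng.randint(550, 800)
    rows.append({"dti": dti, "credit_score": score,
                 "eligible": int(dti < 0.40 and score > 650)})

train, test = split_rows(rows)
candidates = {"credit-score stump": ("credit_score", True),
              "DTI stump": ("dti", False)}
results = {}
for name, (feature, eligible_if_high) in candidates.items():
    threshold = fit_stump(train, feature, eligible_if_high)
    preds = [stump_predict(r, feature, threshold, eligible_if_high) for r in test]
    cm = confusion_matrix([r["eligible"] for r in test], preds)
    results[name] = accuracy(cm)
    print(f"{name}: threshold={threshold}, accuracy={results[name]:.2f}, {cm}")
```

Swapping real models into `candidates` while keeping the same split and the same confusion-matrix accuracy is what makes the comparison fair across models.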



  • CU Employee CULytics Founder

    Thanks Raghavan for this awesome post. All the use-cases mentioned above are directly relevant to CUs.

    Can you talk more about the operationalization and expected ROI that you have seen with these models?

    • Vendor


      Note: Long reply.

      The tedious part is of course the data preparation and normalization. Below are our experiences running these sorts of models in the capital markets industry:

      1. Energy Well Performance Projection: This is similar to Member Lifetime Value, but for energy drilling wells. It projects the Estimated Ultimate Recovery (EUR) by applying multiple models, such as Random Forest and Decision Trees, to analyze production data from individual oil and gas wells and predict the production of individual wells and groups of wells.

      ROI: This has become one of the main pieces of IP (Intellectual Property) of the platform and an add-on feature that Portfolio Managers need to subscribe to for this analytics outcome. This is the only platform in the industry to do such analysis on more than half a million drilling wells.

      Operational Details: The data arrives from multiple US states at frequencies ranging from weekly to monthly. To keep the cost low, we launch a Hadoop + Spark ecosystem (4-node cluster) on demand on AWS bi-weekly and run it for half a million drilling wells. This takes a few hours for data sanity checks and preparation and 6 to 8 hours for running the models, after which the cluster is shut down.

      2. Anomaly Detection of Stock Prices: This is similar to identifying outliers, anomalies, or fraud for Banks and CUs. It involves classification models such as Naive Bayes and Logistic Regression that identify the anomalies or outliers in the source data (S&P CapIQ dataset).

      ROI: Our client's customers (Portfolio Managers) subscribe to these anomalies every day, and the system triggers events such as emails to alert them about the outliers. This is one of the interesting features of the platform that attracts many Portfolio Managers, because it's very hard to identify these outliers when they are monitoring 50 to 100 stocks of large volumes (several thousand dollars to millions).

      Operational Details: Unlike the previous scenario, the model needs to run daily once the dataset arrives. It is deployed on a reserved Hadoop & Spark ecosystem (8-node cluster), as it needs to process 8000 tickers and the computations are intensive due to backtesting capabilities.

      Finally, we do run other models that are not very compute intensive using just R and Python on single-node instances. These run clustering and recommendation machine learning algorithms.

      Off Topic: We are also in talks with a couple of telecom clients and performed CDR (Call Detail Record) analysis as a PoC, which gave these prospects a different perspective on how to leverage their historical data.

      Hope this helps.


    • CU Employee CULytics Founder

      Thanks for sharing.
