Experiments
Version 1
Solution Statement
Classically, churn prediction is treated as a binary classification problem, in which each customer is assigned a churn label of 0 or 1. The Paystone churn service follows this binary classification approach to build its machine learning model, where each "customer" is a currently active pay customer (merchant). However, rather than returning a hard 0 or 1 label, the service returns a churn probability between 0 and 1 for each customer, allowing greater flexibility in using these values to prevent churn. An XGBoost classifier was trained on labeled data to create the prediction model.
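The difference between hard labels and probabilities can be sketched as follows. This is an illustrative example, not the service's actual API: the threshold values and tier names are assumptions, showing the flexibility that probabilities provide over 0/1 predictions.

```python
# Hypothetical sketch: a churn probability can be collapsed into a hard
# 0/1 label, or routed into graded retention actions. Thresholds and tier
# names below are illustrative assumptions, not the real service's values.

def to_hard_label(prob: float, threshold: float = 0.5) -> int:
    """Collapse a churn probability into a 0/1 prediction."""
    return 1 if prob >= threshold else 0

def triage(prob: float) -> str:
    """Route a merchant into a retention tier based on churn probability."""
    if prob >= 0.8:
        return "urgent-outreach"
    if prob >= 0.5:
        return "watchlist"
    return "healthy"

probs = [0.12, 0.55, 0.91]
print([to_hard_label(p) for p in probs])  # [0, 1, 1]
print([triage(p) for p in probs])         # ['healthy', 'watchlist', 'urgent-outreach']
```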
Data
Six categories of data were used in the creation of this model:

* Churn data
* Case data
* Application type data
* Product data
* Transaction data
* Customer base data
Churn data was used as labels, whereas the other five categories were used to create features. The value of these feature categories is that they allow the model to understand a specific merchant's circumstances both as a customer of Paystone and as a business with its own customers. The underlying idea is that merchant churn stems mainly from two sources: dissatisfaction with Paystone, or negative experiences in the merchant's own business.
Case, application type, and product data are used to understand customer dissatisfaction with Paystone. Case data allows the model to pinpoint positive or negative interactions between Paystone and merchants, with cases as the medium. Application type data allows the model to learn which pay product applications, or combinations thereof, merchants have positive or negative experiences with. Product data provides analogous benefits with respect to products rather than application types.
Transaction and customer base data are used to understand negative experiences in the merchant's own business. Transaction data allows the model to follow business operations over time by examining the specifics of a merchant's transactions. Customer base data allows the model to gauge a business's size and market impact from the customer base it has.
The labeled data used for training was fairly balanced: approximately 55% negative samples (not churned) and 45% positive samples (churned), out of a total of about 11,000 data points, of which 70% was used for training.
Models
For the Paystone churn prediction service, a single XGBoost classifier model is used. Other models were considered when designing the service, such as random forest classifiers, logistic regression, and a deep learning neural network. However, the XGBoost classifier was chosen after an examination of model performance (as measured by relevant metrics), model bias, and apparent overfitting. The model's role is to calculate churn probabilities when provided with the relevant processed data of a merchant.
Serving Logic
The serving logic is composed of the following main steps:

1. Validate the Mids (merchant IDs) provided by the user
2. Process the relevant data of each merchant into a form ingestible by the model
3. Feed the processed data into the XGBoost classifier model artifact to make churn probability predictions
For step one, the Mids used to call the Paystone churn service are run through a validator. The validator ensures that every Mid provided in the request is a real merchant identifier and that the merchant has not already churned. If either of these validations fails, the request returns an error response to prompt the removal or correction of the invalid Mids.
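The two validation checks can be sketched as below. The error type and lookup structures are assumptions for illustration; the real service's data access is not shown here.

```python
# Hypothetical validator sketch: a request fails fast if any Mid is
# unknown or belongs to an already-churned merchant. The error type and
# the set-based lookups are illustrative assumptions.

class InvalidMidError(ValueError):
    pass

def validate_mids(mids, known_mids, churned_mids):
    """Return the Mids unchanged if all are valid; otherwise raise with details."""
    unknown = [m for m in mids if m not in known_mids]
    churned = [m for m in mids if m in churned_mids]
    if unknown or churned:
        raise InvalidMidError(
            f"unknown Mids: {unknown}; already-churned Mids: {churned}"
        )
    return mids

known = {"M001", "M002", "M003"}
churned = {"M003"}
print(validate_mids(["M001", "M002"], known, churned))  # ['M001', 'M002']
```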
For step two, the Mids are run through a pre-processor to create the features to be ingested by the model artifact. The pre-processor runs a number of queries and performs the relevant aggregations and calculations to generate the features. It applies the same processing to the live data as was applied to the original raw data used to train the model.
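The shape of this step (raw records in, one feature row per Mid out) can be illustrated as follows. The specific aggregations and field names are assumptions, since the actual feature set is not detailed here:

```python
# Illustrative pre-processing sketch: aggregate raw transaction and case
# records into per-merchant features. The feature names and aggregations
# are assumptions showing the general shape of step two.

from collections import defaultdict

def build_features(transactions, cases):
    """Aggregate raw records into one feature dict per merchant (Mid)."""
    feats = defaultdict(lambda: {"txn_count": 0, "txn_volume": 0.0, "open_cases": 0})
    for t in transactions:
        row = feats[t["mid"]]
        row["txn_count"] += 1
        row["txn_volume"] += t["amount"]
    for c in cases:
        if c["status"] == "open":
            feats[c["mid"]]["open_cases"] += 1
    return dict(feats)

txns = [{"mid": "M001", "amount": 25.0}, {"mid": "M001", "amount": 40.0}]
cases = [{"mid": "M001", "status": "open"}]
print(build_features(txns, cases))
# {'M001': {'txn_count': 2, 'txn_volume': 65.0, 'open_cases': 1}}
```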
For the final step, the generated features are ingested by the model artifact, which returns the churn probabilities. The results are converted to conform to the prediction schema and returned to the user as a response.
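A minimal sketch of this final step, using a stub in place of the real XGBoost artifact; the response schema shown is an assumption. Note that a scikit-learn-style `predict_proba` returns one row of class probabilities per sample, with the positive (churn) class in column 1.

```python
# Illustrative final step: run the model artifact and shape its
# probabilities into a response. StubModel stands in for the real XGBoost
# artifact, and the schema below is an assumption.

class StubModel:
    """Stand-in for the XGBoost artifact (returns fixed probabilities)."""
    def predict_proba(self, rows):
        return [[0.7, 0.3] for _ in rows]  # [P(not churned), P(churned)]

def predict_churn(model, feature_rows, mids):
    """Run the artifact and map each Mid to its churn probability."""
    probs = model.predict_proba(feature_rows)
    return {
        "predictions": [
            {"mid": m, "churn_probability": p[1]} for m, p in zip(mids, probs)
        ]
    }

resp = predict_churn(StubModel(), [[1.0, 2.0]], ["M001"])
print(resp)  # {'predictions': [{'mid': 'M001', 'churn_probability': 0.3}]}
```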
The serving logic above is wrapped in a thread lock to prevent multiple requests from being processed at a time. This is required because step two is resource intensive, and the server could fail under excessive load if multiple requests were allowed to process concurrently. This does not constrain usage of the service, since it is not expected to receive concurrent requests anyway.
Results
The model was tested using a random subset of 30% of the available labeled data. The results are shown below:

* 82% precision and recall for predicting a negative sample (non-churned)
* 78% precision and recall for predicting a positive sample (churned)
* Average accuracy of 80%
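For reference, these metrics are computed per class from the test-set predictions. A small stdlib sketch on toy labels (not the actual test data):

```python
# Stdlib sketch of the reported metrics: per-class precision and recall,
# plus overall accuracy, computed from true vs. predicted labels. The toy
# labels below are illustrative, not the actual test set.

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class treated as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fp), tp / (tp + fn)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(precision_recall(y_true, y_pred, positive=1))  # (0.666..., 0.666...)
print(accuracy(y_true, y_pred))                      # 0.666...
```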
Potential Future Directions
- Using Economic Climate Data: Economic climate data is external source data, which has not been explored much during this version of model creation. Incorporating it could make the churn prediction service more resilient to economic factors, such as inflation, that affect merchant churn.
- Using Competitor Data: Competitor data is also external source data. Understanding data such as competitors' pricing and offers would make the prediction service more cognizant of how active competition could sway a merchant away from Paystone, and could also surface actions that could be taken to remedy this.