Predict Employee Attrition with SHAP: An HR Analytics Guide


When highly skilled employees leave a company, the departure often comes suddenly, and attrition becomes an expensive and disruptive problem that is hard to handle. Why? It takes a lot of time and money to hire a complete outsider and train them in the company’s nuances.

Whenever a colleague leaves the office where you work, one question tends to come to mind.

“What if we could predict who might leave and understand why?”

If you assume that employee attrition is merely a matter of disengagement from work, or of a better learning or growth opportunity somewhere else, you are only partly right.

Whatever is happening in your office, you may be seeing more people going out than coming in.

Unless you look at those departures as a pattern, you miss the whole point of the attrition that is happening right in front of you.

You wonder, ‘Do companies and their HR departments try to prevent valuable employees from leaving their jobs?’

Yes! In this article, we’ll build a straightforward machine learning model to predict employee attrition and use SHAP to explain the results, so HR teams can take action based on the insights.

Understanding the Problem

In 2024, WorldMetrics released its Market Data Report, which stated that 33% of employees leave their jobs because they don’t see opportunities for career development. In other words, a third of departures are due to stagnant growth paths: of every 180 employees who resign, roughly 60 do so for this reason.

What is employee attrition?

Gartner, which has provided insight and expert guidance to client enterprises worldwide for 45 years, defines employee attrition as ‘the gradual loss of employees when positions are not refilled, often due to voluntary resignations, retirements, or internal transfers.’

How does analytics help HR proactively address it?

HR plays a central role here because it is the department best placed to work directly on employee attrition analytics.

HR can use analytics to discover the root causes of attrition, identify patterns and demographics in historical employee data, and design targeted actions accordingly.

Which method helps HR do this? One answer is SHAP.

What is the SHAP approach?

SHAP is a method, with a Python library of the same name, for explaining the output of machine learning (ML) models.

For each prediction, it shows how much each feature pushed the result up or down. That gives HR the why behind an employee’s predicted risk of resigning, as you will see later in this article.

You can install it with pip (the leading ! runs the command from a notebook cell) or with conda.

!pip install shap

or

conda install -c conda-forge shap
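To make the idea behind SHAP concrete before touching the library, here is a minimal sketch that computes exact Shapley values by brute force. The `risk` function, its coefficients, and the all-zeros baseline employee are hypothetical and chosen only for illustration; they are not the model built later in this article. The shap library compares against a background dataset and uses much faster algorithms, so treat this purely as intuition.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy risk score with one interaction term (illustration only).
def risk(overtime, low_satisfaction, low_income):
    score = 0.10                                  # base risk
    score += 0.20 * overtime
    score += 0.15 * low_satisfaction
    score += 0.05 * low_income
    score += 0.10 * overtime * low_satisfaction   # interaction effect
    return score

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a single baseline input."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Weight of this coalition in the Shapley average
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(*with_i) - f(*without_i))
    return phi

employee = [1, 1, 0]   # works overtime, low satisfaction, income not low
baseline = [0, 0, 0]   # assumed reference employee
phi = shapley_values(risk, employee, baseline)

for name, value in zip(["overtime", "low_satisfaction", "low_income"], phi):
    print(f"{name}: {value:+.3f}")
print("sum of contributions:", round(sum(phi), 3))
print("prediction - baseline:", round(risk(*employee) - risk(*baseline), 3))
```

The contributions add up to the gap between this employee’s score and the baseline score, and the interaction effect is shared equally between the two features involved. That additivity is what lets SHAP split a single prediction into per-feature reasons.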

IBM presented a dataset in 2017 called “IBM HR Analytics Employee Attrition & Performance”. It is a fictional dataset created by IBM data scientists, and it works well for demonstrating SHAP.

Here is a brief overview of the dataset.

Dataset Overview

We’ll use the IBM HR Analytics Employee Attrition dataset. It includes information about 1,470 employees, such as age, salary, job role, and satisfaction scores, which we’ll use to identify patterns with SHAP.

We will focus on these key columns:

  • Attrition: Whether the employee left or stayed
  • OverTime, JobSatisfaction, MonthlyIncome, WorkLifeBalance
[Figure: A glimpse of the IBM HR Analytics dataset. Source: Kaggle]

Next, put SHAP into practice to understand attrition risk by following these 5 steps.

5 Steps to Apply SHAP

Step 1: Load and Explore the Data

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load the dataset
df = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')

# Basic exploration
print("Shape of dataset:", df.shape)
print("Attrition value counts:\n", df['Attrition'].value_counts())
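Two further checks are often worth running at this exploration stage: how imbalanced the target is, and whether any columns hold a single value and therefore carry no signal. The sketch below runs on a tiny made-up stand-in frame (the values are invented, and `sample`, `constant_cols`, and `attrition_share` are illustrative names), so it runs without the CSV. With the real file you would apply the same two lines to `df`.

```python
import pandas as pd

# Tiny stand-in frame; every value here is made up for illustration only.
sample = pd.DataFrame({
    "EmployeeCount": [1, 1, 1, 1, 1, 1],
    "Over18": ["Y", "Y", "Y", "Y", "Y", "Y"],
    "OverTime": ["Yes", "No", "No", "Yes", "No", "No"],
    "MonthlyIncome": [6000, 5100, 2100, 2900, 3500, 3100],
    "Attrition": ["Yes", "No", "Yes", "No", "No", "No"],
})

# Share of each class in the target (a skewed split calls for care later on)
attrition_share = sample["Attrition"].value_counts(normalize=True)
print(attrition_share)

# Columns with a single distinct value cannot help the model
constant_cols = [col for col in sample.columns if sample[col].nunique() == 1]
print("Constant columns:", constant_cols)
```

Constant columns can be dropped before training without losing information. Whether to drop them is a judgment call this article does not make, so the later steps keep every column.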

Step 2: Preprocess the Data

Once the dataset is loaded, we’ll change text values into numbers and split the data into training and testing parts.

# Convert the target variable to binary
df['Attrition'] = df['Attrition'].map({'Yes': 1, 'No': 0})

# Encode all categorical features as integer codes
# (the encoder is refit per column; classes are sorted alphabetically,
# so for OverTime: No -> 0, Yes -> 1)
label_enc = LabelEncoder()
categorical_cols = df.select_dtypes(include=['object']).columns
for col in categorical_cols:
    df[col] = label_enc.fit_transform(df[col])

# Define features and target
X = df.drop('Attrition', axis=1)
y = df['Attrition']

# Split the dataset into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
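Because far fewer employees leave than stay, a plain random split can give the training and test sets slightly different attrition rates. One option, not used in the code above, is the `stratify` argument of `train_test_split`. The sketch below shows its effect on synthetic labels; the 16% positive rate and the `*_demo` names are assumptions made for illustration, not figures taken from the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1,000 rows with 16% positives (made-up numbers)
y_demo = np.array([1] * 160 + [0] * 840)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps the class proportions the same in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Train positive rate:", y_tr.mean())
print("Test positive rate:", y_te.mean())
```

In the article’s own split, the equivalent change would be adding `stratify=y` to the existing call.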

Step 3: Build the Model

Now, we’ll train XGBoost, a fast and accurate gradient-boosted tree model, and evaluate it on the test set.

from xgboost import XGBClassifier
from sklearn.metrics import classification_report

# Initialize and train the model
model = XGBClassifier(eval_metric="logloss")
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Classification Report:\n", classification_report(y_test, y_pred))
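When reading the classification report, the row for class 1 (employees who left) usually matters most, because a model can score high accuracy simply by predicting that everyone stays. As a minimal sketch of what that row contains, the helper below recomputes precision, recall, and F1 by hand. The function name `positive_class_metrics` and the ten demo labels are hypothetical and serve only as illustration.

```python
def positive_class_metrics(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Made-up labels: 4 employees left, the model catches 2 and raises 1 false alarm
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

precision, recall, f1 = positive_class_metrics(y_true_demo, y_pred_demo)
accuracy = sum(t == p for t, p in zip(y_true_demo, y_pred_demo)) / len(y_true_demo)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here accuracy looks acceptable at 0.70, yet recall shows that half of the leavers were missed, which is the number an HR team acting on the predictions would care about.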

Step 4: Explain the Model with SHAP

SHAP (SHapley Additive exPlanations) helps us understand which features/factors were most important in predicting attrition.

import shap

# Load SHAP's JavaScript support (only needed for interactive plots in notebooks)
shap.initjs()

# Explain model predictions
explainer = shap.Explainer(model)
shap_values = explainer(X_test)

# Summary plot
shap.summary_plot(shap_values, X_test)
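The summary plot ranks features by how large their SHAP values are on average, ignoring sign. The sketch below reproduces that ranking with NumPy on a small hypothetical matrix of SHAP values. The numbers in `demo_shap` are invented; with the model above, the matrix would come from `shap_values.values`.

```python
import numpy as np

feature_names = ["OverTime", "JobSatisfaction", "MonthlyIncome"]

# Hypothetical SHAP values: rows are employees, columns are features
demo_shap = np.array([
    [ 0.40, -0.10,  0.05],
    [-0.20,  0.30, -0.05],
    [ 0.35, -0.25,  0.10],
    [-0.15,  0.20, -0.02],
])

# Global importance: mean absolute SHAP value per feature
mean_abs = np.abs(demo_shap).mean(axis=0)
order = np.argsort(mean_abs)[::-1]
ranking = [feature_names[i] for i in order]

for i in order:
    print(f"{feature_names[i]}: {mean_abs[i]:.4f}")

# Local explanation: one row holds the per-feature pushes for one employee
first_employee = dict(zip(feature_names, demo_shap[0]))
print(first_employee)
```

The same matrix serves both views: averaging over rows gives the global ranking, while a single row explains one employee’s prediction.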

Step 5: Visualise Key Relationships

To dig deeper, you can use SHAP dependence plots or, as here, a seaborn count plot of Attrition versus OverTime.

import seaborn as sns
import matplotlib.pyplot as plt

# Visualizing Attrition vs OverTime
# (after Step 2, both columns are encoded as 0 = No, 1 = Yes)
plt.figure(figsize=(8, 5))
sns.countplot(x='OverTime', hue="Attrition", data=df)
plt.title("Attrition vs OverTime")
plt.xlabel("OverTime")
plt.ylabel("Count")
plt.show()

Output:

[Figure: SHAP summary plot showing the important factors affecting attrition. Source: Research Gate]
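The count plot in Step 5 shows raw counts, which can be hard to compare when the overtime and non-overtime groups differ in size. A rate per group is often easier to read. The sketch below computes it with a pandas groupby on a made-up stand-in frame; the ten rows and the name `demo` are invented for illustration. With the encoded `df` from Step 2, the same groupby line applies unchanged.

```python
import pandas as pd

# Made-up stand-in rows using the 0/1 encoding from Step 2
demo = pd.DataFrame({
    "OverTime":  [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "Attrition": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})

# The mean of a 0/1 column is the share of employees who left in each group
rate_by_overtime = demo.groupby("OverTime")["Attrition"].mean()
print(rate_by_overtime)
```

In this toy data, half of the overtime group left compared with one in six of the others. The real proportions have to be read from the actual dataset.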

Now, let’s look at 5 business insights from the data.

Feature          | Insight
OverTime         | High overtime increases attrition
JobSatisfaction  | Higher satisfaction reduces attrition
MonthlyIncome    | Lower income may increase attrition
YearsAtCompany   | Newer employees are more likely to leave
WorkLifeBalance  | Poor balance leads to higher attrition

Of these 5 insights, 3 from the SHAP analysis of the IBM dataset deserve particular attention from companies and HR departments.

3 Key Insights from the SHAP Analysis:

  1. Employees working overtime are more likely to leave.
  2. Low job and environment satisfaction increase the risk of attrition.
  3. Monthly income also has an effect, but less than OverTime and job satisfaction.

HR departments can use these insights to design better solutions.

Revising Plans

Now that we know what matters, HR can use these 4 solutions to guide its policies.

  1. Revisit compensation plans

Employees have families to feed, bills to pay, and a lifestyle to maintain. Companies that don’t revisit their compensation plans are likely to lose employees and fall behind their competitors.

  2. Reduce overtime or offer incentives

Work can sometimes wait, but stress cannot. Overtime without incentives leaves employees strained and unrewarded, which can lead to insecurity and health issues.

  3. Improve job satisfaction through feedback from the employees themselves

Feedback is not just something to collect; it is a guide to what the future should look like, and it only works when it is acted on. If employee attrition is the problem, employees are part of the solution. Asking helps; assuming erodes trust.

  4. Promote a better work-life balance

People take jobs not just because of societal pressure, but also to discover who they are and what they are capable of. A job that serves both goals boosts productivity; overusing an employee’s skills, however, can be counterproductive for the company.

This dataset, combined with SHAP, is therefore well suited to:

  • Attrition prediction
  • Workforce optimization
  • Explainable AI tutorials (SHAP/LIME)
  • Feature importance visualisations
  • HR analytics dashboards

Conclusion

Predicting employee attrition can help companies keep their best people and protect their profits. With machine learning and SHAP, companies can see who might leave and why, so HR can act before it’s too late and prepare a backup or succession plan where a departure looks likely.

Frequently Asked Questions

Q1. What is SHAP?

A. SHAP explains how each feature affects a model’s prediction.

Q2. Is this model good for real companies?

A. Yes, with tuning and proper data, it can be useful in real settings.

Q3. Can I use other models?

A. Yes, you can use logistic regression, random forests, or others.

Q4. What are the top reasons employees leave?

A. Overtime, low job satisfaction, and poor work-life balance.

Q5. What can HR do with these insights?

A. HR can make better policies to retain employees.

Q6. Does SHAP work with all models?

A. SHAP can explain any model, but it is fastest and most exact with tree-based models like XGBoost.

Q7. Can I explain a single prediction?

A. Yes, SHAP lets you visualise why one person might leave.

Jyoti Makkar is a writer and an AI Generalist who recently co-founded WorkspaceTool.com, a platform for discovering, comparing, and selecting the best software for business needs.
