
Highly skilled employees leave companies, often suddenly, and that makes employee attrition an expensive and disruptive problem. Why? Because it takes a lot of time and money to hire an outsider and train them in the company's nuances.
Seen this way, one question comes to mind whenever a colleague leaves your office:
“What if we could predict who might leave and understand why?”
It is tempting to assume that attrition is simply a matter of disengagement, or that a better learning or growth opportunity came along elsewhere. Those assumptions are only partly correct.
Whatever is happening in your office, you may notice more people walking out than walking in.
But unless you look at those departures as a pattern, you miss the real story of the attrition unfolding in your office.
You might wonder: do companies and their HR departments try to prevent valuable employees from leaving?
Yes! In this article, we'll build a straightforward machine learning model to predict employee attrition and use SHAP to explain its predictions, so that HR teams can act on the insights.
Understanding the Problem
In 2024, WorldMetrics released its Market Data Report, which states that 33% of employees leave their jobs because they don't see opportunities for career development. In other words, about a third of departures come down to stagnant growth paths: if a company loses 180 employees in a year, roughly 60 of those exits can be traced to a lack of career development.
What is employee attrition?
Gartner, which has provided insight and expert guidance to client enterprises worldwide for 45 years, defines employee attrition as 'the gradual loss of employees when positions are not refilled, often due to voluntary resignations, retirements, or internal transfers.'
How does analytics help HR proactively address it?
HR plays a vital role here, because it is the department that works most directly on employee attrition, with both the people and the data involved.
HR can use analytics to uncover the root causes of attrition, spot patterns in historical employee data and demographics, and design targeted interventions accordingly.
So which method helps HR get there? Any guesses? The answer is the SHAP approach, applied on top of a predictive model. So, what is it?
What is the SHAP approach?
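SHAP (SHapley Additive exPlanations) is a method, and an accompanying Python library, for explaining the output of a machine learning (ML) model. It assigns each feature a contribution to an individual prediction, based on Shapley values from cooperative game theory.
Applied to attrition, it shows which factors pushed the model toward predicting that a particular employee will resign, as you will see later in the article.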
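To make "explaining the output" concrete, here is a minimal from-scratch sketch of exact Shapley values for a hypothetical three-feature attrition score. The scoring function `toy_risk`, its weights, and the baseline employee are all invented for illustration, and a feature outside a coalition is simply set to a single baseline value. The shap library does not enumerate coalitions like this (it uses faster algorithms such as TreeSHAP for tree models), but the underlying notion of each feature's fair share of the prediction is the same.

```python
from itertools import combinations
from math import factorial


# Hypothetical toy "attrition risk" score. The weights are invented for
# illustration only; they are NOT fitted on the IBM data.
def toy_risk(overtime, job_satisfaction, income_k):
    low_satisfaction = 1 if job_satisfaction <= 2 else 0
    return (0.4 * overtime
            - 0.1 * job_satisfaction
            - 0.02 * income_k
            + 0.2 * overtime * low_satisfaction)  # interaction term


def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their baseline value, a simple
    single-reference way of "removing" a feature.
    """
    n = len(x)

    def value(coalition):
        args = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(*args)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight: |S|! * (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


employee = (1, 1, 3.0)   # works overtime, satisfaction 1, income 3.0k
baseline = (0, 3, 6.5)   # hypothetical "typical employee" reference
phi = shapley_values(toy_risk, employee, baseline)
for name, contribution in zip(["OverTime", "JobSatisfaction", "MonthlyIncome"], phi):
    print(f"{name:>16}: {contribution:+.3f}")

# Additivity: baseline score + sum of contributions equals the employee's score
print(toy_risk(*baseline) + sum(phi), toy_risk(*employee))
```

The last line illustrates the "additive" in SHapley Additive exPlanations: the baseline score plus the three contributions reproduces the employee's score. Brute-force enumeration needs on the order of 2^n evaluations, which is why it is only practical for a handful of features.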
Before we apply SHAP to real data, install it with pip (the leading ! runs the command from inside a Jupyter notebook):
!pip install shap
or with conda:
conda install -c conda-forge shap
In 2017, IBM data scientists published a fictional dataset called "IBM HR Analytics Employee Attrition & Performance", which we will analyse with SHAP.
Here is a brief overview of the dataset.
Dataset Overview
We'll use the IBM HR Analytics Employee Attrition dataset. It contains records for 1,470 employees, with attributes such as age, salary, job role, and satisfaction scores, which we'll use to find attrition patterns with SHAP.
The key columns we'll focus on are:
- Attrition: whether the employee left (Yes) or stayed (No)
- OverTime, JobSatisfaction, MonthlyIncome, WorkLifeBalance

Source: Kaggle
Next, let's put SHAP into practice against attrition risk in five steps.

Step 1: Load and Explore the Data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
# Load the dataset
df = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')
# Basic exploration
print("Shape of dataset:", df.shape)
print("Attrition value counts:\n", df['Attrition'].value_counts())
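Exploration usually turns up columns that carry no signal. As a sketch, the helper below drops any column with a single unique value (in the Kaggle file, EmployeeCount, Over18 and StandardHours appear to be constant) plus the EmployeeNumber identifier, and computes the share of leavers. Both helper names are hypothetical and the demo frame is invented; with the real data you would call `df = drop_uninformative_columns(df)` right after loading, and call `attrition_rate(df)` before Step 2 maps Attrition to 1/0.

```python
import pandas as pd


def drop_uninformative_columns(frame, id_cols=("EmployeeNumber",)):
    """Drop columns holding a single unique value, plus any listed ID columns."""
    constant = [c for c in frame.columns if frame[c].nunique(dropna=False) <= 1]
    ids = [c for c in id_cols if c in frame.columns]
    return frame.drop(columns=constant + ids)


def attrition_rate(frame, target="Attrition", positive="Yes"):
    """Share of rows whose target equals the positive label."""
    return (frame[target] == positive).mean()


# Tiny hypothetical frame that mimics a few columns of the Kaggle file
demo = pd.DataFrame({
    "EmployeeNumber": [1, 2, 4, 5],
    "EmployeeCount": [1, 1, 1, 1],
    "Over18": ["Y", "Y", "Y", "Y"],
    "OverTime": ["Yes", "No", "Yes", "No"],
    "Attrition": ["Yes", "No", "No", "No"],
})
clean = drop_uninformative_columns(demo)
print(list(clean.columns))
print(f"Attrition rate: {attrition_rate(clean):.1%}")
```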
Step 2: Preprocess the Data
Once the dataset is loaded, we'll convert text values into numbers and split the data into training and testing sets.
# Convert the target variable to binary
df['Attrition'] = df['Attrition'].map({'Yes': 1, 'No': 0})
# Encode all categorical features
label_enc = LabelEncoder()
categorical_cols = df.select_dtypes(include=['object']).columns
for col in categorical_cols:
    df[col] = label_enc.fit_transform(df[col])
# Define features and target
X = df.drop('Attrition', axis=1)
y = df['Attrition']
# Split the dataset into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
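`LabelEncoder` is intended for target labels; on nominal features such as JobRole it imposes an arbitrary numeric order. Tree models like XGBoost often cope with that, but one-hot encoding is a common alternative. And because leavers are a minority, passing `stratify=y` to `train_test_split` keeps their share the same in both splits. The sketch below shows both ideas with a hypothetical `prepare_features` helper and an invented ten-row frame; it would replace the LabelEncoder loop rather than run after it.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_features(frame, target="Attrition"):
    """One-hot encode non-numeric columns and return (X, y) with a 0/1 target."""
    y = frame[target].map({"Yes": 1, "No": 0})
    X = frame.drop(columns=[target])
    text_cols = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]
    # drop_first=True removes one redundant dummy column per feature
    X = pd.get_dummies(X, columns=text_cols, drop_first=True, dtype=int)
    return X, y


# Hypothetical mini dataset: 10 employees, 2 of whom left
demo = pd.DataFrame({
    "Age": [25, 41, 33, 29, 50, 38, 27, 45, 31, 36],
    "OverTime": ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "JobRole": ["Sales Executive", "Research Scientist", "Laboratory Technician",
                "Sales Executive", "Manager", "Research Scientist",
                "Laboratory Technician", "Manager", "Sales Executive",
                "Research Scientist"],
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No", "No", "No", "No", "No"],
})
X, y = prepare_features(demo)

# stratify=y keeps the leaver/stayer ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y)
print(X.columns.tolist())
print(y_train.mean(), y_test.mean())
```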
Step 3: Build the Model
Now, we'll train XGBoost, a fast and accurate gradient-boosted tree classifier, and evaluate it on the test set.
from xgboost import XGBClassifier
from sklearn.metrics import classification_report
# Initialize and train the model
model = XGBClassifier(eval_metric="logloss", random_state=42)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
print("Classification Report:\n", classification_report(y_test, y_pred))
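Only about 16% of employees in this dataset left, so a model can reach high accuracy while missing many leavers; the recall for class 1 in the classification report is the number to watch. One common adjustment is XGBoost's `scale_pos_weight` parameter, whose documentation suggests sum(negative instances) / sum(positive instances) as a typical value. A minimal sketch of that calculation follows; the helper name is hypothetical, and the commented lines show how it would plug into the Step 3 model.

```python
from collections import Counter


def scale_pos_weight(labels):
    """Ratio of negatives (stayed, 0) to positives (left, 1)."""
    counts = Counter(int(v) for v in labels)
    if counts[1] == 0:
        raise ValueError("No positive (Attrition = 1) examples in labels")
    return counts[0] / counts[1]


print(scale_pos_weight([0, 0, 0, 0, 1]))

# Usage with the Step 3 variables (requires xgboost):
# weight = scale_pos_weight(y_train)
# model = XGBClassifier(eval_metric="logloss", scale_pos_weight=weight)
# model.fit(X_train, y_train)
```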
Step 4: Explain the Model with SHAP
SHAP (SHapley Additive exPlanations) helps us understand which features/factors were most important in predicting attrition.
import shap
# Load SHAP's JavaScript support (used by interactive plots in notebooks)
shap.initjs()
# Explain model predictions
explainer = shap.Explainer(model)
shap_values = explainer(X_test)
# Summary plot
shap.summary_plot(shap_values, X_test)
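The summary plot ranks features by their SHAP values. To see what those numbers are without any plotting, here is a sketch for the logistic-regression alternative mentioned in the FAQ: for a linear model, assuming independent features, a feature's SHAP value in log-odds space is its coefficient times the feature's deviation from the background mean. The data and feature names below are synthetic and hypothetical, not the IBM dataset. The shap library offers `shap.LinearExplainer` for linear models; this sketch computes the closed form by hand with NumPy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 3 standardized features, NOT the IBM dataset
feature_names = ["OverTime", "JobSatisfaction", "MonthlyIncome"]
X = rng.normal(size=(500, 3))
true_logit = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 0.3 * X[:, 2] - 1.5
y = (rng.random(500) < 1 / (1 + np.exp(-true_logit))).astype(int)

model = LogisticRegression().fit(X, y)

# Linear model, independent features assumed:
# SHAP value of feature i (log-odds) = coef_i * (x_i - mean(x_i))
coef = model.coef_[0]
background_mean = X.mean(axis=0)
phi = (X - background_mean) * coef          # shape (n_samples, n_features)
base_value = model.intercept_[0] + coef @ background_mean

# Additivity check: base value + contributions == model's log-odds output
assert np.allclose(base_value + phi.sum(axis=1), model.decision_function(X))

# Global importance: mean absolute SHAP value per feature
importance = np.abs(phi).mean(axis=0)
for name, score in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
    print(f"{name:>16}: {score:.3f}")
```

Averaging absolute SHAP values per feature gives a global ranking, the same quantity shap's bar plot displays; the summary plot above shows the full per-employee distribution instead.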
Step 5: Visualise Key Relationships
We'll dig deeper with SHAP dependence plots or seaborn visualisations, such as Attrition versus OverTime.
import seaborn as sns
import matplotlib.pyplot as plt
# Visualize Attrition vs OverTime (after Step 2 both are encoded: 0 = No, 1 = Yes)
plt.figure(figsize=(8, 5))
sns.countplot(x='OverTime', hue="Attrition", data=df)
plt.title("Attrition vs OverTime")
plt.xlabel("OverTime")
plt.ylabel("Count")
plt.show()
Output:
[Figure: count plot of employees by OverTime, split by Attrition]
Source: ResearchGate
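A count plot shows absolute numbers, and because the two OverTime groups can differ greatly in size, the attrition rate within each group is easier to compare. A minimal pandas sketch, assuming the Step 2 encoding (LabelEncoder sorts labels, so "No" becomes 0 and "Yes" becomes 1); `attrition_rate_by` is a hypothetical helper and the demo frame is invented.

```python
import pandas as pd


def attrition_rate_by(frame, column, target="Attrition"):
    """Share of leavers (target == 1) within each value of `column`."""
    return frame.groupby(column)[target].mean().sort_values(ascending=False)


# Hypothetical mini frame already encoded as in Step 2 (1 = Yes, 0 = No)
demo = pd.DataFrame({
    "OverTime":  [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "Attrition": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})
rates = attrition_rate_by(demo, "OverTime")
print(rates)

# The same comparison as a row-normalized cross-tabulation
print(pd.crosstab(demo["OverTime"], demo["Attrition"], normalize="index"))
```

With the article's data, `attrition_rate_by(df, "OverTime")` would give the share of leavers in each OverTime group.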
Now, let's shift our focus to five business insights from the data:
| Feature | Insight |
|---|---|
| OverTime | Working overtime is associated with higher attrition |
| JobSatisfaction | Higher satisfaction is associated with lower attrition |
| MonthlyIncome | Lower income may increase attrition |
| YearsAtCompany | Newer employees are more likely to leave |
| WorkLifeBalance | Poor balance is associated with higher attrition |
Of these five, three stand out in the SHAP analysis of the IBM dataset, and companies and HR departments should pay active attention to them.
Three key insights from the SHAP analysis:
- Employees working overtime are more likely to leave.
- Low job and environment satisfaction increase the risk of attrition.
- Monthly income also has an effect, but less than OverTime and job satisfaction.
HR departments can use these insights to design better retention strategies.
Revising Plans
Now that we know what matters, HR can use these four measures to guide its policies.
- Revisit compensation plans
Employees have families to feed, bills to pay, and lifestyles to sustain. Companies that don't revisit their compensation plans risk losing employees and falling behind their competitors.
- Reduce overtime or offer incentives
Sometimes work can wait, but stress does not. Overtime without matching incentives leaves people tense and unrewarded, which can lead to insecurity and health problems.
- Improve job satisfaction through feedback from the employees themselves
Feedback is not a formality to file away; it is a loop that should shape what comes next. If attrition is the problem, employees are part of the solution: asking helps, assuming erodes trust.
- Carry forward a better work-life balance notion
People take jobs not only because of societal pressure but also to discover who they are and what they are capable of. A job that serves both goals boosts productivity; however, overloading employees can backfire on the company.
This dataset, paired with the SHAP-based approach, is well suited to:
- Attrition prediction
- Workforce optimization
- Explainable AI tutorials (SHAP/LIME)
- Feature importance visualisations
- HR analytics dashboards
Conclusion
Predicting employee attrition can help companies keep their best people and protect their bottom line. With machine learning and SHAP, companies can see who might leave and why, which lets HR act before it's too late and prepare succession plans for roles at risk.
Frequently Asked Questions
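Q. What is SHAP used for?
A. SHAP explains how each feature affects a model's prediction.
Q. Can this model be used in real HR settings?
A. Yes, with tuning and good-quality data, it can be useful in real settings.
Q. Can I use models other than XGBoost?
A. Yes, you can use logistic regression, random forests, or others.
Q. Which factors drive attrition most in this dataset?
A. Overtime, low job satisfaction, and poor work-life balance.
Q. How does this help HR?
A. HR can design better policies to retain employees.
Q. Which models does SHAP work with?
A. SHAP can explain any model, but it is fastest and exact for tree-based models like XGBoost, through its TreeExplainer.
Q. Can SHAP explain an individual employee's prediction?
A. Yes, SHAP lets you visualise why one person might leave.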