JobFitAI: Comprehensive Resume Analyzer Project

March 18, 2025

In today’s competitive job market, making your resume stand out is crucial. JobFitAI is an innovative solution designed to help both job seekers and recruiters by analyzing resumes and offering actionable feedback. Traditional keyword-based filtering methods can overlook crucial nuances in a candidate’s profile. To overcome these challenges, AI-powered systems can be leveraged to analyze resumes, extract key skills, and match them effectively with job descriptions.

Learning Objectives

Install all required libraries and configure your environment with a DeepInfra API key.
Learn how to create an AI resume analyzer that processes both PDF and audio files.
Utilize DeepSeek-R1 via DeepInfra to extract relevant information from resumes.
Develop an interactive web app using Gradio for seamless user interaction.
Apply practical enhancements and troubleshoot common issues, adding significant value to your resume analyzer.

This article was published as a part of the Data Science Blogathon.

What is DeepSeek-R1?

DeepSeek-R1 is an advanced open-source AI model designed for natural language processing (NLP) tasks. It is a transformer-based large language model (LLM) trained to understand and generate human-like text. DeepSeek-R1 can perform tasks such as text summarization, question answering, language translation, and more. Because it is open-source, developers can integrate it into various applications, fine-tune it for specific needs, and run it on their hardware without relying on proprietary systems. It is particularly useful for research, automation, and AI-driven applications.

Understanding Gradio

Gradio is a user-friendly Python library that helps developers create interactive web interfaces for machine learning models and other applications. With just a few lines of code, Gradio allows users to build shareable applications with input components (such as text boxes, sliders, and image uploads) and output displays (such as text, images, or audio). It is widely used for AI model demonstrations, quick prototyping, and user-friendly interfaces for non-technical users. Gradio also supports easy model deployment, allowing developers to share their applications via public links without requiring complex web development skills.

This guide presents JobFitAI, an end-to-end solution that extracts text, generates a detailed analysis, and provides feedback on how well the resume matches a given job description using cutting-edge technologies:

DeepSeek-R1: A powerful AI model that extracts key skills, experiences, education, and achievements from resume texts.
DeepInfra: Provides a robust OpenAI-compatible API interface that allows us to interact with AI models like DeepSeek-R1 in a seamless manner.
Gradio: A user-friendly framework that lets you build interactive web interfaces for machine learning applications quickly and easily.

Project Architecture

The JobFitAI project is built around a modular architecture, where each component plays a specific role in processing resumes. Below is an overview:

JobFitAI/ 
│── src/
│   ├── __pycache__/  (compiled Python files)
│   ├── analyzer.py
│   ├── audio_transcriber.py
│   ├── feedback_generator.py
│   ├── pdf_extractor.py
│   ├── resume_pipeline.py
│── .env  (environment variables)
│── .gitignore
│── app.py  (Gradio interface)
│── LICENSE
│── README.md
│── requirements.txt  (dependencies)

Setting Up the Environment

Before diving into the code, you need to set up your development environment.

Creating a Virtual Environment and Installing Dependencies

First, create a virtual environment in your project folder to manage your dependencies. Open your terminal and run:

python3 -m venv jobfitai
source jobfitai/bin/activate  # On macOS/Linux

python -m venv jobfitai
jobfitai\Scripts\activate # On Windows - cmd

Next, create a file named requirements.txt and add the following libraries:

requests 
openai-whisper
PyPDF2
python-dotenv
openai
torch
torchvision
torchaudio
gradio

Install the dependencies by running:

pip install -r requirements.txt

Setting Up Environment Variables

The project requires an API token to interact with the DeepInfra API. Create a .env file in your project’s root directory and add your API token:

DEEPINFRA_TOKEN="your_deepinfra_api_token_here"

Make sure to replace your_deepinfra_api_token_here with the actual token provided by DeepInfra.

To learn how to obtain a DeepInfra API key, refer to DeepInfra’s documentation.
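Before running the app, it can help to fail fast when the token is missing. The following is a minimal sketch and not part of the original project: the helper name get_deepinfra_token is hypothetical, and it assumes load_dotenv() has already been called so the values from .env are visible in the process environment.

```python
import os


def get_deepinfra_token(env=None) -> str:
    """Return the DeepInfra token or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    Assumes load_dotenv() was called earlier so .env values are visible here.
    """
    env = os.environ if env is None else env
    token = (env.get("DEEPINFRA_TOKEN") or "").strip()
    # Treat the placeholder from the example .env file as "not configured".
    if not token or token == "your_deepinfra_api_token_here":
        raise RuntimeError("DEEPINFRA_TOKEN is not set. Add it to your .env file.")
    return token
```

Calling this helper once at startup turns a confusing authentication failure later on into an immediate, readable error.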

Project Walkthrough

The project is structured into several Python modules. In the following sections, we’ll understand the purpose of each file and its context in the project.

src/audio_transcriber.py

Resumes may not always be in text format. In cases where you receive an audio resume, the AudioTranscriber class comes into play. This file uses OpenAI’s Whisper model to transcribe audio files into text. The transcription is then used by the analyzer to extract resume details.

import whisper

class AudioTranscriber:
    """Transcribe audio files using OpenAI Whisper."""
    def __init__(self, model_size: str = "base"):
        """
        Initializes the Whisper model for transcription.
        
        Args:
            model_size (str): The size of the Whisper model to load. Defaults to "base".
        """
        self.model_size = model_size 
        self.model = whisper.load_model(self.model_size)

    def transcribe(self, audio_path: str) -> str:
        """
        Transcribes the given audio file and returns the text.
        
        Args:
            audio_path (str): The path to the audio file to be transcribed.
        
        Returns:
            str: The transcribed text, or an empty string if transcription fails.
        """
        try:
            result = self.model.transcribe(audio_path)
            return result["text"]
        except Exception as e:
            print(f"Error transcribing audio: {e}")
            return ""
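Whisper decodes audio by calling the ffmpeg command-line tool, so a missing ffmpeg installation is a frequent cause of transcription errors. As a rough sketch (the helper name ffmpeg_available is an assumption, not part of the project), you could check for it before loading the model:

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("Warning: ffmpeg was not found; Whisper may fail to load audio files.")
```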

src/pdf_extractor.py

Most resumes are available in PDF format. The PDFExtractor class is responsible for extracting text from PDF files using the PyPDF2 library. This module loops through all pages of a PDF document, extracts the text, and compiles it into a single string for further analysis.

import PyPDF2

class PDFExtractor:
    """Extract text from PDF files using PyPDF2."""

    def __init__(self):
        """Initialize the PDFExtractor."""
        pass

    def extract_text(self, pdf_path: str) -> str:
        """
        Extract text content from a given PDF file.

        Args:
            pdf_path (str): Path to the PDF file.

        Returns:
            str: Extracted text from the PDF. If the file is missing or cannot
                be read, an error is printed and the text gathered so far
                (possibly empty) is returned.
        """
        text = ""
        try:
            with open(pdf_path, "rb") as file:
                reader = PyPDF2.PdfReader(file)
                for page in reader.pages:
                    page_text = page.extract_text()
                    if page_text:
                        text += page_text + "\n"
        except FileNotFoundError:
            print(f"Error: The file '{pdf_path}' was not found.")
        except Exception as e:
            print(f"An error occurred while extracting text: {e}")
        
        return text
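Text pulled from PDFs often contains irregular spacing and empty lines. One optional post-processing step, sketched here under the assumption that line breaks are worth keeping, is to collapse the whitespace before sending the text to the model. The function name normalize_text is hypothetical.

```python
import re


def normalize_text(text: str) -> str:
    """Collapse runs of whitespace inside each line and drop empty lines."""
    lines = []
    for line in text.splitlines():
        cleaned = re.sub(r"\s+", " ", line).strip()
        if cleaned:
            lines.append(cleaned)
    return "\n".join(lines)
```

Applied to the extractor’s output, this shortens the prompt without changing the words in the resume.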

src/resume_pipeline.py

The ResumePipeline module acts as the orchestrator for processing resumes. It integrates both the PDF extractor and the audio transcriber. Based on the file type provided by the user, it directs the resume to the correct processor and returns the extracted text. This modular design allows for easy expansion if additional resume formats need to be supported in the future.

from src.pdf_extractor import PDFExtractor
from src.audio_transcriber import AudioTranscriber

class ResumePipeline:
    """
    Process resume files (PDF or audio) and return extracted text.
    """

    def __init__(self):
        """Initialize the ResumePipeline with PDFExtractor and AudioTranscriber."""
        self.pdf_extractor = PDFExtractor()
        self.audio_transcriber = AudioTranscriber()

    def process_resume(self, file_path: str, file_type: str) -> str:
        """
        Process a resume file and extract text based on its type.

        Args:
            file_path (str): Path to the resume file.
            file_type (str): Type of the file ('pdf' or 'audio').

        Returns:
            str: Extracted text from the resume, or an empty string if the
                file type is unsupported or processing fails.
        """
        try:
            file_type_lower = file_type.lower()
            if file_type_lower == "pdf":
                return self.pdf_extractor.extract_text(file_path)
            elif file_type_lower in ["audio", "wav", "mp3"]:
                return self.audio_transcriber.transcribe(file_path)
            else:
                raise ValueError("Unsupported file type. Use 'pdf' or 'audio'.")
        except FileNotFoundError:
            print(f"Error: The file '{file_path}' was not found.")
            return ""
        except ValueError as ve:
            print(f"Error: {ve}")
            return ""
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            return ""
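To illustrate how this design could grow, here is one possible sketch of an extension-based dispatch table. It is not the project’s implementation: build_handlers, extract_text, and read_plain_text are hypothetical names, and the plain-text handler is an assumed new format.

```python
from pathlib import Path
from typing import Callable, Dict

Handler = Callable[[str], str]


def read_plain_text(path: str) -> str:
    """Hypothetical handler for .txt resumes."""
    return Path(path).read_text(encoding="utf-8")


def build_handlers(pdf_handler: Handler, audio_handler: Handler) -> Dict[str, Handler]:
    """Map lowercase file extensions to the function that extracts their text."""
    return {
        ".pdf": pdf_handler,
        ".wav": audio_handler,
        ".mp3": audio_handler,
        ".txt": read_plain_text,  # a new format needs only one extra entry
    }


def extract_text(path: str, handlers: Dict[str, Handler]) -> str:
    """Pick a handler from the file extension and run it."""
    suffix = Path(path).suffix.lower()
    if suffix not in handlers:
        raise ValueError(f"Unsupported file type: {suffix or 'no extension'}")
    return handlers[suffix](path)
```

With the classes above, the table could be built as build_handlers(PDFExtractor().extract_text, AudioTranscriber().transcribe).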

src/analyzer.py

This module is the backbone of the resume analyzer. It initializes the connection to DeepInfra’s API using the DeepSeek-R1 model. The main method in this file is analyze_text, which takes resume text as input and returns a JSON-formatted analysis summarizing key details from the resume. The prompt inside analyze_text is what steers the general-purpose model toward resume analysis.

import os
from openai import OpenAI 
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

class DeepInfraAnalyzer:
    """
    Calls DeepSeek-R1 model on DeepInfra using an OpenAI-compatible interface.
    This class processes resume text and extracts structured information using AI.
    """ 
    def __init__(
        self,
        api_key: str= os.getenv("DEEPINFRA_TOKEN"),
        model_name: str = "deepseek-ai/DeepSeek-R1"
    ):
        """
        Initializes the DeepInfraAnalyzer with API key and model name.

        :param api_key: API key for authentication 
        :param model_name: The name of the model to use 
        """
        try:
            self.openai_client = OpenAI(
                api_key=api_key, 
                base_url="https://api.deepinfra.com/v1/openai",
            )
            self.model_name = model_name 
        except Exception as e:
            raise RuntimeError(f"Failed to initialize OpenAI client: {e}")
    

    def analyze_text(self, text: str) -> str:
        """
        Processes the given resume text and extracts key information in JSON format.
        The response will contain structured details about key skills, experience, education, etc.

        :param text: The resume text to analyze
        :return: JSON string with structured resume analysis
        """
        prompt = (
            "You are an AI job resume matcher assistant. "
            "DO NOT show your chain of thought. "
            "Respond ONLY in English. "
            "Extract the key skills, experiences, education, achievements, etc. from the following resume text. "
            "Then produce the final output as a well-structured JSON with a top-level key called \"analysis\". "
            "Inside \"analysis\", you can have subkeys like \"key_skills\", \"experiences\", \"education\", etc. "
            "Return ONLY the final JSON, with no extra commentary.\n\n"
            f"Resume Text:\n{text}\n\n"
            "Required Format (example):\n"
            "```\n"
            "{\n"
            "  \"analysis\": {\n"
            "    \"key_skills\": [...],\n"
            "    \"experiences\": [...],\n"
            "    \"education\": [...],\n"
            "    \"achievements\": [...],\n"
            "    ...\n"
            "  }\n"
            "}\n"
            "```\n"
        ) 
        try:
            response = self.openai_client.chat.completions.create(
                model=self.model_name,
                messages=[{"role": "user", "content": prompt}], 
            )
            return response.choices[0].message.content
        except Exception as e:
            raise RuntimeError(f"Error processing resume text: {e}")
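Even when the prompt asks for JSON only, a reasoning model can still return a <think> block or wrap the JSON in a Markdown code fence. The following best-effort parser is a sketch under that assumption; parse_model_json is a hypothetical helper, not part of the project.

```python
import json
import re


def parse_model_json(raw: str) -> dict:
    """Best-effort conversion of a model reply into a dict.

    Assumes the reply may contain a <think>...</think> block and/or a
    Markdown code fence around the JSON payload.
    """
    # Drop any reasoning block emitted before the answer.
    cleaned = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Keep only the outermost {...} span, which also skips Markdown fences.
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in the model response.")
    return json.loads(cleaned[start:end + 1])
```

Because json.JSONDecodeError is a subclass of ValueError, callers can catch ValueError to handle both a missing and a malformed JSON object.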

src/feedback_generator.py

After extracting details from the resume, the next step is to compare the resume against a specific job description. The FeedbackGenerator module takes the resume text and the job description and asks the model for a match score along with recommendations for improvement. This module is crucial for job seekers aiming to refine their resumes to better align with job descriptions, increasing their chances of passing through applicant tracking systems (ATS).

from src.analyzer import DeepInfraAnalyzer 

class FeedbackGenerator:
    """
    Generates feedback for resume improvement based on a job description 
    using the DeepInfraAnalyzer.
    """

    def __init__(self, analyzer: DeepInfraAnalyzer):
        """
        Initializes the FeedbackGenerator with an instance of DeepInfraAnalyzer.

        Args:
            analyzer (DeepInfraAnalyzer): An instance of the DeepInfraAnalyzer class.
        """
        self.analyzer = analyzer 

    def generate_feedback(self, resume_text: str, job_description: str) -> str:
        """
        Generates feedback on how well a resume aligns with a job description.

        Args:
            resume_text (str): The extracted text from the resume.
            job_description (str): The job posting or job description.

        Returns:
            str: A JSON-formatted response with a top-level "job_match" object containing:
                - "match_score" (int): A score from 0-100 indicating job match quality.
                - "job_alignment" (dict): Categorization of strong and weak matches.
                - "missing_skills" (list): Skills missing from the resume.
                - "recommendations" (list): Actionable suggestions for improvement.
                If the request fails, the error is printed and "{}" is returned.
        """
        try:
            prompt = (
                "You are an AI job resume matcher assistant. "
                "DO NOT show your chain of thought. "
                "Respond ONLY in English. "
                "Compare the following resume text with the job description. "
                "Calculate a match score (0-100) for how well the resume matches. "
                "Identify keywords from the job description that are missing in the resume. "
                "Provide bullet-point recommendations to improve the resume for better alignment.\n\n"
                f"Resume Text:\n{resume_text}\n\n"
                f"Job Description:\n{job_description}\n\n"
                "Return JSON ONLY in this format:\n"
                "{\n"
                "  \"job_match\": {\n"
                "    \"match_score\": <integer from 0 to 100>,\n"
                "    \"job_alignment\": {\n"
                "      \"strong_match\": [...],\n"
                "      \"weak_match\": [...]\n"
                "    },\n"
                "    \"missing_skills\": [...],\n"
                "    \"recommendations\": [\n"
                "      \"<recommendation 1>\",\n"
                "      \"<recommendation 2>\",\n"
                "      ...\n"
                "    ]\n"
                "  }\n"
                "}"
            )
            # Send the comparison prompt directly. Passing it to analyze_text()
            # would wrap it inside the resume-extraction prompt and give the
            # model two conflicting sets of instructions.
            response = self.analyzer.openai_client.chat.completions.create(
                model=self.analyzer.model_name,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

        except Exception as e:
            print(f"Error in generating feedback: {e}")
            return "{}"  # Returning an empty JSON string in case of failure
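Since the match score comes from a language model, it is worth validating before it is displayed or stored. The helper below is a minimal sketch: extract_match_score is a hypothetical name, and it assumes the feedback is already a clean JSON string in the format requested by the prompt.

```python
import json
from typing import Optional


def extract_match_score(feedback_json: str) -> Optional[int]:
    """Return the match score clamped to 0-100, or None if it is unusable.

    Assumes `feedback_json` is shaped like {"job_match": {"match_score": ...}}.
    """
    try:
        data = json.loads(feedback_json)
        score = data["job_match"]["match_score"]
        # Accept ints, floats, and numeric strings, then clamp to the valid range.
        return max(0, min(100, int(round(float(score)))))
    except (ValueError, TypeError, KeyError, OverflowError):
        return None
```

Returning None for the "{}" fallback lets the caller distinguish a failed request from a genuine score of 0.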

app.py

The app.py file is the main entry point of the JobFitAI project. It integrates all the modules described above and builds an interactive web interface using Gradio. Users can upload a resume/CV file (PDF or audio) and input a job description. The application then processes the resume, runs the analysis, generates feedback, and returns a structured JSON response with both the analysis and recommendations.

import os
from dotenv import load_dotenv
load_dotenv()

import gradio as gr 
from src.resume_pipeline import ResumePipeline
from src.analyzer import DeepInfraAnalyzer
from src.feedback_generator import FeedbackGenerator

# Pipeline for PDF/audio
resume_pipeline = ResumePipeline()

# Initialize the DeepInfra analyzer   
analyzer = DeepInfraAnalyzer()

# Feedback generator
feedback_generator = FeedbackGenerator(analyzer) 
 
def analyze_resume(resume_path, job_desc):
    """
    Gradio callback function to analyze a resume against a job description.

    Args:
        resume_path (str): Path to the uploaded resume file (PDF or audio).
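        job_desc (str): The job description text for comparison.

    Returns:
        dict: The keys "analysis" and "recommendations" hold the model responses,
            or a single "error" key describes what went wrong.
    """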
    try:
        if not resume_path or not job_desc:
            return {"error": "Please upload a resume and enter a job description."}

        # Determine file type from extension
        lower_name = resume_path.lower()
        file_type = "pdf" if lower_name.endswith(".pdf") else "audio"

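        # Extract text from the resume
        resume_text = resume_pipeline.process_resume(resume_path, file_type)

        # The pipeline returns an empty string on failure, so stop early
        if not resume_text.strip():
            return {"error": "No text could be extracted from the uploaded file."}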

        # Analyze extracted text
        analysis_result = analyzer.analyze_text(resume_text)

        # Generate feedback and recommendations
        feedback = feedback_generator.generate_feedback(resume_text, job_desc)

        # Return structured response
        return {
            "analysis": analysis_result,
            "recommendations": feedback
        }
    except ValueError as e:
        return {"error": f"Unsupported file type or processing error: {str(e)}"}
    except Exception as e:
        return {"error": f"An unexpected error occurred: {str(e)}"}
    
# Define Gradio interface
demo = gr.Interface(
    fn=analyze_resume,
    inputs=[
        gr.File(label="Resume (PDF/Audio)", type="filepath"),
        gr.Textbox(lines=5, label="Job Description"),
    ],
    outputs="json",
    title="JobFitAI: AI Resume Analyzer",
    description="""
Upload your resume/cv (PDF or audio) and paste the job description to get a match score,
missing keywords, and actionable recommendations.""",
)

if __name__ == "__main__": 
    demo.launch(server_name="0.0.0.0", server_port=8000)
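Note that server_name="0.0.0.0" makes the app listen on all network interfaces. If you would rather keep it local by default and make the address configurable, one option is sketched below. The environment variable names JOBFITAI_HOST and JOBFITAI_PORT and the helper launch_settings are assumptions for illustration only.

```python
import os


def launch_settings(env=None) -> dict:
    """Build keyword arguments for demo.launch() from environment variables.

    JOBFITAI_HOST and JOBFITAI_PORT are hypothetical names for this sketch.
    Defaults to 127.0.0.1 so the app is only reachable from the local machine.
    """
    env = os.environ if env is None else env
    return {
        "server_name": env.get("JOBFITAI_HOST", "127.0.0.1"),
        "server_port": int(env.get("JOBFITAI_PORT", "8000")),
    }


# Usage inside app.py: demo.launch(**launch_settings())
```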

Running the Application with Gradio

After setting up your environment and reviewing all code components, you’re ready to run the application.

Start the Application: In your terminal, navigate to your project directory and run the following command:

python app.py

This command launches the Gradio interface locally. Because app.py sets server_port=8000, open http://localhost:8000 in your browser to see the interactive resume analyzer.

Test JobFitAI:
- Upload a Resume/CV: Select a PDF file or an audio file containing a recorded resume.
- Enter a Job Description: Paste or type in a job description.
- Review the Output: The system displays a JSON response that includes a detailed analysis of the resume, a match score, missing keywords, and feedback with suggestions for improvement.

You can find all the code files in the project’s GitHub repository.

Use Cases and Practical Applications

The JobFitAI resume analyzer can be applied in various real-world scenarios:

Improving Resume Quality

Self-Assessment: Candidates can use the tool to self-assess their resumes before applying. By understanding the match score and the areas that need improvement, they can better tailor their resumes for specific roles.
Feedback Loop: The structured JSON feedback generated by the tool can be integrated into career counseling platforms, providing personalized resume improvement recommendations.

Educational and Training Applications

Career Workshops: Educational institutions and career coaching platforms can incorporate JobFitAI into their curriculum. It serves as a practical demonstration of how AI can be used to enhance career readiness.
Coding and AI Projects: Aspiring data scientists and developers can learn about integrating multiple AI services (such as transcription, PDF extraction, and natural language processing) into a cohesive project.

Troubleshooting and Extensions

This section covers common issues and ideas for extending the project.

Common Issues and Solutions

API Token Issues: If the DeepInfra API token is missing or incorrect, the analyzer module will fail. Always verify that your .env file contains the correct token and that the token is active.
Unsupported File Types: The application is designed for PDF and audio input only. Other file types (such as DOCX) are not handled by the pipeline and will not produce a meaningful analysis. Future extensions can include support for additional formats.
Transcription Delays: Audio transcription can be slow, especially for long recordings. Consider using a higher-specification machine or a cloud-based solution if you plan on processing many audio resumes.

Ideas for Further Development

Support More File Formats: Extend the resume pipeline to support additional file types like DOCX or plain text.
Enhanced Feedback Mechanism: Integrate more sophisticated natural language processing models to provide richer, more nuanced feedback beyond the basic match score.
User Authentication: Implement user authentication to allow job seekers to save their analysis and track improvements over time.
Dashboard Integration: Build a dashboard where recruiters can manage and compare resume analyses across multiple candidates.
Performance Optimization: Optimize the audio transcription and PDF extraction processes for faster analysis on large-scale datasets.
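As one possible starting point for the performance idea above, repeated uploads of the same file could reuse earlier extraction results. This is a sketch only: ExtractionCache and file_digest are hypothetical names, and the cache lives in memory, so it is lost when the app restarts.

```python
import hashlib
from typing import Callable, Dict


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            sha.update(chunk)
    return sha.hexdigest()


class ExtractionCache:
    """In-memory cache that maps file content hashes to extracted text."""

    def __init__(self, extractor: Callable[[str], str]):
        self.extractor = extractor
        self._store: Dict[str, str] = {}

    def get_text(self, path: str) -> str:
        key = file_digest(path)
        if key not in self._store:
            self._store[key] = self.extractor(path)  # slow path runs once per content
        return self._store[key]
```

Keying on the content hash rather than the file name means a renamed copy of the same resume still hits the cache.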

Conclusion

The JobFitAI resume analyzer is a robust, multi-functional tool that leverages state-of-the-art AI models to bridge the gap between resumes and job descriptions. By integrating DeepSeek-R1 via DeepInfra, along with transcription and PDF extraction capabilities, you now have a complete solution to automatically analyze resumes and generate feedback for improved job alignment.

This guide provided a comprehensive walk-through—from setting up the environment to understanding each module’s role and finally running the interactive Gradio interface. Whether you’re a developer looking to expand your portfolio, an HR professional wanting to streamline candidate screening, or a job seeker aiming to enhance your resume, the JobFitAI project offers practical insights and an excellent starting point for further exploration.

Embrace the power of AI, experiment with new features, and continue refining the project to suit your needs. The future of job applications is here, and it’s smarter than ever!

Key Takeaways

JobFitAI leverages DeepSeek-R1 and DeepInfra to extract skills, experiences, and achievements from resumes for better job matching.
The system supports both PDF and audio resumes, using PyPDF2 for text extraction and Whisper for audio transcription.
Gradio enables a seamless, user-friendly web interface for real-time resume analysis and feedback.
The project uses a modular architecture and environment setup with API keys for smooth integration and scalability.
Developers can adjust the prompts, troubleshoot issues, and expand functionality for more robust AI-driven resume screening.

Frequently Asked Questions

Q1: What types of resumes does JobFitAI support?

A: The current version supports resumes in PDF and audio formats. Future updates may include support for additional formats such as DOCX or plain text.

Q2: Is DeepInfra API free?

A: No, accessing the DeepSeek-R1 model through the DeepInfra API requires a paid plan. For detailed pricing information, please visit DeepInfra’s official page.

Q3: Can I customize the feedback provided by the analyzer?

A: Yes! You can adjust the prompt or integrate additional models to tailor the feedback to your specific requirements.

Q4: What should I do if I encounter issues with audio transcription?

A: Audio transcription may sometimes be delayed, especially for larger files. Verify that your environment meets the necessary computational requirements, and consider optimizing the transcription process or using cloud-based resources if needed.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Hello! I’m a passionate AI and Machine Learning enthusiast currently exploring the exciting realms of Deep Learning, MLOps, and Generative AI. I enjoy diving into new projects and uncovering innovative techniques that push the boundaries of technology. I’ll be sharing guides, tutorials, and project insights based on my own experiences, so we can learn and grow together. Join me on this journey as we explore, experiment, and build amazing solutions in the world of AI and beyond!