A Multiagent System (MAS) is a distributed system comprised of multiple intelligent agents that interact and collaborate to achieve individual and collective goals. These agents, which can be software programs, robots, or even humans, operate autonomously but engage in communication and coordination to solve complex problems that a single agent might struggle with. Key characteristics of MAS include autonomy, decentralized control, and the ability to adapt to dynamic environments, making them suitable for a wide range of applications across various industries. In terms of generation of description of images automatically for listing on ecommerce websites, multi agentic systems can be deployed that can take as input the images of the items and generate descriptions that are crafted for influencing the customers to buy.
Learning Objectives
- Understand the role of Multiagent Systems (MAS) in automating complex tasks using image analysis capabilities.
- Explore CrewAI and its features for creating multi-agent AI systems with image processing functionalities.
- Learn how agentic AI enhances e-commerce by automatically generating product descriptions from images.
- Implement a hands-on Python-based multi-agent system using CrewAI for automated e-commerce listing generation.
- Analyze real-world applications of AI-driven image analysis in industries like healthcare, manufacturing, and retail.
This article was published as a part of the Data Science Blogathon.
Image Analysis Capabilities of Agentic AI
Agentic AI systems equipped with image analysis capabilities can perform several critical functions:
- Real-Time Analysis: These systems can analyze vast amounts of visual data in real-time, enhancing operational efficiency across various sectors like healthcare, manufacturing, and retail.
- Enhanced Accuracy: With recognition rates exceeding 95%, agentic AI can significantly reduce false positives in image recognition tasks, leading to more reliable outcomes.
- Automated Decision-Making: By integrating image analysis into their workflows, these systems can automate complex tasks such as medical diagnostics or surveillance without human intervention
Applications of Agentic AI in Image Analysis
Agentic AI systems with image analysis capabilities are transforming numerous fields:
- Healthcare: In medical diagnostics, they assist in evaluating imaging data, detecting patterns, and suggesting diagnoses based on historical cases
- Manufacturing: These systems drive predictive maintenance and quality control by continuously monitoring equipment through visual data analysis
- Retail: They enhance visual search functionalities and inventory management by categorizing and indexing images efficiently
- E-commerce Listings. Generating Descriptions for items from their images can be automated end to end using these Agentic AI systems.
Crew AI for Multi-Agent Image Analysis
CrewAI is an innovative platform founded in 2023 and based in São Paulo, Brazil, that specializes in developing multi-agent systems for artificial intelligence applications. The platform enables enterprises to create, deploy, and manage teams of autonomous AI agents, referred to as “Crews,” which collaborate to accomplish complex tasks by leveraging their specific roles and expertise.
Key Features of CrewAI
- Multi-Agent Orchestration: CrewAI allows users to chain together multiple task-specific AI agents that can communicate, delegate tasks, and automate workflows, enhancing operational efficiency across various industries
- Role Specialization: Each agent within a Crew has defined roles and responsibilities, similar to how departments function in a traditional organization. This structure facilitates seamless collaboration and effective task execution
- Open-Source Framework: Launched as an open-source project in late 2023, CrewAI has garnered significant interest from developers, amassing over 20,000 stars on GitHub and building a robust community around its framework
- Enterprise Cloud Offering: Recently, CrewAI introduced its Enterprise Cloud solution, which serves as a centralized platform for managing complex AI workloads and multi-agent systems. This offering allows teams to build cloud-agnostic applications that can automate both simple and complex workflows
Crew AI with Image Analysis Capabilities
The Vision Tool of CrewAI is a specialized feature designed to enhance the capabilities of AI agents by enabling them to extract text from images. This tool significantly expands the functionality of agents, allowing them to process visual information and integrate it into their workflows.
The primary function of the Vision Tool is to extract text from images. Users can provide either a URL or a file path to the image, which the agent will analyze to retrieve textual data. You can easily integrate the Vision Tool into AI agents within CrewAI. For instance, when you configure an agent to use the Vision Tool, it automatically handles tasks that require reading and interpreting text from visual content.
The Vision Tool can be applied in various scenarios, including:
- Document Processing: Automating the extraction of information from scanned documents or images containing text.
- Data Entry Automation: Reducing manual data entry by extracting relevant information from invoices or receipts.
- Content Generation: Assisting in content creation by pulling text from images for further analysis or reporting.
Multi-Agent System for Automated E-Commerce Descriptions
In the following tutorial, we will focus on creating a framework using Crew AI where multiple AI agents will collaborate to analyze product images and generate descriptive content. This system can significantly enhance e-commerce efficiency by automating the process of item description creation, ensuring accurate and engaging listings that improve customer experience and drive sales.
Step 1: Installation of Necessary Libraries
Install Crew AI and required dependencies to set up the multi-agent framework for image analysis and description generation.
!pip install crewai crewai-tools poetry
!pip install langchain_openai
Step 2: Importing Necessary Libraries & Defining OpenAI API key
Import essential libraries like Crew AI, LangChain, and VisionTool, then configure the OpenAI API key for accessing AI models.
from langchain_openai import ChatOpenAI
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai_tools import VisionTool
import os
from crewai import Agent, Task, Crew, Process
os.environ['OPENAI_API_KEY'] =''
Step 3: Defining OpenAI models For Image Analysis and Description Generation
Use gpt-4o-mini
for image analysis and gpt-3.5-turbo-16k
for generating detailed item descriptions.
os.environ["OPENAI_MODEL_NAME"] = "gpt-4o-mini"
llm = ChatOpenAI(
model="gpt-3.5-turbo-16k",
temperature=0.1,
max_tokens=8000
We will be using the gpt-4o-mini model here for analysis of images here which will be used in the VisionTool() in Crew AI. For generation of image descriptions, we will be using the gpt-3.5-turbo-16k model.
Step 4: Defining Image Analysis Agent and Associated Task
Create an AI agent specialized in extracting product names and descriptions from images using VisionTool.
#Defining the URL
image_url = "https://encrypted-tbn3.gstatic.com/shopping?q=tbn:ANd9GcSlQOjwALxoeKvkmVVCX3F6nBo5rs_ssO9Ks4g6C-ygjLTjnvIZ3QDLqIomYlP77vUiABsGZ_XjA0agwiLervudXrXowDvM8xiHTL9ZJ6s&usqp=CAE"
vision_tool = VisionTool()
image_text_extractor = Agent(
role="Item Name & Description Extraction Specialist",
goal="Extract NAME OF ITEM PRESENT ALONG WITH THEIR DESCRIPTION from images efficiently using AI-powered tools. You should get ITEM NAMES from %s"%image_url,
backstory='You are an expert in NAME OF ITEM PRESENT ALONG WITH THEIR DESCRIPTION extraction, specializing in using AI to process. Make sure you use the tools provided.',
tools=[vision_tool],allow_delegation=False,verbose=True)
def text_extraction_task(agent):
return Task(
description = """Extract NAME OF ITEM PRESENT ALONG WITH THEIR DESCRIPTION from the provided image file. Ensure that the ITEM NAME & DESCRIPTION is accurate and complete,
and ready for any further analysis or processing tasks. The image file provided may contain
various products of Different BRANDS, so it's crucial to capture all readable text. """,
agent = agent,
expected_output = "A string containing NAME OF ITEM PRESENT ALONG WITH THEIR DESCRIPTION extracted from the image.",
max_iter=1
)
We will be first using the following image of item and generating its description
Step 5: Defining Image Description Generator Agent and Associated Task
Develop an AI agent to craft compelling product descriptions based on extracted details for e-commerce listings.
description_generator = Agent(
role="Crafting Specialist",
goal="From the item names & description extracted from the previous agent, craft a good description of the PRODUCT (not any PERSON) highlighting all its key features for displaying on a website",
backstory='You are an expert in crafting good descriptions for displaying on websites',
llm=llm,allow_delegation=False,verbose=True)
def description_generator_task(agent):
return Task(
description = "From the item names & description extracted from the previous agent, craft a good description of the PRODUCT (not any PERSON) highlighting all its key features for displaying on a website",
agent = agent,
expected_output = "A string containing a good description of the product.",
max_iter=1)
Step 6: Defining Image Title Generator Agent and Associated Task
Implement an agent to generate concise, engaging product titles (max 3 words) for better visibility in e-commerce platforms.
title_generator = Agent(
role="Item Title Specialist",
goal="From the item description crafted from the previous agent, craft a good title for the PRODUCT (not any PERSON) in maximum 3 words for displaying on a ecommerce website",
backstory='You are an expert in creating eye catching titles for displaying on websites',
llm=llm,allow_delegation=False,verbose=True)
def title_generator_task(agent):
return Task(
description = "From the item description crafted from the previous agent, ADD to the Description of the Product generated from previous agent A GOOD TITLE for the PRODUCT (not any PERSON) in maximum 3 words for displaying on a ecommerce website. Output should be Description of the Product generated from previous agent along with the Title",
agent = agent,
expected_output = "Output should be Description of the Product generated from previous agent along with the Title",
max_iter=1)
Step 7: Executing The Crew
Set up and run the multi-agent system in a sequential process where each task builds upon the previous one to generate structured e-commerce product descriptions.
task1 = text_extraction_task(image_text_extractor)
task2 = description_generator_task(description_generator)
task3 = title_generator_task(title_generator)
#start crew
targetting_crew = Crew(
agents=[image_text_extractor,description_generator,title_generator],
tasks=[task1,task2,task3],
verbose=True,
process=Process.sequential # Sequential process will have tasks executed one after the other and the outcome of the previous one is passed as extra content into this next.
)
targetting_result = targetting_crew.kickoff()
Output
Title: "Elegant Timepiece" Description: Introducing the Daniel Wellington Classic Petite Melrose, a stunning women's watch that effortlessly combines style and sophistication. This timepiece features a round brown dial, adorned with elegant gold-tone hands and markers, creating a striking contrast that catches the eye. The watch is beautifully complemented by a rose gold metallic bracelet, adding a touch of luxury to any outfit. Designed with precision and attention to detail, the Classic Petite Melrose is not only a fashion statement but also a reliable timekeeping companion. Its high-quality craftsmanship ensures durability and longevity, making it a timeless investment piece. The round brown dial serves as the perfect backdrop for the gold- tone hands and markers, allowing for easy readability at a glance. Whether you're attending a formal event or going about your daily routine, this watch effortlessly transitions from day to night, adding a touch of elegance to any occasion. The rose gold metallic bracelet adds a touch of glamour and sophistication to the overall design. Its sleek and slim profile ensures a comfortable fit on the wrist, while the secure clasp provides peace of mind during wear. With its classic yet contemporary design, the Daniel Wellington Classic Petite Melrose is a versatile accessory that can be paired with any outfit. Whether you're dressing up for a special occasion or simply want to elevate your everyday style, this watch is the perfect choice. Invest in timeless elegance and impeccable craftsmanship with the Daniel Wellington Classic Petite Melrose. Add this exquisite women's watch to your collection and make a statement wherever you go.
Let us now check the result for this image:
Title: "Pastel Chic Sneakers" Description: Introducing our stylish sneakers with a chunky white sole and a mix of pastel colors. These sneakers are designed to make a statement with their modern and trendy look, perfect for casual wear. The combination of mint green, peach, and gold accents adds a touch of elegance and sophistication to these sneakers. The chunky white sole not only provides comfort but also adds a fashionable touch to the overall design. Whether you're going for a walk in the park or meeting friends for a coffee, these sneakers will elevate your style and keep you looking effortlessly cool. Don't miss out on these must-have sneakers that effortlessly blend fashion and comfort.
Conclusion
Multi-Agent Systems (MAS) represent a powerful approach to solving complex problems through the collaboration of autonomous agents. By leveraging their unique capabilities, these systems can significantly enhance operational efficiency across various sectors. CrewAI stands out as an innovative platform that facilitates the development of such multi-agent systems, enabling organizations to harness the full potential of agentic AI. With features like multi-agent orchestration, role specialization, and an open-source framework, CrewAI empowers users to automate complex workflows effectively.
The integration of image analysis capabilities further enriches these systems, allowing for real-time data processing and automated decision-making. CrewAI transforms how businesses operate, as demonstrated through various applications—from content creation to customer support—by making processes more efficient and enhancing overall customer experiences in the digital marketplace.
Key Takeaways
- MAS involves multiple autonomous agents that communicate and coordinate to achieve individual and collective goals, making them ideal for tackling complex tasks, such as generating product descriptions from images for e-commerce listings.
- Agentic AI systems equipped with image analysis can perform real-time analysis with high accuracy, significantly enhancing fields like healthcare, manufacturing, retail, and e-commerce by automating tasks like medical diagnostics, quality control, and inventory management.
- CrewAI, founded in 2023, enables the creation and management of multi-agent systems, where AI agents, or “Crews,” work together to complete tasks. It offers an open-source framework and cloud-based platform, making it easier for enterprises to automate complex workflows and AI tasks.
- CrewAI’s Vision Tool allows agents to extract text from images, broadening the potential applications of these systems. It can automate tasks like document processing, data entry, and content generation, saving time and improving workflow efficiency.
- In e-commerce, multi-agent systems can automatically generate product descriptions from images. This streamlines the process and enhances the customer experience by providing detailed and appealing product listings without manual intervention.
Frequently Asked Questions
A. A Multi-Agent System (MAS) is a distributed system with multiple intelligent agents. These agents interact and collaborate to achieve individual and shared goals. They can be software programs, robots, or humans. Each agent operates autonomously while communicating and coordinating to solve complex problems.
A. Agentic AI systems with image analysis capabilities can perform real-time analysis of large volumes of visual data. They can achieve recognition rates exceeding 95% for accuracy and automate decision-making processes. This allows them to effectively handle tasks in various sectors, such as healthcare, manufacturing, and retail.
A. CrewAI is an innovative platform founded in 2023 that specializes in developing multi-agent systems for AI applications. Key features include multi-agent orchestration for task delegation and role specialization for effective collaboration. It also offers an open-source framework with strong community interest and an Enterprise Cloud solution for managing complex AI workloads.
A. The Vision Tool in CrewAI enables AI agents to extract text from images by analyzing provided URLs or file paths. This tool enhances the agents’ ability to process visual information. It can be applied in scenarios like document processing, data entry automation, and content generation by extracting text from images for further analysis or reporting.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.