Exploring Image Background Removal Using RMBG v2.0


Image segmentation models have opened up new ways to tackle a wide range of computer vision tasks, and the open-source space has driven many of their applications. Background removal is one such segmentation task that models have continued to refine over the years.

Bria’s RMBG v2.0 is a state-of-the-art model that performs background removal with high precision and accuracy. It improves on the older RMBG v1.4 and delivers accuracy, efficiency, and versatility across different benchmarks.

This model has applications in various fields, from gaming to stock image generation. Its capabilities stem from its training data and architecture, which allow it to operate in varied contexts.

Learning Objectives 

  • Understand the capabilities and advancements of BriaAI’s RMBG v2.0 model.
  • Explore the model architecture and how BiRefNet enhances background removal.
  • Learn how to set up and run RMGB v2.0 for image segmentation tasks.
  • Discover real-world applications of RMGB v2.0 in gaming, e-commerce, and advertising.
  • Analyze the performance improvements over RMBG v1.4 in edge detection and accuracy.

This article was published as a part of the Data Science Blogathon.

How Does RMBG v2.0 Work?

This model has a simple working principle. It takes images as input (in various formats, such as JPEG, PNG, etc.). After processing an image, the model outputs a segmented version with the background (or foreground) removed.

RMBG can also provide a mask to process the image further or add a new background.

Performance Benchmark of RMBG v2.0

This model beats its predecessor, RMBG v1.4, in both performance and accuracy. Results from testing a few images highlighted how v2.0 produced a cleaner background.

Although the earlier version performed well, RMBG v2.0 sets a new standard for understanding complex scenes and edge details while improving background removal in general.

A link to compare the earlier version with the latest one can be found here.

Model Architecture of RMBG v2.0

Developed by Bria AI, RMBG v2.0 is based on the BiRefNet mechanism, an architecture designed for high-resolution image-background separation tasks.


This approach combines complementary representations from two sources within a high-resolution restoration model: overall scene understanding (general localization) and detailed edge information (local), allowing for clear and precise boundary detection.

RMBG v2.0 uses a two-stage model to leverage the BiRefNet architecture: the localization and restoration modules.

The localization module generates a general semantic map representing the image’s primary areas. This component ensures that the model accurately captures the image’s structure and can identify the location of objects in the image while accounting for the background.

On the other hand, the restoration module helps restore the boundaries of the objects in the image. It performs this process at high resolution, unlike the first stage, where the semantic map is generated at a lower resolution.

The restoration module relies on two references: the original reference, a pixel map of the original image, provides background context, while the gradient reference supplies the fine edge details. The gradient reference also improves accuracy by giving context for images with sharp boundaries and complex colors.
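To make the two-stage flow concrete, here is a minimal, hypothetical PyTorch sketch. The stand-in networks and the simple gradient computation are illustrative only and are not BiRefNet’s actual implementation:

import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the two modules; the real BiRefNet networks
# are far deeper. This only illustrates the data flow between the stages.
localization_net = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)
restoration_net = torch.nn.Conv2d(5, 1, kernel_size=3, padding=1)

def two_stage_mask(image_hr):
    # Stage 1: coarse semantic map at low resolution (general localization)
    image_lr = F.interpolate(image_hr, scale_factor=0.25, mode="bilinear")
    coarse = localization_net(image_lr).sigmoid()
    coarse_up = F.interpolate(coarse, size=image_hr.shape[-2:], mode="bilinear")

    # Stage 2: refine at full resolution using two references:
    # the original pixels (context) and an image gradient (fine edges)
    gray = image_hr.mean(dim=1, keepdim=True)
    gradient_ref = torch.abs(torch.diff(gray, dim=-1, prepend=gray[..., :1]))
    features = torch.cat([image_hr, gradient_ref, coarse_up], dim=1)  # 3+1+1 channels
    return restoration_net(features).sigmoid()

mask = two_stage_mask(torch.rand(1, 3, 1024, 1024))
print(mask.shape)  # torch.Size([1, 1, 1024, 1024])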

This approach yields excellent results in object separation, especially in high-resolution images. The BiRefNet architecture and the model’s training dataset deliver strong results across various benchmarks.

How to Run This Model

You can run inference on this model even in low-resource environments, and an image with a simple background yields a clean, accurate separation.

Let’s dive into how we can run the RMBG v2.0 model:

Step 1: Preparing the Environment

pip install kornia

Installing Kornia is relevant for this task, as it is a Python library essential for various computer vision models. Kornia is a differentiable computer vision library built on PyTorch that provides functionality for image processing, geometric transformations, filtering, and deep learning applications.
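As a quick, illustrative example of the kind of differentiable operation Kornia provides (not part of the RMBG pipeline itself), here is a Sobel edge map:

import torch
import kornia

img = torch.rand(1, 3, 256, 256)   # a batch with one random RGB image
edges = kornia.filters.sobel(img)  # differentiable per-channel edge magnitudes
print(edges.shape)                 # torch.Size([1, 3, 256, 256])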

Step 2: Importing Necessary Libraries

from PIL import Image
import matplotlib.pyplot as plt
import torch
from torchvision import transforms
from transformers import AutoModelForImageSegmentation

These libraries are all essential to running this model. ‘PIL’ comes in handy for image processing tasks like loading and opening images, while ‘matplotlib’ is great for displaying images and drawing graphs.

‘torch’ and the ‘torchvision’ transforms convert the images into a format compatible with deep learning models. Finally, we use ‘AutoModelForImageSegmentation’, which lets us load the pre-trained model for image segmentation.

Step 3: Loading the Pre-trained Model

# Load the pre-trained background-removal model from the Hugging Face Hub
model = AutoModelForImageSegmentation.from_pretrained('briaai/RMBG-2.0', trust_remote_code=True)
# Select the faster 'high' float32 matmul precision setting
torch.set_float32_matmul_precision(['high', 'highest'][0])
model.to('cuda')  # move the model to the GPU
model.eval()      # switch to inference mode

This code loads the pre-trained model for background removal. The ‘trust_remote_code=True’ flag allows execution of the custom Python code that ships with the model. The next line sets the float32 matrix-multiplication precision to ‘high’ to speed up performance.

Finally, we move the model to the available GPU and prepare it for inference.
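Note that model.to('cuda') fails on machines without a GPU. A common defensive pattern, not in the original snippet, is to fall back to the CPU:

# Fall back to the CPU when CUDA is unavailable (inference will be slower)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
model.eval()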

Step 4: Image Preprocessing

This code defines the image preprocessing stage: it resizes the image to 1024 x 1024, converts it to a tensor, and normalizes the pixel values with the ImageNet mean and standard deviation.

The ‘transforms.Compose’ function chains these operations so that every input image is processed uniformly. This step also keeps the pixel values in a consistent range.

image_size = (1024, 1024)
transform_image = transforms.Compose([
   transforms.Resize(image_size),
   transforms.ToTensor(),
   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
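To sanity-check the pipeline, you can run it on any PIL image; the blank test image below is just an example:

# The transform turns any PIL image into a normalized 3 x 1024 x 1024 tensor
sample = Image.new("RGB", (640, 480), color="white")
tensor = transform_image(sample)
print(tensor.shape)  # torch.Size([3, 1024, 1024])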

Step 5: Loading the Image

image = Image.open("/content/Boy using a computer.jpeg")
input_images = transform_image(image).unsqueeze(0).to('cuda')

Here, we load the image and prepare it for the model. First, we open the image using ‘PIL’; the transform then resizes it and converts it to a tensor. An extra batch dimension is added before moving the input to ‘cuda’, so the GPU can speed up inference and the shape matches what the model expects.
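One caveat: if your file is grayscale or already carries an alpha channel, the three-channel normalization above will fail. A safe variation on the loading line is to force RGB:

# Force three channels so the RGB normalization always matches
image = Image.open("/content/Boy using a computer.jpeg").convert("RGB")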


Step 6: Background Removal

This code removes the background by generating a segmentation mask from the model’s predictions and applying it to the original image.

with torch.no_grad():  # no gradient tracking needed for inference
    preds = model(input_images)[-1].sigmoid().cpu()
pred = preds[0].squeeze()  # drop the batch and channel dimensions
pred_pil = transforms.ToPILImage()(pred)  # convert the tensor to a PIL mask
mask = pred_pil.resize(image.size)  # match the original resolution
image.putalpha(mask)  # apply the mask as the alpha channel

It runs the model without gradient tracking, applies sigmoid() to turn the raw predictions into pixel probabilities, and moves the result to the CPU. The mask is then resized to match the original image and set as its alpha channel, making the background transparent.
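To keep and inspect the cutout, you can save and display it with the matplotlib import from Step 2. The output filename here is just an example; note that transparency requires a format such as PNG:

image.save("boy_no_background.png")  # PNG keeps the alpha channel; JPEG would not
plt.imshow(image)
plt.axis("off")
plt.show()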

The result for the input image is shown below, with the background removed and separated from the primary object (the boy).

Here is the link to the code file.


Applications of Image Background Removal Using RMBG v2.0

There are various use cases for this model across different fields. Some of the common applications include:

  • E-commerce: This model is useful for e-commerce product photography, as you can remove and replace the background of an image. 
  • Gaming: Background removal plays a huge role in creating game assets. This model can be used to separate selected objects from the rest of a scene. 
  • Advertisement: You can leverage RMBG’s background removal and replacement capabilities to generate advertisement designs and content, for photos and even graphics; a background-replacement sketch follows this list. 
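For the replacement workflows above, the cutout from Step 6 can be pasted onto any new backdrop. Here is a minimal Pillow sketch; the file names and the plain-grey background are hypothetical:

from PIL import Image

# Composite the transparent cutout onto a new background
foreground = Image.open("boy_no_background.png").convert("RGBA")
background = Image.new("RGBA", foreground.size, (240, 240, 240, 255))  # plain grey
background.paste(foreground, (0, 0), mask=foreground)  # the RGBA alpha acts as the mask
background.convert("RGB").save("new_backdrop.jpg")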

Conclusion

RMBG is used across various industries. Its capabilities have improved from the earlier v1.4 to the more recent v2.0. Its architecture and use of BiRefNet play a huge role in its performance and inference time. You can explore this model with various image types and judge the output quality for yourself.

Key Takeaways

  • This model’s improvement over its predecessors is a notable aspect of how RMBG works. Context understanding is another area that highlights its improved performance. 
  • One thing that makes this model stand out is its versatile application across various fields, such as advertising, gaming, and e-commerce. 
  • Another notable feature is its easy execution and integration. This results from its architecture, which allows it to run in low-resource environments with fast inference times.


Frequently Asked Questions

Q1. What makes RMBG v2.0 better than RMBG v1.4?

A. RMBG v2.0 improves edge detection, background separation, and accuracy, especially in complex scenes with detailed edges.

Q2. Can RMBG v2.0 work with different image formats?

A.  It supports various formats, such as JPEG and PNG, making it adaptable for different use cases.

Q3. Does RMBG v2.0 require a high-end GPU for inference?

A. This model is optimized for low-resource environments and can run efficiently on standard GPUs.

Q4. What is the architecture behind RMBG v2.0?

A. RMBG v2.0 is built on the BiRefNet mechanism, which improves high-resolution image-background separation using localization and restoration modules.

Q5. How can I run RMBG v2.0 for background removal?

A. You can install required dependencies like Kornia, load the pre-trained model, preprocess images, and perform inference using PyTorch.

Q6. Where can I find resources to explore RMBG v2.0 further?

A. You can refer to Bria AI’s blog, the Hugging Face model repository, and AIModels.fyi for documentation and implementation guides.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Hey there! I’m David Maigari, a dynamic professional with a passion for technical writing, web development, and the AI world. I’m also an enthusiast of ML/AI innovations. Reach out to me on X (Twitter) at @maigari_david
