How to mask objects in images using Large Mask Inpainting (LaMa)?


Despite considerable progress, modern image editing systems often struggle with large missing regions, complex geometric structures, and high-resolution images. Recently, Roman Suvorov et al. proposed a state-of-the-art technique called LaMa (Large Mask Inpainting), which can mask objects of any scale in a given image and return a plausibly completed image with the masked object removed. In this article we will first cover the theory behind this strategy and then see how it works in practice. The main points to be discussed are as follows.

Contents

  1. About Image Inpainting
  2. How does LaMa approach the problem?
  3. Implementation of LaMa

Let’s start the discussion by understanding what image inpainting is.

About Image Inpainting

The process of reconstructing missing areas of an image so that viewers cannot discern that those regions have been restored is called image inpainting. This method is frequently used to remove unwanted elements from images or to restore damaged areas of old photographs. The images below show some examples of image inpainting.

Image inpainting is an age-old technique that originally required human painters to work by hand. Lately, however, researchers have come up with various approaches to automatic inpainting. In addition to the image itself, most of these algorithms require as input a mask that marks the regions to be inpainted. One study even compared the results of nine automatic inpainting systems with those of trained artists.

Inpainting is also a conservation technique: it involves filling in damaged, deteriorated, or missing areas of an artwork to present a complete image. It applies to physical and digital media alike, including oil and acrylic paintings, chemical photographic prints, sculptures, and digital photos and videos.

Solving the image inpainting problem by realistically filling in missing regions requires “understanding” the large-scale structure of natural images as well as the ability to synthesize image content. The subject was studied before the advent of deep learning, and progress has accelerated in recent years through the use of deep, wide neural networks and adversarial learning.

Inpainting systems are typically trained on large, automatically produced datasets built by randomly masking real images. Complex two-stage models with intermediate predictions, such as smoothed images, contours, and segmentation maps, are frequently used.

How does LaMa approach the problem?

The inpainting network is based on the recently developed Fast Fourier Convolutions (FFCs). Even in the network’s earliest layers, FFCs provide a receptive field that spans the entire image. According to the researchers, this property of FFCs improves both perceptual quality and parameter efficiency. Interestingly, the inductive bias of FFCs allows the network to generalize to resolutions it never saw during training. This finding has major practical implications, since it reduces the amount of training data and computation needed.
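To make this concrete, here is a minimal PyTorch sketch of the spectral branch of an FFC. The real LaMa blocks also carry a parallel local convolutional branch and differ in detail, so treat this as an illustration of the idea, not the paper’s implementation.

import torch
import torch.nn as nn

class SpectralTransform(nn.Module):
  """Sketch of the global (spectral) branch of a Fast Fourier Convolution:
  FFT -> pointwise conv on the spectrum -> inverse FFT. A 1x1 convolution
  in the frequency domain mixes every spatial location, so the receptive
  field spans the whole image even in the first layer."""
  def __init__(self, channels):
    super().__init__()
    # Real and imaginary parts are stacked along the channel axis
    self.conv = nn.Sequential(
        nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
        nn.BatchNorm2d(2 * channels),
        nn.ReLU(inplace=True),
    )

  def forward(self, x):
    b, c, h, w = x.shape
    spec = torch.fft.rfft2(x, norm='ortho')           # complex, (b, c, h, w//2+1)
    spec = torch.cat([spec.real, spec.imag], dim=1)   # (b, 2c, h, w//2+1)
    spec = self.conv(spec)                            # mix frequencies pointwise
    real, imag = spec.chunk(2, dim=1)
    return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm='ortho')

# Smoke test: output shape equals input shape
layer = SpectralTransform(channels=8)
print(layer(torch.randn(1, 8, 64, 64)).shape)  # torch.Size([1, 8, 64, 64])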

LaMa also uses a perceptual loss that relies on a semantic segmentation network with a wide receptive field. This follows from the finding that an insufficient receptive field hurts both the inpainting network and the perceptual loss. This loss encourages global consistency of structure and shape.
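As an illustration, here is a hedged sketch of such a high receptive field perceptual loss. The DilatedFeatures extractor below is a hypothetical stand-in for the pretrained dilated segmentation backbone the paper actually uses.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedFeatures(nn.Module):
  """Hypothetical stand-in for a pretrained dilated segmentation backbone:
  stacked dilated convolutions grow the receptive field quickly without
  any downsampling."""
  def __init__(self):
    super().__init__()
    layers, in_ch = [], 3
    for d in (1, 2, 4, 8):
      layers.append(nn.Conv2d(in_ch, 16, 3, padding=d, dilation=d))
      in_ch = 16
    self.layers = nn.ModuleList(layers)

  def forward(self, x):
    feats = []
    for layer in self.layers:
      x = F.relu(layer(x))
      feats.append(x)
    return feats

def hrf_perceptual_loss(extractor, pred, target):
  # Compare intermediate features of the prediction and the ground truth;
  # target features carry no gradients (the extractor itself would also
  # be frozen during training)
  with torch.no_grad():
    target_feats = extractor(target)
  pred_feats = extractor(pred)
  return sum(F.mse_loss(p, t) for p, t in zip(pred_feats, target_feats))

extractor = DilatedFeatures().eval()
print(hrf_perceptual_loss(extractor, torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)))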

Finally, LaMa employs an aggressive training-time mask generation technique to harness the potential of the high receptive fields of the first two components. The procedure generates large, wide masks, forcing the network to make full use of the high receptive field of both the model and the loss function.
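A simplified sketch of such a wide-mask sampler (not the paper’s exact algorithm) could look like this: a few random polylines with a large stroke width cover big, connected parts of the image.

import numpy as np
import cv2

def random_wide_mask(height, width, num_strokes=4, max_width_frac=0.1, rng=None):
  """Draw a few random thick polylines on a blank canvas so that large,
  connected regions end up masked (simplified sketch, not the paper's
  exact sampler)."""
  rng = rng or np.random.default_rng()
  mask = np.zeros((height, width), dtype=np.uint8)
  for _ in range(num_strokes):
    n_points = int(rng.integers(2, 6))
    # Random polyline vertices anywhere in the image
    pts = rng.integers(0, [width, height], size=(n_points, 2))
    # Stroke width is a large fraction of the image side
    thickness = max(1, int(rng.uniform(0.03, max_width_frac) * min(height, width)))
    cv2.polylines(mask, [pts.reshape(-1, 1, 2).astype(np.int32)],
                  isClosed=False, color=1, thickness=thickness)
  return mask  # 1 marks pixels to inpaint

print(f'masked fraction: {random_wide_mask(256, 256).mean():.2f}')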

All of this leads to Large Mask Inpainting (LaMa), a one-stage image inpainting technique. Its core components are (i) a high receptive field architecture, (ii) a high receptive field loss function, and (iii) an aggressive algorithm for generating training masks. The authors rigorously benchmark LaMa against current baselines and assess the impact of each component.

[Image: overview of the LaMa inpainting scheme (source: Suvorov et al.)]

The Large Mask Inpainting (LaMa) scheme is shown in the image above. As can be seen, LaMa is built around a ResNet-like inpainting network that uses the following techniques: the recently proposed Fast Fourier Convolutions (FFCs), a multi-component loss that combines an adversarial loss with a high receptive field perceptual loss, and a procedure for generating large masks at training time.

Implementation of LaMa

In this section, we’ll walk through the official LaMa implementation and see how it effectively removes a user-masked object.

  1. Let’s set up the environment by installing and importing all dependencies.
# Cloning the repo
!git clone https://github.com/saic-mdal/lama.git
 
# Installing the dependencies
!pip install -r lama/requirements.txt --quiet
!pip install wget --quiet
# yadisk-direct (used below to resolve the Yandex Disk link) is provided
# by the wldhx.yadisk-direct package
!pip install wldhx.yadisk-direct --quiet

# Change into the repo directory
%cd /content/lama
 
# Download the model
!curl -L $(yadisk-direct https://disk.yandex.ru/d/ouP6l8VJ0HpMZg) -o big-lama.zip
!unzip big-lama.zip
 
# Importing dependencies
import base64, os
from IPython.display import HTML, Image
from google.colab.output import eval_js
from base64 import b64decode
import matplotlib.pyplot as plt
import numpy as np
import wget
from shutil import copyfile
import shutil
  2. To allow the user to paint over the object they want removed, we need to write a small piece of HTML code; a condensed sketch of such a helper is shown below.
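The article does not reproduce that HTML helper, so here is a condensed sketch of what it can look like (adapted in spirit from the official LaMa Colab notebook, not a verbatim copy). It shows the image as the canvas background, records mouse strokes, and saves them to a mask PNG; the only hard requirement is the draw(image64, filename, w, h, line_width) signature, since the next step calls it.

import base64
from IPython.display import HTML, display
from google.colab.output import eval_js

canvas_html = """
<canvas width=%d height=%d style='background: url(data:image/png;base64,%s)'></canvas>
<button>Finish</button>
<script>
var canvas = document.querySelector('canvas')
var ctx = canvas.getContext('2d')
ctx.lineWidth = %d
ctx.strokeStyle = 'white'
var mouse = {x: 0, y: 0}
canvas.addEventListener('mousemove', function(e) {
  mouse.x = e.pageX - this.offsetLeft
  mouse.y = e.pageY - this.offsetTop
})
var onPaint = function() {
  ctx.lineTo(mouse.x, mouse.y)
  ctx.stroke()
}
canvas.onmousedown = function() {
  ctx.beginPath()
  ctx.moveTo(mouse.x, mouse.y)
  canvas.addEventListener('mousemove', onPaint)
}
canvas.onmouseup = function() {
  canvas.removeEventListener('mousemove', onPaint)
}
var data = new Promise(function(resolve) {
  document.querySelector('button').onclick = function() {
    resolve(canvas.toDataURL('image/png'))
  }
})
</script>
"""

def draw(imgm, filename='drawing.png', w=400, h=200, line_width=1):
  # Render the canvas, wait for the user to press Finish, then decode the
  # strokes (a transparent PNG with white lines) and save them as the mask
  display(HTML(canvas_html % (w, h, imgm, line_width)))
  data = eval_js('data')
  binary = base64.b64decode(data.split(',')[1])
  with open(filename, 'wb') as f:
    f.write(binary)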
  3. Now we will upload the image that contains the object we want to remove. For this, set fname=None and upload a local file (or pass a URL); the mask we draw will hide the object.
if fname is None:
  # No URL given: let the user upload a local file
  from google.colab import files
  files = files.upload()
  fname = list(files.keys())[0]
else:
  # Download the image from the given URL
  fname = wget.download(fname)

# Recreate a clean input directory for the model
shutil.rmtree('./data_for_prediction', ignore_errors=True)
!mkdir data_for_prediction

copyfile(fname, f'./data_for_prediction/{fname}')
os.remove(fname)
fname = f'./data_for_prediction/{fname}'

# Encode the image as base64 so it can be embedded in the drawing canvas
image64 = base64.b64encode(open(fname, 'rb').read())
image64 = image64.decode('utf-8')

print(f'Will use {fname} for inpainting')
img = np.array(plt.imread(f'{fname}')[:,:,:3])

# Open the drawing canvas; the strokes are saved as <name>_mask.png
draw(image64, filename=f"./{fname.split('.')[1]}_mask.png", w=img.shape[1], h=img.shape[0], line_width=0.04*img.shape[1])

Now we’re going to paint over the deer in the image, just as we would in the Paint app.

Below we can see the drawn mask overlaid on the original image; the masked region is what the model will fill in.
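If you want to reproduce such an overlay yourself, here is a short optional snippet of my own (not part of the original notebook). It reuses fname and img from the previous step and assumes the mask PNG stores the strokes in its alpha channel.

import numpy as np
import matplotlib.pyplot as plt

# Overlay the hand-drawn mask on the original image to check coverage.
# The mask file name follows the pattern used in the draw() call above.
mask = plt.imread(f"./{fname.split('.')[1]}_mask.png")
overlay = img.astype(np.float32)
if overlay.max() > 1.0:  # jpg images load as uint8 in [0, 255]
  overlay /= 255.0
overlay[mask[..., -1] > 0] = (1.0, 0.0, 0.0)  # paint masked pixels red
plt.imshow(overlay)
plt.axis('off')
plt.title('mask coverage')
plt.show()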

  4. Now let’s run the inference.
print('Run inpainting')
suffix = fname.split('.')[-1]
if suffix in ('jpeg', 'jpg', 'png'):
  # predict.py inpaints every image/mask pair found in indir and writes
  # the results to outdir
  !PYTHONPATH=. TORCH_HOME=$(pwd) python3 bin/predict.py model.path=$(pwd)/big-lama indir=$(pwd)/data_for_prediction outdir=/content/output dataset.img_suffix=.{suffix} > /dev/null
else:
  print(f'Error: unknown suffix .{suffix} use [.png, .jpeg, .jpg]')

# The result is written as <name>_mask.png inside the output directory
plt.rcParams['figure.dpi'] = 200
plt.imshow(plt.imread(f"/content/output/{fname.split('.')[1].split('/')[2]}_mask.png"))
_ = plt.axis('off')
_ = plt.title('inpainting result')
plt.show()
fname = None
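To compare input and output side by side, here is another short optional snippet of my own (not part of the original notebook). It reuses the img array loaded earlier and assumes /content/output contains only the result written above.

import os
import matplotlib.pyplot as plt

# Show the original image next to the inpainted result
result = os.path.join('/content/output', os.listdir('/content/output')[0])
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(img)
axes[0].set_title('original')
axes[1].imshow(plt.imread(result))
axes[1].set_title('inpainting result')
for ax in axes:
  ax.axis('off')
plt.show()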

And here is the inpainted image:

Truly stunning result.

Last words

In this article, we discussed a simple one-stage solution for inpainting images with large masked regions. We saw how, with the right architecture, the right loss function, and the right mask generation method, such an approach can be very competitive and push the state of the art in image inpainting. The approach produces particularly good results on repetitive structures. I encourage you to experiment with your own photographs, or consult the paper for additional details.

References

Suvorov et al., “Resolution-robust Large Mask Inpainting with Fourier Convolutions” (arXiv:2109.07161)
Official LaMa repository: https://github.com/saic-mdal/lama
