Pivotal Tuning: a framework for editing real images

Interest in facial capture and animation has grown steadily in recent years. Visual effects and video game companies want to deliver new experiences to their audiences, whether through a hyperrealistic human character or a fantastical creature created with visual effects and driven by a human performer. The search for more efficient ways to deliver high-quality animation, together with the growing visual realism of captured performances, has driven a wave of academic and industrial developments. As a result, facial animation research has attracted considerable attention from universities and entertainment companies alike, with advances in computer vision and graphics central to many of these developments.

Notable advances include face tracking and motion capture, face tuning, dense non-rigid mesh registration, measurement of skin rendering attributes through various models, and sensing technology. Photorealistic synthesis and advanced editing of portrait images have many applications, including special effects, extended reality, virtual worlds, and next-generation communication. During content creation, artists typically want control over semantic parameters such as geometric identity, expression, reflectance, and scene lighting. Image synthesis has also made huge strides with the emergence of Generative Adversarial Networks (GANs). Essentially, a GAN learns the distribution of its training images and produces new samples from that same distribution.

StyleGAN is one of the most popular choices for such use cases. Not only does it achieve state-of-the-art visual quality, it also offers strong editing abilities thanks to its disentangled latent space. Many methods exploit this property to perform realistic latent-space edits, such as changing facial orientation, expression, or age, by traversing the learned latent space. In addition, synthetic images, even graphically rendered ones, have proven useful as extra training data in applications such as face reconstruction, gaze estimation, human pose estimation, and shape estimation.


What is Pivotal Tuning for images?

Current state-of-the-art facial editing techniques take advantage of the generative power of a pre-trained StyleGAN. To edit a real image with StyleGAN, you must first invert it, that is, find a latent code in the pre-trained generator's domain that reproduces the image. It turns out that StyleGAN's latent space induces an inherent trade-off between distortion and editability: between preserving the original appearance and convincingly modifying some of its attributes. In practice, this makes identity-preserving edits difficult on faces that lie outside the generator's domain. Pivotal Tuning is one approach to bridging this gap. It modifies the generator only slightly, so that an out-of-domain image is faithfully mapped to an in-domain latent code. The key idea is a brief training process that preserves the editing quality of an in-domain latent region while changing the identity and appearance it portrays.
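
To make the distortion side of this trade-off concrete, here is a minimal sketch of inversion as an optimization problem. It is not StyleGAN or PTI code: the "generator" is a hypothetical linear map (G_WEIGHTS), and generate, invert and squared_error are names made up for illustration. With the generator frozen, gradient descent searches for the latent code whose output is closest to the target; a target the generator cannot produce keeps a nonzero distortion no matter how long we optimize.

```python
# Toy illustration of GAN inversion. The linear "generator" is a hypothetical
# stand-in for StyleGAN; none of these names come from the PTI repository.

def generate(weights, w):
    """Toy generator: each output pixel is a weighted sum of the latent entries."""
    return [sum(a * b for a, b in zip(row, w)) for row in weights]

def squared_error(x, y):
    """Sum of squared differences between two 'images'."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def invert(weights, target, steps=500, lr=0.1):
    """Gradient descent on ||G(w) - target||^2 with the generator frozen."""
    w = [0.0] * len(weights[0])
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(weights, w), target)]
        # d/dw_j of the squared error is 2 * sum_i residual_i * weights[i][j]
        grad = [2 * sum(r * row[j] for r, row in zip(residual, weights))
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# 3-pixel "images" and a 2-dimensional latent space.
G_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
in_domain = generate(G_WEIGHTS, [0.5, -0.25])  # reachable by the generator
out_of_domain = [1.0, 1.0, 0.0]                # no latent code produces this

w_in = invert(G_WEIGHTS, in_domain)
w_out = invert(G_WEIGHTS, out_of_domain)
print("in-domain distortion:    ", squared_error(generate(G_WEIGHTS, w_in), in_domain))
print("out-of-domain distortion:", squared_error(generate(G_WEIGHTS, w_out), out_of_domain))
```

In this toy the in-domain target is recovered almost exactly, while the out-of-domain one is left with a residual error. That leftover error is what tuning the generator itself is meant to remove.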

In Pivotal Tuning Inversion (PTI), an initial inverted latent code serves as a pivot, around which the generator is fine-tuned. At the same time, a regularization term keeps nearby identities intact, so that the effect stays local. This training process ends up changing the appearance features that mostly represent identity, without affecting the model's editing capabilities. Pivotal tuning can also adapt the generator to many faces at once while introducing negligible distortion to the rest of the domain. The authors validate it with inversion and editing metrics and report better performance than state-of-the-art methods. With PTI, edits such as pose, age, or expression can be applied to many images of well-known and recognizable identities. It also holds up in harder cases, such as images with heavy makeup, elaborate hairstyles, or headwear. The method can even reconstruct out-of-domain visual details, such as face paint or hands, much better than state-of-the-art methods.
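
The procedure can be sketched on a toy problem. This is a minimal sketch under loud assumptions, not the PTI implementation: the generator is a hypothetical linear map, the loss is plain squared error rather than a perceptual loss, and the regularizer simply penalizes output drift on one hand-picked anchor code, a simplified stand-in for PTI's locality regularization. The names pivotal_tune, w_anchor and reg_weight are made up for illustration. The point it shows is the role reversal: the latent pivot stays fixed and the generator's weights move.

```python
# Toy pivotal tuning: the latent code stays fixed and the generator changes.
# Linear "generator" and all names are hypothetical, not PTI code.

def generate(weights, w):
    """Toy generator: each output pixel is a weighted sum of the latent entries."""
    return [sum(a * b for a, b in zip(row, w)) for row in weights]

def squared_error(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def pivotal_tune(weights, w_pivot, target, w_anchor, reg_weight, steps=1000, lr=0.4):
    """Fine-tune a copy of `weights` so that G(w_pivot) matches `target`.

    The regularization term penalizes any change in the output for `w_anchor`.
    """
    tuned = [row[:] for row in weights]
    anchor_before = generate(weights, w_anchor)  # output of the frozen original
    for _ in range(steps):
        pivot_res = [g - t for g, t in zip(generate(tuned, w_pivot), target)]
        anchor_res = [g - t for g, t in zip(generate(tuned, w_anchor), anchor_before)]
        for i, row in enumerate(tuned):
            for j in range(len(row)):
                grad = (2 * pivot_res[i] * w_pivot[j]
                        + 2 * reg_weight * anchor_res[i] * w_anchor[j])
                row[j] -= lr * grad
    return tuned

G_ORIGINAL = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [1.0, 1.0, 0.0]   # out-of-domain "image"
w_pivot = [1 / 3, 1 / 3]   # assumed to come from an earlier inversion step
w_anchor = [1.0, 0.0]      # another latent code whose output should not change

plain = pivotal_tune(G_ORIGINAL, w_pivot, target, w_anchor, reg_weight=0.0)
regularized = pivotal_tune(G_ORIGINAL, w_pivot, target, w_anchor, reg_weight=1.0)

for name, tuned in [("no regularization", plain), ("with regularization", regularized)]:
    distortion = squared_error(generate(tuned, w_pivot), target)
    drift = squared_error(generate(tuned, w_anchor), generate(G_ORIGINAL, w_anchor))
    print(f"{name}: pivot distortion={distortion:.6f}, anchor drift={drift:.6f}")
```

Both runs reconstruct the target at the pivot, but only the regularized run leaves the anchor's output where it was. A real generator is nonlinear and far larger, so the trade-off there is not this clean.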

Image source: https://arxiv.org/pdf/2106.05744.pdf

Here we can see an example of real images edited using a multi-ID StyleGAN personalized with PTI. All of the images shown were generated by the same model, fine-tuned on images of political and industry leaders. As can be seen, even after various editing operations are applied to these newly introduced and highly recognizable identities, the model still preserves their facial features well.

Such a technique can be a boon to the image processing and photography industries, since an image can be adjusted as needed without compromising on quality or having to re-shoot it.

First steps with Pivotal Tuning for images

We will now run Pivotal Tuning Inversion on an image. The input image is first inverted and then edited automatically in several ways, such as changing age, adding or removing a smile, and changing the pose. This shows what the PTI model can do and how well the edits work without degrading image quality. The following code is based in part on the code released by the creators of PTI, whose GitHub repository is at https://github.com/danielroich/PTI.

Installation of libraries

The first step is to install the required libraries. We start with two: Weights & Biases (wandb), which helps track machine learning experiments and visualize results, and lpips, which provides a perceptual similarity metric used when tuning the model. You can install them with the following code:

 !pip install wandb
 !pip install lpips 

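Installing Ninja, which StyleGAN2-ADA uses to build its custom PyTorch operations,

 !wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
 !sudo unzip ninja-linux.zip -d /usr/local/bin/
 !sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force

Cloning the PTI repository,

 import os
 os.chdir('/content')
 CODE_DIR = 'PTI'  # assumed value, consistent with the /content/PTI paths used later
 !git clone https://github.com/danielroich/PTI.git $CODE_DIR
 os.chdir(f'./{CODE_DIR}')  # so that the PTI modules (configs, utils, scripts) can be imported

Installation of dependencies and pre-trained models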

Next we import the remaining dependencies. Here we also use PyTorch and bring in PTI's training and image editing functions.

 import os
 import sys
 import pickle
 import numpy as np
 from PIL import Image
 import torch
 from configs import paths_config, hyperparameters, global_config
 from utils.align_data import pre_process_images
 from scripts.run_pti import run_PTI
 from IPython.display import display
 import matplotlib.pyplot as plt
 from scripts.latent_editor_wrapper import LatentEditorWrapper 

Downloading the pre-trained StyleGAN weights,

 # `downloader` is not defined in this excerpt; it is assumed to fetch a file by ID.
 # save_path is assumed from the pretrained_models path used in the configuration.
 save_path = '/content/PTI/pretrained_models'
 os.makedirs(save_path, exist_ok=True)
 ## Download pretrained StyleGAN on FFHQ 1024x1024
 downloader.download_file("125OG7SMkXI-Kf2aqiwLLHyCvSW-gZk3M", os.path.join(save_path, 'ffhq.pkl'))

Downloading the Dlib model used to align and preprocess images before they are passed to PTI,

 ## Download Dlib tool for alignment
 downloader.download_file("1xPmn19T6Bdd-_RfCVlgNBbfYoh1muYxR", os.path.join(save_path, 'align.dat'))

Configuring the model

Now we configure the model using what we imported and downloaded above.

 image_dir_name = 'image'       # assumed value, matching the 'image_original' dir mentioned below
 image_name = 'personal_image'  # matches the file name used in the download step
 # If set to True, download the desired image from the given URL. If set to False, assumes you have
 # uploaded a personal image to the 'image_original' dir
 use_image_online = True
 use_multi_id_training = False
 paths_config.input_data_id = image_dir_name
 paths_config.input_data_path = f'/content/PTI/{image_dir_name}_processed'
 paths_config.stylegan2_ada_ffhq = '/content/PTI/pretrained_models/ffhq.pkl'
 hyperparameters.use_locality_regularization = False

Image preprocessing

We now preprocess the input image: its alignment is corrected before it is fed to the model, which gives better results. Here I am using an image of Serena Williams and aligning the head.


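 ## Download a real face image.
 ## To use your own image, skip this part and upload one or more images to the image_original dir.
 os.makedirs(f'/content/PTI/{image_dir_name}_original', exist_ok=True)
 os.chdir(f'/content/PTI/{image_dir_name}_original')
 if use_image_online:
   !wget -O personal_image.jpg https://static01.nyt.com/images/2019/09/09/opinion/09Hunter1/09Hunter1-superJumbo.jpg ## Photo of Serena Williams
 original_image = Image.open(f'{image_name}.jpg')  # open the original image
 os.chdir('/content/PTI')
 # Align and crop the face. Assumption: pre_process_images takes the directory of raw
 # images and writes aligned copies to paths_config.input_data_path.
 pre_process_images(f'/content/PTI/{image_dir_name}_original')
 aligned_image = Image.open(f'/content/PTI/{image_dir_name}_processed/{image_name}.jpeg')
 aligned_image.resize((512, 512))  # shrink the aligned image for display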

Corrected image:

Invert images using PTI

Training our model

 model_id = run_PTI(use_wandb=False, use_multi_id_training=use_multi_id_training)

Visualizing our results,

 # Place several PIL images side by side in a single image
 def display_alongside_source_image(images):
     res = np.concatenate([np.array(image) for image in images], axis=1)
     return Image.fromarray(res)
 # Load the original generator and the PTI-tuned generator
 def load_generators(model_id, image_name):
   with open(paths_config.stylegan2_ada_ffhq, 'rb') as f:
     old_G = pickle.load(f)['G_ema'].cuda()
   with open(f'{paths_config.checkpoints_dir}/model_{model_id}_{image_name}.pt', 'rb') as f_new:
     new_G = torch.load(f_new).cuda()
   return old_G, new_G
 generator_type = paths_config.multi_id_model_type if use_multi_id_training else image_name
 old_G, new_G = load_generators(model_id, generator_type)
 # Convert each synthesized tensor to an 8-bit RGB image and display it
 def plot_syn_images(syn_images):
   for img in syn_images:
       img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8).detach().cpu().numpy()[0]
       resized_image = Image.fromarray(img, mode="RGB").resize((256, 256))
       display(resized_image)  # without this call nothing is shown
       del img
       del resized_image
 w_path_dir = f'{paths_config.embedding_base_dir}/{paths_config.input_data_id}'
 embedding_dir = f'{w_path_dir}/{paths_config.pti_results_keyword}/{image_name}'
 w_pivot = torch.load(f'{embedding_dir}/0.pt')
 old_image = old_G.synthesis(w_pivot, noise_mode="const", force_fp32 = True)
 new_image = new_G.synthesis(w_pivot, noise_mode="const", force_fp32 = True)
 print('Upper image is the inversion before Pivotal Tuning and the lower image is the product of pivotal tuning')
 plot_syn_images([old_image, new_image])

Comparing the two outputs shows what PTI contributes: from the same pivot code, the tuned generator should reproduce the input face more faithfully than the original generator does.
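
The expression inside plot_syn_images that turns generator output into pixels is easy to misread. As a minimal sketch, assuming the generator emits values roughly in the range [-1, 1], the per-value arithmetic looks like this. The helper to_uint8 is hypothetical and works on a single float; it mirrors the scale, clamp and integer-cast steps of the tensor expression rather than replacing it.

```python
def to_uint8(value):
    """Map a generator output value (nominally in [-1, 1]) to an 8-bit pixel.

    Mirrors `x * 127.5 + 128`, then clamp to [0, 255], then truncation to an integer.
    """
    scaled = value * 127.5 + 128
    clamped = min(255.0, max(0.0, scaled))
    return int(clamped)  # truncates toward zero, like a float-to-uint8 cast

for v in (-1.0, 0.0, 0.5, 1.0, 3.0):
    print(v, "->", to_uint8(v))
```

Values outside the nominal range are clipped instead of wrapping around, which is why the clamp comes before the integer cast.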

Performing different editing techniques on the image

We now apply different edits to the inverted image, such as changing age, adding or removing a smile, and changing the pose.

 latent_editor = LatentEditorWrapper()
 latents_after_edit = latent_editor.get_single_interface_gan_edits(w_pivot, [-2, 2])
 for direction, factor_and_edit in latents_after_edit.items():
   print(f'Showing {direction} change')
   for latent in factor_and_edit.values():
     old_image = old_G.synthesis(latent, noise_mode="const", force_fp32 = True)
     new_image = new_G.synthesis(latent, noise_mode="const", force_fp32 = True)
     plot_syn_images([old_image, new_image]) 
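
The edits above boil down to moving the pivot code along a semantic direction in latent space. Here is a minimal sketch of that idea, assuming each edit is a plain linear shift of the form w + factor * direction. The function linear_edits and the direction vectors are hypothetical and made up for illustration; they are not the directions or the implementation that LatentEditorWrapper uses. The return value mirrors the nested shape that the loop above iterates over: direction name first, then factor.

```python
def linear_edits(w, directions, factors):
    """Return {direction_name: {factor: edited_code}} using w + factor * direction."""
    return {
        name: {f: [wi + f * di for wi, di in zip(w, d)] for f in factors}
        for name, d in directions.items()
    }

w_pivot = [0.5, -1.0, 0.25]       # toy 3-dimensional latent code
directions = {                    # made-up unit-free direction vectors
    "age": [1.0, 0.0, 0.0],
    "smile": [0.0, 0.5, 0.5],
}

edits = linear_edits(w_pivot, directions, [-2, 2])
for name, by_factor in edits.items():
    for factor, latent in by_factor.items():
        print(f"{name} x {factor}: {latent}")
```

A negative factor moves the code one way along the direction and a positive factor the other way, which is why the call above passes the pair [-2, 2].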

End Notes

In this article, we explored what Pivotal Tuning Inversion (PTI) is and walked through a practical implementation of the model. You can now try it on more complex pictures and compare the results. The Colab implementation of the above can be accessed from here.

Happy learning!
