Text-conditioned fashion-image editing with guided GAN inversion
Martin Pernuš (Author), Clinton Fookes (Author), Vitomir Štruc (Author), Simon Dobrišek (Author)

Abstract

Fashion-image editing is a challenging computer-vision task where the goal is to incorporate selected apparel into a given input image. Most existing techniques, known as Virtual Try-On methods, approach this task by first selecting an example image of the desired apparel and then transferring the clothing onto the target person. In this paper, we instead consider editing fashion images with text descriptions. Such an approach has two advantages over example-based virtual try-on techniques: (i) it does not require an image of the target fashion item, and (ii) it allows a wide variety of visual concepts to be expressed through natural language. Existing image-editing methods that work with language inputs either require training sets with rich attribute annotations or can only handle simple text descriptions. We address these constraints by proposing a novel text-conditioned editing model called FICE (Fashion Image CLIP Editing) that can handle a wide variety of text descriptions to guide the editing procedure. Specifically, with FICE, we extend the common GAN-inversion process by including semantic, pose-related, and image-level constraints when generating images. We leverage the CLIP model, with its strong image–text association capabilities, to enforce the text-provided semantics. We furthermore propose a latent-code regularization technique that provides better control over the fidelity of the synthesized images. We validate FICE through rigorous experiments on a combination of VITON images and Fashion-Gen text descriptions, and in comparison with several state-of-the-art text-conditioned image-editing approaches. Experimental results demonstrate that FICE generates highly realistic fashion images and yields better editing results than the competing approaches.
The source code is publicly available from: https://github.com/MartinPernus/FICE.

Keywords

text-conditioning; GAN inversion; image editing; generative artificial intelligence

Data

Language: English
Year of publishing: 2025
Typology: 1.01 - Original Scientific Article
Organization: UL FE - Faculty of Electrical Engineering
UDC: 004.93
COBISS: 207863555
ISSN: 0031-3203

Other data

Secondary language: Slovenian
Secondary keywords: besedilno pogojevanje;invertiranje GAN modelov;urejanje slik;generativna umetna inteligenca;
Type (COBISS): Article
Pages: 18 pp.
Volume: Vol. 158
Issue: [article no.] 111022
Chronology: 2025
DOI: 10.1016/j.patcog.2024.111022
ID: 25161155