Text-conditioned fashion-image editing with guided GAN inversion
Martin Pernuš (Author), Clinton Fookes (Author), Vitomir Štruc (Author), Simon Dobrišek (Author)

Abstract

Fashion-image editing is a challenging computer-vision task where the goal is to incorporate selected apparel into a given input image. Most existing techniques, known as Virtual Try-On methods, approach this task by first selecting an example image of the desired apparel and then transferring the clothing onto the target person. In contrast, in this paper we consider editing fashion images with text descriptions. Such an approach has several advantages over example-based virtual try-on techniques: (i) it does not require an image of the target fashion item, and (ii) it allows a wide variety of visual concepts to be expressed through natural language. Existing image-editing methods that work with language inputs are heavily constrained by their requirement for training sets with rich attribute annotations, or they can handle only simple text descriptions. We address these constraints by proposing a novel text-conditioned editing model called FICE (Fashion Image CLIP Editing) that can handle a wide variety of text descriptions to guide the editing procedure. Specifically, with FICE, we extend the common GAN-inversion process by including semantic, pose-related, and image-level constraints when generating images. We leverage the CLIP model to enforce the text-provided semantics, owing to its strong image–text association capabilities. We furthermore propose a latent-code regularization technique that provides the means to better control the fidelity of the synthesized images. We validate FICE through rigorous experiments on a combination of VITON images and Fashion-Gen text descriptions and in comparison with several state-of-the-art text-conditioned image-editing approaches. Experimental results demonstrate that FICE generates very realistic fashion images and leads to better editing than existing, competing approaches.
The source code is publicly available from: https://github.com/MartinPernus/FICE.
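
The constrained inversion summarized in the abstract can be pictured as optimizing a latent code under a weighted sum of loss terms. The following is only a toy sketch of that general idea, not the FICE implementation: the identity `generate` function, the quadratic loss terms, the weights `lam_sem`, `lam_img`, `lam_reg`, and all other names are hypothetical stand-ins, there is no GAN or CLIP model involved, and the pose-related constraint is omitted for brevity.

```python
# Toy sketch: gradient descent on a latent code under several weighted constraints.
# All names and loss forms are illustrative assumptions, not taken from FICE.

def generate(w):
    """Stand-in 'generator': identity map from latent code to 'image' values."""
    return list(w)


def guided_inversion(text_target, original, keep_mask, w_mean,
                     lam_sem=1.0, lam_img=1.0, lam_reg=0.1,
                     lr=0.1, steps=500):
    """Optimize a latent code so the output follows the text target in
    editable positions, preserves the original elsewhere, and stays
    close to the mean latent code."""
    w = list(w_mean)  # start from the mean latent code
    for _ in range(steps):
        x = generate(w)
        grad = []
        for i in range(len(w)):
            # Semantic term: pull editable positions toward the text-derived target.
            g_sem = 0.0 if keep_mask[i] else 2.0 * (x[i] - text_target[i])
            # Image-level term: preserve the original where keep_mask is set.
            g_img = 2.0 * (x[i] - original[i]) if keep_mask[i] else 0.0
            # Latent-code regularization: stay near the mean latent code.
            g_reg = 2.0 * (w[i] - w_mean[i])
            grad.append(lam_sem * g_sem + lam_img * g_img + lam_reg * g_reg)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    edited = guided_inversion(
        text_target=[1.0, 1.0],
        original=[0.5, 0.5],
        keep_mask=[True, False],  # first position preserved, second edited
        w_mean=[0.0, 0.0],
    )
    print(edited)
```

In this toy setting, raising `lam_reg` pulls the result toward the mean latent code, while lowering it lets the semantic and image-level terms dominate; this only loosely mirrors the trade-off that a latent-code regularizer is meant to control.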

Keywords

text-conditioning; GAN inversion; image editing; generative artificial intelligence

Details

Language: English
Year of publication:
Typology: 1.01 - Original scientific article
Organization: UL FE - Faculty of Electrical Engineering
UDC: 004.93
COBISS: 207863555
ISSN: 0031-3203
Views: 67
Downloads: 61
Rating: 0 (0 votes)

Other data

Secondary language: Slovenian
Secondary keywords: besedilno pogojevanje; invertiranje GAN modelov; urejanje slik; generativna umetna inteligenca
Type of work (COBISS): Journal article
Pages: 18 pp.
Volume: Vol. 158
Issue: [article no.] 111022
Date of publication: 2025
DOI: 10.1016/j.patcog.2024.111022
ID: 25161155