Semantična segmentacija scen z zlivanjem meritev LiDAR in barvnih slik

Diploma thesis

Matej Urbas (Author), Matej Kristan (Mentor)

Abstract

This diploma thesis presents a method for semantic segmentation of driving scenes. Modern methods for semantic segmentation of driving scenes fall into three categories. The first uses only cameras to capture data, the second uses only LiDAR sensors, and the third combines data from both sensors. The thesis focuses on fusing LiDAR measurements and color images with a cross-attention mechanism. We develop SWINCrossFusion, a method based on the SWIN transformer architecture, and introduce a new SWIN transformer block that performs cross-attention to fuse the measurements. The method computes queries from the data of one sensor, and keys and values from the data of the other. This yields efficient and fast fusion of the features of both sensors. We evaluate the method on the SemanticKITTI dataset and compare it with the reference method PMF. At 54 % mIoU, the developed method is two percent worse than the reference method, but it processes input data 40 % faster and uses 1 GB less GPU memory.
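
The cross-attention step described in the abstract (queries from one sensor, keys and values from the other) can be illustrated with a minimal NumPy sketch. This is not the SWINCrossFusion block itself: it is plain single-head scaled dot-product cross-attention without SWIN windowing or shifting. The function name `cross_attention`, the token counts, the feature dimension, the random projection matrices, and the choice of camera tokens as the query side are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(feats_a, feats_b, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper, not the thesis code).

    Queries come from sensor A features; keys and values come from sensor B.
    """
    q = feats_a @ w_q                         # (n_a, dim) queries from sensor A
    k = feats_b @ w_k                         # (n_b, dim) keys from sensor B
    v = feats_b @ w_v                         # (n_b, dim) values from sensor B
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product similarities
    weights = softmax(scores, axis=-1)        # each query attends over all B tokens
    return weights @ v, weights               # fused features and attention map


# Illustrative sizes only (assumed, not from the thesis).
rng = np.random.default_rng(0)
n_a, n_b, dim = 4, 6, 8
camera_tokens = rng.normal(size=(n_a, dim))   # assumed query side
lidar_tokens = rng.normal(size=(n_b, dim))    # assumed key/value side
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

fused, weights = cross_attention(camera_tokens, lidar_tokens, w_q, w_k, w_v)
```

Each row of `weights` sums to one, so every output token in `fused` is a weighted average of the other sensor's value vectors. Swapping the two inputs gives the opposite fusion direction.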

Keywords

transformer;attention;cross-attention;segmentation;LiDAR;images;computer and information science;university study programme;diploma theses;

Data

Language:	Slovenian
Year of publishing:	2023
Typology:	2.11 - Undergraduate Thesis
Organization:	UL FRI - Faculty of Computer and Information Science
Publisher:	[M. Urbas]
UDC:	004.8:004.93(043.2)
COBISS:	139854851

Other data

Secondary language:	English
Secondary title:	Semantic scene segmentation with LIDAR and RGB image fusion
Secondary abstract:	This diploma thesis presents a method for semantic segmentation of driving scenes. Modern methods for semantic segmentation of driving scenes can be divided into three categories. The first category uses only cameras to capture data, the second uses only LiDAR sensors, and the third combines data from both sensors. In this thesis, we focus on the fusion of LiDAR and RGB image data using a cross-attention mechanism. We develop SWINCrossFusion, a method based on the SWIN transformer architecture, and introduce a new SWIN transformer block for sensor fusion using cross-attention. The method computes queries over data from one sensor, and keys and values over data from the other sensor. This results in an efficient and fast fusion of the features of the two sensors. We evaluate the method on the SemanticKITTI dataset and compare it with the reference method PMF. At 54 % mIoU, the developed method is two percent worse than the reference method, but it processes the input data 40 % faster and consumes 1 GB less GPU memory.
Secondary keywords:	transformer;attention;cross-attention;segmentation;LiDAR;images;computer science;computer and information science;diploma;Neural networks (computer science);Computer science;University and higher-education theses;
Type (COBISS):	Bachelor thesis/paper
Study programme:	1000468
Thesis comment:	Univ. of Ljubljana, Faculty of Computer and Information Science
Pages:	58 pp.
ID:	17852921