Mladen Borovič (Author), Eftimije Tomovski (Author), Tom Li Dobnik (Author), Sandi Majninger (Author)

Abstract

Manual assignment of Universal Decimal Classification (UDC) codes is time-consuming and inconsistent as digital library collections expand. This study evaluates 17 large language models (LLMs) as UDC classification recommender systems, including ChatGPT variants (GPT-3.5, GPT-4o, and o1-mini), Claude models (3-Haiku and 3.5-Haiku), Gemini series (1.0-Pro, 1.5-Flash, and 2.0-Flash), and Llama, Gemma, Mixtral, and DeepSeek architectures. Models were evaluated zero-shot on 900 English and Slovenian academic theses manually classified by professional librarians. Classification prompts followed the RISEN framework, and outputs were evaluated using Levenshtein and Jaro–Winkler similarity together with a novel adjusted hierarchical similarity metric that captures UDC's faceted structure. Proprietary systems consistently outperformed open-weight alternatives by 5–10% across metrics. GPT-4o achieved the highest hierarchical alignment, while open-weight models showed progressive improvements but remained behind commercial systems. Performance was comparable between languages, indicating robust multilingual capabilities. The results indicate that LLM-powered recommender systems can enhance library classification workflows. Future research incorporating fine-tuning and retrieval-augmented approaches may enable fully automated, high-precision UDC assignment systems.
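To make the evaluation measures concrete, here is a minimal Python sketch of how a predicted UDC code could be scored against a librarian's reference code. Levenshtein and Jaro–Winkler follow their standard textbook definitions (Winkler scaling factor p = 0.1, common prefix capped at 4 characters). The helpers `main_digits` and `prefix_similarity` are hypothetical illustrations of a hierarchy-aware score: they compare only the leading main-table digits and ignore auxiliaries such as `:`, `+`, `(...)` or `=...`. They are not the authors' adjusted hierarchical similarity metric, which the abstract describes as capturing UDC's full faceted structure.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaro(a: str, b: str) -> float:
    """Standard Jaro similarity."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        lo, hi = max(0, i - window), min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched characters out of order.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    t = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / len(a) + matches / len(b) + (matches - t) / matches) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (up to max_prefix chars)."""
    j = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a, b):
        if ca != cb or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def main_digits(code: str) -> str:
    """Hypothetical helper: leading main-table digits of a UDC code.

    UDC inserts a point after every third digit for readability, so
    '004.8' and '0048' name the same class. Anything after the first
    character that is neither a digit nor a point is dropped.
    """
    m = re.match(r"[\d.]*", code.strip())
    return m.group(0).replace(".", "")


def prefix_similarity(pred: str, ref: str) -> float:
    """Hypothetical hierarchy score: shared leading digits / longer length."""
    p, r = main_digits(pred), main_digits(ref)
    if not p or not r:
        return 0.0
    shared = 0
    for a, b in zip(p, r):
        if a != b:
            break
        shared += 1
    return shared / max(len(p), len(r))
```

For example, a prediction of `004.85` against the reference `004.8` (the UDC code of this record) gets a `prefix_similarity` of 0.8, since four of the five digits agree from the top of the hierarchy down, whereas an unrelated class such as `681.3` scores 0.0. The string measures, by contrast, reward character overlap regardless of where in the code it occurs, which is the gap a hierarchy-aware metric is meant to close.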

Keywords

universal decimal classification; large language models; conversational systems; recommender systems; prompt engineering; zero-shot classification; hierarchical similarity

Details

Language: English
Year of publication: 2025
Typology: 1.01 - Original scientific article
Organization: UM FERI - Faculty of Electrical Engineering and Computer Science
Publisher: MDPI AG
UDC: 004.8
COBISS: 243245571
ISSN: 2076-3417

Other details

Secondary language: Slovenian
Secondary keywords: universal decimal classification; language models; hierarchical similarity; recommender systems
Type of work (COBISS): Journal article
Pages: 23
Volume: 15
Issue: 14, article no. 7666
Publication date: 2025
DOI: 10.3390/app15147666
ID: 26760440