Mladen Borovič (Author), Eftimije Tomovski (Author), Tom Li Dobnik (Author), Sandi Majninger (Author)

Abstract

Manual assignment of Universal Decimal Classification (UDC) codes is time-consuming and inconsistent as digital library collections expand. This study evaluates 17 large language models (LLMs) as UDC classification recommender systems, including ChatGPT variants (GPT-3.5, GPT-4o, and o1-mini), Claude models (3-Haiku and 3.5-Haiku), Gemini series (1.0-Pro, 1.5-Flash, and 2.0-Flash), and Llama, Gemma, Mixtral, and DeepSeek architectures. Models were evaluated zero-shot on 900 English and Slovenian academic theses manually classified by professional librarians. Classification prompts utilized the RISEN framework, with evaluation using Levenshtein and Jaro–Winkler similarity, and a novel adjusted hierarchical similarity metric capturing UDC’s faceted structure. Proprietary systems consistently outperformed open-weight alternatives by 5–10% across metrics. GPT-4o achieved the highest hierarchical alignment, while open-weight models showed progressive improvements but remained behind commercial systems. Performance was comparable between languages, demonstrating robust multilingual capabilities. The results indicate that LLM-powered recommender systems can enhance library classification workflows. Future research incorporating fine-tuning and retrieval-augmented approaches may enable fully automated, high-precision UDC assignment systems.
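The string-based part of the evaluation described above can be illustrated with a minimal sketch that scores a recommended UDC code against a librarian-assigned one. It assumes codes are compared as plain strings. The function names are hypothetical, and `prefix_similarity` is only a simple stand-in for the idea of hierarchical agreement. It is not the adjusted hierarchical similarity metric of the article, which this record does not define. Jaro–Winkler similarity is omitted for brevity.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(predicted: str, reference: str) -> float:
    """Normalise the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(predicted, reference) / longest


def prefix_similarity(predicted: str, reference: str) -> float:
    """Hypothetical hierarchy proxy (an assumption, not the article's metric):
    count of shared leading digits divided by the length of the longer code."""
    p = [c for c in predicted if c.isdigit()]
    r = [c for c in reference if c.isdigit()]
    longest = max(len(p), len(r))
    if longest == 0:
        return 1.0
    shared = 0
    for x, y in zip(p, r):
        if x != y:
            break
        shared += 1
    return shared / longest


if __name__ == "__main__":
    # Illustrative codes only; not taken from the article's data set.
    print(levenshtein_similarity("004.8", "004.9"))
    print(prefix_similarity("004.8", "004.85"))
```

A prefix-based score rewards a recommendation that lands in the correct class but at a different depth, which a pure edit distance does not distinguish from an unrelated code of similar length.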

Keywords

universal decimal classification;large language models;conversational systems;recommender systems;prompt engineering;zero-shot classification;hierarchical similarity;

Data

Language: English
Year of publishing: 2025
Typology: 1.01 - Original Scientific Article
Organization: UM FERI - Faculty of Electrical Engineering and Computer Science
Publisher: MDPI AG
UDC: 004.8
COBISS: 243245571
ISSN: 2076-3417

Other data

Secondary language: Slovenian
Secondary keywords: univerzalna decimalna klasifikacija;jezikovni modeli;hierarhična podobnost;priporočljivi sistemi;
Type (COBISS): Article
Pages: 23 pp.
Volume: Vol. 15
Issue: iss. 14, [article no.] 7666
Chronology: 2025
DOI: 10.3390/app15147666
ID: 26760440