Number of hits: 34
Video and other learning materials
Tags:
humanities;linguistics;lexicography;social sciences;society;computer science
With the rise of digital media in recent decades, many language-related discussions have found a home on various forums and social media such as Facebook, where users can participate in a shared-interest group to discuss language use, problems and resources. The posts in these groups are formulated b ...
Year:
2018
Source:
videolectures.net
Video and other learning materials
Tags:
humanities;linguistics
Automatic collocation extraction is based primarily on computing statistical co-occurrences of words in a text corpus, but not all candidates extracted this way are suitable. To define what counts as a legitimate statistical collocation on the one hand and a lexicographically relevant collocation on the other, we prepared a training set ...
Year:
2018
Source:
videolectures.net
Published scientific conference contribution
Tags:
large language models;responsible artificial intelligence;safety datasets;Slovene
In the paper, we present the initial preparatory phase of the compilation of a Slovene safety dataset containing harmful or offensive prompts and safe responses to them. The dataset will be used to fine-tune Slovene large language models in order to prevent unwanted model behavior and misuse by mali ...
Year:
2024
Source:
Fakulteta za računalništvo in informatiko (UL FRI)
Video and other learning materials
Tags:
humanities;linguistics
Year:
2018
Source:
videolectures.net
Research data
Tags:
computer-mediated communication;tokenisation;word normalisation;tagging;lemmatisation;manual annotation;TEI
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Slovene. As the corpus has bee ...
Year:
2016
Source:
CLARIN.si
Research data
Tags:
computer-mediated communication;tokenisation;word normalisation;tagging;lemmatisation;manual annotation;TEI
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Slovene. As the corpus has bee ...
Year:
2016
Source:
CLARIN.si
Research data
Tags:
computer-mediated communication;tokenisation;word normalisation;manual annotation;TEI
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation and word normalisation of non-standard Slovene. The corpus is also automatically annotated with morphosyntac ...
Year:
2016
Source:
CLARIN.si
Research data
Tags:
computer-mediated communication;tokenisation;word normalisation;manual annotation;TEI
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation and word normalisation of non-standard Slovene. As the corpus has been carefully manually annotated, it is a ...
Year:
2016
Source:
CLARIN.si
Research data
Tags:
spoken corpus;frequency list;n-grams;characters
Frequency lists of character-level n-grams were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain 1-5-gram combinations of characters occurring in the corpus along with th ...
Year:
2019
Source:
CLARIN.si
Research data
Tags:
frequency list;spoken corpus;words;lemmas;normalized forms
Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all words occurring in the corpus along with their absolute and relative frequencies, perce ...
Year:
2019
Source:
CLARIN.si