Number of hits: 3
Research data
Tags: spoken corpus;Torlak dialect;endangered dialects;word accents;lemmatisation;part-of-speech tagging
The Torlak corpus represents a spoken variety of the endangered Torlak dialect from the Timok area in Southeast Serbia. It comprises transcripts of interviews with the local population, collected in the field between 2015 and 2017. Semi-structured interviews were conducted eliciting spontaneous speech i ...
Year: 2020 Source: CLARIN.si
Research data
Tags: computer-mediated communication;tokenisation;word normalisation;tagging;lemmatisation;manual annotation;TEI
ReLDI-NormTag-sr 1.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Serbian. Each tweet is also annotated for its auto ...
Year: 2017 Source: CLARIN.si
Research data
Tags: computer-mediated communication;tokenisation;word normalisation;tagging;lemmatisation;manual annotation;TEI
ReLDI-NormTag-sr 1.1 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Serbian. Each tweet is also annotated for its auto ...
Year: 2017 Source: CLARIN.si