National Open Science Portal (Nacionalni portal odprte znanosti)

Number of results: 3

Spoken Torlak dialect corpus 1.0 (transcription)

Teodora Vuković

Research data

Keywords: spoken corpus; Torlak dialect; endangered dialects; word accents; lemmatisation; part-of-speech tagging

Torlak corpus represents a spoken variety of the endangered Torlak dialect from the Timok area in Southeast Serbia. It comprises transcripts of interviews with the local population, collected in the field between 2015 and 2017. Semi-structured interviews were conducted eliciting spontaneous speech i ...

Year: 2020 Source: CLARIN.si

Serbian Twitter training corpus ReLDI-NormTag-sr 1.0

Nikola Ljubešić, Daša Farkaš, Filip Klubička, Tomaž Erjavec, Maja Miličević, Teodora Vuković

Research data

Keywords: computer-mediated communication; tokenisation; word normalisation; tagging; lemmatisation; manual annotation; TEI

ReLDI-NormTag-sr 1.0 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Serbian. Each tweet is also annotated for its auto ...

Year: 2017 Source: CLARIN.si

Serbian Twitter training corpus ReLDI-NormTag-sr 1.1

Nikola Ljubešić, Daša Farkaš, Filip Klubička, Tomaž Erjavec, Maja Miličević, Teodora Vuković

Research data

Keywords: computer-mediated communication; tokenisation; word normalisation; tagging; lemmatisation; manual annotation; TEI

ReLDI-NormTag-sr 1.1 is a manually annotated corpus of Serbian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging and lemmatisation of non-standard Serbian. Each tweet is also annotated for its auto ...

Year: 2017 Source: CLARIN.si

Access to the knowledge of Slovenian research organisations