diploma thesis
Barbara Suhadolc (Author), Dejan Lavbič (Mentor)

Abstract

Statistical analysis of Slovenian texts

Keywords

texts;Slovenian language;statistics;corpora;poetry;computer science;higher professional study programme;computer and information science;diploma theses;

Details

Language: Slovenian
Year of publication:
Typology: 2.11 - Diploma thesis
Organization: UL FRI - Faculty of Computer and Information Science
Publisher: [B. Suhadolc]
UDC: 004.65:81'322(043.2)
COBISS: 10137940

Other data

Secondary language: English
Secondary title: Statistical analysis of Slovenian texts
Secondary abstract: The Slovenian language has been analysed many times, but most of this research focuses on grammar, such as the use of genders and declension. Statistical comparisons of letters, letter combinations and other frequency results are lacking. A better overview of this information requires a wider analysis, one that does not stop at single-letter frequency but also examines details such as the positions of letters and their combinations. Text is an important part of everyday life, used to describe events and to record speech and thought. It can be seen everywhere, from commercial advertisements to contracts. It is a large part of culture and of the free time people spend reading books, browsing the internet, sending messages and so on. Because of this, the physical form of text is rarely discussed, and attention goes to the meaning of the expression instead. Still, everything is pieced together from groups of letters, their combinations, and combinations of those combinations. For these reasons, the research centred on the Slovenian language, and not on the meaning but on the physical part of the words that make up text. The results were presented in a number of graphs and tables, which were produced by many short programs. The main line of work was set in advance by the large number of existing analyses of other languages. Still, when the analysis reached an interesting point, that point was examined in more depth. Every graph or table was the output of a purpose-written program, which made it easier to verify the correctness of the results. For each result, a decision had to be made whether to show it in comparison with others, display it on its own, or make it part of a visually appealing graph. The text was lemmatised and analysed in that form. The paper closes with ideas for how the results could be used. The corpora and the programs were uploaded to the internet for easier access.
Secondary keywords: texts;Slovenian language;statistics;corpora;poetry;computer science;computer and information science;diploma;
File type: application/pdf
Type of work (COBISS): Diploma thesis
Comment on the material: Univ. of Ljubljana, Fac. of Computer and Information Science
Pages: 60 pp.
ID: 24207382