Effectiveness of proactive password checker based on Markov models

Doctoral dissertation

Viktor Taneski (Author), Boštjan Brumen (Mentor), Ilija Jolevski (Co-mentor)

Abstract

In this doctoral dissertation we focus on the most common method of authentication, the username-password combination. This authentication mechanism is used so frequently because it is simple and cheap to implement. Although passwords are so useful, they have many problems. Almost four decades ago, Morris and Thompson were the first to find that textual passwords are a weak point in the security of information systems. They concluded that users are one of the biggest threats to information system security. We have faced these problems on a daily basis ever since. Users do not follow the practices needed to stay safe and secure, even though they are aware of the security issues. Because security experts have been dealing with this research area for a long time, in this dissertation we wanted to identify the problems related to textual passwords and the solutions that have been proposed. For this purpose, we first performed a systematic literature review on textual passwords and their security. In doing so, we wanted to evaluate the current status of passwords in terms of their strength and the ways they are managed, and whether users are still the "weakest link". We found that one of the less researched solutions is proactive password checking. A proactive password checker could filter out passwords that are easy to guess and let through only passwords that are harder to guess. For proactive password checking to be more effective, the checker must be able to estimate the probability that a user will select a given password. For this purpose, the better password checkers usually use certain tools to calculate password probability, i.e., password strength. To find out which method is most suitable for calculating password strength, we reviewed similar solutions proposed over time.
We found that Markov models are one of the most common methods used for password strength estimation, although they can suffer from problems such as sparsity and over-fitting. By reviewing similar solutions, we found that Markov models are mostly trained on only one dataset. This could limit the performance of the model in terms of correctly identifying weak or very strong passwords. Because training datasets are important in the development of Markov models, they will clearly have some effect on the final assessment of a password's strength. In this dissertation we explore how important this effect is for the final password strength estimate. Mainly, we focus on the effect of different but similar datasets on password strength estimation. For the purposes of our study, we analysed publicly available sets of "common passwords" and processed them according to the frequency distribution of the letters they contain. We built different Markov models based on these datasets and frequency distributions. This helped us determine whether one Markov model is sufficient or whether several models are needed to estimate password strength effectively for a wide range of passwords. The results showed statistical differences between the models.
In more detail, we found that:
- different Markov models (trained on different databases) showed statistically different results when tested on the same dataset,
- more diverse datasets are needed to calculate the strength of as many passwords as possible, since one "universal" model trained on one "universal" dataset is less effective at classifying passwords into different categories (i.e., weak, medium, strong),
- Markov models of 1st and 2nd order, in most cases, do not give statistically different outputs,
- overall, Markov models can be used as a basis for constructing a password checker that uses multiple different and specific Markov models, which could be more effective if we want to cover a wider range of passwords.
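The idea summarised in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the class and function names, the add-one smoothing (one simple way to soften the sparsity problem), the alphabet size, the rule of taking the most pessimistic estimate across models, and the category thresholds are all assumptions made for this example.

```python
import math
from collections import defaultdict

START = "\x02"  # assumed padding symbol marking the start of a password
END = "\x03"    # assumed symbol marking the end of a password


class MarkovModel:
    """Character-level Markov model of a given order with add-one smoothing."""

    def __init__(self, order=1, alphabet_size=96):
        self.order = order
        # Assumption: 95 printable ASCII characters plus the END symbol.
        self.alphabet_size = alphabet_size
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def _transitions(self, password):
        # Yield (context, next character) pairs, padding the start so that
        # the first characters also have a context of length `order`.
        padded = START * self.order + password + END
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def train(self, passwords):
        for pw in passwords:
            for context, ch in self._transitions(pw):
                self.counts[context][ch] += 1
                self.totals[context] += 1

    def log2_prob(self, password):
        # Sum of log2 transition probabilities; unseen transitions get a
        # small non-zero probability thanks to add-one smoothing.
        total = 0.0
        for context, ch in self._transitions(password):
            seen = self.counts.get(context, {}).get(ch, 0)
            denom = self.totals.get(context, 0) + self.alphabet_size
            total += math.log2((seen + 1) / denom)
        return total


def strength_bits(models, password):
    """Assumed combination rule: the lowest surprisal over all models."""
    return min(-m.log2_prob(password) for m in models)


def classify(bits, weak_below=20.0, strong_from=40.0):
    """Map a strength estimate to a category; thresholds are hypothetical."""
    if bits < weak_below:
        return "weak"
    if bits < strong_from:
        return "medium"
    return "strong"


if __name__ == "__main__":
    # Toy training lists, purely illustrative (not the datasets of the study).
    words = MarkovModel(order=1)
    words.train(["password", "password1", "letmein", "qwerty"])
    digits = MarkovModel(order=2)
    digits.train(["123456", "12345678", "111111", "000000"])
    for candidate in ["password", "1234567", "zQ#9vLx!"]:
        bits = strength_bits([words, digits], candidate)
        print(candidate, round(bits, 1), classify(bits))
```

Taking the minimum over several specialised models means a password is rated only as strong as the model that finds it most predictable, which is one possible way to combine models trained on different datasets.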

Keywords

passwords;password analysis;password security;password problems;password strength;systematic literature review;Markov models;doctoral dissertations

Data

Language:	English
Year of publishing:	2019
Typology:	2.08 - Doctoral Dissertation
Organization:	UM FERI - Faculty of Electrical Engineering and Computer Science
Publisher:	[V. Taneski]
UDC:	004.056.523:519.217(043.3)
COBISS:	22934294
Views:	1148
Downloads:	209

Other data

Secondary language:	Slovenian
Secondary title:	Učinkovitost proaktivnega preverjevalnika gesel, ki temelji na markovih modelih
Secondary abstract:	In this doctoral dissertation we focus on the most common method of authentication, namely the combination of a username and a password. This authentication mechanism is used so often because of its simplicity and low implementation cost. Although passwords are so useful, they have many problems. Almost four decades ago, Morris and Thompson were the first to find that textual passwords are a weak point in the security of an information system. They concluded that users are one of the biggest threats to the security of information systems. We have been facing these problems every day since then. Users do not follow the recommended measures needed to stay safe and protected, even though they are aware of the security problems. Because security experts have been working on this research area for a long time, in this dissertation we wanted to identify the problems related to textual passwords and the solutions that have been proposed. To this end, we first carried out a systematic literature review on textual passwords and their security. With it we wanted to assess the current state of passwords in terms of their strength and the ways they are managed, and whether users are still the "weakest link". From the systematic literature review we found that one of the less researched solutions is proactive password checking. A proactive password checker would restrict which passwords a user may choose and use: only passwords that are harder to guess would be approved. For proactive password checking to be more effective, the checker must be able to estimate the probability that a user will choose a given password. For this purpose, the better checkers usually use certain tools to calculate the probability, i.e. the strength, of a password. To find out which method is most suitable for calculating password strength, we reviewed similar solutions proposed over time.
We found that Markov models are one of the most common methods used for estimating password strength, although their use can run into certain problems, such as sparsity and over-fitting. In reviewing similar solutions we found that Markov models are mostly trained on only one dataset. This could limit the performance of the model in terms of correctly identifying weak or very strong passwords. Because training datasets are important in the development of the models, they will clearly have some effect on the final assessment of a password. In our dissertation we investigate how important this effect is for the final password strength. We focus mainly on the effect of different but similar datasets on the strength estimate. For the purposes of our study we analysed publicly available sets of "common passwords" and processed them according to the frequency distribution of the letters they contain. Based on these datasets and the frequency distribution we built different Markov models. This helped us determine whether one Markov model is enough or whether several models are needed to check a variety of passwords effectively. The results showed statistical differences between the models. In more detail, we found that:
- different Markov models (trained on different databases) showed statistically different results when tested on the same set of passwords,
- more diverse databases are needed to calculate the strength of as many passwords as possible, since one "universal" model trained on one "universal" set of passwords is less effective at classifying passwords into different categories (i.e. weak, medium, strong),
- different Markov models of 1st and 2nd order in most cases did not give statistically different results,
- in general, Markov models can be used as a basis for building a more effective password checker that uses several different and specific Markov models, which could be more effective if we want to cover a wider range of passwords.
Secondary keywords:	passwords;password analysis;password security;password problems;password strength;systematic literature review;Markov models
Type (COBISS):	Doctoral dissertation
Thesis comment:	Univ. v Mariboru, Fak. za elektrotehniko, računalništvo in informatiko
Pages:	XXV, 119 pp.
ID:	11157767