
"Konstrukcija i analiza klaster algoritma sa primenom u definisanju bihejvioralnih faktora rizika u populaciji odraslog stanovništva Srbije" / "Construction and analysis of cluster algorithmwith application in defining behavioural riskfactors in Serbian adult population"

Cluster analysis has a long history, and a large number of clustering techniques have been developed in many areas, yet significant challenges remain. This thesis provides an introduction to the nonsmooth optimization approach to clustering, with reference to clustering large datasets. These optimization-based clustering algorithms, however, work much better when a dataset contains only vectors with continuous features. One of the main challenges is clustering large datasets with categorical and mixed (numerical and categorical) data. Clustering a large number of instances (objects) described by a large number of dimensions (variables) can be problematic because of time complexity. One way to address this problem is to reduce the number of instances without loss of information.

The first aim of this thesis was to compare the validity of clustering results obtained on the whole dataset and on simple random samples of categorical and mixed data, for different numbers of clusters and different sample sizes. There were no significant differences (p > 0.05) between the results obtained on samples of size 0.03m, 0.05m, 0.1m and 0.3m (where m is the size of the dataset) and those obtained on the whole dataset.

The second aim was to develop an efficient clustering procedure for large datasets with categorical and mixed (numerical and categorical) variables. The proposed procedure consists of the following steps:
1. clustering on simple random samples of a given cardinality;
2. finding the best cluster solution on a sample, using an appropriate validity measure;
3. using the cluster centers from this sample to cluster the remaining data.

The third aim was to examine the clustering of four lifestyle risk factors in the Serbian adult population and to analyse the socio-demographic characteristics of the resulting clusters. Cluster analysis was carried out on a large representative sample of Serbian adults aged 20 and over. Five clearly separated, homogeneous health-behaviour clusters with specific combinations of risk factors were identified: 'No Risk Behaviours', 'Drinkers with Risk Behaviours', 'Unhealthy Diet with Risk Behaviours', 'Insufficient Physical Activity' and 'Smoking'. Results of multinomial logistic regression indicated that adults who are single, less educated, of low socio-economic status and living in the region of Vojvodina are the most likely to belong to clusters with a high-risk profile, that is, to have multiple behavioural risk factors.
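The three-step procedure above can be sketched in Python. The abstract does not name the base clustering algorithm, the dissimilarity or the validity measure, so this is only a minimal sketch under stated assumptions: k-prototypes-style clustering for mixed data (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches), the average silhouette width as the validity measure, and a single nearest-center pass for step 3. The record layout and all function names (`sample_cluster_assign`, `k_prototypes`, and so on) are hypothetical.

```python
import random

# Hypothetical record layout (assumption): (numeric_values, categorical_values),
# e.g. ((23.5,), ("smoker", "low")).


def dissimilarity(a, b, gamma):
    """Assumed k-prototypes-style dissimilarity: squared Euclidean distance on the
    numeric part plus gamma times the number of categorical mismatches."""
    num = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    cat = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return num + gamma * cat


def prototype(members):
    """Cluster center: mean of each numeric attribute, mode of each categorical one."""
    n = len(members)
    means = tuple(sum(m[0][j] for m in members) / n for j in range(len(members[0][0])))
    modes = []
    for j in range(len(members[0][1])):
        counts = {}
        for m in members:
            counts[m[1][j]] = counts.get(m[1][j], 0) + 1
        modes.append(max(sorted(counts), key=counts.get))  # ties: alphabetically first
    return (means, tuple(modes))


def assign(data, centers, gamma):
    """Index of the nearest center for every record."""
    return [min(range(len(centers)), key=lambda c: dissimilarity(x, centers[c], gamma))
            for x in data]


def k_prototypes(data, k, gamma, rng, max_iter=100):
    """Alternate assignment and center update until the labels stop changing."""
    centers = rng.sample(data, k)
    labels = assign(data, centers, gamma)
    for _ in range(max_iter):
        new_centers = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            new_centers.append(prototype(members) if members else centers[c])  # empty: keep old
        centers = new_centers
        new_labels = assign(data, centers, gamma)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


def silhouette(data, labels, gamma):
    """Average silhouette width under the same dissimilarity (O(n^2), samples only)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, x in enumerate(data):
        sums = {c: [0.0, 0] for c in clusters}  # cluster -> [sum of dissimilarities, count]
        for j, y in enumerate(data):
            if i != j:
                sums[labels[j]][0] += dissimilarity(x, y, gamma)
                sums[labels[j]][1] += 1
        own = sums[labels[i]]
        if own[1] == 0:  # singleton cluster: silhouette defined as 0
            continue
        a = own[0] / own[1]
        b = min(s[0] / s[1] for c, s in sums.items() if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(data)


def sample_cluster_assign(data, k, fraction, n_samples, gamma=1.0, seed=0):
    rng = random.Random(seed)
    size = max(k, round(fraction * len(data)))
    best = None
    for _ in range(n_samples):
        # Step 1: cluster a simple random sample drawn without replacement.
        sample = [data[i] for i in rng.sample(range(len(data)), size)]
        centers, labels = k_prototypes(sample, k, gamma, rng)
        # Step 2: keep the sample solution with the best validity score.
        score = silhouette(sample, labels, gamma)
        if best is None or score > best[0]:
            best = (score, centers)
    # Step 3: the winning centers label every record, including the unsampled remainder.
    return best[1], assign(data, best[1], gamma)


# Illustrative toy data (not from the thesis): 40 records in two obvious groups.
toy = ([((0.1 * i,), ("smoker", "low")) for i in range(20)]
       + [((10.0 + 0.1 * i,), ("non-smoker", "high")) for i in range(20)])
centers, labels = sample_cluster_assign(toy, k=2, fraction=0.3, n_samples=5)
print(centers)
```

The quadratic cost of the silhouette is paid only on samples of size fraction·m, while step 3 is a single pass over the m records per center. The sample sizes studied in the first aim correspond to `fraction` values of 0.03, 0.05, 0.1 and 0.3.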

Identifier: oai:union.ndltd.org:uns.ac.rs/oai:CRISUNS:(BISIS)99629
Date: 23 June 2016
Creators: Dragnić Nataša
Contributors: Lužanin Zorana, Ač-Nikolić Eržebet, Tepavčević Andreja, Krejić Nataša, Kvrgić Svetlana, Grujić Vera
Publisher: University of Novi Sad, Doctoral dissertations in the interdisciplinary or multidisciplinary field
Source Sets: University of Novi Sad
Language: Serbian
Detected Language: Unknown
Type: PhD thesis
