
"Konstrukcija i analiza klaster algoritma sa primenom u definisanju bihejvioralnih faktora rizika u populaciji odraslog stanovništva Srbije" / "Construction and analysis of cluster algorithm with application in defining behavioural risk factors in Serbian adult population"

Dragnić Nataša, 23 June 2016
Cluster analysis has a long history, and although a large number of clustering techniques have been developed in many areas, significant challenges remain. This thesis gives an introduction to the nonsmooth optimization approach to clustering, with reference to clustering large datasets. These optimization-based clustering algorithms, however, work much better when a dataset contains only vectors with continuous features. One of the main challenges in cluster analysis is the clustering of large datasets with categorical and mixed (numerical and categorical) variables. Clustering a large number of instances (objects) with a large number of dimensions (variables) can be problematic because of time complexity. One way to address this problem is to reduce the number of instances without loss of information.

The first aim of the thesis was to compare, in terms of validity, the results of clustering algorithms on the whole dataset and on simple random samples with categorical and mixed data, for different numbers of clusters and different sample sizes. No significant differences (p > 0.05) were found between the results obtained on samples of size 0.03m, 0.05m, 0.1m and 0.3m (where m is the size of the dataset) and on the whole dataset.

The second aim of the thesis was to develop an efficient clustering procedure for large datasets with categorical and mixed (numerical and categorical) variables. The proposed procedure consists of the following steps: 1. clustering on simple random samples of a given cardinality; 2. finding the best cluster solution on a sample, using an appropriate validity measure; 3. using the cluster centers from this sample to cluster the remaining data.

The third aim of the thesis was to examine the clustering of four lifestyle risk factors, and its variation across socio-demographic groups, in the Serbian adult population. Cluster analysis was carried out on a large representative sample of Serbian adults aged 20 and over. Five clearly separated, homogeneous health behaviour clusters with specific combinations of risk factors were identified: 'No Risk Behaviours', 'Drinkers with Risk Behaviours', 'Unhealthy Diet with Risk Behaviours', 'Insufficient Physical Activity' and 'Smoking'. Results of the multinomial logistic regression model indicated that adults who are not married, are less educated, have low socio-economic status and live in the region of Vojvodina are more likely to belong to clusters with a high-risk profile, that is, to have multiple behavioural risk factors.
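
The three-step procedure described in the abstract (cluster a simple random sample, pick the best solution on the sample, then use its centers to cluster the rest) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: it assumes purely categorical rows stored as tuples, substitutes a basic k-modes routine with simple matching dissimilarity, and uses the within-cluster cost over a few restarts as a stand-in for the unspecified validity measure. The names `k_modes`, `cluster_via_sample`, `fraction` and `restarts` are hypothetical.

```python
import random
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(row, modes):
    """Index of the cluster center (mode) closest to row."""
    return min(range(len(modes)), key=lambda j: mismatch(row, modes[j]))


def k_modes(rows, k, rng, iters=20):
    """Basic k-modes on categorical tuples; returns (modes, total cost).

    Assumes rows contains at least k distinct tuples.
    """
    distinct = list(dict.fromkeys(rows))  # de-duplicate, keep order
    modes = rng.sample(distinct, k)       # start from k distinct rows
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for row in rows:
            groups[nearest(row, modes)].append(row)
        new_modes = []
        for j, group in enumerate(groups):
            if group:
                # Column-wise most frequent category becomes the new center.
                new_modes.append(tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*group)
                ))
            else:
                new_modes.append(modes[j])  # keep the center of an empty cluster
        if new_modes == modes:
            break
        modes = new_modes
    cost = sum(mismatch(row, modes[nearest(row, modes)]) for row in rows)
    return modes, cost


def cluster_via_sample(data, k, fraction=0.1, restarts=5, seed=0):
    """Sketch of the sample-then-assign procedure (assumed details)."""
    rng = random.Random(seed)
    n = min(len(data), max(k, int(fraction * len(data))))
    # Step 1: cluster a simple random sample of the given cardinality.
    sample = rng.sample(data, n)
    runs = [k_modes(sample, k, rng) for _ in range(restarts)]
    # Step 2: keep the best solution on the sample (lowest cost is a
    # stand-in for the validity measure, which the abstract does not name).
    best_modes, _ = min(runs, key=lambda run: run[1])
    # Step 3: assign every row of the full dataset to its nearest center.
    labels = [nearest(row, best_modes) for row in data]
    return best_modes, labels
```

For example, `cluster_via_sample(rows, k=5, fraction=0.05)` would cluster a 5% sample (one of the sample sizes mentioned in the abstract) and then label all rows. Only the sample is clustered iteratively, so the expensive part of the work scales with the sample size rather than with the full dataset.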
