211 |
Data Mining Methods For Clustering Power Quality Data Collected Via Monitoring Systems Installed On The Electricity Network
Guder, Mennan, 01 September 2009
Increasing power demand and the widespread use of high-technology power electronic devices create a need for power quality monitoring. The quality of electric power in both transmission and distribution systems should be analyzed in order to sustain power system reliability and continuity. This analysis is made possible by examining the data collected by power quality monitoring systems. To define the characteristics of the power system and reveal the relations between power quality events, a huge amount of data must be processed. In this thesis, clustering methods for power quality events are developed using exclusive and overlapping clustering models. The methods are designed to cluster the huge amount of power quality data obtained from online monitoring of the Turkish Electricity Transmission System. The main issues considered in the design of the clustering methods are the amount of data, the efficiency of the designed algorithm, and the queries that should be supplied to the domain experts. This research work is fully supported by the Public Research Grant Committee (KAMAG) of TUBITAK within the scope of the National Power Quality Project (105G129).
|
212 |
An Evaluation Of Clustering And Districting Models For Household Socio-economic Indicators In Address-based Population Register System
Ozcan Yavuzoglu, Seyma, 01 December 2009
Census operations are very important events in the history of a nation. These operations cover every bit of land and property of the country and its citizens. Census data, also known as demographic data, provides valuable information to various users, particularly planners who need to know the trends in key areas. Since 2006, Turkey has aimed to produce this census data not as "de-facto" (static) but as "de-jure" (real-time) through the new Address Based Register Information System (ABPRS). With this register-based census, personal information is matched with address information, so censuses have gained a spatial dimension. Data obtained from such a system can be a valuable input for creating "small statistical areas (SSAs)", which can be composed of street blocks or any other small geographical unit to which social data can be referenced, and for establishing a complete census geography for Turkey. This matters because statistics on large administrative units are useful for policy design only at an extremely abstracted level of analysis, which is far from the "real" problems experienced by individuals.
This thesis employs spatial clustering and districting methodologies to automatically produce SSAs built upon ABPRS data geo-referenced with the aid of geographical information systems (GIS), and thus to help improve the census geography concept, which in Turkey is limited to higher-level administrative boundaries. To choose a strategy for realizing this, small area identification criteria and methodologies are surveyed by looking into the United Nations' recommendations and by taking some national and international applications into consideration. In addition, spatial clustering methods are examined for obtaining SSAs that fulfill these criteria in an automated fashion. Three methods are deemed suitable: simulated annealing on k-means clustering, k-means clustering alone, and simulated annealing on k-means clustering of Self-Organizing Map (SOM) unified distances. These methods are then implemented on parcel and block datasets, containing either raw data or socio-economic status (SES) indices, in nine neighborhoods of Keçiören, whose graphical and non-graphical raw data are manipulated, geo-referenced and combined in common basemaps. Finally, simulated annealing refinement on k-means clustering of SOM u-distances is selected as the optimum method for constructing SSAs for all datasets, after a comparative quality assessment showing how closely each method obeyed the basic criteria of small area identification while creating SSA layers.
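The idea of refining a clustering with simulated annealing can be sketched as follows. This is a minimal illustration only, not the thesis's method: it assumes plain 2-D points, uses within-cluster squared error as the cost, and ignores the contiguity and size criteria that real small-area districting would need. The names `sse` and `anneal`, and the cooling parameters, are hypothetical.

```python
import math
import random


def sse(points, labels, k):
    """Within-cluster sum of squared distances to each cluster mean."""
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        center = [sum(c) / len(members) for c in zip(*members)]
        total += sum(sum((a - b) ** 2 for a, b in zip(p, center)) for p in members)
    return total


def anneal(points, labels, k, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Refine an initial labeling (e.g. from k-means) by simulated annealing.

    Assumption: the move set is "reassign one random point to a random
    cluster"; a districting application would restrict moves to neighbors.
    """
    rng = random.Random(seed)
    current = list(labels)
    cur_cost = sse(points, current, k)
    best, best_cost = list(current), cur_cost
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(points))
        old = current[i]
        current[i] = rng.randrange(k)
        new_cost = sse(points, current, k)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new_cost <= cur_cost or rng.random() < math.exp((cur_cost - new_cost) / t):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        else:
            current[i] = old  # reject the move
        t *= cooling  # geometric cooling schedule
    return best, best_cost
```

Because the best labeling seen so far is tracked separately, the returned cost is never worse than the cost of the starting labeling, even though individual moves may temporarily increase it.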
|
213 |
Predicting The Effect Of Hydrophobicity Surface On Binding Affinity Of PCP-like Compounds Using Machine Learning Methods
Yoldas, Mine, 01 April 2011
This study aims to predict the binding affinity of PCP-like compounds by means of molecular hydrophobicity. Molecular hydrophobicity is an important property that affects the binding affinity of molecules. The molecular hydrophobicity values are obtained on a three-dimensional coordinate system. Our aim is to reduce the number of points on the hydrophobicity surface of the molecules. This reduction is modeled using self-organizing maps (SOM) and k-means clustering. The feature sets obtained from SOM and k-means clustering are used individually to predict the binding affinity of molecules. Support vector regression and partial least squares regression are used for prediction.
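As a rough illustration of how k-means can reduce a large set of surface points to a handful of representatives, the sketch below runs Lloyd's algorithm on 3-D tuples and returns the centroids. It is a generic sketch under stated assumptions, not the study's pipeline: the function name `kmeans`, the random initialization, and the choice of `k` are all hypothetical, and the SOM and regression stages are not shown.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical stand-in for surface points: 200 random 3-D coordinates.
_rng = random.Random(42)
surface = [(_rng.random(), _rng.random(), _rng.random()) for _ in range(200)]
reduced, _ = kmeans(surface, k=8)
print(len(surface), "points reduced to", len(reduced), "representatives")
```

The `k` centroids could then serve as a fixed-length feature set for a downstream regressor.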
|
214 |
Fostering success in reading: a survey of teaching methods and collaboration practices of high performing elementary schools in Texas
Evans Jr., Richard Austin, 16 August 2006
This study examined reading programs in 68 Texas elementary schools that were identified as successful by their scores on TAAS assessment results in the 1999-2000 school year. These schools' student populations had a high proportion of culturally diverse and low-SES students. The purposes of this study were: (1) to determine if and how teaching methods and collaboration (intervention/support teams) were used by effective schools to foster reading success in all students; (2) to identify cohesive patterns (clusters) or models in schools' use of collaboration and teaching methods; (3) to examine these clusters of similar schools and see if the patterns differed based on the school/community demography (urban, suburban, or rural). The study was conducted in 68 schools in 33 school districts that represented various demographic settings from 12 different Education Service Centers across Texas. From the original 332 variables, 26 variables were selected that were of medium frequency and strongly correlated with high TAAS scores over a 4-year period. These 26 variables were used to examine the 68 high-performing Texas elementary schools for clusters. K-means analysis and hierarchical cluster analysis (HCA) were both applied to the 26 response variables, using them as complementary techniques to arrive at a five-cluster solution. Results from correlations of individual characteristics and from identifying school clusters suggested that school community type could be moderately predictive of student performance on the TAAS/TAKS over time.
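One common way to use hierarchical clustering and k-means as complementary techniques is to let the hierarchical result supply starting centroids that k-means then refines. The sketch below shows that pairing; it is an assumption about how such a combination might look, not a description of what this study did. The function names are hypothetical, and squared Euclidean distance is used as the dissimilarity.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(rows):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def hca_average_linkage(rows, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise dissimilarity between members.
                d = sum(dist2(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters


def refine_with_kmeans(rows, clusters, iters=20):
    """Use HCA cluster means as starting centroids, then run Lloyd iterations."""
    centroids = [mean([rows[i] for i in c]) for c in clusters]
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda j: dist2(r, centroids[j])) for r in rows]
        for j in range(len(centroids)):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels
```

With 68 schools described by 26 variables, each row would be a 26-element tuple and `k` would be 5; the pairwise loop is quadratic per merge, which is acceptable at that size.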
|
215 |
Techniques and mechanisms for clustering users and texts for personalized content access on the World Wide Web
Τσόγκας, Βασίλειος, 16 April 2015
With the ever-increasing volume of information sources on the internet, in both size and indexed content, methodologies are needed that help users get the information they need at exactly the moment they need it. Delivering content personalized to user needs is now a necessity, given the combinatorial explosion of information visible in every corner of the World Wide Web. Swift and effective solutions are needed to deal with this information overload. Such solutions are achievable only through analysis of the problems involved and the application of modern mathematical and computational methods.
This Ph.D. dissertation aims at the design, development and evaluation of mechanisms and novel algorithms from the areas of information retrieval, natural language processing and machine learning. These mechanisms provide a high level of filtering for information that originates from internet sources and is targeted at end users. More precisely, for the various stages of information processing, techniques are proposed and developed that gather, index, filter and return textual content suited to user tastes. These techniques and mechanisms aim to go beyond today's usual information delivery norms.
The core of this Ph.D. dissertation is the development of a clustering mechanism that operates on both news articles and users of the web. In this context, several classical clustering algorithms were studied and evaluated on news articles, allowing us to estimate how effective each one is within this domain. This left us with a clear choice as to which algorithm should be extended for our work.
In a second phase, we formulated a clustering algorithm that operates on news articles and user profiles and makes use of the external knowledge base WordNet. This algorithm is adapted to the diversity and quick churn of news articles originating from the web.
Another central goal of this Ph.D. dissertation is modeling the browsing behavior of system users within the context of our recommendation system, together with the automatic evaluation of these behaviors, with the desired outcome of predicting the future preferences of users. User modeling applies directly to the personalization capabilities that we can offer, as far as predicting user preferences is concerned. As a result, a personalization algorithm was formulated that takes into consideration a plethora of parameters that indirectly reveal user preferences.
|
216 |
Image analysis for content-based image retrieval in the transform domain
Bai, Cong, 21 February 2013
This thesis addresses content-based image retrieval. Retrieval operates on images represented in a transform domain, where the feature vectors (indices) are built directly. Two types of transforms are explored: the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT), used in the JPEG and JPEG2000 compression standards. Based on the properties of the transform coefficients, several feature vectors are proposed. These vectors are applied to face recognition and color texture recognition. In the DCT domain, four types of feature vectors, called "patterns", are proposed: Zigzag-Pattern, Sum-Pattern, Texture-Pattern and Color-Pattern. The first type improves an existing approach. The last three exploit the compaction capability of DCT coefficients, given that certain coefficients carry directional information. The histogram of these vectors is used as the image descriptor. To reduce the dimension of the descriptor when building the histogram, either an adjacency is defined over close patterns, which are then merged, or the most frequent patterns are selected. These approaches are evaluated on commonly used face and texture image databases. In the DWT domain, two types of approaches are proposed. In the first, a color vector and a multiresolution texture vector are constructed. This approach falls within the framework of separate characterization of color and texture. The second approach falls within the context of joint characterization of color and texture. As before, the histogram of the vectors is chosen as the descriptor, and the K-means algorithm is used to build the histogram by two methods.
The first is the classical procedure of grouping the vectors by partitioning. The second is a histogram based on a sparse representation, in which the value of each bin represents the total weight of the basis vectors of the representation.
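The classical partition-based histogram mentioned above can be sketched as follows: each feature vector is assigned to its nearest codeword, and the normalized counts form the image descriptor. This is an illustrative sketch only; it assumes the codebook has already been learned (for example, as K-means centroids), and the function names are hypothetical. The sparse-representation variant is not shown.

```python
def nearest(vec, codebook):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(vec, codebook[j])),
    )


def histogram_descriptor(vectors, codebook):
    """Normalized histogram of nearest-codeword assignments for one image.

    Assumption: `codebook` is a list of centroids obtained beforehand,
    e.g. by running K-means over feature vectors from a training set.
    """
    counts = [0] * len(codebook)
    for v in vectors:
        counts[nearest(v, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)  # no vectors: return an all-zero descriptor
    return [c / total for c in counts]
```

Two images can then be compared by any histogram distance; which distance the thesis uses is not stated in the abstract.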
|
217 |
IT Knowledge Portal Statistics Module Based on Clustering
Ruzgys, Martynas, 16 August 2007
This work presents the use of data mining and clustering in widely used systems, together with a statistics module prototype created for an IT knowledge portal to store, analyze and view data. The proposed statistics module periodically performs data transformations in its data store. Statistical data accessible in the portal can be clustered. When the clustered information is presented graphically, the data can be interpreted and trends in activity can be observed. One of the best known data clustering methods, the parallel k-means method, is adapted to separate groups of similar objects.
|
218 |
The thalamus in Parkinson's disease: a multimodal investigation of thalamic involvement in cognitive impairment
Borlase, Nadia Miree, January 2013
Parkinson’s disease patients present with the highest risk of dementia development. The thalamus, integral to several functions and behaviours, is involved in the pathophysiology of Parkinson’s disease. The aim of this thesis was to determine whether anatomical abnormalities in the thalamus are associated with the development of dementia in Parkinson’s disease.
We examined the thalamus using macro and microstructural techniques and the white matter pathways that connect the thalamus with areas of the surrounding cortex using diffusion tensor imaging (DTI) based tractography. T1-weighted magnetic resonance and DT images were collected in 56 Parkinson’s disease patients with no cognitive impairment, 19 patients with mild cognitive impairment, 17 patients with dementia and 25 healthy individuals who acted as control subjects. An established automated segmentation procedure (FIRST FSL) was used to delineate the thalamus and a modified k-means clustering algorithm applied to segment the thalamus into clusters assumed to represent thalamic nuclei. Fibre tracts were determined using DTI probabilistic tracking methods available in FIRST. Microstructural integrity was quantified by fractional anisotropy and mean diffusivity (MD) DTI measures.
Results show that microstructural measures of thalamic integrity are more sensitive to cognitive dysfunction in PD than macrostructural measures. For the first time we showed a progressive worsening of cellular integrity (MD) in the groups who had greater levels of cognitive dysfunction. Thalamic degeneration was regionally specific and most advanced in the limbic thalamic nuclei which influenced executive function and attention, areas of cognition that are known to be affected in the earliest stages of PD. The integrity of the fibre tracts corresponding to these thalamic regions was also compromised. Degeneration of fibre tracts was most evident in the dementia group, indicating that they may be more protected against Lewy pathology than the nuclei of the thalamus.
Our findings confirm previous histological, animal and lesion studies and provide a reliable estimate of cortical degeneration in PD that can be applied non-invasively and in vivo. A longitudinal study is needed to monitor the progression of cognitive decline in PD but we have provided the basis for further investigation into the predictive validity of thalamic degeneration for cognitive dysfunction. In the future, the microstructural changes of the thalamus could be used as biomarkers for the identification of individuals with a higher risk for dementia development and for the longitudinal monitoring of any interventions into cognitive decline.
|
219 |
Design of robust blind detector with application to watermarking
Anamalu, Ernest Sopuru, 14 February 2014
One of the difficult issues in detection theory is designing a robust detector that takes into account the actual distribution of the original data. The most commonly used statistical model for blind detection is the Gaussian distribution. Specifically, linear correlation is an optimal detection method in the presence of Gaussian-distributed features, but it has been found to be a sub-optimal detection metric when the density deviates completely from Gaussian distributions. Hence, we formulate a detection algorithm that enhances detection probability by exploiting the true characteristics of the original data. To understand the underlying distribution function of the data, we employ estimation techniques: a parametric model, called the approximated density ratio logistic regression model, and semiparametric estimation. The semiparametric model has the advantage of yielding density ratios as well as individual densities. Both methods are applicable to signals such as watermarks embedded in the spatial domain, and both outperform conventional linear correlation on non-Gaussian distributed data.
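For context, the conventional linear correlation detector that serves as the baseline here can be sketched as below. This is a generic textbook-style illustration, not the thesis's proposed detector: it assumes additive embedding of a ±1 pattern into a one-dimensional host signal, and the function names, embedding strength and threshold are hypothetical.

```python
import random


def embed(host, pattern, alpha):
    """Additive embedding: marked[i] = host[i] + alpha * pattern[i]."""
    return [h + alpha * w for h, w in zip(host, pattern)]


def linear_correlation(signal, pattern):
    """Mean of the element-wise product of the signal and the watermark pattern."""
    return sum(s * w for s, w in zip(signal, pattern)) / len(pattern)


def detect(signal, pattern, threshold):
    """Blind detection: the original host is not needed, only the pattern."""
    return linear_correlation(signal, pattern) > threshold


# Hypothetical demo with a Gaussian host, the case where this detector is well suited.
rng = random.Random(0)
n = 10000
host = [rng.gauss(0.0, 1.0) for _ in range(n)]
pattern = [rng.choice((-1.0, 1.0)) for _ in range(n)]
marked = embed(host, pattern, alpha=0.5)
print("marked:", detect(marked, pattern, 0.25), "unmarked:", detect(host, pattern, 0.25))
```

For a marked signal the correlation concentrates around the embedding strength, and for an unmarked one around zero, so a threshold between the two separates them. When the host is far from Gaussian, this statistic is no longer the best choice, which is the gap the abstract's density-ratio approach targets.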
|