31 |
An Unsupervised Machine-Learning Framework for Behavioral Classification from Animal-Borne Accelerometers
Dentinger, Jane Elizabeth, 03 May 2019
Studies of animal spatial distributions typically use prior knowledge of animal habitat requirements and behavioral ecology to deduce the most likely explanations of observed habitat use. Animal-borne accelerometers can be used to distinguish behaviors, which allows us to incorporate in situ behavior into our understanding of spatial distributions. Past research has focused on supervised machine learning, which requires a priori specification of behaviors to identify signals, whereas unsupervised approaches allow the model to identify as many signal types as the data permit. The following framework couples direct observation to behavioral clusters identified by unsupervised machine learning on a large accelerometry dataset. A behavioral profile was constructed to describe the proportion of behaviors observed per cluster, and the framework was applied to an acceleration dataset collected from wild pigs (Sus scrofa). Although most clusters represented combinations of behaviors, a leave-p-out validation procedure indicated that this classification system accurately predicted new data.
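The behavioral profile mentioned in the abstract (the proportion of observed behaviors within each cluster) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `behavioral_profile`, the cluster ids, and the behavior labels are all hypothetical.

```python
from collections import Counter, defaultdict


def behavioral_profile(cluster_labels, observed_behaviors):
    """Return {cluster: {behavior: proportion}} for paired observations."""
    if len(cluster_labels) != len(observed_behaviors):
        raise ValueError("inputs must have the same length")
    counts = defaultdict(Counter)
    for cluster, behavior in zip(cluster_labels, observed_behaviors):
        counts[cluster][behavior] += 1
    profile = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        profile[cluster] = {b: n / total for b, n in counter.items()}
    return profile


# Hypothetical data: cluster ids from an unsupervised model, paired with
# behaviors recorded by direct observation of the same time windows.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
behaviors = ["forage", "forage", "forage", "walk",
             "rest", "rest", "walk", "walk"]
profile = behavioral_profile(clusters, behaviors)
```

A cluster whose profile is dominated by one behavior can be read as that behavior; a mixed profile corresponds to the "combinations of behaviors" the abstract reports for most clusters.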
|
32 |
Computational Intelligence and Data Mining Techniques Using the Fire Data Set
Storer, Jeremy J., 04 May 2016
No description available.
|
33 |
Methods of Determining the Number of Clusters in a Data Set and a New Clustering Criterion
Yan, Mingjin, 29 December 2005
In cluster analysis, a fundamental problem is to determine the best estimate of the number of clusters, which has a decisive effect on the clustering results. However, a limitation in current applications is that no convincingly acceptable solution to the best-number-of-clusters problem is available, owing to the high complexity of real data sets. In this dissertation, we tackle the problem of estimating the number of clusters, with particular attention to very complicated data that may contain multiple types of cluster structure. Two new methods of choosing the number of clusters are proposed, which have been shown empirically to be highly effective given clear and distinct cluster structure in a data set. In addition, we propose a sequential clustering approach, called multi-layer clustering, that combines these two methods. Multi-layer clustering not only functions as an efficient method of estimating the number of clusters but also, by superimposing a sequential idea, improves the flexibility and effectiveness of any existing one-layer clustering method. Empirical studies have shown that multi-layer clustering is more efficient than one-layer clustering approaches, especially in detecting clusters in complicated data sets. The multi-layer clustering approach has been successfully applied to clustering the WTCHP microarray data, and the results can be interpreted very well on the basis of known biology.
Choosing an appropriate clustering method is another critical step in clustering. K-means clustering is one of the most popular clustering techniques used in practice. However, the k-means method tends to generate clusters containing a nearly equal number of objects, which is referred to as the "equal-size" problem. We propose a clustering method which competes with the k-means method. Our newly defined method is aimed at overcoming the so-called "equal-size" problem associated with the k-means method, while maintaining its advantage of computational simplicity. Advantages of the proposed method over k-means clustering have been demonstrated empirically using simulated data with low dimensionality. / Ph. D.
|
34 |
Analysis of the visitors' profile of the islands Ilha do Superagüi e Ilha do Mel - Marketing as an instrument for sustainable tourism
Niefer, Inge Andrea, 05 1900
The objectives of this work were to analyze and compare the visitors to the immediate surroundings of two protected areas in the State of Paraná: the National Park of Superagüi and the Ecological Station “Ilha do Mel”, both islands. A questionnaire with 37 qualitative and quantitative questions was applied. It consisted of five parts: sociodemographic characteristics; trip characteristics; environmental awareness and attitudes; favorite activities and motivation; and perception of the destination. The data were collected through personal interviews that took 20 to 30 minutes on average. In Superagüi, 327 questionnaires were applied between December 1998 and May 2000; on Ilha do Mel, 392 were applied between April 2000 and June 2000. There are significant differences between the visitors of the two islands in practically all of the characteristics studied. The public of Ilha do Mel is significantly younger, which influences several other variables, such as marital status, education level, and employment situation. Of the visitors to Ilha do Mel, 84% heard about it through friends or family, compared with only 67% in Superagüi. Ilha do Mel, having been a tourist destination for longer and being easy to reach, receives a larger number of repeat visitors. Tourism was the purpose of the trip for a larger share of the visitors to Ilha do Mel; in contrast, significantly more researchers were observed in Superagüi. Visitors’ environmental awareness can be considered high on both islands, but that of the visitors to Ilha do Mel was lower than in Superagüi. Fewer respondents on Ilha do Mel knew that the place they visited is a protected area. The entrance fee they were willing to pay was significantly lower, as was their willingness to follow rules in favor of nature conservation. Interest in social and environmental subjects was significantly higher among the visitors to Superagüi.
They were also willing to pay more for the use of environmentally sound techniques than the respondents on Ilha do Mel. Interest in practicing the 25 tourist activities differed significantly between the two places. The comparison of the visitors’ attitudes towards problems showed that some of the interviewees in Superagüi are much less bothered by infrastructure problems that reduce comfort during the stay, a fact confirmed by the lower importance they give to items of tourist infrastructure. Among the visitors to Superagüi there was a marked concern with improving the host community's quality of life, which was not observed on Ilha do Mel. In terms of motivation, the visitors to Superagüi showed greater appreciation of natural and cultural values and of escaping the stress of the city than those of Ilha do Mel. A benefit segmentation was also carried out, showing that it is possible to identify distinct segments among the visitors to the same place. In Superagüi the following clusters were identified: 1) the indifferent ones; 2) the non-sociable adventurers; 3) the sociable adventurers; 4) the enthusiasts; and 5) the non-sociable naturalists. On Ilha do Mel five different clusters were identified: 1) the sociable adventurers; 2) the pure naturalists; 3) the enthusiasts; 4) the indifferent ones; and 5) the cultural naturalists.
|
35 |
設計與實作一個針對遊戲論壇的中文文章整合系統 / Design and Implementation of a Chinese Document Integration System for Game Forums
黃重鈞, Huang, Chung Chun, Unknown Date
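Today the internet is well developed and convenient, and people exchange information in more diverse ways. News is no longer the only way to obtain information; through forums, anyone can share information quickly and with a low barrier to entry. This same property has caused an explosion in the volume of information, so that even with search engines users must still spend considerable effort collecting, filtering, and processing material on a specific topic. Taking the League of Legends discussion board on Bahamut (巴哈姆特電玩資訊站) as an example, this study aims to give users a comprehensive yet concise description of a game character, so that they gain at least a general understanding of that character.
Drawing on web-forum mining and news document summarization systems, this study designs a summarization system suited to multiple forum articles. We first need to understand and analyze the characteristics of forums, experiment with how to mine latent information from them, and identify the difficulties that forum mining encounters. Based on that analysis, the system architecture is divided into roughly three stages: 1. Data preprocessing: forum articles differ from news articles, and it is hard to use nouns and verbs directly as keywords, so TF-IDF is used to select representative terms from the forum articles as the dimensions of the sentence vector space. 2. Clustering: K-Means clustering is used to determine which sentences are similar and to place similar sentences in the same cluster. 3. Sentence selection: based on the clustering result, the sentences that best represent the document set are selected according to their keyword content and TF-IDF.
We found that useful related information can be observed during the experimental analysis, and at the end of the thesis we propose possible improvements, in the hope that better ways of classifying forum articles can be developed in the future. / With network infrastructure in place, forum users can share information quickly and easily. Although users can retrieve information through search engines, they still have difficulty handling the volume of articles, which is usually beyond what a person can process. In this study, we design a tool to automate the retrieval of information on each topic in a Chinese game forum.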
We analyze the characteristics of the game forum and draw on English news summarization systems. Our method is divided into three phases. The first phase discovers keywords in the documents using TF-IDF instead of part-of-speech tags and builds a vector space model. The second phase represents the sentences in that vector space and uses the K-means clustering algorithm to gather sentences with similar meaning into the same cluster. In the third phase, we weight sentences by two features, the keywords a sentence contains and its TF-IDF, and order the sentences according to their weights.
We conduct an experiment with data collected from the game forum, and find useful information through the experiment. We believe the developed techniques and the results of the analysis can be used to design a better system in the future.
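The three phases described in this abstract (TF-IDF weighting, K-means clustering of sentences, and sentence selection) can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the thesis's system: it uses whitespace-tokenized English toy sentences instead of segmented Chinese text, seeds k-means with the first k vectors so the run is deterministic, and scores sentences by summed TF-IDF only. All function names and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Phase 1: TF-IDF vector per sentence, treating sentences as documents."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # guard against empty sentences
        vectors.append([tf[t] / length * idf[t] for t in vocab])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Phase 2: Lloyd's k-means, seeded with the first k vectors (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def summarize(sentences, k):
    """Phase 3: keep, per cluster, the sentence with the largest TF-IDF sum."""
    vectors = tfidf_vectors(sentences)
    labels = kmeans(vectors, k)
    best = {}
    for i, (vec, label) in enumerate(zip(vectors, labels)):
        weight = sum(vec)
        if label not in best or weight > best[label][0]:
            best[label] = (weight, i)
    # Return the chosen sentences in their original order.
    return [sentences[i] for _, i in sorted(best.values(), key=lambda p: p[1])]


# Hypothetical forum sentences about two game characters.
sentences = [
    "ashe slows enemies with frost arrows",
    "garen spins to damage nearby enemies",
    "frost arrows from ashe slow enemies",
    "garen spins and damage is dealt nearby",
]
summary = summarize(sentences, k=2)
```

With real forum data, a word segmenter would replace `split()`, and the sentence weight would also account for keyword content, as the abstract describes.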
|
36 |
Algorithmes et méthodes pour le diagnostic ex-situ et in-situ de systèmes piles à combustible haute température de type oxyde solide / Ex-situ and in-situ diagnostic algorithms and methods for solid oxide fuel cell systems
Wang, Kun, 21 December 2012
The European project "GENIUS" aims to develop generic methodologies for the diagnosis of high-temperature solid oxide fuel cell (SOFC) systems. The work of this thesis is part of that project; its objective is to implement a diagnostic tool that uses the stack as a special sensor to detect and identify failures in the subsystems of the SOFC stack. Three diagnostic algorithms were developed, based respectively on the k-means classification method, wavelet signal decomposition, and Bayesian network modeling. The first algorithm serves ex-situ diagnosis and is applied to data from polarization tests. It determines the significant response variables that indicate the state of health of the stack. The Silhouette index was calculated as a measure of classification quality in order to find the optimal number of classes in the database. Real-time fault detection can be performed by the second algorithm. Because the stack is used as a sensor, its state of health must be verified beforehand. The wavelet transform was used to decompose the voltage signals of the SOFC in order to find feature variables that indicate the state of health of the cell and are also discriminative enough to differentiate normal from abnormal operating conditions. To identify the system fault once an abnormal operating condition has been detected, the actual operating parameters of the stack must be estimated. A Bayesian network was therefore developed to accomplish this. Finally, all the algorithms were validated with experimental databases from various SOFC systems in order to test their genericity.
/ The EU project “GENIUS” is targeted at the investigation of generic diagnosis methodologies for different Solid Oxide Fuel Cell (SOFC) systems. The Ph.D. study presented in this thesis was integrated into this project; it aims to develop a diagnostic tool for SOFC system fault detection and identification based on validated diagnostic algorithms, by using the SOFC stack as a sensor. In this context, three algorithms, based on the k-means clustering technique, the wavelet transform and the Bayesian method, respectively, have been developed. The first algorithm serves for ex-situ diagnosis. It classifies the polarization measurements of the stack, aiming to identify the significant response variables that indicate the state of health of the stack. The “Silhouette” parameter has been used to evaluate the classification solutions in order to determine the optimal number of classes/patterns to retain from the studied database. The second algorithm allows on-line fault detection. The wavelet transform has been used to decompose the SOFC’s voltage signals in order to find effective feature variables that discriminate between the normal and abnormal operating conditions of the system. Since the SOFC is used as a sensor, its reliability must be verified beforehand. Thus, the feature variables are also required to be indicative of the state of health of the stack. When the stack is found to be operating improperly, the actual operating parameters should be estimated so as to identify the system fault. To achieve this goal, a Bayesian network has been proposed to serve as a meta-model of the stack and accomplish the estimation. Finally, databases originating from different SOFC systems have been used to validate these three algorithms and assess their generalizability.
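The use of the Silhouette index to pick the number of k-means classes can be illustrated with the sketch below. It is a minimal example under stated assumptions, not the thesis's algorithm: the 2-D points are synthetic stand-ins for response variables, k-means uses a deterministic farthest-point seeding chosen here for reproducibility, and the function names are hypothetical.

```python
import math


def kmeans(points, k, iterations=50):
    """Lloyd's k-means with deterministic farthest-point seeding (assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette over all points; values near 1 mean compact, separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic data with three well-separated groups (hypothetical values).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 0.0), (10.1, 0.0), (10.0, 0.1)]
scores = {k: silhouette_score(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
```

The candidate number of classes with the highest mean silhouette is retained; on measured data the maximum is usually less clear-cut than in this toy case.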
|
37 |
Data Mining Methods For Clustering Power Quality Data Collected Via Monitoring Systems Installed On The Electricity Network
Guder, Mennan, 01 September 2009
Increasing power demand and the wide use of high-technology power electronic devices result in a need for power quality monitoring. The quality of electric power in both transmission and distribution systems should be analyzed in order to sustain power system reliability and continuity. This analysis is possible by examining the data collected by power quality monitoring systems. In order to define the characteristics of the power system and reveal the relations between power quality events, a huge amount of data must be processed. In this thesis, clustering methods for power quality events are developed using exclusive and overlapping clustering models. The methods are designed to cluster the huge amount of power quality data obtained from the online monitoring of the Turkish Electricity Transmission System. The main issues considered in the design of the clustering methods are the amount of data, the efficiency of the designed algorithm, and the queries that should be supplied to the domain experts. This research work is fully supported by the Public Research Grant Committee (KAMAG) of TUBITAK within the scope of the National Power Quality Project (105G129).
|
38 |
Predicting The Effect Of Hydrophobicity Surface On Binding Affinity Of Pcp-like Compounds Using Machine Learning Methods
Yoldas, Mine, 01 April 2011
This study aims to predict the binding affinity of PCP-like compounds by means of molecular hydrophobicity. Molecular hydrophobicity is an important property which affects the binding affinity of molecules. The molecular hydrophobicity values of the molecules are obtained on a three-dimensional coordinate system. Our aim is to reduce the number of points on the hydrophobicity surface of the molecules. This is modeled using self-organizing maps (SOM) and k-means clustering. The feature sets obtained from SOM and from k-means clustering are each used to predict the binding affinity of the molecules. Support vector regression and partial least squares regression are used for prediction.
|
39 |
The thalamus in Parkinson's disease: a multimodal investigation of thalamic involvement in cognitive impairment
Borlase, Nadia Miree, January 2013
Parkinson’s disease patients present with the highest risk of dementia development. The thalamus, integral to several functions and behaviours is involved in the pathophysiology of Parkinson’s disease. The aim of this thesis was to determine if anatomical abnormalities in the thalamus are associated with the development of dementia in Parkinson’s disease.
We examined the thalamus using macro and microstructural techniques and the white matter pathways that connect the thalamus with areas of the surrounding cortex using diffusion tensor imaging (DTI) based tractography. T1-weighted magnetic resonance and DT images were collected in 56 Parkinson’s disease patients with no cognitive impairment, 19 patients with mild cognitive impairment, 17 patients with dementia and 25 healthy individuals who acted as control subjects. An established automated segmentation procedure (FIRST FSL) was used to delineate the thalamus and a modified k-means clustering algorithm applied to segment the thalamus into clusters assumed to represent thalamic nuclei. Fibre tracts were determined using DTI probabilistic tracking methods available in FIRST. Microstructural integrity was quantified by fractional anisotropy and mean diffusivity (MD) DTI measures.
Results show that microstructural measures of thalamic integrity are more sensitive to cognitive dysfunction in PD than macrostructural measures. For the first time we showed a progressive worsening of cellular integrity (MD) in the groups who had greater levels of cognitive dysfunction. Thalamic degeneration was regionally specific and most advanced in the limbic thalamic nuclei which influenced executive function and attention, areas of cognition that are known to be affected in the earliest stages of PD. The integrity of the fibre tracts corresponding to these thalamic regions was also compromised. Degeneration of fibre tracts was most evident in the dementia group, indicating that they may be more protected against Lewy pathology than the nuclei of the thalamus.
Our findings confirm previous histological, animal and lesion studies and provide a reliable estimate of cortical degeneration in PD that can be applied non-invasively and in vivo. A longitudinal study is needed to monitor the progression of cognitive decline in PD but we have provided the basis for further investigation into the predictive validity of thalamic degeneration for cognitive dysfunction. In the future, the microstructural changes of the thalamus could be used as biomarkers for the identification of individuals with a higher risk for dementia development and for the longitudinal monitoring of any interventions into cognitive decline.
|
40 |
Beam position diagnostics with higher order modes in third harmonic superconducting accelerating cavities
Zhang, Pei, January 2013
Higher order modes (HOM) are electromagnetic resonant fields. They can be excited by an electron beam entering an accelerating cavity, and constitute a component of the wakefield. This wakefield has the potential to dilute the beam quality and, in the worst case, result in a beam-break-up instability. It is therefore important to ensure that these fields are well suppressed by extracting energy through special couplers. In addition, the effect of the transverse wakefield can be reduced by aligning the beam on the cavity axis, because the strength of the transverse wakefield depends on the transverse offset of the excitation beam. For suitably small offsets the dominant components of the transverse wakefield are dipole modes, with a linear dependence on the transverse offset of the excitation bunch. This fact enables the transverse beam position inside the cavity to be determined by measuring the dipole modes extracted from the couplers, similar to a cavity beam position monitor (BPM), but without additional vacuum instrumentation. At the FLASH facility in DESY, 1.3 GHz (known as TESLA) and 3.9 GHz (third harmonic) cavities are installed. Wakefields in 3.9 GHz cavities are significantly larger than in the 1.3 GHz cavities. It is therefore important to mitigate the adverse effects of HOMs on the beam by aligning the beam on the electric axis of the cavities. This alignment requires accurate beam position diagnostics inside the 3.9 GHz cavities, which is the focus of this thesis. Although the principle of beam diagnostics with HOMs has been demonstrated on 1.3 GHz cavities, the realization in 3.9 GHz cavities is considerably more challenging. This is due to the dense HOM spectrum and the relatively strong coupling of most HOMs amongst the four cavities in the third harmonic cryo-module. A comprehensive series of simulations and HOM spectra measurements has been performed in order to study the modal band structure of the 3.9 GHz cavities.
The dependencies of various dipole modes on the offset of the excitation beam were subsequently studied using a spectrum analyzer. Various data analysis methods were used: modal identification, direct linear regression, singular value decomposition and k-means clustering. These studies led to three modal options that are promising for beam position diagnostics, upon which a set of test electronics has been built. The experiments with these electronics suggest a resolution of 50 microns in predicting the local beam position in the cavity and a global resolution of 20 microns over the complete module. This constitutes the first demonstration of HOM-based beam diagnostics in a third harmonic 3.9 GHz superconducting cavity module. These studies have finalized the design of the online HOM-BPM for 3.9 GHz cavities at FLASH.
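The linear dependence of a dipole-mode signal on beam offset, and the direct linear regression mentioned among the analysis methods, can be illustrated with a one-variable least-squares calibration. This is a simplified sketch, not the thesis's analysis: the signal is treated as a single signed amplitude in arbitrary units, the calibration values are made up, and the names `fit_line` and `predict_offset` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration scan: known transverse offsets (mm) and the
# corresponding signed dipole-mode amplitudes (arbitrary units).
offsets_mm = [-1.0, -0.5, 0.0, 0.5, 1.0]
amplitudes = [-2.1, -1.1, -0.1, 0.9, 1.9]
slope, intercept = fit_line(offsets_mm, amplitudes)


def predict_offset(amplitude):
    """Invert the calibration line to estimate the beam offset in mm."""
    return (amplitude - intercept) / slope


estimated = predict_offset(0.5)
```

In practice many spectral components would be regressed against position at once, which is where methods such as singular value decomposition help reduce the number of regressors.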
|