401 |
Data Modeling for Outlier Detection / Abghari, Shahrooz January 2018
This thesis explores data modeling for outlier detection in three application domains: maritime surveillance, district heating, and online media and sequence datasets. The proposed models are evaluated and validated under different experimental scenarios, taking into account the specific characteristics and setups of each domain. Outlier detection has been studied and applied in many domains. Outliers arise for different reasons, such as fraudulent activities, structural defects, health problems, and mechanical issues. Detecting outliers is a challenging task that can reveal system faults and fraud and can save people's lives. Outlier detection techniques are often domain-specific. The main challenge in outlier detection is modeling normal behavior in order to identify abnormalities. The choice of model is important: an incorrect choice of data model can lead to poor results. Choosing well requires a good understanding and interpretation of the data, the constraints, and the requirements of the problem domain. Outlier detection is largely an unsupervised problem because labeled data is often unavailable and expensive to obtain. We have studied and applied a combination of machine learning and data mining techniques to build data-driven and domain-oriented outlier detection models. We have shown the importance of data preprocessing and feature selection in building suitable methods for data modeling. We have taken advantage of both supervised and unsupervised techniques to create hybrid methods. For example, we have proposed a rule-based outlier detection system based on open data for the maritime surveillance domain. Furthermore, we have combined cluster analysis and regression to identify manual changes in heating systems at the building level. Sequential pattern mining has also been used to identify contextual and collective outliers in online media data.
In addition, we have proposed a minimum spanning tree clustering technique for detecting groups of outliers in online media and sequence data. The proposed models have been shown to be capable of explaining the underlying properties of the detected outliers. This can help domain experts narrow down the scope of analysis and understand the reasons for such anomalous behaviors. We have also investigated the reproducibility of the proposed models in similar application domains. / Scalable resource-efficient systems for big data analytics
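The minimum spanning tree idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the Euclidean distance, the fixed `cut_length` threshold, the rule that components with at most two members count as outlier groups, and the sample points are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (distance to the tree, tree-side endpoint)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        # Pick the point closest to the tree that is not yet part of it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] >= 0:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges


def mst_clusters(points, cut_length):
    """Drop MST edges longer than cut_length and return the connected components."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for weight, i, j in mst_edges(points):
        if weight <= cut_length:
            parent[find(i)] = find(j)
    groups = defaultdict(list)
    for idx in range(len(points)):
        groups[find(idx)].append(idx)
    return sorted(groups.values(), key=len, reverse=True)


# Made-up 2-D points: a dense group near the origin and two distant points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11)]
clusters = mst_clusters(points, cut_length=3.0)
outlier_groups = [c for c in clusters if len(c) <= 2]  # small components only
print(clusters, outlier_groups)
```

Cutting long edges of the tree leaves small, isolated components, which is one way a group of outliers (rather than a single point) can surface.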
|
402 |
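Controle estrutural e classificação do canal no baixo Tapajós: contribuições para a geomorfologia da Amazônia [Structural control and channel classification in the lower Tapajós: contributions to the geomorphology of the Amazon] / Cortes, João Paulo Soares de. January 2020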
Advisor: George Luiz Luvizotto / Abstract: The formation of the fluvial rias in the Amazon has a well-known relationship with sea-level transgression during the Holocene. Several authors have suggested that structural and tectonic control played a role in the genesis of these and other elements of the Amazonian relief, but little conclusive evidence has been presented so far. This work first presents evidence from several sources showing structural control along the Tapajós Ria and in adjacent upland (terra firme) areas. The methodology is innovative because it integrates geomorphological, geological and geophysical data (seismic, magnetometry and gravimetry) that are available at no cost and cover large areas, a great advantage in a region as difficult to access as the Amazon. It is also an unusual approach within geomorphology, and one that yielded very promising results. The results show the influence of structural elements on the configuration of the Amazonian relief in the lower Tapajós region. A model of horsts and grabens bounded by ENE-WSW-trending lineaments with regional expression is proposed. Next, we present a classification of the Tapajós channel based on morphometric variables extracted from cross-sectional profiles. The classification identifies three distinct reaches of the Tapajós channel within the analyzed perimeter, here called the Narrower Channel Reach, the Lower Ria Reach and the Higher Ria Reach. These reaches have statistical support and agreement with most of the s... (Complete abstract click electronic access below) / Doctorate
|
403 |
Cluster Analysis of the MMPI-2 in a Chronic Low-Back Pain Population / DeBeus, Roger J. (Roger John) 12 1900
The Minnesota Multiphasic Personality Inventory (MMPI) is the most frequently used psychological measure in the assessment of chronic pain. Since the introduction of the MMPI-2 in 1989 only two published studies have focused on cluster analysis of chronic pain patients. This study investigated MMPI-2 cluster solutions of chronic low-back pain patients. Data was collected from 2,051 chronic low-back pain patients from a multidisciplinary pain clinic in the southwestern United States. A hierarchical clustering procedure was performed on K-corrected T-scores of the MMPI-2 using the three validity and ten clinical scales. Four relatively homogeneous subgroups were identified for each sex with the MMPI-2. In general, these results replicated the findings of previous researchers using both the MMPI and MMPI-2.
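As a rough illustration of what a hierarchical clustering procedure does with scale profiles, the sketch below merges the two closest clusters repeatedly. The abstract does not state the linkage or distance used, so average linkage with Euclidean distance is an assumption, and the three-scale T-score profiles are made-up numbers, not study data.

```python
import math


def average_linkage(a, b, profiles):
    """Mean Euclidean distance over all cross-cluster pairs of profiles."""
    total = sum(math.dist(profiles[i], profiles[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(profiles, n_clusters):
    """Start with singletons and merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        pairs = [
            (average_linkage(clusters[i], clusters[j], profiles), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical T-score profiles on three scales (illustrative values only).
profiles = [(50, 52, 49), (51, 50, 48), (75, 80, 72), (78, 79, 70), (60, 85, 62)]
two_groups = agglomerate(profiles, n_clusters=2)
three_groups = agglomerate(profiles, n_clusters=3)
print(two_groups, three_groups)
```

Stopping the merges at different cluster counts is how alternative solutions (for example, a four-subgroup solution) would be compared in practice.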
|
404 |
Investigating the use of generational cohort theory to identify total reward preferences / Davids, Aayesha 17 March 2020
Background: Anecdotal accounts of stereotypes and generalisations about perceived generational differences in the workplace have become commonplace. Generational cohort theories are often used to identify cohorts of employees that are argued to differ in their expectations, needs, preferences and even values. To address or accommodate such differences, organisations are increasingly adopting strategies and interventions that take generational differences amongst employees into account (Costanza & Finkelstein, 2015). Addressing generational differences has become particularly popular in the design and implementation of total reward, or remuneration and recognition, strategies, policies and practices. Understanding the generational and demographic characteristics, and specifically the differences, that create distinct cohorts allows organisations to design reward and recognition packages that create distinct value for their employees. Offering tailored or more focused reward strategies and practices, designed with individual differences in mind, is believed to enhance attraction, employee engagement and retention, and so to allow an organisation to bolster its competitive advantage and contribute to sustained organisational success (Snelgar, Renard, & Venter, 2013). In support of this notion, empirical studies are showing promising results for targeted reward strategies and practices.
Rationale for the research study: Effective talent management, i.e. attracting, engaging and retaining sought-after, highly skilled employees, is critical for the success of any organisation. However, organisations are increasingly experiencing challenges in recruiting, motivating and retaining scarce human capital, colloquially referred to as talent (Barkhuizen, 2014).
Organisations that fail to understand and adapt to differences in the workforce may be unable to attract the talent they require or to keep employees motivated and engaged, and may experience unintended employee turnover, which carries notable direct and indirect costs (Westerman & Yamamura, 2007). Organisations are therefore constantly searching for new and innovative approaches to attract, retain and engage employees more effectively (Snelgar et al., 2013). A growing body of research (Haynes, 2011; Snelgar, Renard, & Venter, 2013) suggests that identifying distinct reward and recognition preferences amongst cohorts of employees, and targeting reward and recognition strategies accordingly, holds promise in this regard. When designing and implementing targeted approaches to reward and recognition, employee cohorts are most often identified using generational cohort theory, i.e. using established guidelines to group employees into generational cohorts that are believed to differ distinctly from one another while being internally similar. Results from studies using these employee cohorts as a framework have informed the design of targeted reward and recognition practices and policies. Generational cohort theory is, however, mostly grounded in a set of historical events that took place in the United States of America (USA). Despite this, the American-based framework for assigning individuals to generations has been adopted globally, both within organisations and in research studies published in peer-reviewed literature. Several authors have criticised the indiscriminate use of this popular American-based generational framework, which focuses on events affecting Americans, arguing that it has resulted in a somewhat narrow or even skewed view of generational cohorts.
These authors have gone as far as to argue that the American-based generational framework may be inappropriate or ineffective outside the USA (Close, 2015). Following this reasoning, they have called for alternative frameworks that create distinct generational cohorts relevant to contexts outside America, i.e. based on events and criteria more applicable to those contexts.
Aim of the research study: The aim of the present study was to investigate the reward preferences of a broad range of employees in order to assess whether the popular generational model of Strauss and Howe (1991) is relevant and/or as effective in a non-American context, and to look for support for alternative approaches to identifying distinct generational cohorts that may be more appropriate and/or effective when designing reward offerings for different cohorts of employees. Given time and cost constraints, South Africa was chosen as the setting because it is a developing economy (whereas the USA is a developed economy) and its history has been shaped by a different set of notable events. Given this aim, an exploratory research design was considered most appropriate to investigate generational cohort theory in a non-American context as a framework for identifying employee cohorts with distinctly different total reward preferences. A quantitative approach was followed because it is most useful for drawing conclusions or inferences about the total reward preferences of employee cohorts. The study used non-probability (convenience) sampling, with a realised sample of 169 respondents. The majority of respondents were Coloured and female, and the majority had attained a post-matric qualification.
Main results and findings: A one-way analysis of variance (ANOVA) revealed no statistically significant difference between the generational groups defined by the popular generational model of Strauss and Howe (1991), nor between those defined by a generational cohort framework that was proposed for the present study and based on notable South African historical events. A data-driven exploratory cluster analysis, on the other hand, yielded three distinct cohorts based on their perceived preferences for typical total reward elements. Significant differences were found between the total reward preferences of respondents born after 1994 and those born before 1994. Choice-based modelling (choice-based conjoint analysis) revealed that most respondents ranked the financial rewards as their two most preferred total reward elements: remuneration (guaranteed pay) first, followed by benefits, and then non-financial rewards (with work-life balance the most preferred non-financial reward).
Theoretical and practical implications: Numerous research studies have used the popular American-based generational model to identify the reward preferences of cohort groups without taking context-specific variables into account. There is also a dearth of empirical research investigating generational cohort theory specifically, and no such studies were found that were conducted in developing economies such as South Africa. The present study addresses this gap in the literature. The use of choice-based modelling, or choice-based conjoint analysis, furthermore makes a methodological contribution, given that this method is seldom found in total reward preference studies. The method was shown to identify total reward preferences that could not be determined using a field survey or questionnaire.
Choice-based modelling differs from typical survey approaches in that it better replicates human decision making, i.e. it assesses the relative importance of attributes and levels based on combinations of choices and the related sacrifices that people face when making a choice. In terms of practical contribution, the results provide insights that organisations can incorporate when designing differentiated total reward strategies to accommodate and/or address the needs of different generational groups.
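For readers unfamiliar with the one-way ANOVA reported in this abstract, the following minimal sketch computes the F statistic from grouped scores. The cohort scores are invented for illustration and are not the study's data; turning F into a p-value would additionally need the F distribution, which is left out here.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical preference scores for three cohorts (illustrative only).
identical_means = [[4, 5, 3, 4], [4, 4, 5, 3], [5, 3, 4, 4]]
shifted_means = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(identical_means))
print(one_way_anova_f(shifted_means))
```

When the group means coincide, the between-group sum of squares is zero and so is F, which corresponds to the "no significant difference" situation the abstract describes for the predefined generational groups.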
|
405 |
An Application of Cluster Analysis in Identifying and Evaluating Prognostic Subgroups for Therapy-Related Acute Myeloid Leukemia / Antonilli, Stefanie January 2022
Treatment for lymphoma with alkylating therapy is known to increase the risk of secondary malignancies such as acute myeloid leukemia (AML), although the risk is not fully understood. This study investigates the characteristics of AML that arises after lymphoma treatment in contrast to AML cases without a prior lymphoma. The study population consists of 115 individuals identified from the Swedish lymphoma register (SLR) with a diagnosis in the quality register for AML between 2000 and 2019, matched 1:1 to lymphoma-free comparators. Hierarchical cluster analysis with Gower's similarity measure and the k-prototypes clustering algorithm are employed to identify subgroups separately among those with a lymphoma history and among the matched comparators. The survival of lymphoma patients is compared between subgroups in a Cox regression model. The findings suggest a two-cluster partition, achieved by the hierarchical method, for patients with a lymphoma history as well as for lymphoma-free patients (average silhouette 0.853 and 0.842, respectively). Both partitions completely separate patients with genetic information from those without. For AML patients with a preceding lymphoma, a subgroup defined by the hierarchical two-cluster partition is associated with an increased mortality rate (HR 2.40). A three-cluster partition achieved by the k-prototypes algorithm could be more clinically relevant; however, only one subgroup is associated with increased mortality (HR 2.73).
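Gower's measure is what lets a hierarchical method handle records that mix numeric and categorical fields. The sketch below computes the Gower distance (one minus the similarity) between two records. The feature layout (age, sex, karyotype), the age range of 40 years, and the handling of missing values by skipping them are illustrative assumptions, not details taken from the thesis.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records of mixed type.

    numeric_ranges maps the index of each numeric feature to its observed
    range; every other feature is treated as categorical. None marks a
    missing value, and features missing in either record are skipped.
    """
    total, used = 0.0, 0
    for k, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue
        if k in numeric_ranges:
            total += abs(x - y) / numeric_ranges[k]  # range-scaled difference
        else:
            total += 0.0 if x == y else 1.0  # simple match / mismatch
        used += 1
    if used == 0:
        raise ValueError("records share no observed features")
    return total / used


# Hypothetical records: (age in years, sex, karyotype).
ranges = {0: 40}  # assumed age range in the sample
patient_a = (60, "F", "complex")
patient_b = (70, "F", None)  # no genetic information recorded
patient_c = (60, "M", "normal")
d_ab = gower_distance(patient_a, patient_b, ranges)
d_ac = gower_distance(patient_a, patient_c, ranges)
print(d_ab, d_ac)
```

How missing values are treated matters a great deal with this kind of measure; a different convention (for example, counting "missing" as its own category) would change which records end up close together.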
|
406 |
Clustering Generic Log Files Under Limited Data Assumptions / Klustring av generiska loggfiler under begränsade antaganden / Eriksson, Håkan January 2016
Complex computer systems are often prone to anomalous or erroneous behavior, which can lead to costly downtime while the systems are diagnosed and repaired. One source of information for diagnosing errors and anomalies is log files, which are often generated in vast and diverse amounts. However, the log files' size and semi-structured nature make manual analysis generally infeasible. Some automation is desirable to sift through the log files and find the source of the anomalies or errors. This project aimed to develop a generic algorithm that could cluster diverse log files in accordance with domain expertise. The results show that the developed algorithm agrees well with manual clustering even under more relaxed data assumptions.
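The abstract does not describe the algorithm itself, so the following is only a simple baseline for the general task: mask the variable parts of each log line and group lines that share the resulting template. The masking rules and the sample log lines are assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def template(line):
    """Replace variable-looking tokens (hex values, numbers) with placeholders."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return " ".join(line.split())  # normalise whitespace


def cluster_lines(lines):
    """Group log lines by their masked template."""
    groups = defaultdict(list)
    for line in lines:
        groups[template(line)].append(line)
    return dict(groups)


# Made-up log lines in a semi-structured format.
logs = [
    "2016-01-03 12:00:01 worker 7 started",
    "2016-01-03 12:00:05 worker 12 started",
    "2016-01-03 12:01:10 disk /dev/sda1 at 91% capacity",
    "2016-01-03 12:02:44 worker 7 crashed at 0x7ffe12ab",
]
log_clusters = cluster_lines(logs)
for pattern, members in log_clusters.items():
    print(len(members), pattern)
```

A baseline like this relies on knowing which tokens are variable; working under limited data assumptions, as the thesis title puts it, presumably means relying on less of that prior knowledge.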
|
407 |
Is there a meaningful subgroup of youths displaying both psychopathic traits and ADHD? / Aronsson, Fanny, Laini Bovellan, Alexandra January 2021
In this study, we examined subgroups of adolescents based on their levels of psychopathic traits and ADHD symptoms. Participants were 982 adolescents from a community sample, with a mean age of 14.28 years (SD = .94). We used youths' self-reports of psychopathic traits and their legal guardians' reports of the adolescents' ADHD symptoms to identify distinct subgroups. Using hierarchical cluster analysis, we identified four groups that varied in levels of psychopathic traits and ADHD. One group was characterized by high levels of both psychopathic traits and ADHD (the high combination group). The subgroups differed significantly from each other in several theoretically meaningful ways. Compared with the other subgroups, the high combination group reported higher levels of psychopathic traits, impulsivity and hyperactivity, as well as higher levels on external variables such as aggression, delinquency and violence. The high combination group also differed in anxiety levels from the subgroup with high psychopathic traits only. These findings are in line with previous research and confirm that the construct of psychopathy is heterogeneous. We identified an especially vulnerable subgroup that resembles the secondary psychopath.
|
408 |
Social Experiences With Mental Health Service Use Among US Adolescents / Xie, Xin, Wang, Nianyang, Chu, Jun 01 January 2021
Background: Little is known about the associations of social experiences with mental health service use. Aim: This study aimed to classify social experience variables from the past year and to examine the associations of selected social experience variables with mental health service use among US adolescents. Methods: A total of 13,038 adolescents (aged 12 to 17), of whom 2,208 received mental health services, were drawn from the 2018 National Survey on Drug Use and Health. Multivariate logistic regression (MLR) analysis was conducted. Results: The overall prevalence of mental health service use was 16.1%. The 44 social experience variables were grouped into 10 disjoint clusters, and one variable from each cluster was selected for the MLR analysis. Being female, African American, Hispanic or insured, and having had depression in the past year, were associated with increased odds of mental health service use. Negative feelings about going to school, having a serious fight at school or work, active involvement in substance use help programs, knowledge of drug prevention, and negative perceptions of the role of religious beliefs in life decisions were positively associated with mental health service use. Conclusion: Mental health service use is associated with feelings about school and peers, perceptions about drug use, and involvement in activities.
|
409 |
Improving Nitrogen Management in Corn-Wheat-Soybean Rotations Using Site Specific Management in Eastern Virginia / Peng, Wei 13 November 2001
Nitrogen (N) is a key nutrient input to crops and one of the major agricultural pollutants in the United States. Recent developments in site-specific management (SSM) technology have the potential to reduce both N overapplication and underapplication and to increase farmers' net returns. In Virginia, because within-field yield-limiting factors such as soil physical properties and fertility are highly variable, the adoption of SSM is hindered by the high cost of grid sampling. Many Virginia corn-wheat-soybean farms have generated yield maps with yield monitors for several years, even though few variable applications based on yield maps have been reported. It is unknown whether the information generated by yield monitors under actual production conditions can be used to direct N management for increased net returns in this area.
The overall objective of the study is to analyze the economic and environmental impact of alternative N management strategies for corn and wheat production based on site-specific information in eastern Virginia. Specifically, three levels of site-specific information about crop N requirements were evaluated in combination with variable and uniform N application. The three levels are: information about the yield potential of the predominant soil type within the field; information about the yield potentials of all soils within the field (soil zones); and information about the yield potentials of smaller sub-field units aggregated into functional zones. Effects of information on expected net returns and net N (applied N that is not removed by the crop) were evaluated for corn-wheat-soybean fields in eastern Virginia. Ex post and ex ante evaluations of information were carried out.
Historical weather data and farm-level yield data were used to generate yield sequences for individual fields. A Markov chain model was used to describe both temporal and spatial yield variation. Soil maps were used to divide a field into several soil management units. Cluster analysis was used to group subfield units into functional zones based on yield monitor data. Yield monitor data were used to evaluate ex post information and variable application values for 1995-1999, and ex ante information and variable application values for 1999.
Ex post analysis results show that soil zone information increased N input but decreased net return, while functional zone information decreased N input and increased net returns. Variable application decreased N input compared with uniform application. Variable application based on soil zone information reduced net return due to the cost of overapplication or underapplication. Variable application based on functional zone information increased net return.
Ex ante results show that information on spatial variability was not able to increase farmers' net return due to the cost of variable N application and information. Variable rate application decreases N input relative to uniform application. However, imprecision in the spatial predictor makes variable application unprofitable due to an imbalance between the costs of under- and over-application of N. Sensitivity analysis showed that the value of information was positive when temporal uncertainty was eliminated.
The ex post results of this study suggest there is potential to improve the efficiency of N use and farmers' net returns with site-specific management techniques. The ex ante results suggest that site-specific management improvements should be tested under conditions faced by farmers, including imperfect information about temporal and spatial yield variability. / Ph. D.
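The abstract above mentions a Markov chain model of yield variation. As a minimal sketch of the temporal part only, the code below estimates transition probabilities between yield classes from observed sequences. The three yield classes and the two sequences are hypothetical, and the spatial dimension of the thesis's model is not represented.

```python
from collections import Counter


def transition_matrix(sequences, states):
    """Estimate first-order transition probabilities by counting consecutive pairs."""
    counts = Counter()
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    matrix = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states)
        # Rows for states that were never left stay at zero.
        matrix[s] = {
            t: (counts[(s, t)] / row_total if row_total else 0.0) for t in states
        }
    return matrix


# Hypothetical yield classes for two sub-field units over five seasons.
states = ["low", "medium", "high"]
history = [
    ["low", "medium", "medium", "high", "medium"],
    ["high", "high", "medium", "low", "medium"],
]
yield_transitions = transition_matrix(history, states)
for state in states:
    print(state, yield_transitions[state])
```

Each row gives the estimated chance of moving from one yield class to another in the next season, which is the kind of quantity an ex ante evaluation would need in order to account for temporal uncertainty.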
|
410 |
Micro-Raman Imaging for Biology with Multivariate Spectral Analysis / Malvaso, Federica 05 May 2015
Raman spectroscopy is a noninvasive technique that can provide complex information on the vibrational states of molecules. It yields a unique fingerprint that allows the identification of the various chemical components within a given sample. The aim of this thesis is to analyze Raman maps of three pairs of different cells, highlighting differences and similarities through multivariate algorithms. The first pair of analyzed cells are human embryonic stem cells (hESCs), while the other two pairs are induced pluripotent stem cells (iPSCs) derived from T lymphocytes and keratinocytes, respectively. Although two different multivariate techniques were employed, i.e. Principal Component Analysis and Cluster Analysis, the same results were achieved: the iPSCs derived from T lymphocytes show a higher content of genetic material than both the iPSCs derived from keratinocytes and the hESCs. Equally evident was that the iPSCs derived from keratinocytes assume a molecular distribution very similar to that of the hESCs.
|