Global ETD Search

541	Unsupervised learning with mixed type data : for detecting money laundering / Klusteranalys av heterogen data Engardt, Sara January 2018 The purpose of this master's thesis is to perform a cluster analysis on parts of Handelsbanken's customer database. The ambition is to explore whether this could help identify customer types at risk of illegal activities such as money laundering. A literature study is conducted to help determine which of the clustering methods described in the literature are most suitable for the current problem. The most important constraints of the problem are that the data consist of mixed-type attributes (categorical and numerical) and that the data contain many outliers. An extension of the self-organising map and the k-prototypes algorithm were chosen for the clustering. It is concluded that clusters exist in the data, albeit in the presence of outliers. More work is needed on handling missing values in the dataset. / The aim of this master's thesis is to carry out a cluster analysis on parts of Handelsbanken's customer database. The idea is to investigate whether this can help identify typical customers involved in illegal activities such as money laundering. First, a literature study is carried out to determine which algorithm is best suited to the problem. The customer database consists of data with both numerical and categorical attributes. An extended Kohonen network (self-organising map) and the k-prototypes algorithm are used for the clustering. The results show that there are clusters in the data, but in the presence of noise. More work is needed to handle empty values among the attributes. Unsupervised learning cluster analysis mixed type data Computer Sciences
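The abstract above names the k-prototypes algorithm for mixed-type data but does not give its formulas. As a minimal sketch, assuming the commonly used k-prototypes dissimilarity (squared Euclidean distance on the numerical attributes plus a weight `gamma` times the number of mismatching categorical attributes), the distance and the assignment step could look as follows. The function names, the example attributes, and the value of `gamma` are illustrative assumptions, not taken from the thesis.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Dissimilarity between two records with numerical and categorical parts.

    Assumed formulation: squared Euclidean distance on the numerical
    attributes plus gamma times the count of categorical mismatches.
    """
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * categorical_part


def nearest_prototype(num, cat, prototypes, gamma=1.0):
    """Return the index of the closest prototype (the assignment step)."""
    distances = [
        mixed_dissimilarity(num, cat, proto_num, proto_cat, gamma)
        for proto_num, proto_cat in prototypes
    ]
    return min(range(len(prototypes)), key=distances.__getitem__)


# Hypothetical prototypes: (scaled numerical values, categorical values).
prototypes = [
    ([0.1, 0.2], ["private", "domestic"]),
    ([0.9, 0.8], ["corporate", "foreign"]),
]
record = ([0.8, 0.9], ["corporate", "domestic"])
cluster = nearest_prototype(record[0], record[1], prototypes, gamma=0.5)
```

The weight `gamma` controls how much a categorical mismatch counts relative to the numerical distance, so the numerical attributes would normally be scaled before clustering.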
542	A Posteriori And Interactive Approaches For Decision-making With Multiple Stochastic Objectives Bakhsh, Ahmed 01 January 2013 Computer simulation is a popular method that is often used as a decision support tool in industry to estimate the performance of systems too complex for analytical solutions. It is a tool that assists decision-makers to improve organizational performance and achieve performance objectives in which simulated conditions can be randomly varied so that critical situations can be investigated without real-world risk. Due to the stochastic nature of many of the input process variables in simulation models, the output from the simulation model experiments is random. Thus, experimental runs of computer simulations yield only estimates of the values of performance objectives, where these estimates are themselves random variables. Most real-world decisions involve the simultaneous optimization of multiple, and often conflicting, objectives. Researchers and practitioners use various approaches to solve these multiobjective problems. Many approaches that integrate simulation models with stochastic multiple objective optimization algorithms have been proposed, many of which use Pareto-based approaches that generate a finite set of compromise, or tradeoff, solutions. Nevertheless, identification of the most preferred solution can be a daunting task for the decision-maker and is an order of magnitude harder in the presence of stochastic objectives. However, to the best of this researcher’s knowledge, there have been no focused efforts or existing work that attempt to reduce the number of tradeoff solutions while considering the stochastic nature of a set of objective functions. In this research, two approaches that consider multiple stochastic objectives when reducing the set of the tradeoff solutions are designed and proposed.
The first proposed approach is an a posteriori approach, which uses a given set of Pareto optima as input. The second approach is an interactive-based approach that articulates decision-maker preferences during the optimization process. A detailed description of both approaches is given, and computational studies are conducted to evaluate the efficacy of the two approaches. The computational results show the promise of the proposed approaches, in that each approach effectively reduces the set of compromise solutions to a reasonably manageable size for the decision-maker. This is a significant step beyond current applications of the decision-making process in the presence of multiple stochastic objectives and should serve as an effective approach to support decision-making under uncertainty. Decision making under uncertainty pareto analysis cluster analysis a posteriori approach interactive approach Engineering Industrial Engineering
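The abstract above refers to a set of Pareto optima estimated from stochastic simulation output. The following is only a sketch of that starting point, not the reduction methods proposed in the dissertation: it averages hypothetical simulation replications per solution and keeps the non-dominated solutions, assuming every objective is minimized. All names and numbers are illustrative assumptions.

```python
from statistics import fmean


def mean_objectives(replications):
    """Average each objective over the simulation replications of one solution."""
    return tuple(fmean(values) for values in zip(*replications))


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical estimates: two replications per solution, two objectives.
runs = {
    "A": [(0.9, 5.1), (1.1, 4.9)],
    "B": [(2.0, 3.0), (2.0, 3.0)],
    "C": [(3.0, 4.0), (3.0, 4.0)],
}
estimates = {name: mean_objectives(reps) for name, reps in runs.items()}
front = pareto_front(list(estimates.values()))
```

Because the estimates are themselves random variables, comparing means alone ignores their variance; that gap is what motivates approaches that account for the stochastic nature of the objectives.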
543	Statistical Analysis on Aerodynamics of Passenger Vehicles Peng, Dingkang January 2023 This thesis aims to use statistical methods to analyze wind tunnel data generated in automotive aerodynamics testing to understand the properties of aerodynamic force and pressure in a vehicle's working environment. The data used for analysis are visualized, clustered and finally analyzed for Granger causality to see whether a causal link exists between different variables. Then, the pressure measurements taken from the scaled vehicle model are visualized with heat maps and further quantified with K-means and K-medoids clustering. Using the reduced-dimension pressure data derived from cluster analysis, combined with aerodynamic force data, a VAR model is fitted, and the causal relationships between the variables in the data set are explored using Granger causality testing. statistics aerodynamics automotive engineering causality time series cluster analysis Probability Theory and Statistics
544	Modeling of United States Airline Fares -- Using the Official Airline Guide (OAG) and Airline Origin and Destination Survey (DB1B) Rama-Murthy, Krishna 13 September 2007 Prediction of airline fares within the United States, including Alaska and Hawaii, is required for transportation mode choice modeling in impact analysis of new modes such as NASA's Small Airplane Transportation System (SATS). Developing an aggregate cost model, i.e. a 'generic fare model', of the disaggregated airline fares is required to measure the cost of air travel. In this thesis, the ratio of average fare to distance, i.e. fare per mile, and the average fare are used as measures in this cost model. The thesis initially determines the Fare Class categories to be used for Coach and Business class for the analysis. The thesis then develops a series of 'generic fare models' using round trip distance traveled as an independent variable. The thesis also develops a set of models to estimate average fare for any origin and destination pair in the US. The factors considered by these models are: the round trip distance traveled between the origin (o) and destination (d), the type of fare class chosen by the traveler (first class, business class, unrestricted coach class, or restricted coach class), the type of airport (large hub, medium hub, small hub, or non hub), whether or not the route is served by a low cost airline, and the airline market concentration between the o-d pair. The models suggest that competition at the destination airport is more critical than competition at the origin airport for coach class fares, and vice versa for business class fares. Models suggested in this thesis predict air fares with R-square values of 0.3 to 0.75. / Master of Science Curve Fitting Airline Fares DB1B Cluster Analysis Multiple Linear Regression OAG
545	Sedimentologic and taphonomic analysis of a 1945 tsunami deposit in Sur Lagoon, Sultanate of Oman Donato, Simon Vincent 01 1900 The Sultanate of Oman is a rapidly modernizing country with a significant length of its coastline slated for development. Much of the coastline is still in its natural state and basic studies describing the sedimentary systems need to be conducted in order to plan effectively for their sustainable development and to monitor changes in them with time. For such purposes, sediment samples (surface and sub-surface), elevation data, and serial sediment cores were collected at Sur Lagoon during three field seasons. The research objectives, procedures, results, and analyses for Sur Lagoon are presented in three chapters. The first chapter compares textural facies, identified on the basis of particle-size distribution (PSD) of surface sediments from Sur Lagoon and evaluated using multi-variate cluster analysis, for their value in recognizing modern sedimentary environments. Clustering the full PSD size spectrum (0.0375-1888 μm) shows that facies identification is possible and is closely tied to surface elevation, with particle size decreasing with increasing elevation above mean sea level. This analytical technique should be tested under different conditions to assess further its utility. The second chapter discusses the taphonomically distinct and laterally extensive (> 1 km²) bivalve shell bed deposited by a tsunami on November 28th, 1945. Taphonomic characteristics of this unit are compared to those of the shell-rich tsunamite from Caesarea, Israel, and resulted in the identification of three generic, tsunamigenic-specific traits in shell beds: 1) thickly bedded and laterally extensive shell deposit, 2) presence of allochthonous articulated bivalves not in life position, and 3) extensive angular fragmentation. When these three traits are found together, a tsunamigenic origin should be considered for the shell bed.
The third chapter analyzes the PSD of the tsunamite in eight sediment cores for digested and undigested samples. Cluster analysis of the PSD extended the upper or lower tsunamite contacts in four cores, but in general, the tsunamite thickness is consistent with the previously identified shell beds (Chapter 3). The tsunamigenic processes that resulted in the deposition of the shell bed were complex, and deposition occurred during run-up, flooding, and backwash stages of the tsunami, incorporating marine, lagoonal, and terrestrial (wadi) sediment into the tsunamite. The results of this study provide baseline sedimentological data for an understudied region of the world. New applications of cluster analysis of PSD and taphonomic analysis have the potential to identify previously unknown tsunamites in the geological record, and lithological facies using textural analysis. / Thesis / Doctor of Philosophy (PhD) Sultanate of Oman Sur Lagoon tsunami taphonomic analysis textural analysis
546	An analysis of style-types in musical improvisation using clustering methods Ellis, Blair K. 11 1900 Research on creativity examines both the processes and products of creativity. An important avenue for analyzing creativity is by means of spontaneous improvisation, although there are major challenges to characterizing the output of improvisation due to the variable nature of the products. In the case of musical improvisation, structural approaches have used methodologies like musical transcription to look for recurring or variable musical features across a corpus of improvisations, while creativity-centered approaches have had experts make ratings of the novelty of the improvisations. One important concept missing from many analyses of improvisation is the idea that the products of a corpus can be organized into a series of “style types”, where each type differs from others in certain key structural features. Clustering methods provide a reliable quantitative means of examining the organization of style types within a diverse corpus of improvisations. In order to look at the potential of such methods, we examined a corpus of 72 vocal melodic improvisations produced by novice improvisers. We first classified the melodies acoustically using a multidimensional musical-classification scheme called CantoCore, which coded the melodies for 19 distinct features of musical structure. We next employed the simultaneous use of multiple correspondence analysis (MCA) and k-means cluster analysis with the data, and obtained three relatively discrete clusters of improvisations. Stylistic analysis of these clusters revealed that they differed in key features related to phrase structure and rhythm. Cluster analyses provide a promising means of describing and analyzing the products of creativity, including variable structures like spontaneous improvisations. / Thesis / Master of Science (MSc)
547	129Xe Magnetic Resonance Imaging Ventilation Phenotypes of Severe Asthma / Ventilation Phenotypes of Severe Asthma Thakar, Ashutosh January 2024 INTRODUCTION: Abnormal ventilation is the functional consequence of airway obstruction. In patients with severe asthma, ventilation patterns visualized by magnetic resonance imaging (MRI) exhibit significant inter-patient heterogeneity. Therefore, our objectives were to identify MRI ventilation phenotypes of severe asthma using an unsupervised clustering approach and examine their associated demographic, clinical, physiologic, and inflammatory characteristics. METHODS: This retrospective analysis included 58 adults with severe asthma who underwent hyperpolarized 129Xe ventilation MRI. Nineteen quantitative variables were extracted from ventilation MRI (including ventilation defect percent (VDP), ventilation defect size, and ventilation texture features) and transformed to principal components for hierarchical clustering. Differences in demographics, clinical characteristics, spirometry, inflammatory biomarkers, and computed tomography (CT) measurements between phenotypes were evaluated using one-way ANOVA or Kruskal-Wallis tests. RESULTS: Three ventilation phenotypes of severe asthma were identified. They were significantly different with respect to their age, prevalence of obesity, spirometry, sputum neutrophil percent, sputum cytokines (interleukin-4, interleukin-6, interleukin-15, B-cell activating factor), total lung capacity, CT air-trapping, and CT mucus score (all p<0.05). They were not different with respect to their asthma control or medication requirement, and ~75% of each phenotype reported uncontrolled asthma (ACQ-5≥1.5). Phenotype 1 had normal ventilation (VDP=1.7±0.9%) and predominantly consisted of young, obese females (88% female, 41±11 years old, 63% obese).
They had normal-to-moderately reduced FEV1 (80±15%pred), normal post-bronchodilator FEV1/FVC, and reduced total lung capacity (85%pred [57-108]). 25% had intraluminal inflammation (all eosinophilic) and their sputum interleukin-4 levels were elevated. Phenotype 2 had markedly abnormal ventilation (VDP=6.2±3.8%) and was older than Phenotype 1, but also predominantly consisted of obese females (63% female, 54±13 years old, 59% obese). They had mildly-to-severely reduced FEV1 (61±17%pred) and partially reversible obstructive spirometry (72%, post-bronchodilator FEV1/FVC<0.70). 50% had intraluminal inflammation (28% eosinophilic/13% neutrophilic/9% mixed-granulocytic) and their sputum interleukin-6 levels were elevated. Phenotype 3 had severely abnormal ventilation (VDP=24.8±10.2%) and was also older than Phenotype 1 but was gender-balanced and not obese (50% female, 56±12 years old, 11% obese). They had moderately-to-very severely reduced FEV1 (41±12%pred) and partially reversible obstructive spirometry (89%, post-bronchodilator FEV1/FVC<0.70). 73% had intraluminal inflammation (39% eosinophilic/17% neutrophilic/17% mixed-granulocytic) and their sputum interleukin-15 and B-cell activating factor levels were elevated. They had the highest burden of gas-trapping and mucus on CT. CONCLUSION: Three distinct MRI ventilation phenotypes of severe asthma were identified through unbiased analysis, all of which reported uncontrolled asthma. The discordance in ventilation between phenotypes, and their characteristics, suggest different mechanisms that may be driving severe asthma. / Thesis / Master of Science (MSc) / Severe asthma is an airways disease that is characterized by inflamed, twitchy and obstructed airways. There is remarkable clinical heterogeneity between asthma patients due to the various mechanisms of disease. 
Abnormal ventilation is the functional consequence of abnormal airway pathology in asthma, which can be directly visualized by hyperpolarized 129Xe magnetic resonance imaging (MRI). Each ventilation pattern is unique and there is significant inter-patient variability. Thus, the goal of the thesis was to extract quantitative information from the 129Xe MRI ventilation patterns of patients with severe asthma, identify novel ventilation phenotypes, and determine their clinical relevance. An unsupervised machine learning approach using quantitative ventilation MRI features identified three unique, clinically relevant ventilation phenotypes of severe asthma with distinct clinical, physiological, and biological characteristics. The discordance in ventilation between phenotypes, and their characteristics, suggest different mechanisms that may be driving severe asthma. Magnetic Resonance Imaging Ventilation Image Processing Cluster Analysis Severe Asthma Phenotypes
548	A Multi-Variate Regression Analysis on Telecommunication Sites in a Sub-Saharan Country / En regressionsanalys i flera variabler på telekommunikationsmaster i ett land i subsahariska Afrika Berisha, Elza, Holma, Hampus January 2023 The purpose of this bachelor thesis is to investigate how different variables impact voice and data traffic for a telecom operator that operates in an undisclosed Sub-Saharan African country. The data has been provided by said company. The models, generated by using multivariate linear regression analysis, have a high explanatory power, as evidenced by high coefficients of determination. However, it is important to recognize the persistence of certain systematic issues, which are most likely due to the absence of key explanatory variables. Addressing these limitations in future research efforts will lead to a more comprehensive understanding of the subject and more robust findings to determine which factors drive voice and data traffic. In the report, the telecommunication sites are segmented based on generated income. Two segmentation models were created to categorize sites based on their data and voice revenue quartiles. A color matrix was used to depict the results. The hypothesis that nearby sites are more likely to perform similarly was tested using a quartile-based scoring method. The regression analysis uncovered significant variables and revealed information about the relationship between various factors and data and voice traffic. The regression residuals were analyzed using qualitative cluster analysis, which revealed distinct clustering patterns. Overall, the study provides useful insights into data and voice traffic segmentation and performance analysis in the analyzed region. / The purpose of this bachelor's thesis is to investigate how different variables affect voice and data traffic for a telecom operator active in a Sub-Saharan African country.
The study uses linear regression analysis to develop models with high explanatory power, as shown by high coefficients of determination. However, despite the good results, it is important to take into account systematic problems in the models, which are probably due to key explanatory variables missing from the data. Future research should therefore aim to address these limitations, thereby achieving a more comprehensive understanding of the subject and more accurate results. In the report, the telecommunication sites are segmented based on generated income. Two segmentation models were developed to categorize the sites according to their quartiles for data and voice revenue. The results are shown visually using a color matrix. In addition, the hypothesis that nearby sites show similar performance was tested using a quartile-based scoring method. The regression analysis identifies significant variables and provides insight into the relationship between various factors and data and voice traffic. Furthermore, qualitative cluster analysis of the regression residuals reveals clear clustering patterns. Overall, this study provides valuable insights into data and voice traffic segmentation and performance analysis in the analyzed region. telecommunication linear regression segmentation cluster analysis Probability Theory and Statistics
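The abstract above describes segmenting sites by their data and voice revenue quartiles. A minimal sketch of such a quartile assignment, assuming Python's standard `statistics.quantiles` cut points and entirely hypothetical revenue figures, might look like this; the thesis does not state how its quartiles or color matrix were computed.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_labels(values):
    """Assign each value a quartile label from 1 (lowest) to 4 (highest)."""
    cuts = quantiles(values, n=4)  # three cut points between the quartiles
    return [bisect_right(cuts, v) + 1 for v in values]


# Hypothetical revenue per site, in arbitrary units.
data_revenue = [10, 20, 30, 40, 50, 60, 70, 80]
voice_revenue = [80, 10, 70, 20, 60, 30, 50, 40]

data_q = quartile_labels(data_revenue)
voice_q = quartile_labels(voice_revenue)

# Each site falls into one cell of a 4 x 4 (data quartile, voice quartile) grid,
# which is the kind of grid a color matrix could visualize.
segments = list(zip(data_q, voice_q))
```

Pairing the two quartile labels gives sixteen possible segments, so sites with high data revenue but low voice revenue can be told apart from sites that are high or low on both.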
549	Statistical Analysis of Atmospheric Variables during Tornadic Events in Dixie Alley and Tornado Alley using Proximity Soundings from 1995 to 2015 Schroder, Zoe 06 May 2017 Tornadoes frequently occur in Tornado Alley (Northern Texas, Oklahoma, Kansas, and Nebraska) and Dixie Alley (Louisiana, Arkansas, Tennessee, Mississippi, Alabama, and Georgia). This study utilizes sounding variables taken within 2 hours and 80 km of a tornado event for the period 1995-2015 to compare and differentiate between these regions. Data bootstrapping and cluster analysis were used to assess differences and similarities in the environmental data between the regions. Of the variables used, the thermodynamic variables showed the greatest discrimination between Dixie Alley and Tornado Alley tornado environments, with Dixie Alley having lower LCL heights and CAPE values as well as higher SREH and BWD values when compared to Tornado Alley. However, because it combines thermodynamic and kinematic inputs, EHI shows the greatest potential for discriminating between tornadic environments in Dixie Alley and Tornado Alley, which is beneficial in severe weather forecasting. Cluster Analysis Bootstrap Kinematic Thermodynamic Atmospheric Variables Proximity Soundings Tornadic Environments Tornadoes
550	Profile Analysis Techniques for Observation-Based Software Testing Leon Cesin, David Zaen January 2005 No description available. Computer Science Software Engineering Software Testing Cluster Analysis Observation-based testing
