Global ETD Search

21	Finding co-workers with similar competencies through data clustering / Att upptäcka medarbetare med liknande kompetensprofil via dataklustring Skoglund, Oskar January 2022 In this thesis, data clustering techniques are applied to a competence database from the company Combitech. The goal of the clustering is to connect co-workers with similar competencies and competence areas in order to enable more skill sharing. This is accomplished by implementing and evaluating three clustering algorithms: k-modes, DBSCAN, and ROCK. The algorithms are fine-tuned using three internal validity indices: the Dunn index, the silhouette score, and the Davies-Bouldin index. Finally, a form about the clusterings produced by the three algorithms is sent to the co-workers whose data the clustering is based on, in order to obtain external validation by calculating the clustering accuracy. The internal validity indices show that ROCK and DBSCAN create the most separated and dense clusters. The form results show that ROCK is the most accurate of the three algorithms, with an accuracy of 94%, followed by k-modes at 58% and DBSCAN at 40%. However, visualizing the clusters shows that both ROCK and DBSCAN create one very large cluster, which is not desirable. This was not the case for k-modes, whose clusters are more evenly sized while still being fairly well separated. In general, the results show that data clustering techniques can be used to connect people with similar competencies and that the predicted clusters agree fairly well with the gold-standard data from the co-workers. However, the results depend strongly on the choice of algorithm and parameter values, which therefore have to be chosen carefully. Data analytics data clustering k-modes DBSCAN ROCK Computer and Information Sciences
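The Dunn index named above rewards compact, well-separated clusters: it divides the smallest distance between clusters by the largest cluster diameter. A minimal sketch, assuming each co-worker's competence profile is a tuple of categorical values compared with the simple-matching (Hamming) dissimilarity that k-modes relies on; the profiles below are made-up illustrations, not Combitech data:

```python
from itertools import combinations
import math


def hamming(a, b):
    """Simple-matching dissimilarity: number of attributes on which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def dunn_index(profiles, labels, dist=hamming):
    """Dunn index = smallest inter-cluster distance / largest intra-cluster diameter.

    Higher values mean compact, well-separated clusters. Needs at least two clusters.
    """
    clusters = {}
    for profile, label in zip(profiles, labels):
        clusters.setdefault(label, []).append(profile)
    groups = list(clusters.values())
    # Largest diameter: the biggest pairwise distance inside any single cluster.
    max_diameter = max(
        (dist(a, b) for g in groups for a, b in combinations(g, 2)), default=0
    )
    # Separation: the smallest distance between two profiles in different clusters.
    min_separation = min(
        dist(a, b) for g1, g2 in combinations(groups, 2) for a in g1 for b in g2
    )
    return math.inf if max_diameter == 0 else min_separation / max_diameter


# Hypothetical profiles: (main language, seniority, certified?)
profiles = [
    ("java", "senior", "yes"),
    ("java", "senior", "no"),
    ("python", "junior", "no"),
    ("python", "junior", "yes"),
]
print(dunn_index(profiles, [0, 0, 1, 1]))  # 2.0
```

Swapping the labels to `[0, 1, 0, 1]` mixes the two groups and drops the index to 1/3, which is the kind of contrast used when tuning k or other parameters.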
22	Optimized 3D Reconstruction for Infrastructure Inspection with Automated Structure from Motion and Machine Learning Methods Arce Munoz, Samuel 09 June 2020 Infrastructure monitoring is being transformed by advances in remote sensing, unmanned vehicles, and information technology. The extensive interaction among these fields and the availability of reliable commercial technology are helping to pioneer intelligent inspection methods based on digital 3D models. Commercially available unmanned aerial vehicles (UAVs) have been used to create 3D photogrammetric models of industrial equipment. However, the level of automation of these missions remains low. Limited flight time, wireless transfer of large files, and the lack of algorithms to guide a UAV through unknown environments are some of the factors that constrain fully automated UAV inspections. This work demonstrates the use of unsupervised machine learning methods to develop an algorithm capable of constructing a 3D model of an unknown environment autonomously and iteratively. The capabilities of this novel approach are tested in a field study, in which a municipal water tank is mapped to a resolution comparable to that of manual missions flown by experienced engineers but using 63% … The iterative approach also shows improvements in autonomy and model coverage compared with reproducible automated flights. Additionally, the use of this algorithm on different terrains is explored through simulation software, demonstrating the effectiveness of the automated iterative approach in other applications. structure from motion machine learning DBSCAN principal components analysis UAV infrastructure monitoring Engineering
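The abstract lists principal components analysis among its keywords without saying how it is used. One plausible use, assumed here rather than taken from the thesis, is estimating the dominant axis of a reconstructed point cloud (for a water tank, its vertical axis) so that the next flight pass can be oriented around it. This sketch finds the first principal component by power iteration on the 3x3 covariance matrix, using only the standard library; the cylinder data are synthetic:

```python
import math


def principal_axis(points, iters=200):
    """Mean and first principal component (unit vector) of a list of (x, y, z) points.

    The component is found by power iteration on the 3x3 sample covariance matrix.
    Its sign is arbitrary, as for any eigenvector.
    """
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    centred = [[p[i] - mean[i] for i in range(3)] for p in points]
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(3)]
        for i in range(3)
    ]
    # Start vector chosen to be unlikely to be orthogonal to the dominant axis.
    v = [1.0, 0.5, 0.25]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


# Hypothetical cloud: a cylinder of radius 1 and height 19 sampled on 8 vertical lines.
cloud = [
    (math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8), float(z))
    for k in range(8)
    for z in range(20)
]
mean, axis = principal_axis(cloud)
print([round(a, 6) for a in axis])  # close to the z axis
```

Power iteration is used here only to keep the example dependency-free; a library eigen-decomposition would return all three components at once.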
23	Automatic Physical Cell Identity Planning using Machine Learning Manda, Bala Naga Sai Venkata Bharath, Yama, Manideep January 2022 Background: Growing communication needs create a higher demand for data and seamless services for users. A unique physical cell identity (PCI) is assigned to transfer data between the cellular base station (gNB) and the user equipment (UE), and is used to transmit data to multiple users simultaneously. In this thesis, a heuristic algorithm aided by an unsupervised machine learning approach is developed to improve the PCI allocation of cells for better 5G services such as connectivity and speed. Objectives: First, to perform a literature review to find appropriate performance metrics for comparing the K-means and density-based spatial clustering of applications with noise (DBSCAN) techniques on PCI allocation data provided by Ericsson. Next, to implement the better clustering method together with a heuristic algorithm to generate an efficient PCI planning. Finally, to compare the previous planning (the existing PCI planning approach) and the proposed planning (the results of the new heuristic algorithm) against the ideal planning derived by experts. Methods: A literature review is conducted to determine the best metrics for the clustering algorithms named in the objectives. Using unsupervised learning, the PCI allocation data is clustered based on distance and neighbors, and the clusters are then used in the heuristic algorithm. The results of the proposed planning are compared with the previous planning. Results: The literature review indicated that the silhouette coefficient and the Davies-Bouldin index are the most suitable metrics for comparing the clustering algorithms, and these two metrics are used to determine the best-performing one. The clustering results were given as input to the heuristic algorithm to generate a PCI planning.
The results show that the proposed planning outperforms the previous planning, reducing collisions by nearly 70% in the areas of Fresno, San Francisco, and San Jose. Conclusions: The main goal of this study is to achieve a better PCI planning that can accommodate many users and deliver better 5G services. This PCI planning helps the company use its resources efficiently. Physical Cell Identity 3GPP Machine Learning 5G DBSCAN Computer Sciences
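The silhouette coefficient used above compares how close each point is to its own cluster (a) with how close it is to the nearest other cluster (b), giving s = (b - a) / max(a, b) in [-1, 1]. A brute-force sketch, assuming the items being clustered (e.g., cell sites) are numeric coordinate tuples; the thesis's actual feature set is not given in the abstract, and the points below are illustrative:

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points; requires at least two clusters.

    a: mean distance from a point to the other members of its own cluster.
    b: smallest mean distance from that point to the members of any other cluster.
    Points in singleton clusters get s = 0, following the usual convention.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, p in enumerate(points):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


sites = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette_score(sites, [0, 0, 1, 1]), 3))  # well separated: about 0.9
print(round(silhouette_score(sites, [0, 1, 0, 1]), 3))  # mixed up: negative
```

Higher is better for the silhouette score, whereas lower is better for the Davies-Bouldin index, so the two metrics are read in opposite directions when ranking K-means against DBSCAN.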
24	Generating fishing boats behaviour based on historic AIS data: A method to generate maritime trajectories based on historic positional data / Genering av fiskebåtsbeteende baserat på historisk AIS dat Bergman, Oscar January 2022 This thesis describes a method to generate new trajectories based on historic position data for a given geographical area. The thesis uses AIS data from fishing boats and first describes a method that uses the DBSCAN and OPTICS algorithms to group the data into clusters based on the routes the boats travel and the areas where they fish. Bayesian optimization is used to search for parameters for the clustering algorithms. In this scenario DBSCAN was shown to be better on all measures, although there are many respects in which OPTICS could become better if it were modified somewhat. This is followed by a method that turns the clusters into a node network, which can then be traversed by a path-finding algorithm combined with internal rules to generate new routes that can be used in simulations to give a sufficiently realistic situation picture. Finally, a method to evaluate the generated routes is described and used to compare the routes with each other. Clustering Clustering algorithms AIS OPTICS DBSCAN AI fishing boats Computer Systems
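DBSCAN, used here and in several other entries, grows clusters from core points that have at least `min_samples` neighbours within distance `eps`, attaches non-core "border" points reachable from a core point, and labels everything else as noise. A minimal O(n^2) sketch, assuming positions are already projected to planar coordinates (raw latitude/longitude would need a great-circle distance instead); `eps` and `min_samples` are the parameters that a search such as the Bayesian optimization mentioned above would tune. OPTICS is not shown:

```python
import math
from collections import deque

NOISE = -1


def dbscan(points, eps, min_samples):
    """Minimal O(n^2) DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # Precompute the eps-neighbourhood of every point (each point counts itself).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: grow a new cluster from it by breadth-first expansion.
        labels[i] = cluster
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # only core points expand the cluster
        cluster += 1
    return labels


# Hypothetical projected positions: two fishing grounds and one stray report.
positions = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(positions, eps=1.5, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```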
25	Geodynamic Modeling Applied to Venus Euen, Grant Thomas 23 May 2023 Modern geodynamic modeling is more complex than ever, and has been used to answer questions about Earth pertaining to the dynamics of the convecting mantle and core, layers humans have never directly interacted with. While the insights gleaned from these models are undeniable, it is important to ensure that the calculations are understood and behave correctly according to known math and physics. Here I perform several thermal 3-D spherical shell tests using the geodynamic code ASPECT and compare the results against the legacy code CitcomS. I find that the two codes match to within 1.0% across a number of parameters. Geodynamic modeling has also traditionally been applied to expand our understanding of Earth; however, even where data are scarce, modern methods can provide insight into other planetary bodies. I use machine learning to show that coronae, circular features on the surface of the planet Venus, are not randomly distributed. I suggest that coronae are fed by secondary mantle plumes in connected clusters. The Venusian surface as a whole is also poorly understood, with a large percentage being topographically smooth and much younger than the planet's hypothesized age. I use modeling to test the hypothesis that a large impact was responsible for a major resurfacing event in Venus's history, and find three distinct scenarios following impact: relatively little change; some localized change evolving into resurfacing over geologic time; or large-scale overturn and injection of heat deep into the Venusian mantle. / Doctor of Philosophy / Modern geodynamic modeling has been used to answer questions about Earth in wide-ranging fields. Despite technological improvements, it is important to ensure that the calculations are understood and behave correctly. Here I perform several tests using a code called ASPECT and compare the results against another code, CitcomS.
I find that the two codes are in good agreement. These techniques have also traditionally been applied to Earth, but modern methods can provide insight into other planets or moons as well. Coronae are circular features on the surface of Venus that are poorly understood. I use machine learning to show that they are not randomly distributed, and suggest a mechanism for the formation of clusters of coronae. The surface of Venus is also strange: it is both too flat and too young based on current ideas in planetary science. I use modeling to test whether a large impact could have produced the features of Venus's surface we see today. Mantle Convection ASPECT Spherical Shell Venus Corona Clustering DBSCAN Stagnant Lid Impact
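To test whether coronae cluster with DBSCAN, distances between features on a planet should be measured along the surface rather than on a flat latitude/longitude grid, where degrees of longitude shrink toward the poles. The abstract does not describe the actual pipeline; this sketch assumes a haversine great-circle distance on a sphere with Venus's mean radius of about 6051.8 km, which could serve as the distance function in a DBSCAN implementation so that `eps` is expressed in kilometres:

```python
import math

VENUS_RADIUS_KM = 6051.8  # mean radius of Venus


def great_circle_km(lat1, lon1, lat2, lon2, radius=VENUS_RADIUS_KM):
    """Haversine great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


# Antipodal points on the equator are half a circumference apart (pi * R).
print(round(great_circle_km(0, 0, 0, 180), 1))
```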
26	ALGORITMOS DE CLUSTERING PARALELOS EN SISTEMAS DE RECUPERACIÓN DE INFORMACIÓN DISTRIBUIDOS (Parallel clustering algorithms in distributed information retrieval systems) Jiménez González, Daniel 20 July 2011 Information is useful if it is available when it is needed and can be put to use. Availability is usually easy to achieve when the information is well structured and ordered and, in addition, not very large. But this is not the most common situation: the amount of information on offer increasingly tends to grow disproportionately, to be unstructured, and to lack a clear order. Structuring or ordering it manually is infeasible because of the size of the information to be handled. All of this makes clear the usefulness, and even the necessity, of good information retrieval systems (IRSs). Another important characteristic is that information naturally tends to be distributed, which implies the need for IRSs that can work in distributed environments and with parallelization techniques. This thesis addresses all of these aspects by developing and improving methods that yield IRSs with better performance, both in retrieval quality and in computational efficiency, and that can also work within already-distributed systems. The main objective of an IRS is to return relevant documents and omit those considered irrelevant to a given query. Some of the most notable problems of IRSs are polysemy and synonymy; related words (words that have one meaning together and another separately); the enormous volume of information to be handled; and the heterogeneity of the documents. Of these, this thesis focuses on polysemy and synonymy, related words (indirectly, through semantic lemmatization), and the enormous volume of information to be handled.
Developing an IRS basically comprises four distinct phases: preprocessing, modeling, evaluation, and use. Preprocessing, which comprises the actions needed to transform the documents in the collection into a data structure holding their relevant information, has been an important part of this thesis. In this phase we focused on reducing the data and structures to be handled while maximizing the information they contain. Modeling, the phase most extensively analyzed and developed in this thesis, defines the structure and behavior of the IRS. / Jiménez González, D. (2011). ALGORITMOS DE CLUSTERING PARALELOS EN SISTEMAS DE RECUPERACIÓN DE INFORMACIÓN DISTRIBUIDOS [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/11234 Parallel Distributed Information retrieval Clustering DBSCAN Parallelism Bisecting K-means VDBSCAN Heuristics
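Bisecting k-means, listed among the keywords, builds k clusters by repeatedly splitting one cluster in two with 2-means. A sequential sketch (the thesis's parallel, distributed version is not reproduced), assuming documents are already numeric vectors compared with Euclidean distance; IR systems often use cosine similarity on TF-IDF vectors instead. Splitting the largest cluster and seeding each split with two mutually distant points are choices made here for determinism, not details from the thesis:

```python
import math


def _mean(vectors):
    return tuple(sum(coords) / len(vectors) for coords in zip(*vectors))


def _two_means(vectors, iters=20):
    """Split one cluster in two with Lloyd's k-means (k = 2).

    Seeds: the point farthest from the first vector, then the point farthest from that.
    """
    a = max(vectors, key=lambda v: math.dist(v, vectors[0]))
    b = max(vectors, key=lambda v: math.dist(v, a))
    centres = [a, b]
    for _ in range(iters):
        halves = ([], [])
        for v in vectors:
            nearer = 0 if math.dist(v, centres[0]) <= math.dist(v, centres[1]) else 1
            halves[nearer].append(v)
        if not halves[0] or not halves[1]:
            break  # degenerate split, e.g. all vectors identical
        centres = [_mean(halves[0]), _mean(halves[1])]
    return halves


def bisecting_kmeans(vectors, k):
    """Repeatedly bisect the largest cluster until k clusters exist (or none can be split)."""
    clusters = [list(vectors)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        if len(clusters[idx]) < 2:
            break
        left, right = _two_means(clusters[idx])
        if not left or not right:
            break
        clusters.pop(idx)
        clusters.extend([left, right])
    return clusters


# Hypothetical 2-D document vectors forming three groups.
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
for c in bisecting_kmeans(docs, 3):
    print(sorted(c))
```

Because each split only touches one cluster, the splits of different clusters are independent, which is one reason the method lends itself to parallel execution.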
27	Modeling species geographic distributions in aquatic ecosystems using a density-based clustering algorithm Castaneda Guzman, Mariana 13 September 2022 Distributional ecology is a branch of ecology which aims to reconstruct and predict the geographic range of free-living and symbiotic organisms in terrestrial and aquatic ecosystems. More recently, distributional ecology has been used to map disease transmission risk. The application of distributional ecology to disease transmission has, however, often been flawed, and an inaccurate representation of disease distribution is detrimental to effective control and prevention. Furthermore, ecological niche modeling experiments are generally developed and tested using data from terrestrial organisms, neglecting aquatic organisms in case studies. Both disease and aquatic systems are often data-limited, and current modeling methods are often insufficient. There is, therefore, a need to develop data-driven models that perform accurately even when only limited amounts of data are available or when there is little to no knowledge of the natural history of the species to be modeled. Here, I propose a data-driven ecological niche modeling method that requires presence-only data (i.e., absence, pseudoabsence, or background records are not needed for model calibration). My method is expected to reconstruct the environmental conditions where data-limited aquatic organisms are more likely to be present, using a density-based clustering algorithm as a proxy of the realized niche (i.e., the abiotic and biotic environmental conditions occupied by the organism). Supported by ecological theory and methods, my central hypothesis is that because density-based clustering machine-learning modeling prevents extrapolation and interpolation, it can robustly reconstruct the realized niche of a data-limited aquatic organism.
First, I assembled a comprehensive dataset of abiotic (temperature) and biotic (phytoplankton) environmental conditions and presence reports for Vibrio cholerae, a well-understood aquatic bacterium found in coastal waters worldwide (Chapter 2). Second, using V. cholerae as a model system, I developed detailed parameterizations of density-based clustering models to determine the parameter values best able to reconstruct and predict the species' distribution in global seawaters (Chapter 3). Finally, I compared the performance of density-based clustering modeling against traditional, correlative machine-learning ecological niche modeling methods (Chapter 4). When assessed on model fit and prediction, density-based clustering models performed comparably to the traditional 'data-hungry' correlative machine-learning methods used in modern applications of ecological niche modeling. Modeling the environmental and geographic ranges of V. cholerae, an aquatic organism with both free-living and parasitic ecologies, is itself a novel approach in distributional ecology. Applying ecological niche modeling to pathogens such as V. cholerae provides an opportunity to further our knowledge of directly transmitted emerging diseases for which only limited data are available. Density-based clustering ecological niche modeling is termed Marble here, honoring a previous, experimental version of this analytical approach, and is expected to provide new opportunities to understand how the choice of ecological niche modeling method influences estimates of the distribution of data-limited organisms with complex ecologies. These lessons apply to novel, rare, and cryptic aquatic organisms, such as emerging diseases, endangered fishes, and elusive aquatic species. / Master of Science / Distributional ecology is a branch of ecology which aims to reconstruct and predict the geographic distribution of land and water organisms.
In the case of diseases, a correct representation of their geographic distributions is key for successful management. Previous studies highlight the need to develop new models that perform accurately even when limited amounts of data are available and there is little to no knowledge of the organisms' ecology. This thesis proposes a data-driven method, originally termed Marble. Marble is expected to help reconstruct environmental conditions where data-limited aquatic organisms are more likely to be found. Supported by ecological theories and methods, my hypothesis is that because Marble prevents under- and over-fitting, this method will produce results which better fit the data. Using V. cholerae, an aquatic organism, as a model system, I compared the performance of Marble against other traditional modeling algorithms. I found that Marble, in terms of model fit, performed similarly to traditional methods used in distributional ecology. Modeling the ecology of V. cholerae is a new approach in and of itself in ecological modeling. Furthermore, modeling pathogens provides an opportunity to further the knowledge of directly transmitted diseases, and Marble is expected to provide opportunities to understand how algorithm selection can reconstruct (or not) the distribution of data-limited aquatic organisms of diverse ecologies. Ecological niche modeling Vibrio cholerae climate change infectious disease remote sensing suitability DBSCAN
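The core idea of a density-based, presence-only niche model can be illustrated with DBSCAN's core-point rule applied in environmental space. This is a hypothetical sketch, not the Marble method itself: presence records with enough neighbours within `eps` define the niche, and a candidate condition counts as suitable only if it lies within `eps` of one of them, so isolated records are treated as noise and nothing is extrapolated beyond sampled conditions. The two variables (sea-surface temperature in °C and chlorophyll-a as a phytoplankton proxy) and all values are illustrative assumptions; in practice the variables would be standardized so that a single `eps` fits both:

```python
import math


def core_records(presences, eps, min_samples):
    """Presence records with at least min_samples records (itself included) within eps:
    DBSCAN's definition of a core point, here in environmental space."""
    return [
        p for p in presences
        if sum(math.dist(p, q) <= eps for q in presences) >= min_samples
    ]


def is_suitable(env, cores, eps):
    """A candidate condition is inside the reconstructed niche if it lies within eps of
    some core record (the DBSCAN border-point rule); no extrapolation beyond that."""
    return any(math.dist(env, c) <= eps for c in cores)


# Hypothetical presence records: (temperature in degrees C, chlorophyll-a in mg/m^3).
presences = [(25.0, 1.0), (25.5, 1.2), (26.0, 1.1), (24.8, 0.9), (12.0, 5.0)]
cores = core_records(presences, eps=1.0, min_samples=3)
print(is_suitable((25.2, 1.0), cores, eps=1.0))  # True: inside the dense cluster
print(is_suitable((12.0, 5.0), cores, eps=1.0))  # False: isolated record treated as noise
```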
28	Anomaly Detection in Time Series Data using Unsupervised Machine Learning Methods: A Clustering-Based Approach / Anomalidetektering av tidsseriedata med hjälp av oövervakad maskininlärningsmetoder: En klusterbaserad tillvägagångssätt Hanna, Peter, Swartling, Erik January 2020 For many companies in the manufacturing industry, finding damage in their products is a vital process, especially during the production phase. Since machine learning techniques can further aid damage identification, they have become a popular choice among companies seeking to enhance the production process even further. In some industries, damage identification is closely linked to anomaly detection in various measurements. In this thesis, the aim is to construct unsupervised machine learning models that identify anomalies in unlabeled measurements of pumps, using high-frequency sampled current and voltage time series data. Each measurement can be split into five phases: the startup phase, three duty point phases, and the shutdown phase. The approach is based on clustering methods, and the main algorithms used are the density-based algorithms DBSCAN and LOF. Dimensionality reduction techniques, such as feature extraction and feature selection, are applied to the data, and after one model is constructed for each of the five phases, the models are seen to identify anomalies in the given data set. Anomaly detection unsupervised machine learning high frequency sampled time series clustering dimensionality reduction DBSCAN LOF Probability Theory and Statistics
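LOF flags a point whose local density is much lower than that of its k nearest neighbours: scores near 1 mean "as dense as the neighbourhood", and scores well above 1 suggest an anomaly. A brute-force sketch, assuming each pump measurement phase has already been reduced to a numeric feature vector (the thesis's extracted features are not listed in the abstract); it also assumes no duplicate points, which would make a reachability sum zero:

```python
import math


def lof_scores(points, k):
    """Local outlier factor for each point (brute force, O(n^2))."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself.
    # (Simplification: ties at the k-distance are not added to the neighbourhood.)
    knn = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    k_distance = [dist[i][knn[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of i from neighbour j: never below j's k-distance.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance to the neighbours.
    lrd = [len(knn[i]) / sum(reach_dist(i, j) for j in knn[i]) for i in range(n)]
    # LOF: average neighbour density divided by the point's own density.
    return [sum(lrd[j] for j in knn[i]) / (len(knn[i]) * lrd[i]) for i in range(n)]


# Hypothetical 2-D feature vectors: four normal measurements and one outlier.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print([round(s, 2) for s in lof_scores(features, k=2)])
```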
29	Unsupervised Anomaly Detection on Time Series Data: An Implementation on Electricity Consumption Series / Oövervakad anomalidetektion i tidsseriedata: en implementation på elförbrukningsserier Lindroth Henriksson, Amelia January 2021 Digitization of the energy industry, the introduction of smart grids, and increasing regulation of electricity consumption metering have resulted in vast amounts of electricity data. This data presents a unique opportunity to understand electricity usage and make it more efficient, reducing electricity consumption and carbon emissions. An important initial step in analyzing the data is to identify anomalies. In this thesis, anomaly detection in electricity consumption series is addressed using four machine learning methods: density-based spatial clustering of applications with noise (DBSCAN), local outlier factor (LOF), isolation forest (iForest), and one-class support vector machine (OC-SVM). To evaluate the methods, synthetic anomalies were introduced into the electricity consumption series, and the methods were then evaluated on two anomaly types: point anomalies and collective anomalies. In addition to electricity consumption data, features describing prior consumption, outdoor temperature, and date-time properties were included in the models. The results indicate that adding the temperature and lag features generally impaired anomaly detection performance, while including the date-time features improved it. Of the four methods, OC-SVM performed best at detecting point anomalies, while LOF performed best at detecting collective anomalies. In an attempt to improve the models' detection power, the electricity consumption series were de-trended and de-seasonalized and the same experiments were repeated. The models did not perform better on the decomposed series than on the non-decomposed ones.
Unsupervised learning machine learning anomaly detection time series electricity consumption synthetic anomalies DBSCAN LOF iForest OC-SVM Mathematics
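The evaluation above depends on injecting labelled synthetic anomalies into real series. The abstract does not say how they were generated; this sketch assumes one common scheme: a point anomaly scales a single value, and a collective anomaly flattens a run of consecutive values to their mean, so each value looks ordinary but the run does not. The function name, parameters, and the toy series are hypothetical:

```python
import random


def inject_anomalies(series, n_point, n_collective, length, scale, seed=0):
    """Return a modified copy of series plus 0/1 ground-truth labels.

    Point anomaly: one value multiplied by `scale`.
    Collective anomaly: `length` consecutive values replaced by their mean.
    The input series is not modified.
    """
    rng = random.Random(seed)
    out = list(series)
    labels = [0] * len(out)
    for _ in range(n_point):
        i = rng.randrange(len(out))
        out[i] *= scale
        labels[i] = 1
    for _ in range(n_collective):
        start = rng.randrange(len(out) - length + 1)
        flat = sum(out[start:start + length]) / length
        for i in range(start, start + length):
            out[i] = flat
            labels[i] = 1
    return out, labels


# Hypothetical hourly consumption with a daily cycle, ten days long.
hourly = [10 + (h % 24) for h in range(240)]
noisy, truth = inject_anomalies(hourly, n_point=3, n_collective=1, length=6, scale=3.0)
print(sum(truth), "labelled anomalous hours")
```

Keeping the labels alongside the modified series is what makes it possible to score unsupervised detectors such as DBSCAN, LOF, iForest, and OC-SVM separately on each anomaly type.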
30	Deinterleaving pulse trains with DBSCAN and FART Mahmod, Shad January 2019 Studying radar pulses and looking for certain patterns is critical for assessing the threat level of the environment around an antenna. To study the electromagnetic pulses emitted by a given radar, one must first register and identify these pulses. Usually there are several active transmitters in an environment, and an antenna will register pulses from various sources. To study the different pulse trains, the registered pulses first have to be sorted so that all pulses transmitted from one source are grouped together. This project aims to solve this problem using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and to compare the results with those obtained by Fuzzy Adaptive Resonance Theory (FART). We aim to examine these methods further and map out how factors such as feature selection and training time affect the results. A solution based on the DBSCAN method is proposed that allows online clustering of new points introduced to the system. The methods are implemented and tested on simulated data consisting of pulse trains from simulated transmitters with unique behaviors. The deployed methods are then tested by varying the parameters of the models as well as the number of pulse trains they are asked to deinterleave. The results are evaluated using the adjusted Rand index (ARI). They indicate that in most cases using all available data (here, the angle of arrival, radio frequency, pulse width, and amplitudes of the pulses) generates the best results. Rescaling the data further improves the results, and tuning the parameters shows that the models work well as the number of emitters increases. The results also indicate that the DBSCAN method can be used to obtain accurate estimates of the number of transmitting emitters.
The online DBSCAN achieves a higher ARI than FART on the simulated data set but has a higher worst-case computational cost. DBSCAN FART Fuzzy Adaptive Resonance Theory Radar Clustering Deinterleaving Machine Learning Other Computer and Information Science
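The adjusted Rand index compares a predicted clustering with the true emitter assignment by counting pairs of pulses, correcting for agreement expected by chance: 1 means a perfect match (up to renaming of clusters) and values near 0 mean chance level. A standard-library sketch; if DBSCAN's noise label (-1) is passed in, it is treated here as one more cluster, which is an assumption about how noise is scored:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI = (Index - Expected) / (Max - Expected), from the pair-counting contingency table."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: identical trivial partitions
    return (sum_cells - expected) / (max_index - expected)


# True emitter per pulse vs. cluster labels from a hypothetical deinterleaver.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same grouping, renamed
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # about 0.571: one emitter split
```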
