Global ETD Search

271	Early diagnosis and personalised treatment focusing on synthetic data modelling: Novel visual learning approach in healthcare Mahmoud, Ahsanullah Y., Neagu, Daniel, Scrimieri, Daniele, Abdullatif, Amr R.A. 09 August 2023 (has links) Yes / The early diagnosis and personalised treatment of diseases are facilitated by machine learning. The quality of data has an impact on diagnosis because medical data are usually sparse, imbalanced, and contain irrelevant attributes, resulting in suboptimal diagnosis. To address the impacts of data challenges, improve resource allocation, and achieve better health outcomes, a novel visual learning approach is proposed. This study contributes to the visual learning approach by determining whether less or more synthetic data are required to improve the quality of a dataset, such as the number of observations and features, according to the intended personalised treatment and early diagnosis. In addition, numerous visualisation experiments are conducted, using statistical characteristics, cumulative sums, histograms, correlation matrices, root mean square error, and principal component analysis, in order to visualise both original and synthetic data and address the data challenges. Real medical datasets for cancer, heart disease, diabetes, cryotherapy and immunotherapy are selected as case studies. Several models, such as k-Nearest Neighbours and Random Forest, are implemented as a benchmark and point of comparison for classification in terms of metrics such as accuracy, sensitivity, and specificity. To simulate algorithm implementation and data, a Generative Adversarial Network is used to create and manipulate synthetic data, whilst Random Forest is implemented to classify the data. An amendable and adaptable system is constructed by combining the Generative Adversarial Network and Random Forest models. The system model presents working steps, overview and flowchart. 
Experiments reveal that the majority of data-enhancement scenarios allow for the application of visual learning in the first stage of data analysis as a novel approach. To achieve meaningful adaptable synergy between appropriate quality data and optimal classification performance while maintaining statistical characteristics, visual learning provides researchers and practitioners with practical human-in-the-loop machine learning visualisation tools. Prior to implementing algorithms, the visual learning approach can be used to actualise early and personalised diagnosis. For the immunotherapy data, the Random Forest performed best, with precision, recall, f-measure, accuracy, sensitivity, and specificity of 81%, 82%, 81%, 88%, 95%, and 60% on the original data, as opposed to 91%, 96%, 93%, 93%, 96%, and 73% on synthetic data, respectively. Future studies might examine the optimal strategies to balance the quantity and quality of medical data. Personalised and early diagnosis Machine learning Imbalanced UCI data Generative Adversarial Network Random Forest Synthetic data Visualisations Healthcare
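The classification measures quoted in the entry above (precision, recall, F-measure, accuracy, sensitivity, specificity) all derive from a binary confusion matrix. A minimal Python sketch of those definitions follows; the function name `binary_metrics` and the counts are hypothetical, chosen only for illustration and not taken from the study. For a single positive class, recall and sensitivity are the same quantity; the abstract reports them separately, so it presumably averages some measures across classes, which this sketch does not attempt.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return common binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one actual and one
    predicted example of each class).
    """
    precision = tp / (tp + fp)           # share of predicted positives that are correct
    sensitivity = tp / (tp + fn)         # share of actual positives found (recall)
    specificity = tn / (tn + fp)         # share of actual negatives found
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,
        "recall": sensitivity,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only (not from the study).
metrics = binary_metrics(tp=40, fp=20, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A large gap between sensitivity and specificity, as in the figures reported for the original immunotherapy data, is typical of imbalanced classes, which is why accuracy alone is rarely sufficient.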
272	Predictive Study of Flame status inside a combustor of a gas turbine using binary classification Sasikumar, Sreenand January 2022 (has links) Quick and accurate detection of flame inside a gas turbine is crucial to mitigate risks in power generation. Failure of flame detection increases downtime and maintenance costs, and on rare occasions it may cause explosions due to a buildup of unburnt fuel inside the combustion chamber. The aim of this thesis is to investigate the applicability of machine learning methods to detect the presence of flame within a gas turbine. Traditionally, this is done using an optical flame detector, which converts the infrared radiation to a differential reading, which is further converted to a digital signal for the control system and gives the flame status (1 for flame ON and 0 for flame OFF). The primary purpose of this alternative flame detection method is to reduce the instrument cost per gas turbine. A machine learning model is trained with the data collected over several runs of the turbine engine and estimates whether a flame is present, to decide if the machine should be ON or OFF. To reduce the instrumentation cost, the presented flame prediction method based on deep learning methods is employed, which takes standard data such as dynamic pressure and temperature values as input. These variables are observed to have a high correlation with the flame status. The pressure is measured using a piezocryst sensor and the temperature is measured using a thermocouple. 
A study is performed by training several machine learning models and determining which of them works best on this data. Logistic Regression is used as a baseline and is compared with other models such as KNN, SVM, Naïve Bayes, Random Forest and XGBoost, each trained with the data collected over several runs of the turbine and tested on predicting the flame status inside the gas turbine. It was observed that KNN and Random Forest performed exceptionally well compared to the baseline model. It is recorded that the minimum time for estimation of the flame status by the machine is 0.6 seconds, and if the implemented model can give a high accuracy within the same time then the proposed method can be an effective alternative flame detection method. Binary Classification Random Forest SVM XGBoost Logistic Regression Extra-trees Gas Turbine Flame Detection Probability Theory and Statistics Sannolikhetsteori och statistik
273	Interpretation, Identification and Reuse of Models. Theory and algorithms with applications in predictive toxicology. Palczewska, Anna Maria January 2014 (has links) This thesis is concerned with developing methodologies that enable existing models to be effectively reused. Results of this thesis are presented in the framework of Quantitative Structural-Activity Relationship (QSAR) models, but their application is much more general. QSAR models relate chemical structures with their biological, chemical or environmental activity. There are many applications that offer an environment to build and store predictive models. Unfortunately, they do not provide advanced functionalities that allow for efficient model selection and for interpretation of model predictions for new data. This thesis aims to address these issues and proposes methodologies for dealing with three research problems: model governance (management), model identification (selection), and interpretation of model predictions. The combination of these methodologies can be employed to build more efficient systems for model reuse in QSAR modelling and other areas. The first part of this study investigates toxicity data and model formats and reviews some of the existing toxicity systems in the context of model development and reuse. Based on the findings of this review and the principles of data governance, a novel concept of model governance is defined. Model governance comprises model representation and model governance processes. These processes are designed and presented in the context of model management. As an application, minimum information requirements and an XML representation for QSAR models are proposed. Once a collection of validated, accepted and well annotated models is available within a model governance framework, they can be applied to new data. It may happen that there is more than one model available for the same endpoint. Which one to choose? 
The second part of this thesis proposes a theoretical framework and algorithms that enable automated identification of the most reliable model for new data from the collection of existing models. The main idea is based on partitioning the search space into groups and assigning a single model to each group. The construction of this partitioning is difficult because it is a bi-criteria problem. The main contribution in this part is the application of Pareto points to the search space partition. The proposed methodology is applied to three endpoints in chemoinformatics and predictive toxicology. After having identified a model for the new data, we would like to know how the model obtained its prediction and how trustworthy it is. An interpretation of model predictions is straightforward for linear models thanks to the availability of model parameters and their statistical significance. For non-linear models this information can be hidden inside the model structure. This thesis proposes an approach for interpretation of a random forest classification model. This approach allows for the determination of the influence (called feature contribution) of each variable on the model prediction for an individual data instance. Three methods are proposed in this part that allow analysis of feature contributions. Such analysis might lead to the discovery of new patterns that represent a standard behaviour of the model and allow additional assessment of the model reliability for new data. The application of these methods to two standard benchmark datasets from the UCI machine learning repository shows the great potential of this methodology. The algorithm for calculating feature contributions has been implemented and is available as an R package called rfFC. / BBSRC and Syngenta (International Research Centre at Jealott’s Hill, Bracknell, UK).
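The abstract above does not specify the two criteria or the partitioning algorithm, so the following is only a generic illustration of what Pareto points are in a bi-criteria problem: the candidates that no other candidate beats on both criteria at once. The function name `pareto_points` and the sample values are hypothetical, and both criteria are assumed to be minimised.

```python
def pareto_points(points):
    """Return the points not dominated by any other point.

    Each point is a pair (criterion_1, criterion_2); both are minimised.
    A point q dominates p when q is no worse on both criteria and
    strictly better on at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical candidate partitions scored on two criteria (illustration only).
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_points(candidates)
print(front)
```

This quadratic scan is adequate for small candidate sets; sorting on one criterion first would give a faster variant for larger ones.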
274	Crash Risk Analysis of Coordinated Signalized Intersections Qiming Guo (17582769) 08 December 2023 (has links) The emergence of time-dependent data provides researchers with unparalleled opportunities to investigate disaggregated levels of safety performance on roadway infrastructures. A disaggregated crash risk analysis uses both time-dependent data (e.g., hourly traffic, speed, weather conditions and signal controls) and fixed data (e.g., geometry) to estimate hourly crash probability. Despite abundant research on crash risk analysis, coordinated signalized intersections continue to require further investigation due to both the complexity of the safety problem and the relatively small number of past studies that investigated the risk factors of coordinated signalized intersections. This dissertation aimed to develop robust crash risk prediction models to better understand the risk factors of coordinated signalized intersections and to identify practical safety countermeasures. The crashes were first categorized into three types (same-direction, opposite-direction, and right-angle) within several crash-generating scenarios. The data needed were organized in hourly observations and included the following factors: road geometric features, traffic movement volumes, speeds, weather precipitation and temperature, and signal control settings. Assembling hourly observations for modeling crash risk was achieved by synchronizing and linking data sources organized at different time resolutions. Three different non-crash sampling strategies were applied to three statistical models (Conditional Logit, Firth Logit, and Mixed Logit) and two machine learning models (Random Forest and Penalized Support Vector Machine). Important risk factors, such as the presence of light rain, traffic volume, speed variability, and the downstream vehicle arrival pattern, were identified. 
The Firth Logit model was selected for implementation in signal coordination practice. This model turned out to be the most robust based on its out-of-sample prediction performance and its inclusion of important risk factors. Example applications of the recommended crash risk model to building daily risk profiles and estimating the safety benefits of improved coordination plans demonstrated the model’s practicality and usefulness to practicing engineers in improving safety at coordinated signals. Transport engineering Crash Risk Time-dependent Data Conditional Logit Firth Logit Mixed Logit Random Forest Penalized SVM Coordinated Signalized Intersections
275	Identifying the beginning of a kayak race using velocity signal data Kvedaraite, Indre January 2023 (has links) A kayak is a small watercraft that moves over the water. The kayak is propelled by a person sitting inside the hull and paddling with a double-bladed paddle. While kayaking can be casual, it is also a competitive sport featured in races and even the Olympic Games. Therefore, it is important to be able to analyse athletes’ performance during the race. To study the race better, some kayaking teams and organizations have attached sensors to their kayaks. These sensors record various data, which are later used to generate performance reports. However, to generate such reports, the coach must manually pinpoint the beginning of the race because the sensors collect data before the actual race begins, which may include practice runs, warm-up sessions, or simply waiting in position. Identifying the race start and the race sequence in the data is tedious and time-consuming work that could be automated. This project proposes an approach to identify kayak races from velocity signal data with the help of a machine learning algorithm. The proposed approach is a combination of several techniques: signal preprocessing, a machine learning algorithm, and a programmatic approach. Three machine learning algorithms were evaluated to detect the race sequence, namely Support Vector Machine (SVM), k-Nearest Neighbour (kNN), and Random Forest (RF). SVM outperformed the other algorithms with an accuracy of 95%. A programmatic approach was proposed to identify the start time of the race. The average error of the proposed approach is 0.24 seconds. The proposed approach was used in the implemented web-based application, which gives coaches a user interface to automatically detect the beginning of a kayak race and the race signal sequence. 
kayak race velocity signal machine learning support vector machine k nearest neighbours random forest Computer Sciences Datavetenskap (datalogi)
276	Support Vector Machine Classifiers Show High Generalizability in Automatic Fall Detection in Older Adults Alizadeh, Jalal, Bogdan, Martin, Classen, Joseph, Fricke, Christopher 08 May 2023 (has links) Falls are a major cause of morbidity and mortality in neurological disorders. Technical means of detecting falls are of high interest as they enable rapid notification of caregivers and emergency services. Such approaches must reliably differentiate between normal daily activities and fall events. A promising technique might be based on the classification of movements based on accelerometer signals by machine-learning algorithms, but the generalizability of classifiers trained on laboratory data to real-world datasets is a common issue. Here, three machine-learning algorithms including Support Vector Machine (SVM), k-Nearest Neighbors (kNN), and Random Forest (RF) were trained to detect fall events. We used a dataset containing intentional falls (SisFall) to train the classifier and validated the approach on a different dataset which included real-world accidental fall events of elderly people (FARSEEING). The results suggested that the linear SVM was the most suitable classifier in this cross-dataset validation approach and reliably distinguished a fall event from normal everyday activity at an accuracy of 93% and similarly high sensitivity and specificity. Thus, classifiers based on linear SVM might be useful for automatic fall detection in real-world applications. info:eu-repo/classification/ddc/620 ddc:620
277	Human Activity Recognition and Step Counter Using Smartphone Sensor Data Jansson, Fredrik, Sidén, Gustaf January 2022 (has links) Human Activity Recognition (HAR) is a growing field of research concerned with classifying human activities from sensor data. Modern smartphones contain numerous sensors that could be used to identify the physical activities of the smartphone wearer, which could have applications in sectors such as healthcare, eldercare, and fitness. This project aims to use smartphone sensor data together with machine learning to perform HAR on the following human locomotion activities: standing, walking, running, ascending stairs, descending stairs, and biking. The classification was done using a random forest classifier. Furthermore, in the special case of walking, an algorithm that can count the number of steps in a given data sequence was developed. The step counting algorithm was not based on a previous implementation and could therefore be considered novel. The step counter achieved a testing accuracy of 99.1% and the HAR classifier a testing accuracy of 100%. It is speculated that the abnormally high accuracies can be attributed primarily to the lack of data diversity, as in both cases only two persons collected the data. / Mänsklig aktivitetsigenkänning är ett växande forskningsområde som handlar om att klassificera mänskliga aktiviteter från sensordata. Moderna mobiltelefoner innehåller många sensorer som kan användas för att identifiera de fysiska aktiviteterna som bäraren utför, vilket har tillämpningar inom sektorer som sjukvård, äldreomsorg och personlig hälsa. Detta projekt använder sensordata från mobiltelefoner tillsammans med maskininlärning för att utföra aktivitetsigenkänning på följande aktiviteter: stå, gå, springa, gå uppför trappor, gå nedför trappor och cykla. Klassificeringen gjordes med hjälp av en "random forest"-klassificerare. 
Vidare utvecklades en algoritm som kan räkna antalet steg i en given datasekvens som samlats in när användaren går. Stegräkningsalgoritmen baserades inte på en tidigare implementering och kan därför betraktas som ny. Stegräknaren uppnådde en testnoggrannhet på 99,1% och aktivitetsigenkänningen en testnoggrannhet på 100%. De oväntat höga noggrannheterna antas främst bero på bristen av diversitet i datan, eftersom den endast samlades in av två personer i båda fallen. / Kandidatexjobb i elektroteknik 2022, KTH, Stockholm Human Activity Recognition Step Counter Smartphone Sensor Data Accelerometer Gyroscope Random Forest. Elektroteknik och elektronik
278	Modélisation spatiale du dauphin chilien (Cephalorhynchus eutropia) : le cas de Seno Skyring au Chili Demers, Simon 04 1900 (has links) Les côtes chiliennes sont parmi les plus productives au monde, ce qui leur permet d'abriter une grande diversité de mammifères marins. En effet, près de la moitié des cétacés observés dans le monde sont présents dans les écosystèmes marins du Chili. Dans un contexte où l’augmentation des activités anthropiques relative à l’exploitation de nos océans s’étend jusqu’aux secteurs éloignés, les nombreux fjords du sud de la Patagonie nécessitent une attention particulière. L’évaluation des enjeux pouvant bouleverser l’équilibre de ces milieux méconnus s’avère indispensable. La répartition spatiale distincte de plusieurs espèces de petits cétacés en Patagonie, la croissance accentuée des activités anthropiques depuis les deux dernières décennies et le peu de savoir sur les enjeux de cohabitations d’habitats, justifient l’urgence de développer des connaissances pouvant démystifier l’abondance de cétacés dans plusieurs écosystèmes marins du Chili. Le projet de recherche suivant met l’emphase sur le développement de connaissances sur la distribution du dauphin chilien (Cephalorhynchus eutropia) dans le secteur Skyring situé au sud du Chili. Il est possible d’observer ce petit cétacé entre Valparaíso (33°S) et Cape Horn (55°15′S). Une étude récente propose une distinction génétique de la population suite à la dernière glaciation. La première population, située au nord du pays, se distingue par son occupation continue près des embouchures de rivières et dans les secteurs peu profonds. La seconde population,située dans la portion sud du pays, est caractérisée par sa présence discontinue dans les nombreux fjords et canaux. Actuellement, il est difficile d’évaluer le nombre total de la population mais tout porte à croire que ce nombre serait en dessous des 10 000 individus matures. 
La recherche suivante propose donc l’utilisation d’un outil de modélisation d’habitat basé sur une forêt d’arbres décisionnels dans le but d’identifier les différentes composantes écosystémiques qui font de Seno Skyring, un secteur de prédilection pour le dauphin chilien. Enfin, la création d’un catalogue d’identification à l’aide de la photo-identification offre un outil de suivi de la population tout en évaluant la fréquentation du dauphin chilien dans le secteur Seno Skyring. / The Chilean coasts are among the most productive in the world, which allows them to shelter a great diversity of marine mammals. Indeed, almost half of the cetaceans observed in the world are present in Chile. In a context where the increase in human activities relating to the exploitation of our oceans extends to remote areas, the numerous fjords of southern Patagonia require special attention. It is essential to assess the issues that could upset the balance of these little-known environments. The distinct spatial distribution of several species of small cetaceans in Patagonia, the accentuated growth of anthropic activities over the last two decades and the lack of knowledge surrounding habitat cohabitation justify the urgency of developing knowledge that could demystify the abundance of cetaceans present in several marine ecosystems of Chile. The following research project focuses on developing knowledge about the distribution of the Chilean dolphin (Cephalorhynchus eutropia) in the Seno Skyring area located in southern Chile. It is possible to observe this small cetacean between Valparaíso (33°S) and Cape Horn (55°15′S). A recent study suggests that the population became genetically divided following the last glaciation. The first population, located in the north of the country, is distinguished by its continuous occupation of areas near river mouths and shallow waters. 
The second population, located in the southern portion of the country, is characterized by its discontinuous presence in the many fjords and channels of Chile. Currently, it is difficult to assess the total size of the population, but recent studies suggest that it is below 10,000 mature individuals. The following research proposes a habitat modeling tool based on a random forest of decision trees with the aim of identifying the different ecosystem components that make Seno Skyring a favoured area for the Chilean dolphin. Finally, the creation of an identification catalog using photo-identification offers a tool for monitoring the population while evaluating how frequently the Chilean dolphin uses the Seno Skyring sector. Chili Distribution Dauphin Aquaculture Abiotique Biotique Classification Dolphin Abiotic Biotic Random forest
279	Application of Machine Learning and AI for Prediction in Ungauged Basins Pin-Ching Li (16734693) 03 August 2023 (has links) Streamflow prediction in ungauged basins (PUB) is the process of generating streamflow time series at ungauged reaches in a river network. PUB is essential for facilitating various engineering tasks such as managing stormwater, water resources, and water-related environmental impacts. Machine Learning (ML) has emerged as a powerful tool for PUB, using its generalization process to capture the streamflow generation processes from hydrological datasets (observations). ML’s generalization process is impacted by two major components: the data splitting process applied to the observations and the architecture design. To unveil the potential limitations of ML’s generalization process, this dissertation explores its robustness and associated uncertainty. More precisely, this dissertation has three objectives: (1) analyzing the potential uncertainty caused by the data splitting process for ML modeling, (2) investigating the improvement of ML models’ performance by incorporating hydrological processes within their architectures, and (3) identifying the potential biases in ML’s generalization process regarding the trend and periodicity of streamflow simulations. The first objective of this dissertation is to assess the sensitivity and uncertainty caused by the regular data splitting process for ML modeling. The regular data splitting process in ML was initially designed for homogeneous and stationary datasets, but it may not be suitable for hydrological datasets in the context of PUB studies. Hydrological datasets usually consist of data collected from diverse watersheds with distinct streamflow generation regimes influenced by varying meteorological forcing and watershed characteristics. To address the potential inconsistency in the data splitting process, multiple data splitting scenarios are generated using the Monte Carlo method. 
The scenarios generated by random data splitting exhibit frequent covariate shift, which tends to add uncertainty and bias to ML’s generalization process. The findings for this objective suggest the importance of avoiding covariate shift during the data splitting process when developing ML models for PUB, to enhance the robustness and reliability of ML’s performance. The second objective of this dissertation is to investigate the improvement of ML models’ performance brought by Physics-Guided Architecture (PGA), which combines ML with the rainfall abstraction process. PGA is a theory-guided machine learning framework integrating conceptual tutors (CTs) with ML models. In this study, CTs correspond to rainfall abstractions estimated by the Green-Ampt (GA) and SCS-CN models. Integrating the GA model’s CTs, which involve information on dynamic soil properties, into PGA models leads to better performance than a regular ML model. In contrast, PGA models integrating the SCS-CN model’s CTs yield no significant improvement in the ML model’s performance. The results for this objective demonstrate that ML’s generalization process can be improved by incorporating CTs involving dynamic soil properties. The third objective of this dissertation is to explore the limitations of ML’s generalization process in capturing trend and periodicity for streamflow simulations. Trend and periodicity are essential components of streamflow time series, representing the long-term correlations and periodic patterns, respectively. When the ML models generate streamflow simulations, they tend to have relatively strong long-term periodic components, such as yearly and multiyear periodic patterns. In addition, compared to the observed streamflow data, the ML models display relatively weak short-term periodic components, such as daily and weekly periodic patterns. 
As a result, ML’s generalization process may struggle to capture the short-term periodic patterns in the streamflow simulations. The biases in ML’s generalization process emphasize the need for external knowledge to improve the representation of the short-term periodic components in simulating streamflow. Surface water hydrology Hydrology Machine Learning prediction in ungauged basins (PUB) streamflow predictions Random forest (RF) regressor LSTM neural networks
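One of the rainfall abstraction models named in the entry above, SCS-CN, has a compact closed form. The sketch below uses the commonly cited metric version of the curve number method, with potential retention S = 25400/CN - 254 (mm) and initial abstraction Ia = 0.2 S. It is not taken from the dissertation, whose exact parameterisation the abstract does not give; the function name `scs_cn_runoff` and the example values are hypothetical.

```python
def scs_cn_runoff(rainfall_mm, curve_number, ia_ratio=0.2):
    """Direct runoff depth (mm) from the standard SCS curve number equation.

    rainfall_mm  -- event rainfall depth P in millimetres
    curve_number -- CN, greater than 0 and at most 100
    ia_ratio     -- initial abstraction as a fraction of S (0.2 is the
                    conventional default; treated here as an assumption)
    """
    if not 0 < curve_number <= 100:
        raise ValueError("curve_number must be in the interval (0, 100]")
    retention = 25400.0 / curve_number - 254.0    # potential maximum retention S
    initial_abstraction = ia_ratio * retention     # losses before runoff begins, Ia
    if rainfall_mm <= initial_abstraction:
        return 0.0                                 # all rainfall is abstracted
    excess = rainfall_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


# Hypothetical event: 50 mm of rain on a catchment with CN = 80.
rain = 50.0
runoff = scs_cn_runoff(rain, 80)
abstraction = rain - runoff    # the rainfall abstraction for this event
print(f"runoff = {runoff:.2f} mm, abstraction = {abstraction:.2f} mm")
```

Because CN is a fixed parameter in this form, the abstraction depends only on the rainfall depth, whereas Green-Ampt tracks a changing soil moisture state. That difference is in line with the abstract's finding that only the conceptual tutors carrying dynamic soil properties improved performance.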
280	Optimizing Flight Ranking: A Machine Learning Approach : Applying Machine Learning to Upgrade Flight Sorting and User Experience / Optimering av flygsortering: En approach med maskininlärning Jabeli, Habib January 2024 (has links) Flygresor.se, a leading flight comparison platform, uses machine learning to rank flights based on their likelihood of being clicked. The main goal of this project was to improve this flight sorting to obtain a better user experience. The platform's existing model is based on a neural network approach and a limited set of features. The solution involved developing and comparing two machine learning models, Random Forest and XGBoost, using a set of existing and newly created features. The XGBoost model demonstrated superior performance by significantly improving the prediction of clicked flights by 4.18% while also achieving a remarkable increase in efficiency by being 125 times faster than the existing model. / Flygresor.se, en ledande plattform för jämförelse av flygresor, använder maskininlärning för att ranka flygresor baserat på deras sannolikhet att bli klickade. Huvudmålet med detta projekt var att förbättra denna flygsortering för att få en bättre användarupplevelse. Plattformens befintliga modell är baserad på ett neuralt nätverk och ett begränsat antal funktioner. Lösningen innebar att utveckla och jämföra två maskininlärningsmodeller, Random Forest och XGBoost, förutom att använda en uppsättning befintliga och nyskapade funktioner. XGBoost-modellen visade bättre prestanda genom att förbättra predikteringen av de klickade flygresor med 4,18 % samtidigt som den uppnådde högre nivå av effektivitet genom att vara 125 gånger snabbare än den befintliga modellen. Machine Learning Flight Comparison Flygresor.se Neural Networks Flight Ranking Random Forest XGBoost Computer and Information Sciences Data- och informationsvetenskap
