391

A Comparison of Various Interpolation Techniques for Modeling and Estimation of Radon Concentrations in Ohio

Gummadi, Jayaram January 2013 (has links)
No description available.
392

Malicious Activity Detection in Encrypted Network Traffic using A Fully Homomorphic Encryption Method

Adiyodi Madhavan, Resmi, Sajan, Ann Zenna January 2022 (has links)
As encrypted transmission becomes commonplace, everyone needs privacy and data protection. Fully Homomorphic Encryption (FHE) has received increased attention because of its capability to execute calculations over the encrypted domain: the goal of FHE is to enable computations on encrypted data without decrypting anything other than the final result, so that model training can be securely outsourced. The CKKS scheme is used for FHE in this work. Network threats are a serious danger to credential information, enabling an unauthorised user to extract important and sensitive data by evaluating the results of computations done on raw data. This study therefore provides an efficient solution to the problem of privacy protection in data-driven applications using machine learning, working with an encrypted NSL-KDD dataset. Machine learning-based techniques have emerged as a significant trend for detecting malicious attacks. Random Forest (RF) is therefore proposed for the detection of malicious attacks on homomorphically encrypted data on a cloud server, and a Logistic Regression (LR) model is used to make predictions on the encrypted data on the cloud server. Despite the distributed setting, the technique can retain the accuracy and integrity of the underlying methods in the final results.
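The CKKS-based inference pipeline this abstract describes can be illustrated with a short sketch. This is a minimal illustration only, assuming the TenSEAL library (the thesis does not name an implementation); the weights, features, encryption parameters, and the polynomial sigmoid approximation are all placeholder assumptions, not the study's configuration.

```python
# Sketch: logistic regression inference on CKKS-encrypted features (TenSEAL assumed).
import tenseal as ts

# CKKS context; modulus sizes control precision and multiplicative depth (illustrative).
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=16384,
    coeff_mod_bit_sizes=[60, 40, 40, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

weights = [0.8, -1.2, 0.5, 0.3]   # plaintext LR weights, trained offline (hypothetical)
bias = 0.1

# Client encrypts one feature vector and sends it to the server.
enc_x = ts.ckks_vector(context, [0.2, 0.9, 0.1, 0.4])

# Server computes the linear part homomorphically: w.x + b.
enc_logit = enc_x.dot(weights) + bias

# Sigmoid cannot be evaluated directly under CKKS, so a low-degree polynomial
# approximation is used: sigmoid(z) ~ 0.5 + 0.197*z - 0.004*z^3.
enc_score = enc_logit.polyval([0.5, 0.197, 0.0, -0.004])

# Only the secret-key holder can decrypt the final score.
print(enc_score.decrypt())  # approximately P(malicious)
```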
393

Epigenetic Responses of Arabidopsis to Abiotic Stress

Laliberte, Suzanne Rae 17 March 2023 (has links)
Weed resistance to control measures, particularly herbicides, is a growing problem in agriculture. In the case of herbicides, resistance is sometimes connected to genetic changes that directly affect the target site of the herbicide. Other cases are less straightforward, with resistance arising without such a clear-cut mechanism. Understanding the genetic and gene regulatory mechanisms that may lead to the rapid evolution of resistance in weedy species is critical to securing our food supply. To study this phenomenon, we exposed young Arabidopsis plants to sublethal levels of one of four weed management stressors: glyphosate herbicide, trifloxysulfuron herbicide, mechanical clipping, or shading. To evaluate responses to these stressors, we collected data on gene expression and regulation via epigenetic modification (methylation) and small RNA (sRNA). For all of the treatments except shade, the stress was limited in duration, and the plants were allowed to recover until flowering, to identify changes that persist to reproduction. At flowering, DNA for methylation bisulfite sequencing, RNA, and sRNA were extracted from newly formed rosette leaf tissue. Analyzing the individual datasets revealed many differential responses in gene expression, methylation, and sRNA expression compared to the untreated control. All three measures showed increases in differential abundance that were unique to each stressor, with very little overlap between stressors. Herbicide treatments tended to exhibit the largest number of significant differential responses, with glyphosate treatment most often associated with the greatest differences and contributing to overlap. To evaluate how large datasets from methylation, gene expression, and sRNA analyses could be connected and mined to link regulatory information with changes in gene expression, the information from each dataset and for each gene was united in a single large matrix and mined with classification algorithms. Although our models were able to differentiate patterns in a set of simulated data, the raw datasets were too noisy for the models to consistently identify differentially expressed genes. However, by focusing on responses at a local level, we identified several genes with differential expression, differential sRNA, and differential methylation. While further studies will be needed to determine whether these epigenetic changes truly influence gene expression at these sites, the changes detected at the treatment level could prime the plants for future incidents of stress, including herbicides. / Doctor of Philosophy / Growing resistance to herbicides, particularly glyphosate, is one of the many problems facing agriculture. The rapid rise of resistance across herbicide classes has caused some to wonder whether there is a mechanism of adaptation that does not involve mutations. Epigenetics is the study of changes in the phenotype that cannot be attributed to changes in the genotype. Typically, studies revolve around two features of the chromosomes: cytosine methylation and histone modifications. The former can influence how proteins interact with DNA, and the latter can influence protein access to DNA. Both can affect each other in self-reinforcing loops. They can affect gene expression, and DNA methylation can be directed by small RNA (sRNA), which can also influence gene expression through other pathways.
To study these processes and their role in abiotic stress response, we analyzed sRNA, RNA, and DNA from Arabidopsis thaliana plants under stress. The stresses applied were sublethal doses of the herbicides glyphosate and trifloxysulfuron, as well as mechanical clipping and shade to represent other weed management stressors. The focus of the project was to analyze these responses individually and together to find epigenetic responses to stresses routinely encountered by weeds. We tested RNA for gene expression changes under our stress conditions and identified many, including some pertaining to DNA methylation regulation. The herbicide treatments were associated with upregulated defense genes and downregulated growth genes. Shade-treated plants had many downregulated defense and other stress response genes. We also detected differential methylation and sRNA responses when compared to the control plants. Changes to methylation and sRNA accounted for only about 20% of the variation in gene expression. While attempting to link the epigenetic process of methylation to gene expression, we connected all the datasets and developed computer programs to search for correlations. While these methods worked on a simulated dataset, we did not detect broad patterns of changes to epigenetic pathways that correlated strongly with gene expression in our experiment's data. Many factors can influence gene expression, creating noise that would hinder the algorithms' ability to detect differentially expressed genes. This does not, however, rule out the possibility of epigenetic influence on gene expression in local contexts. Through scoring the traits of individual genes, we found several that interest us for future studies.
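As a hypothetical illustration of the "single large matrix" mining step this abstract describes, the sketch below builds a per-gene feature matrix and cross-validates a classifier on it; the features, data, and threshold are invented stand-ins, not the study's pipeline.

```python
# Sketch: per-gene feature matrix mined with a classifier (all data synthetic).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_genes = 5000

# One row per gene; columns are illustrative regulatory features.
X = np.column_stack([
    rng.normal(0, 1, n_genes),   # stand-in: change in methylation level
    rng.normal(0, 1, n_genes),   # stand-in: change in sRNA abundance
])
# Label: 1 if the gene was called differentially expressed (random here).
y = (rng.random(n_genes) < 0.1).astype(int)

clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
# On pure noise the AUC stays near 0.5, echoing the difficulty the thesis
# reports for its noisy real datasets.
print(scores.mean())
```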
394

Real-Time Estimation of Traffic Stream Density using Connected Vehicle Data

Aljamal, Mohammad Abdulraheem 02 October 2020 (has links)
The macroscopic measure of traffic stream density is crucial in advanced traffic management systems. However, measuring the traffic stream density in the field is difficult since it is a spatial measurement. In this dissertation, several estimation approaches are developed to estimate the traffic stream density on signalized approaches using connected vehicle (CV) data. First, the dissertation introduces a novel variable estimation interval that allows for higher estimation precision, as the updating time interval always contains a fixed number of CVs. After that, the dissertation develops model-driven approaches, such as a linear Kalman filter (KF), a linear adaptive KF (AKF), and a nonlinear Particle filter (PF), to estimate the traffic stream density using CV data only. The proposed model-driven approaches are evaluated using empirical and simulated data, the former of which were collected along a signalized approach in downtown Blacksburg, VA. Results indicate that density estimates produced by the linear KF approach are the most accurate. A sensitivity analysis of the estimation approaches to various factors, including the level of market penetration (LMP) of CVs, the initial conditions, the number of particles in the PF approach, traffic demand levels, traffic signal control methods, and vehicle length, is presented. Results show that the accuracy of the density estimate increases as the LMP increases. The KF is the least sensitive to the initial traffic density estimate, while the PF is the most sensitive to it. The results also demonstrate that the proposed estimation approaches work better at higher demand levels, given that more CVs exist for the same LMP scenario. For traffic signal control methods, the results demonstrate a higher estimation accuracy for fixed traffic signal timings at low traffic demand levels, while the estimation accuracy is better when the adaptive phase split optimizer is activated for high traffic demand levels. The dissertation also investigates the sensitivity of the KF estimation approach to vehicle length, demonstrating that the presence of longer vehicles (e.g., trucks) in the traffic link reduces the estimation accuracy. Data-driven approaches are also developed to estimate the traffic stream density, such as an artificial neural network (ANN), a k-nearest neighbor (k-NN), and a random forest (RF). The data-driven approaches also utilize solely CV data. Results demonstrate that the ANN approach outperforms the k-NN and RF approaches. Lastly, the dissertation compares the performance of the model-driven and the data-driven approaches, showing that the ANN approach produces the most accurate estimates. However, taking into consideration the computational time needed to train the ANN approach, the large amount of data needed, and the uncertainty in the performance when new traffic behaviors are observed (e.g., incidents), the use of the linear KF approach is highly recommended in the application of traffic density estimation due to its simplicity and applicability in the field. / Doctor of Philosophy / Estimating the number of vehicles (vehicle counts) on a road segment is crucial in advanced traffic management systems. However, measuring the number of vehicles on a road segment in the field is difficult because of the need to install multiple detection sensors along that road segment.
In this dissertation, several estimation approaches are developed to estimate the number of vehicles on signalized roadways using connected vehicle (CV) data. A CV is defined as a vehicle that can share its instantaneous location at every time step t. The dissertation develops model-driven approaches, such as a linear Kalman filter (KF), a linear adaptive KF (AKF), and a nonlinear Particle filter (PF), to estimate the number of vehicles using CV data only. The proposed model-driven approaches are evaluated using real and simulated data, the former of which were collected along a signalized roadway in downtown Blacksburg, VA. Results indicate that the vehicle-count estimates produced by the linear KF approach are the most accurate. The results also show that the KF approach is the least sensitive to the initial conditions. Machine learning approaches are also developed to estimate the number of vehicles, such as an artificial neural network (ANN), a k-nearest neighbor (k-NN), and a random forest (RF). The machine learning approaches also use CV data only. Results demonstrate that the ANN approach outperforms the k-NN and RF approaches. Finally, the dissertation compares the performance of the model-driven and the machine learning approaches, showing that the ANN approach produces the most accurate estimates. However, taking into consideration the computational time needed to train the ANN approach, the large amount of data needed, and the uncertainty in the performance when new traffic behaviors are observed (e.g., incidents), the use of the KF approach is highly recommended in the application of vehicle count estimation due to its simplicity and applicability in the field.
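The linear KF idea described above can be sketched in a few lines. This is a minimal scalar illustration under an assumed random-walk state model with illustrative noise settings and penetration rate; the dissertation's actual formulation and calibration may differ.

```python
# Sketch: scalar linear Kalman filter for link vehicle-count estimation from CV counts.
import numpy as np

def kf_vehicle_count(cv_counts, penetration_rate, q=4.0, r=9.0):
    """cv_counts: observed CVs per interval; penetration_rate: assumed LMP.
    q, r: illustrative process and measurement noise variances."""
    x = cv_counts[0] / penetration_rate   # initial count estimate
    p = 25.0                              # initial estimate variance (assumed)
    estimates = []
    for z in cv_counts:
        # Predict: random-walk state model (count persists between updates).
        p = p + q
        # Update: measurement model z ~ penetration_rate * true_count + noise.
        h = penetration_rate
        k = p * h / (h * p * h + r)       # Kalman gain
        x = x + k * (z - h * x)
        p = (1 - k * h) * p
        estimates.append(x)
    return np.array(estimates)

# Example: 10% market penetration, CV counts observed each interval.
print(kf_vehicle_count(np.array([3.0, 4.0, 4.0, 5.0, 3.0, 2.0]),
                       penetration_rate=0.10))
```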
395

Healthcare data heterogeneity and its contribution to machine learning performance

Pérez Benito, Francisco Javier 09 November 2020 (has links)
Thesis by compendium / [EN] Data quality assessment has many dimensions, from those as obvious as data completeness and consistency to others less evident, such as correctness or the ability to represent the target population. In general, these dimensions can be classified as those produced by an external effect and those inherent in the data itself. This work focuses on the latter, namely the temporal and multisource variability of healthcare data repositories. Processes are usually improved over time, and that has a direct impact on the data distribution. Similarly, how a process is executed in different sources may vary due to many factors, such as diverse interpretations of standard protocols by human beings or the differing prior experiences of experts. Artificial intelligence has become one of the most widely adopted technological paradigms in almost all scientific and industrial fields. Advances not only in models but also in hardware have led to its use in almost all areas of science, although the problems solved with this technology often have the drawback of not being interpretable, or at least not as interpretable as classical mathematical or statistical techniques. This motivated the emergence of the concept of "explainable artificial intelligence", which studies methods to quantify and visualize the training process of models based on machine learning. On the other hand, real systems can often be represented by large networks (graphs), and one of the most relevant features of such networks is their community or clustering structure. Since sociology, biology, and clinical situations can usually be modeled using graphs, community detection algorithms are becoming more and more widespread in the biomedical field. In the present doctoral thesis, contributions have been made in the three above-mentioned areas. First, temporal and multisource variability assessment methods based on information geometry were used to detect variability in data distributions that may hinder data reuse and, hence, the conclusions that can be extracted from the data. The usability of this methodology was proved by a temporal variability analysis that detected data anomalies in the electronic health records of a hospital over 7 years. It also showed that this methodology could have a positive impact if applied before any study. To this end, we first used machine learning techniques to extract the variables that most influenced the intensity of headache in migraine patients. One of the principal characteristics of machine learning algorithms is their capacity to fit the training set; in datasets with a small number of observations, the model can be biased by the training sample. The variability observed after applying the mentioned methodology, taking as sources the registries of migraine patients with different headache intensities, served as evidence for the truthfulness of the extracted features. Second, the same approach was applied to measure the variability among the gray-level histograms of digital mammographies. We demonstrated that the acquisition device produced the observed variability, and after defining an image preprocessing step, the performance of a deep learning model of a breast cancer risk marker increased.
Given a dataset containing the answers to a survey formed by psychometric scales, that is, questionnaires measuring psychological factors such as depression or coping, two deep learning architectures exploiting the data structure were defined. First, we designed a deep learning architecture using the conceptual structure of these psychometric scales; trained to model the happiness degree of the participants, it improved performance compared to classical statistical approaches. A second architecture, automatically designed using community detection in graphs, was not only a contribution in itself through the automation of the design process, but also obtained results comparable to its predecessor. / I would also like to mention the Instituto Tecnológico de la Informática, in particular the research group Percepción, Reconocimiento, Aprendizaje e Inteligencia Artificial, not only for giving me the opportunity to keep growing in the world of science, but also for supporting me in achieving my personal goals. / Pérez Benito, FJ. (2020). Healthcare data heterogeneity and its contribution to machine learning performance [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/154414 / Compendium
396

Divine Truces : Forecasting How Religious Audience Costs Affect Ceasefire Success

Holmberg, Jonas January 2024 (has links)
This thesis investigates how religious holidays affect the chance of ceasefire success. It does so while engaging with the topic of explanation and prediction, using combined methods consisting of regression analysis and forecasting with random forests. The theoretical framework argues that religious holidays impose higher audience costs for violence on leaders, increasing the chance of successful agreements. This would manifest as ceasefires connected to religious holidays being more successful than those that are not. The regression analysis finds no support for the hypothesis, indicating instead that conflict intensity, ceasefire duration, and monitoring and verification, as well as enforcement mechanisms, better explain the apparent variation in success. The forecasting indicates a minor difference in predictive power between the model including holiday and religiosity and the one excluding them, as well as minor effects of the independent variables on ceasefire success in partial dependence plots. These findings are consistent with rejecting the hypothesis and point to the need for a greater prevalence of forecasting methods within peace and conflict research and for robustness tests using other definitions of success.
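A sketch of this forecasting setup follows, with hypothetical feature names and invented data standing in for the ceasefire dataset; it shows the random forest plus partial dependence tooling in scikit-learn, not the thesis's actual models.

```python
# Sketch: random forest forecast of ceasefire success + partial dependence plot.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay
from sklearn.model_selection import train_test_split

# Hypothetical ceasefire-level data (all columns invented for illustration).
df = pd.DataFrame({
    "religious_holiday": [0, 1, 0, 1, 0, 1, 0, 0, 1, 1] * 20,
    "conflict_intensity": [3, 1, 5, 2, 4, 1, 5, 3, 2, 1] * 20,
    "monitoring": [1, 1, 0, 1, 0, 1, 0, 0, 1, 1] * 20,
    "success": [1, 1, 0, 1, 0, 1, 0, 1, 1, 0] * 20,
})
X, y = df.drop(columns="success"), df["success"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", rf.score(X_test, y_test))

# Partial dependence: marginal effect of each feature on predicted success.
PartialDependenceDisplay.from_estimator(rf, X_test,
                                        ["religious_holiday", "conflict_intensity"])
```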
397

Data-Driven Diagnosis For Fuel Injectors Of Diesel Engines In Heavy-Duty Trucks

Eriksson, Felix, Björkkvist, Emely January 2024 (has links)
The diesel engine in heavy-duty trucks is a complex system with many components working together, and a malfunction in any of these components can impact engine performance and result in increased emissions. Fault detection and diagnosis have therefore become essential in modern vehicles, ensuring optimal performance and compliance with progressively stricter legal requirements. One of the most common faults in a diesel engine is faulty injectors, which can lead to fluctuations in the amount of fuel injected. Detecting these issues is crucial, prompting a growing interest in exploring additional signals beyond the currently used signal to enhance the performance and robustness of diagnosing this fault. In this work, an investigation was conducted to identify signals that correlate with faulty injectors causing over- and underfueling. It was found that the NOx, O2, and exhaust pressure signals are sensitive to this fault and could potentially serve as additional diagnostic signals. With these signals, two different diagnostic methods were evaluated to assess their effectiveness in detecting injector faults. The methods evaluated were data-driven residuals and a Random Forest classifier. The data-driven residuals, when combined with the CUSUM algorithm, demonstrated promising results in detecting faulty injectors. The O2 signal proved effective in identifying both fault instances, while NOx and exhaust pressure were more effective at detecting overfueling. The Random Forest classifier also showed good performance in detecting both over- and underfueling. However, it was observed that using a classifier requires more extensive data preprocessing. Two preprocessing methods were employed: integrating previous measurements and calculating statistical measures over a defined time span. Both methods showed promising results, with the latter proving to be the better choice. Additionally, the generalization capabilities of these methods across different operating conditions were evaluated. It was demonstrated that the data-driven residuals yielded better results compared to the classifier, which required training on new cases to perform effectively.
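The residual-plus-CUSUM idea above can be sketched as follows; the drift and threshold parameters, noise levels, and the simulated overfueling shift are illustrative assumptions, not the thesis's tuning.

```python
# Sketch: one-sided CUSUM change detection on a residual signal (e.g. an O2 residual).
import numpy as np

def cusum_alarm(residuals, drift=0.5, threshold=5.0):
    """Return the first index where the positive CUSUM statistic crosses
    the threshold, or None if no alarm is raised."""
    g = 0.0
    for i, r in enumerate(residuals):
        g = max(0.0, g + r - drift)   # accumulate deviation above the drift
        if g > threshold:
            return i
    return None

# Nominal residuals near zero, then a sustained shift mimicking overfueling.
rng = np.random.default_rng(1)
res = np.concatenate([rng.normal(0.0, 0.3, 100), rng.normal(1.5, 0.3, 50)])
print("alarm at sample:", cusum_alarm(res))  # fires shortly after sample 100
```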
398

Classifying Portable Electronic Devices using Device Specifications : A Comparison of Machine Learning Techniques

Westerholm, Ludvig January 2024 (has links)
In this project, we explored the use of machine learning in classifying portable electronic devices. The primary objective was to identify devices such as laptops, smartphones, and tablets based on their physical and technical specifications. These specifications, sourced from the Pricerunner price comparison website, include height, Wi-Fi standard, and screen resolution. We aggregated this information into a dataset and split it into a training set and a testing set. To classify the devices, we trained four popular machine learning models: Random Forest (RF), Logistic Regression (LR), k-Nearest Neighbor (kNN), and Fully Connected Network (FCN). We then compared the performance of these models. The evaluation metrics used to compare performance included precision, recall, F1-score, accuracy, and training time. The RF model achieved the highest overall accuracy of 95.4% on the original dataset. The FCN, applied to a dataset processed with standardization followed by Principal Component Analysis (PCA), reached an accuracy of 92.7%, the best within this specific subset. LR excelled in a few class-specific metrics, while kNN performed notably well relative to its training time. The RF model was the clear winner on the original dataset, while the kNN model was a strong contender on the PCA-processed dataset due to its significantly faster training time compared to the FCN. In conclusion, RF was the best-performing model on the original dataset, the FCN showed impressive results on the standardized and PCA-processed dataset, and the kNN model, with its highest macro precision and rapid training time, also demonstrated competitive performance.
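A sketch of this comparison pipeline follows, with synthetic placeholder data standing in for the Pricerunner specifications; the models and the standardization-plus-PCA preprocessing mirror the steps named in the abstract.

```python
# Sketch: comparing classifiers, with and without standardization + PCA.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for device specs (height, resolution, ...).
X, y = make_classification(n_samples=2000, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "RF (original features)": RandomForestClassifier(random_state=0),
    "LR (standardized)": make_pipeline(StandardScaler(),
                                       LogisticRegression(max_iter=1000)),
    "kNN (standardized + PCA)": make_pipeline(StandardScaler(),
                                              PCA(n_components=10),
                                              KNeighborsClassifier()),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: accuracy = {model.score(X_test, y_test):.3f}")
```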
399

Comparative Analysis of Machine Learning Algorithms for Cryptocurrency Price Prediction

Kurtagic, Leila January 2024 (has links)
As the cryptocurrency markets continuously grow, so does the need for reliable analytical tools for price prediction. This study conducted a comparative analysis of machine learning (ML) algorithms for cryptocurrency price prediction. Through a literature review, three common and reliable ML algorithms for cryptocurrency price prediction were identified: Long Short-Term Memory (LSTM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). Utilizing the Bitcoin All Time History dataset from TradingView, the study assessed both the individual performance of each algorithm and the potential of ensemble methods to enhance predictive accuracy. The results reveal that the LSTM algorithm outperformed RF and XGBoost in terms of predictive accuracy according to the metrics Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). Additionally, two ensemble approaches were tested: Ensemble 1, which enhanced the LSTM model with the combined predictions from RF and XGBoost, and Ensemble 2, which integrated predictions from all three models. Ensemble 2 demonstrated the highest predictive performance among all models, highlighting the advantages of using ensemble approaches for more robust predictions.
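Since the abstract does not give the exact combination rules, the following sketch assumes simple averaging for both ensembles and uses placeholder prediction arrays in place of the trained LSTM, RF, and XGBoost outputs.

```python
# Sketch: the two ensemble schemes, assuming per-model test predictions exist.
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Placeholder predictions from the three base models (invented numbers).
y_true = np.array([100.0, 102.0, 101.5, 103.0])
pred_lstm = np.array([99.5, 101.8, 101.9, 102.6])
pred_rf = np.array([98.9, 101.2, 102.4, 102.1])
pred_xgb = np.array([99.2, 101.5, 102.0, 102.8])

# Ensemble 1: LSTM combined with the mean of RF and XGBoost (assumed rule).
ens1 = (pred_lstm + (pred_rf + pred_xgb) / 2) / 2
# Ensemble 2: simple average of all three models (assumed rule).
ens2 = (pred_lstm + pred_rf + pred_xgb) / 3

for name, pred in [("LSTM", pred_lstm), ("Ensemble 1", ens1), ("Ensemble 2", ens2)]:
    print(f"{name}: RMSE = {rmse(y_true, pred):.3f}")
```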
400

Explainable predictive quality in automotive manufacturing : Case study at Magna Electronics

Ke, Damian January 2024 (has links)
This thesis is a case study conducted at Magna Electronics to explore the use of machine learning techniques in improving the predictive quality of electronic control units (ECUs) within automotive manufacturing. It aims to apply interpretable machine learning methods to predict potential future ECU failures early. With interpretable machine learning, the goal is to identify predictive variables that lead to ECU failure and that can be used as support for decision making. Logistic Regression and Random Forest were chosen as the machine learning methods; both have been used in predictive quality research and offer different levels of interpretability. TreeSHAP was used on the Random Forest as the post-hoc method to further understand the results. The models' performances were quantitatively evaluated through metrics such as accuracy and area under the precision-recall curve (AUCPR). Subsequently, the best-performing models were further analyzed using confusion matrices, precision-recall curves, and horizontal bar charts to assess the impact of predictive variables. The results of this thesis indicated that while Random Forest outperformed Logistic Regression, both models demonstrated limited capability in accurately predicting faulty ECUs, as reflected in low AUCPR scores. The precision-recall curves suggested performance near random guessing, highlighting possible variability in parameter impact. This study also identified significant challenges, such as data imbalance and mislabeling, which may have had a negative effect on the results. Given these issues, the thesis advises caution in using these results for decision-making. Nevertheless, the findings underscore the need for a cautious approach to interpreting model outputs, suggesting that real-world application may require different models depending on the specific goals and context of the analysis.
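The TreeSHAP step described above can be sketched as follows, on synthetic, imbalanced placeholder data rather than Magna's ECU records; the mean-|SHAP| ranking mirrors the horizontal bar charts the thesis describes.

```python
# Sketch: TreeSHAP on a random forest, ranking features for the faulty class.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.95],
                           random_state=0)  # imbalanced, like the ECU data
rf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                            random_state=0).fit(X, y)

# TreeExplainer computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(rf)
sv = explainer.shap_values(X)
# Older shap returns one array per class; newer returns (samples, features, classes).
sv_faulty = sv[1] if isinstance(sv, list) else sv[:, :, 1]

# Mean |SHAP| per feature gives a global importance ranking.
importance = np.abs(sv_faulty).mean(axis=0)
for i in np.argsort(importance)[::-1][:5]:
    print(f"feature {i}: mean |SHAP| = {importance[i]:.4f}")
```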
