
Methodology Development for Improving the Performance of Critical Classification Applications

Afrose, Sharmin 17 January 2023
People interact with different critical applications in day-to-day life. Some examples of critical applications include computer programs, autonomous vehicles, digital healthcare, and smart homes. There are inherent risks in these critical applications if they fail to perform properly. In this dissertation, we focus on developing methodologies to improve performance in software security and healthcare prognosis. Cryptographic vulnerability detection tools are used to detect misuses of Java cryptographic APIs and thus classify secure and insecure parts of code. These detection tools are critical applications, as misuse of cryptographic libraries and APIs has devastating security and privacy implications. We develop two benchmarks that help developers identify secure and insecure code usage and improve their tools. We also perform a comparative analysis of four static analysis tools. The developed benchmarks enable the first scientific comparison of the accuracy and scalability of cryptographic API misuse detection. Several published detection tools (CryptoGuard, CrySL, Oracle Parfait) have used our benchmarks to improve their detection of insecure cases. We also examine the need for performance improvement in healthcare applications. Numerous prediction applications are developed to predict patients' health conditions. These are critical applications where misdiagnosis can cause serious harm to patients, even death. Due to the imbalanced nature of many clinical datasets, our work provides empirical evidence of various prediction deficiencies in a typical machine learning model. We observe that missed death cases are 3.14 times more frequent than missed survival cases in mortality prediction. Existing sampling methods and other techniques are also not well-equipped to achieve good performance. We design a double prioritized (DP) technique to mitigate representational bias and disparities across race and age groups. We show that DP consistently boosts minority-class recall for underrepresented groups, by up to 38.0%. Our DP method also outperforms existing methods, reducing the relative disparity in minority-class recall by up to 88%. Incorrect classification in these critical applications can have significant ramifications. Therefore, it is imperative to improve the performance of critical applications to alleviate risk and harm to people. / Doctor of Philosophy / We interact with software on our devices every day: calling transport with Lyft or Uber, shopping online with eBay, using social media via Twitter, and checking payment status on credit card or bank accounts. Much of this software uses cryptography to secure our personal and financial information. However, inappropriate or improper use of cryptography can let a malicious party gain sensitive information. Several detection tools have been developed to capture inappropriate usage of cryptographic functions. However, suitable benchmarks are needed to compare the coverage and detection depth of these tools. To bridge this gap, we built two cryptographic benchmarks that are now used by many tool developers to improve their tools and compare them with existing ones. In another aspect, people see physicians and are admitted to hospitals when needed. Physicians also use software that assists them in caring for patients. Much of this software is built using machine learning algorithms to predict patients' conditions. Historical medical information, or a clinical dataset, is taken as input to the prediction models. Clinical datasets contain information about patients of different races and ages. The number of samples in some groups of patients may be larger than in others. For example, many clinical datasets contain more white patients (i.e., the majority group) than Black patients (i.e., a minority group). Prediction models built on such imbalanced clinical data may provide inaccurate predictions for minority patients. Our work aims to improve prediction accuracy for minority patients in important medical applications, such as estimating the likelihood of a patient dying during an emergency room visit or surviving cancer. We design a new technique that builds customized prediction models for different demographic groups. Our results reveal that subpopulation-specific models show better performance for minority groups. Our work contributes to improving the medical care of minority patients in the age of digital health. Overall, our aim is to improve the performance of critical applications to help people by decreasing risk. Our developed methods can be applicable to other critical application domains.
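To make the double-prioritized idea concrete, here is a minimal Python sketch of group-aware oversampling with recall-based model selection. It is an illustration only, not the dissertation's implementation: the `dp_oversample` helper, the replication-factor sweep, and the synthetic data are assumptions for exposition.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

def dp_oversample(X, y, group, target_group, factor):
    """Replicate minority-class samples of one demographic group `factor` times."""
    mask = (y == 1) & (group == target_group)   # class 1 = minority outcome (e.g., death)
    X_extra = np.repeat(X[mask], factor, axis=0)
    y_extra = np.repeat(y[mask], factor)
    return np.vstack([X, X_extra]), np.concatenate([y, y_extra])

# Toy data: features, imbalanced labels, and a demographic group id.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (rng.random(2000) < 0.1).astype(int)      # ~10% minority class
group = rng.integers(0, 2, size=2000)          # 1 = underrepresented group

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0, stratify=y)

# Sweep replication factors; keep the model with the best minority-class
# recall on the underrepresented group (the prioritized selection step).
best_recall, best_model = -1.0, None
for factor in range(1, 6):
    X_b, y_b = dp_oversample(X_tr, y_tr, g_tr, target_group=1, factor=factor)
    model = RandomForestClassifier(random_state=0).fit(X_b, y_b)
    sel = g_te == 1
    r = recall_score(y_te[sel], model.predict(X_te[sel]), zero_division=0)
    if r > best_recall:
        best_recall, best_model = r, model
```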

Using Artificial Life to Design Machine Learning Algorithms for Decoding Gene Expression Patterns from Images

Zaghlool, Shaza Basyouni 26 May 2008
Understanding the relationship between gene expression and phenotype is important in many areas of biology and medicine. However, current methods for measuring gene expression, such as microarrays, are invasive, require a biopsy, and are expensive. These factors limit experiments to low-rate temporal sampling of gene expression and prevent longitudinal studies within a single subject, reducing their statistical power. Thus, methods for non-invasive measurement of gene expression are an important and current topic of research. An interesting approach to indirect measurement of gene expression has recently been reported (Segal et al., Nature Biotechnology 25(6), 2007) that uses existing imaging techniques and machine learning to estimate a function mapping image features to gene expression patterns, providing an image-derived surrogate for gene expression. However, the design of machine learning methods for this purpose is hampered by the cost of training and validation. My thesis shows that populations of artificial organisms simulating genetic variation can be used to design machine learning approaches for decoding gene expression patterns from images. If analysis of these simulated images proves successful, the approach can be applied to real biomedical images, reducing the limitations of invasive imaging. The results showed that the box-counting dimension was a suitable feature extraction method, yielding a classification rate of at least 90% for mutation rates up to 40%. The box-counting dimension was also robust in dealing with distorted images. In fact, the performance of classifiers using the fractal dimension as a feature was more sensitive to the mutation rate than to the applied distortion level. / Master of Science
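For readers unfamiliar with the box-counting dimension used as the image feature above, the following is a generic Python sketch (not the thesis code): it counts occupied boxes at several scales and fits the slope of the log-log relationship.

```python
import numpy as np

def box_counting_dimension(img, box_sizes=(2, 4, 8, 16, 32)):
    """Estimate the fractal (box-counting) dimension of a 2-D binary image."""
    counts = []
    for s in box_sizes:
        # Trim the image so it tiles evenly into s x s boxes.
        h, w = (img.shape[0] // s) * s, (img.shape[1] // s) * s
        trimmed = img[:h, :w]
        # A box is "occupied" if it contains any foreground pixel.
        boxes = trimmed.reshape(h // s, s, w // s, s).any(axis=(1, 3))
        counts.append(boxes.sum())
    # Dimension = slope of log(count) versus log(1/box size).
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(box_sizes)), np.log(counts), 1)
    return slope

# Example: a noisy binary pattern yields a dimension between 1 and 2.
rng = np.random.default_rng(1)
image = rng.random((256, 256)) > 0.7
print(box_counting_dimension(image))
```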

A Machine Learning Approach for the Objective Sonographic Assessment of Patellar Tendinopathy in Collegiate Basketball Athletes

Cheung, Carrie Alyse 07 June 2021
Patellar tendinopathy (PT) is a knee injury resulting in pain localized to the patellar tendon. One main factor that causes PT is repetitive overloading of the tendon. Because of this mechanism, PT is commonly seen in "jumping sports" like basketball. This injury can severely impact a player's performance, so early diagnosis and treatment are important for a timely return to preinjury activity levels. The standard for the diagnosis of PT is a clinical examination, including a patient history and a physical assessment. Because PT has symptoms similar to injuries of other knee structures, such as the bursae, fat pad, and patellofemoral joint, imaging is regularly performed to help determine the correct diagnosis. One common imaging modality for the patellar tendon is gray-scale ultrasonography (GS-US). However, accurate detection of PT in GS-US images is grader-dependent and requires a high level of expertise. Machine learning (ML) models, which can accurately and objectively perform image classification tasks, could serve as a reliable automated tool to aid clinicians in assessing PT in GS-US images. ML models, like support vector machines (SVMs) and convolutional neural networks (CNNs), use features learned from labelled images to predict the class of an unlabelled image. An SVM works by finding the optimal hyperplane between classes of labelled data points and then classifies an unlabelled data point according to the side of the hyperplane on which it falls. A CNN works by learning both the set of features and the pattern of features that describes each class. The objective of this study was to develop an SVM model and a CNN model to classify GS-US images of the patellar tendon as either normal or diseased (PT present), targeting an accuracy around 83%, which is what experienced clinicians achieved when diagnosing PT in GS-US images that had already been clinically diagnosed as diseased or normal. We also compare different test designs for each model to determine which achieves the highest accuracy. GS-US images of the patellar tendon were obtained from male and female Virginia Tech collegiate basketball athletes. Each image was labelled by an experienced clinician as either diseased or normal. These images were split into training and testing sets. The SVM and CNN models were created using Python. For the SVM model, features were extracted from the training set using speeded-up robust features (SURF). These features were then used to train the SVM model by calculating the optimal weights for the hyperplane. For the CNN model, the features, as well as the optimal classification weights, were learned by layers within the network. Both models were then used to predict the class of the images in the testing set, and the accuracy, sensitivity, and precision of the models were calculated. For each model we examined different test designs. Balanced designs had equal numbers of diseased and normal images. Designs with Long images used only images taken in the longitudinal orientation, unlike Long+Trans designs, which used both longitudinal and transverse images. Designs with Full images contained the patellar tendon and surrounding tissue, whereas ROI images excluded the surrounding tissue. The best designs for the SVM model were the Unbalanced Long designs for both the Full and ROI images; both had an accuracy of 77.5%. The best design for the CNN model was the Balanced Long+Trans Full design, with an accuracy of 80.3%. Both models had more difficulty classifying normal images than diseased images, possibly because the diseased images had a well-defined feature pattern while the normal images did not. Overall, the CNN features and classifier achieved a higher accuracy than the SURF features and SVM classifier. The CNN model's accuracy is only slightly below the 83% of an experienced clinician. These are promising results, and as the dataset grows and the models are fine-tuned, accuracy should continue to improve. / Master of Science / Patellar tendinopathy (PT) is a common knee injury. It is frequently seen in sports like basketball, where athletes are regularly jumping and landing, applying large forces to the patellar tendon. This injury can severely impact a player's performance, so early diagnosis and treatment are important for a timely return to preinjury activity levels. Currently, diagnosis of PT involves a patient history and a physical assessment, commonly supplemented by ultrasound imaging. However, clinicians need a high level of expertise to accurately assess these images for PT. A tool such as a machine learning (ML) model could aid in this assessment. ML is becoming ever more prevalent in our everyday lives: these models are everywhere, from the facial recognition tool on your phone to the list of recommended items on your Amazon account. ML models can use features learned from labelled images to predict the class of an unlabelled image. The objective of this study was to develop ML models to classify ultrasound images of the patellar tendon as either normal or diseased (PT present).
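As a rough illustration of the SVM pipeline described above, the sketch below builds a bag-of-visual-words representation and trains a linear SVM. The random 64-dimensional descriptors are stand-ins for real SURF descriptors (which would come from OpenCV's contrib module); the vocabulary size and data are assumptions, not the study's actual setup.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder descriptors: in the study, SURF descriptors would be extracted
# from each GS-US image. Here random vectors stand in.
rng = np.random.default_rng(2)
descriptors_per_image = [rng.normal(size=(rng.integers(50, 200), 64)) for _ in range(40)]
labels = np.array([0] * 20 + [1] * 20)  # 0 = normal, 1 = diseased (PT present)

# Bag-of-visual-words: cluster all descriptors, then represent each image as
# a histogram of cluster ("visual word") assignments.
vocab = KMeans(n_clusters=30, random_state=0, n_init=10).fit(np.vstack(descriptors_per_image))
X = np.array([np.bincount(vocab.predict(d), minlength=30) for d in descriptors_per_image])

# The SVM finds the maximum-margin hyperplane between the two classes.
print(cross_val_score(SVC(kernel="linear"), X, labels, cv=5).mean())
```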

Machine Learning Classification of Gas Chromatography Data

Clark, Evan Peter 28 August 2023
Gas Chromatography (GC) is a technique for separating volatile compounds by exploiting differences in how a mixture's chemical components adhere to the column. As conditions within the GC are changed, components of the mixture elute at different times. Sensors measure the elution and produce the data recorded as chromatograms. By analyzing the chromatogram, the presence and quantity of the mixture's constituent components can be determined. Machine Learning (ML) is a field consisting of techniques by which machines can independently analyze data to derive their own procedures for processing it. Additionally, there are techniques for enhancing the performance of ML algorithms. Feature Selection improves performance by using a specific subset of the data. Feature Engineering transforms the data to make processing more effective. Data Fusion combines multiple sources of data to produce more useful data. This thesis applies machine learning algorithms to chromatograms. Five common machine learning algorithms are analyzed and compared: K-Nearest Neighbour (KNN), Support Vector Machines (SVM), Convolutional Neural Network (CNN), Decision Tree, and Random Forest (RF). Feature Selection is tested by applying window sweeps with the KNN algorithm. Feature Engineering is applied via the Principal Component Analysis (PCA) algorithm. Data Fusion is also tested. It was found that KNN and RF performed best overall. Feature Selection was very beneficial overall. PCA was helpful for some algorithms but less so for others. Data Fusion was moderately beneficial. / Master of Science / Gas Chromatography is a method for separating a mixture into its constituent components. A chromatogram is a time series showing the detection of gas in the gas chromatograph over time. With a properly set up gas chromatograph, different mixtures will produce different chromatograms. These differences allow researchers to determine the components of a mixture or differentiate compounds from each other. Machine Learning (ML) is a field encompassing a set of methods by which machines can independently analyze data to derive the exact algorithms for processing it. There are many different machine learning algorithms that can accomplish this. There are also techniques that can process the data to make it more effective for use with machine learning. Feature Engineering is one such technique, which transforms the data. Feature Selection is another, which reduces the data to a subset. Data Fusion is a technique that combines different sources of data. Each of these processing techniques has many different implementations. This thesis applies machine learning to gas chromatography. ML systems are developed to classify mixtures based on their chromatograms. Five common machine learning algorithms are developed and compared. Some common Feature Engineering, Feature Selection, and Data Fusion techniques are also evaluated. Two of the algorithms were found to be more effective overall than the others. Feature Selection was found to be very beneficial. Feature Engineering was beneficial for some algorithms but less so for others. Data Fusion was moderately beneficial.
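The window-sweep Feature Selection tested with KNN might look like the following Python sketch, which evaluates a KNN classifier on sliding time windows of synthetic chromatograms and keeps the best-scoring window. The window size, stride, and data are illustrative assumptions, not the thesis configuration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Synthetic chromatograms: 1,000 time points each, with a class-dependent peak.
rng = np.random.default_rng(3)
n, length = 120, 1000
X = rng.normal(size=(n, length))
y = rng.integers(0, 2, size=n)
X[y == 1, 400:450] += 2.0  # class 1 elutes a compound in this region

# Window sweep: evaluate KNN on each contiguous time window and keep the best.
knn = KNeighborsClassifier(n_neighbors=5)
best_score, best_window = -1.0, None
for start in range(0, length - 100, 50):
    window = slice(start, start + 100)
    score = cross_val_score(knn, X[:, window], y, cv=5).mean()
    if score > best_score:
        best_score, best_window = score, window

print(best_window, round(best_score, 3))  # should land near the 400-450 peak
```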

Interpretation, Verification and Privacy Techniques for Improving the Trustworthiness of Neural Networks

Dethise, Arnaud 22 March 2023
Neural Networks are powerful tools used in Machine Learning to solve complex problems across many domains, including biological classification, self-driving cars, and automated management of distributed systems. However, practitioners' trust in Neural Network models is limited by the models' inability to answer important questions about their behavior, such as whether they will perform correctly or whether they can be entrusted with private data. One major issue with Neural Networks is their "black-box" nature, which makes it challenging to inspect the trained parameters or to understand the learned function. To address this issue, this thesis proposes several new ways to increase the trustworthiness of Neural Network models. The first approach focuses specifically on Piecewise Linear Neural Networks, a popular flavor of Neural Networks used to tackle many practical problems. The thesis explores several techniques to extract the weights of trained networks efficiently and use them to verify and understand the behavior of the models. The second approach shows how strengthening the training algorithms can provide guarantees that are theoretically proven to hold even for a black-box model. The first part of the thesis identifies errors that can exist in trained Neural Networks, highlighting the importance of domain knowledge and the pitfalls to avoid with trained models. The second part verifies the outputs and decisions of the model by adapting Mixed Integer Linear Programming to efficiently explore the possible states of the Neural Network and verify properties of its outputs. The third part extends the linear programming technique to explain the behavior of a Piecewise Linear Neural Network by breaking it down into its linear components, generating model explanations that are both continuous in the input features and free of approximations. Finally, the thesis addresses privacy concerns by using Trusted Execution and Differential Privacy during the training process. The techniques proposed in this thesis provide strong, theoretically provable guarantees about Neural Networks despite their black-box nature, and enable practitioners to verify, extend, and protect the privacy of expert domain knowledge. By improving the trustworthiness of models, these techniques make Neural Networks more likely to be deployed in real-world applications.
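The Mixed Integer Linear Programming approach rests on the standard big-M encoding of a ReLU unit. The sketch below, using the PuLP library (a choice of convenience, not one prescribed by the thesis), verifies an output bound for a one-neuron network; the weights and the constant M are illustrative.

```python
import pulp

# Verify an output bound for a tiny ReLU network y = relu(w*x + b) over x in [-1, 1].
w, b, M = 2.0, 0.5, 10.0  # M must dominate any reachable pre-activation value

prob = pulp.LpProblem("relu_bound", pulp.LpMaximize)
x = pulp.LpVariable("x", lowBound=-1, upBound=1)
z = pulp.LpVariable("z")                 # pre-activation
y = pulp.LpVariable("y", lowBound=0)     # post-activation
a = pulp.LpVariable("a", cat="Binary")   # ReLU phase indicator

prob += z == w * x + b
# Standard big-M encoding of y = max(0, z):
prob += y >= z
prob += y <= z + M * (1 - a)
prob += y <= M * a

prob += y  # objective: maximize the network output
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(y))  # 2.5 here, proving y never exceeds 2.5 on this input range
```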

Integrated Process Modeling and Data Analytics for Optimizing Polyolefin Manufacturing

Sharma, Niket 19 November 2021
Polyolefins are among the most widely used commodity polymers, with applications in films, packaging, and the automotive industry. The modeling of polymerization processes producing polyolefins, including high-density polyethylene (HDPE), polypropylene (PP), and linear low-density polyethylene (LLDPE) using Ziegler-Natta catalysts with multiple active sites, is a complex and challenging task. In our study, we integrate process modeling and data analytics to improve and optimize polyolefin manufacturing processes. Most of the current literature on polyolefin modeling does not consider all of the commercially important production targets when quantifying the relevant polymerization reactions and their kinetic parameters based on measurable plant data. We develop an effective methodology to estimate the kinetic parameters that have the most significant impact on specific production targets, and to develop kinetics using all commercially important production targets, validated over industrial polyolefin processes. We showcase the utility of dynamic models for efficient grade transition in polyolefin processes, and also use the dynamic models for inferential control of polymer processes. Thus, we demonstrate a methodology for building first-principles polyolefin process models that are scientifically consistent but tend to be less accurate, owing to the many modeling assumptions required in a complex system. Data analytics and machine learning (ML) have been applied in the chemical process industry for accurate predictions in data-based soft sensors and for process monitoring/control. They are especially useful for polymer processes, since polymer quality measurements, such as melt index and molecular weight, are usually much less frequent than continuous process variable measurements. We showcase the use of predictive machine learning models, such as neural networks, to predict polymer quality indicators, and demonstrate the utility of causal models, such as partial least squares, to study the causal effect of process parameters on polymer quality variables. ML models produce accurate results but can over-fit the data and produce scientifically inconsistent results beyond the operating data range. Thus, it is increasingly important to develop hybrid models that combine data-based ML models and first-principles models. We present a broad perspective on hybrid process modeling and optimization combining scientific knowledge and data analytics in bioprocessing and chemical engineering, using a science-guided machine learning (SGML) approach rather than just direct combinations of first-principles and ML models. We present a detailed review of the scientific literature on the hybrid SGML approach, and propose a systematic classification of hybrid SGML models according to their methodology and objective. We identify themes and methodologies that have not been explored much in chemical engineering applications, such as using scientific knowledge to improve the ML model architecture and learning process for more scientifically consistent solutions. We apply these hybrid SGML techniques to industrial polyolefin processes, including inverse modeling, science-guided loss functions, and others that have not previously been applied to such polymer applications. / Doctor of Philosophy / Almost everything we see around us, from furniture and electronics to bottles and cars, is made fully or partially from plastic polymers. The two most popular polymers, which together comprise almost two-thirds of global polymer production, are polyethylene (PE) and polypropylene (PP), collectively known as polyolefins. Hence, optimizing polyolefin manufacturing processes with the aid of simulation models is critical and profitable for the chemical industry. Modeling a chemical/polymer process is helpful for process scale-up, product quality estimation/monitoring, and new process development. To build a good simulation model, we need to validate its predictions against actual industrial data. A polyolefin process has complex reaction kinetics with multiple parameters that must be estimated to accurately match the industrial process. We have developed a novel strategy for estimating the kinetics for the model, including the reaction chemistry and the polymer quality information, validated against the industrial process. The result is a science-based model that incorporates knowledge of reaction kinetics, thermodynamics, and heat and mass balances for the polyolefin process. The science-based model is scientifically consistent, but it may not be very accurate because of its many assumptions. Therefore, for applications requiring very high accuracy in predicting polymer quality targets such as melt index (MI) and density, data-based techniques may be more appropriate. We hear a lot about artificial intelligence (AI) and machine learning (ML); the basic principle behind these methods is to make a model learn from data in order to predict. The process data measured in a chemical/polymer plant can be used for such analysis; for example, we can build ML models to predict polymer targets like MI as a function of the input process variables. ML model predictions are very accurate within the operating range of the dataset on which the model is trained, but outside that range they may give scientifically inconsistent results. Thus, there is a need to combine data-based models and scientific models. In our research, we showcase novel approaches to integrating science-based models with data-based ML methodology, which we term hybrid science-guided machine learning (SGML). The hybrid SGML methods applied to polyolefin processes yield predictions that are not only accurate but also scientifically consistent, and they can be used for polyolefin process optimization in applications such as process development and quality monitoring.
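As an illustration of the partial-least-squares analysis mentioned above, here is a minimal Python sketch on synthetic plant data; the variable names and the MI relationship are assumptions for exposition, not data or code from the dissertation.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

# Synthetic plant data: process variables (temperatures, H2/monomer ratio, etc.)
# and a polymer quality target such as melt index (MI).
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 8))                      # 8 process variables
mi = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.2, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, mi, test_size=0.25, random_state=0)

pls = PLSRegression(n_components=3).fit(X_tr, y_tr)
print("R^2 on held-out data:", round(pls.score(X_te, y_te), 3))

# The fitted coefficients indicate which process variables drive MI,
# which is the causal-analysis use of PLS described above.
print(pls.coef_.ravel())
```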

Leveraging Infrared Imaging with Machine Learning for Phenotypic Profiling

Liu, Xinwen January 2024
Phenotypic profiling systematically maps and analyzes observable traits (phenotypes) exhibited by cells, tissues, organisms, or systems in response to various conditions, including chemical, genetic, and disease perturbations. This approach seeks to comprehensively understand the functional consequences of perturbations on biological systems, thereby informing diverse research areas such as drug discovery, disease modeling, functional genomics, and systems biology. The corresponding techniques must capture high-dimensional features to distinguish phenotypes affected by different conditions. Current methods mainly include fluorescence imaging, mass spectrometry, and omics technologies, coupled with computational analysis, to quantify diverse features such as morphology, metabolism, and gene expression in response to perturbations. Yet they face challenges of high cost, complicated operation, and strong batch effects. Vibrational imaging offers an alternative for phenotypic profiling, providing a sensitive, cost-effective, and easily operated approach to capture the biochemical fingerprint of phenotypes. Among vibrational imaging techniques, infrared (IR) imaging has the further advantages of high throughput, fast imaging speed, and full spectral coverage compared with Raman imaging. However, current biomedical applications of IR imaging concentrate mainly on "digital disease pathology", which uses label-free IR imaging with machine learning for tissue pathology classification and disease diagnosis. This thesis contributes the first comprehensive study of using IR imaging for phenotypic profiling, focusing on three key areas. First, IR-active vibrational probes are systematically designed to enhance metabolic specificity, thereby enriching the measured features and improving sensitivity and specificity for phenotype discrimination. Second, experimental workflows are established for phenotypic profiling with IR imaging across biological samples at various levels, including cellular, tissue, and organ, in response to drug and disease perturbations. Lastly, complete data analysis pipelines are developed, including data preprocessing, statistical analysis, and machine learning methods, with additional algorithmic developments for analyzing and mapping phenotypes. Chapter 1 lays the groundwork by delving into the theory of IR spectroscopy and the instrumentation of IR imaging, establishing a foundation for the subsequent studies. Chapter 2 discusses the principles of popular machine learning methods applied in IR imaging, including supervised learning, unsupervised learning, and deep learning, providing the algorithmic backbone for later chapters. It also provides an overview of existing biomedical applications that combine label-free IR imaging with machine learning, giving a deeper understanding of the current research landscape and the focal points of IR imaging in traditional biomedical studies. Chapters 3-5 focus on applying IR imaging coupled with machine learning to the novel application of phenotypic profiling. Chapter 3 explores the design and development of IR-active vibrational probes for IR imaging. Three types of vibrational probes, azide-, 13C-, and deuterium-based, are introduced to study the dynamic metabolic activities of proteins, lipids, and carbohydrates in cells, small organisms, and mice for the first time. The developed probes greatly improve the metabolic specificity of IR imaging, enhancing its sensitivity toward different phenotypes. Chapter 4 studies the combination of IR imaging, heavy-water labeling, and unsupervised learning for tissue metabolic profiling, providing a novel method to map a metabolic tissue atlas in complex mammalian systems. In particular, cell type-, tissue-, and organ-specific metabolic profiles are identified with spatial information in situ. The method further captures metabolic changes during brain development and characterizes the intratumor metabolic heterogeneity of glioblastoma, showing great promise for disease modeling. Chapter 5 develops Vibrational Painting (VIBRANT), a method using IR imaging, multiplexed vibrational probes, and supervised learning for cellular phenotypic profiling of drug perturbations. Three IR-active vibrational probes were designed to measure distinct essential metabolic activities in human cancer cells. More than 20,000 single-cell drug responses were collected, corresponding to 23 drug treatments. Supervised learning is used to accurately predict drug mechanism of action at the single-cell level with minimal batch effects. We further designed an algorithm to discover drug candidates with novel mechanisms of action and to evaluate drug combinations. Overall, VIBRANT has demonstrated great potential across multiple areas of phenotypic drug screening.
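A minimal sketch of the kind of unsupervised spectral clustering underlying such tissue metabolic profiling might look as follows; the synthetic hyperspectral cube, the injected band, and the cluster count are illustrative assumptions, not the thesis pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic hyperspectral IR stack: height x width pixels, each with a 200-point spectrum.
rng = np.random.default_rng(5)
h, w, bands = 64, 64, 200
cube = rng.normal(size=(h, w, bands))
cube[:32, :, 80:100] += 3.0  # one tissue region with a stronger labeled band

# Cluster pixel spectra; each cluster approximates a metabolic/tissue state.
spectra = StandardScaler().fit_transform(cube.reshape(-1, bands))
labels = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(spectra)
label_map = labels.reshape(h, w)   # an in-situ "metabolic atlas" of the section
```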

Expert Knowledge Elicitation for Machine Learning : Insights from a Survey and Industrial Case Study

Svensson, Samuel, Persson, Oskar January 2023
While machine learning has shown success in many fields, it can struggle when training data are insufficient. Incorporating expert knowledge into the machine learning pipeline is one way to overcome such limitations, so eliciting that knowledge can play an important role in a machine learning project. Expert knowledge comes in many forms, and it is seldom easy to elicit and formalize it in a way that is readily implementable in a machine learning project. While this has been done, little attention has been paid to how. Furthermore, the motivations for eliciting knowledge in a particular way, as well as the challenges involved in the elicitation, are rarely discussed either. Making educated decisions about knowledge elicitation can therefore be challenging for researchers. Hence, this work aims to explore and categorize how researchers have previously elicited expert knowledge. We developed a taxonomy and used it to analyze the literature: a total of 43 articles were found, containing 97 elicitation paths that were categorized to identify trends and common approaches. The findings were then used to guide an industrial case in its initial stage, showing how the taxonomy presented in this work can be applied in a real-world scenario.

Classifying Receipts and Invoices in Visma Mobile Scanner

Yasser, Almodhi January 2016
This paper presents a study on classifying receipts and invoices using Machine Learning, and discusses the Naïve Bayes algorithm and the advantages of using it. Drawing on theory and previous research, I show how to classify images as either a receipt or an invoice. The study also covers image pre-processing using a variety of methods, text extraction using Optical Character Recognition (OCR), and the necessity of pre-processing images to reach a higher accuracy. The results include a comparison between the Tesseract and FineReader OCR engines. Informed by the theory and discussion, the results showed that combining the FineReader OCR engine with Machine Learning increases the accuracy of image classification.
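A minimal sketch of the Naïve Bayes text-classification step described above, applied to OCR-extracted text, could look like this; the example strings and labels are invented for illustration, not drawn from the study's data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Text as it might come out of the OCR step; strings and labels are illustrative.
docs = [
    "total amount cash change vat receipt",
    "thank you for shopping receipt total",
    "invoice number due date payment terms",
    "invoice billed to net 30 reference",
]
labels = ["receipt", "receipt", "invoice", "invoice"]

# Bag-of-words counts feed the multinomial Naive Bayes classifier, which scores
# each class by the product of per-word likelihoods weighted by a class prior.
clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(docs, labels)
print(clf.predict(["payment due invoice total"]))
```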

VisuNet: Visualizing Networks of feature interactions in rule-based classifiers

Anyango, Stephen Omondi Otieno January 2016
No description available.
