Machine Learning Classification of Gas Chromatography Data

Gas Chromatography (GC) is a technique for separating volatile compounds that relies on differences in adherence among the chemical components of a mixture. As conditions within the GC are changed, components of the mixture elute at different times. Sensors measure the elution and produce the data that form a chromatogram. By analyzing the chromatogram, the presence and quantity of the mixture's constituent components can be determined. Machine Learning (ML) is a field consisting of techniques by which machines can independently analyze data to derive their own procedures for processing it. Additionally, there are techniques for enhancing the performance of ML algorithms. Feature Selection improves performance by using a specific subset of the data. Feature Engineering transforms the data to make processing more effective. Data Fusion combines multiple sources of data to produce more useful data. This thesis applies machine learning algorithms to chromatograms. Five common machine learning algorithms are analyzed and compared: K-Nearest Neighbour (KNN), Support Vector Machine (SVM), Convolutional Neural Network (CNN), Decision Tree, and Random Forest (RF). Feature Selection is tested by applying window sweeps with the KNN algorithm. Feature Engineering is applied via the Principal Component Analysis (PCA) algorithm. Data Fusion is also tested. It was found that KNN and RF performed best overall. Feature Selection was very beneficial overall. PCA was helpful for some algorithms, but less so for others. Data Fusion was moderately beneficial.

Master of Science

Gas Chromatography is a method for separating a mixture into its constituent components. A chromatogram is a time series showing the detection of gas in the gas chromatography machine over time. With a properly set up gas chromatograph, different mixtures will produce different chromatograms. These differences allow researchers to determine the components of a mixture or to differentiate compounds from each other. Machine Learning (ML) is a field encompassing a set of methods by which machines can independently analyze data to derive the exact algorithms for processing it. There are many different machine learning algorithms which can accomplish this. There are also techniques which can process the data to make it more effective for use with machine learning. Feature Engineering is one such technique, which transforms the data. Feature Selection is another, which reduces the data to a subset. Data Fusion combines different sources of data. Each of these processing techniques has many different implementations. This thesis applies machine learning to gas chromatography. ML systems are developed to classify mixtures based on their chromatograms. Five common machine learning algorithms are developed and compared. Some common Feature Engineering, Feature Selection, and Data Fusion techniques are also evaluated. Two of the algorithms were found to be more effective overall than the others. Feature Selection was found to be very beneficial. Feature Engineering was beneficial for some algorithms but less so for others. Data Fusion was moderately beneficial.
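
The abstract mentions Feature Selection by window sweeps with KNN but gives no implementation details. The following is only a minimal sketch of what such a sweep might look like, not the thesis's method: the synthetic traces, the peak positions, the window width and step, the use of a 1-nearest-neighbour classifier with leave-one-out scoring, and every function name (`make_chromatogram`, `loo_accuracy_1nn`, `window_sweep`) are assumptions made for illustration.

```python
import math
import random


def make_chromatogram(rng, has_marker, n_points=60):
    """Hypothetical synthetic trace: a nuisance peak in every sample,
    plus a marker peak that is present in only one class."""
    nuisance_height = rng.uniform(0.0, 5.0)  # varies per sample, unrelated to class
    marker_height = 5.0 if has_marker else 0.0
    trace = []
    for t in range(n_points):
        value = nuisance_height * math.exp(-((t - 15) ** 2) / 8.0)
        value += marker_height * math.exp(-((t - 40) ** 2) / 8.0)
        value += rng.gauss(0.0, 0.2)  # detector noise
        trace.append(value)
    return trace


def loo_accuracy_1nn(samples, labels, start, stop):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    only sees the time points in [start, stop)."""
    correct = 0
    for i, query in enumerate(samples):
        best_dist, best_label = float("inf"), None
        for j, other in enumerate(samples):
            if i == j:
                continue
            dist = sum((a - b) ** 2
                       for a, b in zip(query[start:stop], other[start:stop]))
            if dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(samples)


def window_sweep(samples, labels, width=10, step=5):
    """Score each window of the time axis; returns (start, accuracy) pairs."""
    n_points = len(samples[0])
    return [(start, loo_accuracy_1nn(samples, labels, start, start + width))
            for start in range(0, n_points - width + 1, step)]


rng = random.Random(0)
labels = [i % 2 for i in range(40)]
samples = [make_chromatogram(rng, has_marker=bool(label)) for label in labels]

results = window_sweep(samples, labels)
best_start, best_acc = max(results, key=lambda r: r[1])
print(f"best window starts at index {best_start} with accuracy {best_acc:.2f}")
```

In this toy setup the sweep favours a window overlapping the marker peak near index 40, while windows containing only noise score near chance. Real chromatograms would need a held-out test set rather than leave-one-out scoring on the data used to choose the window.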

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/116146
Date: 28 August 2023
Creators: Clark, Evan Peter
Contributors: Electrical Engineering, Nazhandali, Leyla, Abbott, Amos L., Eldardiry, Hoda
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Language: English
Detected Language: English
Type: Thesis
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
