1 |
Kernel-Based Data Mining Approach with Variable Selection for Nonlinear High-Dimensional Data
Baek, Seung Hyun (01 May 2010)
In statistical data mining research, datasets are often nonlinear and high-dimensional. It has become difficult to analyze such datasets comprehensively using traditional statistical methodologies. Kernel-based data mining is one of the most effective statistical methodologies for investigating a variety of problems in areas including pattern recognition, machine learning, bioinformatics, chemometrics, and statistics. In particular, statistically sophisticated procedures that emphasize the reliability of results and computational efficiency are required for the analysis of high-dimensional data. In this dissertation, first, a novel wrapper method called SVM-ICOMP-RFE, based on a hybridized support vector machine (SVM) and recursive feature elimination (RFE) with the information-theoretic measure of complexity (ICOMP), is introduced and developed to classify high-dimensional data sets and to carry out subset selection of the variables in the original data space, in order to find the best subset for discriminating between groups. RFE ranks variables based on the ICOMP criterion. Second, a dual variables functional support vector machine approach is proposed. The proposed approach uses both the first and second derivatives of the degradation profiles. A modified floating search algorithm for repeated variable selection, with newly added degradation path points, is presented to find a few good variables while reducing the computation time for on-line implementation. Third, a two-stage scheme for the classification of near infrared (NIR) spectral data is proposed. In the first stage, the proposed multi-scale vertical energy thresholding (MSVET) procedure is used to reduce the dimension of the high-dimensional spectral data. In the second stage, a few important wavelet coefficients are selected using the proposed SVM gradient-recursive feature elimination (RFE).
Fourth, a novel methodology for discriminant analysis based on a human decision-making process, called PDCM, is proposed. The proposed methodology consists of three basic steps emulating the thinking process: perception, decision, and cognition. In these steps, two concepts, support vector machines for classification and information complexity, are integrated to evaluate learning models.
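To make the recursive feature elimination loop concrete, here is a minimal sketch. It is not the SVM-ICOMP-RFE method of the dissertation: as a stand-in, it ranks variables by the absolute weight of a plain logistic regression rather than by ICOMP, and the data set, `make_data`, `fit_logistic`, and `rfe` are all hypothetical names invented for illustration.

```python
import math
import random

def fit_logistic(X, y, epochs=200, lr=0.1):
    """Batch gradient descent for logistic regression without a bias term."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - yi) * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def rfe(X, y, n_keep):
    """Refit on the surviving features and drop the lowest-ranked one each round."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = fit_logistic(X_sub, y)
        # Assumed ranking criterion: smallest absolute weight (not ICOMP).
        worst = min(range(len(active)), key=lambda k: abs(w[k]))
        eliminated.append(active.pop(worst))
    return active, eliminated

def make_data(n=200, seed=0):
    """Hypothetical data: features 0 and 1 separate the groups, 2..5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 2.0 if label == 1 else -2.0
        row = [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(row)
        y.append(label)
    return X, y

X, y = make_data()
selected, eliminated = rfe(X, y, n_keep=2)
print("kept:", selected, "eliminated (in order):", eliminated)
```

The key design point is that the model is refitted after every removal, so the ranking always reflects the variables that are still in play.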
|
2 |
Plant-wide Performance Monitoring and Controller Prioritization
Pareek, Samidh (Unknown Date)
No description available.
|
3 |
Plant-wide Performance Monitoring and Controller Prioritization
Pareek, Samidh (06 1900)
Plant-wide performance monitoring has generated considerable interest in the control engineering community. The idea is to judge the performance of a plant as a whole rather than the performance of individual controllers. Data-based methods are currently used to generate a variety of statistical performance indices that help judge the performance of production units and control assets. However, so much information can be overwhelming if it lacks precision. Powerful computing and data storage capabilities have enabled industries to store huge amounts of data. Commercial performance monitoring software, such as that available from vendors including Honeywell, Matrikon, and ExperTune, typically uses this data to generate huge amounts of information. The problem of data overload has in this way turned into a problem of information overload. This work focuses on developing methods that reconcile these various statistical measures of performance and generate useful diagnostic measures in order to optimize the process performance of a unit or plant. These methods can also identify the relative importance of controllers in terms of how they affect the performance of the unit or plant under consideration. / Process Control
|
4 |
Machine Learning assisted system for the resource-constrained atrial fibrillation detection from short single-lead ECG signals
Abdukalikova, Anara (January 2018)
The integration of ICT advances into conventional healthcare systems is spreading rapidly. This trend is known as electronic health, or E-Health. E-Health solutions help to achieve the sustainability goal of increasing life expectancy while improving quality of life by providing constant healthcare monitoring. Cardiovascular diseases are among the leading causes of death, accounting for approximately 17.7 million deaths worldwide each year. This work studies the detection of one cardiovascular disease, atrial fibrillation (AF) arrhythmia. This type of arrhythmia severely affects heart health and can cause congestive heart failure (CHF) and stroke, and can even increase the risk of death. It is therefore important to detect AF as early as possible. In this thesis we studied various machine learning techniques for AF detection using only short single-lead electrocardiography (ECG) recordings. A web-based solution was built as a final prototype; it first simulates the reception of a recorded signal, then preprocesses it, predicts the presence of AF, and visualizes the result. The AF detection achieved a relatively high accuracy score, comparable to that of the state of the art. The work was based on an investigation of the proposed architectures and used the signal database from the 2017 PhysioNet/CinC Challenge. However, an additional constraint was added to the original problem formulation: because the system is intended for future deployment on resource-limited devices, the complexity of the computations performed to reach a prediction is restricted. This constraint was therefore taken into account during the development phase of the project.
|
5 |
Improvement of Bacteria Detection Accuracy and Speed Using Raman Scattering and Machine Learning
Mandour, Aseel (15 September 2022)
Bacteria identification plays an essential role in preventing health complications and saving patients' lives. The most widely used method to identify bacteria, the bacterial culture method, suffers from long processing times. Hence, an effective, rapid, and non-invasive method is needed as an alternative. Raman spectroscopy is a potential candidate for bacteria identification because it gives effective and rapid results and because, like a human fingerprint, the Raman spectrum is unique for every material.
In my lab at the University of Ottawa, we focus on the use of Raman scattering for biosensing in order to achieve high identification accuracy for different types of bacteria. Based on the unique Raman fingerprint of each bacteria type, different types of bacteria can be identified successfully. However, using the Raman spectrum to identify bacteria poses a few challenges. First, the Raman signal is weak, so enhancement of the signal intensity is essential, e.g., by using surface-enhanced Raman scattering (SERS). Moreover, the Raman signal can be contaminated by different noise sources. Also, the signal consists of a large number of features and is non-linear due to the correlation between the Raman features. Using machine learning (ML) along with SERS, we can overcome such challenges in the identification process and achieve high accuracy for the system identifying bacteria.
In this thesis, I present a method to improve the identification of different bacteria types using a support vector machine (SVM) ML algorithm based on SERS. I also present dimension reduction techniques to reduce the complexity and processing time while maintaining high identification accuracy in the classification process. I consider four bacteria types: Escherichia coli (EC), Cutibacterium acnes (CA, formerly known as Propionibacterium acnes), methicillin-resistant Staphylococcus aureus (MRSA), and methicillin-sensitive Staphylococcus aureus (MSSA). MRSA and MSSA are combined into a single class named MS in the classification. We focus on these types of bacteria because they are the most common in joint infections.
Using binary classification, I present the simulation results for three binary models: EC vs CA, EC vs MS, and MS vs CA. Using the full data set, binary classification achieved a classification accuracy of more than 95% for the three models. When the sample data set was reduced based on the samples' signal-to-noise ratio (SNR) to decrease complexity, a classification accuracy of more than 95% was achieved for the three models using less than 60% of the original data set. The recursive feature elimination (RFE) algorithm was then used to reduce the complexity in the feature dimension. Given that a small number of features were more heavily weighted than the rest, the number of features used in the classification could be significantly reduced while maintaining high classification accuracy.
I also present the classification accuracy of the multiclass one-versus-all (OVA) method, i.e., EC vs all, MS vs all, and CA vs all. Using the complete data set, the OVA method achieved a classification accuracy of more than 90%. As in the binary classification, dimension reduction was applied to the input samples. Using the SNR reduction, the input samples were reduced by more than 60% while maintaining classification accuracy higher than 80%. Furthermore, when the RFE algorithm was used to reduce the complexity in the feature dimension and only the top-weighted 5% of the features of the full data set were used, a classification accuracy of more than 90% was achieved. Finally, by combining both dimension reductions, the classification accuracy was above 92% for a significantly reduced data set.
Both the dimension reduction and the improvement in classification accuracy between different types of bacteria using the ML algorithm and SERS could have a significant impact in fulfilling the demand for accurate, fast, and non-destructive identification of bacteria samples in the medical field, in turn potentially reducing health complications and saving patient lives.
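The one-versus-all scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: a simple centroid-based scorer stands in for the SVM, the two-dimensional points are invented rather than real Raman spectra, and the helper names (`fit_ova`, `predict_ova`) are hypothetical. Only the class labels EC, MS, and CA come from the abstract.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    d = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(d)]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_ova(X, labels):
    """One binary model per class: that class against all remaining samples."""
    models = {}
    for c in sorted(set(labels)):
        pos = [x for x, l in zip(X, labels) if l == c]
        neg = [x for x, l in zip(X, labels) if l != c]
        # Stand-in for an SVM: remember the centroid of each side.
        models[c] = (centroid(pos), centroid(neg))
    return models

def predict_ova(models, x):
    """Pick the class whose 'class vs all' model gives the highest score."""
    def score(c):
        pos_c, neg_c = models[c]
        return sq_dist(x, neg_c) - sq_dist(x, pos_c)
    return max(models, key=score)

# Hypothetical toy features, three samples per class.
X = [[0, 0], [1, 0], [0, 1],      # EC
     [5, 0], [6, 0], [5, 1],      # MS (MRSA and MSSA combined)
     [0, 5], [1, 5], [0, 6]]      # CA
labels = ["EC"] * 3 + ["MS"] * 3 + ["CA"] * 3

models = fit_ova(X, labels)
predictions = [predict_ova(models, p) for p in ([0.5, 0.5], [5.5, 0.5], [0.5, 5.5])]
print(predictions)
```

Each class gets its own binary decision function, and the final label is the one whose function is most confident, which is what distinguishes OVA from the pairwise binary models (EC vs CA, and so on).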
|
6 |
Prognostic Stratification in Patients with Left Heart Disease : A Machine Learning Approach / Prognostisk stratifiering hos patienter med vänstersidig hjärtsvikt : En maskininlärningsmetod
Saleh, Mariam (January 2024)
Left heart disease often results in left heart failure and right ventricular dysfunction, which is challenging to diagnose with traditional diagnostic approaches. To address this, a novel empirical 4-point right ventricular dysfunction score was created at Sahlgrenska University Hospital to overcome the limitations of single variables for diagnosing right ventricular dysfunction. In this study, we used machine learning, more specifically XGBoost coupled with interactive machine learning, to develop four different models for predicting death or receipt of a left ventricular assist device in patients with left heart disease (n=486). Features were selected from the dataset using recursive feature elimination with the default number of features. The initial model with 29 features, called the baseline model, served as the foundation for the three additional models, each adjusted based on feedback from a clinician. The first step of feedback was to remove highly correlated features, creating a modified model with 12 features. The second step was to use 12 well-known characteristics of left and right ventricular dysfunction, creating an empirical model, and to adjust the prediction threshold from 50% to 60%. The third step was to reduce the number of features to 5 on empirical grounds, creating a simplified model. The models were compared to the right ventricular dysfunction score using the metrics area under the curve, F1 score, positive likelihood ratio, and negative likelihood ratio. The predictive ability of the machine learning models was superior to that of the right ventricular dysfunction score. The results also indicated that the models neither improved nor deteriorated when the number of features was reduced. However, insufficient accuracy indicates that none of the machine learning models is clinically viable.
These results show the potential of machine learning for enhancing prognostic stratification in patients with left heart disease, although further refinement is necessary for clinical use.
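As an illustration of the comparison metrics named above, the sketch below computes the F1 score and the positive and negative likelihood ratios from predicted probabilities at the 50% and 60% thresholds. The labels and probabilities are invented for the example, `evaluate` is a hypothetical helper, and treating a probability equal to the threshold as positive is an assumption.

```python
def evaluate(y_true, y_prob, threshold):
    """Confusion-matrix metrics for a binary classifier at a given threshold."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0   # assumption: ties count as positive
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    # No zero-division guards: the toy data keeps every denominator positive.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "lr_pos": sensitivity / (1 - specificity),   # positive likelihood ratio
        "lr_neg": (1 - sensitivity) / specificity,   # negative likelihood ratio
    }

# Hypothetical outcomes (1 = event) and model probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.65, 0.55, 0.62, 0.4, 0.3, 0.2, 0.1, 0.58]

at_50 = evaluate(y_true, y_prob, threshold=0.5)
at_60 = evaluate(y_true, y_prob, threshold=0.6)
print(at_50)
print(at_60)
```

In this toy example, raising the threshold from 50% to 60% trades sensitivity for specificity: the positive likelihood ratio rises while the negative likelihood ratio moves away from zero.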
|
7 |
Automatic Flight Maneuver Identification Using Machine Learning Methods
Bodin, Camilla (January 2020)
This thesis proposes a general approach to solving the offline flight-maneuver identification problem using machine learning methods. The purpose of the study was to provide means for the aircraft professionals at the flight test and verification department of Saab Aeronautics to automate the procedure of analyzing flight test data. The suggested approach succeeded in generating binary classifiers and multiclass classifiers that identified six flight maneuvers of different complexity from real flight test data. The binary classifiers solved the problem of identifying one maneuver at a time from flight test data, while the multiclass classifiers solved the problem of identifying several maneuvers simultaneously. To achieve these results, the difficulties that this time-series classification problem entailed were simplified using different strategies. One strategy was to develop a maneuver extraction algorithm that used handcrafted rules. Another strategy was to represent the time-series data by statistical measures. There was also the issue of an imbalanced dataset, in which one class far outweighed the others in number of samples. This was solved by applying a modified oversampling method to the dataset used for training. Logistic Regression, Support Vector Machines with both linear and nonlinear kernels, and Artificial Neural Networks were explored; the hyperparameters for each machine learning algorithm were chosen during model estimation by 4-fold cross-validation and by solving an optimization problem based on important performance metrics. A feature selection algorithm was also used during model estimation to evaluate how performance changes with the number of features used. The machine learning models were then evaluated on test data consisting of 24 flight tests. The results on the test data set showed that the simplifications were reasonable, but that the maneuver extraction algorithm could sometimes fail.
Some maneuvers were easier to identify than others and the linear machine learning models resulted in a poor fit to the more complex classes. In conclusion, both binary classifiers and multiclass classifiers could be used to solve the flight maneuver identification problem, and solving a hyperparameter optimization problem boosted the performance of the finalized models. Nonlinear classifiers performed the best on average across all explored maneuvers.
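The class-imbalance step can be illustrated with plain random oversampling of the training set. This is a generic sketch, not the modified oversampling method used in the thesis, whose details the abstract does not give; the maneuver names, sample counts, and the `random_oversample` helper are hypothetical.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, l in zip(X, y) if l == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

# Hypothetical training set: one dominant class and two rare maneuvers.
y_train = ["none"] * 8 + ["roll"] * 2 + ["loop"] * 1
X_train = [[float(i)] for i in range(len(y_train))]

X_bal, y_bal = random_oversample(X_train, y_train)
balanced_counts = Counter(y_bal)
print(balanced_counts)
```

Oversampling is applied to the training data only; the test flights keep their natural class proportions so that the evaluation reflects real conditions.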
|
8 |
Predicting Workforce in Healthcare : Using Machine Learning Algorithms, Statistical Methods and Swedish Healthcare Data / Predicering av Arbetskraft inom Sjukvården genom Maskininlärning, Statistiska Metoder och Svenska Sjukvårdsstatistik
Diskay, Gabriel; Joelsson, Carl (January 2023)
This study examines the use of machine learning models to predict workforce trends in the healthcare sector in Sweden. Using a Linear Regression model, a Gradient Boosting Regressor model, and an Exponential Smoothing model, the research aims to provide important insights to inform macroeconomic considerations and to give a deeper understanding of the Beveridge curve in the context of the healthcare sector. Despite some challenges with the data, the goal is to improve the accuracy and efficiency of decision-making concerning the labor market. The results of this study demonstrate the potential of machine learning for forecasting in an economic context, although inherent limitations and ethical considerations are taken into account.
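Of the three models mentioned, exponential smoothing is the easiest to show in a few lines. The sketch below implements simple exponential smoothing, where each smoothed value is s_t = alpha * x_t + (1 - alpha) * s_(t-1) and the one-step-ahead forecast is the last smoothed value. Whether the study used this simple form or a trend or seasonal variant is not stated in the abstract, and the series and the value of `alpha` are invented for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed series; the first value initializes the level."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical yearly workforce counts.
workforce = [100, 110, 120, 130]
smoothed = simple_exponential_smoothing(workforce, alpha=0.5)
forecast_next = smoothed[-1]   # flat one-step-ahead forecast

print(smoothed)
print(forecast_next)
```

A larger `alpha` makes the forecast follow recent observations more closely, while a smaller one averages over a longer history.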
|
9 |
Data-Driven Success in Infrastructure Megaprojects. : Leveraging Machine Learning and Expert Insights for Enhanced Prediction and Efficiency / Datadriven framgång inom infrastrukturmegaprojekt. : Utnyttja maskininlärning och expertkunskap för förbättrad prognostisering och effektivitet.
Nordmark, David E.G. (January 2023)
This Master's thesis uses random forest and leave-one-out cross-validation to predict the success of infrastructure megaprojects. The goal was to enhance the efficiency of the design and engineering phase in the infrastructure and construction industries. Because the sample of megaprojects is small and data sharing is limited, the lack of data poses significant challenges for applying artificial intelligence to the evaluation and prediction of megaprojects. This thesis explores how megaprojects can benefit from data collection and machine learning despite small sample sizes. The research focused on analyzing data from thirteen megaprojects and identifying the most influential data for machine learning analysis. The results show that incorporating expert data, representing critical success factors for megaprojects, significantly enhanced the accuracy of the predictive model. The superior performance of expert data over economic data, experience data, and documentation data demonstrates the significance of domain expertise. In addition, feature selection techniques and feature importance scores demonstrate the significance of the planning phase. In the planning phase, a small, devoted, and highly experienced team of project planners proved to be a crucial factor for project success. The thesis concludes that, for companies to maximize the utility of machine learning, they must identify their critical success factors and collect the corresponding data. / This Master's thesis investigates the following research question: How can machine learning and insightful data analysis be used to increase efficiency in the planning and design phase of the infrastructure sector? This challenge is addressed by analyzing data from real megaprojects and applying advanced machine learning algorithms to predict project success and identify the success factors.
Our research is particularly interested in megaprojects because of their complicated nature, unique characteristics, and enormous impact on society. These projects are rarely completed, which makes it difficult to obtain large amounts of real data. It is clear that AI has the potential to be an invaluable tool for understanding and managing the complexity of megaprojects, despite the problems we face. Artificial intelligence makes it possible to make decisions that are data-driven and better informed. The thesis succeeds in addressing the major problem of the lack of data from megaprojects. The thesis is also motivated by this lack of data, which makes the research relevant to other areas characterized by small data samples. The results of the thesis show that the evaluation of megaprojects can be improved through smart use of specific data attributes. The thesis also encourages companies to begin collecting important data so that they can use artificial intelligence and machine learning to their advantage.
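Leave-one-out cross-validation suits a sample as small as thirteen projects because every project serves once as the test case while the other twelve are used for training. The sketch below shows the procedure under stated assumptions: a 1-nearest-neighbour classifier stands in for the random forest to keep the example self-contained, and the single "expert score" feature and the success labels are invented, not taken from the thesis.

```python
def nearest_neighbour_predict(train_X, train_y, x):
    """Label of the closest training sample (stand-in for a random forest)."""
    def dist(a):
        return sum((ai - xi) ** 2 for ai, xi in zip(a, x))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i]))
    return train_y[best]

def leave_one_out(X, y):
    """Hold out each sample once, train on the rest, count correct predictions."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_neighbour_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct

# Thirteen hypothetical projects: one expert-rated score each, 1 = success.
scores = [0.9, 0.8, 0.85, 0.7, 0.75, 0.5, 0.65,
          0.2, 0.3, 0.25, 0.4, 0.35, 0.1]
X = [[s] for s in scores]
y = [1] * 7 + [0] * 6

correct = leave_one_out(X, y)
accuracy = correct / len(X)
print(f"{correct} of {len(X)} held-out projects predicted correctly")
```

With so few samples, each held-out prediction shifts the accuracy by about 7.7 percentage points, so results of this kind are best read as indicative rather than precise.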
|