21

Prediction of battery lifetime using early cycle data: A data-driven approach

Enholm, Isabelle, Valfridsson, Olivia January 2022 (has links)
Laboratory tests are performed to determine battery degradation due to repeated charging and discharging (cycling). This is done as part of quality assurance in battery production, since a certain amount of degradation corresponds to the end of the battery lifetime. Currently, this requires a significant amount of cycling; if the number of cycles required can be decreased, the time and cost of battery degradation testing can be reduced. The aim of this thesis is therefore to create a model that predicts battery lifetime from early cycle data. To assist planning of cycle-testing capacity, the study also examines the impact of deploying such a prediction model in production. To determine which data-driven model should be used to predict battery lifetime at the company, extensive feature engineering is performed on measurements from specific cycles, inspired by the previous work of Severson et al. (2019) and Fei et al. (2021). Two models are then examined: Linear Regression with Elastic Net regularization and Support Vector Regression. To investigate how such a model affects battery testing capacity, two scenarios are compared: the company's current cycle testing, and cycle testing with a prediction model in place. The comparison examines the time required for battery testing and the number of machines needed to cycle the batteries (cyclers). Based on the results obtained, the model that should be implemented is a Support Vector Regression model with features relating to different battery cycling phases and measurements, such as the charge process, temperature, and capacity. The results also show that implementing a battery lifetime prediction model can reduce the time and number of cyclers required for testing by approximately 93% compared to traditional testing.
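As an illustration of the modeling setup described in this abstract (not the authors' actual pipeline), the sketch below fits both an Elastic Net and an SVR model on hypothetical early-cycle features; the data and feature meanings are placeholders.

```python
# Minimal sketch, assuming scikit-learn; features and data are
# hypothetical placeholders, not the thesis's dataset.
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Each row: early-cycle features for one cell (e.g., capacity-fade slope,
# charge-time change, temperature statistics over the first cycles).
X = rng.normal(size=(100, 5))
# Target: cycles until end of life (synthetic).
y = 1000 + 200 * X[:, 0] - 50 * X[:, 1] + rng.normal(scale=20, size=100)

models = {
    "elastic_net": make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, l1_ratio=0.5)),
    "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=5.0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print(f"{name}: RMSE = {-scores.mean():.1f} cycles")
```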
22

Intelligent Design of Metal Oxide Gas Sensor Arrays Using Reciprocal Kernel Support Vector Regression

Dougherty, Andrew W. 02 November 2010 (has links)
No description available.
23

A comparative analysis on the predictive performance of LSTM and SVR on Bitcoin closing prices.

Rayyan, Hakim January 2022 (has links)
Bitcoin has, since its inception in 2009, seen its market capitalisation rise to a staggering 846 billion US dollars, making it the world's leading cryptocurrency. This has attracted financial analysts as well as researchers to experiment with different models, with the aim of developing one capable of predicting Bitcoin closing prices. The aim of this thesis was to examine how well the LSTM and SVR models performed in predicting Bitcoin closing prices. As measures of performance, the RMSE, NRMSE and MAPE were used, with a random walk without drift as a benchmark to further contextualise the performance of both models. The empirical results show that the random walk without drift yielded the best results for both the RMSE and NRMSE, scoring 1624.638 and 0.02525 respectively, while the LSTM outperformed both the random walk without drift and the SVR model in terms of the MAPE, scoring 0.0272 against 0.0274 for the random walk without drift and the SVR alike. Given the performance of the random walk against both models, it cannot be inferred that the LSTM and SVR models yielded statistically significant predictions.
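For reference, the three error measures named in this abstract can be computed as below. This is a generic sketch on placeholder prices, not the thesis's code; the random walk without drift simply forecasts each day's close as the previous day's close.

```python
# Minimal sketch of the reported metrics; prices are placeholders.
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def nrmse(y_true, y_pred):
    # Normalised here by the range of the observed series (one common choice).
    return rmse(y_true, y_pred) / float(y_true.max() - y_true.min())

def mape(y_true, y_pred):
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

prices = np.array([46300.0, 46850.0, 47100.0, 46500.0, 47250.0])
baseline = prices[:-1]   # random walk without drift: yesterday's close
actual = prices[1:]
print(rmse(actual, baseline), nrmse(actual, baseline), mape(actual, baseline))
```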
24

Dynamic Load Modeling from PSSE-Simulated Disturbance Data using Machine Learning

Gyawali, Sanij 14 October 2020 (has links)
Load models have evolved from the simple ZIP model to a composite model that incorporates the transient dynamics of motor loads. This research utilizes recent advances in machine learning to build a reliable and accurate composite load model. A composite load model is a combination of a static (ZIP) model paralleled with a dynamic model. The dynamic model, recommended by the Western Electricity Coordinating Council (WECC), is an induction motor representation. In this research, a dual-cage induction motor with 20 parameters pertaining to its dynamic behavior, starting behavior, and per-unit calculations is used as the dynamic model. Machine learning algorithms require a large amount of data, but the required PMU field data and the corresponding system models are considered Critical Energy Infrastructure Information (CEII), and access to them is limited. The next best option for obtaining the required amount of data is a simulation environment such as PSSE. The IEEE 118 bus system is used as a test setup in PSSE, and dynamic simulations generate the required data samples. Each sample contains data on bus voltage, bus current, and bus frequency, with the corresponding induction motor parameters as target variables. It was determined that an Artificial Neural Network (ANN) with a multivariate-input, single-parameter-output approach worked best. A Recurrent Neural Network (RNN) was also tested side by side to see whether the additional timestamp information would help the model's predictions. Moreover, a different definition of the dynamic model, based on a transfer function, is also studied. Here, the dynamic model is defined as a mathematical representation of the relation between bus voltage, bus frequency, and the active/reactive power flowing in the bus. With this form of load representation, Long Short-Term Memory (LSTM), a variation of the RNN, performed better than competing algorithms such as Support Vector Regression (SVR). The result of this study is a load model consisting of parameters defining the load at a load bus, whose predicted parameters are compared against the simulated parameters to examine their validity for use in contingency analysis. / Master of Science / Independent System Operators (ISOs) and Distribution System Operators (DSOs) have a responsibility to provide an uninterrupted power supply to consumers. To meet that responsibility while keeping operating costs to a minimum, engineers and planners study the system beforehand and seek the optimum capacity for each of the power system elements, such as generators, transformers, and transmission lines. They then test the overall system using power system models, which are mathematical representations of the real components, to verify the stability and strength of the system. However, the verification is only as good as the system models used. As most power system components are controlled by the operators themselves, it is easy to develop models from their perspective. The load is the only component controlled by consumers; hence the need for better load models. Several studies have been made on static load modeling, and their performance is on par with real behavior. But dynamic loading, a load behavior that depends on time, is rather difficult to model. Some attempts at dynamic load modeling already exist: physical component-based and mathematical transfer-function-based dynamic models are quite widely used, and these load structures are largely accepted as good representations of a system's dynamic behavior.
With a load structure in hand, the next task is estimating its parameters. In this research, we tested new machine learning methods to estimate the parameters accurately. Thousands of simulated data samples were used to train the machine learning models; after training, we validated the models on unseen data. The study concludes by recommending better methods for load modeling.
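A minimal sketch of the multivariate-input, single-parameter-output regression setup this abstract describes (placeholder data and shapes, not the thesis's PSSE pipeline): each training sample stacks bus voltage, current, and frequency measurements, and the model predicts one induction-motor parameter.

```python
# Hypothetical sketch: predict one induction-motor parameter from
# simulated bus measurements. Shapes and names are placeholders.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# 5000 disturbance simulations, each flattened to 60 time samples x
# 3 channels (bus voltage, bus current, bus frequency) = 180 features.
X = rng.normal(size=(5000, 180))
# Synthetic target standing in for one motor parameter (e.g., a rotor resistance).
y = 0.5 + 0.01 * X[:, :10].sum(axis=1) + rng.normal(scale=0.01, size=5000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
ann = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
ann.fit(X_tr, y_tr)
print("R^2 on held-out simulations:", ann.score(X_te, y_te))
```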
25

Prediction of Human Hand Motions based on Surface Electromyography

Wang, Anqi 29 June 2017 (has links)
Tracking human hand motions has attracted increasing attention due to recent advancements in virtual reality (Rheingold, 1991) and prosthesis control (Antfolk et al., 2010). Surface electromyography (sEMG) has been the predominant method for sensing electrical activity in biomechanical studies, and has also been applied to motion tracking in recent years. While most studies focus on the classification of human hand motions within a predefined motion set, the prediction of continuous finger joint angles and wrist angles remains a challenging endeavor. In this research, a biomechanical knowledge-driven data fusion strategy is proposed to predict finger joint angles and wrist angles. This strategy combines time series data of sEMG signals with simulated muscle features, which can be extracted from a biomechanical model available in OpenSim (Delp et al., 2007). A support vector regression (SVR) model is used first to predict muscle features from sEMG signals and then to predict joint angles from the estimated muscle features. A set of motion data containing 10 types of motions from 12 participants was collected in an institutional review board-approved experiment. A hypothesis was tested to validate whether adding the simulated muscle features would significantly improve prediction performance. The study indicates that the biomechanical knowledge-driven data fusion strategy improves the prediction of new types of human hand motions. The results indicate that the proposed strategy significantly outperforms the benchmark data-driven model, especially when users were performing types of motions unseen during model training. The proposed model provides a possible approach to integrating simulation models and data fusion models in human factors and ergonomics. / Master of Science
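The two-stage mapping described above (sEMG signals to muscle features, then muscle features to joint angles) can be sketched as two chained SVR models. Everything below is a placeholder illustration with synthetic data, not the study's actual features or OpenSim outputs.

```python
# Hypothetical two-stage SVR sketch: sEMG -> muscle features -> joint angles.
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(2)
emg = rng.normal(size=(500, 8))           # 8 sEMG channel features per window
muscle = emg @ rng.normal(size=(8, 4))    # 4 simulated muscle features (stand-in)
angles = muscle @ rng.normal(size=(4, 2)) # 2 joint angles (e.g., finger joint, wrist)

# Stage 1: predict muscle features from sEMG; stage 2: angles from muscle features.
stage1 = MultiOutputRegressor(SVR(kernel="rbf")).fit(emg, muscle)
stage2 = MultiOutputRegressor(SVR(kernel="rbf")).fit(stage1.predict(emg), angles)

new_emg = rng.normal(size=(1, 8))
print(stage2.predict(stage1.predict(new_emg)))
```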
26

Surface Based Decoding of Fusiform Face Area Reveals Relationship Between SNR and Accuracy in Support Vector Regression

Eltahir, Amnah 24 May 2018 (has links)
The objective of this study was to expand on a method previously established in the lab for predicting subcortical structures using functional magnetic resonance imaging (fMRI) data restricted to the cortical surface. Our goal is to enhance the utility of low-cost, portable imaging modalities, such as functional near-infrared spectroscopy (fNIRS), which are limited in signal penetration depth. Previous work in the lab successfully employed functional connectivity to predict ten resting state networks and six anatomically defined structures from the outer 10 mm layer of cortex using resting state fMRI data. The novelty of this study was two-fold: we chose to predict the functionally defined region fusiform face area (FFA), and we utilized the functional connectivity of both resting state and task activation. Right FFA was identified for 27 subjects using a general linear model of a functional localizer task, and the average time series was extracted from right FFA and used as training and testing labels in support vector regression (SVR) models. Both resting state and task data decoded activity in right FFA above chance, both within and between run types. Our method is not specific to resting state, potentially broadening the scope of research questions that depth-limited techniques can address. We observed cross-validation accuracies similar to previous work in the lab, and characterized the relationship between prediction accuracy and spatial signal-to-noise ratio (SNR). We found that this relationship varied between resting state and task, as well as with the functional type of features included in SVR modeling. / Master of Science / We used functional magnetic resonance imaging (fMRI) to predict activity in a deep brain region based on activity along the brain surface. This would increase the types of brain function a person could study using alternative methods that are less costly and easier to use, but can only detect signals along the surface. We were able to use this method to predict the fusiform face area, a region of the brain that responds more strongly to face images than to other types of images. We also found a relationship between the quality of spatial information in the brain and the accuracy of predictions. This relationship differed depending on which types of brain regions were used to build the models, as well as on whether the subjects were performing a task or resting during the scan.
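A simplified sketch of the decoding setup this abstract describes (not the lab's pipeline): an SVR model is trained to predict a deep region's average time series from surface-restricted signals, and accuracy is scored as the correlation between predicted and actual series. All data here are synthetic placeholders.

```python
# Hypothetical surface-based decoding sketch: predict a deep region's
# time series from cortical-surface signals. Synthetic data only.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
surface = rng.normal(size=(300, 50))  # 300 fMRI volumes x 50 surface features
# Target region's time series, weakly coupled to the surface signals.
deep_ts = (surface @ rng.normal(size=50)) * 0.1 + rng.normal(scale=0.5, size=300)

# No shuffling: keep the temporal split between train and test runs.
X_tr, X_te, y_tr, y_te = train_test_split(surface, deep_ts, shuffle=False)
model = SVR(kernel="linear", C=1.0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("decoding accuracy r =", np.corrcoef(pred, y_te)[0, 1])
```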
27

Metodologia computacional para detecção e diagnóstico automáticos e planejamento cirúrgico do estrabismo / Computational Methods for Automatic Detection, Diagnosis, and Surgical Planning of Strabismus

ALMEIDA, João Dallyson Sousa de 05 July 2013 (has links)
Strabismus is a condition that affects approximately 4% of the population, causing aesthetic problems, reversible at any age, and irreversible sensory changes that modify the mechanism of vision. The Hirschberg test is one of the existing examinations for detecting this condition. Computer-aided detection and diagnosis systems are being used with some success to assist health professionals; however, the routine use of high-tech resources for diagnosis and therapy is not yet a reality within the strabismus subspecialty of ophthalmology. This thesis therefore presents a methodology to detect and automatically diagnose strabismus, and to propose a surgical plan, from digital images. The study is organized in seven stages: (1) face segmentation; (2) eye region detection; (3) eye location; (4) limbus and corneal light reflex (glint) location; (5) detection, (6) diagnosis, and (7) surgical planning of strabismus. The effectiveness of the method in indicating the diagnosis and surgical plan was evaluated as the mean difference between the results provided by the methodology and the expert's original indication. Patients were evaluated in the gaze positions PPO, INFRA, SUPRA, DEXTRO, and LEVO. The method was 88% accurate in identifying esotropias (ET), 100% in exotropias (XT), 80.33% in hypertropias (HT), and 83.33% in hypotropias (HoT). The overall average diagnostic error was 5.6 for horizontal deviations and 3.83 for vertical deviations. In planning surgeries of the medial rectus muscles, the average error was 0.6 mm for recession and 0.9 mm for resection; for the lateral rectus muscles, the average error was 0.8 mm for recession and 1 mm for resection.
28

Modeling Melodic Accents in Jazz Solos / Modellering av melodiska accenter i jazzsolon

Berrios Salas, Misael January 2023 (has links)
This thesis examines how accurately one can model accents in jazz solos, more specifically the sound level. A deeper understanding of the structure of jazz solos offers a way of pedagogically presenting differences between music styles and even between performers. Some studies have tried to model perceived accents in different music styles, that is, to model how listeners perceive some tones as accentuated and more important than others. Other studies have looked at how the sound level correlates with other attributes of the tone. To our knowledge, however, no prior study has modeled actual accents in jazz solos, nor used as large an amount of training data. The training data is a set of 456 solos from the Weimar Jazz Database, a database containing tone data and metadata from monophonic solos performed on multiple instruments. The features used for the learning algorithms come from three sources, and a comparison between them is made: features obtained from the software Director Musices, created at the Royal Institute of Technology in Sweden; features obtained from the software "melfeature", created at the University of Music Franz Liszt Weimar in Germany; and features built from tone data or solo metadata in the Weimar Jazz Database. Three learning algorithms are used: Multiple Linear Regression (MLR), Support Vector Regression (SVR), and eXtreme Gradient Boosting (XGBoost). The first two are simpler regression models, while the last is an award-winning tree boosting algorithm. In the tests, XGBoost achieved the highest accuracy when combining all available features, minus some features that were removed because they did not improve accuracy. The accuracy was around 27%, with a high standard deviation: prediction quality varied considerably between solos, with some reaching an accuracy of about 67% while in others not a single tone was predicted correctly. As a general model, however, the accuracy is too low for practical use; either the methods were not optimal, or jazz solos differ too much for a general pattern to be found.
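A hedged sketch of the three-model comparison described above, on placeholder per-tone features rather than the Director Musices, melfeature, or Weimar Jazz Database features the study actually used:

```python
# Hypothetical comparison of MLR, SVR, and XGBoost on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor  # assumes the xgboost package is installed

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 20))  # per-tone features
# Synthetic stand-in for per-tone sound level.
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=2000)

models = {
    "MLR": LinearRegression(),
    "SVR": SVR(kernel="rbf"),
    "XGBoost": XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {r2.mean():.3f} (+/- {r2.std():.3f})")
```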
29

Effect of a diffuser on the power production of an ocean current turbine

Reinecke, Josh 2011 (has links)
Thesis (MScEng (Mechanical and Mechatronic Engineering))--University of Stellenbosch, 2011. / Please refer to full text to view abstract.
30

Σχεδιασμός, υλοποίηση και εφαρμογή μεθόδων υπολογιστικής νοημοσύνης για την πρόβλεψη παθογόνων μονονουκλεοτιδικών πολυμορφισμών / Design, Implementation and Application of Computational Intelligence Methods for the Prediction of Pathogenic Single Nucleotide Polymorphisms

Ραπακούλια, Τρισεύγενη 11 October 2013 (has links)
Single Nucleotide Polymorphisms (SNPs) are the most common form of genetic variation in humans. The number of SNPs found in the human genome that affect protein functionality is constantly increasing, but matching SNPs to diseases with experimental techniques is prohibitively expensive in terms of time and cost. For this reason, several computational methods have been developed that classify polymorphisms as pathogenic or non-pathogenic. Most of them use classifiers that take as input a set of structural, functional, sequence, and evolutionary features and predict whether a single nucleotide polymorphism is pathogenic or neutral. For training, these classifiers use two sets of SNPs: the first consists of SNPs that have been experimentally proven pathogenic, whereas the second consists of SNPs experimentally characterized as benign. These methods differ in the classification techniques they deploy and in the features they use as inputs. Their main shortcoming, however, is that they determine their feature sets empirically; different methods propose and use different features without adequately documenting the reasons for this divergence. In addition, the existing methodologies do not efficiently tackle the class imbalance between positive and negative training sets, or the problem of missing values in the datasets. In this thesis a new hybrid computational intelligence methodology is proposed that overcomes many of these problems, achieves high classification performance, and systematizes the selection of relevant features. In the first phase of this study, the polymorphisms were gathered from the available public databases and used for training and testing the machine learning models. Specifically, the positive and negative training and test sets, consisting of single nucleotide polymorphisms that either lead to pathogenesis or are neutral, were collected and filtered. For each polymorphism in the two sets, a wide range of structural, functional, sequence, and evolutionary features was calculated using existing tools; for those features with no available tool, suitable code was developed to compute them.
In the second phase, a new embedded hybrid classification method called EnsembleGASVR is designed and implemented. The method uses an ensemble methodology based on a hybrid combination of Genetic Algorithms and nu-Support Vector Regression (nu-SVR) models. An Adaptive Genetic Algorithm is used to determine the optimal subset of features and the optimal values of the classifiers' parameters. We propose the nu-SVR classifier since it exhibits high performance and good generalization ability, is not trapped in local optima, and balances model accuracy against complexity. To overcome the problems of missing values and class imbalance, we extended the hybrid algorithm to function as a collective ensemble technique combining eight individual classification models. Overall, the method achieves 87.45% accuracy, 71.78% sensitivity, and 93.16% specificity. These preliminary results are very promising and show that the EnsembleGASVR methodology significantly outperforms other well-known classification methods for pathogenic mutations.
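A heavily simplified sketch of the genetic-algorithm-plus-nu-SVR idea described above, using a minimal selection-and-mutation loop over binary feature masks and synthetic data; this is an illustration of the general technique, not the EnsembleGASVR implementation.

```python
# Toy sketch of GA-driven feature selection wrapped around nu-SVR
# (NuSVR from scikit-learn); synthetic data only.
import numpy as np
from sklearn.svm import NuSVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 15))
# Regression scores in (0, 1), later thresholded into benign/pathogenic.
y = 1 / (1 + np.exp(-(X[:, 0] + 2 * X[:, 3] - X[:, 7])))

def fitness(mask):
    """Cross-validated R^2 of a nu-SVR trained on the masked feature subset."""
    if not mask.any():
        return -np.inf
    model = NuSVR(nu=0.5, C=1.0, kernel="rbf")
    return cross_val_score(model, X[:, mask], y, cv=3, scoring="r2").mean()

# Minimal genetic loop: keep the fittest masks, mutate them by bit flips.
pop = rng.random((20, X.shape[1])) < 0.5
for generation in range(10):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]   # fittest half survives
    children = parents.copy()
    children ^= rng.random(children.shape) < 0.1  # mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```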
