31 |
Design, Implementation and Application of Computational Intelligence Methods for the Prediction of Pathogenic Single Nucleotide Polymorphisms. Ραπακούλια, Τρισεύγενη (Rapakoulia, Trisevgeni) 11 October 2013 (has links)
Single Nucleotide Polymorphisms (SNPs) are the most common form of genetic variation in humans. The number of SNPs that have been found in the human genome and affect the produced protein is constantly increasing, but matching SNPs to diseases with experimental methods is prohibitively expensive in terms of time and cost. For this reason, several computational methods have been developed to classify SNPs as pathogenic or non-pathogenic. Most of them use classifiers that take as input a set of structural, functional, sequential and evolutionary features and predict whether a single nucleotide polymorphism is pathogenic or neutral. For training, these classifiers use two sets of SNPs: the first consists of SNPs that have been experimentally proven to be pathogenic, whereas the second consists of SNPs that have been experimentally characterized as benign. The methods differ in the classification techniques they deploy and in the features they use as inputs. Their main weakness, however, is that they determine the input features of their classifiers empirically: different methods propose and use different feature sets without adequately documenting the reasons for this differentiation. In addition, the existing methodologies do not efficiently tackle the class imbalance between the positive and negative training sets or the problem of missing values in many of the input features, both of which are needed for more accurate and reliable results. It is therefore clear that there is considerable room for improvement over the existing methodologies for this classification problem.
In this thesis, a new hybrid computational intelligence methodology is proposed that overcomes many of the problems of existing methodologies. The proposed method achieves high classification performance and systematizes the selection of relevant features. In the first phase of this study, the polymorphisms used for training and testing the machine learning models were gathered from the available public databases. Specifically, the positive and negative training and test sets were collected and filtered; they consist of single nucleotide polymorphisms that either lead to pathogenesis or are neutral. For each polymorphism of the two sets, a wide range of structural, functional, sequential and evolutionary features were calculated using existing available tools. For those features for which no tool was available, the necessary code was developed to compute them.
In the second step, a new embedded hybrid classification method called EnsembleGASVR is designed and implemented. It is an ensemble methodology based on the hybrid combination of Genetic Algorithms and nu-Support Vector Regression (nu-SVR) models. An Adaptive Genetic Algorithm is used to determine the optimal subset of features and the optimal values of the classifier parameters. We propose the nu-SVR classifier since it exhibits high performance and good generalization ability, is not trapped in local optima, and achieves a balance between the accuracy and the complexity of the model. To overcome the problems of missing values and class imbalance, we extended the above algorithm to function as a collective ensemble technique combining eight individual classification models. Overall, the method achieves 87.45% accuracy, 71.78% sensitivity and 93.16% specificity. These preliminary results are very promising and show that the EnsembleGASVR methodology significantly outperforms other well-known classification methods for pathogenic mutations.
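As a rough illustration of the ensemble idea described above, the following is a minimal sketch, not the thesis implementation: eight nu-SVR models are each trained on a rebalanced subsample of the data, the genetic-algorithm search over feature subsets and hyperparameters is reduced to a toy random choice, and the averaged regression output is thresholded to label a SNP as pathogenic or neutral. All names, subset sizes and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import NuSVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

def fit_ensemble(X, y, n_models=8, n_feats=10):
    """y is +1 (pathogenic) / -1 (neutral); requires n_feats <= X.shape[1]."""
    pos, neg = np.where(y == 1)[0], np.where(y == -1)[0]
    n = min(len(pos), len(neg))                # undersample the majority class
    models, subsets = [], []
    for _ in range(n_models):
        idx = np.concatenate([rng.choice(pos, n, replace=False),
                              rng.choice(neg, n, replace=False)])
        feats = rng.choice(X.shape[1], size=n_feats, replace=False)  # stand-in for GA feature selection
        m = make_pipeline(StandardScaler(), NuSVR(nu=0.5, C=1.0, gamma="scale"))
        m.fit(X[np.ix_(idx, feats)], y[idx])   # regression on the +1/-1 labels
        models.append(m)
        subsets.append(feats)
    return models, subsets

def predict_ensemble(models, subsets, X):
    scores = np.mean([m.predict(X[:, f]) for m, f in zip(models, subsets)], axis=0)
    return np.where(scores >= 0, 1, -1)        # averaged output thresholded at 0
```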
|
32 |
Modelling And Predicting Binding Affinity Of PCP-like Compounds Using Machine Learning Methods. Erdas, Ozlem 01 September 2007 (has links) (PDF)
Machine learning methods have been promising tools in science and engineering fields, and their use in chemistry and drug design has advanced since the 1990s. In this study, molecular electrostatic potential (MEP) surfaces of PCP-like compounds are modelled and visualized in order to extract features to be used in predicting binding affinity. In the modelling step, Cartesian coordinates of MEP surface points are mapped onto a spherical self-organizing map. The resulting maps are visualized using the values of the electrostatic potential, and these values also provide the features for the prediction system. Support vector machines and the partial least squares method are used to predict the binding affinity of compounds, and the results are compared.
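The comparison step lends itself to a short sketch. Below is a minimal, hypothetical example, assuming a feature matrix X of electrostatic-potential values read off the self-organizing map and a vector y of measured binding affinities; the kernel, component count and scoring choices are assumptions, not the settings used in the thesis.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

def compare_models(X, y, folds=5):
    """Cross-validated R^2 for an RBF-kernel SVR versus PLS regression."""
    for name, model in [("SVR", SVR(kernel="rbf", C=10.0, epsilon=0.1)),
                        ("PLS", PLSRegression(n_components=5))]:
        scores = cross_val_score(model, X, y, cv=folds, scoring="r2")
        print(f"{name}: mean R^2 = {scores.mean():.3f} (std {scores.std():.3f})")
```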
|
33 |
A Methodology for Scheduling Operating Rooms Under Uncertainty. Davila, Marbelly Paola 01 January 2013 (has links)
An operating room (OR) is considered one of the most costly functional areas within a hospital as well as one of its major profit centers. Managing an OR department is known to be a challenging task, requiring the integration of many actors (e.g., patients, surgeons, nurses, technicians) who may have conflicting interests and priorities.
Considering these aspects, this dissertation focuses on developing a simulation based methodology for scheduling operating rooms under uncertainty, which reflects the complexity, uncertainty and variability associated with surgery.
We split the process of scheduling ORs under uncertainty into two main components. First, we designed a research roadmap for modeling surgical procedure duration (from incision to wound closure) based on surgery volume and time variability. Then, using a real surgical dataset, we modeled the procedure duration using parametric and distribution-free predictive methods. We found that Support Vector Regression performs better than Generalized Linear Models, increasing the prediction accuracy on unseen data by at least 5.5%.
Next, we developed a simulation-based methodology for scheduling ORs through a case study. For that purpose, we initially built one-day feasible schedules using the 60th, 70th, 80th, and 90th percentiles to allocate surgical procedures to ORs under four different allocation policies. We then used a discrete event simulation model to evaluate the robustness of these initial feasible schedules, considering the stochastic duration of all OR activities and the arrival of surgical emergency cases. We found that, on average, elective cases waited almost twice as long as emergency cases. In addition, we observed no clear effect of more conservative scheduling within each policy on elective waiting times. By contrast, the scheduling policy and scheduling percentile have a clear effect on emergency waiting times: as the percentile increases, emergency waiting times increase markedly under half of the scheduling policies but are affected much less under the other half. OR utilization and OR overtime in a "virtual" eight-OR hospital fluctuate between 67% and 88% and between 97 and 111 minutes, respectively. Moreover, both performance metrics depend not only on the scheduling policy and scheduling percentile but are also strongly affected by increases in the emergency arrival rate.
Finally, we fit a multivariate-multiple-regression model using the output of the simulation model to assess the robustness of the model and the extent to which these results can be generalized to a single, aggregate hospital goal. Further research should include a true stochastic optimization model to integrate optimization techniques into simulation analysis.
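To make the allocation step concrete, here is a minimal sketch of a percentile-based, first-fit policy of the kind evaluated above. The 480-minute OR day, the eight-room capacity and the data layout are illustrative assumptions, and the actual study evaluated four distinct policies rather than this single one.

```python
import numpy as np

def schedule_day(cases, history, percentile=80, n_ors=8, day_minutes=480):
    """cases: procedure names; history: dict name -> array of past durations."""
    remaining = [day_minutes] * n_ors
    plan = {r: [] for r in range(n_ors)}
    unscheduled = []
    for case in cases:
        booked = np.percentile(history[case], percentile)  # conservative booked time
        for r in range(n_ors):
            if remaining[r] >= booked:                     # first OR with enough capacity
                plan[r].append((case, booked))
                remaining[r] -= booked
                break
        else:
            unscheduled.append(case)                       # rolls over to another day
    return plan, unscheduled
```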
|
34 |
Predicting the Clinical Outcome in Patients with Traumatic Brain Injury using Clinical Pathway Scores. Mendoza Alonzo, Jennifer Lorena 01 January 2013 (has links)
The Polytrauma/TBI Rehabilitation Center (PRC) of the Veterans Affairs Hospital (VAH) treats patients with Traumatic Brain Injury (TBI). These patients have major motor and cognitive disabilities, and most of them stay in the hospital for many months without major improvements. This suggests that patients, families and the VAH could benefit if healthcare providers had a way to better assess or "predict" patients' progression. The individual progress of patients over time is assessed using a pre-defined multi-component performance measure, the Functional Independence Measure (FIM), at admission and discharge, and a semi-quantitative documentation parameter, the Clinical Pathway (CP) score, at weekly intervals. This work uses already de-identified and transformed data to explore developing a predictive model of clinical outcome for patients with TBI as early as possible. The clinical outcome is measured as a percentage of recovery using CP scores. The results of this research will allow healthcare providers to improve current resource management (e.g., staff, equipment, space) by setting goals for each patient, as well as to provide families with more accurate and timely information about the status and needs of the patient.
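A minimal sketch of the prediction task as described, under the assumption, not stated in the abstract, that a simple regression on the first few weekly CP scores is an adequate first model; the data layout and variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_early_outcome_model(weekly_cp, final_recovery_pct, k=4):
    """weekly_cp: (n_patients, n_weeks) CP scores; uses only the first k weeks."""
    model = LinearRegression().fit(weekly_cp[:, :k], final_recovery_pct)
    return model  # model.predict(new_cp[:, :k]) gives an early recovery estimate
```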
|
35 |
Evaluation of a Guided Machine Learning Approach for Pharmacokinetic Modeling. January 2017 (has links)
A medical control system, a real-time controller, uses a predictive model of human physiology to estimate and control drug concentration in the human body. The Artificial Pancreas (AP) is an example of such a control system; it regulates blood glucose in T1D patients. Predictive models used in these control systems, such as the Bergman Minimal Model (BMM), are based on a physiological modeling technique that separates the body into a number of anatomical compartments, with each compartment's effect on the body system determined by its physiological parameters. These models are less accurate due to unaccounted physiological factors affecting target values, and estimating a large number of physiological parameters through an optimization algorithm is computationally expensive and prone to getting stuck in local minima. This work evaluates a machine learning (ML) framework in which an ML model is guided by physiological models. A support vector regression model guided by a modified BMM is implemented for the estimation of blood glucose levels. Physical activity and endogenous glucose production are key factors contributing to increased hypoglycemia events; this work therefore modifies the Bergman Minimal Model (Bergman et al., 1981) for more accurate estimation of blood glucose levels. Results show that the SVR outperformed the BMM by 0.164 average RMSE for 7 different patients in the free-living scenario. This computationally inexpensive, data-driven model can potentially learn parameters more accurately over time. In conclusion, the proposed prediction model is promising for modeling physiological elements in living systems. / Dissertation/Thesis / Masters Thesis Computer Science 2017
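One way to read "guided" here is that the physiological model's output becomes an input feature of the ML model. The sketch below follows that reading with the unmodified two-equation BMM, dG/dt = -p1(G - Gb) - XG and dX/dt = -p2 X + p3(I - Ib); the parameter values, integration scheme and feature layout are all illustrative assumptions rather than the thesis's actual modified model.

```python
import numpy as np
from sklearn.svm import SVR

def bmm_glucose(insulin, G0=90.0, Gb=90.0, Ib=15.0,
                p1=0.03, p2=0.02, p3=1e-5, dt=1.0):
    """Forward-Euler simulation of the Bergman Minimal Model, one step per minute."""
    G, X, out = G0, 0.0, []
    for I in insulin:                      # I: plasma insulin at each minute
        G += dt * (-p1 * (G - Gb) - X * G)
        X += dt * (-p2 * X + p3 * (I - Ib))
        out.append(G)
    return np.array(out)

def fit_guided_svr(features, insulin, glucose_target):
    """features: (n, d) per-minute inputs (meals, activity, ...)."""
    guided = np.column_stack([features, bmm_glucose(insulin)])  # BMM output as an extra feature
    return SVR(kernel="rbf", C=10.0, epsilon=1.0).fit(guided, glucose_target)
```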
|
36 |
Dynamic demand modelling and pricing decision support systems for petroleum. Fox, David January 2014
Pricing decision support systems have been developed to help retail companies optimise the prices they set when selling their goods and services. This research aims to enhance the essential forecasting and optimisation techniques that underlie these systems. First, the method of Dynamic Linear Models is applied in order to provide sales forecasts of higher accuracy than current methods. Secondly, the method of Support Vector Regression is used to forecast future competitor prices. This new technique aims to produce more accurate forecasts than the assumption currently used in pricing decision support systems that each competitor's price will simply remain unchanged. Thirdly, when competitor prices are not forecast, a new pricing optimisation technique is presented which provides the highest guaranteed profit. Existing pricing decision support systems optimise price assuming that competitor prices will remain unchanged, but this optimisation cannot be trusted since competitor prices are never actually forecast. Finally, when competitor prices are forecast, an exhaustive search of a game-tree is presented as a new way to optimise a retailer's price. This optimisation incorporates future competitor price moves, something which is vital when analysing the success of a pricing strategy but is absent from current pricing decision support systems. Each approach is applied to the forecasting and optimisation of daily retail vehicle fuel pricing using real commercial data, showing improved results in each case.
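The "highest guaranteed profit" idea is a maximin optimisation, and a minimal sketch of it follows: without competitor forecasts, pick the candidate price whose worst-case profit across a set of plausible competitor prices is largest. The linear demand response and all parameter values are illustrative assumptions, not the model used in the research.

```python
import numpy as np

def maximin_price(candidate_prices, competitor_scenarios, unit_cost,
                  base_volume=1000.0, sensitivity=300.0):
    """Return the price with the highest guaranteed (worst-case) profit."""
    def profit(p, comp):
        volume = base_volume - sensitivity * (p - comp)  # assumed linear demand response
        return (p - unit_cost) * max(volume, 0.0)
    best_price, best_guarantee = None, -np.inf
    for p in candidate_prices:
        worst = min(profit(p, c) for c in competitor_scenarios)  # guaranteed profit at p
        if worst > best_guarantee:
            best_price, best_guarantee = p, worst
    return best_price, best_guarantee
```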
|
37 |
Aktiemarknadsprognoser: En jämförande studie av LSTM- och SVR-modeller med olika dataset och epoker / Stock Market Forecasting: A Comparative Study of LSTM and SVR Models Across Different Datasets and Epochs. Nørklit Johansen, Mads; Sidhu, Jagtej January 2023 (has links)
Predicting stock market trends is a complex task due to the inherent volatility and unpredictability of financial markets. Nevertheless, accurate forecasts are of critical importance to investors, financial analysts, and stakeholders, as they directly inform decision-making processes and risk management strategies associated with financial investments. Inaccurate forecasts can lead to notable financial consequences, emphasizing the crucial and demanding task of developing models that provide accurate and trustworthy predictions. This article addresses this challenging problem by utilizing a long short-term memory (LSTM) model to predict stock market developments. The study undertakes a thorough analysis of the LSTM model's performance across multiple datasets, critically examining the impact of different timespans and epochs on the accuracy of its predictions. Additionally, a comparison is made with a support vector regression (SVR) model using the same datasets and timespans, which allows for a comprehensive evaluation of the relative strengths of the two techniques. The findings offer insights into the capabilities and limitations of both models, thus paving the way for future research in stock market prediction methodologies. Crucially, the study reveals that larger datasets and an increased number of epochs can significantly enhance the LSTM model's performance. Conversely, the SVR model exhibits significant challenges with overfitting. Overall, this research contributes to ongoing efforts to improve financial prediction models and provides potential solutions for individuals and organizations seeking to make accurate and reliable forecasts of stock market trends.
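A comparison of this shape can be sketched compactly: the same sliding windows of past prices feed both models. The window length, epoch count and layer sizes below are illustrative assumptions, not the configurations tuned in the study.

```python
import numpy as np
from sklearn.svm import SVR
import tensorflow as tf

def make_windows(series, w=30):
    """series: 1-D numpy array of prices; returns (n, w) windows and next-step targets."""
    X = np.array([series[i:i + w] for i in range(len(series) - w)])
    return X, series[w:]

def fit_both(series, w=30, epochs=50):
    X, y = make_windows(series, w)
    svr = SVR(kernel="rbf", C=10.0).fit(X, y)
    lstm = tf.keras.Sequential([tf.keras.layers.Input(shape=(w, 1)),
                                tf.keras.layers.LSTM(32),
                                tf.keras.layers.Dense(1)])
    lstm.compile(optimizer="adam", loss="mse")
    lstm.fit(X[..., None], y, epochs=epochs, verbose=0)  # more data/epochs helped the LSTM
    return svr, lstm
```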
|
38 |
Utilizing Artificial Intelligence to Predict Severe Weather Outbreak Severity in the Contiguous United States. Williams, Megan Spade 04 May 2018
Severe weather outbreaks are violent weather events that can cause major damage and injury. Unfortunately, forecast models frequently mispredict the intensity of these events, hindering the efforts of forecasters to confidently inform the public about intensity risks. This research aims to improve outbreak intensity forecasting by using severe weather parameters and an outbreak ranking index to predict outbreak intensity. Areal coverage values of gridded severe weather diagnostic variables, computed from the North American Regional Reanalysis (NARR) database for outbreaks spanning 1979 to 2013, will be used as predictors in an artificial intelligence modeling ensemble to predict outbreak intensity. NARR fields will be dynamically downscaled to a National Severe Storms Laboratory-defined WRF 4-km North American domain, on which areal coverages will be computed. The research will result in a predictive model along with verification information on its performance.
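The predictor-to-index step can be sketched as follows. This is a hypothetical stand-in for the ensemble described above: areal coverage is the fraction of grid points where a diagnostic (e.g., CAPE or deep-layer shear) exceeds a threshold, and a small set of regressors is averaged into one intensity estimate. The member models and thresholds are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import GradientBoostingRegressor

def areal_coverage(field, threshold):
    """Fraction of grid points in a 2-D diagnostic field exceeding the threshold."""
    return float(np.mean(field > threshold))

def fit_intensity_ensemble(X, ranking_index, seeds=(0, 1, 2, 3)):
    """X: (n_outbreaks, n_coverage_predictors); ranking_index: outbreak intensity."""
    members = [MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                            random_state=s).fit(X, ranking_index) for s in seeds]
    members.append(GradientBoostingRegressor(random_state=0).fit(X, ranking_index))
    return members

def predict_intensity(members, X):
    return np.mean([m.predict(X) for m in members], axis=0)  # ensemble average
```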
|
39 |
Predicting Reactor Instability Using Neural Networks. Hubert, Hilborn January 2022
The study of instabilities in boiling water reactors is of significant importance to the safety with which they can be operated, as the instabilities can cause damage to the reactor, posing risks to both equipment and personnel. The instabilities that concern this paper are progressive growths in the oscillating power of boiling water reactors. As the thermal power is oscillatory, it is important to be able to identify whether or not the power amplitude is stable. The main focus of this paper has been the development of a neural network estimator of these instabilities, fitting a non-linear model function to data by estimating its parameters. In doing this, the ambition was to optimize the networks to the point that they can deliver near "best-guess" estimations of the parameters which define these instabilities, evaluating the usefulness of these networks when applied to problems like this. The goal was to design both MLP (Multi-Layer Perceptron) and SVR/KRR (Support Vector Regression/Kernel Ridge Regression) networks and improve them to the point that they provide reliable and useful information about the waves in question. This goal was accomplished only in part, as the SVR/KRR networks proved to have some difficulty in ascertaining the phase shift of the waves. Overall, however, these networks prove very useful in this kind of task, succeeding with a reasonable degree of confidence in calculating the different parameters of the waves studied.
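As a rough illustration of this parameter-estimation setup, the sketch below trains an MLP on synthetic power signals of the form y(t) = exp(b t) sin(w t + phi) (unit amplitude for simplicity), regressing the growth rate b, the frequency w and the phase; b > 0 marks a growing, unstable oscillation. The model form, parameter ranges and the circular encoding of the phase are illustrative assumptions; the encoding is one standard way around the phase wrap-around that can trouble regressors.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_waves(n=5000, length=128, seed=0):
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 10.0, length)
    b = rng.uniform(-0.3, 0.3, n)            # growth rate: sign decides stability
    w = rng.uniform(1.0, 5.0, n)             # angular frequency
    phi = rng.uniform(0.0, 2 * np.pi, n)     # phase shift
    X = np.exp(b[:, None] * t) * np.sin(w[:, None] * t + phi[:, None])
    Y = np.column_stack([b, w, np.cos(phi), np.sin(phi)])  # phase encoded on the circle
    return X, Y

X, Y = make_waves()
estimator = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=500).fit(X, Y)
# estimator.predict(signals)[:, 0] > 0 flags an instability in an unseen signal
```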
|
40 |
The development and analysis of a computationally efficient data driven suit jacket fit recommendation system. Bogdanov, Daniil January 2017
In this master thesis work, we design and analyze a data-driven suit jacket fit recommendation system which aims to guide shoppers in the process of assessing garment fit over the web. The system is divided into two stages. In the first stage we analyze labelled customer data, train supervised learning models to predict optimal suit jacket dimensions for unseen shoppers, and determine an appropriate model for each suit jacket dimension. In stage two, the recommendation system uses the results from stage one and sorts a garment collection from best fit to least fit; this sorted collection is what the fit recommendation system returns. In this thesis work we propose a particular design of stage two that aims to reduce the complexity of the system, but at a cost of reduced quality of the results. The trade-offs are identified and weighed against each other. The results in stage one show that simple supervised learning models with linear regression functions suffice when the independent and dependent variables align at particular landmarks on the body. If style preferences are also to be incorporated into the supervised learning models, non-linear regression functions should be considered to account for the increased complexity. The results in stage two show that the complexity of the recommendation system can be made independent of the complexity of how fit is assessed. As technology enables ever more advanced ways of assessing garment fit, such as 3D body scanning techniques, the proposed design of reducing the complexity of the recommendation system allows highly complex techniques to be utilized without affecting the responsiveness of the system at run time.
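A minimal two-stage sketch of this design follows, assuming the simple case the results support: one linear regression per jacket dimension in stage one, and in stage two a sort of the collection by distance between each garment's dimensions and the predicted ideal. Variable names and the unweighted distance are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_stage_one(body_measurements, optimal_dimensions):
    """One model per jacket dimension (chest, waist, sleeve, ...)."""
    return [LinearRegression().fit(body_measurements, optimal_dimensions[:, j])
            for j in range(optimal_dimensions.shape[1])]

def recommend(models, shopper, collection):
    """collection: (n_garments, n_dimensions); returns garment indices, best fit first."""
    ideal = np.array([m.predict(shopper[None, :])[0] for m in models])
    misfit = np.linalg.norm(collection - ideal, axis=1)  # smaller distance = better fit
    return np.argsort(misfit)
```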
|