11 |
Comparison of Support Vector Machines and Deep Learning For QSAR with Conformal Prediction
Deligianni, Maria January 2022
Quantitative Structure Activity Relationship (QSAR) is a very useful computational method which has facilitated great progress in drug development [1]. This method can be used to predict a molecule’s activity against a certain target just by comparing its structural characteristics (i.e., molecular descriptors) with those belonging to molecules of known activity. QSAR modeling is fueled by free online databases consisting of millions of active and inactive molecules and by Machine Learning (ML) methods that enable data analysis. To ensure successful implementation of ML models, there is a range of evaluation methods to estimate their performance and applicability domain. So far, a great deal of research has focused on the use of Support Vector Machines (SVMs) to classify molecules with their Molecular Signature Fingerprints as descriptors [2]. However, another Machine Learning algorithm, Deep Neural Networks (DNNs), an improvement of single-layer Neural Networks, is rising in popularity in various fields including molecule classification. The two models were compared using the CPSign software, which introduces Conformal Prediction, to evaluate the reliability of model predictions based on performance for individual compounds rather than mean performance on a given test set. Three types of descriptors were used: Molecular Signature Fingerprints, Extended Connectivity Fingerprints and physicochemical descriptors. The comparison showed that the Multilayer Perceptron (MLP), which was used as the DNN representative in this context, had performance similar to the shallower SVM models but additionally demanded longer training times [3]. It can be concluded that in the field of QSAR with the aforementioned descriptors, when the number of examples used for training is not immense, Support Vector Machines might perform equally well and demand fewer resources and less time than the more sophisticated MLPs.
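As a rough illustration of how Conformal Prediction judges individual compounds rather than a test-set average, the sketch below computes inductive conformal p-values and a prediction set for one compound. It is not taken from the thesis or from CPSign: the function names, the per-class calibration lists and all numbers are hypothetical, and the nonconformity scores are assumed to come from some already trained classifier.

```python
def p_value(calibration_scores, test_score):
    """Inductive conformal p-value: share of scores (calibration plus the
    test example itself) that are at least as nonconforming as the test one."""
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(p_values, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical nonconformity scores of calibration compounds, kept per class
# (a label-conditional setup is assumed here for illustration).
calibration = {
    "active": [0.10, 0.20, 0.35, 0.60],
    "inactive": [0.05, 0.15, 0.25, 0.40, 0.70],
}
# Hypothetical nonconformity of one new compound under each candidate label.
test_scores = {"active": 0.30, "inactive": 0.65}

p_values = {
    label: p_value(calibration[label], score)
    for label, score in test_scores.items()
}
print(p_values)
print(prediction_set(p_values, significance=0.2))  # both labels remain
print(prediction_set(p_values, significance=0.4))  # only "active" remains
```

A lower significance level gives larger, more cautious prediction sets; a higher one gives smaller sets at the price of more errors, which is the validity versus efficiency trade-off that conformal evaluations report.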
|
12 |
Training a Multilayer Perceptron to predict the final selling price of an apartment in co-operative housing society sold in Stockholm city with features stemming from open data / Träning av en “Multilayer Perceptron” att förutsäga försäljningspriset för en bostadsrättslägenhet till försäljning i Stockholm city med egenskaper från öppna datakällor
Tibell, Rasmus January 2014
The need for a robust model for predicting the value of condominiums and houses is becoming more apparent as further evidence of systematic errors in existing models is presented. Traditional valuation methods fail to produce good predictions of condominium sales prices, and papers referenced in this thesis have pointed out systematic patterns in the errors linked to, for example, the repeat sales methodology and the hedonic pricing model. This inability can lead to monetary problems for individuals and, in the worst case, economic crises for whole societies. In this master's thesis we present how a predictive model constructed from a multilayer perceptron can predict the price of a condominium in the centre of Stockholm using objective data from publicly available sources. The value produced by the model is enriched with a predictive interval using the Inductive Conformal Prediction algorithm to give a clear view of the quality of the prediction. In addition, the Multilayer Perceptron is compared with the commonly used Support Vector Regression algorithm to underline the hallmark ability of neural networks to handle a broad spectrum of features. The features used to construct the Multilayer Perceptron model are gathered from multiple “Open Data” sources and include data such as: 5,990 apartment sales prices from 2011-2013, interest rates for condominium loans from two major banks, national election results from 2010, geographic information and nineteen local features. Several well-known techniques for improving the performance of Multilayer Perceptrons are applied and evaluated. A Genetic Algorithm is deployed to facilitate the process of determining appropriate parameters for the backpropagation algorithm. Finally, we conclude that the model created as a Multilayer Perceptron using backpropagation can produce good predictions and outperforms the results from the Support Vector Regression models and the studies in the referenced papers.
/ The need for a robust model for predicting the value of condominiums and houses is becoming ever more apparent as further evidence of systematic errors in existing models is presented. Articles referenced in this thesis demonstrate systematic errors in the estimates made by methods based on repeat-sales prices and hedonic pricing models. This shortcoming can lead to monetary problems for individuals and, in the worst case, economic crisis for whole societies. In this degree project we show that a predictive model constructed from a Multilayer Perceptron can estimate the price of a condominium in central Stockholm based on publicly available data (“Open Data”). The model's output has been extended with a predictive interval computed with the Inductive Conformal Prediction algorithm, which gives a clear picture of the reliability of the estimate. In addition, the Multilayer Perceptron algorithm is compared with another common machine learning algorithm, Support Vector Regression, to demonstrate the quality of neural networks and their ability to handle datasets with many variables. The variables used to construct the Multilayer Perceptron model are compiled from publicly available open data sources and include information such as: prices of 5,990 apartments sold during 2011-2013, interest rates for condominium loans from two of the major banks, results from the 2010 parliamentary election, geographic information and nineteen local features. Several well-known improvements to the Multilayer Perceptron algorithm have been applied and evaluated. A genetic algorithm has been used to support the process of finding suitable parameters for the backpropagation algorithm.
In this work we conclude that a model constructed from a neural network of the Multilayer Perceptron type, trained with backpropagation, can produce good predictions and thereby outperforms the results delivered by the Support Vector Regression model and the studies referenced in this thesis.
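The predictive interval mentioned in the abstract can be illustrated with a minimal sketch of Inductive Conformal Prediction for regression, assuming absolute residuals as the nonconformity measure. The thesis may use a different (for example normalized) measure; the function name, the calibration pairs and the point prediction below are hypothetical.

```python
import math


def conformal_half_width(calibration_residuals, significance):
    """Return the absolute-residual quantile that gives coverage of at least
    1 - significance, using the usual finite-sample (n + 1) correction."""
    scores = sorted(calibration_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - significance))
    if rank > n:
        # Too few calibration examples for this significance level.
        return math.inf
    return scores[rank - 1]


# Hypothetical calibration set: (predicted price, actual price) in million SEK,
# produced by an already trained model on data held out from training.
calibration = [(2.1, 2.3), (3.0, 2.8), (1.5, 1.6), (4.2, 4.9), (2.7, 2.6),
               (3.3, 3.0), (1.9, 2.0), (5.0, 4.6), (2.4, 2.9)]
residuals = [abs(actual - predicted) for predicted, actual in calibration]

half_width = conformal_half_width(residuals, significance=0.2)
point_prediction = 3.1  # hypothetical model output for a new apartment
interval = (point_prediction - half_width, point_prediction + half_width)
print(half_width, interval)
```

With this simple measure every apartment gets an interval of the same width; tighter intervals for easy cases would require a nonconformity measure that scales the residual by an estimate of its difficulty.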
|
13 |
Training Machine Learning-based QSAR models with Conformal Prediction on Experimental Data from DNA-Encoded Chemical Libraries
Geylan, Gökçe January 2021
DNA-encoded chemical libraries (DELs) allow exhaustive sampling of chemical space with large-scale data consisting of compounds produced through combinatorial synthesis. This novel technology was utilized in the early drug discovery stages for robust hit identification and lead optimization. In this project, the aim was to build a Machine Learning-based QSAR model with conformal prediction for hit identification on two different target proteins that the DEL was assayed on. An initial investigation was conducted in a pilot project with 1000 compounds, and the analyses and conclusions drawn from this part were later applied to a larger dataset with 1.2 million compounds. With this classification model, the aim was to analyze the prediction of compound activity in the DEL as well as in an external dataset, identifying the top hits in order to evaluate the model’s performance and applicability. Support Vector Machine (SVM) and Random Forest (RF) models were built on both the pilot and the main datasets with different descriptor sets: Signature Fingerprints, RDKit and CDK. In addition, an Autoencoder was used to supply data-driven descriptors on the pilot data. The LIBSVM and LIBLINEAR implementations were explored and compared based on the models’ performances. The comparisons considered the key concepts of conformal prediction, such as the trade-off between validity and efficiency, observed fuzziness and calibration across a range of significance levels. The top hits were determined by two sorting methods: credibility, and the p-value difference between the binary classes. The models were confirmed to assign correct single labels to the true actives over a wide range of significance levels, regardless of the similarity of the test compounds to the training set.
Furthermore, an accumulation of these true actives in the models’ top hit selections was observed with the latter sorting method, and additional investigations of the similarity and the building block enrichments in the top 50 and 100 compounds were conducted. The Tanimoto similarity demonstrated the model’s predictive power in selecting structurally dissimilar compounds, while the building block enrichment analysis showed the selectivity of the binding pocket, with target protein B determined to be the more selective. Together, these comparison methods enabled an extensive study of model evaluation and performance. In conclusion, the LIBLINEAR model with Signature Fingerprints gave the best performance for both the pilot and the main datasets, considering both model performance and computational power requirements. However, prediction on an external set was not successful due to the low structural diversity in the DEL on which the model was trained.
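The two sorting methods named in the abstract can be sketched as follows, assuming each compound already has conformal p-values for the "active" and "inactive" labels. The compound names and p-values are hypothetical, and restricting the credibility ranking to compounds whose more conforming label is "active" is an assumption made for this illustration, not a detail confirmed by the thesis.

```python
# Hypothetical conformal p-values per compound: (p_active, p_inactive).
p_values = {
    "cmpd_A": (0.90, 0.05),
    "cmpd_B": (0.60, 0.55),
    "cmpd_C": (0.95, 0.40),
    "cmpd_D": (0.10, 0.80),
}


def credibility(p_active, p_inactive):
    """Credibility is the largest p-value assigned to any label."""
    return max(p_active, p_inactive)


def p_value_difference(p_active, p_inactive):
    """How much more conforming the 'active' label is than 'inactive'."""
    return p_active - p_inactive


# Ranking 1: largest p-value difference in favour of 'active' first.
by_difference = sorted(
    p_values, key=lambda c: p_value_difference(*p_values[c]), reverse=True
)

# Ranking 2: among compounds whose best label is 'active', highest credibility first.
predicted_active = [c for c, (pa, pi) in p_values.items() if pa > pi]
by_credibility = sorted(
    predicted_active, key=lambda c: credibility(*p_values[c]), reverse=True
)

print(by_difference)
print(by_credibility)
```

In this toy data the two rankings disagree on the top compound: cmpd_C has the highest credibility, but its "inactive" p-value is also fairly high, so the difference-based ranking prefers cmpd_A.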
|
14 |
Regression and time estimation in the manufacturing industry
Bjernulf, Walter January 2023
In this thesis, an analysis is performed on operation times for differently sized products in a manufacturing company. The thesis introduces and summarises most of the theory needed to perform regression and also covers a worked example in which three different regression models are learned, evaluated and analysed. Conformal prediction, which at the moment is a hot topic in machine learning, is also introduced and used in the worked example.
|
15 |
Conformal prediction of air pollution concentrations for the Barcelona Metropolitan Region
Ivina, Olga 20 November 2012
This thesis aims to introduce a newly developed machine learning method, conformal predictors, for air pollution assessment. For the given area of study, the Barcelona Metropolitan Region (BMR), several conformal prediction models have been developed. These models use the specification called the ridge regression confidence machine (RRCM). The conformal predictors developed for the purposes of the present study are ridge regression models, and they always provide valid predictions. Instead of a point prediction, a conformal predictor outputs a prediction set, which is usually an interval. It is desirable that these sets be as small as possible.
The underlying algorithm for the conformal predictors derived in this thesis is ordinary kriging. A kriging-based conformal predictor can capture the spatial distribution of the data with the use of the so-called "kernel trick". / This work is intended to introduce the new machine learning method, conformal predictors, for the assessment of air pollution in the Barcelona Metropolitan Region (BMR). The specification called the ridge regression confidence machine (RRCM) is used. The conformal predictors developed for the purposes of this study are ridge regression models, which always provide valid predictions. A conformal predictor generates a prediction set, which is almost always an interval, and the intention is for it to be as small as possible.
The underlying algorithm of the conformal predictors derived and discussed throughout this thesis is kriging. The conformal predictor based on ordinary kriging can capture the spatial distribution by means of a technique called the "kernel trick".
|
16 |
Analyzing Cell Painting images using different CNNs and Conformal Prediction variations : Optimization of a Deep Learning model to predict the MoA of different drugs
Hillver, Anna January 2022
Microscopy imaging-based techniques, such as the Cell Painting assay, could be used to generate images that visualize the Mechanism of Action (MoA) of a drug, which could be of great use in drug development. In order to extract information and predict the MoA of a new compound from these images, powerful image analysis tools are needed. The purpose of this project is to further develop a Deep Learning model to predict the MoA of different drugs from Cell Painting images using Convolutional Neural Networks (CNNs) and Conformal Prediction. The specific task was to compare the accuracy of different CNN architectures and the efficiency of different nonconformity functions. During the project the CNN architectures ResNet50, ResNet101 and DenseNet121 were compared, as well as the nonconformity functions Inverse Probability, Margin and a combination of the two. No significant difference in accuracy between the CNNs and no difference in efficiency between the nonconformity functions was measured. The results showed that the model could predict the MoA of a compound with high accuracy when all compounds were used in training, validation and testing of the model, which validates the implementations. However, it is desirable for the model to be able to predict the MoA of a new compound when it has been trained on other compounds with the same MoA. This could not be confirmed in this project, and the model needs to be further investigated and tested with another dataset before it can be used for that purpose.
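The two named nonconformity functions can be sketched from their commonly used definitions, applied to the class probabilities a CNN outputs for one image. These definitions are an assumption, since the thesis may scale or combine them differently, and the class names and probabilities are placeholders.

```python
def inverse_probability(probs, label):
    """Nonconformity = 1 - predicted probability of the candidate label."""
    return 1.0 - probs[label]


def margin(probs, label):
    """Nonconformity = highest probability among the other labels
    minus the probability of the candidate label."""
    best_other = max(p for other, p in probs.items() if other != label)
    return best_other - probs[label]


# Hypothetical softmax output of a CNN for one image; class names are placeholders.
probs = {"moa_1": 0.70, "moa_2": 0.20, "moa_3": 0.10}

for label in probs:
    print(label,
          round(inverse_probability(probs, label), 2),
          round(margin(probs, label), 2))
```

Both functions give the lowest score to the label the network favours; the margin additionally takes into account how strong the closest competing label is, which can change the ordering of examples when the probability mass is spread over several classes.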
|
17 |
Maskininlärning med konform förutsägelse för prediktiva underhållsuppgifter i industri 4.0 / Machine Learning with Conformal Prediction for Predictive Maintenance tasks in Industry 4.0 : Data-driven Approach
Liu, Shuzhou, Mulahuko, Mpova January 2023
This thesis is a cooperation with Knowit, Östrand & Hansen, and Orkla. It aimed to explore the application of Machine Learning and Deep Learning models with Conformal Prediction to a predictive maintenance situation at Orkla. Predictive maintenance is essential in numerous industrial manufacturing scenarios. It can help to reduce machine downtime, improve equipment reliability, and save unnecessary costs. In this thesis, various Machine Learning and Deep Learning models, including Decision Tree, Random Forest, Support Vector Regression, Gradient Boosting, and Long Short-Term Memory (LSTM), are applied to a real-world predictive maintenance dataset. The Orkla dataset was originally planned to be used in this thesis project. However, due to challenges encountered and time limitations, a NASA C-MAPSS dataset with a similar data structure was chosen to study how Machine Learning models could be applied to predict the remaining useful lifetime (RUL) in manufacturing. In addition, conformal prediction, a recently developed framework for measuring the prediction uncertainty of Machine Learning models, is integrated into the models for more reliable RUL prediction. The results show that both the Machine Learning and Deep Learning models with conformal prediction could predict RUL close to the true RUL, with the LSTM outperforming the Machine Learning models. Also, the conformal prediction intervals provide informative and reliable information about the uncertainty of the predictions, which can help inform factory personnel in advance so that they can take the necessary maintenance actions. Overall, this thesis demonstrates the effectiveness of utilizing Machine Learning and Deep Learning models with Conformal Prediction for predictive maintenance situations. Moreover, based on the modeling results on the NASA dataset, some insights are discussed on how to transfer these experiences to Orkla data for RUL prediction in the future.
|
18 |
Reliable graph predictions : Conformal prediction for Graph Neural Networks
Bååw, Albin January 2022
We have seen a rapid increase in the development of deep learning algorithms in recent decades. However, while these algorithms have unlocked new business areas and led to great development in many fields, they are usually limited to Euclidean data. Researchers are increasingly finding that the data used in many real-life applications is better represented as graphs. Examples include high-risk domains such as finding the side effects of combining medicines using a protein-protein network. In high-risk domains, there is a need for trust and transparency in the results returned by deep learning algorithms. In this work, we explore how we can quantify uncertainty in Graph Neural Network predictions using conventional methods for conformal prediction as well as novel methods exploiting graph connectivity information. We evaluate the methods on both static and dynamic graphs and find that neither of the novel methods offers any clear benefits over the conventional methods. However, we see indications that using the graph connectivity information can lead to more efficient conformal predictors and a lower prediction latency than the conventional methods on large data sets. We propose that future work extend the research on using the connectivity information, specifically the node embeddings, to boost the performance of conformal predictors on graphs. / In recent decades we have seen a drastic increase in the development of deep learning algorithms. Although these algorithms have created new potential business areas and have also led to new discoveries in several other fields, they are unfortunately usually limited to Euclidean data. At the same time, more and more researchers have found that data in real-world applications is often better represented in the form of graphs. Examples include high-risk domains such as drug development, where side effects of medicines are predicted with the help of protein-protein networks.
In high-risk domains there is a requirement for trust and for the results of deep learning algorithms to be transparent. In this thesis we explore how to quantify the uncertainty in the results of Graph Neural Networks using conformal prediction. We test both conventional methods for conformal prediction and novel methods that exploit structural information from the graph. We evaluate the methods on both static and dynamic graphs, and we conclude that the novel methods are neither better nor worse than the conventional methods. However, we find indications that using the structural information from the graph can lead to more efficient predictors and to lower response times than the conventional methods when used on large graphs. We propose that future work in the area further explore how the structural information, and above all the node representations, can be used to increase the performance of conformal predictors for graphs.
|