11

Evaluation et application de méthodes de criblage in silico / Evaluation and application of virtual screening methods

Guillemain, Hélène 25 October 2012 (has links)
Lors de la conception de médicaments, le criblage in silico est de plus en plus utilisé et les méthodes disponibles nécessitent d'être évaluées. L'évaluation de 8 méthodes a mis en évidence l'efficacité des méthodes de criblage in silico et des problèmes de construction de la banque d'évaluation de référence (DUD), la conformation choisie pour les sites de liaison n'étant pas toujours adaptée à tous les actifs. La puissance informatique actuelle le permettant, plusieurs structures expérimentales ont été choisies pour tenter de mimer la flexibilité des sites de liaison. Un autre problème a été mis en évidence : les métriques d'évaluation des méthodes souffrent de biais. De nouvelles métriques ont donc été proposées, telles que BEDROC et RIE. Une autre alternative est proposée ici, mesurant la capacité prédictive d'une méthode en actifs. Enfin, une petite molécule active sur le TNFα in vitro et in vivo sur souris a été identifiée par un protocole de criblage in silico. Ainsi, malgré le besoin d'amélioration des méthodes, le criblage in silico peut être d'un important soutien à l'identification de nouvelles molécules à visée thérapeutique. / Since the introduction of virtual screening in the drug discovery process, the number of virtual screening methods has been increasing and the available methods have to be evaluated. In this work, eight virtual screening methods were evaluated on the DUD database, showing adequate efficiency. This also revealed some shortcomings of the DUD database, as the binding site conformation used in the DUD was not relevant for all the actives. As computational power now makes it possible to address this issue, classical docking runs have been performed on several X-ray structures, used to represent the binding site flexibility. This also revealed that evaluation metrics show some biases. New evaluation metrics have thus been proposed, e.g. BEDROC and RIE. An alternative method was also proposed using predictiveness curves, based on compound activity probability. Finally, a virtual screening procedure has been applied to TNFα. A small-molecule inhibitor, showing in vitro and in vivo activity in mice, has been identified. This demonstrated the value of virtual screening for the drug discovery process, although virtual screening methods need to be improved.
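The abstract above relies on the RIE and BEDROC early-recognition metrics. A minimal Python sketch of their computation is given below, following the Truchon–Bayly formulation; the ranked list, the library size and the α parameter are illustrative assumptions, not data from the thesis.

```python
import numpy as np

def rie_bedroc(active_ranks, n_total, alpha=20.0):
    """Compute RIE and BEDROC for a ranked virtual screen (ranks are 1-indexed).

    active_ranks : ranks of the known actives in the score-sorted list
    n_total      : total number of ranked compounds
    alpha        : early-recognition parameter (alpha = 20 weights roughly the top 8%)
    """
    ranks = np.asarray(active_ranks, dtype=float)
    n, N = len(ranks), float(n_total)
    x = ranks / N                                    # relative ranks in (0, 1]
    # Robust Initial Enhancement: observed vs. expected exponential sum
    observed = np.exp(-alpha * x).sum() / n
    expected = (1.0 / N) * (1 - np.exp(-alpha)) / (np.exp(alpha / N) - 1)
    rie = observed / expected
    # BEDROC rescales RIE to lie in [0, 1]
    ra = n / N
    bedroc = (rie * ra * np.sinh(alpha / 2)
              / (np.cosh(alpha / 2) - np.cosh(alpha / 2 - alpha * ra))
              + 1.0 / (1 - np.exp(alpha * (1 - ra))))
    return rie, bedroc

# Illustrative example: 10 actives among 1000 compounds, mostly ranked early
rie, bedroc = rie_bedroc([1, 3, 7, 12, 20, 35, 60, 110, 400, 800], 1000)
print(f"RIE = {rie:.2f}, BEDROC = {bedroc:.3f}")
```

With α = 20 the top of the ranked list dominates the score, which is why these metrics are preferred over AUC-style measures when early recognition is what matters.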
12

Aplikace procesní analýzy při řízení kvality a testování software / Application of the process analysis in quality assurance and software testing

Popelka, Vladimír January 2011 (has links)
This thesis deals with questions of quality assurance and software testing. The subject of its theoretical part is the specification of the general concept of quality, a description of the standards used in the field of software product quality evaluation and, finally, the evaluation of the software development process itself. The thesis intends to introduce the theoretical framework of software quality assurance, especially a detailed analysis of the whole software testing discipline. An added value of the theoretical part is the characterization of the process approach and of selected methods used for process improvement. The practical part of the thesis comprises a worked example -- it shows the process approach to software quality management applied to a selected IT company. The main aim of the practical part is to create a purposeful project for the optimization of quality assurance and software testing processes. The core of the matter is to carry out a process analysis of the current state of the software testing methodology. For the purposes of the process analysis and the optimization project, models of the key processes will be created; these processes will then be depicted according to a defined pattern. The description of the current state of the software product quality assurance processes is further supplemented by an evaluation of the maturity of these processes. The project for the optimization of software testing and quality assurance processes builds on the process analysis of the current state of the software testing methodology, as well as on the maturity evaluation of the process models. The essence of the process optimization is the incorporation of change requests and improvement intentions for individual processes into the resulting methodology draft. For the measurement of selected quality assurance and software testing processes, efficiency indicators are configured and applied to the individual processes. The survey of the current state, as well as the elaboration of the whole project for the optimization of software testing and quality assurance processes, follows the principles of the DMAIC model of the Six Sigma method.
13

Temporally-Embedded Deep Learning Model for Health Outcome Prediction

Boursalie, Omar January 2021 (has links)
Deep learning models are increasingly used to analyze health records to model disease progression. Two characteristics of health records present challenges to developers of deep learning-based medical systems. First, the veracity of the estimation of missing health data must be evaluated to optimize the performance of deep learning models. Second, the currently most successful deep learning diagnostic models, called transformers, lack a mechanism to analyze the temporal characteristics of health records. In this thesis, these two challenges are investigated using a real-world medical dataset of longitudinal health records from 340,143 patients over ten years, the McMaster Imaging Information and Diagnostic Dataset (MIIDD). To address missing data, the performance of imputation models (mean, regression, and deep learning) was evaluated on a real-world medical dataset. Next, techniques from adversarial machine learning were used to demonstrate how imputation can have a cascading negative impact on a deep learning model. Then, the strengths and limitations of evaluation metrics from the statistical literature (qualitative, predictive accuracy, and statistical distance) for assessing deep learning-based imputation models were investigated. This research can serve as a reference to researchers evaluating the impact of imputation on their deep learning models. To analyze the temporal characteristics of health records, a new model, the Decoder Transformer for Temporally-Embedded Health Records Encoding (DTTHRE), was developed and evaluated. DTTHRE predicts patients' primary diagnoses by analyzing their medical histories, including the elapsed time between visits. The proposed model successfully predicted patients' primary diagnosis in their final visit with improved predictive performance (78.54 +/- 0.22%) compared to existing models in the literature. DTTHRE also increased the training examples available from limited medical datasets by predicting the primary diagnosis for each visit (79.53 +/- 0.25%) with no additional training time. This research contributes towards the goal of disease predictive modeling for clinical decision support. / Dissertation / Doctor of Philosophy (PhD) / In this thesis, two challenges in using deep learning models to analyze health records are investigated using a real-world medical dataset. First, an important step in analyzing health records is to estimate missing data. We investigated how imputation can have a cascading negative impact on a deep learning model's performance. A comparative analysis was then conducted to investigate the strengths and limitations of evaluation metrics from the statistical literature to assess deep learning-based imputation models. Second, the most successful deep learning diagnostic models to date, called transformers, lack a mechanism to analyze the temporal characteristics of health records. To address this gap, we developed a new temporally-embedded transformer to analyze patients' medical histories, including the elapsed time between visits, to predict their primary diagnoses. The proposed model successfully predicted patients' primary diagnosis in their final visit with improved predictive performance (78.54 +/- 0.22%) compared to existing models in the literature.
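The imputation comparison described above can be illustrated with a standard mask-and-score experiment: hide a fraction of known values, impute them, and measure the reconstruction error. The sketch below does this with scikit-learn's mean and regression-based (iterative) imputers on synthetic data; it is only an illustration of the evaluation idea, not the MIIDD pipeline or the thesis's deep learning imputer.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-in for a numeric health-record matrix (patients x features)
n, d = 1000, 5
latent = rng.normal(size=(n, 1))
X_true = latent @ rng.normal(size=(1, d)) + 0.3 * rng.normal(size=(n, d))

# Hide 20% of the entries completely at random
mask = rng.random(X_true.shape) < 0.2
X_obs = X_true.copy()
X_obs[mask] = np.nan

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("regression", IterativeImputer(random_state=0))]:
    X_imp = imputer.fit_transform(X_obs)
    rmse = np.sqrt(mean_squared_error(X_true[mask], X_imp[mask]))
    print(f"{name:10s} imputation RMSE on held-out entries: {rmse:.3f}")
```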
14

Evaluating volatility forecasts, A study in the performance of volatility forecasting methods / Utvärdering av volatilitetsprognoser, En undersökning av kvaliteten av metoder för volatilitetsprognostisering

Verhage, Billy January 2023 (has links)
In this thesis, the foundations of evaluating the performance of volatility forecasting methods are explored, and a mathematical framework is created to determine the overall forecasting performance based on observed daily returns across multiple financial instruments. Multiple volatility responses are investigated, and theoretical corrections are derived under the assumption that the log returns follow a normal distribution. Performance measures that are independent of the long-term volatility profile are explored and tested. Well-established volatility forecasting methods, such as moving average and GARCH(p,q) models, are implemented and validated on multiple volatility responses. The obtained results reveal no significant difference in performance between the moving average and GARCH(1,1) volatility forecasts. However, the observed non-zero bias and a separate analysis of the distribution of the log returns reveal that the theoretically derived corrections are insufficient for correcting the non-normally distributed log returns. Furthermore, it is observed that there is a high dependency of absolute performance on the considered evaluation period, suggesting that comparisons between periods should not be made. This study is limited by the fact that the bootstrapped confidence regions are ill-suited for determining significant performance differences between forecasting methods. In future work, statistical significance can be gained by bootstrapping the difference in performance measures. Furthermore, a more in-depth analysis is needed to determine more appropriate theoretical corrections for the volatility responses based on the observed distribution of the log returns. This will increase the overall forecasting performance and improve the overall quality of the evaluation framework. / I detta arbete utforskas grunderna för utvärdering av prestandan av volatilitetsprognoser och ett matematiskt ramverk skapas för att bestämma den övergripande prestandan baserat på observerade dagliga avkastningar för flera finansiella instrument. Ett antal volatilitetsskattningar undersöks och teoretiska korrigeringar härleds under antagandet att log-avkastningen följer en normalfördelning. Prestationsmått som är oberoende av den långsiktiga volatilitetsprofilen utforskas och testas. Väletablerade metoder för volatilitetsprognostisering, såsom glidande medelvärden och GARCH-modeller, implementeras och utvärderas mot flera volatilitetsskattningar. De erhållna resultaten visar att det inte finns någon signifikant skillnad i prestation mellan prognoser producerade av det glidande medelvärdet och GARCH(1,1). Det observerade icke-noll bias och en separat analys av fördelningen av log-avkastningen visar dock att de teoretiskt härledda korrigeringarna är otillräckliga för att fullständigt korrigera volatilitetsskattningarna under icke-normalfördelade log-avkastningar. Dessutom observeras att det finns ett stort beroende på den använda utvärderingsperioden, vilket tyder på att jämförelser mellan perioder inte bör göras. Denna studie är begränsad av det faktum att de använda bootstrappade konfidensregionerna inte är lämpade för att fastställa signifikanta skillnader i prestanda mellan prognosmetoder. I framtida arbeten behövs fortsatt analys för att bestämma mer lämpliga teoretiska korrigeringar för volatilitetsskattningarna baserat på den observerade fördelningen av log-avkastningen. Detta kommer att öka den övergripande prestandan och förbättra den övergripande kvaliteten på prognoserna.
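To make the two forecast families concrete, the sketch below produces one-step-ahead variance forecasts from a rolling moving average and from a GARCH(1,1) recursion with assumed (not estimated) parameters, and scores both against squared returns as a noisy volatility response; the returns and parameter values are synthetic placeholders, not the data or estimates used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
r = 0.01 * rng.standard_normal(1000)          # synthetic daily log returns

# Moving-average forecast: variance at t estimated from the previous `window` returns
window = 20
ma_var = np.array([r[t - window:t].var() for t in range(window, len(r))])

# GARCH(1,1) filter with assumed parameters (omega, alpha, beta)
omega, alpha, beta = 1e-6, 0.08, 0.90
garch_var = np.empty(len(r))
garch_var[0] = r.var()
for t in range(1, len(r)):
    garch_var[t] = omega + alpha * r[t - 1] ** 2 + beta * garch_var[t - 1]

# Score both against the squared-return proxy on the common sample
proxy = r[window:] ** 2
rmse_ma = np.sqrt(np.mean((ma_var - proxy) ** 2))
rmse_garch = np.sqrt(np.mean((garch_var[window:] - proxy) ** 2))
print(f"RMSE vs squared returns  MA: {rmse_ma:.2e}  GARCH(1,1): {rmse_garch:.2e}")
```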
15

Algorithmes de correspondance et superpixels pour l’analyse et le traitement d’images / Matching algorithms and superpixels for image analysis and processing

Giraud, Remi 29 November 2017 (has links)
Cette thèse s’intéresse à diverses composantes du traitement et de l’analyse d’images par méthodes non locales. Ces méthodes sont basées sur la redondance d’information présente dans d’autres images, et utilisent des algorithmes de recherche de correspondance, généralement basés sur l’utilisation de patchs, pour extraire et transférer de l’information depuis ces images d’exemples. Ces approches, largement utilisées par la communauté de vision par ordinateur, sont souvent limitées par le temps de calcul de l’algorithme de recherche, appliqué à chaque pixel, et par la nécessité d’effectuer un prétraitement ou un apprentissage pour utiliser de grandes bases de données. Pour pallier ces limites, nous proposons plusieurs méthodes générales, sans apprentissage, rapides, et qui peuvent être facilement adaptées à diverses applications de traitement et d’analyse d’images naturelles ou médicales. Nous introduisons un algorithme de recherche de correspondances permettant d’extraire rapidement des patchs d’une grande bibliothèque d’images 3D, que nous appliquons à la segmentation d’images médicales. Pour utiliser, de façon similaire aux patchs, des présegmentations en superpixels réduisant le nombre d’éléments de l’image, nous présentons une nouvelle structure de voisinage de superpixels. Ce nouveau descripteur permet d’utiliser efficacement les superpixels dans des approches non locales. Nous proposons également une méthode de décomposition régulière et précise en superpixels. Nous montrons comment évaluer cette régularité de façon robuste, et que celle-ci est nécessaire pour obtenir de bonnes performances de recherche de correspondances basées sur les superpixels. / This thesis focuses on several aspects of image analysis and processing with non-local methods. These methods are based on the redundancy of information that occurs in other images, and use matching algorithms, usually patch-based, to extract and transfer information from the example data. These approaches are widely used by the computer vision community, and are generally limited by the computational time of the matching algorithm, applied at the pixel scale, and by the necessity to perform preprocessing or learning steps to use large databases. To address these issues, we propose several general, fast, learning-free methods that can be easily applied to different image analysis and processing applications on natural and medical images. We introduce a matching algorithm that makes it possible to quickly extract patches from a large library of 3D images, which we apply to medical image segmentation. To use a presegmentation into superpixels that reduces the number of image elements, in a way that is similar to patches, we present a new superpixel neighborhood structure. This novel descriptor makes it possible to use superpixels efficiently in non-local approaches. We also introduce an accurate and regular superpixel decomposition method. We show how to evaluate this regularity in a robust manner, and that this property is necessary to obtain good superpixel-based matching performance.
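As a point of reference for the patch-based matching discussed above, the sketch below shows the brute-force nearest-patch search that approximate matching algorithms (and the superpixel-level search proposed in the thesis) are designed to avoid; the images and patch size are synthetic assumptions.

```python
import numpy as np

def best_patch_match(query_patch, library_image):
    """Brute-force nearest-neighbor search of a square patch in a 2D image.

    Returns the top-left (row, col) of the library patch with the smallest
    sum-of-squared-differences to the query patch, plus that SSD value.
    """
    p = query_patch.shape[0]
    H, W = library_image.shape
    best, best_pos = np.inf, None
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            ssd = np.sum((library_image[i:i + p, j:j + p] - query_patch) ** 2)
            if ssd < best:
                best, best_pos = ssd, (i, j)
    return best_pos, best

rng = np.random.default_rng(2)
library = rng.random((64, 64))
query = library[10:15, 20:25] + 0.01 * rng.random((5, 5))  # noisy copy of a known patch
print(best_patch_match(query, library))   # expected to land at (10, 20)
```

The exhaustive scan is quadratic in the image size for every query pixel, which is exactly the cost that approximate and superpixel-based strategies try to remove.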
16

Approches anytime et distribuées pour l’appariement de graphes / Anytime and distributed approaches for graph matching

Abu-Aisheh, Zeina 25 May 2016 (has links)
En raison de la capacité et de l'amélioration des performances informatiques, les représentations structurelles sont devenues de plus en plus populaires dans le domaine de la reconnaissance de formes (RF). Quand les objets sont structurés à base de graphes, le problème de la comparaison d'objets revient à un problème d'appariement de graphes (Graph Matching). Au cours de la dernière décennie, les chercheurs travaillant dans le domaine de l'appariement de graphes ont porté une attention particulière à la distance d'édition entre graphes (GED), notamment pour sa capacité à traiter différents types de graphes. La GED a ainsi été appliquée à des problématiques spécifiques qui varient de la reconnaissance de molécules à la classification d'images. / Due to the inherent genericity of graph-based representations, and thanks to the improvement of computer capacities, structural representations have become more and more popular in the field of Pattern Recognition (PR). In a graph-based representation, vertices and their attributes describe objects (or part of them) while edges represent interrelationships between the objects. Representing objects by graphs turns the problem of object comparison into graph matching (GM), where correspondences between vertices and edges of two graphs have to be found.
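For readers who want to experiment with the graph edit distance (GED) discussed above, NetworkX ships an exact implementation together with an anytime-style generator of successively better approximations; the toy graphs below are illustrative and unrelated to the datasets used in the thesis.

```python
import networkx as nx

# Two small graphs standing in for structural object representations
G1 = nx.Graph()
G1.add_edges_from([("a", "b"), ("b", "c"), ("c", "a")])   # a triangle

G2 = nx.Graph()
G2.add_edges_from([("a", "b"), ("b", "c")])               # a path

# Exact graph edit distance (exponential in the worst case -- toy sizes only)
print(nx.graph_edit_distance(G1, G2))   # 1.0: delete one edge

# Anytime-style usage: enumerate successively better upper bounds,
# which can be interrupted once the answer is good enough
for cost in nx.optimize_graph_edit_distance(G1, G2):
    print("improved GED upper bound:", cost)
```

The generator form mirrors the anytime idea in the abstract: a usable (sub-optimal) matching cost is available early and is refined as more computation time is granted.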
17

Design and Analysis of Consistent Algorithms for Multiclass Learning Problems

Harish, Guruprasad Ramaswami January 2015 (has links) (PDF)
We consider the broad framework of supervised learning, where one gets examples of objects together with some labels (such as tissue samples labeled as cancerous or non-cancerous, or images of handwritten digits labeled with the correct digit in 0-9), and the goal is to learn a prediction model which given a new object, makes an accurate prediction. The notion of accuracy depends on the learning problem under study and is measured by a performance measure of interest. A supervised learning algorithm is said to be 'statistically consistent' if it returns an `optimal' prediction model with respect to the desired performance measure in the limit of infinite data. Statistical consistency is a fundamental notion in supervised machine learning, and therefore the design of consistent algorithms for various learning problems is an important question. While this has been well studied for simple binary classification problems and some other specific learning problems, the question of consistent algorithms for general multiclass learning problems remains open. We investigate several aspects of this question as detailed below. First, we develop an understanding of consistency for multiclass performance measures defined by a general loss matrix, for which convex surrogate risk minimization algorithms are widely used. Consistency of such algorithms hinges on the notion of 'calibration' of the surrogate loss with respect to target loss matrix; we start by developing a general understanding of this notion, and give both necessary conditions and sufficient conditions for a surrogate loss to be calibrated with respect to a target loss matrix. We then define a fundamental quantity associated with any loss matrix, which we term the `convex calibration dimension' of the loss matrix; this gives one measure of the intrinsic difficulty of designing convex calibrated surrogates for a given loss matrix. We derive lower bounds on the convex calibration dimension which leads to several new results on non-existence of convex calibrated surrogates for various losses. For example, our results improve on recent results on the non-existence of low dimensional convex calibrated surrogates for various subset ranking losses like the pairwise disagreement (PD) and mean average precision (MAP) losses. We also upper bound the convex calibration dimension of a loss matrix by its rank, by constructing an explicit, generic, least squares type convex calibrated surrogate, such that the dimension of the surrogate is at most the (linear algebraic) rank of the loss matrix. This yields low-dimensional convex calibrated surrogates - and therefore consistent learning algorithms - for a variety of structured prediction problems for which the associated loss is of low rank, including for example the precision @ k and expected rank utility (ERU) losses used in subset ranking problems. For settings where achieving exact consistency is computationally difficult, as is the case with the PD and MAP losses in subset ranking, we also show how to extend these surrogates to give algorithms satisfying weaker notions of consistency, including both consistency over restricted sets of probability distributions, and an approximate form of consistency over the full probability space. Second, we consider the practically important problem of hierarchical classification, where the labels to be predicted are organized in a tree hierarchy. 
We design a new family of convex calibrated surrogate losses for the associated tree-distance loss; these surrogates are better than the generic least squares surrogate in terms of easier optimization and representation of the solution, and some surrogates in the family also operate on a significantly lower dimensional space than the rank of the tree-distance loss matrix. These surrogates, which we term the `cascade' family of surrogates, rely crucially on a new understanding we develop for the problem of multiclass classification with an abstain option, for which we construct new convex calibrated surrogates that are of independent interest by themselves. The resulting hierarchical classification algorithms outperform the current state-of-the-art in terms of both accuracy and running time. Finally, we go beyond loss-based multiclass performance measures, and consider multiclass learning problems with more complex performance measures that are nonlinear functions of the confusion matrix and that cannot be expressed using loss matrices; these include for example the multiclass G-mean measure used in class imbalance settings and the micro F1 measure used often in information retrieval applications. We take an optimization viewpoint for such settings, and give a Frank-Wolfe type algorithm that is provably consistent for any complex performance measure that is a convex function of the entries of the confusion matrix (this includes the G-mean, but not the micro F1). The resulting algorithms outperform the state-of-the-art SVMPerf algorithm in terms of both accuracy and running time. In conclusion, in this thesis, we have developed a deep understanding and fundamental results in the theory of supervised multiclass learning. These insights have allowed us to develop computationally efficient and statistically consistent algorithms for a variety of multiclass learning problems of practical interest, in many cases significantly outperforming the state-of-the-art algorithms for these problems.
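The generic least-squares surrogate mentioned above has a very compact form: regress the loss-matrix row of the observed label onto the features, so that each output coordinate estimates the expected loss of predicting that class, then predict the class with the smallest estimate. The sketch below illustrates this construction on synthetic data with ridge regression; the loss matrix, features and labels are placeholders, not the experiments from the thesis.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)

# Toy 3-class problem with an asymmetric loss matrix L[y, t] = loss of predicting t when the truth is y
L = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 1.0],
              [4.0, 1.0, 0.0]])

n, d, k = 2000, 5, 3
X = rng.normal(size=(n, d))
w = rng.normal(size=(d, k))
y = (X @ w + 0.5 * rng.normal(size=(n, k))).argmax(axis=1)

# Least-squares surrogate: the regression target for example i is the loss-matrix row L[y_i],
# so output coordinate t learns an estimate of E[L(Y, t) | x].
reg = Ridge(alpha=1.0).fit(X, L[y])

# Calibrated decoding: predict the class with the smallest estimated expected loss
y_hat = reg.predict(X).argmin(axis=1)
print("empirical average loss:", L[y, y_hat].mean())
```

The dimension of this surrogate equals the number of classes here; the thesis's rank-based construction shows it can be reduced to the rank of the loss matrix, which is what makes low-rank structured losses tractable.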
18

How Well Can Saliency Models Predict Fixation Selection in Scenes Beyond Central Bias? A New Approach to Model Evaluation Using Generalized Linear Mixed Models

Nuthmann, Antje, Einhäuser, Wolfgang, Schütz, Immo 22 January 2018 (has links) (PDF)
Since the turn of the millennium, a large number of computational models of visual salience have been put forward. How best to evaluate a given model's ability to predict where human observers fixate in images of real-world scenes remains an open research question. Assessing the role of spatial biases is a challenging issue; this is particularly true when we consider the tendency for high-salience items to appear in the image center, combined with a tendency to look straight ahead (“central bias”). This problem is further exacerbated in the context of model comparisons, because some—but not all—models implicitly or explicitly incorporate a center preference to improve performance. To address this and other issues, we propose to combine a-priori parcellation of scenes with generalized linear mixed models (GLMM), building upon previous work. With this method, we can explicitly model the central bias of fixation by including a central-bias predictor in the GLMM. A second predictor captures how well the saliency model predicts human fixations, above and beyond the central bias. By-subject and by-item random effects account for individual differences and differences across scene items, respectively. Moreover, we can directly assess whether a given saliency model performs significantly better than others. In this article, we describe the data processing steps required by our analysis approach. In addition, we demonstrate the GLMM analyses by evaluating the performance of different saliency models on a new eye-tracking corpus. To facilitate the application of our method, we make the open-source Python toolbox “GridFix” available.
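To illustrate the modelling idea, the sketch below fits a fixed-effects logistic regression on synthetic grid-cell data with a central-bias predictor (distance to the image center) and a saliency predictor; the by-subject and by-item random effects of the full GLMM, and the GridFix preprocessing, are omitted here, so this is only a simplified illustration of the predictor structure described in the abstract.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)

# Synthetic data: for each (subject, scene, grid cell) record whether the cell was fixated
n = 5000
df = pd.DataFrame({
    "dist_center": rng.uniform(0, 1, n),   # normalized distance of the cell to the image center
    "saliency": rng.uniform(0, 1, n),      # mean model saliency within the cell
})
# Ground truth used to simulate fixations: probability decreases with distance, increases with saliency
logit_p = 0.5 - 3.0 * df["dist_center"] + 2.0 * df["saliency"]
p = 1 / (1 + np.exp(-logit_p))
df["fixated"] = rng.binomial(1, p.to_numpy())

# Fixed-effects logistic model: the saliency effect is estimated above and beyond the central bias
fit = smf.logit("fixated ~ dist_center + saliency", data=df).fit(disp=0)
print(fit.summary().tables[1])
```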
19

Försäljningsprediktion : en jämförelse mellan regressionsmodeller / Sales prediction : a comparison between regression models

Fridh, Anton, Sandbecker, Erik January 2021 (has links)
Idag finns mängder av företag i olika branscher, stora som små, som vill förutsäga sin försäljning. Det kan bland annat bero på att de vill veta hur stort antal produkter de skall köpa in eller tillverka, och även vilka produkter som bör investeras i över andra. Vilka varor som är bra att investera i på kort sikt och vilka som är bra på lång sikt. Tidigare har detta gjorts med intuition och statistik; de flesta vet att skidjackor inte säljer så bra på sommaren, eller att strandprylar inte säljer bra under vintern. Det här är ett simpelt exempel, men hur blir det när komplexiteten ökar, och det finns ett stort antal produkter och butiker? Med hjälp av maskininlärning kan ett sånt här problem hanteras. En maskininlärningsalgoritm appliceras på en tidsserie, som är en datamängd med ett antal ordnade observationer vid olika tidpunkter under en viss tidsperiod. I den här studiens fall är detta försäljning av olika produkter som säljs i olika butiker och försäljningen ska prediceras på månadsbasis. Tidsserien som behandlas är ett dataset från Kaggle.com som kallas för "Predict Future Sales". Algoritmerna som används i den här studien för att hantera detta tidsserieproblem är XGBoost, MLP och MLR. XGBoost, MLR och MLP har i tidigare forskning gett bra resultat på liknande problem, där bland annat bilförsäljning, tillgänglighet och efterfrågan på taxibilar och bitcoin-priser legat i fokus. Samtliga algoritmer presterade bra utifrån de evalueringsmått som användes för studierna, och den här studien använder samma evalueringsmått. Algoritmernas prestation beskrivs enligt så kallade evalueringsmått, dessa är R², MAE, RMSE och MSE. Det är dessa mått som används i resultat- och diskussionskapitlen för att beskriva hur väl algoritmerna presterar. Den huvudsakliga forskningsfrågan för studien lyder därför enligt följande: Vilken av algoritmerna MLP, XGBoost och MLR kommer att prestera bäst enligt R², MAE, RMSE och MSE på tidsserien "Predict Future Sales". Tidsserien behandlas med ett känt tillvägagångssätt inom området som kallas CRISP-DM, där metodens olika steg följs. Dessa steg innebär bland annat dataförståelse, dataförberedelse och modellering. Denna metod är vad som i slutändan leder till resultatet, där resultatet från de olika modellerna som skapats genom CRISP-DM presenteras. I slutändan var det MLP som fick bäst resultat enligt mätvärdena, följt av MLR och XGBoost. MLP fick en RMSE på 0.863, MLR på 1.233 och XGBoost på 1.262. / Today, there are a lot of companies in different industries, large and small, that want to predict their sales. This may be due, among other things, to the fact that they want to know how many products they should buy or manufacture, and also which products should be invested in over others. In the past, this has been done with intuition and statistics. Most people know that ski jackets do not sell so well in the summer, or that beach products do not sell well during the winter. This is a simple example, but what happens when complexity increases, and there are a large number of products and stores? With the help of machine learning, a problem like this can be managed more easily. A machine learning algorithm is applied to a time series, which is a set of data with several ordered observations at different times during a certain time period. In the case of this study, it is the sales of different products sold in different stores, and sales are to be predicted on a monthly basis. The time series in question is a dataset from Kaggle.com called "Predict Future Sales". 
The algorithms used in this study to handle this time series problem are XGBoost, MLP and MLR. These have in previous research performed well on similar problems, where, among other things, car sales, availability and demand for taxis, and bitcoin prices were in focus. All algorithms performed well based on the evaluation metrics used by those studies, and this study uses the same evaluation metrics. The algorithms' performances are described according to so-called evaluation metrics: R², MAE, RMSE and MSE. These measures are used in the results and discussion chapters to describe how well the algorithms perform. The main research question for the study is therefore as follows: Which of the algorithms MLP, XGBoost and MLR will perform best according to R², MAE, RMSE and MSE on the time series "Predict Future Sales". The time series is treated with a well-known approach called CRISP-DM, whose steps are followed. These steps include, among other things, data understanding, data preparation and modeling. This method is what ultimately leads to the results, where the results from the various models created through CRISP-DM are presented. In the end, it was the MLP algorithm that got the best results according to the measured values, followed by MLR and XGBoost. MLP got an RMSE of 0.863, MLR of 1.233 and XGBoost of 1.262.
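As a minimal illustration of the comparison described above, the sketch below fits multiple linear regression (MLR), an MLP and XGBoost's scikit-learn regressor on synthetic data and reports R², MAE and RMSE; the features, targets and hyperparameters are placeholders rather than the Kaggle "Predict Future Sales" setup or the models tuned in the thesis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from xgboost import XGBRegressor

rng = np.random.default_rng(5)

# Synthetic stand-in for monthly item/shop sales features and next-month sales
X = rng.normal(size=(3000, 8))
y = 5 + 2 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(scale=0.5, size=3000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "MLR": LinearRegression(),
    "MLP": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0),
    "XGBoost": XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.1),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name:8s} R2={r2_score(y_te, pred):.3f} "
          f"MAE={mean_absolute_error(y_te, pred):.3f} RMSE={rmse:.3f}")
```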
20

Navigating the Metric Zoo: Towards a More Coherent Model For Quantitative Evaluation of Generative ML Models

Dozier, Robbie 26 August 2022 (has links)
No description available.
