51

Learning algorithms and statistical software, with applications to bioinformatics

Hocking, Toby Dylan 20 November 2012 (has links) (PDF)
Statistical machine learning is a branch of mathematics concerned with developing algorithms for data analysis. This thesis presents new mathematical models and statistical software, and is organized into two parts. In the first part, I present several new algorithms for clustering and segmentation, classes of techniques that attempt to find structure in data, and discuss these contributions with a focus on applications to cancer data from bioinformatics. In the second part, I focus on statistical software contributions that are practical for everyday data analysis.
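The kind of segmentation problem this abstract refers to can be illustrated with a toy least-squares change-point search, sketched below; the simulated signal and the single-change-point restriction are illustrative assumptions, not the thesis's own algorithms.

```python
import numpy as np

def best_single_changepoint(signal):
    """Return the split index minimizing within-segment squared error."""
    n = len(signal)
    best_t, best_cost = None, np.inf
    for t in range(1, n):                     # candidate change-point positions
        left, right = signal[:t], signal[t:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

# Simulated piecewise-constant signal with noise (illustrative only).
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0.0, 0.3, 60), rng.normal(1.5, 0.3, 40)])
print("estimated change-point:", best_single_changepoint(signal))  # ~60
```

Applied recursively to the resulting segments, this becomes the classic binary segmentation heuristic for piecewise-constant signals (copy-number profiles are one common bioinformatics example).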
52

Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

Mpofu, Bongeka 12 1900 (has links)
Software quality ensures that the applications developed are failure-free. Some modern systems are intricate because of the complexity of their information processes. Software fault prediction is an important quality assurance activity, since a mechanism that correctly predicts the defect proneness of modules and classifies them accordingly saves resources, time and developers' effort. In this study, a model that selects the relevant features to be used in defect prediction was proposed. A review of the literature revealed that process metrics, which are computed from the historical source code held in version control systems, are better predictors of defects. These metrics are extracted from the source-code modules and include, for example, the number of additions and deletions in the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted on open source software (OSS) belonging to software product lines (SPL), hence process metrics were chosen. Data sets used in defect prediction may contain non-significant and redundant attributes that can affect the accuracy of machine-learning algorithms. In order to improve the prediction accuracy of classification models, only features that are significant for the defect prediction process are utilised. In machine learning, feature selection techniques are applied to identify the relevant data; feature selection is a pre-processing step that helps to reduce the dimensionality of the data, and the available techniques include information-theoretic methods based on the entropy concept. This study evaluated the efficiency of these feature selection techniques experimentally and found that software defect prediction using significant attributes improves prediction accuracy. A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation-Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. MICFastCR achieved the highest prediction accuracy as reported by various performance measures. / School of Computing / Ph. D. (Computer Science)
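A minimal sketch of the two-stage relevance-and-redundancy selection described above, using scikit-learn's mutual information as a stand-in for MIC and a simple correlation filter in place of the full FCBF procedure; the file name, label column and thresholds are assumptions, not the study's actual setup.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

# Hypothetical process-metrics dataset: one row per module.
df = pd.read_csv("process_metrics.csv")       # assumed file name
X, y = df.drop(columns=["defective"]), df["defective"]

# Stage 1 (relevance): rank features; mutual information stands in for MIC here.
relevance = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
relevant = relevance[relevance > 0.01].sort_values(ascending=False).index.tolist()

# Stage 2 (redundancy): greedily drop features highly correlated with an
# already-kept, more relevant feature (a simplified FCBF-style filter).
kept = []
for feat in relevant:
    if all(abs(X[feat].corr(X[k])) < 0.9 for k in kept):
        kept.append(feat)

# Train and evaluate a classifier on the selected attributes only.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("Selected:", kept)
print("CV accuracy:", cross_val_score(clf, X[kept], y, cv=5).mean())
```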
53

Learning with Complex Performance Measures: Theory, Algorithms and Applications

Narasimhan, Harikrishna January 2016 (has links) (PDF)
We consider supervised learning problems, where one is given objects with labels, and the goal is to learn a model that can make accurate predictions on new objects. These problems abound in applications, ranging from medical diagnosis to information retrieval to computer vision. Examples include binary or multiclass classification, where the goal is to learn a model that can classify objects into two or more categories (e.g. categorizing emails into spam or non-spam); bipartite ranking, where the goal is to learn a model that can rank relevant objects above the irrelevant ones (e.g. ranking documents by relevance to a query); and class probability estimation (CPE), where the goal is to predict the probability of an object belonging to different categories (e.g. the probability of an internet ad being clicked by a user). In each case, the accuracy of a model is evaluated in terms of a specified 'performance measure'. While there has been much work on designing and analyzing algorithms for different supervised learning tasks, we have a complete understanding only for settings where the performance measure of interest is the standard 0-1 or a loss-based classification measure. These performance measures have a simple additive structure and can be expressed as an expectation of errors on individual examples. However, in many real-world applications, the performance measure used to evaluate a model is often more complex and does not decompose into a sum or expectation of point-wise errors. These include the binary or multiclass G-mean used in class-imbalanced classification problems; the F1-measure and its multiclass variants popular in text retrieval; and the (partial) area under the ROC curve (AUC) and precision@k employed in ranking applications. How does one design efficient learning algorithms for such complex performance measures, and can these algorithms be shown to be statistically consistent, i.e. shown to converge in the limit of infinite data to the optimal model for the given measure? How does one develop efficient learning algorithms for complex measures in online/streaming settings where the training examples need to be processed one at a time? These are the questions that we seek to address in this thesis.

Firstly, we consider the bipartite ranking problem with the AUC and partial AUC performance measures. We start by understanding how bipartite ranking with AUC is related to the standard 0-1 binary classification and CPE tasks. It is known that a good binary CPE model can be used to obtain both a good binary classification model and a good bipartite ranking model (formally, in terms of regret transfer bounds), and that a binary classification model does not necessarily yield a CPE model; however, not much is known about the other directions. We show that in a weaker sense (where the mapping needed to transform a model from one problem to another depends on the underlying probability distribution), a good bipartite ranking model for AUC can indeed be used to construct a good binary classification model, and also a good binary CPE model. Next, motivated by the increasing number of applications (e.g. biometrics, medical diagnosis, etc.) where performance is measured not in terms of the full AUC but in terms of the partial AUC between two false positive rates (FPRs), we design batch algorithms for optimizing partial AUC in any given FPR range. Our algorithms optimize structural support vector machine based surrogates, which, unlike those for the full AUC, do not admit a straightforward decomposition into simpler terms. We develop polynomial time cutting plane solvers for solving the optimization, and provide experiments to demonstrate the efficacy of our methods. We also present an application of our approach to predicting chemotherapy outcomes for cancer patients, with the aim of improving treatment decisions.

Secondly, we develop algorithms for optimizing (surrogates for) complex performance measures in the presence of streaming data. A well-known method for solving this problem for standard point-wise surrogates such as the hinge surrogate is the stochastic gradient descent (SGD) method, which performs point-wise updates using unbiased gradient estimates. However, this method cannot be applied to complex objectives, as here one can no longer obtain unbiased gradient estimates from a single point. We develop a general stochastic method for optimizing complex measures that avoids point-wise updates and instead performs gradient updates on mini-batches of incoming points. The method is shown to provably converge for any performance measure that satisfies a uniform convergence requirement, such as the partial AUC, precision@k and F1-measure, and in experiments is often several orders of magnitude faster than the state-of-the-art batch methods, while achieving similar or better accuracies. Moreover, for specific complex binary classification measures, which are concave functions of the true positive rate (TPR) and true negative rate (TNR), we are able to develop stochastic (primal-dual) methods that can indeed be implemented with point-wise updates, by using an adaptive linearization scheme. These methods admit convergence rates that match the rate of the SGD method, and are again several times faster than the state-of-the-art methods.

Finally, we look at the design of consistent algorithms for complex binary and multiclass measures. For binary measures, we consider the practically popular plug-in algorithm that constructs a classifier by applying an empirical threshold to a suitable class probability estimate, and provide a general methodology for proving consistency of these methods. We apply this technique to show consistency for the F1-measure, and, under a continuity assumption on the distribution, for any performance measure that is monotonic in the TPR and TNR. For the case of multiclass measures, a simple plug-in method is no longer tractable, as in place of a single threshold parameter one needs to tune at least as many parameters as the number of classes. Using an optimization viewpoint, we provide a framework for designing learning algorithms for multiclass measures that are general functions of the confusion matrix, and as an instantiation, provide an efficient and provably consistent algorithm based on the bisection method for multiclass measures that are ratio-of-linear functions of the confusion matrix (e.g. micro F1). The algorithm outperforms the state-of-the-art SVMPerf method in terms of both accuracy and running time.

Overall, in this thesis, we have looked at various aspects of complex performance measures used in supervised learning problems, leading to several new algorithms that are often significantly better than the state-of-the-art, to improved theoretical understanding of the performance measures studied, and to novel real-life applications of the algorithms developed.
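A minimal sketch of the plug-in recipe mentioned above for the binary F1-measure: fit a class probability estimator, then pick the empirical threshold that maximizes F1 on held-out data. The synthetic dataset and the logistic-regression estimator are illustrative assumptions, not the thesis's experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Illustrative imbalanced data; any binary classification dataset would do.
X, y = make_classification(n_samples=4000, weights=[0.9, 0.1], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: class probability estimation (CPE).
cpe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p_val = cpe.predict_proba(X_val)[:, 1]

# Step 2: plug-in step -- pick the empirical threshold maximizing F1.
thresholds = np.linspace(0.01, 0.99, 99)
f1s = [f1_score(y_val, p_val >= t) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]
print(f"best threshold ~ {best_t:.2f}, validation F1 = {max(f1s):.3f}")
```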
54

Investigation and application of artificial intelligence algorithms for complexity metrics based classification of semantic web ontologies

Koech, Gideon Kiprotich 11 1900 (has links)
M. Tech. (Department of Information Technology, Faculty of Applied and Computer Sciences), Vaal University of Technology. / The increasing demand for knowledge representation and exchange on the semantic web has resulted in an increase in both the number and size of ontologies. These additional features have made ontologies more complex and, in turn, difficult to select, reuse and maintain. Several ontology evaluation and ranking tools have been proposed recently. Such evaluation tools provide a metrics suite that evaluates the content of an ontology by analysing its schema and instances. The availability of ontology metric suites makes it possible for classification techniques to place ontologies into various categories or classes, and machine learning algorithms, which are mostly based on the statistical methods used in data classification, are well suited to performing this classification. In this study, popular machine learning algorithms, including K-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, Naïve Bayes, Linear Regression and Logistic Regression, were used to classify ontologies based on their complexity metrics. A total of 200 biomedical ontologies were downloaded from the BioPortal repository. Ontology metrics were then generated using the OntoMetrics tool, an online ontology evaluation platform, and these metrics constituted the dataset used to train and test the machine learning algorithms. The results were evaluated with the performance measures precision, recall, F-measure and Receiver Operating Characteristic (ROC) curves. The overall accuracy scores for the K-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, Naïve Bayes, Logistic Regression and Linear Regression algorithms were 66.67%, 65%, 98%, 99.29%, 74%, 64.67% and 57%, respectively. From these scores, Decision Trees and Random Forest were the best-performing algorithms, which can be attributed to their ability to handle multiclass classification.
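A minimal sketch of the classification setup described above, assuming the OntoMetrics outputs have been exported to a table; the file name, label column and hyperparameters are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical table of OntoMetrics outputs: one row per ontology,
# columns are complexity metrics plus a manually assigned class label.
df = pd.read_csv("ontology_metrics.csv")      # assumed file name
X, y = df.drop(columns=["category"]), df["category"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

for name, model in [("Decision Tree", DecisionTreeClassifier(random_state=42)),
                    ("Random Forest", RandomForestClassifier(n_estimators=300, random_state=42))]:
    model.fit(X_tr, y_tr)
    print(name)
    # Precision, recall and F-measure per class, as in the study's evaluation.
    print(classification_report(y_te, model.predict(X_te)))
```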
55

Maskininlärning som verktyg för att extrahera information om attribut kring bostadsannonser i syfte att maximera försäljningspris / Using machine learning to extract information from real estate listings in order to maximize selling price

Ekeberg, Lukas, Fahnehjelm, Alexander January 2018 (has links)
The Swedish real estate market has been digitalized over the past decade, with the current practice being to post your real estate advertisement online. A question that has arisen is how a seller can optimize their public listing to maximize the selling premium. This paper analyzes the use of three machine learning methods to solve this problem: Linear Regression, Decision Tree Regressor and Random Forest Regressor. The aim is to retrieve information regarding how certain attributes contribute to the premium value. The dataset used contains apartments sold within the years 2014-2018 in the Östermalm / Djurgården district in Stockholm, Sweden. The resulting models returned an R²-value of approx. 0.26 and a Mean Absolute Error of approx. 0.06. While the models were not accurate regarding prediction of the premium, information could still be extracted from the models. In conclusion, a high number of views and a publication made in April provide the best conditions for an advertisement to reach a high selling premium. The seller should try to keep the number of days since publication lower than 15.5 days and avoid publishing on a Tuesday.
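A minimal sketch of the regression comparison described above, reporting R² and MAE for the three model families; the listing file, feature names and target column are assumptions standing in for the study's dataset.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical listing features; the target is the selling premium.
df = pd.read_csv("listings.csv")              # assumed file name
X = df[["views", "days_since_publication", "publication_month", "publication_weekday"]]
y = df["premium"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

for name, model in [("Linear", LinearRegression()),
                    ("Decision Tree", DecisionTreeRegressor(max_depth=5, random_state=1)),
                    ("Random Forest", RandomForestRegressor(n_estimators=200, random_state=1))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name}: R2={r2_score(y_te, pred):.2f}, MAE={mean_absolute_error(y_te, pred):.3f}")
```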
56

Comparison of Machine Learning Algorithms on Identifying Autism Spectrum Disorder

Aravapalli, Naga Sai Gayathri, Palegar, Manoj Kumar January 2023 (has links)
Background: Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder that affects social communication, behavior, and cognitive development. Patients with autism face a variety of difficulties, such as sensory impairments, attention issues, learning disabilities, mental health issues like anxiety and depression, as well as motor and learning issues. The World Health Organization (WHO) estimates that one in 100 children has ASD. Although ASD cannot be completely treated, early identification of its symptoms might lessen its impact and can significantly improve the outcome of interventions and therapies, so it is important to identify the disorder early. Machine learning algorithms can help in predicting ASD. In this thesis, Support Vector Machine (SVM) and Random Forest (RF) are the algorithms used to predict ASD. Objectives: The main objective of this thesis is to build and train models using machine learning (ML) algorithms, both with default parameters and with hyper-parameter tuning, and to determine the most accurate model for predicting whether a person has ASD based on a comparison of the two experiments. Methods: Experimentation is the method chosen to answer the research questions, as it allowed the most accurate model for predicting ASD to be identified. Experimentation was preceded by data preparation, including splitting the data and applying feature selection to the dataset. In the two experiments, the models were first trained with the default parameters and then with hyper-parameter tuning, and performance metrics were computed in each case. Based on the comparison, the most accurate model was applied to predict ASD. Results: The two chosen algorithms, SVM and RF, were used to train the models. With hyper-parameter tuning, SVM obtained the highest scores, with an accuracy of 96% and an F1 score of 97% on the test data, outperforming RF in predicting ASD. Conclusions: The models were trained using the two ML algorithms SVM and RF in two experiments: in experiment 1 the models were trained using default parameters, and in experiment 2 using hyper-parameter tuning, with accuracy and F1 score obtained for the test data in each case. By comparing the performance metrics, we conclude that SVM is the most accurate algorithm for predicting ASD.
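A minimal sketch of the two-experiment comparison described above (default parameters versus grid-searched hyper-parameters) for SVM and RF; the synthetic placeholder data and the parameter grids are assumptions, not the thesis's ASD screening dataset or tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Placeholder data standing in for an ASD screening dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=7)

models = {"SVM": (SVC(), {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}),
          "RF": (RandomForestClassifier(random_state=7),
                 {"n_estimators": [100, 300], "max_depth": [None, 10]})}

for name, (model, grid) in models.items():
    # Experiment 1: default parameters.
    acc_default = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
    # Experiment 2: hyper-parameter tuning via cross-validated grid search.
    search = GridSearchCV(model, grid, cv=5).fit(X_tr, y_tr)
    pred = search.predict(X_te)
    print(f"{name}: default acc={acc_default:.2f} | tuned acc="
          f"{accuracy_score(y_te, pred):.2f}, f1={f1_score(y_te, pred):.2f}, "
          f"best={search.best_params_}")
```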
57

Concept development for quality management and predictive maintenance in the field of hydroforming (Innenhochdruck-Umformung, IHU): SFU 2023

Reuter, Thomas, Massalsky, Kristin, Burkhardt, Thomas 06 March 2024 (has links)
Series manufacturers in the field of hydroforming are under strong competitive pressure from alternative conventional manufacturing processes and their cost criteria. Changing production requirements in a globalized market environment demand flexible action at the highest quality and low cost. Cost savings can be achieved by reducing warehouse and work-in-progress stocks. Malfunction-related downtime of IHU systems must be kept to a minimum so that agreed delivery dates are met on schedule and contractual penalties are avoided. The required productivity and the targeted quality level can only be maintained through adapted maintenance strategies, which is why a concept for predictive maintenance with integrated quality management was developed specifically for the IHU domain. Dynamic process and maintenance adaptations are a central component of this development work.
58

Concept development for quality management and predictive maintenance in the area of hydroforming (IHU): SFU 2023

Reuter, Thomas, Massalsky, Kristin, Burkhardt, Thomas 06 March 2024 (has links)
Series manufacturers in the field of hydroforming face intense competition from alternative conventional manufacturing methods and their cost criteria. Changing production requirements in the globalized market environment require flexible action with the highest quality and low costs. Cost savings can be achieved through reductions in warehouse and work-in-progress stocks. Malfunction-related downtime in hydroforming systems must be reduced to a minimum in order to meet the agreed delivery dates on time and avoid contractual penalties. The required productivity and the desired quality level can only be maintained through adapted maintenance strategies, leading to the development of a concept for predictive maintenance integrated with quality management specifically for the IHU domain. Dynamic process and maintenance adaptations are a central component of this development effort.
59

Using hydrological models and digital soil mapping for the assessment and management of catchments: A case study of the Nyangores and Ruiru catchments in Kenya (East Africa)

Kamamia, Ann Wahu 18 July 2023 (has links)
Human activities on land have a direct and cumulative impact on water and other natural resources within a catchment. Such land-use change can have hydrological consequences at local and regional scales. Sound catchment assessment is not only critical to understanding processes and functions but also important in identifying priority management areas. The overarching goal of this doctoral thesis was to design a methodological framework for catchment assessment (dependent upon data availability) and propose practical catchment management strategies for sustainable water resources management. The Nyangores and Ruiru reservoir catchments, located in Kenya, East Africa, were used as case studies. A properly calibrated Soil and Water Assessment Tool (SWAT) hydrologic model coupled with a generic land-use optimization tool (Constrained Multi-Objective Optimization of Land-use Allocation, CoMOLA) was applied to identify and quantify functional trade-offs between environmental sustainability and food production in the 'data-available' Nyangores catchment. This was determined using a four-dimensional objective function defined as (i) minimizing sediment load, (ii) maximizing stream low flow and (iii) and (iv) maximizing the crop yields of maize and soybeans, respectively. Additionally, three different optimization scenarios, represented as (i) agroforestry (Scenario 1), (ii) agroforestry + conservation agriculture (Scenario 2) and (iii) conservation agriculture (Scenario 3), were compared. For the data-scarce Ruiru reservoir catchment, alternative methods using digital soil mapping of soil erosion proxies (aggregate stability using Mean Weight Diameter) and spatial-temporal soil loss analysis using empirical models (the Revised Universal Soil Loss Equation, RUSLE) were used. The lack of adequate data necessitated a data-collection phase which implemented conditional Latin Hypercube Sampling. This sampling technique reduced the need for intensive soil sampling while still capturing spatial variability. The results revealed that for the Nyangores catchment, adoption of both agroforestry and conservation agriculture (Scenario 2) led to the smallest trade-off amongst the different objectives, i.e. a 3.6% change in forests combined with a 35% change in conservation agriculture resulted in the largest reduction in sediment loads (78%), increased low flow (+14%) and only slightly decreased crop yields (3.8% for both maize and soybeans). Therefore, the advanced use of hydrologic models with optimization tools allows for the simultaneous assessment of different outputs/objectives and is ideal for areas with adequate data to properly calibrate the model. For the Ruiru reservoir catchment, digital soil mapping (DSM) of aggregate stability revealed that susceptibility to erosion exists for cropland (food crops), tea and roadsides, which are mainly located in the eastern part of the catchment, as well as for deforested areas on the western side. This validated that, with limited soil samples and the use of computing power, machine learning and freely available covariates, DSM can effectively be applied in data-scarce areas. Moreover, uncertainty in the predictions can be incorporated using prediction intervals. The spatial-temporal analysis showed that bare land (which has the lowest areal proportion) was the largest contributor to erosion. Two peak soil loss periods corresponding to the two rainy periods of March–May and October–December were identified.
Thus, yearly soil erosion risk maps misrepresent the true dimensions of soil loss, with averages disguising areas of low and high potential. Also, a small portion of the catchment can be responsible for a large proportion of the total erosion. For both catchments, agroforestry (combining both the use of trees and conservation farming) is the most feasible catchment management strategy (CMS) for solving the major water quantity and quality problems. Finally, the key to thriving catchments aiming at both sustainability and resilience requires urgent collaborative action by all stakeholders. The necessary stakeholders in both the Nyangores and Ruiru reservoir catchments must be involved in catchment assessment in order to identify the catchment problems, mitigation strategies, roles and responsibilities, while keeping in mind that some risks need to be shared and negotiated, but so will the benefits.
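A minimal sketch of the RUSLE calculation underlying the spatio-temporal soil-loss analysis described above, with the factor rasters represented as NumPy arrays; the grid size and factor values below are placeholders, not catchment data.

```python
import numpy as np

# RUSLE: A = R * K * LS * C * P, evaluated cell by cell on co-registered rasters.
shape = (100, 100)           # placeholder grid
R = np.full(shape, 350.0)    # rainfall erosivity factor
K = np.full(shape, 0.03)     # soil erodibility factor
LS = np.full(shape, 1.2)     # slope length and steepness factor
C = np.full(shape, 0.2)      # cover-management factor
P = np.full(shape, 1.0)      # support-practice factor

A = R * K * LS * C * P       # soil loss per cell (t/ha/yr)
print("catchment mean soil loss:", A.mean())
# Repeating this per rainy season (e.g. Mar-May, Oct-Dec) with seasonal R rasters
# yields the within-year peaks the abstract refers to.
```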
60

[pt] ENSAIOS EM PROBLEMAS DE OTIMIZAÇÃO DE CARTEIRAS SOB INCERTEZA / [en] ESSAYS ON ASSET ALLOCATION OPTIMIZATION PROBLEMS UNDER UNCERTAINTY

BETINA DODSWORTH MARTINS FROMENT FERNANDES 30 April 2019 (has links)
[en] In this thesis we provide two different approaches for determining optimal asset allocation portfolios under uncertainty. We show how uncertainty about the expected returns distribution can be incorporated in asset allocation decisions by using the following alternative frameworks: (1) an extension of the Bayesian methodology proposed by Black and Litterman through a dynamic trading strategy built on a learning model based on fundamental analysis; (2) an adaptive dynamic approach, based on robust optimization techniques. This latter approach is presented in two different specifications: an empirical robust loss model and a covariance-based robust loss model based on the Bertsimas and Sim approach to modeling uncertainty sets. To evaluate the importance of the proposed models for distribution uncertainty, the extent of changes in the prior optimal asset allocations of investors who embody uncertainty in their portfolio is examined. The key findings are: (a) it is possible to achieve optimal portfolios less influenced by estimation errors; (b) portfolio strategies of such investors generate statistically higher returns with controlled losses when compared to the classical mean-variance optimized portfolios and selected benchmarks.
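A minimal sketch of a robust mean-variance allocation with a simple box uncertainty set on expected returns, in the spirit of the robust-optimization approach described above (rather than the budgeted uncertainty set of Bertsimas and Sim); the return estimates, uncertainty radius and risk-aversion weight are illustrative assumptions, not the thesis's calibration.

```python
import cvxpy as cp
import numpy as np

n = 5                                                # number of assets (illustrative)
mu_hat = np.array([0.08, 0.10, 0.06, 0.12, 0.07])    # estimated expected returns
Sigma = np.diag([0.04, 0.09, 0.03, 0.16, 0.05])      # toy covariance matrix
delta = 0.02 * np.ones(n)                            # box uncertainty radius on mu
lam = 5.0                                            # risk-aversion weight

w = cp.Variable(n)
# Worst-case expected return over mu in [mu_hat - delta, mu_hat + delta]
# is mu_hat @ w - delta @ |w|, which keeps the problem convex.
worst_case_return = mu_hat @ w - delta @ cp.abs(w)
objective = cp.Maximize(worst_case_return - lam * cp.quad_form(w, Sigma))
constraints = [cp.sum(w) == 1, w >= 0]               # fully invested, long-only
cp.Problem(objective, constraints).solve()
print("robust weights:", np.round(w.value, 3))
```

Compared with the classical mean-variance allocation (delta = 0), the robust weights are pulled away from assets whose return estimates are most uncertain, which is the mechanism behind the reduced sensitivity to estimation error reported in the abstract.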
