About: The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations (NDLTD). Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
21

Data-Dependent Analysis of Learning Algorithms

Philips, Petra Camilla, petra.philips@gmail.com January 2005 (has links)
This thesis studies the generalization ability of machine learning algorithms in a statistical setting. It focuses on the data-dependent analysis of the generalization performance of learning algorithms in order to make full use of the potential of the actual training sample from which these algorithms learn.

First, we propose an extension of the standard framework for the derivation of generalization bounds for algorithms taking their hypotheses from random classes of functions. This approach is motivated by the fact that the function produced by a learning algorithm based on a random sample of data depends on this sample and is therefore a random function. Such an approach avoids the detour through the worst-case uniform bounds taken in the standard approach. We show that the mechanism which allows one to obtain generalization bounds for random classes in our framework is based on a “small complexity” of certain random coordinate projections. We demonstrate how this notion of complexity relates to learnability and how one can exploit geometric properties of these projections in order to derive estimates of rates of convergence and good confidence interval estimates for the expected risk. We then demonstrate the generality of our new approach by presenting a range of examples, among them the algorithm-dependent compression schemes and the data-dependent luckiness frameworks, which fall into our random subclass framework.

Second, we study in more detail generalization bounds for a specific algorithm which is of central importance in learning theory, namely the Empirical Risk Minimization algorithm (ERM). Recent results show that one can significantly improve the high-probability estimates of the convergence rates for empirical minimizers by a direct analysis of the ERM algorithm. These results are based on a new localized notion of complexity of subsets of hypothesis functions with identical expected errors and are therefore dependent on the underlying unknown distribution. We investigate the extent to which one can estimate these high-probability convergence rates in a data-dependent manner. We provide an algorithm which computes a data-dependent upper bound for the expected error of empirical minimizers in terms of the “complexity” of data-dependent local subsets. These subsets are sets of functions with empirical errors within a given range and can be determined based solely on empirical data. We then show that recent direct estimates, which are essentially sharp estimates on the high-probability convergence rate for the ERM algorithm, cannot be recovered universally from empirical data.
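For orientation only, and not as the thesis's exact statements: a worst-case uniform bound and a localized, data-dependent variant typically take forms like the following, in standard textbook notation (hypothesis class \(\mathcal{F}\), empirical risk \(\hat{R}_n\), Rademacher complexity \(\mathfrak{R}_n\)); the sample-dependent subclass \(\hat{\mathcal{F}}(D_n)\) is a generic placeholder, not the author's notation.

```latex
% Worst-case uniform bound: with probability at least 1-\delta, for all f \in \mathcal{F},
R(f) \;\le\; \hat{R}_n(f) \;+\; 2\,\mathfrak{R}_n(\mathcal{F}) \;+\; \sqrt{\frac{\ln(1/\delta)}{2n}}.

% Localized, data-dependent flavour: complexity is measured only on the random
% subclass of functions the algorithm can actually return on the sample D_n,
R(\hat{f}_n) \;\le\; \hat{R}_n(\hat{f}_n) \;+\; 2\,\hat{\mathfrak{R}}_n\bigl(\hat{\mathcal{F}}(D_n)\bigr) \;+\; 3\sqrt{\frac{\ln(2/\delta)}{2n}},
% where \hat{\mathcal{F}}(D_n) is a sample-dependent subset (e.g. functions with
% small empirical error) and \hat{\mathfrak{R}}_n is its empirical Rademacher complexity.
```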
22

Software defect prediction using maximal information coefficient and fast correlation-based filter feature selection

Mpofu, Bongeka 12 1900 (has links)
Software quality assurance aims to ensure that the applications developed are failure free. Some modern systems are intricate due to the complexity of their information processes. Software fault prediction is an important quality assurance activity, since correctly predicting the defect proneness of modules and classifying them accordingly saves resources, time and developers’ effort. In this study, a model that selects relevant features for use in defect prediction was proposed. The literature review revealed that process metrics, which are based on historic source code over time, are better predictors of defects in versioning systems. These metrics are extracted from the source-code module and include, for example, the number of additions and deletions in the source code, the number of distinct committers and the number of modified lines. In this research, defect prediction was conducted using open source software (OSS) of software product lines (SPL), hence process metrics were chosen. Datasets used in defect prediction may contain non-significant and redundant attributes that can affect the accuracy of machine-learning algorithms. In order to improve the prediction accuracy of classification models, only features that are significant for the defect prediction process are utilised. In machine learning, feature selection techniques are applied to identify the relevant data. Feature selection is a pre-processing step that helps to reduce the dimensionality of data, and feature selection techniques include information-theoretic methods based on the entropy concept. This study experimented with the efficiency of such feature selection techniques and found that software defect prediction using significant attributes improves prediction accuracy. A novel MICFastCR model was developed, which uses the Maximal Information Coefficient (MIC) to select significant attributes and the Fast Correlation-Based Filter (FCBF) to eliminate redundant attributes. Machine learning algorithms were then run to predict software defects. The MICFastCR model achieved the highest prediction accuracy as reported by various performance measures. / School of Computing / Ph. D. (Computer Science)
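As an illustration only, not the thesis's MICFastCR implementation, the sketch below ranks features by a relevance score and then drops redundant ones before training a classifier. It substitutes scikit-learn's mutual information for MIC and a simple pairwise-correlation filter for FCBF; the file name, label column and threshold are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

# Hypothetical process-metrics dataset: one row per module, binary "defective" label.
df = pd.read_csv("process_metrics.csv")
X, y = df.drop(columns=["defective"]), df["defective"]

# Step 1: relevance ranking (mutual information as a stand-in for MIC).
relevance = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
ranked = relevance.sort_values(ascending=False)

# Step 2: redundancy removal (pairwise-correlation filter as a stand-in for FCBF).
selected = []
for feat in ranked.index:
    if relevance[feat] <= 0:                  # drop irrelevant features
        continue
    if all(abs(X[feat].corr(X[s])) < 0.9 for s in selected):
        selected.append(feat)                 # keep only non-redundant features

# Step 3: train and evaluate a classifier on the selected subset.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("selected features:", selected)
print("CV accuracy:", cross_val_score(clf, X[selected], y, cv=5).mean())
```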
23

Investigation and application of artificial intelligence algorithms for complexity metrics based classification of semantic web ontologies

Koech, Gideon Kiprotich 11 1900 (has links)
M. Tech. (Department of Information Technology, Faculty of Applied and Computer Sciences), Vaal University of Technology. / The increasing demand for knowledge representation and exchange on the semantic web has resulted in an increase in both the number and size of ontologies. These additional features have made ontologies more complex and, in turn, more difficult to select, reuse and maintain. Several ontology evaluation and ranking tools have been proposed recently. Such evaluation tools provide a metrics suite that evaluates the content of an ontology by analysing its schema and instances. The availability of ontology metric suites enables classification techniques to place ontologies into various categories or classes. Machine learning algorithms, which are mostly based on statistical methods for classifying data, are therefore well suited to performing classification of ontologies. In this study, popular machine learning algorithms including K-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, Naïve Bayes, Linear Regression and Logistic Regression were used in the classification of ontologies based on their complexity metrics. A total of 200 biomedical ontologies were downloaded from the BioPortal repository. Ontology metrics were then generated using the OntoMetrics tool, an online ontology evaluation platform, and these metrics constituted the dataset used in the implementation of the machine learning algorithms. The results obtained were evaluated with the performance measures precision, recall, F-measure and Receiver Operating Characteristic (ROC) curves. The overall accuracy scores for the K-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, Naïve Bayes, Logistic Regression and Linear Regression algorithms were 66.67%, 65%, 98%, 99.29%, 74%, 64.67%, and 57%, respectively. From these scores, the Decision Tree and Random Forest algorithms performed best, which can be attributed to their ability to handle multiclass classification.
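As an illustration of the kind of experiment described, not the study's actual pipeline, the sketch below trains and evaluates several scikit-learn classifiers on a metrics table such as one exported from OntoMetrics; the file name, label column and parameter values are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Hypothetical OntoMetrics export: complexity metrics plus a class label per ontology.
df = pd.read_csv("ontology_metrics.csv")
X, y = df.drop(columns=["complexity_class"]), df["complexity_class"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, model.predict(X_te)))  # precision, recall, F1 per class
```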
24

Maskininlärning som verktyg för att extrahera information om attribut kring bostadsannonser i syfte att maximera försäljningspris / Using machine learning to extract information from real estate listings in order to maximize selling price

Ekeberg, Lukas, Fahnehjelm, Alexander January 2018 (has links)
The Swedish real estate market has been digitalized over the past decade, and current practice is to post real estate advertisements online. A question that has arisen is how a seller can optimize their public listing to maximize the selling premium. This paper analyzes the use of three machine learning methods to solve this problem: Linear Regression, Decision Tree Regressor and Random Forest Regressor. The aim is to retrieve information regarding how certain attributes contribute to the premium value. The dataset used contains apartments sold within the years 2014-2018 in the Östermalm / Djurgården district in Stockholm, Sweden. The resulting models returned an R² value of approximately 0.26 and a Mean Absolute Error of approximately 0.06. While the models were not accurate at predicting the premium, information could still be extracted from them. In conclusion, a high number of views and a publication made in April provide the best conditions for an advertisement to reach a high selling premium. The seller should try to keep the number of days since publication lower than 15.5 days and avoid publishing on a Tuesday.
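A minimal sketch of the comparison described above, assuming a hypothetical listings file and feature set rather than the authors' actual data pipeline:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error

# Hypothetical listing data: attributes of each advertisement plus the selling premium.
df = pd.read_csv("listings.csv")
X = df[["views", "days_since_publication", "publication_month", "publication_weekday"]]
y = df["premium"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name}: R2={r2_score(y_te, pred):.2f}, MAE={mean_absolute_error(y_te, pred):.3f}")
```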
25

Comparison of Machine Learning Algorithms on Identifying Autism Spectrum Disorder

Aravapalli, Naga Sai Gayathri, Palegar, Manoj Kumar January 2023 (has links)
Background: Autism Spectrum Disorder (ASD) is a complex neurodevelopmental disorder that affects social communication, behavior, and cognitive development. Patients with autism face a variety of difficulties, such as sensory impairments, attention issues, learning disabilities, mental health issues like anxiety and depression, as well as motor and learning issues. The World Health Organization (WHO) estimates that one in 100 children has ASD. Although ASD cannot be completely treated, early identification of its symptoms might lessen its impact, and early identification can significantly improve the outcome of interventions and therapies. It is therefore important to identify the disorder early. Machine learning algorithms can help in predicting ASD. In this thesis, Support Vector Machine (SVM) and Random Forest (RF) are the algorithms used to predict ASD. Objectives: The main objective of this thesis is to build and train models using machine learning (ML) algorithms, both with default parameters and with hyperparameter tuning, and to determine the most accurate model for predicting whether a person is suffering from ASD, based on a comparison of the two experiments. Methods: Experimentation is the method chosen to answer the research questions, as it helped in finding the most accurate model for predicting ASD. Experimentation was preceded by data preparation, including splitting of the data and applying feature selection to the dataset. In two experiments, the models were first trained with default parameters and then with hyperparameter tuning, and performance metrics were recorded in each case. Based on the comparison, the most accurate model was applied to predict ASD. Results: In this thesis, we chose two algorithms, SVM and RF, to train the models. Upon experimentation and training of the models with hyperparameter tuning, SVM obtained the highest scores, with an accuracy of 96% and an F1 score of 97% on the test data, outperforming the RF model in predicting ASD. Conclusions: The models were trained using the two ML algorithms, SVM and RF, in two experiments: in experiment 1 the models were trained using default parameters, and in experiment 2 they were trained using hyperparameter tuning; accuracy and F1 scores were obtained for the test data in both. By comparing the performance metrics, we concluded that SVM is the most accurate algorithm for predicting ASD.
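As a hedged illustration of the two-experiment workflow (default parameters versus hyperparameter tuning), not the thesis's exact code, a scikit-learn sketch might look as follows; the dataset path, binary 0/1 label encoding and parameter grids are assumptions, and the feature-selection step mentioned in the abstract is omitted for brevity.

```python
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical ASD screening dataset with a binary 0/1 "ASD" label.
df = pd.read_csv("asd_screening.csv")
X, y = df.drop(columns=["ASD"]), df["ASD"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

candidates = {
    "SVM": (Pipeline([("scale", StandardScaler()), ("clf", SVC())]),
            {"clf__C": [0.1, 1, 10], "clf__kernel": ["rbf", "linear"]}),
    "RF": (RandomForestClassifier(random_state=1),
           {"n_estimators": [100, 300], "max_depth": [None, 10]}),
}

def report(label, model):
    pred = model.predict(X_te)
    print(f"{label}: accuracy={accuracy_score(y_te, pred):.2f}, f1={f1_score(y_te, pred):.2f}")

for name, (model, grid) in candidates.items():
    # Experiment 1: default parameters.
    report(f"{name} (default)", model.fit(X_tr, y_tr))
    # Experiment 2: hyperparameter tuning via cross-validated grid search.
    search = GridSearchCV(model, grid, cv=5, scoring="f1").fit(X_tr, y_tr)
    report(f"{name} (tuned)", search)
    print("  best params:", search.best_params_)
```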
26

Konzeptentwicklung für das Qualitätsmanagement und der vorausschauenden Instandhaltung im Bereich der Innenhochdruck-Umformung (IHU): SFU 2023

Reuter, Thomas, Massalsky, Kristin, Burkhardt, Thomas 06 March 2024 (has links)
Series manufacturers in the field of hydroforming (Innenhochdruck-Umformung, IHU) are under strong competitive pressure from alternative conventional manufacturing processes and their cost criteria. Changing production requirements in a globalized market environment demand flexible action at the highest quality and low cost. Cost savings can be achieved by reducing warehouse and work-in-progress stocks. Malfunction-related downtimes of IHU systems must be kept to a minimum in order to meet the agreed delivery dates on time and avoid contractual penalties. The required productivity and the targeted quality level can only be maintained through adapted maintenance strategies, which is why a concept for predictive maintenance with integrated quality management was developed specifically for the IHU domain. Dynamic process and maintenance adaptations are a central component of this development work.
27

Concept development for quality management and predictive maintenance in the area of hydroforming (IHU): SFU 2023

Reuter, Thomas, Massalsky, Kristin, Burkhardt, Thomas 06 March 2024 (has links)
Series manufacturers in the field of hydroforming face intense competition from alternative conventional manufacturing methods and their cost criteria. Changing production requirements in the globalized market environment require flexible action at the highest quality and low cost. Cost savings can be achieved through reductions in warehouse and work-in-progress stocks. Malfunction-related downtimes in hydroforming systems must be reduced to a minimum in order to meet the agreed delivery dates on time and avoid contractual penalties. The required productivity and the desired quality level can only be maintained through adapted maintenance strategies, leading to the development of a concept for predictive maintenance integrated with quality management specifically for the IHU domain. Dynamic process and maintenance adaptations are a central component of this development effort.
28

Using hydrological models and digital soil mapping for the assessment and management of catchments: A case study of the Nyangores and Ruiru catchments in Kenya (East Africa)

Kamamia, Ann Wahu 18 July 2023 (has links)
Human activities on land have a direct and cumulative impact on water and other natural resources within a catchment. This land-use change can have hydrological consequences on the local and regional scales. Sound catchment assessment is not only critical to understanding processes and functions but also important in identifying priority management areas. The overarching goal of this doctoral thesis was to design a methodological framework for catchment assessment (dependent upon data availability) and propose practical catchment management strategies for sustainable water resources management. The Nyangores and Ruiru reservoir catchments located in Kenya, East Africa were used as case studies. A properly calibrated Soil and Water Assessment Tool (SWAT) hydrologic model coupled with a generic land-use optimization tool (Constrained Multi-Objective Optimization of Land-use Allocation-CoMOLA) was applied to identify and quantify functional trade-offs between environmental sustainability and food production in the ‘data-available’ Nyangores catchment. This was determined using a four-dimension objective function defined as (i) minimizing sediment load, (ii) maximizing stream low flow and (iii and iv) maximizing the crop yields of maize and soybeans, respectively. Additionally, three different optimization scenarios, represented as i.) agroforestry (Scenario 1), ii.) agroforestry + conservation agriculture (Scenario 2) and iii.) conservation agriculture (Scenario 3), were compared. For the data-scarce Ruiru reservoir catchment, alternative methods using digital soil mapping of soil erosion proxies (aggregate stability using Mean Weight Diameter) and spatial-temporal soil loss analysis using empirical models (the Revised Universal Soil Loss Equation-RUSLE) were used. The lack of adequate data necessitated a data-collection phase which implemented the conditional Latin Hypercube Sampling. This sampling technique reduced the need for intensive soil sampling while still capturing spatial variability. The results revealed that for the Nyangores catchment, adoption of both agroforestry and conservation agriculture (Scenario 2) led to the smallest trade-off amongst the different objectives i.e. a 3.6% change in forests combined with 35% change in conservation agriculture resulted in the largest reduction in sediment loads (78%), increased low flow (+14%) and only slightly decreased crop yields (3.8% for both maize and soybeans). Therefore, the advanced use of hydrologic models with optimization tools allows for the simultaneous assessment of different outputs/objectives and is ideal for areas with adequate data to properly calibrate the model. For the Ruiru reservoir catchment, digital soil mapping (DSM) of aggregate stability revealed that susceptibility to erosion exists for cropland (food crops), tea and roadsides, which are mainly located in the eastern part of the catchment, as well as deforested areas on the western side. This validated that with limited soil samples and the use of computing power, machine learning and freely available covariates, DSM can effectively be applied in data-scarce areas. Moreover, uncertainty in the predictions can be incorporated using prediction intervals. The spatial-temporal analysis exhibited that bare land (which has the lowest areal proportion) was the largest contributor to erosion. Two peak soil loss periods corresponding to the two rainy periods of March–May and October–December were identified. 
Thus, yearly soil erosion risk maps misrepresent the true dimensions of soil loss, with averages disguising areas of low and high potential. Also, a small portion of the catchment can be responsible for a large proportion of the total erosion. For both catchments, agroforestry (combining the use of trees and conservation farming) is the most feasible catchment management strategy (CMS) for solving the major water quantity and quality problems. Finally, the key to thriving catchments aiming at both sustainability and resilience is urgent collaborative action by all stakeholders. The necessary stakeholders in both the Nyangores and Ruiru reservoir catchments must be involved in catchment assessment in order to identify the catchment problems, mitigation strategies, roles and responsibilities, while keeping in mind that some risks need to be shared and negotiated, but so will the benefits.
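For readers unfamiliar with the empirical model named above, RUSLE estimates average annual soil loss per cell as the product A = R · K · LS · C · P. A minimal, illustrative sketch with made-up factor grids (not the thesis's actual data or workflow) follows:

```python
import numpy as np

# RUSLE: A = R * K * LS * C * P
#   A  soil loss (t ha^-1 yr^-1), R rainfall erosivity, K soil erodibility,
#   LS slope length/steepness factor, C cover management, P support practice.
shape = (4, 4)                                   # toy 4x4 raster for illustration
rng = np.random.default_rng(0)
R  = rng.uniform(200, 600, shape)                # all factor grids are hypothetical
K  = rng.uniform(0.05, 0.4, shape)
LS = rng.uniform(0.5, 8.0, shape)
C  = rng.uniform(0.01, 0.5, shape)
P  = rng.uniform(0.5, 1.0, shape)

A = R * K * LS * C * P                           # cell-wise soil loss estimate
print("mean soil loss:", round(A.mean(), 2))
worst_two = np.sort(A, axis=None)[-2:]           # the two highest-loss cells
print("share of total loss from the two worst cells:",
      round(worst_two.sum() / A.sum(), 2))       # a few hotspots can dominate
```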
29

Learning Algorithms Using Chance-Constrained Programs

Jagarlapudi, Saketha Nath 07 1900 (has links)
This thesis explores Chance-Constrained Programming (CCP) in the context of learning. It is shown that chance-constrained approaches lead to improved algorithms for three important learning problems: classification with specified error rates, large dataset classification and Ordinal Regression (OR). Using moments of the training data, the CCPs are posed as Second Order Cone Programs (SOCPs). Novel iterative algorithms for solving the resulting SOCPs are also derived. Borrowing ideas from robust optimization theory, the proposed formulations are made robust to moment estimation errors. A maximum margin classifier with specified false positive and false negative rates is derived. The key idea is to employ a chance constraint for each class which implies that the actual misclassification rates do not exceed the specified rates. The formulation is applied to the case of biased classification. The problems of large dataset classification and ordinal regression are addressed by deriving formulations which employ chance constraints for clusters in the training data rather than constraints for each data point. Since the number of clusters can be substantially smaller than the number of data points, the size of the resulting formulation and the number of inequalities are very small. Hence the formulations scale well to large datasets. The scalable classification and OR formulations are extended to feature spaces, and the kernelized duals turn out to be instances of SOCPs with a single cone constraint. Exploiting this special structure, fast iterative solvers which outperform generic SOCP solvers are proposed. Compared to state-of-the-art learners, the proposed algorithms achieve a speed-up as high as 10,000 times when the specialized SOCP solvers are employed. The proposed formulations involve second order moments of the data and hence are susceptible to moment estimation errors. A generic way of making the formulations robust to such estimation errors is illustrated. Two novel confidence sets for moments are derived, and it is shown that when either of the confidence sets is employed, the robust formulations also yield SOCPs.
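To make the connection between chance constraints and SOCPs concrete, here is the standard moment-based reformulation via the multivariate Chebyshev bound (textbook material, not necessarily the thesis's exact formulation): a constraint that a random point x with mean μ and covariance Σ be classified correctly with probability at least η becomes a second-order cone constraint.

```latex
% Chance constraint on the classifier w^T x + b with confidence level \eta:
\Pr\bigl( y\,(w^{\top} x + b) \ge 1 \bigr) \;\ge\; \eta,
\qquad x \sim (\mu, \Sigma).

% Distribution-free (Chebyshev) reformulation as a second-order cone constraint:
y\,(w^{\top}\mu + b) \;\ge\; 1 + \kappa \,\bigl\lVert \Sigma^{1/2} w \bigr\rVert_2,
\qquad \kappa = \sqrt{\tfrac{\eta}{1-\eta}}.
```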
30

運用記憶體內運算於智慧型健保院所異常查核之研究 / A Research into In-Memory Computing Techniques for Intelligent Check of Health-Insurance Fraud

湯家哲, Tang, Jia Jhe Unknown Date (has links)
The financial condition of the National Health Insurance (NHI) has been poor in recent years. The income statement for 2009 indicated that the National Health Insurance Administration (NHIA) was in deficit by NTD 58.2 billion. According to NHIA data, contracted medical institutions in Taiwan have violated the NHI laws 13,722 times to date, and among all serious violations, fraud is the most common. In order to find illegal medical institutions, the NHIA draws random samples by computer; once the data is collected, NHIA investigators carry out the review manually. However, this way of obtaining samples cannot effectively capture the institutions that violate the rules, so the review has limited effect. Benford's Law, also called the First-Digit Law, states that smaller leading digits appear more frequently, while larger digits occur less frequently. Benford's Law has been applied to accounting, finance, auditing and economics. Yang (2012) applied Benford's Law indicators to Taiwan's NHI data and combined them with machine learning algorithms for fraud detection. Zaharia et al. (2012) proposed a fault-tolerant in-memory cluster computing framework, Apache Spark; with the same computing nodes and resources, its processing can be more than 20 times faster than Hadoop MapReduce. In order to solve the problem of ineffective medical claims review, this study applied Benford's Law: NHI data published by the National Health Research Institutes was used to compute Benford's Law variables and practical variables, and support vector machines and logistic regression were then used to construct the anomaly check model. Because the NHI data volume is large, Apache Spark was used as the computing environment to reduce processing time, with Hadoop MapReduce adopted as a benchmark for comparing computational efficiency. The results show that the Spark program written for this study runs about twice as fast as the MapReduce version. For the classification models, both the support vector machine and logistic regression achieved sensitivity above 80% on inpatient data; on outpatient data the accuracy of both models was lower than on inpatient data, but logistic regression still retained reasonable performance, with 75% sensitivity and 73% overall accuracy. This study used Apache Spark to reduce the computation time needed to process large volumes of NHI data. Furthermore, the intelligent anomaly check model constructed here can indeed identify medical institutions that violate their contracts, and the institutions it flags as potentially committing fraud or abuse of the NHI can be passed on for manual investigation, ultimately improving the effectiveness of NHI claim reviews.
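To illustrate the Benford's Law features mentioned above (a generic sketch, not the thesis's Spark implementation; the claim amounts are made up), one can compare the observed first-digit frequencies of an institution's claim amounts against the Benford distribution and use the deviation as a screening feature:

```python
import numpy as np

def benford_features(amounts):
    """Return observed first-digit frequencies and their deviation from Benford's Law."""
    amounts = np.asarray([a for a in amounts if a > 0], dtype=float)
    first_digits = np.array([int(str(a).lstrip("0.")[0]) for a in amounts])
    observed = np.array([(first_digits == d).mean() for d in range(1, 10)])
    expected = np.log10(1 + 1 / np.arange(1, 10))        # Benford first-digit frequencies
    mad = np.abs(observed - expected).mean()             # mean absolute deviation
    return observed, expected, mad

# Hypothetical claim amounts for one medical institution.
claims = [1200, 1350, 980, 1900, 2100, 1150, 3020, 1011, 1750, 1420]
obs, exp, mad = benford_features(claims)
print("observed:", obs.round(3))
print("expected:", exp.round(3))
print("mean absolute deviation (a possible screening feature):", round(mad, 3))
```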
