Global ETD Search

1201	Identifying student stuck states in programming assignments using machine learning Lindell, Johan January 2014 Intelligent tutors are becoming more popular with the increased use of computers and handheld devices in the education sphere. An area of research is investigating how machine learning can be used to improve the precision and feedback of the tutor. This thesis compares machine learning clustering algorithms with various distance functions in an attempt to cluster together code snapshots of students solving a programming task. It investigates whether a general, non-problem-specific implementation of a distance function can be used to identify when a student is stuck solving an assignment. The machine learning algorithms compared are k-medoids, a randomly initialized algorithm that produces a pre-defined number of clusters, and affinity propagation, a two-phase algorithm with dynamic cluster sizes. The distance functions tried are based on the Bag of Words approach, lower-level API calls, and a problem-specific distance function. This thesis could not find a good algorithm to achieve the sought goal, and lists a number of possible error sources linked to the data, preprocessing and algorithm. The methodology is promising but requires a controlled environment at every level to ensure that data quality does not detract from the analysis in later stages. education machine learning clustering intelligent tutor Computer Engineering Datorteknik
1202	Paving the Way for Self-driving Cars - Software Testing for Safety-critical Systems Based on Machine Learning : A Systematic Mapping Study and a Survey Gao, Shenjian, Tan, Yanwen January 2017 Context: With the development of artificial intelligence, autonomous vehicles are becoming more and more feasible, and the safety of Automated Driving (AD) systems must be assured. This creates a need to analyze the feasibility of verification and validation (V&V) approaches when testing safety-critical systems that contain machine learning (ML) elements. Many studies have been published in the V&V research area on safety-critical components. However, there are still blind spots in the research regarding which test methods can be used to test components with deep learning elements in AD systems. Therefore, research should focus on the relation between test methods and safety-critical components, and on finding more feasible V&V testing methods for AD systems with a deep learning structure. Objectives: The main objective of this thesis is to understand the challenges and solution proposals related to V&V of safety-critical systems that rely on machine learning, and to provide recommendations for future V&V of AD based on deep learning, both for research and practice. Methods: We performed a Systematic Literature Review (SLR) through a snowballing method, based on the guidelines from Wohlin [1], to identify research on the development of V&V methods for machine learning. A web-based survey was used to complement the results of the literature review and to evaluate the V&V challenges and methods for machine learning systems. We identified 64 peer-reviewed papers and analysed the methods and challenges of V&V for testing machine learning components. We conducted an industrial survey that was answered by 63 subjects. We analyzed the survey results with the help of descriptive statistics and Chi-squared tests.
Results: Through the SLR we identified two peaks for research on V&V of machine learning. Early research focused on the aerospace field, and in recent years the research has been more active in other fields like automotive and robotics. 21 challenges in V&V of safety-critical systems are described, and 32 solution proposals addressing those challenges are identified. To relate challenges to methods, we classified them into seven types of challenges and five types of solution proposals. The classification and mapping of challenges and solution methods are included in the survey questionnaire. From the survey, it was observed that some solution proposals which have attracted much research are not considered particularly promising by practitioners. On the other hand, some new solution methods, like simulated test cases, are extremely promising for supporting V&V of safety-critical systems. Six suggestions are provided to both researchers and practitioners. Conclusion: Our study presents a classification of challenges and solution methods for V&V of safety-critical ML-based systems. We also provide a mapping to help practitioners understand which kinds of challenges the respective solution methods address. Based on our findings, we provide suggestions to both researchers and practitioners. Our analysis concentrates on the types of challenges and solution proposals for AD systems that use deep learning, which should help in designing processes for V&V of safety-critical ML-based systems in the future. Safety-critical system Machine learning Software testing Software Engineering Programvaruteknik
1203	Splicing Forgery Detection and the Impact of Image Resolution Devagiri, Vishnu Manasa January 2017 Context: The use of digital images has risen in recent years. Digital images are used in many areas, such as medicine and warfare. As images are used to make many important decisions, it is necessary to know whether they are clean or forged. This thesis considers the area of splicing forgery and analyzes the impact of low-resolution images on the considered algorithms. Objectives: Through this thesis, we try to improve the detection rate of splicing forgery detection. We also examine how the examined splicing forgery detection algorithm works on low-resolution images and with the considered classification algorithms (classifiers). Methods: The research methods used are implementation and experimentation. Implementation was used to answer the first research question, i.e., how to improve the detection rate in splicing forgery. Experimentation was used to answer the second research question. The results of the experiment were analyzed statistically to find out how the examined algorithm works on different image resolutions and with the considered classifiers. Results: A one-tailed Wilcoxon signed-rank test was conducted to compare the performance of the algorithms. The T+ value obtained was less than To, so the null hypothesis was rejected and the alternative hypothesis, which states that Algorithm 2 (our enhanced version of the algorithm) performs better than Algorithm 1 (the original algorithm), was accepted. Experiments were conducted and the accuracy of the algorithms in different cases was noted; ROC curves were plotted to obtain the AUC. Accuracy and AUC were used to determine the performance of the algorithms.
Conclusions: After statistical analysis of the results, we concluded that Algorithm 2 performs better than Algorithm 1 in detecting forged images. It was also observed that Algorithm 1 improves its performance on low-resolution images when trained on original images and tested on images of different resolutions, whereas Algorithm 2 improves its performance when trained and tested on images of the same resolution. There was not much variance in the performance of either algorithm on images of different resolutions. As for the classifiers, Algorithm 1 improves its performance with linear SVM, whereas Algorithm 2 improves its performance with the simple tree classifier. Splicing Forgery Machine Learning Image Processing Computer Sciences Datavetenskap (datalogi)
1204	AI Approaches for Classification and Attribute Extraction in Text Magnusson, Ludvig, Rovala, Johan January 2017 As the amount of data online grows, the urge to use this data for different applications grows as well. Machine learning can be used with the intent to reconstruct and validate the data you are interested in. Although the problem is very domain specific, this report will attempt to shed some light on what we call strategies for classification, which in broad terms means a set of steps in a process where the end goal is to have classified some part of the original data. As a result, we hope to introduce clarity into the classification process in detail as well as from a broader perspective. The report will investigate two classification objectives, one of which is dependent on many variables found in the input data and one that is more literal and only dependent on one or two variables. Specifically, the data we will classify are sales-objects. Each sales-object has a text describing the object and a related image. We will attempt to place these sales-objects into the correct product category. We will also try to derive the object's year of creation and its dimensions, such as height and width. Different approaches are presented in the aforementioned strategies in order to classify such attributes. The results showed that for broader attributes such as a product category, supervised learning is indeed an appropriate approach, while the same cannot be said for narrower attributes, which instead had to rely on entity recognition. Experiments on image analytics in conjunction with supervised learning proved image analytics to be a good addition when requiring a higher precision score. text classification feature extraction machine learning scikit Software Engineering Programvaruteknik
1205	Performance Envelopes of Adaptive Ensemble Data Stream Classifiers Joe-Yen, Stefan 01 January 2017 This dissertation documents a study of the performance characteristics of algorithms designed to mitigate the effects of concept drift on online machine learning. Several supervised binary classifiers were evaluated on their performance when applied to an input data stream with a non-stationary class distribution. The selected classifiers included ensembles that combine the contributions of their member algorithms to improve overall performance. These ensembles adapt to changing class definitions, known as “concept drift,” often present in real-world situations, by adjusting the relative contributions of their members. Three stream classification algorithms and three adaptive ensemble algorithms were compared to determine the capabilities of each in terms of accuracy and throughput. For each run of the experiment, the percentage of correct classifications was measured using prequential analysis, a well-established methodology in the evaluation of streaming classifiers. Throughput was measured in classifications performed per second as timed by the CPU clock. Two main experimental variables were manipulated to investigate and compare the range of accuracy and throughput exhibited by each algorithm under various conditions. The number of attributes in the instances to be classified and the speed at which the definitions of labeled data drifted were varied across six total combinations of drift-speed and dimensionality. The implications of results are used to recommend improved methods for working with stream-based data sources. The typical approach to counteract concept drift is to update the classification models with new data. In the stream paradigm, classifiers are continuously exposed to new data that may serve as representative examples of the current situation.
However, updating the ensemble classifier in order to maintain or improve accuracy can be computationally costly and will negatively impact throughput. In a real-time system, this could lead to an unacceptable slow-down. The results of this research showed that, among several algorithms for reducing the effect of concept drift, adaptive decision trees maintained the highest accuracy without slowing down with respect to the no-drift condition. Adaptive ensemble techniques were also able to maintain reasonable accuracy in the presence of drift without much change in the throughput. However, the overall throughput of the adaptive methods is low and may be unacceptable for extremely time-sensitive applications. The performance visualization methodology utilized in this study gives a clear and intuitive visual summary that allows system designers to evaluate candidate algorithms with respect to their performance needs. Concept Drift Data Stream Machine Learning Online Classifiers Computer Sciences
1206	Approaches to Natural Language Processing Smith, Sydney 01 January 2018 This paper explores topic modeling through the example text of Alice in Wonderland. It explores both singular value decomposition (SVD) and non-negative matrix factorization (NMF) as methods for feature extraction. The paper goes on to explore methods for partially supervised implementation of topic modeling through introducing themes. A large portion of the paper also focuses on the implementation of these techniques in Python, as well as visualizations of the results, which use a combination of Python, HTML and JavaScript along with the D3 framework. The paper concludes by presenting a mixture of SVD, NMF and partially supervised NMF as a possible way to improve topic modeling. Topic Modeling Data Mining Machine Learning NMF Other Applied Mathematics
1207	Why Machine Learning Works Montanez, George D. 01 December 2017 To better understand why machine learning works, we cast learning problems as searches and characterize what makes searches successful. We prove that any search algorithm can only perform well on a narrow subset of problems, and show the effects of dependence on raising the probability of success for searches. We examine two popular ways of understanding what makes machine learning work, empirical risk minimization and compression, and show how they fit within our search framework. Leveraging the “dependence-first” view of learning, we apply this knowledge to areas of unsupervised time-series segmentation and automated hyperparameter optimization, developing new algorithms with strong empirical performance on real-world problem classes. machine learning algorithmic search famine of forte no free lunch dependence
1208	Robust Machine Learning QSPR Models for Recognizing High Performing MOFs for Pre-Combustion Carbon Capture and Using Molecular Simulation to Study Adsorption of Water and Gases in Novel MOFs Dureckova, Hana January 2018 Metal organic frameworks (MOFs) are a class of nanoporous materials composed through self-assembly of inorganic and organic structural building units (SBUs). MOFs show great promise for many applications due to their record-breaking internal surface areas and tunable pore chemistry. This thesis work focuses on gas separation applications of MOFs in the context of carbon capture and storage (CCS) technologies. CCS technologies are expected to play a key role in the mitigation of anthropogenic CO2 emissions in the near future. In the first part of the thesis, robust machine learning quantitative structure-property relationship (QSPR) models are developed to predict CO2 working capacity and CO2/H2 selectivity for pre-combustion carbon capture using the most topologically diverse database of hypothetical MOF structures constructed to date (358,400 MOFs, 1166 network topologies). The support vector regression (SVR) models are developed on a training set of 35,840 MOFs (10% of the database) and validated on the remaining 322,560 MOFs. The most accurate models for CO2 working capacities (R2 = 0.944) and CO2/H2 selectivities (R2 = 0.876) are built from a combination of six geometric descriptors and three novel y-range normalized atomic-property-weighted radial distribution function (AP-RDF) descriptors. 309 common MOFs are identified between the grand canonical Monte Carlo (GCMC) calculated and SVR-predicted top-1000 high-performing MOFs ranked according to a normalized adsorbent performance score. This work shows that SVR models can indeed account for the topological diversity exhibited by MOFs.
In the second project of this thesis, computational simulations are performed on a MOF, CALF-20, to examine the chemical and physical properties that are linked to its exceptional water-resisting ability. We predict the atomic positions in the crystal structure of the bulk phase of CALF-20, for which only a powder X-ray diffraction pattern is available, from a single crystal X-ray diffraction pattern of a metastable phase of CALF-20. Using the predicted CALF-20 structure, we simulate adsorption isotherms of CO2 and N2 under dry and humid conditions which are in excellent agreement with experiment. Snapshots of CALF-20 undergoing water sorption simulations reveal that water molecules in a given pore adsorb and desorb together due to hydrogen bonding. Binding sites and binding energies of CO2 and water in CALF-20 show that the preferential CO2 uptake at low relative humidities is driven by the stronger binding energy of CO2 in the MOF, and the sharp increase in water uptake at higher relative humidities is driven by the strong intermolecular interactions between water molecules. In the third project of this thesis, we use computational simulations to investigate the effects of residual solvent on Ni-BPM’s CH4 and N2 adsorption properties. Single crystal X-ray diffraction data show that there are two sets of positions (Set 1 and Set 2) that can be occupied by the 10 residual DMSO molecules in the Ni-BPM framework. GCMC simulations of CH4 and N2 uptake in Ni-BPM reveal that CH4 uptake is in closest agreement with experiment when the 10 DMSO molecules are placed among the two sets of positions in equal ratio (Mixed Set). Severe under-prediction and over-prediction of CH4 uptake are observed when the DMSO molecules are placed in Set 1 and Set 2 positions, respectively.
Through binding site analysis, the CH4 binding sites within the Ni-BPM framework are found to overlap with the Set 1 DMSO positions but not with the Set 2 DMSO positions, which explains the deviations in CH4 uptake observed for these cases. Binding energy calculations reveal that CH4 molecules are most stabilized when the DMSO molecules are in the Mixed Set of positions. Machine Learning Metal Organic Frameworks QSPR Carbon Capture Computational Chemistry
1209	Semantic Analysis Of Multi Meaning Words Using Machine Learning And Knowledge Representation Alirezaie, Marjan January 2011 The present thesis addresses machine learning in a domain of natural language phrases that are names of universities. It describes two approaches to this problem and a software implementation that has made it possible to evaluate them and to compare them. In general terms, the system's task is to learn to 'understand' the significance of the various components of a university name, such as the city or region where the university is located, the scientific disciplines that are studied there, or the name of a famous person which may be part of the university name. A concrete test for whether the system has acquired this understanding is when it is able to compose a plausible university name given some components that should occur in the name. In order to achieve this capability, our system learns the structure of available names of some universities in a given data set, i.e. it acquires a grammar for the microlanguage of university names. One of the challenges is that the system may encounter ambiguities due to multi-meaning words. This problem is addressed using a small ontology that is created during the training phase. Both domain knowledge and grammatical knowledge are represented using decision trees, which are an efficient method for concept learning. Besides inductive inference, their role is to partition the data set into a hierarchical structure which is used for resolving ambiguities. The present report also defines some modifications in the definitions of parameters, for example a parameter for entropy, which enable the system to deal with cognitive uncertainties. Our method for automatic syntax acquisition, ADIOS, is an unsupervised learning method. This method is described and discussed here, including a report on the outcome of the tests using our data set.
The software used in this project was implemented in C. Machine Learning Supervised Learning Unsupervised Learning Computer Sciences Datavetenskap (datalogi)
1210	Using Machine Learning Methods for Evaluating the Quality of Technical Documents Luckert, Michael, Schaefer-Kehnert, Moritz January 2016 In an increasingly networked world, the availability of high quality translations is critical for success in the face of growing international competition. Large international companies as well as medium sized companies are required to provide well translated, high quality technical documentation for their customers, not only to be successful in the market but also to meet legal regulations and to avoid lawsuits. Therefore, this thesis focuses on the evaluation of translation quality, specifically concerning technical documentation, and answers two central questions: How can the translation quality of technical documents be evaluated, given the original document is available? How can the translation quality of technical documents be evaluated, given the original document is not available? These questions are answered using state-of-the-art machine learning algorithms and translation evaluation metrics in the context of a knowledge discovery process. The evaluations are done on a sentence level and recombined on a document level by binarily classifying sentences as automated translation or professional translation. The research is based on a database containing 22,327 sentences and 32 translation evaluation attributes, which are used for optimizations of five different machine learning approaches. An optimization process consisting of 795,000 evaluations shows a prediction accuracy of up to 72.24% for the binary classification. Based on the developed sentence-based classification systems, documents are classified using recombination of the affiliated sentences, and a framework for rating document quality is introduced. The approach taken thus successfully creates a classification and evaluation system.
machine translation evaluation machine learning Computer Sciences Datavetenskap (datalogi)
