Global ETD Search

1221	Machine Learning in Logistics: Machine Learning Algorithms : Data Preprocessing and Machine Learning Algorithms Andersson, Viktor January 2017 Data Ductus is a Swedish IT consulting company whose customer base ranges from small startups to large corporations. The company has grown steadily since the 1980s and has established offices in both Sweden and the US. With the help of machine learning, this project presents a possible solution to the errors caused by the human factor in the logistics business. A way of preprocessing data before applying it to a machine learning algorithm is presented, along with a couple of algorithms to use. Machine Learning Algorithms Data Preprocessing Computer Systems
1222	A generic framework for life simulation and learning multi-agent systems with the ability to solve complex problems in multiple domains Doukas, Gregory 09 December 2013 M.Sc. (Computer Science) / This research study investigates multi-agent systems (MASs), artificial life concepts and machine learning, amongst other things, in answering the key research question: “How can a generic multi-agent system integrate with machine learning through artificial life principles?” In answering this question, this dissertation illustrates the design and development of a generic multi-agent, life simulation and learning software framework. This framework simplifies and enables the realisation of MASs in solving complex problems in multiple domains. Finally, this research presents a prototype solution as a proof of concept of the framework’s strengths and weaknesses. The research study illustrates the design of MASs utilising sound design principles, patterns and methodologies. Furthermore, this research explores the requirements for creating and integrating MASs with other technologies, as well as the possible pitfalls in creating such large-scale systems. In addressing the necessity of learning, several machine learning techniques are examined and reinforcement learning is identified as an ideal candidate for the proposed framework. In addition, by understanding the overall machine learning process, the proposed framework integrates machine learning as three separate processes: data extraction, learning and inference. Lastly, the literature study focuses on artificial life, specifically its use in MASs, and defines what constitutes an intelligent system. This research depicts artificial life as a plausible natural integrator between MAS and machine learning technologies. The proposed framework presented in this dissertation consists of five core agent modules that can be extended, depending on the problem domain requirements. 
The framework itself is self-contained and independent of any concrete implementation. A multi-agent antivirus system is presented as the prototype implementation of the proposed framework. A quantitative and qualitative analysis was conducted, identifying the results of the prototype and generic framework while highlighting strengths and weaknesses. The contribution of this research is found partly in the proposed generic framework as a means of augmenting mechanisms for MAS design and development by means of artificial life and machine learning integration. In a broader context, this research serves as a foundation towards creating advanced MAS frameworks, leading to numerous interesting and influential agent-oriented applications. Multiagent systems Machine learning Artificial life - Simulation methods
1223	The application of quantitative structure activity relationship models to the method development of countercurrent chromatography Marsden-Jones, Siân Catherine January 2016 A fundamental challenge for liquid-liquid separation techniques such as countercurrent chromatography (CCC) and centrifugal partition chromatography (CPC) is the swift, efficient selection of the two-phase solvent system containing more than two solvents, for the purification of pharmaceuticals and other molecules. A purely computational model that could predict the optimal solvent systems for separation using just molecular structure would be ideal for this task. The experimental value being predicted is the partition coefficient (Kd), which is the concentration of the compound in one phase divided by the concentration in the other. Using this approach, Quantitative Structure Activity Relationship (QSAR) models have been developed to predict the partitioning of compounds in two-phase systems from the molecular structure of the compound using molecular descriptors. A Kd value in the range of 0.5 to 2 will give optimal separation. Molecular descriptors are varied; examples include logP values, hydrogen bond donor values and the number of oxygen atoms. This work describes how the QSAR models were developed and tested. A dataset of experimental logKd values for 54 compounds in six different combinations of four solvents (heptane, ethyl acetate, methanol and water) was used to train the QSAR models. A set of 196 possible molecular descriptors was generated for the 54 compounds and a partial least squares regression was used to identify which of these were significant in the relationship between logKd and molecular structure. The resulting models were used to predict the logKd values of four test compounds that had not been used to build the QSAR models. 
When these predictions were compared to the experimental logKd values, the root mean squared error for four of the six models was less than 0.5 and less than 0.7 for the remaining two. These models were used to successfully separate a range of structurally diverse pharmaceutical compounds by predicting the best solvent systems to carry out the separation on the CCC/CPC using nothing but their molecular structure. 543
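The quantities named in entry 1223 (the partition coefficient Kd, its 0.5 to 2 target range, and the root mean squared error used to compare predicted and experimental logKd values) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's code: the function names and the sample numbers in the usage note are hypothetical.

```python
import math


def partition_coefficient(conc_one_phase, conc_other_phase):
    """Kd: concentration in one phase divided by concentration in the other."""
    if conc_other_phase <= 0:
        raise ValueError("concentration in the other phase must be positive")
    return conc_one_phase / conc_other_phase


def in_optimal_range(kd, low=0.5, high=2.0):
    """True when Kd falls in the 0.5 to 2 range the abstract calls optimal."""
    return low <= kd <= high


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, with hypothetical concentrations of 3.0 and 2.0 in the two phases, `partition_coefficient(3.0, 2.0)` gives 1.5, which `in_optimal_range` accepts.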
1224	Determining the Effectiveness of Soil Treatment on Plant Stress using Smart-phone Cameras Panwar, Anurag 08 June 2016 Plants are vital to the health of our biosphere, and effectively sustaining their growth is fundamental to the existence of life on this planet. A critical aspect that decides the sustainability of plant growth is the quality of soil. All other things being fixed, the quality of soil greatly impacts plant stress, which in turn impacts overall health. Although plant stress manifests in many ways, one of the clearest indicators is the color of the leaves. In this thesis, we conducted an experimental study in a greenhouse for detecting plant stress caused by nutrient deficiencies in soil using smartphone cameras, coupled with image processing and machine learning algorithms. The greenhouse experiment was conducted by growing two plant species, willows (Salix pentandra) and poplars (Populus deltoides x nigra, DN34), in two treatments. These treatments included unamended tailings (collected from a lead mine tailings pond and characterized by nutrient deficiency) and biosolids-amended tailings. Biosolids are very rich in nutrients and were added to the tailings in one of the two treatments to supply plants with nutrients. Subsequently, we captured various images of plant leaves grown in both soils. Each image taken was pre-processed via filtration to remove associated noise, and was segmented into pixels to facilitate scalability of analysis. We then designed random forest-based algorithms to detect the stress of leaves as indicated by their coloring. In a dataset consisting of 34 leaves, our technique yields classifications with a high degree of precision, recall and F1 score. Our work in this thesis, while restricted to two types of plants and soils, can be generalized. 
We see applications in the emerging area of urban farming in terms of empowering citizens with tools and technologies for enhancing quality of farming practices. Participatory sensing Machine vision Image processing Machine learning Computer Sciences
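Classification metrics such as recall and the F1 score reported in entry 1224, together with precision, are computed from counts of true positives, false positives and false negatives. The sketch below shows the standard definitions; the label names and the function name are hypothetical and are not taken from the thesis.

```python
def precision_recall_f1(true_labels, predicted_labels, positive="stressed"):
    """Return (precision, recall, F1) for one class treated as positive."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.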
1225	Adversarial Deep Learning Against Intrusion Detection Classifiers Rigaki, Maria January 2017 Traditional approaches in network intrusion detection follow a signature-based approach; however, the use of anomaly detection approaches based on machine learning techniques has been studied heavily for the past twenty years. The continuous change in the way attacks are appearing, the volume of attacks, as well as the improvements in the big data analytics space, make machine learning approaches more alluring than ever. The intention of this thesis is to show that using machine learning in the intrusion detection domain should be accompanied by an evaluation of its robustness against adversaries. Several adversarial techniques have emerged lately from deep learning research, largely in the area of image classification. These techniques are based on the idea of introducing small changes in the original input data in order to make a machine learning model misclassify it. This thesis follows a big data analytics methodology and explores adversarial machine learning techniques that have emerged from the deep learning domain, against machine learning classifiers used for network intrusion detection. The study looks at several well-known classifiers and studies their performance under attack over several metrics, such as accuracy, F1-score and receiver operating characteristic. The approach used assumes no knowledge of the original classifier and examines both general and targeted misclassification. The results show that, using relatively simple methods for generating adversarial samples, it is possible to lower the detection accuracy of intrusion detection classifiers by 5% to 28%. Performance degradation is achieved using a methodology that is simpler than previous approaches and it requires only a 6.25% change between the original and the adversarial sample, making it a candidate for a practical adversarial approach. 
adversarial machine learning intrusion detection Computer Systems
1226	Improved Detection for Advanced Polymorphic Malware Fraley, James B. 01 January 2017 Malicious Software (malware) attacks across the internet are increasing at an alarming rate. Cyber-attacks have become increasingly sophisticated and targeted. These targeted attacks are aimed at compromising networks, stealing personal financial information and removing sensitive data or disrupting operations. Current malware detection approaches work well for previously known signatures. However, malware developers utilize techniques to mutate and change software properties (signatures) to avoid and evade detection. Polymorphic malware is practically undetectable with signature-based defensive technologies. Today’s effective detection rate for polymorphic malware ranges from 68.75% to 81.25%. New techniques are needed to improve malware detection rates. Improved detection of polymorphic malware can only be accomplished by extracting features beyond the signature realm. Targeted detection for polymorphic malware must rely upon extracting key features and characteristics for advanced analysis. Traditionally, malware researchers have relied on limited dimensional features such as behavior (dynamic) or source/execution code analysis (static). This study’s focus was to extract and evaluate a limited set of multidimensional topological data in order to improve detection for polymorphic malware. This study used multidimensional analysis (file properties, static and dynamic analysis) with machine learning algorithms to improve malware detection. This research demonstrated that improved polymorphic malware detection can be achieved with machine learning. This study conducted a number of experiments using a standard experimental testing protocol. This study utilized three advanced algorithms (Metabagging (MB), Instance Based k-Means (IBk) and Deep Learning Multi-Layer Perceptron) with a limited set of multidimensional data. 
The experiments delivered detection rates above 99.43%. In addition, the experiments delivered near zero false positives. The study’s approach was based on single case experimental design, a well-accepted protocol for progressive testing. The study constructed a prototype to automate feature extraction, assemble files for analysis, and analyze results through multiple clustering algorithms. The study performed an evaluation of large malware sample datasets to understand effectiveness across a wide range of malware. The study developed an integrated framework which automated feature extraction for multidimensional analysis. The feature extraction framework consisted of four modules: 1) a pre-process module that extracts and generates topological features based on static analysis of machine code and file characteristics, 2) a behavioral analysis module that extracts behavioral characteristics based on file execution (dynamic analysis), 3) an input file construction and submission module, and 4) a machine learning module that employs various advanced algorithms. As with most studies, careful attention was paid to false positive and false negative rates, which reduce overall detection accuracy and effectiveness. This study provided a novel approach to expand the malware body of knowledge and improve the detection for polymorphic malware targeting Microsoft operating systems. Advanced Detection Cluster Algorithms Machine Learning Malware Computer Sciences
1227	Computational strategies to identify, prioritize and design potential antimalarial agents from natural products Egieyeh, Samuel Ayodele January 2015 Philosophiae Doctor - PhD / Introduction: There is an exigent need to develop novel antimalarial drugs in view of the mounting disease burden and emergent resistance to the presently used drugs against the malarial parasites. A large number of natural products, especially those used in ethnomedicine for malaria, have shown varying in-vitro antiplasmodial activities. Facilitating antimalarial drug development from this wealth of natural products is an imperative and laudable mission to pursue. However, the limited resources, high cost, low prospect and the high cost of failure during preclinical and clinical studies might militate against the pursuit of this mission. Chemoinformatics techniques can simulate and predict essential molecular properties required to characterize compounds, thus eliminating the cost of equipment and reagents to conduct essential preclinical studies, especially on compounds that may fail during drug development. Therefore, applying chemoinformatics techniques to natural products with in-vitro antiplasmodial activities may facilitate identification and prioritization of these natural products with potential for novel mechanism of action, desirable pharmacokinetics and high likelihood for development into antimalarial drugs. In addition, unique structural features mined from these natural products may be templates to design new potential antimalarial compounds. Method: Four chemoinformatics techniques were applied to a collection of selected natural products with in-vitro antiplasmodial activity (NAA) and currently registered antimalarial drugs (CRAD): molecular property profiling, molecular scaffold analysis, machine learning and design of a virtual compound library. 
Molecular property profiling included computation of key molecular descriptors, physicochemical properties, molecular similarity analysis, estimation of drug-likeness, in-silico pharmacokinetic profiling and exploration of structure-activity landscape. Analysis of variance was used to assess statistically significant differences in these parameters between NAA and CRAD. Next, molecular scaffold exploration and diversity analyses were performed on three datasets (NAA, CRAD and malarial data from Medicines for Malaria Venture (MMV)) using scaffold counts and cumulative scaffold frequency plots. Scaffolds from the NAA were compared to those from CRAD and MMV. A Scaffold Tree was also generated for all the datasets. Thirdly, machine learning approaches were used to build four regression and four classifier models from bioactivity data of NAA using molecular descriptors and molecular fingerprints. Models were built and refined by leave-one-out cross-validation and evaluated with an independent test dataset. Applicability domain (AD), which defines the limit of reliable predictability by the models, was estimated from the training dataset and validated with the test dataset. Possible chemical features associated with reported antimalarial activities of the compounds were also extracted. Lastly, virtual compound libraries were generated with the unique molecular scaffolds identified from the NAA. The virtual compounds generated were characterized by evaluating selected molecular descriptors, toxicity profile, structural diversity from CRAD and prediction of antiplasmodial activity. Results: From the molecular property profiling, a total of 1040 natural products were selected and a total of 13 molecular descriptors were analyzed. Significant differences were observed between the natural products with in-vitro antiplasmodial activities (NAA) and currently registered antimalarial drugs (CRAD) for at least 11 of the molecular descriptors. 
Molecular similarity and chemical space analysis identified NAA that were structurally diverse from CRAD. Over 50% of NAA with desirable drug-like properties were identified. However, nearly 70% of NAA were identified as potentially "promiscuous" compounds. Structure-activity landscape analysis highlighted compound pairs that formed "activity cliffs". In all, prioritization strategies for the natural products with in-vitro antiplasmodial activities were proposed. The scaffold exploration and analysis results revealed that CRAD exhibited greater scaffold diversity, followed by NAA and MMV respectively. Unique scaffolds that were not contained in any other compounds in the CRAD datasets were identified in NAA. The Scaffold Tree showed the preponderance of ring systems in NAA and identified virtual scaffolds, which may be potential bioactive compounds or elucidate possible synthetic routes for the NAA. From the machine learning study, the regression and classifier models that were most suitable for NAA were identified as model tree M5P (correlation coefficient = 0.84) and Sequential Minimal Optimization (accuracy = 73.46%) respectively. The test dataset fitted into the applicability domain (AD) defined by the training dataset. The “amine” group was observed to be essential for antimalarial activity in both the NAA and MMV datasets, but hydroxyl and carbonyl groups may also be relevant in the NAA dataset. The results of the characterization of the virtual compound library showed significant differences (p value < 0.05) between the virtual compound library and currently registered antimalarial drugs in some molecular descriptors (molecular weight, log partition coefficient, hydrogen bond donors and acceptors, polar surface area, shape index, chiral centres, and synthetic feasibility). Tumorigenic and mutagenic substructures were not observed in a large proportion (> 90%) of the virtual compound library. 
The virtual compound libraries showed sufficient diversity in structures and the majority were structurally diverse from currently registered antimalarial drugs. Finally, up to 70% of the virtual compounds were predicted as active antiplasmodial agents. Conclusions: Molecular property profiling of natural products with in-vitro antiplasmodial activities (NAA) and currently registered antimalarial drugs (CRAD) produced a wealth of information that may guide decisions and facilitate antimalarial drug development from natural products, and led to a prioritized list of natural products with in-vitro antiplasmodial activities. Molecular scaffold analysis identified unique scaffolds and virtual scaffolds from NAA that possess desirable drug-like properties, which make them ideal starting points for molecular antimalarial drug design. The machine learning study built, evaluated and identified amply accurate regression and classifier models that were used for virtual screening of natural compound libraries to mine possible antimalarial compounds without the expense of bioactivity assays. Finally, a good number of the virtual compounds generated were structurally diverse from currently registered antimalarial drugs and potentially active antiplasmodial agents. Filtering and optimization may lead to a collection of virtual compounds with unique chemotypes that may be synthesized and added to the screening deck against Plasmodium. Natural products Machine learning Scaffold tree Antimalarials Chemoinformatics
1228	Supervised and unsupervised learning for plant and crop row detection in precision agriculture Varshney, Varun January 1900 Master of Science / Department of Computing and Information Sciences / William H. Hsu / The goal of this research is to present a comparison between different clustering and segmentation techniques, both supervised and unsupervised, to detect plant and crop rows. Aerial images, taken by an Unmanned Aerial Vehicle (UAV), of a corn field at various stages of growth were acquired in RGB format through the Agronomy Department at Kansas State University. Several segmentation and clustering approaches were applied to these images, namely K-Means clustering, the Excess Green (ExG) Index algorithm, Support Vector Machines (SVM), Gaussian Mixture Models (GMM), and a deep learning approach based on Fully Convolutional Networks (FCN), to detect the plants present in the images. A Hough Transform (HT) approach was used to detect the orientation of the crop rows and rotate the images so that the rows became parallel to the x-axis. The result of applying different segmentation methods to the images was then used in estimating the location of crop rows in the images by using a template creation method based on Green Pixel Accumulation (GPA) that calculates the intensity profile of green pixels present in the images. Connected component analysis was then applied to find the centroids of the detected plants. Each centroid was associated with a crop row, and centroids lying outside the row templates were discarded as being weeds. A comparison between the various segmentation algorithms based on the Dice similarity index and average run-times is presented at the end of the work. precision agriculture deep learning machine learning supervised unsupervised
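Two of the techniques named in entry 1228 have compact definitions: the Excess Green (ExG) index, commonly computed as 2g - r - b on chromatic coordinates normalized by R + G + B, and the Dice similarity index, 2|A ∩ B| / (|A| + |B|), used to compare a segmentation mask with a reference mask. The sketch below illustrates both in plain Python. It is not the thesis's implementation: the function names are hypothetical and the 0.1 threshold is an assumed value chosen only for illustration.

```python
def excess_green(r, g, b):
    """ExG = 2g - r - b on chromatic coordinates normalized by R + G + B."""
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: no color information
    return (2 * g - r - b) / total


def segment_plants(pixels, threshold=0.1):
    """Boolean mask of pixels whose ExG exceeds the (assumed) threshold.

    `pixels` is a list of rows, each row a list of (R, G, B) tuples.
    """
    return [[excess_green(*px) > threshold for px in row] for row in pixels]


def dice_index(mask_a, mask_b):
    """Dice similarity between two boolean masks of the same shape."""
    flat_a = [v for row in mask_a for v in row]
    flat_b = [v for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    size = sum(flat_a) + sum(flat_b)
    return 2 * intersection / size if size else 1.0  # two empty masks agree
```

A pure green pixel has the maximum ExG of 2, while a gray pixel scores about 0, so thresholding separates vegetation from soil-like background in this simplified setting.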
1229	Probabilistic Models of Topics and Social Events Wei, Wei 01 December 2016 Structured probabilistic inference has been shown to be useful in modeling complex latent structures of data. One successful way in which this technique has been applied is in the discovery of latent topical structures of text data, which is usually referred to as topic modeling. With the recent popularity of mobile devices and social networking, we can now easily acquire text data attached to meta information, such as geo-spatial coordinates and time stamps. This metadata can provide rich and accurate information that is helpful in answering many research questions related to spatial and temporal reasoning. However, such data must be treated differently from text data. For example, spatial data is usually organized in terms of a two-dimensional region, while temporal information can exhibit periodicities. While some work exists in the topic modeling community that utilizes some of the meta information, these models largely focus on incorporating metadata into text analysis, rather than providing models that make full use of the joint distribution of meta information and text. In this thesis, I propose the event detection problem, which is a multidimensional latent clustering problem on spatial, temporal and topical data. I start with a simple parametric model to discover independent events using geo-tagged Twitter data. The model is then improved in two directions. First, I augmented the model using the Recurrent Chinese Restaurant Process (RCRP) to discover events that are dynamic in nature. Second, I studied a model that can detect events using data from multiple media sources. I studied the characteristics of different media in terms of reported event times and linguistic patterns. The approaches studied in this thesis are largely based on Bayesian nonparametric methods to deal with streaming data and an unpredictable number of clusters. 
The research will not only serve the event detection problem itself but also shed light on the more general structured clustering problem in spatial, temporal and textual data. Machine Learning Topic Modeling Graphical Models Non-parametric Bayesian Text Mining
1230	Detecting land-cover change using Modis time-series data Kleynhans, Waldo 15 May 2012 Anthropogenic changes to forests, agriculture and hydrology are being driven by a need to provide water, food and shelter to more than six billion people. Unfortunately, these changes have a major impact on hydrology, biodiversity, climate, socio-economic stability and food security. The most pervasive form of land-cover change in South Africa is human settlement expansion. In many cases, new human settlements and settlement expansion are informal and occur in areas that are typically covered by natural vegetation. Settlements are mapped infrequently and on an ad-hoc basis in South Africa, which makes information on when and where new settlements form very difficult to obtain. Determining where and when new informal settlements occur is beneficial from not only an ecological but also a social development standpoint. The objective of this thesis is to make use of coarse resolution satellite data to infer the location of new settlement developments in an automated manner by making use of machine learning methods. The specific sensor that is considered in this thesis is the MODIS sensor on board the Terra and Aqua satellites. By using samples taken at regular intervals (8 days), a hyper-temporal time-series is constructed and consequently used to detect new human settlement formations in South Africa. Two change detection methods are proposed in this thesis to achieve the goal of automated new settlement development detection using this high-temporal coarse resolution satellite time-series data. / Thesis (PhD(Eng))--University of Pretoria, 2012. / Electrical, Electronic and Computer Engineering / unrestricted Modis sensor Human settlement expansion Machine learning methods UCTD
