151

Improving the effectiveness and the efficiency of Knowledge Base Refinement

Carbonara, Leonardo January 1996 (has links)
Knowledge Base Refinement is an area of Machine Learning whose primary goal is the automatic detection and correction of errors in faulty expert systems' knowledge bases. A very important feature of a refinement system is the mechanism used to select the refinements to be implemented. Since there are usually different ways to fix a fault, most current Knowledge Base Refinement systems use extensive heuristics to choose one or a few alternative refinements from a set of possible corrections. This approach is justified by the intention of avoiding the computational problems inherent in generating and testing multiple refinements. On the other hand, such systems are liable to miss solutions. The opposite approach was adopted by the Knowledge Base Refinement system KRUST, which proposed many alternative corrections to refine each wrongly solved example. Although KRUST demonstrated the feasibility of this approach, the potential of multiple refinement generation could not be fully exploited: the system used a limited set of refinement operators in order to contain the number of alternative fixes generated for each fault, and hence was unable to rectify certain kinds of errors. Additionally, the time taken to produce and test a set of refined knowledge bases was considerable for any non-trivial knowledge base. This thesis presents a major revision of the KRUST system. Like its predecessor, the resulting system, STALKER, proposes many alternative refinements to correct each wrongly classified example in the training set. Two enhancements have been made: the class of errors handled by KRUST has been augmented through the introduction of inductive refinement operators, and the testing phase of Knowledge Base Refinement has been speeded up considerably by means of a technique based on a Truth Maintenance System (TMS). The resulting system is more effective than other refinement systems because it generates many alternative refinements. At the same time, STALKER is very efficient, since KRUST's computationally expensive implementation and testing of refined knowledge bases have been replaced by a TMS-based simulator.
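To make the multiple-refinement strategy concrete, here is a minimal Python sketch, assuming a toy first-match rule interpreter and a single condition-deletion operator; the rule representation, operators, and scoring are illustrative stand-ins, not KRUST/STALKER's actual components, and the TMS-based speedup is deliberately absent (every candidate is naively re-tested, which is exactly the cost STALKER's simulator removes):

```python
from itertools import combinations

def rule_fires(conditions, example):
    # A rule fires when every (attribute, predicate) condition holds.
    return all(pred(example[attr]) for attr, pred in conditions)

def classify(rules, example, default="negative"):
    # First-match rule interpreter standing in for the expert system.
    for conditions, conclusion in rules:
        if rule_fires(conditions, example):
            return conclusion
    return default

def accuracy(rules, dataset):
    return sum(classify(rules, x) == y for x, y in dataset) / len(dataset)

def alternative_refinements(rule):
    # One simple refinement operator: drop a single condition (generalisation).
    conditions, conclusion = rule
    for kept in combinations(conditions, len(conditions) - 1):
        yield (list(kept), conclusion)

def refine(rules, faulty, dataset):
    # Generate *all* candidate knowledge bases and rank them by training
    # accuracy, rather than heuristically committing to one fix up front.
    candidates = []
    for new_rule in alternative_refinements(rules[faulty]):
        kb = rules[:faulty] + [new_rule] + rules[faulty + 1:]
        candidates.append((accuracy(kb, dataset), kb))
    return sorted(candidates, key=lambda c: c[0], reverse=True)
```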
152

Connectionist variable binding architectures

Stark, Randall J. January 1993 (has links)
No description available.
153

A local model network approach to nonlinear modelling

Murray-Smith, Roderick January 1994 (has links)
This thesis describes practical learning systems able to model unknown nonlinear dynamic processes from their observed input-output behaviour. Local Model Networks use a number of simple, locally accurate models to represent a globally complex process, and provide a powerful, flexible framework for the integration of different model structures and learning algorithms. A major difficulty with Local Model Nets is the optimisation of the model structure. A novel Multi-Resolution Constructive (MRC) structure identification algorithm for local model networks is developed. The algorithm gradually adds to the model structure by searching for 'complexity' at ever decreasing scales of 'locality'. Reliable error estimates are useful during the development and use of models. New methods are described which use the local basis function structure to provide interpolated, state-dependent estimates of model accuracy. Active learning methods which automatically construct a training set for a given Local Model structure are developed, letting the training set grow in step with the model structure - the learning system 'explores' its data set looking for useful information. Local Learning methods developed in this work are explicitly linked to the local nature of the basis functions; compared to global optimisation methods they are more computationally efficient, yield more interpretable models and, because the global parameter estimation problem is often poorly conditioned, frequently improve generalisation. Important side-effects of normalisation of the basis functions are examined. A new hierarchical extension of Local Model Nets is presented: the Learning Hierarchy of Models (LHM), where local models can be sub-networks, leading to a tree-like hierarchy of softly interpolated local models. Constructive model structure identification algorithms are described, and the advantages of hierarchical 'divide-and-conquer' methods for modelling, especially in high-dimensional spaces, are discussed. The structures and algorithms are illustrated using several synthetic examples of nonlinear multivariable systems (dynamic and static), and applied to real-world examples. Two nonlinear dynamic applications are described: predicting the strip thickness in an aluminium rolling mill from observed process data, and modelling robot actuator nonlinearities from measured data. The Local Model Nets reliably constructed models which provided the best results to date on the rolling mill application.
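The basic interpolation and local-learning ideas can be sketched briefly (a minimal sketch under simplifying assumptions: one input dimension, linear local models, Gaussian validity functions; this is not the thesis's MRC structure identification algorithm):

```python
import numpy as np

def basis(x, centres, width):
    # Gaussian validity functions, one column per local model.
    return np.exp(-0.5 * ((x[:, None] - centres[None, :]) / width) ** 2)

def predict(x, centres, width, params):
    # y(x) = sum_i rho_i(x) * (a_i * x + b_i), with the rho_i normalised to
    # sum to one -- the normalisation step whose side-effects the thesis
    # examines.
    rho = basis(x, centres, width)
    rho /= rho.sum(axis=1, keepdims=True)
    local = params[:, 0] * x[:, None] + params[:, 1]
    return (rho * local).sum(axis=1)

def fit_local(x, y, centres, width):
    # "Local learning": each linear model is fitted by weighted least squares
    # using its own validity function as the weights, instead of optimising
    # all parameters jointly in one global problem.
    rho = basis(x, centres, width)
    rho /= rho.sum(axis=1, keepdims=True)
    A = np.stack([x, np.ones_like(x)], axis=1)
    params = []
    for i in range(len(centres)):
        w = np.sqrt(rho[:, i])[:, None]
        theta, *_ = np.linalg.lstsq(w * A, w[:, 0] * y, rcond=None)
        params.append(theta)
    return np.array(params)
```

For example, `predict(x, c, 1.0, fit_local(x, np.sin(x), c, 1.0))` with a handful of centres `c = np.linspace(0, 6, 5)` shows the softly blended local lines tracking a global nonlinearity.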
154

Detecting ransomware in encrypted network traffic using machine learning

Modi, Jaimin 29 August 2019 (has links)
Ransomware is a type of malware that has gained immense popularity in recent times due to its money-extortion techniques. It locks the user out of their files until the ransom is paid. Existing approaches for ransomware detection predominantly focus on system-level monitoring, for instance by tracking file system characteristics. To date, only a small amount of research has focused on detecting ransomware at the network level, and none of the published proposals have addressed the challenge that an increasing number of ransomware families use encrypted channels for communication with the command and control (C&C) server, mainly over the HTTPS protocol. Despite the limited amount of ransomware-specific data available in network traffic, network-level detection represents a valuable extension of system-level detection, as it provides early indication of ransomware activity and allows such activity to be disrupted before serious damage takes place. To address this gap, we propose in the current thesis a new approach for detecting ransomware in encrypted network traffic that leverages network connection information, certificate information, and machine learning. We observe that network traffic characteristics can be divided into three categories: connection-based, encryption-based, and certificate-based. Based on these characteristics, we explore a feature model that effectively separates ransomware traffic from normal traffic. We study three different classifiers: Random Forest, SVM, and Logistic Regression. Experimental evaluation on a diversified dataset yields a detection rate of 99.9% and a false positive rate of 0% for Random Forest, the best performing of the three classifiers. / Graduate
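As a hedged illustration of the classification setup (the feature columns and data below are synthetic placeholders, not the thesis's actual feature model or dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder data: rows = encrypted flows, columns = connection-, encryption-
# and certificate-based features (e.g. duration, cipher suite id, certificate
# validity period in days).
X = rng.random((1000, 3))
y = rng.integers(0, 2, 1000)          # 1 = ransomware flow, 0 = benign

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```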
155

BotFlowMon: Identify Social Bot Traffic With NetFlow and Machine Learning

Feng, Yebo 06 September 2018 (has links)
With the rapid development of online social networks (OSNs), maintaining the security of social media ecosystems has become critically important for the public. Among all the security threats in OSNs, malicious social bots are the most common risk factor. This thesis puts forward a detection method called BotFlowMon that relies only on NetFlow data to identify OSN bot traffic. The detection procedure takes raw NetFlow data as input and uses the DBSCAN algorithm to aggregate related flows into transaction-level data. A data fusion technique, along with a visualization method, is then proposed to extract features, normalize values, and help analyze flows. A new clustering algorithm, Clustering Based on Density Sort and Valley Point Competition, is also designed to subdivide transactions into basic operations. After these preprocessing steps, classic machine learning algorithms are applied to construct the classification model. / 2020-09-06
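A sketch of the flow-aggregation step, under strong simplifying assumptions (toy flow records, Euclidean DBSCAN over time and destination only; the thesis's data fusion and valley-point clustering are not reproduced):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy flow records: [start_time_s, dst_host_id, bytes]; real NetFlow carries
# many more fields.
flows = np.array([
    [0.0, 10, 1200], [0.4, 10, 900], [0.7, 10, 300],   # one burst to host 10
    [60.2, 22, 5000], [60.9, 22, 4800],                # a later burst to host 22
])
# Group flows that are close in time and destination into "transactions";
# eps/min_samples are chosen for the toy data and would need tuning.
labels = DBSCAN(eps=5.0, min_samples=2).fit_predict(flows[:, :2])

transactions = []
for t in sorted(set(labels) - {-1}):                   # -1 marks noise flows
    grp = flows[labels == t]
    transactions.append([grp[:, 2].sum(),              # total bytes
                         len(grp),                     # flow count
                         grp[:, 0].max() - grp[:, 0].min()])  # span in seconds
print(transactions)   # feature rows ready for a downstream classifier
```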
156

An evaluation of Unsupervised Machine Learning Algorithms for Detecting Fraud and Abuse in the U.S. Medicare Insurance Program

Unknown Date (has links)
The population of people ages 65 and older has increased since the 1960s, and current estimates indicate it will double by 2060. Medicare is a federal health insurance program for people 65 or older in the United States. Medicare claims fraud and abuse is an ongoing issue that wastes a large amount of money every year, resulting in higher health care costs and taxes for everyone. In this study, an empirical evaluation of several unsupervised machine learning approaches is performed, indicating reasonable fraud detection results. We employ two unsupervised machine learning algorithms, Isolation Forest and Unsupervised Random Forest, which have not previously been used for the detection of fraud and abuse on Medicare data. Additionally, we implement three other machine learning methods previously applied to Medicare data: Local Outlier Factor, Autoencoder, and k-Nearest Neighbor. For our dataset, we combine the 2012 to 2015 Medicare provider utilization and payment data and add fraud labels from the List of Excluded Individuals/Entities (LEIE) database. Results show that Local Outlier Factor is the best model to use for Medicare fraud detection. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2018. / FAU Electronic Theses and Dissertations Collection
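A minimal sketch of this unsupervised setup, using two of the named detectors on placeholder data (the features and injected anomalies are synthetic, not the CMS/LEIE preparation described above):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))          # e.g. claim counts, payments, ...
X[:5] += 6                             # a few injected anomalies

# Negate so that higher score = more anomalous for both detectors.
iso_scores = -IsolationForest(random_state=0).fit(X).score_samples(X)
lof_scores = -LocalOutlierFactor().fit(X).negative_outlier_factor_

# Rank providers by anomaly score; the top of each list goes to manual review.
print(np.argsort(iso_scores)[-5:], np.argsort(lof_scores)[-5:])
```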
157

Causal discovery from non-experimental data (基於非實驗數據的因果分析)

January 2014 (has links)
Chen, Zhitang. / Thesis (Ph.D.)--Chinese University of Hong Kong, 2014. / Includes bibliographical references (leaves 140-146). / Abstracts also in Chinese. / Title from PDF title page (viewed on 14 September 2016).
158

Essays on Machine Learning Methods for Data-Driven Marketing Decisions

Dew, Ryan January 2019 (has links)
Across three essays, I explore how modern statistical machine learning approaches can be used to glean novel marketing insights from data and to facilitate data-driven decision support in new domains. In particular, I draw on Bayesian nonparametrics, deep generative modeling, and modern Bayesian computational techniques to develop new methodologies that enhance standard marketing models, address modern challenges in data-driven marketing, and, as I show through applications to real-world data, uncover new, managerially relevant insights. Substantively, my work addresses issues in customer base analysis, the estimation of consumer preferences, and brand identity and logo design. In my first essay, I address how multi-product firms can understand and predict customer purchasing dynamics in the presence of partial information, by developing a Bayesian nonparametric model of customer purchasing activity. This framework yields an interpretable, model-based dashboard, which can be used to predict future activity and guide managerial decision making. In my second essay, I explore the flexible modeling of customer brand choice dynamics using a novel form of heterogeneity, which I term dynamic heterogeneity. Specifically, I develop a novel doubly hierarchical Gaussian process framework to flexibly model how the preferences of individual customers evolve relative to one another over time, and illustrate the utility of the framework with an application to purchasing during the Great Recession. Finally, in my third essay, I explore how data and models can inform firms' aesthetic choices, in particular the design of their logos. To that end, I develop image processing algorithms and a deep generative model of brand identity that links visual data with textual descriptions of firms and brand personality perceptions, which can be used for understanding design standards, ideation, and ultimately, data-driven design.
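A minimal numpy sketch of the dynamic-heterogeneity idea in the second essay, under strong simplifying assumptions: each customer's latent preference path is drawn around a shared population-level Gaussian process trend, so individuals evolve relative to one another over time. Kernels and scales are illustrative, and this is far simpler than the doubly hierarchical model in the essay:

```python
import numpy as np

def rbf(t, length=5.0, var=1.0):
    # Squared-exponential covariance over a grid of time points.
    d = t[:, None] - t[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0, 24, 25)                      # e.g. months of observation
jitter = 1e-8 * np.eye(len(t))                  # numerical stabiliser

# Shared population-level trend: one smooth GP draw.
mu = rng.multivariate_normal(np.zeros(len(t)), rbf(t, length=8.0) + jitter)

# Each customer deviates from the trend via an individual, rougher GP draw.
customers = [
    mu + rng.multivariate_normal(np.zeros(len(t)),
                                 rbf(t, length=3.0, var=0.3) + jitter)
    for _ in range(5)
]
# Each row: one customer's latent preference trajectory around the shared trend.
print(np.round(customers[0][:5], 2))
```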
159

Effective implementation of Gaussian process regression for machine learning

Davies, Alexander James January 2015 (has links)
No description available.
160

Searching for pulsars : from multi-beam receivers to interferometers

Cooper, Sally January 2017 (has links)
It is estimated that there are ∼100,000 radio-emitting pulsars in the Galaxy. The characteristic narrow beams of pulsars mean that only some of these will be visible from Earth, due to the necessary alignment of the radiation beam with our line of sight. Over 2500 pulsars have been discovered in the fifty years since their initial discovery. Such a small sample of the total population can provide only limited knowledge of the different groups, properties and physics of pulsars. Some of the most basic questions surrounding their origin and radiation processes remain open, but pulsar surveys provide a way of discovering new sources capable of answering them. The discovery of pulsars has always required innovative hardware and software. Their discovery, although serendipitous, relied on high time resolution, a large collecting area, and plenty of on-sky time. In this thesis, I present results from two major surveys, one using a latest-generation telescope and the other among the most technically advanced yet undertaken, and discuss their results, challenges and opportunities. The LOFAR Tied Array All Sky (LOTAAS) survey is an ongoing all-Northern-sky search for pulsars and transients using the LOw Frequency ARray (LOFAR). It is the first large-scale pulsar survey at the low frequency of 135 MHz using a multi-beam interferometer. The survey uses 222 beams, generated in software, in a single one-hour observation of mixed tied-array beams (TABs) and sub-array pointings (SAPs). Together they simultaneously provide a large field of view of ∼60 square degrees and achieve sub-mJy sensitivity. The sky will be observed in full with the TABs, such that the SAPs will cover the sky three times over. In this thesis, I present the results of the first LOTAAS sky pass (of three). Using a pipeline that I co-developed, I have efficiently processed more than 2.5 PB of data using the Dutch national supercomputer Cartesius. Processing of the survey has resulted in the redetection of 155 known pulsars, which I use to analyse the sensitivity of the survey by comparing the detected fluxes to the expected fluxes of those known pulsars. I show that the LOTAAS survey fluxes mostly agree to within a factor of two with the expected fluxes extrapolated from the published values at 400 MHz. The 155 redetected known pulsars include 5 millisecond pulsars as well as 22 pulsars that were blindly and independently discovered in the LOTAAS and Green Bank North Celestial Cap surveys, demonstrating the discovery potential of the survey. I present the basic parameters of the first 20 pulsars discovered in the LOTAAS survey. I demonstrate how LOFAR's multi-beaming capabilities can be exploited for the localisation and confirmation of discoveries. All pulsars discovered in the survey are monitored in long-term timing programmes with the LOFAR Core and the Lovell Telescope, as well as with the UK and German international LOFAR stations for the brightest sources. I present the phase-coherent timing solutions of 17 pulsars, including those of PSR J0140+56 and PSR J0614+37, which were first discovered in the LOFAR Tied-Array Survey (LOTAS), a precursor to LOTAAS. I show that the spin properties of the LOFAR pulsars suggest a possible overabundance of LOFAR-discovered sources towards the death line in the P–Ṗ diagram. From this I infer that the LOTAAS survey is preferentially discovering pulsars that are older (τc > 10 Myr) with low magnetic field strengths (B < 1 TG).
I present the discovery and average profiles of the 22 pulsars discovered with LOFAR; with the exception of PSR J1529+40, all were found to have a measured duty cycle of less than 10%, and more than half have duty cycles of less than 5%. This is true for all LOFAR pulse profiles at 148 and 1520 MHz. There is no clear evidence to suggest a preferred radius-to-frequency mapping, although we find that it is definitely not the case that pulse widths are typically broader at lower frequencies. We have observed all of these 22 pulsars with the Lovell Telescope at 1520 MHz, of which only eight are detected regularly. For these pulsars the average profiles at 1520 MHz are presented and their width evolution with frequency examined. PSR J1529+40 displays significant profile evolution between the two frequencies. This pulsar also has a very small period derivative (∼10⁻¹⁹) compared to pulsars with similar periods (0.5 seconds). In this thesis, I also present a summary of results and discoveries from the High Time Resolution Universe (HTRU) High Latitude survey. The processing of survey data was performed at the University of Manchester, UK, and Swinburne University, Australia, using two different pipelines named DTSC and Morello respectively. Analysis with the DTSC pipeline led to the discovery of 7 new pulsars (Thornton, 2013), and processing with the Morello pipeline (Morello, 2016) led to the discovery of 6 pulsars presented here. Five other pulsars were discovered in the high-latitude survey, bringing the total number to 18 discoveries (P > 100 ms). This is greater than the 11 new pulsars estimated from population synthesis simulations by Keith et al. (2010), an increase of 60%, and the 18 pulsars presented here account for 8% of the known high-latitude population. I present the phase-coherent timing solutions for five of these 18 pulsars and their average profiles. The two HTRU pipelines described in this thesis, DTSC and Morello, are used as a comparison of search techniques. I show that the performance of the Morello pipeline is better and that the DTSC pipeline fails to detect three of the 18 pulsars discovered in the high-latitude survey. Both surveys presented in this thesis have generated tens of millions of candidates. I explore different methods for filtering candidates to reduce the number that need to be viewed by eye. One of these methods is machine learning classification. We present, with Lyon et al. (2016), a set of 8 new features that are, in combination with a Decision Tree classifier, applied to both the LOTAAS and HTRU survey candidates. For LOTAAS, the classifier reduces the 20,000 candidates produced per pointing to just 500 for visual inspection. In the case of HTRU, of the 1.5 million candidates generated with periods greater than 100 ms, 350,000 are predicted to be positive, i.e. a pulsar. I test the performance of the classifier and show that it achieves 94% recall on the LOTAAS dataset and 100% recall on the HTRU dataset. However, for both surveys, we found the false positive rate to be high, up to 80% for HTRU. We demonstrate the ability of the classifier to separate pulsars from candidates arising from noise, but show that radio frequency interference now presents the next challenge in candidate selection.
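For concreteness, a sketch of the candidate-classification step on synthetic data (the eight columns stand in for per-candidate summary features; they are not the Lyon et al. (2016) feature set, and the labels are artificial):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))                    # 8 summary features per candidate
y = (X[:, 0] + 0.5 * X[:, 3] > 1.5).astype(int)   # toy "pulsar" label

# Recall matters most here: a missed pulsar is worse than extra candidates
# passed on for visual inspection.
clf = DecisionTreeClassifier(max_depth=5, random_state=0)
print("cv recall:", cross_val_score(clf, X, y, scoring="recall", cv=5).mean())
```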
