  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
151

A Semi-Supervised Information Extraction Framework for Large Redundant Corpora

Normand, Eric 19 December 2008 (has links)
The vast majority of text freely available on the Internet is not in a form that computers can understand. There have been numerous approaches to automatically extract information from human-readable sources. The most successful attempts rely on vast training sets of data. Others have succeeded in extracting restricted subsets of the available information. These approaches have limited use and require domain knowledge to be coded into the application. The current thesis proposes a novel framework for Information Extraction. From large sets of documents, the system develops statistical models of the data the user wishes to query, which generally avoid the limitations and complexity of most Information Extraction systems. The framework uses a semi-supervised approach to minimize human input. It also eliminates the need for external Named Entity Recognition systems by relying on freely available databases. The final result is a query-answering system which extracts information from large corpora with a high degree of accuracy.
152

Reconstructing Textual File Fragments Using Unsupervised Machine Learning Techniques

Roux, Brian 19 December 2008 (has links)
This work is an investigation into reconstructing fragmented ASCII files based on content analysis, motivated by a desire to demonstrate machine learning's applicability to Digital Forensics. Using a categorized corpus of Usenet, Bulletin Board Systems, and other assorted documents, a series of experiments is conducted using machine learning techniques to train classifiers which are able to identify fragments belonging to the same original file. The primary machine learning method used is the Support Vector Machine, trained on a variety of feature extractions. Additional work is done in training committees of SVMs to boost the classification power over the individual SVMs, as well as the development of a method to tune SVM kernel parameters using a genetic algorithm. Attention is given to the applicability of Information Retrieval techniques to file fragments, as well as an analysis of textual artifacts which are not present in standard dictionaries.
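The feature-extraction step that precedes the SVM training can be illustrated with a minimal sketch (plain Python, no SVM library): build normalized character-frequency vectors for text fragments and score their similarity with cosine similarity, so that fragments from the same original file score higher than unrelated ones. The fragments below are invented for illustration.

```python
from collections import Counter
from math import sqrt

def char_features(fragment: str) -> Counter:
    """Normalized character-frequency vector for a text fragment."""
    counts = Counter(fragment)
    total = sum(counts.values())
    return Counter({ch: n / total for ch, n in counts.items()})

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[ch] * b[ch] for ch in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two fragments with similar character statistics (same "file")
# versus one from a very different source.
frag_a = "the quick brown fox jumps over the lazy dog"
frag_b = "the lazy dog naps while the quick fox runs"
frag_c = "0xDEADBEEF 0xCAFEBABE 0x00000000 0xFFFFFFFF"

sim_same = cosine(char_features(frag_a), char_features(frag_b))
sim_diff = cosine(char_features(frag_a), char_features(frag_c))
```

In the thesis such feature vectors are fed to an SVM rather than compared directly; this sketch only shows why character statistics carry a signal about fragment provenance.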
153

A Balanced Secondary Structure Predictor

Islam, Md Nasrul 15 May 2015 (has links)
Secondary structure (SS) refers to the local spatial organization of the polypeptide backbone atoms of a protein. Accurate prediction of SS is a vital clue to resolving the 3D structure of a protein. SS has three different components: helix (H), beta (E) and coil (C). Most SS predictors are imbalanced, as their accuracy in predicting helix and coil is high but significantly lower for beta. The objective of this thesis is to develop a balanced SS predictor which achieves good accuracy in all three SS components. We propose a novel approach to solve this problem by combining a genetic algorithm (GA) with a support vector machine. We prepared two test datasets (CB471 and N295) to compare the performance of our predictors with SPINE X. The overall accuracy of our predictor was 76.4% and 77.2% on the CB471 and N295 datasets respectively, while SPINE X gave 76.5% overall accuracy on both test datasets.
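The GA-plus-SVM combination can be sketched without an SVM library: a tiny genetic algorithm searches for per-class weights that maximize the worst-class score, which is one way to trade a little helix/coil accuracy for better beta accuracy. The per-class base scores and the fitness function below are invented stand-ins; in the thesis the fitness would come from evaluating the SVM on a validation set.

```python
import random

random.seed(0)

# Hypothetical per-class base scores (helix, strand, coil).
BASE = {"H": 0.9, "E": 0.6, "C": 0.8}

def fitness(weights):
    """Worst-class weighted score: maximizing this balances the classes."""
    total = sum(weights)
    return min(BASE[c] * w / total for c, w in zip("HEC", weights))

def crossover(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

def mutate(w):
    child = list(w)
    i = random.randrange(3)
    child[i] = max(0.01, child[i] + random.gauss(0, 0.1))
    return child

# Simple elitist genetic loop: keep the best half, breed the rest.
population = [[random.random() + 0.1 for _ in range(3)] for _ in range(20)]
for _ in range(200):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [
        mutate(crossover(random.choice(survivors), random.choice(survivors)))
        for _ in range(10)
    ]

best = max(population, key=fitness)
best_fit = fitness(best)
uniform_fit = fitness([1.0, 1.0, 1.0])  # unweighted baseline
```

With uniform weights the worst class (beta) dominates the score; the GA shifts weight toward it, raising the minimum, which mirrors the balancing objective of the thesis.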
154

Tackling the Antibiotic Resistant Bacteria Crisis Using Longitudinal Antibiograms

Tlachac, Monica 31 May 2018 (has links)
Antibiotic resistant bacteria, a growing health crisis, arise due to antibiotic overuse and misuse. Resistant infections endanger the lives of patients and are financially burdensome. Aggregate antimicrobial susceptibility reports, called antibiograms, are critical for tracking antibiotic susceptibility and evaluating the likely effectiveness of different antibiotics to treat an infection before patient-specific susceptibility data are available. This research leverages the Massachusetts Statewide Antibiogram database, a rich dataset composed of antibiograms for 754 antibiotic-bacteria pairs collected by the Massachusetts Department of Public Health from 2002 to 2016. However, these antibiograms are at least a year old, meaning antibiotics are prescribed based on outdated data, which unnecessarily furthers resistance. Our objective is to employ data science techniques on these antibiograms to assist in developing more responsible antibiotic prescription practices. First, we use model selectors with regression-based techniques to forecast the current antimicrobial resistance. Next, we develop an assistant to immediately identify clinically and statistically significant changes in antimicrobial resistance between years once the most recent year of antibiograms is collected. Lastly, we use k-means clustering on resistance trends to detect antibiotic-bacteria pairs with resistance trends for which forecasting will not be effective. These three strategies can be implemented to guide more responsible antibiotic prescription practices and thus reduce unnecessary increases in antibiotic resistance.
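The clustering step can be sketched in plain Python: fit a least-squares slope to each antibiotic-bacteria pair's yearly resistance series, then run a tiny one-dimensional k-means over the slopes to separate flat trends from rising ones. The toy resistance series below are invented; the thesis clusters trends from the Massachusetts antibiograms.

```python
import random

random.seed(1)

def slope(series):
    """Least-squares slope of yearly resistance values (linear trend)."""
    n = len(series)
    xbar = (n - 1) / 2
    ybar = sum(series) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(series))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means: assign to nearest centroid, recompute means."""
    centroids = sorted(random.sample(values, k))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Toy yearly resistance fractions: some pairs flat, some clearly rising.
flat = [[0.10 + random.gauss(0, 0.01) for _ in range(10)] for _ in range(5)]
rising = [[0.10 + 0.03 * yr + random.gauss(0, 0.01) for yr in range(10)]
          for _ in range(5)]
slopes = [slope(s) for s in flat + rising]
centroids, clusters = kmeans_1d(slopes, k=2)
```

Pairs landing in the high-slope cluster are the candidates for which simple forecasting is least likely to keep up, matching the screening role k-means plays in the thesis.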
156

Detecting exoplanets with machine learning : A comparative study between convolutional neural networks and support vector machines

Tiensuu, Jacob, Linderholm, Maja, Dreborg, Sofia, Örn, Fredrik January 2019 (has links)
In this project two machine learning methods, the Support Vector Machine (SVM) and the Convolutional Neural Network (CNN), are studied to determine which method performs best on a labeled data set containing time series of light intensity from extrasolar stars. The main difficulty is that the data set contains many more non-exoplanet stars than stars with orbiting exoplanets. This causes a so-called imbalanced data set, which in this case is mitigated by, for example, mirroring the curves of stars with an orbiting exoplanet and adding them to the set. To improve the results further, some preprocessing is done before applying the methods to the data set. For the SVM, feature extraction and the Fourier transform of the time series are important steps, but further preprocessing alternatives are investigated. For the CNN method the time series are both detrended and smoothed, giving two inputs for the same light curve. All code is implemented in Python. Of all the validation parameters, recall is considered the main priority, since it is more important to find all exoplanets than to find all non-exoplanets. The CNN turned out to be the best-performing method for the chosen configurations, with a recall of 1.000, which exceeds the SVM's recall of 0.800. Considering the second validation parameter, precision, the CNN is also the best-performing method, with a precision of 0.769 over the SVM's 0.571.
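The class-balancing and evaluation steps above can be sketched in plain Python: the minority class (exoplanet light curves) is augmented by mirroring each curve in time, and recall and precision are computed from prediction counts. The tiny light curves, labels, and classifier output are invented for illustration.

```python
# Toy light curves: label 1 = star with exoplanet (minority), 0 = without.
curves = [([1.0, 0.8, 1.0, 0.9], 1),
          ([1.0, 1.0, 1.0, 1.0], 0),
          ([0.9, 1.0, 1.0, 1.0], 0),
          ([1.0, 1.0, 0.9, 1.0], 0)]

# Augment the minority class by mirroring (time-reversing) each curve,
# as the project does to reduce class imbalance.
augmented = curves + [(list(reversed(c)), y) for c, y in curves if y == 1]
n_pos = sum(y for _, y in augmented)

def recall_precision(y_true, y_pred):
    """Recall = found exoplanets / all exoplanets; precision = correct alarms."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Hypothetical classifier output on the augmented set:
# one false positive, no missed exoplanets.
y_true = [y for _, y in augmented]
y_pred = [1, 0, 0, 1, 1]
rec, prec = recall_precision(y_true, y_pred)
```

Here recall is perfect while precision is not, the trade-off the project deliberately accepts by prioritizing recall.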
157

Machine learning to detect anomalies in datacenter

Lindh, Filip January 2019 (has links)
This thesis investigates the possibility of using anomaly detection on performance data of virtual servers in a datacenter to detect malfunctioning servers. Using anomaly detection can potentially reduce the time a server is malfunctioning, as the server can be detected and checked before the error has a significant impact. Several approaches and methods were applied and evaluated on one virtual server: the k-nearest neighbor algorithm, the support vector machine, the k-means clustering algorithm, self-organizing maps, the CPU-memory usage ratio using a Gaussian model, and time series analysis using neural networks and linear regression. The evaluation and comparison of the methods were mainly based on reported errors during the time period they were tested: the better the detected anomalies matched the reported errors, the higher the score they received. It turned out that anomalies in performance data could be linked to real errors in the server to some extent. This makes it possible to use anomaly detection on performance data as a way to detect malfunctioning servers. The simplest method, looking at the ratio between memory usage and CPU usage, was the most successful one, detecting the most errors; however, the anomalies were often detected just after the error had been reported. The support vector machine was more successful at detecting anomalies before they were reported. The proportion of anomalies played a big role, however, and k-nearest neighbor received a higher score when the proportion of anomalies was higher.
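The simplest and most successful detector, the Gaussian model on the CPU-memory usage ratio, can be sketched in a few lines: fit a mean and standard deviation to the ratio during normal operation, then flag new samples that deviate by more than three standard deviations. The sample values and the 3-sigma threshold are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical CPU/memory usage ratios sampled during normal operation.
normal_ratios = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.51, 0.49]
mu, sigma = mean(normal_ratios), stdev(normal_ratios)

def is_anomaly(ratio, k=3.0):
    """Flag a sample whose ratio deviates more than k standard deviations."""
    return abs(ratio - mu) > k * sigma

# New samples: one normal, one from a (hypothetical) malfunctioning server
# whose memory usage collapsed relative to CPU load.
flags = [is_anomaly(r) for r in (0.51, 0.95)]
```

The appeal of this model in the thesis is exactly its simplicity: a single scalar per sample and two fitted parameters, yet it caught the most errors.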
158

Aggregating predictions using Non-Disclosed Conformal Prediction

Carrión Brännström, Robin January 2019 (has links)
When data are stored in different locations and pooling of such data is not allowed, there is an informational loss when doing predictive modeling. In this thesis, a new method called Non-Disclosed Conformal Prediction (NDCP) is adapted to a regression setting, such that predictions and prediction intervals can be aggregated from different data sources without exchanging any data. The method is built upon the Conformal Prediction framework, which produces predictions with confidence measures on top of any machine learning method. The method is evaluated on regression benchmark data sets using Support Vector Regression, with different sizes and settings for the data sources, to simulate real-life scenarios. The results show that the method produces conservatively valid prediction intervals even though, in some settings, the individual data sources do not manage to create valid intervals. NDCP also creates more stable intervals than the individual data sources. Thanks to its straightforward implementation, data owners who cannot share data but would like to contribute to predictive modeling would benefit from using this method.
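The conformal-prediction core underlying NDCP can be sketched in plain Python: absolute residuals on a held-out calibration set give a quantile that is added around new point predictions, yielding intervals with the desired coverage. The constant-slope point predictor and the calibration data below are invented stand-ins; the thesis uses Support Vector Regression and aggregates such intervals across non-disclosing sources.

```python
import math

def predict(x):
    """Stand-in point predictor (the thesis uses Support Vector Regression)."""
    return 2.0 * x

# Held-out calibration points (x, y) for computing nonconformity scores.
calibration = [(1.0, 2.1), (2.0, 3.8), (3.0, 6.3), (4.0, 7.9),
               (5.0, 10.2), (6.0, 11.8), (7.0, 14.1), (8.0, 16.4)]
residuals = sorted(abs(y - predict(x)) for x, y in calibration)

# Conformal quantile for miscoverage level alpha: the
# ceil((n + 1) * (1 - alpha))-th smallest residual.
alpha = 0.2
n = len(residuals)
rank = math.ceil((n + 1) * (1 - alpha))
q = residuals[min(rank, n) - 1]

def interval(x):
    """Prediction interval centered on the point prediction."""
    p = predict(x)
    return (p - q, p + q)

lo, hi = interval(9.0)
```

In NDCP each data owner computes such a quantile locally and only the intervals are aggregated, so no raw data ever leaves its source.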
159

Application of Support Vector Machines and Autoregressive Moving Average Models in Electromyography Signal Classification

Barretto, Mateus Ymanaka 10 December 2007 (has links)
The diagnosis of neuromuscular diseases is attained by the combined use of several tools. Among these tools, clinical electromyography provides key information for the diagnosis. In the literature, the application of some classifiers (linear discriminant and artificial neural networks) to a variety of electromyography parameters (number of phases, turns and zero crossings; median frequency; autoregressive coefficients) has provided promising results. Nevertheless, the need for a large number of autoregressive coefficients has guided this Master's thesis to the use of a smaller number of autoregressive moving-average coefficients. The classification task (into normal, neuropathic or myopathic) was achieved by support vector machines, a type of artificial neural network recently proposed. This work's objective was to study whether low-order autoregressive moving-average (ARMA) models can substitute for high-order autoregressive models, in combination with support vector machines, for diagnostic purposes. Results indicate that support vector machines perform better than the Fisher linear discriminant, and that ARMA(1,11) and ARMA(1,12) models provide high classification rates (81.5%), close to the maximum obtained using 39 autoregressive coefficients. We therefore recommend the use of support vector machines and ARMA(1,11) or ARMA(1,12) models for the classification of 800 ms needle electromyography signals sampled at 25 kHz.
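The thesis feeds model coefficients to the classifier as features. As a simplified stand-in (pure AR rather than ARMA, and low order), the sketch below estimates AR(2) coefficients from a signal via the Yule-Walker equations; the synthetic "EMG-like" signal is invented for illustration.

```python
import random

random.seed(2)

def autocov(x, lag):
    """Sample autocovariance of a series at the given lag."""
    n = len(x)
    m = sum(x) / n
    return sum((x[i] - m) * (x[i + lag] - m) for i in range(n - lag)) / n

def yule_walker_ar2(x):
    """Solve the order-2 Yule-Walker equations for AR(2) coefficients.

    [r0 r1; r1 r0] [a1; a2] = [r1; r2]
    """
    r0, r1, r2 = (autocov(x, k) for k in range(3))
    det = r0 * r0 - r1 * r1
    a1 = (r0 * r1 - r1 * r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    return a1, a2

# Synthetic AR(2) signal x_t = 0.6 x_{t-1} - 0.2 x_{t-2} + noise,
# a crude stand-in for a stationary EMG segment.
x = [0.0, 0.0]
for _ in range(5000):
    x.append(0.6 * x[-1] - 0.2 * x[-2] + random.gauss(0, 1))

a1, a2 = yule_walker_ar2(x)
features = [a1, a2]  # such coefficients would be fed to the SVM classifier
```

Estimating ARMA(1,11) or ARMA(1,12) models as in the thesis requires an iterative fitting procedure (e.g. maximum likelihood), but the idea is the same: a handful of model coefficients summarizes an 800 ms signal for classification.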
160

Application of Support Vector Machines for Damage Detection in Structures

Sharma, Siddharth 05 January 2009 (has links)
Support vector machines (SVMs) are a set of supervised learning methods that have recently been applied to structural damage detection due to their ability to form an accurate boundary from a small amount of training data. During training, they require data from the undamaged and damaged structure. The unavailability of data from the damaged structure is a major challenge in such methods, due to the irreversibility of damage. Recent methods create data for the damaged structure from finite element models. In this thesis we propose a new method to derive the dataset representing the damaged structure from the dataset measured on the undamaged structure, without using a detailed structural finite element model. The basic idea is to reduce the values of a copy of the data from the undamaged structure to create the data representing the damaged structure. The performance of the method in the presence of measurement noise, ambient base excitation, and wind loading is investigated. We find that SVMs can be used to detect small amounts of damage in the structure in the presence of noise. The ability of the method to detect damage at different locations in a structure and the effect of measurement location on the sensitivity of the method have been investigated. An online structural health monitoring method has also been proposed to use the SVM boundary, trained on data measured from the damaged structure, as an indicator of the structural health condition.
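The data-generation idea, deriving a "damaged" training set by reducing the values of a copy of the undamaged measurements, can be sketched in plain Python, with a nearest-centroid classifier standing in for the trained SVM boundary. The feature vectors and the 10% reduction factor are invented for illustration.

```python
import random

random.seed(3)

# Hypothetical feature vectors (e.g. modal amplitudes) measured on the
# undamaged structure, with small measurement noise.
undamaged = [[1.0 + random.gauss(0, 0.02) for _ in range(4)]
             for _ in range(50)]

# Derive the "damaged" training set by reducing a copy of the undamaged
# data, as the thesis proposes (no finite element model needed).
REDUCTION = 0.9
damaged = [[v * REDUCTION for v in row] for row in undamaged]

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

c_ok, c_dmg = centroid(undamaged), centroid(damaged)

def classify(row):
    """Nearest-centroid stand-in for the trained SVM boundary."""
    d_ok = sum((a - b) ** 2 for a, b in zip(row, c_ok))
    d_dmg = sum((a - b) ** 2 for a, b in zip(row, c_dmg))
    return "damaged" if d_dmg < d_ok else "undamaged"

# A new noisy measurement with slightly reduced amplitudes.
sample = [0.92, 0.91, 0.93, 0.90]
label = classify(sample)
```

An actual SVM would learn a maximum-margin boundary between the two sets rather than comparing centroids, but the training data would be constructed the same way.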
