  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1011

Catweetegories : machine learning to organize your Twitter stream

Simoes, Christopher Francis 14 April 2014 (has links)
We created a web service that uses machine learning to help users better organize the flood of tweets they receive every day. We began by experimenting with ways to manually classify training sets of tweets, such as using Amazon’s Mechanical Turk, and by crawling the Internet for large quantities of tweets. Once we had acquired good training data, we built a classifier, evaluating NLTK and Stanford NLP as supporting libraries; the final classifier is 87.5% accurate. We then built a web service that exposes this classifier and allows any user on the Internet to organize their tweets. The service integrates many open source tools, and we discuss how we combined them into a production-quality web service. We run the service in the Amazon cloud and review the associated costs. Finally, we review the lessons we learned and share our thoughts on future work. / text
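The classification step this abstract describes can be sketched with a small bag-of-words Naive Bayes classifier. The abstract names NLTK and Stanford NLP as candidate libraries; the pure-Python version below only illustrates the idea, and the example tweets and category names are invented:

```python
import math
from collections import Counter, defaultdict

def train(labeled_tweets):
    """Fit a multinomial Naive Bayes model from (text, category) pairs."""
    word_counts = defaultdict(Counter)   # category -> word frequencies
    cat_counts = Counter()               # category -> number of tweets
    vocab = set()
    for text, cat in labeled_tweets:
        words = text.lower().split()
        word_counts[cat].update(words)
        cat_counts[cat] += 1
        vocab.update(words)
    return word_counts, cat_counts, vocab

def classify(text, model):
    """Return the category with the highest log-posterior (Laplace smoothing)."""
    word_counts, cat_counts, vocab = model
    total = sum(cat_counts.values())
    best, best_score = None, float("-inf")
    for cat in cat_counts:
        score = math.log(cat_counts[cat] / total)
        denom = sum(word_counts[cat].values()) + len(vocab)
        for w in text.lower().split():
            score += math.log((word_counts[cat][w] + 1) / denom)
        if score > best_score:
            best, best_score = cat, score
    return best

# Hypothetical training tweets and categories, for illustration only.
model = train([
    ("new phone battery benchmark released", "tech"),
    ("our team won the match tonight", "sports"),
    ("gpu prices drop after chip launch", "tech"),
    ("coach announces starting lineup", "sports"),
])
print(classify("benchmark shows gpu battery gains", model))  # → tech
```

A production classifier would replace the whitespace tokenizer with one aware of hashtags, mentions, and URLs, which is part of what libraries like NLTK provide.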
1012

Infinite-word topic models for digital media

Waters, Austin Severn 02 July 2014 (has links)
Digital media collections hold an unprecedented source of knowledge and data about the world. Yet, even at current scales, the data exceeds by many orders of magnitude the amount a single user could browse through in an entire lifetime. Making use of such data requires computational tools that can index, search over, and organize media documents in ways that are meaningful to human users, based on the meaning of their content. This dissertation develops an automated approach to analyzing digital media content based on topic models. Its primary contribution, the Infinite-Word Topic Model (IWTM), helps extend topic modeling to digital media domains by removing model assumptions that do not make sense for them -- in particular, the assumption that documents are composed of discrete, mutually-exclusive words from a fixed-size vocabulary. While conventional topic models like Latent Dirichlet Allocation (LDA) require that media documents be converted into bags of words, IWTM incorporates clustering into its probabilistic model and treats the vocabulary size as a random quantity to be inferred based on the data. Among its other benefits, IWTM achieves better performance than LDA while automating the selection of the vocabulary size. This dissertation contributes fast, scalable variational inference methods for IWTM that allow the model to be applied to large datasets. Furthermore, it introduces a new method, Incremental Variational Inference (IVI), for training IWTM and other Bayesian non-parametric models efficiently on growing datasets. IVI allows such models to grow in complexity as the dataset grows, as their priors state that they should. Finally, building on IVI, an active learning method for topic models is developed that intelligently samples new data, resulting in models that train faster, achieve higher performance, and use smaller amounts of labeled data. / text
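The "bag of words" conversion that conventional topic models require, and that IWTM is designed to avoid, amounts to quantizing each continuous feature vector to its nearest entry in a fixed-size codebook. A minimal sketch of that lossy preprocessing step follows; the codebook size k and the toy features are arbitrary choices, not values from the dissertation:

```python
import numpy as np

def quantize_to_vocab(features, k, iters=20, seed=0):
    """Build a fixed-size 'visual vocabulary' with k-means, then map each
    continuous feature vector to a discrete word id -- the step IWTM removes
    by doing the clustering inside its probabilistic model instead."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each feature to its nearest center.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        words = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties.
        for j in range(k):
            if (words == j).any():
                centers[j] = features[words == j].mean(axis=0)
    return words, centers

# Toy continuous "media features": two well-separated blobs.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                   rng.normal(5.0, 0.1, (50, 2))])
words, centers = quantize_to_vocab(feats, k=2)
# Each blob should collapse to a single discrete word id.
print(sorted(set(words.tolist())))
```

Everything inside a cluster becomes the same "word" here, regardless of how spread out it is; treating the vocabulary size as a random quantity to be inferred, as IWTM does, avoids fixing k in advance.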
1013

Recovery of continuous quantities from discrete and binary data with applications to neural data

Knudson, Karin Comer 10 February 2015 (has links)
We consider three problems, motivated by questions in computational neuroscience, related to recovering continuous quantities from binary or discrete data or measurements in the context of sparse structure. First, we show that it is possible to recover the norms of sparse vectors given one-bit compressive measurements, and provide associated guarantees. Second, we present a novel algorithm for spike-sorting in neural data, which involves recovering continuous times and amplitudes of events using discrete bases. This method, Continuous Orthogonal Matching Pursuit, builds on algorithms used in compressive sensing. It exploits the sparsity of the signal and proceeds greedily, achieving gains in speed and accuracy over previous methods. Lastly, we present a Bayesian method making use of hierarchical priors for entropy rate estimation from binary sequences. / text
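Continuous Orthogonal Matching Pursuit builds on the standard (discrete) Orthogonal Matching Pursuit from compressive sensing. Below is a bare-bones version of that base algorithm on a small hand-made dictionary; the dictionary and signal are invented for illustration, and the thesis's method additionally recovers continuous event times and amplitudes rather than stopping at discrete atoms:

```python
import numpy as np

def omp(D, y, k):
    """Greedy OMP: repeatedly pick the dictionary column most correlated
    with the residual, then re-fit all chosen columns by least squares."""
    residual, support = y.astype(float), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Dictionary: 8 unit basis vectors plus 4 normalized two-spike atoms.
I = np.eye(8)
pairs = np.stack([(I[:, i] + I[:, i + 1]) / np.sqrt(2)
                  for i in range(0, 8, 2)], axis=1)
D = np.hstack([I, pairs])          # shape (8, 12)
y = 3 * I[:, 2] - 1 * I[:, 5]      # 2-sparse in the basis atoms
x = omp(D, y, k=2)
print(np.flatnonzero(x))           # → [2 5]
```

Greediness is what buys the speed: each iteration touches only the current residual, and the least-squares re-fit over the selected columns keeps earlier choices consistent.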
1014

Situation Awareness in Colour Printing and Beyond

Lundström, Jens January 2014 (has links)
Machine learning methods are increasingly being used to solve real-world problems in society. The complexity of these methods is often well hidden from users, yet integrating machine learning methods into real-world applications is not a straightforward process: it requires both knowledge of the methods and domain knowledge of the problem. Two such domains are colour print quality assessment and anomaly detection in smart homes, both currently driven by manual monitoring of complex situations. The goal of the presented work is to develop methods, algorithms and tools to facilitate monitoring and understanding of the complex situations that arise in colour print quality assessment and anomaly detection for smart homes. The proposed approach builds on the use and adaptation of supervised and unsupervised machine learning methods. Novel algorithms for computing objective measures of print quality in production are proposed in this work. Objective measures are also modelled to study how paper and press parameters influence print quality. Moreover, a study on how print quality is perceived by humans is presented, and experiments aiming to understand how subjective assessments of print quality relate to objective measurements are explained. The obtained results show that the objective measures reflect important aspects of print quality and can be modelled with reasonable accuracy using paper and press parameters. The models of objective measures are shown to reveal relationships consistent with known print quality phenomena. In the second part of this thesis the application area of anomaly detection in smart homes is explored. A method for modelling human behaviour patterns is proposed. The model is used to detect deviating behaviour patterns using contextual information from both time and space. The proposed behaviour pattern model is tested using simulated data and is shown to be suitable in four types of scenarios.
The thesis shows that parts of offset lithographic printing, traditionally a human-centered process, can be automated by introducing image processing and machine learning methods. Moreover, it is concluded that robust and accurate anomaly detection in smart homes requires a holistic approach that makes use of several contextual aspects. / PPQ / SA3L
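The deviation-detection idea in the smart home part — flagging behaviour that is unlikely given its time context — can be illustrated with a simple frequency model. The model and the event log below are invented for illustration; the thesis's behaviour pattern model also uses spatial context:

```python
import math
from collections import Counter

def fit_hourly_model(events):
    """events: list of (hour, activity) pairs. Estimates P(activity | hour)
    with Laplace smoothing over the activities seen in training."""
    by_hour = Counter((h, a) for h, a in events)
    hour_totals = Counter(h for h, _ in events)
    activities = sorted({a for _, a in events})
    def logprob(hour, activity):
        return math.log((by_hour[(hour, activity)] + 1)
                        / (hour_totals[hour] + len(activities)))
    return logprob

# Invented training log: cooking around 18:00, sleeping around 02:00.
train = [(18, "cook")] * 30 + [(2, "sleep")] * 30 + [(18, "sleep")]
logprob = fit_hourly_model(train)
# Cooking at 02:00 never occurred in training -> far less likely,
# so a detector could flag it as a deviating behaviour pattern.
print(logprob(18, "cook") > logprob(2, "cook"))  # → True
```

A real detector would threshold such log-probabilities and, as the thesis argues, combine several contextual aspects rather than the single time-of-day feature used here.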
1015

Learnable similarity functions and their application to record linkage and clustering

Bilenko, Mikhail Yuryevich 28 August 2008 (has links)
Not available / text
1016

Learning for semantic parsing with kernels under various forms of supervision

Kate, Rohit Jaivant, 1978- 28 August 2008 (has links)
Not available / text
1017

Pattern-Based Vulnerability Discovery

Yamaguchi, Fabian 30 October 2015 (has links)
No description available.
1018

Distributed Approach for Peptide Identification

Vedanbhatla, Naga V K Abhinav 01 October 2015 (has links)
A crucial step in protein identification is peptide identification. The Peptide Spectrum Match (PSM) data sets involved are enormous, making processing on a single machine time-consuming. PSMs are ranked by a cross-correlation, a statistical score, or a probability that the match between the experimental and hypothetical spectra is correct. This procedure takes a long time to execute, so there is demand for better performance on large peptide data sets, and appropriate distributed frameworks are needed to reduce the processing time. The designed framework uses a peptide processing algorithm named C-Ranker, which takes peptide data as input and identifies the accurate PSMs. The framework has two steps: execute the C-Ranker algorithm on servers specified by the user, then compare the correct PSM data generated via the distributed approach with that of the conventional single-machine execution of C-Ranker. The objective of this framework is to process large peptide datasets using a distributed approach. The nature of the solution calls for parallel execution, and hence it was implemented in Java. The results clearly show that distributed C-Ranker executes in less time than the conventional centralized C-Ranker application, with a reduction of about 66.67% in overall execution time. There is also a reduction in average memory usage when C-Ranker runs distributed across multiple servers. A significant benefit that may be overlooked is that distributed C-Ranker can solve extraordinarily large problems without the expense of a powerful computer or a supercomputer. Finally, this approach is compared with an Apache Hadoop framework for peptide identification with respect to cost, execution time, and flexibility.
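The split-process-merge pattern described above — run C-Ranker on several servers, then combine the results — can be sketched with Python's standard library in place of the thesis's Java implementation. The `score_psms` filter below is a stand-in, not the actual C-Ranker algorithm, and the PSM records are invented:

```python
from concurrent.futures import ProcessPoolExecutor

def score_psms(chunk):
    """Stand-in for C-Ranker: keep PSMs whose score clears a threshold."""
    return [psm for psm in chunk if psm["score"] >= 0.9]

def distributed_rank(psms, workers=4):
    """Split the PSM list into one chunk per worker, process the chunks in
    parallel, and merge the per-worker results."""
    chunks = [psms[i::workers] for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(score_psms, chunks)
    return [psm for part in results for psm in part]

if __name__ == "__main__":
    # Invented PSM records for illustration.
    psms = [{"peptide": f"PEP{i}", "score": i / 100} for i in range(100)]
    kept = distributed_rank(psms)
    # The same PSMs survive as in a plain single-process run.
    print(len(kept) == len(score_psms(psms)))  # → True
```

Because each chunk is processed independently, the same pattern extends from processes on one machine to user-specified remote servers, which is the setting the framework targets.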
1019

Rhino and Human Detection in Overlapping RGB and LWIR Images / Noshörnings- och människodetektion i överlappande färg- och LVIR-bilder

Karlsson Schmidt, Carl January 2015 (has links)
The poaching of rhinoceros has increased dramatically in the last few years, and the park rangers are often helpless against the militarised poachers. Linköping University is running several projects with the goal of aiding the park rangers in their work. This master thesis was produced at CybAero AB, which builds Remotely Piloted Aircraft Systems (RPAS). With their helicopters, high-end cameras with a range sufficient to cover the whole area can be flown over the parks. The aim of this thesis is to investigate different methods to automatically find rhinos and humans using airborne cameras. The system uses two cameras, one colour camera and one thermal camera. The latter is used to find interesting objects, which are then extracted in the colour image. The object is then classified as either rhino, human or other. Several methods for classification have been evaluated. The results show that classifying solely on the thermal image gives nearly as high accuracy as classifying in combination with the colour image. This enables the system to be used at dusk and dawn or in bad light conditions. This is an important factor since most poaching occurs at dusk or dawn. As a conclusion, a system capable of running on low-performance hardware and placeable on board the aircraft is presented. / Poaching of rhinoceros has increased drastically in recent years, and park rangers often stand helpless against militarised poachers. Linköping University is working on several projects intended in different ways to support the park rangers in their work. This thesis project was carried out at CybAero AB, which builds remotely piloted helicopters, so-called RPAS (Remotely Piloted Aircraft System). Their systems can carry high-quality cameras and have enough range to surveil an entire park. This thesis aims to investigate different methods for providing information, from airborne cameras, about what is happening in the park. The system is based on two cameras, an ordinary colour camera and a thermal camera. The thermal camera is used to find interesting objects, which are then extracted from the colour image. The objects are then classified as either rhinos, humans or other. Several methods have been evaluated on their ability to classify the objects correctly. It turned out that very good results can be obtained by classifying on the thermal image alone, which also lets the system operate at twilight or in the dark. This is very important, since most animals are shot at either dawn or dusk. The report concludes with a proposed system that can run on low-performance hardware directly on board the aircraft.
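The two-camera pipeline — find candidate objects in the thermal image, then cut the corresponding patch out of the colour image for classification — can be sketched with plain NumPy. The temperature threshold and the synthetic images below are invented, and the camera-registration geometry and the classifier itself are omitted:

```python
import numpy as np

def hot_region(lwir, threshold):
    """Return the bounding box (r0, r1, c0, c1) of above-threshold pixels
    in the thermal image, or None if it contains no warm object."""
    rows, cols = np.nonzero(lwir > threshold)
    if rows.size == 0:
        return None
    return int(rows.min()), int(rows.max()) + 1, int(cols.min()), int(cols.max()) + 1

def extract_patch(rgb, box):
    """Cut the candidate region out of the (registered) colour image,
    ready to be handed to a rhino/human/other classifier."""
    r0, r1, c0, c1 = box
    return rgb[r0:r1, c0:c1]

# Synthetic 20x20 scene: a warm 4x5 "animal" on a cool background.
lwir = np.full((20, 20), 15.0)
lwir[8:12, 3:8] = 37.0
rgb = np.zeros((20, 20, 3))
box = hot_region(lwir, threshold=30.0)
patch = extract_patch(rgb, box)
print(box, patch.shape)  # → (8, 12, 3, 8) (4, 5, 3)
```

This also shows why the thermal channel alone can carry the detection step: the candidate region is found before the colour image is consulted at all, which is what makes dusk and dawn operation possible.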
1020

Nonlinear mixed effects models for longitudinal data

Mahbouba, Raid January 2015 (has links)
The main objective of this master thesis is to explore the effectiveness of nonlinear mixed effects models for longitudinal data. Mixed effects models make it possible to investigate the nature of the relationship between time-varying covariates and the response while also capturing variation between subjects. I investigate the robustness of the longitudinal models by building up the complexity of the models, starting from multiple linear models and ending with additive nonlinear mixed models. I use a dataset in which firms’ leverage is explained by four explanatory variables in addition to a grouping factor, the firm. The models are compared using statistics such as AIC and BIC and by a visual inspection of residuals. The likelihood ratio test has been used for some nested models only. The models are estimated by maximum likelihood and restricted maximum likelihood estimation. The most efficient model is the nonlinear mixed effects model, which has the lowest AIC and BIC. The multiple linear regression model failed to explain the relation and produced unrealistic statistics.
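The AIC/BIC model comparison used above can be illustrated with ordinary least squares and the Gaussian log-likelihood. The simulated data below is invented, and the plain linear-versus-quadratic contrast stands in for the thesis's much richer ladder from linear to nonlinear mixed models:

```python
import numpy as np

def fit_and_score(X, y):
    """OLS fit; return (AIC, BIC) computed from the Gaussian log-likelihood."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = len(y), X.shape[1] + 1            # +1 parameter: error variance
    sigma2 = resid @ resid / n               # ML estimate of the variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

# Simulated response with a clear nonlinear term.
rng = np.random.default_rng(0)
x = np.linspace(0, 2, 80)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(0, 0.1, x.size)

X_lin = np.column_stack([np.ones_like(x), x])          # linear model
X_quad = np.column_stack([np.ones_like(x), x, x**2])   # nonlinear model
aic_lin, bic_lin = fit_and_score(X_lin, y)
aic_quad, bic_quad = fit_and_score(X_quad, y)
print(aic_quad < aic_lin, bic_quad < bic_lin)  # → True True
```

Both criteria penalize the extra parameter but reward the large drop in residual variance, which mirrors the thesis's finding that the more flexible model wins on both AIC and BIC when the true relationship is nonlinear.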
