241 |
Improving Filtering of Email Phishing Attacks by Using Three-Way Text Classifiers
Trevino, Alberto, 13 March 2012 (has links) (PDF)
The Internet has been plagued with endless spam for over 15 years. However, in the last five years spam has morphed from an annoying advertising tool into a social engineering attack vector. Much of today's unwanted email tries to deceive users into replying with passwords or bank account information, or into visiting malicious sites that steal login credentials and spread malware. These email-based attacks are known as phishing attacks. Much has been published about these attacks, which try to appear legitimate not only to users but also to spam filters. Several sources indicate that traditional content filters have a hard time detecting phishing attacks because the emails lack the typical features and characteristics of spam messages. This thesis tests the hypothesis that separating messages into three categories (ham, spam, and phish) allows content filters to yield better filtering performance. Even though experimentation showed that three-way classification did not improve performance, several additional premises were tested, including the validity of the claim that phishing emails are too similar to legitimate emails, and the ability of Naive Bayes classifiers to properly classify such emails.
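The three-way (ham / spam / phish) split the thesis tests can be sketched with a minimal Naive Bayes text classifier. The messages, labels, and vocabulary below are invented for illustration, and scikit-learn stands in for the actual filter and corpora used in the thesis:

```python
# Minimal sketch of three-way (ham / spam / phish) Naive Bayes text
# classification. Toy messages; the thesis used real email corpora.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "meeting at noon tomorrow",                        # ham
    "lunch with the team on friday",                   # ham
    "buy cheap pills online now",                      # spam
    "limited offer cheap watches",                     # spam
    "verify your bank account password",               # phish
    "your login credentials expire, confirm account",  # phish
]
train_labels = ["ham", "ham", "spam", "spam", "phish", "phish"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_texts)

clf = MultinomialNB()
clf.fit(X, train_labels)

# A new message asking for credentials lands in the phish class.
test = vectorizer.transform(["please confirm your account password"])
predicted = clf.predict(test)[0]
print(predicted)  # phish
```

A two-way filter would have to fold the last two messages into "spam"; the thesis's question is whether keeping them as a separate class changes filtering performance.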
|
242 |
Maskininlärning för att förutspå churn baserat på diskontinuerlig beteendedata / Machine learning to predict churn based on discontinuous behavioral data
Öbom, Anton; Bratteby, Adrian, January 2017 (has links)
This report examines the fields of machine learning and digital marketing, using machine learning as a tool to predict churn in a new domain: companies that do not track their customers extensively, i.e., where behavioral data is discontinuous. To predict churn, relatively simple out-of-the-box models, such as support vector machines and random forests, are used to achieve an acceptable outcome. To be on par with the models used for churn prediction in subscription-based services, this report concludes that more research has to be done using more effective evaluation metrics. Finally, it is presented how these discoveries can be commercialized, along with the business-related benefits of using churn prediction for the employer, Sellpy. / (Translated from Swedish:) This report explores the fields of machine learning and digital marketing by using machine learning as a tool to predict churn in a type of company with discontinuous behavioral data. To predict churn, relatively simple out-of-the-box models, such as support vector machines and random forests, are used to reach acceptable results. To reach results comparable to work where churn prediction is performed on continuous behavioral data, this report concludes that future work should investigate which evaluation metrics are most suitable. The report also presents how these discoveries can be commercialized and how the company Sellpy can benefit from predicting churn.
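The out-of-the-box approach described above can be sketched as follows. The two behavioral features (days since last order, orders per month) and all data points are invented for illustration; the thesis worked with Sellpy's real customer data:

```python
# Hedged sketch: churn prediction from sparse behavioral features
# with an out-of-the-box random forest. Invented data.
from sklearn.ensemble import RandomForestClassifier

# Each row: [days_since_last_order, orders_per_month]
X_train = [
    [5, 3.0], [10, 2.5], [14, 4.0], [21, 1.8], [30, 2.2],       # active
    [120, 0.2], [150, 0.1], [200, 0.0], [90, 0.3], [365, 0.0],  # churned
]
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 1 = churned

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# A customer with no order in 180 days is predicted to churn.
prediction = model.predict([[180, 0.1]])[0]
print(prediction)  # 1
```

The report's point about evaluation metrics applies here too: on imbalanced churn data, plain accuracy from `score()` can be misleading, which is why it calls for research into more suitable metrics.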
|
243 |
Tillämpning av maskininlärning för att införa automatisk adaptiv uppvärmning genom en studie på KTH Live-In Labs lägenheter / Using machine learning to implement adaptive heating; A study on KTH Live-In Labs apartments
Åsenius, Ingrid, January 2020 (has links)
The purpose of this study is to investigate whether it is possible to decrease Sweden's energy consumption through adaptive heating that uses climate data and machine learning to detect occupancy in apartments. The study uses environmental data from one of the KTH Live-In Lab apartments. The data was first used to investigate the possibility of detecting occupancy through machine learning, and then served as input to an adaptive heating model to investigate the potential benefits for energy consumption and heating costs. The results show that occupancy can be detected using environmental data, though not with 100% accuracy. They also show that the features with the greatest impact on detecting occupancy are light and carbon dioxide, and that the best-performing machine learning algorithm on the dataset used is the decision tree. The potential energy savings through adaptive heating were estimated at up to 10.1%. The final part of the paper discusses how a value-creating service can be built around adaptive heating and its possibility of reaching the market. / (Translated from Swedish:) The purpose of this report is to investigate whether it is possible to lower Sweden's energy consumption by introducing adaptive heating in apartments based on occupancy classification of climate data. The climate data used in the study was taken from one of the KTH Live-In Lab apartments. The data was first used to investigate whether occupancy could be detected through machine learning, and then as input to a model for adaptive heating, which was used to examine the potential savings in energy demand and heating costs. The results show that the best features for classifying occupancy are light and carbon dioxide. The machine learning algorithm that performed best on the dataset was the decision tree. The potential energy savings from introducing adaptive heating are estimated at up to 10.1%.
The final part of the report discusses how a value-creating service can be built around adaptive heating, as well as its potential to reach the market.
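A minimal sketch of the study's best-performing setup, a decision tree classifying occupancy from the two most informative features (light and CO2). The sensor readings below are invented and stand in for the KTH Live-In Lab data:

```python
# Hedged sketch: occupancy detection from environmental data with a
# decision tree. Invented readings, not the Live-In Lab dataset.
from sklearn.tree import DecisionTreeClassifier

# Each row: [light_lux, co2_ppm]
X_train = [
    [300, 800], [450, 950], [500, 1100], [350, 1000],  # occupied
    [5, 420], [10, 450], [0, 400], [20, 480],          # empty
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = occupied

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

# A bright room with elevated CO2 is classified as occupied.
reading = tree.predict([[400, 900]])[0]
print(reading)  # 1
```

An adaptive heating controller along the study's lines would then lower the setpoint whenever the classifier reports the apartment empty, with accuracy below 100% implying occasional mis-timed heating.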
|
244 |
Efficient Techniques For Relevance Feedback Processing In Content-based Image Retrieval
Liu, Danzhou, 01 January 2009
In content-based image retrieval (CBIR) systems, there are two general types of search: target search and category search. Unlike queries in traditional database systems, users in most cases cannot specify an ideal query to retrieve the desired results for either target search or category search in multimedia database systems, and have to rely on iterative feedback to refine their query. Efficient evaluation of such iterative queries can be a challenge, especially when the multimedia database contains a large number of entries, when the search needs many iterations, and when the underlying distance measure is computationally expensive. The overall processing costs, including CPU and disk I/O, are further emphasized if there are numerous concurrent accesses. To address these limitations in relevance feedback processing, we propose a generic framework, including a query model, index structures, and query optimization techniques. Specifically, this thesis has five main contributions, as follows. The first contribution is an efficient target search technique. We propose four target search methods: naive random scan (NRS), local neighboring movement (LNM), neighboring divide-and-conquer (NDC), and global divide-and-conquer (GDC). All these methods are built around a common strategy: they do not revisit previously checked images (i.e., they shrink the search space). Furthermore, NDC and GDC exploit Voronoi diagrams to aggressively prune the search space and move towards target images. We theoretically and experimentally prove that the convergence speeds of GDC and NDC are much faster than those of NRS and recent methods. The second contribution is a method to reduce the number of expensive distance computations when answering k-NN queries with non-metric distance measures. We propose an efficient distance mapping function that transforms non-metric measures into metric ones while preserving the original distance orderings.
Existing metric index structures (e.g., the M-tree) can then be used to reduce the computational cost by exploiting the triangle inequality property. The third contribution is an incremental query processing technique for Support Vector Machines (SVMs). SVMs have been widely used in multimedia retrieval to learn a concept in order to find the best matches. SVMs, however, suffer from a scalability problem as database sizes grow. To address this limitation, we propose an efficient query evaluation technique that employs incremental updates. The proposed technique also takes advantage of a tuned index structure to efficiently prune irrelevant data, so only a small portion of the data set needs to be accessed for query processing. This index structure also provides an inexpensive means to process the set of candidates when evaluating the final query result. The technique works with different kernel functions and kernel parameters. The fourth contribution is a method to avoid local optimum traps. Existing CBIR systems, designed around query refinement based on relevance feedback, suffer from local optimum traps that may severely impair overall retrieval performance. We therefore propose a simulated annealing-based approach to address this important issue. When the search becomes stuck at a local optimum, we employ a neighborhood search technique (i.e., simulated annealing) to continue the search for additional matching images, thus escaping from the local optimum. We also propose an index structure to speed up such neighborhood search. Finally, the fifth contribution is a generic framework to support concurrent accesses. We develop new storage and query processing techniques that exploit sequential access and leverage inter-query concurrency to share computation.
Our experimental results, based on the Corel dataset, indicate that the proposed optimizations can significantly reduce average response time while achieving better precision and recall, and are scalable to support a large user community. This latter performance characteristic is largely neglected in existing systems, making them less suitable for large-scale deployment. With the growing interest in Internet-scale image search applications, our framework offers an effective solution to the scalability problem.
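The pruning idea behind metric index structures such as the M-tree can be illustrated with a single pivot (a toy sketch, not the thesis's actual index): the triangle inequality gives d(q, x) >= |d(q, p) - d(p, x)|, so any candidate whose lower bound already exceeds the best distance found so far can be skipped without computing the expensive distance.

```python
# Toy single-pivot sketch of triangle-inequality pruning for
# nearest-neighbor search. Invented points; real metric indexes
# (e.g., the M-tree) organize many pivots hierarchically.
import math

def dist(a, b):
    # Stand-in for an expensive metric distance.
    return math.dist(a, b)

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0), (2.0, 0.5)]
pivot = points[0]
pivot_dists = [dist(pivot, x) for x in points]  # precomputed offline

def nearest(q):
    d_qp = dist(q, pivot)  # one distance to the pivot per query
    best_i, best_d, computed = None, float("inf"), 0
    for i, x in enumerate(points):
        # Lower bound from the triangle inequality.
        if abs(d_qp - pivot_dists[i]) >= best_d:
            continue  # pruned without computing dist(q, x)
        d = dist(q, x)
        computed += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, computed

idx, computed = nearest((8.5, 9.2))
print(idx, computed)  # finds (9.0, 9.0) without scanning every point
```

The thesis's mapping function makes this machinery applicable even when the original measure is non-metric, by transforming it into a metric one that preserves distance orderings.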
|
245 |
IMAGE CAPTIONING FOR REMOTE SENSING IMAGE ANALYSIS
Hoxha, Genc, 09 August 2022
Image Captioning (IC) aims to generate a coherent and comprehensive textual description that summarizes the complex content of an image. It combines computer vision and natural language processing techniques to encode the visual features of an image and translate them into a sentence. In the context of remote sensing (RS) analysis, IC has emerged as a new research area of high interest, since it not only recognizes the objects within an image but also describes their attributes and relationships. In this thesis, we propose several IC methods for RS image analysis. We focus on the design of different approaches that take into consideration the peculiarities of RS images (e.g., spectral, temporal, and spatial properties) and study the benefits of IC in challenging RS applications.
In particular, we focus our attention on developing a new decoder based on support vector machines. Compared to traditional decoders based on deep learning, the proposed decoder is particularly interesting in situations where only a few training samples are available, as it alleviates the problem of overfitting. The strengths of the proposed decoder are its simplicity and efficiency: it has only one hyperparameter, does not require expensive processing units, and is very fast in terms of training and testing time, making it suitable for real-life applications. Despite the efforts made in developing reliable and accurate IC systems, the task is far from being solved. The generated descriptions are affected by several errors related to the attributes and objects present in an RS scene. Once an error occurs, it is propagated through the recurrent layers of the decoder, leading to inaccurate descriptions. To cope with this issue, we propose two post-processing techniques that improve the generated sentences by detecting and correcting potential errors. They are based on the Hidden Markov Model and the Viterbi algorithm: the former generates a set of possible states, while the latter finds the optimal sequence of states. The proposed post-processing techniques can be injected into any IC system at test time to improve the quality of the generated sentences. While all the captioning systems developed in the RS community are devoted to single RGB images, we propose two captioning systems that can be applied to multitemporal and multispectral RS images. The proposed systems are able to describe the changes that have occurred in a given geographical area through time. We refer to this new paradigm of analysing multitemporal and multispectral images as change captioning (CC). To test the proposed CC systems, we construct two novel datasets composed of bitemporal RS images.
The first is composed of very high-resolution RGB images, while the second consists of medium-resolution multispectral satellite images. To advance the task of CC, the constructed datasets are publicly available at the following link: https://disi.unitn.it/~melgani/datasets.html. Finally, we analyse the potential of IC for content-based image retrieval (CBIR) and show its applicability and advantages compared to traditional techniques. Specifically, we focus our attention on developing a CBIR system that represents an image with generated descriptions and uses sentence similarity to search for and retrieve relevant RS images. Compared to traditional CBIR systems, the proposed system can search and retrieve images using either an image or a sentence as the query, making it more convenient for end users. The achieved results show the promising potential of our proposed methods compared to baseline and state-of-the-art methods.
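The post-processing idea described above, generating candidate states with an HMM and picking the most probable sequence with Viterbi, can be sketched generically. The states, observations, and probabilities below are a toy example (flagging a likely word error in a caption), not the thesis's actual caption-correction model:

```python
# Generic Viterbi decoding over a toy two-state HMM. The probabilities
# are invented; they tag each caption word as Correct or Error.
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # V[t][s] = (probability of best path ending in s at time t, prev state)
    V = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for t in range(1, len(observations)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][observations[t]], p)
                for p in states
            )
            V[t][s] = (prob, prev)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return path[::-1]

states = ("Correct", "Error")
obs = ("plane", "plain", "field")
start_p = {"Correct": 0.8, "Error": 0.2}
trans_p = {"Correct": {"Correct": 0.9, "Error": 0.1},
           "Error": {"Correct": 0.7, "Error": 0.3}}
emit_p = {"Correct": {"plane": 0.45, "plain": 0.05, "field": 0.5},
          "Error": {"plane": 0.1, "plain": 0.8, "field": 0.1}}

tags = viterbi(obs, states, start_p, trans_p, emit_p)
print(tags)  # ['Correct', 'Error', 'Correct']
```

In the thesis's setting the states would be candidate words rather than binary tags, but the mechanics are the same: the HMM proposes the state space and Viterbi selects the globally optimal sequence instead of correcting each word in isolation.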
|
246 |
A Comparison of Signal Processing and Classification Methods for Brain-Computer Interface
Renfrew, Mark E., January 2009
No description available.
|
247 |
A Probabilistic Technique For Open Set Recognition Using Support Vector Machines
Scherreik, Matthew, January 2014
No description available.
|
248 |
A SNP Microarray Analysis Pipeline Using Machine Learning Techniques
Evans, Daniel T., January 2010
No description available.
|
249 |
Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech
Tyson, Na'im R., 26 June 2012
No description available.
|
250 |
Predicting basketball performance based on draft pick: A classification analysis
Harmén, Fredrik, January 2022
In this thesis, we predict the performance of basketball players entering the NBA based on where they were selected in the NBA draft. We test different machine learning models on data from the previous 35 NBA drafts and compare them to see which model achieves the highest classification accuracy. The machine learning methods used are Linear Discriminant Analysis, K-Nearest Neighbors, Support Vector Machines, and Random Forests. The results show that the method with the highest classification accuracy was Random Forests, at 42%.
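The comparison in this thesis, four classifiers evaluated on the same split and ranked by accuracy, can be sketched as follows. The synthetic dataset stands in for the real draft and performance data:

```python
# Hedged sketch of a four-classifier comparison on one train/test split.
# make_classification generates placeholder data, not NBA draft records.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=0),
}
accuracies = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
              for name, m in models.items()}
best = max(accuracies, key=accuracies.get)
print(accuracies, best)
```

On the invented data the ranking carries no meaning; the thesis's 42% figure comes from its own multi-class performance labels, where even the best model is far from perfect, as draft position only partly predicts career outcomes.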
|