Global ETD Search

1	Towards Efficient Packet Classification Algorithms and Architectures Ahmed, Omar 22 August 2013 Packet classification plays an important role in next-generation networks, where it is needed to fulfill the requirements of many applications, including firewalls, multimedia services, intrusion detection services and differentiated services. Hardware solutions such as CAM/TCAM do not scale well in space, while current software-based packet classification algorithms exhibit relatively poor performance, prompting many researchers to concentrate on novel frameworks and architectures that employ both hardware and software components. In this thesis we propose two novel algorithms, Packet Classification with Incremental Update (PCIU) and Group Based Search packet classification Algorithm (GBSA), that are scalable and demonstrate excellent results in terms of preprocessing and classification. PCIU is an efficient packet classification algorithm with a unique incremental update capability; it performs well and suits many different tasks and clients. The algorithm was further improved, and made accessible to a wider variety of applications, by implementing it in hardware. Four such implementations are detailed and discussed in this thesis. A hardware accelerator based on an ESL approach, using Handel-C, classified packets 22x faster than a pure software implementation running on a state-of-the-art Xeon processor, and an ASIP implementation classified packets 21x faster on average. We also propose another novel algorithm, GBSA, for packet classification that is scalable, fast and efficient. On average the algorithm consumes 0.4 MB of memory for a 10k rule set. In the worst case, the classification time per packet is 2 μs and the pre-processing speed is 3M rules/s, on a CPU operating at 3.4 GHz.
The proposed algorithm was evaluated and compared with state-of-the-art techniques, such as RFC, HiCut, Tuple and PCIU, using several standard benchmarks. The results indicate that GBSA outperforms these algorithms in terms of speed, memory usage and pre-processing time. GBSA was also implemented in hardware to make it accessible to a wider variety of applications, and three approaches are detailed and discussed in this thesis. The first approach used an Application Specific Instruction Processor (ASIP), while the others were pure RTL implementations using two different ESL flows (Impulse-C and Handel-C). The GBSA ASIP implementation ran, on average, 18x faster than a pure software implementation on a Xeon processor, while the hardware accelerators (based on the ESL approaches) achieved a 9x speedup.
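To make the problem concrete, here is a minimal sketch of packet classification itself, not of PCIU or GBSA (neither abstract describes their data structures): a naive linear-search classifier over five-field rules. Its O(N) cost per packet for N rules is the kind of cost that grouping and decomposition schemes aim to reduce. The `Rule` layout, field order and first-match priority are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

Range = tuple[int, int]  # inclusive (low, high)

@dataclass(frozen=True)
class Rule:
    src: Range            # source IPv4 address range, addresses as integers
    dst: Range            # destination IPv4 address range
    sport: Range          # source port range
    dport: Range          # destination port range
    proto: Optional[int]  # IP protocol number, or None for "any"
    action: str

def matches(rule: Rule, pkt: tuple[int, int, int, int, int]) -> bool:
    """True if every header field of pkt falls inside the rule's field."""
    src, dst, sport, dport, proto = pkt
    return (rule.src[0] <= src <= rule.src[1]
            and rule.dst[0] <= dst <= rule.dst[1]
            and rule.sport[0] <= sport <= rule.sport[1]
            and rule.dport[0] <= dport <= rule.dport[1]
            and (rule.proto is None or rule.proto == proto))

def classify(rules: list[Rule], pkt: tuple[int, int, int, int, int]) -> Optional[str]:
    """Linear search: the first matching rule wins, so cost is O(len(rules)) per packet."""
    for rule in rules:
        if matches(rule, pkt):
            return rule.action
    return None

ANY_IP: Range = (0, 2**32 - 1)
ANY_PORT: Range = (0, 65535)
rules = [
    Rule(ANY_IP, ANY_IP, ANY_PORT, (80, 80), 6, "allow-http"),  # TCP to port 80
    Rule(ANY_IP, ANY_IP, ANY_PORT, ANY_PORT, None, "deny"),     # default rule
]
print(classify(rules, (0x0A000001, 0x0A000002, 40000, 80, 6)))  # allow-http
```

Rule order doubles as priority here; real rule sets can hold thousands of overlapping ranges, which is what makes the linear scan too slow at line rate.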
2	Streaming Random Forests Abdulsalam, Hanady 16 July 2008 Recent research addresses the problem of data-stream mining for applications that must process huge amounts of data, such as sensor data analysis and financial applications. Data-stream mining algorithms incorporate special provisions to meet the requirements of stream-management systems; that is, stream algorithms must be online and incremental, processing each data record only once (or a few times); adaptive to distribution changes; and fast enough to accommodate high arrival rates. We consider the problem of data-stream classification, introducing an online and incremental stream-classification ensemble algorithm, Streaming Random Forests, which extends Breiman's Random Forests, a standard classification algorithm. Our algorithm is designed to handle multi-class classification problems. It can deal with evolving data streams and with a random arrival rate of training/test data records. In addition, the algorithm automatically adjusts its parameters based on the data seen so far. Experimental results on real and synthetic data show that the algorithm performs well: without losing classification accuracy, it handles multi-class problems whose underlying class boundaries drift, as well as cases in which blocks of training records are too small to build or update the classification model. / Thesis (Ph.D, Computing) -- Queen's University, 2008-07-15 Data mining Streams Classification algorithms Streaming algorithms
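Single-pass tree learners for streams commonly decide when a leaf has seen enough records to split by using the Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)). The abstract does not say whether Streaming Random Forests uses this exact test, so the following is only a generic sketch of the idea; `hoeffding_bound` and `ready_to_split` are hypothetical names.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` is within epsilon of its mean over n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def ready_to_split(best_gain: float, second_gain: float, value_range: float,
                   delta: float, n: int) -> bool:
    """Split a leaf once the best attribute leads the runner-up by more than epsilon."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)

# Information gain over c classes lies in [0, log2(c)]; here c = 2, so R = 1.
print(round(hoeffding_bound(1.0, 1e-7, 1000), 4))  # 0.0898
```

The bound shrinks as 1/sqrt(n), so a leaf with a clear winner splits after few records while near-ties wait for more evidence, which keeps the learner to a single pass over the stream.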
3	The predictive power of stock micro-blogging sentiment in forecasting stock market behaviour Al Nasseri, Alya Ali Mansoor January 2016 Online stock forums have become a vital investing platform for publishing relevant and valuable user-generated content (UGC), such as investment recommendations and other stock-related information, allowing investors to view the opinions of a large number of users and to share trading ideas. This thesis applies methods from computational linguistics and text mining to extract, on a daily basis, sentiment from stock-related micro-blogging messages called “StockTwits”. The primary aim of this research is to understand the ability of stock micro-blogging sentiment to forecast future stock price movements, by investigating the various roles played by investor sentiment in asset pricing on the stock market. The empirical analysis consists of four main parts, based on the predictive power and the role of investor sentiment in the stock market. The first part presents the text-mining procedure for extracting and predicting sentiment from stock-related micro-blogging data: it compares different machine learning algorithms under two feature selection approaches (filter and wrapper) in order to select the most accurate techniques for predicting StockTwits sentiment. The second part investigates the predictive correlations between StockTwits features and stock market indicators, examining the explanatory power of StockTwits variables for the dynamics of different financial market indicators. The third part investigates the role played by noise traders in determining asset prices.
The aim is to show that stock returns, volatility and trading volumes are affected by investor sentiment, and to investigate whether changes in sentiment (bullish or bearish) have different effects on stock market prices. The fourth part offers an in-depth analysis of several tweet-market relationships that remain open problems in the empirical literature (e.g. sentiment-return relations and volume-disagreement relations). The results suggest that StockTwits sentiment has explanatory power for the dynamics of stock prices in the U.S. market. Combining text-mining techniques with feature selection methods proved successful in predicting StockTwits sentiment. The approach presented in this thesis can offer real-time investment ideas that may provide investors and their peers with a decision support mechanism. Investor sentiment plays a critical role in determining asset prices in capital markets. Overall, the findings suggest that investor sentiment among noise traders is a priced factor, and they confirm the existence of asymmetric spillover effects of bullish and bearish sentiment on the stock market. They also suggest that sentiment is a significant factor in explaining stock price behaviour, and that the stock market in turn plays a positive role in forming investor sentiment. Furthermore, disagreement is not only an important factor in determining trading volumes but also a very significant factor in influencing asset prices and returns in capital markets. Overall, the thesis provides empirical evidence that failing to consider the role of investor sentiment in traditional finance theory could give an incomplete picture of stock price behaviour.
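Daily sentiment and disagreement series of the kind analysed here are often built from per-message labels with a log-ratio bullishness index and an agreement index. The sketch below assumes messages already labelled "bullish", "bearish" or otherwise (other labels are ignored); the thesis's own aggregation is not stated in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter

def daily_bullishness(labels: list[str]) -> float:
    """Log-ratio bullishness ln((1 + n_bull) / (1 + n_bear)); 0 means balanced."""
    counts = Counter(labels)
    return math.log((1 + counts["bullish"]) / (1 + counts["bearish"]))

def daily_agreement(labels: list[str]) -> float:
    """Agreement in [0, 1]: 1 when all labelled messages share one side,
    0 when bullish and bearish messages are evenly split."""
    counts = Counter(labels)
    bull, bear = counts["bullish"], counts["bearish"]
    if bull + bear == 0:
        return 0.0  # assumed convention: no labelled messages, no agreement
    b = (bull - bear) / (bull + bear)
    return 1.0 - math.sqrt(1.0 - b * b)

day = ["bullish", "bullish", "bullish", "bearish", "neutral"]
print(daily_bullishness(day), 1.0 - daily_agreement(day))  # bullishness, disagreement
```

Disagreement is taken as one minus agreement, which is the quantity a volume-disagreement regression would use.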
4	Classification of heterogeneous data based on data type impact of similarity Ali, N., Neagu, Daniel, Trundle, Paul R. 11 August 2018 Real-world datasets are increasingly heterogeneous, mixing numerical, categorical and other feature types. The main challenge in mining heterogeneous datasets is how to deal with the heterogeneity present in their records. Although some existing classifiers (such as decision trees) can handle heterogeneous data in specific circumstances, their performance may still be improved, because heterogeneity requires specific adjustments to similarity measurements and calculations. Moreover, heterogeneous data is still treated inconsistently and in an ad hoc manner. In this paper, we study the problem of heterogeneous data classification: our purpose is to use heterogeneity as a positive feature of the classification effort by consistently using the similarity between data objects. We address the heterogeneity issue by studying the impact of mixing data types on the calculation of similarity between data objects. To this end, we propose an algorithm that divides the initial data records, based on pairwise similarity, into classification subtasks, with the aim of increasing the quality of the data subsets, and applies specialized classifier models to them. The proposed approach is evaluated on 10 publicly available heterogeneous datasets. The results show that the models perform better on heterogeneous datasets when using the proposed similarity process. Heterogeneous datasets Similarity measures Two-dimensional similarity space Classification algorithms
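One standard way to compute similarity across mixed numerical and categorical features is Gower's coefficient, sketched below as a point of reference. It is not the paper's method (the keywords mention a two-dimensional similarity space, which the abstract does not define); the `ranges` argument, with `None` marking a categorical feature, is an assumption of this sketch.

```python
from typing import Optional, Sequence, Union

def gower_similarity(x: Sequence[Union[float, str]],
                     y: Sequence[Union[float, str]],
                     ranges: Sequence[Optional[float]]) -> float:
    """Mean per-feature similarity over mixed feature types.

    ranges[k] is the observed range of numeric feature k, or None if
    feature k is categorical.
    """
    total = 0.0
    for xk, yk, rk in zip(x, y, ranges):
        if rk is None:            # categorical: exact match or not
            total += 1.0 if xk == yk else 0.0
        elif rk == 0:             # constant numeric feature: always identical
            total += 1.0
        else:                     # numeric: 1 - range-normalised distance
            total += 1.0 - abs(float(xk) - float(yk)) / rk
    return total / len(ranges)

# (age, colour, score) with hypothetical ranges 50 and 10 for the numeric columns.
print(gower_similarity((30.0, "red", 5.0), (40.0, "red", 5.0), (50.0, None, 10.0)))
```

Normalising each numeric term by its range keeps every feature's contribution in [0, 1], so no single data type dominates the similarity.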
5	Análise de sentimentos para o auxílio na gestão das cidades inteligentes. / Sentiment analysis for the aid in the smart cities management. Rossi, Rosa Helena Peccinini Silva 27 June 2019 The main objective of this thesis is to bring Sentiment Analysis into the management of Smart Cities by implementing a tool that provides information to assist in supervising and managing these cities. The possible forms of assistance include identifying actions, means of prevention and predictions of possible adversities in the various Domains of Interest, and seeking improvements in the population's quality of life; this analysis allows city managers to make the best decisions for each scenario. This work contributes a new method for developing a Sentiment Analysis System to Assist in the Management of Smart Cities (ASCI). The system can capture, process, filter by Domain of Interest and evaluate the sentiment in information coming from the citizens of a Smart City. The method uses two Data Mining phases, one to classify Domains of Interest and the other for Sentiment Analysis. For the case study, the ASCI method was implemented to collect information, through the Twitter social network, from the population of a particular region of the city of São Paulo. A sentiment classification study was also carried out in the specific Domain of Transport, in which Linear SVC, Logistic Regression, Multinomial Naive Bayes and Random Forest classifiers were used, and their performance evaluated, to identify the positive, neutral and negative sentiment of the collected tweets. The data were evaluated using two text feature extraction techniques: Bag of Words and TF-IDF. The ASCI method developed in this thesis contributes significantly to the area of Sentiment Analysis, as the results obtained were satisfactory when it was applied to Smart City Domain of Interest scenarios. Classification algorithms Data mining Sentiment analysis Smart cities
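The two text representations named in this abstract can be sketched in a few lines. This uses the textbook TF-IDF variant (term frequency normalised by document length, idf = ln(N/df)); libraries such as scikit-learn default to smoothed and normalised variants, and the thesis does not say which it used. Tokenisation is assumed to have been done already, and the example tweets are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(tokens: list[str]) -> Counter:
    """Bag of Words: raw term counts, word order discarded."""
    return Counter(tokens)

def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Textbook TF-IDF per document: (count / doc length) * ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = bag_of_words(doc)
        weights.append({term: (count / len(doc)) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return weights

# Hypothetical pre-tokenised transport tweets.
tweets = [["bus", "late", "again"], ["bus", "on", "time"], ["great", "bus"]]
for w in tfidf(tweets):
    print(w)
```

Because "bus" appears in every tweet, its weight is zero: TF-IDF down-weights domain words that carry no sentiment, whereas Bag of Words keeps their raw counts.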
6	Decision Fusion for Protein Secondary Structure Prediction Akkaladevi, Somasheker 03 August 2006 Predicting protein secondary structure from the primary amino acid sequence is a very challenging task, and the problem has been approached from several angles. Proteins have many different biological functions: they may act as enzymes or as building blocks (muscle fibers), or may have a transport function (e.g., transport of oxygen). The three-dimensional structure of a protein determines its functional properties. Much interesting work has been done on this problem, and over the last 10 to 20 years the methods have gradually improved in accuracy. In this dissertation we investigate several techniques for predicting protein secondary structure, mainly pattern classification techniques such as neural networks, genetic algorithms and simulated annealing. Each individual algorithm may work well in certain situations but fail in others. To capitalize on their correct decisions, the various methods can be made to collaborate and reach a unified consensus based on their previous performance. This process of combining classifiers is called decision fusion. Various decision fusion techniques for combining the solutions of the different approaches to obtain better prediction accuracy, such as the committee method, the correlation method and Bayesian inference methods, are thoroughly explored in this dissertation. The RS126 data set was used for training and testing. Applying pattern classification algorithms together with decision fusion improved prediction accuracy compared with neural networks or pattern classification algorithms used individually or combined with neural networks. This research shows that decision fusion techniques can be used to obtain better protein secondary structure prediction accuracy.
Decision Fusion Protein Secondary Structure Prediction Pattern classification algorithms Computer Sciences
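The committee method can be illustrated as a weighted vote over per-residue labels, with each classifier's weight standing in for its previous performance. The three-letter alphabet (H helix, E strand, C coil), the weights and the tie-breaking rule are assumptions of this sketch; the correlation and Bayesian fusion methods are not shown.

```python
from collections import defaultdict

def fuse_labels(labels: list[str], weights: list[float]) -> str:
    """Weighted vote for one residue: labels[i] is classifier i's prediction and
    weights[i] its past accuracy. Ties go to the label that was voted for first."""
    scores: dict[str, float] = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

def fuse_sequences(predictions: list[str], weights: list[float]) -> str:
    """Fuse equal-length predicted structure strings position by position."""
    return "".join(fuse_labels(list(column), weights) for column in zip(*predictions))

print(fuse_sequences(["HHEC", "HCEC", "CHEE"], [0.70, 0.65, 0.60]))  # HHEC
```

With weights [0.9, 0.4, 0.4], one confident classifier outvotes two weaker ones that agree, which a plain majority vote would not allow.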
7	Design and Implementation of Analytical Mathematics for SIFT-MS Medical Applications Moorhead, Katherine Tracey January 2009 Selected Ion Flow Tube Mass Spectrometry (SIFT-MS) is an analytical measurement technology for the real-time quantification of volatile organic compounds in gaseous samples. This technology has current and potential applications in a wide variety of industries, although this research focuses on medical science. In this field, SIFT-MS has potential as a diagnostic device, capable of determining the presence of a particular disease or condition. SIFT-MS can also be used to monitor the progression of a disease state or predict deviations from expected behaviour, and to identify biomarkers of a particular disease state. All these possibilities are available non-invasively and in real time, by analysing breath samples. SIFT-MS produces large amounts of data, requiring specific mathematical methods to identify biomarker masses that differ significantly between populations or time points. Two classification methods are presented for the analysis of SIFT-MS mass scan data. The first is a cross-sectional classification model, intended to differentiate between the diseased and non-diseased states; it was validated in a simple test case. The second is a longitudinal classification model, intended to identify key biomarkers that change over time or in response to treatment. Both classification models were validated in two clinical trials investigating renal function in humans and rats. The first trial monitored changes in breath ammonia, TMA and acetone concentrations over the course of dialysis treatment, and reported correlations with the current gold standard, plasma creatinine, and with blood urea nitrogen. Finally, biomarkers of renal function were identified that change predictably over the course of treatment.
The second trial induced acute renal failure in rats and monitored the change in renal function during recovery. For comparison and validation, a two-compartment model was developed to estimate renal function from a bolus injection of a radio-labelled inulin tracer, and it was compared with the current gold standard, plasma creatinine measurement modified using the Cockcroft-Gault equation for rats. These two methods were compared with SIFT-MS monitoring of breath analytes to examine the potential of non-invasive biomarkers of kidney function. The results are promising for the non-invasive, real-time monitoring of breath analytes to diagnose and monitor kidney function and, potentially, other disease states. Classification algorithms biomarkers SIFT-MS mass spectrometry medical diagnosis and monitoring renal failure
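The two-compartment tracer model can be summarised by the standard bi-exponential plasma curve after an intravenous bolus, C(t) = A e^(-αt) + B e^(-βt), from which clearance is Dose / AUC with AUC = A/α + B/β; for a freely filtered tracer such as inulin this clearance approximates glomerular filtration rate. The sketch assumes the curve has already been fitted; the parameter values are hypothetical and not from the thesis.

```python
import math

def biexponential_conc(t: float, a: float, alpha: float, b: float, beta: float) -> float:
    """Two-compartment plasma concentration after an IV bolus: A e^(-alpha t) + B e^(-beta t)."""
    return a * math.exp(-alpha * t) + b * math.exp(-beta * t)

def clearance_from_bolus(dose: float, a: float, alpha: float, b: float, beta: float) -> float:
    """Clearance = dose / AUC(0, inf), with AUC = A/alpha + B/beta in closed form."""
    auc = a / alpha + b / beta
    return dose / auc

# Hypothetical fitted parameters: concentrations per mL, times in minutes,
# so the clearance comes out in mL/min.
print(clearance_from_bolus(dose=100.0, a=8.0, alpha=0.2, b=2.0, beta=0.02))
```

The closed-form AUC avoids numerically integrating sparse blood samples, at the cost of trusting the fitted exponentials beyond the last sample.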
8	HIV Drug Resistant Prediction and Featured Mutants Selection using Machine Learning Approaches Yu, Xiaxia 16 December 2014 HIV/AIDS is widespread and ranks as the sixth biggest killer worldwide. Because HIV replicates rapidly and lacks a proofreading mechanism, drug resistance is common and is one of the causes of treatment failure. Although drug resistance tests are available to patients and help in choosing more effective drugs, such tests can take up to two weeks to complete and are expensive. With the rapid development of computing, predicting drug resistance using machine learning has become feasible. To predict HIV drug resistance accurately, two main problems must be solved: how to encode the protein structure so that the most useful information is extracted and fed into machine learning tools, and which machine learning tools to choose. In our research, we first proposed a new protein encoding algorithm that converts proteins of various sizes into a fixed-size vector, allowing protein structure information to be fed to most state-of-the-art machine learning algorithms. Next, we proposed a new classification algorithm based on sparse representation. Following that, mean shift and quantile regression were used to help extract feature information from the data. Our results show that encoding protein structure with the newly proposed method is very efficient and yields consistently higher accuracy regardless of the type of machine learning tool. Furthermore, our new classification algorithm is the first application of sparse representation to biological data, and its results are comparable to those of other state-of-the-art classification algorithms, for example ANN, SVM and multiple regression.
Finally, mean shift and quantile regression identified the potentially most important drug-resistant mutants; such results might help biologists and chemists determine which mutants are the most representative candidates for further research. HIV-1 Drug resistance prediction Delaunay triangulation Sparse representation Machine learning Classification algorithms Mean shift
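To illustrate what encoding variable-length proteins as fixed-size vectors means, the simplest such encoding is amino-acid composition, sketched below. It is a deliberately simple stand-in, not the thesis's structure-based encoding (the keywords suggest Delaunay triangulation), and it discards the positional and structural information a structure-aware encoding keeps.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def composition_vector(sequence: str) -> list[float]:
    """Map a protein sequence of any length to a fixed 20-dimensional vector
    of relative amino-acid frequencies (non-standard letters are ignored)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]

# Sequences of different lengths map to vectors of the same length.
print(len(composition_vector("PQITLWQRPL")), len(composition_vector("M" * 500)))  # 20 20
```

Any fixed-length encoding like this lets standard classifiers (ANN, SVM, regression) consume proteins directly; the thesis's contribution is an encoding that preserves more of the structure.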
9	EMERGENCY MEDICAL SERVICE EMR-DRIVEN CONCEPT EXTRACTION FROM NARRATIVE TEXT Susanna S George 05 August 2021 In the midst of a pandemic, with cases ranging from patients whose minor symptoms quickly become fatal to patients with a STEMI heart attack, a fatal accident injury and so on, medical research that improves the speed and efficiency of patient care has become more important. As computing researchers work to automate technology that assists first responders, priorities have included decreasing the field crew's cognitive load, reducing the time taken to document each patient case and improving the accuracy of report details. This paper presents an information extraction algorithm that custom-engineers existing natural language processing extraction techniques such as MetaMap, together with a syntactic dependency parser such as spaCy to analyze sentence structure and regular expressions to match recurring patterns, to retrieve patient-specific information from medical narratives. The resulting concept-value pairs automatically populate the fields of an EMR form, which can be reviewed and modified manually if needed. The report can then be reused for various medical and billing purposes related to the patient. Computer Engineering concept extraction multi-label classification algorithms Natural Language Processing Syntactic Dependency
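The regular-expression stage of such a pipeline might look like the sketch below, which turns free text into concept-value pairs. The concept names and patterns are hypothetical; the MetaMap and spaCy stages are not reproduced, and real EMS narratives would need much broader patterns (units, negation, abbreviations).

```python
import re

# Hypothetical patterns for a few vital-sign concepts.
PATTERNS = {
    "blood_pressure": re.compile(r"\b(?:BP|blood pressure)\s*(?:of|was|is|:)?\s*(\d{2,3}/\d{2,3})", re.I),
    "heart_rate": re.compile(r"\b(?:HR|heart rate|pulse)\s*(?:of|was|is|:)?\s*(\d{2,3})\b", re.I),
    "spo2": re.compile(r"\b(?:SpO2|sats?)\s*(?:of|was|is|:)?\s*(\d{2,3})\s*%", re.I),
}

def extract_concepts(narrative: str) -> dict[str, str]:
    """Return the first value found for each concept, as concept-value pairs
    ready to fill the matching EMR form fields."""
    found = {}
    for concept, pattern in PATTERNS.items():
        match = pattern.search(narrative)
        if match:
            found[concept] = match.group(1)
    return found

print(extract_concepts("Pt found supine, BP 142/90, HR 112, SpO2 94% on room air."))
```

Regular expressions suit these rigidly formatted values; the dependency parser earns its place for looser phrasing, such as deciding which finding a negation applies to.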
10	Experiments with GMTI Radar using Micro-Doppler Dilsaver, Benjamin Walter 24 June 2013 As objects move, their changing shape produces a signature, called the micro-Doppler signature, that can be measured by a radar system. An object's micro-Doppler signature is a distinguishing characteristic for certain classes of objects. In this thesis, features are extracted from the micro-Doppler signature and used to classify objects. The objects considered are limited to walking humans and moving vehicles, and the micro-Doppler features are able to distinguish these two classes. Given sufficient training data, the micro-Doppler features may be used with learning algorithms to classify unknown objects detected by the radar with high accuracy. Doppler radar feature extraction Doppler measurement Doppler effect classification algorithms Electrical and Computer Engineering
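The physical basis of the signature is the two-way Doppler shift f_d = 2 v_r f_c / c for a scatterer with radial velocity v_r at carrier frequency f_c. The toy sketch below models a swinging limb as the body velocity plus a sinusoidal oscillation and uses the spread of the resulting Doppler track as a single feature; the kinematic model, parameters and feature are assumptions, not the thesis's GMTI data or feature set.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def doppler_shift(radial_velocity: float, carrier_hz: float) -> float:
    """Two-way (monostatic radar) Doppler shift in Hz: f_d = 2 * v_r * f_c / c."""
    return 2.0 * radial_velocity * carrier_hz / C

def limb_doppler_track(body_speed: float, swing_amplitude: float, swing_hz: float,
                       carrier_hz: float, duration_s: float, fs: float) -> list[float]:
    """Instantaneous Doppler of one point on a limb, sampled at fs Hz.

    Toy kinematics: radial velocity = body_speed + d/dt[A sin(w t)]
    = body_speed + A * w * cos(w t), with A = swing_amplitude in metres.
    """
    omega = 2.0 * math.pi * swing_hz
    n = int(duration_s * fs)
    return [doppler_shift(body_speed + swing_amplitude * omega * math.cos(omega * k / fs), carrier_hz)
            for k in range(n)]

def doppler_spread(track: list[float]) -> float:
    """A single illustrative feature: max minus min instantaneous Doppler."""
    return max(track) - min(track)

walker = limb_doppler_track(1.4, 0.3, 1.0, 10e9, 2.0, 1000.0)    # hypothetical gait
vehicle = limb_doppler_track(10.0, 0.0, 1.0, 10e9, 2.0, 1000.0)  # rigid body, no swing
print(doppler_spread(walker), doppler_spread(vehicle))
```

A rigid vehicle returns a single Doppler line (zero spread), while the walking model spreads over roughly 250 Hz at 10 GHz, which is why even crude spread features can help separate the two classes.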
