1161

Phenotyping cellular motion

Zhou, Felix January 2017
In the development of multicellular organisms, tissue development and homeostasis require coordinated cellular motion. For example, in conditions such as wound healing, immune and epithelial cells need to proliferate and migrate. Deregulation of key signalling pathways in pathological conditions causes alterations in cellular motion properties that are critical for disease development and progression; in cancer, it leads to invasion and metastasis. Consequently, there is strong interest in identifying factors, including drugs, that affect the motion and interactions of cells in disease, using experimental models suitable for high-content screening. There are two main modes of cell migration: individual and collective migration. Analysis tools that jointly consider both modes and provide robust, sensitive and comprehensive motion characterisation across varying experimental conditions and large extended timelapse acquisitions are currently limited. We have developed a systematic motion analysis framework, Motion Sensing Superpixels (MOSES), to quantitatively capture cellular motion in timelapse microscopy videos suitable for high-content screening. MOSES builds upon established computer vision approaches to deliver a minimal-parameter, robust algorithm that can i) extract reliable phenomena-relevant motion metrics, ii) discover spatiotemporally salient motion patterns and iii) facilitate unbiased analysis with little prior knowledge through unique motion 'signatures'. The framework was validated by application to numerous datasets, including YouTube videos, zebrafish immunosurveillance and Drosophila embryo development. We demonstrate two extended applications: the analysis of interactions between two epithelial populations in 2D culture, using cell lines of the squamous and columnar epithelia from human normal esophagus, Barrett's esophagus and esophageal adenocarcinoma, and the automatic monitoring of 3D organoid culture growth captured through label-free phase contrast microscopy. MOSES found unique boundary formation between squamous and columnar cells and could measure subtle changes in boundary formation due to external stimuli. MOSES automatically segments the motion and shape of multiple organoids even when several are present in the same field of view. Automated analysis of intestinal organoid branching following treatment agrees with independent RNA-seq results.
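The MOSES implementation itself is not reproduced in this abstract, but the core mechanism it builds on, superpixel tracks advected by dense optical flow, can be sketched in a few lines. A minimal sketch follows, assuming OpenCV's Farneback flow; the grid size, neighbourhood radius and function name are illustrative assumptions, not the authors' released code.

```python
import cv2
import numpy as np

def track_superpixels(frames, n_per_side=20):
    """frames: list of single-channel uint8 images. Returns (T, N, 2) tracks."""
    h, w = frames[0].shape
    ys = np.linspace(0, h - 1, n_per_side)
    xs = np.linspace(0, w - 1, n_per_side)
    pts = np.stack(np.meshgrid(xs, ys), -1).reshape(-1, 2)  # (N, 2) as (x, y)
    radius = max(1, max(h, w) // (2 * n_per_side))          # neighbourhood size
    tracks = [pts.copy()]
    for prev, curr in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        new_pts = pts.copy()
        for i, (x, y) in enumerate(pts):
            x0, x1 = int(max(x - radius, 0)), int(min(x + radius + 1, w))
            y0, y1 = int(max(y - radius, 0)), int(min(y + radius + 1, h))
            # advect the centroid by the mean flow in its neighbourhood
            new_pts[i] += flow[y0:y1, x0:x1].reshape(-1, 2).mean(axis=0)
        pts = np.clip(new_pts, 0, [w - 1, h - 1])
        tracks.append(pts.copy())
    return np.stack(tracks)
```

From such tracks, per-superpixel displacement statistics give the kind of motion metrics and 'signatures' a framework like this aggregates.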
1162

Predicting the programming language of questions and snippets of Stack Overflow using natural language processing

Alrashedy, Kamel 11 September 2018
Stack Overflow is the most popular Q&A website among software developers. As a platform for knowledge sharing and acquisition, the questions posted on Stack Overflow usually contain a code snippet. Stack Overflow relies on users to properly tag the programming language of a question and assumes that the programming language of the snippets inside a question is the same as the tag of the question itself. In this thesis, a classifier is proposed to predict the programming language of questions posted on Stack Overflow using Natural Language Processing (NLP) and Machine Learning (ML). The classifier achieves an accuracy of 91.1% in predicting the 24 most popular programming languages by combining features from the title, body and code snippets of the question. We also propose a classifier that only uses the title and body of the question and has an accuracy of 81.1%. Finally, we propose a classifier of code snippets only that achieves an accuracy of 77.7%. Thus, deploying ML techniques on the combination of text and code snippets of a question provides the best performance. These results demonstrate that it is possible to identify the programming language of a snippet of only a few lines of source code. We visualize the feature space of two programming languages, Java and SQL, in order to identify some properties of the information inside the questions corresponding to these languages.
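As a rough illustration of the general pipeline the abstract describes (not the thesis's actual feature engineering), TF-IDF features over question text and snippets can be fed to a linear model with scikit-learn; the toy questions and labels below are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# tiny invented corpus: question text with its embedded snippet appended
questions = [
    "How do I read a file line by line? with open('f') as fh: ...",
    "NullPointerException when calling method public static void main",
    "SELECT rows from one table WHERE id matches another table",
]
languages = ["python", "java", "sql"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"\S+"),
    LogisticRegression(max_iter=1000),
)
clf.fit(questions, languages)
print(clf.predict(["public class Foo { void bar() {} }"]))
```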
1163

Using XGBoost to classify the Beihang Keystroke Dynamics Database

Blomqvist, Johanna January 2018
Keystroke dynamics enable biometric security systems by collecting and analyzing computer keyboard usage data. There are different approaches to classifying keystroke data, and a method that has been gaining a lot of attention in the machine learning industry lately is the decision tree framework of XGBoost. XGBoost has won several Kaggle competitions in the last couple of years, but its capacity in the keystroke dynamics field has not yet been widely explored. Therefore, this thesis has attempted to classify the existing Beihang Keystroke Dynamics Database using XGBoost. To do this, keystroke features such as dwell time and flight time were extracted from the dataset, which contains 47 usernames and passwords. XGBoost was then applied to a binary classification problem, where the model attempts to distinguish the keystroke feature sequences of genuine users from those of 'impostors'. In this way, the ratio of inaccurately and accurately labeled password inputs can be analyzed. The results showed that, after tuning of the hyperparameters, XGBoost yielded Equal Error Rates (EER) at best 0.31 percentage points better than the 11.52% achieved by the SVM used in the original study of the database, and a highest AUC of 0.9792. The scores achieved by this thesis are, however, significantly worse than many others in the same field, but so were the results in the original study. The results varied greatly depending on the user tested. These results suggest that XGBoost may be a useful tool, provided it is tuned, but that a better dataset should be used to benchmark the tool adequately. Furthermore, the quality of the model is greatly affected by variance among the users. For future research purposes, one should make sure that the database used is of good quality. To create a security system utilizing XGBoost, one should be careful about the setting and quality requirements when collecting training data.
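A minimal sketch of this setup, assuming per-keystroke press and release timestamps: dwell and flight times form the feature vector, XGBoost separates genuine entries from impostors, and the EER is read off the ROC curve. The synthetic data, feature layout and hyperparameters are illustrative assumptions, and scoring is done on the training data purely for brevity.

```python
import numpy as np
from sklearn.metrics import roc_curve
from xgboost import XGBClassifier

def features(presses, releases):
    dwell = releases - presses               # how long each key is held
    flight = presses[1:] - releases[:-1]     # gap between consecutive keys
    return np.concatenate([dwell, flight])

def sample_entry(rng, speed, hold):
    presses = np.cumsum(rng.normal(speed, 0.02, 8))   # toy 8-key password
    releases = presses + rng.normal(hold, 0.01, 8)
    return features(presses, releases)

rng = np.random.default_rng(0)
genuine = np.stack([sample_entry(rng, 0.20, 0.09) for _ in range(50)])
impostor = np.stack([sample_entry(rng, 0.30, 0.13) for _ in range(50)])
X = np.vstack([genuine, impostor])
y = np.array([1] * 50 + [0] * 50)

model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X, y)
fpr, tpr, _ = roc_curve(y, model.predict_proba(X)[:, 1])
eer = fpr[np.argmin(np.abs(fpr - (1 - tpr)))]  # point where FPR ≈ FNR
print(f"EER ≈ {eer:.3f}")
```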
1164

Automated Measurements of Liver Fat Using Machine Learning

Grundström, Tobias January 2018
The purpose of the thesis was to investigate the possibility of using machine learning for automation of liver fat measurements in fat-water magnetic resonance imaging (MRI). The thesis presents methods for texture-based liver classification and Proton Density Fat Fraction (PDFF) regression using multi-layer perceptrons utilizing 2D and 3D textural image features. The first proposed method was a data classification method with the goal to distinguish between suitable and unsuitable regions in which to measure PDFF. The second proposed method was a combined classification and regression method where the classification distinguishes between liver and non-liver tissue. The goal of the regression model was to predict the difference d = PDFF_mean − PDFF_ROI between the manual ground-truth mean and the fat fraction of the active Region of Interest (ROI). Tests were performed on varying sizes of Image Feature Regions (froi) and combinations of image features for both of the proposed methods. The tests showed that 3D measurements using image features from discrete wavelet transforms produced measurements similar to the manual fat measurements. The first method resulted in lower relative errors, while the second method had a higher method agreement compared to manual measurements.
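A hedged sketch of the second method's structure, with synthetic stand-ins for the textural features and labels: one multi-layer perceptron flags liver versus non-liver ROIs, and a second regresses the correction d for ROIs flagged as liver.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))                 # 16 textural features per ROI
is_liver = (X[:, 0] > 0).astype(int)           # toy liver/non-liver labels
d = 0.3 * X[:, 1] + rng.normal(0, 0.05, 200)   # toy PDFF_mean - PDFF_ROI

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X, is_liver)
reg = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(
    X[is_liver == 1], d[is_liver == 1])        # regress only on liver ROIs

roi = X[:5]
mask = clf.predict(roi) == 1                   # step 1: keep liver ROIs
if mask.any():                                 # step 2: predict the correction
    print("predicted PDFF corrections:", reg.predict(roi[mask]))
```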
1165

Machine learning in indoor positioning and channel prediction systems

Zhu, Yizhou 18 September 2018
In this thesis, the neural network, a powerful tool which has demonstrated its ability in many fields, is studied for indoor localization and channel prediction systems. The thesis first proposes a received signal strength indicator (RSSI) fingerprinting-based indoor positioning system for the widely deployed WiFi environment, using deep neural networks (DNN). To reduce the computing time as well as improve the estimation accuracy, a two-step scheme is designed, employing a classification network for clustering and several regression networks for final location prediction. A new fingerprint, which utilizes the similarity in RSSI readings of nearby reference points (RPs), is also proposed. Real-time tests demonstrate that the proposed algorithm achieves an average distance error of 43.5 inches. The thesis then extends the neural network approach to physical-layer communications by introducing a recurrent neural network (RNN) based method for real-time channel prediction, which uses the recent history of channel state information (CSI) estimates for online training before prediction, adapting to the continuously changing channel to obtain a more accurate CSI prediction than conventional methods. Furthermore, the proposed method needs no additional knowledge, neither the internal properties of the channel itself nor the external factors that affect channel propagation. The proposed approach outperforms the other methods in a changing environment in the simulation test, validating it as a promising method for channel prediction in wireless communications.
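The two-step positioning scheme can be sketched as follows, with invented data shapes: a gating classifier assigns an RSSI vector to a cluster of reference points, then that cluster's regression network predicts the coordinates. Network sizes and the toy data are assumptions, not the thesis's architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier, MLPRegressor

rng = np.random.default_rng(2)
n_aps, n_clusters = 10, 3
X = rng.uniform(-90, -30, size=(300, n_aps))   # RSSI (dBm) from 10 APs
cluster = rng.integers(0, n_clusters, 300)     # toy cluster of nearby RPs
xy = rng.uniform(0, 50, size=(300, 2))         # toy positions in metres

gate = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000).fit(X, cluster)
regs = [MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000)
        .fit(X[cluster == c], xy[cluster == c]) for c in range(n_clusters)]

def locate(rssi):
    c = gate.predict(rssi.reshape(1, -1))[0]          # step 1: pick cluster
    return regs[c].predict(rssi.reshape(1, -1))[0]    # step 2: regress (x, y)

print(locate(X[0]))
```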
1166

Clustering to Improve One-Class Classifier Performance in Data Streams

Moulton, Richard Hugh 27 August 2018
The classification task requires learning a decision boundary between classes by making use of training examples from each. A potential challenge for this task is the class imbalance problem, which occurs when there are many training instances available for a single class, the majority class, and few training instances for the other, the minority class [58]. In this case, it is no longer clear how to separate the majority class from something for which we have little to no knowledge. More worryingly, the minority class is often the class of interest, e.g. for detecting abnormal conditions from streaming sensor data. The one-class classification (OCC) paradigm addresses this scenario by casting the task as learning a decision boundary around the majority class with no need for minority class instances [110]. OCC has been thoroughly investigated, e.g. [20, 60, 90, 110], and many one-class classifiers have been proposed. One approach for improving one-class classifier performance on static data sets is learning in the context of concepts: the majority class is broken down into its constituent sub-concepts and a classifier is induced over each [100]. Modern machine learning research, however, is concerned with data streams, where potentially infinite amounts of data arrive quickly and need to be processed as they arrive. In these cases it is not possible to store all of the instances in memory, nor is it practical to wait until “the end of the data stream” before learning. An example is network intrusion detection: detecting an attack on the computer network should occur as soon as practicable. Many one-class classifiers for data streams have been described in the literature, e.g. [33, 108], and it is worth investigating whether the approach of learning in the context of concepts can be successfully applied to the OCC task for data streams as well. This thesis identifies that the idea of breaking the majority class into sub-concepts to simplify the OCC problem has been demonstrated for static data sets [100] but has not been applied to data streams. The primary contribution to the literature made by this thesis is the identification of how the majority class's sub-concept structure can be used to improve the classification performance of streaming one-class classifiers while mitigating the challenges posed by the data stream environment. Three frameworks are developed, each using this knowledge to a different degree. These are applied with a selection of streaming one-class classifiers to both synthetic and benchmark data streams, with performance compared to that of the one-class classifier learning independently. These results are analyzed and it is shown that scenarios exist where knowledge of sub-concepts can be used to improve one-class classifier performance.
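A static-data sketch of the central idea follows (the thesis's frameworks add the incremental, memory-bounded machinery that data streams require): cluster the majority class into sub-concepts, train one one-class classifier per sub-concept, and accept an instance if any model accepts it. The cluster count and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)
majority = np.vstack([rng.normal(0, 0.3, (100, 2)),   # sub-concept A
                      rng.normal(4, 0.3, (100, 2))])  # sub-concept B

km = KMeans(n_clusters=2, n_init=10).fit(majority)
models = [OneClassSVM(nu=0.05, gamma="scale").fit(majority[km.labels_ == c])
          for c in range(2)]

def is_majority(x):
    # accept if any sub-concept model accepts the instance
    return any(m.predict(x.reshape(1, -1))[0] == 1 for m in models)

print(is_majority(np.array([0.1, 0.0])))  # inside sub-concept A
print(is_majority(np.array([2.0, 2.0])))  # in the gap between sub-concepts
```

A single one-class boundary around both clusters would have to cover the empty gap between them; per-sub-concept models avoid that over-generalization.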
1167

Machine learning and brain imaging in psychosis

Zarogianni, Eleni January 2016
Over the past years, early detection and intervention in schizophrenia have become a major objective in psychiatry. Early intervention strategies are intended to identify and treat psychosis prior to fulfilling diagnostic criteria for the disorder. To this aim, reliable early diagnostic biomarkers are needed in order to identify a high-risk state for psychosis and also predict transition to frank psychosis in those high-risk individuals destined to develop the disorder. Recently, machine learning methods have been successfully applied in the diagnostic classification of schizophrenia and in predicting transition to psychosis at an individual level, based on magnetic resonance imaging (MRI) data and also neurocognitive variables. This work investigates the application of machine learning methods for the early identification of schizophrenia in subjects at high risk for developing the disorder. The dataset used in this work involves data from the Edinburgh High Risk Study (EHRS), which examined individuals at a heightened risk for developing schizophrenia for familial reasons, and the FePsy (Früherkennung von Psychosen) study, conducted in Basel, which involves subjects at a clinical high-risk state for psychosis. The overriding aim of this thesis was to use machine learning, and specifically Support Vector Machine (SVM), in order to identify predictors of transition to psychosis in high-risk individuals, using baseline structural MRI data. There are three aims pertaining to this main one. (i) Firstly, our aim was to examine the feasibility of distinguishing at baseline those individuals who later developed schizophrenia from those who, despite having psychotic symptoms, did not, using SVM and baseline data from the EHRS study. (ii) Secondly, we intended to examine if our classification approach could generalize to clinical high-risk cohorts, using neuroanatomical data from the FePsy study. (iii) In a more exploratory context, we have also examined the diagnostic performance of our classifier by pooling the two datasets together. With regard to the first aim, our findings suggest that the early prediction of schizophrenia is feasible using an MRI-based linear SVM classifier operating at the single-subject level. Additionally, we have shown that the combination of baseline neuroanatomical data with measures of neurocognitive functioning and schizotypal cognition can improve predictive performance. The application of our pattern classification approach to baseline structural MRI data from the FePsy study largely replicated our previous findings. Our classification method identified spatially distributed networks that discriminate at baseline between subjects that later developed schizophrenia and other related psychoses and those that did not. Finally, a preliminary classification analysis using pooled datasets from the EHRS and the FePsy study supports the existence of a neuroanatomical pattern that differentiates between groups of high-risk subjects that develop psychosis and those who do not, across research sites and despite any between-site differences. Taken together, our findings suggest that machine learning is capable of distinguishing between cohorts of high-risk subjects that later convert to psychosis and those that do not, based on patterns of structural abnormalities that are present before disease onset.
Our findings have some clinical implications in that machine learning-based approaches could advise or complement clinical decision-making in early intervention strategies in schizophrenia and related psychoses. Future work will, however, be required to tackle issues of reproducibility of early diagnostic biomarkers across research sites, where different assessment criteria, imaging equipment and protocols are used. In addition, future projects may also examine the diagnostic and prognostic value of multimodal neuroimaging data, possibly combined with other clinical, neurocognitive and genetic information.
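A hedged sketch of the core classification step, with a synthetic stand-in for the neuroanatomical data: a linear SVM over vectorized baseline structural MRI features, evaluated with cross-validation at the single-subject level.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 5000))       # 60 subjects x vectorized voxel features
y = np.array([0] * 30 + [1] * 30)     # 1 = later transitioned to psychosis
X[y == 1, :50] += 0.4                 # toy group difference in a few voxels

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```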
1168

A FEATURES EXTRACTION WRAPPER METHOD FOR NEURAL NETWORKS WITH APPLICATION TO DATA MINING AND MACHINE LEARNING

MIGDADY, HAZEM MOH'D 01 May 2013
This dissertation presents a novel feature selection wrapper method based on neural networks, named the Binary Wrapper for Features Selection Technique. The major aim of this method is to reduce the computation time consumed during feature selection and classifier optimization in the Heuristic for Features Selection (HVS) method. The HVS technique is a neural network-based feature selection technique that uses the weights of a well-trained neural network as a relevance index for each input feature with respect to the target. The HVS technique consumes a long computation time because it follows a sequential approach to discard irrelevant, low-relevance, and redundant features, discarding only a single feature at each training session of the classifier. In order to reduce the computation time of the HVS technique, a threshold was produced and used to implement the feature selection process. In this dissertation, a new technique, named the replacement technique, was designed and implemented to produce an appropriate threshold that can be used to discard a group of features instead of a single feature only, as is currently the case with the HVS technique. Since the distribution of the candidate features (i.e. relevant, low-relevance, redundant and irrelevant features) with respect to the target in a dataset is unknown, the replacement technique produces low-relevance features (i.e. probes) to generate a low-relevance threshold that is compared to the candidate features and used to detect low-relevance, irrelevant and redundant features. Moreover, the replacement technique overcomes a limitation of a similar technique known as the random shuffling technique. The goal of the random shuffling technique is to produce low-relevance features (i.e. probes) in comparison with the relevance of the candidate features with respect to the target. However, the random shuffling technique is not guaranteed to produce such features, whereas the replacement technique is. The binary wrapper for features selection technique was evaluated in a number of experiments using three different datasets: Congressional Voting Records, Wave Forms, and Multiple Features, with 16, 40, and 649 features respectively. The results of those experiments were compared to the results of the HVS method and other similar methods to evaluate the performance of the binary wrapper for features selection technique. The technique showed a critical improvement in the time consumed for feature selection and classifier optimization, since the computation time consumed by this method was far less than that consumed by the HVS method and other methods. The binary wrapper technique was able to save 0.889, 0.931, and 0.993 of the time consumed by the HVS method while producing identical results over the three datasets. This implies that the amount of computation time saved by the binary wrapper technique, in comparison with the HVS method, increases as the number of features in a dataset increases.
Regarding classification accuracy, the results showed that the binary wrapper technique was able to enhance the classification accuracy after discarding features, which is an advantage over the HVS method, which did not enhance the classification accuracy after discarding features.
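A simplified sketch of the probe-threshold idea follows. The probes here are plain noise appended to the inputs, which is only a stand-in; the dissertation's replacement technique constructs its probes differently to guarantee low relevance. Relevance is scored as the summed magnitude of each input's first-layer weights, a crude HVS-style index, and every candidate scoring below the strongest probe is discarded in one pass.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only features 0 and 1 are relevant
probes = rng.normal(size=(400, 3))       # stand-in low-relevance probes
X_aug = np.hstack([X, probes])

net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000).fit(X_aug, y)
relevance = np.abs(net.coefs_[0]).sum(axis=1)  # one score per input feature
threshold = relevance[-3:].max()               # strongest probe sets the bar
keep = np.where(relevance[:-3] > threshold)[0] # discard a group in one pass
print("retained features:", keep)
```

Discarding everything below the probe threshold at once, rather than one feature per retraining session, is what saves the repeated training passes described above.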
1169

MACHINE LEARNING ON BIG DATA FOR STOCK MARKET PREDICTION

Fallahi, Faraz 01 August 2017
In recent decades, the rapid development of information technology in the big data field has introduced new opportunities to explore the large amounts of data available online. The Global Database of Events, Language, and Tone (GDELT) is the largest, most comprehensive, and highest-resolution open-source database of human society. It includes more than 440 million entries capturing information about events that have been covered by local, national, and international news sources since 1979 in over 100 languages. GDELT constructs a catalog of human societal-scale behavior and beliefs across all countries of the world, connecting every person, organization, location, count, theme, news source, and event across the planet into a single massive network that captures what is happening around the world, what its context is, who is involved, and how the world is feeling about it, every single day. Stock market prediction, meanwhile, has long been an attractive topic extensively studied by researchers in different fields, with numerous studies of the correlation between stock market fluctuations and data sources derived from the historical data of major world stock indices or external information from social media and news. Support Vector Machine (SVM) and Logistic Regression are two of the most widely used machine learning techniques in recent studies. The main objective of this research project is to investigate the worth of information derived from the GDELT project in improving the accuracy of stock market trend prediction, specifically for the next day's price changes. This research is based on data sets of events from the GDELT database and daily prices of Bitcoin and some other stock market companies and indices from Yahoo Finance, all from March 2015 to May 2017. Multiple machine learning algorithms, specifically classification algorithms, are then applied to the generated data sets, first using only features derived from historical market prices and then including additional features derived from external sources, in this case GDELT. The performance of each model is then evaluated over a range of parameters. Finally, experimental results show that using information gained from GDELT has a direct positive impact on prediction accuracy. Keywords: Machine Learning, Stock Market, GDELT, Big Data, Data Mining
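The experimental setup can be sketched with synthetic stand-ins for the two data sources: daily price-derived features joined with daily GDELT-derived features (average tone and event count are assumed feature choices, not the thesis's exact set), a chronological train/test split, and an SVM predicting the next day's direction.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

rng = np.random.default_rng(6)
dates = pd.date_range("2015-03-01", periods=200, freq="D")
df = pd.DataFrame({
    "date": dates,
    "close": 100 + np.cumsum(rng.normal(0, 1, 200)),  # toy price series
    "avg_tone": rng.normal(0, 1, 200),                # toy GDELT tone feature
    "n_events": rng.poisson(50, 200),                 # toy GDELT event count
})
df["return"] = df["close"].pct_change()
df["up_next"] = (df["close"].shift(-1) > df["close"]).astype(int)
df = df.dropna()

X = df[["return", "avg_tone", "n_events"]]
y = df["up_next"]
split = int(len(df) * 0.8)                            # chronological split
clf = SVC(kernel="rbf", gamma="scale").fit(X.iloc[:split], y.iloc[:split])
print("test accuracy:", clf.score(X.iloc[split:], y.iloc[split:]))
```

Comparing this model's accuracy against the same pipeline with the GDELT columns dropped mirrors the with/without comparison the abstract describes.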
1170

A Twitter-Based Prediction Tool for Digital Currency

McCoy, Mason Eugene 01 May 2018
Digital currencies (cryptocurrencies) are rapidly becoming commonplace in the global market. Trading is performed similarly to the stock market or commodities, but stock market prediction algorithms are not necessarily well suited to predicting digital currency prices. In this work, we analyzed tweets with both an existing sentiment analysis package and a manually tailored "objective analysis," resulting in one impact value per analysis per 15-minute period. We then used evolutionary techniques to select the most appropriate training method, the best subset of the generated features to include, and other parameters. This resulted in the implementation of predictors which yielded much more profit in four-week simulations than simply holding a digital currency for the same period: the results ranged from 28% to 122% profit. Unlike stock exchanges, which shut down for several hours or days at a time, digital currency prediction and trading seems to be of a more consistent and predictable nature.
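A minimal sketch of the feature pipeline, with a toy lexicon standing in for the off-the-shelf sentiment package and the tailored "objective analysis": each tweet is scored, and the scores are aggregated into one impact value per 15-minute period, ready to feed a predictor.

```python
import pandas as pd

# toy sentiment lexicon; a real system would use an existing package
LEXICON = {"moon": 1.0, "bullish": 1.0, "hodl": 0.5,
           "crash": -1.0, "scam": -1.0, "dump": -0.5}

def score(text):
    return sum(LEXICON.get(w, 0.0) for w in text.lower().split())

tweets = pd.DataFrame({
    "time": pd.to_datetime(["2018-01-01 00:03", "2018-01-01 00:09",
                            "2018-01-01 00:21", "2018-01-01 00:40"]),
    "text": ["to the moon", "total scam dump", "hodl bullish", "crash incoming"],
})
tweets["impact"] = tweets["text"].map(score)
# one impact value per 15-minute period, as described above
per_period = tweets.set_index("time")["impact"].resample("15min").mean()
print(per_period)
```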
