Global ETD Search

131	A Comparison on Supervised and Semi-Supervised Machine Learning Classifiers for Diabetes Prediction Kola, Lokesh, Muriki, Vigneshwar January 2021 Background: Diabetes is mainly caused by high sugar levels in the blood. There is no permanent cure for diabetes. However, it can be prevented by early diagnosis. In recent years, interest in Machine Learning for disease prediction has been increasing, especially during COVID-19 times. In the present scenario, it is difficult for patients to visit doctors. A possible framework is provided using Machine Learning which can detect diabetes at early stages. Objectives: This thesis aims to identify the critical features that impact gestational (Type-3) diabetes, and experiments are performed to identify the most efficient algorithm for Type-3 diabetes prediction. The selected algorithms are Decision Trees, Random Forest, Support Vector Machine, Gaussian Naive Bayes, Bernoulli Naive Bayes, and Laplacian Support Vector Machine. The algorithms are compared based on their performance. Methods: The method consists of gathering the dataset and preprocessing the data. SelectKBest univariate feature selection was performed to select the important features that influence Type-3 diabetes prediction. A new dataset was created by binning some of the important features from the original dataset, leading to two datasets: non-binned and binned. The original dataset was imbalanced due to the unequal distribution of class labels. A train-test split was performed on both datasets, and an oversampling technique was then applied to both training sets to overcome the imbalance. The selected Machine Learning algorithms were trained, and predictions were made on the test data. Hyperparameter tuning was performed on all algorithms to improve the performance. Predictions were made again on the test data, and accuracy, precision, recall, and F1-score were measured on both binned and non-binned datasets.
Results: Among the selected Machine Learning algorithms, Laplacian Support Vector Machine attained the highest performance, with 89.61% and 86.93% on the non-binned and binned datasets respectively. Hence, it is an efficient algorithm for Type-3 diabetes prediction. The second best algorithm is Random Forest, with 74.5% and 72.72% on the non-binned and binned datasets. The non-binned dataset performed well for the majority of the selected algorithms. Conclusions: Laplacian Support Vector Machine scored the highest performance among the algorithms on both binned and non-binned datasets. The non-binned dataset showed the best performance for almost all Machine Learning algorithms except Bernoulli Naive Bayes. Therefore, the non-binned dataset is more suitable for Type-3 diabetes prediction. Machine Learning Semi-supervised Learning Supervised Learning Diabetes Prediction Engineering and Technology Teknik och teknologier Computer Sciences Datavetenskap (datalogi)
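Illustrative sketch for entry 131: the abstract says oversampling was applied to the training sets only, after the train-test split. The snippet below shows that ordering with plain random oversampling. The abstract does not name the oversampling technique, so random duplication is an assumption, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter

def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices and split rows/labels into train and test portions."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    pick = lambda ids: ([rows[i] for i in ids], [labels[i] for i in ids])
    return pick(train_idx), pick(test_idx)

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy imbalanced data: 16 negatives and 4 positives (made-up values).
rows = [[float(i)] for i in range(20)]
labels = [0] * 16 + [1] * 4

# Split first, then oversample the training portion only,
# so duplicated rows never leak into the test set.
(train_x, train_y), (test_x, test_y) = train_test_split(rows, labels)
bal_x, bal_y = random_oversample(train_x, train_y)
```

Oversampling after the split keeps the test set at its original class distribution, so the reported metrics reflect unseen, unduplicated samples.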
132	Semi-Supervised Learning Algorithm for Large Datasets Using Spark Environment Kacheria, Amar January 2021 No description available. Computer Science Semi-supervised learning Real-Valued Biclustering Minimizing Labelling Cost SSL using Biclustering Distributed biclustering Spark
133	Dynamic Information Density for Image Classification in an Active Learning Framework Morgan, Joshua Edward 01 May 2020 No description available. Computer Science Machine Learning Active Learning Semi-supervised Learning Convex Combination Uncertainty Similarity Acquisition Function Information Density Dynamic Information Density
134	Semi-Supervised Semantic Segmentation for Agricultural Aerial Images Chen-yi Lu (15383813) 01 May 2023 Unmanned Aerial Systems (UAS) have been an essential tool for field scouting, nutrient applications, and farm management. However, assessing the aerial images captured by UAS is labor-intensive, and human assessment can be misleading, introducing bias. Deep learning based image segmentation has been proposed to assist in segmenting different areas of interest in the field, but it usually requires significant pixel-level annotated data. To address this, we propose a semi-supervised learning algorithm, AgSemSeg, to train a robust image segmentation model with less annotated data. Semi-supervised semantic segmentation aims to predict accurate pixel-level segmentation results by incorporating unlabeled images. Existing methods rely on computing the consistency loss on the output predictions between pseudo-labels and unlabeled images. In AgSemSeg, we exploit the intermediate feature representations rather than only using the output predictions to improve the overall performance of the model. Specifically, we add a projection layer on the output of the backbone encoder, and inject a consistency loss between intermediate feature representations using the Sliced-Wasserstein distance. We evaluate AgSemSeg on the Agriculture-Vision dataset, where it outperforms the supervised baseline by up to 9.71%. We also evaluate AgSemSeg on benchmark datasets such as PASCAL VOC 2012 and Cityscapes, where it outperforms supervised baselines by up to 24.6% and 7.5% mIoU, respectively. We also perform extensive ablation studies to show that our proposed components are key to the performance improvements of our method. Computer vision Deep learning Semantic segmentation Wasserstein distance loss Semi-Supervised learning
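Illustrative sketch for entry 134: the abstract mentions a consistency loss between intermediate feature representations based on the Sliced-Wasserstein distance. The snippet below is a minimal Monte Carlo estimate of the squared sliced 2-Wasserstein distance between two equal-sized sets of feature vectors. It is a generic textbook formulation, not the AgSemSeg implementation; the function name, the number of projections, and the toy vectors are assumptions.

```python
import math
import random

def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Estimate the squared sliced 2-Wasserstein distance between two
    equal-sized point sets by averaging 1-D distances over random directions."""
    assert len(xs) == len(ys) and len(xs[0]) == len(ys[0])
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Random direction on the unit sphere (normalised Gaussian vector).
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Project both sets onto the direction and sort; in one dimension
        # the optimal transport plan matches sorted samples in order.
        px = sorted(sum(a * b for a, b in zip(x, d)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, d)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_projections

# Toy "feature vectors": the second set is the first shifted by (1, 1, 1).
feats_a = [[0.0, 0.0, 0.0], [1.0, 0.0, 2.0], [0.5, 1.5, 1.0]]
feats_b = [[v + 1.0 for v in row] for row in feats_a]

same = sliced_wasserstein(feats_a, feats_a)
shifted = sliced_wasserstein(feats_a, feats_b)
```

Identical feature sets give a distance of zero, and the value grows as the two feature distributions drift apart, which is what makes it usable as a consistency penalty.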
135	Analysis of Meso-scale Structures in Weighted Graphs Sardana, Divya January 2017 No description available. Computer Science community structure core periphery structure graph clustering protein protein interaction networks semi supervised clustering overlapping clustering
136	Urban Seismic Event Detection: A Non-Invasive Deep Learning Approach Parth Sagar Hasabnis (18424092) 23 April 2024 As cameras increasingly populate urban environments for surveillance, the threat of data breaches and losses escalates as well. The rapid advancements in generative Artificial Intelligence have greatly simplified the replication of individuals’ appearances from video footage. This capability poses a grave risk as malicious entities can exploit it for various nefarious purposes, including identity theft and tracking individuals’ daily activities to facilitate theft or burglary. To reduce reliance on video surveillance systems, this study introduces Urban Seismic Event Detection (USED), a deep learning-based technique aimed at extracting information about urban seismic events. Our approach involves synthesizing training data through a small batch of manually labelled field data. Additionally, we explore the utilization of unlabeled field data in training through semi-supervised learning, with the implementation of a mean-teacher approach. We also introduce pre-processing and post-processing techniques tailored to seismic data. Subsequently, we evaluate the trained models using synthetic, real, and unlabeled data and compare the results with recent statistical methods. Finally, we discuss the insights gained and the limitations encountered in our approach, while also proposing potential avenues for future research. Seismology and seismic exploration Signal processing Deep learning deep acoustic features seismic event identification semi supervised learning synthetic data generated
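Illustrative sketch for entry 136: the abstract refers to a mean-teacher approach for using unlabeled data. In the usual formulation, the teacher's weights are an exponential moving average (EMA) of the student's weights, and the student is penalised when its predictions on unlabeled input differ from the teacher's. The snippet below shows only those two pieces on plain lists of floats; the function names, the decay value, and the numbers are assumptions, and nothing here reflects the USED code itself.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the matching student weight
    by an exponential moving average."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def consistency_loss(student_out, teacher_out):
    """Mean squared difference between student and teacher predictions
    on the same unlabeled input."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(student_out)

# Toy weights: the teacher starts at zero and tracks the student slowly.
teacher_w = [0.0, 0.0]
student_w = [1.0, 2.0]
teacher_w = ema_update(teacher_w, student_w, decay=0.9)

# Toy predictions on one unlabeled sample (two output scores each).
loss = consistency_loss([1.0, 0.0], [0.0, 0.0])
```

In a training loop, the consistency term would be added to the supervised loss on labelled samples, and `ema_update` would run after every student optimisation step.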
137	Semi-Supervised Deep Learning Approach for Transportation Mode Identification Using GPS Trajectory Data Dabiri, Sina 11 December 2018 Identification of travelers' transportation modes is a fundamental step for various problems that arise in the domain of transportation such as travel demand analysis, transport planning, and traffic management. This thesis aims to identify travelers' transportation modes purely based on their GPS trajectories. First, a segmentation process is developed to partition a user's trip into GPS segments with only one transportation mode. A majority of studies have proposed mode inference models based on hand-crafted features, which might be vulnerable to traffic and environmental conditions. Furthermore, the classification task in almost all models has been performed in a supervised fashion while a large amount of unlabeled GPS trajectories has remained unused. Accordingly, a deep SEmi-Supervised Convolutional Autoencoder (SECA) architecture is proposed to not only automatically extract relevant features from GPS segments but also exploit useful information in unlabeled data. The SECA integrates a convolutional-deconvolutional autoencoder and a convolutional neural network into a unified framework to concurrently perform supervised and unsupervised learning. The two components are simultaneously trained using both labeled and unlabeled GPS segments, which have already been converted into an efficient representation for the convolutional operation. An optimum schedule for varying the balancing parameters between reconstruction and classification errors is also implemented. The performance of the proposed SECA model, trip segmentation, the method for converting a raw trajectory into a new representation, the hyperparameter schedule, and the model configuration are evaluated by comparing to several baselines and alternatives for various amounts of labeled and unlabeled data.
The experimental results demonstrate the superiority of the proposed model over the state-of-the-art semi-supervised and supervised methods with respect to metrics such as accuracy and F-measure. / Master of Science / Identifying users' transportation modes (e.g., bike, bus, train, and car) is a key step towards many transportation related problems including (but not limited to) transport planning, transit demand analysis, auto ownership, and transportation emissions analysis. Traditionally, the information for analyzing travelers' behavior for choosing transport mode(s) was obtained through travel surveys. High cost, low response rate, time-consuming manual data collection, and misreporting are the main demerits of the survey-based approaches. With the rapid growth of ubiquitous GPS-enabled devices (e.g., smartphones), a constant stream of users' trajectory data can be recorded. A user's GPS trajectory is a sequence of GPS points, recorded by means of a GPS-enabled device, in which a GPS point contains the information of the device's geographic location at a particular moment. In this research, users' GPS trajectories, rather than traditional resources, are harnessed to predict their transportation mode by means of statistical models. With respect to the statistical models, a wide range of studies have developed travel mode detection models using hand-designed attributes and classical learning techniques. Nonetheless, hand-crafted features have several shortcomings, including vulnerability to traffic uncertainties and biased engineering justification in generating effective features. A potential solution to address these issues is to leverage deep learning frameworks that are capable of capturing abstract features from the raw input in an automated fashion. Thus, in this thesis, deep learning architectures are exploited in order to identify transport modes based on only raw GPS tracks.
It is worth noting that a significant portion of trajectories in GPS data might not be annotated by a transport mode and the acquisition of labeled data is a more expensive and labor-intensive task in comparison with collecting unlabeled data. Thus, utilizing unlabeled GPS trajectories (i.e., the GPS trajectories that have not been annotated by a transport mode) is a cost-effective approach for improving the prediction quality of the travel mode detection model. Therefore, the unlabeled GPS data are also leveraged by developing a novel deep-learning architecture that is capable of extracting information from both labeled and unlabeled data. The experimental results demonstrate the superiority of the proposed models over the state-of-the-art methods in the literature with respect to several performance metrics. Deep learning semi-supervised learning convolutional neural network convolutional autoencoder GPS trajectory data Machine learning Travel Mode Detection
138	Machine learning for complex evaluation and detection of combustion health of Industrial Gas turbines Mshaleh, Mohammad January 2024 This study addresses the challenge of identifying anomalies within multivariate time series data, focusing specifically on the operational parameters of gas turbine combustion systems. In search of an effective detection method, the research explores the application of three distinct machine learning methods: the Long Short-Term Memory (LSTM) autoencoder, the Self-Organizing Map (SOM), and the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Through the experiment, these models are evaluated to determine their efficacy in anomaly detection. The findings show that the LSTM autoencoder not only surpasses its counterparts in performance metrics but also shows a unique capability to identify the underlying causes of detected anomalies. This paper delves into the comparative analysis of these techniques and discusses the implications of the models in maintaining the reliability and safety of gas turbine operations. Anomaly detection Semi-supervised learning Multivariate time-series Combustion systems Ethical AI Computer Sciences Datavetenskap (datalogi) Computer Engineering Datorteknik
139	Web genre classification using feature selection and semi-supervised learning Chetry, Roshan January 1900 Master of Science / Department of Computing and Information Sciences / Doina Caragea / As web pages continuously change and their number grows exponentially, the need for genre classification of web pages also increases. One simple reason for this is the need to group web pages into various genre categories in order to reduce the complexity of various web tasks (e.g., search). Experts unanimously agree on the huge potential of genre classification of web pages. However, while everybody agrees that genre classification of web pages is necessary, researchers face problems in finding enough labeled data to perform supervised classification of web pages into various genres. The high cost of skilled manual labor, the rapidly changing nature of the web, and the never-ending growth of web pages are the main reasons for the limited amount of labeled data. In contrast, unlabeled data can be acquired relatively inexpensively in comparison to labeled data. This suggests the use of semi-supervised learning approaches for genre classification, instead of supervised approaches. Semi-supervised learning makes use of both labeled and unlabeled data for training - typically a small amount of labeled data and a large amount of unlabeled data. Semi-supervised learning has been extensively used in text classification problems. Given the link structure of the web, for web-page classification one can use link features in addition to the content features that are used for general text classification. Hence, the feature set corresponding to web pages can be easily divided into two views, namely content and link based feature views. Intuitively, the two feature views are conditionally independent given the genre category and have the ability to predict the class on their own.
The scarcity of labeled data, the availability of large amounts of unlabeled data, and a richer set of features as compared to conventional text classification tasks (specifically, complementary and sufficient views of features) have encouraged us to use co-training as a tool to perform semi-supervised learning. During co-training, labeled examples represented using the two views are used to learn distinct classifiers, which keep improving at each iteration by sharing the most confident predictions on the unlabeled data. In this work, we classify web pages from the .eu domain, consisting of 1232 labeled hosts and 20000 unlabeled hosts (provided by the European Archive Foundation [Benczur et al., 2010]), into six different genres using co-training. We compare our results with the results produced by standard supervised methods. We find that co-training can be an effective and cheap alternative to costly supervised learning. This is mainly due to the two independent and complementary feature sets of the web: content based features and link based features. Web genre classification Co-training Semi-supervised learning Feature selection Roshan Chetry Computer Science (0984) Information Technology (0489) Web Studies (0646)
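Illustrative sketch for entry 139: the abstract describes co-training, in which one classifier per feature view labels the unlabeled items it is most confident about and shares them with the other. The snippet below is a toy version with two one-dimensional views and a nearest-centroid classifier per view. The classifier choice, the confidence measure, the function names, and the data are all assumptions; the thesis's actual classifiers and features are not specified here.

```python
def centroid_fit(xs, ys):
    """Fit a 1-D nearest-centroid classifier: one mean value per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def centroid_predict(model, x):
    """Return (label, confidence), where confidence is the distance margin
    between the nearest and second-nearest class centroid."""
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    margin = abs(x - model[ranked[1]]) - abs(x - model[ranked[0]])
    return ranked[0], margin

def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of ((view0, view1), label); unlabeled: list of (view0, view1).
    Each view's classifier in turn labels its most confident pool items,
    which then join the shared labeled set used by both views."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled
            model = centroid_fit([x[view] for x, _ in labeled],
                                 [y for _, y in labeled])
            ranked = sorted(pool,
                            key=lambda x: centroid_predict(model, x[view])[1],
                            reverse=True)
            for x in ranked[:per_round]:
                labeled.append((x, centroid_predict(model, x[view])[0]))
                pool.remove(x)
    return labeled

# Toy data: view0 stands in for a content feature, view1 for a link feature.
seed_labeled = [((0.0, 1.0), "news"), ((10.0, 9.0), "shop")]
pool_items = [(1.0, 0.5), (9.0, 9.5), (2.0, 1.5), (8.0, 10.0)]
final_labeled = co_train(seed_labeled, pool_items)
```

This variant refits each view's classifier immediately after the other view has added its picks; other schedules, such as fitting both classifiers once per round, are equally common.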
140	Méthodes d’apprentissage semi-supervisé basé sur les graphes et détection rapide des nœuds centraux / Graph-based semi-supervised learning methods and quick detection of central nodes Sokol, Marina 29 April 2014 Semi-supervised learning methods constitute a category of machine learning methods that use labelled points together with unlabelled data to tune the classifier. The main idea of semi-supervised methods is based on the assumption that the classification function should change smoothly over a similarity graph. In the first part of the thesis, we propose a generalized optimization approach for graph-based semi-supervised learning which includes as particular cases the Standard Laplacian, Normalized Laplacian and PageRank based methods. Using random walk theory, we provide insights about the differences among the graph-based semi-supervised learning methods and give recommendations for the choice of the kernel parameters and labelled points. We illustrate all theoretical results with the help of synthetic and real data. As one example of real data, we consider classification of content and users in P2P systems. This application demonstrates that the proposed family of methods scales very well with the volume of data. The second part of the thesis is devoted to quick detection of network central nodes. The algorithms developed in the second part of the thesis can be applied to the selection of quality labelled data but also have other applications in information retrieval. Specifically, we propose random walk based algorithms for quick detection of large degree nodes and nodes with large values of Personalized PageRank. Finally, at the end of the thesis, we suggest a new centrality measure, which generalizes both the current flow betweenness centrality and PageRank. This new measure is particularly well suited for detection of network vulnerability. Machine learning Semi-supervised learning PageRank Centrality measures Classification in P2P systems
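Illustrative sketch for entry 140: the abstract lists PageRank based methods among graph-based semi-supervised classifiers. One common way to realise this idea is to run a Personalized PageRank from the labelled nodes of each class and assign every node to the class with the highest score. The snippet below does this by power iteration on an unweighted adjacency list. It is a generic sketch rather than the thesis's generalized formulation; the function names, the damping value, and the toy graph are assumptions.

```python
def personalized_pagerank(adj, seeds, alpha=0.85, iters=100):
    """Power iteration for Personalized PageRank on an undirected graph.
    adj[i] lists the neighbours of node i (every node needs at least one);
    the random walk restarts uniformly on the seed nodes with probability 1 - alpha."""
    n = len(adj)
    restart = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = restart[:]
    for _ in range(iters):
        nxt = [(1.0 - alpha) * r for r in restart]
        for i, nbrs in enumerate(adj):
            share = alpha * p[i] / len(nbrs)  # node i splits its mass evenly
            for j in nbrs:
                nxt[j] += share
        p = nxt
    return p

def classify(adj, labeled):
    """labeled maps class name -> set of labelled node ids.
    Returns the class with the highest PageRank score for every node."""
    scores = {c: personalized_pagerank(adj, seeds) for c, seeds in labeled.items()}
    return [max(scores, key=lambda c: scores[c][i]) for i in range(len(adj))]

# Toy similarity graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
# One labelled node per class.
predictions = classify(adj, {"a": {0}, "b": {5}})
```

With one labelled node in each triangle, the scores decay across the bridging edge, so each triangle inherits the label of its own seed.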
