181

Semi Supervised Learning for Accurate Segmentation of Roughly Labeled Data

Rajan, Rachel 01 September 2020 (has links)
No description available.
182

A Data Analytic Methodology for Materials Informatics

AbuOmar, Osama Yousef 17 May 2014 (has links)
A data analytic materials informatics methodology is proposed by applying different data mining techniques to datasets from a particular domain in order to discover and model patterns, trends, and behavior related to that domain. In essence, it is proposed to develop an information mining tool for vapor-grown carbon nanofiber (VGCNF)/vinyl ester (VE) nanocomposites as a case study. Formulation and processing factors (VGCNF type, use of a dispersing agent, mixing method, and VGCNF weight fraction) and testing temperature were utilized as inputs, and the storage modulus, loss modulus, and tan delta were selected as outputs or responses. The data mining and knowledge discovery algorithms and techniques included self-organizing maps (SOMs) and clustering techniques. SOMs demonstrated that temperature had the most significant effect on the output responses, followed by VGCNF weight fraction. A clustering technique, the fuzzy C-means (FCM) algorithm, was also applied to discover patterns in nanocomposite behavior after using principal component analysis (PCA) as a dimensionality reduction technique. In particular, these techniques were able to separate the nanocomposite specimens into different clusters based on temperature and tan delta features, as well as to place the neat VE specimens in separate clusters. In addition, an artificial neural network (ANN) model was used to explore the VGCNF/VE dataset. The ANN was able to predict/model the VGCNF/VE responses with minimal mean square error (MSE) using the resubstitution and 3-fold cross-validation (CV) techniques. Furthermore, the proposed methodology was employed to acquire new information and mechanical and physical patterns and trends not only about viscoelastic VGCNF/VE nanocomposites, but also about flexural and impact strength properties of VGCNF/VE nanocomposites. Formulation and processing factors (curing environment, use or absence of a dispersing agent, mixing method, VGCNF fiber loading, VGCNF type, high shear mixing time, sonication time) and testing temperature were utilized as inputs, and the true ultimate strength, true yield strength, engineering elastic modulus, engineering ultimate strength, flexural modulus, flexural strength, storage modulus, loss modulus, and tan delta were selected as outputs. This work highlights the significance and utility of data mining and knowledge discovery techniques in the context of materials informatics.
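As a rough illustration of the PCA-plus-ANN step described above, the Python sketch below reduces a feature matrix with PCA and scores a small neural network regressor with 3-fold cross-validation. The data shapes, factor counts, and random data are placeholders, not the thesis's actual dataset or code.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    # Hypothetical data: rows are nanocomposite specimens, columns are
    # formulation/processing factors plus testing temperature; the target
    # stands in for a single response such as storage modulus.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 6))
    y = rng.normal(size=120)

    # Standardize, reduce dimensionality with PCA, then fit a small ANN,
    # scoring with 3-fold cross-validation as the abstract mentions.
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=3),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    )
    scores = cross_val_score(model, X, y, cv=3, scoring="neg_mean_squared_error")
    print("3-fold CV MSE:", -scores.mean())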
183

Automatic evaluation of the effectiveness of communication between software developers - NLP/AI

Haapasaari Lindgren, Marcus, Persson, Jon January 2023 (has links)
Communication is one of the most demanding and important parts of effective software development. Furthermore, the effectiveness of software development communication can be measured with the three collaborative interpersonal problem-solving conversation dimensions: Active Discussion, Creative Conflict, and Conversation Management. Previous work that utilized these dimensions to analyze communication relied on manually labeling the communication, a process that is time-consuming and not applicable to real-time use. In this study, natural language processing and supervised machine learning were investigated for the automatic classification and measurement of collaborative interpersonal problem-solving conversation dimensions in transcribed software development communication. This approach enables the evaluation of communication and provides suggestions to improve software development efficiency. To determine the optimal classification approach, this work examined nine different classifiers. It was determined that the classifier that scored the highest was Random Forest, followed by Decision Tree and SVM. Random Forest achieved accuracy, precision, and recall of up to 93.66%, 93.76%, and 93.63%, respectively, when trained and tested with stratified 10-fold cross-validation.
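A minimal sketch of this kind of pipeline in Python, assuming a handful of invented utterances and dimension labels rather than the study's transcribed data: TF-IDF features feed a Random Forest classifier evaluated with stratified 10-fold cross-validation on accuracy, precision, and recall.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import StratifiedKFold, cross_validate

    # Hypothetical transcribed utterances and their conversation-dimension labels.
    utterances = [
        "What if we cache the results instead?",   # Creative Conflict
        "Let's move on to the next ticket.",       # Conversation Management
        "Can you walk me through this function?",  # Active Discussion
    ] * 20
    labels = ["creative_conflict", "conversation_management", "active_discussion"] * 20

    pipeline = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
    scores = cross_validate(
        pipeline, utterances, labels,
        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
        scoring=["accuracy", "precision_macro", "recall_macro"],
    )
    print(scores["test_accuracy"].mean(),
          scores["test_precision_macro"].mean(),
          scores["test_recall_macro"].mean())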
184

Predicting Airbnb Prices in European Cities Using Machine Learning

Gangarapu, Shalini, Mernedi, Venkata Surya Akash January 2023 (has links)
Background: Machine learning is a field of computer science that focuses on creating models that can predict patterns and relations among data. In this thesis, we use machine learning to predict Airbnb prices in various European cities to help hosts set reasonable prices for their properties. Different supervised machine learning algorithms are used to determine which model provides the highest accuracy so that hosts can set profitable prices for their housing properties. Objectives: The main goal of this thesis is to use machine learning algorithms to assist hosts in setting reasonable rental prices for their properties so that they can keep their properties affordable for renters across Europe and achieve maximum occupancy. Methods: The dataset for Airbnb in European cities is gathered from Kaggle and then pre-processed using techniques like one-hot encoding, label encoding, StandardScaler and principal component analysis. The data set is divided into three parts for training, validation and testing. Next, feature selection is done to determine the most important features that contribute to the pricing, and the dimensionality of the dataset is reduced. Supervised machine learning algorithms are utilized for training. The models are evaluated with reliable performance estimates after tuning the hyperparameters using k-fold cross-validation. Results: The feature_importances_ attribute shows that room capacity, type of room (shared or not), and the country appear in all three algorithms. Although scores vary between algorithms, these are among the top five attributes that influence the target variable. Day, cleanliness rating, and attr index are some other attributes that are among the top five characteristics. Among the chosen learning algorithms, the random forest regressor gave the best regression model with an R2 score of 0.70. The second best is the gradient boosting regressor with an R2 score of 0.32, while SVM gave the lowest score of 0.06. Conclusions: Random forest regressor was the best algorithm for predicting Airbnb prices and helps hosts set reasonable rental prices for their properties, with more accurate pricing for renters across Europe compared to the other chosen models. Contrary to our expectations, SVM performed the worst on this dataset.
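A sketch of the preprocessing and regression steps described above, using scikit-learn. The file name and column names are assumptions rather than the actual Kaggle schema, and the PCA and hyperparameter-tuning steps used in the thesis are omitted for brevity.

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    # Hypothetical file and column names; the real dataset may differ.
    df = pd.read_csv("airbnb_european_cities.csv")
    categorical = ["room_type", "city"]
    numeric = ["person_capacity", "cleanliness_rating", "attr_index", "dist"]
    X, y = df[categorical + numeric], df["price"]

    model = Pipeline([
        ("prep", ColumnTransformer([
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
            ("num", StandardScaler(), numeric),
        ])),
        ("rf", RandomForestRegressor(n_estimators=300, random_state=0)),
    ])

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    model.fit(X_train, y_train)
    print("R2:", r2_score(y_test, model.predict(X_test)))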
185

Time-domain Deep Neural Networks for Speech Separation

Sun, Tao 24 May 2022 (has links)
No description available.
186

Using Instance-Level Meta-Information to Facilitate a More Principled Approach to Machine Learning

Smith, Michael Reed 01 April 2015 (has links) (PDF)
As the capability for capturing and storing data increases and becomes more ubiquitous, an increasing number of organizations are looking to use machine learning techniques as a means of understanding and leveraging their data. However, the success of applying machine learning techniques depends on which learning algorithm is selected, the hyperparameters that are provided to the selected learning algorithm, and the data that is supplied to the learning algorithm. Even among machine learning experts, selecting an appropriate learning algorithm, setting its associated hyperparameters, and preprocessing the data can be a challenging task and is generally left to the expertise of an experienced practitioner, intuition, trial and error, or another heuristic approach. This dissertation proposes a more principled approach to understanding how the learning algorithm, hyperparameters, and data interact with each other, to facilitate a data-driven approach for applying machine learning techniques. Specifically, this dissertation examines the properties of the training data and proposes techniques to integrate this information into the learning process and for preprocessing the training set. It also proposes techniques and tools to address selecting a learning algorithm and setting its hyperparameters. This dissertation comprises a collection of papers that address understanding the data used in machine learning and the relationship between the data, the performance of a learning algorithm, and the learning algorithm's associated hyperparameter settings. Contributions of this dissertation include:
* Instance hardness, which examines how difficult an instance is to classify correctly.
* Hardness measures that characterize properties of why an instance may be misclassified.
* Several techniques for integrating instance hardness into the learning process. These techniques demonstrate the importance of considering each instance individually rather than doing a global optimization which considers all instances equally.
* Large-scale examinations of the investigated techniques, including a large number of examined data sets and learning algorithms. This provides more robust results that are less likely to be affected by noise.
* The Machine Learning Results Repository, a repository for storing the results from machine learning experiments at the instance level (the prediction for each instance is stored). This allows many data set-level measures to be calculated, such as accuracy, precision, or recall. These results can be used to better understand the interaction between the data, learning algorithms, and associated hyperparameters. Further, the repository is designed to be a tool for the community where data can be downloaded and uploaded to follow the development of machine learning algorithms and applications.
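One simple way to approximate the instance-hardness idea is to estimate, for each instance, how much probability a set of learning algorithms assigns to its true class out of sample. The Python sketch below is not the dissertation's exact formulation; it uses three scikit-learn classifiers, cross-validated probabilities, and a toy dataset.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_predict
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.naive_bayes import GaussianNB

    X, y = load_iris(return_X_y=True)
    learners = [LogisticRegression(max_iter=1000),
                RandomForestClassifier(random_state=0),
                GaussianNB()]

    # For each learner, get out-of-sample class probabilities via cross-validation,
    # then record the probability assigned to each instance's true class.
    p_true = np.zeros((len(learners), len(y)))
    for i, clf in enumerate(learners):
        proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")
        p_true[i] = proba[np.arange(len(y)), y]

    # A rough hardness proxy: 1 minus the mean probability of the true class.
    hardness = 1.0 - p_true.mean(axis=0)
    print("Hardest instances:", np.argsort(hardness)[-5:])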
187

Semi-Supervised Learning with Sparse Autoencoders in Automatic Speech Recognition / Semi-övervakad inlärning med glesa autoencoders i automatisk taligenkänning

DHAKA, AKASH KUMAR January 2016 (has links)
This work is aimed at exploring semi-supervised learning techniques to improve the performance of Automatic Speech Recognition systems. Semi-supervised learning takes advantage of unlabeled data in order to improve the quality of the representations extracted from the data. The proposed model is a neural network where the weights are updated by minimizing the weighted sum of a supervised and an unsupervised cost function, simultaneously. These costs are evaluated on the labeled and unlabeled portions of the data set, respectively. The combined cost is optimized through mini-batch stochastic gradient descent via standard backpropagation. The model was tested on a phone classification task on the TIMIT American English data set and on a written digit classification task on the MNIST data set. Our results show that the model outperforms a network trained with standard backpropagation on the labelled material alone. The results are also in line with state-of-the-art graph-based semi-supervised training methods.
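A minimal PyTorch sketch of the combined objective described above: a supervised cross-entropy term on labeled examples plus a weighted unsupervised reconstruction (and crude sparsity) term on unlabeled examples, optimized with mini-batch SGD. The dimensions, the sparsity penalty, and the random stand-in data are assumptions, not the thesis's implementation.

    import torch
    import torch.nn as nn

    # Hypothetical dimensions; real acoustic features and phone classes would
    # come from the TIMIT preprocessing pipeline.
    n_features, n_hidden, n_classes = 39, 128, 48
    encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Sigmoid())
    decoder = nn.Linear(n_hidden, n_features)    # reconstruction head (unsupervised cost)
    classifier = nn.Linear(n_hidden, n_classes)  # phone classification head (supervised cost)

    params = list(encoder.parameters()) + list(decoder.parameters()) + list(classifier.parameters())
    opt = torch.optim.SGD(params, lr=0.01)
    ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()
    alpha, sparsity_weight = 0.5, 1e-3  # weight of the unsupervised term and a simple sparsity penalty

    def train_step(x_labeled, y_labeled, x_unlabeled):
        h_l = encoder(x_labeled)
        h_u = encoder(x_unlabeled)
        supervised = ce(classifier(h_l), y_labeled)      # evaluated on the labeled portion
        reconstruction = mse(decoder(h_u), x_unlabeled)  # evaluated on the unlabeled portion
        sparsity = h_u.abs().mean()                      # crude sparsity term on activations
        loss = supervised + alpha * (reconstruction + sparsity_weight * sparsity)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    # Example mini-batch of random tensors standing in for real data.
    loss = train_step(torch.randn(32, n_features),
                      torch.randint(0, n_classes, (32,)),
                      torch.randn(32, n_features))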
188

Study of Semi-supervised Deep Learning Methods on Human Activity Recognition Tasks

Song, Shiping January 2019 (has links)
This project focuses on semi-supervised human activity recognition (HAR) tasks, in which the inputs are partly labeled time series data acquired from sensors such as accelerometer data, and the outputs are predefined human activities. Most state-of-the-art work in the HAR area is currently supervised, relying on fully labeled datasets. Since the cost of labeling the collected instances grows quickly with the increasing scale of data, semi-supervised methods are now widely needed. This report proposes two semi-supervised methods and investigates how well they perform on a partly labeled dataset compared to the state-of-the-art supervised method. One of these methods is designed based on the state-of-the-art supervised method, DeepConvLSTM, together with the semi-supervised learning concept of self-training. The other is modified based on a semi-supervised deep learning method, an LSTM initialized by a seq2seq autoencoder, which was first introduced for natural language processing. According to the experiments on a published dataset (the Opportunity Activity Recognition dataset), both of these semi-supervised methods perform better than the state-of-the-art supervised methods.
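The self-training idea can be sketched as follows, with a scikit-learn random forest standing in for the DeepConvLSTM network used in the report: train on the labeled pool, pseudo-label the unlabeled samples the model is most confident about, and retrain.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def self_training(X_labeled, y_labeled, X_unlabeled, threshold=0.95, rounds=5):
        """Iteratively adds high-confidence pseudo-labels to the training set."""
        X_l, y_l = X_labeled.copy(), y_labeled.copy()
        X_u = X_unlabeled.copy()
        clf = RandomForestClassifier(random_state=0)  # stand-in for a deep HAR model
        for _ in range(rounds):
            clf.fit(X_l, y_l)
            if len(X_u) == 0:
                break
            proba = clf.predict_proba(X_u)
            confident = proba.max(axis=1) >= threshold
            if not confident.any():
                break
            # Move confident pseudo-labeled samples into the labeled pool.
            X_l = np.vstack([X_l, X_u[confident]])
            y_l = np.concatenate([y_l, clf.classes_[proba[confident].argmax(axis=1)]])
            X_u = X_u[~confident]
        return clf

    # Expects numpy arrays: flattened sensor windows as rows, activity labels as y.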
189

Comparative Study of the Combined Performance of Learning Algorithms and Preprocessing Techniques for Text Classification

Grancharova, Mila, Jangefalk, Michaela January 2018 (has links)
With the development in the area of machine learning, society has become more dependent on applications that build on machine learning techniques. Despite this, there are extensive classification tasks which are still performed by humans. This is time-consuming and often results in errors. One application of machine learning is text classification, which has been researched extensively over the past twenty years. Text classification tasks can be automated through supervised learning, which can lead to increased performance compared to manual classification. When handling text data, the data often has to be preprocessed in different ways to ensure good classification. Preprocessing techniques have been shown to increase the performance of text classification through supervised learning. Different preprocessing techniques affect the performance differently depending on the choice of learning algorithm and the characteristics of the data set. This thesis investigates how classification accuracy is affected by different learning algorithms and different preprocessing techniques for a specific customer feedback data set. The researched algorithms are Naïve Bayes, Support Vector Machine, and Decision Tree. The research is done through experiments that vary the algorithm and the combinations of preprocessing techniques. The results show that spelling correction and removing stop words increase the accuracy for all classifiers, while stemming lowers the accuracy for all classifiers. Furthermore, Decision Tree was the most positively affected by preprocessing while Support Vector Machine was the most negatively affected. A deeper study of why the preprocessing techniques affected the algorithms in such a way is recommended for future work.
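A small sketch of such an experiment grid in Python, comparing stop-word removal against stemming for Naïve Bayes, SVM, and Decision Tree. The feedback texts and labels are invented placeholders, since the thesis's customer data set is not public, and the spelling-correction step is omitted.

    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # Hypothetical customer-feedback texts and labels.
    texts = ["the delivery was late again", "great support and fast reply",
             "app crashes when I log in", "billing page shows wrong amount"] * 25
    labels = ["delivery", "support", "bug", "billing"] * 25

    stemmer = PorterStemmer()
    def stem_tokens(doc):
        return [stemmer.stem(t) for t in doc.split()]

    # Two preprocessing variants: stop-word removal vs. stemming.
    vectorizers = {
        "stopwords": TfidfVectorizer(stop_words="english"),
        "stemming": TfidfVectorizer(tokenizer=stem_tokens),
    }
    classifiers = {"NB": MultinomialNB(), "SVM": LinearSVC(), "DT": DecisionTreeClassifier()}

    for vname, vec in vectorizers.items():
        for cname, clf in classifiers.items():
            acc = cross_val_score(make_pipeline(vec, clf), texts, labels, cv=5).mean()
            print(vname, cname, round(acc, 3))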
190

Prediction of Optimal Packaging Solution using Supervised Learning Methods / Förutsägelse av optimal förpackningslösning med övervakade inlärningsmodeller

Chari, Anirudh Venkat January 2020 (has links)
This thesis investigates the feasibility of supervised learning models in the decision-making problem of packaging products and predicting an optimal packaging solution. The decision-making problem was broken down into a multi-class classification problem and a regression problem using relevant literature. Supervised learning models from the field of logistics were shortlisted, namely Generalized Linear Models, Support Vector Machines, Random Forest, and Gradient Boosted Trees using CatBoost. The performance of the models was evaluated based on relevant metrics, interpretability, and ease of implementation. The results from this thesis show that the Random Forest model had the best performance on all the aforementioned criteria in both the classification and regression problems.
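A sketch of the classification-plus-regression framing with Random Forest models, using invented product features and targets (box type and filler amount) purely as placeholders for the thesis's packaging data.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, mean_absolute_error

    # Hypothetical product features: e.g., length, width, height, weight, fragility.
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 5))
    box_type = rng.integers(0, 4, size=500)  # multi-class target: packaging type
    filler = X @ rng.uniform(size=5) + rng.normal(scale=0.1, size=500)  # regression target

    X_tr, X_te, c_tr, c_te, r_tr, r_te = train_test_split(
        X, box_type, filler, test_size=0.2, random_state=0)

    clf = RandomForestClassifier(random_state=0).fit(X_tr, c_tr)  # which box to use
    reg = RandomForestRegressor(random_state=0).fit(X_tr, r_tr)   # how much filler material

    print("box-type accuracy:", accuracy_score(c_te, clf.predict(X_te)))
    print("filler MAE:", mean_absolute_error(r_te, reg.predict(X_te)))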
