Global ETD Search

151	Practical Morphological Modeling: Insights from Dialectal Arabic Erdmann, Alexander January 2020 (has links) No description available. Computer Science Linguistics Computational Linguistics Unsupervised Learning Computational Morphology Machine Translation Segmentation Arabic Dialectology Language Complexity Linguistic Typology
152	Bayesian Test Analytics for Document Collections Walker, Daniel David 15 November 2012 (has links) (PDF) Modern document collections are too large to annotate and curate manually. As increasingly large amounts of data become available, historians, librarians and other scholars increasingly need to rely on automated systems to efficiently and accurately analyze the contents of their collections and to find new and interesting patterns therein. Modern techniques in Bayesian text analytics are becoming widespread and have the potential to revolutionize the way that research is conducted. Much work has been done in the document modeling community towards this end, though most of it is focused on modern, relatively clean text data. We present research for improved modeling of document collections that may contain textual noise or that may include real-valued metadata associated with the documents. This class of documents includes many historical document collections. Indeed, our specific motivation for this work is to help improve the modeling of historical documents, which are often noisy and/or have historical context represented by metadata. Many historical documents are digitized by means of Optical Character Recognition (OCR) from document images of old and degraded original documents. Historical documents also often include associated metadata, such as timestamps, which can be incorporated in an analysis of their topical content. Many techniques, such as topic models, have been developed to automatically discover patterns of meaning in large collections of text. While these methods are useful, they can break down in the presence of OCR errors. We show the extent to which this performance breakdown occurs.
The specific types of analyses covered in this dissertation are document clustering, feature selection, unsupervised and supervised topic modeling for documents with and without OCR errors and a new supervised topic model that uses Bayesian nonparametrics to improve the modeling of document metadata. We present results in each of these areas, with an emphasis on studying the effects of noise on the performance of the algorithms and on modeling the metadata associated with the documents. In this research we effectively: improve the state of the art in both document clustering and topic modeling; introduce a useful synthetic dataset for historical document researchers; and present analyses that empirically show how existing algorithms break down in the presence of OCR errors. topic modeling Bayesian nonparametrics ocr text mining text analytics document clustering clustering feature selection unsupervised learning machine learning Computer Sciences
153	Presence detection by means of RF waveform classification Lengdell, Max January 2022 (has links) This master's thesis investigates the possibility of automatically labeling and classifying radio waves for presence detection, where the objective is to obtain information about the number of people in a room based on channel estimates. Labeling data for machine learning is a time-consuming and tedious process. To address this, two approaches are evaluated. One is to develop a framework that generates labels with the aid of computer vision AI. The other relies on unsupervised learning classifiers complemented with heuristics to generate the labels. The investigation also studies the performance of the classifiers as a function of the TX/RX configuration, SNR, number of consecutive samples in a feature vector, bandwidth and frequency band. When someone moves in a room, the propagation environment changes and induces variations in the channel estimates, compared to when the room is empty. These variations are the fundamental concept that is exploited in this thesis. Two methods are suggested to perform classification without the need for training data. The first uses random trees embeddings to construct a random forest without labels, and the second uses statistical bootstrapping with a random forest classifier. The labels used for annotation indicate whether there were zero, one or two people in the room. The performance of binary and non-binary classification is evaluated for the two blind detection models as well as for the unsupervised learning techniques K-means and self-organizing maps. For classification, both supervised and unsupervised learning use random forest classifiers. Results show that random forest classifiers perform well for this kind of problem, and that random trees embeddings are able to extract relational data that could be used for automatic labeling of the data.
presence detection machine learning radio wave classification unsupervised learning supervised learning sensor fusion unsupervised classification Computer Engineering Datorteknik
154	Conditional Noise-Contrastive Estimation : With Application to Natural Image Statistics / Uppskattning via betingat kontrastivt brus Ceylan, Ciwan January 2017 (has links) Unnormalised parametric models are an important class of probabilistic models which are difficult to estimate. The models are important since they occur in many different areas of application, e.g. in modelling of natural images, natural language and associative memory. However, standard maximum likelihood estimation is not applicable to unnormalised models, so alternative methods are required. Noise-contrastive estimation (NCE) has been proposed as an effective estimation method for unnormalised models. The basic idea is to transform the unsupervised estimation problem into a supervised classification problem. The parameters of the unnormalised model are learned by training the model to differentiate the given data samples from generated noise samples. However, the choice of the noise distribution has been left open to the user, and as the performance of the estimation may be sensitive to this choice, it is desirable for it to be automated. In this thesis, the ambiguity in the choice of the noise distribution is addressed by presenting the previously unpublished conditional noise-contrastive estimation (CNCE) method. Like NCE, CNCE estimates unnormalised models by classifying data and noise samples. However, the choice of noise distribution is partly automated via the use of a conditional noise distribution that is dependent on the data. In addition to introducing the core theory for CNCE, the method is empirically validated on data and models where the ground truth is known. Furthermore, CNCE is applied to natural image data to show its applicability in a realistic application.
/ Unnormalised parametric models constitute an important class of statistical models that are difficult to estimate. These models are important because they arise in many different application areas, e.g. in the modelling of images, speech and writing, and associative memory. They are difficult to estimate because the standard maximum likelihood method is not applicable to unnormalised models. Noise-contrastive estimation (NCE) has been proposed as an effective method for estimating unnormalised models. The basic idea is to transform the unsupervised estimation problem into a supervised classification problem. The parameters of the unnormalised model are learned by training the model to distinguish the given data sample from a generated noise sample. However, the choice of noise distribution has been left open to the user. Because the performance of the estimation is sensitive to this choice, it is desirable to automate it. This thesis addresses the choice of noise distribution by presenting the previously unpublished method conditional noise-contrastive estimation (CNCE). Like NCE, CNCE estimates unnormalised models by classifying data and noise samples. In this case, however, the noise distribution is partly automated through the use of a conditional noise distribution that depends on the data sample. In addition to introducing the core theory of CNCE, the method is validated using data and models whose generating parameters are known. Furthermore, CNCE is applied to image data to demonstrate its applicability. noise-contrastive estimation NCE unnormalised models statistical estimation natural image statistics unsupervised learning neural network Computer Sciences Datavetenskap (datalogi)
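The classification idea described in the abstract of record 154 can be illustrated with a minimal sketch of the standard NCE loss for equal numbers of data and noise samples. This is not the thesis's CNCE method; the function names, the toy unnormalised Gaussian model and the parameter `c` are assumptions made purely for illustration.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def nce_loss(data, noise, log_model, log_noise):
    """Negative NCE objective: logistic loss for telling data from noise.

    log_model(x) is the log of the (possibly unnormalised) model,
    log_noise(x) is the log density of the known noise distribution.
    """
    # Data samples should be classified as "data": log sigma(G(x)).
    data_term = sum(log_sigmoid(log_model(x) - log_noise(x)) for x in data) / len(data)
    # Noise samples should be classified as "noise": log(1 - sigma(G(y))) = log sigma(-G(y)).
    noise_term = sum(log_sigmoid(log_noise(y) - log_model(y)) for y in noise) / len(noise)
    return -(data_term + noise_term)


# Hypothetical toy model: unnormalised Gaussian with a learnable log-normaliser c.
def make_log_model(c):
    return lambda x: -0.5 * x * x + c


def log_standard_normal(x):
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)


# Evaluate the loss for one candidate value of c on a handful of made-up samples.
loss = nce_loss([0.2, -0.4, 1.1], [0.9, -1.3, 0.1], make_log_model(-1.0), log_standard_normal)
```

Minimising `nce_loss` over `c` (and any other model parameters) is what turns the unsupervised estimation problem into a supervised one; how well this works depends on the noise distribution, which is the choice the abstract says CNCE partly automates.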
155	Clustering Web Users by Mouse Movement to Detect Bots and Botnet Attacks Morgan, Justin L 01 March 2021 (has links) (PDF) The need for website administrators to efficiently and accurately detect the presence of web bots has proven to be a challenging problem. As the sophistication of modern web bots increases, specifically their ability to more closely mimic the behavior of humans, web bot detection schemes are more quickly becoming obsolete by failing to maintain effectiveness. Though machine learning-based detection schemes have been a successful approach in recent implementations, web bots are able to apply similar machine learning tactics to mimic human users, thus bypassing such detection schemes. This work seeks to address the issue of machine learning-based bots bypassing machine learning-based detection schemes by introducing a novel unsupervised learning approach to cluster users based on behavioral biometrics. The idea is that, by differentiating users based on their behavior, for example how they use the mouse or type on the keyboard, information can be provided for website administrators to make more informed decisions on declaring if a user is a human or a bot. This approach is similar to how modern websites require users to log in before browsing their website; in doing so, website administrators can make informed decisions on declaring if a user is a human or a bot. An added benefit of this approach is that it is a human observational proof (HOP), meaning that it will not inconvenience the user (user friction) with human interactive proofs (HIP) such as CAPTCHA, or with login requirements. Bot Detection Botnet Detection Web Scraping Data Science Statistics Unsupervised Learning Computer Sciences Data Science Information Security Statistics and Probability
156	MENTAL STRESS AND OVERLOAD DETECTION FOR OCCUPATIONAL SAFETY Eskandar, Sahel January 2022 (has links) Stress and overload are strongly associated with unsafe behaviour, which motivated various studies to detect them automatically in workplaces. This study aims to advance safety research by developing a data-driven stress and overload detection method. An unsupervised deep learning-based anomaly detection method is developed to detect stress. The proposed method performs with convolutional neural network encoder-decoder and long short-term memory equipped with an attention layer. Data from a field experiment with 18 participants was used to train and test the developed method. The field experiment was designed to include a pre-defined sequence of activities triggering mental and physical stress, while a wristband biosensor was used to collect physiological signals. The collected contextual and physiological data were pre-processed and then resampled into correlation matrices of 14 features. Correlation matrices are used as an input to the unsupervised Deep Learning (DL) based anomaly detection method. The developed method is validated, offering accuracy and F-measures close to 0.98. The technique employed captures the input data attributes correlation, promoting higher interpretability of the DL method for easier comprehension. Over-reliance on uncertain absolute truth, the need for a high number of training samples, and the requirement of a threshold for detecting anomalies are identified as shortcomings of the proposed method. To overcome these shortcomings, an Adaptive Neuro-Fuzzy Inference System (ANFIS) was designed and developed. While the ANFIS method did not improve the overall accuracy, it outperformed the DL-based method in detecting anomalies precisely. The overall performance of the ANFIS method is better than the DL-based method for the anomalous class, and the method results in lower false alarms. 
However, the DL-based method is suitable for circumstances where false alarms are tolerated. / Dissertation / Doctor of Philosophy (PhD) Worker Safety Data-Driven Health Monitoring Stress and Overload Wearable Sensors Unsupervised Learning Adaptive Neuro-Fuzzy Inference System
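The abstract of record 156 mentions resampling the signals into correlation matrices of features before anomaly detection. A minimal sketch of that preprocessing step is given below, assuming Pearson correlation over one time window; the feature names and values are hypothetical and are not taken from the dissertation.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(window):
    """Build a feature-by-feature correlation matrix for one time window.

    window maps a feature name to its samples within the window.
    Returns the sorted feature names and the square matrix in that order.
    """
    names = sorted(window)
    matrix = [[pearson(window[p], window[q]) for q in names] for p in names]
    return names, matrix


# Hypothetical window with three features (the study uses 14).
names, matrix = correlation_matrix({
    "heart_rate": [60, 62, 64, 66],
    "skin_temp": [30, 31, 32, 33],
    "motion": [4, 3, 2, 1],
})
```

Each such matrix would then serve as one input to the anomaly detector; the window length and the choice of correlation measure are design decisions this sketch does not settle.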
157	Optimized material flow using unsupervised time series clustering : An experimental study on the just in time supermarket for Volvo powertrain production Skövde. Darwish, Amena January 2019 (has links) Machine learning has achieved remarkable performance in many domains, and it now promises to solve manufacturing problems as part of an ongoing trend of using machine learning in industrial applications. Treating the material order demand in manufacturing as time-series sequences makes it possible to apply unsupervised time-series clustering. This study aims to evaluate different time-series clustering approaches, algorithms, and distance measures on material flow data. Three different approaches are evaluated: statistical clustering approaches, raw-based and shape-based approaches, and finally a feature-based approach. The objective is to categorize the materials in the supermarket (an intermediate storage area where materials are stored before the products are assembled) into three different flows according to their time-series properties. The experiments show that the feature-based approach performs best for the data. A feature filter is applied to keep the relevant features, which capture the unique characteristics of the data for the predicted output. In conclusion, the data type and structure, the goal of the clustering task, and the application domain are factors that have to be considered when choosing a suitable clustering approach. Time-series clustering unsupervised learning optimization material flow supermarket replenishment extract relevant features Computer and Information Sciences Data- och informationsvetenskap
158	Exploring Multi-Domain and Multi-Modal Representations for Unsupervised Image-to-Image Translation Liu, Yahui 20 May 2022 (has links) Unsupervised image-to-image translation (UNIT) is a challenging task in the image manipulation field, where input images in a visual domain are mapped into another domain with desired visual patterns (also called styles). An ideal direction in this field is to build a model that can map an input image in a domain to multiple target domains and generate diverse outputs in each target domain, which is termed multi-domain and multi-modal unsupervised image-to-image translation (MMUIT). Recent studies have shown remarkable results in UNIT but they suffer from four main limitations: (1) State-of-the-art UNIT methods are either built from several two-domain mappings that are required to be learned independently or they generate low-diversity results, a phenomenon also known as mode collapse. (2) Most of the manipulation relies on visual maps or digital labels without exploring natural languages, which could be more scalable and flexible in practice. (3) In an MMUIT system, the style latent space is usually disentangled between every two image domains. While interpolations within domains are smooth, interpolations between two different domains often result in unrealistic images with artifacts when interpolating between two randomly sampled style representations from two different domains. Improving the smoothness of the style latent space can lead to gradual interpolations between any two style latent representations, even between any two domains. (4) It is expensive to train MMUIT models from scratch at high resolution. Interpreting the latent space of pre-trained unconditional GANs can achieve good image translations, especially high-quality synthesized images (e.g., 1024x1024 resolution). However, few works explore building an MMUIT system with such pre-trained GANs.
In this thesis, we focus on these vital issues and propose several techniques for building better MMUIT systems. First, we build on the content-style disentangled framework and propose to fit the style latent space with Gaussian Mixture Models (GMMs). This allows a well-trained network using a shared disentangled style latent space to model multi-domain translations. Meanwhile, we can randomly sample different style representations from a Gaussian component or use a reference image for style transfer. Second, we show how the GMM-modeled latent style space can be combined with a language model (e.g., a simple LSTM network) to manipulate multiple styles by using textual commands. Then, we not only propose easy-to-use constraints to improve the smoothness of the style latent space in MMUIT models, but also design a novel metric to quantitatively evaluate the smoothness of the style latent space. Finally, we build a new model that uses pre-trained unconditional GANs to perform MMUIT tasks.
159	Readability: Man and Machine : Using readability metrics to predict results from unsupervised sentiment analysis / Läsbarhet: Människa och maskin : Användning av läsbarhetsmått för att förutsäga resultaten från oövervakad sentimentanalys Larsson, Martin, Ljungberg, Samuel January 2021 (has links) Readability metrics assess the ease with which human beings read and understand written texts. With the advent of machine learning techniques that allow computers to also analyse text, this provides an interesting opportunity to investigate whether readability metrics can be used to inform on the ease with which machines understand texts. To that end, the specific machine analysed in this paper uses word embeddings to conduct unsupervised sentiment analysis. This specification minimises the need for labelling and human intervention, thus relying heavily on the machine instead of the human. Across two different datasets, sentiment predictions are made using Google’s Word2Vec word embedding algorithm, and are evaluated to produce a dichotomous output variable per sentiment. This variable, representing whether a prediction is correct or not, is then used as the dependent variable in a logistic regression with 17 readability metrics as independent variables. The resulting model has high explanatory power and the effects of readability metrics on the results from the sentiment analysis are mostly statistically significant. However, metrics affect sentiment classification in the two datasets differently, indicating that the metrics are expressions of linguistic behaviour unique to the datasets. The implication of the findings is that readability metrics could be used directly in sentiment classification models to improve modelling accuracy. Moreover, the results also indicate that machines are able to pick up on information that human beings do not pick up on, for instance that certain words are associated with more positive or negative sentiments. 
/ Readability metrics assess how easy or difficult it is for humans to read and understand written texts. Since new machine learning techniques have been developed, computers can now analyse texts as well. An interesting angle is therefore whether readability metrics can also be used to assess how easy or difficult it is for machines to understand texts. Against this background, the specific machine in this thesis uses word embeddings to perform unsupervised sentiment analysis. The need for labelling and human intervention is thus minimised, which results in a more in-depth analysis of the machine rather than the human. In two different datasets, the correct answers are compared with sentiment predictions from Google's word embedding algorithm Word2Vec to produce a binary output variable per sentiment. This variable, which represents whether a prediction is correct or not, is then used as the dependent variable in a logistic regression with 17 different readability metrics as independent variables. The resulting model has high explanatory power, and the effects of the readability metrics on the results of the sentiment analysis are mostly statistically significant. However, the effect on classification depends on the dataset, which indicates that the readability metrics express different linguistic behaviours that are unique to the datasets. The implication of the results is that readability metrics can be used directly in models that perform sentiment analysis to improve their predictive ability. In addition, the results indicate that machines can pick up on information that humans cannot, for example that certain words are associated with positive or negative sentiments. Natural language processing Unsupervised learning Sentiment analysis Word embeddings Readability Språkteknologi Oövervakad inlärning Sentimentanalys Ordinbäddningar Läsbarhet Computer Sciences Datavetenskap (datalogi)
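The regression setup described in record 159 (a binary correct/incorrect outcome regressed on readability metrics) can be sketched as follows. This is a minimal gradient-descent logistic regression with a single made-up readability score; the thesis uses 17 metrics, and the data, learning rate and function names here are assumptions for illustration only.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability that the sentiment prediction for a text is correct."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Hypothetical data: one readability score per text (higher = harder to read),
# and whether the sentiment prediction for that text was correct (1) or not (0).
scores = [[0.0], [1.0], [2.0], [3.0]]
correct = [1, 1, 0, 0]
weights, bias = fit_logistic(scores, correct)
```

In this toy data the fitted weight is negative, meaning harder texts are less likely to be classified correctly; the abstract reports that the direction of such effects differed between the two datasets.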
160	Product Matching through Multimodal Image and Text Combined Similarity Matching / Produktmatchning Genom Multimodal Kombinerad Bild- och Textlikhetsmatchning Ko, E Soon January 2021 (has links) Product matching in e-commerce is an area that faces more and more challenges with growth in the e-commerce marketplace as well as variation in the quality of data available online for each product. Product matching for e-commerce provides competitive possibilities for vendors and flexibility for customers by identifying identical products from different sources. Traditional product matching is often rule-based, and methods that tackle the issue through machine learning usually do so with unimodal systems. Moreover, existing methods rely on product identifiers, which are not always unified for each product. This thesis presents multimodal approaches to product matching, based on product name, description, and image, that outperform unimodal approaches. Three multimodal approaches were taken, one unsupervised and two supervised. The unsupervised approach performs a straightforward nearest neighbor search in the embedding space, which provides better results than unimodal approaches. One of the supervised multimodal approaches uses a Siamese network on the embedding space, which outperforms the unsupervised multimodal approach. Finally, the last supervised approach instead tackles the issue by exploiting distance differences in each modality through logistic regression and a decision system, which provided the best results. / Product matching in e-commerce is an area that faces more and more challenges, given the growth that the e-commerce market has undergone and is still undergoing, as well as variation in the quality of the data available online for each product.
Product matching in e-commerce provides competitive opportunities for vendors and flexibility for customers by identifying identical products from different sources. Traditional product matching has mostly been carried out with rule-based methods, and methods that make use of machine learning usually do so through unimodal systems. Moreover, most existing methods use product identifiers, which are not always consistent for each product across different sources. This study instead proposes multimodal approaches that use product name, product description, and product image for product matching problems, which gives better results than unimodal methods. Three multimodal approaches were taken, one unsupervised and two supervised. The unsupervised method uses the embedding vectors directly to perform a nearest neighbor search, which gave better results than unimodal approaches. One of the supervised multimodal approaches uses Siamese networks on the embedding space, which gave results that surpassed the unsupervised multimodal approach. Finally, the last supervised method instead uses distance differences in each modality through logistic regression and a decision system, which gave the best results. Multimodal Machine Learning Product Matching Similarity Matching Supervised Learning Unsupervised Learning Siamese network Computer and Information Sciences Data- och informationsvetenskap
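The unsupervised approach in record 160 is described as a nearest neighbor search in the embedding space. A minimal sketch of one way this could look is given below, assuming the text and image embeddings are L2-normalised, concatenated, and compared by cosine similarity. The fusion by concatenation, the tiny vectors and all function names are assumptions, not details from the thesis.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec):
    # Normalise each modality separately so neither dominates, then concatenate.
    return l2_normalize(text_vec) + l2_normalize(image_vec)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_product(query, catalog):
    """Return (index, similarity) of the catalog entry most similar to the query."""
    scores = [cosine(query, item) for item in catalog]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical 2-d text and image embeddings for two catalog products and one query.
catalog = [fuse([1.0, 0.0], [0.0, 1.0]), fuse([0.0, 1.0], [1.0, 0.0])]
query = fuse([0.9, 0.1], [0.1, 0.9])
index, similarity = nearest_product(query, catalog)
```

A brute-force scan like this is only practical for small catalogs; the supervised variants in the abstract (Siamese network, per-modality distances fed to logistic regression) replace the fixed cosine score with a learned one.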
