61

Amélioration du système de recueils d'information de l'entreprise Semantic Group Company grâce à la constitution de ressources sémantiques / Improvement of the information system of the Semantic Group Company through the creation of semantic resources

Yahaya Alassan, Mahaman Sanoussi 05 October 2017 (has links)
Taking the semantic aspect of textual data into account during classification has become a real challenge over the last ten years. This difficulty is compounded by the fact that most of the data available on social networks are short texts, which makes methods based on the "bag of words" representation inefficient. The approach proposed in this research project differs from previous work on the enrichment of short messages for three reasons. First, we do not use external knowledge bases such as Wikipedia, because the short messages processed by the company typically come from specific domains. Second, because of how the tool operates, the data to be processed are not used to build the resources. Third, to our knowledge, no previous work both uses structured data such as the company's to build semantic resources and measures the impact of enrichment on an interactive text-stream clustering system. In this thesis, we propose building resources for enriching short messages in order to improve the performance of the semantic clustering tool of the company Succeed Together. The tool implements supervised and unsupervised classification methods. To build these resources, we use sequential data mining techniques.
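The enrichment idea described above can be sketched in miniature. This is an illustrative toy only: it mines frequent word pairs from a small in-domain corpus as a simplified stand-in for the sequential pattern mining used in the thesis, and the corpus, threshold, and `enrich` helper are all hypothetical.

```python
# Hedged sketch: enrich short messages with terms that frequently
# co-occur in an in-domain corpus. A simplified stand-in for the
# sequential pattern mining described in the thesis; the corpus and
# the frequency threshold (>= 2) are illustrative assumptions.
from collections import Counter
from itertools import combinations

corpus = [
    "engine oil leak repair",
    "engine oil change service",
    "brake pad replacement",
    "brake pad wear inspection",
]

# Count ordered word pairs within each document
pair_counts = Counter()
for doc in corpus:
    pair_counts.update(combinations(doc.split(), 2))

# Keep pairs seen at least twice as enrichment rules: word -> companion
rules = {a: b for (a, b), n in pair_counts.items() if n >= 2}

def enrich(message: str) -> str:
    """Append companion terms of frequent pairs to a short message."""
    words = message.split()
    extra = [rules[w] for w in words if w in rules and rules[w] not in words]
    return " ".join(words + extra)

print(enrich("engine noise"))  # -> engine noise oil
```

The enriched message carries extra in-domain context ("oil"), which is exactly the signal a downstream clustering or classification step can exploit on otherwise sparse short texts.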
62

Reprezentace textu a její vliv na kategorizaci / Representation of Text and Its Influence on Categorization

Šabatka, Ondřej January 2010 (has links)
The thesis deals with machine processing of textual data. The theoretical part describes issues in natural language processing and introduces different ways of pre-processing and representing text. The thesis also focuses on the use of N-grams as features for document representation and describes some algorithms used for their extraction. The next part outlines the classification methods used. In the practical part, an application for pre-processing and creating different representations of textual data is designed and implemented. In the experiments performed, the influence of these representations on the accuracy of classification algorithms is analysed.
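The N-gram representation discussed above can be illustrated with a minimal sketch. The dataset, labels, and choice of a naive Bayes classifier are illustrative assumptions, not details from the thesis.

```python
# Illustrative sketch: word uni- and bi-grams as document features for
# text classification. Toy documents and labels; the thesis's actual
# data and classifiers may differ.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "stocks fell sharply",
    "markets rallied today",
]
labels = ["animals", "animals", "finance", "finance"]

# ngram_range=(1, 2) extracts both single words and adjacent word pairs
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(docs, labels)
print(model.predict(["the dog sat"]))  # -> ['animals']
```

Switching `CountVectorizer` to `analyzer="char_wb"` would yield character N-grams instead, another representation of the kind the thesis compares.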
63

A GENERAL FRAMEWORK FOR CUSTOMER CONTENT PRINT QUALITY DEFECT DETECTION AND ANALYSIS

Runzhe Zhang (11442742) 11 July 2022 (has links)
<p>Print quality (PQ) is one of the most significant issues with electrophotographic printers. PQ issues have many causes, such as limitations of the electrophotographic process, faulty printer components, or other failures of the print mechanism. These causes produce different PQ defects, such as streaks, bands, gray spots, text fading, and color fading. Analyzing the nature and causes of different print defects makes it possible to repair printers more efficiently and to improve the electrophotographic process. </p> <p><br></p> <p>We design a general framework for print quality defect detection and analysis on customer content. The framework takes as input the original digital master image and the corresponding scanned image. It consists of two main modules: image pre-processing, and print-defect feature extraction and classification. The first module, image pre-processing, includes image registration, color calibration, and region of interest (ROI) extraction. The ROI extraction step extracts four kinds of ROI from the digital master image, because different ROIs exhibit different print defects; for example, the symbol ROI includes text fading, while the raster ROI includes color fading. The second module contains detection and analysis algorithms for the different ROI print defects, which are classified by severity using their feature vectors. This module comprises four defect detection methods: uniform color area streak detection, symbol ROI color text fading detection, raster ROI color fading detection using a novel unsupervised clustering method, and raster ROI streak detection. The details of these algorithms are presented in this thesis. </p> <p><br></p> <p>We also present two other print quality projects: print margin skew detection, and print velocity simulation and estimation. The print margin skew detection project uses the Hough line detection algorithm to measure printing margin and skew errors from the scanned image. In the print velocity project, we propose a print velocity simulation tool, design a dedicated print velocity test page, and develop a print velocity estimation algorithm based on dynamic time warping. </p>
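The dynamic time warping step mentioned for velocity estimation can be sketched as follows. The signals are toy data and the quadratic-time implementation is only a minimal illustration of the alignment idea, not the thesis's estimation algorithm.

```python
# Illustrative sketch: dynamic time warping (DTW) distance between a
# reference signal and a "slower" stretched copy, the alignment idea
# behind the print velocity estimation described above. Toy signals.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible alignment moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

ref = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0]  # same shape, slower "velocity"
print(dtw_distance(ref, stretched))  # -> 0.0: DTW absorbs the stretch
```

Because DTW aligns samples non-linearly, a velocity deviation shows up in the warping path (how many samples of one signal map to each sample of the other) rather than in the distance itself.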
64

Monitoring Vehicle Suspension Elements Using Machine Learning Techniques / Tillståndsövervakning av komponenter i fordonsfjädringssystem genom maskininlärningstekniker

Karlsson, Henrik January 2019 (has links)
Condition monitoring (CM) is widely used in industry, and there is a growing interest in applying CM to rail vehicle systems. Condition-based maintenance has the potential to increase system safety and availability while at the same time reducing total maintenance costs. This thesis investigates the feasibility of condition monitoring of suspension components, in this case dampers, in rail vehicles. Different methods are used to detect degradations, ranging from mathematical modelling of the system to purely "knowledge-based" methods that use only large amounts of data to detect patterns on a larger scale. This thesis explores the latter approach, in which acceleration signals are evaluated at several places on the axleboxes, bogie frames and carbody of a rail vehicle simulation model. These signals are picked up close to the dampers monitored in this study, and frequency response functions (FRFs) are computed between axleboxes and bogie frames as well as between bogie frames and the carbody. The idea is that the FRFs change as the condition of the dampers changes, and thus act as fault indicators. The FRFs are then fed to different classification algorithms, which are trained and tested to distinguish between the different damper faults. The thesis further investigates which classification algorithms show promising results for the problem, and which performs best in terms of classification accuracy as well as two other measures. Another aspect explored is the possibility of applying dimensionality reduction to the extracted indicators (features). The thesis also examines how the three performance measures are affected by typical variations in operating conditions for a rail vehicle, such as varying track excitation and carbody mass. The Linear Support Vector Machine classifier using the whole feature space, and the Linear Discriminant Analysis classifier combined with Principal Component Analysis dimensionality reduction of the feature space, both show promising results for the task of correctly classifying upcoming damper degradations.
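The FRF-as-fault-indicator idea above can be sketched in a few lines. The simulated signals, the simple gain change standing in for damper degradation, and the spectral settings are all illustrative assumptions.

```python
# Hedged sketch of the indicator pipeline described above: estimate a
# frequency response function (FRF) between two acceleration signals,
# then use its magnitude as a fault feature. The "healthy" vs
# "degraded" gain change is a toy stand-in for a real damper fault.
import numpy as np
from scipy.signal import csd, welch

rng = np.random.default_rng(0)
fs = 200.0                      # sampling frequency [Hz], illustrative
t = np.arange(0, 10, 1 / fs)

def frf_magnitude(x, y, fs):
    """H1 estimator: cross-spectrum divided by input auto-spectrum."""
    f, Pxy = csd(x, y, fs=fs, nperseg=256)
    _, Pxx = welch(x, fs=fs, nperseg=256)
    return f, np.abs(Pxy / Pxx)

x = rng.standard_normal(t.size)                      # e.g. axlebox signal
y_healthy = 0.5 * x + 0.05 * rng.standard_normal(t.size)
y_faulty = 1.5 * x + 0.05 * rng.standard_normal(t.size)  # amplified response

_, H_healthy = frf_magnitude(x, y_healthy, fs)
_, H_faulty = frf_magnitude(x, y_faulty, fs)
print(H_healthy.mean(), H_faulty.mean())  # the faulty FRF magnitude is larger
```

In the thesis's setting, the FRF magnitudes across frequencies would form the feature vector handed to the SVM or LDA+PCA classifiers.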
65

Machine Learning Driven Simulation in the Automotive Industry

Ram Seshadri, Aravind January 2022 (has links)
The current thesis investigates data-driven simulation decision-making with field-quality consumer data. This is accomplished by outlining the benefits and uses of combining machine learning and simulation reported in the literature, and by identifying barriers to the use of machine learning (ML) in the simulation subsystems of a case study organization. Additionally, an implementation demonstrates how Scania departments can use this technology to analyze their current data and produce results that support exploration of the simulation space and identification of potential design issues, so that preventive measures can be taken during concept development. The findings of the thesis provide an overview of the literature on the relationship between machine learning and simulation technologies, as well as the limitations of using machine learning in simulation systems at large-scale manufacturing organizations. Support vector machines, logistic regression, and random forest classifiers are used to demonstrate one possible use of machine learning in simulation.
66

LB-CNN & HD-OC, DEEP LEARNING ADAPTABLE BINARIZATION TOOLS FOR LARGE SCALE IMAGE CLASSIFICATION

Timothy G Reese (13163115) 28 July 2022 (has links)
<p>The computer vision task of classifying natural images is a primary driving force behind modern AI algorithms. Deep Convolutional Neural Networks (CNNs) demonstrate state-of-the-art performance on large-scale multi-class image classification tasks. However, with their many layers and millions of parameters, these models are considered black-box algorithms, and their decisions are further obscured by a cumbersome multi-class decision process. The literature offers another approach, called class binarization, which determines the multi-class prediction through a sequence of binary decisions. The focus of this dissertation is the integration of the class binarization approach to multi-class classification with deep learning models, such as CNNs, for large-scale image classification problems. Three works address this integration.</p> <p>In the first work, Error Correcting Output Codes (ECOCs) are integrated into CNNs by inserting a latent-binarization layer before the CNN's final classification layer. This approach encapsulates both the encoding and decoding steps of ECOC in a single CNN architecture. EM and Gibbs sampling algorithms are combined with back-propagation to train CNN models with Latent Binarization (LB-CNN). The training process of LB-CNN guides the model to discover hidden relationships similar to the semantic relationships known a priori between the categories. The proposed models and algorithms are applied to several image recognition tasks, producing excellent results.</p> <p>In the second work, Hierarchically Decodeable Output Codes (HD-OCs) are proposed to compactly describe a hierarchical probabilistic binary decision process over the features of a CNN. HD-OCs enforce more homogeneous assignments of categories to the dichotomy labels. A novel concept, average decision depth, quantifies the average number of binary questions needed to classify an input. An HD-OC is trained with a hierarchical log-likelihood loss that is empirically shown to orient the latent feature space toward the hierarchical structure described by the HD-OC. Experiments at several scales of category labels demonstrate strong performance and provide powerful insights into the decision process of the model.</p> <p>In the final work, the literature on enumerative combinatorics and partially ordered sets is used to establish a unifying framework for class binarization methods under the Multivariate Bernoulli family of models. The framework theoretically establishes simple relationships for transitioning between the different binarization approaches. Such relationships provide useful investigative tools for discovering statistical dependencies between large groups of categories, for incorporating taxonomic information, and for enforcing structural model constraints. The unifying framework lays the groundwork for future theoretical and methodological work on the fundamental issues of large-scale multi-class classification.</p>
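The class binarization idea at the heart of the dissertation can be sketched with an off-the-shelf ECOC wrapper. This is a minimal illustration using scikit-learn's `OutputCodeClassifier` and a linear base learner standing in for the CNN; it is not the LB-CNN or HD-OC method itself.

```python
# Illustrative sketch of class binarization via Error Correcting Output
# Codes (ECOC): the multi-class decision is decomposed into a set of
# binary problems. A linear model on iris stands in for the CNN setting.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)

# code_size controls how many binary dichotomies are created per class;
# each dichotomy trains one binary LogisticRegression
ecoc = OutputCodeClassifier(
    estimator=LogisticRegression(max_iter=1000),
    code_size=2.0,
    random_state=0,
)
ecoc.fit(X, y)
print(ecoc.score(X, y))  # training accuracy on iris
```

Prediction decodes the vector of binary outputs to the class whose codeword is nearest, which is the "sequence of binary decisions" view of multi-class classification the dissertation builds on.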
67

Measuring the Utility of Synthetic Data : An Empirical Evaluation of Population Fidelity Measures as Indicators of Synthetic Data Utility in Classification Tasks / Mätning av Användbarheten hos Syntetiska Data : En Empirisk Utvärdering av Population Fidelity mätvärden som Indikatorer på Syntetiska Datas Användbarhet i Klassifikationsuppgifter

Florean, Alexander January 2024 (has links)
In the era of data-driven decision-making and innovation, synthetic data is a promising tool that bridges the need for vast datasets in machine learning (ML) and the imperative of data privacy. By simulating real-world data while preserving privacy, synthetic data generators have become increasingly common instruments in AI and ML development. A key challenge with synthetic data lies in accurately estimating its utility. Population Fidelity (PF) measures, a category of metrics that evaluate how well synthetic data mimics the general distribution of the original data, have shown to be good candidates for this purpose. In this setting, we aim to answer: "How well are different population fidelity measures able to indicate the utility of synthetic data for machine learning based classification models?" We designed a reusable six-step experiment framework to examine the correlation between nine PF measures and the performance of four ML classification models across five datasets. The six steps cover data preparation, training, testing on the original and synthetic datasets, and computation of the PF measures. The study reveals non-linear relationships between the PF measures and synthetic data utility. The general analysis, i.e. the monotonic relationship between each PF measure and performance across all models, yielded at most moderate correlations, with the Cluster measure showing the strongest. In the more granular model-specific analysis, Random Forest showed strong correlations with three PF measures. The findings show that no PF measure exhibits a consistently high correlation across all models that would qualify it as a universal estimator of model performance. This highlights the importance of context-aware application of PF measures and sets the stage for future research to expand the scope, including support for a wider range of data types and the integration of privacy evaluations in synthetic data assessment. Ultimately, this study contributes to the effective and reliable use of synthetic data, particularly in sensitive fields where data quality is vital.
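The core measurement of the study above, the monotonic correlation between a PF score and downstream classifier accuracy, can be sketched as follows. The paired values are illustrative placeholders, not the thesis's results.

```python
# Hedged sketch: Spearman (rank) correlation between a population
# fidelity score and classifier test accuracy, one pair per synthetic
# dataset. The numbers are toy placeholders, not the thesis's data.
from scipy.stats import spearmanr

pf_scores = [0.91, 0.75, 0.60, 0.82, 0.40]   # PF measure per dataset
accuracies = [0.88, 0.79, 0.65, 0.84, 0.52]  # model accuracy per dataset

rho, pvalue = spearmanr(pf_scores, accuracies)
print(rho)  # -> 1.0: the toy pairs are perfectly monotonic
```

Spearman's rho captures exactly the "monotonic relationship" the thesis evaluates: it is insensitive to whether the PF-to-utility mapping is linear, only to whether higher fidelity consistently accompanies higher utility.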
