51

Spectral and textural analysis of high resolution data for the automatic detection of grape vine diseases / Analyses spectrale et texturale de données haute résolution pour la détection automatique des maladies de la vigne

Al saddik, Hania 04 July 2019 (has links)
La Flavescence dorée est une maladie contagieuse et incurable de la vigne détectable sur les feuilles. Le projet DAMAV (Détection Automatique des MAladies de la Vigne) a été mis en place, avec pour objectif de développer une solution de détection automatisée des maladies de la vigne à l’aide d’un micro-drone. Cet outil doit permettre la recherche des foyers potentiels de la Flavescence dorée, puis plus généralement de toute maladie détectable sur le feuillage à l’aide d’un outil multispectral dédié haute résolution. Dans le cadre de ce projet, cette thèse a pour objectif de participer à la conception et à l’implémentation du système d’acquisition multispectral et de développer les algorithmes de prétraitement d’images basés sur les caractéristiques spectrales et texturales les plus pertinentes reliées à la Flavescence dorée. Plusieurs variétés de vigne ont été considérées telles que des variétés rouges et blanches; de plus, d’autres maladies que ‘Flavescence dorée’ (FD) telles que Esca et ‘Bois noir’ (BN) ont également été testées dans des conditions de production réelles. Le travail de doctorat a été essentiellement réalisé au niveau feuille et a impliqué une étape d’acquisition suivie d’une étape d’analyse des données. La plupart des techniques d'imagerie, même multispectrales, utilisées pour détecter les maladies dans les grandes cultures ou les vignobles, opèrent dans le domaine du visible. Dans DAMAV, il est conseillé que la maladie soit détectée le plus tôt possible. Des informations spectrales sont nécessaires, notamment dans l’infrarouge. Les réflectances des feuilles des plantes peuvent être obtenues des longueurs d'onde les plus courtes aux plus longues. Ces réflectances sont intimement liées aux composants internes des feuilles. 
Cela signifie que la présence d'une maladie peut modifier la structure interne des feuilles et donc altérer sa signature. Un spectromètre a été utilisé sur le terrain pour caractériser les signatures spectrales des feuilles à différents stades de croissance. Afin de déterminer les réflectances optimales pour la détection des maladies (FD, Esca, BN), une nouvelle méthodologie de conception d'indices de maladies basée sur deux techniques de réduction de dimensions, associées à un classifieur, a été mise en place. La première technique de sélection de variables utilise les Algorithmes Génétiques (GA) et la seconde s'appuie sur l'Algorithme de Projections Successives (SPA). Les nouveaux indices de maladies résultants surpassent les indices de végétation traditionnels et GA était en général meilleur que SPA. Les variables finalement choisies peuvent ainsi être mises en oeuvre en tant que filtres dans le capteur MS. Les informations de réflectance étaient satisfaisantes pour la recherche d’infections (plus de 90% de précision pour la meilleure méthode) mais n’étaient pas suffisantes. Ainsi, les images acquises par l’appareil MS peuvent être ensuite traitées par des techniques bas-niveau basées sur le calcul de paramètres de texture puis injectés dans un classifieur. Plusieurs techniques de traitement de texture ont été testées mais uniquement sur des images couleur. Une nouvelle méthode combinant plusieurs paramètres texturaux a été élaborée pour en choisir les meilleurs. Nous avons constaté que les informations texturales pouvaient constituer un moyen complémentaire non seulement pour différencier les feuilles de vigne saines des feuilles infectées (plus de 85% de précision), mais également pour classer le degré d’infestation des maladies (plus de 74% de précision) et pour distinguer entre les maladies (plus de 75% de précision). Ceci conforte l’hypothèse qu’une caméra multispectrale permet la détection et l’identification de maladies de la vigne en plein champ. 
/ ‘Flavescence dorée’ is a contagious and incurable disease present on the vine leaves. The DAMAV project (Automatic detection of Vine Diseases) aims to develop a solution for automated detection of vine diseases using a micro-drone. The goal is to offer a turnkey solution for wine growers. This tool will allow the search for potential foci, and then more generally any type of detectable vine disease on the foliage. To enable this diagnosis, it is proposed to study the foliage using a dedicated high-resolution multispectral camera. The objective of this PhD-thesis in the context of DAMAV is to participate in the design and implementation of a Multi-Spectral (MS) image acquisition system and to develop the image pre-processing algorithms, based on the most relevant spectral and textural characteristics related to ‘Flavescence dorée’. Several grapevine varieties were considered such as red-berried and white-berried ones; furthermore, other diseases than ‘Flavescence dorée’ (FD) such as Esca and ‘Bois noir’ (BN) were also tested under real production conditions. The PhD work was basically performed at a leaf-level scale and involved an acquisition step followed by a data analysis step. Most imaging techniques, even MS, used to detect diseases in field crops or vineyards, operate in the visible electromagnetic radiation range. In DAMAV, it is advised to detect the disease as early as possible. It is therefore necessary to investigate broader information, in particular in the infra-red. Reflectance responses of plant leaves can be obtained from short to long wavelengths. These reflectance signatures describe the internal constituents of leaves. This means that the presence of a disease can modify the internal structure of the leaves and hence cause an alteration of their reflectance signature. A spectrometer is used in our study to characterize reflectance responses of leaves in the field. Several samples at different growth stages were used for the tests. 
To define optimal reflectance features for grapevine disease detection (FD, Esca, BN), a new methodology that designs spectral disease indices based on two dimension reduction techniques, coupled with a classifier, has been developed. The first feature selection technique uses Genetic Algorithms (GA) and the second one relies on the Successive Projection Algorithm (SPA). The new resulting spectral disease indices outperformed traditional vegetation indices and GA performed in general better than SPA. The features finally chosen can thus be implemented as filters in the MS sensor. In general, the reflectance information was satisfying for finding infections (higher than 90% accuracy for the best method) but was not sufficient on its own. Thus, the images acquired with the developed MS device can further be pre-processed by low-level techniques based on the calculation of texture parameters that are then injected into a classifier. Several texture processing techniques have been tested, but only on color images. A method that combines many texture features was elaborated to select the best ones. We found that the combination of optimal textural information could provide a complementary means for not only differentiating healthy from infected grapevine leaves (higher than 85% accuracy), but also for grading the disease severity stages (higher than 73% accuracy) and for discriminating among diseases (higher than 72% accuracy). This is in accordance with the hypothesis that a multispectral camera can enable detection and identification of diseases in grapevine fields.
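The GA-based selection of discriminative spectral bands described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the fitness function here is a simple class-separability score standing in for the classifier accuracy used in the actual work, and the spectra are synthetic.

```python
import random

def fitness(bands, healthy, infected):
    """Separability of a band subset: mean absolute difference between
    class-average reflectances over the selected bands (a stand-in for
    the classifier-based fitness used in the thesis)."""
    if not bands:
        return 0.0
    total = 0.0
    for b in bands:
        mean_h = sum(s[b] for s in healthy) / len(healthy)
        mean_i = sum(s[b] for s in infected) / len(infected)
        total += abs(mean_h - mean_i)
    return total / len(bands)

def ga_select(healthy, infected, n_bands, k=3, pop=20, gens=30, seed=0):
    """Evolve fixed-size subsets of spectral bands with truncation
    selection, set-union recombination and point mutation."""
    rng = random.Random(seed)
    score = lambda ind: fitness(ind, healthy, infected)
    population = [rng.sample(range(n_bands), k) for _ in range(pop)]
    for _ in range(gens):
        survivors = sorted(population, key=score, reverse=True)[:pop // 2]
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            child = set(rng.sample(sorted(set(a) | set(b)), k))
            if rng.random() < 0.2:  # point mutation: swap in a random band
                child.discard(rng.choice(sorted(child)))
                child.add(rng.randrange(n_bands))
            while len(child) < k:   # repair duplicates created by mutation
                child.add(rng.randrange(n_bands))
            children.append(sorted(child))
        population = survivors + children
    return max(population, key=score)
```

With synthetic spectra in which a single band separates the classes, the loop reliably converges on that band; in the thesis the same search would instead score each candidate subset by classifier accuracy on real leaf spectra.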
52

Linked Open Projects: Nachnutzung von Ergebnissen im Semantic Web

Pfeffer, Magnus, Eckert, Kai 28 January 2011 (has links)
Semantic Web and Linked Data are on everyone's lips. After almost a decade of developing the technologies and exploring the possibilities of the Semantic Web, attention is now turning to the data itself, for without data the Semantic Web would be no more than a theoretical construct, almost like the World Wide Web without websites. With their authority files (PND, SWD) and catalogue records, libraries hold a wealth of data that is well suited to populating the Semantic Web; parts of it have already been prepared for the Semantic Web and released for use. The Universitätsbibliothek Mannheim has worked with such data in two different projects, although at the time the data was not yet available as Linked Data. One project dealt with the automatic subject indexing of publications on the basis of abstracts, the other with the automatic classification of publications on the basis of title data. In this contribution we briefly present the results of both projects, but focus mainly on a side aspect that only crystallized in their course: how can the results be made available permanently and usefully for re-use by third parties? To be clear from the outset: neither method can or wants to replace a librarian. The generated data can be used in many ways, but concrete uses, such as loading the data into a union catalogue, are controversial because of the quality of the data and the lack of control over it. Publishing the data as Linked Data in the Semantic Web is an obvious solution: anyone who wants to re-use the results can do so without any existing data holding being compromised. This approach, however, raises new questions, not least how the source data can be identified via URIs when it is not (yet) available as Linked Data. 
Beyond this, publishing the result data calls for further measures that go beyond current Linked Data practice: providing additional information that describes the source of the data and how it came about (provenance information), as well as information that usually goes beyond the underlying metadata schema, such as confidence values in the case of an automatic data-generation process. For this we present approaches based on RDF reification and named graphs, and describe the current developments in this field as discussed, for example, in the Provenance Incubator Group of the W3C and in working groups of the Dublin Core Metadata Initiative.
53

Classifying personal data on contextual information / Klassificering av persondata från kontextuell information

Dath, Carl January 2023 (has links)
In this thesis, a novel approach to classifying personal data is tested. Previous personal data classification models read the personal data before classifying it. This thesis instead investigates an approach that classifies personal data by looking at contextual information frequently available in data sets. The thesis compares the well-researched word embedding methods Word2Vec, Global Vectors for Word Representation (GloVe) and Bidirectional Encoder Representations from Transformers (BERT), used in conjunction with the classification methods Bag-of-Words representation (BOW), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), when solving a personal data classification task. The comparisons are made by extrinsically evaluating the different embeddings' and models' performance in a personal data classification task on a sizable collection of well-labeled datasets belonging to Spotify. The results suggest that the embedded representations of the contextual data capture enough information to classify personal data, both when separating personal from non-personal data and when distinguishing different types of personal data from each other. / I denna uppsats undersöks ett nytt tillvägagångssätt att klassificera personlig data. Tidigare dataklassificeringsmodeller läser datan innan de klassificerar den. I denna uppsats undersöks istället ett tillvägagångssätt där den kontextuella informationen används. Uppsatsen jämför flera väletablerade metoder för 'word embedding' såsom Word2Vec, Global Vectors for Word Representation (GloVe) och Bidirectional Encoder Representations from Transformers (BERT) i kombination med klassificeringsmodellerna Bag-of-Words representation (BOW), Convolutional Neural Networks (CNN) och Long Short-Term Memory (LSTM). Modellerna jämförs genom att evaluera deras förmåga att klassificera olika typer av personlig data baserad på namngivning och beskrivning av dataset. 
Resultaten pekar på att representationerna samt modellerna fångar tillräckligt med information för att kunna klassificera personlig data baserat på den kontextuella information som gavs. Utöver detta antyder resultaten att modellerna även klarar av att urskilja olika typer av personlig data från varandra.
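As a toy illustration of the idea of classifying a dataset from its contextual metadata alone, the sketch below replaces the thesis's embeddings (Word2Vec/GloVe/BERT) and classifiers (BOW/CNN/LSTM) with a plain bag-of-words vector and a nearest-centroid rule; the dataset descriptions are invented examples, not Spotify's.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def bow(text, vocab):
    """Bag-of-words vector over a fixed vocabulary (a toy stand-in for
    the Word2Vec/GloVe/BERT embeddings compared in the thesis)."""
    counts = Counter(tokens(text))
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(description, labelled):
    """Nearest-centroid classification of a dataset description.
    `labelled` maps a class name to example descriptions of that class."""
    vocab = sorted({w for docs in labelled.values() for d in docs for w in tokens(d)})
    centroids = {}
    for cls, docs in labelled.items():
        vecs = [bow(d, vocab) for d in docs]
        centroids[cls] = [sum(col) / len(vecs) for col in zip(*vecs)]
    x = bow(description, vocab)
    return max(centroids, key=lambda c: cosine(x, centroids[c]))
```

The point the sketch makes is the same as the thesis's: the name and description of a dataset, never its contents, carry the signal used for the decision.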
54

Modeling, Detection, and Prevention of Electricity Theft for Enhanced Performance and Security of Power Grid

Depuru, Soma Shekara 24 September 2012 (has links)
No description available.
55

Redes complexas para classificação de dados via conformidade de padrão, caracterização de importância e otimização estrutural / Data classification in complex networks via pattern conformation, data importance and structural optimization

Carneiro, Murillo Guimarães 08 November 2016 (has links)
A classificação é uma tarefa do aprendizado de máquina e mineração de dados, na qual um classificador é treinado sobre um conjunto de dados rotulados de forma que as classes de novos itens de dados possam ser preditas. Tradicionalmente, técnicas de classificação trabalham por definir fronteiras de decisão no espaço de dados considerando os atributos físicos do conjunto de treinamento e uma nova instância é classificada verificando sua posição relativa a tais fronteiras. Essa maneira de realizar a classificação, essencialmente baseada nos atributos físicos dos dados, impossibilita que as técnicas tradicionais sejam capazes de capturar relações semânticas existentes entre os dados, como, por exemplo, a formação de padrão. Por outro lado, o uso de redes complexas tem se apresentado como um caminho promissor para capturar relações espaciais, topológicas e funcionais dos dados, uma vez que a abstração da rede unifica a estrutura, a dinâmica e as funções do sistema representado. Dessa forma, o principal objetivo desta tese é o desenvolvimento de métodos e heurísticas baseadas em teorias de redes complexas para a classificação de dados. As principais contribuições envolvem os conceitos de conformidade de padrão, caracterização de importância e otimização estrutural de redes. Para a conformidade de padrão, onde medidas de redes complexas são usadas para estimar a concordância de um item de teste com a formação de padrão dos dados, é apresentada uma técnica híbrida simples pela qual associações físicas e topológicas são produzidas a partir da mesma rede. Para a caracterização de importância, é apresentada uma técnica que considera a importância individual dos itens de dado para determinar o rótulo de um item de teste. O conceito de importância aqui é definido em termos do PageRank, algoritmo usado na engine de busca do Google para definir a importância de páginas da web. 
Para a otimização estrutural de redes, é apresentado um framework bioinspirado capaz de construir a rede enquanto otimiza uma função de qualidade orientada à tarefa, como, por exemplo, classificação, redução de dimensionalidade, etc. A última investigação apresentada no documento explora a representação baseada em grafo e sua habilidade para detectar classes de distribuições arbitrárias na tarefa de difusão de papéis semânticos. Vários experimentos em bases de dados artificiais e reais, além de comparações com técnicas bastante usadas na literatura, são fornecidos em todas as investigações. Em suma, os resultados obtidos demonstram que as vantagens e novos conceitos propiciados pelo uso de redes se configuram em contribuições relevantes para as áreas de classificação, sistemas de aprendizado e redes complexas. / Data classification is a machine learning and data mining task in which a classifier is trained over a set of labeled data instances in such a way that the labels of new instances can be predicted. Traditionally, classification techniques define decision boundaries in the data space according to the physical features of a training set and a new data item is classified by verifying its relative position to the boundaries. Such kind of classification, which is only based on the physical attributes of the data, makes traditional techniques unable to detect semantic relationships existing among the data, such as pattern formation. On the other hand, recent works have shown the use of complex networks is a promising way to capture spatial, topological and functional relationships of the data, as the network representation unifies structure, dynamics and functions of the networked system. In this thesis, the main objective is the development of methods and heuristics based on complex networks for data classification. The main contributions comprise the concepts of pattern conformation, data importance and network structural optimization. 
For pattern conformation, in which complex networks are employed to estimate the membership of a test item according to the data formation pattern, we present, in this thesis, a simple hybrid technique where physical and topological associations are produced from the same network. For data importance, we present a technique which considers the individual importance of the data items in order to determine the label of a given test item. The concept of importance here is derived from the PageRank formulation, the ranking measure behind Google's search engine used to calculate the importance of webpages. For network structural optimization, we present a bioinspired framework, which is able to build up the network while optimizing a task-oriented quality function such as classification, dimension reduction, etc. The last investigation presented in this thesis exploits the graph representation and its ability to detect classes of arbitrary distributions for the task of semantic role diffusion. In all investigations, a wide range of experiments in artificial and real-world data sets, and many comparisons with well-known and widely used techniques are also presented. In summary, the experimental results reveal that the advantages and new concepts provided by the use of networks represent relevant contributions to the areas of classification, learning systems and complex networks.
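A minimal sketch of the data-importance idea: compute PageRank over a graph of labelled items and assign a test item the class whose neighbors carry the most accumulated importance. The graph construction and the decision rule here are simplified assumptions for illustration, not the exact algorithm of the thesis.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank over an adjacency list {node: [neighbors]}.
    Every node is assumed to have at least one out-neighbor."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        rank = {
            v: (1 - damping) / n
               + damping * sum(rank[u] / len(adj[u]) for u in adj if v in adj[u])
            for v in adj
        }
    return rank

def classify_by_importance(test_neighbors, labels, rank):
    """Assign the class whose neighbors of the test item accumulate
    the largest total PageRank importance."""
    score = {}
    for v in test_neighbors:
        score[labels[v]] = score.get(labels[v], 0.0) + rank[v]
    return max(score, key=score.get)
```

In this reading, a test item connected to a well-connected (important) member of one class and a peripheral member of another is pulled toward the former, which is exactly the kind of relation a purely feature-based classifier cannot see.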
56

模糊資料分類與模式建構探討-以單身人口數及失業率為例 / A study on the fuzzy data classification and model construction - with case study on the population of singles versus unemployment rate

游鈞毅, Yu, Chun Yi Unknown Date (has links)
資料分類的應用在時間數列的分析與預測過程相當重要。而模糊資料近年來更受到重視,其應用的範圍包含:財金、社會、生醫、電機等各個領域。本研究欲運用模糊資料分類法,對區間時間數列的轉折偵測與模式建構做一個深入探討。主要應用平均累加模糊熵(average of the sum of fuzzy entropies), 找出其結構性改變的區間。並針對區間型時間數列進行模式建構診斷與預測。最後我們以單身人口數與失業率為實例做一個詳細的探討。結果顯示,失業率對單身人口數有顯著的影響而孤鸞年的效應並不顯著。 / The application of data classification in time series analysis and forecasting is rather important. Fuzzy data have received much attention recently and can be applied in various fields such as finance, sociology, biomedicine, electrical engineering and so on. This study uses fuzzy data classification to perform an intensive investigation of change-period detection and model construction for interval time series. We mainly apply the average of the sum of fuzzy entropies to find the intervals of structural change. Focusing on the time series of intervals, we then build a model and make predictions based on it. Finally, based on the case study of the population of singles versus the unemployment rate, we discuss this topic thoroughly. The results show that the unemployment rate does significantly correlate with the population of singles, but the "widow's year" effect is not significant.
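The role of fuzzy entropy in locating change periods can be illustrated with a small sketch: once the interval data have been fuzzified into membership grades (a step assumed to happen upstream), the average De Luca-Termini entropy is computed over sliding windows, and high-entropy windows are flagged as candidate structural-change intervals. The window width and threshold below are illustrative choices, not the thesis's.

```python
import math

def fuzzy_entropy(memberships):
    """Average De Luca-Termini fuzzy entropy of membership grades in [0, 1];
    grades near 0.5 are maximally 'fuzzy' and contribute most."""
    h = 0.0
    for m in memberships:
        for p in (m, 1.0 - m):
            if 0.0 < p < 1.0:
                h -= p * math.log(p)
    return h / len(memberships)

def change_windows(grades, width, threshold):
    """Slide a window over the fuzzified series and flag windows whose
    average fuzzy entropy exceeds `threshold` as candidate change periods."""
    flagged = []
    for start in range(len(grades) - width + 1):
        if fuzzy_entropy(grades[start:start + width]) > threshold:
            flagged.append((start, start + width))
    return flagged
```

A series that sits firmly in one regime (grades near 0 or 1) yields low entropy; windows straddling a regime shift pass through ambiguous grades near 0.5 and stand out.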
57

Classification of uncertain data in the framework of belief functions : nearest-neighbor-based and rule-based approaches / Classification des données incertaines dans le cadre des fonctions de croyance : la méthode des k plus proches voisins et la méthode à base de règles

Jiao, Lianmeng 26 October 2015 (has links)
Dans de nombreux problèmes de classification, les données sont intrinsèquement incertaines. Les données d’apprentissage disponibles peuvent être imprécises, incomplètes, ou même peu fiables. En outre, des connaissances spécialisées partielles qui caractérisent le problème de classification peuvent également être disponibles. Ces différents types d’incertitude posent de grands défis pour la conception de classifieurs. La théorie des fonctions de croyance fournit un cadre rigoureux et élégant pour la représentation et la combinaison d’une grande variété d’informations incertaines. Dans cette thèse, nous utilisons cette théorie pour résoudre les problèmes de classification des données incertaines sur la base de deux approches courantes, à savoir, la méthode des k plus proches voisins (kNN) et la méthode à base de règles. Pour la méthode kNN, une préoccupation est que les données d’apprentissage imprécises dans les régions où les classes se chevauchent peuvent affecter ses performances de manière importante. Une méthode d’édition a été développée dans le cadre de la théorie des fonctions de croyance pour modéliser l’information imprécise apportée par les échantillons dans les régions qui se chevauchent. Une autre considération est que, parfois, seul un ensemble de données d’apprentissage incomplet est disponible, auquel cas les performances de la méthode kNN se dégradent considérablement. Motivé par ce problème, nous avons développé une méthode de fusion efficace pour combiner un ensemble de classifieurs kNN couplés utilisant des métriques couplées apprises localement. Pour la méthode à base de règles, afin d’améliorer sa performance dans les applications complexes, nous étendons la méthode traditionnelle dans le cadre des fonctions de croyance. Nous développons un système de classification fondé sur des règles de croyance pour traiter des informations incertaines dans les problèmes de classification complexes. 
En outre, dans certaines applications, en plus de données d’apprentissage, des connaissances expertes peuvent également être disponibles. Nous avons donc développé un système de classification hybride fondé sur des règles de croyance permettant d’utiliser ces deux types d’information pour la classification. / In many classification problems, data are inherently uncertain. The available training data might be imprecise, incomplete, even unreliable. Besides, partial expert knowledge characterizing the classification problem may also be available. These different types of uncertainty bring great challenges to classifier design. The theory of belief functions provides a well-founded and elegant framework to represent and combine a large variety of uncertain information. In this thesis, we use this theory to address the uncertain data classification problems based on two popular approaches, i.e., the k-nearest neighbor rule (kNN) and rule-based classification systems. For the kNN rule, one concern is that the imprecise training data in class-overlapping regions may greatly affect its performance. An evidential editing version of the kNN rule was developed based on the theory of belief functions in order to properly model the imprecise information of samples in overlapping regions. Another consideration is that, sometimes, only an incomplete training data set is available, in which case the performance of the kNN rule degrades dramatically. Motivated by this problem, we designed an evidential fusion scheme for combining a group of pairwise kNN classifiers developed based on locally learned pairwise distance metrics. For rule-based classification systems, in order to improve their performance in complex applications, we extended the traditional fuzzy rule-based classification system in the framework of belief functions and developed a belief rule-based classification system to address uncertain information in complex classification problems. 
Further, considering that in some applications, apart from training data collected by sensors, partial expert knowledge can also be available, a hybrid belief rule-based classification system was developed to make use of these two types of information jointly for classification.
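A compact sketch of an evidential kNN classifier in the spirit of the belief-function approach described above: each neighbor yields a simple mass function (some belief committed to its own class, the remainder to the whole frame of discernment), the k mass functions are fused with Dempster's rule, and the class with the largest combined singleton mass wins. The parameters alpha and gamma are illustrative, not the thesis's calibrated values.

```python
import math

def neighbor_mass(dist, label, classes, alpha=0.95, gamma=1.0):
    """Mass function induced by one neighbor: belief on its label decays
    with squared distance, the remainder goes to the whole frame."""
    s = alpha * math.exp(-gamma * dist * dist)
    return {frozenset([label]): s, frozenset(classes): 1.0 - s}

def dempster(m1, m2):
    """Dempster's rule: conjunctive combination, then conflict normalization."""
    out, conflict = {}, 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:
                out[inter] = out.get(inter, 0.0) + va * vb
            else:
                conflict += va * vb
    return {k: v / (1.0 - conflict) for k, v in out.items()}

def evidential_knn(point, train, k, classes):
    """Fuse the k nearest neighbors' mass functions and return the class
    with the largest combined singleton mass."""
    nearest = sorted((math.dist(point, x), y) for x, y in train)[:k]
    m = {frozenset(classes): 1.0}
    for d, y in nearest:
        m = dempster(m, neighbor_mass(d, y, classes))
    return max(classes, key=lambda c: m.get(frozenset([c]), 0.0))
```

Unlike a plain majority vote, a far-away neighbor here contributes mostly ignorance rather than a full vote, which is what makes the rule robust to imprecise samples near class boundaries.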
59

Predicting the threshold grade for university admission through Machine Learning Classification Models / Förutspå tröskelvärdet för universitetsantagningsbetyg genom klassificeringsmodeller inom maskininlärning

Almawed, Anas, Victorin, Anton January 2023 (has links)
Higher-level education is very important these days, which can create very high admission thresholds for popular programs at certain universities. To know what grade will be needed for admission to a program, a student can look at the thresholds from previous years. We explored whether it was possible to generate accurate predictions of what the future threshold would be. We did this by using well-established machine learning classification models and admission data from 14 years back covering all applicants to the Computer Science and Engineering Program at KTH Royal Institute of Technology. What we found through this work is that the models are good at correctly classifying data from the past, but not able to predict future thresholds in a meaningful way. The models could not make accurate future predictions solely based on the grades of past applicants. / Eftergymnasiala studier är väldigt viktiga numera, vilket kan leda till mycket höga antagningskrav på populära program på vissa universitet och högskolor. För att veta vilket betyg som krävs för att komma in på en utbildning kan studenten titta på gränsen från tidigare år och utifrån det gissa sig till vad gränsen kommer vara kommande år. Vi undersöker om det är möjligt att med hjälp av väletablerade klassificerande maskininlärningsmodeller kunna förutse antagningsgränsen i framtiden. Vi tränar modellerna på data med antagningsstatistik som sträcker sig tillbaka 14 år med alla ansökningar till civilingenjörsprogrammet Datateknik på Kungliga Tekniska Högskolan. Det vi finner genom detta arbete är att modellerna är bra på att korrekt klassificera data från tidigare år, men att de inte på ett meningsfullt sätt kan förutse betygsgränsen kommande år. Modellerna kan inte göra detta endast genom data på betyg från tidigare år.
60

Fault Detection and Identification of Vehicle Starters and Alternators Using Machine Learning Techniques

Seddik, Essam January 2016 (has links)
Artificial Intelligence in Automotive Industry / Cost reduction is one of the main concerns in industry. Companies invest considerably for better performance in end-of-line fault diagnosis systems. A common strategy is to use data obtained from existing instrumentation. This research investigates the challenge of learning from historical data that have already been collected by companies. Machine learning is one of the most common and powerful techniques of artificial intelligence; it can learn from data and identify fault features with no need for human interaction. In this research, labeled sound and vibration measurements are processed into fault signatures for vehicle starter motors and alternators. A fault detection and identification system has been developed to identify fault types for end-of-line testing of motors. However, labels are relatively difficult to obtain, expensive, time-consuming and require experienced humans, while unlabeled samples need less effort to collect. Thus, learning from unlabeled data together with the guidance of a few labels would be a better solution. Furthermore, in this research, learning from unlabeled data with absolutely no human intervention is also implemented and discussed. / Thesis / Master of Applied Science (MASc)
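The "few labels plus many unlabeled samples" idea mentioned above can be sketched as a simple self-training loop: the unlabeled measurement closest to the current labeled set inherits its nearest neighbor's label, provided the distance stays within a confidence radius. This is a generic illustration with made-up one-dimensional feature vectors, not the semi-supervised method used in the thesis.

```python
import math

def nearest_label(x, labelled):
    """Label and distance of the labelled example closest to x."""
    dist, label = min((math.dist(x, xi), yi) for xi, yi in labelled)
    return label, dist

def self_train(labelled, unlabelled, confidence_radius):
    """Greedy self-training: repeatedly pseudo-label the unlabelled sample
    nearest to the labelled set, as long as it lies within the radius."""
    labelled, pool = list(labelled), list(unlabelled)
    progressed = True
    while progressed and pool:
        progressed = False
        best = min(pool, key=lambda x: nearest_label(x, labelled)[1])
        label, dist = nearest_label(best, labelled)
        if dist <= confidence_radius:
            labelled.append((best, label))
            pool.remove(best)
            progressed = True
    return labelled
```

Starting from a couple of hand-labeled fault signatures, pseudo-labels propagate along chains of similar measurements while samples outside the radius stay unlabeled, mirroring the guidance-by-few-labels setting described in the abstract.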
