Global ETD Search

41	GEOFIER: um sistema de anotação geográfica de textos com o uso de classificadores de aprendizagem de máquina / GEOFIER: a geotagging system based on machine learning text classifiers. Maçan, Eduardo Marcel. 13 August 2015.
Automatic text geotagging is the process by which mentions of place names and their positions in a text are identified and recorded as metadata, so that specialized applications such as search engines can use this information. By analysing the toponyms a document mentions, it is possible to identify the spatial context of its subject and to group documents that refer to the same context, effectively assigning each document a geographic scope. This master's dissertation presents Geofier, a new method for determining the geographic scope of documents. The novelty of Geofier is that it uses machine learning text classifiers trained without a gazetteer and without assumptions about the language in which the documents are written. Wikipedia was used as the source of a geotagged document set for training a hierarchy of Naive Bayes and Support Vector Machine (SVM) classifiers. Geofier was compared with a reimplementation of the Web-a-Where system on the task of determining the geographic scope of Wikipedia texts. The Geofier hierarchy was trained and evaluated in two ways: first using toponyms from the same gazetteer as Web-a-Where, and then using n-grams extracted from the training documents as attributes. Geofier performed significantly better than the Web-a-Where reimplementation.
Artificial intelligence; Automatic text classifiers; Data mining; Gazetteers; Geotagging; Hierarchy of text classifiers; Machine learning; Toponym ambiguity; Toponymy
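The gazetteer-free, language-agnostic idea described in this abstract can be illustrated with a small sketch. This is not the Geofier implementation: it is a single flat Naive Bayes classifier (the dissertation trains a hierarchy and also uses SVMs), it assumes character trigrams as the n-gram attributes, and the class name `NaiveBayesScope`, the helper `char_ngrams`, and the two training sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Lower-cased character n-grams; no tokenizer or gazetteer needed."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesScope:
    """Multinomial Naive Bayes over character n-grams (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.class_docs = Counter()               # documents seen per label
        self.ngram_counts = defaultdict(Counter)  # n-gram counts per label
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_docs[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, docs in self.class_docs.items():
            counts = self.ngram_counts[label]
            # Laplace (add-one) smoothing over the training vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(docs / total_docs)
            for gram in grams:
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: each text carries a geographic-scope label.
clf = NaiveBayesScope(n=3).fit(
    ["Lisbon is the capital of Portugal", "Brasilia is the capital of Brazil"],
    ["Europe", "South America"],
)
print(clf.predict("Portugal"))
print(clf.predict("Brazil"))
```

In a hierarchical setup, one such classifier per level (for example continent, then country) could be chained, with the output of one level selecting the classifier used at the next; that arrangement is an assumption here and is not detailed in the abstract.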
42	Combinação de classificadores simbólicos utilizando medidas de regras de conhecimento e algoritmos genéticos / Combining classifiers using knowledge rule measures and genetic algorithms. Bernardini, Flávia Cristina. 29 August 2006.
The quality of the hypotheses induced by most available supervised machine learning algorithms depends on the quantity and quality of the instances in the training set. However, several well-known learning algorithms cannot handle many instances, which makes it difficult to induce good classifiers from the large databases typical of data mining. One way to overcome this problem is to construct ensembles of classifiers. An ensemble is a set of classifiers whose decisions are combined in some way to classify a new case (instance). Although ensembles improve the predictive power of learning algorithms, they may consist of an undesirably large number of classifiers. Furthermore, although ensembles classify new cases better than each individual classifier, they generally behave as black boxes, offering the user no explanation of their classification decisions. In this work we therefore propose an approach that uses symbolic learning algorithms to construct ensembles of symbolic classifiers that can explain their classification decisions and are as accurate as, or more accurate than, the most accurate of their individual classifiers. In addition, considering that symbolic learning algorithms use local search methods to induce classifiers whereas genetic algorithms use global search methods, we propose a second approach for learning symbolic concepts from large databases, which uses genetic algorithms to evolve a set of symbolic classifiers into a single symbolic classifier that is more accurate than the initial ones. The two proposals were implemented in two computational systems, and several experiments on different data sets were conducted to evaluate them. The results show that although both proposals are promising, the approach based on genetic algorithms produces the better results.
Combining classifiers; Ensembles of classifiers; Evolutionary computation; Genetic algorithms; Knowledge rule evaluation measures; Machine learning
45	A Multiclassifier Approach to Motor Unit Potential Classification for EMG Signal Decomposition. Rasheed, Sarbast. January 2006.
EMG signal decomposition is the process of resolving a composite EMG signal into its constituent motor unit potential trains (classes), and it can be framed as a classification problem. An EMG signal detected by the tip of an inserted needle electrode is the superposition of the individual electrical contributions of the motor units that are active during a muscle contraction, plus background interference.
This thesis addresses EMG signal decomposition by developing an interactive classification system that uses multiple classifier fusion techniques to achieve improved classification performance. The system combines heterogeneous sets of base classifier ensembles of different kinds and employs either a one-level classifier fusion scheme or a hybrid classifier fusion approach.
The hybrid classifier fusion approach is a two-stage combination process that uses a new aggregator module consisting of two combiners, the first at the abstract level of classifier fusion and the second at the measurement level, used in a complementary manner. Either both combiners are data independent, or the first is data independent and the second is data dependent. In the experiments, the first combiner was the majority voting scheme, and the second combiner was either one of the fixed combination rules, acting as a data-independent combiner, or the fuzzy integral with the lambda-fuzzy measure, acting as an implicit data-dependent combiner.
Once the set of motor unit potential trains has been generated by the classifier fusion system, firing pattern consistency statistics are calculated for each train to detect classification errors adaptively. This firing pattern analysis allows the algorithm to adjust, individually for each train, the assertion threshold required to assign a motor unit potential classification, based on an expectation of erroneous assignments.
The classifier ensembles consist of a set of different versions of the Certainty classifier; a set of classifiers based on the nearest neighbour decision rule (the fuzzy k-NN and the adaptive fuzzy k-NN classifiers); and a set of classifiers that use a correlation measure to estimate the degree of similarity between a pattern and a class template (the matched template filter classifier and its adaptive counterpart). The base classifiers, besides being of different kinds, use different types of features, and their performance was investigated using both real and simulated EMG signals of different complexities. The feature sets extracted include time-domain data, first- and second-order discrete derivative data, and wavelet-domain data.
Following the so-called overproduce-and-choose strategy for classifier ensemble combination, the system allows a large set of candidate base classifiers to be constructed and then chooses, from this pool, subsets of a specified number of classifiers to form candidate classifier ensembles. The system then selects the classifier ensemble having the maximum degree of agreement by exploiting a diversity measure for designing classifier teams. The kappa statistic is used as the diversity measure to estimate the level of agreement between the base classifier outputs, i.e., to measure the degree of decision similarity between the base classifiers. This mechanism, which chooses the team's classifiers by assessing classifier agreement across all the trains and the unassigned category, is applied in the one-level classifier fusion scheme and for the first combiner in the hybrid approach. For the second combiner in the hybrid approach, team classifiers are also chosen using kappa statistics, but agreement is assessed only across the unassigned category, and the base classifiers with the minimum agreement are chosen.
The performance of the classifier fusion system in both of its variants, i.e., the one-level scheme and the hybrid approach, was evaluated using synthetic signals of known properties and real signals, and was then compared with the performance of the constituent base classifiers. Across the EMG signal data sets used, the hybrid approach had better average classification performance overall, especially in terms of reducing the number of classification errors.
Systems Design; Combination of multiple classifiers; classifier ensembles; motor unit potential classification; motor unit firing patterns; base classifiers; one level classifier fusion; hybrid classifier fusion; diversity measure; classifier agreement
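Two of the building blocks named in this abstract, the kappa statistic as an agreement measure between base classifiers and majority voting as an abstract-level combiner, can be sketched in a few lines. This is a generic illustration using Cohen's kappa for a pair of classifiers, which is an assumption; the abstract does not say which kappa variant the thesis uses. The label values and the three hypothetical classifier outputs `c1`, `c2`, `c3` are invented for the example.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa between two equally long sequences of class labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each classifier's own label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both classifiers always emit the same single label
    return (observed - expected) / (1.0 - expected)


def majority_vote(votes):
    """Most frequent label among the votes (ties go to the first one seen)."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical outputs of three base classifiers on four potentials.
c1 = ["A", "A", "B", "B"]
c2 = ["A", "A", "B", "B"]
c3 = ["A", "B", "A", "B"]

print(cohen_kappa(c1, c2))  # 1.0: perfect agreement
print(cohen_kappa(c1, c3))  # 0.0: agreement no better than chance
print([majority_vote(v) for v in zip(c1, c2, c3)])  # ['A', 'A', 'B', 'B']
```

A team-selection step along the lines described could rank candidate subsets of classifiers by their average pairwise kappa and keep the subset with the highest (or, for the second combiner, the lowest) value; the exact selection procedure is not given in the abstract.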
47	Charge-based analog circuits for reconfigurable smart sensory systems. Peng, Sheng-Yu. 02 July 2008.
The notion of designing circuits based on charge sensing, charge adaptation, and charge programming is explored in this research. This design concept leads to a low-power capacitive sensing interface circuit that has been designed and tested with a MEMS microphone and a capacitive micromachined ultrasonic transducer. Moreover, by using the charge programming technique, a designed floating-gate based large-scale field-programmable analog array (FPAA) containing a universal sensor interface sets the stage for reconfigurable smart sensory systems. Based on the same charge programming technique, a compact programmable analog radial-basis-function (RBF) based classifier and a resultant analog vector quantizer have been developed and tested. Measurement results have shown that the analog RBF-based classifier is at least two orders of magnitude more power-efficient than an equivalent digital processor. Furthermore, an adaptive bump circuit that can facilitate unsupervised learning in the analog domain has also been proposed. A projection neural network for a support vector machine, a powerful and more complicated binary classification algorithm, has also been proposed. This neural network is suitable for analog VLSI implementation and has been simulated and verified on the transistor level. These analog classifiers can be integrated at the interface to build smart sensory systems.
Floating-gate; Analog support vector machine; RBF-classifiers; Capacitive sensing; Analog classifiers; Electronic analog computers; Circuits; Microelectronics; Charge coupled devices; Field programmable gate arrays
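For readers unfamiliar with RBF-based classification, the computation that such a classifier performs can be sketched in software. This is only a behavioural illustration with Gaussian responses and one stored template per class; it says nothing about the analog circuit itself, and the function names, the shared `width` parameter, and the template values are assumptions made for the example.

```python
import math


def rbf_response(x, center, width):
    """Gaussian radial-basis response of input x to a stored template."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_classify(x, templates, width=1.0):
    """Return the label whose template responds most strongly to x."""
    return max(templates, key=lambda label: rbf_response(x, templates[label], width))


# Hypothetical two-class example with two-dimensional templates.
templates = {"low": (0.0, 0.0), "high": (1.0, 1.0)}
print(rbf_classify((0.1, 0.2), templates))
print(rbf_classify((0.9, 0.8), templates))
```

Picking the template with the strongest response is also the operation of a vector quantizer, which is consistent with the abstract describing the analog vector quantizer as a result of the same design.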
48	Unární klasifikátor obrazových dat / Unary Classification of Image Data. Beneš, Jiří. January 2021.
The thesis begins with an introduction to classification algorithms. It then divides classifiers into unary, binary, and multi-class, and describes the different types of classifiers, comparing them and their areas of use. For unary classifiers, it gives practical examples and a list of the architectures used. One chapter compares the effects of hyperparameters on the quality of unary classification for the individual architectures. The submission also includes a practical example of a reimplementation of a unary classifier.
49	Acquisition of Cantonese sortal classifiers in Cantonese-English bilinguals. Chung, Poy-san (鍾佩珊). January 2007.
Linguistics; Master of Arts
Cantonese dialects - Acquisition; Bilingualism in children
50	Humanness and classifiers in Mandarin Chinese. Frankowsky, Maximilian; Ke, Dan. 12 May 2017.
Mandarin Chinese numeral classifiers receive considerable attention in linguistic research. The status of the general classifier 个 gè remains unresolved. Many linguists suggest that the use of 个 gè as a noun classifier is arbitrary. This view is challenged in the current study. Relying on the CCL-Corpus of Peking University and data from Google, we investigated which nouns for living beings are most likely to be classified by the general classifier 个 gè. The results suggest that the use of the classifier 个 gè is motivated by an anthropocentric continuum as described by Köpcke and Zubin in the 1990s. We tested Köpcke and Zubin’s approach with Chinese native speakers. We examined 76 animal expressions to explore the semantic interdependence of numeral classifiers and nouns. Our study shows that nouns with the semantic feature [+ animate] are more likely to be classified by 个 gè if their denotatum is either very close to or very far from the anthropocentric center. In contrast, animate nouns whose denotata are located at some intermediate distance from the anthropocentric center are less likely to be classified by 个 gè.
anthropocentric continuum; Chinese; humanness; numeral classifiers; ddc:410
