Global ETD Search

31	Semi-supervised structured prediction models Brefeld, Ulf 14 March 2008 (has links) Das Lernen aus strukturierten Eingabe- und Ausgabebeispielen ist die Grundlage für die automatisierte Verarbeitung natürlich auftretender Problemstellungen und eine Herausforderung für das Maschinelle Lernen. Die Einordnung von Objekten in eine Klassentaxonomie, die Eigennamenerkennung und das Parsen natürlicher Sprache sind mögliche Anwendungen. Klassische Verfahren scheitern an der komplexen Natur der Daten, da sie die multiplen Abhängigkeiten und Strukturen nicht erfassen können. Zudem ist die Erhebung von klassifizierten Beispielen in strukturierten Anwendungsgebieten aufwändig und ressourcenintensiv, während unklassifizierte Beispiele günstig und frei verfügbar sind. Diese Arbeit thematisiert halbüberwachte, diskriminative Vorhersagemodelle für strukturierte Daten. Ausgehend von klassischen halbüberwachten Verfahren werden die zugrundeliegenden analytischen Techniken und Algorithmen auf das Lernen mit strukturierten Variablen übertragen. Die untersuchten Verfahren basieren auf unterschiedlichen Prinzipien und Annahmen, wie zum Beispiel der Konsensmaximierung mehrerer Hypothesen im Lernen aus mehreren Sichten, oder der räumlichen Struktur der Daten im transduktiven Lernen. Desweiteren wird in einer Fallstudie zur Email-Batcherkennung die räumliche Struktur der Daten ausgenutzt und eine Lösung präsentiert, die der sequenziellen Natur der Daten gerecht wird. Aus den theoretischen Überlegungen werden halbüberwachte, strukturierte Vorhersagemodelle und effiziente Optimierungsstrategien abgeleitet. Die empirische Evaluierung umfasst Klassifikationsprobleme, Eigennamenerkennung und das Parsen natürlicher Sprache. Es zeigt sich, dass die halbüberwachten Methoden in vielen Anwendungen zu signifikant kleineren Fehlerraten führen als vollständig überwachte Baselineverfahren. 
/ Learning mappings between arbitrary structured input and output variables is a fundamental problem in machine learning. It covers many natural learning tasks and challenges the standard model of learning a mapping from independently drawn instances to a small set of labels. Potential applications include classification with a class taxonomy, named entity recognition, and natural language parsing. In these structured domains, labeled training instances are generally expensive to obtain while unlabeled inputs are readily available and inexpensive. This thesis deals with semi-supervised learning of discriminative models for structured output variables. The analytical techniques and algorithms of classical semi-supervised learning are lifted to the structured setting. Several approaches based on different assumptions of the data are presented. Co-learning, for instance, maximizes the agreement among multiple hypotheses while transductive approaches rely on an implicit cluster assumption. Furthermore, in the framework of this dissertation, a case study on email batch detection in message streams is presented. The involved tasks exhibit an inherent cluster structure and the presented solution exploits the streaming nature of the data. The different approaches are developed into semi-supervised structured prediction models and efficient optimization strategies thereof are presented. The novel algorithms generalize state-of-the-art approaches in structural learning such as structural support vector machines. Empirical results show that the semi-supervised algorithms lead to significantly lower error rates than their fully supervised counterparts in many application areas, including multi-class classification, named entity recognition, and natural language parsing. 
Lernen mit strukturierten Daten halbüberwachtes Lernen Kernverfahren natürliche Sprachverarbeitung Learning with structured data semi-supervised learning kernel machines natural language processing 004 Informatik 28 Informatik, Datenverarbeitung ddc:004
32	Supervised metric learning with generalization guarantees Bellet, Aurélien 11 December 2012 (has links) (PDF) In recent years, the crucial importance of metrics in machine learning algorithms has led to an increasing interest in optimizing distance and similarity functions using knowledge from training data to make them suitable for the problem at hand. This area of research is known as metric learning. Existing methods typically aim at optimizing the parameters of a given metric with respect to some local constraints over the training sample. The learned metrics are generally used in nearest-neighbor and clustering algorithms. When data consist of feature vectors, a large body of work has focused on learning a Mahalanobis distance, which is parameterized by a positive semi-definite matrix. Recent methods offer good scalability to large datasets. Less work has been devoted to metric learning from structured objects (such as strings or trees), because it often involves complex procedures. Most of the work has focused on optimizing a notion of edit distance, which measures (in terms of number of operations) the cost of turning an object into another. We identify two important limitations of current supervised metric learning approaches. First, they improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not really been studied so far. Second, and perhaps more importantly, the question of the generalization ability of metric learning methods has been largely ignored. In this thesis, we propose theoretical and algorithmic contributions that address these limitations. Our first contribution is the derivation of a new kernel function built from learned edit probabilities. Unlike other string kernels, it is guaranteed to be valid and parameter-free. 
Our second contribution is a novel framework for learning string and tree edit similarities inspired by the recent theory of (epsilon,gamma,tau)-good similarity functions and formulated as a convex optimization problem. Using uniform stability arguments, we establish theoretical guarantees for the learned similarity that give a bound on the generalization error of a linear classifier built from that similarity. In our third contribution, we extend the same ideas to metric learning from feature vectors by proposing a bilinear similarity learning method that efficiently optimizes the (epsilon,gamma,tau)-goodness. The similarity is learned based on global constraints that are more appropriate to linear classification. Generalization guarantees are derived for our approach, highlighting that our method minimizes a tighter bound on the generalization error of the classifier. Our last contribution is a framework for establishing generalization bounds for a large class of existing metric learning algorithms. It is based on a simple adaptation of the notion of algorithmic robustness and allows the derivation of bounds for various loss functions and regularizers. Metric learning Statistical learning Convex optimization Classification Structured data Edit distance Generalization bounds
33	Uma técnica de indexação de dados semi-estruturados para o processamento eficiente de consultas com ramificação Viana, Talles Brito 20 April 2012 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / The explosive growth of web-based information systems has created various sources and vast quantities of semi-structured data, which need to be indexed by search engines in order to allow the retrieval of documents according to user needs. However, one of the major challenges in the development of indexing techniques for semi-structured data is how to index not only textual but also structural content. The main issue is how to efficiently handle branching path expressions without introducing precision loss or undesired growth of query processing costs and index file sizes. Several proposals for indexing semi-structured data can be found in the literature. Despite their relevant contributions, existing proposals suffer from at least one of the problems related to precision loss, storage space requirements and query processing costs. In this context, this thesis proposes an efficient, lossless path-based indexing technique (named BranchGuide) for semi-structured data, which deals with a well-defined class of branching path expressions. This class includes branching paths that express parent-child dependencies between elements, on which restrictions over the textual values of their attributes may be imposed. As shown by experimental evaluation, the BranchGuide technique results in excellent query processing times and generates smaller index files than a structural join indexing technique. 
/ O surgimento de sistemas baseados na Web tem gerado uma vasta quantidade de fontes de documentos semi-estruturados, os quais necessitam ser indexados por sistemas de busca a fim de possibilitar a descoberta de documentos de acordo com necessidades de informação do usuário. Entretanto, um dos maiores desafios no desenvolvimento de técnicas de indexação para documentos semi-estruturados diz respeito a como indexar não somente o conteúdo textual, mas também a informação estrutural dos documentos. O principal problema está em prover suporte para consultas com ramificação sem introduzir fatores que causem perda de precisão aos resultados de pesquisa, bem como, o crescimento indesejado do tempo de processamento de consultas e dos tamanhos de índice. Várias técnicas de indexação para dados semi-estruturados são encontradas na literatura. Apesar das relevantes contribuições, as propostas existentes sofrem com problemas relacionados à perda de precisão, requisitos de armazenamento ou custos de processamento de consultas. Neste contexto, nesta dissertação é proposta uma técnica de indexação (denominada BranchGuide) para dados semi-estruturados que suporta uma bem definida classe de consultas com ramificação sem perda de precisão. Esta classe compreende caminhos com ramificação que permitem expressar dependências pai-filho entre elementos nos quais podem ser impostas restrições sob os valores de atributos de tais elementos. Como evidenciado experimentalmente, a adoção da técnica BranchGuide gera excelentes tempos de processamento de consulta e tamanhos de índice menores do que os gerados por uma técnica de interseção estrutural. Informática Indexação Recuperação de Informação Dados Semi-Estruturados Data Processing Indexing Techniques Information Retrieval Semi-Structured Data
34	Hlasem ovládaný elektronický zubní kříž / Voice controlled electronic health record in dentistry Hippmann, Radek January 2012 (has links) Title: Voice controlled electronic health record in dentistry Author: MUDr. Radek Hippmann Department: Department of paediatric stomatology, Faculty hospital Motol Supervisor: Prof. MUDr. Taťjana Dostalová, DrSc., MBA Supervisor's e-mail: Tatjana.Dostalova@fnmotol.cz This PhD thesis concerns the development of a complex electronic health record (EHR) for the field of dentistry. The system is enhanced with voice control based on an automatic speech recognition (ASR) system and a text-to-speech (TTS) module for speech synthesis. The first part of the thesis describes the overall problem and defines the particular areas whose combination is essential for creating an EHR system in this field, mainly the basic delimitation of terms and areas in dentistry. Next, we address temporomandibular joint (TMJ) problems, which are often ignored, and describe trends in EHR and voice technologies. The methodological part describes the technologies used for EHR system creation, voice recognition and TMJ disease classification. The following part describes the results, which correspond to the knowledge base in dentistry and TMJ. From this knowledge base originates the graphical user interface DentCross, which serves for dental data...
35	Uma abordagem de predição estruturada baseada no modelo perceptron Coelho, Maurício Archanjo Nunes 25 June 2015 (has links) CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / A teoria sobre aprendizado supervisionado tem avançado significativamente nas últimas décadas. Diversos métodos são largamente utilizados para resoluções dos mais variados problemas, citando alguns: sistemas especialistas para obter respostas do tipo verdadeiro/falso, o modelo Perceptron para separação de classes, Máquina de Vetores Suportes (SVMs) e o Algoritmo de Margem Incremental (IMA) no intuito de aumentar a margem de separação, suas versões multi-classe, bem como as redes neurais artificiais, que apresentam possibilidades de entradas relativamente complexas. Porém, como resolver tarefas que exigem respostas tão complexas quanto as perguntas? Tais respostas podem consistir em várias decisões inter-relacionadas que devem ser ponderadas uma a uma para se chegar a uma solução satisfatória e globalmente consistente. Será visto no decorrer do trabalho que existem problemas de relevante interesse que apresentam estes requisitos. Uma questão que naturalmente surge é a necessidade de se lidar com a explosão combinatória das possíveis soluções. 
Uma alternativa encontrada apresenta-se através da construção de modelos que compactam e capturam determinadas propriedades estruturais do problema: correlações sequenciais, restrições temporais, espaciais, etc. Tais modelos, chamados de estruturados, incluem, entre outros, modelos gráficos, tais como redes de Markov e problemas de otimização combinatória, como matchings ponderados, cortes de grafos e agrupamentos de dados com padrões de similaridade e correlação. Este trabalho formula, apresenta e discute estratégias on-line eficientes para predição estruturada baseadas no princípio de separação de classes derivados do modelo Perceptron e define um conjunto de algoritmos de aprendizado supervisionado eficientes quando comparados com outras abordagens. São também realizadas e descritas duas aplicações experimentais a saber: inferência dos custos das diversas características relevantes para a realização de buscas em mapas variados e a inferência dos parâmetros geradores dos grafos de Markov. Estas aplicações têm caráter prático, enfatizando a importância da abordagem proposta. / The theory of supervised learning has advanced significantly in recent decades. Several methods are widely used to solve a variety of problems, such as expert systems for true/false answers, the Perceptron model for class separation, the Support Vector Machine (SVM) and the Incremental Margin Algorithm (IMA), which aim to increase the margin of separation, their multi-class versions, and artificial neural networks, which allow relatively complex input data. But how can we solve tasks that require answers as complex as the questions? Such answers may consist of several interrelated decisions that must be weighed one by one to arrive at a satisfactory and globally consistent solution. As will be seen throughout the thesis, there are problems of relevant interest that have these requirements. One question that naturally arises is the need to deal with the combinatorial explosion of possible solutions. 
One alternative is the construction of models that compactly capture certain structural properties of the problem: sequential correlations, temporal and spatial constraints, etc. These structured models include, among others, graphical models, such as Markov networks, and combinatorial optimization problems, such as weighted matchings, graph cuts and data clusterings with similarity and correlation patterns. This thesis formulates, presents and discusses efficient online strategies for structured prediction based on the principle of class separation derived from the Perceptron model, and defines a set of supervised learning algorithms that are efficient compared to other approaches. Two experimental applications are also carried out and described: inferring the costs of the relevant features for path searches on various maps, and inferring the generating parameters of Markov graphs. These applications are practical in nature and emphasize the importance of the proposed approach. CNPQ::CIENCIAS EXATAS E DA TERRA Aprendizado de máquina Predição de dados estruturados Perceptron multi-classe Planejamento de caminhos Grafos de Markov Machine Learning Perceptron Multi-class Path Planning Prediction of Structured Data Markov Graphs
37	ADVANCED INTERFACE FOR QUERYING GRAPH DATA Mayes, Stephen Frederick January 2008 (has links) No description available. Computer Science Query Query Interface Graph Data Advanced Query Interface Semi-structured data Pathways Biological Pathway Data Path Query Neighborhood Query
38	Abordagem para integração automática de dados estruturados e não estruturados em um contexto Big Data / Approach for automatic integration of structured and unstructured data in a Big Data context Keylla Ramos Saes 22 November 2018 (has links) O aumento de dados disponíveis para uso tem despertado o interesse na geração de conhecimento pela integração de tais dados. No entanto, a tarefa de integração requer conhecimento dos dados e também dos modelos de dados utilizados para representá-los. Ou seja, a realização da tarefa de integração de dados requer a participação de especialistas em computação, o que limita a escalabilidade desse tipo de tarefa. No contexto de Big Data, essa limitação é reforçada pela presença de uma grande variedade de fontes e modelos heterogêneos de representação de dados, como dados relacionais com dados estruturados e modelos não relacionais com dados não estruturados, essa variedade de representações apresenta uma complexidade adicional para o processo de integração de dados. Para lidar com esse cenário é necessário o uso de ferramentas de integração que reduzam ou até mesmo eliminem a necessidade de intervenção humana. Como contribuição, este trabalho oferece a possibilidade de integração de diversos modelos de representação de dados e fontes de dados heterogêneos, por meio de uma abordagem que permite o do uso de técnicas variadas, como por exemplo, algoritmos de comparação por similaridade estrutural dos dados, algoritmos de inteligência artificial, que através da geração do metadados integrador, possibilita a integração de dados heterogêneos. Essa flexibilidade permite lidar com a variedade crescente de dados, é proporcionada pela modularização da arquitetura proposta, que possibilita que integração de dados em um contexto Big Data de maneira automática, sem a necessidade de intervenção humana / The increase of data available to use has piqued interest in the generation of knowledge for the integration of such data bases. 
However, the task of integration requires knowledge of the data and of the data models used to represent them. That is, carrying out data integration requires the participation of computing experts, which limits the scalability of this type of task. In the context of Big Data, this limitation is reinforced by the presence of a wide variety of sources and heterogeneous data representation models, such as relational models with structured data and non-relational models with unstructured data; this variety of representations adds complexity to the data integration process. Handling this scenario requires integration tools that reduce or even eliminate the need for human intervention. As a contribution, this work offers the possibility of integrating diverse data representation models and heterogeneous data sources through an approach that allows the use of varied techniques, such as algorithms that compare data by structural similarity and artificial intelligence algorithms, which, by generating integrating metadata, enable the integration of heterogeneous data. This flexibility, which allows dealing with the growing variety of data, is provided by the modularization of the proposed architecture, which enables data integration in a Big Data context automatically, without the need for human intervention. Banco de dados não relacionais Banco de dados relacionais Big Data Dados estruturados Dados não estruturados Integração de dados Integração de dados heterogêneos NoSQL Big Data Data integration Heterogeneous data integration Non-relational database NoSQL Relational database Structured data Unstructured data
39	Abordagem para integração automática de dados estruturados e não estruturados em um contexto Big Data / Approach for automatic integration of structured and unstructured data in a Big Data context Saes, Keylla Ramos 22 November 2018 (has links) O aumento de dados disponíveis para uso tem despertado o interesse na geração de conhecimento pela integração de tais dados. No entanto, a tarefa de integração requer conhecimento dos dados e também dos modelos de dados utilizados para representá-los. Ou seja, a realização da tarefa de integração de dados requer a participação de especialistas em computação, o que limita a escalabilidade desse tipo de tarefa. No contexto de Big Data, essa limitação é reforçada pela presença de uma grande variedade de fontes e modelos heterogêneos de representação de dados, como dados relacionais com dados estruturados e modelos não relacionais com dados não estruturados, essa variedade de representações apresenta uma complexidade adicional para o processo de integração de dados. Para lidar com esse cenário é necessário o uso de ferramentas de integração que reduzam ou até mesmo eliminem a necessidade de intervenção humana. Como contribuição, este trabalho oferece a possibilidade de integração de diversos modelos de representação de dados e fontes de dados heterogêneos, por meio de uma abordagem que permite o do uso de técnicas variadas, como por exemplo, algoritmos de comparação por similaridade estrutural dos dados, algoritmos de inteligência artificial, que através da geração do metadados integrador, possibilita a integração de dados heterogêneos. Essa flexibilidade permite lidar com a variedade crescente de dados, é proporcionada pela modularização da arquitetura proposta, que possibilita que integração de dados em um contexto Big Data de maneira automática, sem a necessidade de intervenção humana / The increase of data available to use has piqued interest in the generation of knowledge for the integration of such data bases. 
However, the task of integration requires knowledge of the data and of the data models used to represent them. That is, carrying out data integration requires the participation of computing experts, which limits the scalability of this type of task. In the context of Big Data, this limitation is reinforced by the presence of a wide variety of sources and heterogeneous data representation models, such as relational models with structured data and non-relational models with unstructured data; this variety of representations adds complexity to the data integration process. Handling this scenario requires integration tools that reduce or even eliminate the need for human intervention. As a contribution, this work offers the possibility of integrating diverse data representation models and heterogeneous data sources through an approach that allows the use of varied techniques, such as algorithms that compare data by structural similarity and artificial intelligence algorithms, which, by generating integrating metadata, enable the integration of heterogeneous data. This flexibility, which allows dealing with the growing variety of data, is provided by the modularization of the proposed architecture, which enables data integration in a Big Data context automatically, without the need for human intervention. Banco de dados não relacionais Banco de dados relacionais Big Data Big Data Dados estruturados Dados não estruturados Data integration Heterogeneous data integration Integração de dados Integração de dados heterogêneos Non-relational database NoSQL NoSQL Relational database Structured data Unstructured data
40	[en] QEEF: AN EXTENSIBLE QUERY EXECUTION ENGINE / [pt] QEEF: UMA MÁQUINA DE EXECUÇÃO DE CONSULTAS FAUSTO VERAS MARANHAO AYRES 30 June 2004 (has links) [pt] O processamento de consultas em Sistemas de Gerência de Banco de Dados tradicionais tem sido largamente estudado na literatura e utilizado comercialmente com enorme sucesso. Isso é devido, em parte, à eficiência das Máquinas de Execução de Consultas (MEC) no suporte ao modelo de execução tradicional. Porém, o surgimento de novos cenários de aplicação, principalmente em conseqüência do modelo computacional da web, motivou a pesquisa de novos modelos de execução, tais como: modelo adaptável e modelo contínuo, além da pesquisa de modelos de dados semi-estruturados, tal como o XML, ambos não suportados pelas MEC tradicionais. O objetivo desta tese consiste no desenvolvimento de uma MEC extensível frente a diferentes modelos de execução e de dados. Adicionalmente, esta proposta trata de maneira ortogonal o modelo de execução e o modelo de dados, o que permite a avaliação de planos de execução de consultas (PEC) com fragmentos em diferentes modelos. Utilizou-se a técnica de framework de software para a especificação da MEC extensível, produzindo o framework QEEF (Query Execution Engine Framework). A extensibilidade da solução reflete-se em um meta-modelo, denominado QUEM (QUery Execution Meta-model), capaz de exprimir diferentes modelos em um meta-PEC. O framework QEEF pré-processa um meta-PEC e produz um PEC final a ser avaliado pela MEC instanciada. Como parte da validação desta proposta, instanciou-se o QEEF para diferentes modelos de execução e de dados. / [en] Query processing in traditional Database Management Systems (DBMS) has been extensively studied in the literature and adopted in industry. Such success is, in part, due to the performance of their Query Execution Engines (QEE) for supporting the traditional query execution model. 
The advent of new query scenarios, mainly due to the web computational model, has motivated research on new execution models, such as adaptive and continuous ones, and on semi-structured data models, such as XML, neither of which is natively supported by traditional query engines. This thesis proposes the development of an extensible QEE adapted to the new execution and data models. To achieve this goal, we use a software design approach based on the framework technique to produce the Query Execution Engine Framework (QEEF). Moreover, we address the question of the orthogonality between execution and data models, which allows for executing query execution plans (QEP) with fragments in different models. The extensibility of our solution is specified by an execution meta-model named QUEM (QUery Execution Meta-model), used to express different models in a meta-QEP. During query evaluation, the latter is pre-processed by the QEEF, producing a final QEP to be evaluated by the running QEE. The QEEF is instantiated for different execution and data models as part of the validation of this proposal. [pt] BANCO DE DADOS [en] DATABASE [pt] PROCESSAMENTO DE CONSULTAS [en] QUERY PROCESSING [pt] MAQUINA DE EXECUCAO DE CONSULTAS [en] QUERY EXECUTION ENGINE [pt] MODELO DE EXECUCAO DE CONSULTAS [en] QUERY EXECUTION MODEL [pt] MODELO DE DADOS SEMI-ESTRUTURADO [en] SEMI-STRUCTURED DATA MODEL [pt] FRAMEWORK DE SOFTWARE [en] SOFTWARE FRAMEWORK
