About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Geração automática de metadados: uma contribuição para a Web semântica. / Automatic metadata generation: a contribution to the semantic Web.

Eveline Cruz Hora Gomes Ferreira, 05 April 2006
This thesis offers a contribution to the Semantic Web area, in the scope of document representation and indexing, by defining a context-based model for automatic metadata generation from unstructured textual documents (txt format) written in Portuguese. A broad theoretical overview of subjects related to the creation of semantic digital environments is also presented. As recommended at SemanticWeb.org, the textual documents studied here were automatically converted into semantically annotated Web pages, using Dublin Core as the standard for defining the metadata elements and RDF/XML as the standard for representing the documents and describing those elements. Of the fifteen Dublin Core metadata elements, nine were generated automatically by the model and six were generated semi-automatically. The Description and Subject elements required the most complex algorithms and were obtained through statistical, text-mining, and natural language processing techniques. The main purpose of the evaluation was to verify the behavior of the documents converted to RDF/XML when they were submitted to an information retrieval process. The Description and Subject elements were evaluated exhaustively, since they are chiefly responsible for capturing the semantics of textual documents. The diversity of contexts, the complexity of the problems posed by the Portuguese language, and the new concepts introduced by the Semantic Web standards and technologies were among the major challenges faced in building the proposed model. Although the techniques used to explore the documents' contents are not particularly new, the innovative elements introduced by the Semantic Web made it possible to obtain important results in this thesis. As demonstrated here, combining these techniques with the standards and technologies recommended by the Semantic Web can mitigate one of the largest problems of the current Web, and one of the strong reasons for implementing the Semantic Web: the tendency of search engines to flood users with irrelevant results because they do not take into account the specific context desired by the user. It is therefore important to continue studies and research in all areas related to the implementation of the Semantic Web, opening the way for more functional information systems to be designed.
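A minimal sketch of the kind of output described above, assuming the rdflib library: it builds a Dublin Core description of a short Portuguese text and serializes it as RDF/XML. It is not the thesis's Model; the identifier and element values are hypothetical, and a naive word-frequency count stands in for the statistical, text-mining, and NLP techniques used for Subject and Description.

```python
# Minimal sketch (not the thesis's Model): annotate a short text with Dublin
# Core elements and serialize the description as RDF/XML using rdflib.
import re
from collections import Counter
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

def naive_subjects(text, n=5):
    """Crude stand-in for Subject extraction: most frequent words of length >= 4."""
    words = re.findall(r"\w{4,}", text.lower())
    return [w for w, _ in Counter(words).most_common(n)]

# Stands in for the content of an unstructured .txt document in Portuguese.
text = ("A Web Semântica estende a Web atual atribuindo significado bem "
        "definido à informação, permitindo que máquinas e pessoas "
        "trabalhem em cooperação.")

g = Graph()
g.bind("dc", DC)
doc = URIRef("http://example.org/doc/documento")   # hypothetical identifier

g.add((doc, DC.title, Literal("Documento de exemplo", lang="pt")))
g.add((doc, DC.language, Literal("pt")))
g.add((doc, DC.format, Literal("text/plain")))
g.add((doc, DC.description, Literal(text[:200], lang="pt")))  # naive Description
for kw in naive_subjects(text):
    g.add((doc, DC.subject, Literal(kw, lang="pt")))

print(g.serialize(format="xml"))   # RDF/XML view of the Dublin Core record
```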
42

"O framework de integração do sistema DISCOVER" / The Discover integration framework

Prati, Ronaldo Cristiano, 04 April 2003
One of humanity's greatest capabilities is the ability to learn from observed instances of the world and to transmit what has been learnt to others. For thousands of years we have tried to understand the world and used the acquired knowledge to improve it. Nowadays, thanks to progress in digital data acquisition and storage technology, as well as significant progress in Artificial Intelligence (AI) and, in particular, Machine Learning (ML), it is possible to apply inductive inference to huge databases in order to find, or discover, new knowledge in the data. The discipline concerned with this task has become known as Knowledge Discovery from Databases (KDD). However, this relatively new research area still offers few tools that can be used efficiently to acquire knowledge from data. With this in mind, a group of researchers at the Computational Intelligence Laboratory (LABIC) is working on a system, called Discover, to support our research activities in KDD and ML. The aim of the system is to integrate the ML algorithms most used by the community with the data and knowledge processing tools developed as results of our work. The system can also be used as a workbench for new tools and ideas. Since the main concern of Discover is its use and extension by researchers, an important question is the flexibility of its architecture: it should allow new tools to be easily incorporated while imposing standards that guarantee efficient component integration. In this work we propose a component integration framework aimed at building an integrated computational environment from the tools already implemented in the Discover project. The framework provides an interface adaptation mechanism that creates a layer (the horizontal interface) over these tools, a powerful metadata mechanism used to describe both the components that implement the system's functionalities and the experiment configurations created by users, and an execution environment for those experiment configurations.
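The sketch below is a rough, hypothetical illustration of the kind of integration layer described above, not the Discover framework itself: components are wrapped behind a uniform entry point, described by a small metadata record, and driven by a declarative experiment configuration. All names are invented.

```python
# Hypothetical sketch of an integration layer: components are wrapped behind a
# uniform interface, described by metadata, and driven by an experiment
# configuration that the framework executes step by step.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class ComponentMetadata:
    name: str
    inputs: List[str]          # names of parameters the component expects
    outputs: List[str]         # names of results it produces
    run: Callable[..., Any]    # adapted entry point (the "horizontal interface")

class Registry:
    """Metadata mechanism: looks registered components up by name."""
    def __init__(self) -> None:
        self._components: Dict[str, ComponentMetadata] = {}
    def register(self, meta: ComponentMetadata) -> None:
        self._components[meta.name] = meta
    def get(self, name: str) -> ComponentMetadata:
        return self._components[name]

def run_experiment(registry: Registry, config: List[dict]) -> dict:
    """Execute an experiment configuration: each step names a component,
    binds its parameters (possibly to earlier results), and stores its output."""
    context: dict = {}
    for step in config:
        comp = registry.get(step["component"])
        kwargs = {k: context.get(v, v) for k, v in step.get("params", {}).items()}
        context[step["store_as"]] = comp.run(**kwargs)
    return context

# Toy usage with two fake tools.
reg = Registry()
reg.register(ComponentMetadata("load", [], ["data"], run=lambda: [1, 2, 3, 4]))
reg.register(ComponentMetadata("mean", ["data"], ["value"],
                               run=lambda data: sum(data) / len(data)))
result = run_experiment(reg, [
    {"component": "load", "params": {}, "store_as": "data"},
    {"component": "mean", "params": {"data": "data"}, "store_as": "mean"},
])
print(result["mean"])   # 2.5
```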
43

Uma arquitetura híbrida para descoberta de conhecimento em bases de dados: teoria dos rough sets e redes neurais artificiais mapas auto-organizáveis. / A hybrid architecture for knowledge discovery in databases: rough sets theory and artificial neural network self-organizing maps.

Sassi, Renato José, 28 November 2006
Real-world databases contain huge volumes of data, and hidden among them are many relationships that are difficult to discover with traditional methods such as spreadsheets and operational reports. Knowledge discovery in databases (KDD) systems therefore appear as a possible way to extract, from such relationships, knowledge that can be applied to decision making in organizations. Even with a KDD system, the task can remain extremely difficult because of the sheer amount of data to be processed, and not all the data in these databases is useful for discovering knowledge. In general, data are pre-processed before being presented to a KDD system, both to reduce their volume and to select the most relevant data to be used by the system. This work proposes the development, application, and analysis of a hybrid architecture that combines rough sets theory with an artificial neural network architecture known as self-organizing maps (SOM) for knowledge discovery. The objective is to assess the performance of the proposed hybrid architecture in generating clusters in databases. In particular, some of the significant experiments were performed to support decision making in organizations.
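A toy sketch of the two ingredients combined in the hybrid architecture, assuming numpy; it is not the thesis's implementation. A crude indiscernibility-based heuristic stands in for a proper rough-set reduct, and a small self-organizing map is trained from scratch on the reduced table.

```python
# Toy sketch (not the thesis's implementation): crude rough-set-style attribute
# reduction followed by a small self-organizing map trained on the reduced data.
import numpy as np

def reduce_attributes(table: np.ndarray) -> list:
    """Greedily drop attributes whose removal does not change the number of
    indiscernibility classes (distinct rows). A rough approximation of a
    reduct, for illustration only."""
    keep = list(range(table.shape[1]))
    for j in sorted(keep, reverse=True):
        trial = [c for c in keep if c != j]
        if trial and len({tuple(r) for r in table[:, trial]}) == \
                     len({tuple(r) for r in table[:, keep]}):
            keep = trial
    return keep

def train_som(X: np.ndarray, rows=5, cols=5, iters=2000,
              lr0=0.5, sigma0=2.0, seed=0) -> np.ndarray:
    """Classical online SOM training; returns the (rows x cols x dim) codebook."""
    rng = np.random.default_rng(seed)
    W = rng.random((rows, cols, X.shape[1]))
    grid = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.unravel_index(np.argmin(np.linalg.norm(W - x, axis=2)), (rows, cols))
        lr = lr0 * np.exp(-t / iters)
        sigma = sigma0 * np.exp(-t / iters)
        h = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2) / (2 * sigma ** 2))
        W += lr * h[:, :, None] * (x - W)
    return W

# Toy usage: discretized table -> reduced attributes -> SOM cluster assignment.
rng = np.random.default_rng(1)
data = rng.integers(0, 3, size=(100, 6)).astype(float)
kept = reduce_attributes(data)
codebook = train_som(data[:, kept])
winners = [np.unravel_index(np.argmin(np.linalg.norm(codebook - row, axis=2)),
                            codebook.shape[:2]) for row in data[:, kept]]
print(kept, winners[:5])
```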
44

Integrando mineração de séries temporais e fractais para encontrar padrões e eventos extremos em bases de dados climáticas e de sensoriamento remoto / Integrating time series mining and fractals to discover patterns and extreme events in climate and remote sensing databases

Romani, Luciana Alvim Santos, 13 December 2010
This thesis presents new methods based on fractal theory and data mining techniques to support agricultural monitoring at a regional scale, specifically regions with sugar cane fields; this commodity contributes greatly to the Brazilian economy as a viable alternative for replacing fossil fuels. Since climate strongly influences agricultural production, researchers use climate data together with agrometeorological indexes and, more recently, satellite data to support decision making. In this context, we propose a method that uses the fractal dimension to identify trend changes in climate series, together with a statistical analysis module that determines which attributes are responsible for the change in the behavior of the series. We also propose two similarity measures to support comparisons among different agricultural regions represented by multiple variables derived from meteorological data and remote sensing images. Given the importance of studying extreme weather events, which could increase in intensity, duration, and frequency under the scenarios indicated by climate forecasting models, we propose the CLIPSMiner algorithm, which identifies relevant patterns and extremes in climate series. CLIPSMiner also detects correlations among multiple time series allowing for time lag, and finds patterns according to parameters that can be calibrated by the user. Two distinct approaches were applied to discover association patterns among series. The first, the Apriori-FD method, integrates attribute selection based on the correlation fractal dimension, a discretization step that converts the continuous values of the series into discrete intervals, and the well-known Apriori association rules algorithm. Although Apriori-FD identified interesting patterns related to temperature, it could not deal appropriately with time lag. We therefore propose CLEARMiner, an unsupervised algorithm that mines patterns in one time series and associates them with patterns in other series, allowing for possible time lags. The proposed methods were compared with similar techniques and assessed by a group of meteorologists, agrometeorologists, and remote sensing specialists. The experiments showed that applying data mining techniques and fractal theory improves the analysis of agrometeorological and satellite data, aiding researchers in their work and providing an important tool to support decision making in agribusiness.
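The sketch below, assuming numpy, illustrates the general idea of flagging trend changes through a fractal dimension computed over consecutive windows. Katz's estimator is used as a simple stand-in; the thesis's correlation-fractal-dimension method and its statistical module are not reproduced here.

```python
# Illustrative sketch only: flag candidate trend changes in a climate series by
# watching for jumps in a fractal dimension estimate computed over consecutive
# windows. Katz's estimator is a simple stand-in for the correlation fractal
# dimension used in the thesis.
import numpy as np

def katz_fd(x) -> float:
    """Katz fractal dimension of a 1-D series."""
    x = np.asarray(x, dtype=float)
    pts = np.column_stack([np.arange(len(x)), x])
    L = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))  # curve length
    d = np.max(np.linalg.norm(pts - pts[0], axis=1))          # planar extent
    n = len(x) - 1
    return np.log10(n) / (np.log10(n) + np.log10(d / L))

def trend_change_candidates(series, window=90, jump=0.10):
    """Indices of consecutive windows whose fractal dimension differs sharply."""
    fds = [katz_fd(series[i:i + window])
           for i in range(0, len(series) - window + 1, window)]
    return [i for i in range(1, len(fds)) if abs(fds[i] - fds[i - 1]) > jump]

# Toy usage: a seasonal series whose roughness changes halfway through.
rng = np.random.default_rng(0)
series = np.concatenate([
    np.sin(np.arange(540) / 30) + 0.05 * rng.standard_normal(540),
    np.sin(np.arange(540) / 30) + 0.60 * rng.standard_normal(540),
])
print(trend_change_candidates(series))
```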
45

Knowledge Discovery In Microarray Data Of Bioinformatics

Kocabas, Fahri, 01 June 2012
This thesis analyzes major microarray repositories and presents a metadata framework that both addresses current issues and supports key operations such as knowledge discovery, sharing, integration, and exchange. The proposed framework is demonstrated in a case study on real data and can be applied to other high-throughput repositories in the biomedical domain. Not only is the number of microarray experiments increasing, but the size and complexity of the results also grow in response to biomedical inquiries, and experiment results are most significant when examined in batches and placed in a biological context. There have been standardization initiatives on content, object model, exchange format, and ontology, but each has its own proprietary information space; backlogs accumulate and data cannot be exchanged among the repositories. A format and data management standard is therefore needed. We introduce a metadata framework comprising metadata cards and semantic nets that make experiment results visible, understandable, and usable. They are encoded in standard syntax encoding schemes and represented in XML/RDF; they can be integrated with other metadata cards and semantic nets, queried, exchanged, and shared. We demonstrate the performance and potential benefits with a case study on a microarray repository. This study does not replace any existing repository product; rather, a metadata framework is required to manage such huge volumes of data. We argue that with this metadata framework backlogs can be reduced, and complex knowledge discovery queries and the exchange of information become possible.
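A minimal sketch of a queryable metadata card, assuming rdflib and an invented vocabulary; it is not the framework proposed in the thesis. Cards encoded this way and sharing a vocabulary could be merged into a single graph and queried together with SPARQL.

```python
# Sketch of the idea of a queryable metadata card (not the thesis's framework):
# a few RDF statements about a microarray experiment under a hypothetical
# vocabulary, queried with SPARQL via rdflib.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/microarray#")         # hypothetical terms
exp = URIRef("http://example.org/experiment/EXP-0001")   # hypothetical id

g = Graph()
g.add((exp, EX.organism, Literal("Saccharomyces cerevisiae")))
g.add((exp, EX.platform, Literal("two-channel cDNA array")))
g.add((exp, EX.condition, Literal("heat shock")))

query = """
PREFIX ex: <http://example.org/microarray#>
SELECT ?e WHERE { ?e ex:organism "Saccharomyces cerevisiae" ;
                     ex:condition "heat shock" . }
"""
for row in g.query(query):
    print(row.e)   # matching experiment URIs
```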
46

Structure Pattern Analysis Using Term Rewriting and Clustering Algorithm

Fu, Xuezheng, 27 June 2007
Biological data are accumulating at a fast pace. However, raw data are generally difficult to understand and are not useful unless the information hidden in them is unlocked: knowledge can be extracted as the patterns or features buried within the data. Thus data mining, which aims at uncovering underlying rules, relationships, and patterns in data, has emerged as one of the most exciting fields in computational science. In this dissertation, we develop efficient approaches to the structure pattern analysis of RNA and protein three-dimensional structures. The major techniques used in this work are term rewriting and clustering algorithms. First, a new approach based on the concept of term rewriting is designed to study the interaction of RNA secondary structure motifs. Second, an improved K-means clustering algorithm is proposed to estimate the number of clusters in data, and a new distance descriptor is introduced for the appropriate representation of three-dimensional structure segments of RNA and proteins. The experimental results show improvements in determining the number of clusters in data, evaluating RNA structure similarity, searching RNA structure databases, and better understanding the protein sequence-structure correspondence.
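The sketch below shows a generic way to estimate the number of clusters by searching for the best silhouette score, assuming scikit-learn, numpy, and synthetic data; it is not the improved K-means algorithm proposed in the dissertation.

```python
# Generic sketch of estimating the number of clusters with a silhouette search
# (NOT the improved K-means of the dissertation); synthetic points stand in for
# structure-segment descriptors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def estimate_k(X, k_max=10, seed=0):
    """Return the k in [2, k_max] whose K-means labeling has the best silhouette."""
    best_k, best_score = 2, -1.0
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Toy usage: three well-separated blobs should give k = 3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 3)) for c in (0.0, 3.0, 6.0)])
print(estimate_k(X))
```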
47

Text Mining Biomedical Literature for Genomic Knowledge Discovery

Liu, Ying, 20 July 2005
The last decade has been marked by unprecedented growth both in the production of biomedical data and in the amount of published literature discussing it. Almost every known or postulated piece of information pertaining to genes, proteins, and their role in biological processes is reported somewhere in the vast body of published biomedical literature. We believe the ability to rapidly survey and analyze this literature and extract pertinent information is a necessary step toward both the design and the interpretation of any large-scale experiment. Moreover, automated literature mining offers a yet untapped opportunity to integrate many fragments of information gathered by researchers from multiple fields of expertise into a complete picture exposing the interrelated roles of various genes, proteins, and chemical reactions in cells and organisms. In this thesis, we show that functional keywords in biomedical literature, particularly Medline, represent very valuable information and can be used to discover new genomic knowledge. To validate this claim we present an investigation into text mining biomedical literature to assist microarray data analysis, yeast gene function classification, and biomedical literature categorization. We conducted the following studies: (1) we test sets of genes to discover common functional keywords among them and use these keywords to cluster them into groups; (2) we show that it is possible to link genes to diseases through expert human interpretation of the genes' functional keywords, none of these diseases being as yet mentioned in public databases; (3) by clustering genes based on common functional keywords it is possible to group them into meaningful clusters that reveal more about their functions, links to diseases, and roles in metabolic pathways; (4) using the extracted functional keywords, we demonstrate that yeast genes can be grouped functionally better than with available public microarray and phylogenetic databases; (5) we show an application of our approach to literature classification: using functional keywords as features, we extract epidemiological abstracts automatically from Medline with higher sensitivity and accuracy than a human expert.
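As a toy illustration of grouping genes by shared functional keywords, the sketch below represents each gene by TF-IDF weights over a snippet of associated text and clusters the genes, assuming scikit-learn and numpy; the gene names and snippets are invented and the thesis's Medline pipeline is not reproduced.

```python
# Toy illustration of clustering genes by shared functional keywords (not the
# thesis's pipeline): each gene is represented by TF-IDF weights over a made-up
# text snippet, then clustered; the top terms per cluster act as its keywords.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

gene_docs = {
    "CDC28": "cyclin dependent kinase regulating cell cycle progression and mitosis",
    "CLN3":  "G1 cyclin involved in cell cycle start and cell size control",
    "RAD51": "DNA repair by homologous recombination after double strand breaks",
    "RAD52": "DNA repair protein mediating recombination and strand annealing",
}

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(gene_docs.values()).toarray()
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

terms = np.array(vec.get_feature_names_out())
for k in sorted(set(labels)):
    members = [g for g, lab in zip(gene_docs, labels) if lab == k]
    centroid = X[labels == k].mean(axis=0)            # mean TF-IDF per term
    keywords = terms[np.argsort(centroid)[::-1][:3]]  # cluster keyword signature
    print(k, members, list(keywords))
```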
48

Feature-Based Hierarchical Knowledge Engineering for Aircraft Life Cycle Design Decision Support

Zhao, Wei, 09 April 2007
The design process of aerospace systems is becoming more and more complex. As the process becomes enterprise-wide, it involves multiple vendors and encompasses the entire life cycle of the system, as well as a system-of-systems perspective. The amount of data and information generated under this paradigm has increased exponentially, creating a difficult situation for data storage, management, and retrieval. Furthermore, the data themselves are in most cases not suitable or adequate for direct use and must be translated into knowledge at a proper level of abstraction. Adding to the problem, the knowledge discovery processes needed to support the growth of data in aerospace systems design have not been developed to the appropriate level. In fact, important design decisions are often made without sufficient understanding of their overall impact on the aircraft's life, because the data have not been converted and interpreted in time to support design. In order to make the design process meet this life-cycle-centric requirement, this thesis proposes a methodology to provide the supporting knowledge needed for better design decision making. The primary contribution is the establishment of a knowledge engineering framework for design decision support that effectively discovers knowledge from existing data and efficiently manages and presents that knowledge throughout all phases of the aircraft life cycle. The second contribution is the proposed methodology for feature generation and exploration, which significantly improves the knowledge discovery process. In addition, the proposed work demonstrates several multimedia-based approaches to knowledge presentation.
49

An Efficient Bitmap-Based Approach to Mining Sequential Patterns for Large Databases

Wu, Chien-Hui, 29 July 2004
The task of data mining is to find useful information within very large sets of data, and mining sequential patterns is one of its important research areas. In a transaction database, a sequential pattern captures relations between the items bought by customers over a period of time; if we can find these relations, we can devise better selling strategies to attract more customers. However, because the transaction database contains a large amount of data and is scanned again and again during the mining process, improving running efficiency is an important topic. The GSP algorithm proposed by Srikant and Agrawal uses a complex data structure to store and generate candidates. The generated candidates satisfy the property that "the subsets of a frequent itemset are also frequent", which leads to fewer candidates; however, it still spends too much time counting them. The SPAM algorithm proposed by Ayres et al. uses bitwise operations to reduce the time for counting candidates, but it generates too many candidates that will never become frequent itemsets, which decreases efficiency. In this thesis, we propose a new bitmap-based algorithm. By modifying the way candidates are generated in the GSP algorithm and applying the bitwise operations of the SPAM algorithm, the proposed algorithm can mine sequential patterns efficiently: a candidate generation method similar to GSP's reduces the number of candidates, and a counting method similar to SPAM's reduces the time spent counting them. In the proposed algorithm, itemsets are classified into two cases: simultaneous occurrence (denoted AB) and sequential occurrence (denoted A -> B). In the case of simultaneous occurrence, the exhaustive method generates C(n,k) candidates; to prevent too many candidates from being generated, we use the property that "the subsets of a frequent itemset are also frequent" to reduce the number of candidates from C(n,k) to C(y,k), with k <= y < n. In the case of sequential occurrence, candidates are generated by a special join operation that can combine, for example, A->B and B->C into A->B->C; two further cases must also be considered: (1) combining A->B and A->C into A->BC, and (2) combining A->C and B->C into AB->C. Candidate counting is similar to the SPAM algorithm (i.e., bitwise operations). Our simulation results show that, using the same bit representation of the transaction database, the proposed algorithm provides better performance than the SPAM algorithm in terms of processing time, since it generates fewer candidates.
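To make the bitmap idea concrete, the sketch below encodes each customer's transactions as per-item bit vectors (Python integers) and counts the support of a simultaneous itemset AB and a sequential pattern A -> B with bitwise operations, in the spirit of SPAM's counting step; the candidate generation scheme proposed in the thesis is not reproduced.

```python
# Sketch of bitmap-based support counting in the spirit of SPAM. Each item gets
# one bit per transaction of a customer; bitwise AND tests co-occurrence, and an
# "after the first set bit" transform tests sequential occurrence.
def bitmap(seq, item):
    """Bit i is set iff the i-th transaction of this customer contains item."""
    b = 0
    for i, txn in enumerate(seq):
        if item in txn:
            b |= 1 << i
    return b

def s_step(b, n):
    """Keep only the bits strictly after the first set bit, within n bits."""
    if b == 0:
        return 0
    first = (b & -b).bit_length() - 1
    return ((1 << n) - 1) & ~((1 << (first + 1)) - 1)

def support_AB(db, items):
    """Customers in which every item in `items` occurs in one common transaction."""
    count = 0
    for seq in db:
        b = (1 << len(seq)) - 1
        for it in items:
            b &= bitmap(seq, it)
        count += b != 0
    return count

def support_A_then_B(db, a, b):
    """Customers with a transaction containing `a` followed later by one containing `b`."""
    return sum(1 for seq in db
               if (s_step(bitmap(seq, a), len(seq)) & bitmap(seq, b)) != 0)

# Toy database: one list of transactions (itemsets) per customer.
db = [[{"A"}, {"B", "C"}, {"B"}],
      [{"A", "B"}, {"C"}],
      [{"B"}, {"A"}]]
print(support_AB(db, ["A", "B"]))        # 1 (only customer 2 buys A and B together)
print(support_A_then_B(db, "A", "B"))    # 1 (only customer 1 buys A and later B)
```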
50

Supporting Data Warehouse Design with Data Mining Approach

Tsai, Tzu-Chao, 06 August 2001
The traditional relational database model does not have enough capability to cope with a great deal of data in finite time. To address these requirements, data warehouses and online analytical processing (OLAP) have emerged. Data warehouses improve the productivity of corporate decision makers through consolidation, conversion, transformation, and integration of operational data, and they support online analytical processing. Data warehouse design is a complex and knowledge-intensive process: it needs to consider not only the structure of the underlying operational databases (source-driven design) but also the information requirements of decision makers (user-driven design). Past research has focused predominantly on supporting the source-driven design process and paid less attention to the user-driven one. The goal of this research is therefore to propose a user-driven data warehouse design support system based on the knowledge discovery approach. Specifically, a Data Warehouse Design Support System is proposed in which generalization hierarchies and generalized star schemas serve as the data warehouse design knowledge, and techniques for learning this design knowledge and reasoning over it are developed. An empirical evaluation study was conducted to validate the effectiveness of the proposed techniques in supporting the data warehouse design process. The results showed that the technique is useful for supporting data warehouse design, especially in reducing missing designs and enhancing potentially useful designs.
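As a small, hypothetical illustration of what a generalization hierarchy contributes to dimensional design (not the support system described above), the sketch below rolls detailed fact rows up an invented location hierarchy, producing the kind of more general view a generalized star schema could capture.

```python
# Hypothetical sketch: roll a detailed fact table up a concept (generalization)
# hierarchy on the location dimension; the aggregated result is the sort of
# generalized view a generalized star schema could record as design knowledge.
from collections import defaultdict

# Invented concept hierarchy: city -> state -> region.
city_to_state = {"Taipei": "Taiwan-N", "Kaohsiung": "Taiwan-S", "Tainan": "Taiwan-S"}
state_to_region = {"Taiwan-N": "North", "Taiwan-S": "South"}

# Detailed facts: (city, product, amount).
facts = [("Taipei", "laptop", 3), ("Kaohsiung", "laptop", 2),
         ("Tainan", "phone", 5), ("Taipei", "phone", 1)]

def generalize(facts, level):
    """Aggregate the amount measure after mapping city up to `level`
    ('city', 'state', or 'region') in the hierarchy."""
    rolled = defaultdict(int)
    for city, product, amount in facts:
        if level == "city":
            loc = city
        elif level == "state":
            loc = city_to_state[city]
        else:
            loc = state_to_region[city_to_state[city]]
        rolled[(loc, product)] += amount
    return dict(rolled)

print(generalize(facts, "state"))
# {('Taiwan-N', 'laptop'): 3, ('Taiwan-S', 'laptop'): 2,
#  ('Taiwan-S', 'phone'): 5, ('Taiwan-N', 'phone'): 1}
```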
