Global ETD Search

1	Empirické porovnání systémů dobývání znalostí z databází / Empirical Comparison of Knowledge Discovery in Databases Systems Dopitová, Kateřina January 2010 (has links) Submitted diploma thesis considers empirical comparison of knowledge discovery in databases systems. Basic terms and methods of knowledge discovery in databases domain are defined and criterions used to system comparison are determined. Tested software products are also shortly described in the thesis. Results of real task processing are brought out for each system. The comparison of individual systems according to previously determined criterions and comparison of competitiveness of commercial and non-commercial knowledge discovery in databases systems are performed within the framework of thesis.
2	Minerador WEB: um estudo sobre mecanismos de descoberta de informações na WEB. / Minerador WEB: a study on mechanisms of discovery of information in the WEB. Toscano, Wagner 10 July 2003 (has links) A Web (WWW - World Wide Web) possui uma grande quantidade e variedade de informações. Isso representa um grande atrativo para que as pessoas busquem alguma informação desejada na Web. Por outo lado, dessa grande quantidade de informações resulta o problema fundamental de como descobrir, de uma maneira eficaz, se a informação desejada está presente na Web e como chegar até ela. A existência de um conjunto de informações que não se permitem acessar com facilidade ou que o acesso é desprovido de ferramentas eficazes de busca da informção, inviabiliza sua utilização. Soma-se às dificuldades no processo de pesquisa, a falta de estrutura das informações da Web que dificulta a aplicação de processos na busca da informação. Neste trabalho é apresentado um estudo de técnicas alternativas de busca da informação, pela aplicação de diversos conceitos relacionados à recuperação da informação e à representação do conhecimento. Mais especificamente, os objetivos são analisar a eficiência resultante da utilização de técnicas complementares de busca da informação, em particular mecanismos de extração de informações a partir de trechos explícitos nos documentos HTML e o uso do método de Naive Bayes na classificação de sites, e analisar a eficácia de um processo de armazenamento de informações extraídas da Web numa base de conhecimento (descrita em lógica de primeira ordem) que, aliada a um conhecimento de fundo, permita respomder a consultas mais complexas que as possíveis por meio do uso de expressões baseadas em palavras-chave e conectivos lógicos. / The World Wide Web (Web) has a huge amount and a large diversity of informations. There is a big appeal to people navigate on the Web to search for a desired information. On the other hand, due to this huge amount of data, we are faced with the fundamental problems of how to discover and how to reach the desired information in a efficient way. If there is no efficient mechanisms to find informations, the use of the Web as a useful source of information becomes very restrictive. Another important problem to overcome is the lack of a regular structure of the information in the Web, making difficult the use of usual information search methods. In this work it is presented a study of alternative techniques for information search. Several concepts of information retrieval and knowledge representation are applied. A primary goal is to analyse the efficiency of information retrieval methods using analysis of extensional information and probabilistic methods like Naive Bayes to classify sites among a pre-defined classes of sites.Another goal is to design a logic based knowledhe base, in order to enable a user to apply more complex queries than queries based simply on expressions using keywouds and logical connectives artificial intelligence computação e sistemas digitais descoberta do conhecimento discovery of the knowledge extração de informação extraction of information inteligência artificial ontologia ontology representação do conhecimento representation of the knowledge WEB mining web mining
3	Minerador WEB: um estudo sobre mecanismos de descoberta de informações na WEB. / Minerador WEB: a study on mechanisms of discovery of information in the WEB. Wagner Toscano 10 July 2003 (has links) A Web (WWW - World Wide Web) possui uma grande quantidade e variedade de informações. Isso representa um grande atrativo para que as pessoas busquem alguma informação desejada na Web. Por outo lado, dessa grande quantidade de informações resulta o problema fundamental de como descobrir, de uma maneira eficaz, se a informação desejada está presente na Web e como chegar até ela. A existência de um conjunto de informações que não se permitem acessar com facilidade ou que o acesso é desprovido de ferramentas eficazes de busca da informção, inviabiliza sua utilização. Soma-se às dificuldades no processo de pesquisa, a falta de estrutura das informações da Web que dificulta a aplicação de processos na busca da informação. Neste trabalho é apresentado um estudo de técnicas alternativas de busca da informação, pela aplicação de diversos conceitos relacionados à recuperação da informação e à representação do conhecimento. Mais especificamente, os objetivos são analisar a eficiência resultante da utilização de técnicas complementares de busca da informação, em particular mecanismos de extração de informações a partir de trechos explícitos nos documentos HTML e o uso do método de Naive Bayes na classificação de sites, e analisar a eficácia de um processo de armazenamento de informações extraídas da Web numa base de conhecimento (descrita em lógica de primeira ordem) que, aliada a um conhecimento de fundo, permita respomder a consultas mais complexas que as possíveis por meio do uso de expressões baseadas em palavras-chave e conectivos lógicos. / The World Wide Web (Web) has a huge amount and a large diversity of informations. There is a big appeal to people navigate on the Web to search for a desired information. On the other hand, due to this huge amount of data, we are faced with the fundamental problems of how to discover and how to reach the desired information in a efficient way. If there is no efficient mechanisms to find informations, the use of the Web as a useful source of information becomes very restrictive. Another important problem to overcome is the lack of a regular structure of the information in the Web, making difficult the use of usual information search methods. In this work it is presented a study of alternative techniques for information search. Several concepts of information retrieval and knowledge representation are applied. A primary goal is to analyse the efficiency of information retrieval methods using analysis of extensional information and probabilistic methods like Naive Bayes to classify sites among a pre-defined classes of sites.Another goal is to design a logic based knowledhe base, in order to enable a user to apply more complex queries than queries based simply on expressions using keywouds and logical connectives computação e sistemas digitais descoberta do conhecimento extração de informação inteligência artificial ontologia representação do conhecimento web mining artificial intelligence discovery of the knowledge extraction of information ontology representation of the knowledge WEB mining
4	Accelerating AI-driven scientific discovery with end-to-end learning and random projection Md Nasim (19471057) 23 August 2024 (has links) <p dir="ltr">Scientific discovery of new knowledge from data can enhance our understanding of the physical world and lead to the innovation of new technologies. AI-driven methods can greatly accelerate scientific discovery and are essential for analyzing and identifying patterns in huge volumes of experimental data. However, current AI-driven scientific discovery pipeline suffers from several inefficiencies including but not limited to lack of <b>precise modeling</b>, lack of <b>efficient learning methods</b>, and lack of <b>human-in-the-loop integrated frameworks</b> in the scientific discovery loop. Such inefficiencies increase resource requirements such as expensive computing infrastructures, significant human expert efforts and subsequently slows down scientific discovery.</p><p dir="ltr">In this thesis, I introduce a collection of methods to address the lack of precise modeling, lack of efficient learning methods and lack of human-in-the-loop integrated frameworks in AI-driven scientific discovery workflow. These methods include automatic physics model learning from partially annotated noisy video data, accelerated partial differential equation (PDE) physics model learning, and an integrated AI-driven platform for rapid analysis of experimental video data. <b>My research has led to the discovery of a new size fluctuation property of material defects</b> exposed to high temperature and high irradiation environments such as inside nuclear reactors. Such discovery is essential for designing strong materials that are critical for energy applications.</p><p dir="ltr">To address the lack of precise modeling of physics learning tasks, I developed NeuraDiff, an end-to-end method for learning phase field physics models from noisy video data. In previous learning approaches involving multiple disjoint steps, errors in one step can propagate to another, thus affecting the accuracy of the learned physics models. Trial-and-error simulation methods for learning physics model parameters are inefficient, heavily dependent on expert intuition and may not yield reasonably accurate physics models even after many trial iterations. By encoding the physics model equations directly into learning, end-to-end NeuraDiff framework can provide <b>~100%</b> accurate tracking of material defects and yield correct physics model parameters. </p><p dir="ltr">To address the lack of efficient methods for PDE physics model learning, I developed Rapid-PDE and Reel. The key idea behind these methods is the random projection based compression of system change signals which are sparse in - either value domain (Rapid-PDE) or, both value and frequency domain (Reel). Experiments show that PDE model training times can be reduced significantly using our Rapid-PDE (<b>50-70%)</b> and Reel (<b>70-98%</b>) methods. </p><p dir="ltr">To address the lack of human-in-the-loop integrated frameworks for high volume experimental data analysis, I developed an integrated framework with an easy-to-use annotation tool. Our interactive AI-driven annotation tool can reduce video annotation times by <b>50-75%</b>, and enables material scientists to scale up the analysis of experimental videos.</p><p dir="ltr"><b>Our framework for analyzing experimental data has been deployed in the real world</b> for scaling up in-situ irradiation experiment video analysis and has played a crucial role in the discovery of size fluctuation of material defects under extreme heat and irradiation. </p> Computer vision Phase field models machine Learning Accelerates Discovery computer vision classification Fourier transform image
5	[en] TEXT CATEGORIZATION: CASE STUDY: PATENT S APPLICATION DOCUMENTS IN PORTUGUESE / [pt] CATEGORIZAÇÃO DE TEXTOS: ESTUDO DE CASO: DOCUMENTOS DE PEDIDOS DE PATENTE NO IDIOMA PORTUGUÊS NEIDE DE OLIVEIRA GOMES 08 January 2015 (has links) [pt] Atualmente os categorizadores de textos construídos por técnicas de aprendizagem de máquina têm alcançado bons resultados, tornando viável a categorização automática de textos. A proposição desse estudo foi a definição de vários modelos direcionados à categorização de pedidos de patente, no idioma português. Para esse ambiente foi proposto um comitê composto de 6 (seis) modelos, onde foram usadas várias técnicas. A base de dados foi constituída de 1157 (hum mil cento e cinquenta e sete) resumos de pedidos de patente, depositados no INPI, por depositantes nacionais, distribuídos em várias categorias. Dentre os vários modelos propostos para a etapa de processamento da categorização de textos, destacamos o desenvolvido para o Método 01, ou seja, o k-Nearest-Neighbor (k-NN), modelo também usado no ambiente de patentes, para o idioma inglês. Para os outros modelos, foram selecionados métodos que não os tradicionais para ambiente de patentes. Para quatro modelos, optou-se por algoritmos, onde as categorias são representadas por vetores centróides. Para um dos modelos, foi explorada a técnica do High Order Bit junto com o algoritmo k- NN, sendo o k todos os documentos de treinamento. Para a etapa de préprocessamento foram implementadas duas técnicas: os algoritmos de stemização de Porter; e o StemmerPortuguese; ambos com modificações do original. Foram também utilizados na etapa do pré-processamento: a retirada de stopwords; e o tratamento dos termos compostos. Para a etapa de indexação foi utilizada principalmente a técnica de pesagem dos termos intitulada: frequência de termos modificada versus frequência de documentos inversa TF -IDF . Para as medidas de similaridade ou medidas de distância destacamos: cosseno; Jaccard; DICE; Medida de Similaridade; HOB. Para a obtenção dos resultados foram usadas as técnicas de predição da relevância e do rank. Dos métodos implementados nesse trabalho, destacamos o k-NN tradicional, o qual apresentou bons resultados embora demande muito tempo computacional. / [en] Nowadays, the text s categorizers constructed based on learning techniques, had obtained good results and the automatic text categorization became viable. The purpose of this study was the definition of various models directed to text categorization of patent s application in Portuguese language. For this environment was proposed a committee composed of 6 (six) models, where were used various techniques. The text base was constituted of 1157 (one thousand one hundred fifty seven) abstracts of patent s applications, deposited in INPI, by national applicants, distributed in various categories. Among the various models proposed for the step of text categorization s processing, we emphasized the one devellopped for the 01 Method, the k-Nearest-Neighbor (k-NN), model also used in the English language patent s categorization environment. For the others models were selected methods, that are not traditional in the English language patent s environment. For four models, there were chosen for the algorithms, centroid vectors representing the categories. For one of the models, was explored the High Order Bit technique together with the k-NN algorithm, being the k all the training documents. For the pre-processing step, there were implemented two techniques: the Porter s stemization algorithm; and the StemmerPortuguese algorithm; both with modifications of the original. There were also used in the pre-processing step: the removal of the stopwards; and the treatment of the compound terms. For the indexing step there was used specially the modified documents term frequency versus documents term inverse frequency TF-IDF . For the similarity or distance measures there were used: cosine; Jaccard; DICE; Similarity Measure; HOB. For the results, there were used the relevance and the rank technique. Among the methods implemented in this work it was emphasized the traditional k-NN, which had obtained good results, although demands much computational time. [pt] CATEGORIZACAO DE TEXTOS [en] TEXT CATEGORIZATION [pt] CLASSIFICACAO DE TEXTOS [en] TEXT CLASSIFICATION [pt] STEMIZACAO [en] STEMMING [en] CENTROID OR PROTOTYPE ALGORITHM

1

Page generated in 0.0731 seconds