481 |
Minerafórum : um recurso de apoio para análise qualitativa em fóruns de discussão / MineraFórum: a support resource for qualitative analysis in discussion forums. Azevedo, Breno Fabrício Terra. January 2011
This thesis presents the development, use and experimentation of the MineraFórum software.
It is a resource that can help teachers carry out qualitative analyses of textual contributions in discussion forums. This research included the use of text mining techniques based on graphs. Message exchanges in discussion forums are an important source of investigation for teachers. By analyzing students’ posts, teachers can identify which learners wrote contributions that cover concepts related to the debate theme, and which students did not. This strategy may also give teachers the elements needed to motivate discussion of concepts relevant to the topic being debated. To accomplish the objectives of this study, a review of the literature was carried out on topics such as: Distance Learning; Virtual Learning Environments; main concepts in Text Mining; and studies related to this thesis. The methodological strategy used in the development of MineraFórum followed these steps: 1) choosing a text mining technique suited to the needs of the research; 2) checking whether software was available to help teachers do qualitative analysis of contributions in discussion forums; 3) conducting preliminary studies to evaluate the selected mining technique; 4) defining indicators of message relevance and elaborating formulas to calculate the relevance of posts; 5) building the system; 6) integrating MineraFórum into three Virtual Learning Environments; and 7) carrying out experiments with the tool.
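The relevance idea described above can be sketched in miniature. This is an illustrative assumption, not the thesis's actual graph-based formulas: here a post is simply scored by the fraction of the discussion theme's concepts it mentions, and the concept set and sample post are invented for the example.

```python
# Minimal sketch of scoring a forum post by how many domain concepts it
# covers, in the spirit of mining textual contributions against a theme.
# The concept set, the post, and the scoring rule are illustrative only.

def concept_coverage(post: str, theme_concepts: set[str]) -> float:
    """Fraction of the theme's concepts mentioned in the post."""
    words = {w.strip(".,;:!?").lower() for w in post.split()}
    covered = theme_concepts & words
    return len(covered) / len(theme_concepts) if theme_concepts else 0.0

theme = {"inheritance", "polymorphism", "encapsulation"}
post = "I think polymorphism and inheritance make code reuse easier."
print(round(concept_coverage(post, theme), 2))  # 2 of 3 concepts covered
```

A teacher-facing tool could then flag posts whose coverage falls below some threshold as candidates for follow-up questions.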
|
482 |
Visual analytics of arsenic in various foods. Johnson, Matilda Olubunmi. 06 1900
Arsenic is a naturally occurring toxic metal, and its presence in food composites is a potential risk to the health of both humans and animals. Arsenic-contaminated groundwater is often used for human and animal consumption and for the irrigation of soils, which can lead to arsenic entering the human food chain. Its health effects include multiple organ damage, cancers, heart disease, diabetes mellitus, hypertension, lung disease and peripheral vascular disease. Research investigations, epidemiologic surveys and total diet studies (market baskets) provide datasets, information and knowledge on the arsenic content of foods. The determination of arsenic concentrations in rice varieties is an active area of research. With the increasing capability to measure arsenic concentrations in foods, large volumes of varied and continuously generated datasets on arsenic in food groups are available.
Visual analytics, which integrates techniques from information visualization and computational data analysis via interactive visual interfaces, presents an approach to enable data on arsenic concentrations to be visually represented.
The goal of this doctoral research in Environmental Science is to address the need for visual analytical decision-support tools on the arsenic content of various foods, with special emphasis on rice. The hypothesis of this research is that software-enabled visual representation and user interaction, facilitated by visual interfaces, will help discover hidden relationships between arsenic content and food categories.
The specific objectives investigated were: (1) Provide insightful visual analytic views of compiled data on arsenic in food categories; (2) Categorize table-ready foods by arsenic content; (3) Compare arsenic content in rice product categories; and (4) Identify informative sentences on arsenic concentrations in rice. The overall research method is secondary data analysis using visual analytics techniques implemented through Tableau Software.
Several datasets were utilized to conduct visual analytical representations of data on arsenic concentrations in foods. These consisted of (i) arsenic concentrations in 459 crop samples; (ii) arsenic concentrations in 328 table-ready foods from multi-year total diet studies; (iii) estimates of daily inorganic arsenic intake for 49 food groups from multi-country total diet studies; (iv) arsenic content in rice product categories for 193 samples of rice and rice products; and (v) 758 sentences extracted from PubMed abstracts on arsenic in rice.
Several key insights were made in this doctoral research. The concentration of inorganic arsenic in instant rice was lower than those of other rice types. The concentration of dimethylarsinic acid (DMA) in wild rice, an aquatic grass, was notably lower than in rice varieties (e.g. 0.0099 ppm versus 0.182 ppm for a long-grain white rice). The categorization of 328 table-ready foods into 12 categories enhances communication on arsenic concentrations. Outlier concentrations of arsenic in rice were observed in views constructed to integrate data from four total diet studies. The 193 rice samples were grouped into two groups using a cut-off level of 3 mcg of inorganic arsenic per serving. The visual analytics views constructed allow users to specify the desired cut-off levels. A total of 86 sentences from 53 PubMed abstracts were identified as informative for arsenic concentrations. The sentences enabled literature curation for arsenic concentration along with additional supporting information, such as the location of the research. An informative sentence provided a global “normal” range of 0.08 to 0.20 mg/kg for arsenic in rice. A visual analytics resource developed was a dashboard that facilitates interaction with the text and a connection to the knowledge base of the PubMed literature database.
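The cut-off grouping described above can be made concrete with a short sketch. The sample values (micrograms of inorganic arsenic per serving) are invented for illustration, and the threshold is a user-adjustable parameter, as in the dashboard views described:

```python
# Sketch of partitioning rice samples by inorganic arsenic per serving,
# with the cut-off exposed as a parameter the user can change.
# Sample names and values are illustrative, not measured data.

def group_by_cutoff(samples: dict[str, float], cutoff: float = 3.0):
    """Split samples into (below cutoff, at-or-above cutoff) groups."""
    low = {k: v for k, v in samples.items() if v < cutoff}
    high = {k: v for k, v in samples.items() if v >= cutoff}
    return low, high

samples = {"instant rice": 1.5, "brown rice": 4.2,
           "white rice": 2.8, "rice cakes": 5.1}
low, high = group_by_cutoff(samples)
print(sorted(high))  # → ['brown rice', 'rice cakes']
```

Re-running with a different `cutoff` reproduces the interactive behavior of letting users specify their own threshold.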
The research reported provides a foundation for additional investigations on visual analytics of data on arsenic concentrations in foods. Considering the massive and complex data associated with contaminants in foods, the development of visual analytics tools is needed to facilitate diverse human cognitive tasks. Visual analytics tools can provide the integrated automated analysis, interaction with data, and data visualization critically needed to enhance decision making. Stakeholders that would benefit include consumers, food and health safety personnel, farmers, and food producers. The arsenic content of baby foods warrants attention because early-life exposures could have lifetime adverse health consequences. The action of microorganisms in the soil is associated with the availability of arsenic species for uptake by plants. Genomic data on microbial communities presents a wealth of data for identifying mitigation strategies for arsenic uptake by plants. Arsenic metabolism pathways encoded in microbial genomes warrant further research. Visual analytics tasks could facilitate the discovery of biological processes for mitigating arsenic uptake from soil. The increasing availability of central resources on data from total diet studies and research investigations creates a need for personnel with diverse levels of skill in data management and analysis. Training workshops and courses on the foundations and applications of visual analytics can contribute to global workforce development in food safety and environmental health. Research investigations could determine the learning gains accomplished through hardware and software for visual analytics. Finally, there is a need to develop and evaluate informatics tools with visual analytics capabilities in the domain of contaminants in foods. / Environmental Sciences / P. Phil. (Environmental Science)
|
483 |
Epistemologia da Informática em Saúde: entre a teoria e a prática / Epistemology of Medical Informatics: between theory and practice. Colepícolo, Eliane [UNIFESP]. 26 March 2008
CONTEXT. The goal of this research is to understand the epistemology of the field of Medical Informatics (MI) through a comparative study of the theoretical and practical aspects of the discipline. MATERIALS AND METHODS. The study was divided into three stages: a statistical study, a terminological study and an epistemological study. The statistical study involved the development and use of a robot to extract metadata from scientific articles in the PubMed database, as well as text mining of the article abstracts, used for statistics and subsequent analysis. The terminological study aimed at the development of a thesaurus specialized in MI, here named EpistemIS, which, integrated with MeSH, served as the basis for the statistical study. The epistemological study began with the study of the metaconcepts of human action and thought (MAPHs): art, technique, science, technology and technoscience. Next, an epistemological method based on the works of Mario Bunge was developed for the epistemological classification of the field's concepts drawn from the EpistemIS thesaurus. An opinion survey of the field's scientific community was conducted through a web questionnaire. RESULTS. The following were obtained: a characterization of the MAPHs, maps systematizing knowledge in MI, epistemological and MAPH classifications of MI, a map of knowledge in MI, and the community's consensus on the epistemology of MI. Finally, statistics were computed on the epistemological and MAPH classifications in MI, and on the integration between the analysis corpus (437,289 PubMed articles) and the EpistemIS thesaurus. CONCLUSION. From theoretical and practical arguments, it was concluded that Medical Informatics is a technoscience devoted to solving problems in the domains of the Life Sciences, Health Sciences and Health Care, through interdisciplinary scientific research and the development of technology for use in society. / TEDE
|
484 |
Analyse des médias sociaux de santé pour évaluer la qualité de vie des patientes atteintes d’un cancer du sein / Analysis of social health media to assess the quality of life of breast cancer patients. Tapi Nzali, Mike Donald. 28 September 2017
In 2015, the number of new cases of breast cancer in France was 54,000. The survival rate five years after diagnosis is 89%. While modern treatments save lives, some are difficult to bear. Many clinical research projects have therefore focused on quality of life (QoL), which refers to the perception that patients have of their diseases and their treatments. QoL is a relevant clinical evaluation criterion for assessing the advantages and disadvantages of treatments, both for the patient and for the health system. In this thesis, we focus on the stories patients tell in social media about their health, in order to better understand their perception of QoL. This new mode of communication is very popular among patients because it is associated with a great freedom of speech, induced by the anonymity provided by these websites. The originality of this thesis is to use and extend social media mining methods for the French language. The main contributions of this work are: (1) construction of a patient/doctor vocabulary; (2) detection of topics discussed by patients; (3) analysis of the sentiments of messages posted by patients; and (4) combination of the different contributions to quantify patients' discourse. First, we used the patients' texts to construct a patient/doctor vocabulary specific to the field of breast cancer, collecting various types of non-expert expressions related to the disease and linking them to the biomedical terms used by health care professionals. We combined several methods from the literature based on linguistic and statistical approaches. To evaluate the relationships obtained, we used automatic and manual validations.
Then, we transformed the constructed resource into a human- and machine-readable format by creating a SKOS ontology, which was integrated into the BioPortal platform. Second, we used and extended methods from the literature to detect the different topics discussed by patients in social media and to relate them to the functional and symptomatic dimensions of the QoL self-report questionnaires (EORTC QLQ-C30 and EORTC QLQ-BR23). To detect the topics discussed by patients, we applied the unsupervised LDA topic model with relevant preprocessing. Then, we applied a customized Jaccard coefficient to automatically compute the similarity between the topics detected with LDA and the items in the self-report questionnaires. We thus detected new emerging topics from social media that could be used to complement current QoL questionnaires. This work confirms that social media can be an important source of information for the study of QoL in the field of cancer. Third, we focused on the extraction of sentiments (polarity and emotions). For this, we evaluated different methods and resources for sentiment classification in French. These experiments determined useful features for sentiment classification in different types of texts, including texts from health forums. Finally, we used the different methods proposed in this thesis to quantify the topics and sentiments identified in health social media. Overall, this work has opened promising perspectives on various social media analysis tasks for the French language, and in particular on studying the QoL of patients from health forums.
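The topic-to-questionnaire matching step can be illustrated with a plain Jaccard coefficient standing in for the customized coefficient used in the thesis; the topic and questionnaire-item word sets below are invented examples:

```python
# Sketch of matching a discovered topic to a quality-of-life questionnaire
# item by word-set overlap. A plain Jaccard coefficient is used here as a
# stand-in for the customized coefficient described in the thesis.

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

topic = {"pain", "sleep", "night", "tired"}  # top words of an LDA topic
item = {"trouble", "sleep", "night"}         # words of a questionnaire item
print(round(jaccard(topic, item), 2))  # → 0.4
```

Topics whose best similarity to every questionnaire item falls below a chosen threshold would be candidates for the "new emerging topics" the thesis reports.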
|
485 |
Seleção de atributos para classificação de textos usando técnicas baseadas em agrupamento, PoS tagging e algoritmos evolutivos / Feature selection for text classification using techniques based on clustering, PoS tagging and evolutionary algorithms. Ferreira, Charles Henrique Porto. January 2016
Orientadora: Profa. Dra. Debora Maria Rossi de Medeiros / Dissertação (mestrado) - Universidade Federal do ABC, Programa de Pós-Graduação em Ciência da Computação, 2016. / This work investigates feature selection techniques to be applied to the text
classification task. Three different techniques are proposed for comparison with traditional text preprocessing techniques. The first technique proposes that not all grammatical classes of a given language are relevant when a text is subjected to the classification task. The second technique employs feature clustering and genetic algorithms to select groups. In the third technique, two hypotheses are raised: the first assumes that words occurring more frequently in a text collection than in the language as a whole may be the most important words to compose the features; the second assumes that the relationship of each data instance with each class can compose a new set of features.
The results suggest that the proposed approaches are promising and that the hypotheses may be valid. The experiments with the first approach show that there is a set of grammatical classes whose words can be disregarded from the feature set in different datasets while maintaining or even improving classification accuracy. The second approach achieves a strong reduction in the original number of features and still improves classification accuracy. The third approach yielded the most pronounced reduction in the number of features because, by the nature of the proposal, the final number of features equals the number of classes in the dataset, and the impact on accuracy was null or even positive.
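The third approach's class-relationship representation can be sketched as follows. This is an illustrative assumption: here the "relationship" of an instance to a class is taken to be cosine similarity against a bag-of-words class centroid, which need not be the dissertation's exact measure, and the two classes and the document are invented.

```python
# Sketch of re-representing a document by its relationship to each class,
# so the final number of features equals the number of classes.
# Cosine similarity to a word-count centroid is an assumed concrete choice.
from collections import Counter
import math

def bow(text: str) -> Counter:
    """Bag-of-words counts for a text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

classes = {
    "sports": bow("goal match team score win"),
    "finance": bow("market stock price trade profit"),
}
doc = "the team played the match and scored a goal"
features = [cosine(bow(doc), centroid) for centroid in classes.values()]
print(len(features))  # → 2: one feature per class
```

With this representation, any downstream classifier operates on a feature vector whose dimensionality is fixed by the number of classes, which explains the pronounced reduction reported.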
|
486 |
De l'extraction des connaissances à la recommandation / From knowledge extraction to recommendation. Duthil, Benjamin. 3 December 2012
Information Technology and the success of its related services (blogs, forums, etc.) have paved the way for the massive expression of opinions on the most varied subjects (e-commerce websites, art reviews, etc.).
This abundance of opinions can appear as a real gold mine for internet users, but it can also be a source of indecision, because the available opinions may be ill-assorted if not contradictory. Reliable and relevant management of the information contained in these opinions requires systems able to directly analyze the content of opinions expressed in natural language, in order to control subjectivity in the evaluation process and avoid the smoothing effects of statistical treatments. Most so-called recommender systems are unable to manage all the semantic richness of a review and instead associate with it an assessment system that demands substantial involvement and specific competences from the internet user. Our aim is to minimize user intervention in the collaborative functioning of recommender systems through automated processing, by the recommender system itself, of the available reviews in natural language. Our unsupervised topic segmentation method extracts the subjects of interest from the reviews, and our sentiment analysis approach then computes the opinion expressed on these criteria. These knowledge extraction methods, combined with multicriteria analysis techniques adapted to the fusion of expert assessments, should contribute to the emergence of a new generation of more relevant, reliable and personalized recommender systems.
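The final quantification step of such a pipeline can be sketched as follows; the tiny polarity lexicon and the pre-segmented review sentences are invented stand-ins for the unsupervised segmentation and sentiment techniques the thesis actually develops:

```python
# Sketch of scoring the opinion expressed on each extracted criterion of
# a review, using a toy polarity lexicon. Both the lexicon and the
# topic-segmented sentences below are illustrative assumptions.
LEXICON = {"great": 1, "excellent": 1, "boring": -1, "bad": -1}

def opinion(sentences: list[str]) -> int:
    """Sum of lexicon polarities over all words of the given sentences."""
    return sum(LEXICON.get(w.lower().strip(".,!"), 0)
               for s in sentences for w in s.split())

segments = {
    "acting": ["The acting was excellent."],
    "plot": ["A boring plot, frankly bad."],
}
scores = {topic: opinion(sents) for topic, sents in segments.items()}
print(scores)  # → {'acting': 1, 'plot': -2}
```

The per-criterion scores produced this way are the kind of input a multicriteria fusion step could then aggregate across many reviewers.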
|
487 |
Serendipity: prospecção semântica de dados qualitativos em Educação Especial / Serendipity: semantic prospecting of qualitative data in Special Education. Fernandes, Woquiton Lima. 22 August 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / In the past decades, there has been a revolution in the way science has been
conducted. The current context demands ever more collaborative work, such as studies in large-scale research networks. One of the essential marks of change in this new way of doing science is the intense use of Information and Communication Technologies (ICT), known as “eScience”, which today plays a fundamental role in the methodology adopted by many research groups around the world. This led us to reflect on the in-depth analysis of qualitative data, particularly in research on Special Education. The biggest challenge identified was to advance the analysis of qualitative data using information technologies without losing the subjectivity involved in the research, and to broaden the capacity to scrutinize the data without losing the freedom to come and go, to critique and establish one's own reflections, respecting subjective positions and, above all, maintaining scientific rigor. In this sense, the main objective of this work is to evaluate a proposed technological architecture for qualitative data analysis, based on text mining theories, computational ontology and semantic annotation techniques, applied to special education research, in order to analyze the limits and possibilities of this methodological approach. Our methodology was based on the construction of a prototype, named Serendipity, grounded in the perspective of software engineering, from which we drew the main techniques that could define a safe method for the design, implementation and deployment of the solution. The methodology worked in cycles, allowing requirements to be modified and improvements to be established, feeding the process back with new analyses. The text mining process relied on extracting knowledge from textual databases that have little or no data structure. The computational ontology was the element able to reconstruct the syntactic representation and give it meaning. The words (data) are related and placed within a context of formal knowledge, endowing them with a semantic and cognitive capacity and building concepts open to interpretation, comprehension and common understanding; to this end, we built an ontology specific to Special Education. Semantic annotation helped attach content to the text to describe its semantics, allowing software agents to retrieve information more precisely by associating the document with the ontology in a conception of semantic fields. We also built a dictionary customized for Special Education to relate terms to synonyms and expressions associated with the ontology. For visualization beyond the semantic classes, we used automatic concept maps to establish relationships between concepts within a hierarchical structure of propositions. Finally, to assess the proposal, we used part of the data collected by the National Observatory of Special Education: transcribed texts on training from five municipalities, one from each region of Brazil. The results show limits already acknowledged in the proposal, which did not aim at a subjective, in-depth analysis yielding extremely precise results. The researcher is, and always will be, the driving agent of the process flow and, whether or not relying on computational tools, is not entirely immune to error. The Serendipity proposal takes a step forward in automating data analysis and can be applied to large volumes of data without losing the researcher's subjectivity. However, new human and technological resources must be added to contribute to its improvement, and other areas should be encouraged to develop domain ontologies with their experts and to build specific dictionaries. Therefore, despite its limitations, the approach has shown significant advances in the semantic exploration of qualitative data in the Special Education field and can be adapted to other areas and fields of
knowledge.
textos transcritos acerca da Formação em cinco cidades, sendo uma de cada região
do Brasil. Os resultados evidenciam limites já reconhecidos na proposta e, neste
aspecto, não teve a pretensão de determinar uma análise subjetiva e detalhista, que
a rigor, permita resultados de extrema precisão. Destaca que o pesquisador é e
sempre será o condutor livre do funcionamento do processo e contando, ou não,
com ferramentas computacionais ele pode cometer erros. A proposta do serendipity
deu um passo no processo automático de análise de dados, podendo ser
aproveitada em big data, pesquisas de nível nacional, sem perder a subjetividade do pesquisador. Para isto é preciso agregar novos recursos humanos e tecnológicos
que contribuam em seu aprimoramento. Estimular outras áreas a desenvolverem
ontologias de domínio com seus especialistas e a evolução dos dicionários
específicos. Portanto, apesar de seus limites, a abordagem possui avanços
significativos na prospecção semântica de dados qualitativos em Educação Especial
e passível de adaptação a outras áreas de conhecimento.
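The dictionary-backed semantic annotation this entry describes (a custom dictionary relating terms and synonyms to ontology classes, used to tag transcripts) can be sketched roughly as follows. This is a hypothetical illustration only: every ontology class, term and synonym below is invented, not taken from the Serendipity prototype.

```python
# Minimal sketch of dictionary-backed semantic annotation: a custom
# dictionary maps terms and their synonyms to ontology classes, and the
# annotator tags every dictionary hit found in a text. All class and
# term names are invented for illustration.

ONTOLOGY_DICTIONARY = {
    "inclusive education": "SpecialEducation:TeachingPractice",
    "teacher training":    "SpecialEducation:TeacherEducation",
    "in-service training": "SpecialEducation:TeacherEducation",  # synonym
    "resource room":       "SpecialEducation:SupportService",
}

def annotate(text):
    """Return (term, ontology_class, position) for each dictionary term
    found in the text, mimicking semantic annotation of a transcript."""
    lowered = text.lower()
    annotations = []
    for term, onto_class in ONTOLOGY_DICTIONARY.items():
        start = lowered.find(term)
        while start != -1:
            annotations.append((term, onto_class, start))
            start = lowered.find(term, start + len(term))
    return sorted(annotations, key=lambda a: a[2])

sample = ("Teacher training for inclusive education was discussed, "
          "including in-service training in the resource room.")
for term, cls, pos in annotate(sample):
    print(f"{pos:3d}  {term!r} -> {cls}")
```

Grouping the resulting annotations by ontology class is what yields the semantic fields the abstract mentions; concept maps can then be drawn over the class hierarchy.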
|
489 |
On text mining to identify gene networks with a special reference to cardiovascular disease / Identifiering av genetiska nätverk av betydelse för kärlförkalkning med hjälp av automatisk textsökning i Medline, en medicinsk litteraturdatabas. Strandberg, Per Erik January 2005 (has links)
The rate at which articles get published grows exponentially, and the possibility of accessing texts in machine-readable formats is also increasing. The need for automated systems that gather relevant information from text, i.e. text mining, is thus growing. The goal of this thesis is to find a biologically relevant gene network for atherosclerosis, the main cause of cardiovascular disease, by inspecting gene co-occurrences in abstracts from PubMed. In addition, gene nets for yeast were generated to evaluate the validity of text mining as a method. The nets found were validated in many ways; for example, they were found to have the well-known power-law link distribution. They were also compared to gene nets from different sources, generated by other, often microbiological, methods. In addition to classic measures of similarity such as overlap, precision, recall and F-score, a new way to measure similarity between nets is proposed and used. The method uses an urn approximation and measures, in standard deviations, the distance from the result of comparing two unrelated nets. The validity of this approximation is supported both analytically and with simulations, for both Erdős–Rényi nets and nets with a power-law link distribution. The new method explains how a very poor overlap, precision, recall and F-score can still be very far from random, and also how much overlap one could expect at random. The cutoff was also investigated. Results are typically on the order of only 1% overlap, yet at the remarkable distance of 100 standard deviations from what one could have expected at random. Of particular interest is that one can only expect an overlap of 2 edges, with a variance of 2, when comparing two trees on the same set of nodes. The use of a cutoff of one for co-occurrence graphs is discussed and motivated by, for example, the observation that it eliminates about 60-70% of the false positives but only 20-30% of the overlapping edges.
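The co-occurrence step described above can be sketched as follows. This is a simplified illustration under assumed details: the gene symbols, the plain substring matching and the exact cutoff semantics are examples, not taken from the thesis.

```python
from itertools import combinations
from collections import Counter

def cooccurrence_net(abstracts, genes, cutoff=1):
    """Build a gene net by linking two genes whenever they appear in the
    same abstract; keep only edges seen more than `cutoff` times."""
    counts = Counter()
    for text in abstracts:
        # Naive matching for illustration: a gene "occurs" if its symbol
        # appears as a substring of the upper-cased abstract.
        found = sorted({g for g in genes if g in text.upper()})
        for pair in combinations(found, 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n > cutoff}

abstracts = [
    "APOE and LDLR variants in atherosclerosis ...",
    "LDLR expression modulated by APOE genotype ...",
    "APOB alone in an unrelated context ...",
]
net = cooccurrence_net(abstracts, ["APOE", "LDLR", "APOB"], cutoff=1)
print(net)   # {('APOE', 'LDLR'): 2} -- the pair co-occurring twice
```

With `cutoff=1`, edges supported by a single abstract are dropped, mirroring the observation above that a cutoff of one removes most false positives while keeping most overlapping edges.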
This thesis shows that text mining of PubMed can be used to generate a biologically relevant subnet of the human gene net. A reasonable extension of this work is to combine the nets with gene-expression data to obtain a more reliable gene net.
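The urn approximation and its standard-deviation distance can be illustrated numerically. The sketch below assumes a plain hypergeometric model of edge overlap and is not the thesis's exact derivation, but it reproduces the two-trees observation quoted above.

```python
from math import comb, sqrt

def overlap_stats(n_nodes, k1, k2):
    """Urn (hypergeometric) approximation: mean and variance of the edge
    overlap when two edge sets of sizes k1 and k2 are drawn at random
    from all C(n, 2) possible edges on the same node set."""
    m = comb(n_nodes, 2)                      # size of the urn
    mean = k1 * k2 / m
    var = k2 * (k1 / m) * (1 - k1 / m) * (m - k2) / (m - 1)
    return mean, var

def z_score(observed, n_nodes, k1, k2):
    """Distance, in standard deviations, of an observed overlap from
    what comparing two unrelated nets would give."""
    mean, var = overlap_stats(n_nodes, k1, k2)
    return (observed - mean) / sqrt(var)

# Two spanning trees on the same 1000 nodes: n - 1 edges each.
mean, var = overlap_stats(1000, 999, 999)
print(f"mean={mean:.3f} var={var:.3f}")   # both close to 2, as noted above
```

This makes the abstract's point concrete: a tiny absolute overlap can still sit a huge number of standard deviations away from the random expectation.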
|
490 |
Um data warehouse de publicações científicas: indexação automática da dimensão tópicos de pesquisa dos data marts / A data warehouse for scientific publications: automatic indexing of the research topic dimension for use in data marts. Augusto Kanashiro 04 May 2007 (has links)
Este trabalho de mestrado insere-se no contexto do projeto de uma Ferramenta Inteligente de Apoio à Pesquisa (FIP), sendo desenvolvida no Laboratório de Inteligência Computacional do ICMC-USP. A ferramenta foi proposta para recuperar, organizar e minerar grandes conjuntos de documentos científicos (na área de computação). Nesse contexto, faz-se necessário um repositório de artigos para a FIP. Ou seja, um Data Warehouse que armazene e integre todas as informações extraídas dos documentos recuperados de diferentes páginas pessoais, institucionais e de repositórios de artigos da Web. Para suportar o processamento analítico on-line (OLAP) das informações e facilitar a "mineração" desses dados é importante que os dados estejam armazenados apropriadamente. Dessa forma, o trabalho de mestrado teve como objetivo principal projetar um Data Warehouse (DW) para a ferramenta FIP e, adicionalmente, realizar experimentos com técnicas de mineração e Aprendizado de Máquina para automatizar o processo de indexação das informações e documentos armazenados no data warehouse (descoberta de tópicos). Para as consultas multidimensionais foram construídos data marts de forma a permitir aos pesquisadores avaliar tendências e a evolução de tópicos de pesquisa / This dissertation is related to the project of an Intelligent Tool for Research Supporting (FIP), being developed at the Laboratory of Computational Intelligence at ICMC-USP. The tool was proposed to retrieve, organize, and mine large sets of scientific documents in the field of computer science. In this context, a repository of articles becomes necessary, i.e., a Data Warehouse that integrates and stores all information extracted from documents retrieved from different personal and institutional web pages, and from article repositories. Appropriately stored data is decisive for supporting online analytical processing (OLAP) and "data mining" processes.
Thus, the main goal of this MSc research was to design the FIP Data Warehouse (DW). Additionally, we carried out experiments with Data Mining and Machine Learning techniques in order to automate the indexing of the information and documents stored in the data warehouse (Topic Detection). Data marts for multidimensional queries were designed to help researchers evaluate trends and the evolution of research topics.
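A data-mart query of the kind described, over a research-topic dimension, can be sketched with a minimal star schema. All table names, topics and figures below are invented for illustration and are not the actual FIP schema.

```python
import sqlite3

# Hypothetical, minimal star schema for a publications DW: one fact
# table keyed to a research-topic dimension and a time dimension.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_topic (topic_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time  (time_id  INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_publication (
    pub_id   INTEGER PRIMARY KEY,
    topic_id INTEGER REFERENCES dim_topic,
    time_id  INTEGER REFERENCES dim_time
);
INSERT INTO dim_topic VALUES (1, 'text mining'), (2, 'machine learning');
INSERT INTO dim_time  VALUES (1, 2005), (2, 2006);
INSERT INTO fact_publication VALUES
    (1, 1, 1), (2, 1, 2), (3, 1, 2), (4, 2, 2);
""")

# Publications per topic per year: the kind of trend a researcher
# would inspect through a data mart.
rows = db.execute("""
    SELECT t.name, d.year, COUNT(*) AS n
    FROM fact_publication f
    JOIN dim_topic t ON t.topic_id = f.topic_id
    JOIN dim_time  d ON d.time_id  = f.time_id
    GROUP BY t.name, d.year
    ORDER BY t.name, d.year
""").fetchall()
for name, year, n in rows:
    print(name, year, n)
```

Automatic topic detection would populate `dim_topic` and the fact table's `topic_id` column, which is what makes such queries possible without manual indexing.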
|