481 |
Minerafórum : um recurso de apoio para análise qualitativa em fóruns de discussão / MineraFórum: a support resource for qualitative analysis in discussion forums. Azevedo, Breno Fabrício Terra. January 2011
This thesis presents the development, use and experimentation of the MineraFórum software.
It is a resource that can help teachers carry out qualitative analyses of textual contributions in discussion forums. This research included the use of text mining techniques based on graphs. Message exchanges in discussion forums are an important source of investigation for teachers. By analyzing students’ posts, teachers can identify which learners wrote contributions that cover concepts related to the debate theme, and which students did not. This strategy may also give teachers the elements needed to motivate discussion of concepts relevant to the topic being debated. To accomplish the objectives of this study, a review of the literature was carried out on topics such as: Distance Learning; Virtual Learning Environments; main concepts in Text Mining; and studies related to this thesis. The methodological strategy used in the development of MineraFórum followed these steps: 1) choosing a text mining technique suited to the needs of the research; 2) checking whether software was available to help teachers do qualitative analysis of contributions in discussion forums; 3) conducting preliminary studies to evaluate the selected mining technique; 4) defining indicators of message relevance and elaborating formulas to calculate the relevance of posts; 5) building the system; 6) integrating MineraFórum into three Virtual Learning Environments; and 7) carrying out experiments with the tool.
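The relevance idea described above can be sketched in miniature. This is an illustrative assumption, not the thesis's actual graph-based formulas: here a post is simply scored by the fraction of the discussion theme's concepts it mentions, and the concept set and sample post are invented for the example.

```python
# Minimal sketch of scoring a forum post by how many domain concepts it
# covers, in the spirit of mining textual contributions against a theme.
# The concept set, the post, and the scoring rule are illustrative only.

def concept_coverage(post: str, theme_concepts: set[str]) -> float:
    """Fraction of the theme's concepts mentioned in the post."""
    words = {w.strip(".,;:!?").lower() for w in post.split()}
    covered = theme_concepts & words
    return len(covered) / len(theme_concepts) if theme_concepts else 0.0

theme = {"inheritance", "polymorphism", "encapsulation"}
post = "I think polymorphism and inheritance make code reuse easier."
print(round(concept_coverage(post, theme), 2))  # 2 of 3 concepts covered
```

A teacher-facing tool could then flag posts whose coverage falls below some threshold as candidates for follow-up questions.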
|
482 |
Visual analytics of arsenic in various foods. Johnson, Matilda Olubunmi. 06 1900
Arsenic is a naturally occurring toxic metal, and its presence in food composites is a potential risk to the health of both humans and animals. Arsenic-contaminated groundwater is often used for human and animal consumption and for the irrigation of soils, which can lead to arsenic entering the human food chain. Its health effects include multiple organ damage, cancers, heart disease, diabetes mellitus, hypertension, lung disease and peripheral vascular disease. Research investigations, epidemiologic surveys and total diet studies (market baskets) provide datasets, information and knowledge on the arsenic content of foods. The determination of arsenic concentrations in rice varieties is an active area of research. With the increasing capability to measure arsenic concentrations in foods, large volumes of varied and continuously generated datasets on arsenic in food groups are available.
Visual analytics, which integrates techniques from information visualization and computational data analysis via interactive visual interfaces, presents an approach to enable data on arsenic concentrations to be visually represented.
The goal of this doctoral research in Environmental Science is to address the need for visual analytical decision-support tools on the arsenic content of various foods, with special emphasis on rice. The hypothesis of this research is that software-enabled visual representation and user interaction, facilitated by visual interfaces, will help discover hidden relationships between arsenic content and food categories.
The specific objectives investigated were: (1) Provide insightful visual analytic views of compiled data on arsenic in food categories; (2) Categorize table-ready foods by arsenic content; (3) Compare arsenic content in rice product categories; and (4) Identify informative sentences on arsenic concentrations in rice. The overall research method is secondary data analysis using visual analytics techniques implemented through Tableau Software.
Several datasets were utilized to conduct visual analytical representations of data on arsenic concentrations in foods. These consisted of (i) arsenic concentrations in 459 crop samples; (ii) arsenic concentrations in 328 table-ready foods from multi-year total diet studies; (iii) estimates of daily inorganic arsenic intake for 49 food groups from multi-country total diet studies; (iv) arsenic content in rice product categories for 193 samples of rice and rice products; and (v) 758 sentences extracted from PubMed abstracts on arsenic in rice.
Several key insights were made in this doctoral research. The concentration of inorganic arsenic in instant rice was lower than those of other rice types. The concentration of dimethylarsinic acid (DMA) in wild rice, an aquatic grass, was notably lower than in rice varieties (e.g. 0.0099 ppm versus 0.182 ppm for a long-grain white rice). The categorization of 328 table-ready foods into 12 categories enhances communication on arsenic concentrations. Outlier concentrations of arsenic in rice were observed in views constructed to integrate data from four total diet studies. The 193 rice samples were grouped into two groups using a cut-off level of 3 mcg of inorganic arsenic per serving. The visual analytics views constructed allow users to specify the desired cut-off levels. A total of 86 sentences from 53 PubMed abstracts were identified as informative for arsenic concentrations. The sentences enabled literature curation for arsenic concentration along with additional supporting information, such as the location of the research. An informative sentence provided a global “normal” range of 0.08 to 0.20 mg/kg for arsenic in rice. A visual analytics resource developed was a dashboard that facilitates interaction with the text and a connection to the knowledge base of the PubMed literature database.
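The cut-off grouping described above can be made concrete with a short sketch. The sample values (micrograms of inorganic arsenic per serving) are invented for illustration, and the threshold is a user-adjustable parameter, as in the dashboard views described:

```python
# Sketch of partitioning rice samples by inorganic arsenic per serving,
# with the cut-off exposed as a parameter the user can change.
# Sample names and values are illustrative, not measured data.

def group_by_cutoff(samples: dict[str, float], cutoff: float = 3.0):
    """Split samples into (below cutoff, at-or-above cutoff) groups."""
    low = {k: v for k, v in samples.items() if v < cutoff}
    high = {k: v for k, v in samples.items() if v >= cutoff}
    return low, high

samples = {"instant rice": 1.5, "brown rice": 4.2,
           "white rice": 2.8, "rice cakes": 5.1}
low, high = group_by_cutoff(samples)
print(sorted(high))  # → ['brown rice', 'rice cakes']
```

Re-running with a different `cutoff` reproduces the interactive behavior of letting users specify their own threshold.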
The research reported provides a foundation for additional investigations on visual analytics of data on arsenic concentrations in foods. Considering the massive and complex data associated with contaminants in foods, the development of visual analytics tools is needed to facilitate diverse human cognitive tasks. Visual analytics tools can provide the integrated automated analysis, interaction with data, and data visualization critically needed to enhance decision making. Stakeholders that would benefit include consumers, food and health safety personnel, farmers, and food producers. The arsenic content of baby foods warrants attention because early-life exposures could have lifetime adverse health consequences. The action of microorganisms in the soil is associated with the availability of arsenic species for uptake by plants. Genomic data on microbial communities presents a wealth of data for identifying mitigation strategies for arsenic uptake by plants. Arsenic metabolism pathways encoded in microbial genomes warrant further research. Visual analytics tasks could facilitate the discovery of biological processes for mitigating arsenic uptake from soil. The increasing availability of central resources on data from total diet studies and research investigations creates a need for personnel with diverse levels of skill in data management and analysis. Training workshops and courses on the foundations and applications of visual analytics can contribute to global workforce development in food safety and environmental health. Research investigations could determine the learning gains accomplished through hardware and software for visual analytics. Finally, there is a need to develop and evaluate informatics tools with visual analytics capabilities in the domain of contaminants in foods. / Environmental Sciences / P. Phil. (Environmental Science)
|
483 |
Epistemologia da Informática em Saúde: entre a teoria e a prática / Epistemology of Medical Informatics: between theory and practice. Colepícolo, Eliane [UNIFESP]. 26 March 2008
CONTEXT. The goal of this research is to understand the epistemology of the field of Medical Informatics (MI) through a comparative study of the theoretical and practical aspects of the discipline. MATERIALS AND METHODS. The study was divided into three stages: a statistical study, a terminological study and an epistemological study. The statistical study involved the development and use of a robot to extract metadata from scientific articles in the PubMed database, as well as text mining of the article abstracts, used for statistics and subsequent analysis. The terminological study aimed at the development of a thesaurus specialized in MI, here named EpistemIS, which, integrated with MeSH, served as the basis for the statistical study. The epistemological study began with the study of the metaconcepts of human action and thought (MAPHs): art, technique, science, technology and technoscience. Next, an epistemological method based on the works of Mario Bunge was developed for the epistemological classification of the field's concepts drawn from the EpistemIS thesaurus. An opinion survey of the field's scientific community was conducted through a web questionnaire. RESULTS. The following were obtained: a characterization of the MAPHs, maps systematizing knowledge in MI, epistemological and MAPH classifications of MI, a map of knowledge in MI, and the community's consensus on the epistemology of MI. Finally, statistics were computed on the epistemological and MAPH classifications in MI, and on the integration between the analysis corpus (437,289 PubMed articles) and the EpistemIS thesaurus. CONCLUSION. From theoretical and practical arguments, it was concluded that Medical Informatics is a technoscience devoted to solving problems in the domains of the Life Sciences, Health Sciences and Health Care, through interdisciplinary scientific research and the development of technology for use in society. / TEDE
|
484 |
Analyse des médias sociaux de santé pour évaluer la qualité de vie des patientes atteintes d’un cancer du sein / Analysis of social health media to assess the quality of life of breast cancer patients. Tapi Nzali, Mike Donald. 28 September 2017
In 2015, the number of new cases of breast cancer in France was 54,000. The survival rate five years after diagnosis is 89%. While modern treatments save lives, some are difficult to bear. Many clinical research projects have therefore focused on quality of life (QoL), which refers to the perception that patients have of their diseases and their treatments. QoL is a relevant clinical evaluation criterion for assessing the advantages and disadvantages of treatments, both for the patient and for the health system. In this thesis, we focus on the stories patients tell in social media about their health, in order to better understand their perception of QoL. This new mode of communication is very popular among patients because it is associated with a great freedom of speech, induced by the anonymity provided by these websites. The originality of this thesis is to use and extend social media mining methods for the French language. The main contributions of this work are: (1) construction of a patient/doctor vocabulary; (2) detection of topics discussed by patients; (3) analysis of the sentiments of messages posted by patients; and (4) combination of the different contributions to quantify patients' discourse. First, we used the patients' texts to construct a patient/doctor vocabulary specific to the field of breast cancer, collecting various types of non-expert expressions related to the disease and linking them to the biomedical terms used by health care professionals. We combined several methods from the literature based on linguistic and statistical approaches. To evaluate the relationships obtained, we used automatic and manual validations.
Then, we transformed the constructed resource into a human- and machine-readable format by creating a SKOS ontology, which was integrated into the BioPortal platform. Second, we used and extended methods from the literature to detect the different topics discussed by patients in social media and to relate them to the functional and symptomatic dimensions of the QoL self-report questionnaires (EORTC QLQ-C30 and EORTC QLQ-BR23). To detect the topics discussed by patients, we applied the unsupervised LDA topic model with relevant preprocessing. Then, we applied a customized Jaccard coefficient to automatically compute the similarity between the topics detected with LDA and the items in the self-report questionnaires. We thus detected new emerging topics from social media that could be used to complement current QoL questionnaires. This work confirms that social media can be an important source of information for the study of QoL in the field of cancer. Third, we focused on the extraction of sentiments (polarity and emotions). For this, we evaluated different methods and resources for sentiment classification in French. These experiments determined useful features for sentiment classification in different types of texts, including texts from health forums. Finally, we used the different methods proposed in this thesis to quantify the topics and sentiments identified in health social media. Overall, this work has opened promising perspectives on various social media analysis tasks for the French language, and in particular on studying the QoL of patients from health forums.
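The topic-to-questionnaire matching step can be illustrated with a plain Jaccard coefficient standing in for the customized coefficient used in the thesis; the topic and questionnaire-item word sets below are invented examples:

```python
# Sketch of matching a discovered topic to a quality-of-life questionnaire
# item by word-set overlap. A plain Jaccard coefficient is used here as a
# stand-in for the customized coefficient described in the thesis.

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

topic = {"pain", "sleep", "night", "tired"}  # top words of an LDA topic
item = {"trouble", "sleep", "night"}         # words of a questionnaire item
print(round(jaccard(topic, item), 2))  # → 0.4
```

Topics whose best similarity to every questionnaire item falls below a chosen threshold would be candidates for the "new emerging topics" the thesis reports.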
|
485 |
Seleção de atributos para classificação de textos usando técnicas baseadas em agrupamento, PoS tagging e algoritmos evolutivos / Feature selection for text classification using techniques based on clustering, PoS tagging and evolutionary algorithms. Ferreira, Charles Henrique Porto. January 2016
Orientadora: Profa. Dra. Debora Maria Rossi de Medeiros / Dissertação (mestrado) - Universidade Federal do ABC, Programa de Pós-Graduação em Ciência da Computação, 2016. / This work investigates feature selection techniques to be applied to the text
classification task. Three different techniques are proposed for comparison with traditional text preprocessing techniques. The first technique proposes that not all grammatical classes of a given language are relevant when a text is subjected to the classification task. The second technique employs feature clustering and genetic algorithms to select groups. In the third technique, two hypotheses are raised: the first assumes that words occurring more frequently in a text collection than in the language as a whole may be the most important words to compose the features; the second assumes that the relationship of each data instance with each class can compose a new set of features.
The results suggest that the proposed approaches are promising and that the hypotheses may be valid. The experiments with the first approach show that there is a set of grammatical classes whose words can be disregarded from the feature set in different datasets while maintaining or even improving classification accuracy. The second approach achieves a strong reduction in the original number of features and still improves classification accuracy. The third approach yielded the most pronounced reduction in the number of features because, by the nature of the proposal, the final number of features equals the number of classes in the dataset, and the impact on accuracy was null or even positive.
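The third approach's class-relationship representation can be sketched as follows. This is an illustrative assumption: here the "relationship" of an instance to a class is taken to be cosine similarity against a bag-of-words class centroid, which need not be the dissertation's exact measure, and the two classes and the document are invented.

```python
# Sketch of re-representing a document by its relationship to each class,
# so the final number of features equals the number of classes.
# Cosine similarity to a word-count centroid is an assumed concrete choice.
from collections import Counter
import math

def bow(text: str) -> Counter:
    """Bag-of-words counts for a text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

classes = {
    "sports": bow("goal match team score win"),
    "finance": bow("market stock price trade profit"),
}
doc = "the team played the match and scored a goal"
features = [cosine(bow(doc), centroid) for centroid in classes.values()]
print(len(features))  # → 2: one feature per class
```

With this representation, any downstream classifier operates on a feature vector whose dimensionality is fixed by the number of classes, which explains the pronounced reduction reported.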
|
486 |
De l'extraction des connaissances à la recommandation / From knowledge extraction to recommendation. Duthil, Benjamin. 3 December 2012
Information Technology and the success of its related services (blogs, forums, etc.) have paved the way for the massive expression of opinions on the most varied subjects (e-commerce websites, art reviews, etc.).
This abundance of opinions can appear as a real gold mine for internet users, but it can also be a source of indecision, because the available opinions may be ill-assorted if not contradictory. Reliable and relevant management of the information contained in these opinions requires systems able to directly analyze the content of opinions expressed in natural language, in order to control subjectivity in the evaluation process and avoid the smoothing effects of statistical treatments. Most so-called recommender systems are unable to manage all the semantic richness of a review and instead associate with it an assessment system that demands substantial involvement and specific competences from the internet user. Our aim is to minimize user intervention in the collaborative functioning of recommender systems through automated processing, by the recommender system itself, of the available reviews in natural language. Our unsupervised topic segmentation method extracts the subjects of interest from the reviews, and our sentiment analysis approach then computes the opinion expressed on these criteria. These knowledge extraction methods, combined with multicriteria analysis techniques adapted to the fusion of expert assessments, should contribute to the emergence of a new generation of more relevant, reliable and personalized recommender systems.
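The final quantification step of such a pipeline can be sketched as follows; the tiny polarity lexicon and the pre-segmented review sentences are invented stand-ins for the unsupervised segmentation and sentiment techniques the thesis actually develops:

```python
# Sketch of scoring the opinion expressed on each extracted criterion of
# a review, using a toy polarity lexicon. Both the lexicon and the
# topic-segmented sentences below are illustrative assumptions.
LEXICON = {"great": 1, "excellent": 1, "boring": -1, "bad": -1}

def opinion(sentences: list[str]) -> int:
    """Sum of lexicon polarities over all words of the given sentences."""
    return sum(LEXICON.get(w.lower().strip(".,!"), 0)
               for s in sentences for w in s.split())

segments = {
    "acting": ["The acting was excellent."],
    "plot": ["A boring plot, frankly bad."],
}
scores = {topic: opinion(sents) for topic, sents in segments.items()}
print(scores)  # → {'acting': 1, 'plot': -2}
```

The per-criterion scores produced this way are the kind of input a multicriteria fusion step could then aggregate across many reviewers.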
|
487 |
Serendipity: prospecção semântica de dados qualitativos em Educação Especial / Serendipity: semantic prospecting of qualitative data in Special Education. Fernandes, Woquiton Lima. 22 August 2016
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) / In the past decades, there has been a revolution in the way science has been
conducted. The current context demands ever more collaborative work, such as studies in large-scale research networks. One of the essential marks of change in this new way of doing science is the intense use of Information and Communication Technologies (ICT), known as “eScience”, which today plays a fundamental role in the methodology adopted by many research groups around the world. This led us to reflect on the in-depth analysis of qualitative data, particularly in research on Special Education. The biggest challenge identified was to advance the analysis of qualitative data using information technologies without losing the subjectivity involved in the research, and to broaden the capacity to scrutinize the data without losing the freedom to come and go, to critique and establish one's own reflections, respecting subjective positions and, above all, maintaining scientific rigor. In this sense, the main objective of this work is to evaluate a proposed technological architecture for qualitative data analysis, based on text mining theories, computational ontology and semantic annotation techniques, applied to special education research, in order to analyze the limits and possibilities of this methodological approach. Our methodology was based on the construction of a prototype, named Serendipity, grounded in the perspective of software engineering, from which we drew the main techniques that could define a safe method for the design, implementation and deployment of the solution. The methodology worked in cycles, allowing requirements to be modified and improvements to be established, feeding the process back with new analyses. The text mining process relied on extracting knowledge from textual databases that have little or no data structure. The computational ontology was the element able to reconstruct the syntactic representation and give it meaning. The words (data) are related and placed within a context of formal knowledge, endowing them with a semantic and cognitive capacity and building concepts open to interpretation, comprehension and common understanding; to this end, we built an ontology specific to Special Education. Semantic annotation helped attach content to the text to describe its semantics, allowing software agents to retrieve information more precisely by associating the document with the ontology in a conception of semantic fields. We also built a dictionary customized for Special Education to relate terms to synonyms and expressions associated with the ontology. For visualization beyond the semantic classes, we used automatic concept maps to establish relationships between concepts within a hierarchical structure of propositions. Finally, to assess the proposal, we used part of the data collected by the National Observatory of Special Education: transcribed texts on training from five municipalities, one from each region of Brazil. The results show limits already acknowledged in the proposal, which did not aim at a subjective, in-depth analysis yielding extremely precise results. The researcher is, and always will be, the driving agent of the process flow and, whether or not relying on computational tools, is not entirely immune to error. The Serendipity proposal takes a step forward in automating data analysis and can be applied to large volumes of data without losing the researcher's subjectivity. However, new human and technological resources must be added to contribute to its improvement, and other areas should be encouraged to develop domain ontologies with their experts and to build specific dictionaries. Therefore, despite its limitations, the approach has shown significant advances in the semantic exploration of qualitative data in the Special Education field and can be adapted to other areas and fields of
knowledge.
textos transcritos acerca da Formação em cinco cidades, sendo uma de cada região
do Brasil. Os resultados evidenciam limites já reconhecidos na proposta e, neste
aspecto, não teve a pretensão de determinar uma análise subjetiva e detalhista, que
a rigor, permita resultados de extrema precisão. Destaca que o pesquisador é e
sempre será o condutor livre do funcionamento do processo e contando, ou não,
com ferramentas computacionais ele pode cometer erros. A proposta do serendipity
deu um passo no processo automático de análise de dados, podendo ser
aproveitada em big data, pesquisas de nível nacional, sem perder a subjetividade do pesquisador. Para isto é preciso agregar novos recursos humanos e tecnológicos
que contribuam em seu aprimoramento. Estimular outras áreas a desenvolverem
ontologias de domínio com seus especialistas e a evolução dos dicionários
específicos. Portanto, apesar de seus limites, a abordagem possui avanços
significativos na prospecção semântica de dados qualitativos em Educação Especial
e passível de adaptação a outras áreas de conhecimento.
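The dictionary-backed semantic annotation this entry describes (a custom dictionary relating terms and synonyms to ontology classes, used to tag transcripts) can be sketched roughly as follows. This is a hypothetical illustration only: every ontology class, term and synonym below is invented, not taken from the Serendipity prototype.

```python
# Minimal sketch of dictionary-backed semantic annotation: a custom
# dictionary maps terms and their synonyms to ontology classes, and the
# annotator tags every dictionary hit found in a text. All class and
# term names are invented for illustration.

ONTOLOGY_DICTIONARY = {
    "inclusive education": "SpecialEducation:TeachingPractice",
    "teacher training":    "SpecialEducation:TeacherEducation",
    "in-service training": "SpecialEducation:TeacherEducation",  # synonym
    "resource room":       "SpecialEducation:SupportService",
}

def annotate(text):
    """Return (term, ontology_class, position) for each dictionary term
    found in the text, mimicking semantic annotation of a transcript."""
    lowered = text.lower()
    annotations = []
    for term, onto_class in ONTOLOGY_DICTIONARY.items():
        start = lowered.find(term)
        while start != -1:
            annotations.append((term, onto_class, start))
            start = lowered.find(term, start + len(term))
    return sorted(annotations, key=lambda a: a[2])

sample = ("Teacher training for inclusive education was discussed, "
          "including in-service training in the resource room.")
for term, cls, pos in annotate(sample):
    print(f"{pos:3d}  {term!r} -> {cls}")
```

Grouping the resulting annotations by ontology class is what yields the semantic fields the abstract mentions; concept maps can then be drawn over the class hierarchy.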
|
489 |
On text mining to identify gene networks with a special reference to cardiovascular disease / Identifiering av genetiska nätverk av betydelse för kärlförkalkning med hjälp av automatisk textsökning i Medline, en medicinsk litteraturdatabas. Strandberg, Per Erik January 2005 (has links)
The rate at which articles get published grows exponentially, and the possibility of accessing texts in machine-readable formats is also increasing. The need for automated systems that gather relevant information from text, i.e. text mining, is thus growing. The goal of this thesis is to find a biologically relevant gene network for atherosclerosis, the main cause of cardiovascular disease, by inspecting gene co-occurrences in abstracts from PubMed. In addition, gene nets for yeast were generated to evaluate the validity of text mining as a method. The nets found were validated in many ways; for example, they were found to have the well-known power-law link distribution. They were also compared to gene nets from different sources, generated by other, often microbiological, methods. In addition to classic measures of similarity such as overlap, precision, recall and F-score, a new way to measure similarity between nets is proposed and used. The method uses an urn approximation and measures, in standard deviations, the distance from the result of comparing two unrelated nets. The validity of this approximation is supported both analytically and with simulations, for both Erdős–Rényi nets and nets with a power-law link distribution. The new method explains how a very poor overlap, precision, recall and F-score can still be very far from random, and also how much overlap one could expect at random. The cutoff was also investigated. Results are typically on the order of only 1% overlap, yet at the remarkable distance of 100 standard deviations from what one could have expected at random. Of particular interest is that one can only expect an overlap of 2 edges, with a variance of 2, when comparing two trees on the same set of nodes. The use of a cutoff of one for co-occurrence graphs is discussed and motivated by, for example, the observation that it eliminates about 60-70% of the false positives but only 20-30% of the overlapping edges.
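The co-occurrence step described above can be sketched as follows. This is a simplified illustration under assumed details: the gene symbols, the plain substring matching and the exact cutoff semantics are examples, not taken from the thesis.

```python
from itertools import combinations
from collections import Counter

def cooccurrence_net(abstracts, genes, cutoff=1):
    """Build a gene net by linking two genes whenever they appear in the
    same abstract; keep only edges seen more than `cutoff` times."""
    counts = Counter()
    for text in abstracts:
        # Naive matching for illustration: a gene "occurs" if its symbol
        # appears as a substring of the upper-cased abstract.
        found = sorted({g for g in genes if g in text.upper()})
        for pair in combinations(found, 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n > cutoff}

abstracts = [
    "APOE and LDLR variants in atherosclerosis ...",
    "LDLR expression modulated by APOE genotype ...",
    "APOB alone in an unrelated context ...",
]
net = cooccurrence_net(abstracts, ["APOE", "LDLR", "APOB"], cutoff=1)
print(net)   # {('APOE', 'LDLR'): 2} -- the pair co-occurring twice
```

With `cutoff=1`, edges supported by a single abstract are dropped, mirroring the observation above that a cutoff of one removes most false positives while keeping most overlapping edges.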
This thesis shows that text mining of PubMed can be used to generate a biologically relevant subnet of the human gene net. A reasonable extension of this work is to combine the nets with gene-expression data to obtain a more reliable gene net.
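The urn approximation and its standard-deviation distance can be illustrated numerically. The sketch below assumes a plain hypergeometric model of edge overlap and is not the thesis's exact derivation, but it reproduces the two-trees observation quoted above.

```python
from math import comb, sqrt

def overlap_stats(n_nodes, k1, k2):
    """Urn (hypergeometric) approximation: mean and variance of the edge
    overlap when two edge sets of sizes k1 and k2 are drawn at random
    from all C(n, 2) possible edges on the same node set."""
    m = comb(n_nodes, 2)                      # size of the urn
    mean = k1 * k2 / m
    var = k2 * (k1 / m) * (1 - k1 / m) * (m - k2) / (m - 1)
    return mean, var

def z_score(observed, n_nodes, k1, k2):
    """Distance, in standard deviations, of an observed overlap from
    what comparing two unrelated nets would give."""
    mean, var = overlap_stats(n_nodes, k1, k2)
    return (observed - mean) / sqrt(var)

# Two spanning trees on the same 1000 nodes: n - 1 edges each.
mean, var = overlap_stats(1000, 999, 999)
print(f"mean={mean:.3f} var={var:.3f}")   # both close to 2, as noted above
```

This makes the abstract's point concrete: a tiny absolute overlap can still sit a huge number of standard deviations away from the random expectation.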
|
490 |
Um data warehouse de publicações científicas: indexação automática da dimensão tópicos de pesquisa dos data marts / A data warehouse for scientific publications: automatic indexing of the research topic dimension for use in data marts. Augusto Kanashiro 04 May 2007 (has links)
Este trabalho de mestrado insere-se no contexto do projeto de uma Ferramenta Inteligente de Apoio à Pesquisa (FIP), sendo desenvolvida no Laboratório de Inteligência Computacional do ICMC-USP. A ferramenta foi proposta para recuperar, organizar e minerar grandes conjuntos de documentos científicos (na área de computação). Nesse contexto, faz-se necessário um repositório de artigos para a FIP. Ou seja, um Data Warehouse que armazene e integre todas as informações extraídas dos documentos recuperados de diferentes páginas pessoais, institucionais e de repositórios de artigos da Web. Para suportar o processamento analítico on-line (OLAP) das informações e facilitar a "mineração" desses dados é importante que os dados estejam armazenados apropriadamente. Dessa forma, o trabalho de mestrado teve como objetivo principal projetar um Data Warehouse (DW) para a ferramenta FIP e, adicionalmente, realizar experimentos com técnicas de mineração e Aprendizado de Máquina para automatizar o processo de indexação das informações e documentos armazenados no data warehouse (descoberta de tópicos). Para as consultas multidimensionais foram construídos data marts de forma a permitir aos pesquisadores avaliar tendências e a evolução de tópicos de pesquisa / This dissertation is related to the project of an Intelligent Tool for Research Supporting (FIP), being developed at the Laboratory of Computational Intelligence at ICMC-USP. The tool was proposed to retrieve, organize, and mine large sets of scientific documents in the field of computer science. In this context, a repository of articles becomes necessary, i.e., a Data Warehouse that integrates and stores all information extracted from documents retrieved from different personal and institutional web pages, and from article repositories. Appropriately stored data is decisive for supporting online analytical processing (OLAP) and "data mining" processes.
Thus, the main goal of this MSc research was to design the FIP Data Warehouse (DW). Additionally, we carried out experiments with Data Mining and Machine Learning techniques in order to automate the indexing of the information and documents stored in the data warehouse (Topic Detection). Data marts for multidimensional queries were designed to help researchers evaluate trends and the evolution of research topics.
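A data-mart query of the kind described, over a research-topic dimension, can be sketched with a minimal star schema. All table names, topics and figures below are invented for illustration and are not the actual FIP schema.

```python
import sqlite3

# Hypothetical, minimal star schema for a publications DW: one fact
# table keyed to a research-topic dimension and a time dimension.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_topic (topic_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time  (time_id  INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_publication (
    pub_id   INTEGER PRIMARY KEY,
    topic_id INTEGER REFERENCES dim_topic,
    time_id  INTEGER REFERENCES dim_time
);
INSERT INTO dim_topic VALUES (1, 'text mining'), (2, 'machine learning');
INSERT INTO dim_time  VALUES (1, 2005), (2, 2006);
INSERT INTO fact_publication VALUES
    (1, 1, 1), (2, 1, 2), (3, 1, 2), (4, 2, 2);
""")

# Publications per topic per year: the kind of trend a researcher
# would inspect through a data mart.
rows = db.execute("""
    SELECT t.name, d.year, COUNT(*) AS n
    FROM fact_publication f
    JOIN dim_topic t ON t.topic_id = f.topic_id
    JOIN dim_time  d ON d.time_id  = f.time_id
    GROUP BY t.name, d.year
    ORDER BY t.name, d.year
""").fetchall()
for name, year, n in rows:
    print(name, year, n)
```

Automatic topic detection would populate `dim_topic` and the fact table's `topic_id` column, which is what makes such queries possible without manual indexing.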
|