Global ETD Search

221	ENEM nas redes sociais: mineração de textos e clusterização Silva, Leila Maria 18 December 2017 A internet é hoje a maior fonte de informação eletrônica existente. Cresce a cada dia o número de usuários da internet, e consequentemente o uso das redes sociais online. São muitas as informações novas que ficam embutidas nas bases de dados textuais. Por causa da sua natureza dinâmica, ou seja, milhões de páginas surgem e desaparecem todos os dias, a tarefa de encontrar informações relevantes nessas bases de dados se torna muito difícil. As técnicas de mineração de textos para a descoberta de informações na web surgiram da necessidade de sanar este problema. O presente trabalho versa sobre a aplicação de métodos de mineração de textos com clusterização na grande quantidade de mensagens sobre o Exame Nacional do Ensino Médio no ano de 2016 provenientes da rede social Twitter. O foco deste estudo está na obtenção de grupos de textos, a fim de possibilitar uma visualização resumida e sintetizada dos assuntos mais comentados pelos usuários.
Para manipulação dessas bases textuais, o Modelo Cassiopeia foi utilizado empregando seu algoritmo de agrupamento textual que tem como principal finalidade gerar agrupamentos, ou seja, clusters (grupos) de documentos textuais que apresentam algum tipo de similaridade. O Modelo Cassiopeia apresenta um limite de processamento com a quantidade máxima de 700 tweets. Os tweets passam primeiramente pela fase de limpeza dos textos no pré-processamento, logo após, a utilização do algoritmo no processamento e por fim, as análises dos resultados no pós-processamento. Os resultados obtidos neste trabalho mostram valores coesos quanto à similaridade dos documentos dentro de um cluster e entre os clusters, avaliados por medidas de agrupamento textual, proposto pelo Modelo Cassiopeia. Isso demonstra a aplicabilidade dessa proposta para a visualização sintetizada das informações mais significativas de um determinado tema, muitas vezes permitindo que ações sejam antecipadas e impactos sobre a população afetada sejam reduzidos. / Dissertação (Mestrado Profissional) - Programa de Pós-Graduação em Educação, Universidade Federal dos Vales do Jequitinhonha e Mucuri, 2017. / The Internet is today the largest existing source of electronic information. The number of Internet users grows daily, and with it the use of online social networks. A great deal of new information is embedded in textual databases. Because of the web's dynamic nature, with millions of pages appearing and disappearing every day, finding relevant information in those databases becomes very difficult. Text mining techniques for discovering information on the web arose from the need to solve this problem. The present work applies text mining methods with clustering to the large volume of messages about the 2016 National High School Exam (ENEM) posted on the social network Twitter.
The focus of this study is on obtaining groups of texts that give a condensed, synthesized view of the subjects users commented on most. To handle these textual bases, the Cassiopeia Model was used with its text clustering algorithm, whose main purpose is to generate clusters, that is, groups of textual documents that share some kind of similarity. The Cassiopeia Model has a processing limit of 700 tweets. The tweets first go through text cleaning in the preprocessing phase, then the algorithm is applied in the processing phase, and finally the results are analyzed in the post-processing phase. The results obtained show cohesive values for the similarity of documents within a cluster and between clusters, as evaluated by the text clustering measures proposed by the Cassiopeia Model. This demonstrates the applicability of the proposal for a synthesized view of the most significant information on a given topic, often allowing actions to be anticipated and impacts on an affected population to be reduced. Mineração de textos Twitter ENEM Clusterização Redes sociais Cassiopeia Text mining Clustering Social networks
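As an illustration of the cleaning and grouping steps described in entry 221, the sketch below cleans tweets and groups them by word overlap. It is not the Cassiopeia Model, whose algorithm the abstract does not detail: the function names (`clean_tweet`, `jaccard`, `group_tweets`), the Jaccard similarity measure, the greedy single-pass grouping, the 0.3 threshold and the sample tweets are all assumptions made for the example.

```python
import re


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, @mentions and punctuation, return tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)  # also drops '#', keeping the tag word
    return text.split()


def jaccard(a, b):
    """Jaccard similarity between two token lists (0.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def group_tweets(tweets, threshold=0.3):
    """Greedy grouping: a tweet joins the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of token lists; element 0 is the seed
    for tokens in map(clean_tweet, tweets):
        for cluster in clusters:
            if jaccard(tokens, cluster[0]) >= threshold:
                cluster.append(tokens)
                break
        else:
            clusters.append([tokens])  # no similar cluster found: start a new one
    return clusters


# Invented sample messages, for illustration only.
sample = [
    "ENEM 2016 essay topic was hard! http://t.co/x",
    "@ana the essay topic of #ENEM was hard",
    "Traffic jam near the exam site",
]
clusters = group_tweets(sample)
print(len(clusters), [len(c) for c in clusters])
```

A real pipeline would also need stopword removal and a similarity measure suited to short texts; the threshold here only controls how easily tweets merge into one group.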
222	Software Development Productivity Metrics, Measurements and Implications Gupta, Shweta 06 September 2018 The rapidly increasing capabilities and complexity of numerical software present a growing challenge to software development productivity. While many open source projects enable the community to share experiences, learn and collaborate, estimating individual developer productivity becomes more difficult as projects expand. In this work, we analyze several HPC software Git repositories with issue trackers and compute productivity metrics that can be used to better understand and potentially improve development processes. Evaluating productivity in these communities presents additional challenges because bug reports and feature requests are often made through mailing lists instead of issue trackers, resulting in unstructured data that is difficult to analyze. For such data, we investigate automatic tag generation using natural language processing techniques. We aim to produce metrics that help quantify productivity improvement or degradation over the projects' lifetimes. We also provide an objective measurement of productivity based on effort estimation for the developer's work. Data analysis Natural language processing Scientific software Software engineering Text mining
223	Semantic text classification for cancer text mining Baker, Simon January 2018 Cancer researchers and oncologists benefit greatly from text mining major knowledge sources in biomedicine such as PubMed. Fundamentally, text mining depends on accurate text classification. In conventional natural language processing (NLP), this requires experts to annotate scientific text, which is costly and time consuming, resulting in small labelled datasets. This leads to extensive feature engineering and handcrafting in order to fully utilise small labelled datasets, which is again time consuming, and not portable between tasks and domains. In this work, we explore emerging neural network methods to reduce the burden of feature engineering while outperforming the accuracy of conventional pipeline NLP techniques. We focus specifically on the cancer domain in terms of applications, where we introduce two NLP classification tasks and datasets. The first task is semantic text classification according to the Hallmarks of Cancer (HoC), which enables text mining of scientific literature assisted by a taxonomy that explains the processes by which cancer starts and spreads in the body. The second task is the classification of the exposure routes by which chemicals enter the body, which may lead to exposure to carcinogens. We present several novel contributions. We introduce two new semantic classification tasks (the hallmarks, and exposure routes) at both sentence and document levels along with accompanying datasets, and implement and investigate a conventional pipeline NLP classification approach for both tasks, performing both intrinsic and extrinsic evaluation. We propose a new approach to classification using multilevel embeddings and apply this approach to several tasks; we subsequently apply deep learning methods to the task of hallmark classification and evaluate the outcome.
Utilising our text classification methods, we develop two novel text mining tools targeting real-world cancer researchers. The first tool is a cancer hallmark text mining tool that identifies associations between a search query and cancer hallmarks; the second tool is a new literature-based discovery (LBD) system designed for the cancer domain. We evaluate both tools with end users (cancer researchers) and find they demonstrate good accuracy and promising potential for cancer research.
224	Automatizace generování stopslov Krupník, Jiří January 2014 This diploma thesis focuses on automating stopword generation as a method of preprocessing textual documents. It analyses the influence of stopword removal on the results of data mining tasks (classification and clustering). First, text mining techniques and frequently used algorithms are described. Methods of creating domain-specific stopword lists are then described in detail. Finally, the implementation methods and the results of testing on large collections of text files are presented and discussed.
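One common way to build a domain-specific stopword list, which may or may not be among the methods the thesis in entry 224 evaluates, is to flag words that occur in a large share of the documents. The sketch below is a minimal version of that idea; the function name `generate_stopwords`, the `min_doc_ratio` parameter and the sample documents are hypothetical.

```python
import re
from collections import Counter


def generate_stopwords(documents, min_doc_ratio=0.8):
    """Return the words that occur in at least min_doc_ratio of the documents."""
    doc_freq = Counter()
    for text in documents:
        # A set counts each word once per document (document frequency).
        doc_freq.update(set(re.findall(r"[a-z]+", text.lower())))
    threshold = min_doc_ratio * len(documents)
    return sorted(word for word, df in doc_freq.items() if df >= threshold)


# Invented sample collection, for illustration only.
docs = [
    "the patient was given the drug",
    "the drug reduced the tumour",
    "a trial of the new drug",
    "the results of the trial",
]
print(generate_stopwords(docs, min_doc_ratio=0.75))
```

In this toy collection the word "drug" is flagged alongside "the": a term that carries meaning in general text can behave like a stopword inside one domain, which is the motivation for generating such lists automatically rather than reusing a generic one.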
225	Hledání sémantické informace v textových datech s využitím latentní analýzy Řezníček, Pavel January 2015 The first part of the thesis gives a theoretical introduction to text mining methods: information retrieval, classification and clustering. The LSA method is presented as an advanced model for representing textual data. The work then describes the source data and the preprocessing and preparation methods used to enhance the effectiveness of the text mining methods. For each chosen text mining method, evaluation metrics are defined and the programs used, whether existing or newly implemented, are presented. The results of experiments comparing the effects of different types of preprocessing and different models of the source data are then demonstrated and discussed in the conclusion.
226	Assimetria de informação e sinalização na cadeia da carne bovina Ceolin, Alessandra Carla January 2011 A presente pesquisa tem como objetivo identificar e analisar os principais Mecanismos de Sinalização utilizados na cadeia da carne bovina brasileira, bem como, verificar o grau de ocorrências dessas sinalizações pela Ciência, pelo Governo e pela Mídia, por meio de processos de mineração de textos e descoberta do conhecimento. Embasando-se na Economia da Informação como teoria-chave dessa análise e concentrando-se nas teorias da Assimetria de Informação, Seleção Adversa e Sinalização e, também, da revisão das interações entre Ciência, Mídia e Governo, foram definidas cinco dimensões, sob as quais os sinais sobre a carne bovina são enquadrados: Econômica, Comunicação e Marketing, Qualidade, Institucional e Sistema de Produção. Para análise, foram coletados documentos textuais em formato eletrônico ao longo de cinco anos. A busca dos documentos deu-se em bases de dados de publicações científicas, em portais do Governo e em arquivos dos jornais e magazines disponíveis na rede mundial de computadores, a partir de palavras-chave relacionadas à cadeia da carne bovina. Foram selecionados 4.281 artigos científicos para compor a base de dados da Ciência, 730 documentos para a base de dados do Governo e 5.439 documentos na base de dados da Mídia, totalizando 10.450 documentos selecionados. Esses documentos foram armazenados e classificados no software QDA Miner segundo as variáveis fonte (Ciência, Governo e Mídia) e ano (2005, 2006, 2007, 2008 e 2009). Para extrair o conhecimento das bases textuais, foi elaborada uma estrutura de análise constituída pelas dimensões, códigos (sinais) e palavras-chave. Na sequência foram analisados e codificados os documentos com os respectivos sinalizadores.
Aplicando-se a estrutura de análise do software, foram utilizadas representações gráficas elaboradas a partir da frequência relativa das ocorrências de cada código nas três fontes de informação. De acordo com os resultados apresentados foi possível verificar que os termos e dimensões utilizados pela Ciência, Governo e Mídia diferem nas frequências em praticamente todos os códigos, demonstrando pouca sinergia entre as fontes de informação. No decorrer do período analisado, percebeu-se maior oscilação na frequência das publicações que incluíam os sinais Sanidade e Sustentabilidade, os quais foram mais enfatizados nos últimos anos da presente análise. Os códigos Certificação e Rastreabilidade que pareciam ter maior destaque para fornecer informações sobre aspectos ligados à carne bovina brasileira, de acordo com estudos existentes, não são os mais emitidos, de acordo com esta pesquisa. A maior identidade da Ciência ao longo do período de análise foi para a Dimensão Sistema de Produção, demonstrando que as publicações científicas possuem um caráter mais voltado à melhoria da produção em si, como alimentação, genética, reprodução, idade de abate, dentre outros. Em relação ao Governo, observou-se maior identidade dessa fonte de informação para a Dimensão Institucional, com exceção do ano de 2007, onde se percebe maiores ênfases das publicações do Governo nas Dimensões Qualidade e Sistema de Produção. Já em relação ao que se observa na Mídia, há maior interesse em publicações nas Dimensões Qualidade e Sistema de Produção. Ao analisar os resultados entre as três fontes de informação, verifica-se maior proximidade entre o que se observa na Mídia e no Governo, visto a predominância da Dimensão Sistema de Produção na Ciência. Analisando comparativamente as fontes de informação foi possível observar que o maior grau de similaridade entre os códigos foi observado nos documentos publicados pelo Governo.
Embora os códigos adotados para sinalizar informações sobre a cadeia da carne bovina sejam os mesmos, enquadrados igualmente em cinco Dimensões, determinados sinalizadores são utilizados com maior destaque, similaridades e agrupamentos de códigos e, de acordo com as peculiaridades de cada fonte de informação investigada (Ciência, Governo e Mídia). / The goal of this research is to identify and analyze the main signaling mechanisms used in the Brazilian beef chain, and to investigate how often these signals are issued by science, government and media, through text mining and knowledge discovery processes. Using Information Economics as the key theory for this analysis, concentrating on the theories of Information Asymmetry, Adverse Selection and Signaling, and reviewing the interactions between science, media and government, five dimensions were defined to frame the signals about beef: economic, communication and marketing, quality, institutional and production system. For the analysis, electronic text documents covering a five-year period were collected. The documents were retrieved from scientific publication databases, government portals, and newspaper and magazine archives available on the Internet, using keywords related to the beef chain. The selection comprised 4,281 scientific articles for the science database, 730 documents for the government database and 5,439 documents for the media database, for a total of 10,450 documents. These documents were stored and classified in the QDA Miner software according to the variables source (science, government and media) and year (2005, 2006, 2007, 2008 and 2009). To extract knowledge from the text databases, an analysis framework was built, consisting of dimensions, codes (signals) and keywords. The documents were then analyzed and coded with the respective signals.
Applying the analysis framework in the software, graphical representations were produced from the relative frequency of occurrence of each code in the three sources of information. The results show that the terms and dimensions used by science, government and media differ in frequency for practically all codes, demonstrating little synergy between the information sources. Over the period analyzed, the greatest oscillation was observed in the frequency of publications that included the animal health and sustainability signals, which were emphasized more in the final years of the analysis. According to this research, the certification and traceability codes, which existing studies suggested were the most prominent in supplying information on aspects of Brazilian beef, are not the most frequently issued. Over the analysis period, science was most closely identified with the production system dimension, showing that scientific publications focus on improving production itself, such as feeding, genetics, breeding and slaughter age, among others. The government was most closely identified with the institutional dimension, except in 2007, when government publications placed greater emphasis on the quality and production system dimensions. In the media, the quality and production system dimensions are the most emphasized. Comparing the results for the three sources of information, media and government are closer to each other, given the predominance of the production system dimension in science. Comparative analysis of the sources of information showed that the greatest degree of similarity between codes was found in the documents published by the government.
Although the codes adopted to signal information about the beef chain are the same, and are framed equally in five dimensions, certain signals are used with different emphasis, similarities and code groupings, according to the peculiarities of each source of information investigated (science, government and media). Agronegócio Cadeia produtiva Carne Informação Economia da informação Information Text mining Information asymmetry Signaling Beef
227	E-mediation : mapeamento de indícios de mediação por meio de um sistema de mineração de textos Severo, Carlos Emilio Padilla January 2011 Esta pesquisa apresenta a especificação, desenvolvimento e aplicação de um sistema de mapeamento de indícios de mediação em ambientes virtuais de ensino-aprendizagem – AVEA. Este sistema visa o apoio ao processo de mediação pedagógica a professores que desenvolvem atividades vinculadas a Educação a Distância – EAD, objetivando reduzir a sobrecarga de trabalho desses na identificação e acompanhamento da evolução das mediações realizadas no ambiente. Para isso, utilizamos técnicas de mineração de textos com o emprego de mecanismos de inferência bayesiana para identificação de categorias de mediação a partir de interações realizadas entre os participantes de um curso na modalidade a distância. A validação do sistema e avaliação dos resultados foi realizada através de um estudo de caso, aplicado em uma disciplina de um curso de Pós-Graduação da UFRGS na área de Informática na Educação. Os dados utilizados nesta pesquisa foram obtidos a partir das interações armazenadas no ambiente virtual de ensino-aprendizagem Moodle do Centro Interdisciplinar de Novas Tecnologias na Educação. No decorrer da investigação foram selecionados estudantes da disciplina para mapeamento das interações realizadas. Com o mapeamento de tais interações e identificação das categorias de mediação para cada estudante, foram gerados gráficos e relatórios com informações sobre o processo de mediação. As categorias de mediação auxiliam na identificação dos níveis de mediação. Na análise dos resultados, as informações obtidas em tais gráficos e relatórios foram trianguladas com informações provenientes da entrevista com um dos tutores e a visão do pesquisador envolvido no estudo de caso.
Os resultados apontam que a utilização de um sistema de mapeamento de indícios de mediação em ambientes virtuais de ensino-aprendizagem, com adoção da tecnologia de mineração de textos, torna possível a identificação de níveis de mediação dos participantes de um curso na modalidade a distância. / This research presents the specification, development and application of a system for mapping evidence of mediation in Virtual Learning Environments (VLE). The system aims to support the pedagogical mediation process for teachers engaged in distance education activities, reducing their workload in identifying and following the evolution of the mediations carried out in the environment. To this end, we use text mining techniques with Bayesian inference mechanisms to identify mediation categories from the interactions among participants in a distance course. The system was validated and its results evaluated through a case study applied to a course in a UFRGS graduate program in the area of Computers in Education. The data used in this research were obtained from the interactions stored in the Moodle virtual learning environment of the Interdisciplinary Center for New Technologies in Education. During the investigation, students of the course were selected so that their interactions could be mapped. With these interactions mapped and the mediation categories identified for each student, charts and reports with information about the mediation process were generated. The mediation categories help identify levels of mediation. In analyzing the results, the information obtained from these charts and reports was triangulated with information from an interview with one of the tutors and the view of the researcher involved in the case study.
The results show that using a system for mapping evidence of mediation in virtual learning environments, based on text mining technology, makes it possible to identify the levels of mediation of participants in a distance course. Computador na educação Ambiente virtual Mediação Ambiente de aprendizagem Moodle Virtual learning environment Pedagogical mediation Text mining
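The abstract of entry 227 mentions Bayesian inference for assigning mediation categories to interactions but does not say which model was used. As one possible reading, the sketch below implements a multinomial naive Bayes text classifier with add-one smoothing. The function names, the category labels ("question", "feedback") and the training messages are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (text, category). Returns the counts used by predict()."""
    cat_docs = Counter()                # number of training messages per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocab = set()
    for text, cat in examples:
        words = text.lower().split()
        cat_docs[cat] += 1
        word_counts[cat].update(words)
        vocab.update(words)
    return cat_docs, word_counts, vocab


def predict(model, text):
    """Return the category with the highest posterior log-probability."""
    cat_docs, word_counts, vocab = model
    total_docs = sum(cat_docs.values())
    best_cat, best_score = None, -math.inf
    for cat in cat_docs:
        # log prior plus log likelihoods with add-one (Laplace) smoothing
        score = math.log(cat_docs[cat] / total_docs)
        denom = sum(word_counts[cat].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[cat][word] + 1) / denom)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


# Invented forum messages and labels, for illustration only.
model = train([
    ("can you explain this concept", "question"),
    ("what does this term mean", "question"),
    ("great work on the task", "feedback"),
    ("good job well done", "feedback"),
])
print(predict(model, "can you explain the term"))
```

Per-student counts of the predicted categories could then feed charts and reports of the kind the abstract describes, although how the original system aggregates its categories is not stated.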
228	Presença plus : modelo de identificação de presença social em ambientes virtuais de ensino e aprendizagem Bastos, Helvia Pereira Pinto January 2012 Este trabalho de tese apresenta o Modelo Presença Plus (PPlus) para identificação de pistas textuais denotadoras de presença social em interações discursivas feitas por alunos em fóruns e chats educacionais. O grau de presença social (PS) é um indicativo de como os sujeitos interagem entre si e com o ambiente de aprendizagem; sendo considerado, na literatura, significativo para o desenvolvimento de relacionamentos e fortalecimento de sentimento de pertencimento no grupo. O trabalho se baseia na vertente Pragmática da Linguística, campo que enfatiza a importância de se considerar o contexto de produção dos eventos comunicativos e a dinâmica dialógica entre os interlocutores. Considerando que o mapeamento das interações dos discentes para detectar e avaliar seu grau de presença pode se constituir uma tarefa complexa e morosa para tutores de cursos a distância, desenvolveu-se um software para realizar o processamento automático das mensagens eletrônicas visando torná-lo uma funcionalidade a ser adicionada a ambientes virtuais de ensino e aprendizagem (AVEAs). Apesar de alguns impasses referentes, particularmente, aos aspectos sintáticos dos textos, os resultados obtidos no processamento, por lexicometria, das postagens foram satisfatoriamente semelhantes aos levantados na análise manual. O modelo PPlus e a escala de graus de PS foram também testados em um segundo ambiente disponibilizado na plataforma Moodle, tendo apresentado resultados equivalentes. A sondagem com professores e tutores de cursos a distância forneceu dados que corroboram a proposta de uma ferramenta a ser inserida em AVEAs de modo a facilitar o acompanhamento de estados afetivos, grau de envolvimento e interação entre os participantes no e com o ambiente.
/ This thesis presents Presence Plus (PPlus), a model for identifying indicators of social presence in text-based interactions made by students in educational forums and chats. The degree of social presence (SP) can be an indicator of how individuals interact among themselves and with the learning environment, and is considered by the literature to be relevant for the development of relationships and the strengthening of the sense of belonging in the group. This study is based on Pragmatics, an area of Linguistics that emphasizes the importance of context in communicative events and the dialogic dynamics among speakers. Considering that mapping students’ interactions, as well as detecting and evaluating their degree of SP, may be a complex and time-consuming task in distance learning tutoring, software was developed to process posts automatically, with the aim of making it a possible feature of virtual learning environments (VLEs). Despite a few conflicting results, mostly related to textual syntactic relations, data from processing tests using lexicometrics were satisfactorily similar to those obtained by manual analysis. The PPlus model and the SP scale were also tested in a different course on the Moodle platform, with equally positive results. Feedback from the questionnaire answered by teachers and tutors working in distance learning courses supports the proposal of a tool that may facilitate their assessment of affective states, involvement and text-based interaction of students within the environment. Ambiente de aprendizagem Fórum de discussão Chat Presence plus model Social presence Virtual learning environments Text mining software
229	Conceitos de gestão da inovação : compatibilidades da linguagem técnica na produção científica veiculada em periódicos brasileiros entre 2008 e 2012 Valent, Vinicius Dornelles January 2013 Este estudo analisa conceitos da linguagem científica, veiculados em periódicos nacionais da área de Administração, no período de 2008 a 2012, mais precisamente, 11 revistas classificadas pelo Qualis-Capes. Para tanto, focaliza a Teoria da Gestão da Inovação, embasada na Teoria Econômica da vertente schumpeteriana e neo-schumpeteriana e na Teoria da Linguagem. O ponto de partida foi o questionamento sobre: Qual a estrutura lógica dos conceitos-chave dos artigos sobre Gestão da Inovação? O objetivo geral foi verificar a consistência da linguagem empregada na produção de artigos brasileiros sobre a Gestão da Inovação em relação à sua Teoria. Como objetivos específicos, foram definidos os seguintes: identificar conceitos-chave por meio de um estudo-piloto; confrontar os conceitos identificados encontrados com aqueles elencados no projeto da dissertação; calcular a frequência do uso de tais conceitos-chave; formar clusters com os conceitos empregados pelos pesquisadores da área e, confrontar com as teorias de base o conteúdo de alguns conceitos de maior ocorrência. Este estudo classifica-se como uma pesquisa exploratória, com tratamentos quantitativo (mineração de textos) e qualitativo (análise de conteúdo) de dados. Os nove conceitos pesquisados foram os seguintes: Aprendizagem; Ciência e conhecimento científico; Capacidades (capabilities); Informação; Inovação; Invenção; Pesquisa e Desenvolvimento (P&D); Técnica e Tecnologia. O estudo-piloto constou da leitura de 31 artigos de uma das 11 revistas, cuja especialização era o tema da inovação. A análise quantitativa resultou em 10 clusters, representados por dendrogramas. A análise qualitativa confrontou alguns excertos extraídos dos artigos selecionados com as teorias de base da dissertação.
O resultado da análise quantitativa apontou que os conceitos de “Informação”, “Aprendizagem” e “Tecnologia” são os que mais formam clusters com o conceito de “Inovação”. A análise qualitativa revelou que existem lacunas na relação lógica conceitual em muitos casos de aplicação dos conceitos-chave. Este motivo torna-se suficiente para estimular o surgimento de novos estudos nesta linha de tendência multidisciplinar. / The present study examines concepts of scientific language conveyed in national scientific journals in the area of Business Management between 2008 and 2012. To that end, it focuses on the theory of Innovation Management, grounded in Economic Theory of the Schumpeterian and neo-Schumpeterian strands and in the Theory of Language. The starting point was the question: "What key concepts in Innovation Management constitute a language that matches theoretical content?" The overall goal was to verify the consistency of the language used in part of the Brazilian production of Innovation Management articles. The specific objectives were to: identify key concepts by reading selected Brazilian scientific articles; list the key concepts related to innovation management; calculate the frequency of use of these key concepts; form clusters of the concepts used by researchers in the area; and compare the content of some of the most frequent concepts with the underlying theories. This is exploratory and descriptive research with quantitative (text mining) and qualitative (content analysis) treatment of data. The nine concepts surveyed were: Learning; Science and scientific knowledge; Capabilities; Information; Innovation; Invention; Research and Development (R&D); Technique; and Technology. To confirm the pre-established conceptual corpus, we conducted a pilot study by reading 31 articles from a journal specializing in innovation.
The quantitative analysis resulted in 10 clusters, represented by dendrograms, and the qualitative analysis compared excerpts extracted from the selected articles with the underlying theories of the study. The quantitative results showed that the concepts of "Information", "Learning" and "Technology" are the ones that most often cluster with the concept of "Innovation". The qualitative analysis revealed gaps in the logical conceptual relationships in many applications of the key concepts. This is reason enough to encourage new studies along this multidisciplinary line. Gestão da inovação Teoria do conceito Produção científica Innovation management Theory of concept Text mining
230	As dimensões disciplinares na comunicação científica em biocombustíveis Gomes, Janaína January 2009 A comunicação científica constitui o substrato da pesquisa científica. Por meio dela se configuram os campos de legitimação do conhecimento. Este trabalho de tese se dedicou ao estudo do campo dos biocombustíveis através da comunicação científica. O referido campo de pesquisa envolve diferentes áreas do conhecimento e se refere às demandas energéticas da sociedade pós-industrial. Foram analisados dez anos da comunicação científica para se estabelecer as dimensões disciplinares sobre as quais essa discussão se sustenta. Para tanto, dois métodos de pesquisa foram combinados. Utilizou-se a bibliometria e a análise de conteúdo quantitativa, através de técnicas de text mining. A análise bibliométrica foi realizada com dados quantitativos sobre a comunicação científica, disponíveis na base Web of Science. A análise de conteúdo quantitativa foi feita com textos completos dos artigos e revisões científicas sobre biocombustíveis, utilizando-se o software Wordstat. Os dados bibliométricos apresentaram um alto grau de interdisciplinaridade expresso pela inter-relação de 132 áreas do conhecimento. Ademais, observou-se a predominância das áreas da Química (1.513 artigos e revisões), Engenharias (1.157) e Ciências Agrárias (1.029), configurando um campo com inserção eminentemente tecnológica. Na análise de conteúdo foi possível revelar uma inserção muito significativa das Ciências Sociais na argumentação dos artigos e revisões analisados. Com os dados obtidos foi possível dividir o campo dos biocombustíveis em três grupos de dimensões disciplinares, que o contextualizam. No primeiro grupo, de maior abrangência, participam as dimensões disciplinares das Ciências Agrárias, das Ciências Sociais e das Ciências Ambientais. No segundo grupo, que constitui a base tecnológica do campo, se expressam as dimensões disciplinares da Química, da Engenharia e da Microbiologia.
O terceiro grupo, de expressão emergente, reúne as dimensões disciplinares da Biologia e Bioquímica, Ciências Animais e Vegetais, Biologia Molecular e Genética, Economia, Ciência dos Materiais, Nanociências e Nanotecnologia, Geociências, Física, Humanidades, Ciências Multidisciplinares, Matemática e Ciências da Computação. Infere-se que o primeiro grupo de dimensões disciplinares encerra os componentes que justificam socialmente o progresso do campo dos biocombustíveis, enquanto o segundo grupo representa a base tecnológica em que se sustenta essa temática de pesquisa. O terceiro grupo representa as áreas emergentes. No trabalho, formula-se uma métrica para a aferição da expressão da Interdisciplinaridade, útil também para outros campos de pesquisa. / The scientific communication on biofuels published from 1998 to 2007 was analysed using a combination of bibliometric methods and content analysis techniques. The analysis characterized this field of research as interdisciplinary, with marked social relevance. The bibliometric study showed that 132 different, interacting areas of knowledge contribute to this research field. Content analysis framed the field in three groups of disciplinary dimensions. The first group, of broader influence, includes Agricultural Sciences, Social Sciences and Environmental Sciences. The second group, which forms the technological base of the field, includes the disciplinary dimensions of Chemistry, Engineering and Microbiology. The third group comprises the disciplinary dimensions that are emerging in the field of biofuels, namely Biology and Biochemistry, Animal and Plant Sciences, Molecular Biology and Genetics, Economics, Materials Science, Nanosciences and Nanotechnology, Geosciences, Physics, Humanities, Multidisciplinary Sciences, Mathematics, and Computer Science.
One can infer from the study that the first group of disciplinary dimensions comprises the elements that socially justify the progress of research in the field of biofuels. Furthermore, this work presents a metric for measuring the expression of interdisciplinarity in a research field, useful for analysing the biofuel research field and others as well. Comunicação científica Biocombustíveis Interdisciplinarity Agroenergy Social communication Content analysis Text mining
