  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
531

The rise and fall of biodiversity in literature: A comprehensive quantification of historical changes in the use of vernacular labels for biological taxa in Western creative literature

Langer, Lars, Burghardt, Manuel, Borgards, Roland, Böhning-Gaese, Katrin, Seppelt, Ralf, Wirth, Christian 30 May 2024 (has links)
Nature's non-material contributions to people are difficult to quantify and one aspect in particular, nature's contributions to communication (NCC), has so far been neglected. Recent advances in automated language processing tools enable us to quantify diversity patterns underlying the distribution of plant and animal taxon labels in creative literature, which we term BiL (biodiversity in literature). We assume BiL to provide a proxy for people's openness to nature's non-material contributions enhancing our understanding of NCC. We assembled a comprehensive list of 240,000 English biological taxon labels. We pre-processed and searched a subcorpus of digitised literature on Project Gutenberg for these labels. We quantified changes in biodiversity indices commonly used in ecological studies for 16,000 books, encompassing 4,000 authors, as proxies for BiL between 1705 and 1969. We observed hump-shape patterns for taxon label richness, abundance and Shannon diversity indicating a peak of BiL in the middle of the 19th century. This is also true for the ratio of biological to general lexical richness. The variation in label use between different sections within books, quantified as β-diversity, declined until the 1830s and recovered little, indicating a less specialised use of taxon labels over time. This pattern corroborates our hypothesis that before the onset of industrialisation BiL may have increased, reflecting several concomitant influences such as the general broadening of literary content, improved education and possibly an intensified awareness of the starting loss of biodiversity during the period of romanticism. 
Given that these positive trends continued and that we do not find support for alternative processes reducing BiL, such as language streamlining, we suggest that this pronounced trend reversal and subsequent decline of BiL over more than 100 years may be the consequence of humans’ increasing alienation from nature owing to major societal changes in the wake of industrialisation. We conclude that our computational approach of analysing literary communication using biodiversity indices has a high potential for understanding aspects of non-material contributions of biodiversity to people. Our approach can be applied to other corpora and would benefit from additional metadata on taxa, works and authors.
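The indices named in this abstract (taxon label richness, abundance and Shannon diversity) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `biodiversity_indices`, the whitespace tokenisation, the restriction to single-word lowercase labels and the use of the natural logarithm are all assumptions made for illustration.

```python
import math
from collections import Counter


def biodiversity_indices(tokens, taxon_labels):
    """Compute richness, abundance and Shannon diversity of taxon labels.

    tokens: list of lowercase word tokens from one book (assumed pre-processed).
    taxon_labels: set of lowercase single-word labels (hypothetical label list).
    """
    # Count only the tokens that are known taxon labels.
    counts = Counter(t for t in tokens if t in taxon_labels)
    abundance = sum(counts.values())   # total label occurrences
    richness = len(counts)             # number of distinct labels
    if abundance == 0:
        return {"richness": 0, "abundance": 0, "shannon": 0.0}
    # Shannon index H = -sum(p_i * ln p_i) over label proportions p_i.
    shannon = -sum(
        (n / abundance) * math.log(n / abundance) for n in counts.values()
    )
    return {"richness": richness, "abundance": abundance, "shannon": shannon}


# Toy example with an invented sentence and label set.
example_tokens = "the oak and the fox saw an oak near a wren".split()
example_labels = {"oak", "fox", "wren", "lynx"}
result = biodiversity_indices(example_tokens, example_labels)
```

Applying such a function per book and grouping the results by publication year would give the kind of time series the abstract describes; multi-word labels would need a phrase matcher rather than a token lookup.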
532

A Computational Approach to Analyzing Musical Complexity of the Beatles

Burghardt, Manuel, Fuchs, Florian 05 June 2024 (has links)
No description available.
533

Measuring and Analyzing Community Resilience During COVID-19 Using Social Media

Valinejad, Jaber 22 October 2021 (has links)
Community resilience (CR) has been studied as an indicator of how well a given community copes with a given disaster, and it provides policy direction on which aspects of the community should be improved with high priority. Although the impact of COVID-19 has been serious all over the world and in every aspect of our daily lives, some countries have handled this disaster better than others. In this thesis, I aim to assess the effect of various news and Tweets collected during the COVID-19 pandemic on community functionality and resilience. First, we measure the community resilience (CR) of five different countries using Twitter data and investigate how each country shows different trends in CR, which is measured based on real or fake Tweets. We use Tweets generated in Australia (AUS), Singapore (SG), Republic of Korea (ROK), the United Kingdom (UK), and the United States (US) for Mar.-Nov. 2020 and measure the CR of each country and associated attributes to analyze the overall trends. In the next step, we scrape and manually clean 4,952 full-text news articles from Jan. 2020 to Jun. 2021 and classify them into real, mixed, and fake news by fact-checking. Then we retrieve Tweets from 42,877,312 Tweet IDs from the same period and classify them into real, mixed, and fake Tweets using machine learning classifiers. We compare CR measured from news articles and Tweets based on three categories, namely, real, mixed, and fake. Based on the news articles and Tweets collected, we quantify CR based on two key factors, community wellbeing and resource distribution. We evaluate community wellbeing by assessing mental wellbeing and physical wellbeing while evaluating resource distribution by assessing economic resilience, infrastructural resilience, institutional resilience, and community capital.
Based on the estimates of these two factors, we quantify CR from both news articles and Tweets and analyze the extent to which CR measured from the news articles can reflect the actual state of CR measured from Tweets. / M.S. / The COVID-19 pandemic has severely harmed every aspect of our daily lives, resulting in a slew of social problems. It is critical to accurately assess the current state of community functionality and resilience under this pandemic to recover from it successfully. To accomplish this, various types of social sensing sources, such as Tweets and publicly released news, have been employed to understand individuals’ and communities’ thoughts, behaviors, and attitudes during the COVID-19 pandemic. However, some portions of the released news are fake and can easily mislead the community into responding improperly to disasters like COVID-19. In this thesis, I aim to assess the effect of various news and Tweets collected during the COVID-19 pandemic on community functionality and resilience. First, we measure the community resilience (CR) of five different countries, i.e., Australia (AUS), Singapore (SG), Republic of Korea (ROK), the United Kingdom (UK), and the United States (US), for Mar.-Nov. 2020, together with associated attributes, to analyze the overall trends. In the next step, we compare CR measured from news articles and Tweets based on three categories, namely, real, mixed, and fake. We quantify CR based on two key factors, community wellbeing and resource distribution. We evaluate community wellbeing by assessing mental wellbeing and physical wellbeing while evaluating resource distribution by assessing economic resilience, infrastructural resilience, institutional resilience, and community capital.
534

Digital Environmental Humanities

Langer, Lars, Burghardt, Manuel, Borgards, Roland, Köhring, Esther, Wirth, Christian 26 June 2024 (has links)
No description available.
535

Peeking Inside the DH Toolbox - Detection and Classification of Software Tools in DH Publications

Ruth, Nicolas, Niekler, Andreas, Burghardt, Manuel 26 June 2024 (has links)
Digital tools have played an important role in Digital Humanities (DH) since its beginnings. Accordingly, a lot of research has been dedicated to the documentation of tools as well as to the analysis of their impact from an epistemological perspective. In this paper we propose a binary and a multi-class classification approach to detect and classify tools. The approach builds on state-of-the-art neural language models. We test our model on two different corpora and report the results for different parameter configurations in two consecutive experiments. In the end, we demonstrate how the models can be used for actual tool detection and tool classification tasks in a large corpus of DH journals.
536

Leveraging customer knowledge in open innovation processes by using social software

Kruse, Paul 24 May 2016 (has links) (PDF)
Involving customers in the creation and design process of new products and services has been discussed in practice and research since the early 1980s. As one of the first researchers, von Hippel (1986) shed light on the concept of Lead Users, a group of users who are able to provide the most accurate data on future needs for organizations. Subsequently, many scholars emphasized different areas of contribution for customers and how they provide assistance to the process of innovation. First of all, customers may contribute to product innovation (Cooper & Kleinschmidt, 1987; Driessen & Hillebrand, 2013; Füller & Matzler, 2007; Gruner & Homburg, 2000; Sawhney, Verona, & Prandelli, 2005; Snow, Fjeldstad, Lettl, & Miles, 2011; Yang & Rui, 2009) and service innovation (Abecassis-Moedas, Ben Mahmoud-Jouini, Dell’Era, Manceau, & Verganti, 2012; Alam, 2002; Chesbrough, 2011; Larbig-Wüst, 2010; Magnusson, 2003; Paton & Mclaughlin, 2008; Shang, Lin, & Wu, 2009; Silpakit & Fisk, 1985), e.g., by co-creating values (Prahalad & Ramaswamy, 2004), such as concepts or designs as well as reviewing and testing them throughout the stages of the process of innovation. From the customers’ point of view, being involved in innovation processes and becoming a part of the organization is a desire of an increasing number of them. Customers are demanding more individual and more tailored products. They are increasingly knowledgeable and capable of designing and producing their own products and services. Because their influence on product development is positively related to the quality of the new product (Sethi, 2000), more and more organizations appreciate them as innovation actors and are willing to pay them for their input.
Today, customers are not only involved in the qualification of products (Callon, Méadel, & Rabeharisoa, 2002; Callon & Muniesa, 2005; Grabher, Ibert, & Flohr, 2009) but also allowed to customize and evaluate them on the path to innovation (Franke & Piller, 2004; Piller & Walcher, 2006; von Hippel & Katz, 2002; von Hippel, 2001). Moreover, there is an abundance of studies that stress the customers’ influence on effectiveness (de Luca & Atuahene-Gima, 2007; Kleinschmidt & Cooper, 1991; Kristensson, Matthing, & Johansson, 2008; Still, Huhtamäki, Isomursu, Lahti, & Koskela-Huotari, 2012) and risk (Bayer & Maier, 2006; Enkel, Kausch, & Gassmann, 2005; Enkel, Perez-Freije, & Gassmann, 2005). While the latter comprises the risk of customer integration as well as the customers’ influence on market risks, e.g., during new product development, studies on effectiveness are mostly concerned with customer-orientation and products/services in line with customers’ expectations (Atuahene-Gima, 1996, 2003; Fuchs & Schreier, 2011). The accompanying change in understanding became known as open innovation (OI; first coined by Chesbrough in 2003) and represents a paradigm shift, where organizations switch their focus from internally generated innovation (i.e., ideation, in-house R&D, etc.) toward external knowledge and open innovation processes, thus, allowing them to integrate external ideas and actors, i.e., customers (Chesbrough, 2006) and other external stakeholders (Laursen & Salter, 2006). Since then, OI has been identified as a success factor for increasing customer satisfaction (Füller, Hutter, & Faullant, 2011; Greer & Lei, 2012) and growing revenues (Faems, De Visser, Andries, & van Looy, 2010; Mette, Moser, & Fridgen, 2013; Spithoven, Frantzen, & Clarysse, 2010).
In addition to that, by opening their doors to external experts and knowledge workers (Kang & Kang, 2009), organizations cope with shorter innovation cycles, rising R&D costs, and the shortage of resources (Gassmann & Enkel, 2004). Parallel to the paradigm shift in innovation, another shift has taken place in information and communication technologies (Kietzmann, Hermkens, McCarthy, & Silvestre, 2011). Only a few years ago, when customer integration was still very costly, companies had to fly in customers, provide facilities onsite, permanently assign employees to such activities, and incentivise each task executed by customers. Today, emerging technologies (subsumed under the term ‘social software’) help integrate customers or other external stakeholders, who are increasingly familiar with such technologies from personal usage experience (Cook, 2008), and grant them access from all over the world in a 24/7 fashion. Examples include blogging tools, social networking systems, or wikis. These technologies help organizations to access customer knowledge, facilitate the collaboration with customers (Culnan, McHugh, & Zubillaga, 2010; Piller & Vossen, 2012) at reduced costs and allow them to address a much larger audience (Kaplan & Haenlein, 2010). On the other hand, customers can now express their needs in a more direct way to organizations. However, each technology or application category may present a completely different benefit to the process of innovation or parts of it and, thus, the innovation itself. Reflecting these developments, organizations need to know two things: how can they exploit the customers’ knowledge for innovation purposes and how may the implementation of social software support this. Hence, this research addresses the integration of customers in organizational innovation, i.e., new product development. It addresses how and why firms activate customers for innovation and which contribution customers provide to the process of innovation.
Additionally, it investigates which tasks customers may take over in open innovation projects and which strategies organizations may choose to do so. It also addresses which social software application supports each task best and how organizations may select the most suitable application out of a rapidly growing number of alternatives. The nature of this research is recommendatory and aims at designing a solution for organizations that are interested in the potential contribution of customers during innovation, already involve customers in innovation tasks or plan to do so. Following the recommendations of this research should result in a more effective organizational exploitation of customer knowledge and their workforce and, thus, a value added to innovation and the outcomes of the process of innovation, e.g., a product that better fits the customers’ expectations and demands or consequently a better adoption of the product by the customer.
537

[en] DATA SCIENCE AND SOLID STATE CHEMISTRY: A PLATFORM FOR THE COMPETITIVENESS OF THE PHARMACEUTICAL INDUSTRY IN EMERGING MARKETS / [pt] CIÊNCIA DE DADOS E QUÍMICA DO ESTADO SÓLIDO: UMA PLATAFORMA PARA COMPETITIVIDADE DA INDÚSTRIA FARMOQUÍMICA E FARMACÊUTICA EM MERCADO EMERGENTES

RONALDO PEDRO DA SILVA 28 November 2018 (has links)
[pt] A área de química do estado sólido ocupa uma posição cada vez mais importante nas atividades de pesquisa e desenvolvimento farmacêuticas. A compreensão das propriedades do estado sólido de um insumo farmacêutico ativo (IFA) mostra-se crítica no desenvolvimento de formulações em função de seus impactos na biodisponibilidade e solubilidade dos fármacos, sendo essencial para garantir o benefício terapêutico, otimizar o desenvolvimento e garantir a proteção da propriedade intelectual. Esta tese investiga indicadores científicos e tecnológicos na área de química do estado sólido utilizando ferramentas de ciência dos dados a partir de publicações científicas e depósitos de patentes, visando contribuir para o aumento da competitividade da indústria farmoquímica e farmacêutica brasileira e de outros mercados emergentes. A partir da utilização de ferramentas de ciência dos dados é proposta uma metodologia baseada em técnicas de text mining associadas a relações fuzzy. Essa metodologia de identificação de competências específicas aplicada na área de química do estado sólido tem como estudo de caso a descoberta de uma nova forma polimórfica para o IFA acetato de dexametasona. Os resultados revelam que existem competências científicas em química do estado sólido no Brasil. Contudo, quando comparada com a interação universidade-empresa mundial, a indústria farmoquimica e farmacêutica local perde em estágio de competitividade e desenvolvimento. Por outro lado, os resultados demonstram a robustez da metodologia e sua capacidade de identificar pesquisadores em área específicas, oferecendo soluções para apoio a tomada de decisão e identificação de pesquisadores relevantes para o desenvolvimento do setor farmoquímico e farmacêutico. / [en] The solid-state chemistry area has received increased attention in the pharmaceutical research and development activities. 
Understanding the solid-state properties of an active pharmaceutical ingredient (API) is critical in the development of formulations because of their impact on the bioavailability and solubility of the final drug, and it is essential to ensure therapeutic benefit, optimize development and allow proper intellectual property protection. This research investigates science and technology indicators in the solid-state chemistry area using data science tools applied to scientific publications and patent documents, aiming to contribute to the increase of the competitiveness of the pharmaceutical industry in Brazil and in other emerging markets. Using data science tools, a methodology based on text mining techniques combined with fuzzy relations is proposed. This methodology for identifying specific competencies is applied in the solid-state chemistry area, with the discovery of a new polymorphic form of the API dexamethasone acetate as a case study. The results reveal the existence of scientific competencies in solid-state chemistry in Brazil. However, when compared to the global university-company interaction, the local pharmaceutical industry shows a lower stage of competitiveness and development. On the other hand, the results indicate the robustness of the methodology and its ability to identify researchers in specific areas, offering solutions to support decision making and the identification of researchers relevant to the development of the pharmaceutical sector.
538

Le développement du neuromarketing aux Etats-Unis et en France. Acteurs-réseaux, traces et controverses / The comparative development of neuromarketing between the United States and France : Actor-networks, traces and controversies

Teboul, Bruno 20 September 2016 (has links)
Notre travail de recherche explore de manière comparée le développement du neuromarketing aux Etats-Unis et en France. Nous commençons par analyser la littérature sur le neuromarketing. Nous utilisons comme cadre théorique et méthodologique l’Actor Network Theory (ANT) ou Théorie de l’Acteur-Réseau (dans le sillage des travaux de Bruno Latour et Michel Callon). Nous montrons ainsi comment des actants « humains et non-humains »: acteurs-réseaux, traces (publications) et controverses forment les piliers d’une nouvelle discipline telle que le neuromarketing. Notre approche hybride « qualitative-quantitative », nous permet de construire une méthodologie appliquée de l’ANT: analyse bibliométrique (Publish Or Perish), text mining, clustering et analyse sémantique de la littérature scientifique et web du neuromarketing. A partir de ces résultats, nous construisons des cartographies, sous forme de graphes en réseau (Gephi) qui révèlent les interrelations et les associations entre acteurs, traces et controverses autour du neuromarketing. / Our research explores the comparative development of neuromarketing between the United States and France. We start by analyzing the literature on neuromarketing. We use as theoretical and methodological framework the Actor Network Theory (ANT) (in the wake of the work of Bruno Latour and Michel Callon). We show how “human and non-human” entities (“actants”): actor-network, traces (publications) and controversies form the pillars of a new discipline such as the neuromarketing. Our hybrid approach “qualitative-quantitative” allows us to build an applied methodology of the ANT: bibliometric analysis (Publish Or Perish), text mining, clustering and semantic analysis of the scientific literature and web of the neuromarketing. From these results, we build data visualizations, mapping of network graphs (Gephi) that reveal the interrelations and associations between actors, traces and controversies about neuromarketing.
539

Leveraging customer knowledge in open innovation processes by using social software

Kruse, Paul 10 September 2015 (has links)
Involving customers in the creation and design process of new products and services has been discussed in practice and research since the early 1980s. As one of the first researchers, von Hippel (1986) shed light on the concept of Lead Users, a group of users who are able to provide the most accurate data on future needs for organizations. Subsequently, many scholars emphasized different areas of contribution for customers and how they provide assistance to the process of innovation. First of all, customers may contribute to product innovation (Cooper & Kleinschmidt, 1987; Driessen & Hillebrand, 2013; Füller & Matzler, 2007; Gruner & Homburg, 2000; Sawhney, Verona, & Prandelli, 2005; Snow, Fjeldstad, Lettl, & Miles, 2011; Yang & Rui, 2009) and service innovation (Abecassis-Moedas, Ben Mahmoud-Jouini, Dell’Era, Manceau, & Verganti, 2012; Alam, 2002; Chesbrough, 2011; Larbig-Wüst, 2010; Magnusson, 2003; Paton & Mclaughlin, 2008; Shang, Lin, & Wu, 2009; Silpakit & Fisk, 1985), e.g., by co-creating values (Prahalad & Ramaswamy, 2004), such as concepts or designs as well as reviewing and testing them throughout the stages of the process of innovation. From the customers’ point of view, being involved in innovation processes and becoming a part of the organization is a desire of an increasing number of them. Customers are demanding more individual and more tailored products. They are increasingly knowledgeable and capable of designing and producing their own products and services. Because their influence on product development is positively related to the quality of the new product (Sethi, 2000), more and more organizations appreciate them as innovation actors and are willing to pay them for their input.
Today, customers are not only involved in the qualification of products (Callon, Méadel, & Rabeharisoa, 2002; Callon & Muniesa, 2005; Grabher, Ibert, & Flohr, 2009) but also allowed to customize and evaluate them on the path to innovation (Franke & Piller, 2004; Piller & Walcher, 2006; von Hippel & Katz, 2002; von Hippel, 2001). Moreover, there is an abundance of studies that stress the customers’ influence on effectiveness (de Luca & Atuahene-Gima, 2007; Kleinschmidt & Cooper, 1991; Kristensson, Matthing, & Johansson, 2008; Still, Huhtamäki, Isomursu, Lahti, & Koskela-Huotari, 2012) and risk (Bayer & Maier, 2006; Enkel, Kausch, & Gassmann, 2005; Enkel, Perez-Freije, & Gassmann, 2005). While the latter comprises the risk of customer integration as well as the customers’ influence on market risks, e.g., during new product development, studies on effectiveness are mostly concerned with customer-orientation and products/services in line with customers’ expectations (Atuahene-Gima, 1996, 2003; Fuchs & Schreier, 2011). The accompanying change in understanding became known as open innovation (OI; first coined by Chesbrough in 2003) and represents a paradigm shift, where organizations switch their focus from internally generated innovation (i.e., ideation, in-house R&D, etc.) toward external knowledge and open innovation processes, thus, allowing them to integrate external ideas and actors, i.e., customers (Chesbrough, 2006) and other external stakeholders (Laursen & Salter, 2006). Since then, OI has been identified as a success factor for increasing customer satisfaction (Füller, Hutter, & Faullant, 2011; Greer & Lei, 2012) and growing revenues (Faems, De Visser, Andries, & van Looy, 2010; Mette, Moser, & Fridgen, 2013; Spithoven, Frantzen, & Clarysse, 2010).
In addition to that, by opening their doors to external experts and knowledge workers (Kang & Kang, 2009), organizations cope with shorter innovation cycles, rising R&D costs, and the shortage of resources (Gassmann & Enkel, 2004). Parallel to the paradigm shift in innovation, another shift has taken place in information and communication technologies (Kietzmann, Hermkens, McCarthy, & Silvestre, 2011). Only a few years ago, when customer integration was still very costly, companies had to fly in customers, provide facilities onsite, permanently assign employees to such activities, and incentivise each task executed by customers. Today, emerging technologies (subsumed under the term ‘social software’) help integrate customers or other external stakeholders, who are increasingly familiar with such technologies from personal usage experience (Cook, 2008), and grant them access from all over the world in a 24/7 fashion. Examples include blogging tools, social networking systems, or wikis. These technologies help organizations to access customer knowledge, facilitate the collaboration with customers (Culnan, McHugh, & Zubillaga, 2010; Piller & Vossen, 2012) at reduced costs and allow them to address a much larger audience (Kaplan & Haenlein, 2010). On the other hand, customers can now express their needs in a more direct way to organizations. However, each technology or application category may present a completely different benefit to the process of innovation or parts of it and, thus, the innovation itself. Reflecting these developments, organizations need to know two things: how can they exploit the customers’ knowledge for innovation purposes and how may the implementation of social software support this. Hence, this research addresses the integration of customers in organizational innovation, i.e., new product development. It addresses how and why firms activate customers for innovation and which contribution customers provide to the process of innovation.
Additionally, it investigates which tasks customers may take over in open innovation projects and which strategies organizations may choose to do so. It also addresses which social software application supports each task best and how organizations may select the most suitable application out of a rapidly growing number of alternatives. The nature of this research is recommendatory and aims at designing a solution for organizations that are interested in the potential contribution of customers during innovation, already involve customers in innovation tasks or plan to do so. Following the recommendations of this research should result in a more effective organizational exploitation of customer knowledge and their workforce and, thus, a value added to innovation and the outcomes of the process of innovation, e.g., a product that better fits the customers’ expectations and demands or consequently a better adoption of the product by the customer.
Contents: 1 Introduction; 2 Theoretical foundation; 3 Research areas and focal points; 4 Research aims and questions; 5 Methods; 6 Findings; 7 Conclusion; References; Essay 1: The Role of External Knowledge in Open Innovation – A Systematic Review of Literature; Essay 2: External Knowledge in Organisational Innovation – Toward an Integration Concept; Essay 3: Idea Mining – Text Mining Supported Knowledge Management for Innovation Purposes; Essay 4: How do Tasks and Technology fit? – Bringing Order to the Open Innovation Chaos
540

中文文本探勘工具:主題分析、詞組關聯強度、相關句擷取 / Tools for Chinese Text Mining: Topic Analysis, Association Strengths of Collocations, Extraction of Relevant Statements

林書佑, Lin, Shu Yu Unknown Date (has links)
現今資料大量且快速數位化的時代,各領域對資訊探勘分析技術越趨倚重。而在數位人文中領域中從2009年「數位典藏與數位人文國際研討會」開始,此議題逐漸受到重視,主要目的為將數位文物結合資訊分析與圖像化輔助,透過不同層面的詮釋建構出更完整的文物資訊。 本研究建構一個針對各種中文語料分析的工具,藉由latent semantic analysis、pointwise mutual information、Pearson’s chi-squared test、typed dependencies distance、word2vec、Gibbs sampling for latent Dirichlet allocation等計算語料中關鍵詞彙關聯強度的方法,並結合分群方法找出可能的主題,最後擷取符合分群結果的相關句子予以輔助人文學者分析詮釋。透過提供各種觀察語料的面向,進而提升語料相關研究學者的效率。 我們利用《人民日報》、《新青年》、《聯合報》、《中國時報》作為實驗與測試的中文語料。且將《新青年》藉由此套工具分析後的結果提供給專業人文學者,做為分析詮釋的參考資訊與佐證依據,並在「2015年數位典藏與數位人文國際研討會」中發表論文。目前我們透過各種中文語料評估工具的效能,且在未來將公開此套工具提供給更多學者使用,節省對於語料分析的時間。 / In recent years, a wide variety of text documents have been transformed into digital format. Hence, using data mining techniques to analyze data is becoming more and more popular in many research fields. The digital humanities have gradually been taken more seriously since the "International Conference of Digital Archives and Digital Humanities" began in 2009. The main aim is to combine digital heritage with information analysis and visualization, building more complete cultural information through different levels of interpretation. In this study, we construct a set of tools for Chinese text mining. The tools calculate the association strengths of collocations using latent semantic analysis, pointwise mutual information, Pearson’s chi-squared test, typed dependencies distance, word2vec, and Gibbs sampling for latent Dirichlet allocation. The tools then employ a clustering method to identify possible topics and extract the relevant statements according to the clustering results. These clusters and relevant statements improve the efficiency of humanities scholars’ analysis by providing a variety of perspectives on the corpora. At the experimental stage of this study, we used the "People's Daily", "New Youth", "United Daily News", and "China Times" as the corpora for testing.
As part of this research, humanities scholars analyzed "New Youth" with the tools and published a paper at the "2015 International Conference of Digital Archives and Digital Humanities". Currently, we assess the effectiveness of the tools on a variety of Chinese corpora. In the future, we will make the tools freely available on the Internet for Chinese text mining. We hope these time-saving tools can assist humanities scholars in their study of Chinese corpora.
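One of the association measures listed in this abstract, pointwise mutual information (PMI), can be sketched as follows. This is an illustrative sketch only, not the thesis's tool: the function name `pmi_scores`, the choice of sentence-level co-occurrence, and the assumption that the text is already segmented into word lists are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(sentences):
    """Return PMI (base 2) for every word pair that co-occurs in a sentence.

    sentences: list of token lists (assumed already word-segmented).
    Probabilities are estimated as the share of sentences containing
    a word, or containing both words of a pair.
    """
    n = len(sentences)
    word_df = Counter()   # number of sentences containing each word
    pair_df = Counter()   # number of sentences containing each pair
    for sent in sentences:
        vocab = sorted(set(sent))  # sorted so each pair has one canonical order
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        pair: math.log2(
            (count / n) / ((word_df[pair[0]] / n) * (word_df[pair[1]] / n))
        )
        for pair, count in pair_df.items()
    }


# Toy corpus of four "sentences" with invented tokens.
toy_corpus = [["a", "b"], ["a", "b"], ["c", "d"], ["a", "c"]]
scores = pmi_scores(toy_corpus)
```

Pairs with positive PMI co-occur more often than independence would predict, and negative values indicate the opposite. A production tool would typically add frequency thresholds, since PMI overrates rare pairs.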
