431

Visual analytics of arsenic in various foods

Johnson, Matilda Olubunmi 06 1900 (has links)
Arsenic is a naturally occurring toxic metalloid, and its presence in food composites poses a potential risk to the health of both humans and animals. Arsenic-contaminated groundwater is often used for human and animal consumption and for irrigation of soils, which can lead to arsenic entering the human food chain. Health effects of exposure include multiple organ damage, cancers, heart disease, diabetes mellitus, hypertension, lung disease and peripheral vascular disease. Research investigations, epidemiologic surveys and total diet studies (market baskets) provide datasets, information and knowledge on arsenic content in foods. The determination of the concentration of arsenic in rice varieties is an active area of research, and with the increasing capability to measure arsenic in foods, there are volumes of varied and continuously generated datasets on arsenic in food groups.

Visual analytics, which integrates techniques from information visualization and computational data analysis via interactive visual interfaces, presents an approach for visually representing data on arsenic concentrations. The goal of this doctoral research in Environmental Science is to address the need for visual analytical decision support tools on arsenic content in various foods, with special emphasis on rice. The hypothesis is that software-enabled visual representation and user interaction, facilitated by visual interfaces, will help discover hidden relationships between arsenic content and food categories. The specific objectives investigated were to: (1) provide insightful visual analytic views of compiled data on arsenic in food categories; (2) categorize table-ready foods by arsenic content; (3) compare arsenic content in rice product categories; and (4) identify informative sentences on arsenic concentrations in rice.

The overall research method is secondary data analysis using visual analytics techniques implemented through Tableau Software. Several datasets were used to construct visual analytical representations of arsenic concentrations in foods: (i) arsenic concentrations in 459 crop samples; (ii) arsenic concentrations in 328 table-ready foods from multi-year total diet studies; (iii) estimates of daily inorganic arsenic intake for 49 food groups from multi-country total diet studies; (iv) arsenic content in rice product categories for 193 samples of rice and rice products; and (v) 758 sentences extracted from PubMed abstracts on arsenic in rice.

Several key insights emerged. The concentration of inorganic arsenic in instant rice was lower than in other rice types. The concentration of dimethylarsinic acid (DMA) in wild rice, an aquatic grass, was notably lower than in rice varieties (e.g., 0.0099 ppm versus 0.182 ppm for a long-grain white rice). The categorization of 328 table-ready foods into 12 categories enhances communication on arsenic concentrations. Outlier concentrations of arsenic in rice were observed in views constructed to integrate data from four total diet studies. The 193 rice samples were split into two groups using a cut-off level of 3 mcg of inorganic arsenic per serving; the visual analytics views constructed allow users to specify the cut-off levels desired. A total of 86 sentences from 53 PubMed abstracts were identified as informative for arsenic concentrations. The sentences enabled literature curation for arsenic concentration and additional supporting information, such as the location of the research. One informative sentence provided a global “normal” range of 0.08 to 0.20 mg/kg for arsenic in rice. A visual analytics resource developed in the research is a dashboard that facilitates interaction with text and connects to the knowledge base of the PubMed literature database.

The research reported provides a foundation for further investigations on visual analytics of data on arsenic concentrations in foods. Considering the massive and complex data associated with contaminants in foods, the development of visual analytics tools is needed to facilitate diverse human cognitive tasks. Visual analytics tools can provide the integrated automated analysis, interaction with data, and data visualization critically needed to enhance decision making. Stakeholders that would benefit include consumers, food and health safety personnel, farmers, and food producers. Arsenic content of baby foods warrants attention because early-life exposures could have lifetime adverse health consequences. The action of microorganisms in the soil is associated with the availability of arsenic species for uptake by plants; genomic data on microbial communities presents a wealth of data for identifying mitigation strategies, and arsenic metabolism pathways encoded in microbial genomes warrant further research. Visual analytics tasks could facilitate the discovery of biological processes for mitigating arsenic uptake from soil. The increasing availability of central data resources from total diet studies and research investigations creates a need for personnel with diverse levels of skill in data management and analysis. Training workshops and courses on the foundations and applications of visual analytics can contribute to global workforce development in food safety and environmental health. Research investigations could determine the learning gains accomplished through hardware and software for visual analytics. Finally, there is a need to develop and evaluate informatics tools with visual analytics capabilities in the domain of contaminants in foods. / Environmental Sciences / P. Phil. (Environmental Science)
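The thesis builds these views in Tableau, but the core categorization step (splitting samples by a user-specified cut-off for inorganic arsenic per serving) can be sketched in a few lines. A minimal pandas illustration; the product names and concentrations below are invented, not the thesis data:

    import pandas as pd

    # Hypothetical measurements: inorganic arsenic in mcg per serving.
    rice = pd.DataFrame({
        "product": ["instant rice", "long-grain white", "brown rice", "rice cereal"],
        "ias_mcg_per_serving": [1.2, 4.7, 6.1, 2.4],
    })

    cutoff = 3.0  # user-adjustable, mirroring the interactive views
    rice["group"] = ["below cut-off" if v < cutoff else "at/above cut-off"
                     for v in rice["ias_mcg_per_serving"]]
    print(rice.groupby("group")["ias_mcg_per_serving"].describe())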
432

應用文字探勘於影評文章自動摘要之研究 / A Study on Application of Text Mining for Automatic Text Summarization of Film Review

鄧亦安, Teng, I An Unknown Date (has links)
With the rise of the online world, people facing a choice no longer rely only on word-of-mouth; they also search for information by keyword. Amid this flood of data, integrating material quickly is a major challenge. Summaries of film reviews can help people learn about a movie before going to the theater and confirm that it interests them. This study uses reviews of five movies as its data source: Avengers: Age of Ultron (66 reviews, 4,616 sentences), Batman v Superman: Dawn of Justice (60 reviews, 9,345 sentences), Zootopia (60 reviews, 5,545 sentences), Interstellar (50 reviews, 4,616 sentences) and The Intern (62 reviews, 5,622 sentences), and generates review summaries by combining clustering with sentence extraction. The K-Means algorithm clusters the feature terms and sentences of each movie's reviews; TF-IDF scores the importance of the sentences in each cluster to select high-weight sentences; the WWA method then picks sentences covering different aspects within each cluster; finally, the similarity between a best template and the content of each cluster determines the ordering of the clusters, producing a multi-document review summary with coherent paragraph content and paragraph order. The results show that the original reviews of the five movies had a similarity of 15.87% to the best template, while summaries produced by the proposed method reached 21.19% similarity to the best single-document template summary. Because the cluster ordering is derived by comparison with the best template, the summary follows a paragraph order similar to the template's and covers the content most widely mentioned across the reviews, making the summary more comprehensive. This summarization method can help people quickly understand a movie and support their decisions through automatically aggregated and extracted summaries. / Abstract: Facing the Big Data era, there is too much information on the web for readers to absorb, and presenting and summarizing essential information quickly is a challenge. People choosing a movie face exactly this situation: before picking one, they search for related information, but film reviews are scattered across many websites. Automatic text summarization can efficiently extract the important information and condense the concepts found in online reviews, so readers can grasp the main ideas of all the reviews and save time. This research presents a multi-concept, extractive film review summary generated from reviews on the popular blogging platform PIXNET, using an extraction-based method with a clustering step. The method applies the K-Means algorithm to cluster the sentences of reviews of a specific film by their features, uses statistical weighting and the WWA method to measure sentence weights and choose representative sentences, and finally compares clusters against templates to decide the ordering of the classified sentences and assemble the representative sentences from each cluster into a summary. In the evaluation over the five movies, summaries produced by the method raised the average similarity between the film review summaries and the template summaries to 21.19%, showing that automatic film review summarization can extract the important sentences from the reviews. Ordering clusters by comparison with templates also yields a sequentially organized review summary that saves readers' time and is easy to comprehend.
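A minimal scikit-learn sketch of the cluster-then-extract step described above; the sentences and cluster count are invented for illustration, and the thesis's WWA aspect selection and template-based cluster ordering are not reproduced:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    sentences = [
        "The visual effects are stunning throughout the film.",
        "The action scenes are spectacular and well shot.",
        "The pacing drags badly in the second act.",
        "A thin script undercuts an otherwise strong cast.",
    ]
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(sentences)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    # Extract the highest TF-IDF-weight sentence from each cluster.
    for c in range(km.n_clusters):
        idx = np.where(km.labels_ == c)[0]
        weights = np.asarray(X[idx].sum(axis=1)).ravel()
        print(sentences[idx[weights.argmax()]])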
433

Topic Analysis of Tweets on the European Refugee Crisis Using Non-negative Matrix Factorization

Shen, Chong 01 January 2016 (has links)
The ongoing European Refugee Crisis has been one of the most popular trending topics on Twitter for the past 8 months. This paper applies topic modeling to large collections of tweets to discover the hidden patterns within these social media discussions. In particular, we perform topic analysis through solving Non-negative Matrix Factorization (NMF) as an Inexact Alternating Least Squares problem. We accelerate the computation using techniques including tweet sampling and augmented NMF, compare NMF results with different ranks, and visualize the outputs through topic representation and frequency plots. We observe that supportive sentiments maintained a strong presence while negative sentiments such as safety concerns have emerged over time.
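A toy illustration of the NMF topic pipeline, using scikit-learn (whose NMF solver is coordinate descent rather than the inexact alternating least squares formulation used in the paper); the tweets and the rank are invented:

    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import TfidfVectorizer

    tweets = [
        "refugees welcome, how can we help and donate",
        "donate to refugee relief funds today",
        "worried about security checks at the border",
        "border safety concerns keep growing",
    ]
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(tweets)
    nmf = NMF(n_components=2, init="nndsvd", random_state=0).fit(X)

    # Print the top terms of each topic.
    terms = vec.get_feature_names_out()
    for k, row in enumerate(nmf.components_):
        print("topic", k, [terms[i] for i in row.argsort()[::-1][:3]])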
434

運用文字探勘技術協助建構公司治理本體知識 / Applying Text Mining Techniques to Support the Construction of a Corporate Governance Ontology

陳言熙 Unknown Date (has links)
The purpose of an ontology is to express concepts that a community can share, and it is an important foundation of knowledge representation that can help computers search for and exchange information and understand text. Applying ontologies allows resources on the web to be explicitly defined for machines, so that computers can understand natural language through descriptions in an ontology language, improving the efficiency of information retrieval and enabling knowledge sharing. The main difficulty in ontology construction is that very many domain ontologies across different fields of expertise need to be defined, which is labor-intensive and time-consuming; improving efficiency requires systematic methodologies for carrying out ontology engineering and validating its quality. To enable computers to understand human language, many researchers have used text mining techniques to develop machine-readable electronic lexicons, linking the vocabulary in these lexicons into semantic networks and applying those networks in a variety of research fields. This study therefore attempts to use text mining techniques to assist in constructing ontological knowledge. The conclusions are that text mining techniques can semi-automatically support the construction of a corporate governance lexicon and semantic network, that the corporate governance semantic network can serve as the basis for ontology construction, and that the proposed construction method can transform the semantic network into a corporate governance ontology. / The purposes of an ontology are to offer reusable, sharable concepts and to serve as the basis of knowledge representation. It provides a smart way of searching for and exchanging information: resources on the internet can be explicitly defined, and computers can understand people's natural language through the application of the ontology, improving the efficiency of data indexing. To let computers understand natural language, many researchers have worked on electronic lexicons that encode computer-usable logic through text mining technology, analyzing the lexicons to find related vocabularies and connecting them into a semantic network. This research therefore tries to utilize text mining technology to support ontology engineering. The results are a text mining approach that semi-automatically supports the building of a corporate governance lexicon and semantic network, the use of the corporate governance semantic network as the basis for ontology engineering, and a method for turning the semantic network into a corporate governance ontology.
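A toy sketch of the term co-occurrence network that this kind of semantic network construction typically starts from, using networkx; the corporate governance terms and the weight threshold are invented for illustration:

    import itertools
    import networkx as nx

    # Hypothetical term lists extracted from three documents.
    docs = [
        ["board", "independence", "audit", "committee"],
        ["board", "audit", "disclosure"],
        ["shareholder", "rights", "disclosure"],
    ]
    G = nx.Graph()
    for terms in docs:
        for a, b in itertools.combinations(sorted(set(terms)), 2):
            w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
            G.add_edge(a, b, weight=w + 1)

    # Term pairs co-occurring in two or more documents are candidate
    # semantic relations for the lexicon and semantic network.
    print([(a, b) for a, b, d in G.edges(data=True) if d["weight"] >= 2])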
435

A treatise on Web 2.0 with a case study from the financial markets

Sykora, Martin D. January 2012 (has links)
There has been much hype in vocational and academic circles surrounding the emergence of web 2.0 or social media; however, relatively little work has been dedicated to substantiating the actual concept of web 2.0. Many have dismissed it as not deserving of this new title, since the term web 2.0 assumes a certain interpretation of web history, including enough progress in a certain direction to trigger a succession [i.e. web 1.0 → web 2.0]. Others have provided arguments in support of this development, and there has been a considerable amount of enthusiasm in the literature. Much research has evaluated current use of web 2.0 and analysed user-generated content, but an objective and thorough assessment of what web 2.0 really stands for has been to a large extent overlooked. More recently the idea of collective intelligence facilitated via web 2.0, and its potential applications, has raised interest among researchers, yet a more unified approach and more work in the area of collective intelligence is needed. This thesis identifies and critically evaluates a wider context for the web 2.0 environment and what caused it to emerge, providing a rich literature review on the topic, a review of existing taxonomies, a quantitative and qualitative evaluation of the concept itself, and an investigation of the collective intelligence potential that emerges from application usage. Finally, a framework for harnessing collective intelligence in a more systematic manner is proposed. In addition to the presented results, novel methodologies are also introduced throughout this work. In order to provide interesting insight and to illustrate the analysis, a case study of the recent financial crisis is considered. Some interesting results relating to the crisis are revealed within user-generated content data, and relevant issues are discussed where appropriate.
436

Knowledge discovery for moderating collaborative projects

Choudhary, Alok K. January 2009 (has links)
In today's global market environment, enterprises are increasingly turning towards collaboration in projects to leverage their resources, skills and expertise, and simultaneously address the challenges posed in diverse and competitive markets. Moderators, which are knowledge-based systems, have successfully been used to support collaborative teams by raising awareness of problems or conflicts. However, the functioning of a moderator is limited to the knowledge it has about the team members. Knowledge acquisition, learning and updating of knowledge are the major challenges for a Moderator's implementation. To address these challenges a Knowledge discOvery And daTa minINg inteGrated (KOATING) framework is presented for Moderators to enable them to continuously learn from the operational databases of the company and semi-automatically update the corresponding expert module. The architecture for the Universal Knowledge Moderator (UKM) shows how existing moderators can be extended to support global manufacturing. A method for designing and developing the knowledge acquisition module of the Moderator for manual and semi-automatic update of knowledge is documented using the Unified Modelling Language (UML). UML has been used to explore the static structure and dynamic behaviour, and to describe the system analysis, system design and system development aspects of the proposed KOATING framework. The proof of design is presented using a case study of a collaborative project in the form of a construction project supply chain. It is shown that Moderators can "learn" by extracting various kinds of knowledge from Post Project Reports (PPRs) using different types of text mining techniques. Furthermore, it is proposed that knowledge discovery integrated moderators can be used to support and enhance collaboration by identifying appropriate business opportunities and corresponding partners for the creation of a virtual organization. A case study is presented in the context of a UK-based SME. Finally, the thesis concludes by summarizing the work, outlining its novelties and contributions, and recommending future research.
437

Microbial phenomics information extractor (MicroPIE): a natural language processing tool for the automated acquisition of prokaryotic phenotypic characters from text sources

Mao, Jin, Moore, Lisa R., Blank, Carrine E., Wu, Elvis Hsin-Hui, Ackerman, Marcia, Ranade, Sonali, Cui, Hong 13 December 2016 (has links)
Background: The large-scale analysis of phenomic data (i.e., full phenotypic traits of an organism, such as shape, metabolic substrates, and growth conditions) in microbial bioinformatics has been hampered by the lack of tools to rapidly and accurately extract phenotypic data from existing legacy text in the field of microbiology. To quickly obtain knowledge on the distribution and evolution of microbial traits, an information extraction system needed to be developed to extract phenotypic characters from large numbers of taxonomic descriptions so they can be used as input to existing phylogenetic analysis software packages. Results: We report the development and evaluation of Microbial Phenomics Information Extractor (MicroPIE, version 0.1.0). MicroPIE is a natural language processing application that uses a robust supervised classification algorithm (Support Vector Machine) to identify characters from sentences in prokaryotic taxonomic descriptions, followed by a combination of algorithms applying linguistic rules with groups of known terms to extract characters as well as character states. The input to MicroPIE is a set of taxonomic descriptions (clean text). The output is a taxon-by-character matrix, with taxa in the rows and a set of 42 pre-defined characters (e.g., optimum growth temperature) in the columns. The performance of MicroPIE was evaluated against a gold standard matrix and another student-made matrix. Results show that, compared to the gold standard, MicroPIE extracted 21 characters (50%) with a Relaxed F1 score > 0.80 and 16 characters (38%) with Relaxed F1 scores between 0.50 and 0.80. Inclusion of a character prediction component (SVM) improved the overall performance of MicroPIE, notably the precision. Evaluated against the same gold standard, MicroPIE performed significantly better than the undergraduate students. Conclusion: MicroPIE is a promising new tool for the rapid and efficient extraction of phenotypic character information from prokaryotic taxonomic descriptions. However, further development, including the incorporation of ontologies, will be necessary to improve extraction performance for some character types.
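A minimal sketch of the supervised sentence-classification step (the SVM component), using scikit-learn; the sentences, labels, and character names are invented, not MicroPIE's actual training data or rule sets:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Hypothetical taxonomic-description sentences labeled by character.
    sents = [
        "Optimal growth occurs at 37 degrees Celsius.",
        "Growth is observed between 20 and 30 degrees Celsius.",
        "Cells are rod-shaped and motile.",
        "Cells are coccoid and non-motile.",
    ]
    labels = ["growth_temperature", "growth_temperature",
              "cell_shape", "cell_shape"]

    clf = make_pipeline(TfidfVectorizer(), LinearSVC())
    clf.fit(sents, labels)
    print(clf.predict(["Optimum temperature for growth is 25 degrees Celsius."]))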
438

Model trees with topic model preprocessing: an approach for data journalism illustrated with the WikiLeaks Afghanistan war logs

Rusch, Thomas, Hofmarcher, Paul, Hatzinger, Reinhold, Hornik, Kurt 06 1900 (has links) (PDF)
The WikiLeaks Afghanistan war logs contain nearly 77,000 reports of incidents in the US-led Afghanistan war, covering the period from January 2004 to December 2009. The recent growth of data on complex social systems and the potential to derive stories from them have shifted the focus of journalistic and scientific attention increasingly toward data-driven journalism and computational social science. In this paper we advocate the usage of modern statistical methods for problems of data journalism and beyond, which may help journalistic and scientific work and lead to additional insight. Using the WikiLeaks Afghanistan war logs for illustration, we present an approach that builds intelligible statistical models for interpretable segments in the data, in this case to explore the fatality rates associated with different circumstances in the Afghanistan war. Our approach combines preprocessing by Latent Dirichlet Allocation (LDA) with model trees. LDA is used to process the natural language information contained in each report summary by estimating latent topics and assigning each report to one of them. Together with other variables, these topic assignments serve as splitting variables for finding segments in the data, to which local statistical models for the reported number of fatalities are fitted. Segmentation and fitting are carried out with recursive partitioning of negative binomial distributions. We identify segments with different fatality rates that correspond to a small number of topics and other variables as well as their interactions. Furthermore, we carve out the similarities between segments and connect them to stories that have been covered in the media. This gives an unprecedented description of the war in Afghanistan and serves as an example of how data journalism, computational social science and other areas with an interest in data held in databases can benefit from modern statistical techniques. (authors' abstract)
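A small sketch of the LDA preprocessing step (estimating topics from report summaries and hard-assigning each report to its dominant topic, which then serves as a splitting variable), using scikit-learn; the summaries and topic count are invented, and the model-tree fitting with negative binomial distributions is not shown:

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    summaries = [
        "patrol engaged insurgents with small arms fire",
        "ied strike on convoy near the highway",
        "meeting with village elders about reconstruction",
        "shura held to discuss school reconstruction project",
    ]
    X = CountVectorizer(stop_words="english").fit_transform(summaries)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    # Dominant-topic assignment per report, usable as a splitting variable.
    print(lda.transform(X).argmax(axis=1))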
439

Selecionando candidatos a descritores para agrupamentos hierárquicos de documentos utilizando regras de associação / Selecting candidate labels for hierarchical document clusters using association rules

Santos, Fabiano Fernandes dos 17 September 2010 (has links)
One way of extracting and organizing knowledge that has received much attention in recent years is through a structural representation divided into hierarchically related topics. Once the hierarchical structure is built, it is necessary to find descriptors for each of the obtained clusters, since interpreting these clusters is a complex task for the user, as the algorithms usually do not provide simple conceptual descriptions. The methods found in the literature treat each document as a bag-of-words and do not explicitly explore the relationships among the terms of the documents in a cluster. These relations, however, can carry important information for deciding which terms should be chosen as descriptors of the nodes, and they can be represented by association rules. The goal of this work is therefore to evaluate the use of association rules to support the identification of descriptors for hierarchical clusters. To this end, the SeCLAR (Selecting Candidate Labels using Association Rules) method was proposed, which explores association rules to select descriptors for hierarchical document clusters. The method generates association rules based on transactions built from each document in the collection and uses the relationship information among the clusters of the hierarchy to select candidate descriptors. The results of the experimental evaluation indicate that a significant improvement in precision and recall over traditional methods can be obtained. / One way to organize knowledge that has received much attention in recent years is to create a structural representation divided into hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters, since most algorithms do not produce simple descriptions and the interpretation of these clusters is a difficult task for users. Related works consider each document as a bag-of-words and do not explicitly explore the relationships among the terms of the documents. However, these relationships can provide important information for deciding which terms should be chosen as descriptors of the nodes, and they can be represented by association rules. This work aims to evaluate the use of association rules to support the identification of labels for hierarchical document clusters. Thus, this thesis presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules to select good candidates for labels of hierarchical clusters of documents. This method generates association rules based on transactions built from each document in the collection, and uses the relationship information among the nodes of the hierarchical clustering to select candidate labels. The experimental results show that a significant improvement with respect to the precision and recall of traditional methods can be obtained.
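A minimal sketch of the rule-mining step at the heart of SeCLAR, using mlxtend: each document in a cluster becomes a transaction of its terms, and frequent multi-term itemsets (from which the association rules are then generated) supply candidate descriptors. The terms and thresholds below are invented:

    import pandas as pd
    from mlxtend.frequent_patterns import apriori
    from mlxtend.preprocessing import TransactionEncoder

    # Hypothetical transactions: one per document in a cluster.
    transactions = [
        ["neural", "network", "training"],
        ["neural", "network", "layers"],
        ["network", "training", "gradient"],
    ]
    te = TransactionEncoder()
    df = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)
    freq = apriori(df, min_support=0.6, use_colnames=True)

    # Frequent itemsets with more than one term are candidate labels.
    print(freq[freq["itemsets"].map(len) > 1])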
440

Representação de coleções de documentos textuais por meio de regras de associação / Representation of textual document collections through association rules

Rossi, Rafael Geraldeli 16 August 2011 (has links)
The number of textual documents available in digital format has increased incessantly. Text mining techniques are increasingly used to organize and extract knowledge from large collections of textual documents. To use these techniques, the textual documents must be represented in an appropriate format. Most text mining research uses the bag-of-words approach to represent the documents of a collection. This representation takes every word present in the document collection as a possible feature, ignoring word order and punctuation or structural information, and it is characterized by high dimensionality and sparse data. On the other hand, most concepts are composed of more than one word, such as Artificial Intelligence, Neural Network, and Text Mining. Approaches that generate features composed of more than one word present problems beyond those of the bag-of-words representation, such as the generation of features with little meaning and a much higher dimensionality. This master's project proposed an approach for representing textual documents named bag-of-related-words. The proposed approach generates features composed of related words through association rules. With association rules, the aim is to identify relations among the words of a document and also to reduce the dimensionality, since only words that occur or co-occur above a given frequency are considered when generating the rules. Different ways of mapping a document into transactions to enable the generation of association rules are analyzed, as are several interestingness measures applied to the rules to extract more meaningful features and reduce the number of features. To evaluate how much the bag-of-related-words representation can help in organizing and extracting knowledge from collections of textual documents, and in the interpretability of the results, three groups of experiments were carried out: (1) classification of textual documents, to evaluate how well the features of the bag-of-related-words representation distinguish document categories; (2) clustering of textual documents, to evaluate the quality of the clusters obtained with the bag-of-related-words and consequently help in obtaining the structure of a topic hierarchy; and (3) construction and evaluation of topic hierarchies by domain experts. All results and dimensionalities were compared with the bag-of-words representation. The experiments show that the features of the bag-of-related-words representation have predictive power as good as that of the bag-of-words, that the quality of document clustering with the bag-of-related-words was as good as with the bag-of-words, and that in the evaluation of topic hierarchies by domain experts the bag-of-related-words representation gave better results on every criterion analyzed. / The amount of textual documents available in digital format is incredibly large. Text mining techniques are becoming essential to manage and extract knowledge from big textual document collections. In order to use these techniques, the textual documents need to be represented in an appropriate format to allow the construction of a model that represents the knowledge embedded in them. Most research on text mining uses the bag-of-words approach to represent textual document collections. This representation uses each word in a collection as a feature, ignoring the order of the words and structural information, and it is characterized by high dimensionality and data sparsity. On the other hand, most concepts are composed of more than one word, such as Artificial Intelligence, Neural Network, and Text Mining. The approaches that generate features composed of more than one word to address this suffer from further problems, such as the generation of features without meaning and a dimensionality much higher than that of the bag-of-words. An approach to represent textual documents named bag-of-related-words was proposed in this master's thesis. The proposed approach generates features composed of related words using association rules. We hope to identify relationships among words and reduce the dimensionality with the use of association rules, since only the words that occur and co-occur over a frequency threshold will be used to generate rules. Different ways to map a document into transactions to allow the extraction of association rules are analyzed. Different objective interestingness measures applied to the association rules to generate more meaningful features and to reduce the number of features are also analyzed. To evaluate how much the proposed representation can aid the management and knowledge extraction from textual document collections, and the understanding of the results, three experiments were carried out: (1) textual document classification, to analyze the predictive power of the bag-of-related-words features; (2) textual document clustering, to analyze the quality of the clusters obtained using the bag-of-related-words representation; and (3) building and evaluation of topic hierarchies by domain experts. All results and dimensionalities were compared to the bag-of-words representation. The results showed that the features of the bag-of-related-words representation have predictive power as good as that of the bag-of-words representation. The quality of textual document clustering was also as good as with the bag-of-words. The evaluation of the topic hierarchies by domain specialists presented better results with the bag-of-related-words representation on all the questions analyzed.
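A small sketch of how bag-of-related-words features might be assembled: word pairs that co-occur in at least a minimum number of documents become compound features. Pure Python, with invented documents and threshold; the thesis's interestingness-measure filtering is not reproduced:

    from collections import Counter
    from itertools import combinations

    docs = [
        "neural network training data",
        "neural network text mining",
        "text mining association rules",
    ]
    pair_counts = Counter()
    for d in docs:
        pair_counts.update(combinations(sorted(set(d.split())), 2))

    # Pairs co-occurring in at least two documents become features.
    features = [p for p, c in pair_counts.items() if c >= 2]
    vectors = [[int(set(p) <= set(d.split())) for p in features] for d in docs]
    print(features)
    print(vectors)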
