481

應用文字探勘分析網路團購商品群集之研究 -以美食類商品為例 / The study of analyzing group-buying goods clusters by using text mining – exemplified by the group-buying foods

趙婉婷 Unknown Date (has links)
網路團購消費模式掀起一陣風潮,隨著網路團購市場接受度提高,現今以團購方式進行購物的消費模式不斷增加,團購商品品項也日益繁多。為了使網路團購消費者更容易找到感興趣的團購商品,本研究將針對團購商品進行群集分析。 本研究以國內知名團購網站「愛合購」為例,以甜點蛋糕分類下的熱門美食團購商品為主,依商品名稱找尋該商品的顧客團購網誌文章納入資料庫中。本研究從熱門度前1000項的產品中找到268項產品擁有顧客團購網誌586篇,透過文字探勘技術從中擷取產品特徵相關資訊,並以「k最近鄰居法」為基礎建置kNN分群器,以進行群集分析。本研究依不同的k值以及分群門檻值進行分群,並對大群集進行階段式分群,單項群集進行質心合併,以尋求較佳之分群結果。 研究結果顯示,268項團購商品經過kNN分群器進行四個階段的群集分析後可獲得28個群集,群內相似度從未分群時的0.029834提升至0.177428。在經過第一階段的分群後,可將商品分為3個主要大群集,即「麵包類」、「蛋糕類」以及「其他口感類」。在進行完四個階段的分群後,「麵包類」可分為2種類型的群集,即『麵包類產品』以及『擁有麵包特質的產品』,而「蛋糕類」則是可依口味區分為不同的蛋糕群集。產品重要特徵詞彙不像一般文章的關鍵字詞會重複出現於文章中,因此在特徵詞彙過濾時應避免刪減過多的產品特徵詞彙。群集特性可由詞彙權重前20%之詞彙依人工過濾及商品出現頻率挑選出產品特徵代表詞來做描繪。研究所獲得之分群結果除了提供團購消費者選擇產品時參考外,也可幫助團購網站業者規劃更適切的行銷活動。本研究亦提出一些未來研究方向。 / Group buying is flourishing, and the range of group-buying merchandise keeps growing. To help consumers find the products they are interested in, this study performs cluster analysis on group-buying products, grouping them by their extracted features. We collect customer blog posts about the products, use text mining to extract product features, and build a kNN-based clusterer to group the products. The study tests different similarity thresholds, splits large clusters in successive stages, and merges singleton clusters by centroid to obtain a better clustering. In the results, 268 group-buying food items are divided into 28 clusters, and the mean intra-cluster similarity improves from 0.029834 to 0.177428. The 28 clusters fall into three main categories: bread, cake, and other textures. Each cluster can be characterized by manually filtering the terms in the top 20% by weight. The results can help buyers find similar products they like, and can also help sellers plan better-targeted marketing activities.
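The pipeline this abstract describes (term features from customer blog posts, similarity between products, threshold-driven grouping) can be sketched in a few lines. This is a minimal illustration, not the thesis' actual kNN clusterer with staged splitting and centroid merging; the toy documents and the 0.2 threshold are invented for the example.

```python
# Minimal sketch: tf-idf vectors from product texts, cosine similarity,
# and single-pass threshold clustering. Toy data, illustrative only.
import math
from collections import Counter

def tfidf_vectors(docs):
    """One sparse tf-idf vector per tokenized document."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def threshold_cluster(docs, threshold=0.2):
    """Assign each document to the first cluster whose first member is
    similar enough; otherwise start a new cluster."""
    vecs = tfidf_vectors(docs)
    clusters = []
    for i, v in enumerate(vecs):
        for cluster in clusters:
            if cosine(v, vecs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

docs = [["chocolate", "cake", "cocoa"],
        ["chocolate", "cake", "cream"],
        ["baguette", "bread", "wheat"]]
print(threshold_cluster(docs))  # → [[0, 1], [2]]
```

Raising or lowering the threshold coarsens or refines the grouping, which is the knob the thesis tunes across its four clustering stages.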
482

運用資料及文字探勘探討不同市場營運概況文字敘述及財務表現之一致性 / Using data and text mining to explore for consistencies between narrative disclosures and financial performance in different markets

江韋達, Chiang, Danny Wei Ta Unknown Date (has links)
本研究使用TFIDF文字探勘技術分析樣本公司年度財務報告裡面的重要非量化資訊,與三項量化財務比率比較,欲探討公司年報在不同市場裡文字敘述與財務表現之一致性。研究結果顯示,根據從2003年至2010年上市半導體公司之年度報告,美國公司的年報較會對財務表現做出誇大的文字敘述,本研究亦發現在文字敘述上,市場較不成熟的中國公司所發布之年報較偏向低估他們的財務表現。 / This study uses the TFIDF text mining technique to extract useful information from unstructured qualitative text, exploring the consistency between quantitative financial ratios and the qualitative narrative disclosures in annual reports across countries at different levels of market development. Based on listed semiconductor companies' annual reports from 2003 to 2010, the results show that companies in the United States tend to exaggerate and overstate their performance in the MD&A, while companies in less developed markets such as China show the lowest tendency to exaggerate and are more likely to understate their performance in the Director's Report.
483

Knowledge discovery using pattern taxonomy model in text mining

Wu, Sheng-Tang January 2007 (has links)
In the last decade, many data mining techniques have been proposed to fulfil various knowledge discovery tasks with the goal of retrieving useful information for users. Various types of patterns can be generated using these techniques, such as sequential patterns, frequent itemsets, and closed and maximal patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in the domain of text mining. Most text mining methods adopt a keyword-based approach, constructing text representations from single words or single terms, whereas other methods have tried to use phrases instead of keywords, based on the hypothesis that a phrase carries more information than a single term. Nevertheless, these phrase-based methods did not yield significant improvements, because patterns with high frequency (normally the shorter patterns) usually score high on exhaustivity but low on specificity, while the specific patterns suffer from the low-frequency problem. This thesis presents research on developing an effective Pattern Taxonomy Model (PTM) to overcome this problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method that adopts sequential pattern mining and uses closed patterns as features in the representation. A PTM-based information filtering system is implemented and evaluated in a series of experiments on the latest version of the Reuters dataset, RCV1. Pattern evolution schemes are also proposed, in an attempt to utilise information from negative training examples to update the discovered knowledge. The results show that PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio approach and the state-of-the-art BM25 and Support Vector Machine (SVM) approaches.
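The closed-pattern idea PTM builds on can be illustrated with a toy miner: enumerate sequential patterns in a small sequence database, keep the frequent ones, and retain only the closed patterns, i.e. those with no longer super-pattern of equal support. This is a rough sketch of the concept, not the thesis' algorithm.

```python
# Toy closed-sequential-pattern miner, illustrative only.
from collections import Counter
from itertools import combinations

def subsequences(seq, max_len):
    """All distinct (possibly non-contiguous) subsequences up to max_len."""
    out = set()
    for length in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), length):
            out.add(tuple(seq[i] for i in idx))
    return out

def frequent_patterns(db, min_support, max_len=2):
    """Count in how many sequences each pattern occurs; keep frequent ones."""
    counts = Counter()
    for seq in db:
        for p in subsequences(seq, max_len):
            counts[p] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

def closed_patterns(freq):
    """Keep patterns with no longer super-pattern of equal support."""
    def is_subseq(a, b):
        it = iter(b)
        return all(x in it for x in a)
    return {p: c for p, c in freq.items()
            if not any(len(q) > len(p) and cq == c and is_subseq(p, q)
                       for q, cq in freq.items())}

db = [("text", "mining", "method"),
      ("text", "mining", "system"),
      ("mining", "method")]
closed = closed_patterns(frequent_patterns(db, min_support=2))
print(sorted(closed.items()))
# → [(('mining',), 3), (('mining', 'method'), 2), (('text', 'mining'), 2)]
```

Note how ("text",) is pruned: it appears in the same two sequences as ("text", "mining"), so the longer closed pattern already carries its information.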
484

Προστασία διανοητικής ιδιοκτησίας και επιπτώσεις στην καινοτομικότητα, την τεχνολογική πρόοδο και την έρευνα. Χρήση τεχνικών εξόρυξης γνώσης από κείμενο σε διπλώματα ευρεσιτεχνίας

Γεωργής, Γεώργιος 15 October 2012 (has links)
Το υπάρχον σύστημα κατοχύρωσης της Πνευματικής Ιδιοκτησίας και ιδιαίτερα της Βιομηχανικής Ιδιοκτησίας μέσω των Διπλωμάτων Ευρεσιτεχνίας δημιουργήθηκε με σκοπό να προάγει την έρευνα και την επιστήμη. Η πολιτεία χορηγεί το μονοπωλιακό δικαίωμα στην χρήση και εκμετάλλευση μιας εφεύρεσης με αντάλλαγμα την δημόσια αποκάλυψη της ευρεσιτεχνίας από τον εφευρέτη. Οι υπόλοιποι ερευνητές πλέον μπορούν να χρησιμοποιήσουν ελεύθερα αυτή την (δημοσιευμένη) γνώση στην έρευνά τους ενώ οι επιχειρήσεις μπορούν καταβάλλοντας κάποιο τίμημα στον εφευρέτη να την χρησιμοποιήσουν για εμπορικούς σκοπούς. Συνεπώς, με βάση τα παραπάνω, το σκεπτικό πίσω από την χορήγηση διπλωμάτων ευρεσιτεχνίας είναι η αύξηση των επενδύσεων σε έρευνα και καινοτομικότητα με ταυτόχρονη διάχυση των αποτελεσμάτων μέσω της χορήγησης ενός προσωρινού μονοπωλίου. Όμως καθώς πλέον η χρήση των διπλωμάτων ευρεσιτεχνίας αυξάνεται διαρκώς και η χρήση τους γενικεύεται σε ζώντες οργανισμούς, φυτά, προγράμματα υπολογιστών, προκύπτουν ερωτήματα σχετικά με την ανάγκη αναθεώρησης του υπάρχοντος συστήματος Προστασίας της Διανοητικής Ιδιοκτησίας και της διαδικασίας εξέτασης και χορήγησης ευρεσιτεχνιών. Επίσης εξετάζεται η χρήση τεχνικών εξόρυξης γνώσης από ΔΕ και οι δυνατότητες που οι τεχνικές αυτές μπορούν να προσφέρουν. / The existing system of intellectual property rights is analysed, with a focus on the patent system. Patents grant the applicant a monopoly right for a limited period in exchange for full disclosure of the invention. The patent system and its controversial issues are examined, and a text mining method for extracting information from patents is tested.
485

Προδιαγραφές μιας καινοτόμας πλατφόρμας ηλεκτρονικής μάθησης που ενσωματώνει τεχνικές επεξεργασίας φυσικής γλώσσας

Φερφυρή, Ναυσικά 04 September 2013 (has links)
Ζούμε σε μια κοινωνία στην οποία η χρήση της τεχνολογίας έχει εισβάλει δυναμικά στην καθημερινότητα. Η εκπαίδευση δεν θα μπορούσε να μην επηρεαστεί από τις Νέες Τεχνολογίες. Ήδη, όροι όπως “Ηλεκτρονική Μάθηση” και “Ασύγχρονη Τηλε-εκπαίδευση” έχουν δημιουργήσει νέα δεδομένα στην κλασική Εκπαίδευση. Με τον όρο ασύγχρονη τηλε-εκπαίδευση εννοούμε μια διαδικασία ανταλλαγής μάθησης μεταξύ εκπαιδευτή και εκπαιδευομένων, που πραγματοποιείται ανεξάρτητα χρόνου και τόπου. Ηλεκτρονική Μάθηση είναι η χρήση των νέων πολυμεσικών τεχνολογιών και του διαδικτύου για τη βελτίωση της ποιότητας της μάθησης, διευκολύνοντας την πρόσβαση σε πηγές πληροφοριών και σε υπηρεσίες καθώς και σε ανταλλαγές και εξ αποστάσεως συνεργασίες. Ο όρος καλύπτει ένα ευρύ φάσμα εφαρμογών και διαδικασιών, όπως ηλεκτρονικές τάξεις και ψηφιακές συνεργασίες, μάθηση βασιζόμενη στους ηλεκτρονικούς υπολογιστές και στις τεχνολογίες του παγκόσμιου ιστού. Κάποιες από τις βασικές απαιτήσεις που θα πρέπει να πληρούνται για την δημιουργία μιας πλατφόρμας ηλεκτρονικής μάθησης είναι: να υποστηρίζει τη δημιουργία βημάτων συζήτησης (discussion forums) και “δωματίων συζήτησης” (chat rooms), να υλοποιεί ηλεκτρονικό ταχυδρομείο, να έχει φιλικό περιβάλλον τόσο για το χρήστη/μαθητή όσο και για το χρήστη/καθηγητή, να υποστηρίζει προσωποποίηση (customization) του περιβάλλοντος ανάλογα με το χρήστη. Επίσης να κρατάει πληροφορίες (δημιουργία profiles) για το χρήστη για να τον “βοηθάει” κατά την πλοήγηση, να υποστηρίζει την εύκολη δημιουργία διαγωνισμάτων (online tests) και την παρουσίαση πολυμεσικών υλικών. Ως επεξεργασία φυσικής γλώσσας (NLP) ορίζουμε την υπολογιστική ανάλυση αδόμητων δεδομένων σε κείμενα, με σκοπό την επίτευξη μηχανικής κατανόησης του κειμένου αυτού. Είναι η επεξεργασία προτάσεων που εισάγονται ή διαβάζονται από το σύστημα, το οποίο απαντά επίσης με προτάσεις, με τρόπο τέτοιο που να θυμίζει απαντήσεις μορφωμένου ανθρώπου.
Βασικό ρόλο παίζει η γραμματική, το συντακτικό, η ανάλυση των εννοιολογικών στοιχείων και γενικά της γνώσης, για να γίνει κατανοητή η ανθρώπινη γλώσσα από τη μηχανή. Οι βασικές τεχνικές επεξεργασίας φυσικού κειμένου βασίζονται στις γενικές γνώσεις σχετικά με τη φυσική γλώσσα. Χρησιμοποιούν ορισμένους απλούς ευρετικούς κανόνες οι οποίοι στηρίζονται στη συντακτική και σημασιολογική προσέγγιση και ανάλυση του κειμένου. Ορισμένες τεχνικές που αφορούν σε όλα τα πεδία εφαρμογής είναι: ο διαμερισμός στα συστατικά στοιχεία του κειμένου (tokenization), η χρήση της διάταξης του κειμένου (structural data mining), η απαλοιφή λέξεων που δεν φέρουν ουσιαστική πληροφορία (elimination of insignificant words), η γραμματική δεικτοδότηση (PoS tagging), η μορφολογική ανάλυση και η συντακτική ανάλυση. Στόχος της παρούσας διπλωματικής είναι να περιγράψει και να αξιολογήσει πώς οι τεχνικές επεξεργασίας της φυσικής γλώσσας (NLP) θα μπορούσαν να αξιοποιηθούν για την ενσωμάτωσή τους σε πλατφόρμες ηλεκτρονικής μάθησης. Ο μεγάλος όγκος δεδομένων που παρέχεται μέσω μιας ηλεκτρονικής πλατφόρμας μάθησης θα πρέπει να μπορεί να διαχειριστεί, να διανεμηθεί και να ανακτηθεί σωστά. Κάνοντας χρήση των τεχνικών NLP θα παρουσιαστεί μια καινοτόμα πλατφόρμα ηλεκτρονικής μάθησης, εκμεταλλευόμενη τις υψηλού επιπέδου τεχνικές εξατομίκευσης και την δυνατότητα εξαγωγής συμπερασμάτων επεξεργάζοντας την φυσική γλώσσα των χρηστών, προσαρμόζοντας το προσφερόμενο εκπαιδευτικό υλικό στις ανάγκες του κάθε χρήστη. / We live in a society in which technology has entered everyday life dynamically, and education could not remain unaffected by the new technologies. Terms such as "e-learning" and "asynchronous e-learning" have already created new standards in classical education. By "asynchronous e-learning" we mean a process of exchange of learning between teacher and students that takes place regardless of time and place. E-learning is the use of new multimedia technologies and the Internet to improve the quality of learning by facilitating access to information resources and services, as well as remote exchanges and collaboration. The term covers a wide range of applications and processes, such as electronic classrooms, digital collaboration, and learning based on computers and Web technologies. Some of the basic requirements an e-learning platform must meet are: support for discussion forums and chat rooms; e-mail delivery; a friendly environment for both the student and the teacher; personalization of the environment depending on the user; keeping information (user profiles) to assist navigation; easy creation of online tests; and presentation of multimedia materials. Natural language processing (NLP) is defined as the computational analysis of unstructured textual data in order to achieve machine understanding of the text: the system processes sentences that are entered or read, and responds with sentences in a way reminiscent of the answers of an educated person. Grammar, syntax, and the analysis of semantic elements and of knowledge in general play a key role in making human language understandable to the machine. The main natural-text processing techniques rely on general knowledge about natural language; they use simple heuristic rules based on syntactic and semantic analysis of the text. Some techniques common to all fields of application are: tokenization, structural data mining, elimination of insignificant words, PoS tagging, morphological analysis, and syntactic analysis. The aim of this thesis is to describe and evaluate how NLP techniques could be incorporated into e-learning platforms. The large volume of data delivered through an online learning platform must be properly managed, distributed, and retrieved. Using NLP techniques, an innovative e-learning platform is presented that exploits high-level personalization techniques and the ability to draw conclusions by processing the users' natural language, adapting the offered educational material to the needs of each user.
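Two of the preprocessing steps listed in this abstract, tokenization and elimination of insignificant words, can be sketched as below. The regular-expression tokenizer and the stopword list are simplistic stand-ins for real components; PoS tagging and morphological analysis would require a full NLP library.

```python
# Simplistic sketches of two NLP preprocessing steps; toy stopword list.
import re

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}  # toy list

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_insignificant(tokens):
    """Eliminate words that carry no substantial information."""
    return [t for t in tokens if t not in STOPWORDS]

text = "The processing of natural language is the analysis of unstructured text."
print(remove_insignificant(tokenize(text)))
# → ['processing', 'natural', 'language', 'analysis', 'unstructured', 'text']
```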
486

Automatic Identification of Duplicates in Literature in Multiple Languages

Klasson Svensson, Emil January 2018 (has links)
As the number of books available online grows, these collections are becoming larger and increasingly multilingual. Many such corpora contain duplicates in the form of different editions or translations of books. Finding these duplicates is usually done manually, but with growing collection sizes this is becoming time consuming and demanding. This thesis set out to find a method from the fields of Text Mining and Natural Language Processing that can automate the identification of these duplicates in a corpus, mainly consisting of fiction in multiple languages, provided by Storytel. The problem was approached using three different methods for computing distance measures between books. The first approach compared book titles using the Levenshtein distance. The second extracted entities from each book using Named Entity Recognition, represented them with tf-idf, and used cosine dissimilarity to compute distances. The third used a Polylingual Topic Model to estimate each book's distribution over topics and compared distributions using the Jensen-Shannon distance. To estimate the parameters of the Polylingual Topic Model, 8,000 books were translated from Swedish to English using Apache Joshua, a statistical machine translation system. For each method, every pair of books written by the same author was tested using a hypothesis test whose null hypothesis was that the two books are not editions or translations of each other. Since there is no known distribution to assume as the null distribution, a null distribution was estimated for each book from distance measures to books not written by the author. The methods were evaluated on two sets of data manually labeled by the author of the thesis. One was randomly sampled using one-stage cluster sampling, and one consisted of books from authors that the corpus provider, prior to the thesis, considered more difficult to label using automated techniques. Of the three methods, title matching performed best in terms of accuracy and precision on the sampled data. The entity matching approach had the lowest accuracy and precision, but an almost constant recall of around 50%. It was concluded that a set of duplicates is clearly distinguished from the estimated null distributions; with a higher significance level, better precision and accuracy could have been achieved with similar recall for that method. Topic matching performed worse than title matching, and on inspection the estimated model was unable to produce quality topics, due to multiple factors; it was concluded that further research is needed for the topic matching approach. None of the three methods was deemed a complete solution for automating the detection of book duplicates.
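The first and third distance measures mentioned above can be sketched from scratch: Levenshtein edit distance on titles and Jensen-Shannon distance on topic distributions. This is an illustration of the measures themselves, not the thesis' evaluation code, and the topic distributions would in practice come from the Polylingual Topic Model.

```python
# From-scratch sketches of two distance measures used for duplicate detection.
import math

def levenshtein(a, b):
    """Edit distance between two titles, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def jensen_shannon(p, q):
    """Jensen-Shannon distance (base 2) between two topic distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)
    return math.sqrt((kl(p, m) + kl(q, m)) / 2)

print(levenshtein("kitten", "sitting"))  # → 3
```

With base-2 logarithms the Jensen-Shannon distance is bounded in [0, 1], which makes it convenient for thresholding in a hypothesis test.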
487

Minerafórum : um recurso de apoio para análise qualitativa em fóruns de discussão

Azevedo, Breno Fabrício Terra January 2011 (has links)
Esta tese aborda o desenvolvimento, uso e experimentação do MineraFórum. Trata-se de um recurso para auxiliar o professor na análise qualitativa das contribuições textuais registradas por alunos em fóruns de discussão. A abordagem desta pesquisa envolveu técnicas de mineração de textos utilizando grafos. As interações proporcionadas pelas trocas de mensagens em um fórum de discussão representam uma importante fonte de investigação para o professor. A partir da análise das postagens, o docente pode identificar quais alunos redigiram contribuições textuais que contemplam conceitos relativos ao tema da discussão, e quais discentes não o fizeram. Desta forma, é possível ter subsídios para motivar a discussão dos conceitos importantes que fazem parte do tema em debate. Para atingir o objetivo do presente estudo, foi necessário realizar uma revisão da literatura onde foram abordados temas como: a Educação a Distância (EAD); Ambientes Virtuais de Aprendizagem; os principais conceitos da área de Mineração de Textos e, por último, trabalhos correlacionados a esta tese. A estratégia metodológica utilizada no processo de desenvolvimento do MineraFórum envolveu uma série de etapas: 1) a escolha de uma técnica de mineração de textos adequada às necessidades da pesquisa; 2) verificação da existência de algum software de mineração de textos que auxiliasse o professor a analisar qualitativamente as contribuições em um fórum de discussão; 3) realização de estudos preliminares para avaliar a técnica de mineração escolhida; 4) definição dos indicadores de relevância das mensagens; elaboração de fórmulas para calcular a relevância das postagens; 5) construção do sistema; 6) integração do MineraFórum a três Ambientes Virtuais de Aprendizagem e, por último, 7) a realização de experimentos com a ferramenta. / This thesis presents the development, use and experimentation of the MineraFórum software. 
It is a resource that can help teachers carry out qualitative analyses of textual contributions in discussion forums. This research included the use of text mining techniques with graphs. Message exchanges in discussion forums are an important source of investigation for teachers. By analyzing students' posts, teachers can identify which learners wrote contributions containing concepts related to the debate theme, and which students did not. This strategy may also give teachers the elements needed to motivate discussion of concepts relevant to the topic being debated. To accomplish the objectives of this study, a review of the literature was carried out on topics such as: Distance Learning; Virtual Learning Environments; main concepts in Text Mining; and studies related to this thesis. The methodological strategy used in the development of MineraFórum followed these steps: 1) choosing a text mining technique suitable to the needs of the research; 2) checking whether any software was available to help teachers do qualitative analysis of contributions in discussion forums; 3) conducting preliminary studies to evaluate the selected mining technique; 4) defining indicators of message relevance and elaborating formulas to calculate the relevance of posts; 5) building the system; 6) integrating MineraFórum into three Virtual Learning Environments; and 7) carrying out experiments with the tool.
488

Visual analytics of arsenic in various foods

Johnson, Matilda Olubunmi 06 1900 (has links)
Arsenic is a naturally occurring toxic metal, and its presence in food composites could be a potential risk to the health of both humans and animals. Arsenic-contaminated groundwater is often used for human and animal consumption and for irrigation of soils, which could potentially lead to arsenic entering the human food chain. Its side effects include multiple organ damage, cancers, heart disease, diabetes mellitus, hypertension, lung disease and peripheral vascular disease. Research investigations, epidemiologic surveys and total diet studies (market baskets) provide datasets, information and knowledge on arsenic content in foods. The determination of the concentration of arsenic in rice varieties is an active area of research. With the increasing capability to measure the concentration of arsenic in foods, there are volumes of varied and continuously generated datasets on arsenic in food groups. Visual analytics, which integrates techniques from information visualization and computational data analysis via interactive visual interfaces, presents an approach for visually representing data on arsenic concentrations. The goal of this doctoral research in Environmental Science is to address the need for visual analytical decision support tools on arsenic content in various foods, with special emphasis on rice. The hypothesis of this doctoral thesis research is that software-enabled visual representation and user interaction, facilitated by visual interfaces, will help discover hidden relationships between arsenic content and food categories. The specific objectives investigated were: (1) provide insightful visual analytic views of compiled data on arsenic in food categories; (2) categorize table-ready foods by arsenic content; (3) compare arsenic content in rice product categories; and (4) identify informative sentences on arsenic concentrations in rice.
The overall research method is secondary data analysis using visual analytics techniques implemented through Tableau Software. Several datasets were utilized to conduct visual analytical representations of data on arsenic concentrations in foods. These consisted of (i) arsenic concentrations in 459 crop samples; (ii) arsenic concentrations in 328 table-ready foods from multi-year total diet studies; (iii) estimates of daily inorganic arsenic intake for 49 food groups from multi-country total diet studies; (iv) arsenic content in rice product categories for 193 samples of rice and rice products; and (v) 758 sentences extracted from PubMed abstracts on arsenic in rice. Several key insights were made in this doctoral research. The concentration of inorganic arsenic in instant rice was lower than those of other rice types. The concentration of Dimethylarsinic Acid (DMA) in wild rice, an aquatic grass, was notably lower than in rice varieties (e.g. 0.0099 ppm versus 0.182 ppm for a long-grain white rice). The categorization of the 328 table-ready foods into 12 categories enhances the communication of arsenic concentrations. Outlier concentrations of arsenic in rice were observed in views constructed to integrate data from four total diet studies. The 193 rice samples were grouped into two groups using a cut-off level of 3 mcg of inorganic arsenic per serving; the visual analytics views constructed allow users to specify the desired cut-off level. A total of 86 sentences from 53 PubMed abstracts were identified as informative for arsenic concentrations. The sentences enabled literature curation for arsenic concentrations, along with additional supporting information such as the location of the research; one informative sentence provided a global "normal" range of 0.08 to 0.20 mg/kg for arsenic in rice. A visual analytics resource developed was a dashboard that facilitates interaction with text and a connection to the knowledge base of the PubMed literature database.
The research reported provides a foundation for additional investigations on visual analytics of data on arsenic concentrations in foods. Considering the massive and complex data associated with contaminants in foods, the development of visual analytics tools is needed to facilitate diverse human cognitive tasks. Visual analytics tools can provide the integrated automated analysis, interaction with data, and data visualization critically needed to enhance decision making. Stakeholders that would benefit include consumers; food and health safety personnel; farmers; and food producers. The arsenic content of baby foods warrants attention because early-life exposures could have lifetime adverse health consequences. The action of microorganisms in the soil is associated with the availability of arsenic species for uptake by plants. Genomic data on microbial communities presents a wealth of data for identifying mitigation strategies for arsenic uptake by plants, and the arsenic metabolism pathways encoded in microbial genomes warrant further research. Visual analytics tasks could facilitate the discovery of biological processes for mitigating arsenic uptake from soil. The increasing availability of central resources of data from total diet studies and research investigations creates a need for personnel with diverse levels of skill in data management and analysis. Training workshops and courses on the foundations and applications of visual analytics can contribute to global workforce development in food safety and environmental health. Research investigations could determine the learning gains accomplished through hardware and software for visual analytics. Finally, there is a need to develop and evaluate informatics tools with visual analytics capabilities in the domain of contaminants in foods. / Environmental Sciences / P. Phil. (Environmental Science)
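The user-specified cut-off described above (3 mcg of inorganic arsenic per serving in the study) amounts to a simple partition of the samples. A minimal sketch follows; the sample names and values are hypothetical, not data from the thesis.

```python
# Partition samples at a user-chosen cut-off, as the dashboards allow.
def group_by_cutoff(samples, cutoff=3.0):
    """Split {sample: inorganic As, mcg per serving} at the cut-off."""
    below = {k: v for k, v in samples.items() if v < cutoff}
    at_or_above = {k: v for k, v in samples.items() if v >= cutoff}
    return below, at_or_above

# Hypothetical values, for illustration only.
samples = {"instant rice": 1.2, "white rice": 2.9, "brown rice": 4.5}
below, above = group_by_cutoff(samples, cutoff=3.0)
```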
489

Epistemologia da Informática em Saúde: entre a teoria e a prática / Epistemology of Medical Informatics: between theory and practice

Colepícolo, Eliane [UNIFESP] 26 March 2008 (has links) (PDF)
CONTEXTO. O objetivo dessa pesquisa é compreender a epistemologia da área de Informática em Saúde (IS) por meio de um estudo comparativo entre aspectos teóricos e práticos desta disciplina. MATERIAIS E MÉTODOS. O estudo foi dividido em 3 etapas: estudo estatístico, estudo terminológico e estudo epistemológico. O estudo estatístico envolveu o desenvolvimento e uso de robô para extração de metadados de artigos científicos da base PubMed, assim como a mineração de textos destes resumos de artigos, utilizados para estatísticas e análise posterior. O estudo terminológico visou o desenvolvimento de um tesauro especializado em IS, aqui denominado EpistemIS, que, integrado ao MeSH, serviu como base ao estudo estatístico. O estudo epistemológico começou com o estudo dos metaconceitos da ação e pensamento humanos (MAPHs), que são arte, técnica, ciência, tecnologia e tecnociência. A seguir, realizou-se o desenvolvimento de um método epistemológico, baseado nas obras de Mário Bunge, para classificação epistemológica de conceitos da área provenientes do tesauro EpistemIS. Uma pesquisa de opinião com a comunidade científica da área foi realizada por meio de questionário na web. RESULTADOS. Obteve-se: uma caracterização dos MAPHs, mapas de sistematização do conhecimento em IS, classificações epistemológica e em MAPHs da IS, um mapa do conhecimento em IS e o consenso da comunidade sobre a epistemologia da IS. Por fim, foram calculadas estatísticas relativas: às classificações epistemológica e em MAPHs em IS, à integração entre o corpus de análise (437.289 artigos PubMed) e o tesauro EpistemIS. CONCLUSÃO. A partir de argumentos teóricos e práticos concluiu-se que a Informática em Saúde é uma tecnociência que se ocupa de solucionar problemas relativos aos domínios das Ciências da Vida, Ciências da Saúde e do Cuidado em Saúde, por meio da pesquisa científica interdisciplinar e do desenvolvimento de tecnologia para uso na sociedade. / CONTEXT. The aim of this research is to understand the epistemology of the field of Medical Informatics (MI) through a comparative study of theoretical and practical aspects of the discipline. MATERIALS AND METHODS. The study was divided into three stages: a statistical study, a terminological study, and an epistemological study. The statistical study involved the development and use of a robot to extract metadata from scientific articles in the PubMed database, as well as text mining of the article abstracts, used for statistics and later analysis. The terminological study developed a thesaurus specialized in MI, here called EpistemIS, which, integrated with MeSH, served as the basis for the statistical study. The epistemological study began with the meta-concepts of human action and thought (art, technique, science, technology, and technoscience), followed by the development of an epistemological method, based on the works of Mario Bunge, for the epistemological classification of the field's concepts drawn from the EpistemIS thesaurus. An opinion survey of the field's scientific community was conducted through a web questionnaire. RESULTS. The study produced a characterization of the meta-concepts, maps systematizing knowledge in MI, epistemological and meta-concept classifications of MI, a knowledge map of MI, and the community's consensus on the epistemology of MI. Finally, statistics were computed on the epistemological and meta-concept classifications and on the integration between the analysis corpus (437,289 PubMed articles) and the EpistemIS thesaurus. CONCLUSION. Based on theoretical and practical arguments, it was concluded that Medical Informatics is a technoscience concerned with solving problems in the domains of the life sciences, health sciences, and health care, through interdisciplinary scientific research and the development of technology for use in society.
490

Analyse des médias sociaux de santé pour évaluer la qualité de vie des patientes atteintes d’un cancer du sein / Analysis of social health media to assess the quality of life of breast cancer patients

Tapi Nzali, Mike Donald 28 September 2017 (has links)
En 2015, le nombre de nouveaux cas de cancer du sein en France s'élève à 54 000. Le taux de survie 5 ans après le diagnostic est de 89 %. Si les traitements modernes permettent de sauver des vies, certains sont difficiles à supporter. De nombreux projets de recherche clinique se sont donc focalisés sur la qualité de vie (QdV) qui fait référence à la perception que les patients ont de leurs maladies et de leurs traitements. La QdV est un critère d'évaluation clinique pertinent pour évaluer les avantages et les inconvénients des traitements que ce soit pour le patient ou pour le système de santé. Dans cette thèse, nous nous intéresserons aux histoires racontées par les patients dans les médias sociaux à propos de leur santé, pour mieux comprendre leur perception de la QdV. Ce nouveau mode de communication est très prisé des patients car associé à une grande liberté du discours due notamment à l'anonymat fourni par ces sites. L'originalité de cette thèse est d'utiliser et d'étendre des méthodes de fouille de données issues des médias sociaux pour la langue française. Les contributions de ce travail sont les suivantes : (1) construction d'un vocabulaire patient/médecin ; (2) détection des thèmes discutés par les patients ; (3) analyse des sentiments des messages postés par les patients et (4) mise en relation des différentes contributions citées. Dans un premier temps, nous avons utilisé les textes des patients pour construire un vocabulaire patient/médecin spécifique au domaine du cancer du sein, en recueillant divers types d'expressions non expertes liées à la maladie, puis en les liant à des termes biomédicaux utilisés par les professionnels de la santé. Nous avons combiné plusieurs méthodes de la littérature basées sur des approches linguistiques et statistiques. Pour évaluer les relations obtenues, nous utilisons des validations automatiques et manuelles. Nous avons ensuite transformé la ressource construite dans un format lisible par l'être humain et par l'ordinateur en créant une ontologie SKOS, laquelle a été intégrée dans la plateforme BioPortal. Dans un deuxième temps, nous avons utilisé et étendu des méthodes de la littérature afin de détecter les différents thèmes discutés par les patients dans les médias sociaux et de les relier aux dimensions fonctionnelles et symptomatiques des auto-questionnaires de QdV (EORTC QLQ-C30 et EORTC QLQ-BR23). Afin de détecter les thèmes, nous avons appliqué le modèle d'apprentissage non supervisé LDA avec des prétraitements pertinents. Ensuite, nous avons proposé une méthode permettant de calculer automatiquement la similarité entre les thèmes détectés et les items des auto-questionnaires de QdV. Nous avons ainsi déterminé de nouveaux thèmes complémentaires à ceux déjà présents dans les questionnaires. Ce travail a ainsi mis en évidence que les données provenant des forums de santé sont susceptibles d'être utilisées pour mener une étude complémentaire de la QdV. Dans un troisième temps, nous nous sommes focalisés sur l'extraction de sentiments (polarité et émotions). Pour cela, nous avons évalué différentes méthodes et ressources pour la classification de sentiments en français. Ces expérimentations ont permis de déterminer les caractéristiques utiles dans la classification de sentiments pour différents types de textes, y compris les textes provenant des forums de santé. Finalement, nous avons utilisé les différentes méthodes proposées dans cette thèse pour quantifier les thèmes et les sentiments identifiés dans les médias sociaux de santé. De manière générale, ces travaux ont ouvert des perspectives prometteuses sur diverses tâches d'analyse des médias sociaux pour la langue française et en particulier pour étudier la QdV des patients à partir des forums de santé.
/ In 2015, the number of new cases of breast cancer in France is 54,000.The survival rate after 5 years of cancer diagnosis is 89%.If the modern treatments allow to save lives, some are difficult to bear. Many clinical research projects have therefore focused on quality of life (QoL), which refers to the perception that patients have on their diseases and their treatments.QoL is an evaluation method of alternative clinical criterion for assessing the advantages and disadvantages of treatments for the patient and the health system. In this thesis, we will focus on the patients stories in social media dealing with their health. The aim is to better understand their perception of QoL. This new mode of communication is very popular among patients because it is associated with a great freedom of speech, induced by the anonymity provided by these websites.The originality of this thesis is to use and extend social media mining methods for the French language. The main contributions of this work are: (1) construction of a patient/doctor vocabulary; (2) detection of topics discussed by patients; (3) analysis of the feelings of messages posted by patients and (4) combinaison of the different contributions to quantify patients discourse.Firstly, we used the patient's texts to construct a patient/doctor vocabulary, specific to the field of breast cancer, by collecting various types of non-experts' expressions related to the disease, linking them to the biomedical terms used by health care professionals. We combined several methods of the literature based on linguistic and statistical approaches. To evaluate the relationships, we used automatic and manual validations. 
Then, we transformed the constructed resource into human-readable format and machine-readable format by creating a SKOS ontology, which is integrated into the BioPortal platform.Secondly, we used and extended literature methods to detect the different topics discussed by patients in social media and to relate them to the functional and symptomatic dimensions of the QoL questionnaires (EORTC QLQ-C30 and EORTC QLQ-BR23). In order to detect the topics discussed by patients, we applied the unsupervised learning LDA model with relevant preprocessing. Then, we applied a customized Jaccard coefficient to automatically compute the similarity distance between the topics detected with LDA and the items in the auto-questionnaires. Thus, we detected new emerging topics from social media that could be used to complete actual QoL questionnaires. This work confirms that social media can be an important source of information for the study of the QoL in the field of cancer.Thirdly, we focused on the extraction of sentiments (polarity and emotions). For this, we evaluated different methods and resources for the classification of feelings in French.These experiments aim to determine useful characteristics in the classification of feelings for different types of texts, including texts from health forums.Finally, we used the different methods proposed in this thesis to quantify the topics and feelings identified in the health social media.In general, this work has opened promising perspectives on various tasks of social media analysis for the French language and in particular the study of the QoL of patients from the health forums.

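The matching between detected topics and questionnaire items can be illustrated with a plain Jaccard coefficient over term sets. The thesis uses a customized variant of this coefficient; the topic terms and item keyword lists below are invented for illustration.

```python
# Hedged sketch: scoring the overlap between an LDA topic (its top terms)
# and QoL questionnaire items. The item keyword sets are illustrative
# assumptions, not the actual EORTC questionnaire wording.

def jaccard(a, b):
    """Plain Jaccard coefficient |A & B| / |A | B| between two term sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy LDA topic (top terms) and toy questionnaire items (keyword sets).
topic_terms = ["tired", "fatigue", "sleep", "rest", "weak"]
questionnaire_items = {
    "QLQ-C30: fatigue": ["tired", "weak", "rest", "fatigue"],
    "QLQ-C30: insomnia": ["sleep", "night", "trouble"],
    "QLQ-BR23: hair loss": ["hair", "loss", "upset"],
}

# Rank items by similarity to the topic: a high best score suggests the
# topic is already covered by the questionnaire, while uniformly low
# scores flag a new, emerging topic.
scores = {item: jaccard(topic_terms, words)
          for item, words in questionnaire_items.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # -> QLQ-C30: fatigue 0.8
```

Topics whose best score falls below a chosen threshold against every item are the candidates for the "new complementary topics" mentioned in the abstract.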