471

Webユーザレビューにおける評価情報の時系列変化の可視化 / Visualization of Temporal Changes in Evaluation Information in Web User Reviews

IGUCHI, Hiroto, HIRAO, Eiji, FURUHASHI, Takeshi, YOSHIKAWA, Tomohiro, UCHIDA, Yuki, 井口, 浩人, 平尾, 英司, 古橋, 武, 吉川, 大弘, 打田, 裕樹 30 September 2010
No description available.
472

運用社會網絡技術由文集中探勘觀念:以新青年為例 / Concept Discovery from Essays based on Social Network Mining: Using New Youth as an Example

陳柏聿, Chen, Po Yu Unknown Date
In the past, scholars in the humanities and history studied and analyzed their materials by hand. This was feasible while the volume of material was small, but with the growth of digital archives and the rise of big data, traditional books, classical texts, and documents have been digitized on a large scale, and continuing to analyze them item by item would cost a great deal of time and labor. Digitization, however, also allows researchers in information science to assist with information technology. Within the history of ideas, the study of keyword clusters is a central topic, because a concept can be expressed by keywords or by sentences containing them; studying keywords therefore helps humanities scholars understand the meaning behind historical documents and grasp the context of the time. The aim of this thesis is to examine, for a collection of essays, how terms co-occur within articles. Using five kinds of co-occurrence relations, it brings the idea of social networks into text analysis: each term is a node, the association between two terms is an edge, and from the resulting term network the concepts formed by terms can be discovered. Finally, a concept-discovery system is implemented, offering three analysis functions: multi-keyword concept query, single-keyword concept query, and latent concept mining. The magazine New Youth (《新青年》) serves as the main corpus and case study: the dominant concept in New Youth shifted from liberalism to Marxism-Leninism, and the system is indeed able to trace this shift and to mine the key terms under each of the two concepts. / With the development of digital archives, essays have been digitized. Since analyzing the contents of essays by hand takes much time, computer-assisted analysis is beneficial. This thesis investigates an approach to discovering the concepts of essays based on social network mining techniques. Since a concept can be represented as a set of keywords, the proposed approach measures the co-occurrence relationships between pairs of keywords and represents the relationships among keywords as keyword networks. Social network mining techniques are then employed to discover the concepts of the essays. We also develop a concept discovery system that provides discovery by multiple keywords, discovery by a single keyword, and latent concept mining. The New Youth magazine is taken as an example to demonstrate the capability of the developed system.
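The term-network construction the abstract describes can be sketched minimally in Python. The toy keywords below are illustrative, and only plain same-document co-occurrence is shown; the thesis's five specific co-occurrence relations are not reproduced.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(documents, keywords):
    # Count how often each pair of keywords of interest appears in the
    # same document; the result reads as a weighted keyword network
    # (nodes = keywords, edge weight = number of co-occurrences).
    edges = Counter()
    kws = set(keywords)
    for doc in documents:
        present = sorted(set(doc) & kws)
        for a, b in combinations(present, 2):
            edges[frozenset((a, b))] += 1
    return edges

# Toy essays represented as keyword lists (illustrative, not New Youth data).
docs = [
    ["liberty", "democracy", "science"],
    ["liberty", "marxism"],
    ["marxism", "revolution", "liberty"],
]
net = cooccurrence_network(docs, {"liberty", "marxism", "democracy"})
```

Standard graph tooling can then take over: the frozenset keys are undirected edges, so the counts drop straight into a weighted-graph library for community detection or centrality analysis.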
473

Extracting Clinical Findings from Swedish Health Record Text

Skeppstedt, Maria January 2014
Information contained in the free text of health records is useful for the immediate care of patients as well as for medical knowledge creation. Advances in clinical language processing have made it possible to automatically extract this information, but most research has, until recently, been conducted on clinical text written in English. In this thesis, however, information extraction from Swedish clinical corpora is explored, particularly focusing on the extraction of clinical findings. Unlike most previous studies, Clinical Finding was divided into the two more granular sub-categories Finding (symptom/result of a medical examination) and Disorder (condition with an underlying pathological process). For detecting clinical findings mentioned in Swedish health record text, a machine learning model, trained on a corpus of manually annotated text, achieved results in line with the obtained inter-annotator agreement figures. The machine learning approach clearly outperformed an approach based on vocabulary mapping, showing that Swedish medical vocabularies are not extensive enough for the purpose of high-quality information extraction from clinical text. A rule and cue vocabulary-based approach was, however, successful for negation and uncertainty classification of detected clinical findings. Methods for facilitating expansion of medical vocabulary resources are particularly important for Swedish and other languages with less extensive vocabulary resources. The possibility of using distributional semantics, in the form of Random indexing, for semi-automatic vocabulary expansion of medical vocabularies was, therefore, evaluated. Distributional semantics does not require that terms or abbreviations are explicitly defined in the text, and it is, thereby, a method suitable for clinical corpora. Random indexing was shown useful for extending vocabularies with medical terms, as well as for extracting medical synonyms and abbreviation dictionaries.
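A rule- and cue-vocabulary-based negation classifier of the kind described could be sketched roughly as below; the cue list and context window are illustrative assumptions, not the thesis's actual rules.

```python
# Illustrative cue vocabulary; a real system would use a curated Swedish list.
NEGATION_CUES = {"inte", "ingen", "inga", "ej", "utan"}

def is_negated(tokens, finding_index, window=3):
    # A detected clinical finding counts as negated when a negation cue
    # occurs within `window` tokens before it.
    start = max(0, finding_index - window)
    return any(t.lower() in NEGATION_CUES for t in tokens[start:finding_index])

tokens = "patienten har inte feber".split()  # "the patient does not have fever"
```

The same scaffolding extends to uncertainty classification by swapping in a speculation-cue vocabulary.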
474

運用文字探勘技術探討國際財務報導準則對企業財務報告揭露之影響 / Disclosure quality and IFRS adoption:a text mining approach

廖培君, Liao, Pei Chun Unknown Date
This study examines the effect of International Financial Reporting Standards (IFRS) adoption on the disclosure quality of financial reports of listed companies in the U.K. Annual reports of high-tech companies are selected for the IFRS transition year and for the two years before and after it, and an intellectual-capital dictionary is compiled from IAS 38, Edvinsson and Malone (1997), Lev (2001), and Sveiby (1997). Unlike prior studies, this study applies a classification algorithm from text mining to test whether the quality of intellectual-capital disclosure is related to IFRS adoption, and finds that it is. Regression analysis is then used to identify which intellectual-capital items are disclosed significantly differently before and after adoption. As expected, high-tech companies increase their disclosure of intellectual-capital items after adopting IFRS; items with significant differences include computer software, customer lists, customer loyalty, customer relationships, and patents. The results also indicate that this increase is more pronounced in companies listed earlier and in companies with larger total assets. / This study investigates the impact of IFRS adoption on the quality of disclosures in the financial reports of listed companies in the U.K. I select the annual reports of companies in the high-tech industry sectors for the IFRS transition year and the two years before and after it. A dictionary for intellectual capital is compiled from four sources: IAS 38, Edvinsson and Malone (1997), Lev (2001), and Sveiby (1997). In contrast to prior studies, I use a classification algorithm from text mining to explore whether the quality of intellectual-capital disclosures is related to the adoption of IFRS. Results show that it is. To examine which intellectual-capital item disclosures differ significantly between the pre- and post-adoption periods, regression analysis is applied. In the post-IFRS period, high-tech firms increase their disclosure of intellectual-capital items such as computer software, customer lists, customer loyalty, customer relationships, and patents. The evidence also indicates that this increase is more pronounced in older and larger companies.
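The dictionary side of such a study can be sketched as simple term counting over report text. The terms below come from the items the abstract lists; the naive substring-counting scheme is an assumption, not the thesis's method.

```python
IC_TERMS = ["computer software", "customer list", "customer loyalty",
            "customer relationships", "patent"]

def ic_disclosure_counts(report_text, terms=IC_TERMS):
    # Naive substring counts of dictionary terms in a lower-cased report;
    # a real study would tokenize and handle plurals and inflection.
    lowered = report_text.lower()
    return {term: lowered.count(term) for term in terms}

report = ("The firm capitalised new computer software and filed a patent "
          "application; a second patent was granted in the period.")
counts = ic_disclosure_counts(report)
```

Per-report count vectors like this are the typical input to the classification and regression steps the abstract mentions.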
475

應用文字探勘分析網路團購商品群集之研究 -以美食類商品為例 / The study of analyzing group-buying goods clusters by using text mining – exemplified by the group-buying foods

趙婉婷 Unknown Date
Group buying over the Internet has become a trend: as market acceptance grows, the number of group-buying purchases keeps increasing and the variety of products expands. To help group-buying consumers find products of interest more easily, this study performs cluster analysis on group-buying products. Taking the well-known group-buying site ihergo (愛合購) as an example, it focuses on popular food products in the dessert and cake category and collects customers' group-buying blog articles for each product by product name. Among the top 1,000 popular products, 268 products with 586 customer group-buying blog articles were found. Text mining was used to extract product-feature information from the articles, and a clustering device based on the k-nearest-neighbor method was built for cluster analysis. Clustering was run with different values of k and different clustering thresholds; large clusters were re-clustered in stages and singleton clusters were merged by centroid to obtain a better result. The results show that after four stages of clustering the 268 products form 28 clusters, with mean intra-cluster similarity rising from 0.029834 (unclustered) to 0.177428. After the first stage the products fall into three main clusters: bread, cake, and other textures. After all four stages, the bread cluster splits into bread products and products with bread-like qualities, while the cake cluster divides into clusters by flavor. Unlike the keywords of ordinary articles, important product-feature terms do not recur within an article, so feature filtering should avoid pruning too many of them. Cluster characteristics can be described by selecting representative feature terms from the top 20% of terms by weight, filtered manually and by product frequency. The clustering results can help group-buying consumers choose products and help site operators plan more suitable marketing activities; some directions for future research are also proposed. / Group buying is prevalent, and the variety of merchandise has grown in recent years. To let consumers find the commodities they are interested in, this research focuses on cluster analysis of group-buying products, clustering them by their features. We collect the blog articles about products posted by customers, retrieve product features via text mining, and then build a kNN clustering device to cluster the products. Different threshold values are tested, large groups are re-clustered in stages, and small groups are merged by centroid, in the expectation of obtaining the best-quality clustering. The results show that 268 group-buying food products can be divided into 28 clusters, and the mean intra-cluster similarity is improved. The 28 clusters fall under three main categories: bread, cake, and other-texture foods. Each cluster can be defined and named from the top twenty percent of its keywords. The results could help buyers find similar commodities they like, and also help sellers plan effective marketing activities.
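A simplified sketch of similarity-based clustering of products by their blog-derived feature terms follows. The actual system is kNN-based with staged re-clustering and centroid merging; this seed-threshold variant over bag-of-words vectors is only an illustration.

```python
import math
from collections import Counter

def cosine(tokens_a, tokens_b):
    # Cosine similarity between two bag-of-words term-frequency vectors.
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def threshold_cluster(docs, threshold=0.5):
    # Assign each document to the first cluster whose seed document is
    # similar enough; otherwise start a new cluster.
    clusters = []
    for doc in docs:
        for cluster in clusters:
            if cosine(doc, cluster[0]) >= threshold:
                cluster.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters

# Toy product-feature term lists (illustrative).
products = [
    ["chocolate", "cake"],
    ["chocolate", "cake", "cream"],
    ["bread", "toast"],
]
clusters = threshold_cluster(products)
```

Raising or lowering `threshold` trades cluster purity against cluster count, mirroring the threshold tuning the abstract describes.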
476

運用資料及文字探勘探討不同市場營運概況文字敘述及財務表現之一致性 / Using data and text mining to explore for consistencies between narrative disclosures and financial performance in different markets

江韋達, Chiang, Danny Wei Ta Unknown Date
This study uses the TFIDF text mining technique to analyze important non-quantitative information in sample companies' annual financial reports and compares it with three quantitative financial ratios, in order to examine the consistency between the narrative disclosures and the financial performance reported in different markets. Based on the annual reports of listed semiconductor companies from 2003 to 2010, the results show that U.S. companies' annual reports tend to overstate financial performance in their narrative disclosures, while annual reports issued by companies in the less mature Chinese market tend to understate their financial performance. / This study presents a way to extract useful information from unstructured qualitative textual data with the TFIDF text mining technique, used here to explore the consistency between financial performance, in the form of quantitative financial ratios, and qualitative narrative disclosures in annual reports across countries with different levels of market development. The results show that, based on listed semiconductor companies' annual reports from 2003 to 2010, companies in the United States have a high tendency to exaggerate and overstate their performance in the MD&A, while companies in less developed markets such as China have the lowest tendency to exaggerate and are more likely to understate their performance in the Director's Report.
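The TFIDF weighting underlying the analysis can be sketched as follows, using the common raw-count times log(N/df) variant; the thesis does not specify which exact variant it uses.

```python
import math
from collections import Counter

def tfidf(docs):
    # tf = raw term count in the document; idf = log(N / df), where df is
    # the number of documents containing the term. Terms appearing in
    # every document get weight 0, i.e. they carry no discriminating power.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return [
        {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]

# Toy tokenized narrative disclosures (illustrative).
reports = [["growth", "risk"], ["growth", "profit"]]
weights = tfidf(reports)
```

A term like "growth" that occurs in both reports scores zero, while report-specific terms like "risk" are up-weighted, which is what lets TFIDF surface the distinctive vocabulary of each disclosure.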
477

Knowledge discovery using pattern taxonomy model in text mining

Wu, Sheng-Tang January 2007
In the last decade, many data mining techniques have been proposed for fulfilling various knowledge discovery tasks in order to achieve the goal of retrieving useful information for users. Various types of patterns can be generated using these techniques, such as sequential patterns, frequent itemsets, and closed and maximum patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in the domain of text mining. Most text mining methods adopt the keyword-based approach to construct text representations consisting of single words or single terms, whereas other methods have tried to use phrases instead of keywords, based on the hypothesis that a phrase carries more information than a single term. Nevertheless, these phrase-based methods did not yield significant improvements, because patterns with high frequency (normally the shorter patterns) usually score high on exhaustivity but low on specificity, while the specific patterns suffer from the low-frequency problem. This thesis presents research on developing an effective Pattern Taxonomy Model (PTM) to overcome this problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method that adopts sequential pattern mining and uses closed patterns as features in the representation. A PTM-based information filtering system is implemented and evaluated in a series of experiments on the latest version of the Reuters dataset, RCV1. Pattern evolution schemes are also proposed in this thesis, in an attempt to utilise information from negative training examples to update the discovered knowledge. The results show that PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio approach and the state-of-the-art BM25 and Support Vector Machines (SVM) approaches.
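The notion of a closed pattern, a frequent pattern no strict superset of which has the same support, can be illustrated with a brute-force sketch. Note that PTM mines closed sequential patterns; this unordered-itemset version is a simplification for illustration only, and is feasible only on tiny examples.

```python
from itertools import combinations

def closed_frequent_itemsets(transactions, min_support):
    # Brute force: enumerate every itemset, keep the frequent ones, then
    # keep only those with no strict superset of equal support ("closed").
    items = sorted({i for t in transactions for i in t})
    support = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            s = sum(1 for t in transactions if set(combo) <= t)
            if s >= min_support:
                support[frozenset(combo)] = s
    return {
        iset: s
        for iset, s in support.items()
        if not any(iset < other and s == support[other] for other in support)
    }

# Toy "documents" as sets of terms (illustrative).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
closed = closed_frequent_itemsets(transactions, min_support=2)
```

Here {b} is frequent but not closed, because its superset {a, b} has the same support; using only closed patterns as features, as PTM does, removes exactly this kind of redundancy.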
478

Προστασία διανοητικής ιδιοκτησίας και επιπτώσεις στην καινοτομικότητα, την τεχνολογική πρόοδο και την έρευνα. Χρήση τεχνικών εξόρυξης γνώσης από κείμενο σε διπλώματα ευρεσιτεχνίας / Intellectual property protection and its effects on innovation, technological progress and research: text mining techniques applied to patents

Γεωργής, Γεώργιος 15 October 2012
The existing system for protecting intellectual property, and in particular industrial property through patents, was created to promote research and science. The state grants a monopoly right to use and exploit an invention in exchange for the inventor's public disclosure of it. Other researchers can then freely use this (published) knowledge in their research, while businesses can use it for commercial purposes by paying the inventor a fee. The rationale behind granting patents is therefore to increase investment in research and innovation while simultaneously diffusing the results, through the grant of a temporary monopoly. However, as the use of patents keeps growing and extends to living organisms, plants, and computer programs, questions arise about the need to revise the existing system of intellectual property protection and the procedure for examining and granting patents. The use of text mining techniques on patents, and the possibilities these techniques can offer, are also examined. / The existing system of intellectual property rights is analysed, and more specifically the patent system. Patents grant the applicant a monopoly right for a specific amount of time in exchange for full disclosure. The existing patent system is examined along with controversial issues and grants, and a text mining method for information extraction is tested.
479

Προδιαγραφές μιας καινοτόμας πλατφόρμας ηλεκτρονικής μάθησης που ενσωματώνει τεχνικές επεξεργασίας φυσικής γλώσσας / Specifications of an innovative e-learning platform incorporating natural language processing techniques

Φερφυρή, Ναυσικά 04 September 2013
We live in a society in which technology has entered everyday life dynamically, and education could not remain unaffected by the new technologies. Terms such as "e-learning" and "asynchronous distance learning" have already changed the landscape of traditional education. Asynchronous distance learning means a process of exchange of learning between instructor and learners that takes place independently of time and place. E-learning is the use of new multimedia technologies and the Internet to improve the quality of learning, facilitating access to information sources and services as well as remote exchange and collaboration. The term covers a wide range of applications and processes, such as electronic classrooms and digital collaboration, and learning based on computers and web technologies. Some of the basic requirements an e-learning platform must meet are: support for discussion forums and chat rooms; electronic mail; a friendly environment for both the user/student and the user/teacher; customization of the environment per user; keeping user profiles to assist navigation; easy creation of online tests; and presentation of multimedia material. Natural language processing (NLP) is the computational analysis of unstructured textual data with the aim of machine understanding of the text: the processing of sentences entered into or read by the system, which responds with sentences in a way reminiscent of an educated person's answers.
Grammar, syntax, and the analysis of conceptual elements and of knowledge in general play a central role in making human language comprehensible to the machine. The basic techniques for processing natural text rest on general knowledge about natural language and use simple heuristic rules based on syntactic and semantic analysis of the text. Techniques that apply across all fields include tokenization, exploitation of document structure (structural data mining), elimination of insignificant words, part-of-speech tagging, morphological analysis, and syntactic analysis. The aim of this thesis is to describe and evaluate how NLP techniques could be incorporated into e-learning platforms. The large volume of data provided through an e-learning platform must be properly managed, distributed, and retrieved. Using NLP techniques, an innovative e-learning platform is presented that exploits high-level personalization and the ability to draw conclusions by processing users' natural language, adapting the offered educational material to each user's needs. / We live in a society in which the use of technology has entered our lives dynamically, and education could not remain uninfluenced by new technologies. Terms such as "e-learning" and "asynchronous e-learning" have created new standards in classical education. By "asynchronous e-learning" we mean a process of exchange of learning between teacher and students that is performed regardless of time and place.
E-learning is the use of new multimedia technologies and the Internet to improve the quality of learning by facilitating access to information resources and services as well as remote exchanges. The term covers a wide range of applications and processes, such as electronic classrooms, digital collaboration, and learning based on computers and web technologies. Some of the basic requirements that must be met to establish a platform for e-learning are: to support the creation of forums and chat rooms; to deliver e-mail; to have a friendly environment for both the user/student and the user/teacher; to support personalization depending on the user; to hold information (user profiles) in order to assist navigation; to support easy creation of online tests; and to support the presentation of multimedia materials. Natural language processing (NLP) is defined as the computational analysis of unstructured textual data in order to achieve machine understanding of the text: the processing of sentences entered into or read by the system, which responds with sentences in a manner reminiscent of the answers of an educated person. A key role is played by grammar, syntax, and the semantic analysis of data and of general knowledge in making human language understandable to the machine. The main natural-text processing techniques are based on general knowledge about natural language and use simple heuristic rules grounded in the syntactic and semantic analysis of the text. Techniques that pertain to all fields of application include tokenization, structural data mining, elimination of insignificant words, PoS tagging, morphological analysis, and syntactic analysis. The aim of this study is to describe and evaluate how NLP techniques could be incorporated into e-learning platforms. The large volume of data delivered through an online learning platform must be properly managed, distributed, and retrieved.
Using NLP techniques, an innovative e-learning platform is presented that exploits high-level personalization techniques and the ability to draw conclusions by processing the user's natural language, customizing the offered educational material to the needs of each user.
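Two of the basic techniques the abstract lists, tokenization and elimination of insignificant words, can be sketched as follows; the stop-word list here is a tiny illustrative assumption, not a real resource.

```python
import re

# Tiny illustrative stop-word list; a real platform would use a curated one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}

def preprocess(text):
    # Tokenize on alphabetic runs, lower-case, and drop insignificant
    # (stop) words - two of the first steps in most NLP pipelines.
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

preprocess("The quality of learning in an e-learning platform")
```

Later pipeline stages such as PoS tagging and morphological analysis would then operate on this filtered token stream.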
480

Automatic Identification of Duplicates in Literature in Multiple Languages

Klasson Svensson, Emil January 2018
As the number of books available online grows, these collections are growing ever larger and, increasingly, span multiple languages. Many of these corpora contain duplicates in the form of different editions or translations of books. Finding these duplicates is usually done manually, but the growing collection sizes make this time-consuming and demanding. The thesis set out to find a method in the fields of Text Mining and Natural Language Processing that can automate the manual identification of these duplicates in a corpus, mainly consisting of fiction in multiple languages, provided by Storytel. The problem was approached using three different methods for computing distance measures between books. The first approach compared the titles of the books using the Levenshtein distance. The second extracted entities from each book using Named Entity Recognition, represented them using tf-idf, and computed distances with cosine dissimilarity. The third used a Polylingual Topic Model to estimate each book's distribution over topics and compared the distributions using Jensen-Shannon distance. To estimate the parameters of the Polylingual Topic Model, 8,000 books were translated from Swedish to English using Apache Joshua, a statistical machine translation system. For each method, every pair of books by the same author was tested using a hypothesis test, where the null hypothesis was that the two books are not an edition or translation of one another. Since there is no known distribution to assume as the null distribution for each book, a null distribution was estimated using distance measures between books not written by that author. The methods were evaluated on two sets of data manually labeled by the author of the thesis.
One set was randomly sampled using one-stage cluster sampling, and one consisted of books by authors that the corpus provider, prior to the thesis, considered more difficult to label using automated techniques. Of the three methods, Title Matching performed best in terms of accuracy and precision on the sampled data. The entity-matching approach had the lowest accuracy and precision, but an almost constant recall of around 50%. It was concluded that there seems to be a set of duplicates that is clearly distinguished from the estimated null distributions; with a higher significance level, better precision and accuracy could have been achieved, with similar recall, for that method. Topic matching performed worse than title matching, and on inspection the estimated model was unable to produce quality topics, owing to multiple factors; it was concluded that further research on the topic-matching approach is needed. None of the three methods was deemed a complete solution for automating the detection of duplicate books.
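The best-performing method, title matching, rests on the Levenshtein distance. A minimal sketch follows; the relative-distance cutoff is an illustrative assumption, not the thesis's decision rule, which is a hypothesis test against an estimated null distribution.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def likely_duplicate(title_a, title_b, max_relative_dist=0.25):
    # Flag two titles as a candidate duplicate pair when their edit
    # distance is small relative to the longer title. The 0.25 cutoff is
    # an illustrative choice only.
    dist = levenshtein(title_a.lower(), title_b.lower())
    return dist <= max_relative_dist * max(len(title_a), len(title_b))
```

A full pipeline would apply this pairwise within each author's books, exactly the pairing scheme the abstract describes.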
