Global ETD Search

11	針對臉書粉絲專頁貼文之政治傾向預測 / Predicting Political Affiliation for Posts on Facebook Fan Pages 張哲嘉, Chang, Che Chia Unknown Date (has links) Social media, and Facebook in particular, have grown rapidly in recent years. Taiwan has more than 15 million Facebook users, ranging from public figures to the general public, and receiving information through Facebook has become part of daily life for many people. These platforms carry a great deal of meaningful information: each post reflects its author's emotions and stance. Using social media to predict election results and users' political affiliation has become a trend, and in Taiwan political parties and politicians have set up fan pages to campaign and to gauge public opinion online. This thesis predicts the political affiliation of fan page posts. We collect posts from the fan pages of Taiwan's two largest parties, the KMT and the DPP, and build two kinds of prediction model: one based on distinct-word features and one based on textual and interaction features. Using data mining techniques, we build classifiers from the pan-blue and pan-green party features that posts exhibit, design and examine several feature combinations, compare their predictive performance and the factors that influence it, and test whether class imbalance in the data affects the classification results. The best results come from combining party-typical words among the textual features with log-transformed interaction features and a KNN classifier, which reaches an accuracy of 0.908 and an F1-score of 0.827. Keywords: political affiliation; classification; Facebook; text mining
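As an illustration of the best-performing setup reported in entry 11 (party-typical word features plus log-transformed interaction features with a KNN classifier), the following is a minimal Python sketch. The feature layout, the function names `features` and `knn_predict`, and every number in the sample data are hypothetical and are not taken from the thesis; Euclidean distance and majority voting are assumed, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def features(blue_hits, green_hits, likes, shares):
    """Build a feature vector for one post (hypothetical layout).

    blue_hits / green_hits: counts of party-typical words found in the post.
    likes / shares: raw interaction counts, compressed with log1p so that
    heavy-tailed values do not dominate the distance.
    """
    return (blue_hits, green_hits, math.log1p(likes), math.log1p(shares))


def knn_predict(train, query, k=3):
    """Return the majority label among the k training posts nearest to query.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training posts, labelled with the two parties named in the abstract.
train = [
    (features(5, 0, 1200, 300), "KMT"),
    (features(4, 1, 90, 10), "KMT"),
    (features(0, 6, 15000, 2000), "DPP"),
    (features(1, 5, 300, 40), "DPP"),
    (features(0, 4, 50, 5), "DPP"),
]
prediction = knn_predict(train, features(4, 0, 800, 100), k=3)
```

Taking the logarithm matters here because a post with 15,000 likes would otherwise sit far from every other post regardless of its wording.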
12	運用文字探勘技術輔助建構法律條文之語意網路－以公司法為例 / Using Text Mining Techniques to Assist in Constructing a Semantic Network of Legal Provisions: The Case of the Company Act 張露友 Unknown Date (has links) This thesis applies text mining techniques to calculate the similarity between legal provisions automatically, helping experts derive rules from the many articles of the Company Act and establish relations among them, so that the code is not a collection of isolated article numbers and texts but a network of provisions linked semantically. Drawing on established legal concepts and knowledge, it explains and evaluates the automatically computed relations and, in doing so, examines the difficulties and limitations of applying text mining to legal texts, as a reference for later research. If the results are positive, they offer a method for using text mining to support legal knowledge extraction. If they show that text mining is not fully suited to knowledge extraction from legal provisions, the conclusions and suggestions of this thesis can still serve as an important reference for professionals in similar research fields who want to improve the techniques. Keywords: text mining; semantic web; knowledge extraction
13	應用文字探勘技術於臺灣上市公司重大訊息對股價影響之研究 / The study on impact of material information of public listed company to its stock price by using text mining approach 吳漢瑞, Wu, Han Ruei Unknown Date (has links) Taiwan's stock market is a shallow-dish market dominated by individual retail investors, so outside information readily moves share prices and investment decisions. The effect of material information disclosures on a company's share price therefore deserves closer study. This study uses text mining to classify companies' material information and analyze how its disclosure affects the market, so that price movements can be anticipated from new disclosures and used as an investment reference. The data come from the Market Observation Post System, chosen for its impartiality. We selected UNI-PRESIDENT ENTERPRISES CORP, Chunghwa Telecom Co., Ltd, EVA AIRWAYS CORPORATION and Taiwan Business Bank for their highly rated information disclosure, collected 1382 material information announcements issued by these four listed companies from 2005 to 2009, and used the kNN algorithm to group them. We conducted three experiments. Trading volume from two days before to two days after a disclosure was significantly abnormal, which indicates that disclosures do affect the company's stock. Announcements with different content fall into different groups, each with its own price trend; on the test data, classification accuracy averaged about 65% overall and reached 80% for the "up" class. For cumulative returns over the two days after disclosure, the "up" group has an upside probability of over 60% and the "down" group a downside probability of over 60%. The study builds a time-saving automatic system that groups material information and identifies the valuable items, sparing investors the time needed to search for and interpret announcements. Based on these results, we provide a reference for investors' strategy and suggest directions for future research. Keywords: material information; text mining; kNN algorithm
14	運用文字探勘技術建立MD&A之分類閱讀器 / Using text-mining technology in developing a classified reader for MD&A 吳詩婷, Wu, Shih Ting Unknown Date (has links) Annual reports are rich in information, both financial and textual. Methods for analyzing financial information are mature, but textual information is constrained by its format and file type, which makes it less efficient for investors to use or analyze. Management's Discussion & Analysis of Financial Condition and Results of Operations (MD&A) is the vehicle through which management conveys its view of operating decisions to investors, who can obtain more information by reading it, and prior research confirms the importance of its textual information. Because textual information lacks a common classification framework, investors spend more time and cost analyzing it. This research randomly selected the 2012 annual reports of 40 publicly traded US technology companies as sample data. Using text mining with TFIDF, it classifies the textual content of MD&A into the classification framework that EBRC published for MD&A and builds a classified reader. Investors can use the sentences that the system classifies and compiles to obtain the textual information they need quickly, which helps them read this unstructured text efficiently, reduces data-gathering time, and improves the usability of textual information. Keywords: textual information; MD&A; text mining
15	內控缺失與財務報導一致性之關聯性 / The Relationship between Internal Control Weakness and the Financial Reporting Consistency 許正昇 Unknown Date (has links) This study examines the relationship between internal control weakness and financial reporting consistency. It uses TFIDF text mining to analyze the Management's Discussion & Analysis of Financial Condition and Results of Operations (MD&A) and the financial information in sample companies' annual financial reports, in order to assess how the effectiveness of internal control affects the consistency between MD&A information and financial information. The sample is drawn from annual reports of US-listed companies from 2002 to 2014. As anticipated, internal control weakness is negatively correlated with financial reporting consistency: material weaknesses significantly affect consistency, and companies with effective internal control disclose MD&A textual information that is more consistent with their financial information. Keywords: TFIDF; text mining; MD&A; internal control weakness
16	基於文件相似度的標籤推薦-應用於問答型網站 / Applying Tag Recommendation base on Document Similarity in Question and Answer Website 葉早彬, Tsao, Pin Yeh Unknown Date (has links) As habits change, people increasingly get new knowledge from the Internet instead of traditional media, and this has given rise to many new behaviors. Social tagging, popular in recent years, classifies and annotates information through user-assigned tags. Unlike a traditional taxonomy, which requires items to be placed in predefined categories, social tagging has no such requirement and so adjusts easily as content changes. Q&A websites such as Quora, Stack Overflow and Yahoo Kimo Knowledge+ have emerged in recent years as open knowledge-sharing platforms where users interact through questions and answers, drawing on people's collective experience and expertise to find a satisfactory answer. A plain Q&A system spares users from hunting for answers across separate category-based forums or through keyword search results. This study aims to tag Q&A website documents automatically, using tag recommendation to help users find the questions they need more efficiently and to let Q&A platforms group the large number of user-generated questions. We collected 20,638 questions from Stack Exchange and used the naïve Bayes algorithm together with document similarity calculation to recommend suitable tags for new documents. The recommended tags reached an accuracy of 64.2%. Keywords: text mining; tag recommendation; collective intelligence
17	應用文字探勘於影評文章自動摘要之研究 / A Study on Application of Text Mining for Automatic Text Summarization of Film Review 鄧亦安, Teng, I An Unknown Date (has links) With the rise of the Internet, people facing a choice rely not only on word of mouth but also on keyword searches, yet in a flood of data, integrating information quickly is a major challenge. People choosing a movie face the same problem: many film reviews are spread across websites. Automatic text summarization can extract the important information and the main ideas of those reviews, helping readers learn about a film and confirm that it interests them before going to the cinema. This research generates multi-concept, extractive film review summaries from reviews on PIXNET, described as the most popular blog platform, by combining clustering with sentence extraction. The data are reviews of five films: Avengers: Age of Ultron (66 reviews, 4616 sentences), Batman v Superman: Dawn of Justice (60 reviews, 9345 sentences), Zootopia (60 reviews, 5545 sentences), Interstellar (50 reviews, 4616 sentences) and The Intern (62 reviews, 5622 sentences). The K-Means algorithm clusters the feature words and sentences of each film's reviews; TFIDF scores the importance of the sentences in each cluster to select high-weight sentences; the WWA method then picks sentences covering different aspects within a cluster; and finally the similarity between each cluster and a best template determines the order of the clusters, yielding a multi-document summary whose paragraphs and paragraph order resemble the template. The original reviews of the five films have a similarity of 15.87% to the best template, while the summaries produced by this method have a similarity of 21.19% to the template's single-document summary. Because the clusters are ordered by similarity to the best template, the summary follows a similar paragraph order and includes the content most widely mentioned across reviews, and the different paragraphs give the summary broader coverage. The method helps readers grasp film information quickly through automatically compiled and extracted summaries, saves their time, and supports their decisions. Keywords: text mining; film review summary; automatic text summarization
18	運用文字探勘技術協助建構公司治理本體知識 / Using Text Mining Techniques to Assist in Building Corporate Governance Ontology Knowledge 陳言熙 Unknown Date (has links) An ontology aims to express concepts that everyone can share and reuse, and it is an important foundation of knowledge representation that helps computers search, exchange information, and understand text. With ontologies, resources on the Internet can be defined explicitly, and machines can understand natural language through descriptions in an ontology language, which improves retrieval efficiency and enables knowledge sharing. The main difficulty in building ontologies is that domain knowledge from too many specialized fields has to be defined, which costs a great deal of time and effort. To build more efficiently, ontology engineering needs a systematic methodology, and its quality must be verified. To let computers understand human language, many researchers have used text mining to develop machine-readable electronic lexicons, analyzing them to link related vocabulary into semantic networks that are then applied in various research fields. This research therefore tries to use text mining to support ontology building. Its conclusions are that text mining can semi-automatically support the building of a corporate governance lexicon and semantic network, that the corporate governance semantic network can serve as the basis for building ontology knowledge, and that the proposed construction method turns the semantic network into a corporate governance ontology. Keywords: ontology; text mining; semantic network; corporate governance
19	文件距離為基礎kNN分群技術與新聞事件偵測追蹤之研究 / A study of relative text-distance-based kNN clustering technique and news events detection and tracking 陳柏均, Chen, Po Chun Unknown Date (has links) News events can be described as "a collection of similar news articles on the same topic within a time interval". Most news articles are only fragments of a complete event, and their content varies with the outlet's stance or the reporter's angle; the sheer volume of news also makes it hard to see an event as a whole. This research therefore uses text mining to cluster related news into events and so increase the value that news provides. Classification and clustering are common steps in text mining and the main methods used here. k-nearest neighbor (kNN) search is one of the most common classification algorithms, but it must compare articles pairwise and sort the results to select the nearest neighbors, which creates a performance bottleneck in practice. This research proposes Relative Text-Distance-based kNN (RTD-based kNN), which establishes a base, a distance reference point, in the vector space. Every document is positioned by its relative distance to the base, so that before the top k nearest neighbors are selected, the more likely candidates can be filtered directly by this relative relation. This reduces the number of comparisons and improves efficiency. We sampled 62 events (742 news articles in total) from Google News and used their grouping as the basis for testing and evaluation, comparing RTD-based kNN with kNN on news event clustering. The experiments show that a base built from commonly used vocabulary works better and that re-merging after clustering improves the results. With no significant difference in F-measure between RTD-based kNN and kNN (α=0.05), RTD-based kNN takes 28.13% less computation time than kNN, which indicates that it is a better method for clustering news events. Finally, the thesis suggests directions for future research. Keywords: text mining; kNN; event detection and tracking; classification and clustering
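The reference-point idea in entry 19 can be sketched as follows. This is only one possible reading, not the thesis's RTD-based kNN itself: it assumes Euclidean distance and uses the triangle inequality, under which |d(q, base) - d(x, base)| is a lower bound on d(q, x), to skip documents that cannot be among the k nearest. The function names and the sample vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(docs, base):
    """Precompute each document's distance to the reference point (base)."""
    return [(euclidean(vec, base), i) for i, vec in enumerate(docs)]


def knn_with_reference(query, docs, index, base, k):
    """Exact k nearest neighbours of query, pruning by distance to base.

    Returns (neighbours, comparisons), where neighbours is a sorted list of
    (distance, doc_id) and comparisons counts full distance computations.
    """
    d_q = euclidean(query, base)
    # Visit documents in order of increasing lower bound on d(query, doc).
    order = sorted(index, key=lambda item: abs(item[0] - d_q))
    best = []
    comparisons = 0
    for d_x, i in order:
        lower_bound = abs(d_x - d_q)
        # Once the bound exceeds the current k-th best distance, no
        # remaining document can enter the result, so stop early.
        if len(best) == k and lower_bound > best[-1][0]:
            break
        comparisons += 1
        best.append((euclidean(query, docs[i]), i))
        best.sort()
        del best[k:]
    return best, comparisons


# Made-up two-dimensional "document vectors" for illustration only.
docs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (10.0, 10.0)]
base = (0.0, 0.0)
index = build_index(docs, base)
neighbours, comparisons = knn_with_reference((5.2, 5.1), docs, index, base, k=2)
```

In this toy run the search stops well before scanning every document, which is the kind of saving in comparisons that the abstract attributes to the relative-distance concept. How much is saved in practice depends on where the base is placed, which fits the abstract's remark that the choice of base vocabulary matters.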
20	雲端運算環境下基於知識本體之資訊檢索系統建置-以半導體產業為例 / Constructing ontology-based information retrieval system in cloud computing environment – the case of semiconductor industry 李佳穎, Li, Chia Ying Unknown Date (has links) This study provides an intelligent search function for the semiconductor industry that lets users search quickly and precisely across a large number of documents. To this end, it defines a knowledge space and its constituent elements to describe real-world knowledge, and develops a set of programs that generate the knowledge space and a search mechanism based on it, reducing search cost and improving user productivity. The techniques used are: (1) constructing a "Semiconductor Industry Ontology", (2) computing the frequency with which two terms appear together, (3) computing the relatedness between terms and documents, and (4) developing a search environment based on the knowledge space. Keywords: cloud computing; text mining; ontology; information retrieval
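Steps (2) and (3) of entry 20 can be illustrated with a small Python sketch. The abstract does not say how co-occurrence or term-document relatedness is computed, so both choices here are assumptions: co-occurrence is counted at document level, and relatedness is approximated with a plain TF-IDF weight. The function names and the sample tokens are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count, for each pair of terms, the documents in which both appear.

    docs: list of token lists. Pairs are stored as alphabetically ordered
    tuples so that (a, b) and (b, a) share one counter.
    """
    counts = Counter()
    for tokens in docs:
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


def term_document_relatedness(term, tokens, docs):
    """TF-IDF weight of term in one document (an assumed relatedness measure)."""
    if not tokens:
        return 0.0
    tf = tokens.count(term) / len(tokens)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0


# Made-up tokenised documents for illustration only.
docs = [
    ["wafer", "etching", "yield"],
    ["wafer", "yield"],
    ["lithography", "wafer"],
]
pairs = cooccurrence(docs)
weight = term_document_relatedness("yield", docs[1], docs)
```

A term that appears in every document, such as "wafer" in this sample, gets a weight of zero under TF-IDF, so a real system might prefer a different relatedness measure for terms that define the whole domain.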
