Global ETD Search

1	分群技術之研究 / A Study of Clustering Techniques
Author: 姚蘊雯 | Date: unknown
No description available.
Keywords: clustering techniques; discriminant analysis
2	階層式分群法在民事裁判要旨分群上之應用 / An Application of Hierarchical Clustering of Documents for Civil Judgments
Author: 何君豪 (Ho, Jim How) | Date: unknown
Abstract: The Judicial Yuan regularly engages senior judges to extract legal opinions of reference value from civil judgments and compile them into purports (headnote summaries) of civil judgments. Judges consult these purports when trying similar cases, so searching them is an indispensable task in judicial practice. However, as information technology develops and judgments accumulate, a single search may return hundreds of purports, and judges must spend a great deal of time reading them. If data mining techniques could cluster the search results with acceptable accuracy, judges would save considerable reading time. In this study we apply agglomerative hierarchical clustering to the purports of civil judgments, and we give extra weight to main keywords drawn from vocabulary frequently used in statutory provisions, to test whether this weighting improves the clustering and to assess the feasibility and effectiveness of hierarchical clustering for this task.
Keywords: AI and law; hierarchical clustering; agglomerative approach; clustering
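The abstract names agglomerative hierarchical clustering with extra weight on statute-derived keywords but gives no formulas. Below is a minimal sketch under stated assumptions, not the thesis's implementation: documents are assumed to be pre-segmented into space-separated terms, `statute_terms` is a hypothetical keyword set, the boost factor of 2.0 and the similarity threshold are arbitrary, and average linkage over cosine similarity is our own choice of linkage.

```python
import math
from collections import Counter

def vectorize(doc, boosted, boost=2.0):
    """Term-frequency vector; counts of terms in `boosted` are multiplied by `boost`."""
    counts = Counter(doc.split())
    return {t: c * (boost if t in boosted else 1.0) for t, c in counts.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def agglomerate(vectors, threshold):
    """Average-linkage agglomerative clustering: merge the most similar pair of
    clusters until the best pair's similarity falls below `threshold`."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Mean cosine similarity over all cross-cluster document pairs.
        return sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Hypothetical, pre-segmented purports and a hypothetical set of statute terms.
docs = [
    "lease contract rent landlord",
    "lease rent tenant contract",
    "tort damages negligence",
    "negligence damages injury tort",
]
statute_terms = {"lease", "tort"}
vectors = [vectorize(d, statute_terms) for d in docs]
print(agglomerate(vectors, threshold=0.3))  # [[0, 1], [2, 3]]
```

The two lease-related texts and the two tort-related texts end up in separate clusters. Raising `boost` pulls documents that share statute terms closer together, which is the kind of effect the thesis evaluates.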
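3	應用資料探勘技術於食譜分享社群網站進行內容分群之研究 / A user-based content clustering system using data mining techniques on a recipe sharing website
Author: 林宜儒 | Date: unknown
Abstract: This study takes a recipe-sharing community website as its subject. It builds an automatic clustering mechanism for the site's recipes using a kNN clustering algorithm, and uses the behavior of the site's users as a reference for describing the resulting clusters. The system is built in three stages. Stage 1, data processing: although the recipe data obtained from the site already have a relatively structured format that could be clustered directly, user-entered content still contains typos, redundant words, and material only loosely related to the recipe itself, so it must be cleaned. Stage 2, clustering: text mining extracts content features, and data mining techniques then cluster the recipes; clustering quality is judged mainly by intra-cluster characteristics and inter-cluster similarity. Stage 3, cluster characterization: based on how users bookmark and categorize recipes on the site, statistical methods identify a likely category name for each cluster. In an experiment on 500 recipes, the best run produced 10 recipe clusters with an average intra-cluster similarity of 0.4482. Each cluster showed clear shared characteristics, which could be labeled from users' bookmarking behavior with categories such as soups, desserts, bread, and Chinese cuisine. Because the site marks up the fields of every recipe according to the schema.org recipe format, the clustering mechanism implemented here could also be applied to other similar sites that adopt the schema.org standard.
Keywords: text mining; data clustering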
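The abstract reports an average intra-cluster similarity (0.4482 for the best run) and labels clusters from users' bookmark categories, without defining either computation. The sketch below shows one plausible reading; the cosine-based definition, the majority-vote labeling, and all names and data are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def avg_intra_similarity(clusters, vectors):
    """Average, over clusters with 2+ members, of the mean pairwise cosine similarity."""
    scores = []
    for members in clusters:
        pairs = list(combinations(members, 2))
        if pairs:
            scores.append(sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs))
    return sum(scores) / len(scores) if scores else 0.0

def label_cluster(members, user_categories):
    """Name a cluster after the category users most often filed its recipes under."""
    votes = Counter(cat for i in members for cat in user_categories.get(i, []))
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical recipe feature vectors, clusters, and user bookmark categories.
vectors = [
    {"tofu": 1, "soup": 1},
    {"soup": 1, "pork": 1},
    {"flour": 1, "sugar": 1},
    {"flour": 1, "butter": 1},
]
clusters = [[0, 1], [2, 3]]
user_categories = {0: ["soups"], 1: ["soups", "Chinese"], 2: ["desserts"], 3: ["bread", "desserts"]}
print(round(avg_intra_similarity(clusters, vectors), 4))  # 0.5
print([label_cluster(c, user_categories) for c in clusters])  # ['soups', 'desserts']
```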
4	行動應用軟體在迭代分群行為之研究 / Iterative Clustering on Behaviors of App Executables
Author: 邱莉晴 (Chiu, Li Ching) | Date: unknown
Abstract: Smart devices are everywhere nowadays. Mobile application (app) development has become a mainstream of the software industry, with millions of apps developed and published to billions of users. A systematic way to analyze apps is therefore essential, preferably based on their executables, which in most cases are the only publicly available form of an app. In this work, we apply unsupervised clustering to mobile apps based on their system call distributions, to examine whether the code inside an app can serve as a basis for clustering apps by behavior. We first apply static binary analysis, reverse-engineering app executables to obtain the method call and call-sequence counts embedded in them. Apps are then clustered iteratively on this information to reveal implicit relationships based on function call similarity. GHSOM (Growing Hierarchical Self-Organizing Map), an unsupervised learning tool, clusters the apps using the information extracted directly from their executables, with types of methods and sequences as features. Running the clustering, however, immediately raises a problem: the large number of attributes and records makes GHSOM analysis time long or infeasible. We therefore propose an iterative approach, combined with dimension reduction by principal component analysis, that cuts attributes with limited information loss. In preliminary results on hundreds of apps downloaded directly from the Apple App Store, the proposed clustering works well and reveals some interesting information: apps developed by the same company are clustered in the same group, and apps with similar behaviors, e.g., the same game, painting, or socializing functions, are clustered together.
Keywords: mobile applications; GHSOM; clustering; app clustering; iterative clustering
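GHSOM itself is too involved for a short example, but the dimension-reduction step (principal component analysis to cut attributes with limited information loss) can be sketched. This assumes each app is a row of method-call counts; the data, the power-iteration approach, the tolerance, and the function name are illustrative choices, not the thesis's code.

```python
import math
import random

def pca(rows, k, iters=200, seed=0):
    """Project rows onto their top-k principal components using power iteration
    with deflation on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # centered data
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)] for a in range(d)]
    scale = sum(cov[i][i] for i in range(d)) or 1.0  # total variance, for a relative tolerance
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm <= 1e-12 * scale:  # (numerically) no variance left to explain
                v = [0.0] * d
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflate so the next power iteration finds the next-largest component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(x[j] * c[j] for j in range(d)) for c in components] for x in X]

# Hypothetical counts of three method calls in four apps.
rows = [[1, 2, 0], [2, 4, 0], [3, 6, 0], [4, 8, 0]]
projected = pca(rows, k=1)
print([round(p[0], 3) for p in projected])  # [-3.354, -1.118, 1.118, 3.354]
```

Because these rows lie on a single line, one component preserves all pairwise distances; on real call-count data, the number of components kept trades analysis time against information loss.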
5	以企業財務資訊為基礎建構股票投資決策支援系統 / Decision Support System of Stocks Investment under Financial Information
Author: 黃加輝 (Huang, Chia Hui) | Date: unknown
Abstract: This study combines statistical analysis and data mining with corporate life cycle theory and corporate financial information to build criteria for stock investment decisions. First, listed companies are clustered on financial indicators, and the clusters are named growth, maturity, and decline stages according to life cycle theory. Next, decision trees and discriminant analysis are applied to each cluster separately to find the characteristics of outperforming stocks. Finally, stocks whose future returns are likely to be strong are selected as portfolio members according to these characteristics. The goal is a higher return than the Taiwan weighted index, and the resulting portfolio earns a higher return than the TSEC weighted non-financial index.
Keywords: decision support; data mining; clustering; decision tree; discriminant analysis
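The abstract does not give the decision-tree settings. As a deliberately small stand-in, the sketch below fits a depth-1 decision tree (a stump, with the split chosen by weighted Gini impurity) separately for each life-cycle stage, then picks candidates predicted to outperform. The stage labels are assumed to come from the earlier clustering step; the features (revenue growth, ROE), labels, and tickers are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def majority(labels):
    return 1 if labels and 2 * sum(labels) > len(labels) else 0

def fit_stump(rows, labels):
    """Depth-1 decision tree: the (feature, threshold) split with the lowest weighted Gini.
    Returns (feature index, threshold, prediction if <= threshold, prediction if > threshold)."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    return best[1:]

def fit_per_stage(history):
    """history: (stage, features, outperformed) triples; one stump per life-cycle stage."""
    by_stage = {}
    for stage, feats, y in history:
        X, Y = by_stage.setdefault(stage, ([], []))
        X.append(feats)
        Y.append(y)
    return {stage: fit_stump(X, Y) for stage, (X, Y) in by_stage.items()}

def select(candidates, stumps):
    """Keep tickers whose stage-specific stump predicts outperformance."""
    picks = []
    for ticker, stage, feats in candidates:
        if stage not in stumps:
            continue
        f, t, left_pred, right_pred = stumps[stage]
        if (right_pred if feats[f] > t else left_pred) == 1:
            picks.append(ticker)
    return picks

# Hypothetical features: [revenue growth, ROE]; label 1 = beat the benchmark.
history = [
    ("growth", [0.30, 0.12], 1), ("growth", [0.25, 0.10], 1),
    ("growth", [0.05, 0.08], 0), ("growth", [0.02, 0.15], 0),
    ("mature", [0.03, 0.20], 1), ("mature", [0.04, 0.18], 1),
    ("mature", [0.05, 0.06], 0), ("mature", [0.01, 0.05], 0),
]
stumps = fit_per_stage(history)
candidates = [("A", "growth", [0.28, 0.09]), ("B", "growth", [0.03, 0.20]),
              ("C", "mature", [0.02, 0.19]), ("D", "mature", [0.06, 0.04])]
print(select(candidates, stumps))  # ['A', 'C']
```

In this toy data, growth-stage winners separate on revenue growth and mature-stage winners on ROE, which is the kind of stage-specific rule the per-cluster modeling is meant to surface.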
6	基於音樂特徵以及文字資訊的音樂推薦 / Music recommendation based on music features and textual information
Author: 張筑鈞 (Chang, Chu Chun) | Date: unknown
Abstract: In the Web 2.0 era, the Internet is flooded with interactive platforms. On music websites, users not only listen to music but have also grown used to exchanging comments and sharing experiences in virtual spaces. In doing so they leave footprints that indirectly reveal personal characteristics, from which a more personalized recommendation system can be built. This research recommends music from two sources: users' access histories and their comments. Access histories are analyzed to extract music features, find valuable ranges of those features, and build users' music profiles. Comments are analyzed to extract textual features, compute the importance of each feature, and build users' textual profiles. Both profiles are used to group users with similar interests. A collaborative recommendation method then recommends to each user the items favored by other members of the same group, weighted by their degree of preference and restricted to items the user has no prior record of. The aim is for music recommendation to consider not only the features of the music itself but also the information users leave while interacting.
Keywords: music recommendation; user clustering; feature partition
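The abstract describes grouping users by their music and textual profiles and recommending items favored by group members that the user has no record of. Below is a minimal sketch, assuming both profiles are merged into one sparse feature dictionary per user and using a simple leader-style grouping with a cosine threshold; the grouping algorithm, threshold, and data are our assumptions, since the abstract does not name them.

```python
import math

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def group_users(profiles, threshold):
    """Leader clustering: a user joins the first group whose leader profile has
    cosine similarity >= threshold; otherwise the user starts a new group."""
    groups = []  # (leader profile, member ids)
    for user, profile in profiles.items():
        for leader, members in groups:
            if cosine(leader, profile) >= threshold:
                members.append(user)
                break
        else:
            groups.append((profile, [user]))
    return [members for _, members in groups]

def recommend(user, groups, prefs, top_n=3):
    """Rank items by the summed preference degree of the user's group mates,
    skipping items the user already has a record for."""
    group = next(g for g in groups if user in g)
    seen = prefs.get(user, {})
    scores = {}
    for other in group:
        if other == user:
            continue
        for item, degree in prefs.get(other, {}).items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + degree
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

# Hypothetical merged music/text profiles and preference degrees.
profiles = {
    "u1": {"tempo_fast": 1.0, "rock": 1.0},
    "u2": {"tempo_fast": 0.9, "rock": 0.8},
    "u3": {"tempo_slow": 1.0, "jazz": 1.0},
}
prefs = {
    "u1": {"song_a": 5},
    "u2": {"song_a": 4, "song_b": 5, "song_c": 2},
    "u3": {"song_d": 5},
}
groups = group_users(profiles, threshold=0.5)
print(groups)                            # [['u1', 'u2'], ['u3']]
print(recommend("u1", groups, prefs))    # ['song_b', 'song_c']
```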
7	應用主題探勘與標籤聚合於標籤推薦之研究 / Application of topic mining and tag clustering for tag recommendation
Author: 高挺桂 (Kao, Ting Kuei) | Date: unknown
Abstract: Social tagging, popular since Web 2.0, lets users interpret and share information; as an alternative to traditional classification, its convenience and flexibility make it easy for users to tag content. It also has drawbacks: much content remains untagged, and many tags are vague or imprecise, which weakens the system's ability to organize tags. To address both problems, this study proposes an automated tag recommendation method that combines topic mining and tag clustering, aiming to recommend suitable tags without manual intervention. We collected 2,500 popular Chinese articles with more than 5,000 views from Pixnet blogs; after preprocessing, 1,939 were used to train the model and 400 to test it. For topic mining, an LDA topic model computes the topic semantics of each article and associates them with existing tags, so topics can be predicted for new articles and topic-related tags recommended. The number of LDA topics is chosen with the help of perplexity, a measure of model performance, rather than subjectively. For tag clustering, hierarchical clustering groups tags that have co-occurred, in order to find tags with similar semantic concepts. The stopping condition is set to a minimum of one co-occurrence, so the number of clusters does not have to be specified in advance and is found automatically. The results show that recommending tags from article topic semantics is feasible, and the number of topics selected by perplexity gives more consistent results than other topic numbers. Tags within the same hierarchical cluster do share similar conceptual semantics. Finally, combining the two methods raises Top-1 to Top-5 accuracy by 14.1% on average, with a Top-1 accuracy of 72.25%, indicating that the method can recommend appropriate tags that help users tag their articles more conveniently and precisely.
Keywords: tag recommendation; topic model; hierarchical clustering
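The tag-clustering step can be sketched as agglomerative clustering over co-occurrence counts, using the stopping rule from the abstract: stop when no pair of clusters shares at least one co-occurrence. Single linkage (the strongest tag pair between two clusters) is our assumption, as are the sample articles and tags.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(articles):
    """Number of articles in which each (sorted) pair of tags appears together."""
    counts = Counter()
    for tags in articles:
        for pair in combinations(sorted(set(tags)), 2):
            counts[pair] += 1
    return counts

def cluster_tags(articles, min_cooc=1):
    """Single-linkage agglomerative clustering of tags. Clusters are linked by their
    strongest tag-pair co-occurrence; merging stops when no link reaches min_cooc,
    so the number of clusters is not fixed in advance."""
    counts = cooccurrence(articles)
    clusters = [{t} for t in sorted({t for tags in articles for t in tags})]

    def link(c1, c2):
        return max(counts.get(tuple(sorted((a, b))), 0) for a in c1 for b in c2)

    while len(clusters) > 1:
        best, pair = 0, None
        for i, j in combinations(range(len(clusters)), 2):
            strength = link(clusters[i], clusters[j])
            if strength > best:
                best, pair = strength, (i, j)
        if pair is None or best < min_cooc:
            break
        i, j = pair
        merged = clusters[i] | clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = merged
    return clusters

# Hypothetical tag sets of five blog articles.
articles = [["travel", "japan"], ["japan", "food"], ["food", "dessert"],
            ["python", "coding"], ["knitting"]]
print(sorted(sorted(c) for c in cluster_tags(articles)))
# [['coding', 'python'], ['dessert', 'food', 'japan', 'travel'], ['knitting']]
```

With single linkage and a threshold of 1, the result equals the connected components of the tag co-occurrence graph. That makes the cluster count automatic, but it can also chain loosely related tags together: here travel and dessert share a cluster only through japan and food.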
8	設計與實作一個針對遊戲論壇的中文文章整合系統 / Design and Implementation of a Chinese Document Integration System for Game Forums
Author: 黃重鈞 (Huang, Chung Chun) | Date: unknown
Abstract: With today's network infrastructure, forum users can share information quickly and with few barriers, and the volume of information has exploded as a result. Even with search engines, users must spend considerable effort collecting, filtering, and processing posts on a given topic, often beyond what a person can reasonably handle. Using the League of Legends discussion board of the Bahamut gaming site (巴哈姆特電玩資訊站) as a case study, we design a tool that summarizes multiple forum posts to give users a comprehensive yet concise description of each game character. Drawing on forum mining research and news summarization systems, we first analyze the characteristics of the forum, how hidden information can be mined from it, and the difficulties such mining faces. The method has three phases. (1) Preprocessing: unlike news articles, forum posts do not lend themselves to using nouns and verbs directly as keywords, so TF-IDF, rather than part of speech, selects representative terms, which form the dimensions of a sentence vector space model. (2) Clustering: the K-means algorithm groups sentences with similar meaning into the same cluster. (3) Sentence selection: based on the clustering result, sentences are weighted by two features, their keyword content and TF-IDF, and the sentences most representative of the document set are chosen. Experiments on data collected from the forum surface some useful information, and the thesis closes with possible improvements toward better classification of forum posts.
Keywords: Chinese game forum; document summarization; keyword extraction; K-means clustering
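The three phases can be sketched as follows. Several assumptions apply: sentences are already word-segmented (Chinese text would need a segmenter first), each sentence is treated as a document when computing IDF, K-means uses the first `k` sentences as initial centroids, and a sentence's score is simply the sum of its TF-IDF weights, a simplification of the thesis's two features (keyword content and TF-IDF).

```python
import math
from collections import Counter

def tfidf(sentences):
    """TF-IDF weights per sentence, treating each sentence as a document."""
    docs = [s.split() for s in sentences]
    df = Counter(t for d in docs for t in set(d))
    n = len(docs)
    return [{t: (c / len(d)) * math.log(n / df[t]) for t, c in Counter(d).items()} for d in docs]

def kmeans(points, k, iters=20):
    """Plain K-means (squared Euclidean distance), seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def summarize(sentences, k):
    """Phase 1: TF-IDF vectors; phase 2: K-means; phase 3: best-scoring sentence per cluster."""
    vecs = tfidf(sentences)
    vocab = sorted({t for v in vecs for t in v})
    points = [[v.get(t, 0.0) for t in vocab] for v in vecs]
    assign = kmeans(points, k)
    summary = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            best = max(members, key=lambda i: sum(vecs[i].values()))
            summary.append(sentences[best])
    return summary

# Hypothetical, already segmented forum sentences about two champions.
sentences = [
    "ahri burst damage",
    "garen tank armor",
    "ahri burst combo practice",
    "garen tank",
]
print(summarize(sentences, k=2))  # ['ahri burst combo practice', 'garen tank armor']
```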
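9	分群技術之研究 / A Study of Clustering Techniques
Author: 姚蘊雯 (Yao, Yun-Wen) | Date: unknown
Abstract: Discriminant analysis addresses how to assign an individual, based on some feature values, to one of two or more populations. The two most common models are (1) the normal model with equal covariance matrices across populations and (2) the normal model with unequal covariance matrices. In general the population parameters are unknown, so sample statistics are substituted into the discriminant function. When covariance matrices are unequal, however, too many feature variables combined with a limited number of samples destabilize the discriminant function. This thesis therefore studies variable selection that keeps the number of variables as small as possible, to improve the efficiency of quadratic discrimination. The thesis is one volume of about 8,000 characters in six chapters: Chapter 1, introduction; Chapter 2, basic theory of discriminant analysis; Chapter 3, decomposition and combination of the likelihood ratio test statistic; Chapter 4, stepwise selection of variables for the discriminant model; Chapter 5, computer simulation experiments on variable selection; Chapter 6, conclusions. It shows how, when a variable is introduced, the step-down likelihood ratio test statistic can be decomposed into three statistics that measure the variable's contribution to discrimination through (1) inequality of residual variances, (2) parallelism of the conditional distributions, and (3) dispersion of locations under homoscedasticity. This partition is useful not only for variable selection when covariance matrices are unequal, but also, when running a discriminant analysis that assumes equal covariance matrices, to prevent the inclusion of variables that would lead to any form of extreme variance inequality.
Keywords: clustering; entity; population; covariance matrix; function; simulation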
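For reference, the textbook quadratic discriminant rule under the unequal-covariance normal model is shown below, with sample estimates substituted for the unknown parameters. This is the standard rule, not the thesis's step-down decomposition; π_k denotes the prior probability of population k.

```latex
d_k(\mathbf{x}) = -\tfrac{1}{2}\ln\lvert\hat{\Sigma}_k\rvert
  - \tfrac{1}{2}(\mathbf{x}-\hat{\boldsymbol{\mu}}_k)^{\top}\hat{\Sigma}_k^{-1}(\mathbf{x}-\hat{\boldsymbol{\mu}}_k)
  + \ln \pi_k,
\qquad
\hat{k}(\mathbf{x}) = \arg\max_k \, d_k(\mathbf{x})
```

When all Σ_k are equal, the log-determinant and the term quadratic in x are the same for every population, so the rule reduces to a linear discriminant. With p variables, each estimated Σ_k has p(p+1)/2 free parameters, which is why many variables and few samples destabilize the quadratic rule and motivate variable selection.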
10	中文訴訟文書檢索系統雛形實作 / A Prototype of Information Services for Chinese Judicial Documents
Author: 藍家樑 (Lan, Chia Liang) | Date: unknown
Abstract: Litigation cases grow in number every day, and reading every judicial document is clearly impractical, so users need a better retrieval system. Building on earlier research, we implement a prototype clustering-based retrieval system for Chinese judicial documents. It retrieves cases matching the query and outputs the results in clusters, letting users focus on the clusters they need; this reduces their reading burden while giving them more complete information. Users can also mark passages and add annotations to build a personalized database that makes later searches easier. When the query is a keyword, the system clusters the results with hierarchical clustering, and an index built from co-occurring words (collocations) lists possibly related terms for further queries. When the query is a description of criminal facts, the system applies the k-nearest neighbor method to find similar cases and groups them by cause of action. Users can also view the distribution of prison sentence lengths and retrieve the documents in a specific interval. A formal evaluation is difficult because the system is interactive and there is no clear standard for judging whether it is helpful. Instead, we measure the efficiency of human subjects using the system and compute statistics on the similarity of clustering results and on the distribution of prison sentence lengths, to examine how the system helps users and where it needs improvement.
Keywords: legal information systems; natural language processing; hierarchical clustering; k-nearest neighbors
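The crime-description query path (k-nearest neighbors over past cases, grouped by cause of action) can be sketched as below. The bag-of-words representation, cosine similarity, majority vote, `k=3`, and the sample cases are assumptions for illustration; the abstract does not describe the system's own features or index.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_cases(query, cases, k=3):
    """cases: (case_id, cause_of_action, fact_text) triples. Returns the majority cause
    among the k most similar cases and those cases' ids, most similar first."""
    q = Counter(query.split())
    ranked = sorted(cases, key=lambda case: -cosine(q, Counter(case[2].split())))
    nearest = ranked[:k]
    votes = Counter(cause for _, cause, _ in nearest)
    return votes.most_common(1)[0][0], [case_id for case_id, _, _ in nearest]

# Hypothetical, pre-segmented fact descriptions of past cases.
cases = [
    ("c1", "theft", "stole wallet from store"),
    ("c2", "theft", "stole bicycle from garage"),
    ("c3", "fraud", "fake invoice to obtain money"),
    ("c4", "theft", "stole phone from store counter"),
    ("c5", "fraud", "fake identity to obtain loan"),
]
print(knn_cases("stole phone from store", cases))  # ('theft', ['c4', 'c1', 'c2'])
```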
