Global ETD Search

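1	A Study of Multidimensional Database Design Methods (多維度資料庫設計方法之研究) 李淑銘 Unknown Date (has links) In a constantly changing competitive environment, managers need fast, flexible access to information to support their decisions. To meet this need, the information department must design a database management system that satisfies users' dynamic information requirements. This study proposes a set-based notation for defining multidimensional databases, together with a method for designing them, and examines and validates the method using a marketing company, 南聯國際貿易公司, as a case study. Keywords: multidimensional database; database design; multidimensional data model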
2	針對社群媒體上的趨勢變化之視覺化探索工具 / A study of visual exploration tool for comparing trend 郭建凱, Kuo, Jian-Kai Unknown Date (has links) As social media has become widespread, news media and opinion leaders increasingly publish news as posts, and social media has become the main channel through which many users receive news and learn of major events. Users express their positions through comments, shares, and likes; these real-time interactions have no counterpart in traditional media, and analyzing them is a research challenge. This study analyzes posts and interaction behavior on Facebook and provides an interactive visualization system that finds clusters of similar posts in a dataset and shows how post attributes change over time, in order to better understand how users, posts, and major events on Facebook influence one another. Because posts and interactions have multidimensional attributes, we apply a dimensionality reduction algorithm and present a large number of posts as a 2D scatter plot, which groups similar posts together. We also design a visualization method, "Time Block", that highlights changes in post attributes over time so that any specific patterns in the dataset can be observed. The system provides a real-time interactive interface together with statistics on post attributes and keywords, linking the clusters to the distribution of attributes over time and supporting visual exploration and analysis. Finally, a case study and user tests show the strengths and weaknesses of the tool. Keywords: data visualization; multidimensional data; time-series data; social networks
3	利用股價連動關係發展股票推薦系統 / Developing a stock recommendation system by stock prices correlation 簡志偉 Unknown Date (has links) Ways of accumulating wealth change with the times, and in the twenty-first century investing is one way to accumulate wealth quickly. Rising national income and financial literacy have made Taiwan's securities market active in recent years; statistics from the Taiwan Stock Exchange Corporation (TSEC) and the Securities and Futures Commission of the Ministry of Finance show that the stock market has become an important investment channel for domestic investors. Anyone seeking to profit from stocks or derivatives must pay attention to changes in stock prices, yet the factors that influence prices are so numerous and complex that ordinary investors cannot easily keep track of them. The rapid development of information technology has made it feasible to apply data mining to the stock market, where its advantage is finding useful information in large volumes of data. This study uses data mining techniques to find which stocks to buy and when to enter the market, and investigates whether the price movement of a single stock is related to that of a group of stocks. Historical trading data are used to compute the correlation between stocks, both in price movements and in technical indicators; a conditional probability rule and a voting rule are then developed to compute buy and sell recommendation values for each stock, and changes in these values determine which stocks to buy and sell.
The training period runs from March 2004 to March 2006, and the prediction period from March 2006 to March 2007. The results are assessed through return distribution analysis, trade count analysis, positive-return ratio analysis, total profit analysis, and a comparison with the buy-and-hold strategy. Of the four prediction models proposed, technical indicator correlation combined with the voting rule is the most effective at beating buy-and-hold. Keywords: stocks; data mining
4	A KDD System: The Mortality Database for ROC Years 83 to 88 (1994 to 1999) as an Example (KDD系統－以民國八十三年至八十八年之死亡資料庫為例) 陳怡伶 Unknown Date (has links) As large volumes of data accumulate ever faster, data mining and statistical applications have grown in importance alongside advances in computing. Several statistical packages on the market offer data mining functions, but they are not user-friendly: users may not know how to input, transform, or recode data, or where to find the relevant analysis items. To address these shortcomings, this thesis builds a "KDD system": a user-friendly interface developed in Visual Basic and linked to the statistical software STATISTICA, so that users from different fields, including novices, can carry out data mining easily. Taking epidemiology as an example, the thesis uses the system's simple operation and multi-function analysis forms to find trends and associations in causes of death in Taiwan (including Kinmen and Matsu) over the six years named in the title; the author's English abstract gives the period as 1993 to 1998. The findings may help scholars, experts, and government officials who study causes of death. Keywords: data mining; mortality
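5	Computational Methods for Sequential Processing of Informatively Censored Categorical Data (失去部份訊息而有價值的類別資料依循序程式處理之計算方法) 汪為開 Unknown Date (has links) Categorical sampling with partially classified (censored) data, in which some categories cannot be distinguished, is important in many application areas. Such problems have been studied for years, but most work assumes non-informative censoring and truthful reporting. Thomas J. Jiang removed both assumptions and proposed a quasi-Bayes method to approximate the Bayes solution for this class of problems. This quasi-Bayes method resembles the "quasi-Bayes procedure for mixtures" developed by Makov and Smith (1977) and Smith and Makov (1978). The computational formulas for the quasi-Bayes method used in this thesis have all been derived, and solutions can be obtained in very little time. The thesis focuses on comparing the efficiency of the quasi-Bayes method with that of the Bayes method, assesses how good the quasi-Bayes approximation is, and examines the conditions under which the approximation performs poorly. Keywords: categorical data; information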
6	基於標記式主題模型之資料視覺化研究與實現 / A study of data visualization based on labeled topic model and its implementation 曾子芸 Unknown Date (has links) With the explosive growth of textual information, more and more of it is stored and transmitted as electronic text, and as the volume grows it becomes harder for users to grasp a body of text quickly. This study uses topic models and labeled topic models, text mining methods from natural language processing, to identify the latent topics in a large collection of texts. It then draws on the expressive power and efficiency of visualization, presenting the texts through a variety of visual designs so that they can be explored from different angles; the result is a new way of reading and analyzing large text collections. The study designs a two-stage experiment (a task-oriented experiment followed by an assigned-task experiment) and an evaluation questionnaire to verify the interface's ease of use and usefulness. The questionnaire scores indicate that the interface can in practice assist experts and scholars in text-related research, and that it helps users with differing levels of familiarity with the texts grasp the overall picture of events in a large corpus more quickly. Keywords: data visualization; text data visualization; topic model
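7	Applying Data Mining Techniques to the Linking and Integration of Multiple Databases (應用資料採礦技術於多個資料庫連結與整合) 劉致琪 Unknown Date (has links) In today's digital era, some enterprises hold millions of records, yet analyzing them is difficult and time-consuming, often wastes labor and money, and may still fail to deliver the expected results. Data mining can uncover hidden, useful information and knowledge in such data and can predict the future from existing data, giving an enterprise early access to business opportunities. For data mining, the initial data collection is critical: data quality determines the correctness of the results and the success of the predictions. Each research topic has its own goals, required variables, and suitable algorithms, so the complete information needed may not be available in a single database, and conducting a new survey is time-consuming and laborious. How to compensate for the missing portion of the data is the question this study addresses. Starting from an existing database, we use two other databases as auxiliaries and apply a function-mapping method to fill in the required data; data mining on the completed data is then more efficient, and the resulting predictive model more accurate. For the database linking process, we consider three cases: all three databases share a common field, the databases share common fields pairwise, and the three databases share no common field. The results show that, whether or not the databases have common fields available for linking, using function mapping to add information to a database is feasible and performs well. The approach can serve as a reference for data mining practitioners when collecting data and as a direction for future research. Keywords: data mining; data value-adding; function mapping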
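8	由食譜資料探勘料理特徵樣式 / Mining Cuisine Patterns from Recipe Dataset 呂耀茹 Unknown Date (has links) In recent years more and more people have begun cooking for themselves for health reasons, which has driven the growth of recipe community websites. Although data mining has become popular as Big Data has drawn attention, few studies mine and analyze recipe data at scale. This study crawls recipe data from the well-known recipe websites Allrecipes.com, Food.com, and Yummly.com and mines the ingredient patterns and characteristics of the world's major cuisines, including cuisine flavors, commonly used ingredients, signature ingredients, core ingredients, ingredient pairing relationships, similarity and clustering among cuisines, and automatic cuisine classification. For data preprocessing, the thesis proposes a method for resolving ingredient synonyms that combines an ingredient lexicon with a connected-component labeling algorithm. To mine ingredient patterns and characteristics, the study uses network analysis, association rules, Phi, and PMI to analyze each cuisine's signature ingredients, core ingredients, and ingredient pairing patterns. In addition, the thesis clusters cuisines by ingredient similarity using hierarchical clustering, in contrast to the usual grouping by geographic location. It also proposes hierarchical classification to determine a recipe's cuisine automatically from its ingredients. Mining the large volume of user-generated data on recipe websites reveals the style and characteristics of each cuisine, and the findings can be applied to data management and search on recipe websites. Keywords: big data; data mining; recipes; cuisine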
9	An XML-based Multidimensional Data Exchange Study / 以XML為基礎之多維度資料交換之研究 王容, Wang, Jung Unknown Date (has links) Driven by globalization and Internet-speed competition, enterprises today often distribute their departments across many locations, or merge and ally with companies in other regions, to improve their competitiveness and market responsiveness. As a result, a geographically distributed enterprise usually contains many different data warehouse systems. To support management decision-making, the data in these warehouses must be exchanged and integrated, which calls for an open, vendor-independent data exchange standard for transferring multidimensional data between data warehouses over the Internet. However, known solutions for cross-warehouse data exchange are mostly limited to record-by-record conversion or transfer of plain-text files, which are inefficient and unsystematic. This research studies the issues in multidimensional data exchange and develops an XML-based multidimensional data exchange model. It also proposes a generic-construct-based approach that yields a single standard exchange format and enables systematic many-to-many mapping between distributed data warehouses.
Building on the transformation model developed here between the multidimensional data model and the XML data model, and supported by the proposed multidimensional metadata management function, the research establishes a general-purpose XML-based multidimensional data exchange process over the web that attends to both efficiency and quality. A prototype system that exchanges multidimensional data using XML demonstrates that the proposed exchange model is feasible and that metadata makes the exchange process more systematic and efficient. Keywords: Data Warehouse; Multidimensional Data Model; Data Cube; Metadata; XML; Data Exchange
10	應用資料採礦技術於資料庫加值中的抽樣方法 / THE SAMPLING METHODS FOR VALUE-ADDED DATABASE IN DATA-MINING 陳惠雯 Unknown Date (has links) Databases keep growing, a trend in today's business environment that will continue for the foreseeable future. Reviewing all the data held on a corporation's or organization's network, such as sales figures, manufacturing statistics, financial data, and experimental data, to extract quality information is costly, time-consuming, and ineffective. We therefore need a sound and effective method for obtaining only a portion of the data that is representative of the population and allows a reliable model to be built from the sample. Sometimes, however, the database is limited in size; in that case we adopt the relatively new idea of adding attributes or values to the database to enhance the quality of the data. Under such a procedure, a good sampling method is important groundwork for the final goal of a reliable predictive model. The goal of this research is to obtain an effective and representative value-added sample, by means of sampling methods, for building an accurate predictive model: good predictive samples require the correct sampling method. The sampling methods studied are simple random sampling, systematic sampling, stratified sampling, and uniform design. The models used are C5.0, logistic regression, and neural networks for categorical response variables, and stepwise regression for continuous response variables. The results are discussed in the conclusion section. Keywords: Database; Data Mining; Sampling; Value-added database

