Global ETD Search

1	運用文字探勘及財務資料探討中國市場營運概況文字敘述及財務表現之一致性 / Using Text Mining and Financial Data to Explore for Consistency between Narrative Disclosure and Financial Performance in China Market 鄭凱文 Unknown Date
This study uses text mining to analyze the MD&A (Management's Discussion and Analysis) sections disclosed in 2011 by companies listed in mainland China and compares them with financial information to determine whether the MD&A disclosures are overstated. An empirical analysis then examines what drives overstatement. The sample comprises the MD&A and related financial information disclosed by all companies listed in mainland China in 2011. The qualitative MD&A text is segmented with the Stanford Word Segmenter, scored with the NTUSD positive/negative sentiment dictionary, weighted with TF-IDF, and clustered with K-means. The result is combined with a K-means clustering of the financial information to judge whether each company's 2011 MD&A is overstated. Company size, management's risk preference, profitability, and solvency (liquidity) are then used as variables to analyze the factors that affect overstatement. The results show that company size and management's risk preference are significantly negatively related to a tendency not to overstate, while profitability and solvency are significantly positively related to it. The study offers investors another way to analyze MD&A and suggests that, when using MD&A information disclosed by listed companies, they should also consider whether it is overstated and adjust accordingly, in order to reduce investment risk and make sound investment decisions.
Keywords: text mining, K-means, TFIDF
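The TF-IDF weighting step named in the abstract above can be sketched as follows. This is only an illustration: the abstract does not state which TF-IDF variant the thesis uses, so the length-normalized term frequency, the natural-log inverse document frequency, the function name `tfidf`, and the toy token lists are all assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already-segmented MD&A snippets (not data from the thesis).
docs = [
    ["revenue", "growth", "strong"],
    ["revenue", "decline", "risk"],
    ["revenue", "growth", "risk"],
]
weights = tfidf(docs)
```

A term that appears in every document, such as "revenue" here, receives a weight of zero, while a term unique to one document receives the highest weight. The resulting vectors are the kind of input a K-means step would then cluster.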
2	運用文字探勘技術建置知識本體之研究 -以財經文件為例 / The study of constructing ontology with text mining techniques－Take the macroeconomic analysis report for an instance 蘇晏譁, Su, Yan Hua Unknown Date
As awareness of personal finance spreads, individuals and companies need more and more financial and economic information. Documents containing useful information are increasingly easy to obtain, but most are free text with no fixed format and are hard to organize. Helping users find and extract the right information from large volumes of financial documents has therefore become an important research topic in finance-related applications. Among knowledge discovery methods, text mining takes document content as its main object of analysis and aims to extract meaningful knowledge from unstructured or semi-structured documents. A good mechanism for consolidating and preserving the knowledge that text mining uncovers would allow the knowledge hidden in financial documents to be applied in related fields such as decision support, information retrieval, and knowledge management, which in turn becomes an important basis for competitiveness. This study analyzes financial documents such as financial news and the research reports of investment advisers. It combines the knowledge discovery capability of text mining with the concept of an ontology, uses association analysis, an important text mining algorithm, to uncover key information implicit in the documents, and proposes a new method for building an ontology from the resulting association rules. The method has three features: (1) it constructs a "financial target model" that defines the basic structure of financial document content; (2) it represents the knowledge mined from text as an ontology; and (3) it builds the ontology automatically.
Keywords: text mining, ontology
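Association analysis, as mentioned in the abstract above, is usually scored with support and confidence. The sketch below uses those textbook definitions. The abstract does not say which measures or thresholds the thesis applies, so the functions `support` and `confidence` and the keyword sets are hypothetical.

```python
def support(itemsets, items):
    """Fraction of documents whose keyword set contains every term in `items`."""
    items = set(items)
    return sum(items <= doc for doc in itemsets) / len(itemsets)

def confidence(itemsets, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the documents."""
    base = support(itemsets, antecedent)
    if base == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    return support(itemsets, set(antecedent) | set(consequent)) / base

# Hypothetical keyword sets extracted from four financial documents.
documents = [
    {"interest rate", "inflation", "bond"},
    {"interest rate", "inflation"},
    {"interest rate", "stock"},
    {"inflation", "bond"},
]
rule_support = support(documents, {"interest rate", "inflation"})
rule_confidence = confidence(documents, {"interest rate"}, {"inflation"})
```

Rules whose support and confidence exceed chosen thresholds could then become candidate relations between concepts in an ontology. How the thesis maps rules to ontology elements is not described in the abstract.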
3	應用文件探勘技術於概念股股價共同移動之研究 / A study of using text mining on the co-movement of concept stock price 吳振和, Wu, Cheng Ho Unknown Date
The securities market is a popular investment target in Taiwan. Taiwan is a shallow ("shallow-dish") market in which most stock investors are individuals whose information comes mainly from newspapers, television, the Internet, and other media, so outside news easily moves stock prices. In recent years, alongside the traditional classification by industry, a new way of grouping stocks has emerged: concept stocks. A concept stock group is a set of individual stocks related to a favored product, industry, or even policy. Stocks under a concept usually attract considerable attention, which prompts media coverage and draws investor interest, so coverage of a concept can be expected to influence the prices of the related stocks. This study therefore starts from news information and analyzes it with text mining to gather the stocks related to a concept of interest. Using 86,579 news articles published in Udndata (聯合知識庫) between January and April 2011 and taking the iPad2 concept as the target, it extracts features from each article with text mining and applies association analysis to find association rules between the concept and individual stocks, thereby identifying the stocks related to the concept. The study then obtains from the Taiwan Stock Exchange website the numbers of advancing and declining stocks on every trading day from February to May 2011, computes the degree of price co-movement as the larger of the two numbers divided by their sum, and compares this, together with cumulative returns, against the selected concept stocks. With a threshold of 0.2, the concept stocks selected by this method show co-movement degrees of 79.3%, 73.6%, 70.2%, and 68.1% for February through May, each higher than those of the concept stocks selected by MoneyDJ and of the overall market over the same period. A paired-sample t-test at the 95% confidence level shows that the selected concept stocks co-move significantly. The results confirm that text mining and association rules can uncover, from unordered news, the stocks related to a concept of interest, giving investors a basis for deeper analysis.
Keywords: concept stocks, text mining, association rules
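The co-movement measure described in the abstract above (the larger of the advancing and declining counts divided by their sum) can be written out directly. In this sketch the function name `comovement_degree` and the daily counts are hypothetical, and averaging the daily values into a period figure is an assumption, since the abstract does not say how the monthly percentages were aggregated.

```python
def comovement_degree(advancers: int, decliners: int) -> float:
    """Share of moving stocks that moved in the majority direction (0.5 to 1.0)."""
    total = advancers + decliners
    if total == 0:
        raise ValueError("no advancing or declining stocks on this day")
    return max(advancers, decliners) / total

# Hypothetical (advancers, decliners) counts for three trading days.
daily_counts = [(30, 10), (12, 28), (20, 20)]
degrees = [comovement_degree(up, down) for up, down in daily_counts]

# Assumed aggregation: simple mean of the daily degrees.
period_average = sum(degrees) / len(degrees)
```

A value of 0.5 means the group split evenly between gains and losses, and a value of 1.0 means every moving stock went the same way.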
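4	應用資料探勘技術於食譜分享社群網站進行內容分群之研究 / A user-based content clustering system using data mining techniques on a recipe sharing website 林宜儒 Unknown Date
This study takes a recipe-sharing community website as its subject. It builds an automatic clustering mechanism for the site's recipes using a kNN clustering algorithm and uses the behavior of the site's users as a reference for describing the resulting clusters. The information system for clustering recipes is built in three stages. The first stage is data processing: although the recipe data obtained from the site is already in a relatively structured format that could be clustered directly, user-entered content still contains typos, redundant words, and text with little direct relevance to the recipe, so it must be cleaned. The second stage is clustering: text mining extracts content features, and data mining techniques then cluster the recipes, with within-cluster features and between-cluster similarity serving as the main indicators of clustering quality. The third stage is cluster feature analysis: the way users collect and categorize recipes on the site is analyzed statistically to find a likely category name for each cluster. In an experiment on 500 recipes, the best run produced 10 recipe clusters with an average intra-cluster similarity of 0.4482. Each cluster shows clearly similar characteristics and can be labeled from users' collecting behavior with categories such as soups, desserts, breads, and Chinese dishes. Because the site marks up the content fields of every recipe according to the recipe format defined by schema.org, the clustering mechanism implemented here could also be applied to other sites of the same type that adopt the schema.org standard.
Keywords: text mining, data clustering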
5	探討美國上市公司MD&A揭露與財務表現一致性之決定因素 / Explore the Determinants of the Consistency between US Listed Companies’ MD&A Disclosure and Financial Performance 李宸昕, Lee, Chen Hsin Unknown Date
This study uses text mining to analyze the MD&A information of US listed companies from 2004 to 2014 and compares it with financial information to assess the consistency of MD&A tone, then examines empirically what determines that consistency. The qualitative MD&A text is analyzed with the Loughran and McDonald positive/negative word lists, TF-IDF, and K-Means, and the result is combined with an analysis of financial information to build an index that captures the consistency between the two types of information. Variables such as earnings volatility, company size, and company age are then used to analyze the factors that affect whether MD&A disclosure is overstated. The results show that company size, company operating risk, analyst coverage, and company age are significantly related to MD&A tone consistency. In addition to the main empirical results, three robustness tests examine the sensitivity of the model and give similar results. The study is intended to help users of MD&A information make appropriate adjustments by considering whether a company's MD&A is overly optimistic (overstated) or overly pessimistic (understated), so that they can make sound economic decisions.
Keywords: MD&A, K-Means, text mining
6	以文字探勘為基礎之財務風險分析方法研究 / Exploring Financial Risk via Text Mining Approaches 劉澤 Unknown Date
In recent years, many studies have applied machine learning to predicting stock price trends and risk in finance. Analyses of stock prices, the text of financial reports, financial news, or more immediate sources such as Twitter posts can each support some degree of investment risk assessment and price trend prediction. This thesis focuses on the textual information in financial reports and applies it to financial risk assessment: the text of a company's financial report is used to predict the listed company's risk in the following year, with stock volatility as the measure of financial risk. In the first part, a finance-specific sentiment lexicon is used to improve prediction models trained only on the text of financial reports. Research on sentiment analysis indicates that sentiment words reflect the opinions in a text, or views on events, more efficiently, which reduces noise in the text and improves prediction accuracy. A sentiment analysis of the results is also provided. In the second part, numerical information such as stock prices and returns is brought into the learning model as weights: each company instance, with its financial text, is weighted by its stock returns using cost-sensitive learning, and support vector machines with different weight settings combine the two kinds of information. The experiments show that models built on the finance-specific sentiment lexicon perform comparably to models trained on the original text, which confirms the importance of financial sentiment words for risk prediction. More importantly, the learned models suggest strong correlations between financial sentiment words and company risk. The cost-sensitive results also improve significantly on the cost-insensitive ones, indicating that numerical information can serve as cost weights in learning.
Keywords: text mining, financial risk
7	運用kNN文字探勘分析智慧型終端App群集之研究 / The study of analyzing smart handheld device App's clusters by using kNN text mining 曾國傑, Tseng, Kuo Chieh Unknown Date
As smart handheld devices become more widespread, user demand for apps is growing, and companies have used apps to create a new form of interactive marketing. The business opportunity in app downloads has also drawn many developers into app development, so the number of apps has exploded and users facing so many kinds of apps cannot choose efficiently. This study applies text mining and kNN clustering to app recommendation articles posted by Internet users in order to cluster the apps, and adjusts parameters, guided by quality measures, to obtain the best clustering as a reference for users choosing apps. To address users' information overload, the study takes game apps in the App Store as its subject. It collected 439 recommendation articles and merged those recommending the same app, leaving 357 articles. Text mining converts the articles into a vector space model in which they can be compared, and kNN clustering then groups them. The value of k and the document similarity threshold are tuned to obtain the best clustering, with quality assessed by measures such as average intra-cluster similarity. To improve quality further, the study uses multi-stage clustering, in which the number of articles in each cluster determines whether the cluster is split again or merged. The first stage gives the best quality at k = 10 and a document similarity threshold of 0.025. In later stages, because clusters contain fewer articles, k is lowered and the similarity threshold is raised gradually. After the second stage, key terms are extracted from clusters that have met the stopping condition, yielding 6 app types such as "baseball/shooting" and "throwing and flying"; later stages follow the same rules and yield 14 more types such as "tower defense". In total the process produces 36 clusters and 20 app types. During clustering, average intra-cluster similarity rises and average inter-cluster similarity falls, and the clustering quality measure improves from 12.65% after the first stage to 75.81% at the end of the fifth stage. Highly similar apps thus gradually gather into clusters, and the cluster names can serve as a reference for users choosing apps. App developers can also learn from each cluster's key terms which game elements users value and improve their apps accordingly. Building on these results, future work could build a specialized lexicon to improve clustering quality, use document summarization to help users understand each cluster, or build an app recommendation system.
Keywords: App, kNN, clustering, text mining
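The average intra-cluster similarity used above as a quality measure can be illustrated with cosine similarity over document vectors. The abstract does not define the measure precisely, so this sketch assumes it is the mean pairwise cosine similarity within each cluster, averaged over clusters. The function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_intra_cluster_similarity(clusters):
    """Average over clusters of the mean pairwise cosine similarity inside each."""
    per_cluster = []
    for vectors in clusters:
        pairs = list(combinations(vectors, 2))
        if pairs:  # single-document clusters have no pairs and are skipped
            per_cluster.append(sum(cosine(u, v) for u, v in pairs) / len(pairs))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0

# Hypothetical document vectors grouped into two clusters.
clusters = [
    [[1.0, 0.0], [1.0, 0.0]],  # identical documents
    [[1.0, 0.0], [0.0, 1.0]],  # documents with no shared terms
]
score = mean_intra_cluster_similarity(clusters)
```

Under this assumed definition, a tighter clustering pushes the score toward 1, which matches the abstract's report that intra-cluster similarity rises as the stages proceed.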
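8	應用文字探勘技術萃取設計概念之研究 / A study of using text mining on design concept extraction 羅康維, Luo, Kang Wei Unknown Date
In recent years, design has become one of the main tools for raising a product's added value and profit, and companies under global competitive pressure are actively using design to develop innovative products. With active government promotion, many traditional industries have been matched with design firms. How to translate product innovation requirements into design concepts and communicate them, however, has become an extremely important and difficult problem. To communicate design concepts effectively, this study collects award-winning works from Germany's iF international product design award and the RedDot design award from 2005 to 2012, focusing on the product descriptions of all tables, chairs, and cabinets. Text mining filters the descriptions and identifies the corresponding feature values, that is, design elements. kNN then clusters the design elements, and design concepts are extracted from each cluster. The 260 product design documents are divided into 16 design concept clusters, using an average intra-cluster similarity above 0.05 as the threshold for forming a design concept. The 16 concepts are named "distinctive parts with a varied feel", "traditional and modern wooden chairs", "system-based luxury furniture", "wave-shaped fashion", "sofas with a varied design feel", "cross-base chairs in multiple forms", "biomimetic ergonomics", "parent and child", "comfortable reclining", "designer indoor and outdoor chairs", "emphasis on backrest design", "multi-angle symmetry", "tabletops and sofas of various shapes", "shell-shaped backrest chairs", "traditional Chinese", and "location-oriented". Clients can map the design elements they need to the relevant design concept clusters and communicate effectively with designers, who can understand the desired product more quickly and greatly shorten the time and effort spent in the requirements stage. The study also proposes directions for future research.
Keywords: text mining, kNN, design concept, extraction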
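9	運用文字探勘分析非量化資訊協助投資人預測公司財務表現 / Using text-mining analysis on qualitative information to predict companies’ financial performance 葉又豪 Unknown Date
By analyzing information in both numerical and textual form, this study aims to help users classify and read unstructured text efficiently. For listed semiconductor companies from 2002 to 2011, it analyzes qualitative information with TF-IDF, combines this with a K-Means clustering of quantitative information, and uses the result to predict financial performance, with the aim of helping investors reduce investment risk and earn higher returns. The prediction match rate of this method approaches 60%. Closer analysis reveals three phenomena. First, companies that publish more forward-looking information distort the document similarity calculation. Second, companies in the same K-Means cluster may describe their performance in different terms (conservative or not), which affects the TF-IDF calculation. Third, growth rate responds most visibly to changes in the economic environment. Whatever the cause, when quantitative information changes, qualitative information changes as well. Investors can therefore use changes in qualitative information to predict the next year's quantitative information, reducing investment risk and supporting sound decisions.
Keywords: K-means, text mining, quantitative and qualitative information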
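10	透過文字探勘技術探討各校高階經營管理（EMBA）學程之特性－以九校國立大學為例 / Analyzing the Profiles of EMBA Program by Text Mining Methodology - A Case of Nine EMBA Programs 林庭竹, Lin, Ting Chu Unknown Date
In recent years, Taiwan's Executive MBA (EMBA) market has been approaching saturation, the proportion of business managers planning to enroll is leveling off, and EMBA programs are rising across mainland China, Hong Kong, and Taiwan. All of these will affect the development of EMBA programs in Taiwan. This study therefore examines the supply side and the demand side of Taiwan's EMBA programs and analyzes the profiles revealed by each school's faculty and students, so that the programs can move toward tailored offerings that reflect each school's character. In the first stage, the EMBA programs of nine leading national universities in Taiwan are selected. A crawler written in Python collects the titles and summaries of articles by EMBA faculty and students at the nine schools: 23,033 faculty texts and 7,342 student texts. After the texts are segmented with Jieba, 14 management disciplines represent the supply side and 12 government-defined occupational categories represent the demand side as target terms. A Word2Vec model computes the terms related to each target term (management disciplines in the faculty texts, occupational categories in the student texts), yielding a set of 20 related terms per target term. In the second stage, the related terms from the first stage are used to compute cosine similarity with the words in the faculty and student texts, identifying the supply and demand features shared by each school's faculty and students, which characterize that EMBA program. The first-stage results show that Word2Vec, working from feature vectors, accurately identifies words that are synonymous with or related to the target terms, and most of the 20 related terms have a cosine similarity with the target term above 0.7, so the expanded term sets built with Word2Vec are highly accurate. The second-stage rankings of supply and demand cosine similarity show that the ordering of similarity between faculty and student texts and the target terms differs from school to school. Each program can therefore use these differences as indicators of its distinctive features, develop a tailored program, and increase the willingness of Taiwan's business managers to enroll.
Keywords: text mining, Executive MBA program, EMBA, Word2Vec, profile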

