1 |
詞義相似度的社會網路分析研究 / A study on word similarity with social network analysis, 溫文喆, Unknown Date
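Social network analysis represents social relations as networks. Originally a tool purely for analyzing social interaction, it has in recent years been widely applied in sociology, organizational research, information science, biology, linguistics, and other fields. By drawing on graph theory and steadily improving computing power, it can reveal regularities in the behavior of individuals from a perspective different from earlier approaches. Word similarity, in turn, is one of the foundational topics behind technologies such as information retrieval, and many methods for measuring it have been proposed in recent years.
This study applies social network analysis to English words. Using a corpus as the data source, it proposes different ways of constructing networks, defines the network nodes and links, and takes co-occurrence networks as the basis. By varying the generation and filtering conditions, it examines whether existing properties or indices from social network analysis, suitably adjusted, can offer an alternative measure of word similarity. The resulting networks and computed properties are validated against existing synonym benchmarks used in word similarity research, and the study further discusses how suitable social network analysis is for word similarity research.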
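The idea of measuring word similarity on a co-occurrence network can be illustrated with a minimal sketch. It is not the thesis's method: linking words that share a sentence and scoring similarity by Jaccard overlap of neighbors are assumptions chosen for illustration, and the toy corpus and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(sentences):
    """Link two words whenever they appear in the same sentence (assumed rule)."""
    neighbors = defaultdict(set)
    for sentence in sentences:
        words = set(sentence.lower().split())
        for a, b in combinations(sorted(words), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def jaccard_similarity(neighbors, w1, w2):
    """Fraction of neighbors the two words share, between 0 and 1."""
    n1 = neighbors.get(w1, set()) - {w2}
    n2 = neighbors.get(w2, set()) - {w1}
    union = n1 | n2
    return len(n1 & n2) / len(union) if union else 0.0

# Hypothetical toy corpus; a real study would use a large corpus.
corpus = [
    "the car drives on the road",
    "the automobile drives on the highway",
    "the cat sleeps on the sofa",
]
net = build_cooccurrence(corpus)
print(jaccard_similarity(net, "car", "automobile"))
print(jaccard_similarity(net, "car", "cat"))
```

In this toy corpus "car" and "automobile" share more neighbors than "car" and "cat", so they score higher. Other network indices could be substituted for the Jaccard overlap.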
|
2 |
模糊統計在時間數列分析與相似度之應用 / Application of fuzzy statistics in time series analysis and similarity recognition, 張建瑋, Unknown Date
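In time series analysis, methods for identifying model structure are often hampered by nonstationarity in the series and by uncertain disturbances, so fitting a series with a single model often fails to give satisfactory results.
In addition, traditional statistical methods rely too heavily on the numbers themselves. When a time series is highly fuzzy, we are often interested only in its trend. If, from a pattern recognition viewpoint, we can find data highly similar to the series to serve as a leading or reference indicator, this should explain the trend and handle structural change better than a traditional single time series model (whether linear or nonlinear), respond promptly to the latest conditions, and improve forecast accuracy.
In this paper we apply fuzzy theory to build a fuzzy similarity algorithm that identifies the similarity between time series. Depending on characteristics of the data, such as whether the variance changes and whether outliers or sudden shocks are present, we propose three methods for constructing the membership functions: the equally divided range method, the k-means range division method, and the rank transformation method, in order to explain and predict the data better. Simulation results show that the equally divided range method performs best at recognizing fuzzy similarity between time series. In the empirical analysis, we use the algorithm to assess the fuzzy similarity between GDP and private consumption and between GDP and gross investment, with quite good results. / An important problem in pattern recognition of time series is similarity recognition. This paper presents methods of similarity calculation for two time series: the equally divided range method, the K-means method, and the rank transformed method. The success of our similarity recognition relies to a large extent on fuzzy statistical concepts. Simulation results demonstrate that, overall, the equally divided range method performs best in similarity recognition, while the other methods calculate similarity more efficiently for certain special time series. Finally, two empirical examples are illustrated: similarity between GDP and consumption, and between GDP and investment.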
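The abstract does not give the algorithm's formulas, so the following is only a sketch of what an equally divided range approach could look like. The triangular membership functions, the min-overlap similarity, and all names are assumptions; the two series are assumed to have equal length and `k` to be at least 2.

```python
def memberships(series, k=3):
    """Fuzzify each value over k triangular sets whose centers
    divide the series' own range into equal steps (assumed design)."""
    lo, hi = min(series), max(series)
    if hi == lo:  # constant series: everything belongs to the first set
        return [[1.0] + [0.0] * (k - 1) for _ in series]
    step = (hi - lo) / (k - 1)
    centers = [lo + i * step for i in range(k)]
    return [[max(0.0, 1.0 - abs(x - c) / step) for c in centers]
            for x in series]

def fuzzy_similarity(a, b, k=3):
    """Mean over time of the overlap (sum of minima) of the two
    membership vectors; 1.0 means identical fuzzy patterns."""
    ma, mb = memberships(a, k), memberships(b, k)
    overlap = [sum(min(p, q) for p, q in zip(u, v)) for u, v in zip(ma, mb)]
    return sum(overlap) / len(overlap)

rising = [1, 2, 3, 4, 5]
rising_scaled = [10, 20, 30, 40, 50]   # same shape, different scale
falling = [5, 4, 3, 2, 1]
print(fuzzy_similarity(rising, rising_scaled))
print(fuzzy_similarity(rising, falling))
```

Because each series is fuzzified over its own range, two series with the same shape but different scales score as highly similar, which matches the abstract's interest in trend rather than raw values.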
|
3 |
新推個案競爭程度分析 / The analysis of the degree of the competition on residential projects, 彭竹君, Unknown Date
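The purpose of market analysis is to understand customer needs and competitors: know yourself and your rivals, and you will win every battle. In the housing market, suppliers of newly launched projects look at the product types of competing projects and choose to be either a "follower" or a "differentiator". Past market analysis focused on delimiting submarkets in an attempt to identify substitutable competing projects. However, the selection of competing projects has lacked quantitative analysis and has mostly relied on subjective experience. Moreover, the selection should be flexible, since launch strategies differ.
Therefore, taking the supplier's standpoint, this study uses data on projects newly launched between July 2007 and June 2008 and approaches the problem through similarity. Along four dimensions (product attributes, price, time, and space), with weights informed by the results of an ANP expert questionnaire, it measures the degree of competition between projects on a scale from 0 to 1. Using the Taipei metropolitan area as an example, it examines launch competition within the market area.
Within a submarket, the more similar two projects' product attributes, the closer their total prices, the closer their launch times, and the nearer their locations, the closer the degree of competition is to 1 and the stronger the competition between them. The results show that spatial distance is the most important factor affecting competition, followed by product attributes. By region, launch competition is higher in Taipei City than in Taipei County, and higher in the suburbs than in the city center, which challenges the conventional practice of defining "first-tier battlegrounds" by a few indicators such as the number of launches or total sales value. At the project level, the quantitative method helps judge the degree of competition between projects and serves as a basis for selecting competing projects in launch analysis. At the level of overall market analysis, it gives a better understanding of the competitive structure of launches in a submarket, as a reference for suppliers' launch strategy or product positioning. / The aim of this real estate analysis is to know what home owners want and how construction developers analyze their housing projects. This market analysis will help housing developers better understand current and future market trends. In residential markets, housing developers closely follow leading development projects and then decide either to follow the market trend or to take an alternative development path. In the past, market analysts attempted to define housing submarkets and then cross-reference them with current market developments. However, developers have been criticized for choosing competing projects too subjectively, relying on largely qualitative market analysis instead of more objective quantitative data.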
This paper creates a model to analyze competing development projects more objectively than was previously possible. It follows a comparison study that cross-references pairs of development projects. The research area is Northern Taiwan, in the metropolitan areas of Taipei City and New Taipei (previously known as Taipei County). The model first samples data on four market-sensitive developer criteria: housing attributes, housing prices, listing time, and the distance between projects. Next, we measure the degree of competition between development projects and give it a value between 0 and 1.
If the housing attributes and housing prices of two development projects are similar, their listing times fall in the same month, and they are located next to each other, the degree of competition is closer to 1 and the projects are more competitive. The findings show that distance is the most important of the four criteria and housing attributes the second. The degree of competition in Taipei City is greater than in New Taipei. The method can help a housing developer choose competing projects more objectively and thereby better understand the overall market environment.
Keywords: market analysis, similarity, degree of competition
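A weighted similarity score of the kind described above might be sketched as follows. The weights are hypothetical placeholders (the thesis derives its weights from an ANP expert questionnaire and does not list them here), as are the scaling limits, the field names, and the linear closeness function.

```python
from dataclasses import dataclass

@dataclass
class Project:
    attributes: set      # product features, e.g. {"3-bedroom", "parking"}
    total_price: float   # any consistent currency unit
    launch_month: int    # months counted from an arbitrary start
    x_km: float          # planar coordinates in kilometres
    y_km: float

# Hypothetical weights; they only respect the reported ordering
# (space first, product attributes second).
WEIGHTS = {"space": 0.4, "attributes": 0.3, "price": 0.2, "time": 0.1}

def closeness(gap, scale):
    """Map a non-negative gap to [0, 1]; 1 means no gap."""
    return max(0.0, 1.0 - gap / scale)

def competition(p, q, max_price_gap=1000.0, max_months=12, max_km=10.0):
    """Degree of competition between two projects, from 0 to 1."""
    union = p.attributes | q.attributes
    distance = ((p.x_km - q.x_km) ** 2 + (p.y_km - q.y_km) ** 2) ** 0.5
    sims = {
        "attributes": len(p.attributes & q.attributes) / len(union) if union else 1.0,
        "price": closeness(abs(p.total_price - q.total_price), max_price_gap),
        "time": closeness(abs(p.launch_month - q.launch_month), max_months),
        "space": closeness(distance, max_km),
    }
    return sum(WEIGHTS[name] * sims[name] for name in WEIGHTS)

a = Project({"3-bedroom", "parking"}, 2000.0, 3, 0.0, 0.0)
b = Project({"3-bedroom", "parking"}, 2000.0, 3, 0.0, 0.0)
far = Project({"studio"}, 2900.0, 11, 8.0, 6.0)
print(competition(a, b), competition(a, far))
```

Identical projects score essentially 1, while a project that differs on all four dimensions scores close to 0.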
|
4 |
中文動詞自動分類研究 / Automatic Classification of Chinese Unknown Verbs, 曾慧馨 (Tseng, Hui-Hsin), Unknown Date
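This thesis proposes a rule-based method and a similarity-based method for automatically classifying unknown verbs into the verb categories of the Academia Sinica CKIP group (1993). The rules of the rule-based method are learned from a training corpus and supplemented with the reduplication patterns of unknown verbs. Coverage is about 25%, and accuracy is about 86.86% to 91.32%. The rule-based method is highly accurate but handles too few unknown verbs. The similarity-based method guesses the likely category of an unknown verb from similar examples, computing similarity from word-internal information: the part of speech and semantic class of the word bases, and the word structure. It can handle unknown verbs comprehensively, but it is easily misled by mislabeled examples in the training corpus and is affected by the size of that corpus. Combining the two methods, we predict unknown verb categories with an accuracy of 72%. / We present two methods to classify Chinese unknown verbs. First, we summarize linguistic rules and morphological patterns from a corpus. The accuracy of the rule-based method is 86.86% to 91.32%. Second, we use instance-based categorization to classify Chinese unknown words. The accuracy of the instance-based method is 67.86% to 70.92%, and the accuracy of the integrated classifier is about 72%.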
|
5 |
智慧型手機的使用者行為模式分析 / Behavior Analysis Based on Smart-phone User Logs, 許志毓 (Hsu, Chih Yu), Unknown Date
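The evolution of communication technology and the spread of smartphones have changed how people communicate and the contexts in which phones are used. In this fast-changing era of mobile computing, studying users' behavior patterns requires an experimental platform comprising hardware, software, and a user community, so that quantitative data can reinforce qualitative observation. Accordingly, this thesis builds on an existing platform, strengthening its functionality and usability so that other researchers can get an overview of the data and extract data that meet given conditions. In addition, we adopt 3-gram application sequences as the feature that defines a behavior pattern and, in the similarity comparison, apply weights that reflect how frequently different applications are used. According to the experimental results, users can be given a rough preliminary classification, and the measure can also be used to explore the degree of difference among users who have already been classified. / The rapid evolution of information technology and the prevalence of smart-phones have changed the way people communicate. To effectively observe and investigate user behavior in this new era of mobile computing, an experimental platform that consists of hardware devices, software applications, and user groups is essential. In this thesis, we enhance and extend the functions of a user log collection and analysis system to give a quick overview of the recorded data and to allow flexible query and extraction of desired data segments for further processing. In addition, we employ 3-gram app log sequences as the main feature to characterize user behavior. A similarity measure that takes into account the relative app usage frequency is defined to compare and classify users and their usage patterns. Experimental results indicate that this measure can effectively distinguish users of different traits given a sufficiently long observation period.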
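A frequency-weighted comparison of 3-gram app sequences could be sketched as below. The abstract does not specify the weighting scheme, so weighting each 3-gram by the average rarity of its apps and comparing users by cosine similarity are assumptions, and the app names are hypothetical.

```python
from collections import Counter
from math import sqrt

def trigrams(apps):
    """Count consecutive triples of apps in a usage log."""
    return Counter(zip(apps, apps[1:], apps[2:]))

def weighted_cosine(log_a, log_b):
    """Cosine similarity of 3-gram counts, with each 3-gram weighted
    by the average rarity of its apps (assumed weighting scheme)."""
    usage = Counter(log_a) + Counter(log_b)
    total = sum(usage.values())

    def weight(gram):
        return sum(1 - usage[app] / total for app in gram) / 3

    va = {g: c * weight(g) for g, c in trigrams(log_a).items()}
    vb = {g: c * weight(g) for g, c in trigrams(log_b).items()}
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    na = sqrt(sum(v * v for v in va.values()))
    nb = sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

user1 = ["mail", "browser", "chat", "mail", "browser", "chat"]
user2 = ["mail", "browser", "chat", "mail", "browser", "chat"]
user3 = ["game", "music", "camera", "game", "music"]
print(weighted_cosine(user1, user2), weighted_cosine(user1, user3))
```

Users with identical sequences score essentially 1, and users with no 3-gram in common score 0.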
|
6 |
以機器學習改善實證相似度技術指標交易策略之研究 / Adapting machine learning to similarity-based technical trading strategies, 陳致鈞, Unknown Date
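Technical analysis uses past market data, including stock prices and trading volume, to predict future market movements. It transforms prices and volume mathematically into technical indicators that are easy to understand and can be charted, helping technical investors predict future prices. The decision process in this thesis differs from traditional technical analysis: it uses a similarity model to approximate the decision process of real technical investors. The strategy takes multiple technical indicators as the basis of a similarity-based technical trading strategy to capture market dynamics and predict future stock returns. Even when different indicators give different buy and sell signals, the investor can still make investment decisions through the strategy. The future return predicted by the strategy is the weighted average of the returns that followed similar situations in past price patterns. The strategy buys when the predicted return is positive and sells when it is negative. This thesis uses S&P 500 index futures to test the profitability of the strategy and finds that, under different technical indicators, its returns are significantly different from zero and higher than the buy-and-hold (B/H) return on S&P 500 index futures over the sample period. To make the strategy simulate real investors more closely, machine learning is introduced to improve it: a greedy algorithm and simulated annealing are each used to model how real investors change their decision process according to how well trading strategies perform. The resulting returns are also significantly different from zero and higher than the B/H return on S&P 500 index futures over the sample period. The study finds that investors consult different mixed-indicator strategies and screen reference strategies by their past performance before deciding on an investment strategy. This echoes the finding that similarity-based strategies using mixed technical indicators predict better than those using a single indicator. Using mixed-indicator similarity-based strategies as the candidates screened by machine learning can therefore effectively improve the original similarity-based technical trading strategy.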
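The prediction rule described above, a similarity-weighted average of past returns followed by a buy or sell decision on its sign, can be sketched as follows. The Gaussian similarity kernel, the `bandwidth` parameter, the "hold" outcome at exactly zero, and the sample data are assumptions not taken from the thesis.

```python
from math import exp

def predict_return(history, current, bandwidth=1.0):
    """history: list of (indicator_vector, return_that_followed).
    Returns the similarity-weighted average of past returns, where
    similarity decays with squared distance (assumed Gaussian kernel)."""
    num = den = 0.0
    for indicators, ret in history:
        dist2 = sum((a - b) ** 2 for a, b in zip(indicators, current))
        w = exp(-dist2 / (2 * bandwidth ** 2))
        num += w * ret
        den += w
    return num / den if den else 0.0

def signal(predicted):
    """Buy on a positive prediction, sell on a negative one.
    'hold' at exactly zero is an added assumption."""
    if predicted > 0:
        return "buy"
    if predicted < 0:
        return "sell"
    return "hold"

# Hypothetical history: two indicator readings and the returns that followed.
history = [((0.0, 0.0), 0.02), ((5.0, 5.0), -0.03)]
print(signal(predict_return(history, (0.1, 0.0))))
print(signal(predict_return(history, (5.0, 4.9))))
```

A current reading close to the first past situation inherits its positive return and yields a buy signal, and one close to the second yields a sell signal.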
|
7 |
英漢專利文書文句對列與應用 / English and Chinese Sentence Alignment for Statements in Patent Documents and its Applications, 田侃文, Unknown Date
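Given today's trend of globalization, countries around the world carry out cross-language translation of patent documents. For patent translation and cross-language retrieval, collecting a large and accurate parallel corpus of patent documents supports related research. Aligning the sentences of a parallel corpus by hand is very time-consuming. This study therefore uses preprocessing techniques such as sentence segmentation, word segmentation, and English stemming, together with a Chinese-English table of technical terms. It adjusts the weights of corresponding term pairs using term frequency statistics and uses the cosine similarity between sentences as a supplement to compute the similarity between Chinese and English sentences. Finally, a dynamic programming algorithm selects the best alignment, yielding a Chinese-English sentence alignment system. Alignment quality is evaluated by precision and recall. The aligned sentence pairs are also used as training data for word reordering in a computer-assisted machine translation system, with the test items of the 2003 Trends in International Mathematics and Science Study (TIMSS) as the translation target, evaluated with BLEU and NIST. Experimental results show that the system achieves a precision of 0.995 in the 1:1 alignment mode and that the large number of Chinese-English sentence pairs selected by a threshold does improve the translation quality of the computer-assisted machine translation system. / The importance of cross-language translation of patent documents has grown substantially as a result of globalization. Accurately aligned parallel corpora help researchers whose projects depend on bilingual data to develop techniques such as computer-aided translation and cross-language information retrieval. Collecting parallel data manually takes time; therefore, an English-Chinese sentence alignment system was built to complete this process automatically.
A variety of natural language preprocessing techniques, such as stemming of English words, were used to build this system. Two kinds of scores were considered when aligning sentences. The first considered the number and weight of aligned word pairs in the Chinese and English sentences. The second came from a special way of computing the cosine value of Chinese and English sentence pairs. Precision and recall were used to evaluate the quality of the aligned results, and the 1:1 alignment achieved a precision of 0.995. In addition, the aligned sentences were used as training data for machine translation of the TIMSS test items. Experimental results show that the aligned sentences are helpful for the translation system.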
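Combining a sentence similarity score with dynamic programming to pick the best alignment can be sketched as follows. This is a simplified illustration, not the thesis's system: it handles only 1:1 pairs and skips, uses plain bag-of-words cosine similarity, and assumes the Chinese sentences have already been mapped to English terms through a glossary. The `skip_penalty` value and the sample sentences are hypothetical.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Bag-of-words cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def align(src, tgt, skip_penalty=0.1):
    """Monotone alignment maximizing total similarity; returns (i, j) pairs."""
    n, m = len(src), len(tgt)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i and j:  # pair sentence i-1 with sentence j-1
                options.append((score[i - 1][j - 1] + cosine(src[i - 1], tgt[j - 1]), "pair"))
            if i:        # leave a source sentence unaligned
                options.append((score[i - 1][j] - skip_penalty, "skip_src"))
            if j:        # leave a target sentence unaligned
                options.append((score[i][j - 1] - skip_penalty, "skip_tgt"))
            score[i][j], back[i][j] = max(options)
    pairs, i, j = [], n, m
    while i or j:  # trace the best path back to the origin
        move = back[i][j]
        if move == "pair":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_src":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

english = [["claim", "device", "sensor"], ["method", "heating"], ["sensor", "signal"]]
chinese_as_terms = [["device", "sensor"], ["extra"], ["heating", "method"], ["signal", "sensor"]]
print(align(english, chinese_as_terms))
```

In this example the second target sentence has no counterpart and is skipped. A similarity threshold could then be applied to the returned pairs to keep only confident alignments.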
|
8 |
建構人脈社會網絡人才推薦系統之研究-以某國立大學EMBA人才庫為例 / Social network-based specialist recommendation system: a case of a national university EMBA database, 呂春美 (Lu, Chun-Mei), Unknown Date
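According to a 2010 job bank survey, 54% of talent searches are based mainly on "job analysis", which shows that experience remains the main factor in selecting talent. In recent years, social networks and recommendation systems have been widely applied to finding talent.
This study uses actual data on EMBA students at a national university to build a talent recommendation system on a social network of personal connections formed by classmate and colleague relationships. Given the search criteria entered by the user, the system computes distance-based similarity to find the closest matching candidates and ranks them by that similarity. The system also presents each candidate's professional profile, drawn from the member's education, the industries of their work experience, and the functions they have held in organizations, as a basis for decision-makers selecting talent. It further provides all relationship paths, so that users can ask members along a path for their assessment of a recommended candidate.
The study covers 2,121 EMBA students of the university and applies cluster analysis from data mining to build the recommendation system. Unlike ordinary keyword-matching search, it can find people whose profiles are highly similar to the user's requirements, and through social network paths it helps users evaluate the recommendations via their own connections. Finally, the study presents conclusions, recommendations, and directions for future research. / Abstract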
According to a 2010 job bank survey, about 54% of recruiters searching for specialists rely mainly on job analysis. This research uses social networks and recommendation systems to build a specialist recommendation system on a social network of personal contacts formed by classmate and colleague relationships. First, the system computes the dissimilarity between the conditions the user enters and people's backgrounds, and finds the closest matches by sorting on similarity. Second, professional profiles are built from educational background and work experience (including industry and position type) to serve as a basis for decision-makers selecting specialists. In addition, users can ask people along the social network path for further appraisals of a candidate.
The research covers 2,121 EMBA students and applies cluster analysis from data mining to build the recommendation system, as opposed to searching for people by keyword matching. The system can thus find the people who best match the input requirements. The associated social network paths help users draw on their own network to assess the recommended candidates.
Finally, this study presents conclusions, recommendations, and future research directions.
Keywords: Social Network, Similarity, Professional Profile, Specialist Recommendation System, Social Network Path
|
9 |
以認知與學習學理為基之漢字遊戲與其輔助設計系統 / A cognition-based interactive game platform for learning chinese characters and a computer-assisted listing chinese characters system, 張裕淇 (Chang, Yu Chi), Unknown Date
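We built two pieces of software: a game system and a computer-assisted character listing tool. The game system is designed for learners of Chinese. Its aim is to help beginners learn to distinguish phono-semantic (形聲) characters and become familiar with some Chinese words. To make the system more engaging, we also added auxiliary features expected of a game so that learners become more immersed in it. The computer-assisted character listing system manages the game system's database. To let the items in the game's question bank vary flexibly and to make the bank easier to manage, it provides a management interface and also lists possible candidate Chinese characters, so that users need not spend much time looking for characters. / We designed two systems. One is a game system, and the other is an interface for managing one of the game's databases. The game system is used for learning Chinese characters and helps children learn to tell similar characters apart. It also provides some Chinese words for learning. The other system is used for managing one of the game's databases. Users can change, add, or delete questions in the database themselves. The system also suggests Chinese characters to users so that they do not need to spend time searching for characters before adding them to the game's database.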
|
10 |
電腦輔助語言學習之研究-以我國學生學習日語為例 / A Study of Computer Aided Language Learning-Taiwan Students Learning Japanese as an Example, 王珮姍 (Wang, Pei Shan), Unknown Date
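This study is a preliminary exploration of developing a similarity index for Taiwanese students learning Japanese pronunciation. Its contribution is a similarity index with which a learner's Japanese pronunciation can be compared with and analyzed against a teacher's speech, along with a model for analyzing the similarity of Japanese pronunciation.
The study approaches the problem through audio digitization, unlike earlier studies that used speech recognition. Once digitized, sound is represented as numerical values, so an index is used to compute the degree of similarity. The study proposes a corresponding sound similarity index so that computer analysis can support learners' pronunciation practice.
Building the index proceeds from audio sampling, normalization, and endpoint detection to the actual computation, and the collected audio data are used to test the index's stability and validity. The results show that index values among native Japanese speakers are very close, that different Japanese accents show a certain difference in index values, and that values for subjects at a given level of Japanese proficiency fall close together. However, the index values computed from the audio data collected in this study are too concentrated. More diverse audio data should yield a better-spread distribution. / This research develops a similarity index for evaluating the Japanese pronunciation of Taiwanese students. Its contribution is a similarity index that compares a learner's Japanese pronunciation with the teacher's, together with a model for analyzing the similarity of Japanese pronunciation.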
This research starts from digital audio processing, unlike other research that uses speech recognition to evaluate pronunciation. Audio becomes numerical data once digitized, so this research uses an index to calculate similarity. With this similarity index, the computer can act as an assistant that helps analyze pronunciation while a student learns Japanese.
Development of the index proceeds from audio sampling, audio normalization, and end-point detection to the calculation of the similarity index. This research collects audio data to test the stability and validity of the index. The results indicate that the similarity index values of native Japanese speakers are very close, and that the index differs to a certain degree between accents. For Taiwanese students with a given level of Japanese proficiency, the index values are close. Nevertheless, the index values are too concentrated, and it would be better to have more audio data to test the similarity index.
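The preprocessing steps named above (normalization and end-point detection) can be sketched as follows. The abstract does not describe the actual procedures or the index itself, so peak normalization, the frame-energy threshold, and the parameter values are assumptions, and the similarity index is deliberately left out.

```python
def normalize(samples):
    """Scale samples so the largest absolute value is 1 (peak normalization)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak else list(samples)

def detect_endpoints(samples, frame=4, threshold=0.1):
    """Return (start, end) sample indices spanning the first to the last
    frame whose mean energy reaches the threshold, or None if no frame does.
    Frame size and threshold are hypothetical values."""
    active = []
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        energy = sum(s * s for s in chunk) / len(chunk)
        if energy >= threshold:
            active.append((start, start + len(chunk)))
    return (active[0][0], active[-1][1]) if active else None

# Hypothetical recording: silence, a short burst of sound, silence.
raw = [0.0] * 8 + [1.0, -2.0, 1.6, -1.2] * 2 + [0.0] * 4
clean = normalize(raw)
span = detect_endpoints(clean)
print(span)
```

Trimming both recordings to their detected spans before comparison keeps leading and trailing silence from affecting whatever similarity index is computed afterwards.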
|