Global ETD Search

1	利用詞組檢索中文訴訟文書之研究 / An Exploration of Indexing Chinese Judicial Documents with Term Pairs 謝淳達, Hsieh, Chwen-Dar Unknown Date (has links)
I study information retrieval methods for finding similar judicial documents. Here, "similar cases" are cases that involve the same kind of criminal conduct. A precedent is a final judgment that a court has rendered in an earlier case. For judges and lawyers, prior cases that resemble a new case are sometimes worth consulting during a trial, because their judgments give a rough picture of how the new case might be decided. In this work I index Chinese judicial documents with term pairs (combinations of two terms taken from the fact description of a case) rather than with individual words, as common information retrieval methods do. A term pair carries more information than a single word, because each term in the pair makes the other less ambiguous than it would be alone. I expect that comparing sets of term pairs will find similar prior cases more precisely, and that the system could help people who are not legal experts to retrieve similar prior cases on their own. Existing electronic dictionaries do not contain every possible word, especially the domain-specific words that appear in judicial documents. Hence, I propose an algorithm, TermSpotter, that automatically extracts candidate Chinese words from the corpus, and I use the extracted words to analyze the fact descriptions and to construct term pairs. I also apply the similar-case retrieval system to build a case classifier that predicts the prosecution category of a new case. In the experiments, 56.3% of the words that TermSpotter extracted with a frequency of 30 or more were judged useful after manual filtering (number of useful words divided by number of words output). About one third of these useful words (953 words) are not included in the HowNet electronic dictionary, and some of them are specific to the legal domain. The case classifier assigns new cases to seven prosecution categories: larceny, robbery, robbery by threatening or disabling the victims, receiving stolen property, causing bodily harm, intimidation, and gambling. It reaches 89.3% accuracy. The classifier can also categorize cases by the criminal articles violated: in an experiment that classifies gambling cases into four combinations of three articles, it reaches 81.9% accuracy. In the experiment on retrieving prior cases similar to a new case, only 42% of the retrieved cases were judged worth consulting by a practicing judge, so considerable work remains to improve retrieval.
Legal informatics, Natural language processing, Machine Learning
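As a rough illustration of indexing with term pairs instead of single words, the sketch below builds the set of two-term combinations from an already segmented fact description and ranks prior cases by how much their pair sets overlap. The Jaccard overlap measure, the function names, and the toy data are assumptions made for illustration; the abstract does not say which similarity measure the thesis uses.

```python
from itertools import combinations


def term_pairs(terms):
    """Return the set of unordered two-term combinations in a segmented text."""
    unique = sorted(set(terms))
    return set(combinations(unique, 2))


def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (assumed measure, not from the thesis)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_prior_cases(query_terms, prior_cases):
    """Score every prior case against the query and sort from most to least similar."""
    query_pairs = term_pairs(query_terms)
    scored = [
        (jaccard(query_pairs, term_pairs(terms)), case_id)
        for case_id, terms in prior_cases.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical, already segmented fact descriptions.
prior_cases = {
    "case_a": ["defendant", "steal", "motorcycle"],
    "case_b": ["defendant", "gamble", "dice"],
}
ranking = rank_prior_cases(["steal", "motorcycle", "night"], prior_cases)
print(ranking)
```

With single-word indexing, "defendant" alone would make both prior cases look partly similar to any query that mentions a defendant; with pairs, only co-occurring terms such as ("motorcycle", "steal") contribute to the score.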
2	中文訴訟文書檢索系統雛形實作 / A Prototype of Information Services for Chinese Judicial Documents 藍家樑, Lan, Chia Liang Unknown Date (has links)
Because the number of judgments keeps growing, reading every judicial document is clearly impractical, and users need a more capable retrieval system to help them. Integrating results from earlier related research, we implement a prototype of a clustering-based retrieval system for Chinese judicial documents. The system retrieves the cases that match the search criteria and presents the results in clusters, so that users can focus on the information they need and pass over the rest. This reduces the reading burden while still giving a more complete picture. While reading a judicial document, users can mark passages or attach comments that interest them, building a personalized database that makes later searches easier. When the query is a keyword, the system applies hierarchical clustering to the search results and, from an index built on co-occurring words, lists words that may be related to the keyword for further queries. The query can also be a description of a crime: the system then uses the k-nearest neighbor method to classify the crime under a prosecution reason and to return similar cases, grouped by prosecution reason. In addition, users can view the distribution of prison sentence lengths and retrieve the documents that fall in a specific interval. A formal evaluation of the system is difficult because it is interactive and there is no clear standard for judging whether it is helpful. We evaluated its efficiency through the operations of human subjects, and compiled statistics on similarity and on the distribution of prison sentence lengths in the clustering results. On that basis we discuss how the system helps users and where it needs to be improved.
Legal information system, Natural language processing, Hierarchical clustering, k-nearest neighbor
3	中文詞彙集的來源與權重對中文裁判書分類成效的影響 / Exploring the Influences of Lexical Sources and Term Weights on the Classification of Chinese Judgment Documents 鄭人豪, Cheng, Jen-Hao Unknown Date (has links)
Legal information systems for non-Chinese languages have been studied intensively for many years, with the aim of using technology to make judicial proceedings more efficient. Important topics include judgment assistance, legal document classification, and similar case search. This thesis studies the classification of Chinese judgment documents. I represent each document with ordered term pairs (phrases) and compare how different lexical sources for segmenting Chinese text affect classification. One lexical source is a general machine-readable dictionary, HowNet, which yields general term pairs; the other is a set of legal terms extracted algorithmically from legal documents, which yields domain-specific term pairs. Based on the concept of tf-idf (term frequency-inverse document frequency), I design two kinds of term pair weights: tpf-idf (term pair frequency-inverse document frequency) and tpf-icf (term pair frequency-inverse category frequency). In the experiments, I use a similarity-based k-nearest neighbor method to classify Chinese judgment documents into seven categories according to their prosecution reasons: larceny (竊盜), robbery (搶奪), robbery by threatening or disabling the victims (強盜), receiving stolen property (贓物), causing bodily harm (傷害), intimidation (恐嚇), and gambling (賭博). To achieve high accuracy with a low rejection rate, I examine the distribution of similarity among the training documents and select appropriate parameters. In addition, I conduct an analogous set of experiments that classify gambling cases by the legal articles they cite. To improve classification, I apply the introspective learning technique to adjust the weights of term pairs and to identify the term pairs that influence classification. I use intra-cluster and inter-cluster similarity to evaluate the effect of weight adjustment in the experiments on classification by prosecution reason and by cited articles.
Legal information system, Natural language processing, k-nearest neighbor, Introspective learning
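The two weights named above can be sketched by analogy with tf-idf. The formulas below are assumptions, since the abstract gives only the names: tpf-idf is taken as the pair count in a document times log(N / df), where N is the number of documents and df the number of documents containing the pair, and tpf-icf as the pair count times log(C / cf), where C is the number of categories and cf the number of categories containing the pair. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def ordered_pairs(terms):
    """Count ordered pairs (a, b) where a occurs before b in the term sequence."""
    counts = Counter()
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            counts[(a, b)] += 1
    return counts


def pair_weights(docs, labels):
    """Return per-document tpf-idf and tpf-icf weights (assumed formulas)."""
    tpf = [ordered_pairs(doc) for doc in docs]
    n_docs = len(docs)
    n_cats = len(set(labels))
    doc_freq = Counter()   # number of documents containing each pair
    pair_cats = {}         # set of categories in which each pair occurs
    for counts, label in zip(tpf, labels):
        for pair in counts:
            doc_freq[pair] += 1
            pair_cats.setdefault(pair, set()).add(label)
    tpf_idf = [
        {p: c * math.log(n_docs / doc_freq[p]) for p, c in counts.items()}
        for counts in tpf
    ]
    tpf_icf = [
        {p: c * math.log(n_cats / len(pair_cats[p])) for p, c in counts.items()}
        for counts in tpf
    ]
    return tpf_idf, tpf_icf


# Hypothetical segmented documents with their prosecution categories.
docs = [["steal", "car"], ["steal", "wallet"], ["bet", "dice"]]
labels = ["larceny", "larceny", "gambling"]
tpf_idf, tpf_icf = pair_weights(docs, labels)
print(tpf_idf[0], tpf_icf[0])
```

Under these assumed formulas, a pair that occurs in every category gets a tpf-icf weight of zero, which is one way a category-level weight can suppress pairs that do not help separate prosecution reasons.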
