Global ETD Search

91	The research on chinese text multi-label classification / Avancée en classification multi-labels de textes en langue chinoise / 中文文本多标签分类研究 Wei, Zhihua 07 May 2010 (has links) Text classification (TC) is an important field of information technology with many valuable applications. As the volume of information resources grows, the objects of TC become more complex and diverse, and the search for effective, practical TC technology is challenging. A growing number of researchers regard multi-label TC as better suited to many applications. This thesis analyses the difficulties of multi-label TC and of Chinese text representation, drawing on a large body of single-label and multi-label TC algorithms. To address the high dimensionality of the feature space, the sparsity of text representations and the poor performance of multi-label classifiers, it proposes corresponding algorithms from several angles. To address the "curse of dimensionality" that arises when Chinese texts are represented with n-grams, a two-step feature selection algorithm is constructed. The method combines filtering of rare features within each class with selection of discriminative features across classes. In addition, the appropriate value of n, the feature weighting strategy and the correlation among features are examined in a variety of experiments, yielding some useful conclusions for research on n-gram representation of Chinese texts. To address a weakness of the Latent Dirichlet Allocation (LDA) model, namely the arbitrary adjustment of values in the smoothing process, a new smoothing strategy based on Tolerance Rough Sets (TRS) is put forward. 
It first constructs tolerance classes over the global vocabulary and then assigns a value to each out-of-vocabulary (OOV) word in each class according to its tolerance class. To improve the performance of multi-label classifiers and reduce computational complexity, a new TC method based on the LDA model is applied to Chinese text representation. It extracts topics statistically from the texts, which are then represented by topic vectors. The method shows competitive performance on both English and Chinese corpora. To further enhance classifier performance in multi-label TC, a compound classification framework is proposed. It partitions the text space by computing upper and lower approximations. The algorithm decomposes a multi-label TC problem into several single-label TC problems and several multi-label TC problems with fewer labels than the original. That is, an unknown text is classified by a single-label classifier when it falls in the lower approximation of some class; otherwise, it is classified by the corresponding multi-label classifier. An application system, TJ-MLWC (Tongji Multi-label Web Classifier), was designed. It retrieves results directly from search engines and classifies them in real time using an improved Naïve Bayes classifier. This makes browsing more convenient: users can immediately locate the texts that interest them from the class information provided by TJ-MLWC. / The thesis centres on text classification, a rapidly expanding field with many current and potential applications. Its main contributions concern two points. The first is the specifics of encoding and automatically processing the Chinese language: words may consist of one, two or three characters; there is no typographic separation between words; and the words of a sentence can appear in a large number of possible orders; all of which leads to difficult ambiguity problems. 
Encoding with n-grams (sequences of n = 1, 2 or 3 characters) is particularly well suited to Chinese, because it is fast and requires neither a prior dictionary-based word recognition step nor word segmentation. The second is multi-label classification, in which each individual can be assigned to one or more classes. For texts, the classes sought correspond to topics, and a single text may be attached to one or more topics. The multi-label approach is more general: one patient may suffer from several pathologies, and one company may be active in several industrial or service sectors. The thesis analyses these problems and attempts to provide solutions, first for single-label classifiers and then for multi-label ones. The difficulties include defining the variables that characterise texts, their large number, the handling of sparse tables (many zeros in the text-by-descriptor matrix), and the relatively poor performance of the usual multi-class classifiers. 
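/ Text classification is an important research field in information science with substantial practical value. As the content handled by text classification grows more complex and diverse, classification goals are also diversifying; developing effective text classification techniques that meet practical needs has become a challenging task, and research on multi-label classification has emerged in response. Building on an analysis of a large number of single-label and multi-label text classification algorithms, this thesis attempts to apply rough set theory from several angles to the problems of high feature dimensionality and data sparsity in text representation and of high complexity but low accuracy in multi-label classification, and proposes corresponding algorithms. The main contributions are as follows. To address the curse of dimensionality caused by using n-grams as Chinese text features, a two-step feature selection method is proposed that combines removal of rare features within a class with feature selection across classes; extensive experiments on large-scale Chinese corpora on the choice of n, the choice of feature weights and feature correlation yield some useful conclusions. To address the low efficiency and high cost of classifying texts represented with high-dimensional features, a multi-label text classification algorithm based on the LDA model is proposed, which uses the topics extracted by LDA as text features to build an efficient classifier. Under the PT3 multi-label transformation method, the algorithm performs well on both Chinese and English data sets, on a par with the multi-label classification methods currently regarded as the best. To address the arbitrariness of existing smoothing strategies for the LDA model, a smoothing strategy for the LDA language model based on tolerance rough sets is proposed. It first constructs tolerance classes of words over the global vocabulary, and then assigns smoothing values to the out-of-vocabulary words of each document class according to the frequencies of the words in the tolerance class. Extensive experiments on Chinese and English, balanced and imbalanced corpora show that this smoothing method significantly improves the classification performance of the LDA model, especially on imbalanced corpora. To address the high complexity and low accuracy of multi-label classification, a compound multi-label text classification framework based on variable precision rough sets is proposed. The framework partitions the text feature space using the variable precision rough set method, and thereby decomposes the multi-label classification problem into several two-class single-label problems and several multi-label problems with fewer labels. That is, when an unknown text falls in the lower approximation region of some class, its class can be determined directly by a simple single-label classifier; when it falls in the boundary region, the multi-label classifier for that region is used. Experiments show that both accuracy and efficiency improve considerably under this framework. The thesis also designs and implements MLWC, a system for visualising web search results based on multi-label classification; it directly retrieves the results returned by a search engine and classifies them in real time using an improved Naïve Bayes multi-label classification algorithm, so that users can quickly locate the texts of interest among the search results. La Classification de texte N-grams Codage de la texte chiniose La classification multi-labels Latent Dirichlet Model L’ensembles approximatifs Assouplissement Le corpus de textes chinois multi-labels Chinese text classification Text representation Multi-label classification Rough Set Latent Dirichlet Allocation (LDA) Classification method Smoothing model Chinese text multi-label corpus 中文文本分类文本表示多标签分类 N-gram 粗糙集隐含狄利克雷分配分类器设计同济多标签网页分类系统中文文本多标签语料库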
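Illustration for entry 91: the abstract describes a two-step feature selection over character n-grams (drop features that are rare within a class, then keep features that discriminate across classes). The following is only a minimal sketch of that idea, not the thesis's algorithm. The function names, the document-frequency threshold `min_df_in_class`, the cutoff `top_k`, and the scoring rule (in-class document rate minus out-of-class document rate) are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n_values=(1, 2)):
    """Return the set of character n-grams of text for each n in n_values."""
    grams = set()
    for n in n_values:
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def two_step_select(docs, labels, min_df_in_class=2, top_k=5):
    """Select n-gram features in two steps (illustrative scoring only).

    docs   : list of strings
    labels : list of sets of class names (multi-label), aligned with docs
    """
    df_class = defaultdict(Counter)  # class -> n-gram -> docs in class containing it
    class_size = Counter()           # class -> number of docs carrying that label
    df_all = Counter()               # n-gram -> docs containing it, over all docs

    for text, labs in zip(docs, labels):
        grams = char_ngrams(text)
        df_all.update(grams)
        for c in labs:
            class_size[c] += 1
            df_class[c].update(grams)

    n_docs = len(docs)
    selected = set()
    for c, df in df_class.items():
        # Step 1: filter out n-grams that are rare within this class.
        kept = {g: k for g, k in df.items() if k >= min_df_in_class}

        # Step 2: rank by an assumed discriminative score:
        # share of in-class docs containing g minus share of other docs containing g.
        outside_docs = n_docs - class_size[c]

        def score(g):
            inside = kept[g] / class_size[c]
            outside = (df_all[g] - kept[g]) / outside_docs if outside_docs else 0.0
            return inside - outside

        ranked = sorted(kept, key=lambda g: (-score(g), g))
        selected.update(ranked[:top_k])
    return selected


if __name__ == "__main__":
    docs = ["股票市场上涨", "股票基金下跌", "足球比赛进球", "足球联赛开赛"]
    labels = [{"finance"}, {"finance"}, {"sports"}, {"sports"}]
    print(sorted(two_step_select(docs, labels)))
```

Working on raw characters means no dictionary or word segmentation step is needed, which is the advantage of n-gram encoding noted in the abstract. A real system would likely replace the simple rate difference with an established criterion such as chi-square or information gain.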
92	農村土地承包經營權流轉的法律問題研究 =Research on the circulation legal issues of the contracted management right of rural land / Research on the circulation legal issues of the contracted management right of rural land 胡守鑫 January 2016 (has links) University of Macau / Faculty of Law 土地佔有權 -- 法規 -- 中國 Farm tenancy -- China 中文法學 -- 法學院
93	P2P 網絡借貸的法律風險與規制 =Legal risk and regulation of P2P lending / Legal risk and regulation of P2P lending 蔣東霖 January 2016 (has links) University of Macau / Faculty of Law Microfinance -- China 微型金融小額信貸 -- 中國中文法學 -- 法學院
94	中文對話中的異議現象 / Disagreement in Mandarin Chinese Conversation 林智怡, Lin, Zhi-Yi Unknown Date (has links) 國立政治大學研究所碩士論文提要研究所別: 語言學研究所論文名稱: 中文對話中的異議現象指導教授: 李櫻博士研究生: 林智怡論文提要內容: 共壹冊，分伍章在日常對話中，我們觀察到衝突對話是避免出現的，而大致上人們較傾向給予同意，而不傾向行使異議對話。既然在中國社會裡，面子和禮貌是相當受重視的，因此在中文對話中避免行使異議對話尤其重要。雖然在很多情境下，禮貌意味著盡量給予同意而避免異議，但顯然人們並不總是在同意他人。如果一個人不同意他的談話對象所表達的意見，他可以用一些禮貌的方式來避免威脅到對方的顏面。本篇研究的重點就是在調查當說話者要行使異議時，他們對不同的語用策略(pragmatic strategies)及語言特徵(linguistic features)的運用，並將探討說話者對談話原則中的合作原則(CP)及禮貌原則(PP)的遵守或違反。在調查的過程中，性別差異也將列入考慮。在這篇研究當中，我們收集了九筆日常面對面對話互動的語料：三筆男對男的對話，三筆女對女的對話，三筆男與女的對話。每筆語料都是錄音自兩位熟朋友間大約四十五分鐘的對話。在語料分析的過程中，我們同時採用質性與量化的探討來調查並解釋中文對話中的異議現象。研究結果顯示在異議時所呈現的語用策略及語言特徵似乎互相矛盾。人們傾向使用較具侵略性及競爭性的糾正策略(correction)來行使他們的異議，反而傾向用較宛轉的語言特徵來表示異議。然而，這並不是一個真正的矛盾，相反的，它可能顯示出年輕的族群不只在乎禮貌的和諧關係，同時也用較直接的語用策略來表示他們之間的一致性(solidarity)。除了用較宛轉的語言特徵來緩和行使異議時所用的較直接的語用策略外，人們在行使異議時也違反一些合作原則來實行禮貌原則或其他對話原則。不論人們如何行使異議，表面上禮貌或不禮貌，違反合作原則來遵守其他的談話原則的最終目的都是得體合宜及禮貌。至於異議表現的性別差異，雖然一般的印象中是男性的互動比女性較具競爭性、侵略性及好辯，然而我們這裡的研究並不完全和這樣的模式吻合。女性現在較主動表達她們的意見，然而她們自由表達她們的想法只侷限在同樣是女性面前。換句話說，雖然現在的女性比傳統女性受較好的教育，男性也已意識到表現騎士風範及尊重女性的重要性，然而男女之間的權力(power)差異，似乎仍在現代社會中有著影響力。 / Abstract In daily conversation, it can be observed that conflict talk is avoided and agreement is generally preferred over disagreement. Avoidance of disagreement plays an especially important role in Mandarin Chinese conversation, since face and politeness are valued high in Chinese society. Although in many contexts being polite means maximizing areas of agreement and minimizing disagreement, clearly people do not always agree; and if one does not agree with the views expressed by a conversational partner, there are polite options available for him to avoid any possible threat on the interlocutor’s face. The focus of this study is to investigate the speakers’ use of different pragmatic strategies and linguistic features when disagreement arises, and will also discuss the speakers’ observation or violation of the conversational principles of CP and PP. 
In the process of our investigation, gender differences are also taken into consideration. In this study, we collected nine dyadic face-to-face daily conversations: three male-to-male conversations, three female-to-female conversations, and three mixed-gender conversations. Each conversation took place between two close friends and was tape-recorded for about forty-five minutes. In the data analysis, both qualitative and quantitative methods are adopted to investigate and explain the phenomenon of disagreement in Mandarin Chinese conversation. The results indicate that the pragmatic strategies and the linguistic features used in disagreement seem to contradict each other. People tend to choose the aggressive and competitive strategy of correction to perform their disagreement, yet they tend to choose mitigating linguistic features when doing so. However, this may not be a real contradiction. Instead, it may show that the young group is concerned not only with the harmonious relationships maintained through politeness, but also with the solidarity signaled by direct pragmatic strategies. Besides using mitigating linguistic features to tone down the direct pragmatic strategies, people also violate some CP maxims to uphold the PP and other conversational principles when performing their disagreement. However the disagreement is performed, superficially polite or impolite, the ultimate aim of violating the CP in order to observe other conversational principles is appropriateness and politeness. As for gender differences in the performance of disagreement, although the general impression is that male interaction is typically more competitive, aggressive and argumentative than female interaction, our study does not completely match this pattern. 
Females are now more active in expressing their opinions; however, they express their thoughts freely only in front of members of their own sex. In other words, the power difference between males and females still has an influence in modern society, even though women today are better educated than women traditionally were and men have become aware of the importance of showing chivalry and respect to women. 異議中文對話語用策略語言特徵合作原則禮貌原則性別衝突對話 disagreement Mandarin Chinese conversation pragmatic strategies linguistic features CP PP gender conflict talk
95	Brand name translation : How translation distorts Oriflame’s Chinese brand name communication Arcangeli, Fabio, Edlund, Anna January 2010 (has links) This pre-study explores how the process of translating from English to Chinese may distort intended brand name messages, using Oriflame as a case study. The findings show that the brand name tended to be perceived as phonetic rather than phonosemantic and that the character combination was perceived to make no clear sense. The study identified these as the two main reasons why Oriflame’s intended brand name messages did not get through. / Denna förstudie utforskar hur varumärkesnamnens avsedda budskap kan bli förvrängda genom översättningsprocessen från engelska till kinesiska genom att använda Oriflame som en fallstudie. Resultaten visar på tendenser för varumärkesnamnet att uppfattas som fonetiskt snarare än fonosemantiskt och att kombinationen av tecknen inte anses vara begriplig. Studien identifierade dessa två resultat som den främsta anledningen till att Oriflames avsedda budskap inte nådde fram. China chinese Oriflame brand brand name translation english international business international companies international company Kina kinesiska Oriflame varumärke varumärkesnamn översättning engelska internationella företag Business studies Företagsekonomi
96	粵語流行曲詞研究 = The study of lyric of Cantonese popular song / Study of lyric of Cantonese popular song 勞婉莎 January 2004 (has links) University of Macau / Faculty of Social Sciences and Humanities / Department of Chinese University of Macau -- Dissertations 澳門大學 -- 論文
97	"史記" "者" 、"所"指稱研究探新 = The new research method on the reference to Zhe & Suo in Shiji / New research method on the reference to Zhe & Suo in Shiji;"史記者所指稱研究探新" 董月凱 January 2004 (has links) University of Macau / Faculty of Social Sciences and Humanities / Department of Chinese University of Macau -- Dissertations 澳門大學 -- 論文 Chinese language -- To 600 -- Particles 中文 -- 至 600 年 -- 虛詞
98	創造思考策略融入中國語文科教學對學生創造力之影響 = The effects of inclusion of creative thinking strategy in the teaching of Chinese lessons on student creativity / Effects of inclusion of creative thinking strategy in the teaching of chinese lessons on student creativity 簡穗川 January 2006 (has links) University of Macau / Faculty of Education 澳門大學 -- 論文 University of Macau -- Dissertations
99	在澳門初中推行中文傳意寫作的探索性研究 / Exploratory study of Chinese communicative writing in junior secondary schools in Macao 蕭美歡 January 2004 (has links) University of Macau / Faculty of Education 澳門大學 -- 論文 University of Macau -- Dissertations 課程設計及管理 -- 教育學院
100	一位澳門小學普通話教師教學專業知識實踐之個案研究 / Case study of the pedagogical practice of a Putonghua teacher in Macau 汪以慧 January 2008 (has links) University of Macau / Faculty of Education University of Macau -- Dissertations 澳門大學 -- 論文初小教師 -- 澳門 -- 個案研究
