Global ETD Search

61	Pattern Recognition in the Usage Sequences of Medical Apps / Analyse des Séquences d'Usage d'Applications Médicales
Adam, Chloé, 01 April 2019
Radiologists use medical imaging solutions daily for diagnosis. Improving the user experience is a major part of the continuous effort to enhance the overall quality and usability of software products. Monitoring applications record how various software and system parameters evolve during use, and in particular the successive actions that users perform in the software interface. These interactions can be represented as sequences of actions. Based on this data, this work addresses two industrial topics: software crashes and software usability. Both topics require, on the one hand, understanding usage patterns and, on the other, developing prediction tools either to anticipate crashes or to adapt the software interface dynamically to users' needs. First, we aim to identify the root causes of crashes, which is essential for fixing the underlying defects. To this end, we propose using a binomial test to determine which type of pattern is most appropriate for representing crash signatures. Improving usability by customizing and adapting systems to each user's specific needs requires detailed knowledge of how users use the software. To highlight usage trends, we propose grouping similar sessions into clusters, and we compare three session representations as inputs to different clustering algorithms. The second contribution of the thesis concerns the dynamic monitoring of software use. We propose two methods, based on different representations of input actions, to address two distinct industrial issues: next-action prediction and software crash risk detection. Both methods take advantage of the recurrent structure of LSTM neural networks to capture dependencies in the sequential data, as well as their capacity to potentially handle different types of input representations for the same data.
Keywords: Frequent pattern mining; Representation learning; Action embeddings; Clustering; LSTM; Recurrent Neural Networks
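The binomial test mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the test is set up, so everything here is an assumption: a pattern is treated as a candidate crash signature when sessions containing it crash more often than the overall crash rate would explain. The function names and the numbers are hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), i.e. a one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def is_crash_signature(crashed_with_pattern, sessions_with_pattern,
                       base_crash_rate, alpha=0.05):
    """Assumed decision rule: flag the pattern when crashes among the
    sessions containing it are unlikely under the overall crash rate."""
    p_value = binomial_tail(crashed_with_pattern, sessions_with_pattern,
                            base_crash_rate)
    return p_value < alpha, p_value

# Hypothetical counts: a pattern seen in 10 sessions, 8 of which crashed,
# against an overall crash rate of 10 percent.
flagged, p_value = is_crash_signature(8, 10, 0.10)
print(flagged, p_value)
```

Comparing pattern types (for example sets versus ordered sequences of actions) would then amount to comparing how many patterns of each type pass such a test; that comparison is not shown here.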
62	Leveraging Sequential Nature of Conversations for Intent Classification
Gotteti, Shree, January 2021
No description available.
Keywords: Computer Science; Conversation Understanding; Multi-labeled Text Classification; Intent Classification; Similarity Measures; Sequential Pattern Mining; Hierarchical Goal/Intent Networks; Natural Language Understanding
63	A Data Driven Retrospective Study for Medication Strategy Analyses on Longitudinal Prescription Records / 長期処方記録上の薬物処方戦略分析のためのデータ駆動型後向き研究
Purnomo, Husnul Khotimah, 25 September 2018
Kyoto University / 0048 / Doctorate by coursework (new system) / Doctor of Informatics / Degree no. 甲第21397号 / 情博第683号 / 新制||情||118 (University Library) / Department of Social Informatics, Graduate School of Informatics, Kyoto University / Examiners: Professor 吉川正俊 (chief examiner), Professor 黒田知宏, Professor 守屋和幸 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
Keywords: medication transition events; medication strategy analyses; long-term prescription dataset; medication episode construction; stable period identification; adjacent pattern mining; directed-graph based visualization; 007
64	Wildlife-vehicle collisions : An evaluation of the mitigation effect by ecoducts and fauna bridges in Sweden
Rietz, Anna, January 2023
Wildlife-vehicle collisions (WVCs) are a growing problem in Sweden, with a calculated increase of 45 percent from 2015 to 2022. The highest number was recorded in 2021, with over 67,000 reported incidents; payments for searches for wounded animals alone amounted to approximately 60 million Swedish crowns. The Swedish transport agency works actively on the problem by constructing several types of wildlife passages, including ecoducts and fauna bridges. The aim of this study was to evaluate the mitigation effect of wildlife passages, in this case ecoducts and fauna bridges. The spatial extent of the mitigation effect and the relationship between the mitigation effect and annual daily traffic (ADT) were also evaluated. The evaluation used several geographical information system (GIS) tools in the software ArcGIS Pro. Seven passages were selected based on several requirements, and each passage was assigned a study area of 100 square kilometers. The mitigation effect was initially determined with an Emerging hot spot analysis, whose results were categorized as showing a decreasing trend or showing no decreasing trend. The spatial extent of the mitigation effect was evaluated from the Emerging hot spot results, while the relationship between ADT and WVCs was evaluated in an overlay analysis. Two passages were concluded to have a mitigating effect, three were concluded to show no mitigating effect, and two were excluded from further evaluation because of high uncertainty in the results. For the passages with a mitigating effect, the effect extended over the whole study area. The results showed no evident correlation between ADT and mitigating effects, which led to further reflection on how much influence ADT has on the occurrence of WVCs.
Keywords: Space time pattern mining; Space time cube; Emerging hot spot analysis; Overlay analysis; Bivariate choropleth map; ArcGIS Pro; Wildlife-vehicle collision (WVC); Annual daily traffic; Human Geography (Kulturgeografi)
65	高效率常見超集合探勘演算法之研究 / Efficient Algorithms for the Discovery of Frequent Superset
廖忠訓, Liao, Zhung-Xun, Unknown Date
Algorithms for discovering frequent itemsets have been widely investigated, but the frequent itemsets they find are subsets of the transactions in the database. In this thesis, we propose a novel mining task: mining frequent supersets from a database of itemsets, which is useful in bioinformatics, e-learning systems, job-shop scheduling, and similar applications. A superset is frequent if the number of transactions contained in it is not less than the minimum support threshold. Intuitively, following the Apriori algorithm, level-wise discovery would start from 1-itemsets, then 2-itemsets, and so forth. However, these steps cannot use the Apriori property to reduce the search space, because a superset of an infrequent itemset may still be frequent. To solve this problem, we approach it through set complements and propose three methods. The first, Apriori-C, is an Apriori-based breadth-first approach that scans the database to determine the support of candidate supersets of the same length. The second, Eclat-C, is an Eclat-based depth-first approach that computes the support of candidate supersets by intersection. The third, the data complement technique (DCT), reuses existing frequent itemset mining algorithms to discover frequent supersets, which saves the cost of developing a new system. In online learning systems in particular, frequent supersets can represent the learning behavior of a group of students and help predict their achievement, so that teachers can detect learning difficulties in time; they can also support personalized course recommendations. The experimental studies compare the performance of the three methods while varying the number of transactions, the average transaction length, the number of distinct items, and the minimum support. The analysis shows that the proposed algorithms are time efficient and scalable.
Keywords: Frequent Superset Mining; Frequent Pattern Mining; Association Rule; Data Mining
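The complement idea behind DCT can be sketched as follows. The sketch relies only on the set identity that a transaction T is contained in a candidate X exactly when the complement of X is contained in the complement of T; the thesis's actual implementation is not given in the abstract, and the brute-force miner below is a hypothetical stand-in for any existing frequent itemset algorithm such as Apriori or Eclat.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force frequent itemset miner (stand-in for Apriori or Eclat).
    Returns {itemset: support}, counting transactions that contain it."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result

def frequent_supersets(transactions, min_support):
    """Mine frequent supersets by complementing the data, mining ordinary
    frequent itemsets, and complementing the results back."""
    universe = frozenset().union(*transactions)
    complemented = [universe - t for t in transactions]
    mined = frequent_itemsets(complemented, min_support)
    return {universe - itemset: support for itemset, support in mined.items()}

# Hypothetical toy database of three transactions.
db = [frozenset("a"), frozenset("ab"), frozenset("c")]
supersets = frequent_supersets(db, min_support=2)
for itemset, support in supersets.items():
    print(sorted(itemset), support)
```

On this toy database, {a, b} contains the first two transactions and {a, c} contains the first and third, so both reach a support of 2, while the full item universe contains all three.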
66	由華語流行歌詞探勘歌詞的特徵樣式 / Mining Patterns from Lyrics of Chinese Popular Music
周晏如, Chou, Yen Ju, Unknown Date
The lyrics of Chinese popular music have long been a popular topic for researchers in language and literature, music, and cultural studies. Related studies cover lyricists, rhetorical methods, styles, rhyme, and language expression. However, these studies were performed by manual analysis, and it is difficult to analyze a large number of lyrics manually. With advances in computer technology, big data and data mining techniques have been widely applied to many kinds of data, yet to the best of our knowledge no work has been done on mining patterns from large collections of Chinese popular music lyrics. The objective of this thesis is therefore to discover patterns in a very large body of lyrics automatically, using data mining techniques. We use data downloaded from www.mojim.com (魔鏡歌詞網), http://www.hitoradio.com (Hit FM), and the Ministry of Education's Revised Mandarin Chinese Dictionary (http://dict.revised.moe.edu.tw/cbdic/). Analysis rules and methods such as Non-Trivial Repeating Pattern mining are employed to find patterns and features of lyrics, including frequent words, word adjacency, song titles, the languages used in lyrics, cover songs, automatic style classification, rhyme, and rhetorical patterns. The approaches developed in this thesis help reveal the distinguishing styles of lyrics and lyricists, which can in turn be applied to managing and querying lyrics data. In addition, the analysis results for more than 80,000 songs have been made available on a website for academic use.
Keywords: Big Data; Data Mining; Popular Music; Repeating Pattern Mining; Lyrics
67	Topological and domain Knowledge-based subgraph mining : application on protein 3D-structures / Fouille de sous-graphes basée sur la topologie et la connaissance du domaine : application sur les structures 3D de protéines
Dhifli, Wajdi, 11 December 2013
This thesis lies at the intersection of two rapidly growing research fields, data mining and bioinformatics. With the emergence of graph data in recent years, much effort has been devoted to mining frequent subgraphs from graph databases. Yet the number of discovered frequent subgraphs is usually exponential, mainly because of the combinatorial nature of graphs. Many frequent subgraphs are irrelevant because they are redundant or simply useless to the user. Moreover, their sheer number may hinder further exploration or even make it infeasible. Redundancy in frequent subgraphs is mainly caused by structural and/or semantic similarity, since most discovered subgraphs differ only slightly in structure and may convey similar or even identical meanings. In this thesis, we propose two approaches for selecting representative subgraphs among the frequent ones in order to remove redundancy. Each approach addresses a specific type of redundancy. The first focuses on semantic redundancy, where similarity between subgraphs is measured from the similarity between their node labels, using prior domain knowledge. The second focuses on structural redundancy, where subgraphs are represented by a set of user-defined topological descriptors and similarity between subgraphs is measured from the distance between their topological descriptions. The main application data of this thesis are protein 3D structures. This choice rests on biological and computational reasons. From a biological perspective, proteins play crucial roles in almost every biological process and are responsible for a variety of physiological functions. From a computational perspective, we are interested in mining complex data. Proteins are a perfect example of such data, as they are complex structures composed of interconnected amino acids, which are themselves composed of interconnected atoms. Large numbers of protein structures are currently available in online databases in computer-analyzable formats. Protein 3D structures can be transformed into graphs in which amino acids are the nodes and their connections are the edges, which makes it possible to study them with graph mining techniques. The biological importance of proteins, their complexity, and their availability in computer-analyzable formats make them well-suited application data for this thesis.
Keywords: Feature selection; Pattern mining; Frequent subgraph; Representative unsubstituted subgraph; Topological representative subgraph; Protein structure
68	Contribution de la découverte de motifs à l’analyse de collections de traces unitaires / Contribution to unitary traces analysis with pattern discovery
Cavadenti, Olivier, 27 September 2016
In a manufacturing context, products move through different sites before reaching the final customer. Each site has a different function, such as creation, storage, or retailing. Traceability data richly describes the events a product undergoes across the whole supply chain, from factory to consumer, by recording temporal and spatial information, the type of action, and other descriptive elements. Traceability is thus an important mechanism for discovering anomalies in a supply chain, such as the diversion of computer equipment or the counterfeiting of luxury items, and discovering the contexts in which these anomalies occur is a central objective for the industries concerned. In this thesis, we propose a methodological framework for mining unitary traces using knowledge discovery methods. We show how data mining, applied with the help of expert knowledge to unitary traces encoded in suitable data structures, extracts interesting patterns that characterize frequent behaviors. We demonstrate that domain knowledge, namely the product flows expected by experts and compiled into an industry model, is useful and efficient for classifying unitary traces as deviant or not. Moreover, we show how data mining techniques can characterize abnormal behaviors by extracting the contexts in which they occur (time window, product type, suspect sites). We also propose an original method, based on behavioral data, for detecting identity usurpation in the supply chain, for example distributors using fake identities or concealing them. The method uses the confusion matrix from the trace classification step to analyze the classifier's errors; formal concept analysis (FCA) then determines whether sets of traces actually belong to the same actor. Finally, we detail the results achieved in this thesis through the development of a prototype trace-analysis platform.
Keywords: Information Technology; Data mining; Pattern mining; Expert model; Knowledge discovery; Unitary trace; Manufacturing product; 006.330 72
69	Inferring API Usage Patterns and Constraints : a Holistic Approach
Saied, Mohamed Aymen, 08 1900
Software systems increasingly depend on external libraries and frameworks. Developers reuse the functionality these libraries provide through their application programming interfaces (APIs). They therefore have to cope with the complexity of the APIs needed to accomplish their work and overcome the lack of usage directives in the API documentation. In this thesis, we propose a holistic approach that addresses the library usability problem at three levels of granularity. First, we focus on the method level. We propose to identify usage constraints related to method parameters by analyzing only the library source code, and we apply program analysis strategies to detect four critical types of usage constraint. Second, we change scale and focus on mining API usage patterns, which help developers learn common ways of using complementary API methods. We first propose a client-based technique for mining multilevel API usage patterns, which exhibit the co-usage relationships between API methods across interfering usage scenarios. We then propose a library-based technique that removes the strong requirement that client programs be available; it infers API usage patterns by analyzing structural and semantic relationships between API methods. Finally, we propose a cooperative technique that combines client-based and library-based heuristics, gaining both the precision of the client-based technique and the generalizability of the library-based technique. As the last contribution of this thesis, we target a higher level of library reuse. We present a novel approach for automatically identifying usage patterns of third-party libraries that are commonly used together and are generally developed by different parties. These patterns help developers discover reuse opportunities and pick complementary libraries that may be relevant for their projects.
Keywords: Program comprehension; API usability; Usage pattern mining; Usage constraint inference; API documentation
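The notion of co-usage between API methods in client programs can be illustrated with a small sketch. This is not the thesis's multilevel mining technique, whose details the abstract does not give; it only counts how often two API methods are called from the same client method, which is one simple way to surface candidate co-usage relationships. The client names, API method names, and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_usage_counts(client_methods):
    """Count, for each pair of API methods, how many client methods call both."""
    counts = Counter()
    for api_calls in client_methods.values():
        for pair in combinations(sorted(set(api_calls)), 2):
            counts[pair] += 1
    return counts

def frequent_pairs(client_methods, min_count):
    """Keep only the pairs co-used by at least min_count client methods."""
    return {pair: n for pair, n in co_usage_counts(client_methods).items()
            if n >= min_count}

# Hypothetical client code: client method name -> API methods it calls.
clients = {
    "loadConfig": ["open", "read", "close"],
    "touchFile": ["open", "close"],
    "peekHeader": ["open", "read"],
}
pairs = frequent_pairs(clients, min_count=2)
print(pairs)
```

A library-based variant would replace the client dictionary with relationships derived from the library's own code, which is what makes it usable when no client programs are available.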
70	華語流行音樂之詞式分析與詞曲結構搭配之排比與同步 / Lyrics Form Analysis for Chinese Pop Music with Application to Structure Alignment between Lyrics and Melody
范斯越, Fan, Sz Yue, Unknown Date
Most listeners today understand the content of a piece of music through the combination of lyrics and melody, so lyric writing is an important part of the music industry. In general, a Chinese pop song is produced cooperatively by a composer and a lyricist. Another approach is to take existing poetry as lyrics and compose a new melody for it, which gives older lyrics or melodies new life. We therefore want to recommend existing lyrics that suit a given melody, as a value-added application for digital music. This thesis covers two subjects. The first is lyrics form analysis: automatically finding the blocks of a lyric (verse, chorus, and so on), with the position and label of each block. The second is structure alignment between lyrics and melody: using the result of lyrics form analysis, a two-tier alignment finds lyrics whose structure matches a melody and synchronizes each Chinese character with a note, so that lyrics suitable for singing can be recommended. Lyrics form analysis proceeds in steps. First, we extract four types of features from the lyrics: (1) word count structure, (2) pinyin structure, (3) part-of-speech structure, and (4) word tone pitch. Second, we build a self-similarity matrix (SSM) over the lyric lines for each feature type and blend the four SSMs into a linear-combination SSM. Third, based on the SSM, we cluster blocks and find the best Family combination to obtain the best segmentation. Finally, a rule-based technique labels the blocks. Structure alignment first roughly aligns music phrases with lyric sentences, then aligns individual characters with notes within each matched phrase and sentence, and finally combines the costs of the two alignment levels into a lyrics-melody collocation score. We collected lyrics from KKBOX, a music web site, and invited experts to label the ground-truth lyrics form. In the experiments, lyrics form analysis achieves a pairwise F-score of 0.83 and a label recovering ratio of 0.78. In the structure alignment experiment, the original lyrics of each query melody are ranked first.
Keywords: Lyrics Form Analysis; Lyrics Melody Alignment; Chinese Pop Music; Repeating Pattern Mining
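The linear-combination self-similarity matrix described in the abstract can be sketched in a few lines. The two similarity functions below (line length and final character) are simplified, hypothetical stand-ins for the thesis's four feature types, and the weights are arbitrary; only the overall shape, one SSM per feature blended by a weighted sum, follows the abstract.

```python
def self_similarity_matrix(lines, similarity):
    """Build an SSM: entry [i][j] is the similarity of line i and line j."""
    return [[similarity(a, b) for b in lines] for a in lines]

def length_similarity(a, b):
    """Hypothetical stand-in for the word count feature: ratio of lengths."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - abs(len(a) - len(b)) / longest

def ending_similarity(a, b):
    """Hypothetical stand-in for a rhyme-like feature: same final character."""
    return 1.0 if a[-1:] == b[-1:] else 0.0

def combine(matrices, weights):
    """Blend several SSMs into one by a weighted (linear) sum."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights))
             for j in range(n)] for i in range(n)]

# Hypothetical lyric lines; lines 0 and 2 repeat, as a chorus line might.
lyrics = ["la la la", "do re mi", "la la la"]
blended = combine(
    [self_similarity_matrix(lyrics, length_similarity),
     self_similarity_matrix(lyrics, ending_similarity)],
    weights=[0.5, 0.5],
)
for row in blended:
    print(row)
```

Repeated lines score highest in the blended matrix, which is the signal that block clustering and segmentation would then build on.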
