Global ETD Search

51	A Method for Membership Card Generation Based on Clustering and Optimization Models in A Hypermarket Xiaojun, Chen, Bhattrai, Premlal January 2011 Context: Data mining is used to find interesting and valuable knowledge in the large amounts of data stored in databases and data warehouses. It encompasses classification, clustering, association rule learning, and related techniques, whose goal is to improve commercial decisions and behavior in organizations. Among these, hierarchical clustering is commonly used in the data selection and preprocessing step of customer segmentation in business enterprises. However, this method does not handle overlapping or diverse clusters well. We therefore combine clustering and optimization into an integrated, sequential approach that can be used for segmenting customers and subsequently generating membership cards. Clustering methods segment customers into groups, while optimization aids in generating the required membership cards. Objectives: Our master's thesis project aims to develop a methodological approach that segments customers by their characteristics and then defines membership cards with a mathematical optimization model in a hypermarket. Methods: We conducted a literature review using five reputable databases: IEEE, Google Scholar, Science Direct, Springer and Engineering Village. This provided background and knowledge of current research on clustering- and optimization-based methods for membership card generation in a hypermarket. We also employed video interviews as a research method and built a proof-of-concept implementation of our solution. The interviews allowed us to collect raw data from the hypermarket, and testing on this data produced preliminary results. 
This was important because the data could serve as a guideline for evaluating the performance of customer segmentation and membership card generation. Results: We built clustering and optimization models as a two-step sequential method. In the first step, the clustering model segments customers into different clusters. In the second step, our optimization model produces different types of membership cards. We tested the method on a dataset of 100 customer records and obtained five clusters and five types of membership cards. Conclusions: This research provides a basis for customer segmentation and membership card generation in a hypermarket by means of data mining techniques and optimization. An integrated, sequential approach to clustering and optimization can thus be used for customer segmentation and membership card generation, respectively. Data mining; Hierarchical clustering; Fuzzy clustering; Optimization model; Membership card; Computer Sciences; Datavetenskap (datalogi); Software Engineering; Programvaruteknik
52	應用主題探勘與標籤聚合於標籤推薦之研究 / Application of topic mining and tag clustering for tag recommendation 高挺桂, Kao, Ting Kuei Unknown Date 標記社群標籤是Web2.0以來流行的一種透過使用者詮釋和分享資訊的方式，作為傳統分類方法的替代，其方便、靈活的特色使得使用者能夠輕易地因應內容標註標籤。不過其也有缺點，除了有相當多無標籤標註的內容，也存在大量模糊、不精確的標籤，降低了系統本身組織分類標籤的能力。為了解決上述兩項問題，本研究提出了一種結合主題探勘與標籤聚合的自動化標籤推薦方法，期望能夠建立一個去人工過程的自動化標籤推薦規則，來推薦合適的標籤給使用者。本研究蒐集了痞客邦部落格中，點閱次數大於5000次的熱門中文文章共2500篇，經過前處理，並以其中1939篇訓練模型及400篇作為測試語料來驗證方法。在主題探勘部分，本研究利用LDA主題模型計算不同文章的主題語意，來與既有標籤作出關聯，而能夠針對新進文章預測主題並推薦主題相關標籤給它。其中，本研究利用了能評斷模型表現情形的混淆度(Perplexity)來協助選取LDA的主題數，改善了LDA需要人主觀決定主題數的問題；在標籤聚合部分，本研究以階層式分群法，將有共同出現過的標籤群聚起來，以便找出有相似語意概念的標籤。其中，本研究將分群停止條件設定為共現次數最少為1次，改善了分群方法需要設定分群數量才能有結果的問題，也使本方法能夠自動化的找出合適的分群數目。實驗結果顯示，依照文章主題語意來推薦標籤有一定程度的可行性，且以混淆度所協助選取的主題數取得一致性較好的結果。而依照階層式分群所分出的標籤群中，同一群中的標籤確實擁有相似、類似的概念語意。最後，在結合主題探勘與標籤聚合的方法上，其Top-1至Top-5的準確率平均提升了14.1%，且Top-1準確率也達到72.25%。代表本研究針對文章寫作及標記標籤的習性切入的做法，確實能幫助提升標籤推薦的準確率，也代表本研究確實建立了一個自動化的標籤推薦規則，能推薦出合適的標籤來幫助使用者在撰寫文章後，能夠更方便、精確的標上標籤。 / Social tagging has been a popular way for users to interpret and share information since Web 2.0. As an alternative to traditional classification methods, its convenience and flexibility make it easy for users to tag content. It also has disadvantages: a considerable amount of content carries no tags, and many tags are fuzzy or inaccurate. To address these two problems, this study proposes a tag recommendation method that combines topic mining and tag clustering. We collected a total of 2500 articles from Pixnet as a corpus. In the topic mining part, the study uses the LDA model to compute the topic semantics of articles and associate them with existing tags, so that topics can be predicted for new articles and topic-related tags recommended for them. The number of LDA topics is selected with the help of perplexity. In the tag clustering part, the study uses hierarchical clustering to group tags that have appeared together, in order to find tags with similar semantic concepts. 
The stopping condition is set to a minimum of one co-occurrence, which removes the need to fix the number of clusters in advance. First, the topic mining results show that recommending tags according to the semantics of an article is feasible, and the experiments show that the number of topics chosen with the help of perplexity gives better results than other topic counts. Second, the tag clustering results show that tags in the same group do share similar conceptual semantics. Finally, experiments show that combining the two methods increases Top-1 to Top-5 accuracy by an average of 14.1%, with a Top-1 accuracy of 72.25%, which indicates that our method can recommend appropriate tags to users. 標籤推薦; 主題模型; 階層式分群; Tag recommendation; Topic model; Hierarchical clustering
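The stopping rule described in entry 52 can be illustrated with a minimal sketch, which is not the thesis's implementation. It assumes single-linkage merging, which the abstract does not specify; under that assumption, merging until no two groups share at least one co-occurrence is equivalent to taking the connected components of the tag co-occurrence graph. The function name `cluster_tags` and the sample posts are hypothetical.

```python
from itertools import combinations


def cluster_tags(posts):
    """Group tags that co-occur at least once, directly or through a chain of posts.

    `posts` is a list of tag lists, one per article (hypothetical input format).
    Assumes single linkage, so the result is the connected components of the
    co-occurrence graph; no cluster count has to be chosen in advance.
    """
    parent = {}

    def find(tag):
        # Union-find lookup with path halving.
        parent.setdefault(tag, tag)
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for tags in posts:
        for tag in tags:
            find(tag)  # register tags that never co-occur with anything
        for a, b in combinations(set(tags), 2):
            parent[find(a)] = find(b)  # one co-occurrence is enough to merge

    groups = {}
    for tag in list(parent):
        groups.setdefault(find(tag), set()).add(tag)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    sample_posts = [["travel", "japan"], ["japan", "food"], ["python"]]
    print(cluster_tags(sample_posts))
```

With a different linkage rule (complete or average), the same stopping threshold would generally give smaller, tighter groups, so the choice of linkage matters when reproducing the reported results.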
53	Hierarkisk klustring av klickströmmar : En metodik för identifiering av användargrupper Schorn, Björn January 2022 Nasdaq utvecklar och tillhandahåller mjukvarulösningar för clearinghus. Det finns ett intresse för att utveckla en fördjupad förståelse för hur funktionaliteten av produkten används. En möjlighet för detta är att använda sig av hierarkisk klustring av klickströmmar från webbgränssnittet. Denna rapport utvecklar ett tillvägagångssätt för en sådan klustring och tillämpar den på ett redan befintligt dataset av klickströmsloggar. Att använda sig av ett euklidiskt avståndsmått kan fungera för enklare klustringar så som gruppering av produktsidor. För en djupare analys av användarbeteendet genom en klustring av sessioner ger dock Damerau-Levenshtein bättre resultat då det även tar hänsyn till i vilken ordningsföljd sidvisningarna för respektive session sker. / Nasdaq develops and provides software solutions for clearing houses. There is an interest in developing an in-depth understanding of how the functionality of this product is used. One possibility is to use hierarchical clustering of clickstreams from the web interface. This report develops a methodology for such clustering and applies it to an existing dataset of clickstream logs. A Euclidean distance measure can work for simpler clusterings such as grouping product pages. For a deeper analysis of user behavior through a clustering of sessions, however, Damerau–Levenshtein gives better results, as it also takes into account the order of the pages visited within each session. Hierarchical clustering; clickstream; Damerau–Levenshtein; data mining; web mining; Hierarkisk klustring; klickström; Damerau–Levenshtein; datautvinning; web mining; Mathematics; Matematik
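To make the order-sensitivity argument in entry 53 concrete, here is a minimal sketch of an edit distance on page sequences. It implements the restricted (optimal string alignment) variant of Damerau–Levenshtein, which is an assumption on our part; the abstract does not say which variant the report uses. The function name `osa_distance` and the page names are hypothetical.

```python
from collections import Counter


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Works on any two sequences, for example lists of page names in a session.
    Counts insertions, deletions, substitutions and swaps of adjacent items.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Swap of two adjacent items counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    s1 = ["home", "search", "cart"]
    s2 = ["home", "cart", "search"]
    # Page counts are identical, so a Euclidean distance on count vectors is 0.
    print(Counter(s1) == Counter(s2))
    # The sequence distance still separates the two sessions.
    print(osa_distance(s1, s2))
```

The pairwise distances produced this way could then be passed to any hierarchical clustering routine that accepts a precomputed distance matrix.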
54	Finding Anomalous Energy Consumers Using Time Series Clustering in the Swedish Energy Market Tonneman, Lukas January 2023 Improving the energy efficiency of buildings is important for many reasons. There is a large body of data detailing the hourly energy consumption of buildings. This work studies a large data set from the Swedish energy market. This thesis proposes a data analysis methodology for identifying abnormal consumption patterns using two steps of clustering. First, typical weekly energy usage profiles are extracted from each building by clustering week-long segments of the building’s lifetime consumption and extracting the medoids of the clusters. Second, all the typical weekly energy usage profiles are clustered using agglomerative hierarchical clustering. Large clusters are assumed to contain normal consumption patterns, and small clusters are assumed to contain abnormal patterns. Buildings with a large presence in small clusters are said to be abnormal, and vice versa. The method uses Dynamic Time Warping distance as the dissimilarity measure. Using a set of 160 buildings, manually classified by domain experts, this thesis shows that the mean abnormality score is higher for abnormal buildings than for normal buildings, with p ≈ 0.0036. Computer Sciences; Datavetenskap (datalogi)
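The two ingredients named in entry 54 can be sketched as follows. The first function is the textbook Dynamic Time Warping recurrence with an absolute-difference step cost and no window constraint; the thesis may use a different step cost or a constrained variant. The second function is a hypothetical abnormality score, defined here as the share of a building's typical weekly profiles that land in small clusters; the abstract does not give the actual formula, and the names `abnormality_score` and `small_threshold` are ours.

```python
import math


def dtw_distance(x, y):
    """Dynamic Time Warping distance between two numeric sequences.

    Uses |x_i - y_j| as the local cost and allows unrestricted warping.
    """
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # x advances, y stays
                cost[i][j - 1],      # y advances, x stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def abnormality_score(profile_clusters, cluster_sizes, small_threshold):
    """Hypothetical score: fraction of a building's profiles in small clusters.

    profile_clusters: cluster id of each typical weekly profile of one building.
    cluster_sizes: mapping from cluster id to number of profiles in that cluster.
    small_threshold: clusters with fewer profiles than this count as small.
    """
    small = sum(1 for c in profile_clusters if cluster_sizes[c] < small_threshold)
    return small / len(profile_clusters)


if __name__ == "__main__":
    # A stretched copy of the same shape has zero DTW distance.
    print(dtw_distance([1, 2, 3], [1, 1, 2, 3]))
    # One of three profiles sits in a cluster of size 2, below the threshold of 5.
    print(abnormality_score([0, 0, 1], {0: 50, 1: 2}, small_threshold=5))
```

The quadratic cost of unconstrained DTW is the main practical concern at the scale of hourly data for many buildings, which is why windowed variants are common in practice.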
55	Semantic Integration across Heterogeneous Databases : Finding Data Correspondences using Agglomerative Hierarchical Clustering and Artificial Neural Networks / Semantisk integrering mellan heterogena databaser : Hitta datakopplingar med hjälp av hierarkisk klustring och artificiella neuronnät Hobro, Mark January 2018 The process of data integration is an important part of the database field when it comes to database migrations and the merging of data. Research in the area has grown with the addition of machine learning approaches in the last 20 years. Due to the complexity of the research field, no go-to solutions have appeared. Instead, a wide variety of ways of enhancing database migrations have emerged. This thesis examines how well a learning-based solution performs for the semantic integration problem in database migrations. Two algorithms are implemented. The first is based on information retrieval theory and yields a matching result that serves as a benchmark for measuring the performance of the machine learning algorithm. The machine learning approach groups data with agglomerative hierarchical clustering and then trains a neural network to recognize patterns in the data. This allows predictions to be made about potential data correspondences across two databases. The results show that agglomerative hierarchical clustering performs well at grouping the data into classes. The classes can in turn be used for training a neural network. The matching algorithm gives a high recall of matching tables, but improvements are needed to achieve both high recall and high precision. The conclusion is that the proposed learning-based approach, using agglomerative hierarchical clustering and a neural network, works as a solid base for semi-automating the data integration problem seen in this thesis. However, the solution needs to be enhanced with scenario-specific algorithms and rules to reach the desired performance. 
/ Dataintegrering är en viktig del inom området databaser när det kommer till databasmigreringar och sammanslagning av data. Forskning inom området har ökat i takt med att maskininlärning blivit ett attraktivt tillvägagångssätt under de senaste 20 åren. På grund av komplexiteten av forskningsområdet, har inga optimala lösningar hittats. Istället har flera olika tekniker framställts, som tillsammans kan förbättra databasmigreringar. Denna avhandling undersöker hur bra en lösning baserad på maskininlärning presterar för dataintegreringsproblemet vid databasmigreringar. Två algoritmer har implementerats. En är baserad på informationssökningsteori, som främst används för att ha en prestandamässig utgångspunkt för algoritmen som är baserad på maskininlärning. Den algoritmen består av ett första steg, där data grupperas med hjälp av hierarkisk klustring. Sedan tränas ett artificiellt neuronnät att hitta mönster i dessa grupperingar, för att kunna göra förutsägelser huruvida olika datainstanser har ett samband mellan två databaser. Resultatet visar att agglomerativ hierarkisk klustring presterar väl i uppgiften att klassificera den data som använts. Resultatet av matchningsalgoritmen visar på att en stor mängd av de matchande tabellerna kan hittas. Men förbättringar behöver göras för att både ge en hög återkallelse av matchningar och hög precision för de matchningar som hittas. Slutsatsen är att ett inlärningsbaserat tillvägagångssätt, i detta fall att använda agglomerativ hierarkisk klustring och sedan träna ett artificiellt neuronnät, fungerar bra som en basis för att till viss del automatisera ett dataintegreringsproblem likt det som presenterats i denna avhandling. För att få bättre resultat, krävs att lösningen förbättras med mer situationsspecifika algoritmer och regler. Semantic integration; data integration; artificial neural networks; agglomerative hierarchical clustering; heterogeneous databases; relational data; Computer Sciences; Datavetenskap (datalogi)
56	Klusteranalys : Tillämpning av agglomerativ hierarkisk och k-means klustring för att hitta bra kluster bland fotbollsspelare baserat på spelarstatistik. Balbas, Sacko, Törnquist, Arvid January 2024 This work examines how cluster analysis, a multivariate analysis tool, can be applied to find meaningful groups of players based on player statistics. The aim is to find good clusters among players in La Liga, the Spanish top football division, for the 2022-2023 season. To that end, agglomerative hierarchical clustering and k-means clustering are applied and compared. The results showed that no good clusters could be identified among the players based on player statistics from the 2022-2023 La Liga season. Cluster analysis; hierarchical clustering; k-means clustering; La Liga; football; algorithm; machine learning; Probability Theory and Statistics; Sannolikhetsteori och statistik
57	Clustering Consistently Eldridge, Justin January 2017 No description available. Computer Science; Statistics; Artificial Intelligence; machine learning; unsupervised learning; statistical learning; clustering; graphon; mergeon; density cluster tree; hierarchical clustering
58	Modélisation statistique de l’érosion de cavitation d’une turbine hydraulique selon les paramètres d’opération Bodson-Clermont, Paule-Marjolaine 03 1900 Dans une turbine hydraulique, la rotation des aubes dans l’eau crée une zone de basse pression, amenant l’eau à passer de l’état liquide à l’état gazeux. Ce phénomène de changement de phase est appelé cavitation et est similaire à l’ébullition. Lorsque les cavités de vapeur formées implosent près des parois, il en résulte une érosion sévère des matériaux, accélérant de façon importante la dégradation de la turbine. Un système de détection de l’érosion de cavitation à l’aide de mesures vibratoires, employable sur les turbines en opération, a donc été installé sur quatre groupes turbine-alternateur d’une centrale et permet d’estimer précisément le taux d’érosion en kg/10 000 h. Le présent projet vise à répondre à deux objectifs principaux. Premièrement, étudier le comportement de la cavitation sur un groupe turbine-alternateur cible et construire un modèle statistique, dans le but de prédire la variable cavitation en fonction des variables opératoires (tels l’ouverture de vannage, le débit, les niveaux amont et aval, etc.). Deuxièmement, élaborer une méthodologie permettant la reproductibilité de l’étude à d’autres sites. Une étude rétrospective sera effectuée et on se concentrera sur les données disponibles depuis la mise à jour du système en 2010. Des résultats préliminaires ont mis en évidence l’hétérogénéité du comportement de cavitation ainsi que des changements dans la relation entre la cavitation et diverses variables opératoires. Nous nous proposons de développer un modèle probabiliste adapté, en utilisant notamment le regroupement hiérarchique et des modèles de régression linéaire multiple. 
/ Cavitation erosion, which results from the repeated collapse of transient vapor cavities on solid surfaces, is a persistent problem in hydraulic turbine runners and continues to cause costly repairs and lost revenue. A vibratory detection system for cavitation erosion was installed 10 years ago for continuous monitoring of four hydropower units. A new hardware version of the system was developed and installed in 2010. The new configuration is more reliable and allows more accurate evaluation of the cavitation erosion of the runners in kg/10 000 h. The first objective of this study is to investigate cavitation behavior on one generating unit and to build a statistical model that predicts instantaneous cavitation from operating variables such as gate opening, water flow, headwater level and tailwater level. The second objective is to develop a methodology that makes the study reproducible at other sites. A retrospective study will be conducted, focusing mainly on data available since the system update in 2010. The preliminary analysis highlighted the complexity of the phenomenon: changes in the relationship between cavitation and various operating variables were observed and could be due to seasonal behavior or different operating conditions. Using hierarchical clustering and regression models, we formalize this heterogeneity by developing a model that includes operating variables such as active power, tailwater level and gate opening. Cavitation; Turbine Francis; Mélange de lois; Statistique; Opération; Regroupement hiérarchique; Régression linéaire multiple; Francis turbine; Mixture model; Hierarchical clustering; Multiple linear regression
59	Modélisation surfacique et volumique de la peau : classification et analyse couleur / Skin surface and volume modeling : clustering and color analysis Breugnot, Josselin 27 June 2011 Grâce aux innovations technologiques récentes, l’exploration cutanée est devenue de plus en plus facile et précise. Le relevé topographique de la surface de peau par projection de franges ainsi que l’exploration des structures intradermiques par microscopie confocale in-vivo en sont des exemples parfaits. La mise en place de ces techniques et les développements sont présentés dans cette thèse. L’apport de l’imagerie est évident tant pour le traitement des acquisitions de ces appareils que pour l’évaluation de paramètres cutanés à partir de photographie par exemple. L’extension du modèle LIP niveaux de gris à la couleur a été réalisée pour apporter une évaluation proche de celle d’un expert grâce aux fondements logarithmiques du modèle, proches de la vision humaine. Enfin, la classification de données dans une image, sujet omniprésent dans le traitement d’images, a été abordée par les classifications hiérarchiques ascendantes, utilisant un cadre mathématique rigoureux grâce aux métriques ultramétriques. / Thanks to recent technological developments, skin evaluation has become easier and more accurate. Topographical measurement of the skin surface by fringe projection and exploration of intradermal structures by in-vivo laser confocal microscopy are two examples. The setup and development of these techniques are presented in this thesis. The contribution of image processing is clear, both for processing the acquisitions from these devices and for evaluating cutaneous parameters from photographs, for example. The grey-level LIP model has been extended to color in order to provide an evaluation close to that of an expert, thanks to the logarithmic foundations of the model, which are close to human vision. 
Finally, data clustering in images, a recurring topic in image analysis, has been approached through ascending hierarchical clustering, which provides a rigorous mathematical framework thanks to ultrametric distances. Couleur; Vision humaine; Ultramétriques; Peau; Evaluation de surface; Microscopie confocale; Color; Human vision; Ascending hierarchical clustering; Ultrametric distance; Skin; Surface evaluation; Confocal microscopy
60	Selecionando candidatos a descritores para agrupamentos hierárquicos de documentos utilizando regras de associação / Selecting candidate labels for hierarchical document clusters using association rules Santos, Fabiano Fernandes dos 17 September 2010 Uma forma de extrair e organizar o conhecimento, que tem recebido muita atenção nos últimos anos, é por meio de uma representação estrutural dividida por tópicos hierarquicamente relacionados. Uma vez construída a estrutura hierárquica, é necessário encontrar descritores para cada um dos grupos obtidos pois a interpretação destes grupos é uma tarefa complexa para o usuário, já que normalmente os algoritmos não apresentam descrições conceituais simples. Os métodos encontrados na literatura consideram cada documento como uma bag-of-words e não exploram explicitamente o relacionamento existente entre os termos dos documentos do grupo. No entanto, essas relações podem trazer informações importantes para a decisão dos termos que devem ser escolhidos como descritores dos nós, e poderiam ser representadas por regras de associação. Assim, o objetivo deste trabalho é avaliar a utilização de regras de associação para apoiar a identificação de descritores para agrupamentos hierárquicos. Para isto, foi proposto o método SeCLAR (Selecting Candidate Labels using Association Rules), que explora o uso de regras de associação para a seleção de descritores para agrupamentos hierárquicos de documentos. Este método gera regras de associação baseadas em transações construídas a partir de cada documento da coleção, e utiliza a informação de relacionamento existente entre os grupos do agrupamento hierárquico para selecionar candidatos a descritores. 
Os resultados da avaliação experimental indicam que é possível obter uma melhora significativa com relação à precisão e à cobertura dos métodos tradicionais. / One way to organize knowledge that has received much attention in recent years is to create a structural representation divided into hierarchically related topics. Once this structure is built, labels must be found for each of the resulting clusters, since most algorithms do not produce simple descriptions and interpreting the clusters is a difficult task for users. Related work treats each document as a bag-of-words and does not explicitly exploit the relationships between the terms of the documents. However, these relationships can provide important information for deciding which terms should be chosen as descriptors of the nodes, and they can be represented by association rules. This work aims to evaluate the use of association rules to support the identification of labels for hierarchical document clusters. To this end, it presents the SeCLAR (Selecting Candidate Labels using Association Rules) method, which explores the use of association rules for selecting good candidate labels for hierarchical clusters of documents. The method generates association rules from transactions built from each document in the collection, and uses the relationships between the nodes of the hierarchical clustering to select candidate labels. The experimental results show that a significant improvement in precision and recall over traditional methods is possible. Agrupamento hierárquico de documentos; Association rules; Hierarchical document clustering; Label hierarchical clustering; Mineração de texto; Regras de associação; Text mining
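The idea in entry 60 of treating each document as a transaction and mining term-to-term rules can be illustrated with a minimal sketch. This is not the SeCLAR method itself, whose details are not given in the abstract; it only computes support and confidence for one-to-one rules between terms. The function name `association_rules`, the thresholds and the sample documents are hypothetical.

```python
from itertools import permutations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return rules (antecedent, consequent, support, confidence) between single terms.

    Each transaction is the collection of terms of one document.
    support(A -> B)    = share of documents containing both A and B
    confidence(A -> B) = support(A and B) / support(A)
    """
    sets = [set(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))

    def support(*terms):
        return sum(1 for s in sets if all(t in s for t in terms)) / n

    rules = []
    for a, b in permutations(items, 2):
        s_ab = support(a, b)
        if s_ab >= min_support:
            confidence = s_ab / support(a)
            if confidence >= min_confidence:
                rules.append((a, b, s_ab, confidence))
    return rules


if __name__ == "__main__":
    docs = [
        ["cluster", "label"],
        ["cluster", "label", "rule"],
        ["cluster"],
        ["rule"],
    ]
    for rule in association_rules(docs):
        print(rule)
```

Terms that appear as antecedents or consequents of strong rules inside a cluster would be natural label candidates, although how the hierarchy between clusters is used to filter them is specific to the thesis.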
