41

Topic extraction based on association rule clustering

Fabiano Fernandes dos Santos 29 May 2015 (has links)
A structured representation of documents in an appropriate format for automatic knowledge extraction, without loss of relevant information, is one of the most important steps of text mining, since the quality of the results obtained with automatic approaches to knowledge extraction from text is strongly related to the quality of the attributes selected to represent the collection of documents. The Vector Space Model (VSM) is a traditional structured representation of documents. In this model, each document is represented as a vector of weights corresponding to the features of the document. The bag-of-words model is the most popular VSM approach because of its simplicity and general applicability. However, the bag-of-words model does not capture dependencies among terms and has high dimensionality. Several models for document representation have been proposed in the literature to capture the dependence among terms, notably models based on phrases or compound terms, the Generalized Vector Space Model (GVSM) and its extensions, non-probabilistic topic models such as Latent Semantic Analysis (LSA) and Non-negative Matrix Factorization (NMF), and probabilistic topic models such as Latent Dirichlet Allocation (LDA) and its extensions. The topic model representation is one of the most interesting approaches, since it provides a structure that describes the collection of documents in a way that reveals their internal structure and their interrelationships.
Also, this approach provides a dimensionality reduction strategy, building new dimensions that represent the main topics or ideas of the document collection. However, the efficient extraction of information about the relations among terms for document representation is still a major research challenge. Document representation models that explore correlated terms usually face the challenge of keeping a good balance among (i) the number of extracted features, (ii) the computational cost and (iii) the interpretability of the new features. To address this, we propose the Latent Association Rule Cluster based Model (LARCM). LARCM is a non-probabilistic topic model that explores association rule clustering to build a document representation with low dimensionality, in which each dimension is composed of information about the relations among terms. In the proposed approach, association rules are mined from each document to extract the correlated terms that compose multi-word expressions. These relations among terms form the local context of relations. Then, a clustering process is applied to all association rules to discover the general context of the relations, and each obtained cluster is an extracted topic, i.e., a dimension of the new document representation. This work also proposes an evaluation methodology to select topic models that maximize both the results in the text classification task and the interpretability of the obtained topics. The LARCM model was compared against the traditional LDA model and an LDA model using a document representation that includes multi-word expressions (bag-of-related-words). The experimental results indicate that LARCM provides a document representation that improves the results in the text classification task while retaining good interpretability of the extracted topics.
The LARCM model also achieved great results as a method to extract contextual information for context-aware recommender systems.
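The core pipeline of the abstract above (mine association rules per document, then cluster the rules so that each cluster becomes a topic dimension) can be illustrated with a deliberately simplified sketch. This is not the authors' LARCM implementation: the pairwise co-occurrence "rules", the support threshold, and the greedy Jaccard clustering below are stand-ins chosen for brevity.

```python
from itertools import combinations

def mine_rules(doc_terms, min_support=2):
    """Mine simple pairwise co-occurrence 'rules' across documents:
    a term pair qualifies if it co-occurs in at least min_support docs."""
    counts = {}
    for terms in doc_terms:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return [pair for pair, c in counts.items() if c >= min_support]

def cluster_rules(rules, threshold=0.25):
    """Greedy single-pass clustering: a rule joins the first cluster whose
    term set overlaps it (Jaccard >= threshold); otherwise it starts a new
    cluster. Each resulting cluster plays the role of one 'topic' dimension."""
    clusters = []  # each cluster is a set of terms
    for pair in rules:
        terms = set(pair)
        for cluster in clusters:
            jac = len(terms & cluster) / len(terms | cluster)
            if jac >= threshold:
                cluster |= terms
                break
        else:
            clusters.append(terms)
    return clusters

docs = [
    ["topic", "model", "cluster"],
    ["topic", "model", "rule"],
    ["fuel", "engine", "truck"],
    ["fuel", "engine", "rule"],
]
rules = mine_rules(docs)
topics = cluster_rules(rules)
```

Real association rule miners such as Apriori produce directed rules with support and confidence values, and the clustering step in LARCM is considerably more elaborate than this greedy pass.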
42

Recommending new items to customers: A comparison between Collaborative Filtering and Association Rule Mining

Sohlberg, Henrik January 2015 (has links)
E-commerce is an ever-growing industry as the internet infrastructure continues to evolve. A recommendation system offers several benefits to any online retail store: it can help customers find what they need and increase sales by enabling accurately targeted promotions. Among the many techniques that can form recommendation systems, this thesis compares Collaborative Filtering against Association Rule Mining, both implemented in combination with clustering. The suggested implementations are designed with the cold start problem in mind and are evaluated on a data set from an online retail store which sells clothing. The results indicate that Collaborative Filtering is the preferable technique, while association rules may still offer business value to stakeholders. However, the strength of the results is limited by the fact that only a single data set was used.
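As an illustration of the Collaborative Filtering side of the comparison, the sketch below implements minimal item-based CF over implicit purchase data: each item is represented as a vector of the users who bought it, and unseen items are scored by cosine similarity to the items the target user already owns. The data and item names are invented; the thesis's actual implementation (with clustering and cold-start handling) is more involved.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(u) & set(v)
    num = sum(u[k] * v[k] for k in common)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def recommend(purchases, target_user, k=2):
    """Item-based CF: score items the user has not bought by their summed
    similarity to the items the user has bought."""
    # Build item -> {user: 1} vectors from implicit purchase data.
    item_vecs = {}
    for user, items in purchases.items():
        for it in items:
            item_vecs.setdefault(it, {})[user] = 1
    seen = purchases[target_user]
    scores = {}
    for it, vec in item_vecs.items():
        if it in seen:
            continue
        scores[it] = sum(cosine(vec, item_vecs[s]) for s in seen)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical implicit purchase data for a clothing store.
purchases = {
    "u1": {"shirt", "jeans"},
    "u2": {"shirt", "jeans", "jacket"},
    "u3": {"jacket", "boots"},
}
recs = recommend(purchases, "u1")
```

Here "jacket" outranks "boots" for u1 because it co-occurs with u1's items in u2's basket, while "boots" shares no purchaser with them.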
43

Understanding usage of Volvo trucks

Dahl, Oskar, Johansson, Fredrik January 2019 (has links)
Trucks are designed, configured and marketed for various working environments. There is a concern about whether trucks are used as intended by the manufacturer, since usage may impact the longevity, efficiency and productivity of the trucks. In this thesis we propose a framework, divided into two separate parts, that aims to extract customers' driving behaviours from Logged Vehicle Data (LVD) in order to (a) evaluate whether they align with so-called Global Transport Application (GTA) parameters and (b) evaluate the usage in terms of performance. A Gaussian mixture model (GMM) is employed to cluster and classify various driving behaviours. Association rule mining is applied to the categorized clusters to validate that the usage follows the GTA configuration. Furthermore, the Correlation Coefficient (CC) is used to find linear relationships between usage and performance in terms of Fuel Consumption (FC). We find that the vast majority of the trucks seemingly follow GTA parameters and are thus used as marketed. Likewise, fuel economy was found to be linearly related to drivers' performance. The LVD lacks detail, such as Global Positioning System (GPS) information, needed to capture the usage in such a way that more definitive conclusions can be drawn. / This thesis was later developed into a scientific paper and submitted to the ICIMP 2020 conference. The publication was accepted on the 23rd of September 2019 and will be presented in January 2020.
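The linear relationship between usage and fuel consumption mentioned above is what a correlation coefficient quantifies. Below is a minimal Pearson correlation sketch; the "engine load share" signal and the fuel figures are hypothetical illustrative numbers, not values from the LVD.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-truck usage signal (share of time at high engine load)
# against fuel consumption (litres per 100 km).
load = [0.2, 0.4, 0.5, 0.7, 0.9]
fuel = [28.0, 30.5, 31.0, 34.0, 36.5]
r = pearson(load, fuel)
```

A coefficient near 1 indicates a strong positive linear relationship, which is the kind of dependence the thesis reports between driving behaviour and fuel economy.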
44

Ontology-Based Semantic Web Mining Challenges: A Literature Review

March, Christopher January 2023 (has links)
The Semantic Web is an extension of the current web that provides a standard structure for data representation and reasoning, allowing content to be readable by both humans and machines in the form of ontological knowledge bases. The goal of the Semantic Web is to be used in large-scale technologies and systems such as search engines, healthcare systems, and social media platforms. Some challenges may deter further progress in the development of the Semantic Web and the associated web mining processes. This review paper gives an overview of Semantic Web mining and examines and analyzes challenges with data integration, dynamic knowledge-based methods, efficiency, and data mining algorithms with regard to ontological approaches. It then discusses and analyzes recent solutions to these challenges, such as clustering, classification, association rule mining, and ontology-building aids that overcome them.
45

Novel Algorithms for Cross-Ontology Multi-Level Data Mining

Manda, Prashanti 15 December 2012 (has links)
The widespread use of ontologies in many scientific areas creates a wealth of ontology-annotated data and necessitates the development of ontology-based data mining algorithms. We have developed generalization and mining algorithms for discovering cross-ontology relationships via ontology-based data mining. We present new interestingness measures to evaluate the discovered cross-ontology relationships. The methods presented in this dissertation employ generalization as an ontology traversal technique for the discovery of interesting and informative relationships at multiple levels of abstraction between concepts from different ontologies. The generalization algorithms combine ontological annotations with the structure and semantics of the ontologies themselves to discover interesting cross-ontology relationships. The first algorithm uses the depth of ontological concepts as a guide for generalization. The ontology annotations are translated to higher levels of abstraction one level at a time, accompanied by incremental association rule mining. The second algorithm generalizes ontology terms to all their ancestors via transitive ontology relations and then mines cross-ontology multi-level association rules from the generalized transactions. Our interestingness measures use implicit knowledge conveyed by the relation semantics of the ontologies to capture the usefulness of cross-ontology relationships. We describe the use of information-theoretic metrics to capture the interestingness of cross-ontology relationships and the specificity of ontology terms with respect to an annotation dataset. Our generalization and data mining algorithms are applied to the Gene Ontology and the postnatal Mouse Anatomy Ontology. The results presented in this work demonstrate that our generalization algorithms and interestingness measures discover more interesting and better-quality relationships than approaches that do not use generalization.
Our algorithms can be used by researchers and ontology developers to discover inter-ontology connections. Additionally, the cross-ontology relationships discovered using our algorithms can be used by researchers to understand different aspects of entities that interest them.
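The second generalization algorithm described above (lifting each annotation to all of its ancestors via transitive relations before mining) can be sketched as follows. The hierarchy, term names and entity are a toy, GO-flavoured example invented for illustration, not data from the dissertation.

```python
def ancestors(term, parents):
    """All ancestors of a term, following transitive is_a relations."""
    out = set()
    stack = [term]
    while stack:
        t = stack.pop()
        for p in parents.get(t, ()):
            if p not in out:
                out.add(p)
                stack.append(p)
    return out

def generalize(annotations, parents):
    """Extend each entity's annotation set with all ancestor terms,
    producing the 'generalized transactions' that multi-level
    association rules are then mined from."""
    return {
        entity: terms | set().union(*(ancestors(t, parents) for t in terms))
        for entity, terms in annotations.items()
    }

# Toy is_a hierarchy (hypothetical Gene Ontology-like fragment).
parents = {
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
}
ann = {"geneA": {"kinase activity"}}
gen = generalize(ann, parents)
```

Once every transaction carries its ancestor terms, an ordinary rule miner can surface relationships at any level of abstraction, which is the effect the dissertation's generalization step achieves.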
46

Predictive Models for Hospital Readmissions

Shi, Junyi January 2023 (has links)
A hospital readmission can occur due to insufficient treatment or the emergence of an underlying disease that was not apparent at the initial hospital stay. The unplanned readmission rate is often viewed as an indicator of the health system performance and may reflect the quality of clinical care provided during hospitalization. Readmissions have also been reported to account for a significant portion of inpatient care expenditures. In an effort to improve treatment quality, clinical outcomes, and hospital operating costs, we present machine learning methods for identifying and predicting potentially preventable readmissions (PPR). In the first part of the thesis, we use logistic regression, extreme gradient boosting, and neural network to predict 30-day unplanned readmissions. In the second part, we apply association rule analysis to assess the clinical association between initial admission and readmission, followed by employing counterfactual analysis to identify potentially preventable readmissions. This comprehensive analysis can assist health care providers in targeting interventions to effectively reduce preventable readmissions. / Thesis / Master of Science (MSc)
47

SQL Implementation of Value Reduction with Multiset Decision Tables

Chen, Chen 16 May 2014 (has links)
No description available.
48

An analysis of semantic data quality deficiencies in a national data warehouse: a data mining approach

Barth, Kirstin 07 1900 (has links)
This research determines whether data quality mining can be used to describe, monitor and evaluate the scope and impact of semantic data quality problems in the learner enrolment data on the National Learners’ Records Database. Previous data quality mining work has focused on anomaly detection and has assumed that the data quality aspect being measured exists as a data value in the data set being mined. The method for this research is quantitative in that the data mining techniques and model that are best suited for semantic data quality deficiencies are identified and then applied to the data. The research determines that unsupervised data mining techniques that allow for weighted analysis of the data would be most suitable for the data mining of semantic data deficiencies. Further, the academic Knowledge Discovery in Databases model needs to be amended when applied to data mining semantic data quality deficiencies. / School of Computing / M. Tech. (Information Technology)
49

Automating debugging through data mining

Thun, Julia, Kadouri, Rebin January 2017 (has links)
Contemporary technological systems generate massive quantities of log messages. These messages can be stored, searched and visualized efficiently using log management and analysis tools. The analysis of log messages offers insights into system behavior such as performance, server status and execution faults in web applications. iStone AB wants to explore the possibility of automating its debugging process. Since iStone does most of its debugging manually, it takes time to find errors within the system. The aim was therefore to find solutions that reduce the time it takes to debug. An analysis of log messages in access and console logs was made, so that the most appropriate data mining techniques for iStone's system could be chosen. Data mining algorithms and log management and analysis tools were compared. The result of the comparisons showed that the ELK Stack, together with a combination of Eclat and a hybrid algorithm (Eclat and Apriori), were the most appropriate choices. To demonstrate their feasibility, the ELK Stack and Eclat were implemented. The produced results show that data mining and the use of a platform for log analysis can facilitate debugging and reduce the time it takes.
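Eclat, the algorithm chosen above for the log analysis, mines frequent itemsets by intersecting transaction-id sets (tid-sets) instead of repeatedly rescanning the data as Apriori does. Here is a compact sketch over invented log-message templates; the session data is hypothetical, not from iStone's logs.

```python
def eclat(transactions, min_support=2):
    """Eclat: depth-first frequent itemset mining via tid-set intersection.
    Returns a dict mapping each frequent itemset (as a sorted tuple)
    to its support count."""
    # Map each item to the set of transaction ids that contain it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, item_tids) in enumerate(candidates):
            new_tids = prefix_tids & item_tids
            if len(new_tids) >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = len(new_tids)
                # Only extend with lexicographically later items,
                # so each itemset is generated exactly once.
                extend(itemset, new_tids, candidates[i + 1:])

    extend((), set(range(len(transactions))), sorted(tidsets.items()))
    return frequent

# Hypothetical log-message templates observed per session.
logs = [
    {"timeout", "retry", "error500"},
    {"timeout", "retry"},
    {"error500", "login"},
    {"timeout", "retry", "error500"},
]
freq = eclat(logs)
```

Frequent combinations such as {retry, timeout} occurring together across sessions are exactly the kind of co-occurring log pattern that can point a debugger at a recurring fault.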
50

Pattern Mining on English Grammatical Relations

蔡吉章, Tsai, Chi Chang Unknown Date (has links)
Studies have found common errors in the English writing of ESL (English as a Second Language) learners: improper word choice, incorrect verb forms, sentences lacking a subject, and incorrect verb tenses. These errors are mainly due to a limited vocabulary, unclear grammatical concepts, and mother-tongue interference. To improve ESL learners' writing, we aim to provide assistance based on information about grammatical relations. At present, studies of grammatical relations mostly emphasize single grammatical relations between words; however, words in a sentence may participate in more than one grammatical relation at a time. In this study, we first develop a grammatical relation pattern recognition system. Given a sentence, the system provides the user with the applicable grammatical relations and the corresponding collocated words; these collocations can help learners use a keyword appropriately. In addition, we design a user interface for querying grammatical relations. In this way, grammatical relations and collocations provide users with assistance in English writing. To find the patterns, we propose implementations based on association rules and LSA (Latent Semantic Analysis).