1 |
Scribe: A Clustering Approach To Semantic Information Retrieval
Langley, Joseph R 05 August 2006
Information retrieval is the process of fulfilling a user's need for information by locating items in a data collection that are similar to a complex query that is often posed in natural language. Latent Semantic Indexing (LSI) was the predominant technique employed at the National Institute of Standards and Technology's Text Retrieval Conference for many years until limitations of its scalability to large data sets were discovered. This thesis describes SCRIBE, a modification of LSI with improved scalability. SCRIBE clusters its semantic index into discrete volumes described by high-dimensional extensions to computer graphics data structures. SCRIBE's clustering strategy limits the number of items that must be searched and provides for sub-linear time complexity in the number of documents. Experimental results with a large, natural language document collection demonstrate that SCRIBE achieves retrieval accuracy similar to LSI but requires 1/10 the time.
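The idea of limiting the search to one cluster can be illustrated with a minimal sketch. It is not SCRIBE's actual data structure: the abstract describes volumes built from high-dimensional extensions of computer graphics structures, whereas this sketch swaps in plain nearest-centroid clusters. The vectors, centroids, and function names are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_clusters(doc_vectors, centroids):
    """Assign each document index to its most similar centroid."""
    clusters = {i: [] for i in range(len(centroids))}
    for doc_id, vec in enumerate(doc_vectors):
        best = max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
        clusters[best].append(doc_id)
    return clusters

def search(query, doc_vectors, centroids, clusters, top_k=2):
    """Rank only the documents in the cluster closest to the query."""
    best = max(range(len(centroids)), key=lambda c: cosine(query, centroids[c]))
    candidates = clusters[best]
    ranked = sorted(candidates, key=lambda d: cosine(query, doc_vectors[d]), reverse=True)
    return ranked[:top_k], len(candidates)

# Toy 2-D "semantic" vectors (assumption: a real index would come from an SVD).
docs = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (0.0, 1.0)]
centroids = [(1.0, 0.0), (0.0, 1.0)]
clusters = build_clusters(docs, centroids)
hits, scanned = search((0.95, 0.05), docs, centroids, clusters)
print(hits, "documents scanned:", scanned, "of", len(docs))
```

Only the documents in the selected cluster are scored, which is where the saving over an exhaustive scan comes from; the trade-off is that a relevant document assigned to a neighbouring cluster is missed.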
|
2 |
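Identificación de las tendencias de reclamos presentes en reclamos.cl y que apunten contra instituciones de educación y organizaciones públicas / Identifying trends in the complaints posted on reclamos.cl against educational institutions and public organizations
Beth Madariaga, Daniel Guillermo January 2012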
Industrial Civil Engineer / This thesis seeks to verify, through a practical, applied experiment, whether Web Opinion Mining (WOM) techniques and software tools make it possible to determine the general trends in a set of opinions found on the Web; specifically, the complaints published on the website Reclamos.cl against institutions in the national education and government sectors.
In this regard, consumers increasingly use the Web to publish their positive and negative assessments of what they purchase, which makes the Web a gold mine for many institutions, especially for identifying the strengths and weaknesses of the products and services they offer and their public image, among other aspects.
Concretely, the experiment is carried out by building and running a software application that integrates and implements WOM concepts such as Knowledge Discovery from Data (KDD), as the methodological framework for reaching the stated objective, and Latent Dirichlet Allocation (LDA), for detecting topics in the content of the complaints under study. The application also uses object-oriented programming in Python and data storage in relational databases, and incorporates off-the-shelf tools to simplify certain required tasks.
Running the application downloaded the web pages containing the complaints of interest for the experiment and detected 6,460 such complaints in them, directed at 245 institutions and published between 13 July 2006 and 5 December 2011.
The application also processed the content of the complaints using stop-word lists and lemmatization tools, leaving only the canonical forms of the words that composed them and contributed meaning.
The application then ran several LDA analyses on this content. These were arbitrarily defined to run for each detected institution, both on the full set of its complaints and on subsets grouped by year of publication, so that each analysis produced results consisting of 20 topics of 30 words each.
From the results of the LDA analyses, and by manually reading and interpreting the words that made up each set of topics, phrases and sentences were written to tie those words together, yielding an interpretation that reflects the trend of the complaints represented in those results.
It was concluded that the general trends of the complaints can be detected using WOM techniques, but with a caveat: because the trends are determined through manual interpretation, subjectivity may arise regarding what those trends point to, owing to the interests, experiences, and other characteristics of the person interpreting the results.
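The preprocessing step in this abstract (discarding listed words and reducing the rest to canonical forms before topic detection) can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list, the lemma table, and the `preprocess` function are hypothetical stand-ins for the off-the-shelf tools the thesis mentions but does not name.

```python
import re

# Hypothetical stop-word list and lemma table, for illustration only.
STOP_WORDS = {"el", "la", "los", "las", "de", "y", "que", "me", "no", "un", "una", "en"}
LEMMAS = {"cobraron": "cobrar", "cuotas": "cuota", "devolvieron": "devolver"}

def preprocess(text):
    """Lower-case, tokenize, drop stop words, and map tokens to canonical forms."""
    tokens = re.findall(r"[a-záéíóúñü]+", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOP_WORDS]

print(preprocess("Me cobraron dos cuotas y no devolvieron el dinero"))
```

The resulting token lists are the kind of input a topic model such as LDA would then consume, one list per complaint.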
|
3 |
A Framework for How to Make Use of an Automatic Passenger Counting System
Fihn, John, Finndahl, Johan January 2011
Most modern cities today face severe traffic congestion, a consequence of the increasing use of private motor vehicles. Public transport plays a crucial role in reducing this traffic, but to be an attractive alternative to private motor vehicles it needs to provide services that suit citizens' travel requirements. A system that can give transit agencies rapid feedback about the usage of their transport network is the Automatic Passenger Counting (APC) system, which registers the number of passengers boarding and alighting a vehicle. Knowledge about passengers' travel behaviour can be used by transit agencies to adapt and improve their services to satisfy these requirements, but to gain this knowledge transit agencies need to know how to use an APC system. This thesis investigates how a transit agency can make use of an APC system. The research took place in Melbourne, where Yarra Trams, operator of the tram network, is now putting effort into working out how to utilise the APC system. A theoretical framework based on theories about Knowledge Discovery from Data, System Development, and Human Computer Interaction is built, tested, and evaluated in a case study at Yarra Trams. The case study resulted in a software system that can process and model Yarra Trams' APC data. The result of the research is a proposed framework consisting of different steps and events that can be used as a guide by a transit agency that wants to make use of an APC system.
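As a small illustration of what can be derived from boarding and alighting counts, the sketch below computes the passenger load after each stop as a running sum of boardings minus alightings. The counts and the `passenger_load` function are hypothetical and are not taken from the Yarra Trams case study.

```python
from itertools import accumulate

def passenger_load(boardings, alightings):
    """Return the number of passengers on board after each stop."""
    if len(boardings) != len(alightings):
        raise ValueError("boardings and alightings must cover the same stops")
    net = [b - a for b, a in zip(boardings, alightings)]
    return list(accumulate(net))

# Hypothetical APC counts for one trip over four stops.
boardings = [12, 8, 5, 0]
alightings = [0, 3, 10, 12]
load = passenger_load(boardings, alightings)
peak_stop = max(range(len(load)), key=load.__getitem__)
print(load, "peak load after stop index", peak_stop)
```

A per-stop load profile like this is one simple input for questions such as where a route is most crowded.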
|
4 |
Approche évolutionnaire et agrégation de variables : application à la prévision de risques hydrologiques / Evolutionary approach and variable aggregation : application to hydrological risks forecasting
Segretier, Wilfried 10 December 2013
Les travaux de recherche présentés dans ce mémoire s'inscrivent dans la lignée des approches de modélisation hydrologiques prédictives dirigées par les données. Nous avons particulièrement développé leur application sur le contexte difficile des phénomènes de crues éclair caractéristiques des bassins versants de la région Caraïbe, qui pose un défi sécuritaire. En envisageant le problème de la prévision de crues comme un problème d'optimisation combinatoire difficile, nous proposons d'utiliser la notion de métaheuristiques, à travers les algorithmes évolutionnaires notamment, pour leur capacité à parcourir efficacement de grands espaces de recherche et à fournir des solutions de bonne qualité en des temps d'exécution raisonnables. Nous avons présenté l'approche de prédiction AV2D : Aggregate Variable Data Driven, dont le concept central est la notion de variable agrégée. L'idée sous-jacente à ce concept est de considérer le pouvoir prédictif de nouvelles variables définies comme le résultat de fonctions statistiques, dites d'agrégation, calculées sur des données correspondant à des périodes de temps précédant un événement à prédire. Ces variables sont caractérisées par des ensembles de paramètres correspondant à leurs propriétés. Nous avons introduit les variables agrégées hydrométéorologiques permettant de répondre au problème de la classification d'événements hydrologiques. La complexité du parcours de l'espace de recherche engendré par les paramètres définissant ces variables a été prise en compte grâce à la mise en oeuvre d'un algorithme évolutionnaire particulier dont les composants ont été spécifiquement définis pour ce problème. Nous avons montré, à travers une étude comparative avec d'autres approches de modélisation dirigées par les données, menée sur deux cas d'études de bassins versants caribéens, que l'approche AV2D est particulièrement bien adaptée à leur contexte.
Nous étudions par la suite les bénéfices offerts par les approches de modélisation hydrologiques modulaires dirigées par les données, en définissant un procédé de division en sous-processus prenant en compte les caractéristiques particulières des bassins versants auxquels nous nous intéressons. Nous avons proposé une extension des travaux précédents à travers la définition d'une approche de modélisation modulaire SM2D : Spatial Modular Data Driven, consistant à considérer des sous-processus en divisant l'ensemble des exemples à classifier en sous-ensembles correspondant à des comportements hydrologiques homogènes. Nous avons montré, à travers une étude comparative avec d'autres approches dirigées par les données mises en oeuvre sur les mêmes sous-ensembles de données, que cette approche permet d'améliorer les résultats de prédiction, particulièrement à court terme. Nous avons enfin proposé la modélisation d'un outil de pi / The work presented in this thesis is in the area of data-driven hydrological modeling approaches. We particularly investigated their application to the difficult problem of flash flood phenomena typically observed in Caribbean watersheds. By considering the problem of flood prediction as a combinatorial optimization problem, we propose to use the notion of metaheuristics, through evolutionary algorithms, especially for their capacity to explore large search spaces efficiently and to provide good solutions in reasonable execution times. We proposed the hydrological prediction approach AV2D: Aggregate Variable Data Driven, whose central concept is the notion of aggregate variable. The underlying idea of this concept is to consider the predictive power of new variables defined as the results of statistical functions, called aggregation functions, computed on data corresponding to time periods before an event to predict. These variables are characterized by sets of parameters corresponding to their specifications.
We introduced hydro-meteorological aggregate variables to address the classification problem of hydrological events. We showed, through a comparative study on two typical Caribbean watersheds using several common data-driven modelling techniques, that the AV2D approach is particularly well suited to the studied context. We also study the benefits offered by modular approaches through the definition of the SM2D: Spatial Modular Data Driven approach, which considers sub-processes partly defined by spatial criteria. We showed that the results obtained by AV2D on these sub-processes improve performance, particularly for short-term prediction. Finally, we proposed the modelling of a generic control tool for hydro-meteorological prediction systems, H2FCT: Hydro-meteorological Flood Forecasting Control Tool.
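The notion of an aggregate variable, a statistical function computed over a time window preceding the event to predict, can be sketched as below. This is an assumption-laden illustration rather than the AV2D implementation: the parameter names (`window`, `lag`), the rainfall series, and the choice of `sum`, `max`, and `mean` as aggregation functions are all hypothetical.

```python
from statistics import mean

def aggregate_variable(series, event_index, func, window, lag=0):
    """Apply `func` to the `window` samples that end `lag` steps before the event."""
    end = event_index - lag
    start = end - window
    if start < 0:
        raise ValueError("not enough history before the event")
    return func(series[start:end])

# Hypothetical hourly rainfall (mm); the event to predict occurs at index 5.
rainfall = [0, 2, 5, 20, 35, 10, 0]
event = 5
total_3h = aggregate_variable(rainfall, event, sum, window=3)
max_2h_lag1 = aggregate_variable(rainfall, event, max, window=2, lag=1)
mean_4h = aggregate_variable(rainfall, event, mean, window=4)
print(total_3h, max_2h_lag1, mean_4h)
```

Each combination of function, window, and lag defines a different candidate variable, which gives a sense of why the abstract treats the choice of parameters as a large search space.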
|
5 |
Empirické porovnání systémů dobývání znalostí z databází / Empirical Comparison of Knowledge Discovery in Databases Systems
Dopitová, Kateřina January 2010
This diploma thesis presents an empirical comparison of knowledge discovery in databases systems. It defines the basic terms and methods of the knowledge discovery in databases domain and sets out the criteria used to compare the systems. The tested software products are also briefly described. For each system, the results of processing a real task are presented. The thesis then compares the individual systems against the previously defined criteria and compares the competitiveness of commercial and non-commercial knowledge discovery in databases systems.
|
6 |
Dolovací moduly systému pro dolování z dat v prostředí Oracle / Mining Modules of the Data Mining System in Oracle
Mader, Pavel January 2009
This master's thesis deals with data mining and with extending a data mining system for the Oracle environment developed at FIT. So far, the system cannot be applied in real-life conditions because no data mining modules are available. The design of the system's core application includes an interface for adding mining modules. Until now, this interface has been tested only on a sample mining module that performs no actual work and merely demonstrates how the interface is used. The main focus of this thesis is to study this interface and to implement a functional mining module that tests the interface's applicability. An association rule mining module was selected for implementation.
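For readers unfamiliar with association rule mining, the two standard measures behind it, support and confidence, can be sketched in a few lines. This is a generic illustration with hypothetical transactions and function names; it does not reflect the module implemented in the thesis or any Oracle interface.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support(transactions, {"bread", "milk"})
c = confidence(transactions, {"bread"}, {"milk"})
print(f"support={s:.2f} confidence={c:.2f}")
```

A rule such as bread -> milk is typically reported only when both values exceed user-chosen thresholds.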
|
7 |
Knowledge discovery for moderating collaborative projects
Choudhary, Alok K. January 2009
In today's global market environment, enterprises are increasingly turning towards collaboration in projects to leverage their resources, skills and expertise, and simultaneously address the challenges posed in diverse and competitive markets. Moderators, which are knowledge-based systems, have successfully been used to support collaborative teams by raising awareness of problems or conflicts. However, the functioning of a Moderator is limited to the knowledge it has about the team members. Knowledge acquisition, learning and updating of knowledge are the major challenges for a Moderator's implementation. To address these challenges, a Knowledge discOvery And daTa minINg inteGrated (KOATING) framework is presented for Moderators to enable them to continuously learn from the operational databases of the company and semi-automatically update the corresponding expert module. The architecture for the Universal Knowledge Moderator (UKM) shows how the existing Moderators can be extended to support global manufacturing. A method for designing and developing the knowledge acquisition module of the Moderator for manual and semi-automatic update of knowledge is documented using the Unified Modelling Language (UML). UML has been used to explore the static structure and dynamic behaviour, and to describe the system analysis, system design and system development aspects of the proposed KOATING framework. The proof of design has been presented using a case study for a collaborative project in the form of a construction project supply chain. It has been shown that Moderators can "learn" by extracting various kinds of knowledge from Post Project Reports (PPRs) using different types of text mining techniques. Furthermore, it is also proposed that the knowledge discovery integrated Moderators can be used to support and enhance collaboration by identifying appropriate business opportunities and identifying corresponding partners for the creation of a virtual organization.
A case study is presented in the context of a UK-based SME. Finally, the thesis concludes by summarizing the work, outlining its novelties and contributions, and recommending future research.
|
8 |
Empirické porovnání volně dostupných systémů dobývání znalostí z databází / Empirical comparison of free software suites for knowledge discovery from data
Kasík, Josef January 2009
The topic and main objective of this diploma thesis is a comparison of free data mining suites. The subjects of the comparison are six applications developed in university projects as experimental tools for data mining and as tools for education. The comparison criteria are derived from four general aspects that form the basis for further analysis. Each system is evaluated as a tool for handling real-time data mining tasks, as a tool supporting the various phases of the CRISP-DM methodology, as a tool capable of practical use on certain data, and as a general software system. These aspects yield 31 specific comparison criteria, each of which was evaluated through a thorough analysis of each system. The results of the comparison confirmed the initial expectation: the Weka data mining suite was rated the best tool. The main advantages of Weka are its large number of machine learning algorithms, its numerous data preparation tools, and its processing speed.
|
9 |
Une approche pour l'évaluation des systèmes d'aide à la décision mobiles basés sur le processus d'extraction des connaissances à partir des données : application dans le domaine médical / An approach for the evaluation of mobile decision support systems based on a knowledge discovery from data process : application in the medical field
Borcheni, Emna 27 March 2017
Dans ce travail, on s’intéresse aux Systèmes d’Aide à la Décision Mobiles qui sont basés sur le processus d’Extraction des Connaissances à partir des Données (SADM/ECD). Nous contribuons non seulement à l'évaluation de ces systèmes, mais aussi à l'évaluation dans le processus d’ECD lui-même. L'approche proposée définit un module de support d'évaluation pour chaque module composant le processus d’ECD en se basant sur des modèles de qualité. Ces modules évaluent non seulement la qualité d'utilisation de chaque module logiciel composant le processus d’ECD, mais aussi d'autres critères qui reflètent les objectifs de chaque module de l’ECD. Notre objectif est d'aider les évaluateurs à détecter des défauts le plus tôt possible pour améliorer la qualité de tous les modules qui constituent un SADM/ECD. Nous avons aussi pris en compte le changement de contexte d'utilisation en raison de la mobilité. De plus, nous avons proposé un système d’aide à l’évaluation, nommé CEVASM : Système d’aide à l’évaluation basée sur le contexte pour les SADM, qui contrôle et mesure tous les facteurs de qualité proposés. Finalement, l'approche que nous proposons est appliquée pour l'évaluation des modules d'un SADM/ECD pour la lutte contre les infections nosocomiales à l'hôpital Habib Bourguiba de Sfax, Tunisie. Lors de l'évaluation, nous nous sommes basés sur le processus d'évaluation ISO/IEC 25040. L'objectif est de pouvoir valider, a priori, l'outil d'évaluation réalisé (CEVASM) et par conséquent, l'approche proposée. / In this work, we are interested in Mobile Decision Support Systems (MDSS) that are based on the Knowledge Discovery from Data process (MDSS/KDD). Our work addresses not only the evaluation of these systems but also evaluation within the KDD process itself. The proposed approach adds an evaluation support module to each software module composing the KDD process, based on quality models.
The proposed evaluation support modules make it possible to evaluate not only the quality in use of each module composing the KDD process but also other criteria that reflect the objectives of each KDD module. Our main goal is to help evaluators detect defects as early as possible in order to enhance the quality of all the modules that constitute an MDSS/KDD. We also present a context-based method that takes into account the change in context of use due to mobility. In addition, we propose an evaluation support system that monitors and measures all the proposed criteria. Furthermore, we present the implementation of the proposed approach. These developments mainly concern the proposed evaluation tool, CEVASM: Context-based EVAluation support System for MDSS. Finally, the proposed approach is applied to the evaluation of the modules of an MDSS/KDD for the fight against nosocomial infections at Habib Bourguiba hospital in Sfax, Tunisia. For every module in KDD, we focus on the evaluation phase. We follow the evaluation process based on the ISO/IEC 25040 standard. The objective is to validate, a priori, the evaluation tool developed (CEVASM) and, consequently, the proposed approach.
|
10 |
Dolovací moduly systému pro dolování z dat na platformě NetBeans / Mining Modules of Data Mining System on NetBeans Platform
Henkl, Tomáš January 2009
This master's thesis deals with knowledge discovery in databases and with extending a data mining system for the Oracle environment developed at VUT FIT. The design of the system kernel incorporates an interface that enables data mining modules to be added. The objective of the thesis is to study this interface and to implement a data mining module for decision-tree classification and embed it in the application. In addition, the thesis compares the application with a similar commercial product, SAS Enterprise Miner.
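As background on decision-tree classification, the sketch below shows one common way a tree chooses the attribute to split on: information gain, as used in ID3-style algorithms. The abstract does not say which algorithm the module implements, so this choice, the toy data, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting the rows on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical training data: two attributes, binary class.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
print(best, gain_outlook, gain_windy)
```

A tree builder would split on the best attribute and repeat the same calculation on each resulting subset until the subsets are pure or no attributes remain.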
|