201 |
Uso de redes neurais artificiais para descoberta de conhecimento sobre a escolha do modo de viagem / Using artificial neural network for the discovery of mode travel choice knowledge
Wermersch, Fábio Glauco 09 May 2002
Esta pesquisa objetivou uma melhor compreensão do processo de escolha do modo de viagem. Empregou-se a abordagem indutiva dirigida a dados livre de suposições a priori da mineração em banco de dados (Data Mining), utilizando redes neurais artificiais (RNA) como ferramenta mineradora, à procura de conhecimento, ou informação útil, a respeito de escolha e capaz de indicar qual das estruturas de decisão subjacentes aos modelos de escolha modal considerados mais se aproximaria ao do observado. Partindo-se da ideia de que nesse processo exista um padrão o qual pode ser captado por uma RNA, ajustou-se um modelo de RNA aos dados e extraiu-se então o conhecimento contido no modelo de RNA ajustado através de um algoritmo de extração de árvore de decisão de RNA chamado Trepan (Trees parroting network), que foi analisado e interpretado à luz dos objetivos desta pesquisa. Os dados que foram utilizados nesse processo de descoberta de conhecimento são provenientes de uma pesquisa de entrevista domiciliar realizada na cidade de Bauru - SP, para fins de estimativa da matriz de deslocamentos origem-destino dessa cidade. Obtiveram-se quatro árvores de decisão com estruturas simples e com a acurácia preditiva de 75% aproximadamente para os três modos de viagem estudados. Embora o conhecimento extraído dos modelos neurais ajustados não tenha proporcionado a indicação de qual das estruturas de decisão subjacentes aos modelos de escolha modal mais se aproxima da obtida com o modelo neural, foi constatada nas árvores resultantes do processo de descoberta do conhecimento uma relação de compensação entre o atributo sexo e os atributos relacionados à capacidade econômica do domicílio na decisão de escolha do modo carro para a realização de uma viagem. Os resultados também sugerem a não necessidade de mais um atributo de entrada referente ao deslocamento realizado em uma viagem para modelagem por RNA do processo de escolha do modo de viagem no contexto estudado.
/ This research aimed at a better understanding of the travel mode choice process. The inductive, data-driven approach of data mining, which is free from a priori assumptions, was employed, using artificial neural networks (ANN) as the mining tool. The goal was to find knowledge, or useful information, about the choice process that could indicate which of the decision structures underlying the modal choice models considered comes closest to the observed one. On the assumption that the process contains a pattern that an ANN can capture, an ANN model was fitted (trained) to the data, and the knowledge contained in the trained model was extracted with Trepan (Trees parroting network), an algorithm that extracts decision trees from ANNs. The extracted knowledge was analysed and interpreted in the light of the objectives of this research. The data used in this knowledge discovery process come from a household interview survey carried out in Bauru - SP to estimate the city's origin-destination (O-D) matrix. Four decision trees with simple structures and a predictive accuracy of approximately 75% for the three travel modes studied were obtained. Although the knowledge extracted from the trained ANN models did not indicate which of the decision structures underlying the modal choice models is closest to the one obtained with the neural model, the resulting trees revealed a compensating relation between the sex attribute and the attributes related to the household's economic capacity in the decision to choose the car mode for a trip. The results also suggest that, in the context studied, ANN modelling of the travel mode choice process does not need more than one input attribute describing the displacement performed in a trip.
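The idea of treating a trained network as an oracle and fitting a tree to its answers can be illustrated with a minimal Python sketch. It is not the Trepan algorithm itself (Trepan uses m-of-n splits and best-first tree growth), and it does not use the Bauru survey data. The function `network` is a hypothetical stand-in for a trained ANN, the attributes `income` and `has_licence` are invented for illustration, and the "tree" is a single threshold split chosen to agree with the network as often as possible (its fidelity).

```python
import random


def network(income, has_licence):
    """Hypothetical stand-in for a trained ANN: returns 'car' or 'other'."""
    score = 0.8 * income + 0.5 * has_licence - 0.9
    return "car" if score > 0 else "other"


def extract_stump(oracle, n_queries=2000, seed=0):
    """Query the oracle on sampled inputs and pick the single threshold
    split on income that best reproduces its answers."""
    rng = random.Random(seed)
    samples = [(rng.random() * 2, rng.randint(0, 1)) for _ in range(n_queries)]
    labels = [oracle(x, h) for x, h in samples]
    best_threshold, best_fidelity = None, -1.0
    for t in [i / 20 for i in range(41)]:  # candidate thresholds 0.0 .. 2.0
        agree = sum(
            ("car" if x > t else "other") == y
            for (x, _), y in zip(samples, labels)
        )
        fidelity = agree / n_queries
        if fidelity > best_fidelity:
            best_threshold, best_fidelity = t, fidelity
    return best_threshold, best_fidelity


threshold, fidelity = extract_stump(network)
print(f"if income > {threshold:.2f} then car else other (fidelity {fidelity:.2f})")
```

Fidelity here measures agreement with the network, not with observed choices; the 75% figure in the abstract is a predictive accuracy and is unrelated to the numbers this toy example produces.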
|
202 |
Zpracování asociačních pravidel metodou vícekriteriálního shlukování / Post-processing of association rules by multicriterial clustering method
Kejkula, Martin January 2002
Association rule mining is one of several approaches to knowledge discovery in databases. Paradoxically, data mining itself can produce so many association rules that it creates a new knowledge management problem: thousands of association rules, or even more, can easily hold in a single data set. The goal of this work is to design a new method for post-processing association rules. The method should be independent of software and domain. Its output should be a structured description of the whole set of discovered association rules that helps the user work with them. The approach taken is to split the association rules into clusters, where each cluster contains rules that are more similar to each other than to rules from other clusters. The output of the method is the definition and description of these clusters. The main contribution of this Ph.D. thesis is the new Multicriterial clustering association rules method described here. A secondary contribution is a discussion of previously published methods for post-processing association rules. The new method produces clusters of rules that cannot be obtained with any of the earlier post-processing methods. According to user expectations, these clusters are more relevant and more effective than earlier association rule clustering results. The method is based on two orthogonal clusterings of the same set of association rules. The first clustering is based on interestingness measures (confidence, support, interest, etc.). The second is inspired by document clustering in information retrieval. Representing rules as vectors, in the same way as documents, is fundamental to this thesis. The thesis is organized as follows. Chapter 2 identifies the role of association rules in the KDD (knowledge discovery in databases) process, using KDD methodologies (CRISP-DM, SEMMA, GUHA, RAMSYS). Chapter 3 defines association rules and introduces their characteristics (including interestingness measures). Chapter 4 introduces current association rule post-processing methods. Chapter 5 is an introduction to cluster analysis. Chapter 6 describes the new Multicriterial clustering association rules method. Chapter 7 consists of several experiments. Chapter 8 discusses possible uses and further development of the new method.
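The interestingness measures that feed the first clustering can be made concrete with a short Python sketch. It computes support, confidence and lift for one rule over a toy transaction set. Lift is used here as one common form of the "interest" measure, which is an assumption, since the abstract does not define it. The transactions and the function name `rule_measures` are invented for illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(a <= t for t in transactions)          # transactions containing A
    n_c = sum(c <= t for t in transactions)          # transactions containing C
    n_ac = sum((a | c) <= t for t in transactions)   # transactions containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Toy data, not taken from the thesis.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_measures(transactions, {"bread"}, {"milk"})
print(support, confidence, lift)
```

Each rule then becomes a small numeric vector of such measures, which is the kind of input a clustering step could work on.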
|
204 |
Mining large amounts of mobile object data / Истраживање великих количина података о покретним објектима / Istraživanje velikih količina podataka o pokretnim objektima
Gavrić Katarina 22 December 2017
Within this thesis, we examined the possibilities of using the increasing amount of publicly available metadata about locations and people's activities in order to gain new knowledge and develop new models of the behavior and movement of people. The purpose of the research was to solve practical problems such as analyzing attractive tourist sites, identifying the routes people take most frequently, identifying the main modes of transport, and discovering behavioral patterns that can inform strategies to suppress the spread of viral infections. A practical study was carried out on the basis of protected (aggregated and anonymized) CDR (Call Detail Records) data and metadata of geo-referenced multimedia content. / Предмет и циљ истраживања докторске дисертације представља евалуација могућности коришћења све веће количине јавно доступних података о локацији и кретању људи, како би се дошло до нових сазнања, развили нови модели понашања и кретања људи који се могу применити за решавање практичних проблема као што су: анализа атрактивних туристичких локација, откривање путања кретања људи и средстава транспорта које најчешће користе, као и откривање важних параметара на основу којих се може развити стратегија за заштиту нације од инфективних болести итд. У раду је у ту сврхе спроведена практична студија на бази заштићених (агрегираних и анонимизираних) ЦДР података и метаподатака гео-референцираног мултимедијалног садржаја. Приступ је заснован на примени техника вештачке интелигенције и истраживања података. / Predmet i cilj istraživanja doktorske disertacije predstavlja evaluacija mogućnosti korišćenja sve veće količine javno dostupnih podataka o lokaciji i kretanju ljudi, kako bi se došlo do novih saznanja, razvili novi modeli ponašanja i kretanja ljudi koji se mogu primeniti za rešavanje praktičnih problema kao što su: analiza atraktivnih turističkih lokacija, otkrivanje putanja kretanja ljudi i sredstava transporta koje najčešće koriste, kao i otkrivanje važnih parametara na osnovu kojih se može razviti strategija za zaštitu nacije od infektivnih bolesti itd. U radu je u tu svrhe sprovedena praktična studija na bazi zaštićenih (agregiranih i anonimiziranih) CDR podataka i metapodataka geo-referenciranog multimedijalnog sadržaja. Pristup je zasnovan na primeni tehnika veštačke inteligencije i istraživanja podataka.
|
205 |
Geospatial Knowledge Discovery using Volunteered Geographic Information : a Complex System Perspective
Jia, Tao January 2012
The continuous progression of urbanization has resulted in an increasing number of people living in cities or towns. In parallel, advancements in technologies, such as the Internet, telecommunications, and transportation, have allowed for better connectivity among people. This has engendered drastic changes in urban systems during recent decades. From a social geographic perspective, the changes in urban systems are primarily characterized by intensive contacts among people and their interactions with the surrounding urban environment, which in turn lead to challenging problems such as traffic jams, environmental pollution, urban sprawl, etc. These problems have been reported to be heterogeneous and non-deterministic. Hence, to cope with them, massive amounts of geographic data are required to create new knowledge on urban systems. Taking advantage of the growth of Volunteered Geographic Information (VGI) in recent years, this thesis presents knowledge on urban systems based on extensive VGI datasets from three sources: the highway dataset from the OpenStreetMap (OSM) project, the photo location dataset from the Flickr website, and GPS tracking datasets from volunteers, taxicabs, and air flights. The knowledge primarily relates to two issues of urban systems: the urban space and the corresponding human dynamics. On the one hand, urban space acts as a carrier for associated geographic activities, and knowledge of it benefits our understanding of current social and economic problems in urban systems. On the other hand, human dynamics reflect human behavior in urban space, which leads to complex mobility or activity patterns. Investigating them allows the underlying driving forces to be derived, which is instructive for urban planning, traffic management, and infectious disease control. Therefore, to fully understand the two issues, this thesis conducts a thorough investigation from multiple aspects. The first issue is investigated from four aspects.
First, at the city level, the controversial topic of city size regularity is investigated in terms of natural cities, and the conclusion is that Zipf’s law holds stably for all US cities. Second, at the sub-city level, the size distribution of spatial units within different cities, in terms of the clusters formed by street nodes, photo locations, and taxi static points, is explored, and the result shows a remarkable scaling property of these spatial units. Third, inspired by the scaling property of the urban space at the city or sub-city level, this thesis devises a novel tool that can demarcate cities into three categories: compact cities, normal cities, and sprawling cities. The tool is then applied to cities in both the US and three European countries. Finally, another representation of urban space is taken into account, namely the transportation network. The findings report that the US airport network displays scale-free, small-world, and disassortative mixing properties and that the individual natural airports show heterogeneous patterns that are probably subject to geographic constraints and socioeconomic factors. The second issue is examined from four perspectives. First, at the city level, the movement flow contributed by agents using two types of behavior is investigated through an agent-based simulation, and the result suggests that human mobility behavior is mainly shaped by the underlying street network. Second, at the country level, this thesis reports that the human travel length by air can be approximated well by an exponential distribution, and subsequent simulations indicate that human mobility behavior is largely constrained by the underlying airport network. Third, at the regional level, the length that humans travel by car is demonstrated to agree well with a power law with exponential cutoff distribution, and subsequent simulation further reproduces this Lévy flight characteristic.
Based on the simulation, human mobility behavior is again revealed to be primarily shaped by the underlying hierarchical spatial structure. Finally, taxicab static points are adopted to explore human activity patterns, which can be characterized by regularities in space and time and by heterogeneity and predictability in space. From a complex system perspective, this thesis presents the knowledge discovered in urban systems using massive volumes of geographic data. Together with new knowledge from empirical findings, the development of methods, and the design of theoretic models, this thesis also shares with the research community the geographic data generated from extensive VGI datasets and the corresponding source code. Moreover, this study is aligned with a paradigm shift in that it analyzes large datasets using high processing power as opposed to analyzing small datasets with low processing power.
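A Zipf-type regularity of the kind reported for city sizes can be checked roughly by regressing log size on log rank: a slope close to -1 is what Zipf's law predicts. The Python sketch below shows that check on synthetic data. It is an illustration only, not the thesis's procedure, and the least-squares fit on a log-log plot is a rough diagnostic rather than a rigorous power-law test. The function name `rank_size_slope` and the data are invented.

```python
import math


def rank_size_slope(sizes):
    """Least-squares slope of log(size) against log(rank)."""
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(size) for size in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic sizes that follow size = 1e6 / rank exactly (not real city data).
synthetic_sizes = [1_000_000 / rank for rank in range(1, 101)]
slope = rank_size_slope(synthetic_sizes)
print(f"rank-size slope: {slope:.3f}")
```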
|
206 |
Μέθοδοι και τεχνικές ανακάλυψης γνώσης στο σημαντικό ιστό : παραγωγική απόκτηση γνώσης από οντολογικά έγγραφα και η τεχνική της σημασιακής προσαρμογής / Methods and techniques for semantic web knowledge discovery : deductive knowledge acquisition from ontology documents and the semantic profiling technique
Κουτσομητρόπουλος, Δημήτριος 03 August 2009
Ο Σημαντικός Ιστός (Semantic Web) είναι ένας συνδυασμός τεχνολογιών και προτύπων με σκοπό να προσδοθεί στη διαδικτυακή πληροφορία αυστηρά καθορισμένη σημασιακή δομή και ερμηνεία. Στόχος είναι να μπορούν οι χρήστες του Παγκόσμιου Ιστού καθώς και αυτοματοποιημένοι πράκτορες να επεξεργάζονται, να διαχειρίζονται και να αξιοποιούν την κατάλληλα χαρακτηρισμένη πληροφορία με τρόπο ευφυή και αποδοτικό.
Ωστόσο, παρά τις τεχνικές που έχουν κατά καιρούς προταθεί, δεν υπάρχει ξεκάθαρη μέθοδος ώστε, αξιοποιώντας το φάσμα του Σημαντικού Ιστού, η διαδικτυακή πληροφορία να ανακτάται με τρόπο παραγωγικό, δηλαδή με βάση τα ήδη εκπεφρασμένα γεγονότα να συνάγεται νέα, άρρητη πληροφορία.
Για την αντιμετώπιση αυτής της κατάστασης, αρχικά εισάγεται και προσδιορίζεται το πρόβλημα της Ανακάλυψης Γνώσης στο Σημαντικό Ιστό (Semantic Web Knowledge Discovery, SWKD). Η Ανακάλυψη Γνώσης στο Σημαντικό Ιστό εκμεταλλεύεται το σημασιακό υπόβαθρο και τις αντίστοιχες σημασιακές περιγραφές των πληροφοριών, όπως αυτές είναι θεμελιωμένες σε μια λογική θεωρία (οντολογίες εκφρασμένες σε γλώσσα OWL). Βάσει αυτών και με τη χρήση των κατάλληλων μηχανισμών αυτοματοποιημένου συλλογισμού μπορεί να συμπεραθεί νέα, άδηλη γνώση, η οποία, μέχρι τότε, μόνο υπονοούνταν στα ήδη υπάρχοντα δεδομένα.
Για να απαντηθεί το ερώτημα αν και σε πιο βαθμό οι τεχνολογίες και η λογική θεωρία του Σημαντικού Ιστού συνεισφέρουν αποδοτικά και εκφραστικά στο πρόβλημα της SWKD καταρτίζεται μια πρότυπη Μεθοδολογία Ανακάλυψης Γνώσης στο Σημαντικό Ιστό, η οποία θεμελιώνεται σε πρόσφατα θεωρητικά αποτελέσματα, αλλά και στην ποιοτική και πειραματική συγκριτική αξιολόγηση διαδεδομένων μηχανισμών συμπερασμού (inference engines) που βασίζονται σε Λογικές Περιγραφής (Description Logics). H αποδοτικότητα και η εκφραστικότητα της μεθόδου αυτής δείχνεται ότι εξαρτώνται από συγκεκριμένους θεωρητικούς, οργανωτικούς και τεχνικούς περιορισμούς.
Η πειραματική επαλήθευση της μεθοδολογίας επιτυγχάνεται με την κατασκευή και επίδειξη της Διεπαφής Ανακάλυψης Γνώσης (Knowledge Discovery Interface) μιας κατανεμημένης δηλαδή δικτυακής υπηρεσίας, η οποία έχει εφαρμοστεί με επιτυχία σε πειραματικά δεδομένα. Τα αποτελέσματα που προκύπτουν με τη χρήση της διεπαφής επαληθεύουν, μέχρι ορισμένο βαθμό, τις υποθέσεις που έχουν γίνει σχετικά κυρίως με την παράμετρο της εκφραστικότητας και δίνουν το έναυσμα για την αναζήτηση και εξέταση της υποστήριξης των νέων προτεινόμενων επεκτάσεων της λογικής θεωρίας του Σημαντικού Ιστού, δηλαδή της γλώσσας OWL 1.1.
Για την ενίσχυση της εκφραστικότητας της ανακάλυψης γνώσης στην περίπτωση συγκεκριμένων πεδίων γνώσης (knowledge domains) εισάγεται μια νέα τεχνική, αποκαλούμενη Σημασιακή Προσαρμογή. Η τεχνική αυτή εξελίσσει την Προσαρμογή Μεταδεδομένων Εφαρμογής (Metadata Application Profiling) από μια επίπεδη συρραφή και συγχώνευση σχημάτων και πεδίων μεταδεδομένων, σε μία ουσιαστική επέκταση και σημασιακή αναγωγή και εμπλουτισμό του αντίστοιχου μοντέλου στο οποίο εφαρμόζεται. Έτσι, η σημασιακή προσαρμογή εξειδικεύει ένα οντολογικό μοντέλο ως προς μια συγκεκριμένη εφαρμογή, όχι απλά με την προσθήκη λεξιλογίου από ετερογενή σχήματα, αλλά μέσω της σημασιακής εμβάθυνσης (semantic intension) και εκλέπτυνσης (semantic refinement) του αρχικού μοντέλου. Η τεχνική αυτή και τα αποτελέσματά της επαληθεύονται πειραματικά με την εφαρμογή στο μοντέλο πληροφοριών πολιτιστικής κληρονομιάς CIDOC-CRM και δείχνεται ότι, με τη χρήση κατάλληλων μεθόδων, η γενική εφαρμοσιμότητα του μοντέλου μπορεί να διαφυλαχθεί.
Για να μπορεί όμως η Ανακάλυψη Γνώσης στο Σημαντικό Ιστό να δώσει ικανοποιητικά αποτελέσματα, απαιτούνται όσο το δυνατόν πληρέστερες και αυξημένες περιγραφές των δικτυακών πόρων. Παρόλο που πληροφορίες άμεσα συμβατές με τη λογική θεωρία του Σημαντικού Ιστού δεν είναι ευχερείς, υπάρχει πληθώρα δεδομένων οργανωμένων σε επίπεδα σχήματα μεταδεδομένων (flat metadata schemata). Διερευνάται επομένως αν η SWKD μπορεί να εφαρμοστεί αποδοτικά και εκφραστικά στην περίπτωση τέτοιων ημιδομημένων μοντέλων γνώσης, όπως για παράδειγμα στην περίπτωση του σχήματος μεταδεδομένων Dublin Core. Δείχνεται ότι το πρόβλημα αυτό ανάγεται μερικώς στην εφαρμογή της σημασιακής προσαρμογής στην περίπτωση τέτοιων μοντέλων, ενώ για τη διαφύλαξη της διαλειτουργικότητας και την επίλυση αμφισημιών που προκύπτουν εφαρμόζονται ανάλογες μέθοδοι και επιπλέον εξετάζεται η τεχνική της παρονομασίας (punning) που εισάγει η OWL 1.1, βάσει της οποίας ο ορισμός ενός ονόματος μπορεί να έχει κυμαινόμενη σημασιακή ερμηνεία ανάλογα με τα συμφραζόμενα.
Συμπερασματικά, οι νέες μέθοδοι που προτείνονται μπορούν να βελτιώσουν το πρόβλημα της Ανακάλυψης Γνώσης στο Σημαντικό Ιστό ως προς την εκφραστικότητα, ενώ ταυτόχρονα η πολυπλοκότητα παραμένει η μικρότερη δυνατή. Επιτυγχάνουν επίσης την παραγωγή πιο εκφραστικών περιγραφών από υπάρχοντα μεταδεδομένα, προτείνοντας έτσι μια λύση στο πρόβλημα της εκκίνησης (bootstrapping) για το Σημαντικό Ιστό. Παράλληλα, μπορούν να χρησιμοποιηθούν ως βάση για την υλοποίηση πιο αποδοτικών τεχνικών κατανεμημένου και αυξητικού συλλογισμού. / Semantic Web is a combination of technologies and standards in order to give Web information strictly defined semantic structure and meaning. Its aim is to enable Web users and automated agents to process, manage and utilize properly described information in intelligent and efficient ways.
Nevertheless, despite the various techniques that have been proposed, there is no clear method that takes advantage of Semantic Web technologies to retrieve information deductively, i.e. to infer new and implicit information from explicitly expressed facts.
To address this situation, the problem of Semantic Web Knowledge Discovery (SWKD) is first introduced and specified. SWKD takes advantage of the semantic underpinnings and semantic descriptions of information, organized in a logic theory (i.e. ontologies expressed in OWL). Through the use of appropriate automated reasoning mechanisms, SWKD then makes it possible to deduce new and unexpressed information that is only implied among explicit facts.
Whether, and to what extent, Semantic Web technologies and logic theory contribute efficiently and expressively enough to the SWKD problem is evaluated by establishing a SWKD methodology, which builds upon recent theoretical results as well as on the qualitative and experimental comparison of some popular inference engines based on Description Logics. It is shown that the efficiency and expressivity of this method depend on specific theoretical, organizational and technical limitations.
The experimental verification of this methodology is achieved through the development and demonstration of the Knowledge Discovery Interface (KDI), a web-distributed service that has been successfully applied on experimental data. The results taken through the KDI confirm, to a certain extent, the assumptions made mostly about expressivity and motivate the examination and investigation of the newly proposed extensions to the Semantic Web logic theory, namely the OWL 1.1 language.
In order to strengthen the expressivity of knowledge discovery in the case of particular knowledge domains a new technique is introduced, known as Semantic Profiling. This technique evolves traditional Metadata Application Profiling from a flat aggregation and mixing of schemata and metadata elements to the substantial extension and semantic enhancement and enrichment of the model on which it is applied. Thus, semantic profiling actually profiles an ontological model for a particular application, not only by bringing together vocabularies from disparate schemata, but also through the semantic intension and semantic refinement of the initial model. This technique and its results are experimentally verified through its application on the CIDOC-CRM cultural heritage information model and it is shown that, through appropriate methods, the general applicability of the model can be preserved.
However, for SWKD to be of much value, it requires the availability of rich and detailed resource descriptions. Even though information compatible with the Semantic Web logic theory is not always readily available, there is plenty of data organized in flat metadata schemata. To this end, it is investigated whether SWKD can be efficiently and expressively applied to such semi-structured knowledge models, as is the case for example with the Dublin Core metadata schema. It is shown that this problem can be partially reduced to applying semantic profiling to such models and, in order to retain interoperability and resolve potential ambiguities, the OWL 1.1 punning feature is investigated, based on which a name definition may have a variable semantic interpretation depending on the ontological context.
In conclusion, these newly proposed methods can improve the SWKD problem in terms of expressive strength, while keeping complexity as low as possible. They also contribute to the creation of expressive descriptions from existing metadata, suggesting a solution to the Semantic Web bootstrapping problem. Finally, they can be utilized as the basis for implementing more efficient techniques that involve distributed and incremental reasoning.
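The notion of deducing implicit facts from explicit ones can be illustrated with a deliberately small Python sketch. It forward-chains two subclass rules until nothing new appears, which is far simpler than the Description Logic reasoning the abstract refers to and is not the thesis's Knowledge Discovery Interface. The class and instance names are invented and are not actual CIDOC-CRM or Dublin Core terms.

```python
def infer_types(subclass_of, instance_of):
    """Forward-chain two rules until no new facts appear:
    (A subClassOf B, B subClassOf C) => A subClassOf C
    (x type A, A subClassOf B)       => x type B
    """
    subclass = set(subclass_of)
    types = set(instance_of)
    changed = True
    while changed:
        changed = False
        for a, b in list(subclass):
            for b2, c in list(subclass):
                if b == b2 and (a, c) not in subclass:
                    subclass.add((a, c))
                    changed = True
        for x, a in list(types):
            for a2, b in list(subclass):
                if a == a2 and (x, b) not in types:
                    types.add((x, b))
                    changed = True
    return types


# Hypothetical explicit facts.
explicit_subclass = {("Painting", "Artwork"), ("Artwork", "ManMadeObject")}
explicit_types = {("mona_lisa", "Painting")}

# Facts that were only implied by the explicit ones.
inferred = infer_types(explicit_subclass, explicit_types) - explicit_types
print(sorted(inferred))
```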
|
207 |
Discovery and Analysis of Aligned Pattern Clusters from Protein Family Sequences
Lee, En-Shiun Annie 28 April 2015
Protein sequences are essential for encoding molecular structures and functions. Consequently, biologists invest substantial resources and time discovering functional patterns in proteins. Using high-throughput technologies, biologists are generating an increasing amount of data. Thus, the major challenge in biosequencing today is the ability to conduct data analysis in an efficient and productive manner. Conserved amino acids in proteins reveal important functional domains within protein families. Conversely, less conserved amino acid variations within these protein sequence patterns reveal areas of evolutionary and functional divergence.
Exploring protein families using existing methods such as multiple sequence alignment is computationally expensive, thus pattern search is used. However, at present, combinatorial methods of pattern search generate a large set of solutions, and probabilistic methods require richer representations. They require biological ground truth of the input sequences, such as gene name or taxonomic species, as class labels based on traditional classification practice to train a model for predicting unknown sequences. However, these algorithms are inherently biased by mislabelling and may not be able to reveal class characteristics in a detailed and succinct manner.
A novel pattern representation called an Aligned Pattern Cluster (AP Cluster), developed in this dissertation, is compact yet rich. It captures conservations and variations of amino acids, covers more sequences with lower entropy, and greatly reduces the number of patterns. AP Clusters contain statistically significant patterns with variations; their importance has been confirmed by the following biological evidence: 1) Most of the discovered AP Clusters correspond to binding segments, while their aligned columns correspond to binding sites, as verified by Pfam, PROSITE, and the three-dimensional structure. 2) By compacting strongly correlated functional information together, AP Clusters are able to reveal class characteristics for taxonomical classes, gene classes and other functional classes, or incorrect class labelling. 3) AP Clusters that co-occur on the same homologous protein sequences are spatially close in the protein's three-dimensional structure.
These results demonstrate the power and usefulness of AP Clusters. They bring together similar statistically significant patterns with variation and align them to reveal protein regional functionality, class characteristics, and binding and interacting sites for the study of protein-protein and protein-drug interactions, for differentiation of cancer tumour types, for targeted gene therapy, and for drug target discovery.
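The role of entropy in describing conserved and variable columns can be sketched in a few lines of Python. The example computes the Shannon entropy of each column in a set of equal-length aligned patterns: 0 bits marks a fully conserved position and higher values mark variation. It illustrates only the entropy measure, not the AP Cluster discovery algorithm, and the patterns are invented rather than taken from the thesis.

```python
import math
from collections import Counter


def column_entropies(aligned_patterns):
    """Shannon entropy (bits) of each column of equal-length aligned patterns."""
    entropies = []
    for column in zip(*aligned_patterns):
        counts = Counter(column)
        total = len(column)
        entropies.append(
            -sum((n / total) * math.log2(n / total) for n in counts.values())
        )
    return entropies


# Invented aligned patterns: columns 1 and 3 are conserved, 2 and 4 vary.
patterns = ["GKST", "GKSS", "GRST", "GRSS"]
entropies = column_entropies(patterns)
print(entropies)
```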
|
208 |
Análise de desempenho de vendas em telecomunicações utilizando técnicas de mineração de dados / Analysis of business development in telecommunication using data mining techniques
Mattozo, Teófilo Camara 22 November 2007
Previous issue date: 2007-11-22 / Nowadays, telecommunications is one of the most dynamic and strategic areas in the world. Organizations constantly seek new management practices in an increasingly competitive environment where resources are becoming scarce. In this scenario, data obtained from business and corporate processes have even greater importance, although these data are not yet adequately explored. Knowledge Discovery in Databases (KDD) then appears as an option to allow the study of complex problems in different areas of management. This work proposes both a systematization of KDD activities using concepts from different methodologies, such as the CRISP-DM, SEMMA and FAYYAD approaches, and a study of the viability of multiple linear regression models to explain corporate telecommunications sales using performance indicators. Statistical methods were outlined to analyze the effects of such indicators on the behavior of sales productivity. Based on business and standard statistical analyses, the equations were defined and their respective coefficients of determination were adjusted. Hypothesis tests were also conducted on the parameters in order to validate the regression models. The results show that there is a relationship between these performance indicators and the volume of sales. / Telecomunicações é uma das mais dinâmicas e estratégicas áreas no mundo atual. Há constante necessidade das organizações buscarem novas formas de gerenciamento, em um ambiente cada vez mais competitivo e com recursos cada vez menores. A existência de bases de dados nas empresas passou a ter maior importância. Na grande maioria dos casos, dados não são ainda explorados adequadamente. Técnicas de Descoberta de Conhecimento em Bases de Dados (DCBD) surgem como alternativas, permitindo o estudo de problemas complexos, sendo cada vez mais utilizadas nas diferentes áreas de gestão.
O presente trabalho apresenta uma proposta para a sistematização das atividades de DCBD a qual integra as metodologias CRISP-DM, SEMMA, FAYYAD, em um ambiente interativo, bem como um estudo de viabilidade do uso de análise de regressão linear múltipla para explicação de vendas, no setor corporativo de telecomunicações, utilizando indicadores de desempenho. Foi delineado um método estatístico para analisar o efeito que os indicadores de desempenho têm sobre o comportamento da produtividade de venda. Mediante análises estatísticas e comerciais criteriosas, as equações foram definidas, sendo ajustados os seus respectivos coeficientes de determinação. Foram também realizados testes de hipóteses de seus parâmetros, visando à validação ou não dos modelos de regressão e análise da qualidade de seus ajustamentos. Ficou evidenciada a existência de relacionamento entre as características desses indicadores de desempenho com o volume de vendas realizado.
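A minimal sketch of the kind of multiple linear regression the abstract describes, assuming ordinary least squares solved through the normal equations. The function names and the two indicator columns are hypothetical and are not taken from the thesis; the hypothesis tests on the parameters are not covered here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return (coefficients, r_squared); coefficients[0] is the intercept."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical data: two performance indicators per period and the sales volume.
indicators = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
sales = [1 + 2 * a + 3 * b for a, b in indicators]
beta, r2 = fit_ols(indicators, sales)
print(beta, r2)
```

The coefficient of determination returned here is the plain R²; with real indicator data it would normally be reported together with the parameter tests mentioned in the abstract.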
|
209 |
[en] HYBRID INTELLIGENT SYSTEM FOR CLASSIFICATION OF NON-RESIDENTIAL ELECTRICITY CUSTOMERS PAYMENT PROFILES / [pt] SISTEMA INTELIGENTE HÍBRIDO PARA CLASSIFICAÇÃO DO PERFIL DE PAGAMENTO DOS CONSUMIDORES NÃO-RESIDENCIAIS DE ENERGIA ELÉTRICA
NORMA ALICE DA SILVA CARVALHO, 26 March 2018
[pt] O objetivo desta pesquisa é classificar o perfil de pagamento dos consumidores não-residenciais de energia elétrica, considerando conhecimento armazenado em base de dados de distribuidoras de energia elétrica. A motivação para desenvolvê-la surgiu da necessidade das distribuidoras por um modelo de suporte à formulação de estratégias capazes de reduzir o grau de inadimplência. A metodologia proposta consiste em um sistema inteligente híbrido composto por módulos intercomunicativos que usam conhecimentos armazenados em base de dados para segmentar consumidores e, então, atingir o objetivo proposto. O sistema inicia-se com o módulo neural, que aloca as unidades consumidoras em grupos conforme similaridades (valor fatura, consumo, demanda medida/demanda contratada, intensidade energética e peso da conta no orçamento). Em sequência, o módulo bayesiano estabelece um escore entre 0 e 1 que permite predizer o perfil de pagamento das unidades considerando os grupos gerados e os atributos categóricos (atividade econômica, estrutura tarifária, mesorregião, natureza jurídica e porte empresarial) que caracterizam essas unidades. Os resultados revelaram que o sistema proposto estabelece razoável taxa de acerto na classificação do perfil de consumidores e, portanto, constitui uma importante ferramenta de suporte à formulação de estratégias para combate à inadimplência. Conclui-se que o sistema híbrido proposto apresenta caráter generalista, podendo ser adaptado e implementado em outros mercados. / [en] The objective of this research is to classify the payment profiles of non-residential electricity customers using the knowledge stored in the databases of electricity distribution utilities. The motivation for the work arose from the utilities' need for a support model to formulate strategies for tackling non-payment and late payment. The proposed methodology consists of a hybrid intelligent system made up of intercommunicating modules that use knowledge stored in databases to segment customers and then achieve the proposed objective. The system begins with the neural module, which allocates the consumer units to groups according to similarities (bill amount, consumption, measured demand/contracted demand, energy intensity and share of the electricity bill in the customer's income). Next, the Bayesian module establishes a score between 0 and 1 that makes it possible to predict the payment profile of the units, considering the generated groups and the categorical attributes (business activity, tariff type, business size, mesoregion and company's legal form) that characterize these units. The results showed that the proposed system provides a reasonable success rate when classifying customer profiles and thus constitutes an important tool in the formulation of strategies for tackling non-payment and late payment. In conclusion, the hybrid system proposed here is a generalist one and could usefully be adapted and implemented in other markets.
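The Bayesian scoring step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which Bayesian model is used, so the sketch assumes a naive Bayes classifier with Laplace smoothing, and the attribute names, group labels and payment labels are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count label frequencies and per-attribute value frequencies."""
    label_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute, label) -> value -> count
    values_seen = defaultdict(set)       # attribute -> distinct values
    for record, label in zip(records, labels):
        for attribute, value in record.items():
            value_counts[(attribute, label)][value] += 1
            values_seen[attribute].add(value)
    return label_counts, value_counts, values_seen


def payment_score(model, record, positive="late"):
    """Posterior probability of the `positive` label, a score in [0, 1]."""
    label_counts, value_counts, values_seen = model
    total = sum(label_counts.values())
    joint = {}
    for label, count in label_counts.items():
        p = count / total
        for attribute, value in record.items():
            n_values = len(values_seen[attribute])
            # Laplace smoothing keeps unseen values from zeroing the product.
            p *= (value_counts[(attribute, label)][value] + 1) / (count + n_values)
        joint[label] = p
    return joint[positive] / sum(joint.values())


# Hypothetical units: "group" stands for the cluster assigned by the neural
# module, "size" for one categorical attribute of the consumer unit.
records = [
    {"group": "A", "size": "small"},
    {"group": "A", "size": "small"},
    {"group": "A", "size": "large"},
    {"group": "B", "size": "large"},
    {"group": "B", "size": "large"},
    {"group": "B", "size": "small"},
]
labels = ["late", "late", "on_time", "on_time", "on_time", "late"]
model = train_naive_bayes(records, labels)
print(payment_score(model, {"group": "A", "size": "small"}))
```

Feeding the cluster label in as one more categorical attribute is one simple way to let the two modules communicate; the thesis may couple them differently.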
|
210 |
Enhancing spatial association rule mining in geographic databases / Melhorando a Mineração de Regras de Associação Espacial em Bancos de Dados Geográficos
Bogorny, Vania, January 2006
A técnica de mineração de regras de associação surgiu com o objetivo de encontrar conhecimento novo, útil e previamente desconhecido em bancos de dados transacionais, e uma grande quantidade de algoritmos de mineração de regras de associação tem sido proposta na última década. O maior e mais bem conhecido problema destes algoritmos é a geração de grandes quantidades de conjuntos freqüentes e regras de associação. Em bancos de dados geográficos o problema de mineração de regras de associação espacial aumenta significativamente. Além da grande quantidade de regras e padrões gerados, a maioria são associações do domínio geográfico, e são bem conhecidas, normalmente explicitamente representadas no esquema do banco de dados. A maioria dos algoritmos de mineração de regras de associação não garante a eliminação de dependências geográficas conhecidas a priori. O resultado é que as mesmas associações representadas nos esquemas do banco de dados são extraídas pelos algoritmos de mineração de regras de associação e apresentadas ao usuário. O problema de mineração de regras de associação espacial pode ser dividido em três etapas principais: extração dos relacionamentos espaciais, geração dos conjuntos freqüentes e geração das regras de associação. A primeira etapa é a mais custosa tanto em tempo de processamento quanto pelo esforço requerido do usuário. A segunda e terceira etapas têm sido consideradas o maior problema na mineração de regras de associação em bancos de dados transacionais e têm sido abordadas como dois problemas diferentes: “frequent pattern mining” e “association rule mining”. Dependências geográficas bem conhecidas aparecem nas três etapas do processo. Tendo como objetivo a eliminação dessas dependências na mineração de regras de associação espacial, essa tese apresenta um framework com três novos métodos para mineração de regras de associação utilizando restrições semânticas como conhecimento a priori.
O primeiro método reduz os dados de entrada do algoritmo, e dependências geográficas são eliminadas parcialmente sem que haja perda de informação. O segundo método elimina combinações de pares de objetos geográficos com dependências durante a geração dos conjuntos freqüentes. O terceiro método é uma nova abordagem para gerar conjuntos freqüentes não redundantes e sem dependências, gerando conjuntos freqüentes máximos. Esse método reduz consideravelmente o número final de conjuntos freqüentes, e como conseqüência, reduz o número de regras de associação espacial. / The association rule mining technique emerged with the objective of finding novel, useful, and previously unknown associations in transactional databases, and a large number of association rule mining algorithms have been proposed in the last decade. Their main drawback, which is a well-known problem, is the generation of large numbers of frequent patterns and association rules. In geographic databases the problem of mining spatial association rules increases significantly. Besides the large number of generated patterns and rules, many patterns are well-known geographic domain associations, normally explicitly represented in geographic database schemas. The majority of existing algorithms do not guarantee the elimination of all well-known geographic dependences. The result is that the same associations represented in geographic database schemas are extracted by spatial association rule mining algorithms and presented to the user. The problem of mining spatial association rules from geographic databases requires at least three main steps: compute spatial relationships, generate frequent patterns, and extract association rules. The first step is the most effort-demanding and time-consuming task in the rule mining process, but has received little attention in the literature.
The second and third steps have been considered the main problem in transactional association rule mining and have been addressed as two different problems: frequent pattern mining and association rule mining. Well-known geographic dependences, which generate well-known patterns, may appear in the three main steps of the spatial association rule mining process. Aiming to eliminate well-known dependences and generate more interesting patterns, this thesis presents a framework with three main methods for mining frequent geographic patterns using knowledge constraints. Semantic knowledge is used to avoid the generation of patterns that are known a priori to be uninteresting. The first method reduces the input problem, and all well-known dependences that can be eliminated without losing information are removed in data preprocessing. The second method eliminates combinations of pairs of geographic objects with dependences during the frequent set generation. The third method presents a new approach to generate non-redundant frequent sets, the maximal generalized frequent sets without dependences. This method reduces the number of frequent patterns very significantly and, as a consequence, the number of association rules.
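The second method, which removes pairs of geographic objects with known dependences during frequent set generation, can be illustrated with a small Apriori-style sketch. This is an assumption-laden illustration rather than the algorithm from the thesis: the function name, the feature names and the dependence list are hypothetical.

```python
from itertools import combinations


def frequent_sets_without_dependences(transactions, min_support, dependences):
    """Apriori-style search that drops candidates containing a known pair."""
    transactions = [frozenset(t) for t in transactions]
    blocked = {frozenset(pair) for pair in dependences}
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    result = {}
    size = 1
    while current:
        # Count support only for candidates that survived pruning.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: k / n for c, k in counts.items() if k / n >= min_support}
        result.update(frequent)
        size += 1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
        current = [
            c for c in candidates
            # Known dependence: never generate a set containing a blocked pair.
            if not any(frozenset(p) in blocked for p in combinations(c, 2))
            # Standard Apriori check: every smaller subset must be frequent.
            and all(frozenset(s) in frequent for s in combinations(c, size - 1))
        ]
    return result


# Hypothetical spatial predicates per reference object; a gas station always
# lies on a street, so that pair is a known dependence.
rows = [
    {"gas_station", "street", "school"},
    {"gas_station", "street"},
    {"gas_station", "street", "school"},
    {"school", "street"},
]
found = frequent_sets_without_dependences(rows, 0.5, [("gas_station", "street")])
for itemset, support in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Because the blocked pair is discarded at size two, no larger set containing it is ever generated, which is where the reduction in frequent sets and rules comes from.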
|