611 |
Uma análise comparativa entre as abordagens linguística e estatística para extração automática de termos relevantes de corpora / A comparative analysis between the linguistic and statistical approaches for the automatic extraction of relevant terms from corpora
Santos, Carlos Alberto dos 27 April 2018 (has links)
It is known that the linguistic processing of corpora demands a high computational effort because of the complexity of its algorithms but that, despite this, the results achieved are better than those generated by statistical processing, whose computational demand is lower. This dissertation describes a comparative analysis between the linguistic and the statistical term-extraction processes. Experiments were carried out on four English-language corpora built from scientific papers, on which term extraction was performed using both approaches. The resulting term lists were refined with relevance metrics and a stop list, and then compared with the corpora's reference lists by means of recall. These reference lists, in turn, were built from the context of the corpora with the help of Internet searches. The results showed that statistical extraction combined with the stop list and relevance metrics can produce results superior to those of the linguistic extraction process refined with the same metrics. It is concluded that the statistical approach combined with these techniques can be an ideal option for relevant-term extraction, since it requires few computational resources and yields results superior to those found with linguistic processing. / Sabe-se que o processamento linguístico de corpora demanda grande esforço computacional devido à complexidade dos seus algoritmos, mas que, apesar disso, os resultados alcançados são melhores que aqueles gerados pelo processamento estatístico, onde a demanda computacional é menor. Esta dissertação descreve uma análise comparativa entre os processos linguístico e estatístico de extração de termos. Foram realizados experimentos através de quatro corpora em língua inglesa, construídos a partir de artigos científicos, sobre os quais foram executadas extrações de termos utilizando essas abordagens. As listas de termos resultantes foram refinadas com o uso de métricas de relevância e stop list, e em seguida comparadas com as listas de referência dos corpora através da técnica do recall. Essas listas, por sua vez, foram construídas a partir do contexto desses corpora e com ajuda de pesquisas na Internet. Os resultados mostraram que a extração estatística combinada com as técnicas da stop list e as métricas de relevância pode produzir resultados superiores ao processo de extração linguístico refinado pelas mesmas métricas. Concluiu-se que a abordagem estatística composta por essas técnicas pode ser a opção ideal para extração de termos relevantes, por exigir poucos recursos computacionais e por apresentar resultados superiores àqueles encontrados no processamento linguístico.
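A minimal sketch of the statistical pipeline described in this abstract — candidate extraction, stop-list filtering, frequency-based relevance ranking and recall against a reference list. The stop list, the scoring function and the toy corpus below are illustrative assumptions, not the resources or relevance metrics used in the dissertation.

```python
from collections import Counter
import re

def extract_terms(text, stop_list, top_k=20):
    """Statistical term extraction: count unigrams and bigrams after
    stop-word removal and rank candidates by raw frequency (a stand-in
    for the relevance metrics compared in the dissertation)."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_list]
    counts = Counter(tokens)
    counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return [term for term, _ in counts.most_common(top_k)]

def recall(extracted, reference):
    """Proportion of reference terms recovered by the extraction."""
    if not reference:
        return 0.0
    return sum(term in extracted for term in reference) / len(reference)

# Illustrative usage with made-up data.
stop_list = {"the", "of", "and", "a", "in", "is", "to", "uses"}
corpus = ("Term extraction identifies relevant terms in a corpus. "
          "Statistical term extraction uses term frequency and a stop list.")
reference = ["term extraction", "corpus", "term frequency"]
extracted = extract_terms(corpus, stop_list)
print(extracted, recall(extracted, reference))
```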
|
612 |
Uma abordagem híbrida para sistemas de recomendação de notícias / A hybrid approach to news recommendation systems
Pagnossim, José Luiz Maturana 09 April 2018 (has links)
Sistemas de Recomendação (SR) são softwares capazes de sugerir itens aos usuários com base no histórico de interações de usuários ou por meio de métricas de similaridade que podem ser comparadas por item, usuário ou ambos. Existem diferentes tipos de SR e dentre os que despertam maior interesse deste trabalho estão: SR baseados em conteúdo; SR baseados em conhecimento; e SR baseado em filtro colaborativo. Alcançar resultados adequados às expectativas dos usuários não é uma meta simples devido à subjetividade inerente ao comportamento humano, para isso, SR precisam de soluções eficientes e eficazes para: modelagem dos dados que suportarão a recomendação; recuperação da informação que descrevem os dados; combinação dessas informações dentro de métricas de similaridade, popularidade ou adequabilidade; criação de modelos descritivos dos itens sob recomendação; e evolução da inteligência do sistema de forma que ele seja capaz de aprender a partir da interação com o usuário. A tomada de decisão por um sistema de recomendação é uma tarefa complexa que pode ser implementada a partir da visão de áreas como inteligência artificial e mineração de dados. Dentro da área de inteligência artificial há estudos referentes ao método de raciocínio baseado em casos e da recomendação baseada em casos. No que diz respeito à área de mineração de dados, os SR podem ser construídos a partir de modelos descritivos e realizar tratamento de dados textuais, constituindo formas de criar elementos para compor uma recomendação. Uma forma de minimizar os pontos fracos de uma abordagem, é a adoção de aspectos baseados em uma abordagem híbrida, que neste trabalho considera-se: tirar proveito dos diferentes tipos de SR; usar técnicas de resolução de problemas; e combinar recursos provenientes das diferentes fontes para compor uma métrica unificada a ser usada para ranquear a recomendação por relevância. Dentre as áreas de aplicação dos SR, destaca-se a recomendação de notícias, sendo utilizada por um público heterogêneo, amplo e exigente por relevância. Neste contexto, a presente pesquisa apresenta uma abordagem híbrida para recomendação de notícias construída por meio de uma arquitetura implementada para provar os conceitos de um sistema de recomendação. Esta arquitetura foi validada por meio da utilização de um corpus de notícias e pela realização de um experimento online. Por meio do experimento foi possível observar a capacidade da arquitetura em relação aos requisitos de um sistema de recomendação de notícias e também confirmar a hipótese no que se refere à privilegiar recomendações com base em similaridade, popularidade, diversidade, novidade e serendipidade. Foi observado também uma evolução nos indicadores de leitura, curtida, aceite e serendipidade conforme o sistema foi acumulando histórico de preferências e soluções. Por meio da análise da métrica unificada para ranqueamento foi possível confirmar sua eficácia ao verificar que as notícias melhores colocadas no ranqueamento foram as mais aceitas pelos usuários / Recommendation Systems (RS) are software capable of suggesting items to users based on the history of user interactions or by similarity metrics that can be compared by item, user, or both. There are different types of RS and those which most interest in this work are content-based, knowledge-based and collaborative filtering. 
Meeting users' expectations is a hard goal due to the inherent subjectivity of human behavior; thus, RS need efficient and effective solutions for: modeling the data that will support the recommendation; retrieving the information that describes the data; combining this information within similarity, popularity or suitability metrics; creating descriptive models of the items under recommendation; and evolving the system's intelligence so that it learns from user interaction. Decision-making by an RS is a complex task that can be approached from the perspective of fields such as artificial intelligence and data mining. In the artificial intelligence field there are studies on the method of case-based reasoning, which works on the principle that if something worked in the past, it may work again in a new situation similar to the one in the past. Case-based recommendation works with structured items, represented by a set of attributes and their respective values (within a "case" model), providing known and adapted solutions. The data mining area can build descriptive models for RS and also handle, manipulate and analyze textual data, constituting one option for creating elements to compose a recommendation. One way to minimize the weaknesses of a single approach is to adopt aspects of a hybrid solution, which in this work means: taking advantage of the different types of RS; using problem-solving techniques; and combining resources from different sources into a unified metric used to rank the recommendations by relevance. Among the application areas of RS, news recommendation stands out, being used by a broad, heterogeneous public that demands relevance. In this context, this work presents a hybrid approach to news recommendation built on an architecture implemented to prove the concepts of a recommendation system. This architecture was validated by using a news corpus and by performing an online experiment. Through the experiment it was possible to observe the architecture's capacity with respect to the requirements of a news recommendation system and also to confirm the hypothesis concerning privileging recommendations based on similarity, popularity, diversity, novelty and serendipity. An evolution was also observed in the indicators of reading, likes, acceptance and serendipity as the system accumulated a history of preferences and solutions. Through the analysis of the unified ranking metric, it was possible to confirm its efficacy by verifying that the best-ranked news items were the most accepted by users.
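A minimal sketch of how a unified ranking score could combine the signals named in this abstract. The weights and the per-signal scores are illustrative assumptions, not the metric actually defined in the thesis; diversity is usually enforced over the final result list rather than per item, so it is omitted from the per-item score here.

```python
from dataclasses import dataclass

@dataclass
class NewsCandidate:
    title: str
    similarity: float   # content similarity to the user profile, in [0, 1]
    popularity: float   # normalized view/like count, in [0, 1]
    novelty: float      # recency / unseen-topic score, in [0, 1]
    serendipity: float  # unexpected-but-relevant score, in [0, 1]

# Hypothetical weights; the thesis derives its own combination.
WEIGHTS = {"similarity": 0.4, "popularity": 0.2, "novelty": 0.2, "serendipity": 0.2}

def unified_score(c: NewsCandidate) -> float:
    """Weighted linear combination of the per-signal scores."""
    return (WEIGHTS["similarity"] * c.similarity
            + WEIGHTS["popularity"] * c.popularity
            + WEIGHTS["novelty"] * c.novelty
            + WEIGHTS["serendipity"] * c.serendipity)

def rank(candidates):
    """Return candidates sorted by the unified score, best first."""
    return sorted(candidates, key=unified_score, reverse=True)

news = [NewsCandidate("Election results", 0.9, 0.8, 0.3, 0.1),
        NewsCandidate("Local chess club wins", 0.4, 0.2, 0.9, 0.8)]
for c in rank(news):
    print(f"{unified_score(c):.2f}  {c.title}")
```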
|
613 |
Decision Support Systems for Financial Market Surveillance
Alic, Irina 30 November 2016 (has links)
Decision support systems in finance are of great interest not only to research but also to practice. To ensure financial market surveillance, financial supervisory authorities are confronted, on the one hand, with the growing amount of information available online, such as financial blogs and news. On the other hand, rapidly emerging trends, such as the steadily growing volume of data available online and the development of data mining methods, pose challenges for research. Decision support systems in finance make it possible to provide relevant information to financial supervisory authorities and to compliance officers of financial institutions in a timely manner. This thesis presents IT artifacts that support decision making in financial market surveillance. In addition, an explanatory design theory is presented that takes up the requirements of regulators and of compliance officers in financial institutions.
|
614 |
Méthodes de veille textométrique multilingue appliquées à des corpus de l’environnement et de l’énergie : « Restitution, prévision et anticipation d’événements par poly-résonances croisées » / Textometric Multilingual Information Monitoring Methods Applied to Energy & Environment Corpora : "Restitution, Forecasting and Anticipation of Events by Cross Poly-resonance"
Shen, Lionel 21 October 2016 (has links)
Cette thèse propose une série de méthodes de veille textométrique multilingue appliquées à des corpus thématiques. Pour constituer ce travail, deux types de corpus sont mobilisés : un corpus comparable et un corpus parallèle, composés de données textuelles extraites des discours de presse, ainsi que ceux des ONG. Les informations récupérées proviennent de trois mondes en trois langues différentes : français, anglais et chinois. La construction de ces deux corpus s’effectue autour de deux thèmes d’actualité ayant pour objet, l’environnement et l’énergie, avec une attention particulière sur trois notions : les énergies, le nucléaire et l’EPR. Après un bref rappel de l’état de l’art en intelligence économique, veille et textométrie, nous avons exposé les deux sujets retenus, les technicités morphosyntaxiques des trois langues dans les contextes nationaux et internationaux. Successivement, les caractéristiques globales, les convergences et les particularités de ces corpus ont été mises en évidence. Les dépouillements et les analyses qualitatives et quantitatives des résultats obtenus sont réalisés à l’aide des outils de la textométrie, notamment grâce aux analyses factorielles des correspondances, réseaux cooccurrentiels et poly-cooccurrentiels, spécificités du modèle hypergéométrique, segments répétés ou encore à la carte des sections. Ensuite, la veille bi-textuelle bilingue a été appliquée sur les trois mêmes concepts dans l’objectif de mettre en évidence les modes selon lesquels les corpus multilingues à caractère comparé et parallèle se complètent dans un processus de veille plurilingue, de restitution, de prévision et d’anticipation. Nous concluons notre recherche en proposant une méthode analytique par Objets-Traits-Entrées (OTE). / This thesis proposes a series of textometric multilingual information monitoring methods applied to thematic corpora (textometry is also called textual statistics or text data analysis). Two types of corpora are mobilized to create this work: a comparable corpus and a parallel corpus in which the textual data are extracted from the press and discourse of NGOs. The information source was retrieved from three countries in three different languages: English, French and Chinese. The two corpora were constructed on two topical issues concerning the environment and energy, with a focus on three concepts: energy, nuclear power and the EPR (European Pressurized Reactor or Evolutionary Power Reactor). After a brief review of the state of the art on business intelligence, information monitoring and textometry, we first set out the two chosen subjects – the environment and energy – and then the morphosyntactic features of the three languages in national and international contexts. The overall characteristics, similarities and peculiarities of these corpora are highlighted successively. The recounts and qualitative and quantitative analyses of the results were carried out using textometric tools, including factor analysis of correspondences, co-occurrences and polyco-occurrential networks, specificities of the hypergeometric model and repeated segments or map sections. Thereafter, bilingual bitextual information monitoring was applied to the same three concepts with the aim of elucidating how the comparable corpus and the parallel corpus can mutually help each other in a process of multilingual information monitoring, by restitution, forecasting and anticipation. We conclude our research by offering an analytical method called Objects-Features-Opening (OFO).
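A minimal sketch of the hypergeometric specificity score mentioned among the textometric tools above, in its usual formulation: the frequency f of a word in a sub-corpus of t tokens is compared with its frequency F in the whole corpus of T tokens. The corpus figures below are illustrative and are not taken from the thesis corpora.

```python
import math
from scipy.stats import hypergeom

def specificity(f, F, t, T):
    """Specificity of a word in a sub-corpus (hypergeometric model).

    f: occurrences of the word in the sub-corpus
    F: occurrences of the word in the whole corpus
    t: size (in tokens) of the sub-corpus
    T: size (in tokens) of the whole corpus
    Returns a signed score: positive if the word is over-represented
    in the sub-corpus, negative if it is under-represented.
    """
    # P(X >= f): probability of drawing at least f occurrences when
    # sampling t tokens out of T that contain F occurrences in total.
    p_over = hypergeom.sf(f - 1, T, F, t)
    p_under = hypergeom.cdf(f, T, F, t)
    if p_over <= p_under:
        return -math.log10(p_over)   # over-represented
    return math.log10(p_under)       # under-represented

# Illustrative figures (expected frequency would be t * F / T = 12).
print(specificity(f=40, F=120, t=10_000, T=100_000))   # strongly positive
print(specificity(f=2, F=120, t=10_000, T=100_000))    # negative
```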
|
616 |
Digital Intelligence – Möglichkeiten und Umsetzung einer informatikgestützten Frühaufklärung / Digital Intelligence – opportunities and implementation of a data-driven foresight
Walde, Peter 18 January 2011 (PDF)
The goal of Digital Intelligence, i.e. data-driven strategic foresight, is to support the shaping of the future on the basis of valid and well-founded digital information, with comparatively little effort and enormous savings in time and cost. Support comes from innovative technologies for (semi-)automatic language and data processing such as information retrieval, (temporal) data, text and web mining, information visualization, conceptual structures and informetrics. They make it possible to detect key topics and latent relationships in good time within an unmanageable, distributed and inhomogeneous mass of data such as patents, scientific publications, press documents or web content, and to provide them quickly and in a targeted way. Digital Intelligence thus makes intuitively sensed patterns and developments explicit and measurable.
This research work aims, first, to show what computer science can contribute to data-driven foresight and, second, to implement these possibilities in a pragmatic context.
Its starting point is an introduction to the discipline of strategic foresight and to its data-driven branch – Digital Intelligence.
The theoretical and, in particular, computer-science foundations of foresight are discussed and classified – above all the possibilities of time-oriented data exploration.
Various methods and software tools are designed and developed to support the time-oriented exploration of unstructured text data in particular (temporal text mining). Only approaches that can be used pragmatically in the context of a large institution and under the specific requirements of strategic foresight are considered. Noteworthy are a platform for collective search and an innovative method for identifying weak signals.
Finally, a Digital Intelligence service is presented and discussed that was successfully implemented on this basis in a global technology-oriented corporation and that enables systematic competitor, market and technology analysis on the basis of the digital traces people leave behind.
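The weak-signal method developed in the thesis is not detailed in this abstract; the sketch below only illustrates the general idea under assumptions of our own — terms that are still rare over the whole period but whose yearly frequency grows steadily across time slices.

```python
from collections import Counter

def weak_signals(docs_by_year, rare_max=20, min_growth=2.0):
    """Flag terms that are rare over the whole period but whose yearly
    frequency keeps increasing (a naive proxy for 'weak signals').

    docs_by_year: dict mapping year -> list of tokenized documents.
    """
    years = sorted(docs_by_year)
    yearly = {y: Counter(t for doc in docs_by_year[y] for t in doc) for y in years}
    total = Counter()
    for counts in yearly.values():
        total.update(counts)

    signals = []
    for term, freq in total.items():
        if freq > rare_max:
            continue  # already an established topic, not a weak signal
        series = [yearly[y][term] for y in years]
        rising = all(a <= b for a, b in zip(series, series[1:]))
        if rising and series[-1] >= min_growth * max(series[0], 1):
            signals.append((term, series))
    return signals

# Toy example with three "years" of tokenized documents.
corpus = {
    2008: [["battery", "car"], ["engine", "oil"]],
    2009: [["battery", "electric", "car"], ["electric", "grid"]],
    2010: [["electric", "car", "battery"], ["electric", "charging"]],
}
print(weak_signals(corpus))
```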
|
617 |
Extraction automatique et visualisation des thèmes abordés dans des résumés de mémoires et de thèses en anthropologie au Québec, de 1985 à 2009 / Automatic extraction and visualization of the themes addressed in master's and doctoral thesis abstracts in anthropology in Quebec, 1985 to 2009
Samson, Anne-Renée 06 1900 (has links)
S’insérant dans les domaines de la Lecture et de l’Analyse de Textes Assistées par Ordinateur (LATAO), de la Gestion Électronique des Documents (GÉD), de la visualisation de l’information et, en partie, de l’anthropologie, cette recherche exploratoire propose l’expérimentation d’une méthodologie descriptive en fouille de textes afin de cartographier thématiquement un corpus de textes anthropologiques. Plus précisément, nous souhaitons éprouver la méthode de classification hiérarchique ascendante (CHA) pour extraire et analyser les thèmes issus de résumés de mémoires et de thèses octroyés de 1985 à 2009 (1240 résumés), par les départements d’anthropologie de l’Université de Montréal et de l’Université Laval, ainsi que le département d’histoire de l’Université Laval (pour les résumés archéologiques et ethnologiques). En première partie de mémoire, nous présentons notre cadre théorique, c'est-à-dire que nous expliquons ce qu’est la fouille de textes, ses origines, ses applications, les étapes méthodologiques puis, nous complétons avec une revue des principales publications. La deuxième partie est consacrée au cadre méthodologique et ainsi, nous abordons les différentes étapes par lesquelles ce projet fut conduit; la collecte des données, le filtrage linguistique, la classification automatique, pour en nommer que quelques-unes. Finalement, en dernière partie, nous présentons les résultats de notre recherche, en nous attardant plus particulièrement sur deux expérimentations. Nous abordons également la navigation thématique et les approches conceptuelles en thématisation, par exemple, en anthropologie, la dichotomie culture/biologie. Nous terminons avec les limites de ce projet et les pistes d’intérêts pour de futures recherches. / Taking advantage of recent developments in the automated analysis of textual data, the digital management of documents, information visualization and, in part, anthropology, this study uses data mining techniques to create a thematic map of anthropological documents. In this exploratory research, we propose to evaluate the usefulness of thematic analysis by means of automated classification of textual data, as well as information visualizations (based on network analysis). More precisely, we want to examine the method of hierarchical agglomerative clustering (HCA) for thematic analysis and information extraction. We built our study on a database consisting of 1,240 thesis abstracts, granted from 1985 to 2009 by the anthropology departments of the Université de Montréal and Université Laval, as well as the history department of Université Laval (for archaeological and ethnological abstracts). In the first section, we present our theoretical framework: we define text mining, its origins, its practical applications and its methodology, and we close with a review of the main publications. The second part is devoted to the methodological framework, where we discuss the various stages through which the project was conducted: construction of the database, linguistic and statistical filtering, automated classification, etc. Finally, in the last section, we present the results of two specific experiments along with our interpretations. We also discuss thematic navigation and conceptual approaches to thematization. We conclude with the limitations of this project and avenues of interest for future research.
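A minimal sketch of hierarchical agglomerative clustering of abstracts, assuming TF-IDF vectors and Ward linkage; the French-language pre-processing, linguistic filtering and parameters used in the thesis are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster

abstracts = [
    "kinship ritual exchange in highland communities",
    "ritual and symbolic exchange among coastal groups",
    "lithic artefacts and excavation of a prehistoric site",
    "archaeological excavation stratigraphy and dating of artefacts",
]

# TF-IDF representation of the abstracts (the thesis's own filtering
# and lemmatization steps are omitted).
X = TfidfVectorizer().fit_transform(abstracts).toarray()

# Agglomerative clustering (Ward linkage on Euclidean distances).
Z = linkage(X, method="ward")

# Cut the dendrogram into two thematic clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
for doc, lab in zip(abstracts, labels):
    print(lab, doc)
```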
|
618 |
Le repérage automatique des entités nommées dans la langue arabe : vers la création d'un système à base de règles / Automatic named entity recognition in the Arabic language: towards the creation of a rule-based system
Zaghouani, Wajdi January 2009 (has links)
Thesis digitized by the Division de la gestion de documents et des archives de l'Université de Montréal.
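The rules developed in the thesis for Arabic are not given in this record; the sketch below only illustrates the general shape of a rule-based named-entity recognizer — trigger words plus a gazetteer — on Latin-script text, with resources that are purely illustrative.

```python
import re

# Hypothetical resources; a real system for Arabic would rely on Arabic
# trigger words, gazetteers and morphological rules.
PERSON_TRIGGERS = {"dr", "mr", "president", "sheikh"}
LOCATIONS = {"montreal", "rabat", "cairo", "tunis"}

def tag_entities(sentence):
    """Very small rule-based tagger: a capitalized token following a person
    trigger is tagged PERSON; a token found in the gazetteer is LOCATION."""
    tokens = sentence.split()
    entities = []
    for i, tok in enumerate(tokens):
        clean = re.sub(r"\W", "", tok)
        if clean.lower() in LOCATIONS:
            entities.append((clean, "LOCATION"))
        elif (i > 0 and re.sub(r"\W", "", tokens[i - 1]).lower() in PERSON_TRIGGERS
              and clean[:1].isupper()):
            entities.append((clean, "PERSON"))
    return entities

print(tag_entities("President Bouteflika met Dr Zaghouani in Montreal ."))
```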
|
619 |
市場趨勢導向機會發掘之服務價值創新 / Market-Oriented Chance Discovery toward Service Value Creation
沈品勳, Shen, Pin Hsun Unknown Date (has links)
市場導向指的是一種辨識市場狀態和顧客需求的企業理論，可以藉由不同的市場情報蒐集來達成。由此，其中的「市場感知能力」和「市場連結能力」能夠幫助企業不斷地修正、改進、創新及再定義原本的市場觀點，並研究市場中所有角色之間的關係。本研究致力於如何以文字探勘、資料探勘及機會發掘等理論創造一個服務的資訊系統來提升這兩個能力。運用本系統可以協助企業在不同市場中找到潛在資源及合作夥伴以共創複合性產品。換句話說，以這種開放關係和不同公司的共創方式更能達到相互曝光、產品與服務互補的效果。而更好的市場導向可以幫助企業間創造出更好的複合式產品並達到利基市場的競爭優勢。 / Market orientation refers to a business philosophy that focuses on identifying and meeting the stated or hidden needs of customers through various intelligence-gathering activities. To this end, two important capabilities, market sensing and market relating, enable a business to formulate, examine, modify, renovate and redefine its market views and to investigate the relationships among all players in the market. This study focuses on how to exploit an information system, delivered as a service, to facilitate the market sensing and market relating capabilities by means of text mining techniques and chance discovery theory. In addition, this service would help businesses find new, necessary and related resources or fellow companies with which to cooperate in forming a complex service product across different markets. In other words, by cooperating in such an open relationship, different companies gain more opportunities to expose their features, products or services to one another, which also serves as marketing for those companies in their respective markets; better market orientation likewise helps businesses build better complex products or services for niche markets and gain competitive advantages.
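The abstract does not spell out the mining pipeline; the sketch below illustrates one common chance-discovery ingredient, assumed here for illustration only: a keyword co-occurrence analysis in which rare terms linked to several frequent topics are candidate "chances" (potential partners or resources bridging markets).

```python
from collections import Counter
from itertools import combinations

def chance_candidates(documents, top_n=3):
    """Return rare terms that co-occur with several frequent terms,
    a rough proxy for 'chances' bridging established market topics."""
    term_freq = Counter(t for doc in documents for t in set(doc))
    frequent = {t for t, _ in term_freq.most_common(top_n)}

    # Count document-level co-occurrences between term pairs.
    cooc = Counter()
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1

    # Count, for each non-frequent term, how many frequent terms it touches.
    bridges = Counter()
    for (a, b), _ in cooc.items():
        if a in frequent and b not in frequent:
            bridges[b] += 1
        elif b in frequent and a not in frequent:
            bridges[a] += 1
    return [t for t, links in bridges.items() if links >= 2 and term_freq[t] <= 2]

docs = [["battery", "charging"],
        ["battery", "car", "insurance"],
        ["car", "sharing", "app"],
        ["battery", "recycling", "sharing"]]
print(chance_candidates(docs))
```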
|
620 |
Παραμετροποίηση στοχαστικών μεθόδων εξόρυξης γνώσης από δεδομένα, μετασχηματισμού συμβολοσειρών και τεχνικών συμπερασματικού λογικού προγραμματισμού / Parameterization of stochastic data mining methods, string conversion algorithms and deductive logic programming techniques
Λύρας, Δημήτριος 02 February 2011 (has links)
Η παρούσα διατριβή πραγματεύεται το αντικείμενο της μάθησης από δύο διαφορετικές οπτικές γωνίες: την επαγωγική και την παραγωγική μάθηση.
Αρχικά, παρουσιάζονται παραμετροποιήσεις στοχαστικών μεθόδων εξόρυξης γνώσης από δεδομένα υπό τη μορφή τεσσάρων καινοτόμων εξατομικευμένων μοντέλων στήριξης ασθενών που πάσχουν από διαταραχές άγχους. Τα τρία μοντέλα προσανατολίζονται στην ανεύρεση πιθανών συσχετίσεων μεταξύ των περιβαλλοντικών παραμέτρων των ασθενών και του επιπέδου άγχους που αυτοί παρουσιάζουν, ενώ παράλληλα προτείνεται και η χρήση ενός Μπεϋζιανού μοντέλου πρόβλεψης του επιπέδου άγχους που είναι πιθανό να εμφανίσει κάποιος ασθενής δεδομένων ορισμένων τιμών του περιβαλλοντικού του πλαισίου εφαρμογής.
Αναφορικά με το χώρο της εξόρυξης γνώσης από κείμενο και του μετασχηματισμού συμβολοσειρών, προτείνεται η εκπαίδευση μοντέλων δέντρων αποφάσεων για την αυτόματη μεταγραφή Ελληνικού κειμένου στην αντίστοιχη φωνητική του αναπαράσταση, πραγματοποιείται η στοχαστική μοντελοποίηση όλων των πιθανών μεταγραφικών νορμών από ορθογραφημένα Ελληνικά σε Greeklish και τέλος παρουσιάζεται ένας καινοτόμος αλγόριθμος που συνδυάζει δύο γνωστά για την ικανοποιητική τους απόδοση μέτρα σύγκρισης ομοιότητας αλφαριθμητικών προκειμένου να επιτευχθεί η αυτόματη λημματοποίηση του κειμένου εισόδου.
Επιπρόσθετα, στα πλαίσια της ανάπτυξης συστημάτων που θα διευκολύνουν την ανάκτηση εγγράφων ή πληροφοριών προτείνεται η συνδυαστική χρήση του προαναφερθέντος αλγορίθμου λημματοποίησης παράλληλα με τη χρήση ενός πιθανοτικού δικτύου Bayes στοχεύοντας στην ανάπτυξη ενός εύρωστου και ανταγωνιστικού ως προς τις επιδόσεις συστήματος ανάκτησης πληροφοριών.
Τέλος, παρουσιάζονται οι προτάσεις μας που αφορούν στο χώρο της παραγωγικής μάθησης και του ελέγχου ικανοποιησιμότητας λογικών εκφράσεων. Συγκεκριμένα περιλαμβάνουν:
i) την ανάλυση και εκτενή παρουσίαση μιας καινοτόμας μαθηματικής μοντελοποίησης με την ονομασία AnaLog (Analytic Tableaux Logic) η οποία δύναται να εκφράσει τη λογική που διέπει τους αναλυτικούς πίνακες για προτασιακούς τύπους σε κανονική διαζευκτική μορφή. Mέσω του λογισμού Analog επιτυγχάνεται η εύρεση των κλειστών κλάδων του πλήρως ανεπτυγμένου δέντρου Smullyan, χωρίς να είναι απαραίτητος ο αναλυτικός σχεδιασμός του δέντρου, και
ii) την παράθεση ενός αναλυτικού αλγορίθμου που μπορεί να αξιοποιήσει τον φορμαλισμό AnaLog σε ένα πλαίσιο αριθμητικής διαστημάτων μέσω του οποίου μπορούμε να αποφανθούμε για την ικανοποιησιμότητα συμβατικών διαζευκτικών προτασιακών εκφράσεων. / The present dissertation deals with the problem of learning from two different perspectives, meaning the inferential and the deductive learning.
Initially, we present our suggestions regarding the parameterization of stochastic data mining methods in the form of four treatment supportive services for patients suffering from anxiety disorders. Three of these services focus on the discovery of possible associations between the patients’ contextual data whereas the last one aims at predicting the stress level a patient might suffer from, in a given environmental context.
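A minimal naive-Bayes-style sketch of predicting a stress level from categorical context variables. The variables, data and smoothing below are illustrative assumptions; the dissertation's actual Bayesian model is not reproduced.

```python
from collections import Counter, defaultdict

# Toy training records: (context features, observed stress level).
records = [
    ({"place": "work", "sleep": "poor"}, "high"),
    ({"place": "work", "sleep": "good"}, "medium"),
    ({"place": "home", "sleep": "good"}, "low"),
    ({"place": "home", "sleep": "poor"}, "medium"),
    ({"place": "work", "sleep": "poor"}, "high"),
]

prior = Counter(level for _, level in records)
cond = defaultdict(Counter)  # cond[(feature, value)][level] = count
for feats, level in records:
    for f, v in feats.items():
        cond[(f, v)][level] += 1

def predict(feats):
    """Pick the stress level with the highest posterior score
    (naive Bayes with add-one smoothing)."""
    scores = {}
    for level, p in prior.items():
        score = p / len(records)
        for f, v in feats.items():
            score *= (cond[(f, v)][level] + 1) / (p + 2)
        scores[level] = score
    return max(scores, key=scores.get)

print(predict({"place": "work", "sleep": "poor"}))   # expected: "high"
```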
Our proposals with regard to the wider area of text mining and string conversion include: i) the employment of decision-tree-based models for the automatic conversion of Greek texts into their equivalent CPA format, ii) the stochastic modeling of all the existing transliteration norms for Greek-to-Greeklish conversion in the form of a robust transcriber, and iii) a novel algorithm that combines two string distance metrics, well known for their satisfactory performance, in order to address the problem of automatic word lemmatization.
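A minimal sketch of lemmatization by similarity lookup against a lemma lexicon. The two measures combined here (difflib's ratio and a common-prefix score) merely stand in for the pair of string distance metrics evaluated in the dissertation, and the lexicon is a toy.

```python
from difflib import SequenceMatcher
import os  # os.path.commonprefix works on any list of strings

LEXICON = ["run", "walk", "compute", "computer", "analyse"]  # toy lemma list

def prefix_score(a, b):
    """Length of the shared prefix, normalized by the longer word."""
    return len(os.path.commonprefix([a, b])) / max(len(a), len(b))

def combined_similarity(a, b, w=0.5):
    """Weighted combination of two string similarity measures."""
    return w * SequenceMatcher(None, a, b).ratio() + (1 - w) * prefix_score(a, b)

def lemmatize(word):
    """Return the lexicon lemma most similar to the input word."""
    return max(LEXICON, key=lambda lemma: combined_similarity(word.lower(), lemma))

for w in ["running", "computers", "walked"]:
    print(w, "->", lemmatize(w))
```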
With regards to the development of systems that would facilitate the automatic information retrieval, we propose the employment of the aforementioned lemmatization algorithm in order to reduce the ambiguity posed by the plethora of morphological variations of the processed language along with the parallel use of probabilistic Bayesian Networks aiming at the development of a robust and competitive modern information retrieval system.
Finally, our proposals regarding logical deduction and satisfiability checking include:
i) a novel mathematical formalism of the analytic tableaux methodology named AnaLog (after the terms Analytic Tableaux Logic) which allows us to efficiently simulate the structure and the properties of a complete clausal tableau given an input CNF formula. Via the AnaLog calculus it is made possible to calculate all the closed branches of the equivalent complete Smullyan tree without imposing the need to fully construct it, and
ii) a practical application of the AnaLog calculus within an interval arithmetic framework which is able to decide upon the satisfiability of propositional formulas in CNF format. This framework, apart from constituting an illustrative demonstration of the application of the AnaLog calculus, may also be employed as an alternative to conventional SAT systems.
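The AnaLog calculus itself is the author's contribution and is not reproduced here; the sketch below is only a generic backtracking satisfiability check for propositional CNF formulas, to make the object of study concrete.

```python
def dpll(clauses, assignment=None):
    """Decide satisfiability of a CNF formula given as a list of clauses,
    each clause a set of integer literals (positive = variable, negative = negation).
    Returns a satisfying assignment (dict) or None if unsatisfiable."""
    if assignment is None:
        assignment = {}
    # Simplify: drop satisfied clauses, remove literals made false.
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # clause already satisfied
        rest = {l for l in clause if abs(l) not in assignment}
        if not rest:
            return None  # empty clause: this branch is closed
        simplified.append(rest)
    if not simplified:
        return assignment  # all clauses satisfied: open branch found
    # Branch on some unassigned variable.
    var = abs(next(iter(simplified[0])))
    for value in (True, False):
        result = dpll(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [{1, -2}, {2, 3}, {-1, -3}]
print(dpll(formula))          # a satisfying assignment
print(dpll([{1}, {-1}]))      # None: unsatisfiable
```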
|