91 |
Web-based named entity recognition and data integration to accelerate molecular biology research
Pafilis, Evangelos. January 2008
Heidelberg, Univ., Diss., 2008. / Published online: 2009.
|
92 |
Semantic support in multilingual text retrieval
De Luca, Ernesto William. January 2008
Also: Magdeburg, Univ., Diss., 2008
|
93 |
Using layout data for the analysis of scientific literature
Mathiak, Brigitte. January 2008
Also: Braunschweig, Techn. Univ., Diss., 2008 / Incorrectly designated as vol. 6 of the series
|
94 |
Text mining techniques for the comparative analysis of text meaning [Τεχνικές text mining για την συγκριτική ανάλυση νοήματος κειμένου]
Πλώτα, Δέσποινα. 27 December 2010
In recent decades, unimaginably large quantities of data have been produced by various processes organized with the help of computer systems.
Most of this data is in the form of text, and this kind of unstructured data usually lacks "data about the data" (metadata). The need for automated extraction of useful knowledge from huge amounts of textual data, in order to assist human analysis, is therefore obvious.
Text mining is a new research field that tries to solve the problem of information overload by using techniques from data mining, machine learning, natural language processing, information retrieval, information extraction, and knowledge management.
Building on text mining, this thesis presents a methodology for extracting knowledge from text, with the ultimate aim of attributing the authorship of two works to a particular author.
The main question of interest is: are the Iliad and the Odyssey works of the same poet?
Our methodology is based on analyzing the "signified" rather than the "signifier" in the Iliad and the Odyssey.
In a first phase we transform the data: only nouns, verbs, adjectives, and adverbs are kept, and they are organized into groups of synonyms, where each group represents a concept. We chose to analyze the relationships between these concepts. We therefore converted every sentence in the text into a sentence consisting only of these concepts, removing duplicates.
Next, we transformed the text into a structured form so that it could be stored as "records" of a database. Specifically, we treated contiguous segments of text as such "records". We experimented with defining either one sentence or two consecutive sentences as a "record", and used the Apriori algorithm to extract "association rules" of the form "90% of the records that contain concept x also contain concept y". We extracted a large number of strong associations between the same concepts in both poems (e.g. "earth"-"man"). There are also associations between different concepts (e.g. "battle"-"man" only in the Iliad) and different associations for the same concept (e.g. "hero"-"battle" in the Iliad and "hero"-"dwelling" in the Odyssey). However, we found no contradiction. These results may point to the conclusion that Homer wrote both epics. / What is generally called "the Homeric question" is by far the oldest author-attribution problem. The Homeric question really encompasses several issues, e.g. are the Iliad and the Odyssey each the work of a single poet? In this paper we try to answer the question using a data mining technique. Data mining is an emerging research area that develops techniques for knowledge discovery in huge volumes of data. Data mining methods have been applied to a wide variety of domains, from market basket analysis to the analysis of satellite pictures and human genomes.
More specifically, in this paper we present an application of data mining to the problem of deciding whether a document can be ascribed to a writer. Our methodology is based on analyzing the content rather than the syntax: we propose a technique for mining association rules in order to analyze associations among concepts. We also present the results of the analyses we have undertaken using this algorithm.
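The rule form quoted in the abstract ("90% of the records that contain concept x also contain concept y") can be illustrated with a small sketch. This is not the thesis's implementation: it is a simplified, pairs-only pass in the spirit of Apriori, and the function name `pair_rules`, the thresholds, and the toy records are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(records, min_support=0.5, min_confidence=0.9):
    """Return rules (x, y, support, confidence): 'records with x also contain y'."""
    n = len(records)
    sets = [frozenset(r) for r in records]  # a concept counts once per record
    item_counts = Counter(c for s in sets for c in s)
    # Apriori pruning: only concepts frequent on their own can form frequent pairs.
    frequent = {c for c, k in item_counts.items() if k / n >= min_support}
    pair_counts = Counter(
        pair for s in sets for pair in combinations(sorted(s & frequent), 2)
    )
    rules = []
    for (a, b), k in pair_counts.items():
        support = k / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for x, y in ((a, b), (b, a)):
            confidence = k / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical toy "records": one sentence each, already reduced to concepts.
records = [
    {"hero", "battle", "man"},
    {"hero", "battle"},
    {"earth", "man"},
    {"hero", "battle", "earth"},
]
rules = pair_rules(records, min_support=0.5, min_confidence=0.9)
```

Running the same function on records built from each poem separately, then comparing the two rule sets, would mirror the comparison the abstract describes; a full Apriori run would also extend frequent pairs to larger concept sets.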
|
95 |
Evaluation of dynamic capabilities through business analytics techniques [Avaliação das capacidades dinâmicas através de técnicas de business analytics]
Scherer, Jonatas Ost. January 2017
The development of dynamic capabilities enables a company to innovate more efficiently and, as a result, improve its performance. This thesis presents a framework for measuring the degree of development of a company's dynamic capabilities. Text mining techniques are used to propose a bag of words specific to dynamic capabilities, and, based on the literature, a set of routines is proposed to evaluate how dynamic capabilities are operationalized and developed. To evaluate the dynamic capabilities, text mining techniques were applied to the annual reports of fourteen airlines. This pilot application made it possible to diagnose both the individual airlines and the sector.
The thesis addresses a gap in the dynamic capabilities literature by proposing a quantitative method for measuring them, together with a bag of words specific to dynamic capabilities. In practical terms, the proposal can support data-driven strategic decision making, allowing firms to innovate more efficiently and improve performance.
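As a rough illustration of how a domain-specific bag of words could be applied to a report, the sketch below counts vocabulary hits and reports their share of all tokens. The vocabulary, the sample text, and the scoring rule are assumptions made for illustration; the abstract does not state the thesis's actual term list or measure.

```python
import re
from collections import Counter


def bag_of_words_score(text, vocabulary):
    """Count vocabulary terms in text and return (counts, share of all tokens)."""
    tokens = re.findall(r"[a-z]+", text.lower())  # crude lowercase word tokenizer
    counts = Counter(t for t in tokens if t in vocabulary)
    share = sum(counts.values()) / len(tokens) if tokens else 0.0
    return counts, share


# Hypothetical vocabulary and report excerpt, for illustration only.
vocab = {"innovation", "learning", "reconfigure"}
report = "Innovation and learning drive growth. Innovation matters."
counts, share = bag_of_words_score(report, vocab)
```

Computing this share per company and per year would give one comparable number per annual report; stemming or phrase matching would be natural refinements.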
|
96 |
Supporting text production through the use of a text mining tool [Apoio à produção textual por meio do emprego de uma ferramenta de mineração de textos]
Klemann, Miriam Noering. January 2011
This work proposes the use of the SOBEK tool to support writing through an interactive process. It is an approach to text writing supported by a text mining tool, taking into account the difficulties students face in expressing themselves, particularly in writing. The methodology devised for this research is task-based. To validate it, the whole text production process was analysed, not only its result, covering the pre-writing and writing stages. In the pre-writing stage, a sequence of steps was proposed to help students understand the theme, structure their ideas, and plan and prepare for the task before moving on to the actual writing. An experiment was carried out with a class of second-year students of a secondary-level teacher-training course (Curso Normal de Nível Médio), in which a reading and writing activity was proposed and decomposed into several tasks. The students' participation was recorded in different ways, including video capture of all their actions in the systems used (the mining system and the text editor) and a questionnaire with closed and open questions. In the analysis phase, each task was analysed separately.
The results showed that one of the strategies adopted by the students in search of a deeper understanding of the text was re-reading it (completely or partially), prompted by the questions arising from the interpretation of the graphs presented. The evidence identified was mainly related to the students going back and forth to the original text. Another strategy observed, related to the comparison students made with the initial graph generated, was adding or removing concepts in the concept base and generating a new graph.
The observations made by the students indicated fluency in understanding the original text and in structuring their ideas about how the relevant concepts in the text were related, which supported planning and preparation for the final task, the student's own writing of the text. The strategies the students adopted using the text mining tool enabled them to write in a cohesive and articulated way, presenting relevant and appropriate aspects of the theme. The main contribution of this work is the methodology for using the text mining tool to support text writing, and the demonstration of its potential when applied in the school context.
|
97 |
DETECTION, CLASSIFICATION, AND LOCATION IDENTIFICATION OF TRAFFIC CONGESTION FROM TWITTER STREAM ANALYSIS
RezaeiDivkolaei, Pouya. 01 December 2017
Social media is today an important source of information about events happening around the world. Among social networking platforms, microtext-based ones such as Twitter are of special interest because they are also a rich source of real-time events. In this thesis, our goal is to study the effectiveness of using Twitter as a social sensor for obtaining real-time information on road traffic conditions. Specifically, we focus on: i) identifying tweets that contain traffic-event-related information, ii) classifying such tweets into six main groups (accident, fire, road construction, police activities, weather, and others), and iii) extracting fine-grained location information about the traffic incident by analyzing the tweet text. Our experimental results show that using Twitter as a social sensor for obtaining rich information about traffic events is indeed a promising approach. We show that we can detect traffic-related tweets with an accuracy of 81%. Moreover, the accuracy of classifying traffic-related tweets into one of the six categories is 97%. Lastly, our experimental results show that using only the geo-tags of tweets is not sufficient for fine-grained localization of traffic incidents, for two reasons: i) a vast majority of traffic-related tweets do not contain geo-tags, and ii) the location mentioned in the tweet text and the geo-tag of a tweet do not always agree. These observations indicate that fine-grained localization of traffic incidents from tweets must also include analysis of the tweet text using Natural Language Processing techniques.
|
98 |
Semantic Feature Extraction for Narrative Analysis
January 2016
A story is defined as "an actor(s) taking action(s) that culminates in a resolution(s)". I present novel sets of features to facilitate story detection in text via supervised classification, and further reveal different forms within stories via unsupervised clustering. First, I investigate the utility of a new set of semantic features compared to standard keyword features combined with statistical features, such as density of part-of-speech (POS) tags and named entities, to develop a story classifier. The proposed semantic features are based on <Subject, Verb, Object> triplets that can be extracted using a shallow parser. Experimental results show that a model of memory-based semantic linguistic features alongside statistical features achieves better accuracy. Next, I further improve the performance of story detection with a novel algorithm that aggregates the triplets, producing generalized concepts and relations. A major challenge in automated text analysis is that different words are used for related concepts. Analyzing text at the surface level would treat related concepts (i.e. actors, actions, targets, and victims) as different objects, potentially missing common narrative patterns. The algorithm clusters <Subject, Verb, Object> triplets into generalized concepts by utilizing syntactic criteria based on common contexts and semantic corpus-based statistical criteria based on "contextual synonyms". The generalized-concept representation of text (1) overcomes surface-level differences (which arise when different keywords are used for related concepts) without drift, (2) leads to a higher-level semantic network representation of related stories, and (3) when used as features, yields a significant (36%) boost in performance for the story detection task. Finally, I implement co-clustering based on generalized concepts/relations to automatically detect story forms.
Overlapping generalized concepts and relationships correspond to archetypes/targets and actions that characterize story forms. I perform co-clustering of stories using standard unigrams/bigrams and generalized concepts. I show that the residual error of factorization with concept-based features is significantly lower than the error with standard keyword-based features. I also present qualitative evaluations by a subject matter expert, which suggest that concept-based features yield more coherent, distinctive, and interesting story forms than those produced using standard keyword-based features. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2016
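To make the "common contexts" idea concrete, the following sketch merges subjects that appear with the same (verb, object) pair. It is a deliberately simplified stand-in, not the dissertation's algorithm: it ignores the corpus-based "contextual synonyms" criterion, and the function name and toy triplets are assumptions made for illustration.

```python
from collections import defaultdict


def group_subjects_by_context(triplets):
    """Merge subjects that share at least one (verb, object) context."""
    contexts = defaultdict(set)
    for subj, verb, obj in triplets:
        contexts[(verb, obj)].add(subj)

    parent = {}  # union-find forest over subjects

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for subj, _, _ in triplets:
        find(subj)  # register every subject, including unmerged ones
    for subjects in contexts.values():
        first, *rest = sorted(subjects)
        for other in rest:
            union(first, other)

    groups = defaultdict(set)
    for subj in parent:
        groups[find(subj)].add(subj)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical <Subject, Verb, Object> triplets, for illustration only.
triplets = [
    ("knight", "rescued", "villager"),
    ("hero", "rescued", "villager"),
    ("hero", "defeated", "dragon"),
    ("warrior", "defeated", "dragon"),
    ("merchant", "sold", "grain"),
]
groups = group_subjects_by_context(triplets)
```

Merging on a single shared context is transitive and therefore aggressive; requiring several shared contexts, or adding a statistical similarity threshold, would be the obvious way to limit the drift the abstract mentions.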
|