81

DETECTION, CLASSIFICATION, AND LOCATION IDENTIFICATION OF TRAFFIC CONGESTION FROM TWITTER STREAM ANALYSIS

RezaeiDivkolaei, Pouya 01 December 2017 (has links)
Social media today is an important source of information about various events happening around the world. Among social networking platforms, microtext-based ones such as Twitter are of special interest, as they are also a rich source of real-time events. In this thesis, our goal is to study the effectiveness of using Twitter as a social sensor for obtaining real-time information on road traffic conditions. Specifically, we focus on: i) identifying tweets that contain traffic-related information; ii) classifying such tweets into six main categories: accident, fire, road construction, police activity, weather, and other; and iii) extracting fine-grained location information about the traffic incident by analyzing the tweet text. Our experimental results show that using Twitter as a social sensor for obtaining rich information about traffic events is indeed a promising approach. We show that we can correctly detect traffic-related tweets with an accuracy of 81%. Moreover, the accuracy of correctly classifying traffic-related tweets into one of the six categories is 97%. Lastly, our experimental results show that using only the geo-tags of tweets is not sufficient for fine-grained localization of traffic incidents, for two reasons: i) the vast majority of traffic-related tweets do not contain geo-tags, and ii) the location mentioned in the tweet text and the geo-tag of a tweet do not always agree. These observations show that fine-grained localization of traffic incidents from tweets must also include analysis of the tweet text using Natural Language Processing techniques.
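The two-stage pipeline the abstract describes (detect traffic-related tweets, then assign one of six categories) can be illustrated with a minimal rule-based sketch. The keyword lexicons below are hypothetical stand-ins; the thesis's actual trained classifiers are not reproduced here.

```python
# Hypothetical keyword lexicons -- a minimal rule-based sketch of the
# detect-then-classify pipeline, not the thesis's trained models.
TRAFFIC_TERMS = {"traffic", "crash", "accident", "road", "lane", "highway",
                 "congestion", "detour"}
CATEGORY_TERMS = {
    "accident": {"crash", "accident", "collision"},
    "fire": {"fire", "smoke"},
    "road construction": {"construction", "roadwork"},
    "police activity": {"police", "officer"},
    "weather": {"rain", "snow", "ice", "fog"},
}

def classify_traffic_tweet(text):
    """Stage 1: detect whether the tweet is traffic-related.
    Stage 2: assign one of the six categories ('other' is the catch-all)."""
    words = set(text.lower().replace(",", " ").split())
    if not words & TRAFFIC_TERMS:
        return None                      # not a traffic-related tweet
    for category, terms in CATEGORY_TERMS.items():
        if words & terms:
            return category
    return "other"                       # sixth, catch-all category

print(classify_traffic_tweet("Major crash on I-57, two lanes blocked"))
print(classify_traffic_tweet("Lovely sunset tonight"))
```

In practice a supervised classifier over richer features would replace the keyword matching, but the control flow (detection before categorization) stays the same.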
82

Semantic Feature Extraction for Narrative Analysis

January 2016 (has links)
abstract: A story is defined as "an actor(s) taking action(s) that culminates in a resolution(s)''. I present novel sets of features to facilitate story detection among text via supervised classification and further reveal different forms within stories via unsupervised clustering. First, I investigate the utility of a new set of semantic features compared to standard keyword features combined with statistical features, such as density of part-of-speech (POS) tags and named entities, to develop a story classifier. The proposed semantic features are based on <Subject, Verb, Object> triplets that can be extracted using a shallow parser. Experimental results show that a model of memory-based semantic linguistic features alongside statistical features achieves better accuracy. Next, I further improve the performance of story detection with a novel algorithm which aggregates the triplets producing generalized concepts and relations. A major challenge in automated text analysis is that different words are used for related concepts. Analyzing text at the surface level would treat related concepts (i.e. actors, actions, targets, and victims) as different objects, potentially missing common narrative patterns. The algorithm clusters <Subject, Verb, Object> triplets into generalized concepts by utilizing syntactic criteria based on common contexts and semantic corpus-based statistical criteria based on "contextual synonyms''. Generalized concepts representation of text (1) overcomes surface level differences (which arise when different keywords are used for related concepts) without drift, (2) leads to a higher-level semantic network representation of related stories, and (3) when used as features, they yield a significant (36%) boost in performance for the story detection task. Finally, I implement co-clustering based on generalized concepts/relations to automatically detect story forms. 
Overlapping generalized concepts and relationships correspond to archetypes/targets and actions that characterize story forms. I perform co-clustering of stories using standard unigrams/bigrams and generalized concepts. I show that the residual error of factorization with concept-based features is significantly lower than the error with standard keyword-based features. I also present qualitative evaluations by a subject matter expert, which suggest that concept-based features yield more coherent, distinctive and interesting story forms compared to those produced by using standard keyword-based features. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2016
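The aggregation of <Subject, Verb, Object> triplets into generalized relations can be sketched as follows, using toy triplets and the shared-context criterion described above. The data and the single-pass merge are illustrative simplifications, not the dissertation's algorithm.

```python
from collections import defaultdict

# Toy <Subject, Verb, Object> triplets; a shallow parser would supply these.
triplets = [
    ("rebels", "attacked", "village"),
    ("rebels", "raided", "village"),
    ("army", "attacked", "camp"),
    ("rebels", "attacked", "town"),
]

# Verbs sharing a (subject, object) context are treated as contextual synonyms.
context_verbs = defaultdict(set)
for s, v, o in triplets:
    context_verbs[(s, o)].add(v)

# Merge overlapping verb sets into generalized relations (a single pass;
# a full implementation would iterate to transitive closure).
relations = []
for verbs in context_verbs.values():
    for group in relations:
        if group & verbs:
            group |= verbs
            break
    else:
        relations.append(set(verbs))

print(relations)  # "attacked" and "raided" collapse into one relation
```

The same idea applies symmetrically to subjects and objects, yielding the generalized concepts used as features for story detection.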
83

Avaliação das capacidades dinâmicas através de técnicas de business analytcs

Scherer, Jonatas Ost January 2017 (has links)
The development of dynamic capabilities enables a firm to innovate more efficiently and thereby improve its performance. This thesis presents a framework for measuring the degree of development of a firm's dynamic capabilities. Using text mining techniques, a bag of words specific to dynamic capabilities is proposed, and, based on the literature, a set of routines is proposed for evaluating the operationalization and development of dynamic capabilities. To evaluate the dynamic capabilities, text mining techniques were applied using the annual reports of fourteen airlines as the data source. Through this pilot application, it was possible to carry out a diagnosis of the airlines and of the sector as a whole. The thesis addresses a gap in the dynamic capabilities literature by proposing a quantitative method for their measurement, together with a bag of words specific to dynamic capabilities. In practical terms, the proposal can support data-driven strategic decision making, enabling firms to innovate more efficiently and improve their performance.
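Scoring an annual report against a capability-specific bag of words can be sketched as below. The lexicon here is a small hypothetical example for illustration; the thesis's actual bag of words is not reproduced.

```python
# Illustrative lexicon only -- the thesis's actual dynamic-capabilities
# bag of words is not reproduced here.
DC_LEXICON = {
    "sensing": {"monitor", "scan", "identify opportunities"},
    "seizing": {"invest", "launch", "partnership"},
    "reconfiguring": {"restructure", "integrate", "transform"},
}

def dc_profile(report_text):
    """Count lexicon hits per dynamic-capability dimension in a report."""
    text = report_text.lower()
    return {dim: sum(text.count(term) for term in terms)
            for dim, terms in DC_LEXICON.items()}

profile = dc_profile("We continue to monitor market trends, invest in new "
                     "routes, and restructure our fleet to transform operations.")
print(profile)
```

Normalizing such counts by document length would let the profiles of different airlines be compared, supporting the firm- and sector-level diagnosis described above.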
84

Apoio à produção textual por meio do emprego de uma ferramenta de mineração de textos

Klemann, Miriam Noering January 2011 (has links)
This research proposes the use of the SOBEK text mining tool to support text production through an interactive process. The approach supports text writing with a text mining tool, considering the difficulties students face, particularly in written expression. The methodology conceived for the research is task-based. To validate it, the entire text production process was analyzed, not only its result, covering the pre-writing and writing stages. In the pre-writing stage, a sequence of steps was proposed to enable students to engage with the theme, structure their ideas, and plan and prepare for the task before moving on to the writing stage. An experiment was carried out with a class of second-year students of a secondary-level teacher-training course (Curso Normal), in which a reading and writing activity was proposed and decomposed into several tasks. The students' participation was recorded in different ways, including video capture of all their actions in the mining system and text editor, and a questionnaire with closed and open questions. In the analysis phase, each task was analyzed separately. 
The results showed that one of the strategies the students adopted in search of a deeper understanding of the text was re-reading it (completely or partially), prompted by questions arising from their interpretation of the graphs presented. The evidence identified was mainly related to the students' going back and forth to the original text. Another strategy, related to the students' comparison with the initially generated graph, was adding or removing concepts in the concept base and generating a new graph. The students' observations indicated fluency in understanding the original text and in structuring their ideas around how the relevant concepts in the text were related, which supported their planning and preparation for the final task: the writing of the text. The strategies the students adopted using the text mining tool enabled them to write in a cohesive, articulate way, presenting relevant and appropriate aspects of the proposed theme. The main contribution of this work is the methodology for employing a text mining tool to support text production, and the demonstration of its potential when applied in the school context.
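The concept graphs the students interpreted can be approximated by a toy sketch: frequent terms become concept nodes, and co-occurrence within a sentence becomes a weighted edge. This is only a simplified stand-in for SOBEK's actual extraction.

```python
from collections import Counter
from itertools import combinations

def concept_graph(text, top_n=4):
    """Toy SOBEK-style sketch: frequent terms become concept nodes and
    sentence co-occurrence becomes weighted edges (the real tool's
    extraction is more elaborate)."""
    sentences = [s.split() for s in text.lower().split(".") if s.strip()]
    freq = Counter(w for s in sentences for w in s if len(w) > 3)
    concepts = {w for w, _ in freq.most_common(top_n)}
    edges = Counter()
    for s in sentences:
        for a, b in combinations(sorted(set(s) & concepts), 2):
            edges[(a, b)] += 1
    return concepts, edges

text = ("Students read the text. Students mine the text. "
        "Teachers read the text.")
concepts, edges = concept_graph(text)
print(concepts, dict(edges))
```

Regenerating the graph after the student adds or removes concepts from the base (here, the `concepts` set) mirrors the revision strategy observed in the experiment.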
86

Cluster Analysis of Discussions on Internet Forums / Klusteranalys av Diskussioner på Internetforum

Holm, Rasmus January 2016 (has links)
The growth of textual content on internet forums over the last decade has been immense, which has resulted in users struggling to find relevant information in a convenient and quick way. The activity of finding information in large data collections is known as information retrieval, and many tools and techniques have been developed to tackle common problems. Cluster analysis is a technique for grouping similar objects into smaller groups (clusters) such that objects within a cluster are more similar to each other than to objects in other clusters. We have investigated two clustering algorithms, Graclus and Non-Exhaustive Overlapping k-means (NEO-k-means), on textual data taken from Reddit, a social news and discussion service. One difficulty with both algorithms is that they take an input parameter controlling how many clusters to find. We have used a greedy modularity maximization algorithm to estimate the number of clusters present in discussion threads. We have shown that it is possible to find subtopics within discussions and that, in terms of execution time, Graclus has a clear advantage over NEO-k-means.
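Greedy modularity maximization for estimating the number of clusters can be sketched directly: start from singleton communities and repeatedly merge the pair that most increases Newman's modularity Q, stopping when no merge helps. The tiny example graph is illustrative, not the thesis's Reddit data.

```python
def modularity(adj, comms):
    """Newman modularity Q for a partition of an undirected graph
    given as an adjacency-list dict."""
    m = sum(len(n) for n in adj.values()) / 2          # number of edges
    q = 0.0
    for c in comms:
        internal = sum(1 for v in c for u in adj[v] if u in c) / 2
        degsum = sum(len(adj[v]) for v in c)
        q += internal / m - (degsum / (2 * m)) ** 2
    return q

def estimate_clusters(adj):
    """Greedy agglomeration: merge the community pair that most increases Q;
    stop when no merge helps. len(result) estimates the cluster count."""
    comms = [{v} for v in adj]
    improved = True
    while improved:
        improved = False
        best, best_q = None, modularity(adj, comms)
        for i in range(len(comms)):
            for j in range(i + 1, len(comms)):
                trial = [c for k, c in enumerate(comms) if k not in (i, j)]
                trial.append(comms[i] | comms[j])
                q = modularity(adj, trial)
                if q > best_q:
                    best, best_q = (i, j), q
        if best is not None:
            i, j = best
            merged = comms[i] | comms[j]
            comms = [c for k, c in enumerate(comms) if k not in (i, j)] + [merged]
            improved = True
    return comms

# Two triangles joined by a single bridge edge -> two communities expected.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
       "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"]}
print(estimate_clusters(adj))
```

The resulting community count can then be passed as the k parameter to Graclus or NEO-k-means. This O(n^2)-merges-per-step version is only practical for small thread graphs; production implementations use incremental ΔQ updates.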
87

Towards Secure and Trustworthy Cyberspace: Social Media Analytics on Hacker Communities

Li, Weifeng January 2017 (has links)
Social media analytics is a critical research area spawned by the increasing availability of rich and abundant online user-generated content. So far, social media analytics has had a profound impact on organizational decision making in many aspects, including product and service design, market segmentation, customer relationship management, and more. However, the cybersecurity sector lags behind other sectors in benefiting from the business intelligence offered by social media analytics. Given the prevalence of hacker communities, where hacking knowledge and tools are exchanged, and their role in cybercrime, there is an urgent need to develop hacker social media analytics capable of gathering cyber threat intelligence from these communities. My dissertation addresses two broad research questions: (1) How do we help organizations gain cyber threat intelligence through social media analytics on hacker communities? And (2) how do we advance social media analytics research by developing innovative algorithms and models for hacker communities? Using cyber threat intelligence as a guiding principle, emphasis is placed on the two major components of hacker communities: threat actors and their cybercriminal assets. To these ends, the dissertation is arranged in two parts. The first part focuses on gathering cyber threat intelligence on threat actors. In the first essay, I identify and profile two types of key sellers in hacker communities: malware sellers and stolen data sellers, both of whom contribute to data breach incidents. In the second essay, I develop a method for recovering social interaction networks, which can be further used for detecting major hacker groups and identifying their specialties and key members. The second part seeks to develop cyber threat intelligence on cybercriminal assets. 
In the third essay, a novel supervised topic model is proposed to address the language complexities of hacker communities. In the fourth essay, I propose an innovative emerging-topic detection model. The models, frameworks, and design principles developed in this dissertation not only advance social media analytics research, but also contribute broadly to IS security application and design science research.
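The social interaction network recovery of the second essay can be illustrated with a toy sketch: reply records become weighted, undirected edges between forum members. The post data and reply targets here are invented; in the dissertation's setting, targets would be recovered from quoting and addressing patterns in post content.

```python
from collections import Counter

# Toy reply records (author, replied_to); targets are hypothetical and would
# in practice be recovered from quotes and addressing patterns in posts.
posts = [
    ("seller_x", None),           # thread starter
    ("member_a", "seller_x"),
    ("member_b", "seller_x"),
    ("seller_x", "member_a"),
]

# Accumulate a weighted undirected interaction network.
edges = Counter()
for author, target in posts:
    if target is not None and target != author:
        edges[frozenset((author, target))] += 1

print(dict(edges))
```

Community detection on the resulting weighted graph would then surface hacker groups and their key members.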
88

Probabilistic Models of Topics and Social Events

Wei, Wei 01 December 2016 (has links)
Structured probabilistic inference has been shown to be useful in modeling complex latent structures in data. One successful way in which this technique has been applied is in the discovery of latent topical structures in text data, usually referred to as topic modeling. With the recent popularity of mobile devices and social networking, we can now easily acquire text data attached to meta information, such as geo-spatial coordinates and time stamps. This metadata can provide rich and accurate information that is helpful in answering many research questions related to spatial and temporal reasoning. However, such data must be treated differently from text data. For example, spatial data is usually organized in terms of a two-dimensional region, while temporal information can exhibit periodicities. While some work exists in the topic modeling community that utilizes some of this meta information, those models have largely focused on incorporating metadata into text analysis, rather than making full use of the joint distribution of meta information and text. In this thesis, I propose the event detection problem, which is a multidimensional latent clustering problem on spatial, temporal, and topical data. I start with a simple parametric model to discover independent events using geo-tagged Twitter data. The model is then improved in two directions. First, I augment the model with the Recurrent Chinese Restaurant Process (RCRP) to discover events that are dynamic in nature. Second, I study a model that can detect events using data from multiple media sources, examining the characteristics of different media in terms of reported event times and linguistic patterns. The approaches studied in this thesis are largely based on Bayesian nonparametric methods, to deal with streaming data and an unpredictable number of clusters. 
The research will not only serve the event detection problem itself, but also shed light on a more general structured clustering problem over spatial, temporal, and textual data.
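The Bayesian nonparametric machinery underlying the thesis can be illustrated with the plain Chinese Restaurant Process, the static precursor of the RCRP mentioned above: the number of clusters (tables) is not fixed in advance but grows with the data. This is a generative sketch, not the thesis's inference procedure.

```python
import random

def crp_partition(n, alpha, seed=0):
    """Chinese Restaurant Process: customer i sits at an existing table with
    probability proportional to its occupancy, or opens a new table with
    probability proportional to the concentration parameter alpha."""
    rng = random.Random(seed)
    tables = []          # occupancy count per table (cluster sizes)
    labels = []          # table assignment per customer
    for _ in range(n):
        weights = tables + [alpha]           # last slot = new table
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)                 # a new cluster is born
        else:
            tables[k] += 1
        labels.append(k)
    return labels, tables

labels, tables = crp_partition(100, alpha=1.0)
print(len(tables), tables)
```

The RCRP extends this by letting table popularities carry over, with decay, from one epoch to the next, which is what makes the discovered events dynamic.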
89

Personalized Medicine through Automatic Extraction of Information from Medical Texts

Frunza, Oana Magdalena January 2012 (has links)
The wealth of medical-related information available today gives rise to a multidimensional source of knowledge. Research discoveries published in prestigious venues, electronic health records, discharge summaries, clinical notes, etc., all represent important medical information that can assist in the medical decision-making process. The challenge that comes with accessing and using such vast and diverse sources of data lies in the ability to distil and extract reliable and relevant information. Computer-based tools that use natural language processing and machine learning techniques have proven helpful in addressing such challenges. The current work proposes reliable automatic solutions for tasks that can help achieve personalized medicine, a medical practice that brings together general medical knowledge and case-specific medical information. Phenotypic medical observations, along with data from test results, are not enough when assessing and treating a medical case. Genetic, lifestyle, background, and environmental data also need to be taken into account in the medical decision process. This thesis's goal is to show that natural language processing and machine learning techniques represent reliable solutions for important medical-related problems. From the numerous research problems that need to be answered when implementing personalized medicine, the scope of this thesis is restricted to four: 1. Automatic identification of obesity-related diseases using only textual clinical data; 2. Automatic identification of relevant abstracts of published research to be used for building systematic reviews; 3. Automatic identification of gene functions based on the textual data of published medical abstracts; 4. Automatic identification and classification of important relations between medical concepts in clinical and technical data. 
This investigation into automatic solutions for achieving personalized medicine through information identification and extraction focuses on individual problems that can later be linked together in a puzzle-building manner. A diverse representation technique that follows a divide-and-conquer methodological approach proves to be the most reliable solution for building automatic models that solve the above-mentioned tasks. The methodologies I propose are supported by in-depth experiments and thorough discussion.
90

Monitoring Tweets for Depression to Detect At-Risk Users

Jamil, Zunaira January 2017 (has links)
According to the World Health Organization, mental health is an integral part of health and well-being. Mental illness can affect anyone, rich or poor, male or female. One such example of mental illness is depression. In Canada, 5.3% of the population had experienced a depressive episode in the past 12 months. Depression is difficult to diagnose, resulting in high rates of under-diagnosis. Diagnosing depression is often based on self-reported experiences, behaviors reported by relatives, and a mental status examination. Currently, authorities use surveys and questionnaires to identify individuals who may be at risk of depression. This process is time-consuming and costly. We propose an automated system that can identify at-risk users from their public social media activity; more specifically, we identify at-risk users from Twitter. To achieve this goal, we trained a user-level classifier using a Support Vector Machine (SVM) that can detect at-risk users with a recall of 0.8750 and a precision of 0.7778. We also trained a tweet-level classifier that predicts whether a tweet indicates distress. This task was much more difficult due to the imbalanced data: in the dataset that we labeled, 5% of tweets were distress tweets and 95% non-distress. To handle this class imbalance, we used undersampling methods. The resulting classifier uses an SVM and performs with a recall of 0.8020 and a precision of 0.1237. Our system can be used by authorities to find a focused group of at-risk users. It is not a platform for labeling an individual as a patient with depression, but only a platform for raising an alarm so that the relevant authorities can take the necessary steps to further analyze the predicted user and confirm his or her state of mental health. We respect the ethical boundaries relating to the use of social media data and therefore do not use any user identification information in our research.
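The undersampling step used to counter the 95/5 class imbalance can be sketched simply: randomly discard majority-class examples until both classes are the same size. The data below is synthetic; the thesis's labeled tweets and SVM training are not reproduced.

```python
import random

def undersample(majority, minority, seed=42):
    """Randomly keep only as many majority examples as there are minority
    examples, yielding a balanced training set (seed fixed for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)) + list(minority)

# Synthetic stand-in for the labeled dataset: (tweet_id, label) pairs.
non_distress = [("nd%d" % i, 0) for i in range(95)]   # 95% of labeled tweets
distress = [("d%d" % i, 1) for i in range(5)]         # 5% of labeled tweets
balanced = undersample(non_distress, distress)
print(len(balanced))  # 10
```

A classifier trained on the balanced set no longer gains by always predicting the majority class, which is what makes the rare distress tweets learnable at all, at the cost of discarding most non-distress examples.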
