Global ETD Search

11	Tracking domain knowledge based on segmented textual sources Kalledat, Tobias 11 May 2009 This research analyses how pre-processing influences the results of knowledge generation and gives concrete recommendations for suitable pre-processing of text corpora in text data mining (TDM) projects. It focuses on extracting and tracking concepts within particular knowledge domains, using an approach that segments corpora horizontally (along the timeline) and vertically (by persistence of terms). The result is a set of time-segmented subcorpora that reflect the persistence of the terms they contain. Within each time segment, two clusters of terms can be formed: one holds the terms that occur in every time segment of the corpus, the other the terms that are not persistent across the corpus as a whole. Simple frequency measures show that the statistical quality of a single corpus is enough to measure pre-processing quality; comparison corpora are not needed. The time series of the frequency measures show significant negative correlations between the cluster of permanently occurring terms and the cluster of non-persistent terms. This holds only for the optimally pre-processed corpus and was not found in the other test sets, which were pre-processed with lower quality. When the most frequent terms are grouped into concepts using domain-specific taxonomies, a significant negative correlation appears between the number of distinct terms per yearly corpus segment and the number of terms assigned to a taxonomy; again, this holds only for the corpus with high pre-processing quality. A semantic analysis of data prepared with a threshold-based TDM method yielded significantly different extracted knowledge depending on the quality of pre-processing. The methods and measures presented make it possible to measure both the quality of the source corpora and the quality of the applied taxonomies. From these findings, indicators for measuring and assessing corpora and taxonomies are developed, together with recommendations for a level of pre-processing appropriate to the goal of the subsequent analysis. Text Data Mining Corpus Measures Corpus Linguistics Computational Linguistics Data Pre-processing Pre-processing Quality Knowledge Extraction 330 Economics 17 Economics QP 345 ddc:330
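The segmentation idea in the abstract above can be illustrated with a minimal sketch. It assumes a corpus already tokenised and split into time segments, uses raw token counts as the "simple frequency measure" (the abstract does not say which measure the thesis uses), and computes a plain Pearson correlation. The function names `split_by_persistence`, `cluster_counts` and `pearson` and the toy data are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import sqrt


def split_by_persistence(segments):
    """Split the vocabulary into persistent and non-persistent terms.

    segments: dict mapping a time label (e.g. a year) to a list of tokens.
    A term is persistent if it occurs in every time segment.
    """
    vocabularies = [set(tokens) for tokens in segments.values()]
    persistent = set.intersection(*vocabularies)
    non_persistent = set.union(*vocabularies) - persistent
    return persistent, non_persistent


def cluster_counts(segments, terms):
    """Token count of the given term cluster in each segment, in label order."""
    series = []
    for label in sorted(segments):
        counts = Counter(segments[label])
        series.append(sum(counts[term] for term in terms))
    return series


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Toy corpus, invented for illustration only.
    corpus = {
        "2001": ["data", "mining", "text", "web"],
        "2002": ["data", "mining", "grid", "grid", "data"],
        "2003": ["data", "mining", "cloud"],
    }
    persistent, non_persistent = split_by_persistence(corpus)
    r = pearson(cluster_counts(corpus, persistent),
                cluster_counts(corpus, non_persistent))
    print(sorted(persistent), sorted(non_persistent), r)
```

Raw counts are used here rather than each cluster's share of a segment: the two shares always sum to one, so their correlation would be -1 by construction and would say nothing about pre-processing quality.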
12	School Violence and Teacher Resiliency at a Midwest Elementary/Middle School Wright, Jounice Blackmon 01 January 2015 The purpose of this phenomenological study was to investigate, from the perspective of teachers, the possible effect of school violence on teacher resiliency. School violence has been studied with respect to student behavior and academic success, as well as socioeconomic influences, but not with respect to teacher resiliency as expressed by teachers themselves. Resiliency theory was the conceptual framework. Participants were all teachers of Grades 2-8 at an elementary/middle school in the Midwest. Twelve in-depth interviews were transcribed into text data and analyzed for common themes. I used NVivo, Version 10, to manage the volume of text data. Recurring themes and meanings were triangulated with a resiliency questionnaire, school climate surveys, and field notes. The first overarching theme that emerged was that teacher resiliency at the target school was lowered when its teachers were exposed to a school climate that allowed excessive violence, especially fights. A second overarching theme was that there were inconsistencies in the support offered by the school administration, which negatively impacted teacher resiliency. A third overarching theme was that there was a significant lack of parental and community support, which also negatively affected teacher resiliency at the target school. These themes can now be used to support the need for more effective teacher training about school violence. The outcomes may also help generate improved school violence policies at the local, state, and national levels. phenomenology qualitative school violence teacher resiliency text data triangulation Education Public Health Education and Promotion
14	Impactos econômicos e financeiros de notícias Azevedo, Luis Fernando Pereira 18 April 2017 Owing to rapid technological progress in recent decades, local events, economic or otherwise, are reported around the world almost instantly. Many of these events can influence asset prices in the most diverse markets, wherever they are located. Beyond reporting events, economic news carries valuable information that captures changes in agents' sentiment about the current and prospective state of the economy. Negative coverage of economic activity, with anecdotes and accounts from business owners, for example, signals negative GDP growth even before the figure is released. Despite their breadth and speed, news reports can be subject to a range of biases, from misinterpretation of the topic, through prejudice against someone featured in the story, to herd behaviour. Technological progress is also beginning to make it possible to overcome these problems. The study of large volumes of data (big data) from social networks or search platforms provides a way to measure the sentiment of economic agents directly, in real time, without the intermediation of news. 
Using specialised news at intraday frequency and tools such as Google Trends, this thesis seeks to capture changes in agents' sentiment and to determine whether these measures are related to business-cycle fluctuations and asset pricing. In chapter 1, macroeconomic factors are replicated from news and applied in factor models, yielding evidence that news contains information relevant to explaining the excess returns of BM&FBovespa stock portfolios. Chapter 2 then analyses the communication of the Banco Central do Brasil through the specialised media. There is evidence that interest rate volatility is higher on days when a Copom member is cited in the media, especially in more recent years. Finally, chapter 3 seeks to extract people's sentiment from their Google searches. In a VAR with macroeconomic variables such as industrial production and employment, the resulting indicator is compared with indicators of investor sentiment, financial conditions and policy uncertainty, and the evidence suggests that the Google tool can provide relevant information about the impact of people's sentiment on the real economy. / Data from texts and Internet search queries are a relatively new source of information for economists. In this PhD dissertation, I analyze the impact of the information content of news and of Google search patterns on asset pricing and the business cycle. In chapter 1, using a unique intraday news database, I create news measures that approximate people's concern about macroeconomic variables. Using the Fama-MacBeth framework, I conclude that those measures improve the explanatory power of asset pricing models for Brazilian stocks. In chapter 2, I identify days on which the Brazilian monetary authority is cited in the media and test whether interest rate volatility is higher on those days. 
I find evidence that all vertices of the yield curve show higher volatility on those days, especially the short- and medium-term vertices. I also find evidence that interest rate volatility is higher on days when the Brazilian Central Bank releases special reports or when central bank directors deliver speeches. Both results are stronger after 2014. Finally, in chapter 3, a proxy for people's sentiment, measured as the relative volume of searches for economic words with positive or negative connotation, is extracted from Google's search queries and used to explain macroeconomic variables in a VAR environment. For the US economy, the responses of employment and production to shocks (innovations) in this sentiment index are similar to those observed when the innovation comes from a financial conditions indicator, and are larger and longer-lasting than the responses to shocks to investor sentiment and policy uncertainty. Text data Asset pricing VAR Empirical finance Volatility Sentiment Economics Press Finance Financial market Means of communication
15	Dolovanie znalostí z textových dát použitím metód umelej inteligencie / Text Mining Based on Artificial Intelligence Methods Povoda, Lukáš January 2018 This work deals with text mining, which is becoming more popular due to the exponential growth of data in electronic form. It explores contemporary methods and their improvement using optimization methods, as well as the general problem of understanding text data. The work addresses the problem in three ways: using traditional methods and their optimizations; using Big Data in the training phase together with abstraction through the minimization of language-dependent parts; and introducing a new method based on deep learning that is closer to how a human reads and understands text. The main aim of the dissertation was to propose a method for machine understanding of unstructured text data. The method was experimentally verified by classifying text data in five languages: Czech, English, German, Spanish and Chinese. This demonstrates its possible application to different language families. Validation on the Yelp evaluation database achieved accuracy 0.5% higher than current methods.
16	The development of accented English synthetic voices Malatji, Promise Tshepiso January 2019 Thesis (M. Sc. (Computer Science)) --University of Limpopo, 2019 / A text-to-speech (TTS) synthesis system is a software system that receives text as input and produces speech as output. A TTS synthesis system can be used by native and non-native speakers of the target language for, amongst other things, language learning and reading text aloud for people living with disabilities, such as physical or visual impairments. Most people relate easily to a second language spoken by a non-native speaker with whom they share a native language. Most online English TTS synthesis systems are developed using native speakers of English. This study focuses on developing accented English synthetic voices as spoken by non-native speakers in the Limpopo province of South Africa. The Modular Architecture for Research on speech sYnthesis (MARY) TTS engine is used to develop the synthetic voices, and the Hidden Markov Model (HMM) method is used to train them. A secondary training text corpus is used to develop the training speech corpus by recording six speakers reading the text corpus. The quality of the developed synthetic voices is measured in terms of intelligibility, similarity and naturalness using a listening test. The results are reported by evaluators' occupation and gender, and overall. The subjective listening test indicates that the developed synthetic voices have a high level of acceptance in terms of similarity and intelligibility. Speech analysis software is used to compare the recorded synthesised speech with the human recordings. There is no significant difference in voice pitch between the speakers and the synthetic voices, except for one synthetic voice. 
Text-to-speech synthesis system Language learning Text data mining Data compression (computer science) Informal language learning
17	Comparing Communities & User Clusters in Twitter Network Data Bhowmik, Kowshik January 2019 No description available. Computer Science Network Data Analysis Text Data Analysis Social Media Mining Community Detection Document Clustering Machine Learning
18	Improving Inferences about Preferences in Choice Modeling Kim, Hyowon 22 September 2020 No description available. Business Administration Marketing Statistics Economics
19	[en] FISCAL POLICY RISK AND THE YIELD CURVE: AN ALTERNATIVE MEASURE / [pt] RISCO FISCAL E CURVA DE JUROS: UMA MEDIDA ALTERNATIVA RENATA CARREIRO AVILA 07 August 2023 [en] Does fiscal policy risk affect the yield curve in an emerging economy? How can this kind of risk be measured adequately? Using the case of Brazil, we estimate a novel, news-based measure of fiscal policy risk using natural language processing. We show that increases in fiscal policy risk are associated with increases in long-maturity yields and in the term spread, and with a depreciation of the exchange rate. The effects are robust to a series of alternative specifications of the text-based index, suggesting that fiscal risk is a relevant phenomenon in the Brazilian setting. [en] FISCAL POLICY [en] YIELD CURVE [en] TEXT DATA [en] RISK AND UNCERTAINTY
20	Towards Building Privacy-Preserving Language Models: Challenges and Insights in Adapting PrivGAN for Generation of Synthetic Clinical Text Nazem, Atena January 2023 The growing development of artificial intelligence (AI), particularly neural networks, is transforming applications of AI in healthcare, yet it raises significant privacy concerns due to potential data leakage. Because neural networks memorise training data, they may inadvertently expose sensitive clinical data to privacy breaches, with serious repercussions such as identity theft, fraud, and harmful medical errors. While regulations such as GDPR offer safeguards through guidelines, more deeply rooted technical protections are required to address the problem of data leakage. Reviews of various approaches show that one avenue of exploration is the adaptation of Generative Adversarial Networks (GANs) to generate synthetic data for use in place of real data. Since GANs were originally designed and mainly researched for generating visual data, there is a notable gap in adapting GANs with privacy-preserving measures to generate synthetic text data. To address this gap, this study asks how a privacy-preserving GAN can be adapted to safeguard the privacy of clinical text data, and what challenges and potential solutions are associated with such adaptations. To this end, the existing privGAN framework, originally developed and tested for image data, was tailored to clinical text data. Following the design science research framework, modifications were made while adhering to the privGAN architecture to incorporate reinforcement learning (RL), which addresses the discrete nature of text data. For synthetic data generation, this study used the 'Discharge summary' class from the Noteevents table of the MIMIC-III dataset, which is clinical text data in American English. 
The utility of the generated data was assessed using the BLEU-4 metric, and a white-box attack was conducted to test the model's resistance to privacy breaches. The experiment yielded a very low BLEU-4 score, indicating that the generator could not produce synthetic data that captures the linguistic characteristics and patterns of real data. The relatively low white-box attack accuracy with one discriminator (0.2055) suggests that the trained discriminator was less effective at inferring sensitive information. While this may indicate a potential for preserving privacy, increasing the number of discriminators yields less favourable results (0.361). In light of these results, it is noted that the adapted approach of defining the rewards as a measure of the discriminators' uncertainty can signal a contradictory learning strategy and lead to low data utility. This study underscores the challenges of adapting privacy-preserving GANs to text data, given the inherent complexity of GAN training and the computational power required. To obtain better utility and confirm the effectiveness of the privacy measures, further experiments are needed that consider a more direct and granular reward system for the generator and that find an optimal learning rate. As such, the findings reiterate the need for continued experimentation and refinement in adapting privacy-preserving GANs for clinical text. Generative Adversarial Networks privacy-preserving language models clinical text data reinforcement learning synthetic data Information Systems
