361

Prevalência de perda auditiva autodeclarada e fatores associados : informante primário versus proxy / Prevalence of self-reported hearing loss and associated factors: primary informant versus proxy

Quevedo, André Luis Alves de, January 2015
INTRODUCTION: Epidemiological studies such as household surveys sometimes rely on secondary or substitute informants, also called key or proxy informants, to collect information about other individuals, especially when the primary informant is absent. Studies in the literature have evaluated whether the prevalence of outcomes differs when the responses of primary informants and proxy informants are considered separately. In the field of communication disorders, no studies were identified that checked for the presence or absence of bias in self-reported hearing loss when proxy informants' answers are used. OBJECTIVE: To assess whether the prevalence of self-reported hearing loss and associated factors differs when the answers of primary and proxy informants in the population-based Human Communication Disorders Study (Estudo de Distúrbios da Comunicação Humana de base Populacional, DCH-POP) are analysed separately. METHOD: This is a methodological study in epidemiology based on data from a population-based, cross-sectional household survey with a multistage stratified probability sample of 1,248 individuals, conducted in a neighborhood of Porto Alegre, Rio Grande do Sul, Brazil. Proportions, medians and interquartile ranges (for age in years and years of schooling) were computed for the whole study population and separately for primary and proxy informants. To test for differences in sociodemographic characteristics and self-reported prevalences between primary and proxy informants, Pearson's chi-squared test and Fisher's exact test were used for categorical variables, and the nonparametric Mann-Whitney test for continuous variables with non-symmetric distributions. In addition, logistic regression models were fitted with hearing loss as the dependent variable, considering the whole sample, only the responses of primary informants, and only the responses of proxy informants. In the multivariate model, only variables associated with hearing loss at the p < 0.20 level were retained. The magnitude of association was expressed as odds ratios (OR) with 95% confidence intervals. RESULTS: Considering the self-reported answers of primary informants (479 individuals) and proxy informants (769 individuals), only the variables ear infection in the last 12 months, ear surgery, rhinitis and sinusitis showed no difference between the prevalences reported by the two groups. In general, for all analysed variables that differed statistically, the prevalences reported by proxy informants underestimated the study outcomes when compared with the responses of primary informants. In the final models, only the independent variables age and dizziness remained associated with hearing loss. For dizziness, the highest OR was found in the model using only proxy informants' data, while the model using only primary informants' responses yielded a lower OR than both the whole-sample model and the proxy-only model. CONCLUSION: Whenever data collected from proxy informants are used, it is necessary to explore how those responses affect the overall results for the study population; if biases exist, statistical adjustments should be applied to reduce the differences.
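
The analysis plan above maps onto a few standard statistical calls. A minimal sketch follows, with synthetic data and made-up column names (this is not the DCH-POP data or the authors' code): a chi-squared comparison of reported prevalence by informant type, and a group-wise logistic regression reporting odds ratios.

```python
# Minimal sketch of a primary-vs-proxy comparison; the data and column
# names are hypothetical, not the DCH-POP study data.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1248
df = pd.DataFrame({
    "proxy": rng.integers(0, 2, n),          # 1 = proxy informant
    "age": rng.integers(18, 90, n),
    "dizziness": rng.integers(0, 2, n),
})
# Simulate proxies under-reporting hearing loss.
p = 0.10 + 0.002 * (df["age"] - 18) - 0.04 * df["proxy"]
df["hearing_loss"] = rng.random(n) < p.clip(0.01, 0.99)

# Chi-squared test: does reported prevalence differ by informant type?
table = pd.crosstab(df["proxy"], df["hearing_loss"])
chi2, pval, _, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={pval:.4f}")

# Logistic regression per informant group, reporting ORs (exponentiated
# coefficients); conf_int() would give the 95% CI bounds to exponentiate.
for label, sub in df.groupby("proxy"):
    X = sm.add_constant(sub[["age", "dizziness"]])
    fit = sm.Logit(sub["hearing_loss"].astype(int), X).fit(disp=0)
    ors = np.exp(fit.params)
    print("proxy" if label else "primary", dict(ors.round(2)))
```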
362

Modélisation et apprentissage de dépendances à l’aide de copules dans les modèles probabilistes latents / Modeling and learning dependencies with copulas in latent topic models

Amoualian, Hesam, 12 December 2017
This thesis focuses on scaling latent topic models to large document collections, especially document streams. Although the main goal of probabilistic topic modeling is to discover word topics, an equally interesting objective is to examine topic evolutions and transitions. To this end, Chapter 3 proposes three new models for capturing topic and word-topic dependencies between consecutive documents in a stream. The first model is a direct extension of Latent Dirichlet Allocation (LDA): it uses a Dirichlet distribution to balance the influence of the LDA prior parameters against the topic and word-topic distributions of the previous document. The second extension uses copulas, a generic tool for modeling dependencies between random variables. We rely on Archimedean copulas, and more precisely on the Frank copula, which is symmetric and associative and therefore appropriate for exchangeable random variables. The third model is a non-parametric extension of the second, integrating copulas into the stick-breaking construction of Hierarchical Dirichlet Processes (HDP). Our experiments, conducted on five standard collections used in several topic-modeling studies, show that our proposals outperform previous approaches such as dynamic topic models, temporal LDA and Evolving Hierarchical Processes, both in terms of perplexity and in tracking similar topics across document streams. Compared with previous proposals, our models are more flexible and can adapt to situations ranging from strong dependence between documents to complete independence. Furthermore, the exchangeability assumption underlying LDA-like topic models often leads to inconsistent topics being inferred for the words of text spans such as noun phrases, which are usually expected to be topically coherent. Chapter 4 therefore proposes copulaLDA (copLDA), which extends LDA by integrating part of the text structure into the model and relaxes the conditional independence assumption between the word-specific latent topics given the per-document topic distributions. To this end, we assume that the words of text spans such as noun phrases are topically bound, and we model this dependence with copulas. We demonstrate empirically the effectiveness of copLDA on both intrinsic and extrinsic evaluation tasks over several publicly available corpora. To complete copLDA, Chapter 5 presents an LDA-based model that generates topically coherent segments within documents by jointly segmenting documents and assigning topics to their words. Coherence between topics is ensured through a copula binding the topics associated with the words of a segment. In addition, the model relies on both document-specific and segment-specific topic distributions so as to capture fine-grained differences in topic assignments. We show that the proposed model naturally encompasses other state-of-the-art LDA-based models designed for similar tasks. Finally, our experiments on six different publicly available datasets show the effectiveness of our model in terms of perplexity; Normalized Pointwise Mutual Information, which captures the coherence of the generated topics; and the Micro F1 measure for text classification.
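
The Frank copula mentioned above has a closed-form CDF and a closed-form conditional inverse, which is what makes it convenient as a primitive for binding dependent draws. The sketch below (mine, assuming the standard bivariate form, not the thesis code) evaluates the copula and samples dependent uniform pairs via the conditional-distribution method.

```python
# Minimal sketch of the bivariate Frank copula (standard textbook form,
# not the thesis implementation): CDF evaluation and conditional sampling.
import numpy as np

def frank_cdf(u, v, theta):
    """C(u,v) = -(1/theta) * ln(1 + (e^{-theta u}-1)(e^{-theta v}-1)/(e^{-theta}-1))."""
    num = np.expm1(-theta * u) * np.expm1(-theta * v)
    return -np.log1p(num / np.expm1(-theta)) / theta

def frank_sample(n, theta, rng=None):
    """Draw (U, V) with Uniform(0,1) margins coupled by a Frank copula,
    via the conditional inverse: solve w = dC/du for v given U = u."""
    rng = rng or np.random.default_rng()
    u, w = rng.random(n), rng.random(n)
    a = np.exp(-theta * u)
    v = -np.log1p(w * np.expm1(-theta) / (a * (1.0 - w) + w)) / theta
    return u, v

u, v = frank_sample(10_000, theta=5.0)   # theta > 0: positive dependence
print("empirical corr:", np.corrcoef(u, v)[0, 1].round(3))
print("C(0.5, 0.5):", frank_cdf(0.5, 0.5, 5.0).round(3))
```

As theta approaches 0 the pairs become independent, which is the property that lets copula-based models span the range from strong inter-document dependence to full independence, as claimed above.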
364

Modèles thématiques pour la découverte non supervisée de points de vue sur le Web / Topic Models for Unsupervised Discovery of Viewpoints on the Web

Thonet, Thibaut, 23 November 2017
Online platforms such as weblogs and social networking sites allow Internet users to express their opinions on a wide range of subjects (commercial products, politics, services, etc.). This large volume of opinionated data can be explored and exploited through text-mining techniques known as opinion mining or sentiment analysis. Unlike most current work in opinion mining, which focuses on simply positive or negative opinions (or an intermediate between these two extremes), this thesis studies a more challenging type of opinion: viewpoints. Viewpoint mining generalizes opinion beyond its usual polarity-based sense (positive or negative) and enables the analysis of more subtly expressed opinions, such as political opinions. We propose unsupervised approaches, requiring no prior annotation, based on probabilistic topic models, to jointly discover the topics and viewpoints expressed in corpora of opinionated texts. In our first contribution, we explore the idea of separating opinion words (specific to both a viewpoint and a topic) from topical words (topic-dependent but neutral with respect to the different viewpoints) on the basis of parts of speech, inspired by similar practices in the classical opinion-mining literature restricted to positive and negative opinions. Our second contribution focuses on viewpoints expressed on social networks; the goal is to analyse to what extent users' interactions, in addition to the text content they generate, are beneficial to identifying their viewpoints. Our contributions were evaluated and benchmarked against the state of the art on real-world document collections.
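
As a rough illustration of the part-of-speech separation behind the first contribution (my sketch, not the thesis model), the snippet below uses NLTK tagging to split opinion-word candidates from topical-word candidates; note that NLTK resource names vary across releases.

```python
# Rough sketch of a part-of-speech split: adjectives, adverbs and verbs
# as opinion-word candidates, nouns as topical-word candidates.
import nltk

# Resource names differ in newer NLTK releases (e.g. "punkt_tab",
# "averaged_perceptron_tagger_eng").
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def split_by_pos(text):
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    opinion = [w for w, t in tagged if t.startswith(("JJ", "RB", "VB"))]
    topical = [w for w, t in tagged if t.startswith("NN")]
    return opinion, topical

opinion, topical = split_by_pos(
    "The new healthcare bill unfairly burdens small rural hospitals.")
print("opinion-word candidates:", opinion)  # e.g. new, unfairly, burdens, small, rural
print("topical-word candidates:", topical)  # e.g. healthcare, bill, hospitals
```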
365

A probabilistic and incremental model for online classification of documents : DV-INBC

Rodrigues, Thiago Fredes, January 2016
The fields of Data Mining and Machine Learning have recently seen a rapid increase in the creation and availability of data repositories, driven mainly by the rapid creation of such data in social networks. A large part of this data consists of text documents, and the information stored in them can range from user-profile descriptions to common topics such as politics, sports and science, which is useful for many applications. Moreover, since much of this data is created in streams, scalable and online algorithms are desirable, as tasks such as the organization and exploration of large document collections would benefit from them. In this thesis, an incremental, online and probabilistic model for document classification, called DV-INBC, is presented as an effort to tackle this problem. DV-INBC is an extension of the INBC algorithm. Its two main characteristics are that only a single pass over the training data is needed to build a model of it, and that the vocabulary of the data need not be known a priori; hence little prior knowledge about the data stream is required. To assess its performance, tests using well-known datasets are presented.
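
To make the two stated properties concrete (a single pass over the training data, and a vocabulary discovered on the fly), here is a minimal incremental multinomial naive Bayes sketch. It illustrates the general idea only; it is not the DV-INBC algorithm.

```python
# Sketch of an incremental multinomial naive Bayes classifier: each
# document updates the counts once, and the vocabulary grows as new
# words appear. Illustrative only; not DV-INBC itself.
from collections import Counter, defaultdict
import math

class IncrementalNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing
        self.word_counts = defaultdict(Counter)  # class -> word counts
        self.class_totals = Counter()            # class -> total words
        self.class_docs = Counter()              # class -> doc count
        self.vocab = set()

    def learn(self, tokens, label):
        """Update counts from one document; no second pass needed."""
        self.class_docs[label] += 1
        for w in tokens:
            self.word_counts[label][w] += 1
            self.class_totals[label] += 1
            self.vocab.add(w)

    def predict(self, tokens):
        n_docs = sum(self.class_docs.values())
        V = len(self.vocab) or 1
        best, best_lp = None, -math.inf
        for c in self.class_docs:
            lp = math.log(self.class_docs[c] / n_docs)
            denom = self.class_totals[c] + self.alpha * V
            for w in tokens:                     # unseen words still smoothed
                lp += math.log((self.word_counts[c][w] + self.alpha) / denom)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

nb = IncrementalNB()
nb.learn("the match ended in a late goal".split(), "sports")
nb.learn("parliament passed the budget vote".split(), "politics")
print(nb.predict("a goal in the final match".split()))  # -> sports
```

Because the model is just a set of counts, it can be updated from a stream one document at a time, which is the property the abstract emphasizes for large-scale, online settings.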
366

Análise informacional em paralelos : questões relacionadas às funções de tópico e de foco / Informational analysis in parallels: issues related to the topic and focus functions

Breunig, Gustavo, January 2014
Seeking a more particular way of analysing information structure, this work presents an analysis of sentential topic and focus in parallels. The analysis of how information is distributed in a sentence, i.e. its information structure, is still a young research area, in which new terms, as well as new senses for those terms, appear frequently. The present analysis aims to dissociate focus and topic as complements of each other, showing a possible analysis in which focus and topic occur at their own levels of analysis, the parallels, which makes them more autonomous and yet, in certain analyses, closer together. Focus is the element that receives the most attention in this work, which compares both parallels but analyses more sentences and problems concerning focus, such as the existence of a secondary type of informational focus, here called auxiliary focus, whose purpose is to signal new, unrequested information. In the treatment of the focus parallel, relations between focus and implicature are also addressed. Topic, on the other hand, is an element that still requires deeper research and is presented here in a more preliminary form; its development in this work concerns more basic, but no less important, questions.
369

A text-mining based approach to capturing the NHS patient experience

Bahja, Mohammed, January 2017
An important issue for healthcare service providers is achieving high levels of patient satisfaction. Collecting patients' feedback about their experience in hospital enables providers to analyse their performance in terms of satisfaction levels and to identify the strengths and limitations of their service delivery. A common method of collecting patient feedback is via the provider's online portals and forums, where patients can rate and comment on the service received. A challenge in analysing patient experience collected via online portals is that the amount of data can be huge and hence prohibitive to analyse manually. In this thesis, an automated approach to patient experience analysis using Sentiment Analysis, Topic Modelling and Dependency Parsing methods is presented. The patient experience data analysed in the study were collected from the National Health Service (NHS) online portal in the United Kingdom. The study was carried out in three iterations: (1) in the first, Sentiment Analysis was applied to identify whether a given patient feedback item was positive or negative; (2) the second iteration applied Topic Modelling methods to automatically identify themes and topics in the patient feedback, with the sentiment outcomes of the first iteration used to identify the patient's sentiment regarding the topic discussed in a given comment; (3) in the third iteration, Dependency Parsing methods were employed for each feedback item and the topics identified, and a method was devised to summarise the reason for a particular sentiment about each topic. The outcomes demonstrate that text-mining methods can effectively identify patients' sentiment in their feedback as well as the themes and topics discussed in it. The approach presented proved capable of automatically analysing the NHS patient feedback database: it can provide an overview of positive and negative sentiment rates, identify frequently discussed topics and summarise individual feedback items. Moreover, an API visualisation tool is introduced to make the outcomes more accessible to healthcare providers.
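
A toy version of the first two iterations of such a pipeline can be assembled from off-the-shelf components. The sketch below uses VADER for sentiment and scikit-learn's LDA for topics, with invented feedback strings rather than NHS data.

```python
# Toy sketch of a feedback pipeline of the kind described above
# (sentiment scoring + topic modelling); the comments are made-up
# examples, not NHS patient data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk

nltk.download("vader_lexicon", quiet=True)

feedback = [
    "The nurses were kind and the ward was spotless.",
    "Waited four hours in A&E with no updates from staff.",
    "Discharge paperwork was confusing and slow.",
    "Excellent surgeon, clear explanation of the procedure.",
]

# Iteration 1: sentiment per feedback item.
sia = SentimentIntensityAnalyzer()
sentiments = ["positive" if sia.polarity_scores(t)["compound"] >= 0 else "negative"
              for t in feedback]

# Iteration 2: topics over the whole collection.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(feedback)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
words = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [words[i] for i in comp.argsort()[-4:][::-1]]
    print(f"topic {k}: {top}")

# Iteration 3 (dependency parsing to explain *why* a sentiment holds)
# would plug in here, e.g. with spaCy's dependency parser.
print(list(zip(sentiments, feedback)))
```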
370

Extracting metadata from textual documents and utilizing metadata for adding textual documents to an ontology

Caubet, Marc; Cifuentes, Mònica, January 2006
The term ontology is borrowed from philosophy, where an ontology is a systematic account of existence. In Computer Science, an ontology is a tool that allows the effective use of information, making it understandable and accessible to the computer. For these reasons, the study of ontologies has gained growing interest in recent years. Our motivation is to create a tool able to build ontologies from a set of textual documents. We present a prototype implementation that extracts metadata from textual documents and uses that metadata to add the documents to an ontology. In this paper we investigate which techniques are available and which ones were used to solve our problem. Finally, we present a program written in Java that allows us to build ontologies from textual documents using our approach.
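
As a small illustration of the extract-then-populate idea (the thesis prototype itself is written in Java; this Python/rdflib sketch with a naive keyword heuristic is mine, and all names in it are illustrative):

```python
# Sketch of extracting simple keyword metadata from a text and adding
# the document to an RDF graph; illustrative only, not the thesis code.
from collections import Counter
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DOC = Namespace("http://example.org/docs/")

def extract_keywords(text, k=3):
    """Crude metadata extraction: most frequent words over 4 letters."""
    words = [w.lower().strip(".,;") for w in text.split()]
    freq = Counter(w for w in words if len(w) > 4)
    return [w for w, _ in freq.most_common(k)]

g = Graph()
text = "Ontologies make information accessible; ontologies structure information."
doc = URIRef(DOC["report42"])
g.add((doc, RDF.type, DCTERMS.BibliographicResource))
g.add((doc, DCTERMS.title, Literal("Sample report")))
for kw in extract_keywords(text):
    g.add((doc, DCTERMS.subject, Literal(kw)))

print(g.serialize(format="turtle"))
```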
