Global ETD Search

1	[en] AUTOMATIC TEXT CATEGORIZATION BASED ON TEXT MINING / [pt] CATEGORIZAÇÃO AUTOMÁTICA DE TEXTOS BASEADA EM MINERAÇÃO DE TEXTOS
FABIO DE AZEVEDO SOARES, 15 July 2014
[pt] A Categorização de Documentos, uma das tarefas desempenhadas em Mineração de Textos, pode ser descrita como a obtenção de uma função que seja capaz de atribuir a um documento uma categoria a que ele pertença. O principal objetivo de se construir uma taxonomia de documentos é tornar mais fácil a obtenção de informação relevante. Porém, a implementação e a execução de um processo de Categorização de Documentos não é uma tarefa trivial: as ferramentas de Mineração de Textos estão em processo de amadurecimento e ainda demandam elevado conhecimento técnico para a sua utilização. Além disso, exercendo grande importância em um processo de Mineração de Textos, a linguagem em que os documentos se encontram escritos deve ser tratada com as particularidades do idioma. Contudo, há grande carência de ferramentas que forneçam tratamento adequado ao Português do Brasil. Dessa forma, os objetivos principais deste trabalho são pesquisar, propor, implementar e avaliar um framework de Mineração de Textos para a Categorização Automática de Documentos, capaz de auxiliar a execução do processo de descoberta de conhecimento e que ofereça processamento linguístico para o Português do Brasil.
[en] Text Categorization, one of the tasks performed in Text Mining, can be described as obtaining a function that assigns a document to the previously defined category to which it belongs. The main goal of building a taxonomy of documents is to make relevant information easier to obtain.
However, implementing and running a Text Categorization process is not a trivial task: Text Mining tools are still maturing and require considerable technical expertise to use. In addition, because language plays a major role in a Text Mining process, the language in which the documents are written must be handled according to its own particularities. Yet there is a great shortage of tools that handle Brazilian Portuguese properly. Thus, the main aims of this work are to research, propose, implement and evaluate a Text Mining framework for Automatic Text Categorization that assists the knowledge discovery process and provides language processing for Brazilian Portuguese.
[pt] FRAMEWORK [en] FRAMEWORK [pt] MINERACAO DE TEXTOS [en] TEXTS MINING [pt] PORTUGUES BRASILEIRO [en] BRAZILIAN PORTUGUESE [pt] AUTOMATICA [en] AUTOMATIC [pt] CATEGORIZACAO [en] CATEGORIZATION
2	[en] THE SEMANTIC CLASSIFICATION OF TECHNICAL COMPOUND NOUNS AND THEIR TRANSLATION TO PORTUGUESE / [pt] A CATEGORIZAÇÃO SEMÂNTICA DOS COMPOSTOS NOMINAIS TÉCNICOS EM LÍNGUA INGLESA E OS RESULTADOS TRADUTÓRIOS EM PORTUGUÊS
PAULA SANTOS DINIZ, 23 May 2017
[pt] Este trabalho propõe uma classificação semântica dos compostos nominais técnicos em língua inglesa e a análise sintática e semântica das traduções para o português. Para tanto, faz-se um panorama da literatura sobre as relações semânticas dos compostos nominais em língua inglesa. A tipologia aqui proposta é, portanto, baseada em estudos clássicos sobre a semântica dos compostos nominais (Levi, 1978; Warren, 1978) e em pesquisas mais recentes, inseridas no escopo da Linguística Computacional e/ou influenciadas pela Teoria do Léxico Gerativo, de Pustejovsky (1995), e adaptada para a natureza dos compostos nominais selecionados. A presente dissertação também analisa as traduções dos compostos nominais técnicos para o português, bem como a função das preposições nas estruturas com sintagmas preposicionados. O corpus foi retirado de um livro técnico da área de engenharia elétrica/eletrônica traduzido pela autora. Além da classificação semântica dos compostos nominais técnicos, propõe-se a criação de ontologias que contemplem os compostos com os mesmos núcleos ou modificadores, de modo a observar se núcleos ou modificadores iguais implicam a mesma categorização, e se é respeitada a relação de hiperonímia e hiponímia entre os compostos nominais inseridos na mesma ontologia.
[en] The major purpose of this thesis is to propose a semantic categorization of English technical noun compounds, as well as to analyze the semantics and syntax of the Portuguese renderings. First, the literature on semantic relations in English compound nouns is reviewed.
The classification proposed here is therefore based on classic studies on the semantics of compound nouns (Levi, 1978; Warren, 1978) and on more recent research that falls within the scope of Computational Linguistics and/or is influenced by Generative Lexicon Theory (Pustejovsky, 1995). The semantic categorization is also adapted to the data collected in this work. This thesis also analyzes the Portuguese translations of the English compound nouns, as well as the role of the prepositions in structures with prepositional phrases. The data was taken from an electrical/electronics engineering book translated by the author. In addition to the semantic classification, the technical compound nouns sharing a common head or modifier are assembled into ontologies, in order to investigate whether identical heads or modifiers imply the same categorization and whether a hypernym-hyponym relation holds among the compounds assembled in the same ontology.
[pt] TRADUCAO [en] TRANSLATION [pt] SEMANTICA [en] SEMANTICS [pt] CATEGORIZACAO [en] CATEGORIZATION [pt] CORPUS [pt] COMPOSTOS NOMINAIS [pt] TERMOS TECNICOS [pt] RELACOES SEMANTICAS
3	[en] A STUDY OF MULTILABEL TEXT CLASSIFICATION ALGORITHMS USING NAIVE-BAYES / [pt] UM ESTUDO DE ALGORITMOS PARA CLASSIFICAÇÃO AUTOMÁTICA DE TEXTOS UTILIZANDO NAIVE-BAYES
DAVID STEINBRUCH, 12 March 2007
[pt] A quantidade de informação eletrônica vem crescendo de forma acelerada, motivada principalmente pela facilidade de publicação e divulgação que a Internet proporciona. Desta forma, é necessária a organização da informação de forma a facilitar a sua aquisição. Muitos trabalhos propuseram resolver este problema através da classificação automática de textos associando a eles vários rótulos (classificação multirótulo). No entanto, estes trabalhos transformam este problema em subproblemas de classificação binária, considerando que existe independência entre as categorias. Além disso, utilizam limiares (thresholds), que são muito específicos para o conjunto de treinamento utilizado, não possuindo grande capacidade de generalização na aprendizagem. Esta dissertação propõe dois algoritmos de classificação automática de textos baseados no algoritmo multinomial naive Bayes e sua utilização em um ambiente on-line de classificação automática de textos com realimentação de relevância pelo usuário. Para testar a eficiência dos algoritmos propostos, foram realizados experimentos na base de notícias Reuters 21578 e na base de documentos médicos Ohsumed.
[en] The amount of electronic information has been growing fast, mainly due to the ease of publication and dissemination that the Internet provides. Information therefore needs to be organized in a way that facilitates its retrieval. Many works have proposed solving this problem through automatic text classification, assigning several labels to each text (multilabel classification). However, those works transform the problem into binary classification subproblems, assuming that the categories are independent of one another.
Moreover, they use thresholds, which are very specific to the training set used and therefore generalize poorly in the learning process. This thesis proposes two text classification algorithms based on the multinomial naive Bayes algorithm and their use in an on-line text classification environment with user relevance feedback. To test the efficiency of the proposed algorithms, experiments were performed on the Reuters 21578 news collection and on the Ohsumed medical document collection.
[pt] APRENDIZADO DE MAQUINA [en] MACHINE LEARNING [pt] INTERNET [en] INTERNET [pt] CATEGORIZACAO DE TEXTOS [en] TEXT CATEGORIZATION [pt] CLASSIFICACAO DE TEXTOS [en] TEXT CLASSIFICATION [pt] MULTIROTULO [en] MULTILABEL [pt] NAIVE-BAYES [en] NAIVE-BAYES
4	[en] CATEGORIES HYBRIDIZATION HYPOTHESIS: A REANALYSIS OF ENGLISH GERUNDS AS CP NOMINALIZATIONS / [pt] HIPÓTESE DA HIBRIDAÇÃO DE CATEGORIAS: UMA REANÁLISE DAS GERUNDIVAS DO INGLÊS COMO NOMINALIZAÇÕES NO NÍVEL DE CP
AYRTHON MOREIRA BREDER, 22 November 2022
[pt] Propostas mais antigas para nominalizações sentenciais se concentram nas gerundivas do inglês, analisando-as como nominalizações a partir de sentenças (Lees, 1960; Chomsky, 1970), enquanto investigações mais recentes variam em analisá-las como estruturas sentenciais podadas (Frank e Kroch, 1994) ou nominalizações a partir de diferentes domínios (Abney, 1987; Alexiadou, 2001; Panagiotidis e Grohmann, 2005; Kornfilt e Whitman, 2011). Em comum, essas propostas observam a subordinação (sentencial) como gatilho para o processo de inserção de um elemento nominalizador na estrutura sintática. Enfrentam, contudo, dificuldades no tratamento da complexidade interna das estruturas resultantes, apresentando problemas de violação de c-seleção e de definição ad hoc da defectividade dos traços das categorias envolvidas. O presente trabalho lança a hipótese de que as nominalizações sentenciais emergem da hibridação das categorias funcionais C e D, contestando a ideia clássica de que tais categorias constituem (tão somente) conjuntos estanques de traços. A hipótese abre caminho para uma análise das gerundivas do inglês como estruturas dominadas por uma categoria funcional híbrida, nesse caso C(interseção)D (complementizador interseção com determinante). A análise proposta pode se aplicar translinguisticamente, podendo ainda abarcar diferentes tipos de nominalização dentro de uma mesma língua.
[en] Early studies on sentential nominalization focus on English gerunds, analysing them as nominalizations derived from sentences (Lees, 1960; Chomsky, 1970), while recent investigations vary in analysing them as pruned clauses (Frank and Kroch, 1994) or nominalizations of different syntactic domains (Abney, 1987; Alexiadou, 2001; Panagiotidis and Grohmann, 2005; Kornfilt and Whitman, 2011). What these proposals have in common is that they treat (clausal) subordination as a trigger for the insertion of a nominalizer element into the syntactic structure. They face difficulties, however, in dealing with the internal complexity of the resulting structures: they run into violations of c-selection and rely on ad hoc definitions of the defectiveness of the features of the categories involved. The present study raises the hypothesis that sentential nominalizations result from hybridization of the functional categories C and D, thus challenging the classical idea that these categories are nothing more than rigid, self-contained sets of formal features. This hypothesis paves the way for an analysis of English gerunds as structures headed by a hybrid functional category, in this case C(intersection)D (complementizer intersection determiner). The analysis may apply crosslinguistically and may also cover different types of nominalization within a single language.
[pt] CATEGORIZACAO [pt] GERUNDIVAS DO INGLES [pt] NOMINALIZACAO SENTENCIAL [pt] ESTRUTURAS MISTAS [en] CATEGORIZATION [en] ENGLISH GERUNDS [en] CATEGORIES HYBRIDIZATION HYPOTHESIS [en] SENTENTIAL NOMINALIZATION [en] COMPOSITE STRUCTURES
5	[pt] AS METODOLOGIAS DECISÓRIAS DA LIBERDADE DE DISCURSO: UM ESTUDO SOBRE A RELAÇÃO ENTRE FORMA E SUBSTÂNCIA NA JURISDIÇÃO CONSTITUCIONAL DA PRIMEIRA EMENDA / [en] THE DECISION-MAKING METHODOLOGIES OF THE FREEDOM OF SPEECH: A STUDY ABOUT THE RELATIONSHIP BETWEEN FORM AND SUBSTANCE IN THE FIRST AMENDMENT'S DOCTRINES
JOHANN MEERBAUM, 08 September 2023
[pt] Este é um trabalho sobre a natureza das razões às quais a Suprema Corte dos Estados Unidos recorre para resolver casos envolvendo a liberdade de discurso. Considero que sejam dois os tipos de razões que orientam o processo decisório da Primeira Emenda: as formais e as substantivas. As razões substantivas são aquelas que o direito compartilha com outros domínios da ação social humana, como a moral, a economia e a política. As formais, por sua vez, são razões jurídicas autoritativas - no sentido de derivarem de uma norma jurídica válida (Constituição, leis, regulamentos, precedentes, contratos, e outros documentos normativos afins) - e compulsórias (ou excludentes), pois geralmente excluem do horizonte do raciocínio decisório razões substantivas concorrentes. O meu objetivo nesta dissertação é descrever a maneira pela qual o raciocínio jurídico formal e o raciocínio jurídico substantivo foram em certa medida conciliados no âmago da prática decisória da Suprema Corte norte-americana. Para tanto, esforço-me em apresentar, comentar e comparar entre si alguns dos mais emblemáticos julgamentos levados a cabo pela Corte ao longo de mais de um século de jurisdição constitucional da Primeira Emenda. Procuro mostrar também que os métodos adjudicatórios por ela desenvolvidos podem ser classificados de acordo com a importância que cada um deles atribui às razões formais (ou, por outro lado, às razões substanciais) da liberdade de discurso.
Por exemplo: o conflito entre “balanceamento” e as metodologias pertencentes a “tradição definicional” (e.g., absolutismo, categorização) nada mais representa senão uma instância particular do conflito mais geral entre forma e substância no pensamento jurídico norte-americano. Mas se até meados da década de 1960 a discussão sobre métodos decisórios da liberdade de discurso era completamente dominada pela oposição entre balanceamento e absolutismo, aos poucos a Suprema Corte dos Estados Unidos, em companhia com grandes nomes do pensamento jurídico daquele país, foi abrindo seus olhos para a existência de pontos médios entre aqueles dois extremos. O resultado disto foi a criação de novas teorias normativas da decisão (e.g., o balanceamento definicional), bem como de uma série de testes, fórmulas, parâmetros e presunções, tornando assim possível que elementos formais e substantivos do raciocínio jurídico da Primeira Emenda passassem a conviver no domínio das mesmas metodologias decisórias. Para além do meu esforço em reconstruir racionalmente as transformações pelas quais passaram as abordagens metodológicas da Suprema Corte ao longo das últimas décadas, me proponho também a dotá-las de algum sentido. Interpreto que a preocupação que a Corte historicamente tem demonstrado com a estabilização de seus procedimentos decisórios, bem como com a previsibilidade de seus julgamentos, guarda íntima relação com a crença de que as justificativas subjacentes à Primeira Emenda (e.g., maior controle do governo pelo povo; busca pela verdade e autoexpressão artística e intelectual) são mais eficazmente promovidas mediante a adoção de uma abordagem decisória que priorize o alcance de melhores resultados em um nível global em detrimento daquilo que muitas vezes parece ser o melhor resultado para o caso mais imediato. / [en] This is a paper about the nature of the reasons that the United States Supreme Court uses to resolve cases involving freedom of speech. 
I believe that two types of reasons guide the First Amendment decision-making process: formal and substantive. Substantive reasons are those that law shares with other domains of human social action, such as morality, economics and politics. Formal reasons, in turn, are legal reasons that are authoritative, in the sense that they derive from a valid legal norm (Constitution, laws, regulations, precedents, contracts, and other related normative documents), and compulsory (or exclusionary), because they generally exclude competing substantive reasons from the horizon of decisional reasoning. My aim in this dissertation is to describe how formal legal reasoning and substantive legal reasoning have to some extent been reconciled at the heart of the decision-making practice of the US Supreme Court. To this end, I present, comment on and compare some of the most emblematic cases decided by the Court over more than a century of First Amendment constitutional jurisdiction. I also try to show that the adjudicatory methods it has developed can be classified according to the importance each of them attaches to the formal reasons (or, on the other hand, the substantive reasons) for freedom of speech. For example, the conflict between "balancing" and the methodologies belonging to the "definitional tradition" (e.g., absolutism, categorization) is nothing more than a particular instance of the more general conflict between form and substance in American legal thought. But while until the mid-1960s the discussion about decision-making methods for freedom of speech was completely dominated by the opposition between balancing and absolutism, the United States Supreme Court, together with leading figures in American legal thought, gradually opened its eyes to the existence of intermediate positions between those two extremes.
The result was the creation of new normative theories of decision (e.g., definitional balancing), as well as a series of tests, formulas, parameters and presumptions, which made it possible for formal and substantive elements of First Amendment legal reasoning to coexist within the same decision-making methodologies. Beyond my effort to rationally reconstruct the transformations that the Supreme Court's methodological approaches have undergone over the last few decades, I also seek to make sense of them. I argue that the Court's historical concern with the stabilization of its decision-making procedures, as well as with the predictability of its judgments, is closely related to the belief that the justifications underlying the First Amendment (e.g., greater control of government by the people; the search for truth; and artistic and intellectual self-expression) are most effectively promoted by adopting a decision-making approach that prioritizes the achievement of better outcomes on a global level over what often appears to be the best outcome for the most immediate case.
[pt] CATEGORIZACAO [pt] BALANCEAMENTO [pt] ABSOLUTISMO [pt] RACIOCINIO JURIDICO SUBSTANTIVO [pt] RACIOCINIO JURIDICO FORMAL [pt] PRIMEIRA EMENDA [en] CATEGORIZATION [en] BALANCING [en] ABSOLUTISM [en] SUBSTANTIVE LEGAL REASONING [en] FORMAL LEGAL REASONING [en] FIRST AMENDMENT
6	[en] TEXT CATEGORIZATION: CASE STUDY: PATENT'S APPLICATION DOCUMENTS IN PORTUGUESE / [pt] CATEGORIZAÇÃO DE TEXTOS: ESTUDO DE CASO: DOCUMENTOS DE PEDIDOS DE PATENTE NO IDIOMA PORTUGUÊS
NEIDE DE OLIVEIRA GOMES, 08 January 2015
[pt] Atualmente os categorizadores de textos construídos por técnicas de aprendizagem de máquina têm alcançado bons resultados, tornando viável a categorização automática de textos. A proposição desse estudo foi a definição de vários modelos direcionados à categorização de pedidos de patente, no idioma português. Para esse ambiente foi proposto um comitê composto de 6 (seis) modelos, onde foram usadas várias técnicas. A base de dados foi constituída de 1157 (hum mil cento e cinquenta e sete) resumos de pedidos de patente, depositados no INPI, por depositantes nacionais, distribuídos em várias categorias. Dentre os vários modelos propostos para a etapa de processamento da categorização de textos, destacamos o desenvolvido para o Método 01, ou seja, o k-Nearest-Neighbor (k-NN), modelo também usado no ambiente de patentes, para o idioma inglês. Para os outros modelos, foram selecionados métodos que não os tradicionais para ambiente de patentes. Para quatro modelos, optou-se por algoritmos, onde as categorias são representadas por vetores centróides. Para um dos modelos, foi explorada a técnica do High Order Bit junto com o algoritmo k-NN, sendo o k todos os documentos de treinamento. Para a etapa de pré-processamento foram implementadas duas técnicas: os algoritmos de stemização de Porter; e o StemmerPortuguese; ambos com modificações do original. Foram também utilizados na etapa do pré-processamento: a retirada de stopwords; e o tratamento dos termos compostos. Para a etapa de indexação foi utilizada principalmente a técnica de pesagem dos termos intitulada: frequência de termos modificada versus frequência de documentos inversa (TF-IDF).
Para as medidas de similaridade ou medidas de distância destacamos: cosseno; Jaccard; DICE; Medida de Similaridade; HOB. Para a obtenção dos resultados foram usadas as técnicas de predição da relevância e do rank. Dos métodos implementados nesse trabalho, destacamos o k-NN tradicional, o qual apresentou bons resultados embora demande muito tempo computacional.
[en] Text categorizers built with machine learning techniques now achieve good results, making automatic text categorization viable. The purpose of this study was to define several models for categorizing patent applications written in Portuguese. For this setting, a committee of 6 (six) models using various techniques was proposed. The text base consisted of 1157 (one thousand one hundred fifty-seven) abstracts of patent applications filed with INPI by national applicants and distributed across several categories. Among the models proposed for the processing step of text categorization, we highlight the one developed for Method 01, the k-Nearest-Neighbor (k-NN), a model also used for patent categorization in English. For the other models, methods other than those traditional in the patent setting were selected. Four models use algorithms in which the categories are represented by centroid vectors. One model explores the High Order Bit technique together with the k-NN algorithm, with k set to all the training documents. For the pre-processing step, two techniques were implemented: Porter's stemming algorithm and the StemmerPortuguese algorithm, both modified from the original. The pre-processing step also included stopword removal and the treatment of compound terms.
For the indexing step, the main term-weighting technique used was modified term frequency versus inverse document frequency (TF-IDF). The similarity or distance measures used were: cosine; Jaccard; DICE; Similarity Measure; HOB. The results were obtained using relevance and rank prediction techniques. Among the methods implemented in this work, we highlight the traditional k-NN, which achieved good results although it demands considerable computational time.
[pt] CATEGORIZACAO DE TEXTOS [en] TEXT CATEGORIZATION [pt] CLASSIFICACAO DE TEXTOS [en] TEXT CLASSIFICATION [pt] STEMIZACAO [en] STEMMING [en] CENTROID OR PROTOTYPE ALGORITHM
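As a rough illustration of three of the similarity measures named in record 6 (cosine, Jaccard and DICE), the sketch below compares two documents represented as sparse term-weight dictionaries. It is not the thesis's implementation: the function names, the toy documents and the choice to compute Jaccard and DICE over plain term sets (ignoring weights) are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def jaccard(a, b):
    """Jaccard coefficient over term sets (weights ignored; an assumption)."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient over term sets (weights ignored; an assumption)."""
    terms_a, terms_b = set(a), set(b)
    total = len(terms_a) + len(terms_b)
    return 2 * len(terms_a & terms_b) / total if total else 0.0


# Hypothetical, already stemmed toy documents with unit weights.
doc_a = {"motor": 1.0, "eletric": 1.0}
doc_b = {"motor": 1.0, "combust": 1.0}

for name, measure in (("cosine", cosine), ("jaccard", jaccard), ("dice", dice)):
    print(name, round(measure(doc_a, doc_b), 3))
```

In a k-NN categorizer, any of these functions could rank the training documents by similarity to a new abstract before the k closest ones vote on its category; TF-IDF weights would replace the unit weights used here.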

