11 |
Automatic Patent Classification
Yehe, Nala January 2020
Patents have great research value and benefit the industrial, commercial, legal, and policymaking communities. Effective analysis of patent literature can reveal important technical details and relationships; it can also explain business trends, suggest novel industrial solutions, and inform crucial investment decisions. Patent documents therefore deserve careful analysis so that their value can be put to use. Patent analysts generally need expertise in several fields, including information retrieval, data processing, text mining, field-specific technology, and business intelligence. In practice, it is difficult to find and train an analyst who meets these multidisciplinary requirements in a relatively short time. Patent classification is also crucial in processing patent applications because it lets people manage and maintain patent texts better and more flexibly. In recent years, the number of patents worldwide has increased dramatically, which makes the design of an automatic patent classification system very important. Such a system can replace time-consuming manual classification, giving patent analysis managers an effective way to manage patent texts. This thesis designs a patent classification system based on data mining methods and machine learning techniques and uses the KNIME software to conduct a comparative analysis across different machine learning methods and different parts of a patent. The purpose of the thesis is to use text data processing methods and machine learning techniques to classify patents automatically. The work consists of two main parts: data preprocessing and the application of machine learning techniques. The research questions include: Which part of a patent performs best as input data for automatic classification?
And which of the implemented machine learning algorithms performs best in classifying IPC keywords? This thesis uses design science research as its method. It uses the KNIME platform to apply the machine learning techniques: decision tree, XGBoost linear, XGBoost tree, SVM, and random forest. The implementation comprises data collection, data preprocessing, feature word extraction, and the application of classification techniques. A patent document consists of many parts, such as the description, abstract, and claims. In this thesis, we feed these three parts separately as input data to our models and then compare their performance. Based on the results of these three experiments, we suggest using the description in the classification system because it shows the best performance in English patent text classification. The abstract can serve as an auxiliary basis for classification. However, classification based on the claims, as proposed by some scholars, did not perform well in our research. In addition, the BoW and TFIDF methods can be used together to extract feature words efficiently. We also found that the SVM and XGBoost techniques perform better than the other techniques in the automatic patent classification system.
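The feature word extraction step named in this abstract can be illustrated with a minimal sketch. It is not the thesis's KNIME workflow: it is a plain-Python illustration that assumes a very simple tokenizer and one common TF-IDF variant (term frequency times the natural log of N over document frequency). The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and split on anything that is not a letter."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()

def bag_of_words(docs):
    """Return one term-count mapping (bag of words) per document."""
    return [Counter(tokenize(d)) for d in docs]

def tf_idf(docs):
    """Weight each term by tf * ln(N / df); an assumed, common variant."""
    bows = bag_of_words(docs)
    n_docs = len(bows)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for bow in bows for term in bow)
    weights = []
    for bow in bows:
        total = sum(bow.values())
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in bow.items()}
        )
    return weights

def top_terms(doc_weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(doc_weights, key=lambda t: (-doc_weights[t], t))[:k]

if __name__ == "__main__":
    # Toy stand-ins for patent text; invented for illustration only.
    docs = [
        "A rotor blade for a wind turbine",
        "A battery cell for a vehicle",
        "A wind turbine tower",
    ]
    for weights in tf_idf(docs):
        print(top_terms(weights))
```

Terms that occur in every document receive a weight of zero under this variant, so the remaining high-weight terms are the candidate feature words that a classifier such as an SVM would take as input.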
|
12 |
[pt] DIFERENCIAÇÕES DE GÊNERO NA CARACTERIZAÇÃO DE PERSONAGENS: UMA PROPOSTA METODOLÓGICA E PRIMEIROS RESULTADOS / [en] GENDER REPRESENTATIONS ON CHARACTERS DESCRIPTION: A METHODOLOGICAL PROPOSAL AND EARLY RESULTS
FLAVIA MARTINS DA ROSA P DA SILVA 10 August 2021
This work presents a methodology that combines quantitative, distant-reading
data with detailed, close reading in discourse analysis, opening up new views
of the data and diverse perspectives of analysis. The methodology draws on
resources commonly used in corpus-based linguistics, such as frequency lists,
preferences, categorization, and the reading of concordance lines. Its
application is demonstrated on works of Brazilian literature in the public
domain, compiled in a corpus of approximately 5 million words with semantic
and morphosyntactic annotation, using computational tools that enable searches
based on lexical-syntactic patterns of the Portuguese language. The purpose is
to identify how male and female characters are portrayed in those texts,
enabling a general view of how women and men are constructed through language.
The study proceeds on two fronts: observing the predicates used to describe
characters and the actions these characters perform, distinguishing male from
female characters, comparing the results, and analyzing the differences
critically.
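Two of the corpus resources named in this abstract, frequency lists and concordance lines, can be illustrated with a minimal sketch. It is not the tooling used in the work: it assumes a plain list of tokens with no semantic or morphosyntactic annotation and no lexical-syntactic pattern search, and the function names and the sample sentence are hypothetical.

```python
from collections import Counter

def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first, ignoring case."""
    return Counter(t.lower() for t in tokens).most_common()

def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines as (left, keyword, right) tuples.

    `width` is the number of tokens of context kept on each side.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

if __name__ == "__main__":
    # Invented sample text; a real study would read the annotated corpus.
    tokens = "the woman smiled and the man left while the woman waited".split()
    print(frequency_list(tokens)[:3])
    for left, kw, right in concordance(tokens, "woman", width=2):
        print(f"{left:>12} [{kw}] {right}")
```

Reading the context on either side of a keyword such as "woman" or "man" is the close-reading step; the frequency list supplies the distant, quantitative view.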
|