21

ESTIMATING THE RESPIRATORY LUNG MOTION MODEL USING TENSOR DECOMPOSITION ON DISPLACEMENT VECTOR FIELD

Kang, Kingston 01 January 2018 (has links)
Modern big data often emerge as tensors. Standard statistical methods are inadequate for datasets of large volume, high dimensionality, and complex structure, so it is important to develop algorithms such as low-rank tensor decomposition for data compression, dimensionality reduction, and approximation. With the advancement of technology, high-dimensional images are becoming ubiquitous in the medical field. In lung radiation therapy, the respiratory motion of the lung introduces variability during treatment as the tumor inside the lung moves, which makes precise delivery of radiation to the tumor challenging. Several approaches quantify this uncertainty by modeling the motion as a mathematical function over time. [Li et al., 2011] uses principal component analysis (PCA) to propose one such model, treating each image as a long vector. However, the images are multidimensional arrays, and vectorization breaks their spatial structure. Driven by the need to develop low-rank tensor decompositions, and given 4DCT images and their Displacement Vector Fields (DVFs), we introduce two tensor decompositions, Population Value Decomposition (PVD) and Population Tucker Decomposition (PTD), to estimate the respiratory lung motion with high accuracy and high data compression. The first algorithm generalizes PVD [Crainiceanu et al., 2011] to higher-order tensors; the second generalizes the concept of PVD using the Tucker decomposition. Both algorithms are tested on clinical and phantom DVFs. New metrics for measuring model performance are developed in our research, and the results of the two new algorithms are compared to those of the PCA algorithm.
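As a rough illustration of the kind of compression such decompositions enable, the sketch below applies an off-the-shelf Tucker decomposition to a toy population of DVFs. The tensorly library, the array shapes, and the chosen ranks are assumptions made for the example; this is not the thesis' PVD/PTD implementation.

```python
# Minimal sketch, NOT the thesis' PVD/PTD code: Tucker-compress a toy
# population of displacement fields and report the approximation error.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Toy population: 10 breathing phases of one scalar DVF component on a
# 32x32x16 grid (shapes are illustrative assumptions).
dvf = tl.tensor(np.random.rand(10, 32, 32, 16))

# Low multilinear rank: one factor matrix per mode plus a small core,
# so the spatial structure is never flattened into a long vector.
core, factors = tucker(dvf, rank=[5, 8, 8, 4])

approx = tl.tucker_to_tensor((core, factors))
rel_err = tl.norm(dvf - approx) / tl.norm(dvf)
compressed = core.size + sum(f.size for f in factors)
print(f"relative error {rel_err:.3f}; {compressed} vs {dvf.size} entries")
```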
22

Autoregressive Tensor Decomposition for NYC Taxi Data Analysis

Zongwei Li (9192548) 31 July 2020 (has links)
Cities have adopted evolving urban digitization strategies, most of which increasingly focus on data, especially in the field of public transportation. Transportation data are inherently spatial and temporal, since they describe when and where trips occur. Because a trip is described by many attributes, transportation data can be represented as a tensor, a container that can house data in $N$ dimensions. Unlike a traditional data frame, which has only column variables, a tensor is a more natural structure for exploring spatio-temporal datasets, making those attributes easier to interpret. However, extracting useful and reliable information from attributes that are highly correlated with each other requires specialized techniques. This work presents a mixed model, combining tensor decomposition with seasonal vector autoregression in time, to find latent patterns within historical NYC taxi data classified by taxi type and by pick-up and drop-off times, so that it can help predict where and when taxis will be in demand. We validated the proposed approach experimentally on real NYC taxi data. The proposed method shows the best prediction among alternative models without geographical inference, and captures the daily patterns of taxi demand for business and entertainment needs.
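To make the two-stage idea concrete, here is a hedged sketch on synthetic data: CP factors extract latent (time, pickup, dropoff) patterns, and an autoregression on the temporal factor produces forecasts. tensorly, statsmodels, and the toy shapes are assumptions, and a plain VAR stands in for the seasonal variant used in the thesis.

```python
# Illustrative sketch (not the thesis pipeline): CP-factorize a
# (day x pickup-zone x dropoff-zone) demand tensor, then autoregress
# the temporal factors to forecast zone-level demand.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac
from statsmodels.tsa.api import VAR

days, zones, rank = 120, 20, 5
demand = tl.tensor(np.random.poisson(10.0, (days, zones, zones)).astype(float))

# CP decomposition: factors[0] holds one time series per latent component.
weights, factors = parafac(demand, rank=rank)
time_factor, pickup_factor, dropoff_factor = factors

# Autoregress the latent temporal dynamics and forecast 7 days ahead.
var_model = VAR(time_factor).fit(maxlags=7)
future = var_model.forecast(time_factor[-var_model.k_ar:], steps=7)

# Project the forecast factors back to zone-level demand.
forecast = np.einsum('r,tr,pr,dr->tpd',
                     weights, future, pickup_factor, dropoff_factor)
print(forecast.shape)  # (7, 20, 20)
```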
23

An Investigation of Low-Rank Decomposition for Increasing Inference Speed in Deep Neural Networks With Limited Training Data

Wikén, Victor January 2018 (has links)
In this study, to increase the inference speed of convolutional neural networks, the optimization technique of low-rank tensor decomposition was implemented and applied to AlexNet, which had been trained to classify dog breeds. Due to the small training set, transfer learning was used for the classification task. The purpose of the study is to investigate how effective low-rank tensor decomposition is when the training set is limited. The results, compared to a previous study, indicate a strong relationship between the effects of the tensor decomposition and how much training data is available. A significant speed-up can be obtained in the different convolutional layers using tensor decomposition. However, since the network must be retrained after the decomposition, and because the dataset is limited, there is a slight decrease in accuracy.
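A common way to realize this kind of speed-up, sketched below under stated assumptions (PyTorch, a plain HOSVD initialization, made-up ranks), is to replace one convolution with a Tucker-2 factorized sequence of 1x1, KxK, and 1x1 convolutions; as the abstract notes, the factorized network is then retrained to recover accuracy.

```python
# Minimal sketch, not the thesis code: Tucker-2 factorization of a conv layer.
import torch
import torch.nn as nn

def tucker2_conv(conv: nn.Conv2d, rank_in: int, rank_out: int) -> nn.Sequential:
    W = conv.weight.data  # (C_out, C_in, K, K)
    c_out, c_in, k, _ = W.shape

    # Leading singular vectors of the mode-0 and mode-1 unfoldings (HOSVD).
    U_out, _, _ = torch.linalg.svd(W.reshape(c_out, -1), full_matrices=False)
    U_in, _, _ = torch.linalg.svd(W.permute(1, 0, 2, 3).reshape(c_in, -1),
                                  full_matrices=False)
    U_out, U_in = U_out[:, :rank_out], U_in[:, :rank_in]

    # Core tensor: project the kernel onto both factor subspaces.
    core = torch.einsum('oikl,or,is->rskl', W, U_out, U_in)

    first = nn.Conv2d(c_in, rank_in, 1, bias=False)    # channel compression
    mid = nn.Conv2d(rank_in, rank_out, k, padding=conv.padding, bias=False)
    last = nn.Conv2d(rank_out, c_out, 1, bias=conv.bias is not None)

    first.weight.data = U_in.t().reshape(rank_in, c_in, 1, 1)
    mid.weight.data = core
    last.weight.data = U_out.reshape(c_out, rank_out, 1, 1)
    if conv.bias is not None:
        last.bias.data = conv.bias.data
    return nn.Sequential(first, mid, last)

layer = nn.Conv2d(256, 384, 3, padding=1)          # AlexNet-sized example
fast = tucker2_conv(layer, rank_in=64, rank_out=96)
print(fast(torch.randn(1, 256, 13, 13)).shape)     # (1, 384, 13, 13)
```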
24

On the VC-dimension of Tensor Networks

Khavari, Behnoush 01 1900 (has links)
Tensor network (TN) methods have been a key ingredient of advances in condensed matter physics and have recently sparked interest in the machine learning community for their ability to compactly represent very high-dimensional objects. TN methods can, for example, be used to efficiently learn linear models in exponentially large feature spaces [1]. In this manuscript, we derive upper and lower bounds on the VC-dimension and pseudo-dimension of a large class of TN models for classification, regression, and completion. Our upper bounds hold for linear models parameterized by arbitrary TN structures, and we derive lower bounds for common tensor decomposition models (CP, Tensor Train, Tensor Ring, and Tucker) showing the tightness of our general upper bound. These results are used to derive a generalization bound which can be applied to classification with low-rank matrices as well as to linear classifiers based on any of the commonly used tensor decomposition models. As a corollary of our results, we obtain a bound on the VC-dimension of the matrix product state classifier introduced in [1] as a function of the so-called bond dimension (i.e., tensor train rank), which answers an open problem listed by Cirac, Garre-Rubio and Pérez-García [2].
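As background for readers, the standard definitions behind these results are sketched below; the thesis' actual bounds and constants are stated in the manuscript itself.

```latex
% Standard background definitions (not the thesis' specific bounds).
% Sign classifiers whose weight tensor W has a fixed TN structure,
% acting through a feature map phi:
\[
\mathcal{H} = \bigl\{\, x \mapsto \operatorname{sign}\,\langle \mathcal{W}, \phi(x)\rangle
   \;:\; \mathcal{W}\ \text{admits the fixed TN structure} \,\bigr\}.
\]
% VC-dimension: the size of the largest point set the class shatters,
% i.e., on which it realizes all 2^m sign labelings.
\[
\operatorname{VCdim}(\mathcal{H}) = \max\bigl\{\, m \;:\; \exists\, x_1,\dots,x_m
   \ \text{with}\ \bigl|\{(h(x_1),\dots,h(x_m)) : h \in \mathcal{H}\}\bigr| = 2^m \,\bigr\}.
\]
```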
25

Texts, Images, and Emotions in Political Methodology

Yang, Seo Eun 02 September 2022 (has links)
No description available.
26

Modern Electronic Structure Theory using Tensor Product States

Abraham, Vibin 11 January 2022 (has links)
Strongly correlated systems have long been a major challenge in theoretical chemistry. For such systems, the relevant portion of the Hilbert space scales exponentially, preventing efficient simulation of large systems. In many cases, however, the Hilbert space can be partitioned into clusters on the basis of strong and weak interactions. In this work, we mainly focus on an approach in which we partition the system into smaller orbital clusters, define many-particle cluster states within them, and use traditional many-body methods to capture the remaining inter-cluster correlation. This dissertation is divided into two parts. In the first part, the clustered ansatz, termed tensor product states (TPS), is used to study large strongly correlated systems. In the second part, we study a particular type of strongly correlated system: the correlated triplet pair states that arise in singlet fission. The many-body expansion (MBE) is an efficient tool with a long history of use for calculating interaction energies, binding energies, lattice energies, and so on. We extend the incremental full configuration interaction approach, originally proposed for Slater determinants, to a tensor product state (TPS) based wavefunction. By partitioning the active space into smaller orbital clusters, our approach starts from a cluster mean-field reference TPS configuration and includes the correlation contribution of the excited TPSs using a many-body expansion. This method, named cluster many-body expansion (cMBE), improves the convergence of the MBE at lower orders compared to a block-based MBE performed directly from an RHF reference. The performance of the cMBE method is also tested on a graphene nano-sheet with a very large active space of 114 electrons in 114 orbitals, which would require about $10^{66}$ determinants for the exact FCI solution. Selected CI (SCI) using determinants becomes intractable for large systems with strong correlation. We introduce a method for SCI algorithms using tensor product states which exploits local molecular structure to significantly reduce the number of SCI variables. We demonstrate the potential of this method, called tensor product selected configuration interaction (TPSCI), using a few model Hamiltonians and molecular examples. These numerical results show that TPSCI significantly reduces the number of SCI variables in the variational space, paving a path for extending these deterministic and variational SCI approaches to a wider range of physical systems. The extension of the TPSCI algorithm to excited states is also investigated. TPSCI with perturbative corrections provides accurate excitation energies for low-lying triplet states with respect to extrapolated results; traditional SCI methods obtain comparably accurate excitation energies only after extrapolating calculations with much larger variational dimensions. We provide an intuitive connection between the low-lying triplet energy manifolds and Hückel molecular orbital theory, giving a many-body version of Hückel theory for excited triplet states. The n-body Tucker ansatz (a truncated TPS wavefunction) developed in our group provides a good approximation to the low-lying states of a clusterable spin system. In this approach, a Tucker decomposition is used to obtain local cluster states, which can be truncated to prune the full Hilbert space of the system. As a truncated variational approach, the self-consistently optimized n-body Tucker method has been observed not to be size-extensive, a property important for many-body methods. We explore the use of perturbation theory and linearized coupled-cluster methods to obtain a robust yet efficient approximation. Perturbative corrections to the n-body Tucker method have been implemented for the Heisenberg Hamiltonian, and numerical data for various lattices and molecular systems are presented to show the applicability of the method. In the second part of this dissertation, we focus on a particular type of strongly correlated state that occurs in singlet fission materials. The correlated triplet pair state ¹(TT) is a key intermediate in the singlet fission process, and understanding the mechanism by which it separates into two independent triplet states is critical for leveraging singlet fission to improve solar cell efficiency. This separation mechanism is dominated by two key interactions: (i) the exchange interaction (K) between the triplets, which leads to the spin splitting of the biexciton state into ¹(TT), ³(TT), and ⁵(TT) states, and (ii) the triplet-triplet energy transfer integral (t), which enables the formation of the spatially separated (but still spin-entangled) state ¹(T...T). We develop a simple ab initio technique to compute both the triplet-triplet exchange (K) and the triplet-triplet energy transfer coupling (t). Our key findings reveal new conditions for successful correlated triplet pair state dissociation: the biexciton exchange interaction needs to be ferromagnetic, or negligible compared to the triplet energy transfer, for favorable dissociation. We also explore the effect of chromophore packing to reveal geometries where these conditions are achieved for tetracene, and we provide a simple connectivity rule to predict whether the through-bond coupling will be stabilizing or destabilizing for the (TT) state in covalently linked singlet fission chromophores. By drawing an analogy between the chemical system and a simple spin lattice, one can determine the ordering of the multi-exciton spin states via a generalized usage of Ovchinnikov's rule. In the case of meta connectivity, we predict ⁵(TT) to be formed, and this is later confirmed by experimental techniques such as time-resolved electron spin resonance (TR-ESR). / Doctor of Philosophy / The study of the correlated motion of electrons in molecules and materials allows scientists to gain useful insights into many physical processes like photosynthesis, enzyme catalysis, superconductivity, and chemical reactions. Theoretical quantum chemistry studies the electronic properties of chemical species. The exact solution of the electron correlation problem is exponentially complex and can only be computed for small systems, so approximations are introduced for practical calculations that provide good results for ground state properties like the energy and dipole moment. Sometimes more accurate calculations are required, because the system may not adhere to the assumptions made in the standard methods. One such case arises in the study of strongly correlated molecules. In this dissertation, we present methods which can handle strongly correlated cases: we partition the system into smaller parts and then solve the problem in the basis of these smaller parts. We refer to this block-based wavefunction as tensor product states; they provide accurate results while avoiding the exponential scaling of the full solution. We present accurate energies for a wide variety of challenging cases, including bond breaking, excited states, and π-conjugated molecules. Additionally, we investigate molecular systems that can be used to increase the efficiency of solar cells. We predict improved solar efficiency for a chromophore dimer, a result which is later experimentally verified.
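For reference, the many-body expansion that cMBE adapts has the standard incremental form below (a generic statement with subsystem energies ε; in cMBE the increments are computed from tensor product states rather than determinants):

```latex
% Generic many-body expansion over clusters i, j, k, ...
\begin{align*}
E &= \sum_i \epsilon_i + \sum_{i<j} \Delta\epsilon_{ij}
     + \sum_{i<j<k} \Delta\epsilon_{ijk} + \cdots,\\
\Delta\epsilon_{ij} &= \epsilon_{ij} - \epsilon_i - \epsilon_j,\\
\Delta\epsilon_{ijk} &= \epsilon_{ijk} - \Delta\epsilon_{ij} - \Delta\epsilon_{ik}
     - \Delta\epsilon_{jk} - \epsilon_i - \epsilon_j - \epsilon_k .
\end{align*}
```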
27

Application of Tensor Approximations to the Full Configuration Interaction Method

Böhm, Karl-Heinz 12 September 2016 (has links) (PDF)
In this thesis, various approaches are investigated for applying tensor decomposition methods to the Full Configuration Interaction (FCI) method. The aim of these approaches is to develop reliably converging algorithms which permit the wave function to be approximated efficiently in the Canonical Product (CP) format. Three approaches are introduced to represent the FCI wave function and to obtain the corresponding coefficients. The first approach is based on an expansion of the wave function as a linear combination of Slater determinants; in this hierarchical expansion, starting from the Hartree-Fock Slater determinant, occupied orbitals are successively substituted by virtual orbitals. Using tensor representations in the CP format, a linear system of equations is solved to obtain the FCI coefficients. In a further approach, inspired by Direct CI, tensor representations of the Hamiltonian matrix and the coefficient vectors are set up, which are required to solve the FCI eigenvalue problem; the tensor contractions and an algorithm to solve the eigenvalue problem in the CP format are explained in detail. In the next approach, tensor representations of the Hamiltonian matrix and the coefficient vector are constructed in the Fock space. This approach allows the application of various algorithms, based on the Rayleigh quotient iteration and the Davidson algorithm; for the former, a second version was developed in which the rank reduction is partially replaced by projections. The Davidson algorithm allows a broader spectrum of molecules to be treated, and first investigations regarding the scaling behaviour and the expected errors are shown for this approach. Finally, an outlook on further developments is given which would allow more efficient calculations and thus make FCI in the CP format accessible for larger molecules.
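The flavor of such an eigenvalue solver can be conveyed with a small stand-in: the sketch below (plain numpy; a dense toy Hamiltonian; truncated SVD playing the role of CP rank reduction, which numpy does not provide) runs a shifted power iteration whose iterate is recompressed to low rank at every step.

```python
# Minimal sketch, not the thesis algorithm: shifted power iteration with
# per-step low-rank recompression, mimicking rank reduction in the CP format.
import numpy as np

def low_rank_ground_state(H, rank, iters=200):
    n = int(np.sqrt(H.shape[0]))          # view vectors as n x n arrays
    shift = np.linalg.norm(H, ord=2)      # makes (shift*I - H) dominant at
    v = np.random.rand(H.shape[0])        # the lowest eigenvalue of H
    for _ in range(iters):
        v = shift * v - H @ v             # one shifted power-iteration step
        U, s, Vt = np.linalg.svd(v.reshape(n, n), full_matrices=False)
        v = ((U[:, :rank] * s[:rank]) @ Vt[:rank]).ravel()   # rank reduction
        v /= np.linalg.norm(v)
    return v @ H @ v, v                   # Rayleigh quotient, eigenvector

rng = np.random.default_rng(0)
A = rng.standard_normal((36, 36))
H = (A + A.T) / 2                         # symmetric toy "Hamiltonian"
energy, _ = low_rank_ground_state(H, rank=3)
print(energy, np.linalg.eigvalsh(H)[0])   # approximate vs exact lowest value
```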
28

Low-Rank Tensor Approximation in post Hartree-Fock Methods

Benedikt, Udo 24 February 2014 (has links) (PDF)
In this thesis the application of novel tensor decomposition and tensor representation techniques in highly accurate post-Hartree-Fock methods is evaluated. These representation techniques can help to overcome the steep scaling of high-level ab initio calculations with increasing system size and therefore break the "curse of dimensionality". After a comparison of various tensor formats, the application of the canonical polyadic (CP) format is described in detail. In particular, the casting of a normal, index-based tensor into the CP format (tensor decomposition) and a method for a low-rank approximation (rank reduction) of the two-electron integrals in the AO basis are investigated. The decisive quantity for the applicability of the CP format is the scaling of the rank with increasing system and basis set size: the memory requirements and the computational effort for tensor manipulations in the CP format are only linear in the number of dimensions, but still depend on the expansion length (rank) of the approximation. Furthermore, the AO-MO transformation and an MP2 algorithm with decomposed tensors in the CP format are evaluated, and the scaling with increasing system and basis set size is investigated. Finally, a Coupled-Cluster algorithm based solely on low-rank CP representations of the MO integrals is developed. There, the successive tensor contractions during the iterative solution of the amplitude equations and the error propagation upon repeated application of the reduction procedure are discussed. In conclusion, the overall complexity of a Coupled-Cluster procedure with tensors in the CP format is evaluated, and some possibilities for improving the rank reduction procedure, tailored to the needs of electronic structure calculations, are shown.
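The core operation, casting an index-based tensor into the CP format and watching the error fall as the rank grows, can be sketched with an off-the-shelf ALS solver. tensorly is assumed, and a random structured 4-way tensor stands in for real AO-basis two-electron integrals:

```python
# Minimal sketch, not the thesis code: CP (canonical polyadic) decomposition
# of a structured 4-way tensor at increasing ranks.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

n = 12                                     # toy number of AO basis functions
A = np.random.rand(n, n)
M = A + A.T                                # symmetric factor, for structure
eri = tl.tensor(np.einsum('pq,rs->pqrs', M, M))  # ERI-like 4-way tensor

for rank in (2, 4, 8):
    cp = parafac(eri, rank=rank, n_iter_max=200)
    err = tl.norm(eri - tl.cp_to_tensor(cp)) / tl.norm(eri)
    print(f"rank {rank:2d}: relative error {err:.2e}")
```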
29

Analysis of the context and results of educational evaluation of learning in an undergraduate degree in Engineering

Francisco Herbert Lima Vasconcelos 20 March 2015 (has links)
Banco do Nordeste do Brasil / Educational evaluation provides methods for obtaining data that can be useful for evaluating groups of individuals (students, teachers, administrators, technicians, and others), projects, products and materials, and educational institutions and systems at their different levels and competencies. In engineering education, evaluation processes can help managers make decisions and implement changes in undergraduate courses. This thesis investigates, in an unprecedented way, a new approach to the analysis and interpretation of data in the field of engineering education with emphasis on the evaluation process, taking two aspects into account in an integrated manner: a) the perception/opinion of students about the educational context/environment (Learning Context - LC), and b) the results obtained by the same students (Learning Outcomes - LO). For this research, we collected data from undergraduate students in Teleinformatics Engineering (TEI) at the Technology Center (CT) of the Federal University of Ceará (UFC). The LC data were collected by applying the SEEQ (Student's Evaluation of Educational Quality) instrument of the SETE (Student Evaluation of Teaching Effectiveness) methodology. The LO data were collected from the same students' learning performance results. To process the matrix and tensor data obtained, we used two mathematical tools: bilinear decomposition, via Principal Component Analysis (PCA), and multilinear tensor decomposition, via Parallel Factor Analysis (PARAFAC). The results allow us to identify common features and similarities among curriculum components, both in terms of student perception and student performance. The PCA and PARAFAC models also showed significant potential for extracting information related to latent variables in educational settings.
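The contrast between the two tools can be sketched on toy data (tensorly's PARAFAC and an SVD-based PCA are assumptions standing in for the thesis' actual analysis): PCA requires flattening the (students × items × courses) array, mixing two of the modes together, while PARAFAC keeps one factor matrix per mode.

```python
# Minimal sketch: PCA on the flattened data vs PARAFAC on the intact tensor.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

students, items, courses = 80, 30, 6
scores = np.random.rand(students, items, courses)   # toy evaluation scores

# PCA: vectorize each student's (items x courses) slice, losing mode structure.
X = scores.reshape(students, -1)
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc_scores = Xc @ Vt[:3].T                           # first 3 components

# PARAFAC: one interpretable factor matrix per mode.
weights, (stu_f, item_f, course_f) = parafac(tl.tensor(scores), rank=3)
print(pc_scores.shape, stu_f.shape, item_f.shape, course_f.shape)
```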
30

EXTRACTING RELIABLE INFORMATION FROM LARGE COLLECTIONS OF LEGAL DECISIONS

FERNANDO ALBERTO CORREIA DOS SANTOS JUNIOR 09 June 2022 (has links)
As a natural consequence of the digitization of the Brazilian judicial system, a large and increasing number of legal documents have become available on the Internet, especially judicial decisions. As an illustration, in 2020 the Brazilian Judiciary produced 25 million decisions; the Brazilian Supreme Court (STF), the highest judicial body in Brazil, alone produced 99.5 thousand decisions. In line with those numbers, we face a growing demand for studies focused on extracting and exploring the legal knowledge hidden in those large collections of legal documents. However, unlike typical textual content (e.g., books, news, and blog posts), legal text constitutes a particular case of highly conventionalized language, and little attention has been paid to information extraction in such specialized domains. From a temporal perspective, the Judiciary itself is a constantly evolving institution, which molds itself to cope with the demands of society. Therefore, our goal is to propose a reliable process for legal information extraction from large collections of legal documents, based on the STF scenario and the monocratic decisions it published between 2000 and 2018. To do so, we explore the combination of different Natural Language Processing (NLP) and Information Extraction (IE) techniques in the legal domain. From NLP, we explore automated named entity recognition strategies for legal text. From IE, we explore dynamic topic modeling with tensor decomposition as a tool to investigate changes in the legal reasoning embedded in those decisions over time, through textual evolution and the presence of legal named entities. For reliability, we rely on the interpretability of the methods employed and add visual resources to facilitate interpretation by a domain specialist. As a final result, we expect to propose a reliable and cost-effective process to support further studies in the legal domain, and to propose new strategies for information extraction from large collections of documents.
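One hedged way to set up such a tensor-based topic model is sketched below (tensorly's non-negative PARAFAC on a synthetic term-count tensor; a real pipeline would build the tensor from the decision texts and recognized legal entities):

```python
# Minimal sketch, not the thesis pipeline: topics over time via
# non-negative CP on a (year x document x term) count tensor.
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

years, docs, terms, topics = 19, 50, 200, 8   # e.g. the 2000-2018 span
counts = tl.tensor(np.random.poisson(1.0, (years, docs, terms)).astype(float))

weights, (year_f, doc_f, term_f) = non_negative_parafac(counts, rank=topics)

# term_f[:, k] weighs each term in topic k; year_f[:, k] traces how that
# topic's prominence evolves across the years.
top_terms = np.argsort(term_f[:, 0])[::-1][:10]
print("topic 0 top term indices:", top_terms)
print("topic 0 trajectory over years:", np.round(year_f[:, 0], 2))
```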
