Global ETD Search

1. Improving Deposition Summarization using Enhanced Generation and Extraction of Entities and Keywords
Sumant, Aarohi Milind
01 June 2021

In the legal domain, depositions help lawyers and paralegals to record details and recall relevant information relating to a case. Depositions are conversations between a lawyer and a deponent and are generally in Question-Answer (QA) format. These documents can be lengthy, which creates a need to apply summarization methods to them. Though many automatic summarization methods are available, not all of them give good results, especially in the legal domain. This creates a need to process the QA pairs and develop methods to help summarize the deposition. For downstream tasks like summarization and insight generation, converting QA pairs to canonical or declarative form can be helpful. Since the transformed canonical sentences are not perfectly readable, we explore methods based on heuristics, language modeling, and deep learning to improve the quality of sentences in terms of grammaticality, sentence correctness, and relevance. Further, extracting important entities and keywords from a deposition will help rank the candidate summary sentences and assist with extractive summarization. This work investigates techniques for enhanced generation of canonical sentences and for extracting relevant entities and keywords to improve deposition summarization.

Master of Science

In the legal domain, depositions help lawyers and paralegals to record details and recall relevant information relating to a case. Depositions are conversations between a lawyer and a deponent and are generally in Question-Answer format. These documents can be lengthy, which creates a need to apply summarization methods to them. Typical automatic summarization techniques perform poorly on depositions since the data format is very different from standard text documents such as news articles and blogs. To standardize the process of summary generation, we convert the Question-Answer pairs from the deposition document to their canonical or declarative form. We apply techniques to improve the readability of these transformed sentences. Further, we extract entities such as person names, locations, and organizations, as well as keywords, from the deposition to retrieve important sentences and help in summarization. This work describes the techniques used to correct transformed sentences and extract important entities and keywords to improve the summarization of depositions.

legal deposition summarization sentence correction information extraction
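
The abstract above says that extracted entities and keywords help rank candidate summary sentences. The following is only a minimal sketch of that idea, not the thesis's method: it assumes the entities and keywords are already available as plain strings, and it scores each sentence by the fraction of its tokens that appear among them. The names `tokenize` and `rank_sentences`, the regular-expression tokenizer, and the sample sentences are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rank_sentences(sentences, key_terms):
    """Order sentences by the share of their tokens found in key_terms.

    Assumption: key_terms are entities/keywords produced elsewhere; multiword
    terms are broken into single tokens. Ties keep their input order because
    Python's sort is stable.
    """
    term_tokens = {tok for term in key_terms for tok in tokenize(term)}

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        hits = sum(1 for tok in tokens if tok in term_tokens)
        return hits / len(tokens)

    return sorted(sentences, key=score, reverse=True)


# Hypothetical example data, for illustration only.
sentences = [
    "The weather was fine that day.",
    "The deponent worked at Acme in Boston.",
    "Acme hired the deponent in 2010.",
]
key_terms = ["Acme", "Boston", "deponent"]
ranked = rank_sentences(sentences, key_terms)
```

Normalizing by sentence length is one design choice among several; a raw hit count or a TF-IDF weighting would favor different sentences.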
2. Summarizing Legal Depositions
Chakravarty, Saurabh
18 January 2021

Documents like legal depositions are used by lawyers and paralegals to ascertain the facts pertaining to a case. These documents capture the conversation between a lawyer and a deponent, which is in the form of questions and answers. Applying current automatic summarization methods to these documents results in low-quality summaries. Though extensive research has been performed in the area of summarization, not all methods succeed in all domains. Accordingly, this research focuses on developing methods to generate high-quality summaries of depositions. As part of our work related to legal deposition summarization, we propose a solution in the form of a pipeline of components, each addressing a sub-problem; we argue that a pipeline-based framework can be tuned to summarize documents from any domain.

First, we developed methods to parse the depositions, accounting for different document formats. We were able to successfully parse both a proprietary and a public dataset with our methods. Second, we developed methods to anonymize the personal information present in the deposition documents; we achieve 95% accuracy on the anonymization using a random-sampling-based evaluation. Third, we developed an ontology to define dialog acts for the questions and answers present in legal depositions. Fourth, we developed classifiers based on this ontology and achieved F1-scores of 0.84 and 0.87 on the public and proprietary datasets, respectively. Fifth, we developed methods to transform a question-answer pair to a canonical/simple form. In particular, based on the dialog acts for the question and answer combination, we developed transformation methods using both traditional NLP and deep learning techniques. We were able to achieve good scores on the ROUGE and semantic similarity metrics for most of the dialog act combinations. Sixth, we developed methods based on deep learning, heuristics, and machine translation to correct the transformed declarative sentences. The sentence correction improved the readability of the transformed sentences. Seventh, we developed a methodology to break a deposition into its topical aspects. An ontology for aspects was defined for legal depositions, and classifiers were developed that achieved an F1-score of 0.89. Eighth, we developed methods to segment the deposition into parts that have the same thematic context. The segments helped in augmenting candidate summary sentences with surrounding context, which leads to a more readable summary. Ninth, we developed a pipeline to integrate all of the methods, to generate summaries from the depositions.

We were able to outperform the baseline and state-of-the-art summarization methods in a majority of the cases based on the F1, Recall, and ROUGE-2 scores. The performance gains were statistically significant for all of the scores. The summaries generated by our system can be arranged based on the same thematic context or aspect and hence should be much easier to read and follow, compared to the baseline methods.

As part of our future work, we will improve upon these methods. We will refine our methods to identify the important parts using additional documents related to a deposition. In addition, we will work to improve the compression ratio of the generated summaries by reducing the number of unimportant sentences. We will expand the training dataset to learn and tune the coverage of the aspects for various deponent types using empirical methods. Our system has demonstrated effectiveness in transforming a QA pair into a declarative sentence. Having such a capability could enable us to generate a narrative summary from the depositions, a first for legal depositions. We will also expand our dataset for evaluation to ensure that our methods are indeed generalizable, and that they work well when experts subjectively evaluate the quality of the deposition summaries.

Doctor of Philosophy

Documents in the legal domain are of various types. One set of documents includes trial and deposition transcripts. These documents capture the proceedings of a trial or a deposition by note-taking, often over many hours. They contain conversation sentences that are spoken during the trial or deposition and involve multiple actors. One of the greatest challenges with these documents is that, generally, they are long. This is a source of pain for attorneys and paralegals who work with the information contained in the documents. Text summarization techniques have been successfully used to compress a document and capture the salient parts from it. They have also been able to reduce redundancy in summary sentences while focusing on coherence and proper sentence formation. Summarizing trial and deposition transcripts would be immensely useful for law professionals, reducing the time to identify and disseminate salient information in case-related documents, as well as reducing costs and trial preparation time.

Processing the deposition documents using traditional text processing techniques is a challenge because of their form. Having the deposition conversations transformed into a suitable declarative form where they can be easily comprehended can pave the way for the usage of extractive and abstractive summarization methods. As part of our work, we identified the different discourse structures present in the deposition in the form of dialog acts. We developed methods based on those dialog acts to transform the deposition into a declarative form. We were able to achieve an accuracy of 87% on the dialog act classification. We were also able to transform the conversational question-answer (QA) pairs into declarative forms for 10 of the top-11 dialog act combinations. Our transformation methods performed better than the baselines on 8 of the 10 QA pair types.

We also developed methods to classify the deposition QA pairs according to their topical aspects. We generated summaries using aspects by defining the relative coverage for each aspect that should be present in a summary. Another set of methods can segment the depositions into parts that have the same thematic context. These segments help augment the candidate summary sentences, to create a summary where information is surrounded by associated context. This makes the summary more readable and informative; we were able to significantly outperform the state-of-the-art methods, based on our evaluations.

Natural Language Processing Deep Learning Legal Deposition Summarization
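
The abstract above reports Recall, F1, and ROUGE-2 scores without describing how they were computed. As a rough illustration of what a ROUGE-2 style score measures, the sketch below counts overlapping word bigrams between a candidate summary and a reference summary. It is a simplified version under stated assumptions (lowercasing, whitespace tokenization, no stemming or stopword handling), not the evaluation setup used in the dissertation; the function names and sample strings are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Return a multiset of adjacent word pairs (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(candidate, reference):
    """Bigram-overlap recall, precision, and F1 between two texts.

    Assumption: a single reference summary and clipped bigram counts.
    """
    cand, ref = bigrams(candidate), bigrams(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical example strings, for illustration only.
scores = rouge_2(
    "the witness signed the contract",
    "the witness signed the lease in may",
)
```

In this example the candidate shares 3 of its 4 bigrams with the reference, which has 6 bigrams, so recall is 0.5 and precision is 0.75.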
