1 |
Improving Deposition Summarization using Enhanced Generation and Extraction of Entities and Keywords
Sumant, Aarohi Milind 01 June 2021
In the legal domain, depositions help lawyers and paralegals to record details and recall relevant information relating to a case. Depositions are conversations between a lawyer and a deponent and are generally in Question-Answer (QA) format. These documents can be lengthy, which raises the need for applying summarization methods to the documents. Though many automatic summarization methods are available, not all of them give good results, especially in the legal domain. This creates a need to process the QA pairs and develop methods to help summarize the deposition. For further downstream tasks like summarization and insight generation, converting QA pairs to canonical or declarative form can be helpful. Since the transformed canonical sentences are not perfectly readable, we explore methods based on heuristics, language modeling, and deep learning, to improve the quality of sentences in terms of grammaticality, sentence correctness, and relevance.
Further, extracting important entities and keywords from a deposition will help rank the candidate summary sentences and assist with extractive summarization. This work investigates techniques for enhanced generation of canonical sentences and extracting relevant entities and keywords to improve deposition summarization. / Master of Science / In the legal domain, depositions help lawyers and paralegals to record details and recall relevant information relating to a case. Depositions are conversations between a lawyer and a deponent and are generally in Question-Answer format. These documents can be lengthy, which raises the need for applying summarization methods to the documents. Typical automatic summarization techniques perform poorly on depositions since the data format is very different from standard text documents such as news articles and blogs. To standardize the process of summary generation, we convert the Question-Answer pairs from the deposition document to their canonical or declarative form. We apply techniques to improve the readability of these transformed sentences. Further, we extract entities such as person names, locations, and organizations, as well as keywords, from the deposition to retrieve important sentences and help in summarization. This work describes the techniques used to correct transformed sentences and extract important entities and keywords to improve the summarization of depositions.
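The idea of using extracted entities and keywords to rank candidate summary sentences can be illustrated with a minimal sketch. The scoring rule (number of distinct key terms a sentence mentions), the function name `rank_sentences`, and the sample data are all illustrative assumptions; the abstract does not specify the ranking method actually used in the thesis.

```python
import re


def rank_sentences(sentences, key_terms):
    """Rank sentences by how many distinct key terms each one mentions.

    Hypothetical scoring rule for illustration only.
    """
    terms = {t.lower() for t in key_terms}

    def score(sentence):
        lowered = sentence.lower()
        tokens = set(re.findall(r"[a-z0-9']+", lowered))
        # Single-word terms must match a whole token;
        # multi-word terms are matched as substrings.
        return sum(
            1 for t in terms
            if (t in lowered if " " in t else t in tokens)
        )

    # sorted() is stable, so tied sentences keep their document order.
    return sorted(sentences, key=score, reverse=True)
```

For example, given entities such as a person name, an organization, and a location, a sentence mentioning all three would be ranked ahead of a sentence mentioning only one, and sentences mentioning none would come last.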
2 |
Summarizing Legal Depositions
Chakravarty, Saurabh 18 January 2021
Documents like legal depositions are used by lawyers and paralegals to ascertain the facts
pertaining to a case. These documents capture the conversation between a lawyer and a
deponent, which is in the form of questions and answers. Applying current automatic summarization
methods to these documents results in low-quality summaries. Though extensive
research has been performed in the area of summarization, not all methods succeed in all
domains. Accordingly, this research focuses on developing methods to generate high-quality
summaries of depositions. As part of our work related to legal deposition summarization, we
propose a solution in the form of a pipeline of components, each addressing a sub-problem;
we argue that a pipeline-based framework can be tuned to summarize documents from any
domain.
First, we developed methods to parse the depositions, accounting for different document
formats. We were able to successfully parse both a proprietary and a public dataset with
our methods. We next developed methods to anonymize the personal information present in
the deposition documents; we achieved 95% accuracy on the anonymization in an evaluation
based on random sampling. Third, we developed an ontology to define dialog acts for the
questions and answers present in legal depositions. Fourth, we developed classifiers based
on this ontology and achieved F1-scores of 0.84 and 0.87 on the public and proprietary
datasets, respectively. Fifth, we developed methods to transform a question-answer pair to
a canonical/simple form. In particular, based on the dialog acts for the question and answer
combination, we developed transformation methods using both traditional NLP and deep
learning techniques. We were able to achieve good scores on the ROUGE and semantic similarity
metrics for most of the dialog act combinations. Sixth, we developed methods based
on deep learning, heuristics, and machine translation to correct the transformed declarative
sentences. The sentence correction improved the readability of the transformed sentences.
Seventh, we developed a methodology to break a deposition into its topical aspects. An
ontology for aspects was defined for legal depositions, and classifiers were developed that
achieved an F1-score of 0.89. Eighth, we developed methods to segment the deposition into
parts that have the same thematic context. The segments helped in augmenting candidate
summary sentences with surrounding context, which leads to a more readable summary.
Ninth, we developed a pipeline to integrate all of the methods, to generate summaries from
the depositions. We were able to outperform the baseline and state-of-the-art summarization
methods in a majority of the cases based on the F1, Recall, and ROUGE-2 scores. The performance
gains were statistically significant for all of the scores. The summaries generated
by our system can be arranged based on the same thematic context or aspect and hence
should be much easier to read and follow, compared to the baseline methods. As part of our
future work, we will improve upon these methods. We will refine our methods to identify
the important parts using additional documents related to a deposition. In addition, we will
work to improve the compression ratio of the generated summaries by reducing the number
of unimportant sentences. We will expand the training dataset to learn and tune the coverage
of the aspects for various deponent types using empirical methods.
Our system has demonstrated effectiveness in transforming a QA pair into a declarative
sentence. Having such a capability could enable us to generate a narrative summary from
the depositions, a first for legal depositions. We will also expand our dataset for evaluation
to ensure that our methods are indeed generalizable, and that they work well when experts
subjectively evaluate the quality of the deposition summaries. / Doctor of Philosophy / Documents in the legal domain are of various types. One set of documents includes trial and
deposition transcripts. These documents capture the proceedings of a trial or a deposition
by note-taking, often over many hours. They contain conversation sentences that are spoken
during the trial or deposition and involve multiple actors. One of the greatest challenges
with these documents is that generally, they are long. This is a source of pain for attorneys
and paralegals who work with the information contained in the documents.
Text summarization techniques have been successfully used to compress a document and capture
the salient parts from it. They have also been able to reduce redundancy in summary
sentences while focusing on coherence and proper sentence formation. Summarizing trial and
deposition transcripts would be immensely useful for law professionals, reducing the time to
identify and disseminate salient information in case-related documents, as well as reducing
costs and trial preparation time. Processing the deposition documents using traditional text
processing techniques is a challenge because of their form. Having the deposition conversations
transformed into a suitable declarative form where they can be easily comprehended
can pave the way for the usage of extractive and abstractive summarization methods. As
part of our work, we identified the different discourse structures present in the deposition
in the form of dialog acts. We developed methods based on those dialog acts to transform
the deposition into a declarative form. We were able to achieve an accuracy of 87% on the
dialog act classification. We also were able to transform the conversational question-answer
(QA) pairs into declarative forms for 10 of the top-11 dialog act combinations. Our transformation
methods performed better in 8 out of the 10 QA pair types, when compared to the
baselines. We also developed methods to classify the deposition QA pairs according to their
topical aspects. We generated summaries using aspects by defining the relative coverage for
each aspect that should be present in a summary. Another set of methods developed can
segment the depositions into parts that have the same thematic context. These segments
aid in augmenting the candidate summary sentences, to create a summary where information
is surrounded by associated context. This makes the summary more readable and informative;
we were able to significantly outperform the state-of-the-art methods, based on our
evaluations.
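The ROUGE-2 score named in the abstract above measures bigram overlap between a generated summary and a reference summary. The following is a simplified sketch under stated assumptions: it tokenizes on whitespace, lowercases, and applies no stemming or stopword handling, so its numbers will not match an official ROUGE toolkit exactly. The function names `bigrams` and `rouge2` are hypothetical and are not taken from the dissertation.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(candidate, reference):
    """Return (precision, recall, F1) over bigrams, simplified ROUGE-2."""
    cand, ref = bigrams(candidate), bigrams(reference)
    # Counter intersection keeps the minimum count per bigram,
    # which gives the clipped number of matches.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1
```

Recall rewards covering the reference's content, precision penalizes extra material in the candidate, and F1 balances the two, which is why summarization results are often reported with all three.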