Summarizing Legal Depositions

Documents like legal depositions are used by lawyers and paralegals to ascertain the facts
pertaining to a case. These documents capture the conversation between a lawyer and a
deponent, which is in the form of questions and answers. Applying current automatic summarization
methods to these documents results in low-quality summaries. Though extensive
research has been performed in the area of summarization, not all methods succeed in all
domains. Accordingly, this research focuses on developing methods to generate high-quality
summaries of depositions. As part of our work related to legal deposition summarization, we
propose a solution in the form of a pipeline of components, each addressing a sub-problem;
we argue that a pipeline-based framework can be tuned to summarize documents from any
domain.
First, we developed methods to parse the depositions, accounting for different document
formats. We were able to successfully parse both a proprietary and a public dataset with
our methods. We next developed methods to anonymize the personal information present in
the deposition documents; we achieved 95% accuracy on anonymization in a random-sampling-based
evaluation. Third, we developed an ontology to define dialog acts for the
questions and answers present in legal depositions. Fourth, we developed classifiers based
on this ontology and achieved F1-scores of 0.84 and 0.87 on the public and proprietary
datasets, respectively. Fifth, we developed methods to transform a question-answer pair to
a canonical/simple form. In particular, based on the dialog acts for the question and answer
combination, we developed transformation methods using both traditional NLP and deep
learning techniques. We were able to achieve good scores on the ROUGE and semantic similarity
metrics for most of the dialog act combinations. Sixth, we developed methods based
on deep learning, heuristics, and machine translation to correct the transformed declarative
sentences. The sentence correction improved the readability of the transformed sentences.
Seventh, we developed a methodology to break a deposition into its topical aspects. An
ontology for aspects was defined for legal depositions, and classifiers were developed that
achieved an F1-score of 0.89. Eighth, we developed methods to segment the deposition into
parts that have the same thematic context. The segments helped in augmenting candidate
summary sentences with surrounding context, which leads to a more readable summary.
Ninth, we developed a pipeline to integrate all of the methods, to generate summaries from
the depositions. We were able to outperform the baseline and state-of-the-art summarization
methods in a majority of the cases based on the F1, Recall, and ROUGE-2 scores. The performance
gains were statistically significant for all of the scores. The summaries generated
by our system can be arranged based on the same thematic context or aspect and hence
should be much easier to read and follow, compared to the baseline methods. As part of our
future work, we will improve upon these methods. We will refine our methods to identify
the important parts using additional documents related to a deposition. In addition, we will
work to improve the compression ratio of the generated summaries by reducing the number
of unimportant sentences. We will expand the training dataset to learn and tune the coverage
of the aspects for various deponent types using empirical methods.
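The ROUGE-2 scores mentioned above measure bigram overlap between a generated summary and a reference summary. The following is a minimal sketch of that computation, assuming lowercased whitespace tokenization; the function names `bigrams` and `rouge2` are illustrative and are not taken from the dissertation, whose evaluation presumably relied on an established ROUGE implementation.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent token pairs after lowercasing and whitespace splitting."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(candidate, reference):
    """Return (precision, recall, F1) based on clipped bigram overlap.

    Illustrative only: no stemming, stopword handling, or multi-reference
    support, all of which full ROUGE toolkits may provide.
    """
    cand, ref = bigrams(candidate), bigrams(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Recall rewards covering the reference content, precision penalizes unimportant sentences in the generated summary, and F1 balances the two.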
Our system has demonstrated effectiveness in transforming a QA pair into a declarative
sentence. Having such a capability could enable us to generate a narrative summary from
the depositions, a first for legal depositions. We will also expand our dataset for evaluation
to ensure that our methods are indeed generalizable, and that they work well when experts
subjectively evaluate the quality of the deposition summaries.

Doctor of Philosophy

Documents in the legal domain are of various types. One set of documents includes trial and
deposition transcripts. These documents capture the proceedings of a trial or a deposition
by note-taking, often over many hours. They contain conversation sentences that are spoken
during the trial or deposition and involve multiple actors. One of the greatest challenges
with these documents is that they are generally long. This is a source of pain for attorneys
and paralegals who work with the information contained in the documents.
Text summarization techniques have been successfully used to compress a document and capture
the salient parts from it. They have also been able to reduce redundancy in summary
sentences while focusing on coherence and proper sentence formation. Summarizing trial and
deposition transcripts would be immensely useful for law professionals, reducing the time to
identify and disseminate salient information in case-related documents, as well as reducing
costs and trial preparation time. Processing the deposition documents using traditional text
processing techniques is a challenge because of their form. Having the deposition conversations
transformed into a suitable declarative form where they can be easily comprehended
can pave the way for the usage of extractive and abstractive summarization methods. As
part of our work, we identified the different discourse structures present in the deposition
in the form of dialog acts. We developed methods based on those dialog acts to transform
the deposition into a declarative form. We were able to achieve an accuracy of 87% on the
dialog act classification. We also were able to transform the conversational question-answer
(QA) pairs into declarative forms for 10 of the top-11 dialog act combinations. Our transformation
methods performed better in 8 out of the 10 QA pair types, when compared to the
baselines. We also developed methods to classify the deposition QA pairs according to their
topical aspects. We generated summaries using aspects by defining the relative coverage for
each aspect that should be present in a summary. Another set of methods developed can
segment the depositions into parts that have the same thematic context. These segments
aid in augmenting the candidate summary sentences, to create a summary where information
is surrounded by associated context. This makes the summary more readable and informative;
we were able to significantly outperform the state-of-the-art methods, based on our
evaluations.
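The idea of defining a relative coverage for each aspect can be sketched as a quota-based selection. This is a hypothetical illustration, not the dissertation's method: the function name `select_by_aspect`, the tuple layout, and the aspect labels used with it are all assumptions, and the importance scores are taken as given.

```python
from collections import defaultdict


def select_by_aspect(candidates, coverage, budget):
    """Pick summary sentences so each aspect gets its share of the budget.

    candidates: list of (position, aspect, score, sentence) tuples (assumed layout).
    coverage:   dict mapping aspect label -> fraction of the summary (assumed).
    budget:     total number of sentences wanted in the summary.
    """
    by_aspect = defaultdict(list)
    for cand in candidates:
        by_aspect[cand[1]].append(cand)

    chosen = []
    for aspect, share in coverage.items():
        quota = round(share * budget)
        # Keep the highest-scoring sentences for this aspect, up to its quota.
        ranked = sorted(by_aspect[aspect], key=lambda c: c[2], reverse=True)
        chosen.extend(ranked[:quota])

    # Restore the original deposition order so the summary reads naturally.
    chosen.sort(key=lambda c: c[0])
    return [c[3] for c in chosen]
```

Sorting the selected sentences back into their original positions is one simple design choice; grouping them by aspect instead would give the aspect-arranged presentation described in the abstract.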

Identifier: oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/111223
Date: 18 January 2021
Creators: Chakravarty, Saurabh
Contributors: Computer Science, Fox, Edward A., Ashley, Kevin D., Reddy, Chandan K., Hsiao, Michael S., Karpatne, Anuj
Publisher: Virginia Tech
Source Sets: Virginia Tech Theses and Dissertation
Detected Language: English
Type: Dissertation
Format: ETD, application/pdf
Rights: In Copyright, http://rightsstatements.org/vocab/InC/1.0/
