Illegal logging and timber trade present a persistent threat to global biodiversity and national security, owing to their ties with illicit financial flows, and cause revenue loss. The scale of global commerce in timber and associated products, combined with the complexity and geographical spread of the entities in the supply chain, presents a non-trivial challenge in detecting such transactions. International shipment records, specifically those containing bills of lading, are a key source of data that can be used to detect, investigate, and act upon such transactions. The comprehensive problem can be described as building a framework that can perform automated discovery and make detected transactions actionable. A data-driven, machine learning based approach is necessitated by the volume, velocity, and complexity of international shipping data. Such an automated framework can greatly benefit our target end-users, specifically enforcement agencies.
This overall problem comprises multiple connected sub-problems with associated research questions. We incorporate crucial domain knowledge, in terms of both data and modeling, by drawing on the expertise of collaborating domain specialists from ecological conservation agencies. The collaborators provide formal and informal inputs across all stages, from requirement specification to design. Following the paradigm of similar problems explored in prior literature, such as fraud detection, we formulate the core problem of discovering suspicious transactions as an anomaly detection task. The first sub-problem is to build a system that can be used to find suspicious transactions in shipment data pertaining to the imports and exports of multiple countries, each with a different country-specific schema. We present a novel anomaly detection approach for multivariate categorical data that satisfies the constraints imposed by the data characteristics, combined with a data pipeline that incorporates domain knowledge. The focus of the second sub-problem is U.S.-specific imports, where the data characteristics differ from those of the first sub-problem in that heterogeneous attributes are present. This problem is important since the U.S. is a top consumer and there is scope for actionable enforcement. For this, we present a contrastive learning based anomaly detection model for heterogeneous tabular data, with performance and scalability characteristics suited to real-world trade data. While the first two sub-problems address the task of detecting suspicious trades through anomaly detection, a practical challenge with anomaly detection based systems is relevancy, or scenario-specific precision. The third sub-problem addresses this through a human-in-the-loop approach, augmented by visual analytics, that re-ranks anomalies by relevance while providing explanations for the causes of anomalies and soliciting feedback.
The last sub-problem pertains to explainability and actionability for suspicious records, through algorithmic recourse. Algorithmic recourse aims to provide meaningful alternatives to flagged anomalous records, such that these counterfactual examples are not judged anomalous by the underlying anomaly detection system. This can help enforcement agencies advise verified trading entities on modifying their trading patterns to avoid false detection, thus streamlining the process. We present a novel formulation and metrics for this unexplored problem of algorithmic recourse in anomaly detection, together with a deep learning based approach for explaining anomalies and generating counterfactuals.
Thus, the research contributions presented in this dissertation address the requirements of the framework and have general applicability in similar scenarios beyond its scope.

Doctor of Philosophy

Illegal timber trade presents multiple global challenges to ecological biodiversity, vulnerable ecosystems, national security, and revenue collection. Enforcement agencies, the target end-users of this framework, face myriad challenges in discovering and acting upon shipments of illegal timber that violate national and transnational laws, owing to the volume and complexity of shipment data coupled with logistical hurdles. This necessitates an automated framework based on shipment data that addresses this task by solving the problems of discovery, analysis, and actionability.
The overall problem is decomposed into self-contained sub-problems that address specific associated research questions. These comprise anomaly detection in multiple types of high-dimensional tabular data, improving the precision of anomaly detection through expert feedback, and algorithmic recourse for anomaly detection. We present data mining and machine learning solutions to each of the sub-problems that overcome the limitations, or outright inapplicability, of prior approaches. Further, we address two broader research questions. The first is the incorporation of domain knowledge into the framework, which we accomplish through collaboration with domain experts from environmental conservation organizations. The second is explainability in anomaly detection for tabular data, which we address in multiple contexts. Such real-world data poses challenges of complexity and scalability, compounded by its tabular format, which presents its own set of challenges for machine learning. Together, the solutions to the machine learning problems associated with each component of the framework provide an end-to-end solution to its requirements. More importantly, the models and approaches presented in this dissertation are applicable beyond this application scenario, to settings with similar data and application-specific challenges.
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/110362 |
Date | 27 May 2022 |
Creators | Datta, Debanjan |
Contributors | Computer Science, Ramakrishnan, Narendran, Reddy, Chandan K., Elmqvist, L. Niklas, North, Christopher L., Lu, Chang Tien |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Language | English |
Detected Language | English |
Type | Dissertation |
Format | ETD, application/pdf |
Rights | Creative Commons Attribution 4.0 International, http://creativecommons.org/licenses/by/4.0/ |