Domain-informed Language Models for Process Systems Engineering

Process systems engineering (PSE) takes a systems-level approach to chemical engineering problems in process modeling, design, control, and optimization, and models the interactions between the various systems (and subsystems) that govern a process. Doing so requires a combination of mathematical methods, physical intuition, and, more recently, machine learning techniques. In parallel, language models have advanced tremendously, driven by new and more efficient model architectures (such as transformers), greater computing power, and large volumes of training data.

Many of these language models could be adapted to solve PSE-related problems. However, language models are inherently complex, often with several million parameters, and can be trained efficiently only in data-rich areas, which PSE is not. Moreover, PSE has accumulated decades of rich process knowledge, which must be used during model training to avoid a mismatch between that knowledge and data-driven language models.

This thesis presents frameworks for building domain-informed language models for several central problems in PSE spanning multiple scales. Specifically, the problems addressed include molecular property prediction, forward and retrosynthetic reaction outcome prediction, chemical flowsheet representation and generation, pharmaceutical information extraction, and reaction classification. Domain knowledge is integrated with language models through custom model architectures, standard and custom-built ontologies, linguistics-inspired grammars for chemistry and process flowsheets, adapted problem formulations, and graph theory techniques, among others. This thesis is intended to provide a path for future development of domain-informed language models in process systems engineering that respect domain knowledge while retaining the computational advantages of language models.

Identifier: oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/kgkh-yj15
Date: January 2024
Creators: Mann, Vipul
Source Sets: Columbia University
Language: English
Detected Language: English
Type: Theses