Data-driven modeling and optimization of sequential batch-continuous process
Driven by the need to lower capital expenditures and operating costs, as well as by competitive pressure to increase product quality and consistency, modern chemical processes have become increasingly complex. These trends are manifest, on the one hand, in complex equipment configurations and, on the other hand, in a broad array of sensors (and control systems), which generate large quantities of operating data. Of particular interest is the combination of two traditional routes of chemical processing: batch and continuous. Batch-to-continuous (B2C) processes, which constitute the topic of this dissertation, comprise a batch section, which prepares the materials that are then processed in the continuous section. In addition to merging the modeling, control and optimization approaches related to the batch and continuous operating paradigms --which are radically different in many respects-- challenges related to analyzing the operation of such processes arise from the multi-phase flow. In particular, we consider the case where a particulate solid is suspended in a liquid ``carrier'' in the batch stage, and the two-phase mixture is conveyed through the continuous stage. Our explicit goal is to provide a complete operating solution for such processes, starting with the development of meaningful and computationally efficient mathematical models, continuing with a control and fault detection solution, and concluding with a production scheduling concept. Owing to process complexity, we reject out of hand the use of first-principles models, which are inevitably high dimensional and computationally expensive, and focus on data-driven approaches instead. Raw data obtained from the chemical industry are subject to noise, equipment malfunction and communication failures and, as such, data recorded in process historian databases may contain outliers and measurement noise.
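As a concrete illustration of the pretreatment step, the sketch below implements a Hampel-type outlier filter (rolling median with a MAD-based threshold), one standard outlier removal technique of the kind evaluated in the next chapter. The window length and threshold are illustrative assumptions, not the settings used in the dissertation.

```python
import statistics

def hampel_filter(series, window=5, n_sigmas=3.0):
    """Replace outliers with the local rolling median.

    A point is flagged as an outlier when it deviates from the median of
    its window by more than n_sigmas times the scaled median absolute
    deviation (MAD). The factor 1.4826 rescales the MAD to estimate the
    standard deviation under a Gaussian noise assumption.
    """
    cleaned = list(series)
    half = window // 2
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        window_vals = series[lo:hi]
        med = statistics.median(window_vals)
        mad = statistics.median(abs(v - med) for v in window_vals)
        if abs(series[i] - med) > n_sigmas * 1.4826 * mad:
            cleaned[i] = med  # replace the outlier with the local median
    return cleaned
```

Median-based filters of this kind are robust to isolated spikes, which a plain moving-average filter would instead smear into neighboring samples.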
Without proper pretreatment, the accuracy and performance of a model derived from such data may be inadequate. In the next chapter of this dissertation, we address this issue and evaluate several outlier removal techniques and filtering methods using actual production data from an industrial B2C system. We also address a challenge specific to B2C systems: synchronizing the timing of the batch data with the data collected from the continuous section of the process. Variable-wise unfolded data (a typical representation for batch processes) exhibit measurement gaps between batches, whereas no such gaps occur in the subsequent continuous section. These gaps impede data analysis and, to address this issue, we provide a method for filling in the missing values. Characteristic values of each batch are assigned in the gaps to match the data length with the continuous process, a procedure that preserves meaningful process correlations. Data-driven modeling techniques such as principal component analysis (PCA) and partial least squares (PLS) regression are well established for modeling batch or continuous processes. In this thesis, we adapt them to B2C systems. Specific challenges that arise in modeling these systems are related to nonlinearity, which, in turn, is due to multiple operating modes associated with different product types and grades. To deal with this, we propose partitioning the gap-filled data set into subsets using k-means clustering. In this way, a large data set that reflects multiple operating modes and the associated nonlinearity can be broken down into subsets in which the system exhibits approximately linear behavior. Further, to increase model accuracy, the inputs to the model need to be refined.
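The gap-filling step can be sketched as follows. The specific choice of batch characteristic value here (carrying the batch's last measured value forward through the gap) is an illustrative assumption; the dissertation assigns batch characteristic values more generally.

```python
def fill_batch_gaps(values):
    """Fill inter-batch gaps (None entries) in a variable-wise unfolded
    batch variable so its length matches the continuously sampled data.

    Each gap is filled with a characteristic value of the preceding
    batch -- illustratively, the batch's final measured value -- which
    keeps cross-correlations with the continuous section meaningful.
    """
    filled = []
    last = None  # characteristic value carried from the preceding batch
    for v in values:
        if v is None:
            filled.append(last)  # inside a gap: hold the batch value
        else:
            filled.append(v)
            last = v  # update the carried characteristic value
    return filled
```

After this step, the unfolded batch variables and the continuous-section measurements share a common time index, so they can enter a single PCA or PLS model.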
Unrelated variables may corrupt the resulting model by introducing unnecessary noise and irrelevant information. By properly eliminating uninformative variables, model performance can be improved along with interpretability. We use variable selection methods, examining the model coefficients or the variable importance in projection (VIP) values, to determine which variables to retain in the model. Developing a model to estimate the final product quality poses different challenges. Measuring and quantifying the final product quality online can be limited by physical and economic constraints. Physically, some quantities cannot be measured because of sensor size or the surrounding environment. Economically, the offline ``lab'' measurements may destroy the sample used for testing. These constraints lead to multiple sampling rates: the process measurements are stored and available continuously in real time, but the quality measurements have a much lower sampling rate. To account for this discrepancy, the online process measurements are down-sampled to match the sampling frequency of the lab measurements, and soft sensors can subsequently be developed to estimate the final product quality. With the soft sensor in place, the process needs to be optimized to maximize plant efficiency. Using real-time optimization, we calculate the optimal sequence of manipulated inputs that minimizes off-spec production. In addition, the optimal sequences of setpoints can be calculated by carrying out the scheduling calculation with the process model. Traditionally, the scheduling calculation is carried out without taking the process dynamics into account, which can result in off-spec products when a disturbance is introduced. Incorporating the process dynamics into the scheduling layer, however, poses significant numerical challenges.
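A minimal sketch of the down-sampling step, assuming timestamped process and lab data in ascending order; pairing each lab sample with the most recent preceding process measurement is an illustrative alignment rule, not necessarily the one used in the thesis.

```python
def align_to_lab_samples(proc_times, proc_values, lab_times):
    """Down-sample fast process measurements to the slow lab sampling grid.

    For each lab timestamp, keep the most recent process measurement at
    or before it, so that soft-sensor regression can pair each lab
    quality value with a synchronous process snapshot. Both time vectors
    are assumed sorted, with lab_times[0] >= proc_times[0].
    """
    aligned = []
    j = 0  # index into the fast process grid; never rewinds
    for t in lab_times:
        # advance through the fast grid up to the lab timestamp
        while j + 1 < len(proc_times) and proc_times[j + 1] <= t:
            j += 1
        aligned.append(proc_values[j])
    return aligned
```

The resulting (process snapshot, lab value) pairs form the regression data set from which a PLS-based soft sensor can then be trained at the lab sampling rate.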
The proposed scale-bridging model (SBM) captures the input-output behavior of the process while greatly reducing computational complexity and solution time.