Estimating Missing Unit Process Data in Life Cycle Assessment Using a Similarity-based Approach
In life cycle assessment (LCA), collecting unit process data from empirical sources (i.e., meter readings, operation logs/journals) is often costly and time-consuming. We propose a new computational approach that estimates missing unit process data from limited known data alone, using a similarity-based link prediction method. The intuition is that similar processes in a unit process network tend to have similar material/energy inputs and waste/emission outputs. We use the ecoinvent 3.1 unit process datasets to test our method in four steps: 1) dividing the datasets into a training set and a test set; 2) randomly removing a certain number of data points from the test set and treating them as missing; 3) estimating the missing data in the test set with similarity-weighted means over varying numbers of the most similar processes in the training set; and 4) comparing the estimated data with the original values to evaluate the performance of the estimation. The results show that missing data can be accurately estimated when less than 5% of the data are missing in one process. The estimation performance decreases as the percentage of missing data increases. This study provides a new approach to compiling unit process data and demonstrates the potential of computational approaches for LCA data compilation.
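As an illustration of step 3, the following is a minimal sketch of a similarity-weighted mean over the k most similar training processes. It is not the authors' implementation: the function name `estimate_missing`, the choice of cosine similarity, the representation of processes as rows of flow amounts with `NaN` marking missing entries, and the toy numbers are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

def estimate_missing(target, training, k=3):
    """Fill NaN entries of `target` with a similarity-weighted mean of the
    k most similar rows of `training`.

    Assumption: similarity is cosine similarity over the flows that are
    known (non-NaN) in `target`; the abstract does not name the metric.
    """
    target = np.asarray(target, dtype=float)
    training = np.asarray(training, dtype=float)
    known = ~np.isnan(target)

    # Cosine similarity between the target and each training process,
    # restricted to the flows known in the target.
    t = target[known]
    X = training[:, known]
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(t)
    sims = np.divide(X @ t, norms, out=np.zeros(len(X)), where=norms > 0)

    # Keep the k most similar processes; negative similarities get zero weight.
    top = np.argsort(sims)[::-1][:k]
    weights = np.clip(sims[top], 0.0, None)
    if weights.sum() == 0:
        # No informative neighbour: fall back to an unweighted mean.
        weights = np.ones(len(top))

    estimate = target.copy()
    estimate[~known] = weights @ training[top][:, ~known] / weights.sum()
    return estimate

# Toy data (hypothetical): rows are processes, columns are flows.
training = np.array([[1.0, 2.0, 3.0, 4.0],
                     [2.0, 4.0, 6.0, 8.0],
                     [4.0, 3.0, 2.0, 1.0]])
original = np.array([1.0, 2.0, 3.0, 6.0])

# Step 2: mark one value as missing. Step 3: estimate it.
target = original.copy()
target[3] = np.nan
estimated = estimate_missing(target, training, k=2)

# Step 4: compare the estimate with the original value.
relative_error = abs(estimated[3] - original[3]) / abs(original[3])
print(estimated, relative_error)
```

Varying `k` in such a sketch corresponds to the "various numbers of most similar processes" mentioned in step 3. A different similarity measure could be substituted without changing the rest of the procedure.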