Data gaps in life cycle inventory (LCI) are stumbling blocks for investigating the life cycle performance and impact of emerging technologies. It can be tedious, expensive, and time-consuming for LCI practitioners to collect LCI data or to wait for experimental data to become available. I propose a computational approach to estimate missing LCI data using link prediction techniques from network science. LCI data in Ecoinvent 3.1 are used to test the method.
The proposed approach is based on the similarities between different processes or environmental interventions in the LCI database. I measure the similarity of two processes by comparing their material inputs and emission outputs. I hypothesize that similar processes tend to have similar material inputs and emission outputs, which are the life cycle inventory data I want to estimate. In particular, I measure similarity using four metrics (average difference, Pearson correlation coefficient, Euclidean distance, and SimRank), each with or without data normalization. I test how well each combination of metric and normalization estimates missing LCI data. The results show that processes in the same industrial classification have higher similarities, which validates the approach of measuring the similarity between unit processes. I remove a small set of data (from 1 to 50 data points) for each process and then use the remaining LCI data to train the model for estimating the removed data. Approximately 80% of the removed data can be estimated with less than 10% error. This study is the first attempt in the search for an effective computational method for estimating missing LCI data. It is anticipated that this approach will significantly transform LCI compilation and LCA studies in the future.
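The remove-and-estimate test described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the toy matrix, the function names `pearson_similarity` and `estimate_missing`, the neighbor count `k`, and the similarity-weighted average used as the estimator are all assumptions made for illustration. Only the Pearson correlation coefficient is shown; the other three metrics would slot into the same place.

```python
import numpy as np

def pearson_similarity(a, b):
    """Pearson correlation between two flow vectors (0.0 if either is constant)."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    denom = np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum())
    return float((a_c * b_c).sum() / denom) if denom > 0 else 0.0

def estimate_missing(matrix, process, flow, k=2):
    """Estimate matrix[process, flow] from the k most similar other processes.

    matrix: rows are unit processes, columns are flows (assumed toy layout).
    The estimator (similarity-weighted average) is an illustrative assumption.
    """
    known = np.arange(matrix.shape[1]) != flow  # hide the removed column
    target = matrix[process, known]
    sims = []
    for other in range(matrix.shape[0]):
        if other == process:
            continue
        sims.append((pearson_similarity(target, matrix[other, known]), other))
    # Keep the k most similar processes, ignoring non-positive similarities.
    top = [(s, o) for s, o in sorted(sims, reverse=True)[:k] if s > 0]
    if not top:
        return None
    weights = np.array([s for s, _ in top])
    values = np.array([matrix[o, flow] for _, o in top])
    return float(weights @ values / weights.sum())

# Hypothetical data: 4 processes x 4 flows (not taken from Ecoinvent).
lci = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [0.9, 1.9, 3.1, 3.9],
])

true_value = lci[0, 3]  # the data point treated as "removed"
estimate = estimate_missing(lci, process=0, flow=3, k=2)
relative_error = abs(estimate - true_value) / abs(true_value)
print(f"estimate={estimate:.3f}, relative error={relative_error:.1%}")
```

Because Pearson correlation ignores scale, a weighted average of raw neighbor values works here only because the toy processes have similar magnitudes; this is one reason the choice of normalization matters when comparing metrics.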