Can we quantify chemicals if we do not know their structure? Yes, we can!

Non-target analysis with LC/ESI/HRMS is increasingly used to detect thousands of chemicals in environmental samples. However, most of these chemicals are never confidently identified or quantified, which in turn means that they are ignored in risk assessments. In our recent paper, we address this issue by modeling the response factor of chemicals without the need for structural identification. This paper is based on the bachelor’s thesis of Emma Palm.

To predict the LC/ESI/HRMS response factor, we started from two properties known to influence ionization efficiency in ESI: logP and pKa. Based on this knowledge, we defined 12 LC/ESI/HRMS features that may correlate with these properties and can be easily determined for any detected chemical, even if its structure is unknown. These include retention times using mobile phases with three different pH values, relative peak intensities in positive and negative ionization mode, and the presence of sodium adducts. We then measured a training set of 101 chemicals in positive and negative ionization mode at pH 2.7, 8.0 and 10.0. The LC/ESI/HRMS features were used to train a separate random forest regression model of the response factor for each pH and ionization mode. Finally, we validated the models on a test set of 31 chemicals and on blind spiked water samples.
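As a rough illustration of how such structure-free features could be assembled for one detected peak, here is a minimal Python sketch. It is not the code used in the paper: the class `PeakMeasurements`, the function `build_features`, the feature names, and the exact definition of the relative peak area are all assumptions made for the example, and only features named in this post are included rather than the full set of 12.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PeakMeasurements:
    """Raw observations for one detected peak (hypothetical container)."""
    mz: float                      # measured m/z
    rt_by_ph: Dict[float, float]   # mobile-phase pH -> retention time (min)
    area_pos: float                # peak area in positive ionization mode
    area_neg: float                # peak area in negative ionization mode
    has_sodium_adduct: bool        # whether a sodium adduct was observed


def build_features(peak: PeakMeasurements) -> Dict[str, float]:
    """Turn raw peak observations into a flat feature dictionary.

    The relative area is defined here as pos / (pos + neg); this is an
    assumption for illustration, not necessarily the paper's definition.
    """
    rt_low = peak.rt_by_ph[2.7]
    rt_mid = peak.rt_by_ph[8.0]
    rt_high = peak.rt_by_ph[10.0]
    total_area = peak.area_pos + peak.area_neg
    return {
        "mz": peak.mz,
        "rt_ph2.7": rt_low,
        "rt_ph8.0": rt_mid,
        "rt_ph10.0": rt_high,
        # Retention shifts between pH values hint at acid-base behaviour.
        "rt_diff_8.0_vs_2.7": rt_mid - rt_low,
        "rt_diff_10.0_vs_2.7": rt_high - rt_low,
        "rel_area_pos": peak.area_pos / total_area if total_area > 0 else 0.0,
        "sodium_adduct": 1.0 if peak.has_sodium_adduct else 0.0,
    }


# Made-up numbers, for demonstration only.
example_peak = PeakMeasurements(
    mz=250.0,
    rt_by_ph={2.7: 4.0, 8.0: 5.5, 10.0: 6.0},
    area_pos=3.0e5,
    area_neg=1.0e5,
    has_sodium_adduct=True,
)
features = build_features(example_peak)
```

A feature dictionary like this, computed for each training chemical, could then be converted to a numeric row and passed to any random forest regression implementation, with one model fitted per pH and ionization mode.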

The validation of the models gave highly promising results, with a mean prediction error of a factor of 6.0 and 90% of the chemicals having prediction errors lower than a factor of 10. We found this to significantly outperform baseline models that use the mean response or the closest eluting standard to estimate the response. The model also had similar accuracy to a model using PaDEL descriptors previously developed by Jaanus Liigand in our group. Of the selected descriptors, we found that the most significant ones were the relative peak area in positive and negative ionization mode, the m/z of the compound, and the difference in retention time between mobile phases of different pH. We believe that relative peak area and difference in retention time are associated with the acid-base properties of the chemicals, while m/z is, for this dataset, correlated with logP.
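To make "prediction error of a factor of X" concrete, the sketch below computes a fold error for each chemical and summarizes it, alongside a mean-response baseline. It assumes the error is the ratio between predicted and measured response factor, inverted where needed so that it is always at least 1; the paper's exact definition may differ. The function names and all numbers are hypothetical and serve only as a demonstration.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fold_error(predicted: float, measured: float) -> float:
    """Return the prediction error as a factor >= 1.

    Over- and under-prediction by the same ratio give the same error.
    """
    if predicted <= 0 or measured <= 0:
        raise ValueError("response factors must be positive")
    ratio = predicted / measured
    return max(ratio, 1.0 / ratio)


def summarize(predicted: Sequence[float], measured: Sequence[float],
              threshold: float = 10.0) -> Tuple[float, float]:
    """Return (mean fold error, fraction of chemicals below the threshold)."""
    errors = [fold_error(p, m) for p, m in zip(predicted, measured)]
    fraction_within = sum(e < threshold for e in errors) / len(errors)
    return mean(errors), fraction_within


def mean_response_baseline(training_responses: Sequence[float],
                           n_test: int) -> List[float]:
    """Baseline: predict the mean training response for every test chemical."""
    return [mean(training_responses)] * n_test


# Made-up response factors, for demonstration only.
measured = [1.0e6, 2.0e5, 5.0e4, 8.0e6]
predicted = [2.0e6, 1.0e5, 1.0e6, 4.0e6]

mean_error, fraction_within = summarize(predicted, measured)

baseline_predictions = mean_response_baseline([5.0e5, 3.0e6], len(measured))
baseline_error, baseline_within = summarize(baseline_predictions, measured)
```

Comparing `mean_error` with `baseline_error` on the same test chemicals is one simple way to check whether a model adds value over assuming an average response for everything.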

In the future, we hope to make this model transferable between instruments. It could then be combined with models that predict toxicity to better indicate the risk posed by unidentified compounds.