Predicting Electron Ionization Mass Spectra Based on Molecular Embedding
-
-
Abstract
Mass spectrometry is widely used to analyze unknown compounds. A common approach is to compute the similarity between the measured mass spectrum of the compound under analysis and the entries in an existing mass spectral library. However, reference libraries have limited coverage: a compound that is absent from the library cannot be retrieved correctly. One way to address this problem is to use neural networks to learn the underlying mapping between molecular structure features and spectral peaks from known molecular structures and their corresponding mass spectra, and then to predict mass spectra for compounds that lack a reference spectrum. To address the loss of molecular structure features in current mass spectra prediction methods, we propose a prediction method based on molecular embedding, which converts molecular structure features into high-dimensional feature vectors. The results show that, compared with the traditional approach of representing molecular structure with molecular fingerprints, the average similarity of the mass spectra predicted by our model increases by 5.4%, and the predicted spectra also outperform those of a fingerprint-based prediction method in compound retrieval tasks. We also analyze the dataset used in our experiments to verify that the method generalizes well.
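The library search described above can be illustrated with a minimal sketch. It assumes that spectra are binned to integer m/z values and compared by cosine similarity; the abstract does not specify the similarity measure, so this choice, the function names, and the toy peak lists are all hypothetical and serve only as an illustration.

```python
import math

def bin_spectrum(peaks, max_mz=1000):
    """Convert (m/z, intensity) peaks into a fixed-length intensity vector.

    Binning to integer m/z is an assumption made for this sketch.
    """
    vec = [0.0] * (max_mz + 1)
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx <= max_mz:
            vec[idx] += intensity
    return vec

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search_library(query_peaks, library, top_k=3):
    """Rank library entries by similarity to the query spectrum, best first."""
    query_vec = bin_spectrum(query_peaks)
    scored = [
        (name, cosine_similarity(query_vec, bin_spectrum(peaks)))
        for name, peaks in library.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy library: names and peaks are made up for illustration.
toy_library = {
    "compound_A": [(43, 100.0), (58, 30.0)],
    "compound_B": [(77, 100.0), (105, 60.0)],
}
toy_query = [(43, 95.0), (58, 35.0)]

ranking = search_library(toy_query, toy_library)
```

In this setting, a predicted spectrum would be added to the library as one more entry, so a compound without a measured reference spectrum can still appear in the ranking.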
-
-