Predicting a Molecular Fingerprint from an Electron Ionization Mass Spectrum with Spec2Vec
Abstract: Mass spectrometry is widely used to identify compounds in biological systems. However, conventional spectral library search can identify only compounds that are already present in the mass spectral library. One solution is to predict mass spectra from molecular fingerprints and use the predicted spectra to expand the library. Another is to predict molecular fingerprints from mass spectra and use the predicted fingerprints to retrieve unknown compounds. Because deep learning networks are difficult to train on sparse mass spectra, this paper proposes a fingerprint prediction method based on Spec2Vec. The method uses a mass spectrum embedding to transform the sparse mass spectrum vector into a dense feature vector. Experimental results show that predicting fingerprints from the mass spectrum embedding performs better than predicting them directly from the mass spectrum as the input feature. In addition, the proposed method can be combined with fingerprint-to-spectrum prediction methods to further improve identification accuracy.