Exposome-Scale Investigation of Cl-/Br-Containing Chemicals Using High-Resolution Mass Spectrometry, Multistage Machine Learning, and Cloud Computing

  • Anal Chem. 2025 Jun 3;97(21):11099-11109. doi: 10.1021/acs.analchem.5c00503.
Tingting Zhao 1, Brian Low 1, Qiming Shen 2, Yukai Wang 1, David Hidalgo Delgado 1, K N Minh Chau 2, Zhiqiang Pang 3, Xiaoxiao Li 4, Jianguo Xia 3 5, Xing-Fang Li 2, Tao Huan 1
Affiliations

  • 1 Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada.
  • 2 Division of Analytical and Environmental Toxicology, Department of Laboratory Medicine and Pathology, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta T6G 2G3, Canada.
  • 3 Institute of Parasitology, Faculty of Agricultural and Environmental Sciences, McGill University, Sainte-Anne-de-Bellevue, Quebec H9X 3V9, Canada.
  • 4 Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada.
  • 5 Department of Microbiology and Immunology, School of Biomedical Sciences, McGill University, Montreal, Quebec H3A 2B4, Canada.
Abstract

Over 70% of organic halogens, representing chlorine- and bromine-containing disinfection byproducts (Cl-/Br-DBPs), remain unidentified after 50 years of research. This work introduces a streamlined, cloud-based exposomics workflow that integrates high-resolution mass spectrometry (HRMS) analysis, multistage machine learning, and cloud computing for efficient analysis and characterization of Cl-/Br-DBPs. In particular, the multistage machine learning structure employs progressively different heavy isotopic peaks at each layer and captures the distinct isotopic characteristics of nonhalogenated compounds and Cl-/Br-compounds at different halogenation levels. This approach enables the recognition of 22 types of Cl-/Br-compounds with up to 6 Br and 8 Cl atoms. To address the data imbalance among different classes, particularly the limited number of heavily chlorinated and brominated compounds, data perturbation is performed to generate hypothetical/synthetic molecular formulas containing multiple Cl and Br atoms, facilitating data augmentation. To further benefit members of the environmental chemistry community with limited computational experience and hardware access, these innovations are incorporated into HalogenFinder (http://www.halogenfinder.com/), a user-friendly, web-based platform for Cl-/Br-compound characterization, with statistical analysis support via MetaboAnalyst. In benchmarking, HalogenFinder outperformed two established tools, achieving a higher recognition rate for 277 authentic Cl-/Br-compounds and uniquely identifying the number of Cl/Br atoms. In laboratory tests of DBP mixtures, it identified 72 Cl-/Br-DBPs with proposed structures, of which eight were confirmed with chemical standards. A retrospective analysis of 2022 finished water HRMS data revealed temporal trends in Cl-DBP features. These results demonstrate HalogenFinder's effectiveness in advancing Cl-/Br-compound identification for environmental science and exposomics.
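
To illustrate why heavy isotopic peaks can distinguish halogenation levels, the following is a minimal sketch that computes the theoretical M, M+2, M+4, ... intensity pattern contributed by a given number of Cl and Br atoms. It is not the HalogenFinder model or its feature set: the function name `halogen_pattern` is hypothetical, the abundances are commonly cited approximate natural values, and contributions from C, H, N, O, and S isotopes are deliberately ignored.

```python
from math import comb

# Approximate natural abundances of the heavy isotopes (assumed, commonly cited values).
P_CL37 = 0.2422  # 37Cl; the remainder is 35Cl
P_BR81 = 0.4931  # 81Br; the remainder is 79Br


def halogen_pattern(n_cl, n_br):
    """Relative intensities of the M, M+2, M+4, ... peaks for n_cl Cl and n_br Br atoms.

    Only the halogen isotopes are modeled; the result is scaled so the
    tallest peak equals 1.0.
    """
    def binomial(n, p):
        # Probability of k heavy isotopes among n atoms, for k = 0..n.
        return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

    cl = binomial(n_cl, P_CL37)
    br = binomial(n_br, P_BR81)

    # Each heavy Cl or Br shifts the nominal mass by +2, so the combined
    # pattern is the convolution of the two binomial distributions.
    combined = [0.0] * (n_cl + n_br + 1)
    for i, a in enumerate(cl):
        for j, b in enumerate(br):
            combined[i + j] += a * b

    top = max(combined)
    return [x / top for x in combined]


if __name__ == "__main__":
    for n_cl, n_br in [(1, 0), (0, 1), (2, 1)]:
        peaks = ", ".join(f"{x:.3f}" for x in halogen_pattern(n_cl, n_br))
        print(f"Cl{n_cl}Br{n_br}: {peaks}")
```

Under these assumptions, a single Cl gives an M+2 peak at roughly one third of M, a single Br gives M and M+2 of nearly equal height, and two Br atoms give a roughly 1:2:1 triplet. Patterns of this kind are what make different Cl/Br counts separable in principle.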
