S.K. Pinky, P. Hogsed, A. Kancharla, N.A. Zaid, A. Gulyuk, D.S. Pendyala, R. Lakshmi-Ratan, R. Chirkova, E. McLamore, Y.G. Yingling
North Carolina State University,
United States
Keywords: AI, phosphorus sustainability, PASTOR, machine learning, environment, PFAS
Summary:
Phosphorus is a critical element for global food security and ecosystem balance, yet research on phosphorus sustainability remains fragmented across environmental, agricultural, chemical, and industrial domains. The Phosphorus Knowledge Hub is an AI-driven framework designed to unify research literature, open datasets, and machine learning workflows to accelerate scientific discovery and decision-making for phosphorus sustainability. Developed under the PASTOR (Phosphorus AI Scraping, Tracking, Optimization, and Research) architecture, the Hub integrates multimodal data sources, including text, figures, tables, and chemical descriptors, through natural language processing, large language models, and knowledge graph reasoning. This unified infrastructure enables cross-domain inference and explainable AI to guide environmental management, materials design, and policy formulation related to phosphorus use and recovery. One major application of this platform is identifying hotspots of phosphorus contamination in groundwater across North Carolina. The model draws on groundwater chemistry, land use, industrial activity, and hydrogeological data, which are processed with multiple data imputation algorithms to reconstruct incomplete groundwater records and reduce uncertainty. The imputed data are then fed into a multimodal predictive model that combines neural networks and an XGBoost regressor to estimate phosphorus levels from related co-occurring contaminants. The model identifies potential phosphorus pollution hotspots, visualized through an interactive map, where remediation materials and technologies can be deployed effectively. The framework is scalable to national datasets and can predict phosphorus concentrations in unsampled regions using statistically reconstructed and AI-augmented inputs.
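The impute-then-predict workflow described above can be illustrated with a deliberately small sketch. It is not the Hub's pipeline: median imputation stands in for the multiple imputation algorithms, and a one-variable least-squares fit stands in for the neural network and XGBoost ensemble. The field names `co_contaminant` and `phosphorus` and all values are hypothetical placeholders.

```python
from statistics import fmean, median

# Hypothetical groundwater records; None marks a missing measurement.
records = [
    {"co_contaminant": 1.0, "phosphorus": 0.10},
    {"co_contaminant": 2.0, "phosphorus": 0.20},
    {"co_contaminant": None, "phosphorus": 0.30},
    {"co_contaminant": 4.0, "phosphorus": 0.40},
    {"co_contaminant": 5.0, "phosphorus": None},  # unsampled for phosphorus
]


def impute_median(rows, column):
    """Return copies of rows with missing `column` values set to the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def fit_linear(rows, feature, target):
    """Least-squares line fitted on rows where the target was measured."""
    labeled = [r for r in rows if r[target] is not None]
    xs = [r[feature] for r in labeled]
    ys = [r[target] for r in labeled]
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Step 1: reconstruct the incomplete feature column.
imputed = impute_median(records, "co_contaminant")
# Step 2: fit on records with measured phosphorus, then predict the unsampled one.
predict = fit_linear(imputed, "co_contaminant", "phosphorus")
estimate = predict(imputed[-1]["co_contaminant"])
```

The point of the sketch is the ordering: features are imputed first, and only records with a measured target are used for fitting, so predictions for unsampled locations never feed back into training.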
Complementing this environmental modeling effort, the Hub hosts a phosphorus materials database containing over one million phosphorus-containing compounds curated from PubChem, ChemSpider, and agricultural chemistry repositories. Each compound is represented by 39 key descriptors capturing physicochemical, structural, and reactivity parameters. Machine learning models trained on this database can predict the adsorption behavior of candidate sorbent materials, such as biochar, zeolites, and metal oxide composites, under different aqueous conditions by linking the molecular properties of phosphorus species with the surface features of the sorbents. This capability will allow rapid screening and ranking of materials for phosphorus removal, recovery, and recycling to support sustainable water treatment and circular nutrient management. In parallel, the Hub extends its AI capabilities to understanding and predicting the persistence of per- and polyfluoroalkyl substances (PFAS), which are difficult to remove from the environment. A standardized molecular library of PFAS compounds is being developed using cheminformatics workflows to clean and organize molecular structures. A rule-based ontology links molecular features, such as chain length, functional groups, and hydrophobicity, to environmental behaviors, such as persistence, mobility, and treatability. This system is being enhanced with machine learning models that predict PFAS degradation kinetics and identify optimal treatment methods, including adsorption, photolysis, and advanced oxidation processes. Together, these initiatives illustrate how the Phosphorus Knowledge Hub serves as a flexible, AI-enabled research ecosystem that connects molecular data, environmental models, and large-scale literature. The Hub provides a practical framework for integrating sustainability insights to inform policy, guide technology development, and advance global phosphorus management.
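The idea of a rule-based mapping from PFAS molecular features to environmental behaviors can be sketched as follows. This is an illustration only, not the Hub's ontology: the `PFAS` class, the `classify` function, the chain-length cutoff, and the qualitative labels are all hypothetical placeholders chosen to show the structure of such rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PFAS:
    name: str
    chain_length: int       # total number of carbon atoms
    functional_group: str   # e.g. "carboxylate" or "sulfonate"


# Placeholder cutoff for illustration; a real ontology would define this
# per compound class and cite its source.
LONG_CHAIN_MIN_CARBONS = 8


def classify(compound):
    """Apply simple illustrative rules linking features to behavior labels."""
    long_chain = compound.chain_length >= LONG_CHAIN_MIN_CARBONS
    return {
        "chain_class": "long" if long_chain else "short",
        # Longer chains are treated here as more hydrophobic and less mobile.
        "mobility": "lower" if long_chain else "higher",
        "adsorption_affinity": "higher" if long_chain else "lower",
    }


library = [
    PFAS("PFOA", 8, "carboxylate"),
    PFAS("PFBA", 4, "carboxylate"),
]
labels = {c.name: classify(c) for c in library}
```

Keeping the rules in one small function makes each inference traceable to a named feature, which is the property that a learned model for degradation kinetics would later be compared against.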