Carnegie Mellon University,
Keywords: catalysts, machine learning
Summary: The ability to forge difficult chemical bonds through catalysis has transformed society on all fronts, from feeding our ever-growing population to extending life expectancy through the synthesis of new drugs. The rise of metal-catalyzed cross-coupling reactions has not only made existing processes more efficient but has also allowed us to synthesize novel, unexplored molecules and materials, unlocking the technologies of the future. In metal catalysis, the choice of ligand often has the greatest impact on reaction outcomes such as yield or product selectivity. Identifying optimal metal-ligand combinations can be a laborious experimental process, and it is often held back by the difficulty of meaningfully comparing results obtained with different ligands. I will introduce our efforts to develop a platform for the inverse design of catalysts that combines high-throughput virtual screening (HTVS) and machine learning (ML) with an extensive ligand database. One of the most widely used ML strategies in metal catalysis is to build models on parameterized reactants, that is, on numerical descriptors computed for each reaction component. Because the space of potential ligands is immense, parameterizing it this way can be very challenging. We have developed a platform aimed at data-driven ligand optimization and inverse design. At the center of this strategy is Kraken, a database of descriptors for organophosphorus ligands that captures the features most important for catalysis, including conformational flexibility and the capacity for both coordinative and non-covalent bonding. We characterize this chemical space systematically and demonstrate how these descriptors can be estimated for millions of ligands using ML, thereby nearly completely mapping out an immensely relevant region of ligand space.
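The abstract does not say which model is used to estimate descriptors at scale. Purely as an illustration of the general idea (compute a descriptor for a small subset, learn a cheap surrogate, then predict and rank a much larger set), here is a minimal sketch using closed-form ridge regression in NumPy. Everything in it is an assumption: the 16 "cheap features" per ligand, the synthetic data standing in for computed descriptor values, and the helper names `fit_ridge` and `predict` are hypothetical and are not Kraken's method or data.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression without an intercept: w = (X^T X + alpha*I)^-1 X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


def predict(X, w):
    """Linear surrogate prediction of the descriptor for each row (ligand) of X."""
    return X @ w


rng = np.random.default_rng(0)

# Hypothetical setup: a small "expensive" training set with a computed descriptor,
# and a much larger pool of ligands that only have cheap features.
n_train, n_screen, n_feat = 200, 10_000, 16
true_w = rng.normal(size=n_feat)

X_train = rng.normal(size=(n_train, n_feat))
# Synthetic stand-in for a descriptor obtained from expensive calculations.
y_train = X_train @ true_w + 0.05 * rng.normal(size=n_train)

w = fit_ridge(X_train, y_train, alpha=0.1)

# Estimate the descriptor for the large pool and keep the 5 highest-scoring candidates.
X_screen = rng.normal(size=(n_screen, n_feat))
y_screen = predict(X_screen, w)
top = np.argsort(y_screen)[::-1][:5]
```

A linear surrogate is chosen here only because it is short and transparent; a real descriptor-prediction model would more likely be nonlinear and trained on chemically meaningful representations.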