University at Buffalo, SUNY
Keywords: data science, informatics
Summary: Trial-and-error approaches are increasingly ill-equipped to meet the complex challenges involved in the discovery and design of next-generation chemistry and materials. Our work recognizes the great opportunities arising from the shift towards data-driven in silico research and rational design. These approaches are poised to mitigate the inefficiencies, shortcomings, and limitations of traditional trial-and-error research. However, the idea of applying modern data science in the chemistry context is so recent that much of the basic infrastructure has not yet been developed or is still in its infancy. The existing tools and expertise tend to be in-house, specialized, or otherwise unavailable to the community at large. In practice, data science is thus beyond the reach of most researchers in the field. Our work aims to chart new paths in this area and advance the data-driven in silico research paradigm by creating an open, general-purpose software ecosystem designed to overcome this situation, fill the prevalent infrastructure gap, and thus make data-driven research a viable and widely accessible proposition. The cyberinfrastructure we will present fuses in silico modeling (in particular computational quantum chemistry), high-throughput screening techniques, and big data analytics into an integrated research infrastructure. This work includes the development of methodological foundations, algorithms, protocols, and codes.