A. Nolan, R. Hubbard
Keywords: Machine Learning, Synthetics
Summary: Characterizing the performance of an ML algorithm as a function of the statistical similarity between the training and test datasets is vital to understanding the particular sensitivities of the algorithm and how those sensitivities can be overcome. Exploring this space is valuable because deployed systems rarely encounter the same environments the algorithms were trained on, owing to the impracticality of collecting the volume of multi-condition measured data that would be necessary to train a truly robust algorithm. Synthetic data has proven to be a relatively inexpensive and timely alternative, both for supplementing measured training data and for generating enough data to run experiments that scope what data would be needed to improve an algorithm's performance. To understand these sensitivities, four studies were defined in which the training set was supplemented with instances of a particular extended operating condition (EOC): an articulation state study, a depression angle study, a squint angle study, and a background noise study. These studies represent the general variability seen under extended operating conditions and the factors that may cause an ML algorithm to perform poorly. Synthetic data was used to represent the SAR imagery both in its benign state and under the extended operating conditions, since synthetic data allows precise control of the variables in the data. Leveraging the synthetic data, nine unique datasets were created for training and testing the ML algorithm; the exact breakdowns and statistical similarities of these datasets were calculated using the Bhattacharyya coefficient. Algorithm performance metrics were computed using a Python version of AFRL's test harness and are presented as ROC curves of PDEC vs. PFA and confusion matrices at a given PDEC.
The final results show that understanding the statistical similarity between training and test operating conditions is a useful tool for training optimization and performance understanding, and that synthetic data readily enables sensitivity explorations across multiple modalities.
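The Bhattacharyya coefficient referenced above measures the overlap between two discrete probability distributions: BC(p, q) = sum_i sqrt(p_i * q_i), which equals 1 for identical distributions and 0 for non-overlapping ones. The abstract does not specify how the dataset features were binned into distributions, so the following is only a minimal sketch of the coefficient itself, assuming histogram-style inputs:

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """Bhattacharyya coefficient between two discrete distributions.

    BC(p, q) = sum_i sqrt(p_i * q_i).
    Returns 1.0 for identical distributions, 0.0 for disjoint ones.
    Inputs may be unnormalized histograms; they are normalized here.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize each histogram so it sums to 1 (a valid distribution).
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))

# Identical histograms overlap completely (coefficient ~ 1.0),
# while disjoint histograms have zero overlap (coefficient 0.0).
print(bhattacharyya_coefficient([1, 2, 3], [1, 2, 3]))
print(bhattacharyya_coefficient([1, 0, 0], [0, 0, 1]))
```

In the study described here, a coefficient near 1 between the training and test sets for a given EOC would indicate similar operating conditions, while a low coefficient flags the statistical mismatch that may degrade performance.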