A Privacy-Preserving Cyberinfrastructure for Synthetic Transportation Video and Safety Analytics

S. Das, A. Rafe
Texas State University,
United States

Keywords: Sheaf Theory, Causal Identifiability, Multimodal AI, Algebraic Topology, Sensor Fusion

Summary:

Transportation agencies across the United States collect an immense volume of operational video from highway CCTV networks, traffic signal systems, and transit surveillance. These datasets record in detail how drivers, pedestrians, and cyclists interact, which is critical information for proactive safety management and multimodal planning, yet they remain largely inaccessible to the research community. The primary barrier is a "data stalemate" driven by strict privacy policies, liability concerns, and the risk of re-identifying individuals from raw footage. Current ad hoc solutions, such as blurring faces or masking pixel regions, often degrade the fidelity required for advanced computer vision tasks, which sharply reduces the footage's value for safety science.
To resolve this conflict between privacy and progress, we introduce PIVOT (Privacy-Preserving Infrastructure for Video Synthesis and Analysis in Open Transportation Safety). PIVOT is a novel cyberinfrastructure designed to transform sensitive operational video into privacy-protected, utility-preserving synthetic assets. Unlike traditional anonymization, PIVOT uses tracking-conditioned diffusion and flow-based generative models to synthesize entirely new video sequences and trajectories. These synthetic outputs retain the statistical and behavioral properties of the original traffic scenes, including complex interactions and physics, without containing any personally identifiable information (PII).
A core innovation of the PIVOT framework is its focus on tail-fidelity science. Standard generative models often regress toward the mean, erasing the rare, long-tail events that are most critical for safety analysis, such as near-miss vehicle conflicts or evacuation bottlenecks. PIVOT employs specialized stress tests and importance sampling during training to ensure that these high-risk, low-frequency events are accurately preserved in the synthetic data.
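As an illustration of the importance-sampling idea mentioned above, the following is a minimal sketch and not PIVOT's actual training procedure, which the summary does not specify. It assumes each training clip carries a boolean rare-event flag, oversamples flagged clips by a hypothetical `boost` factor, and attaches importance weights so that weighted statistics still reflect the original data distribution. The names `build_proposal`, `sample_batch`, `boost`, and the clip identifiers are all illustrative assumptions.

```python
import random

def build_proposal(is_rare, boost=10.0):
    """Return (q, w) for a corpus of clips.

    q[i] is the probability of drawing clip i when rare clips are
    oversampled by `boost`; w[i] = p[i] / q[i] is the importance weight
    that undoes the oversampling, with p uniform over the corpus.
    """
    n = len(is_rare)
    raw = [boost if rare else 1.0 for rare in is_rare]
    total = sum(raw)
    q = [x / total for x in raw]
    p = 1.0 / n
    w = [p / qi for qi in q]
    return q, w

def sample_batch(clip_ids, q, w, batch_size, rng):
    """Draw a training batch of (clip_id, importance_weight) pairs."""
    idx = rng.choices(range(len(clip_ids)), weights=q, k=batch_size)
    return [(clip_ids[i], w[i]) for i in idx]

# Toy corpus (assumed): 1000 clips, 10 of them flagged as near-miss events.
is_rare = [i % 100 == 0 for i in range(1000)]
clip_ids = [f"clip_{i:04d}" for i in range(1000)]

q, w = build_proposal(is_rare, boost=10.0)
batch = sample_batch(clip_ids, q, w, batch_size=32, rng=random.Random(0))

# Probability mass the sampler places on rare clips (vs. 1% in the raw data).
rare_mass = sum(qi for qi, rare in zip(q, is_rare) if rare)

# Importance-weighted rare fraction: recovers the true 1% share.
weighted_rare_fraction = sum(qi * wi for qi, wi, rare in zip(q, w, is_rare) if rare)
```

In this toy setting the sampler shows the model rare clips roughly nine times more often than uniform sampling would, while the per-sample weights allow a weighted loss to remain unbiased with respect to the original clip distribution.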
The infrastructure is governed by a Privacy-Utility Evaluation Harness, which acts as a certification engine for every dataset release. The harness co-reports formal differential privacy budgets (ε, δ) alongside task-specific fidelity metrics (e.g., Time-to-Collision accuracy, crowd pressure estimation). Data consumers can therefore verify that a synthetic dataset both carries a formally bounded privacy loss and is scientifically valid for downstream analytics.
To accommodate the diverse regulatory environments of Department of Transportation (DOT) agencies, PIVOT supports federated operation. The ingestion and synthesis pipelines can be deployed on-premises, ensuring that raw video never leaves the agency's secure network. Only the certified synthetic datasets and compliance reports are exported. These assets are then published in a FAIR (Findable, Accessible, Interoperable, Reusable) Registry, complete with Digital Object Identifiers (DOIs) and provenance tracking. By establishing a standardized, auditable marketplace for high-fidelity synthetic data, PIVOT unlocks a national resource for transportation researchers. It enables the development of next-generation conflict detection models and equitable infrastructure planning tools while safeguarding civil liberties and adhering to agency data governance policies.
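The co-reporting performed by the Privacy-Utility Evaluation Harness can be pictured with a small sketch. This is an assumed structure for illustration only: the summary does not describe the harness's interface, so `ReleaseReport`, `critical_rate`, the 1.5 s threshold, and the acceptance limits are hypothetical. Time-to-Collision is computed here with the standard car-following definition (gap divided by closing speed), and fidelity is measured as the difference in the share of critical interactions between real and synthetic data.

```python
import math
from dataclasses import dataclass

def time_to_collision(gap_m, follower_speed_mps, leader_speed_mps):
    """TTC in seconds for a follower/leader pair; inf if the gap is not closing."""
    closing = follower_speed_mps - leader_speed_mps
    return gap_m / closing if closing > 0 else math.inf

def critical_rate(ttcs, threshold_s=1.5):
    """Share of interactions whose TTC falls below the (assumed) threshold."""
    return sum(t < threshold_s for t in ttcs) / len(ttcs)

@dataclass(frozen=True)
class ReleaseReport:
    """Hypothetical per-release record pairing privacy budget with fidelity."""
    epsilon: float
    delta: float
    real_critical_rate: float
    synth_critical_rate: float

    @property
    def critical_rate_gap(self):
        return abs(self.real_critical_rate - self.synth_critical_rate)

    def passes(self, max_epsilon, max_delta, max_gap):
        """True only if both the privacy and the fidelity limits are met."""
        return (self.epsilon <= max_epsilon
                and self.delta <= max_delta
                and self.critical_rate_gap <= max_gap)

# Toy interactions (assumed): (gap in m, follower speed, leader speed in m/s).
real_pairs = [(10, 20, 10), (30, 15, 15), (12, 18, 12), (6, 14, 10)]
synth_pairs = [(9, 20, 11), (40, 12, 14), (15, 20, 10), (8, 16, 8)]

real_ttc = [time_to_collision(*p) for p in real_pairs]
synth_ttc = [time_to_collision(*p) for p in synth_pairs]

report = ReleaseReport(
    epsilon=1.0,
    delta=1e-6,
    real_critical_rate=critical_rate(real_ttc),
    synth_critical_rate=critical_rate(synth_ttc),
)
certified = report.passes(max_epsilon=2.0, max_delta=1e-5, max_gap=0.1)
```

In this toy case the privacy budget is within the assumed limits, but the synthetic data overstates the share of critical interactions, so the release would not be certified. Reporting both quantities side by side is what lets a consumer see which of the two requirements failed.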