Research | Trusted AI Group

Our research aims to develop autonomous AI systems that can truly be trusted: systems that are safe, aligned with human norms, and understandable to the people who develop, deploy, and rely on them. To achieve this, we develop novel approaches that combine machine learning, formal methods, and explainable AI.

Keywords: safe reinforcement learning · shielding · fairness in AI · testing & repair of RL · formal explainability

Shielding

Reinforcement learning agents learn by trial and error, a process that is fundamentally at odds with safety-critical settings where mistakes carry real costs. We develop shielding approaches that wrap a reinforcement learning agent with a formally verified safety component. A shield enforces a safety specification on the agent’s behavior, providing correctness guarantees throughout both training and deployment.

Our work spans the theoretical foundations of shielding, probabilistic and stochastic shields, online and adaptive shielding under uncertainty, shielding under delayed observation, and practical tooling including the TEMPEST synthesis framework. Our work spans the theoretical foundations of shielding under different environment and observation models, as well as a practical implementation in our synthesis tool TEMPEST.

Related: Publications →

Fairness

Safety is not the only correctness requirement for AI systems operating in the real world, fairness is equally critical. We extend the shielding paradigm to fairness constraints, developing formal mechanisms that enforce statistical fairness properties on decision-making agents and algorithmic systems. Our work on fairness shields provides provable guarantees that an agent’s decisions remain free of prohibited biases, without requiring retraining or access to the agent’s internals.

Related: Publications →

Testing & Repair of Neural Policies

Before deploying a reinforcement learning agent, we need confidence that it behaves correctly. Standard testing methods are poorly suited to the continuous, high-dimensional state spaces that modern deep reinforcement learning agents inhabit. We develop formal testing and repair techniques tailored to reinforcement learning: search-based test generation, importance-driven testing that focuses effort on states where decisions matter most, fuzz-testing combined with automatic policy repair, and environment model learning for systematic coverage.

Related: Publications →

Explainability & Accountability

Trustworthy AI requires not only correct behavior but also the ability to explain and justify individual decisions. We study formal approaches to explainability grounded in logic and synthesis rather than heuristic attribution. Our work includes syntax-guided synthesis of human-readable explanations, SMT-based oracles for investigating autonomous agent decisions, formal explanations of the admissible actions computed by shields, and tools for legal accountability in automated decision making.

Related: Publications →

Safe Autonomous Construction of Molecular Assemblies

We apply our methods to enable autonomous assembly at the nanoscale. Using a scanning tunneling microscope as both sensor and actuator, we apply our methods to plan and execute sequences of atomic-precision manipulations to arrange individual molecules into prescribed target structures, without human intervention. The core challenge is not just control, but verified control: physical constraints are hard, exploration is costly, and errors at this scale cannot simply be rolled back. We develop fully autonomous frameworks that couple real-time measurement and execution on physical hardware with simulation environments for offline training of reinforcement learning agents.

Related: Publications →