Position Summary:
The Senior AI Governance Scientist will lead comprehensive red team and stress-testing exercises to identify vulnerabilities, failure modes, and safety risks across AI systems, including large language models, generative models, and autonomous agents. The role involves designing, implementing, and refining evaluation methodologies and protocols to assess AI performance, safety, reliability, and alignment with intended use cases, as well as auditing existing AI evaluations for gaps in coverage or rigor and recommending targeted improvements.
Key duties and responsibilities:
· Executes comprehensive red team and stress-testing exercises to identify vulnerabilities, failure modes, and safety risks across AI systems, including large language models, generative models, and autonomous agents.
· Designs, implements, and refines evaluation methodologies and protocols to assess AI performance, safety, reliability, and alignment with intended use cases.
· Evaluates the adequacy and sufficiency of existing AI evaluations, identifies gaps in coverage or rigor, and recommends targeted improvements.
· Designs and conducts reproducible experiments to measure AI value, impact, and risk, applying statistical methods and causal inference techniques where appropriate.
· Develops and maintains automated testing frameworks and evaluation pipelines that scale across the organization’s AI portfolio.
· Researches and applies novel attack vectors and stress-testing approaches for generative AI (e.g., prompt injection, jailbreaking, hallucination risks) and agentic systems (e.g., autonomy boundary violations, goal misalignment).
· Creates and curates benchmarks, datasets, and metrics aligned to specific AI capabilities, risk profiles, and governance requirements.
· Documents evaluation methodologies, findings, and recommendations in clear, decision-ready technical reports for review by governance bodies and cross-functional stakeholders.
· Partners with product, engineering, and research teams to integrate evaluation and assurance practices into AI design, development, and deployment workflows.
Qualifications:
- Exceptional written and verbal communication skills — must be able to produce crystal-clear documentation and confidently explain complex AI, governance, and risk concepts to senior leadership in a concise and structured way. Writing quality is a key differentiator.
- Experience performing manual penetration tests and/or red team assessments of applications and cloud infrastructure.
- Strong understanding of AI technologies (agentic AI, traditional machine learning) and associated model risks, with the ability to evaluate how vendors use AI without needing to build models themselves.
- Experience with AI governance, security, or responsible AI frameworks within enterprise environments (ideally cloud environments such as AWS, Azure, or GCP).
- Background in third-party/vendor risk management — ability to assess new vendors, ask the right technical and governance questions, and help integrate AI review into existing vendor evaluation processes.
- Ability to interpret AI regulations, policies, and emerging legislation and translate them into practical governance processes and clear guidance for the business.


