Senior Site Reliability Engineer
Scopely
Mobile Gaming company
Mexico City, Mexico
Senior
Software Engineering
About the role
Build observability, automation, and reliability tooling for AI platform production systems.
- Senior SRE on Scopely's Gen AI team, focused on observability, automation, and runtime reliability for AI platforms and internal agentic systems in production.
Key Responsibilities
- Design and operate observability layers for AI platforms (metrics, logs, traces).
- Build automated findings-to-fix remediation workflows and runtime controls.
- Implement reliability controls: alerting, health checks, rollback drills, rate limiting.
- Codify detections, policies, and operational checks as code.
Requirements
- 5+ years in SRE, production engineering, platform operations, or security automation.
- Strong scripting/coding experience, especially Python, and API/log pipeline work.
- Experience building observability and alerting systems in AWS or a comparable cloud.
- Familiarity with infrastructure-as-code tools such as Terraform or Pulumi, and with CI/CD integration.
Tech stack
Python, Terraform, Pulumi, AWS, CI/CD, IAM