Performance Engineer, Inference Systems
Anthropic (Generative AI company)
San Francisco, United States
Level: Mid
Data & AI
About the role
Investigate and improve inference fleet performance and correctness across hardware and serving layers.
- Drive cross-layer performance and correctness for Anthropic's inference fleet, analyzing throughput, latency, reliability, and correctness across hardware and serving stacks.
Key Responsibilities
- Run cross-layer performance investigations and roofline analysis to find root causes and the value of fixes
- Own and improve correctness evaluation pipelines and investigate regressions
- Build observability, dashboards, and modeling tools for performance and cost trade-offs
- Partner with kernel, serving, routing, autoscaling, and capacity teams to implement optimizations
Requirements
- Hands-on performance engineering: profiling, latency/throughput optimization, root-cause analysis
- Proficiency in Python and ability to work in large production codebases
- Data analysis skills (SQL, pandas) to turn telemetry into findings
- Strong communication of quantitative results and interest in numerical correctness
Tech stack
Python, SQL, Pandas