Performance Engineer, Inference Systems

Anthropic (Generative AI company)

San Francisco, United States | Mid

Data & AI

About the role

Investigate and improve inference fleet performance and correctness across hardware and serving layers.

•Drive cross-layer performance and correctness for Anthropic's inference fleet, analyzing throughput, latency, reliability, and correctness across hardware and serving stacks.
Key Responsibilities

•Run cross-layer performance investigations and roofline analysis to identify root causes and estimate the value of fixes
•Own and improve correctness evaluation pipelines and investigate regressions
•Build observability, dashboards, and modeling tools for performance and cost trade-offs
•Partner with kernel, serving, routing, autoscaling, and capacity teams to implement optimizations

Requirements

•Hands-on performance engineering: profiling, latency/throughput optimization, root-cause analysis
•Proficiency in Python and ability to work in large production codebases
•Data analysis skills (SQL, pandas) to turn telemetry into findings
•Strong communication of quantitative results and interest in numerical correctness

Tech: Python, SQL, pandas

Level: Mid

Location: San Francisco, United States