Meta is seeking a Software Engineer to join our team. The ideal candidate has experience maximizing the performance of AI models on GPUs or custom silicon. This role involves applying these skills to solve some of the most crucial and exciting problems that exist on the web. The AI Applications Engineering team is dedicated to maximizing training and inference performance of Generative AI (GenAI) and recommendation models on Meta's Training and Inference Accelerator (MTIA). We employ innovative optimization and parallelization strategies to maximize training throughput for the next generations of GenAI and recommendation models. Additionally, we work cross-functionally with many partner teams to ensure end-to-end performance of large-scale pre-training and inference, enabling us to deliver the next generation of AI experiences more quickly to our users.
Software Engineer Responsibilities:
Work cross-functionally to co-design models to maximize pre-training and inference efficiency
Apply and drive state-of-the-art optimization techniques for our latest large-scale AI workloads running on Meta’s fleet of accelerators, including functional development and maintenance
Profile, analyze, debug, and optimize large-scale workloads on our next-generation training superclusters
Optimize the whole vertical stack, from kernels, framework, communication, and firmware to layers and hyperparameters
Set direction and goals for the team related to project impact, capacity, and developer efficiency
Lead large and complex technical efforts across many engineers and teams from zero to one
Minimum Qualifications:
Bachelor’s degree in computer science or a related STEM field
Experience programming AI accelerators (e.g., GPUs or custom silicon) using AI frameworks such as PyTorch or similar
Experience developing custom kernels and compiler infrastructure to improve performance using low-level programming models such as CUDA, OpenCL or similar
At least 6 years of experience developing and optimizing performance in modern C/C++
Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Preferred Qualifications:
Experience training and validating large-scale AI models, including parallelizing models across several accelerators
Understanding of multiprocessing, including race conditions and communications between processes
Experience evaluating model performance, e.g., using profilers and tuning hyperparameters
Thorough understanding of model and data parallelism techniques such as FSDP, tensor parallelism, model parallelism, and expert parallelism
Demonstrated experience with the model life cycle, from pre-training and post-training to inference, including dataset splits and shuffling and metrics, especially for large language models
Experience developing, optimizing, and validating kernels on GPUs or other accelerators
About Meta:
Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today—beyond the constraints of screens, the limits of distance, and even the rules of physics.
Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.