Who are you?
We are seeking individuals with extensive experience building infrastructure that enables training, fine-tuning, and serving billion-parameter-scale deep learning models, especially in the NLP domain using PyTorch and the Hugging Face ecosystem. You are a passionate and driven individual who strives to be their best every day.
What you’ll be doing
As a Deep Learning Engineer on the team, you’ll have the opportunity to build the infrastructure for in-house LLMs and other in-house deep learning models.
What should you have?
- 2+ years of experience working with large-scale PyTorch-based deep learning applications on GPUs (using CUDA) and TPUs in multi-node, multi-GPU scenarios
- 2+ years of experience building training and fine-tuning pipelines for large language models using distributed training approaches, covering both model and data parallelism
- 2+ years of experience building serving APIs for sub-second latency inference of large language models using various optimization techniques
- Extensive experience with PyTorch, PyTorch Lightning, DeepSpeed, Megatron-LM, JAX/Flax, and the Hugging Face ecosystem
- 1+ years of experience working with ML lifecycle solutions such as Kubeflow, AWS SageMaker, or Vertex AI to bring machine learning solutions from research to production