Who are you?
You are a seasoned Data Engineer with a deep understanding of data modeling, massively parallel processing (in both real-time and batch modes), and bringing machine learning capabilities into large-scale production systems. You have experience at a cutting-edge startup and are passionate about building the data infrastructure that fuels the world’s first intelligent agent. You are a team player with excellent collaboration and communication skills and a “can do” approach.
What you’ll be doing
You will contribute your extensive experience building large-scale, data-intensive systems in both real-time and offline scenarios.
What should you have?
- 3+ years of experience building massively parallel processing solutions using technologies such as Spark or Presto
- 2+ years of experience developing real-time stream processing solutions using Apache Kafka or Amazon Kinesis
- 2+ years of experience building infrastructure that brings machine learning capabilities to production, using platforms such as Kubeflow, Amazon SageMaker, or Vertex AI
- Demonstrated experience orchestrating containerized applications on AWS (EKS) and GCP (GKE)
- 3+ years of experience writing production-grade Python code and working with both relational and non-relational databases
- 2+ years of experience designing and administering cloud-based data warehousing solutions such as Snowflake or Amazon Redshift
- 2+ years of experience working with unstructured data, complex data sets, and data modeling