Data Engineer
R&D
Full-Time
Tel Aviv

Who are you?

You are a seasoned Data Engineer with a deep understanding of data modeling, massively parallel processing (both real-time and batch), and bringing machine learning capabilities into large-scale production systems. You have experience at a cutting-edge startup and are passionate about building the data infrastructure that fuels the world’s first intelligent agent. You are a team player with excellent collaboration and communication skills, and a “can-do” approach.

What will you be doing?

  • Build, maintain, and scale data pipelines for both batch and real-time data processing across multiple sources and ecosystems.
  • Design and implement robust APIs and integrate diverse data systems to support data collection and aggregation.
  • Develop and manage advanced data architectures, including lakehouses, streamhouses, and data warehouses.
  • Collaborate with data scientists and other stakeholders to implement effective data solutions and integrate large language models (LLMs) into our systems.
  • Work with cross-functional teams to define business needs and translate them into technical implementations that leverage your deep understanding of data architectures and software engineering best practices.
  • Develop and lead initiatives to manage, monitor, and debug data systems, enhancing their reliability, efficiency, and overall quality.

What should you have?

  • 8+ years of experience in designing and managing sophisticated lakehouse and data warehouse architectures, ensuring scalable, efficient, and reliable data storage solutions.
  • 8+ years of experience building and maintaining ETLs using Apache Spark.
  • 5+ years of experience working with streaming technologies (e.g., Apache Kafka, Pub/Sub) and implementing real-time data pipelines using stream processing technologies (e.g., Spark Streaming, Cloud Functions).
  • 8+ years of experience with SQL and distributed query engines such as Presto and Trino, with a strong focus on analyzing and optimizing query plans to develop efficient and complex queries.
  • 5+ years of experience developing APIs using Python, with proficiency in asynchronous programming and task queues.
  • Proven expertise in deploying and managing Spark applications on enterprise-grade platforms such as Amazon EMR, Kubernetes (K8S), and Google Cloud Dataproc.
  • Solid understanding of distributed systems and experience with open table formats such as Apache Paimon and Apache Iceberg.
  • 5+ years of experience developing infrastructure that brings machine learning capabilities to production, using solutions such as Kubeflow, Amazon SageMaker, and Vertex AI.
  • 8+ years of experience writing production-grade Python code and working with both relational and non-relational databases.
  • Solid understanding of software engineering concepts, design patterns, and best practices, with the ability to architect solutions and integrate different system components.
  • Proven experience working with unstructured data, complex data sets, and data modeling.
  • Advantage – Demonstrated experience orchestrating containerized applications in AWS and GCP using EKS and GKE.
  • Advantage – Proficiency in Scala and Java.

Apply for this job

    By submitting your application you accept our Privacy Policy & Terms of Use.
