Job Title: Senior Data Engineer
Location: USA (Onsite): New York City, NY; Salt Lake City, UT; Dallas, TX
Job Summary:
We are seeking a Senior Data Engineer to design and develop scalable, high-performance data pipelines focused on time series data. The ideal candidate will have deep expertise in time series data platforms and strong proficiency in integrating data pipelines with machine learning workflows. This role involves working with data scientists and ML engineers to build robust, efficient data systems that support real-time and batch analytics.
Key Responsibilities:
Design, build, and optimize data pipelines to process large-scale time series data.
Develop scalable infrastructure using tools like KDB+, TimeSet, or Kronos.
Implement real-time and historical data ingestion and transformation workflows.
Integrate data systems with Python-based ML pipelines to support model training and inference.
Collaborate with cross-functional teams to ensure data availability, integrity, and performance.
Design data models and schemas tailored for time series data, including strategies for downsampling, indexing, and aggregation.
Monitor and fine-tune systems for reliability, scalability, and performance.
Implement best practices in data governance, lineage tracking, and system observability.
Mentor junior engineers in large-scale system architecture and distributed processing.
Work closely with product and infrastructure teams to align technical solutions with business objectives.
Requirements:
5+ years of experience in data engineering, particularly in large-scale, high-throughput environments.
Proven experience with time series databases and storage solutions (e.g., KDB+, TimeSet, Kronos).
Strong background in building both streaming and batch data pipelines using tools like AWS Glue, Apache Kafka, Apache Flink, or Apache Spark.
Proficiency in Python and experience integrating with data and ML libraries such as pandas, NumPy, scikit-learn, and PyTorch.
Experience designing scalable, efficient data models and partitioning strategies for time series data.
Knowledge of distributed systems, parallel computing, and columnar data storage.
Familiarity with cloud-based data architectures (AWS, GCP, or Azure) and containerized environments.
Strong understanding of data quality, lineage, monitoring, and observability practices.
Excellent communication skills and the ability to work in a client-facing, collaborative environment.
Preferred Qualifications:
Experience with multiple time series systems (e.g., KDB+ and Kronos).
Contributions to open-source data infrastructure or time series projects.