Job Description
Position: Big Data - Spark Developer
Location: Plano, TX, or Wilmington, DE
Job Length: Long term
Position Type: C2C/W2
We are seeking a skilled Spark Developer to join our dynamic team of data engineers and analysts. As a Spark Developer, you will play a key role in designing, developing, and optimizing data processing pipelines using Apache Spark. You will work closely with cross-functional teams to understand business requirements and translate them into efficient and scalable Spark applications. Your expertise will contribute to the organization's data-driven decision-making capabilities and drive innovative solutions.
Responsibilities:
- Design, develop, and maintain data processing pipelines using Apache Spark.
- Collaborate with data engineers, data scientists, and business analysts to understand data requirements and deliver solutions that meet business needs.
- Write efficient Spark code to process, transform, and analyze large datasets.
- Optimize Spark jobs for performance, scalability, and resource utilization.
- Integrate Python, AWS services (such as S3 and EMR), Redshift, Snowflake, Hadoop, Hive, Spring, Hibernate, Cassandra, Kafka, and ETL processes into Spark applications.
- Troubleshoot and resolve issues related to data pipelines and Spark applications.
- Monitor and manage Spark clusters to ensure high availability and reliability.
- Implement data quality and validation processes to ensure accuracy and consistency of data.
- Stay up to date with industry trends and best practices related to Spark, big data technologies, Python, and AWS services.
- Document technical designs, processes, and procedures related to Spark development.
- Provide technical guidance and mentorship to junior developers on Spark-related projects.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Proven experience (10+ years) as a Spark Developer or in a similar role working with big data technologies.
- Strong proficiency in Apache Spark, including Spark SQL, Spark Streaming, and Spark MLlib.
- Proficiency in programming languages such as Scala or Python for Spark development.
- Experience with data processing and ETL concepts, data warehousing, and data modeling.
- Solid understanding of distributed computing principles and cluster management.
- Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and containerization (e.g., Docker, Kubernetes) is a plus.
- Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.
- Strong communication skills to effectively interact with technical and non-technical stakeholders.
- Experience with version control systems (e.g., Git) and agile development methodologies.
- Certifications in Spark or related technologies are a plus.
Advanced skills: Python, AWS, Redshift, Snowflake, Hadoop, Hive, Spring, Hibernate, Cassandra, Kafka, ETL