Job title: Lead Data Engineer in Los Angeles, CA at WorkHQ
Company: WorkHQ
Job description:

Company Context
Series A, well-funded US startup in HRTech developing an AI Recruiter product. This is a US-only remote role (mainland).

Role Overview
Lead data infrastructure architect managing billions of data points across 250M+ professional profiles. You will hire data engineers to support you in that work.

Core Responsibilities
- Design scalable data pipelines processing massive record volumes
- Architect ETL processes using PySpark on Amazon EMR (open to shifting to other solutions such as Databricks or Snowflake)
- Distribute enriched data through a medallion architecture across Postgres, Athena, and OpenSearch
- Integrate new data sources into the main pipeline
- Implement advanced data matching using Splink

Technical Requirements
- 5-8 years of professional data engineering experience
- Proficiency in:
  - PySpark and distributed computing
  - AWS data services (EMR, Glue, Athena)
  - Docker
  - Pandas and DataFrame manipulation
  - Complex data format handling (JSONL, Parquet)
- Strong background in:
  - Big data processing architectures
  - Data warehouse design
  - Performance optimization
- Advanced Python and SQL skills

Nice to Have
- Probabilistic record linking expertise
- OpenSearch/Elasticsearch technologies
- Machine learning data pipeline design
- Recruitment tech ecosystem knowledge

Technical Stack
- Big Data: PySpark, EMR
- Databases: Postgres, OpenSearch
- Cloud: AWS
- Containerization: Docker
- Data Formats: JSONL, Parquet
- Analytics: Metabase, Athena, Glue
- Data Processing: Pandas, Splink

Other Considerations
While this role has specific requirements, if you lack a few technical skills but are motivated to learn and lead the platform, please apply for consideration. If you are coming from a Director, Head of, or VP level that is relevant to this job, you can apply as well. You will need to apply directly on our platform. Thank you for your time.

Expected salary: $140,000 - $180,000 per year
Location: Los Angeles, CA