Job Postings ETL Pipeline
Python
dbt
PostgreSQL
Docker
Git
ChatGPT
Cursor
Overview
A modular ETL pipeline that ingests job postings from several third-party APIs, normalizes them, enriches them with extracted skills, ranks them, and loads the results into the final tables.
Key Features
• Modular ETL – separate microservices for extract, normalize, enrich, rank, and publish
• Skills extraction – spaCy + YAML keyword rules
• Configurable ranking – YAML weights, explainable scores
• dbt modeling – raw data is kept in the bronze layer; all transformations and cleanup happen in the silver layer; only final business logic is applied in the gold layer
• Airflow – the whole pipeline is orchestrated with Apache Airflow
• Docker – the ETL stages are packaged in separate images
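The enrich and rank steps described above can be illustrated with a minimal sketch. It is not the project's code: to stay dependency-free it swaps spaCy for whole-word regex matching and replaces the YAML files with in-memory dictionaries of the shape such files might load into. The names `SKILL_RULES`, `WEIGHTS`, `Posting`, `extract_skills`, `score`, and `rank`, as well as the two features being scored, are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical keyword rules (assumed to live in a YAML file in the real project).
SKILL_RULES = {
    "python": ["python"],
    "sql": ["sql", "postgresql", "postgres"],
    "airflow": ["airflow"],
}

# Hypothetical ranking weights (also assumed to come from YAML).
WEIGHTS = {"skill_match": 0.7, "remote": 0.3}


@dataclass
class Posting:
    title: str
    description: str
    remote: bool = False
    skills: list[str] = field(default_factory=list)


def extract_skills(text: str, rules: dict[str, list[str]]) -> list[str]:
    """Return the sorted skills whose keywords occur as whole words in text."""
    lowered = text.lower()
    found = set()
    for skill, keywords in rules.items():
        if any(re.search(rf"\b{re.escape(kw)}\b", lowered) for kw in keywords):
            found.add(skill)
    return sorted(found)


def score(posting: Posting, wanted: set[str], weights: dict[str, float]):
    """Return (total, breakdown); the breakdown makes the score explainable."""
    matched = len(set(posting.skills) & wanted)
    features = {
        "skill_match": matched / len(wanted) if wanted else 0.0,
        "remote": 1.0 if posting.remote else 0.0,
    }
    breakdown = {name: weights.get(name, 0.0) * value
                 for name, value in features.items()}
    return sum(breakdown.values()), breakdown


def rank(postings: list[Posting], wanted: set[str], weights: dict[str, float]):
    """Return (posting, total, breakdown) tuples, highest total first."""
    scored = [(p, *score(p, wanted, weights)) for p in postings]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    postings = [
        Posting("Analyst", "Write SQL reports."),
        Posting("Data Engineer", "Build Airflow DAGs in Python on PostgreSQL.",
                remote=True),
    ]
    for p in postings:
        p.skills = extract_skills(p.description, SKILL_RULES)
    for p, total, breakdown in rank(postings, {"python", "sql", "airflow"}, WEIGHTS):
        print(p.title, round(total, 3), breakdown)
```

Returning the per-feature breakdown alongside the total is one simple way to get explainable scores: changing a weight changes exactly one term, so the effect of a configuration change is easy to trace.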