ETL Package: Data Integration from CSV, Excel, and PostgreSQL to a Single Database
Overview
This project is an Extract, Transform, Load (ETL) package designed to merge data from various sources, including CSV files, Excel spreadsheets, and a PostgreSQL database, into a single destination database. The ETL process involves cleaning, transforming, and loading data to facilitate efficient analysis and reporting.
Features
- Extracts data from CSV files, Excel spreadsheets, and a PostgreSQL database.
- Applies customizable transformations to clean and format the data.
- Loads the transformed data into a single destination database.
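The three stages listed above can be sketched end to end with the standard library alone. This is a minimal illustration, not the code in this package: the column names (`product`, `quantity`, `price`), the `sales` table, and the in-memory SQLite destination are all assumptions chosen so the example runs without external dependencies.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; the real files may use different column names.
RAW_CSV = """product,quantity,price
 USB-C Cable ,2,11.95
,,
Monitor,1,149.99
"""


def extract(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop blank rows, trim whitespace, and cast numeric fields."""
    cleaned = []
    for row in rows:
        product = (row.get("product") or "").strip()
        if not product:
            continue  # skip rows with no product name
        cleaned.append((product, int(row["quantity"]), float(row["price"])))
    return cleaned


def load(rows, conn):
    """Insert the cleaned rows into a destination table (assumed name: sales)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(product TEXT, quantity INTEGER, price REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


# In-memory SQLite stands in for the real destination database.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Keeping each stage as its own function means a new source only needs a new `extract` variant, while `transform` and `load` stay unchanged.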
Usage
Run the ETL pipeline: python PIPELINE_EXEC.py
The data set is available at: DATA_SOURCE
The original CSV files are split across three source types to demonstrate extraction from different kinds of sources:
- January to April sales as an Excel file (Sales_January_April_2019)
- May to August sales as CSV files (sales_csv folder)
- September to December sales in the PostgreSQL database (sales_db)
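As a rough sketch of how rows from the three sources might be combined before loading, the example below tags each row with its origin and concatenates the results. The `merge_sources` helper, the source labels, and the sample rows are hypothetical; in the real pipeline each list would come from the corresponding Excel, CSV, or PostgreSQL extractor.

```python
from itertools import chain


def merge_sources(sources):
    """Combine rows from several extractors, tagging each row with its origin."""
    tagged = (
        [dict(row, source=name) for row in rows]
        for name, rows in sources.items()
    )
    return list(chain.from_iterable(tagged))


# Hypothetical rows standing in for the output of each extractor.
sources = {
    "excel": [{"month": "January", "total": 100.0}],
    "csv": [{"month": "May", "total": 250.0}],
    "postgres": [{"month": "September", "total": 175.0}],
}
merged = merge_sources(sources)
```

Recording the origin on every row makes it easier to trace a bad record back to the file or table it came from once everything sits in one destination table.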