ETL Package: Data Integration from CSV, Excel, and PostgreSQL to a Single Database

Overview

This project is an Extract, Transform, Load (ETL) package designed to merge data from various sources, including CSV files, Excel spreadsheets, and a PostgreSQL database, into a single destination database. The ETL process involves cleaning, transforming, and loading data to facilitate efficient analysis and reporting.

Features

  • Extracts data from CSV files, Excel spreadsheets, and a PostgreSQL database.
  • Applies customizable transformations to clean and format the data.
  • Loads the transformed data into a single destination database.
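
The three stages above can be sketched in a few lines. This is a minimal illustration and not the code in this package: the column names, the sample rows, and the extract, transform, and load function names are all assumptions, and an in-memory SQLite database stands in for the destination database.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; the real column names are not given in this README.
SAMPLE_CSV = """order_id,product,quantity,price
1, USB-C Cable ,2,11.95
2,Wired Headphones,,11.99
3,AA Batteries,1,3.84
"""


def extract(csv_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: drop incomplete rows, trim text, and cast numeric fields."""
    cleaned = []
    for row in rows:
        if any(value is None or value.strip() == "" for value in row.values()):
            continue  # skip rows with a missing value
        cleaned.append((
            int(row["order_id"]),
            row["product"].strip(),
            int(row["quantity"]),
            float(row["price"]),
        ))
    return cleaned


def load(records, connection):
    """Load: write the cleaned records into the destination table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER, product TEXT, quantity INTEGER, price REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    connection.commit()


# SQLite in memory is an assumption standing in for the real destination.
connection = sqlite3.connect(":memory:")
load(transform(extract(SAMPLE_CSV)), connection)
loaded = connection.execute(
    "SELECT order_id, product, quantity, price FROM sales ORDER BY order_id"
).fetchall()
print(loaded)
```

In this sketch the row with the empty quantity is dropped during the transform step, so only two of the three sample rows reach the destination table.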

Usage

Run the ETL pipeline: python PIPELINE_EXEC.py


Dataset source: DATA_SOURCE

The original CSV files are split across three source types to demonstrate extraction from each kind of source. The raw sources are listed below:

raw_sales_excel

  • January to April sales as an Excel file (Sales_January_April_2019)

raw_sales_csv

  • May to August sales as CSV files (sales_csv folder)

raw_sales_database

  • September to December sales as a database (sales_db)
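
The split above can be expressed as a mapping from each raw source to the months it holds, which makes it easy to check that the three sources together cover the full year. This is a sketch under stated assumptions: SOURCE_MONTHS, extract_source, and extract_all are hypothetical names, and the extractor yields stub rows where a real one would read the Excel file, the CSV folder, or the database.

```python
from itertools import chain

# Hypothetical mapping of each raw source to its months (1 = January).
SOURCE_MONTHS = {
    "raw_sales_excel": range(1, 5),      # January to April
    "raw_sales_csv": range(5, 9),        # May to August
    "raw_sales_database": range(9, 13),  # September to December
}


def extract_source(source_name, months):
    """Placeholder extractor: yields one stub row per month.

    A real extractor would read the actual file or table instead.
    """
    for month in months:
        yield {"source": source_name, "month": month}


def extract_all(source_months):
    """Chain the rows from every source into a single list."""
    return list(chain.from_iterable(
        extract_source(name, months) for name, months in source_months.items()
    ))


rows = extract_all(SOURCE_MONTHS)
covered_months = sorted(row["month"] for row in rows)

# Every month of the year should appear exactly once across the sources.
print(covered_months == list(range(1, 13)))
```

Keeping the month ranges in one place means a gap or overlap between sources shows up as a failed coverage check before any data is loaded.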