ETL to QE, Update 6, Native SQL Decision
Date: 2023-10-10
See Discord Binding for project context
Issues and Evolution of the ETL Design
The same friend who recommended the psycopg2 optimizations in ETL to QE, Update 5, Optimizations mentioned that I was using Postgres like MongoDB when I should be using proper native SQL. In this update I will defend my initial decision to use JSON tables, then circle back to why that design decision is not as optimal as I first expected and why JSON Files to Native SQL is the way to go.
Please read ETL to QE, Update 1, SQLite to Postgres for additional context.
Initial Design
JSON Files -> SQLite JSON Tables
I initially chose SQLite for the following reasons:
- I had previously used it in my Keybase Binding
- No dependencies, SQLite is built into Python
- No need to worry about connecting to server
- Portable, database is a file
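To make the first hop of that pipeline concrete, here is a minimal sketch of loading raw JSON files into a single-column SQLite staging table. The directory, file, table, and column names are illustrative assumptions, not the project's actual schema.

```python
import json
import sqlite3
from pathlib import Path

# Sketch: dump each raw JSON file into a one-column staging table.
# "etl.db", "json_dumps", and "raw_messages" are placeholder names.
conn = sqlite3.connect("etl.db")
conn.execute("CREATE TABLE IF NOT EXISTS raw_messages (doc TEXT)")

for path in Path("json_dumps").glob("*.json"):
    doc = json.loads(path.read_text(encoding="utf-8"))  # also validates the file
    conn.execute("INSERT INTO raw_messages (doc) VALUES (?)", (json.dumps(doc),))

conn.commit()
conn.close()
```

Everything lives in one file and nothing needs a server, which is exactly the appeal listed above.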
Then I added a step to load the data into Postgres:
JSON Files -> SQLite JSON Tables -> Native SQLite Tables -> CSV Tables -> Postgres Tables -> Concatenate Postgres Tables -> Cast Timestamps in Postgres Tables -> Any desired format
At one point I realized I had so much data that it would be better managed in a proper database like Postgres. Getting the data from SQLite into Postgres proved much more difficult than I had anticipated. If I had practiced the ETL pipeline with synthetic data first, things might have worked out better.
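For the SQLite to Postgres hop, the approach amounts to dumping each table to CSV and bulk-loading it with COPY rather than row-by-row INSERTs. A hedged sketch, assuming a simple messages table; the schema, file names, and connection string are placeholders:

```python
import csv
import sqlite3

import psycopg2

# Export a SQLite table to CSV; table and column names are placeholders.
sqlite_conn = sqlite3.connect("etl.db")
with open("messages.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for row in sqlite_conn.execute("SELECT id, author, content, ts FROM messages"):
        writer.writerow(row)
sqlite_conn.close()

# Bulk-load the CSV into Postgres with COPY, far faster than INSERTs.
pg_conn = psycopg2.connect("dbname=etl user=postgres")
with pg_conn, pg_conn.cursor() as cur:
    cur.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(id BIGINT, author TEXT, content TEXT, ts TEXT)"
    )
    with open("messages.csv", encoding="utf-8") as f:
        cur.copy_expert("COPY messages FROM STDIN WITH (FORMAT csv)", f)
pg_conn.close()
```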
Postgres-Only Design, Using JSON Tables like MongoDB
JSON Files -> JSON Postgres SQL Tables -> Native Postgres SQL Tables -> Any desired format
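The second hop, flattening JSON tables into native tables, can be done entirely inside Postgres with its JSON operators. A minimal sketch, assuming a staging table raw_messages with a single jsonb column named doc; the key and column names are illustrative:

```python
import psycopg2

# Flatten a jsonb staging table into typed native columns.
# Assumes raw_messages(doc jsonb) exists; names are placeholders.
conn = psycopg2.connect("dbname=etl user=postgres")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS messages AS
        SELECT (doc->>'id')::bigint             AS id,
               doc->>'author'                   AS author,
               doc->>'content'                  AS content,
               (doc->>'timestamp')::timestamptz AS ts
        FROM raw_messages
        """
    )
conn.close()
```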
In ETL to QE, Update 1, SQLite to Postgres I explained why I initially wanted to use JSON tables rather than native SQL tables.
At this point, while practicing loading the ETL pipeline into the database, I noticed it took hours to run even with The Optimizations, so it was time to write some real SQL.
Native SQL Postgres Design
JSON Files -> Native Postgres SQL Tables -> Any desired format
Just like a calculus proof, you arrive at a simple description at the end.
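In practice the final design reduces to parsing the JSON in Python and batch-inserting straight into a native table. A hedged sketch using psycopg2's execute_values helper for batching; the directory, schema, and key names are assumptions, not the project's actual ones:

```python
import json
from pathlib import Path

import psycopg2
from psycopg2.extras import execute_values

# Parse JSON files in Python and batch-insert into a native table.
# Directory, table, and key names are placeholders, not the real schema.
conn = psycopg2.connect("dbname=etl user=postgres")
with conn, conn.cursor() as cur:
    cur.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(id BIGINT PRIMARY KEY, author TEXT, content TEXT, ts TIMESTAMPTZ)"
    )
    rows = []
    for path in Path("json_dumps").glob("*.json"):
        for msg in json.loads(path.read_text(encoding="utf-8")):
            rows.append((msg["id"], msg["author"], msg["content"], msg["timestamp"]))
    # execute_values collapses the VALUES list into a few round trips.
    execute_values(
        cur,
        "INSERT INTO messages (id, author, content, ts) VALUES %s "
        "ON CONFLICT (id) DO NOTHING",
        rows,
    )
conn.close()
```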