Skip to content

ETL to QE, Update 7, Fixing Bugs

Date: 2023-10-11

See Discord Binding for project context

guild_id column not in authors_t table

commit: 112a76164e756e4edf1c14fae92f57de441ffe5e

I tried to run some aggregation queries on the authors_t table, such as What discord guild's have the most Users?, but found that guild_id was missing. I need to have the data noramlized because if I use a join the query can be an order of magnitude slower or even more. It is best to perform aggregations, for example using sum or group by in a single table.

isBot column not indexing in any table

commit: 112a76164e756e4edf1c14fae92f57de441ffe5e

I noticed when doing my initall analytics when asking questions like, What discord guilds have the most messages? that some guilds had millions of messages and others had 10's of thousands. I realized that this data is not useful unless I know who is a bot. Luckily this data was in the export but did not make it into the SQL database. I ran the following query to test if the isBot data was in the database or not.

SELECT
    distinct isBot
from
    messages_t

The result of the query above only returned a single row therefore the data was not getting indexed. I was able to go troubleshoot the source code and found I was pulling the incorrect key from the JSON from DiscordChatExporter

roles_t table not indexing too much redundant data

commit: 398ff8c73ac9d82e3cb95c5ea05700ca64ea8a5e

The roles_t table had almost as many rows as the messages_t table which did not make sense( see code snippit below). I noticed that roles_t did not have a primary key which was allowing this to happen. I went back into the code, added a primary key to the schema, and generated a primary key inside the ExportDiscord.process_discord_json method, and added a on conflict statement to the insert for roles_t

# Redacted
Getting Buckets
mah_json
Running process_discord_json
Done Running process_discord_json
json_data_to_sql Inserting 
# Redacted


tbd_table_name = guilds
1
tbd_table_name = channels
1
tbd_table_name = messages
20768
tbd_table_name = authors
6417
tbd_table_name = roles
257437