
Date: 2023-10-19

See Discord Binding for project context.

BERTopic on AMD GPU using ROCm

Testing BERTopic Run Times and Results

Intel i7-9700: 673.2816 seconds
T4 on Google Colab: 80.7281 seconds
Cohere API: 358.6873 seconds, 35 cents USD, 3,534,005 calls
GTX 1060 6GB: 75.4157 seconds
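To put these timings on a common scale, the sketch below computes each backend's speedup relative to the i7-9700 run. The dictionary names and labels are illustrative; the numbers are the measurements listed above.

```python
# Wall-clock times in seconds, copied from the results above.
timings = {
    "Intel i7-9700 (CPU)": 673.2816,
    "T4 (Google Colab)": 80.7281,
    "Cohere API": 358.6873,
    "GTX 1060 6GB": 75.4157,
}

baseline = timings["Intel i7-9700 (CPU)"]

# Speedup = CPU time divided by the backend's time (higher is faster).
speedups = {name: baseline / seconds for name, seconds in timings.items()}

# Print fastest first.
for name, factor in sorted(speedups.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<22} {timings[name]:>9.4f} s   {factor:5.2f}x")
```

On these numbers both GPUs come out roughly 8 to 9 times faster than the CPU, and the Cohere API a little under 2 times faster.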

For additional context, check out What is the length of the Bertopic default dataset from sklearn?
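One way to answer that question locally is to load the same subset the scripts in this post use and count the documents. This is a minimal sketch: `load_default_docs` and `describe_corpus` are helper names made up for this example, and the first call downloads the dataset. The scikit-learn documentation describes 20 Newsgroups as around 18,000 posts across 20 topics.

```python
def load_default_docs():
    """Load the 20 Newsgroups subset used by the scripts in this post."""
    # Imported lazily so describe_corpus works without scikit-learn installed.
    from sklearn.datasets import fetch_20newsgroups

    return fetch_20newsgroups(
        subset="all", remove=("headers", "footers", "quotes")
    )["data"]


def describe_corpus(docs):
    """Return the document count and total character count of a corpus."""
    return {
        "documents": len(docs),
        "characters": sum(len(doc) for doc in docs),
    }


# Example usage (downloads the dataset on first run):
# print(describe_corpus(load_default_docs()))
```

The character total is a rough proxy for how much text an embedding backend has to process, which is what dominates the run times above.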

CPU Script

from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
import timeit

docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data']

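topic_model = BERTopic()

# Time the full fit_transform call; the dataset load above is excluded.
start_time = timeit.default_timer()
topics, probs = topic_model.fit_transform(docs)
elapsed = timeit.default_timer() - start_time
print(f"Elapsed: {elapsed:.4f} seconds")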

Google Colab T4 Script

!pip install bertopic
import timeit
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
topic_model = BERTopic(embedding_model="all-MiniLM-L6-v2")
docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data']

# Time the full fit_transform call; the dataset load above is excluded.
start_time = timeit.default_timer()
topics, probs = topic_model.fit_transform(docs)
elapsed = timeit.default_timer() - start_time
print(f"Elapsed: {elapsed:.4f} seconds")

Cohere API Script (Requires Production API Key) - COSTS $0.35 USD

# !pip install cohere
# !pip install bertopic
import cohere
from bertopic import BERTopic
from bertopic.backend import CohereBackend
import timeit
from sklearn.datasets import fetch_20newsgroups

client = cohere.Client("PRODUCTION_API_KEY")
embedding_model = CohereBackend(client)
docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data']
topic_model = BERTopic(embedding_model=embedding_model)
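# Time the full fit_transform call, including the embedding requests to the Cohere API.
start_time = timeit.default_timer()
topics, probs = topic_model.fit_transform(docs)
elapsed = timeit.default_timer() - start_time
print(f"Elapsed: {elapsed:.4f} seconds")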

Installation Guide - RAPIDS Docs

GTX 1060 6GB

wget https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py
rm get-pip.py
sudo apt install python3-dev
sudo apt install build-essential
sudo apt install nvidia-modprobe
python3 -m pip install bertopic

import timeit
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
topic_model = BERTopic(embedding_model="all-MiniLM-L6-v2")
docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data']

# Time the full fit_transform call; the dataset load above is excluded.
start_time = timeit.default_timer()
topics, probs = topic_model.fit_transform(docs)
elapsed = timeit.default_timer() - start_time
print(f"Elapsed: {elapsed:.4f} seconds")

RTX 3090

TODO

Issues with BERTopic on ROCm


RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
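The message itself suggests rerunning with `HIP_LAUNCH_BLOCKING=1` so the error surfaces at the call that actually failed. The sketch below sets that variable and prints a few facts about the installed PyTorch build. `rocm_diagnostics` is a hypothetical helper name. The idea that this error often points to a PyTorch build without kernels compiled for the GPU's architecture is an assumption here, not something confirmed for this machine.

```python
import os

# Set before torch touches the GPU, as the error message suggests.
os.environ["HIP_LAUNCH_BLOCKING"] = "1"


def rocm_diagnostics():
    """Collect basic facts about the PyTorch build and the visible GPU."""
    info = {"hip_launch_blocking": os.environ.get("HIP_LAUNCH_BLOCKING")}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report that and stop.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # None on CUDA/CPU builds, a version string on ROCm builds.
    info["hip"] = getattr(torch.version, "hip", None)
    # ROCm builds of PyTorch expose the GPU through the torch.cuda API.
    info["gpu_available"] = torch.cuda.is_available()
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


for key, value in rocm_diagnostics().items():
    print(f"{key}: {value}")
```

If `hip` prints `None`, the installed PyTorch is not a ROCm build at all. If it prints a version and a device name, the next thing to check is whether that GPU is on the supported list for that ROCm release.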