Get a list of all Wikipedia articles
The goal of this tutorial is to get a list of all Wikipedia article URLs.
Download a ZIM file of all of Wikipedia
You can download it from here, or check out Where to download wikipedia? for more info.
Install zim-tools
sudo apt-get install zim-tools
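As a quick sanity check after installing, the following minimal sketch looks for the `zimdump` executable on the PATH. It assumes only that the zim-tools package provides the `zimdump` command used in the next step; the helper name `zimdump_path` is made up for this example.

```python
import shutil


def zimdump_path():
    """Return the full path of the zimdump executable, or None if it is not on PATH."""
    return shutil.which("zimdump")


path = zimdump_path()
# Report where zimdump was found, or that the install may not have worked
print(path if path else "zimdump not found on PATH")
```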
Use zim-tools to list all paths in the ZIM file
zimdump list wikipedia_*.zim > list_wikipedia_articles.dump
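Before filtering, it may help to see which prefixes actually appear in the dump. The sketch below groups lines by the text before the first `/`, assuming the dump contains one path per line in a form like `A/Some_article`. The function name `count_prefixes` and the sample paths are hypothetical, and if your ZIM file does not use namespace prefixes the grouping will not be meaningful.

```python
from collections import Counter


def count_prefixes(lines):
    """Count dump lines by the text before the first '/' (e.g. 'A' for 'A/Python')."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        prefix, sep, _rest = line.partition("/")
        # Lines without any '/' are grouped under a placeholder key
        counts[prefix if sep else "(no prefix)"] += 1
    return counts


# Hypothetical sample lines standing in for the real dump
sample = ["A/Python\n", "A/Rust\n", "I/logo.png\n", "-/style.css\n"]
print(count_prefixes(sample))
```

To run it on the real dump, open `list_wikipedia_articles.dump` and pass the file object to `count_prefixes`, since a file object iterates line by line.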
Use a Python script to list only the articles
# Define the input file and output file names
input_file_name = '/home/dentropy/Projects/wikipedia-article-names/list_wikipedia_articles.dump'
output_file_name = 'articles_names.txt'

# Open the input file in read mode and the output file in write mode
with open(input_file_name, 'r', encoding='utf-8') as input_file, \
        open(output_file_name, 'w', encoding='utf-8') as output_file:
    for line in input_file:
        # Article paths are assumed to carry the 'A/' namespace prefix
        if line.startswith('A/'):
            # Drop the two-character 'A/' prefix and write the rest to the output file
            output_file.write(line[2:])
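The script above produces article names rather than full URLs. One possible way to turn a name into a URL is sketched below. It assumes the ZIM path matches the article title on English Wikipedia and that `https://en.wikipedia.org/wiki/` is the right base; neither is confirmed by the dump itself. The names `BASE_URL` and `article_url` are made up for this example.

```python
from urllib.parse import quote

# Assumption: the articles come from English Wikipedia
BASE_URL = "https://en.wikipedia.org/wiki/"


def article_url(name, base_url=BASE_URL):
    """Build a URL from an article name, assuming the name matches the wiki title."""
    # Strip the trailing newline and use underscores, as wiki titles do in URLs
    title = name.strip().replace(" ", "_")
    # Percent-encode anything that is not safe in a URL path (UTF-8 by default)
    return base_url + quote(title)


print(article_url("Python\n"))
```

Applying `article_url` to each line of `articles_names.txt` would give one URL per article, with non-ASCII characters percent-encoded.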