Hello everyone. In this blog article, I would like to demonstrate how to create a Python-based Reddit scraper that searches Reddit for a specific keyword, for OSINT investigation purposes.
The following libraries are required to run this script:
pip install praw
pip install pandas
Reddit Scraper Script
The following is the script that I developed for scraping Reddit. It prompts you for a keyword, a subreddit name, and a limit on the number of posts to retrieve. It then searches for submissions matching the keyword and collects the username, community name, post title, post URL, and number of comments for each one. The results are printed as a table and exported to a CSV file.
You need to replace YOUR_CLIENT_ID, YOUR_CLIENT_SECRET, and YOUR_USER_AGENT with your actual Reddit API credentials. You can obtain these by creating an application on the Reddit app preferences page (https://www.reddit.com/prefs/apps).
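Before the script itself, one optional precaution: rather than pasting credentials into the source file, they can be read from environment variables. The following is a minimal sketch of that idea. The function name load_reddit_credentials and the variable names REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, and REDDIT_USER_AGENT are my own assumptions and are not part of praw.

```python
import os


def load_reddit_credentials(env=None):
    """Return praw.Reddit keyword arguments read from environment variables.

    The variable names below are hypothetical; pick whatever names suit you.
    """
    env = os.environ if env is None else env
    names = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "user_agent": "REDDIT_USER_AGENT",
    }
    # Fail early with a readable message if anything is missing or empty
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Demonstration with an in-memory stand-in for the real environment
example_env = {
    "REDDIT_CLIENT_ID": "abc",
    "REDDIT_CLIENT_SECRET": "xyz",
    "REDDIT_USER_AGENT": "osint-scraper/0.1",
}
creds = load_reddit_credentials(example_env)
print(sorted(creds))
# With praw installed, the dictionary can be unpacked: praw.Reddit(**creds)
```

The full script below keeps the placeholder strings for simplicity.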
import praw
import pandas as pd

# Configure your Reddit API credentials
reddit = praw.Reddit(client_id='YOUR_CLIENT_ID',
                     client_secret='YOUR_CLIENT_SECRET',
                     user_agent='YOUR_USER_AGENT')


def scrape_reddit(keyword, subreddit_name='all', limit=100):
    # Create a list to hold the scraped data
    data = []
    # Search for the keyword in the specified subreddit
    for submission in reddit.subreddit(subreddit_name).search(keyword, limit=limit):
        # Collect relevant information
        data.append({
            # Deleted accounts have no author object, so fall back to 'N/A'
            'User': submission.author.name if submission.author else 'N/A',
            'Community': submission.subreddit.display_name,
            'Post Title': submission.title,
            # permalink is the Reddit discussion page; submission.url can
            # point to an external site for link posts
            'Post URL': f"https://www.reddit.com{submission.permalink}",
            'Comments': submission.num_comments
        })
    return data


def main():
    keyword = input("Enter the keyword to search: ")
    # An empty answer falls back to searching all subreddits
    subreddit_name = input("Enter subreddit name (or 'all' for all subreddits): ").strip() or 'all'
    limit = int(input("Enter number of posts to retrieve (max 100): "))
    # Scrape Reddit
    results = scrape_reddit(keyword, subreddit_name, limit)
    # Convert results to DataFrame
    df = pd.DataFrame(results)
    # Display the results in table format
    print(df)
    # Export to CSV
    csv_filename = f'reddit_osint_results_{keyword}.csv'
    df.to_csv(csv_filename, index=False)
    print(f"Results exported to {csv_filename}")


if __name__ == "__main__":
    main()
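The prompt in main() says "max 100", but int(input(...)) neither enforces that cap nor handles non-numeric input. The following sketch shows one possible way to validate the value before passing it to scrape_reddit. The helper name parse_limit and its defaults are hypothetical, not part of the original script.

```python
def parse_limit(text, default=100, maximum=100):
    """Convert user input to a post limit between 1 and `maximum`.

    Falls back to `default` when the input is empty or not a whole number.
    """
    try:
        value = int(text.strip())
    except ValueError:
        return default
    # Clamp the value into the range 1..maximum
    return max(1, min(value, maximum))


# Try a few typical inputs, including invalid ones
for raw in ["5", "250", "", "abc", "-3"]:
    print(repr(raw), "->", parse_limit(raw))
```

In main(), the limit line could then read limit = parse_limit(input("Enter number of posts to retrieve (max 100): ")).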
Below is an example of running the script to scrape Reddit for a specific keyword. For this example, I use the keyword "data privacy" and search across all subreddits.
Example Input
Keyword: data privacy
Subreddit Name: all
Limit: 5
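With this input, the script builds the CSV filename directly from the keyword, so the exported file name contains a space. If that is inconvenient, the keyword can be normalized first. This is a small sketch under my own assumptions: the helper name csv_filename_for and the fallback word "query" are hypothetical.

```python
import re


def csv_filename_for(keyword):
    """Build a CSV filename from a keyword, replacing unsafe characters."""
    # Collapse every run of non-alphanumeric characters into one underscore
    slug = re.sub(r"[^A-Za-z0-9]+", "_", keyword.strip()).strip("_").lower()
    # Use a generic name if nothing usable is left
    return f"reddit_osint_results_{slug or 'query'}.csv"


print(csv_filename_for("data privacy"))
# prints: reddit_osint_results_data_privacy.csv
```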