How to scrape Reddit data in 2023

Tiya Vaj
2 min read · May 22, 2023

On Reddit, posts and comments are the two distinct types of content that users contribute. The main differences between them are:

  1. Posts: Posts are the original content submitted by users to a subreddit. They can be text-based submissions, links, images, videos, or other types of media. Posts are typically used to initiate discussions, ask questions, share information, or express opinions. Each post is associated with a title and may have a body text providing additional details or context. Users can upvote or downvote posts to determine their popularity, and comments can be made on posts to engage in conversations.

Code (the first line installs PRAW when run in a notebook cell):

!pip install praw

import praw
import pandas as pd

reddit = praw.Reddit(client_id='your_client_id',
                     client_secret='your_client_secret',
                     user_agent='your_user_agent')

ml_subreddit = reddit.subreddit('MachineLearning')

# Retrieve posts from the subreddit
posts = ml_subreddit.new(limit=10) # Use the "new" method to retrieve new posts

# Create a list to store the post data
post_data = []

# Iterate over the posts and extract relevant information
for post in posts:
    post_data.append({
        'Title': post.title,
        'Post Score': post.score,
        # post.author is None when the account has been deleted
        'Author': post.author.name if post.author else '[deleted]',
        'Post ID': post.id,
        'Subreddit': post.subreddit.display_name
    })

# Create a DataFrame from the post data
posts_df = pd.DataFrame(post_data)

# Print the resulting DataFrame
print(posts_df)
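
One detail in the loop above is worth guarding against: post.author is None when the author's account has been deleted, so reading post.author.name can raise an AttributeError partway through a scrape. The following is a minimal sketch of a row-building helper that tolerates this. The helper names author_name and post_to_row and the '[deleted]' placeholder are assumptions for illustration, and the posts are SimpleNamespace stand-ins rather than real PRAW objects, so the sketch runs without credentials.

```python
from types import SimpleNamespace


def author_name(item, placeholder="[deleted]"):
    """Return the author's username, or a placeholder if the author is missing."""
    author = getattr(item, "author", None)
    return author.name if author is not None else placeholder


def post_to_row(post):
    """Build one DataFrame row (a dict) from a post-like object."""
    return {
        "Title": post.title,
        "Post Score": post.score,
        "Author": author_name(post),
        "Post ID": post.id,
        "Subreddit": post.subreddit.display_name,
    }


# Stand-in objects that mimic the attributes used above (not real PRAW objects).
subreddit = SimpleNamespace(display_name="MachineLearning")
live_post = SimpleNamespace(title="A post", score=12, id="abc123",
                            author=SimpleNamespace(name="some_user"),
                            subreddit=subreddit)
orphan_post = SimpleNamespace(title="Another post", score=3, id="def456",
                              author=None, subreddit=subreddit)

rows = [post_to_row(p) for p in (live_post, orphan_post)]
for row in rows:
    print(row["Post ID"], row["Author"])
```

With real data, the same helper could be applied to each post returned by ml_subreddit.new(...) and the resulting list passed to pd.DataFrame.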

  2. Comments: Comments are the responses or replies made by users to posts or other comments. They are the main way users participate in discussions on Reddit. Comments allow users to express their thoughts, provide additional information, ask questions, or engage in conversations related to the post. Each comment is associated with a parent post or comment to establish the thread of discussion. Comments can be upvoted or downvoted by other users to indicate their relevance or contribution to the conversation.

import praw
import pandas as pd

reddit = praw.Reddit(client_id='your_client_id',
                     client_secret='your_client_secret',
                     user_agent='your_user_agent')

ml_subreddit = reddit.subreddit('MachineLearning')

# Retrieve comments from the subreddit
comments = ml_subreddit.comments(limit=10)

# Create a list to store the comment data
comment_data = []

# Iterate over the comments and extract relevant information
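for comment in comments:
    comment_data.append({
        'Comment Body': comment.body,
        'Comment Score': comment.score,
        # comment.author is None when the account has been deleted
        'Author': comment.author.name if comment.author else '[deleted]',
        # link_id is the parent post's fullname, e.g. 't3_abc123'
        # (the post ID with a 't3_' prefix)
        'Post ID': comment.link_id,
        'Subreddit': comment.subreddit.display_name
    })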

# Create a DataFrame from the comment data
comments_df = pd.DataFrame(comment_data)

# Print the resulting DataFrame
print(comments_df)
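
Note that the two tables do not use the same form of 'Post ID': post.id is the bare ID (for example 'abc123'), while comment.link_id is the post's fullname, which carries a 't3_' type prefix ('t3_abc123'). To match comments to their posts, the prefix has to be removed first. Here is a small sketch of one way to do that. The helper name fullname_to_id and the sample rows are made up for illustration.

```python
def fullname_to_id(fullname):
    """Strip a Reddit type prefix such as 't3_' from a fullname.

    Values without an underscore are assumed to be bare IDs already
    and are returned unchanged.
    """
    prefix, sep, bare_id = fullname.partition("_")
    return bare_id if sep else fullname


# Illustrative rows shaped like the dictionaries built above (made-up values).
comment_rows = [
    {"Comment Body": "Nice result", "Post ID": "t3_abc123"},
    {"Comment Body": "Which dataset?", "Post ID": "t3_def456"},
]
post_titles = {"abc123": "A post", "def456": "Another post"}

# Normalize the IDs, then look up the title of each comment's parent post.
for row in comment_rows:
    row["Post ID"] = fullname_to_id(row["Post ID"])
    row["Post Title"] = post_titles.get(row["Post ID"])

for row in comment_rows:
    print(row["Post ID"], "->", row["Post Title"])
```

Once the IDs are in the same form, the two DataFrames can be joined on the 'Post ID' column.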

To get a client_id and client_secret, please follow this link.
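
Once you have the credentials, it is usually safer not to paste them directly into a script or notebook that you might share. The sketch below shows one possible approach: reading them from environment variables and collecting them into the keyword arguments that praw.Reddit expects. The variable names REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET and REDDIT_USER_AGENT and the helper reddit_credentials are assumptions chosen for this example, not names required by PRAW.

```python
import os


def reddit_credentials(env=os.environ):
    """Collect praw.Reddit keyword arguments from environment-style variables.

    The variable names below are hypothetical, chosen for this sketch.
    """
    required = {
        "client_id": "REDDIT_CLIENT_ID",
        "client_secret": "REDDIT_CLIENT_SECRET",
        "user_agent": "REDDIT_USER_AGENT",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in required.items()}


# Demonstration with an in-memory mapping instead of the real environment.
demo_env = {
    "REDDIT_CLIENT_ID": "my_id",
    "REDDIT_CLIENT_SECRET": "my_secret",
    "REDDIT_USER_AGENT": "my_scraper by u/your_username",
}
credentials = reddit_credentials(demo_env)
print(sorted(credentials))
```

With the real variables set, the client could then be created with reddit = praw.Reddit(**reddit_credentials()).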
