Data Collection
Overall we collected…
5484 rows of GME stock price data
Over 1.3 million rows of r/wallstreetbets post data
r/wallstreetbets Posts
Collection Process
Challenges Faced
Reddit API Limitations: The Reddit API offers no way to filter posts by date, which hampers historical data collection.
Pushshift API Access Restrictions: Changes in the Pushshift API’s terms of use restrict its availability for research, limiting access to historical subreddit data.
Scraping Date-Specific Data: Working around the Reddit API’s lack of date filtering requires alternative collection methods, which complicates data collection.
Web Scraping Tool Limitations: Scraping with Selenium hits a cap on data volume: at most 1000 recent posts can be retrieved at a time.
API Access Challenges: Attempts to use someone else’s Pushshift API access failed, highlighting how difficult it is to obtain the permissions needed for data access.
Data Format Navigation: Having to request a CSV version of the Reddit API data shows how hard it is to manage and use the provided data formats efficiently.
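The date-filtering and data-format challenges above can be partly handled on the client side once posts have been retrieved. The sketch below is illustrative only and is not the collection code used for this project. It assumes each post is a dictionary carrying the `created_utc` Unix timestamp that Reddit listings include; the function names `filter_posts_by_date` and `posts_to_csv` are hypothetical. It does not lift the 1000-post cap, since it only filters what has already been downloaded.

```python
import csv
import io
from datetime import datetime, timezone


def filter_posts_by_date(posts, start, end):
    """Keep posts whose created_utc falls in the half-open range [start, end).

    posts: iterable of dicts, each with a numeric "created_utc" field (assumed).
    start, end: timezone-aware datetime objects.
    """
    start_ts = start.timestamp()
    end_ts = end.timestamp()
    return [p for p in posts if start_ts <= p["created_utc"] < end_ts]


def posts_to_csv(posts, fields):
    """Serialise the chosen fields of each post to CSV text."""
    buf = io.StringIO()
    # extrasaction="ignore" drops any keys that are not listed in fields.
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(posts)
    return buf.getvalue()


# Example with made-up posts (not real data):
sample = [
    {"id": "a", "title": "x",
     "created_utc": datetime(2021, 1, 27, tzinfo=timezone.utc).timestamp()},
    {"id": "b", "title": "y",
     "created_utc": datetime(2021, 3, 1, tzinfo=timezone.utc).timestamp()},
]
january = filter_posts_by_date(
    sample,
    datetime(2021, 1, 1, tzinfo=timezone.utc),
    datetime(2021, 2, 1, tzinfo=timezone.utc),
)
csv_text = posts_to_csv(january, ["id", "title"])
```

Filtering locally keeps the date logic independent of whichever source supplied the posts (API, scraper, or an archive dump), at the cost of downloading posts that are later discarded.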
GameStop Stock Prices
Collection Process
Challenges Faced
API Key Registration: Obtaining an API key requires registration, which may involve sharing personal information and adhering to specific use cases.
Rate Limits and Quotas: Alpha Vantage imposes rate limits that can slow the data collection process, a significant consideration for large datasets.
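One common way to stay within a rate limit is to space requests out on the client. The sketch below is a minimal example under stated assumptions, not the project's actual code: the class name `MinIntervalThrottle` is hypothetical, and the calls-per-minute figure is a parameter because the exact Alpha Vantage quota depends on the plan and is not stated here.

```python
import time


class MinIntervalThrottle:
    """Enforce a minimum gap between consecutive calls.

    calls_per_minute is an assumed quota; check the provider's current limits.
    clock and sleep are injectable so the class can be tested without waiting.
    """

    def __init__(self, calls_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / calls_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage sketch: call throttle.wait() immediately before each API request.
# throttle = MinIntervalThrottle(calls_per_minute=5)  # 5 is an assumed value
# for symbol in ["GME"]:
#     throttle.wait()
#     ...  # issue the request here
```

A fixed minimum interval is simpler than a token bucket and spreads calls evenly, which suits a long-running batch download; it does not handle daily quotas, which would need a separate counter.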