Data Cleaning
r/wallstreetbets Posts
Cleaning Process
Challenges Faced
Alignment: The ‘date’ column had to be converted to ‘datetime64’ so that post dates match the GameStop (GME) stock price dates exactly for time series analysis.
Aggregation: Aggregating posts by date with the groupby method required care so that the daily summaries, such as total posts, average scores, and comments per day, were accurate.
Data Frame: Transforming the raw subreddit data into a clean, structured data frame ready for analysis involved several preprocessing steps, including handling missing values, removing duplicates, and standardising data formats.
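The three points above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the column names (‘id’, ‘score’, ‘num_comments’), the helper names clean_posts and aggregate_by_date, and the sample rows are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical raw posts: column names and values are invented for illustration.
raw_posts = pd.DataFrame({
    "id": ["a1", "a1", "b2", "c3", "d4"],
    "date": ["2021-01-27 14:03:00", "2021-01-27 14:03:00",
             "2021-01-27 18:30:00", "2021-01-28 09:15:00", None],
    "score": [120, 120, 80, 45, 10],
    "num_comments": [30, 30, 10, 5, 2],
})


def clean_posts(posts: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicate posts, drop rows without a date, standardise the date."""
    cleaned = posts.drop_duplicates(subset="id").dropna(subset=["date"]).copy()
    # Convert to datetime64 and truncate to midnight so that each post
    # carries a calendar date that can be matched against daily prices.
    cleaned["date"] = pd.to_datetime(cleaned["date"]).dt.normalize()
    return cleaned


def aggregate_by_date(posts: pd.DataFrame) -> pd.DataFrame:
    """Summarise posts into one row per calendar day."""
    return (
        posts.groupby("date")
        .agg(
            total_posts=("id", "count"),
            avg_score=("score", "mean"),
            total_comments=("num_comments", "sum"),
        )
        .reset_index()
    )


daily_posts = aggregate_by_date(clean_posts(raw_posts))
print(daily_posts)
```

Removing duplicates before grouping matters here: a duplicated post would otherwise be counted twice in the daily totals.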
GameStop Stock Prices
Cleaning Process
Challenges Faced
The process of transforming the raw API data into a cleaned data frame involved multiple sub-tasks:
- Ensuring data consistency and accuracy after type conversion.
- Handling any missing or anomalous data points that could skew analysis.
- Structuring the data in a way that would align with analytical goals.
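A rough sketch of these sub-tasks in pandas is shown below. It assumes the API returns every field as a string; the column names (‘close’, ‘volume’), the helper clean_prices, the rule that a non-positive price is anomalous, and the sample values are hypothetical and are not real GME prices.

```python
import pandas as pd

# Hypothetical API response: every field arrives as a string, one close is
# missing and one is an impossible negative value. Values are invented.
raw_prices = pd.DataFrame({
    "date": ["2021-01-28", "2021-01-27", "2021-01-29", "2021-02-01"],
    "close": ["12.00", "10.50", None, "-1"],
    "volume": ["2000", "1000", "1500", "1800"],
})


def clean_prices(prices: pd.DataFrame) -> pd.DataFrame:
    """Convert types, drop missing or anomalous rows, and sort by date."""
    cleaned = prices.copy()
    cleaned["date"] = pd.to_datetime(cleaned["date"])
    for column in ["close", "volume"]:
        # errors="coerce" turns unparseable or missing entries into NaN
        # instead of raising, so they can be dropped in a single step.
        cleaned[column] = pd.to_numeric(cleaned[column], errors="coerce")
    cleaned = cleaned.dropna(subset=["close", "volume"])
    # Assumed anomaly rule for this sketch: a closing price must be positive.
    cleaned = cleaned[cleaned["close"] > 0]
    return cleaned.sort_values("date").reset_index(drop=True)


prices = clean_prices(raw_prices)
print(prices)
```

Sorting by date and resetting the index leaves one row per trading day in chronological order, which is the shape needed for a later merge on the date column.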
Merging Data Frames
Once the data was cleaned, the GME stock price data frame was merged with the r/wallstreetbets post data frame.
Rows containing ‘NaN’ in the stock price columns correspond to dates on which the stock market was closed, such as weekends and public holidays; these rows were subsequently dropped.
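The merge and the subsequent drop might look like the sketch below. The source does not state which join type was used, so the left join on ‘date’ is an assumption, as are the column names and the invented sample values.

```python
import pandas as pd

# Hypothetical cleaned inputs; 30 and 31 January 2021 fall on a weekend.
daily_posts = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-29", "2021-01-30", "2021-01-31", "2021-02-01"]),
    "total_posts": [5, 3, 4, 6],
})
prices = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-29", "2021-02-01"]),
    "close": [10.5, 12.0],
})

# A left join keeps every day that has posts; days without a matching
# price row (market closed) receive NaN in the price columns.
merged = daily_posts.merge(prices, on="date", how="left")

# Drop the non-trading days so that each remaining row has both
# post activity and a closing price.
trading_days = merged.dropna(subset=["close"]).reset_index(drop=True)
print(trading_days)
```

Passing subset to dropna restricts the check to the price column, so a row is removed only because the market was closed and not because of an unrelated missing value elsewhere.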