First, we chose three diverse categories (Education, Gaming, Food) to get a more holistic view of our objective.
Within each category, we chose creators with clearly different subscriber base sizes (one small, one medium and one large creator).
Note that the labels "small", "medium" and "large" are relative: each creator's size was judged by its subscriber base compared with the other creators in that category, not on an absolute scale.
Gaming
Greg Renko Gaming (Small)
MM7Games (Medium)
MrBeast Gaming (Large)
Food
Food Review UK (Small)
TheReportOfTheWeek (Medium)
Mark Wiens (Large)
Education
Sam O'Nella Academy (Small)
OverSimplified (Medium)
Veritasium (Large)
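The relative size labels above can be reproduced mechanically. The following is a minimal sketch, assuming the labels come from ranking the three chosen creators in each category by subscriber count. The function name, channel names and subscriber counts are placeholders for illustration, not the real figures for the channels listed above.

```python
from collections import defaultdict

SIZE_LABELS = ("Small", "Medium", "Large")


def label_relative_size(channels):
    """Label each channel Small/Medium/Large by subscriber rank within its category.

    `channels` is a list of (category, name, subscribers) tuples and is assumed
    to hold exactly three channels per category.
    """
    by_category = defaultdict(list)
    for category, name, subscribers in channels:
        by_category[category].append((subscribers, name))

    labels = {}
    for category, members in by_category.items():
        if len(members) != len(SIZE_LABELS):
            raise ValueError(f"expected 3 channels in {category!r}, got {len(members)}")
        # Sort ascending by subscribers so the smallest channel gets "Small".
        for label, (_, name) in zip(SIZE_LABELS, sorted(members)):
            labels[name] = label
    return labels


# Placeholder data for illustration only (not real subscriber counts).
example_channels = [
    ("Gaming", "channel_a", 40_000),
    ("Gaming", "channel_b", 900_000),
    ("Gaming", "channel_c", 30_000_000),
    ("Food", "channel_d", 5_000_000),
    ("Food", "channel_e", 20_000),
    ("Food", "channel_f", 600_000),
]
example_labels = label_relative_size(example_channels)
```

Because the ranking is done per category, a "Large" channel in one category can have fewer subscribers than a "Medium" channel in another.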
To address our objective, we had to define an engagement metric. We defined it as the interaction a video receives.
Quantitatively, we calculated it as the number of likes plus the number of comments a video receives, divided by the number of views it receives, i.e. (likes + comments) / views.
Likes and comments are the primary ways YouTube users can actively participate with content, so this metric helps us understand how much users actively engage with the content they consume. Moreover, its quantitative nature makes engagement easier to interpret graphically.
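The engagement metric defined above can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's actual code. It assumes likes and comments are summed before dividing by views, and that a channel's average engagement is the mean of its per-video rates. The function names and dictionary keys are hypothetical.

```python
def engagement_rate(likes, comments, views):
    """Return (likes + comments) / views, or None when a video has no views."""
    if views <= 0:
        return None  # the rate is undefined without views
    return (likes + comments) / views


def average_engagement(videos):
    """Mean of per-video engagement rates, skipping videos with no views.

    `videos` is an iterable of dicts with the hypothetical keys
    "likes", "comments" and "views".
    """
    rates = []
    for video in videos:
        rate = engagement_rate(video["likes"], video["comments"], video["views"])
        if rate is not None:
            rates.append(rate)
    return sum(rates) / len(rates) if rates else None
```

For instance, a video with 450 likes, 50 comments and 20,000 views has an engagement rate of 500 / 20,000 = 2.5%. Averaging per-video rates gives every video equal weight. Dividing a channel's total likes and comments by its total views would instead weight videos by their view counts, so the two choices can give different channel-level figures.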
1. Upload frequency holds limited significance for average engagement - High-quality videos may take a long time to make, so a low upload frequency does not necessarily mean low engagement.
2. Subscriber base is strongly correlated with engagement - The YouTube algorithm will naturally show viewers more videos from channels with more subscribers, and those channels will retain more viewers, hence more engagement.
This graph illustrates how each channel's average view count correlates with its relative size (subscriber count) within its category.
1. Each content category shows distinct viewing patterns across channel sizes:
Average views are generally positively correlated with channel size in the Food and Gaming categories. However, the trend is less consistent in the Educational category, where the small and medium channels have marginally higher average viewership than the large channel.
2. The magnitude of the impact of channel size varies by content type:
Comparing the Food and Gaming categories, average viewership rises more steeply with channel size in the Food category than in the Gaming category (a steeper gradient of increase in viewership from the smaller to the larger channels).
This graph illustrates how each channel's engagement rate correlates with its size (subscriber count) within its category.
1. Smaller channels often achieve higher engagement rates:
From the graph, we can see that the channels with the highest engagement rates (above 3%) were the small or medium channels in their respective categories (Gaming - MM7Games, Food - TheReportOfTheWeek, Education - Sam O'Nella Academy). Furthermore, the large channels tend to sit at the lower end of the engagement rates (~2%).
2. The effect of channel size on engagement varies by content category:
For the Gaming category, the engagement rates are relatively similar (2-3%), with only a minor difference between the channels with the highest engagement rate (MM7Games - ~3%) and the lowest engagement rate (MrBeast Gaming - ~2%). For the Educational category, the difference between the channels' engagement rates is also minor. However, for the Food category, there is a relatively large difference between the channels' engagement rates, especially between TheReportOfTheWeek (~5%) and Mark Wiens (~2%).
This graph illustrates how each channel's engagement rate correlates with its upload frequency (days between uploads).
1. Engagement rate generally decreases as upload frequency decreases (i.e. days between uploads increases):
However, the correlation between upload frequency and engagement is weak, and the best-fit line has a gentle gradient. This suggests that while frequent uploads may contribute to engagement, other factors may also play a significant role.
2. There are notable outliers:
Mark Wiens maintains a high upload frequency but experiences relatively low engagement. This implies that factors such as audience retention and content uniqueness might override the effects of upload frequency in influencing engagement.
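The strength of the relationship and the gradient of the best-fit line described above can be quantified with a Pearson correlation coefficient and an ordinary least-squares line. The sketch below is a plain-Python illustration, assuming one (days between uploads, engagement rate) point per channel. The function name is hypothetical and this is not necessarily how the graph's best-fit line was produced.

```python
from math import sqrt


def pearson_and_fit(xs, ys):
    """Return (r, slope, intercept) for the least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept
```

Here `xs` would hold the days between uploads and `ys` the engagement rates. A negative `r` close to zero would match the weak downward trend described above. With only nine channels, a single outlier can noticeably change both `r` and the slope.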
Based on the findings from Graphs 1, 2, and 3, we can conclude the following about the impact of upload frequency and subscriber base size on average engagement across YouTube content categories:
1. Channel size influences average views but not necessarily engagement.
Larger channels in the Food and Gaming categories tend to have higher average views, but engagement rate does not consistently scale with channel size (Graphs 1 and 2).
Meanwhile, Educational channels do not exhibit a strong correlation between size and views, implying other factors drive their success.
2. Engagement rate is not strongly linked to subscriber count.
Smaller channels often achieve higher engagement rates, especially in the Educational category (Graph 2).
Medium-sized channels in Gaming and Food also perform well in engagement, suggesting that a highly engaged audience matters more than a large subscriber base.
3. Higher upload frequency is generally associated with slightly higher engagement, but the relationship is weak (Graph 3).
Exceptions such as Mark Wiens show that frequent uploads do not guarantee high engagement, suggesting that content quality and audience retention may be more crucial factors.
1. Overlap between number of likes and number of comments
Based on our formula for engagement rate, one issue that could affect these values is overlap between the number of likes and the number of comments, i.e. the same user doing both, or long comment chains involving 2-3 users. Such instances would inflate the proportion of likes and comments relative to views, thus affecting the overall magnitude of the engagement rate.
2. Existence of outliers, i.e. viral videos
Some videos made by the channels/creators might go especially viral and thus skew the average values of our engagement metric. For example, a channel like TheReportOfTheWeek could have an average viewership of ~500k per video, but a few viral videos with ~10m views would significantly skew the averages that we use.
3. Limited Sample Size & Selection Bias
To ensure a detailed analysis, we only collected data from three channels per category. This small sample size may introduce substantial random errors, making it difficult to generalize findings to all channels within a category.
Moreover, handpicking only three YouTubers per category might not represent the category as a whole. For example, other large food channels like "The Best Ever Food Review Show" could differ from "Mark Wiens" and combine high engagement with a high upload frequency. If we had used this channel instead, we could have come to different conclusions.
4. Dated Data
Some channels, such as Sam O'Nella Academy, have been inactive for several years. As a result, the findings based on these channels may not be applicable to newer or more active channels within the same category.
5. Calculation of Metrics
The "days between uploads" metric is calculated as the average time between consecutive video upload dates.
If a channel uploads a large batch of videos at once and then remains inactive for an extended period, this inconsistency in upload patterns is not captured accurately by the graphs. This could lead to misleading conclusions about the relationship between upload frequency and engagement.
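Two of the limitations above, viral outliers and bursty upload schedules, can be made concrete by comparing the mean with the median. The sketch below uses made-up dates and view counts, and its function names are hypothetical. It only illustrates how a single extreme value can pull the mean away from a typical value.

```python
from datetime import date
from statistics import mean, median


def upload_gaps(upload_dates):
    """Return the gaps, in days, between consecutive uploads."""
    ordered = sorted(upload_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def summarise(values):
    """Return the mean and median so that skew from a few extreme values is visible."""
    return {"mean": mean(values), "median": median(values)}


# Made-up example: four uploads on consecutive days, then a long break.
example_dates = [
    date(2024, 1, 1),
    date(2024, 1, 2),
    date(2024, 1, 3),
    date(2024, 1, 4),
    date(2024, 6, 1),
]
gap_summary = summarise(upload_gaps(example_dates))

# Made-up view counts: four typical videos and one viral video.
example_views = [500_000, 450_000, 550_000, 500_000, 10_000_000]
view_summary = summarise(example_views)
```

In this made-up data the mean gap is 38 days while the median gap is 1 day, and the mean view count is 2.4m while the median is 500k. Reporting the median alongside the mean would be one way to flag channels whose averages are driven by a few unusual videos or a long pause.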
Despite the various potential pitfalls in the analysis of the data, we can come to the general conclusion that a successful YouTube strategy depends on balancing channel size, content quality, and engagement rather than relying solely on frequent uploads or a large subscriber base. Different content categories respond differently to these factors, emphasizing the need for tailored approaches per category. Further research could explore other forms of YouTube content that affect the engagement of newer channels, e.g. YouTube Shorts.
Data Collection
Data Processing/Web Designer
Data Visualisation
Overall Repository Structure
I would like to express my sincere gratitude to the DS105A Course for an enriching learning experience.
Special thanks to Jon Cardoso-Silva, and the rest of the DS105A team (Riya Chhikara, Alex Soldatkin, Kevin Kittoe, and Sara Luxmoore) for their guidance and support throughout the course :).