
“Hey, it's been a while.” : The Spectacle of Wrongdoing and Recovery on YouTube

Published on Dec 07, 2024

Abstract: While YouTube apology videos have become a genre of their own and a subject of parody, analysis of them has been limited. Such analysis is difficult in part because some creators delete their videos when they receive backlash for a supposedly poor or inauthentic apology. We are interested in the performance of remorse and used internet archival tools to create the largest known dataset of YouTube apologies, which includes video metrics as well as YouTuber demographic information. This paper outlines the methods through which we recovered this lost internet data and prepared it for topic modelling, the type of data collected, and how we redirected the project as obstacles arose.

This paper is adapted from a final project submitted for the University of Guelph course CTS*3000: Data and Difference in the spring of 2023. It was jointly written by the three authors. The project was originally on the performance and authenticity of a public YouTube apology but became an exercise in data collection and cleaning. As undergraduate students who are interested in such processes and invested in the visibility and recognition of academic labour, in this paper, we discuss things that did and did not work, with the primary aim of helping others who may be interested in similar research.

“Hey, it’s been a while.” 

YouTube apologies as performance and the recovery of lost data

Neluka Ameresekere, Susane Dang, Jingyi Long

Professor Susan Brown

Culture and Technology Studies

University of Guelph

Spring 2023


Introduction: The Influencer and the Apology

The internet influencer is an individual who uses their own personal brand of authenticity to cultivate audiences whose loyalty can be quantified by likes, comments, and views.1

Unlike a traditional celebrity, the influencer bonds with their audiences through content that directly addresses the viewer and allows viewers to respond through comments.2 Influencers perform an intimate version of their “authentic” self through their content, for example by sharing their feelings on personal topics or major life events. Intimacy builds trust, and trust builds a stronger parasocial bond.


1 Emily Hund, The Influencer Industry: The Quest for Authenticity on Social Media (Princeton University Press, 2023), 13.

2 Amanda Tolbert and Kristin Drogos, “Tweens’ Wishful Identification and Parasocial Relationships With YouTubers,” Frontiers in Psychology 10 (2019): 4, https://doi.org/10.3389/fpsyg.2019.02781.

Because of the personal nature of the influencer’s content, their followers often feel personally affronted when their favourite creator has seemingly committed a transgression. These offenses can range from insincerity to financial scams to physical assault. Regardless of what constitutes “bad behaviour,” followers may demand an apology or acknowledgement of wrongdoing. Whether the influencer believes they have actually done something wrong is often irrelevant: they must perform an apology to regain their audience’s trust.

But what does it mean to apologize in a public space? These creators, whose entire brands are typically reliant on their “authentic” selves, are performing a persona when they create internet content. This performance extends to their apology—it is a performance of shame. 

People who “expose” the creator are usually doing so with the intention of holding the influencer accountable for their behaviour. However, since this often takes place in a public forum, the exposure is transformed into a public spectacle. The spectators can extend beyond the creator’s followers to other online communities and even mainstream media outlets. The spectacle can also extend into physical space when online users dox influencers. Wendy Chun describes this kind of exposure as “violent” and likens it to Sedgwick’s epistemology of the closet, as it seeks to shame the individual.3 Although Chun specifically discusses this exposure in slut-shaming, this forced accountability is similar to the exposure of the influencer. Unlike the target of slut-shaming, however, the influencer also has a financial incentive to apologize: without an apology, they risk losing viewers and sponsorships.

We were interested in examining such performances and the feedback that they receive. This paper outlines how we collected and organized YouTube apology data, going beyond likes and views. It details the tools used, the process of accessing deleted content, and the demographic information that can be used for closer analysis. Due to time constraints, we were unable to perform much analysis ourselves.

However, because this is a newer subject for discussion, we wanted to share these processes so that others may learn from them when developing their own project methods. We write about the pivots and roadblocks to provide a realistic picture of the process. This was a successful data-collection project, but only because it had to be: creating this dataset was never meant to take so long, nor to be the primary goal.


3 Wendy Hui Kyong Chun, Updating to Remain the Same: Habitual New Media, 1st ed. (Cambridge: MIT Press, 2016), 150.

Project Goals

Originally, we wanted to examine public personas and the performance of an apology. Our main guiding questions were:

  • What makes an apology work?

  • How was the YouTuber’s popularity affected by their controversy and subsequent apology?

  • Why do viewers care so much about the intentions of a stranger online?

We thought that data on this topic would be readily available because influencers are public figures, and these apologies have become milestones in their careers. However, the information we found was scattered and incomplete.

Project Goals 2.0

As a result, we pivoted our focus and started compiling this data ourselves, using transcription software as well as internet archival tools to collect metadata. Our new goal was to create a dataset of YouTube apologies that can be used for the type of analysis that we had originally wanted to do. Recording how we gathered this information was crucial, so that it may be replicated by others who wish to do similar work. 

Since this dataset was not available, it became more important for us to ask:

  • What kind of trends might exist in YouTube apologies?

  • How can we recover lost YouTube data?

Unlike other projects that have analyzed YouTuber apologies, our dataset also includes videos that were unlisted, set to private, or deleted, or that came from a channel terminated entirely by YouTube. Excluding these videos from textual analysis would treat them as though they had never been made. Furthermore, deleting a video or otherwise changing its status could be a factor in examining the authenticity of a YouTuber’s apology.

In-depth analysis was not possible within the given timeframe, but we still used textual analysis tools on a smaller scale to test the data’s usability and consider how it could be incorporated into future projects.

We wanted to make our dataset public, like Kakkar and Samora’s visualization,4 which included a public playlist of the apologies it covered, and to make it as large as (if not larger than) the sample of 117 used by Choi and Mitchell.5 Neither of these ended up happening, because many videos did not fit our criteria and we had concerns about publicly releasing this dataset.


4 Arjun Kakkar and Russell Samora, “The Aftermath of a YouTube Apology,” The Pudding, January 2020, data visualization, https://pudding.cool/2020/01/apology/.

5 Grace Choi and Ann Marie Mitchell, “So Sorry, Now Please Watch: Identifying Image Repair Strategies, Sincerity and Forgiveness in YouTubers’ Apology Videos,” Public Relations Review 48, no. 4 (2022), https://doi.org/10.1016/j.pubrev.2022.102226.

Considerations

The main ethical concern with this project was the collection of this type of data. What happens when creators no longer wish for this data to be on the internet? Some have already deleted their videos. Furthermore, they have not consented to data collection and analysis of this nature. The Feminist Data Manifest-No outlines ways of studying data that refuse harmful data regimes,6 and it made us reconsider the purposes of the project and its implications. Declarations in the full piece include refusing work about people and refusing the use of data in perpetuity.

The original purpose of this project was to analyze the impact that these apologies had on public perception and subscriber retention. The creators featured in our project were not affiliated with us in any way and had no impact on the data curation process. They would not see any kind of benefit from our work. On the other hand, these creators have established themselves in positions of power where the impact of their words and actions is worthy of scrutiny. 

In her Uncertain Archives chapter “Remains”, Tonia Sutherland discusses the right to be forgotten, and how it is often a privilege that is not afforded to people of colour.7 While control over personal information on the internet might be desirable on an individual level, for a public figure it can amount to whitewashing their history. Some influencers have committed crimes or taken advantage of their audiences and have still returned to or exceeded their former heights. Rather than deny people the right to be forgotten, we wanted to investigate how some influencers may be given more grace when they try to erase parts of their past, how the performance of an apology affects this “redemption arc”, and how other elements of identity such as race and gender come into play.

Our project is strictly for the study of what we believe to be important events in internet pop culture that reflect how people perceive and think about influencers. We outline these methods so that they may be adapted and reused for other projects on lost internet media. Furthermore, in this paper we do not provide specific details of each influencer, nor are we publishing the dataset in its current form. 


6 Marika Cifor, Patricia Garcia, et al., “Data Manifest-No”, 2019, https://www.manifestno.com/.

7 Tonia Sutherland, “Remains,” in Uncertain Archives: Critical Keywords for Big Data, ed. Nanna Bonde Thylstrup, Daniela Agostinho, Annie Ring, Catherine D'Ignazio, and Kristin Veel (The MIT Press, 2021), 440.

Methods

The original methodology included selecting and examining the YouTuber’s controversy, a close reading of the subsequent apology, and tracking the changes to the YouTuber’s following throughout the event. These were to be carried out through five case studies of YouTubers whose apologies fall into different types. The original case studies were as follows:

  • An early example (Fine Brothers Entertainment)

  • A recent example (The Try Guys)

  • An example of a “good” apology (Jenna Marbles)

  • An example of a “bad” apology (James Charles and Tati Westbrook)

  • An example of a repetitive apologist (Logan Paul)

After choosing case studies to focus on, we would delve deeper into the context surrounding each YouTuber's apology, social media metrics, and subsequent reception by their audience. We also planned to perform sentiment analysis on the comments of the apology videos using Python.

However, after transcribing a few apologies, we realized that we were limiting ourselves to a single controversy within each creator’s career. In addition, some of the chosen case-study videos (e.g., The Fine Bros) had been deleted or were inaccessible from the YouTuber’s channel. This limited our ability to perform sentiment analysis, since we were unable to obtain the comments from the published video.

Methods 2.0

We then decided to do an in-depth case study of a single creator who has been associated with multiple controversies so that we could better identify the effects of their apologies. Focusing on a creator like Logan Paul, who has been perceived as both hero and villain (multiple times), would act like a control: it would be possible to see what specifically about the apologies and scandals was different each time.

However, this approach may not be especially indicative of any larger trends because Logan Paul is young, white, and a millionaire. These are qualities of people who have generally been favoured by society and held to a lower standard of accountability for problematic behaviour, and they may affect how he is perceived compared to other creators who are less successful or privileged.

Methods 3.0

We settled on a distant reading of a wide variety of apologies. By analyzing the contents of the videos, it would be possible to find emerging topics and similarities between creators. This could then be compared to the public reception of the apology, helping us identify whether other factors such as race or gender affected the creator’s decrease or increase in popularity (via subscribers, video views, and likes).

This ultimately did not happen, because much of our time and energy had to be rerouted into data collection and cleaning.

Data Collection Process

Selecting Apologies

We started our data collection process by compiling apologies from different sources. As undergraduate students who grew up on the internet, our team is well-versed in the genre of the YouTube apology, so we began simply by writing down any apologies that we remembered. This list was used in tandem with news articles about YouTube apologies, since mainstream coverage indicated that the scandal was an important point in the influencer’s career. Next, we searched YouTube with keywords that are typically included in apology video titles, such as “sorry” or “truth.” We then expanded our search beyond YouTube to other social media sites such as Reddit and X (formerly known as Twitter). Apologies were also gathered through YouTube videos that covered apologies, such as the “Youtuber Apology Tier List” by Charles White, which had just over 9 million views when this paper was originally written in 2023 and had reached nearly 10 million a year and a half later.8

Forming Criteria

Through the collection of these apologies, we developed criteria to help filter which apologies would stay in the dataset. Our criteria were developed through a mix of our own ideas and criteria from Karlsson’s thesis “The YouTube Apology.”9

To meet our criteria and be archived as part of our apology dataset, the content must have been created by an independent YouTuber and uploaded directly to YouTube. We also required that the link to the apology video be accessible so that we could access Wayback captures. Videos without sufficient metadata were not included. Lastly, we narrowed our focus by including only one apology per YouTuber. When a YouTuber had created multiple apology videos, we chose the one with the most views.

By applying the criteria mentioned above, we narrowed an original list of over 100 apologies down to 80, one per YouTuber.
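The filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure we actually ran (our filtering was done by hand in a spreadsheet); the `Apology` record, its field names, and the sample rows are all hypothetical.

```python
from typing import NamedTuple


class Apology(NamedTuple):
    # Hypothetical fields, not the actual spreadsheet columns.
    creator: str
    title: str
    views: int
    has_metadata: bool


def select_apologies(candidates: list[Apology]) -> list[Apology]:
    """Keep one apology per creator: the most-viewed one with usable metadata."""
    best: dict[str, Apology] = {}
    for video in candidates:
        if not video.has_metadata:
            continue  # criterion: insufficient metadata means exclusion
        current = best.get(video.creator)
        if current is None or video.views > current.views:
            best[video.creator] = video
    return sorted(best.values(), key=lambda v: v.creator)


# Made-up rows for illustration only.
sample = [
    Apology("Creator A", "I'm sorry", 1_200_000, True),
    Apology("Creator A", "The truth", 3_400_000, True),
    Apology("Creator B", "My apology", 800_000, True),
    Apology("Creator C", "Addressing everything", 500_000, False),
]
selected = select_apologies(sample)
print([(v.creator, v.title) for v in selected])
```

In this sketch, Creator A's second video is kept because it has more views, and Creator C drops out entirely because of missing metadata.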


8 Charles White, “Youtuber Apology Tier List,” penguinz0, December 19, 2020, entertainment video, https://www.youtube.com/watch?v=Eq72iEjcU4w.

9 Gabriella Karlsson, “The YouTube Apology,” master’s thesis, Malmö University, 2020, Digitala Vetenskapliga Arkivet, https://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-23129.

Tools Used

YouTube 

YouTube is the original hosting platform of all the apology videos we looked at, as we focused on influencers for whom YouTube is their primary platform and source of income. Although some apology videos were inaccessible due to channel deletion, video deletion or privating, YouTube reuploads and captures from the Wayback Machine were used to view archived versions of video and channel pages. YouTube playlists were used to compile apology videos for easy access and organization purposes.

4K Video Downloader

4K Video Downloader is a YouTube downloader application that allows users to download videos in full resolution. It has the ability to extract audio from videos and download videos in different formats and in batches. 4K Video Downloader was used to download YouTube playlists of the apology videos, providing us with audio for transcription.

Social Blade

Social Blade is a site dedicated to tracking changes in social media engagement metrics. The curation of YouTube subscriber and viewership data largely depended on Social Blade, as the YouTube API could only gather current data.

Social Blade used to host channel statistics on its website. However, in November 2017, Social Blade was required by YouTube to only show statistics up to three years back due to EU data protection regulations.10 As a result, any Social Blade data we wanted from 2020 or earlier had to be accessed via the Wayback Machine.

Additionally, in May 2019, YouTube announced that channels with over 1,000 subscribers would have their public subscriber counts abbreviated (e.g., 432,930 as 432K and 51,389,232 as 51M).11 Prior to the change, which took place in September 2019, Social Blade was able to track daily changes in subscriber counts. Afterwards, these figures were only updated in set increments.
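To make the loss of precision concrete, the following sketch converts an abbreviated count back into a lower bound and a resolution, that is, the smallest change the abbreviated figure can display. It assumes that abbreviated counts are truncated rather than rounded, which matches the examples above (432,930 shown as 432K); the function name and the suffix table are our own illustration.

```python
from decimal import Decimal

# Multipliers for common count suffixes.
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_abbreviated(count: str) -> tuple[int, int]:
    """Return (lower_bound, resolution) for a count such as '432K' or '51.3M'.

    Assuming truncation, the true count lies in
    [lower_bound, lower_bound + resolution).
    """
    text = count.strip().upper().replace(",", "")
    has_suffix = text[-1] in SUFFIXES
    multiplier = SUFFIXES[text[-1]] if has_suffix else 1
    value = Decimal(text[:-1] if has_suffix else text)
    # Number of digits after the decimal point in the displayed figure.
    decimals = max(0, -value.as_tuple().exponent)
    lower_bound = int(value * multiplier)
    resolution = max(1, multiplier // 10 ** decimals)
    return lower_bound, resolution


print(parse_abbreviated("432K"))
print(parse_abbreviated("51M"))
```

A channel displayed as 51M could therefore gain or lose hundreds of thousands of subscribers after an apology without the public figure changing at all.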


10 Social Blade, “Frequently Asked Questions,” n.d., https://socialblade.com/youtube/help/data-retention.

11 Team YouTube, “Early heads up: abbreviated public subscriber counts across YouTube,” May 21, 2019, https://support.google.com/youtube/thread/6543166/.

Screenshots of James Charles’ monthly subscriber counts from April 2019 captured from The Internet Archive (left) and April 2023 (right).

Given Social Blade’s inability to provide daily subscriber counts after August 2019, we had to manually look up subscriber counts using The Internet Archive for apologies posted after that date; this accounted for approximately half the apologies we curated. Subscriber counts for earlier apologies were scraped automatically from Social Blade with a Python script by Anjali Shrivastava.12

The Wayback Machine

The Wayback Machine is a digital archive of web pages that allows visitors to view “Internet sites and other cultural artifacts in digital form”.13 It preserves web pages through periodic site crawls, which can take years to complete. As a result, many captures are contributed by individuals who have chosen to preserve a webpage on a specific date.

Many apologies we looked at had been deleted or made private, either because of backlash to the apology or because the controversy had been resolved. Although a fair number of these apologies could still be viewed on YouTube as reuploads, the reuploaded videos did not have the original subscriber or like data we required. To gather this data, we used the Internet Archive’s Wayback Machine to look at captures of the original video pages.


12 Anjali Shrivastava, “Analyzing Content Cop,” September 14, 2020, GitHub repository, https://github.com/vastava/data-science-projects/tree/master/content%20cop.

13 Internet Archive, “About the Internet Archive,” https://archive.org/about/.

Myka Stauffer’s Internet Archive Wayback Machine page for her now-private apology video.

Due to the crowdsourced nature of the Wayback Machine, the availability of pages is entirely dependent on the number of crawls performed and the interest of individuals who have chosen to capture the pages.
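For readers who want to look up captures themselves, a Wayback Machine address can be assembled from the original URL and a date. The sketch below is our own illustration of that URL pattern; the helper name and the placeholder video ID are hypothetical. When given a date-only timestamp, the archive redirects to the nearest capture it holds, so the page that loads may be dated before or after the requested day.

```python
from datetime import date

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, target: date) -> str:
    """Build a Wayback Machine address for the capture nearest to `target`."""
    # The timestamp is written as YYYYMMDD, followed by the full original URL.
    return f"{WAYBACK_PREFIX}/{target:%Y%m%d}/{original_url}"


# VIDEO_ID is a placeholder, used only to show the shape of the result.
print(wayback_url("https://www.youtube.com/watch?v=VIDEO_ID", date(2019, 5, 10)))
```

Whether anything is returned still depends on a capture existing, which is exactly the limitation described above.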

Wikitubia

Wikitubia is an unofficial, crowd-sourced wiki dedicated to YouTube creator channels with a minimum of one thousand subscribers.14 During our data collection, we used Wikitubia to gather the original links to YouTube channels. After YouTube introduced channel handles, main channel URLs were replaced with personalized handles chosen by the creator.16 Because of this URL change, the Wayback Machine could not retrieve the appropriate captures from the current channel URLs, as it treated the handle as an entirely new URL. Using the original channel links provided by Wikitubia circumvented this and allowed us to obtain important channel information such as subscriber counts.


14 Wikitubia, “Home,” Fandom, n.d., https://youtube.fandom.com/wiki/YouTube_Wiki.

15 The YouTube Team, “An update to dislikes on YouTube,” YouTube Official Blog, November 10, 2021, https://blog.youtube/news-and-events/update-to-youtube/.

Python

Whisper: A free, open-source speech recognition model developed by OpenAI. This helped us automate the transcription process but sometimes produced inaccurate text.

Pandas: A data analysis library. This allowed for sorting by different categories such as genre, race, and gender.

Natural Language Toolkit (NLTK): Allows users to work with and analyze natural language. Using NLTK, we were able to clean up the text and find the frequency of certain words and phrases. Pandas helped organize the text for use with NLTK.

Code using NLTK to clean up and filter a corpus and obtain most frequent words.

Gensim: A library used for topic modelling. As we shifted the focus of our project, we were unable to do as much as originally planned. However, we used topic modelling on smaller batches of data to test out the tool. 

Sample code for topic modelling to determine ten topics with ten keywords each, and its output.

YouTube API: We were previously interested in using the YouTube Data API for accessing video information. However, the API could only access current data. Historical data and data for deleted videos were unavailable, so Social Blade and Python were used in place of it.

VADER: A sentiment analysis tool that specializes in social media sentiment. We successfully used this tool, but time constraints meant we were unable to apply it on a larger scale.
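The cleaning and word-frequency step described for NLTK above can be approximated with the Python standard library alone. The sketch below is a stand-in for illustration, not our NLTK code: it uses a regular expression in place of NLTK's tokenizer and a deliberately tiny, hand-written stop-word set in place of NLTK's English list, and the sample sentence is invented.

```python
import re
from collections import Counter

# A tiny illustrative stop-word set; NLTK ships a much fuller English list.
STOP_WORDS = {
    "a", "am", "an", "and", "for", "i", "is", "it",
    "my", "of", "so", "that", "the", "to", "was",
}


def most_frequent_words(transcript: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Lowercase the text, keep alphabetic tokens, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    words = [t.strip("'") for t in tokens]
    kept = [w for w in words if w and w not in STOP_WORDS]
    return Counter(kept).most_common(top_n)


sample = "I am so sorry. I made a mistake, and I am sorry for the mistake that hurt you."
print(most_frequent_words(sample, top_n=2))
```

The choice of stop words matters a great deal for apology transcripts: first-person pronouns are usually discarded as noise, yet how often a creator says “I” may itself be of interest.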

Types of Data Collected

An identifier was created for each creator to better locate specific apologies across multiple lists. The collected apologies were numbered and combined with a shorter version of the YouTuber's name that was still easily readable. For example, Beni and Rafi Fine (The Fine Brothers) were the fourth apology on the list; their name was shortened to “Fine”, resulting in the ID “004_Fine”.
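The identifier format can be expressed as a one-line helper. This is a sketch of the naming pattern only; the function name is hypothetical, and the three-digit zero padding is inferred from the “004_Fine” example.

```python
def make_apology_id(position: int, short_name: str) -> str:
    """Combine a zero-padded list position with a shortened creator name."""
    return f"{position:03d}_{short_name}"


print(make_apology_id(4, "Fine"))
```

Zero padding keeps the identifiers in list order when they are sorted as text, which is convenient when the same IDs are used as file names for audio and transcripts.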

We recorded the following in a spreadsheet:

  • Video information

  • YouTube Creator information (detailed below)

  • Wayback URL

  • Reason for apology

  • Subscriber count (current and over time)

Screenshots of different sections of the spreadsheet created.

Video Information

This included the video's title, duration, URL, and the date it was posted. We also recorded information regarding the video’s number of “likes” and “dislikes”.

In November 2021, YouTube announced that dislike counts would no longer be publicly available.15 For videos still available on YouTube, we therefore obtained dislike counts through the “Return YouTube Dislike” Chrome extension.17 For apology videos no longer available on YouTube, dislike counts could be recovered only from Wayback Machine captures made before December 2021, when the change was implemented. If an apology video was posted after December 2021 and is no longer available on YouTube, its dislike count could not be gathered.
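The rule for where a dislike count could come from can be summarized as a small decision function. This is a sketch for illustration: the function name and return labels are hypothetical, and the cutoff of 1 December 2021 is an assumption, since we only know the month in which the change took effect.

```python
from datetime import date
from typing import Optional

# Assumed cutoff: the exact day in December 2021 is not specified above.
DISLIKES_HIDDEN_FROM = date(2021, 12, 1)


def dislike_source(on_youtube: bool, capture_dates: list[date]) -> Optional[str]:
    """Return where a dislike count could come from, or None if unrecoverable."""
    if on_youtube:
        return "Return YouTube Dislike extension"
    # Only captures made before the change still show the dislike count.
    if any(d < DISLIKES_HIDDEN_FROM for d in capture_dates):
        return "Wayback Machine capture"
    return None


print(dislike_source(False, [date(2020, 6, 1)]))
print(dislike_source(False, [date(2022, 3, 1)]))
```

The second call returns `None`: a video that is gone from YouTube and was only captured after the change has no recoverable dislike count.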


16 The YouTube Team, “Introducing handles: A new way to identify your YouTube Channel,” YouTube Official Blog, October 10, 2022, https://blog.youtube/news-and-events/introducing-handles-a-new-way-to-identify-your-youtube-channel/.

17 Anarios, “Return YouTube Dislike (Version 3.0.0.8),” Google Chrome Web Store, browser extension, https://chrome.google.com/webstore/detail/return-youtube-dislike/gebbhagfogifgggkldgodflihgfeippi.

Additionally, in May 2019, YouTube announced that channels with over 1,000 subscribers would have their public subscriber counts abbreviated (e.g. 51,389,232 as 51M).18 This change took place in September 2019 and affected our methods in gathering subscriber data. Prior to the change, Social Blade was able to track daily changes in subscriber counts. Afterwards, these figures were only updated in set increments.

Since some videos are not publicly available on the YouTube site, we created a section for video status, which included:

Video Status | Description
Original | The video is on the creator’s channel and is completely public to view.
Unlisted | The video is on the creator’s channel but is hidden. It is accessible through a specific URL.
Private | The video is on the creator’s channel, but it is hidden. It is inaccessible to the public.
Deleted | The video does not exist on the channel anymore.
Terminated | The YouTuber’s channel itself was taken down by YouTube, and as a result, all their videos have been completely removed.


Different statuses have different effects on a creator’s channel statistics. When a video is unlisted or private, the views from the video are still included in the channel’s all-time view count. However, if a video has been deleted, its views are subtracted from the channel’s all-time views.

YouTube Creator

Basic information: Channel name, legal name, and other online aliases. For example, Daniel Keem is the individual who runs the DramaAlert channel but is mainly known online as “KeemStar.”

Gender: Recorded so that we could identify any potential trends in the treatment of different genders online. Our only two categories are man and woman because every YouTuber we found identified as one or the other. This category may expand to include more gender identities if we grow this dataset in the future.

Race: Recorded to see if there was any correlation between racial groups and the reception of their apologies.


18 Team YouTube, “Early heads up: abbreviated public subscriber counts across YouTube,” YouTube Help, May 21, 2019, https://support.google.com/youtube/thread/6543166.

These categories were determined by consulting data collection guides from Public Health Ontario,19 which contained the categories below. We also looked at the race categories used by Statistics Canada,20 but their specificity was not necessary for our purposes.

Race Categories | Description/Examples
Black | African, Afro-Caribbean, African-Canadian descent.
East Asian | Chinese, Korean, Japanese, Taiwanese descent.
Latino | Latin American, Hispanic descent.
Middle Eastern | Arab, Persian, West Asian descent, e.g. Afghan, Egyptian, Iranian, etc.
South Asian | South Asian descent, e.g. East Indian, Pakistani, Sri Lankan, Indo-Caribbean, etc.
Southeast Asian | Filipino, Vietnamese, Cambodian, Thai, other Southeast Asian descent.
White | European descent.
Another race category | Another race category (write-in response).

Age: The age of the creator at the time that the video was posted.  

Country of origin: The country in which the creator lives and works.

Channel type: Many YouTubers have secondary channels that are dedicated to content that is different from their main channel. For example, someone who makes comedy videos may have a second channel for behind-the-scenes content, outtakes, or vlogs. These secondary channels almost always have lower viewership than the main channel, so the choice to post a video to a second channel may indicate how widely the YouTuber intends to publicize their scandal and/or their apology.

Genre: We included YouTubers from across a wide variety of genres because we were interested in seeing if there were any trends within specific communities. We determined several channel genres for categorizing the YouTubers, written below. While some of the genres may seem similar, they are actually distinct types of content. Pranks may be a part of the comedy genre, but there are YouTube channels that are entirely dedicated to pranking.


19 Public Health Ontario, “Collecting Information on Ethnic Origin, Race, Income, Household Size, and Language Data: A Resource for Data Collectors,” 2021, https://www.publichealthontario.ca/-/media/documents/ncov/he/2021/03/aag-race-ethnicity-income-language-data-collection.pdf?la=en.

20 Statistics Canada, “Visible Minority and Population Group Reference Guide, Census of Population, 2016,” 2017, https://www12.statcan.gc.ca/census-recensement/2016/ref/guides/006/98-500-x2016006-eng.cfm.

Genres | Description/Examples of content
Beauty | Makeup and fashion.
Gaming | Commentary, reviews, playthroughs.
Lifestyle | Vlogs, storytimes.
Family | Content focused on the family’s daily life or raising children.
Commentary | Covering current events in pop culture.
Art/music | Visual arts, musicians.
Health | Fitness, diet, gym.
Comedy | Comedy sketches.
Prank | Dedicated to pranking on friends, family, or the public.
Technology | Reviews, demos.
Food | Cooking, mukbangs.

Date joined: This is the date that the main channel was created and helped us gauge the length of the apologist’s YouTube career prior to the scandal.

Wayback URLs (and others)

Using code adapted from Anjali Shrivastava (2020), we were able to obtain accurate daily subscriber data using Wayback Machine captures of Social Blade’s monthly channel statistic pages. To automate the data collection process, we added a column in the spreadsheet with Wayback Machine URLs of the channel’s monthly statistics page. The spreadsheet was parsed through Python and the subscriber count was fetched from the Wayback Machine captures.

However, after YouTube began abbreviating public subscriber counts in September 2019, the code could no longer parse Social Blade pages for accurate data.21 As a result, Social Blade URLs were not used in combination with the Wayback Machine if an apology video was uploaded after September 2019.

Since a channel’s front page displayed the subscriber count, channel URLs were entered into the Wayback Machine to determine subscriber counts for various points in time.

If a channel URL did not yield subscriber counts for the periods of time we required, we entered the URLs of the original apology video and of the YouTuber’s other videos into the Wayback Machine to find the capture closest to the date we needed.
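Finding the closest matching capture was done by hand, but the comparison itself is simple to express. The sketch below assumes a list of Wayback capture timestamps in the archive's 14-digit YYYYMMDDhhmmss form; the function name and the sample timestamps are invented for illustration.

```python
from datetime import datetime


def closest_capture(timestamps: list[str], target: datetime) -> str:
    """Pick the Wayback timestamp (YYYYMMDDhhmmss) nearest to the target date."""

    def distance(stamp: str) -> float:
        captured = datetime.strptime(stamp, "%Y%m%d%H%M%S")
        return abs((captured - target).total_seconds())

    return min(timestamps, key=distance)


# Made-up capture timestamps for a single page.
captures = ["20190401120000", "20190512083000", "20190701000000"]
print(closest_capture(captures, datetime(2019, 5, 10)))
```

In practice it is worth recording how far the chosen capture is from the target date, since a capture that is weeks away may describe a very different subscriber count.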


21 Team Youtube, “Early heads up”, 2019.

A capture of a YouTube video page displays information such as the number of views, comments, and the subscriber count we needed. YouTubers whose subscriber counts we obtained from video pages were marked as such in the spreadsheet.

Channels with apologies before/during 2019 that did not have a proper archive on the Wayback Machine were labelled “Not archived” as there was no other way to obtain the information.

Reason for Apology

The reasons for which viewers demanded the YouTuber’s apology video were categorized into the following:

Reason | Description/Examples
Beef | Conflict between the creator and another influencer.
Scamming | Scams involving fans losing money and/or receiving low-quality products.
Animal abuse | Physical harm, improper care.
Child abuse | Abuse including emotional and psychological harm.
Lying/misinformation | General lying, plagiarism.
Exploitative content | Faking a death, recording the deceased, claiming to be part of a marginalized community for fame.
Assault/abuse | Towards a spouse or romantic partner, including physical and sexual abuse.
Racism | Blackface, racial slurs.
Infidelity | Towards a spouse or romantic partner.
Grooming | Includes in-person and online.
Harassment | Doxxing, encouraging fans to target individuals.
Insensitive content | Complaints that viewers considered entitled, mockery, or ignorant.


We determined these categories over the course of our data collection process, adding more as needed.

Subscribers

Current subscribers: All current subscriber counts were taken directly from Social Blade. On a YouTuber’s Social Blade page, visitors can access a section labelled “Live Subscriber Count,” which provides a real-time subscriber count for any YouTube channel, updated every second. Even for channels that had been terminated, the live subscriber count showed the subscriber count they had before termination.

A screenshot of YouTuber PewDiePie’s live subscriber count page.

Subscriber Count Over Time: The subscriber count of the YouTuber’s main channel was recorded at different intervals, ranging from one day before the apology was uploaded, to six months later. This was obtained from Wayback captures of Social Blade and of the YouTuber’s channel page.

Screenshot of the subscriber portion of our spreadsheet.

Results

The daily subscriber count and total subscriber count were plotted with Matplotlib’s pyplot module in Python (see Figures 1 and 2). Due to YouTube’s changes, these graphs only show the statistics for apologies posted prior to September 2019, which account for 36 of the 80 apologies in total. Labels were added to highlight the apologies with the most drastic increases and decreases in subscriber counts.

Figure 1. 

The Impact of an Apology on Daily Subscriber Changes (36 YouTubers)

Figure 2. 

The Impact of an Apology on Total Subscriber Counts (36 YouTubers)

Although the data in Figure 1 were generally in the same range, Figure 2, which plotted the percent change in channel subscribers with respect to the channel’s total subscriber count, had two main outliers: Sam and Nia, and Gabriel Zamora. Zamora nearly doubled his total subscriber count in one day while Sam and Nia saw close to a 30% increase in their subscriber count a few weeks before their apology. This increase was due to a video posted on August 5 announcing Nia’s pregnancy which went viral, along with another video posted on August 8 sharing news of Nia’s miscarriage.22 Although Sam and Nia had the second largest gain in subscribers in percentage change, the actual number of subscribers they gained in one day (53,080) pales in comparison to Gabriel Zamora or Jake Paul’s highest daily subscriber changes (204,116 and 122,845, respectively). 

To give a clearer view of the other 34 apologies, which were compressed in Figure 2 because the vertical scale had to accommodate the outliers, we excluded the two outliers in Figure 3. The resulting graph shows variation that the previous vertical scale had obscured.


22 Stephanie McNeal and Rachel Zarrell, “Doctors Cast Doubt On Viral Video Stars Sam And Nia’s Pregnancy Claims,” BuzzFeed News, August 12, 2015, https://www.buzzfeednews.com/article/stephaniemcneal/people-are-doubting-vloggers-sam-nias-viral-pregnancy-announ.

Figure 3. 

The Impact of an Apology on Total Subscriber Counts (34 YouTubers)

While the differences between Figures 2 and 3 led us to believe that percent change would be a better indicator of the apology’s reception, its usefulness depended on the size of the channel. For smaller YouTube channels, percent change had the advantage of showing how large a small absolute loss was relative to the channel’s total subscriber count. For example, in Figure 4, Brad Sousa lost 3,403 subscribers on the day of his apology, which was nearly a 3% decrease in total subscribers and the worst percent change among lifestyle YouTubers. This change was not properly highlighted in the graph representing daily subscriber change (see Figure 5), as the amount he lost was tiny in comparison to that of fellow lifestyle YouTuber KSI, who lost the most subscribers (33,000) among his peers. However, where percent change spotlights Sousa, it understates the number of subscribers KSI lost, as 33,000 subscribers represented less than 0.16% of his twenty million subscribers at the time.
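The trade-off between the two measures can be illustrated with a short sketch. The channel sizes and losses below are hypothetical round numbers chosen only to show the effect, and the function name is our own; they are not values from our dataset.

```python
def percent_change(daily_change: int, total_before: int) -> float:
    """Daily subscriber change as a percentage of the channel's prior total."""
    return 100.0 * daily_change / total_before


# Hypothetical channels: (daily subscriber change, total subscribers before).
channels = {
    "small channel": (-3_000, 100_000),
    "large channel": (-30_000, 20_000_000),
}

# Rank the same two channels by each measure.
worst_absolute = min(channels, key=lambda name: channels[name][0])
worst_percent = min(channels, key=lambda name: percent_change(*channels[name]))

for name, (change, total) in channels.items():
    print(f"{name}: {change:+,} subscribers, {percent_change(change, total):+.2f}%")
print("Largest absolute loss:", worst_absolute)
print("Largest percent loss:", worst_percent)
```

The large channel loses ten times as many subscribers, yet the small channel's loss is twenty times larger in percentage terms, which is why neither graph tells the whole story on its own.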

Figure 4. 

The Impact of an Apology from Lifestyle YouTubers on Daily Subscriber Changes

Figure 5. 

The Impact of an Apology from Lifestyle YouTubers on Total Subscriber Counts

Although graphs using percent change and daily subscriber counts each had limitations as a cohesive visualization, plotting data from different apologies along a longer timeframe yielded interesting results. In the sample of 34 channels in Figure 3, many YouTubers lost subscribers sharply in the days following their apology. However, on average, channels still saw monthly growth in subscribers despite their controversies. For apologies posted after September 2019, subscriber counts were gathered at set intervals, including one day before the apology and six months afterwards. All apologies with a subscriber count for those two dates were plotted in Figure 6, which includes apologies posted before and after 2019.

Figure 6.

Apology Reception and Subscriber Difference after 6 Months (43 Apologies) 

The majority of the YouTubers fell within the range of losing 300,000 subscribers to gaining 600,000 subscribers six months after the apology. There was, however, a notable difference in apology dates for subscriber changes outside of that range. Except for Linus Tech Tips, all YouTubers who gained over a million subscribers six months after the apology posted their apologies before September 2019. Notably, both Jake Paul and PewDiePie posted their apologies in 2017 and were the most successful channels following their apologies, gaining 6.4 million and 4.2 million subscribers, respectively. On the other hand, three of the five YouTubers who lost more than 300,000 subscribers posted their apologies after 2019. This could be attributed to the rise of cancel culture in 2019, in which influencers could quickly be de-platformed by the online community for mistakes that earlier influencers may have been able to get away with. An article from Business Insider published in September 2019 detailed this rise in cancel culture and how it quickly destroyed the careers of influencers who were not prepared for it.23

Regardless of the year, there is a strong correlation between the percentage of video likes and overall subscriber impact. This speaks to the importance of how a controversy is addressed: an apology with more dislikes can be interpreted as having harmed brand perception and audience trust. Viewers who do not feel that a YouTuber’s apology is genuine are more likely to unsubscribe.

Likes Percentage and Apology Duration

The majority of apologies that were well-liked by viewers were still publicly available on YouTube (see Figure 7). Of the 46 apologies that exceeded 50% in video likes, 78% (36 apologies) were still publicly available in their original form. When unlisted videos are included, which remain on YouTube but are only accessible through the original video link, the percentage of available apologies increases to 85%. For apologies with more dislikes than likes, only 33% (8 apologies) were publicly available, or 38% when including unlisted videos.
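The two percentages used in this section can be sketched as follows. We assume here that the likes percentage is the share of likes among all likes and dislikes; the like and dislike counts in the example are hypothetical, while the 36-of-46 figure comes from the paragraph above.

```python
def likes_percentage(likes: int, dislikes: int) -> float:
    """Share of likes among all reactions (assumed definition)."""
    total = likes + dislikes
    if total == 0:
        raise ValueError("video has no likes or dislikes")
    return 100.0 * likes / total


def share(part: int, whole: int) -> float:
    """Percentage of a group that meets some condition."""
    return 100.0 * part / whole


# Hypothetical reaction counts for a single apology video.
print(f"{likes_percentage(9_000, 1_000):.1f}% likes")

# Well-liked apologies still public in their original form (36 of 46).
print(f"{share(36, 46):.0f}% still public")
```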


23 Lindsay Dodgson, “YouTubers are calling out the platform's 'cancel culture' that subjects them to a rampant hate mob and sees them lose thousands of subscribers in a matter of hours,” Business Insider, September 28, 2019, https://www.insider.com/cancel-culture-what-it-means-creators-on-youtube-2019-9.

Figure 7.

The Impact of Duration on Apology Reception by Video Status (70 Apologies)

The length of an apology was also found to be indicative of its overall reception. Apologies with less than 50% in video likes had an average length of 8:47 minutes, whereas the average length of well-liked videos was roughly double that, at 19 minutes. Even when the median is used to limit the influence of outliers, such as Lindsay Ellis’ nearly two-hour-long apology, disliked apologies had a median length of 5:49 minutes compared to 10:36 minutes for apologies with over 50% in video likes. In both cases, well-liked apologies were approximately double the length of their counterparts. In general, apologies longer than 30 minutes were far more likely to be well-liked: of the 11 apologies above that threshold, 10 had more likes than dislikes, and half of those had a likes percentage above 95%.
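The difference between the average and the median can be seen in a small sketch. The durations below are hypothetical and include one very long video to stand in for an outlier; the helper names are our own.

```python
from statistics import mean, median


def to_seconds(stamp: str) -> int:
    """Convert 'm:ss' or 'h:mm:ss' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def to_stamp(seconds: float) -> str:
    """Format a number of seconds as 'm:ss' (minutes are not capped at 59)."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical apology lengths; the last one plays the role of an outlier.
durations = ["4:30", "8:15", "9:40", "12:00", "1:55:00"]
in_seconds = [to_seconds(d) for d in durations]

print("mean:", to_stamp(mean(in_seconds)))
print("median:", to_stamp(median(in_seconds)))
```

A single two-hour video pulls the mean far above the typical length, while the median stays close to the middle of the group, which is why both figures are reported above.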

Channel Types and Reasons for Apology 

The types of YouTube channels behind the apologies we analyzed predominantly represented gaming, lifestyle, commentary, beauty, and family channels (see Figure 8). Different genres of YouTube channels attract specialized audiences so we anticipated trends would appear amongst specific audiences and apology reasons. 

Figure 8.

Apology by YouTuber Channel Type

There were several trends in reasons for apologies for each channel type (see Figure 9). Apologies for animal abuse and child abuse were exclusively made by lifestyle and family channels, respectively. Lifestyle YouTubers apologized for nearly every reason available and made up most of the apologies for exploitative content, infidelity, and insensitive content. Gaming channels were disproportionately represented in apologies for grooming and scamming, whereas commentary channels had a large presence in apologies for misinformation and racism.


Figure 9.

Apology Reasons by YouTube Channel Type

Although we expected strong correlations between channel type, apology reason, and apology reception, the results were inconclusive. Several patterns emerged but they were not absolute, such as a favourable reception for apologies by commentary channels and for apologies addressing misinformation (see Figures 10 and 11). While some YouTubers who apologized for infidelity or exploitative content regained the favour of their audiences, such apologies were often ill-received. Apologies for animal abuse were the most divisive, with the widest spread in likes percentage of any apology reason: Brooke Houts received the second lowest likes percentage (3.73%) for her apology while Jenna Marbles received the highest percentage of likes in the dataset (98.92%).

Figure 10.

The Impact of Duration on Apology Reception by Channel Type (70 Apologies)

Figure 11.

The Impact of Duration on Apology Reception by Apology Reason (70 Apologies)

Topic Modelling

We used Gensim to experiment with topic modelling. The main variables to manipulate were the number of topics, the number of keywords associated with each topic, and the number of passes (times that the program read over the corpus).

Using NLTK’s list of stopwords, we cleaned up a corpus containing the text of all of the apologies. We started with around 20 topics, steadily decreasing the number to 10 over successive runs. We also decreased the number of keywords and increased the number of passes.
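The cleaning step can be sketched roughly as below. To keep the example self-contained, it uses a tiny stand-in stopword set rather than NLTK's full English list, and the two transcripts are invented sentences; neither is drawn from our corpus.

```python
import re

# Tiny stand-in stopword set (an assumption for this sketch). The full list
# is available in NLTK via nltk.corpus.stopwords.words("english").
STOPWORDS = {"i", "am", "is", "so", "for", "the", "and", "to", "a", "it", "this"}


def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def clean(text: str, stopwords=STOPWORDS) -> list:
    """Tokenize a transcript and drop stopwords."""
    return [token for token in tokenize(text) if token not in stopwords]


# Invented transcripts, for illustration only.
transcripts = [
    "I am so sorry for the video.",
    "This is the situation and I want to explain it.",
]
corpus = [clean(t) for t in transcripts]
print(corpus)
```

The result is a list of token lists, one per apology, which is the shape that Gensim's `Dictionary` and bag-of-words conversion expect as input.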

After running it 50 times for 10 topics with 10 words each, we ended up with the following words:

Topic 1: want, made, could, videos, okay, part, wanted, started, hope, went

Topic 2: going, video, time, get, ever, understand, situation, came, two, everyone

Topic 3: would, even, way, make, see, wrong, years, let, long, else

Topic 4: things, never, something, feel, day, mean, much, friends, anything, us

Topic 5: like, one, sorry, right, everything, life, game, thought, little, always

Topic 6: say, think, better, twitter, maybe, shit, already, true, happen, conversation

Topic 7: thing, back, fucking, good, happened, last, talking, work, sure, give

Topic 8: know, lot, kind, also, take, trying, every, many, love, today

Topic 9: really, guys, said, saying, go, still, first, bad, call, thank

Topic 10: people, got, actually, person, stuff, done, yeah, someone, need, making

Very few of these topics contained apologetic words such as “sorry” or “wrong”. However, certain groupings of words give some idea of the subject of the video. For example, in topic 6, there is some indication of regret with the word “better” appearing alongside words such as “say” or “think”. The words “twitter” and “conversation” could mean that the influencer is responding to backlash on X (formerly known as Twitter).

Future Development

Due to the size of this project and time constraints, there is much room for future development.

Analysis

Since a majority of our time was dedicated to compiling the data, the next step would be to apply analytical tools to it. This could go in several directions, including but not limited to:

  • Sentiment analysis: This could include analysis of the apologies themselves, or be extended to news coverage/videos about them.

  • Topic modelling: This can be used to see any emerging topics between different creators who have made apology videos and examine relations to factors such as the reason for the apology or the influencer’s genre. 

  • Textual analysis: Tools such as frequency distributions could be used to identify similar language and find characteristics of apology videos that are perceived as “good” or “bad”.

  • Visuals: This can include different non-verbal cues that indicate sentiments such as tears or body movements. Furthermore, we could examine different aspects of the video’s production, such as lighting, location, and the number of cuts, which were included in Choi and Mitchell’s analysis.24

  • Effects on creator growth: This could involve a closer analysis of the increase or decrease of a creator’s subscribers/views, which could potentially be related to different factors such as the reason for the apology, the creator’s race, or gender.
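As one possible starting point for the textual analysis direction listed above, a frequency distribution can be built with Python's standard library alone. The token lists below are invented for illustration and are not taken from our transcripts; the grouping into well-received and poorly received apologies is likewise hypothetical.

```python
from collections import Counter


def frequency_distribution(tokens):
    """Count how often each token appears."""
    return Counter(tokens)


# Invented tokens from two hypothetical groups of apologies.
well_received = ["sorry", "responsibility", "sorry", "learn", "harm"]
poorly_received = ["sorry", "haters", "drama", "drama", "clickbait"]

fd_good = frequency_distribution(well_received)
fd_bad = frequency_distribution(poorly_received)

# Words both groups use, and words that appear in only one of them.
shared = set(fd_good) & set(fd_bad)
only_good = set(fd_good) - set(fd_bad)

print(fd_good.most_common(1), fd_bad.most_common(1))
print(sorted(shared), sorted(only_good))
```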


24 Grace Choi and Ann Marie Mitchell, “So Sorry, Now Please Watch,” 2022.

Visualization

An interactive visualization of our analysis results would be engaging for viewers and would present interpretations of the data in a more approachable way.

We looked to other visualizations for features that we would ideally incorporate into our own work. For example, Linked Jazz25 allows viewers to sort by specific individuals, and click on mentions of other artists to view them in context. 

A screenshot of Linked Jazz allowing the user to view all of Toshiko Akiyoshi’s connections, as well as view them in context.

The YouTube apology visualization created by Kakkar and Samora for The Pudding26 compares apologies on a spectrum, with specific numbers viewable by hovering over the influencers’ images.


25 Semantic Lab at Pratt, “Linked Jazz,” n.d., data visualization, https://linkedjazz.org/.

26 Arjun Kakkar and Russell Samora, “The Aftermath of a YouTube Apology,” data visualization, 2020.

Screenshot of the Pudding visualization.

A visualization could be created in a variety of ways, but we would most likely begin with a prototype in Figma.

Conclusion

This project ended up very different from where it began. Obstacles such as the lack of an existing database and of available data forced us to change our approach several times, and we eventually spent most of our time collecting and organizing the data. As a result, we could not perform the analysis that we had wanted. Still, from the data we were able to analyze, there were some interesting trends. For instance, the length of an apology was correlated with its success, perhaps because audiences viewed a longer apology video as more comprehensive and sincere.

The work detailed in this paper can be used by other researchers for projects that require recovering deleted online data.

Bibliography

Anarios. “Return YouTube Dislike (Version 3.0.0.8).” Google Chrome Web Store. 2023. Browser extension. https://chrome.google.com/webstore/detail/return-youtube-dislike/gebbhagfogifgggkldgodflihgfeippi

Asarch, S. “Power Ranking: The 10 most famous influencers on the internet.” Insider Inc. February 21, 2021. https://www.insider.com/most-famous-influencers-insider-data-logan-paul-pewdiepie-2021-2.

Bird, S., Klein, E., & Loper, E. Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc. 2009.

Choi, G., and Mitchell, A. M. “So Sorry, Now Please Watch: Identifying Image Repair Strategies, Sincerity and Forgiveness in YouTubers’ Apology Videos.” Public Relations Review 48, no. 4 (2022). https://doi.org/10.1016/j.pubrev.2022.102226.

Chun, W. H. K. Updating to Remain the Same: Habitual New Media. 1st ed. Cambridge: MIT Press, 2016. https://doi.org/10.7551/mitpress/10483.001.0001.

Cifor, M., Garcia, P., Cowan, T.L., Rault, J., Sutherland, T., Chan, A., Rode, J., Hoffmann, A.L., Salehi, N., Nakamura, L. “Feminist Data Manifest-No.” 2019. https://www.manifestno.com/.

Dodgson, L. “YouTubers are calling out the platform's 'cancel culture' that subjects them to a rampant hate mob and sees them lose thousands of subscribers in a matter of hours.” Business Insider. September 28, 2019. https://www.insider.com/cancel-culture-what-it-means-creators-on-youtube-2019-9.

Harris, C.R., Millman, K.J., van der Walt, S.J. et al. “Array programming with NumPy.” Nature 585, 2020. 357–362. https://doi.org/10.1038/s41586-020-2649-2.

Hund, E. The Influencer Industry: The Quest for Authenticity on Social Media. Princeton University Press, 2023.

Hunter, J.D. “Matplotlib: A 2D Graphics Environment.” Computing in Science & Engineering 9, no. 3 (2007): 90-95.

Kakkar, A., and Samora, R. “The Aftermath of a YouTube Apology.” The Pudding. January 2020. Data visualization.  https://pudding.cool/2020/01/apology/.

Karlsson, G. “The YouTube Apology.” Master’s thesis, Malmö University, 2020. Digitala Vetenskapliga Arkivet (diva2:1483089). https://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-23129

Makalintal, B. “How YouTubers Turned the Apology Video into a Genre.” VICE. June 18, 2019. https://www.vice.com/en/article/how-youtubers-james-charles-jaclyn-hill-pewdiepie-turned-the-apology-video-into-a-genre/

McNeal, S., and Zarrell, R. “Doctors Cast Doubt On Viral Video Stars Sam And Nia’s Pregnancy Claims.” BuzzFeed News. August 12, 2015. https://www.buzzfeednews.com/article/stephaniemcneal/people-are-doubting-vloggers-sam-nias-viral-pregnancy-announ.

The Pandas Development Team (2020). pandas-dev/pandas: Pandas (Version 1.5). Zenodo. Python library. https://doi.org/10.5281/zenodo.7794821.

Public Health Ontario. “Collecting Information on Ethnic Origin, Race, Income, Household Size, and Language Data: A Resource for Data Collectors.” 2021. https://www.publichealthontario.ca/-/media/documents/ncov/he/2021/03/aag-race-ethnicity-income-language-data-collection.pdf?la=en.

Python Software Foundation. “Python Language Reference (Version 2.7).” http://www.python.org.

Radford, A. and Kim, J. W.  “Robust speech recognition via large-scale weak supervision.” GitHub. 2022. https://github.com/openai/whisper.

Rehurek, R., and Sojka, P. “Gensim: Topic modelling for humans.” Python library. 2011. https://radimrehurek.com/gensim/.

Semantic Lab at Pratt, “Linked Jazz.” n.d. Data visualization. https://linkedjazz.org/.

Statistics Canada. “Visible Minority and Population Group Reference Guide, Census of Population, 2016.” 2017. https://www12.statcan.gc.ca/census-recensement/2016/ref/guides/006/98-500-x2016006-eng.cfm.

Sutherland, T.  “Remains.” In Uncertain Archives: Critical Keywords for Big Data, edited by Nanna Bonde Thylstrup, Daniela Agostinho, Annie Ring, Catherine D'Ignazio, and Kristin Veel (The MIT Press: 2021) 433-442.

Team Youtube. “Early heads up: abbreviated public subscriber counts across YouTube.” YouTube Help. May 21, 2019. https://support.google.com/youtube/thread/6543166/

Tolbert, A., and Drogos, K. “Tweens’ Wishful Identification and Parasocial Relationships With YouTubers.” Frontiers in Psychology 10 (2019). https://doi.org/10.3389/fpsyg.2019.02781.

White, C. “Youtuber Apology Tier List.” penguinz0. December 19, 2020. Entertainment video. https://www.youtube.com/watch?v=Eq72iEjcU4w.

Wikitubia. “Home.” Fandom. n.d. https://youtube.fandom.com/wiki/YouTube_Wiki.

The YouTube Team. “An update to dislikes on YouTube.” YouTube Official Blog. November 10, 2021. https://blog.youtube/news-and-events/update-to-youtube/.

Author Biographies: Susane (she/her), Jingyi (she/they), and Neluka (she/her) met at the University of Guelph while completing their undergraduate degrees in Culture and Technology Studies. Though each has their own individual research interests, they are bonded by their mutual fascination with internet pop culture and love for classic Barbie animated movies.

Affiliations:

Susane - University of Guelph, Culture and Technology Studies

Neluka - University of Guelph, Culture and Technology Studies

Jingyi - University of Guelph, Culture and Technology Studies + Arts and Sciences
