Sentiment Analysis: PeacockTV
Updated: Jul 21, 2020
A note from the author:
Welcome to my first blog post! I've recently started exploring NLP and one of the first topics I came across that piqued my interest was "Sentiment Analysis." On the same day that I read my first blog post on this topic, I also read that NBC would be launching a new streaming platform called Peacock. I was immediately intrigued. Free content? Yes please! I figured this launch would be the perfect opportunity to dip my toes into the realm of NLP and perform a sentiment analysis of my own. I don't have any experience with NLP, so shoutout to all the lovely humans who write blog posts for Towards Data Science and those who contribute to Stack Overflow.
NBC has entered the subscription streaming business with PeacockTV. Because it is the 10th such service to launch in an extremely competitive market, monitoring relevant Twitter content can be incredibly helpful. It's important to detect and subsequently address any issues to provide users with the best experience possible. The user's experience on the platform can have a drastic effect on the streaming service's conversion rate.
Let's take a look at Twitter data from PeacockTV's nationwide launch date, July 15, 2020.
Extracting Data From Twitter
Our dataset consists of tweets posted on July 15, 2020 that contain the keyword 'peacock', excluding retweets.
Cleaning the Data
The process can be broken down into 4 steps:
1. Remove tweets posted by accounts related to NBC, like @peacockTV, @nbc, etc.
We're interested in customer (or potential customer) sentiment.
2. Remove symbols, stopwords and mentions of 'peacock' and 'peacockTV'.
3. Replace emojis with text descriptions.
The emoji descriptions were scraped from: https://unicode.org/emoji/charts/full-emoji-list.html. This analysis checks for and replaces the first ~250 emojis listed on this site. The purpose of this step is to hopefully include the effect of an emoji in the polarity calculation. These days, people are a lot less explicit in terms of using words to express their thoughts and feelings and will instead use emojis.
4. Exclude tweets that contain the word 'animal(s)' or 'bird(s)', while keeping the tweets that also have the words 'nbc', 'stream', 'streaming', 'app', or 'Comcast'.
We will assume that the excluded tweets aren't relevant to PeacockTV.
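The four steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from the repository: the account set, the tiny stopword list, the two-entry emoji table, and all function names are placeholder assumptions (a real run would use a full stopword list and the ~250 scraped emoji descriptions).

```python
import re

# All names and values below are illustrative assumptions, not the author's code.
NBC_ACCOUNTS = {"peacocktv", "nbc"}                       # step 1: accounts to drop
STOPWORDS = {"the", "a", "is", "on", "to", "and", "i", "my"}  # tiny stand-in list
EMOJI_TEXT = {
    "\U0001F602": "face with tears of joy",               # step 3: sample entries only
    "\U0001F621": "pouting face",
}
BRAND = re.compile(r"\bpeacock(?:tv)?(?:'s)?\b", re.IGNORECASE)
OFF_TOPIC = re.compile(r"\b(animals?|birds?)\b", re.IGNORECASE)
ON_TOPIC = re.compile(r"\b(nbc|stream|streaming|app|comcast)\b", re.IGNORECASE)


def is_relevant(text):
    """Step 4: drop animal/bird tweets unless they also mention a service keyword."""
    if OFF_TOPIC.search(text):
        return bool(ON_TOPIC.search(text))
    return True


def clean_text(text):
    """Steps 2 and 3 for a single tweet."""
    # Replace emojis first, so they survive symbol removal as plain words.
    for emoji, description in EMOJI_TEXT.items():
        text = text.replace(emoji, " " + description + " ")
    text = BRAND.sub(" ", text)                 # 'peacock', 'peacockTV', "Peacock's"
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # symbols, digits, punctuation
    words = [w for w in text.split() if w.lower() not in STOPWORDS]
    return " ".join(words)


def clean_tweets(tweets):
    """tweets: iterable of (screen_name, text) pairs. Returns cleaned texts."""
    cleaned = []
    for screen_name, text in tweets:
        if screen_name.lower() in NBC_ACCOUNTS:  # step 1
            continue
        if not is_relevant(text):                # step 4
            continue
        cleaned.append(clean_text(text))
    return cleaned
```

Note that the symbol filter here also strips any non-ASCII letters, which is a simplification that may not suit every dataset.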
This brings us to approximately 10k tweets to analyze.
To get a general feel for what people are tweeting about Peacock, we can prepare a word cloud. This one is in the shape of NBC's PeacockTV logo!
Some words that stand out are 'streaming', 'service', 'Roku', 'new', 'watch', 'NBC', 'app', 'free', and 'TV'.
Let's break it down even further and take a look at the top 20 most common words.
TextBlob Polarity Analysis
To gauge the sentiment of these tweets, we use TextBlob to get polarity scores, where -1 is extremely negative, 0 is neutral, and +1 is extremely positive. We'll categorize the data as follows:
Negative: polarity < 0
Neutral: polarity = 0
Positive: polarity > 0
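As a rough sketch of this categorization: the polarity score is assumed to come from TextBlob(text).sentiment.polarity, and the helper names below are hypothetical. TextBlob is imported inside its own function so the labelling helpers can be used without it.

```python
from collections import Counter


def polarity_of(text):
    """Polarity in [-1, 1] from TextBlob (requires the textblob package)."""
    from textblob import TextBlob  # imported lazily; only needed for scoring
    return TextBlob(text).sentiment.polarity


def label_polarity(polarity):
    """Map a polarity score to the three categories used in this post."""
    if polarity < 0:
        return "Negative"
    if polarity > 0:
        return "Positive"
    return "Neutral"


def category_shares(polarities):
    """Fraction of scores falling in each category."""
    labels = Counter(label_polarity(p) for p in polarities)
    total = sum(labels.values())
    names = ("Negative", "Neutral", "Positive")
    if total == 0:
        return {name: 0.0 for name in names}
    return {name: labels[name] / total for name in names}
```

Calling category_shares on the list of per-tweet scores gives the kind of percentage breakdown reported next.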
Almost 50% of the tweets were categorized as positive, with less than 20% categorized as negative. PeacockTV is off to a great start!
We'll take a closer look at the more positive tweets (polarity > 0.5) and more negative tweets (polarity < -0.5) to see if they shed light on potential issues, possible improvements, or areas where they're doing great.
The most common positive descriptors are 'good', 'best', and 'great'. We also see that 'free' is one of the top words used in highly positive tweets. It's great to see this as one of the top words, since NBC felt that having a free basic level of the service would help the platform break into the streaming service industry. Aside from the word 'free', none of the other words stand out, so we'll move on to highly negative tweets.
The most common negative descriptors are 'terrible' and 'stupid'. Notice that 'Roku' is the second most common word. 'Roku' presumably refers to the digital media player. A second notable word is 'NBCSportsSoccer'.
Let's take a closer look at tweets with the words 'Roku' and 'NBCSportsSoccer'.
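Pulling out such tweets can be done with a case-insensitive keyword filter. The sketch below assumes each tweet is stored as a (text, polarity) pair; the function name and the optional polarity cutoff are illustrative choices, not part of the original analysis code.

```python
def tweets_mentioning(scored_tweets, keyword, max_polarity=None):
    """Return texts containing keyword (case-insensitive).

    scored_tweets: iterable of (text, polarity) pairs.
    max_polarity: if given, keep only tweets with polarity strictly below it,
                  e.g. -0.5 for the highly negative subset.
    """
    needle = keyword.lower()
    hits = []
    for text, polarity in scored_tweets:
        if needle not in text.lower():
            continue
        if max_polarity is not None and polarity >= max_polarity:
            continue
        hits.append(text)
    return hits
```

For example, tweets_mentioning(scored, "Roku", max_polarity=-0.5) would return only the highly negative tweets that mention Roku.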
Sample Roku Tweets:
We see that the word 'Roku' is common in tweets related to PeacockTV because the streaming service isn't available on Roku. NBC and Roku are still negotiating, so all hope isn't lost for Roku users.
Sample NBCSportsSoccer Tweets:
Streaming soccer games appears to be especially frustrating for Peacock users.
The initial response on Twitter to Peacock's launch is very positive, with almost 50% of all tweets about Peacock being positive and less than 20% being negative. The term 'free' is frequently mentioned, suggesting that NBC's theory was correct: that 'free as a bird' would resonate with Americans who are financially strapped during these uncertain times, or who are already subscribed to a lot of streaming services and just don't want to subscribe to another.
In general, Roku is commonly found in tweets about Peacock, presumably about the fact that Peacock isn't available on Roku. Interestingly, Peacock isn't available on Amazon Fire TV either, but Amazon isn't mentioned as frequently. Roku is also frequently mentioned in highly negative tweets, so it isn't just Twitter users pointing out that the service isn't available on Roku or asking why it isn't, but rather expressing a strong negative sentiment towards this unavailability. In terms of building their user base most efficiently, this may be an indicator that completing negotiations with Roku may be more valuable than with Amazon.
Another area Peacock should look into is NBCSportsSoccer. The words/phrases "freezing", "slow" and "isn't loading" suggest there may be a streaming problem. It would be a good idea to check if this is related to streaming of the Premier League in general, or if it's a user-specific technical issue.
Some Final Notes
I had a lot of fun working through this analysis. Honestly, I had a lot of questions going into the project and am left with even more after completing it. It's time to reflect.
Areas that could use some improvement:
Explore AppAuthHandler instead of OAuthHandler. Set count parameter to 100 (I read that the default value is 15). My understanding is that these changes would increase efficiency and could also result in a greater number of tweets retrieved.
Exclude all the words used in the Twitter search. In my analysis, by mistake, I didn't remove them all (i.e. 'TV', 'streaming').
Review and revise all emoji descriptions to ensure an accurate description and/or reflect emotion, where relevant.
Example 1 (inaccurate description): I disagree with the description of 'face with tears of joy' for the second emoji on the list. I think 'laughing face' is more accurate. This is the only emoji description I revised in my analysis.
Example 2 (inaccurate description & reflect emotion): The 22nd emoji on the list is described as 'angry face'. The 23rd emoji on the list is described as 'pouting face'. Descriptors of 'angry' and 'extremely angry', respectively, may result in a more accurate polarity analysis, moving some tweets out of the neutral classification into positive or negative.
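One way to apply revisions like these is to keep the scraped descriptions untouched and layer a small override table on top. The sketch below is an illustration under assumptions: the dictionary and function names are hypothetical, and it assumes 'angry face' corresponds to U+1F620 and 'pouting face' to U+1F621.

```python
# Scraped descriptions (sample of three entries; the real table has ~250).
SCRAPED = {
    "\U0001F602": "face with tears of joy",
    "\U0001F620": "angry face",
    "\U0001F621": "pouting face",
}

# Hand-written revisions following the two examples above.
REVISED = {
    "\U0001F602": "laughing face",
    "\U0001F620": "angry",
    "\U0001F621": "extremely angry",
}


def describe_emojis(text, descriptions, overrides=None):
    """Replace each known emoji with its description; overrides win."""
    merged = {**descriptions, **(overrides or {})}
    for emoji, description in merged.items():
        text = text.replace(emoji, " " + description + " ")
    return " ".join(text.split())  # collapse the extra spaces
```

Keeping the overrides separate makes it easy to compare polarity results with and without the revised descriptions.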
Analyses that would be interesting to investigate:
Mapping where the tweets are coming from at the state level.
Plotting the number of tweets per time period (e.g. 10 min, 30 min, hr, day, etc.). Combined with streaming, this would be great as an interactive plot.
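The counting step behind such a plot could look roughly like this. It is a sketch that assumes each tweet's creation time is available as a datetime object; the function name is hypothetical, and buckets are aligned to midnight, so it only suits bucket sizes of up to one day.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweets_per_bucket(timestamps, minutes=30):
    """Count tweets per time bucket of the given size (in minutes).

    Returns a list of (bucket_start, count) pairs sorted by time.
    """
    counts = Counter()
    for ts in timestamps:
        minute_of_day = ts.hour * 60 + ts.minute
        floored = minute_of_day - minute_of_day % minutes  # round down to bucket start
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        counts[midnight + timedelta(minutes=floored)] += 1
    return sorted(counts.items())
```

The resulting pairs can be passed directly to a plotting library as x and y values.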
Relevant topics I'd like to understand better:
How well does textblob.sentiment.subjectivity function actually perform? Are there better libraries available? How does it compare to libraries available in R?
A better understanding of regular expressions. I faced some difficulties using re.sub to successfully remove specific words (like Peacock's) and symbols.
Python. I use R in my university coursework and this is my first hands-on experience with Python. I hope to find my own style and balance between R and Python.
My code is available at: https://github.com/nicoledear/peacockTVsentiment