Working Towards Sentiment Analysis
In our work investigating how people discuss the EU on Twitter, one of our aims is to determine sentiment, both pro- and anti-EU and, in relation to the referendum on UK membership of the EU, pro-remain or pro-leave.
The approach we have taken initially is very straightforward. If tweeters use hashtags associated with the leave camp (including #brexit, #no2eu, #notoeu, #betteroffout, #voteout, #eureform, #britainout, #leaveeu, #voteleave, #beleave and #loveeuropeleaveeu) then we judge their sentiment to be pro-leave. If they use hashtags associated with the remain camp (#yes2eu, #yestoeu, #betteroffin, #votein, #ukineu, #bremain, #strongerin, #leadnotleave and #voteremain) then we judge their sentiment to be pro-remain.
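As a rough illustration, the hashtag rule described above could be sketched in Python as follows. This is a minimal sketch rather than our actual pipeline: the function names are made up for the example, and the "mixed" label for tweets that carry hashtags from both camps is an assumption, since the rule as stated does not say how such tweets are treated.

```python
import re

# Hashtag lists taken from the post (lower-cased, '#' removed).
LEAVE_TAGS = {
    "brexit", "no2eu", "notoeu", "betteroffout", "voteout", "eureform",
    "britainout", "leaveeu", "voteleave", "beleave", "loveeuropeleaveeu",
}
REMAIN_TAGS = {
    "yes2eu", "yestoeu", "betteroffin", "votein", "ukineu", "bremain",
    "strongerin", "leadnotleave", "voteremain",
}

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the set of hashtags in a tweet, lower-cased and without '#'."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def classify(text):
    """Label a tweet 'leave', 'remain', 'mixed' or 'unclassified'.

    'mixed' (hashtags from both camps) is an assumption made for this
    sketch; it is not part of the rule described in the post.
    """
    tags = extract_hashtags(text)
    has_leave = bool(tags & LEAVE_TAGS)
    has_remain = bool(tags & REMAIN_TAGS)
    if has_leave and has_remain:
        return "mixed"
    if has_leave:
        return "leave"
    if has_remain:
        return "remain"
    return "unclassified"
```

Matching is case-insensitive, so #Brexit and #brexit are treated as the same hashtag. A tweet with no campaign hashtag comes back as "unclassified", which is the gap in coverage discussed further down.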
If we do this we get the following results:
https://twitter.com/myimageoftheEU/status/659357133897670656/photo/1
And we can see that these results are fairly consistent day by day:
https://twitter.com/myimageoftheEU/status/662613598703853568/photo/1
But they are not consistent with polling data – such as the ICM tracker below which shows remain with the higher score. So why is that?
https://twitter.com/SamCoatesTimes/status/660791243694333953/photo/1
Firstly, we need to remember that research using Twitter data can only ever tell us what people who use Twitter think, and does not necessarily reflect the population at large. Interestingly, this has been a problem for pollsters trying to include this data in their election prediction polls (Metaxas, Mustafaraj & Gayo-Avello).
Secondly, we also need to remember that, as we discussed in a previous post, people tend to tweet against things rather than for them.
Thirdly, we are currently restricting our analysis to sentiment associated with hashtags. This means that if the tweet doesn’t have a hashtag we are not analysing its sentiment.
If we take a look at the data we can see some differences in the way the two camps use tweets, and hashtags in particular. The leave campaigns tend to use many hashtags (see below for examples from both LEAVE.EU and Vote Leave, especially LEAVE.EU).
Congratulations to Portugal's new anti-austerity government – Brussels Beware https://t.co/YmRhfDRQpD #EUref #Brexit #Austerity
— Leave.EU (@LeaveEUOfficial) November 11, 2015
Leaving the EU 'won't undermine the UK as a financial centre' – Axel Weber, Chairman of UBS #euref #voteleave pic.twitter.com/LW1tqxc9cA
— Vote Leave (@vote_leave) November 10, 2015
LEAVE.EU use hashtags in their Twitter bio, which may encourage followers to use them.
The remain camp do not tend to use as many hashtags in their posts – for example:
https://twitter.com/StrongerIn/status/664143550305722368
Tweets that would be considered pro-remain often also include hashtags we have classified as pro-leave. There could be several reasons for this, such as tweeters positioning themselves within the debate by using a popular hashtag, or trying to talk to those who hold opposing views.
U think being part of EU holds back trade with rest of world?
Think again:https://t.co/qIZEOSqk9p#StrongerIn #Brexit #EUref @euromove
— Richard Corbett (@RichardGCorbett) November 5, 2015
We can see this by looking at the data. When we look at the hashtags that are used in conjunction with #strongerin and #brexit (graphs below), we find, overall, a much lower use of #strongerin, and we find that it is used with leave hashtags, especially #brexit, #leaveEU and #voteLeave. By contrast, #brexit is not used with remain hashtags at all, only with other leave hashtags.
hashtags associated with #strongerin in our data. Much lower use #brexit and often used in with leave hashtags pic.twitter.com/7hpQMuzR7V
— myimageoftheEU (@myimageoftheEU) November 6, 2015
terms associated with #brexit in our dataset pic.twitter.com/YcUbl4PLGe
— myimageoftheEU (@myimageoftheEU) November 6, 2015
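For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of counting which hashtags appear alongside a given hashtag. It is not the code behind the graphs above. The function name is hypothetical, and the sample list simply reuses the three tweets quoted in this post, so the counts are illustrative only.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#(\w+)")

# The three tweets quoted earlier in this post, used only as sample input.
SAMPLE_TWEETS = [
    "Congratulations to Portugal's new anti-austerity government – Brussels Beware "
    "https://t.co/YmRhfDRQpD #EUref #Brexit #Austerity",
    "Leaving the EU 'won't undermine the UK as a financial centre' – Axel Weber, "
    "Chairman of UBS #euref #voteleave pic.twitter.com/LW1tqxc9cA",
    "U think being part of EU holds back trade with rest of world? "
    "Think again:https://t.co/qIZEOSqk9p#StrongerIn #Brexit #EUref @euromove",
]


def cooccurring_hashtags(tweets, target):
    """Count the hashtags that appear in the same tweet as `target`.

    Each tweet contributes at most one count per hashtag, and the
    comparison is case-insensitive.
    """
    target = target.lower().lstrip("#")
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if target in tags:
            counts.update(tags - {target})
    return counts


if __name__ == "__main__":
    print(cooccurring_hashtags(SAMPLE_TWEETS, "#strongerin").most_common())
    print(cooccurring_hashtags(SAMPLE_TWEETS, "#brexit").most_common())
```

Running the same function over a full collection of tweets, once for #strongerin and once for #brexit, would give the raw counts needed for this sort of side-by-side comparison.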
In the future we aim to do a more sophisticated form of sentiment analysis in which we analyse the text within the tweets. This leads on to a final problem that is often discussed in association with sentiment analysis of text, and one that we also see here: identifying the target of the sentiment. The target is the item that the sentiment is expressed towards. In the tweet by Richard Corbett MEP above, the ‘leave’ sentiment is expressed within ‘U think being part of EU holds back trade with rest of world?’ and the ‘remain’ sentiment is expressed within ‘Think again’. Automatically identifying which text a target is associated with is not an easy issue to tackle, and we haven’t even begun to discuss jokes and sarcasm yet!
So, if this data is not representative of the general public, you might be tempted to ask why we are looking at it. What we can look at is how this data changes. If we can identify the differences and track the relationships between Twitter data and more general public opinion, we can start to hypothesise about how changes in the Twitter data relate to public opinion more widely. We’ll talk about this a lot more in the future.
Our project is part of the Economic and Social Research Council’s UK in a Changing Europe programme. Look out for our regular updates as the project tracks developments in the debate on the UK’s continued membership of the EU and follow us @myimageoftheEU.
This article was originally published on the imagineEurope Storify.