SDG 13, which addresses the need for urgent action to combat climate change and its impacts, has the highest number of tweets assigned to it and is therefore the most addressed SDG.
SDG 13 also receives the highest number of retweets.
After applying Topic Modelling and discovering that Climate Change was the most addressed topic, we wanted to go a step further and check whether we could link each tweet to a specific SDG. To accomplish this, we trained a BERT-based classifier. BERT is a pre-trained language model that can be fine-tuned to assign categories to texts based on the words they contain and the context in which those words appear. For example, if a tweet contains words such as "SDGs", "climate change" or "climate action", the classifier will tend to link it to SDG 13.
The first step was to train the model so that it learns which words and phrases are associated with each category. Once the model was trained, we applied it to our dataset and obtained the classification shown in Figure 1. As can be observed, SDG 13 has the highest number of assigned tweets. We therefore conclude that Climate Action is the most discussed Goal in our set of documents, which matches the results of the Topic Modelling.
With every tweet assigned to an SDG, we then selected only the retweets. Retweets are marked as “RT @UserName” in our data, which allows us to distinguish them from normal tweets. Once the retweets were selected, we extracted the names of the users who retweeted and the tweets they retweeted, and used them to create a network. The goal of this process is to study and unveil the connections between the users within our set of documents.
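The retweet selection described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the record layout (a pair of retweeting user and tweet text), the function name `extract_retweet_edges` and the sample tweets are all assumptions made for the example. The only detail taken from the text is that retweets begin with the “RT @UserName” marker.

```python
import re

# Assumption: a retweet starts with "RT @UserName", optionally followed by a
# colon, and then the text of the original tweet.
RT_PATTERN = re.compile(r"^RT @(\w+):?\s*(.*)$", re.DOTALL)


def extract_retweet_edges(records):
    """Return (retweeting_user, retweeted_text) pairs for retweets only.

    `records` is an iterable of (user, text) pairs; this layout is
    hypothetical and stands in for whatever the real dataset provides.
    """
    edges = []
    for user, text in records:
        match = RT_PATTERN.match(text.strip())
        if match:
            # group(1) is the original author, group(2) the original text.
            edges.append((user, match.group(2).strip()))
    return edges


# Made-up sample data for illustration.
sample = [
    ("alice", "RT @UNClimate: Climate action cannot wait."),
    ("bob", "Our new report on clean water is out."),
    ("carol", "RT @UNClimate: Climate action cannot wait."),
]
edges = extract_retweet_edges(sample)
```

Each resulting pair links a user to a retweeted tweet, which is the kind of edge a user-tweet network can be built from. Normal tweets, such as the second sample record, are dropped.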
This network had a total of 12,144 nodes (7,916 users and 4,228 tweets). To provide a clean visualisation, we removed the users with a low number of connections. The resulting graph contains 2,261 nodes. In Figure 2, the small blue nodes correspond to users and the larger coloured nodes are the tweets, with each colour representing an SDG. The network shows that tweets about SDG 13 (green) are the most retweeted ones, so SDG 13 is not only the Goal that users talk about the most but also the one that is retweeted the most.
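A user-tweet network of this kind, and the pruning of weakly connected users, could be sketched as below. This is an illustrative sketch under stated assumptions, not the study's implementation: the degree threshold `min_degree`, the helper names and the toy edges and labels are hypothetical, and the text does not state which cut-off was actually used.

```python
from collections import defaultdict


def build_user_graph(edges):
    """Map each user to the set of tweets they retweeted."""
    user_to_tweets = defaultdict(set)
    for user, tweet in edges:
        user_to_tweets[user].add(tweet)
    return dict(user_to_tweets)


def prune_low_degree_users(user_to_tweets, min_degree):
    """Keep users with at least `min_degree` connections (hypothetical cut-off).

    Returns the kept users and, for each tweet still connected, the set of
    kept users who retweeted it.
    """
    kept_users = {
        user: tweets
        for user, tweets in user_to_tweets.items()
        if len(tweets) >= min_degree
    }
    tweet_to_users = defaultdict(set)
    for user, tweets in kept_users.items():
        for tweet in tweets:
            tweet_to_users[tweet].add(user)
    return kept_users, dict(tweet_to_users)


def most_retweeted_label(tweet_to_users, tweet_labels):
    """Sum retweet edges per SDG label and return the label with the most."""
    counts = defaultdict(int)
    for tweet, users in tweet_to_users.items():
        counts[tweet_labels[tweet]] += len(users)
    return max(counts, key=counts.get)


# Made-up edges (user, tweet id) and SDG labels for illustration.
toy_edges = [
    ("alice", "t1"), ("alice", "t2"), ("bob", "t1"),
    ("carol", "t3"), ("dave", "t1"), ("dave", "t3"),
]
toy_labels = {"t1": "SDG 13", "t2": "SDG 6", "t3": "SDG 5"}

kept_users, kept_tweets = prune_low_degree_users(
    build_user_graph(toy_edges), min_degree=2
)
node_count = len(kept_users) + len(kept_tweets)
top_label = most_retweeted_label(kept_tweets, toy_labels)
```

In this toy example, users with a single connection are removed before the per-SDG retweet counts are compared. A graph library could replace the plain dictionaries when layout and drawing are needed.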
Classification with BERT is a recent technique that performs well on text classification tasks. Once more, this study shows how such innovative approaches can provide useful insights into social media trends that can inform policymaking, and how they can open up further avenues for research.