How are tweets distributed for each Sustainable Development Goal within the Citizen Science Twitter community?

by | Jul 8, 2022 | Graphical article, Highlighted, Impact & Reach, Internal Content

In this article, we analyse the links between the tweets in our dataset and the SDGs by means of a classification algorithm. We also present the network of retweets for each SDG.

SDG 13, which addresses the need for urgent action to combat climate change and its impacts, has the highest number of tweets assigned to it and is therefore the most addressed SDG.

SDG 13 also receives the highest number of retweets.

After applying Topic Modelling and discovering that Climate Change was the most addressed topic, we wanted to go a step further and check whether we could link each tweet to a specific SDG. To accomplish this, we trained a BERT-based classifier. BERT is a pre-trained language model that can be fine-tuned to assign categories to texts according to the words they contain and the context in which those words appear. For example, if a tweet contains terms such as SDGs, climate change or climate action, the classifier will link it to SDG 13.

The first step is training the classifier, which teaches it which words and phrases are associated with each category. Once the model was trained, we applied it to our dataset and obtained the classification shown in Figure 1. As can be observed, SDG 13 has the highest number of assigned tweets. This result leads us to conclude that Climate Action is the most discussed Goal in our set of documents, which matches the results from the Topic Modelling.
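As a minimal sketch of the counting step behind a chart like Figure 1, the snippet below tallies how many tweets were assigned to each SDG. The label list is made-up illustrative data, not our results, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical classifier output: one predicted SDG label per tweet.
# These values are illustrative only and are not taken from the study.
predicted_labels = ["SDG13", "SDG4", "SDG13", "SDG3", "SDG13", "SDG4"]

# Count how many tweets were assigned to each SDG.
counts = Counter(predicted_labels)

# The most addressed SDG is the label with the largest count.
top_sdg, top_count = counts.most_common(1)[0]
print(top_sdg, top_count)
```

The `counts` mapping is what a bar chart of tweets per SDG would be drawn from.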

Figure 1: Tweets assigned to each SDG.

With every tweet assigned to an SDG, we then selected only the retweets. Retweets are marked as “RT @UserName” in our data, which allows us to distinguish them from normal tweets. Once the retweets were selected, we extracted the names of the users who retweeted and the tweets they retweeted in order to build a network. The goal of this process is to study and unveil the connections between the users within our set of documents.
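The retweet selection described above could be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the `(user, text)` record layout, the regular expression and the function name `build_retweet_edges` are all hypothetical, and the sample tweets are invented.

```python
import re

# Matches the "RT @UserName" prefix that marks a retweet.
# Assumption: the prefix sits at the very start of the tweet text.
RT_PATTERN = re.compile(r"^RT @(\w+):?\s*(.*)$", re.DOTALL)

def build_retweet_edges(records):
    """Return (retweeting_user, retweeted_text) edges from (user, text) records."""
    edges = []
    for user, text in records:
        match = RT_PATTERN.match(text)
        if match is None:
            continue  # a normal tweet, not a retweet
        edges.append((user, match.group(2).strip()))
    return edges

# Invented sample data for illustration.
sample = [
    ("alice", "RT @bob: Climate action now"),
    ("carol", "RT @bob: Climate action now"),
    ("dave", "Just a normal tweet"),
]
edges = build_retweet_edges(sample)

# The network has two kinds of nodes: users and retweeted tweets.
users = {user for user, _ in edges}
tweets = {tweet for _, tweet in edges}
print(len(users), len(tweets))
```

Each edge links a user to a tweet they retweeted, so two users who retweet the same text end up connected through the same tweet node.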

This network had a total of 12,144 nodes (7,916 users and 4,228 tweets). To provide a clean visualisation, we removed the users with a low number of connections. The resulting graph contains 2,261 nodes. In Figure 2, the small blue nodes correspond to users and the larger coloured nodes are the tweets, with each colour representing an SDG. The network shows that tweets about SDG 13 (green) are the most retweeted ones, so SDG 13 is not only the Goal that users talk about most but also the one that is retweeted most.
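The pruning step can be illustrated with a short sketch. The cut-off value is not stated in this article, so `min_degree=2` below is a purely hypothetical threshold, and the edge list and the function name `prune_low_degree_users` are likewise assumptions made for the example.

```python
from collections import Counter

def prune_low_degree_users(edges, min_degree=2):
    """Keep only edges whose user has at least min_degree connections.

    Assumes each (user, tweet) pair appears at most once in edges.
    """
    degree = Counter(user for user, _ in edges)
    return [(user, tweet) for user, tweet in edges if degree[user] >= min_degree]

# Invented (user, tweet) edges for illustration.
edges = [
    ("alice", "tweet_1"), ("alice", "tweet_2"),
    ("bob", "tweet_1"),
    ("carol", "tweet_2"), ("carol", "tweet_3"),
]
pruned = prune_low_degree_users(edges, min_degree=2)

# Nodes that remain in the graph after pruning.
remaining_users = {user for user, _ in pruned}
remaining_tweets = {tweet for _, tweet in pruned}
print(len(remaining_users) + len(remaining_tweets))
```

Here the user with a single connection is dropped, while every tweet that is still retweeted by a remaining user stays in the graph.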

Figure 2: Graph of retweeted tweets with SDG classification by colour.

Classification with BERT is a relatively recent technique that performs well on text classification tasks. Once more, this study shows how such innovative approaches can provide useful insights into social media trends that can inform policy making, and how they can open up further research avenues.
