The analysis identified 17 topics, the same as the number of SDGs. However, it showed that not all SDGs were discussed, and that SDGs 13 and 11 were present in several topics.
Tweets are distributed nearly equally amongst the 17 topics – except for topic 9 (cluster of retweeted tweets about climate change), which has many more tweets.
To explore the topics discussed on Twitter, we applied topic modelling, a technique that aims to identify patterns and relationships among text documents. For this research, we used Latent Dirichlet Allocation (LDA).
We first filtered our collection of tweets about Citizen Science using keywords related to the SDGs, and then ran the topic modelling process on the filtered tweets. This process typically starts by cleaning the tweets, removing unnecessary elements such as punctuation, hashtags, links and prepositions. We then applied a coherence analysis, which helps to select an appropriate number of topics. In this case the result was 17 topics under discussion in our filtered collection of tweets, and we wanted to check whether each topic corresponded to an SDG and which topics were most addressed by commentators on Twitter.
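The cleaning step described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the analysis: the stop-word list, the regular expressions and the function name clean_tweet are assumptions, and the sketch drops hashtags entirely, although the original pipeline may have kept the hashtag text.

```python
import re

# Hypothetical stop-word list (prepositions and other function words);
# the original analysis does not say which list was used.
STOP_WORDS = {
    "a", "about", "an", "and", "at", "by", "for", "from",
    "in", "is", "of", "on", "the", "to", "with",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # links
HASHTAG_RE = re.compile(r"#\w+")               # whole hashtags (assumption)
NON_LETTER_RE = re.compile(r"[^a-z\s]")        # punctuation, digits, symbols


def clean_tweet(text):
    """Return the lower-case tokens left after cleaning one tweet."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links first, before punctuation
    text = HASHTAG_RE.sub(" ", text)     # remove hashtags
    text = NON_LETTER_RE.sub(" ", text)  # remove everything that is not a letter
    return [tok for tok in text.split() if tok not in STOP_WORDS]


# Invented example tweet, for illustration only.
example = "Join our #CitizenScience project on climate action! https://example.org"
tokens = clean_tweet(example)
print(tokens)
```

The resulting token lists are what a topic model such as LDA would take as input, once they have been converted to a bag-of-words representation.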
In Figure 1, we present a visualisation of the distances between the 17 topics, which shows that some of them overlap since they have words in common. Alongside the intertopic distance we can find the words in each topic and a short definition of the topic.
In order to define the size of these topics, we provide an analysis of the number of tweets connected with each topic. In Figure 2 the distribution of tweets per topic is presented.
The distribution is almost equal across the topics, except for topic 9, in which the SDGs are discussed in a general way and which has many more tweets than the rest. This high number of tweets is due to a user who retweeted a large number of tweets about SDGs and climate change. As previously mentioned, this method makes it possible to detect unusual activity on the platform.
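The check described above (counting tweets per topic, then looking at who is responsible for an oversized topic) could be sketched as follows. The data are invented, and the rule used to flag a topic (more than twice the median count) is an assumption chosen for illustration, not the criterion used in the analysis; the function names are likewise hypothetical.

```python
from collections import Counter
from statistics import median


def tweets_per_topic(dominant_topics):
    """Count how many tweets were assigned to each topic."""
    return Counter(dominant_topics)


def unusually_large_topics(counts, factor=2.0):
    """Return topics whose tweet count exceeds factor times the median count.

    The factor of 2 is an arbitrary illustrative threshold.
    """
    typical = median(counts.values())
    return sorted(t for t, n in counts.items() if n > factor * typical)


def top_author_share(authors):
    """Return the most active author in a topic and their share of its tweets."""
    author, n = Counter(authors).most_common(1)[0]
    return author, n / len(authors)


# Toy data: the dominant topic of each tweet (invented numbers).
assignments = [1] * 10 + [2] * 12 + [3] * 11 + [9] * 40
counts = tweets_per_topic(assignments)
flagged = unusually_large_topics(counts)

# Toy data: the authors of the tweets in the flagged topic (invented names).
topic_9_authors = ["user_a"] * 30 + ["user_b"] * 6 + ["user_c"] * 4
author, share = top_author_share(topic_9_authors)
print(flagged, author, share)
```

A topic that is both much larger than the others and dominated by one account is a candidate for the kind of unusual retweeting activity described here.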
After analysing all the topics, we can conclude that climate change is the most addressed subject in our data, with topics 1, 3, 4, 5, 6, 9, 12, 13 and 15 directly addressing this theme. Furthermore, the keywords are clearly linked to SDGs 13 and 11, as they address the need to move towards healthy and sustainable cities and communities. The lists of keywords also reflect several other SDGs, for example SDG 3 (Good health and well-being), SDG 15 (Life on land), SDG 12 (Responsible consumption and production) and SDG 7 (Affordable and clean energy).
Thanks to this technique, data scientists involved in Citizen Science can identify the most relevant topics within their documents and provide insights for academic research. It can also be used by policy makers who wish to know what is relevant to the Citizen Science community.
Annex
Click here to view the keywords of each topic.