Categorisation of Indian Research Publications by Sustainable Development Goals (SDGs): A Machine Learning Approach
DOI:
https://doi.org/10.17821/srels/2024/v61i6/171637
Keywords:
AI, Annif, Machine Learning, Neural Network, SDG Classifiers, Sustainable Development Goals (SDGs)
Abstract
This research explores the feasibility of automatically clustering and categorising Indian research output in terms of the seventeen Sustainable Development Goals (SDGs) proposed by the United Nations (sdgs.un.org/goals). Utilising the OpenAlex database (openalex.org), an extensive open-access bibliographic repository containing over 250 million research objects, this study gathered research publications from India (with at least one Indian author) published between 2016 (the year of inception of the SDG categories) and 2023. OpenAlex’s comprehensive metadata includes SDG data elements for numerous publications, alongside other bibliographic information. The SDG classifications and corresponding accuracy scores (ranging from 0 to 1) in OpenAlex are derived from the Aurora SDG classifier (aurora-universities.eu/sdg-research/). Initially, this study compiled a dataset of 500,000 research publications originating from India within the specified period, of which 278,845 records contained SDG data elements. This primary dataset was divided into two subsets: the All Dataset (ADS), encompassing all 278,845 records, and the High Accuracy Dataset (HAD), consisting of the 204,793 records with an SDG accuracy score of ≥0.5. Both datasets (ADS and HAD) were further segmented into three groups: the training dataset (97% of total data), the validation dataset (1.5% of total data), and the test dataset (1.5% of total data). The training dataset was used to train the locally implemented open-source automated indexing tool, Annif, using various backends, including lexical, associative, and ensemble (simple) methods. The validation dataset was employed to develop an optimal weightage formula for combining results from different backends in an advanced-level neural network backend. One significant advantage of the neural network backend is its capacity for successive training with additional datasets.
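The score threshold and the 97% / 1.5% / 1.5% segmentation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layout (a dictionary with a `score` key), the function names, the random shuffle, and the fixed seed are all assumptions made for illustration, since the abstract does not state how records were assigned to each group.

```python
import random


def filter_high_accuracy(records, threshold=0.5):
    """Keep records whose SDG score is at or above the threshold.

    The 'score' key is a hypothetical field name, not the OpenAlex schema.
    """
    return [r for r in records if r["score"] >= threshold]


def split_dataset(records, train_frac=0.97, valid_frac=0.015, seed=42):
    """Shuffle records and split them into train, validation, and test lists.

    Random assignment is an assumption; the test set receives whatever
    remains after the training and validation shares are taken.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * train_frac)
    n_valid = round(n * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    # Synthetic records standing in for the ADS; scores run from 0.000 to 0.999.
    all_records = [{"id": i, "score": i / 1000} for i in range(1000)]
    high_accuracy = filter_high_accuracy(all_records)
    train, valid, test = split_dataset(all_records)
    print(len(high_accuracy), len(train), len(valid), len(test))
```

Applying the same split function to both the full list and the filtered list would mirror the way ADS and HAD are each divided into three groups.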
It was found that: (i) automatic categorisation of research publications by SDGs is feasible; (ii) the neural network-based backend outperformed other backends, such as SVC, Omikuji, and FastText, in terms of retrieval metrics like F1@5 and NDCG; (iii) lexical models are unsuitable for this purpose, performing poorly in terms of F1@5 and NDCG; and (iv) the neural network-based backend also outperformed other backends for the High Accuracy Dataset.
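For readers unfamiliar with the two retrieval metrics named in the findings, the following Python sketch computes F1@5 and NDCG for a single publication with binary relevance. It is an illustrative definition only: the function names and the example labels are hypothetical, the cutoff `k` for NDCG is an assumption (the abstract reports NDCG without stating one), and Annif's own evaluation may average or truncate differently.

```python
import math


def f1_at_k(ranked, gold, k=5):
    """F1 computed over the top-k suggested labels for one document."""
    top = ranked[:k]
    hits = sum(1 for label in top if label in gold)
    if hits == 0:
        return 0.0
    precision = hits / len(top)   # share of suggestions that are correct
    recall = hits / len(gold)     # share of correct labels that were suggested
    return 2 * precision * recall / (precision + recall)


def ndcg_at_k(ranked, gold, k=5):
    """Normalised discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, label in enumerate(ranked[:k])
        if label in gold
    )
    # Ideal ranking places every correct label at the top of the list.
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical classifier output and manually assigned labels.
    suggested = ["SDG3", "SDG7", "SDG13", "SDG4", "SDG1"]
    assigned = {"SDG3", "SDG13"}
    print(f1_at_k(suggested, assigned), ndcg_at_k(suggested, assigned))
```

F1@5 rewards suggesting the right goals anywhere in the top five, whereas NDCG also rewards placing them near the top of the ranking, which is why the two metrics are usually reported together.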
License
Copyright (c) 2024 Journal of Information and Knowledge
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The copyright of all articles published in Journal of Information and Knowledge is held by the Publisher. Sarada Ranganathan Endowment for Library Science (SRELS), as the publisher, requires its authors to transfer the copyright prior to publication. This permits SRELS to reproduce, publish, distribute, and archive the article in print and electronic form, and to defend against any improper use of the article.