Automatic Indexing for Agriculture: Designing a Framework by Deploying Agrovoc, Agris and Annif

Mustak Ahmed

doi:10.17821/srels/2023/v60i2/170966

Authors

Mustak Ahmed
SRF, Department of Library and Information Science, Kalyani University, WB

DOI:

https://doi.org/10.17821/srels/2023/v60i2/170966

Keywords:

Agriculture, Annif, Automatic Subject Indexing, Ensemble, Neural Network, Openrefine, Subject Indexing

Abstract

Machine learning can be applied to automate subject indexing in several ways. One popular strategy is supervised learning: a model is trained on a set of documents that have already been manually indexed with subject descriptors drawn from a standard vocabulary. The trained model can then predict the subjects of new, previously unseen documents by applying the patterns it learned from the training data. The first step is to gather a large dataset of documents, each manually assigned a set of subject keywords/descriptors from a controlled vocabulary (e.g., Agrovoc). Next, the dataset (here obtained from Agris) is divided into i) a training dataset and ii) a test dataset. The training dataset is used to train the model, while the test dataset is used to evaluate the model's performance. Machine learning can thus be a powerful tool for automating subject indexing. This research attempts to apply Annif (http://annif.org/), an open-source AI/ML framework, to autogenerate subject keywords/descriptors for documentary resources in the domain of agriculture. The training dataset is obtained from Agris, which uses the Agrovoc thesaurus as its vocabulary tool (https://www.fao.org/agris/download).
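The split-and-evaluate workflow described in the abstract can be illustrated with a minimal Python sketch. It is not Annif's API and not the author's code: the function names `split_dataset` and `precision_recall_f1`, the 80/20 split ratio, the fixed seed, and the toy records are all assumptions made for illustration. It only shows how a manually indexed corpus might be divided into training and test sets, and how suggested descriptors might be scored against the manually assigned ones.

```python
import random
from typing import List, Set, Tuple

# A record pairs a document text with its manually assigned descriptors.
Record = Tuple[str, Set[str]]


def split_dataset(
    records: List[Record], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Record], List[Record]]:
    """Shuffle the records reproducibly and return (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def precision_recall_f1(
    predicted: Set[str], gold: Set[str]
) -> Tuple[float, float, float]:
    """Compare suggested descriptors with the manually assigned ones."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy, made-up records standing in for Agris metadata (hypothetical data).
    corpus: List[Record] = [
        (f"document {i}", {"maize", "soil fertility"}) for i in range(10)
    ]
    train, test = split_dataset(corpus)
    print(len(train), "training records,", len(test), "test records")

    # Hypothetical model output for one test document.
    suggested = {"maize", "irrigation"}
    print(precision_recall_f1(suggested, test[0][1]))
```

In an actual deployment, the training records would be passed to the indexing framework and the test records held back for evaluation; the scoring function above is just one common way to compare suggested and manually assigned descriptor sets.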

Published

2023-05-13

How to Cite

Ahmed, M. (2023). Automatic Indexing for Agriculture: Designing a Framework by Deploying Agrovoc, Agris and Annif. Journal of Information and Knowledge, 60(2), 85–95. https://doi.org/10.17821/srels/2023/v60i2/170966

Issue

Volume 60, Issue 2, April 2023

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

The copyright of all articles published in Journal of Information and Knowledge is held by the Publisher. Sarada Ranganathan Endowment for Library Science (SRELS), as the publisher, requires its authors to transfer the copyright prior to publication. This permits SRELS to reproduce, publish, distribute and archive the article in print and electronic form, and to defend against any improper use of the article.
