Full-Text eBook Retrieval System in an Integrated Library Management System: Designing an Open-Source Technical Framework

Authors

DOI:

https://doi.org/10.17821/srels/2025/v62i4/171706

Keywords:

eBook Management, eBook Retrieval, Full Text Retrieval, Koha ILS, Library Discovery System, VuFind

Abstract

The study introduces an eBook retrieval architecture that retrieves both metadata and full-text content. Koha serves as the Integrated Library System (ILS) at the back end, while VuFind provides the discovery layer at the front end. Metadata retrieval is based on the MARC 21 bibliographic format, with full-text links stored in tag 856. The key component is the full-text retrieval system, which uses the open-source Apache Tika to analyse full-text objects; the extracted words and phrases are then sent to Apache Solr for indexing. The architecture thus operates across three layers: Koha manages the back end, Apache Tika processes the full text in the middle, and VuFind, powered by Apache Solr, handles search and discovery at the front end. This structure supports retrieval of both metadata and full text, so that eBook content can be located through word and phrase searches.
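
The three-layer flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory record shape, the function names, and the Solr field names `title` and `fulltext` are assumptions made for illustration. The only details taken from MARC 21 itself are that the title sits in 245 $a and the link in 856 $u. In a live deployment the extractor would hand each linked file to Apache Tika and the resulting payload would be posted to the Solr index behind VuFind; neither call is made here, so the extractor is passed in as a plain function.

```python
import json
from typing import Callable, Dict, List

# Simplified MARC 21 record: tag -> list of fields, each field a dict of
# subfield code -> value. This shape is hypothetical; real records would
# be exported from Koha.
MarcRecord = Dict[str, List[Dict[str, str]]]


def full_text_links(record: MarcRecord) -> List[str]:
    """Return the URLs stored in subfield $u of every 856 field."""
    return [field["u"] for field in record.get("856", []) if "u" in field]


def build_solr_doc(
    record_id: str,
    record: MarcRecord,
    extract_text: Callable[[str], str],
) -> Dict[str, str]:
    """Combine metadata and extracted full text into one indexable document.

    extract_text stands in for the middle layer (Apache Tika in the paper).
    The field names "title" and "fulltext" are assumptions.
    """
    title = " ".join(f.get("a", "") for f in record.get("245", [])).strip()
    texts = [extract_text(url) for url in full_text_links(record)]
    return {
        "id": record_id,
        "title": title,
        "fulltext": "\n".join(t for t in texts if t),
    }


def solr_update_payload(docs: List[Dict[str, str]]) -> str:
    """Serialise documents as a JSON array, ready to be sent to an index."""
    return json.dumps(docs, ensure_ascii=False)


# Example with a stub extractor, so the sketch runs without any server.
sample_record: MarcRecord = {
    "245": [{"a": "Sample eBook"}],
    "856": [{"u": "https://example.org/books/1.pdf"}, {"z": "note only"}],
}


def stub_extract(url: str) -> str:
    return "text extracted from " + url


sample_doc = build_solr_doc("1", sample_record, stub_extract)
sample_payload = solr_update_payload([sample_doc])
```

Passing the extractor as a parameter keeps the metadata layer and the text-extraction layer separate, which mirrors the layered design the abstract describes and lets the sketch run with a stub.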

Published

2025-08-01

How to Cite

Barman, D., Dutta, A., & Mukhopadhyay, P. (2025). Full-Text eBook Retrieval System in an Integrated Library Management System: Designing an Open-Source Technical Framework. Journal of Information and Knowledge, 62(4), 223–234. https://doi.org/10.17821/srels/2025/v62i4/171706

Section

Articles
