Library Carpentry: Towards a New Professional Dimension (Part III – Data Reconciliation, Named Entity Recognition and Advanced Utilities)

Authors

  • P. Mukhopadhyay, Department of Library and Information Science, University of Kalyani, Kalyani – 741235, West Bengal
  • R. Mitra, Department of Library and Information Science, University of Kalyani, Kalyani – 741235, West Bengal

DOI:

https://doi.org/10.17821/srels/2021/v58i5/166770

Keywords:

Automatic Translation, Data Carpentry, Data Reconciliation, Data Sources Cross-Linking, Library Carpentry, Named Entity Recognition, Sentiment Analysis

Abstract

Data reconciliation and Named Entity Recognition (NER) are concepts closely related to the domain of data carpentry in general and library carpentry in particular. In this context, Part III of this three-part series on library carpentry (Parts I and II were published in the April and June 2021 issues of this journal) applies library carpentry methods to the core areas of information organization in a library of any type or size, along with additional utilities such as cross-linking of data sources, automatic translation and sentiment analysis. The study includes five case studies covering these areas, with a focus on a do-it-yourself mode of working.

References

Agate, N. (2018). Wikidata: A platform for your library’s linked open data. The Idealis.

Allison-Cassin, S., Armstrong, A., Ayers, P., Cramer, T., Custer, M., Lemus-Rojas, M., McCallum, S., Proffitt, M., Puente, M., Ruttenberg, J. and Stinson, A. (2019). ARL white paper on Wikidata: Opportunities and recommendations.

Allison-Cassin, S. and Scott, D. (2018). Wikidata: A platform for your library’s linked open data. The Code4Lib Journal, 40. https://journal.code4lib.org/articles/13424.

Androutsopoulou, A. and Charalabidis, Y. (2018). A Framework for Evidence Based Policy Making Combining Big Data, Dynamic Modeling and Machine Intelligence. Proceedings of the 11th International Conference on Theory and Practice of Electronic Governance; p. 575-583. https://doi.org/10.1145/3209415.3209427.

Aruna, K. and Anupriya, S. (2018). Sentiment analysis on social media information using data mining techniques a review. International Journal of Pure and Applied Mathematics, 120(6): 10807-10816.

Avgeris, Z. (2021). From text to space and vice versa: The travel accounts of Sir William Gell and Edward Dodwell in Phocis and Boeotia. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447010.

Brando, C., Frontini, F. and Ganascia, J.-G. (2016). REDEN: Named entity linking in digital literary editions using linked data sets. Complex Systems Informatics and Modeling Quarterly, 7: 60-80. https://doi.org/10.7250/csimq.2016-7.04.

Bryl, V., Bizer, C., Isele, R., Verlic, M., Hong, S. G., Jang, S., Yi, M. Y. and Choi, K.-S. (2014). Interlinking and Knowledge Fusion. In: S. Auer, V. Bryl & S. Tramp (Eds.), Linked Open Data - Creating Knowledge out of Interlinked Data: Results of the LOD2 Project, Springer International Publishing; p. 70-89. https://doi.org/10.1007/978-3-319-09846-3_4.

Carlson, S. and Seely, A. (2017). Using OpenRefine’s reconciliation to validate local authority headings. Cataloging and Classification Quarterly, 55(1): 1-11. https://doi.org/10.1080/01639374.2016.1245693.

Coll, R. and Ó Tuairisg, S. (2015). Preparing bilingual metadata for a bilingual repository. New Review of Information Networking, 20(1-2): 53-58. https://doi.org/10.1080/13614576.2015.1110398.

Crowe, K. and Clair, K. (2015). Developing a tool for publishing linked local authority data. Journal of Library Metadata, 15(3-4): 227-240. https://doi.org/10.1080/19386389.2015.1099993.

Cucerzan, S. (2007). Large-Scale Named Entity Disambiguation Based on Wikipedia Data. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL); 708-716. https://aclanthology.org/D07-1074.

Delpeuch, A. (2019). A survey of OpenRefine reconciliation services. ArXiv:1906.08092 [Cs]. http://arxiv.org/abs/1906.08092.

Dix, A., Cowgill, R., Bashford, C., McVeigh, S. and Ridgewell, R. (2016). Spreadsheets as User Interfaces. Proceedings of the International Working Conference on Advanced Visual Interfaces; p. 192-195. https://doi.org/10.1145/2909132.2909271.

Downey, M. (2019). Assessing author identifiers: Preparing for a linked data approach to name authority control in an institutional repository context. Journal of Library Metadata, 19(1-2): 117-136. https://doi.org/10.1080/19386389.2019.1590936.

Goyal, A., Gupta, V. and Kumar, M. (2018). Recent named entity recognition and classification techniques: A systematic review. Computer Science Review, 29: 21-43. https://doi.org/10.1016/j.cosrev.2018.06.001.

Gracia, J., Villegas, M., Gómez-Pérez, A. and Bel, N. (2018). The Apertium bilingual dictionaries on the web of data. Semantic Web, 9(2): 231-240. https://doi.org/10.3233/SW-170258.

Green, H., Dickson, E., Tracy, D. G., Christensen, S., Emerson, M. and Jacoby, J. (2017). Scholarly commons digital humanities needs assessment study. https://www.ideals.illinois.edu/handle/2142/100081.

Hachey, B., Radford, W. and Curran, J. R. (2011). Graph-Based Named Entity Linking with Wikipedia. In: A. Bouguettaya, M. Hauswirth & L. Liu (Eds.), Web Information System Engineering - WISE 2011, Springer; p. 213-226. https://doi.org/10.1007/978-3-642-24434-6_16.

Hanson, E. M. (2014). A beginner’s guide to creating library linked data: Lessons from NCSU’s organization name linked data project. Serials Review, 40(4): 251-258. https://doi.org/10.1080/00987913.2014.975887.

Hashimi, H., Hafez, A. and Mathkour, H. (2015). Selection criteria for text mining approaches. Computers in Human Behavior, 51: 729-733. https://doi.org/10.1016/j.chb.2014.10.062.

Hill, K. M. (2016). In search of useful collection metadata: Using OpenRefine to create accurate, complete, and clean title-level collection information. Serials Review. https://doi.org/10.1080/00987913.2016.1214529.

Hladka, J., Mynarz, J. and Sklenak, V. (2012). Experience with transformation of bibliographic data into linked data. Journal of Systems Integration, 3(1): 54-62. https://doi.org/10.20470/jsi.v3i1.106.

Hooland, S. van, Verborgh, R., Wilde, M. D., Hercher, J., Mannens, E. and Walle, R. V. de. (2013). Evaluating the success of vocabulary reconciliation for cultural heritage collections. Journal of the American Society for Information Science and Technology, 64(3): 464-479. https://doi.org/10.1002/asi.22763.

Hooland, S. van, Wilde, M. D., Verborgh, R., Steiner, T. and Walle, R. V. de. (2015). Exploring entity recognition and disambiguation for cultural heritage collections. Digital Scholarship in the Humanities, 30(2): 262-279. https://doi.org/10.1093/llc/fqt067.

Isaac, A., Schlobach, S., Matthezing, H. and Zinn, C. (2008). Integrated access to cultural heritage resources through representation and alignment of controlled vocabularies. Library Review, 57(3): 187-199. https://doi.org/10.1108/00242530810865475.

Kaffee, L.-A., Piscopo, A., Vougiouklis, P., Simperl, E., Carr, L. and Pintscher, L. (2017). A glimpse into Babel: An Analysis of Multilinguality in Wikidata. Proceedings of the 13th International Symposium on Open Collaboration. https://doi.org/10.1145/3125433.3125465.

Lemus-Rojas, M. and Pintscher, L. (2017). Wikidata and libraries: Facilitating open knowledge. In: Leveraging Wikipedia: Connecting Communities of Knowledge, ALA Editions, ALA; p. 143-158. https://scholarworks.iupui.edu/handle/1805/16690.

Li, X., Feng, J., Meng, Y., Han, Q., Wu, F. and Li, J. (2020). A Unified MRC Framework for Named Entity Recognition. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics; p. 5849-5859. https://doi.org/10.18653/v1/2020.acl-main.519.

McCallum, A. and Li, W. (2003). Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. North American Chapter of the Association for Computational Linguistics. https://doi.org/10.3115/1119176.1119206.

Mehrabi, N., Gowda, T., Morstatter, F., Peng, N. and Galstyan, A. (2020). Man is to Person as Woman is to Location: Measuring Gender Bias in Named Entity Recognition. Proceedings of the 31st ACM Conference on Hypertext and Social Media; p. 231-232. https://doi.org/10.1145/3372923.3404804.

Mukhopadhyay, P., Mitra, R. and Mukhopadhyay, M. (2021). Library carpentry: Towards a new professional dimension (Part I - concepts and case studies). SRELS Journal of Information Management, 58(2): 67-80. https://doi.org/10.17821/srels/2021/v58i2/159969.

Mukhopadhyay, P. and Mukhopadhyay, M. (2021). Library carpentry: Towards a new professional dimension (Part II - automatic authority control to enhance retrieval). SRELS Journal of Information Management, 58(3): 135-155. https://doi.org/10.17821/srels/2021/v58i3/163890.

Müller, B. (2009). Visualization and analysis of extracted information from full text and patent corpora [PhD Thesis]. https://doi.org/10.13140/RG.2.2.27175.44961.

Nanli, Z., Ping, Z., Weiguo, L. and Meng, C. (2012). Sentiment analysis: A literature review. International Symposium on Management of Technology (ISMOT), IEEE; p. 572-576. https://doi.org/10.1109/ISMOT.2012.6679538.

Page, R. (2016). Towards a biodiversity knowledge graph. Research Ideas and Outcomes, 2: e8767. https://doi.org/10.3897/rio.2.e8767.

Papachristopoulos, L., Ampatzoglou, P., Seferli, I., Zafeiropoulou, A. and Petasis, G. (2019). Introducing sentiment analysis for the evaluation of library’s services effectiveness. Qualitative and Quantitative Methods in Libraries, 8(1): 99-110.

Park, Z. and Kim, H. (2014). Organizing and Sharing Information using Linked Data. In: Library and Information Science, Emerald Group Publishing Limited; p. 61-87. https://doi.org/10.1108/S1876-0562(2013)0000007008.

Parker, B. and Gray, A. (2019). Rethinking the University of Maryland authority file for the linked data environment. Journal of Library Metadata, 19(1-2): 69-81. https://doi.org/10.1080/19386389.2019.1589699.

Purkayastha, S. (2019, June 19). Top 10 Best Translation APIs [2021] for Developers 20+ API Reviewed [blog]. Rakuten RapidAPI Blog. https://blog.api.rakuten.net/top-10-best-translation-apis-google-translate-microsofttranslator-and-others/.

Ryan, C., Grant, R., Carragáin, E. Ó., Collins, S., Decker, S. and Lopes, N. (2015). Linked data authority records for Irish place names. International Journal on Digital Libraries, 15(2): 73-85. https://doi.org/10.1007/s00799-014-0129-8.

Singh, A. K. and Shashi, M. (2017). Research aids for social media analytics. IJCSN, 6(6): 2277-5420. https://www.researchgate.net/publication/323456896_Research_Aids_for_Social_Media_Analytics.

Smith-Yoshimura, K. (2016). Analysis of international linked data survey for implementers. D-Lib Magazine, 22(7/8). https://doi.org/10.1045/july2016-smith-yoshimura.

Smith-Yoshimura, K. (2018). Analysis of 2018 international linked data survey for implementers. The Code4Lib Journal, 42. https://journal.code4lib.org/articles/13867.

Tillman, R. K. (2016). Extracting, augmenting, and updating metadata in Fedora 3 and 4 using a local OpenRefine reconciliation service. The Code4Lib Journal, 31. https://journal.code4lib.org/articles/11179.

Verborgh, R. and Wilde, M. D. (2013). Using OpenRefine (Revised ed.). Packt Publishing.

Verborgh, R. and Wilde, M. D. (2013). Using OpenRefine. Packt Publishing. https://ruben.verborgh.org/publications/verborgh_packt_2013/#citation-styles.

Weichselbraun, A., Kuntschik, P., Francolino, V., Saner, M., Dahinden, U. and Wyss, V. (2021). Adapting data-driven research to the fields of social sciences and the humanities. Future Internet, 13(3): 59. https://doi.org/10.3390/fi13030059.

Weston, L., Tshitoyan, V., Dagdelen, J., Kononova, O., Trewartha, A., Persson, K. A., Ceder, G. and Jain, A. (2019). Named entity recognition and normalization applied to large-scale information extraction from the materials science literature. Journal of Chemical Information and Modeling, 59(9): 3692-3702. https://doi.org/10.1021/acs.jcim.9b00470. PMid:31361962.

Yadav, V. and Bethard, S. (2019). A survey on recent advances in named entity recognition from deep learning models. ArXiv:1910.11470 [Cs]. http://arxiv.org/abs/1910.11470.

Published

2021-10-30

How to Cite

Mukhopadhyay, P., & Mitra, R. (2021). Library Carpentry: Towards a New Professional Dimension (Part III – Data Reconciliation, Named Entity Recognition and Advanced Utilities). Journal of Information and Knowledge, 58(5), 287–303. https://doi.org/10.17821/srels/2021/v58i5/166770

Issue

Vol. 58 No. 5 (2021)

Section

Invited Paper
Received 2021-10-26
Accepted 2021-10-26