Library Carpentry: Towards a New Professional Dimension (Part III – Data Reconciliation, Named Entity Recognition and Advanced Utilities)
DOI:
https://doi.org/10.17821/srels/2021/v58i5/166770
Keywords:
Automatic Translation, Data Carpentry, Data Reconciliation, Data Sources Cross-Linking, Library Carpentry, Named Entity Recognition, Sentiment Analysis
Abstract
Data reconciliation and Named Entity Recognition (NER) are concepts closely related to data carpentry in general and to library carpentry in particular. Part III of this three-part series on library carpentry (Parts I and II were published in the April and June issues of this journal) applies library carpentry methods to the core areas of information organization in a library of any type or size, together with additional utilities such as cross-linking of data sources, automatic translation, and sentiment analysis. Five case studies covering these areas are included, with a focus on a do-it-yourself approach.
License
Copyright in all articles published in Journal of Information and Knowledge is held by the publisher. The Sarada Ranganathan Endowment for Library Science (SRELS), as publisher, requires its authors to transfer copyright prior to publication. This permits SRELS to reproduce, publish, distribute, and archive the article in print and electronic form, and to defend against any improper use of the article.
Accepted 2021-10-26
Published 2021-10-30