Blog #13 “Reading “Thoughts” from Big Data of “Records””


Category: Blog

Author: Erina Ota-Tsukada

Digital humanities methods have recently gained popularity in historical research. Various methods have been proposed, such as text analysis, database creation, and network visualization. At first glance, these methods can appear to stand in opposition to the conventional close reading of historical sources. However, they do not differ from traditional research processes: in both cases, determining the nature of the target materials and how to handle them is essential to the accuracy of the analysis.

My research is mostly based on chronicles, biographies, and other descriptive materials written by the ulama (Islamic intellectuals), who are the subject of my study of pre-modern Arab-Islamic history. Inevitably, the portrayals of contemporary society that these authors created were strongly influenced by their daily activities as scholars. The ulama positioned themselves in the genealogy of traditions (hadith) dating back to the time of the Prophet Muhammad. They considered the transmission of the Islamic community’s “scholastic lineage” to the next generation crucial. Therefore, when analyzing the historical descriptions of this period, it is important to remember that, in addition to the pure “record” of the society at the time, they also contain the “memory” of the community that the ulama inherited and had to pass on to the next generation. If we consider contemporary records as “horizontal information,” then the memory of the Islamic community may be regarded as “vertical information” passed on from the past to the future.

An analysis of the 15th-century biographical dictionary
(“The Shining Light for the People of the 9th Century”) using Voyant Tools

Descriptions of the same incident or individual in contemporary historical sources vary significantly, both in subjective tone (whether the portrayal is positive or negative) and in the “granularity” of the information, that is, how much detail is recorded. Because historical records are compiled by humans, it is important to recognize the filters that derive from the author’s own perspective and to consider why particular information was selected or omitted. In analyzing texts, verifying factual information on who did what, when, and where is of course important; however, my research project will focus on how the ulama interpreted their contemporary society and sought to preserve it as a collective memory for future generations.

Historical knowledge can be digitized through various means, including the Resource Description Framework (RDF), a core technology used to build digital archives and databases. RDF is a framework for describing so-called “data about data,” in which each piece of information is represented using three elements: “subject,” “predicate,” and “object.” To indicate that al-Sakhawi is the author of “The Shining Light for the People of the 9th Century,” a representative biographical dictionary from the 15th century, we set “The Shining Light for the People of the 9th Century” as the “subject,” the authorship relation as the “predicate,” and al-Sakhawi as the “object.” The original manuscript is written in Arabic, and because many individuals in Arabic sources share the same name, a name written as a plain string can easily be confused with a different person. Therefore, in RDF, the subject and predicate (and the object, when it refers to an entity such as a person) are represented as resources on the web (URIs), which eliminates ambiguity in the meaning and usage of words.

Proper nouns, such as book titles and authors, are managed using URI-based IDs rather than strings, as seen in the figure. Additionally, the predicate is not expressed with free-form terms such as “author” or “writer,” which can vary depending on the user or language. Instead, the property “creator,” defined by the Dublin Core Metadata Initiative as “An entity primarily responsible for making the resource,” is used. This approach helps to prevent confusion between entities with the same name and inconsistent spellings that arise across languages.
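As a rough illustration of the triple described above, the following Python sketch writes the authorship statement as one line of N-Triples, a plain-text RDF syntax. It is only a sketch under stated assumptions: the two example.org identifiers are hypothetical placeholders rather than the IDs used in the actual database, and the choice of the Dublin Core “terms” namespace for “creator” is also an assumption.

```python
from typing import NamedTuple

# Dublin Core "creator" property. Which Dublin Core namespace the project
# actually uses is an assumption made for this sketch.
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical identifiers: example.org is a placeholder namespace,
# not the address of a real archive.
WORK_SHINING_LIGHT = "http://example.org/work/shining-light"
PERSON_AL_SAKHAWI = "http://example.org/person/al-sakhawi"


class Triple(NamedTuple):
    """One RDF statement whose three parts are all URIs."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Serialize one all-URI triple as a single N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


authorship = Triple(WORK_SHINING_LIGHT, DCTERMS_CREATOR, PERSON_AL_SAKHAWI)
print(to_ntriples(authorship))
```

Because the person is referred to by an identifier rather than by a name string, a second scholar with the same name would simply receive a different URI, and the two would never be merged by accident.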

The advantages usually cited for RDF are that related information can be added freely and that networks are easy to visualize. However, the real advantage of RDF lies in its query language, SPARQL. In the conventional text-based searches we are familiar with, we can only determine where a search term is mentioned in the text (which, compared to the era when only paper-based historical sources were available, is in itself a significant advance made possible by digital text). With SPARQL, by contrast, we can specify the desired information using variables and extract data in answer to questions such as “Who are the individuals related to person A in location B within the timeframe of C?” or “Which individuals belong to the academic lineage traced back through person A’s mentor B?” A query-based RDF inference system becomes increasingly powerful as the size of the data grows, eventually going beyond what a human reader could process.
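To give a feel for what querying with variables means, here is a minimal Python sketch of triple-pattern matching, the basic idea underlying SPARQL. It is not SPARQL itself and not the system used in this project: the data, the “ex:” names, and the helper functions match and lineage are all hypothetical and exist only for illustration.

```python
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data: placeholder people and predicates,
# not taken from the historical sources.
TRIPLES: List[Triple] = [
    ("ex:personA", "ex:studiedUnder", "ex:personB"),
    ("ex:personB", "ex:studiedUnder", "ex:personC"),
    ("ex:personD", "ex:studiedUnder", "ex:personB"),
    ("ex:personA", "ex:livedIn", "ex:Cairo"),
    ("ex:personD", "ex:livedIn", "ex:Damascus"),
]


def match(pattern: Triple, triples: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield variable bindings for one pattern; terms starting with '?' are variables."""
    for triple in triples:
        bindings: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must keep the same value everywhere in the pattern.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


def lineage(person: str, triples: List[Triple]) -> List[str]:
    """Follow studiedUnder links upward from a person, nearest teacher first."""
    seen: Set[str] = set()
    order: List[str] = []
    frontier = [person]
    while frontier:
        current = frontier.pop(0)
        for found in match((current, "ex:studiedUnder", "?teacher"), triples):
            teacher = found["?teacher"]
            if teacher not in seen:
                seen.add(teacher)
                order.append(teacher)
                frontier.append(teacher)
    return order


# "Who studied under person B?"
students_of_b = sorted(
    found["?student"]
    for found in match(("?student", "ex:studiedUnder", "ex:personB"), TRIPLES)
)
print(students_of_b)
# "Which scholars stand in person A's academic lineage?"
print(lineage("ex:personA", TRIPLES))
```

In SPARQL, the first question would be written roughly as SELECT ?student WHERE { ?student ex:studiedUnder ex:personB }, and the chain of teachers can be followed with a property path such as ex:studiedUnder+, so the loop written by hand here is left to the query engine.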

Genealogy drawn as a semantic web
(Muzhir family, a notable 15th-century bureaucratic family)

Between the 13th and 15th centuries, numerous extensive chronicles and biographical dictionaries were composed throughout the Arab world. However, this very abundance of historical sources has made comprehensive discussion difficult when studying the dynamics of relationship-building during this period. The volume of data in these sources is more than the human eye can follow through qualitative reading alone. If we can manage it, digital humanities methods have the potential to be a significant breakthrough in connecting case studies to a larger context. What knowledge did the Islamic community want to inherit from the past and pass on to the future? What factors were involved in this process? Through inferences derived from the “big data” that recorded the period, we may even discover the “thoughts” of the people who lived during that time.

Author profile

Erina Ota-Tsukada (太田(塚田)絵里奈)

Project Assistant Professor, ILCAA

Studied at Cairo University while pursuing the Doctoral Program at the Graduate School of Letters, Keio University. Doctorate in history.
Lecturer at the Faculty of Letters and Institute of Cultural & Linguistic Studies, Keio University; Joint Researcher, ILCAA; and thereafter Project Assistant Professor, ILCAA.
Papers and publications include:
“Formation of the Ideal Bureaucrat Image and Patronage in the Late Mamlūk Period: Zayn al-Dīn Ibn Muzhir and ʻUlamāʼ,” Al-Madaniyya 1, 2021, 41-61.
Jonathan Porter Berkey, “The Formation of Islam: Religion and Society in the Near East, 600-1800,” trans. by Shin Nomoto and Erina Ota-Tsukada (Role: Joint translator), Keio University, 2013.
Awarded the “Sumitomo Life Woman Researcher Encouragement Prizes” in 2019.


In the study of pre-modern Arab and Islamic history, I am working to elucidate, through the observation of human “ecology,” the “survival strategies” of individuals who sought to survive by creating “connections” with others. Furthermore, in recent years, under the keyword “co-creation of academic knowledge,” there has been a growing demand to communicate the appeal and value of academic disciplines to society. I aspire to create opportunities for interactive experiences that share, through dialogue, the universal wisdom found in the humanities and social sciences.