The State of Digital Media Data Research, 2023




Lukito, Josephine
Brown, Megan A.
Dahlke, Ross
Suk, Jiyoun
Yang, Yunkang
Zhang, Yini
Chen, Bin
Kim, Sang Jung
Soorholtz, Kaiya





DMD Report website:
The purpose of this report is to provide an account of digital media data (DMD) research practices and to highlight their ongoing challenges. We define DMD as data that are collected, extracted, gathered, or scraped from a web-based platform such as a website, social networking site, mobile application, or another virtual space. We break these practices and their challenges into three stages: collection, analysis, and sharing. We argue that continuing DMD research should be guided by four principles: collaboration, transparency, preparation, and consistency.

1. COLLABORATION: Researchers can work together on protocols at every stage of the DMD research pipeline, for example by setting norms for data sharing, producing baseline research, and co-developing archives.

2. TRANSPARENCY: Make code and data accessible to other researchers whenever possible. Open-source software development is especially helpful for advancing research in a transparent manner. Given the cost of collecting, storing, and analyzing DMD, we also encourage researchers to report these costs in their publications.

3. PREPARATION: Researchers should anticipate the risks and challenges of collecting, analyzing, and reporting on DMD. Emerging data collection methods, such as data donation, offer a new mechanism for studying data that users have consented to share.

4. CONSISTENCY: We end with this principle because it builds on the previous three: when researchers are collaborative, transparent, and prepared, research approaches become more consistent, which allows us to compare across studies and identify situational contexts that require more nuanced protocols.
