Name |
Citations, Data |
Can Apply To |
Research data sets and journal articles that describe them. |
Metric Definition |
The number of times a journal article or book has referenced a data set. |
Metric Calculation |
Data citations are sometimes collected only in the formal sense (i.e., with the data set being listed in the References section of a paper, alongside journal articles). They can also be calculated in the informal sense (i.e., linked to from within the Methods section of a paper). Which of these approaches is used varies from tool to tool. |
Data Sources |
Web of Science via Data Citation Index, Google Scholar (rare) |
Appropriate Use Cases |
Data citations should be used to understand how often research data has been reused in others’ studies, thereby indicating advancement of the field. Some fields (e.g., crystallography and genomics) practice data citation at higher rates than others, and therefore evaluation of research from those fields may be more suitable scenarios for using data citations. |
Limitations |
Data citation is still relatively rarely practiced, with only half of journals providing instruction for how to cite data and more than 88% of all Data Citation Index records going uncited. Lack of formal referencing poses a challenge for using data citations from tools that only count such formal references in their data citation metrics. Critics of data citation claim that data citations merely mimic existing metrics that do not “recognize all players involved in the life cycle of those data from collection to publication”. Disciplinary coverage in the Data Citation Index (as of 2017) is skewed, favoring the life sciences (48% of records) over the social sciences (20%), physical sciences (23%), arts & humanities (7%), and multidisciplinary research (2%). Note that the Data Citation Index tracks citations for datasets and also related data studies (defined as “a description of studies or experiments held in repositories with the associated data which have been used in the data study”) as they are cited in articles indexed by the Web of Science databases. The availability of data should be taken into account when attempting to make comparisons for data citation rates against other data sets, as in some disciplines, open access data is cited at higher rates (up to 69% higher for cancer research). |
Inappropriate Use Cases |
Citation counts should never be interpreted as a direct measure of quality. Raw citation counts should not be used as a measure of positive reputation for individual researchers. |
Available Metric Sources |
Data Citation Index, Google Scholar (rare) |
Transparency |
Varies by provider. The Data Citation Index is fully transparent regarding the data repositories it indexes. The Data Citation Index white paper, “Recommended practices to promote scholarly data citation and tracking” (n.d.), describes how the Web of Science can find properly formed citations to datasets in order to calculate citations for the DCI. Google Scholar can index any content that conforms to its formatting guidelines, but is designed primarily to index journal articles, monographs, and other “print” outputs. |
Website |
n/a |
Timeframe |
In theory, data sets from any year can be referenced in scholarly literature. Google Scholar’s temporal scope is unknown. Data Citation Index includes citations to data from 1900 onwards. |
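
The difference between formal and informal data citations described under Metric Calculation can be illustrated with a minimal sketch. It does not reflect how the Data Citation Index or Google Scholar actually compute their counts; the `Paper` record, its field names, the `count_data_citations` function, and the example identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Paper:
    """Hypothetical record of a citing work (field names are assumptions)."""
    title: str
    # Identifiers listed in the paper's References section (formal citations).
    reference_list: Set[str] = field(default_factory=set)
    # Identifiers linked from within the Methods section (informal citations).
    methods_links: Set[str] = field(default_factory=set)


def count_data_citations(papers: Iterable[Paper], dataset_id: str,
                         include_informal: bool = False) -> int:
    """Count papers that cite dataset_id; each paper counts at most once."""
    count = 0
    for paper in papers:
        formal = dataset_id in paper.reference_list
        informal = dataset_id in paper.methods_links
        if formal or (include_informal and informal):
            count += 1
    return count


# Made-up example data: one formal citation, one informal, one unrelated.
DATASET = "doi:10.0000/example-dataset"
papers = [
    Paper("A", reference_list={DATASET}),
    Paper("B", methods_links={DATASET}),
    Paper("C", reference_list={"doi:10.0000/other-dataset"}),
]

formal_only = count_data_citations(papers, DATASET)
with_informal = count_data_citations(papers, DATASET, include_informal=True)
print(formal_only, with_informal)
```

In this sketch, a tool that counts only formal references would miss paper B, which is one way the same data set can receive different citation counts from different providers.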