Tutorial 05: Collecting Citation Data

Collecting citation data from OpenCitations.

References & Citations

Citation data, such as the list of articles a paper cites, and how many citations it receives, can be another useful measure for investigating the scientific literature. Unfortunately, citation data has historically been hard to access and investigate, due to a lack of available databases and APIs that provide access to such information.

OpenCitations Project

Recently, citation data has become more available with the OpenCitations project, which is an initiative to support and provide open bibliographic and citation data.

The OpenCitations project maintains a database of citation data and provides an API that can be accessed using LISC.

The main function for accessing the OpenCitations API is collect_citations().

# Import the function used to access the OpenCitations API
from lisc.collect import collect_citations

OpenCitations API

The OpenCitations API offers multiple utilities to collect citation and reference data.

Articles of interest can be searched for in the OpenCitations database using their DOIs.

LISC supports collecting the number of citations and references, as well as lists of DOIs that cite or are cited by requested articles.

In the following example, we will specify the DOIs of some articles of interest, and collect citation and reference information about them from OpenCitations.

# Set up a list of DOIs to collect data for
dois = ['10.1007/s00228-017-2226-2', '10.1186/1756-8722-6-59']
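DOIs copied from web pages or reference managers often arrive as resolver URLs or with a 'doi:' prefix, whereas the examples here use bare DOIs. The following is a minimal sketch of cleaning such strings before a search. The helper normalize_doi() is hypothetical (it is not part of LISC), and the lowercasing step assumes that consistent keys are preferable, since DOIs are case-insensitive.

```python
def normalize_doi(raw):
    """Return a bare, lowercase DOI from a DOI string or DOI URL.

    Hypothetical helper for illustration; not part of LISC.
    """
    doi = raw.strip()
    # Drop common resolver and scheme prefixes, if present
    prefixes = ('https://doi.org/', 'http://doi.org/',
                'https://dx.doi.org/', 'http://dx.doi.org/', 'doi:')
    for prefix in prefixes:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # DOIs are case-insensitive, so lowercase for consistent dictionary keys
    return doi.lower()


# The same two DOIs as above, written the way they are often copied
raw_dois = ['https://doi.org/10.1007/s00228-017-2226-2',
            'doi:10.1186/1756-8722-6-59 ']
clean_dois = [normalize_doi(raw) for raw in raw_dois]
print(clean_dois)
```

The cleaned list can then be used in place of a hand-typed list of DOIs.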

Citation Data

Citations are the articles that cite a specified article.

To collect them, pass the list of DOIs to the collect_citations() function and select the ‘citations’ operation by setting the util argument to ‘citations’.

# Collect citation data from OpenCitations
n_citations, meta_data = collect_citations(dois, util='citations')

By default, collect_citations() returns a dictionary which stores the number of citations per input DOI, as well as a MetaData object describing the collection.

# Check out the number of citations per DOI
for doi, n_cite in n_citations.items():
    print('{:25s} \t : {}'.format(doi, n_cite))
10.1007/s00228-017-2226-2        : 25
10.1186/1756-8722-6-59           : 179
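Because the counts come back as an ordinary dictionary, standard Python tools apply to them. As a small sketch, the snippet below ranks DOIs from most to least cited. The counts are copied from the output above so that it runs without a network call; in practice the dictionary returned by collect_citations() would be used directly.

```python
# Citation counts copied from the output above (no network call needed)
n_citations = {'10.1007/s00228-017-2226-2': 25,
               '10.1186/1756-8722-6-59': 179}

# Sort the (DOI, count) pairs from most to least cited
ranked = sorted(n_citations.items(), key=lambda item: item[1], reverse=True)

for doi, n_cite in ranked:
    print('{:25s} \t : {}'.format(doi, n_cite))
```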

You can also collect the DOIs of the papers that cite the papers of interest.

To do so, set the collect_dois argument to True, in which case an additional dictionary is returned, storing the DOIs of the articles that cite each searched-for article.

# Collect citations, including the list of citing DOIs
n_cites, cite_dois, meta_data = collect_citations(dois, util='citations', collect_dois=True)
# Check the collected list of citing DOIs
cite_dois[dois[0]]
['10.1038/s41584-019-0211-0', '10.1080/13543776.2017.1355908', '10.1080/17474086.2018.1435268', '10.1124/jpet.119.257089', '10.3390/ijms19041244', '10.1007/s00228-020-02851-x', '10.1080/13543784.2019.1692812', '10.3390/cancers12051328', '10.1080/03602532.2020.1765793', '10.1111/bcp.14571', '10.3390/ijms21145164', '10.1016/j.jpba.2020.113730', '10.1371/journal.pntd.0008425', '10.1016/j.bmc.2021.116163', '10.1002/cti2.1295', '10.1016/j.tetlet.2021.153068', '10.3899/jrheum.201622', '10.1016/bs.armc.2021.03.001', '10.1021/acs.jmedchem.0c01511', '10.3390/cancers13051103', '10.1021/acsmedchemlett.0c00335', '10.1021/acs.jmedchem.9b00167', '10.1021/acs.jmedchem.7b01712', '10.3390/molecules26164907', '10.1016/j.bioorg.2021.105541']
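One simple way to summarize a list of citing DOIs is to group them by DOI prefix, the part before the first slash, which identifies the registrant (typically the publisher). The sketch below does this with a Counter on a handful of DOIs copied from the output above; it is an illustration of post-processing, not a LISC feature.

```python
from collections import Counter

# A few citing DOIs copied from the output above
citing = ['10.1038/s41584-019-0211-0',
          '10.3390/ijms19041244',
          '10.3390/cancers12051328',
          '10.1021/acs.jmedchem.0c01511',
          '10.1021/acs.jmedchem.9b00167']

# The DOI prefix is everything before the first '/'
prefix_counts = Counter(doi.split('/', 1)[0] for doi in citing)

for prefix, count in prefix_counts.most_common():
    print('{:10s} : {}'.format(prefix, count))
```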

Reference Data

Instead of searching for citations to specified articles, we can also search for references, which are the articles that are cited by the specified article(s).

To do so, set the util argument to use the ‘references’ operation.

# Collect reference data
n_references, ref_dois, meta_data = collect_citations(dois, util='references', collect_dois=True)
# Check out the number of references per DOI
for doi, n_refs in n_references.items():
    print('{:25s} \t : {}'.format(doi, n_refs))
10.1007/s00228-017-2226-2        : 23
10.1186/1756-8722-6-59           : 66
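With both counts in hand, the two dictionaries can be combined per DOI. As a rough sketch, the snippet below computes the ratio of citations received to references made, using the counts copied from the outputs above. The ratio is only an illustrative summary chosen for this example, not a metric defined by LISC or OpenCitations.

```python
# Counts copied from the outputs above (no network call needed)
n_citations = {'10.1007/s00228-017-2226-2': 25,
               '10.1186/1756-8722-6-59': 179}
n_references = {'10.1007/s00228-017-2226-2': 23,
                '10.1186/1756-8722-6-59': 66}

# Ratio of citations received to references made, skipping any DOI
# with no recorded references to avoid dividing by zero
ratios = {doi: n_citations[doi] / n_references[doi]
          for doi in n_citations if n_references.get(doi)}

for doi, ratio in ratios.items():
    print('{:25s} \t : {:.2f}'.format(doi, ratio))
```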

Additional Operations

There is additional information in the OpenCitations database, including metadata on the individual articles that appear in citation and reference lists.

This information is not yet accessible through LISC. Contributions to extend the functionality are always welcome. If you are interested, feel free to get in touch on GitHub.
