URLs and Requests
Explore LISC utilities for managing URLs and requests.
URLs & Requests
LISC uses custom objects to manage URLs and launch requests. These can be used to define how to interact with APIs of interest.
For the main LISC functionality, you don’t have to deal with these objects directly. They are used ‘under the hood’ by LISC functions for collecting data and interacting with APIs without requiring direct user interaction.
In this example, we will explore using these objects directly, which may be useful for creating custom data collections, and/or to connect LISC with other APIs.
# Import the requester object
from lisc.requester import Requester
Requester Object
The Requester object uses the requests module to launch URL requests. It also adds functionality such as throttling, to ensure requests respect API limits, as well as metadata collection and URL logging.
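The throttling idea can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not LISC's implementation: the `Throttle` class, its `wait` method, and its `clock`/`sleep` parameters are hypothetical names chosen for illustration. It enforces a minimum gap between successive calls by sleeping only for whatever part of `wait_time` has not already elapsed.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum wait between successive calls.

    Illustrative sketch only; this is not LISC's Requester code.
    """

    def __init__(self, wait_time, clock=time.monotonic, sleep=time.sleep):
        self.wait_time = wait_time  # minimum seconds between calls
        self._clock = clock         # injectable so the logic can be tested
        self._sleep = sleep
        self._last = None           # time of the previous call, if any

    def wait(self):
        """Sleep just long enough that calls are at least `wait_time` apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.wait_time - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In use, you would call `throttle.wait()` immediately before each request. Injecting `clock` and `sleep` is a design choice of this sketch that keeps it testable without real delays.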
# Initialize a requester object
req = Requester()
# Set the minimum wait time between requests
req.set_wait_time(0.5)
# Use the Requester object to request some web pages
for url in ['https://www.google.com', 'https://www.yahoo.com', 'https://duckduckgo.com']:
    page = req.request_url(url)
    print('Collecting web page \t {} \t got status code \t {}'.format(page.url, page.status_code))
Collecting web page https://www.google.com/ got status code 200
Collecting web page https://www.yahoo.com/ got status code 200
Collecting web page https://duckduckgo.com/ got status code 200
# Check details of the requester object
req.check()
Requester object is active: True
Number of requests sent: 3
Requester opened: 00:27:11 Sunday 29 October 2023
Requester closed:
# Get information from the requester object as a dictionary
print(req.as_dict())
{'is_active': True, 'n_requests': 3, 'wait_time': 0.5, 'start_time': '00:27:11 Sunday 29 October 2023', 'end_time': '', 'logging': None, 'log': None}
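The fields printed above suggest the kind of bookkeeping a requester keeps. The sketch below is a hypothetical, simplified record-keeper loosely modelled on those keys; the `RequestLog` class, its `record` and `close` methods, and the timestamp format string are assumptions, not LISC code. Unlike the output above (where `log` is `None` because logging is off), this sketch always keeps a list of requested URLs.

```python
import time

# Assumed timestamp format, chosen to resemble '00:27:11 Sunday 29 October 2023'
TIME_FORMAT = '%H:%M:%S %A %d %B %Y'


class RequestLog:
    """Hypothetical sketch of request bookkeeping; not LISC's Requester."""

    def __init__(self, wait_time=0.0):
        self.is_active = True
        self.n_requests = 0
        self.wait_time = wait_time
        self.start_time = time.strftime(TIME_FORMAT)
        self.end_time = ''  # stays empty until the log is closed
        self.log = []       # URLs requested so far

    def record(self, url):
        """Count one request and remember its URL."""
        if not self.is_active:
            raise RuntimeError('cannot record requests on a closed log')
        self.n_requests += 1
        self.log.append(url)

    def close(self):
        """Mark the log inactive and stamp the end time."""
        self.is_active = False
        self.end_time = time.strftime(TIME_FORMAT)

    def as_dict(self):
        """Return the current state as a plain dictionary (log copied)."""
        return {**vars(self), 'log': list(self.log)}
```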
URLs Object
The URLs object is used in LISC to store URLs used for interacting with APIs.
It includes functionality for using the different utilities available through an API, and for storing and using the settings that the API may accept.
In this example, we will explore using the URLs object to access the DuckDuckGo API.
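# Import the URLs object
from lisc.urls import URLs
# Initialize a URLs object with the base URL of the DuckDuckGo API
urls = URLs('https://api.duckduckgo.com/')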
# Build and check the search URL
urls.build_url('search', settings={'format': 'json'})
urls.check_url('search')
https://api.duckduckgo.com/?format=json
# Get the URL to launch a search request with a specified search term
api_url = urls.get_url('search', settings={'q': 'brain'})
print(api_url)
https://api.duckduckgo.com/?format=json&q=brain
# Request the URL with the requester object from before
api_page = req.request_url(api_url)
# Check the source of the abstract returned for the search term
api_page.json()['AbstractSource']
'Wikipedia'
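The search URL printed earlier is the base URL plus a fixed `format` setting plus a per-call `q` setting. A minimal standard-library sketch of that composition is shown below; `compose_url` is a hypothetical helper, not part of LISC, and letting per-call values override fixed ones with the same key is a choice made for this sketch.

```python
from urllib.parse import urlencode


def compose_url(base, fixed_settings, extra_settings=None):
    """Hypothetical helper: join a base URL with fixed and per-call query settings."""
    settings = dict(fixed_settings)
    # Per-call settings are added after (and override) the fixed ones
    settings.update(extra_settings or {})
    if not settings:
        return base
    return base + '?' + urlencode(settings)


print(compose_url('https://api.duckduckgo.com/', {'format': 'json'}, {'q': 'brain'}))
# https://api.duckduckgo.com/?format=json&q=brain
```

Note that `urlencode` percent-encodes values, and encodes spaces as `+`.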
Supported APIs
The URLs object can be used to create objects that support external APIs.
LISC currently supports APIs for EUtils and OpenCitations.
These are implemented as custom objects which are built on top of the URLs object.
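# Import URL objects for supported APIs
from lisc.urls import EUtils, OpenCitations
# Initialize an EUtils API object
eutils = EUtils()
# Check what utilities are supported for the EUtils API
print(eutils.utils)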
{'info': 'einfo.fcgi', 'query': 'egquery.fcgi', 'search': 'esearch.fcgi', 'fetch': 'efetch.fcgi', 'base': 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'}
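The utility mapping printed above pairs short names with E-utilities endpoint files under a common base. As a hedged illustration (not how LISC builds these URLs internally), the sketch below copies that dictionary and joins a utility with query settings; `eutils_url` is a hypothetical helper, while `db` and `term` are standard ESearch parameters.

```python
from urllib.parse import urlencode

# Copied from the utilities dictionary printed above
EUTILS_UTILS = {'info': 'einfo.fcgi', 'query': 'egquery.fcgi', 'search': 'esearch.fcgi',
                'fetch': 'efetch.fcgi', 'base': 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'}


def eutils_url(util, **settings):
    """Hypothetical helper: build an E-utilities URL from a utility name and settings."""
    url = EUTILS_UTILS['base'] + '/' + EUTILS_UTILS[util]
    if not settings:
        return url
    return url + '?' + urlencode(settings)


print(eutils_url('search', db='pubmed', term='brain'))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=brain
```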
# Initialize an OpenCitations API object
citations = OpenCitations()
# Check what utilities are supported for the OpenCitations API
print(citations.utils)
{'references': 'references', 'citations': 'citations', 'metadata': 'metadata', 'base': 'https://w3id.org/oc/index/coci/api/v1'}
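Where E-utilities takes its inputs as query settings, the OpenCitations utilities printed above take a DOI as a path segment after the utility name. The sketch below assumes that layout; `opencitations_url` is a hypothetical helper, and the DOI `10.1000/182` is only a placeholder input.

```python
from urllib.parse import quote

# Copied from the utilities dictionary printed above
OC_UTILS = {'references': 'references', 'citations': 'citations', 'metadata': 'metadata',
            'base': 'https://w3id.org/oc/index/coci/api/v1'}


def opencitations_url(util, doi):
    """Hypothetical helper: append a utility name and a DOI as path segments."""
    # Keep the '/' inside the DOI, percent-encode anything unsafe
    return '/'.join([OC_UTILS['base'], OC_UTILS[util], quote(doi, safe='/')])


print(opencitations_url('citations', '10.1000/182'))
# https://w3id.org/oc/index/coci/api/v1/citations/10.1000/182
```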
Adding New APIs
The EUtils and OpenCitations objects can be used as examples for adding new APIs to LISC. New API objects can be created by inheriting from the URLs object and adding information on the utilities and settings available for that particular API.
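The subclassing pattern can be sketched without LISC itself. Everything here is hypothetical: `MiniURLs` is a simplified stand-in that does not reproduce the signature of LISC's `URLs` class, and `ExampleAPI`, `set_defaults`, and `https://api.example.org/v1` are invented for illustration. A real extension should follow the EUtils and OpenCitations source instead. The subclass only declares its base URL, its utilities, and any default settings, while the base class handles URL assembly.

```python
from urllib.parse import urlencode


class MiniURLs:
    """Hypothetical stand-in for a URLs-style base class (not LISC's URLs)."""

    def __init__(self, base, utils=None):
        self.base = base
        self.utils = dict(utils or {})
        self.settings = {}  # default query settings shared by every request

    def set_defaults(self, **settings):
        """Store query settings to include in every URL."""
        self.settings.update(settings)

    def get_url(self, util, segments=(), **settings):
        """Build a URL for `util`, adding path segments and query settings."""
        parts = [self.base, self.utils[util], *segments]
        url = '/'.join(str(part).strip('/') for part in parts)
        query = {**self.settings, **settings}
        return f'{url}?{urlencode(query)}' if query else url


class ExampleAPI(MiniURLs):
    """Hypothetical API object: declares its base URL, utilities and defaults."""

    def __init__(self, api_key=None):
        super().__init__('https://api.example.org/v1',
                         utils={'search': 'search', 'lookup': 'items'})
        if api_key is not None:
            self.set_defaults(api_key=api_key)


api = ExampleAPI(api_key='KEY')
print(api.get_url('search', q='brain'))
# https://api.example.org/v1/search?api_key=KEY&q=brain
```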