Words with Functions
Collect article text data and metadata, using a function-oriented approach.
Function Approach: collect_words
The function for collecting words data is collect_words().
Given a list of search terms, this function handles all the requests to collect the data.
The parameters for collect_words are the same as described in the Words tutorial.
Here we will briefly explore collecting data directly using the function approach.
# Import the function to collect words data
from lisc.collect import collect_words
# Set some terms to search for
terms = [['brain'], ['body']]
# Collect words data, setting to collect data for at most 5 articles per term
results, meta_data = collect_words(terms, retmax=5, usehistory=False,
                                   save_and_clear=False, verbose=True)
/Users/tom/opt/anaconda3/lib/python3.8/site-packages/bs4/builder/__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
warnings.warn(
Collecting data for: brain
Collecting data for: body
# The metadata includes some information on the database from which data was collected
meta_data['db_info']
{'dbname': 'pubmed', 'menuname': 'PubMed', 'description': 'PubMed bibliographic record', 'dbbuild': 'Build-2023.10.29.00.14', 'count': '36388316', 'lastupdate': '2023/10/29 00:14'}
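Every value in the database information printed above is a string, including the record count and the update timestamp. The following is a minimal sketch of converting them for further use. It works on a hard-coded copy of the printed dictionary so that it runs without a network request; in a real session the dictionary would come from `meta_data['db_info']`. The timestamp format is inferred from the single value shown above and is an assumption.

```python
from datetime import datetime

# Copy of the `db_info` dictionary printed above, used here so the sketch
# runs offline. In practice: db_info = meta_data['db_info']
db_info = {
    'dbname': 'pubmed',
    'menuname': 'PubMed',
    'description': 'PubMed bibliographic record',
    'dbbuild': 'Build-2023.10.29.00.14',
    'count': '36388316',
    'lastupdate': '2023/10/29 00:14',
}

# Convert the string values before doing arithmetic or date comparisons
n_records = int(db_info['count'])

# Assumed format, inferred from the one example value '2023/10/29 00:14'
last_update = datetime.strptime(db_info['lastupdate'], '%Y/%m/%d %H:%M')

print(f"{db_info['menuname']}: {n_records:,} records, "
      f"last updated {last_update:%Y-%m-%d}")
```

Storing these converted values alongside collected results can help record which database build a given collection reflects.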
# The collected data is returned as a list of Articles objects
print(results)
[<lisc.data.articles.Articles object at 0x7fa0aa4b01c0>, <lisc.data.articles.Articles object at 0x7fa0aa2b66d0>]
# Each `Articles` object holds the data for the collected articles for a given term
res1 = results[0]
# Print out some of the data
print(res1.n_articles, '\n')
print('\n'.join(res1.titles), '\n')
5
Bench-to-bedside investigations of H3 K27-altered diffuse midline glioma: drug targets and potential pharmacotherapies.
Epidemiological risk factors and phylogenetic affinities of Sarcocystis infecting village chickens and pigs in Peninsular Malaysia.
Association of aEEG and brain injury severity on MRI at term-equivalent age in preterm infants.
Gut microbiota-derived short chain fatty acids act as mediators of the gut-brain axis targeting age-related neurodegenerative disorders: a narrative review.
Polyomavirus Wakes Up and Chooses Neurovirulence.
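The same two attributes, `n_articles` and `titles`, can be read for every term rather than only the first. The sketch below loops over terms and results together. To stay runnable without collecting data, it uses a hypothetical `StandInArticles` class that mimics only those two attributes, with placeholder titles; it is not part of LISC. It also assumes that `results` is returned in the same order as `terms`, which is consistent with the output above but is not stated explicitly here.

```python
from dataclasses import dataclass, field


@dataclass
class StandInArticles:
    """Hypothetical stand-in mimicking only `titles` and `n_articles`."""
    titles: list = field(default_factory=list)

    @property
    def n_articles(self):
        return len(self.titles)


def summarize(terms, results):
    """Build one summary line per term, followed by its article titles."""
    lines = []
    # Assumption: results[i] holds the articles collected for terms[i]
    for term, res in zip(terms, results):
        lines.append(f"{term[0]}: {res.n_articles} articles")
        lines.extend(f"  - {title}" for title in res.titles)
    return lines


terms = [['brain'], ['body']]

# Placeholder data; with real data, pass the list returned by collect_words
results = [StandInArticles(['Title A', 'Title B']),
           StandInArticles(['Title C'])]

summary = summarize(terms, results)
print('\n'.join(summary))
```

With real data, `summarize(terms, results)` could be called directly on the list returned by `collect_words`, since it relies only on the attributes used above.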
To further explore the collected data, check out the documentation for the Articles object. To aggregate data across articles, check out the ArticlesAll object.