Tutorial 02: Words Analysis
Analyzing collected text data and metadata.
Words Analyses
This tutorial covers exploring and analyzing words data.
For this tutorial, we will reload and use the Words object with the data that we collected in the last tutorial.
Note that this tutorial requires some optional dependencies, including the WordCloud module.
# Import the custom objects that are used to store collected words data
from lisc.data import Articles, ArticlesAll
# Import database and IO utilities to reload our previously collected data
from lisc.io import SCDB, load_object
# Import plots that are available for words data
from lisc.plts.words import plot_wordcloud
Articles Object
LISC uses custom objects to store and organize collected words data.
These objects are used internally by the Words object.
If the data collection was set to save out data as it was collected, then Articles objects can be loaded individually, using the label of the search term.
ArticlesAll Object
The ArticlesAll object aggregates collected data across all articles collected for a given search term.
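# Load saved raw data for an example term (assumes it was saved during collection)
arts = Articles('frontal lobe')
arts.load(directory=SCDB('lisc_db'))
# Collapse data across articles
arts_all = ArticlesAll(arts)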
It also has methods to create and print summaries of the aggregated data.
# Check an example summary
arts_all.create_summary()
arts_all.print_summary()
frontal lobe :
Number of articles: 15
First publication: 2024
Most common author: Feng R
number of publications: 2
Most common journal: IEEE transactions on neural systems and rehabilitation engineering : a publication of the IEEE Engineering in Medicine and Biology Society
number of publications: 2
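As a rough illustration of what collapsing data across articles means, the sketch below aggregates a few made-up article records using `collections.Counter`. It is plain Python and not LISC's implementation: the record fields (`year`, `journal`, `words`) and the helper name `summarize` are assumptions chosen for the example.

```python
from collections import Counter

# Made-up article records, for illustration only (not collected data)
articles = [
    {'year': 2021, 'journal': 'Journal A', 'words': ['frontal', 'lobe', 'cortex']},
    {'year': 2023, 'journal': 'Journal B', 'words': ['frontal', 'network']},
    {'year': 2023, 'journal': 'Journal A', 'words': ['lobe', 'frontal']},
]


def summarize(records):
    """Collapse per-article records into aggregate counts (hypothetical helper)."""
    journals = Counter(rec['journal'] for rec in records)
    words = Counter(word for rec in records for word in rec['words'])
    return {
        'n_articles': len(records),
        'first_publication': min(rec['year'] for rec in records),
        'top_journal': journals.most_common(1)[0],
        'top_words': words.most_common(2),
    }


summary = summarize(articles)
print(summary)
```

The aggregate keeps counts rather than per-article values, which is the kind of information that the printed summary above reports.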
Words Object
The Words object can also be used to reload and analyze collected data.
The results attribute contains a list of Articles objects, one for each term.
# Reload the words object, specifying to also reload the article data
words = load_object('tutorial_words', directory=SCDB('lisc_db'), reload_results=True)
Note that the reloaded data is the raw data from the data collection.
The process_articles() method can be used to preprocess the collected data.
By default, this method applies the process_articles() function, which preprocesses journal and author names and tokenizes the text data. You can also pass in a custom function to apply custom processing to the collected articles data.
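A custom processing function might look something like the sketch below. It assumes that the function is called with an articles object and modifies it in place, and that this object exposes a `words` attribute holding one text string (or `None`) per article. Both points are assumptions that should be checked against the LISC API documentation. The function name `lowercase_split` is hypothetical, and a `SimpleNamespace` stands in for real collected data.

```python
from types import SimpleNamespace


def lowercase_split(arts):
    """Hypothetical custom processor: lowercase and whitespace-split each text.

    Assumes `arts.words` is a list with one string (or None) per article.
    """
    arts.words = [text.lower().split() if text else [] for text in arts.words]
    return arts


# Stand-in object with made-up text, used here instead of collected data
demo = SimpleNamespace(words=['Frontal Lobe function', None, 'Lobe anatomy'])
lowercase_split(demo)
print(demo.words)
```

A function like this skips the default handling of journal and author names, so it only makes sense when that default preprocessing is not needed.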
Note that some processing steps, like converting to the ArticlesAll representation, will automatically apply article preprocessing.
# Preprocess article data
words.process_articles()
We can also aggregate data across articles, just as we did before, directly in the Words object.
If you run the process_combined_results() method, then the combined_results attribute will contain the corresponding list of ArticlesAll objects, also one for each term.
# Process collected data into aggregated data objects
words.process_combined_results()
# Plot a WordCloud of the collected data for the first term
plot_wordcloud(words.combined_results[0].words, 25)

Exploring Words Data
The Words object also has some methods for exploring the data, including allowing for indexing into and looping through collected results.
# Index results for a specific label
print(words['frontal lobe'])
<lisc.data.articles.Articles object at 0x7f85becca7c0>
You can also loop through all the articles found for a specified search term.
The iteration returns a dictionary with all the article data, which can be examined.
# Iterating through articles found for a search term of interest
for art in words['temporal lobe']:
    print(art['title'])
Frequency-specific network changes in mesial temporal lobe epilepsy: Analysis of chronic and transient dysfunctions in the temporo-amygdala-orbitofrontal network using magnetoencephalography.
Navigating the Diagnostic Challenges of Posterior Circulation Ischaemic Strokes: A Case Report of Delayed Diagnosis.
Herpes zoster central nervous system complication: An increasing trend of acute limbic encephalitis.
Activation of the Reelin/GSK-3β/p-Tau Signaling Pathway in the Hippocampus of Patients with Temporal Lobe Epilepsy.
Unveiling the clinical spectrum of herpes simplex virus CNS infections in adults: a systematic review.
Spatial invasion patterns of temporal lobe glioblastoma after complete resection of contrast-enhancing tumor.
Temporal lobectomy in bilateral temporal lobe epilepsy: A relook at factors in selection, invasive evaluation and seizure outcome.
Potential role of MELD and MAP18 in patients with structural temporal lobe epilepsy.
Systematic review and meta-analysis of bulk RNAseq studies in human Alzheimer's disease brain tissue.
Anatomical aspects, technical nuances, and a case series of the resection of the inferior temporal gyrus as a strategy to access the basal surface of the temporal lobe and the lateral incisural space.
A specific model of resting-state functional brain network in MRI-negative temporal lobe epilepsy.
Corrigendum: Structural-functional coupling abnormalities in temporal lobe epilepsy.
Epileptic Seizure Classification with Patient-level and Video-level Contrastive Pretraining.
Altered spatiotemporal consistency and their genetic mechanisms in mild cognitive impairment: a combined neuroimaging and transcriptome study.
Utility of [18F]fluciclovine PET/MRI for identifying the optimal biopsy target region, helping to avoid underdiagnosis in patients with glioblastoma: illustrative case.
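Because each iteration yields a dictionary, ordinary Python tools can be used to filter the results. The sketch below selects titles that contain a given phrase. The helper name `titles_containing` is hypothetical, and the records are made-up stand-ins that carry only the `title` key used above.

```python
def titles_containing(articles, phrase):
    """Return titles that contain `phrase`, ignoring case (hypothetical helper)."""
    phrase = phrase.lower()
    return [art['title'] for art in articles if phrase in art['title'].lower()]


# Made-up records standing in for the dictionaries yielded by iteration
demo_articles = [
    {'title': 'Network changes in temporal lobe epilepsy.'},
    {'title': 'A case report of delayed stroke diagnosis.'},
    {'title': 'Seizure outcome after temporal lobectomy.'},
]

matches = titles_containing(demo_articles, 'epilepsy')
print(matches)
```

With collected data, the same helper could be called as `titles_containing(words['temporal lobe'], 'epilepsy')`, assuming every yielded dictionary has a non-empty `title`.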
Analyzing Words Data
Further analysis depends mostly on what one wants to do with the collected data.
For example, this might include building profiles for each search term, based on data in collected articles. It might also include using methods from natural language processing, such as vector embeddings and/or similarity measures.
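As one possible similarity measure, the sketch below computes the cosine similarity between two word-count profiles. It is a minimal plain-Python illustration and not part of LISC: the function name `cosine_similarity` and the word counts are made up for the example.

```python
import math
from collections import Counter


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count mappings."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(val * val for val in counts_a.values()))
    norm_b = math.sqrt(sum(val * val for val in counts_b.values()))
    # An empty profile has no direction, so treat its similarity as zero
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up word counts for two search terms, for illustration only
frontal = Counter({'cortex': 3, 'executive': 2, 'memory': 1})
temporal = Counter({'cortex': 2, 'memory': 3, 'epilepsy': 4})

score = cosine_similarity(frontal, temporal)
print(round(score, 3))
```

Scores near 1 indicate that two terms share much of their vocabulary in similar proportions, while scores near 0 indicate little overlap.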
Other analyses might explore historical patterns in the literature, for example when certain topics were written about, in which journals, and by which authors.
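A simple starting point for a historical analysis is to count publications per year. The sketch below does this for made-up records. It assumes each article dictionary has a `year` key that may be missing or `None`, and the helper name `publications_per_year` is hypothetical.

```python
from collections import Counter


def publications_per_year(articles):
    """Count articles per publication year, skipping records without a year."""
    years = Counter(art['year'] for art in articles if art.get('year') is not None)
    # Sort by year so that the result reads chronologically
    return dict(sorted(years.items()))


# Made-up records, for illustration only
demo_records = [{'year': 2019}, {'year': 2021}, {'year': 2019}, {'year': None}, {'year': 2023}]

per_year = publications_per_year(demo_records)
print(per_year)
```

The same counting pattern could be applied to journals or authors by changing the key that is read from each record.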