Tutorial 02: Words Analysis¶
Analyzing collected text data and metadata.
Words Analyses¶
This tutorial covers exploring & analyzing words data.
For this tutorial, we will reload and use the Words object with the data that we collected in the last tutorial.
Note that this tutorial requires some optional dependencies, including the WordCloud module.
# Import the custom objects that are used to store collected words data
from lisc.data import Articles, ArticlesAll
# Import database and IO utilities to reload our previously collected data
from lisc.io import SCDB, load_object
# Import plots that are available for words data
from lisc.plts.words import plot_wordcloud
Articles Object¶
LISC uses custom objects to store and organize collected words data.
These objects are used internally in the Words objects.
If the data collection was set to save out data as it was collected, then Articles objects can be loaded individually, using the label of the search term.
ArticlesAll Object¶
The ArticlesAll object aggregates data across all articles collected for a given search term.
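# Load the raw articles for one term (assumes they were saved to 'lisc_db' during collection)
arts = Articles('frontal lobe')
arts.load(directory=SCDB('lisc_db'))
# Collapse data across articles
arts_all = ArticlesAll(arts)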
It also has methods to create and print summaries of the aggregated data.
# Check an example summary
arts_all.create_summary()
arts_all.print_summary()
frontal lobe :
  Number of articles: 15
  First publication: 2023
  Most common author: Li M
    number of publications: 2
  Most common journal: Brain sciences
    number of publications: 1
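To make the contents of such a summary concrete, the following is a minimal sketch in plain Python of how comparable values could be computed from a list of article records. It is not LISC's implementation: the record layout, field names, and all values other than "Li M" and "Brain sciences" are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical article records; the field names are assumptions for illustration
articles = [
    {"year": 2023, "journal": "Brain sciences", "authors": ["Li M", "Author B"]},
    {"year": 2024, "journal": "Journal B", "authors": ["Li M"]},
    {"year": 2023, "journal": "Journal C", "authors": ["Author C"]},
]

# Tally how often each author and each journal appears across all articles
author_counts = Counter(name for art in articles for name in art["authors"])
journal_counts = Counter(art["journal"] for art in articles)

# Collect the same kinds of values that the printed summary reports
summary = {
    "n_articles": len(articles),
    "first_publication": min(art["year"] for art in articles),
    "top_author": author_counts.most_common(1)[0],
    "top_journal": journal_counts.most_common(1)[0],
}
print(summary)
```

Each "most common" entry is a (name, count) pair, which mirrors the "number of publications" lines in the printed summary.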
Words Object¶
The Words object can also be used to reload and analyze collected data.
The results attribute contains a list of Articles objects, one for each term.
# Reload the words object, specifying to also reload the article data
words = load_object('tutorial_words', directory=SCDB('lisc_db'), reload_results=True)
Note that the reloaded data is the raw data from the data collection.
The process_articles() method of the Words object can be used to preprocess the collected data.
By default, this method applies the process_articles() function, which preprocesses journal and author names and tokenizes the text data. You can also pass in a custom function to apply your own processing to the collected articles data.
Note that some processing steps, like converting to the ArticlesAll representation, will automatically apply article preprocessing.
# Preprocess article data
words.process_articles()
We can also aggregate data across articles, just as we did before, directly in the Words object.
If you run the process_combined_results() method, then the combined_results attribute will contain the corresponding list of ArticlesAll objects, also one for each term.
# Process collected data into aggregated data objects
words.process_combined_results()
# Plot a WordCloud of the collected data for the first term
plot_wordcloud(words.combined_results[0].words, 25)
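A word cloud of this kind is built from word frequencies, and the second argument in the call above appears to set how many words are drawn. As a rough sketch of the underlying idea, and not of how LISC or WordCloud implement it, selecting the most frequent words can be done with the standard library. The token list is hypothetical.

```python
from collections import Counter

# Hypothetical tokenized words pooled across the articles for one term
tokens = ["memory", "cortex", "memory", "lesion", "cortex", "memory", "task"]

# Count occurrences, then keep only the n most frequent words
n_words = 2
word_counts = Counter(tokens)
top_words = word_counts.most_common(n_words)
print(top_words)  # [('memory', 3), ('cortex', 2)]
```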
Exploring Words Data¶
The Words object also supports exploring the data directly: you can index into it by search term label and loop through the collected results.
# Index results for a specific label
print(words['frontal lobe'])
<lisc.data.articles.Articles object at 0x7fa0aa47feb0>
You can also loop through all the articles found for a specified search term.
Each iteration yields a dictionary with the data for one article, which can then be examined.
# Iterating through articles found for a search term of interest
for art in words['temporal lobe']:
    print(art['title'])
Antiepileptogenic Effects of Anakinra, Lamotrigine and Their Combination in a Lithium-Pilocarpine Model of Temporal Lobe Epilepsy in Rats.
The NLRP3 Inflammasome in Neurodegenerative Disorders: Insights from Epileptic Models.
An Integrated Multi-Channel Deep Neural Network for Mesial Temporal Lobe Epilepsy Identification Using Multi-Modal Medical Data.
Abnormal Topological Organization of Structural Covariance Networks in Patients with Temporal Lobe Epilepsy Comorbid Sleep Disorder.
An MR spectroscopy study of temporal areas excluding primary auditory cortex and frontal regions in subjective bilateral and unilateral tinnitus.
Involvement of the posterior cingulate gyrus in temporal lobe epilepsy: A study using stereo-EEG.
Phase II randomised placebo-controlled trial of sodium selenate as a disease-modifying treatment in chronic drug-resistant temporal lobe epilepsy: the SeLECT study protocol.
Language MEG predicts postoperative verbal memory change in left mesial temporal lobe epilepsy.
The role of subicular VIP-expressing interneurons on seizure dynamics in the intrahippocampal kainic acid model of temporal lobe epilepsy.
Hippocampal Network Dysfunction in Early Psychosis: A 2-Year Longitudinal Study.
Molecular subtypes of epilepsy associated with post-surgical seizure recurrence.
High-Grade Temporal Ganglioglioma in an Older Adult Woman.
Metabolic connectivity as a predictor of surgical outcome in mesial temporal lobe epilepsy.
Cell-specific NFIA upregulation promotes epileptogenesis by TRPV4-mediated astrocyte reactivity.
Autophagy and autophagy signaling in Epilepsy: possible role of autophagy activator.
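Because each article is a plain dictionary, ordinary Python is enough for simple checks. The sketch below uses a few titles copied from the output above, stored by hand as a list of dictionaries rather than loaded from LISC, and counts how many mention a keyword. Only the 'title' key, which the loop above uses, is assumed.

```python
# A few titles copied from the output above, one dictionary per article
articles = [
    {"title": "Hippocampal Network Dysfunction in Early Psychosis: A 2-Year Longitudinal Study."},
    {"title": "High-Grade Temporal Ganglioglioma in an Older Adult Woman."},
    {"title": "Molecular subtypes of epilepsy associated with post-surgical seizure recurrence."},
    {"title": "Autophagy and autophagy signaling in Epilepsy: possible role of autophagy activator."},
]

# Case-insensitive keyword match on the title of each article
keyword = "epilepsy"
matches = [art["title"] for art in articles if keyword in art["title"].lower()]
print(f"{len(matches)} of {len(articles)} titles mention '{keyword}'")
```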
Analyzing Words Data¶
Further analysis depends mostly on what one wants to do with the collected data.
For example, this might include building profiles for each search term, based on data in collected articles. It might also include using methods from natural language processing, such as vector embeddings and/or similarity measures.
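As one illustration of a similarity measure, the sketch below compares two search terms by the cosine similarity of their word-count profiles. This is a generic bag-of-words approach written with the standard library, not a function provided by LISC, and the word lists are hypothetical stand-ins for the processed words of each term.

```python
import math
from collections import Counter


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count profiles."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty profile has no defined direction
    return dot / (norm_a * norm_b)


# Hypothetical word lists standing in for the processed words of each term
profiles = {
    "frontal lobe": Counter(["executive", "memory", "cortex", "memory"]),
    "temporal lobe": Counter(["epilepsy", "memory", "seizure", "cortex"]),
}

similarity = cosine_similarity(profiles["frontal lobe"], profiles["temporal lobe"])
print(round(similarity, 3))
```

Values near 1 indicate that two terms are described with similar vocabulary, while values near 0 indicate little overlap.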
Other analyses might explore historical patterns in the literature, for example when certain topics were written about, in which journals, and by which authors.
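A simple starting point for such historical analyses is to count publications per year and per journal. The following sketch does this for a hypothetical list of (year, journal) pairs. In practice these values would come from the collected article data, and the way they are extracted here is an assumption, not part of LISC.

```python
from collections import Counter, defaultdict

# Hypothetical (year, journal) pairs for the articles found for one term
records = [
    (2021, "Journal A"),
    (2022, "Journal A"),
    (2022, "Journal B"),
    (2023, "Journal B"),
    (2023, "Journal B"),
]

# Number of publications per year, printed in chronological order
per_year = Counter(year for year, _ in records)
for year in sorted(per_year):
    print(year, per_year[year])

# Which journals published on the topic in each year, and how often
journals_by_year = defaultdict(Counter)
for year, journal in records:
    journals_by_year[year][journal] += 1
```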