Tutorial 02: Words Analysis

Analyzing collected text data and metadata.

Words Analyses

This tutorial covers exploring and analyzing words data.

For this tutorial, we will reload and use the Words object with the data that we collected in the last tutorial.

Note that this tutorial requires some optional dependencies, including the WordCloud module.

# Import the custom objects that are used to store collected words data
from lisc.data import Articles, ArticlesAll

# Import database and IO utilities to reload our previously collected data
from lisc.io import SCDB, load_object

# Import plots that are available for words data
from lisc.plts.words import plot_wordcloud

Articles Object

LISC uses custom objects to store and organize collected words data.

These objects are used internally in the Words objects.

If the data collection was set to save out data as it was collected, then Articles objects can be loaded individually, using the label of the search term.

# Set up database object
db = SCDB('lisc_db')

# Load raw data for a particular term
term = 'frontal lobe'
arts = Articles(term)
arts.load(db)

ArticlesAll Object

The ArticlesAll object aggregates collected data across all articles collected for a given search term.

# Collapse data across articles
arts_all = ArticlesAll(arts)

It also has methods to create and print summaries of the aggregated data.

# Check an example summary
arts_all.create_summary()
arts_all.print_summary()
frontal lobe :
  Number of articles:            15
  First publication:             2023
  Most common author:            Li M
    number of publications:      2
  Most common journal:           Brain sciences
    number of publications:      1
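To make the summary fields above more concrete, here is a minimal sketch of how such statistics could be computed from plain Python data. This is not LISC's implementation: the `records` list, its field names, and the `summarize` helper are hypothetical stand-ins for illustration only.

```python
from collections import Counter

# Hypothetical stand-in records; real LISC data lives in Articles / ArticlesAll objects
records = [
    {"year": 2023, "authors": ["Li M", "Chen X"], "journal": "Journal A"},
    {"year": 2023, "authors": ["Li M"], "journal": "Journal B"},
    {"year": 2024, "authors": ["Smith J"], "journal": "Journal A"},
]

def summarize(records):
    """Compute summary statistics similar in spirit to the printed summary."""
    # Count how often each author and each journal appears across all records
    author_counts = Counter(a for rec in records for a in rec["authors"])
    journal_counts = Counter(rec["journal"] for rec in records)
    return {
        "n_articles": len(records),
        "first_publication": min(rec["year"] for rec in records),
        "top_author": author_counts.most_common(1)[0],
        "top_journal": journal_counts.most_common(1)[0],
    }

summary = summarize(records)
print(summary)
```

Each "most common" entry is a `(name, count)` pair, mirroring the name and "number of publications" lines in the printed summary.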

Words Object

The Words object can also be used to reload and analyze collected data.

The results attribute contains a list of Articles objects, one for each term.

# Reload the words object, specifying to also reload the article data
words = load_object('tutorial_words', directory=SCDB('lisc_db'), reload_results=True)

Note that the reloaded data is the raw data from the data collection.

The process_articles() method can be used to do some preprocessing on the collected data.

By default, the process_articles() method applies LISC's process_articles() function, which preprocesses journal and author names and tokenizes the text data. You can also pass in a custom function to apply your own processing to the collected articles data.

Note that some processing steps, like converting to the ArticlesAll representation, will automatically apply article preprocessing.

# Preprocess article data
words.process_articles()
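To give a rough sense of what a tokenization step involves, the sketch below lowercases a string, splits it into word tokens, and drops a few common stopwords. It uses only the standard library and is an illustration under simplifying assumptions, not the processing that LISC actually applies; the `tokenize` helper and the `STOPWORDS` set are hypothetical.

```python
import re

# A tiny illustrative stopword list; real pipelines typically use a much larger one
STOPWORDS = {"the", "of", "in", "a", "an", "and", "to", "is"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]

tokens = tokenize("The role of the frontal lobe in working memory.")
print(tokens)
# ['role', 'frontal', 'lobe', 'working', 'memory']
```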

We can also aggregate data across articles, just as we did before, directly in the Words object.

If you run the process_combined_results() method, then the combined_results attribute will contain the corresponding list of ArticlesAll objects, also one for each term.

# Process collected data into aggregated data objects
words.process_combined_results()

# Plot a WordCloud of the collected data for the first term
plot_wordcloud(words.combined_results[0].words, 25)
[Figure: word cloud of the collected words for the first search term]

Exploring Words Data

The Words object also has methods for exploring the data, including indexing into and looping through the collected results.

# Index results for a specific label
print(words['frontal lobe'])
<lisc.data.articles.Articles object at 0x7fa0aa47feb0>

You can also loop through all the articles found for a specified search term.

Each iteration yields a dictionary containing the data for one article, which can then be examined.

# Iterating through articles found for a search term of interest
for art in words['temporal lobe']:
    print(art['title'])
Antiepileptogenic Effects of Anakinra, Lamotrigine and Their Combination in a Lithium-Pilocarpine Model of Temporal Lobe Epilepsy in Rats.
The NLRP3 Inflammasome in Neurodegenerative Disorders: Insights from Epileptic Models.
An Integrated Multi-Channel Deep Neural Network for Mesial Temporal Lobe Epilepsy Identification Using Multi-Modal Medical Data.
Abnormal Topological Organization of Structural Covariance Networks in Patients with Temporal Lobe Epilepsy Comorbid Sleep Disorder.
An MR spectroscopy study of temporal areas excluding primary auditory cortex and frontal regions in subjective bilateral and unilateral tinnitus.
Involvement of the posterior cingulate gyrus in temporal lobe epilepsy: A study using stereo-EEG.
Phase II randomised placebo-controlled trial of sodium selenate as a disease-modifying treatment in chronic drug-resistant temporal lobe epilepsy: the SeLECT study protocol.
Language MEG predicts postoperative verbal memory change in left mesial temporal lobe epilepsy.
The role of subicular VIP-expressing interneurons on seizure dynamics in the intrahippocampal kainic acid model of temporal lobe epilepsy.
Hippocampal Network Dysfunction in Early Psychosis: A 2-Year Longitudinal Study.
Molecular subtypes of epilepsy associated with post-surgical seizure recurrence.
High-Grade Temporal Ganglioglioma in an Older Adult Woman.
Metabolic connectivity as a predictor of surgical outcome in mesial temporal lobe epilepsy.
Cell-specific NFIA upregulation promotes epileptogenesis by TRPV4-mediated astrocyte reactivity.
Autophagy and autophagy signaling in Epilepsy: possible role of autophagy activator.

Analyzing Words Data

Further analysis depends mostly on what one wants to do with the collected data.

For example, this might include building profiles for each search term, based on data in collected articles. It might also include using methods from natural language processing, such as vector embeddings and/or similarity measures.
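As one possible example of a similarity measure, the sketch below compares two search terms by the cosine similarity of their word-count vectors. The word lists are made-up placeholders, not collected data, and the `cosine_similarity` helper is a hypothetical illustration rather than part of LISC.

```python
import math
from collections import Counter

def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count mappings (0.0 if either is empty)."""
    # Only words present in both mappings contribute to the dot product
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up word lists standing in for the processed words of two search terms
frontal = Counter(["memory", "attention", "cortex", "memory"])
temporal = Counter(["memory", "epilepsy", "cortex"])

sim = cosine_similarity(frontal, temporal)
print(round(sim, 3))
```

Values near 1 indicate that the two terms share much of their vocabulary in similar proportions, while values near 0 indicate little overlap.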

Other analyses might explore historical patterns in the literature, for example, when certain topics were written about, in which journals, and by which authors.
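A simple starting point for this kind of historical analysis is to count publications per year, overall and per journal. The sketch below uses hypothetical stand-in records; the field names are assumptions for illustration and do not reflect how LISC stores its data.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in records; the field names here are assumptions for illustration
records = [
    {"year": 2021, "journal": "Journal A"},
    {"year": 2022, "journal": "Journal B"},
    {"year": 2022, "journal": "Journal A"},
    {"year": 2023, "journal": "Journal A"},
]

# Count how many articles were published in each year
per_year = Counter(rec["year"] for rec in records)

# Count, for each journal, how many articles it published per year
per_journal = defaultdict(Counter)
for rec in records:
    per_journal[rec["journal"]][rec["year"]] += 1

# Print the yearly counts in chronological order
for year in sorted(per_year):
    print(year, per_year[year])
```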
