lisc.Counts

class lisc.Counts[source]

A class for collecting and analyzing co-occurrence data for specified lists of search terms.

Attributes:
terms : dict

Search terms to use.

counts : 2d array

The number of articles found for each combination of terms.

score : 2d array

A transformed ‘score’ of co-occurrence data. This may be normalized count data, or a similarity or association measure.

score_info : dict

Information about the computed score data.

square : bool

Whether the count data matrix is symmetrical.

meta_data : MetaData

Meta data information about the data collection.

__init__()[source]

Initialize LISC Counts object.

Methods

__init__()

Initialize LISC Counts object.

add_labels(labels[, directory, dim])

Add labels for terms to the object.

add_terms(terms[, term_type, directory, dim])

Add search terms to the object.

check_counts([dim])

Check how many articles were found for each term.

check_data([data_type, dim])

Print out the highest count or score value for each term.

check_top([dim])

Check the terms with the most articles.

clear_score()

Clear any previously computed score.

compute_score([score_type, dim, return_result])

Compute a score, such as an index or normalization, of the co-occurrence data.

copy()

Return a copy of the current object.

drop_data(n_articles[, dim, value])

Drop terms based on number of article results.

run_collection([db, field, api_key, ...])

Collect co-occurrence data.

set_joiners([search, inclusions, ...])

Set joiners to use, specified for each term type.

Attributes

has_data

Indicator for whether the object has collected data.

add_labels(labels, directory=None, dim='A')[source]

Add labels for terms to the object.

Parameters:
labels : list of str or str

Labels for each term to add to the object. If list, assumed to be labels. If str, assumed to be a file name to load from.

directory : SCDB or str, optional

Folder or database object specifying the file location, if loading from file.

dim : {‘A’, ‘B’}, optional

Which set of labels to add.
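
Examples

Add labels for a set of terms, from a list (an illustrative sketch, assuming a Counts object with four terms already added; the label strings here are hypothetical):

>>> counts.add_labels(['FL', 'TL', 'PL', 'OL'])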

add_terms(terms, term_type='terms', directory=None, dim='A')[source]

Add search terms to the object.

Parameters:
terms : list or dict or str

Terms to add to the object. If list, assumed to be terms, which can be a list of str or a list of list of str. If dict, each key should reflect a term_type, and values the corresponding terms. If str, assumed to be a file name to load from.

term_type : {‘terms’, ‘inclusions’, ‘exclusions’}, optional

Which type of terms are being added.

directory : SCDB or str, optional

A string or object containing a file path.

dim : {‘A’, ‘B’}, optional

Which set of terms to add.

Examples

Add one set of terms, from a list:

>>> from lisc import Counts
>>> counts = Counts()
>>> counts.add_terms(['frontal lobe', 'temporal lobe', 'parietal lobe', 'occipital lobe'])

Add a second set of terms, from a list:

>>> counts.add_terms(['attention', 'perception'], dim='B')

Add some exclusion words, for the second set of terms, from a list:

>>> counts.add_terms(['', 'extrasensory'], term_type='exclusions', dim='B')

check_counts(dim='A')[source]

Check how many articles were found for each term.

Parameters:
dim : {‘A’, ‘B’, ‘both’}, optional

Which set of terms to check.

Examples

Print the number of articles found for each term (assuming counts already has data):

>>> counts.check_counts() 

check_data(data_type='counts', dim='A')[source]

Print out the highest count or score value for each term.

Parameters:
data_type : {‘counts’, ‘score’}, optional

Which data type to use.

dim : {‘A’, ‘B’, ‘both’}, optional

Which set of terms to check.

Examples

Print the highest count for each term (assuming counts already has data):

>>> counts.check_data() 

Print the highest score value for each term (assuming counts already has data):

>>> counts.check_data(data_type='score') 

check_top(dim='A')[source]

Check the terms with the most articles.

Parameters:
dim : {‘A’, ‘B’, ‘both’}, optional

Which set of terms to check.

Examples

Print which term has the most articles (assuming counts already has data):

>>> counts.check_top() 

clear_score()[source]

Clear any previously computed score.
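
Examples

Clear a previously computed score (a minimal sketch, assuming a score has already been computed on the object):

>>> counts.clear_score()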

compute_score(score_type='association', dim='A', return_result=False)[source]

Compute a score, such as an index or normalization, of the co-occurrence data.

Parameters:
score_type : {‘association’, ‘normalize’, ‘similarity’}, optional

The type of score to apply to the co-occurrence data.

dim : {‘A’, ‘B’}, optional

Which dimension of counts to use to normalize by or compute similarity across. Only used if score_type is ‘normalize’ or ‘similarity’.

return_result : bool, optional, default: False

Whether to return the computed result.

Examples

Compute association scores of co-occurrence data collected for two lists of terms:

>>> from lisc import Counts
>>> counts = Counts()
>>> counts.add_terms(['frontal lobe', 'temporal lobe', 'parietal lobe', 'occipital lobe'])
>>> counts.add_terms(['attention', 'perception'], dim='B')
>>> counts.run_collection() 
>>> counts.compute_score() 

Once you have co-occurrence scores calculated, you might want to plot this data.

You can plot the results as a matrix:

>>> from lisc.plts.counts import plot_matrix  
>>> plot_matrix(counts)  

And/or as a clustermap:

>>> from lisc.plts.counts import plot_clustermap  
>>> plot_clustermap(counts)  

And/or as a dendrogram:

>>> from lisc.plts.counts import plot_dendrogram  
>>> plot_dendrogram(counts)  
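
Other score types can be computed by setting score_type (a sketch, assuming co-occurrence data has already been collected, as above):

>>> counts.compute_score(score_type='normalize') 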

copy()[source]

Return a copy of the current object.
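
Examples

Create a copy of an existing object (a minimal sketch, assuming an existing Counts object counts):

>>> counts_copy = counts.copy()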

drop_data(n_articles, dim='A', value='count')[source]

Drop terms based on number of article results.

Parameters:
n_articles : int

Minimum number of articles required to keep each term.

dim : {‘A’, ‘B’}, optional

Which set of terms to drop.

value : {‘count’, ‘coocs’}, optional

Which data count to drop based on:

    ‘count’ : drops based on the total number of articles per term
    ‘coocs’ : drops based on the co-occurrences, if all values are below n_articles

Notes

This will drop any computed scores, as they may not be accurate after dropping data.

Examples

Drop terms with fewer than 20 articles (assuming counts already has data):

>>> counts.drop_data(20) 

property has_data

Indicator for whether the object has collected data.
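
For example, this can be used to guard analyses on collected data (a minimal sketch, assuming a Counts object counts):

>>> if counts.has_data:
...     counts.compute_score()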

run_collection(db='pubmed', field='TIAB', api_key=None, logging=None, directory=None, verbose=False, **eutils_kwargs)[source]

Collect co-occurrence data.

Parameters:
db : str, optional, default: ‘pubmed’

Which database to access from EUtils.

field : str, optional, default: ‘TIAB’

Field to search for term in. Defaults to ‘TIAB’, which is Title/Abstract.

api_key : str, optional

An API key for a NCBI account.

logging : {None, ‘print’, ‘store’, ‘file’}, optional

What kind of logging, if any, to do for requested URLs.

directory : str or SCDB, optional

Folder or database object specifying the save location.

verbose : bool, optional, default: False

Whether to print out updates.

**eutils_kwargs

Additional settings for the EUtils API.

Examples

Collect co-occurrence data from added terms, across one set of terms:

>>> from lisc import Counts
>>> counts = Counts()
>>> counts.add_terms(['frontal lobe', 'temporal lobe', 'parietal lobe', 'occipital lobe'])
>>> counts.run_collection() 

Collect co-occurrence data from added terms, across two sets of terms:

>>> from lisc import Counts
>>> counts = Counts()
>>> counts.add_terms(['frontal lobe', 'temporal lobe', 'parietal lobe', 'occipital lobe'])
>>> counts.add_terms(['attention', 'perception', 'cognition'], dim='B')
>>> counts.run_collection() 
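
Additional EUtils settings can be passed through as keyword arguments (a sketch, under the assumption that they are forwarded to the search; ‘mindate’ and ‘maxdate’ are standard EUtils date-range parameters):

>>> counts.run_collection(mindate='2000/01/01', maxdate='2020/12/31') 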

set_joiners(search=None, inclusions=None, exclusions=None, dim='A')[source]

Set joiners to use, specified for each term type.

Parameters:
search, inclusions, exclusions : {‘OR’, ‘AND’, ‘NOT’}, optional

Joiner to use to combine terms, for search, inclusions, and exclusions terms.

dim : {‘A’, ‘B’}, optional

Which set of terms to set joiners for.
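
Examples

Set the joiner used for search terms (an illustrative sketch, using one of the supported joiner values):

>>> from lisc import Counts
>>> counts = Counts()
>>> counts.set_joiners(search='OR')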

Examples using lisc.Counts

Tutorial 04: Counts Collection

Tutorial 05: Counts Analysis

Tutorial 07: MetaData