lisc.Counts¶
- class lisc.Counts[source]¶
A class for collecting and analyzing co-occurrence data for specified list(s) of search terms.
- Attributes
- terms : dict
Search terms to use.
- counts : 2d array
The number of articles found for each combination of terms.
- score : 2d array
A transformed ‘score’ of co-occurrence data. This may be normalized count data, or a similarity or association measure.
- score_info : dict
Information about the computed score data.
- square : bool
Whether the count data matrix is symmetrical.
- meta_data : MetaData
Meta data information about the data collection.
Methods
__init__()
Initialize LISC Counts object.
add_labels(terms[, directory, dim])
Add labels for terms to the object.
add_terms(terms[, term_type, directory, dim])
Add search terms to the object.
check_counts([dim])
Check how many articles were found for each term.
check_data([data_type, dim])
Print out the highest count or score value for each term.
check_top([dim])
Check the terms with the most articles.
compute_score([score_type, dim, return_result])
Compute a score, such as an index or normalization, of the co-occurrence data.
copy()
Return a copy of the current object.
drop_data(n_articles[, dim])
Drop terms based on number of article results.
run_collection([db, field, api_key, ...])
Collect co-occurrence data.
Attributes
has_data
Indicator for whether the object has collected data.
- add_labels(terms, directory=None, dim='A')[source]¶
Add labels for terms to the object.
- Parameters
- labels : list of str or str
Labels for each term to add to the object. If list, is assumed to be labels. If str, is assumed to be a file name to load from.
- directory : SCDB or str, optional
Folder or database object specifying the file location, if loading from file.
- dim : {‘A’, ‘B’}, optional
Which set of labels to add.
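Examples
Add labels for a previously added set of terms (a minimal sketch; the short label strings here are illustrative, not part of the library):
>>> counts = Counts()
>>> counts.add_terms(['frontal lobe', 'temporal lobe', 'parietal lobe', 'occipital lobe'])
>>> counts.add_labels(['FL', 'TL', 'PL', 'OL'])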
- add_terms(terms, term_type='terms', directory=None, dim='A')[source]¶
Add search terms to the object.
- Parameters
- terms : list or dict or str
Terms to add to the object. If list, assumed to be terms, which can be a list of str or a list of list of str. If dict, each key should reflect a term_type, and values the corresponding terms. If str, assumed to be a file name to load from.
- term_type : {‘terms’, ‘inclusions’, ‘exclusions’}, optional
Which type of terms are being added.
- directory : SCDB or str, optional
A string or object containing a file path.
- dim : {‘A’, ‘B’}, optional
Which set of terms to add.
Examples
Add one set of terms, from a list:
>>> counts = Counts()
>>> counts.add_terms(['frontal lobe', 'temporal lobe', 'parietal lobe', 'occipital lobe'])
Add a second set of terms, from a list:
>>> counts.add_terms(['attention', 'perception'], dim='B')
Add some exclusion words, for the second set of terms, from a list:
>>> counts.add_terms(['', 'extrasensory'], term_type='exclusions', dim='B')
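Terms can also be added as a dict, with each key giving a term type (a sketch based on the parameter description above; the particular terms are illustrative):
>>> counts.add_terms({'terms': ['attention', 'perception'],
...                   'exclusions': ['', 'extrasensory']}, dim='B')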
- check_counts(dim='A')[source]¶
Check how many articles were found for each term.
- Parameters
- dim : {‘A’, ‘B’}
Which set of terms to check.
Examples
Print the number of articles found for each term (assuming counts already has data):
>>> counts.check_counts()
- check_data(data_type='counts', dim='A')[source]¶
Print out the highest count or score value for each term.
- Parameters
- data_type : {‘counts’, ‘score’}
Which data type to use.
- dim : {‘A’, ‘B’}, optional
Which set of terms to check.
Examples
Print the highest count for each term (assuming counts already has data):
>>> counts.check_data()
Print the highest score value for each term (assuming counts already has data):
>>> counts.check_data(data_type='score')
- check_top(dim='A')[source]¶
Check the terms with the most articles.
- Parameters
- dim : {‘A’, ‘B’}, optional
Which set of terms to check.
Examples
Print which term has the most articles (assuming counts already has data):
>>> counts.check_top()
- compute_score(score_type='association', dim='A', return_result=False)[source]¶
Compute a score, such as an index or normalization, of the co-occurrence data.
- Parameters
- score_type : {‘association’, ‘normalize’, ‘similarity’}, optional
The type of score to apply to the co-occurrence data.
- dim : {‘A’, ‘B’}, optional
Which dimension of counts to use to normalize by or compute similarity across. Only used if score_type is ‘normalize’ or ‘similarity’.
- return_result : bool, optional, default: False
Whether to return the computed result.
Examples
Compute association scores of co-occurrence data collected for two lists of terms:
>>> counts = Counts()
>>> counts.add_terms(['frontal lobe', 'temporal lobe', 'parietal lobe', 'occipital lobe'])
>>> counts.add_terms(['attention', 'perception'], dim='B')
>>> counts.run_collection()
>>> counts.compute_score()
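Other score types can also be computed by setting score_type, with dim selecting the dimension to normalize by or compute similarity across (a minimal sketch, assuming the same collected data as above):
>>> counts.compute_score(score_type='normalize', dim='A')
>>> counts.compute_score(score_type='similarity', dim='A')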
Once you have co-occurrence scores calculated, you might want to plot this data.
You can plot the results as a matrix:
>>> from lisc.plts.counts import plot_matrix
>>> plot_matrix(counts)
And/or as a clustermap:
>>> from lisc.plts.counts import plot_clustermap
>>> plot_clustermap(counts)
And/or as a dendrogram:
>>> from lisc.plts.counts import plot_dendrogram
>>> plot_dendrogram(counts)
- drop_data(n_articles, dim='A')[source]¶
Drop terms based on number of article results.
- Parameters
- n_articles : int
Minimum number of articles required to keep each term.
- dim : {‘A’, ‘B’}, optional
Which set of terms to drop.
Examples
Drop terms with fewer than 20 articles (assuming counts already has data):
>>> counts.drop_data(20)
- property has_data¶
Indicator for whether the object has collected data.
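Examples
Check whether data has been collected before running further analyses (a minimal sketch, assuming a counts object as in the examples above):
>>> if counts.has_data:
...     counts.check_counts()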
- run_collection(db='pubmed', field='TIAB', api_key=None, logging=None, directory=None, verbose=False, **eutils_kwargs)[source]¶
Collect co-occurrence data.
- Parameters
- db : str, optional, default: ‘pubmed’
Which database to access from EUtils.
- field : str, optional, default: ‘TIAB’
Field to search for terms in. Defaults to ‘TIAB’, which is Title/Abstract.
- api_key : str, optional
An API key for an NCBI account.
- logging : {None, ‘print’, ‘store’, ‘file’}, optional
What kind of logging, if any, to do for requested URLs.
- directory : str or SCDB, optional
Folder or database object specifying the save location.
- verbose : bool, optional, default: False
Whether to print out updates.
- **eutils_kwargs
Additional settings for the EUtils API.
Examples
Collect co-occurrence data from added terms, across one set of terms:
>>> counts = Counts()
>>> counts.add_terms(['frontal lobe', 'temporal lobe', 'parietal lobe', 'occipital lobe'])
>>> counts.run_collection()
Collect co-occurrence data from added terms, across two sets of terms:
>>> counts = Counts()
>>> counts.add_terms(['frontal lobe', 'temporal lobe', 'parietal lobe', 'occipital lobe'])
>>> counts.add_terms(['attention', 'perception', 'cognition'], dim='B')
>>> counts.run_collection()
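Optional settings, such as the search field, an NCBI API key, and verbosity, can also be passed (a sketch; the api_key value below is a placeholder, not a real key):
>>> counts.run_collection(db='pubmed', field='TIAB',
...                       api_key='my-api-key', verbose=True)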
Examples using lisc.Counts¶
Tutorial 03: Counts Collection