lisc.io.db.SCDB

class lisc.io.db.SCDB(base=None, generate_paths=True, structure={1: {'base': ['terms', 'logs', 'data', 'figures']}, 2: {'data': ['counts', 'words']}, 3: {'words': ['raw', 'summary']}})[source]

Database object for a SCANR project.

Notes

The default set of paths for SCDB is:

Level 1: Base

terms

Terms files.

logs

Logs files.

figures

Figures files.

data

Data files.

Level 2: Data

counts

Counts data files.

words

Words data files.

Level 3: Words

raw

Raw words data files.

summary

Summary files for words data.
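The level-keyed layout above can be read as a small tree: each level maps a parent folder to its child folders. The following is a minimal sketch of that expansion using only `pathlib`. The helper name `build_paths` is hypothetical and is not part of LISC; it only illustrates how the default `structure` dictionary could resolve to nested folder paths.

```python
from pathlib import Path

# Default structure copied from the SCDB signature above.
STRUCTURE = {
    1: {'base': ['terms', 'logs', 'data', 'figures']},
    2: {'data': ['counts', 'words']},
    3: {'words': ['raw', 'summary']},
}

def build_paths(base, structure=STRUCTURE):
    """Hypothetical helper: expand a level-keyed structure into full paths."""
    paths = {'base': Path(base)}
    # Walk the levels in order so every parent is resolved before its children.
    for level in sorted(structure):
        for parent, children in structure[level].items():
            for child in children:
                paths[child] = paths[parent] / child
    return paths

paths = build_paths('lisc_db')
for name, path in paths.items():
    print(f"{name:8s} {path.as_posix()}")
```

Because each folder name is used as a dictionary key, this sketch assumes folder names are unique across levels, as they are in the default structure.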

Attributes
paths : dict

Dictionary of all folder paths in the project.

__init__(base=None, generate_paths=True, structure={1: {'base': ['terms', 'logs', 'data', 'figures']}, 2: {'data': ['counts', 'words']}, 3: {'words': ['raw', 'summary']}})[source]

Initialize a SCDB object.

Parameters
base : str

The base path to where the database is located.

generate_paths : bool

Whether to automatically generate all the paths for the database folders.

structure : dict, optional

Definition of the folder structure for the database.

Examples

Initialize a SCDB object:

>>> db = SCDB('lisc_db')

Methods

__init__([base, generate_paths, structure])

Initialize a SCDB object.

check_file_structure()

Check the file structure of the database.

gen_paths([structure])

Generate all the full paths for the database object.

get_file_path(folder, file_name)

Get a path to a file in a designated directory folder.

get_files(folder[, drop_ext, sort_files])

Get a list of available files in a folder in the database.

get_folder_path(folder)

Get the path to a folder in the directory.

check_file_structure()[source]

Check the file structure of the database.
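The description above does not say what form the check takes. As an illustration only, one way to inspect a database folder on disk is to walk it and list its folders and files. The helper `describe_tree` below is hypothetical, uses only the standard library, and should not be read as the behavior of `check_file_structure` itself.

```python
import os
import tempfile
from pathlib import Path

def describe_tree(base):
    """Hypothetical helper: return one indented line per folder and file under base."""
    base = Path(base)
    lines = []
    for root, dirs, files in os.walk(base):
        dirs.sort()  # sort in place so os.walk visits folders in a stable order
        depth = len(Path(root).relative_to(base).parts)
        lines.append('  ' * depth + Path(root).name + '/')
        for file_name in sorted(files):
            lines.append('  ' * (depth + 1) + file_name)
    return lines

# Build a small throwaway folder layout to inspect.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / 'lisc_db'
    (base / 'data' / 'counts').mkdir(parents=True)
    (base / 'terms').mkdir()
    (base / 'terms' / 'example_terms.txt').touch()
    tree = describe_tree(base)

print('\n'.join(tree))
```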

gen_paths(structure={1: {'base': ['terms', 'logs', 'data', 'figures']}, 2: {'data': ['counts', 'words']}, 3: {'words': ['raw', 'summary']}})[source]

Generate all the full paths for the database object.

Parameters
structure : dict, optional

Definition of the folder structure for the database.

Examples

Generate paths for a SCDB object:

>>> db = SCDB('lisc_db')
>>> db.gen_paths()
>>> db.paths 
{'base': PosixPath('lisc_db'),
 'terms': PosixPath('lisc_db/terms'),
 'logs': PosixPath('lisc_db/logs'),
 'data': PosixPath('lisc_db/data'),
 'figures': PosixPath('lisc_db/figures'),
 'counts': PosixPath('lisc_db/data/counts'),
 'words': PosixPath('lisc_db/data/words'),
 'raw': PosixPath('lisc_db/data/words/raw'),
 'summary': PosixPath('lisc_db/data/words/summary')}

get_file_path(folder, file_name)[source]

Get a path to a file in a designated directory folder.

Parameters
folder : str

Which folder path to get the file path from.

file_name : str

The name of the file to create the full file path for.

Returns
pathlib.Path

The full file path to the requested file.

Examples

Get the path to a Counts file:

>>> db = SCDB('lisc_db')
>>> db.get_file_path('counts', 'tutorial_counts.p')
PosixPath('lisc_db/data/counts/tutorial_counts.p')
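As a rough sketch of the idea, a file path can be formed by looking up the named folder and joining the file name onto it. The function `file_path` and the hard-coded `paths` dictionary below are illustrative assumptions, not LISC code; the folder paths are taken from the `gen_paths` example output.

```python
from pathlib import Path

# Folder paths as shown in the gen_paths example output (subset).
paths = {
    'counts': Path('lisc_db/data/counts'),
    'terms': Path('lisc_db/terms'),
}

def file_path(paths, folder, file_name):
    """Hypothetical stand-in: join a named folder's path with a file name."""
    if folder not in paths:
        raise KeyError(f"Unknown folder: {folder!r}")
    return paths[folder] / file_name

counts_file = file_path(paths, 'counts', 'tutorial_counts.p')
print(counts_file.as_posix())
```

The result is a `pathlib.Path`, so it can be passed directly to `open()` or converted with `str()` where a plain string is required.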
get_files(folder, drop_ext=False, sort_files=True)[source]

Get a list of available files in a folder in the database.

Parameters
folder : str

Which folder path to get the list of files from.

drop_ext : bool, optional, default: False

Whether to drop the extensions from the list of file names.

sort_files : bool, optional, default: True

Whether to sort the list of files before returning.

Returns
files : list of str

List of files available in specified folder.

Examples

Get a list of available terms files:

>>> db = SCDB('lisc_db')
>>> db.get_files('terms') 
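To make the effect of `drop_ext` and `sort_files` concrete, here is a minimal standard-library sketch. The function `list_files` is a hypothetical stand-in that mirrors the documented parameters; it is not the LISC implementation and runs against a temporary folder rather than a real database.

```python
import tempfile
from pathlib import Path

def list_files(folder, drop_ext=False, sort_files=True):
    """Hypothetical stand-in for listing the files in one folder."""
    names = [p.stem if drop_ext else p.name
             for p in Path(folder).iterdir() if p.is_file()]
    return sorted(names) if sort_files else names

# Create two placeholder terms files in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('b_terms.txt', 'a_terms.txt'):
        (Path(tmp) / name).touch()
    with_ext = list_files(tmp)
    without_ext = list_files(tmp, drop_ext=True)

print(with_ext)
print(without_ext)
```

With `sort_files=False` the order in this sketch is whatever the file system returns, which is why sorting is the safer default when the list feeds a reproducible pipeline.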
get_folder_path(folder)[source]

Get the path to a folder in the directory.

Parameters
folder : str

Which folder to get the path for.

Returns
pathlib.Path

The path to the requested directory folder.

Examples

Get the path to the folder containing Counts data:

>>> db = SCDB('lisc_db')
>>> db.get_folder_path('counts')
PosixPath('lisc_db/data/counts')