lisc.io.db.SCDB

class lisc.io.db.SCDB(base=None, generate_paths=True, structure={1: {'base': ['terms', 'logs', 'data', 'figures']}, 2: {'data': ['counts', 'words']}, 3: {'words': ['raw', 'summary']}})

Database object for a SCANR project.

Notes

The default set of paths for SCDB is:

Level 1: Base

terms

Terms files.

logs

Logs files.

figures

Figures files.

data

Data files.

Level 2: Data

counts

Counts data files.

words

Words data files.

Level 3: Words

raw

Raw words data files.

summary

Summary files for words data.
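As a rough illustration of how the nested default `structure` maps onto the folders listed above, the following sketch flattens such a dictionary into a name-to-path mapping using only `pathlib`. The `flatten_structure` helper and the `DEFAULT_STRUCTURE` name are hypothetical and are not taken from the LISC source; the sketch assumes that each parent folder is defined at an earlier level than its children.

```python
from pathlib import Path

# Same layout as the default `structure` argument in the class signature.
DEFAULT_STRUCTURE = {
    1: {'base': ['terms', 'logs', 'data', 'figures']},
    2: {'data': ['counts', 'words']},
    3: {'words': ['raw', 'summary']},
}

def flatten_structure(base, structure=DEFAULT_STRUCTURE):
    """Hypothetical helper: map each folder name to its full path.

    Levels are visited in ascending order, so every parent folder is
    already in the mapping by the time its children are added.
    """
    paths = {'base': Path(base)}
    for level in sorted(structure):
        for parent, children in structure[level].items():
            for child in children:
                paths[child] = paths[parent] / child
    return paths

paths = flatten_structure('lisc_db')
print(paths['summary'].as_posix())  # lisc_db/data/words/summary
```

Because folder names are used as dictionary keys in this sketch, each name has to be unique across all levels.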

Attributes:
paths : dict

Dictionary of all folder paths in the project.

__init__(base=None, generate_paths=True, structure={1: {'base': ['terms', 'logs', 'data', 'figures']}, 2: {'data': ['counts', 'words']}, 3: {'words': ['raw', 'summary']}})

Initialize a SCDB object.

Parameters:
base : str

The base path to where the database is located.

generate_paths : bool

Whether to automatically generate all the paths for the database folders.

structure : dict, optional

Definition of the folder structure for the database.

Examples

Initialize a SCDB object:

>>> db = SCDB('lisc_db')

Methods

__init__([base, generate_paths, structure])

Initialize a SCDB object.

check_file_structure()

Check the file structure of the database.

gen_paths([structure])

Generate all the full paths for the database object.

get_file_path(folder, file_name)

Get a path to a file in a designated directory folder.

get_files(folder[, drop_ext, sort_files])

Get a list of available files in a folder in the database.

get_folder_path(folder)

Get the path to a folder in the directory.

check_file_structure()

Check the file structure of the database.

gen_paths(structure={1: {'base': ['terms', 'logs', 'data', 'figures']}, 2: {'data': ['counts', 'words']}, 3: {'words': ['raw', 'summary']}})

Generate all the full paths for the database object.

Parameters:
structure : dict, optional

Definition of the folder structure for the database.

Examples

Generate paths for a SCDB object:

>>> db = SCDB('lisc_db')
>>> db.gen_paths()
>>> db.paths 
{'base': PosixPath('lisc_db'),
 'terms': PosixPath('lisc_db/terms'),
 'logs': PosixPath('lisc_db/logs'),
 'data': PosixPath('lisc_db/data'),
 'figures': PosixPath('lisc_db/figures'),
 'counts': PosixPath('lisc_db/data/counts'),
 'words': PosixPath('lisc_db/data/words'),
 'raw': PosixPath('lisc_db/data/words/raw'),
 'summary': PosixPath('lisc_db/data/words/summary')}

get_file_path(folder, file_name)

Get a path to a file in a designated directory folder.

Parameters:
folder : str

Which folder path to get the file path from.

file_name : str

The name of the file to create the full file path for.

Returns:
Path

The full file path to the requested file.

Examples

Get the path to a Counts file:

>>> db = SCDB('lisc_db')
>>> db.get_file_path('counts', 'tutorial_counts.p')
PosixPath('lisc_db/data/counts/tutorial_counts.p')

get_files(folder, drop_ext=False, sort_files=True)

Get a list of available files in a folder in the database.

Parameters:
folder : str

Which folder path to get the list of files from.

drop_ext : bool, optional, default: False

Whether to drop the extensions from the list of file names.

sort_files : bool, optional, default: True

Whether to sort the list of files before returning.

Returns:
files : list of str

List of files available in specified folder.
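To make the effect of `drop_ext` and `sort_files` concrete, here is a minimal standard-library sketch of the same idea. The `list_files` function is a hypothetical stand-in, not the LISC implementation; skipping hidden files and using `Path.stem` to drop the extension are assumptions made for the illustration.

```python
import tempfile
from pathlib import Path

def list_files(folder, drop_ext=False, sort_files=True):
    """Hypothetical stand-in: list the file names found in one folder."""
    # Keep regular files only and skip hidden files (an assumption).
    names = [p.name for p in Path(folder).iterdir()
             if p.is_file() and not p.name.startswith('.')]
    if drop_ext:
        # Path.stem removes only the final suffix, e.g. 'a.tar.gz' -> 'a.tar'.
        names = [Path(name).stem for name in names]
    if sort_files:
        names.sort()
    return names

# Build a throwaway folder with two files so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('b_terms.txt', 'a_terms.txt'):
        (Path(tmp) / name).touch()
    with_ext = list_files(tmp)
    without_ext = list_files(tmp, drop_ext=True)

print(with_ext)     # ['a_terms.txt', 'b_terms.txt']
print(without_ext)  # ['a_terms', 'b_terms']
```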

Examples

Get a list of available terms files:

>>> db = SCDB('lisc_db')
>>> db.get_files('terms')

get_folder_path(folder)

Get the path to a folder in the directory.

Parameters:
folder : str

Which folder to get the path for.

Returns:
Path

The path to the requested directory folder.

Examples

Get the path to the folder containing Counts data:

>>> db = SCDB('lisc_db')
>>> db.get_folder_path('counts')
PosixPath('lisc_db/data/counts')
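
The two lookups documented here, `get_folder_path` and `get_file_path`, can be pictured as a dictionary lookup followed by a path join. The sketch below assumes a plain name-to-path mapping like the `paths` output shown earlier; `folder_path` and `file_path` are hypothetical names, and raising `ValueError` for an unknown folder is an assumption about reasonable behavior, not a description of what LISC does.

```python
from pathlib import Path

# Hypothetical name-to-path mapping, mirroring part of the `paths` attribute.
paths = {
    'base': Path('lisc_db'),
    'data': Path('lisc_db/data'),
    'counts': Path('lisc_db/data/counts'),
}

def folder_path(paths, folder):
    """Hypothetical lookup: return the path registered for a folder name."""
    if folder not in paths:
        # Assumed behavior: fail early with a readable message.
        raise ValueError(f"Unknown folder: {folder!r}")
    return paths[folder]

def file_path(paths, folder, file_name):
    """Hypothetical lookup: join a folder's path with a file name."""
    return folder_path(paths, folder) / file_name

counts_file = file_path(paths, 'counts', 'tutorial_counts.p')
print(counts_file.as_posix())  # lisc_db/data/counts/tutorial_counts.p
```

Returning `Path` objects rather than strings keeps the results composable with the `/` operator and with methods such as `exists()`.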