buhtzology

package documentation


A loose collection of code I reuse in different (scientific) projects.

Module	`analy`	Generate output for analysis in form of tables and figures.
Module	`bandas`	Helpers with `pandas`.
Module	`khq`	Scoring mode of the German version of the King's Health Questionnaire.
Module	`report`	Create reports from several data objects like `pandas.DataFrame`.
Module	`stausberg`	Stausberg comorbidity score calculation.
Module	`zuf`	Scoring mode of the ZUF-8 "Fragebogen zur Patientenzufriedenheit".
Module	`_bandas_bins`	Helpers with `pandas`.
Module	`_bandas_misc`	Helpers with `pandas`.
Module	`_bandas_parallelize`	Helpers with `pandas`.
Module	`_buhtzology`	No module docstring; 0/1 variable, 1/1 constant documented
Module	`_datacontainer`	The bandas data container.

From __init__.py:

Function	`break_paragraph`	Break a paragraph as a whole.
Function	`generate_filepath`	Combine basename, path and timestamp to a filepath.
Function	`get_full_application_string`	Build a string representing the application name and its version.
Function	`get_git_repository_info`	Return the current branch and last commit hash.
Function	`how_much_elements_per_piece`	Calculate the number of elements needed in one piece.
Function	`nested_dict_update`	Nested update of dict-like 'org' with dict-like 'update'.
Function	`read_config_data`	Read a config file if present.
Function	`runtime_as_string`	Give the runtime timestamp as a string.
Function	`setup_logging`	Set up the root logger with a console and a file handler.
Function	`shorten_strings_but_unique`	Shorten strings to a given limit but keep them unique to the list.
Variable	`__full_name__`	Full name string including version, branch, commit.
Variable	`__version__`	Version string of the package.
Variable	`config`	Undocumented
Variable	`meta`	Undocumented
Function	`_package_metadata_as_dict`	Get package metadata and return it as a dict.

def break_paragraph(paragraph: list[str], width: int, suffix: str = '', prefix: str = '', sep_line: str = None) -> list[str]:

Break a paragraph as a whole.

Each line of the paragraph is cut into chunks of `width` characters. No word wrapping is done. The lengths of `suffix` and `prefix` are not counted against `width`; they are added on top of it.

# input
[
    'line one with banana',
    'line two with strawberry',
    'line two with ice cream',
]

# output; width = 10
[
    'line one w',
    'line two w',
    'line two w',
    'ith banana',
    'ith strawberry',
    'ith ice cream',
]

Parameters
paragraph:`list[str]`	The paragraph as a list of strings.
width:`int`	Paragraph is cut into chunks of that width.
suffix:`str`	String added to the end of every cut line (e.g. `…`).
prefix:`str`	String inserted at the beginning of every cut line (e.g. `\t`).
sep_line:`str`	String to separate paragraphs with.
Returns
`list[str]`	Undocumented
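
The cutting order shown in the example (first chunk of every line, then the remainders) can be sketched as follows. This is a minimal sketch and not the package's implementation: `break_paragraph_sketch` is a hypothetical name, `suffix`, `prefix` and `sep_line` are left out, and it assumes the remainder of a line is not cut again, as the example output suggests.

```python
def break_paragraph_sketch(paragraph: list[str], width: int) -> list[str]:
    """Cut every line at `width`: all first chunks, then all remainders."""
    # First chunk of every line, without any word wrapping.
    heads = [line[:width] for line in paragraph]
    # Only lines longer than `width` leave a remainder (assumption: the
    # remainder is kept whole, as in the documented example).
    tails = [line[width:] for line in paragraph if len(line) > width]
    return heads + tails


result = break_paragraph_sketch(
    [
        'line one with banana',
        'line two with strawberry',
        'line two with ice cream',
    ],
    width=10,
)
```

With the input from the example above, `result` holds the six strings listed under "output; width = 10".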

def generate_filepath(basename: str, file_suffix: str, folder_path: pathlib.Path = None, age_offset: int = 0) -> pathlib.Path:

Combine basename, path and timestamp to a filepath.

Generates the path of an existing file with a timestamp in its name, based on a name stem and a file suffix. Optionally, a folder other than the current working directory (default) and an offset for the file age (based on the timestamp in the filename) can be set.

The original file names should contain a timestamp (e.g. YYYY-MM-DD) or another numerically or lexicographically sortable element. `folder_path` is searched for all files matching the name pattern `*basename*file_suffix`. The names found are sorted in reverse lexicographical order, and the element at index `age_offset` of that list is returned. By default this is the first element, which is the youngest file, provided the timestamps in the filenames are correct.

Parameters
basename:`str`	Name stem of the files to search for.
file_suffix:`str`	Suffix of the files to search for.
folder_path:`pathlib.Path`	Path to the folder to search in (default: current working dir).
age_offset:`int`	Index of the element to use after reverse lexicographic sorting.
Returns
`pathlib.Path`	File path relative to current working dir.
Raises
`FileNotFoundError`	When no file was found.
`IndexError`	If 'age_offset' is out of range for the list of files found.
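
The search described above (glob for `*basename*file_suffix`, sort in reverse, pick the element at `age_offset`) can be illustrated with the standard library. This is only a sketch under assumptions: `pick_file_sketch` is a hypothetical name, and unlike the documented function it returns the path as found by the glob rather than one relative to the current working dir.

```python
import pathlib
import tempfile


def pick_file_sketch(basename, file_suffix, folder_path=None, age_offset=0):
    """Return the `age_offset`-th youngest file by name."""
    folder = folder_path or pathlib.Path.cwd()
    # Name pattern as described in the text: *basename*file_suffix
    found = sorted(folder.glob(f'*{basename}*{file_suffix}'), reverse=True)
    if not found:
        raise FileNotFoundError(f'No file matching "{basename}" in {folder}')
    # An out-of-range offset raises IndexError here.
    return found[age_offset]


# Demo with two timestamped files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    for name in ('2023-05-01_survey.csv', '2024-01-15_survey.csv'):
        (folder / name).touch()
    newest_name = pick_file_sketch('survey', '.csv', folder).name
    previous_name = pick_file_sketch('survey', '.csv', folder, age_offset=1).name
```

Because the timestamps sort lexicographically, `newest_name` is the 2024 file and `previous_name` the 2023 file.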

def get_full_application_string() -> str:

Build a string representing the application name and its version.

If a git repository is present, information about its current state is also added.

def get_git_repository_info() -> dict:

Return the current branch and last commit hash.

A special case is when the script runs on a Read the Docs instance. In that case the branch name is extracted from environment variables.

Credits: https://stackoverflow.com/a/51224861/4865723

def how_much_elements_per_piece(total_elements_n: int, pieces_n: int = None, min_max_per_piece: int = None, min_pieces_n: int = None) -> int:

Calculate the number of elements needed in one piece.

Calculate the number of elements in one piece when a bigger list of elements is cut into pieces according to rules. If 'pieces_n' is given, all other arguments are ignored and the result is the number of elements per piece you have to cut from your list to get the number of pieces specified by 'pieces_n'.

If 'pieces_n' is None, the number of elements per piece is calculated based on 'min_max_per_piece'. If 'min_pieces_n' is also given, the min/max rule can be overridden so that the number of elements per piece results in 'min_pieces_n' pieces.

The rules form a hierarchy:

1. If 'pieces_n' is given, the rest of the arguments are ignored.
2. If 'pieces_n' is None, 'min_max_per_piece' is taken into account. If present, 'min_pieces_n' is also taken into account.

Be aware that the last piece in your list can have fewer elements than the resulting number of elements per piece.

Parameters
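total_elements_n:`int`	Total number of elements in the list to cut.
pieces_n:`int`	Number of pieces to cut the list into; if given, all other rules are ignored.
min_max_per_piece:`int`	Number of elements per piece used when 'pieces_n' is None.
min_pieces_n:`int`	Number of pieces that should result at least; can override 'min_max_per_piece'.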
Returns
`int`	Number of elements one piece should have to fulfill the rules.
Raises
`ValueError`	No rules specified with arguments.
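
The rule hierarchy can be made concrete with a small sketch. It is not the package's code: `elements_per_piece_sketch` is a hypothetical name, ceiling division is an assumption, and so is the exact way 'min_pieces_n' overrides 'min_max_per_piece' (here: only when the latter would yield too few pieces).

```python
import math


def elements_per_piece_sketch(total_elements_n, pieces_n=None,
                              min_max_per_piece=None, min_pieces_n=None):
    """Number of elements per piece, following the rule hierarchy."""
    # Rule 1: an explicit number of pieces wins over everything else.
    if pieces_n is not None:
        return math.ceil(total_elements_n / pieces_n)

    # Assumption: without 'pieces_n' a per-piece size is required.
    if min_max_per_piece is None:
        raise ValueError('No rules specified with arguments.')

    # Rule 2: start from the requested per-piece size ...
    per_piece = min_max_per_piece
    # ... and shrink it if that would yield fewer than 'min_pieces_n' pieces.
    if min_pieces_n is not None:
        if math.ceil(total_elements_n / per_piece) < min_pieces_n:
            per_piece = math.ceil(total_elements_n / min_pieces_n)
    return per_piece


# Cutting ten elements into three pieces: the last piece is smaller.
items = list(range(10))
size = elements_per_piece_sketch(len(items), pieces_n=3)
pieces = [items[i:i + size] for i in range(0, len(items), size)]
piece_sizes = [len(piece) for piece in pieces]
```

Here `size` is 4 and the three pieces hold 4, 4 and 2 elements, which shows why the last piece can be shorter than the calculated size.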

def nested_dict_update(org: dict, update: dict) -> dict:

Nested update of dict-like 'org' with dict-like 'update'.

See "Deep merge dictionaries of dictionaries in Python" at StackOverflow: https://stackoverflow.com/q/7204805/4865723

Credits for the current solution: https://stackoverflow.com/a/52319248/4865723
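
For readers unfamiliar with nested updates, a common recursive pattern looks like the sketch below. It is a generic illustration with the hypothetical name `deep_update_sketch`, not necessarily the credited solution or the package's code.

```python
from collections.abc import Mapping


def deep_update_sketch(org: dict, update: Mapping) -> dict:
    """Update 'org' in place; nested mappings are merged, not replaced."""
    for key, value in update.items():
        if isinstance(value, Mapping) and isinstance(org.get(key), Mapping):
            # Both sides are dict-like: descend instead of overwriting.
            deep_update_sketch(org[key], value)
        else:
            org[key] = value
    return org


merged = deep_update_sketch(
    {'a': {'x': 1, 'y': 2}, 'b': 3},
    {'a': {'y': 20, 'z': 30}},
)
```

A plain `dict.update()` would replace the whole `'a'` entry and lose `'x'`; the nested variant keeps it, so `merged` is `{'a': {'x': 1, 'y': 20, 'z': 30}, 'b': 3}`.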

def read_config_data(path: pathlib.Path = None) -> dict:

Read a config file if present.

Read a config file if present, combine its data with the default config values and return the result.

# Filename is buhtzology.toml
[bandas]
decrease_workers_by=0

Parameters
path:`pathlib.Path`	The path to look for a file named `buhtzology.toml`.
Returns
`dict`	A dictionary with config data.

def runtime_as_string(start_time: datetime.datetime) -> str:

Give the runtime timestamp as a string.

The runtime is calculated from the difference between current time and 'start_time'. Example results are "20 seconds" or "4.5 minutes".

Parameters
start_time:`datetime.datetime`	Timestamp of start.
Returns
`str`	Undocumented
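
The calculation can be sketched as below. The name `runtime_sketch`, the extra `now` argument (added so the result is reproducible) and the 60 second threshold are assumptions; only the two example strings come from the description above.

```python
import datetime


def runtime_sketch(start_time, now=None):
    """Format the time elapsed since 'start_time' as a short string."""
    if now is None:
        now = datetime.datetime.now()
    seconds = (now - start_time).total_seconds()
    # Assumed threshold: below one minute report whole seconds.
    if seconds < 60:
        return f'{seconds:.0f} seconds'
    return f'{seconds / 60:.1f} minutes'


start = datetime.datetime(2024, 1, 1, 12, 0, 0)
short_run = runtime_sketch(start, start + datetime.timedelta(seconds=20))
long_run = runtime_sketch(start, start + datetime.timedelta(seconds=270))
```

With these inputs `short_run` is "20 seconds" and `long_run` is "4.5 minutes", matching the example results mentioned above.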

def setup_logging(*, log_directory: pathlib.Path = None, console_level: int = logging.INFO, console_format_str: str = '%(log_color)s[%(levelname)s]%(reset)s %(message)s', file_level: int = logging.DEBUG, file_format_str: str = '%(asctime)s - %(name)s - %(levelname)s - %(message)s', file_backup_count: int = 20, exceptional_loggers: dict[str, int] = None):

Set up the root logger with a console and a file handler.

The file handler is rotated for each session. The file name is based on sys.argv[0] (the filename of the currently running Python script). With exceptional_loggers it is possible to set levels for specific loggers, e.g. matplotlib.font_manager or PIL, which are known for annoying debug messages.

Colorized console output is activated by default if the package colorlog is available.

Parameters
log_directory:`pathlib.Path`	Path to log file directory.
console_level:`int`	Log level for console (stdout) handler.
console_format_str:`str`	Format of the console log messages.
file_level:`int`	Log level for file handler. Deactivate with `None`.
file_format_str:`str`	Format of the file log messages.
file_backup_count:`int`	Number of log files to keep.
exceptional_loggers:`dict[str, int]`	Levels for specific loggers.
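
The general idea (root logger with two handlers plus per-logger levels) can be sketched with the standard `logging` module alone. This is a simplified illustration under assumptions: `setup_logging_sketch` is a hypothetical name, and file rotation, backup counts and `colorlog` coloring are left out.

```python
import logging
import pathlib
import sys


def setup_logging_sketch(*, log_directory=None, console_level=logging.INFO,
                         file_level=logging.DEBUG, exceptional_loggers=None):
    """Attach a console and an optional file handler to the root logger."""
    root = logging.getLogger()
    # Let the handlers decide what to show.
    root.setLevel(logging.DEBUG)

    console = logging.StreamHandler(sys.stdout)
    console.setLevel(console_level)
    console.setFormatter(logging.Formatter('[%(levelname)s] %(message)s'))
    root.addHandler(console)

    if file_level is not None:  # None deactivates the file handler
        directory = log_directory or pathlib.Path.cwd()
        # File name derived from the running script, as described above.
        stem = pathlib.Path(sys.argv[0]).stem or 'session'
        file_handler = logging.FileHandler(
            directory / f'{stem}.log', encoding='utf-8')
        file_handler.setLevel(file_level)
        file_handler.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
        root.addHandler(file_handler)

    # Quieten loggers known for noisy debug output.
    for name, level in (exceptional_loggers or {}).items():
        logging.getLogger(name).setLevel(level)
```

A call such as `setup_logging_sketch(exceptional_loggers={'PIL': logging.WARNING})` would keep debug messages from `PIL` out of both handlers while other loggers still write debug output to the file.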

def shorten_strings_but_unique(string_list: Iterable[str], limit: int = 25, omit_string: str = '…') -> Iterable[str]:

Shorten strings to a given limit but keep them unique to the list.

Parameters
string_list:`Iterable[str]`	List of strings to shorten.
limit:`int`	Number of characters each string is shortened to.
omit_string:`str`	String used to mark the omitted part.
Returns
`Iterable[str]`	List of shortened strings.

__full_name__

Full name string including version, branch, commit.

__version__

Version string of the package.

config

Undocumented

meta

Undocumented

def _package_metadata_as_dict(exclude_keys: list = None) -> dict:

Get package metadata and return it as a dict.