File Processing API

This is the API reference for all functions designed to process files. You can find usage examples here.

hh.get_excel_filepaths_in_folder

get_excel_filepaths_in_folder(
    input_dir: str, print_to_terminal: bool = False
) -> list[str]

Returns a list of filepaths to Excel files (with the extension .xlsx or .xls) in a given folder. Note: Only searches the top-level of input_dir; does not recursively search subdirectories.

Parameters:

Name Type Description Default
input_dir str

The directory you want to get the filepaths from.

required
print_to_terminal bool

If True, prints messages about the file processing to the terminal. Defaults to False.

False

Raises:

Type Description
FileNotFoundError

Raised if the directory does not exist.

Returns:

Type Description
list[str]

A list of filepaths to the Excel files in the specified folder. Returns an empty list if the folder contains no Excel files.
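
The behaviour documented above (top-level search only, .xlsx and .xls extensions, FileNotFoundError for a missing directory, an empty list when nothing matches) can be illustrated with a small standard-library sketch. This is not the library's implementation: list_excel_files is a hypothetical stand-in, and the case-insensitive extension match and sorted output are assumptions made for the example.

```python
from pathlib import Path

# Extensions named in the description above.
EXCEL_SUFFIXES = {".xlsx", ".xls"}


def list_excel_files(input_dir: str) -> list[str]:
    """Hypothetical stand-in that mirrors the documented behaviour."""
    folder = Path(input_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Directory does not exist: {input_dir}")
    # iterdir() lists only the top level, so subdirectories are not searched.
    # Assumption: extensions are matched case-insensitively and the result is sorted.
    return sorted(
        str(path)
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in EXCEL_SUFFIXES
    )
```

A caller would typically loop over the returned list and open each file in turn; an empty list means there is nothing to process rather than an error.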

hh.convert_col_snake_case

convert_col_snake_case(df: DataFrame) -> pd.DataFrame

Converts the column names of a DataFrame to snake case. This function also checks for duplicate columns before and after conversion, and raises a ColumnsNotUnique error if duplicates are found.

Parameters:

Name Type Description Default
df DataFrame

The DataFrame whose column names you want to convert to snake case.

required

Raises:

Type Description
TypeError

Raised if the input is not a DataFrame.

ColumnsNotUnique

Raised if the original DataFrame has duplicate columns or if the snake-case conversion results in duplicate column names. The error message specifies which columns are duplicated.

Returns:

Type Description
DataFrame

A new DataFrame with the same data but with column names converted to snake case.
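
To show why duplicates are checked both before and after conversion, here is a minimal sketch that works on a plain list of column names rather than a DataFrame. Everything in it is hypothetical: the ColumnsNotUnique class is a stand-in for the library's error of the same name, and the conversion rule in to_snake_case is an assumption, since the exact rule the library applies is not stated here.

```python
import re
from collections import Counter


class ColumnsNotUnique(Exception):
    """Hypothetical stand-in for the library's error of the same name."""


def to_snake_case(name: str) -> str:
    # Assumed rule: split camelCase boundaries, then collapse every run of
    # non-alphanumeric characters into a single underscore.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def check_unique(columns: list[str]) -> None:
    duplicates = [col for col, count in Counter(columns).items() if count > 1]
    if duplicates:
        raise ColumnsNotUnique(f"Duplicate column names: {duplicates}")


def snake_case_columns(columns: list[str]) -> list[str]:
    check_unique(columns)  # duplicates already present in the input
    converted = [to_snake_case(col) for col in columns]
    check_unique(converted)  # duplicates created by the conversion itself
    return converted
```

The second check matters because distinct names can collide after conversion: under the assumed rule, "First Name" and "first_name" both become "first_name". With pandas, the converted names could then be applied to a copy of the frame, for example through DataFrame.set_axis with axis=1.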