Names API

This is the API reference for all functions designed to be used on names. You can find usage examples here.

hh.format_name

format_name(text: str, errors: str = 'raise') -> str | None

Cleans the formatting of names: strips extra whitespace, converts to title case (with exceptions for names like McDonald and O'Reilly), and tidies any spaces around hyphens.

Parameters:

Name Type Description Default
text str

The name you wish to clean.

required
errors str, optional

How errors are handled. 'raise' (the default) raises all errors, 'ignore' suppresses the error and returns the original value, and 'coerce' returns None.

'raise'

Raises:

Type Description
TypeError

Raised if text is not a string.

Returns:

Type Description
str | None

The cleaned name. On error, the original value if errors='ignore' or None if errors='coerce'.
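The cleaning steps and the three errors modes described above can be sketched in plain Python. This is a hypothetical stand-in, not the library's implementation: the name format_name_sketch, the regular expressions, and the handling of the "Mc" prefix are all assumptions made for illustration.

```python
import re


def format_name_sketch(text, errors="raise"):
    """Hypothetical stand-in for hh.format_name, not the library's code."""
    if not isinstance(text, str):
        if errors == "raise":
            raise TypeError(f"text must be a str, got {type(text).__name__}")
        if errors == "ignore":
            return text  # hand the original value back unchanged
        return None  # treated as 'coerce'

    # Collapse runs of whitespace and trim both ends.
    cleaned = " ".join(text.split())
    # Remove any spaces around hyphens: "anne - marie" -> "anne-marie".
    cleaned = re.sub(r"\s*-\s*", "-", cleaned)
    # Capitalise each run of letters, so the letter after an apostrophe
    # or hyphen is capitalised too (O'Reilly, Anne-Marie).
    cleaned = re.sub(r"[^\W\d_]+", lambda m: m.group(0).capitalize(), cleaned)
    # Assumed exception rule: capitalise the letter that follows "Mc".
    cleaned = re.sub(r"\bMc([a-z])", lambda m: "Mc" + m.group(1).upper(), cleaned)
    return cleaned


print(format_name_sketch("  ronald   mcdonald "))  # Ronald McDonald
print(format_name_sketch("SIOBHAN o'reilly"))      # Siobhan O'Reilly
print(format_name_sketch("anne - marie  smith"))   # Anne-Marie Smith
print(format_name_sketch(42, errors="coerce"))     # None
```

The real function may treat more exceptions than this sketch does, so compare against hh.format_name on your own data before relying on any particular output.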

hh.create_full_name

create_full_name(
    first_name: str | Series,
    last_name: str | Series,
    middle_name: str | Series = "",
) -> str | pd.Series

Joins strings or pandas Series (for example, DataFrame columns) into a single 'Full Name' string or Series of strings. Useful as a preparation step if you are going to fuzzy match names.

Parameters:

Name Type Description Default
first_name str | Series

First name.

required
last_name str | Series

Last name.

required
middle_name str | Series, optional

Middle name. Defaults to a blank string or blank pd.Series.

''

Returns:

Type Description
str | Series

One string or Series of strings with all names joined.
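For plain strings, the joining behaviour can be sketched as below. This is a hypothetical, string-only stand-in: the name create_full_name_sketch, the first/middle/last ordering, and the skipping of blank parts are assumptions, and the Series case that the real function supports is not covered.

```python
def create_full_name_sketch(first_name, last_name, middle_name=""):
    """Hypothetical string-only stand-in for hh.create_full_name."""
    # Assumed order: first, middle, last. Blank parts are skipped so a
    # missing middle name does not leave a double space.
    parts = (first_name, middle_name, last_name)
    return " ".join(part.strip() for part in parts if part.strip())


print(create_full_name_sketch("Jane", "Doe"))        # Jane Doe
print(create_full_name_sketch("Jane", "Doe", "Q."))  # Jane Q. Doe
```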

hh.find_numbers_in_text

find_numbers_in_text(
    text: str, errors: str = "raise", convert_to_string: bool = False
) -> bool | str | None

Checks whether a string contains one or more digits (0-9). The digits do not have to be consecutive.

Parameters:

Name Type Description Default
text str

The text to check for numbers.

required
errors str, optional

How errors are handled. 'raise' (the default) raises all errors, 'ignore' suppresses the error and returns the original value, and 'coerce' returns None.

'raise'
convert_to_string bool, optional

If True, the function tries to convert text to a string before checking it. Defaults to False.

False

Raises:

Type Description
TypeError

Raised if text is not a string.

Returns:

Type Description
bool | str | None

True if the string contains one or more digits (0-9), False if it contains none. On error, the original value if errors='ignore' or None if errors='coerce'.
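The interaction between errors and convert_to_string can be sketched as follows. This is a hypothetical stand-in, not the library's code: the name find_numbers_sketch, the use of str() for conversion, and the order in which conversion and type checking happen are assumptions.

```python
import re


def find_numbers_sketch(text, errors="raise", convert_to_string=False):
    """Hypothetical stand-in for hh.find_numbers_in_text."""
    # Assumed: conversion is attempted before the type check.
    if convert_to_string and not isinstance(text, str):
        text = str(text)
    if not isinstance(text, str):
        if errors == "raise":
            raise TypeError(f"text must be a str, got {type(text).__name__}")
        if errors == "ignore":
            return text  # original value, unchanged
        return None  # treated as 'coerce'
    # True as soon as any ASCII digit appears anywhere in the text.
    return re.search(r"[0-9]", text) is not None


print(find_numbers_sketch("Jane Doe 43"))                # True
print(find_numbers_sketch("Jane Doe"))                   # False
print(find_numbers_sketch(43, convert_to_string=True))   # True
print(find_numbers_sketch(43, errors="coerce"))          # None
```

The last two calls show why the documented return type is bool | str | None rather than just bool: the non-boolean results only appear on the error paths.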

hh.remove_numbers

remove_numbers(
    text: str, errors: str = "raise", convert_to_string: bool = False
) -> str | None

Removes any digits (0-9) from a string (text). The digits do not have to be consecutive.

Parameters:

Name Type Description Default
text str

The string you want to remove numbers from, e.g. 'Jane Doe 43'.

required
errors str, optional

How errors are handled. 'raise' (the default) raises all errors, 'ignore' suppresses the error and returns the original value, and 'coerce' returns None.

'raise'
convert_to_string bool, optional

If True, the function tries to convert text to a string before removing numbers. Defaults to False.

False

Raises:

Type Description
TypeError

Raised if text is not a string.

Returns:

Type Description
str | None

Text with numbers removed.
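A minimal sketch of the digit removal, assuming a regular expression is used and that surrounding whitespace is left untouched. The name remove_numbers_sketch is hypothetical, and the errors and convert_to_string parameters are omitted for brevity.

```python
import re


def remove_numbers_sketch(text):
    """Hypothetical stand-in for hh.remove_numbers (str input only)."""
    # Delete every ASCII digit; all other characters, spaces included,
    # are kept exactly as they were.
    return re.sub(r"[0-9]", "", text)


print(repr(remove_numbers_sketch("Jane Doe 43")))  # 'Jane Doe '
print(repr(remove_numbers_sketch("J4ne D0e")))     # 'Jne De'
```

Note that this sketch leaves a trailing space behind in the first example. Whether the real function also trims whitespace is not stated here, so it may be worth passing the result through hh.format_name afterwards.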

hh.remove_diacritics

remove_diacritics(
    input_text: str, errors: str = "raise"
) -> str | None

Removes diacritics (accent marks) from text. Uses Python's built-in unicodedata module and normalises to NFKD before removal.

Parameters:

Name Type Description Default
input_text str

The text you want to remove diacritics from.

required
errors str, optional

How errors are handled. 'raise' (the default) raises all errors, 'ignore' suppresses the error and returns the original value, and 'coerce' returns None.

'raise'

Raises:

Type Description
TypeError

Raised if input_text is not a string.

Returns:

Type Description
str | None

Text with accents removed e.g. 'Chloë' -> 'Chloe'.
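The NFKD approach mentioned above can be sketched with the standard library alone. The name remove_diacritics_sketch is hypothetical, the errors parameter is omitted, and dropping combining characters after normalisation is an assumption about how the removal step works.

```python
import unicodedata


def remove_diacritics_sketch(input_text):
    """Hypothetical stand-in for hh.remove_diacritics (str input only)."""
    # NFKD splits an accented letter into its base letter followed by
    # one or more combining marks, e.g. 'ë' -> 'e' + U+0308.
    decomposed = unicodedata.normalize("NFKD", input_text)
    # Keep everything that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_diacritics_sketch("Chloë"))       # Chloe
print(remove_diacritics_sketch("José Núñez"))  # Jose Nunez
print(remove_diacritics_sketch("Søren"))       # Søren
```

The last example shows a limit of this technique: letters such as 'ø' have no Unicode decomposition into a base letter plus a mark, so a normalisation-based sketch leaves them unchanged.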

hh.remove_punctuation

remove_punctuation(
    text: str, punctuation: str = PUNCTUATION, errors: str = "raise"
) -> str | None

Removes all punctuation except for hyphens and apostrophes from text. Useful for cleaning names.

Parameters:

Name Type Description Default
text str

Text you wish to remove punctuation from.

required
punctuation str, optional

A single string containing every character to remove. The default contains all punctuation except hyphens and apostrophes; pass your own string if you want to remove a different set of characters. The default includes the following chars: !@#£$%^&*()_=+`~,.<>/?;:"|[]

PUNCTUATION
errors str, optional

How errors are handled. 'raise' (the default) raises all errors, 'ignore' suppresses the error and returns the original value, and 'coerce' returns None.

'raise'

Raises:

Type Description
TypeError

Raised if text is not a string.

Returns:

Type Description
str | None

Text with all punctuation except hyphens and apostrophes removed, e.g. 'Jane! Doe.' -> 'Jane Doe'.
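The effect of the punctuation parameter can be sketched with str.translate. This is a hypothetical stand-in: the names remove_punctuation_sketch and PUNCTUATION_SKETCH are made up, the constant simply copies the default characters listed above, and the errors parameter is omitted.

```python
# Copy of the default characters listed in the parameter description.
# Hyphens and apostrophes are deliberately absent.
PUNCTUATION_SKETCH = "!@#£$%^&*()_=+`~,.<>/?;:\"|[]"


def remove_punctuation_sketch(text, punctuation=PUNCTUATION_SKETCH):
    """Hypothetical stand-in for hh.remove_punctuation (str input only)."""
    # Build a table that maps every listed character to None (delete it).
    table = str.maketrans("", "", punctuation)
    return text.translate(table)


print(remove_punctuation_sketch("Jane! Doe."))           # Jane Doe
print(remove_punctuation_sketch("O'Reilly-Smith, Jr."))  # O'Reilly-Smith Jr
# Overriding the set: remove only hyphens and apostrophes instead.
print(remove_punctuation_sketch("O'Reilly-Smith", punctuation="-'"))  # OReillySmith
```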