Names API
This is the API reference for all functions designed to be used on names. You can find usage examples here.
hh.format_name
format_name(text: str, errors: str = 'raise') -> str | None
Cleans the formatting of names. Strips extra whitespace, converts to title case (with exceptions for names like McDonald and O'Reilly), and tidies any spaces around hyphens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The name you wish to clean. | required |
| errors | str, optional | How errors are handled: 'raise' raises all errors, 'ignore' ignores errors and returns the original value, 'coerce' returns None. | 'raise' |
Raises:
| Type | Description |
|---|---|
| TypeError | Raised if text is not a string. |
Returns:
| Type | Description |
|---|---|
| str \| None | Cleaned text. |
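The cleaning steps described for format_name can be approximated with the standard library. The sketch below is only an illustration of those steps, not the library's implementation: the helper name `tidy_name` and the regular expressions are assumptions, and it skips the errors handling entirely.

```python
import re


def tidy_name(text: str) -> str:
    """Illustrative stand-in for the cleaning steps described above (hypothetical helper)."""
    # Collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove spaces on either side of a hyphen: "Smith - Jones" -> "Smith-Jones".
    text = re.sub(r"\s*-\s*", "-", text)
    # str.title() capitalises after apostrophes and hyphens, so o'reilly -> O'Reilly.
    text = text.title()
    # Restore the inner capital in "Mc" surnames: Mcdonald -> McDonald.
    return re.sub(r"\bMc([a-z])", lambda m: "Mc" + m.group(1).upper(), text)


print(tidy_name("  jane   mcdonald - o'reilly "))  # Jane McDonald-O'Reilly
```

Note that plain `str.title()` also capitalises after every apostrophe (for example a possessive 's), which is one reason a dedicated function with name-specific exceptions is useful.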
hh.create_full_name
create_full_name(
first_name: str | Series,
last_name: str | Series,
middle_name: str | Series = "",
) -> str | pd.Series
Joins strings or pandas DataFrame columns into a 'Full Name' string or column of strings. Useful if you are going to be fuzzy matching names.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| first_name | str \| Series | First name. | required |
| last_name | str \| Series | Last name. | required |
| middle_name | str \| Series, optional | Middle name. Defaults to a blank string or blank pd.Series. | '' |
Returns:
| Type | Description |
|---|---|
| str \| Series | One string or Series of strings with all names joined. |
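As a rough picture of what joining name parts involves, here is a minimal sketch for the plain-string case only. The helper name `join_name_parts`, the first/middle/last ordering, and the choice to skip blank parts are all assumptions made for illustration; the pandas Series case is not covered.

```python
def join_name_parts(first_name: str, last_name: str, middle_name: str = "") -> str:
    """Join name parts with single spaces (hypothetical helper, strings only)."""
    # Assumed order: first, middle, last. Blank parts are dropped so that an
    # omitted middle name does not leave a double space in the result.
    parts = [first_name, middle_name, last_name]
    return " ".join(part.strip() for part in parts if part and part.strip())


print(join_name_parts("Jane", "Doe"))          # Jane Doe
print(join_name_parts("Jane", "Doe", "Anne"))  # Jane Anne Doe
```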
hh.find_numbers_in_text
find_numbers_in_text(
text: str, errors: str = "raise", convert_to_string: bool = False
) -> bool | str | None
Checks whether a string contains one or more digits (0-9). The digits do not have to be consecutive.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The text to check for numbers. | required |
| errors | str, optional | How errors are handled: 'raise' raises all errors, 'ignore' ignores errors and returns the original value, 'coerce' returns None. | 'raise' |
| convert_to_string | bool, optional | Whether to convert text to a string first, if possible. | False |
Raises:
| Type | Description |
|---|---|
| TypeError | Raised if text is not a string. |
Returns:
| Type | Description |
|---|---|
| bool \| str \| None | True if the string contains one or more digits (0-9), False if no digits are present. |
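The check itself amounts to asking whether any character is one of 0-9. A minimal sketch, assuming only the happy path with a string input (the helper name `contains_digit` is hypothetical and is not part of the library):

```python
def contains_digit(text: str) -> bool:
    """Return True if any character in text is an ASCII digit (hypothetical helper)."""
    # Compare against the literal digits 0-9 rather than str.isdigit(), which
    # also accepts other Unicode digit characters.
    return any(ch in "0123456789" for ch in text)


print(contains_digit("J4ne D0e"))  # True
print(contains_digit("Jane Doe"))  # False
```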
hh.remove_numbers
remove_numbers(
text: str, errors: str = "raise", convert_to_string: bool = False
) -> str | None
Removes one or more numbers from a string (text). Numbers do not have to be consecutive.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The string you want to remove numbers from, e.g. 'Jane Doe 43'. | required |
| errors | str, optional | How errors are handled: 'raise' raises all errors, 'ignore' ignores errors and returns the original value, 'coerce' returns None. | 'raise' |
| convert_to_string | bool, optional | Whether to convert text to a string first, if possible. | False |
Raises:
| Type | Description |
|---|---|
| TypeError | Raised if text is not a string. |
Returns:
| Type | Description |
|---|---|
| str \| None | Text with numbers removed. |
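To show how the errors and convert_to_string options described above could fit together, here is a self-contained sketch. It is not the library's code: the helper name `strip_digits`, the order of the checks, and the ValueError for an unknown errors value are assumptions.

```python
import re


def strip_digits(text, errors="raise", convert_to_string=False):
    """Remove the digits 0-9 from text (hypothetical stand-in, not the library's code)."""
    if errors not in ("raise", "ignore", "coerce"):
        # Assumption: an unrecognised mode is rejected up front.
        raise ValueError("errors must be 'raise', 'ignore' or 'coerce'")
    if convert_to_string:
        # Assumption: conversion happens before the type check.
        text = str(text)
    if not isinstance(text, str):
        if errors == "raise":
            raise TypeError("text must be a string")
        if errors == "ignore":
            return text  # hand back the original value untouched
        return None  # 'coerce'
    return re.sub(r"[0-9]", "", text)


print(repr(strip_digits("Jane Doe 43")))       # 'Jane Doe '
print(strip_digits(43, errors="coerce"))       # None
print(strip_digits(43, errors="ignore"))       # 43
```

In this sketch the space that preceded the digits is left in place, so a whitespace clean-up step would still be needed afterwards.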
hh.remove_diacritics
remove_diacritics(
input_text: str, errors: str = "raise"
) -> str | None
Removes diacritics (accents) from text. Uses Python's built-in unicodedata library and normalises to NFKD before removal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_text | str | The text you want to remove diacritics from. | required |
| errors | str, optional | How errors are handled: 'raise' raises all errors, 'ignore' ignores errors and returns the original value, 'coerce' returns None. | 'raise' |
Raises:
| Type | Description |
|---|---|
| TypeError | Raised if input_text is not a string. |
Returns:
| Type | Description |
|---|---|
| str \| None | Text with accents removed, e.g. 'Chloë' -> 'Chloe'. |
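The NFKD approach mentioned above works by decomposing each accented letter into a base letter plus combining marks, then dropping the marks. The sketch below shows that general technique with the standard unicodedata module; the helper name `strip_diacritics` is hypothetical and the library's own code may differ in detail.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after NFKD decomposition (hypothetical helper)."""
    # NFKD splits 'ë' into 'e' followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Chloë"))       # Chloe
print(strip_diacritics("José Núñez"))  # Jose Nunez
```

Letters that have no decomposition, such as 'ø' or 'Ł', pass through this sketch unchanged.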
hh.remove_punctuation
remove_punctuation(
text: str, punctuation: str = PUNCTUATION, errors: str = "raise"
) -> str | None
Removes all punctuation except for hyphens and apostrophes from text. Useful for cleaning names.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text you wish to remove punctuation from. | required |
| punctuation | str, optional | A single string containing every character to remove. The default covers all punctuation except hyphens and apostrophes: !@#£$%^&*()_=+`~,.<>/?;:"\|[] Override it with your own string if you want a different set of characters removed. | PUNCTUATION |
| errors | str, optional | How errors are handled: 'raise' raises all errors, 'ignore' ignores errors and returns the original value, 'coerce' returns None. | 'raise' |
Raises:
| Type | Description |
|---|---|
| TypeError | Raised if text is not a string. |
Returns:
| Type | Description |
|---|---|
| str \| None | Text with all punctuation except hyphens and apostrophes removed, e.g. 'Jane! Doe.' -> 'Jane Doe'. |
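Removing a fixed set of characters like this is commonly done with a translation table. The sketch below illustrates that technique using the default character list quoted in the parameters table; the names `DEFAULT_PUNCTUATION` and `strip_punctuation` are hypothetical, and the library may implement the removal differently.

```python
# Character list copied from the documented default; hyphens and apostrophes
# are deliberately absent so they survive the clean-up.
DEFAULT_PUNCTUATION = '!@#£$%^&*()_=+`~,.<>/?;:"|[]'


def strip_punctuation(text: str, punctuation: str = DEFAULT_PUNCTUATION) -> str:
    """Delete every character listed in punctuation from text (hypothetical helper)."""
    # The third argument of str.maketrans lists characters to delete.
    return text.translate(str.maketrans("", "", punctuation))


print(strip_punctuation("Jane! Doe."))             # Jane Doe
print(strip_punctuation("Mary-Jane O'Neil, Jr."))  # Mary-Jane O'Neil Jr
print(strip_punctuation("Mary-Jane", "-"))         # MaryJane
```

The last call shows the override: passing your own string replaces the default set rather than adding to it.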