Updates API

This is the API reference for all functions for finding data which needs updating. You can find usage examples here.

hh.get_updates

get_updates(df: DataFrame, new_col: str, heat_col: str) -> pd.Series

Compares two DataFrame columns row by row and returns the value from new_col wherever it is different to heat_col. This can help you identify where new data differs from existing HEAT records and therefore needs updating. It returns a new column which only contains data that needs to be updated, so it can be copied to the HEAT import template. The original DataFrame is not modified; a copy is created internally.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | DataFrame where the columns are located. | required |
| new_col | str | The column which contains 'new' data. This would be the data you want to update on HEAT if it differs from values in heat_col. | required |
| heat_col | str | The corresponding column in your HEAT export e.g. if you are checking if any postcodes need updating, both new_col and heat_col should contain postcodes. | required |

Raises:

| Type | Description |
| --- | --- |
| TypeError | Raised if df is not a DataFrame, or if new_col or heat_col is not a string (text). |
| ColumnDoesNotExistError | Raised if either new_col or heat_col is not in the df columns. |

Returns:

| Type | Description |
| --- | --- |
| Series | A pandas Series (DataFrame column) where rows contain the value from new_col if this is different to heat_col. |
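
The comparison described above can be illustrated with a minimal pandas sketch. This is not the library's implementation: sketch_get_updates is a hypothetical stand-in, the column names are invented for the example, and leaving unchanged rows as missing values is an assumption about the output. The commented-out call at the end also assumes the package is imported under the alias hh.

```python
import pandas as pd


def sketch_get_updates(df: pd.DataFrame, new_col: str, heat_col: str) -> pd.Series:
    """Hypothetical stand-in for hh.get_updates; not the library's code."""
    work = df.copy()  # work on a copy so the caller's DataFrame is untouched
    new, old = work[new_col], work[heat_col]
    # A row needs updating when the values differ and are not both missing.
    differs = new.ne(old) & ~(new.isna() & old.isna())
    # Keep the new value where it differs; other rows become missing (assumed).
    return new.where(differs)


# Invented example data: only the second postcode has changed.
df = pd.DataFrame(
    {
        "heat_postcode": ["AB1 2CD", "EF3 4GH", "IJ5 6KL"],
        "new_postcode": ["AB1 2CD", "EF3 9ZZ", "IJ5 6KL"],
    }
)

updates = sketch_get_updates(df, "new_postcode", "heat_postcode")
print(updates)

# With the real library the equivalent call would presumably be:
# updates = hh.get_updates(df, "new_postcode", "heat_postcode")
```

Only the second row carries a value in the result, which is the one that would be copied to the import template.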

hh.get_contextual_updates

get_contextual_updates(
    df: DataFrame,
    new_col: str,
    heat_col: str,
    bad_values: Iterable[str],
) -> pd.Series

This function is similar to get_updates, except you can also pass a list, set, tuple or other Iterable of 'bad' values you do not want to override HEAT data. This can be useful if you want to ensure that data like 'Not available' or 'Unknown' does not overwrite previously collected 'good' values in the contextual data columns. The original DataFrame is not modified; a copy is created internally.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | DataFrame where the columns are located. | required |
| new_col | str | The column which contains 'new' data. This would be the data you want to update on HEAT if it differs from values in heat_col. | required |
| heat_col | str | The corresponding column in your HEAT export e.g. if you are checking if any postcodes need updating, both new_col and heat_col should contain postcodes. | required |
| bad_values | Iterable[str] | A list, tuple, or set (or other Iterable) of values which should not overwrite 'good' data in your HEAT records. For example, if your new data contains 'Not available' or 'Unknown' but your HEAT records had values in these columns, you could pass ['Not available', 'Unknown'] to this variable, and these will not overwrite 'good' values in the heat_col. | required |

Raises:

| Type | Description |
| --- | --- |
| TypeError | Raised if df is not a DataFrame, if new_col or heat_col is not a string (text), or if bad_values is not an Iterable such as a list, tuple, or set. |
| ColumnDoesNotExistError | Raised if either new_col or heat_col is not in the df columns. |

Returns:

| Type | Description |
| --- | --- |
| Series | A pandas Series (DataFrame column) where rows contain the value from new_col if this is different to heat_col and is not excluded by bad_values. |
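
The effect of bad_values can be sketched in the same way. Again, this is a hypothetical stand-in rather than the library's code: sketch_get_contextual_updates and the column names are invented for illustration, and the sketch assumes that a 'bad' new value is never treated as an update. The real function may handle some cases differently, for example when the HEAT value is empty.

```python
from collections.abc import Iterable

import pandas as pd


def sketch_get_contextual_updates(
    df: pd.DataFrame, new_col: str, heat_col: str, bad_values: Iterable[str]
) -> pd.Series:
    """Hypothetical stand-in for hh.get_contextual_updates; not the library's code."""
    work = df.copy()  # work on a copy so the caller's DataFrame is untouched
    new, old = work[new_col], work[heat_col]
    # A row is a candidate when the values differ and are not both missing.
    differs = new.ne(old) & ~(new.isna() & old.isna())
    # Assumption: a 'bad' new value never counts as an update.
    is_bad = new.isin(list(bad_values))
    return new.where(differs & ~is_bad)


# Invented example data: row 0 would downgrade 'Yes' to 'Unknown'.
df = pd.DataFrame(
    {
        "heat_status": ["Yes", "No", "Yes"],
        "new_status": ["Unknown", "Yes", "Yes"],
    }
)

updates = sketch_get_contextual_updates(
    df, "new_status", "heat_status", bad_values={"Not available", "Unknown"}
)
print(updates)
```

In this sketch the first row is left out even though it differs, because 'Unknown' is in bad_values; the second row is a genuine change and is kept.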