Updates API
This is the API reference for the functions that find data which needs updating. You can find usage examples here.
hh.get_updates
get_updates(df: DataFrame, new_col: str, heat_col: str) -> pd.Series
Compares two DataFrame columns row by row and returns the value from new_col wherever it differs from heat_col. This can help you identify where new data is different to existing HEAT records and therefore needs updating. It returns a new column which only contains data that needs to be updated, so it can be copied to the HEAT import template. The original DataFrame is not modified; a copy is created internally.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame where the columns are located. | required |
| new_col | str | The column which contains 'new' data. This would be the data you want to update on HEAT if it differs from values in heat_col. | required |
| heat_col | str | The corresponding column in your HEAT export e.g. if you are checking if any postcodes need updating, both new_col and heat_col should contain postcodes. | required |
Raises:
| Type | Description |
|---|---|
| TypeError | Raised if df is not a DataFrame, or if new_col or heat_col is not a string (text). |
| ColumnDoesNotExistError | Raised if either new_col or heat_col is not in df columns. |
Returns:
| Type | Description |
|---|---|
| Series | A pandas Series (DataFrame column) where rows contain the value from new_col if this is different to heat_col. |
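
The comparison described above can be illustrated with plain pandas. This is only a rough sketch of the documented behaviour, not the library's implementation: sketch_get_updates and the example column names are hypothetical, and it assumes that rows where the two columns match are left as missing values (NaN), which this reference does not specify. With the library available, the equivalent call would follow the signature above: hh.get_updates(df, new_col, heat_col).

```python
import pandas as pd


def sketch_get_updates(df: pd.DataFrame, new_col: str, heat_col: str) -> pd.Series:
    """Hypothetical stand-in that mimics the documented behaviour of get_updates."""
    work = df.copy()  # work on a copy so the caller's DataFrame is untouched
    differs = work[new_col] != work[heat_col]  # row-by-row comparison
    # Assumption: rows that already match HEAT are left as missing values (NaN).
    return work[new_col].where(differs)


# Example data: only the second postcode differs from the HEAT export.
students = pd.DataFrame(
    {
        "new_postcode": ["AB1 2CD", "EF3 4GH", "IJ5 6KL"],
        "heat_postcode": ["AB1 2CD", "EF3 9ZZ", "IJ5 6KL"],
    }
)

postcode_updates = sketch_get_updates(students, "new_postcode", "heat_postcode")
print(postcode_updates)
```

In this sketch only the second row carries a value, so the resulting column could be copied into an import template without touching records that are already correct.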
hh.get_contextual_updates
get_contextual_updates(
df: DataFrame,
new_col: str,
heat_col: str,
bad_values: Iterable[str],
) -> pd.Series
This function is similar to get_updates, except you can also pass a list, set, tuple or other Iterable of 'bad' values that you do not want to overwrite HEAT data. This can be useful if you want to ensure that data like 'Not available' or 'Unknown' does not overwrite previously collected 'good' values in the contextual data columns. The original DataFrame is not modified; a copy is created internally.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame where the columns are located. | required |
| new_col | str | The column which contains 'new' data. This would be the data you want to update on HEAT if it differs from values in heat_col. | required |
| heat_col | str | The corresponding column in your HEAT export e.g. if you are checking if any postcodes need updating, both new_col and heat_col should contain postcodes. | required |
| bad_values | Iterable[str] | A list, tuple, or set (or other Iterable) of values which should not overwrite 'good' data in your HEAT records. For example, if your new data contains 'Not available' or 'Unknown' but your HEAT records had values in these columns, you could pass ['Not available', 'Unknown'] to this parameter, and these will not overwrite 'good' values in the heat_col. | required |
Raises:
| Type | Description |
|---|---|
| TypeError | Raised if df is not a DataFrame, if new_col or heat_col is not a string (text), or if bad_values is not a list. |
| ColumnDoesNotExistError | Raised if either new_col or heat_col is not in df columns. |
Returns:
| Type | Description |
|---|---|
| Series | A pandas Series (DataFrame column) where rows contain the value from new_col if this is different to heat_col and is not one of the bad_values. |
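
The effect of bad_values can be sketched in plain pandas as follows. This is an illustration under stated assumptions, not the library's code: sketch_get_contextual_updates and the example column names are hypothetical, rows that need no update are assumed to be left as missing values (NaN), and a 'bad' new value is assumed never to be returned, whatever the HEAT column holds.

```python
from collections.abc import Iterable

import pandas as pd


def sketch_get_contextual_updates(
    df: pd.DataFrame,
    new_col: str,
    heat_col: str,
    bad_values: Iterable[str],
) -> pd.Series:
    """Hypothetical stand-in that mimics the documented behaviour."""
    work = df.copy()  # work on a copy so the caller's DataFrame is untouched
    differs = work[new_col] != work[heat_col]
    is_bad = work[new_col].isin(list(bad_values))  # accepts list, set, tuple, ...
    # Assumption: keep a new value only if it differs AND is not a 'bad' value;
    # every other row is left as a missing value (NaN).
    return work[new_col].where(differs & ~is_bad)


# Example data for a contextual column with placeholder answers.
records = pd.DataFrame(
    {
        "new_parental_he": ["Yes", "Unknown", "No", "Not available"],
        "heat_parental_he": ["Yes", "No", "Unknown", "Yes"],
    }
)

contextual_updates = sketch_get_contextual_updates(
    records,
    "new_parental_he",
    "heat_parental_he",
    bad_values={"Not available", "Unknown"},
)
print(contextual_updates)
```

Here only the third row produces an update: the first row already matches HEAT, and the second and fourth rows hold placeholder answers that would otherwise overwrite 'good' HEAT values.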