Updates

These functions are used to check if updates are required when matching data of new students to existing HEAT records. They return only the data that is different from your HEAT records, which can be copied to HEAT upload templates to bulk update records.

Get Updates

This function compares two DataFrame columns and returns a value if new_col is different to heat_col. This can help you identify where new data is different to existing HEAT records and therefore needs updating. It returns a new column which only contains data that needs to be updated - so it can be copied to the HEAT import template.

Note

This function copies the original DataFrame in memory and does not modify your original DataFrame in place.

For example, if you have collected new student data and matched this to your existing HEAT records and discover that some students already have records, you can use this function to check if the new data is different to the current records. If there are differences, the new data will be returned in the return column so that you can easily identify changes.

Tip

If used on string columns, it will attempt to 'normalise' nulls by filling them with blank strings and then turning them to None. This should mean that no null values are returned as 'new' data.

ExampleExample: running on multiple cols at once

import heat_helper as hh
import pandas as pd

# Example Data: we want to check if any postcodes in NEW Postcode 
# are different to the existing HEAT Postcodes
#
#       NEW Full Name NEW Postcode HEAT Postcode
#            Jane Doe      BB1 1AB       AA1 1AA
#          Mike Jones      BB2 2BB       BB2 2BB
#        Thomas Smith                    CC3 3CC
#         Sarah Brown      DD4 4DD       DD4 4DD
#  Christopher Bloggs      AA1 1AA          None

df['Updated Postcodes'] = hh.get_updates(
df,
'NEW Postcode',
'HEAT Postcode'
)

# Result: Jane Doe's postcode needs updating, as does Christopher's 
# (which was blank on HEAT)
#
#       NEW Full Name NEW Postcode HEAT Postcode Updated Context
#            Jane Doe      BB1 1AB       AA1 1AA         BB1 1AB
#          Mike Jones      BB2 2BB       BB2 2BB            None
#        Thomas Smith         None       CC3 3CC            None
#         Sarah Brown      DD4 4DD       DD4 4DD            None
#  Christopher Bloggs      AA1 1AA          None         AA1 1AA

import heat_helper as hh
import pandas as pd

personal_columns_to_update = [
"First Name",
"Middle Name",
"Last Name",
"Date of Birth",
"Home Postcode",
"Last known Institution HEAT ID",
"Last known Institution Name",
"Individual Data Collection Date",
"Individual Data Collection Privacy Form",
"Phase Adjustment",
]

# Update person cols if new data is different to old
for col in personal_columns_to_update:
    heat_col = f"HEAT: {col}"
    update_col = f"Update: {col}"
    df[update_col] = hh.get_updates(df, col, heat_col)

# This would create a new column for each column in personal_columns_to_update 
# containing only data that needs updating

Get Contextual Updates

This function compares two DataFrame columns and returns a value if new_col is different to heat_col and if the values in new_col are not in bad_values. This can help you identify where new data is different to existing HEAT records and therefore needs updating, but will not override 'good' data with values like 'Not available' 'Unknown' or 'Information Refused'. It returns a new column which only contains data that needs to be updated - so it can be copied to the HEAT import template.

For example, if you have collected new student data and matched this to your existing HEAT records and discover that some students already have records, you can use this function to check if the new data is different to the current records. If there are differences, the new data will be returned in the return column so that you can easily identify changes.

Note

This function copies the original DataFrame in memory and does not modify your original DataFrame in place.

The best use case for this function is on the contextual data columns in the HEAT Student Export, as you can pass all 'bad' values as a list, tuple, set or any other Iterable and avoid them overwriting older data.

Tip

If used on string columns, it will attempt to 'normalise' nulls by filling them with blank strings and then turning them to None. This should mean that no null values are returned as 'new' data.

ExampleExample: running on multiple cols at once

import heat_helper as hh
import pandas as pd

# Example Data: we want to check if any values in NEW Context 
# are different to the existing HEAT Context, but we don't want
# values like 'Not available' or 'Information Refused' to overwrite 
# 'good' values like 'Yes' and 'No'.
#
#       NEW Full Name          NEW Context HEAT Context
#            Jane Doe        Not available          Yes
#          Mike Jones  Information Refused           No
#        Thomas Smith                  Yes           No
#         Sarah Brown                                No
#  Christopher Bloggs                  Yes          Yes

df['Updated Context'] = hh.get_contextual_updates(
df,
'NEW Context',
'HEAT Context',
bad_values = ['Not available', 'Information Refused']
)

# Result: only Thomas Smith's Context needs updating
#
#       NEW Full Name          NEW Context HEAT Context Updated Context
#            Jane Doe        Not available          Yes            None
#          Mike Jones  Information Refused           No            None
#        Thomas Smith                  Yes           No             Yes
#         Sarah Brown                 None           No            None
#  Christopher Bloggs                  Yes          Yes            None

import heat_helper as hh
import pandas as pd

# Define the "bad" values we do not want to be overridden 
# by good values (lose no data via updates)
BAD_VALUES = (
"Not available",
"Information refused",
"Not available (999)",
"Unknown",
"Not known (997)",
"Prefer not to say (998)",
"Prefer not to say",
)

contextual_columns_to_update = [
"Sex",
"First Generation HE",
"Disability",
"Ethnicity",
"Refugee / Asylum Seeker",
"Care Leaver",
"Estranged",
"Service Children",
"Young Carer",
]

# Update contextual only if new data is different AND 'good'
for col in contextual_columns_to_update:
    left_col = col
    heat_col = f"HEAT: {col}"
    update_col = f"Update: {col}"

    # Create all cols in the list
    df[update_col] = hh.get_contextual_updates(df, 
                            left_col, heat_col, BAD_VALUES)

# This would create a new column for each column in 
# contextual_columns_to_update containing only data that needs 
# updating