ALGAE Protocol: An automated protocol for assigning early life exposures to longitudinal cohort studies

Data loading part 3: Prepare the address history data

by Kevin Garwood

Purpose

This table is used to establish when and where study members were with respect to historically modelled exposure values at addresses they occupied.

Location of Original Data File

You will need to create a file that has this name
original_address_history_data.csv
which must be located in:
early_life/input_data
or
later_life/input_data

Example Original Data File

See here.

Suggested Approach

The goal of this activity is to populate a table that has the following fields: person_id, geocode, start_date, end_date. Before you begin this part of the protocol, please ensure that you have completed all the steps needed to prepare the original_geocode_data table. See Creating the original_geocode_data table.

We assume that you will obtain all the residential address records of study members from an administrative system that audits current addresses for all cohort members.

Although ALGAE does not care how you create the expected fields, we will assume that your administrative database will record an address using the following fields:

  • person_id
  • one or more fields for a residential address (eg: address line 1, post code)
  • a time stamp
  • other contact data fields that ALGAE does not need (eg: e-mail, phone number)

Step 1: Substitute address fields with geocodes

Using the work you did to prepare the original_geocode_data table, you should be able to substitute the address fields in each address record with a geocode.
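The substitution in Step 1 could be sketched as follows. This is only an illustration: the field names address_line_1, post_code and time_stamp, the function names, and the idea of a lookup keyed on normalised address text are all assumptions, and your administrative database will almost certainly differ.

```python
def normalise(text):
    """Collapse whitespace and upper-case so equivalent addresses compare equal."""
    return " ".join(text.split()).upper()


def substitute_geocodes(address_records, geocode_lookup):
    """Replace the address fields of each record with a geocode.

    address_records: list of dicts with the hypothetical keys
        person_id, address_line_1, post_code, time_stamp.
    geocode_lookup: dict mapping a normalised (address_line_1, post_code)
        tuple to a geocode. Building it is assumed to be part of your
        work on original_geocode_data.
    Returns (substituted, unmatched) so unmatched records can be reviewed.
    """
    substituted = []
    unmatched = []
    for record in address_records:
        key = (normalise(record["address_line_1"]), normalise(record["post_code"]))
        geocode = geocode_lookup.get(key)
        if geocode is None:
            # No geocode known for this address: keep it aside for review.
            unmatched.append(record)
            continue
        substituted.append(
            {
                "person_id": record["person_id"],
                "geocode": geocode,
                "time_stamp": record["time_stamp"],
            }
        )
    return substituted, unmatched
```

Returning the unmatched records separately, rather than dropping them, makes it easier to see which addresses still need a geocode.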

Step 2: Identify candidate field for start_date

As part of our assumptions about your administrative system, the start date will probably correspond to the time stamp of the address record. Note that this time stamp will represent when a change of address was added to the system, and not when a study member began living there.

Step 3: Identify candidate field for end_date

Either your Contacts database will already have an end_date field or you will have to derive one. If the field exists, map its data directly. In the Imperial-ALSPAC use case study, ALSPAC provided Imperial with address records that included an end_date field. However, for some administrative systems, the end date may have to be computed. If this is the case, we would expect the end date of the previous address record to be 1 day before the start date of the current address record.

If you find that you have to compute an end_date field, then it is likely that all of your address periods will fit together perfectly with one another, with no gaps or overlaps between them. In this case, your results will exhibit no exposure measurement error in the _err fields that appear in the early or late cleaned mobility result tables.
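If you do have to compute end_date, the rule described in Step 3 could be sketched as below. The function name and the final_end_date parameter are hypothetical: this page does not say what end date the most recent address period should receive, so the sketch simply asks the caller to supply one.

```python
from datetime import date, timedelta
from itertools import groupby
from operator import itemgetter


def derive_end_dates(records, final_end_date):
    """Add an end_date to each address record.

    records: list of dicts with person_id, geocode and start_date (a date).
    final_end_date: assumed end date for each person's last address period;
        this is an illustrative parameter, not something ALGAE defines here.
    """
    # Order each person's address periods chronologically.
    ordered = sorted(records, key=itemgetter("person_id", "start_date"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("person_id")):
        periods = list(group)
        # Each period ends 1 day before the next period starts.
        for current, following in zip(periods, periods[1:]):
            end_date = following["start_date"] - timedelta(days=1)
            result.append({**current, "end_date": end_date})
        # The last period has no successor, so use the supplied date.
        result.append({**periods[-1], "end_date": final_end_date})
    return result
```

For example, a person with address records starting on 23/03/1996 and 24/05/1996 would get an end_date of 23/05/1996 for the first period.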

General Advice

If your administrative data set has both start_date and end_date fields, then do not attempt to clean them. If you do, ALGAE will not be able to detect the errors or assess the exposure measurement error they cause in the results.

Table Properties

You need to produce a CSV file called original_address_history_data.csv. It must have the following fields:
| Field | Description | Required | Properties | Examples |
|---|---|---|---|---|
| person_id | Anonymised unique identifier representing a study member | Yes | Any text | 1001XYZ |
| comments | Any other information that exposure scientists want to provide about a geocode | No | Any text | |
| geocode | Represents the location of a residential address. For ALGAE, the geocode is treated as just an identifier and the protocol attaches no meaning to the code. | Yes | Any text | 37.422036-122.084124; x4353bi838 (anonymised) |
| start_date | The date when it is assumed that a study member began living at an address. In the use case study, the date actually represented the date when cohort administrators updated a study member's current address in their Contacts database application. | Yes | Date. Format: dd/MM/yyyy | 23/03/1996 |
| end_date | The date when it is assumed that a study member stopped living at an address. In the use case study, the end date appeared in some cases to have been manually calculated whereas in others it appeared to have been automatically computed to correspond with the start date of the next address period. | Yes | Date. Format: dd/MM/yyyy | 23/05/1996 |
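One way to write the file with these fields is sketched below. The column order and the presence of a header row are assumptions taken from the table above, not confirmed requirements, so compare the output against the example original data file. The function names are hypothetical.

```python
import csv
from datetime import date

# Field order assumed from the Table Properties listing.
FIELD_NAMES = ["person_id", "comments", "geocode", "start_date", "end_date"]


def format_row(row):
    """Convert one address period into CSV-ready text values."""
    return {
        "person_id": row["person_id"],
        # comments is optional, so default to an empty value.
        "comments": row.get("comments", ""),
        "geocode": row["geocode"],
        # dd/MM/yyyy corresponds to %d/%m/%Y in Python.
        "start_date": row["start_date"].strftime("%d/%m/%Y"),
        "end_date": row["end_date"].strftime("%d/%m/%Y"),
    }


def write_address_history(rows, path):
    """Write address periods to a CSV file, with an assumed header row."""
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=FIELD_NAMES)
        writer.writeheader()
        for row in rows:
            writer.writerow(format_row(row))
```

A call such as write_address_history(rows, "early_life/input_data/original_address_history_data.csv") would then place the file in one of the expected locations, assuming that directory already exists.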