Data loading part 3: Prepare the address history data
by Kevin Garwood
Purpose
This table is used to establish when and where study members were with respect to historically modelled exposure values at addresses they occupied.
Location of Original Data File
You will need to create a file named original_address_history_data.csv, which must be located in either:
early_life/input_data or
later_life/input_data
Example Original Data File
See here.
Suggested Approach
The goal of this activity is to populate a table that has the following fields: person_id, geocode, start_date, end_date. Before you begin this part of the protocol, please ensure that you have done all the steps needed to prepare the original_geocode_data table. See Creating the original_geocode_data table.
We assume that you will obtain all the residential address records of study members from an administrative system that audits current addresses for all cohort members.
Although ALGAE does not care how you create the expected fields, we will assume that your administrative database will record an address using the following fields:
- person_id
- one or more fields for a residential address (eg: address line 1, post code)
- a time stamp
- other contact data fields that ALGAE does not need (eg: e-mail, phone number)
Step 1: Substitute address fields with geocodes
From your work preparing the original_geocode_data table, you should have been
able to substitute the address fields in each address record with a geocode.
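The substitution in Step 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names address_line_1, post_code and time_stamp, the address_key helper, and the idea of matching on a normalised address are all hypothetical, and ALGAE does not prescribe how you do this. Records with no match are given an empty geocode here so that they can be reviewed by hand.

```python
def address_key(line_1, post_code):
    """Build a crude matching key from address fields (hypothetical helper)."""
    normalised_line = " ".join(line_1.lower().split())
    normalised_post_code = post_code.replace(" ", "").upper()
    return (normalised_line, normalised_post_code)


def substitute_geocodes(address_records, geocode_lookup):
    """Replace the address fields of each record with a geocode.

    address_records: list of dicts with person_id, address_line_1,
        post_code and time_stamp (assumed field names).
    geocode_lookup: dict mapping address_key(...) to a geocode, assumed to
        come from your work on original_geocode_data.
    """
    substituted = []
    for record in address_records:
        key = address_key(record["address_line_1"], record["post_code"])
        substituted.append({
            "person_id": record["person_id"],
            # Empty string marks an address that could not be matched.
            "geocode": geocode_lookup.get(key, ""),
            "time_stamp": record["time_stamp"],
        })
    return substituted
```

Only person_id, the geocode and the time stamp are carried forward; other contact fields such as e-mail or phone number are dropped because ALGAE does not need them.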
Step 2: Identify candidate field for start_date
As part of our assumptions about your administrative system, the start date will
probably correspond to the time stamp of the address record. Note that this time stamp
will represent when a change of address was added to the system, and not when a study
member began living there.
Step 3: Identify candidate field for end_date
Either your Contacts database will already have a field for end_date, or you will have to derive
one. If a corresponding end_date field already exists in the Contacts database, map its values directly.
In the Imperial-ALSPAC use case study, ALSPAC provided Imperial with address records that
included an end_date field. However, for some administrative systems, the end
date may have to be computed. If this is the case, we would expect the end date of the
previous address record to be one day before the start date of the current address record.
If you find that you have to compute an end_date field, then it is likely that all of your
address periods will fit together perfectly with one another. In this case, your results will
exhibit no exposure measurement error in the _err fields that appear in the
early or late cleaned mobility result tables.
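If you do need to compute end dates, the rule described in Step 3 can be sketched as below. This is a minimal illustration rather than part of ALGAE: the function name derive_end_dates is hypothetical, and the final_end_date parameter is an assumption, because the text above does not say what end date the most recent address of each study member should receive.

```python
from datetime import datetime, timedelta
from itertools import groupby

DATE_FORMAT = "%d/%m/%Y"  # corresponds to dd/MM/yyyy


def derive_end_dates(records, final_end_date):
    """Add an end_date to each address record.

    records: list of dicts with person_id, geocode and start_date
        (start_date as a dd/MM/yyyy string).
    final_end_date: value used for each person's last address period
        (an assumption; choose what suits your study).
    """
    def sort_key(record):
        start = datetime.strptime(record["start_date"], DATE_FORMAT)
        return (record["person_id"], start)

    ordered = sorted(records, key=sort_key)
    result = []
    for _, group in groupby(ordered, key=lambda r: r["person_id"]):
        periods = list(group)
        # Pair each period with the one that follows it (None for the last).
        for current, following in zip(periods, periods[1:] + [None]):
            if following is None:
                end_date = final_end_date
            else:
                next_start = datetime.strptime(following["start_date"], DATE_FORMAT)
                # End date is one day before the next period starts.
                end_date = (next_start - timedelta(days=1)).strftime(DATE_FORMAT)
            result.append({**current, "end_date": end_date})
    return result
```

Because every end date is derived from the following start date, the resulting periods have no gaps or overlaps, which is why the _err fields would show no measurement error in this case.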
General Advice
If your administrative data set has both start_date and end_date fields,
then do not attempt to clean them. If you do, ALGAE will not be able to detect the errors
and assess exposure measurement error in the results.
Table Properties
You need to produce a CSV file called original_address_history_data.csv. It must
have the following fields:
Field | Description | Required | Properties | Examples |
---|---|---|---|---|
person_id | Anonymised unique identifier representing a study member | Yes | Any text | 1001XYZ |
comments | Any other information that exposure scientists want to provide about a geocode | No | Any text | |
geocode | Represents the location of a residential address. For ALGAE, the geocode is treated as just an identifier and the protocol attaches no meaning to the code. | Yes | Any text | 37.422036-122.084124; x4353bi838 (anonymised) |
start_date | The date when it is assumed that a study member began living at an address. In the use case study, the date actually represented the date when cohort administrators updated a study member's current address in their Contacts database application. | Yes | Date Format: dd/MM/yyyy | 23/03/1996 |
end_date | The date when it is assumed that a study member stopped living at an address. In the use case study, the end date appeared in some cases to have been manually calculated whereas in others it appeared to have been automatically computed to correspond with the start date of the next address period. | Yes | Date Format: dd/MM/yyyy | 23/05/1996 |
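Before loading the file, it may help to check each row against the table properties above. The sketch below is one possible check, not an ALGAE tool: the function names check_row and check_csv are hypothetical. It only reports problems and never alters the dates, in keeping with the general advice not to clean start_date and end_date values.

```python
import csv
import re
from datetime import datetime

REQUIRED_FIELDS = ["person_id", "geocode", "start_date", "end_date"]
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")  # shape of dd/MM/yyyy


def check_row(row):
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"missing {field}")
    for field in ("start_date", "end_date"):
        value = (row.get(field) or "").strip()
        if not value:
            continue  # already reported as missing
        try:
            if not DATE_PATTERN.fullmatch(value):
                raise ValueError(value)
            datetime.strptime(value, "%d/%m/%Y")  # rejects e.g. 31/02/1996
        except ValueError:
            problems.append(f"{field} is not a valid dd/MM/yyyy date: {value}")
    return problems


def check_csv(csv_file):
    """Check every row of an open CSV file; return {line_number: problems}."""
    report = {}
    reader = csv.DictReader(csv_file)
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        problems = check_row(row)
        if problems:
            report[line_number] = problems
    return report
```

A typical use would be to open original_address_history_data.csv with newline="" and pass the file object to check_csv, then review any reported lines by hand.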