ALGAE Data Dictionary: Later life exposures, using cleaned mobility assessment

An automated protocol for assigning early life exposures to longitudinal cohort studies

ALGAE Data Dictionary: Later life exposures, using cleaned mobility assessment (algae3500-algae3557)

by Kevin Garwood

Context of Variables

These later life exposure values are based on the cleaned mobility assessment method. This method includes days that may have been involved in gaps and overlaps that appeared in the original address period file. Note that this method of assessment is different from the "unclean mobility" approach, which ignores any gap or overlap in its calculations.

Cumulative, average and median exposures are calculated for each pollutant (NAME, NOX_rd, PM10_rd, PM10_gr, PM10_tot) for each life stage (T1, T2, T3, EL) for each person.
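The three aggregates can be illustrated with a short Python sketch. It assumes that one person's daily values for one pollutant over one life stage have already been collected into a list, with days lacking a usable exposure value excluded. How ALGAE itself treats such days is not described in this section, and the function name `aggregate_life_stage` is hypothetical.

```python
from statistics import mean, median


def aggregate_life_stage(daily_values):
    """Return (cumulative, average, median) for one person, one pollutant and
    one life stage. Hypothetical helper; not part of ALGAE."""
    if not daily_values:
        raise ValueError("life stage has no daily values")
    return sum(daily_values), mean(daily_values), median(daily_values)


# Made-up daily NOX_rd exposures for a five-day life stage.
print(aggregate_life_stage([10.0, 12.0, 11.0, 30.0, 12.0]))  # -> (75.0, 15.0, 12.0)
```

Applying the same kind of function to a list of daily exposure error values would give the corresponding _err_ aggregates, since daily errors are aggregated to match the exposure values.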

Exposure Measurement Error Variables

The table also includes exposure measurement error variables that correspond with each aggregated exposure value. Exposure measurement errors are based on exposures that come from "days of contention". A day of contention is caused by a gap or an overlap in the residential history records, when a person could have occupied more than one location.

The exposure measurement error for a given day of contention is an opportunity cost, measured as the absolute difference between the exposure values at the assigned and opportunity geocodes (see Calculations and Algorithms). The assigned geocode is the one that the cleaning algorithm assigned when it attempted to fix a gap or overlap. The opportunity geocode is the other location the person could have occupied on that day. If a day is not involved in a gap or overlap, its daily exposure error is zero.
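As a minimal sketch of this opportunity cost, assuming each day of contention has exactly one opportunity location and that the daily exposures at both locations are already known, one day's error could be computed as follows. The function and parameter names are hypothetical and are not taken from ALGAE.

```python
from typing import Optional


def daily_exposure_error(assigned_exposure: float,
                         opportunity_exposure: Optional[float]) -> float:
    """Opportunity cost for one day (hypothetical helper).

    assigned_exposure: exposure at the geocode chosen by the cleaning step.
    opportunity_exposure: exposure at the other location the person could have
        occupied, or None when the day is not a day of contention.
    """
    if opportunity_exposure is None:
        # Days not covered by a gap or overlap carry no error.
        return 0.0
    return abs(assigned_exposure - opportunity_exposure)


print(daily_exposure_error(18.5, 21.0))  # -> 2.5
print(daily_exposure_error(18.5, None))  # -> 0.0
```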

Daily exposure error values are aggregated to match the context of the aggregated exposure values. If your address periods all have temporally contiguous start and end dates, then you should expect the exposure error values to be zero.

You may observe that some of the exposure error values are not zero but are extremely small. We believe that some of these may be due to numerical round-off error, and we advise that you choose a threshold below which such values are treated as zero.
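One way to apply such a threshold after loading the results is sketched below. The function name and the default threshold of 1e-9 are illustrative assumptions only; choose a threshold that suits the units and magnitudes of your own exposure data.

```python
def zero_small_errors(values, threshold=1e-9):
    """Replace error values whose magnitude is below `threshold` with 0.0.

    The default threshold is an arbitrary illustration, not an ALGAE setting.
    """
    return [0.0 if abs(v) < threshold else v for v in values]


print(zero_small_errors([0.0, 3.2e-14, 0.75, 2e-12]))  # -> [0.0, 0.0, 0.75, 0.0]
```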

Location of Result File

You will find these variables in a file having a name that fits the form:
res_later_mob_cln_exp_[Date stamp].csv
which will be found in the directory:
later_life/results/exposure_data/mobility_clean
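If several dated result files accumulate in that directory, a small Python helper like the one below could pick the most recent one. It is a hypothetical sketch: it chooses by file modification time rather than by parsing the date stamp, because the date stamp format is not specified in this dictionary.

```python
from pathlib import Path

# File name pattern from this data dictionary; "*" stands in for the date stamp.
RESULT_GLOB = "res_later_mob_cln_exp_*.csv"


def latest_result_file(results_dir):
    """Return the most recently modified result file in results_dir, or None.

    Hypothetical helper: it compares modification times instead of parsing the
    date stamp, whose exact format is not specified here.
    """
    candidates = list(Path(results_dir).glob(RESULT_GLOB))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_result_file("later_life/results/exposure_data/mobility_clean"))
```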

Example Result File

See here.

Variable Naming Conventions

It may be quicker to understand the variables through their naming conventions than by reading individual table entries. The aggregate exposure variables in this section (algae3528 to algae3557) follow this pattern:
	algae35[28-57]_[pollution type]_[optional err_][aggregate value]

In this pattern:

  • algae35: indicates that the variable refers to later life exposure values that make full use of cleaned address periods and consider error values
  • pollution type: one of name, nox_rd, pm10_tot, pm10_rd or pm10_gr
  • _err_: If _err_ appears in the name, then the variable describes an aggregate of error values over a life stage. If _err_ does not appear in the name, then the value is an aggregate pollution value over the life stage.
  • aggregate value: sum for the cumulative value, avg for the average value and med for the median value.
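The naming pattern can also be applied programmatically, for example to group columns after loading a result file. The regular expression below is a sketch that encodes only the pattern and pollutant codes listed in this section; the helper name and the keys of the returned dictionary are hypothetical.

```python
import re

# Aggregate-variable pattern: algae35NN_<pollutant>_[err_]<sum|avg|med>
AGG_VAR = re.compile(
    r"^algae35(?P<num>\d{2})_"
    r"(?P<pollutant>name|nox_rd|pm10_rd|pm10_gr|pm10_tot)_"
    r"(?P<err>err_)?"
    r"(?P<agg>sum|avg|med)$"
)


def parse_aggregate_variable(var_name):
    """Split an aggregate variable name into its parts, or return None if the
    name does not follow the aggregate pattern (e.g. the day-count variables)."""
    m = AGG_VAR.match(var_name)
    if m is None:
        return None
    return {
        "number": 3500 + int(m.group("num")),
        "pollutant": m.group("pollutant"),
        "is_error": m.group("err") is not None,
        "aggregate": m.group("agg"),
    }


print(parse_aggregate_variable("algae3537_nox_rd_err_avg"))
# -> {'number': 3537, 'pollutant': 'nox_rd', 'is_error': True, 'aggregate': 'avg'}
```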

The pollutant codes have the following meanings:

  • name: high level pollution that comes from outside the exposure area.
  • nox_rd: nitrogen oxides (NOx) pollution coming from roads.
  • pm10_rd: PM10 particulate matter coming from roads.
  • pm10_gr: PM10 particulate matter coming from sources other than roads.
  • pm10_tot: PM10 particulate matter coming from all sources (roads and other sources combined).

Be aware that in many cases the values of fields whose names contain _err_ can be very small. Extremely small values may indicate either that the amount of error is small or that numerical round-off error has occurred somewhere in the calculations.

Variable Dictionary

Variable Description
algae3500_person_id An anonymised or pseudonymised identifier which represents a study member. ALGAE uses this variable to link data together for a given study member.
algae3501_life_stage The name of a life stage. For example, "T1" may be the name of the Trimester 1 life stage.
algae3502_life_stage_duration The number of days in the life stage.
algae3503_name_inv_addr_days The number of NAME exposure days in the life stage that the study member spent at an invalid address. See definition of Invalid address days.
algae3504_name_oob_days The number of NAME exposure days in the life stage that the study member spent living at a location that is considered outside the bounds of the exposure area. See definition of Out of bounds days.
algae3505_name_poor_addr_days The number of NAME exposure days in the life stage that the study member spent living at a location whose geocode was derived from a poor quality residential address. The geocode was used to generate exposure values, but it is still considered to be invalid because it is of such poor quality. See definition of Poor address days.
algae3506_name_missing_exp_days The number of NAME exposure days in the life stage that the study member spent living at a valid geocode that has exposure values, but not for the days in question. See definition of Missing exposure days.
algae3507_name_good_addr_days The number of NAME exposure days in the life stage that the study member spent living at a geocode that is considered a good match: it has a valid geocode and it has a non-blank exposure value for a given day. See definition of Good address days.
algae3508_nox_rd_inv_addr_days The number of NOX RD exposure days in the life stage that the study member spent at an invalid address. See definition of Invalid address days.
algae3509_nox_rd_oob_days The number of NOX RD exposure days in the life stage that the study member spent living at a location that is considered outside the bounds of the exposure area. See definition of Out of bounds days.
algae3510_nox_rd_poor_addr_days The number of NOX RD exposure days in the life stage that the study member spent living at a location whose geocode was derived from a poor quality residential address. The geocode was used to generate exposure values, but it is still considered to be invalid because it is of such poor quality. See definition of Poor address days.
algae3511_nox_rd_missing_exp_days The number of NOX RD exposure days in the life stage that the study member spent living at a valid geocode that has exposure values, but not for the days in question. See definition of Missing exposure days.
algae3512_nox_rd_good_addr_days The number of NOX RD exposure days in the life stage that the study member spent living at a geocode that is considered a good match: it has a valid geocode and it has a non-blank exposure value for a given day. See definition of Good address days.
algae3513_pm10_rd_inv_addr_days The number of PM10 RD exposure days in the life stage that the study member spent at an invalid address. See definition of Invalid address days.
algae3514_pm10_rd_oob_days The number of PM10 RD exposure days in the life stage that the study member spent living at a location that is considered outside the bounds of the exposure area. See definition of Out of bounds days.
algae3515_pm10_rd_poor_addr_days The number of PM10 RD exposure days in the life stage that the study member spent living at a location whose geocode was derived from a poor quality residential address. The geocode was used to generate exposure values, but it is still considered to be invalid because it is of such poor quality. See definition of Poor address days.
algae3516_pm10_rd_missing_exp_days The number of PM10 RD exposure days in the life stage that the study member spent living at a valid geocode that has exposure values, but not for the days in question. See definition of Missing exposure days.
algae3517_pm10_rd_good_addr_days The number of PM10 RD exposure days in the life stage that the study member spent living at a geocode that is considered a good match: it has a valid geocode and it has a non-blank exposure value for a given day. See definition of Good address days.
algae3518_pm10_gr_inv_addr_days The number of PM10 GR exposure days in the life stage that the study member spent at an invalid address. See definition of Invalid address days.
algae3519_pm10_gr_oob_days The number of PM10 GR exposure days in the life stage that the study member spent living at a location that is considered outside the bounds of the exposure area. See definition of Out of bounds days.
algae3520_pm10_gr_poor_addr_days The number of PM10 GR exposure days in the life stage that the study member spent living at a location whose geocode was derived from a poor quality residential address. The geocode was used to generate exposure values, but it is still considered to be invalid because it is of such poor quality. See definition of Poor address days.
algae3521_pm10_gr_missing_exp_days The number of PM10 GR exposure days in the life stage that the study member spent living at a valid geocode that has exposure values, but not for the days in question. See definition of Missing exposure days.
algae3522_pm10_gr_good_addr_days The number of PM10 GR exposure days in the life stage that the study member spent living at a geocode that is considered a good match: it has a valid geocode and it has a non-blank exposure value for a given day. See definition of Good address days.
algae3523_pm10_tot_inv_addr_days The number of PM10 TOT exposure days in the life stage that the study member spent at an invalid address. See definition of Invalid address days.
algae3524_pm10_tot_oob_days The number of PM10 TOT exposure days in the life stage that the study member spent living at a location that is considered outside the bounds of the exposure area. See definition of Out of bounds days.
algae3525_pm10_tot_poor_addr_days The number of PM10 TOT exposure days in the life stage that the study member spent living at a location whose geocode was derived from a poor quality residential address. The geocode was used to generate exposure values, but it is still considered to be invalid because it is of such poor quality. See definition of Poor address days.
algae3526_pm10_tot_missing_exp_days The number of PM10 TOT exposure days in the life stage that the study member spent living at a valid geocode that has exposure values, but not for the days in question. See definition of Missing exposure days.
algae3527_pm10_tot_good_addr_days The number of PM10 TOT exposure days in the life stage that the study member spent living at a geocode that is considered a good match: it has a valid geocode and it has a non-blank exposure value for a given day. See definition of Good address days.
algae3528_name_sum Cumulative exposure of NAME for the given life_stage.
algae3529_name_err_sum Cumulative exposure measurement error for NAME measured for the given life_stage
algae3530_name_avg Average exposure for NAME measured for the given life_stage
algae3531_name_err_avg Average exposure measurement error for NAME measured for the given life_stage
algae3532_name_med Median exposure for NAME measured for the given life_stage
algae3533_name_err_med Median exposure measurement error for NAME measured for the given life_stage
algae3534_nox_rd_sum Cumulative exposure of NOX (road sources) for a given life_stage.
algae3535_nox_rd_err_sum Cumulative exposure measurement error for NOX (road sources), measured for the given life_stage
algae3536_nox_rd_avg Average exposure of NOX (road sources) for a given life_stage.
algae3537_nox_rd_err_avg Average exposure measurement error for NOX (road sources), measured for the given life_stage
algae3538_nox_rd_med Median exposure for NOX (road sources), measured for the given life_stage
algae3539_nox_rd_err_med Median exposure measurement error for NOX (road sources), measured for the given life_stage
algae3540_pm10_gr_sum Cumulative exposure for PM10 (non-road sources), measured for the given life_stage
algae3541_pm10_gr_err_sum Cumulative exposure measurement error for PM10 (non-road sources), measured for the given life_stage
algae3542_pm10_gr_avg Average exposure for PM10 (non-road sources), measured for the given life_stage
algae3543_pm10_gr_err_avg Average exposure measurement error for PM10 (non-road sources), measured for the given life_stage
algae3544_pm10_gr_med Median exposure for PM10 (non-road sources), measured for the given life_stage
algae3545_pm10_gr_err_med Median exposure measurement error for PM10 (non-road sources), measured for the given life_stage
algae3546_pm10_rd_sum Cumulative exposure for PM10 (road sources), measured for the given life_stage
algae3547_pm10_rd_err_sum Cumulative exposure measurement error for PM10 (road sources), measured for the given life_stage
algae3548_pm10_rd_avg Average exposure for PM10 (road sources), measured for the given life_stage
algae3549_pm10_rd_err_avg Average exposure measurement error for PM10 (road sources), measured for the given life_stage
algae3550_pm10_rd_med Median exposure for PM10 (road sources), measured for the given life_stage
algae3551_pm10_rd_err_med Median exposure measurement error for PM10 (road sources), measured for the given life_stage
algae3552_pm10_tot_sum Cumulative exposure for PM10 (all sources), measured for the given life_stage
algae3553_pm10_tot_err_sum Cumulative exposure measurement error for PM10 (all sources), measured for the given life_stage
algae3554_pm10_tot_avg Average exposure for PM10 (all sources), measured for the given life_stage
algae3555_pm10_tot_err_avg Average exposure measurement error for PM10 (all sources), measured for the given life_stage
algae3556_pm10_tot_med Median exposure for PM10 (all sources), measured for the given life_stage
algae3557_pm10_tot_err_med Median exposure measurement error for PM10 (all sources), measured for the given life_stage
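The day-count variables can support a simple consistency check. The descriptions suggest, but this dictionary does not state, that the five day categories for a pollutant are mutually exclusive and together cover the whole life stage. Under that assumption, the hypothetical sketch below lists rows where the five counts do not add up to algae3502_life_stage_duration. If the assumption does not hold for your ALGAE run, treat flagged rows as prompts for inspection rather than as errors.

```python
import csv
import io

# Day categories in the order they appear for each pollutant in the dictionary.
DAY_CATEGORIES = ["inv_addr_days", "oob_days", "poor_addr_days",
                  "missing_exp_days", "good_addr_days"]

# Number of the first day-count variable for each pollutant (algae3503-algae3527).
FIRST_DAY_VARIABLE = {"name": 3503, "nox_rd": 3508, "pm10_rd": 3513,
                      "pm10_gr": 3518, "pm10_tot": 3523}


def day_count_columns(pollutant):
    """Build the five day-count column names for one pollutant."""
    first = FIRST_DAY_VARIABLE[pollutant]
    return [f"algae{first + i}_{pollutant}_{category}"
            for i, category in enumerate(DAY_CATEGORIES)]


def unbalanced_rows(csv_text, pollutant):
    """Return (person_id, life_stage) pairs whose day counts do not sum to the
    life stage duration. ASSUMPTION: the five categories partition the days."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        total = sum(float(row[col]) for col in day_count_columns(pollutant))
        if total != float(row["algae3502_life_stage_duration"]):
            flagged.append((row["algae3500_person_id"], row["algae3501_life_stage"]))
    return flagged


# Made-up two-row sample containing only the columns the check needs.
SAMPLE = (
    "algae3500_person_id,algae3501_life_stage,algae3502_life_stage_duration,"
    + ",".join(day_count_columns("name")) + "\n"
    "p1,T1,10,0,0,1,2,7\n"
    "p2,T2,10,0,1,0,0,8\n"
)
print(unbalanced_rows(SAMPLE, "name"))  # -> [('p2', 'T2')]
```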