ALGAE Data Dictionary: Early life exposures, using cleaned mobility assessment (algae3100-algae3157)
by Kevin Garwood
Context of Variables
These early life exposure values are based on the cleaned mobility assessment method, which includes days that may have been involved in gaps and overlaps that appeared in the original address period file. Note that this method of assessment is different from the uncleaned mobility assessment, which ignores any gap or overlap in its calculations. For each study member, cumulative, average and median exposures are assessed for each pollutant and life stage.
For each life stage, the protocol also counts the number of days that fall into each of the following data quality categories:
- good match days,
- invalid address days,
- out of bounds address days,
- poor match days, and
- missing exposure days.
Please see the Assess the Data Quality of Each Daily Exposure Value section of the ALGAE methodology to learn more about these data quality categories.
Variables marked _err aggregate daily exposure errors for a given pollutant and life stage.
Exposure Measurement Error Variables
The table also includes exposure measurement error variables that correspond with each aggregated exposure value. Exposure measurement errors are based on exposures that come from "days of contention". A day of contention is caused by a gap or an overlap in the residential history records, when a person could have occupied more than one location.
The exposure measurement error for a given day of contention is an opportunity cost, measured as the absolute difference between the exposures at the assigned and opportunity geocodes (see Calculations and Algorithms). The assigned geocode is the one that the cleaning algorithm has assigned in its attempt to fix a gap or overlap. The opportunity geocode is the other location a person could have occupied on that day. If a day is not covered by a gap or overlap problem, then it will have a daily exposure error of zero.
Daily exposure error values are aggregated to match the context of the aggregated exposure values. If your address periods all have temporally contiguous start and end dates, then you should expect the exposure error values to be zero.
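The error calculation described above can be sketched as follows. This is a minimal illustration rather than ALGAE's own code: the function name and the sample values are made up, and it assumes that the error is the absolute difference between the exposures at the two geocodes and that every day in the life stage contributes to the aggregates.

```python
from statistics import mean, median

def daily_exposure_error(assigned_exposure, opportunity_exposure=None):
    """Absolute difference between exposures at the assigned and opportunity geocodes.

    A day that is not a day of contention has no opportunity geocode,
    so its error is zero.
    """
    if opportunity_exposure is None:
        return 0.0
    return abs(assigned_exposure - opportunity_exposure)

# Hypothetical daily values for one pollutant in one life stage:
# (exposure at assigned geocode, exposure at opportunity geocode or None).
days = [(12.0, None), (11.5, None), (10.0, 14.0), (9.0, 7.5)]

errors = [daily_exposure_error(a, o) for a, o in days]

# Aggregates that mirror the _err_sum, _err_avg and _err_med variables.
err_sum = sum(errors)
err_avg = mean(errors)
err_med = median(errors)
```

With no days of contention, every entry in `errors` is zero, and so are all three aggregates.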
You may observe that some of the exposure error values are not zero, but are extremely small. We believe that some of these may be due to numerical round-off error, and we advise that you decide on a threshold below which you treat them as zero.
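One way to apply such a threshold is sketched below. The helper name and the default cut-off of 1e-9 are illustrative assumptions, not values recommended by ALGAE; choose a threshold that suits the scale of your exposure data.

```python
def zero_small_errors(values, threshold=1e-9):
    """Replace error values whose magnitude is below the threshold with 0.0."""
    return [0.0 if abs(v) < threshold else v for v in values]

# Made-up error values: two are small enough to be treated as round-off.
cleaned = zero_small_errors([0.0, 3.2e-15, 0.41, 1.0e-12])
```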
Location of Result File
You will find these variables in a file whose name fits the form:
res_early_mob_cln_exp_[Date stamp].csv
which will be found in the directory:
early_life/results/exposure_data/mobility_clean
or
later_life/results/exposure_data/mobility_clean
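As a convenience, the file can be located by pattern rather than by typing the date stamp. The sketch below assumes a hypothetical `run_dir` argument that points at the folder containing the `early_life` or `later_life` directory; the helper name is also made up for illustration.

```python
from pathlib import Path

def find_result_files(run_dir, stage_dir="early_life"):
    """Return files matching res_early_mob_cln_exp_*.csv, sorted by name."""
    folder = Path(run_dir) / stage_dir / "results" / "exposure_data" / "mobility_clean"
    return sorted(folder.glob("res_early_mob_cln_exp_*.csv"))
```

Pass `stage_dir="later_life"` to search the second directory listed above.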
Example Result File
See here.
Variable Naming Conventions
It may be quicker to understand the variables through naming conventions rather than looking at specific table entries. Most of the variable names fit one of the patterns below. All variables in this section start with algae31, which indicates that the variables relate to the early life exposure assessment. This assessment makes full use of cleaned address periods and considers error values.
Variables related to exposure values will be of the format:
algae31[nn]_[pollution type]_[aggregate value]
algae31[nn]_[pollution type]_err_[aggregate value]
algae31[nn]_[pollution type]_[daily exposure quality]_days
In these patterns:
- nn: a two-digit number.
- pollution type: one of name, nox_rd, pm10_tot, pm10_rd, pm10_gr.
- aggregate value: refers to one of the sum, average or median operations that are applied to daily exposure values for a given life stage. They are represented by _sum, _avg and _med respectively.
- err: means the value is an aggregation of daily error values measured for a given life stage.
- daily exposure quality: variables that count the number of days in a given life stage which exhibit different types of quality for exposures. The daily exposure data quality indicators cover categories such as good exposure days, invalid address days and out of bounds exposure days.
Be aware that in many cases field values with _err_ can be very small. Extremely small values can indicate either that the amount of error is small or that numerical round-off error has occurred somewhere in the calculations.
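The naming patterns above lend themselves to mechanical parsing. The following sketch uses a regular expression to split a variable name into its parts; the function name and the returned dictionary layout are illustrative choices, and names that do not follow the patterns (such as algae3100_person_id) yield `None`.

```python
import re

POLLUTANTS = ("name", "nox_rd", "pm10_tot", "pm10_rd", "pm10_gr")

# Matches either a day-count variable (..._days) or an aggregate,
# with an optional err_ marker before the aggregate suffix.
_PATTERN = re.compile(
    r"^algae31(?P<nn>\d{2})_"
    r"(?P<pollutant>" + "|".join(POLLUTANTS) + r")_"
    r"(?:(?P<quality>\w+)_days|(?P<err>err_)?(?P<aggregate>sum|avg|med))$"
)

def parse_variable(name):
    """Split a variable name into its parts, or return None if it does not fit."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["err"] = parts["err"] is not None  # True for error aggregates
    return parts

example = parse_variable("algae3129_name_err_sum")
```

Here `example["pollutant"]` is `"name"`, `example["aggregate"]` is `"sum"` and `example["err"]` is `True`.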
Variable Dictionary
Variable | Description |
---|---|
algae3100_person_id | An anonymised or pseudonymised identifier which represents a study member. ALGAE uses this variable to link data together for a given study member. |
algae3101_life_stage | The name of a life stage. For example, "T1" may be the name of the Trimester 1 life stage. |
algae3102_life_stage_duration | The number of days in the life stage. |
algae3103_name_inv_addr_days | The number of NAME exposure days in the life stage that the study member spent at an invalid address. See definition of Invalid address days. |
algae3104_name_oob_days | The number of NAME exposure days in the life stage that the study member spent living at a location that is considered outside the bounds of the exposure area. See definition of Out of bounds days. |
algae3105_name_poor_addr_days | The number of NAME exposure days in the life stage that the study member spent living at a location whose geocode was derived from a poor quality residential address. The geocode was used to generate exposure values, but it is still considered to be invalid because it is of such poor quality. See definition of Poor address days. |
algae3106_name_missing_exp_days | The number of NAME exposure days in the life stage that the study member spent living at a valid geocode which has some exposure values but not for specific days. See definition of Missing exposure days. |
algae3107_name_good_addr_days | The number of NAME exposure days in the life stage that the study member spent living at a geocode that is considered a good match: it has a valid geocode and it has a non-blank exposure value for a given day. See definition of Good address days. |
algae3108_nox_rd_inv_addr_days | The number of NOX RD exposure days in the life stage that the study member spent at an invalid address. See definition of Invalid address days. |
algae3109_nox_rd_oob_days | The number of NOX RD exposure days in the life stage that the study member spent living at a location that is considered outside the bounds of the exposure area. See definition of Out of bounds days. |
algae3110_nox_rd_poor_addr_days | The number of NOX RD exposure days in the life stage that the study member spent living at a location whose geocode was derived from a poor quality residential address. The geocode was used to generate exposure values, but it is still considered to be invalid because it is of such poor quality. See definition of Poor address days. |
algae3111_nox_rd_missing_exp_days | The number of NOX RD exposure days in the life stage that the study member spent living at a valid geocode which has some exposure values but not for specific days. See definition of Missing exposure days. |
algae3112_nox_rd_good_addr_days | The number of NOX RD exposure days in the life stage that the study member spent living at a geocode that is considered a good match: it has a valid geocode and it has a non-blank exposure value for a given day. See definition of Good address days. |
algae3113_pm10_rd_inv_addr_days | The number of PM10 RD exposure days in the life stage that the study member spent at an invalid address. See definition of Invalid address days. |
algae3114_pm10_rd_oob_days | The number of PM10 RD exposure days in the life stage that the study member spent living at a location that is considered outside the bounds of the exposure area. See definition of Out of bounds days. |
algae3115_pm10_rd_poor_addr_days | The number of PM10 RD exposure days in the life stage that the study member spent living at a location whose geocode was derived from a poor quality residential address. The geocode was used to generate exposure values, but it is still considered to be invalid because it is of such poor quality. See definition of Poor address days. |
algae3116_pm10_rd_missing_exp_days | The number of PM10 RD exposure days in the life stage that the study member spent living at a valid geocode which has some exposure values but not for specific days. See definition of Missing exposure days. |
algae3117_pm10_rd_good_addr_days | The number of PM10 RD exposure days in the life stage that the study member spent living at a geocode that is considered a good match: it has a valid geocode and it has a non-blank exposure value for a given day. See definition of Good address days. |
algae3118_pm10_gr_inv_addr_days | The number of PM10 GR exposure days in the life stage that the study member spent at an invalid address. See definition of Invalid address days. |
algae3119_pm10_gr_oob_days | The number of PM10 GR exposure days in the life stage that the study member spent living at a location that is considered outside the bounds of the exposure area. See definition of Out of bounds days. |
algae3120_pm10_gr_poor_addr_days | The number of PM10 GR exposure days in the life stage that the study member spent living at a location whose geocode was derived from a poor quality residential address. The geocode was used to generate exposure values, but it is still considered to be invalid because it is of such poor quality. See definition of Poor address days. |
algae3121_pm10_gr_missing_exp_days | The number of PM10 GR exposure days in the life stage that the study member spent living at a valid geocode which has some exposure values but not for specific days. See definition of Missing exposure days. |
algae3122_pm10_gr_good_addr_days | The number of PM10 GR exposure days in the life stage that the study member spent living at a geocode that is considered a good match: it has a valid geocode and it has a non-blank exposure value for a given day. See definition of Good address days. |
algae3123_pm10_tot_inv_addr_days | The number of PM10 TOT exposure days in the life stage that the study member spent at an invalid address. See definition of Invalid address days. |
algae3124_pm10_tot_oob_days | The number of PM10 TOT exposure days in the life stage that the study member spent living at a location that is considered outside the bounds of the exposure area. See definition of Out of bounds days. |
algae3125_pm10_tot_poor_addr_days | The number of PM10 TOT exposure days in the life stage that the study member spent living at a location whose geocode was derived from a poor quality residential address. The geocode was used to generate exposure values, but it is still considered to be invalid because it is of such poor quality. See definition of Poor address days. |
algae3126_pm10_tot_missing_exp_days | The number of PM10 TOT exposure days in the life stage that the study member spent living at a valid geocode which has some exposure values but not for specific days. See definition of Missing exposure days. |
algae3127_pm10_tot_good_addr_days | The number of PM10 TOT exposure days in the life stage that the study member spent living at a geocode that is considered a good match: it has a valid geocode and it has a non-blank exposure value for a given day. See definition of Good address days. |
algae3128_name_sum | Cumulative exposure of NAME for the given life_stage. |
algae3129_name_err_sum | Cumulative exposure measurement error for NAME measured for the given life_stage. |
algae3130_name_avg | Average exposure for NAME measured for the given life_stage. |
algae3131_name_err_avg | Average exposure measurement error for NAME measured for the given life_stage. |
algae3132_name_med | Median exposure for NAME measured for the given life_stage. |
algae3133_name_err_med | Median exposure measurement error for NAME measured for the given life_stage. |
algae3134_nox_rd_sum | Cumulative exposure of NOX (road sources) for the given life_stage. |
algae3135_nox_rd_err_sum | Cumulative exposure measurement error for NOX (road sources), measured for the given life_stage. |
algae3136_nox_rd_avg | Average exposure of NOX (road sources) for the given life_stage. |
algae3137_nox_rd_err_avg | Average exposure measurement error for NOX (road sources), measured for the given life_stage. |
algae3138_nox_rd_med | Median exposure for NOX (road sources), measured for the given life_stage. |
algae3139_nox_rd_err_med | Median exposure measurement error for NOX (road sources), measured for the given life_stage. |
algae3140_pm10_gr_sum | Cumulative exposure for PM10 (non-road sources), measured for the given life_stage. |
algae3141_pm10_gr_err_sum | Cumulative exposure measurement error for PM10 (non-road sources), measured for the given life_stage. |
algae3142_pm10_gr_avg | Average exposure for PM10 (non-road sources), measured for the given life_stage. |
algae3143_pm10_gr_err_avg | Average exposure measurement error for PM10 (non-road sources), measured for the given life_stage. |
algae3144_pm10_gr_med | Median exposure for PM10 (non-road sources), measured for the given life_stage. |
algae3145_pm10_gr_err_med | Median exposure measurement error for PM10 (non-road sources), measured for the given life_stage. |
algae3146_pm10_rd_sum | Cumulative exposure for PM10 (road sources), measured for the given life_stage. |
algae3147_pm10_rd_err_sum | Cumulative exposure measurement error for PM10 (road sources), measured for the given life_stage. |
algae3148_pm10_rd_avg | Average exposure for PM10 (road sources), measured for the given life_stage. |
algae3149_pm10_rd_err_avg | Average exposure measurement error for PM10 (road sources), measured for the given life_stage. |
algae3150_pm10_rd_med | Median exposure for PM10 (road sources), measured for the given life_stage. |
algae3151_pm10_rd_err_med | Median exposure measurement error for PM10 (road sources), measured for the given life_stage. |
algae3152_pm10_tot_sum | Cumulative exposure for PM10 (all sources), measured for the given life_stage. |
algae3153_pm10_tot_err_sum | Cumulative exposure measurement error for PM10 (all sources), measured for the given life_stage. |
algae3154_pm10_tot_avg | Average exposure for PM10 (all sources), measured for the given life_stage. |
algae3155_pm10_tot_err_avg | Average exposure measurement error for PM10 (all sources), measured for the given life_stage. |
algae3156_pm10_tot_med | Median exposure for PM10 (all sources), measured for the given life_stage. |
algae3157_pm10_tot_err_med | Median exposure measurement error for PM10 (all sources), measured for the given life_stage. |
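
As an illustration of how the day-count variables in the table might be used, the sketch below computes the share of good match days per study member and life stage. It assumes that the CSV header uses the variable names exactly as listed above and that the cells are plain numbers; the function name and the two-row sample are made up.

```python
import csv
import io

# Good match day columns per pollutant, taken from the table above.
GOOD_DAY_COLUMNS = {
    "name": "algae3107_name_good_addr_days",
    "nox_rd": "algae3112_nox_rd_good_addr_days",
    "pm10_rd": "algae3117_pm10_rd_good_addr_days",
    "pm10_gr": "algae3122_pm10_gr_good_addr_days",
    "pm10_tot": "algae3127_pm10_tot_good_addr_days",
}

def good_day_fractions(rows, pollutant):
    """Map (person_id, life_stage) to the share of days that are good match days."""
    column = GOOD_DAY_COLUMNS[pollutant]
    fractions = {}
    for row in rows:
        duration = float(row["algae3102_life_stage_duration"])
        if duration == 0:
            continue  # skip empty life stages to avoid dividing by zero
        key = (row["algae3100_person_id"], row["algae3101_life_stage"])
        fractions[key] = float(row[column]) / duration
    return fractions

# Made-up sample standing in for a result file.
sample = io.StringIO(
    "algae3100_person_id,algae3101_life_stage,algae3102_life_stage_duration,"
    "algae3112_nox_rd_good_addr_days\n"
    "p1,T1,90,81\n"
    "p1,T2,92,92\n"
)
fractions = good_day_fractions(csv.DictReader(sample), "nox_rd")
```

For a real result file, replace `sample` with a file opened via `open(path, newline="")`. A low fraction for a life stage suggests that its aggregated exposure values rest on few good match days.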