ALGAE Data Dictionary: Later life address period changes (algae2200-algae2233)
by Kevin Garwood
Context of Variables
These variables describe the original address periods used in the later life analysis and all the changes that were made to them so they could be used in an exposure assessment. Although most of the variables would have the same value as the address period variables used in the early life analysis, there are some key differences. The variable algae2233_is_within_exp indicates whether or not the address period overlaps with a study member's exposure period. There is a similar is_within_exp variable for the early life analysis.
Both of these variables depend on the duration of the study member's exposure period. In early life, this could be [conception date, last day of first year], whereas in later life analyses, this could be [yr1.start_date, year15.end_date]. You should expect that these field values may appear different for the same address period, depending on whether you are running an early life or later life analysis.
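As a minimal sketch of why the same address period can be flagged differently in the two analyses, the following assumes that "within the exposure period" means the two closed date ranges share at least one day. The function name and the exposure period dates are hypothetical illustrations, not ALGAE's own implementation.

```python
from datetime import date


def is_within_exp(addr_start, addr_end, exp_start, exp_end):
    """Return 'Y' if [addr_start, addr_end] shares at least one day
    with [exp_start, exp_end], otherwise 'N' (assumed overlap rule)."""
    overlaps = addr_start <= exp_end and addr_end >= exp_start
    return "Y" if overlaps else "N"


# One address period, checked against two hypothetical exposure periods.
addr = (date(2003, 10, 11), date(2005, 12, 22))
early_exp = (date(1993, 5, 2), date(1994, 5, 1))  # hypothetical early life window
later_exp = (date(1994, 5, 2), date(2009, 5, 1))  # hypothetical later life window

early_flag = is_within_exp(*addr, *early_exp)
later_flag = is_within_exp(*addr, *later_exp)
```

Under these assumed dates the address period lies entirely after the early life window but inside the later life window, so the two flags differ.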
Note that some of these variables may be considered too sensitive to take off-site. Check your information governance policies and see: Assessing Sensitive Data in the Data Dictionary.
Location of Result File
You will find these variables in a file having a name that fits the form: res_later_cleaned_addr[Date stamp].csv which will be found in the directory:
later_life/results/cleaned_address_history
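The naming pattern above can be used to locate result files programmatically. This is a sketch only: the function name and the `run_dir` parameter are hypothetical, and because the exact date stamp format is not stated here, the stamp is matched with a wildcard.

```python
from pathlib import Path


def find_cleaned_addr_files(run_dir):
    """List later life cleaned address files under a run directory.

    run_dir is a hypothetical root folder that contains the
    later_life/results/cleaned_address_history directory.
    """
    results_dir = Path(run_dir) / "later_life" / "results" / "cleaned_address_history"
    # The date stamp format is unknown, so match anything between the
    # fixed prefix and the .csv extension.
    return sorted(results_dir.glob("res_later_cleaned_addr*.csv"))
```

If several runs have written to the same directory, the sorted list makes it easier to pick a specific file by inspecting the names.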
Example Result File
See here.
Variable Naming Conventions
The basic format of variables in this section follows this pattern: algae22[00-33]_[base variable name]
In this pattern, algae22 indicates that the variables relate to the address periods, as they were used within the later life assessment. Two other common phrases that appear within variable names here are:
- _adj_ : adjusted
- _within_exp : within exposure period.
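The naming convention can be checked mechanically. The sketch below is illustrative only: the regular expression and the helper name `split_variable_name` are assumptions based on the pattern described above, not part of ALGAE.

```python
import re

# algae22 followed by a two-digit index, an underscore, and the base name.
VAR_PATTERN = re.compile(r"^algae(22\d{2})_(.+)$")


def split_variable_name(name):
    """Split a variable name into (number, base name).

    Returns None if the name does not follow the algae22NN_ pattern.
    """
    match = VAR_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


number, base = split_variable_name("algae2218_adj_start_date")
is_adjusted = "_adj_" in "algae2218_adj_start_date"
```

A helper like this can be useful for stripping the numeric prefix when comparing columns with the corresponding early life file.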
Variable Dictionary
Variable | Description |
---|---|
algae2200_original_row_number | The row number refers to the position the record had in the file that was loaded into original_addr_history_data. |
algae2201_person_id | An anonymised or pseudonymised identifier which represents a study member. ALGAE uses this variable to link data together for a given study member. |
algae2202_ith_residence | Describes the sequence of address periods. Note that a study member could have two address periods which have the same location but a different ith_residence value. This would indicate that a person is moving back and forth between places. |
algae2203_geocode | Represents the location. Normally this would be some concatenation of map coordinates but ALGAE attaches no meaning to the contents of this field. It simply uses geocode as an identifier that is used to link tables. |
algae2204_date_state | Indicates what actions were taken to ensure that both the start date and end date fields have values. The states include: |
algae2205_start_date | The original start date, before any data cleaning was done. |
algae2206_end_date | The original end date, before any data cleaning was done. |
algae2207_duration | The number of days represented by the time frame [start_date, end_date]. The total includes the boundary dates as well. |
algae2208_ith_residence_type | Describes the relative position of an address period with respect to all of a study member's address periods. The variable has the following states: |
algae2209_has_valid_geocode | 'Y' if an address period has a valid geocode and 'N' if it does not have a valid geocode. For this field, a geocode is valid if it has a non-blank value and the 'has_valid_geocode' field in the original geocode data table is 'Y'. |
algae2210_has_name_exposures | 'Y' if the geocode is associated with at least one non-null NAME value in the exposure records found in the staging_exp_data table. 'N' if the geocode has no NAME values at all. |
algae2211_has_nox_rd_exposures | 'Y' if the geocode is associated with at least one non-null NOX RD value in the exposure records found in the staging_exp_data table. 'N' if the geocode has no NOX RD values at all. |
algae2212_has_pm10_gr_exposures | 'Y' if the geocode is associated with at least one non-null PM10 GR value in the exposure records found in the staging_exp_data table. 'N' if the geocode has no PM10 GR values at all. |
algae2213_has_pm10_rd_exposures | 'Y' if the geocode is associated with at least one non-null PM10 RD value in the exposure records found in the staging_exp_data table. 'N' if the geocode has no PM10 RD values at all. |
algae2214_has_pm10_tot_exposures | 'Y' if the geocode is associated with at least one non-null PM10 TOT value in the exposure records found in the staging_exp_data table. 'N' if the geocode has no PM10 TOT values at all. |
algae2215_max_life_stage_overlap | The maximum number of days that an address period will overlap with any life stage. The value is used to help assess whether address periods having bad geocodes should be cleaned or not. |
algae2216_is_fixed_inv_geocode | Indicates 'Y' or 'N' for whether an address period has a fixed invalid geocode. An address period with a bad geocode can be fixed if it meets three criteria: If an address period |
algae2217_fit_extent | A number whose sign indicates how this address period a(n) fits with the previous one a(n - 1). It has the following meanings: |
algae2218_adj_start_date | The start date of the address period, correcting for any gaps which may have occurred between the current address period and the previous one. |
algae2219_adj_end_date | The end date of the address period, correcting for any overlaps which may have occurred between the current address period and the next one. |
algae2220_days_changed | The total number of days that were changed after gaps and overlaps were fixed. It is calculated as follows: Note that the days_changed value between two successive address periods may not necessarily describe different days. For example, if |
algae2221_fit_type | Describes how the temporal boundaries of an address period were changed as a result of correcting for gaps and overlaps. It can have the following values: |
algae2222_start_date_delta1 | The lower limit of the period which describes the change in the start date. For example, suppose the start date for an address period a(n) was originally 05-05-1996 but was changed to 01-05-1996 to help close a gap with the previous address period a(n-1). Then the change (delta) in start date would be characterised by the period [01-05-1996, 04-05-1996] inclusively, and 01-05-1996 would be algae2222_start_date_delta1 and 04-05-1996 would be algae2223_start_date_delta2. Note that if these fields are null they indicate that the start date of the address period did not have to be changed in response to a gap with the previous address period. |
algae2223_start_date_delta2 | The upper limit of the period which describes the change in the start date. See the example for algae2222_start_date_delta1. Notice this value will either be null or the day before the original start date. |
algae2224_end_date_delta1 | The lower limit of the period which describes the change in the end date. For example, suppose the end date for an address period a(n) was originally 15-11-1994 but was changed to 10-11-1994 so that it would not overlap with the start date of the following address period a(n+1). Then the change (delta) in end date would be characterised by the period [11-11-1994, 15-11-1994] inclusively. 11-11-1994 would be algae2224_end_date_delta1 and 15-11-1994 would be algae2225_end_date_delta2. Notice that the value for algae2224_end_date_delta1 will either be null or the start date of the following address period that was being overlapped. Note that if these fields are null they indicate that the end date of the address period did not have to be changed in response to an overlap with the next address period. |
algae2225_end_date_delta2 | The upper limit of the period which describes the change in end date. See the example for algae2224_end_date_delta1. |
algae2226_previous_geocode | The geocode appearing in the previous address period. Note that determining the previous geocode ignores all address periods that have had bad geocodes 'fixed' (see entry for algae2216_is_fixed_inv_geocode). previous_geocode and next_geocode are used to assess the difference between assigned and opportunity cost exposures. |
algae2227_next_geocode | The geocode appearing in the next address period. Determining the next geocode ignores address periods which have bad geocodes that have been fixed (see entry for algae2226_previous_geocode). |
algae2228_fin_adj_start_date | This is the final value of the start date. Whereas algae2218_adj_start_date will be adjusted in response to fixing gaps, this field may consider other changes that are not considered part of a correction. Specifically, the start date may be altered if it is a study member's first address period and the start date needs to be changed so that it covers all the time from the date of conception through to the date when he or she was first enrolled in the study. Moving the start date back to the conception date is not treated as a change that warrants assessing the kind of exposure measurement error that is associated with gaps and overlaps. Instead, we rely on sensitivity variables to indicate how certain we can be that the study members occupied their first address when they were being conceived. |
algae2229_imputed_first_start | Indicates a 'Y' or 'N' response whether this is a first address period that has been altered to cover the period from conception until study member enrolment. In the case of a birth cohort that recruited already pregnant mothers, we would expect the answer to be 'Y'. |
algae2230_fin_adj_end_date | This is the final value of the end date. The value may differ from algae2219_adj_end_date if this is the last address period for a study member and the end date had to be moved so that it covered up until the last day in the exposure period. The context of this variable is similar to algae2228_fin_adj_start_date. |
algae2231_imputed_last_end | 'Y' if this is the last address period for a study member and the end date of that period had to be changed to be at least the last day in his or her exposure period. Otherwise 'N' for No. |
algae2232_start_date_days_from_concep | Measures the total number of days between the algae2228_fin_adj_start_date and the person's date of conception. The value is used to construct a data set that captures geographical covariates for every move made by a study member during their exposure period. Moves can be ordered not based on an explicit date but on a date relative to conception. |
algae2233_is_within_exp | Indicates 'Y' or 'N' whether this address period falls within the study member's exposure period. For example, if their last day in their exposure period is on 01-05-1994, an address period covering the dates [11-10-2003, 22-12-2005] would not fall within the person's exposure period. |
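
The worked examples for the delta fields (algae2222 to algae2225) and the inclusive day count used by algae2207_duration can be reproduced with a short sketch. The function names are hypothetical, dates are read as DD-MM-YYYY as in the examples, and the code only illustrates the arithmetic described in the table; it is not ALGAE's own implementation and does not attempt to reproduce algae2220_days_changed.

```python
from datetime import date, timedelta


def inclusive_days(start, end):
    """Number of days in [start, end], counting both boundary dates."""
    return (end - start).days + 1


def start_date_delta(original_start, adjusted_start):
    """Return (delta1, delta2) for a start date moved earlier to close a gap,
    or (None, None) if the start date was not moved earlier."""
    if adjusted_start >= original_start:
        return None, None
    # delta2 is the day before the original start date.
    return adjusted_start, original_start - timedelta(days=1)


def end_date_delta(original_end, adjusted_end):
    """Return (delta1, delta2) for an end date moved earlier to remove an
    overlap, or (None, None) if the end date was not moved earlier."""
    if adjusted_end >= original_end:
        return None, None
    # delta1 is the first day no longer covered by this address period.
    return adjusted_end + timedelta(days=1), original_end


# Start date example: 05-05-1996 changed to 01-05-1996.
start_delta = start_date_delta(date(1996, 5, 5), date(1996, 5, 1))
# End date example: 15-11-1994 changed to 10-11-1994.
end_delta = end_date_delta(date(1994, 11, 15), date(1994, 11, 10))
```

With the dates from the table, `start_delta` covers [01-05-1996, 04-05-1996] and `end_delta` covers [11-11-1994, 15-11-1994], matching the periods given in the descriptions.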