ALGAE Data Dictionary: Later life sensitivity variables (algae6300-algae6317)
by Kevin Garwood
Context of Variables
These variables describe how the original data sets were cleaned and processed in the later life analysis. Researchers can use them to isolate subsets of results and to estimate the effect of data cleaning activities on results.
Location of Result File
You will find these variables in a file whose name fits the form:
res_later_sens_variables_[Date stamp].csv
which will be found in the directory:
later_life/results/sensitivity_variables
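As a minimal sketch of how the result file might be located and read, the Python below globs for the documented file-name form inside the documented directory. The `base_dir` argument (the folder that contains `later_life/`), the function names, the UTF-8 encoding, and the presence of a header row holding the variable names are all assumptions made for illustration; they are not confirmed by ALGAE itself.

```python
import csv
from pathlib import Path


def find_sensitivity_files(base_dir):
    """Return candidate result files under base_dir, sorted by file name."""
    # The directory and file-name form come from the text above.
    # base_dir is a hypothetical parameter: the folder holding later_life/.
    results_dir = Path(base_dir) / "later_life" / "results" / "sensitivity_variables"
    return sorted(results_dir.glob("res_later_sens_variables_*.csv"))


def load_sensitivity_rows(csv_path):
    """Read one result file into a list of dicts keyed by column header."""
    # Assumes a comma-delimited file whose first row names the variables.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Because the format of the date stamp is not specified here, the sketch returns every matching file rather than guessing which one is the most recent.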
Example Result File
See here.
Variable Naming Conventions
All of the variables begin with algae63, which indicates that they are part of the later life sensitivity variables. Other phrases in the variable names have the following meanings:
- exp_period: exposure period, e.g. [conception date, last day of first year of life]
- cln_: cleaned value
- addr_: addresses
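The naming phrases listed above can be used to group columns, for example to pick out every variable that reports a cleaned value. The following is only an illustrative sketch: `NAME_PHRASES` and `phrases_in` are hypothetical names, and the matching is a plain substring test rather than anything ALGAE defines.

```python
# Phrase meanings copied from the naming conventions listed above.
NAME_PHRASES = {
    "exp_period": "exposure period",
    "cln_": "cleaned value",
    "addr_": "addresses",
}


def phrases_in(variable_name):
    """Return the documented phrases that occur in a variable name."""
    return [phrase for phrase in NAME_PHRASES if phrase in variable_name]


print(phrases_in("algae6311_cln_blank_start_date"))  # ['cln_']
print(phrases_in("algae6301_at_1st_addr_concept"))   # ['addr_']
```

Note that a plain substring test will miss names where a phrase appears without its trailing underscore, such as a name that ends in `addr`.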
Variable Dictionary
Variable | Description |
---|---|
algae6300_person_id | An anonymised or pseudonymised identifier which represents a study member. ALGAE uses this variable to link data together for a given study member. |
algae6301_at_1st_addr_concept |
"Y" for "Yes" if study members were definitely at their enrolment addresses when they were conceived.
Otherwise the result is "N" for "No". These values have been borrowed from the
original_study_member_data table.
|
algae6302_absent_in_exp |
"Y" for "Yes" if study members spent a significant amount of their exposure period living at
addresses that are not included in their residential address history shown in the
original_study_member_data table. Otherwise the result is "N" for "No".
These values have been borrowed from the original_study_member_data table.
|
algae6303_gest_age |
Estimated gestational age at birth, measured in weeks. This sensitivity variable is useful to examine
in cases of premature birth. If study members are born too prematurely, they will not have a T3
exposure value. These values have been borrowed from the original_study_member_data table.
|
algae6304_is_gest_age_imp |
"Y" for "Yes" if the gestational age at birth value in algae6303_gest_age was imputed.
Otherwise, the result is "N" for "No".
|
algae6305_total_addr | Total number of addresses study members occupied during their whole exposure period. |
algae6306_fixed_geocodes |
The total number of address periods that fall within the exposure period and which have fixed geocodes.
A fixed geocode refers to a data cleaning scenario where ALGAE assumes that an incorrectly specified
residential address, which failed to be geocoded, was corrected in the next residential address period.
An address period will be fixed if it meets all three of the following criteria:
If an address period has been 'fixed', then it will be subsumed by the next address period. ALGAE will assume
that for a fixed address a_n, the start date of a_{n+1} can be set to the start date of
a_n. This has the effect of creating a version of a_{n+1} which completely overlaps with
a_n. This is not recorded as a duplicate that needs to be deleted. Instead, it is flagged and
then ignored as if it never appeared in the |
algae6307_over_laps | The total number of overlaps that appear in the residential address history that overlaps with the person's exposure period. |
algae6308_gaps | The total number of gaps that appear in the residential address history that overlaps with the person's exposure period. |
algae6309_gap_over_lap | The number of address periods that overlap with the exposure period and have been altered to fix both a gap and an overlap. In such an address period, you should expect to see that its start date was altered to fill a gap and its end date was contracted to fix an overlap with the following address period. |
algae6310_deletions |
The number of address periods which overlap with the exposure period and which have been deleted. An address
period is marked for deletion if the data cleaning routines determine that an address period is completely
subsumed by the following address period.
For example, consider these address periods for person 123:
a1 01-MAY-1996 10-MAY-1996
a2 01-MAY-1996 20-MAY-1996
Here, a1 will be marked as deleted. If it falls within study member 123's exposure period, then
it will contribute to |
algae6311_cln_blank_start_date | The total number of blank start dates that were imputed in address periods which overlapped with the study member's exposure period. When an address period has a blank start date, it is imputed with the study member's conception date. |
algae6312_cln_blank_end_date | The total number of blank end dates that were imputed in address periods which overlapped with the study member's exposure period. When an address period has a blank end date, it is imputed with the current date. |
algae6313_cln_blank_both_dates | The total number of address periods which had both a blank start date and a blank end date. Note that all of these address periods will overlap with a study member's exposure period. |
algae6314_cln_last_dates | The number of address periods which overlap with the study member's exposure period and which had to have their end date adjusted to cover the end of the exposure period. This will always be the last address period whose end date is earlier than the person's last exposure day. |
algae6315_days_changed |
Total number of days changed in address periods that overlapped with the study member's exposure period.
Note that this value is the sum of changed days in each address period. In cases of successive
overlapping address periods, the total could contain days which are counted multiple times. For example,
consider the following address periods:
a1 01-MAY-1996 10-MAY-1996
a2 05-MAY-1996 20-MAY-1996
a3 08-MAY-1996 25-MAY-1996
After ALGAE cleans these address periods, |
algae6316_contention_days |
The total number of contention days that overlap with a study
member's exposure period.
Note that whereas
Address period: original start date, original end date; cleaned start date, cleaned end date; days changed
a1: 01-MAY-1996, 03-MAY-1996; 01-MAY-1996, 01-MAY-1996; 2
a2: 02-MAY-1996, 04-MAY-1996; 02-MAY-1996, 02-MAY-1996; 2
a3: 03-MAY-1996, 05-JAN-2016; 03-MAY-1996, 05-JAN-2016; 0
If we consider all three address periods, the total days changed would be 4, but the total days of contention
would be 3: one for each of |
algae6317_no_exp_data_days | The total number of missing exposure days that overlap with a study member's exposure period. Note that this variable does not describe |
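The difference between days changed and contention days in the algae6316_contention_days entry can be checked with a short worked example. The sketch below uses the three address periods from that entry. It assumes, as one reading that is consistent with the stated totals of 4 and 3, that a contention day is a calendar day claimed by more than one original address period; ALGAE's own definition may differ, and the helper names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

# (original start, original end, cleaned start, cleaned end) for each period,
# copied from the example in the algae6316_contention_days entry.
periods = {
    "a1": (date(1996, 5, 1), date(1996, 5, 3), date(1996, 5, 1), date(1996, 5, 1)),
    "a2": (date(1996, 5, 2), date(1996, 5, 4), date(1996, 5, 2), date(1996, 5, 2)),
    "a3": (date(1996, 5, 3), date(2016, 1, 5), date(1996, 5, 3), date(2016, 1, 5)),
}


def days_in(start, end):
    """Yield every calendar day in the inclusive range [start, end]."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def days_changed(orig_start, orig_end, cln_start, cln_end):
    """Days by which cleaning moved the start date plus the end date."""
    return abs((cln_start - orig_start).days) + abs((cln_end - orig_end).days)


# Days changed are summed per address period, so a day can count twice.
total_days_changed = sum(days_changed(*p) for p in periods.values())

# Assumption: a contention day is a day covered by two or more ORIGINAL periods.
claims = Counter()
for orig_start, orig_end, _, _ in periods.values():
    claims.update(days_in(orig_start, orig_end))
contention_days = sorted(day for day, count in claims.items() if count > 1)

print(total_days_changed)    # 4
print(len(contention_days))  # 3
```

Under this reading, each contested day is counted once no matter how many address periods claim it, which is why the contention total can be lower than the days changed total.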