ALGAE Data Dictionary: Later life sensitivity variables (algae6300-algae6317)

An automated protocol for assigning early life exposures to longitudinal cohort studies

by Kevin Garwood

Context of Variables

These variables describe aspects of how the original data sets have been cleaned and processed in the later life analysis. They can be used by researchers in order to isolate subsets of results and estimate the effect of data cleaning activities on results.
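
As a minimal sketch of isolating a subset of results, the snippet below keeps only study members whose gestation age was not imputed. It assumes the result file is comma-separated with a header row of variable names; the sample rows are invented for illustration and the function name is hypothetical.

```python
import csv
import io

# Hypothetical extract of a sensitivity variable file; the values are invented.
SAMPLE = """algae6300_person_id,algae6303_gest_age,algae6304_is_gest_age_imp
1001,39,N
1002,38,Y
1003,31,N
"""


def ids_with_observed_gest_age(csv_text):
    """Return the IDs of study members whose gestation age was not imputed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["algae6300_person_id"]
        for row in reader
        if row["algae6304_is_gest_age_imp"] == "N"
    ]


print(ids_with_observed_gest_age(SAMPLE))  # ['1001', '1003']
```

The resulting identifiers could then be used to restrict an exposure result file to the same study members and compare estimates with and without the imputed cases.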

Location of Result File

You will find these variables in a file having a name that fits the form:
res_later_sens_variables_[Date stamp].csv
which will be found in the directory:
later_life/results/sensitivity_variables
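
One way to locate the file without knowing the exact date stamp is to match on the fixed part of the name. This is only a sketch: the function name and the base_dir argument (the directory that contains later_life) are hypothetical, and no assumption is made about the format of the date stamp.

```python
from pathlib import Path


def find_sensitivity_files(base_dir):
    """List files matching the documented name pattern, sorted by file name."""
    results_dir = Path(base_dir) / "later_life" / "results" / "sensitivity_variables"
    # The wildcard stands in for the date stamp, whose format is not assumed here.
    return sorted(results_dir.glob("res_later_sens_variables_*.csv"))
```

If several runs have written to the same directory, the function returns every match, so the caller still has to choose which run to analyse.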

Example Result File

See here.

Variable Naming Conventions

All of the variables begin with algae63, which indicates that they are part of the later life sensitivity variables. Other phrases in the variable names have the following meanings:
  • exp_period : exposure period eg: [conception date, last day of first year of life]
  • cln_ : cleaned value.
  • addr_ : addresses

Variable Dictionary

Variable Description
algae6300_person_id An anonymised or pseudonymised identifier which represents a study member. ALGAE uses this variable to link data together for a given study member.
algae6301_at_1st_addr_concept "Y" for "Yes" if study members were definitely at their enrolment addresses when they were conceived. Otherwise the result is "N" for "No". These values have been borrowed from the original_study_member_data table.
algae6302_absent_in_exp "Y" for "Yes" if study members spent a significant amount of their exposure period living at addresses that are not included in their residential address history shown in the original_study_member_data table. Otherwise the result is "N" for "No". These values have been borrowed from the original_study_member_data table.
algae6303_gest_age Estimated gestation age at birth, measured in weeks. This sensitivity variable is useful to look at in cases of premature births. If study members are born too prematurely, they will not have a T3 exposure value. These values have been borrowed from the original_study_member_data table.
algae6304_is_gest_age_imp "Y" for "Yes" if the gestation age at birth value for algae6303_gest_age was imputed. Otherwise, the result is "N" for "No".
algae6305_total_addr Total number of addresses study members occupied during their whole exposure period.
algae6306_fixed_geocodes The total number of address periods that fall within the exposure period and which have fixed geocodes. A fixed geocode refers to a data cleaning scenario where ALGAE assumes that an incorrectly specified residential address, which failed to be geocoded, was corrected in the next residential address period.

An address period will be fixed if it meets all three of the following criteria:

  1. it has an invalid geocode (a blank geocode value or one which has a has_valid_geocode value of "N" in the original_geocode_data table)
  2. it does not overlap with any of the study member's life stages by at least 25%
  3. it is immediately followed by an address period which has a valid geocode

If an address period has been 'fixed', then it will be subsumed by the next address period. ALGAE will assume that for a fixed address a_n, the start date of a_(n+1) can be set to the start date of a_n. This has the effect of creating a version of a_(n+1) which completely overlaps with a_n. This is not recorded as a duplicate that needs to be deleted. Instead, it is flagged and then ignored as if it never appeared in the original_address_history_data file.
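
The three criteria and the subsumption step can be sketched as follows. This is an illustration under stated assumptions, not ALGAE's own implementation: the AddressPeriod class and both function names are hypothetical, date ranges are treated as inclusive, address periods are assumed to be sorted by start date, and "overlap by at least 25%" is read as the address period covering 25% or more of the days in a life stage.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AddressPeriod:
    start: date
    end: date
    has_valid_geocode: bool
    is_fixed: bool = False


def overlap_fraction(period, stage_start, stage_end):
    """Fraction of a life stage's days covered by the period (assumed reading)."""
    first = max(period.start, stage_start)
    last = min(period.end, stage_end)
    overlap_days = max(0, (last - first).days + 1)
    stage_days = (stage_end - stage_start).days + 1
    return overlap_days / stage_days


def fix_geocodes(periods, life_stages, threshold=0.25):
    """Flag fixable periods and extend the following period to cover them."""
    for current, following in zip(periods, periods[1:]):
        fixable = (
            not current.has_valid_geocode  # criterion 1
            and all(
                overlap_fraction(current, s, e) < threshold
                for s, e in life_stages
            )  # criterion 2
            and following.has_valid_geocode  # criterion 3
        )
        if fixable:
            current.is_fixed = True           # flagged, then ignored downstream
            following.start = current.start   # next period subsumes this one
    return periods


# Invented example: one life stage and a short, badly geocoded first address.
stages = [(date(1996, 1, 1), date(1996, 3, 31))]
a1 = AddressPeriod(date(1996, 1, 1), date(1996, 1, 10), has_valid_geocode=False)
a2 = AddressPeriod(date(1996, 1, 11), date(1996, 3, 31), has_valid_geocode=True)
fix_geocodes([a1, a2], stages)
print(a1.is_fixed, a2.start)  # True 1996-01-01
```

Because criterion 3 requires the following period to have a valid geocode, a period that absorbs its predecessor can never itself be fixed, so fixes do not chain.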

algae6307_over_laps The total number of overlaps that appear in the residential address history that overlaps with the person's exposure period.
algae6308_gaps The total number of gaps that appear in the residential address history that overlaps with the person's exposure period.
algae6309_gap_over_lap The number of address periods that overlap with the exposure period and have been altered both to fix a gap and an overlap. In such an address period, you should expect to see that its start date was altered to fill a gap and its end date was contracted to fix an overlap with the following address period.
algae6310_deletions The number of address periods which overlap with the exposure period and which have been deleted. An address period is marked for deletion if the data cleaning routines determine that an address period is completely subsumed by the following address period.

For example, consider these address periods for person 123:

a1   01-MAY-1996   10-MAY-1996
a2   01-MAY-1996   20-MAY-1996

Here, a1 will be marked as deleted. If it falls within study member 123's exposure period, then it will contribute to algae6310_deletions.
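
The subsumption test in this example can be expressed as a simple date comparison. The sketch below assumes inclusive date ranges and represents each address period as a (start, end) tuple; the function name is hypothetical.

```python
from datetime import date


def is_subsumed(period, following):
    """True if `period` lies entirely within `following`."""
    return following[0] <= period[0] and period[1] <= following[1]


a1 = (date(1996, 5, 1), date(1996, 5, 10))
a2 = (date(1996, 5, 1), date(1996, 5, 20))
print(is_subsumed(a1, a2))  # True
```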

algae6311_cln_blank_start_date The total number of blank start dates that were imputed in address periods which overlapped with the study member's exposure period. When an address period has a blank start date, it is imputed with the study member's conception date.
algae6312_cln_blank_end_date The total number of blank end dates that were imputed in address periods which overlapped with the study member's exposure period. When an address period has a blank end date, it is imputed with the current date.
algae6313_cln_blank_both_dates The total number of address periods which had both a blank start date and a blank end date. Note that all of these address periods will overlap with a study member's exposure period.
algae6314_cln_last_dates The number of address periods which overlap with the study member's exposure period and which had to have their end date adjusted to cover the end of the exposure period. This will always be the last address period whose end date is less than the person's last exposure day.
algae6315_days_changed Total number of days changed in address periods that overlapped with the study member's exposure period. Note that this value is the sum of changed days in each address period. In cases of successive overlapping address periods, the total could contain days which are counted multiple times. For example, consider the following address periods:
a1   01-MAY-1996   10-MAY-1996
a2   05-MAY-1996   20-MAY-1996
a3   08-MAY-1996   25-MAY-1996

After ALGAE cleans these address periods, 08-MAY-1996 will be counted in the days changed for each of the address periods. 12-MAY-1996 will be counted once in each of a2 and a3.

algae6316_contention_days The total number of contention days that overlap with a study member's exposure period.

Note that whereas algae6315_days_changed may count the same day multiple times if it forms part of the adjustment to more than one address period, this variable only counts distinct days. For example, consider the following three address periods:

          Original                      Cleaned               Days Changed
     Start Date    End Date        Start Date    End Date
a1   01-MAY-1996   03-MAY-1996     01-MAY-1996   01-MAY-1996       2
a2   02-MAY-1996   04-MAY-1996     02-MAY-1996   02-MAY-1996       2
a3   03-MAY-1996   05-JAN-2016     03-MAY-1996   05-JAN-2016       0

If we consider all three address periods, the total days changed would be 4, but the total days of contention would be 3: one for each of 02-MAY-1996, 03-MAY-1996, and 04-MAY-1996. The total days of contention would ignore the fact that 03-MAY-1996 appears in the changed parts of both a1 and a2.
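
The difference between the two totals can be checked with set arithmetic on the table above. This is a sketch under assumptions: date ranges are treated as inclusive, a "changed" day is taken to be any day covered by the original period or the cleaned period but not both, and the helper names are hypothetical.

```python
from datetime import date, timedelta


def days_in(start, end):
    """Set of calendar days in the inclusive range [start, end]."""
    return {start + timedelta(days=i) for i in range((end - start).days + 1)}


def changed_days(original, cleaned):
    """Days that cleaning added to or removed from an address period."""
    return days_in(*original) ^ days_in(*cleaned)


# (original, cleaned) date ranges for a1, a2 and a3, copied from the table.
periods = [
    ((date(1996, 5, 1), date(1996, 5, 3)), (date(1996, 5, 1), date(1996, 5, 1))),
    ((date(1996, 5, 2), date(1996, 5, 4)), (date(1996, 5, 2), date(1996, 5, 2))),
    ((date(1996, 5, 3), date(2016, 1, 5)), (date(1996, 5, 3), date(2016, 1, 5))),
]

per_period = [changed_days(orig, cleaned) for orig, cleaned in periods]
total_days_changed = sum(len(days) for days in per_period)  # counts repeats
contention_days = len(set().union(*per_period))             # distinct days only
print(total_days_changed, contention_days)  # 4 3
```

Summing per-period counts counts 03-MAY-1996 twice, while the union counts it once, which reproduces the totals of 4 and 3 given above.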

algae6317_no_exp_data_days The total number of missing exposure days that overlap with a study member's exposure period. Note that this variable does not describe