Testing Part 4: Address History Features

by Kevin Garwood

Background

These features cover the temporal aspects of cleaning address periods. They ensure that each study member lives at exactly one address on each day of his or her exposure time frame. This area mainly focuses on how the protocol handles temporal gaps, overlaps and deletions in a chronologically ordered sequence of address periods.

Tests examine cleaned address variables that appear in the res_early_cleaned_addr result file and the sensitivity variables that appear in the res_sens_variables and res_early_stage_sens result files.

If all the tests in both the address history and geocode feature areas pass, then we may assume that all the address periods used in the Exposures test area will be valid.

Coverage

Input Fields Covered by Test Cases

Table	Field
original_geocode_data	geocode
original_geocode_data	has_valid_geocode

Test Case Design

This area covers many variables, many of which capture the extent of the changes made when cleaning address periods:

out_of_bounds_geocodes
invalid_geocodes
fixed_geocodes
total_addr_periods
over_laps
gaps
gap_and_overlap_same_period
deletions
imp_blank_start_dates
imp_blank_end_dates
imp_blank_both_dates
imp_last_dates
days_changed
has_bad_geocode_within_time_frame
total_contention_days

The next few test case themes focus on how the number of address periods study members have influences these variables.

Test for a study member who has no address periods

The protocol needs to anticipate errors that may occur when the input data sets are linked. If the original_study_member_data and original_address_history_data files are prepared by separate groups, then it is possible that study members mentioned in one file may not appear in the other.

If a study member has no address periods, should we use NULL or 0 to indicate the total number of address periods he or she occupied during the exposure time frame? On one hand, we know that the study member must have lived somewhere, so zero would seem incorrect. On the other hand, if the protocol is counting the number of available address periods that cover the exposure time frame, then the answer would be 0.

In this area, if study members have no associated address periods, we will assign zero rather than NULL to variables that count different types of changes.
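The zero-default rule can be sketched as follows. This is a minimal illustration, not the protocol's own code: the function name initialise_counts, the tuple layout of the address periods and the choice of count variables shown are all assumptions made for the example.

```python
# Hypothetical subset of the count variables listed earlier in this section.
COUNT_VARIABLES = ["total_addr_periods", "over_laps", "gaps", "deletions", "days_changed"]


def initialise_counts(study_member_ids, address_periods):
    """Return {person_id: {variable: count}} where every count defaults to 0.

    address_periods is assumed to be a list of
    (person_id, start_date, end_date) tuples.
    """
    # Every study member gets a row of zeros, even with no address periods.
    counts = {pid: dict.fromkeys(COUNT_VARIABLES, 0) for pid in study_member_ids}
    for person_id, _start, _end in address_periods:
        # Address periods for unknown study members are ignored in this sketch.
        if person_id in counts:
            counts[person_id]["total_addr_periods"] += 1
    return counts


members = ["p1", "p2"]
periods = [("p1", "2000-01-01", "2000-12-31")]
counts = initialise_counts(members, periods)
```

Here p2 appears in the study member list but has no address periods, so every count for p2 is 0 rather than missing.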

Test for a study member who has one address period

If a study member experiences no address changes, then many fields will have predictable values:

fixed_geocodes = 0
total_addr_periods = 1
over_laps = 0
gaps = 0
gap_and_overlap_same_period = 0
deletions = 0
imp_blank_start_dates = 0
imp_blank_end_dates = 0
imp_blank_both_dates = 0
imp_last_dates = 0
days_changed = 0
total_contention_days = 0
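A test for this case could compare a result row against the values listed above. The sketch below assumes the result row is available as a dictionary keyed by field name; the helper find_mismatches is hypothetical and is not part of the protocol.

```python
# Expected values for a study member with exactly one address period,
# copied from the list above.
EXPECTED_SINGLE_PERIOD = {
    "fixed_geocodes": 0,
    "total_addr_periods": 1,
    "over_laps": 0,
    "gaps": 0,
    "gap_and_overlap_same_period": 0,
    "deletions": 0,
    "imp_blank_start_dates": 0,
    "imp_blank_end_dates": 0,
    "imp_blank_both_dates": 0,
    "imp_last_dates": 0,
    "days_changed": 0,
    "total_contention_days": 0,
}


def find_mismatches(result_row, expected=EXPECTED_SINGLE_PERIOD):
    """Return {field: (expected, actual)} for every field that differs."""
    return {
        field: (value, result_row.get(field))
        for field, value in expected.items()
        if result_row.get(field) != value
    }


good_row = dict(EXPECTED_SINGLE_PERIOD)
bad_row = dict(EXPECTED_SINGLE_PERIOD, gaps=1)
```

An empty dictionary from find_mismatches means the row matches every expected value; for bad_row it reports only the gaps field.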

Test for a study member who has two or more address periods

Once study members have at least two address periods, they are able to have gaps, overlaps, deletions, contention days, and fixed bad geocodes.

Test exhaustively for gaps and overlaps using two successive address periods that are 1, 2 and 3 days in length

One of the most important aspects of the protocol is its ability to create a temporally continuous address history that spans the entire exposure time frame of the study member. Fixing gaps and overlaps that may exist between successive address periods is a critical part of that process. Study members are unlikely to have address periods this short. However, periods of 1, 2 and 3 days capture many of the edge cases, and we felt that if the protocol could handle them, it could handle combinations of address periods of arbitrary length.
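One way to enumerate such pairs is sketched below. It is an illustration under stated assumptions rather than the actual test generator: end dates are treated as inclusive (a 1-day period starts and ends on the same date), the second period never starts before the first, and the names generate_pairs, classify and max_gap_days are invented for the example.

```python
from datetime import date, timedelta
from itertools import product

DURATIONS = (1, 2, 3)  # period lengths in days


def classify(first_end, second_start):
    """Label the boundary between two successive periods."""
    delta = (second_start - first_end).days
    if delta > 1:
        return "gap"
    if delta < 1:
        return "overlap"
    return "contiguous"  # second period starts the day after the first ends


def generate_pairs(anchor=date(2000, 1, 1), max_gap_days=2):
    """Build every pair of periods for the given durations and start offsets."""
    cases = []
    for d1, d2 in product(DURATIONS, DURATIONS):
        first_start = anchor
        first_end = first_start + timedelta(days=d1 - 1)
        # Offsets below d1 overlap, d1 is contiguous, above d1 leaves a gap.
        for offset in range(d1 + max_gap_days + 1):
            second_start = first_start + timedelta(days=offset)
            second_end = second_start + timedelta(days=d2 - 1)
            label = classify(first_end, second_start)
            cases.append(((first_start, first_end), (second_start, second_end), label))
    return cases


cases = generate_pairs()
labels = [label for _first, _second, label in cases]
```

Each generated case carries its expected label, so a test can feed the two periods to the cleaning step and check that the gaps and over_laps counts agree with the label.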

Include a test case where the same address period is involved in fixing both a gap and an overlap

It is possible that the temporal boundaries of an address period could be changed twice: once to fix a gap and once to fix an overlap. The diagram below illustrates this case:

The ALGAE Protocol

ALgorithms for Generating address histories and Exposures

An automated protocol for assigning early life exposures to longitudinal cohort studies