Testing Part 2: Geocode Features
by Kevin Garwood
Background
These features relate to the spatial aspects of cleaning address histories. For various reasons, some address periods will have a 'bad geocode' - one which is blank, is considered out-of-bounds, or is otherwise marked with has_valid_geocode=N in the staging_geocode_data table. If a study member has an address period with a bad geocode that is within their exposure time frame and cannot be fixed, then that person will be excluded from any exposure assessment.
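The exclusion rule above can be sketched in a few lines of Python. This is only an illustration, not the protocol's actual implementation: the `AddressPeriod` class, the `is_out_of_bounds` and `is_fixed` flags, and the function names are hypothetical, and treating "within the exposure time frame" as any date overlap is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class AddressPeriod:
    """Hypothetical in-memory view of one address period."""
    geocode: Optional[str]
    has_valid_geocode: str          # 'Y' or 'N', as in staging_geocode_data
    start_date: date
    end_date: date
    is_out_of_bounds: bool = False  # assumed flag; the real bounds check is not shown
    is_fixed: bool = False          # assumed flag; True if cleaning repaired the bad geocode


def has_bad_geocode(period: AddressPeriod) -> bool:
    """A geocode is 'bad' if it is blank, out-of-bounds, or marked invalid."""
    is_blank = period.geocode is None or period.geocode.strip() == ""
    return is_blank or period.is_out_of_bounds or period.has_valid_geocode == "N"


def overlaps(period: AddressPeriod, frame_start: date, frame_end: date) -> bool:
    """Assumption: 'within the time frame' means any overlap of inclusive date ranges."""
    return period.start_date <= frame_end and period.end_date >= frame_start


def is_excluded(periods: List[AddressPeriod], frame_start: date, frame_end: date) -> bool:
    """Exclude a study member who has an unfixed bad geocode inside the time frame."""
    return any(
        has_bad_geocode(p) and not p.is_fixed and overlaps(p, frame_start, frame_end)
        for p in periods
    )


# Usage: one exposure time frame and three illustrative address periods.
frame_start, frame_end = date(2000, 1, 1), date(2000, 12, 31)
good = AddressPeriod("a1", "Y", date(2000, 1, 1), date(2000, 6, 30))
bad_inside = AddressPeriod("", "Y", date(2000, 7, 1), date(2000, 12, 31))
bad_outside = AddressPeriod(None, "N", date(2001, 1, 1), date(2001, 3, 1))
```

Under these assumptions, a member with `good` and `bad_inside` would be excluded, while a member with `good` and `bad_outside` would not, because the bad geocode lies outside the time frame.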
Processing geocodes is identical for early life and later life analyses; therefore, we will limit testing to early life data. The tests will cover variables that appear in both the finished address period file res_early_cleaned_addr.csv and the sensitivity variable files res_early_sens_variables.
If all the tests pass for the geocode testing area, then test cases in the remaining test areas can be designed to use only valid geocodes that have exposure values. Having this test area allows us to simplify test design by separating spatial concerns from temporal ones in data cleaning.
Coverage
Input Fields Covered by Test Cases
Table | Field |
---|---|
original_geocode_data | geocode |
original_geocode_data | has_valid_geocode |
original_address_history_data | geocode |
original_address_history_data | start_date |
original_address_history_data | end_date |
Output Fields Covered by Test Cases
Table | Field |
---|---|
early_cleaned_addr | start_date |
early_cleaned_addr | end_date |
early_cleaned_addr | is_fixed_invalid_geocode |
early_cleaned_addr | ith_residence_type |
early_cleaned_addr | fin_adjusted_start_date |
early_cleaned_addr | fin_adjusted_end_date |
early_sens_variables | total_addr_periods |
early_sens_variables | out_of_bounds_geocodes |
early_sens_variables | invalid_geocodes |
early_sens_variables | fixed_geocodes |
early_sens_variables | has_bad_geocode_within_time_frame |