Testing Part 1: Pre-processing features
by Kevin Garwood
Testing Overview | Next |
Background
ALGAE alters copies of original data set files to produce corresponding staging tables that standardise certain field values and check that various fields are either not empty or are unique. The rest of the protocol is driven from these staging tables rather than the original data files.
Tests for the pre-processing features need not be concerned with the outcomes of any other features such as cleaning address histories or calculating exposures. Most of the tests focus on correctly standardising the values of "Yes/No" fields or fields with empty values in the original data sets.
If the tests for the pre-processing features all pass, then test cases in all
subsequent test areas can rely on using standardised field values. For example,
if we show that the protocol will replace Yes
for Y
in the
has_valid_geocode
field of the original_geocode_data file, we don't have
to write a similar test for the Geocode Feature area. If the pre-processing tests complete,
we can be guaranteed that any table field "person_id" will not be blank.
The main reason for developing this test area is to help constrain the variation in input values in order to minimise the proliferation of test cases we would have to consider.
Coverage
The coverage of this test area is shown in the following diagram:
Most of the testing relates to checking whether database tables exist,
or if certain fields have values which are empty or duplicated. Automated
test cases exist for the Yes/No fields absent_during_exp_period
,
at_1st_address_conception
and has_valid_geocode
.
There are no result output values to check. All the tests examine values in the staging tables to confirm that values were standardised correctly.
Input Fields Covered by Test Cases
Table | Field |
---|---|
staging_study_member_data | person_id |
staging_study_member_data | birth_date |
staging_study_member_data | absent_during_exp_period |
staging_study_member_data | at_1st_addr_conception |
staging_geocode_data | geocode |
staging_geocode_data | has_valid_geocode |
staging_address_history_data | person_id |
staging_address_history_data | geocode |
staging_exp_data | geocode |
staging_exp_data | date_of_year |
Output Fields Covered by Test Cases
There are no result output values to check. All the tests examine values in the staging tables to confirm that values were standardised correctly.Test Case Design
Do not test built-in database function
Testing does not include any effort to check that PostgreSQL's built-in database functions work - we assume they do. Therefore, no automated test cases are used to check whether the staging tables exist, or whether the database is able to add table field constraints such as requiring that they are unique or non-empty. Instead, visual inspection of the code is used to verify that built-in constraint checks are being applied to tables and fields. If any of the following assumptions do not hold, the protocol will throws an exception and stop working:-
the tables
staging_study_member_data
,staging_geocode_data
,staging_exposure_data
andstaging_address_history_data exist
-
staging_study_member_data.person_id
is both unique and non-empty -
staging_study_member_data.birth_date
is not empty -
staging_study_member_data.absent_during_exp_period
andat_1st_addr_conception
are not empty. -
staging_geocode_data.geocode
andhas_valid_geocode
are not empty -
staging_exp_data.geocode
anddate_of_year
are not null and together must be unique.
Create test cases that show that the protocol standardises yes/no field values correctly.
There are three fields in the original data tables that provide yes/no values. These are:-
original_study_member_data.absent_during_pregnancy
-
original_study_member_data.at_1st_address_conception
-
original_geocode_data.has_valid_geocode
Tests are made to ensure that in each field, the following data cleaning operations can be successfully applied:
-
{n, No, NO, FALSE, false, 0}
are cleaned to becomeN
-
{y, Yes, YES, TRUE, true, 1}
are cleaned to becomeY
Create test cases that show protocol standardises empty values for geocodes
Tests exist to check that the protocol can correctly transform field values ofNULL
, #NULLIF
and the empty string '' to
empty_geocode
. All of the
work needed to standardise empty values is contained with the function call:
standardise_geocode_value(original_geocode_value, is_null_allowed)
In order to minimise testing costs, tests are developed to exercise this function rather than the set of input fields that may be transformed by them. In a production settings, these tests will appear commented out. The following inputs for the function should provide these results:
-
an original geocode of
123456-234543
should remain unchanged -
original geocode values of
NULL
,#NULLIF
, '' should result inempty_geocode
.