Data loading part 4: Prepare the exposure data
by Kevin Garwood
Purpose
This table contains exposure values that are linked to locations that study members have occupied during their exposure time frame. Exposures for various types of pollutants are associated with a given geocode location for a given date.
In the early life analysis, ALGAE assumes the records represent daily exposure values for each location. In the later life analysis, ALGAE assumes the records are annual average exposure values. For the later life analysis, the date is expected to be the first day of the calendar year that an annual exposure record describes for a given location. ALGAE later converts the annual exposure values into daily exposure records.
Location of Original Data File
You will need to create a file named original_exposure_data.csv, which must be located in either:
early_life/input_data
or
later_life/input_data
Example Original Data File
See here.
Suggested Approach
The suggested approach can best be viewed as a way of making a matrix in which every geocoded location is set against daily exposure values that cover the time span used by all cohort members in the study.
Step 1: Identify earliest exposure start date
We assume that you will generate exposure values by using an exposure model to process the addresses of all cohort members, and that these values will cover the exposure periods of all cohort members who are part of your study.
In order to know when your exposure model should begin generating historical values, find the earliest start date of any exposure period of any person who is enrolled in your study. In the early life analysis, this date will be the earliest conception date (see life stage calculations for how the conception date is calculated). In the later life analysis, this date will be the earliest value of birth_date + 1 year for any cohort member who is in the study. So if your life stages are year 1 (YR1), year 2 (YR2), year 3 (YR3) and so on, then you would be looking for the earliest start date of YR1.
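The later life rule above (earliest value of birth_date + 1 year) can be sketched in a few lines of Python. This is only an illustration and not part of ALGAE: the function names and the birth dates are hypothetical, and mapping a 29 February birth date to 28 February of the following year is an assumption, since the protocol text does not say how that case is handled. The early life case is not shown because it depends on the conception date calculation described elsewhere.

```python
from datetime import date


def add_one_year(d: date) -> date:
    """Return the same calendar day one year later.

    Assumption (not from the ALGAE documentation): a 29 February
    date is mapped to 28 February of the following year.
    """
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Only raised for 29 February when the next year is not a leap year.
        return d.replace(year=d.year + 1, day=28)


def earliest_later_life_start(birth_dates):
    """Earliest YR1 start date: the minimum of birth_date + 1 year."""
    return min(add_one_year(b) for b in birth_dates)


# Hypothetical birth dates for three cohort members.
birth_dates = [date(1992, 5, 14), date(1991, 11, 2), date(1992, 2, 29)]
earliest_start = earliest_later_life_start(birth_dates)
print(earliest_start.strftime("%d/%m/%Y"))  # prints 02/11/1992
```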
Step 2: Identify latest exposure end date
Next, you will want to know the latest end date of the latest life stage that is covered by an analysis. In the early life analysis, that life stage will typically be called EL, and the key date will be the latest last day of the first year of life of any cohort member. For exposure modelling, you would then want to create exposure values that cover the interval for your analysis:
Early life analysis: [earliest conception date, latest last day of EL]
Later life analysis: [earliest YR1 start date, latest YR15 end date]
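Once both ends of the interval are known, the matrix described under Suggested Approach can be laid out as one row per geocode per day, ready for an exposure model to fill in. The following is a minimal sketch for the daily (early life) case. The geocodes, the dates and the function names are hypothetical, and the date format follows the dd/MM/yyyy convention given in the table properties.

```python
from datetime import date, timedelta
from itertools import product


def daily_dates(start: date, end: date):
    """Yield every date in the closed interval [start, end]."""
    for offset in range((end - start).days + 1):
        yield start + timedelta(days=offset)


def skeleton_rows(geocodes, start: date, end: date):
    """Yield one empty exposure row per (geocode, day) combination."""
    for geocode, day in product(geocodes, daily_dates(start, end)):
        yield {
            "geocode": geocode,
            # dd/MM/yyyy, as required for date_of_year
            "date_of_year": day.strftime("%d/%m/%Y"),
        }


# Hypothetical geocodes and a deliberately short three-day interval.
geocodes = ["x4353bi838", "a1b2c3"]
rows = list(skeleton_rows(geocodes, date(1996, 3, 1), date(1996, 3, 3)))
print(len(rows))  # prints 6 (2 geocodes x 3 days)
```

The row count grows as the number of geocodes multiplied by the number of days, which is why the interval is worth narrowing before running the exposure model.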
Step 3: Identify addresses used by study members during overall exposure time frame
The most economical way of doing this step would be to use only the addresses that study members used during their exposure period. However, this would assume that you had done all the data cleaning for the address periods first and the data cleaning for the address periods presumes that the geocodes already exist.
In the ALGAE protocol, we assume that you will geocode residential addresses without knowing who they may belong to or when they were occupied. That information is contained in the original_address_history_data table.
Step 4: Know the meaning of date_of_year depending on your analysis
The field date_of_year has a different meaning, depending on whether you are doing an early life analysis or a later life analysis. In the early life analysis, this value refers to the date of daily exposure values for various pollutants. In the later life analysis, ALGAE expects that the records will be annual exposure values and that the date will be the first day of the calendar year of exposure coverage.
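To make the later life convention concrete, the sketch below expands one annual record, dated 1 January, into one record per day of that calendar year. It is not ALGAE's own conversion code: the function name is hypothetical, the exposure value is made up, and giving every day the unchanged annual average is an assumption about what the conversion does.

```python
from datetime import date, datetime, timedelta


def expand_annual_record(record: dict) -> list:
    """Copy an annual record onto every day of its calendar year.

    Assumption: each day simply inherits the annual average value.
    """
    start = datetime.strptime(record["date_of_year"], "%d/%m/%Y").date()
    end = date(start.year, 12, 31)
    daily = []
    day = start
    while day <= end:
        row = dict(record)  # copy so the input record is left unchanged
        row["date_of_year"] = day.strftime("%d/%m/%Y")
        daily.append(row)
        day += timedelta(days=1)
    return daily


# Hypothetical annual average record for one geocode.
annual = {"geocode": "x4353bi838", "date_of_year": "01/01/1996", "pm10_tot": "21.5"}
daily = expand_annual_record(annual)
print(len(daily))  # prints 366, because 1996 is a leap year
```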
General Advice
Using an exposure model to generate exposures for multiple pollutants for all geocoded locations over the overall exposure time frame will likely be computationally intensive. We suggest that, early in your project, you invest in acquiring some form of secure-access, high-performance computing environment.
Example Table
See here.
Table Properties
You need to produce a CSV file called original_exposure_data.csv. It must have the following fields:
Field | Description | Required | Properties | Examples |
---|---|---|---|---|
geocode | Represents the location of a residential address. For ALGAE, the geocode is treated as just an identifier and the protocol attaches no meaning to the code. | Yes | Any text | 37.422036-122.084124, x4353bi838 (anonymised) |
comments | Any other information the cohort researchers want to provide about the geocode | No | Any text | |
date_of_year | Date of an exposure record. If you are running the early life analyses, ALGAE assumes this is the date of a daily exposure record. If you are running the later life analysis then this date is January 1st of a calendar year and the exposure values represent annual average values. | Yes | Date Format: dd/MM/yyyy | 23/03/1996 |
pm10_rd | PM10 from local road sources | No | A number with at most 15 decimal digits precision | |
nox_rd | NOx from local road sources | No | A number with at most 15 decimal digits precision | |
pm10_gr | PM10 from sources other than roads. | No | A number with at most 15 decimal digits precision | |
pm10_tot | PM10 from all sources. | No | A number with at most 15 decimal digits precision | |
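
Before handing the file to ALGAE, it can help to check each row against the properties listed above. The following is a rough sketch rather than ALGAE's own validation: the function name, the sample rows and the exposure values are hypothetical. It checks only that required fields are present, that date_of_year parses as day/month/year, and that any pollutant value parses as a number. It does not check the 15-digit precision limit.

```python
import csv
import io
from datetime import datetime

REQUIRED_FIELDS = ("geocode", "date_of_year")
POLLUTANT_FIELDS = ("pm10_rd", "nox_rd", "pm10_gr", "pm10_tot")


def validate_row(row: dict) -> list:
    """Return a list of problems found in one CSV row (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (row.get(field) or "").strip():
            problems.append(f"missing required field: {field}")

    date_text = (row.get("date_of_year") or "").strip()
    if date_text:
        try:
            datetime.strptime(date_text, "%d/%m/%Y")
        except ValueError:
            problems.append(f"date_of_year is not dd/MM/yyyy: {date_text}")

    for field in POLLUTANT_FIELDS:
        value = (row.get(field) or "").strip()
        if value:  # pollutant fields are optional, so blanks are allowed
            try:
                float(value)
            except ValueError:
                problems.append(f"{field} is not a number: {value}")
    return problems


# Hypothetical sample: one valid row, one row with three problems.
sample = "\n".join([
    "geocode,comments,date_of_year,pm10_rd,nox_rd,pm10_gr,pm10_tot",
    "x4353bi838,,23/03/1996,1.2,30.5,18.0,19.2",
    ",,1996-03-24,abc,,,",
])
results = [validate_row(r) for r in csv.DictReader(io.StringIO(sample))]
for line_number, problems in enumerate(results, start=2):
    print(line_number, problems)
```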