RIF Data Loading

  1. Overview
  2. RIF Capabilities
  3. Limitations
    1. Data Loader Tool
  4. RIF Data Loading Prerequisites
    1. Postgres
    2. SQL Server
      1. BCP Access Permissions
      2. BCP Format Issues
    3. Data Structure
      1. Numerator
      2. Denominator
      3. Covariates
      4. Administrative Geography
        1. Geography
        2. Geolevels
        3. Lookup tables
        4. Tile tables
        5. Adjacency tables
        6. Hierarchy tables
        7. Geometry tables
        8. Shapefile, shapefile tables
        9. Centroids tables
      5. Health Themes
  5. Load Processing
    1. Generate Series
    2. Administrative Geography
    3. Postgres
    4. SQL Server
    5. Numerator
    6. Pre Processing
    7. Load Processing
    8. Denominator
    9. Pre Processing
    10. Load Processing
    11. Covariates
    12. Pre Processing
    13. Load Processing
  6. Information Governance
  7. Flexible Configuration Support
    1. Age Groups
    2. ICD field Name
    3. Automatic Numerator Denominator Pairs
  8. Quality Control
    1. Extract Warnings
    2. Numerator Denominator Pair Errors
    3. Check that the RIF can deduce the outcome columns for the numerator tables
  9. Remote Data Links
    1. SQL Server to Oracle
    2. Postgres to Oracle
      1. Setting up
      2. Diagnostics
      3. HES Example

Overview

This document details the manual process for the loading of data into the RIF. The RIF requires the following types of data:

RIF data loading occurs in two distinct phases;

RIF Capabilities

The RIF supports:

In addition to the limitations detailed below which will be removed in future releases the following capabilities may potentially be added in future releases:

Limitations

Limitations in the current RIF:

It is planned to remove the following restrictions progressively in future releases:

Like the old RIF there are field naming restrictions:

THe ICD field name is now configurable. The default is ICD_SAHSU_01.

Data Loader Tool

The data loader tool in its current prototype form is severely limited in its ability to transform data by:

The following are known issues with the RIF data loader.

It is envisaged that the data loader will become a browser based tool like the RIF of the RIF tool chain.

The current data loader documentation is in the RIF Data Loader Manual

RIF Data Loading Prerequisites

The following steps need to be carried out to load process the data:

The following are restrictions on the naming of columns:

Postgres

Postgres uses the \copy command to load and unload data. \copy cannot handle fixed length data; this is loaded as a fixed length string and parsed using SQL.

The SEER data load requires USA load phase data pg_USQ_2014.sql to be loaded as a RIF user (not rif40) and the production data (rif_pg_usa_2014.sql) needs to be loaded into the rif40 account in the rif_data schema.

SQL Server

BCP Access Permissions

SQL Server needs access permission granted to the directories used to BULK INSERT files, the files are not copied from the client to the server as in the Postgres psql ```\copy` command and the Oracle sqlldr command. This also implies that a full file path is required and the file name must be accessible on the SQL Server server; it does NOT have to be accessible on the client.

SQL Server needs access to these directories. The simplest way is to allow read/execute access to the local users group (e.g. PH-LAPTOP\Users or USERS depending on your Windows version).

DO NOT TRY TO RUN BULK INSERT FROM NETWORK DRIVES or CLOUD DRIVES (e.g. Google Drive). Use a local directory which SQL Server has access to; e.g. somewhere on the C: drive. Note that SQL Server BULK LOAD behaves deterrently if you logon using Windows authentication (where it will use your credentials to access the files) to using a username and password (where it will use the Server’s credentials to access the file).

BULK INSERT rif_data.lookup_sahsu_grd_level1
FROM 'C:\Users\Peter\Documents\GitHub\rapidInquiryFacility\rifDatabase\SQLserver\installation\..\..\GeospatialData\tileMaker/mssql_lookup_sahsu_grd_level1.csv'     -- Note use of pwd; set via -v pwd="%cd%" in the sqlcmd command line
WITH
(
        FORMATFILE = 'C:\Users\Peter\Documents\GitHub\rapidInquiryFacility\rifDatabase\SQLserver\installation\..\..\GeospatialData\tileMaker/mssql_lookup_sahsu_grd_level1.fmt',            -- Use a format file
        TABLOCK                                 -- Table lock
);

Msg 4861, Level 16, State 1, Server PH-LAPTOP\SQLEXPRESS, Line 7
Cannot bulk load because the file "C:\Users\Peter\Documents\GitHub\rapidInquiryFacility\rifDatabase\SQLserver\installation\..\..\GeospatialData\tileMaker/mssql_lookup_sahsu_grd_level1.csv" could not be opened. Operating system error code 5(Access is denied.).

BCP Format Issues

BULK LOAD and BCP cannot handle:

Data Structure

All RIF data tables are located in the rif_data schema. They must be located in rif_data because SQL Server does not have the concept of a schema search path; the search paths have been hard coded. Tables may be located in any tablespace and you may use views which again must be located in the rif_data schema.

The RIF also support views for numerator and denominator tables. This allows for considerable flexibility in configuration as:

All table and column names must be a valid [Oracle] database name - 30 characters, uppercase, A-Z, 0-9 and underscore (_) and start with a letter. In some cases the length is reduced to 20 characters so that derived names of indexes, primary and foreign keys are under the 30 character limit.

The middleware translates these names into the correct format for the database software port. Names must NOT have schemas appended.

Numerator

The example numerator table is num_sahsuland_cancer:

column_name description data_type
year year field integer
age_sex_group An integer field which represents a combination of codes for sex and age. integer
sahsu_grd_level1 first level geographical resolution varchar(20)
sahsu_grd_level2 second geographical resolution varchar(20)
sahsu_grd_level3 third level of geographical resolution varchar(20)
sahsu_grd_level4 fourth level geographical resolution varchar(20)
icd ICD code field varchar(5)
total total field integer

The num_sahsuland_cancer table is aggregated to the highest geographic resolution. The table can disaggregated to individual level data (e.g. SAHSU cancer incidence data). Numerator table must have the following fields:

To add a numerator table to the RIF if must be added to:

outcome_group_name numer_tab current_version_start_year
SAHSULAND_ICD NUM_SAHSULAND_CANCER 1989

Denominator

The example denominator table is pop_sahsuand_pop:

column_name description data_type
year year field integer
age_sex_group An integer field which represents a combination of codes for sex and age. integer
sahsu_grd_level1 first level geographical resolution varchar(20)
sahsu_grd_level2 second geographical resolution varchar(20)
sahsu_grd_level3 third level of geographical resolution varchar(20)
sahsu_grd_level4 fourth level geographical resolution varchar(20)
total total field integer

The pop_sahsuand_pop table is aggregated to the highest geographic resolution. The table CANNOT disaggregated to individual level data or rotated with AGE_SEX_GROUP as multiple columns. Denominator table must have the following fields:

To add a denominator table to the RIF if must be added to:

theme table_name description year_start year_stop total_field isindirectdenominator isdirectdenominator isnumerator automatic age_sex_group_field_name age_group_id
cancers POP_SAHSULAND_POP population health file 1989 2016 TOTAL 1 0 0 1 AGE_SEX_GROUP 1

So, for the example denominator table pop_sahsuand_pop:

Covariates

The covariate tables with one table per geolevel for additional covariates within a geography (e.g. social exclusion scores). The table is defined in rif40_geolevels.covariate_table.

Covariate tables must contain the following fields (as in the covar_sahsuland_covariates3 example below):

Unlike other administrative geography tables; RIF managers MUST create these tables for themselves. A full range of all possible values MUST be provided, even if the covariate data is null. So in the example below

column_name description data_type
year year field integer
sahsu_grd_level3 third level of geographical resolution varchar(20)
ses socio-economic status integer
ethnicity ethnicity integer

As an example the covariate table covar_sahsuland_covariates3 contains (with rows truncated):

year sahsu_grd_level3 ses ethnicity
1989 01.001.000100 4 1
1989 01.001.000200 5 1
1989 01.001.000300 5 3
1989 01.002.000300 2 2
1989 01.002.000400 5 3
1989 01.002.000500 5 3
1989 01.002.000600 4 3
1989 01.002.000700 5 3
1989 01.002.000800 4 3
1989 01.002.000900 1 2

Administrative Geography

RIF administrative geography has nine components:

The following components are used by the administrative geography preprocessing (tile maker):

The final component is in the TODO list for future additions:

All components are created by the administrative geography preprocessing (tile maker) and loaded by a SQL Server sqlcmd or Postgres psql script.

Geography

The table rif40.rif40_geographies is used to define one or more administrative geographies

column_name description data_type
geography Geography name varchar(50)
description Description varchar(250)
hierarchytable Hierarchy table varchar(30)
srid Postgres projection SRID integer
defaultcomparea Default comparison area varchar(30)
defaultstudyarea Default study area varchar(30)
postal_population_table Postal population table. Table of postal points (e.g. postcodes, ZIP codes); geolevels; X and YCOORDINATES (in projection SRID); male, female and total populations. Converted to SRID points by loader [not in 4326 Web Mercator lat/long]. Used in creating population weight centroids and in converting postal points to geolevels. Expected columns <postal_point_column>, XCOORDINATE, YCOORDINATE, 1+ <GEOLEVEL_NAME>, MALES, FEMALES, TOTAL varchar(30)
postal_point_column Column name for postal points (e.g. POSTCODE, ZIP_CODE) varchar(30)
partition Enable partitioning. Extract tables will be partition if the number of years >= 2x the RIF40_PARAMETERS parameters Parallelisation [which has a default of 4, so extracts covering 8 years or more will be partitioned]. smallint
max_geojson_digits Max digits in ST_AsGeoJson() [optimises file size by removing unnecessary precision, the default value of 8 is normally fine.] smallint
geometrytable Geometry table name varchar(30)
tiletable Tile table name varchar(30)
minzoomlevel Minimum zoomlevel integer
maxzoomlevel Maximum zoomlevel integer
adjacencytable Adjacency table varchar(30)

The supplied SAHUSLAND test database is set up as follows:

geography description hierarchytable srid defaultcomparea defaultstudyarea partition max_geojson_digits geometrytable tiletable minzoomlevel maxzoomlevel adjacencytable
SAHSULAND SAHSU Example geography HIERARCHY_SAHSULAND 27700 SAHSU_GRD_LEVEL1 SAHSU_GRD_LEVEL3 1 6 GEOMETRY_SAHSULAND TILES_SAHSULAND 6 11 ADJACENCY_SAHSULAND

The example SEER test database is set up as follows:

geography description hierarchytable srid defaultcomparea defaultstudyarea partition max_geojson_digits geometrytable tiletable minzoomlevel maxzoomlevel adjacencytable
USA_2014 US 2014 Census geography to county level HIERARCHY_USA_2014 4269 CB_2014_US_NATION_5M CB_2014_US_STATE_500K 1 6 GEOMETRY_USA_2014 TILES_USA_2014 6 9 ADJACENCY_USA_2014

Geolevels

The table and view rif40.t_rif40_geolevels/rif40.rif40_geolevels is used to define a hierarchy of administrative areas that define a geography where each higher resolution contains one or more areas that fit exactly within the lower resolution. Users with the rif_student role cannot see the restricted geolevels.

column_name description data_type
geography Geography (e.g EW2001) varchar(50)
geolevel_name Name of geolevel. This will be a column name in the numerator/denominator tables varchar(30)
geolevel_id ID for ordering (1=lowest resolution). Up to 99 supported. smallint
description Description varchar(250)
lookup_table Lookup table name. This is used to translate codes to the common names, e.g a LADUA of 00BK is "Westminster" varchar(30)
lookup_desc_column Lookup table description column name. varchar(30)
centroidxcoordinate_column Lookup table centroid X co-ordinate column name. Can also use CENTROIDSFILE instead. varchar(30)
centroidycoordinate_column Lookup table centroid Y co-ordinate column name. varchar(30)
shapefile Location of the GIS shape file. NULL if PostGress/PostGIS used. Can also use SHAPEFILE_GEOMETRY instead, varchar(512)
centroidsfile Location of the GIS centroids file. Can also use CENTROIDXCOORDINATE_COLUMN, CENTROIDYCOORDINATE_COLUMN instead. varchar(512)
shapefile_table Table containing GIS shape file data (created using shp2pgsql). varchar(30)
shapefile_area_id_column Column containing the AREA_IDs in SHAPEFILE_TABLE varchar(30)
shapefile_desc_column Column containing the AREA_ID descriptions in SHAPEFILE_TABLE varchar(30)
centroids_table Table containing GIS shape file data with Arc GIS calculated population weighted centroids (created using shp2pgsql). PostGIS does not support population weighted centroids. varchar(30)
centroids_area_id_column Column containing the AREA_IDs in CENTROIDS_TABLE. X and Y co-ordinates columns are assumed to be named after CENTROIDXCOORDINATE_COLUMN and CENTROIDYCOORDINATE_COLUMN. varchar(30)
avg_npoints_geom Average number of points in a geometry object (AREA_ID). Used to evaluation the impact of ST_SIMPLIFY_TOLERANCE. bigint
avg_npoints_opt Average number of points in a ST_SimplifyPreserveTopology() optimised geometry object (AREA_ID). Used to evaluation the impact of ST_SIMPLIFY_TOLERANCE. bigint
file_geojson_len File length estimate (in bytes) for conversion of the entire geolevel geometry to GeoJSON. Used to evaluation the impact of ST_SIMPLIFY_TOLERANCE. bigint
leg_geom The average length (in projection units - usually metres) of a vector leg. Used to evaluation the impact of ST_SIMPLIFY_TOLERANCE. Numeric
leg_opt The average length (in projection units - usually metres) of a ST_SimplifyPreserveTopology() optimised geometry vector leg. Used to evaluation the impact of ST_SIMPLIFY_TOLERANCE. Numeric
covariate_table Name of table used for covariates at this geolevel varchar(30)
restricted Is geolevel access restricted by Information Governance restrictions (0/1). If 1 (Yes) then a) students cannot access this geolevel and b) if the system parameter ExtractControl=1 then the user must be granted permission by a RIF_MANAGER to extract from the database the results, data extract and maps tables. This is enforced by the RIF application. smallint
resolution Can use a map for selection at this resolution (0/1) smallint
comparea Able to be used as a comparison area (0/1) smallint
listing Able to be used in a disease map listing (0/1) smallint
areaid_count Area ID count integer

The supplied SAHUSLAND test database is set up as follows:

geography geolevel_name geolevel_id description lookup_table lookup_desc_column shapefile centroidsfile shapefile_table shapefile_area_id_column shapefile_desc_column covariate_table resolution comparea listing restricted areaid_count
SAHSULAND SAHSU_GRD_LEVEL4 4 Level 4 LOOKUP_SAHSU_GRD_LEVEL4 AREANAME SAHSU_GRD_Level4.shp   SAHSU_GRD_LEVEL4 LEVEL4 LEVEL4 COVAR_SAHSULAND_COVARIATES4 1 1 1 0 1230
SAHSULAND SAHSU_GRD_LEVEL3 3 Level 3 LOOKUP_SAHSU_GRD_LEVEL3 AREANAME SAHSU_GRD_Level3.shp   SAHSU_GRD_LEVEL3 LEVEL3 LEVEL3 COVAR_SAHSULAND_COVARIATES3 1 1 1 0 200
SAHSULAND SAHSU_GRD_LEVEL2 2 Level 2 LOOKUP_SAHSU_GRD_LEVEL2 AREANAME SAHSU_GRD_Level2.shp   SAHSU_GRD_LEVEL2 LEVEL2 NAME COV_SAHSU_GRD_LEVEL2 1 1 1 0 17
SAHSULAND SAHSU_GRD_LEVEL1 1 Level 1 (top level) LOOKUP_SAHSU_GRD_LEVEL1 AREANAME SAHSU_GRD_Level1.shp   SAHSU_GRD_LEVEL1 LEVEL1 LEVEL1   1 1 1 0 1

The field geolevel_id is used to order the geolevels by resolution. The highest geolevel_id is the highest resolution (i.e. has the most areas).

The example SEER test database is set up as follows:

geography geolevel_name geolevel_id description lookup_table lookup_desc_column shapefile centroidsfile shapefile_table shapefile_area_id_column shapefile_desc_column covariate_table resolution comparea listing restricted areaid_count
USA_2014 CB_2014_US_COUNTY_500K 3 The County at a scale of 1:500,000 LOOKUP_CB_2014_US_COUNTY_500K AREANAME cb_2014_us_county_500k.shp   CB_2014_US_COUNTY_500K COUNTYNS NAME COV_CB_2014_US_COUNTY_500K 1 1 1 0 3233
USA_2014 CB_2014_US_STATE_500K 2 The State at a scale of 1:500,000 LOOKUP_CB_2014_US_STATE_500K AREANAME cb_2014_us_state_500k.shp   CB_2014_US_STATE_500K STATENS NAME COV_CB_2014_US_STATE_500K 1 1 1 0 56
USA_2014 CB_2014_US_NATION_5M 1 The nation at a scale of 1:5,000,000 LOOKUP_CB_2014_US_NATION_5M AREANAME cb_2014_us_nation_5m.shp   CB_2014_US_NATION_5M GEOID NAME     1 1 1 1

Lookup tables

Lookup tables contain the names of the administrative area id codes within a geolevel and must have:

Lookup table lookup_sahsu_grd_level2;

column_name description data_type
sahsu_grd_level2 Area ID field varchar(100)
areaname Area Name field varchar(1000)
gid GID field integer
geographic_centroid Geographic centroid json

As an example the lookup table lookup_sahsu_grd_level2 contains:

sahsu_grd_level2 areaname gid geographic_centroid
01.001 Abellan 1 {“type”:”Point”,”coordinates”:[-6.36447811663261,55.1846108882703]}
01.002 Cobley 2 {“type”:”Point”,”coordinates”:[-6.72944498678687,54.9861042071018]}
01.003 Beale 3 {“type”:”Point”,”coordinates”:[-7.30272703673231,54.8380270006204]}
01.004 Hambly 4 {“type”:”Point”,”coordinates”:[-6.49752751099877,54.8431984603293]}
01.005 Briggs 5 {“type”:”Point”,”coordinates”:[-6.12712384427312,54.9266239071054]}
01.006 Andersson 6 {“type”:”Point”,”coordinates”:[-5.69842684960186,54.8976428162187]}
01.007 Hodgson 7 {“type”:”Point”,”coordinates”:[-6.89721995908163,54.0867635909076]}
01.008 Jarup 8 {“type”:”Point”,”coordinates”:[-6.54372797176735,54.4006825147175]}
01.009 Elliot 9 {“type”:”Point”,”coordinates”:[-6.1142402340227,54.4434524682342]}
01.011 Clarke 10 {“type”:”Point”,”coordinates”:[-6.23842742452612,54.1627636409006]}
01.012 Tirado 11 {“type”:”Point”,”coordinates”:[-5.77869692964237,54.2166512827777]}
01.013 Kozniewska 12 {“type”:”Point”,”coordinates”:[-6.52542157743406,53.8402864247415]}
01.014 Stordy 13 {“type”:”Point”,”coordinates”:[-7.24669171811797,53.2995738304985]}
01.015 Maitland 14 {“type”:”Point”,”coordinates”:[-6.68411239150798,53.6052365920332]}
01.016 De Hoogh 15 {“type”:”Point”,”coordinates”:[-5.57639756386895,53.7060196095887]}
01.017 Savigny 16 {“type”:”Point”,”coordinates”:[-6.99828886177757,53.0298596972378]}
01.018 Cockings 17 {“type”:”Point”,”coordinates”:[-5.93617635675363,53.1514779844403]}

And for USA states:

cb_2014_us_state_500k areaname gid geographic_centroid
00068085 Arkansas 1 {“type”:”Point”,”coordinates”:[-91.7460600676377,34.7989664318261]}
00294478 Florida 2 {“type”:”Point”,”coordinates”:[-82.191285629571,27.2641344923978]}
00448508 Indiana 3 {“type”:”Point”,”coordinates”:[-86.6396085844172,39.2583984174708]}
00481813 Kansas 4 {“type”:”Point”,”coordinates”:[-96.9603123072795,38.7478452695085]}
00606926 Massachusetts 5 {“type”:”Point”,”coordinates”:[-70.8496779292619,41.9574785877662]}
00662849 Minnesota 6 {“type”:”Point”,”coordinates”:[-94.2946075529846,46.9142548993034]}
00767982 Montana 7 {“type”:”Point”,”coordinates”:[-112.440158950593,46.0953789167559]}
00897535 New Mexico 8 {“type”:”Point”,”coordinates”:[-106.210319618421,34.0796842247757]}
01027616 North Carolina 9 {“type”:”Point”,”coordinates”:[-78.1838147583888,35.4144296092622]}
01085497 Ohio 10 {“type”:”Point”,”coordinates”:[-82.4309672991619,40.379873178333]}
01102857 Oklahoma 11 {“type”:”Point”,”coordinates”:[-97.1409457095718,34.712480112523]}
01155107 Oregon 12 {“type”:”Point”,”coordinates”:[-121.26568196262,44.5161240770227]}
01219835 Rhode Island 13 {“type”:”Point”,”coordinates”:[-71.3998545840162,41.5702071077978]}
01325873 Tennessee 14 {“type”:”Point”,”coordinates”:[-85.8408130481816,35.8337160114235]}
01455989 Utah 15 {“type”:”Point”,”coordinates”:[-111.621001625722,39.1152289226468]}
01629543 Louisiana 16 {“type”:”Point”,”coordinates”:[-91.2369599329216,30.3596077655874]}
01702382 District of Columbia 17 {“type”:”Point”,”coordinates”:[-77.0161156802619,38.899287485358]}
01705317 Georgia 18 {“type”:”Point”,”coordinates”:[-82.8985467943147,32.6450260988439]}
01714934 Maryland 19 {“type”:”Point”,”coordinates”:[-76.5695402961508,38.7769267877812]}
01779775 Alabama 20 {“type”:”Point”,”coordinates”:[-86.7664800530139,31.7690551760402]}
01779777 Arizona 21 {“type”:”Point”,”coordinates”:[-113.396230811877,34.458696450714]}
01779778 California 22 {“type”:”Point”,”coordinates”:[-120.249510325325,36.2457149373203]}
01779779 Colorado 23 {“type”:”Point”,”coordinates”:[-104.808380671455,38.9255583779329]}
01779780 Connecticut 24 {“type”:”Point”,”coordinates”:[-72.8528546706635,41.3456700453746]}
01779781 Delaware 25 {“type”:”Point”,”coordinates”:[-75.4994397114792,39.2171211171611]}
01779782 Hawaii 26 {“type”:”Point”,”coordinates”:[-157.752347796731,21.0611974598024]}
01779783 Idaho 27 {“type”:”Point”,”coordinates”:[-114.675046469887,45.4195058476719]}
01779784 Illinois 28 {“type”:”Point”,”coordinates”:[-89.2220209830065,39.8410669123454]}
01779785 Iowa 29 {“type”:”Point”,”coordinates”:[-93.8227666555831,42.0221134891659]}
01779786 Kentucky 30 {“type”:”Point”,”coordinates”:[-85.2123467005517,37.7223857693082]}
01779787 Maine 31 {“type”:”Point”,”coordinates”:[-68.9047268601883,44.5331050274351]}
01779789 Michigan 32 {“type”:”Point”,”coordinates”:[-85.7125400934235,45.4651010832855]}
01779790 Mississippi 33 {“type”:”Point”,”coordinates”:[-90.2098738969815,32.2488403702241]}
01779791 Missouri 34 {“type”:”Point”,”coordinates”:[-92.1883046699325,38.3681761336563]}
01779792 Nebraska 35 {“type”:”Point”,”coordinates”:[-97.8496886533389,41.4426955679412]}
01779793 Nevada 36 {“type”:”Point”,”coordinates”:[-115.992474312302,37.9127709992178]}
01779794 New Hampshire 37 {“type”:”Point”,”coordinates”:[-71.5978177402021,43.9585841263735]}
01779795 New Jersey 38 {“type”:”Point”,”coordinates”:[-74.7342060895472,40.1862212281874]}
01779796 New York 39 {“type”:”Point”,”coordinates”:[-74.931413218779,42.2529504249618]}
01779797 North Dakota 40 {“type”:”Point”,”coordinates”:[-97.7974492684418,47.3592469399905]}
01779798 Pennsylvania 41 {“type”:”Point”,”coordinates”:[-76.8260790277368,40.9801819217324]}
01779799 South Carolina 42 {“type”:”Point”,”coordinates”:[-81.3588011364492,33.5808270966468]}
01779801 Texas 43 {“type”:”Point”,”coordinates”:[-98.7725068956115,29.8219670967323]}
01779802 Vermont 44 {“type”:”Point”,”coordinates”:[-72.436271076001,44.0669830822455]}
01779803 Virginia 45 {“type”:”Point”,”coordinates”:[-78.0541807437174,37.6982304423068]}
01779804 Washington 46 {“type”:”Point”,”coordinates”:[-122.640006855541,47.7959450287645]}
01779805 West Virginia 47 {“type”:”Point”,”coordinates”:[-80.3551373923862,38.7506276616833]}
01779806 Wisconsin 48 {“type”:”Point”,”coordinates”:[-89.5095248112451,45.198644842279]}
01779807 Wyoming 49 {“type”:”Point”,”coordinates”:[-107.571827493527,42.9599050320965]}
01779808 Puerto Rico 50 {“type”:”Point”,”coordinates”:[-66.2423900561918,18.2032215322885]}
01779809 Commonwealth of the Northern Mariana Islands 51 {“type”:”Point”,”coordinates”:[145.615819585475,16.1727270208791]}
01785533 Alaska 52 {“type”:”Point”,”coordinates”:[-150.41866893173,58.3829444770504]}
01785534 South Dakota 53 {“type”:”Point”,”coordinates”:[-98.1393052717026,43.8929744764808]}
01802701 American Samoa 54 {“type”:”Point”,”coordinates”:[-170.302817441699,-14.2005732150488]}
01802705 Guam 55 {“type”:”Point”,”coordinates”:[144.763252929807,13.4279675884233]}
01802710 United States Virgin Islands 56 {“type”:”Point”,”coordinates”:[-64.8355292735655,18.1760172621614]}

Tile tables

The tile tables and views contain WGS84 topoJSON and geoJSON tiles for a geography used by the RIF front end and extract utilities to display administrative geography geolevels The view is defined in rif40_geolevels.tiletable and adds valid empty tiles for the whole planet. The view uses the table which only contains tiles with data. The view is efficient.

Tiles tables must contain the following fields (t_tiles_sahsuland example):

column_name description data_type
geolevel_id ID for ordering (1=lowest resolution). Up to 99 supported. integer
zoomlevel Zoom level: 0 to 11. Number of tiles is 2&ast;&ast;<zoom level> &ast; 2&ast;&ast;<zoom level>; i.e. 1, 2x2, 4x4 … 2048x2048 at zoomlevel 11 integer
x X tile number. From 0 to (2&ast;&ast;<zoom level>)-1 integer
y Y tile number. From 0 to (2&ast;&ast;<zoom level>)-1 integer
optimised_topojson Tile multi-polygon in TopoJSON format, optimised for zoomlevel N. The SRID is always 4326. json
tile_id Tile ID in the format <geolevel number><geolevel name><zoom level><X tile number><Y tile number> varchar(200)
areaid_count Total number of areaIDs (geoJSON features) integer

Example view SQL for SAHSULAND:

CREATE OR REPLACE VIEW rif_data.tiles_sahsuland
AS
WITH a AS (
         SELECT t_rif40_geolevels.geography,
                MAX(t_rif40_geolevels.geolevel_id) AS max_geolevel_id
           FROM t_rif40_geolevels
          WHERE t_rif40_geolevels.geography::text = 'SAHSULAND'::text
          GROUP BY t_rif40_geolevels.geography
), b AS (
         SELECT a.geography,
                generate_series(1, a.max_geolevel_id::integer, 1) AS geolevel_id
           FROM a
), c AS (
         SELECT b2.geolevel_name,
                b.geolevel_id,
                b.geography,
                b2.areaid_count
           FROM b,
                t_rif40_geolevels b2
          WHERE b.geolevel_id = b2.geolevel_id AND b.geography::text = b2.geography::text
), d AS (
         SELECT generate_series(0, 11, 1) AS zoomlevel
), ex AS (
         SELECT d.zoomlevel,
                generate_series(0,
						power(2::double precision, d.zoomlevel::double precision)::integer - 1,
						1) AS xy_series
           FROM d
), ey AS (
         SELECT c.geolevel_name,
                c.areaid_count,
                c.geolevel_id,
                c.geography,
                ex.zoomlevel,
                ex.xy_series
           FROM c,
                ex
)
SELECT z.geography,
       z.geolevel_id,
       z.geolevel_name,
       CASE
            WHEN h1.tile_id IS NULL AND h2.tile_id IS NULL THEN 1
            ELSE 0
       END AS no_area_ids,
       COALESCE(h1.tile_id,
			((((((((z.geolevel_id::character varying::text ||
				'_'::text) ||
				z.geolevel_name::text) ||
				'_'::text) ||
				z.zoomlevel::character varying::text) ||
				'_'::text) ||
				z.x::character varying::text) ||
				'_'::text) ||
				z.y::character varying::text)::character varying) AS tile_id,
       z.x,
       z.y,
       z.zoomlevel,
       COALESCE(h1.optimised_topojson,
			h2.optimised_topojson,
			'{"type": "FeatureCollection","features":[]}'::json) AS optimised_topojson
  FROM (
     SELECT ey.geolevel_name,
            ey.areaid_count,
            ey.geolevel_id,
            ey.geography,
            ex.zoomlevel,
            ex.xy_series AS x,
            ey.xy_series AS y
       FROM ey,
            ex
      WHERE ex.zoomlevel = ey.zoomlevel) z
     LEFT JOIN t_tiles_sahsuland h1 ON
			z.areaid_count > 1 AND
			z.zoomlevel = h1.zoomlevel AND
			z.x = h1.x AND z.y = h1.y AND
			z.geolevel_id = h1.geolevel_id
     LEFT JOIN t_tiles_sahsuland h2 ON
			z.areaid_count = 1 AND
			h2.zoomlevel = 0 AND
			h2.x = 0 AND
			h2.y = 0 AND
			h2.geolevel_id = 1;

Adjacency tables

The adjacency tables are a list of adjacent areas for each geolevel and area id in a geography used in BAtesian smoothing; The table is defined in rif40_geolevels.adjacencytable.

Adjacency tables must contain the following fields (adjacency_sahsuland example):

column_name description data_type
geolevel_id ID for ordering (1=lowest resolution). Up to 99 supported. integer
areaid Area Id varchar(200)
num_adjacencies Number of adjacencies integer
adjacency_list Adjacent area Ids varchar(8000)

As an example the adjacency table adjacency_sahsuland contains (with rows truncated):

geolevel_id areaid num_adjacencies adjacency_list
2 01.001 3 01.002,01.004,01.005
2 01.002 3 01.001,01.003,01.004
2 01.003 3 01.002,01.004,01.008
2 01.004 6 01.001,01.002,01.003,01.005,01.008,01.009
2 01.005 4 01.001,01.004,01.006,01.009
3 01.001.000200 3 01.001.000100,01.001.000300,01.005.002400
3 01.001.000300 1 01.001.000200
3 01.002.000300 4 01.002.000400,01.002.000500,01.002.000600,01.002.001900
3 01.002.000400 4 01.002.000300,01.002.001600,01.002.001700,01.002.001900
3 01.002.000500 3 01.001.000100,01.002.000300,01.002.000600
3 01.002.000600 7 01.001.000100,01.002.000300,01.002.000500,01.002.000700,01.002.000800,01.002.001100,01.002.001900
3 01.002.000700 3 01.001.000100,01.002.000600,01.002.000800
4 01.002.001300.2 4 01.002.001200.3,01.002.001200.7,01.002.001300.1,01.002.001300.4
4 01.002.001300.3 6 01.002.001200.5,01.002.001200.6,01.002.001200.7,01.002.001300.4,01.002.001300.9,01.002.001400.2
4 01.002.001300.4 6 01.002.001200.7,01.002.001300.1,01.002.001300.2,01.002.001300.3,01.002.001300.5,01.002.001300.9
4 01.002.001300.5 6 01.001.000100.1,01.002.001300.1,01.002.001300.4,01.002.001300.8,01.002.001300.9,01.002.001500.2
4 01.002.001300.6 4 01.002.001300.7,01.002.001300.9,01.002.001400.5,01.002.001400.7

Hierarchy tables

Hierarchy tables have one table per administrative geography; contains a hierarchy of higher resolution of one or more areas that fit exactly within the lower resolution The table is defined in rif40_geographies.hierarchytable*.

Hierarchy tables must contain the following fields (as in the hierarchy_sahsuland example below):

column_name description data_type
sahsu_grd_level1 Hierarchy lookup for Level 1 (top level) varchar(100)
sahsu_grd_level2 Hierarchy lookup for Level 2 varchar(100)
sahsu_grd_level3 Hierarchy lookup for Level 3 varchar(100)
sahsu_grd_level4 Hierarchy lookup for Level 4 varchar(100)

As an example the hierarchy table hierarchy_sahsuland contains (with rows truncated):

sahsu_grd_level1 sahsu_grd_level2 sahsu_grd_level3 sahsu_grd_level4
01 01.001 01.001.000100 01.001.000100.1
01 01.001 01.001.000100 01.001.000100.2
01 01.001 01.001.000200 01.001.000200.1
01 01.001 01.001.000300 01.001.000300.1
01 01.002 01.002.000300 01.002.000300.1
01 01.002 01.002.000300 01.002.000300.2
01 01.002 01.002.000300 01.002.000300.3

Geometry tables

The geometry tables contains geometric data for a geography to allow the database to perform spatial queries with the administrative geometry The table is defined in rif40_geolevels.geometrytable.

Geometry tables must contain the following fields (geometry_sahsuland example):

column_name description data_type
geolevel_id ID for ordering (1=lowest resolution). Up to 99 supported. integer
areaid Area ID. varchar(200)
zoomlevel Zoom level: 0 to 11. Number of tiles is 2&ast;&ast;<zoom level> &ast; 2&ast;&ast;<zoom level>; i.e. 1, 2x2, 4x4 … 2048x2048 at zoomlevel 11 integer
geom Geometry data in SRID 4326 (WGS84). geometry
wkt Well known text Text

Geometry tables are partitioned on Postgres ports.

Shapefile, shapefile tables

The Shapefile, shapefile tables contain the names of the original shapefile and the data in geoJSON format. They are used as part of the administrative geography preprocessing (tile maker) The shapefile is defined in rif40_geolevels.shapefile. The table is defined in rif40_geolevels.shapefile_table.

Centroids tables

Centroids tables are used to import population weighted or spatially processed (centroid pulled to within the centroid boundary) centroids into an administrative geography. Centroids tables are not currently supported. The table is defined in rif40_geolevels.centroids_table. This is in the TODO list for future additions

Health Themes

The table rif40.rif40_health_study_themes is used to group numerators logically so the user is not presented with a long list of numerator/denominator pairs. Purpose could include:

column_name description data_type
theme Theme varchar(30)
description Description varchar(200)

An example theme is provided:

theme description
cancers covering various types of cancers

Load Processing

Generally load processing requires three steps:

The following scripts are used in the examples:

Action Postgres SQL Server
Load the Administrative geography rif_pg_usa_2014.sql rif_mssql_usa_2014.sql
Pre-process the SEER data pg_load_seer.sql TO FOLLOW
 • Pre-process numerator data pg_load_seer_cancer.sql TO FOLLOW
 • Pre-process denominator data pg_load_seer_population.sql TO FOLLOW
 • Pre-process covariates data pg_load_seer_covariates.sql TO FOLLOW
Load the processed SEER data pg_rif40_load_seer.sql ms_rif40_load_seer.sql
 • Load numerator data pg_rif40_load_seer_cancer.sql ms_rif40_load_seer_cancer.sql
 • Load denominator data pg_rif40_load_seer_population.sql ms_rif40_load_seer_population.sql
 • Load covariate data pg_rif40_load_seer_covariates.sql ms_rif40_load_seer_covariates.sql

To install, change to the <SEER Data directory, e.g. C:\Users\phamb\OneDrive\April 2018 deliverable for SAHSU\SEER Data\ >;

The SEER pre-processing script pg_load_seer_covariates.sql has a dependency on the cb_2014_us_nation_5m, cb_2014_us_state_500k and cb_2014_us_county_500l that are part of the tilemaker pre-processing. The FIPS code is required to make the join and this field is not in the standard lookup tables. For this reason it is necessary to build the covariates table on sahsuland_dev. In the longer term the FIPS codes should be added to the lookup tables.

SQL Server pre-processing scripts have not been created due to the lack of an unload facility in sqlcmd. There are two basic options to resolve this problem without recourse to C# or the many GUI wizards available:

Other useful references:

To run a script:

Once all the data has been loaded, check the table appears in a user (not rif40) rif40_num_denom

sahsuland=> select * from rif40_num_denom;
 geography |   numerator_table    |             numerator_description             |         theme_description         | denominator_table |                                 denominator_description                                  | automatic
-----------+----------------------+-----------------------------------------------+-----------------------------------+-------------------+------------------------------------------------------------------------------------------+-----------
 SAHSULAND | NUM_SAHSULAND_CANCER | cancer numerator                              | covering various types of cancers | POP_SAHSULAND_POP | population health file                                                                   |         1
 USA_2014  | SEER_CANCER          | SEER Cancer data 1973-2013. 9 States in total | covering various types of cancers | SEER_POPULATION   | SEER Population 1972-2013. Georgia starts in 1975, Washington in 1974. 9 States in total |         1
(2 rows)

If your data does not appear; see Numerator Denominator Pair Errors

Generate Series

The generate_series function is used in the RIF rif_data.tiles_sahsuland VIEW to add back tiles for which there is no data (i.e. outside of the extent of the geography so the tiles cover the entire world. It is also very useful for converting non annual covariates to the annual form required by the RIF; e.g:

SELECT generate_series(2011, 2018);
 generate_series
-----------------
            2011
            2012
            2013
            2014
            2015
            2016
            2017
            2018
(8 rows)

Using rif_data.covar_sahsuland_covariates3 year 2000 data only (i.e pretend it is not annualised!)

WITH a AS (
	SELECT generate_series(2011, 2018) AS year
)
SELECT a.year, b.sahsu_grd_level3, b.ses, b.ethnicity
  FROM a, covar_sahsuland_covariates3 b
 WHERE b.year = 2000 /* USe only one year! */
 ORDER BY 1, 2 LIMIT 20;
 year | sahsu_grd_level3 | ses | ethnicity
------+------------------+-----+-----------
 2011 | 01.001.000100    |   4 |         1
 2011 | 01.001.000200    |   5 |         1
 2011 | 01.001.000300    |   5 |         3
 2011 | 01.002.000300    |   2 |         2
 2011 | 01.002.000400    |   5 |         3
 2011 | 01.002.000500    |   5 |         3
 2011 | 01.002.000600    |   4 |         3
 2011 | 01.002.000700    |   5 |         3
 2011 | 01.002.000800    |   4 |         3
 2011 | 01.002.000900    |   1 |         2
 2011 | 01.002.001000    |   2 |         1
 2011 | 01.002.001100    |   3 |         3
 2011 | 01.002.001200    |   1 |         2
 2011 | 01.002.001300    |   2 |         3
 2011 | 01.002.001400    |   2 |         3
 2011 | 01.002.001500    |   1 |         2
 2011 | 01.002.001600    |   2 |         3
 2011 | 01.002.001700    |   1 |         1
 2011 | 01.002.001800    |   2 |         3
 2011 | 01.002.001900    |   1 |         1
(20 rows)

See Postgres set returning functions

The generate_series function is NOT part of SQL Server; however the following code provides this functionality. It is installed as a table valued function in the rif40 schema.

/*
 * SQL statement name: 	generate_series.sql
 * Type:				MS SQL Server SQL statement
 * Parameters:			None
 * Description:			Generate a series of values, from start to stop with a step size of step
 *						Original by: Simon Greener, Independent GeoSpatial Solutions Architect
 *						http://www.spatialdbadvisor.com/sql_server_blog/86/generate_series-for-sql-server-2008
 */
CREATE FUNCTION generate_series ( @p_start INT, @p_end INT, @p_step INT=1 )
RETURNS @Integers TABLE ( [IntValue] INT )
AS
BEGIN
    DECLARE
      @v_i                 INT,
      @v_step              INT,
      @v_terminating_value INT;
    BEGIN
      SET @v_i = CASE WHEN @p_start IS NULL THEN 1 ELSE @p_start END;
      SET @v_step  = CASE WHEN @p_step IS NULL OR @p_step = 0 THEN 1 ELSE @p_step END;
      SET @v_terminating_value =  @p_start + CONVERT(INT,ABS(@p_start-@p_end) / ABS(@v_step) ) * @v_step;
      -- Check for impossible combinations
      IF NOT ( ( @p_start > @p_end AND SIGN(@p_step) = 1 )
               OR
               ( @p_start < @p_end AND SIGN(@p_step) = -1 ))
      BEGIN
        -- Generate values
        WHILE ( 1 = 1 )
        BEGIN
           INSERT INTO @Integers ( [IntValue] ) VALUES ( @v_i )
           IF ( @v_i = @v_terminating_value )
              BREAK
           SET @v_i = @v_i + @v_step;
        END;
      END;
    END;
    RETURN
END;

Administrative Geography

The SAHSULAND example geography is supplied as part of the RIF. To load the SEER test dataset you first need to load the USA County level administrative geography. The scripts and the data are creared by the tile-maker program:

To install, change to the <tile maker directory, e.g. C:\Users\phamb\OneDrive\April 2018 deliverable for SAHSU\SEER Data\Tile maker USA>

The scripts rif_pg_usa_2014.sql or rif_mssql_usa_2014.sql load processed geometry and tiles tables into production database:

a) integrate with RIF40 control tables, e.g:

b) Processed geometry data (partitioned in PostGres), e.g:

c) Hierarchy table, e.g:

d) Lookup tables, e.g:

e) Tiles table and view

Postgres

Run: psql -U rif40 -w -e -f rif_pg_USA_2014.sql

SQL Server

Run: sqlcmd -U rif40 -d <database name> -b -m-1 -e -r1 -i rif_mssql_USA_2014.sql -v pwd="%cd%"

Numerator

Pre Processing

Load Processing

Denominator

Pre Processing

Load Processing

Covariates

Pre Processing

This process is then repeated for all other covariates:

We can now create the production covariate tables for both county and state level from:

E.g:

Load Processing

Information Governance

See Database management manual - Information Governance

Flexible Configuration Support

RIF 4.0 has a number of options for more flexible configuration:

Age Groups

This is configured using the tables:

age_group_id age_group_name
1 RIF Default
2 RIF Birth defects
3 RIF Childhood infectious diseases
age_group_id offset low_age high_age fieldname
1 0 0 0 0
1 1 1 1 1
1 2 2 2 2
1 3 3 3 3
1 4 4 4 4
1 5 5 9 5_9
1 6 10 14 10_14
1 7 15 19 15_19
1 8 20 24 20_24
1 9 25 29 25_29
1 10 30 34 30_34
1 11 35 39 35_39
1 12 40 44 40_44
1 13 45 49 45_49
1 14 50 54 50_54
1 15 55 59 55_59
1 16 60 64 60_64
1 17 65 69 65_69
1 18 70 74 70_74
1 19 75 79 75_79
1 20 80 84 80_84
1 21 85 255 85PLUS
2 0 0 0 0
3 0 0 1 0_1
3 1 2 3 2_3
3 2 4 5 4_5
3 3 6 7 6_7
3 4 8 9 8_9
3 5 10 19 10_19
3 6 20 39 20_39
3 7 40 79 40_79
3 8 80 255 80PLUS

ICD field Name

Numerators are currently limited to ICD 9 and 10 coding only and a single ICD field. Support will be added for ICD 11 (subject to the release of the 11th Edition in June 2018).

In the longer term it is expected that support will be added for:

This is configured using the tables:

outcome_group_name numer_tab current_version_start_year
SAHSULAND_ICD NUM_SAHSULAND_CANCER 1989

Normally, when adding a new numerator table to the RIF; the default outcome group name (SAHSULAND_ICD) is used.

outcome_type outcome_description current_version current_sub_version previous_version previous_sub_version
ICD International Classification of Disease 10 11th Revision - 2010 9  
OPCS Office of Population Censuses and Surveys [OPCS] Classification of Interventions and Procedures 4 4.6 1/11/2011    
ICD-O International Classification of Disease for Oncology 3 2 ?/2000    
A&E A&E clinical diagnosis (3 char) Unk N/A    
BIRTHWEIGHT Birthweight (e.g. low <2500g) 1      
outcome_type outcome_group_name outcome_group_description field_name multiple_field_count
ICD SINGLE_ICD Single ICD ICD_SAHSU_01 0
ICD MORTALITY_SECONDARY_ICD Single ICD - Secondary cause of Death ICD_SAHSU_01S 0
ICD MORTALITY_MULTIPLE_ICD ONS Mortality multiple ICD ICD_SAHSU 16
ICD HES_MULTIPLE_ICD HES multiple ICD ICD_SAHSU 20
A&E HES_SINGLE_A&E HES single A+E DIAG 0
A&E HES_MULTIPLE_A&E HES multiple A+E DIAG 20
OPCS HES_SINGLE_OPCS HES single OPCS OPCS_SAHSU 0
OPCS HES_MULTIPLE_OPCS HES multiple OPCS OPCS_SAHSU 20
ICD-O CANCER_ICD_O Cancer type of growth histology coded to Classification of Neoplasms TYPE_GROWTH 0
ICD SAHSULAND_ICD Single ICD ICD 0
BIRTHWEIGHT BIRTHWEIGHT Birthweight (e.g. low <2500g) BIRTHWEIGHT 0

Automatic Numerator Denominator Pairs

Quality Control

Extract Warnings

The table/view t_rif40_warnings/rif40_warnings will contain contain warning messages on a study basis. These can be created by extract or R scripts. Traps will be added for:

Numerator Denominator Pair Errors

The normal cases if for valid numerator and denominator pairs to appear in the view rif40_num_denom.

sahsuland=> select * from rif40_num_denom;
 geography |   numerator_table    | numerator_description |         theme_description         | denominator_table | denominator_description | automatic
-----------+----------------------+-----------------------+-----------------------------------+-------------------+-------------------------+-----------
 SAHSULAND | NUM_SAHSULAND_CANCER | cancer numerator      | covering various types of cancers | POP_SAHSULAND_POP | population health file  |         1
(1 row)

This view is special in that every user has one as it then uses your permissions to determine from the RIF configuration and your permissions whether you have access to a given numerator and denominator pair. A further view rif40_num_denom_errors is provided to help resolve issues:

Column Type Description
geography character varying(50) Geography
numerator_owner character varying Numerator table owner
numerator_table character varying(30) Numerator table
is_numerator_resolvable integer Is the numerator table resolvable and accessible (0/1)
n_num_denom_validated integer Is the numerator valid for this geography (0/1). If N_NUM_DENOM_VALIDATED and D_NUM_DENOM_VALIDATED are both 1 then the pair will appear in RIF40_NUM_DENOM.
numerator_description character varying(250) Numerator table description
denominator_owner character varying Denominator table owner
denominator_table character varying(30) Denominator table
is_denominator_resolvable integer Is the denominator table resolvable and accessible (0/1)
d_num_denom_validated integer Is the denominator valid for this geography (0/1). If N_NUM_DENOM_VALIDATED and D_NUM_DENOM_VALIDATED are both 1 then the pair will appear in RIF40_NUM_DENOM.
denominator_description character varying(250) Denominator table description
automatic integer Is the pair automatic (0/1). Cannot be applied to direct standardisation denominator. Restricted to 1 denominator per geography. The default in RIF40_TABLES is 0 because of the restrictions.
auto_indirect_error_flag integer Error flag 0/1. Denominator table with automatic set to “1” that fails the RIF40_CHECKS.RIF40_AUTO_INDIRECT_CHECKS test. Restricted to 1 denominator per geography to prevent the automatic RIF40_NUM_DENOM having >1 pair per numerator.
auto_indirect_error text Denominator table with automatic set to “1” that fails the RIF40_CHECKS.RIF40_AUTO_INDIRECT_CHECKS test. Restricted to 1 denominator per geography to prevent the automatic RIF40_NUM_DENOM having >1 pair per numerator. List of geographies and tables in error.
n_fdw_create_status character varying(1) RIF numerator foreign data wrappers table create status: C (Created, no errors), E(Created, errors in test SELECT), N(Not created, errors).
n_fdw_error_message character varying(300) RIF numerator foreign data wrappers table error message when create status is: E(Created, errors in test SELECT, N(Not created, errors).
n_fdw_date_created timestamp without time zone RIF numerator foreign data wrappers table date FDW table created (or attempted to be).
n_fdw_rowtest_passed smallint RIF numerator foreign data wrappers table SELECT rowtest passed (0/1).
sahsuland=> select * from rif40_num_denom_errors;
- geography | numerator_owner |   numerator_table    | is_numerator_resolvable | n_num_denom_validated | numerator_description | denominator_owner | denominator_table | is_denominator_resolvable | d_num_denom_validated | denominator_description | automatic | auto_indirect_error_flag | auto_indirect_error | n_fdw_create_status | n_fdw_error_message | n_fdw_date_created | n_fdw_rowtest_passed
-----------+-----------------+----------------------+-------------------------+-----------------------+-----------------------+-------------------+-------------------+---------------------------+-----------------------+-------------------------+-----------+--------------------------+---------------------+---------------------+---------------------+--------------------+----------------------
 SAHSULAND | rif_data        | NUM_SAHSULAND_CANCER |                       1 |                     1 | cancer numerator      | rif_data          | POP_SAHSULAND_POP |                         1 |                     1 | population health file  |         1 |                        0 |                     |                     |                     |                    |
 USA_2014  | rif_data        | NUM_SAHSULAND_CANCER |                       1 |                     0 | cancer numerator      | rif_data          | POP_SAHSULAND_POP |                         1 |                     0 | population health file  |         1 |                        0 |                     |                     |                     |                    |
(2 rows)

In this case the error is not an error as n_num_denom_validated and d_num_denom_validated are both 0 so the denominator and numerator valid for this geography:

geography numerator_owner numerator_table is_numerator_resolvable n_num_denom_validated numerator_description denominator_owner denominator_table is_denominator_resolvable d_num_denom_validated denominator_description automatic auto_indirect_error_flag auto_indirect_error
SAHSULAND rif_data NUM_SAHSULAND_CANCER 1 1 cancer numerator rif_data POP_SAHSULAND_POP 1 1 population health file 1 0  
USA_2014 rif_data NUM_SAHSULAND_CANCER 1 0 cancer numerator rif_data POP_SAHSULAND_POP 1 0 population health file 1 0  

In both these case you need to check the grants on the tables: Viewing your user setup

Check that the RIF can deduce the outcome columns for the numerator tables

Check that the ICD column (HES_DIAG) is visible in rif40_numerator_outcome_columns for tables you have loaded, e.g.V_HES_201617_APR2019:

sahsuland=> select * from rif40_numerator_outcome_columns;
geography |      table_name      |            table_description                                 | outcome_group_name | outcome_type | outcome_group_description |  field_name  | multiple_field_count | columnn_exists |  column_comment
-----------+----------------------+--------------------------------------------------------------+--------------------+--------------+---------------------------+--------------+----------------------+----------------+-------------------
SAHSULAND | NUM_SAHSULAND_CANCER | cancer numerator                                             | SAHSULAND_ICD      | ICD          | SAHSULAND SingleICD      | ICD           |                    0 | t              | ICD code field
EWS2011   | V_HES_201617_APR2019 | England 2011 Census boundaries HES Inpatients view 2016-2017 | HES_DIAG           | ICD          | UK diag_01               | diag_01       |                    0 | t              | 
(2 rows)

Remote Data Links

The RIF can access data through remote data links with the following limitations:

SQL Server to Oracle

This is technically possible but was not tested.

Postgres to Oracle

The following steps are required to connect an Oracle table to a Postgres RIF database:

Setting up

  1. Create roles and accounts in Oracle using SQL*Plus and SYSTEM, and test:
  2. Setup Oracle FDW as Postgres using psql

  3. Test the foreign data wrapper functionality

Diagnostics

The information will also be available in various pg_ views.

HES Example

The objective of this example is to test access to some HES test data in Oracle. As a result of this test it was decided to advise the use of materialize views to provide for a local copy of the data.

Peter Hambly, April 2019