ISD linked database

 

Introduction

The ISD linked database holds Scotland's SMR01 records of acute specialty day case and inpatient discharges from hospital since January 1981, cancer registrations for patients diagnosed since January 1980, National Records of Scotland (NRS) death registrations from January 1980, and mental health (SMR04) admissions from January 1981. Records for an individual are linked together, allowing a more comprehensive overview of aspects of their health and NHS healthcare.

Uses of the linked database

The database is an extremely useful tool for looking at patient pathways and follow-up, such as readmission to hospital and survival. Its public health uses also include looking at co-morbidity and relationships between diseases. In addition, it has the potential to be used to estimate, for a wide range of diseases involving hospital admission and/or death:

  • the incidence (rate of occurrence of new cases in the population over a specified period)
  • the (point) prevalence (number of cases as a percentage of the population at a given point in time).

Incidence

To estimate the incidence of a particular disease, a count is made during a defined period of:

  • 'new' admissions for the disease (i.e. records for people with no previous admission record for the disease); plus
  • deaths from the disease for people with no admission record for the disease.

However, it is important to recognise two key limitations of this approach:

  • It will tend to underestimate the true incidence of the disease in the population, since not all cases are hospitalised. The degree of underestimation depends on how likely patients with a particular disease are to be hospitalised, and this may also change over time as clinical practice changes.
  • The lack of records prior to 1980/1981 means that existing cases will be erroneously counted as new, particularly in the years immediately after the database started in 1980/1981, when the incidence estimates will be inflated. This effect decreases over time and eventually disappears, but it may be responsible for incidence estimates apparently decreasing over time. To avoid this, it is recommended that a constant look-back period be used, that is, the same number of years of prior records is checked for every case, without extending back prior to 1980/1981.
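
The counting rule and the constant look-back period can be sketched in Python as follows. This is a minimal illustration, not the method used by ISD: the record layout (person identifier, event type, event date), the function names and the sample data are all hypothetical, and the events are assumed to have been filtered already to the disease of interest.

```python
from datetime import date


def years_before(d, years):
    """Return the date `years` years before d (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def count_incident_cases(events, period_start, period_end, lookback_years):
    """Count people whose first event in the period (admission or death)
    has no admission for the disease in the preceding look-back window."""
    by_person = {}
    for person_id, kind, event_date in events:
        by_person.setdefault(person_id, []).append((event_date, kind))

    count = 0
    for person_events in by_person.values():
        person_events.sort()
        in_period = [e for e in person_events
                     if period_start <= e[0] <= period_end]
        if not in_period:
            continue
        first_date = in_period[0][0]
        window_start = years_before(first_date, lookback_years)
        has_prior_admission = any(
            kind == "admission" and window_start <= d < first_date
            for d, kind in person_events
        )
        if not has_prior_admission:
            count += 1  # each person is counted at most once
    return count


# Hypothetical linked events for one disease: (person_id, kind, date).
events = [
    (1, "admission", date(1995, 3, 1)),   # first ever admission
    (2, "admission", date(1993, 6, 1)),   # earlier admission, 2 years before
    (2, "admission", date(1995, 5, 1)),
    (3, "death", date(1995, 7, 1)),       # death with no admission record
    (4, "admission", date(1987, 1, 1)),   # earlier admission, 8 years before
    (4, "admission", date(1995, 9, 1)),
    (5, "admission", date(1995, 2, 1)),   # admitted, then died in same year
    (5, "death", date(1995, 8, 1)),
]

five_year = count_incident_cases(events, date(1995, 1, 1), date(1995, 12, 31), 5)
ten_year = count_incident_cases(events, date(1995, 1, 1), date(1995, 12, 31), 10)
```

In this sample, person 4 is counted as a new case with a five-year look-back but not with a ten-year look-back, which is why the same look-back length needs to be applied to every year being compared.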

Prevalence

Similarly, if using the linked database to estimate the prevalence of a disease (by counting people still living on a certain date who have at some point been hospitalised with the disease) care must be taken for the following reasons:

  • Anyone living with a particular disease who has not been hospitalised for it will not be included, and therefore prevalence will be underestimated.
  • The effect of excluding cases admitted before 1981 is that prevalence may appear to increase over time. It is recommended that a constant look-back period (which does not extend back prior to 1981) be used to avoid this.
  • There is an assumption that no-one is cured from the disease, which will be inappropriate for some diseases and will inflate the prevalence estimate.
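
A point prevalence estimate of this kind might be computed along the following lines. This is a sketch under stated assumptions only: the data layout, the function names, the sample values and the population figure are hypothetical, and, as noted above, it assumes that nobody is cured.

```python
from datetime import date


def years_before(d, years):
    """Return the date `years` years before d (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def point_prevalence(admissions, deaths, index_date, lookback_years, population):
    """Percentage of the population alive on index_date who were admitted
    with the disease within the constant look-back window.

    admissions: iterable of (person_id, admission_date) for the disease
    deaths:     dict mapping person_id to date of death (any cause)
    """
    window_start = years_before(index_date, lookback_years)
    cases = {pid for pid, admitted in admissions
             if window_start <= admitted <= index_date}
    alive = {pid for pid in cases
             if pid not in deaths or deaths[pid] > index_date}
    return 100.0 * len(alive) / population


# Hypothetical sample data.
admissions = [
    (1, date(1995, 1, 1)),
    (2, date(1996, 1, 1)),
    (3, date(1985, 1, 1)),   # outside a 10-year look-back from 2000
    (4, date(1999, 6, 1)),
]
deaths = {2: date(1998, 1, 1), 4: date(2001, 1, 1)}

prevalence = point_prevalence(admissions, deaths, date(2000, 1, 1), 10, 1000)
```

Here persons 1 and 4 are counted: person 2 died before the index date and person 3 falls outside the look-back window.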

Technical notes

The potential for bringing records together on a patient basis was first outlined in 1968. The basis of forming a linked data set is the comparison of two records and the decision as to whether or not they relate to the same person. However, because of errors in recording, exact matching could miss many true links. To allow for imperfections in the data, the Scottish Record Linkage System uses methods of probability matching which have been developed over the past 35 years. A computer matching algorithm calculates a score for each pair of records that are compared, reflecting the odds that they belong to the same person. The overall score is the sum of scores derived from the comparison of each item of identifying information, weighted according to the rarity of the information (e.g. the initial Z has a high weight). Similarly, negative weightings are applied according to the level of disagreement between items. The following identifying information is used:

  • Surname (and its phonetic code to overcome differences in spelling)
  • First initial (also full forename and second initial when available)
  • Sex
  • Year, month and day of birth
  • Postcode
  • Date of death, if available
  • Patient identifiers: Hospital Case Reference Number, CHI/UPI and NHS Number, where available
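
The additive scoring described above can be illustrated with a small sketch in the style of standard probabilistic record linkage, where each item contributes a log-likelihood-ratio weight. Everything specific here is an assumption made for illustration: the probabilities, the value frequencies and the function names are invented and are not those of the Scottish Record Linkage System, and disagreement is treated as all-or-nothing rather than graded by level.

```python
import math

# Hypothetical m-probabilities: P(item agrees | same person).
M_PROB = {"surname_soundex": 0.95, "first_initial": 0.95, "sex": 0.99,
          "date_of_birth": 0.97, "postcode": 0.70}

# Hypothetical default u-probabilities: P(item agrees | different people).
U_PROB = {"surname_soundex": 0.01, "first_initial": 0.05, "sex": 0.50,
          "date_of_birth": 0.0001, "postcode": 0.001}

# Hypothetical value-specific frequencies: a rare value is unlikely to
# agree by chance, so agreement on it earns a higher weight.
VALUE_FREQ = {"first_initial": {"J": 0.12, "Z": 0.001}}


def item_weight(item, value_a, value_b):
    """Positive weight for agreement, negative weight for disagreement."""
    m = M_PROB[item]
    if value_a == value_b:
        u = VALUE_FREQ.get(item, {}).get(value_a, U_PROB[item])
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - U_PROB[item]))


def pair_score(rec_a, rec_b):
    """Sum the item weights; items missing from either record add nothing."""
    score = 0.0
    for item in M_PROB:
        a, b = rec_a.get(item), rec_b.get(item)
        if a is None or b is None:
            continue
        score += item_weight(item, a, b)
    return score


rec_1 = {"surname_soundex": "M235", "first_initial": "Z", "sex": "F",
         "date_of_birth": "1950-04-12", "postcode": "EH1 1AA"}
rec_2 = dict(rec_1, postcode="G1 1AA")          # same person, moved house
rec_3 = dict(rec_1, date_of_birth="1962-11-30", sex="M")

score_same = pair_score(rec_1, rec_1)
score_moved = pair_score(rec_1, rec_2)
score_other = pair_score(rec_1, rec_3)
```

With these invented values, agreement on the initial Z adds more to the score than agreement on the initial J would, and each disagreement pulls the total down.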

The phonetic code used is a bespoke version of the Soundex coding system which reduces a name to a code consisting of the leading letter followed by 3 digits. A Soundex weight is assigned to each code, reflecting the rarity of the Soundex code throughout the Scottish population, with a maximum weighting of 15.00. The Soundex weight is used by the computer matching algorithm in the calculation of the comparison scores.
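
For orientation, the sketch below implements the standard (American) Soundex rules, not the bespoke version described above, whose details are not given here. The weight function is likewise an assumption: only the cap of 15.00 comes from the text, and the log-of-inverse-frequency form and the frequencies passed to it are hypothetical.

```python
import math

# Standard Soundex digit groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Standard Soundex: leading letter followed by three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for c in letters[1:]:
        if c in "HW":
            continue            # H and W do not separate equal codes
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code             # a vowel resets the previous code
    return (first + "".join(digits) + "000")[:4]


def soundex_weight(frequency, cap=15.0):
    """Hypothetical rarity weight: rarer codes score higher, capped at 15."""
    return min(cap, math.log2(1.0 / frequency))


code_a = soundex("Macdonald")
code_b = soundex("McDonald")
```

Both spellings reduce to the same code, which is how a phonetic code overcomes differences in the spelling of a surname.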

The records are blocked on (1) surname Soundex, first initial and sex, and (2) date of birth. A full comparison is only carried out for pairs of records which agree on at least one of the blocking criteria. The threshold (that is, the score at which the decision to link is made) is determined by clerical checking of a sample of records.
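
Blocking on two alternative criteria can be sketched as below. The field names, record layout and sample records are hypothetical; the point is only that a pair becomes a candidate for full comparison when it shares either blocking key, and that a pair sharing both keys is still compared once.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records):
    """Return the set of record-id pairs that agree on either blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        name_key = ("name", rec["surname_soundex"],
                    rec["first_initial"], rec["sex"])
        dob_key = ("dob", rec["date_of_birth"])
        blocks[name_key].append(rec["id"])
        blocks[dob_key].append(rec["id"])

    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))   # the set removes pairs found in both blocks
    return pairs


# Hypothetical records.
records = [
    {"id": 1, "surname_soundex": "M235", "first_initial": "J", "sex": "M",
     "date_of_birth": "1950-04-12"},
    {"id": 2, "surname_soundex": "M235", "first_initial": "J", "sex": "M",
     "date_of_birth": "1950-04-21"},   # shares the name block with record 1
    {"id": 3, "surname_soundex": "S530", "first_initial": "A", "sex": "F",
     "date_of_birth": "1950-04-21"},   # shares the date of birth with record 2
    {"id": 4, "surname_soundex": "B650", "first_initial": "K", "sex": "F",
     "date_of_birth": "1971-09-02"},   # shares nothing, never compared
]

pairs = candidate_pairs(records)
```

Each candidate pair would then be scored in full and linked only if its score reaches the chosen threshold.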

An independent check of the quality of linkages carried out by the Scottish record linkage team found a false positive (incorrect link) rate of 3.7% and a false negative (missed link) rate of 1.9% between two incidence databases (3077 subjects). In that analysis, the rates were higher for non-postcoded data (4.2% false positive and 2.4% false negative). The independent analysis was based on 'clinical' events, and the rates would be lower if transfers and additional treatments were included.