- How many orthopedic, neurological, and laboratory tests did you learn in your professional training?
- How many of them were simply designed by a well-intentioned individual but have never been tested for reliability or validity?
- What about my individual patient?
- Do I perform every test I learned for a given condition or are certain tests better suited to my patient?
- After performing the tests, how certain will I be regarding the final diagnosis?
- When I read an article on the value of a diagnostic test, what do all those terms mean?
- How do I make sense of them and come to a conclusion about using the test?

Below are descriptions of how these concepts are applied. For definitions of terms, click here.

Reliability

Has the test been evaluated with more than one examiner, and did the examiners agree on the test results?

Regardless of what you are testing for, will multiple examiners get the same results using the same testing procedure?

If examiners A, B, and C arrive at different conclusions using the same test on the same patients (inter-examiner reliability), or if each examiner gets different test results on repeated testing of the same patients (intra-examiner reliability), the test is unreliable.

Agreement is usually reported as a kappa statistic. In essence, kappa measures the proportion of potential agreement beyond chance that was actually reached. Correcting for chance is important for many reasons. One is that the likelihood of finding a positive test result is higher when the prevalence of a given disorder is high, so examiners will agree on many positive results by chance alone.

Commonly used interpretation of kappa values (agreement beyond chance):

0-0.2 - Slight

0.2-0.4 - Fair

0.4-0.6 - Moderate

0.6-0.8 - Substantial

0.8-1.0 - Excellent (almost perfect)
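
As a rough illustration of how kappa corrects for chance, the sketch below computes Cohen's kappa for two examiners who each rate the same patients as positive or negative. The function name and the counts are hypothetical, chosen only for illustration; they do not come from any study.

```python
def cohens_kappa(both_pos, a_only_pos, b_only_pos, both_neg):
    """Cohen's kappa for two examiners giving positive/negative results."""
    total = both_pos + a_only_pos + b_only_pos + both_neg
    # Observed agreement: proportion of patients rated the same by both.
    observed = (both_pos + both_neg) / total
    # Chance agreement, from each examiner's overall rate of positives.
    a_pos_rate = (both_pos + a_only_pos) / total
    b_pos_rate = (both_pos + b_only_pos) / total
    expected = a_pos_rate * b_pos_rate + (1 - a_pos_rate) * (1 - b_pos_rate)
    # Agreement actually reached beyond chance, out of what was possible.
    # Undefined when expected == 1 (both examiners always give one result).
    return (observed - expected) / (1 - expected)


# Hypothetical data: 100 patients, the examiners agree on 80 of them.
kappa = cohens_kappa(both_pos=40, a_only_pos=10, b_only_pos=10, both_neg=40)
print(round(kappa, 2))  # 0.6
```

In this made-up example the raw agreement is 80%, but 50% agreement would be expected by chance alone, so kappa comes out at about 0.6.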

Rule-In or Rule-Out

The next obvious question:
Is this test going to help me rule-in or rule-out a disorder/condition?

As always, that depends, but generally it is first based on whether the test was ever compared to a known reference, a gold standard. For many orthopedic conditions, visualization during a surgical procedure is the gold standard.

If there is no gold standard, it will be difficult to determine the value of the test (see construct validity). If there is one, we start with sensitivity and specificity. Helpful reminders are the acronyms SpPIN and SnNOUT.

Sp(P)IN - If the Specificity of a given test is high, a POSITIVE test result will more likely rule IN the disorder.

Sn(N)OUT - If the Sensitivity of a given test is high, a NEGATIVE test result will more likely rule OUT the disorder.
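
To make the two terms concrete, here is a minimal sketch of how sensitivity and specificity are calculated when a test is compared against a gold standard. The function names and the patient counts are hypothetical and used only for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of patients WITH the disorder who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of patients WITHOUT the disorder who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical study: the gold standard confirms the disorder in 50
# patients (45 test positive) and excludes it in 100 (80 test negative).
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=80, false_pos=20)
print(sens, spec)  # 0.9 0.8
```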

Likelihood Ratios

Summarizing the information from sensitivity and specificity into what are called likelihood ratios (LRs) allows for another use of these measures. When you read an article that reports likelihood ratios (often in the abstract), they can generally be interpreted as follows:

- a positive LR of 10 or higher indicates that a positive result for that test more likely rules IN the disorder
- a negative LR of 0.1 or less indicates that a negative result for that test more likely rules OUT the disorder
- LRs close to 1.0 provide little discrimination in ruling in or out a disorder
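
If an article reports only sensitivity and specificity, the likelihood ratios can be worked out from them using the standard formulas. The sketch below shows the arithmetic; the function names and the example values (sensitivity 0.9, specificity 0.8) are assumptions for illustration only.

```python
def positive_lr(sens, spec):
    """LR+ : how much more likely a positive result is in patients
    with the disorder than in patients without it."""
    return sens / (1 - spec)


def negative_lr(sens, spec):
    """LR- : how much more likely a negative result is in patients
    with the disorder than in patients without it."""
    return (1 - sens) / spec


# Hypothetical test with sensitivity 0.9 and specificity 0.8.
lr_pos = positive_lr(0.9, 0.8)
lr_neg = negative_lr(0.9, 0.8)
print(round(lr_pos, 2), round(lr_neg, 3))  # 4.5 0.125
```

By the thresholds listed above, this hypothetical test would fall short of both cut-offs: its positive LR is below 10 and its negative LR is above 0.1.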

Putting It All Together for Interpretation

By utilizing the LRs with pre-test probability, an estimation of post-test probability is possible.

How do you determine pre-test probability? Prevalence.

What is the prevalence of a given disorder (the proportion of patients in the sample population who have the disorder), and how comparable is that prevalence to your setting? For example, a study may find that the prevalence of meniscus tears in the orthopedist's office where the study was performed is high. The prevalence in your office may be different, depending on the type of practice you have.

This is tricky. If the prevalence in your setting is lower, the predictive value of a positive test will also be lower, sometimes dramatically so. Conversely, the higher the prevalence, the higher the predictive value of a positive result.
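
The effect of prevalence can be seen with a little arithmetic. The sketch below calculates the positive predictive value (PPV) of the same test at three prevalences. The function name and all numbers (sensitivity and specificity of 0.9, prevalences of 50%, 20%, and 5%) are hypothetical.

```python
def positive_predictive_value(sens, spec, prevalence):
    """Probability that a patient with a positive test has the disorder."""
    # Share of all patients who have the disorder and test positive.
    true_pos = sens * prevalence
    # Share of all patients who do not have it but still test positive.
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical test (sensitivity 0.9, specificity 0.9), three settings.
for prevalence in (0.50, 0.20, 0.05):
    ppv = positive_predictive_value(0.9, 0.9, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

With these assumed values, the PPV falls from about 90% at a prevalence of 50% to about 69% at 20% and about 32% at 5%, even though the test itself has not changed.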

There are two standard methods to determine post-test probability (this is what you need to make decisions). One is mathematical; the other uses what is called a nomogram. Click here for a functioning nomogram.

There are three columns in the nomogram: the pre-test probability, the likelihood ratio, and the post-test probability. If you know the first two, you simply draw a straight line through these known values, extend it to the post-test probability column, and read off a percentage. The higher the percentage, the more sure you can be that the patient has the disorder; the lower the percentage, the more sure you can be that the patient does not.
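
The mathematical method gives the same answer as the nomogram: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. The sketch below shows those steps; the function name and the example values (a pre-test probability of 30% with LRs of 10 and 0.1) are assumptions for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Post-test probability from pre-test probability and an LR."""
    # Step 1: probability -> odds.
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    # Step 2: apply the likelihood ratio of the test result obtained.
    post_test_odds = pre_test_odds * likelihood_ratio
    # Step 3: odds -> probability.
    return post_test_odds / (1 + post_test_odds)


# Hypothetical patient with a 30% pre-test probability.
print(round(post_test_probability(0.30, 10), 2))   # positive test, LR+ 10  -> 0.81
print(round(post_test_probability(0.30, 0.1), 2))  # negative test, LR- 0.1 -> 0.04
```

In this example a positive result on a test with a positive LR of 10 raises the probability from 30% to about 81%, while a negative result on a test with a negative LR of 0.1 lowers it to about 4%.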

For the pre-test probability (what you think before you perform the test), you need to know the prevalence of the disorder you are testing for.

Pre-test probabilities can be obtained through a literature search, from the article you are reading, or from your own estimate of the likelihood that your patient has the disorder. That estimate is based on your patient base and on positive findings from the history or other tests, which raise the likelihood for that given patient.

Examples of some known prevalence data:

- idiopathic scoliosis in 10 to 16 year-olds - 2-4%
- peripheral neuropathy in the elderly - 8%
- persistent asthma in children - 9%