Patient Health Questionnaire Somatic Symptom Severity Scale (PHQ-15)

In the mid-1990s, the Patient Health Questionnaire (PHQ) was developed and validated as a shorter, self-administered version of the Primary Care Evaluation of Mental Disorders (PRIME-MD). The PHQ was developed by Robert Spitzer, Janet Williams, Kurt Kroenke and colleagues at Columbia University. A large study found that the PHQ had diagnostic validity comparable to the original clinician-administered PRIME-MD and was more efficient in clinical practice (Spitzer et al., 1999). The Patient Health Questionnaire Somatic Symptom Severity Scale (PHQ-15) is a brief, self-administered questionnaire derived from the full PHQ. It is increasingly used to assess somatic symptom severity and to screen for the potential presence of somatisation and somatoform disorders (based on DSM-IV criteria) in adults (Kroenke et al., 2002). The scale consists of 15 items that ask how much the respondent has been bothered by somatic symptoms, such as stomach pain or dizziness, within the last 4 weeks (response categories of “not bothered at all”, “bothered a little” and “bothered a lot”). PHQ-15 scores of 5, 10, and 15 represent cut-off points for low, medium, and high somatic symptom severity, respectively (Spitzer et al., 1994).
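The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not an official scoring routine: the numeric coding of the three response categories (0, 1, 2) and the label "minimal" for totals below the first cut-off are assumptions, and the function names are hypothetical.

```python
from typing import Sequence

# Assumed numeric coding of the three response categories.
RESPONSE_VALUES = {
    "not bothered at all": 0,
    "bothered a little": 1,
    "bothered a lot": 2,
}


def phq15_total(responses: Sequence[str]) -> int:
    """Sum the 15 item responses into a total score (0-30 under the assumed coding)."""
    if len(responses) != 15:
        raise ValueError("PHQ-15 expects exactly 15 item responses")
    return sum(RESPONSE_VALUES[r] for r in responses)


def phq15_severity(total: int) -> str:
    """Map a total score onto the bands implied by the 5/10/15 cut-off points."""
    if not 0 <= total <= 30:
        raise ValueError("total must be between 0 and 30")
    if total >= 15:
        return "high"
    if total >= 10:
        return "medium"
    if total >= 5:
        return "low"
    return "minimal"  # assumed label for scores below the first cut-off


# Example: five symptoms rated "a lot", two rated "a little", eight absent.
example = (
    ["bothered a lot"] * 5
    + ["bothered a little"] * 2
    + ["not bothered at all"] * 8
)
example_total = phq15_total(example)
example_band = phq15_severity(example_total)
```

A band is only a screening indication; it does not replace clinical assessment.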

Psychometric Properties

The PHQ-15 has been validated in different clinical and occupational populations (De Vroege et al., 2012; Kroenke et al., 2010). With a cut-off score of 6 or more, the sensitivity of the PHQ-15 was 78% (true-positive rate) and the specificity was 71% (true-negative rate). The negative predictive value of 97% indicates that only 3% of individuals who score less than 6 will have a somatoform disorder (Van Ravesteijn et al., 2009). The PHQ-15 showed positive convergent validity with the Beck Depression Inventory (BDI) and the General Health Questionnaire-12 (GHQ-12). Increasing scores on the PHQ-15 are strongly associated with increased functional impairment, disability, health care use and symptom-related difficulty (Han et al., 2009; Kroenke et al., 2002). The PHQ-15 demonstrates acceptable internal consistency (Cronbach coefficient alpha of .80) (Kroenke et al., 2002; Van Ravesteijn et al., 2009; Kroenke et al., 1998). The PHQ-15 has moderate test-retest reliability (intraclass correlation coefficient of 0.83) with a 2-week interval (Van Ravesteijn et al., 2009).
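To make the screening statistics concrete, the sketch below computes sensitivity, specificity and negative predictive value from a 2x2 table. The counts are hypothetical (a sample of 1,000 with 100 true cases), chosen only so that the resulting rates resemble those reported above; they are not the data from the cited study.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return basic screening statistics from the four cells of a 2x2 table."""
    return {
        # Proportion of true cases the screen detects (true-positive rate).
        "sensitivity": tp / (tp + fn),
        # Proportion of non-cases the screen correctly clears (true-negative rate).
        "specificity": tn / (tn + fp),
        # Proportion of negative screens that are truly non-cases.
        "npv": tn / (tn + fn),
        # Proportion of positive screens that are truly cases.
        "ppv": tp / (tp + fp),
    }


# Hypothetical counts: 100 cases (78 detected, 22 missed),
# 900 non-cases (639 cleared, 261 flagged).
metrics = screening_metrics(tp=78, fp=261, tn=639, fn=22)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts, the positive predictive value is far lower than the negative predictive value, which is typical of a screen applied where the condition is uncommon.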

The reliability and validity of the PHQ-15 are unaffected by pertinent individual difference factors such as age, gender and education (Kroenke et al., 2010; Kocalevent et al., 2013; Han et al., 2009; Liao et al., 2016). The PHQ-15 has been translated into over 20 languages (Spitzer et al., 1994). The scale has been validated in Korean and Chinese populations; however, it does not perform well in Hispanic populations. This could be due to multiple factors within the cultural context that may affect how individuals identify and classify bodily sensations, perceive illness and seek medical attention (APA, 2013; Interian et al., 2006; Han et al., 2009; Liao et al., 2016). East Asian populations often complain of somatic symptoms rather than reveal any depressive feelings. This is important for clinical practice, as somatoform disorders have considerable comorbidity with anxiety and depressive disorders, which the PHQ-15 does not screen for (Han et al., 2009).

Clinical utility

Overall, the PHQ-15 is a valid and reliable screening tool for the presence and severity of somatic symptoms. The main DSM-IV criterion for somatoform disorder was medically unexplained symptoms, whereas the DSM-5 emphasises distress (APA, 2013). The PHQ-15 can therefore be aligned with the DSM-5 criteria, as the scale screens for severity and distress (Liao et al., 2016). More research is needed to support the PHQ-15 as a measure of responsiveness to change during treatment of individuals with somatoform disorders (Kroenke et al., 2010). It is important to note that the PHQ-15 is a self-report scale and is therefore susceptible to reporting biases. Elevated neuroticism or negative affectivity may lead to inflated symptom reporting (Watson & Pennebaker, 1989). The main strengths of the scale are that it is easy to use (for clinician and client), free, and validated in different clinical and occupational populations. It has shown good sensitivity and specificity in screening for somatoform disorders; however, it is not a diagnostic tool but rather an indicator that an individual is at risk (Kroenke et al., 2010). Further, the scale addresses current rather than previous symptoms to gain more valid and reliable data (Kroenke et al., 2010).

Link to free version of PHQ-15


American Psychiatric Association (APA). (2013). Diagnostic and statistical manual of mental disorders: DSM-5. Washington, D.C: American Psychiatric Association.

Spitzer, R., Williams, J., & Kroenke, K. (1994). Instructions for Patient Health Questionnaire (PHQ) and GAD-7 Measures (pp. 1-9). Retrieved from

Spitzer, R., Kroenke, K., & Williams, J. (1999). Validation and Utility of a Self-report Version of PRIME-MD. The Patient Health Questionnaire Primary Care Study Group. JAMA, 282(18), 1737–1744. doi:10.1001/jama.282.18.1737

Kroenke, K., Spitzer, R., & Williams, J. (2002). The PHQ-15: Validity of a New Measure for Evaluating the Severity of Somatic Symptoms. Psychosomatic Medicine, 64(2), 258-266.

Kroenke, K., Spitzer, R., Williams, J., & Löwe, B. (2010). The Patient Health Questionnaire Somatic, Anxiety, and Depressive Symptom Scales: a systematic review. General Hospital Psychiatry, 32(4), 345-359.

Han, C., Pae, C., Patkar, A., Masand, P., Woong Kim, K., Joe, S., & Jung, I. (2009). Psychometric Properties of the Patient Health Questionnaire–15 (PHQ–15) for Measuring the Somatic Symptoms of Psychiatric Outpatients. Psychosomatics, 50(6), 580-585.

Liao, S., Huang, W., Ma, H., Lee, M., Chen, T., Chen, I., & Gau, S. (2016). The relation between the patient health questionnaire-15 and DSM somatic diagnoses. BMC Psychiatry, 16(1).

Van Ravesteijn, H., Wittkampf, K., Lucassen, P., van de Lisdonk, E., van den Hoogen, H., van Weert, H., et al. (2009). Detecting Somatoform Disorders in Primary Care With the PHQ-15. The Annals of Family Medicine, 7(3), 232-238.

Interian, A., Allen, L., Gara, M., Escobar, J., & Díaz-Martínez, A. (2006). Somatic Complaints in Primary Care: Further Examining the Validity of the Patient Health Questionnaire (PHQ-15). Psychosomatics, 47(5), 392-398.

Kocalevent, R., Hinz, A., & Brähler, E. (2013). Standardization of a screening instrument (PHQ-15) for somatization syndromes in the general population. BMC Psychiatry, 13(1).

De Vroege, L., Hoedeman, R., Nuyen, J., Sijtsma, K., & van der Feltz-Cornelis, C. (2012). Erratum to: Validation of the PHQ-15 for Somatoform Disorder in the Occupational Health Care Setting. Journal of Occupational Rehabilitation, 22(4), 590-590.

Watson, D., & Pennebaker, J. W. (1989). Health complaints, stress, and distress: Exploring the central role of negative affectivity. Psychological Review, 96, 234–254.


Florida Obsessive-Compulsive Inventory (FOCI)

The Florida Obsessive-Compulsive Inventory (FOCI) is a free-to-use measure of the number of obsessive-compulsive disorder (OCD) symptoms present and of their severity [1]. It was initially developed in 2007 by researchers at the University of Florida [1, 2]. It was based on the Yale-Brown Obsessive-Compulsive Scale-Self Report (Y-BOCS-SR), which was considered the gold standard at the time and was the only other self-report measure for OCD. However, the FOCI is much quicker to administer and score, taking less than 5 minutes [2].

One of the primary reasons the FOCI was developed was concern about the validity of the Y-BOCS-SR’s separate obsession and compulsion scales, which arose from factor analyses [1]. Secondly, many of the other OCD measures that clinicians had used in the past could not measure the number and severity of symptoms in a brief manner. Thus, the FOCI was developed with the Y-BOCS in mind, was reviewed by OCD experts for reliability and relevance, and was revised in consultation with a few OCD in-patients [1, 2].

Psychometric Properties

Once the final FOCI was developed, its psychometric properties were evaluated in 113 patients who had been diagnosed with OCD (using DSM-III-R or DSM-IV criteria) at least one year earlier. It has since been found to have good internal consistency (α = 0.89), adequate reliability (K-R 20 = 0.83) for the SC, and a high correlation with the Y-BOCS-SR total score (previously considered the gold standard) [1, 2].
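For readers unfamiliar with these coefficients, the sketch below shows how Cronbach's α can be computed from a respondents-by-items table. It uses population variances throughout, in which case α on yes/no items (coded 1/0) coincides with K-R 20. The data are invented for illustration and are not FOCI data; the function name is hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows[i][j] is respondent i's score on item j."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Variance of each item across respondents.
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    # Variance of the respondents' total scores.
    total_variance = pvariance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Invented yes/no (1/0) answers from four respondents on three items.
toy_data = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
alpha = cronbach_alpha(toy_data)
```

Real reliability estimates need far larger samples than this toy table.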

In addition, internal consistency is supported by the moderate correlations between the two parts of the measure (SC and SS, rs < 0.45). The FOCI has also been shown to correlate with other measures, such as the DASS and the Hamilton Depression Rating Scale (depression/anxiety) and the Clinical Global Impression Scale (psychopathology severity) [1, 2].


The FOCI contains two parts: 1) the symptom checklist (SC) and 2) the severity scale (SS). The SC measures the number of symptoms present: for each item on a 20-item list of common symptoms, the individual circles either “yes” (present) or “no” (not present) (range 0–20; 10 each of obsessions and compulsions). If there is more than one “yes”, the client completes the SS on the second page, rating the severity of the symptoms identified on the SC. The clinician adds up the total, and a score of 8+ indicates possible OCD traits. The clinician can also average the scores over the SS to find an overall severity. The SS measures the severity of the identified symptoms as a whole, not of individual symptoms [1, 2].
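The scoring steps above can be sketched as follows. This is an illustrative outline only: the number of severity items and their rating range are not stated above, so the function accepts any list of non-negative integer ratings, and the function and field names are hypothetical.

```python
from typing import Sequence


def foci_scores(checklist: Sequence[bool], severity: Sequence[int]) -> dict:
    """Summarise a completed FOCI: symptom count, severity total and mean."""
    if len(checklist) != 20:
        raise ValueError("the symptom checklist has 20 yes/no items")
    if any(rating < 0 for rating in severity):
        raise ValueError("severity ratings must be non-negative")
    sc_count = sum(1 for answer in checklist if answer)  # number of "yes" answers
    ss_total = sum(severity)
    ss_mean = ss_total / len(severity) if severity else 0.0
    return {
        "sc_count": sc_count,
        "ss_total": ss_total,
        "ss_mean": ss_mean,
        # A severity total of 8 or more indicates possible OCD traits.
        "possible_ocd": ss_total >= 8,
    }


# Example: six symptoms endorsed, hypothetical severity ratings.
result = foci_scores([True] * 6 + [False] * 14, [2, 2, 1, 2, 2])
```

A flagged result is a prompt for a clinical interview, not a diagnosis.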

Cultural issues

There do not appear to be any gender, cultural or age issues at this stage of research. The measure has been adapted into a child version (C-FOCI), which has been translated into Spanish, and the adult version has been translated into Thai and Chinese. All versions developed to date have psychometric properties similar to those of the adult English version [3–6].

Critical analysis

While the FOCI cannot measure the severity of individual symptoms, it does measure the severity of the impact of the symptoms on the client. It cannot, for example, determine the severity of contamination concerns versus the severity of avoiding certain numbers, but it can determine the severity of the time consumed by the behaviours.

One other issue with the FOCI is that there is no option to add extra symptoms to the list, and the list is not exhaustive. However, the list does include the most common obsessions and compulsions that occur in OCD clients. Because the FOCI is a self-report measure, the client may indicate additional symptoms in another way (such as writing their own). Alternatively, because the FOCI should be followed by a clinical interview, these symptoms can be brought to the clinician’s attention on deeper analysis.

The FOCI has good, established sensitivity to change and is therefore a useful tool for determining the success or failure of treatment interventions over time. There are no known issues with using the measure multiple times with the same client. Because it is quick to complete and easy to score, the FOCI is preferable to longer assessments such as the Y-BOCS. However, it should be noted that the English version has not been tested across clinical and non-clinical populations, or in clinical OCD versus other clinical populations.

Finally, it is worth noting that there is a high correlation between the FOCI and measures of depression and anxiety. However, this is thought to be due to the high comorbidity of these disorders.


  1. Storch, E. A., Kaufman, D. A. S., Bagner, D., Merlo, L. J., Shapira, N. A., Geffken, G. R., Murphy, T. K., & Goodman, W. K. (2007). Florida Obsessive-Compulsive Inventory: Development, reliability and validity. Journal of Clinical Psychology, 63(9), 851 – 859. DOI: 10.1002/jclp.20382
  2. Aldea, M. A., Geffken, G. R., Jacob, M. L., Goodman, W. K., & Storch, E. A. (2009). Further psychometric analysis of the Florida Obsessive-Compulsive Inventory. Journal of Anxiety Disorders, 23, 124 – 129. DOI: 10.1016/j.janxdis.2008.05.001
  3. Saipanish, R., Hiranyatheb, T., Jullagate, S., & Lotrakul, M. (2015). A study of diagnostic accuracy of the Florida Obsessive-Compulsive Inventory – Thai version (FOCI-T). BMC Psychiatry, 15, 251 – 257. DOI: 10.1186/s12888-015-0643-2
  4. Storch, E. A., Khanna, M., Merlo, L. J., Loew, A., Franklin, M., Reid, J. M., Goodman, W. K., & Murphy, T. K. (2009). Children’s Florida Obsessive Compulsive Inventory: Psychometric properties and feasibility of a self-report measure of obsessive-compulsive symptoms in youth. Child Psychiatry & Human Development, 40, 467 – 483. DOI: 10.1007/s10578-009-0138-9
  5. Piqueras, J. A., Rodriquez-Jimenez, T., Ortiz, A. G., Moreno, E., Lazaro, L., & Storch, E. A. (2017). Factor structure, reliability and validity of the Spanish version of the Children’s Florida Obsessive-Compulsive Inventory (C-FOCI). Child Psychiatry & Human Development, 48, 166 – 179. DOI: 10.1007/s10578-016-0661-4
  6. Zhang, C. C., McGuire, J. F., Qiu, X., Jin, H., Li, Z., Cepeda, S., Goodman, W. K., & Storch, E. A. (2017). Florida Obsessive-Compulsive Inventory: Psychometric properties in a Chinese psychotherapy-seeking sample.  Journal of Obsessive-Compulsive and Related Disorders, 12, 41 – 45. DOI: 10.1016/j.jocrd.2016.11.006

Eating Disorder Diagnostic Scale (EDDS)

The Eating Disorder Diagnostic Scale (EDDS; Stice, Telch, & Rizvi, 2000) is a 22-item self-report questionnaire designed to measure Anorexia nervosa, Bulimia nervosa, and Binge-eating disorder symptomatology aligned with the DSM-IV diagnostic criteria.

The scale comprises a combination of Likert ratings, dichotomous scores, behavioural frequency scores, and open-ended questions asking for weight and height. The first four questions assess attitudinal symptoms of Anorexia and Bulimia within the past 3 months. The next four items measure the frequency of uncontrollable food consumption, with a focus on the number of days per week over the past 6 months (a criterion for Binge-eating disorder) and the number of times per week over the last 3 months (a criterion for Bulimia). The following four items measure the frequency of compensatory behaviours. Lastly, individuals are asked to record their height, weight, presence of menstrual cycles and birth control pill use.

There are two further scales used in the EDDS that differentiate between eating disorders and deviance from healthy eating pathology. The diagnostic scale may be used to inform diagnosis of Anorexia, Bulimia and Binge-eating disorders. Stice et al. (2000) have developed a scoring algorithm to accompany this scale to determine score cut-offs. The symptom composite scale may be used to create a continuous composite score of disordered eating pathology.

Psychometric Development & Validation

The EDDS went through a rigorous development and validation process with careful adherence to a number of steps. The developers first generated a pool of items to assess DSM-IV eating disorder diagnostic criteria. These items were evaluated by a panel of 14 eating disorder experts, followed by revision to eventually produce the final EDDS to test for reliability and validity against an American female sample aged 13 to 65 years inclusive of those with and without eating disorders.

Results revealed excellent 1-week test-retest reliability for Anorexia (kappa = .95), and adequate test-retest coefficients for Bulimia (kappa = .71) and Binge-eating disorder (kappa = .75). The overall symptom composite test-retest reliability was also strong (kappa = .87). Likewise, internal consistency of the overall symptom composite score was robust (Cronbach’s α = .91). These reliability magnitudes reflect Shrout’s (1998) psychometric rule-of-thumb whereby kappa values above .8 represent high reliability, values between .4 and .8 indicate moderate agreement, and values less than .4 suggest poor reliability.
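Shrout's rule of thumb, as summarised above, amounts to a simple threshold lookup. The sketch below encodes it; the function name is hypothetical, and treating a value of exactly .8 as "moderate" is an assumption, since the text only says "above .8" and "between .4 and .8".

```python
def interpret_kappa(kappa: float) -> str:
    """Classify a kappa value using the thresholds summarised above."""
    if kappa > 0.8:
        return "high"
    if kappa >= 0.4:
        return "moderate"  # boundary handling at exactly .4 and .8 is assumed
    return "poor"


# The diagnostic test-retest coefficients reported for the EDDS.
reported = {"Anorexia": 0.95, "Bulimia": 0.71, "Binge-eating disorder": 0.75}
labels = {name: interpret_kappa(value) for name, value in reported.items()}
```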

Content validity ratings from the 14 eating disorder experts indicated that items within the scale adequately reflected the DSM-IV diagnostic criteria for Anorexia, Bulimia, and Binge-eating disorder. The data also suggested that the EDDS possessed convergent validity: when participants with eating disorders were compared with their non-diagnosed control counterparts, those with eating disorders reported higher scores.

Strengths & Weaknesses

The EDDS has an abundance of strengths. It is short and quick to complete. With only 22 items, it takes only a few minutes to complete the entire instrument. It is sensitive to change over time; that is, the EDDS has the versatility of being used as a screening tool at the beginning stages of assessment, a diagnostic tool in supporting eating disorder diagnostic criteria, and lastly it may also be used for treatment monitoring and evaluation.

However, the EDDS is not without its limitations. In at least one study, the EDDS was found to generate a large number of ‘false positives’ (Lee et al., 2007), indicating a weakness in specificity. This is not necessarily a drawback: when the EDDS is used as a screening tool, it is preferable to flag some false positives than to risk missing potential cases of eating disorders. This is because eating disorders, though low in prevalence compared to other clinical disorders, have one of the highest mortality rates amongst psychiatric conditions (Arcelus, Mitchell, Wales, & Nielsen, 2011). Additional weaknesses include gender and cultural insensitivity. Gender differences in attitudes towards food consumption were found to be reinforced by differing cultural ideals, which were not adequately captured in the EDDS (Lee et al., 2007). Similarly, eating disorder pathology and risk factors were not invariant across Caucasian American women and African American women (Kelly et al., 2012).


Overall, the EDDS is a short and quick to complete self-assessment tool that is versatile to use as a screening measure, diagnostic instrument, and treatment evaluation and monitoring tool for the assessment of Anorexia, Bulimia, and Binge-eating disorder. Its tendency to detect more false positives need not necessarily be a weakness given the vulnerability of the eating disorder population as having some of the most severe mortality and prognosis rates amongst mental conditions. The lack of gender and cultural sensitivity warrants further modifications and refinements by researchers so that this tool can adequately capture the individual nuances that exist both within and between minority groups.


The EDDS is freely available following this link: Information regarding scoring and interpretation may be found here:


Arcelus, J., Mitchell, A. J., Wales, J., & Nielsen, S. (2011). Mortality rates in patients with anorexia nervosa and other eating disorders: A meta-analysis of 36 studies. Archives of General Psychiatry, 68(7), 724-731. doi:10.1001/archgenpsychiatry.2011.74

Kelly, N. R., Mitchell, K. S., Gow, R. W., Trace, S. E., Lydecker, J. A., Bair, C. E., & Mazzeo, S. (2012). An evaluation of the reliability and construct validity of eating disorder measures in white and black women. Psychological Assessment, 24(3), 608-617. doi:10.1037/a0026457

Lee, S. W., Stewart, S. M., Striegel-Moore, R. H., Lee, S., Ho, S-Y., Lee, P. W. H., …Lam, T-H. (2007). Validation of the eating disorder diagnostic scale for use with Hong Kong adolescents. International Journal of Eating Disorders, 40(6), 569-574. doi:10.1002/eat

Shrout, P. (1998). Measurement reliability and agreement in psychiatry. Statistical Methods in Medical Research, 7(3), 301-317. doi:10.1191/096228098672090967

Stice, E., Telch, C. F., & Rizvi, S. L. (2000). Development and validation of the eating disorder diagnostic scale: A brief self-report measure of anorexia, bulimia, and binge-eating disorder. Psychological Assessment, 12(2), 123-131. doi:10.1037//1040-3590.12.2.123

General Behavior Inventory (GBI)

The General Behavior Inventory (GBI), first developed by Depue et al. (1981), was designed to identify the presence and severity of depressive and manic/hypomanic symptoms, as well as to assess for cyclothymia in adults. In their attempts to explore predisposition to bipolar disorder, the authors created a behavioural paradigm to identify persons at risk. Though intended for use in an adult population, a slightly modified version of the GBI has demonstrated potential as a parent-report measure of mood symptomatology amongst children and adolescents (Youngstrom, Findling, Danielson, & Calabrese, 2001). In addition, a short version has been developed via factor analysis that allows for it to be a screening tool in both adult and adolescent populations (Youngstrom, Murray, Johnson, & Findling, 2016).

The original self-report includes three dimensions, or subscales, that comprise 73 items. Respondents use a 4-point Likert-type scale (0 = never or hardly ever; 3 = very often/almost constantly) to indicate how frequently they have experienced a behaviour over the past year. The Depression scale sums 45 of the items, whilst the Hypomanic/Biphasic scales combined sum 28 items. Questions include: “Have you become sad, depressed, or irritable for several days or more without really understanding why?” and “Has your mood or energy shifted rapidly back and forth from happy to sad or high to low?” As suggested by Depue, Krauss, and Spoont (1987), the items may be scored using a dichotomous model. This involves dividing the population into cases and non-cases, where individuals responding 0 or 1 to an item receive 0 points and those responding 2 or 3 receive 1 point. The scale may also be scored in the traditional Likert fashion, where the responses are simply summed. Whilst higher scores reflect increased psychopathology, it is important to note that the GBI is not a diagnostic tool. Research has indicated that the scales can discriminate between bipolar and disruptive behaviour disorders, unipolar and bipolar depression, and mood and disruptive behaviour disorders or no diagnosis (Danielson, Youngstrom, Findling, & Calabrese, 2003).
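The two scoring approaches described above differ only in how each 0 to 3 response is counted. The following sketch contrasts them on a handful of invented responses; the function names are hypothetical, and the assignment of items to subscales is left out because it is not given above.

```python
from typing import Sequence


def _validate(responses: Sequence[int]) -> None:
    """Ensure every response lies on the 0-3 rating scale."""
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("GBI responses must be integers from 0 to 3")


def gbi_likert_score(responses: Sequence[int]) -> int:
    """Traditional Likert scoring: sum the raw 0-3 responses."""
    _validate(responses)
    return sum(responses)


def gbi_dichotomous_score(responses: Sequence[int]) -> int:
    """Dichotomous scoring: responses of 0 or 1 earn 0 points, 2 or 3 earn 1 point."""
    _validate(responses)
    return sum(1 for r in responses if r >= 2)


# Invented responses to five items from one subscale.
sample = [0, 1, 2, 3, 3]
likert = gbi_likert_score(sample)
dichotomous = gbi_dichotomous_score(sample)
```

In practice each function would be applied separately to the items belonging to each subscale.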

The GBI has strong psychometric properties. In a recent evaluation study, it demonstrated excellent internal consistency (Cronbach’s α over .93 for both subscales; Pendergast et al., 2014). Results from the original validation study suggest the tool has good test-retest reliability (r = .73 over 15 weeks), excellent content validity, excellent construct validity, and excellent discriminative validity (Depue et al., 1981). More recent studies have found the GBI to have excellent discriminant validity (Youngstrom, Genzlinger, Egerton, & Van Meter, 2015) and good treatment sensitivity (Youngstrom et al., 2013).

Evidence has shown that gender differences do not compromise the overall psychometric properties of the GBI (Depue & Klein, 1988). However, Chmielewski and colleagues (1995) compared GBI data for African American, Asian American, Caucasian, and Latino samples and found significant cultural differences: Caucasians scored lower than all other groups. Two decades later, in a combined Caucasian and African American sample, Pendergast et al. (2015) found that GBI scores were largely invariant across racial groups.

Free access to the GBI:

Chmielewski, P. M., Fernandes, L. O., Yee, C. M., & Miller, G. A. (1995). Ethnicity and gender in scales of psychosis proneness and mood disorders. Journal of Abnormal Psychology, 104(3), 464-470.

Danielson, C. K., Youngstrom, E. A., Findling, R. L., & Calabrese, J. R. (2003). Discriminative validity of the General Behavior Inventory using youth report. Journal of Abnormal Child Psychology, 31(1), 29-39.

Depue, R. A., & Klein, D. N. (1988). Identification of unipolar and bipolar affective conditions in nonclinical and clinical populations by the General Behavior Inventory. In D. L. Dunner, E. S. Gershon, & J. E. Barrett (Eds.), Relatives at risk for mental disorders (pp. 179- 202). New York: Raven Press.

Depue, R. A., Krauss, S., & Spoont, M. R. (1987). A two-dimensional threshold model of seasonal bipolar affective disorder. In D. Magnusson & A. Ohman (Eds.), Psychopathology: An interactional perspective (pp. 95-123). New York: Academic Press.

Depue, R. A., Slater, J. F., Wolfstetter-Kausch, H., Klein, D. N., Goplerud, E., & Farr, D. A. (1981). A behavioral paradigm for identifying persons at risk for bipolar depressive disorder: A conceptual framework and five validation studies. Journal of Abnormal Psychology, 90, 381-437.

Pendergast, L. L., Youngstrom, E. A., Brown, C., Jensen, D., Abramson, L. Y., & Alloy, L. B. (2015). Structural invariance of General Behavior Inventory (GBI) scores in Black and White young adults. Psychological Assessment, 27(1), 21-30.

Pendergast, L. L., Youngstrom, E. A., Merkitch, K. G., Moore, K. A., Black, C. L., Abramson, L. Y., & Alloy, L. B. (2014). Differentiating bipolar disorder from unipolar depression and ADHD: The utility of the General Behavior Inventory. Psychological Assessment, 26(1), 195-206.

Youngstrom, E. A., Findling, R. L., Danielson, C. K., & Calabrese, J. R. (2001). Discriminative validity of parent report of hypomanic and depressive symptoms on the General Behavior Inventory. Psychological Assessment, 13(2), 267-276.

Youngstrom, E. A., Genzlinger, J. E., Egerton, G. A., & Van Meter, A. R. (2015). Multivariate meta-analysis of the discriminative validity of caregiver, youth, and teacher rating scales for pediatric bipolar disorder: Mother knows best about mania. Archives of Scientific Psychology, 3(1), 112-137.

Youngstrom, E. A., Murray, G., Johnson, S. L., & Findling, R. L. (2016). The 7 Up 7 Down Inventory: A 14-item measure of manic and depressive tendencies carved from the General Behavior Inventory. Psychological Assessment, 25(4), 1377-1383.

Youngstrom, E. A., Zhao, J., Mankoski, R., Forbes, R. A., Marcus, R. M., Carson, W., … Findling, R. L. (2013). Clinical significance of treatment effects with aripiprazole versus placebo in a study of manic or mixed episodes associated with pediatric bipolar I disorder. Journal of Child and Adolescent Psychopharmacology, 23(2), 72-9.

Somatic Symptom Scale – 8 (SSS-8)

The eight-item Somatic Symptom Scale (SSS-8) was recently developed as a brief, patient-reported outcome measure of somatic symptom burden. The scale assesses common somatic symptoms and is a shortened version of the PHQ-15. It was first developed for the DSM-5 field trials that investigated the newly established somatic symptom disorder (Zijlema et al., 2013). The SSS-8 has a five-point response option instead of the three-point option of the PHQ-15, and a seven-day time frame instead of the PHQ-15’s four-week time frame. Initially called the PHQ-SSS in the DSM-5 field trials, it was renamed to shorten the name and to reflect the number of items (Gierk et al., 2015).

Psychometric properties

Research found the SSS-8 was a reliable and valid measure of somatic symptoms and cut-off scores identify individuals with low, medium, high, and very high somatic symptom burden.

One survey study (n = 2510) found the SSS-8 to have excellent item characteristics and good reliability (Cronbach α = 0.81). Somatic symptom burden as measured by the SSS-8 was significantly associated with depression (r = 0.57 [95% CI, 0.54 to 0.60]), anxiety (r = 0.55 [95% CI, 0.52 to 0.58]), general health status (r = -0.24 [95% CI, -0.28 to -0.20]), and health care use (incidence rate ratio, 1.12 [95% CI, 1.10 to 1.14]). The SSS-8 severity categories were calculated in accordance with percentile ranks: no to minimal (0-3 points), low (4-7 points), medium (8-11 points), high (12-15 points), and very high (16-32 points) somatic symptom burden. For every one-category increase in SSS-8 severity, there was a 53% (95% CI, 44% to 63%) increase in health care visits (Gierk et al., 2014).
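The severity bands listed above translate directly into a lookup on the total score. The sketch below is a minimal illustration; the function name is hypothetical.

```python
def sss8_severity(total: int) -> str:
    """Map an SSS-8 total score (0-32) onto the severity categories listed above."""
    if not 0 <= total <= 32:
        raise ValueError("SSS-8 total must be between 0 and 32")
    if total <= 3:
        return "no to minimal"
    if total <= 7:
        return "low"
    if total <= 11:
        return "medium"
    if total <= 15:
        return "high"
    return "very high"


# Categories at the lower edge of each band.
band_edges = {score: sss8_severity(score) for score in (0, 4, 8, 12, 16)}
```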

Gierk et al. (2015) compared the SSS-8 and the PHQ-15 in psychosomatic outpatients (n = 131). The reliabilities of the PHQ-15 and SSS-8 were α = 0.80 and α = 0.76, respectively, and the two scales were highly correlated (r = 0.83). The item characteristics were comparable. Both scales showed the same pattern of correlations with measures of depression, anxiety, health anxiety and health-related quality of life (r = 0.32 to 0.61). On both scales, a 1-point increase was associated with a 3% increase in health care use. The percentile distributions of both scales were similar.



Gierk, B., Kohlmann, S., Kroenke, K., Spangenberg, L., Zenger, M., Brähler, E., & Löwe, B. (2014). The somatic symptom scale–8 (SSS-8): a brief measure of somatic symptom burden. JAMA Internal Medicine, 174(3), 399-407.

Gierk, B., Kohlmann, S., Toussaint, A., Wahl, I., Brünahl, C. A., Murray, A. M., & Löwe, B. (2015). Assessing somatic symptom burden: A psychometric comparison of the Patient Health Questionnaire–15 (PHQ-15) and the Somatic Symptom Scale–8 (SSS-8). Journal of Psychosomatic Research, 78(4), 352-355.

Zijlema, W. L., Stolk, R. P., Löwe, B., Rief, W., White, P. D., & Rosmalen, J. G. (2013). How to assess common somatic symptoms in large-scale studies: a systematic review of questionnaires. Journal of Psychosomatic Research, 74(6), 459-468.



Calgary Depression Scale for Schizophrenia (CDSS)

Depression is reported to be prevalent in 7–75% of patients with schizophrenia, with an average of 25% (Kim et al., 2006; Müller et al., 2005). During the late 1980s, depression in schizophrenia generated substantial research attention because of its importance in the diagnosis, treatment and long-term outcomes of the disorder. Scales for assessing depression in non-psychotic populations have been criticised as inappropriate for assessing depression in individuals with schizophrenia.

The Calgary Depression Scale for Schizophrenia (CDSS) is a nine-item structured interview scale that was designed in 1990 specifically to assess depression independently of symptoms of psychosis in schizophrenia. Originally an 11-item scale (Addington, Addington, & Schissel, 1990), the CDSS was developed from, and validated against, the Hamilton Depression Rating Scale (HDRS), Beck Depression Inventory (BDI), and the Brief Psychiatric Rating Scale (BPRS) using factor analysis, internal consistency, and face validity (Addington, Addington, Maticka-Tyndale, & Joyce, 1992; Addington et al., 1990).

The CDSS consists of eight structured questions and a ninth item that depends on observation over the course of the interview (Kim et al., 2006). Items were constructed to measure: 1. Depression; 2. Hopelessness; 3. Self-deprecation; 4. Guilty ideas; 5. Pathological guilt; 6. Morning depression; 7. Early wakening; 8. Suicidal ideation; and 9. Observed depression.

Items are graded on a 4-point Likert-type scale (0, absent; 1, mild; 2, moderate; 3, severe), anchored by descriptors (Addington et al., 1992). The scores of all nine items are summed to obtain the CDSS depression score. A score higher than 6 has 82% specificity and 85% sensitivity for predicting the presence of a major depressive episode.
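The scoring rule above can be expressed as a short function. This is a sketch under the description given here (nine items graded 0 to 3, with a total above 6 treated as a positive screen); the function and field names are hypothetical.

```python
from typing import Sequence


def cdss_score(item_ratings: Sequence[int]) -> dict:
    """Sum the nine CDSS item ratings and apply the >6 screening threshold."""
    if len(item_ratings) != 9:
        raise ValueError("the CDSS has nine items")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("each item is graded 0 (absent) to 3 (severe)")
    total = sum(item_ratings)
    return {
        "total": total,
        # A total higher than 6 suggests a possible major depressive episode.
        "above_threshold": total > 6,
    }


# Example: mild ratings on seven items, absent on the remaining two.
interview = cdss_score([1, 1, 1, 1, 1, 1, 1, 0, 0])
```

Because the ninth item is observational, the ratings are expected to come from a trained interviewer rather than from self-report.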

Psychometric properties

  • Reliable, valid and specific measure of depression in patients with schizophrenia. Measures depression separately from negative and extrapyramidal symptoms. Low correlation with positive and negative symptoms and no substantial correlation with extrapyramidal symptoms
  • High internal consistency: α = 0.76 – 0.86
  • Good internal and inter-rater reliability:
  • High validity: Ability to predict presence of MDD; 2. Correlation with other depression measures; 3. Confirmatory factor analysis
  • Strong construct validity: Single dimension being measured. Confirmed by correlations with other depression rating scales and prediction of major depressive episode
  • Divergent validity: Absence of correlations with positive, negative and extrapyramidal symptoms


  • Used in clinical populations of patients with depression in schizophrenia (DSM-III-R, DSM-IV)
  • Focused on maximising internal and external validity across inpatients and outpatients
  • Has been translated into 40 languages. Validated in: Arabic, Spanish, German, Chinese, Thai, Brazilian Portuguese, Greek, French


  • Quick to administer
  • Sensitive to change, so can be used at both the acute and residual stages of schizophrenia
  • Superior to the Hamilton Depression Rating Scale (HDRS) and the Montgomery-Åsberg Depression Rating Scale (MADRS) for differentiating between depression and negative and positive symptoms. All items significantly discriminate between the presence and absence of a major depressive episode
  • Most specific and valid assessment of depression in schizophrenia


  • Scale is designed for use by an experienced rater. It is not intended for self-assessment



Addington, D., Addington, J., & Maticka-Tyndale, E. (1991). Reliability and validity of a depression scale for schizophrenics. Schizophrenia Research, 4(3), 247.

Addington, D., Addington, J., & Maticka-Tyndale, E. (1994). Specificity of the Calgary Depression Scale for schizophrenics. Schizophrenia Research, 11(3), 239-244.

Addington, D., Addington, J., Maticka-Tyndale, E., & Joyce, J. (1992). Reliability and validity of a depression rating scale for schizophrenics. Schizophrenia Research, 6(3), 201-208.

Addington, D., Addington, J., & Schissel, B. (1990). A depression rating scale for schizophrenics. Schizophrenia Research, 3(4), 247-251.

Addington, J., Shah, H., Liu, L., & Addington, D. (2014). Reliability and validity of the Calgary Depression Scale for Schizophrenia (CDSS) in youth at clinical high risk for psychosis. Schizophrenia Research, 153(1), 64-67.

Galletly, C., Castle, D., Dark, F., Humberstone, V., Jablensky, A., Killackey, E., Kulkarni, J., McGorry, P., Nielssen, O., & Tran, N. (2016). Royal Australian and New Zealand College of Psychiatrists clinical practice guidelines for the management of schizophrenia and related disorders. Australian & New Zealand Journal of Psychiatry, 50(5), 410-472. doi:10.1177/0004867416641195

Kim, S.-W., Kim, S.-J., Yoon, B.-H., Kim, J.-M., Shin, I.-S., Hwang, M. Y., & Yoon, J.-S. (2006). Diagnostic validity of assessment scales for depression in patients with schizophrenia. Psychiatry Research, 144(1), 57-63.

Lançon, C., Auquier, P., Reine, G., Bernard, D., & Toumi, M. (2000). Study of the concurrent validity of the Calgary Depression Scale for Schizophrenics (CDSS). Journal of Affective Disorders, 58(2), 107-115.

Müller, M. J., Brening, H., Gensch, C., Klinga, J., Kienzle, B., & Müller, K.-M. (2005). The Calgary Depression Rating Scale for schizophrenia in a healthy control group: Psychometric properties and reference values. Journal of Affective Disorders, 88(1), 69-74.

Mood Disorder Questionnaire (MDQ)



The Mood Disorder Questionnaire (MDQ) was created by Hirschfeld and colleagues (2000) to address the need for accurate screening of individuals with a bipolar spectrum disorder. Accurate identification of bipolar disorder (BD) is a concern because it is often unrecognised or inaccurately diagnosed, which delays diagnosis and appropriate treatment (Lish et al., 1994). Items on the MDQ are derived from the DSM-IV criteria and from clinical experience (Hirschfeld et al., 2000).

Clinical Use

The MDQ is a self-report questionnaire that takes around five minutes to complete. It is a screening tool only and should not be used for diagnostic purposes; a positive screen should be followed by a comprehensive evaluation.

Administration and Scoring

The MDQ consists of three questions. Question 1 comprises 13 items that examine manic symptoms. Question 2 asks whether the symptoms identified have co-occurred, and question 3 asks about the severity of the symptoms. To screen positive, the individual must have answered ‘yes’ to a minimum of 7 items on question 1, responded ‘yes’ to question 2, and answered ‘moderate problem’ or ‘serious problem’ to question 3.
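The positive-screen rule described above can be expressed as a short Python sketch. The function and parameter names are hypothetical, and representing question 3 as a lower-case text label is an assumption made for illustration; only the three conditions themselves come from the text.

```python
from typing import Sequence

MIN_SYMPTOM_ITEMS = 7  # minimum number of 'yes' answers on question 1
IMPAIRING_ANSWERS = {"moderate problem", "serious problem"}


def mdq_screen_positive(
    symptom_answers: Sequence[bool],  # question 1: thirteen yes/no items
    symptoms_co_occurred: bool,       # question 2
    problem_level: str,               # question 3, as a text label
) -> bool:
    """Apply the three-part MDQ screening rule."""
    if len(symptom_answers) != 13:
        raise ValueError("question 1 has exactly 13 items")
    enough_symptoms = sum(bool(a) for a in symptom_answers) >= MIN_SYMPTOM_ITEMS
    impairing = problem_level.strip().lower() in IMPAIRING_ANSWERS
    return enough_symptoms and symptoms_co_occurred and impairing


# Made-up answers: eight 'yes' items, co-occurring, moderate problem.
answers = [True] * 8 + [False] * 5
print(mdq_screen_positive(answers, True, "moderate problem"))
```

All three conditions must hold, so a respondent who endorses many symptoms but reports that they did not co-occur does not screen positive.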

Development and Psychometric Properties

The MDQ has achieved adequate internal consistency, with Cronbach’s alpha values of 0.79 and 0.90 (Hirschfeld et al., 2000; Isometsä et al., 2003). The validation study administered the MDQ to patients at five psychiatric clinics in the United States (Hirschfeld et al., 2000). The results were used to determine cut off points for items, specificity, and sensitivity. Findings demonstrated that the MDQ had a sensitivity of 0.73 and a specificity of 0.90 when contrasted against other screening questionnaires in psychiatric settings. The researchers then conducted testing in a general population, which identified a sensitivity of 0.28 and a specificity of 0.97 (Hirschfeld, 2002). An additional study assessed the effectiveness of the MDQ in unipolar and bipolar depressive patients and found a sensitivity of 0.58 (higher for bipolar I) and a specificity of 0.67 (Miller, Klugman, Berv, Rosenquist, & Ghaemi, 2004). Lastly, testing in a primary care setting revealed a sensitivity of 0.58 and a specificity of 0.93 (Hirschfeld, Cass, Holt, & Carlson, 2005).
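To make these figures more concrete, the sketch below converts each reported sensitivity and specificity into the number of true cases that would be missed per 100 people with BD, and the number of false positives per 100 people without it. The names are hypothetical and the calculation is only arithmetic on the values quoted above; it says nothing about predictive values, which would also depend on prevalence.

```python
from typing import NamedTuple


class Accuracy(NamedTuple):
    sensitivity: float
    specificity: float


def missed_per_100_cases(acc: Accuracy) -> int:
    """People with the disorder who would screen negative, per 100 true cases."""
    return round((1 - acc.sensitivity) * 100)


def false_positives_per_100_non_cases(acc: Accuracy) -> int:
    """People without the disorder who would screen positive, per 100 non-cases."""
    return round((1 - acc.specificity) * 100)


# Values as reported in the studies summarised above.
SETTINGS = {
    "psychiatric clinics": Accuracy(0.73, 0.90),
    "general population": Accuracy(0.28, 0.97),
    "unipolar and bipolar depressive patients": Accuracy(0.58, 0.67),
    "primary care": Accuracy(0.58, 0.93),
}

for setting, acc in SETTINGS.items():
    print(setting, missed_per_100_cases(acc), false_positives_per_100_non_cases(acc))
```

Read this way, a sensitivity of 0.28 in the general population means that roughly 72 of every 100 people with BD would not be flagged by the screen.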

In sum, the MDQ is a useful screening tool for BD, demonstrating validity in clinical settings and across cultures. However, consideration should be given to its higher sensitivity for BD type I than for other disorders on the bipolar spectrum, and to its low sensitivity in general populations. Additionally, sensitivity and specificity vary with the cutoff points applied in scoring (e.g., standard or modified cutoff value of 7 for question 1) and with the inclusion/exclusion criteria used (e.g., a more tightly defined BD definition includes more severe cases and increases sensitivity), which limits the tool’s overall effectiveness (Wang et al., 2015).


Hirschfeld, R. M., Williams, J. B., Spitzer, R. L., Calabrese, J. R., Flynn, L., Keck Jr, P. E., … & Russell, J. M. (2000). Development and validation of a screening instrument for bipolar spectrum disorder: the Mood Disorder Questionnaire. American Journal of Psychiatry, 157, 1873-1875.

Hirschfeld, R. M. (2002). The Mood Disorder Questionnaire: a simple, patient-rated screening instrument for bipolar disorder. Primary Care Companion to the Journal of Clinical Psychiatry, 4, 9.

Miller, C. J., Klugman, J., Berv, D. A., Rosenquist, K. J., & Ghaemi, S. N. (2004). Sensitivity and specificity of the Mood Disorder Questionnaire for detecting bipolar disorder. Journal of Affective Disorders, 81, 167-171.

Hirschfeld, R. M., Cass, A. R., Holt, D. C., & Carlson, C. A. (2005). Screening for bipolar disorder in patients treated for depression in a family medicine clinic. The Journal of the American Board of Family Practice, 18, 233-239.

Isometsä, E., Suominen, K., Mantere, O., Valtonen, H., Leppämäki, S., Pippingsköld, M., & Arvilommi, P. (2003). The mood disorder questionnaire improves recognition of bipolar disorder in psychiatric care. BMC Psychiatry, 3, 8.

Lish, J. D., Dime-Meenan, S., Whybrow, P. C., Price, R. A., & Hirschfeld, R. M. (1994). The National Depressive and Manic-depressive Association (DMDA) survey of bipolar members. Journal of Affective Disorders, 31, 281-294.

Clinician-Administered PTSD Scale (CAPS-5)

The Clinician-Administered PTSD Scale (CAPS) is considered to be the gold standard for posttraumatic stress disorder (PTSD) diagnosis. The 30-item structured interview corresponds to the diagnostic criteria for PTSD described in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). The Life Events Checklist (LEC) is used in conjunction with the CAPS to assess PTSD Criterion A (the trauma experienced). The full interview typically takes 45-60 minutes to administer.

Use & availability

The CAPS is the intellectual property of the National Center for PTSD, a division of the US Department of Veterans Affairs (VA).  It is available at no cost to health professionals, but a request for use must be submitted to VA.  This can be done online at

Psychometric properties

The CAPS has proven reliability and is well validated. Initial validation of the DSM-5-aligned version shows convergent validity of r = .83 with the widely validated CAPS-IV. It has been translated into more than 10 languages, with validation studies conducted in Bosnian and Swedish.


Department of Veterans Affairs, United States of America. (2017). National Center for PTSD. (accessed 1 September 2017).

Weathers, F.W., Keane, T.M., & Davidson, J.R.T. (2001). Clinician Administered PTSD Scale: The first 10 years of research. Depression and Anxiety, 13(3), 132-156.

Weathers, F.W., Blake, D.D., Schnurr, P.P., Kaloupek, D.G., Marx, B.P., & Keane, T.M. (2015). The Clinician-Administered PTSD Scale for DSM-5 (CAPS-5) – Past Month. Available from

Major Depression Inventory (MDI)

The most commonly utilized measures of depression were created prior to the release of the third edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III) in 1980. Therefore, items on these tests may not be optimal. Consequently, new tools such as the Major Depression Inventory (MDI) were formulated (Cuijpers et al., 2007). The MDI is a self-rated tool that has a dual function: it can be used either as a diagnostic instrument that aids in assessing the presence of DSM-IV major depression, or as a measure of depression severity (Bech et al., 2015).
It was developed by Professor Per Bech and associates in collaboration with the Psychiatric Research Unit of the Danish World Health Organization Collaborative Centre for Mental Health (Konstantinidis et al., 2011; Psychiatric Times, 2013). It consists of 12 items, because Items 8 and 10 each involve two sub-items (a and b). All items are scored on a frequency response scale ranging from “none of the time” (zero) to “all of the time” (five), and are answered in the context of the last 2 weeks. Functionally, the MDI contains only 10 items, as only the higher score of sub-items a and b is counted for Items 8 and 10 (Bech et al., 2015; Konstantinidis et al., 2011; Bech et al., 2001).

Using the MDI as a measure of depression severity: the total score is calculated by adding together the scores of the 10 functional items, giving a range of 0-50. A score of 0-20 indicates that depression does not exist or its existence is doubtful, 21-25 indicates mild depression, 26-30 indicates moderate depression, and 31-50 indicates severe depression.
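As an illustration of the severity scoring, the following Python sketch takes the 12 raw responses, keeps the higher of the two sub-items for Items 8 and 10, and maps the total onto the bands listed above. The key names ("8a", "10b" and so on), function names and example responses are hypothetical choices made for this sketch.

```python
from typing import Mapping

SINGLE_ITEMS = ("1", "2", "3", "4", "5", "6", "7", "9")
PAIRED_ITEMS = (("8a", "8b"), ("10a", "10b"))  # only the higher sub-item counts


def mdi_total_score(responses: Mapping[str, int]) -> int:
    """Total of the ten functional items, each scored 0 to 5 (range 0-50)."""
    if any(not 0 <= value <= 5 for value in responses.values()):
        raise ValueError("each response must be between 0 and 5")
    total = sum(responses[key] for key in SINGLE_ITEMS)
    total += sum(max(responses[a], responses[b]) for a, b in PAIRED_ITEMS)
    return total


def mdi_severity(total: int) -> str:
    """Map a total score onto the severity bands given in the text."""
    if not 0 <= total <= 50:
        raise ValueError("total score must be between 0 and 50")
    if total <= 20:
        return "no or doubtful depression"
    if total <= 25:
        return "mild depression"
    if total <= 30:
        return "moderate depression"
    return "severe depression"


# Made-up responses, for illustration only.
example = {"1": 5, "2": 5, "3": 2, "4": 2, "5": 2, "6": 2, "7": 2,
           "8a": 2, "8b": 1, "9": 2, "10a": 0, "10b": 2}
total = mdi_total_score(example)
print(total, mdi_severity(total))
```

Taking the maximum of each sub-item pair is what reduces the 12 questions to 10 scored items, which keeps the total within 0-50.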

Using the MDI as a diagnostic tool: the algorithm for a DSM-IV diagnosis of major depression combines Items 4 and 5 and considers only the higher answer of the two, leaving nine symptoms. The presence of at least 5 of the 9 symptoms indicates a diagnosis of major depression, provided that Item 1 or 2 is among them. A symptom counts as present when it falls in the clinical range: for Items 1 to 3, this means occurring most of the time or all of the time; for all other symptoms, it means occurring slightly more than half of the time, most of the time or all of the time. If 5 or more symptoms are in this range, a diagnosis of major depression is supported (Bech et al., 2015; Konstantinidis et al., 2011; Bech, 2011).
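The diagnostic algorithm can be sketched as follows. Two assumptions are made and are not stated in the text: that the response scale maps "slightly more than half of the time" to 3, "most of the time" to 4 and "all of the time" to 5, and that the input is the list of ten functional item scores (that is, after the higher sub-item has already been taken for Items 8 and 10). The function name is hypothetical.

```python
from typing import Sequence

# Assumed numeric coding of the response scale (not given in the text):
# 3 = slightly more than half of the time, 4 = most of the time, 5 = all of the time.
CORE_THRESHOLD = 4   # Items 1 to 3: most or all of the time
OTHER_THRESHOLD = 3  # remaining symptoms: slightly more than half of the time or more


def mdi_major_depression_supported(items: Sequence[int]) -> bool:
    """Apply the DSM-IV algorithm to the ten functional item scores (Items 1-10)."""
    if len(items) != 10:
        raise ValueError("expected the ten functional item scores")
    score = {number: items[number - 1] for number in range(1, 11)}

    symptoms = {
        "1": score[1] >= CORE_THRESHOLD,
        "2": score[2] >= CORE_THRESHOLD,
        "3": score[3] >= CORE_THRESHOLD,
        # Items 4 and 5 are combined; only the higher answer is considered.
        "4/5": max(score[4], score[5]) >= OTHER_THRESHOLD,
    }
    for number in range(6, 11):
        symptoms[str(number)] = score[number] >= OTHER_THRESHOLD

    enough_symptoms = sum(symptoms.values()) >= 5       # at least 5 of 9
    core_present = symptoms["1"] or symptoms["2"]       # Item 1 or 2 required
    return enough_symptoms and core_present


# Made-up scores, for illustration only.
print(mdi_major_depression_supported([4, 4, 4, 3, 0, 3, 0, 0, 0, 0]))
```

Keeping the two thresholds as named constants makes the assumed coding easy to adjust if the scale is coded differently in a given version of the questionnaire.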

Psychometric Properties
Research findings suggest that the MDI possesses good reliability, validity, sensitivity and specificity (Cuijpers et al., 2007). Cuijpers and associates (2007) found that the test had good reliability, a substantial correlation with another measure of depressive symptoms, and acceptable specificity and sensitivity. Also, Forsell (2005) found that the MDI has high internal consistency. Furthermore, Olsen and colleagues (2003) found that the tool demonstrated adequate internal and external validity as a measure of depression severity.
With regard to differential diagnosis, the levels of sensitivity and specificity that the MDI has demonstrated across multiple studies indicate that it can identify both individuals who have depression and those who do not. Hence, this test may assist in the process of differential diagnosis (Cuijpers et al., 2007).

Strengths of the MDI include that it can be used both as a continuous scale indicating the level of depressive symptoms and as a way of obtaining an indication of the existence of major depression, that it appears to be a reliable tool for evaluating depression, and that it is brief (Cuijpers et al., 2007).

Weaknesses of the MDI include that, whilst the sensitivity and specificity of the diagnostic algorithm have been found to be acceptable in clinical populations, they have been found to be low in general populations (Amris et al., 2016). Further research on the MDI is also needed. In addition, the tool was based on the DSM-IV, which has since been superseded by the DSM-5, so the tool may not be representative of the new DSM.

Some evidence exists to suggest the MDI is reliable and valid across many countries and cultures and across genders (Cuijpers et al., 2007; Olsen et al., 2003; Fountoulakis et al., 2003; Konstantinidis et al., 2011).


Amris, K., Omerovic, E., Danneskiold-Samsoe, B., Bliddal, H., & Waehrens, E. E. (2016). The validity of self-rating depression scales in patients with chronic widespread pain: a Rasch analysis of the Major Depression Inventory. Scandinavian Journal of Rheumatology, 45(3), 236-246. doi: 10.3109/03009742.2015.1067712

Bech, P., Timmerby, N., Martiny, K., Lunde, M., & Soendergaard, S. (2015). Psychometric evaluation of the Major Depression Inventory (MDI) as depression severity rating scale using the LEAD (Longitudinal Expert Assessment of All Data) as index of validity. BMC Psychiatry, 15(90), 1-7. doi: 10.1186/s12888-015-0529-3

Bech, P., Rasmussen, M.A., Raabaek Olsen, L., Noerholm, V., & Abildgard, W. (2001). The sensitivity and specificity of the Major Depression Inventory, using the Present State Examination as the index of diagnostic validity. Journal of Affective Disorders, 66, 159-164.

Cuijpers, P., Dekker, J., Noteboom, A., Smits, N., & Peen, J. (2007). Sensitivity and specificity of the Major Depression Inventory in outpatients. BMC Psychiatry, 7(39), 1-6. doi:10.1186/1471-244X-7-39

Forsell, Y. (2005). The Major Depression Inventory versus Schedules for Clinical Assessment in Neuropsychiatry in a population sample. Social Psychiatry and Psychiatric Epidemiology, 40, 209-213. doi:10.1007/s00127-005-0876-3

Fountoulakis, K.N., Iacovides, A., Kleanthous, S., Samolis, S., Gougoulias, Kaprinis, G.S., & Bech, P. (2003). Reliability, validity and psychometric properties of the Greek translation of the Major Depression Inventory. BMC Psychiatry, 3(2), 1-8.

Konstantinidis, A., Martiny, K., Bech, P., & Kasper, S. (2011). A comparison of the Major Depression Inventory (MDI) and the Beck Depression Inventory (BDI) in severely depressed patients. International Journal of Psychiatry in Clinical Practice, 15(1), 56-61. doi: 10.3109/13651501.2010.507870

Olsen, L.R., Jensen, D.V., Noerholm, V., Martiny, K., & Bech, P. (2003). The internal and external validity of the Major Depression Inventory in measuring severity of depressive states. Psychological Medicine, 33, 351-356. doi: 10.1017/S0033291702006724

Psychiatric Times. (2013, April). MDI. Retrieved from

Drug Use Disorders Identification Test – Extended (DUDIT-E)

The DUDIT-E is an extension of the DUDIT: the original test consists of only 11 items, whereas the extended version has 54. It is important to note that the DUDIT-E is an assessment tool rather than a diagnostic tool. The test is based on the idea that using drugs is a behavioural choice, and that this choice reflects positive outcome expectancies. For active drug users, the choice to use substances does not necessarily reflect a perceived sense of low self-efficacy; rather, drug use can be perceived as an adequate and self-efficacious response to legitimate existential needs.

One benefit of the DUDIT-E is that it assesses motivation, a widely used concept in substance use treatment that is commonly seen as a premise for change during treatment (Sletteng et al., 2011). The assessment tool is divided into four sections: frequency of drug use, positive aspects of drug use, negative aspects of drug use, and treatment readiness and aspects of motivation. The section on treatment readiness asks individuals for their thoughts about drugs, which helps indicate whether or not they are ready for change. Questions include, “have you been worried about your drug use…?” and “are you ready to work to change your drug use?”. Another benefit is that the assessment tool is available online and in languages from a number of countries. If no copy is available in a specific language, the assessor can apply to have it translated, or translate it themselves. Substances covered by the DUDIT-E drug list include: cannabis, amphetamines, cocaine, opiates, hallucinogens, thinner and other drugs, GHB, sleeping/calming pills, pain relievers and tobacco.

The DUDIT-E has also been examined psychometrically in various countries, such as Sweden and Norway, and in populations such as prison inmates and inpatients. Reliability analyses indicate good internal consistency, and high intraclass correlations indicate good test-retest reliability. Cronbach’s alpha values for the subscales range from 0.73 to 0.93, and for the online version Cronbach’s alpha was reported to be greater than 0.73 for all subscales (Berman et al., 2007).

One limitation is that past studies have shown a preponderance of males. Another is that the tool has only been studied in detoxification and criminal justice settings, where responses may be biased because individuals can be affected by perceived pressure to answer questions in a socially desirable manner. This may occur more frequently when expressing a negative attitude towards continued drug use is connected to possible benefits for the respondent, such as within the criminal justice setting before sentencing, or in an employment setting, where scores may be invalid due to fears of possible sanctions following admitted drug use (Matuszka et al., 2014). Future research should therefore examine the tool at treatment admission among female drug users and young drug users in outpatient addiction treatment settings, where treatment is clearly defined and follows a specified placement procedure. Overall, the DUDIT-E is a reliable and valid assessment tool that is internationally appropriate, as it can be translated into many different languages for use in different settings, and it can be used with both females and males.


Berman, A. H., Palmstierna, T., Källmén, H., & Bergman, H. (2007). The self-report Drug Use Disorders Identification Test—Extended (DUDIT-E): reliability, validity, and motivational index. Journal of substance abuse treatment, 32(4), 357-369.

Matuszka, B., Bácskai, E., Berman, A. H., Czobor, P., Sinadinovic, K., & Gerevich, J. (2014). Psychometric characteristics of the drug use disorders identification test (DUDIT) and the drug use disorders identification test-extended (DUDIT-E) among young drug users in Hungary. International journal of behavioral medicine, 21(3), 547-555.

Sletteng, R., Harnang, A. K., Hoxmark, E., Aslaksen, P. M., Friborg, O., & Wynn, R. (2011). A Psychometric Study of the Drug Use Disorders Identification Test—Extended in a Norwegian Sample. Psychological Reports, 109(2), 663-674.