Learning Keywords: Reliability, Validity, Psychometric Assessment, Clinical Application

Introduction

This session, led by Dr. Andrew Kiselica as part of the KnowNeuropsychology didactic series, provided a comprehensive examination of reliability and validity in clinical neuropsychology. The discussion centred on the conceptual foundations, technical nuances, and practical application of these psychometric properties in clinical and research settings. Key learning points included the importance of understanding what we are truly measuring in neuropsychological contexts, differentiating between various forms of reliability and validity, and recognising their implications for assessment, research, and culturally informed practice.

Main Discussion

Reconceptualising Reliability and Validity: Focus on Scores, Not Tests

A central message from Dr. Andrew Kiselica was that reliability and validity are properties of test scores, not of the test itself. Measuring constructs such as executive functioning or processing speed means inferring underlying psychological abilities from observable test scores, because the abilities themselves cannot be observed directly. In neuropsychological assessment, the precision and accuracy of these scores are fundamental, as they form the basis of clinical decisions and research inferences.

Reliability was described as the consistency or precision of a score when the same test is administered under similar conditions. Validity, on the other hand, concerns whether the score is an accurate reflection of the intended construct. Neither is a binary attribute: both exist along a continuum, depend on accumulated evidence, and are always specific to the purpose of measurement.

Forms of Reliability and Their Clinical Significance

The session examined several types of reliability that are particularly pertinent in neuropsychology:

  • Test-retest reliability (both between-subjects and within-subjects): Between-subjects reliability assesses the stability of each person’s standing relative to others over time and is fundamental for comparing individuals or groups to normative samples. Within-subjects reliability focuses on the stability of an individual’s scores across multiple assessments, which is essential for monitoring change or decline in clinical follow-up.
  • Internal consistency: Particularly relevant to multi-item questionnaires and scales, internal consistency evaluates how well items designed to measure the same construct yield similar results. This is commonly measured with coefficients such as Cronbach’s alpha.
  • Alternate forms reliability: This applies where multiple equivalently designed versions of a test are needed to reduce practice effects during repeated assessments.
  • Inter-rater reliability: Especially significant when behavioural ratings or observational scales are used, inter-rater reliability assesses the agreement between different assessors.
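
As a rough illustration of two of the coefficients listed above, the following Python sketch computes Cronbach's alpha for a multi-item questionnaire and a Pearson correlation as a simple between-subjects test-retest estimate. The data and function names are hypothetical and were not part of the session. In practice an intraclass correlation is often preferred for test-retest work, so this should be read as a minimal sketch rather than a recommended analysis.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*rows))  # one tuple of scores per item
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: five respondents answering a four-item questionnaire.
questionnaire = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

# Hypothetical scores for the same five people at two time points.
time_1 = [48, 52, 61, 39, 55]
time_2 = [50, 51, 63, 42, 54]

print(f"Cronbach's alpha: {cronbach_alpha(questionnaire):.2f}")
print(f"Test-retest r:    {pearson_r(time_1, time_2):.2f}")
```

Both coefficients describe the scores in this particular sample, which fits the session's point that reliability belongs to scores in a context rather than to the test itself.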

Dr. Andrew Kiselica presented illustrative research examples making clear that no single reliability threshold suits all purposes. The required level of reliability is dictated by what is at stake—higher in contexts such as pre-surgical mapping or forensic assessments and potentially lower when screening for less critical conditions.

Exploring Validity: Evidence and Context

The session then turned to validity, emphasising that a score cannot be valid unless it is reliable. Several types of validity evidence were covered:

  • Face and content validity: While face validity refers to the apparent suitability of a measure to those being assessed, content validity ensures the measure represents the breadth of the construct and excludes irrelevant elements. Both are important, but content validity is non-negotiable as it ensures accurate construct representation.
  • Construct validity: This umbrella includes several strands:
    • Convergent and divergent validity, assessed through the pattern of correlations between measures thought to relate closely (convergent) or to differ (divergent).
    • Structural validity, typically established using factor analysis to determine whether test items or subtests coalesce as theorised.
    • Responsiveness, which assesses whether a measure detects expected changes due to development, intervention, or disease progression.
  • Criterion validity: This encompasses both concurrent validity (a test’s ability to distinguish between known groups at a single time point) and predictive validity (the effectiveness of scores in forecasting future outcomes). Incremental validity was also discussed, that is, whether a new score adds significant explanatory value beyond established measures.
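
One common way to quantify incremental validity is the gain in explained variance (ΔR²) when a new score is added to a model that already contains an established score. The sketch below uses the standard closed-form R² for two predictors, built from pairwise Pearson correlations. The function names and data are hypothetical, and the session did not prescribe this particular method. The same `pearson_r` helper could also be used to inspect convergent and divergent correlation patterns.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def incremental_r2(outcome, established, new):
    """Return (R2 with established score only, R2 with both scores, gain)."""
    r_ye = pearson_r(outcome, established)
    r_yn = pearson_r(outcome, new)
    r_en = pearson_r(established, new)
    if abs(r_en) >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    r2_base = r_ye ** 2
    # Closed-form R2 for a linear regression with two predictors.
    r2_full = (r_ye ** 2 + r_yn ** 2 - 2 * r_ye * r_yn * r_en) / (1 - r_en ** 2)
    return r2_base, r2_full, r2_full - r2_base


# Hypothetical data for eight patients (illustrative values only).
functional_outcome = [12, 15, 9, 20, 14, 7, 18, 11]
established_score = [40, 45, 38, 55, 47, 35, 50, 42]
new_score = [22, 30, 18, 33, 24, 15, 35, 19]

base, full, gain = incremental_r2(functional_outcome, established_score, new_score)
print(f"R2 established only: {base:.2f}")
print(f"R2 with new score:   {full:.2f}")
print(f"Incremental R2:      {gain:.2f}")
```

A small gain suggests the new score largely duplicates what the established measure already captures. Whether a given gain matters clinically remains a judgement about the purpose of the assessment.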

Through these frameworks, the session underscored that validity is not a one-off determination but an ongoing evidentiary process tailored to how and with whom the measure is being used.

Contextual and Individual Considerations

A major theme was the interplay between population-level psychometric evidence and its application in clinical contexts, especially when group-derived norms must inform decisions about individuals. Practical challenges arise when neuropsychological tests, often standardised in Western contexts, are used across different cultural settings. Dr. Andrew Kiselica discussed how cultural differences, linguistic backgrounds, and varying test-taking traditions can affect test performance, particularly on measures of constructs such as processing speed.

The discussion also addressed the gap between group-level validation and individual clinical decision-making. The need for careful consideration of situational factors in the assessment process—such as testing environment, accurate administration, patient engagement, and use of performance validity tests—was highlighted as crucial for ensuring that scores are genuinely reliable and valid in the context of the individual case.

Implications for Neuropsychological Practice

Several practical points stand out for clinicians:

  • Refer to reliability and validity at the level of the score, not simply the instrument, and treat these properties as specific to the context, population, and purpose of use.
  • Scrutinise the evidence base for the specific cohort and clinical application—what is reliable and valid in one setting may not be transferable to another, especially when considering cultural and linguistic factors or when local norms are absent.
  • When making judgements about individuals, be aware of the limitations of group-derived statistics. Supplement quantitative test scores with behavioural observations, background information, and performance validity checks to ensure interpretations are warranted.
  • Maintain a critical awareness of the practical consequences of psychometric properties. For example, less reliable scores yield wider confidence intervals, which directly limits the certainty with which clinical decisions can be made.
  • Draw on test manuals and original research to understand the reliability and validity estimates for specific tools, particularly focusing on populations tested and contexts of use.
  • Embed psychometric awareness into all stages of the assessment cycle, from test selection and administration to interpretation, communication, and formulation.
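
The link between reliability and confidence intervals noted in the list above can be made concrete with the standard error of measurement, SEM = SD × √(1 − reliability). The sketch below builds a simple interval around an observed score. The function names, the reliability values, and the standard score metric (mean 100, SD 15) are illustrative assumptions. Some test manuals centre the interval on an estimated true score instead, so treat this as one simplified approach.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


def score_confidence_interval(score, sd, reliability, z=1.96):
    """Interval around an observed score; z=1.96 gives roughly 95% coverage."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem


# Hypothetical example: an observed standard score of 85 (mean 100, SD 15)
# interpreted under two assumed reliability estimates.
for reliability in (0.91, 0.75):
    low, high = score_confidence_interval(85, 15, reliability)
    print(f"reliability {reliability:.2f}: 95% CI {low:.1f} to {high:.1f}")
```

Running the loop shows the interval widening as the assumed reliability drops, which is why the same observed score supports less confident conclusions when it comes from a less reliable measure.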

Conclusion

The session provided an in-depth overview of the centrality of reliability and validity in neuropsychological assessment, emphasising their dynamic, context-bound nature. Understanding the types of evidence available, the distinction between score and test properties, and the critical clinical implications enables us to make more precise, accurate, and defensible clinical decisions.

Key themes included the necessity of accumulating and evaluating psychometric evidence relevant to the individual and setting, the importance of interpreting test results in light of cultural and contextual factors, and the value of integrating multiple sources of evidence in clinical work. The session served as a reminder that the science of measurement underpins the art of clinical neuropsychology, and that rigorously considering reliability and validity is essential for responsible, effective practice.
