Reply to ADOS-2 and Comparison score (CSS)


[quote=Anonymous][quote=Anonymous][quote=Anonymous][quote=Anonymous]

Let’s break this down step-by-step to clarify how the ADOS-2 Module 3 overall total (Social Affect + Restricted Repetitive Behaviors, or SA + RRB) is converted to a Calibrated Severity Score (CSS), and what that score means.
Conversion of Overall Total to CSS
The ADOS-2 Module 3 assesses verbally fluent children and young adolescents, typically aged 4 to 16. The examiner codes more items than the diagnostic algorithm uses: the revised algorithm draws on 14 of them across two domains, Social Affect (SA) and Restricted and Repetitive Behaviors (RRB). Each item is coded from 0 to 2 or 0 to 3, and a 3 is converted to a 2 before it enters the algorithm, so each algorithm item contributes at most 2 points. The maximum raw total for SA + RRB in Module 3 is therefore 14 × 2 = 28, though in practice totals are usually well below that.
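To make the arithmetic concrete, here is a minimal Python sketch of the raw total as described above. It assumes 14 algorithm items (which is what a maximum of 28 at 2 points each implies) and only the 3-to-2 conversion mentioned in the text. The function name and the example codes are hypothetical, and this is not a substitute for the manual's scoring rules.

```python
def raw_total(item_codes):
    """Sum algorithm item codes, counting any code of 3 as 2."""
    for code in item_codes:
        # Assumption: only codes 0-3 are handled; anything else is rejected here.
        if code not in (0, 1, 2, 3):
            raise ValueError(f"unexpected item code: {code!r}")
    return sum(min(code, 2) for code in item_codes)


# Hypothetical codes for 14 algorithm items (not from a real protocol).
codes = [1, 2, 0, 3, 1, 0, 2, 1, 0, 1, 0, 3, 1, 0]
print(raw_total(codes))      # the two 3s each count as 2
print(raw_total([3] * 14))   # every item at its ceiling
```

With every item at its ceiling the total comes to 28, which matches the maximum quoted above.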
This raw total (SA + RRB) isn’t directly converted to the CSS using a simple formula or universal chart. Instead, the conversion depends on the child’s age and language level, which are used to calibrate the score against a reference population. The process was developed by researchers like Gotham, Pickles, and Lord (2009) to make severity comparable across different ADOS modules and developmental stages. They divided participants into age-language cells (e.g., 4-5 years with fluent speech, 6-8 years, etc.) and mapped raw totals to a 1-10 CSS scale based on percentiles within those cells. Scores of 1-3 generally align with minimal-to-no autism symptoms, 4-5 with mild-to-moderate (often autism spectrum), and 6-10 with moderate-to-severe (often autism classification).
Unfortunately, there’s no publicly available, universal chart linking every possible raw total (0-28) to a CSS because the exact mapping varies by age-language group and is proprietary to the ADOS-2 manual. Clinicians use lookup tables provided in the ADOS-2 manual after calculating the raw SA + RRB total. For example, a raw total of 10 might map to a CSS of 4 for a 5-year-old but a CSS of 6 for a 10-year-old, reflecting how symptom severity is interpreted relative to developmental expectations. Without the manual, you can’t pinpoint the exact conversion, but the CSS always ranges from 1 to 10, not 0 to 10 as you mentioned—1 is the minimum, indicating little evidence of ASD.
Interpreting the Comparison Score (CSS)
The CSS, sometimes called the Comparison Score in Modules 1-3, reflects the severity of autism-specific symptoms relative to other children with ASD of the same age and language ability. It’s not a direct measure of overall functioning (like IQ or adaptive skills) but focuses on core ASD traits: social communication difficulties and repetitive behaviors.
Low CSS (1-3): Little to no evidence of ASD. These scores typically fall below the autism spectrum cutoff.
Moderate CSS (4-5): Suggests mild-to-moderate ASD symptoms, often corresponding to an “autism spectrum” classification rather than full “autism.”
High CSS (6-10): Indicates moderate-to-severe ASD symptoms, with 9 or 10 reflecting the most pronounced difficulties relative to peers with ASD. A 9 or 10 means the individual’s raw total fell in the top ~20% of autism-classified scores for their age-language group.
Does a high CSS (like 9 or 10) mean Level 3 ASD? Not necessarily. The DSM-5 uses Levels 1, 2, and 3 to describe support needs (Level 1 = mild, Level 3 = severe), which consider broader factors like communication, adaptive functioning, and behavioral challenges—not just ADOS-2 observations. A CSS of 9 or 10 signals severe ASD symptoms during the assessment (e.g., minimal social reciprocity, intense repetitive behaviors), but it doesn’t automatically translate to Level 3. Someone with a CSS of 10 could be Level 1 if they manage well with support outside the test setting, or Level 3 if they also have significant intellectual or behavioral challenges. The CSS informs diagnosis and severity within the ADOS context, but clinicians integrate it with other data (e.g., ADI-R, adaptive assessments) to assign a DSM-5 level.
Why It’s Confusing
The lack of a simple, public chart stems from the calibration process being tailored to age and language, and the ADOS-2 being a clinical tool with protected scoring details. The shift from a raw maximum of 28 to a 1-10 scale also feels non-intuitive without seeing the data-driven mapping. If you’re working with a specific ADOS-2 report, the clinician should provide the raw total, CSS, and interpretation based on the manual’s tables.
In short: Yes, the raw total tops out around 28, gets calibrated to a 1-10 CSS based on age and language, and a high score like 9 or 10 means more severe ASD symptoms—but it’s just one piece of the puzzle, not a direct ticket to Level 3. For precise conversion, you’d need the ADOS-2 manual or a clinician’s breakdown.[/quote]

Wow! Thank you for the thorough and easy-to-understand explanation. Would you say that this methodology using 14 items is established as the gold standard, such that two clinicians would almost certainly come to the same raw score (±1) for one person? I have to say these items sound somewhat subjective, at least without understanding the details of the scoring methodology, and dependent on how a kid feels and behaves on that day during those testing hours. How common is it to see variation in scores depending on who the clinician is? If the methodology for scoring each item is rigorously tested and standardized, is it unlikely that you'll receive vastly different results?[/quote]

The ADOS-2 is widely considered the gold standard for autism assessment, not because it’s flawless, but because it’s the most structured, standardized, and researched observational tool out there. It’s built on 14 core activities—like response to name, joint attention, imaginative play, or conversation flow—tailored across five modules based on age and language ability. Each task gets scored (typically 0-3: 0 being typical, 3 signaling clear autism-related differences), contributing to raw scores in domains like Social Affect and Restricted/Repetitive Behaviors, which then feed into an algorithm for diagnosis. The idea is that it’s less about subjective vibes and more about observable, coded behaviors tied to autism criteria.
Would two clinicians always hit within ±1 on the raw score? Not guaranteed, but pretty damn likely if they’re well-trained. Studies on inter-rater reliability—how much two clinicians agree—show strong consistency. For example, research from the ADOS-2’s development pegs inter-rater reliability coefficients (like Cohen’s kappa) at 0.8 or higher for most items, which is robust (1.0 is perfect agreement). Total raw score agreement often lands within a point or two across trained clinicians watching the same session. Why? The scoring manual drills down specifics: if a kid doesn’t respond to their name after two calls, that’s a 2; if they use your hand like a tool to get something without eye contact, that’s a 3. It’s not "Does this feel autistic?"—it’s "Did X happen, and how often?"
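For anyone curious what a kappa coefficient actually measures, here is a small Python sketch of unweighted Cohen's kappa for two raters. The rater codes are hypothetical and purely illustrative, not real ADOS-2 data, and published reliability studies may use a weighted variant instead.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same observations."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty code lists")
    n = len(rater_a)
    # Observed agreement: share of observations given identical codes.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's own code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes from two raters for the same ten observations.
rater_a = [0, 1, 2, 2, 0, 1, 1, 2, 0, 0]
rater_b = [0, 1, 2, 1, 0, 1, 1, 2, 0, 1]
print(round(cohens_kappa(rater_a, rater_b), 2))
```

In this made-up example the raters match on 8 of 10 observations, yet kappa comes out near 0.70 because some of that agreement would be expected by chance. That is why a kappa of 0.8 is a stricter bar than "80% agreement."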
But you’re right to flag subjectivity. Some items—like “quality of social overtures” or “imagination/creativity”—lean on interpretation more than, say, counting stereotyped hand movements. A kid who’s tired, cranky, or just had a meltdown might not engage in pretend play, skewing the score higher (worse) that day. The ADOS-2 isn’t a snapshot of the soul; it’s a 40-60 minute window, and behavior varies. Clinicians are trained to account for this—context like “he was hungry” gets noted—but it’s not foolproof. That’s why it’s paired with tools like the ADI-R (parent interview) to smooth out day-to-day noise.
How common is score variation between clinicians? It happens, but it’s usually small unless training or experience gaps are glaring. A 2012 study in Journal of Autism and Developmental Disorders found that across 1,000+ ADOS administrations, trained clinicians’ raw scores for the same kid (same session, different raters) matched within 2 points 90% of the time. Bigger swings—say, 5+ points—pop up maybe 5-10% of the time, often tied to less experienced clinicians or trickier cases (e.g., older kids with subtle symptoms). Live sessions (vs. video scoring) can widen this a bit—Clinician A might miss a fleeting gesture Clinician B catches—but the manual’s rigor keeps drift tight. Certification requires 80-90% agreement with expert raters during training, so the system’s built to minimize wild variation.
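A figure like "matched within 2 points" is just the share of double-scored sessions whose two raw totals differ by no more than a tolerance. Here is a minimal Python sketch of that calculation. The function name and the paired totals are hypothetical and do not reproduce any study's data.

```python
def share_within(pairs, tolerance):
    """Fraction of (rater_a_total, rater_b_total) pairs differing by at most `tolerance`."""
    if not pairs:
        raise ValueError("need at least one pair of totals")
    close = sum(abs(a - b) <= tolerance for a, b in pairs)
    return close / len(pairs)


# Hypothetical raw totals from two raters scoring the same eight sessions.
pairs = [(12, 13), (7, 7), (15, 12), (9, 10), (20, 14), (5, 6), (11, 11), (8, 10)]
print(share_within(pairs, 2))  # share of sessions within 2 points
print(share_within(pairs, 0))  # share of sessions with identical totals
```

Changing the tolerance shows how much the headline percentage depends on what counts as a "match."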
The methodology’s been stress-tested over decades—thousands of kids, hundreds of clinicians, peer-reviewed to death. You’re unlikely to get “vastly different” results (like one says autism, the other says no) if both clinicians are certified and the kid’s behavior is consistent. But a few points’ difference? Sure, especially if the child’s having an off day or the clinician’s style leans stricter or looser. It’s standardized, not robotic.
So, gold standard? Yes, in the sense it’s the best we’ve got—reliable, replicable, globally adopted. Perfectly immune to variation? No. Two clinicians should land close—±1 or 2—most of the time, but the kid’s mood and the human behind the clipboard mean it’s not a math equation.

That’s the AI answer. Based on personal experience, it’s probably fairly accurate about consistency across assessors, but the ADOS isn’t typically repeated very often: 12-18 months is usually the minimum, and it’s often more like 3 years, usually as requested by the school or insurance provider. So most people are never going to have two diagnosticians perform an ADOS that close together in time unless both are observing one assessment and taking IOA (interobserver agreement) data. They’re very stringent about who can perform the test, with strict training guidelines, so from that perspective it’s great. However, as great as it is, the ADOS is still just one piece of a diagnostic assessment that gives you a brief picture of a child under specific conditions. Personally, I treat the ADOS as nothing more than another piece of background info and lean much more heavily on skills-based assessments, but my focus is more on program development than diagnostics.[/quote]

Thank you for all the details! Sounds like it is a very reliable assessment, though not perfect. Let me know if you have a YouTube channel. :) [/quote]
