Comprehensive Comparison: Missing Values Support

SAS vs. Stata vs. SPSS - Detailed Analysis

SAS 9.4 M9 / Viya 2025.07
Stata 19 (April 2025)
SPSS 31 (June 2025)

Executive Summary

This document provides a detailed comparison of missing values support across three major statistical software packages. The analysis identifies identical features, key differences, and unique capabilities of each platform.

Key Finding: SAS and Stata share a categorical/alphabetic approach with 28 and 27 missing value types respectively, while SPSS uses a fundamentally different value-based definition approach.

Quick Reference Comparison

Feature | SAS | Stata | SPSS
Latest Version | 9.4 M9 / Viya 2025.07 | 19 (April 2025) | 31 (June 2025)
Numeric Missing Types | 28 total | 27 total | System + user-defined
Extended/Special Missing | Yes (27: ._ + .A-.Z) | Yes (26: .a-.z) | No (different approach)
Character/String Missing Types | 1 (blank only) | 1 (blank only) | Up to 3 discrete values
Range-Based Missing | No | No | Yes (numeric only)
Can Label System Missing | Yes | No | No
Case Sensitivity (coding) | No | No | N/A

1. Numeric Variables - Missing Values

1.1 Total Number of Missing Value Types

Software | Total Types | Breakdown | Notes
SAS | 28 | 1 standard (.) + 1 special (._) + 26 letters (.A-.Z) | Most options
Stata | 27 | 1 system (.) + 26 letters (.a-.z) | Very similar to SAS
SPSS | Variable | 1 system + user-defined (3 discrete OR 1 range + 1 discrete) | Fundamentally different approach
IDENTICAL:
DIFFERENT:

1.2 Extended/Special Missing Values

Software | Support | Syntax | Display | Purpose
SAS | Yes | ._, .A-.Z (or .a-.z) | Upper case | Categorical reasons for missingness
Stata | Yes | .a-.z (or .A-.Z) | Lower case | Categorical reasons for missingness
SPSS | No | N/A | N/A | Uses value-based missing instead
IDENTICAL (SAS & Stata):
DIFFERENT:

1.3 Range-Based Missing Values

Software | Support | Example Syntax | Use Case
SAS | No | N/A | Cannot define range as missing
Stata | No | N/A | Cannot define range as missing
SPSS | Yes | MISSING VALUES income (-99 THRU -1). | Define continuous range as missing
IDENTICAL (SAS & Stata): Neither supports range-based missing values
DIFFERENT:
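
As a rough illustration of the SPSS limits described in this document (up to 3 discrete values, or 1 range plus 1 discrete value, with ranges for numeric variables only), the rule can be sketched in Python. The function `validate_spss_missing` and its parameters are hypothetical names made up for this sketch; it is a model of the rule, not an SPSS API.

```python
def validate_spss_missing(discrete=(), value_range=None, is_string=False):
    """Return True if a missing-value declaration fits the stated limits.

    Hypothetical helper for illustration only.
    discrete    -- individual values declared as missing
    value_range -- optional (low, high) pair, as in "-99 THRU -1"
    is_string   -- True for string variables, which cannot use ranges
    """
    if value_range is None:
        # No range: up to three discrete values are allowed.
        return len(discrete) <= 3
    if is_string:
        # Ranges apply to numeric variables only.
        return False
    low, high = value_range
    # One range plus at most one additional discrete value.
    return low <= high and len(discrete) <= 1


print(validate_spss_missing(discrete=(-9, -8, -7)))                   # three discrete values
print(validate_spss_missing(discrete=(999,), value_range=(-99, -1)))  # range + one value
print(validate_spss_missing(discrete=(-9, -8, -7, -6)))               # too many values
```

The first two calls correspond to the two declaration styles shown in this document; the third exceeds the three-value limit and is rejected.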

1.4 Missing Value Sort Order

Software | Sort Order | Comparison Behavior
SAS | ._ < . < .A < .B < ... < .Z | All missing < any non-missing number
Stata | . < .a < .b < ... < .z | All missing > any non-missing number
SPSS | System missing has special handling | System missing ≠ specific value
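
The practical effect of the two orderings can be sketched with a small Python model. It assumes the documented behaviour that SAS places every missing value below all numbers while Stata places every missing value above all numbers. Missing codes are represented as strings, and `sas_key` and `stata_key` are hypothetical helper names for this sketch only.

```python
import string

# Letter codes .A-.Z; "." is the standard/system missing value and
# "._" is the SAS underscore special missing value.
LETTERS = ["." + c for c in string.ascii_uppercase]


def sas_key(value):
    """Sort key under SAS rules: ._ < . < .A < ... < .Z < any number."""
    order = ["._", "."] + LETTERS
    if isinstance(value, str):
        return (0, order.index(value))
    return (1, value)


def stata_key(value):
    """Sort key under Stata rules: any number < . < .a < ... < .z."""
    order = ["."] + LETTERS  # upper-cased only to share one lookup table
    if isinstance(value, str):
        return (1, order.index(value.upper()))
    return (0, value)


sas_sorted = sorted([25, ".A", -3, ".", "._", ".Z"], key=sas_key)
stata_sorted = sorted([25, ".a", -3, ".", ".z"], key=stata_key)
print(sas_sorted)    # missing codes first, then numbers
print(stata_sorted)  # numbers first, then missing codes
```

This is why a SAS test such as var <= .Z and a Stata test such as var < . point in opposite directions: the first selects missing values, the second selects non-missing values.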

2. Character/String Variables - Missing Values

2.1 Number of Missing Value Types

Software | Missing Types | Representation | Specification
SAS | 1 | Blank: ' ' or " " | Single space between quotes
Stata | 1 | Blank: "" | Empty string
SPSS | Up to 3 | Discrete values | User specifies up to 3 distinct strings
IDENTICAL (SAS & Stata):
DIFFERENT:

3. Value Labels / Formats

3.1 Labeling Mechanism

Software | Mechanism | Syntax Pattern | Terminology
SAS | PROC FORMAT | proc format; value name val='label'; run; | Formats
Stata | label define | label define name val "label" | Value labels
SPSS | VALUE LABELS | VALUE LABELS var val 'label'. | Value labels

3.2 Labeling Missing Values

Software | Can Label Missing? | Which Missing? | Limitations
SAS | Yes | All types (., ._, .A-.Z) | None - complete flexibility
Stata | Partial | Extended only (.a-.z) | Cannot label system missing (.)
SPSS | Yes | User-defined missing values | Only user-defined, not system missing
KEY DIFFERENCE:

3.3 String/Character Variable Labels

Software | Direct Labeling | Approach
SAS | Yes | value $formatname ($ prefix for character)
Stata | No | Must encode string to numeric first
SPSS | Yes | Direct VALUE LABELS application to strings
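
To make the Stata row concrete, here is a minimal Python sketch of what an encode-style step does. It assumes integer codes are assigned to the distinct strings in alphabetical order starting at 1, and that empty strings become missing. The function name `encode` mirrors the Stata command, but the code is only a model written for this sketch, not Stata itself.

```python
def encode(strings):
    """Map strings to integer codes 1..k in alphabetical order.

    Returns (codes, labels). Empty strings are treated as missing and
    come back as None. Model only; names are illustrative.
    """
    levels = sorted({s for s in strings if s != ""})
    code_of = {s: i for i, s in enumerate(levels, start=1)}
    codes = [code_of.get(s) for s in strings]   # None for empty strings
    labels = {i: s for s, i in code_of.items()}
    return codes, labels


codes, labels = encode(["M", "F", "", "X", "F"])
print(codes)   # numeric variable that replaces the string variable
print(labels)  # value label attached to the numeric variable
```

Because the codes follow alphabetical order, 'F' receives code 1 and 'M' code 2 in this example, which matters when the generated value label is edited afterwards.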

4. Code Examples: Defining Missing Values with Labels

4.1 Complete Code Examples - All Possible Missing Values

The following table shows complete, executable code examples for defining all possible missing values with labels in each software package. Each example demonstrates the maximum capabilities of the system.

Missing Value Type | SAS Code Example | Stata Code Example | SPSS Code Example

NUMERIC VARIABLES - DEFINING MISSING VALUES

Standard/System Missing

  SAS:
    /* Automatic - always exists */
    age = .;

  Stata:
    // Automatic - always exists
    replace age = .

  SPSS:
    * Automatic - always exists.
    COMPUTE age = $SYSMIS.

Special Missing: Refused

  SAS:
    age = .A;  /* or .a */

  Stata:
    replace age = .a

  SPSS:
    * Use user-defined value.
    COMPUTE age = -9.
    MISSING VALUES age (-9).

Special Missing: Not Applicable

  SAS:
    age = .B;  /* or .b */

  Stata:
    replace age = .b

  SPSS:
    * Use user-defined value.
    COMPUTE age = -8.
    MISSING VALUES age (-9, -8).

Special Missing: Don't Know

  SAS:
    age = .C;  /* or .c */

  Stata:
    replace age = .c

  SPSS:
    * Use user-defined value.
    COMPUTE age = -7.
    MISSING VALUES age (-9, -8, -7).

All Extended/Special Missing (.A-.Z / .a-.z)

  SAS:
    /* SAS supports 28 total:
       .  (standard)
       ._ (underscore - lowest)
       .A, .B, .C, ... .Z (26 letters) */
    age = ._;  /* Lowest missing */
    age = .A;  /* to */
    age = .Z;  /* 26 letter options */

  Stata:
    // Stata supports 27 total:
    //   . (system)
    //   .a, .b, .c, ... .z (26 letters)
    replace age = .    // System
    replace age = .a   // to
    replace age = .z   // 26 letter options

  SPSS:
    * SPSS: Up to 3 discrete OR 1 range + 1 discrete.
    * Option 1: Three discrete values.
    MISSING VALUES age (-9, -8, -7).
    * Option 2: One range + one value.
    MISSING VALUES age (-99 THRU -1, 999).
NUMERIC VARIABLES - LABELING MISSING VALUES

Complete Format/Label Definition

  SAS:
    proc format;
      value agefmt
        18-25   = 'Young Adult'
        26-40   = 'Adult'
        41-65   = 'Middle Age'
        66-high = 'Senior'
        .       = 'Unknown'
        ._      = 'Lowest Missing'
        .A      = 'Refused'
        .B      = 'Not Applicable'
        .C      = 'Don''t Know'
        .D      = 'Not Reported'
        .E      = 'Other Missing'
        .F-.Z   = 'Other Special';
    run;
    /* Apply format */
    format age agefmt.;

  Stata:
    label define age_lbl ///
        18 "18 years" ///
        25 "25 years" ///
        35 "35 years" ///
        .a "Refused" ///
        .b "Not Applicable" ///
        .c "Don't Know" ///
        .d "Not Reported" ///
        .e "Other Missing" ///
        .f "Invalid" ///
        .z "End Missing", modify
    // NOTE: System missing (.) cannot be labeled in Stata
    label values age age_lbl

  SPSS:
    * Define user-defined missing.
    MISSING VALUES age (-9, -8, -7).
    * Label all values including missing.
    VALUE LABELS age
      18 '18 years'
      25 '25 years'
      35 '35 years'
      -9 'Refused'
      -8 'Not Applicable'
      -7 'Don''t Know'.
    * Note: System missing cannot be labeled in SPSS.
CHARACTER/STRING VARIABLES - DEFINING MISSING VALUES

String Missing: Blank

  SAS:
    gender = ' ';  /* or " " */
    /* Blank is the only character missing */

  Stata:
    replace gender = ""
    // Blank is the only string missing

  SPSS:
    COMPUTE gender = ''.
    MISSING VALUES gender ('').

String Missing: Multiple Values

  SAS:
    /* NOT SUPPORTED
       Only blank ' ' is missing for character variables */

  Stata:
    // NOT SUPPORTED
    // Only blank "" is missing for string variables

  SPSS:
    * SPSS supports up to 3 discrete.
    MISSING VALUES gender ('X', 'NK', '').
CHARACTER/STRING VARIABLES - LABELING VALUES

Complete String Format/Label

  SAS:
    proc format;
      value $genderfmt
        'M'   = 'Male'
        'F'   = 'Female'
        'X'   = 'Other'
        ' '   = 'Not Reported'
        other = 'Unknown';
    run;
    format gender $genderfmt.;

  Stata:
    // Stata requires encoding strings to use labels
    encode gender, ///
        gen(gender_num) ///
        label(gender_lbl)
    // encode assigns codes alphabetically: F = 1, M = 2, X = 3
    // Then modify the label
    label define gender_lbl ///
        1 "Female" ///
        2 "Male" ///
        3 "Other", modify

  SPSS:
    * Direct labeling supported.
    VALUE LABELS gender
      'M' 'Male'
      'F' 'Female'
      'X' 'Other'
      ''  'Not Reported'.
COMPLETE WORKING EXAMPLE - ALL FEATURES

Full Working Example

  SAS:
    /* Create dataset */
    data demo;
      age = 25; output;
      age = .A; output;  /* Refused */
      age = .B; output;  /* N/A */
      age = .;  output;  /* Unknown */
    run;
    /* Define formats */
    proc format;
      value agefmt
        low-17  = 'Under 18'
        18-65   = 'Adult'
        66-high = 'Senior'
        .       = 'Unknown'
        .A      = 'Refused'
        .B      = 'Not Applicable'
        .C-.Z   = 'Other Missing';
    run;
    /* Apply and display */
    data demo;
      set demo;
      format age agefmt.;
    run;
    proc freq data=demo;
      tables age / missing;
    run;

  Stata:
    // Create dataset
    clear
    set obs 4
    gen age = .
    replace age = 25 in 1
    replace age = .a in 2   // Refused
    replace age = .b in 3   // N/A
    replace age = . in 4    // Unknown
    // Define labels (label define takes single values, not ranges,
    // so each of .d-.z needs its own entry)
    label define age_lbl ///
        18 "18 years" ///
        25 "25 years" ///
        65 "65 years" ///
        .a "Refused" ///
        .b "Not Applicable" ///
        .c "Don't Know" ///
        .d "Other Missing", modify
    // Apply labels
    label values age age_lbl
    // Display with missing
    tabulate age, missing

  SPSS:
    * Create dataset.
    DATA LIST FREE / age.
    BEGIN DATA
    25 -9 -8 .
    END DATA.
    * Define missing values.
    MISSING VALUES age (-9, -8).
    * Label all values.
    VALUE LABELS age
      18 '18 years'
      25 '25 years'
      65 '65 years'
      -9 'Refused'
      -8 'Not Applicable'.
    * Display frequencies.
    FREQUENCIES VARIABLES=age /ORDER=ANALYSIS.

Key Observations from Code Examples:

5. Detection and Testing

5.1 Functions for Missing Detection

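Software | Function/Test | Detects | Notes
SAS | missing(var) | All 28 types | Universal check
SAS | var = . | Standard (.) only | Specific check
SAS | var <= .Z | All missing | Range check (missing values sort below all numbers)
Stata | missing(var) | All 27 types | Universal check
Stata | var == . | System (.) only | Specific check
Stata | var < . | All non-missing | Comparison check (missing values sort above all numbers)
SPSS | MISSING(var) | System + user-defined | Universal check
SPSS | SYSMIS(var) | System missing only | Specific check
SPSS | VALUE(var) | Stored value, ignoring user-missing definitions | Use VALUE(var) = code to test for a specific user-defined missing code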
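
The three-way SPSS distinction in the table (system-missing, user-defined missing, valid) can be modelled in a few lines of Python. This is a sketch under stated assumptions: NaN or None stands in for system-missing, and `classify`, `is_missing`, and `is_sysmis` are hypothetical names that only loosely mirror MISSING() and SYSMIS().

```python
import math


def classify(value, discrete=(), value_range=None):
    """Classify one numeric value as 'system', 'user', or 'valid'.

    NaN/None models system-missing; `discrete` and `value_range` model a
    user-missing declaration. Illustrative only, not SPSS code.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "system"
    if value in discrete:
        return "user"
    if value_range is not None and value_range[0] <= value <= value_range[1]:
        return "user"
    return "valid"


def is_missing(value, **spec):
    """Rough analogue of MISSING(var): system or user-defined missing."""
    return classify(value, **spec) != "valid"


def is_sysmis(value):
    """Rough analogue of SYSMIS(var): system-missing only."""
    return classify(value) == "system"


spec = {"discrete": (-9, -8)}
for v in (25, -9, float("nan")):
    print(v, classify(v, **spec), is_missing(v, **spec), is_sysmis(v))
```

In this model a user-defined missing value is one where `is_missing` is true and `is_sysmis` is false.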

5.2 Code Examples for Detection

Detection Task | SAS Code | Stata Code | SPSS Code

Check if any missing

  SAS:
    if missing(age) then flag = 1;

  Stata:
    gen flag = missing(age)

  SPSS:
    IF MISSING(age) flag=1.

Check for system/standard missing only

  SAS:
    if age = . then flag_std = 1;

  Stata:
    gen flag_std = (age == .)

  SPSS:
    IF SYSMIS(age) flag_std=1.

Check for specific special missing

  SAS:
    if age = .A then flag_refused = 1;

  Stata:
    gen flag_refused = (age == .a)

  SPSS:
    * VALUE() is needed once -9 has been declared user-missing.
    IF (VALUE(age) = -9) flag_refused=1.

Exclude all missing in condition

  SAS:
    if age <= .Z then delete;
    /* Or keep non-missing only (missing values sort lowest in SAS) */
    if age > .Z;

  Stata:
    drop if missing(age)
    // Or keep non-missing
    keep if age < .

  SPSS:
    SELECT IF NOT MISSING(age).

Count missing by type

  SAS:
    data counts;
      set mydata;
      if age = .  then cnt_std + 1;
      if age = .A then cnt_ref + 1;
      if age = .B then cnt_na + 1;
    run;

  Stata:
    count if age == .
    count if age == .a
    count if age == .b
    // Or use tabulate
    tab age, missing

  SPSS:
    FREQUENCIES VARIABLES=age /MISSING=INCLUDE.
IDENTICAL:

6. Handling in Statistical Procedures

6.1 Automatic Exclusion

Software | Automatic Exclusion? | User Control
SAS | Yes | Procedures exclude all missing by default
Stata | Yes | Commands exclude all missing by default
SPSS | Yes | Procedures exclude all missing by default
IDENTICAL:

Summary: Identical vs. Different Features

Features IDENTICAL Across All Three

Feature | SAS | Stata | SPSS
Concept of missing values for numeric | Yes | Yes | Yes
Concept of missing values for character/string | Yes | Yes | Yes
Automatic exclusion from statistics | Yes | Yes | Yes
Can label/format regular values | Yes | Yes | Yes
Functions to detect missing | Yes | Yes | Yes
Options to include missing in tables | Yes | Yes | Yes
Stable across recent versions | Yes | Yes | Yes

Unique Features by Software

SAS Unique Features

  • 28 missing values - Most missing value types (includes ._)
  • Can label standard missing - Only one that can label . (dot)
  • Character formats ($) - Direct character variable formats
  • Complete flexibility - Label all missing types without exception

Stata Unique Features

  • 27 missing values - Extended missing (.a-.z)
  • misstable command - Dedicated missing pattern analysis suite
  • Clean notation - No ._ to remember, simpler than SAS
  • Lowercase display - Consistent lowercase output

SPSS Unique Features

  • Range-based missing - Only one supporting ranges (e.g., -99 THRU -1)
  • 3 string missing values - Most flexible for character variables
  • Direct string labels - Can apply VALUE LABELS directly to strings
  • User-friendly approach - Value-based definition more intuitive

Key Differences Summary

Aspect | SAS | Stata | SPSS
Missing Value Approach | Categorical (28 types) | Categorical (27 types) | Value-based (user-defined)
String Missing Flexibility | Low (1 type) | Low (1 type) | High (3 types)
Range Support | No | No | Yes
Labeling Completeness | Complete (all types) | Partial (not .) | User-defined only
Ease of Use (beginners) | Moderate | Moderate | High
Power (advanced users) | High | High | Moderate

Recommendations by Use Case

Choose SAS if:

Choose Stata if:

Choose SPSS if:

Conclusion

Overall Assessment

IDENTICAL CORE CONCEPTS: All three software packages understand and handle missing values, automatically exclude them from analysis, provide detection functions, and maintain stable implementations across versions.

KEY PHILOSOPHICAL DIVIDE:

FLEXIBILITY RANKING:

  1. SAS: Most missing types (28), can label all types, complete flexibility
  2. Stata: Nearly as powerful (27 types), excellent tools (misstable)
  3. SPSS: Different strengths (ranges, multiple string missing), more user-friendly

BEST PRACTICE: The choice depends on data source (pre-coded missing vs. new collection), complexity needs (simple vs. many missing types), user expertise (beginners vs. advanced), and organizational standards and existing workflows.

All three are fully capable statistical packages with robust missing value support - the differences are in philosophy and implementation details rather than fundamental capability.