SAS vs. Stata vs. SPSS - Detailed Analysis
This document compares missing-value support across three major statistical packages: SAS, Stata, and SPSS. It identifies the features the packages share, their key differences, and the capabilities unique to each.
Key Finding: SAS and Stata share a categorical/alphabetic approach with 28 and 27 missing value types respectively, while SPSS uses a fundamentally different value-based definition approach.
| Feature | SAS | Stata | SPSS |
|---|---|---|---|
| Latest Version | 9.4 M9 / Viya 2025.07 | 19 (April 2025) | 31 (June 2025) |
| Numeric Missing Types | 28 total | 27 total | System + User-defined |
| Extended/Special Missing | Yes (27: ._ + .A-.Z) | Yes (26: .a-.z) | No (different approach) |
| Character/String Missing Types | 1 (blank only) | 1 (blank only) | Up to 3 discrete values |
| Range-Based Missing | No | No | Yes (numeric only) |
| Can Label System Missing | Yes | No | No |
| Case Sensitivity (coding) | No | No | N/A |
| Software | Total Types | Breakdown | Notes |
|---|---|---|---|
| SAS | 28 | 1 standard (.) + 1 special (._) + 26 letters (.A-.Z) | Most options |
| Stata | 27 | 1 system (.) + 26 letters (.a-.z) | Very similar to SAS |
| SPSS | Variable | 1 system + user-defined (3 discrete OR 1 range + 1 discrete) | Fundamentally different approach |
| Software | Support | Syntax | Display | Purpose |
|---|---|---|---|---|
| SAS | ✓ Yes | ._, .A-.Z (or .a-.z) | Upper case | Categorical reasons for missingness |
| Stata | ✓ Yes | .a-.z (or .A-.Z) | Lower case | Categorical reasons for missingness |
| SPSS | ✗ No | N/A | N/A | Uses value-based missing instead |
| Software | Support | Example Syntax | Use Case |
|---|---|---|---|
| SAS | ✗ No | N/A | Cannot define range as missing |
| Stata | ✗ No | N/A | Cannot define range as missing |
| SPSS | ✓ Yes | MISSING VALUES income (-99 THRU -1). | Define continuous range as missing |
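
The SPSS declaration limits described in this comparison (up to three discrete values, or one range plus one discrete value) can be modelled in a few lines. The sketch below is written in Python rather than SPSS syntax, purely as a language-neutral illustration. `MissingSpec` and `is_user_missing` are hypothetical names, not part of any SPSS API, and the range is treated as inclusive of both endpoints.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MissingSpec:
    """Hypothetical model of a numeric MISSING VALUES declaration."""
    discrete: Tuple[float, ...] = ()
    value_range: Optional[Tuple[float, float]] = None  # inclusive (low, high)

    def __post_init__(self):
        # Limits as stated in the comparison: up to 3 discrete values,
        # or 1 range plus at most 1 discrete value.
        limit = 1 if self.value_range is not None else 3
        if len(self.discrete) > limit:
            raise ValueError(f"at most {limit} discrete value(s) allowed here")
        if self.value_range is not None and self.value_range[0] > self.value_range[1]:
            raise ValueError("range low must not exceed range high")

    def is_user_missing(self, value: float) -> bool:
        """Return True when value falls under this declaration."""
        if value in self.discrete:
            return True
        if self.value_range is not None:
            low, high = self.value_range
            return low <= value <= high
        return False


# Counterpart of: MISSING VALUES income (-99 THRU -1).
income_spec = MissingSpec(value_range=(-99, -1))
# Counterpart of: MISSING VALUES age (-99 THRU -1, 999).
age_spec = MissingSpec(discrete=(999,), value_range=(-99, -1))

print(income_spec.is_user_missing(-50))  # True
print(income_spec.is_user_missing(0))    # False
print(age_spec.is_user_missing(999))     # True
```

Trying to build `MissingSpec(discrete=(1, 2, 3, 4))` raises `ValueError`, mirroring the three-value limit.
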
| Software | Sort Order | Comparison Behavior |
|---|---|---|
| SAS | ._ < . < .A < .B < ... < .Z | All missing < any non-missing number |
| Stata | . < .a < .b < ... < .z | All missing > any non-missing number |
| SPSS | System missing has special handling | System missing ≠ specific value |
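
One practical consequence of the sort order is that a plain comparison can silently pick up missing values. In SAS, missing values compare lower than any number; in Stata, they compare higher. The Python sketch below models the two orderings with sort keys. It illustrates the ordering only, not how either package stores missing values, and `sas_key`, `stata_key` and `greater` are hypothetical names.

```python
import string

# Missing codes from lowest to highest within each package.
SAS_ORDER = ["._", "."] + ["." + c for c in string.ascii_uppercase]
STATA_ORDER = ["."] + ["." + c for c in string.ascii_lowercase]


def sas_key(value):
    """Sort key modelling SAS: every missing code ranks below every number."""
    if isinstance(value, str):
        return (0, SAS_ORDER.index(value))
    return (1, value)


def stata_key(value):
    """Sort key modelling Stata: every missing code ranks above every number."""
    if isinstance(value, str):
        return (1, STATA_ORDER.index(value))
    return (0, value)


def greater(a, b, key):
    """Evaluate a > b under the given ordering model."""
    return key(a) > key(b)


print(sorted([25, ".", ".A", -3, "._"], key=sas_key))
# ['._', '.', '.A', -3, 25]
print(sorted([25, ".", ".a", -3], key=stata_key))
# [-3, 25, '.', '.a']

# Stata: a filter like "age > 65" is also true for missing ages.
print(greater(".a", 65, stata_key))  # True
# SAS: a filter like "age < 18" is also true for missing ages.
print(greater(18, ".A", sas_key))    # True
```

This is why the usual guards differ: `age > .Z` (or `not missing(age)`) in SAS, and `age < .` (or `!missing(age)`) in Stata.
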
| Software | Missing Types | Representation | Specification |
|---|---|---|---|
| SAS | 1 | Blank: ' ' or " " | Single space between quotes |
| Stata | 1 | Blank: "" | Empty string |
| SPSS | Up to 3 | Discrete values | User specifies up to 3 distinct strings |

SPSS example: MISSING VALUES gender ('X', 'NK', '').

| Software | Mechanism | Syntax Pattern | Terminology |
|---|---|---|---|
| SAS | PROC FORMAT | proc format; value name val='label'; run; | Formats |
| Stata | label define | label define name val "label" | Value labels |
| SPSS | VALUE LABELS | VALUE LABELS var val 'label'. | Value labels |
| Software | Can Label Missing? | Which Missing? | Limitations |
|---|---|---|---|
| SAS | ✓ Yes | All types (., ._, .A-.Z) | None - complete flexibility |
| Stata | ~ Partial | Extended only (.a-.z) | Cannot label system missing (.) |
| SPSS | ✓ Yes | User-defined missing values | Only user-defined, not system missing |
| Software | Direct Labeling | Approach |
|---|---|---|
| SAS | ✓ Yes | value $formatname ($ prefix for character) |
| Stata | ✗ No | Must encode string to numeric first |
| SPSS | ✓ Yes | Direct VALUE LABELS application to strings |
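The following table shows code examples for defining and labeling missing values in each package, up to the maximum each system supports. Single statements such as the SAS assignments belong inside a DATA step, and the Stata examples assume a do-file, where // comments and /// line continuation are valid. The last row gives a full working example.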
| Missing Value Type | SAS Code Example | Stata Code Example | SPSS Code Example |
|---|---|---|---|
| NUMERIC VARIABLES - DEFINING MISSING VALUES | |||
| Standard/System Missing |
/* Automatic - always exists */
age = .;
|
// Automatic - always exists
replace age = .
|
* Automatic - always exists.
COMPUTE age = $SYSMIS.
|
| Special Missing: Refused |
age = .A; /* or .a */
|
replace age = .a
|
* Use user-defined value.
COMPUTE age = -9.
MISSING VALUES age (-9).
|
| Special Missing: Not Applicable |
age = .B; /* or .b */
|
replace age = .b
|
* Use user-defined value.
COMPUTE age = -8.
MISSING VALUES age (-9, -8).
|
| Special Missing: Don't Know |
age = .C; /* or .c */
|
replace age = .c
|
* Use user-defined value.
COMPUTE age = -7.
MISSING VALUES age (-9, -8, -7).
|
| All Extended/Special Missing (.A-.Z / .a-.z) |
/* SAS supports 28 total:
. (standard)
._ (underscore - lowest)
.A, .B, .C, ... .Z (26 letters)
*/
age = ._; /* Lowest missing */
age = .A; /* to */
age = .Z; /* 26 letter options */
|
// Stata supports 27 total:
// . (system)
// .a, .b, .c, ... .z (26 letters)
replace age = . // System
replace age = .a // to
replace age = .z // 26 letter options
|
* SPSS: Up to 3 discrete OR 1 range + 1 discrete.
* Option 1: Three discrete values.
MISSING VALUES age (-9, -8, -7).
* Option 2: One range + one value.
MISSING VALUES age (-99 THRU -1, 999).
|
| NUMERIC VARIABLES - LABELING MISSING VALUES | |||
| Complete Format/Label Definition |
proc format;
value agefmt
18-25 = 'Young Adult'
26-40 = 'Adult'
41-65 = 'Middle Age'
66-high = 'Senior'
. = 'Unknown'
._ = 'Lowest Missing'
.A = 'Refused'
.B = 'Not Applicable'
.C = 'Don''t Know'
.D = 'Not Reported'
.E = 'Other Missing'
.F-.Z = 'Other Special';
run;
/* Apply format */
format age agefmt.;
|
label define age_lbl ///
18 "18 years" ///
25 "25 years" ///
35 "35 years" ///
.a "Refused" ///
.b "Not Applicable" ///
.c "Don't Know" ///
.d "Not Reported" ///
.e "Other Missing" ///
.f "Invalid" ///
.z "End Missing", modify
// NOTE: System missing (.)
// cannot be labeled in Stata
label values age age_lbl
|
* Define user-defined missing.
MISSING VALUES age (-9, -8, -7).
* Label all values including missing.
VALUE LABELS age
18 '18 years'
25 '25 years'
35 '35 years'
-9 'Refused'
-8 'Not Applicable'
-7 'Don''t Know'.
* Note: System missing cannot be labeled in SPSS.
|
| CHARACTER/STRING VARIABLES - DEFINING MISSING VALUES | |||
| String Missing: Blank |
gender = ' '; /* or " " */
/* Blank is the only
character missing */
|
replace gender = ""
// Blank is the only
// string missing
|
COMPUTE gender = ''.
MISSING VALUES gender ('').
|
| String Missing: Multiple Values |
/* NOT SUPPORTED
Only blank ' ' is missing
for character variables */
|
// NOT SUPPORTED
// Only blank "" is missing
// for string variables
|
* SPSS supports up to 3 discrete values.
MISSING VALUES gender
('X', 'NK', '').
|
| CHARACTER/STRING VARIABLES - LABELING VALUES | |||
| Complete String Format/Label |
proc format;
value $genderfmt
'M' = 'Male'
'F' = 'Female'
'X' = 'Other'
' ' = 'Not Reported'
other = 'Unknown';
run;
format gender $genderfmt.;
|
// Stata requires encoding
// strings to use labels
encode gender, ///
gen(gender_num) ///
label(gender_lbl)
// encode numbers the strings alphabetically
// (F=1, M=2, X=3), so relabel in that order
label define gender_lbl ///
1 "Female" ///
2 "Male" ///
3 "Other", modify
|
* Direct labeling supported.
VALUE LABELS gender
'M' 'Male'
'F' 'Female'
'X' 'Other'
'' 'Not Reported'.
|
| COMPLETE WORKING EXAMPLE - ALL FEATURES | |||
| Full Working Example |
/* Create dataset */
data demo;
age = 25; output;
age = .A; output; /* Refused */
age = .B; output; /* N/A */
age = .; output; /* Unknown */
run;
/* Define formats */
proc format;
value agefmt
low-17 = 'Under 18'
18-65 = 'Adult'
66-high = 'Senior'
. = 'Unknown'
.A = 'Refused'
.B = 'Not Applicable'
.C-.Z = 'Other Missing';
run;
/* Apply and display */
data demo;
set demo;
format age agefmt.;
run;
proc freq data=demo;
tables age / missing;
run;
|
// Create dataset
clear
set obs 4
gen age = .
replace age = 25 in 1
replace age = .a in 2 // Refused
replace age = .b in 3 // N/A
replace age = . in 4 // Unknown
// Define labels
label define age_lbl ///
18 "18 years" ///
25 "25 years" ///
65 "65 years" ///
.a "Refused" ///
.b "Not Applicable" ///
.c "Don't Know" ///
.d "Other Missing", modify
// label define has no range syntax;
// list .e through .z individually if needed
// Apply labels
label values age age_lbl
// Display with missing
tabulate age, missing
|
* Create dataset.
DATA LIST FREE / age.
BEGIN DATA
25
-9
-8
.
END DATA.
* Define missing values.
MISSING VALUES age (-9, -8).
* Label all values.
VALUE LABELS age
18 '18 years'
25 '25 years'
65 '65 years'
-9 'Refused'
-8 'Not Applicable'.
* Display frequencies.
FREQUENCIES VARIABLES=age
/ORDER=ANALYSIS.
|
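
The examples in the table use parallel codes for the same reasons for missingness (for instance -9, .A and .a for "Refused"). When data move between packages, that correspondence has to be written down somewhere. Below is a minimal sketch in Python, assuming the three-code scheme from the examples. The `CROSSWALK` table and `recode` helper are hypothetical and do not correspond to any import or export facility of the packages; `None` stands in for SPSS system missing.

```python
# Hypothetical crosswalk built from the example codes in the table:
# SPSS user-defined value -> SAS / Stata special missing code.
CROSSWALK = {
    -9: {"sas": ".A", "stata": ".a", "label": "Refused"},
    -8: {"sas": ".B", "stata": ".b", "label": "Not Applicable"},
    -7: {"sas": ".C", "stata": ".c", "label": "Don't Know"},
}


def recode(value, target):
    """Translate one SPSS-style value into the target package's notation.

    target is "sas" or "stata". System missing (None) maps to ".",
    and any value without a crosswalk entry passes through unchanged.
    """
    if value is None:
        return "."
    entry = CROSSWALK.get(value)
    return entry[target] if entry is not None else value


spss_ages = [25, -9, -8, None]
print([recode(v, "sas") for v in spss_ages])    # [25, '.A', '.B', '.']
print([recode(v, "stata") for v in spss_ages])  # [25, '.a', '.b', '.']
```

Keeping the labels in the same table makes it harder for codes and their meanings to drift apart between packages.
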
| Software | Function/Test | Detects | Notes |
|---|---|---|---|
| SAS | missing(var) | All 28 types | Universal check |
| SAS | var = . | Standard (.) only | Specific check |
| SAS | var <= .Z | All missing | Range check |
| Stata | missing(var) | All 27 types | Universal check |
| Stata | var == . | System (.) only | Specific check |
| Stata | var < . | True only for non-missing values | Comparison check |
| SPSS | MISSING(var) | System + user-defined | Universal check |
| SPSS | SYSMIS(var) | System missing only | Specific check |
| SPSS | MISSING(var) AND NOT SYSMIS(var) | User-defined missing only | Specific check; VALUE(var) returns the value with user-missing status ignored and is not itself a test |
| Detection Task | SAS Code | Stata Code | SPSS Code |
|---|---|---|---|
| Check if any missing |
if missing(age) then
flag = 1;
|
gen flag = missing(age)
|
IF MISSING(age) flag=1.
|
| Check for system/standard missing only |
if age = . then
flag_std = 1;
|
gen flag_std = (age == .)
|
IF SYSMIS(age) flag_std=1.
|
| Check for specific special missing |
if age = .A then
flag_refused = 1;
|
gen flag_refused = (age == .a)
|
* VALUE() is needed once -9 is declared missing.
IF (VALUE(age) = -9) flag_refused=1.
|
| Exclude all missing in condition |
if age <= .Z then delete;
/* Or keep non-missing rows with a subsetting IF */
if age > .Z;
|
drop if missing(age)
// Or keep non-missing
keep if age < .
|
SELECT IF NOT MISSING(age).
|
| Count missing by type |
data counts;
set mydata;
if age = . then cnt_std + 1;
if age = .A then cnt_ref + 1;
if age = .B then cnt_na + 1;
run;
|
count if age == .
count if age == .a
count if age == .b
// Or use tabulate
tab age, missing
|
* The frequency table lists missing values separately.
FREQUENCIES VARIABLES=age.
|

missing() or MISSING() function available

| Software | Automatic Exclusion? | User Control |
|---|---|---|
| SAS | ✓ Yes | Procedures exclude all missing by default |
| Stata | ✓ Yes | Commands exclude all missing by default |
| SPSS | ✓ Yes | Procedures exclude all missing by default |
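
Automatic exclusion only applies to values the package knows are missing. In SPSS, a code such as -9 is an ordinary number until it is declared with MISSING VALUES. The Python sketch below uses made-up ages to show the effect on a mean. `None` stands in for system missing, and `valid` is a hypothetical helper, not a function of any of the three packages.

```python
from statistics import mean

# Illustrative data only: two real ages, two coded refusals, one system missing.
ages = [25, 40, -9, -8, None]
USER_MISSING = frozenset({-9, -8})


def valid(values, user_missing=frozenset()):
    """Drop system missing (None) and any declared user-missing codes."""
    return [v for v in values if v is not None and v not in user_missing]


# Codes declared as missing: only 25 and 40 enter the mean.
print(mean(valid(ages, USER_MISSING)))  # 32.5

# Codes not declared: -9 and -8 are averaged in as if they were ages.
print(mean(valid(ages)))  # 12
```

The same data give a mean of 32.5 or 12 depending only on whether the codes were declared, which is the main risk of the value-based approach.
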
| Feature | SAS | Stata | SPSS |
|---|---|---|---|
| Concept of missing values for numeric | ✓ | ✓ | ✓ |
| Concept of missing values for character/string | ✓ | ✓ | ✓ |
| Automatic exclusion from statistics | ✓ | ✓ | ✓ |
| Can label/format regular values | ✓ | ✓ | ✓ |
| Functions to detect missing | ✓ | ✓ | ✓ |
| Options to include missing in tables | ✓ | ✓ | ✓ |
| Stable across recent versions | ✓ | ✓ | ✓ |
| Aspect | SAS | Stata | SPSS |
|---|---|---|---|
| Missing Value Approach | Categorical (28 types) | Categorical (27 types) | Value-based (user-defined) |
| String Missing Flexibility | Low (1 type) | Low (1 type) | High (3 types) |
| Range Support | No | No | Yes |
| Labeling Completeness | Complete (all types) | Partial (not .) | User-defined only |
| Ease of Use (beginners) | Moderate | Moderate | High |
| Power (advanced users) | High | High | Moderate |
IDENTICAL CORE CONCEPTS: All three software packages understand and handle missing values, automatically exclude them from analysis, provide detection functions, and maintain stable implementations across versions.
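KEY PHILOSOPHICAL DIVIDE: SAS and Stata record the reason for missingness in dedicated missing codes (a categorical approach), while SPSS marks ordinary data values as missing (a value-based approach).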
FLEXIBILITY RANKING:
BEST PRACTICE: The choice depends on data source (pre-coded missing vs. new collection), complexity needs (simple vs. many missing types), user expertise (beginners vs. advanced), and organizational standards and existing workflows.
All three are fully capable statistical packages with robust missing value support - the differences are in philosophy and implementation details rather than fundamental capability.