Introduction
In structured programming, clear and distinguishable names for variables, functions, and user-defined types directly affect code maintainability and error rates. Natural language lets context resolve an ambiguous reference; a C compiler does not, so every name must follow the language’s lexical rules exactly or the program will not compile.
This article examines the formal rules governing identifier creation in the C programming language, distinguishes identifiers from keywords, and explains the rationale behind seemingly arbitrary restrictions. You will gain an understanding of character limitations, starting-character constraints, and the special case of underscore-prefixed names.
What Are Identifiers in C Programming
An identifier is a user-defined name assigned to a program entity. Functions, variables, structures, unions, and enumerated types each require unique identifiers within their scope. When a developer writes int currentTemperature;, the word currentTemperature functions as an identifier—the compiler does not inherently understand its meaning, only that it represents a memory location or function entry point.
Unlike keywords such as if, while, or return—which have fixed meanings embedded in the language specification—identifiers carry no predefined semantics. A function named calculateAverage could just as accurately be named x42 from the compiler’s perspective. The semantic value exists solely for human readers.
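As a minimal sketch of this point, the two functions below have identical bodies and differ only in their names. The parameter names and the choice to return 0.0 for an empty input are assumptions made for illustration.

```c
#include <stddef.h>

/* Descriptive name: the intent is obvious to a human reader. */
double calculateAverage(const double *values, size_t count)
{
    if (count == 0) {
        return 0.0; /* assumed convention for an empty input */
    }
    double total = 0.0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total / (double)count;
}

/* Same logic under opaque names: the compiler treats it no differently. */
double x42(const double *a, size_t b)
{
    if (b == 0) {
        return 0.0;
    }
    double c = 0.0;
    for (size_t d = 0; d < b; d++) {
        c += a[d];
    }
    return c / (double)b;
}
```

Both functions return the same value for the same input; only the first tells a reader what that value means.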
Keywords vs. Identifiers: Core Distinctions
The original C standard (C89/C90) reserves exactly 32 keywords that cannot be used as identifiers; C99 and C11 added more, bringing the totals to 37 and 44. Attempting to name a variable int or struct produces a compilation error because the lexer recognizes these sequences as belonging to the language’s fixed vocabulary.
| Feature | Keywords | Identifiers |
|---|---|---|
| Origin | Predefined in language specification | User-defined |
| Semantic meaning | Fixed, understood by compiler | None (arbitrary) |
| Quantity | 32 (C89/C90) | Unlimited |
| Reassignment | Prohibited | Freely assignable |
An identifier named intValue remains valid because the keyword int appears only as a prefix. The compiler distinguishes sequences based on complete token matching, not substring presence.
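The complete-token comparison can be sketched as a lookup against the 32 C89/C90 keywords. The function name is_c89_keyword and the table name are hypothetical, and a real lexer would typically use a faster lookup than this linear scan.

```c
#include <string.h>

/* The 32 keywords of C89/C90. */
static const char *const c89_keywords[] = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "int", "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while"
};

/* Returns 1 only when the whole word equals a keyword; a keyword that
   merely appears as a prefix or substring does not count. */
int is_c89_keyword(const char *word)
{
    size_t keyword_count = sizeof c89_keywords / sizeof c89_keywords[0];
    for (size_t i = 0; i < keyword_count; i++) {
        if (strcmp(word, c89_keywords[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Because strcmp compares entire strings, "int" matches while "intValue" and "INT" do not.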
Rules for Valid Identifier Names
C imposes four distinct restrictions on identifier formation. Violating rule one, two, or four triggers a compiler error. Violating rule three usually compiles, often without any warning, but the resulting behavior is undefined and depends on the implementation.
Rule One: Permitted Characters
Identifiers may contain only four categories of characters:
- Lowercase letters (a–z)
- Uppercase letters (A–Z)
- Numeric digits (0–9)
- The underscore character (_)
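No other symbols are permitted in portable code. Characters such as @, #, %, &, *, +, -, or space cannot appear anywhere within an identifier. The dollar sign ($) is not among the permitted characters either, although some compilers, including GCC and Clang, accept it as an extension.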
Valid examples: temperature, dataBuffer_2, _internalFlag, MAX_SIZE
Invalid examples: data-buffer (hyphen not allowed), temp@25 (@ not allowed), user name (space not allowed)
Rule Two: First Character Restriction
An identifier cannot begin with a numeric digit. The first character must be either a letter (uppercase or lowercase) or an underscore.
Valid starts: count, _count, Value9, _2ndIteration
Invalid starts: 9count, 2ndValue, 007agent
The rationale relates to parser efficiency—when a lexer encounters a digit, it immediately shifts to numeric literal interpretation mode. Allowing digits at the start would create ambiguous token boundaries.
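Rules one and two together can be sketched as a small checker. The function name is_valid_identifier_shape is hypothetical, the code assumes the default "C" locale (where isalpha accepts only the 26 unaccented letters in each case), and it deliberately ignores keywords and reserved names.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if name satisfies rule one (letters, digits, underscore only)
   and rule two (first character is a letter or underscore), else 0. */
int is_valid_identifier_shape(const char *name)
{
    if (name == NULL || name[0] == '\0') {
        return 0; /* an empty name is not an identifier */
    }

    unsigned char first = (unsigned char)name[0];
    if (!(isalpha(first) || first == '_')) {
        return 0; /* rule two: no leading digit or symbol */
    }

    for (const char *p = name + 1; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (!(isalnum(c) || c == '_')) {
            return 0; /* rule one: character outside the permitted set */
        }
    }
    return 1;
}
```

Checking the first character separately mirrors how a lexer decides, from one character, whether it is reading a name or a numeric literal.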
Rule Three: Underscore Convention (Compiler Advisory)
Identifiers beginning with an underscore followed by either another underscore or an uppercase letter are reserved for compiler and standard library implementations in every context. In addition, any identifier beginning with an underscore is reserved at file scope. The C standard specifies these reservations but does not require compilers to diagnose violations, so offending code often compiles silently.
Why this matters: The language and its library already occupy this space with names such as _Exit (a library function), __FILE__ (a predefined macro), and _Thread_local (a C11 keyword), and implementations add many private names of their own. A programmer-defined global named _buffer may collide with an implementation-specific symbol on certain platforms, causing undefined behavior that manifests as linking errors or runtime corruption.
Recommendation: Never begin identifiers with double underscores (__) or a single underscore followed by an uppercase letter (_Uppercase). Identifiers beginning with a single underscore and a lowercase letter (e.g., _localVar) are technically permitted for local variables, parameters, and structure members, but most professionals avoid the pattern entirely for portability.
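The recommendation can be sketched as a three-way classification of a name's prefix. The enum, its constants, and the function name classify_prefix are hypothetical names chosen for this illustration, and the check looks only at the first two characters.

```c
#include <ctype.h>
#include <stddef.h>

enum prefix_class {
    PREFIX_PLAIN,       /* does not start with an underscore */
    PREFIX_DISCOURAGED, /* single underscore, not followed by '_' or uppercase */
    PREFIX_RESERVED     /* "__" or underscore followed by an uppercase letter */
};

/* Classifies a name by the underscore pattern at its start. */
enum prefix_class classify_prefix(const char *name)
{
    if (name == NULL || name[0] != '_') {
        return PREFIX_PLAIN;
    }
    unsigned char second = (unsigned char)name[1];
    if (second == '_' || isupper(second)) {
        return PREFIX_RESERVED;
    }
    return PREFIX_DISCOURAGED;
}
```

Only the start of the name matters here: a__b and buffer_ are classified as plain because the underscores are not leading.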
Rule Four: Single-Word Requirement
Identifiers cannot contain whitespace. Multi-word concepts must be represented using either underscores (first_name) or camel case (firstName). Spaces cause the lexer to split the sequence into multiple tokens, producing syntax errors.
Practical Examples and Common Errors
The following table demonstrates valid and invalid identifier declarations:
| Identifier | Valid? | Reason |
|---|---|---|
| sum | Yes | Letters only |
| sum_of_values | Yes | Underscore allowed as separator |
| _value | Valid but discouraged | Begins with underscore (rule three advisory) |
| 1stElement | No | Begins with digit (rule two violation) |
| element@2 | No | @ symbol prohibited (rule one violation) |
| int | No | Keywords cannot be identifiers |
| INT | Yes | Uppercase differs from lowercase keyword |
| my value | No | Space character prohibited (rule four) |
| _2a | Valid but discouraged | Begins with underscore; digits allowed after first character |
| a__b | Yes | Multiple underscores permitted |
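Several of these cases can be tried directly in a compiler. The sketch below declares the valid names as local variables and keeps the invalid ones inside a comment; the function name sum_table_examples and the assigned values are assumptions made for illustration.

```c
/* Declares the valid identifiers from the table and returns their sum. */
int sum_table_examples(void)
{
    int sum = 1;           /* letters only */
    int sum_of_values = 2; /* underscore as separator */
    int INT = 3;           /* differs from the keyword int by case */
    int a__b = 4;          /* consecutive underscores are permitted */
    int _value = 5;        /* compiles as a local variable, but discouraged */

    /* Each declaration below would be rejected if uncommented:
       int 1stElement = 0;   begins with a digit
       int element@2 = 0;    contains @
       int int = 0;          int is a keyword
       int my value = 0;     contains a space
    */

    return sum + sum_of_values + INT + a__b + _value;
}
```

Uncommenting any one of the rejected declarations is a quick way to see the exact diagnostic a particular compiler produces.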
Case Sensitivity and Length Considerations
C identifiers are case-sensitive. The names temperature, Temperature, and TEMPERATURE refer to three distinct entities. This feature enables deliberate naming patterns—constants often use uppercase (MAX_BUFFER), while variables use lowercase or camel case.
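A short sketch of case sensitivity, using a hypothetical function name and arbitrary values: three variables whose names differ only by case are declared in the same scope, and changing one leaves the other two untouched.

```c
/* Returns 1 if the three differently cased names behave as independent
   variables, which is what C's case sensitivity implies. */
int case_sensitivity_demo(void)
{
    int temperature = 20;
    int Temperature = 25;
    int TEMPERATURE = 30;

    temperature = 99; /* modifies only the all-lowercase variable */

    return (temperature == 99) && (Temperature == 25) && (TEMPERATURE == 30);
}
```

Names that differ only by case are legal but easy to misread, so this pattern is better suited to demonstrations than to production code.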
Since C99, the C standard guarantees that at least 31 characters are significant in external names (functions and global variables) and at least 63 in internal names (such as local variables and static functions). The older C89/C90 standard guaranteed only 6 significant characters for external names and 31 for internal ones. Most modern compilers support significantly longer names, but for cross-platform code it is safest to make sure names differ within these limits.
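What "significant" means can be sketched with a helper that asks whether two names still differ when only their first N characters are considered. The macro and function names are hypothetical; the limits 31 and 63 are the minimums guaranteed from C99 onward.

```c
#include <string.h>

/* Minimum significance limits guaranteed by C99 and later. */
#define EXTERNAL_SIGNIFICANT_CHARS 31
#define INTERNAL_SIGNIFICANT_CHARS 63

/* Returns 1 if the two names differ within their first `limit` characters,
   so an implementation honoring only that many can tell them apart. */
int names_distinct_within(const char *a, const char *b, size_t limit)
{
    return strncmp(a, b, limit) != 0;
}
```

For example, sensor_calibration_offset_table_primary and sensor_calibration_offset_table_backup share their first 31 characters, so the helper reports them as indistinguishable under the external limit but distinct under the internal one.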
Best Practices for Professional Code
- Use meaningful names: currentSensorReading communicates intent where x does not
- Follow consistent case conventions: choose snake_case or camelCase and apply it uniformly
- Avoid leading underscores: eliminates potential conflicts with compiler internals
- Reserve ALL_CAPS for macros and constants: keeps ordinary names from colliding with preprocessor symbols
- Keep names to 31 characters or fewer: ensures maximum portability across older compilers
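Applied together, these practices might look like the sketch below. The macro MAX_READINGS, the function name, and the decision to return 0 for an empty input are all assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_READINGS 8 /* ALL_CAPS marks this as a macro */

/* Returns the largest of the first reading_count values, examining at most
   MAX_READINGS of them; returns 0 when there is nothing to examine. */
int highest_sensor_reading(const int *sensor_readings, size_t reading_count)
{
    if (sensor_readings == NULL || reading_count == 0) {
        return 0;
    }
    if (reading_count > MAX_READINGS) {
        reading_count = MAX_READINGS;
    }

    int highest_reading = sensor_readings[0];
    for (size_t i = 1; i < reading_count; i++) {
        if (sensor_readings[i] > highest_reading) {
            highest_reading = sensor_readings[i];
        }
    }
    return highest_reading;
}
```

Every name uses snake_case, none begins with an underscore, and all stay well under 31 characters.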
Conclusion
The identifier naming rules in C balance three competing requirements: parser simplicity, programmer flexibility, and compiler implementation freedom. The restriction to alphanumeric characters plus underscore reduces lexer complexity. The prohibition on digit-first names resolves an otherwise ambiguous tokenization boundary. The reservation of underscore-prefixed names preserves space for compiler evolution without breaking existing user code.
Understanding these rules transforms identifier selection from arbitrary guesswork into systematic engineering. A developer who internalizes these constraints writes code that not only compiles reliably across platforms but also communicates intent clearly to other human readers—the primary audience for any source file.