Escape sequences

Escape sequences are used to represent certain special characters within string literals and character literals.

The following escape sequences are available:

Escape sequence

Description

Representation

Simple escape sequences

single quote

byte 0x27 in ASCII encoding

double quote

byte 0x22 in ASCII encoding

question mark

byte 0x3f in ASCII encoding

backslash

byte 0x5c in ASCII encoding

audible bell

byte 0x07 in ASCII encoding

backspace

byte 0x08 in ASCII encoding

form feed - new page

byte 0x0c in ASCII encoding

line feed - new line

byte 0x0a in ASCII encoding

carriage return

byte 0x0d in ASCII encoding

horisontal tab

byte 0x09 in ASCII encoding

vertical tab

byte 0x0b in ASCII encoding

Numeric escape sequences

\nnn
\o{n...} (since C++23)

arbitrary octal value

code unit nnn (1~3 octal digits)

code unit n... (arbitrary number of octal digits)

\xn...
\x{n...} (since C++23)

arbitrary hexadecimal value

code unit n... (arbitrary number of hexadecimal digits)

Conditional escape sequences^[1]

Implementation-defined

Universal character names

\unnnn
\u{n..} (since C++23)
\Unnnnnnnn

arbitrary Unicode value;
may result in several code units

code point U+nnnn (4 hexadecimal digits)

code point U+n... (arbitrary number of hexadecimal digits)

code point U+nnnnnnnn (8 hexadecimal digits)

\N{NAME} (since C++23)

arbitrary Unicode character

character named by NAME (see below)

↑ Conditional escape sequences are conditionally-supported. The character c in each conditional escape sequence is a member of basic source character set (until C++23)basic character set (since C++23) that is not the character following the \ in any other escape sequence.

Range of universal character names

If a universal character name corresponds to a code point that is not 0x24 ($), 0x40 (@), nor 0x60 (`) and less than 0xA0, the program is ill-formed. In other words, members of basic source character set and control characters (in ranges 0x0-0x1F and 0x7F-0x9F) cannot be expressed in universal character names.

(until C++11)

If a universal character name corresponding to a code point of a member of basic source character set or control characters appear outside a character or string literal, the program is ill-formed.

If a universal character name corresponds surrogate code point (the range 0xD800-0xDFFF, inclusive), the program is ill-formed.

If a universal character name used in a UTF-16/32 string literal does not correspond to a code point in ISO/IEC 10646 (the range 0x0-0x10FFFF, inclusive), the program is ill-formed.

(since C++11) (until C++20)

If a universal character name corresponding to a code point of a member of basic source character set or control characters appear outside a character or string literal, the program is ill-formed.

If a universal character name does not correspond to a code point in ISO/IEC 10646 (the range 0x0-0x10FFFF, inclusive) or corresponds to a surrogate code point (the range 0xD800-0xDFFF, inclusive), the program is ill-formed.

(since C++20) (until C++23)

If a universal character name corresponding to a scalar value of a character in the basic character set or a control character appear outside a character or string literal, the program is ill-formed.

If a universal character name does not correspond to a scalar value of a character in the translation character set, the program is ill-formed.

(since C++23)

Named universal character escapes

{

n-char-sequence

}

pub	n-char-sequence	one or more n-chars
pub	n-char	a character from the translation character set, except the right curly bracket `}` or new-line character

A universal character name of the syntax above is a named universal character. It designates the corresponding character in the Unicode Standard (chapter 4.8 Name) if the n-char-sequence is equal to its character name or to one of its character name aliases of type “control”, “correction”, or “alternate”; otherwise, the program is ill-formed.

These aliases are listed in the Unicode Character Database’s NameAliases.txt. None of these names or aliases have leading or trailing spaces.

A valid n-char-sequence must contain only uppercase Latin letters A through Z, digits, space, and hyphen-minus. Other characters never occur in a Unicode character name, and thus their appearance in a n-char-sequence always renders the program ill-formed.

(since C++23)

Notes

\0 is the most commonly used octal escape sequence, because it represents the terminating null character in null-terminated strings.

The new-line character \n has special meaning when used in text mode I/O: it is converted to the OS-specific newline representation, usually a byte or byte sequence. Some systems mark their lines with length fields instead.

Octal escape sequences have a limit of three octal digits, but terminate at the first character that is not a valid octal digit if encountered sooner.

Hexadecimal escape sequences have no length limit and terminate at the first character that is not a valid hexadecimal digit. If the value represented by a single hexadecimal escape sequence does not fit the range of values represented by the character type used in this string literal (char, char8_t (since C++20), char16_t, char32_t (since C++11), or wchar_t), the result is unspecified.

A universal character name in a narrow string literal or a 16-bit string literal may map to more than one code unit, e.g. \U0001f34c is 4 char code units in UTF-8 (\xF0\x9F\x8D\x8C) and 2 char16_t code units in UTF-16 (\xD83C\xDF4C).

(since C++11)

The question mark escape sequence \? is used to prevent trigraphs from being interpreted inside string literals: a string such as "??/" is compiled as "", but if the second question mark is escaped, as in "?\?/", it becomes "??/". As trigraphs have been removed from C++, the question mark escape sequence is no longer necessary. It is preserved for compatibility with C++14 (and former revisions) and C. (since C++17)

Feature-test macro	Value	Std	Comment
__cpp_named_character_escapes	202207L	(C++23)	Named universal character escapes

Example

#include <iostream>
 
int main()
{
    std::cout << "This\nis\na\ntest\n\n";
    std::cout << "She said, \"Sells she seashells on the seashore?\"\n";
}

Result
This
is
a
test
 
She said, "Sells she seashells on the seashore?"

Defect Reports

The following behavior-changing defect reports were applied retroactively to previously published

DR	Applied to	Behavior as published	Correct behavior
CWG 505	(C++98)	the behavior was undefined if the character following a backslash was not one of those specified in the table	made conditionally supported (semantic is implementation-defined)

Escape sequences

Escape sequences are used to represent certain special characters within string literals and character literals.

The following escape sequences are available:

Escape sequence

Description

Representation

Simple escape sequences

single quote

byte 0x27 in ASCII encoding

double quote

byte 0x22 in ASCII encoding

question mark

byte 0x3f in ASCII encoding

backslash

byte 0x5c in ASCII encoding

audible bell

byte 0x07 in ASCII encoding

backspace

byte 0x08 in ASCII encoding

form feed - new page

byte 0x0c in ASCII encoding

line feed - new line

byte 0x0a in ASCII encoding

carriage return

byte 0x0d in ASCII encoding

horisontal tab

byte 0x09 in ASCII encoding

vertical tab

byte 0x0b in ASCII encoding

Numeric escape sequences

\nnn
\o{n...} (since C++23)

arbitrary octal value

code unit nnn (1~3 octal digits)

code unit n... (arbitrary number of octal digits)

\xn...
\x{n...} (since C++23)

arbitrary hexadecimal value

code unit n... (arbitrary number of hexadecimal digits)

Conditional escape sequences^[1]

Implementation-defined

Universal character names

\unnnn
\u{n..} (since C++23)
\Unnnnnnnn

arbitrary Unicode value;
may result in several code units

code point U+nnnn (4 hexadecimal digits)

code point U+n... (arbitrary number of hexadecimal digits)

code point U+nnnnnnnn (8 hexadecimal digits)

\N{NAME} (since C++23)

arbitrary Unicode character

character named by NAME (see below)

↑ Conditional escape sequences are conditionally-supported. The character c in each conditional escape sequence is a member of basic source character set (until C++23)basic character set (since C++23) that is not the character following the \ in any other escape sequence.

Range of universal character names

(until C++11)

If a universal character name corresponding to a code point of a member of basic source character set or control characters appear outside a character or string literal, the program is ill-formed.

If a universal character name corresponds surrogate code point (the range 0xD800-0xDFFF, inclusive), the program is ill-formed.

If a universal character name used in a UTF-16/32 string literal does not correspond to a code point in ISO/IEC 10646 (the range 0x0-0x10FFFF, inclusive), the program is ill-formed.

(since C++11) (until C++20)

If a universal character name corresponding to a code point of a member of basic source character set or control characters appear outside a character or string literal, the program is ill-formed.

(since C++20) (until C++23)

If a universal character name corresponding to a scalar value of a character in the basic character set or a control character appear outside a character or string literal, the program is ill-formed.

If a universal character name does not correspond to a scalar value of a character in the translation character set, the program is ill-formed.

(since C++23)

Named universal character escapes

{

n-char-sequence

}

pub	n-char-sequence	one or more n-chars
pub	n-char	a character from the translation character set, except the right curly bracket `}` or new-line character

These aliases are listed in the Unicode Character Database’s NameAliases.txt. None of these names or aliases have leading or trailing spaces.

(since C++23)

Notes

\0 is the most commonly used octal escape sequence, because it represents the terminating null character in null-terminated strings.

Octal escape sequences have a limit of three octal digits, but terminate at the first character that is not a valid octal digit if encountered sooner.

(since C++11)

Feature-test macro	Value	Std	Comment
__cpp_named_character_escapes	202207L	(C++23)	Named universal character escapes

Example

#include <iostream>
 
int main()
{
    std::cout << "This\nis\na\ntest\n\n";
    std::cout << "She said, \"Sells she seashells on the seashore?\"\n";
}

Result
This
is
a
test
 
She said, "Sells she seashells on the seashore?"

Defect Reports

The following behavior-changing defect reports were applied retroactively to previously published

DR	Applied to	Behavior as published	Correct behavior
CWG 505	(C++98)	the behavior was undefined if the character following a backslash was not one of those specified in the table	made conditionally supported (semantic is implementation-defined)

Escape sequences

Range of universal character names​

Named universal character escapes​

Notes​

Example​

Defect Reports​

Escape sequences

Range of universal character names​

Named universal character escapes​

Notes​

Example​

Defect Reports​

Range of universal character names

Named universal character escapes

Notes

Example

Defect Reports

Range of universal character names

Named universal character escapes

Notes

Example

Defect Reports