String literal

Syntax

(1) "s-char-sequence(optional)"
(2) L"s-char-sequence(optional)"
(3) u8"s-char-sequence(optional)"
(4) u"s-char-sequence(optional)"
(5) U"s-char-sequence(optional)"
(6) prefix(optional)R"d-char-sequence(optional)(r-char-sequence(optional))d-char-sequence(optional)"

Explanation

s-char-sequence - A sequence of one or more s-chars
s-char - One of:
    basic-s-char - A character from the translation character set, except the double-quote ", backslash \, or new-line character
prefix - One of L, u8, u, U
d-char-sequence - A sequence of one or more d-chars, at most 16 characters long
d-char - A character from the basic character set, except parentheses, backslash and spaces
r-char-sequence - A sequence of one or more r-chars, except that it must not contain the closing sequence )d-char-sequence"
r-char - A character from the translation character set
  1. Ordinary string literal. The type of an unprefixed string literal is const char[N], where N is the size of the string in code units of the ordinary literal encoding, including the null terminator.
  2. Wide string literal. The type of a L"..." string literal is const wchar_t[N], where N is the size of the string in code units of the wide literal encoding, including the null terminator.
  3. UTF-8 string literal. The type of a u8"..." string literal is const char[N] (until C++20) or const char8_t[N] (since C++20), where N is the size of the string in UTF-8 code units including the null terminator.
  4. UTF-16 string literal. The type of a u"..." string literal is const char16_t[N], where N is the size of the string in UTF-16 code units including the null terminator.
  5. UTF-32 string literal. The type of a U"..." string literal is const char32_t[N], where N is the size of the string in UTF-32 code units including the null terminator.
  6. Raw string literal. Used to avoid escaping of any character. Anything between the delimiters becomes part of the string. prefix, if present, has the same meaning as described above. The terminating d-char-sequence is the same sequence of characters as the initial d-char-sequence.
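
As a rough illustration of the types listed above, the following sketch checks them at compile time. The helper name literal_elements is made up for this example and is not part of the standard library. The u8 case is left out here because its element type depends on the language version.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: number of array elements, null terminator included.
template <class T, std::size_t N>
constexpr std::size_t literal_elements(const T (&)[N]) { return N; }

// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value, "ordinary");
static_assert(std::is_same<decltype(L"abc"), const wchar_t (&)[4]>::value, "wide");
static_assert(std::is_same<decltype(u"abc"), const char16_t (&)[4]>::value, "UTF-16");
static_assert(std::is_same<decltype(U"abc"), const char32_t (&)[4]>::value, "UTF-32");

// Raw string: the backslash and the letter n are two ordinary characters here.
static_assert(literal_elements(R"(a\nb)") == 5, "four characters plus terminator");

// A delimiter (here x) lets the raw text contain the sequence )" itself.
static_assert(literal_elements(R"x()")x") == 3, "two characters plus terminator");
```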

Each s-char (originally from non-raw string literals) or r-char (originally from raw string literals) (since C++11) initializes the corresponding element(s) in the string literal object. An s-char or r-char (since C++11) corresponds to more than one element if and only if it is represented by a sequence of more than one code unit in the string literal's associated character encoding. If a character lacks representation in the associated character encoding, the program is ill-formed. (since C++23)

Each numeric escape sequence corresponds to a single element. If the value specified by the escape sequence fits within the unsigned version of the element type, the element has the specified value (possibly after conversion to the element type); otherwise (the specified value is out of range), the program is ill-formed. (since C++23)
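
The relationship between characters and elements can be sketched with compile-time checks. The helper element_count is a hypothetical name introduced only for this illustration; the characters are written as universal character names so the result does not depend on the source file encoding.

```cpp
#include <cstddef>

// Hypothetical helper: number of array elements, null terminator included.
template <class T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// U+00E9 needs two UTF-8 code units but only one UTF-16 code unit.
static_assert(element_count(u8"\u00E9") == 3, "2 code units + terminator");
static_assert(element_count(u"\u00E9") == 2, "1 code unit + terminator");

// U+1F600 lies outside the BMP: a surrogate pair in UTF-16, one unit in UTF-32.
static_assert(element_count(u"\U0001F600") == 3, "2 code units + terminator");
static_assert(element_count(U"\U0001F600") == 2, "1 code unit + terminator");

// A numeric escape sequence always produces exactly one element.
static_assert(element_count("\x41") == 2, "1 element + terminator");
```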

Concatenation

String literals placed side-by-side are concatenated at translation phase 6 (after the preprocessor). That is, "Hello," " world!" yields the (single) string "Hello, world!". If the two strings have the same encoding prefix (or neither has one), the resulting string will have the same encoding prefix (or no prefix).

If one of the strings has an encoding prefix and the other doesn't, the one that doesn't will be considered to have the same encoding prefix as the other.

L"Δx = %" PRId16 // at phase 4, PRId16 expands to "d"
// at phase 6, L"Δx = %" and "d" form L"Δx = %d"

If a UTF-8 string literal and a wide string literal are side by side, the program is ill-formed. (since C++11)

Any other combination of encoding prefixes is conditionally supported with implementation-defined semantics. (No known implementation supports such concatenation.) (since C++11) (until C++23)

Any other combination of encoding prefixes is ill-formed. (since C++23)
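
The prefix rules for concatenation can be checked at compile time with a small sketch. The function same_contents is a hypothetical helper written for this example; it requires both arguments to have the same element type, so a wrong resulting prefix would fail to compile.

```cpp
#include <cstddef>

// Hypothetical helper: compares two arrays of the same element type.
template <class T, std::size_t N, std::size_t M>
constexpr bool same_contents(const T (&a)[N], const T (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i)
        if (a[i] != b[i]) return false;
    return true;
}

// No prefix on either side: the result is an ordinary string literal.
static_assert(same_contents("Hello," " world!", "Hello, world!"), "");

// Only one side has a prefix: the unprefixed part adopts it.
static_assert(same_contents(L"dx = %" "d", L"dx = %d"), "");
static_assert(same_contents("MN" u"OP" "QR", u"MNOPQR"), "");
```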

Unevaluated strings

The following contexts expect a string literal, but do not evaluate it:

The standard does not specify whether an encoding prefix is allowed in these contexts (except that a literal operator name must not have an encoding prefix) (since C++11). Implementations behave inconsistently. (until C++26)

No encoding prefix is allowed in these contexts. Each universal character name and each simple escape sequence in an unevaluated string is replaced by the member of the translation character set it denotes. An unevaluated string that contains a numeric escape sequence or a conditional escape sequence is ill-formed. (since C++26)
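
A minimal sketch of a few places where a string literal appears without being evaluated: a language linkage specification, a static_assert message, a deprecated attribute argument, and a literal operator name. The selection is illustrative rather than exhaustive, and all function names are invented for this example.

```cpp
// Language linkage specification: "C" is never turned into an array object.
extern "C" int unevaluated_demo_add(int a, int b) { return a + b; }

// static_assert message: used only in diagnostics.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// Attribute argument: shown by the compiler when the entity is used.
[[deprecated("use unevaluated_demo_add instead")]]
inline int unevaluated_demo_old_add(int a, int b) { return a + b; }

// Literal operator name: the empty "" must not carry an encoding prefix.
constexpr unsigned long long operator""_twice(unsigned long long v) {
    return 2 * v;
}
```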

Notes

The null character ('\0', L'\0', char16_t(), etc.) is always appended to the string literal: thus, a string literal "Hello" is a const char[6], holding the characters 'H', 'e', 'l', 'l', 'o', and '\0'.

The encoding of ordinary string literals (1) and wide string literals (2) is implementation-defined. For example, GCC selects them with the command line options -fexec-charset and -fwide-exec-charset.

String literals have static storage duration, and thus exist in memory for the life of the program.

String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
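
The three notes above (appended terminator, static storage duration, array initialization by copy) can be sketched as follows. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cstring>

// "Hello" has five characters plus the appended null character.
static_assert(sizeof("Hello") == 6, "five characters plus the terminator");

// The literal has static storage duration, so returning a pointer to it is safe.
inline const char* static_greeting() { return "Hello"; }

inline bool array_copy_is_writable() {
    char str[] = "foo"; // str is a separate 4-element array holding a copy
    str[0] = 'b';       // modifies the copy, not the string literal
    return sizeof(str) == 4 && std::strcmp(str, "boo") == 0;
}
```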

Whether string literals can overlap and whether successive evaluations of a string-literal yield the same object is unspecified. That means that identical string literals may or may not compare equal when compared by pointer.

bool b = "bar" == 3 + "foobar"; // can be true or false, unspecified
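
Because pointer comparison of string literals gives an unspecified result, code that cares about the text usually compares contents instead. A minimal sketch, assuming C++17 for std::string_view; same_text is a hypothetical helper name.

```cpp
#include <string_view>

// Hypothetical helper: compares the characters, not the addresses.
constexpr bool same_text(std::string_view a, std::string_view b) {
    return a == b;
}

// Holds regardless of whether the implementation overlaps the two literals.
static_assert(same_text("bar", "foobar" + 3), "contents are equal");
static_assert(!same_text("bar", "foobar"), "contents differ");
```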

Attempting to modify a string literal results in undefined behavior: they may be stored in read-only storage (such as .rodata) or combined with other string literals:

const char* pc = "Hello";
char* p = const_cast<char*>(pc);
p[0] = 'M'; // undefined behavior

String literals are convertible and assignable to non-const char* or wchar_t* in order to be compatible with C, where string literals are of types char[N] and wchar_t[N]. Such implicit conversion is deprecated. (until C++11)

String literals are not convertible or assignable to non-const CharT*. An explicit cast (e.g. const_cast) must be used if such conversion is wanted. (since C++11)

A string literal is not necessarily a null-terminated character sequence: if a string literal has embedded null characters, it represents an array which contains more than one string.

const char* p = "abc\0def"; // std::strlen(p) == 3, but the array has size 8
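
The difference between the array size and the length seen by C string functions can be sketched like this. The function names are made up for the example; passing an explicit length is one way to keep the characters after the embedded null.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// The array holds 'a','b','c','\0','d','e','f','\0'.
static_assert(sizeof("abc\0def") == 8, "seven characters plus the terminator");

// std::strlen stops at the first null character.
inline std::size_t visible_length() { return std::strlen("abc\0def"); }

// An explicit length preserves everything up to the final terminator.
inline std::string whole_literal() {
    const char lit[] = "abc\0def";
    return std::string(lit, sizeof(lit) - 1);
}
```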

If a valid hex digit follows a hex escape in a string literal, the digit is treated as part of the escape sequence, so the resulting value may be out of range and the literal would fail to compile. String concatenation can be used as a workaround:

//const char* p = "\xfff"; // error: hex escape sequence out of range
const char* p = "\xff""f"; // OK: the literal is const char[3] holding {'\xff','f','\0'}
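
The effect of the workaround can be verified with a short sketch. The function name split_literal_ok is hypothetical. An octal escape is shown as an alternative, since it consumes at most three digits and therefore stops before the following character.

```cpp
// Splitting the literal ends the hex escape after two digits.
static_assert(sizeof("\xff" "f") == 3, "0xff, 'f', terminator");

// An octal escape takes at most three digits, so no split is needed here.
static_assert(sizeof("\377f") == 3, "0377, 'f', terminator");

inline bool split_literal_ok() {
    const char lit[] = "\xff" "f";
    return static_cast<unsigned char>(lit[0]) == 0xff
        && lit[1] == 'f'
        && lit[2] == '\0';
}
```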

Feature-test macro | Value | Std | Comment
__cpp_char8_t | 202207L | (C++20) (DR) | char8_t compatibility and portability fix (allow initialization of (unsigned) char arrays from UTF-8 string literals)
__cpp_raw_strings | 200710L | (C++11) | Raw string literals
__cpp_unicode_literals | 200710L | (C++11) | Unicode string literals
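
One possible use of these macros is selecting the element type of a UTF-8 string literal portably. This is a sketch under the assumption that the compiler defines __cpp_char8_t exactly when u8 literals have type const char8_t[N]; the names utf8_unit, utf8_text and utf8_byte are invented for the example.

```cpp
#if defined(__cpp_char8_t)
using utf8_unit = char8_t; // u8 literals are arrays of char8_t
#else
using utf8_unit = char;    // u8 literals are arrays of char
#endif

// "caf" followed by U+00E9, which occupies two UTF-8 code units.
constexpr const utf8_unit* utf8_text = u8"caf\u00E9";

// Returns the code unit at index i as an unsigned value.
inline unsigned utf8_byte(unsigned i) {
    return static_cast<unsigned char>(utf8_text[i]);
}
```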

Example

#include <iostream>

char array1[] = "Foo" "bar";
// same as
char array2[] = {'F', 'o', 'o', 'b', 'a', 'r', '\0'};

// Whitespace inside a raw string literal is part of the string
const char* s1 = R"foo(
Hello
 World
)foo";
// same as
const char* s2 = "\nHello\n World\n";
// same as
const char* s3 = "\n"
                 "Hello\n"
                 " World\n";

const wchar_t* s4 = L"ABC" L"DEF"; // OK, same as
const wchar_t* s5 = L"ABCDEF";
const char32_t* s6 = U"GHI" "JKL"; // OK, same as
const char32_t* s7 = U"GHIJKL";
const char16_t* s9 = "MN" u"OP" "QR"; // OK, same as
const char16_t* sA = u"MNOPQR";

// const auto* sB = u"Mixed" U"Types";
// before C++23 may or may not be supported by
// the implementation; ill-formed since C++23

const wchar_t* sC = LR"--(STUV)--"; // OK, raw string literal

int main()
{
    std::cout << array1 << ' ' << array2 << '\n'
              << s1 << s2 << s3 << std::endl;
    std::wcout << s4 << ' ' << s5 << ' ' << sC
               << std::endl;
}

Result:

Foobar Foobar

Hello
 World

Hello
 World

Hello
 World

ABCDEF ABCDEF STUV

Defect reports

The following behavior-changing defect reports were applied retroactively to previously published C++ standards.

DR | Applied to | Behavior as published | Correct behavior
CWG 1759 | (C++11) | A UTF-8 string literal might have code units that are not representable in char | char can represent all UTF-8 code units
CWG 1823 | (C++98) | whether string literals are distinct was implementation-defined | distinctness is unspecified, and the same string literal can yield different objects
P1854R4 | (C++23) | string literals with non-encodable characters were conditionally-supported | the program is ill-formed

References

  • C++23 standard (ISO/IEC 14882:2023):
    • 5.13.5 String literals [lex.string]
  • C++20 standard (ISO/IEC 14882:2020):
    • 5.13.5 String literals [lex.string]
  • C++17 standard (ISO/IEC 14882:2017):
    • 5.13.5 String literals [lex.string]
  • C++14 standard (ISO/IEC 14882:2014):
    • 2.14.5 String literals [lex.string]
  • C++11 standard (ISO/IEC 14882:2011):
    • 2.14.5 String literals [lex.string]
  • C++03 standard (ISO/IEC 14882:2003):
    • 2.13.4 String literals [lex.string]
  • C++98 standard (ISO/IEC 14882:1998):
    • 2.13.4 String literals [lex.string]
