String
The String wrapper’s primary purpose is to provide a safe and lightweight way of working with both char8_t and char strings. It helps you avoid common pitfalls while keeping your code simple and efficient, especially when working with string literals.
A typical use case looks like this:
// Use a u8"" string literal.
auto rooms = value->getSectionListOrThrow(u8"room");
// It also works with plain char "" string literals.
level->_title = value->getOrThrow<std::string>("title");
// Prefer constexpr std::u8string_view for identifiers.
static constexpr std::u8string_view cNameColor = u8"color";
if (value->hasValue(cNameColor)) {
settings->_colorEnabled = value->getBooleanOrThrow(cNameColor);
}
String literals (like u8"...") have a length known at compile time, which allows the compiler to optimize access. However, when the length is not known at compile time, for example when the literal has decayed to a plain pointer, you need to provide an explicit wrapper:
// This looks like it should work...
constexpr auto cName = u8"name"; // but this is just a "const char8_t*"
auto name = value->getOrThrow(cName); // ❌ ERROR: Ambiguous overload
// Must be written like this:
auto name = value->getOrThrow(std::u8string{cName}); // ✅ OK
// ... or, even better, declare it as a string_view:
constexpr std::u8string_view cName = u8"name";
auto name = value->getOrThrow(cName); // ✅ OK
Usage
auto str = String{u8"text"}; // unchecked, assuming valid UTF-8
// ...
std::cout << str.toCharString();

void processData(std::string_view view) {
    auto str = String::fromCharString(view); // checked, throws on encoding errors
    // ...
}
Interface
-
class String
Thin wrapper around std::u8string.
The class mirrors the API of std::u8string closely and adds a few convenience functions. It is primarily intended to ease the integration of the parser into applications that use std::string for text processing.
- Tested:
StringTest, StringUtf8Test
String Conversion
-
inline std::string toCharString() const noexcept
Convert the wrapped string into a char-based std::string.
This helper performs the required conversion from char8_t to char and is primarily meant for interoperability with APIs that expect a regular std::string.
- Returns:
A char-based string.
Public Types
-
using ConstByteSpan = std::span<const std::byte>
A span of bytes referencing the underlying data of the string.
Public Functions
-
template<std::size_t N>
inline constexpr String(const char8_t (&literal)[N]) noexcept
Construct from a UTF-8 string literal.
- Template Parameters:
N – The length of the literal including the null terminator.
- Parameters:
literal – The UTF-8 literal to copy.
-
inline constexpr String(const char8_t *str, const std::size_t size) noexcept
Construct from a UTF-8 character pointer and size.
- Parameters:
str – Pointer to UTF-8 characters.
size – Number of characters to read.
-
inline constexpr String(const std::u8string_view str) noexcept
Construct from a UTF-8 string view.
- Parameters:
str – The UTF-8 string view to copy.
-
inline explicit constexpr String(const WrappedString &str) noexcept
Copy construct from the underlying UTF-8 string.
- Parameters:
str – The string to wrap.
-
inline constexpr String(WrappedString &&str) noexcept
Move construct from the underlying UTF-8 string.
- Parameters:
str – The string to move from.
-
inline constexpr String(std::size_t count, value_type c) noexcept
Construct a string with a repeated character.
- Parameters:
count – Number of characters.
c – The character to repeat.
-
template<typename InputIt>
inline constexpr String(InputIt begin, InputIt end) noexcept
Construct a string from a character range.
- Template Parameters:
InputIt – Input iterator type.
- Parameters:
begin – Iterator to the first character.
end – Iterator to one-past-last character.
-
template<std::size_t N>
inline constexpr String(const char (&literal)[N]) noexcept
Construct from a null-terminated string.
- Template Parameters:
N – The length of the literal including the null terminator.
- Parameters:
literal – The literal to copy.
-
inline String(const char *str, const std::size_t size) noexcept
Construct from a character pointer and size.
- Parameters:
str – Pointer to narrow characters.
size – Number of characters to read.
-
inline String(const std::string_view str) noexcept
Construct from a standard string view.
- Parameters:
str – The standard string view to the string to copy.
-
inline explicit String(const std::string &str) noexcept
Construct from a standard narrow string.
- Parameters:
str – The std::string to convert.
-
inline constexpr String(std::size_t count, char c) noexcept
Construct a string with a repeated narrow character.
- Parameters:
count – Number of characters.
c – The character to repeat.
-
String() = default
Default constructor.
-
~String() = default
Default destructor.
-
inline String &operator+=(const String &other) noexcept
Append another string to this string.
- Parameters:
other – The string to append.
- Returns:
Reference to this string.
-
inline String &operator+=(char8_t c) noexcept
Append a character to this String.
- Parameters:
c – The character to append.
- Returns:
Reference to this String.
-
inline String &operator+=(char c) noexcept
Append a character to this String.
- Parameters:
c – The character to append.
- Returns:
Reference to this String.
-
template<std::size_t N>
inline bool operator==(const char8_t (&literal)[N]) const noexcept
Compare this String to a UTF-8 literal for equality.
- Template Parameters:
N – The size of the literal including null terminator.
- Parameters:
literal – The UTF-8 literal to compare against.
- Returns:
true if the literal matches exactly.
-
template<std::size_t N>
inline bool operator!=(const char8_t (&literal)[N]) const noexcept
Compare this String to a UTF-8 literal for inequality.
- Template Parameters:
N – The size of the literal including null terminator.
- Parameters:
literal – The UTF-8 literal to compare against.
- Returns:
true if the literal does not match.
-
template<std::size_t N>
inline String operator+(const char8_t (&literal)[N]) const noexcept
Concatenate a UTF-8 literal to this String.
- Template Parameters:
N – The size of the literal including null terminator.
- Parameters:
literal – The UTF-8 literal to append.
- Returns:
A new String with the literal appended.
-
inline String operator+(const std::u8string &other) const noexcept
Concatenate a std::u8string to this String.
- Parameters:
other – The u8string to append.
- Returns:
A new String with the contents appended.
-
template<std::size_t N>
inline String &operator+=(const char8_t (&literal)[N]) noexcept
Append a UTF-8 literal to this String.
- Template Parameters:
N – The size of the literal including null terminator.
- Parameters:
literal – The UTF-8 literal to append.
- Returns:
Reference to this String.
-
inline String &operator+=(const std::u8string &other) noexcept
Append a std::u8string to this String.
- Parameters:
other – The u8string to append.
- Returns:
Reference to this String.
-
template<std::size_t N>
inline bool operator==(const char (&literal)[N]) const noexcept
Compare this String to a narrow literal for equality.
- Template Parameters:
N – The size of the literal including null terminator.
- Parameters:
literal – The narrow literal to compare.
- Returns:
true if the literal matches exactly.
-
template<std::size_t N>
inline bool operator!=(const char (&literal)[N]) const noexcept
Compare this String to a narrow literal for inequality.
- Template Parameters:
N – The size of the literal including null terminator.
- Parameters:
literal – The narrow literal to compare.
- Returns:
true if the literal does not match.
-
template<std::size_t N>
inline String operator+(const char (&literal)[N]) const noexcept
Concatenate a narrow literal to this String.
- Template Parameters:
N – The size of the literal including null terminator.
- Parameters:
literal – The narrow literal to append.
- Returns:
A new String with the literal appended.
-
inline String operator+(const std::string &other) const noexcept
Concatenate a std::string to this String.
- Parameters:
other – The std::string to append.
- Returns:
A new String with the contents appended.
-
template<std::size_t N>
inline String &operator+=(const char (&literal)[N]) noexcept
Append a narrow literal to this String.
- Template Parameters:
N – The size of the literal including null terminator.
- Parameters:
literal – The narrow literal to append.
- Returns:
Reference to this String.
-
inline String &operator+=(const std::string &other) noexcept
Append a std::string to this String.
- Parameters:
other – The std::string to append.
- Returns:
Reference to this String.
-
inline String &operator+=(const char32_t unicodeChar) noexcept
Append a single Unicode character to this String.
- Parameters:
unicodeChar – The character to append.
- Returns:
Reference to this String.
-
inline constexpr size_type length() const noexcept
Get the number of characters in this String.
- Returns:
The length of the string.
-
inline constexpr size_type max_size() const noexcept
Get the maximum number of characters this String can hold.
- Returns:
The maximum possible size.
-
inline void reserve(size_type size) noexcept
Reserve storage to at least the specified capacity.
- Parameters:
size – The minimum capacity to reserve.
-
inline void shrink_to_fit() noexcept
Reduce memory usage to fit the current size.
-
inline constexpr size_type capacity() const noexcept
Get the current capacity of the String.
- Returns:
The allocated storage size.
-
inline void append(const String &other) noexcept
Append another String to this one.
- Parameters:
other – The String to append.
-
inline void append(const char8_t character) noexcept
Append a character to this String.
- Parameters:
character – The character to append.
-
inline void append(const char character) noexcept
Append a character to this String.
- Parameters:
character – The character to append.
-
template<std::size_t N>
inline void append(const char8_t (&literal)[N]) noexcept
Append a UTF-8 literal to this String.
- Template Parameters:
N – The literal length including null terminator.
- Parameters:
literal – The UTF-8 literal to append.
-
inline void append(const std::u8string &str) noexcept
Append a std::u8string to this String.
- Parameters:
str – The u8string to append.
-
inline void append(const std::u8string_view str) noexcept
Append a UTF-8 string view to this String.
- Parameters:
str – The u8string_view to append.
-
inline void append(const std::string &str) noexcept
Append a std::string to this String.
- Parameters:
str – The std::string to append.
-
inline void append(const std::string_view str) noexcept
Append a std::string_view to this String.
- Parameters:
str – The string_view to append.
-
inline void append(const char32_t unicodeChar) noexcept
Append a Unicode character to this String.
- Parameters:
unicodeChar – The Unicode character to append.
-
inline String substr(size_type pos = 0, size_type count = npos) const
Extract a substring from this String.
- Parameters:
pos – The starting index.
count – The number of characters.
- Returns:
The extracted substring.
-
inline String &erase(size_type index = 0, size_type count = npos) noexcept
Erase a substring from the string.
- Parameters:
index – The starting index to begin erasure.
count – The number of characters to erase.
- Returns:
Reference to this string after erasure.
-
inline iterator erase(iterator position) noexcept
Erase the character at the specified position.
- Parameters:
position – Iterator to the character to remove.
- Returns:
Iterator following the removed character.
-
inline iterator erase(const_iterator position) noexcept
Erase the character at the specified position.
- Parameters:
position – Iterator to the character to remove.
- Returns:
Iterator following the removed character.
-
inline iterator erase(iterator first, iterator last) noexcept
Erase a range of characters from the string.
- Parameters:
first – Iterator to the first character to remove.
last – Iterator past the last character to remove.
- Returns:
Iterator following the last removed character.
-
inline iterator erase(const_iterator first, const_iterator last) noexcept
Erase a range of characters from the string.
- Parameters:
first – Iterator to the first character to remove.
last – Iterator past the last character to remove.
- Returns:
Iterator following the last removed character.
-
template<typename FindStr>
inline size_type find(FindStr s, size_type pos, size_type count) const
Find the first occurrence of a substring in the string.
- Template Parameters:
FindStr – Type of the search string.
- Parameters:
s – The substring to search for.
pos – The starting position of the search.
count – The number of characters of the substring.
- Returns:
The index of the first occurrence, or npos if not found.
-
template<typename FindStr>
inline size_type find(FindStr s, size_type pos = 0) const
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
-
template<typename FindStr>
inline size_type rfind(FindStr s, size_type pos, size_type count) const
Find the last occurrence of a substring in the string.
- Template Parameters:
FindStr – Type of the search string.
- Parameters:
s – The substring to search for.
pos – The starting position of the search.
count – The number of characters of the substring.
- Returns:
The index of the last occurrence, or npos if not found.
-
template<typename FindStr>
inline size_type rfind(FindStr s, size_type pos = npos) const
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
-
template<typename FindStr>
inline size_type find_first_of(FindStr s, size_type pos, size_type count) const
Find the first occurrence of any character from a set.
- Template Parameters:
FindStr – Type of the search set string.
- Parameters:
s – The set of characters to search for.
pos – The starting position of the search.
count – The number of characters in the set.
- Returns:
The index of the first matching character, or npos if not found.
-
template<typename FindStr>
inline size_type find_first_of(FindStr s, size_type pos = 0) const
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
-
template<typename FindStr>
inline size_type find_first_not_of(FindStr s, size_type pos, size_type count) const
Find the first character not in a set.
- Template Parameters:
FindStr – Type of the search set string.
- Parameters:
s – The set of characters to exclude.
pos – The starting position of the search.
count – The number of characters in the set.
- Returns:
The index of the first non-matching character, or npos if none.
-
template<typename FindStr>
inline size_type find_first_not_of(FindStr s, size_type pos = 0) const
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
-
template<typename FindStr>
inline size_type find_last_of(FindStr s, size_type pos, size_type count) const
Find the last occurrence of any character from a set.
- Template Parameters:
FindStr – Type of the search set string.
- Parameters:
s – The set of characters to search for.
pos – The starting position of the search.
count – The number of characters in the set.
- Returns:
The index of the last matching character, or npos if not found.
-
template<typename FindStr>
inline size_type find_last_of(FindStr s, size_type pos = npos) const
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
-
template<typename FindStr>
inline size_type find_last_not_of(FindStr s, size_type pos, size_type count) const
Find the last character not in a set.
- Template Parameters:
FindStr – Type of the search set string.
- Parameters:
s – The set of characters to exclude.
pos – The starting position of the search.
count – The number of characters in the set.
- Returns:
The index of the last non-matching character, or npos if none.
-
template<typename FindStr>
inline size_type find_last_not_of(FindStr s, size_type pos = npos) const
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
-
template<typename FindStr>
inline bool starts_with(FindStr s) const noexcept
Check if the string starts with a given prefix.
Note
This performs a binary comparison, matching the behavior of the standard library. For a Unicode code-point-based variant, please use startsWith.
- Template Parameters:
FindStr – Type of the prefix string.
- Parameters:
s – The prefix to check.
- Returns:
true if the string starts with the prefix, false otherwise.
-
template<typename FindStr>
inline bool ends_with(FindStr s) const noexcept
Check if the string ends with a given suffix.
Note
This performs a binary comparison, matching the behavior of the standard library. For a Unicode code-point-based variant, please use endsWith.
- Template Parameters:
FindStr – Type of the suffix string.
- Parameters:
s – The suffix to check.
- Returns:
true if the string ends with the suffix, false otherwise.
-
template<typename FindStr>
inline bool contains(FindStr s) const noexcept
Check if the string contains a given substring.
Note
This performs a binary comparison, matching the behavior of the standard library. To force the Unicode code-point-based variant, please use contains with String.
- Template Parameters:
FindStr – Type of the search string.
- Parameters:
s – The substring to search for.
- Returns:
true if the substring is found, false otherwise.
-
bool isValidUtf8() const noexcept
Test if the contained UTF-8 data is valid.
Tests for out-of-range Unicode code-points and UTF-8 encoding errors.
- Returns:
true if the string contains valid UTF-8, false otherwise.
-
std::size_t characterLength() const
Get the number of Unicode code-points (characters).
Decodes the string and counts the characters.
- Throws:
Error – (Encoding) if the string contains invalid UTF-8 data.
- Returns:
The number of characters in this string.
-
auto characterCompare(const String &other, CaseSensitivity caseSensitivity = CaseSensitivity::CaseSensitive) const -> std::strong_ordering
Compare this string with another using Unicode code-points.
- Parameters:
other – The other string to compare with.
caseSensitivity – Controls whether the comparison is case-sensitive.
- Throws:
Error – (Encoding) if either string contains invalid UTF-8 data.
- Returns:
The three-way comparison result.
-
std::strong_ordering nameCompare(const String &other) const
Compare this string with another using name normalization.
Example: “this_name” == “THIS_NAME” == “This Name”.
- Parameters:
other – The other string to compare with.
- Throws:
Error – (Encoding) if either string contains invalid UTF-8 data.
- Returns:
The three-way comparison result.
-
bool nameEquals(std::string_view name) const
Test if this string matches the given name.
This is a convenience function for case-insensitive name comparison using string literals. Same as str.nameCompare(u8"name") == std::strong_ordering::equal, but without an extra copy.
Example: “this_name” == “THIS_NAME” == “This Name”
Important
Unlike the parser’s name handling, this function never fails on invalid names. Characters that are not valid name characters are preserved as they are. Consecutive spaces or tabs inside the name are replaced with the same number of underscores.
- Parameters:
name – The name to compare with.
- Throws:
Error – (Encoding) if either string contains invalid UTF-8 data.
- Returns:
true if this string matches the given name, false otherwise.
-
bool nameEquals(std::u8string_view name) const
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
-
auto startsWith(const String &other, CaseSensitivity caseSensitivity = CaseSensitivity::CaseSensitive) const -> bool
Test if this string starts with another string using Unicode code-points.
- Parameters:
other – The other string for the comparison.
caseSensitivity – Controls case sensitivity.
- Throws:
Error – (Encoding) if either string contains invalid UTF-8 data.
- Returns:
true if this string starts with other.
-
auto contains(const String &other, CaseSensitivity caseSensitivity = CaseSensitivity::CaseSensitive) const -> bool
Test if this string contains another string using Unicode code-points.
- Parameters:
other – The substring to search for.
caseSensitivity – Controls case sensitivity.
- Throws:
Error – (Encoding) if either string contains invalid UTF-8 data.
- Returns:
true if this string contains other as a contiguous subsequence.
-
auto endsWith(const String &other, CaseSensitivity caseSensitivity = CaseSensitivity::CaseSensitive) const -> bool
Test if this string ends with another string using Unicode code-points.
- Parameters:
other – The other string for the comparison.
caseSensitivity – Controls case sensitivity.
- Throws:
Error – (Encoding) if either string contains invalid UTF-8 data.
- Returns:
true if this string ends with other.
-
StringList split(char32_t character, std::optional<std::size_t> maxSplits = {}) const
Split this string at a character using Unicode code-points.
Empty segments are included in the result. If maxSplits is set, at most that many splits are performed, and the remaining text is returned as the final segment.
- Parameters:
character – The separator character to split at.
maxSplits – Optional maximum number of splits.
- Throws:
Error – (Encoding) if the string contains invalid UTF-8 data.
- Returns:
The list of split segments.
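The two rules described for split (empty segments are kept, and maxSplits limits the number of cuts) can be sketched on already decoded text. The function splitAt below is a hypothetical illustration that works on std::u32string_view instead of String, so it needs no UTF-8 decoding and cannot raise an encoding error.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch of the splitting rules on decoded (UTF-32) text.
std::vector<std::u32string> splitAt(
    std::u32string_view text,
    char32_t separator,
    std::optional<std::size_t> maxSplits = {}) {

    std::vector<std::u32string> parts;
    std::size_t start = 0;
    std::size_t splits = 0;
    // Keep cutting until the limit is reached or no separator is left.
    while (!maxSplits.has_value() || splits < *maxSplits) {
        const auto pos = text.find(separator, start);
        if (pos == std::u32string_view::npos) {
            break;
        }
        parts.emplace_back(text.substr(start, pos - start)); // may be empty
        start = pos + 1;
        ++splits;
    }
    // The remaining text always becomes the final segment.
    parts.emplace_back(text.substr(start));
    return parts;
}
```

With this sketch, splitting "a,b,,c" at a comma yields four segments (the third one empty), and splitting "a,b,c" with maxSplits set to 1 yields "a" and "b,c".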
-
String join(const StringList &parts) const
Join string parts using this string as the glue.
- Parameters:
parts – The parts to join.
- Returns:
The joined string.
-
String trimmed() const
Return a string with all spacing (space, tab) removed from beginning and end.
- Returns:
A new string with the spacing removed.
-
String transformed(const std::function<char32_t(char32_t)> &transformer) const
Transform this string by applying a function to each decoded character.
- Parameters:
transformer – A function that transforms one character into another.
- Throws:
Error – (Encoding) if the string contains invalid UTF-8 data.
- Returns:
A new string with all characters transformed.
-
void forEachCharacter(const std::function<void(char32_t)> &fn) const
Call a function for each decoded character.
- Parameters:
fn – The function to call for each decoded character.
- Throws:
Error – (Encoding) if the string contains invalid UTF-8 data.
-
String toSafeText(std::size_t maximumSize = 200) const
Convert text to be safe for output or logs.
Escapes characters that may disrupt display or have side effects and truncates to the given number of characters, inserting an ellipsis in the middle when necessary.
- Parameters:
maximumSize – The maximum number of characters (default 200).
- Returns:
The safe text.
-
std::size_t escapedSize(EscapeMode mode) const noexcept
Get the byte size of the escaped string.
Use this function to calculate the size requirements of an escaped string, without the actual conversion.
- Returns:
The byte size of the escaped text (excluding the terminating null byte).
-
String toEscaped(EscapeMode mode) const noexcept
Create an escaped version of this string.
- Parameters:
mode – The escape mode to use for escaping.
-
String toNameNormalized() const noexcept
Convert this string to its normalized name form.
Spaces are converted to underscores and ASCII uppercase letters are converted to lowercase.
Important
Unlike the parser’s normalization, this function never fails on invalid input. Characters that are not valid name characters are preserved as-is. Consecutive spaces or tabs inside the name are replaced with the same number of underscores.
Public Static Attributes
-
static constexpr auto npos = WrappedString::npos
Constant representing an invalid or not-found position.
-
enum class erbsland::conf::EscapeMode : uint8_t
Escaping modes.
- Not Tested:
Tested via Char and String.
Values:
-
enumerator Text
Escaping for double-quoted text.
See reference documentation, chapter Text. Although it is allowed unescaped, the tab character is escaped as well.
- Escape characters U+0000-U+001F, \, ", U+007F.
- Use short formats for \\, \", \n, \r, \t.
- Everything else as \u{x}.
-
enumerator FullTextName
Full text name escaping.
See reference documentation, chapter "Parser-Specific Usage of Text Names". Also mentioned in the specification for test adapters.
- Escape characters U+0000-U+001F, \, ", ., =, U+007F-...
- Escape all characters in \u{X} format.
-
enumerator FullTestAdapter
Full test adapter escaping.
-
enumerator ErrorText
Escape for error output and log messages.
- Escapes all Unicode code points that may disrupt the display or have unexpected side effects.
- Escapes all control codes.
- Escapes backslash and double-quote.
- Use short formats for \\, \", \n, \r, \t.
- Everything else as \u{x}.
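As an illustration of the rules listed for the Text mode, the following sketch escapes a narrow string. The function escapeTextSketch is hypothetical; in particular, writing the hexadecimal digits in lowercase and without leading zeros is an assumption, since the exact form of the \u{x} output is not specified in this section.

```cpp
#include <string>
#include <string_view>

// Hypothetical sketch of the escaping rules listed for EscapeMode::Text.
std::string escapeTextSketch(std::string_view text) {
    static constexpr char hexDigits[] = "0123456789abcdef";
    std::string result;
    for (const char c : text) {
        const auto byte = static_cast<unsigned char>(c);
        switch (c) {
        // Short formats.
        case '\\': result += "\\\\"; break;
        case '"':  result += "\\\""; break;
        case '\n': result += "\\n"; break;
        case '\r': result += "\\r"; break;
        case '\t': result += "\\t"; break;
        default:
            if (byte < 0x20 || byte == 0x7F) {
                // Remaining control characters use the \u{x} format
                // (assumed: lowercase hex, no leading zeros).
                result += "\\u{";
                if (byte >= 0x10) {
                    result += hexDigits[byte >> 4];
                }
                result += hexDigits[byte & 0x0F];
                result += '}';
            } else {
                result += c; // all other bytes, including UTF-8 sequences, pass through
            }
        }
    }
    return result;
}
```

Because every character escaped in this mode is ASCII, the sketch can work on bytes and leaves multi-byte UTF-8 sequences untouched. The other modes escape more code points and would need decoding first.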