UCE Docs / split_utf8

Signature

StringList split_utf8(String str, bool compound_characters = false)

Parameters

str : string to be split
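compound_characters : optional (default false); if true, certain multi-code-point sequences are combined into a single entry
return value : a list of strings, one per Unicode code point, or one per combined sequence when compound_characters is true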

Splits str into its constituent Unicode code points.
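To illustrate what splitting by code point means at the byte level, here is a minimal sketch in Python. It is not the UCE implementation; the name split_code_points is hypothetical, and the input is assumed to be well-formed UTF-8. The length of each sequence is read from its lead byte, so multi-byte characters stay intact.

```python
def split_code_points(data: bytes) -> list[str]:
    """Split UTF-8 encoded bytes into one string per code point.

    Hypothetical helper for illustration only, not part of UCE.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The high bits of the lead byte give the sequence length.
        if lead < 0x80:
            length = 1          # 0xxxxxxx: ASCII
        elif lead >> 5 == 0b110:
            length = 2          # 110xxxxx
        elif lead >> 4 == 0b1110:
            length = 3          # 1110xxxx
        elif lead >> 3 == 0b11110:
            length = 4          # 11110xxx
        else:
            raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
        # decode() raises UnicodeDecodeError if the sequence is
        # truncated or otherwise malformed.
        out.append(data[i:i + length].decode("utf-8"))
        i += length
    return out
```

For example, "h\u00e9" (the letter h followed by e-acute) is three bytes in UTF-8 but yields two entries, because the two bytes of the accented letter are kept together.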

If compound_characters is true, split_utf8() also applies a small amount of grouping so some multi-code-point glyphs stay together. The current rules are:

  • combine characters joined by a Zero-Width Joiner (ZWJ)

  • combine two Regional Indicator Symbol Letter characters

  • append Variation Selectors to the previous character

  • otherwise leave characters as separate entries
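The rules above can be sketched as a single pass over a list of code points. This is an illustrative Python sketch under stated assumptions, not the UCE source: the function name group_compound and its helpers are hypothetical, the Variation Selector ranges used are U+FE00 to U+FE0F and U+E0100 to U+E01EF, and edge cases (such as a leading ZWJ) may be handled differently by split_utf8() itself.

```python
ZWJ = "\u200d"  # Zero-Width Joiner


def is_regional_indicator(ch: str) -> bool:
    # Regional Indicator Symbol Letters A..Z
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def is_variation_selector(ch: str) -> bool:
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def group_compound(code_points: list[str]) -> list[str]:
    """Group single-code-point strings using the three rules above.

    Hypothetical helper for illustration only, not part of UCE.
    """
    groups: list[str] = []
    join_next = False   # the previous code point was a ZWJ
    open_flag = False   # last group is one regional indicator awaiting its pair
    for ch in code_points:
        if groups and (ch == ZWJ or is_variation_selector(ch)):
            # ZWJ and variation selectors attach to the previous entry.
            groups[-1] += ch
            join_next = ch == ZWJ
            open_flag = False
        elif groups and join_next:
            # The code point right after a ZWJ joins the same entry.
            groups[-1] += ch
            join_next = False
            open_flag = False
        elif open_flag and is_regional_indicator(ch):
            # Second regional indicator completes a pair.
            groups[-1] += ch
            open_flag = False
        else:
            # Otherwise the code point starts its own entry.
            groups.append(ch)
            join_next = False
            open_flag = is_regional_indicator(ch)
    return groups
```

In Python, list(text) yields one string per code point, so group_compound(list(text)) plays the role of the compound_characters = true mode in this sketch. Four consecutive Regional Indicator letters come out as two pairs rather than one group of four.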

This is useful for text that contains non-ASCII characters, where splitting byte by byte would cut multi-byte UTF-8 sequences apart.

Example

StringList chars = split_utf8("Hi");
print(join(chars, ","), "\n");

Output

H,i