Class UTF8
public class UTF8
Utilities for working with UTF-8 encodings.
Decoding of UTF-8 is based on a presentation by Bob Steagall at CppCon2018 (see https://github.com/BobSteagall/CppCon2018). It uses a Deterministic Finite Automaton (DFA) to recognize and decode multi-byte code points.
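The DFA approach described above can be illustrated with a small table-driven decoder. This is a hypothetical sketch, not the tables used by UTF8 or by Steagall's code. The class name Utf8DfaSketch, the character-class names (ILL, ASC, CR1, ...) and the state names (BGN, CS1, P3A, ...) are made up for illustration, and the input is a plain Java byte[]. Each byte is first mapped to a character class. A transition table over (state, class) then accepts only well-formed sequences. Overlong forms (C0, C1, E0 80..9F, F0 80..8F), encoded surrogates (ED A0..BF) and values above U+10FFFF (F4 90..BF, F5..FF) all lead to the error state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a table-driven UTF-8 DFA; the class, state and
// character-class names are illustrative, not taken from the UTF8 class.
final class Utf8DfaSketch {
    // Character classes: the role a byte can play in a UTF-8 sequence.
    static final int ILL = 0, ASC = 1, CR1 = 2, CR2 = 3, CR3 = 4, L2A = 5,
            L3A = 6, L3B = 7, L3C = 8, L4A = 9, L4B = 10, L4C = 11;
    // States: BGN is the start state and the only accepting state.
    static final int BGN = 0, ERR = 1, CS1 = 2, CS2 = 3, CS3 = 4,
            P3A = 5, P3B = 6, P4A = 7, P4B = 8;

    static final int[] CLASS = new int[256];    // byte value -> character class
    static final int[][] NEXT = new int[9][12]; // [state][class] -> next state

    static {
        // Bytes not assigned below (0xC0, 0xC1, 0xF5..0xFF) stay ILL (0).
        for (int b = 0x00; b <= 0x7F; b++) CLASS[b] = ASC;
        for (int b = 0x80; b <= 0x8F; b++) CLASS[b] = CR1;
        for (int b = 0x90; b <= 0x9F; b++) CLASS[b] = CR2;
        for (int b = 0xA0; b <= 0xBF; b++) CLASS[b] = CR3;
        for (int b = 0xC2; b <= 0xDF; b++) CLASS[b] = L2A;
        for (int b = 0xE1; b <= 0xEF; b++) CLASS[b] = L3B;
        for (int b = 0xF1; b <= 0xF3; b++) CLASS[b] = L4B;
        CLASS[0xE0] = L3A; // second byte must be A0..BF (rejects overlongs)
        CLASS[0xED] = L3C; // second byte must be 80..9F (rejects surrogates)
        CLASS[0xF0] = L4A; // second byte must be 90..BF (rejects overlongs)
        CLASS[0xF4] = L4C; // second byte must be 80..8F (caps at U+10FFFF)

        for (int[] row : NEXT) Arrays.fill(row, ERR);
        NEXT[BGN][ASC] = BGN;
        NEXT[BGN][L2A] = CS1;
        NEXT[BGN][L3A] = P3A;
        NEXT[BGN][L3B] = CS2;
        NEXT[BGN][L3C] = P3B;
        NEXT[BGN][L4A] = P4A;
        NEXT[BGN][L4B] = CS3;
        NEXT[BGN][L4C] = P4B;
        for (int c = CR1; c <= CR3; c++) {
            NEXT[CS1][c] = BGN; // final continuation byte completes the code point
            NEXT[CS2][c] = CS1;
            NEXT[CS3][c] = CS2;
        }
        NEXT[P3A][CR3] = CS1;
        NEXT[P3B][CR1] = CS1;
        NEXT[P3B][CR2] = CS1;
        NEXT[P4A][CR2] = CS2;
        NEXT[P4A][CR3] = CS2;
        NEXT[P4B][CR1] = CS2;
    }

    // Payload bits kept from a lead byte of the given class.
    static int leadMask(int cls) {
        switch (cls) {
            case ASC: return 0x7F;
            case L2A: return 0x1F;
            case L3A: case L3B: case L3C: return 0x0F;
            default: return 0x07; // 4-byte leads; other classes go to ERR anyway
        }
    }

    /** Returns the decoded code points, or null if the input is malformed or truncated. */
    static int[] decode(byte[] utf8) {
        List<Integer> out = new ArrayList<>();
        int state = BGN, cp = 0;
        for (byte raw : utf8) {
            int b = raw & 0xFF;
            int cls = CLASS[b];
            // A lead byte starts a new code point; a continuation byte appends 6 bits.
            cp = (state == BGN) ? (b & leadMask(cls)) : ((cp << 6) | (b & 0x3F));
            state = NEXT[state][cls];
            if (state == ERR) return null;
            if (state == BGN) out.add(cp);
        }
        if (state != BGN) return null; // input ended mid-sequence
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        byte[] ok = {(byte) 0xC3, (byte) 0xA9, (byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};
        System.out.println(Arrays.toString(decode(ok)));                    // [233, 128512]
        System.out.println(decode(new byte[] {(byte) 0xC0, (byte) 0x80})); // null (overlong)
    }
}
```

Keeping the per-byte work to one table lookup and one shift is what makes the DFA formulation attractive: validation and decoding happen in the same pass, with no nested range checks.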
Constructor Summary
UTF8()
Method Summary
Modifier and Type | Method | Description
static int | transcodeToUTF16(Array<byte> utf8, Array<char> utf16) | Transcode a UTF-8 encoding into a UTF-16 representation.
Method Detail
transcodeToUTF16
static int transcodeToUTF16(Array<byte> utf8, Array<char> utf16)
Transcode a UTF-8 encoding into a UTF-16 representation. In the general case, the output utf16 array should be at least as long as the input utf8 array to handle arbitrary inputs (every UTF-8 sequence of one to three bytes produces one UTF-16 code unit, and every four-byte sequence produces a surrogate pair of two). The number of UTF-16 code units written is returned, or -1 if any error is encountered, in which case an arbitrary amount of data may already have been written into the output array. The errors detected are malformed UTF-8, including incomplete, truncated or "overlong" encodings, and unmappable code points. In particular, no unmatched surrogates will be produced. An error also results if utf16 is too small to store the complete output.
Parameters:
utf8 - A non-null array containing a well-formed UTF-8 encoding.
utf16 - A non-null array, at least as long as the utf8 array, to ensure that the output will fit.
Returns:
The number of UTF-16 code units written to utf16 (beginning at index 0), or -1 if the input was malformed or encoded any unmappable characters, or if utf16 is too small.
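The documented contract (a UTF-16 code-unit count, or -1 on malformed input or insufficient output space) can be approximated with the JDK's own strict UTF-8 decoder. This is a sketch under stated assumptions, not the UTF8 class's implementation. It swaps in java.nio.charset.CharsetDecoder configured with CodingErrorAction.REPORT in place of the DFA, it models Array<byte> and Array<char> as Java byte[] and char[], and the class name TranscodeSketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for transcodeToUTF16 built on the JDK decoder, not the DFA.
final class TranscodeSketch {
    /** Returns the number of UTF-16 code units written, or -1 on malformed input or a short output array. */
    static int transcodeToUTF16(byte[] utf8, char[] utf16) {
        // REPORT makes the decoder signal errors instead of inserting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(utf8);
        CharBuffer out = CharBuffer.wrap(utf16); // writes start at index 0
        // endOfInput = true turns a truncated trailing sequence into an error.
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError() || result.isOverflow()) {
            return -1; // malformed/unmappable input, or utf16 too small (partial output may remain)
        }
        if (decoder.flush(out).isOverflow()) {
            return -1;
        }
        return out.position();
    }

    public static void main(String[] args) {
        byte[] utf8 = "a\u00E9\uD83D\uDE00".getBytes(StandardCharsets.UTF_8); // 1 + 2 + 4 = 7 bytes
        char[] utf16 = new char[utf8.length]; // sized to the input, as recommended
        System.out.println(transcodeToUTF16(utf8, utf16));       // 4
        System.out.println(transcodeToUTF16(utf8, new char[3])); // -1: output too small
    }
}
```

In the usage above, 7 input bytes produce 4 UTF-16 code units (the emoji becomes a surrogate pair), so a 3-element output array triggers the too-small error. The JDK decoder also rejects overlong forms such as C0 80 and encoded surrogates such as ED A0 80, which matches the error classes listed for transcodeToUTF16.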