utf8
This library provides basic support for UTF-8 encoding. This library does not provide any support for Unicode other than the handling of the encoding. Any operation that needs the meaning of a character, such as character classification, is outside its scope.
Unless stated otherwise, all functions that expect a byte position as a parameter assume that the given position is either the start of a byte sequence or one plus the length of the subject string. As in the string library, negative indices count from the end of the string.
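The distinction between byte positions and codepoints can be illustrated with a short sketch. It assumes a Lua 5.3-compatible runtime (for the \u{...} string escape) and uses an arbitrary sample string; the variable names are illustrative only.

```lua
-- "é" (U+00E9) takes two bytes in UTF-8, so the byte length and the
-- codepoint count of this string differ.
local s = "h\u{E9}llo"                 -- "héllo"

local byte_length = #s                 -- counts bytes
local codepoint_count = utf8.len(s)    -- counts codepoints

-- Byte position 2 is the start of the byte sequence for "é".
local e_acute = utf8.codepoint(s, 2)

-- Byte position 3 is a continuation byte, not the start of a sequence,
-- so utf8.codepoint raises an error there; pcall captures it.
local ok = pcall(utf8.codepoint, s, 3)

-- Negative indices count from the end: -1 is the final byte, "o".
local last = utf8.codepoint(s, -1)
```

Here byte_length is 6 while codepoint_count is 5, and ok is false because position 3 falls in the middle of a byte sequence.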
You can find a large catalog of usable UTF-8 characters here.
Functions
| string utf8.char ( Tuple codepoints ) |
|
Receives zero or more codepoints as integers, converts each one to its corresponding UTF-8 byte sequence and returns a string with the concatenation of all these sequences. |
| function, string, int utf8.codes ( string str ) |
|
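Returns an iterator function so that the construction: for position, codepoint in utf8.codes(str) do -- body end will iterate over all codepoints in the string str, with position being the byte position where each codepoint starts and codepoint being its integer value. |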
| Tuple utf8.codepoint ( string str, int i = 1, int j = i ) |
|
Returns the codepoints (as integers) of all characters in the provided string (str) that start between byte positions i and j (both included). The default for i is 1 and for j is i. It raises an error if it meets any invalid byte sequence. |
| int utf8.len ( string s, int i = 1, int j = -1 ) |
|
Returns the number of UTF-8 codepoints in the string s that start between byte positions i and j (both included). The default for i is 1 and for j is -1. If it finds any invalid byte sequence, it returns nil plus the position of the first invalid byte. |
| int utf8.offset ( string s, int n, int i = 1 ) |
|
Returns the position (in bytes) where the encoding of the n-th codepoint of s (counting from byte position i) starts. A negative n gets characters before position i. The default for i is 1 when n is non-negative and #s + 1 otherwise, so that utf8.offset(s, -n) gets the offset of the n-th character from the end of the string. If the specified character is neither in the subject nor right after its end, the function returns nil. |
| function utf8.graphemes ( string str, number i, number j ) |
|
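Returns an iterator function so that the construction: for first, last in utf8.graphemes(str) do local grapheme = str:sub(first, last) -- body end will iterate over the grapheme clusters of the string str, with first and last being the byte positions where each cluster starts and ends. |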
| string utf8.nfcnormalize ( string str ) |
|
Converts the input string to Normal Form C, which tries to convert decomposed characters into composed characters. |
| string utf8.nfdnormalize ( string str ) |
|
Converts the input string to Normal Form D, which tries to break up composed characters into decomposed characters. |
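The following minimal sketch walks through utf8.char, utf8.codes, utf8.codepoint, utf8.len and utf8.offset on one small string. The sample codepoints and the variable names are chosen purely for illustration, and the \xFF escape assumes a Lua 5.2-or-later string syntax.

```lua
-- Build a string from codepoints: "a", "é" (U+00E9), "€" (U+20AC).
-- Their encodings take 1, 2 and 3 bytes respectively.
local s = utf8.char(97, 233, 8364)

-- Iterate over (byte position, codepoint) pairs.
local positions, codepoints = {}, {}
for position, codepoint in utf8.codes(s) do
  positions[#positions + 1] = position
  codepoints[#codepoints + 1] = codepoint
end

-- Decode every codepoint that starts between byte positions 1 and -1.
local a, e_acute, euro = utf8.codepoint(s, 1, -1)

-- Count codepoints in a valid string.
local count = utf8.len(s)

-- On invalid input, utf8.len returns nil plus the position of the
-- first invalid byte (here the lone 0xFF at byte 2).
local bad, bad_position = utf8.len("a\xFF")

-- Byte position where the 3rd codepoint starts, counting from the front,
-- and where the last codepoint starts, counting from the end.
local third_start = utf8.offset(s, 3)
local last_start = utf8.offset(s, -1)
```

In this example the codepoints start at byte positions 1, 2 and 4, so both third_start and last_start are 4, while #s is 6 and count is 3.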
Constants
| string utf8.charpattern |
|
The pattern |
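As an illustration of how utf8.charpattern might be used, the sketch below splits a string into its individual characters with string.gmatch. It assumes the constant behaves as in the standard Lua 5.3 library, where it matches one UTF-8 byte sequence in a valid UTF-8 string; the sample string and variable names are illustrative.

```lua
-- Split a valid UTF-8 string into one substring per character.
local s = "h\u{E9}llo"   -- "héllo": 5 characters, 6 bytes

local chars = {}
for ch in string.gmatch(s, utf8.charpattern) do
  chars[#chars + 1] = ch
end

-- The second entry holds the full two-byte sequence for "é".
local second = chars[2]
```

Each entry of chars is a complete byte sequence, so multi-byte characters are never split in the middle.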