A dumping ground for any feature that only relates to unicode work.
Functions | |
String | AsBitString (Int32 IntToPrint) |
A helper function that produces a human readable sequence of ' ', '1' and '0' characters. | |
Int32 | GetCharacterFromInt (char *Destination, Int32 BytesUsable, Int32 ByteSequence) |
Convert a number that represents any valid unicode value into its UTF8 representation. | |
Int32 | GetIntFromCharacter (Int32 &BytesUsed, const char *CurrentCharacter) |
Get a number suitable for using in an index from a character string. | |
Variables | |
const UInt8 | High1Bit = (1<<7) |
1xxxxxxx - Compared against the high 2 bits of a byte to determine whether that byte is a continuation byte in the middle of a multi-byte character | |
const UInt32 | High1bytes = 0xFF000000 |
The Highest byte of an integer on this system. | |
const UInt8 | High2Bit = High1Bit | (1<<6) |
11xxxxxx | |
const UInt32 | High2bytes = 0xFFFF0000 |
The Highest 2 bytes of an integer on this system. | |
const UInt8 | High3Bit = High2Bit | (1<<5) |
111xxxxx | |
const UInt32 | High3bytes = 0xFFFFFF00 |
The Highest 3 bytes of an integer on this system. | |
const UInt8 | High4Bit = High3Bit | (1<<4) |
1111xxxx | |
const UInt8 | High5Bit = High4Bit | (1<<3) |
11111xxx | |
const UInt8 | High6Bit = High5Bit | (1<<2) |
111111xx | |
const UInt8 | High7Bit = High6Bit | (1<<1) |
1111111x | |
const UInt8 | High8Bit = High7Bit | 1 |
11111111 | |
const UInt8 | IterableHighBits [] = {0, High1Bit, High2Bit, High3Bit, High4Bit, High5Bit, High6Bit, High7Bit, High8Bit} |
The index of this array corresponds to the number of high bits that are set. | |
const UInt8 | IterableLowBits [] = {0, Low1Bit, Low2Bit, Low3Bit, Low4Bit, Low5Bit, Low6Bit, Low7Bit, Low8Bit} |
The index of this array corresponds to the number of low bits that are set. | |
const UInt8 | Low1Bit = (1) |
xxxxxxx1 | |
const UInt8 | Low2Bit = Low1Bit | (1<<1) |
xxxxxx11 | |
const UInt8 | Low3Bit = Low2Bit | (1<<2) |
xxxxx111 | |
const UInt8 | Low4Bit = Low3Bit | (1<<3) |
xxxx1111 | |
const UInt8 | Low5Bit = Low4Bit | (1<<4) |
xxx11111 | |
const UInt8 | Low6Bit = Low5Bit | (1<<5) |
xx111111 | |
const UInt8 | Low7Bit = Low6Bit | (1<<6) |
x1111111 | |
const UInt8 | Low8Bit = Low7Bit | (1<<7) |
11111111 | |
const Int32 | UTF8ByteRange1Max = 127 |
The maximum Unicode codepoint that can fit into a single UTF8 byte. Equal to 2^7-1. | |
const Int32 | UTF8ByteRange2Max = 4097 |
The maximum Unicode codepoint that can fit into 2 UTF8 bytes. The intended value is 2^11-1, which is 2047; the declared 4097 does not match it. | |
const Int32 | UTF8ByteRange3Max = 65535 |
The maximum Unicode codepoint that can fit into 3 UTF8 bytes. Equal to 2^16-1. | |
const Int32 | UTF8ByteRange4Max = 2097151 |
The maximum Unicode codepoint that can fit into 4 UTF8 bytes. Equal to 2^21-1. | |
const UInt32 | UTF8Null2ByteBase = 49280 |
The numerical representation of 0 in a two byte UTF8 sequence. Equal to 11000000 10000000. | |
const UInt32 | UTF8Null3ByteBase = 14712960 |
The numerical representation of 0 in a three byte UTF8 sequence. Equal to 11100000 10000000 10000000. | |
const UInt32 | UTF8Null4ByteBase = 4034953344 |
The numerical representation of 0 in a four byte UTF8 sequence. Equal to 11110000 10000000 10000000 10000000. | |
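The range limits and null bases listed above follow from the UTF8 bit layout. As a rough sketch only (not the library's code), the block below rebuilds them locally from the documented formulas and bit patterns. The `Sketch` namespace and the `BytesNeeded` helper are hypothetical names introduced for illustration, and the range limits are computed from the stated powers of two rather than copied from the listing.

```cpp
#include <cstdint>

namespace Sketch
{
    // Local stand-ins for the documented high-bit masks (assumed definitions).
    constexpr std::uint8_t High1Bit = 1u << 7;               // 1xxxxxxx
    constexpr std::uint8_t High2Bit = High1Bit | (1u << 6);  // 11xxxxxx
    constexpr std::uint8_t High3Bit = High2Bit | (1u << 5);  // 111xxxxx
    constexpr std::uint8_t High4Bit = High3Bit | (1u << 4);  // 1111xxxx

    // Range limits computed from the documented formulas 2^7-1, 2^11-1, 2^16-1, 2^21-1.
    constexpr std::int32_t Range1Max = (1 << 7) - 1;
    constexpr std::int32_t Range2Max = (1 << 11) - 1;
    constexpr std::int32_t Range3Max = (1 << 16) - 1;
    constexpr std::int32_t Range4Max = (1 << 21) - 1;

    // Hypothetical helper: number of UTF8 bytes a code point needs,
    // or 0 when the value is negative or does not fit in 21 bits.
    constexpr int BytesNeeded(std::int32_t CodePoint)
    {
        return CodePoint < 0          ? 0
             : CodePoint <= Range1Max ? 1
             : CodePoint <= Range2Max ? 2
             : CodePoint <= Range3Max ? 3
             : CodePoint <= Range4Max ? 4
             : 0;
    }

    // A "null base" is a lead byte marker followed by empty continuation bytes (10000000).
    constexpr std::uint32_t Null2ByteBase =
        (std::uint32_t(High2Bit) << 8) | High1Bit;
    constexpr std::uint32_t Null3ByteBase =
        (std::uint32_t(High3Bit) << 16) | (std::uint32_t(High1Bit) << 8) | High1Bit;
    constexpr std::uint32_t Null4ByteBase =
        (std::uint32_t(High4Bit) << 24) | (std::uint32_t(High1Bit) << 16) |
        (std::uint32_t(High1Bit) << 8) | High1Bit;

    // The rebuilt values agree with the decimal constants in the listing.
    static_assert(Null2ByteBase == 49280u, "11000000 10000000");
    static_assert(Null3ByteBase == 14712960u, "11100000 10000000 10000000");
    static_assert(Null4ByteBase == 4034953344u, "11110000 10000000 10000000 10000000");
}
```

Because everything is `constexpr`, the checks run at compile time; for example `Sketch::BytesNeeded(0x20AC)` evaluates to 3.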
A dumping ground for any feature that only relates to unicode work.
Unicode is a series of numbers that correlate to glyphs. These numbers are separate from any binary representation. Common binary representations of the numbers are UTF8, UTF16, and UTF32. This library supports UTF8, which uses between 1 and 4 bytes to represent any valid Unicode glyph. The tools provided here allow conversion between the raw Unicode value, which is useful for algorithms, and its UTF8 representation, which is useful for storage and transmission.
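To illustrate the difference between the Unicode numbers and their byte representation, here is a minimal sketch that counts code points in a UTF8 string rather than bytes. `CountCodePoints` is a hypothetical helper, not part of this library, and it assumes the input is well-formed UTF8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF8 string.
// Every code point contributes exactly one byte that is NOT a continuation
// byte (10xxxxxx), so counting those bytes counts the code points.
// Assumes well-formed input; malformed sequences are not detected.
std::size_t CountCodePoints(const std::string& Utf8)
{
    std::size_t Count = 0;
    for (unsigned char Byte : Utf8)
    {
        if ((Byte & 0xC0) != 0x80)
            { ++Count; }
    }
    return Count;
}
```

A string holding only the Euro sign (U+20AC) occupies three bytes but yields a count of one, which is why algorithms usually prefer the raw value while storage uses the encoded bytes.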
String Mezzanine::Unicode::AsBitString | ( | Int32 | IntToPrint) |
A helper function that produces a human readable sequence of ' ', '1' and '0' characters.
IntToPrint | A 32 bit integer that will be used to create the sequence. |
Definition at line 71 of file unicode.cpp.
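As a rough idea of what such a helper might produce, the sketch below prints all 32 bits with the most significant bit first. `BitStringSketch` is a hypothetical stand-in; in particular, placing a space between each group of 8 bits is an assumption, since the exact layout of the real output is not described here.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a bit-printing helper. The grouping (one space
// between bytes, most significant byte first) is assumed for illustration.
std::string BitStringSketch(std::int32_t IntToPrint)
{
    // Work on the unsigned bit pattern so shifting is well defined.
    const std::uint32_t Bits = static_cast<std::uint32_t>(IntToPrint);
    std::string Result;
    for (int Bit = 31; Bit >= 0; --Bit)
    {
        Result += ((Bits >> Bit) & 1u) ? '1' : '0';
        if (Bit % 8 == 0 && Bit != 0)   // separate bytes, no trailing space
            { Result += ' '; }
    }
    return Result;
}
```

Under these assumptions, `BitStringSketch(49280)` yields `00000000 00000000 11000000 10000000`, which makes the two byte null base pattern easy to read.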
Int32 Mezzanine::Unicode::GetCharacterFromInt | ( | char * | Destination, |
Int32 | BytesUsable, | ||
Int32 | ByteSequence | ||
) |
Convert a number that represents any valid unicode value into its UTF8 representation.
Destination | The place to write the results. Never more than 4 bytes will be written. Null terminators are not written. |
BytesUsable | How many bytes of the Destination are usable. |
ByteSequence | The integer value to convert to a UTF8 unicode representation. This sequence must be representable in 21 or fewer bits (<2097152) to be valid. |
Definition at line 116 of file unicode.cpp.
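The conversion described above can be sketched as follows. This is not the library's implementation: `EncodeUtf8Sketch` is a hypothetical function that only mirrors the documented parameter order, and its return convention (bytes written, or -1 on failure) is an assumption because the real return value is not documented here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoder. Writes at most 4 bytes and no null terminator.
// Assumed return value: bytes written, or -1 if the value does not fit in
// 21 bits or the buffer is too small.
std::int32_t EncodeUtf8Sketch(char* Destination, std::int32_t BytesUsable, std::int32_t CodePoint)
{
    assert(Destination != nullptr);
    if (CodePoint < 0 || CodePoint > 0x1FFFFF)
        { return -1; }

    const int Needed = CodePoint <= 0x7F   ? 1
                     : CodePoint <= 0x7FF  ? 2
                     : CodePoint <= 0xFFFF ? 3
                     : 4;
    if (BytesUsable < Needed)
        { return -1; }

    if (Needed == 1)
    {
        Destination[0] = static_cast<char>(CodePoint);
        return 1;
    }

    // Lead byte markers: 110xxxxx, 1110xxxx, 11110xxx (indexed by length).
    static const unsigned char LeadMarker[] = {0, 0, 0xC0, 0xE0, 0xF0};
    int Shift = 6 * (Needed - 1);
    Destination[0] = static_cast<char>(LeadMarker[Needed] | (CodePoint >> Shift));

    // Each continuation byte is 10xxxxxx and carries the next 6 payload bits.
    for (int Index = 1; Index < Needed; ++Index)
    {
        Shift -= 6;
        Destination[Index] = static_cast<char>(0x80 | ((CodePoint >> Shift) & 0x3F));
    }
    return Needed;
}
```

With a 4 byte buffer, encoding 0x20AC (the Euro sign) writes the bytes E2 82 AC and returns 3. The sketch accepts the full 21 bit range named in the parameter description; the Unicode standard itself stops at U+10FFFF, which a stricter encoder would also check.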
Int32 Mezzanine::Unicode::GetIntFromCharacter | ( | Int32 & | BytesUsed, |
const char * | CurrentCharacter | ||
) |
Get a number suitable for using in an index from a character string.
BytesUsed | The value of this variable is ignored and overwritten with the number of bytes consumed from CurrentCharacter. |
CurrentCharacter | A pointer to a C style string. |
Definition at line 86 of file unicode.cpp.
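The reverse direction can be sketched in the same spirit. `DecodeUtf8Sketch` is a hypothetical function that copies the documented parameter order; its error convention (returning -1 and setting `BytesUsed` to 0 on malformed input) is an assumption, since the real function's behavior on bad input is not described here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoder: reads one UTF8 character from a null-terminated
// C style string and returns its Unicode value. BytesUsed is overwritten
// with the number of bytes consumed. Assumed error result: -1 with
// BytesUsed set to 0.
std::int32_t DecodeUtf8Sketch(std::int32_t& BytesUsed, const char* CurrentCharacter)
{
    assert(CurrentCharacter != nullptr);
    const unsigned char Lead = static_cast<unsigned char>(CurrentCharacter[0]);
    std::int32_t Result = 0;

    if (Lead < 0x80)                { BytesUsed = 1; return Lead; }          // 0xxxxxxx
    else if ((Lead & 0xE0) == 0xC0) { BytesUsed = 2; Result = Lead & 0x1F; } // 110xxxxx
    else if ((Lead & 0xF0) == 0xE0) { BytesUsed = 3; Result = Lead & 0x0F; } // 1110xxxx
    else if ((Lead & 0xF8) == 0xF0) { BytesUsed = 4; Result = Lead & 0x07; } // 11110xxx
    else                            { BytesUsed = 0; return -1; }            // stray continuation or invalid lead

    for (int Index = 1; Index < BytesUsed; ++Index)
    {
        const unsigned char Next = static_cast<unsigned char>(CurrentCharacter[Index]);
        // A null terminator fails this test too, so the loop never reads
        // past the end of the string.
        if ((Next & 0xC0) != 0x80)
            { BytesUsed = 0; return -1; }
        Result = (Result << 6) | (Next & 0x3F);
    }
    return Result;
}
```

Decoding the bytes E2 82 AC returns 0x20AC and sets `BytesUsed` to 3, so a caller can advance its pointer by that amount to reach the next character.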