class idna_convert


Located at : UKNOWVA_ROOT/libraries/simplepie/idn/idna_convert.class.php

Encode/decode Internationalized Domain Names.

This class converts internationalized domain names (see RFC 3490 for details), as used by various registries worldwide, between their original (localized) form and the encoded form used in the DNS (Domain Name System).

The class provides two public methods, encode() and decode(), which do exactly what you would expect them to do. You can pass complete domain names, simple strings and complete email addresses. That means you can use any of the following notations:

  • www.nörgler.com
  • xn--nrgler-wxa
  • xn--brse-5qa.xn--knrz-1ra.info

Unicode input can be given as a UTF-8 string, a UCS-4 string or a UCS-4 array. Unicode output is available in the same formats. You can select your preferred format via set_parameter().

ACE input and output is always expected to be ASCII.

Properties

array $NP Holds all relevant mapping tables, loaded from a separate file on construct. See RFC3454 for details.
$_punycode_prefix
$_invalid_ucs
$_max_ucs
$_base
$_tmin
$_tmax
$_skew
$_damp
$_initial_bias
$_initial_n
$_sbase
$_lbase
$_vbase
$_tbase
$_lcount
$_vcount
$_tcount
$_ncount
$_scount
$_error
$_api_encoding
$_allow_overlong
$_strict_mode

Methods

idna_convert($options = false)

No description

boolean
set_parameter(mixed $option, string $value = false)

Sets a new option value. Available options and values:

  • encoding: selects the input format, either UTF-8, UCS4 as array or UCS4 as string ('utf8' for UTF-8, 'ucs4string' and 'ucs4array' respectively for UCS4). The output is always UTF-8.
  • overlong: Unicode does not allow unnecessarily long encodings of characters. Set this parameter to true to allow them, otherwise to false. Default is false.
  • strict: true selects strict mode, good for registration purposes, which causes errors on failures. false selects loose mode, suited to applications "in the wild", which silently ignores errors and returns the original input instead.

string
decode($input, $one_time_encoding = false)

Decode a given ACE domain name

string
encode($decoded, $one_time_encoding = false)

Encode a given UTF-8 domain name

string
get_last_error()

Use this method to get the last error that occurred

_decode($encoded)

The actual decoding algorithm

_encode($decoded)

The actual encoding algorithm

_adapt($delta, $npoints, $is_first)

Adapt the bias according to the current code point and position

_encode_digit($d)

Encode a certain digit

_decode_digit($cp)

Decode a certain digit

_error($error = '')

Internal error handling method

string
_nameprep(array $input)

Do Nameprep according to RFC3491 and RFC3454

array
_hangul_decompose(integer $char)

Decomposes a Hangul syllable (see http://www.unicode.org/unicode/reports/tr15/#Hangul)

array
_hangul_compose(array $input)

Composes a Hangul syllable (see http://www.unicode.org/unicode/reports/tr15/#Hangul)

integer
_get_combining_class(integer $char)

Returns the combining class of a certain wide char

array
_apply_cannonical_ordering(array $input)

Applies the canonical ordering of a decomposed UCS4 sequence

array
_combine(array $input)

Do composition of a sequence of starter and non-starter

_utf8_to_ucs4($input)

Converts a UTF-8 encoded string to its UCS-4 representation. By UCS-4 "strings" we mean arrays of 32bit integers, each representing one of the "chars". This is due to PHP not being able to handle strings with a bit depth other than 8. This applies to the reverse method _ucs4_to_utf8(), too.

_ucs4_to_utf8($input)

Convert UCS-4 string into UTF-8 string. See _utf8_to_ucs4() for details.

_ucs4_to_ucs4_string($input)

Convert UCS-4 array into UCS-4 string

_ucs4_string_to_ucs4($input)

Convert UCS-4 string into UCS-4 array

Details

at line 94
idna_convert($options = false)

Parameters

$options

at line 125
boolean set_parameter(mixed $option, string $value = false)

Sets a new option value. Available options and values:

  • encoding: selects the input format, either UTF-8, UCS4 as array or UCS4 as string ('utf8' for UTF-8, 'ucs4string' and 'ucs4array' respectively for UCS4). The output is always UTF-8.
  • overlong: Unicode does not allow unnecessarily long encodings of characters. Set this parameter to true to allow them, otherwise to false. Default is false.
  • strict: true selects strict mode, good for registration purposes, which causes errors on failures. false selects loose mode, suited to applications "in the wild", which silently ignores errors and returns the original input instead.

Parameters

mixed $option Parameter to set (string: single parameter; array of Parameter => Value pairs)
string $value Value to use (if parameter 1 is a string)

Return Value

boolean true on success, false otherwise
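
The two calling forms described above (a single option name with a value, or an array of option => value pairs) might be used as in the following sketch. The helper name configure_converter() is hypothetical and not part of the class. $idn is assumed to be an idna_convert instance created elsewhere, and the option names and values are the ones listed in the description above.

```php
<?php
// Hypothetical helper, not part of idna_convert.
// $idn is assumed to be an idna_convert instance.
function configure_converter($idn, $strict)
{
    // Array form: several Parameter => Value pairs in one call.
    $ok = $idn->set_parameter(array(
        'encoding' => 'utf8',   // input is given as a UTF-8 string
        'overlong' => false,    // do not accept overlong encodings
    ));

    // String form: one parameter name plus its value.
    return $ok && $idn->set_parameter('strict', (bool) $strict);
}
```

The helper returns true only if both calls report success, mirroring the boolean return value documented above.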

at line 165
string decode($input, $one_time_encoding = false)

Decode a given ACE domain name

Parameters

$input
$one_time_encoding

Return Value

string Decoded Domain name (UTF-8 or UCS-4)

at line 267
string encode($decoded, $one_time_encoding = false)

Encode a given UTF-8 domain name

Parameters

$decoded
$one_time_encoding

Return Value

string Encoded Domain name (ACE string)

at line 351
string get_last_error()

Use this method to get the last error that occurred

Return Value

string The last error that occurred
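
A minimal sketch of how encode() and get_last_error() might be combined. It assumes that encode() returns false when a conversion fails in strict mode, which this page does not state explicitly. The helper name encode_or_fail() is hypothetical, and $idn is assumed to be an idna_convert instance.

```php
<?php
// Hypothetical helper, not part of idna_convert.
// Assumption: encode() returns false on failure (not confirmed on this page).
function encode_or_fail($idn, $name)
{
    $ace = $idn->encode($name);
    if ($ace === false) {
        // Surface the stored error message instead of a bare false.
        throw new RuntimeException($idn->get_last_error());
    }
    return $ace;
}
```

With a real instance, a call such as encode_or_fail($idn, 'www.nörgler.com') would either return the ACE form or raise an exception carrying the last error text.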

at line 360
_decode($encoded)

The actual decoding algorithm

Parameters

$encoded

at line 419
_encode($decoded)

The actual encoding algorithm

Parameters

$decoded

at line 517
_adapt($delta, $npoints, $is_first)

Adapt the bias according to the current code point and position

Parameters

$delta
$npoints
$is_first
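
For orientation, the bias adaptation step of Punycode as specified in RFC 3492 (section 6.1) can be sketched as below. This is not the class's own code. The function name is hypothetical, and the constants (base 36, tmin 1, tmax 26, skew 38, damp 700) are the RFC's values, assumed to correspond to the $_base, $_tmin, $_tmax, $_skew and $_damp properties listed above. intdiv() requires PHP 7 or later.

```php
<?php
// Sketch of RFC 3492 bias adaptation; hypothetical name, RFC constants.
function punycode_adapt($delta, $npoints, $is_first)
{
    $base = 36; $tmin = 1; $tmax = 26; $skew = 38; $damp = 700;

    // Scale delta down: heavily on the first adaptation, by half afterwards.
    $delta = intdiv($delta, $is_first ? $damp : 2);
    // Compensate for the string growing by one code point.
    $delta += intdiv($delta, $npoints);

    // Count how many full digits delta would need.
    $k = 0;
    while ($delta > intdiv(($base - $tmin) * $tmax, 2)) {
        $delta = intdiv($delta, $base - $tmin);
        $k += $base;
    }
    return $k + intdiv(($base - $tmin + 1) * $delta, $delta + $skew);
}
```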

at line 531
_encode_digit($d)

Encode a certain digit

Parameters

$d

at line 540
_decode_digit($cp)

Decode a certain digit

Parameters

$cp
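
The digit mapping used by Punycode (RFC 3492, section 5) assigns the values 0 to 25 to the letters a to z and 26 to 35 to the digits 0 to 9. A minimal sketch of both directions follows. The function names are hypothetical, the decoder takes a single character rather than whatever form $cp has in the class, and returning 36 (the base) for an invalid character is an assumption borrowed from the RFC's sample code.

```php
<?php
// Sketch of the RFC 3492 digit mapping; hypothetical function names.
function punycode_encode_digit($d)
{
    // 0..25 map to 'a'..'z' (97..122), 26..35 map to '0'..'9' (48..57).
    return chr($d + 22 + 75 * ($d < 26 ? 1 : 0));
}

function punycode_decode_digit($ch)
{
    $cp = ord($ch);
    if ($cp >= 48 && $cp <= 57) {
        return $cp - 22;          // '0'..'9' give 26..35
    }
    if ($cp >= 65 && $cp <= 90) {
        return $cp - 65;          // 'A'..'Z' give 0..25
    }
    if ($cp >= 97 && $cp <= 122) {
        return $cp - 97;          // 'a'..'z' give 0..25
    }
    return 36;                    // assumption: base signals "not a digit"
}
```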

at line 550
_error($error = '')

Internal error handling method

Parameters

$error

at line 561
string _nameprep(array $input)

Do Nameprep according to RFC3491 and RFC3454

Parameters

array $input Unicode Characters

Return Value

string Unicode Characters, Nameprep'd

at line 645
array _hangul_decompose(integer $char)

Decomposes a Hangul syllable (see http://www.unicode.org/unicode/reports/tr15/#Hangul)

Parameters

integer $char 32bit UCS4 code point

Return Value

array Either Hangul Syllable decomposed or original 32bit value as one value array
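
The arithmetic decomposition described in the Unicode report linked above can be sketched as follows. This is an illustration rather than the class's code: the function name is hypothetical, and the constants are the standard Unicode values (SBase 0xAC00, LBase 0x1100, VBase 0x1161, TBase 0x11A7, NCount 588, TCount 28, SCount 11172), assumed to match the $_sbase through $_scount properties. intdiv() requires PHP 7 or later.

```php
<?php
// Sketch of algorithmic Hangul decomposition; hypothetical function name.
function hangul_decompose_sketch($char)
{
    $sindex = $char - 0xAC00;               // offset into the syllable block
    if ($sindex < 0 || $sindex >= 11172) {
        return array($char);                // not a Hangul syllable: unchanged
    }
    $result = array(
        0x1100 + intdiv($sindex, 588),      // leading consonant (L)
        0x1161 + intdiv($sindex % 588, 28), // vowel (V)
    );
    $t = 0x11A7 + $sindex % 28;             // trailing consonant (T), if any
    if ($t !== 0x11A7) {
        $result[] = $t;
    }
    return $result;
}
```

For example, U+D55C decomposes into U+1112, U+1161 and U+11AB, while a non-Hangul code point comes back as a one-element array.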

at line 665
array _hangul_compose(array $input)

Composes a Hangul syllable (see http://www.unicode.org/unicode/reports/tr15/#Hangul)

Parameters

array $input Decomposed UCS4 sequence

Return Value

array UCS4 sequence with syllables composed

at line 707
integer _get_combining_class(integer $char)

Returns the combining class of a certain wide char

Parameters

integer $char Wide char to check (32bit integer)

Return Value

integer Combining class if found, else 0

at line 718
array _apply_cannonical_ordering(array $input)

Applies the canonical ordering of a decomposed UCS4 sequence

Parameters

array $input Decomposed UCS4 sequence

Return Value

array Ordered UCS4 sequence
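
Canonical ordering, as defined by Unicode, sorts each run of combining marks by ascending combining class while leaving starters (class 0) in place. A minimal sketch, assuming the combining classes are supplied as a code point => class array; the function name and the $classes parameter are hypothetical, since the class itself looks classes up via _get_combining_class().

```php
<?php
// Sketch of canonical ordering; hypothetical name and lookup table.
function canonical_order_sketch(array $input, array $classes)
{
    $n = count($input);
    do {
        $swapped = false;
        for ($i = 1; $i < $n; $i++) {
            $prev = isset($classes[$input[$i - 1]]) ? $classes[$input[$i - 1]] : 0;
            $cur  = isset($classes[$input[$i]]) ? $classes[$input[$i]] : 0;
            // Swap only two adjacent marks that are out of order;
            // a starter (class 0) is never moved.
            if ($cur !== 0 && $prev > $cur) {
                $tmp = $input[$i - 1];
                $input[$i - 1] = $input[$i];
                $input[$i] = $tmp;
                $swapped = true;
            }
        }
    } while ($swapped);
    return $input;
}
```

With U+0301 (class 230) and U+0323 (class 220) following a base letter, the sketch moves U+0323 in front of U+0301.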

at line 751
array _combine(array $input)

Do composition of a sequence of starter and non-starter

Parameters

array $input UCS4 Decomposed sequence

Return Value

array Ordered UCS4 sequence

at line 788
_utf8_to_ucs4($input)

Converts a UTF-8 encoded string to its UCS-4 representation. By UCS-4 "strings" we mean arrays of 32bit integers, each representing one of the "chars". This is due to PHP not being able to handle strings with a bit depth other than 8. This applies to the reverse method _ucs4_to_utf8(), too.

The following UTF-8 encodings are supported:

  bytes  bits  representation
  1      7     0xxxxxxx
  2      11    110xxxxx 10xxxxxx
  3      16    1110xxxx 10xxxxxx 10xxxxxx
  4      21    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  5      26    111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
  6      31    1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

Each x represents a bit that can be used to store character data. The five and six byte sequences are part of Annex D of ISO/IEC 10646-1:2000.

Parameters

$input
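
The bit layout described above can be turned into a decoder along the following lines. This is a simplified sketch, not the class's implementation: the function name is hypothetical, it handles only the one to four byte forms, it does not reject overlong encodings, and returning false on malformed input is an assumption.

```php
<?php
// Simplified UTF-8 decoder sketch (1 to 4 byte sequences only).
function utf8_to_code_points($input)
{
    $out = array();
    $len = strlen($input);
    $i = 0;
    while ($i < $len) {
        $byte = ord($input[$i]);
        // The lead byte tells how many continuation bytes follow.
        if ($byte < 0x80) {
            $need = 0; $cp = $byte;
        } elseif (($byte & 0xE0) === 0xC0) {
            $need = 1; $cp = $byte & 0x1F;
        } elseif (($byte & 0xF0) === 0xE0) {
            $need = 2; $cp = $byte & 0x0F;
        } elseif (($byte & 0xF8) === 0xF0) {
            $need = 3; $cp = $byte & 0x07;
        } else {
            return false;                   // stray continuation or 5/6 byte lead
        }
        if ($i + $need >= $len) {
            return false;                   // sequence is cut off
        }
        for ($j = 1; $j <= $need; $j++) {
            $next = ord($input[$i + $j]);
            if (($next & 0xC0) !== 0x80) {
                return false;               // not a 10xxxxxx continuation byte
            }
            $cp = ($cp << 6) | ($next & 0x3F);
        }
        $out[] = $cp;
        $i += $need + 1;
    }
    return $out;
}
```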

at line 865
_ucs4_to_utf8($input)

Convert UCS-4 string into UTF-8 string. See _utf8_to_ucs4() for details.

Parameters

$input
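
The reverse direction, from an array of code points to a UTF-8 string, can be sketched as below. The function name is hypothetical, only the one to four byte forms are produced, and returning false for out-of-range values is an assumption.

```php
<?php
// Simplified UTF-8 encoder sketch (1 to 4 byte sequences only).
function code_points_to_utf8(array $input)
{
    $out = '';
    foreach ($input as $cp) {
        if ($cp < 0) {
            return false;
        } elseif ($cp < 0x80) {
            $out .= chr($cp);
        } elseif ($cp < 0x800) {
            $out .= chr(0xC0 | ($cp >> 6))
                  . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x10000) {
            $out .= chr(0xE0 | ($cp >> 12))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        } elseif ($cp < 0x200000) {
            $out .= chr(0xF0 | ($cp >> 18))
                  . chr(0x80 | (($cp >> 12) & 0x3F))
                  . chr(0x80 | (($cp >> 6) & 0x3F))
                  . chr(0x80 | ($cp & 0x3F));
        } else {
            return false;   // would need a 5 or 6 byte form, not handled here
        }
    }
    return $out;
}
```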

at line 902
_ucs4_to_ucs4_string($input)

Convert UCS-4 array into UCS-4 string

Parameters

$input

at line 918
_ucs4_string_to_ucs4($input)

Convert UCS-4 string into UCS-4 array

Parameters

$input
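
A UCS-4 "string" in the sense used on this page is a byte string with four bytes per code point. The two conversions might look like the sketch below, which uses PHP's pack() and unpack() with the 'N' format (unsigned 32 bit, big-endian). The function names are hypothetical, and both the big-endian byte order and the false return on a bad length are assumptions, since this page specifies neither.

```php
<?php
// Sketch of UCS-4 array <-> UCS-4 byte string conversion.
// Assumption: four bytes per code point, big-endian.
function ucs4_array_to_string(array $input)
{
    $out = '';
    foreach ($input as $cp) {
        $out .= pack('N', $cp);
    }
    return $out;
}

function ucs4_string_to_array($input)
{
    if (strlen($input) % 4 !== 0) {
        return false;               // length must be a multiple of 4
    }
    if ($input === '') {
        return array();
    }
    // unpack() returns a 1-based array; reindex it from 0.
    return array_values(unpack('N*', $input));
}
```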