CMSimple_XH
utf8.php File Reference

UTF-8 related string functions.

Functions

 utf8_strlen ($string)
 Returns the number of Unicode code points in a string.
 
 utf8_substr ($string, $offset, $length=null)
 Returns part of a string, given a character offset and optionally a length.
 
 utf8_strtolower ($string)
 Makes a string lowercase.
 
 utf8_strtoupper ($string)
 Makes a string uppercase.
 
 utf8_strpos ($haystack, $needle, $offset=0)
 Finds the position of the first occurrence of a string within another, case-sensitive.
 
 utf8_stripos ($haystack, $needle, $offset=0)
 Finds the position of the first occurrence of a string within another, case-insensitive.
 
 utf8_ucfirst ($string)
 Makes a string's first character uppercase.
 
 utf8_is_valid ($string)
 Tests whether a string is valid UTF-8 and supported by the Unicode standard.
 
 utf8_bad_replace ($string, $replace='?')
 Replaces bad bytes with an alternative character; an ASCII character is recommended as the replacement.
 

Detailed Description

UTF-8 related string functions.

Author
Harry Fuecks hfuecks@gmail.com
The CMSimple_XH developers devs@cmsimple-xh.org
See also
http://cmsimple-xh.org/

Function Documentation

◆ utf8_bad_replace()

utf8_bad_replace ($string, $replace = '?')

Replaces bad bytes with an alternative character; an ASCII character is recommended as the replacement.

The PCRE pattern used to locate bad bytes in a UTF-8 string comes from the W3C FAQ: Multilingual Forms.

Note: The pattern is modified to cover the full ASCII range, including control characters.

Parameters
string  $string   A string to search.
string  $replace  A string to replace bad bytes with; use ASCII.
Returns
string
See also
http://www.w3.org/International/questions/qa-forms-utf-8
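
As a rough illustration of the approach described above, the following sketch walks the string with a byte-mode PCRE pattern. Its first group matches one well-formed UTF-8 sequence (based on the W3C pattern, with the ASCII branch widened to \x00-\x7F), and its second group matches any other single byte. The name sketch_bad_replace() and the callback-based structure are assumptions made for illustration; this is not the code in utf8.php.

```php
<?php
// Hypothetical stand-in for illustration; not the utf8.php implementation.
function sketch_bad_replace($string, $replace = '?')
{
    // Group 1: one well-formed UTF-8 sequence. Group 2: any other single byte.
    $pattern = '/([\x00-\x7F]'                 // ASCII, including control chars
        . '|[\xC2-\xDF][\x80-\xBF]'            // non-overlong 2-byte
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'        // 3-byte, excluding overlongs
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}' // straight 3-byte
        . '|\xED[\x80-\x9F][\x80-\xBF]'        // 3-byte, excluding surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'     // planes 1-3
        . '|[\xF1-\xF3][\x80-\xBF]{3}'         // planes 4-15
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}'     // plane 16
        . ')|(.)/s';

    return preg_replace_callback(
        $pattern,
        function ($m) use ($replace) {
            // Group 2 is only present when a bad byte was matched.
            return isset($m[2]) ? $replace : $m[0];
        },
        $string
    );
}
```

With this sketch, a stray 0xFF byte between two ASCII letters is replaced by '?', while a well-formed two-byte sequence such as "\xC3\xA9" passes through unchanged. Passing an empty string as $replace strips bad bytes instead.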

◆ utf8_is_valid()

utf8_is_valid ($string)

Tests whether a string is valid UTF-8 and supported by the Unicode standard.

Parameters
string  $string  A UTF-8 encoded string.
Returns
boolean
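
One common way to perform such a check in plain PHP, shown here only as a sketch, is to let PCRE validate the subject: with the /u modifier, preg_match() returns false when the subject is not well-formed UTF-8. The function name sketch_is_valid() is hypothetical, and whether utf8.php works this way is not stated in this reference.

```php
<?php
// Hypothetical stand-in; relies on PCRE validating the subject under /u.
function sketch_is_valid($string)
{
    // The empty pattern matches any well-formed string (result 1);
    // a malformed subject makes preg_match() return false instead.
    return preg_match('//u', $string) === 1;
}
```

Overlong encodings and UTF-16 surrogates encoded as UTF-8 are rejected by this check as well, since PCRE treats them as malformed.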

◆ utf8_stripos()

utf8_stripos ($haystack, $needle, $offset = 0)

Finds the position of the first occurrence of a string within another, case-insensitive.

Returns false if the needle is not found.

Parameters
string  $haystack  The string to search in.
string  $needle    The string to search for.
int     $offset    An offset in Unicode code points.
Returns
int

◆ utf8_strlen()

utf8_strlen ($string)

Returns the number of Unicode code points in a string.

Note: This function does not count bad bytes in the string; they are simply ignored.

Parameters
string  $string  A UTF-8 encoded string.
Returns
int
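
To make the difference between bytes and code points concrete, here is a minimal sketch that counts code points with PCRE. The name sketch_strlen() is hypothetical. Note one deliberate difference from the behaviour documented above: this sketch assumes well-formed input and returns 0 for malformed strings rather than skipping the bad bytes.

```php
<?php
// Hypothetical stand-in; assumes well-formed UTF-8 input.
function sketch_strlen($string)
{
    // Under /u each match of "." is one code point; /s lets "." match newlines.
    $count = preg_match_all('/./su', $string);

    // preg_match_all() returns false for malformed UTF-8. Unlike the
    // documented function, this sketch gives up and reports 0 in that case.
    return $count === false ? 0 : $count;
}
```

For the six-byte string "na\xC3\xAFve", strlen() reports 6 while the sketch reports 5, because the two bytes 0xC3 0xAF encode a single code point.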

◆ utf8_strpos()

utf8_strpos ($haystack, $needle, $offset = 0)

Finds the position of the first occurrence of a string within another, case-sensitive.

Returns false if the needle is not found.

Parameters
string  $haystack  The string to search in.
string  $needle    The string to search for.
int     $offset    An offset in Unicode code points.
Returns
int
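
The sketch below illustrates what an offset and a result "in Unicode code points" mean, by translating between code point positions and byte positions around PHP's byte-oriented strpos(). The name sketch_strpos() is hypothetical and the approach is an assumption, not the utf8.php implementation. It assumes well-formed UTF-8, a non-empty needle and an offset below PCRE's repetition limit of 65535.

```php
<?php
// Hypothetical stand-in; assumes well-formed UTF-8 and a non-empty needle.
function sketch_strpos($haystack, $needle, $offset = 0)
{
    $offset = max(0, (int) $offset);

    // Match the first $offset code points to learn their length in bytes.
    if (!preg_match('/^.{' . $offset . '}/su', $haystack, $m)) {
        return false; // offset past the end of the string, or malformed input
    }

    // Byte-wise search, starting at the translated byte offset.
    $bytePos = strpos($haystack, $needle, strlen($m[0]));
    if ($bytePos === false) {
        return false;
    }

    // Convert the byte position back by counting the code points before it.
    return preg_match_all('/./su', substr($haystack, 0, $bytePos));
}
```

Because a match at the very start yields 0, callers should compare the result with === false rather than relying on a loose truthiness check.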

◆ utf8_strtolower()

utf8_strtolower ($string)

Makes a string lowercase.

Note: The concept of a character's "case" exists only in some alphabets, such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in the Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.

Parameters
string  $string  A UTF-8 encoded string.
Returns
string
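
The note above can be illustrated with a small table-driven sketch: characters that have a lowercase counterpart are mapped, and everything else passes through untouched. The function name sketch_strtolower() and the tiny mapping table are assumptions for illustration only; a real implementation needs the full Unicode case mapping data.

```php
<?php
// Hypothetical stand-in with a deliberately tiny mapping table.
function sketch_strtolower($string)
{
    // ASCII A-Z => a-z, built without locale-dependent functions.
    $map = array_combine(range('A', 'Z'), range('a', 'z'));

    // A few non-ASCII pairs, written as UTF-8 byte sequences.
    $map += [
        "\xC3\x84" => "\xC3\xA4", // U+00C4 => U+00E4 (Latin A with diaeresis)
        "\xC3\x96" => "\xC3\xB6", // U+00D6 => U+00F6 (Latin O with diaeresis)
        "\xC3\x9C" => "\xC3\xBC", // U+00DC => U+00FC (Latin U with diaeresis)
        "\xCE\xA9" => "\xCF\x89", // U+03A9 => U+03C9 (Greek omega)
        "\xD0\x96" => "\xD0\xB6", // U+0416 => U+0436 (Cyrillic zhe)
    ];

    // strtr() replaces whole byte sequences; unmapped characters are kept.
    return strtr($string, $map);
}
```

A character from a script without case, such as U+4E2D ("\xE4\xB8\xAD"), has no entry in any such table and is returned unchanged.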

◆ utf8_strtoupper()

utf8_strtoupper ($string)

Makes a string uppercase.

Note: The concept of a character's "case" exists only in some alphabets, such as Latin, Greek, Cyrillic, Armenian and archaic Georgian; it does not exist in the Chinese script, for example. See Unicode Standard Annex #21: Case Mappings.

Parameters
string  $string  A UTF-8 encoded string.
Returns
string

◆ utf8_substr()

utf8_substr ($string, $offset, $length = null)

Returns part of a string, given a character offset and optionally a length.

Parameters
string  $string  A UTF-8 encoded string.
int     $offset  The offset in Unicode code points.
int     $length  The length in Unicode code points, counted from the offset.
Returns
string
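
As a sketch of what offset and length in code points mean, the following function extracts a substring with a single PCRE match. The name sketch_substr() is hypothetical and the approach is an assumption. It only handles non-negative offsets and lengths below PCRE's repetition limit of 65535, and it assumes well-formed UTF-8.

```php
<?php
// Hypothetical stand-in; non-negative offset and length only.
function sketch_substr($string, $offset, $length = null)
{
    $offset = max(0, (int) $offset);

    // Without a length, take everything after the offset.
    $tail = $length === null ? '.*' : '.{0,' . max(0, (int) $length) . '}';

    // Skip $offset code points, then capture the requested part.
    if (preg_match('/^.{' . $offset . '}(' . $tail . ')/su', $string, $m)) {
        return $m[1];
    }

    return ''; // offset past the end of the string, or malformed input
}
```

For "h\xC3\xA9llo w\xC3\xB6rld", an offset of 1 and a length of 4 select the four code points starting at the second character, which occupy five bytes.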

◆ utf8_ucfirst()

utf8_ucfirst ($string)

Makes a string's first character uppercase.

Parameters
string  $string  A UTF-8 encoded string.
Returns
string
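
PHP's built-in ucfirst() works on the first byte only, so it cannot uppercase a multi-byte first character. The sketch below shows the general idea of splitting off the first code point and uppercasing just that part. The name sketch_ucfirst() and the $toUpper callback parameter are assumptions for illustration; the callback stands in for any UTF-8 aware uppercase function, for example utf8_strtoupper.

```php
<?php
// Hypothetical stand-in; $toUpper is any UTF-8 aware uppercase callback.
function sketch_ucfirst($string, callable $toUpper)
{
    // Split into the first code point and the rest of the string.
    if (!preg_match('/^(.)(.*)/su', $string, $m)) {
        return $string; // empty or malformed input is returned as is
    }

    return $toUpper($m[1]) . $m[2];
}
```

Only the first code point is passed to the callback, so the remainder of the string keeps its original case.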