UZBEK_LANGUAGE_CODE
UZBEK_LANGUAGE_CODE = 'uz'
## 🇷🇺 To Russian citizens
There is a war going on in Ukraine right now. Russian forces are striking civilian infrastructure in [Kharkiv][1], [Kyiv][2], [Chernihiv][3], [Sumy][4], [Irpin][5] and dozens of other cities. People are dying: civilians and soldiers alike, including Russian conscripts who were thrown into the fighting. To cut its own people off from information, the Russian government has banned calling the war a war, shut down independent media, and is now passing a series of dictatorial laws. These laws are meant to silence everyone who opposes the war. A simple call for peace can now bring several years in prison.
$ASCII_MAPS : array<string,array<string,string>>|null
$ASCII_MAPS_AND_EXTRAS : array<string,array<string,string>>|null
$ASCII_EXTRAS : array<string,array<string,string>>|null
$ORD : array<string,int>|null
$LANGUAGE_MAX_KEY : array<string,int>|null
$REGEX_ASCII : string
url: https://en.wikipedia.org/wiki/Wikipedia:ASCII#ASCII_printable_characters
$BIDI_UNI_CODE_CONTROLS_TABLE : array<int,string>
bidirectional text chars
url: https://www.w3.org/International/questions/qa-bidi-unicode-controls
charsArray(bool $replace_extra_symbols = false) : array
Returns a replacement array for ASCII methods.
EXAMPLE:
$array = ASCII::charsArray();
var_dump($array['ru']['б']); // 'b'
bool | $replace_extra_symbols | [optional] Add some more replacements e.g. "£" with " pound ". |
charsArrayWithMultiLanguageValues(bool $replace_extra_symbols = false) : array
Returns a replacement array for ASCII methods with a mix of multiple languages.
EXAMPLE:
$array = ASCII::charsArrayWithMultiLanguageValues();
var_dump($array['b']); // ['β', 'б', 'ဗ', 'ბ', 'ب']
bool | $replace_extra_symbols | [optional] Add some more replacements e.g. "£" with " pound ". |
An array of replacements.
charsArrayWithOneLanguage(string $language = self::ENGLISH_LANGUAGE_CODE, bool $replace_extra_symbols = false, bool $asOrigReplaceArray = true) : array
Returns a replacement array for ASCII methods with one language.
For example, German will map 'ä' to 'ae', while other languages will simply return e.g. 'a'.
EXAMPLE:
$array = ASCII::charsArrayWithOneLanguage('ru');
$tmpKey = \array_search('yo', $array['replace']);
echo $array['orig'][$tmpKey]; // 'ё'
string | $language | [optional] Language of the source string e.g.: en, de_at, or de-ch. (default is 'en') | ASCII::*_LANGUAGE_CODE |
bool | $replace_extra_symbols | [optional] Add some more replacements e.g. "£" with " pound ". |
bool | $asOrigReplaceArray | [optional] TRUE === return {orig: string[], replace: string[]} array |
An array of replacements.
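
A minimal sketch of how an `orig`/`replace` pair of this shape could be applied with PHP's built-in `str_replace()`, and of the language-specific difference described above. The two small maps and the `apply_replacements()` helper are hypothetical, written only for this illustration. They are not the library's data or API.

```php
<?php

// Hypothetical, hand-written maps for illustration only; the arrays
// returned by the library are far larger.
$german = [
    'orig'    => ['ä', 'ö', 'ü', 'ß'],
    'replace' => ['ae', 'oe', 'ue', 'ss'],
];
$generic = [
    'orig'    => ['ä', 'ö', 'ü', 'ß'],
    'replace' => ['a', 'o', 'u', 'ss'],
];

/**
 * Apply an {orig, replace} map to a string (hypothetical helper).
 *
 * @param array{orig: string[], replace: string[]} $map
 */
function apply_replacements(string $str, array $map): string
{
    // str_replace() swaps each 'orig' entry for the 'replace' entry
    // at the same index.
    return \str_replace($map['orig'], $map['replace'], $str);
}

echo apply_replacements('Düsseldorf', $german), "\n";  // Duesseldorf
echo apply_replacements('Düsseldorf', $generic), "\n"; // Dusseldorf
```

Keeping `orig` and `replace` as parallel lists means the pair can be passed straight to `str_replace()` without reshaping.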
charsArrayWithSingleLanguageValues(bool $replace_extra_symbols = false, bool $asOrigReplaceArray = true) : array
Returns a replacement array for ASCII methods with multiple languages.
EXAMPLE:
$array = ASCII::charsArrayWithSingleLanguageValues();
$tmpKey = \array_search('hnaik', $array['replace']);
echo $array['orig'][$tmpKey]; // '၌'
bool | $replace_extra_symbols | [optional] Add some more replacements e.g. "£" with " pound ". |
bool | $asOrigReplaceArray | [optional] TRUE === return {orig: string[], replace: string[]} array |
An array of replacements.
clean(string $str, bool $normalize_whitespace = true, bool $keep_non_breaking_space = false, bool $normalize_msword = true, bool $remove_invisible_characters = true) : string
Accepts a string, removes all non-UTF-8 characters from it, and applies the optional extra clean-up steps selected by the flags below.
string | $str | The string to be sanitized. |
bool | $normalize_whitespace | [optional] Set to true, if you need to normalize the whitespace. |
bool | $keep_non_breaking_space | [optional] Set to true, to keep non-breaking-spaces, in combination with $normalize_whitespace |
bool | $normalize_msword | [optional] Set to true, if you need to normalize MS Word chars e.g.: "…" => "..." |
bool | $remove_invisible_characters | [optional] Set to false, if you do not want to remove invisible characters e.g.: "\0" |
A clean UTF-8 string.
normalize_msword(string $str) : string
Returns a string with smart quotes, ellipsis characters, and dashes from Windows-1252 (commonly used in Word documents) replaced by their ASCII equivalents.
EXAMPLE:
ASCII::normalize_msword('„Abcdef…”'); // '"Abcdef..."'
string | $str | The string to be normalized. |
A string with normalized characters for commonly used chars in Word documents.
normalize_whitespace(string $str, bool $keepNonBreakingSpace = false, bool $keepBidiUnicodeControls = false, bool $normalize_control_characters = false) : string
Normalize the whitespace.
EXAMPLE:
ASCII::normalize_whitespace("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true); // "abc-\xc2\xa0-öäü- -"
string | $str | The string to be normalized. |
bool | $keepNonBreakingSpace | [optional] Set to true, to keep non-breaking-spaces. |
bool | $keepBidiUnicodeControls | [optional] Set to true, to keep non-printable (for the web) bidirectional text chars. |
bool | $normalize_control_characters | [optional] Set to true, to convert e.g. LINE-, PARAGRAPH-SEPARATOR with "\n" and LINE TABULATION with "\t". |
A string with normalized whitespace.
remove_invisible_characters(string $str, bool $url_encoded = false, string $replacement = '', bool $keep_basic_control_characters = true) : string
Remove invisible characters from a string.
For example, this prevents sandwiching null characters between ASCII characters, as in Java\0script.
Copied from https://github.com/bcit-ci/CodeIgniter/blob/develop/system/core/Common.php
string | $str | |
bool | $url_encoded | |
string | $replacement | |
bool | $keep_basic_control_characters |
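
The idea can be sketched with a single `preg_replace()` call. This is a simplified illustration, not the library's implementation: the function name `strip_control_chars()` is hypothetical, URL-encoded input is not handled, and it assumes that "basic control characters" means tab, line feed, and carriage return.

```php
<?php

/**
 * Simplified sketch (hypothetical name): replace runs of ASCII control
 * characters, optionally keeping \t, \n and \r.
 */
function strip_control_chars(string $str, string $replacement = '', bool $keepBasic = true): string
{
    // Assumption: "basic" control characters are \t (0x09), \n (0x0A)
    // and \r (0x0D). The first pattern skips exactly those three.
    $pattern = $keepBasic
        ? '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/'
        : '/[\x00-\x1F\x7F]+/';

    // Bytes of multibyte UTF-8 characters are all >= 0x80, so they are
    // never matched by these ASCII-only ranges.
    return (string) \preg_replace($pattern, $replacement, $str);
}

echo strip_control_chars("Java\0script"), "\n"; // Javascript
```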
to_ascii_remap(string $str1, string $str2) : string[]
WARNING: This method will return broken characters and is only for special cases.
Convert two UTF-8 encoded strings to single-byte strings suitable for functions that need the same string length after the conversion.
The function simply uses (and updates) a tailored dynamic encoding (in/out map parameter) where non-ascii characters are remapped to the range [128-255] in order of appearance.
string | $str1 | |
string | $str2 |
to_ascii(string $str, string $language = self::ENGLISH_LANGUAGE_CODE, bool $remove_unsupported_chars = true, bool $replace_extra_symbols = false, bool $use_transliterate = false, bool|null $replace_single_chars_only = null) : string
Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed by default. The language or locale of the source string can be supplied for language-specific transliteration in any of the following formats: en, en_GB, or en-GB. For example, passing "de" results in "äöü" mapping to "aeoeue" rather than "aou" as in other languages.
EXAMPLE:
ASCII::to_ascii('�Düsseldorf�', 'en'); // Dusseldorf
string | $str | The input string. |
string | $language | [optional] Language of the source string. (default is 'en') | ASCII::*_LANGUAGE_CODE |
bool | $remove_unsupported_chars | [optional] Whether or not to remove the unsupported characters. |
bool | $replace_extra_symbols | [optional] Add some more replacements e.g. "£" with " pound ". |
bool | $use_transliterate | [optional] Use ASCII::to_transliterate() for unknown chars. |
bool|null | $replace_single_chars_only | [optional] Single char replacement is better for performance, but some languages need to replace more than one char at the same time. | NULL === auto-setting, depending on the language |
A string that contains only ASCII characters.
to_filename(string $str, bool $use_transliterate = true, string $fallback_char = '-') : string
Convert given string to safe filename (and keep string case).
EXAMPLE:
ASCII::to_filename('שדגשדג.png', true); // 'shdgshdg.png'
string | $str | |
bool | $use_transliterate | ASCII::to_transliterate() is used by default - unsafe characters are simply replaced with hyphen otherwise. |
string | $fallback_char |
A string that contains only safe characters for a filename.
to_slugify(string $str, string $separator = '-', string $language = self::ENGLISH_LANGUAGE_CODE, array<string,string> $replacements = [], bool $replace_extra_symbols = false, bool $use_str_to_lower = true, bool $use_transliterate = false) : string
Converts the string into a URL slug. This includes replacing non-ASCII characters with their closest ASCII equivalents, removing remaining non-ASCII and non-alphanumeric characters, and replacing whitespace with $separator. The separator defaults to a single dash, and the string is also converted to lowercase. The language of the source string can also be supplied for language-specific transliteration.
string | $str | |
string | $separator | [optional] The string used to replace whitespace. |
string | $language | [optional] Language of the source string. (default is 'en') | ASCII::*_LANGUAGE_CODE |
array<string,string> | $replacements | [optional] A map of replaceable strings. |
bool | $replace_extra_symbols | [optional] Add some more replacements e.g. "£" with " pound ". |
bool | $use_str_to_lower | [optional] Use "string to lower" for the input. |
bool | $use_transliterate | [optional] Use ASCII::to_transliterate() for unknown chars. |
A string that has been converted to a URL slug.
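
The steps in the description (transliterate, lowercase, drop leftovers, join with the separator) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `sketch_slugify()` and its tiny replacement map are hypothetical, and only a handful of characters are covered.

```php
<?php

/**
 * Sketch of the slug steps (hypothetical name and map).
 */
function sketch_slugify(string $str, string $separator = '-'): string
{
    // 1. Replace a few non-ASCII characters with ASCII equivalents.
    //    Hypothetical mini-map; a real map would be much larger.
    $map = ['ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 'ss', 'é' => 'e'];
    $str = \strtr($str, $map);

    // 2. Lowercase the (now mostly ASCII) string.
    $str = \strtolower($str);

    // 3. Drop every byte that is not an ASCII letter, digit,
    //    whitespace, or the separator itself.
    $sep = \preg_quote($separator, '/');
    $str = (string) \preg_replace('/[^a-z0-9\s' . $sep . ']+/', '', $str);

    // 4. Collapse runs of whitespace/separators into one separator,
    //    then trim separators from both ends.
    $str = (string) \preg_replace('/[\s' . $sep . ']+/', $separator, $str);

    return \trim($str, $separator);
}

echo sketch_slugify('Füße & Café'), "\n"; // fusse-cafe
```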
to_transliterate(string $str, string|null $unknown = '?', bool $strict = false) : string
Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed unless instructed otherwise.
EXAMPLE:
ASCII::to_transliterate('déjà σσς iıii'); // 'deja sss iiii'
string | $str | The input string. |
string|null | $unknown | [optional] Character to use if a character is unknown. (default is '?') Use NULL to keep the unknown chars. |
bool | $strict | [optional] Use "transliterator_transliterate()" from PHP-Intl |
A string that contains only ASCII characters.
to_ascii_remap_intern(string $str, array $map) : string
WARNING: This method will return broken characters and is only for special cases.
Convert a UTF-8 encoded string to a single-byte string suitable for functions that need the same string length after the conversion.
The function simply uses (and updates) a tailored dynamic encoding (in/out map parameter) where non-ascii characters are remapped to the range [128-255] in order of appearance.
Thus, it supports up to 128 different multibyte code points max over the whole set of strings sharing this encoding.
Source: https://github.com/KEINOS/mb_levenshtein
string | $str | UTF-8 string to be converted to extended ASCII. |
array | $map | Internal-Map of code points to ASCII characters. |
Mapped broken string.
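
A minimal sketch of the remapping idea, assuming the map is passed by reference so that several strings can share one encoding. The name `sketch_remap()` is hypothetical and this is not the library's implementation. Each distinct non-ASCII character is assigned the next free byte in 128..255, which makes the result usable with byte-oriented functions such as PHP's `levenshtein()`.

```php
<?php

/**
 * Sketch (hypothetical name): remap each distinct non-ASCII character
 * to one byte in 128..255, in order of first appearance.
 *
 * @param array<string,string> $map UTF-8 character => single byte,
 *                                  updated in place
 */
function sketch_remap(string $str, array &$map): string
{
    // Split into UTF-8 characters; returns false on invalid UTF-8.
    $chars = \preg_split('//u', $str, -1, \PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new \InvalidArgumentException('Input is not valid UTF-8.');
    }

    $out = '';
    foreach ($chars as $char) {
        if (\strlen($char) === 1) {
            $out .= $char; // plain ASCII stays as-is
            continue;
        }
        if (!isset($map[$char])) {
            // Only 128 byte values (128..255) are available.
            if (\count($map) >= 128) {
                throw new \OverflowException('More than 128 distinct non-ASCII characters.');
            }
            $map[$char] = \chr(128 + \count($map));
        }
        $out .= $map[$char];
    }

    return $out;
}

$map = [];
$a = sketch_remap('Köln', $map);
$b = sketch_remap('Koln', $map);
echo \strlen($a), "\n";          // 4
echo \levenshtein($a, $b), "\n"; // 1
```

The output bytes are not valid UTF-8 on their own, which is why the result is described as a broken string: it is only meaningful together with the shared map.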