PHP: Prepare (sanitize, transliterate, convert, change) user string (input) for filename or url address

The transliterator_transliterate() function creates a clean, transliterated string from user input. It can be used to build URL addresses (slugs) or to sanitize the name of an uploaded file.

Install php-intl package

First of all, install the PHP Intl extension. Otherwise, calling transliterator_transliterate() fails with an error like this:

Call to undefined function.

Commands for Ubuntu 16.04 (Xenial Xerus) with PHP 7.0:

sudo apt-get install php-intl
sudo service php7.0-fpm restart

Commands for Ubuntu 20.04 LTS (Focal Fossa) with PHP 7.4:

sudo apt install php-intl
sudo service php7.4-fpm restart
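
After the restart it can be worth confirming that the extension is really loaded. The snippet below is a minimal sketch of such a check; the variable name $hasIntl is only illustrative. On Ubuntu the command line and PHP-FPM can use separate configuration, so running it through the web server as well is the safer test.

```php
<?php
// Report whether the Intl extension and the transliteration function are available.
$hasIntl = extension_loaded('intl') && function_exists('transliterator_transliterate');

echo $hasIntl ? "intl is loaded\n" : "intl is missing\n";
```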

Basic code sample

This line is the core of the function:

$string = transliterator_transliterate('Any-Latin;Latin-ASCII;', $string);
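
For URL slugs, the same call can be combined with a little cleanup. The following is a minimal sketch, not code from this article: the helper name slugify() is hypothetical, and it assumes that appending the ICU transform Lower() to the rule chain is acceptable for your input.

```php
<?php
// Hypothetical helper (an assumption for illustration): build a URL slug.
function slugify(string $string): string {
  // transliterate to ASCII and lowercase in one pass
  $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $string);
  if ($ascii === false) {
    // transliteration failed, nothing usable to return
    return '';
  }
  // turn every run of non-alphanumeric characters into a single hyphen
  $slug = preg_replace('/[^a-z0-9]+/', '-', $ascii);
  // drop hyphens left at the beginning or end
  return trim($slug, '-');
}

echo slugify('Álix Ãxel!?') . PHP_EOL; // alix-axel
```

Hyphens are used here instead of underscores because they are the more common word separator in URLs; swap the replacement character if you prefer otherwise.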

Sanitize (transliterate) uploaded filename

The fileName() function below transliterates non-ASCII characters into ASCII (毛泽东 -> mao ze dong).

This is the complete test.php. It defines fileName(), which transliterates the name of an uploaded file into Latin characters, and runs it on several sample strings:

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Test</title>
</head>
<body>
<?php

function pr($string) {
  print '<hr>';
  print '"' . fileName($string) . '"';
  print '<br>';
  print '"' . $string . '"';
}

function fileName($string) {
  // remove html tags
  $clean = strip_tags($string);
  // transliterate
  $clean = transliterator_transliterate('Any-Latin;Latin-ASCII;', $clean);
  // replace whitespace with '_' and remove everything except letters, numbers, '-' and '_'
  $clean = preg_replace('/\s/', '_', $clean);
  $clean = preg_replace('/[^a-z0-9_-]/i', '', $clean);
  // replace '-' with '_' and collapse repeated '_'
  $clean = preg_replace('/[-_]+/', '_', $clean);
  // remove '_' from the end and beginning of the string
  $clean = trim($clean, '_');
  // lowercase the string
  return strtolower($clean);
}
pr('_replace(\'~&([a-z]{1,2})(ac134/56f4315981743 8765475[]lt7ňl2ú5äňú138yé73ťž7ýľute|');
pr(htmlspecialchars('<script>alert(\'hacked\')</script>'));
pr('Álix----_Ãxel!?!?');
pr('áéíóúÁÉÍÓÚ');
pr('üÿÄËÏÖÜ.ŸåÅ');
pr('nie4č a a§ôňäääaš');
pr('Мао Цзэдун');
pr('毛泽东');
pr('ماو تسي تونغ');
pr('مائو تسه‌تونگ');
pr('מאו דזה-דונג');
pr('მაო ძედუნი');
pr('Mao Trạch Đông');
pr('毛澤東');
pr('เหมา เจ๋อตง');
?>
</body>
</html>
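
Note that fileName() removes dots, so a name like photo.jpg comes out as photojpg. If the extension has to survive, the name can be split at the last dot before cleaning. The following is only a sketch under assumptions: the helper name safeUploadName() and the fallback name "file" are hypothetical, and it does not validate the file type in any way.

```php
<?php
// Hypothetical helper (an assumption, not the article's fileName()):
// sanitize an uploaded file name but keep its extension.
function safeUploadName(string $original): string {
  // transliterate to lowercase ASCII first
  $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $original);
  if ($ascii === false) {
    $ascii = '';
  }
  // split the name at the last dot, if there is one
  $dot = strrpos($ascii, '.');
  $name = $dot === false ? $ascii : substr($ascii, 0, $dot);
  $ext = $dot === false ? '' : substr($ascii, $dot + 1);
  // keep letters and digits only; runs of anything else become one '_'
  $name = trim(preg_replace('/[^a-z0-9]+/', '_', $name), '_');
  $ext = preg_replace('/[^a-z0-9]+/', '', $ext);
  // fall back to a fixed name when nothing survives the cleaning
  if ($name === '') {
    $name = 'file';
  }
  return $ext === '' ? $name : $name . '.' . $ext;
}

echo safeUploadName('Álix Ãxel.JPG') . PHP_EOL; // alix_axel.jpg
```

Because slashes and dots outside the final extension are replaced or removed, the result cannot contain a directory part.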

PHP8 update: Custom transliterate rules

Sometimes you need to write custom transliteration rules for a specific language. Our example is Russian, where the letter “ш”, which reads as “sh” in English, is transliterated by transliterator_transliterate() as “s”.

The solution is the following code:

$str = 'Финиш';

$rules = <<<'RULES'
:: NFC ;
ё > e; ж > zh; й > i; х > kh; ц > ts; ч > ch; ш > sh; щ > shch; ъ > ie;
э > e; ю > iu; я > ia;
:: Cyrillic-Latin ;
RULES;

$tls = Transliterator::createFromRules($rules);

echo $tls->transliterate($str) . PHP_EOL;
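
Transliterator::createFromRules() returns null when the rules cannot be parsed, so it is safer to check the result before calling transliterate(). The sketch below uses a shortened rule set (only “ш” and “щ”) purely for illustration and prints the default result next to the custom one.

```php
<?php
$str = 'Финиш';

// default chain: "ш" becomes "š" and then "s"
echo transliterator_transliterate('Any-Latin; Latin-ASCII', $str) . PHP_EOL; // Finis

// shortened rule set, for illustration only
$rules = <<<'RULES'
:: NFC ;
ш > sh; щ > shch;
:: Cyrillic-Latin ;
RULES;

$tls = Transliterator::createFromRules($rules);
if ($tls === null) {
  // the rules could not be parsed; report the Intl error message
  throw new RuntimeException('Invalid rules: ' . intl_get_error_message());
}

echo $tls->transliterate($str) . PHP_EOL; // Finish
```

The rules in this section list lowercase letters only, so uppercase letters such as “Ш” still follow the built-in Cyrillic-Latin mapping unless you add rules for them too.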

More about the Transliterator class can be found on PHP.net. More transliteration solutions are here.
