Allegro.cc - Online Community

[PHP] Using str_replace to replace UTF-8 characters?
Vanneto
Member #8,643
May 2007

Hey everyone,

I have a small problem with the following script, which replaces characters like ü and ö (German characters) with ue, oe, etc. Here is the script:

```php
<?php

error_reporting(E_ALL);

$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = '';

mysql_connect($dbhost, $dbuser, $dbpass) or die(mysql_error());
mysql_select_db('stavki');

/*mysql_set_charset('utf-8');*/
mysql_query("SET CHARSET utf-8");

if(isset($_POST['dodaj']))
{
    $input2 = $_POST['input'];

    $vn = array('ö', 'ü', 'ä', 'ß', 'Ö', 'Ü', 'Ä', 'Š', 'È', 'Ž', 'š', 'è', 'ž');
    $not = array('oe', 'ue', 'ae', 'ss', 'OE', 'UE', 'AE', 'S', 'C', 'Z', 's', 'c', 'z');

    $input = str_replace($vn, $not, $input2);

    $besede = explode(" ", $input);

    $stavki = mysql_query("SELECT * FROM stavki") or die(mysql_error());

    echo "IZPIŠEMO VSE MOŽNE BESEDE, KI JIH JE VPISAL UPORABNIK";
    echo "<pre>";
    print_r($besede);
    echo "</pre>";
    echo "<hr />";

    while($kos = mysql_fetch_array($stavki))
    {

        $vprasanje = $kos['vpr'];
        $odgovor = $kos['odg'];

        $vpr2 = str_replace($vn, $not, $vprasanje);

        echo "Debug:<BR />Odgovor: ".$odgovor."<br />";
        echo "Vprašanje: ".$vpr2."<br />";
        $a = 0;
        foreach($besede as $beseda)
        {
            if(stristr($vpr2, $beseda))
            {
                $a = $a + 1;
                echo "<font color=\"green\"><b>".$beseda."</b></font> ";
            }
            else
            {
                echo "<font color=\"red\">".$beseda."</font> ";
            }
        }
        echo "<hr />";

        $konec[] = array("st_ujemanj" => $a, "vpr" => $vprasanje, "odg" => $odgovor);
    }
    array_multisort($konec, SORT_DESC);
    echo "<pre>";
    print_r($konec);
    echo "</pre>";
    echo "<hr />";
    echo "<b>".$_POST['input']."</b> ";
    echo $konec[0]['odg'];

}
?>

<form action="" method="post">
    Vprašanje: <input type="text" name="input" /><br />
    <input type="submit" name="dodaj" />
</form>
```

Note: not my code

I guess what it comes down to is this:

```php
<?php

if(isset($_POST['dodaj']))
{
    $input2 = $_POST['input'];

    $vn = array('ö', 'ü', 'ä', 'ß', 'Ö', 'Ü', 'Ä', 'Š', 'È', 'Ž', 'š', 'è', 'ž');
    $not = array('oe', 'ue', 'ae', 'ss', 'OE', 'UE', 'AE', 'S', 'C', 'Z', 's', 'c', 'z');

    $input = str_replace($vn, $not, $input2);

    echo $input;


}
?>

<form action="" method="post">
    Vprašanje: <input type="text" name="input" /><br />
    <input type="submit" name="dodaj" />
</form>
```

When I input ü, it outputs ü unchanged instead of ue. Does anyone know how to solve this? Maybe using the multibyte (mbstring) extension? But AFAIK, it doesn't have an mb_str_replace(). I would really appreciate any help or suggestions.

Thank you.

In capitalist America bank robs you.

CGamesPlay
Member #2,559
July 2002

Adding header("Content-type: text/html; charset=utf-8"); to the top of the file causes the script to work on my server. I don't know why though :)

[append]

The behavior is that the browser sends the form data in the same character set as the page that contained the form, and this is supported by this page, but I haven't found anything on W3C.
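
For illustration, a minimal sketch of the header fix applied to the reduced script. It assumes the source file itself is saved as UTF-8. The function name `transliterate` is made up for this example, the replacement list is shortened to the German characters, and the `htmlspecialchars()` call is an addition that is not in the original script.

```php
<?php
// Declare the page encoding before any output is sent, so the browser
// submits the form data in UTF-8 as well. The check is skipped on the
// command line so the sketch can also be run there.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: byte-wise replacement, which works as long as
// this file and the incoming text are both UTF-8.
function transliterate(string $text): string
{
    $vn  = array('ö', 'ü', 'ä', 'ß', 'Ö', 'Ü', 'Ä');
    $not = array('oe', 'ue', 'ae', 'ss', 'OE', 'UE', 'AE');
    return str_replace($vn, $not, $text);
}

if (isset($_POST['dodaj'])) {
    // Escape before echoing user input back into the page.
    echo htmlspecialchars(transliterate($_POST['input']), ENT_QUOTES, 'UTF-8');
}
```

The form markup from the original script can stay as it is; adding `accept-charset="utf-8"` to the `<form>` tag would be an optional extra precaution.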

--
Tomasu: Every time you read this: hugging!

Ryan Patterson - <http://cgamesplay.com/>

Matthew Leverton
Supreme Loser
January 1999

First, make sure you are explicitly setting the character encoding type. If you set it to UTF-8, then you only have to handle one case.

Second, make sure your file is saved in UTF-8.

Then it will work.

Reason:

The character "ö" might be stored as 0xF6 (ISO-8859-1, often loosely called extended ASCII), 0xC3 0xB6 (UTF-8), 0x00F6 (UTF-16) or 0x000000F6 (UTF-32) in your file.

In addition to those types, the browser may submit it as &#246; or &#xf6; or &ouml;. (Although for characters < 256, the default is probably always a single byte, as in ISO-8859-1.)

A naive search and replace will only work if both encoding types are the same.
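
A small sketch of that mismatch, written with byte escapes so the result does not depend on how the example file itself is saved. The variable names are made up for illustration.

```php
<?php
// The same character "ö" as raw bytes in two encodings.
$latin1 = "\xF6";      // ISO-8859-1: one byte
$utf8   = "\xC3\xB6";  // UTF-8: two bytes

// str_replace() compares bytes, so a UTF-8 needle only matches a
// haystack that is also UTF-8.
$matched  = str_replace($utf8, 'oe', "sch{$utf8}n");   // needle found
$mismatch = str_replace($utf8, 'oe', "sch{$latin1}n"); // needle not found

echo bin2hex($utf8), "\n";     // c3b6
echo $matched, "\n";           // schoen
echo bin2hex($mismatch), "\n"; // 736368f66e (input returned unchanged)
```

This is also why no mb_str_replace() is needed here: once the needle and the input share one encoding, the plain byte-wise str_replace() does the job.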

ImLeftFooted
Member #3,935
October 2003

Maybe if you stopped killing kittens your code would work :-X

Tobias Dammers
Member #2,604
August 2002

My advice: use UTF-8 for everything. This includes:
- The source file itself
- PHP's multibyte support (mb_internal_encoding() etc.)
- The database collation (if possible)
- Content-type header
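
The list above could be sketched roughly as follows, assuming the mbstring and mysqli extensions are available. `connect_utf8` is a hypothetical helper name, and the credentials are whatever the caller passes in. Note that it sets the connection character set, which is a separate setting from the table collation, and that MySQL spells the name `utf8` or `utf8mb4`, without a hyphen.

```php
<?php
// 1. Source file: save this file as UTF-8 (an editor setting, not code).

// 2. PHP's multibyte functions: use UTF-8 as the internal encoding.
if (function_exists('mb_internal_encoding')) {
    mb_internal_encoding('UTF-8');
}

// 3. Content-type header: must be sent before any output.
//    Skipped on the command line, where there are no HTTP headers.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// 4. Database connection (hypothetical helper, not called here).
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    // Make the client/server connection exchange text as UTF-8.
    $link->set_charset('utf8mb4');
    return $link;
}
```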

---
Me make music: Triofobie
---
"We need Tobias and his awesome trombone, too." - Johan Halmén
