migrating from latin1 to utf8 in mysql : part 1

Some time ago we had to migrate a latin1 mysql 5.0.37 database to it’s utf8 counterpart, due to the addition of new languages to the application (russian, polish, …)

In the latin1 database we had 3 cases :
1. legal latin1 characters
2. html entities generated by the browser
3. double encoded characters which infact are utf8 characters :
eg ÃŸ => if you put this in hex you’ll see that this is c3 9f => this
converts to the german ß in utf8

the problem was on how to locate those occurences :
* the first case was not a problem since mysqldump would convert them properly
* the second case was not difficult we used the the function of regexp to locate
these eg. name REGEXP ‘&#[0-9]+;’
* the third case was much more difficult to find i’ll dedicate a 2nd post to this
later

On June 7, 2010 By phil

Leave a Reply Cancel reply

Related Post

migrating from latin1 to utf8 in mysql : part 2

connecting to mysql from warehouse builder