why is mysql default character set latin1?


why is mysql default character set latin1?

Si Chen-2
Hi -

Just curious... why does entityengine.xml set the default character set
and collation for MySQL to latin1 instead of utf8?


Si
[hidden email]




RE: why is mysql default character set latin1?

Peter Dirickson
Hi Si,

Is it maybe because MySQL's own default character set is latin1?

Peter.


-----Original Message-----
From: Si Chen [mailto:[hidden email]]
Sent: Thursday, October 05, 2006 11:09 AM
To: [hidden email]
Subject: why is mysql default character set latin1?

Hi -

Just curious...why does entityengine.xml set default character set  
and collation for MySQL to latin1 instead of utf8?


Si
[hidden email]





Re: why is mysql default character set latin1?

David E Jones-2
In reply to this post by Si Chen-2

With MySQL 4.1.x, using UTF-8 really messed up the column sizes because
it reserves 3 bytes for each UTF-8 character (why not 2, I don't
know...). In other words, if you had a varchar of length 60 and put
in a 21-character UTF-8 string, it would overflow (21 x 3 = 63 bytes)...
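
The arithmetic behind that overflow can be sketched in a few lines of Python. This is only an illustration of the byte budget described above, assuming the column limit is counted in bytes and a fixed 3 bytes are reserved per character; `fits_in_byte_budget` is a hypothetical helper, not anything from MySQL or OFBiz.

```python
# Hypothetical helper: assumes the column limit is counted in bytes and
# that a fixed number of bytes is reserved for every character.
def fits_in_byte_budget(num_chars: int, column_bytes: int,
                        bytes_per_char: int = 3) -> bool:
    """Return True if num_chars characters fit in column_bytes bytes."""
    return num_chars * bytes_per_char <= column_bytes


print(fits_in_byte_budget(20, 60))  # 20 * 3 = 60 bytes, fits
print(fits_in_byte_budget(21, 60))  # 21 * 3 = 63 bytes, does not fit
```

Under these assumptions a 60-byte column holds at most 20 characters, so the 21st is the one that overflows.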

I don't know if this is still an issue with the 5 series of MySQL.

-David





Re: why is mysql default character set latin1?

Kurt T Stam-3
UTF-8 can consume up to 4 bytes per character (the original design
allowed up to 6, and MySQL's utf8 stores at most 3). UCS2 is strictly
2 bytes. However most people prefer the 'backwards compatible' utf-8
where ASCII range characters still only consume 1 byte, so it should NOT
overflow using ASCII, but it might using Asian characters. BTW, most
Asian characters take 3 bytes each, so a rule of thumb is to multiply
your string lengths by 3 when doing i18n.
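
A quick way to see these per-character sizes is to encode a few sample characters in Python. This is just a sketch: the sample characters are arbitrary, and UTF-16 is used as a stand-in for UCS2, which is an assumption that only holds for characters that fit in 2 bytes.

```python
# Compare the encoded size of single characters in UTF-8 and UTF-16.
# UTF-16 stands in for UCS2 here; the two agree for 2-byte characters.
samples = {"ASCII": "a", "accented Latin": "é", "CJK": "日"}


def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (utf-8 byte count, utf-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for label, ch in samples.items():
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"{label} {ch!r}: utf-8={utf8_len} bytes, utf-16={utf16_len} bytes")
```

The ASCII sample stays at 1 byte in UTF-8, while the CJK sample needs 3, which is where the factor of 3 comes from.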

Any db will have this 'problem'..

--Kurt

 


Re: why is mysql default character set latin1?

David E Jones-2

On Oct 9, 2006, at 4:52 PM, Kurt T Stam wrote:

> Any db will have this 'problem'..

To some extent this is true, but it seems that many other databases  
"hide" this internally by treating field sizes as the total number of  
characters instead of the total number of bytes. In other words, if  
you are using a multi-byte character set like UTF-8 and it wants to  
reserve 3 bytes per character and you say your column should be 255  
characters, then internally it will make that 765 bytes to cover  
those 255 characters you wanted in your column size.
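
The difference between the two ways of counting can be sketched in Python. The helper names below are hypothetical and do not correspond to any database API; the sample value assumes a character that takes 3 bytes in UTF-8.

```python
# Hypothetical helpers contrasting a limit counted in characters with
# a limit counted in bytes (UTF-8 encoded).
def fits_by_chars(value: str, max_chars: int) -> bool:
    return len(value) <= max_chars


def fits_by_bytes(value: str, max_bytes: int) -> bool:
    return len(value.encode("utf-8")) <= max_bytes


value = "日" * 255  # 255 characters, 3 bytes each in UTF-8

print(fits_by_chars(value, 255))      # limit counted in characters
print(fits_by_bytes(value, 255))      # same number read as bytes
print(fits_by_bytes(value, 255 * 3))  # 765 bytes reserved internally
```

A database that counts characters behaves like the first and third checks combined: the declared size is 255, but 765 bytes are set aside behind the scenes.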

In the 4 series, MySQL didn't do this, hence the latin1 character set
default. I don't know if this has changed in the 5 series, but it
sure would be nice!

-David