
why is mysql default character set latin1?


Si Chen-2

why is mysql default character set latin1?

Hi -

Just curious...why does entityengine.xml set default character set
and collation for MySQL to latin1 instead of utf8?

Si
[hidden email]

Peter Dirickson

RE: why is mysql default character set latin1?

Hi Si,

Is it maybe because MySQL's own default character set is latin1?

Peter.


David E Jones-2

Re: why is mysql default character set latin1?

In reply to this post by Si Chen-2

With MySQL 4.1.x, using UTF-8 really messed up the column sizes because
it stores each UTF-8 character as 3 bytes (why not 2, I don't
know...). In other words, if you had a varchar of length 60 and put
in a 21-character UTF-8 string, it would overflow (21 x 3 = 63 bytes)...
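
The arithmetic behind that overflow can be sketched in a few lines of Python. This only models the behaviour as described in this post (the column length treated as a byte budget, with a fixed 3 bytes charged per character); it is not a verified statement about MySQL 4.1, and the function names are made up for illustration.

```python
# Assumption (from the post above, not verified against MySQL): the declared
# column length is a byte budget and each character is charged 3 bytes.
BYTES_PER_CHAR = 3


def max_chars(column_bytes: int, bytes_per_char: int = BYTES_PER_CHAR) -> int:
    """Whole characters that fit in a byte-limited column."""
    return column_bytes // bytes_per_char


def overflows(num_chars: int, column_bytes: int,
              bytes_per_char: int = BYTES_PER_CHAR) -> bool:
    """True if num_chars characters need more bytes than the column has."""
    return num_chars * bytes_per_char > column_bytes


print(max_chars(60))      # 20 characters fit in 60 bytes
print(overflows(21, 60))  # True: 21 * 3 = 63 bytes
print(overflows(20, 60))  # False: 20 * 3 = 60 bytes
```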

I don't know if this is still an issue with the 5 series of MySQL.

-David


Kurt T Stam-3

Re: why is mysql default character set latin1?

UTF-8 can consume up to 6 bytes per character. UCS-2 is strictly 2 bytes.
However, most people prefer the 'backwards compatible' UTF-8, where ASCII
range characters still only consume 1 byte, so it should NOT overflow
using ASCII, but it might using Asian characters. BTW, on average it
takes 3 bytes per character for Asian characters, so a rule of thumb is
to increase your string lengths by a factor of 3 when doing i18n.
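
A quick way to check these per-character sizes is to encode a few sample characters and count the bytes. The sketch below uses Python's built-in UTF-8 encoder; the sample characters are arbitrary picks. Note that the encoder follows the current UTF-8 definition (RFC 3629), which caps a character at 4 bytes; the 6-byte figure comes from the older definition.

```python
def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Arbitrary sample characters, chosen only for illustration.
samples = ["A", "é", "中", "😀"]

for ch in samples:
    print(repr(ch), utf8_len(ch))
# 'A' 1   (ASCII stays at 1 byte)
# 'é' 2
# '中' 3  (typical for CJK characters)
# '😀' 4  (outside the Basic Multilingual Plane)
```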

Any db will have this 'problem'..

--Kurt


David E Jones-2

Re: why is mysql default character set latin1?

On Oct 9, 2006, at 4:52 PM, Kurt T Stam wrote:

> UTF-8 can consume up to 6 bytes per character. UCS2 is strictly 2 bytes.
> However most people prefer the 'backwards compatible' utf-8 where ASCII
> range characters still only consume 1 byte, so it should NOT overflow
> using ASCII, but it might using Asian characters. BTW, on average it
> takes 3 bytes per character for Asian characters, so a rule of thumb is
> to increase your string lengths by 3 when doing i18n.
>
> Any db will have this 'problem'..

To some extent this is true, but it seems that many other databases
"hide" this internally by treating field sizes as the total number of
characters instead of the total number of bytes. In other words, if
you are using a multi-byte character set like UTF-8 and it wants to
reserve 3 bytes per character and you say your column should be 255
characters, then internally it will make that 765 bytes to cover
those 255 characters you wanted in your column size.
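
As a rough illustration of the difference between counting characters and counting bytes, here is a small Python sketch. The function names and the 3-bytes-per-character reservation are assumptions taken from the description above, not any particular database's API.

```python
def reserved_bytes(declared_chars: int, max_bytes_per_char: int = 3) -> int:
    """Bytes a database would set aside for a character-sized column."""
    return declared_chars * max_bytes_per_char


def fits_by_chars(text: str, declared_chars: int) -> bool:
    """Limit interpreted as a number of characters."""
    return len(text) <= declared_chars


def fits_by_bytes(text: str, byte_limit: int) -> bool:
    """Limit interpreted as a number of UTF-8 bytes."""
    return len(text.encode("utf-8")) <= byte_limit


text = "中" * 255  # 255 characters, 3 bytes each in UTF-8

print(reserved_bytes(255))                       # 765
print(fits_by_chars(text, 255))                  # True
print(fits_by_bytes(text, 255))                  # False: needs 765 bytes
print(fits_by_bytes(text, reserved_bytes(255)))  # True
```

The same 255-character string passes or fails depending only on how the limit is interpreted, which is the behaviour difference described here.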

In the 4 series, MySQL didn't do this, hence the latin1 character set
default. I don't know if this has changed in the 5 series, but it
sure would be nice!

-David