[jira] Created: (OFBIZ-281) The URIEncoding parameter of the Tomcat connector does not seem to be taken into account


Nicolas Malin (Jira)
The URIEncoding parameter of the Tomcat connector does not seem to be taken into account
----------------------------------------------------------------------------------------

                 Key: OFBIZ-281
                 URL: http://issues.apache.org/jira/browse/OFBIZ-281
             Project: OFBiz (The Open for Business Project)
          Issue Type: Bug
          Components: framework
    Affects Versions: SVN trunk
         Environment: Linux 2.6.x, firefox 1.5
            Reporter: Marco Risaliti


When I create an entity value whose primary key contains non-ASCII characters (encoded as UTF-8), I am unable to access it through the webtools entity data maintenance screens or through its management interface in the backend.

For example, create a new Security Group from the partymgr application with these parameters:
id = Securité
description = Test

Then select the newly created Security Group from the Security Group List, or enter this URL:
https://127.0.0.1:8443/partymgr/control/EditSecurityGroup?groupId=securit%C3%A9
Instead of the stored record, you get an Edit Security Group form with these parameters:
id = securité -[CommonCannotBeFound: [securité]]-
description =

The symptoms are similar when you try to access this entity via webtools:
https://127.0.0.1:8443/webtools/control/ViewGeneric?entityName=SecurityGroup&groupId=securit%C3%A9 
-> Specified SecurityGroup was not found.

The problem is not specific to the SecurityGroup entity; it can be reproduced for all entities.


After some investigation, it appears that request.getParameter(pkField) does not
correctly decode the UTF-8 sequence "%C3%A9", even though the URIEncoding of the
HTTP(S) connector is set to UTF-8.
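
To make the symptom concrete, here is a minimal standalone sketch (not OFBiz or Tomcat code) that percent-decodes the value from the URLs above with two different charsets, using the JDK's java.net.URLDecoder. The class name UriDecodingDemo is made up for this illustration. The ISO-8859-1 result is the same garbled string that shows up in the Edit Security Group form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class UriDecodingDemo {

    // Percent-decodes a query-string value, interpreting the bytes in the given charset.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "securit%C3%A9"; // value taken from the URLs in the report

        // Bytes C3 A9 read as UTF-8 form a single character, U+00E9.
        System.out.println(decode(raw, "UTF-8"));
        // The same bytes read as ISO-8859-1 form two characters, U+00C3 U+00A9.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```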


The patch 'URIEncoding-problem.patch' demonstrates that the URIEncoding
specified in base/config/ofbiz-containers.xml (UTF-8) is not applied at the
connector level. After applying the patch, recompiling OFBiz and restarting it,
the following lines appear at the end of OFBiz startup:

32017 (main) [ CatalinaContainer.java:238:INFO ] Connector AJP/1.3 @ 8009 - not-secure URIEncoding=null [org.apache.jk.server.JkCoyoteHandler] started.
32018 (main) [ CatalinaContainer.java:235:INFO ] Connector HTTP/1.1 @ 8080 - not-secure URIEncoding=null [org.apache.coyote.http11.Http11Protocol] started.
32018 (main) [ CatalinaContainer.java:235:INFO ] Connector TLS @ 8443 - secure URIEncoding=null [org.apache.coyote.http11.Http11Protocol] started.
32022 (main) [ CatalinaContainer.java:242:INFO ] Started Apache Tomcat/5.5.9


I've written a small workaround that uses the Connector's setURIEncoding method
instead of setProperty. After applying the patch 'URIEncoding-quickfix.patch',
recompiling OFBiz and restarting it, you should see the following lines at the
end of OFBiz startup:
20551 (main) [ CatalinaContainer.java:238:INFO ] Connector AJP/1.3 @ 8009 - not-secure URIEncoding=UTF-8 [org.apache.jk.server.JkCoyoteHandler] started.
20552 (main) [ CatalinaContainer.java:235:INFO ] Connector HTTP/1.1 @ 8080 - not-secure URIEncoding=UTF-8 [org.apache.coyote.http11.Http11Protocol] started.
20552 (main) [ CatalinaContainer.java:235:INFO ] Connector TLS @ 8443 - secure URIEncoding=UTF-8 [org.apache.coyote.http11.Http11Protocol] started.
20602 (main) [ CatalinaContainer.java:242:INFO ] Started Apache Tomcat/5.5.9

With this patch the problem disappears, but I don't think it is the right solution.


I think the behavior of the setProperty(String, String) method of the Tomcat Connector class changed in the 5.5.x series. Looking at its source code
(http://svn.apache.org/repos/asf/tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Connector.java)
it seems this method only sets parameters of the protocol handler, whereas URIEncoding is a parameter of the connector itself and not of the protocol handler.


This problem does not appear in the non-embedded version of Tomcat because it uses
commons-digester to map the XML elements and attributes of the configuration file
to setters of the Connector object.
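
The difference between the two code paths can be sketched with a toy model. Everything below is hypothetical and written only for illustration: ToyConnector is NOT Tomcat's Connector class, and applyAttribute only imitates the attribute-to-setter mapping that a Digester-style loader performs. It models the behavior described above, where setProperty forwards the value to the protocol handler and leaves the connector's own field untouched.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

final class SetterMappingDemo {

    /** Hypothetical stand-in for a connector; NOT Tomcat's Connector class. */
    public static final class ToyConnector {
        private String uriEncoding; // connector-level setting
        private final Map<String, String> handlerProps = new HashMap<>();

        // Models the reported behavior: the value only reaches the protocol handler.
        public void setProperty(String name, String value) {
            handlerProps.put(name, value);
        }

        public void setURIEncoding(String value) {
            this.uriEncoding = value;
        }

        public String getURIEncoding() {
            return uriEncoding;
        }
    }

    // Digester-style mapping: attribute "URIEncoding" -> setURIEncoding(String).
    static void applyAttribute(Object target, String name, String value) {
        String setterName = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            Method setter = target.getClass().getMethod(setterName, String.class);
            setter.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable setter " + setterName, e);
        }
    }

    public static void main(String[] args) {
        ToyConnector embedded = new ToyConnector();
        embedded.setProperty("URIEncoding", "UTF-8");       // embedded-style configuration
        System.out.println("URIEncoding=" + embedded.getURIEncoding());

        ToyConnector standalone = new ToyConnector();
        applyAttribute(standalone, "URIEncoding", "UTF-8"); // setter-mapping configuration
        System.out.println("URIEncoding=" + standalone.getURIEncoding());
    }
}
```

In this model the first connector reports URIEncoding=null and the second URIEncoding=UTF-8, which mirrors the two sets of log lines shown earlier.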


Can someone confirm this problem?
 
 

Comment by David E. Jones [24/Apr/06 12:16 AM]
This is actually a bigger issue than you might think. Even if the Tomcat setting is applied properly, there are still problems, so I HIGHLY recommend against trying to use UTF-8 characters in an HTTP URL. Below is some research I did on this a while back:

========================================================================
I tried various changes to the character encoding set on the request. With no character encoding set, neither URI parameters nor form input (POST) are decoded properly. When UTF-8 character encoding is used, the form input (POST) parameters are decoded properly, but the URI parameters are not.

After playing around and verifying a few things I started to do research in the Tomcat bug tracking site and found some other things to try there, but the results are not very encouraging. The clippings below are from the following URL:

http://issues.apache.org/bugzilla/show_bug.cgi?id=23929 

Based on this I changed the URIEncoding to UTF-8, but it does not seem to fix the problem. I tried this with useBodyEncodingForURI set to both possible values, i.e. true and false. Under none of these conditions did it work with Safari or Firefox (Camino and other Mozilla-based browsers seem to behave the same). I did not try any of these with IE on Windows: if these browsers (especially Firefox) do not work, it doesn't really matter. I suspect IE will have a similar problem, though, based on what we saw earlier when we tested it with IE on Windows in the VNC session.

It looks like the best set of values for these is URIEncoding=UTF-8 and useBodyEncodingForURI=false, but even with those settings it is not working properly and according to the Tomcat guys, there isn't any way to really get this working.

So, based on all of this my recommendation is to restrict ID values to the ISO-8859-1 character set. Actually, anything that is passed as a URI parameter needs to be this way. Parameters that are passed with forms using input tags and such can have UTF-8 characters and it appears to work reliably.

BTW, I also tried using our variation on the standard ?= and &= syntax of URI parameters, which is the /~= syntax (ie /~productId=África instead of ?productId=África), and that did not work either, which is to be expected as it is also part of the URI string.

The only other thing I can think of to try is doing our own UTF-8 decoding of URI parameter values. I'm not sure it is feasible or will work reliably (or at all), but it may be worth a try.
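
As a rough idea of what "our own UTF-8 decoding" could look like, here is a minimal sketch that parses a raw, still percent-encoded query string and decodes names and values as UTF-8. It is not existing OFBiz code; the class QueryStringUtf8 and its parse method are hypothetical names. In a servlet the raw string would come from HttpServletRequest.getQueryString(), which returns the query string undecoded. It only helps when the client really sent UTF-8 percent-encoding, which, as noted above, browsers do not do consistently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryStringUtf8 {

    // Parses a raw query string such as "a=1&b=%C3%A9", decoding names and values as UTF-8.
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decodeUtf8(name), decodeUtf8(value)); // last occurrence wins
        }
        return params;
    }

    private static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported by the JVM", e);
        }
    }

    public static void main(String[] args) {
        // Query string taken from the webtools URL in the report.
        Map<String, String> p = parse("entityName=SecurityGroup&groupId=securit%C3%A9");
        System.out.println(p.get("entityName"));
        System.out.println(p.get("groupId"));
    }
}
```

Keeping only the last value per name is a simplification; a real implementation would need to handle repeated parameters and malformed percent sequences.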

-David


===========================================
Sorry, there's no bug. BZ is not there to discuss design decisions. If you want
to do so, post on tomcat-dev. The only standard for URL encoding is to use
UTF-8, but nobody follows the standard. You can also now configure the URI
encoding in the connector. If you insist on using i18n with URL parameters, the
result is that it won't work reliably, but of course, you're free to do what you
want ;-)
Please do not reopen the report.
===========================================

AND

===========================================
From Mark:

Character encoding has been the source of quite a bit of debate on the tomcat-dev
list in recent weeks. There have been a few changes (see summary below) as a
result. Essentially some additional configuration options have been provided.
The UTF-8 issue (also reported in bug 22666) has also been fixed.

Character encoding summary:

There are a number of situations where there may be a requirement to use
non-US ASCII characters in a URI. These include:
- Parameters in the query string
- Servlet paths

There is a standard for encoding URIs
(http://www.w3.org/International/O-URL-code.html) but this standard is not
consistently followed by clients. This causes a number of problems.

The functionality provided by Tomcat (4 and 5) to handle this less than ideal
situation is described below.

1. The Coyote HTTP/1.1 connector has a useBodyEncodingForURI attribute which
if set to true will use the request body encoding to decode the URI query
parameters.
  - The default value is true for TC4 (breaks spec but gives consistent
behaviour across TC4 versions)
  - The default value is false for TC5 (spec compliant but there may be
migration issues for some apps)
2. The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to
ISO-8859-1.
3. The parameters class (o.a.t.u.http.Parameters) has a QueryStringEncoding
field which defaults to the URIEncoding. It must be set before the parameters
are parsed to have an effect.

Things to note regarding the servlet API:
1. HttpServletRequest.setCharacterEncoding() normally only applies to the
request body NOT the URI.
2. HttpServletRequest.getPathInfo() is decoded by the web container.
3. HttpServletRequest.getRequestURI() is not decoded by container.

Other tips:
1. Use POST with forms to return parameters as the parameters are then part of
the request body.
===========================================
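
Point 2 in the summary above (URIEncoding defaults to ISO-8859-1) matches the garbled id shown in the report. When a value has already been decoded that way, a commonly used repair is to turn the characters back into bytes with ISO-8859-1 and re-read those bytes as UTF-8. The sketch below is an illustration only, not something proposed in this thread, and the class name Latin1Repair is made up. It assumes the value was decoded as ISO-8859-1 and that the client sent UTF-8; applied to a value that really was ISO-8859-1 text, it would corrupt it.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Repair {

    // Re-reads a string that was wrongly decoded as ISO-8859-1 as UTF-8.
    // ISO-8859-1 maps each of its characters to exactly one byte, so the
    // original bytes can be recovered as long as no character is above U+00FF.
    static String reinterpretAsUtf8(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "securit" followed by U+00C3 U+00A9, i.e. the garbled id from the report.
        String garbled = "securit\u00c3\u00a9";
        System.out.println(reinterpretAsUtf8(garbled));
    }
}
```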

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



[jira] Updated: (OFBIZ-281) The URIEncoding parameter of the Tomcat connector does not seem to be taken into account

Nicolas Malin (Jira)
     [ http://issues.apache.org/jira/browse/OFBIZ-281?page=all ]

Marco Risaliti updated OFBIZ-281:
---------------------------------

    Attachment: URIEncoding-problem.patch
                URIEncoding-quickfix.patch





[jira] Updated: (OFBIZ-281) The URIEncoding parameter of the Tomcat connector does not seem to be taken into account

Nicolas Malin (Jira)
     [ http://issues.apache.org/jira/browse/OFBIZ-281?page=all ]

Jacopo Cappellato updated OFBIZ-281:
------------------------------------

    Description:
Copy of http://jira.undersunconsulting.com/browse/OFBIZ-861 from Peter Goron.


  was:
When I create an entity value which contains UTF-8 characters in its primary keys, I'm unable to access it via webtools entity data maintenance or its corresponding management interface in backend.

For example, if you create a new Security Group from partymgr application with theses parameters :
id = Securité
description = Test

Then you try to select the newly created Security Group from Security Group List or you type this url :
https://127.0.0.1:8443/partymgr/control/EditSecurityGroup?groupId=securit%C3%A9 
you should obtain an Edit Security Group form with theses parameters :
id = securité -[CommonCannotBeFound: [securité]]-
description =

The symptoms are similar when you try to access to this entity via webtools
https://127.0.0.1:8443/webtools/control/ViewGeneric?entityName=SecurityGroup&groupId=securit%C3%A9 
-> Specified SecurityGroup was not found.

The problem is not specific to SecurityGroup entity, it can be reproduced for all entities.


After some search, it appears that request.getParameter(pkField) doesn't decode
correctly the UTF-8 sequence "%C3%A9" whereas URIEncoding of HTTP(S) connector is
set to UTF-8.


The patch 'URIEncoding-problem.patch' try to demonstrate that the URIEncoding
specified in base/config/ofbiz-containers.xml (UTF-8) is not set at the
connector level. After having applied the patch, recompiled Ofbiz and
restarted it, the following lines should appear at the end of Ofbiz loading.

32017 (main) [ CatalinaContainer.java:238:INFO ] Connector AJP/1.3 @ 8009 - not-secure URIEncoding=null [org.apache.jk.server.JkCoyoteHandler] started.
32018 (main) [ CatalinaContainer.java:235:INFO ] Connector HTTP/1.1 @ 8080 - not-secure URIEncoding=null [org.apache.coyote.http11.Http11Protocol] started.
32018 (main) [ CatalinaContainer.java:235:INFO ] Connector TLS @ 8443 - secure URIEncoding=null [org.apache.coyote.http11.Http11Protocol] started.
32022 (main) [ CatalinaContainer.java:242:INFO ] Started Apache Tomcat/5.5.9


I've written a small workaround that use setURIEncoding instead of setProperty
Connector's method. After having applied the patch 'URIEncoding-quickfix.patch',
recompiled Ofbiz and restarted it, you should see the following lines at the end
of ofbiz loading.
20551 (main) [ CatalinaContainer.java:238:INFO ] Connector AJP/1.3 @ 8009 - not-secure URIEncoding=UTF-8 [org.apache.jk.server.JkCoyoteHandler] started.
20552 (main) [ CatalinaContainer.java:235:INFO ] Connector HTTP/1.1 @ 8080 - not-secure URIEncoding=UTF-8 [org.apache.coyote.http11.Http11Protocol] started.
20552 (main) [ CatalinaContainer.java:235:INFO ] Connector TLS @ 8443 - secure URIEncoding=UTF-8 [org.apache.coyote.http11.Http11Protocol] started.
20602 (main) [ CatalinaContainer.java:242:INFO ] Started Apache Tomcat/5.5.9

With this patch the problem disapear but I don't think it is the right solution.


I think the behavior of the setProperty(String, String) method of Tomcat's Connector class changed in the 5.5.x series. Looking at its source code
(http://svn.apache.org/repos/asf/tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Connector.java)
it seems this method only sets parameters of the protocol handler, whereas URIEncoding is a parameter of the connector itself, not of the protocol handler.


This problem does not appear in the non-embedded version of Tomcat because it uses
Commons Digester to map the XML elements and attributes of the configuration file
to setters of the Connector object.
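
The difference between the two configuration paths can be sketched as follows. This is only a model under stated assumptions: ModelConnector is a hypothetical stand-in, not Tomcat's Connector class, and applyAttribute is a simplified imitation of attribute-to-setter mapping, not the Digester API. It only shows why a value passed through a generic setProperty can end up on the handler while a setter call reaches the connector field.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a connector; NOT Tomcat's Connector class.
final class ModelConnector {
    private String uriEncoding;                                // connector-level setting
    final Map<String, String> handlerProps = new HashMap<>();  // protocol handler settings

    public void setURIEncoding(String enc) { this.uriEncoding = enc; }
    public String getURIEncoding() { return uriEncoding; }

    // Models the behaviour described in the report: the value is handed to
    // the protocol handler only, so the connector field stays untouched.
    public void setProperty(String name, String value) { handlerProps.put(name, value); }
}

final class SetterMappingDemo {

    /** Simplified attribute-to-setter mapping: "URIEncoding" -> setURIEncoding(String). */
    static void applyAttribute(Object target, String name, String value) {
        try {
            Method setter = target.getClass().getMethod("set" + name, String.class);
            setter.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No String setter for " + name, e);
        }
    }

    public static void main(String[] args) {
        ModelConnector viaProperty = new ModelConnector();
        viaProperty.setProperty("URIEncoding", "UTF-8");
        System.out.println(viaProperty.getURIEncoding()); // null

        ModelConnector viaSetter = new ModelConnector();
        applyAttribute(viaSetter, "URIEncoding", "UTF-8");
        System.out.println(viaSetter.getURIEncoding());   // UTF-8
    }
}
```

The first case matches the "URIEncoding=null" startup lines shown earlier, the second matches the lines logged with the quickfix patch.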


Can someone confirm this problem?
 
 

Comment by David E. Jones [24/Apr/06 12:16 AM]
This is actually a bigger issue than you might think. Even if the Tomcat setting is in there properly, there are still problems, so I HIGHLY recommend against trying to use UTF-8 characters in an HTTP URL. Below is some research I did on this a while back:

========================================================================
I tried various settings for the character encoding on the request. With no character encoding set, neither URI parameters nor form input (POST) are decoded properly. When UTF-8 character encoding is used, the form input (POST) parameters are decoded properly, but not the URI parameters.

After playing around and verifying a few things I started to do research in the Tomcat bug tracking site and found some other things to try there, but the results are not very encouraging. The clippings below are from the following URL:

http://issues.apache.org/bugzilla/show_bug.cgi?id=23929 

Based on this I have changed the URIEncoding to UTF-8, but it does not seem to fix the problem. I tried this with useBodyEncodingForURI set to both possible values, i.e. true and false. In none of these conditions did it work with Safari or Firefox (Camino and other Mozilla-based browsers seem to behave the same). I did not try any of these with IE on Windows because if these browsers (especially Firefox) do not work, it doesn't really matter, but I suspect IE will have a similar problem based on what we were seeing before when we did test it with IE on Windows in the VNC session.

It looks like the best set of values for these is URIEncoding=UTF-8 and useBodyEncodingForURI=false, but even with those settings it is not working properly and according to the Tomcat guys, there isn't any way to really get this working.

So, based on all of this my recommendation is to restrict ID values to the ISO-8859-1 character set. Actually, anything that is passed as a URI parameter needs to be this way. Parameters that are passed with forms using input tags and such can have UTF-8 characters and it appears to work reliably.

BTW, I also tried using our variation on the standard ?= and &= syntax of URI parameters, which is the /~= syntax (ie /~productId=África instead of ?productId=África), and that did not work either, which is to be expected as it is also part of the URI string.

The only other thing I can think of to try is doing our own UTF-8 decoding of URI parameter values. I'm not sure it is feasible or will work reliably (or at all), but it may be worth a try.

-David
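
As a rough sketch of what "doing our own UTF-8 decoding" could look like: the class below is hypothetical (Utf8QueryParser is not an OFBiz class) and uses only the JDK. It assumes the caller passes in the raw, still-escaped query string, for example the value of HttpServletRequest.getQueryString(), which the container does not decode. It only helps when the browser actually sent UTF-8 escapes, which the research above suggests cannot be relied on.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses a raw query string and decodes names and
// values as UTF-8, independently of the container's own decoding.
final class Utf8QueryParser {

    /** Returns name/value pairs in order of appearance; a repeated name keeps its last value. */
    static Map<String, String> parse(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    // URLDecoder throws IllegalArgumentException on malformed escapes;
    // this sketch lets that propagate to the caller.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("entityName=SecurityGroup&groupId=securit%C3%A9");
        System.out.println(p.get("entityName")); // SecurityGroup
        System.out.println(p.get("groupId").length()); // 8
    }
}
```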


===========================================
Sorry, there's no bug. BZ is not there to discuss design decisions. If you want
to do so, post on tomcat-dev. The only standard for URL encoding is to use
UTF-8, but nobody follows the standard. You can also now configure the URI
encoding in the connector. If you insist on using i18n with URL parameters, the
result is that it won't work reliably, but of course, you're free to do what you
want ;-)
Please do not reopen the report.
===========================================

AND

===========================================
From Mark:

Character encoding has been the source of quite a bit of debate on the
tomcat-dev list in recent weeks. There have been a few changes (see summary
below) as a result. Essentially some additional configuration options have
been provided. The UTF-8 issue (also reported in bug 22666) has also been fixed.

Character encoding summary:

There are a number of situations where there may be a requirement to use
non-US ASCII characters in a URI. These include:
- Parameters in the query string
- Servlet paths

There is a standard for encoding URIs
(http://www.w3.org/International/O-URL-code.html) but this standard is not
consistently followed by clients. This causes a number of problems.

The functionality provided by Tomcat (4 and 5) to handle this less than ideal
situation is described below.

1. The Coyote HTTP/1.1 connector has a useBodyEncodingForURI attribute which
if set to true will use the request body encoding to decode the URI query
parameters.
  - The default value is true for TC4 (breaks spec but gives consistent
behaviour across TC4 versions)
  - The default value is false for TC5 (spec compliant but there may be
migration issues for some apps)
2. The Coyote HTTP/1.1 connector has a URIEncoding attribute which defaults to
ISO-8859-1.
3. The parameters class (o.a.t.u.http.Parameters) has a QueryStringEncoding
field which defaults to the URIEncoding. It must be set before the parameters
are parsed to have an effect.

Things to note regarding the servlet API:
1. HttpServletRequest.setCharacterEncoding() normally only applies to the
request body NOT the URI.
2. HttpServletRequest.getPathInfo() is decoded by the web container.
3. HttpServletRequest.getRequestURI() is not decoded by container.

Other tips:
1. Use POST with forms to return parameters as the parameters are then part of
the request body.
===========================================
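
One way to read the summary above is as a small decision rule for which charset ends up decoding the query string. The sketch below is a hypothetical model, not Tomcat code: the class and method names are made up, and the fallback to ISO-8859-1 when useBodyEncodingForURI is true but the request carries no body encoding is an assumption that the summary does not spell out.

```java
// Hypothetical model of the rules in the summary above; NOT Tomcat code.
final class QueryCharsetModel {

    static final String DEFAULT = "ISO-8859-1";

    /**
     * @param useBodyEncodingForURI connector attribute from item 1
     * @param bodyEncoding          request body encoding, or null if none was set
     * @param uriEncoding           connector URIEncoding from item 2, or null if unset
     */
    static String queryCharset(boolean useBodyEncodingForURI,
                               String bodyEncoding,
                               String uriEncoding) {
        if (useBodyEncodingForURI) {
            // Assumption: without a body encoding, the default applies.
            return bodyEncoding != null ? bodyEncoding : DEFAULT;
        }
        // Item 2: URIEncoding defaults to ISO-8859-1 when not configured.
        return uriEncoding != null ? uriEncoding : DEFAULT;
    }

    public static void main(String[] args) {
        // URIEncoding never reached the connector (the situation in this issue):
        System.out.println(queryCharset(false, "UTF-8", null));  // ISO-8859-1
        // URIEncoding applied through the setter:
        System.out.println(queryCharset(false, null, "UTF-8"));  // UTF-8
    }
}
```

Under this reading, calling setCharacterEncoding("UTF-8") on the request has no effect on query parameters unless useBodyEncodingForURI is true, which is consistent with POST parameters decoding correctly while URI parameters do not.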


> The URIEncoding parameter of the Tomcat connector does not seem to be taken into account
> ----------------------------------------------------------------------------------------
>
>                 Key: OFBIZ-281
>                 URL: http://issues.apache.org/jira/browse/OFBIZ-281
>             Project: OFBiz (The Open for Business Project)
>          Issue Type: Bug
>          Components: framework
>    Affects Versions: SVN trunk
>         Environment: Linux 2.6.x, firefox 1.5
>            Reporter: Marco Risaliti
>         Attachments: URIEncoding-problem.patch, URIEncoding-quickfix.patch
>
>
> Copy of http://jira.undersunconsulting.com/browse/OFBIZ-861 from Peter Goron.
