[OFBiz] Dev - ofbiz sql(whee!) (very long)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

[OFBiz] Dev - ofbiz sql(whee!) (very long)

Adam Heath-2
So, as talked about at the last users' conference, here's a teaser.

I have code that can parse SQL-like strings and files, and product
EntityCondition objects.  For example:

==
InputStream in = StringBufferInputStream(
"CONDITION foo field1 = 'bar' OR field2 = 'baz' AND field3 " +
"IN (1, 2, 3); CONDITION bar id = ?id;";
SqlScript sqlScript = new SqlScript(inputStream);
SqlScriptVisitor visitor = MakeCondition();
SimpleNode node = sqlScript.NamedConditions();
Map namedConditions = (Map) node.jjtAccept(visitor, null);
==

That returns a map of name -> condition.  There is support for named
parameters, which make use of the freeze support I added last year, and a
ThreadLocal, to convert the condition to a real instance.

The above pseudo-code(which is very close to the real code I have) can cache
the parsed EntityCondition object, and only needs to call freeze() if
parameters are being expanded.

Here's the api I'd like to see:

==
condition = EntityConditionUtil.getCondition(string);
condition = EntityConditionUtil.getCondition(string, map);
condition = EntityConditionUtil.get(url, name);
condition = EntityConditionUtil.get(url, name, map);
==

The url forms could use UtilCache to store the parsed file.

The variants that take a map will call freeze() to create a real condition.

Here's what I have done.

* jjtree/javacc grammar for conditions(actually, full sql, see below)
* implementation of SqlScriptVisitor that produces EntityCondition objects
* new EntityConditionValue classes:
  * NamedParameter:
    placeholder for ?name in condition strings
  * ConstantValue:
    NamedParameter replaces itself with this during freeze.

All that's needed is a url->inputstream mapper(trivial), storage in UtilCache,
and the above wrapping code in EntityConditionUtil.

Ok, now on to the next big feature.

The jjtree grammar I have is not just for conditions.  It can do full sql.
And by full script, I mean CREATE TABLE, SELECT, INSERT, etc.

To test this system out, I wrote an xsl stylesheet, that converts
entitymodel.xml files into sql.  I then did it over all such files in ofbiz.
I then wrote a visitor that walks the parsed tree, and produces standard
ModelEntity/ModelViewEntity objects.  With counting in place, I get identical
entity/view counts as the standard ModelReader stuff.

Initially, this may look like a bunch of busy work.  Why should entitymodel be
done in sql, instead of the current working form?  Well, if you were only
concerned with plain tables, and not views, you would be correct.

However, views are much more powerful when you can make use of the full sql
syntax.  This includes nested subselects in from items, complex nested
inner/outer joins, and mathematical operations everywhere.  The existing
ModelViewEntity and sql generating code in in SqlJdbcUtil do not support such
advanced features.

This system can also be a replacement for DynamicViewEntities, which are
rather cumbersome to work with.  You already need to understand sql to write a
view, and you also need to understand how to use the DynamicViewEntities api.
With this new system, you just need to know sql.

This last viewpoint also applies to ModelViewEntity xml.  Why bother learning
a second xml language for defining views, when sql will do just fine?

So now you may be wondering why I wrote all this.  You may ask, isn't the
current system just fine?  Well, while the current system does work, and is
fairly well tested, it has several shortcomings, which have now permeated
themselves thruout the code base, and the ofbiz community.

As already mentioned, existing view entities are limited in their features.
Because of the limit syntax, java code gets written that ends up doing several
round trips to the database(or the entity-cache stuff I wrote), and recombing
data in memory.  However, it's been my experience, that a properly written
query, with proper indexing, can be extremely fast.  Let me give you an
example.

When I wrote the entitycache stuff and submitted it over a year ago, it was
due to realized performance issues with the existing condition/caching stuff
in ofbiz.  This was due to the job I was working on at the time having some
rather large databases, and some complex data arrangements.  The
reverse-dependency cache clearing code was an attempt to solve this, by
clearing views when a normal entity was updated.  It mostly works, with some
very small border conditions(some have been fixed since then).  The views that
I was using that drove this development, have 10 members, but only 8 tables.
2 of those tables are joined twice.  Additionally, a feature I wrote, but
never submitted, was the ability to restrict views with conditions in the xml.
I never liked my implemetation, so never sent it.

After I got the cache clearing stuff done, and before I sent it to ofbiz, I
realized that the complex view I had done was not performing well in the
database.  This was due to my initial lack of proper indexing.  The existing
deployed site now doesn't even cache that query *at all*, and the page it's
one is the one of the main pages for the site(gives users a transcript of all
certification tests they have taken, grouped into categories, with
red/yellow/green lights signifying whether they have passed, or their cert is
about to expire).

It was at this point that I decided to trust the database, and let it do all
the work.  As much as I could, I tried to create a view, and not have any loop
code at all.  Once you have to do multiple queries in a loop(even with
caching), performance falls about.

So, now jump forward a year.  The current projects we(brainfood) are working
on are based on a new wiki system we've developed.  One of the goals for this
system is to simplify things as much as possible; to lower the barrier for
entry when creating websites.  Wiki markup is one of the features we
implemented to support this end goal.

Ean(my boss) wrote a very simple, but very easy to use, SQLUtil tool.  Give it
an ofbiz connection name(ie, "localpostgres"), a raw sql query string, and it
will return you a list of maps, one map for each row in the database.  Writing
sql is much simpler than writing the xml stuff, and this tool let him change
the sql easily at runtime.

However, using this system requires one to know the entityname/fieldname
mapping syntax, which is yet another thing to learn.  It also had no caching
whatsoever.

One of the tools we integrated for this wiki system, is velocity.  This is
simpler(main design goal) than freemarker, and faster(freemarker copies object
graphs when placing them into the template context).  It uses jjtree to parse
it's templates, and uses the AST(abstract syntax tree) at runtime to evaluate
everything.  This allowed me to see a working jjtree/javacc setup, and helped
me to learn that tool.

So, now that we(brainfood) have a need for a sql parser(to speed up
SQLUtil, and to make things easier to work with), and that I know how to use
jjtree/javacc, this feature can actually be implemented.

Ok, enough rambling.  Let me quickly explain the attachments.

SqlScript.jtt:

This is the sql grammar file.  Fairly standard set of options needed to parse
that into the SqlScript.jj file, then produce the SqlScript.java file from
that.  There are 2 top-level entry points:  SQLScript, and NamedConditions.
Both expect to parse a full file.  The former will handle CREATE TABLE, CREATE
VIEW, and CREATE INDEX.  The latter handles the CONDITION name <condition>
stuff.  Both output an AST of the parsed data.

The AST produced is meanted to be accessed by using an implementation of
SqlScriptVisitor.  I have versions for parsing CREATE TABLE, CREATE VIEW, and
CONDITION trees; however, they require some minor changes to existing ofbiz
code, so are not yet included here.

entitymodel.xslt:

XSL stylesheet that can convert entitymodel.xml files into sql files.  Used to
automatically convert everything, for testing.

model.sql.gz:

Example file containing all existing ofbiz entity models converted to sql.

My development plan for integrating this, is to not replace anything, at
first.  I want this new code to exist alongside the existing
sql/entitymodel/ModelEntity stuff.  This way, no unknown problems occur.
I'll be modifying ModelReader to be an interface, then having different
implementations, one for the xml, and one for the sql forms.

However, even at this step, no new features will exist.

I'll then make a third ModelReader, that will not restrict what you can do for
CREATE VIEW.  This will replace the existing ModelViewEntity, and SqlJdbcUtil.
It'll most likely just keep the post-processed AST around at runtime, for
generating the actual sql.

I don't have a time table for completly any of these steps.  I can, however,
probably get the condition parsing stuff integrated fairly quick, as that is
non-invasive, and non-disrupting.

ps: sorry for the length, but it's hard to shrink this down.

SqlScript.jjt (46K) Download Attachment
entitymodel.xslt (5K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Dev - ofbiz sql(whee!) (very long)

Si Chen-2
Adam,

This was from a while ago, but I'm curious where you are on this.

I initially had reservations about another set of technologies in the
OFBiz framework, but I think your solution could really help with
"business intelligence" or reporting tasks.  I would really like to be
able to write complex queries against the OFBiz entity and field names
(ie, not the underlying database ones) and then have them execute
through the entity engine, while still preserving true database
independence.

So, please let me know if you're still interested in making this
available and whether you'd like to get some help on it.

Si

PS Here are the links to the original messages, for anybody who's
interested in reading about them again:
http://lists.ofbiz.org/pipermail/dev/2005-September/009122.html
http://lists.ofbiz.org/pipermail/dev/2005-September/009173.html

Adam Heath wrote:

>So, as talked about at the last users' conference, here's a teaser.
>
>I have code that can parse SQL-like strings and files, and product
>EntityCondition objects.  For example:
>
>==
>InputStream in = StringBufferInputStream(
>"CONDITION foo field1 = 'bar' OR field2 = 'baz' AND field3 " +
>"IN (1, 2, 3); CONDITION bar id = ?id;";
>SqlScript sqlScript = new SqlScript(inputStream);
>SqlScriptVisitor visitor = MakeCondition();
>SimpleNode node = sqlScript.NamedConditions();
>Map namedConditions = (Map) node.jjtAccept(visitor, null);
>==
>
>That returns a map of name -> condition.  There is support for named
>parameters, which make use of the freeze support I added last year, and a
>ThreadLocal, to convert the condition to a real instance.
>
>The above pseudo-code(which is very close to the real code I have) can cache
>the parsed EntityCondition object, and only needs to call freeze() if
>parameters are being expanded.
>
>  
>
>
 
_______________________________________________
Dev mailing list
[hidden email]
http://lists.ofbiz.org/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: Dev - ofbiz sql(whee!) (very long)

Adam Heath-2
On Wed, 11 Jan 2006, Si Chen wrote:

> Adam,
>
> This was from a while ago, but I'm curious where you are on this.
>
> I initially had reservations about another set of technologies in the
> OFBiz framework, but I think your solution could really help with
> "business intelligence" or reporting tasks.  I would really like to be
> able to write complex queries against the OFBiz entity and field names
> (ie, not the underlying database ones) and then have them execute
> through the entity engine, while still preserving true database
> independence.
>
> So, please let me know if you're still interested in making this
> available and whether you'd like to get some help on it.

Well, it can't do joins.  Without that feature, it's mostly useless.

I am using svk locally, and have this checked in.  I could merge this to you
guys, easy enough, that way(svk push, after I merge the newer upstream into my
work branch).  It's fairly contained, and the existing code doesn't change any
existing functionality.  It's all new stuff.

Using the sql as a macro expander, by making use of the meta data in
ModelEntity and ModelViewEntity, is not something I have attempted yet.  It'd
require multiple passes over the AST(Abstract Syntax Tree).

I do have code that can convert *all* entitymodel.xml files into compatible
CREATE TABLE/CREATE VIEW stmts, then code that can parse those sql stmts into
AST.  I only did this because it seemed like a good way to verify my code was
handling everything.

I don't have anything that handles SELECT/INSERT/UPDATE/DELETE.  The AST parts
are there, but I've not started any visitor stuff.

Lately, we have been skipping ofbiz entirely, and doing direct database
queries, that return a list of maps.  This is way faster than doing loop code
in java, then combining as needed.

For example, here's a query that we have setup.  It's a velocity template,
which explains the $var and #directive stuff.  It's just not possible to do
anything like this with ofbiz views.  It's also something I want to be able to
support with my sql stuff, if I ever get around to it.

==
SELECT
        per.party_id AS "partyId",
        per.first_name AS "firstName",
        per.last_name AS "lastName",
        a.tech_id AS "technicianId",
        a.email_address AS "emailAddress",
        a.city AS "city",
        a.state_province_name AS "state",
        a.address_id AS "addressId"
FROM
        person per JOIN (
                SELECT
                        a.party_id,
                        a.tech_id,
                        b.email_address,
                        c.city,
                        c.state_province_name,
                        c.contact_mech_id AS address_id
                FROM
                        (
                                SELECT
                                        a.contact_mech_id,
                                        b.party_id,
                                        a.info_string AS tech_id
                                FROM
                                        contact_mech a JOIN party_contact_mech b ON
                                                a.contact_mech_id = b.contact_mech_id
                                        JOIN party_contact_mech_purpose c ON
                                                b.contact_mech_id = c.contact_mech_id
                                WHERE
                                        (b.from_date IS NULL OR b.from_date <= current_timestamp)
                                        AND
                                        (b.thru_date IS NULL OR b.thru_date > current_timestamp)
                                        AND
                                        (c.from_date IS NULL OR c.from_date <= current_timestamp)
                                        AND
                                        (c.thru_date IS NULL OR c.thru_date > current_timestamp)
                                        AND
                                        a.contact_mech_type_id = 'ECOUNT_PUID'
                                        AND
                                        c.contact_mech_purpose_type_id = 'ECOUNT_TECH_ID'
                        ) a JOIN (
                                SELECT
                                        a.contact_mech_id,
                                        b.party_id,
                                        a.info_string AS email_address
                                FROM
                                        contact_mech a JOIN party_contact_mech b ON
                                                a.contact_mech_id = b.contact_mech_id
                                        JOIN party_contact_mech_purpose c ON
                                                b.contact_mech_id = c.contact_mech_id
                                WHERE
                                        (b.from_date IS NULL OR b.from_date <= current_timestamp)
                                        AND
                                        (b.thru_date IS NULL OR b.thru_date > current_timestamp)
                                        AND
                                        (c.from_date IS NULL OR c.from_date <= current_timestamp)
                                        AND
                                        (c.thru_date IS NULL OR c.thru_date > current_timestamp)
                                        AND
                                        a.contact_mech_type_id = 'EMAIL_ADDRESS'
                                        AND
                                        c.contact_mech_purpose_type_id = 'PRIMARY_EMAIL'
                        ) b ON
                                a.party_id = b.party_id
                        JOIN (
                                SELECT
                                        a.contact_mech_id,
                                        b.party_id,
                                        d.city,
                                        e.geo_id AS state_province_id,
                                        e.geo_name AS state_province_name
                                FROM
                                        contact_mech a JOIN party_contact_mech b ON
                                                a.contact_mech_id = b.contact_mech_id
                                        JOIN party_contact_mech_purpose c ON
                                                b.contact_mech_id = c.contact_mech_id
                                        JOIN postal_address d ON
                                                c.contact_mech_id = d.contact_mech_id
                                        JOIN geo e ON
                                                d.state_province_geo_id = e.geo_id
                                WHERE
                                        (b.from_date IS NULL OR b.from_date <= current_timestamp)
                                        AND
                                        (b.thru_date IS NULL OR b.thru_date > current_timestamp)
                                        AND
                                        (c.from_date IS NULL OR c.from_date <= current_timestamp)
                                        AND
                                        (c.thru_date IS NULL OR c.thru_date > current_timestamp)
                                        AND
                                        a.contact_mech_type_id = 'POSTAL_ADDRESS'
                                        AND
                                        c.contact_mech_purpose_type_id = 'HOME_ADDRESS'
                        ) c ON
                                a.party_id = c.party_id
        ) a ON
                a.party_id = per.party_id
        JOIN party_role pr ON
                per.party_id = pr.party_id
                AND
                pr.role_type_id = 'REWARD_PROGRAM_MEMBR'

#if ($technicianId || $emailAddress || $firstName || $lastName)
WHERE
        TRUE
#if ($technicianId)
        AND cm_t.info_string = ?
#end
#if ($emailAddress)
        AND LOWER(cm_e.info_string) LIKE LOWER(?)
#end
#if ($firstName)
        AND LOWER(per.first_name) LIKE LOWER(?)
#end
#if ($lastName)
        AND LOWER(per.last_name) LIKE LOWER(?)
#end
#end
#if ($orderBy == "technicianId")
ORDER BY
        cm_t.info_string $orderByDirection
#elseif ($orderBy == "emailAddress")
ORDER BY
        cm_e.info_string $orderByDirection
#elseif ($orderBy == "firstName")
ORDER BY
        per.first_name $orderByDirection
#elseif ($orderBy == "lastName")
ORDER BY
        per.last_name $orderByDirection
#end
OFFSET
        $offset
LIMIT
        $limit + 1
==
 
_______________________________________________
Dev mailing list
[hidden email]
http://lists.ofbiz.org/mailman/listinfo/dev
Reply | Threaded
Open this post in threaded view
|

Re: Dev - ofbiz sql(whee!) (very long)

Si Chen-2
Adam,

Would you be able to set up your SVK so we can take a look at it and
possibly play with it? Maybe we can look through your code and your
examples?

Thanks,

Si



Adam Heath wrote:

>On Wed, 11 Jan 2006, Si Chen wrote:
>
>  
>
>>Adam,
>>
>>This was from a while ago, but I'm curious where you are on this.
>>
>>I initially had reservations about another set of technologies in the
>>OFBiz framework, but I think your solution could really help with
>>"business intelligence" or reporting tasks.  I would really like to be
>>able to write complex queries against the OFBiz entity and field names
>>(ie, not the underlying database ones) and then have them execute
>>through the entity engine, while still preserving true database
>>independence.
>>
>>So, please let me know if you're still interested in making this
>>available and whether you'd like to get some help on it.
>>    
>>
>
>Well, it can't do joins.  Without that feature, it's mostly useless.
>
>I am using svk locally, and have this checked in.  I could merge this to you
>guys, easy enough, that way(svk push, after I merge the newer upstream into my
>work branch).  It's fairly contained, and the existing code doesn't change any
>existing functionality.  It's all new stuff.
>
>Using the sql as a macro expander, by making use of the meta data in
>ModelEntity and ModelViewEntity, is not something I have attempted yet.  It'd
>require multiple passes over the AST(Abstract Syntax Tree).
>
>I do have code that can convert *all* entitymodel.xml files into compatible
>CREATE TABLE/CREATE VIEW stmts, then code that can parse those sql stmts into
>AST.  I only did this because it seemed like a good way to verify my code was
>handling everything.
>
>I don't have anything that handles SELECT/INSERT/UPDATE/DELETE.  The AST parts
>are there, but I've not started any visitor stuff.
>
>Lately, we have been skipping ofbiz entirely, and doing direct database
>queries, that return a list of maps.  This is way faster than doing loop code
>in java, then combining as needed.
>
>For example, here's a query that we have setup.  It's a velocity template,
>which explains the $var and #directive stuff.  It's just not possible to do
>anything like this with ofbiz views.  It's also something I want to be able to
>support with my sql stuff, if I ever get around to it.
>
>==
>SELECT
> per.party_id AS "partyId",
> per.first_name AS "firstName",
> per.last_name AS "lastName",
> a.tech_id AS "technicianId",
> a.email_address AS "emailAddress",
> a.city AS "city",
> a.state_province_name AS "state",
> a.address_id AS "addressId"
>FROM
> person per JOIN (
> SELECT
> a.party_id,
> a.tech_id,
> b.email_address,
> c.city,
> c.state_province_name,
> c.contact_mech_id AS address_id
> FROM
> (
> SELECT
> a.contact_mech_id,
> b.party_id,
> a.info_string AS tech_id
> FROM
> contact_mech a JOIN party_contact_mech b ON
> a.contact_mech_id = b.contact_mech_id
> JOIN party_contact_mech_purpose c ON
> b.contact_mech_id = c.contact_mech_id
> WHERE
> (b.from_date IS NULL OR b.from_date <= current_timestamp)
> AND
> (b.thru_date IS NULL OR b.thru_date > current_timestamp)
> AND
> (c.from_date IS NULL OR c.from_date <= current_timestamp)
> AND
> (c.thru_date IS NULL OR c.thru_date > current_timestamp)
> AND
> a.contact_mech_type_id = 'ECOUNT_PUID'
> AND
> c.contact_mech_purpose_type_id = 'ECOUNT_TECH_ID'
> ) a JOIN (
> SELECT
> a.contact_mech_id,
> b.party_id,
> a.info_string AS email_address
> FROM
> contact_mech a JOIN party_contact_mech b ON
> a.contact_mech_id = b.contact_mech_id
> JOIN party_contact_mech_purpose c ON
> b.contact_mech_id = c.contact_mech_id
> WHERE
> (b.from_date IS NULL OR b.from_date <= current_timestamp)
> AND
> (b.thru_date IS NULL OR b.thru_date > current_timestamp)
> AND
> (c.from_date IS NULL OR c.from_date <= current_timestamp)
> AND
> (c.thru_date IS NULL OR c.thru_date > current_timestamp)
> AND
> a.contact_mech_type_id = 'EMAIL_ADDRESS'
> AND
> c.contact_mech_purpose_type_id = 'PRIMARY_EMAIL'
> ) b ON
> a.party_id = b.party_id
> JOIN (
> SELECT
> a.contact_mech_id,
> b.party_id,
> d.city,
> e.geo_id AS state_province_id,
> e.geo_name AS state_province_name
> FROM
> contact_mech a JOIN party_contact_mech b ON
> a.contact_mech_id = b.contact_mech_id
> JOIN party_contact_mech_purpose c ON
> b.contact_mech_id = c.contact_mech_id
> JOIN postal_address d ON
> c.contact_mech_id = d.contact_mech_id
> JOIN geo e ON
> d.state_province_geo_id = e.geo_id
> WHERE
> (b.from_date IS NULL OR b.from_date <= current_timestamp)
> AND
> (b.thru_date IS NULL OR b.thru_date > current_timestamp)
> AND
> (c.from_date IS NULL OR c.from_date <= current_timestamp)
> AND
> (c.thru_date IS NULL OR c.thru_date > current_timestamp)
> AND
> a.contact_mech_type_id = 'POSTAL_ADDRESS'
> AND
> c.contact_mech_purpose_type_id = 'HOME_ADDRESS'
> ) c ON
> a.party_id = c.party_id
> ) a ON
> a.party_id = per.party_id
> JOIN party_role pr ON
> per.party_id = pr.party_id
> AND
> pr.role_type_id = 'REWARD_PROGRAM_MEMBR'
>
>#if ($technicianId || $emailAddress || $firstName || $lastName)
>WHERE
> TRUE
>#if ($technicianId)
> AND cm_t.info_string = ?
>#end
>#if ($emailAddress)
> AND LOWER(cm_e.info_string) LIKE LOWER(?)
>#end
>#if ($firstName)
> AND LOWER(per.first_name) LIKE LOWER(?)
>#end
>#if ($lastName)
> AND LOWER(per.last_name) LIKE LOWER(?)
>#end
>#end
>#if ($orderBy == "technicianId")
>ORDER BY
> cm_t.info_string $orderByDirection
>#elseif ($orderBy == "emailAddress")
>ORDER BY
> cm_e.info_string $orderByDirection
>#elseif ($orderBy == "firstName")
>ORDER BY
> per.first_name $orderByDirection
>#elseif ($orderBy == "lastName")
>ORDER BY
> per.last_name $orderByDirection
>#end
>OFFSET
> $offset
>LIMIT
> $limit + 1
>==
>
>_______________________________________________
>Dev mailing list
>[hidden email]
>http://lists.ofbiz.org/mailman/listinfo/dev
>
>  
>
 
_______________________________________________
Dev mailing list
[hidden email]
http://lists.ofbiz.org/mailman/listinfo/dev