Dev - RFC: entity-sql rewrite(90% done)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Dev - RFC: entity-sql rewrite(90% done)

Adam Heath-2
(sorry for the length of this email; I think it'd be useful if those watching
knew a little bit about what drove me to do this).

After much delay, I finally have something to show on this feature.  Let me
start from the beginning, for those who haven't followed along.

OfBiz has a fairly nice database abstraction layer.  It uses xml to define a
series of tables and views(basically, a views are a way to pre-create complex
joins).  It then reads this xml into an internal object graph, and when the
delegator is called, converts this object graph into sql.

It supports a dynamic view interface, but the only way to access this is by
using rather verbose object creation methods.

I'm not certain about the rest of the community, but I find the
DymamicViewEntity api to be hard to remember.  Even creating views thru xml is
a chore.

However, I *have* done complex views with xml.  On an older deployment(1),
I've got a few <view-entity>s that join 10 tables; they join to PartyGroup and
PartyRelationship twice each.  These views are what drove me to develope the
advanced dependency clearing view cache code, that I submitted a while back.

The way I designed those views, was by starting out writing plain sql, then
making my own dynamic sql generator(this was before DynamicViewEntity), then
finally converting that into a multiple <view-entity>s, which I then chose
from at runtime.  Writing it as plain sql was rather fast, but converting into
the dynamic system(self-written) and then the xml, took time.

Fast forward 2 years....

Last summer, we wrote some new technology(would rather not disclose that right
now, it's not important for this discussion) that allowed us to rapidly
develop web sites.  We became more agile.  One of the features we used, was
the ability to send raw sql to the database backend.  We then added a feature
to generate that raw sql string with velocity, so that it could become
dynamic.  While this did an end run around OfBiz(and would have had no chance
of ever being accepted), it made the system development super fast.

The faster we can develop systems for our clients, the quicker the clients get
serviced.  Clients are willing to pay good money to have things done fast.
Plus, it frees us from being invovled with a long-term contract, without
seeing any fruits from our labor.  It's standard for clients to pay a portion
up front, then some in the middle, and a final chunk at the end.  If the time
for development is shortened, then more money is flowing into the business.
So, we have a pressing need to keep things simple and fast.

Anyways, back to that entityengine bypass.  Since I knew it would never be
accepted at large, it never sat well with me.  Around this time(last July) I
started playing again with javacc/jjtree.  This time, I finally figured it
out.  I also then started looking on the web for sql grammars.  Most of the
ones I found were plain javacc.  However, I found one based on oracle, that
was jjtree(jjtree produces an AST(Abstract Syntax Tree), which you can then
write visitors for).  I started modifying it to parse an OfBiz sql dialect,
and then the OfBiz User's/Developer's Conference started.

At this conference, I mentioned what I was doing.  I mentioned that I almost
had it parsing SQLExpression(s), which then were producing EntityCondition
object graphs.  Andy said he was interested.  After arriving back home, I
*did* get that finished, but we got busy, so I never got around to announcing
it anywhere.  An example of using just this feature:

        EntityCondition condition = EntityConditionUtil.parse(
                "foo = ?foo or bar = ?bar",
                UtilMisc.toMap(
                        "foo", "this is foo",
                        "bar", "this is bar"
                )
        );

Note the NamedParameter support(another new feature).  However, we didn't even
start using this feature at all, and it stagnated in a separate svk repository
for months.

Fast forward to last thursday...

Brainfood(the company I work for) recently moved to a new office.  This has
caused us to act differently.  We've decided that we need to actually start
finishing all the little side projects we've started.  One that I picked, was
this entityengine sql feature.

Now, about 1.5 weeks later, I have it about 90% done.  In short, here's what
is happening:

* To make certain the sql grammar parsed things correctly, I needed some
  existing sql to parse.  Since I didn't have a large corpus readily
  available, I took the next best thing: the existing entitymodel.xml files.
  To that end, I wrote a stylsheet(xslt)(see attached) that converts *all*
  portions of entitymodel.xml into the sql dialect.
* Fleshed out the sql grammar to produce the AST.
* Started writing visitors to walk this grammar, and fill out the existing
  model object graphs.
* (the above steps actually took place last September, but weren't quite
  finished).
* Wrote 2 more parsers, from scratch, both reading the existing xml data; one
  based on commons-digester, another pure sax.  I wrote this so that I could
  verify that my new code was parsing things in correctly.
* Copied the existing model objects into a new dir, so I could modify them
  without breaking anything.
* Got all 3 parsers working identically(they count the various parts as they
  work, and their numbers are the same).
* Started reworking the model object graph features:
  * Renamed ComplexAlias to ComplexAliasList.
  * Added ComplexAliasFunction.
  * Made ComplexAliasFunction, ComplexAliasList, and ComplexAliasField all
    extend ComplexAliasMember.
  * Removed function support from ModelAlias; this is now converted into
    ComplexAliasFunction during parse.
  * Removed groupBy support from ModelAlias.  This is a maintained as
    a separate LinkedHasSet on the view.
  * groupBy can now be *any* ComplexAliasMember.
  * ModelViewEntity renamed to ModelView, extending AbstractModelEntity.
    Some parts of ModelEntity didn't make sense when dealing with views.
  * ModelEntity extends AbstractModelEntity.
  * ModelField is now abstact, with:
      ModelEntity.EntityField extends ModelField
      ModelView.AbstractViewField extends ModelField
      ModelView.ViewField extends AbstractViewField
      ModelView.ComplexField extends AbstractViewField
    Some parts of ModelField didn't make sense when dealing with views.
  * Based analyze() on ModelView.populateFieldsBasic.
  * Fixed case when ModelAlias is complex; the existing code doesn't recurse
    down into the member, to verify that the final complex-alias-field items
    reference existing fields.
  * Add support for views joining other views(ie, nesting).
* Started work on DynamicView.  This is an almost exact copy of the new sql
  parsing view code, with the addition of a few more things.
  * A WHERE clause is allowed, which gets parsed into an EntityCondition
    object(this feature doesn't actually work, because it is still using the
    old model objects, not the new ones I've modified, so I have it commented
    out).
  * Nested joins are parsed into unnamed unattached views, that only exist as
    children of the DynamicView.
* Wrote a sql generator from scratch, to handle all the above.

So, at this point, I can write queries in OfBiz SQL, then have it get
translated into database specific sql dialects(including the 2 types of join
syntaxes).

It currently doesn't handle WHERE clauses(neither in parsing DynamicView, nor
when generating sql).  It doesn't do ORDER BY.  It doesn't restrict the
fields in the SELECT clause to those being requested.  It doesn't generate
INSERT/UPDATE/DELETE.  None of these problems are that difficult to solve,
however, once the basic FROM stuff is done.

Also, none of the reverse dependency cache stuff has been retrofitted into
this system.  We've discovered that the database(postgres in this case) is
rather smart.  A properly written query, with proper indices, can be quite
fast.  This is also one of the reasons why I decided to finish this project.

It also doesn't deal at all with the split datasource/DAO objects.  I'm not
certain if we should keep those.  OfBiz, as it stands, doesn't really work
well with having different parts of the entities in different databases.

About the files(http://debian.doogie.brainfood.com/uploads/entity-sql):

entitymodel.xslt:
        This is the stylesheet that can convert any <entitymodel> into a
        series of sql statements.  During testing, I took all entitymodel.xml
        files(by parsing component-load.xml/ofbiz-component.xml recursively),
        then concatenated all of the files together into a single large xml
        file.
example.xml:
        A few select <view-entity>s.
example.sql:
        The converted sql, from the stylesheet
example-out.sql:
        Output of the sql generator on the select views.
query.sql:
        A DynamicView sql query.
query-out.sql:
        The output from the generator of the DynamicView.  It's interesting to
        note, that I can do 1000 load/parse/analyze/generate in 3218
        milliseconds.  That comes out to 3.2 milliseconds each, which means
        310 conversions/second.  Once all the classes are loaded and jitted.

1: This older deployment is based on a version of OfBiz that was branched 3
   weeks before 2.0.0 came out.  Its repository was created 3 years ago, and
   was deployed into production 2 years ago this summer.   The system is an
   intranet, that allows users to take certification tests online.  Tuesday
   thru Thursday of this last week, it did 1100/700/700 tests, and
   29,000/19,000/19,000 questions.  There are over 550.000 certifications in
   the system(going back to March, 2000(we converted the old database to
   ofbiz for deployment)), and over 15,800,000 individual questions.  The
   system has an active userbase of around 6000 employees, with over 17,000
   parties in the system(they have a lot of turnover).  This system is what
   caused me to write the reverse-dependency entity cache additions, so that
   views could be cached.
 
_______________________________________________
Dev mailing list
[hidden email]
http://lists.ofbiz.org/mailman/listinfo/dev