MySQL connections in the database connection pool time out (usually set to 8 hours). Possible workarounds:

1. Increase the timeout value and hope that someone connects to OFBiz before the connections time out.
2. Fix OFBiz to allow a "validateQuery" mechanism.
3. Use PostgreSQL.

In Tomcat, we usually use "validateQuery" so the DBCP will test each connection before giving it to the application. If all connections in the pool have timed out (say no one has accessed OFBiz in 8 hours), the DBCP creates new connections for the pool.

If someone can tell me that this doesn't happen for PostgreSQL, I'll simply make the switch to PostgreSQL rather than fix things in OFBiz for MySQL.

Thanks.

Jonathon
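The "validateQuery" idea (Tomcat/DBCP's actual setting is `validationQuery`) amounts to running a cheap test statement on a pooled connection before handing it out, and replacing the connection if the test fails. A minimal sketch of that borrow-time validation, using sqlite3 purely as a stand-in for MySQL; the pool class and method names here are illustrative, not OFBiz's or DBCP's actual API:

```python
import sqlite3

class ValidatingPool:
    """Toy connection pool that validates connections on borrow.

    Illustrative only -- real DBCPs do this via the validationQuery
    setting, typically "SELECT 1" for MySQL.
    """

    def __init__(self, factory, size=2, validation_query="SELECT 1"):
        self._factory = factory
        self._validation_query = validation_query
        self._idle = [factory() for _ in range(size)]

    def _is_valid(self, conn):
        try:
            conn.execute(self._validation_query)
            return True
        except Exception:  # a timed-out/closed connection raises here
            return False

    def borrow(self):
        while self._idle:
            conn = self._idle.pop()
            if self._is_valid(conn):
                return conn
            # stale (e.g. server dropped it after wait_timeout): discard it
        return self._factory()  # no valid idle connection: create a fresh one

    def give_back(self, conn):
        self._idle.append(conn)

pool = ValidatingPool(lambda: sqlite3.connect(":memory:"))
conn = pool.borrow()
conn.close()           # simulate the server timing the connection out
pool.give_back(conn)
fresh = pool.borrow()  # validation fails on the dead conn; another is used
print(fresh.execute("SELECT 1").fetchone())  # -> (1,)
```

With validation on borrow, an application that sits idle past the server's timeout simply pays one failed test query instead of surfacing a dead connection to the caller.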
Sorry, it's "validationQuery", not validateQuery.
Jonathon
This does not occur in PostgreSQL. It is a "feature" of MySQL, and they (MySQL) will smugly say that OFBiz doesn't handle the connection pool correctly. I don't know, and don't really care to know, if that's true or not. I switched over about 2 months ago and have had smooth sailing since (it even seemingly eliminated that UserLoginHistory bug that you're aware of).

Be warned, it's a bit of a pain to convert from MySQL to Postgres. Most of the issues stem from how lax MySQL is with data and how stringent PostgreSQL is (at least in the default installation). These were some of the issues I came across with my data using the export/import in webtools:

1. The createdBy fields in the various entities weren't in the correct case (I believe this has been solved in OFBiz; I just had data that predated the fix).
2. UserLogin and Party entities end up with a circular dependency based on the partyId admin if the UserLogin admin created parties. Either load the single Party record for that partyId before loading the UserLogin entity, or remove the createdBy data from the Party entity.
3. Hierarchical parent->child relationships. This occurs with the *Type entities. They simply need to be loaded in the correct order. There is a JIRA issue which solves this problem for the *Type entities where the child is childId and the parent is parentChildId (e.g. partyTypeId -> parentPartyTypeId).

There may have been other referential integrity issues, but I think they were self-created and not created by OFBiz.
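Getting the *Type entities loaded parent-before-child, as issue 3 describes, is just a topological ordering over the self-referencing parent field. A hedged sketch of that ordering (field names follow the partyTypeId/parentPartyTypeId pattern mentioned above; the helper itself is illustrative, not the JIRA patch):

```python
def load_order(records, id_field="partyTypeId", parent_field="parentPartyTypeId"):
    """Order records so every parent *Type row precedes its children.

    Illustrative sketch: records are dicts as they might come from
    entity-engine-xml; a row whose parent is absent or already emitted
    is safe to load next.
    """
    pending = list(records)
    emitted, ordered = set(), []
    while pending:
        progressed = False
        for rec in list(pending):
            parent = rec.get(parent_field)
            if parent is None or parent in emitted:
                ordered.append(rec)
                emitted.add(rec[id_field])
                pending.remove(rec)
                progressed = True
        if not progressed:
            raise ValueError("cycle or missing parent among remaining rows")
    return ordered

rows = [
    {"partyTypeId": "PERSON", "parentPartyTypeId": "PARTY"},
    {"partyTypeId": "PARTY", "parentPartyTypeId": None},
    {"partyTypeId": "CORPORATION", "parentPartyTypeId": "LEGAL_ORGANIZATION"},
    {"partyTypeId": "LEGAL_ORGANIZATION", "parentPartyTypeId": "PARTY"},
]
print([r["partyTypeId"] for r in load_order(rows)])
# -> ['PARTY', 'LEGAL_ORGANIZATION', 'PERSON', 'CORPORATION']
```

Sorting the export this way sidesteps the foreign-key errors on import, since Postgres never sees a child row before its parent exists.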
Chris,
Wow, thanks. You sound like you've really been through it (the migration).

Seems like quite a pain migrating to PostgreSQL from MySQL. But what if I don't have data to migrate? What if I just start over with PostgreSQL? Any problems? I'll take your advice regarding the migration gotchas. Thanks.

Actually, about DBCPs. For RDBMSes, it's correct for database connections to time out after a set interval of inactivity. That's just prudence. When connections are used inside a DBCP, it is the DBCP's responsibility to refresh timed-out connections in the pool. I wrote my very first DBCP more than a decade ago, and that was one of the must-have functionalities for a DBCP. I was beaten up real bad for missing it. :)

The fix for this, in OFBiz, is in OFBiz's DBCP, XAPoolDataSource. I posted another message asking for the source code to ofbiz-minerva.jar.

Jonathon
Most of the data migration issues should be solvable by reading the migration data files (entity-engine-xml) into a Xindice XML database and running OFBiz services from there. This would allow XPath queries for the hierarchical stuff, and XUpdate of child elements for error messages, and thus grouped error handling using XPath. I hope to be able to get some of this together shortly, as we've been using OFBiz to slowly bring our legacy data in line with the OFBiz data model; the legacy data is still in MySQL (and is much more critical, so I can't really take the risk that I "think" I changed the right data).

If you don't have data, there shouldn't be any issue. Simply run ant run-install with a delegator calling the postgres datasource. The impression I get is that most of the community in production is using Postgres unless they've purchased a license for MSSQL or Oracle.

My limited reading through mailing lists on MySQL and Postgres shows that the timeout handling should be handled by the client (OFBiz), and that MySQL decided to provide an additional safeguard by moving an excessive, configurable, "catch-all" timeout to the server.

One of my favorite things about OFBiz is that you can get pretty far in developing something useful without ever knowing or worrying about database administration like this, and can come back and address it when you're fine-tuning your deployment for bottlenecks before going live.
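For reference, "a delegator calling the postgres datasource" is configured in OFBiz's entityengine.xml. A sketch of the relevant fragments (attribute values here are illustrative; consult the localpostgres example shipped with your OFBiz version):

```xml
<!-- entityengine.xml: point the default delegator at a Postgres datasource -->
<delegator name="default" entity-model-reader="main" entity-group-reader="main">
    <group-map group-name="org.ofbiz" datasource-name="localpostgres"/>
</delegator>

<datasource name="localpostgres" field-type-name="postgres"
        check-on-start="true" add-missing-on-start="true">
    <inline-jdbc
            jdbc-driver="org.postgresql.Driver"
            jdbc-uri="jdbc:postgresql://127.0.0.1/ofbiz"
            jdbc-username="ofbiz"
            jdbc-password="ofbiz"
            isolation-level="ReadCommitted"
            pool-minsize="2"
            pool-maxsize="250"/>
</datasource>
```

After this, ant run-install creates the schema and loads the seed/demo data directly into Postgres.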
Read this post:
http://www.nabble.com/MySQL-connection-problem-tf2486297.html#a9092103

/ Hans
Hans,
Thanks. That's the precise fix I needed. Why isn't it in the SVN trunk?

Jonathon
Chris,
You mean go from MySQL to Xindice to PostgreSQL? Yeah, I know, data migration can be a pain, even without data-mapping efforts to go with it (i.e., same structure migrated to same structure).

> My limited reading through mailing lists on mysql and postgres show that the
> timeout handling should be handled by the client (ofbiz) and that mysql
> decided to provided an additional safe gaurd by moving an excessive,
> configurable, "catch-all" timeout to the server.

If you're saying that a DBCP should assume that a DB connection can never time out (is always good), I rest my case. :) Personally, I think MySQL could've provided the option to drop the timeout feature.

FYI to all: if Hans is right, it appears OFBiz's DBCP XAPoolDataSource does handle connection timeout after all! I'll check it out.

Anyway, the fix, as mentioned by Hans, can be done in MinervaConnectionFactory. I'm gonna try just that. Problem solved. On with real work. You're right, OFBiz does abstract tons of infrastructure details away from application developers.

It would be good if we could get our hands on the XAPoolDataSource source code, just to confirm that the issue is fixed.

Jonathon
Chris, Hans,
Hans, first, thanks for your link to: http://www.nabble.com/MySQL-connection-problem-tf2486297.html#a9092103

FYI to all: I just tested the fix. It doesn't work. I still need the XAPoolDataSource source code to fix this. Did Katherine Morgan post anything else? The fix was suggested by her.

For those of you using MySQL, if you want to test this out, edit your my.cnf file, maybe at /etc/my.cnf. Under [mysqld], enter "wait_timeout=60". When OFBiz is fully started up, wait for 1 minute, then try to log in. Don't set it to 10 seconds, or you might never get it to work even if the DBCP functions correctly! OFBiz processes may hold a DB connection for more than 10 seconds, you never know.

Jonathon
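The my.cnf change described above looks like this; 60 seconds is just a test value that makes a stale-connection bug reproducible in a minute instead of 8 hours, and shouldn't be left in a production config:

```ini
# /etc/my.cnf -- test setting only: the server drops idle connections
# after 60s, so the pool's refresh behaviour can be observed quickly
[mysqld]
wait_timeout=60
```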
The current approach is this:

1. Export MySQL (or the origin database) to XML files.
2. Import the XML files one at a time, and either succeed or fail the entire file.

Because of the SAX parser, you're limited in the best case to what you can fit into memory (and because of OFBiz's data import logic, you're limited to one file at a time, even if two or more would fit in memory). Therefore you're guessing whether referential integrity is maintained (exists) across subsequent XML data files.

In the case of Postgres, once OFBiz creates the database schema, Postgres handles the referential integrity constraints. Because of this, it's not enough just to have the entity engine ignore its referential integrity error with a dummy key; the dummy key actually has to be written to the destination database (which in my experience did not happen; I didn't look any deeper into this as a solution because of the other remaining issues).

By reading all of the XML data files into an XML database first, child elements can be added to each record with error information. You'll also be able to trace through referential integrity to ensure it's maintained in subsequent records, and then actually import the data with the PK records going in first. Additionally, you can test an entire data set for importation, report back all the errors, and let the user make adjustments as needed. It should be very interesting and fairly simple to implement, as the solution is more logic-based than code-based.

It would be great if someone could review OFBIZ-851 and maybe add it as a specialpurpose app so that others can play with it and contribute ideas.
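The whole-data-set pre-check described above, collecting every primary key first and reporting dangling references before anything is written, can be sketched without an XML database at all. A toy version over entity-engine-xml strings; the fk_map describing which attribute points at which entity's key is an illustrative stand-in for walking the real entity model:

```python
import xml.etree.ElementTree as ET

def find_dangling_refs(xml_docs, fk_map):
    """Two-pass referential integrity check over entity-engine-xml documents.

    fk_map: {(child_tag, fk_attr): (parent_tag, pk_attr)} -- an illustrative
    stand-in for the relations OFBiz would read from its entity model.
    Pass 1 collects every primary key across *all* files; pass 2 reports
    references that point nowhere, before any row is imported.
    """
    pks = {}   # parent_tag -> set of key values seen anywhere
    rows = []
    for doc in xml_docs:
        for elem in ET.fromstring(doc):
            rows.append(elem)
            for (parent_tag, pk_attr) in fk_map.values():
                if elem.tag == parent_tag and pk_attr in elem.attrib:
                    pks.setdefault(parent_tag, set()).add(elem.get(pk_attr))
    errors = []
    for elem in rows:
        for (child_tag, fk_attr), (parent_tag, _) in fk_map.items():
            ref = elem.get(fk_attr)
            if elem.tag == child_tag and ref is not None:
                if ref not in pks.get(parent_tag, set()):
                    errors.append((child_tag, fk_attr, ref))
    return errors

docs = [
    '<entity-engine-xml><Party partyId="admin"/></entity-engine-xml>',
    '<entity-engine-xml><UserLogin userLoginId="admin" partyId="admin"/>'
    '<UserLogin userLoginId="ghost" partyId="missing"/></entity-engine-xml>',
]
fk_map = {("UserLogin", "partyId"): ("Party", "partyId")}
print(find_dangling_refs(docs, fk_map))
# -> [('UserLogin', 'partyId', 'missing')]
```

Because the key set is built across all files before any check runs, a reference satisfied by a later file is not reported as an error, which is exactly what the one-file-at-a-time SAX import cannot do.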
Chris,
Usually, in MySQL (or MSSQL or many other RDBMSes), we disable the foreign key constraint checks first, then dump in all the data, then re-enable the key constraints. That way, we don't have to figure out the top-down hierarchy of the database structure and insert data from the top to the bottom.

Does the data import tool allow the above? It's usually just a single flag to toggle on/off (foreign_key_checks for MySQL).

Jonathon
Jonathon,

That's interesting. Until you just said that, it had never even occurred to me to trust that the data being imported was consistent. When I first started playing with PHP and MySQL a few years ago, I had only used MyISAM tables, which don't (or at least didn't, perhaps still don't) handle the enforcement of referential integrity. So I guess that's where my bias lies.

In regards to the hierarchy, it's already specified in the entity model. It just needs to be walked. In any regard, I'll be playing with this in the near term, as it has several advantages and will help me with several other ideas that deal with the loads of XML in OFBiz.
 |
Chris,

Yeah, MyISAM was... "difficult". Other than the lack of referential integrity, transaction atomicity had to be specially handled. I don't know where MySQL would be today if not for InnoDB. I don't know where they'll be in future, since the InnoDB license seems to be changing soon.

As for your question of whether a MySQL database is consistent or not: it is. Since the OFBiz entity engine does create the referential keys, a MySQL-based OFBiz database is consistent.

The hierarchical walk you mentioned seems like too much work, perhaps even computationally expensive. I would've preferred a dumb dump (same structure to same structure), complete with referential keys. I'm sure there are very largely similar SQL-standard features between MySQL and PostgreSQL.

A typical application migration (turn off old app, migrate current data and whatnot, turn on new app) must take less than 8 hours (overnight). All migration work must be scripted to run automatically, completely free of human error.

Should we enhance the data import utility, perhaps? You mentioned it can only take in a few(?) tables/files (XML) at a time?

Jonathon
 |
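On the "scripted, free of human error" point: a dumb same-structure-to-same-structure dump is often just a driver script over two database connections. A rough sketch using Python's DB-API (table and column names are illustrative, not OFBiz's actual schema; the placeholder style is `?` for SQLite but would be `%s` for MySQLdb or psycopg2):

```python
def copy_table(src_conn, dst_conn, table, columns, paramstyle="?"):
    """Copy one table verbatim between two DB-API 2.0 connections.

    Assumes the destination schema already exists (e.g. created by the
    entity engine) and that FK checks are deferred or satisfied by the
    load order.
    """
    cols = ", ".join(columns)
    src = src_conn.cursor()
    src.execute(f"SELECT {cols} FROM {table}")
    placeholders = ", ".join([paramstyle] * len(columns))
    dst = dst_conn.cursor()
    dst.executemany(
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders})",
        src.fetchall(),  # fine for small tables; batch/stream for large ones
    )
    dst_conn.commit()
```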
On Apr 13, 2007, at 9:42 PM, Jonathon -- Improov wrote:
> The hierarchical walk you mentioned seems like too much work, perhaps even computationally expensive. I would've preferred a dumb dump (same structure to same structure), complete with referential keys. I'm sure there are very largely similar SQL-standard features between MySQL and PostgreSQL.

Actually, it's worse: it is not possible. The problem is that it isn't a hierarchy, it's a graph, and this graph has loops in it... So, yeah, an ordered traversal is impossible.

-David
 |
David,

Yeah, you're right. You just reminded me of the loops in the data structure (self-referencing entities?).

I'll look for a way to port MySQL data directly to PostgreSQL then. Even using standard SQL shouldn't be a problem, I suppose.

I think it's time I started getting in touch with PostgreSQL.

Jonathon
 |
With the OFBiz tools there are a couple of options:

1. On the check/update page in webtools you can drop and recreate all foreign keys.
2. In the import, use the dummy-fks option, which will insert dummy records as needed, with PK-only values, to satisfy foreign key constraints.

Yeah, there are various loops: self-referencing entities (A -> A) and multi-entity loops (A->B->A; A->B->C->A; etc.).

-David
 |
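To make the dummy-fks idea concrete: before a child row goes in, a placeholder parent row containing only its primary key is inserted so the FK constraint is satisfied. A toy sketch of that idea against SQLite (the entity/column names are illustrative, and `INSERT OR IGNORE` is SQLite syntax — MySQL has `INSERT IGNORE`, PostgreSQL `ON CONFLICT DO NOTHING`; this only mirrors the described behaviour, it is not the entity engine's actual code):

```python
import sqlite3

def ensure_dummy_parent(conn, table, pk_col, pk_val):
    """Insert a PK-only placeholder row so a later child insert
    passes the foreign key check. Harmless if the row already exists."""
    conn.execute(
        f"INSERT OR IGNORE INTO {table} ({pk_col}) VALUES (?)", (pk_val,))

def load_user_login(conn, user_login_id, party_id):
    # Satisfy the UserLogin -> Party FK before inserting the child row.
    ensure_dummy_parent(conn, "party", "party_id", party_id)
    conn.execute("INSERT INTO user_login VALUES (?, ?)",
                 (user_login_id, party_id))
```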
--- "David E. Jones" <[hidden email]> wrote:
[...]
> 2. in the import use the dummy-fks option which will insert dummy
> records as needed with pk only values to satisfy primary key
> constraints

Are the dummy records supposed to actually be inserted into the database, or simply to pacify the entity engine? If it's the former, it did not occur in my migration with code from about mid-December '06 with the target being Postgres.

Thanks!
 |
--- "David E. Jones" <[hidden email]> wrote:
> Yeah, there are various loops: self-referencing entities (A -> A) and
> multi-entity loops (A->B->A; A->B->C->A; etc).
> -David

So, I went ahead and wrote a script to walk and order the entities, and it turns out there are only two loops, which are actually more like kinks (granted, it takes 17 passes to reduce the relationships to these two loops, but it does get there). Knowing the order of entering entity data that won't fail and need to be retried on subsequent passes will more than make up for the three minutes of processing time it takes to determine.

These should be identifiable by A-B = B-A. The A->B->C->A loops and greater would obviously be difficult to identify, but they don't currently exist in OFBiz, so I'll assume they're theoretical and not likely to exist in a highly normalized generic data model.

You have the self-referencing entities (A=A), whose referential integrity issues you can avoid by walking the record hierarchy of that entity parent->child. These are easily identified by having both the entity and rel-entity equal to one another.

The two restricting loops are both A->B->A:

1. UserLogin->Party->UserLogin
This is caused by a denormalized (non-normalized) field, Party.createdBy, and the application-specific field UserLogin.partyId.

2. FinAccountTrans->Payment->FinAccountTrans
I haven't looked at the application logic, but it appears by looking at the data model that either FinAccountTrans.paymentId or Payment.finAccountTransId is redundant. Judging by the rest of FinAccountTrans, I would say that paymentId is the one misplaced, as there is much denormalized information. I wouldn't suspect that this is a heavily read area of the data model that requires denormalization.

#1 can be addressed by ordering the records, or by treating it as a graph: create a two-column temporary join table (A__B, i.e. UserLogin__Party) to hold the referential data, set the FK to null, load all the records, then run an update from the temporary table back to the original entity.

#2 can probably be addressed by fixing the logic, as there are likely 1:1 relationships between the records and therefore a misplaced FK.
 |
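The multi-pass reduction described above can be sketched roughly like this — a toy Kahn-style ordering over (entity, depends-on-entity) FK pairs, not the actual script; self-references are dropped at the entity level since they're handled by ordering records instead:

```python
def order_entities(relations):
    """Multi-pass ordering from (entity, depends_on_entity) pairs.

    Each pass admits every entity whose remaining dependencies are
    already loaded; whatever survives all passes is caught in a loop
    (the A->B->A "kinks" above). Self-references (A->A) are ignored
    here because they are resolved by ordering records, not entities.
    """
    entities = {e for pair in relations for e in pair}
    deps = {e: {p for (c, p) in relations if c == e and p != e}
            for e in entities}
    passes, loaded = [], set()
    while True:
        ready = sorted(e for e in entities - loaded if deps[e] <= loaded)
        if not ready:
            break
        passes.append(ready)
        loaded.update(ready)
    return passes, entities - loaded  # (ordered passes, looped entities)
```

Run against a tiny fragment of the model, the UserLogin/Party kink falls out as the unresolvable remainder, just as described.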
I'm not sure what you used to find loops in the graph, but I know just off the top of my head that there are others, and I'm guessing quite a few others. Actually, a highly normalized data model that fulfills a large number of different requirements probably tends to cause more loops.

Some off the top of my head real quick:

- ProductCategory points to self
- ProductStore points to Facility, Facility points to ProductStore

Certain other ones are loops but are actually workable, like ProductCategory(1) -> ProductCategoryRollup -> ProductCategory(2) -> ProductCategoryRollup -> ProductCategory(1), which is okay because you can do the ProductCategory records first and then do the rollups.

One thing to consider is that there are certain loops that can possibly be resolved by sorting records instead of entities. That makes things significantly more complicated, but for certain scenarios it's the only solution.

If you really want to go for this and try to code something that will sort both entities and records within entities to get something that will handle most but not all cases, go for it, I won't stop you.

I'm just warning you that I looked into this once and did the background work and found it to be a tough nut to crack, probably not worth the effort as other solutions are easier. Also, this has been tried before by other people who made some progress, but when they got into testing with real data and such they ran into problem after problem and never found a workable solution.

Maybe the best thing for me to do is shut up and let you give it a try. Let me know how it goes.

-David
 |
First, thank you very much for the feedback! Your comments have been extremely helpful in wrapping my brain around this nut. Comments inline.

--- "David E. Jones" <[hidden email]> wrote:
>
> I'm not sure what you used to find loops in the graph but I know just
> off the top of my head that there are others, and I'm guessing quite
> a few others. Actually a highly normalized data model that fulfills a
> large number of different requirements probably tends to cause more
> loops.

I put the logic at the end for critique.

> Some off the top of my head real quick:
>
> - ProductCategory points to self

All of the self-referential issues have been accounted for. These are found with a relationship of entity=relEntity, and therefore records must be ordered for the pass to be successful.

> - ProductStore points to Facility, Facility points to ProductStore

This is a one-nofk relationship. I assumed that this type of relationship doesn't require maintaining referential integrity. Was this assumption correct? If so, would this be a better relationship type for UserLogin->Party?

> Certain other ones are loops but are actually workable, like
> ProductCategory(1) -> ProductCategoryRollup -> ProductCategory(2) ->
> ProductCategoryRollup -> ProductCategory(1) which is okay because you
> can do the ProductCategory records first and then do the rollups.

This is another one-nofk.

> One thing to consider is that there are certain loops that can
> possibly be resolved by sorting records instead of entities. That
> makes things significantly more complicated, but for certain
> scenarios it's the only solution.

This is how UserLogin -> Party would be handled, assuming we don't change the relationship to one-nofk :-)

> Maybe the best thing for me to do is shut up and let you give it a
> try. Let me know how it goes.

I wish you wouldn't. The gotcha scenarios are very helpful in the cracking process. If the gotchas can't be ungotched... then warnings and caveats can be put into the most workable solution so that it can be left for someone else to pick up and work on only the uncracked portions.

> -David

Two entities:
1) EntityMap [entity, insertionPass, insertionScenario]
2) EntityRelationshipMap [primaryEntity, relatedEntity, ignoreRelationshipFlag]

Logic process:

Setup
1) Load all entities into EntityMaps.
2) Load all relationships of type one into EntityRelationshipMaps; set all ignoreRelationshipFlags to "N".

Locate relationship scenario A->B->A
3) Iterate the list looking for instances of primaryEntity_relatedEntity = relatedEntity_primaryEntity. When found, set ignoreRelationshipFlag to "Y" and set EntityMap.insertionScenario for the primaryEntity and relatedEntity to "ABA".

Locate relationship scenario A->A
4) Iterate the list looking for instances where primaryEntity = relatedEntity; set ignoreRelationshipFlag to "Y" and set EntityMap.insertionScenario to "AA".

Determine order
5) Find all EntityMaps where insertionPass is null.
6) Iterate the list; find the number of times each entity exists in the EntityRelationshipMap where ignoreRelationshipFlag = "N".
7) For all that return 0, set EntityMap.insertionPass to the current pass number; find all records where the current entity exists in EntityRelationshipMap.relatedEntity and set EntityRelationshipMap.ignoreRelationshipFlag to "Y".
8) Increment the current pass; go to 5 until no more exist.
 |
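For concreteness, steps 1–8 above can be rendered roughly like this in Python — a sketch of the described control flow, not the actual implementation; dicts and sets stand in for the EntityMap/EntityRelationshipMap entities:

```python
def plan_insertion(relationships):
    """Steps 1-8: flag A->B->A and A->A relations, then number passes.

    relationships: iterable of (primaryEntity, relatedEntity) pairs,
    where the primary entity carries a FK to the related entity.
    Returns (insertionPass, insertionScenario) dicts keyed by entity.
    """
    rels = set(relationships)                        # steps 1-2
    entities = sorted({e for pair in rels for e in pair})
    scenario, ignored = {}, set()
    for (a, b) in rels:
        if a != b and (b, a) in rels:                # step 3: A->B->A kinks
            ignored |= {(a, b), (b, a)}
            scenario[a] = scenario[b] = "ABA"
        elif a == b:                                 # step 4: self-references
            ignored.add((a, b))
            scenario.setdefault(a, "AA")
    active = rels - ignored
    insertion_pass, pass_no = {}, 1
    while len(insertion_pass) < len(entities):       # steps 5-8
        ready = [e for e in entities if e not in insertion_pass
                 and not any(p == e for (p, r) in active)]
        if not ready:
            break  # a longer loop (A->B->C->A) would stall here
        for e in ready:
            insertion_pass[e] = pass_no
        active = {(p, r) for (p, r) in active if r not in ready}
        pass_no += 1
    return insertion_pass, scenario
```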