Pentaho Metadata

When a Business View contains 5 or more tables, incorrect SQL is generated when a subset of tables is selected from.

Details

  • Type: Bug Bug
  • Status: Open Open
  • Priority: Severe Severe
  • Resolution: Unresolved
  • Affects Version/s: Pentaho 1.6 GA
  • Fix Version/s: SUGAR
  • Component/s: SQL Generator
  • Description:
    When a Business View contains 5 or more tables, incorrect SQL is generated when a subset of tables is selected from. It ends up adding extra joins to tables that are not necessary for the generated query.
  1. metadata-patch.zip
    (92 kB)
    Nick Coleman
    26/Sep/08 12:58 PM

Activity

Hide
Will Gorman added a comment - 20/Dec/07 3:52 PM
SVN Commit #17880, modified BusinessModel.getShortestPathBetween() algorithm
Show
Will Gorman added a comment - 20/Dec/07 3:52 PM SVN Commit #17880, modified BusinessModel.getShortestPathBetween() algorithm
Hide
Nick Coleman added a comment - 26/Sep/08 12:58 PM
The issues stems from the fact that the code assumes N tables will always be connected through N-1 connections. However, intermediate tables may really be only to connect tables that the user is actually interested in which changes the problem to needing > N-1 connections.

I started with a crazy model that incorporated 15 tables. Several of them were outer joins to the same table. I ran into the getShortestPath bug and I created a new solution. It is enclosed entirely withing the org.pentaho.pms.mql.graph package. It uses a basic AC3-like arc consistency algorithm to allow find the best path quickly and efficiently. I then continued to add support to allow the user to choose between one of five different techniques for generating a path:
1) All - use all tables and joins
2) Any Relevant - use all joins that lead to any used table regardless if there is more than one path
3) First - returns the first path that only contains that contains no redundant joins between tables
4) Shortest - returns a path connecting every required table with the smallest number of joins
5) Best Score - returns a path that chooses all the joins that will create the smallest score based on the estimated size of a table

After this I went on and included some code from the trunk to use outer joins and discovered some bugs in the DefaultSQLDialect.java class when making assumptions about how to build inner and outer joins.

Then I added another property to the BusinessTable that is a where clause filter which gets added any time the table is used. This was necessary for me because I needed to import the same table multiple times with a different selection of data in each table.

I also enhanced to the PMSFormula.java class to make the resolving of columns in formulas more consistent. I was finding that the class used one method during validation and another during actual code generation so I changed it to resolve as follows:
   * 1) <BUSINESS TABLE ID>.<BUSINESS COLUMN ID>
   * 2) <BUSINESS CATEGORY ID>.<BUSINESS COLUMN ID>
   * 3) <PHYSICAL TABLE>.<PHYSICAL COLUMN>
   * 4) <PHYSICAL TABLE ID>.<PHYSICAL COLUMN ID>
   * 5) <BUSINESS COLUMN ID>
   * 6) <PHYSICAL COLUMN>
   * 7) <PHYSICAL COLUMN ID>



I would hope that Pentaho will use all of these enhancements, but at the very least it should look at the getShortestPath changes and the DefaultSQLDialect changes. The get shortest path changes will work for the latest code in the 1.7 branch and the trunk, but the other files are more appropriate for the trunk given the use of outer joins.

-Nick
Show
Nick Coleman added a comment - 26/Sep/08 12:58 PM The issues stems from the fact that the code assumes N tables will always be connected through N-1 connections. However, intermediate tables may really be only to connect tables that the user is actually interested in which changes the problem to needing > N-1 connections. I started with a crazy model that incorporated 15 tables. Several of them were outer joins to the same table. I ran into the getShortestPath bug and I created a new solution. It is enclosed entirely withing the org.pentaho.pms.mql.graph package. It uses a basic AC3-like arc consistency algorithm to allow find the best path quickly and efficiently. I then continued to add support to allow the user to choose between one of five different techniques for generating a path: 1) All - use all tables and joins 2) Any Relevant - use all joins that lead to any used table regardless if there is more than one path 3) First - returns the first path that only contains that contains no redundant joins between tables 4) Shortest - returns a path connecting every required table with the smallest number of joins 5) Best Score - returns a path that chooses all the joins that will create the smallest score based on the estimated size of a table After this I went on and included some code from the trunk to use outer joins and discovered some bugs in the DefaultSQLDialect.java class when making assumptions about how to build inner and outer joins. Then I added another property to the BusinessTable that is a where clause filter which gets added any time the table is used. This was necessary for me because I needed to import the same table multiple times with a different selection of data in each table. I also enhanced to the PMSFormula.java class to make the resolving of columns in formulas more consistent. I was finding that the class used one method during validation and another during actual code generation so I changed it to resolve as follows:    * 1) <BUSINESS TABLE ID>.<BUSINESS COLUMN ID>    * 2) <BUSINESS CATEGORY ID>.<BUSINESS COLUMN ID>    * 3) <PHYSICAL TABLE>.<PHYSICAL COLUMN>    * 4) <PHYSICAL TABLE ID>.<PHYSICAL COLUMN ID>    * 5) <BUSINESS COLUMN ID>    * 6) <PHYSICAL COLUMN>    * 7) <PHYSICAL COLUMN ID> I would hope that Pentaho will use all of these enhancements, but at the very least it should look at the getShortestPath changes and the DefaultSQLDialect changes. The get shortest path changes will work for the latest code in the 1.7 branch and the trunk, but the other files are more appropriate for the trunk given the use of outer joins. -Nick
Hide
Matt Casters added a comment - 12/Mar/09 3:11 AM
If the changes stack up and the unit tests reasonably pass I would love to see this new algorithm at work.
Show
Matt Casters added a comment - 12/Mar/09 3:11 AM If the changes stack up and the unit tests reasonably pass I would love to see this new algorithm at work.
Hide
Will Gorman added a comment - 31/Mar/10 8:56 AM
We should evaluate the patch for inclusion
Show
Will Gorman added a comment - 31/Mar/10 8:56 AM We should evaluate the patch for inclusion

People

Dates

  • Created:
    19/Dec/07 3:23 PM
    Updated:
    08/Apr/10 8:30 AM