History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: PMD-263
Type: Bug Bug
Status: Ready For Test Ready For Test
Priority: Unknown Unknown
Assignee: Will Gorman
Reporter: Jared Cornelius
Votes: 1
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
Pentaho Metadata

When a Business View contains 5 or more tables, incorrect SQL is generated when a subset of tables is selected from.

Created: 19/Dec/07 03:23 PM   Updated: 12/Mar/09 03:11 AM
Component/s: SQL Generator
Affects Version/s: Pentaho 1.6 GA
Fix Version/s: Future Release

Time Tracking:
Not Specified

File Attachments: 1. Zip Archive metadata-patch.zip (92 kb)



 Description  « Hide
When a Business View contains 5 or more tables, incorrect SQL is generated when a subset of tables is selected from. It ends up adding extra joins to tables that are not necessary for the generated query.

 All   Comments   Work Log   Change History   Transitions      Sort Order: Ascending order - Click to sort in descending order
Will Gorman - 20/Dec/07 03:52 PM
SVN Commit #17880, modified BusinessModel.getShortestPathBetween() algorithm

Nick Coleman - 26/Sep/08 12:58 PM
The issues stems from the fact that the code assumes N tables will always be connected through N-1 connections. However, intermediate tables may really be only to connect tables that the user is actually interested in which changes the problem to needing > N-1 connections.

I started with a crazy model that incorporated 15 tables. Several of them were outer joins to the same table. I ran into the getShortestPath bug and I created a new solution. It is enclosed entirely withing the org.pentaho.pms.mql.graph package. It uses a basic AC3-like arc consistency algorithm to allow find the best path quickly and efficiently. I then continued to add support to allow the user to choose between one of five different techniques for generating a path:
1) All - use all tables and joins
2) Any Relevant - use all joins that lead to any used table regardless if there is more than one path
3) First - returns the first path that only contains that contains no redundant joins between tables
4) Shortest - returns a path connecting every required table with the smallest number of joins
5) Best Score - returns a path that chooses all the joins that will create the smallest score based on the estimated size of a table

After this I went on and included some code from the trunk to use outer joins and discovered some bugs in the DefaultSQLDialect.java class when making assumptions about how to build inner and outer joins.

Then I added another property to the BusinessTable that is a where clause filter which gets added any time the table is used. This was necessary for me because I needed to import the same table multiple times with a different selection of data in each table.

I also enhanced to the PMSFormula.java class to make the resolving of columns in formulas more consistent. I was finding that the class used one method during validation and another during actual code generation so I changed it to resolve as follows:
   * 1) <BUSINESS TABLE ID>.<BUSINESS COLUMN ID>
   * 2) <BUSINESS CATEGORY ID>.<BUSINESS COLUMN ID>
   * 3) <PHYSICAL TABLE>.<PHYSICAL COLUMN>
   * 4) <PHYSICAL TABLE ID>.<PHYSICAL COLUMN ID>
   * 5) <BUSINESS COLUMN ID>
   * 6) <PHYSICAL COLUMN>
   * 7) <PHYSICAL COLUMN ID>



I would hope that Pentaho will use all of these enhancements, but at the very least it should look at the getShortestPath changes and the DefaultSQLDialect changes. The get shortest path changes will work for the latest code in the 1.7 branch and the trunk, but the other files are more appropriate for the trunk given the use of outer joins.

-Nick

Matt Casters - 12/Mar/09 03:11 AM
If the changes stack up and the unit tests reasonably pass I would love to see this new algorithm at work.