|
|
TECH: Kettle, Database join replace values
Replace values in Kettle makes the transformations much simpler than before. We can then initialize the variables before branches and run database join without thinking about keeping the attributes in the right order.
According to the change log at Pentaho, this should have been implemented/ fixed in 3.2.3, but this revision is not available at sourceforge. It is not corrected in 3.2.5 source. The change log indicates that the problem is not properly understood. I therefore programmed the changes.
With the changes below, we can do more ordinary programming by initializing all needed variables before operations. We do not have to worry about automatically created dublicates and different number of variables after branches join again.
What the changes in practice does, is to search for previous variables (attributes/ fields) with the same name instead of adding them. The changes are quite simple.
A limitation: if ReplaceVariables is checked, all variables have to be created beforehand. Then, database join does not automatically create missing variables.
In src/org/pentaho/di/trans/steps/databasejoin/DatabaseJoinMeta.java, I added an if around Row.addRowMeta. The fields should only be added to the row if Variable Replace is not selected. If selected, we use fields that are already present.
if (isVariableReplace() == false) {
Row.addRowMeta(add);
}
In DatabaseJoin.java I declare and load an array of strings with the fields returned by the sql statement. This is in the class DatabaseJoin, method lookupValues, just before the while statement:
String[] newFields = addMeta.getFieldNames();
Then within the for statement I added:
if (MetaPr.isVariableReplace()) {
int prevFieldIndex = RowMeta.indexOfValue(newFields[i]);
if (prevFieldIndex == -1) {
log.logError(toString(), "The new variable from Database join: " + newFields[i] + " is not found among the previous variables! It has to be declared earlier when replace_variable is checked!");
}
/* This new statement adds
the value returned from sql to an already present field */
newRow[prevFieldIndex] = add[i];
}
else {
/* This is the previous statement adding
the new value to the new field */
newRow[newIndex++] = add[i];
}
A little below in the same method, after // Nothing found? Perhaps we have to put something out after all? I added:
if (MetaPr.isVariableReplace()) {
int prevFieldIndex = RowMeta.indexOfValue(newFields[i]);
if (prevFieldIndex == -1) {
log.logError(toString(), "The new variable from Database join: " + newFields[i] + " is not found among the previous variables! It has to be declared earlier when replace_variable is checked!");
}
newRow[prevFieldIndex] = DataPr.notfound[i];
}
else {
/* This is the previous statement */
newRow[newIndex++] = DataPr.notfound[i];
}
As you see, the changes are quite simple. This shows the strength of open source. You can change the code if it does not do what you want.
Please contact birger@telescope.no if you have questions or comments.
Skrevet av Birger Baksaas
18.03.2010.
|
|
|