TECH: Kettle, Database join replace variables
The "Replace variables" check box in Kettle database join transformations can intuitively mean two things. It could be "replace values" for variables defined / declared in earlier transformation steps. But, this is not what it is. The check box enables inserting job scope variables in sql code.
A job variable can be defined in a "Set variables" job entry. For instance we can set STARTDATE_VAR to '2012-08-25'. In sql we can use this variable with
....
where
(factsWebLog.full_request_date >= ${STARTDATE_VAR});
The syntax is the same in Table input steps.
****
Replace values is a different story. I programmed this functionality using the already present "Replace Variables" check box. The code is not suitable for production systems.
Such functionality could make transformations simpler than today. We could then initialize the variables before branches and run database join without thinking about keeping the attributes in the same order.
With the changes outlined below, we can initializing all needed variables before use in a transformation. We then do not have to worry about automatically created dublicates and different number and order of variables when branches join again.
What the changes in practice does, is to search for previous variables (attributes/ fields) with the same name instead of adding them. The changes are quite simple.
A limitation: if ReplaceVariables is checked, all variables have to be created beforehand. Then, database join does not automatically create missing variables.
In src/org/pentaho/di/trans/steps/databasejoin/DatabaseJoinMeta.java, I added an if around Row.addRowMeta. The fields should only be added to the row if Variable Replace is not selected. If selected, we use fields that are already present.
if (isVariableReplace() == false) {
Row.addRowMeta(add);
}
In DatabaseJoin.java I declare and load an array of strings with the fields returned by the sql statement. This is in the class DatabaseJoin, method lookupValues, just before the while statement:
String[] newFields = addMeta.getFieldNames();
Then within the for statement I added:
if (MetaPr.isVariableReplace()) {
int prevFieldIndex = RowMeta.indexOfValue(newFields[i]);
if (prevFieldIndex == -1) {
log.logError(toString(), "The new variable from Database join: " + newFields[i] + " is not found among the previous variables! It has to be declared earlier when replace_variable is checked!");
}
/* This new statement adds
the value returned from sql to an already present field */
newRow[prevFieldIndex] = add[i];
}
else {
/* This is the previous statement adding
the new value to the new field */
newRow[newIndex++] = add[i];
}
A little below in the same method, after // Nothing found? Perhaps we have to put something out after all? I added:
if (MetaPr.isVariableReplace()) {
int prevFieldIndex = RowMeta.indexOfValue(newFields[i]);
if (prevFieldIndex == -1) {
log.logError(toString(), "The new variable from Database join: " + newFields[i] + " is not found among the previous variables! It has to be declared earlier when replace_variable is checked!");
}
newRow[prevFieldIndex] = DataPr.notfound[i];
}
else {
/* This is the previous statement */
newRow[newIndex++] = DataPr.notfound[i];
}
As you see, the changes are quite simple. This shows the strength of open source. You can change the code if it does not do what you want. However it is not pleasant to maintain own code and syncronize with new Kettle versions. So this code is mainly for experimental purposes.
Please contact birger@telescope.no if you have questions or comments.
Skrevet av Birger Baksaas
18.03.2010.
Se flere innlegg
|