This node helps handle missing values found in cells of the input
table. The first tab in the dialog (labeled "Default") provides
default handling options for all columns of a given type.
These settings apply to all columns in the input table that are not
explicitly mentioned in the second tab, labeled "Individual". This
second tab permits individual settings for each available column
(thus overriding the default). To make use of this second approach,
select a column or a list of columns that need
extra handling, click "Add", and set the parameters. Clicking on the
label with the column name(s) selects all covered columns
in the column list. To remove this extra handling (and instead use
the default handling), click the "Remove" button for this column.
Options marked with an asterisk (*) will result in non-standard PMML.
If you select such an option, the warning label in the dialog will become
red and a warning will be shown during execution of the node. Non-standard
PMML uses extensions that cannot be read by tools other than KNIME.
Mean
Calculates the mean value of all non-missing cells in a column
and replaces the missing values with this mean.
This missing value handler produces valid PMML 4.2.
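As an illustration only, the mean replacement described above can be sketched in plain Python. This is not the node's implementation: the function name fill_with_mean is hypothetical, and None stands in for a missing cell.

```python
def fill_with_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: with no non-missing cells there is no mean, so nothing is replaced.
        return list(values)
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]
```

For example, fill_with_mean([1.0, None, 3.0]) fills the gap with 2.0.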
Moving Average*
Calculates the mean of all values that are within the window given by
the lookahead and lookbehind and replaces missing values with this mean.
This missing value handler does not produce standard PMML 4.2!
The number of cells to take into account before and after the current cell can be
set using the options lookbehind and lookahead, respectively.
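The windowing idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the node's actual code: the function name is hypothetical, None marks a missing cell, missing cells inside the window are skipped, the window is built from the original values (not from already replaced ones), and a cell whose window contains no value stays missing.

```python
def fill_with_moving_average(values, lookbehind, lookahead):
    """Replace None entries with the mean of the non-missing neighbours in the window."""
    result = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        start = max(0, i - lookbehind)            # first cell of the window
        end = min(len(values), i + lookahead + 1)  # one past the last cell of the window
        window = [w for w in values[start:end] if w is not None]
        if window:
            result[i] = sum(window) / len(window)
    return result
```

With lookbehind=1 and lookahead=1, the column 1 2 ? 4 9 would have its gap filled with the mean of 2 and 4.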
Fix Value (Double)
Replaces missing values with a double given by the user.
This missing value handler produces valid PMML 4.2.
Maximum
Finds the column's largest value and replaces all missing values with it.
This missing value handler produces valid PMML 4.2.
Rounded Mean
Calculates the mean value of all non-missing cells in a column, rounds it,
and replaces the missing values with this rounded mean.
This missing value handler produces valid PMML 4.2.
Fix Value (Integer)
Replaces missing values with an integer number given by the user.
This missing value handler produces valid PMML 4.2.
Minimum
Finds the column's smallest value and replaces all missing values with it.
This missing value handler produces valid PMML 4.2.
Most Frequent Value
Calculates the most frequent value in a column
and replaces the missing values with it.
This missing value handler produces valid PMML 4.2.
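A small sketch of the most-frequent-value replacement, using Python's collections.Counter. The function name is hypothetical and None stands in for a missing cell. How the node breaks ties between equally frequent values is not stated above; this sketch simply keeps the value that appears first.

```python
from collections import Counter


def fill_with_most_frequent(values):
    """Replace None entries with the most frequent non-missing value."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to count
    # most_common(1) returns a list with one (value, count) pair.
    mode, _count = Counter(present).most_common(1)[0]
    return [mode if v is None else v for v in values]
```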
Previous*
This missing value handler replaces missing values with the last encountered
non-missing value in the column it is configured for.
When dealing with tables that have a large number of rows but not too many columns
that need missing value replacement, the option to use disk-backed statistics
avoids flooding the main memory. This option should be used with caution, as it is generally
much slower than in-memory statistics.
This missing value handler does not produce standard PMML 4.2!
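The "Previous" strategy amounts to carrying the last seen value forward. A minimal sketch, assuming None marks a missing cell and that leading missing cells (which have no predecessor) stay missing; the function name is hypothetical and the node's behaviour in that edge case is not stated above.

```python
def fill_with_previous(values):
    """Replace None entries with the last non-missing value seen before them."""
    result = []
    last = None  # last non-missing value encountered so far
    for v in values:
        if v is not None:
            last = v
        result.append(last)
    return result
```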
Remove Row*
This missing value handler removes rows that have a missing value in the column
it is configured for.
This missing value handler does not produce standard PMML 4.2!
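Row removal can be pictured as a simple filter. In this sketch, rows are modelled as dictionaries and None marks a missing cell; both choices, and the function name, are assumptions made for illustration.

```python
def remove_rows_with_missing(rows, column):
    """Keep only the rows whose cell in the given column is not missing."""
    return [row for row in rows if row[column] is not None]
```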
Median
Finds the column's median value and replaces all missing values with it.
For large tables this might be computationally expensive because the table needs
to be sorted to find the median.
This missing value handler produces valid PMML 4.2.
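A sketch of median replacement using Python's statistics.median, which sorts the data internally and so reflects the cost mentioned above. The function name is hypothetical and None marks a missing cell. Note that statistics.median averages the two middle values when the count is even; whether the node does the same is an assumption.

```python
from statistics import median


def fill_with_median(values):
    """Replace None entries with the median of the non-missing entries."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # no median can be computed
    mid = median(present)
    return [mid if v is None else v for v in values]
```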
Linear Interpolation*
This missing value handler replaces missing values with the linear interpolation
between the last encountered and next non-missing value.
The column 1 2 ? ? 5 6, for example, would be interpolated to 1 2 3 4 5 6.
This missing value handler does not produce standard PMML 4.2!
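The interpolation in the example above can be reproduced with a short sketch. It assumes rows are evenly spaced, that None marks a missing cell, and that gaps at the start or end of the column (which lack one of the two neighbours) stay missing; the function name is hypothetical.

```python
def fill_linear(values):
    """Fill None entries by linear interpolation between the surrounding known values."""
    result = list(values)
    prev_idx = None  # index of the last non-missing value seen
    for i, v in enumerate(values):
        if v is None:
            continue
        if prev_idx is not None and i - prev_idx > 1:
            # Constant increment per row between the two known values.
            step = (v - values[prev_idx]) / (i - prev_idx)
            for j in range(prev_idx + 1, i):
                result[j] = values[prev_idx] + step * (j - prev_idx)
        prev_idx = i
    return result
```

Called on the column 1 2 ? ? 5 6 (with ? written as None), the two gaps are filled with 3.0 and 4.0.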
Linear Interpolation*
This missing value handler replaces missing values with the linear interpolation
between the previous and next encountered
non-missing value in the column it is configured for.
When dealing with tables that have a large number of rows but not too many columns
that need missing value replacement, the option to use disk-backed statistics
avoids flooding the main memory. This option should be used with caution, as it is generally
much slower than in-memory statistics.
This missing value handler does not produce standard PMML 4.2!
Fix Value (String)
Replaces missing values with a string given by the user.
This missing value handler produces valid PMML 4.2.
Average Interpolation*
This missing value handler replaces missing values with the average value of
the previous and next encountered non-missing value in the column it is configured for.
When dealing with tables that have a large number of rows but not too many columns
that need missing value replacement, the option to use disk-backed statistics
avoids flooding the main memory. This option should be used with caution, as it is generally
much slower than in-memory statistics.
This missing value handler does not produce standard PMML 4.2!
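Unlike linear interpolation, average interpolation gives every cell in a gap the same value. A minimal sketch, assuming None marks a missing cell and that gaps without both a previous and a next value stay missing; the function name is hypothetical.

```python
def fill_average_interpolation(values):
    """Replace None entries with the average of the previous and next known values."""
    n = len(values)
    # next_known[i] holds the nearest non-missing value at or after index i.
    next_known = [None] * n
    upcoming = None
    for i in range(n - 1, -1, -1):
        if values[i] is not None:
            upcoming = values[i]
        next_known[i] = upcoming

    result = list(values)
    previous = None  # nearest non-missing value before the current index
    for i, v in enumerate(values):
        if v is not None:
            previous = v
        elif previous is not None and next_known[i] is not None:
            result[i] = (previous + next_known[i]) / 2
    return result
```

For the column 1 ? ? 5, both gaps would be filled with 3.0, whereas linear interpolation would produce two different values.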
Fix Value
No description provided.
0 | Table with missing values |
0 | Table with replaced missing values |
1 | Table with PMML documenting the missing value replacement |