Deduplication of Address Data

The workflow shows the power of the new distance measurement framework: possible matches are predicted with high accuracy using a minimal number of nodes and no preprocessing, simply by aggregating distances computed on different attributes. The chosen data set is the "Restaurant data set", comprising 864 restaurant records and 112 duplicates. Each record contains a name, an address, a city, a type and, finally, a class attribute. Records with an identical value in the class attribute refer to the same real-world entity, in this case the same restaurant.
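The core idea of computing a distance per attribute, aggregating those distances into one score and flagging record pairs whose score falls below a threshold can be sketched outside KNIME. The Python sketch below is an illustration under stated assumptions, not the workflow's implementation: it assumes a normalized Levenshtein distance for every attribute, an unweighted mean as the aggregation and a hypothetical threshold of 0.3. The function names and the three records are made up for illustration and are not rows of the Restaurant data set. The class attribute is used only to score the predicted pairs, mirroring its role as ground truth.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance scaled to [0, 1] by the longer string's length."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


# Assumption: all four descriptive attributes are compared and weighted equally.
ATTRIBUTES = ("name", "address", "city", "type")


def aggregated_distance(r1: dict, r2: dict, attributes=ATTRIBUTES) -> float:
    """Unweighted mean of the per-attribute distances (hypothetical choice)."""
    return sum(normalized_distance(r1[k], r2[k]) for k in attributes) / len(attributes)


def candidate_pairs(records: list, threshold: float = 0.3) -> list:
    """Index pairs whose aggregated distance is at or below the threshold."""
    return [(i, j)
            for (i, r1), (j, r2) in combinations(enumerate(records), 2)
            if aggregated_distance(r1, r2) <= threshold]


def precision_recall(records: list, predicted: list) -> tuple:
    """Score predicted pairs against pairs that share a class value."""
    true_pairs = {(i, j)
                  for (i, r1), (j, r2) in combinations(enumerate(records), 2)
                  if r1["class"] == r2["class"]}
    predicted = set(predicted)
    hits = len(true_pairs & predicted)
    # Convention: an empty prediction (or empty truth) counts as 1.0.
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(true_pairs) if true_pairs else 1.0
    return precision, recall


# Made-up example records; records 0 and 1 describe the same restaurant.
records = [
    {"name": "Blue Fork Cafe", "address": "12 Main St.", "city": "Springfield",
     "type": "American", "class": 1},
    {"name": "Blue Fork Cafe", "address": "12 Main Street", "city": "Springfield",
     "type": "American", "class": 1},
    {"name": "Golden Noodle", "address": "400 Oak Ave.", "city": "Shelbyville",
     "type": "Chinese", "class": 2},
]

pairs = candidate_pairs(records)
print(pairs)                             # → [(0, 1)]
print(precision_recall(records, pairs))  # → (1.0, 1.0)
```

An unweighted mean treats name, address, city and type as equally informative. Weighting name and address more heavily, or tuning the threshold against the class attribute, are natural variations one might try; the sketch makes no claim about which aggregation the workflow itself uses.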

EXAMPLES Server: 50_Applications/13_Address_Deduplication/01_Deduplication_of_Address_Data*
Download a zip-archive

* Find more about the Examples Server here.
The link will open the workflow directly in KNIME Analytics Platform (requirements: Windows; KNIME Analytics Platform must be installed with the Installer version 3.2.0 or higher). Otherwise, please use the link to the zip-archive or open the provided path manually.