This workflow shows how to explore and visualize a large dataset using KNIME Big Data Extensions, and how to make the whole process interactive via the KNIME WebPortal. The data used is the popular NYC taxi dataset.
This workflow handles the preprocessing of the NYC taxi dataset (loading, cleaning, filtering, etc.). The dataset contains over 1 billion taxi trips in New York City between January 2009 and December 2017 and is provided by the NYC Taxi and Limousine Commission (TLC). It covers not only the regular yellow cabs but also green taxis, which started in August 2013, and For-Hire Vehicles (e.g., Uber), starting in January 2015.