Document clustering

This workflow shows how to import textual data, preprocess documents by filtering and stemming, transform documents into a bag of words and document vectors, and finally cluster the documents based on their numerical representation.
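The bag-of-words step above can be sketched in a few lines of plain Python. This is an illustrative sketch of the transformation, not the KNIME node's API; the function name and the toy documents are assumptions:

```python
from collections import Counter

def bag_of_words(docs):
    """Turn preprocessed documents (lists of terms) into term-frequency
    vectors over a shared, sorted vocabulary, as the document-vector
    step of the workflow does."""
    vocab = sorted({t for d in docs for t in d})
    vectors = [[Counter(d)[t] for t in vocab] for d in docs]
    return vocab, vectors

docs = [["romeo", "loves", "juliet"],
        ["juliet", "loves", "romeo", "romeo"]]
vocab, vecs = bag_of_words(docs)
# vocab → ['juliet', 'loves', 'romeo']
# vecs  → [[1, 1, 1], [1, 1, 2]]
```

The resulting numerical vectors are what a distance-based clustering algorithm (e.g. k-means) consumes in the final step of the workflow.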

Document Classification

This workflow shows how to import textual data, preprocess documents by filtering and stemming, transform documents into a bag of words and document vectors and finally build a predictive model to classify the documents. It also contains the corresponding deployment workflow.
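As a rough illustration of the kind of predictive model built on document vectors, here is a minimal multinomial Naive Bayes classifier in plain Python. It is a sketch under assumed toy data, not the learner node the workflow actually uses:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Train a multinomial Naive Bayes model on tokenized documents
    and return a predict function (Laplace smoothing, log-space)."""
    class_docs = defaultdict(list)
    for d, y in zip(docs, labels):
        class_docs[y] += d
    vocab = {t for d in docs for t in d}
    priors = {y: labels.count(y) / len(labels) for y in class_docs}
    counts = {y: Counter(toks) for y, toks in class_docs.items()}
    totals = {y: sum(c.values()) for y, c in counts.items()}

    def predict(tokens):
        def logp(y):
            return math.log(priors[y]) + sum(
                math.log((counts[y][t] + 1) / (totals[y] + len(vocab)))
                for t in tokens)
        return max(priors, key=logp)
    return predict

predict = train_nb([["win", "cash"], ["meeting", "notes"]],
                   ["spam", "ham"])
print(predict(["cash"]))  # → 'spam'
```

The deployment workflow mentioned above corresponds to reusing the trained `predict` step on new, unlabeled documents.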

Sentiment Classification with NGrams

This workflow shows how to import text from a CSV file, convert it to documents, preprocess the documents, and transform them into numerical document vectors consisting of single word and 2-gram features.
Finally, two predictive models are trained on the vectors to predict the sentiment class of the documents. The two models are then compared via a ROC curve.
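Combining single words with 2-grams can be sketched as follows; the function name is illustrative and does not correspond to a KNIME node:

```python
def ngram_features(tokens, n_max=2):
    """Build single-word and n-gram features from a token list,
    mirroring the feature-extraction step of the workflow."""
    feats = list(tokens)  # unigrams
    for n in range(2, n_max + 1):
        feats += [" ".join(tokens[i:i + n])
                  for i in range(len(tokens) - n + 1)]
    return feats

print(ngram_features(["not", "a", "good", "movie"]))
# → ['not', 'a', 'good', 'movie', 'not a', 'a good', 'good movie']
```

2-grams let the model see negations such as "not good" that unigram features alone would miss, which is why they help in sentiment tasks.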

epub JPEG Romeo Juliet

The challenge here is to blend together text and image data. Text data is in epub format while images are in JPEG format. The goal is to build the network of interactions in one of Shakespeare's most famous tragedies: Romeo and Juliet. The network of interactions is then displayed as a graph, where each node represents a character. Each node then displays the character's JPEG image. epub with JPEG. Will they blend? ... and yes! They blend.

Topic Detection LDA

This workflow extracts topics from the "Romeo & Juliet" epub book using the Topic Extractor (Parallel LDA) node. It reads textual data from a table and converts it into documents. The documents are then preprocessed, i.e. tagged, filtered, lemmatized, etc. After pre-processing, the Topic Extractor node can be applied to the documents; note that it requires users to specify the number of topics to extract beforehand. Finally, a tag cloud is created to visualize the topics' terms.
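To illustrate why the number of topics must be fixed up front, here is a minimal collapsed Gibbs sampler for LDA in plain Python. This is a toy sketch of the algorithm family the node wraps, not its actual (parallel) implementation; corpus, hyperparameters, and names are assumptions:

```python
import random

def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized docs.
    Returns the top-3 terms per topic. n_topics must be given up front
    because it fixes the size of every count table below."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * n_topics for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                        # tokens per topic
    z = []                                     # topic of each token
    for d, doc in enumerate(docs):             # random initialization
        zd = []
        for w in doc:
            t = rng.randrange(n_topics)
            zd.append(t)
            ndk[d][t] += 1; nkw[t][widx[w]] += 1; nk[t] += 1
        z.append(zd)
    for _ in range(n_iter):                    # resample each token's topic
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], widx[w]
                ndk[d][t] -= 1; nkw[t][wi] -= 1; nk[t] -= 1
                weights = [(ndk[d][k] + alpha) * (nkw[k][wi] + beta)
                           / (nk[k] + V * beta) for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][wi] += 1; nk[t] += 1
    return [[vocab[i] for i in
             sorted(range(V), key=lambda i: -nkw[k][i])[:3]]
            for k in range(n_topics)]

topics = lda_gibbs([["love", "love", "hate"], ["love", "hate", "hate"],
                    ["sword", "duel", "sword"], ["duel", "duel", "sword"]],
                   n_topics=2)
```

The top terms per topic returned here correspond to the terms the workflow visualizes in the tag cloud.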

Sentiment Analysis Lexicon Based Approach

This workflow shows how to perform lexicon-based sentiment analysis on the IMDB reviews dataset. The dataset contains movie reviews, previously labelled as positive/negative. The lexicon-based approach assigns a sentiment to each word in a text based on dictionaries of positive and negative words. A sentiment score is then calculated for each document as: (number of positive words - number of negative words) / total number of words.
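The score formula above is simple enough to compute directly. The tiny dictionaries below are assumptions for illustration; real sentiment lexicons contain thousands of entries:

```python
POSITIVE = {"good", "great", "excellent", "love"}   # toy lexicon
NEGATIVE = {"bad", "boring", "awful", "hate"}       # toy lexicon

def sentiment_score(tokens):
    """(#positive - #negative) / total tokens, as in the workflow."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

review = ["a", "great", "movie", "with", "an",
          "awful", "ending", "but", "great", "acting"]
print(sentiment_score(review))  # → (2 - 1) / 10 = 0.1
```

A positive score classifies the document as positive, a negative score as negative, which can then be compared against the original labels.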

NER Tagger Model Training

This workflow shows how to train a model for named-entity recognition. The model is created with the StanfordNLP NE Learner node, which builds a conditional random field (CRF) model. To train the model, a document training set and a dictionary of known named entities are needed. Because the model generalizes over word patterns, the tagger can use it to find new named entities in unknown documents. A Scorer node for model evaluation is also available.
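A CRF learner is too involved for a short sketch, but the dictionary input the Learner node consumes can be illustrated with a plain dictionary matcher. This is a deliberate simplification, not the CRF itself: unlike the trained model, it cannot generalize to unseen entities. All names below are illustrative:

```python
def tag_entities(tokens, dictionary, tag="PERSON"):
    """Mark tokens that appear in a known-entity dictionary; every
    other token gets the 'O' (outside) tag used in NER datasets."""
    return [(t, tag if t in dictionary else "O") for t in tokens]

known = {"Romeo", "Juliet", "Tybalt"}
print(tag_entities(["Romeo", "slays", "Tybalt"], known))
# → [('Romeo', 'PERSON'), ('slays', 'O'), ('Tybalt', 'PERSON')]
```

The CRF goes beyond this by learning contextual word-pattern features from such tagged training data, which is what lets it recognize entities absent from the dictionary.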