Big Data PR

What is a bag of words in machine learning?

A “bag of words” is a representation of the words in a phrase or passage, irrespective of order.

For example, bag of words represents the following three phrases identically:

  • the dog jumps
  • jumps the dog
  • dog jumps the
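As a minimal sketch of this idea, the snippet below counts the words in each phrase and checks that the three bags are equal. The helper name `bag_of_words` and the whitespace-based, lowercased tokenization are assumptions made for illustration, not something the definition above prescribes.

```python
from collections import Counter


def bag_of_words(phrase):
    """Return the word counts of a phrase; word order is discarded."""
    # Assumption: words are separated by whitespace and compared in lowercase.
    return Counter(phrase.lower().split())


phrases = ["the dog jumps", "jumps the dog", "dog jumps the"]
bags = [bag_of_words(p) for p in phrases]

# True when every phrase produces the same bag as the first one.
all_same = all(bag == bags[0] for bag in bags)
print(all_same)
```

Because a `Counter` compares by contents rather than insertion order, the three phrases are indistinguishable once they are converted to bags.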

Each word is mapped to an index in a sparse vector, where the vector has an index for every word in the vocabulary. For example, the phrase “the dog jumps” is mapped into a feature vector with non-zero values at the three indices corresponding to the words “the”, “dog”, and “jumps”. The non-zero value can be any of the following:

  • A 1 to indicate the presence of a word.
  • A count of the number of times a word appears in the bag. For example, if the phrase were “the maroon dog is a dog with maroon fur”, then both “maroon” and “dog” would be represented as 2, while the other words would be represented as 1.
  • Some other value, such as the logarithm of the count of the number of times a word appears in the bag.
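The three options above can be sketched roughly as follows. The toy `vocabulary`, the `vectorize` helper, and the `weighting` parameter are all hypothetical names chosen for this illustration. The sparse vector is stored as a dictionary from index to value, so absent words take no space. For the logarithmic variant the sketch uses 1 + log(count), which is an assumption: a plain log(count) would give 0 for a word that appears once, and that word would then look absent.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary; a real one would hold many thousands of words.
vocabulary = ["a", "dog", "fur", "is", "jumps", "maroon", "the", "with"]
index = {word: i for i, word in enumerate(vocabulary)}


def vectorize(phrase, weighting="count"):
    """Map a phrase to a sparse vector stored as {index: value}."""
    counts = Counter(phrase.lower().split())
    vector = {}
    for word, n in counts.items():
        if word not in index:
            continue  # words outside the vocabulary are ignored in this sketch
        if weighting == "binary":
            value = 1  # presence only
        elif weighting == "count":
            value = n  # number of occurrences
        elif weighting == "log":
            value = 1 + math.log(n)  # assumed damped count, stays non-zero
        else:
            raise ValueError(f"unknown weighting: {weighting}")
        vector[index[word]] = value
    return vector


phrase = "the maroon dog is a dog with maroon fur"
binary_vec = vectorize(phrase, "binary")
count_vec = vectorize(phrase, "count")
log_vec = vectorize(phrase, "log")
print(count_vec[index["maroon"]], count_vec[index["dog"]], count_vec[index["fur"]])
```

Only the indices of words that occur in the phrase appear as keys, so the index for “jumps” is missing from all three vectors here.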

Data Science PR
