I get single-letter words in my Structural Topic Model after my preprocessing steps, particularly after I stem my tokens.

My code is as follows:

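library(readr)     # read_csv()
library(quanteda)  # corpus(), tokens(), tokens_remove(), tokens_wordstem()
library(magrittr)  # %>%

# read the csv
articletexts <- read_csv("Meta.csv")
summary(articletexts)

# create a corpus from the 'title' column
corp <- corpus(articletexts, text_field = "title")
corp

# stopwords
sw <- stopwords("english")
sw

# tokenize (dropping punctuation, numbers and symbols), remove stopwords, then stem
# (the result is named toks so it does not shadow quanteda's tokens() function)
toks <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(sw) %>%
  tokens_wordstem()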


What I have tried:

I have tried using gsub() to remove the individual letters of the English alphabet.
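
One approach that might help, given here as a minimal sketch rather than a confirmed fix: instead of running gsub() on the raw text, drop the one-character tokens after stemming with tokens_remove() and a regular expression. The titles vector below is a made-up stand-in for the 'title' column of Meta.csv, and the names toks and toks_clean are only illustrative.

```r
library(quanteda)

# Hypothetical example titles standing in for the 'title' column of Meta.csv
titles <- c("A study of the U.S. economy",
            "Plan B for e-commerce and R users")

# Same preprocessing steps as in the question, written without the pipe
toks <- tokens(titles, remove_punct = TRUE, remove_numbers = TRUE,
               remove_symbols = TRUE)
toks <- tokens_remove(toks, stopwords("english"))
toks <- tokens_wordstem(toks)

# Remove every token that is exactly one character long
toks_clean <- tokens_remove(toks, pattern = "^.$", valuetype = "regex")

# Sanity check: every remaining token should have more than one character
all(nchar(unlist(as.list(toks_clean))) > 1)
```

Doing the filtering after tokens_wordstem() means it also catches any single-letter tokens that only appear once stemming has run. The quanteda documentation for tokens_select() also describes a min_nchar argument, which may be a tidier way to get the same effect; check how it behaves in your installed version.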
Updated 15-Jan-23 9:11am

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


