How do I remove stop words from a text file and show the number of word matches between documents?

Question

I have a Java problem. I have two text files: one contains a long list of stop words, and the other contains many paragraphs (the corpus). I want to read the stop words and remove them from the corpus file, reporting the number of words that match and how many times the stop words were found in the corpus. After removing the stop words from the corpus, I want to write the result to a new text file. I have been trying to work out how to approach this, but I am getting nowhere. I intend to write it in Java. Any assistance is much appreciated.

What I have tried:

I tried following this link, but I am stuck:

java - How to delete a specific string in a text file? - Stack Overflow

Posted 8-Feb-17 18:05pm

Hassan Jide Hassan

Updated 8-Feb-17 18:48pm

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Peter Leow · Answer 1 · 2017-02-08T18:48:00

Try the following approach:
1. Read in the list of stop words and store them in a Map (Java Platform SE 7), with each stop word as the key and its count as the value, starting at 0. Increase the value by one whenever that stop word is found in the corpus.
2. Read in the corpus text file line by line. For each line, scan for any stop words stored as keys in the Map created in step 1, remove them from that line, and increase the count of each stop word found. Then save the line to a new text file.
For the coding part, search Google, as it has plenty of examples; specifically, look for string manipulation and file I/O.
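
The two steps above could be sketched roughly as follows. This is only one possible reading of the approach, not a definitive implementation. The class name `StopWordRemover`, the method names, and the file names `stopwords.txt`, `corpus.txt`, and `corpus_clean.txt` are all placeholders chosen for illustration. The sketch also assumes one stop word per line in the stop-word file, UTF-8 text, case-insensitive matching, and words separated by whitespace; none of these are stated in the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class StopWordRemover {

    // Step 1: load each stop word as a map key with an initial count of 0.
    // Assumption: the stop-word file holds one word per line.
    static Map<String, Integer> loadStopWords(Path stopWordFile) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : Files.readAllLines(stopWordFile, StandardCharsets.UTF_8)) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty()) {
                counts.put(word, 0);
            }
        }
        return counts;
    }

    // Step 2 (for a single line): drop every token that is a stop word
    // and increase that stop word's count in the map.
    static String removeStopWords(String line, Map<String, Integer> counts) {
        StringBuilder kept = new StringBuilder();
        for (String token : line.split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Compare without leading/trailing punctuation and ignoring case,
            // so that "The" and "the," both match the stop word "the".
            String key = token
                    .replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "")
                    .toLowerCase(Locale.ROOT);
            if (counts.containsKey(key)) {
                counts.put(key, counts.get(key) + 1);
            } else {
                if (kept.length() > 0) {
                    kept.append(' ');
                }
                kept.append(token);
            }
        }
        return kept.toString();
    }

    // Step 2 (whole file): read the corpus line by line and write each
    // cleaned line to the output file.
    static Map<String, Integer> process(Path stopWordFile, Path corpusFile, Path outputFile)
            throws IOException {
        Map<String, Integer> counts = loadStopWords(stopWordFile);
        try (BufferedReader in = Files.newBufferedReader(corpusFile, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(outputFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(removeStopWords(line, counts));
                out.newLine();
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names; replace with the real paths.
        Map<String, Integer> counts = process(
                Paths.get("stopwords.txt"),
                Paths.get("corpus.txt"),
                Paths.get("corpus_clean.txt"));

        int distinctMatched = 0;
        int totalRemoved = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 0) {
                distinctMatched++;
                totalRemoved += entry.getValue();
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        }
        System.out.println("Distinct stop words matched: " + distinctMatched);
        System.out.println("Total stop words removed: " + totalRemoved);
    }
}
```

With the stop words `the` and `on`, the line `The cat sat on the mat.` becomes `cat sat mat.`, and the counts end up as `the` = 2 and `on` = 1. Two side effects of this design are worth knowing: punctuation attached to a removed stop word is dropped along with it, and runs of whitespace within a line collapse to single spaces. Matching whole tokens rather than substrings avoids accidentally cutting `the` out of a word such as `other`.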