Goal: Given an input file and an optional noise word file, print a frequency count of non-noise words in the input file
1. Change the name of the program to WordAnalyzer
2. Accept two parameters - input file and noise file
3. Input file is the same as the first version
4. Noise file contains a set of noise words separated by white space (one or more spaces, newlines, tabs)
5. Open both files. If either file does not exist, raise an exception and exit the program
6. Read noise file and store all the words in memory
7. Read input file, line by line
8. For each line, tokenize (separate into words)
9. Eliminate any punctuation characters (period, comma, semicolon, colon, question mark, exclamation point, and other non-alphanumeric characters)
10. Increment the input-word count for every input word
11. Check the word against the noise words:
- if it is a noise word, increment the noise-word count (so that we know how many noise words are in the text)
- if it is not a noise word, increment the valid-word count and that word's frequency count
12. At the end of the input file, produce the output described in steps 13 and 14
13. Print the count of input words, the count of noise words in the input file, and the count of valid words
14. Write an output file in the following format, sorted in descending order of frequency:
word, frequencycount
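
The steps above can be sketched in Python as follows. This is only one possible reading of the spec, not a reference solution. The function names (`clean_token`, `load_noise_words`, `analyze`, `write_report`) are hypothetical, and several choices are assumptions the spec does not state: words are compared case-insensitively, a token that is empty after punctuation removal is not counted as a word, and ties in frequency are broken alphabetically.

```python
from collections import Counter


def clean_token(token):
    """Drop every non-alphanumeric character and lowercase the rest (assumption: case-insensitive)."""
    return "".join(ch for ch in token if ch.isalnum()).lower()


def load_noise_words(noise_path):
    """Step 6: read the whole noise file and keep its words in a set."""
    with open(noise_path, encoding="utf-8") as f:
        # split() with no arguments splits on any run of spaces, tabs, or newlines.
        words = {clean_token(w) for w in f.read().split()}
    words.discard("")  # a token made only of punctuation cleans to ""
    return words


def analyze(input_path, noise_words):
    """Steps 7-11: return (input_count, noise_count, valid_count, frequencies)."""
    input_count = 0
    noise_count = 0
    valid_count = 0
    frequencies = Counter()
    with open(input_path, encoding="utf-8") as f:
        for line in f:                      # step 7: line by line
            for token in line.split():      # step 8: tokenize on white space
                word = clean_token(token)   # step 9: remove punctuation
                if not word:
                    continue                # assumption: punctuation-only tokens are not words
                input_count += 1            # step 10
                if word in noise_words:     # step 11
                    noise_count += 1
                else:
                    valid_count += 1
                    frequencies[word] += 1
    return input_count, noise_count, valid_count, frequencies


def write_report(frequencies, output_path):
    """Step 14: write 'word, frequencycount' lines, highest frequency first."""
    ordered = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    with open(output_path, "w", encoding="utf-8") as out:
        for word, count in ordered:
            out.write(f"{word}, {count}\n")
```

With this reading, the three counts in step 13 always satisfy input words = noise words + valid words, which gives a quick sanity check when running the tests below.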
Perform the following tests:
1. Invalid input file, invalid noise word file
2. Valid input, invalid noise word file
3. Valid input, valid noise word file
4. Valid input, empty noise word file (the noise word file exists but does not contain any data)
5. Empty input - input file exists but does not contain any data
6. Input with a single word
7. Input with only punctuation characters (no valid words)
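
Tests 1 and 2 exercise the file check in step 5. A minimal Python sketch of that check is shown here. The helper names `require_files` and `run`, the usage message, and the exit codes (1 for a missing file, 2 for wrong arguments) are all assumptions, since the spec only says to raise an exception and exit.

```python
import os
import sys


def require_files(input_path, noise_path):
    """Raise FileNotFoundError naming every path that is not an existing file."""
    missing = [p for p in (input_path, noise_path) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("missing file(s): " + ", ".join(missing))


def run(argv):
    """Return an exit code: 0 if both files exist, 1 if any is missing, 2 on bad usage."""
    if len(argv) != 3:
        print("usage: WordAnalyzer <input file> <noise file>", file=sys.stderr)
        return 2
    try:
        require_files(argv[1], argv[2])
    except FileNotFoundError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    # The analysis itself (steps 6-14) would start here.
    return 0
```

A script entry point could call `sys.exit(run(sys.argv))`, so that tests 1 and 2 end with a non-zero exit status and a message naming the missing file or files.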