Word count (3/3)
Overview
In the first and second parts, I shared my learning project, in which I learned more about Test-Driven Development (TDD) and improved my Python skills. The whole project was inspired by: https://ccd-school.de/coding-dojo/#cd8.
Each article covers one part, but you can see the whole project in my GitLab repository.
Part VII.
In this step I add the next requirement: optionally, an index of all counted words is printed. Sample usage:
$ wordcount -index
Enter text: Mary had a little lamb
Number of words: 4, unique: 4; average word length: 4.25 characters
Index:
had
lamb
little
Mary
$
The index option should list all counted words, so I need to do at least two things. First, save and return all counted words when they are requested. Second, decide what to print after parsing the input arguments. I save all counted words in the selected_words list. If index is True, the sorted list is returned; otherwise None is returned.
import re
WORD_PATTERN = "[a-z-A-Z]*"
def simple_word_count(input_value_text, index=False):
    # lines, stop_words, count, unique_count and result_avg are defined in code omitted here
    selected_words = []
    for word in lines:
        if len(word) > 1 and re.match(WORD_PATTERN, word).endpos > 0:
            if word not in stop_words:
                selected_words.append(word)
    # ...
    if index:
        # sort case-insensitively, so "Mary" comes after "little"
        return count, unique_count, result_avg, sorted(selected_words, key=str.lower)
    return count, unique_count, result_avg, None
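Because the function above relies on code from earlier parts, here is a minimal self-contained sketch of the same idea. It assumes a hypothetical stop-word set {"a", "the", "on", "off"} and uses a regex with findall instead of the original splitting. Unlike the original, it also de-duplicates the index.

```python
import re

# Hypothetical stop-word set; the real project loads its own list in an earlier part.
STOP_WORDS = {"a", "the", "on", "off"}
# Words are runs of letters (and hyphens), similar in spirit to WORD_PATTERN.
WORD_RE = re.compile(r"[A-Za-z-]+")


def count_words(text, index=False):
    """Return (count, unique, average length, sorted index or None)."""
    words = [w for w in WORD_RE.findall(text) if w.lower() not in STOP_WORDS]
    count = len(words)
    unique = len(set(words))
    avg = sum(len(w) for w in words) / count if count else 0.0
    if index:
        # Case-insensitive sort, as with sorted(..., key=str.lower) above.
        return count, unique, avg, sorted(set(words), key=str.lower)
    return count, unique, avg, None


print(count_words("Mary had a little lamb", index=True))
# (4, 4, 4.25, ['had', 'lamb', 'little', 'Mary'])
```

Returning None instead of an empty list when the index is not requested lets the caller tell "not requested" apart from "no words".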
More interesting is the main method, where I need to check whether all arguments are valid. If they are, the program should run with the corresponding parameters. Most of the work is done by getopt, a useful module from the standard library:
import getopt
import sys
def get_opts_args(argv):
    try:
        opts, args = getopt.getopt(argv, shortopts="hd:", longopts=["help", "index"])
    except getopt.GetoptError:
        write_help()
        sys.exit(2)
    return args, opts
def main(argv):
    args, opts = get_opts_args(argv)
    stdio_workflow(args)
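To see what getopt returns with the option strings used above, here is a small standalone check. The file name notes.txt is only an example. It also shows why the tests pass "--index" rather than "-index": a single dash starts a group of short options, and "i" is not among the short options "hd:".

```python
import getopt

SHORT_OPTS = "hd:"
LONG_OPTS = ["help", "index"]

# Long options use two dashes; parsing stops at the first non-option argument.
opts, args = getopt.getopt(["--index", "notes.txt"], SHORT_OPTS, LONG_OPTS)
print(opts, args)  # [('--index', '')] ['notes.txt']

# "-index" is read as the short options -i -n -d -e -x, and "i" is not in SHORT_OPTS.
rejected = None
try:
    getopt.getopt(["-index"], SHORT_OPTS, LONG_OPTS)
except getopt.GetoptError as err:
    rejected = err.opt
    print(err)  # option -i not recognized
```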
But how do I test input that users type into the console? As in the previous parts, patch helps here. I can redirect not only the standard output but also the built-in input() function, and pass the console arguments to main directly.
def runTest(self, given_answer, expected_out, args):
    with patch(BUILTINS_INPUT, return_value=given_answer), \
            patch(SYS_STDOUT, new=io.StringIO()) as dummy_out:
        my_count.main(args)
    self.assertEqual(dummy_out.getvalue().strip(), expected_out)
@patch('sys.argv', ['my_count/__init__.py', '-index'])
def test_input_index(self):
    # wordcount -index
    # Enter text: Mary had a little lamb
    self.runTest(SAMPLE_TEXT,
                 "Number of words: 4, unique: 4; "
                 "average word length: 4.25 characters\n"
                 "Index:\n"
                 "had\n"
                 "lamb\n"
                 "little\n"
                 "Mary",
                 ['--index'])
# I can also test an empty argument list
def test_empty_line_args_input(self):
    self.runTest(' ', 'Number of words: 0, unique: 0;'
                      ' average word length: 0.00 characters', [])
# the same test also works with everything written out in one method
@patch('sys.argv', ['wordcount.py'])
def test_main_loop_without_args(self):
    given_answer = SAMPLE_TEXT
    args = ['--index']
    expected_out = HAD_LAMB_LITTLE_MARY_EXCEPTED_OUT
    with patch(BUILTINS_INPUT, return_value=given_answer), \
            patch(SYS_STDOUT, new=io.StringIO()) as dummy_out:
        wordcount.main(args)
    self.assertEqual(dummy_out.getvalue().strip(), expected_out)
# invalid arguments should raise SystemExit (because '-not_args' is not a valid option)
@patch('sys.argv', ['wordcount.py'])
def test_main_loop_without_valid_args(self):
    given_answer = " "
    args = ['-not_args', '-sssss']
    expected_out = HAD_LAMB_LITTLE_MARY_EXCEPTED_OUT
    with self.assertRaises(SystemExit):
        with patch(BUILTINS_INPUT, return_value=given_answer), \
                patch(SYS_STDOUT, new=io.StringIO()) as dummy_out:
            wordcount.main(args)
            self.assertEqual(dummy_out.getvalue().strip(), expected_out)
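The tests above depend on the project's helper constants (BUILTINS_INPUT, SYS_STDOUT), which presumably hold "builtins.input" and "sys.stdout". The following is a self-contained sketch of the same pattern under that assumption. parse_args and echo_main are hypothetical, simplified stand-ins for get_opts_args and main. The second test also reads the exit status from the caught SystemExit, which the original test does not check.

```python
import getopt
import io
import sys
import unittest
from unittest.mock import patch


def parse_args(argv):
    """Hypothetical stand-in for get_opts_args(): exit with status 2 on bad options."""
    try:
        opts, args = getopt.getopt(argv, "hd:", ["help", "index"])
    except getopt.GetoptError:
        print("usage: wordcount [--index]")
        sys.exit(2)
    return args, opts


def echo_main(argv):
    """Hypothetical stand-in for main(): read one text and report its word count."""
    parse_args(argv)
    text = input("Enter text: ")
    print(f"Number of words: {len(text.split())}")


class EchoMainTest(unittest.TestCase):
    def test_reads_patched_input(self):
        # input() returns the canned answer; print() writes into a StringIO.
        with patch("builtins.input", return_value="Mary had a little lamb"), \
                patch("sys.stdout", new_callable=io.StringIO) as fake_out:
            echo_main(["--index"])
        self.assertEqual(fake_out.getvalue().strip(), "Number of words: 5")

    def test_invalid_option_exits_with_status_2(self):
        with patch("sys.stdout", new_callable=io.StringIO), \
                self.assertRaises(SystemExit) as ctx:
            echo_main(["-not_args"])
        # The exit status is available on the caught exception.
        self.assertEqual(ctx.exception.code, 2)
```

Run it with python -m unittest as usual.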
Part VIII.
The next part also involves an additional input argument.
Optionally, the text can be checked against a dictionary of known words. If the index is printed, it marks words not found in the dictionary with a star and states the number of unknown words, e.g.:
$ wordcount -index -dictionary=dict.txt
Enter text: Mary had a little lamb
Number of words: 4, unique: 4; average word length: 4.25 characters
Index (unknown: 2):
had
lamb*
little
Mary*
$
With dict.txt being:
big
small
little
cat
dog
have
has
had
This time I start with the tests, which are similar to the previous scenario:
@patch('sys.argv', ['my_count/__init__.py',
                    '-index',
                    '-dictionary=dict.txt'])
def test_input_index_and_dict(self):
    self.runTest(SAMPLE_TEXT,
                 "Number of words: 4, unique: 4; "
                 "average word length: 4.25 characters\n"
                 "Index (unknown: 2):\n"
                 "had\n"
                 "lamb*\n"
                 "little\n"
                 "Mary*",
                 ['--index', '--dictionary=dict.txt'])
@patch('sys.argv', ['my_count/__init__.py',
                    '--index',
                    '--dictionary=dict.txt'])
def test_input_index_and_dict_another(self):
    self.runTest("aaa",
                 "Number of words: 1, unique: 1; "
                 "average word length: 3.00 characters\n"
                 "Index (unknown: 1):\n"
                 "aaa*",
                 ['--index', '--dictionary=dict.txt'])
The main method does not change much. It only needs to handle the additional arguments:
def consume_args_opts(opts):
    index_config = False
    dict_config = False
    dict_value = None
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            write_help()
            sys.exit(0)
        if opt in ('-i', '--index'):
            index_config = True
        if opt in ('-d', '--dictionary'):
            dict_config = True
            dict_value = arg
    return dict_config, dict_value, index_config
def get_opts_args(argv):
    # Before
    opts, args = getopt.getopt(argv, shortopts="hd:", longopts=["help", "index"])
    # Current
    opts, args = getopt.getopt(argv, shortopts="hd:", longopts=["help", "index", "dictionary="])
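The trailing "=" in "dictionary=" tells getopt that the long option requires a value, just as "d:" does for the short option. A quick standalone check with the same option strings:

```python
import getopt

SHORT_OPTS = "hd:"
LONG_OPTS = ["help", "index", "dictionary="]

# "--dictionary=dict.txt" and "--dictionary dict.txt" produce the same pair.
opts_eq, _ = getopt.getopt(["--index", "--dictionary=dict.txt"], SHORT_OPTS, LONG_OPTS)
opts_sep, _ = getopt.getopt(["--dictionary", "dict.txt"], SHORT_OPTS, LONG_OPTS)
# The short form "-d" also takes the next argument as its value.
opts_short, _ = getopt.getopt(["-d", "dict.txt"], SHORT_OPTS, LONG_OPTS)
print(opts_eq)     # [('--index', ''), ('--dictionary', 'dict.txt')]
print(opts_sep)    # [('--dictionary', 'dict.txt')]
print(opts_short)  # [('-d', 'dict.txt')]

# Because of the "=", a missing value is an error rather than an empty string.
try:
    getopt.getopt(["--dictionary"], SHORT_OPTS, LONG_OPTS)
    missing_value_rejected = False
except getopt.GetoptError:
    missing_value_rejected = True
print(missing_value_rejected)  # True
```

This is why consume_args_opts can read dict_value straight from arg for both "-d" and "--dictionary".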
def main(argv):
    # before
    args, opts = get_opts_args(argv)
    # current
    args, opts = get_opts_args(argv)
    dict_config, dict_value, index_config = consume_args_opts(opts)
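The code above does not show how dict_value is used to mark unknown words. One possible approach is to load the dictionary into a set and check each index word against it. The helpers load_dictionary and format_index are hypothetical, and the case-sensitive comparison is an assumption. The sample dictionary is the dict.txt content from the requirement.

```python
import io


def load_dictionary(lines):
    """Collect one known word per non-empty line (hypothetical helper)."""
    return {line.strip() for line in lines if line.strip()}


def format_index(words, known):
    """Return the index header and lines, marking unknown words with '*'."""
    unknown = [w for w in words if w not in known]
    lines = [f"Index (unknown: {len(unknown)}):"]
    lines += [w if w in known else w + "*" for w in words]
    return lines


# io.StringIO stands in for the opened dict.txt file.
dict_file = io.StringIO("big\nsmall\nlittle\ncat\ndog\nhave\nhas\nhad\n")
known_words = load_dictionary(dict_file)
print("\n".join(format_index(["had", "lamb", "little", "Mary"], known_words)))
# Index (unknown: 2):
# had
# lamb*
# little
# Mary*
```

In the program, the set would come from the file named by dict_value, for example with open(dict_value, encoding="utf-8") as f: known_words = load_dictionary(f). A set makes each lookup a constant-time operation on average, even for a large dictionary.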
Part IX.
The last scenario: allow the user to enter several texts and get them analyzed. The program terminates when an empty text is entered:
$ wordcount
Enter text: Mary had a little lamb
Number of words: 4, unique: 4; average word length: 4.25 characters
Enter text: a bb ccc dddd
Number of words: 4, unique: 4; average word length: 2.5 characters
Enter text:
$
So this part needs a loop that runs until the end condition is met. An infinite loop is not the most elegant choice, but it works for this scenario.
import sys
INFINITE_LOOP = True  # when True, work_with_stdio() returns after analysing one text
WORD_PATTERN = "[a-z-A-Z]*"
# Reads texts from standard input until an empty text is entered
def work_with_stdio(index=False):
    while True:
        input_text = input("Enter text: ")
        if input_text:
            if index:
                count_words, count_unique, avg_len, index_words = \
                    simple_word_count(input_text, index)
                print("Number of words: {}, unique: {}; "
                      "average word length: {:.2f} characters\nIndex:".
                      format(count_words, count_unique, avg_len))
                print(*index_words, sep='\n')
            else:
                count_words, count_unique, avg_len, _ = \
                    simple_word_count(input_text, index)
                print("Number of words: {}, unique: {}; "
                      "average word length: {:.2f} characters".
                      format(count_words, count_unique, avg_len))
        else:
            # an empty text ends the program
            sys.exit(0)
        if INFINITE_LOOP:
            return
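An alternative to the module-level INFINITE_LOOP flag is to inject the functions that read and write text. The sketch below uses the built-in two-argument form iter(callable, sentinel), which keeps calling read() until it returns the sentinel "". The names run_session and analyse are hypothetical. analyse is a simplified stand-in for simple_word_count that ignores stop words.

```python
def analyse(text):
    """Hypothetical stand-in for simple_word_count(): words and average length."""
    words = text.split()
    avg = sum(map(len, words)) / len(words) if words else 0.0
    return f"Number of words: {len(words)}; average word length: {avg:.2f} characters"


def run_session(read, write):
    """Analyse texts from read() until it returns an empty string."""
    # iter(read, "") calls read() repeatedly and stops at the first "".
    for text in iter(read, ""):
        write(analyse(text))


answers = iter(["a bb ccc dddd", "little lamb", ""])
output = []
run_session(lambda: next(answers), output.append)
print(output)
# ['Number of words: 4; average word length: 2.50 characters',
#  'Number of words: 2; average word length: 5.00 characters']
```

Interactively, run_session(lambda: input("Enter text: "), print) behaves like the loop above. A test can pass a scripted reader and collect the output in a list, so it needs neither a flag nor a timeout. Whitespace such as " " is not equal to "", so it is still analysed, just as with if input_text.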
Testing an infinite loop can be a bit confusing, but there are several ways to do it. First, I can pass no arguments and an empty text, so that the "infinite" loop runs only once: the empty text triggers sys.exit(0), which the test expects as SystemExit.
def test_another_infinite_true_loop(self):
    wordcount.INFINITE_LOOP = True
    given_answer = ""
    args = []
    expected_out = ""
    with self.assertRaises(SystemExit):
        with patch(BUILTINS_INPUT, return_value=given_answer), \
                patch(SYS_STDOUT, new=io.StringIO()) as dummy_out:
            wordcount.main(args)
    # checked after SystemExit is caught, so the assertion actually runs
    self.assertEqual(dummy_out.getvalue().strip(), expected_out)
Alternatively, I can let the loop keep waiting for (mocked) user input and stop it with a timeout:
@timeout(10)
def call_infinite_main_loop(self, args, given_answer):
    with patch(BUILTINS_INPUT, return_value=given_answer), \
            patch(SYS_STDOUT, new=io.StringIO()):
        my_count.INFINITE_LOOP = False
        wordcount.main(args)
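Instead of a timeout, mock's side_effect can feed the loop a scripted sequence of answers that ends with an empty text. work_loop below is a hypothetical, simplified stand-in for work_with_stdio without the INFINITE_LOOP flag. The test checks that the loop handles several texts and then exits with status 0.

```python
import io
import sys
import unittest
from unittest.mock import patch


def work_loop():
    """Hypothetical stand-in for work_with_stdio(): exit on an empty text."""
    while True:
        text = input("Enter text: ")
        if not text:
            sys.exit(0)
        print(f"Number of words: {len(text.split())}")


class LoopTest(unittest.TestCase):
    def test_several_texts_then_empty_line(self):
        # side_effect hands out one answer per input() call, in order.
        answers = ["a bb ccc dddd", "little lamb", ""]
        with patch("builtins.input", side_effect=answers) as fake_input, \
                patch("sys.stdout", new_callable=io.StringIO) as fake_out, \
                self.assertRaises(SystemExit) as ctx:
            work_loop()
        self.assertEqual(ctx.exception.code, 0)
        self.assertEqual(fake_input.call_count, 3)
        self.assertEqual(fake_out.getvalue().splitlines(),
                         ["Number of words: 4", "Number of words: 2"])
```

If the loop failed to stop at the empty text, the exhausted side_effect list would raise StopIteration on the next input() call. The test would then fail immediately instead of hanging until a timeout.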
The whole code is available in my GitLab repository.
Conclusion
I learned a lot during this coding challenge. There are also things I skipped, such as performance testing and testing the whole application on realistic data (lorem ipsum text might be a good example). There are also libraries that achieve the same result in less processing time and with fewer resources.
Still, I think it is a good challenge for any developer who wants to improve their skills.