Word count (3/3)

Overview

In the first and second parts, I shared my learning project, in which I learned more about Test-Driven Development (TDD) and improved my Python skills. The whole project was inspired by: https://ccd-school.de/coding-dojo/#cd8.

Each article covers one part, but you can see the whole project in my GitLab repository.

The whole code is written in Python (version 3.9.8). The code base is validated with pylint and flake8. Tests are run by pytest, and Sonar is also in place.

Part VII.

In this step I add the next requirement: optionally, an index of all counted words is printed. Sample usage:

$ wordcount -index
Enter text: Mary had a little lamb
Number of words: 4, unique: 4; average word length: 4.25 characters
Index:
had
lamb
little
Mary
$

The index option should list all counted words, so I need to do two things: first, save the counted words and return them when they are requested; second, parse the input arguments and decide whether the index was requested. I save all the counted words in the selected_words list. If index is True, the sorted list is returned; otherwise None is returned in its place.

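import re

WORD_PATTERN = "[a-z-A-Z]*"


def simple_word_count(input_value_text, index=False):
    # `lines`, `stop_words`, `count`, `unique_count` and `result_avg`
    # are defined in code not shown here
    selected_words = []
    for word in lines:
        # fullmatch() accepts only words made entirely of letters (or "-")
        if len(word) > 1 and re.fullmatch(WORD_PATTERN, word):
            if word not in stop_words:
                selected_words.append(word)
        # ...

    if index:
        # sort case-insensitively, so "Mary" ends up after "little"
        return count, unique_count, result_avg, sorted(selected_words, key=str.lower)

    return count, unique_count, result_avg, None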
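The key=str.lower argument matters because Python's default string ordering compares code points, so every uppercase letter sorts before every lowercase one. A small standalone check, using the words of the sample sentence without the stop word "a":

```python
words = ["Mary", "had", "little", "lamb"]

# Default ordering compares code points: "M" (77) < "h" (104)
default_order = sorted(words)

# key=str.lower compares the lowercased words instead,
# but the returned list keeps the original spelling
case_insensitive_order = sorted(words, key=str.lower)

print(default_order)           # ['Mary', 'had', 'lamb', 'little']
print(case_insensitive_order)  # ['had', 'lamb', 'little', 'Mary']
```

Only the second ordering matches the expected index in the sample usage.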

The main method is more interesting: it needs to check whether all arguments are valid and, if they are, run the workflow with the correct parameters. The parsing itself is done by getopt, a useful module from the standard library:

import getopt
import sys


def get_opts_args(argv):
    try:
        opts, args = getopt.getopt(argv, shortopts="hd:", longopts=["help", "index"])
    except getopt.GetoptError:
        # unknown option: print usage and exit with status 2
        write_help()
        sys.exit(2)
    return args, opts


def main(argv):
    args, opts = get_opts_args(argv)

    stdio_workflow(args)
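To see what getopt actually returns, here is a standalone sketch using the option strings from the later version of get_opts_args (the trailing "extra" argument is made up for illustration). It also shows why the tests pass '--index' rather than the kata's single-dash '-index': getopt reads a single dash as a cluster of short options.

```python
import getopt

SHORT_OPTS = "hd:"
LONG_OPTS = ["help", "index", "dictionary="]

# Long options need two dashes; "dictionary=" means the option takes a value
opts, args = getopt.getopt(["--index", "--dictionary=dict.txt", "extra"],
                           SHORT_OPTS, LONG_OPTS)
print(opts)  # [('--index', ''), ('--dictionary', 'dict.txt')]
print(args)  # ['extra']

# "-index" is parsed as short options starting with "-i"; "i" is not in
# SHORT_OPTS, so getopt raises GetoptError and records the bad option
error_option = None
try:
    getopt.getopt(["-index"], SHORT_OPTS, LONG_OPTS)
except getopt.GetoptError as err:
    error_option = err.opt
    print(err)  # option -i not recognized
```

Options without a value come back paired with an empty string, which is why a loop over opts only needs to compare the option name for flags like --index.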

But how do I test input that users type into the console? As in the previous examples, patch helps here. I can redirect not only standard output but also standard input (by patching the built-in input function), and pass the console arguments to main directly:

    def runTest(self, given_answer, expected_out, args):
        with patch(BUILTINS_INPUT, return_value=given_answer), \
                patch(SYS_STDOUT, new=io.StringIO()) as dummy_out:
            my_count.main(args)
            self.assertEqual(dummy_out.getvalue().strip(), expected_out)


    @patch('sys.argv', ['my_count/__init__.py', '-index'])
    def test_input_index(self):
        # wordcount -index
        # Enter text: Mary had a little lamb
        self.runTest(SAMPLE_TEXT,
                     "Number of words: 4, unique: 4; "
                     "average word length: 4.25 characters\n"
                     "Index:\n"
                     "had\n"
                     "lamb\n"
                     "little\n"
                     "Mary",
                     ['--index'])

    # I can also test empty arguments
    def test_empty_line_args_input(self):
        self.runTest(' ', 'Number of words: 0, unique: 0;'
                          ' average word length: 0.00 characters', [])

    # the same scenario also works when everything is in one method
    @patch('sys.argv', ['wordcount.py'])
    def test_main_loop_without_args(self):
        given_answer = SAMPLE_TEXT
        args = ['--index']
        expected_out = HAD_LAMB_LITTLE_MARY_EXCEPTED_OUT
        with patch(BUILTINS_INPUT, return_value=given_answer), \
                patch(SYS_STDOUT, new=io.StringIO()) as dummy_out:
            wordcount.main(args)
            self.assertEqual(dummy_out.getvalue().strip(), expected_out)

    # invalid arguments should raise SystemExit ('-not_args' is not a valid option)
    @patch('sys.argv', ['wordcount.py'])
    def test_main_loop_without_valid_args(self):
        given_answer = " "
        args = ['-not_args', '-sssss']
        # get_opts_args() catches getopt.GetoptError and calls sys.exit(2)
        with self.assertRaises(SystemExit):
            with patch(BUILTINS_INPUT, return_value=given_answer), \
                    patch(SYS_STDOUT, new=io.StringIO()):
                wordcount.main(args)
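The tests above use constants BUILTINS_INPUT and SYS_STDOUT whose values are not shown; presumably they hold the patch targets 'builtins.input' and 'sys.stdout'. Under that assumption, here is the same technique in isolation, with a hypothetical ask_and_echo() standing in for main():

```python
import io
import unittest
from unittest.mock import patch


def ask_and_echo():
    # Hypothetical stand-in for main(): read one line and print it back
    text = input("Enter text: ")
    print("You entered: {}".format(text))


class PatchDemoTest(unittest.TestCase):
    def test_input_and_output_are_redirected(self):
        # Patching 'builtins.input' replaces input(); the prompt goes to the
        # mock, so only the print() output lands in the fake stdout
        with patch('builtins.input', return_value="Mary had a little lamb"), \
                patch('sys.stdout', new=io.StringIO()) as fake_out:
            ask_and_echo()
        self.assertEqual(fake_out.getvalue().strip(),
                         "You entered: Mary had a little lamb")
```

Run it with pytest or python -m unittest. The StringIO object stays readable after the patch is undone, so the assertion can sit outside the with block.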

Part VIII.

The next part also deals with input arguments.

Optionally, the text can be checked against a dictionary of known words. If the index is printed, it marks words not found in the dictionary with a star and states the number of unknown words, e.g.:

$ wordcount -index -dictionary=dict.txt
Enter text: Mary had a little lamb
Number of words: 4, unique: 4; average word length: 4.25 characters
Index (unknown: 2):
had
lamb*
little
Mary*
$

With dict.txt being:

big
small
little
cat
dog
have
has
had

This time I start with the tests, which are similar to the previous scenario:

    @patch('sys.argv', ['my_count/__init__.py',
                        '-index',
                        '-dictionary=dict.txt'])
    def test_input_index_and_dict(self):
        self.runTest(SAMPLE_TEXT,
                     "Number of words: 4, unique: 4; "
                     "average word length: 4.25 characters\n"
                     "Index (unknown: 2):\n"
                     "had\n"
                     "lamb*\n"
                     "little\n"
                     "Mary*",
                     ['--index', '--dictionary=dict.txt'])

    @patch('sys.argv', ['my_count/__init__.py',
                        '--index',
                        '--dictionary=dict.txt'])
    def test_input_index_and_dict_another(self):
        self.runTest("aaa",
                     "Number of words: 1, unique: 1; "
                     "average word length: 3.00 characters\n"
                     "Index (unknown: 1):\n"
                     "aaa*",
                     ['--index', '--dictionary=dict.txt'])

The main method does not change much; it only needs to handle the additional arguments:

def consume_args_opts(opts):
    index_config = False
    dict_config = False
    dict_value = None
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            write_help()
            sys.exit(0)
        if opt in ('-i', '--index'):
            index_config = True
        if opt in ('-d', '--dictionary'):
            dict_config = True
            dict_value = arg
    return dict_config, dict_value, index_config


def get_opts_args(argv):
    # Before
    opts, args = getopt.getopt(argv, shortopts="hd:", longopts=["help", "index"])

    # Current
    opts, args = getopt.getopt(argv, shortopts="hd:", longopts=["help", "index", "dictionary="])


def main(argv):
    # before
    args, opts = get_opts_args(argv)

    # current
    args, opts = get_opts_args(argv)
    dict_config, dict_value, index_config = consume_args_opts(opts)
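The snippets above show how the new option is parsed, but not how the words are marked. A minimal sketch, assuming the index is already sorted and that the dictionary lookup is case-insensitive; load_dictionary and mark_unknown_words are hypothetical helper names, not functions from the project:

```python
def load_dictionary(lines):
    # Hypothetical helper: one known word per line, blank lines ignored,
    # stored lowercased so that the lookup is case-insensitive
    return {line.strip().lower() for line in lines if line.strip()}


def mark_unknown_words(index_words, known_words):
    # Hypothetical helper: append "*" to every word missing from the
    # dictionary and count how many words were marked
    marked = []
    unknown = 0
    for word in index_words:
        if word.lower() in known_words:
            marked.append(word)
        else:
            marked.append(word + "*")
            unknown += 1
    return marked, unknown


# Contents of dict.txt from the example above
DICT_TXT = ["big", "small", "little", "cat", "dog", "have", "has", "had"]

known = load_dictionary(DICT_TXT)
marked, unknown = mark_unknown_words(["had", "lamb", "little", "Mary"], known)
print("Index (unknown: {}):".format(unknown))
print(*marked, sep="\n")
```

With a real file, a file object can be passed directly, for example `with open(dict_value, encoding="utf-8") as f: known = load_dictionary(f)`, since iterating a file yields its lines.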

Part IX.

The last scenario should allow the user to enter several texts and have them analyzed. The program terminates when the user enters an empty text:

$ wordcount
Enter text: Mary had a little lamb
Number of words: 4, unique: 4; average word length: 4.25 characters

Enter text: a bb ccc dddd
Number of words: 4, unique: 4; average word length: 2.5 characters

Enter text:
$

So this part needs a loop that runs until the end condition (an empty text) is met. An infinite loop is not the most elegant choice, but it works for this scenario.

import sys

# When True, work_with_stdio() returns after one analyzed text;
# the timeout-based test sets it to False so the loop keeps running
INFINITE_LOOP = True
WORD_PATTERN = "[a-z-A-Z]*"


# This works with STDIN: read texts until an empty one is entered
def work_with_stdio(index=False):
    while True:
        input_text = input("Enter text: ")

        if not input_text:
            # an empty text terminates the program
            sys.exit(0)

        count_words, count_unique, avg_len, index_words = \
            simple_word_count(input_text, index)
        print("Number of words: {}, unique: {}; "
              "average word length: {:.2f} characters".
              format(count_words, count_unique, avg_len))
        if index:
            print("Index:")
            print(*index_words, sep='\n')

        if INFINITE_LOOP:
            return
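If the module-level flag feels awkward, one alternative (not what the project uses) is the two-argument form of iter(), which keeps calling a function until it returns a sentinel value. In this sketch process_texts, analyze and read are hypothetical names; read defaults to input but can be replaced by a fake in tests, so neither patching nor a timeout is needed:

```python
def process_texts(analyze, read=input, prompt="Enter text: "):
    # iter(callable, sentinel) calls read(prompt) again and again and
    # stops as soon as the result equals the sentinel "" (an empty text)
    for text in iter(lambda: read(prompt), ""):
        analyze(text)


# Usage with a fake reader: two texts, then the empty text ends the loop
answers = iter(["Mary had a little lamb", "a bb ccc dddd", ""])
seen = []
process_texts(seen.append, read=lambda prompt: next(answers))
print(seen)  # ['Mary had a little lamb', 'a bb ccc dddd']
```

In the real program, analyze would be a function that calls simple_word_count() and prints the result; the loop ends by returning normally instead of calling sys.exit().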

Testing an infinite loop can be a bit tricky, but there are several ways to do it. First, I can enter an empty text and omit all other parameters, so the "infinite" loop ends on its first pass and the program exits with SystemExit:

    def test_another_infinite_true_loop(self):
        wordcount.INFINITE_LOOP = True
        given_answer = ""
        args = []
        expected_out = ""
        with patch(BUILTINS_INPUT, return_value=given_answer), \
                patch(SYS_STDOUT, new=io.StringIO()) as dummy_out:
            # an empty text makes the program call sys.exit(0)
            with self.assertRaises(SystemExit):
                wordcount.main(args)
            # runs after the exit was caught: nothing should have been printed
            self.assertEqual(dummy_out.getvalue().strip(), expected_out)

Alternatively, I can test the loop that keeps waiting for user input by using a timeout:

    @timeout(10)
    def call_infinite_main_loop(self, args, given_answer):
        with patch(BUILTINS_INPUT, return_value=given_answer), \
                patch(SYS_STDOUT, new=io.StringIO()):
            my_count.INFINITE_LOOP = False
            wordcount.main(args)

The whole code is available in my GitLab repository.

Conclusion

I learned a lot during this coding challenge. There are still things I did not cover, such as performance testing and testing the whole application on real data (lorem ipsum might be a good example). There are also libraries that achieve the same result in less processing time and with fewer resources.

Still, I think it is a good challenge for any developer who wants to improve their skills.

Michal Slovík
Java developer and Cloud DevOps

My job interests include DevOps, Java development, and Docker/Kubernetes technologies.