Word count (1/3) | Michal Slovik

Overview

In this series of articles, I will share a learning project in which I learned more about TDD and improved my Python skills. The whole project was inspired by: CCD Coding Dojo.

Each article covers one part, but you can see the whole project in my GitLab repository.

All the code is written in Python (version 3.9.8). The code base is checked with pylint and flake8, tests are run with pytest, and SonarCloud is set up as well.

It all starts with the repository setup

First, let’s look at the .gitlab-ci.yml file, where the code is checked, validated, tested, and built. GitLab provides a nice pipeline where everything can run inside a Docker container. The official Python image is a good fit for this application. My first attempt at starting a pipeline was not successful, but learning from mistakes is one way to learn.

# Using the official Python Docker image as default
image: python

# Columns of the pipeline
stages:
  - static analysis
  - build
  - test

# Static code analysis job via Pylint
static analysis:
  stage: static analysis
  before_script:
    - pip install pylint
  script:
    - pylint my_count/__init__.py
  allow_failure: true

# Compilation job
build:
  stage: build
  script:
    - python -m compileall my_count/__init__.py

# Unit testing via PyTest
unit tests:
  stage: test
  before_script:
    - pip install pytest
  script:
    - pytest test_word_count.py

# Module testing and code coverage reporting
module tests:
  stage: test
  before_script:
    - pip install coverage
  script:
    - COVRUN="coverage run -a"
    - test $($COVRUN calculator.py 1 + 1) = 2
    - test $($COVRUN calculator.py 100 - 10) = 90
    - test $($COVRUN calculator.py 25 \* 5) = 125
    - (! $COVRUN calculator.py 30 / 5) # Should fail as division is unsupported
    - coverage report
  coverage: /\d+\%\s*$/

Pylint worked fine, but the compilation and coverage jobs failed because the calculator.py file was missing. I fixed the coverage job by simplifying it:

# Coverage
unit test with coverage:
  stage: test
  before_script:
    - pip install pytest pytest-cov
  script:
    - coverage run -m unittest discover
    - coverage report -m
    - coverage xml
  artifacts:
    reports:
      cobertura: coverage.xml

I also added flake8 as a static check of the code:

# Flake8
unit test_with_flake:
  stage: test
  before_script:
    - pip install flake8
  script:
    - flake8 test_word_count.py

And to double-check everything, I added SonarCloud:

# Tox and Sonar
sonarcloud-check:
  stage: sonar
  image:
    name: sonarsource/sonar-scanner-cli:latest
    entrypoint: [ "" ]
  cache:
    key: "${CI_JOB_NAME}"
    paths:
      - .sonar/cache
  before_script:
    - pip install tox
  script:
    - tox -e py
    - sonar-scanner -Dsonar.python.coverage.reportPaths=coverage.xml

Last but not least, there should be a requirements.txt file listing the third-party libraries the project needs.

TDD (Test Driven Development)

Martin Fowler’s TestDrivenDevelopment article has been known for years. I mention it because I tried to follow this concept in this project. To learn more about the concept, I recommend reading that article.

Part I.

The first commit is very simple and misses a good deal of important functionality: it reads only from stdin, not from files, it does not check its arguments, and so on.

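def simple_word_count():
    """Read one line from stdin and count its non-numeric words."""
    input_text = input("Enter text: ")

    # split() without arguments splits on any run of whitespace
    words = input_text.split()
    count = 0
    for word in words:
        # Tokens made up only of digits are not counted as words
        if not word.isnumeric():
            count += 1
    print("Number of words: {}".format(count))
    return count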


if __name__ == '__main__':
    simple_word_count()

Still, this script is completely fine for the first part: https://gitlab.com/Mishco/word-count#1-agility-kata-word-count-i

$ wordcount
Enter text: Mary had a little lamb
Number of words: 5
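
Because the function reads from input(), a test has to stand in for the keyboard. Here is a minimal sketch of such a test, assuming the function is copied from the listing above and using unittest.mock.patch on builtins.input. The test class and method names are hypothetical and not taken from the repository.

```python
import contextlib
import io
import unittest
from unittest.mock import patch


def simple_word_count():
    """Copy of the first-commit function from the article."""
    input_text = input("Enter text: ")
    count = 0
    for word in input_text.split():
        if not word.isnumeric():
            count += 1
    print("Number of words: {}".format(count))
    return count


class TestSimpleWordCount(unittest.TestCase):  # hypothetical test class
    def test_counts_words_from_stdin(self):
        # Replace input() so the test does not wait for the keyboard.
        with patch("builtins.input", return_value="Mary had a little lamb"):
            # Swallow the printed line to keep the test output clean.
            with contextlib.redirect_stdout(io.StringIO()):
                self.assertEqual(5, simple_word_count())

    def test_skips_purely_numeric_tokens(self):
        # "2" is numeric, so only Mary, had, and lambs are counted.
        with patch("builtins.input", return_value="Mary had 2 lambs"):
            with contextlib.redirect_stdout(io.StringIO()):
                self.assertEqual(3, simple_word_count())
```

Patching input() keeps the function unchanged, which suits this first step. Passing the text in as an argument, as the later tests do, removes the need for the patch.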

Part II.

The next part needs a stop-word list: words that should not be counted.

This is where tests help. I created a stopwords.txt file with all the stop words and wrote a test that checks that each of them is counted as zero words:


    def test_all_stop_words(self):
        with open("my_count/stopwords.txt", "r") as file_handle:
            stop_words = file_handle.readlines()
            stop_words = [x.strip() for x in stop_words]

        for stop_word in stop_words:
            result, _, _, _ = simple_word_count(stop_word)
            self.assertEqual(0, result)
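
To make the idea concrete, here is a minimal sketch of counting with a stop-word list. It is not the repository's implementation: count_words, load_stop_words, and the sample stop words are assumptions made for illustration, and the function returns only the count rather than the four values that the test above unpacks.

```python
from pathlib import Path


def load_stop_words(path):
    """Read one stop word per line, ignoring blank lines (hypothetical helper)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def count_words(text, stop_words=frozenset()):
    """Count whitespace-separated tokens that are neither numeric nor stop words."""
    return sum(
        1
        for word in text.split()
        if not word.isnumeric() and word not in stop_words
    )


# Sample stop words chosen for this example only.
sample_stop_words = {"the", "a", "on", "off"}
print(count_words("Mary had a little lamb", sample_stop_words))  # prints 4
```

Keeping the stop words in a set makes each membership check cheap, and passing the set in as a parameter lets a test supply its own list without touching the file system.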

Part III.

The last part in this article builds on the previous idea: what happens when I want to count the words in a file? https://gitlab.com/Mishco/word-count#3-agility-kata-word-count-iii

$ wordcount mytext.txt
Number of words: 4

So I reused the stop-word handling from the previous part for counting. The result is printed to stdout, so in this test I captured stdout, checked its content, and then restored the original stream.

    @patch('sys.argv', ['my_count/__init__.py', 'my_count/mytext.txt'])
    def test_main_loop(self):
        args = sys.argv[1:]
        captured_output = io.StringIO()
        sys.stdout = captured_output
        my_count.main(args)
        sys.stdout = sys.__stdout__
        self.assertEqual("Number of words: 4, unique: 4;"
                         " average word length: 4.25 characters\n",
                         captured_output.getvalue())
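
Swapping sys.stdout by hand works, but the original stream is not restored if main raises an exception. One possible alternative, sketched here under assumptions, is contextlib.redirect_stdout, which restores the stream when its block exits. The main function below is a hypothetical stand-in that prints only the word count. It is not the my_count.main from the repository, and a temporary file takes the place of my_count/mytext.txt.

```python
import contextlib
import io
import os
import tempfile


def main(args):
    """Hypothetical stand-in: count the words in the file named by args[0]."""
    with open(args[0], encoding="utf-8") as file_handle:
        words = [w for w in file_handle.read().split() if not w.isnumeric()]
    print("Number of words: {}".format(len(words)))


def run_and_capture(args):
    """Run main(args) and return everything it printed to stdout."""
    captured_output = io.StringIO()
    # stdout is restored when the block exits, even if main() raises.
    with contextlib.redirect_stdout(captured_output):
        main(args)
    return captured_output.getvalue()


# Write a throwaway input file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = os.path.join(tmp_dir, "mytext.txt")
    with open(text_path, "w", encoding="utf-8") as file_handle:
        file_handle.write("Mary had little lamb\n")
    output = run_and_capture([text_path])

assert output == "Number of words: 4\n"
```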

This concludes the first article in the word count series. Each article covers one part, but you can see the whole project in my GitLab repository.