Word count (1/3)
Start small, then grow…
Overview
In this series of articles, I share a learning project in which I learned more about TDD and improved my Python skills. The whole project was inspired by: https://ccd-school.de/coding-dojo/#cd8.
I share one part per article, but you can see the whole project in my GitLab repository.
It all starts with the repository setup
First, let’s look at the .gitlab-ci.yml file, where the code is checked, validated, tested, and built. GitLab provides a convenient pipeline in which every job can run inside a Docker container, and the official Python image is a good fit for this application. My first attempt at a pipeline was not successful, but learning from mistakes is one way to learn.
# Using the official Python Docker image as default
image: python

# Columns of the pipeline
stages:
  - static analysis
  - build
  - test

# Static code analysis job via Pylint
static analysis:
  stage: static analysis
  before_script:
    - pip install pylint
  script:
    - pylint my_count/__init__.py
  allow_failure: true

# Compilation job
build:
  stage: build
  script:
    - python -m compileall my_count/__init__.py

# Unit testing via PyTest
unit tests:
  stage: test
  before_script:
    - pip install pytest
  script:
    - py.test test_word_count.py

# Module testing and code coverage reporting
module tests:
  stage: test
  before_script:
    - pip install coverage
  script:
    - COVRUN="coverage run -a"
    - test $($COVRUN calculator.py 1 + 1) = 2
    - test $($COVRUN calculator.py 100 - 10) = 90
    - test $($COVRUN calculator.py 25 \* 5) = 125
    - (! $COVRUN calculator.py 30 / 5) # Should fail as division is unsupported
    - coverage report
  coverage: /\d+\%\s*$/
Pylint worked fine, but compilation and coverage failed because the calculator.py file does not exist in this project.
I resolved the coverage job by simplifying it:
# Coverage
unit test with coverage:
  stage: test
  before_script:
    - pip install pytest pytest-cov
  script:
    - coverage run -m unittest discover
    - coverage report -m
    - coverage xml
  artifacts:
    reports:
      cobertura: coverage.xml
I also added flake8 as a static check of the code:
# Flake8
unit test_with_flake:
  stage: test
  before_script:
    - pip install flake8
  script:
    - flake8 test_word_count.py
And to double-check everything, I added Sonar:
# Tox and Sonar
sonarcloud-check:
  stage: sonar
  image:
    name: sonarsource/sonar-scanner-cli:latest
    entrypoint: [ "" ]
  cache:
    key: "${CI_JOB_NAME}"
    paths:
      - .sonar/cache
  before_script:
    - pip install tox
  script:
    - tox -e py
    - sonar-scanner -Dsonar.python.coverage.reportPaths=coverage.xml
Last but not least, there should be a requirements.txt file listing the additional libraries the project needs.
TDD (Test Driven Development)
Martin Fowler’s TestDrivenDevelopment article has been well known for years. I mention it because I tried to follow this approach in this project. To learn more about the concept, I recommend reading that article.
Part I.
The first commit is very simple and misses a good deal of important functionality: it takes input only from stdin, not from files, and it does not check its arguments.
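def simple_word_count():
    # Read one line of text from stdin
    input_text = input("Enter text: ")
    # Split on whitespace to get the individual tokens
    words = input_text.split()
    count = 0
    for word in words:
        # Tokens made up only of digits are not counted as words
        if not word.isnumeric():
            count += 1
    print("Number of words: {}".format(count))
    return count


if __name__ == '__main__':
    simple_word_count()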
But this script is completely sufficient for the first part of the kata: https://gitlab.com/Mishco/word-count#1-agility-kata-word-count-i
$ wordcount
Enter text: Mary had a little lamb
Number of words: 5
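Because this first version reads from input() directly, a test has to replace that call. The following is a minimal sketch of one way to do that with unittest.mock.patch. The function repeats the logic of the first commit; the test class name and the second sample sentence are my own assumptions and are not taken from the repository.

```python
import unittest
from unittest.mock import patch


def simple_word_count():
    """Same logic as the first commit: count non-numeric tokens from stdin."""
    input_text = input("Enter text: ")
    count = 0
    for word in input_text.split():
        if not word.isnumeric():
            count += 1
    print("Number of words: {}".format(count))
    return count


class SimpleWordCountTest(unittest.TestCase):  # hypothetical test class
    def test_counts_plain_words(self):
        # Replace input() so the test does not wait for the keyboard.
        with patch("builtins.input", return_value="Mary had a little lamb"):
            self.assertEqual(5, simple_word_count())

    def test_skips_numeric_tokens(self):
        # "3" is numeric, so only the four remaining tokens are counted.
        with patch("builtins.input", return_value="Mary had 3 little lambs"):
            self.assertEqual(4, simple_word_count())
```

Saved in a test file, this can be run with python -m unittest. Patching builtins.input keeps the function unchanged, at the cost of tying the test to how the text is read.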
Part II.
The next part needs a list of stop words: words that should not be counted.
This is where tests help.
I created a stopwords.txt file containing all the stop words, and a test that checks that none of them is counted:
def test_all_stop_words(self):
    with open("my_count/stopwords.txt", "r") as file_handle:
        stop_words = file_handle.readlines()
        stop_words = [x.strip() for x in stop_words]
    for stop_word in stop_words:
        result, _, _, _ = simple_word_count(stop_word)
        self.assertEqual(0, result)
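The test only fixes the expected behaviour; the counting code itself is not shown here. As a rough sketch of how stop-word filtering could work, assuming one stop word per line and case-sensitive matching, something like the following would satisfy it. The names load_stop_words and count_words are hypothetical, and count_words returns a single number, unlike the repository's simple_word_count, which returns four values. The inline stop-word list is sample data, not the real contents of stopwords.txt.

```python
from typing import Iterable, Set


def load_stop_words(lines: Iterable[str]) -> Set[str]:
    """Build a set of stop words from lines, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def count_words(text: str, stop_words: Set[str]) -> int:
    """Count whitespace-separated tokens that are neither numeric nor stop words."""
    return sum(
        1
        for word in text.split()
        if not word.isnumeric() and word not in stop_words
    )


# Inline sample data stands in for my_count/stopwords.txt (assumed contents).
stop_words = load_stop_words(["the\n", "a\n", "on\n", "off\n", "\n"])
print(count_words("Mary had a little lamb", stop_words))  # 4
print(count_words("a", stop_words))  # 0
```

Keeping the stop words in a set makes each membership check cheap, and passing them in as a parameter lets a test supply its own list instead of reading the file.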
Part III.
The last part in this article continues the previous idea: what happens when I want to count the words in a file? https://gitlab.com/Mishco/word-count#3-agility-kata-word-count-iii
$ wordcount mytext.txt
Number of words: 4
So I took the previous stop-word test and built on it for counting. The program prints each result to stdout, so in this test I captured stdout, checked its contents, and then restored the original stream.
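# Needs: import io, sys, my_count and from unittest.mock import patch
@patch('sys.argv', ['my_count/__init__.py', 'my_count/mytext.txt'])
def test_main_loop(self):
    args = sys.argv[1:]
    # Redirect stdout into a buffer so the printed result can be checked
    captured_output = io.StringIO()
    sys.stdout = captured_output
    my_count.main(args)
    # Restore the real stdout before asserting
    sys.stdout = sys.__stdout__
    self.assertEqual("Number of words: 4, unique: 4;"
                     " average word length: 4.25 characters\n",
                     captured_output.getvalue())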
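Reassigning sys.stdout by hand leaves it replaced if the call under test raises. A possible alternative, sketched here under the assumption that main takes a list of arguments and prints its result, is contextlib.redirect_stdout, which restores the stream automatically. The main below is a hypothetical stand-in that only prints the word count, not the repository's implementation, and a temporary file replaces my_count/mytext.txt.

```python
import contextlib
import io
import os
import tempfile


def main(args):
    """Hypothetical stand-in: count the words in the file named by args[0]."""
    with open(args[0], "r", encoding="utf-8") as file_handle:
        words = file_handle.read().split()
    print("Number of words: {}".format(len(words)))


def run_and_capture(args):
    """Run main(args) and return everything it printed to stdout."""
    captured_output = io.StringIO()
    # stdout is restored when the block exits, even if main() raises.
    with contextlib.redirect_stdout(captured_output):
        main(args)
    return captured_output.getvalue()


# Write a small sample file so the example does not depend on the repository.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mytext.txt")
    with open(path, "w", encoding="utf-8") as file_handle:
        file_handle.write("Mary had\nlittle lamb\n")
    output = run_and_capture([path])

assert output == "Number of words: 4\n"
```

The helper run_and_capture is also an assumed name; in a unittest test case the same with block can sit directly inside the test method.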
This concludes the first part of the series on word count. I share one part per article, but you can see the whole project in my GitLab repository.