Code Standards I follow

Subhankar Halder
Apr 13, 2020


I currently work in a computer science and engineering domain where there’s much to learn: the newest deep learning models, innovative ways to increase accuracy, and so on. Nevertheless, I often find myself googling questions such as “How to get better in coding?”, “What are the best software engineering practices?” and “How is a new software project in FAANG executed?”. I wonder if it is naive to do this. It is akin to an English novelist asking “How to write better English?”.

At work, I usually do exploratory programming. For instance, I try to figure out whether a model gives a good prediction. In such a scenario, the “ends” matter more than the “means”. Often, I am not sure what results I’ll get when I run the model. Once management is happy with the results, they might move the model to production. When that happens, all the core software engineering design principles come into play.

Recently, I was asked to write “production” software for a change. “Staged for production” might be a better term. I realized this was my chance to understand and implement some of the best software engineering standards followed by coders worldwide. The final objective of following these standards was, of course, to write quality code.

And so, I started experimenting while writing the “production” code. In this post, I’ll tell you about four of the standards that I really liked and followed:

Imperative Mood Git Commit Messages

To be honest, I never paid much attention to my git commit messages. For my exploratory projects, I use git mostly to back up my files in a cloud repository. As a result, my commit messages were extremely vague, such as:

Added some files

Nor did I pay attention to the tense of my commit messages:

Made changes and add files

This went on until I ended up reading the official Git documentation, which says:

Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behavior.

So now, I write my git commit messages in the imperative mood, which reads like the present tense. For instance:

Add prediction files and change test folder location

Well-established open source projects follow this pattern, and I believe it’s a good habit to adopt. As a bonus, you type less when you use the imperative.

Test Driven Development (TDD)

A trained chef in a restaurant has a weird style of chopping onions. She clasps the onion with her left hand and bends her fingers into a claw shape. Her right hand grips the handle, and her index finger rests on top of the blade. Yet, when she chops, the cuts are uniform and precise. Importantly, this technique is safer than the ones most amateurs use. The point I am trying to get at is that our natural way of doing things may not be the most efficient. Likewise, there may be paradigms of coding that are better than our natural way of writing a program. I believe TDD is one of those “unnatural” techniques that nevertheless works better for coding production software.

I heard about TDD in this talk by Robert Martin, author of many books on agile software development. Irrespective of your opinions on TDD, the first thirty minutes of the talk are extremely inspirational! Do give it a listen.

Following are the three laws of TDD:

  1. You can’t write any production code until you have first written a failing unit test.
  2. You can’t write more of a unit test than is sufficient to fail, and not compiling is failing.
  3. You can’t write more production code than is sufficient to pass the currently failing unit test.

While implementing the three laws, you follow these steps:

  1. Create a unit test that fails.
  2. Write production code that makes that test pass.
  3. Refactor the code.

In this paradigm, you first write the test and not the code! Isn’t that unnatural? I was skeptical of the process too! But I gradually got used to it. And now, I really like it. Let’s look at an example and see how this works.

Example: Suppose we need to write a deep learning model that distinguishes between images of cats and dogs. Assuming we are using PyTorch, our natural style of coding would be:

  1. Write a dataset and dataloader for train and test sets
  2. Construct our CNN model
  3. Loop over Forward and Backward Pass for the training step
  4. Write a prediction script with the model in eval mode
  5. Finally, you would write some tests where you provide a picture, and the tests should assert True or False depending upon the image ground truth

If there are bugs, you would probably read the error messages, put print statements all over the parts of your code that seem relevant, and then debug.

In the TDD paradigm, you would first write a test. So, first, we write a unit test for the dataset class. Like so:

First Test

And now, we write the code:

Code
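Here is a rough sketch of just enough code to satisfy a test that checks the dataset's length and labels. It is a plain Python class so that it runs without PyTorch installed. In a real PyTorch project it would likely subclass `torch.utils.data.Dataset` and return an image tensor instead of a path. The class name, the label mapping, and the file naming scheme are all assumptions.

```python
import os


class CatsDogsDataset:
    """Pair each image path with a label (assumed: cat is 0, dog is 1)."""

    LABELS = {"cat": 0, "dog": 1}

    def __init__(self, image_paths):
        self.image_paths = list(image_paths)

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        path = self.image_paths[index]
        # Assumed naming scheme: "<class>.<number>.jpg", e.g. "cat.0.jpg".
        class_name = os.path.basename(path).split(".")[0]
        return path, self.LABELS[class_name]
```

Note that the class does nothing beyond what such a test asks for, in the spirit of the third law.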

And then you see if the test passes. If so, write a new test, and then some more code. If not, fix either your actual code or your test.

So, in effect, you are writing a small test and some code, and running the tests regularly. For those who have studied accounting, this almost seems like double-entry bookkeeping. Just as debits and credits balance each other, the tests and the actual code balance out. Except, you always start with the test. If there are errors, they seem more manageable to tackle.

What you end up with is a robust test suite. It’s your first answer to the question of whether your software can be deployed. And since you are writing more tests, you get better at coding tests.

For instance, I got to learn about mocking and patching functions while testing. For example, suppose we have a function that calls os.listdir to get the names of the entries in a directory. You would think that you would have to create entire directories and then run the test. But that is not so: you can patch os.listdir, at the place where it is used in your script, with a mock function that returns the values you want! Something like so:

Patch a Mock Function
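As a sketch of the idea, the example below uses `unittest.mock.patch` from the standard library. The function `list_image_files` and the folder path are made up for illustration. Because everything lives in one file here, the patch target is simply `"os.listdir"`; in a real project you would typically patch it where your script looks it up, for example `"your_module.os.listdir"` (a hypothetical module name).

```python
import os
from unittest import mock


def list_image_files(folder):
    """Return the sorted names of the .jpg files inside a folder."""
    return sorted(name for name in os.listdir(folder) if name.endswith(".jpg"))


def test_list_image_files():
    fake_entries = ["dog.1.jpg", "notes.txt", "cat.0.jpg"]
    # Replace os.listdir for the duration of the with-block only.
    with mock.patch("os.listdir", return_value=fake_entries) as fake_listdir:
        files = list_image_files("/no/such/folder")

    # The folder never has to exist: the mock supplied the listing.
    fake_listdir.assert_called_once_with("/no/such/folder")
    assert files == ["cat.0.jpg", "dog.1.jpg"]
```

Once the `with` block ends, the real `os.listdir` is restored automatically.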

Such techniques are useful when you are unit testing and your function itself makes calls to external functions.

I feel more confident about my code when I am writing tests concurrently. It’s like driving at night with the headlights on. Sure, you can still ram into a pole, and the headlights draw a significant amount of power. But you can see better, and you are more confident about your night driving.

There are cons to this process, of course. Notice that we would probably write about twice the amount of code when we follow the TDD process. In a work environment where you have deadlines, this becomes quite hard to keep up. Also, I am not sure whether this paradigm is useful for exploratory programming.

Documentation

There are many quotable quotes in the book about Steve Jobs by Walter Isaacson. However, I always remember the following quote from the book quite well. This is Steve Jobs talking about carpentry:

“When you’re a carpenter making a beautiful chest of drawers, you’re not going to use a piece of plywood on the back, even though it faces the wall and nobody will ever see it. You’ll know it’s there, so you’re going to use a beautiful piece of wood on the back. For you to sleep well at night, the aesthetic, the quality, has to be carried all the way through.”

Documentation is similar to putting the beautiful piece of wood on the back of the chest. Hardly anyone will read your documentation and comments, but it is good practice to write them.

While I was writing code, I thought I would do three things concurrently: (a) tests, (b) code, and (c) Readme documentation.

However, this did not work well. Whenever I changed my class or function designs, I realized I would have to delete my Readme and start all over again.

So, I have now decided to start with the Readme only after I have written all the code. Nevertheless, in the actual scripts, I usually document my functions with a docstring like so:

Function Docstring
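A small sketch of what such a docstring can look like is shown below. The function `build_filename` is a made-up example, and the Google-style layout with `Args` and `Returns` sections is one common convention, assumed here rather than taken from my project.

```python
def build_filename(stem, extension=".jpg"):
    """Return a complete file name built from a stem and an extension.

    Args:
        stem (str): File name without the extension, e.g. "file".
        extension (str): Extension including the leading dot.

    Returns:
        str: The combined file name, e.g. "file.jpg".
    """
    return stem + extension
```

Calling `help(build_filename)` in a Python shell prints this text, so the documentation travels with the code.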

Refactoring Code

Since I was working in a TDD paradigm, I was constantly refactoring code while keeping the tests passing. And during this cycle, since I came back to my code often, I realized there were ways to make my code more efficient. For instance, take a look at the following example:

String concatenation
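A minimal sketch of this kind of concatenation, with made-up variable names, could be:

```python
stem = "file"
extension = ".jpg"

# Adding two strings to form a complete file name.
filename = stem + extension

# The same idea inside a loop: each += can build a brand new string.
names = ""
for index in range(3):
    names += stem + str(index) + extension + " "
```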

There’s nothing fancy here: we are simply adding two strings. For instance, this is usually used when one adds a “file” string to a “.jpg” string to make a complete filename.

However, if you do this many times in a loop, the process is not that efficient. In that case, it is generally better to use the “join” method to add the strings together. With “join”, Python allocates memory for the combined string only once. When strings are added in succession, Python can end up allocating a new string for each addition. So, our changed example would be:

“Join” method for strings
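As a sketch with made-up variable names, building file names with `str.join` might look like this. The pieces are collected in a list first and combined in a single call at the end.

```python
stem = "file"
extension = ".jpg"

# join takes an iterable of strings and builds the result in one go.
filename = "".join([stem, extension])

# In a loop, collect the pieces in a list and join once at the end.
parts = []
for index in range(3):
    parts.append("".join([stem, str(index), extension]))
names = " ".join(parts)
```

For just two short strings the difference is negligible; the benefit shows up when many pieces are combined.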

I really enjoyed coding the “production” software (and still do, since the project is not finished yet). What standards do you follow that make you a better programmer and allow you to write quality code? Do comment!
