Engineering Blog

Using a filename as a function input

When writing a new function that reads a file, accepting the filename as an argument is usually not the best design. Among other drawbacks, it makes the function harder to unit test across a variety of inputs.

Suppose we want to implement a function to count the number of empty lines in a file. One way to implement this function would be to accept a filename and use bufio.NewScanner to scan and check every line:

func countEmptyLinesInFile(filename string) (int, error) {
	file, err := os.Open(filename)
	if err != nil {
		return 0, err
	}
	defer file.Close() // Handle file closure

	count := 0
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		if scanner.Text() == "" {
			count++
		}
	}
	return count, scanner.Err()
}

We open a file from the filename. Then we use bufio.NewScanner to scan every line (by default, it splits the input per line).

This function will do what we expect it to do. Indeed, as long as the provided filename is valid, we will read from it and return the number of empty lines. So what’s the problem? Let’s say we want to implement unit tests to cover the following cases:
1. A nominal case
2. An empty file
3. A file containing only empty lines

For each unit test case, we have to create a new file, which makes testing more cumbersome. As the function grows more complex, more cases have to be covered, and we may end up managing dozens of files.

Furthermore, this function isn’t reusable. For example, if we had to implement the same logic but count the number of empty lines in the body of an HTTP request, we would have to duplicate the main logic:

func countEmptyLinesInHTTPRequest(request *http.Request) (int, error) {
	scanner := bufio.NewScanner(request.Body)
	// ... the same counting loop as in countEmptyLinesInFile, duplicated here
}

Overcoming these limitations

One way to overcome these limitations is to make the function accept a *bufio.Scanner (the type returned by bufio.NewScanner). Both functions share the same logic from the moment the scanner variable is created, so this approach would work. In Go, however, the idiomatic way is to start from the reader abstraction. Let’s write a new version, countEmptyLines, that receives an io.Reader instead:

func countEmptyLines(reader io.Reader) (int, error) {
	count := 0
	scanner := bufio.NewScanner(reader)
	for scanner.Scan() {
		if scanner.Text() == "" {
			count++
		}
	}
	return count, scanner.Err()
}

The benefit of this approach is that the function abstracts the data source. Is it a file? An HTTP request? A socket input? It doesn’t matter to the function. Because *os.File and the Body field of http.Request both implement io.Reader, we can reuse the same function regardless of the input type.

Another benefit is related to testing. We mentioned that creating one file per test case could quickly become cumbersome. Now that countEmptyLines accepts an io.Reader, we can implement unit tests by creating an io.Reader from a string:

func TestCountEmptyLines(t *testing.T) {
emptyLines, err := countEmptyLines(strings.NewReader(
`foo
bar
baz
`))
// Test logic
}

In this test, we create an io.Reader using strings.NewReader from a string literal directly. Therefore, we don’t have to create one file per test case. Each test case can be self-contained, improving the test readability and maintainability as we don’t have to open another file to see the content.

Conclusion

Accepting a filename as a function input to read from a file should, in most cases, be considered a code smell (except in specific functions such as os.Open). As we’ve seen, it makes unit tests more complex because we may have to create multiple files. It also reduces the reusability of a function (although not all functions are meant to be reused). Using the io.Reader interface abstracts the data source. Regardless of whether the input is a file, a string, an HTTP request, or a gRPC request, the implementation can be reused and easily tested.

References

  • Teiva Harsanyi, 100 Go Mistakes and How to Avoid Them, Manning Publications