September 19, 2021

Go: Read a file line by line

In this tutorial we will look at how to read a file line by line in Go. The standard library makes this easy with bufio.NewScanner().

You can find the full source code here. Ok, let's jump in.

Create package and import packages

I am not going to talk much about this, but this is what I have at the top of my main.go file:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

Opening and Closing a file

We need a main function, so all of the code below goes inside it. I also already have a file called fileToRead.txt with the following content:

Hello
World
of
Go!

I will be reading this file. Now that that is cleared up, let's create the main function and open/close that file.

func main() {
	file, err := os.Open("fileToRead.txt")

	if err != nil {
		log.Fatal(err)
	}

	defer file.Close()
}

On the first line in the main function we open the file using os.Open. According to the Go docs, this will open the file for reading.

Opening a file can fail, so right after the call we check whether err is nil. If it is not, we use log.Fatal to print the error and exit the program.
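To make that error check a little more concrete, here is a minimal sketch of telling a missing file apart from other failures. The openStatus helper and the no-such-file.txt name are hypothetical and not part of the tutorial's program; errors.Is and fs.ErrNotExist come from the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// openStatus is a hypothetical helper that reports why a file could not be opened.
func openStatus(path string) string {
	file, err := os.Open(path)
	if errors.Is(err, fs.ErrNotExist) {
		// The path does not exist.
		return "missing"
	}
	if err != nil {
		// Any other failure, for example a permissions problem.
		return "error: " + err.Error()
	}
	defer file.Close()
	return "ok"
}

func main() {
	fmt.Println(openStatus("fileToRead.txt"))
	fmt.Println(openStatus("no-such-file.txt"))
}
```

In the tutorial's program log.Fatal is enough, since any failure to open the file means there is nothing left to do.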

Lastly, for this section, we call defer file.Close(). This ensures that the file gets closed when the function returns.

Why we use defer: A defer statement pushes a function call onto a list. The list of saved calls is executed after the surrounding function returns. Defer is commonly used to simplify functions that perform various clean-up actions. More info can be found here.
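A small sketch of that behaviour, separate from the file-reading program: deferred calls run after the function body finishes, in last-in, first-out order. The cleanupOrder helper is hypothetical and exists only to record the order.

```go
package main

import "fmt"

// cleanupOrder is a hypothetical helper that records the order in which
// the function body and its deferred calls run. The named result lets the
// deferred closures append to the value that is returned.
func cleanupOrder() (order []string) {
	defer func() { order = append(order, "first deferred") }()
	defer func() { order = append(order, "second deferred") }()
	order = append(order, "function body")
	return order
}

func main() {
	// Prints "function body", then "second deferred", then "first deferred".
	for _, step := range cleanupOrder() {
		fmt.Println(step)
	}
}
```

This is why defer file.Close() can sit right next to os.Open: the close runs only once the rest of main has finished.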

Read the file line by line

Now that we have the opening and closing of the file done, we can look at reading the file line by line.

Add the following code below defer file.Close():

scanner := bufio.NewScanner(file)

for scanner.Scan() {
	fmt.Println(scanner.Text())
}

// Err returns nil if the scanner simply reached the end of the file.
if err := scanner.Err(); err != nil {
	log.Fatal(err)
}

On the first line we create a new scanner and we assign it to the scanner variable. Scanner fits our purpose of reading a file line by line quite well. If we take a look at the docs, it says the following:

Scanner provides a convenient interface for reading data such as a file of newline-delimited lines of text

Once we have created a new Scanner, we can use the Scan method to advance to the next token, which by default is the next line. Scan returns false when it reaches the end of the file or when it runs into an error.

After a successful call to Scan, we use either the Bytes or the Text method to get the token. In our case we can just use Text() and print the result.

Lastly, we check whether the scanner ran into an error. The scanner stops as soon as there is an error, so we handle that after the loop. If there is an error, we log it and exit the program.

If we run this now, we should get the following output:

Hello
World
of
Go!

The same as the input file! Great news.

Optional: Large line file

By default, the scanner is limited to a maximum token size of 64*1024 bytes (64 KiB). To double check this, we can take a look at the implementation of NewScanner and the constants declared just above it:

const (
	// MaxScanTokenSize is the maximum size used to buffer a token
	// unless the user provides an explicit buffer with Scanner.Buffer.
	// The actual maximum token size may be smaller as the buffer
	// may need to include, for instance, a newline.
	MaxScanTokenSize = 64 * 1024

	startBufSize = 4096 // Size of initial allocation for buffer.
)

// NewScanner returns a new Scanner to read from r.
// The split function defaults to ScanLines.
func NewScanner(r io.Reader) *Scanner {
	return &Scanner{
		r:            r,
		split:        ScanLines,
		maxTokenSize: MaxScanTokenSize,
	}
}

We can see that NewScanner sets maxTokenSize to MaxScanTokenSize, and the const block shows that MaxScanTokenSize is 64 * 1024.

I have another file called fileToReadLargeLine.txt. This contains two lines, the first being bigger than the MaxScanTokenSize and the second being very small, but the code will never get to it in its current state.

If we change the os.Open line to this:

file, err := os.Open("fileToReadLargeLine.txt")

And then run the code, we should see something similar to this:

bufio.Scanner: token too long

So how do we fix this? We can fix this by simply setting the buffer size on the scanner.

Just below the line where we call NewScanner(), add the following:

bufferSize := 1024 * 1024
scannerBuffer := make([]byte, bufferSize)
scanner.Buffer(scannerBuffer, bufferSize)

This code gives the scanner a 1 MiB buffer and raises the maximum token size to match. Note that Buffer has to be called before the first call to Scan. If we run the code again, we won't get the error anymore. Instead, it will print out that massive line as well as the small line that follows it, as long as no line is longer than the new limit.

Conclusion

Go makes it easy to read a file line by line. There is a small caveat with regard to line size, but it is easy to overcome by increasing the scanner's buffer size.

You can find the source code here.