Go: Read a file line by line
In this tutorial we will look at how to read a file line by line using Go. Go makes this easy with bufio.NewScanner().
You can find the full source code here. Ok, let's jump in.
Create package and import packages
I am not going to talk much about this, but this is what I have at the top of my main.go file:
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)
Opening and Closing a file
It goes without saying that we need a main function, so that is where I will add the code. Another thing to note is that I already have a file called fileToRead.txt, which contains the following content:
Hello
World
of
Go!
I will be reading this file. Now that that is cleared up, let's create the main function and open/close that file.
func main() {
	file, err := os.Open("fileToRead.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
}
On the first line of the main function we open the file using os.Open. According to the Go docs, this opens the file for reading.
Because opening a file can fail, we need to check whether err is nil. If it is not, we use log.Fatal to print the error and exit the program.
Lastly, for this section, we run defer file.Close(). This ensures that the file gets closed when the function returns.
Why we use defer: A defer statement pushes a function call onto a list. The list of saved calls is executed after the surrounding function returns. Defer is commonly used to simplify functions that perform various clean-up actions. More info can be found here.
Read the file line by line
Now that we have the opening and closing of the file done, we can look at reading the file line by line.
Add the following code below defer file.Close():
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	fmt.Println(scanner.Text())
}
// Err returns the first non-EOF error the scanner ran into, if any.
if err := scanner.Err(); err != nil {
	log.Fatal(err)
}
On the first line we create a new scanner and assign it to the scanner variable. Scanner fits our purpose of reading a file line by line quite well. If we take a look at the docs, they say the following:
Scanner provides a convenient interface for reading data such as a file of newline-delimited lines of text
Once we have created a new Scanner, we can use the Scan method to advance to the next token, which by default is a line. Scan returns false when it reaches the end of the file or runs into an error.
After each successful call to Scan, we use either the Bytes or Text method to get the token. In our case we can just use Text() and then print that out.
Lastly, we check whether the scanner ran into an error. The scanner stops as soon as there is an error, so we handle that after the loop. Reaching the end of the file is not treated as an error, so Err returns nil in that case. If there is an error, we log it and exit the program.
If we run this now, we should get the following output:
Hello
World
of
Go!
The same as the input file! Great news.
Optional: Large line file
By default, the scanner is limited to a maximum token size of 64*1024 bytes. To double check this, we can take a look at the implementation of NewScanner:
const (
	// MaxScanTokenSize is the maximum size used to buffer a token
	// unless the user provides an explicit buffer with Scanner.Buffer.
	// The actual maximum token size may be smaller as the buffer
	// may need to include, for instance, a newline.
	MaxScanTokenSize = 64 * 1024

	startBufSize = 4096 // Size of initial allocation for buffer.
)

// NewScanner returns a new Scanner to read from r.
// The split function defaults to ScanLines.
func NewScanner(r io.Reader) *Scanner {
	return &Scanner{
		r:            r,
		split:        ScanLines,
		maxTokenSize: MaxScanTokenSize,
	}
}
We can see that maxTokenSize is set to MaxScanTokenSize, and if we look at where that constant is declared, we can see its size.
I have another file called fileToReadLargeLine.txt. It contains two lines: the first is longer than MaxScanTokenSize and the second is very small, but the code in its current state will never get to the second line.
If we update the os.Open line to this:
file, err := os.Open("fileToReadLargeLine.txt")
And then run the code, we should see something similar to this:
bufio.Scanner: token too long
So how do we fix this? We can fix this by setting a larger buffer size on the scanner.
Just below the line where we call NewScanner(), and before the first call to Scan, add the following:
bufferSize := 1024 * 1024
scannerBuffer := make([]byte, bufferSize)
scanner.Buffer(scannerBuffer, bufferSize)
This code creates a 1 megabyte buffer and sets 1 megabyte as the maximum token size. If we run the code again, we won't get the error anymore, and instead it will print out that massive line as well as the small line that follows it. Lines longer than the new limit would still fail with the same error.
Conclusion
Go makes it incredibly easy to read a file line by line, and while there is a small caveat with regards to line size, it is very easy to overcome that limitation by increasing the buffer size.
You can find the source code here.