At Raftt a significant component of our product is a CLI written in Go that is run by developers. It has a CLI component as well as a long-running daemon handling things like file syncing, port forwarding, and more.
The CLI and daemon both write logs to log files, which we can use to identify and debug problems. They are stored locally, and can be sent by the developer using the `raftt doctor` command when they encounter a problem.
To make sure these files don’t grow to be extremely large, we added log rotation - whenever the log file reaches a certain size, move it aside, compress it, and create a new one. When we reach a predefined limit of old files (or the files are too old) - delete them.
Everything was going great, until suddenly we no longer received the “rotated” (compressed) files in our user’s `doctor` reports. In this post we will give more background on log handling, talk about our setup, finding the root problem and finally our solution.
## Log rotation and Lumberjack
Log rotation is the practice of maintaining logging files so they don’t grow enormously large. Usually it means archiving the current log file, compressing it, creating a new log file and finally deleting old archives. It is often parameterized by maximum logfile age and maximum logfile size.
A popular (and aptly named) utility for log rotation is [`logrotate`](https://linux.die.net/man/8/logrotate). Borrowing from it’s `man` page:
> **logrotate** is designed to ease administration of systems that generate large numbers of log files. It allows automatic rotation, compression, removal, and mailing of log files. Each log file may be handled daily, weekly, monthly, or when it grows too large.
[`logrotate`](https://linux.die.net/man/8/logrotate) is not a good fit for a program that runs on a personal computer and needs to support multiple operating systems because [`logrotate`](https://linux.die.net/man/8/logrotate) is Unix-specific. We support Linux, but also Mac and Windows. Luckily, there is a popular log rotation package in Go - [`lumberjack`](https://github.com/natefinch/lumberjack).
[`lumberjack`](https://github.com/natefinch/lumberjack) provides a simple and efficient way to rotate log files based on their size and age. To use [`lumberjack`](https://github.com/natefinch/lumberjack) in your Golang application, you simply need to import the [`lumberjack`](https://github.com/natefinch/lumberjack) package and create a new instance of the logger, specifying the log file path and rotation settings. As with other log rotation solutions, you can choose the number of old log files to keep track of and [`lumberjack`](https://github.com/natefinch/lumberjack) can compress them to save more storage.
We use [`zerolog`](https://github.com/rs/zerolog) as our logger, and the following code example we will see how you can use them together:
```go
import (
"io"
"gopkg.in/natefinch/lumberjack.v2"
"github.com/rs/zerolog"
)
func Rotate(path string) io.WriteCloser {
return &lumberjack.Logger{
Filename: path,
MaxSize: 10, // In MB before rotating the file
MaxAge: 30, // In days before deleting the file
MaxBackups: 5, // Maximum number of backups to keep track of
Compress: true, // Compress the rotated log files, false by default.
}
}
func main() {
logger := zerolog.New(Rotate("/tmp/out.log")
logger.Info().Msg("Hello World!")
}
```
## The problem
At Raftt each CLI process writes logs to the same main file. One day we noticed that the “rotated” log files disappeared. We quickly realized that this started around the time we released our new IDE plugins, which invoked the CLI periodically to retrieve info for display in the IDE. Our working theory from that point was that multiple process trying to rotate the same log file can override the file handle of each other causing various… shenanigans.
We opened up the `lumberjack` code and found that the library does not support multi process logging to the same file. There is no locking during the rotate process, and the current size of the file is [kept in memory](https://github.com/natefinch/lumberjack/blob/v2.0/lumberjack.go#L110) and tracked over writes, so it is not clear how to synchronize this process. Multiple processes using the `lumberjack` library at the same time can easily try to rotate the log file at the same time, overriding each others file handle. Looking at the issues page of the `lumberjack` Github repository we found one that [refers to the same problem](https://github.com/natefinch/lumberjack/issues/85).
In the next paragraph we will discuss possible solutions and the one we chose.
## Building out a solution
We came up with 2 possible solutions for the multi-process log rotation problem:
1. We can have each process write to a dedicated log file and have a main collection process that gathers these logs and writes them to a rotating output.
2. We can change `lumberjack` to use a cross process lock when trying to rotate the file.
Option 2 seemed very attractive at first. However, because the rotation affects future writes, every log write would need to be locked as well. This would be too much of a performance hit, and this option was discarded.
Option 1 is a bit more complex, but doesn’t any performance issues. Having each CLI process write to a dedicated log is easy - just create a log file with the current PID of the process as the file name and write to it.
The harder part is the process which periodically collects the CLI log files. Luckily, we already have a daemonized process running in the background of the user’s PC, we just need it to periodically gather the log files and write them to a single, rotating, log file.
For this solution to be complete we needed to handle the following:
1. Need to ensure that the collection process does not gather incomplete log files.
2. If somehow log files are not gathered for a very long time we need to garbage collect them.
3. Since we are gather logs from processes that run in parallel, we need to sort the rotated log file by timestamp.
We decided that point 3 was not worth the effort as we can reorder the lines of the log file when needed (at log read / analysis time), but we had to deal with points 1 and 2.
To solve point 1 we need some kind of signal, allowing the daemon to know that the process writing to each log file is done. There are several options - we went for a simple file, created by each CLI process when it is done running to indicate that the log file can be safely gathered.
To solve point 2 the daemon process checks if there are very old log or signal files and if it finds any it removes them.
## Putting it all together
We have 2 directories:
1. One to store the log files for each CLI invoked process. Let’s call it `cli`.
2. One for storing the done indications of these processes. Let’s call it `done`.
A process is invoked with PID 1000 creates a file in the `cli` directory called `cli.1000.log` and writes all logs to it. When the process is done running it creates an empty file with the same name in the `done` directory.
The daemon periodically goes over the `done` directory and for each file there it:
1. Finds the matching file with the same name in the `cli` directory.
2. Copies the content to the rotating merged log file.
3. Deletes both files.
Finally, as it goes over the files if they are very old it just deletes them.
See the following partial code to see how simple this ended up being!
### Daemon log file gathering code
```go
import (
"context"
"fmt"
"time"
"path/filepath"
"gopkg.in/natefinch/lumberjack.v2"
)
func MergeLogs(ctx context.Context) {
logDir := LogDirPath(...)
doneIndicationsDir := filepath.Join(logDir, "done")
cliLogsDir := filepath.Join(logDir, "cli")
cliLogRotator := &lumberjack.Logger{Filename: "merged.log"}
defer cliLogRotator.Close()
for {
select {
case <-ctx.Done():
return
case <-time.After(time.Minute):
logFiles := gatherLogFiles(...)
copyLogFilesToMerged(logFiles)
cleanupVeryOldFiles()
}
}
}
```
### CLI log file writing code
```go
import (
"fmt"
"os"
"path/filepath"
)
func main() {
logsDir := LogDirPath(...)
pid := fmt.Sprint(os.Getpid())
logFileName := filepath.Join(logsDir, "cli", fmt.Sprintf("cli.%d.log", pid))
logFileHandle, err := openLogFile(logFileName)
if err != nil {
fmt.Fprintf(os.Stderr, "Error: Could not create log file: %v\n", err.Error())
return
}
doneIndicationDir := filepath.Join(logsDir, "done")
_ = os.MkdirAll(doneIndicationDir, 0o755)
SetupLogger(logFileName) // This makes the logger available for use anywhere in the codebase
defer func() {
logFileHandle.Close()
_, _ = os.Create(filepath.Join(doneIndicationDir, logFileName))
}
app.Execute()
}
```
## Finishing up
Log rotation is a critical aspect of managing log files in any log-writing application. With the [`lumberjack`](https://github.com/natefinch/lumberjack) utility, it's easy to implement efficient and reliable log rotation in your Golang application. But if you try to use [`lumberjack`](https://github.com/natefinch/lumberjack) from multiple processes at the same time you are going to have problems.
Seeing that [`lumberjack`](https://github.com/natefinch/lumberjack) is the industry standard in log rotation in Golang, and multi-process log rotation is a common problem people encounter, we are considering releasing our solution as an open source library for everyone to use. Let us know if you would find it useful!
The ability to focus on doing what you love best can be more than a bottled-up desire lost in a sea of frustration. Make it a reality — with Raftt.