bitfield/script
https://github.com/bitfield/script

Making it easy to write shell-like scripts in Go

script

script is a Go library for doing the kind of tasks that shell scripts are good at: reading files, executing subprocesses, counting lines, matching strings, and so on.

Why shouldn't it be as easy to write system administration programs in Go as it is in a typical shell? script aims to make it just that easy.

Shell scripts often compose a sequence of operations on a stream of data (a pipeline). This is how script works, too.

What can I do with it?

Let's see a simple example. Suppose you want to read the contents of a file as a string:

contents, err := script.File("test.txt").String()

That looks straightforward enough, but suppose you now want to count the lines in that file.

numLines, err := script.File("test.txt").CountLines()

For something a bit more challenging, let's try counting the number of lines in the file which match the string "Error":

numErrors, err := script.File("test.txt").Match("Error").CountLines()

But what if, instead of reading a specific file, we want to simply pipe input into this program, and have it output only matching lines (like grep)?

script.Stdin().Match("Error").Stdout()

That was almost too easy! So let's pass in a list of files on the command line, and have our program read them all in sequence and output the matching lines:

script.Args().Concat().Match("Error").Stdout()

What's that? You want to append that output to a file instead of printing it to the terminal? No problem:

script.Args().Concat().Match("Error").AppendFile("/var/log/errors.txt")

How does it work?

Those chained function calls look a bit weird. What's going on there?

One of the neat things about the Unix shell, and its many imitators, is the way you can compose operations into a pipeline:

cat test.txt | grep Error | wc -l

The output from each stage of the pipeline feeds into the next, and you can think of each stage as a filter which passes on only certain parts of its input to its output.

By comparison, writing shell-like scripts in raw Go is much less convenient, because everything you do returns a different data type, and you must (or at least should) check errors following every operation.
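
For comparison, here is a rough sketch of the earlier cat | grep | wc pipeline written in plain Go, using only the standard library. The details are illustrative rather than the only possible approach, but they show how much ceremony each step involves:

// A plain-Go sketch of: cat test.txt | grep Error | wc -l
// (for comparison only; not part of the script library).
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("test.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	numErrors := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), "Error") {
			numErrors++
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Println(numErrors)
}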

In scripts for system administration we often want to compose different operations like this in a quick and convenient way. If an error occurs somewhere along the pipeline, we would like to check this just once at the end, rather than after every operation.

Everything is a pipe

The script library allows us to do this because everything is a pipe (specifically, a script.Pipe). To create a pipe, start with a source like File():

var p script.Pipe
p = script.File("test.txt")

You might expect File() to return an error if there is a problem opening the file, but it doesn't. We will want to call a chain of methods on the result of File(), and it's inconvenient to do that if it also returns an error.

Instead, you can check the error status of the pipe at any time by calling its Error() method:

p = script.File("test.txt")
if p.Error() != nil {
    log.Fatalf("oh no: %v", p.Error())
}

What use is a pipe?

Now, what can you do with this pipe? You can call a method on it:

var q script.Pipe
q = p.Match("Error")

Note that the result of calling a method on a pipe is another pipe. You can do this in one step, for convenience:

var q script.Pipe
q = script.File("test.txt").Match("Error")

Handling errors

Woah, woah! Just a minute! What if there was an error opening the file in the first place? Won't Match blow up if it tries to read from a non-existent file?

No, it won't. As soon as an error status is set on a pipe, all operations on the pipe become no-ops. Any operation which would normally return a new pipe just returns the old pipe unchanged. So you can run as long a pipeline as you want to, and if an error occurs at any stage, nothing will crash, and you can check the error status of the pipe at the end.
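
For example (a minimal sketch, assuming the file may be missing):

// None of these stages will panic if the file doesn't exist; each one
// simply passes the original pipe along, and we check once at the end.
p := script.File("doesnt_exist.txt").Match("Error").Reject("false alarm")
if p.Error() != nil {
	log.Printf("pipeline failed: %v", p.Error())
}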

(Seasoned Gophers will recognise this as the errWriter pattern described by Rob Pike in the blog post Errors are values.)
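
In miniature, that pattern looks something like this (a simplified sketch adapted from the idea in that post):

// errWriter in miniature: the first error is remembered, and every
// subsequent write becomes a no-op, so errors are checked just once.
type errWriter struct {
	w   io.Writer
	err error
}

func (ew *errWriter) write(buf []byte) {
	if ew.err != nil {
		return
	}
	_, ew.err = ew.w.Write(buf)
}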

Getting output

A pipe is useless if we can't get some output from it. To do this, you can use a sink, such as String():

result, err := q.String()
if err != nil {
    log.Fatalf("oh no: %v", err)
}
fmt.Println(result)

Errors

Note that sinks return an error value in addition to the data. This is the same value you would get by calling p.Error(). If the pipe had an error in any operation along the pipeline, the pipe's error status will be set, and a sink operation which gets output will return the zero value, plus the error.

numLines, err := script.File("doesnt_exist.txt").CountLines()
fmt.Println(numLines)
// Output: 0
if err != nil {
	log.Fatal(err)
}
// Output: open doesnt_exist.txt: no such file or directory

CountLines() is another useful sink, which simply returns the number of lines read from the pipe.

Closing pipes

If you've dealt with files in Go before, you'll know that you need to close the file once you've finished with it. Otherwise, the program will retain what's called a file handle (the kernel data structure which represents an open file). There is a limit to the total number of open file handles for a given program, and for the system as a whole, so a program which leaks file handles will eventually crash, and will waste resources in the meantime.

Files aren't the only things which need to be closed after reading: so do network connections, HTTP response bodies, and so on.

How does script handle this? Simple. The data source associated with a pipe will be automatically closed once it is read completely. Therefore, calling any sink method which reads the pipe to completion (such as String()) will close its data source. The only case in which you need to call Close() on a pipe is when you don't read from it, or you don't read it to completion.

If the pipe was created from something that doesn't need to be closed, such as a string, then calling Close() simply does nothing.

This is implemented using a type called ReadAutoCloser, which takes an io.Reader and wraps it so that:

  1. it is always safe to close (if it's not a closable resource, it will be wrapped in an ioutil.NopCloser to make it one), and
  2. it is closed automatically once read to completion (specifically, once the Read() call on it returns io.EOF).

It is your responsibility to close a pipe if you do not read it to completion.
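
For example, here is a minimal sketch of a partial read followed by an explicit Close(); the file name and buffer size are arbitrary:

p := script.File("big.txt")
buf := make([]byte, 64)
n, err := p.Read(buf) // read only the first 64 bytes
if err != nil {
	log.Fatal(err)
}
fmt.Println(string(buf[:n]))
p.Close() // we did not read to completion, so close the data source ourselves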

Sources, filters, and sinks

script provides three types of pipe operations: sources, filters, and sinks.

  1. Sources create pipes from input in some way (for example, File() opens a file).
  2. Filters read from a pipe and filter the data in some way (for example, Match() passes on only lines which contain a given string).
  3. Sinks get the output from a pipeline in some useful form (for example, String() returns the contents of the pipe as a string), along with any error status.

Let's look at the source, filter, and sink options that script provides.

Sources

These are operations which create a pipe.

Args

Args() creates a pipe containing the program's command-line arguments, one per line.

p := script.Args()
output, err := p.String()
fmt.Println(output)
// Output: command-line arguments

Echo

Echo() creates a pipe containing a given string:

p := script.Echo("Hello, world!")
output, err := p.String()
fmt.Println(output)
// Output: Hello, world!

Exec

Exec() runs a given command and creates a pipe containing its combined output (stdout and stderr). If there was an error running the command, the pipe's error status will be set.

p := script.Exec("echo hello")
output, err := p.String()
fmt.Println(output)
// Output: hello

Note that Exec() can also be used as a filter, in which case the given command will read from the pipe as its standard input.

Exit status

If the command returns a non-zero exit status, the pipe's error status will be set to the string "exit status X", where X is the integer exit status.

p := script.Exec("ls doesntexist")
output, err := p.String()
fmt.Println(err)
// Output: exit status 1

For convenience, you can get this value directly as an integer by calling ExitStatus() on the pipe:

p := script.Exec("ls doesntexist")
var exit int = p.ExitStatus()
fmt.Println(exit)
// Output: 1

The value of ExitStatus() will be zero unless the pipe's error status matches the string "exit status X", where X is a non-zero integer.
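
For example (a small sketch using a missing file, where the error is not an exit status at all):

p := script.File("doesnt_exist.txt")
fmt.Println(p.ExitStatus())
// Output: 0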

Error output

Even in the event of a non-zero exit status, the command's output will still be available in the pipe. This is often helpful for debugging. However, String() is a no-op if the pipe's error status is set, so if you want the output, you will need to reset the error status before calling String():

p := script.Exec("man bogus")
p.SetError(nil)
output, err := p.String()
fmt.Println(output)
// Output: No manual entry for bogus

File

File() creates a pipe that reads from a file.

p := script.File("test.txt")
output, err := p.String()
fmt.Println(output)
// Output: contents of file

Stdin

Stdin() creates a pipe which reads from the program's standard input.

p := script.Stdin()
output, err := p.String()
fmt.Println(output)
// Output: [contents of standard input]

Filters

Filters are operations on an existing pipe that also return a pipe, allowing you to chain filters indefinitely.

Concat

Concat() reads a list of filenames from the pipe, one per line, and creates a pipe which concatenates the contents of those files. For example, if you have files a, b, and c:

output, err := script.Echo("a\nb\nc\n").Concat().String()
fmt.Println(output)
// Output: contents of a, followed by contents of b, followed
// by contents of c

This makes it convenient to write programs which take a list of input files on the command line, for example:

func main() {
	script.Args().Concat().Stdout()
}

The list of files could also come from a file:

// Read all files in filelist.txt
p := script.File("filelist.txt").Concat()

...or from the output of a command:

// Print all config files to the terminal.
script.Exec("ls /var/app/config/").Concat().Stdout()

Each input file will be closed once it has been fully read.

EachLine

EachLine() lets you create custom filters. You provide a function, and it will be called once for each line of input. If you want to produce output, your function can write to a supplied strings.Builder. The return value from EachLine() is a pipe containing your output.

p := script.File("test.txt")
q := p.EachLine(func(line string, out *strings.Builder) {
	out.WriteString("> " + line + "\n")
})
output, err := q.String()
fmt.Println(output)

Exec

Exec() runs a given command, which will read from the pipe as its standard input, and returns a pipe containing the command's combined output (stdout and stderr). If there was an error running the command, the pipe's error status will be set.

Apart from connecting the pipe to the command's standard input, the behaviour of an Exec() filter is the same as that of an Exec() source.

// `cat` copies its standard input to its standard output.
p := script.Echo("hello world").Exec("cat")
output, err := p.String()
fmt.Println(output)
// Output: hello world

Join

Join() reads its input and replaces newlines with spaces, preserving a terminating newline if there is one.

p := script.Echo("hello\nworld\n").Join()
output, err := p.String()
fmt.Println(output)
// Output: hello world\n

Match

Match() returns a pipe containing only the input lines which match the supplied string:

p := script.File("test.txt").Match("Error")

MatchRegexp

MatchRegexp() is like Match(), but takes a compiled regular expression instead of a string.

p := script.File("test.txt").MatchRegexp(regexp.MustCompile(`E.*r`))

Reject

Reject() is the inverse of Match(). Its pipe produces only lines which don't contain the given string:

p := script.File("test.txt").Match("Error").Reject("false alarm")

RejectRegexp

RejectRegexp() is like Reject(), but takes a compiled regular expression instead of a string.

p := script.File("test.txt").Match("Error").RejectRegexp(regexp.MustCompile(`false|bogus`))

Sinks

Sinks are operations which return some data from a pipe, ending the pipeline.

AppendFile

AppendFile() is like WriteFile(), but appends to the destination file instead of overwriting it. It returns the number of bytes written, or an error:

var wrote int
wrote, err := script.Echo("Got this far!").AppendFile("logfile.txt")

Bytes

Bytes() returns the contents of the pipe as a byte slice, plus an error:

var data []byte
data, err := script.File("test.bin").Bytes()

CountLines

CountLines(), as the name suggests, counts lines in its input, and returns the number of lines as an integer, plus an error:

var numLines int
numLines, err := script.File("test.txt").CountLines()

Read

Read() behaves just like the standard Read() method on any io.Reader:

buf := make([]byte, 256)
n, err := p.Read(buf)

Because a Pipe is an io.Reader, you can use it anywhere you would use a file, network connection, and so on. You can pass it to ioutil.ReadAll, io.Copy, json.NewDecoder, and anything else which takes an io.Reader.

Unlike most sinks, Read() does not read the whole contents of the pipe (unless the supplied buffer is big enough to hold them).
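
For instance, here is a sketch of decoding JSON straight from a pipe; the file name and struct are illustrative assumptions, not part of script:

// A Pipe satisfies io.Reader, so json.NewDecoder can read from it directly.
type config struct {
	Name string `json:"name"`
}

var c config
p := script.File("config.json")
if err := json.NewDecoder(p).Decode(&c); err != nil {
	log.Fatal(err)
}
p.Close() // the decoder may not read to EOF, so close the pipe ourselves
fmt.Println(c.Name)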

Stdout

Stdout() writes the contents of the pipe to the program's standard output. It returns the number of bytes written, or an error:

p := script.Echo("hello world")
wrote, err := p.Stdout()

In conjunction with Stdin(), Stdout() is useful for writing programs which filter input. For example, here is a program which simply copies its input to its output, like cat:

func main() {
	script.Stdin().Stdout()
}

To filter only lines matching a string:

func main() {
	script.Stdin().Match("hello").Stdout()
}

String

String() returns the contents of the pipe as a string, plus an error:

contents, err := script.File("test.txt").String()

Note that String(), like all sinks, consumes the complete output of the pipe, which closes the input reader automatically. Therefore, calling String() (or any other sink method) again on the same pipe will return an error:

p := script.File("test.txt")
_, _ = p.String()
_, err := p.String()
fmt.Println(err)
// Output: read test.txt: file already closed

WriteFile

WriteFile() writes the contents of the pipe to a named file. It returns the number of bytes written, or an error:

var wrote int
wrote, err := script.File("source.txt").WriteFile("destination.txt")

Writing your own pipe operations

There's nothing to stop you writing your own sources, filters, or sinks. In fact, that would be excellent: if you do, please submit a pull request to add them to the standard operations supplied with script.

Writing a source

All a pipe source has to do is return a pointer to a script.Pipe. To be useful, a pipe needs to have a reader (a data source, such as a file) associated with it.

Echo() is a simple example, which just creates a pipe containing a string:

func Echo(s string) *script.Pipe {
	return script.NewPipe().WithReader(strings.NewReader(s))
}

Let's break this down:

  • We create a strings.Reader to be our data source, using strings.NewReader on the supplied string.
  • We create a new pipe with NewPipe().
  • We attach the reader to the pipe with WithReader().

In fact, any io.Reader can be the data source for a pipe. Passing it to WithReader() will turn it into a ReadAutoCloser, which is a wrapper for io.Reader that automatically closes the reader once it has been fully read.

Here's an implementation of File(), for example:

func File(name string) *script.Pipe {
	p := script.NewPipe()
	f, err := os.Open(name)
	if err != nil {
		return p.WithError(err)
	}
	return p.WithReader(f)
}
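
As another illustration, here is a sketch of a hypothetical Get() source (not part of script; it appears under Ideas below) that pipes the body of an HTTP response:

// Hypothetical Get() source — a sketch only (assumes "net/http" is imported).
// WithReader() wraps the response body in a ReadAutoCloser, so it is
// closed automatically once fully read.
func Get(url string) *script.Pipe {
	p := script.NewPipe()
	resp, err := http.Get(url)
	if err != nil {
		return p.WithError(err)
	}
	return p.WithReader(resp.Body)
}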

Writing a filter

Filters are methods on pipes that return pipes. Note that Go only allows methods to be defined on types declared in the same package, so the filter and sink examples below are written as they would appear inside the script package itself. For example, here's a simple filter which just reads and rejects all input, returning an empty pipe:

func (p *Pipe) RejectEverything() *Pipe {
	if p.Error() != nil {
		return p
	}
	_, err := ioutil.ReadAll(p.Reader)
	if err != nil {
		p.SetError(err)
		return p
	}
	return Echo("")
}

Important things to note here:

  • The first thing we do is check the pipe's error status. If this is set, we do nothing, and just return the original pipe.
  • If an error occurs, we set the pipe's error status, using p.SetError(), and return the pipe.

Filters must not log anything, terminate the program, or return anything but *script.Pipe.

As you can see from the example, the pipe's reader is available to you as p.Reader. You can do anything with it that you can do with any io.Reader.

If your method modifies the pipe (for example if it can set an error on the pipe), it must take a pointer receiver, as in this example. Otherwise, it can take a value receiver.
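
Here is one more hypothetical filter, again written as it would appear inside the script package (ToUpper() is not part of script; it's just a sketch following the same pattern):

// Hypothetical ToUpper() filter: reads all input, upper-cases it,
// and returns a new pipe containing the result.
func (p *Pipe) ToUpper() *Pipe {
	if p.Error() != nil {
		return p
	}
	data, err := ioutil.ReadAll(p.Reader)
	if err != nil {
		p.SetError(err)
		return p
	}
	return Echo(strings.ToUpper(string(data)))
}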

Writing a sink

Any method on a pipe which returns something other than a pipe is a sink. For example, here's an implementation of String():

func (p *Pipe) String() (string, error) {
	if p.Error() != nil {
		return "", p.Error()
	}
	res, err := ioutil.ReadAll(p.Reader)
	if err != nil {
		p.SetError(err)
		return "", err
	}
	return string(res), nil
}

Again, the first thing we do is check the error status on the pipe. If it's set, we return the zero value (empty string in this case) and the error.

We then try to read the whole contents of the pipe. If we get an error on reading, we set the pipe's error status and return the zero value and the error.

Otherwise, we return the result of reading the pipe, and a nil error.
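
As with filters, you can sketch your own sinks. For example, here is a hypothetical CountBytes() sink (not part of script), again written as it would appear inside the script package:

// Hypothetical CountBytes() sink: returns the number of bytes read
// from the pipe, following the same error-handling shape as String().
func (p *Pipe) CountBytes() (int, error) {
	if p.Error() != nil {
		return 0, p.Error()
	}
	data, err := ioutil.ReadAll(p.Reader)
	if err != nil {
		p.SetError(err)
		return 0, err
	}
	return len(data), nil
}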

Ideas

These are some ideas I'm playing with for additional features. If you feel like working on one of them, send a pull request. If you have ideas for other features, open an issue (or, better, a pull request).

Sources

  • Get() makes a web request, like curl, and pipes the result
  • Net() makes a network connection to a specified address and port, and reads the connection until it's closed
  • ListFiles() takes a filesystem path or glob, and pipes the list of matching files
  • Find() pipes a list of files matching various criteria (name, modified time, and so on)
  • Processes() pipes the list of running processes, like ps.

Filters

  • Ideas welcome!

Sinks

  • Ideas equally welcome!

Examples

Since script is designed to help you write system administration programs, a few simple examples of such programs are included in the examples directory:

  • cat (copies stdin to stdout)
  • cat 2 (takes a list of files on the command line and concatenates their contents to stdout)
  • grep
  • echo

More examples would be welcome!

Use cases

The best libraries are designed to satisfy real use cases. If you have a sysadmin task which you'd like to implement with script, let me know by opening an issue.