cross-platform POSIX shell feature detection and language extension library
modernish is an ambitious, as-yet experimental, cross-platform POSIX shell feature detection and language extension library. It aims to extend the shell language with extensive feature testing and language enhancements, using the power of aliases and functions to extend the shell language using the shell language itself.
The name is a pun on Modernizr, the JavaScript feature testing library, -sh, the common suffix for UNIX shell names, and -ish, still not quite a modern programming language but perhaps a little closer. jQuery is another source of general inspiration; like it, modernish adds a considerable feature set by using the power of the language it's implemented in to extend/transcend that same language.
That said, the aim of modernish is to build a better shell language, and not to make the shell language into something it's not. Its feature set is aimed at solving specific and commonly experienced deficits and annoyances of the shell language, and not at adding/faking things that are foreign to it, such as object orientation or functional programming. (However, since modernish is modular, nothing stops anyone from adding a module attempting to implement these things.)
The library builds on pure POSIX 2013 Edition (including full C-style shell arithmetics with assignment, comparison and conditional expressions), so it should run on any POSIX-compliant shell and operating system. But it does not shy away from using non-standard extensions where available to enhance performance or robustness.
Some example programs are in share/doc/modernish/examples.
Modernish also comes with a suite of regression tests to detect bugs in modernish itself. See Appendix B.
Run install.sh and follow the instructions, choosing your preferred shell and install location. After successful installation you can run modernish shell scripts and write your own. Run uninstall.sh to remove modernish.
Both the install and uninstall scripts are interactive by default, but support fully automated (non-interactive) operation as well. Command line options are as follows:
install.sh [ -n ] [ -s shell ] [ -f ] [ -d installroot ] [ -D prefix ]

- -n: non-interactive operation
- -s: specify default shell to execute modernish
- -f: force unconditional installation on specified shell
- -d: specify root directory for installation
- -D: extra destination directory prefix (for packagers)

uninstall.sh [ -n ] [ -f ] [ -d installroot ]

- -n: non-interactive operation
- -f: delete */modernish directories even if files left
- -d: specify root directory of modernish installation to uninstall

The simplest way to write a modernish program is to source modernish as a dot script. For example, if you write for bash:
#! /bin/bash
. modernish
use safe
use sys/base
...your program starts here...
The modernish 'use' command loads modules with optional functionality. safe
is
a special module that introduces a new and safer way of shell programming, with
field splitting (word splitting) and pathname expansion (globbing) disabled by
default. The sys/base
module contains modernish versions of certain basic but
non-standardised utilities (e.g. readlink
, mktemp
, which
), guaranteeing
that modernish programs all have a known version at their disposal. There are
many other modules as well. See below for more information.
The above method makes the program dependent on one particular shell (in this case, bash). So it is okay to mix and match functionality specific to that particular shell with modernish functionality.
The most portable way to write a modernish program is to use the special generic hashbang path for modernish programs. For example:
#! /usr/bin/env modernish
#! use safe
#! use sys/base
...your program begins here...
A program in this form is executed by whatever shell the user who installed modernish on the local system chose as the default shell. Since you as the programmer can't know what shell this is (other than the fact that it passed some rigorous POSIX compliance testing executed by modernish), a program in this form must be strictly POSIX compliant -- except, of course, that it should also make full use of the rich functionality offered by modernish.
Note that modules are loaded in a different way: the use
commands are part of a
hashbang comment (starting with #!
like the initial hashbang path). Only such
lines that immediately follow the initial hashbang path are evaluated; even
an empty line in between causes the rest to be ignored.
Important notes:

- Use thisshellhas BUG_MULTIBYTE or thisshellhas BUG_NOCHCLASS where needed. See Appendix A under Bugs.
- Don't change the locale (LC_* or LANG) after initialising modernish. Doing this might break various functions, as modernish sets specific versions depending on your OS, shell and locale. (Temporarily changing the locale is fine as long as you don't use modernish features that depend on it -- for example, setting a specific locale just for an external command. However, if you use harden(), see the important note in its documentation below!)

Modernish is primarily designed to enhance shell programs/scripts, but also
offers features for use in interactive shells. For instance, the new with
loop construct from the loop/with
module can be quite practical to repeat
an action x times, and the safe
module on interactive shells provides
convenience functions for manipulating, saving and restoring the state of
field splitting and globbing.
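For example, a quick sketch at an interactive prompt (assuming modernish is already initialised there):

use loop/with
with i=1 to 3; do putln "round $i"; done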
To use modernish on your favourite interactive shell, you have to add it to
your .profile
, .bashrc
or similar init file.
Important: Upon initialising, modernish adapts itself to your current settings, such as the locale. So you have to organise your .profile or similar file in the following order:

1. first set your general system settings (PATH, locale, etc.);
2. then source modernish (. modernish) and use any modules you want.

After installation, the modernish
command can be invoked as if it were a
shell, with the standard command line options from other shells (such as
-c
to specify a command or script directly on the command line), plus some
enhancements. The effect is that the shell chosen at installation time will
be run enhanced with modernish functionality. It is not possible to use
modernish as an interactive shell in this way.
Usage:

modernish [ --use=module | option ... ] [ scriptfile ] [ arguments ]
modernish [ --use=module | option ... ] -c [ script [ me-name [ arguments ] ] ]
modernish --test
modernish --version
In the first form, the --use
long-form option preloads any given modernish
modules, any given short or long-form shell options
are set or unset (the syntax is identical to that of POSIX shells and the
shell options
supported depend on the shell executing modernish), and then scriptfile is
loaded and executed with any arguments assigned to the positional parameters.
The module argument to each specified --use
option is split using
standard shell field splitting. The first field is the module name and any
further fields become arguments to that module's initialisation routine.
Using the shell option -e
or -o errexit
is an error, because modernish
does not support it and
would break. If the shell option -x
or -o xtrace
is given, modernish sets
the PS4
prompt to a useful value that traces the line number and exit status,
as well as the current file and function names if the shell is capable of this.
In the second form, after pre-loading any modules and setting any shell
options as in the first form, -c
executes the specified modernish
script, optionally with the me-name assigned to $ME
and the
arguments assigned to the positional parameters. This is identical to the
-c
option on POSIX shells, except that the me-name is assigned to $ME
and not $0
(because POSIX shells do not allow changing $0
).
The --test
option runs the regression test suite and exits. This verifies
that the modernish installation is functioning correctly.
See Appendix B for more information.
The --version
option outputs the version of modernish and exits.
modernish --use=loop/with -c 'with i=1 to 10; do putln "$i"; done'
zsh /usr/local/bin/modernish -o xtrace /path/to/program.sh
Function-local variables are not supported by the standard POSIX shell; only
global variables are provided for. Modernish needs a way to store its
internal state without interfering with the program using it. So most of the
modernish functionality uses an internal namespace _Msh_*
for variables,
functions and aliases. All these names may change at any time without
notice. Any names starting with _Msh_
should be considered sacrosanct and
untouchable; modernish programs should never directly use them in any way.
Of course this is not enforceable, but names starting with _Msh_
should be
uncommon enough that no unintentional conflict is likely to occur.
Modernish includes a battery of shell bug, quirk and feature tests, each of
which is given a special ID. These are easy to query using the thisshellhas
function, e.g. if thisshellhas LOCAL, then
... That same function also tests
if 'thisshellhas' a particular reserved word, builtin command or shell option.
To reduce start up time, the main bin/modernish script only includes the bug/quirk/feature tests that are essential to the functioning of it; these are considered built-in tests. The rest, considered external tests, are included as small test scripts in libexec/modernish/cap/*.t which are sourced on demand.
Feature testing is used by library functions to conveniently work around bugs or
take advantage of special features not all shells have. For instance,
ematch
will use [[
var =~
regex ]]
if available and fall back to
invoking awk
to use its builtin match()
function otherwise.
But the use of feature testing is not restricted to
modernish itself; any script using the library can do this in the same way.
The thisshellhas
function is an essential component of feature testing in
modernish. There is no standard way of testing for the presence of a shell
built-in or reserved word, so different shells need different methods; the
library tests for this and loads the correct version of this function.
See Appendix A below for a list of capabilities and bugs currently tested for.
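As an illustrative sketch, a script could pick a code path based on a capability test (ARITHPP is the capability ID mentioned under the arithmetic functions below):

i=0
if thisshellhas ARITHPP; then
    let "i++"        # this shell supports the C-style increment operator
else
    let "i+=1"       # portable fallback
fi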
Modernish provides certain constants (read-only variables) to make life easier. These include:

- $MSH_VERSION: The version of modernish.
- $MSH_PREFIX: Installation prefix for this modernish installation (e.g. /usr/local).
- $ME: Path to the current program. Replacement for $0. This is necessary if the hashbang path #!/usr/bin/env modernish is used, or if the program is launched like sh /path/to/bin/modernish /path/to/script.sh, as these set $0 to the path to bin/modernish and not your program's path.
- $MSH_SHELL: Path to the default shell for this modernish installation, chosen at install time (e.g. /bin/sh). This is a shell that is known to have passed all the modernish tests for fatal bugs. Cross-platform scripts should use it instead of hard-coding /bin/sh, because on some operating systems (NetBSD, OpenBSD, Solaris) /bin/sh is not POSIX compliant.
- $SIGPIPESTATUS: The exit status of a command killed by SIGPIPE (a broken pipe). For instance, if you use grep something somefile.txt | more and you quit more before grep is finished, grep is killed by SIGPIPE and exits with that particular status. Some modernish functions, such as harden and traverse, need to handle such a SIGPIPE exit specially to avoid unduly killing the program. The exact value of this exit status is shell-specific, so modernish runs a quick test to determine it at initialisation time. If SIGPIPE was set to ignore by the process that invoked the current shell, SIGPIPESTATUS can't be detected and is set to the special value 99999. See also the description of the WRN_NOSIGPIPE ID for thisshellhas.
- $DEFPATH: The default system path guaranteed to find compliant POSIX utilities, as given by getconf PATH.

POSIX does not provide for the quoted C-style escape codes commonly used in
bash, ksh and zsh (such as $'\n'
to represent a newline character),
leaving the standard shell without a convenient way to refer to control
characters. Modernish provides control character constants (read-only
variables) with hexadecimal suffixes $CC01
.. $CC1F
and $CC7F
, as well as $CCe
,
$CCa
, $CCb
, $CCf
, $CCn
, $CCr
, $CCt
, $CCv
(corresponding with
printf
backslash escape codes). This makes it easy to insert control
characters in double-quoted strings.
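For instance, a small sketch using two of these constants with putln (described further below):

putln "Name:${CCt}modernish${CCn}Type:${CCt}library"    # tab- and newline-separated output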
More convenience constants, handy for use in bracket glob patterns for use with case or modernish match:

- $CONTROLCHARS: All the control characters.
- $WHITESPACE: All whitespace characters.
- $ASCIIUPPER: The ASCII uppercase letters A to Z.
- $ASCIILOWER: The ASCII lowercase letters a to z.
- $ASCIIALNUM: The ASCII alphanumeric characters 0-9, A-Z and a-z.
- $SHELLSAFECHARS: Safelist for shell-quoting.
- $ASCIICHARS: The complete set of ASCII characters (minus NUL).

A few aliases that seem to make the shell language look slightly friendlier:
alias not='! ' # more legible synonym for '!'
alias so='[ "$?" -eq 0 ]' # test preceding command's success with
# 'if so;' or 'if not so;'
alias forever='while :;' # indefinite loops: forever do <stuff>; done
exit: extended usage: exit [ -u ] [ status [ message ] ]

If the -u option is given, the function showusage() is called, which has a simple default but can be redefined by the script.
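A brief sketch of how this might look in a script (the option letter and messages are hypothetical):

showusage() { putln "usage: $ME [ -q ] file ..."; }
case ${1-} in
( -q ) quiet=yes ;;
( -* ) exit -u 2 "unknown option: $1" ;;    # print the message, show usage, exit with status 2
esac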
die
: reliably halt program execution, even from within subshells, optionally
printing an error message. Note that die
is meant for an emergency program
halt only, i.e. in situations were continuing would mean the program is in an
inconsistent or undefined state. Shell scripts running in an inconsistent or
undefined state may wreak all sorts of havoc. They are also notoriously
difficult to terminate correctly, especially if the fatal error occurs within
a subshell: exit
won't work then. That's why die
is optimised for
killing all the program's processes (including subshells and external
commands launched by it) as quickly as possible. It should never be used for
exiting the program normally.
On interactive shells, die
behaves differently. It does not kill or exit your
shell; instead, it issues SIGINT
to the shell to abort the execution of your
running command(s), which is equivalent to pressing Ctrl+C.
Usage: die [ message ]
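For instance (a sketch; the directory name is hypothetical):

mkdir /tmp/myapp.$$ || die "cannot create work directory"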
A special DIE
pseudosignal can be trapped (using plain old trap
or
pushtrap
)
to perform emergency cleanup commands upon
invoking die
. On interactive shells, DIE
traps are never executed (though
they can be set and printed). On non-interactive shells, in order to kill the
malfunctioning program as quickly as possible (hopefully before it has a chance
to delete all your data), die
doesn't wait for those traps to complete before
killing the program. Instead, it executes each DIE
trap simultaneously as a
background job, then gathers the process IDs of the main shell and all its
subprocesses, sending SIGKILL
to all of them except any DIE
trap processes.
(One case where die
is limited is when the main shell program has exited,
but several runaway background processes that it forked are still going. If
die
is called by one of those background processes, then it will kill that
background process and its subshells, but not the others. This is due to an
inherent limitation in the design of POSIX operating systems. When the main
shell exits, its surviving background processes are detached from the
process hierarchy and become independent from one another, with no way to
determine that they once belonged to the same program.)
insubshell
: easily check if you're currently running in a subshell. This
function takes no arguments. It returns success (0) if it was called from
within a subshell and non-success (1) if not. In either case, the process ID
(PID) of the current subshell or main shell is stored in REPLY
. (Note that
on AT&T ksh93, which does not fork a new process for non-background
subshells, that PID is same as the main shell's except for background jobs.)
setstatus
: manually set the exit status $?
to the desired value. The
function exits with the status indicated. This is useful in conditional
constructs if you want to prepare a particular exit status for a subsequent
'exit' or 'return' command to inherit under certain circumstances.
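A small sketch:

warn_and_fail() {
    putln "warning: $*"
    setstatus 1              # the function now returns status 1
}
warn_and_fail "nothing to do" || putln "(the warning was counted as a failure)"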
thisshellhas
is the central function of the modernish feature testing
framework. It tests if one or more shell built-in commands, shell reserved
words (a.k.a. keywords), shell options, or shell capabilities/quirks/bugs are
present on the current shell.
This function is designed to be efficient, so there is little need to avoid calling it for performance reasons. Where appropriate, test results are cached in an internal variable
after the first test, so repeated checks using thisshellhas
are efficient.
Usage: thisshellhas [ --cache | --show ] item [ item ... ]

Each item can be one of the following:

- The ID of a modernish capability, quirk or bug test (a name in ASCII capital letters, digits and the underscore character _): return the result status of the associated modernish feature, quirk or bug test.
- --rw= or --kw= followed by an identifier: check if the identifier immediately following these characters is a shell reserved word (a.k.a. shell keyword).
- --bi= followed by an identifier: similarly check for a shell built-in command.
- --sig= followed by a name or number: check if the shell knows about a signal (usable by kill, trap, etc.) by the name or number following the =. If a number > 128 is given, the remainder of its division by 128 is checked. If the signal is found, its canonicalised signal name is left in the REPLY variable, otherwise REPLY is unset. (If multiple --sig= items are given and all are found, REPLY contains only the last one.)
- -o followed by a separate word: check if this shell has a long-form shell option by that name.
- A single character preceded by -: check if this shell has a short-form shell option by that character.
- The --cache option runs all external modernish bug/quirk/feature tests that have not yet been run, causing the cache to be complete.
- The --show option performs a --cache and then outputs all the IDs of positive results, one per line.

thisshellhas
continues to process items until one of them produces a
negative result or is found invalid, at which point any further items are
ignored. So the function only returns successfully if all the items
specified were found on the current shell. (To check if either one item or
another is present, use separate thisshellhas
invocations separated by the
||
shell operator.)
Note that the tests for the presence of reserved words, built-in commands, shell options, and signals only check if an item by that name exists on this shell. No attempt is made to verify that it does the same thing as on another shell.
Exit status: 0 if this shell has all the items in question; 1 if not; 2 if an item was encountered that is not recognised as a valid identifier.
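An illustrative sketch combining several item types in a single invocation:

if thisshellhas --bi=printf -o nounset --sig=TERM; then
    putln "printf builtin, the nounset option and SIGTERM are all available"
fi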
isvarname
: Check if the argument is a valid portable identifier in the shell,
that is, a portable variable name, shell function name or long-form shell
option name. (Modernish requires portable names everywhere; for example,
accented or non-Latin characters in variable names are not supported.)
isset: check if a variable, shell function or option is set. Usage:

- isset varname: Check if a variable is set.
- isset -v varname: Id.
- isset -x varname: Check if variable is exported.
- isset -r varname: Check if variable is read-only.
- isset -f funcname: Check if a shell function is set.
- isset -optionletter (e.g. isset -C): Check if shell option is set.
- isset -o optionname: Check if shell option is set by long name.

Exit status: 0 if the item is set; 1 if not; 2 if the argument is not recognised as a syntactically valid identifier.
When checking a shell option, a nonexistent shell option is not an error, but returns the same result as an unset shell option. (To check if a shell option exists, use thisshellhas.)
Note: just isset -f
checks if shell option -f
(a.k.a. -o noglob
) is
set, but with an extra argument, it checks if a shell function is set.
Similarly, isset -x
checks if shell option -x
(a.k.a -o xtrace
)
is set, but isset -x
varname checks if a variable is exported. If you
use unquoted variable expansions here, make sure they're not empty, or
the shell's empty removal mechanism will cause the wrong thing to be checked
(even in use safe
mode).
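A brief sketch of the distinction described in the note above:

if isset -x PATH; then putln "PATH is exported"; fi     # -x plus a name: variable test
if isset -x; then putln "xtrace (-x) is active"; fi     # -x alone: shell option test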
unexport
: the opposite of export
. Unexport a variable while preserving
its value, or (while working under set -a
) don't export it at all.
Usage is like export
, with the caveat that variable assignment arguments
containing non-shellsafe characters or expansions must be quoted as
appropriate, unlike in some specific shell implementations of export
.
(To get rid of that headache, use safe
.)
shellquote
: Quote the values of specified variables in such a way that the
values are suitable for parsing by the shell as string literals. This is
essential for the safe use of eval
or any other context where the shell
must parse untrusted input. shellquote
only uses quoting mechanisms
specified by POSIX, so the quoted values it produces are safe to parse
in any POSIX shell. They are also safe to parse using
xargs
(1).
Usage: shellquote [ -f | +f ] varname [ [ -f | +f ] varname ... ]
The values of the variables specified by name are shell-quoted and stored
back into those variables. By default, a value is only quoted if it contains
characters not present in $SHELLSAFECHARS
. An -f
argument forces
unconditional quoting for subsequent variables; an +f
argument restores
default behaviour. shellquote
returns success (0) if all variables were
processed successfully, and non-success (1) if any undefined (unset)
variables were encountered. In the latter case, any set variables still get
their values quoted.
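For example, a sketch that safely embeds an arbitrary value in a command line parsed by eval:

message='a value with spaces, $dollar signs and "quotes"'
shellquote message
eval "putln $message"      # prints the original value intact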
shellquoteparams
: shell-quote the current shell's positional parameters
in-place.
storeparams
: store the positional parameters, or a sub-range of them,
in a variable, in a shellquoted form suitable for restoration using
eval "set -- $varname"
. For instance: storeparams -f2 -t6 VAR
quotes and stores $2
to $6
in VAR
.
push
& pop
: every variable and shell option gets its own stack. For
variables, both the value and the set/unset state are (re)stored. Usage:
push [ --key=value ] item [ item ... ]
pop [ --keepstatus ] [ --key=value ] item [ item ... ]

where item is a valid portable variable name, a short-form shell option
(dash plus letter), or a long-form shell option (-o
followed by an option
name, as two arguments). The precise shell options supported (other than the
ones guaranteed by POSIX) depend on the shell modernish is running on. For
cross-shell compatibility, nonexistent shell options are treated as unset.
Before pushing or popping anything, both functions check if all the given
arguments are valid and pop
checks all items have a non-empty stack. This
allows pushing and popping groups of items with a check for the integrity of
the entire group. pop
exits with status 0 if all items were popped
successfully, and with status 1 if one or more of the given items could not
be popped (and no action was taken at all).
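A typical sketch: temporarily change field splitting and a shell option, then restore both:

push IFS -f            # save the state of IFS and the -f (noglob) option
IFS=','; set -f        # temporary settings
# ... work that needs these settings ...
pop IFS -f             # restore the saved state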
The --key=
option is an advanced feature that can help different modules
or functions to use the same variable stack safely. If a key is given to
push
, then for each item, the given key value is stored along with the
variable's value for that position in the stack. Subsequently, restoring
that value with pop
will only succeed if the key option with the same key
value is given to the pop
invocation. Similarly, popping a keyless value
only succeeds if no key is given to pop
. If there is any key mismatch, no
changes are made and pop returns status 2. For instance, if a function
pushes all its values with something like --key=myfunction
, it can do a
loop like while pop --key=myfunction var; do ...
even if var
already has
other items on its stack that shouldn't be tampered with. Note that this is
a robustness/convenience feature, not a security feature; the keys are not
hidden in any way. (The var/setlocal
module, which provides stack-based local variables, internally makes use of
this feature.)
If the --keepstatus
option is given, pop
will exit with the
exit status of the command executed immediately prior to calling pop
. This
can avoid the need for awkward workarounds when restoring variables or shell
options at the end of a function. However, note that this makes failure to pop
(stack empty or key mismatch) a fatal error that kills the program, as pop
no longer has a way to communicate this through its exit status.
The shell options stack allows saving and restoring the state of any shell
option available to the set
builtin using push
and pop
commands with
a syntax similar to that of set
.
Long-form shell options are matched to their equivalent short-form shell
options, if they exist. For instance, on all POSIX shells, -f
is
equivalent to -o noglob
, and push -o noglob
followed by pop -f
works
correctly. (This works even for shell-specific short & long option
equivalents; modernish internally does a check to find any equivalent.)
On shells with a dynamic no
option name prefix, that is on ksh, zsh and
yash (where, for example, noglob
is the opposite of glob
), the no
prefix is ignored, so something like push -o glob
followed by pop -o noglob
does the right thing. But this depends on the shell and should never
be used in cross-shell scripts.
pushtrap
and poptrap
: traps are now also stack-based, so that each
program component or library module can set its own trap commands
without interfering with others.
Note an important difference between the trap stack and stacks for variables
and shell options: pushing traps does not save them for restoring later, but
adds them alongside other traps on the same signal. All pushed traps are
active at the same time and are executed from last-pushed to first-pushed
when the respective signal is triggered. Traps cannot be pushed and popped
using push
and pop
but use dedicated commands as follows.
Usage:

pushtrap [ --key=value ] [ -- ] command sigspec [ sigspec ... ]
poptrap [ --key=value ] [ -- ] sigspec [ sigspec ... ]

pushtrap works like regular trap, with the following exceptions:
- Modernish is not aware of traps set with the shell's native trap command before the first pushtrap for the same signal. To remedy this, you can issue a simple trap command; as modernish prints the traps, it will quietly detect ones it doesn't yet know about and make them work nicely with the trap stack.
- pushtrap within a subshell has no effect (except adding dummy traps for printing with a trap command without arguments).
- pushtrap stores the current $IFS (field splitting) and $- (shell options) along with the pushed trap. Within the subshell executing each stack trap, modernish restores IFS and the shell options f (noglob), u (nounset) and C (noclobber) to the values in effect during the corresponding pushtrap. This is to avoid unexpected effects in case a trap is triggered while temporary settings are in effect.
- The --key option applies the keying functionality inherited from plain push to the trap stack. It works the same way, so the description is not repeated here.

poptrap
takes just signal names or numbers as arguments. It takes the
last-pushed trap for each signal off the stack, storing the commands that
were set for those signals into the REPLY variable, in a format suitable for
re-entry into the shell. Again, the --key
option works as in
plain pop
.
Modernish tries hard to avoid incompatibilities with existing trap practice. To that end, it intercepts the regular POSIX 'trap' command using an alias, reimplementing and interfacing it with the shell's builtin trap facility so that plain old regular traps play nicely with the trap stack. You should not notice any changes in the POSIX 'trap' command's behaviour, except for the following:
- Traps are printed in a consistent format on all shells. (bash users might notice the SIG prefix is not included in the signal names written.)
- Signal names may be specified with or without the SIG prefix on all shells; that prefix is quietly accepted and discarded.
- Saving the traps to a variable using command substitution (e.g. var=$(trap)) now works on every shell supported by modernish, including (d)ash, mksh and zsh.
- A signal name as the sole argument, as in trap INT to unset a SIGINT trap (which only works if the 'trap' command is given exactly one argument), is accepted. Note that this is for compatibility with existing scripts only.
- POSIX traps for each signal are always executed after that signal's stack-based traps; this means they should not rely on modernish modules that use the trap stack to clean up after themselves on exit, as those cleanups would already have been done.
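As a brief sketch of the trap stack in practice ($my_tmpfile is a hypothetical variable):

pushtrap 'rm -f "$my_tmpfile"' INT TERM EXIT    # add cleanup without clobbering other traps
# ... work with "$my_tmpfile" ...
poptrap INT TERM EXIT                           # remove only the traps pushed above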
Modernish introduces a new DIE
(-1) pseudosignal whose traps are
executed upon invoking die
in scripts. This is analogous to the
EXIT
(0) pseudosignal that is built in to all POSIX shells. All
trap-related commands in modernish support this new pseudosignal. Note
that DIE
traps are never executed on interactive shells.
See the die
description for
more information.
On interactive shells, INT
traps (both POSIX and stack) are cleared out
after executing them once. This is because die
uses SIGINT for cleanup and command interruption on interactive shells.
pushparams
and popparams
: push and pop the complete set of positional
parameters. No arguments are supported.
For the four functions below, item can be:
- a valid portable variable name or a short-form shell option (as with push and pop);
- -o followed by an option name (two arguments);
- @ to refer to the positional parameters stack;
- --trap=SIGNAME to refer to the trap stack for the indicated signal.

stackempty [ --key=value ] [ --force ] item: Tests if the stack for an item is empty. Returns status 0 if it is, 1 if it is not. The key feature works as in pop: by default, a key mismatch is considered equivalent to an empty stack. If --force is given, this function ignores keys altogether.
stacksize
[ --silent
| --quiet
] item: Leaves the size of a stack in
the REPLY
variable and, if option --silent
or --quiet
is not given,
writes it to standard output.
The size of the complete stack is returned, even if some values are keyed.
printstack
[ --quote
] item: Outputs a stack's content.
Option --quote
shell-quotes each stack value before printing it, allowing
for parsing multi-line or otherwise complicated values.
Columns 1 to 7 of the output contain the number of the item (down to 0).
If the item is set, columns 8 and 9 contain a colon and a space, and
if the value is non-empty or quoted, columns 10 and up contain the value.
Sets of values that were pushed with a key are started with a special
line containing --- key:
value. A subsequent set pushed with no key is
started with a line containing --- (key off)
.
Returns status 0 on success, 1 if that stack is empty.
clearstack
[ --key=
value ] [ --force
] item [ item ... ]:
Clears one or more stacks, discarding all items on it.
If (part of) the stack is keyed or a --key
is given, only clears until a
key mismatch is encountered. The --force
option overrides this and always
clears the entire stack (be careful, e.g. don't use within
setlocal
... endlocal
).
Returns status 0 on success, 1 if that stack was already empty, 2 if
there was nothing to clear due to a key mismatch.
harden
: modernish's replacement for set -e
a.k.a. set -o errexit
(which is
fundamentally
flawed,
not supported and will break the library).
harden
installs a shell function that hardens a particular command by
checking its exit status against values indicating error or system failure.
Exactly what exit statuses signify an error or failure depends on the
command in question; this should be looked up in the
POSIX specification
(under "Utilities") or in the command's man
page or other documentation.
If the command fails, the function installed by harden
calls die
, so it
will reliably halt program execution, even if the failure occurred within a
subshell (for instance, in a pipe construct or command substitution).
harden
(along with use safe
) is an essential feature for robust shell
programming that current shells lack. In shell programs without modernish,
proper error checking is too inconvenient and therefore rarely done. It's often
recommended to use set -e
a.k.a set -o errexit
, but that is broken in
various strange ways (see links above) and the idea is often abandoned. So,
all too often, shell programs simply continue in an inconsistent state after a
critical error occurs, occasionally wreaking serious havoc on the system.
Modernish harden
was designed to help solve that problem properly.
Usage:

harden [ -f funcname ] [ -[cpXtPE] ] [ -e testexpr ] [ var=value ... ]
[ -u var ... ] command_name_or_path [ command_argument ... ]
The -f
option hardens the command as the shell function funcname instead
of defaulting to command_name_or_path as the function name. (If the latter
is a path, that's always an invalid function name, so the use of -f
is
mandatory.)
The -c
option causes command_name_or_path to be hardened and run
immediately instead of setting a shell function for later use. This option
is meant for commands that run once; it is not efficient for repeated use.
It cannot be used together with the -f
option.
The -e
option, which defaults to >0
, indicates the exit statuses
corresponding to a fatal error. It depends on the command what these are;
consult the POSIX spec and the manual pages.
The status test expression testexpr, argument
to the -e
option, is like a shell arithmetic
expression, with the binary operators ==
!=
<=
>=
<
>
turned
into unary operators referring to the exit status of the command in
question. Assignment operators are disallowed. Everything else is the same,
including &&
(logical and) and ||
(logical or) and parentheses.
Note that the expression needs to be quoted as the characters used in it
clash with shell grammar tokens.
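For example, a sketch hardening diff, whose exit status 1 merely means "files differ" and should not be treated as fatal:

harden -e '>1' diff      # status 0 or 1 is fine; 2 or higher kills the program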
The -X
option causes harden
to always search for and harden an external
command, even if a built-in command by that name exists.
The -E
option causes the hardening function to consider it a fatal error
if the hardened command writes anything to the standard error stream. This
option allows hardening commands (such as
bc
)
where you can't rely on the exit status to detect an error. The text written
to standard error is passed on as part of the error message printed by
die
. Note that:
-E
cannot
influence the calling shell (e.g. harden -E cd
renders cd
ineffective).-E
does not disable exit status checks; by default, any exit status greater
than zero is still considered a fatal error as well. If your command does not
even reliably return a 0 status upon success, then you may want to add -e '>125'
, limiting the exit status check to reserved values indicating errors
launching the command and signals caught.The -p
option causes harden
to search for commands using the
system default path (as obtained with getconf PATH
) as opposed to the
current $PATH
. This ensures that you're using a known-good external
command that came with your operating system. By default, the system-default
PATH search only applies to the command itself, and not to any commands that
the command may search for in turn. But if the -p
option is specified at
least twice, or if the command is a shell function (hardened under another name
using -f
), the command is run in a subshell with PATH
exported as the
default path, which is equivalent to adding a PATH=$DEFPATH
assignment
argument (see below).
Examples:
harden make # simple check for status > 0
harden -f tar '/usr/local/bin/gnutar' # id.; be sure to use this 'tar' version
harden -e '> 1' grep # for grep, status > 1 means error
harden -e '==1 || >2' gzip # 1 and >2 are errors, but 2 isn't (see manual)
As far as the shell is concerned, hardened commands are shell functions and not external or builtin commands. This essentially changes one behaviour of the shell: variable assignments preceding the command will not be local to the command as usual, but will persist after the command completes. (POSIX technically makes that behaviour optional but all current shells behave the same in POSIX mode.)
For example, this means that something like
harden -e '>1' grep
# [...]
LC_ALL=C grep regex some_ascii_file.txt
should never be done, because the meant-to-be-temporary LC_ALL
locale
assignment will persist and is likely to cause problems further on.
To solve this problem, harden
supports adding these assignments as
part of the hardening command, so instead of the above you do:
harden -e '>1' LC_ALL=C grep
# [...]
grep regex some_ascii_file.txt
With the -u
option, harden
also supports unsetting variables for the
duration of a command, e.g.:
harden -e '>1' -u LC_ALL grep
Pitfall alert: if the -u
option is used, this causes the hardened command to
run in a subshell with those variables unset, because using a subshell is the
only way to avoid altering those variables' state in the main shell. This is
usually fine, but note that a builtin command hardened with use of -u
cannot
influence the calling shell. For instance, something like harden -u LC_ALL cd
renders cd
ineffective: the working directory is only changed within the
subshell which is then immediately left.
The same happens if you harden a shell function under another name using
-f
while adding environment variable assignments (or using the -p
option, which effectively adds PATH=$DEFPATH
). The hardened function
will not be able to influence the main shell. Also note that the hardening
function will export the assigned environment variables for the duration of
that subshell, so those variables will be inherited by any external command
run from the hardened function. (While hardening shell functions using
harden
is possible, it's not really recommended and it's better to call
die
directly in your shell function upon detecting an error.)
If you're piping a command's output into another command that may close
the pipe before the first command is finished, you can use the -P
option
to allow for this:
harden -e '==1 || >2' -P gzip # also tolerate gzip being killed by SIGPIPE
gzip -dc file.txt.gz | head -n 10 # show first 10 lines of decompressed file
head
will close the pipe of gzip
input after ten lines; the operating
system kernel then kills gzip
with the PIPE signal before it's finished,
causing a particular exit status that is greater than 128. This exit status
would normally make harden
kill your entire program, which in the example
above is clearly not the desired behaviour. If the exit status caused by a
broken pipe were known, you could specifically allow for that exit status in
the status expression. The trouble is that this exit status varies depending
on the shell and the operating system. The -P
option was made to solve
this problem: it automatically detects and whitelists the correct exit
status corresponding to SIGPIPE termination on the current system.
Tolerating SIGPIPE is an option and not the default, because in many contexts it may be entirely unexpected and a symptom of a severe error if a command is killed by a broken pipe. It is up to the programmer to decide which commands should expect SIGPIPE and which shouldn't.
Tip: It could happen that the same command should expect SIGPIPE in one context but not another. You can create two hardened versions of the same command, one that tolerates SIGPIPE and one that doesn't. For example:
harden -f hardGrep -e '>1' grep # hardGrep does not tolerate being aborted
harden -f pipeGrep -e '>1' -P grep # pipeGrep for use in pipes that may break
Note: If SIGPIPE
was set to ignore by the process invoking the current
shell, the -P
option has no effect, because no process or subprocess of
the current shell can ever be killed by SIGPIPE
. However, this may cause
various other problems and you may want to refuse to let your program run
under that condition.
thisshellhas WRN_NOSIGPIPE
can help
you easily detect that condition so your program can make a decision. See
the WRN_NOSIGPIPE description for more information.
The -t
option will trace command output. Each execution of a command
hardened with -t
causes the full command line to be output to standard
error, in the following format:
[functionname]> commandline
where functionname
is the name of the shell function used to harden the
command and commandline
is the complete and actual command executed. The
commandline
is properly shell-quoted in a format suitable for re-entry
into the shell (which is an enhancement over the builtin tracing facility on
most shells). If standard error is on a terminal that supports ANSI colours,
the tracing output will be colourised.
The -t
option was added to harden
because the commands that you harden
are often the same ones you would be particularly interested in tracing. The
advantage of using harden -t
over the shell's builtin tracing facility
(set -x
or set -o xtrace
) is that the output is a lot less noisy,
especially when using a shell library such as modernish.
Note: Internally, -t
uses the shell file descriptor 9, redirecting it to
standard error (using exec 9>&2
). This allows tracing to continue to work
normally even for commands that redirect standard error to a file (which is
another enhancement over set -x
on most shells). However, this does mean
harden -t
conflicts with any other use of the file descriptor 9 in your
shell program.
If file descriptor 9 is already open before harden
is called, harden
does not attempt to override this. This means tracing may be redirected
elsewhere by doing something like exec 9>trace.out
before calling
harden
. (Note that redirecting FD 9 on the harden
command itself will
not work as it won't survive the run of the command.)
Sometimes you just want to trace the execution of some specific commands as
in harden -t
(see above) without actually hardening them against command
errors; you might prefer to do your own error handling. trace
makes this
easy. It is modernish's replacement or complement for set -x
a.k.a. set -o xtrace
.
trace
is actually a shortcut for harden -tPe'>125'
commandname. The
result is that the indicated command is automatically traced upon execution.
Other options, including -f
, -c
and environment variable assignments, are
as in harden
.
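A minimal sketch (the directory path is hypothetical):

trace mkdir
mkdir -p /tmp/example/dir    # traced; prints something like: [mkdir]> mkdir -p /tmp/example/dir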
A bonus is that you still get minimal hardening against fatal system errors. Errors in the traced command itself are ignored, but your program is immediately halted with an informative error message if the traced command:
- could not be found or could not be executed (exit status 126 or 127);
- was killed by a signal other than SIGPIPE (exit status > 128, except the shell-specific exit status for SIGPIPE).

Note: The caveat for command-local variable assignments for harden
also
applies to trace
. See
Important note on variable assignments
above.
extern
is like command
but always runs an external command, without
having to know or determine its location. This provides an easy way to
bypass a builtin, alias or function. It does the same $PATH
search
the shell normally does when running an external command. For instance, to
guarantee running external printf
just do: extern printf ...
Usage: extern [ -p ] [ -v ] command [ argument ... ]

- -p: use the operating system's default PATH (as determined by getconf PATH) instead of your current $PATH for the command search. This guarantees a path that finds all the standard utilities defined by POSIX, akin to command -p but still guaranteeing an external command. (Note that extern -p is more reliable than command -p because many shell binaries don't ask the OS for the default path and have a wrong default path hard-coded in.)
- -v: don't execute command but show the full path name of the command that would have been executed. Any extra arguments are taken as more command paths to show, one per line. extern exits with status 0 if all the commands were found, 1 otherwise. This option can be combined with -p.

putln
: prints each argument on a separate line. There is no processing of
options or escape codes. (Modernish constants $CCn
, etc. can be used to insert
control characters in double-quoted strings. To process escape codes, use
printf
instead.)
put
: prints each argument separated by a space, without a trailing separator
or newline. Again, there is no processing of options or escape codes.
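A short sketch of the difference:

put "Enter a name: "      # prompt without a trailing newline
read -r name
putln "Hello, $name."     # one line per argument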
echo
: This command is notoriously unportable and kind of broken, so is
deprecated in favour of put
and putln
. Modernish does provide its own
version of echo
, but it is only activated if modernish is in the hashbang
path or otherwise is itself used as the shell (the "most portable" way of
running programs
explained above).
If your script runs on a specific shell and sources modernish as a dot script
(. modernish
), or if you use modernish interactively in your shell profile,
the shell-specific version of echo
is left intact. This is to make it
possible to add modernish to existing shell-specific scripts without breaking
anything, while still providing one consistent echo
for cross-shell scripts.
The modernish version of echo
, if active, does not interpret any escape codes
and supports only one option, -n
, which, like BSD echo
, suppresses the
final newline. However, unlike BSD echo
, if -n
is the only argument, it is
not interpreted as an option and the string -n
is printed instead. This makes
it safe to output arbitrary data using this version of echo
as long as it is
given as a single argument (using quoting if needed).
source
: bash/zsh-style source
command now available to all POSIX
shells, complete with optional positional parameters given as extra
arguments (which is not supported by POSIX .
).
Complete replacement for test
/[
in the form of speed-optimised shell
functions, so modernish scripts never need to use that [
botch again.
Instead of inherently ambiguous [
syntax (or the nearly-as-confusing
[[
one), these use familiar shell syntax to get more functionality, including:
let
: implementation of let
as in ksh, bash and zsh, now available to all
POSIX shells. This makes C-based signed integer arithmetic evaluation
available to every supported shell, with the exception of the unary "++" and
"--" operators (which have been given the capability designation ARITHPP).
This means let
should be used for operations and tests, e.g. both
let "x=5"
and if let "x==5"; then
... are supported (note single = for
assignment, double == for comparison).
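A quick sketch:

x=5
let "x*=2" "y = x + 1"                     # multiple expressions in one call
if let "y == 11"; then putln "y is $y"; fi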
isint
: test if a given argument is a decimal, octal or hexadecimal integer
number in valid POSIX shell syntax, ignoring leading (but not trailing) spaces
and tabs.
empty: test if string is empty
identic: test if 2 strings are identical
sortsbefore: test if string 1 sorts before string 2
sortsafter: test if string 1 sorts after string 2
contains: test if string 1 contains string 2
startswith: test if string 1 starts with string 2
endswith: test if string 1 ends with string 2
match: test if string matches a glob pattern
ematch: test if string matches an extended regex
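A brief sketch using a couple of these (the path is hypothetical):

file=/home/user/notes.txt
if startswith "$file" / && endswith "$file" .txt; then
    putln "absolute path to a .txt file"
fi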
These avoid the snags with symlinks you get with [
and [[
.
By default, symlinks are not followed. Add -L
to operate on files
pointed to by symlinks instead of symlinks themselves (the -L
makes
no difference if the operands are not symlinks).
is present: test if file exists (yields true even if invalid symlink)
is -L present: test if file exists and is not an invalid symlink
is sym: test if file is symlink
is -L sym: test if file is a valid symlink
is reg: test if file is a regular file
is -L reg: test if file is regular or a symlink pointing to a regular
is dir: test if file is a directory
is -L dir: test if file is dir or symlink pointing to dir
is fifo, is -L fifo, is socket, is -L socket, is blockspecial,
is -L blockspecial, is charspecial, is -L charspecial:
same pattern, you figure it out :)
By default, symlinks are not followed. Again, add -L
to follow them.
is newer: test if file 1 is newer than file 2 (or if file 1 exists,
but file 2 doesn't)
is older: test if file 1 is older than file 2 (or if file 1 doesn't
exist, but file 2 does)
is samefile: test if file 1 and file 2 are the same file (hardlinks)
is onsamefs: test if file 1 and file 2 are on the same file system (for
non-regular, non-directory files, test the parent directory)
is -L newer, is -L older, is -L samefile, is -L onsamefs:
same as above, but after resolving symlinks
Symlinks are followed.
is nonempty: test if file exists, is not an invalid symlink, and is
not empty (also works for dirs with read permission)
is setuid: test if file has user ID bit set
is setgid: test if file has group ID bit set
is onterminal: test if file descriptor is associated with a terminal
These use a more straightforward logic than [
and [[
.
Any symlinks given are resolved, as these tests would be meaningless
for a symlink itself.
can read: test if we have read permission for a file
can write: test if we have write permission for a file or directory
(for directories, only true if traverse permission as well)
can exec: test if we have execute permission for a regular file
can traverse: test if we can enter (traverse through) a directory
The main modernish library contains functions for a few basic string manipulation operations (because they are needed by other functions in the main library). Currently these are:
toupper: convert all letters to upper case
tolower: convert all letters to lower case
If no arguments are given, toupper
and tolower
copy standard input to
standard output, converting case.
If one or more arguments are given, they are taken as variable names (note:
they should be given without the $
) and case is converted in the contents
of the specified variables, without reading input or writing output.
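For instance (a sketch):

greeting='hello world'
toupper greeting
putln "$greeting"      # HELLO WORLD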
toupper
and tolower
try hard to use the fastest available method on the
particular shell your program is running on. They use built-in shell
functionality where available and working correctly, otherwise they fall back
on running an external utility.
Which external utility is chosen depends on whether the current locale uses
the Unicode UTF-8 character set or not. For non-UTF-8 locales, modernish
assumes the POSIX/C locale and tr
is always used. For UTF-8 locales,
modernish tries hard to find a way to correctly convert case even for
non-Latin alphabets. A few shells have this functionality built in with
typeset
. The rest need an external utility. Even in 2017, it is a real
challenge to find an external utility on an arbitrary POSIX-compliant system
that will correctly convert case for all applicable UTF-8 characters.
Modernish initialisation tries tr
, awk
, GNU awk
and GNU sed
before
giving up and declaring BUG_CNONASCII. If thisshellhas BUG_CNONASCII
, it
means modernish is in a UTF-8 locale but has not found a way to convert
case for non-ASCII characters, so toupper
and tolower
will convert
only ASCII characters and leave any other characters in the string alone.
Small utilities that should have been part of the standard shell, but aren't. Since their implementation is inexpensive, they are part of the main library instead of a module.
mkcd
: make one or more directories, then, upon success, change into the
last-mentioned one. mkcd
inherits mkdir
's usage, so options depend on
your system's mkdir
; only the
POSIX options
are guaranteed.
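For example (a sketch; -p is a POSIX mkdir option, so it is passed through):

mkcd -p /tmp/myproject/build    # create the directories, then change into the last one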
use
: use a modernish module. It implements a simple Perl-like module
system with names such as 'safe', 'var/setlocal' and 'loop/select'.
These correspond to files 'safe.mm', 'var/setlocal.mm', etc. which are
dot scripts defining functionality. Any extra arguments to the use
command are passed on to the dot script unmodified, so modules can
implement option parsing to influence their initialisation.
The safe module does IFS=''; set -f -u -C, that is: field splitting and globbing are disabled, variables must be defined before use, and output redirection will not overwrite existing files.

Essentially, this is a whole new way of shell programming that eliminates most variable quoting headaches, protects against typos in variable names wreaking havoc, and protects files from being accidentally overwritten by output redirection.
Of course, you don't get field splitting and globbing. But modernish
provides various ways of enabling one or both only for the commands
that need them, setlocal
...endlocal
blocks chief among them
(see use var/setlocal
below).
On interactive shells (or if use safe -i
is given), also loads
convenience functions fsplit
and glob
to control and inspect the
state of field splitting and globbing in a more user friendly way.
It is highly recommended that new modernish scripts start out with use safe
.
But this mode is not enabled by default because it will totally break
compatibility with shell code written for default shell settings.
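A small sketch of the effect, assuming modernish was initialised with use safe:

files='two words'
putln $files      # prints the value as-is: no field splitting on the unquoted expansion
putln *           # prints a literal *: no pathname expansion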
These shortcut functions are alternatives for using 'let'.
inc
, dec
, mult
, div
, mod
: simple integer arithmetic shortcuts. The first
argument is a variable name. The optional second argument is an
arithmetic expression, but a sane default value is assumed (1 for inc
and dec, 2 for mult and div, 256 for mod). For instance, inc X
is
equivalent to X=$((X+1))
and mult X Y-2
is equivalent to X=$((X*(Y-2)))
.
ndiv
is like div
but with correct rounding down for negative numbers.
Standard shell integer division simply chops off any digits after the
decimal point, which has the effect of rounding down for positive numbers
and rounding up for negative numbers. ndiv
consistently rounds down.
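A quick sketch:

x=7
inc x          # x is now 8
mult x 10      # x is now 80
div x 9        # x is now 8 (truncated)
y=-7
ndiv y 2       # y is now -4 (rounded down, not truncated towards zero)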
These have the same name as their test
/[
option equivalents. Unlike
with test
, the arguments are shell integer arith expressions, which can be
anything from simple numbers to complex expressions. As with $(( ))
,
variable names are expanded to their values even without the $
.
Function: Returns successfully if:
eq <expr> <expr> the two expressions evaluate to the same number
ne <expr> <expr> the two expressions evaluate to different numbers
lt <expr> <expr> the 1st expr evaluates to a smaller number than the 2nd
le <expr> <expr> the 1st expr eval's to smaller than or equal to the 2nd
gt <expr> <expr> the 1st expr evaluates to a greater number than the 2nd
ge <expr> <expr> the 1st expr eval's to greater than or equal to the 2nd
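For instance (a sketch):

x=6
if gt "x*2" 10; then putln "x*2 exceeds 10"; fi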
mapr
(map records): Read delimited records from the standard input, invoking
a callback command with each input record as an argument and with up to
quantum arguments at a time. By default, an input record is one line of text.
Usage: mapr [ -d delimiter | -P ] [ -n count ] [ -s count ]
[ -c quantum ] callback [ argument ... ]
Options:
- -d delimiter: Use the single character delimiter to delimit input records, instead of the newline character.
- -P: Paragraph mode. Input records are delimited by sequences consisting of a newline plus one or more blank lines, and leading or trailing blank lines will not result in empty records at the beginning or end of the input. Cannot be used together with -d.
- -n count: Pass at most count records as arguments to callback. If count is 0, all records are passed.
- -s count: Skip and discard the first count records read.
- -c quantum: Specify the number of records read between each invocation of callback. If -c is not supplied, the default quantum is 5000.

Arguments:

- callback: the callback command to be invoked by mapr. It is a fatal error for the callback command to exit with a status > 0.
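A minimal sketch (the input file is hypothetical; putln simply prints each record passed to it on its own line):

mapr -n 3 putln < /etc/hosts    # pass the first three input lines to putln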
mapr
was inspired by the bash 4.x builtin command mapfile
a.k.a.
readarray
, and uses similar options, but there are important differences.
- mapr does not support assigning records directly to an array (because the POSIX shell language does not have arrays). Instead, all handling is done through the callback command.
- mapr passes all the records as arguments to the callback command.
- Record delimiters are not included in the arguments passed by the mapr command (so there is no -t option to remove it).
- mapr supports paragraph mode.
- mapr is implemented as a shell function and awk script.

The var/setlocal module defines a new setlocal
...endlocal
shell code block construct with
arbitrary local variables and arbitrary local shell options, as well as
safe field splitting and pathname expansion operators.
Usage: setlocal [ localitem ... ] [ [ --split | --split=characters ] [ --glob ] -- [ word ... ] ]; do commands; endlocal
The commands are executed with the specified settings applied locally to
the setlocal
...endlocal
block.
Each localitem can be:
- A variable name, optionally followed by = immediately followed by a value. This renders that variable local to the block, initially either unsetting it or assigning the value, which may be empty.
- A shell option letter immediately preceded by a - or + sign. This locally turns that shell option on or off, respectively. This follows the counterintuitive syntax of set. Long-form shell options like -o optionname and +o optionname are also supported. It depends on the shell what options are supported. Specifying a nonexistent option is a fatal error. Use thisshellhas to check for a non-POSIX option's existence on the current shell before using it.

The return
command exits the block, causing the global variables and
settings to be restored and resuming execution at the point immediately
following endlocal
. This is like a shell function. In fact, internally,
setlocal
blocks are one-time shell functions that use
the stack
to save and restore variables and settings. Like any shell
function, a setlocal
block exits with the exit status of the last command
executed within it, or with the status passed on by or given as an argument to
return
.
The break
and continue
commands, when not used within a loop within the
block, also exit the block, but always with exit status 0. It's preferred to
use return
instead. Note that setlocal
creates a new loop context and
you cannot use break
or continue
to resume or break from enclosing loops
outside the setlocal
block. (Shells with
QRK_BCDANGER do in fact allow this, preventing
endlocal
from restoring the global settings! Shells without this quirk
automatically protect against this.)
Within the block, the positional parameters ($@
, $1
, etc.) are always
local. By default, a copy is inherited from outside the block. Any changes to
the positional parameters made within the block will be discarded upon
exiting it.
However, if a --
is present, the set of words after --
becomes the
positional parameters instead, after being modified by the --split
or
--glob
operators if present. The --split
operator subjects the words
to default field splitting, whereas --split=
string subjects them to
field splitting based on the characters given in string. The --glob
operator subjects them to pathname expansion. These operators do not
enable field splitting or pathname expansion within the block itself, but
only subject the words after the --
to them. If field splitting and
globbing are disabled globally, this provides a
safe way to perform field splitting or globbing without actually enabling
them for any code. To illustrate this advantage, note the difference:
# Field splitting is enabled for all unquoted expansions within the
# setlocal block, which may be unsafe, so must quote "$foo" and "$bar".
setlocal dir IFS=':'; do
for dir in $PATH; do
somestuff "$foo" "$bar"
done
endlocal
# The value of PATH is split at ':' and assigned to the positional
# parameters, without enabling field splitting within the setlocal block.
setlocal dir --split=':' -- $PATH; do
for dir do
somestuff $foo $bar
done
endlocal
Important: The --split
and --glob
operators are designed to be
used along with safe mode. If they are used in
traditional mode, i.e. with field splitting and/or pathname expansion
globally active, you must make sure the words after the --
are
properly quoted as with any other command, otherwise you will have
unexpected duplicate splitting or pathname expansion.
Other usage notes:
Although the setlocal declaration ends with ; do as in a while or until loop, the code block is terminated with endlocal and not with done. Terminating it with done results in a misleading shell syntax error (end of file, or missing }), a side effect of how setlocal is implemented.
setlocal blocks do not mix well with LOCAL (shell-native functionality for local variables), especially not on shells with QRK_LOCALUNS or QRK_LOCALUNS2. Use one or the other, but not both.
Due to shell bugs (BUG_FNSUBSH on ksh93, and an alias parsing oddity on mksh [up to R54 2016/11/11] that triggers a spurious syntax error), setlocal blocks should not be used within subshells, including command substitution subshells. There is usually not much point to this anyway; the point of setlocal is to have certain settings local and keep the rest global, all without the performance hit of forking a subshell process. (Forking new subshells within a setlocal block is fine.)
Do not use break and continue in ways that would cause execution to continue outside the setlocal block. Some shells do not allow break and continue to break out of a shell function (including the internal one-time shell function employed by setlocal), so thankfully this fails on those shells. But on others it succeeds, so global settings are not restored, wreaking havoc on the rest of your program. One way to avoid the problem is to envelop the entire loop in a setlocal block. Another is to exit the internal shell function using return 1 and then add || break or || continue immediately after endlocal.
You can think of setlocal as pretty much the equivalent of zsh's anonymous functions -- functionality that is hereby brought to all POSIX shells, albeit with a rather different syntax.

String manipulation functions.
trim
: strip whitespace (or other characters) from the beginning and end of
a variable's value.
replacein: Replace leading, trailing (-t) or all (-a) occurrences of a string by another string in a variable.
append
and prepend
: Append or prepend zero or more strings to a
variable, separated by a string of zero or more characters, avoiding the
hairy problem of dangling separators. Optionally shell-quote each string
before appending or prepending.
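
Assuming trim takes the name of a variable (as described above), a minimal illustration with made-up values:
# Strip surrounding whitespace from a variable's value, in place.
title='   An Example   '
trim title
putln "[$title]"	# prints: [An Example]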
Some very common external commands ought to be standardised, but aren't. For
instance, the which
and readlink
commands have incompatible options on
various GNU and BSD variants and may be absent on other Unix-like systems.
This module provides a complete re-implementation of such basic utilities
written as modernish shell functions. Scripts that use the modernish version
of these utilities can expect to be fully cross-platform. They also have
various enhancements over the GNU and BSD originals.
readlink
: Read the target of a symbolic link. Robustly handles weird
filenames such as those containing newline characters. Stores result in the
$REPLY variable and optionally writes it on standard output. Optionally
canonicalises each path, following all symlinks encountered (for this mode,
all but the last component must exist). Optionally shell-quote each item of
output for later parsing by the shell, separating multiple items with spaces
instead of newlines.
which
: Outputs, and/or stores in the REPLY
variable, either the first
available directory path to each given command, or all available paths,
according to the current $PATH
or the system default path. Exits
successfully if at least one path was found for each command, or
unsuccessfully if none were found for any given command.
Usage: which [ -[apqsnQ1] ] [ -P number ] program [ program ... ]
-a: List all executables found, not just the first one for each argument.
-p: Search the system default path, not the current $PATH. This is the minimal path, specified by POSIX, that is guaranteed to find all the standard utilities.
-q: Be quiet: suppress all warnings.
-s: Silent operation: don't write output, only store it in the REPLY variable. Suppress warnings except, if you run which -s in a subshell, the warning that the REPLY variable will not survive the subshell.
-n: When writing to standard output, do not write a final newline.
-Q: Shell-quote each unit of output. Separate by spaces instead of newlines. This generates a list of arguments in shell syntax, guaranteed to be suitable for safe parsing by the shell, even if the resulting pathnames should contain strange characters such as spaces or newlines and other control characters.
-1 (one): Output the results for at most one of the arguments, in descending order of preference: once a search succeeds, ignore the rest. Suppress warnings except a subshell warning for -s. This is useful for finding a command that can exist under several names, for example, in combination with harden:
harden -P -f tar $(which -1 gnutar gtar tar)
which -1 returns successfully if any match was found.
-P: Strip the indicated number of pathname elements from the output, starting from the right. -P1: strip /program; -P2: strip /*/program, etc. This is useful for determining the installation root directory for an installed package.

A cross-platform shell implementation of 'mktemp' that aims to be just as
safe as native mktemp
(1) implementations, while avoiding the problem of
having various mutually incompatible versions and adding several unique
features of its own.
Creates one or more unique temporary files, directories or named pipes, atomically (i.e. avoiding race conditions) and with safe permissions. The path name(s) are stored in $REPLY and optionally written to stdout.
Usage: mktemp [ -dFsQCt ] [ template ... ]
-d: Create a directory instead of a regular file.
-F: Create a FIFO (named pipe) instead of a regular file.
-s: Silent. Store output in $REPLY, don't write any output or message.
-Q: Shell-quote each unit of output. Separate by spaces, not newlines.
-C: Automated cleanup. Pushes a trap to remove the files on exit. On an interactive shell, that's all this option does. On a non-interactive shell, the following applies: Clean up on receiving SIGPIPE and SIGTERM as well. On receiving SIGINT, clean up if the option was given at least twice, otherwise notify the user of files left. On the invocation of die, clean up if the option was given at least three times, otherwise notify the user of files left.
-t: Prefix one temporary files directory to all the templates: $TMPDIR/ if TMPDIR is set, /tmp/ otherwise. The templates may not contain any slashes. If the template has neither any trailing Xes nor a trailing dot, a dot is added before the random suffix.

The template defaults to /tmp/temp. (the trailing dot is part of the template). A suffix of random shell-safe ASCII
characters is added to the template to create the file. For compatibility with
other mktemp
implementations, any optional trailing X
characters in the
template are removed. The length of the suffix will be equal to the amount of
X
es removed, or 10, whichever is more. The longer the random suffix, the
higher the security of using mktemp
in a shared directory such as /tmp.
Since /tmp
is a world-writable directory shared by other users, for best
security it is recommended to create a private subdirectory using mktemp -d
and work within that.
Option -C
cannot be used while invoking mktemp
in a subshell, such as
in a command substitution. Modernish will detect this and treat it as a
fatal error. The reason is that a typical command substitution like
tmpfile=$(mktemp -C)
is incompatible with auto-cleanup, as the cleanup EXIT trap would be
triggered not upon exiting the program but upon exiting the command
substitution subshell that just ran mktemp
, thereby immediately undoing
the creation of the file. Instead, do something like:
mktemp -sC; tmpfile=$REPLY
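
For instance, a minimal sketch (variable names and the template are illustrative): create a private temporary directory with automatic cleanup, then create a file within it.
# Create a private temporary directory, silently (-s), with auto-cleanup (-C).
mktemp -dsC; mydir=$REPLY
# Create a temporary file inside that private directory.
mktemp -s "$mydir/data."; datafile=$REPLY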
A cross-platform implementation of seq
that is more powerful and versatile
than native GNU and BSD seq
(1) implementations. The core is written in
bc
, the POSIX arbitrary-precision calculator language. That means this
seq
inherits the capacity to handle numbers with a precision and size only
limited by computer memory, as well as the ability to handle input numbers
in any base from 1 to 16 and produce output in any base 1 and up.
Usage: seq [ -w ] [ -f format ] [ -s string ] [ -S scale ] [ -B base ] [ -b base ] [ first [ incr ] ] last
seq
prints a sequence of arbitrary-precision floating point numbers, one
per line, from first (default 1), to as near last as possible, in increments of
incr (default 1). If first is larger than last, the default incr is -1.
-w: Equalise width by padding with leading zeros. The longest of the first, incr or last arguments is taken as the length that each output number should be padded to.
-f: printf-style floating-point format. The format string is passed on (with an added \n) to awk's builtin printf function. Because of that, the -f option can only be used if the output base is 10. Note that awk's floating point precision is limited, so very large or long numbers will be rounded.
-s: Use string to separate numbers. Default: newline. The terminator character remains a newline in any case (which is like GNU seq and differs from BSD seq).
-S: Explicitly set the scale (number of digits after decimal point). Defaults to the largest number of digits after the decimal point among the first, incr or last arguments.
-B: Set input and output base from 1 to 16. Defaults to 10.
-b: Set arbitrary output base from 1. Defaults to input base. See the bc(1) manual for more information on the output format for bases greater than 16.

The -S, -B and -b options take shell integer numbers as operands. This means a leading 0X or 0x denotes a hexadecimal number and (except on shells with BUG_NOOCTAL) a leading 0 denotes an octal number.
For portability reasons, modernish seq
always uses a dot (.) for the
floating point, never a comma, regardless of the system locale. This applies
both to command arguments and to output.
The -w
, -f
and -s
options are inspired by GNU and BSD seq
, mostly
emulating GNU where they differ. The -S
, -B
and -b
options are
modernish enhancements based on bc
(1) functionality.
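
A few illustrative invocations, based on the option descriptions above (expected output summarised in the comments):
seq 3		# 1, 2 and 3, one per line
seq -w 8 10	# 08, 09 and 10: equal width via leading zeros
seq 5 1		# counts down from 5 to 1 (default incr is -1 here)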
rev
copies the specified files to the standard output, reversing the order
of characters in every line. If no files are specified, the standard input
is read.
Usage: like rev
on Linux and BSD, which is like cat
except that -
is
a filename and does not denote standard input. No options are supported.
Functions for working with directories. So far I have:
traverse
is a fully cross-platform, robust replacement for find
without
the snags of the latter. It is not line oriented but handles all data
internally in the shell. Any weird characters in file names (including
whitespace and even newlines) "just work", provided either
use safe
is active or shell expansions are
properly quoted. This avoids many hairy
common pitfalls with find
while remaining compatible with all POSIX systems.
traverse
recursively walks through a directory, executing a command for
each file and subdirectory found. That command is usually a handler shell
function in your program.
Unlike find
, which is so smart its command line options are practically
their own programming language, traverse
is dumb: it has minimal
functionality of its own. However, with a shell function as the command,
any functionality of 'find' and anything else can be programmed in the
shell language. Flexibility is unlimited. The install.sh
script that comes
with modernish provides a good example of its practical use. See also the
traverse-test
example program.
Usage: traverse [ -d ] [ -F ] [ -X ] directory command
traverse
calls command, once for each file found within the directory,
with one parameter containing the full pathname relative to the directory.
Any directories found within are automatically entered and traversed
recursively unless the command exits with status 1. Symlinks to
directories are not followed.
find
's -prune
functionality is implemented by testing the command's exit
status. If the command indicated exits with status 1 for a directory, this
means: do not traverse the directory in question. For other types of files,
exit status 1 is the same as 0 (success). Exit status 2 means: stop the
execution of traverse
and resume program execution. An exit status greater
than 2 indicates system failure and causes the program to abort.
find
's -depth
functionality is implemented using the -d
option. By
default, traverse
handles directories first, before their contents. The
-d
option causes depth-first traversal, so all entries in a directory will
be acted on before the directory itself. This applies recursively to
subdirectories. That means depth-first traversal is incompatible with
pruning, so returning status 1 for directories will have no effect.
find's -xdev
functionality is implemented using the -F
option. If this
is given, traverse
will not descend into directories that are on another
file system than that of the directory given in the argument.
xargs
-like functionality is implemented using the -X
option. As many
items as possible are saved up before being passed to the command all at
once. This is also incompatible with pruning. Unlike xargs
, the command is
only executed if at least one item was found for it to handle.
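
For example, a sketch of a handler function (the function name and patterns are illustrative): print all *.sh files under the current directory while pruning any directory named .git.
handle_file() {
	case $1 in
	( */.git )	return 1 ;;	# prune: don't descend into .git directories
	( *.sh )	putln "$1" ;;
	esac
}
traverse . handle_file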
countfiles
: Count the files in a directory using nothing but shell
functionality, so without external commands. (It's amazing how many pitfalls
this has, so a library function is needed to do it robustly.)
Usage: countfiles [ -s ] directory [ globpattern ... ]
Count the number of files in a directory, storing the number in REPLY
and (unless -s
is given) printing it to standard output.
If any globpatterns are given, only count the files matching them.
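
For instance (directory and pattern are illustrative):
# Count the *.txt files in /tmp without printing the number (-s),
# then use the count stored in REPLY.
countfiles -s /tmp '*.txt'
putln "Found $REPLY text file(s)."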
Utilities for working with the terminal.
readkey
: read a single character from the keyboard without echoing back to
the terminal. Buffering is done so that multiple waiting characters are read
one at a time.
Usage: readkey
[ -E
ERE ] [ -t
timeout ] [ -r
] [ varname ]
-E
: Only accept characters that match the extended regular expression
ERE (the type of RE used by grep -E
/egrep
). readkey
will silently
ignore input not matching the ERE and wait for input matching it.
-t
: Specify a timeout in seconds (one significant digit after the
decimal point). After the timeout expires, no character is read and
readkey
returns status 1.
-r
: Raw mode. Disables INTR (Ctrl+C), QUIT, and SUSP (Ctrl+Z) processing
as well as translation of carriage return (13) to linefeed (10).
The character read is stored into the variable referenced by varname,
which defaults to REPLY
if not specified.
Adds a --long
option to the getopts built-in for parsing GNU-style long
options. (Does not currently work in ash derivatives because getopts
has a function-local state in those shells. The only way out is to
re-implement getopts
completely in shell code instead of building on
the built-in. This is on the TODO list.)
Parsing of command line options for shell functions is a hairy problem.
Using getopts
in shell functions is problematic at best, and manually
written parsers are very hard to do right. That's why this module provides
generateoptionparser
, a command to generate an option parser: it takes
options specifying what variable names to use and what your function should
support, and outputs code to parse options for your shell function. Options
can be specified to require or not take arguments. Combining/stacking
options and arguments in the traditional UNIX manner is supported.
Only short (one-character) options are supported. Each option gets a corresponding variable whose name consists of a specified prefix ending in the option character (hence, only option characters that are valid in variable names are supported, namely, the ASCII characters A-Z, a-z, 0-9 and the underscore). If the option was not specified on the command line, its variable is unset; otherwise the variable is set to the empty value or, if the option requires an argument, to that argument.
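
The exact invocation of generateoptionparser is not shown here; conceptually, the code it emits is a parser loop along the following lines. This is a hand-written, simplified sketch (no option stacking, minimal error handling), not actual generator output; the prefix 'opt_' and the options -q and -f are made up.
# Parse -q (no argument) and -f FILE (requires argument) for a shell function.
unset -v opt_q opt_f
while :; do
	case ${1-} in
	( -q )	opt_q='' ;;			# option without argument: set to empty
	( -f )	shift; opt_f=${1-} ;;		# option with argument (checking omitted)
	( -- )	shift; break ;;
	( -* )	die "invalid option: $1" ;;
	( * )	break ;;
	esac
	shift
done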
A C-style for loop akin to for (( ))
in bash/ksh/zsh, but unfortunately
not with the same syntax. For example, to count from 1 to 10:
cfor 'i=1' 'i<=10' 'i+=1'; do
echo "$i"
done
(Note that ++i
and i++
can only be used on shells with ARITHPP,
but i+=1
or i=i+1
can be used on all POSIX-compliant shells.)
A C-style for loop with arbitrary shell commands instead of arithmetic expressions. For example, to count from 1 to 10 with traditional shell commands:
sfor 'i=1' '[ "$i" -le 10 ]' 'i=$((i+1))'; do
print "$i"
done
or, with modernish commands:
sfor 'i=1' 'le i 10' 'inc i'; do
print "$i"
done
The shell lacks a very simple and basic loop construct, so this module
provides for an old-fashioned MS BASIC-style for
loop, renamed a with
loop because we can't overload the reserved shell keyword for
. Integer
arithmetic only. Usage:
with <varname>=<value> to <limit> [ step <increment> ]; do
# some commands
done
To count from 1 to 10:
with i=1 to 10; do
print "$i"
done
The value for step
defaults to 1 if limit is equal to or greater
than value, and to -1 if limit is less than value. The latter is
a slight enhancement over the original BASIC for
construct. So
counting backwards is as simple as with i=10 to 1; do
(etc).
A complete and nearly accurate reimplementation of the select
loop from
ksh, zsh and bash for POSIX shells lacking it. Modernish scripts running
on any POSIX shell can now easily use interactive menus.
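
A small illustrative menu, assuming the same semantics as the shell-native select (the menu items are made up; an empty $colour means the user entered an invalid menu number):
select colour in red green blue; do
	case $colour in
	( '' )	putln "Invalid choice." ;;
	( * )	putln "You picked $colour."; break ;;
	esac
done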
(All the new loop constructs have one bug in common: as they start with
an alias that expands to two commands, you can't pipe a command's output
directly into such a loop. You have to enclose it in {
...}
as a
workaround. I have not found a way around this limitation that doesn't
involve giving up the familiar do
...done
syntax.)
This is a list of shell capabilities and bugs that modernish tests for, so
that both modernish itself and scripts can easily query the results of these
tests. The all-caps IDs below are all usable with the thisshellhas
function. This makes it easy for a cross-platform modernish script to write
optimisations taking advantage of certain non-standard shell features,
falling back to a standard method on shells without these features. On the
other hand, if universal compatibility is not a concern for your script, it
is just as easy to require certain features and exit with an error message
if they are not present, or to refuse shells with certain known bugs.
Most feature/quirk/bug tests have their own little test script in the
libexec/modernish/cap
directory. These tests are executed on demand, the
first time the capability or bug in question is queried using
thisshellhas
. An ID in ITALICS
denotes an ID for a "builtin" test,
which is always tested for at startup and doesn't have its own test script
file.
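
For example, a minimal sketch of this pattern ('some_command' is illustrative): read the first line of a command's output into REPLY, avoiding a command substitution subshell where the shell allows it.
if thisshellhas LEPIPEMAIN; then
	# the last element of the pipe runs in the main shell, so REPLY survives
	some_command | read -r REPLY
else
	REPLY=$(some_command | head -n 1)
fi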
Non-standard shell capabilities currently tested for are:
LEPIPEMAIN: execute last element of a pipe in the main shell, so that things like somecommand | read somevariable work. (zsh, AT&T ksh, bash 4.2+)
RANDOM: the $RANDOM pseudorandom generator.
LINENO: the $LINENO variable contains the current shell script line number.
LOCAL: function-local variables, either using the local keyword, or by aliasing local to typeset (mksh, yash).
KSH88FUNC: define ksh88-style shell functions with the 'function' keyword, supporting dynamically scoped local variables with the 'typeset' builtin. (mksh, bash, zsh, yash, et al)
KSH93FUNC: the same, but with static scoping for local variables. (ksh93 only) See Q28 at the ksh93 FAQ for an explanation of the difference.
ARITHPP: support for the ++ and -- unary operators in shell arithmetic.
ARITHCMD: standalone arithmetic evaluation using a command like ((expression)).
ARITHFOR: ksh93/C-style arithmetic 'for' loops of the form for ((exp1; exp2; exp3)) do commands; done.
CESCQUOT: Quoting with C-style escapes, like $'\n' for newline.
ADDASSIGN: Add a string to a variable using additive assignment, e.g. VAR+=string
PSREPLACE: Search and replace strings in variables using special parameter substitutions with a syntax vaguely resembling sed.
ROFUNC: Set functions to read-only with readonly -f. (bash, yash)
DOTARG: Dot scripts support arguments.
HERESTR: Here-strings, an abbreviated kind of here-document.
TESTO: The test/[ builtin supports the -o unary operator to check if a shell option is set.
PRINTFV: The shell's printf builtin has the -v option to print to a variable, which avoids forking a command substitution subshell.
ANONFUNC: zsh anonymous functions (basically the native zsh equivalent of modernish's var/setlocal module)
KSHARRAY: ksh88-style arrays. Supported on bash, zsh (under emulate sh), mksh, pdksh and ksh93.
KSHARASGN: ksh93-style mass array assignment in the style of array=(one two three). Supported on the same shells as KSHARRAY except pdksh.

Shell quirks currently tested for are:
QRK_IFSFINAL
: in field splitting, a final non-whitespace IFS delimiter
character is counted as an empty field (yash < 2.42, zsh, pdksh). This is a QRK
(quirk), not a BUG, because POSIX is ambiguous on this.QRK_32BIT
: mksh: the shell only has 32-bit arithmetics. Since every modern
system these days supports 64-bit long integers even on 32-bit kernels, we
can now count this as a quirk.QRK_ARITHWHSP
: In yash
and FreeBSD /bin/sh, trailing whitespace from variables is not trimmed in arithmetic
expansion, causing the shell to exit with an 'invalid number' error. POSIX is silent
on the issue. The modernish isint
function (to determine if a string is a valid
integer number in shell syntax) is QRK_ARITHWHSP
compatible, tolerating only
leading whitespace.QRK_BCDANGER
: break
and continue
can affect non-enclosing loops,
even across shell function barriers (zsh, Busybox ash; older versions
of bash, dash and yash). (This is especially dangerous when using
var/setlocal
which internally uses a temporary shell function to try to protect against
breaking out of the block without restoring global parameters and settings.)QRK_EMPTPPFLD
: Unquoted $@
and $*
do not discard empty fields.
POSIX says
for both unquoted $@
and unquoted $*
that empty positional parameters
may be discarded from the expansion. AFAIK, just one shell (yash)
doesn't.QRK_EMPTPPWRD
: POSIX says
that empty "$@"
generates zero fields but empty ''
or ""
or
"$emptyvariable"
generates one empty field. But it leaves unspecified
whether something like "$@$emptyvariable"
generates zero fields or one
field. Zsh, pdksh/mksh and (d)ash generate one field, as seems logical.
But bash, AT&T ksh and yash generate zero fields, which we consider a
quirk. (See also BUG_PP_01)QRK_EVALNOOPT
: eval
does not parse options, not even --
, which makes it
incompatible with other shells: on the one hand, (d)ash does not accept eval -- "$command"
whereas on other shells this is necessary if the command
starts with a -
, or the command would be interpreted as an option to eval
.
A simple workaround is to prefix arbitrary commands with a space.
Both situations are POSIX compliant,
but since they are incompatible without a workaround, the minority situation
is labeled here as a QuiRK.QRK_EXECFNBI
: In pdksh and zsh, exec
looks up shell functions and
builtins before external commands, and if it finds one it does the
equivalent of running the function or builtin followed by exit
. This
is probably a bug in POSIX terms; exec
is supposed to launch a
program that overlays the current shell, implying the program launched by
exec
is always external to the shell. However, since the
POSIX language
is rather
vague and possibly incorrect,
this is labeled as a shell quirk instead of a shell bug.BUG_HDPARQUOT
: Double quotes within certain parameter substitutions in
here-documents aren't removed (FreeBSD sh; bosh). For instance, if
var
is set, ${var+"x"}
in a here-document yields "x"
, not x
.
POSIX considers it undefined
to use double quotes there, so they should be avoided for a script to be
fully POSIX compatible.
(Note this quirk does not apply for substitutions that remove patterns,
such as ${var#"$x"}
and ${var%"$x"}
; those are defined by POSIX
and double quotes are fine to use.)
(Note 2: single quotes produce widely varying behaviour and should never
be used within any form of parameter substitution in a here-document.)QRK_LOCALINH
: On a shell with LOCAL, local variables, when declared
without assigning a value, inherit the state of their global namesake, if
any. (dash, FreeBSD sh)QRK_LOCALSET
: On a shell with LOCAL, local variables are immediately set
to the empty value upon being declared, instead of being initially without
a value. (zsh)QRK_LOCALSET2
: Like QRK_LOCALSET
, but only if the variable by the
same name in the global/parent scope is unset. If the global variable is
set, then the local variable starts out unset. (bash 2 and 3)QRK_LOCALUNS
: On a shell with LOCAL, local variables lose their local
status when unset. Since the variable name reverts to global, this means that
unset
will not necessarily unset the variable! (yash, pdksh/mksh. Note:
this is actually a behaviour of typeset
, to which modernish aliases local
on these shells.)QRK_LOCALUNS2
: This is a more treacherous version of QRK_LOCALUNS
that
is unique to bash. The unset
command works as expected when used on a local
variable in the same scope that variable was declared in, however, it
makes local variables global again if they are unset in a subscope of that
local scope, such as a function called by the function where it is local.
(Note: since QRK_LOCALUNS2
is a special case of QRK_LOCALUNS
, modernish
will not detect both.)QRK_UNSETF
: If 'unset' is invoked without any option flag (-v or -f), and
no variable by the given name exists but a function does, the shell unsets
the function. (bash)Non-fatal shell bugs currently tested for are:
BUG_ALSUBSH
: Aliases defined within subshells leak upwards to the main shell.
(Bug found in older versions of ksh93.)BUG_APPENDC
: When set -C
(noclobber
) is active, "appending" to a nonexistent
file with >>
throws an error rather than creating the file. (zsh < 5.1)
This is a bug making use safe
less convenient to work with, as this sets
the -C
(-o noclobber
) option to reduce accidental overwriting of files.
The safe
module requires an explicit override to tolerate this bug.BUG_ARITHINIT
: In dash 0.5.9.1, using unset or empty variables in
arithmetic expressions causes the shell to error out with an "Illegal number"
error. Instead, according to POSIX, it should take them as a value of zero.
Yash (at least up to 2.44) also has a variant of this bug: it is only
triggered in a simple arithmetic expression containing a single variable name
without operators. The bug causes yash to exit silently with status 2.BUG_ARITHTYPE
: In zsh, arithmetic assignments (using let
, $(( ))
,
etc.) on unset variables assign a numerical/arithmetic type to a variable,
causing subsequent normal variable assignments to be interpreted as
arithmetic expressions and fail if they are not valid as such.BUG_BRACQUOT
: shell quoting within bracket patterns has no effect (zsh < 5.3;
ksh93) This bug means the -
retains its special meaning of 'character
range', and an initial !
(and, on some shells, ^
) retains the meaning of
negation, even in quoted strings within bracket patterns, including quoted
variables.BUG_CASECC01
: glob patterns as in 'case' cannot match an escaped ^A
($CC01
) control character. Found on: bash 2.05bBUG_CASESTAT
: The 'case' conditional construct prematurely clobbers the
exit status $?
. (found in zsh < 5.3, Busybox ash <= 1.25.0, dash <
0.5.9.1)BUG_CMDOPTEXP
: the command
builtin does not recognise options if they
result from expansions. For instance, you cannot conditionally store -p
in a variable like defaultpath
and then do command $defaultpath someCommand
. (found in zsh < 5.3)BUG_CMDPV
: command -pv
does not find builtins ({pd,m}ksh), does not
accept the -p and -v options together (zsh < 5.3) or ignores the '-p'
option altogether (bash 3.2); in any case, it's not usable to find commands
in the default system PATH.BUG_CMDSPASGN
: preceding a special builtin with 'command' does not stop
preceding invocation-local variable assignments from becoming global.
(AT&T ksh, 2010-ish versions)BUG_CMDSPEXIT
: preceding a special builtin with 'command' does not stop
it from exiting the shell if the builtin encounters error.
(zsh < 5.2; mksh < R50e)BUG_CMDVRESV
: 'command -v' does not find reserved words such as "if".
(pdksh, mksh). This necessitates a workaround version of thisshellhas().BUG_CNONASCII
: the modernish functions toupper
and tolower
cannot
convert non-ASCII letters to upper or lower case -- e.g. accented Latin
letters, Greek, cyrillic. (Note: modernish falls back to the external
tr
, awk
, gawk
or GNU sed
command if the shell can't convert non-ASCII
(or any) characters, so this bug is only detected if none of these external
commands can convert them. But if the shell can, then this bug is not
detected even if the external commands cannot. The thing to take away from
all this is that the result of thisshellhas BUG_CNONASCII
only applies
to the modernish toupper
and tolower
functions and not to your shell or
any external command in particular.)BUG_CSCMTQUOT
: unbalanced single and double quotes and backticks in comments
within command substitutions cause obscure and hard-to-trace syntax errors
later on in the script. (ksh88; pdksh, incl. {Open,Net}BSD ksh; bash 2.05b)BUG_CSNHDBKSL
: Backslashes within non-expanding here-documents within
command substitutions are incorrectly expanded to perform newline joining,
as opposed to left intact. (bash <= 4.4, and pdksh)BUG_DOLRCSUB
: parsing problem where, inside a command substitution of
the form $(...)
, the sequence $$'...'
is treated as $'...'
(i.e. as
a use of CESCQUOT), and $$"..."
as $"..."
(bash-specific translatable
string). (Found in bash up to 4.4)BUG_EMPTYBRE
is a case
pattern matching bug in zsh < 5.0.8: empty
bracket expressions eat subsequent shell grammar, producing unexpected
results. This is particularly bad if you want to pass a bracket
expression using a variable or parameter, and that variable or parameter
could be empty. This means the grammar parsing depends on the contents
of the variable!BUG_EVALCOBR
: break
and continue
do not work if they are within
eval
, wrongly causing loop execution to continue.
(pdksh; mksh < R55 2017/04/12)BUG_FNREDIRP
: I/O redirections on function definitions are forgotten if the
function is called as part of a pipeline with at least one |
. (bash 2.05b)BUG_FNSUBSH
: Function definitions within subshells (including command
substitutions) are ignored if a function by the same name exists in the
main shell, so the wrong function is executed. unset -f
is also silently
ignored. ksh93 (all current versions as of June 2015) has this bug.BUG_HASHVAR
: On zsh, $#var
means the length of $var
- other shells and
POSIX require braces, as in ${#var}
. This causes interesting bugs when
combining $#
, being the number of positional parameters, with other
strings. For example, in arithmetics: $(($#-1))
, instead of the number of
positional parameters minus one, is interpreted as ${#-}
concatenated with
1
. So, for zsh compatibility, always use ${#}
instead of $#
unless it's
stand-alone or followed by a space.BUG_IFSGLOBC
: In glob pattern matching (such as in case
and [[
), if a
wildcard character is part of IFS
, it is matched literally instead of as a
matching character. This applies to glob characters *
, ?
, [
and ]
.
Since nearly all modernish functions use case
for argument validation and
other purposes, nearly every modernish function breaks on shells with this
bug if IFS contains any of these three characters!
(Found in bash < 4.4)BUG_IFSGLOBP
: In pathname expansion (filename globbing), if a
wildcard character is part of IFS
, it is matched literally instead of as a
matching character. This applies to glob characters *
, ?
, [
and ]
.
(Bug found in bash, all versions up to at least 4.4)BUG_IFSGLOBS
: in glob pattern matching (as in case
or parameter
substitution with #
and %
), if IFS
starts with ?
or *
and the
"$*"
parameter expansion inserts any IFS separator characters, those
characters are erroneously interpreted as wildcards when quoted "$*" is
used as the glob pattern. (AT&T ksh93)BUG_IFSISSET
: AT&T ksh93 (recent versions): ${IFS+s}
always yields 's'
even if IFS is unset. This applies to IFS only.BUG_ISSETLOOP
: AT&T ksh93: Expansions like ${var+set}
and
${var:+nonempty}
remain static when used within a for
, while
or
until
loop; the expansions don't change along with the state of the
variable, so they cannot be used to check whether a variable is set
and/or empty within a loop if the state of that variable may change
in the course of the loop.BUG_KUNSETIFS
: ksh93: Can't unset IFS
under very specific
circumstances. unset -v IFS
is a known POSIX shell idiom to activate
default field splitting. With this bug, the unset
builtin silently fails
to unset IFS (i.e. fails to activate field splitting) if we're executing
an eval
or a trap and a number of specific conditions are met. See
BUG_KUNSETIFS.t
for more information.BUG_LNNOALIAS
: The shell has LINENO, but $LINENO is always expanded to 0
when used within an alias. (pdksh variants, including mksh and oksh)BUG_LNNOEVAL
: The shell has LINENO, but $LINENO is always expanded to 0
when used in 'eval'. (pdksh variants, including mksh and oksh)BUG_MULTIBIFS
: We're on a UTF-8 locale and the shell supports UTF-8
characters in general (i.e. we don't have BUG_MULTIBYTE
) -- however, using
multibyte characters as IFS
field delimiters still doesn't work. For
example, "$*"
joins positional parameters on the first byte of $IFS
instead of the first character. (ksh93, mksh, FreeBSD sh, Busybox ash)BUG_MULTIBYTE
: We're in a UTF-8 locale but the shell does not have
multi-byte/variable-length character support. (Non-UTF-8 variable-length
locales are not yet supported.) Dash is a recent shell with this bug.BUG_NOCHCLASS
: POSIX-mandated character [:
classes:]
within bracket
[
expressions]
are not supported in glob patterns. (pdksh, mksh, and
family)BUG_NOOCTAL
: Shell arithmetic does not interpret numbers with leading
zeroes as octal numbers; these are interpreted as decimal instead,
though POSIX specifies octal. (older mksh, 2013-ish versions)BUG_NOUNSETEX
: Cannot assign export attribute to variables in an unset
state; exporting a variable immediately sets it to the empty value.
(zsh < 5.3)BUG_NOUNSETRO
: Cannot freeze variables as readonly in an unset state.
This bug in zsh < 5.0.8 makes the readonly
command set them to the
empty string instead.BUG_OPTNOLOG
: on dash, setting -o nolog
causes $-
to wreak havoc:
trying to expand $-
silently aborts parsing of an entire argument,
so e.g. "one,$-,two"
yields "one,"
. (Same applies to -o debug
.)BUG_PARONEARG
: When IFS
is empty on bash 3.x and 4.x (i.e. field
splitting is off), ${1+"$@"}
is counted as a single argument instead
of each positional parameter as separate arguments. To avoid this bug,
simply use "$@"
instead. (${1+"$@"}
is an obsolete workaround for
a fatal shell bug, FTL_UPP
.)BUG_PFRPAD
: Negative padding value for strings in the printf
builtin
does not cause blank padding on the right-hand side, but inserts blank
padding on the left-hand side as if the value were positive, e.g.
printf '[%-4s]' hi
outputs [ hi]
, not [hi ]
. (zsh 5.0.8)BUG_PP_01
: POSIX says
that empty "$@"
generates zero fields but empty ''
or ""
or
"$emptyvariable"
generates one empty field. This means concatenating
"$@"
with one or more other, separately quoted, empty strings (like
"$@""$emptyvariable"
) should still produce one empty field. But on
bash 3.x, this erroneously produces zero fields. (See also QRK_EMPTPPWRD)BUG_PP_02
: Like BUG_PP_01
, but with unquoted $@
and only
with "$emptyvariable"$@
, not $@"$emptyvariable"
. (pdksh)BUG_PP_03
: When IFS is unset or empty (zsh 5.3.1) or empty (pdksh),
assigning var=$*
only assigns the first field, failing to join and
discarding the rest of the fields. Workaround: var="$*"
(POSIX leaves var=$@
, etc. undefined, so we don't test for those.)BUG_PP_03A
: When IFS is unset, assignments like var=$*
incorrectly remove leading and trailing spaces (but not tabs or
newlines) from the result. Workaround: quote the expansion. Found on:
bash 4.3 and 4.4.BUG_PP_03B
: When IFS is unset, assignments like var=${var+$*}
,
etc. incorrectly remove leading and trailing spaces (but not tabs or
newlines) from the result. Workaround: quote the expansion. Found on:
bash 4.3 and 4.4.BUG_PP_03C
: When IFS
is unset, assigning var=${var-$*}
only assigns
the first field, failing to join and discarding the rest of the fields.
(zsh 5.3, 5.3.1) Workaround: var=${var-"$*"}
BUG_PP_04
: Assigning the positional parameters to a variable using
a conditional assignment within a parameter substitution, such as
: ${var=$*}, discards everything but the last field if IFS is empty.
(pdksh, mksh)BUG_PP_04_S
: When IFS is null (empty), the result of a substitution
like ${var=$*}
is incorrectly field-split on spaces. The difference
with BUG_PP_04 is that the assignment itself succeeds normally.
Found on: bash 4.2, 4.3BUG_PP_04A
: Like BUG_PP_03A, but for conditional assignments within
parameter substitutions, as in : ${var=$*}
or : ${var:=$*}
.
Workaround: quote either $*
within the expansion or the expansion
itself. Found on: bash 2.05b through 4.4.BUG_PP_04B
: When assigning the positional parameters ($*) to a variable
using a conditional assignment within a parameter substitution, e.g.
: ${var:=$*}
, the fields are always joined and separated by spaces,
regardless of the content or state of IFS. Workaround as in BUG_PP_04A.
(bash 2.05b)BUG_PP_04C
: In e.g. : ${var:=$*}
, the expansion incorrectly generates
multiple fields. POSIX says the expansion (before field splitting) shall
generate the result of the assignment, i.e. 1 field. Workaround: same.
(mksh R50)BUG_PP_05
: POSIX says
that empty $@
generates zero fields, but with null IFS, empty unquoted
$@
yields one empty field. Found on: dash 0.5.9.1BUG_PP_06
: POSIX says
that unquoted $@
initially generates as many fields as there are
positional parameters, and then (because $@
is unquoted) each field is
split further according to IFS
. With this bug, the latter step is not
done. Found on: zsh < 5.3BUG_PP_06A
: POSIX says
that unquoted $@
and $*
initially generate as many fields as there are
positional parameters, and then (because $@
or $*
is unquoted) each field is
split further according to IFS
. With this bug, the latter step is not
done if IFS
is unset (i.e. default split). Found on: zsh < 5.4BUG_PP_07
: unquoted $*
and $@
(including in substitutions like
${1+$@}
or ${var-$*}
) do not perform default field splitting if
IFS
is unset. Found on: zsh (up to 5.3.1) in sh modeBUG_PP_07A
: When IFS
is unset, unquoted $*
undergoes word splitting
as if IFS=' '
, and not the expected IFS=" ${CCt}${CCn}"
.
Found on: bash 4.4BUG_PP_08
: When IFS
is empty, unquoted $@
and $*
do not generate
one field for each positional parameter as expected, but instead join
them into a single field without a separator. Found on: yash < 2.44BUG_PP_08B
: When IFS
is empty, unquoted $*
within a substitution (e.g.
${1+$*}
or ${var-$*}
) does not generate one field for each positional
parameter as expected, but instead joins them into a single field without
a separator. Found on: bash 3 and 4BUG_PP_09
: When IFS
is non-empty but does not contain a space,
unquoted $*
within a substitution (e.g. ${1+$*}
or ${var-$*}
) does
not generate one field for each positional parameter as expected,
but instead joins them into a single field separated by spaces
(even though, as said, IFS does not contain a space).
Found on: bash 2BUG_PP_10
: When IFS
is null (empty), assigning var=$*
removes any
$CC01
(^A) and $CC7F
(DEL) characters. (bash 3, 4)BUG_PP_10A
: When IFS
is non-empty, assigning var=$*
prefixes each
$CC01
(^A) and $CC7F
(DEL) character with a $CC01
character. (bash 4.4)BUG_PSUBBKSL1
: A backslash-escaped }
character within a quoted parameter
substitution is not unescaped. (bash 2 & 3, standard dash, Busybox ash)BUG_PSUBPAREN
: Parameter substitutions where the word to substitute contains
parentheses wrongly cause a "bad substitution" error. (pdksh)BUG_PSUBSQUOT
: in pattern matching parameter substitutions
(${param#pattern}
, ${param%pattern}
, ${param##pattern}
and
${param%%pattern}
), if the whole parameter substitution is quoted with
double quotes, then single quotes in the pattern are not parsed. POSIX
says
they are to keep their special meaning, so that glob characters may
be quoted. For example: x=foobar; echo "${x#'foo'}"
should yield bar
but with this bug yields foobar
. (dash; Busybox ash)BUG_READTWHSP
: read
does not trim trailing IFS whitespace if there
is more than one field. (dash 0.5.8)BUG_REDIRIO
: the I/O redirection operator <>
(open a file descriptor
for both read and write) defaults to opening standard output (i.e. is
short for 1<>
) instead of defaulting to opening standard input (0<>
) as
POSIX specifies.
(AT&T ksh93)BUG_SELECTEOF
: in a shell-native 'select' loop, the REPLY variable
is not cleared if the user presses Ctrl-D to exit the loop. (zsh)BUG_SELECTRPL
: in a shell-native 'select' loop, input that is not a menu
item is not stored in the REPLY variable as it should be. (mksh R50 2014)BUG_TESTERR0
: mksh: test
/[
exits successfully (exit status 0) if
an invalid argument is given to an operator. (mksh R52 fixes this)BUG_TESTERR1A
: AT&T ksh: test
/[
exits with a non-error 'false' status
(1) if an invalid argument is given to an operator.BUG_TESTERR1B
: zsh: test
/[
exits with status 1 (false) if there are
too few or too many arguments, instead of a status > 1 as it should do.BUG_TESTILNUM
: On dash (up to 0.5.8), giving an illegal number to test -t
or [ -t
causes some kind of corruption so the next test
/[
invocation
fails with an "unexpected operator" error even if it's legit.BUG_TESTONEG
: The test
/[
builtin supports a -o
unary operator to
check if a shell option is set, but it ignores the no
prefix on shell
option names, so something like [ -o noclobber ]
gives a false positive.
Bug found on yash up to 2.43. (The TESTO
feature test implicitly checks
against this bug and won't detect the feature if the bug is found.)BUG_TESTPAREN
: Incorrect exit status of test -n
/-z
with values (
,
)
or !
in zsh 5.0.6 and 5.0.7. This can make scripts that process
arbitrary data (e.g. the shellquote function) take the wrong action unless
workarounds are implemented or modernish equivalents are used instead.
Also, spurious error message with both test -n
and test -z
.BUG_TESTRMPAR
: zsh: in binary operators with test
/[
, if the first
argument starts with (
and the last with `)', both the first and the
last argument are completely removed, leaving only the operator, and the
result of the operation is incorrectly true because the operator is
incorrectly parsed as a non-empty string. This applies to any operator.

Warning IDs do not identify any characteristic of the shell, but instead warn about a potentially problematic system condition that was detected at initialisation time.
WRN_NOSIGPIPE
: Modernish has detected that the process that launched
the current program has set SIGPIPE
to ignore, an irreversible condition
that is in turn inherited by any process started by the current shell, and
their subprocesses, and so on. This makes it impossible to detect
$SIGPIPESTATUS
;
it is set to the special
value 99999 which is impossible as an exit status. But it also makes it
irrelevant what that status is, because neither the current shell nor any
process it spawns is now capable of receiving SIGPIPE
. The
-P
option to harden
is also rendered irrelevant. Note that a command such as yes | head -n 10
now never ends; the only way yes
would ever stop trying to write
lines is by receiving SIGPIPE
from head
, which is being ignored.
Programs that use commands in this fashion should check if thisshellhas WRN_NOSIGPIPE
and either employ workarounds or refuse to run if so.

Modernish comes with a suite of regression tests to detect bugs in modernish
itself, which can be run using modernish --test
after installation. By
default, it will run all the tests verbosely but without tracing the command
execution. The install.sh
installer will run the suite quietly on the
selected shell before installation.
A few options are available to specify after --test:
-q: quieter operation; report expected fails [known shell bugs] and unexpected fails [bugs in modernish]. Add -q again for quietest operation (report unexpected fails only).
-s: entirely silent operation.
-x: trace each test using the shell's xtrace facility. Each trace is stored in a separate file in a specially created temporary directory. By default, the trace is deleted if a test does not produce an unexpected fail. Add -x again to keep all traces. If any traces were saved, modernish will tell you the location of the temporary directory at the end, otherwise it will silently remove the directory again.

These short options can be combined so, for example,
--test -qxx
is the same as --test -q -x -x
.
Note the difference between these regression tests and the tests listed
above in Appendix A. The latter are tests for
whatever shell is executing modernish: they test for capabilities (features,
quirks, bugs) of the current shell. They are meant to be run via
thisshellhas
and are designed to be
taken advantage of in scripts. On the other hand, these tests run by
modernish --test
are regression tests for modernish itself. It does not
make sense to use these in a script.
New/unknown shell bugs can still cause modernish regression tests to fail, of course. That's why some of the regression tests also check for consistency with the results of the feature/quirk/bug tests: if there is a shell bug in a widespread release version that modernish doesn't know about yet, this in turn is considered to be a bug in modernish, because one of its goals is to know about all the shell bugs in all released shell versions currently seeing significant use.
The testshells.sh
program in share/doc/modernish/examples
can be used to
run the regression test suite on all the shells installed on your system.
You could put it as testshells
in some convenient location in your
$PATH
, and then simply run:
testshells modernish --test
(adding any further options you like -- for instance, you might like to add
-q
to avoid very long terminal output). On first run, testshells
will
generate a list of shells it can find on your system and it will give you a
chance to edit it before proceeding.