These days I find myself searching for a better shell. Before I dive into what I want specifically, I would first like to spend a minute complaining about bash and explaining why the existing options aren't good enough. If you get bored, skip to What I Want.

Bash (or: Bourne-Again Shell)

First released in 1989, bash is the most common shell on Linux and macOS systems. For some Linux users, it will be the only shell they ever know. It's a more full-featured version of sh (the Bourne shell).

The good

Ubiquity. Bash is the default on macOS and every version of Linux that I'm aware of. A standard shell means you can write scripts that run the same everywhere you need them to.

Simplicity. Writing bash scripts is very straightforward. If you need to run three commands in a batch, just put them all in a file, each on its own line, and run the script.
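For example, a hypothetical three-step deploy script (the commands and paths here are made up) is just:

#!/usr/bin/env bash
# deploy.sh -- three commands, run top to bottom
make build
make test
scp ./build/app server:/srv/app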

Metaprogramming is easy(ish). You can store commands in a variable, and then just run the variable as a command. This can make your program more flexible. (It can also make your code more confusing. And it's not always easy. This is covered below).

The bad

Clumsy error handling. If you write code, you need to deal with the fact that some commands are going to fail. By default, bash shrugs its shoulders and carries on executing the next command. This is rarely desirable in any sort of program (which is what a shell script is). I know that you can execute set -e to opt out of this behaviour, but now that's extra boilerplate that I need to remember for each of my scripts.
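A minimal sketch of why this is dangerous (the directory name is made up):

#!/usr/bin/env bash
# without `set -e`, a failed command is silently ignored
cd /tmp/build-dir   # suppose this directory doesn't exist: cd fails...
rm -rf cache/       # ...but this still runs, from whatever directory we started in

With set -e at the top, the script would stop at the failed cd instead.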

Variable handling. Again, the default is clumsy. If you reference $foobar in one of your scripts and it doesn't exist, bash will just substitute an empty string and continue. And once again, this is rarely desirable. If I reference a variable that doesn't exist, that's a bug in the program. Crash the program, for goodness' sake; don't just carry on executing with bad program state (I'm also looking at you, PHP). I know you can use set -u to disable this, but again, boilerplate. And once you do enable -u, the syntax to fall back to a default value is foobar=${1:-}. Yeah, it doesn't seem intuitive to me either.
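A short sketch of both behaviours (the variable names are made up):

#!/usr/bin/env bash
set -u                 # treat unset variables as errors

name=${1:-world}       # `:-` supplies a default, so this is safe under -u
echo "hello $name"

echo "$undefined_var"  # the script crashes here: "undefined_var: unbound variable"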

Variables are just text, so you can run variables as commands. This means you can do stuff like this (contrived example):

if [ -f config.json ]
then
    command="echo found config file"
else
    command="exit 1"
fi
# run whichever command we stored
$command

But it also means that you need to be really careful with correctly quoting your strings. If you do output_file="~/Downloads/file.html https://bad-host.com/exploit.html --referer", then your call to curl -o $output_file https://safe-host.com suddenly becomes a lot more interesting. In order to do this safely, you need to wrap your variable in quotes to ensure that the bad stuff is treated like regular text: curl -o "$output_file" https://safe-host.com. So your variables are vulnerable by default, and you need to do extra work to make them safe.
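You can watch the splitting happen with printf, which prints each of its arguments on a separate line:

output_file="~/Downloads/file.html https://bad-host.com/exploit.html --referer"

# unquoted: the shell splits the value into three separate arguments
printf '<%s>\n' $output_file

# quoted: one argument, exactly as stored
printf '<%s>\n' "$output_file"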

Arrays. Arrays behave really weirdly in bash. I can't even tell you all the ways in which they're weird, because I honestly haven't figured them out yet. For example, let's say you have a bash script which runs a docker container. The command you want is docker run -d <container>, but you also want to read some environment variables out of a config file and pass them as extra options to the docker command. The naive solution would look like this:

options=$(bash ./read-options.sh)
# let's say this returns "-e DEV=1 -e ADMINS='wendy john'"
docker run -d \
    ${options[@]} \
    <container>

First, yes, the way to "explode" the array and insert each of its items into the command is ${array[@]} (or ${array[*]}, which apparently handles whitespace slightly differently). This is a weird syntax, but whatever, I can deal with that. The problem is that the spaces in ADMINS='wendy john' will break the command. It will be evaluated as:

docker run -d -e DEV=1 -e ADMINS=wendy john <container>

You'll notice that john is treated as a standalone argument. But if we quote it as "${options[@]}", then we get this other, equally incorrect result:

docker run -d "-e DEV=1 -e ADMINS=wendy john" <container>

The working solution I ended up finding looks like this:

unset options
# build a real array by reading null-delimited arguments one at a time
# (read-options.sh now has to emit null-separated values, e.g. with printf '%s\0')
while IFS= read -r -d '' arg
do
    options+=("$arg")
done < <(bash ./read-options.sh)
docker run -d \
    "${options[@]}" \
    <container>

The less said about that, the better. (Also, yes, I know about docker-compose and --env-file; the point still stands.)

Bash has weird option flag conventions. To enable command logging, you run set -x (short for set -o xtrace). And to disable it, set +x. This is backwards: minus turns the option on, plus turns it off.
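A quick sketch of the tracing in action:

set -x           # "-" turns tracing ON
echo "hello"     # the shell prints "+ echo hello" before running it
set +x           # "+" turns tracing OFF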

Let's say you want to start a builder process and a web server process in parallel:

# start the webserver in the background
python -m http.server &
# save the process we just started
webserver_pid=$!
# and ensure that we stop it before we exit the program
trap "kill $webserver_pid" EXIT

builder_command --watch
# ^ and this will run until the user presses ^C

This is doable, but a bit verbose. And if you want to restart the webserver when it crashes in the background, you can't. Or at least I don't know how to do it.
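The closest sketch I can manage is an unconditional restart loop, which can't tell a crash from a clean exit and creates its own cleanup problems:

# wrap the server in a loop that restarts it whenever it exits, for any reason
(while true; do
    python -m http.server
    echo "webserver exited, restarting..." >&2
done) &
wrapper_pid=$!
# killing the wrapper subshell doesn't reliably kill the python child it spawned
trap "kill $wrapper_pid" EXIT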

Also, the interactive shell is a bit bare by default. You can enable lots of extra functionality (command autocomplete, git repository status, etc.), but you need to set it all up yourself, on each system where you want to use it.

Zsh?

Zsh has a better user interface than bash, in the form of autocomplete and color highlighting (configuration needed). Zsh also has Oh My Zsh, which adds a lot of easy plugins and better default functionality.

However, zsh is not the standard, so you will need to install and configure it on each system you use. This is extra bootstrapping time.

Zsh also suffers from most of the same problems as bash, since it's built on the same process and substitution model. So it's not a clear winner as a replacement. It's basically just a slightly nicer bash.

Fish Shell?

The fish shell has a much better default UI than bash. Autocomplete, syntax highlighting, and easy persistent variable management all come out of the box.

Fish has excellent autocomplete support. It parses your man pages so it can give you flag suggestions and descriptions when you press tab.

Fish also comes with a web-based config viewer/editor (run fish_config to launch it). This is pretty cool.

Fish does, unfortunately, break convention. You can't run ENV=dev ./program; instead you need to run env ENV=dev ./program. $? is now $status. <ctrl-r> then typing your search is now typing your search then pressing <up>. The changes usually make sense, but the transition is a bit jarring.

Fish has slightly better handling of strings (no splitting on spaces by default), but other than that the substitution model seems mostly the same.

Compared with bash and zsh, fish is a clear winner. But it's not perfect.

Xonsh?

Disclaimer: I haven't actually tried this one yet.

Xonsh is a very promising candidate for The Shell that Solves All My Problems™. First, it is a superset of Python, which means that all valid Python code is automatically valid xonsh code. It also means that all the nice Python features are available when you need to dip into real programming. And on top of that, you can still run simple commands like ls -l.

Xonsh also allows you to capture subprocesses as variables and reference their streams directly. This could allow for the kind of process orchestration I'm looking for.

It does seem like there are some cases where it's ambiguous whether a statement should be evaluated in "shell mode" or "python mode" (ls -l could be the ls command, or Python subtracting a variable named l from one named ls). I'm not a fan of code where the execution path isn't obvious, so this could be a deal breaker. It's also possible that this is a weird edge case that's never really encountered. Only a real test will tell.

Python? (or Ruby/Perl/JS/etc...)

I could use a regular programming language (and often do for scripts). But a programming language makes a poor shell. The whole reason shells exist is that they provide a more comfortable environment for navigating the filesystem and starting programs. Compare the following examples in bash and Python:

# bash
for file in $(find . -type f); do
    # do something with $file
done

# python
import os
for root, dirs, files in os.walk("."):
    for file in files:
        # do something with file
        pass

This kind of extra friction builds up when you spend most of your day in and around a terminal.

What I want

At the end of it all, what do I actually want out of a shell that the current options aren't satisfying?

I want sane variables. I don't want strings evaluated as commands unless I specifically ask for it. I want strings that stay intact as single values when inserted into a new command, rather than being re-split on whitespace. I also want real arrays without magic behaviour.

I want to be able to construct commands dynamically, and execute them safely.
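For concreteness, this is the kind of thing I mean, sketched in bash's own (fiddly) array idiom; the image name is made up:

args=(docker run -d)
if [ -f config.json ]; then
    args+=(-e DEV=1 -e "ADMINS=wendy john")   # the space survives as one argument
fi
args+=(my-container)
# each array element becomes exactly one argument, with no re-splitting
"${args[@]}"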

I don't want a shell that blunders on when an error happens (at any level).

I want to be able to treat subprocesses as first-class entities in the shell, not just text output or process IDs. I want to be able to pause/resume subprocesses, push them to the background, and join them back to the foreground. I want to be able to handle errors in background processes. I even want to be able to pipe the output of one process into two others.
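Bash can approximate that last one with tee and process substitution (the command names here are placeholders), but it's hardly first-class:

# fan one stream out to two consumers
producer | tee >(consumer_one) >(consumer_two) > /dev/null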

I want a real module system so I can re-use code. I don't want this to be a C or PHP style text-dump import. I want a JS-style import {a, b, c} from 'somewhere'. A real, easy module system would ease a lot of the pain of pretty much every existing shell, because I could write my own abstractions around the difficult bits.
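For contrast, bash's only reuse mechanism today is source, which is exactly the text dump I'm complaining about:

# paste lib.sh's entire contents into the current shell;
# every function and variable it defines leaks into our namespace
source ./lib.sh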

Maybe xonsh will be a good fit, maybe I will need to try making my own. I guess we'll see.