Shell Assignment
This shell assignment is inspired by the one by Bryant and O’Hallaron for Computer Systems: A Programmer’s Perspective, Third Edition.
Due: Friday, November 10, 11:59pm
For this assignment, you will implement a simple shell-scripting language, whoosh. The whoosh language is not entirely unlike bash, but whoosh is intended exclusively for batch mode. The starting code includes the language parser and an initial evaluation framework that works for a single command. You will change the initial evaluation so that it supports running multiple processes, in some cases capturing output to a variable or sending a variable’s content as input.
Example Scripts
Before describing the syntax and semantics of whoosh scripts, we provide examples to illustrate the main ideas.
/bin/ls -l
Lists the content of the current directory in long form.
Unlike any of the other examples, this example already works in the starting code for whoosh.
# This is a comment
/bin/ls -l
/bin/date
Lists the content of the current directory in long form and then reports the current time and date.
repeat 10 /bin/echo hello
Prints hello 10 times.
/bin/ls -l => $files
$files => /usr/bin/wc -l
Copies the current directory’s listing to a variable, and then sends that variable to a program that counts lines. The result is the same as /bin/ls -l | /usr/bin/wc -l in bash, as long as the directory does not have too many files, because a variable like $files in whoosh holds only up to 4096 bytes.
/usr/bin/curl www.google.com || /usr/bin/curl www.bing.com
Gets the web page at www.bing.com or www.google.com, stopping when one of them is completely received and printed. Part or all of both may be printed.
/usr/bin/curl www.google.com && /usr/bin/curl www.bing.com
Gets the web pages at www.bing.com and www.google.com, stopping when both of them have been completely received and printed. The outputs may be interleaved, since the pages are downloaded at the same time.
/bin/bash -c "echo hi; exit 0" => $output
/bin/echo $output
Prints hi, since the bash process exits with 0, which means that its output is captured to $output.
/bin/bash -c "echo hi; exit 42" => $output
/bin/echo $output
Prints 42, since the bash process exits with failure, which means that $output holds the exit status.
/bin/bash -c "echo hi; kill $$" => $output
/bin/echo $output
Prints -15, because kill $$ causes bash to send a termination signal to itself. Since the bash process terminated with a signal, the negated signal number is written to $output, and 15 is the number for the termination signal.
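The three outcomes in the examples above (exit with 0, exit with a non-zero status, termination by a signal) can all be read from the status that waitpid reports. The following is a minimal sketch, not the starting code's approach: status_to_number and demo_child_result are hypothetical names, and plain POSIX calls are used instead of the "csapp.c" wrappers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of the starting code): turn a status
   reported by waitpid() into the number used in the examples: the
   exit status for a normal exit, or the negated signal number when
   the process was terminated by a signal. */
int status_to_number(int status) {
  if (WIFEXITED(status))
    return WEXITSTATUS(status);
  if (WIFSIGNALED(status))
    return -WTERMSIG(status);
  return 0; /* stopped or continued: not a completion */
}

/* Demonstration only: fork a child that either exits with exit_code
   or, when sig is nonzero, sends sig to itself. Returns the decoded
   result, or -1000 if fork or waitpid fails. */
int demo_child_result(int exit_code, int sig) {
  int status;
  pid_t pid;

  assert(sig >= 0);
  pid = fork();
  if (pid < 0)
    return -1000;
  if (pid == 0) {
    if (sig != 0) {
      signal(sig, SIG_DFL); /* make sure the default action applies */
      kill(getpid(), sig);
    }
    _exit(exit_code);
  }
  if (waitpid(pid, &status, 0) < 0)
    return -1000;
  return status_to_number(status);
}
```

For instance, demo_child_result(42, 0) mirrors the exit 42 example, and demo_child_result(0, SIGTERM) mirrors the kill $$ example.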
/bin/echo hi @ $echoPid
/bin/echo $echoPid
Prints hi followed by the process ID used to run the first echo process (although that process ID is of no use, since echo has terminated).
/bin/sleep 1000 @ $sleepPid && /bin/kill $sleepPid
Ends quickly, because the sleep process is terminated by kill.
/bin/sleep 1000
/bin/sleep 1000
Prints nothing for at least 33 minutes and 20 seconds. Hitting Ctl-C once reduces the time to 16 minutes and 40 seconds. Hitting Ctl-C twice can end the script quickly.
(If you don’t implement Ctl-C behavior, then hitting Ctl-C once will likely terminate the script immediately.)
/bin/bash -c "sleep 3; echo 1" => $patience
/bin/echo "Patience level =" $patience
Prints Patience level = 1 if you’re patient enough, or Patience level = -15 if you hit Ctl-C within 3 seconds.
whoosh Script Syntax
This grammar is described using Backus-Naur Form (BNF), which is a style of context-free grammar that you’ll see used for most any programming-language definition. Text with a gray background, such as repeat, indicates characters that appear verbatim in a script. Text in angle brackets, such as ‹command›, is a non-terminal that refers to a grammar production. Each |-separated line is an alternative. A * on a non-terminal means zero or more repetitions of the non-terminal, and a ? means that the non-terminal is optional. Non-linebreaking whitespace is implicitly allowed between grammar elements.
You don’t have to parse whoosh scripts, since a complete parser is provided with the starting code, but you will need to understand the syntax of whoosh scripts.
A whoosh script can contain blank lines or lines that start immediately with #, and those lines are ignored. Any other line must have the form of a ‹group›:
‹group› ::= ‹commands›
         |  repeat ‹n› ‹commands›
That is, a ‹group› is a ‹commands› optionally prefixed with repeat ‹n›.
The ‹commands› for a ‹group› is a single command or multiple commands to run in “and” mode or “or” mode. Multiple commands in “and” mode are grouped with &&, and multiple commands in “or” mode are grouped with ||:
‹commands› ::= ‹command›
            |  ‹and-commands›
            |  ‹or-commands›
‹and-commands› ::= ‹command›
                |  ‹command› && ‹and-commands›
‹or-commands› ::= ‹command›
               |  ‹command› || ‹or-commands›
A single ‹command› could also parse as ‹or-commands› or ‹and-commands›. It turns out that all three interpretations behave the same way, while the parser reports a single ‹command› as a special case.
A ‹command› can be a ‹simple-command›, which is much like a ‹command› in any shell language: the path of an executable file (as an absolute path or relative to the current directory) followed by arguments to the executable:
‹simple-command› ::= ‹executable› ‹argument›*
More generally, a ‹command› can start with a variable followed by => to supply the command’s input, it can include => followed by a variable to receive the command’s output, and it can end in @ ‹variable› to receive the command’s process ID:
‹command› ::= ‹in-variable›? ‹simple-command› ‹out-variable›? ‹at-variable›?
‹in-variable› ::= ‹variable› =>
‹out-variable› ::= => ‹variable›
‹at-variable› ::= @ ‹variable›
An ‹executable› or ‹argument› can be a ‹literal›, such as /bin/ls or -l, where " acts as an escape to allow arbitrary ASCII characters (other than " itself) until a closing ". Instead of a ‹literal›, an ‹argument› can be a ‹variable›, which always starts with $.
‹executable› ::= ‹literal›
‹argument› ::= ‹literal›
            |  ‹variable›
‹literal› ::= sequence of characters a-z, A-Z, 0-9, ., :, _, -, =, and/or / and/or other characters between matching "s
‹variable› ::= $ followed by a sequence of characters a-z, A-Z, and/or 0-9
whoosh Script Semantics
Each ‹command› in a whoosh program starts a process in the usual way. When a ‹group› contains multiple ‹command›s, the corresponding processes are all started at once whether in “and” mode or “or” mode.
Each ‹group› in a whoosh program runs to completion before the next group is started. The definition of “completion” depends on the ‹group› form:
A ‹command› completes when the single process for the ‹command› terminates, either by return/exit or by a signal.
An ‹and-commands› completes when all of the ‹command›s complete.
An ‹or-commands› completes when any one of the ‹command›s completes (in the same sense as a ‹command› by itself). As soon as one command completes, processes for not-yet-completed commands are terminated using SIGTERM followed by SIGCONT. For the purposes of this assignment, assume that SIGTERM followed by SIGCONT will always terminate a process.
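As a rough illustration of “or” mode completion, the sketch below starts several children at once, waits for whichever finishes first, and then ends the rest with SIGTERM followed by SIGCONT. It is a demonstration under assumptions, not the intended whoosh design: terminate_process, race_children, and MAX_CHILDREN are hypothetical names, plain POSIX calls replace the "csapp.c" wrappers, and the children block in pause only as a stand-in for long-running commands.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16

/* SIGTERM followed by SIGCONT, as described above. SIGCONT matters
   for a stopped process, which cannot act on SIGTERM until it is
   continued. */
void terminate_process(pid_t pid) {
  kill(pid, SIGTERM);
  kill(pid, SIGCONT);
}

/* Demonstration only (hypothetical helper): start n children at once.
   Child number fast_index exits right away; the others block until
   signaled. Returns the index of the child that completed first, or
   -1 on error, and stores in *terminated how many other children
   ended due to SIGTERM. */
int race_children(int n, int fast_index, int *terminated) {
  pid_t pids[MAX_CHILDREN], done;
  int i, status, started = 0, winner = -1;

  assert(n > 0 && n <= MAX_CHILDREN);
  assert(fast_index >= 0 && fast_index < n);
  *terminated = 0;

  for (i = 0; i < n; i++) {
    pids[i] = fork();
    if (pids[i] < 0)
      break;
    if (pids[i] == 0) {
      signal(SIGTERM, SIG_DFL);
      if (i == fast_index)
        _exit(0);
      for (;;)
        pause(); /* child-side stand-in for a slow command */
    }
    started++;
  }

  /* "Or" mode: wait for whichever child completes first. */
  if (started == n) {
    done = waitpid(-1, &status, 0);
    for (i = 0; i < n; i++)
      if (pids[i] == done)
        winner = i;
  }

  /* Terminate and reap every child that has not completed. */
  for (i = 0; i < started; i++) {
    if (i == winner)
      continue;
    terminate_process(pids[i]);
    if (waitpid(pids[i], &status, 0) == pids[i]
        && WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
      (*terminated)++;
  }
  return winner;
}
```

The blocking waitpid call here is only for brevity; it does not account for Ctl-C, which the full assignment also has to handle.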
If whoosh receives SIGINT, such as when Ctl-C is pressed, then it immediately terminates all processes for the current ‹group› using SIGTERM plus SIGCONT and moves on to the next ‹group› (if any). Meanwhile, if a command process sends a signal to all processes in its group, the signal should not affect whoosh.
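One common way to react to SIGINT is a handler that only records that the signal arrived, leaving the real work to the main loop. The sketch below assumes that approach; got_sigint, sigint_handler, and install_sigint_handler are hypothetical names, and sigaction is used directly rather than the Signal wrapper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler. volatile sig_atomic_t is the type that can be
   written safely from a signal handler. */
volatile sig_atomic_t got_sigint = 0;

/* The handler does as little as possible: it only records the event. */
void sigint_handler(int sig) {
  (void)sig;
  got_sigint = 1;
}

/* Install the handler for SIGINT. Returns 0 on success, -1 on error. */
int install_sigint_handler(void) {
  struct sigaction sa;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = sigint_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART; /* restart interrupted system calls */
  return sigaction(SIGINT, &sa, NULL);
}
```

After the handler runs, code outside the handler can check got_sigint and then terminate the current group's processes.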
A ‹group› that starts repeat ‹n› is the same as ‹n› lines that contain the ‹group› without the repeat ‹n› prefix.
If a ‹command› starts with ‹variable› =>, then the string value of ‹variable› plus a newline character is copied to the command’s standard input. The value of ‹variable› will always be at most 4096 bytes. Otherwise, the command uses the same standard input stream as the whoosh process.
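The input rule above can be sketched with a pipe whose read end becomes the child's standard input. This is an illustration under assumptions: run_with_input is a hypothetical function, plain POSIX calls replace the "csapp.c" wrappers, and the value is passed as a C string rather than through the starting code's variable functions.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: run argv with value plus a newline as its
   standard input. Returns the waitpid() status, or -1 on error. */
int run_with_input(const char *value, char *const argv[]) {
  int fds[2], status;
  size_t len;
  pid_t pid;

  assert(value != NULL && argv != NULL && argv[0] != NULL);
  len = strlen(value);

  if (pipe(fds) < 0)
    return -1;

  /* A value of at most 4096 bytes fits in a fresh pipe's buffer on
     Linux, so these writes do not block even though nothing is
     reading yet. */
  if (write(fds[1], value, len) != (ssize_t)len
      || write(fds[1], "\n", 1) != 1) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {
    dup2(fds[0], STDIN_FILENO); /* the pipe becomes standard input */
    close(fds[0]);
    close(fds[1]);
    execv(argv[0], argv);
    _exit(127); /* reached only if execv fails */
  }

  /* Close both ends in the parent; otherwise the child never sees
     end-of-file on its input. */
  close(fds[0]);
  close(fds[1]);
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return status;
}
```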
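If a ‹command› has => ‹variable› after the ‹simple-command› part, then the command’s standard output is captured and copied into the ‹variable› as a string, provided that the command exits with status 0. As the examples above show, if the command exits with a non-zero status, then ‹variable› holds that exit status instead, and if the command is terminated by a signal, then ‹variable› holds the negated signal number.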
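Capturing a command's standard output can be sketched with a pipe whose write end replaces the child's standard output. The code below is an illustration under assumptions: capture_output is a hypothetical function, plain POSIX calls replace the "csapp.c" wrappers, and the output lands in a caller-supplied buffer rather than in a whoosh variable.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: run argv, collect up to size-1 bytes of its
   standard output into buf (NUL-terminated), and store the waitpid()
   status in *status. Returns the number of bytes captured, or -1 on
   error. */
ssize_t capture_output(char *const argv[], char *buf, size_t size,
                       int *status) {
  int fds[2];
  size_t total = 0;
  ssize_t n;
  pid_t pid;

  assert(size > 0);
  if (pipe(fds) < 0)
    return -1;

  pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {
    dup2(fds[1], STDOUT_FILENO); /* the pipe becomes standard output */
    close(fds[0]);
    close(fds[1]);
    execv(argv[0], argv);
    _exit(127); /* reached only if execv fails */
  }

  /* Without this close, read() below would never see end-of-file,
     because the parent itself would still hold a write end. */
  close(fds[1]);

  while (total < size - 1
         && (n = read(fds[0], buf + total, size - 1 - total)) > 0)
    total += (size_t)n;
  buf[total] = '\0';
  close(fds[0]);

  if (waitpid(pid, status, 0) < 0)
    return -1;
  return (ssize_t)total;
}
```

The caller can then inspect *status to decide whether the captured text or a status number belongs in the variable.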
If a ‹command› ends with @ ‹variable›, then ‹variable› will be set to the command’s process ID. If the ‹command› is part of an “and” or “or” group, then ‹variable› is set before the variable’s value is used (as input or as an argument) for any later ‹command› in the same group.
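Since a process ID ends up being used like any other variable value, for example as an argument to /bin/kill, it has to become text at some point. A small sketch of that conversion, where format_pid is a hypothetical helper and not a function from the starting code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: write pid as decimal text into buf.
   Returns 0 on success, or -1 if buf is too small. */
int format_pid(pid_t pid, char *buf, size_t size) {
  int n;

  assert(buf != NULL);
  /* pid_t is a signed integer type; casting to long gives a type
     that printf-style formatting can handle portably. */
  n = snprintf(buf, size, "%ld", (long)pid);
  return (n > 0 && (size_t)n < size) ? 0 : -1;
}
```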
Every ‹variable› used by a script is initialized to 0 when the script starts.
Implementation
The shlab-handout.zip archive provides an initial whoosh implementation that works for the simplest example above. Your job is to change "whoosh.c" to implement the rest of the whoosh functionality. Within "whoosh.c", you can add functions, change function signatures, or make whatever other changes you need to implement new functionality.
You will hand in a single file, "whoosh.c", which must use only ANSI standard C syntax, standard C libraries, Linux system libraries, and the "csapp.c" wrapper functions.
You will need to read "ast.h" to know how the whoosh parser represents programs, but you will not need to modify or understand "parse.c". Feel free to use the fail function provided by "fail.c".
Note that the script_command structure in "ast.h" includes an extra_data field. You can use that pointer field to store any extra information with a command that you find useful.
Examples and Tests
The "scripts" directory of the unpacked archive includes subdirectories: "0-single", "1-variable", "2-and", "3-or", and "4-ctl-c". The scripts in those directories correspond to five different levels of completion for this assignment: basic support, the addition of variable support, the addition of support for “and” groups, the addition of support for “or” groups, and the addition of Ctl-C support.
For example, after unpacking the archive, you can use
$ make
$ ./whoosh scripts/0-single/0-ls-l.whoosh
to try the first example, which is the only one that will work initially.
The example scripts include (to varying degrees of precision) the expected output of each example. The expected-output information is in a format recognized by the "test.rkt" script. You can run the initial whoosh build on "0-ls-l.whoosh" as a test with
$ racket test.rkt scripts/0-single/0-ls-l.whoosh
If you supply a directory to "test.rkt", then all files in the directory (recursing into subdirectories) that end with ".whoosh" are run as tests:
$ racket test.rkt scripts/0-single
... runs all 0-single tests ...
$ racket test.rkt scripts
... runs all provided tests ...
The "test.rkt" script also accepts an optional --program option to specify a whoosh implementation other than "./whoosh".
Naturally, grading may test your implementation on more or different scripts.
Evaluation
Grades will be assigned based on a level of completion, where each level requires success at the lower levels:
20 points: Runs any number of single-‹command› groups where each ‹command› is either a ‹simple-command› or a repeat-prefixed ‹simple-command›.
This level of completion corresponds to the "scripts/0-single" tests.
50 points: Runs any number of single-‹command› groups, including ‹command›s that use @ to capture process IDs and use => to send a process input or capture its output.
This level of completion corresponds to the "scripts/1-variable" tests.
70 points: Runs any number of ‹commands› groups, including ‹and-commands›, but not necessarily including ‹or-commands› and possibly without Ctl-C handling.
This level of completion corresponds to the "scripts/2-and" tests.
90 points: Runs any number of ‹commands› groups, including ‹and-commands› and ‹or-commands› groups, but possibly without Ctl-C handling.
This level of completion corresponds to the "scripts/3-or" tests.
100 points: Implements the full whoosh specification, including Ctl-C handling.
This level of completion corresponds to the "scripts/4-ctl-c" tests.
Although tests are provided in "scripts", grading may use additional tests of similar complexity at each level.
Tips
Start by making a sequence of individual commands work, which involves adding Fork.
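As a starting point, running one command and waiting for it might look roughly like the sketch below. It uses plain fork, execv, and waitpid rather than the "csapp.c" wrappers, and run_and_wait is a hypothetical name, not a function from the starting code.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: run the program argv[0] with arguments argv
   in a child process and wait for it to terminate. Returns the
   waitpid() status, or -1 if fork or waitpid fails. */
int run_and_wait(char *const argv[]) {
  int status;
  pid_t pid;

  assert(argv != NULL && argv[0] != NULL);
  pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    execv(argv[0], argv);
    _exit(127); /* reached only if execv fails */
  }
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return status;
}
```

Because the parent survives each command, calling such a function once per script line runs the lines in sequence.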
Next, add support for @ ‹variable›. The function set_var in the provided code will be useful for recording a process ID.
Next, add support for => ‹variable› to capture output. You’ll need to use the Pipe and Dup2 functions. The function read_to_var in the provided code will be useful. The constraint on output to 4096 bytes or less means that it will all fit in a pipe, so the pipe doesn’t have to be read until the command process terminates. Don’t forget to close pipes appropriately; otherwise, reading can get stuck waiting for the write end of a pipe to be closed.
Next, add support for ‹variable› => to send input. You’ll use Pipe and Dup2 again, and the function write_var_to in the provided code will be useful. The constraint that variables hold at most 4096 bytes means that you can write all of a variable’s content to a fresh pipe without blocking. Don’t forget to close pipes appropriately; otherwise, a process can get stuck waiting for more input.
Next, add support for ‹and-commands›.
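The key difference from running commands one at a time is that every process in the group is started before any of them is waited for. A minimal sketch of that shape, assuming plain POSIX calls and hypothetical names (run_all, MAX_COMMANDS):

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_COMMANDS 16

/* Hypothetical sketch of "and" mode: start all n commands, then wait
   until every one has terminated. commands[i] is the argv array for
   command i, and statuses[i] receives its waitpid() status.
   Returns 0 on success, -1 on error. */
int run_all(char *const *commands[], int n, int statuses[]) {
  pid_t pids[MAX_COMMANDS];
  int i, started = 0, result = 0;

  assert(n > 0 && n <= MAX_COMMANDS);

  /* First loop: start everything without waiting. */
  for (i = 0; i < n; i++) {
    pids[i] = fork();
    if (pids[i] < 0) {
      result = -1;
      break;
    }
    if (pids[i] == 0) {
      execv(commands[i][0], commands[i]);
      _exit(127); /* reached only if execv fails */
    }
    started++;
  }

  /* Second loop: the group completes when all processes have. */
  for (i = 0; i < started; i++)
    if (waitpid(pids[i], &statuses[i], 0) < 0)
      result = -1;
  return result;
}
```

Waiting in a fixed order is fine here because the group is not complete until the slowest process finishes anyway.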
Then, add support for ‹or-commands›.
Finally, implement Ctl-C handling. You’ll probably need functions like Signal, Sigprocmask, and sigsuspend. Note that Setpgid will prevent a Ctl-C that is intended for whoosh from being sent to child processes of whoosh, since a terminal will send Ctl-C to a process group. Remember that relevant signals will need to be blocked while a process is being created.
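The pieces named in this tip can fit together roughly as follows: block signals around process creation, put the child in its own process group, and wait with sigsuspend instead of sleeping or polling. This is a sketch under assumptions, not the required structure: run_in_own_group, child_done, and sigchld_handler are hypothetical names, and the lowercase POSIX functions stand in for the "csapp.c" wrappers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

volatile sig_atomic_t child_done = 0;

void sigchld_handler(int sig) {
  (void)sig;
  child_done = 1;
}

/* Hypothetical sketch: run argv in its own process group and wait
   for it with sigsuspend. Returns the waitpid() status or -1. */
int run_in_own_group(char *const argv[]) {
  struct sigaction sa, old_sa;
  sigset_t block, old, wait_mask;
  int status;
  pid_t pid;

  assert(argv != NULL && argv[0] != NULL);
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = sigchld_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
  if (sigaction(SIGCHLD, &sa, &old_sa) < 0)
    return -1;

  /* Block the relevant signals while the process is being created,
     so that SIGCHLD cannot arrive before we are ready to wait. */
  sigemptyset(&block);
  sigaddset(&block, SIGCHLD);
  sigaddset(&block, SIGINT);
  sigprocmask(SIG_BLOCK, &block, &old);
  wait_mask = old;
  sigdelset(&wait_mask, SIGCHLD);

  child_done = 0;
  pid = fork();
  if (pid == 0) {
    setpgid(0, 0); /* own group: terminal Ctl-C no longer reaches it */
    sigprocmask(SIG_SETMASK, &old, NULL);
    execv(argv[0], argv);
    _exit(127); /* reached only if execv fails */
  }

  /* sigsuspend atomically installs wait_mask and waits for a signal,
     so there is no window in which SIGCHLD could be missed. */
  while (pid > 0 && !child_done)
    sigsuspend(&wait_mask);

  sigprocmask(SIG_SETMASK, &old, NULL);
  sigaction(SIGCHLD, &old_sa, NULL);
  if (pid < 0 || waitpid(pid, &status, 0) < 0)
    return -1;
  return status;
}
```

In this sketch SIGINT is unblocked again while waiting and keeps whatever disposition the caller set; a full implementation would pair it with a SIGINT handler and check for that event in the same loop.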
If your implementation includes a call to sleep, Sleep, pause, or Pause, then you’re doing it wrong.