Shell implemented in C. Built as part of the Operating Systems course at the RUG.
Go to file
Djairo Hougee 7e6c1147db feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
assets feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
Makefile feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
README.md feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
cmd.c feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
cmd.h feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
main.c feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
scanner.c feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
scanner.h feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
shell feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
shell.c feat: implemented pipes and redirects + background operator 2024-03-16 23:07:19 +01:00
shell.h assignment 1 2024-02-16 15:34:28 +01:00

README.md

Bonuses added for lab 3 are at the bottom of this document, starting here.

general code structure

The main objective of this assignment was to tokenize given inputs and process the resulting tokens according to the given grammar. We decided to seperate the two steps of tokenizing and processing almost entirely, which decouples the input parser from the execution very neatly. The one drawback of this method is that each input is, more or less, processed twice - but we considered this a fair trade-off, as the input for a shell can generally be expected to not be incredulously large.

The crux of our approach lies in a slight addition to the ListNode struct: we have added a single Enum Type attribute which places each token within one of a few categories, roughly mapping to the grammar we are implementing.

The possible values of the Type enum are:

Value Definition meaning
EXECUTABLE any non-operator A command in either $CWD or $PATH.
OPTION any non-operator Any options to pass to an EXECUTABLE or BUILTIN.
COMPOSITION '&&', '||', ';', '\n' Special characters which signal the end of a chain.
BUILTIN a builtin One of a few pre-defined builtin executables.
PIPELINE '|' A pipeline, indicating the preceding and following commands ought to be piped together.
REDIRECT '<', '>' Signals stdin and/or stdout should be handled differently.
BACKGROUND '&' Tells the shell to run a given chain as non-blocking (run in the background).
FILENAME any non-operator A file (for redirection).

builtins

A builtin for our shell means any specific command which should be handled by the shell internally, as opposed to calling an external executable. So far, we have implemented:

Builtin Result
status prints the return status of the last run command.
exit exits the shell.
cd changes the current working directory.
debug prints debugging information to stdout.

operators

An operator is a special token which tells the shell to process the preceding (and following) chain differently. The standard operators are (or will be) implemented by this shell:

Operator Name Effect
"&" Background operator Tells the shell ro run a chain in the background.
"&&" Conditional and Tells the shell to execute the next chain iff the status of the previous chain equals 0.
"||" Conditional or Tells the shell to execute the next chain iff the status of the previous chain does not equal 0.
";" Seperator Tells the shell when a chain ends.
"<" Redirect in Tells the shell to take input from a non-stdin stream.
">" Redirect out Tells the shell to print output to a non-stdout stream.
"|" Pipe operator Tells the shell to chain commands together.

processing

After parsing a complete line and assigning each token a Type, the next step of the program is the actual processing. To aid in this, we have constructed a few different types - defined in cmd.h - which model various parts of a typical shell execution.

The Command struct models one executable and its' options, arranged in char * and char ** to easily pass them to an exec() system call.

typedef struct Command {
  char **arguments;
  char *command;
  int capacity;
  int numArguments;
} Command;

The CommandList struct models a list of Commands, and will be used when multiple commands are to be executed together (as is the case when piping commands, or using the background operator).

typedef struct CommandList {
  Command *commands;
  int capacity;
  int numCommands;
} CommandList;

The Chain struct models one complete chain to be executed. It conains a list of commands to execute, and extra information related to I/O - any redirects. These are set to NULL if no redirect was given.

typedef struct Chain {
  CommandList commands;
  bool runInBackground;
  char *in;
  char *out;
} Chain;

These structs all come with some library functions for ease of use, which allow us to build up commands and chains, and execute them appropriately.

main

The driver code behind the program lives (obviously) in main().

The code is mostly found in the while loop, which runs until the value of do_loop is set to false, which only happens when the program reaches the EOF or the BUILTIN command exit is given. inputLine is the line that the user enters the shell. We declare tokenList to store all of the tokens that are picked up from the input.

readInputLine() reads the entire input line. We modified this function so that it returns NULL on EOF. Before we start, inputLine is quickly checked to see whether or not its NULL. If it is not, the function getTokenList() is run to extract all the tokens from the input and place them into tokenList.

Since the tokenList will be edited, we create two copies of the list, cpy and og. Before looking through any tokens, the parseInputList() function is utilised to check the validity of tokenList. It is also checked whether or not the list is NULL. If these are both valid, we can begin executing the line, else we print an invalid syntax error message.

The execution is done within a while loop that runs until the cpy list is entirely consumed and is NULL. Within this while loop, we implemented a switch case which, at this level, should only accept the following enum types:

  • EXECUTABLE
  • COMPOSITION
  • BUILTIN

If the code encounters any other types at this level, it means that the syntax of the input is incorrect. If this happens, an error message is printed out and cpy is set to NULL in order to exit the loop.

In the end, we use free() and freeTokenList() to free the memory.

bonuses

We have enhanced the functionality of our shell by implementing a few bonuses. These are described in this section.

cd builtin

Any self-respecting shell needs a way to change the current working directory. Luckily for the end-user, we have implemented exactly that! Use the builtin cd to move around the system. We even added support for moving to the $HOME directory - accomplished by witholding any command options (just cd will get you home).

Note that this builtin has side-effects, and might therefore edit status. If changing directories fails for whatever reason (insufficient permissions, destination doesn't exist, etc), status will be set to -1.

This builtin provides interactions like these ('>' is input, added in post):

>realpath .
/home/djairoh/Nextcloud/opsys/assignments/1/src
>cd /home/djairoh/Documents
>realpath .
/home/djairoh/Documents
>cd
>realpath .
/home/djairoh

debug mode

To get an insight into how the shell works under the hood, we've added a custom builtin: with the debug flag, all executed commands will be laid out in glorious JSON format.

This builtin is a toggle, meaning a second use will disable it once more. Provided below is some example input/output with this builtin on. Lines starting with '>' are input (the '>' was manually added post-fact).

>debug
Toggled debug to 1.
>echo "hello, world!"
{
"commandList": {
    "commands": [
      {
        "arguments": [
          "echo",
          "hello, world!",
          "(null)"
        ],
        "command": "echo",
        "capacity": 4,
        "numArguments": 3
      }],
    "capacity": 4,
    "numCommands": 1
  },
"runInBackground": 0
}
hello, world!

shell prompt (coloured)

Standard in any shell is a prompt. These can range from minimalist (the sh prompt only shows the current version of the shell), to things as complicated and customizable as the Starship prompt - of which an example is provided here:

An example of a Starship prompt

While we didn't quite get around to this level of complexity, we have implemented a little prompt consisting of three components:

  • the username of the person who started the shell
  • the current working directory
  • the status/status of the last command, indicated by '>'

The end result looks like this:

Our own (less impressive) prompt

navigable arrow keys

Another standard shell feature, available in everything from the modern fish and zsh, to shells as far back as sh, is that of the arrow key. Because our shell takes input using getchar() the user is unable to navigate back in an unfinished prompt. By switching to the GNU readline library, we can make use of the readline() function - allowing us to navigate an unfinished prompt in the expected manner.

This is a bit hard to illustrate using text/images, so we recommend you build the shell yourself if you wish to play around with this a little bit.

tab completion

Another perk of the GNU readline library is it quite neatly allows for tab completion, bash-style. Use tab while entering a command and watch as the shell tries its best to complete it for you.

Note that this only works on filenames - the bash-style tab-completion of variables and executable commands is not implemented. Included below is an example; after entering 'cat' we then pressed tab to showcase the completion, which prints all (matching) files in CWD to the shell.

tab completion on filesnames

background operation

Everyone who has ever used a shell before will be familiar with the background operator, but here's a quick recap anyay: Whenever a command is suffixed with a single ampersand (&), it is run in the background and control of the shell is immediately returned to the user. The output streams of the command are still connected to the shell, (ergo, echo 7 & will still print '7' to stdout, which might mess with the prompt a little), but the command is run as non-blocking.

Because of the structured approach we took towards chain/command design, it was trivially easy for us to implement this bonus: a simple if statement (checking for the background operator) and we are good to go!

The relevant snippet of code lives in executeChain and reads as follows:

  // wait on children
  int stat;
  #if EXT_PROMPT
  if (chain.runInBackground) return;
  #endif
  for (int i=0; i < numCommands; i++) {
    waitpid(pids[i], &stat, 0);
    if (WIFEXITED(stat)) {
      *status = WEXITSTATUS(stat);
    }
  }