CSCI 241 - Homework 2:
Shell Scripting

Due by 11:59.59pm Friday, September 21
Due by 11:59.59pm Monday, September 24

The Github URL for this assignment is https://classroom.github.com/a/ZJVUDo60

Introduction

For this assignment you will be creating a number of shell scripts.

Part 1 - URL Testing

Write a shell script called testurl.sh that accepts a list of URLs in a separate file and tests whether each website is up. You might find it useful to check out the curl, wget, and tail commands.

rhoyle@clyde$ cat urls
http://cs.oberlin.edu/~ncare/cs241/labs/lab8.html
https://occs.cs.oberlin.edu/~rhoyle/17s-cs241/assignments/hw02.html
http://no.such.url
http://occs.cs.oberlin.edu
rhoyle@clyde$ ./testurl.sh urls
Not found: http://no.such.url

This script should also handle errors. If the user doesn't provide any URLs to the script, it should print out a usage message.
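
As a starting point, here is a minimal sketch of one way to combine the usage check with a curl-based test. The function names (url_is_up, check_args, report_down_urls) and the wording of the usage message are assumptions made for illustration, not requirements of the assignment. Some servers reject header-only requests, so treat the curl options as one possible choice.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if the server answers without an HTTP error.
# --head requests headers only; --fail turns HTTP errors (400 and up) into a
# nonzero exit status; --max-time keeps a dead host from hanging the script.
url_is_up() {
    curl --silent --head --fail --max-time 10 --output /dev/null "$1"
}

# Hypothetical helper: print a usage message to stderr when no file is given.
check_args() {
    if [ "$#" -lt 1 ]; then
        echo "Usage: testurl.sh FILE" >&2
        return 1
    fi
}

# Hypothetical helper: read one URL per line and report the ones that fail.
report_down_urls() {
    local url
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue          # skip blank lines
        url_is_up "$url" || echo "Not found: $url"
    done < "$1"
}
```

A script built on this sketch might end with check_args "$@" || exit 1 followed by report_down_urls "$1".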


Part 2 - Back it up a step

Next, I want you to create a script called backup.sh. The script should take as arguments a directory to back up into, followed by a list of one or more files to copy to the backup directory.

Your script should only copy a file if its timestamp is more recent than that of the copy already in the backup directory when the script is run. You might find it helpful to check bash's test (i.e. [ ]) syntax. Additionally, you should make your script executable using chmod. That is, the command should be runnable as follows:

$ ./backup.sh ~/.backup file1 file2 dir1
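
The timestamp check can be sketched with the -nt ("newer than") operator of test. The helper name needs_backup is hypothetical, and the sketch assumes plain files; how to treat directories such as dir1 is left open.

```shell
#!/bin/bash

# Hypothetical helper: succeed when src should be copied into backup_dir.
needs_backup() {
    local src="$1"
    local backup_dir="$2"
    local dest
    dest="$backup_dir/$(basename "$src")"
    # Copy when there is no backup copy yet, or when the source is newer
    # than the copy that already exists.
    [ ! -e "$dest" ] || [ "$src" -nt "$dest" ]
}
```

For example, needs_backup file1 ~/.backup && cp -p file1 ~/.backup/ copies only when needed; the -p option preserves the timestamp on the copy.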

Part 3 - Diskhogger

Third, I want you to create a shell script called diskhog.sh that lists the 5 largest items (files or folders) in the current directory in decreasing order of size. You should output the sizes in a human-readable format like so:

% cd ~rhoyle/pub/cs241
% ./diskhog.sh
3.9M week03
572K old
348K hw06
152K week06
112K week05

Check out the man pages for du, cut, sort, xargs, and head (or tail).
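
One possible shape for the pipeline is sketched below. It assumes a sort that understands human-readable sizes through the -h option (GNU coreutils and recent BSD versions do). The function name largest_items is hypothetical, and the * glob skips hidden files, which may or may not be what you want.

```shell
#!/bin/bash

# Hypothetical helper: list the five largest items in the current directory.
largest_items() {
    # du -s gives one total per argument, -h prints sizes like 3.9M or 572K.
    # sort -h compares those suffixed sizes; -r puts the largest first.
    du -sh -- * | sort -rh | head -n 5
}
```

If your sort lacks -h, an alternative is to sort on the numeric output of du -sk and convert the sizes afterwards.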


Part 4 - linecount

Create a shell script called linecount that by default will report the total number of lines in all of the files in the current working directory (recursively).

You'll want to take a look at wc, cd, find, and test.
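
A minimal sketch of the recursive count, assuming every regular file should be counted (binary files included). The function name count_lines and its optional directory argument are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: total number of lines in all regular files under a
# directory (default: the current directory).
count_lines() {
    # find walks the tree; -exec cat {} + concatenates every file it finds,
    # so wc -l sees one stream and prints a single total.
    # tr strips the padding that some versions of wc add.
    find "${1:-.}" -type f -exec cat {} + | wc -l | tr -d ' '
}
```

Concatenating first avoids having to add up the per-file counts that wc prints when it is given many file names.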

Part 5 - Retro-grade Scripting

I want you to write a script called gradeit.sh that will test your pyramid and rot128 submissions for lab 1.

The script should analyze student's submissions for correctness and warn if the output of the program differs from the reference implementation, which is located in ~rhoyle/pub/cs241/hw01.

You must decide what to test for. You will be graded on how thorough your test is. Explain, in comments in your script, what you are testing for and why you are running that particular test. After your script has finished, you should clean up any temporary files created by the testing process.

You'll want to take a look at wc, pushd (and popd), find, and diff.
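
The core of such a test can be sketched as follows: run the reference program and the submission on the same input, compare the two outputs with diff, and remove the temporary files afterwards. The function name compare_outputs and its three arguments are assumptions for illustration; the sketch also assumes both programs read standard input and take no arguments.

```shell
#!/bin/bash

# Hypothetical helper: warn when a submission's output differs from the
# reference program's output on the same input file.
compare_outputs() {
    local reference="$1"
    local submission="$2"
    local input="$3"
    local expected actual status
    expected=$(mktemp) || return 2
    actual=$(mktemp) || return 2
    "$reference" < "$input" > "$expected"
    "$submission" < "$input" > "$actual"
    if diff "$expected" "$actual" > /dev/null; then
        status=0
    else
        echo "WARNING: output differs for input $input" >&2
        status=1
    fi
    rm -f "$expected" "$actual"    # clean up the temporary files
    return "$status"
}
```

Calling it once per test input keeps each check and its explanatory comment together in the script.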


Part 6 - Data file analysis

I often find myself using shell tools to answer questions about a data file that I'm working on. Here is a data file from a machine learning dataset that I'd like you to download and unzip: adult.data.zip. The fields in the data set are described at http://archive.ics.uci.edu/ml/datasets/Adult.

Answer the following questions in your README file (and give the commands used to find the answer):

  1. How many entries are marked "Male" and how many are marked "Female"?
  2. The last column is the label that is applied to the entry. How many of each label type are there?
  3. Give the counts for each label used for "race" in decreasing order
  4. Give the counts for a combined "race"/"sex" attribute in decreasing order

Potentially useful commands to look at include cut, sort, and uniq. If you include the commands you used to generate your answers, it might be possible to give you partial credit. Once you have answered the questions, you should delete the adult.data and adult.data.zip files so that you don't hand them in.
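
The usual counting idiom with these commands can be sketched as below. The function name count_values is hypothetical, and the sketch assumes a comma-separated file; which field number holds which attribute is for you to work out from the data set description.

```shell
#!/bin/bash

# Hypothetical helper: count how often each value appears in the given
# field(s) of a comma-separated file, most frequent value first.
count_values() {
    local file="$1"
    local fields="$2"
    # cut extracts the field(s); sort groups equal values together so that
    # uniq -c can count them; the final sort orders by count, descending.
    cut -d, -f "$fields" "$file" | sort | uniq -c | sort -rn
}
```

Because cut accepts a comma-separated list of fields, a call such as count_values FILE 2,3 counts combinations of two adjacent columns.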

Programming Hints


Extra Credit

  1. Modify testurl.sh to output if a file is a valid HTML file according to the W3C validator at https://validator.w3.org/
  2. Modify your backup.sh script to keep a list of the five most recent backup directories and store copies as symlinks.
  3. Modify diskhog.sh to take a flag to change the number of items to display and another to limit it to files or directories.
  4. Make Diskhogger take a flag to change the number of items to display, or maybe another that limits it to files/directories.
  5. Have your linecount.sh script support an optional argument that will be used as a file glob pattern for the types of files. The user is responsible for properly quoting things on the command line. For example, to get a sum of all of the lines in your java source files you would use:

     % ./linecount '*.java'

  6. Add more testing to gradeit.sh
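
For the glob pattern option of linecount, one possible approach is to hand the pattern to find unchanged. This is a sketch under that assumption, and the function name count_matching_lines is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: total lines in files whose names match a glob pattern,
# searching the current directory recursively.
count_matching_lines() {
    # Quoting "$1" passes the pattern to find as-is, so find (not the shell)
    # does the matching in every subdirectory.
    find . -type f -name "$1" -exec cat {} + | wc -l | tr -d ' '
}
```

This is also why the user has to quote the pattern: an unquoted *.java would be expanded by the shell against the current directory before the script ever sees it.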

Turning it In

README

Create a file called README that contains

  1. Your name
  2. A description of the programs
  3. Your answers to the "Data File Analysis" questions and commands
  4. An estimate of the amount of time it took to complete each part
  5. Any known bugs or incomplete functions
  6. Any interesting design decisions you'd like to share

Now you should clean up your folder (remove test case detritus, etc.) and hand in your folder containing your scripts and README.

Grading

Here is what I am looking for in this assignment:


Last Modified: February 12, 2017 - Roberto Hoyle and Nick Care. Some material based on work by Benjamin Kuperman.