CSCI 241 - Homework 5:
Creating our first Unix tool

Due by 11:59.59pm Wednesday, March 30th 2016 (extended from Friday, March 18th 2016)

Note: You may work with a partner on this project. Let me know who you are working with (along with a team name and desired repository name), and I'll set up a shared private GitHub repository for this assignment. As there are a number of components to this assignment, I'd encourage you to get started before the last minute.

Also go through the interactive GitHub tutorial if you have not done so.

Introduction

In this class, you'll be re-creating a few common Unix tools. As we have discussed in class, the general Unix tool philosophy is that each program does one thing (hopefully well) and that programs can be composed together to perform more powerful operations.

The first tool you will be creating will allow you to reformat text. This will give you some experience working with C strings and command line arguments.

I'm interested in seeing how much time students estimate an assignment will take versus how much time they actually spend on it. Before you start, read through this assignment and create a README containing your estimated time to complete it. (Feel free to list time for individual components as well.) When you finish, add the actual amount of time you spent.


format

The program you'll be creating is called format. Its job is to read in input and "neaten" it up. It reads in paragraphs of words and rearranges them so that they fit nicely onto lines of a specified width, inserting line breaks as needed. A paragraph is separated from other paragraphs by one or more empty lines (which might contain whitespace).
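One possible way to recognize the paragraph separators described above, assuming you read the input one line at a time (for example with fgets), is to treat any line containing only whitespace as blank. is_blank_line is a hypothetical helper name used for illustration; the assignment does not require it.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true if the line contains only whitespace (spaces, tabs, the
 * trailing newline, ...). Such a line separates paragraphs. */
bool is_blank_line(const char *line)
{
    for (const char *p = line; *p != '\0'; p++) {
        /* Cast to unsigned char: isspace() is undefined for negative chars. */
        if (!isspace((unsigned char)*p)) {
            return false;
        }
    }
    return true;
}
```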

This program is based on the Unix fmt program.

Example

INPUT:
Atop a large, ice-covered plateau,
 a struggle
 for survival is occurring. Two groups of mechanical 
                            creatures -- Chompers and 
Lobbers -- are battling to control this square,
                 slippery field for reasons that are beyond human 
comprehension.

However, the Lobbers believe that you,
 The Programmer
 knows what you are doing
and have decided that you will instruct them in their activities
    during
        this
            fateful
                day.
OUTPUT:
Atop a large, ice-covered plateau, a struggle for survival is occurring.
Two groups of mechanical creatures -- Chompers and Lobbers -- are
battling to control this square, slippery field for reasons that are
beyond human comprehension.

However, the Lobbers believe that you, The Programmer knows what you are
doing and have decided that you will instruct them in their activities
during this fateful day.

Command line arguments

format supports 4 command line arguments which change the behavior slightly.

Change line width

The -w flag allows you to change the width of the output lines. The default width is 72 characters per line.

Using the same input as above, we can run our program as ./format -w 40 and get the following:

Atop a large, ice-covered plateau, a
struggle for survival is occurring. Two
groups of mechanical creatures --
Chompers and Lobbers -- are battling to
control this square, slippery field for
reasons that are beyond human
comprehension.

However, the Lobbers believe that you,
The Programmer knows what you are doing
and have decided that you will instruct
them in their activities during this
fateful day.
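The line filling shown in these examples can be done greedily: keep adding words to the current line while the words, plus one space between each pair, still fit within the width. This is a sketch that assumes a paragraph's words are already stored in an array of strings; words_that_fit and print_paragraph are hypothetical names. As in the examples, a line may be exactly the width long. The handout does not say what to do with a single word longer than the width, so this sketch simply gives such a word a line of its own.

```c
#include <stdio.h>
#include <string.h>

/* Returns how many words, starting at words[0], fit on one line of at most
 * `width` characters when separated by single spaces. A word longer than
 * the width still gets a line to itself, so the result is at least 1
 * whenever nwords > 0. */
int words_that_fit(char *words[], int nwords, int width)
{
    if (nwords == 0) {
        return 0;
    }
    int count = 1;
    int len = (int)strlen(words[0]);
    while (count < nwords) {
        int next = (int)strlen(words[count]);
        if (len + 1 + next > width) {   /* +1 for the separating space */
            break;
        }
        len += 1 + next;
        count++;
    }
    return count;
}

/* Prints one paragraph left-aligned, filling each line greedily. */
void print_paragraph(FILE *out, char *words[], int nwords, int width)
{
    int i = 0;
    while (i < nwords) {
        int n = words_that_fit(words + i, nwords - i, width);
        for (int j = 0; j < n; j++) {
            fprintf(out, "%s%s", j > 0 ? " " : "", words[i + j]);
        }
        fputc('\n', out);
        i += n;
    }
}
```

With a width of 40, the first six words of the example input fill the first line ("Atop a large, ice-covered plateau, a"), matching the output above.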

Right alignment

The -r flag allows you to specify that you want all the text aligned on the right side, not the left. Using the same input as above, we can run our program as ./format -r and get the following:

Atop a large, ice-covered plateau, a struggle for survival is occurring.
       Two groups of mechanical creatures -- Chompers and Lobbers -- are
    battling to control this square, slippery field for reasons that are
                                             beyond human comprehension.

However, the Lobbers believe that you, The Programmer knows what you are
  doing and have decided that you will instruct them in their activities
                                                during this fateful day.
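Right alignment only changes how an already assembled line is printed: pad it on the left with (width minus its length) spaces. A minimal sketch, assuming you build each line into a buffer first (right_align is a hypothetical helper), can lean on the fact that printf-family conversions such as %*s right-justify a string within a field width taken from an int argument.

```c
#include <stdio.h>
#include <string.h>

/* Writes `line` into dest, padded on the left with spaces so that it ends
 * in column `width`. "%*s" reads the field width from the int argument and
 * right-justifies the string in it; a line that is already `width`
 * characters or longer is copied without padding. */
void right_align(char *dest, size_t size, const char *line, int width)
{
    snprintf(dest, size, "%*s", width, line);
}
```

When printing directly instead of into a buffer, printf("%*s\n", width, line) does the same thing.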

Fully justified

The -j flag allows you to fully justify the text. That is, each line extends from the left side all the way to the maximum width of the line. To simplify things, do this even for the last line of a paragraph.

Using the same input as above, we can run our program as ./format -j and get the following:

Atop a large, ice-covered plateau, a struggle for survival is occurring.
Two groups  of  mechanical creatures  --  Chompers and  Lobbers  --  are
battling to control  this square,  slippery field for  reasons that  are
beyond                       human                        comprehension.

However, the Lobbers believe that you, The Programmer knows what you are
doing and have decided that you  will instruct them in their  activities
during                 this                 fateful                 day.

Professor Kevin Woods came up with a nifty way to calculate the number of spaces that go between words in the fully justified case. If you know the number of words on a line, then you can figure out the number of gaps that need to be filled: one fewer than the number of words.

Similarly, if you know the total length of the words on that line, you can determine the total number of spaces that need to be inserted: the line width minus that length.

You can then apply integer division to calculate the total number of spaces that should have been seen by the time you finish each gap.

                                (# of gaps so far) * (total # of spaces)
    # of spaces seen so far =   ----------------------------------------
                                            (total # of gaps)

As an example, imagine we have 4 words on a line and 8 spaces to insert. There are 3 gaps to fill. At the first gap we put 2 spaces:

    1 * 8 / 3 = 2 spaces

At the second gap we put 3 spaces, because the running total needs to be:

    2 * 8 / 3 = 5 spaces

And finally, 3 more spaces to finish off the set:

    3 * 8 / 3 = 8 spaces

While you aren't required to use this formula to determine spacing, it seems to work quite well, and certainly better than just pushing the excess into one of the edge positions (which would have given 2-2-4 instead of 2-3-3).
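Here is one way Professor Woods' running-total formula could be turned into C. It is a sketch that assumes the words for the line have already been chosen (so they fit with at least one space per gap) and that the destination buffer holds at least width + 1 characters; justify_line is a hypothetical name. A one-word line has no gaps, so this sketch leaves it unpadded rather than dividing by zero.

```c
#include <string.h>

/* Builds one fully justified line in dest. After gap g (1-based) has been
 * written, exactly g * total_spaces / total_gaps spaces (integer division)
 * have been emitted, as in the formula above.
 * Assumes: the words fit on the line, and dest holds width + 1 chars. */
void justify_line(char *dest, char *words[], int nwords, int width)
{
    int letters = 0;
    for (int i = 0; i < nwords; i++) {
        letters += (int)strlen(words[i]);
    }
    int total_gaps = nwords - 1;
    int total_spaces = width - letters;
    int seen = 0;                       /* spaces written so far */
    char *p = dest;

    for (int i = 0; i < nwords; i++) {
        size_t len = strlen(words[i]);
        memcpy(p, words[i], len);
        p += len;
        if (i < total_gaps) {           /* never true for a one-word line */
            int target = (i + 1) * total_spaces / total_gaps;
            while (seen < target) {
                *p++ = ' ';
                seen++;
            }
        }
    }
    *p = '\0';
}
```

Applied to the second line of the -j example (11 words, 55 letters, width 72), the 17 spaces come out as 1, 2, 2, 1, 2, 2, 1, 2, 2, 2, which reproduces that line exactly.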

Skip multiple blank lines

Finally, you should also support the -s flag, which compresses multiple consecutive blank lines into just one. (Normally, each extra blank line would produce an empty paragraph in the output.)

INPUT:
I like cheese.







How about you?
OUTPUT:
I like cheese.

How about you?
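The -s behavior amounts to remembering whether the previous line was blank and dropping any blank line that directly follows another. The sketch below assumes the input has already been read into an array of lines, which your program may not do; squeeze_blank_lines and line_is_blank are hypothetical names, and the same one-flag idea works equally well inside a line-by-line reading loop.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if the line holds only whitespace. */
static bool line_is_blank(const char *s)
{
    for (; *s != '\0'; s++) {
        if (!isspace((unsigned char)*s)) {
            return false;
        }
    }
    return true;
}

/* Copies the pointers lines[0..n) into out, skipping every blank line that
 * directly follows another blank line. Returns the number of lines kept. */
int squeeze_blank_lines(char *lines[], int n, char *out[])
{
    int kept = 0;
    bool prev_blank = false;
    for (int i = 0; i < n; i++) {
        bool is_blank = line_is_blank(lines[i]);
        if (is_blank && prev_blank) {
            continue;               /* part of a run of blanks: drop it */
        }
        out[kept++] = lines[i];
        prev_blank = is_blank;
    }
    return kept;
}
```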

Multiple options

These options should be cumulative -- you can specify width, alignment, and skipping of blank lines together. The -r and -j flags do not make sense together; if both are given, just use whichever was specified last.

Programming notes

Some guidelines you should follow when working on your solution:

You might also want to think about how to break this down into the basic functionality and how each flag modifies that behavior. There are many ways to correctly implement these specifications: read in words one at a time, assembling and processing a line as you go, OR read in a batch of words first and then assemble the lines from them.

You should follow a process of stepwise refinement. Add in one new component and test to be sure it works before moving on to the next. Trying to do everything all at once will likely just lead to confusion. Also, sketching out your design on paper beforehand is invaluable.

For example, if while sketching things out you decide you'd like to implement some of the functionality of Java's Scanner.next() to handle the basic case of reading words, you should implement that function first and then test that it behaves as it should on a variety of input. Once it is working, you can move on to the next phase of your design.

Note: There is no need to dynamically allocate space in this assignment. You should be able to use fixed size buffers for all of the operations requested.
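As an illustration of the Scanner.next()-style approach, here is a sketch of a word reader that uses a caller-supplied fixed-size buffer. next_word is a hypothetical function, not part of any library. It also reports how many newlines it skipped before the word: two or more means a blank line (possibly containing whitespace) came first, which is one way to detect paragraph breaks while reading word by word.

```c
#include <ctype.h>
#include <stdio.h>

/* Reads the next whitespace-delimited word from `in` into buf, storing at
 * most size - 1 characters (longer words are truncated and the rest is
 * skipped). Requires size >= 1. *newlines receives the number of '\n'
 * characters skipped before the word; 2 or more means a blank line came
 * first. Returns the stored word length, or 0 at end of input. */
int next_word(FILE *in, char *buf, size_t size, int *newlines)
{
    int c;
    size_t len = 0;

    *newlines = 0;
    /* Skip leading whitespace, counting line breaks. */
    while ((c = getc(in)) != EOF && isspace(c)) {
        if (c == '\n') {
            (*newlines)++;
        }
    }
    /* Collect characters until whitespace or end of input. */
    while (c != EOF && !isspace(c)) {
        if (len + 1 < size) {
            buf[len++] = (char)c;
        }
        c = getc(in);
    }
    if (c != EOF) {
        ungetc(c, in);   /* leave the delimiter so the next call counts it */
    }
    buf[len] = '\0';
    return (int)len;
}
```

Truncating over-long words keeps the sketch simple; you may prefer a different policy.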


More on command line arguments

In addition to the flags listed above, I'd like you to implement -h and -? flags that print out a brief usage message and then exit the program with a non-zero value. Your program should behave the same way if an unknown flag is passed to it.

You should have your program's main function return 0 upon successful completion of the assigned task.
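One common way to handle these flags in C is the POSIX getopt function. The sketch below is one possible structure, not a required one: struct options, parse_options, and usage are hypothetical names, and the usage text is only a placeholder. Setting opterr to 0 stops getopt from printing its own error message, and because '?' is not listed in the option string, both -? and any unknown flag come back as '?'. Since -r and -j both write the same field, whichever appears last wins.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose getopt() under strict -std=c99 */
#endif
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* getopt, optarg, optind, opterr */

enum alignment { ALIGN_LEFT, ALIGN_RIGHT, ALIGN_JUSTIFY };

struct options {
    int width;              /* -w N, default 72 */
    enum alignment align;   /* -r / -j: the last one given wins */
    bool squeeze;           /* -s */
};

/* Placeholder usage message. */
void usage(FILE *out, const char *prog)
{
    fprintf(out, "usage: %s [-w width] [-r | -j] [-s] [-h]\n", prog);
}

/* Fills *opts from argv. Returns 0 on success, or -1 if -h, -?, an unknown
 * flag, a missing -w argument, or a non-positive width was given. */
int parse_options(int argc, char *argv[], struct options *opts)
{
    int c;

    opts->width = 72;
    opts->align = ALIGN_LEFT;
    opts->squeeze = false;
    opterr = 0;   /* we print our own usage message instead */

    while ((c = getopt(argc, argv, "w:rjsh")) != -1) {
        switch (c) {
        case 'w':
            opts->width = atoi(optarg);
            if (opts->width <= 0) {
                return -1;
            }
            break;
        case 'r':
            opts->align = ALIGN_RIGHT;
            break;
        case 'j':
            opts->align = ALIGN_JUSTIFY;
            break;
        case 's':
            opts->squeeze = true;
            break;
        default:   /* 'h', or '?' for -?, unknown flags, missing -w value */
            return -1;
        }
    }
    return 0;
}
```

main could then call parse_options and, on a non-zero result, print the usage message to stderr and return a non-zero status such as EXIT_FAILURE; once formatting completes successfully, it returns 0.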


Man page

You'll also be creating a man page to accompany your tool. Man pages are simply text files that have some additional annotations with formatting instructions -- somewhat similar to LaTeX if you've encountered that as well.

Traditionally, the tool nroff was used to typeset man pages. These days, many systems have groff instead, with nroff provided as a wrapper around groff. You can find more information on the available macros in the man page groff_man(7).

Important nroff macros to know

At a minimum, you will need to know the following macros (all go at the start of the line).

.\"
Designates a comment line. There is usually a block of these at the top.
.TH <name> <section> <center-footer> <left-footer> <center-header>
Title Heading. Use this after the initial block of comments at the start. It takes the following 5 arguments:
  1. name - name of the command or function
  2. section - section of the manual (1 for user commands)
  3. center-footer - date the page was last modified
  4. left-footer - use "CSCI 241" for man pages in this class
  5. center-header - this is the organization field, use "Oberlin College" for this course
.SH text
A section header. By convention these should be in all caps. Commonly used values are NAME, SYNOPSIS, DESCRIPTION, OPTIONS, DIAGNOSTICS, AUTHOR, BUGS in that order.
.SS text
A subheading. By convention, only the initial letter is capitalized.
.P
Start a new paragraph. Each line of the paragraph needs to start in column 1.
.IP text
An indented paragraph. Very useful for describing options.
.B word
Bolds the following word.
.I word
Italicizes the following word.

Sample manpage

.\" Sample man page for CSCI 241
.\" Benjamin Kuperman - Fall 2011

.TH sample_man 1 "06 October 2011" "CSCI 241" "Oberlin College"

.SH NAME
.B sample_man
\- an example of a sample man page

.SH SYNOPSIS
.B sample_man
[ -o outputfile ]
<filename>

.SH DESCRIPTION
Does everything a sample man page should do.

.SH OPTIONS
.IP "-o outputfile"
Do things to be written to an output file.

.SH AUTHOR
Benjamin Kuperman (Fall 2011)

.SH BUGS
None!

You can also see/download a longer version.

You should name your file using the standard convention <name_of_program>.<manual_section>. So for the above page, it would be sample_man.1. (Recall that section 1 is user commands.)

To view the processed man page use either of the following commands:

Here is the rendered output of the page above:

sample_man(1)                   Oberlin College                  sample_man(1)

NAME
       sample_man - an example of a sample man page

SYNOPSIS
       sample_man [ -o outputfile ] <filename>

DESCRIPTION
       Does everything a sample man page should do.

OPTIONS
       -o outputfile
              Do things to be written to an output file.

AUTHOR
       Benjamin Kuperman (Fall 2011)

BUGS
       None!

CSCI 241                        06 October 2011                  sample_man(1)

handin

README

Create a file called README that contains

  1. Your name and a description of the program
  2. A listing of the files with a short one line description of the contents
  3. Any known bugs or incomplete functions
  4. Your estimated time from the start and the actual time taken
  5. Your affirmation as to the honor code if you followed it

Now you should run make clean to get rid of your executables, and hand in your folder containing your source files, Makefile, man page, and README.

    % cd ~/cs241
    % handin -c 241 -a 5 hw5
    % lshand

Grading

Here is what I am looking for in this assignment:


Grading Breakdown

format:     [/15]
-w:         [/5]
-s:         [/5]
-r:         [/5]
-j:         [/10]
manpage:    [/5]
readme:     [/5]

TOTAL:      [/50]

Last Modified: March 28, 2016 - Roberto Hoyle