HW 5

Files

Introduction to Programming -- CS 140
Spring 2008

This assignment is due at midnight on Tuesday, April 8

 


In this assignment you will work with files.

Part one.  Debugging. There are three debugging exercises this week. As usual, you don't need to hand these in; just figure out what is wrong and how to fix them. You might want to download the text files jabberwocky.txt and prufrock.txt, which are sample text files to use when debugging these programs.

File debug1.py contains a program that asks the user for a file name, and then tries to print this file to the screen. This sounds easy and the program is quite short. Sadly, it doesn't work. What this program actually does is odd -- it prints the file name down the left edge of the Shell, one character per line, and then it stops. How can this be fixed?

File debug2.py has a program similar to that for debug1, only this one works. If you give it the name of a file it will print it. If you give it a string that is not the name of a file it says "IOError: no such file" and then crashes. I hate it when programs crash. What can be done to the program so that it says something like "File not found" (or "No, I refuse") when you give it a bad file name?

File debug3.py tries to do something useful. It opens up the file "jabberwocky.txt" and writes the title and name of the author at the top of the file. This works, but unfortunately it deletes the poem in doing this. Everyone's a critic, and I don't much like the poem myself, but still it ought to be possible to add the title and author without destroying the poem.

a) What is wrong?
b) How would you go about writing this program correctly?

 

Part two.  For the programming portion of this assignment you will complete the program concordance.py. This builds a concordance -- an index to all of the words in a piece of text. In this program you read a text file and build a dictionary of words. Each key of the dictionary is a word, and its value is a list of the line numbers of the file on which that word appears. For example, the dictionary entry for key "bob" might be the list [2, 6, 6, 9], which means that the word "bob" appears on line 2, twice on line 6, and once more on line 9.
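To make the shape of this dictionary concrete, here is a small hand-built illustration. Only the "bob" entry comes from the example above; the "alice" entry is made up, and the print format here is not the one used by the provided PrintFile() function.

```python
# A hand-built concordance, just to show the data structure.
# Only the "bob" entry is from the assignment text; "alice" is invented.
concordance = {
    "bob": [2, 6, 6, 9],
    "alice": [1, 9],
}

# Each key is a word; each value is the list of line numbers where it occurs.
# A repeated number (6 above) means the word occurs more than once on that line.
for word in sorted(concordance):
    line_numbers = concordance[word]
    print(word, len(line_numbers), line_numbers)
```

The length of each list is the total number of times the word appears in the text.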

Printing the concordance is easy. To make sure that everyone has the same output (to make grading easier) I have given you a PrintFile() function that handles this. It is your job to write the functions BuildConcordance() and AddWordToConcordance() that build up the dictionary so that it can be printed.

First consider the function AddWordToConcordance(word, number, C). The job of this function is to add to the dictionary C the fact that the word was found on the given line number in the file. What you need to do depends on whether this word has been seen before. If it is already one of the keys of C then you want to append its line number to the list C[word]. If it isn't one of the keys you need to add it to the dictionary by saying C[word]=[number].
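The append-or-create rule just described might be sketched as follows. This is one possible reading of the description, not the official solution; the function name and argument order are taken from the text above, and everything else is an assumption.

```python
def AddWordToConcordance(word, number, C):
    """Record in dictionary C that word was found on line `number`."""
    if word in C:
        # Word seen before: extend its existing list of line numbers.
        C[word].append(number)
    else:
        # First sighting: start a new one-element list.
        C[word] = [number]


# Rebuild the "bob" example from the assignment text.
C = {}
for number in [2, 6, 6, 9]:
    AddWordToConcordance("bob", number, C)
print(C)
```

Because C is a dictionary, the function changes it in place and does not need to return anything.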

The BuildConcordance(C, fname) function does the bulk of the work:

  1. It opens the file and reads it one line at a time.
  2. It keeps track of line numbers; each time it reads a new non-blank line it increments the line counter.
  3. It splits the line into words using a single blank, ' ', as the delimiter between words, and then uses a for loop to look at each word in the resulting list:
        for word in line.split( ' ' ):
  4. It strips punctuation marks, such as '!', ',', '.', '\n', etc., from words. The order in which you do this matters. "Here is an example." splits into the words '"Here', 'is', 'an' and 'example."' The final word, 'example."', is the tricky one. If you first strip off the quote marks with word.strip( '"') and then the period with word.strip( '.' ) you will end up with the word you want: 'example'.
    You can either do this with a sequence of strip statements:
         word = word.strip( '\n' )
         word = word.strip( '"' )
         word = word.strip( '.' )
    etc.
    or else you can do this in a loop:
         for ch in ['\n', '"', '.', ';', ',']:   # etc.
               word = word.strip(ch)
  5. Once the word is stripped of punctuation marks, send it and its line number to the AddWordToConcordance() function.
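The stripping in step 4 can be tried out on its own before it goes into BuildConcordance(). The sketch below is only an illustration: strip_in_order() and PUNCTUATION are hypothetical names, not part of the provided concordance.py, and the list of marks is just the ones mentioned above.

```python
# Hypothetical helper and list of marks, for experimenting with step 4 only.
PUNCTUATION = ['\n', '"', '.', ';', ',', '!']

def strip_in_order(word, marks):
    """Strip each mark from both ends of word, one mark at a time."""
    for ch in marks:
        word = word.strip(ch)
    return word

# Split a line on single blanks, as in step 3; the last word keeps its newline.
words = '"Here is an example."\n'.split(' ')
print(words)

cleaned = [strip_in_order(w, PUNCTUATION) for w in words]
print(cleaned)

# Stripping the period before the quote leaves the period behind,
# because strip() only removes characters from the ends of the string.
print(strip_in_order('example."', ['.', '"']))

# A single strip() call with several characters removes any mix of them
# from both ends, so the order no longer matters.
print('example."'.strip('".'))
```

The third print shows why the order matters: the period is hidden behind the quote mark when strip( '.' ) runs, so it survives.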

Once you have this done try running it on the demo files prufrock.txt and jabberwocky.txt. The first of these holds T.S. Eliot's poem "The Love Song of J. Alfred Prufrock" and the second has Lewis Carroll's "Jabberwocky." Compare the output you get with the actual files. You don't need to check every word ("The Love Song of J. Alfred Prufrock" is about 3 pages of text) but you should spot-check the results. In particular, look through the output to ensure that you are correctly stripping off all of the punctuation marks. Check that your line numbers are correct (call the first line number 1 like most language analysts do, even though most computer scientists would call it 0). Make sure that you are handling blank lines correctly. For example, the word "slithy" first appears on line number 1 and "Jabberwock" first appears on line number 5 of Jabberwocky.
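One way to check your line numbering by hand is to number the non-blank lines of a tiny made-up text. The sketch below is an illustration only: number_nonblank_lines() is a hypothetical helper, io.StringIO stands in for a real file, the sample text is invented, and treating a line that holds only spaces as blank is an assumption.

```python
import io

def number_nonblank_lines(f):
    """Return (line number, text) pairs, counting only non-blank lines from 1."""
    numbered = []
    number = 0
    for line in f:
        if line.strip() == '':
            # Blank line (assumed to include lines with only spaces): not counted.
            continue
        number = number + 1
        numbered.append((number, line.strip('\n')))
    return numbered

# io.StringIO behaves like an open text file, so no file on disk is needed.
sample = io.StringIO("first line\n\nsecond line\n   \nthird line\n")
for number, text in number_nonblank_lines(sample):
    print(number, text)
```

The blank lines are skipped without using up a number, so "third line" is line 3 even though it is the fifth line of the text.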