Personal tools
You are here: Home Classes Fall 2004 - Spring 2005 CS 151 Tries
Navigation
Log in


Forgot your password?
« May 2008 »
Su Mo Tu We Th Fr Sa
123
456789 10
11121314151617
18192021222324
25262728293031
 
Document Actions

Tries

by admin last modified 2005-05-11 18:13

Lab 9 -- Tries

In today's lab you'll implement a portion of the Set interface using the Trie data structure.  You can refer to last week's class notes for some discussion of Tries.

Writing the Trie class

Define the data members of Trie as follows:
boolean isWord;
Trie[]  children;
The no-argument constructor for a Trie should initialize an empty Trie.  The result should be a single Trie node whose isWord flag is false and whose children array is an array of null pointers.  Recall that the size of the array is determined by the number of characters in the alphabet; in our case, that will be 26.  (This is a good place for the use of a "static final" variable to define a constant, so that you don't need to hard-code the number 26 in more than one place in your program.)

You are responsible for implementing the following methods in Trie.java:
boolean contains(String string);  Return true if the trie contains the given string, false otherwise.

boolean add(String string);  Insert string in the trie, if it is not already present.  Return true if and only if the trie is modified by this operation.

boolean isEmpty();  Return true if the trie contains no strings, false otherwise.

Object[] toArray();  Return an array of strings containing all the strings in the trie, in lexicographic (alphabetical) order.  (One way to do this would be to construct a LinkedList or ArrayList of all the strings and then call the toArray method of List.)

Trie subtrie(String prefix);  Return the subtrie of this trie containing all words which begin with the given prefix.  If there are no words in the trie which begin with the prefix, return null.  (Be aware that the returned trie, taken by itself, contains the remainders of the matching strings (after the prefix), not the entire strings.)

int size();  Return a count of the number of strings contained in this trie.

int height();  Return the height of the trie.  (This is equivalent to asking for the length of the longest string in the trie.)

The contains and add methods were outlined in Monday's class notes.  The subtrie method is similar to the contains method.  The toArray, size, and height methods require a recursive tree traversal.

Some remarks on the use of characters and strings:
  1. You need to be able to break a String down into its individual characters.  You can use the charAt(int) method in String for this.  (Check the Java API for details.)
  2. Another String method you may find useful is the substring method.  It is used to extract a substring from a given string.  (Look it up in the Java API for details.)
  3. In searching a trie for a string, the individual characters of the string must be used to index into each trie node's array of children.  To accomplish this, you will need to convert each character to a numeric index.  In particular, you will need to convert 'a' to 0, 'b' to 1, 'c' to 2, ..., and 'z' to 25.  This is easy to do in Java, because characters are considered to be a numeric type compatible with int, so it is possible to perform arithmetic operations on them.  When Java performs arithmetic on characters, each character is interpreted as the number used in its Unicode representation.  Because the letters of the alphabet are assigned consecutive Unicode values, a letter can be converted to a number in the range 0..25 by simply subtracting the letter 'a' from it.  For example, 'b' - 'a' is equal to 1, 'c' - 'a' is equal to 2, 'd' - 'a' is equal to 3, etc.
  4. What will happen if the user tries to add a string containing a character outside the range 'a'..'z'?  It is likely that an "index out of bounds" exception will occur in the add method.  Your program should handle this exception somehow, either in the Trie class or in the application programs you will write which use it.

Testing the Trie class

To test your Trie class, write the following application programs.  Each of these should be written as the main program in a separate class; say App1.java, App2.java, etc.  I've set up two test data files for you:   small.txt and lexicon.txt.

1.  This program has a series of words as its command-line arguments.  It inserts each word into a trie, and then displays:

The size of the trie.
The height of the trie.
A list of the words in the trie, in alphabetical order.

2.  This program has a file name as its first command-line argument.  The file contains one word per line, as in the lexicon.txt file.  The remaining command-line arguments are words to be searched for in the trie.  

The program should create a BufferedReader to read the file.  (See the remarks on BufferedReader below.)  Each line should be read into a string which is then inserted into a trie.  The program then searches the trie for the remaining words on the command line.  It displays a list of the words which are found, one per line.

3.  This program has two command-line arguments:  a file name and a prefix string.  It starts by loading all the words from the file into a trie, as in application 2.  It then searches the trie for all words which start with the given prefix string.  It displays a message indicating the number of matching words, followed by a list of those words.  (It should display the entire matching words, not just the remainders.)

It is possible that the list of matching words could be very long (for example, if the prefix is an empty string).  So, if there are more than 20 matching strings, the program prints only the first 20, then three lines consisting of a single period, then the last matching word.

4.  (Optional)  It might be interesting to see how many nodes need to be created for a given trie.  Write an application that checks this out, as follows:
  1. Write an additional method in the Trie class called "nodeCount", which returns an int.
  2. Write an application which reads a list of words from a file and stores them in a trie, and then prints the number of words in the file and the number of nodes in the trie.
  3. Try it out on the lexicon.txt file.

the BufferedReader class

BufferedReader is a class in java.io which is designed to read an input stream, one line at a time.  A buffered reader can be created for a given file by the statement:

BufferedReader reader = new BufferedReader(new FileReader(/* insert the name of a file here */));

BufferedReader's readLine() method returns the next line from the file; when it reaches the end of the file, it returns null.  So a loop to read the contents of a file, one line at a time, can be structured as follows:

BufferedReader reader = new BufferedReader(new FileReader(/* insert the name of a file here */));
String line = reader.readLine();
while(line!=null){
      /*
         insert code to process one line of the file here
      */
    line = reader.readLine();
}


Also recall that methods which might generate an IOException must either throw or catch that exception.

 

Powered by Plone CMS, the Open Source Content Management System

This site conforms to the following standards: