ECE 2524 - Word Frequency Count: Part 3

ECE 2524

Introduction to Unix for Engineers

Background

Implement ‘catlike’ interface pattern

  • user1: implement and test the ‘catlike’ interface pattern as shown in class.

    Any additional command line arguments that do not start with - should be treated as file names to read from. Attempt to open each argument as a file and append the words it contains to a single word list (replace the call to split_words with a call to append_words; the implementation is shown below).

    If no additional arguments are given then the program should work exactly the same as it did for part 2, i.e. read from standard input.

    void append_words(word_list_t wl, FILE* fp) {
        /* append words found in fp to list wl */
        token_inject(fp, " \n\t.,?!-", wl, word_list_injector, TOK_INJECT_COMPACT);
    }
    
    int main(int argc, char *argv[]) {
        /* variable declarations */
        word_list_t wl;
        FILE* fp;
    
        /* call getopt in while loop */
    
        wl = word_list_create(100); /* used to be done in split_words,
                                     * which we are no longer using */
    
        /* optind will contain the offset from the start of the original
         * argv to the first argument not consumed by getopt */
        argv += optind; 
        argc -= optind;
    
        /* now use argv as a list of remaining command line arguments. */
        if ( argc > 0 ) {
            /* any arguments left over after getopt should be treated as
             * file names */
            do {
                fp = fopen(*argv, "r");
                /* TODO: user2 implement error checking */
                append_words(wl, fp);
            } while (*++argv);
        } else {
            /* no additional arguments means read from standard input */
            append_words(wl, stdin);
        }
        
        /* remainder of program shouldn't need to change from part 2,
         * but since append_words replaces split_words, you won't need
         * that line */
    }

    Update the usage message to reflect the new functionality:

    $ ./wordfreq -h
    Usage: ./wordfreq [-hr] [-k N] [FILE ...]

  • user2: after user1 has completed their work, add error checking to handle file names that cannot be read.

    If a given command line argument refers to a file that does not exist or cannot be read, print a corresponding message to standard error and continue to the next file in the list.

    Assuming files named file1 and file2 exist and are readable, but no file named not_a_file exists, then the standard output of

    $ ./wordfreq file1 file2

    and

    $ ./wordfreq file1 not_a_file file2

    should be identical, but the latter command should also produce a message on standard error:

    not_a_file: No such file or directory

    Remember, don’t do more work than you have to. You don’t need to know the reason fopen failed to open a file, just that it returns NULL and sets the global errno to a value describing the failure. Use strerror to generate a human-readable message from errno.

    #include <errno.h>   /* errno */
    #include <stdio.h>   /* fprintf, stderr */
    #include <string.h>  /* strerror */
    
    ...
      
    fprintf(stderr, "%s: %s\n", *argv, strerror(errno));

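    Alternatively, you could use the warn function declared in err.h. It prefixes the message with the program name and appends the strerror text for the current errno.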

    #include <err.h>
    
    ...
      
    warn("%s", *argv);
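
The loop and the error check described above can be sketched together in a form that compiles without the course libraries. This is only an illustration under stated assumptions: count_words is a hypothetical stand-in for append_words (it counts words rather than storing them in a word_list_t), and count_words_in_files and its errstream parameter are names invented here so the behaviour is easy to test. It is not the reference solution.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for append_words: counts the
 * whitespace-separated words in fp instead of storing them. */
static size_t count_words(FILE *fp)
{
    char buf[256];
    size_t n = 0;

    while (fscanf(fp, "%255s", buf) == 1)
        n++;
    return n;
}

/* Walk a NULL-terminated list of file names (what argv looks like
 * after skipping the options consumed by getopt). Files that cannot
 * be opened are reported on errstream and skipped; processing then
 * continues with the next name. Returns the total word count and
 * stores the number of skipped files in *failures. */
size_t count_words_in_files(char **names, FILE *errstream, int *failures)
{
    size_t total = 0;

    *failures = 0;
    for (; *names != NULL; names++) {
        FILE *fp = fopen(*names, "r");
        if (fp == NULL) {
            /* errno was set by fopen; strerror turns it into text */
            fprintf(errstream, "%s: %s\n", *names, strerror(errno));
            (*failures)++;
            continue;
        }
        total += count_words(fp);
        fclose(fp);
    }
    return total;
}
```

In a real main you would pass the adjusted argv and stderr, and fall back to reading stdin when no file names remain. Skipping a bad file with continue is what keeps standard output identical whether or not not_a_file appears in the argument list.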

Compiling and Linking

If you receive linking errors about ‘token_inject’ being undefined, you may have to tell the linker to explicitly link to the streamtoken library:

$ clang -o wordfreq -lanalytics -lstreamtoken main.o

The extra -lstreamtoken option is not needed when compiling with clang 3.4 on the ece2524 VM, but I found it necessary when using clang 3.3 on my local machine.

Submission

The source files should exist in their own git repository. If you change to the directory containing your source files and run ls -a, you should see a directory named .git. If not, run git init to initialize a git repository in the current directory. You should only run git init once for each new project.

Push your git repository to the remote at git@ece2524.ece.vt.edu:USER/wordfreq.git where USER is your git user name.

If you have initialized a new repo but have not added a remote yet:

$ git remote add origin git@ece2524.ece.vt.edu:USER/wordfreq.git

where USER is your git user name.

If you have already added a remote named origin, but the URL is incorrect, replace add with set-url in the above command. You can always check which remotes you have added by running git remote -v.

Remember, if this is the first time pushing to a new remote you need to specify a destination branch (usually `master`). Using the `-u` option will save this default destination for future pushes.

$ git push -u origin master

Testing

Feature repo path: features/wordfreq

The following features will be tested using cucumber:

@compile
Feature: Compile

  Background:
    Given I am working from a clean git clone to "wordfreq"
    And I cd to "wordfreq"
    
  Scenario: Clean Repo
    Then a file named "wordfreq" should not exist

  Scenario: Compile
    When I successfully run `clang -c -o main.o main.c`
    Then a file named "main.o" should exist
    When I successfully run `clang -o wordfreq -lanalytics main.o`
    Then a file named "wordfreq" should exist

@part3 @no-clobber
Feature: catlike interface pattern

  Background:
    Given I cd to "wordfreq"
    And a file named "fox.txt" with:
    """
    the quick brown fox jumped over the lazy cow.
    but the cow jumped over the moon!
    what does the fox say?
    
    """
    And a file named "numbers" with:
    """
    four two four one
    two four three three
    three four
    
    """

  Scenario: One file argument
    When I run the shell command "./wordfreq numbers"
    Then its stdout should contain exactly 4 lines
    And its stdout lines should match:
    | ^\s*4\s+four$  |
    | ^\s*3\s+three$ |
    | ^\s*2\s+two$   |
    | ^\s*1\s+one$   |

  Scenario: Two file arguments
    When I run the shell command "./wordfreq numbers fox.txt"
    Then its stdout should contain exactly 10 lines
    And its stdout lines should match:
    | ^\s*5\s+the$    |
    | ^\s*4\s+four$   |
    | ^\s*3\s+three$  |
    | ^\s*2\s+cow$    |
    | ^\s*2\s+fox$    |
    | ^\s*2\s+jumped$ |
    | ^\s*2\s+over$   |
    | ^\s*2\s+two$    |
    | ^\s*1\s+brown$  |
    | ^\s*1\s+but$    |

  Scenario: A bad file argument
    Given the file "not_a_file" should not exist
    When I run the shell command "./wordfreq numbers not_a_file fox.txt"
    Then its stdout should contain exactly 10 lines
    And its stdout lines should match:
    | ^\s*5\s+the$    |
    | ^\s*4\s+four$   |
    | ^\s*3\s+three$  |
    | ^\s*2\s+cow$    |
    | ^\s*2\s+fox$    |
    | ^\s*2\s+jumped$ |
    | ^\s*2\s+over$   |
    | ^\s*2\s+two$    |
    | ^\s*1\s+brown$  |
    | ^\s*1\s+but$    |
    And its stderr should contain exactly 1 line
    And its stderr should contain "not_a_file: No such file or directory"

@part3 @no-clobber
Feature: Command Line Arguments

  Background:
    Given I cd to "wordfreq"
    And a file named "fox.txt" with:
    """
    the quick brown fox jumped over the lazy cow.
    but the cow jumped over the moon!
    what does the fox say?
    
    """
    And a file named "numbers" with:
    """
    four two four one
    two four three three
    three four
    
    """

  Scenario: "-k argument error checking"
    When I run the shell command "./wordfreq -k five < fox.txt"
    Then its stdout should contain exactly 0 lines
    And its stderr should contain exactly 1 line
    And its stderr should contain "No digits were found"

You can run the tests manually with

$ cucumber /usr/share/features/wordfreq

when logged in to your shell account. This command assumes your current working directory is your project directory.
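
The "-k argument error checking" scenario expects the single line "No digits were found" on standard error when the -k argument is not a number. One way to produce it is to convert the argument with strtol and compare the end pointer with the start of the string; the same wording appears in the example program in the Linux strtol(3) manual page. The sketch below assumes a hypothetical helper named parse_count with an errstream parameter added for testability; how your own program handles -k may differ.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a base-10 integer from str into *out.
 * Returns 0 on success. On failure, writes a one-line message to
 * errstream, leaves *out untouched, and returns -1. */
int parse_count(const char *str, long *out, FILE *errstream)
{
    char *end;
    long val;

    errno = 0;                    /* so a later ERANGE is really ours */
    val = strtol(str, &end, 10);

    if (end == str) {
        /* strtol consumed nothing: str does not start with a number */
        fprintf(errstream, "No digits were found\n");
        return -1;
    }
    if (errno == ERANGE) {
        /* value does not fit in a long */
        fprintf(errstream, "%s: %s\n", str, strerror(errno));
        return -1;
    }

    *out = val;
    return 0;
}
```

In the getopt loop this would be called with optarg in the case for 'k', exiting with a failure status when it returns -1 so that nothing is written to standard output.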