Task 1 - Text compression

Part 1 - Overall Description

Develop a program that analyses a sentence containing several words and no punctuation. When a word from that sentence is input, the program identifies every position at which the word occurs. The program should not be case sensitive: Ask, ask and ASK should all be treated as the same word.

For example, in the sentence:

ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY

The word ‘COUNTRY’ occurs in the 5th and 17th positions.

Analyse the requirements for this system and design, develop, test and evaluate a program to locate and return the position(s) of the word you have selected in a particular sentence or return an error message if the word is not in the sentence.
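
One possible starting point for Task 1 is sketched below in Python. It is only an illustration, not a model answer: the function name find_positions and the wording of the error message are assumptions, and the sentence is hard-coded rather than typed in by the user.

```python
def find_positions(sentence, word):
    """Return the positions (counting from 1) where word occurs in sentence.

    Both the sentence and the word are converted to lower case first,
    so Ask, ask and ASK are treated as the same word.
    """
    target = word.lower()
    words = sentence.lower().split()  # split on spaces; no punctuation expected
    return [i for i, w in enumerate(words, start=1) if w == target]


sentence = "ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY"
positions = find_positions(sentence, "country")

if positions:
    print("The word occurs at positions:", positions)
else:
    print("Error: that word is not in the sentence")
```

Returning an empty list when nothing is found keeps the search separate from the error message, so the same function can be reused in Task 2.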

Help page for Task 1

Positions if word is in the sentence

If no words are found ...

Task 2

Develop a program that identifies individual words in a sentence, stores these in a list and replaces each word in the original sentence with the position of that word in the list.

(If you wrote your Task 1 program as definitions, i.e. functions, there is a lot you can reuse here.)

For example, the sentence:

ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY

contains the words ASK, NOT, WHAT, YOUR, COUNTRY, CAN, DO, FOR, YOU

The sentence can be recreated from the positions of these words in this list using the sequence:

1,2,3,4,5,6,7,8,9,1,3,9,6,7,8,4,5
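
The example above could be reproduced with a sketch along these lines. The names compress and decompress are made up for illustration, and converting everything to upper case is an assumption carried over from the Task 1 rule about case.

```python
def compress(sentence):
    """Return (words, positions) for a sentence without punctuation.

    words holds each different word once, in order of first appearance.
    positions holds, for every word in the sentence, its place in words
    (counting from 1).
    """
    words = []
    positions = []
    for w in sentence.upper().split():  # assumption: case is ignored, as in Task 1
        if w not in words:
            words.append(w)
        positions.append(words.index(w) + 1)
    return words, positions


def decompress(words, positions):
    """Rebuild the sentence from the word list and the positions."""
    return " ".join(words[p - 1] for p in positions)


sentence = "ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY"
words, positions = compress(sentence)
print(words)
print(positions)
print(decompress(words, positions))
```

Using words.index is simple but slow for long texts; a dictionary that maps each word to its position would be a reasonable alternative.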

Help page for compression

Next Step

Save the list of words and the positions of these words in the sentence, either as separate files or as a single file. (See: Saving and loading files / working with text files.)

Analyse the requirements for this system and design, develop, test and evaluate a program to:

· identify the individual words in a sentence and store them in a list (already done in Task 1)

· create a list of positions for the words in that list (already done above)

· save these lists as a single file or as separate files (this is the new bit: saving to a file).
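
The saving step might look something like the sketch below, which writes both lists to a single file. Using the JSON format, and the names save_compressed and load_compressed, are assumptions made for this example; two plain text files would satisfy the task equally well.

```python
import json


def save_compressed(path, words, positions):
    """Write the word list and the position list to one JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"words": words, "positions": positions}, f)


def load_compressed(path):
    """Read the word list and the position list back from the file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["words"], data["positions"]
```

For example, save_compressed("sentence.json", words, positions) would create the file, and load_compressed("sentence.json") would return the two lists again. The file name here is only a placeholder.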

Task 3

Develop a program that builds upon the technique from Task 2 to compress a text file with several sentences, including punctuation. The program should be able to compress a file into a list of words and list of positions to recreate the original file. It should also be able to take a compressed file and recreate the full text, including punctuation and capitalisation, of the original file. 

You could get a text file of a whole book if you like. Try Project Gutenberg.

Analyse the requirements for this system and design, develop, test and evaluate a program to compress a text file and reproduce the original text from a compressed file. You will need to create a text file with more than one sentence to test your system.
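
One way to keep punctuation and capitalisation is to treat every piece of the text (words, spaces, line breaks and punctuation marks) as an item in the list, and to compare items exactly, so that Ask and ask are stored separately. The sketch below takes that approach; it is an assumption about how the task could be tackled, not the required method, and the names compress_text and decompress_text are invented for the example.

```python
import re

# A token is a run of word characters, a run of whitespace,
# or a single punctuation character. Every character in the text
# belongs to exactly one token, so nothing is lost.
TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def compress_text(text):
    """Return (tokens, positions) for a text with punctuation."""
    tokens = []      # each different token once, in order of first appearance
    index = {}       # token -> its position in tokens (counting from 1)
    positions = []
    for tok in TOKEN.findall(text):
        if tok not in index:
            tokens.append(tok)
            index[tok] = len(tokens)
        positions.append(index[tok])
    return tokens, positions


def decompress_text(tokens, positions):
    """Rebuild the original text exactly from tokens and positions."""
    return "".join(tokens[p - 1] for p in positions)


text = "Ask not what your country can do for you.\nAsk what you can do for your country!"
tokens, positions = compress_text(text)
print(decompress_text(tokens, positions) == text)
```

Because spaces and line breaks are stored as tokens too, the rebuilt text matches the original character for character. Reading the text from a file and saving the two lists can reuse whatever was written for Task 2.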