Dictionary Compression - Task 2

Dictionary compression tries to reduce file size by building a dictionary of words or parts of words, then replacing each word in a sentence with a reference to its index in the dictionary. This can be done with a dictionary structure or with lists. The ideas below use lists.

There is more on dictionary compression in the theory section (Data Rep) at the link below.

https://lesgrammar.fireflycloud.net/computing/rs-materials-gcse-a-level-etc/gcse/data-representation---number-bases---binary/compression-techniques

So if I have a sentence like "Fred was here was Fred", there are three unique words in it. My task is to replace each word with the number of the position where that word can be found in a unique list of words.

Decompose the task. 

I like to break down (decompose) the problem and then write a procedure or function for each part. The first ones just get a sentence and split it into a list on the space separator " ". You can find more info on the programming pages for lists.

Program Code for getting sentence and making a full list
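A minimal sketch of these first two steps is below. The function names get_sentence and make_full_list are made up for this example, so feel free to choose your own.

```python
def get_sentence():
    """Ask the user for a sentence and return it as a string."""
    return input("Enter a sentence: ")


def make_full_list(sentence):
    """Split the sentence on the space separator and return every word,
    duplicates included."""
    return sentence.split(" ")


# Example with a fixed sentence instead of user input
print(make_full_list("Fred was here was Fred"))
# ['Fred', 'was', 'here', 'was', 'Fred']
```

Note that split(" ") produces empty strings if the sentence contains two spaces in a row. Calling split() with no argument splits on any run of whitespace and avoids that.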

Next I want to make a unique list from the whole sentence list I have just made - so no duplicates. 

You can perhaps use a for loop to step through the list and only add unique words to the list. 

This function receives a list and returns a unique list. 

(If you have done some work with the SET data structure you could perhaps use this instead)

Code that makes a unique list from the full list
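One way this could look, using a for loop that only adds a word if it is not already in the new list. The name make_unique_list is an assumption for this sketch.

```python
def make_unique_list(full_list):
    """Return a new list holding each word once, in the order first seen."""
    unique_list = []
    for word in full_list:
        # Only add the word if we have not stored it already
        if word not in unique_list:
            unique_list.append(word)
    return unique_list


print(make_unique_list(["Fred", "was", "here", "was", "Fred"]))
# ['Fred', 'was', 'here']
```

If you try the set approach, list(set(full_list)) also removes duplicates, but a set does not keep the words in the order they first appeared, so the index numbers may come out differently. That still works as long as the same unique list is used for coding and decoding.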

Now I have a unique list I can revisit my sentence and code a string based on the positions of the words in the unique list.

My sentence was "Fred was here was Fred", so my unique list should look like this: ["Fred", "was", "here"]

So my sentence should be coded as 0 1 2 1 0

This time I will send the sentence and the unique list to my function that will return the coded string. 

You need to step through the list with all the words in and create a string with the index numbers instead of the words, taken from the unique list. You can return the index position of a word using the following syntax: uniquelist.index(wordToFind). Again, below is some code that does this, but have a go yourself first.

Code to build a sentence of coded words from the dictionary
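A possible version of this function is sketched below. It assumes the sentence is passed in as the full list of words, and the name code_sentence is made up for this example.

```python
def code_sentence(full_list, unique_list):
    """Replace each word with its index in unique_list and
    return the result as one space-separated string."""
    codes = []
    for word in full_list:
        # index() returns the position of the word in the unique list
        position = unique_list.index(word)
        codes.append(str(position))
    return " ".join(codes)


full_list = ["Fred", "was", "here", "was", "Fred"]
unique_list = ["Fred", "was", "here"]
print(code_sentence(full_list, unique_list))
# 0 1 2 1 0
```

Be aware that index() raises a ValueError if the word is not in the list. That cannot happen here because the unique list was built from the same words.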

It just remains for me to put that all together in a main() definition.

The code for main() to put the definitions together.
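A sketch of how main() might tie the steps together. The helper functions are repeated so the example runs on its own, and their names are assumptions. The optional sentence parameter is also an addition for this sketch, so the example can run without typing anything in. Calling main() with no argument asks the user for a sentence.

```python
def make_unique_list(full_list):
    """Return each word once, in the order first seen."""
    unique_list = []
    for word in full_list:
        if word not in unique_list:
            unique_list.append(word)
    return unique_list


def code_sentence(full_list, unique_list):
    """Return the sentence as a string of index numbers."""
    codes = []
    for word in full_list:
        codes.append(str(unique_list.index(word)))
    return " ".join(codes)


def main(sentence=None):
    # Ask the user only if no sentence was passed in
    if sentence is None:
        sentence = input("Enter a sentence: ")
    full_list = sentence.split(" ")
    unique_list = make_unique_list(full_list)
    coded = code_sentence(full_list, unique_list)
    print("Unique list:", unique_list)
    print("Coded sentence:", coded)
    return unique_list, coded


main("Fred was here was Fred")
# Unique list: ['Fred', 'was', 'here']
# Coded sentence: 0 1 2 1 0
```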

I could then use the code 0 1 2 1 0 to rebuild the sentence. Code that yourself! You need to read each number from the string, convert it to an integer, and then find the word in the unique list that corresponds to that number. Add the spaces back in too.
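If you get stuck, here is one possible sketch to compare with your own attempt. The name decode_sentence is made up, and the code assumes the numbers are separated by single spaces.

```python
def decode_sentence(coded, unique_list):
    """Rebuild the original sentence from a string of index numbers."""
    words = []
    for number in coded.split(" "):
        # Convert the text "2" to the integer 2, then look up the word
        words.append(unique_list[int(number)])
    # join() puts the spaces back between the words
    return " ".join(words)


print(decode_sentence("0 1 2 1 0", ["Fred", "was", "here"]))
# Fred was here was Fred
```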