word/text twist program, assistance required

Pages: 12
Aug 20, 2008 at 6:26am
i need to make a program which reads from a file which has a word list. i then need to input more than 3 letters, and less than 8, and then see how many words which are present in the wordlist can be made from those particular letters and then output those words.
any help will be apppreciated
thanks a lot
tyler
Aug 20, 2008 at 7:01am
http://www.cplusplus.com/forum/articles/1295/

Do you have ANY experience with coding or is this your first venture into it?

Do you have ANY code written, or is there a specific part of the program you're having trouble with?

Is this a school assignment? If not, maybe you should try something simpler if you really don't know where to start.

You'll need to find a dictionary word list, I would think there would be something available, though I've never seen/used one. I can't really think of an easy way to do this right now.
Last edited on Aug 20, 2008 at 7:04am
Aug 20, 2008 at 7:18am
i have previous experience with coding, although its not much, its been just 2 months since i've started learning c++.

this is not a school assignment, its just something i wanted try myself.
i've already found a dictionary list, alphabetically sorted.

i know that i'll have to use file input, and read from that file and compare it to the words entered, but i dont know how to go about the comparing part.

i dont have any substantial code written yet.

basically i just need help with the part where the letters which have been input are jumbled up continuously and compared to the words in the word list(which has been input as a file). first the 3 letter words, then 4, and so on, till 6 or 7 letter words.

i've thought a lot about this but i cant seem to figure out.
Aug 20, 2008 at 7:18am
I can see it now. You first:

1. Need a fuction of some sort to read all the word from a file and store it in a list.
2. Need one to take in the user word, check it, and store it.
3. Need one that would compare the two.

hehe. I previously wanted to do something simliar, but with number. Like converting phone number into word. But I don't know how to use large database yet. So I am going to put that back a little. hehe. gl
Aug 20, 2008 at 7:23am
thanks LacViet
but the problem is that the user word has to be constantly jumbled up, and checked again and again. all the combinations need to be taken into consideration.
i already have a word list as a text file which can be used using file i/o.
Aug 20, 2008 at 7:34am
Ex:
Word list: Alpha, Beta, Dog, Tot, Bob.
User input: Bob.

If I have to do it, I would probably first count the char of the user input. Then compare it with the wordlist--eliminating all the word that have more or less char than the word the user inputed.
int lenghtOfWord = 3, eliminating Alpha, Beta.

Then compare all the indivitual char themselves, eliminating any words in the wordlist that doesn't contains the same char as the char of the user inputed word.
--check char b, o, b. eliminating tot and do.

Then do a char switch and compare the two. O.o.
Last edited on Aug 20, 2008 at 7:35am
Aug 20, 2008 at 7:45am
thats alright, but what if my word list has the words :
rob, orb, cool, hot, hog, bro

user input : : bro

the ideal output should be : bro, orb, rob

but your logic would give just : bro as the output

how do i fix this bug. can you give show me some code?

thanks

tyler
Aug 20, 2008 at 7:50am
Nope. Mine code would eliminate cool, hot, hog. Leaving bro, rob, orb then the last step you would compare all the different variation of "bro" to see which one matches with the word left over in the list. I don't have the code yet, just psuedo code in my head...hehe. I am a noob myself, but I will attempt this problem also. hehe.
Aug 20, 2008 at 8:28am
I actually wrote this program a long time ago to see if i could get perfect scores on text twist and to help myself learn C. I lost the source code but my method basically went as follows (I'm changing it to fit for c++):

1 - Read the dictionary file into a vector in alphabetical order (my dictionary file was in alphabetical order, if yours isn't then you can always make a program to sort it first).

2 - Get the user input letters

3 - Loop through each word in the dictionary vector and see if you can make that word with the input letters. To do this i had a separate array of bits that corresponded to each letter in the input representing if it was used or not. I would start by reading the first letter of the dictionary word, and try and find that in my input letters (ignoring letters already used - that's where the "used array" came in). If the letter is found, set that character as "used". If it was not found, that word can't be made. Continue this process for each letter in the dictionary word. If you reach the end of the word then you can make that word with your input letters so print it out.

Now, if you want to make this run faster (it may be pretty slow if your dictionary is large) you can take two different approaches:
1 - Alter the old approach so that if at some point during checking a dictionary word you find that you cannot make that word, you therefore know that any other dictionary words starting with the same prefix cannot be made, so you can skip all of those words. For example, if you have the letters: "ABCDER" and you are checking the word "FAR", once you see that the 'F' is not found in the input letters, you can skip all the dictionary words starting with 'F'. Similarily, once you reach the word "RICE" you will hit the "I" in the word and determine you cannot make this word, so you can skip all words in the dictionary starting with "RI".

2 - The other approach is to not look at every dictionary word, but rather look at every permutation of the input letters. In this case you should look up code on how to create all possible permutations of a string. For each permutation, do a binary search on your dictionary word list. This method will likely be fastest for your computer to do.
Aug 20, 2008 at 8:30am
Hmm.. i see.. but how do i compare.. can you help me out on that? i'm pretty confused

i'd be really thankful if you could provide Some code
Aug 20, 2008 at 8:37am
@ mahlerfive

thanks a lot. that was really helpful. i think i'll use the method where the permutations of the letters are compared to words in the word list. it will be fastest and probably the easiest to implement.

but how do i do a binary search on the word list?

could you give me a very small example of this program.. the code that is, i just cant seem to figure out the code.
Aug 20, 2008 at 8:41am
You can use the built-in binary search method in the STL.
Check out: http://www.cplusplus.com/reference/algorithm/binary_search.html

It gives a very good example of doing a binary search on a vector. It uses numbers, but for strings it will be the same thing.

Edit: Remember, for binary search to work, the vector MUST be in sorted order
Last edited on Aug 20, 2008 at 8:42am
Aug 20, 2008 at 8:52am
yes i've sorted the word list alphabetically.

one more thing, how do i find out the permutations?

e.g. : if i input the letters : " glow " , i want " glo, glw, gol, gow wog, owg, olg, ogw, olw, glow, golw, gowl, lowg, logw, olgw, etc as the permutations.. and since i will input 6 letters, i want all possible 3,4,5,6 letter words that can be made using those letters

can you help me with a program on how to do this?

thanks
Aug 20, 2008 at 11:02am
You will probably want two functions: one that generates all the subsets of the input string, and one that generates all the permutations of a given string. Then you can do something like the following to generate all the permutations of all the subsets:

[code=c++]
string input_string;
cin >> input_string;

vector<string> subsets; // this will store all the subsets of the input string
vector<string> permutations; // this will store all the permutations of a single subset
vector<string> permutations_of_subsets; // this will store all the permutations of all the subsets which is our ultimate goal

// First we find all the subsets
subsets = find_subsets( input_string );

// Now we generate all the permutations on all of these subsets
vector<string>::iterator iter;
for( iter = subsets.begin(); iter != subsets.end(); iter++ ) {
// For this subset, find the permutations
permutations = find_permutations( *iter );

// Add these to all the permutations we have so far
permutations_of_subsets.insert( permutations_of_subsets.start(), permutations.start(), permutations.end() );
}

// Now we go through each string in permutations_of_subsets and look it up in the dictionary.....
[/code]

Here is a site I just googled and looks promising for permutations code. It looks like they use char* instead of string, but you can easily convert between the two using c_str(). http://www.bearcave.com/random_hacks/permute.html

At first glance I couldn't find a subsets function, but this could be a good exercise anyway :)

Aug 20, 2008 at 11:56am
You don't need to generate every possible permutation. Work backwards.

Since you are looking for words that contain only a specific collection of letters, you can use one of the string class's methods that will help. I'll let you look for it, but here is a hint: try to identify words that have a letter other than those in your list of letters (which should be a string).
http://www.cplusplus.com/reference/string/string/
(Scroll down. Remember, you want to find words with invalid letters.)

Once you have identified a word as good or bad, you can cout or not cout it.

Hope this helps.
Aug 20, 2008 at 12:25pm
@ mahlerfive.. thanks a lot, the linked helped a lot

@ duoas.. thanks a lot..

You don't need to generate every possible permutation. Work backwards.


i understood this part, but i'm not sure if i understood the rest completely. are you referring to the find_first_not_of function?

Aug 20, 2008 at 12:33pm
i've written this bit of code for finding permutations of a given word.
i'm not getting any error but the program does not work as expected. also, is there a better way to do this? is the logic used correct?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
#include <iostream>
#include <cstring>

using namespace std;

void permutation(char str[], char str2[])
{
int length = strlen(str);
if (length)
{
for (int i = 0; i < length; i++)
{
char* str1 = new char[length+1];
int count1 , count2;
for (count1 = 0 , count2 = 0; count1 < length; count1++, count2++)
{
if (count1 == i)
{
str1[count1] = str[count2++];
continue;
}
else
str1[count1] = str[count2];
}
str1[count1] = '\0';

int length1 = strlen(str2);
char* str3 = new char [length1+2];

strncpy(str3,str2,length1);
str3[length1] = str[i];
str3[length1+1] = '\0';

permutation(str1,str3);


}
}
else
{
cout << str2 << endl;
}
}

int main()
{
char str[] = "great"; // this is the word who's permutations are to be found out
char str2[] = "\0";

permutation (str,str2);

return 0;
}
Aug 20, 2008 at 4:20pm
I once cracked my head open trying to figure out such an algorithm. And then I found this: http://www.cplusplus.com/reference/algorithm/next_permutation.html
Aug 20, 2008 at 4:28pm
are you referring to the find_first_not_of function?

You've got it exactly. If you can find a letter in your word string that is not in your letters string, then you know that the word cannot be some combination of the letters.

Calculating every possible combination is impractical. The longest word in English is "antidisestablishmentarianism", which is 28 letters long, using 12 letters of the alphabet (almost half).

The possibilities for 28 letters, each of which can be one of 12, is:
121 × 122 × 123 × ··· × 1227 × 1228
which is 1648446623609512543951043690496 possibilities.

Yep, you read that right.

If we assume that the computer can check 10,000 possibilities a second, then it would take nearly 7.2 × 1018 years to check them all. (That's 7.2 quintillion years, American; or trillion in the old UK long scale.) The computer would turn to dust before that happened.

Sure, that's an extreme estimate. Let's go for five letters in a five letter word:
5 × 5 × 5 × 5 × 5
is 3125 possibilities. That's a lot more manageable. (Assuming our 10,000/sec calculations, that takes only about 3% of a second.) You could easily live with that. Just remember that each additional letter adds to the time exponentially.

But why? Do as little as possible to accomplish the task. You only need to check words that you already know. The Official Tournament and Club Word List from the National Scrabble Association lists 178,691 words as of 2006 [1]. Comparatively, 3125 is a lot of word checking.

Hope this helps.

[1] http://en.wikipedia.org/wiki/Official_Tournament_and_Club_Word_List
Aug 20, 2008 at 4:29pm
I did the same thing but I used map<string, string> instead of vector<string>.
Then using my own version of next_permutation() (I didnt know about it, now I'm gonna change it) i checked if the specific string exists in the map.

Isn't it faster than going through each element of the vector and checking for their value generating all the possibilities?
No it is not :D. Now I understand Duoas solution :P

But what about words that two letters are the same, like "boot", it has two 'o's but if the user has 'bot' as letters it will return that this is a word :/
After the find_first_not_of() we should use the count() function?
Last edited on Aug 20, 2008 at 4:39pm
Pages: 12