Monday, September 20, 2010

Class Notes: 9/20/10

KR: Last week we didn't really talk about HOW we should/will/are implementing our strategies. What kinds of data structures are appropriate? Are we happy with what happened last year? What kind of innovation can be made?

Jesse: For any hand of letters, calculate the expected value of holding those letters. You'd need the probability of getting each letter, and an estimate of what it may cost. Use a linear scan; no special data structures needed.
Nitin: In the beginning, compute the list of 7 letter words. Prune out the ones that you can't form with at least 50% of what's on your rack currently. Bid high, low, or medium based upon frequencies.
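Nitin's pruning step might look something like this sketch (the word list, the rack representation, and the exact 50% test are assumptions based on the description above):

```python
from collections import Counter

def prune_words(seven_letter_words, rack):
    """Keep only the 7-letter words where the current rack already
    covers at least 50% of the word's letters."""
    rack_counts = Counter(rack)
    kept = []
    for word in seven_letter_words:
        word_counts = Counter(word)
        # How many of the word's letters can the rack supply?
        overlap = sum(min(word_counts[c], rack_counts[c]) for c in word_counts)
        if overlap >= len(word) / 2:
            kept.append(word)
    return kept

words = ["LETTERS", "QUIZZED", "RETAINS"]
print(prune_words(words, "LETRSAI"))  # ['LETTERS', 'RETAINS']
```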

KR: Let's focus on how, rather than pure strategy.
Neeha: Need to adapt previous strategies to take into account how many bids are left.
EPK: We use the Apriori algorithm. We can use this general algorithm to make strategy decisions. The interesting change is that we can look at subsets of our rack, since we can form a 7-letter word from more than 7 letters (rather than assuming that all letters on the rack must be used to form a 7-letter word).
Daniel: We use an intersection to create a list of words with 1 A, 1 B, etc., 2 A's, 2 B's, etc. We do a linear merge over the sorted word list.
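A minimal sketch of the canonical-key idea behind Daniel's intersection (function and variable names are hypothetical): sort each word's letters into a key, then words with the same letter multiset collide, and two key lists sorted this way can be intersected with a standard linear merge.

```python
from collections import defaultdict

def index_by_letters(words):
    """Map the alphabetically sorted letters of each word to the
    words spelled with exactly those letters."""
    index = defaultdict(list)
    for word in words:
        index["".join(sorted(word))].append(word)
    return index

idx = index_by_letters(["STOP", "POTS", "SPOT", "TONE", "NOTE"])
print(idx["OPST"])  # ['STOP', 'POTS', 'SPOT']
```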

Zach: We take each word and, where possible, rebuild it into a list of its letters, sorted alphabetically.
DFed: Each node tells you how many words are reachable from each set of letters. (JB: Take a look at Group 2 from last year, we had this all in a SQLite database, plus some other stats too :P)
Eva: We do this all in a few hundred MB
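DFed's reachable-word counts could be built over sorted letter sets roughly like this sketch; a flat dictionary stands in for the trie nodes here, and the structure is assumed from the description (group 2's actual SQLite-backed version may differ):

```python
from collections import defaultdict

def build_reachable_counts(words):
    """For each prefix of a word's alphabetically sorted letters, count
    how many dictionary words are still reachable from that prefix."""
    counts = defaultdict(int)
    for word in words:
        key = "".join(sorted(word))
        for i in range(len(key) + 1):
            counts[key[:i]] += 1
    return counts

counts = build_reachable_counts(["CAT", "ACT", "CAR", "DOG"])
print(counts["AC"])  # 3 -- CAT, ACT, and CAR all sort to a key starting "AC"
```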

Lauren: We are doing an intersection of both strategies. We have an alphabetical trie for word lookup at the end of the game, plus Apriori to find what is reachable. We assume that we may get at least 8 letters, so that we can drop a letter if necessary.
Dan: Try to look at percentage of words that can be formed overall. Rather than counting the words that ARE possible, look also at the words that are form-able.

We ran a game and:
Andrew (g4): We bid 22 for an E. We did this because we are close to 7, and we really like E's.
John (g9): We bid 12 for a V, and that's because we can form a lot of 7 letter words with (A,I,R,V). Our max bid is 15.
Aranchana (g1): Similar to John/G9's strategy here
Monica: We are planning to add more to infer others' bidding strategies
Group 6 bid 68 for an S (bug)

DFed: Everyone should be bidding high on S's because they should perceive that they form more 7 letter words.

KR: Did we learn anything about strategy from this?
Daniel: Everyone seems to be pretty conservative.
Monica: Group 3 took on way too many vowels. They only looked at letter frequencies, not at what words could be made.
Nitin: We were supposed to stop after forming a 7-letter word, but instead stopped after collecting 7 letters. Try to figure out what a high bid is worth to everyone before putting one in.
Flavio: We computed the average score for a letter, but we should have computed an expected score (weighting by the probability of getting each letter).
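Flavio's point, expected rather than average score, amounts to weighting each letter's score by its draw probability. A toy sketch (the tile counts and scores here are illustrative, not the game's actual values):

```python
def expected_score(tile_counts, scores):
    """Expected score of one random draw: sum over letters of
    P(letter) * score(letter)."""
    total = sum(tile_counts.values())
    return sum(tile_counts[c] / total * scores[c] for c in tile_counts)

# Illustrative pool: 3 E's worth 1 point each, 1 Q worth 10.
print(expected_score({"E": 3, "Q": 1}, {"E": 1, "Q": 10}))  # 3.25
```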

KR: What's the next most important thing to get working?
DFed: Coming up with an approximation of a letter's value to other people, to adjust your bid.
Blake: If we follow a strategy of bidding more than we really value a letter, we can use a sliding "bank" to fund outbidding others, limiting the damage.
Neeha: DFed's & Blake's strategies sound good.
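Blake's sliding "bank" might work roughly like this (the function name, the cap policy, and when the bank is debited are all assumptions; the notes don't specify them):

```python
def banked_bid(true_value, bank, markup):
    """Bid above our true valuation, but only as far as the bank allows,
    so accumulated overpayments stay bounded."""
    overbid = min(markup, bank)
    return true_value + overbid, bank - overbid

bid, bank = banked_bid(true_value=10, bank=5, markup=8)
print(bid, bank)  # 15 0
```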

Deliverables, etc

For this Weds: Fix your bugs, etc. and improve your strategies! Let's see the implementation match the discussion, rather than a "we should be doing X but seem to be doing Y" discussion.

Note that the 27th will be the last discussion class for this project. Final code will be submitted on the 29th, and presentations/write ups on Oct 4.

Using my old SQLite database

If you want to play around with statistics, you should take a look at the seven.db file included in the simulator in src/g2/util. You can access it with the sqlite3 command-line application (I believe one comes standard on Mac OS X; you can download one otherwise).

There are two tables:
lettercollections:
letters|oc_1|oc_2|oc_3|oc_4|oc_5|oc_6|oc_7|a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z

The first column is the set of letters, sorted alphabetically. Each oc_n column shows how many words of length n can be formed with exactly those letters. The a-z columns show how many of each letter are in that letter set.

words:
word|letters|score

The first column is the actual word, the second its alphabetically sorted set of letters, and the third the score you'll get for it.
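The schema can be queried from Python as well as from the sqlite3 shell. A sketch using an in-memory database with a cut-down words table (the sample rows are invented for illustration; point the connection at seven.db to use the real data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use "seven.db" for the real database
conn.execute("CREATE TABLE words (word TEXT, letters TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [("NOTE", "ENOT", 4), ("TONE", "ENOT", 4), ("QUIZ", "IQUZ", 22)],
)

# All words spelled with exactly the letters E, N, O, T.
rows = conn.execute(
    "SELECT word FROM words WHERE letters = ? ORDER BY word", ("ENOT",)
).fetchall()
print([w for (w,) in rows])  # ['NOTE', 'TONE']
```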

SQLite supports standard SQL. For example, to find the most common letters in 7-letter words:
pazuzu:util jon$ sqlite3 seven.db
sqlite> select letters,oc_7 from lettercollections where length(letters)=1 order by oc_7 desc;
letters|oc_7
E|16068
S|13216
I|12154
A|12028
R|10727
N|9584
O|9426
L|8664
T|8382
U|6765
D|6627
C|5942
G|5338
M|4941
P|4711
H|4599
B|3996
Y|3298
K|2718
F|2563
W|2285
V|1623
Z|760
X|629
J|577
Q|360
