WordIndexDictionary Class Reference

A two-way dictionary of words to indices.

#include <WordIndexDictionary.h>

Public Member Functions

 WordIndexDictionary ()
virtual ~WordIndexDictionary ()
int get_index (string word)
string get_word (int index)
int insert_word (string word)
int get_num_words () const
void print ()
bool match_word_index ()
void dump (string fname)
void initialize_from_dict (WordIndexDictionary *dict, bool sort=false)
void initialize_from_dump (string fname, int num_words=INT_MAX, bool sort=false)
void initialize_from_dumps (string prefix, int dumps)
size_t size ()
int get_prev_index (int new_id)
int get_freq (int index)

Public Attributes

vector< id2freq_t > frequencies

Detailed Description

A two-way dictionary of words to indices.

Provides a two-way dictionary that maps each word (a string) to a unique int index and each index back to its word. It is implemented with the hash table from boost/unordered_map.
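The two-way mapping described above can be sketched roughly as follows. This is a minimal illustration, not the Y!LDA implementation: the class name TwoWayDictionary is hypothetical, std::unordered_map stands in for boost/unordered_map, and the 0-based indexing, the -1 result for an unknown word, and the empty string for an unknown index are all assumptions.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for WordIndexDictionary (assumption, not the real class).
class TwoWayDictionary {
public:
    // Returns the existing index for a known word; otherwise assigns the
    // next sequential index (0-based here, which is an assumption).
    int insert_word(const std::string& word) {
        auto found = word_to_index_.find(word);
        if (found != word_to_index_.end()) return found->second;
        const int index = static_cast<int>(word_to_index_.size());
        word_to_index_.emplace(word, index);
        index_to_word_.emplace(index, word);
        return index;
    }

    // Returns -1 for an unknown word (assumed convention).
    int get_index(const std::string& word) const {
        auto found = word_to_index_.find(word);
        return found == word_to_index_.end() ? -1 : found->second;
    }

    // Returns an empty string for an unknown index (assumed convention).
    std::string get_word(int index) const {
        auto found = index_to_word_.find(index);
        return found == index_to_word_.end() ? std::string() : found->second;
    }

    int get_num_words() const {
        return static_cast<int>(word_to_index_.size());
    }

private:
    std::unordered_map<std::string, int> word_to_index_;
    std::unordered_map<int, std::string> index_to_word_;
};

// Small usage example: inserting a known word returns its existing index.
inline void two_way_dictionary_example() {
    TwoWayDictionary dict;
    const int first = dict.insert_word("topic");
    assert(dict.insert_word("topic") == first);
    assert(dict.get_word(first) == "topic");
}
```

Keeping one map per direction makes both lookups average constant time, at the cost of storing each word twice.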


Constructor & Destructor Documentation

WordIndexDictionary::WordIndexDictionary (  ) 

Constructs an empty dictionary.

WordIndexDictionary::~WordIndexDictionary (  )  [virtual]

Member Function Documentation

void WordIndexDictionary::dump ( string  fname  ) 

Dumps the dictionary to disk in protobuffer binary format so that a new dictionary can later be initialized from the dump using initialize_from_dump. To optimize I/O, the dump writes in batches: it collects 1000 (word, index) pairs at a time and then writes each batch to disk using DocumentWriter.
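The batching idea can be illustrated with a simplified sketch. It deliberately swaps the protobuffer format and DocumentWriter for tab-separated text written to a std::ostream, so it shows only the batching pattern, not the real on-disk format. The function name dump_in_batches and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Writes (word, index) pairs as "word<TAB>index" lines, issuing one write
// per batch_size pairs instead of one write per pair.
// Returns the number of writes issued. Plain text is an assumption made for
// illustration; the real dump uses protobuffer binary format.
std::size_t dump_in_batches(
        const std::vector<std::pair<std::string, int>>& entries,
        std::ostream& out,
        std::size_t batch_size = 1000) {
    std::string buffer;
    std::size_t pending = 0;
    std::size_t writes = 0;
    for (const auto& entry : entries) {
        buffer += entry.first;
        buffer += '\t';
        buffer += std::to_string(entry.second);
        buffer += '\n';
        if (++pending == batch_size) {
            out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
            buffer.clear();
            pending = 0;
            ++writes;
        }
    }
    if (pending > 0) {  // flush the final, partially filled batch
        out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        ++writes;
    }
    return writes;
}

// Usage example: three pairs with a batch size of two need two writes.
inline void dump_in_batches_example() {
    std::ostringstream out;
    const std::vector<std::pair<std::string, int>> entries = {
        {"a", 0}, {"b", 1}, {"c", 2}};
    assert(dump_in_batches(entries, out, 2) == 2);
}
```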

int WordIndexDictionary::get_freq ( int  index  ) 

int WordIndexDictionary::get_index ( string  word  ) 

Finds the unique index assigned to word.

int WordIndexDictionary::get_num_words (  )  const

int WordIndexDictionary::get_prev_index ( int  new_id  ) 

string WordIndexDictionary::get_word ( int  index  ) 

Finds the word whose index is index.

void WordIndexDictionary::initialize_from_dict ( WordIndexDictionary *  dict,
bool  sort = false 
)

void WordIndexDictionary::initialize_from_dump ( string  fname,
int  num_words = INT_MAX,
bool  sort = false 
)

Initializes the dictionary from a dump file produced by dump. Reads the (word, index) pairs from the file and populates the maps.

void WordIndexDictionary::initialize_from_dumps ( string  prefix,
int  dumps 
)

int WordIndexDictionary::insert_word ( string  word  ) 

Inserts the word into the dictionary if it doesn't already exist. Unique indices are assigned automatically.

bool WordIndexDictionary::match_word_index (  ) 

This method aids testing. It tests the uniqueness of indices. It assumes that indices are assigned sequentially, which reduces the cost of the uniqueness test to comparing the actual sum of the assigned indices against the expected sum, computed as sigma(last_index_assigned). Should always return true. If you change the logic for assigning unique indices, make sure you modify this method to verify it.
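The sum comparison can be sketched as below. This is an illustration under stated assumptions rather than the actual method body: the free function indices_match_expected_sum is hypothetical, and indices are assumed to start at 0, so n words give an expected sum of 0 + 1 + ... + (n - 1) = n(n - 1)/2.

```cpp
#include <cassert>
#include <vector>

// Returns true when the assigned indices sum to n * (n - 1) / 2, the sum of
// the sequence 0..n-1 (0-based indexing is an assumption).
bool indices_match_expected_sum(const std::vector<int>& indices) {
    long long actual = 0;
    for (int index : indices) {
        actual += index;
    }
    const long long n = static_cast<long long>(indices.size());
    const long long expected = n * (n - 1) / 2;
    return actual == expected;
}

// Usage example: a sequential assignment passes, a duplicate index fails.
inline void indices_match_expected_sum_example() {
    assert(indices_match_expected_sum({0, 1, 2, 3}));
    assert(!indices_match_expected_sum({0, 1, 1}));
}
```

A matching sum is a cheap necessary condition, not a proof of uniqueness: the indices {0, 0, 3} also sum to 3, the expected value for three words. That is why the check relies on the sequential-assignment assumption.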

void WordIndexDictionary::print (  ) 

Logs the dictionary to log(INFO).

size_t WordIndexDictionary::size (  ) 

Member Data Documentation


The documentation for this class was generated from the following files:
Generated on Tue Jul 19 11:45:30 2011 for Y!LDA by  doxygen 1.6.3