WordIndexDictionary Class Reference
A two-way dictionary of words to indices.
#include <WordIndexDictionary.h>
Detailed Description
A two-way dictionary of words to indices.
Provides a two-way dictionary that maps each word (a string) to a unique int index and each index back to its word. The hash table implementation from boost/unordered_map is used.
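The two-way mapping described above can be pictured as one hash map per direction. The following is only a sketch under stated assumptions, not the class's actual implementation: the type TwoWayMapSketch and its members add_pair, index_of and word_of are hypothetical names, std::unordered_map stands in for boost/unordered_map, and the "not found" return values are assumed conventions.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical illustration: one hash map per lookup direction.
// std::unordered_map is used here in place of boost::unordered_map.
struct TwoWayMapSketch {
    std::unordered_map<std::string, int> word_to_index;
    std::unordered_map<int, std::string> index_to_word;

    // Record a (word, index) pair in both directions.
    void add_pair(const std::string& word, int index) {
        assert(index >= 0);  // assumed: indices are non-negative
        word_to_index[word] = index;
        index_to_word[index] = word;
    }

    // Word -> index. Returns -1 for an unknown word (assumed convention).
    int index_of(const std::string& word) const {
        auto it = word_to_index.find(word);
        return it == word_to_index.end() ? -1 : it->second;
    }

    // Index -> word. Returns "" for an unknown index (assumed convention).
    std::string word_of(int index) const {
        auto it = index_to_word.find(index);
        return it == index_to_word.end() ? std::string() : it->second;
    }
};
```

Keeping two maps doubles the memory used for the mapping but gives average constant-time lookups in both directions.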
Constructor & Destructor Documentation
WordIndexDictionary::WordIndexDictionary()
Constructs an empty dictionary.
WordIndexDictionary::~WordIndexDictionary() [virtual]
Member Function Documentation
void WordIndexDictionary::dump(string fname)
Dumps the dictionary to disk in protobuffer binary format so that a new dictionary can later be initialized from the dump using initialize_from_dump. To optimize I/O, the dump writes in batches: it collects 1000 (word, index) pairs at a time and then writes each batch to disk using DocumentWriter.
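The batching idea can be sketched as follows. This is an illustration under assumptions, not the actual dump code: dump_in_batches is a hypothetical function, a std::ostream stands in for DocumentWriter, and tab-separated text lines stand in for the protobuffer binary records.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: buffers (word, index) pairs and writes them to `out`
// one batch at a time. Returns the number of batch writes performed.
std::size_t dump_in_batches(const std::vector<std::pair<std::string, int>>& pairs,
                            std::ostream& out,
                            std::size_t batch_size = 1000) {
    assert(batch_size > 0);
    std::ostringstream buffer;  // holds the current, not yet written batch
    std::size_t buffered = 0;
    std::size_t writes = 0;
    for (const auto& p : pairs) {
        // Text stand-in for one serialized record.
        buffer << p.first << '\t' << p.second << '\n';
        if (++buffered == batch_size) {
            out << buffer.str();  // one write per full batch
            buffer.str("");       // reset the buffer for the next batch
            buffered = 0;
            ++writes;
        }
    }
    if (buffered > 0) {  // write the final, partially filled batch
        out << buffer.str();
        ++writes;
    }
    return writes;
}
```

With the default batch size, 2500 pairs would be written in three batches: two full batches of 1000 and a final batch of 500.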
int WordIndexDictionary::get_freq(int index)
int WordIndexDictionary::get_index(string word)
Finds the unique index assigned to word.
int WordIndexDictionary::get_num_words() const
int WordIndexDictionary::get_prev_index(int new_id)
string WordIndexDictionary::get_word(int index)
Finds the word whose assigned index is index.
void WordIndexDictionary::initialize_from_dict(WordIndexDictionary *dict, bool sort = false)
void WordIndexDictionary::initialize_from_dump(string fname, int num_words = INT_MAX, bool sort = false)
Initializes the dictionary from a dump file produced by dump. Reads the (word, index) pairs from the file and populates the maps.
void WordIndexDictionary::initialize_from_dumps(string prefix, int dumps)
int WordIndexDictionary::insert_word(string word)
Inserts the word into the dictionary if it does not already exist. Unique indices are assigned automatically.
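Insert-if-absent with automatic index assignment might look like the sketch below. Several points are assumptions rather than documented behavior: the type InsertSketch is hypothetical, indices are assumed to start at 0 and grow by 1, the return value is assumed to be the word's index, and only the word-to-index direction is shown.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical illustration of insert-if-absent with sequential indices.
struct InsertSketch {
    std::unordered_map<std::string, int> word_to_index;
    int next_index = 0;  // assumed: the first word gets index 0

    // Returns the index of `word`, assigning a fresh one only if the word is new.
    int insert_word(const std::string& word) {
        auto it = word_to_index.find(word);
        if (it != word_to_index.end()) {
            return it->second;  // already present: keep its existing index
        }
        const int index = next_index++;
        word_to_index.emplace(word, index);
        // Invariant of this sketch: one index has been handed out per stored word.
        assert(static_cast<int>(word_to_index.size()) == next_index);
        return index;
    }
};
```

In this sketch, inserting the same word twice returns the same index both times and leaves the dictionary unchanged on the second call.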
bool WordIndexDictionary::match_word_index()
This method aids testing by checking that the assigned indices are unique. It assumes that indices are assigned sequentially, which reduces the uniqueness test to comparing the actual sum of the assigned indices with the expected sum, computed as sigma(last_index_assigned). It should always return true. If you change the logic for assigning unique indices, make sure you modify this method so that it still verifies them.
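The sum comparison can be sketched as below, reading sigma(last_index_assigned) as the sum 0 + 1 + ... + last_index_assigned, which equals last * (last + 1) / 2. This is an assumption about the intended formula, and indices_sum_matches is a hypothetical free function rather than the class's method.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical check: compares the sum of all stored indices with the sum
// expected for sequential indices ending at `last_index_assigned`.
bool indices_sum_matches(const std::unordered_map<std::string, int>& word_to_index,
                         int last_index_assigned) {
    assert(last_index_assigned >= -1);  // -1 is taken to mean "nothing assigned yet"
    long long actual = 0;
    for (const auto& entry : word_to_index) {
        actual += entry.second;
    }
    const long long last = last_index_assigned;
    const long long expected = last * (last + 1) / 2;  // 0 + 1 + ... + last
    return actual == expected;
}
```

The formula gives the same expected value whether indices start at 0 or at 1. Note that a sum comparison alone is a cheap heuristic: the indices {0, 0, 3, 3} also sum to 6, the expected value for a last index of 3, so a stricter test would compare the number of entries as well.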
void WordIndexDictionary::print()
Logs the dictionary to log(INFO).
size_t WordIndexDictionary::size()
Member Data Documentation
The documentation for this class was generated from the following files: