TypeTopicCounts Class Reference
#include <TypeTopicCounts.h>
Public Member Functions |
| TypeTopicCounts () |
| TypeTopicCounts (word_t num_words, topic_t num_topics) |
virtual | ~TypeTopicCounts () |
void | initialize_from_docs (string wfname, string tfname) |
int | verify_header (DocumentReader &doc_rdr) |
void | initialize_from_string (word_t word, string &counts) |
bool | initialize_from_dump (string fname, WordIndexDictionary *local_dict=NULL, WordIndexDictionary *global_dict=NULL, size_t offset=0) |
void | initialize_from_ttc (TypeTopicCounts *ttc) |
void | estimate_alphas (double *alphas, double &alpha_sum) |
void | dump (string fname) |
topic_t | get_counts (word_t word, topicCounts *tc) |
topic_t | get_counts (atomic< topic_t > *tc) |
word_t | get_num_words () |
topic_t | get_num_topics () |
word_mutex_t * | get_lock (word_t word) |
pair< TopKList **, TopKList * > | get_topic_stats () |
void | replace (word_t word, topicCounts &tc) |
void | upd_count (word_t word, topic_t old_topic, topic_t new_topic, bool ignore_old_topic=false) |
void | upd_count (word_t word, mapped_vec delta, string dbg="") |
bool | equal (const TypeTopicCounts &expected) |
string | print (word_t word) |
void | print () |
void | initialize (topicCounts *wtc, atomic< topic_t > *tc, word_t word=0) |
void | initialize (topicCounts **wtc, atomic< topic_t > *tc) |
Static Public Member Functions |
static pair< int, float > | estimate_fit (string fname, WordIndexDictionary *dict) |
static pair< int, float > | estimate_fit (string fname, float used_memory, int &incoming_words) |
Protected Member Functions |
void | estimate_memoryn_warn (long num_elems) |
void | clear_stats () |
void | init (topic_t num_topics_) |
void | destroy () |
Protected Attributes |
atomic< topic_t > * | tokens_per_topic |
topic_t | num_topics |
TopKList ** | topic_stats |
TopKList * | top_topics |
Friends |
class | Memcached_Synchronizer |
Constructor & Destructor Documentation
TypeTopicCounts::TypeTopicCounts ()
Constructs an empty word-topic counts table.
TypeTopicCounts::TypeTopicCounts (word_t num_words_, topic_t num_topics_)
Constructs a word-topic counts table for num_words_ unique words, where each word can be assigned at most num_topics_ topics.
TypeTopicCounts::~TypeTopicCounts () [virtual]
Member Function Documentation
void TypeTopicCounts::clear_stats () [protected]
Clears all the topic statistics.
void TypeTopicCounts::destroy () [protected]
void TypeTopicCounts::dump (string fname)
Uses the DocumentWriter to dump the topicCounts and n(t) into the dump file fname. Writes num_words topicCounts vectors, followed by n(t).
void TypeTopicCounts::estimate_alphas (double *alphas, double &alpha_sum)
pair< int, float > TypeTopicCounts::estimate_fit (string fname, float used_memory, int &incoming_words) [static]
pair< int, float > TypeTopicCounts::estimate_fit (string fname, WordIndexDictionary *dict) [static]
void TypeTopicCounts::estimate_memoryn_warn (long num_elems) [protected]
Estimates the memory used while initializing this structure. Warns if the estimate exceeds WARN_MEMORY_SIZE and fails if it exceeds MAX_MEMORY_USAGE.
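The warn-then-fail policy described above can be sketched as follows. This is a minimal illustration only: check_memory, MemoryStatus, kWarnBytes and kMaxBytes are hypothetical names, the threshold values are made up, and the 8-byte element size is an assumption based on the packed 64-bit entries mentioned under upd_count. The real WARN_MEMORY_SIZE and MAX_MEMORY_USAGE values are not given in this reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical thresholds standing in for WARN_MEMORY_SIZE and
// MAX_MEMORY_USAGE; the real values are not given in this reference.
constexpr std::size_t kWarnBytes = 1u << 20;  // 1 MiB, illustrative only
constexpr std::size_t kMaxBytes = 4u << 20;   // 4 MiB, illustrative only

enum class MemoryStatus { Ok, Warn, Fail };

// Estimates the footprint of num_elems entries (assumed to be 8 bytes each)
// and classifies it against the two thresholds.
MemoryStatus check_memory(long num_elems) {
    assert(num_elems >= 0);
    const std::size_t bytes =
        static_cast<std::size_t>(num_elems) * sizeof(std::uint64_t);
    if (bytes > kMaxBytes) return MemoryStatus::Fail;
    if (bytes > kWarnBytes) return MemoryStatus::Warn;
    return MemoryStatus::Ok;
}
```

A caller would log a warning on Warn and abort initialization on Fail.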
topic_t TypeTopicCounts::get_counts (atomic< topic_t > *tc)
The memory for tc is assumed to have been allocated. This method just copies n(t) into tc.
topic_t TypeTopicCounts::get_counts (word_t word, topicCounts *tc)
The memory for tc is assumed to have been allocated. This method just copies the topicCounts vector for word from the table into tc.
word_mutex_t * TypeTopicCounts::get_lock (word_t word)
Returns the lock that controls access to word, for callers that want to lock the structure externally. Usage: word_mutex_t::scoped_lock lock(*get_lock(word), false);
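The external-locking pattern can be illustrated with a standard-library stand-in. In this sketch, LockedTable, locked_read and locked_add are hypothetical names, and std::mutex replaces word_mutex_t, whose definition is not shown in this reference. The second argument of the scoped_lock constructor in the usage line above is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Stand-in for a table that keeps one lock per word. Only the
// get_lock(word) idea comes from the reference; the rest is hypothetical.
class LockedTable {
public:
    explicit LockedTable(std::size_t num_words)
        : counts_(num_words, 0), locks_(num_words) {}

    // Returns the mutex guarding a single word's data.
    std::mutex* get_lock(std::size_t word) {
        assert(word < locks_.size());
        return &locks_[word];
    }

    // Both accessors expect the caller to hold the word's lock already.
    void add_unlocked(std::size_t word, int delta) { counts_[word] += delta; }
    int read_unlocked(std::size_t word) const { return counts_[word]; }

private:
    std::vector<int> counts_;
    std::vector<std::mutex> locks_;
};

// Reads a word's count while holding that word's lock for the whole scope.
int locked_read(LockedTable& table, std::size_t word) {
    std::lock_guard<std::mutex> guard(*table.get_lock(word));
    return table.read_unlocked(word);
}

// Updates a word's count under the same per-word lock.
void locked_add(LockedTable& table, std::size_t word, int delta) {
    std::lock_guard<std::mutex> guard(*table.get_lock(word));
    table.add_unlocked(word, delta);
}
```

Locking per word rather than per table lets threads that touch different words proceed without contending.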
topic_t TypeTopicCounts::get_num_topics ()
The number of topics being learnt.
word_t TypeTopicCounts::get_num_words ()
The number of unique words for which the topic counts are maintained.
pair< TopKList **, TopKList * > TypeTopicCounts::get_topic_stats ()
Returns the TopicStats per topic and the hot/top NUM_TOP_TOPICS topics. Used to print the topic stats for the top topics.
void TypeTopicCounts::init (topic_t num_topics_) [protected]
void TypeTopicCounts::initialize (topicCounts **wtc, atomic< topic_t > *tc)
Initializes the structure using an array of topicCounts. Used in testing; see TypeTopicCountsTest.cpp.
void TypeTopicCounts::initialize (topicCounts *wtc, atomic< topic_t > *tc, word_t word=0)
Initializes the structure using explicit topicCounts for word. Used in testing; see TypeTopicCountsTest.cpp.
void TypeTopicCounts::initialize_from_docs (string wfname, string tfname)
Reads documents using DocumentReader. For each document and for each word encountered, it updates the topic counts for that word based on the topic assignment.
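The counting pass described above can be sketched without the DocumentReader. Here documents are plain in-memory vectors of (word, topic) pairs, and CountTable, Document and count_from_docs are hypothetical names. Updating n(t) in the same pass is an assumption, not something this reference states.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Word = std::size_t;
using Topic = std::size_t;
// One document: a sequence of (word, assigned topic) tokens.
using Document = std::vector<std::pair<Word, Topic>>;

struct CountTable {
    std::vector<std::map<Topic, int>> word_topic;  // sparse n(w, t) per word
    std::vector<int> tokens_per_topic;             // n(t)
};

// Builds the counts by visiting every token of every document once.
CountTable count_from_docs(const std::vector<Document>& docs,
                           std::size_t num_words, std::size_t num_topics) {
    CountTable table{std::vector<std::map<Topic, int>>(num_words),
                     std::vector<int>(num_topics, 0)};
    for (const Document& doc : docs) {
        for (const auto& token : doc) {
            assert(token.first < num_words && token.second < num_topics);
            ++table.word_topic[token.first][token.second];
            ++table.tokens_per_topic[token.second];  // assumed to be kept in sync
        }
    }
    return table;
}
```

The per-word map keeps only the topics a word has actually been assigned, which matters when num_topics is large.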
bool TypeTopicCounts::initialize_from_dump (string fname, WordIndexDictionary *local_dict=NULL, WordIndexDictionary *global_dict=NULL, size_t offset=0)
Reads the serialized dump (fname), stored in protobuf format, into memory. The dump is generated by the dump() method, and the DocumentReader is used to read it. The dump contains num_words topicCounts entries, followed by n(t) as the last entry, so num_words topicCounts are read first and then n(t).
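The record order of the dump (num_words topicCounts entries, then n(t) last) can be shown with a round trip through a plain text stream. This is a sketch of the ordering only: the real dump uses protobuf records via DocumentWriter and DocumentReader, and Snapshot, dump_text, load_text and the leading size line are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Snapshot {
    std::vector<std::vector<int>> rows;  // one dense count row per word
    std::vector<int> tokens_per_topic;   // n(t)
};

// Writes a size line (an assumption of this sketch), then one row per word,
// then n(t) as the final record.
std::string dump_text(const Snapshot& s) {
    std::ostringstream out;
    out << s.rows.size() << ' ' << s.tokens_per_topic.size() << '\n';
    for (const auto& row : s.rows) {
        assert(row.size() == s.tokens_per_topic.size());
        for (int c : row) out << c << ' ';
        out << '\n';
    }
    for (int c : s.tokens_per_topic) out << c << ' ';
    out << '\n';
    return out.str();
}

// Reads the records back in the same order; returns false on a short dump.
bool load_text(const std::string& text, Snapshot& s) {
    std::istringstream in(text);
    std::size_t num_words = 0, num_topics = 0;
    if (!(in >> num_words >> num_topics)) return false;
    s.rows.assign(num_words, std::vector<int>(num_topics, 0));
    for (auto& row : s.rows)
        for (int& c : row)
            if (!(in >> c)) return false;
    s.tokens_per_topic.assign(num_topics, 0);
    for (int& c : s.tokens_per_topic)
        if (!(in >> c)) return false;
    return true;
}
```

Because n(t) comes last, a reader has to consume all num_words rows before it can reach it, which is why the read order mirrors the write order.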
void TypeTopicCounts::initialize_from_string (word_t word, string &counts)
void TypeTopicCounts::print ()
Dumps the structure to log(INFO).
string TypeTopicCounts::print (word_t word)
void TypeTopicCounts::replace (word_t word, topicCounts &tc)
Replaces the counts for this word with those in tc.
void TypeTopicCounts::upd_count (word_t word, mapped_vec delta, string dbg="")
Updates the counts for word using the delta map, which maps each topic to its count delta. A lock is obtained on the word and the update is delegated to the relevant topicCounts. Used by the background synchronizers.
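A delta-map update of this kind might look like the sketch below. DeltaTable and DeltaMap are hypothetical: DeltaMap stands in for mapped_vec, which is assumed here to map a topic to a signed count delta, and std::mutex stands in for the per-word lock. Erasing entries that reach zero is a choice of this sketch, not documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <vector>

using Topic = std::size_t;
using DeltaMap = std::map<Topic, int>;  // stand-in for mapped_vec (assumed shape)

class DeltaTable {
public:
    DeltaTable(std::size_t num_words, std::size_t num_topics)
        : counts_(num_words), locks_(num_words), num_topics_(num_topics) {}

    // Applies every (topic, delta) pair to one word under that word's lock.
    void apply_delta(std::size_t word, const DeltaMap& delta) {
        assert(word < counts_.size());
        std::lock_guard<std::mutex> guard(locks_[word]);
        for (const auto& entry : delta) {
            assert(entry.first < num_topics_);
            const int updated = counts_[word][entry.first] + entry.second;
            assert(updated >= 0);
            if (updated == 0) {
                counts_[word].erase(entry.first);  // keep the row sparse
            } else {
                counts_[word][entry.first] = updated;
            }
        }
    }

    // Returns the current count for (word, topic), or 0 if absent.
    int count(std::size_t word, Topic topic) {
        assert(word < counts_.size());
        std::lock_guard<std::mutex> guard(locks_[word]);
        const auto it = counts_[word].find(topic);
        return it == counts_[word].end() ? 0 : it->second;
    }

private:
    std::vector<std::map<Topic, int>> counts_;
    std::vector<std::mutex> locks_;
    std::size_t num_topics_;
};
```

Taking the lock once per call, rather than once per topic, lets a synchronizer apply a whole batch of deltas for a word atomically with respect to other updaters.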
void TypeTopicCounts::upd_count (word_t word, topic_t old_topic, topic_t new_topic, bool ignore_old_topic=false)
Main update used by the updater filter. Decrements the old_topic count by 1 and increments the new_topic count by 1 for word. The method acquires a lock, finds the old and new topic locations, and updates them using the topicCounts vector's methods. Because the decrement and increment methods take an index for fast updates, some pointer arithmetic is needed to derive the index into the vector from the position pointers. Note also that the basic structure packs the topic and count into a 64-bit integer, so referring to the topic or count individually requires multiplying or dividing by 2.
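The decrement/increment step on packed entries can be sketched as follows. The bit layout (topic in the high 32 bits, count in the low 32 bits) is an assumption, as are the names pack, topic_of, count_of, find_topic and move_token. The sketch also assumes that ignore_old_topic skips the decrement. It omits the lock and uses a linear search instead of the pointer arithmetic mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Packed = std::uint64_t;

// Assumed layout: topic in the high 32 bits, count in the low 32 bits.
inline Packed pack(std::uint32_t topic, std::uint32_t count) {
    return (static_cast<Packed>(topic) << 32) | count;
}
inline std::uint32_t topic_of(Packed p) {
    return static_cast<std::uint32_t>(p >> 32);
}
inline std::uint32_t count_of(Packed p) {
    return static_cast<std::uint32_t>(p & 0xFFFFFFFFu);
}

// Returns the index of topic in row, or row.size() if it is absent.
std::size_t find_topic(const std::vector<Packed>& row, std::uint32_t topic) {
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (topic_of(row[i]) == topic) return i;
    }
    return row.size();
}

// Moves one token of a word from old_topic to new_topic.
// Count overflow past 32 bits is not handled in this sketch.
void move_token(std::vector<Packed>& row, std::uint32_t old_topic,
                std::uint32_t new_topic, bool ignore_old_topic = false) {
    if (!ignore_old_topic) {
        const std::size_t i = find_topic(row, old_topic);
        assert(i < row.size() && count_of(row[i]) > 0);
        if (count_of(row[i]) == 1) {
            row.erase(row.begin() + static_cast<std::ptrdiff_t>(i));
        } else {
            row[i] -= 1;  // count is in the low bits, so only it changes
        }
    }
    const std::size_t j = find_topic(row, new_topic);
    if (j == row.size()) {
        row.push_back(pack(new_topic, 1));
    } else {
        row[j] += 1;
    }
}
```

Keeping the count in the low bits means a plain integer add or subtract adjusts it without unpacking, as long as it neither underflows nor overflows.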
Friends And Related Function Documentation
friend class Memcached_Synchronizer [friend]
Member Data Documentation
The documentation for this class was generated from the following files: