TypeTopicCounts Class Reference

#include <TypeTopicCounts.h>


Public Member Functions

 TypeTopicCounts ()
 TypeTopicCounts (word_t num_words, topic_t num_topics)
virtual ~TypeTopicCounts ()
void initialize_from_docs (string wfname, string tfname)
int verify_header (DocumentReader &doc_rdr)
void initialize_from_string (word_t word, string &counts)
bool initialize_from_dump (string fname, WordIndexDictionary *local_dict=NULL, WordIndexDictionary *global_dict=NULL, size_t offset=0)
void initialize_from_ttc (TypeTopicCounts *ttc)
void estimate_alphas (double *alphas, double &alpha_sum)
void dump (string fname)
topic_t get_counts (word_t word, topicCounts *tc)
topic_t get_counts (atomic< topic_t > *tc)
word_t get_num_words ()
topic_t get_num_topics ()
word_mutex_t * get_lock (word_t word)
pair< TopKList **, TopKList * > get_topic_stats ()
void replace (word_t word, topicCounts &tc)
void upd_count (word_t word, topic_t old_topic, topic_t new_topic, bool ignore_old_topic=false)
void upd_count (word_t word, mapped_vec delta, string dbg="")
bool equal (const TypeTopicCounts &expected)
string print (word_t word)
void print ()
void initialize (topicCounts *wtc, atomic< topic_t > *tc, word_t word=0)
void initialize (topicCounts **wtc, atomic< topic_t > *tc)

Static Public Member Functions

static pair< int, float > estimate_fit (string fname, WordIndexDictionary *dict)
static pair< int, float > estimate_fit (string fname, float used_memory, int &incoming_words)

Protected Member Functions

void estimate_memoryn_warn (long num_elems)
void clear_stats ()
void init (topic_t num_topics_)
void destroy ()

Protected Attributes

atomic< topic_t > * tokens_per_topic
topic_t num_topics
TopKList ** topic_stats
TopKList * top_topics

Friends

class Memcached_Synchronizer

Constructor & Destructor Documentation

TypeTopicCounts::TypeTopicCounts (  ) 

Constructs an empty word-topic counts table.

TypeTopicCounts::TypeTopicCounts ( word_t  num_words_,
topic_t  num_topics_ 
)

Constructs a word-topic counts table for num_words_ unique words, where each word can be assigned at most num_topics_ topics.

TypeTopicCounts::~TypeTopicCounts (  )  [virtual]

Member Function Documentation

void TypeTopicCounts::clear_stats (  )  [protected]

Clears all the topic statistics

void TypeTopicCounts::destroy (  )  [protected]
void TypeTopicCounts::dump ( string  fname  ) 

Uses DocumentWriter to write the topicCounts and n(t) to the dump file fname. The num_words topicCounts vectors are written first, followed by n(t).

bool TypeTopicCounts::equal ( const TypeTopicCounts &  expected  ) 
void TypeTopicCounts::estimate_alphas ( double *  alphas,
double &  alpha_sum 
)
pair< int, float > TypeTopicCounts::estimate_fit ( string  fname,
float  used_memory,
int &  incoming_words 
) [static]
pair< int, float > TypeTopicCounts::estimate_fit ( string  fname,
WordIndexDictionary *  dict 
) [static]
void TypeTopicCounts::estimate_memoryn_warn ( long  num_elems  )  [protected]

Estimates the amount of memory used while initializing this structure. Warns if the estimate exceeds WARN_MEMORY_SIZE and fails if it exceeds MAX_MEMORY_USAGE.
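
The warn-then-fail check can be sketched as below. This is only an illustration: check_memory, MemoryCheck and the bytes_per_elem parameter are hypothetical names, and the two thresholds are passed in as arguments because the actual values of WARN_MEMORY_SIZE and MAX_MEMORY_USAGE are not given here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for the sketch; not part of TypeTopicCounts.
enum class MemoryCheck { Ok, Warn, Fail };

// Estimates the bytes needed for num_elems elements and compares the
// estimate with a warning threshold and a hard limit (both assumed inputs).
MemoryCheck check_memory(long num_elems, std::size_t bytes_per_elem,
                         std::size_t warn_bytes, std::size_t max_bytes) {
    assert(bytes_per_elem > 0);
    if (num_elems < 0) return MemoryCheck::Fail;  // a negative size is treated as an error
    const std::size_t estimate =
        static_cast<std::size_t>(num_elems) * bytes_per_elem;
    if (estimate > max_bytes) return MemoryCheck::Fail;
    if (estimate > warn_bytes) return MemoryCheck::Warn;
    return MemoryCheck::Ok;
}
```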

topic_t TypeTopicCounts::get_counts ( atomic< topic_t > *  tc  ) 

The memory for tc is assumed to be allocated. This method just copies the n(t) into tc

topic_t TypeTopicCounts::get_counts ( word_t  word,
topicCounts *  tc 
)

The memory for tc is assumed to have been allocated. This method just copies the topicCounts vector for word from the table into tc.

word_mutex_t * TypeTopicCounts::get_lock ( word_t  word  ) 

To lock the structure externally, use this method to get the lock that controls access to word. Usage: word_mutex_t::scoped_lock lock(*get_lock(word), false);
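
The per-word locking pattern can be sketched with the standard library. This is an assumption-laden stand-in: LockedTable is a hypothetical class, std::shared_mutex replaces word_mutex_t, and the false argument in the usage line above is assumed to request a non-exclusive (reader) lock, which the sketch models with std::shared_lock. Requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical table with one reader/writer lock per word.
class LockedTable {
public:
    explicit LockedTable(std::size_t num_words)
        : locks_(num_words), counts_(num_words, 0) {}

    // Mirrors the idea of get_lock(word): hand out the lock guarding one word.
    std::shared_mutex* get_lock(std::size_t word) {
        assert(word < locks_.size());
        return &locks_[word];
    }

    // Readers take a shared lock, so several can read the same word at once.
    int read(std::size_t word) {
        std::shared_lock<std::shared_mutex> guard(*get_lock(word));
        return counts_[word];
    }

    // Writers take an exclusive lock on the word they modify.
    void add(std::size_t word, int delta) {
        std::unique_lock<std::shared_mutex> guard(*get_lock(word));
        counts_[word] += delta;
    }

private:
    std::vector<std::shared_mutex> locks_;
    std::vector<int> counts_;
};
```

Locking per word rather than per table lets updates to different words proceed in parallel.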

topic_t TypeTopicCounts::get_num_topics (  ) 

The number of topics being learnt

word_t TypeTopicCounts::get_num_words (  ) 

The number of unique words for which the topic counts are maintained

pair< TopKList **, TopKList * > TypeTopicCounts::get_topic_stats (  ) 

Get the TopicStats per topic & the hot/top NUM_TOP_TOPICS topics. Used to print the topic stats for the top topics

void TypeTopicCounts::init ( topic_t  num_topics_  )  [protected]
void TypeTopicCounts::initialize ( topicCounts **  wtc,
atomic< topic_t > *  tc 
)

Initializes the structure using an array of topicCounts. Used in testing. Refer to TypeTopicCountsTest.cpp

void TypeTopicCounts::initialize ( topicCounts *  wtc,
atomic< topic_t > *  tc,
word_t  word = 0 
)

Initializes the structure using explicit topicCounts for word. Used in testing. Refer to TypeTopicCountsTest.cpp

void TypeTopicCounts::initialize_from_docs ( string  wfname,
string  tfname 
)

Reads documents using DocumentReader. For each document and for each word encountered, it updates the topic counts for that word based on the topic assignment.
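
The counting step can be sketched as follows. It is a simplified illustration: documents are passed as in-memory vectors instead of being read with DocumentReader from wfname and tfname, the counts are dense rather than stored in topicCounts, and Counts and count_assignments are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense result: n(w, t) per word and n(t) per topic.
struct Counts {
    std::vector<std::vector<int>> word_topic;
    std::vector<int> per_topic;
};

// doc_words[d][i] is the word id of token i in document d and
// doc_topics[d][i] is the topic assigned to that token.
Counts count_assignments(const std::vector<std::vector<int>>& doc_words,
                         const std::vector<std::vector<int>>& doc_topics,
                         int num_words, int num_topics) {
    assert(doc_words.size() == doc_topics.size());
    Counts c{std::vector<std::vector<int>>(num_words,
                                           std::vector<int>(num_topics, 0)),
             std::vector<int>(num_topics, 0)};
    for (std::size_t d = 0; d < doc_words.size(); ++d) {
        assert(doc_words[d].size() == doc_topics[d].size());
        for (std::size_t i = 0; i < doc_words[d].size(); ++i) {
            const int w = doc_words[d][i];
            const int t = doc_topics[d][i];
            ++c.word_topic[w][t];  // topic count for this word
            ++c.per_topic[t];      // total tokens assigned to this topic
        }
    }
    return c;
}
```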

bool TypeTopicCounts::initialize_from_dump ( string  fname,
WordIndexDictionary *  local_dict = NULL,
WordIndexDictionary *  global_dict = NULL,
size_t  offset = 0 
)

Reads the serialized dump (fname), stored in protobuf format, into memory. The dump is generated by the dump() method and is read with DocumentReader. It contains num_words topicCounts entries followed by a final entry holding n(t), so the num_words topicCounts are read first and then n(t).
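
The record order described above (num_words topicCounts entries, then n(t)) can be illustrated with a round trip through a stream. This sketch does not use protobuf, DocumentWriter or DocumentReader: it writes raw binary through iostreams, adds a small header of its own, and uses dense rows. Table, write_table and read_table are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

using count_t = std::uint32_t;

// Hypothetical dense table: one row of counts per word, plus n(t).
struct Table {
    std::vector<std::vector<count_t>> per_word;
    std::vector<count_t> per_topic;
};

// Writes a header, then every word row, then n(t) as the last record.
void write_table(std::ostream& out, const Table& t) {
    const std::uint32_t num_words = static_cast<std::uint32_t>(t.per_word.size());
    const std::uint32_t num_topics = static_cast<std::uint32_t>(t.per_topic.size());
    out.write(reinterpret_cast<const char*>(&num_words), sizeof num_words);
    out.write(reinterpret_cast<const char*>(&num_topics), sizeof num_topics);
    for (const auto& row : t.per_word) {
        assert(row.size() == num_topics);  // sketch assumes dense rows
        out.write(reinterpret_cast<const char*>(row.data()),
                  num_topics * sizeof(count_t));
    }
    out.write(reinterpret_cast<const char*>(t.per_topic.data()),
              num_topics * sizeof(count_t));
}

// Reads the records back in the same order; returns false on a short read.
bool read_table(std::istream& in, Table& t) {
    std::uint32_t num_words = 0, num_topics = 0;
    in.read(reinterpret_cast<char*>(&num_words), sizeof num_words);
    in.read(reinterpret_cast<char*>(&num_topics), sizeof num_topics);
    if (!in) return false;
    t.per_word.assign(num_words, std::vector<count_t>(num_topics));
    for (auto& row : t.per_word)
        in.read(reinterpret_cast<char*>(row.data()), num_topics * sizeof(count_t));
    t.per_topic.assign(num_topics, 0);
    in.read(reinterpret_cast<char*>(t.per_topic.data()),
            num_topics * sizeof(count_t));
    return static_cast<bool>(in);
}
```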

void TypeTopicCounts::initialize_from_string ( word_t  word,
string &  counts 
)
void TypeTopicCounts::initialize_from_ttc ( TypeTopicCounts *  ttc  ) 
void TypeTopicCounts::print (  ) 

Dump the structure to log(INFO)

string TypeTopicCounts::print ( word_t  word  ) 
void TypeTopicCounts::replace ( word_t  word,
topicCounts &  tc 
)

Replace the counts for this word with those in tc

void TypeTopicCounts::upd_count ( word_t  word,
mapped_vec  delta,
string  dbg = "" 
)

Updates the counts for word using the delta map, which maps each topic to its count delta. A lock is obtained on the word and the update is delegated to the relevant topicCounts. Used by the background synchronizers.
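
A delta update of this kind can be sketched as below. The types are stand-ins: WordRow and apply_delta are hypothetical, std::map replaces both mapped_vec and topicCounts, and std::mutex replaces word_mutex_t. Dropping topics whose count reaches zero is a choice made for the sketch, not a documented behavior.

```cpp
#include <cassert>
#include <map>
#include <mutex>

// Hypothetical per-word row: a lock plus sparse topic -> count entries.
struct WordRow {
    std::mutex lock;
    std::map<int, int> counts;
};

// Applies every (topic, delta) pair to the row while holding the word's lock.
void apply_delta(WordRow& row, const std::map<int, int>& delta) {
    std::lock_guard<std::mutex> guard(row.lock);
    for (const auto& kv : delta) {
        int& c = row.counts[kv.first];  // inserts 0 if the topic is new
        c += kv.second;
        assert(c >= 0);                 // counts must never go negative
        if (c == 0) row.counts.erase(kv.first);  // keep the row sparse
    }
}
```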

void TypeTopicCounts::upd_count ( word_t  word,
topic_t  old_topic,
topic_t  new_topic,
bool  ignore_old_topic = false 
)

Main update used by the updater filter. For word, decrements the count of old_topic by 1 and increments the count of new_topic by 1. The method acquires a lock on the word, finds the positions of the old and new topics, and updates them through the topicCounts vector's methods. Because the decrement and increment methods take an index so that updates are fast, pointer arithmetic on the position pointers is used to compute the index into the vector. Note also that the underlying structure packs the topic and the count into a single 64-bit integer, so referring to the topic or the count individually requires multiplying or dividing by 2.
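
The idea of keeping a topic and its count in one 64-bit value can be sketched as follows. The layout is an assumption (count in the upper 32 bits, topic in the lower 32 bits), the sketch uses shifts and a linear search instead of the pointer arithmetic described above, and it omits the per-word lock. pack, topic_of, count_of, find_topic and move_token are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using packed_t = std::uint64_t;

// Assumed layout: count in the high 32 bits, topic in the low 32 bits.
inline packed_t pack(std::uint32_t topic, std::uint32_t count) {
    return (static_cast<packed_t>(count) << 32) | topic;
}
inline std::uint32_t topic_of(packed_t e) {
    return static_cast<std::uint32_t>(e & 0xFFFFFFFFu);
}
inline std::uint32_t count_of(packed_t e) {
    return static_cast<std::uint32_t>(e >> 32);
}

// Returns the index of topic in the row, or -1 if it is absent.
int find_topic(const std::vector<packed_t>& row, std::uint32_t topic) {
    for (std::size_t i = 0; i < row.size(); ++i)
        if (topic_of(row[i]) == topic) return static_cast<int>(i);
    return -1;
}

// Moves one token of a word from old_topic to new_topic.
void move_token(std::vector<packed_t>& row, std::uint32_t old_topic,
                std::uint32_t new_topic, bool ignore_old_topic = false) {
    if (!ignore_old_topic) {
        const int i = find_topic(row, old_topic);
        assert(i >= 0 && count_of(row[i]) > 0);
        const std::uint32_t c = count_of(row[i]) - 1;
        if (c == 0) row.erase(row.begin() + i);  // drop empty entries
        else row[i] = pack(old_topic, c);
    }
    const int j = find_topic(row, new_topic);
    if (j < 0) row.push_back(pack(new_topic, 1));
    else row[j] = pack(new_topic, count_of(row[j]) + 1);
}
```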

int TypeTopicCounts::verify_header ( DocumentReader &  doc_rdr  ) 

Friends And Related Function Documentation

friend class Memcached_Synchronizer [friend]

Member Data Documentation

topic_t TypeTopicCounts::num_topics [protected]
atomic<topic_t>* TypeTopicCounts::tokens_per_topic [protected]

The documentation for this class was generated from the following files:
Generated on Tue Jul 19 11:45:28 2011 for Y!LDA by  doxygen 1.6.3