public class Word2Vec extends Object implements scala.Serializable, Logging
This implementation uses the skip-gram model and trains it with hierarchical softmax. The variable names in the implementation match those in the original C implementation.
For the original C implementation, see https://code.google.com/p/word2vec/. For research papers, see "Efficient Estimation of Word Representations in Vector Space" and "Distributed Representations of Words and Phrases and their Compositionality".
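To make the skip-gram idea concrete, the sketch below enumerates the (center word, context word) training pairs that a fixed window produces for one sentence. It is an illustration only, not the code used by this class: the class name `SkipGramPairs` and the method `pairs` are hypothetical, and the window here has a fixed size, whereas the original C implementation shrinks the window randomly for each position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: lists skip-gram training pairs.
public class SkipGramPairs {

    /**
     * Returns one {center, context} pair for every word that lies within
     * `window` positions of each center word in the sentence.
     */
    public static List<String[]> pairs(List<String> sentence, int window) {
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < sentence.size(); i++) {
            // Clamp the window to the sentence boundaries.
            int lo = Math.max(0, i - window);
            int hi = Math.min(sentence.size() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (j != i) {
                    out.add(new String[] {sentence.get(i), sentence.get(j)});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("the", "quick", "brown", "fox");
        for (String[] p : pairs(sentence, 1)) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

With a window of 1, each interior word contributes two pairs and each boundary word contributes one. The model is trained to predict the context word of each pair from its center word.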
| Constructor and Description | 
|---|
| Word2Vec() |
| Modifier and Type | Method and Description | 
|---|---|
| <S extends Iterable<String>> Word2VecModel | fit(JavaRDD<S> dataset) Computes the vector representation of each word in vocabulary (Java version). |
| <S extends scala.collection.Iterable<String>> Word2VecModel | fit(RDD<S> dataset) Computes the vector representation of each word in vocabulary. |
| Word2Vec | setLearningRate(double learningRate) Sets initial learning rate (default: 0.025). |
| Word2Vec | setMaxSentenceLength(int maxSentenceLength) Sets the maximum length (in words) of each sentence in the input data. |
| Word2Vec | setMinCount(int minCount) Sets minCount, the minimum number of times a token must appear to be included in the word2vec model's vocabulary (default: 5). |
| Word2Vec | setNumIterations(int numIterations) Sets number of iterations (default: 1), which should be smaller than or equal to number of partitions. |
| Word2Vec | setNumPartitions(int numPartitions) Sets number of partitions (default: 1). |
| Word2Vec | setSeed(long seed) Sets random seed (default: a random long integer). |
| Word2Vec | setVectorSize(int vectorSize) Sets vector size (default: 100). |
| Word2Vec | setWindowSize(int window) Sets the window of words (default: 5). |
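The effect of `minCount` can be pictured as a frequency filter over the corpus. The sketch below counts tokens across all sentences and keeps only those that appear at least `minCount` times. It is a plain-Java illustration under stated assumptions, not Spark's distributed implementation; the class `VocabFilter` and the method `buildVocab` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Spark: illustrates minCount filtering.
public class VocabFilter {

    /** Counts every token and drops those seen fewer than minCount times. */
    public static Map<String, Integer> buildVocab(List<List<String>> sentences, int minCount) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> sentence : sentences) {
            for (String word : sentence) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        // Removing from the values view also removes the matching keys.
        counts.values().removeIf(c -> c < minCount);
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
            Arrays.asList("a", "b", "a"),
            Arrays.asList("a", "c", "b"));
        System.out.println(buildVocab(corpus, 2));
    }
}
```

Tokens that fall below the threshold get no vector, so with the default of 5 a small test corpus can end up with an empty vocabulary unless `minCount` is lowered.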
Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Logging: initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning

public <S extends scala.collection.Iterable<String>> Word2VecModel fit(RDD<S> dataset)
dataset - an RDD of sentences, each sentence expressed as an iterable collection of words

public <S extends Iterable<String>> Word2VecModel fit(JavaRDD<S> dataset)
dataset - a JavaRDD of sentences, each sentence expressed as an iterable collection of words

public Word2Vec setLearningRate(double learningRate)
learningRate - the initial learning rate (default: 0.025)

public Word2Vec setMaxSentenceLength(int maxSentenceLength)
Any sentence longer than this threshold is divided into chunks of up to maxSentenceLength size (default: 1000).
maxSentenceLength - the maximum length (in words) of each sentence in the input data

public Word2Vec setMinCount(int minCount)
minCount - the minimum number of times a token must appear to be included in the vocabulary (default: 5)

public Word2Vec setNumIterations(int numIterations)
numIterations - the number of iterations (default: 1); should be smaller than or equal to the number of partitions

public Word2Vec setNumPartitions(int numPartitions)
numPartitions - the number of partitions (default: 1)

public Word2Vec setSeed(long seed)
seed - the random seed (default: a random long integer)

public Word2Vec setVectorSize(int vectorSize)
vectorSize - the vector size (default: 100)

public Word2Vec setWindowSize(int window)
window - the window of words (default: 5)
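For setMaxSentenceLength, sentences longer than the limit are split into chunks of at most maxSentenceLength words. A minimal sketch of that splitting follows, assuming simple consecutive chunks. The class `SentenceChunker` and the method `chunk` are hypothetical names, and Spark's own code may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: illustrates sentence chunking.
public class SentenceChunker {

    /** Splits a sentence into consecutive chunks of at most maxSentenceLength words. */
    public static List<List<String>> chunk(List<String> sentence, int maxSentenceLength) {
        if (maxSentenceLength <= 0) {
            throw new IllegalArgumentException("maxSentenceLength must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < sentence.size(); start += maxSentenceLength) {
            int end = Math.min(sentence.size(), start + maxSentenceLength);
            // Copy the sublist so each chunk is independent of the input list.
            chunks.add(new ArrayList<>(sentence.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList("w1", "w2", "w3", "w4", "w5");
        System.out.println(chunk(sentence, 2));
    }
}
```

Because context windows do not cross chunk boundaries in this sketch, a very small limit would cut off context for words near the edges of each chunk.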