public class TrecContentSource extends ContentSource
Implements a ContentSource over the TREC collection.
Supports the following configuration parameters (on top of those supported by ContentSource):
HTMLParser class to use for parsing the content of the TREC documents (default=DemoHTMLParser).
Fields inherited from class ContentSource: BUFFER_SIZE, encoding, forever, logStep, verbose

| Constructor and Description |
|---|
| TrecContentSource() |

| Modifier and Type | Method and Description |
|---|---|
| void | close() Called when reading from this content source is no longer required. |
| DocData | getNextDocData(DocData docData) Returns the next DocData from the content source. |
| void | resetInputs() Resets the input for this content source, so that the test would behave as if it was just started, input-wise. |
| void | setConfig(Config config) Sets the Config for this content source. |
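
The call sequence implied by the method table (read documents until the source is exhausted, optionally reset, then close) can be sketched as below. This is a minimal illustration, not the benchmark module's code: every type prefixed with `Sketch`, plus `InMemorySource` and `ContentSourceLifecycleSketch`, is a hypothetical stand-in that copies only the method shapes listed above so the example compiles without Lucene on the classpath. Reusing the caller's `DocData` instance and counting documents in the base class are assumptions of the sketch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver: drains a source, resets it, and drains it again.
class ContentSourceLifecycleSketch {
    // Reads documents until the source signals that no more data is available.
    static List<String> drain(SketchContentSource source) throws IOException {
        List<String> bodies = new ArrayList<>();
        SketchDocData reusable = new SketchDocData();
        try {
            while (true) {
                bodies.add(source.getNextDocData(reusable).body);
            }
        } catch (SketchNoMoreDataException endOfInput) {
            // Expected: the source is exhausted.
        }
        return bodies;
    }

    public static void main(String[] args) throws IOException {
        try (InMemorySource source = new InMemorySource(List.of("doc A", "doc B"))) {
            System.out.println(drain(source));
            source.resetInputs(); // start over, as if no document had been read
            System.out.println(drain(source));
        } // close() runs here, once reading is no longer required
    }
}

// Stand-in for NoMoreDataException.
class SketchNoMoreDataException extends Exception {}

// Stand-in for DocData; only a body field is modelled.
class SketchDocData {
    String body;
}

// Stand-in for ContentSource: counts documents and clears the count on reset.
abstract class SketchContentSource implements Closeable {
    private int docsCount; // documents handed out since the last reset

    protected void addDoc() { docsCount++; }

    int getDocsCount() { return docsCount; }

    abstract SketchDocData getNextDocData(SketchDocData docData)
            throws SketchNoMoreDataException, IOException;

    void resetInputs() throws IOException { docsCount = 0; }
}

// Stand-in for a concrete source; serves documents from a list.
class InMemorySource extends SketchContentSource {
    private final List<String> docs;
    private int next;

    InMemorySource(List<String> docs) { this.docs = docs; }

    @Override
    SketchDocData getNextDocData(SketchDocData docData) throws SketchNoMoreDataException {
        if (next >= docs.size()) {
            throw new SketchNoMoreDataException();
        }
        docData.body = docs.get(next++); // fill the instance passed in by the caller
        addDoc();
        return docData;
    }

    @Override
    void resetInputs() throws IOException {
        super.resetInputs(); // let the base class reset its counter first
        next = 0;
    }

    @Override
    public void close() {
        // Nothing to release in this sketch.
    }
}
```

The subclass calls `super.resetInputs()` before rewinding its own cursor, so the base-class counter and the read position are reset together.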
Methods inherited from class ContentSource: addBytes, addDoc, collectFiles, getBytesCount, getConfig, getDocsCount, getInputStream, getTotalBytesCount, getTotalDocsCount, shouldLog

public void close() throws java.io.IOException
Description copied from class: ContentSource
Called when reading from this content source is no longer required.
Specified by: close in class ContentSource
Throws: java.io.IOException

public DocData getNextDocData(DocData docData) throws NoMoreDataException, java.io.IOException
Description copied from class: ContentSource
Returns the next DocData from the content source.
Specified by: getNextDocData in class ContentSource
Throws: NoMoreDataException, java.io.IOException

public void resetInputs() throws java.io.IOException
Description copied from class: ContentSource
Resets the input for this content source, so that the test would behave as if it was just started, input-wise.
NOTE: the default implementation resets the number of bytes and documents generated since the last reset, so it's important to call super.resetInputs in case you override this method.
Overrides: resetInputs in class ContentSource
Throws: java.io.IOException

public void setConfig(Config config)
Description copied from class: ContentSource
Sets the Config for this content source. If you override this method, you must call super.setConfig.
Overrides: setConfig in class ContentSource

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.