com.univocity.api.entity.text
Class TextEntityConfiguration<F extends TextFormat>

java.lang.Object
  extended by com.univocity.api.entity.Configuration
      extended by com.univocity.api.entity.text.TextEntityConfiguration<F>
Type Parameters:
F - the configuration class that manages a specific text format.
Direct Known Subclasses:
CsvEntityConfiguration, FixedWidthEntityConfiguration, TsvEntityConfiguration

public abstract class TextEntityConfiguration<F extends TextFormat>
extends Configuration

This is the parent class for all configuration classes used by text-based data entities. It provides essential configuration settings and sensible defaults for reading from and writing to text in conformance to a particular format (such as CSV, for example).

Author:
uniVocity Software Pty Ltd - dev@univocity.com

Field Summary
protected  int[] fieldLengths
           
protected  String[] headers
           
protected  String[] identifiers
           
 
Constructor Summary
protected TextEntityConfiguration()
           
 
Method Summary
protected  void copyDefaultsFrom(Configuration defaultConfig)
          Applies default values to undefined settings using a Configuration object.
 int[] getFieldLengths()
          Returns the length of each column of records returned by the data entity.
 F getFormat()
          Returns the input/output format settings for a given text.
 String[] getHeaders()
          Returns the sequence of field names used to refer to columns in the input/output text of an entity.
 String[] getIdentifiers()
          Returns the sequence of field names to use as data entity identifiers.
 boolean getIgnoreLeadingWhitespaces()
          Determines whether to remove leading white spaces from values being read/written.
 boolean getIgnoreTrailingWhitespaces()
          Determines whether to remove trailing white spaces from values being read/written.
 int getInputBufferSize()
          Returns the number of characters held by the entity buffer when reading from the input.
 int getMaxCharsPerColumn()
          Returns the maximum number of characters allowed for any given value being written/read.
 int getMaxColumns()
          Returns the hard limit on how many columns a record can have.
 String getNullValue()
          Returns the default value used in substitution of null when there are empty fields in a text record.
 int getNumberOfRecordsToRead()
          Returns the number of valid records to be parsed before the reading process is stopped.
 boolean getReadInputOnSeparateThread()
          Indicates whether or not a separate thread will be used to read characters from the input while parsing.
 boolean getSkipEmptyLines()
          Determines whether to skip empty lines of text. When reading: if the entity reads an empty line from the input, it is discarded. When writing: if the entity receives an empty or null row to write to the output, it is ignored. Defaults to true.
 boolean isHeaderExtractionEnabled()
          Indicates whether or not the first valid record parsed from the input should be used to derive the names of each column of this entity.
 boolean isHeaderWritingEnabled()
          Indicates whether or not to write headers to the output when writing records to an empty entity.
protected abstract  F newDefaultFormat()
          Creates a new instance of a text format configuration.
 void setFieldLengths(int... fieldLengths)
          Associates a length to each column of records returned by the data entity.
 void setFieldsAndLengths(LinkedHashMap<String,Integer> fields)
          Defines a sequence of field names used to refer to columns in the input/output text of an entity, along with their lengths.
 void setFormat(F format)
          Defines the input/output format settings for a given text.
 void setHeaderExtractionEnabled(boolean extractHeaders)
          Defines whether or not the first valid record parsed from the input should be used to derive the names of each column of this entity.
 void setHeaders(String... headers)
          Defines a sequence of field names used to refer to columns in the input/output text of an entity.
 void setHeaderWritingEnabled(boolean headerWritingEnabled)
          Defines whether or not to write headers to the output when writing records to an empty entity.
 void setIdentifiers(String... identifiers)
          Defines a sequence of field names to use as data entity identifiers.
 void setIgnoreLeadingWhitespaces(boolean ignoreLeadingWhitespaces)
          Determines whether to remove leading white spaces from values being read/written.
 void setIgnoreTrailingWhitespaces(boolean ignoreTrailingWhitespaces)
          Determines whether to remove trailing white spaces from values being read/written.
 void setInputBufferSize(int inputBufferSize)
          Defines the number of characters held by the entity buffer when reading from the input.
 void setMaxCharsPerColumn(int maxCharsPerColumn)
          Defines the maximum number of characters allowed for any given value being written/read.
 void setMaxColumns(int maxColumns)
          Defines a hard limit on how many columns a record can have.
 void setNullValue(String nullValue)
          Defines a default value to be used in substitution of null when there are empty fields in a text record.
 void setNumberOfRecordsToRead(int numberOfRecordsToRead)
          Defines the number of valid records to be parsed before the reading process is stopped.
 void setReadInputOnSeparateThread(boolean readInputOnSeparateThread)
          Defines whether or not a separate thread will be used to read characters from the input while parsing.
 void setSkipEmptyLines(boolean skipEmptyLines)
          Determines whether to skip empty lines of text. When reading: if the entity reads an empty line from the input, it is discarded. When writing: if the entity receives an empty or null row to write to the output, it is ignored.
protected  void validateHeaders(String[] headers, String[] identifiers, int[] lengths)
          Validates headers and information associated with them.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

headers

protected String[] headers

identifiers

protected String[] identifiers

fieldLengths

protected int[] fieldLengths
Constructor Detail

TextEntityConfiguration

protected TextEntityConfiguration()
Method Detail

getHeaders

public String[] getHeaders()
Returns the sequence of field names used to refer to columns in the input/output text of an entity. This overrides any headers extracted from a text input (when isHeaderExtractionEnabled() == true).

Defaults to null.

Returns:
the field name sequence to be associated with each column in the input/output.

setHeaders

public void setHeaders(String... headers)
Defines a sequence of field names used to refer to columns in the input/output text of an entity. This overrides any headers extracted from a text input (when isHeaderExtractionEnabled() == true).

Parameters:
headers - the field name sequence to be associated with each column in the input/output.
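The precedence rule above (explicit headers win over headers extracted from the first record) can be modelled in a few lines. This is a hypothetical sketch, not uniVocity's implementation; `HeaderResolution` and `resolveHeaders` are illustrative names, and the list of records stands in for whatever the entity has parsed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of header resolution; not part of the uniVocity API.
final class HeaderResolution {
    private HeaderResolution() {}

    /**
     * @param explicitHeaders headers given via setHeaders(...), or null if none
     * @param extractEnabled  value of isHeaderExtractionEnabled()
     * @param records         parsed records; the first one is the candidate header row
     * @return the field names to use, or null if none are available
     */
    static String[] resolveHeaders(String[] explicitHeaders, boolean extractEnabled, List<String[]> records) {
        if (explicitHeaders != null) {
            // Explicitly defined headers override anything extracted from the input.
            return explicitHeaders;
        }
        if (extractEnabled && !records.isEmpty()) {
            // Otherwise, derive the column names from the first valid record.
            return records.get(0);
        }
        return null; // headers default to null
    }
}
```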

setIdentifiers

public void setIdentifiers(String... identifiers)
Defines a sequence of field names to use as data entity identifiers.

Parameters:
identifiers - sequence of field names to use as identifiers of each record of the data entity

getIdentifiers

public String[] getIdentifiers()
Returns the sequence of field names to use as data entity identifiers.

Defaults to null.

Returns:
the sequence of field names to use as identifiers of each record of the data entity

getNumberOfRecordsToRead

public int getNumberOfRecordsToRead()
Returns the number of valid records to be parsed before the reading process is stopped.
A negative value indicates there's no limit and all records in the input will be read.

Defaults to -1.

Returns:
the number of records to read before stopping the reading process.

setNumberOfRecordsToRead

public void setNumberOfRecordsToRead(int numberOfRecordsToRead)
Defines the number of valid records to be parsed before the reading process is stopped.
A negative value indicates there's no limit and all records in the input will be read.

Defaults to -1.

Parameters:
numberOfRecordsToRead - the number of records to read before stopping the reading process.
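The limit semantics (stop after N valid records; a negative value means no limit) can be sketched as a simple loop. This is a hypothetical model: `RecordLimit.read` is an illustrative name, and the input list stands in for records the entity has already validated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of numberOfRecordsToRead; not uniVocity's parser.
final class RecordLimit {
    private RecordLimit() {}

    /** Collects records until the limit is reached; a negative limit reads everything. */
    static List<String> read(List<String> validRecords, int numberOfRecordsToRead) {
        List<String> out = new ArrayList<>();
        for (String record : validRecords) {
            if (numberOfRecordsToRead >= 0 && out.size() >= numberOfRecordsToRead) {
                break; // limit reached: stop the reading process
            }
            out.add(record);
        }
        return out;
    }
}
```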

getFieldLengths

public int[] getFieldLengths()
Returns the length of each column of records returned by the data entity.
This information is only used when enabling database-like operations on the entity via DataStoreConfiguration.enableDatabaseOperationsIn(String...).

Defaults to null.

Returns:
the length of each field of the records produced by the data entity.

setFieldLengths

public void setFieldLengths(int... fieldLengths)
Associates a length to each column of records returned by the data entity.
This information is only required when enabling database-like operations on the entity through DataStoreConfiguration.enableDatabaseOperationsIn(String...).

Parameters:
fieldLengths - the length of each field of the records produced by the data entity.

validateHeaders

protected void validateHeaders(String[] headers,
                               String[] identifiers,
                               int[] lengths)
Validates headers and information associated with them.

Parameters:
headers - the headers to validate
identifiers - the identifiers that can be found among the given headers
lengths - the lengths of each field
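The parameter descriptions suggest what such a validation checks: identifiers must be found among the headers, and there should be one length per field. The sketch below encodes those two rules plus a non-empty header check; all three rules are assumptions for illustration, since the exact checks performed by validateHeaders are not documented here, and `HeaderValidation` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical rules for illustration; not the checks uniVocity actually performs.
final class HeaderValidation {
    private HeaderValidation() {}

    static void validate(String[] headers, String[] identifiers, int[] lengths) {
        if (headers == null || headers.length == 0) {
            throw new IllegalArgumentException("No headers defined");
        }
        Set<String> names = new HashSet<>(Arrays.asList(headers));
        if (identifiers != null) {
            for (String id : identifiers) {
                // Each identifier must name one of the header fields.
                if (!names.contains(id)) {
                    throw new IllegalArgumentException(
                            "Identifier '" + id + "' not found in " + Arrays.toString(headers));
                }
            }
        }
        // When lengths are given, there must be exactly one per header.
        if (lengths != null && lengths.length != headers.length) {
            throw new IllegalArgumentException(
                    "Expected " + headers.length + " field lengths, got " + lengths.length);
        }
    }
}
```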

setFieldsAndLengths

public void setFieldsAndLengths(LinkedHashMap<String,Integer> fields)
Defines a sequence of field names used to refer to columns in the input/output text of an entity, along with their lengths. This overrides any headers extracted from a text input (when isHeaderExtractionEnabled() == true).

Parameters:
fields - a LinkedHashMap containing the sequence of fields to be associated with each column in the input/output, with their respective lengths.
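A LinkedHashMap is required because it iterates in insertion order, so the column order you put() is the column order you get. The sketch below shows how such a map splits into the two parallel arrays that setHeaders(...) and setFieldLengths(...) would otherwise receive separately. `FieldsAndLengths` is a hypothetical class, not the library's internal code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: one ordered map yields parallel header/length arrays.
final class FieldsAndLengths {
    final String[] headers;
    final int[] lengths;

    FieldsAndLengths(LinkedHashMap<String, Integer> fields) {
        headers = new String[fields.size()];
        lengths = new int[fields.size()];
        int i = 0;
        // LinkedHashMap iterates in insertion order, so column order is preserved.
        for (Map.Entry<String, Integer> e : fields.entrySet()) {
            headers[i] = e.getKey();
            lengths[i] = e.getValue();
            i++;
        }
    }
}
```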

getSkipEmptyLines

public final boolean getSkipEmptyLines()
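Determines whether to skip empty lines of text. When reading, empty lines read from the input are discarded; when writing, empty or null rows are ignored.

Defaults to true.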

Returns:
a flag indicating whether or not empty lines should be skipped.

setSkipEmptyLines

public final void setSkipEmptyLines(boolean skipEmptyLines)
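Determines whether to skip empty lines of text. When reading, empty lines read from the input are discarded; when writing, empty or null rows are ignored.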

Parameters:
skipEmptyLines - a flag indicating whether or not empty lines should be skipped.
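Both halves of the flag's behaviour (discard empty input lines on read, ignore empty or null rows on write) are sketched below. This is a hypothetical model; `EmptyLines`, `readLines` and `shouldWrite` are illustrative names, and treating only zero-length lines as "empty" is an assumption.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of skipEmptyLines; not uniVocity's reader/writer.
final class EmptyLines {
    private EmptyLines() {}

    /** Reading: returns the input lines, dropping zero-length ones when skipping is enabled. */
    static List<String> readLines(String text, boolean skipEmptyLines) {
        // BufferedReader.lines() wraps I/O errors in UncheckedIOException; a StringReader needs no closing.
        return new BufferedReader(new StringReader(text)).lines()
                .filter(line -> !(skipEmptyLines && line.isEmpty()))
                .collect(Collectors.toList());
    }

    /** Writing: an empty or null row is ignored when skipping is enabled. */
    static boolean shouldWrite(String[] row, boolean skipEmptyLines) {
        boolean empty = row == null || row.length == 0;
        return !(skipEmptyLines && empty);
    }
}
```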

getIgnoreTrailingWhitespaces

public final boolean getIgnoreTrailingWhitespaces()
Determines whether to remove trailing white spaces from values being read/written.

Defaults to true.

Returns:
true if trailing white spaces should be removed from values of this entity; false otherwise

setIgnoreTrailingWhitespaces

public final void setIgnoreTrailingWhitespaces(boolean ignoreTrailingWhitespaces)
Determines whether to remove trailing white spaces from values being read/written.

Parameters:
ignoreTrailingWhitespaces - flag indicating whether trailing white spaces should be removed from values of this entity.

isHeaderExtractionEnabled

public final boolean isHeaderExtractionEnabled()
Indicates whether or not the first valid record parsed from the input should be used to derive the names of each column of this entity.

Defaults to false.

Returns:
true if the first valid record parsed from the input should be used to derive the names of each column, false otherwise

setHeaderExtractionEnabled

public final void setHeaderExtractionEnabled(boolean extractHeaders)
Defines whether or not the first valid record parsed from the input should be used to derive the names of each column of this entity.

Parameters:
extractHeaders - a flag indicating whether the first valid record parsed from the input should be used to derive the names of each column of this entity.

getIgnoreLeadingWhitespaces

public final boolean getIgnoreLeadingWhitespaces()
Determines whether to remove leading white spaces from values being read/written.

Defaults to true.

Returns:
true if leading white spaces should be removed from values of this entity; false otherwise.

setIgnoreLeadingWhitespaces

public final void setIgnoreLeadingWhitespaces(boolean ignoreLeadingWhitespaces)
Determines whether to remove leading white spaces from values being read/written.

Defaults to true.

Parameters:
ignoreLeadingWhitespaces - true if leading white spaces should be removed from values of this entity.
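The leading and trailing flags are independent, which gives four combinations. The sketch below applies them to a single value; it is a hypothetical model (`WhitespaceTrim.apply` is an illustrative name), and treating every character up to and including ' ' as white space, the way String.trim() does, is an assumption.

```java
// Hypothetical model of the two whitespace flags; not uniVocity's implementation.
final class WhitespaceTrim {
    private WhitespaceTrim() {}

    // Assumption: "white space" means any char <= ' ', as in String.trim().
    static String apply(String value, boolean ignoreLeading, boolean ignoreTrailing) {
        if (value == null) {
            return null;
        }
        int start = 0;
        int end = value.length();
        if (ignoreLeading) {
            while (start < end && value.charAt(start) <= ' ') {
                start++; // skip leading white space
            }
        }
        if (ignoreTrailing) {
            while (end > start && value.charAt(end - 1) <= ' ') {
                end--; // drop trailing white space
            }
        }
        return value.substring(start, end);
    }
}
```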

getInputBufferSize

public final int getInputBufferSize()
Returns the number of characters held by the entity buffer when reading from the input.

Defaults to 1024*1024 characters (i.e. 1,048,576 characters).

Returns:
the number of characters held by the entity buffer when reading from the input

setInputBufferSize

public final void setInputBufferSize(int inputBufferSize)
Defines the number of characters held by the entity buffer when reading from the input.

Parameters:
inputBufferSize - the new input buffer size (in number of characters)
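Because the buffer size is measured in characters rather than bytes, its memory cost depends on how characters are stored. Assuming the buffer is backed by a Java char[] (an assumption about the implementation), each character takes Character.BYTES = 2 bytes, so the documented default of 1024 * 1024 = 1,048,576 characters costs about 2 MiB per entity being read. `BufferSizing` is a hypothetical helper.

```java
// Hypothetical helper for the buffer-size arithmetic; assumes a char[]-backed buffer.
final class BufferSizing {
    // Documented default, in characters.
    static final int DEFAULT_INPUT_BUFFER_SIZE = 1024 * 1024;

    private BufferSizing() {}

    /** Approximate heap used by a char[] of the given length, ignoring the array header. */
    static long bufferBytes(int chars) {
        return (long) chars * Character.BYTES; // Character.BYTES == 2
    }
}
```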

getReadInputOnSeparateThread

public final boolean getReadInputOnSeparateThread()
Indicates whether or not a separate thread will be used to read characters from the input while parsing. Defaults to true if the number of processors available at runtime is greater than 1.

Returns:
true if the input should be read on a separate thread, false otherwise

setReadInputOnSeparateThread

public final void setReadInputOnSeparateThread(boolean readInputOnSeparateThread)
Defines whether or not a separate thread will be used to read characters from the input while parsing.

Parameters:
readInputOnSeparateThread - the flag indicating whether or not the input should be read on a separate thread
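The documented default corresponds to Runtime.getRuntime().availableProcessors() > 1. The idea behind the flag is a producer/consumer split: one thread fills buffers from the input while the parsing thread consumes them, so I/O and parsing overlap. The sketch below is a generic illustration of that pattern with a bounded queue, not uniVocity's concurrent reader; `ConcurrentInput`, `readAll` and the chunk size are hypothetical, and the consumer merely concatenates chunks where a real parser would tokenize them.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical producer/consumer sketch; not uniVocity's implementation.
final class ConcurrentInput {
    private static final char[] END = new char[0]; // sentinel: no more input

    private ConcurrentInput() {}

    /** Mirrors the documented default: true when more than one processor is available. */
    static boolean defaultReadOnSeparateThread() {
        return Runtime.getRuntime().availableProcessors() > 1;
    }

    /** Reads input on a background thread while the calling thread consumes the chunks. */
    static String readAll(Reader input, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        BlockingQueue<char[]> queue = new ArrayBlockingQueue<>(4); // bounded: producer waits if parsing lags
        AtomicReference<IOException> failure = new AtomicReference<>();

        Thread producer = new Thread(() -> {
            char[] buf = new char[chunkSize];
            try {
                try {
                    int n;
                    while ((n = input.read(buf)) != -1) {
                        queue.put(Arrays.copyOf(buf, n));
                    }
                } catch (IOException e) {
                    failure.set(e); // reported to the consumer after END
                }
                queue.put(END); // always signal the end, even after a read failure
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        StringBuilder consumed = new StringBuilder();
        try {
            for (char[] chunk = queue.take(); chunk != END; chunk = queue.take()) {
                consumed.append(chunk); // a real parser would process characters here
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while reading input", e);
        }
        if (failure.get() != null) {
            throw new UncheckedIOException(failure.get());
        }
        return consumed.toString();
    }
}
```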

setNullValue

public final void setNullValue(String nullValue)
Defines a default value to be used in substitution of null when there are empty fields in a text record.

Parameters:
nullValue - a default value used instead of null for reading and writing.

getNullValue

public final String getNullValue()
Returns the default value used in substitution of null when there are empty fields in a text record.

Defaults to null.

Returns:
a default value used instead of null for reading and writing.
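The substitution works in both directions: an empty field read from the text would otherwise yield null, and a null value being written would otherwise produce an empty field. A hypothetical sketch of both sides follows; `NullSubstitution`, `onRead` and `onWrite` are illustrative names, and the exact point where uniVocity applies the substitution is an assumption.

```java
// Hypothetical model of nullValue substitution; not uniVocity's implementation.
final class NullSubstitution {
    private NullSubstitution() {}

    /** Reading: an empty (or missing) field becomes nullValue, which itself defaults to null. */
    static String onRead(String field, String nullValue) {
        return (field == null || field.isEmpty()) ? nullValue : field;
    }

    /** Writing: a null value is written out as nullValue instead. */
    static String onWrite(String value, String nullValue) {
        return value == null ? nullValue : value;
    }
}
```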

getMaxCharsPerColumn

public final int getMaxCharsPerColumn()
Returns the maximum number of characters allowed for any given value being written/read.
This limit guards against an OutOfMemoryError when the input does not have a valid format.
In such cases the entity might otherwise keep reading until the end of the input, or until memory is exhausted, crashing the JVM.

Defaults to 4096.

Returns:
the maximum number of characters any given field in a record can have

setMaxCharsPerColumn

public final void setMaxCharsPerColumn(int maxCharsPerColumn)
Defines the maximum number of characters allowed for any given value being written/read.
This limit guards against an OutOfMemoryError when the input does not have a valid format.
In such cases the entity might otherwise keep reading until the end of the input, or until memory is exhausted, crashing the JVM.

Parameters:
maxCharsPerColumn - the maximum number of characters any given field in a record can have
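A typical malformed input is a CSV value with an unclosed quote: the parser keeps treating every following character as part of the same value. A per-column cap turns that into an early, explicit error. The accumulator below is a hypothetical sketch of such a guard (`ColumnAccumulator` is an illustrative name, and the exception type is an assumption).

```java
// Hypothetical per-value guard; not uniVocity's implementation.
final class ColumnAccumulator {
    private final int maxCharsPerColumn;
    private final StringBuilder value = new StringBuilder();

    ColumnAccumulator(int maxCharsPerColumn) {
        this.maxCharsPerColumn = maxCharsPerColumn;
    }

    void append(char c) {
        if (value.length() >= maxCharsPerColumn) {
            // Fail fast instead of growing the buffer until the heap is exhausted.
            throw new IllegalStateException("Value exceeds " + maxCharsPerColumn + " characters");
        }
        value.append(c);
    }

    /** Returns the accumulated value and resets the accumulator for the next column. */
    String take() {
        String s = value.toString();
        value.setLength(0);
        return s;
    }
}
```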

getMaxColumns

public final int getMaxColumns()
Returns the hard limit on how many columns a record can have.
This limit guards against an OutOfMemoryError when the input does not have a valid format.
In such cases the entity might otherwise keep reading until the end of the input, or until memory is exhausted, crashing the JVM.

Defaults to 512.

Returns:
the maximum number of columns a record can have.

setMaxColumns

public final void setMaxColumns(int maxColumns)
Defines a hard limit on how many columns a record can have.
This limit guards against an OutOfMemoryError when the input does not have a valid format.
In such cases the entity might otherwise keep reading until the end of the input, or until memory is exhausted, crashing the JVM.

Parameters:
maxColumns - the maximum number of columns a record can have.
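The column limit plays the same role for the number of values in a record. The sketch below enforces it while splitting a line; it is a naive, hypothetical splitter (no quote handling, illustrative name `RowSplitter`) meant only to show where the check happens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical naive splitter (no quote handling) that enforces a column limit.
final class RowSplitter {
    private RowSplitter() {}

    static List<String> split(String line, char delimiter, int maxColumns) {
        List<String> columns = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= line.length(); i++) {
            // A column ends at each delimiter and at the end of the line.
            if (i == line.length() || line.charAt(i) == delimiter) {
                if (columns.size() == maxColumns) {
                    throw new IllegalStateException("Record has more than " + maxColumns + " columns");
                }
                columns.add(line.substring(start, i));
                start = i + 1;
            }
        }
        return columns;
    }
}
```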

isHeaderWritingEnabled

public final boolean isHeaderWritingEnabled()
Indicates whether or not to write headers to the output when writing records to an empty entity.

Note: write-only entities (i.e. obtained from WriterProvider) do not provide information about whether the output is empty or not. uniVocity will only attempt to write headers to such entities after a call to WriterProvider.clearDestination() is made.

Defaults to false.

Returns:
true if the headers should be written before adding records to an empty entity, false otherwise

setHeaderWritingEnabled

public final void setHeaderWritingEnabled(boolean headerWritingEnabled)
Defines whether or not to write headers to the output when writing records to an empty entity.

Note: write-only entities (i.e. obtained from WriterProvider) do not provide information about whether the output is empty or not. uniVocity will only attempt to write headers to such entities after a call to WriterProvider.clearDestination() is made.

Parameters:
headerWritingEnabled - true if the headers should be written before adding records to an empty entity, false otherwise
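The key condition is "empty entity": headers are written once, before the first record, and not again when more records are appended later. In the hypothetical sketch below the output is a StringBuilder whose emptiness is known; for a write-only entity that information is not available, which is why uniVocity waits for WriterProvider.clearDestination(). `HeaderWriting` is an illustrative name and the comma-separated output is an assumption.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of header writing; not uniVocity's writer.
final class HeaderWriting {
    private HeaderWriting() {}

    /** Appends rows as comma-separated lines, writing headers only when enabled and the output is empty. */
    static void write(StringBuilder out, String[] headers, List<String[]> rows, boolean headerWritingEnabled) {
        if (headerWritingEnabled && headers != null && out.length() == 0) {
            out.append(String.join(",", headers)).append('\n');
        }
        for (String[] row : rows) {
            out.append(String.join(",", row)).append('\n');
        }
    }
}
```

Calling write twice on the same output therefore produces a single header line followed by both batches of records.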

getFormat

public final F getFormat()
Returns the input/output format settings for a given text. Each text format requires specific configuration, but they all share common settings from TextFormat.

Returns:
the text format settings.

setFormat

public final void setFormat(F format)
Defines the input/output format settings for a given text. Each text format requires specific configuration, but they all share common settings from TextFormat.

Parameters:
format - the text format settings.

newDefaultFormat

protected abstract F newDefaultFormat()
Creates a new instance of a text format configuration.

Returns:
a new instance of a text format configuration.
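The type parameter F lets getFormat() return the concrete format class of each subclass (for example the format used by CsvEntityConfiguration), so format-specific settings are reachable without a cast, while newDefaultFormat() lets each subclass supply its own default. The miniature below illustrates that pattern with hypothetical classes (`BaseTextFormat`, `CsvLikeFormat`, `EntityConfig`, `CsvLikeConfig`); creating the default format lazily on first access is an assumption, not documented behaviour.

```java
// Hypothetical miniature of the TextEntityConfiguration/TextFormat pattern; all names are illustrative.
abstract class BaseTextFormat {
    String lineSeparator = "\n"; // a setting shared by every format
}

final class CsvLikeFormat extends BaseTextFormat {
    char delimiter = ','; // a format-specific setting
}

abstract class EntityConfig<F extends BaseTextFormat> {
    private F format;

    // Assumption: the default format is created lazily on first access.
    final F getFormat() {
        if (format == null) {
            format = newDefaultFormat();
        }
        return format;
    }

    final void setFormat(F format) {
        this.format = format;
    }

    /** Each subclass supplies the default settings for its own text format. */
    protected abstract F newDefaultFormat();
}

final class CsvLikeConfig extends EntityConfig<CsvLikeFormat> {
    @Override
    protected CsvLikeFormat newDefaultFormat() {
        return new CsvLikeFormat();
    }
}
```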

copyDefaultsFrom

protected void copyDefaultsFrom(Configuration defaultConfig)
Applies default values to undefined settings using a Configuration object.

Specified by:
copyDefaultsFrom in class Configuration
Parameters:
defaultConfig - a configuration object from where to obtain default settings.
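"Undefined settings" implies that each setting can be distinguished from a value the user set explicitly. One common way to model this is with boxed fields where null means undefined, as in the hypothetical sketch below; `Settings` and its fields are illustrative, and the default values used in the usage check are the ones documented on this page (512 columns, 4096 characters per column, skip empty lines).

```java
// Hypothetical model: boxed fields so that "undefined" can be represented as null.
final class Settings {
    Integer maxColumns;
    Integer maxCharsPerColumn;
    Boolean skipEmptyLines;

    /** Fills every undefined (null) setting from the defaults; explicitly defined settings are kept. */
    void copyDefaultsFrom(Settings defaults) {
        if (maxColumns == null) {
            maxColumns = defaults.maxColumns;
        }
        if (maxCharsPerColumn == null) {
            maxCharsPerColumn = defaults.maxCharsPerColumn;
        }
        if (skipEmptyLines == null) {
            skipEmptyLines = defaults.skipEmptyLines;
        }
    }
}
```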


Copyright © 2015 uniVocity Software Pty Ltd. All rights reserved.