
Parsing comma-separated value (CSV) resources

Data from a CSV resource can be parsed using the function kontrast.parseCSV. The main input to this function is the CSV resource as a string, passed as the source option. For example, you can load an external resource as a string using an XMLHttpRequest, as detailed in the documentation page on using external resources.

Common format for CSV files

The CSV format is specified by RFC 4180 (https://tools.ietf.org/html/rfc4180). The parseCSV function conforms to this specification, but its behavior can be adjusted with options to accept CSV input that does not conform to the specification.

By default, the function supports fields quoted with double quotation marks ("); a different character can be set with the quoteCharacter option. As per the RFC, a quotation mark inside a quoted field is escaped by doubling it ("").
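The quoting rules above can be illustrated with a small tokenizer. This is a minimal sketch, not the kontrast implementation: the function name tokenizeCSV and the { text, quoted } cell shape are hypothetical, and a quotation mark only opens a quoted field when it is the first character of a cell, as in RFC 4180.

```javascript
// Minimal RFC 4180-style tokenizer (illustrative sketch, not kontrast's source).
// Returns an array of records; each record is an array of { text, quoted } cells.
function tokenizeCSV(source, delimiter = ",", quote = '"') {
  const records = [];
  let record = [];
  let text = "";
  let quoted = false;   // the current cell started with a quote character
  let inQuotes = false; // we are currently inside a quoted section
  let i = 0;

  const endCell = () => {
    record.push({ text, quoted });
    text = "";
    quoted = false;
  };
  const endRecord = () => {
    endCell();
    records.push(record);
    record = [];
  };

  while (i < source.length) {
    const ch = source[i];
    if (inQuotes) {
      if (ch === quote && source[i + 1] === quote) {
        text += quote; // a doubled quote is an escaped quote character
        i += 2;
        continue;
      }
      if (ch === quote) {
        inQuotes = false; // closing quote
      } else {
        text += ch; // delimiters and line breaks are literal inside quotes
      }
      i += 1;
    } else if (ch === quote && text === "" && !quoted) {
      quoted = true; // opening quote at the start of a cell
      inQuotes = true;
      i += 1;
    } else if (source.startsWith(delimiter, i)) {
      endCell();
      i += delimiter.length;
    } else if (ch === "\r" && source[i + 1] === "\n") {
      endRecord(); // CRLF line break, as used by RFC 4180
      i += 2;
    } else if (ch === "\n") {
      endRecord();
      i += 1;
    } else {
      text += ch;
      i += 1;
    }
  }
  // Flush the last record unless the source ended with a line break.
  if (text !== "" || quoted || record.length > 0) endRecord();
  return records;
}
```

For example, tokenizeCSV('"Smith, J.","He said ""hi"""') yields one record whose cells are Smith, J. and He said "hi", both marked as quoted.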

By default, the parser supports three data types ('string', 'raw-string' and 'number'), and it can be extended with custom column data types by supplying functions. Valid values within the array supplied as the 'columnTypes' property are:

  • 'string': Parses the content as a string. If the content is not quoted, the string is trimmed (i.e. leading and trailing whitespace is removed). For quoted content, the raw content within quotes is preserved directly.

  • 'raw-string': Parses the content as a string. If the content is not quoted, the string is not trimmed (i.e. leading and trailing whitespace is kept). For quoted content, the raw content within quotes is preserved directly.

  • 'number': Parses the content as a number. The special value 'nan' (case-insensitive) is interpreted as NaN. The strings 'inf' and 'infinity' (case-insensitive, with an optional leading plus or minus sign) are parsed as positive or negative Infinity.

  • A custom function can be supplied; it receives the raw content as its only argument, and its return value is used as the value of the cell. Please refer to the example linked below.
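The following sketch shows how the built-in column types and a custom function could be applied to a single cell. The name convertValue, the { text, quoted } cell shape and the way decimalSeparator is applied are assumptions for illustration; treating an empty number cell as NaN is also an assumption, since the behavior for empty cells is not documented here.

```javascript
// Sketch of the documented column types (illustrative, not kontrast's source).
// `cell` is { text, quoted }; `type` is a type name or a custom function.
function convertValue(cell, type, decimalSeparator = ".") {
  if (typeof type === "function") {
    return type(cell.text); // custom type: receives the raw content
  }
  switch (type) {
    case "string":
      // Unquoted content is trimmed; quoted content is kept as-is.
      return cell.quoted ? cell.text : cell.text.trim();
    case "raw-string":
      // Content is never trimmed.
      return cell.text;
    case "number": {
      const t = cell.text.trim().toLowerCase();
      // Assumption: empty cells become NaN (undocumented behavior).
      if (t === "" || t === "nan") return NaN;
      // inf / infinity with an optional sign, case-insensitive.
      const m = /^([+-]?)(inf|infinity)$/.exec(t);
      if (m) return m[1] === "-" ? -Infinity : Infinity;
      // Assumption: the decimal separator is normalized to "." before conversion.
      return Number(t.split(decimalSeparator).join("."));
    }
    default:
      throw new Error(`Unknown column type: ${type}`);
  }
}
```

Keeping the quoted flag next to the text is what lets 'string' distinguish quoted from unquoted content after tokenizing.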


Input

The function is called as kontrast.parseCSV(options), where options is an object with the following properties:

  • source: string
    The complete resource as a string.
  • columnTypes: array
    An array specifying the type of each column. Valid items are 'number' (parses the column as a floating-point number), 'string' (strings, trimmed if not quoted) and 'raw-string' (untrimmed strings). You can also supply a custom function that takes the raw string from the CSV resource as its single argument and returns a single value.
  • startLine: zeroOrPositiveInteger (default value: 0)
    The index of the first line that contains data.
  • headerLine: zeroOrPositiveInteger | undefined
    The index of the line from which the header is read. A value of NaN indicates that there is no header.
  • delimiter: character (default value: ',')
    A string of characters used to delimit columns. In most cases, this is a comma (','), a space (' '), a tab ('\t') or a semicolon (';').
  • quoteCharacter: character (default value: '"')
    A single character used to quote cells. This is useful for textual data that contains the delimiter: a quoted cell is parsed correctly even if it contains the delimiter.
  • decimalSeparator: character (default value: '.')
    A single character used as decimal separator. In most cases, this is a dot ('.') or a comma (',').
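As an illustration of the options object, the hypothetical helper resolveOptions below fills in the documented default values and rejects invalid single-character options. The validation rules (for example, that columnTypes must be an array) are assumptions for illustration, not documented kontrast behavior.

```javascript
// Hypothetical helper: applies the documented defaults and checks a few
// constraints. It only illustrates the option semantics.
function resolveOptions(options) {
  const resolved = {
    startLine: 0,
    headerLine: undefined,
    delimiter: ",",
    quoteCharacter: '"',
    decimalSeparator: ".",
    ...options, // user-supplied values override the defaults
  };
  if (typeof resolved.source !== "string") {
    throw new TypeError("source must be a string");
  }
  // Assumption: columnTypes is required.
  if (!Array.isArray(resolved.columnTypes)) {
    throw new TypeError("columnTypes must be an array");
  }
  if (!Number.isInteger(resolved.startLine) || resolved.startLine < 0) {
    throw new RangeError("startLine must be a zero or positive integer");
  }
  for (const key of ["quoteCharacter", "decimalSeparator"]) {
    if (typeof resolved[key] !== "string" || resolved[key].length !== 1) {
      throw new TypeError(`${key} must be a single character`);
    }
  }
  return resolved;
}

// Example: semicolon-separated data with decimal commas and a header on line 0.
const options = resolveOptions({
  source: "city;temperature\nBerlin;21,5\nWien;23,0\n",
  columnTypes: ["string", "number"],
  headerLine: 0,
  startLine: 1,
  delimiter: ";",
  decimalSeparator: ",",
});
```

An object of this shape is what would be passed as kontrast.parseCSV(options); how the library itself validates its options is not covered by this page.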

Output

  • rowCount: number
    The number of rows.
  • columns: object | array
    The parsed data for each column. If a header line is used, the output is an object whose property keys are the column names. Otherwise, it is an array in which each item is another array holding the data of one column.
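The two output shapes can be sketched as follows. The hypothetical function toColumns assumes the lines have already been split into cell strings, handles only the 'string' and 'number' column types, ignores decimalSeparator, and counts only the data lines (from startLine onward) in rowCount; these are simplifications, not documented behavior. A non-integer headerLine such as NaN or undefined is treated as "no header".

```javascript
// Sketch of the two documented output shapes (illustrative, not kontrast's source).
// Assumptions: `lines` holds already-split cell strings, only 'string' and
// 'number' are handled, and decimalSeparator is ignored.
function toColumns(lines, { columnTypes, startLine = 0, headerLine }) {
  const convert = (text, type) =>
    type === "number" ? Number(text) : text.trim();

  // Data lines start at startLine; rowCount counts only these (an assumption).
  const dataLines = lines.slice(startLine);

  // One array of converted values per column.
  const columnArrays = columnTypes.map((type, c) =>
    dataLines.map((cells) => convert(cells[c] ?? "", type))
  );

  // Without a usable header line (e.g. undefined or NaN), return the arrays.
  if (!Number.isInteger(headerLine)) {
    return { rowCount: dataLines.length, columns: columnArrays };
  }

  // With a header line, key each column array by its header name.
  const columns = {};
  columnTypes.forEach((_, c) => {
    columns[lines[headerLine][c].trim()] = columnArrays[c];
  });
  return { rowCount: dataLines.length, columns };
}
```

With a header, values are accessed by name (columns.temperature[0]); without one, by position (columns[1][0]).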