mesv/parse

Functions for building a Parser and using it to parse an input CSV String into a List of values of a chosen type.

Important!

At this stage, everything is still in flux, and breaking changes can occur on minor version updates. Be careful and check for possible issues before updating!

Examples

A full example of parsing an example CSV String.

import gleam/int
import mesv
import mesv/parse

const expected_data: List(#(String, Int, Bool)) = [
  #("Andrew", 20, True),
  #("Blake", 25, True),
  #("Cassandra", 2, False),
]

pub fn main() -> Nil {
  let parsed_data =
    parse.build({
      // Create a parsing function using `mesv.parsed`
      // to construct a curried parsing function
      use name <- mesv.parsed
      use age <- mesv.parsed
      use adult <- mesv.parsed

      // If any value fails (i.e., its column function returns Error(Nil)),
      // the parsing of that row will stop.
      // However, if it reaches here,
      // it returns the following data type
      #(name, age, adult)
    })
    |> parse.column(Ok)
    |> parse.column(int.parse)
    |> parse.column(fn(val: String) -> Result(Bool, Nil) {
      case val {
        "true" | "True" -> Ok(True)
        "false" | "False" -> Ok(False)
        _ -> Error(Nil)
      }
    })
    // Specify that the first row is the headers,
    // and if they don't match what is specified, 
    // the parsing will fail
    |> parse.expect_headers(["Name", "Age", "Is an adult"])
    // Pass in the CSV String to parse
    |> parse.parse(
      "Name,Age,Is an adult\n"
      <> "Andrew,20,true\n"
      <> "Blake,25,True\n"
      <> "Cassandra,2,False",
    )

  assert parsed_data == Ok(#(expected_data, []))
}

Parsing a CSV String and performing some operations on the data immediately after parsing.

// [...]
const expected_data: List(#(String, Int, Bool)) = [
  #("Anna", 20, True),
  #("Bob", 25, True),
  #("Cleopatra", 2095, True),
  // She's dead, so she can't really be an adult.
  // But alas, our parser is too simple to understand
  // this fact, so `age >= 18` marks her as one anyway.
]

pub fn main() -> Nil {
  let parsed_data =
    parse.build({
      use name <- mesv.parsed
      use age <- mesv.parsed
      // As long as the operation is guaranteed to result
      // in the data type specified in the Parser,
      // you can do anything in here!
      #(name, age, age >= 18)
    })
    |> parse.column(Ok)
    |> parse.column(int.parse)
    // Pass in the CSV String to parse
    |> parse.parse(
      "Anna,20\n"
      <> "Bob,25\n"
      <> "Cleopatra,2095",
    )

  assert parsed_data == Ok(#(expected_data, []))
}

Types

Parser

The type describing how to create a value of type a from a String.

To create one, call the build function, use the column function to specify how each successive column should be parsed, and optionally configure the parser's behaviour with the provided transformation functions (set_row_sep, set_col_sep, set_escaper, expect_headers, set_strict_columns, set_trim_whitespace).

Once you have the desired Parser(a), use the parse function to convert a String into a List(a) (plus a list of ParsingErrors).

pub opaque type Parser(a)

ParsingError

An error type representing any kind of error encountered when parsing.

In the future, a better error type and better error handling will be implemented, but this should do its job for now.

pub type ParsingError {
  CantParseRow(index: Int, contents: String, reason: String)
  ExpectedHeadersMismatch(
    expected: List(String),
    found: List(String),
  )
  RanOutOfValues
  StrictParsedWithLeftovers(leftovers: List(String))
  EncounteredMalformedElement(
    element: String,
    description: String,
  )
}

Constructors

CantParseRow(index: Int, contents: String, reason: String)

ExpectedHeadersMismatch(
  expected: List(String),
  found: List(String),
)

RanOutOfValues

StrictParsedWithLeftovers(leftovers: List(String))

EncounteredMalformedElement(element: String, description: String)

Values

build

pub fn build(f: fn(a) -> b) -> Parser(fn(a) -> b)

Builds a Parser from a function whose arguments are filled, in order, by the successive cells of a row.

The function passed in should be curried, i.e. a function that returns a function, and so on, with each function in the chain taking a single argument.

To complete the parser, transform it with the parse.column function once per argument to specify how to parse each successive value in a row.

Examples

The simplest parser takes a single element:

parse.build(fn(str) { str })
  |> parse.column(Ok)

When used, it will create a List(String) containing the first cell of each row of the input CSV String.

Infallible transformations of the data can be done either inside the initial function passed to parse.build or in parse.column. Fallible transformations (those that return a Result or Option whose inner value the rest of the parser needs) must be done in the parse.column call.

A more complex Parser would be something like this:

parse.build({
  use name: String <- mesv.parsed
  use age: Int <- mesv.parsed
  use adult: Bool <- mesv.parsed

  #(name, age, adult)
})

As before, use the parse.column function to specify how each argument is parsed.

column

pub fn column(
  parser: Parser(fn(a) -> b),
  parse: fn(String) -> Result(a, Nil),
) -> Parser(b)

Transforms a Parser by supplying the parsing function for its next column.

This function is called for every row. If it returns Ok(a), the value is passed to the Parser’s internal function and parsing of the row continues.

If it returns Error(Nil), parsing of the row fails.
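Any function of the shape fn(String) -> Result(a, Nil) can serve as a column parser. As a minimal sketch, the helper below (non_negative_int is a hypothetical name, not part of mesv) trims a cell, parses it as an Int using the standard library, and rejects negative values:

```gleam
import gleam/int
import gleam/result
import gleam/string

// Hypothetical helper, not part of mesv.
// Its type, fn(String) -> Result(Int, Nil), matches the `parse`
// argument of parse.column.
pub fn non_negative_int(cell: String) -> Result(Int, Nil) {
  // Stop early with Error(Nil) if the cell is not an integer.
  use n <- result.try(int.parse(string.trim(cell)))
  case n >= 0 {
    True -> Ok(n)
    False -> Error(Nil)
  }
}
```

It could then be passed along as parse.column(non_negative_int), in the same way int.parse is used elsewhere on this page.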

Examples

// Parser(fn(String) -> a)
parser
  |> parse.column(Ok)
  // Parser(a)

expect_headers

pub fn expect_headers(
  parser: Parser(a),
  headers: List(String),
) -> Parser(a)

Configures the parser to treat the first row as the header row and to require that it equals the given headers.

If the first row is not strictly identical to the headers argument, parse returns an Error.

Examples

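parser
  |> parse.expect_headers(["Letter", "Number", "Other"])
  |> parse.parse("Letter,Number,Other\na,1,c")
  // -> headers match, row returns Ok(#("a", 1, "c"))

parser
  |> parse.expect_headers(["Letter", "Number", "Other"])
  |> parse.parse("x,y,z\na,1,c")
  // -> parse returns Error(ExpectedHeadersMismatch(
  //   expected: ["Letter", "Number", "Other"],
  //   found: ["x", "y", "z"],
  // ))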

parse

pub fn parse(
  parser: Parser(a),
  source: String,
) -> Result(#(List(a), List(ParsingError)), ParsingError)

Uses the given Parser(a) to transform the source String into a List(a).

If headers were specified with expect_headers and the first row does not match them, the function returns an Error containing an ExpectedHeadersMismatch, which holds both the expected headers and what was found.

If no headers were specified, or they were specified and matched, the function returns Ok(#(List(a), List(ParsingError))). The first element is the list of all rows that were parsed successfully; the second is the list of ParsingErrors produced by rows that failed to parse.

What to do with these two Lists is up to the user: ignore the errors, or abort if any occurred.
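One possible way to implement the "abort if any errors occur" policy is sketched below. The helper all_or_nothing is a hypothetical name and is not part of mesv; it is written generically over the row type a and the error type e so that it matches the shape of the value returned by parse:

```gleam
// Hypothetical helper, not part of mesv.
// Collapses the two-level result into a single Result:
// either every row parsed, or a list of everything that went wrong.
pub fn all_or_nothing(
  result: Result(#(List(a), List(e)), e),
) -> Result(List(a), List(e)) {
  case result {
    // Every row parsed successfully: keep the rows.
    Ok(#(rows, [])) -> Ok(rows)
    // At least one row failed: discard the rows, report the row errors.
    Ok(#(_, row_errors)) -> Error(row_errors)
    // Parsing failed as a whole (for example, a header mismatch).
    Error(error) -> Error([error])
  }
}
```

The opposite policy, ignoring row errors, amounts to keeping only the first element of the tuple.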

partition_on_unescaped_

pub fn partition_on_unescaped_(
  separator el: String,
  not_in escaper: String,
) -> fn(String) -> List(String)

Internal helper that constructs a function which splits a String on separator, except where the separator appears between two occurrences of not_in (the escaper).

It is public only so that it can be unit tested.

Feel free to use it, but it is not part of the API, so breaking changes can occur in any version without prior notice.

set_col_sep

pub fn set_col_sep(
  parser: Parser(a),
  new_column_separator: String,
) -> Parser(a)

Sets the column separator, replacing the default comma (,).

Examples

parser
  |> parse.parse("a,1,c")
  // -> row returns Ok(#("a", 1, "c"))

parser
  |> parse.set_col_sep("|")
  |> parse.parse("a,1,c")
  // -> row returns Error(RanOutOfValues)

parser
  |> parse.set_col_sep("|")
  |> parse.parse("a|1|c")
  // -> row returns Ok(#("a", 1, "c"))

set_escaper

pub fn set_escaper(
  parser: Parser(a),
  new_escaper: String,
) -> Parser(a)

Sets the value escaper, replacing the default double quote (").

Escapers are wrapped around a cell if that cell contains any one or more of:

column separator (by default ,)
row separator (by default \n)
escaper itself

In the event that a cell contains an escaper, the escaper is first replaced with two escapers.

So here's " would first become here's "" , then be wrapped and become "here's "" ".
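The escaping rule described above can be sketched from the point of view of whatever writes the CSV. The helper escape_cell is hypothetical and not part of mesv; it only illustrates the rule using gleam/string:

```gleam
import gleam/string

// Hypothetical helper, not part of mesv: escapes one cell for output.
pub fn escape_cell(
  cell: String,
  col_sep: String,
  row_sep: String,
  escaper: String,
) -> String {
  // A cell needs wrapping if it contains the column separator,
  // the row separator, or the escaper itself.
  let needs_wrapping =
    string.contains(cell, col_sep)
    || string.contains(cell, row_sep)
    || string.contains(cell, escaper)

  case needs_wrapping {
    // Double every escaper inside the cell, then wrap the whole cell.
    True ->
      escaper <> string.replace(cell, escaper, escaper <> escaper) <> escaper
    False -> cell
  }
}
```

Parsing reverses these steps: the wrapping escapers are removed and each doubled escaper becomes a single one.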

Examples

parser
  |> parse.parse("a,'b','c'''")
  // -> row returns Ok(#("a", "'b'", "'c'''"))
parser
  |> parse.parse("a,\"b\",\"c\"\"\"")
  // -> row returns Ok(#("a", "b", "c\""))

parser
  |> parse.set_escaper("'")
  |> parse.parse("a,'b','c'''")
  // -> row returns Ok(#("a", "b", "c'"))
parser
  |> parse.set_escaper("'")
  |> parse.parse("a,\"b\",\"c\"\"\"")
  // -> row returns Ok(#("a", "\"b\"", "\"c\"\"\""))

set_row_sep

pub fn set_row_sep(
  parser: Parser(a),
  new_row_separator: String,
) -> Parser(a)

Sets the row separator, replacing the default newline (\n).

Examples

parser
  |> parse.parse("a,1,c\nd,4,a")
  // -> parse returns [#("a", 1, "c"), #("d", 4, "a")]

parser
  |> parse.set_row_sep("|")
  |> parse.parse("a,1,c\nd,4,a")
  // -> parse returns [#("a", 1, "c\nd")]
  // the two cells "4" and "a" are treated as leftovers

parser
  |> parse.set_row_sep("|")
  |> parse.parse("a,1,c|d,4,a")
  // -> parse returns [#("a", 1, "c"), #("d", 4, "a")]

set_strict_columns

pub fn set_strict_columns(parser: Parser(a)) -> Parser(a)

Makes the parser strict about the number of columns.

When parsing a row, there must then be exactly as many cells as the internal Parser function has arguments. If any values are left over once a row has been parsed, parsing that row returns an Error, even though a value was produced.

Examples

parser
  |> parse.parse("a,1,c")
  // -> row returns Ok(#("a", 1))

parser
  |> parse.set_strict_columns()
  |> parse.parse("a,1,c")
  // -> row returns Error(StrictParsedWithLeftovers(["c"]))

set_trim_whitespace

pub fn set_trim_whitespace(
  parser: Parser(a),
  trim_start: Bool,
  trim_end: Bool,
) -> Parser(a)

Sets whether the parser should trim whitespace from the start and from the end of each value.

Trimming is performed before the cell is unwrapped (escapers removed), so if the CSV file was modified somehow (for example, by using the VS Code extension Rainbow CSV to align the columns), the cell can still be correctly unescaped and parsed.

The behaviour of this function and the internal order of operations will probably change in the future, so there are no examples yet.