lector.csv.dialects

Detectors of CSV dialects (separator, quoting etc.).

Note that Python’s csv module is not even internally consistent. For example, although the dialect used to produce a CSV may specify \n as the line terminator, the Python sniffer is hard-coded to return \r\n (it doesn’t actually support detecting it). Its own reader (and hopefully others) handles different line breaks internally, but this means one cannot compare the dialect used to generate a CSV with a dialect created by sniffing that same CSV. Quoting is likewise hard-coded, to QUOTE_MINIMAL.
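The mismatch described above can be reproduced with the standard library alone. The following is a minimal sketch; the sample rows and variable names are illustrative and not part of lector.

```python
import csv
import io

rows = [["id", "name", "city"], ["1", "Ada", "London"], ["2", "Bob", "Paris"]]

# Write the CSV with settings that differ from the sniffer's fixed values.
out = io.StringIO()
csv.writer(out, lineterminator="\n", quoting=csv.QUOTE_ALL).writerows(rows)
text = out.getvalue()

# The sniffer inspects the text, but lineterminator and quoting on the
# returned dialect are constants, not detected values.
sniffed = csv.Sniffer().sniff(text)
terminator_matches = sniffed.lineterminator == "\n"   # writer used "\n"
quoting_matches = sniffed.quoting == csv.QUOTE_ALL    # writer used QUOTE_ALL

# Reading with the sniffed dialect still works, because the reader
# handles line breaks on its own.
parsed = list(csv.reader(io.StringIO(text), dialect=sniffed))
```

Both comparisons come out false even though the file round-trips correctly, which is why comparing a writing dialect with a sniffed one is not a meaningful test.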

Python quoting levels:

  • QUOTE_ALL: 1

  • QUOTE_MINIMAL: 0

  • QUOTE_NONE: 3

  • QUOTE_NONNUMERIC: 2
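As a quick check of the values listed above, and of what each level does when writing, here is a small standard-library sketch. The `render` helper and the sample row are hypothetical names used only for illustration.

```python
import csv
import io

LEVELS = ("QUOTE_MINIMAL", "QUOTE_ALL", "QUOTE_NONNUMERIC", "QUOTE_NONE")

# Look up the integer value behind each constant.
values = {name: getattr(csv, name) for name in LEVELS}


def render(row, quoting):
    """Write a single row with the given quoting level and return the text."""
    out = io.StringIO()
    csv.writer(out, quoting=quoting, lineterminator="\n").writerow(row)
    return out.getvalue()


# One string field and one numeric field, neither needing escaping.
row = ["x", 1]
rendered = {name: render(row, values[name]) for name in LEVELS}
```

For a row like this, QUOTE_MINIMAL and QUOTE_NONE produce identical output, which is one reason quoting cannot be recovered reliably from the file contents alone.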

Classes

Dialect

A more convenient class for dialects than Python's built-in.

DialectDetector

Base class for all dialect detectors.

PySniffer

Use Python's built-in csv sniffer.

Attributes

CLEVER_CSV

DELIMITER_OPTIONS

Allowed delimiters for dialect detection.

N_ROWS_DFAULT

How many rows to use for dialect detection.

PyDialectT

is_potential_escapechar_orig

class lector.csv.dialects.Dialect

A more convenient class for dialects than Python’s built-in.

The built-in Dialect is a class with class attributes only, so instead of passing instances of that class around, Python wants you to pass references to subclasses, which is, uhm, awkward to say the least (see to_builtin() below for an example).
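To illustrate the awkwardness with the standard library only: a built-in dialect is configured through class attributes, and the class object itself is what gets handed to the reader. The class names `Semicolon` and `Pipe` below are made up for this sketch.

```python
import csv
import io


class Semicolon(csv.Dialect):
    # All configuration lives in class attributes.
    delimiter = ";"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\r\n"
    quoting = csv.QUOTE_MINIMAL


# The class itself (not an instance) is passed as the dialect.
rows = list(csv.reader(io.StringIO("a;b\r\n1;2\r\n"), dialect=Semicolon))


# Changing a single setting means defining yet another subclass.
class Pipe(Semicolon):
    delimiter = "|"


pipe_rows = list(csv.reader(io.StringIO("a|b\r\n"), dialect=Pipe))
```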

delimiter: str = ','
double_quote: bool = True
escape_char: str | None
line_terminator: str = '\r\n'
quote_char: str = '"'
quoting: int
skip_initial_space: bool = False
classmethod from_builtin(dialect)

Make an instance from a built-in dialect class, configured for reliable reading(!).

Parameters:

dialect (str | PyDialectT) –

Return type:

Dialect

to_builtin()

Make a subclass of built-in Dialect from this instance.

Return type:

PyDialectT
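A conversion of this kind could look roughly like the sketch below. It is not lector's implementation: `SketchDialect` is a hypothetical stand-in that only mirrors the field names documented above, the defaults for `escape_char` and `quoting` are assumptions (the page does not show them), and it copies attributes as-is rather than adjusting anything for reliable reading.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchDialect:
    """Hypothetical instance-based dialect; field names follow the docs above."""

    delimiter: str = ","
    double_quote: bool = True
    escape_char: Optional[str] = None   # assumed default
    line_terminator: str = "\r\n"
    quote_char: str = '"'
    quoting: int = csv.QUOTE_MINIMAL    # assumed default
    skip_initial_space: bool = False

    @classmethod
    def from_builtin(cls, dialect):
        # Accept a registered dialect name such as "excel", or a dialect class.
        if isinstance(dialect, str):
            dialect = csv.get_dialect(dialect)
        return cls(
            delimiter=dialect.delimiter,
            double_quote=dialect.doublequote,
            escape_char=dialect.escapechar,
            line_terminator=dialect.lineterminator,
            quote_char=dialect.quotechar,
            quoting=dialect.quoting,
            skip_initial_space=dialect.skipinitialspace,
        )

    def to_builtin(self):
        # Build a csv.Dialect subclass on the fly, since the stdlib
        # expects a class carrying the settings as class attributes.
        attrs = {
            "delimiter": self.delimiter,
            "doublequote": self.double_quote,
            "escapechar": self.escape_char,
            "lineterminator": self.line_terminator,
            "quotechar": self.quote_char,
            "quoting": self.quoting,
            "skipinitialspace": self.skip_initial_space,
        }
        return type("SketchBuiltinDialect", (csv.Dialect,), attrs)


semicolon = SketchDialect(delimiter=";")
builtin_cls = semicolon.to_builtin()
rows = list(csv.reader(io.StringIO("a;b\r\n"), dialect=builtin_cls))
```

Unlike the built-in classes, instances of such a dataclass can be compared, copied and modified field by field, which is presumably the convenience the class description refers to.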

class lector.csv.dialects.DialectDetector

Bases: abc.ABC

Base class for all dialect detectors.

abstract detect(buffer)
Parameters:

buffer (TextIO) –

Return type:

Dialect
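The shape of this interface can be sketched without importing lector. In the example below, `SketchDetector` is a hypothetical stand-in for the abstract base, `FirstLineDetector` is a deliberately naive toy, and the return value is a built-in dialect subclass rather than lector's own Dialect, purely to keep the sketch self-contained.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import TextIO

# Same candidates as DELIMITER_OPTIONS documented on this page.
CANDIDATES = (",", ";", "\t", "|")


class SketchDetector(ABC):
    """Stand-in for the documented interface: one abstract detect() method."""

    @abstractmethod
    def detect(self, buffer: TextIO):
        ...


class FirstLineDetector(SketchDetector):
    """Toy detector: choose the candidate that occurs most often in line one."""

    def detect(self, buffer: TextIO):
        pos = buffer.tell()
        first_line = buffer.readline()
        buffer.seek(pos)  # leave the buffer where we found it
        delimiter = max(CANDIDATES, key=first_line.count)
        # Start from csv.excel and override only the delimiter.
        return type("Detected", (csv.excel,), {"delimiter": delimiter})


buffer = io.StringIO("a|b|c\n1|2|3\n")
detected = FirstLineDetector().detect(buffer)
rows = list(csv.reader(buffer, dialect=detected))
```

Restoring the buffer position inside `detect()` is a design choice of this sketch, so the same buffer can be passed straight on to a reader.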

class lector.csv.dialects.PySniffer

Bases: DialectDetector

Use Python’s built-in csv sniffer.

delimiters: collections.abc.Iterable[str]
log: bool = False
n_rows: int
detect(buffer)

Use the Python sniffer to detect a dialect with which we can read(!) a CSV.

Note that the sniffer is not reliable for detecting quoting, quotechar, etc., but reasonable defaults are almost guaranteed to work with most parsers. For example, the lineterminator is not even configurable in pyarrow’s CSV reader, nor in pandas (Python engine).

Parameters:

buffer (TextIO) –

Return type:

Dialect
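A sniffer-based detector along these lines might be sketched as follows. This is an assumption about the general approach, not lector's code: the function name `sniff_head` is hypothetical, and its defaults simply reuse the DELIMITER_OPTIONS and N_ROWS_DFAULT values documented on this page.

```python
import csv
import io
from itertools import islice
from typing import Iterable, TextIO


def sniff_head(
    buffer: TextIO,
    delimiters: Iterable[str] = (",", ";", "\t", "|"),
    n_rows: int = 100,
):
    """Sniff a dialect from the first n_rows lines, then rewind the buffer."""
    pos = buffer.tell()
    # Only the head of the file is needed as a sample.
    sample = "".join(islice(buffer, n_rows))
    buffer.seek(pos)
    # csv.Sniffer takes the allowed delimiters as a single string and
    # raises csv.Error if it cannot settle on one.
    return csv.Sniffer().sniff(sample, delimiters="".join(delimiters))


buffer = io.StringIO("a;b;c\n1;2;3\n4;5;6\n")
sniffed = sniff_head(buffer)
rows = list(csv.reader(buffer, dialect=sniffed))
```

Only the delimiter is really detected here. As noted above, the line terminator and quoting on the result are the sniffer's fixed defaults, so they should be treated as reading defaults rather than as facts about the file.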

lector.csv.dialects.CLEVER_CSV = True
lector.csv.dialects.DELIMITER_OPTIONS: tuple[str] = (',', ';', '\t', '|')

Allowed delimiters for dialect detection.

lector.csv.dialects.N_ROWS_DFAULT: int = 100

How many rows to use for dialect detection.

lector.csv.dialects.PyDialectT
lector.csv.dialects.is_potential_escapechar_orig