lector.csv.dialects
Detectors of CSV dialects (separator, quoting etc.).
Note that Python's csv module is not even internally consistent. For example, although the dialect used to produce a
CSV may specify \n as the line terminator, the Python sniffer is hard-coded to return
\r\n (it doesn't actually support detecting it). Its own reader (and hopefully others) deals
internally with different line breaks, but this means one cannot compare the dialect used to generate a
CSV with a dialect created by sniffing the same file (quoting is likewise hard-coded to QUOTE_MINIMAL).
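The mismatch described above can be reproduced with the standard library alone. The following is a minimal sketch (the sample data and variable names are made up for illustration): a CSV is written with \n line endings and QUOTE_ALL, then sniffed.

```python
import csv
import io

# Write a small CSV with "\n" line endings and every field quoted.
out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n", quoting=csv.QUOTE_ALL)
writer.writerows([["a", "b", "c"], ["1", "2", "3"], ["4", "5", "6"]])
text = out.getvalue()

sniffed = csv.Sniffer().sniff(text)

sniffed_delimiter = sniffed.delimiter            # inferred from the data
sniffed_lineterminator = sniffed.lineterminator  # fixed by the sniffer, not inferred
sniffed_quoting = sniffed.quoting                # fixed by the sniffer, not inferred

# The reader still parses the "\n"-terminated text with the sniffed dialect.
rows = list(csv.reader(io.StringIO(text), sniffed))
```

Here the sniffed delimiter matches the writer's, while the line terminator and quoting level do not, even though reading the data back works.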
Python quoting levels:
- QUOTE_ALL: 1
- QUOTE_MINIMAL: 0
- QUOTE_NONE: 3
- QUOTE_NONNUMERIC: 2
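As a rough illustration of what these levels mean when writing, the sketch below writes the same row with each level using the standard csv writer. The helper write_row and the sample row are hypothetical and exist only for this example; QUOTE_NONE needs an escapechar here because one field contains the delimiter.

```python
import csv
import io

def write_row(row, quoting, **fmtparams):
    """Write a single row with the given quoting level and return the text."""
    out = io.StringIO()
    csv.writer(out, quoting=quoting, lineterminator="\n", **fmtparams).writerow(row)
    return out.getvalue()

row = ["text", 1.5, "a,b"]

minimal = write_row(row, csv.QUOTE_MINIMAL)        # quote only where needed
everything = write_row(row, csv.QUOTE_ALL)         # quote every field
nonnumeric = write_row(row, csv.QUOTE_NONNUMERIC)  # quote non-numeric fields
# Without quotes, the embedded delimiter has to be escaped instead.
none = write_row(row, csv.QUOTE_NONE, escapechar="\\")
```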
Classes
Dialect | A more convenient class for dialects than Python's built-in.
DialectDetector | Base class for all dialect detectors.
PySniffer | Use Python's built-in csv sniffer.
Attributes
Allowed delimiters for dialect detection.
How many rows to use for dialect detection.
- class lector.csv.dialects.Dialect
A more convenient class for dialects than Python’s built-in.
The built-in Dialect is a class with class attributes only, so instead of passing instances of that class around, Python wants you to pass references to subclasses, which is, uhm, awkward to say the least (see _to_builtin() below for an example).
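To illustrate the awkwardness with the standard library only: a built-in dialect is declared as a subclass and the class itself is handed to the reader. The sketch below also shows one way a subclass could be built from plain values at runtime; as_builtin is a hypothetical helper for this example and is not claimed to be how lector's _to_builtin() works.

```python
import csv
import io

# A built-in dialect is a subclass carrying class attributes only.
class Semicolon(csv.Dialect):
    delimiter = ";"
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = "\n"
    quoting = csv.QUOTE_MINIMAL

# The class itself (not an instance) is passed to the reader.
rows = list(csv.reader(io.StringIO("a;b\n1;2\n"), Semicolon))

def as_builtin(name, **attrs):
    """Hypothetical helper: create a csv.Dialect subclass from plain values."""
    return type(name, (csv.Dialect,), attrs)

Tabbed = as_builtin(
    "Tabbed",
    delimiter="\t",
    quotechar='"',
    doublequote=True,
    skipinitialspace=False,
    lineterminator="\n",
    quoting=csv.QUOTE_MINIMAL,
)
tab_rows = list(csv.reader(io.StringIO("a\tb\n1\t2\n"), Tabbed))
```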
- class lector.csv.dialects.DialectDetector
Bases: abc.ABC
Base class for all dialect detectors.
- class lector.csv.dialects.PySniffer
Bases: DialectDetector
Use Python's built-in csv sniffer.
- detect(buffer)
Detect a dialect we can read(!) a CSV with using the python sniffer.
Note that the sniffer is not reliable for detecting quoting, quotechar etc., but reasonable defaults are almost guaranteed to work with most parsers. E.g. the lineterminator is not even configurable in pyarrow’s csv reader, nor in pandas (python engine).
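A sniffer-based detection step along these lines might look roughly like the sketch below. It is not lector's implementation: the function name sniff_readable_dialect, the n_chars sample size, the candidate delimiters and the fallback to csv.excel are all assumptions made for illustration. Only the delimiter is really taken from the data; quoting and line terminator keep the sniffer's fixed defaults.

```python
import csv
import io
from typing import TextIO

def sniff_readable_dialect(buffer: TextIO, n_chars: int = 4096,
                           delimiters: str = ",;\t|"):
    """Hypothetical sketch: sniff a sample of the buffer, then rewind it."""
    pos = buffer.tell()
    sample = buffer.read(n_chars)
    buffer.seek(pos)  # leave the buffer where we found it
    try:
        return csv.Sniffer().sniff(sample, delimiters=delimiters)
    except csv.Error:
        # Assumed fallback when the sniffer cannot decide.
        return csv.excel

buf = io.StringIO("x|y|z\n1|2|3\n4|5|6\n")
dialect = sniff_readable_dialect(buf)
rows = list(csv.reader(buf, dialect))
```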
- Parameters:
buffer (TextIO) –
- Return type: