lector.csv.preambles
Detectors of preambles in CSV files.
This is generally a chicken-and-egg situation. To detect generic preambles robustly and efficiently, it would really help to know the CSV dialect, or at least the delimiter. But to detect the dialect or delimiter correctly, we need to skip the preamble first. Detectors may therefore rely on (sometimes overly) simplistic heuristics that implicitly assume a certain dialect.
Classes
Brandwatch | Detect CSV files exported from Brandwatch.
Fieldless | Detects initial rows that don't contain any delimited fields.
GoogleAds | In GoogleAds CSVs the garbage lines don't contain the separator (comma or tab).
PreambleDetector | Base class for detecting preambles (initial junk) in a CSV buffer.
Preambles | Registry to manage preamble detectors.
- class lector.csv.preambles.Brandwatch
Bases: PreambleDetector
Detect CSV files exported from Brandwatch.
Brandwatch uses the comma as separator and inserts a row consisting only of commas between the preamble text and the CSV table proper.
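The heuristic described above can be illustrated with a minimal sketch. This is not lector's implementation: `brandwatch_preamble_rows` and its `max_lines` parameter are hypothetical names, and the sketch assumes the comma-only separator row itself counts among the rows to skip.

```python
import io


def brandwatch_preamble_rows(buffer, max_lines=100):
    """Return the number of leading rows to skip, or 0 if no comma-only row is found.

    Hypothetical helper for illustration; not part of lector.
    """
    start = buffer.tell()
    try:
        for i, line in enumerate(buffer):
            if i >= max_lines:
                break
            stripped = line.strip()
            # A separator row consists of commas only, e.g. ",,,"
            if stripped and set(stripped) == {","}:
                # Assumption: skip the preamble text and the separator row itself.
                return i + 1
        return 0
    finally:
        # Leave the buffer where we found it so a CSV parser can read it afterwards.
        buffer.seek(start)


sample = io.StringIO(
    "Report: Mentions\n"
    "From: 2020-01-01\n"
    ",,\n"
    "Date,Author,Text\n"
    "2020-01-02,a,hello\n"
)
rows_to_skip = brandwatch_preamble_rows(sample)
```

Capping the scan at `max_lines` keeps the check cheap for large files whose preamble, if any, can only occur near the top.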
- class lector.csv.preambles.Fieldless
Bases: PreambleDetector
Detects initial rows that don’t contain any delimited fields.
Tries parsing buffer using Python’s built-in csv functionality, assuming as delimiter the most frequent character amongst those configured via delimiters. Given this delimiter, the parser detects N initial lines containing a single field only, followed by at least one line containing multiple fields. N is then the number of rows to skip.
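As a rough sketch of this procedure (an approximation under stated assumptions, not the class's actual code), the following standalone function picks the most frequent candidate delimiter and counts leading rows that parse to at most one field. The function name, the default `delimiters` string, and the `max_lines` cap are all hypothetical.

```python
import csv
import io


def fieldless_preamble_rows(buffer, delimiters=",;\t|", max_lines=100):
    """Count leading rows without delimited fields; 0 if none are detected.

    Hypothetical helper for illustration; not part of lector.
    """
    start = buffer.tell()
    lines = []
    for i, line in enumerate(buffer):
        if i >= max_lines:
            break
        lines.append(line)
    buffer.seek(start)  # restore the read position for later parsing

    text = "".join(lines)
    # Assume the most frequent candidate character is the delimiter.
    delimiter = max(delimiters, key=text.count)
    if text.count(delimiter) == 0:
        return 0  # no candidate occurs at all, e.g. a single-column file

    n = 0
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) > 1:
            return n  # first row with multiple fields ends the preamble
        n += 1
    return 0  # never saw a multi-field row, so report no preamble


sample = io.StringIO("Some report\nGenerated 2020\na;b;c\n1;2;3\n")
rows_to_skip = fieldless_preamble_rows(sample)
```

Note that a preamble line which happens to contain the chosen delimiter ends the count early, which is one way the simplistic heuristic can fail.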
- class lector.csv.preambles.GoogleAds
Bases: Fieldless
In GoogleAds CSVs the garbage lines don’t contain the separator (comma or tab).
The only complications are that 1) GoogleAds has two CSV export formats, ‘Excel’, which uses tabs as separators, and normal ‘CSV’, which uses commas; and 2) a single-column CSV wouldn’t contain the separator either.
GoogleAds also seems to include two “totals” rows at the end, which we exclude here.
- class lector.csv.preambles.PreambleDetector
Bases: abc.ABC
Base class for detecting preambles (initial junk) in a CSV buffer.
- class lector.csv.preambles.Preambles
Registry to manage preamble detectors.
- classmethod detect(buffer, detectors=None, log=False)
Get the result of the first preamble detector matching the CSV buffer.
Matching here means detecting more than 0 rows of preamble text, and the result is the number of rows to skip.
If no detectors are provided (as an ordered sequence), all registered detector classes are tried in order of registration, using default parameters.
- Parameters:
buffer (TextIO)
detectors (collections.abc.Iterable[PreambleDetector] | None)
log (bool)
- Return type:
int
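The "first match wins" behaviour described for detect can be sketched independently of lector. Everything below is hypothetical and only illustrates the control flow: detectors are modelled as plain functions taking a seekable text buffer and returning a row count, and the two example detectors are stand-ins rather than lector classes.

```python
import io


def comma_row(buffer):
    """Example detector: rows up to and including a comma-only row."""
    for i, line in enumerate(buffer):
        stripped = line.strip()
        if stripped and set(stripped) == {","}:
            return i + 1
    return 0


def hash_comments(buffer):
    """Example detector: leading rows starting with '#'."""
    n = 0
    for line in buffer:
        if not line.startswith("#"):
            break
        n += 1
    return n


def first_matching_preamble(buffer, detectors):
    """Return the row count from the first detector reporting more than 0 rows."""
    for detect in detectors:
        buffer.seek(0)  # every detector starts from the top (assumes a seekable buffer)
        n = detect(buffer)
        if n > 0:
            buffer.seek(0)
            return n
    buffer.seek(0)
    return 0  # no detector matched, so nothing to skip


sample = io.StringIO("# exported by tool\n# v1.2\na,b\n1,2\n")
rows_to_skip = first_matching_preamble(sample, [comma_row, hash_comments])
```

Because the first positive result is returned, the order of the detectors matters: more specific detectors would typically be placed before generic ones.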