lector.csv.arrow
Classes
ArrowReader | Use base class detection methods to configure a pyarrow.csv.read_csv() call.
Functions
clean_column_names | Handle empty and duplicate column names.
transcode | Safely transcode any readable byte stream from decoder to encoder codecs.
Attributes
- class lector.csv.arrow.ArrowReader(fp, encoding=None, dialect=None, preamble=None, log=True)
Bases: lector.csv.abc.Reader
Use base class detection methods to configure a pyarrow.csv.read_csv() call.
- Parameters:
fp (FileLike)
encoding (str | lector.csv.encodings.EncodingDetector | None)
dialect (dict | lector.csv.dialects.Dialect | lector.csv.dialects.DialectDetector | None)
preamble (int | PreambleRegistry | None)
log (bool)
- configure(format)
- Parameters:
format (lector.csv.abc.Format)
- Return type:
dict
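As a rough illustration of what a configuration dict for pyarrow.csv.read_csv() could look like, the sketch below maps a detected encoding, delimiter, quote character and preamble length onto pyarrow's ReadOptions and ParseOptions. The helper names `sketch_configure` and `read_with`, and the layout of the returned dict, are assumptions made for this example; they are not taken from lector's implementation.

```python
def sketch_configure(encoding="utf-8", delimiter=",", quote_char='"', skip_rows=0):
    """Hypothetical stand-in for the kind of dict a configure() call might build."""
    return {
        # Keyword arguments for pyarrow.csv.ReadOptions
        "read_options": {"encoding": encoding, "skip_rows": skip_rows},
        # Keyword arguments for pyarrow.csv.ParseOptions
        "parse_options": {"delimiter": delimiter, "quote_char": quote_char},
    }


def read_with(fp, config):
    """Pass the sketched configuration on to pyarrow.csv.read_csv()."""
    # Imported lazily so the configuration can be inspected without pyarrow.
    import pyarrow.csv as pa_csv

    return pa_csv.read_csv(
        fp,
        read_options=pa_csv.ReadOptions(**config["read_options"]),
        parse_options=pa_csv.ParseOptions(**config["parse_options"]),
    )


# A semicolon-separated file with one preamble line before the header.
config = sketch_configure(delimiter=";", skip_rows=1)
```

With pyarrow installed, `read_with(io.BytesIO(b"report\na;b\n1;2\n"), config)` would skip the preamble line and parse the remainder with `;` as the delimiter.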
- lector.csv.arrow.clean_column_names(names)
Handle empty and duplicate column names.
- Parameters:
names (list[str])
- Return type:
list[str]
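The page does not say which naming scheme is used for empty or duplicate names. The following is a minimal sketch of one plausible approach, assuming a positional placeholder (`Unnamed_<index>`) for empty names and a numeric suffix for repeats. The function name `clean_names_sketch` and both conventions are hypothetical.

```python
def clean_names_sketch(names):
    """Return names with empty entries filled in and duplicates made unique."""
    seen = set()
    cleaned = []
    for i, name in enumerate(names):
        # Assumed convention: empty or whitespace-only names get a positional placeholder.
        base = name.strip() or f"Unnamed_{i}"
        candidate, n = base, 0
        # Assumed convention: append _1, _2, ... until the name is unused.
        while candidate in seen:
            n += 1
            candidate = f"{base}_{n}"
        seen.add(candidate)
        cleaned.append(candidate)
    return cleaned


result = clean_names_sketch(["id", "", "id", "id"])
# result == ["id", "Unnamed_1", "id_1", "id_2"]
```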
- lector.csv.arrow.transcode(fp, codec_in='utf-8', codec_out='utf-8', errors='replace')
Safely transcode any readable byte stream from decoder to encoder codecs.
Arrow only accepts byte streams and an optional encoding, and has no option to handle codec errors automatically. It also doesn’t seem to accept the interface of a Python recoder when the encoding is “utf-16” (rather than the more specific “utf-16-le” or “utf-16-be”).
- Parameters:
fp (lector.csv.abc.FileLike)
codec_in (str)
codec_out (str)
errors (str)
- Return type:
codecs.StreamRecoder
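Since the return type is codecs.StreamRecoder, the idea can be sketched with the standard library alone. The example below is an illustration under assumptions, not lector's own code: `transcode_sketch` is a hypothetical name, and it only accepts an already opened binary stream. It follows the argument convention of codecs.EncodedFile, where the reader and writer use the stream's codec and the encoder and decoder use the codec seen by the caller.

```python
import codecs
import io


def transcode_sketch(fp, codec_in="utf-8", codec_out="utf-8", errors="replace"):
    """Wrap a binary stream so that reads yield bytes re-encoded as codec_out."""
    return codecs.StreamRecoder(
        fp,
        encode=codecs.getencoder(codec_out),  # text -> bytes handed to the caller
        decode=codecs.getdecoder(codec_out),  # bytes written by the caller -> text
        Reader=codecs.getreader(codec_in),    # bytes read from the stream -> text
        Writer=codecs.getwriter(codec_in),    # text -> bytes written to the stream
        errors=errors,
    )


# Latin-1 input is re-encoded as UTF-8 on read.
latin = io.BytesIO("café".encode("latin-1"))
recoded = transcode_sketch(latin, codec_in="latin-1").read()

# Invalid UTF-8 bytes are replaced with U+FFFD instead of raising an error.
broken = io.BytesIO(b"a\xffb")
repaired = transcode_sketch(broken).read()
```

Here `recoded` is the UTF-8 encoding of "café", and `repaired` contains the UTF-8 encoding of the replacement character in place of the invalid byte, which is the behaviour the `errors='replace'` default is meant to provide.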