lector.csv.encodings
Helpers for detecting character encodings in binary buffers.
Classes
Chardet | An encoding detector that falls back to cchardet if decoding with the default UTF-8 generates too many errors. |
EncodingDetector | Base class specifying the interface for all encoding detectors. |
Functions
decoding_errors | The proportion of characters that couldn't be decoded correctly. |
detect_bom | Detect encoding by looking for a BOM at the start of the file. |
Attributes
Map BOM (Byte-order mark) to encoding. |
|
Character representing non-codable bytes. |
|
Cannot read more than this number of bytes at once to detect encoding. |
- class lector.csv.encodings.Chardet
Bases: EncodingDetector
An encoding detector that falls back to cchardet if decoding with the default UTF-8 generates too many errors.
- confidence_threshold: float = 0.6
Minimum level of confidence to accept an encoding automatically detected by cchardet.
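As a rough illustration of how a confidence threshold of this kind might be applied, the sketch below keeps the default encoding when it decodes cleanly enough and otherwise accepts a detector's guess only if the reported confidence reaches the threshold. This is not lector's implementation: `pick_encoding`, `max_error_rate`, and the injected `detect` callable are hypothetical names, and the result shape (a dict with `encoding` and `confidence` keys) is assumed to follow what `cchardet.detect` returns.

```python
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.6  # same value as the documented default


def pick_encoding(
    bs: bytes,
    detect: Callable[[bytes], dict],
    default: str = "utf-8",
    max_error_rate: float = 0.1,  # hypothetical cut-off, not taken from lector
) -> Optional[str]:
    """Return `default` if it decodes cleanly enough, else a confident guess."""
    if not bs:
        return default
    # Undecodable bytes become U+FFFD when decoding with errors="replace".
    decoded = bs.decode(default, errors="replace")
    if decoded.count("\ufffd") / len(decoded) <= max_error_rate:
        return default

    # Assumed result shape: {"encoding": str or None, "confidence": float or None}
    guess = detect(bs)
    confidence = guess.get("confidence") or 0.0
    if guess.get("encoding") and confidence >= CONFIDENCE_THRESHOLD:
        return guess["encoding"]
    return None  # no sufficiently confident guess
```

With cchardet installed, `cchardet.detect` could be passed as the `detect` argument; injecting the detector keeps the sketch runnable without the third-party dependency.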
- class lector.csv.encodings.EncodingDetector
Bases: abc.ABC
Base class specifying the interface for all encoding detectors.
- lector.csv.encodings.decoding_errors(bs, encoding, prop=True)
The proportion of characters that couldn’t be decoded correctly.
- Parameters:
bs (bytes) –
encoding (str) –
prop (bool) –
- Return type:
float
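One plausible way to compute such a measure, shown here only as a minimal sketch and not as lector's actual code, is to decode with Python's `errors="replace"` handler and count the U+FFFD replacement characters it inserts. The name `decoding_error_rate` is hypothetical, and the reading of `prop=False` as "return the raw count instead of the proportion" is an assumption.

```python
REPLACEMENT_CHAR = "\ufffd"  # inserted by Python for bytes it cannot decode


def decoding_error_rate(bs: bytes, encoding: str, prop: bool = True) -> float:
    """Share (or, assumed for prop=False, count) of undecodable characters."""
    decoded = bs.decode(encoding, errors="replace")
    # Caveat: a genuine U+FFFD already present in the data is counted as well.
    n_errors = decoded.count(REPLACEMENT_CHAR)
    if not prop:
        return float(n_errors)
    return n_errors / len(decoded) if decoded else 0.0


# Latin-1 bytes read as UTF-8: the trailing 0xE9 cannot be decoded.
print(decoding_error_rate(b"caf\xe9", "utf-8"))    # 0.25
print(decoding_error_rate(b"caf\xe9", "latin-1"))  # 0.0
```

A high rate under UTF-8 combined with a low rate under another candidate encoding is the kind of signal a detector could use before falling back to a statistical guess.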
- lector.csv.encodings.detect_bom(bs)
Detect encoding by looking for a BOM at the start of the file.
- Parameters:
bs (bytes) –