lector.types.cast

Helpers to easily cast columns to their most appropriate/efficient type.

Classes

Autocast

Simple cast trying each registered type in order.

Cast

Tries a specific cast for each column.

CastStrategy

Base class for autocasting implementations.

Functions

ensure_converters([converters])

Turn a type conversion config into a list of converter instances.

Attributes

Config

An (ordered) dict of converter class names and corresponding parameters.

Converters

Accepted argument type where converters are expected.

DEFAULT_CONVERTERS

class lector.types.cast.Autocast

Bases: CastStrategy

Simple cast trying each registered type in order.

As a small optimization with a large effect on execution time, each type is first tested on a sample of the values, so that non-matching types are rejected quickly.

fallback: lector.types.abc.Converter | None
n_samples: int = 100
cast_array(array, name=None)

Casts a single array. This is the only method a CastStrategy subclass needs to override.

Parameters:
  • array (pyarrow.Array | pyarrow.ChunkedArray) –

  • name (str | None) –

Return type:

lector.types.abc.Conversion
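
The sample-first strategy described for Autocast can be pictured roughly as follows. This is a minimal sketch in plain Python, not lector's implementation: lists stand in for pyarrow arrays, and `to_int`, `to_float`, and `autocast_values` are hypothetical names. It also assumes the sample is simply the first `n_samples` values and that unmatched data is returned unchanged, neither of which is specified here.

```python
def to_int(values):
    """Hypothetical converter: converted values, or None if any value fails."""
    try:
        return [int(v) for v in values]
    except ValueError:
        return None


def to_float(values):
    """Hypothetical converter: converted values, or None if any value fails."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return None


def autocast_values(values, converters, n_samples=100):
    """Try each converter in order; return (converter name, converted values)."""
    sample = values[:n_samples]  # assumption: the sample is the leading values
    for convert in converters:
        # Cheap test on the sample first, to reject non-matching types quickly.
        if convert(sample) is None:
            continue
        # The sample matched, so pay for the full conversion.
        result = convert(values)
        if result is not None:
            return convert.__name__, result
    # Assumption: if nothing matches, the values are returned unchanged.
    return None, values


name, converted = autocast_values(["1", "2", "3.5"], [to_int, to_float])
```

Here `to_int` is rejected on the sample because of `"3.5"`, so the values end up converted by `to_float`. The order of the converter list matters: an earlier, stricter converter gets the first chance to claim a column.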

class lector.types.cast.Cast

Tries a specific cast for each column.

converters: dict[str, lector.types.abc.Converter]
log: bool = False
cast(table)

Parameters:

table (pyarrow.Table) –

Return type:

pyarrow.Table
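
As a rough illustration of per-column casting, the sketch below applies one converter per named column. It is plain Python rather than lector's code: a dict of lists stands in for a `pyarrow.Table`, plain callables stand in for converter instances, and `cast_columns` is a hypothetical helper. How lector treats columns without an entry in `converters`, or conversions that fail, isn't stated here; leaving both unchanged is an assumption.

```python
def cast_columns(table, converters):
    """Apply the converter registered for each column name, if there is one."""
    result = {}
    for name, values in table.items():
        convert = converters.get(name)
        if convert is None:
            # Assumption: columns without a converter pass through unchanged.
            result[name] = values
            continue
        try:
            result[name] = [convert(v) for v in values]
        except ValueError:
            # Assumption: a failed conversion keeps the original column.
            result[name] = values
    return result


table = {"id": ["1", "2"], "price": ["9.5", "n/a"], "label": ["a", "b"]}
casted = cast_columns(table, {"id": int, "price": float})
```

In this toy run, `id` becomes a list of integers, `price` stays as strings because `"n/a"` can't be parsed, and `label` is untouched because no converter was given for it.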

class lector.types.cast.CastStrategy

Bases: abc.ABC

Base class for autocasting implementations.

columns: list[str] | None
converters: Converters | None
log: bool = False
__post_init__()
cast(data)

Casts an array or a table. This explicit method shouldn’t be necessary, but @singledispatchmethod doesn’t work with inheritance.

Parameters:

data (pyarrow.Array | pyarrow.ChunkedArray | pyarrow.Table) –

Return type:

lector.types.abc.Conversion | pyarrow.Table

abstract cast_array(array, name=None)

Casts a single array. Subclasses only need to override this method.

Parameters:
  • array (pyarrow.Array) –

  • name (str | None) –

Return type:

lector.types.abc.Conversion

cast_table(table)

Takes care of updating the table's fields, including their metadata.

Parameters:

table (pyarrow.Table) –

Return type:

pyarrow.Table
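
The division of labour between `cast`, `cast_array`, and `cast_table` can be sketched as below. This is an illustration of the pattern only, under loud simplifications: lists stand in for pyarrow arrays, dicts for tables, the return value is the converted data rather than a `Conversion` object, and `SketchStrategy` and `IntStrategy` are hypothetical classes. The explicit `isinstance` checks are one possible replacement for `@singledispatchmethod`, not necessarily what lector does.

```python
from abc import ABC, abstractmethod


class SketchStrategy(ABC):
    """Hypothetical stand-in for a casting strategy base class."""

    def cast(self, data):
        # Explicit type dispatch instead of @singledispatchmethod.
        if isinstance(data, dict):
            return self.cast_table(data)
        if isinstance(data, list):
            return self.cast_array(data)
        raise TypeError(f"Unsupported data type: {type(data).__name__}")

    @abstractmethod
    def cast_array(self, array, name=None):
        """The only method a subclass has to implement."""

    def cast_table(self, table):
        # Cast column by column, passing each column's name along.
        return {name: self.cast_array(col, name=name) for name, col in table.items()}


class IntStrategy(SketchStrategy):
    """Toy subclass: converts to int where possible, else keeps the input."""

    def cast_array(self, array, name=None):
        try:
            return [int(v) for v in array]
        except ValueError:
            return array


strategy = IntStrategy()
array_result = strategy.cast(["1", "2"])
table_result = strategy.cast({"a": ["1"], "b": ["x"]})
```

Because the base class owns both the dispatch and the table handling, a subclass that overrides only `cast_array` works for arrays and tables alike.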

lector.types.cast.ensure_converters(converters=None)

Turn a type conversion config into a list of converter instances.

Parameters:

converters (Converters) –

Return type:

list[lector.types.abc.Converter]
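
One way to picture turning a config into converter instances is to look up each class name in a registry and instantiate it with its parameters. The sketch below does that with hypothetical pieces throughout: `ToyNumber`, `ToyText`, `REGISTRY`, and `build_converters` are invented for illustration and are not lector's converters or its lookup mechanism. Falling back to all registered converters when no config is given is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class ToyNumber:
    """Hypothetical converter with one parameter."""
    threshold: float = 1.0


@dataclass
class ToyText:
    """Hypothetical converter with one parameter."""
    min_length: int = 0


# Hypothetical registry mapping class names to converter classes.
REGISTRY = {cls.__name__: cls for cls in (ToyNumber, ToyText)}


def build_converters(config=None):
    """Instantiate one converter per config entry, preserving dict order."""
    if config is None:
        # Assumption: no config means every registered converter with defaults.
        config = {name: {} for name in REGISTRY}
    return [REGISTRY[name](**params) for name, params in config.items()]


converters = build_converters({"ToyText": {"min_length": 3}, "ToyNumber": {}})
```

Since Python dicts preserve insertion order, the order of entries in the config determines the order of the resulting list, which matters when converters are tried one after another.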

lector.types.cast.Config

An (ordered) dict of converter class names and corresponding parameters.

lector.types.cast.Converters

Accepted argument type where converters are expected.

lector.types.cast.DEFAULT_CONVERTERS: Config