dir_content_diff.pandas

Extension module to process files with Pandas.

Functions

register()

Register Pandas extensions.

Classes

CsvComparator([default_load_kwargs, ...])

Comparator for CSV files.

DataframeComparator([default_load_kwargs, ...])

Comparator for pandas.DataFrame objects.

HdfComparator([default_load_kwargs, ...])

Comparator for HDF files.

class dir_content_diff.pandas.CsvComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)

Bases: DataframeComparator

Comparator for CSV files.

load(path, **kwargs)

Load a CSV file into a pandas.DataFrame object.

save(data, path, **kwargs)

Save data to a CSV file.
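The round-trip sketch below assumes that CsvComparator.load and CsvComparator.save forward their **kwargs to pandas.read_csv() and DataFrame.to_csv(), respectively. That mapping is an assumption, since the defaults the comparator actually applies are not documented here. The helpers load_csv and save_csv are hypothetical stand-ins, not part of dir_content_diff.

```python
import os
import tempfile

import pandas as pd


def save_csv(data: pd.DataFrame, path: str, **kwargs) -> None:
    """Hypothetical stand-in for CsvComparator.save: write a DataFrame to CSV."""
    # Assumption: do not write the row index unless the caller asks for it,
    # so that reading the file back does not add an unnamed column.
    kwargs.setdefault("index", False)
    data.to_csv(path, **kwargs)


def load_csv(path: str, **kwargs) -> pd.DataFrame:
    """Hypothetical stand-in for CsvComparator.load: read a CSV into a DataFrame."""
    return pd.read_csv(path, **kwargs)


df = pd.DataFrame({"name": ["a", "b"], "value": [1.5, 2.0]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    save_csv(df, path)
    loaded = load_csv(path)

# The round trip preserves column names, column order and values.
pd.testing.assert_frame_equal(df, loaded)
```

Any extra keyword arguments (for example sep=";") would need to be passed consistently to both helpers for the round trip to hold.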

class dir_content_diff.pandas.DataframeComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)

Bases: BaseComparator

Comparator for pandas.DataFrame objects.

diff(ref, comp, *args, ignore_columns=None, **kwargs)

Compare two pandas.DataFrame objects.

This method calls pandas.testing.assert_series_equal(); see the documentation of that function for details on args and kwargs.

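Parameters:
  • ref (pandas.DataFrame) – The reference DataFrame.

  • comp (pandas.DataFrame) – The compared DataFrame.

  • *args – Positional arguments passed to pandas.testing.assert_series_equal().

  • ignore_columns – (Optional) The columns that should not be compared.

  • **kwargs – Keyword arguments passed to pandas.testing.assert_series_equal().

Returns: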

False if the DataFrames are considered equal, otherwise a string explaining why they are not.

Return type:

bool or str
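The sketch below illustrates the comparison strategy described above. It is not the library's code. It assumes that columns listed in ignore_columns are dropped before comparing, that each remaining column is checked with pandas.testing.assert_series_equal() (with the extra positional and keyword arguments forwarded to it), and that missing columns are reported as differences. The function name diff_dataframes and the message format are hypothetical.

```python
import pandas as pd


def diff_dataframes(ref, comp, *args, ignore_columns=None, **kwargs):
    """Hypothetical sketch: False if ref and comp match column by column, else a message."""
    if ignore_columns:
        # errors="ignore" tolerates names that are absent from one of the frames.
        ref = ref.drop(columns=ignore_columns, errors="ignore")
        comp = comp.drop(columns=ignore_columns, errors="ignore")

    messages = []
    for col in ref.columns:
        if col not in comp.columns:
            messages.append(f"Column '{col}': missing in the compared DataFrame")
            continue
        try:
            # Extra arguments (e.g. rtol, check_dtype) go straight to pandas' checker.
            pd.testing.assert_series_equal(ref[col], comp[col], *args, **kwargs)
        except AssertionError as exc:
            messages.append(f"Column '{col}': {exc}")
    for col in comp.columns:
        if col not in ref.columns:
            messages.append(f"Column '{col}': missing in the reference DataFrame")

    return "\n".join(messages) if messages else False


ref = pd.DataFrame({"x": [1.0, 2.0], "run_id": [1, 2]})
comp = pd.DataFrame({"x": [1.0, 2.000001], "run_id": [7, 8]})

# A loose relative tolerance accepts the small float drift once run_id is ignored.
print(diff_dataframes(ref, comp, ignore_columns=["run_id"], rtol=1e-3))  # False
# A much tighter tolerance reports column 'x' as different.
print(diff_dataframes(ref, comp, ignore_columns=["run_id"], rtol=1e-9))
```

Returning False on success, rather than True, mirrors the documented return convention: any truthy result is a description of the differences.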

format_data(data, ref=None, replace_pattern=None)

Format the compared pandas.DataFrame.

Parameters:
  • data (pandas.DataFrame) – The DataFrame to format.

  • ref (pandas.DataFrame) – (Optional) The reference DataFrame.

  • replace_pattern (dict) –

    (Optional) The columns in which a given pattern must be replaced. The dictionary must have the following format:

    {
        (<pattern>, <new_value>, <optional regex flag>): [col1, col2]
    }
    

Note

The formatting errors are stored in self.current_state["format_errors"]. This is a dict whose keys are the columns in which issues were detected and whose values describe these issues.

Returns:

The formatted compared data.

Return type:

pandas.DataFrame
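The following sketch shows one way a replace_pattern mapping could be applied, and how per-column problems could be collected in the spirit of the format_errors note above. It assumes that each key is (pattern, new_value) or (pattern, new_value, flags), that the optional third element is a set of re flags, and that replacement uses pandas' Series.str.replace() with regex=True. These are assumptions, not a description of the library's internals. The helper apply_replace_pattern is hypothetical.

```python
import re

import pandas as pd


def apply_replace_pattern(data, replace_pattern):
    """Hypothetical helper: return (formatted copy of data, dict of per-column errors)."""
    data = data.copy()  # leave the caller's DataFrame untouched
    errors = {}
    for key, columns in replace_pattern.items():
        # Assumed key layout: (pattern, new_value[, re flags]).
        pattern, new_value = key[0], key[1]
        flags = key[2] if len(key) > 2 else 0
        for col in columns:
            if col not in data.columns:
                # Record the issue for this column instead of raising.
                errors[col] = f"column {col} not found in the data"
                continue
            data[col] = data[col].str.replace(pattern, new_value, flags=flags, regex=True)
    return data, errors


df = pd.DataFrame({
    "path": ["/tmp/run_1/out.csv", "/TMP/run_2/out.csv"],
    "log": ["done in /tmp/run_1", "done in /tmp/run_2"],
})
replace_pattern = {
    # Case-insensitive replacement of the run-specific prefix in "path".
    (r"^/tmp/run_\d+", "<ROOT>", re.IGNORECASE): ["path"],
    # Case-sensitive replacement in "log"; "missing_col" does not exist.
    (r"/tmp/run_\d+", "<ROOT>"): ["log", "missing_col"],
}
formatted, errors = apply_replace_pattern(df, replace_pattern)
print(formatted["path"].tolist())  # ['<ROOT>/out.csv', '<ROOT>/out.csv']
print(errors)
```

Masking run-specific values such as temporary paths this way lets two otherwise identical result files compare as equal.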

format_diff(difference)

Format a single difference.

sort(differences)

Leave the differences unsorted so that the column order is preserved.

class dir_content_diff.pandas.HdfComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)

Bases: DataframeComparator

Comparator for HDF files.

load(path, **kwargs)

Load an HDF file into a pandas.DataFrame object.

save(data, path, **kwargs)

Save data to an HDF file.

dir_content_diff.pandas.register()

Register Pandas extensions.