dir_content_diff.pandas
Extension module to process files with Pandas.
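Functions
register() | Register Pandas extensions.
Classes
CsvComparator | Comparator for CSV files.
DataframeComparator | Comparator for pandas.DataFrame objects.
FeatherComparator | Comparator for Feather files.
HdfComparator | Comparator for HDF files.
ParquetComparator | Comparator for Parquet files.
StataComparator | Comparator for Stata files.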
- class dir_content_diff.pandas.CsvComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)
Bases: DataframeComparator
Comparator for CSV files.
- load(path, **kwargs)
Load a CSV file into a pandas.DataFrame object.
- save(data, path, **kwargs)
Save data to a CSV file.
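The load/save pair can be pictured as a round trip through a CSV file. The sketch below is an illustration only: it assumes the two methods behave like thin wrappers around pandas.read_csv and pandas.DataFrame.to_csv, which this page does not confirm. The names load_csv and save_csv are hypothetical stand-ins and are not part of dir_content_diff.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv(path, **kwargs):
    """Hypothetical stand-in for a CSV load step: read a file into a DataFrame."""
    return pd.read_csv(path, **kwargs)


def save_csv(data, path, **kwargs):
    """Hypothetical stand-in for a CSV save step: write a DataFrame to a file."""
    data.to_csv(path, **kwargs)


frame = pd.DataFrame({"id": [1, 2, 3], "value": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "example.csv"
    # index=False keeps the row index out of the file so the round trip is lossless
    save_csv(frame, csv_path, index=False)
    reloaded = load_csv(csv_path)

print(reloaded.equals(frame))
```

Extra keyword arguments are forwarded unchanged in this sketch, mirroring the **kwargs in the documented signatures.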
- class dir_content_diff.pandas.DataframeComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)
Bases: BaseComparator
Comparator for pandas.DataFrame objects.
- diff(ref, comp, *args, ignore_columns=None, **kwargs)
Compare two pandas.DataFrame objects.
This function calls pandas.testing.assert_series_equal(); see that function's documentation for details on args and kwargs.
- Parameters:
ref (pandas.DataFrame) – The reference DataFrame.
comp (pandas.DataFrame) – The compared DataFrame.
ignore_columns (list(str)) – (Optional) The columns that should not be checked.
- Returns:
False if the DataFrames are considered equal, or a string explaining why they are not.
- Return type:
bool or str
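The return convention described above (False for equal data, otherwise an explanatory string) can be sketched with a column-by-column comparison built on pandas.testing.assert_series_equal. This is a minimal illustration, not the library's implementation: the function name diff_dataframes, the column-mismatch message, and the way problems are joined are all assumptions.

```python
import pandas as pd
from pandas.testing import assert_series_equal


def diff_dataframes(ref, comp, *args, ignore_columns=None, **kwargs):
    """Hypothetical sketch: return False if equal, else a description string."""
    ignored = set(ignore_columns or [])
    ref_cols = [col for col in ref.columns if col not in ignored]
    comp_cols = [col for col in comp.columns if col not in ignored]
    if ref_cols != comp_cols:
        return f"Column mismatch: {ref_cols} != {comp_cols}"

    problems = []
    for col in ref_cols:
        try:
            # Extra args and kwargs are passed straight to pandas
            assert_series_equal(ref[col], comp[col], *args, **kwargs)
        except AssertionError as exc:
            problems.append(f"Column '{col}': {exc}")
    return "\n".join(problems) if problems else False


ref = pd.DataFrame({"a": [1, 2], "b": [0.1, 0.2], "stamp": ["x", "y"]})
comp = pd.DataFrame({"a": [1, 2], "b": [0.1, 0.25], "stamp": ["p", "q"]})

same = diff_dataframes(ref, ref.copy())
report = diff_dataframes(ref, comp, ignore_columns=["stamp"])
# A loose relative tolerance makes the float difference in "b" acceptable
loose = diff_dataframes(ref, comp, ignore_columns=["stamp"], rtol=0.5)
```

Returning False rather than True for equal inputs lets a caller treat any truthy result as a difference report.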
- format_data(data, ref=None, replace_pattern=None)
Format the compared pandas.DataFrame.
- Parameters:
data (pandas.DataFrame) – The DataFrame to format.
ref (pandas.DataFrame) – (Optional) The reference DataFrame.
replace_pattern (dict) – (Optional) The columns that contain a given pattern which must be replaced. The dictionary must have the following format:
{ (<pattern>, <new_value>, <optional regex flag>): [col1, col2] }
Note
The formatting errors are stored in self.current_state["format_errors"]. It contains a dict in which the keys are the columns with detected issues and the values are the descriptions of these issues.
- Returns:
The formatted compared data.
- Return type:
pandas.DataFrame
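The replace_pattern format is easier to read with a concrete dictionary. The sketch below shows one plausible way such a mapping could be applied with pandas string methods. It is an assumption-laden illustration: apply_replace_pattern is a hypothetical helper, the error messages are invented, and errors are returned as a plain dict instead of being stored in self.current_state["format_errors"].

```python
import re

import pandas as pd


def apply_replace_pattern(data, replace_pattern=None):
    """Hypothetical sketch: apply {(pattern, new_value[, flags]): [columns]}."""
    formatted = data.copy()
    format_errors = {}
    for key, columns in (replace_pattern or {}).items():
        # The third tuple element (regex flags) is optional
        pattern, new_value, *flags = key
        compiled = re.compile(pattern, flags[0] if flags else 0)
        for col in columns:
            if col not in formatted.columns:
                format_errors[col] = "The column is missing"
                continue
            try:
                formatted[col] = formatted[col].str.replace(
                    compiled, new_value, regex=True
                )
            except AttributeError as exc:
                # Raised when the column does not hold string values
                format_errors[col] = str(exc)
    return formatted, format_errors


data = pd.DataFrame(
    {"path": ["/tmp/run_1/out.csv", "/TMP/run_2/out.csv"], "n": [1, 2]}
)
patterns = {
    (r"/tmp/run_\d+", "<RUN_DIR>", re.IGNORECASE): ["path", "missing_col"],
}
formatted, errors = apply_replace_pattern(data, patterns)
```

Here both path values become "<RUN_DIR>/out.csv", while the unknown column is reported in the returned error dict rather than raising.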
- format_diff(difference)
Format one element difference.
- sort(differences)
Do not sort the differences to keep the column order.
- class dir_content_diff.pandas.FeatherComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)
Bases: DataframeComparator
Comparator for Feather files.
- load(path, **kwargs)
Load a Feather file into a pandas.DataFrame object.
- save(data, path, **kwargs)
Save data to a Feather file.
- class dir_content_diff.pandas.HdfComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)
Bases: DataframeComparator
Comparator for HDF files.
- load(path, **kwargs)
Load an HDF file into a pandas.DataFrame object.
- save(data, path, **kwargs)
Save data to an HDF file.
- class dir_content_diff.pandas.ParquetComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)
Bases: DataframeComparator
Comparator for Parquet files.
- load(path, **kwargs)
Load a Parquet file into a pandas.DataFrame object.
- save(data, path, **kwargs)
Save data to a Parquet file.
- class dir_content_diff.pandas.StataComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)
Bases: DataframeComparator
Comparator for Stata files.
- load(path, **kwargs)
Load a Stata file into a pandas.DataFrame object.
- save(data, path, **kwargs)
Save data to a Stata file.
- dir_content_diff.pandas.register()
Register Pandas extensions.