dir_content_diff

dir-content-diff package.

Simple tool to compare directory contents.

class dir_content_diff.BaseComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)

Bases: ABC

Base Comparator class.

__call__(ref_file, comp_file, *diff_args, return_raw_diffs=False, load_kwargs=None, format_data_kwargs=None, filter_kwargs=None, format_diff_kwargs=None, sort_kwargs=None, concat_kwargs=None, report_kwargs=None, **diff_kwargs)

Perform the comparison between the reference file and the compared file.

Note

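The workflow is the following:

  • call load() on the reference file and on the compared file.

  • call format_data() on the reference data and on the compared data.

  • call diff() to compute the differences between the two data sets.

  • call filter() to remove specific elements from the differences.

  • call format_diff() on each remaining difference.

  • call sort() to order the formatted differences.

  • call concatenate() to merge the formatted differences.

  • call report() to build the final report.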

concatenate(differences, **kwargs)

Concatenate the differences.

abstractmethod diff(ref, comp, *args, **kwargs)

Perform the comparison between the reference data and the compared data.

Note

This function must return either of the following:

  • an iterable of differences between each data element (the iterable can be empty).

  • a mapping of differences between each data element in which the keys can be an element ID or a column name (the mapping can be empty).

  • a boolean indicating whether the files are different (True) or not (False).
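As an illustration of the mapping form, here is a minimal sketch of a diff body for dictionary-like data. The function name `dict_diff` and the message wording are assumptions made for this example; they are not part of the package.

```python
def dict_diff(ref, comp):
    """Return a mapping of differences keyed by element ID (empty if equal)."""
    differences = {}
    for key in sorted(set(ref) | set(comp)):
        if key not in comp:
            differences[key] = f"missing in compared data (reference value: {ref[key]!r})"
        elif key not in ref:
            differences[key] = f"unexpected in compared data (value: {comp[key]!r})"
        elif ref[key] != comp[key]:
            differences[key] = f"{ref[key]!r} != {comp[key]!r}"
    return differences


# Only the element "b" differs, so the mapping has a single entry.
print(dict_diff({"a": 1, "b": 2}, {"a": 1, "b": 3}))
```

An empty mapping signals that no element differs, which matches the "can be empty" case described above.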

filter(differences, **kwargs)

Define a filter to remove specific elements from the result differences.

format_data(data, ref=None, **kwargs)

Format the loaded data.

format_diff(difference, **kwargs)

Format one element difference.

load(path, **kwargs)

Load a file.

report(ref_file, comp_file, formatted_differences, diff_args, diff_kwargs, load_kwargs=None, format_data_kwargs=None, filter_kwargs=None, format_diff_kwargs=None, sort_kwargs=None, concat_kwargs=None, **kwargs)

Create a report from the formatted differences.

Note

This function must return a formatted report of the differences (usually as a string but it can be any type). If the passed differences are None, False or an empty collection, the report should return False to state that the files are not different.
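The return contract of report() can be sketched with a plain function. The name `make_report` and the message layout are hypothetical; only the rule "return False when there is nothing to report" comes from the note above.

```python
def make_report(ref_file, comp_file, formatted_differences):
    """Build a report string, or return False when there is no difference."""
    # None, False and empty collections all mean "the files are not different".
    if not formatted_differences:
        return False
    return (
        f"The files '{ref_file}' and '{comp_file}' are different:\n"
        f"{formatted_differences}"
    )


print(make_report("ref.json", "comp.json", ""))
print(make_report("ref.json", "comp.json", "b: 2 != 3"))
```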

save(data, path, **kwargs)

Save formatted data into a file.

property save_capability

Check that the current class has a save() capability.

sort(differences, **kwargs)

Sort the element differences.
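To show how the methods above could fit together, here is a self-contained stand-in that reuses the same method names without importing the package. The class `JsonPipelineSketch` is hypothetical, it does not derive from BaseComparator, and the call order is an assumption based on the order of the keyword arguments of __call__().

```python
import json
import tempfile
from pathlib import Path


class JsonPipelineSketch:
    """Stand-in that mimics the method names of BaseComparator (not a subclass)."""

    def load(self, path):
        return json.loads(Path(path).read_text())

    def format_data(self, data, ref=None):
        return data  # nothing to normalize in this sketch

    def diff(self, ref, comp):
        keys = sorted(set(ref) | set(comp))
        return {k: (ref.get(k), comp.get(k)) for k in keys if ref.get(k) != comp.get(k)}

    def filter(self, differences):
        return differences  # keep every difference

    def format_diff(self, difference):
        return f"{difference[0]!r} != {difference[1]!r}"

    def sort(self, differences):
        return sorted(differences)

    def concatenate(self, differences):
        return "\n".join(differences)

    def report(self, ref_file, comp_file, formatted_differences):
        if not formatted_differences:
            return False
        return f"{ref_file} and {comp_file} differ:\n{formatted_differences}"

    def __call__(self, ref_file, comp_file):
        ref = self.format_data(self.load(ref_file))
        comp = self.format_data(self.load(comp_file), ref=ref)
        raw = self.filter(self.diff(ref, comp))
        formatted = [f"{key}: {self.format_diff(value)}" for key, value in raw.items()]
        return self.report(ref_file, comp_file, self.concatenate(self.sort(formatted)))


with tempfile.TemporaryDirectory() as tmp:
    ref_file, comp_file = Path(tmp, "ref.json"), Path(tmp, "comp.json")
    ref_file.write_text(json.dumps({"a": 1, "b": 2}))
    comp_file.write_text(json.dumps({"a": 1, "b": 3}))
    result = JsonPipelineSketch()(ref_file, comp_file)
    same = JsonPipelineSketch()(ref_file, ref_file)

print(result)
print(same)
```

A real comparator would normally derive from dir_content_diff.BaseComparator and override only the steps it needs, typically load() and diff().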

class dir_content_diff.ComparisonConfig(include_patterns: Iterable[str] | None = None, exclude_patterns: Iterable[str] | None = None, comparators: Dict[str | None, BaseComparator | Callable] | None = None, specific_args: Dict[str, Dict[str, Any]] | None = None, return_raw_diffs: bool = False, export_formatted_files: bool | str = False, executor_type: Literal['sequential', 'thread', 'process'] = 'sequential', max_workers: int | None = None)

Bases: object

Configuration class to store comparison settings.

Parameters:
include_patterns

A list of regular expression patterns. If the relative path of a file does not match any of these patterns, it is ignored during the comparison. Note that this means that any specific arguments for that file will also be ignored.

Type:

Iterable[str] | None

exclude_patterns

A list of regular expression patterns. If the relative path of a file matches any of these patterns, it is ignored during the comparison. Note that this means that any specific arguments for that file will also be ignored.

Type:

Iterable[str] | None

comparators

A dict to override the registered comparators.

Type:

Dict[str | None, dir_content_diff.base_comparators.BaseComparator | collections.abc.Callable] | None

specific_args

A dict with the args/kwargs that should be given to the comparator for a given file. This dict should be like the following:

{
    <relative_file_path>: {
        comparator: ComparatorInstance,
        args: [arg1, arg2, ...],
        kwargs: {
            kwarg_name_1: kwarg_value_1,
            kwarg_name_2: kwarg_value_2,
        }
    },
    <another_file_path>: {...},
    <a name for this category>: {
        "patterns": ["regex1", "regex2", ...],
        ... (other arguments)
    }
}

If the “patterns” entry is present, then the name is not considered and is only used as a helper for the user. When a “patterns” entry is detected, the other arguments are applied to all files whose relative name matches one of the given regular expression patterns. If a file could match multiple patterns of different groups, only the first one is considered.

Note that all entries in this dict are optional.

Type:

Dict[str, Dict[str, Any]] | None
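A concrete dictionary of this shape, together with a possible lookup, might look like the sketch below. The helper `resolve`, the keyword names `tolerance` and `atol`, and the positional value `"lenient"` are invented for the example. Two further assumptions: exact relative paths are checked before pattern groups, and patterns are tested with re.match.

```python
import re

specific_args = {
    "results/summary.json": {"kwargs": {"tolerance": 1e-6}},
    "all CSV tables": {"patterns": [r".*\.csv$"], "kwargs": {"atol": 1e-3}},
    "everything under logs": {"patterns": [r"logs/.*"], "args": ["lenient"]},
}


def resolve(relative_path, specific_args):
    """Return the options for one file (hypothetical helper, not the package's code)."""
    entry = specific_args.get(relative_path)
    if entry is not None and "patterns" not in entry:
        return entry
    for options in specific_args.values():
        if any(re.match(p, relative_path) for p in options.get("patterns", ())):
            # Only the first matching group is used; its name is ignored.
            return {k: v for k, v in options.items() if k != "patterns"}
    return {}


# "logs/run.csv" matches both groups; the first one in the dict wins.
print(resolve("logs/run.csv", specific_args))
```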

return_raw_diffs

If set to True, only the raw differences are returned instead of a formatted report.

Type:

bool

export_formatted_files

If set to True or a non-empty string, create a new directory with formatted compared data files. If a string is passed, this string is used as suffix for the new directory. If True is passed, the suffix is _FORMATTED.

Type:

bool | str

max_workers

Maximum number of worker threads/processes for parallel execution. If None, defaults to min(32, (os.cpu_count() or 1) + 4) as per executor default.

Type:

int | None
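The default stated above can be evaluated directly. For example, it gives 12 on a machine with 8 CPUs and is capped at 32 on a machine with 64 CPUs. The variable name is chosen for this example only.

```python
import os

# Same expression as in the description of max_workers above.
default_max_workers = min(32, (os.cpu_count() or 1) + 4)
print(default_max_workers)
```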

executor_type

Type of executor to use for parallel execution. ‘thread’ uses ThreadPoolExecutor (better for I/O-bound tasks), ‘process’ uses ProcessPoolExecutor (better for CPU-bound tasks), ‘sequential’ disables parallel execution.

Type:

Literal['sequential', 'thread', 'process']
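The three values map naturally onto the standard concurrent.futures module. The following is a rough sketch of such a dispatch, not the package's implementation; `run_jobs` is a hypothetical helper.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_jobs(func, items, executor_type="sequential", max_workers=None):
    """Apply func to each item, optionally in parallel, preserving the input order."""
    if executor_type == "sequential":
        return [func(item) for item in items]
    executor_cls = {"thread": ThreadPoolExecutor, "process": ProcessPoolExecutor}[executor_type]
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, items))


print(run_jobs(len, ["a", "bb", "ccc"], executor_type="thread", max_workers=2))
```

With 'process', the function and its arguments must be picklable, and on platforms that spawn new interpreters the calling script needs an `if __name__ == "__main__":` guard.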

should_ignore_file(relative_path: str) bool

Check if a file should be ignored.

Parameters:

relative_path (str)

Return type:

bool
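The rule described for include_patterns and exclude_patterns can be sketched as a standalone function. This is an approximation under two assumptions: patterns are tested with re.match (anchored at the start of the relative path), and the include check is applied before the exclude check. The function name `should_ignore` is hypothetical.

```python
import re


def should_ignore(relative_path, include_patterns=None, exclude_patterns=None):
    """Return True if the file should be skipped during the comparison."""
    # With include patterns, a file that matches none of them is ignored.
    if include_patterns and not any(re.match(p, relative_path) for p in include_patterns):
        return True
    # A file that matches any exclude pattern is ignored.
    return any(re.match(p, relative_path) for p in exclude_patterns or ())


print(should_ignore("data/table.csv", include_patterns=[r".*\.json$"]))
print(should_ignore("data/table.csv", exclude_patterns=[r"tmp/"]))
```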

class dir_content_diff.DefaultComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)

Bases: BaseComparator

The comparator used by default when none is registered for a given extension.

This comparator only performs a binary comparison of the files.

diff(ref, comp, *args, **kwargs)

Compare binary data.

This function calls filecmp.cmp(). See the documentation of that function for details on args and kwargs.
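For reference, filecmp.cmp() returns True when the files are equal, whereas a diff reports True when they differ. A minimal sketch of that inversion follows; the wrapper `binary_diff` is hypothetical, and passing shallow=False forces a comparison of the file contents.

```python
import filecmp
import tempfile
from pathlib import Path


def binary_diff(ref, comp, *args, **kwargs):
    # filecmp.cmp() is True for equal files, so the result is negated.
    return not filecmp.cmp(ref, comp, *args, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "a.bin"), Path(tmp, "b.bin")
    a.write_bytes(b"\x00\x01")
    b.write_bytes(b"\x00\x02")
    different = binary_diff(a, b, shallow=False)
    identical = binary_diff(a, a, shallow=False)

print(different, identical)
```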

dir_content_diff.assert_equal_trees(*args, export_formatted_files=False, **kwargs)

Raise an AssertionError if differences are found in the two directory trees.

Note

This function has a specific behavior when run with pytest. See the documentation of dir_content_diff.pytest_plugin.

Parameters:
  • *args – passed to the compare_trees() function.

  • export_formatted_files (bool | str) – If set to True, the formatted files are exported to a new directory with the default suffix. If set to a string, that string is used as the suffix for the new directory.

  • **kwargs – passed to the compare_trees() function.

Returns:

(bool) True if the trees are equal. If they are not, an AssertionError is raised.

dir_content_diff.compare_files(ref_file: str | Path, comp_file: str | Path, comparator: BaseComparator | Callable, *args, return_raw_diffs: bool = False, **kwargs) bool | str

Compare 2 files and return the difference.

Parameters:
  • ref_file (str | Path) – Path to the reference file.

  • comp_file (str | Path) – Path to the compared file.

  • comparator (BaseComparator | Callable) – The comparator to use (see register_comparator() for the comparator signature).

  • return_raw_diffs (bool) – If set to True, only the raw differences are returned instead of a formatted report.

  • *args – passed to the comparator.

  • **kwargs – passed to the comparator.

Returns:

False if the files are equal or a string with a message explaining the differences if they are different.

Return type:

bool | str

dir_content_diff.compare_trees(ref_path: str | Path, comp_path: str | Path, *, config: ComparisonConfig | None = None, **kwargs)

Compare all files from 2 different directory trees and return the differences.

Note

The comparison only considers the files found in the reference directory. Files that exist in the compared directory but not in the reference directory are ignored.

Parameters:
  • ref_path (str | Path) – Path to the reference directory.

  • comp_path (str | Path) – Path to the directory that must be compared against the reference.

  • config (ComparisonConfig) – A config object. If given, all other configuration parameters should be set to default values.

Keyword Arguments:

**kwargs (dict) – Additional keyword arguments are used to build a ComparisonConfig object and will override the values of the given config argument.

Returns:

A dict in which the keys are the relative file paths and the values are the difference messages. If the directories are considered as equal, an empty dict is returned.

Return type:

dict
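The shape of the returned dict and the "reference files only" rule can be illustrated with a stripped-down stand-in that uses a binary comparison for every file. The function `compare_trees_sketch` and its messages are assumptions for this example; the real function selects a comparator per file and produces richer messages.

```python
import filecmp
import tempfile
from pathlib import Path


def compare_trees_sketch(ref_path, comp_path):
    """Map each differing relative path to a message; empty dict if trees match."""
    ref_path, comp_path = Path(ref_path), Path(comp_path)
    differences = {}
    # Only files found in the reference tree are visited.
    for ref_file in sorted(p for p in ref_path.rglob("*") if p.is_file()):
        relative = ref_file.relative_to(ref_path).as_posix()
        comp_file = comp_path / relative
        if not comp_file.exists():
            differences[relative] = "the compared file is missing"  # wording is an assumption
        elif not filecmp.cmp(ref_file, comp_file, shallow=False):
            differences[relative] = "the files are different"
    return differences


with tempfile.TemporaryDirectory() as tmp:
    ref, comp = Path(tmp, "ref"), Path(tmp, "comp")
    ref.mkdir()
    comp.mkdir()
    (ref / "a.txt").write_text("same")
    (comp / "a.txt").write_text("same")
    (ref / "b.txt").write_text("old")
    (comp / "b.txt").write_text("new")
    (comp / "extra.txt").write_text("only in the compared tree")
    result = compare_trees_sketch(ref, comp)

print(result)
```

Here `extra.txt` does not appear in the result because it has no counterpart in the reference tree.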

dir_content_diff.export_formatted_file(file: str | Path, formatted_file: str | Path, comparator: BaseComparator | Callable, **kwargs) None

Format a data file and export it.

Note

A new file is created only if the corresponding comparator has saving capability.

Parameters:
  • file (str | Path) – Path to the compared file.

  • formatted_file (str | Path) – Path to the formatted file.

  • comparator (BaseComparator | Callable) – The comparator to use (see register_comparator() for the comparator signature).

  • **kwargs – Can contain the following dictionaries: ‘load_kwargs’, ‘format_data_kwargs’ and ‘save_kwargs’.

Return type:

None

dir_content_diff.get_comparators()

Return a copy of the comparator registry.

dir_content_diff.pick_comparator(comparator=None, suffix=None, comparators=None)

Pick a comparator based on its name or a file suffix.

dir_content_diff.register_comparator(ext: str, comparator: BaseComparator | Callable, force: bool = False) None

Add a comparator to the registry.

Parameters:
  • ext (str) – The extension to register.

  • comparator (BaseComparator | Callable) – The comparator that should be associated with the given extension.

  • force (bool) – If set to True, no exception is raised if the given ext is already registered and the comparator is replaced.

Return type:

None

Note

It is possible to create and register custom comparators. The easiest way to do it is to derive a class from dir_content_diff.BaseComparator.

Otherwise, the given comparator should be a callable with the following signature:

comparator(
    ref_file: str,
    comp_file: str,
    *diff_args: Sequence[Any],
    return_raw_diffs: bool=False,
    **diff_kwargs: Mapping[str, Any],
) -> Union[False, str]

When return_raw_diffs is True, the return value can be of any type. Otherwise, it should be False if the files are equal, or a string describing the differences.
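A plain function that follows this signature could look like the sketch below. The name `text_comparator` and its line-by-line logic are illustrative only; the extra positional and keyword arguments are accepted but unused.

```python
import tempfile
from pathlib import Path


def text_comparator(ref_file, comp_file, *diff_args, return_raw_diffs=False, **diff_kwargs):
    """Compare two text files line by line."""
    ref_lines = Path(ref_file).read_text().splitlines()
    comp_lines = Path(comp_file).read_text().splitlines()
    # Pad the shorter file with None so that missing lines are reported too.
    length = max(len(ref_lines), len(comp_lines))
    padded_ref = ref_lines + [None] * (length - len(ref_lines))
    padded_comp = comp_lines + [None] * (length - len(comp_lines))
    raw = [(i, r, c) for i, (r, c) in enumerate(zip(padded_ref, padded_comp), start=1) if r != c]
    if return_raw_diffs:
        return raw
    if not raw:
        return False
    return "\n".join(f"line {i}: {r!r} != {c!r}" for i, r, c in raw)


with tempfile.TemporaryDirectory() as tmp:
    ref_file, comp_file = Path(tmp, "ref.txt"), Path(tmp, "comp.txt")
    ref_file.write_text("a\nb\n")
    comp_file.write_text("a\nc\n")
    message = text_comparator(ref_file, comp_file)
    raw_diffs = text_comparator(ref_file, comp_file, return_raw_diffs=True)
    equal = text_comparator(ref_file, ref_file)

print(message)
```

Such a function can then be given as the comparator argument of register_comparator() or compare_files().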

dir_content_diff.reset_comparators()

Reset the comparator registry to the default values.

dir_content_diff.unregister_comparator(ext: str, quiet: bool = False)

Remove a comparator from the registry.

Parameters:
  • ext (str) – The extension to unregister.

  • quiet (bool) – If set to True, no exception is raised if the given ext is not registered.

Returns:

The removed comparator.
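The semantics of force, quiet, the registry copy and the reset can be mimicked with a small dictionary-based stand-in. Everything in this sketch is hypothetical: the function names, the use of ValueError, the `None` key for the fallback comparator, and the placeholder string values are assumptions rather than the package's actual behavior.

```python
_DEFAULTS = {None: "default comparator"}  # placeholder values, not real comparators
_registry = dict(_DEFAULTS)


def register(ext, comparator, force=False):
    """Add a comparator; refuse to replace an existing one unless force is True."""
    if ext in _registry and not force:
        raise ValueError(f"a comparator is already registered for {ext!r}")
    _registry[ext] = comparator


def unregister(ext, quiet=False):
    """Remove and return a comparator; tolerate unknown extensions if quiet is True."""
    if ext not in _registry and not quiet:
        raise ValueError(f"no comparator is registered for {ext!r}")
    return _registry.pop(ext, None)


def get_copy():
    """Return a copy so that callers cannot mutate the registry by accident."""
    return dict(_registry)


def reset():
    """Restore the default entries."""
    _registry.clear()
    _registry.update(_DEFAULTS)


register(".txt", "text comparator")
register(".txt", "other text comparator", force=True)
removed = unregister(".txt")
missing = unregister(".unknown", quiet=True)
reset()
print(removed, missing, get_copy())
```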