dir_content_diff
dir-content-diff package.
Simple tool to compare directory contents.
- class dir_content_diff.BaseComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)
Bases: ABC
Base Comparator class.
- __call__(ref_file, comp_file, *diff_args, return_raw_diffs=False, load_kwargs=None, format_data_kwargs=None, filter_kwargs=None, format_diff_kwargs=None, sort_kwargs=None, concat_kwargs=None, report_kwargs=None, **diff_kwargs)
Perform the comparison between the reference file and the compared file.
Note
The workflow is the following:
- call dir_content_diff.base_comparators.BaseComparator.load() to load the reference file.
- call dir_content_diff.base_comparators.BaseComparator.load() to load the compared file.
- call dir_content_diff.base_comparators.BaseComparator.format_data() to format the data from the compared file.
- call dir_content_diff.base_comparators.BaseComparator.diff() to compute the differences.
- if return_raw_diffs is set, the diffs are returned at this step.
- if the diffs are not just a boolean, the collection is:
  - filtered by calling dir_content_diff.base_comparators.BaseComparator.filter().
  - formatted by calling dir_content_diff.base_comparators.BaseComparator.format_diff() on each element.
  - sorted by calling dir_content_diff.base_comparators.BaseComparator.sort().
  - concatenated into one string by calling dir_content_diff.base_comparators.BaseComparator.concatenate().
- a report is generated by calling dir_content_diff.base_comparators.BaseComparator.report().
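The workflow above can be sketched as a plain function driving any object that exposes the same hook names. This is a simplified illustration, not the library's actual __call__: the per-step kwargs are omitted, ToyComparator and run_workflow are hypothetical, and applying format_data() to both files (passing the formatted reference as ref) is an assumption suggested by the ref=None parameter of format_data().

```python
from itertools import zip_longest
from typing import Any, List, Optional, Tuple, Union

Diff = Tuple[int, Optional[str], Optional[str]]


class ToyComparator:
    """Hypothetical line-based comparator using the BaseComparator hook names."""

    def load(self, path: str) -> List[str]:
        with open(path, encoding="utf-8") as f:
            return f.read().splitlines()

    def format_data(self, data: List[str], ref: Any = None) -> List[str]:
        # Strip trailing whitespace so it is not reported as a difference.
        return [line.rstrip() for line in data]

    def diff(self, ref: List[str], comp: List[str]) -> List[Diff]:
        # One (line_number, ref_line, comp_line) tuple per differing line.
        return [
            (i, r, c)
            for i, (r, c) in enumerate(zip_longest(ref, comp), start=1)
            if r != c
        ]

    def filter(self, differences: List[Diff]) -> List[Diff]:
        return differences  # keep every difference in this sketch

    def format_diff(self, difference: Diff) -> str:
        i, r, c = difference
        return f"line {i}: {r!r} != {c!r}"

    def sort(self, differences: List[str]) -> List[str]:
        return sorted(differences)

    def concatenate(self, differences: List[str]) -> str:
        return "\n".join(differences)

    def report(self, ref_file: str, comp_file: str, formatted_differences: Any) -> Union[bool, str]:
        # None, False or an empty collection means "not different".
        if not formatted_differences:
            return False
        return f"Differences between {ref_file} and {comp_file}:\n{formatted_differences}"


def run_workflow(comparator: ToyComparator, ref_file: str, comp_file: str,
                 return_raw_diffs: bool = False) -> Any:
    ref = comparator.format_data(comparator.load(ref_file))
    comp = comparator.format_data(comparator.load(comp_file), ref=ref)
    diffs = comparator.diff(ref, comp)
    if return_raw_diffs:
        return diffs  # raw diffs are returned before any post-processing
    if not isinstance(diffs, bool):
        diffs = comparator.filter(diffs)
        diffs = comparator.sort([comparator.format_diff(d) for d in diffs])
        diffs = comparator.concatenate(diffs)
    return comparator.report(ref_file, comp_file, diffs)
```

Note how an empty diff list flows through formatting and concatenation as an empty string, which report() then turns into False.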
- concatenate(differences, **kwargs)
Concatenate the differences.
- abstractmethod diff(ref, comp, *args, **kwargs)
Perform the comparison between the reference data and the compared data.
Note
This function must return one of the following:
- an iterable of differences between each data element (the iterable can be empty).
- a mapping of differences between each data element, in which the keys can be an element ID or a column name (the mapping can be empty).
- a boolean indicating whether the files are different (True) or not (False).
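To make the three allowed return shapes concrete, here are three hypothetical diff functions over already-loaded data; plain lists, dicts and bytes stand in for whatever load() actually returns.

```python
from itertools import zip_longest
from typing import Dict, List, Mapping, Optional, Sequence, Tuple


def diff_as_iterable(ref: Sequence[str], comp: Sequence[str]) -> List[Tuple[int, Optional[str], Optional[str]]]:
    # Iterable form: one (index, ref_item, comp_item) entry per differing element.
    return [(i, r, c) for i, (r, c) in enumerate(zip_longest(ref, comp)) if r != c]


def diff_as_mapping(
    ref: Mapping[str, float], comp: Mapping[str, float]
) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    # Mapping form: keyed by element ID or column name; empty if nothing differs.
    keys = sorted(set(ref) | set(comp))
    return {k: (ref.get(k), comp.get(k)) for k in keys if ref.get(k) != comp.get(k)}


def diff_as_bool(ref: bytes, comp: bytes) -> bool:
    # Boolean form: True means the data differ, False means they do not.
    return ref != comp
```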
- filter(differences, **kwargs)
Define a filter to remove specific elements from the result differences.
- format_data(data, ref=None, **kwargs)
Format the loaded data.
- format_diff(difference, **kwargs)
Format one element difference.
- load(path, **kwargs)
Load a file.
- report(ref_file, comp_file, formatted_differences, diff_args, diff_kwargs, load_kwargs=None, format_data_kwargs=None, filter_kwargs=None, format_diff_kwargs=None, sort_kwargs=None, concat_kwargs=None, **kwargs)
Create a report from the formatted differences.
Note
This function must return a formatted report of the differences (usually a string, but it can be any type). If the passed differences are None, False or an empty collection, the report should return False to state that the files are not different.
- save(data, path, **kwargs)
Save formatted data into a file.
- property save_capability
Check that the current class has a save() capability.
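One plausible way to implement such a check is to test whether a subclass overrides save() instead of inheriting the base stub. The classes below are hypothetical stand-ins, not the library's code.

```python
class Base:
    def save(self, data, path, **kwargs):
        raise NotImplementedError

    @property
    def save_capability(self) -> bool:
        # A subclass "can save" only if it overrides save() instead of inheriting the stub.
        return type(self).save is not Base.save


class CanSave(Base):
    def save(self, data, path, **kwargs):
        with open(path, "w", encoding="utf-8") as f:
            f.write(str(data))


class CannotSave(Base):
    pass
```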
- sort(differences, **kwargs)
Sort the element differences.
- class dir_content_diff.ComparisonConfig(include_patterns: Iterable[str] | None = None, exclude_patterns: Iterable[str] | None = None, comparators: Dict[str | None, BaseComparator | Callable] | None = None, specific_args: Dict[str, Dict[str, Any]] | None = None, return_raw_diffs: bool = False, export_formatted_files: bool | str = False, executor_type: Literal['sequential', 'thread', 'process'] = 'sequential', max_workers: int | None = None)
Bases: object
Configuration class to store comparison settings.
- Parameters:
- include_patterns
A list of regular expression patterns. If the relative path of a file does not match any of these patterns, it is ignored during the comparison. Note that this means that any specific arguments for that file will also be ignored.
- Type:
Iterable[str] | None
- exclude_patterns
A list of regular expression patterns. If the relative path of a file matches any of these patterns, it is ignored during the comparison. Note that this means that any specific arguments for that file will also be ignored.
- Type:
Iterable[str] | None
- comparators
A dict to override the registered comparators.
- Type:
Dict[str | None, dir_content_diff.base_comparators.BaseComparator | collections.abc.Callable] | None
- specific_args
A dict with the args/kwargs that should be given to the comparator for a given file. This dict should look like the following:
    {
        <relative_file_path>: {
            comparator: ComparatorInstance,
            args: [arg1, arg2, ...],
            kwargs: {
                kwarg_name_1: kwarg_value_1,
                kwarg_name_2: kwarg_value_2,
            }
        },
        <another_file_path>: {...},
        <a name for this category>: {
            "patterns": ["regex1", "regex2", ...],
            ... (other arguments)
        }
    }
If the “patterns” entry is present, the name is not considered and only serves as a label for the user. When a “patterns” entry is detected, the other arguments are applied to all files whose relative path matches one of the given regular expression patterns. If a file could match patterns from several groups, only the first matching group is considered.
Note that all entries in this dict are optional.
- return_raw_diffs
If set to True, only the raw differences are returned instead of a formatted report.
- Type:
bool
- export_formatted_files
If set to True or a non-empty string, a new directory containing the formatted compared data files is created. If a string is passed, it is used as the suffix for the new directory; if True is passed, the suffix is _FORMATTED.
- Type:
bool | str
- max_workers
Maximum number of worker threads/processes for parallel execution. If None, defaults to min(32, (os.cpu_count() or 1) + 4) as per executor default.
- Type:
int | None
- executor_type
Type of executor to use for parallel execution. ‘thread’ uses ThreadPoolExecutor (better for I/O-bound tasks), ‘process’ uses ProcessPoolExecutor (better for CPU-bound tasks), ‘sequential’ disables parallel execution.
- Type:
Literal[‘sequential’, ‘thread’, ‘process’]
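The include/exclude and specific_args rules described above can be sketched with the standard re module. This is an illustration, not the library's code: select_files and resolve_specific_args are hypothetical names, patterns are assumed to be applied with re.match (anchored at the start of the relative path), and exact-path entries are assumed to take precedence over pattern groups.

```python
import re
from typing import Any, Dict, Iterable, List, Optional


def select_files(
    relative_paths: Iterable[str],
    include_patterns: Optional[Iterable[str]] = None,
    exclude_patterns: Optional[Iterable[str]] = None,
) -> List[str]:
    includes = [re.compile(p) for p in include_patterns or []]
    excludes = [re.compile(p) for p in exclude_patterns or []]
    selected = []
    for path in relative_paths:
        # With include patterns set, a path must match at least one of them.
        if includes and not any(p.match(path) for p in includes):
            continue
        # A path matching any exclude pattern is dropped.
        if any(p.match(path) for p in excludes):
            continue
        selected.append(path)
    return selected


def resolve_specific_args(relative_path: str, specific_args: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    entry = specific_args.get(relative_path)
    # Assumption: an exact relative-path entry wins over pattern groups.
    if entry is not None and "patterns" not in entry:
        return entry
    # Otherwise the first group (in dict order) with a matching pattern wins;
    # the group's name is ignored and its "patterns" key is stripped.
    for group in specific_args.values():
        if any(re.match(p, relative_path) for p in group.get("patterns", [])):
            return {k: v for k, v in group.items() if k != "patterns"}
    return {}


# Hypothetical settings: the "tolerance" and "sep" kwargs are illustrative only.
specific_args = {
    "data/config.json": {"kwargs": {"tolerance": 1e-6}},
    "all csv files": {"patterns": [r".*\.csv$"], "kwargs": {"sep": ";"}},
}
```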
- class dir_content_diff.DefaultComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)
Bases: BaseComparator
The comparator used by default when none is registered for a given extension.
This comparator only performs a binary comparison of the files.
- diff(ref, comp, *args, **kwargs)
Compare binary data.
This function calls filecmp.cmp(); see the documentation of that function for details on args and kwargs.
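filecmp.cmp() returns True when the files are considered equal, whereas diff() must return True when they are different, so a binary comparison in this style has to negate the result. A minimal sketch with a hypothetical binary_diff helper (not the library's source):

```python
import filecmp


def binary_diff(ref_file: str, comp_file: str, shallow: bool = True) -> bool:
    """Hypothetical helper: return True when the two files differ."""
    # filecmp.cmp() returns True for files it considers equal, so negate it
    # to follow diff()'s convention (True means "different").
    return not filecmp.cmp(ref_file, comp_file, shallow=shallow)
```

With the default shallow=True, files whose os.stat() signatures (file type, size and modification time) match are reported as equal without their contents being read; pass shallow=False to always compare the contents.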
- dir_content_diff.assert_equal_trees(*args, export_formatted_files=False, **kwargs)
Raise an AssertionError if differences are found in the two directory trees.
Note
This function has a specific behavior when run with pytest. See the doc of dir_content_diff.pytest_plugin.
- Parameters:
*args – passed to the compare_trees() function.
export_formatted_files (bool or str) – If set to True, the formatted files are exported to the directory with the default suffix. If set to a string, it is used as the suffix for the new directory.
**kwargs – passed to the compare_trees() function.
- Returns:
(bool) True if the trees are equal. If they are not, an AssertionError is raised.
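The documented contract (return True when the trees are equal, raise AssertionError otherwise) can be sketched as a thin wrapper over the dict that compare_trees() returns. The helper name and message layout are hypothetical, and the pytest-specific behavior is not reproduced.

```python
from typing import Dict


def assert_no_differences(differences: Dict[str, str]) -> bool:
    """Hypothetical helper: turn a compare_trees()-style dict into a pass/fail result."""
    # An empty dict means the trees are considered equal.
    if not differences:
        return True
    details = "\n".join(f"{path}: {message}" for path, message in sorted(differences.items()))
    raise AssertionError(f"{len(differences)} file(s) differ:\n{details}")
```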
- dir_content_diff.compare_files(ref_file: str | Path, comp_file: str | Path, comparator: BaseComparator | Callable, *args, return_raw_diffs: bool = False, **kwargs) → bool | str
Compare 2 files and return the difference.
- Parameters:
comparator (BaseComparator | Callable) – The comparator to use (see register_comparator() for the comparator signature).
return_raw_diffs (bool) – If set to True, only the raw differences are returned instead of a formatted report.
*args – passed to the comparator.
**kwargs – passed to the comparator.
- Returns:
False if the files are equal, or a string with a message explaining the differences if they are different.
- Return type:
bool | str
- dir_content_diff.compare_trees(ref_path: str | Path, comp_path: str | Path, *, config: ComparisonConfig | None = None, **kwargs)
Compare all files from 2 different directory trees and return the differences.
Note
The comparison only considers the files found in the reference directory. So if there are files in the compared directory that do not exist in the reference directory, they are just ignored.
- Parameters:
comp_path (str | Path) – Path to the directory that must be compared against the reference.
config (ComparisonConfig) – A config object. If given, all other configuration parameters should be set to default values.
- Keyword Arguments:
**kwargs (dict) – Additional keyword arguments are used to build a ComparisonConfig object and will override the values of the given config argument.
- Returns:
A dict in which the keys are the relative file paths and the values are the difference messages. If the directories are considered equal, an empty dict is returned.
- Return type:
dict
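A stdlib-only sketch of the behavior described above: only files found under the reference directory are visited, extra files in the compared directory are ignored, and equal trees yield an empty dict. Unlike the real function, it applies a plain byte comparison to every file instead of picking a comparator per extension; the function name and messages are hypothetical.

```python
import filecmp
from pathlib import Path
from typing import Dict, Union


def compare_trees_sketch(ref_path: Union[str, Path], comp_path: Union[str, Path]) -> Dict[str, str]:
    ref_root, comp_root = Path(ref_path), Path(comp_path)
    differences: Dict[str, str] = {}
    # Walk the reference tree only: extra files in comp_root are never visited.
    for ref_file in sorted(p for p in ref_root.rglob("*") if p.is_file()):
        relative = ref_file.relative_to(ref_root).as_posix()
        comp_file = comp_root / relative
        if not comp_file.exists():
            differences[relative] = f"The file '{relative}' does not exist in '{comp_root}'."
        elif not filecmp.cmp(ref_file, comp_file, shallow=False):
            differences[relative] = f"The files '{ref_file}' and '{comp_file}' are different."
    return differences
```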
- dir_content_diff.export_formatted_file(file: str | Path, formatted_file: str | Path, comparator: BaseComparator | Callable, **kwargs) → None
Format a data file and export it.
Note
A new file is created only if the corresponding comparator has saving capability.
- Parameters:
comparator (BaseComparator | Callable) – The comparator to use (see register_comparator() for the comparator signature).
**kwargs – Can contain the following dictionaries: ‘load_kwargs’, ‘format_data_kwargs’ and ‘save_kwargs’.
- Return type:
None
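When export_formatted_files is enabled, each compared file needs a destination path to pass as formatted_file. The sketch below derives one, assuming the suffix is appended to the compared directory's name (the documentation only says the string is used as a suffix for the new directory); formatted_file_path is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional, Union

DEFAULT_SUFFIX = "_FORMATTED"  # suffix documented for export_formatted_files=True


def formatted_file_path(
    comp_root: Union[str, Path],
    relative_path: str,
    export_formatted_files: Union[bool, str],
) -> Optional[Path]:
    # False or an empty string disables the export.
    if export_formatted_files is True:
        suffix = DEFAULT_SUFFIX
    elif isinstance(export_formatted_files, str) and export_formatted_files:
        suffix = export_formatted_files
    else:
        return None
    root = Path(comp_root)
    # Assumption: the new directory sits next to comp_root, named <name><suffix>.
    return root.with_name(root.name + suffix) / relative_path
```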
- dir_content_diff.get_comparators()
Return a copy of the comparator registry.
- dir_content_diff.pick_comparator(comparator=None, suffix=None, comparators=None)
Pick a comparator based on its name or a file suffix.
- dir_content_diff.register_comparator(ext: str, comparator: BaseComparator | Callable, force: bool = False) → None
Add a comparator to the registry.
- Parameters:
ext (str) – The extension to register.
comparator (BaseComparator | Callable) – The comparator that should be associated with the given extension.
force (bool) – If set to True, no exception is raised if the given ext is already registered, and the existing comparator is replaced.
- Return type:
None
Note
It is possible to create and register custom comparators. The easiest way to do so is to derive a class from dir_content_diff.BaseComparator. Otherwise, the given comparator should be a callable with the following signature:
    comparator(
        ref_file: str,
        comp_file: str,
        *diff_args: Any,
        return_raw_diffs: bool = False,
        **diff_kwargs: Any,
    ) -> Union[Literal[False], str]
The return type can be Any when return_raw_diffs is True; otherwise it should be False (no difference) or a string describing the differences.
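A minimal callable following the signature above, assuming UTF-8 text files. The name text_comparator is hypothetical, diff_args are ignored, and forwarding **diff_kwargs to difflib.unified_diff() (e.g. n=1 for less context) is a design choice of this sketch. It could then be registered with something like register_comparator(".txt", text_comparator), assuming extensions are given with their leading dot.

```python
import difflib
from typing import Any, List, Literal, Union


def text_comparator(
    ref_file: str,
    comp_file: str,
    *diff_args: Any,  # accepted for signature compatibility, unused here
    return_raw_diffs: bool = False,
    **diff_kwargs: Any,
) -> Union[Literal[False], str, List[str]]:
    with open(ref_file, encoding="utf-8") as f:
        ref_lines = f.readlines()
    with open(comp_file, encoding="utf-8") as f:
        comp_lines = f.readlines()
    # unified_diff() yields nothing at all when the two sequences are equal.
    raw = list(
        difflib.unified_diff(ref_lines, comp_lines, fromfile=ref_file, tofile=comp_file, **diff_kwargs)
    )
    if return_raw_diffs:
        return raw
    return "".join(raw) if raw else False
```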
- dir_content_diff.reset_comparators()
Reset the comparator registry to the default values.
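The registry functions above (register, copy, reset, pick by suffix) can be illustrated with a module-level dict. Everything here is hypothetical: strings stand in for comparator objects, "binary-comparator" plays the role of DefaultComparator as the fallback, the ValueError type is an assumption (the documentation only says an exception is raised), and the name-based lookup of pick_comparator() is omitted.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-ins: a real registry maps extensions to comparator objects.
_DEFAULT_REGISTRY: Dict[str, Any] = {".json": "json-comparator", ".yaml": "yaml-comparator"}
_FALLBACK = "binary-comparator"  # plays the role of DefaultComparator
_registry: Dict[str, Any] = dict(_DEFAULT_REGISTRY)


def register(ext: str, comparator: Any, force: bool = False) -> None:
    # Refuse to overwrite an existing entry unless force=True.
    if ext in _registry and not force:
        raise ValueError(f"The '{ext}' extension is already registered")
    _registry[ext] = comparator


def get_registry_copy() -> Dict[str, Any]:
    # Return a copy so callers cannot mutate the registry by accident.
    return dict(_registry)


def reset_registry() -> None:
    _registry.clear()
    _registry.update(_DEFAULT_REGISTRY)


def pick(suffix: Optional[str] = None, comparators: Optional[Dict[str, Any]] = None) -> Any:
    # Use the given mapping if any, else the registry; fall back for unknown suffixes.
    table = _registry if comparators is None else comparators
    return table.get(suffix, _FALLBACK)
```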