dir_content_diff.base_comparators

Module containing the base comparators.

Classes

BaseComparator([default_load_kwargs, ...])

Base Comparator class.

DefaultComparator([default_load_kwargs, ...])

The comparator used by default when none is registered for a given extension.

DictComparator(*args, **kwargs)

Comparator for dictionaries.

IniComparator(*args, **kwargs)

Comparator for INI files.

JsonComparator(*args, **kwargs)

Comparator for JSON files.

PdfComparator([default_load_kwargs, ...])

Comparator for PDF files.

XmlComparator(*args, **kwargs)

Comparator for XML files.

YamlComparator(*args, **kwargs)

Comparator for YAML files.

class dir_content_diff.base_comparators.BaseComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)

Bases: ABC

Base Comparator class.

__call__(ref_file, comp_file, *diff_args, return_raw_diffs=False, load_kwargs=None, format_data_kwargs=None, filter_kwargs=None, format_diff_kwargs=None, sort_kwargs=None, concat_kwargs=None, report_kwargs=None, **diff_kwargs)

Perform the comparison between the reference file and the compared file.

Note

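The workflow is the following: both files are loaded with load(), the loaded data are formatted with format_data(), the differences are computed with diff(), filtered with filter(), formatted one by one with format_diff(), sorted with sort(), concatenated with concatenate(), and finally passed to report().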
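A staged workflow of this kind can be sketched with a toy class. This is only an illustration: ToyLineComparator is hypothetical, it does not inherit from BaseComparator, and the stage order is inferred from the method names and from the order of the keyword arguments of __call__().

```python
import tempfile
from pathlib import Path


class ToyLineComparator:
    """Hypothetical stand-in that mimics the stage names of BaseComparator."""

    def load(self, path):
        return Path(path).read_text().splitlines()

    def format_data(self, data, ref=None):
        # Ignore trailing whitespace.
        return [line.rstrip() for line in data]

    def diff(self, ref, comp):
        # Iterable of (line_number, ref_line, comp_line) for differing lines.
        # zip() stops at the shortest file, which is enough for this sketch.
        return [
            (i, r, c)
            for i, (r, c) in enumerate(zip(ref, comp), start=1)
            if r != c
        ]

    def filter(self, differences):
        # Drop differences located on comment lines.
        return [d for d in differences if not d[1].startswith("#")]

    def format_diff(self, difference):
        line_no, ref_line, comp_line = difference
        return f"line {line_no}: {ref_line!r} != {comp_line!r}"

    def sort(self, differences):
        return sorted(differences)

    def concatenate(self, differences):
        return "\n".join(differences)

    def report(self, ref_file, comp_file, formatted_differences):
        if not formatted_differences:
            return False
        return f"{ref_file} and {comp_file} differ:\n{formatted_differences}"

    def __call__(self, ref_file, comp_file):
        ref = self.format_data(self.load(ref_file))
        comp = self.format_data(self.load(comp_file), ref=ref)
        differences = self.filter(self.diff(ref, comp))
        formatted = self.sort([self.format_diff(d) for d in differences])
        return self.report(ref_file, comp_file, self.concatenate(formatted))


with tempfile.TemporaryDirectory() as tmp:
    ref_path = Path(tmp) / "ref.txt"
    comp_path = Path(tmp) / "comp.txt"
    ref_path.write_text("a = 1\nb = 2\n")
    comp_path.write_text("a = 1\nb = 3\n")
    result = ToyLineComparator()(ref_path, comp_path)
    same = ToyLineComparator()(ref_path, ref_path)
```

Here result is a report string mentioning line 2, while same is False because no difference survives the pipeline.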

concatenate(differences, **kwargs)

Concatenate the differences.

abstract diff(ref, comp, *args, **kwargs)

Perform the comparison between the reference data and the compared data.

Note

This function must return one of the following:

  • an iterable of differences between each data element (the iterable can be empty).

  • a mapping of differences between each data element in which the keys can be an element ID or a column name (the mapping can be empty).

  • a boolean indicating whether the files are different (True) or not (False).
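The three accepted return shapes can be illustrated with plain functions. The function names below are hypothetical and exist only for this sketch; a real comparator would implement a single diff() method returning one of these shapes.

```python
def diff_as_iterable(ref, comp):
    """Return one entry per differing element (possibly an empty list)."""
    return [f"{r!r} != {c!r}" for r, c in zip(ref, comp) if r != c]


def diff_as_mapping(ref, comp):
    """Return differences keyed by element ID (possibly an empty dict)."""
    shared_keys = ref.keys() & comp.keys()
    return {
        key: (ref[key], comp[key])
        for key in shared_keys
        if ref[key] != comp[key]
    }


def diff_as_boolean(ref, comp):
    """Return True when the data are different, False otherwise."""
    return ref != comp


iterable_result = diff_as_iterable([1, 2, 3], [1, 5, 3])
mapping_result = diff_as_mapping({"a": 1, "b": 2}, {"a": 1, "b": 3})
boolean_result = diff_as_boolean(b"same bytes", b"same bytes")
```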

filter(differences, **kwargs)

Define a filter to remove specific elements from the result differences.

format_data(data, ref=None, **kwargs)

Format the loaded data.

format_diff(difference, **kwargs)

Format one element difference.

load(path, **kwargs)

Load a file.

report(ref_file, comp_file, formatted_differences, diff_args, diff_kwargs, load_kwargs=None, format_data_kwargs=None, filter_kwargs=None, format_diff_kwargs=None, sort_kwargs=None, concat_kwargs=None, **kwargs)

Create a report from the formatted differences.

Note

This function must return a formatted report of the differences (usually as a string but it can be any type). If the passed differences are None, False or an empty collection, the report should return False to state that the files are not different.
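The return convention described in this note can be sketched as follows. The function make_report() and the message text are assumptions made for illustration, not the library's implementation.

```python
def make_report(ref_file, comp_file, formatted_differences):
    """Hypothetical report step: False means the files are not different."""
    if formatted_differences is None or formatted_differences is False:
        return False
    # Empty strings, lists, dicts, etc. also mean "no difference".
    if hasattr(formatted_differences, "__len__") and len(formatted_differences) == 0:
        return False
    return (
        f"The files '{ref_file}' and '{comp_file}' are different:\n"
        f"{formatted_differences}"
    )


result = make_report("ref.json", "comp.json", "value of 'a' changed from 1 to 2")
if result is not False:
    print(result)
```

Testing the result with "is not False" rather than plain truthiness keeps the check valid when the report is an object of another type.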

save(data, path, **kwargs)

Save formatted data into a file.

property save_capability

Check that the current class has a save() capability.

sort(differences, **kwargs)

Sort the element differences.

class dir_content_diff.base_comparators.DefaultComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)

Bases: BaseComparator

The comparator used by default when none is registered for a given extension.

This comparator only performs a binary comparison of the files.

diff(ref, comp, *args, **kwargs)

Compare binary data.

This function calls filecmp.cmp(); see the documentation of that function for details on args and kwargs.
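For reference, here is how filecmp.cmp() from the standard library behaves on its own. Note that filecmp.cmp() returns True when the files are equal, so a diff() following the convention above (True meaning "different") presumably negates the result; that negation is an assumption of this sketch.

```python
import filecmp
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "ref.bin"
    same = Path(tmp) / "same.bin"
    other = Path(tmp) / "other.bin"
    ref.write_bytes(b"\x00\x01\x02")
    same.write_bytes(b"\x00\x01\x02")
    other.write_bytes(b"\x00\x01\xff")

    # shallow=False forces a comparison of the file contents instead of
    # relying only on the os.stat() signatures (type, size, mtime).
    identical = filecmp.cmp(ref, same, shallow=False)
    different = not filecmp.cmp(ref, other, shallow=False)
```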

class dir_content_diff.base_comparators.DictComparator(*args, **kwargs)

Bases: BaseComparator

Comparator for dictionaries.

diff(ref, comp, *args, **kwargs)

Compare 2 dictionaries.

This function calls dictdiffer.diff() to compare the dictionaries; see the documentation of that function for details on args and kwargs.

Keyword Arguments:
  • tolerance (float) – Relative threshold to consider when comparing two float numbers.

  • absolute_tolerance (float) – Absolute threshold to consider when comparing two float numbers.

  • ignore (set[list]) – Set of keys that should not be checked.

  • path_limit (list[str]) – List of path limit tuples or dictdiffer.utils.PathLimit object to limit the diff recursion depth.
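The interplay between a relative and an absolute threshold can be sketched with math.isclose() from the standard library. This only illustrates the idea: floats_differ() is a hypothetical helper, and the exact rule applied by dictdiffer may differ.

```python
import math


def floats_differ(first, second, tolerance=1e-9, absolute_tolerance=0.0):
    """Return True when two floats differ beyond both thresholds.

    tolerance is relative to the larger magnitude of the two values,
    absolute_tolerance is a fixed bound that matters near zero.
    """
    return not math.isclose(
        first, second, rel_tol=tolerance, abs_tol=absolute_tolerance
    )


within_relative = floats_differ(1.0, 1.0005, tolerance=1e-3)
beyond_relative = floats_differ(1.0, 1.01, tolerance=1e-3)
# Near zero, a relative threshold alone is almost never satisfied.
near_zero = floats_differ(0.0, 1e-12, tolerance=1e-3)
near_zero_abs = floats_differ(0.0, 1e-12, tolerance=1e-3, absolute_tolerance=1e-9)
```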

format_data(data, ref=None, replace_pattern=None, **kwargs)

Format the loaded data.

format_diff(difference)

Format one element difference.

class dir_content_diff.base_comparators.IniComparator(*args, **kwargs)

Bases: DictComparator

Comparator for INI files.

This comparator is based on the DictComparator and uses the same parameters.

Note

The load_kwargs are passed to the configparser.ConfigParser.

static configparser_to_dict(config)

Transform a ConfigParser object into a dict.

static dict_to_configparser(data, **kwargs)

Transform a dict object into a ConfigParser.
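Both conversions can be approximated with the standard configparser module. The helpers to_dict() and to_configparser() below are hypothetical equivalents written for this sketch; in particular, values stay plain strings here, whereas the library's own methods may cast them.

```python
import configparser


def to_dict(config):
    """Convert a ConfigParser into a dict of sections (values are strings)."""
    return {section: dict(config[section]) for section in config.sections()}


def to_configparser(data, **kwargs):
    """Build a ConfigParser from a dict; kwargs go to ConfigParser()."""
    config = configparser.ConfigParser(**kwargs)
    config.read_dict(data)
    return config


config = configparser.ConfigParser()
config.read_string("[section1]\nattr1 = val1\nattr2 = 1\n")

as_dict = to_dict(config)
round_trip = to_dict(to_configparser(as_dict))
```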

load(path, **kwargs)

Open an INI file.

save(data, path)

Save formatted data into an INI file.

class dir_content_diff.base_comparators.JsonComparator(*args, **kwargs)

Bases: DictComparator

Comparator for JSON files.

This comparator is based on the DictComparator and uses the same parameters.

load(path)

Open a JSON file.

save(data, path)

Save formatted data into a JSON file.

class dir_content_diff.base_comparators.PdfComparator(default_load_kwargs=None, default_format_data_kwargs=None, default_diff_kwargs=None, default_filter_kwargs=None, default_format_diff_kwargs=None, default_sort_kwargs=None, default_concat_kwargs=None, default_report_kwargs=None, default_save_kwargs=None)

Bases: BaseComparator

Comparator for PDF files.

diff(ref, comp, *args, **kwargs)

Compare data from two PDF files.

This function calls the diff_pdf_visually.pdf_similar() function; see the documentation of that function for details on args and kwargs. It compares the visual aspects of the PDF files and ignores invisible content (e.g. the file header, or white text on a white background). The PDF files are converted into images using ImageMagick and these images are then compared.

Keyword Arguments:
  • threshold (int) – The threshold used to compare the images.

  • tempdir (pathlib.Path) – Empty directory where the temporary images will be exported.

  • dpi (int) – The resolution used to convert the PDF files into images.

  • verbosity (int) – The log verbosity.

  • max_report_pagenos (int) – Only this number of the different pages will be logged (only used if the verbosity is greater than 1).

  • num_threads (int) – If set to 2 (the default), the image conversions are processed in parallel. If set to 1, they are processed sequentially.

class dir_content_diff.base_comparators.XmlComparator(*args, **kwargs)

Bases: DictComparator

Comparator for XML files.

This comparator is based on the DictComparator and uses the same parameters.

Warning

The XML files must have only one root.

Note

If type attributes are given in the XML file, the values are automatically cast to Python types. For lists, each item must be in a separate entry. Here is an example of such XML data:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
    <int_value type="int">1</int_value>
    <simple_list type="list">
        <item type="int">1</item>
        <item type="float">2.5</item>
        <item type="str">str_val</item>
    </simple_list>
</root>

static add_to_output(obj, child)

Add entry from xml.etree.ElementTree.Element object into the given object.

load(path)

Open an XML file.

save(data, path)

Save formatted data into an XML file.

static xmltodict(obj)

Convert an XML string into a Python object based on each tag’s attribute.
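The attribute-driven casting described for this class can be sketched with xml.etree.ElementTree. The helper element_to_python() and the CASTERS table are assumptions made for illustration and are not the library's implementation; the XML data mirrors the example given in the note above, without the XML declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the "type" attribute to a Python caster.
CASTERS = {
    "int": int,
    "float": float,
    "str": str,
    "bool": lambda text: text.strip().lower() == "true",
}


def element_to_python(element):
    """Convert an Element into a Python object based on its type attribute."""
    type_name = element.get("type")
    if type_name == "list":
        # Each child is one item of the list.
        return [element_to_python(child) for child in element]
    if len(element):
        # Untyped element with children: build a dict keyed by tag.
        return {child.tag: element_to_python(child) for child in element}
    caster = CASTERS.get(type_name, str)
    return caster(element.text or "")


XML_DATA = """<root>
    <int_value type="int">1</int_value>
    <simple_list type="list">
        <item type="int">1</item>
        <item type="float">2.5</item>
        <item type="str">str_val</item>
    </simple_list>
</root>"""

root = ET.fromstring(XML_DATA)
parsed = {root.tag: element_to_python(root)}
```

With this input, parsed is a nested dict with a single "root" key, which matches the requirement that the XML file has only one root.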

class dir_content_diff.base_comparators.YamlComparator(*args, **kwargs)

Bases: DictComparator

Comparator for YAML files.

This comparator is based on the DictComparator and uses the same parameters.

load(path)

Open a YAML file.

save(data, path)

Save formatted data into a YAML file.