core.engine
===========

.. automodule:: core.engine

    .. autoclass:: Match

    .. autoclass:: Group
        :members:

    .. autofunction:: build_word_dict
    .. autofunction:: compare
    .. autofunction:: compare_fields
    .. autofunction:: getmatches
    .. autofunction:: getmatches_by_contents
    .. autofunction:: get_groups
    .. autofunction:: merge_similar_words
    .. autofunction:: reduce_common_words

.. _fields:

Fields
------

Fields are groups of words, each representing a significant part of the whole name. This concept
matters most for music file names, which often look like "My Artist - a very long title with many
many words".
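
The sketch below is a minimal illustration that assumes fields are separated by "-" and words by whitespace. It represents a name as a list of fields, where each field is a list of lowercase words. The ``split_fields`` helper is hypothetical and is not part of ``core.engine``:

.. code-block:: python

    def split_fields(name):
        """Hypothetical helper: split a name into fields of lowercase words."""
        # Assumption: every "-" starts a new field; whitespace separates words.
        fields = [field.split() for field in name.lower().split("-")]
        # Drop fields left empty, e.g. by a trailing "-".
        return [words for words in fields if words]


    fields = split_fields("My Artist - a very long title with many many words")
    print(fields)
    # [['my', 'artist'], ['a', 'very', 'long', 'title', 'with', 'many', 'many', 'words']]
    print(sum(len(words) for words in fields))  # 10 words in total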

This name has 10 words. If you run a scan with some tolerance, say a 90% threshold, you'll still
find a duplicate whose song title contains only one "many". However, you would also get false
duplicates from a name like "My Giraffe - a very long title with many many words", which is of
course a very different song and should not be matched.
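
The percentages above can be reproduced with a simple word-count score. The sketch below assumes that a score is twice the number of shared words divided by the total number of words in both names, with each word matched at most once. ``word_score`` is a hypothetical helper, and the real ``compare`` function may apply additional rules:

.. code-block:: python

    def word_score(first, second):
        """Hypothetical word-matching score in the range 0..100 (simplified sketch)."""
        if not (first and second):
            return 0
        remaining = list(second)  # copy, so each word of second matches at most once
        matched = 0
        for word in first:
            if word in remaining:
                remaining.remove(word)
                matched += 1
        return round(matched * 2 / (len(first) + len(second)) * 100)


    original = "my artist a very long title with many many words".split()
    one_many = "my artist a very long title with many words".split()
    giraffe = "my giraffe a very long title with many many words".split()

    print(word_score(original, one_many))  # 95: 9 shared words out of 10 + 9
    print(word_score(original, giraffe))   # 90: 9 shared words out of 10 + 10

Both scores reach a 90% threshold, which is why the "Giraffe" name shows up as a false duplicate
when names are compared as a single list of words.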

When matching by fields, each field (separated by "-") is matched as a separate string,
independently of the others. Once all fields are matched, the lowest result is kept. In the
"Giraffe" example above, the artist fields "My Artist" and "My Giraffe" share only one of their two
words, so the result would be 50% instead of the 90% obtained in normal mode.
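
Field matching can be sketched on top of the same kind of word-count score. In this simplified sketch, the hypothetical ``field_score`` compares fields pairwise by position and keeps the lowest result. Treating names with a different number of fields as non-matching is also an assumption here, and the real ``compare_fields`` function may behave differently:

.. code-block:: python

    def word_score(first, second):
        """Hypothetical word-count score in the range 0..100 (simplified sketch)."""
        if not (first and second):
            return 0
        remaining = list(second)  # each word of second matches at most once
        matched = 0
        for word in first:
            if word in remaining:
                remaining.remove(word)
                matched += 1
        return round(matched * 2 / (len(first) + len(second)) * 100)


    def field_score(first_fields, second_fields):
        """Hypothetical field score: compare fields pairwise, keep the lowest."""
        if len(first_fields) != len(second_fields):
            return 0  # assumption: different field counts never match
        scores = [word_score(a, b) for a, b in zip(first_fields, second_fields)]
        return min(scores) if scores else 0


    title = ["a", "very", "long", "title", "with", "many", "many", "words"]
    original = [["my", "artist"], title]
    giraffe = [["my", "giraffe"], title]

    print(field_score(original, giraffe))  # 50: the artist fields share 1 of 2 words

Keeping the minimum means that a single badly matching field, here the artist, is enough to
reject the pair, no matter how well the long title field matches.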