mirror of https://github.com/arsenetar/dupeguru.git synced 2026-01-25 16:11:39 +00:00

Compare commits

4.0.4 ... 4.2.1 (379 commits)

Author SHA1 Message Date
c408873d20 Update changelog 2022-03-25 23:37:46 -05:00
bbcdfbf698 Add vscode extension recommendation 2022-03-21 22:27:16 -05:00
8cee1a9467 Fix internal links in CONTRIBUTING.md 2022-03-21 22:19:58 -05:00
448d33dcb6 Add workflow yml validation settings
- Add yml validation to project for vscode
- Allow .vscode/settings.json
- Apply formatting to workflow files
2022-03-21 22:18:22 -05:00
8d414cadac Add initial partial CONTRIBUTING.md
- Adopt a CONTRIBUTING.md format similar to that used by atom/atom.
- Add label section as replacement to wiki
- Add style guide section
- Setup basic document structure

TODO:
- Migrate some existing wiki information here where applicable.
- Migrate some existing help information here.
- Finish up remaining sections.
2022-03-21 22:04:45 -05:00
f902ee889a Add configuration for isort to pyproject.toml 2022-03-21 00:25:36 -05:00
bc89e71935 Update .gitignore
- Pull from github/gitignore to cover some things better
- Organize remaining items
- Remove a few no longer relevant items
2022-03-20 23:25:01 -05:00
17b83c8001 Move polib to setup_requires instead of install_requires 2022-03-20 22:48:03 -05:00
0f845ee67a Update min python version in Makefile 2022-03-20 01:23:01 -05:00
d40e32a143 Update transifex config & pull latest updates
- Update transifex configuration to new format
- Pull translation updates
2022-03-19 20:21:14 -05:00
1bc206e62d Bump version to 4.2.1 2022-03-19 19:02:41 -05:00
106a0feaba Add sponsor information 2022-03-19 17:46:12 -05:00
984e0c4094 Fix help path for local files and some help doc updates 2022-03-19 17:43:11 -05:00
9321e811d7 Enforce minimum Windows version ref #983 2022-03-19 17:01:54 -05:00
a64fcbfb5c Fix deprecation warning from sqlite 2022-03-19 17:01:53 -05:00
cff07a12d6 Black formatter changes 2022-03-19 17:01:53 -05:00
Alfonso Montero
b9c7832c4a Apply @arsenetar's proposed change to fix for errors on window change event. Solves #937. (#980) 2022-03-15 20:47:48 -05:00
b9dfeac2f3 Drop Python 3.6 Support 2022-03-15 05:10:41 -05:00
efc99eee96 Merge pull request #978 from glubsy/fix_zoom_scrollbar
Fix image viewer scrollbar zoom
2022-03-14 20:43:40 -05:00
glubsy
ff7733bb73 Fix image viewer
When zooming in or out, the value computed might be a float instead
of an int, which is what the QScrollBar expect for its setValue method.
Simply casting to int should be enough here.
2022-03-12 22:36:17 +01:00
4b2fbe87ea Default to English on unsupported system language, fix #976
- Add check for supported language to system locale detection
- Fall back to English when the system locale is not supported
2022-03-12 04:36:13 -06:00
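
A minimal sketch of that fallback, assuming a hypothetical SUPPORTED_LANGUAGES list:

    import locale

    SUPPORTED_LANGUAGES = ["en", "fr", "de", "ms"]  # illustrative list, not the real one

    def detect_language():
        # getdefaultlocale() returns e.g. ("ja_JP", "UTF-8"); keep the language part
        lang = (locale.getdefaultlocale()[0] or "en").split("_")[0]
        # Fall back to English when the detected language is not supported
        return lang if lang in SUPPORTED_LANGUAGES else "en"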
9e4b41feb5 Fix BASE_PATH for frozen macOS app 2022-03-09 06:50:41 -06:00
cbfa8720f1 Update imports for objc module 2022-03-09 05:01:12 -06:00
a02c5e5b9b Add built modules as artifacts 2022-03-04 01:14:01 -06:00
35e6ffd6af Fix macOS packaging issue 2022-02-09 22:33:41 -06:00
e957f840da Fix python version check in makefile, close #971 2022-02-09 21:59:35 -06:00
85e22089bd Black formatting changes 2022-02-09 21:49:51 -06:00
b7d68b4458 Update debian control template depends 2022-02-09 21:45:45 -06:00
8f440603ee Add Python 3.10 to tox.ini 2022-01-25 10:39:52 -06:00
5d8e559ca3 Fix issue introduced in fix for #900 2022-01-25 10:39:08 -06:00
2c11eecf97 Update version and changelog to 4.2.0 2022-01-24 22:28:40 -06:00
02803f738b Update translation files including Malay 2022-01-24 21:05:33 -06:00
db27e6a645 Add Malay to language selection 2022-01-24 21:02:57 -06:00
c9c35cc60d Add translation source file for dark style change. 2022-01-24 19:33:42 -06:00
880205dbc8 Fix python 3.10 in default action 2022-01-24 19:30:42 -06:00
6456e64328 Update python versions for CI/CD
- Update python versions for Default action
- Set python versions for sonarcloud
2022-01-24 19:27:29 -06:00
f6a0c0cc6d Add initial dark style for use in Windows
- Other platforms can achieve this with the OS theme so not enabled for them at this time.
- Adds preference in display options to use dark style, default is false.
2022-01-24 19:14:30 -06:00
eb57d269fc Update translation source files 2021-11-23 21:11:30 -06:00
34f41dc522 Merge pull request #942 from Dobatymo/hash-cache
Implement hash cache for md5 hash based on sqlite
2021-11-23 21:08:22 -06:00
Dobatymo
77460045c4 clean up abstraction 2021-10-29 15:24:47 +08:00
Dobatymo
9753afba74 change FilesDB to singleton class
move hash calculation back into the Files class
clear cache now clears hash cache in addition to picture cache
2021-10-29 15:12:40 +08:00
Dobatymo
1ea108fc2b changed cache filename 2021-10-29 15:12:40 +08:00
Dobatymo
2f02a6010d implement hash cache for md5 hash based on sqlite 2021-10-29 15:12:40 +08:00
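
A minimal sketch of an sqlite-backed md5 cache of the kind described (class and table names hypothetical):

    import hashlib
    import sqlite3

    class HashCache:
        """Look up a file's md5 by path; compute and store it on a cache miss."""
        def __init__(self, db_path=":memory:"):
            self.conn = sqlite3.connect(db_path)
            self.conn.execute("CREATE TABLE IF NOT EXISTS hashes (path TEXT PRIMARY KEY, md5 BLOB)")

        def get_md5(self, path):
            row = self.conn.execute("SELECT md5 FROM hashes WHERE path = ?", (path,)).fetchone()
            if row is not None:
                return row[0]
            with open(path, "rb") as fp:
                digest = hashlib.md5(fp.read()).digest()
            self.conn.execute("INSERT OR REPLACE INTO hashes (path, md5) VALUES (?, ?)", (path, digest))
            self.conn.commit()
            return digest

A real cache would also record size and mtime so stale entries can be invalidated.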
b80489fd66 Update translation source files 2021-09-15 20:15:09 -05:00
1d60e124ee Update invoke_custom_command to run for all selected items 2021-09-02 20:48:25 -05:00
e22d7d2fc9 Remove filtering of 0 size files in engine
File size can already be filtered at a higher level, and some users
may want to see zero-length files. Fix #321.
2021-08-28 18:16:22 -05:00
0a0694e095 Expand fix for #630 to fix #551 2021-08-28 17:29:25 -05:00
3da9d5d869 Update documentation files, add multi-language doc build
- Update links in documentation and fix some errors
- Remove non-existent page
- Update build to build all languages with --alldoc flag
- Fix one minor debugging change introduced in package.py
2021-08-28 17:07:18 -05:00
78fb052d77 Add more progress details to getmatches, ref #700 2021-08-28 04:58:22 -05:00
9805cba10d Use different message for direct delete success, close #904 2021-08-28 04:27:34 -05:00
4c3dfe2f1f Provide more feedback during scans
- Add output for number of collected files / folders
- Update to allow indeterminate progress bar
- Remove unused hscommon\jobprogress\qt.py
2021-08-28 04:05:07 -05:00
b0baa5bfd6 Add windows position handling at open, fix #653
- Move offscreen windows back on screen
- Restore maximized state without impacting restored size
- Fullscreen comes back on primary screen, needs further work to support
  restore on other screens
2021-08-27 23:26:19 -05:00
22996ee914 Merge pull request #935 from chchia/master
resize preference dialog file size box
2021-08-27 21:57:03 -05:00
chchia
31ec9c667f resize preference dialog file size box 2021-08-28 10:28:06 +08:00
3045361243 Add preference to ignore large files, close #430 2021-08-27 05:35:54 -05:00
809116c764 Fix CodeQL Alerts
- Cast int to Py_ssize_t for multiplication
2021-08-26 03:43:31 -05:00
83f401595d Minor Updates
- Cleanup extension modules in setup.py to use correct namespaces
- Update build.py to leverage setup.py for modules
- Roll mutagen required version back to 1.44.0 to support more distros
- Change build.py and sphinxgen.py to use pathlib
- Remove hsaudiotag from package list for debian and arch
2021-08-26 03:29:24 -05:00
814d145366 Updates to setup files
- Include additional non-python files in MANIFEST.in (package_data in
  setup.cfg was not including the files)
- Update requirements in setup.cfg
2021-08-25 04:10:38 -05:00
efb76c7686 Add OS and Python Information to error dialog 2021-08-25 02:05:18 -05:00
47dbe805bb More cleanup and fixed a flake8 build issue 2021-08-25 01:11:24 -05:00
f11fccc889 More cleanups
- Cleanup columns.py and tables
- Other misc cleanups
- Remove text_field.py from qtlib as it is not used
- Remove unused variables from image_viewer method
2021-08-25 00:46:33 -05:00
2e13c4ccb5 Update internationalization files 2021-08-24 03:54:54 -05:00
da72ffd1fd Add ability to use non-native dialog for directories
- Add preference for native dialogs
- Add non-native directory selection to allow selecting multiple folders
  fixes #874 when using non-native.
2021-08-24 03:52:43 -05:00
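
A sketch of the non-native multi-folder selection idea (a common Qt recipe; dupeGuru's dialog code may differ):

    from PyQt5.QtWidgets import QAbstractItemView, QFileDialog, QListView, QTreeView

    def select_directories(parent=None):
        dialog = QFileDialog(parent)
        dialog.setFileMode(QFileDialog.Directory)
        # The widget-based dialog exposes its item views, unlike the native one
        dialog.setOption(QFileDialog.DontUseNativeDialog, True)
        for view in dialog.findChildren(QListView) + dialog.findChildren(QTreeView):
            view.setSelectionMode(QAbstractItemView.ExtendedSelection)
        return dialog.selectedFiles() if dialog.exec_() else []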
2c9437bef4 Fix #897 2021-08-24 03:13:03 -05:00
f9085386a6 First pass code cleanup in qt/qtlib 2021-08-24 00:12:23 -05:00
d576a7043c Code cleanups in core and other affected files 2021-08-21 18:02:02 -05:00
1ef5f56158 Code cleanups in hscommon & external effects 2021-08-21 16:56:27 -05:00
f9316de244 Code cleanups in hscommon\tests 2021-08-21 16:25:33 -05:00
0189c29f47 Misc cleanups in core/tests 2021-08-21 03:52:09 -05:00
b4fa1d68f0 Add check for python version to build.py, close #589 2021-08-20 23:49:20 -05:00
16df882481 Update requirements.txt for previous change 2021-08-19 00:17:46 -05:00
58c04ff9ad Switch from hsaudiotag to mutagen, close #440
- This opens up the ability to support more tags and audio information
- Also makes progress on #333
2021-08-19 00:14:26 -05:00
6b8f85e39a Reveal in Explorer / Finder, close #895 2021-08-18 20:51:45 -05:00
2fff1a3436 Add ability to load results at start, closes #902
- Add ability to load a .dupeguru file at start by passing it as the first argument
- Add file association to the .dupeguru file in Windows at install
2021-08-18 19:24:14 -05:00
a685524dd5 Add files for more standardized build tools
- Prior investigation into linux packaging (not using pyinstaller) suggested
having setuptools files could make packaging easier and automatable
- Add setup.cfg and setup.py as initial starting point
- Add MANIFEST.in (at least temporarily)

Currently, with the python build module this almost works for the main application.
It does not include all the extra data files right now.
2021-08-18 04:12:38 -05:00
74918e2c56 Attempt to fix apt-get failure 2021-08-18 03:07:47 -05:00
18895d983b Fix syntax error in codeql-analysis.yml 2021-08-18 03:04:44 -05:00
fe720208ea Add minimum custom build for codeql cpp 2021-08-18 02:49:20 -05:00
091d9e9239 Create codeql-analysis.yml
Test out codeql
2021-08-18 02:33:40 -05:00
5a4958cff9 Update translation .pot files 2021-08-17 21:18:47 -05:00
be10b462fc Add portable mode
If settings.ini is present next to the executable, dupeGuru will run in portable mode.
This results in settings, data, and cache all being in the same folder as dupeGuru.
2021-08-17 21:12:32 -05:00
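
A minimal sketch of that check, assuming the frozen executable's folder is where settings.ini would live:

    import os
    import sys

    def portable_mode_dir():
        """Return the folder to use for settings/data/cache when portable, else None."""
        exe_dir = os.path.dirname(sys.executable)
        ini_path = os.path.join(exe_dir, "settings.ini")
        return exe_dir if os.path.isfile(ini_path) else None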
d62b13bcdb Removing travis
- All CI is now covered by Github Actions
- Remove .travis.yml
- Remove tox-travis in requirements-extra.txt
2021-08-17 18:16:20 -05:00
06eca11f0b Remove format check from lint job 2021-08-17 00:52:14 -05:00
2879f18e0d Run linting and formatting check in parallel before test 2021-08-17 00:50:41 -05:00
3ee21771f9 Fix workflow file format 2021-08-17 00:33:54 -05:00
c0ba6fb57a Test out github actions
Add a workflow to test
2021-08-17 00:31:15 -05:00
bc942b8263 Add black format check to tox runs 2021-08-15 04:10:46 -05:00
ffe6b7047c Format all files with black correcting line length 2021-08-15 04:10:18 -05:00
9446f37fad Remove flake8 E731 Errors
Note: black formatting is now applying correctly as well.
2021-08-15 03:53:43 -05:00
af19660c18 Update flake8 and black configuration
- Update black to now use 120 lines
- Update flake8 to use recommended settings for black integration
2021-08-15 03:32:31 -05:00
99ad297906 Change preferences to use spinboxes where applicable
- Change LineEdit to Spinbox for minimum file size 0-1,000,000KB
- Change LineEdit to Spinbox for big file size 0-1,000,000MB
2021-08-15 02:11:42 -05:00
e11f996dfc Merge pull request #908 from glubsy/hash_sample_optimization
Hash sample optimization
2021-08-13 23:41:17 -05:00
glubsy
e95306e58f Fix flake 8 2021-08-14 02:52:00 +02:00
glubsy
891a875990 Cache constant expression
Perhaps the python byte code is already optimized, but just in case it is not, keep the constant expression pre-computed.
2021-08-13 21:33:21 +02:00
glubsy
545a5a75fb Fix for older python versions
The "walrus" operator is only available in python 3.8 and later. Fall back to more traditional notation.
2021-08-13 20:56:33 +02:00
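
The two notations side by side, in a self-contained sketch (CHUNK_SIZE is illustrative):

    import hashlib

    CHUNK_SIZE = 1024 * 1024  # illustrative chunk size

    def file_md5(path):
        md5 = hashlib.md5()
        with open(path, "rb") as fp:
            # Python 3.8+ walrus form:  while chunk := fp.read(CHUNK_SIZE): md5.update(chunk)
            # Pre-3.8 equivalent used as the fallback:
            while True:
                chunk = fp.read(CHUNK_SIZE)
                if not chunk:
                    break
                md5.update(chunk)
        return md5.digest()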
glubsy
7b764f183e Avoid partially hashing small files
Computing 3 hash samples for files less than 3MiB (3 * CHUNK_SIZE) is not efficient since spans of later samples would overlap a previous one.
Therefore we can simply return the hash of the entire small file instead.
2021-08-13 20:47:01 +02:00
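
A sketch of the size guard described above (CHUNK_SIZE is illustrative); the sampled branch itself is sketched further down, under the 2021-06-21 commit:

    import os

    CHUNK_SIZE = 1024 * 1024  # 1 MiB, so 3 * CHUNK_SIZE matches the 3 MiB mentioned above

    def samples_would_overlap(path):
        # Below three chunks the spans of later samples overlap earlier ones,
        # so partial hashing brings no benefit; hash the whole file instead
        return os.path.getsize(path) < 3 * CHUNK_SIZE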
fdc8a17d26 Update .travis.yml
- Windows test uses 3.9.6 now
- Indentation changes
2021-08-07 19:35:57 -05:00
cb3bbbec6e Upgrade Requirement Minimums
- Upgrade requirements to specify more current minimums
- Remove compatibility code from sphinxgen for old versions
- Upgrade pyinstaller to a minimum version that works with latest macOS
2021-08-07 19:28:41 -05:00
c51a82a2ce Fix Issues from Translation Update
- Add Qtlib to transifex config
- Pull latest qtlib translations
- Fix flake8 error
- Remove code for manual translation import, use transifex-client instead
2021-08-06 22:21:35 -05:00
0cd8f5e948 Update translation pot files 2021-08-06 21:41:52 -05:00
9c09607c08 Add Turkish & Updates from Transifex
- Pull updates from Transifex
- Add Turkish
- Sort language lists in code
- Remove old locale conversion code as it appears to work correctly on
windows without different conversions.
2021-08-06 21:41:52 -05:00
3bd342770c Update configurations
- Enable Unicode for NSIS Installer
- Update transifex config to new project
2021-08-06 21:41:52 -05:00
14b456dcf9 Merge pull request #927 from glubsy/fix_directories_tests
Fix Directories regex test
2021-08-06 20:08:27 -05:00
glubsy
3dccb686e2 Fix Directories regex test
The entire path to the file would match unless another path separator is added.
2021-08-06 17:18:23 +02:00
0db66baace Merge pull request #907 from glubsy/missing_renamed_regex
Missing renamed regex
2021-08-03 22:26:08 -05:00
e3828ae2ca Merge pull request #911 from glubsy/fix_757_fix_regression
Fix infinite recursion
2021-06-22 22:44:12 -05:00
glubsy
23c59787e5 Fix infinite recursion
Force the Results to update its internal __dupes list whenever at least one group has re-prioritized and changed its dupes/ref.
2021-06-23 05:36:10 +02:00
2f8d603251 Merge pull request #910 from glubsy/757_fix
Fix refs appearing in dupes-only view
2021-06-22 21:54:49 -05:00
glubsy
a51f263632 Fix refs appearing in dupes-only view
* Some refs appeared in the dupes-only view after a re-prioritization was done a second time.
* It seems the core.Results.__dupes list was not properly updated whenever core.app.Dupeguru.reprioritize_groups() -> core.Results.sort_dupes() was called.
When a re-prioritization is done, some refs became dupes, and some dupes became refs in their place. So we need to update the new state of the internal list of dupes kept by the Results object, instead of relying on the outdated cached one.
* Fix #757.
2021-06-22 22:57:57 +02:00
glubsy
718ca5b313 Remove unused import 2021-06-22 02:41:33 +02:00
glubsy
277bc3fbb8 Add unit tests for hash sample optimization
* Instead of keeping md5 samples separate, merge them as one hash computed from the various selected chunks we picked.
* We don't need to keep a boolean to see whether or not the user chose to optimize; we can simply compare the value of the threshold, since 0 means no optimization currently active.
2021-06-21 22:44:05 +02:00
glubsy
e07dfd5955 Add partial hashes optimization for big files
* Big files above the user selected threshold can be partially hashed in 3 places.
* If the user is willing to take the risk, we consider files with identical md5samples as being identical.
2021-06-21 19:03:21 +02:00
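
A minimal sketch of hashing three spaced chunks (start, middle, end) into a single digest, as described above; the chunk size is illustrative:

    import hashlib
    import os

    CHUNK_SIZE = 1024 * 1024  # illustrative 1 MiB sample size

    def md5_of_three_chunks(path):
        size = os.path.getsize(path)
        md5 = hashlib.md5()
        with open(path, "rb") as fp:
            for offset in (0, size // 2, max(size - CHUNK_SIZE, 0)):
                fp.seek(offset)
                md5.update(fp.read(CHUNK_SIZE))
        return md5.digest()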
4641bd6ec9 Merge pull request #905 from glubsy/fix_863
Fix exception when deleting while in delta view
2021-06-19 20:29:47 -05:00
glubsy
a6f83ad3d7 Fix missing regexp after rename
* Doing a full match should be safer to avoid partial results which would result in overly aggressive filtering.
* Add new tests to test suite to cover this issue.
* Fixes #903.
2021-06-19 02:00:25 +02:00
glubsy
ab8750eedb Fix partial regex match yielding false positive 2021-06-17 03:49:59 +02:00
glubsy
22033211d6 Fix exception when deleting while in delta view 2021-05-31 23:49:21 +02:00
0b46ca2222 Merge pull request #879 from glubsy/fix_unicode
Fix stripping (japanese) unicode characters
2021-05-25 19:11:19 -05:00
72e0f76242 Merge pull request #898 from AlttiRi/master
Change reference background color #894
2021-05-25 19:10:31 -05:00
[Alt'tiRi]
65c1d463f8 Change reference background color #894 2021-05-22 02:52:41 +03:00
e6c791ab0a Merge pull request #884 from samusz/master
Small typo
2021-05-09 23:32:32 -05:00
Sacha Muszlak
78f5088101 Merge pull request #1 from samusz/samusz-patch-1
typo correction
2021-05-07 09:41:47 +02:00
Sacha Muszlak
095df5eb95 typo correction 2021-05-07 09:40:08 +02:00
glubsy
f1ae478433 Fix including character at the border 2021-04-29 05:29:35 +02:00
glubsy
c4dcfd3d4b Fix stripping (japanese) unicode characters
* Accents are getting removed from Unicode characters to generate similar "words".
* Non-latin characters which cannot be processed that way (e.g. japanese, greek, russian, etc.) should not be filtered out at all, otherwise files are erroneously skipped or detected as dupes if only some characters make it past the filter.
* Starting from an arbitrary unicode codepoint (converted to decimal), above which we know it is pointless to try any sort of processing, we leave the characters as is.
* Fix #878.
2021-04-29 05:15:34 +02:00
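
A sketch of the codepoint cutoff described above (the cutoff value is illustrative, not the one used in the commit):

    import unicodedata

    CODEPOINT_CUTOFF = 0x2FFF  # illustrative; above this, characters are left untouched

    def strip_accents(text):
        out = []
        for ch in text:
            if ord(ch) > CODEPOINT_CUTOFF:
                out.append(ch)  # e.g. CJK: keep as-is rather than dropping it
            else:
                decomposed = unicodedata.normalize("NFKD", ch)
                out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
        return "".join(out)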
0840104edf Merge pull request #873 from glubsy/fix_857
Fix 857
2021-04-20 20:05:05 -05:00
glubsy
6b4b436251 Fix crash on shutdown
* Fixes "'DetailsPanel' object has no attribute '_table'" error on shutdown if the Results table is updated (item removed) while the Details Dialog is shown as a floating window.
* It seems that QApplication.quit() triggers some sort of refresh on the floating QDockWidget, which in turn makes calls to the underlying model that is possibly being destroyed, ie. there might be a race condition here.
* Closing or hiding the QDockWidget before the call to quit() is a workaround. Similarly, this is already done in the quitTriggered() method anyway.
* This fixes #857.
2021-04-16 17:54:49 +02:00
glubsy
d18b8c10ec Remove redundant assignment
The "app" field is already set in the parent class.
2021-04-15 18:03:00 +02:00
4a40b346a4 Update to 4.1.1 2021-03-21 22:50:33 -05:00
035cdc23b1 Update translations from Transifex 2021-03-21 22:45:19 -05:00
fbdb333457 Update a few translation items
- Add Japanese as a selectable language
- Wrap a few missed strings in tr()
- Regenerate .pot files
2021-03-17 20:21:29 -05:00
e36aab177c Add import feature to build.py for translations 2021-03-17 19:55:00 -05:00
77116ba94b Bring in again the languages that came in incorrectly on the last import 2021-03-17 19:44:16 -05:00
d7f79aefd2 Remove translations imported incorrectly 2021-03-17 19:40:47 -05:00
4c939f379c Update translations from transifex 2021-03-09 21:16:37 -06:00
d098fe2281 Update translation pot files 2021-03-09 20:38:03 -06:00
09cfbad38d Merge pull request #844 from glubsy/translation_fixes
Fix problematic string for translations
2021-03-09 20:19:08 -06:00
glubsy
528dedd813 Fix problematic string for translations
Some languages have very different phrase syntaxes depending on which word is used.
Better to use two separate strings than a dynamically created one.
2021-02-09 01:40:00 +01:00
b30d67b834 Merge pull request #775 from glubsy/PR_typo_fix
Fix label strings
2021-02-02 19:08:28 -06:00
glubsy
3e6e74e2a9 Update URL 2021-01-30 22:17:43 +01:00
glubsy
b919b3ddc8 Fix typo 2021-01-30 04:20:22 +01:00
glubsy
be3862fa8f fix typo 2021-01-29 18:56:29 +01:00
glubsy
da09920553 Update exclusion filter help string 2021-01-29 17:57:44 +01:00
glubsy
2baba3bfa0 Fix selection label 2021-01-29 17:38:37 +01:00
a659a70dbe Add transifex project link to readme 2021-01-28 23:04:44 -06:00
c9e48a5e3b Update pyrcc5 note with new information
Added new information about the other system package which resolves the dependency.
This was brought up in #766.
This was brought up in #766.
2021-01-21 19:08:59 -06:00
68711162d1 Add note about pyrcc5 2021-01-21 18:49:44 -06:00
0b0fd36629 Revert "Update ReadMe and requirements"
This reverts commit bf5d151799.
2021-01-21 18:33:40 -06:00
bf5d151799 Update ReadMe and requirements
- On Linux (Debian-based), pyrcc5 does not make it onto the path, so update the notes here
to take care of this behavior and update requirements so virtual environments load it correctly.
- Fix #766
2021-01-21 18:13:17 -06:00
e29a427caf Update translation files 2021-01-11 22:38:03 -06:00
95ccbad92b Fix #760, issue with language on windows
Fix the issue related to run.py qsettings not using the same options as
in preferences.py
2021-01-11 21:41:14 -06:00
421a58a61c Merge pull request #758 from serg-z/serg-z/prioritize-dialog-multi-selections
Prioritize dialog: adding/removing multiple items, adding/removing on double clicking an item, drag-n-drop fix
2021-01-11 18:50:15 -06:00
Sergey Zhuravlevich
b5a3313f80 Prioritize dialog: fix drag-n-drop putting items before the last item
When the items in the prioritizations list were drag-n-dropped to the
empty space, the row was equal to -1 and the dropped items ended up
being moved to the position before the last item. Fixing the row value
helps to avoid that behavior.

Signed-off-by: Sergey Zhuravlevich <sergey@zhur.xyz>
2021-01-07 17:42:43 +01:00
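
A sketch of that row correction in Qt's drop handling (the model class is a hypothetical stand-in; the signature is QAbstractItemModel.dropMimeData's):

    from PyQt5.QtCore import QStringListModel

    class CriteriaListModel(QStringListModel):  # hypothetical model for illustration
        def dropMimeData(self, data, action, row, column, parent):
            if row == -1:
                # A drop on empty space reports row -1, which would land the items
                # before the last row; treat it as an append at the end instead
                row = self.rowCount(parent)
            return super().dropMimeData(data, action, row, column, parent)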
Sergey Zhuravlevich
116ac18e13 Prioritize dialog: add/remove criteria on double clicking an item
Signed-off-by: Sergey Zhuravlevich <sergey@zhur.xyz>
2021-01-07 17:42:43 +01:00
Sergey Zhuravlevich
32dcd90b50 Prioritize dialog: allow removing multiple prioritizations at once
Removing prioritizations one-by-one can be tedious. This commit enables
extended selection in the prioritizations list. Multiple items can be
selected with conventional methods, such as holding down Ctrl or Shift
key and clicking the items or holding down the left mouse button and
hovering the cursor over the list. All items also can be selected with
Ctrl+A.

Multiple items drag-n-drop is also possible.

To avoid confusion, the selection in the prioritizations list is cleared
after the items are removed or drag-n-dropped.

Signed-off-by: Sergey Zhuravlevich <sergey@zhur.xyz>
2021-01-07 17:42:30 +01:00
Sergey Zhuravlevich
c2fef8d624 Prioritize dialog: allow adding multiple criteria at once
Adding criteria to the prioritizations list one-by-one can be tedious.
This commit enables extended selection in the criteria list and
implements adding multiple items. Multiple criteria can be selected with
conventional methods, such as holding down Ctrl or Shift keys and
clicking the items or holding down the left mouse button and hovering
the cursor over the list. All items also can be selected with Ctrl+A.

Signed-off-by: Sergey Zhuravlevich <sergey@zhur.xyz>
2021-01-07 17:42:07 +01:00
fd0adc77b3 Update Readme notes for system setup 2021-01-06 12:22:15 -06:00
6a03e1e399 Update URLs 2021-01-05 23:21:44 -06:00
ae51842007 Update README.md 2021-01-05 23:04:42 -06:00
ab6acd9e88 Merge pull request #733 from glubsy/dev
Increment version to 4.1.0
2021-01-05 22:48:21 -06:00
6a2c1eb293 Fix flake8 issues introduced in package.py 2020-12-30 20:04:14 -06:00
7b4c31d262 Update for macos Qt version
- Update package.py to include a pyinstaller based packaging
- Update requirements and requirements-extra
- Add icon for macos
- Add macos.md for instructions
2020-12-30 16:44:27 -06:00
glubsy
5553414205 Fix updating QTableView on input
* When clicking on the test regex button or editing the test input field, the tableView doesn't update its data properly.
* Somehow QTableView.update() doesn't request the data from the model.
* The workaround is to call refresh on the model directly, which will in turn update its view.
2020-12-30 23:18:42 +01:00
glubsy
b138dfad33 Fix exception when testing invalid regex
* If a regex in the table is invalid and failed to compile, its "compiled" property is None.
* Only test against the regex if its compilation worked.
2020-12-30 22:50:42 +01:00
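
A minimal sketch of guarding against an uncompiled rule (class and attribute names follow the commit's description but are illustrative):

    import re

    class ExcludeRule:
        def __init__(self, pattern):
            self.pattern = pattern
            try:
                self.compiled = re.compile(pattern)
            except re.error:
                self.compiled = None  # invalid regex: mark it as not compiled

        def matches(self, text):
            # Only test against the regex if compilation worked
            return self.compiled is not None and self.compiled.fullmatch(text) is not None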
701e6d4bb2 Merge pull request #755 from glubsy/packaging
Fix Debian packaging issues
2020-12-30 14:41:34 -06:00
b44d1652b6 Change windows to use ini in AppData 2020-12-30 12:43:10 -06:00
glubsy
990eaaa797 Update requirements.txt
* Recently, the "hsaudiotag3k" package on PyPI has changed its name slightly
* The actual version is now "1.1.3.post1"
* This avoids errors when invoking `pip install -r requirements.txt`
2020-12-30 18:52:37 +01:00
glubsy
348ce95f83 Remove comment
* There is a bug with pyqt5<=5.14 where the table does not update after a call to update() and needs to receive a mouse click event in order to repaint as expected.
* This does not only affect Windows, as this is a Qt5 bug.
* This seems to be fixed with pyqt5>=5.15.1.
2020-12-30 18:44:38 +01:00
glubsy
3255bdf0a2 Fix incorrect path 2020-12-30 17:55:53 +01:00
glubsy
1058247b44 Fix missing application icon
Should be placed in /usr/share/pixmaps for the .desktop file to point to it.
2020-12-30 00:24:15 +01:00
glubsy
7414f82e28 Fix missing directory for pixmap symlink in Debian 2020-12-29 23:57:10 +01:00
glubsy
8105bb709f Fix debian src package build
Workaround "dpkg-source: error: can't build with source format '3.0 (native)': native package version may not have a revision" error as mentioned in #753
2020-12-29 23:45:15 +01:00
ec628751af Minor cleanup to Windows.md 2020-12-29 14:56:37 -06:00
glubsy
288023d03e Update changelog 2020-12-29 21:51:16 +01:00
glubsy
7740dfca0e Update Readme 2020-12-29 21:31:36 +01:00
1e12ad8d4c Clean up Makefile & unused files
- Remove requirements-windows.txt as no longer used
- Remove srcpkg.sh as not up to date and not used
- Minor cleanup in makefile
- Update minimum python version to 3.6 in makefile
2020-12-29 14:08:37 -06:00
glubsy
c1d94d6771 Merge branch 'master' into dev 2020-12-29 20:10:42 +01:00
7f691d3c31 Merge pull request #705 from glubsy/exclude_list
Add Exclusion Filters
2020-12-29 12:56:44 -06:00
glubsy
a93bd3aeee Add missing translation hooks 2020-12-29 18:52:22 +01:00
glubsy
39d353d073 Add comment about Win7 bug
* For some reason the table view doesn't update properly after the test string button is clicked nor when the input field is edited
* The table rows only get repainted properly after receiving a mouse click event
* This doesn't happen on Linux
2020-12-29 18:28:30 +01:00
glubsy
b76e86686a Tweak green color on exclude table 2020-12-29 16:41:34 +01:00
glubsy
b5f59d27c9 Brighten up validation color
Dark green lacks contrast against black foreground font
2020-12-29 16:31:03 +01:00
glubsy
f0d3dec517 Fix exclude tests 2020-12-29 16:07:55 +01:00
glubsy
90c7c067b7 Merge branch 'master' into exclude_list 2020-12-29 15:55:44 +01:00
c8cfa954d5 Minor packaging cleanups
- Fix issue with newline in pkg/debian/source/format
- Update pyinstaller requirement to support python 3.8/3.9
2020-12-28 22:51:09 -06:00
glubsy
e533a396fb Remove redundant check 2020-12-29 05:39:26 +01:00
glubsy
4b4cc04e87 Fix directories tests on Windows
Regexes did not match properly because the separator for Windows is '\\'
2020-12-29 05:35:30 +01:00
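
A sketch of building a separator-agnostic test pattern (directory names illustrative; the actual test fix may differ):

    import os
    import re

    # re.escape() keeps Windows' '\' separator from being read as a regex escape
    sep = re.escape(os.sep)
    pattern = re.compile(r".*" + sep + "foobar" + sep + ".*")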
e822a67b38 Force correct python environment for tox on windows 2020-12-28 21:18:16 -06:00
c30c3400d4 Fix typo in .travis.yml 2020-12-28 21:07:49 -06:00
d539517525 Update Windows Requirements & CI
- Merge windows requirements into requirements.txt and requirements-extra.txt
- Update tox.ini to always use build.py
- Update build.py to have module only option
- Update tox.ini to test Python 3.9
- Update .travis.yml to test 3.8 and 3.9 on newer Ubuntu LTS
- Update .travis.yml to work with changes to Windows tox
(also update Windows to 3.8)
2020-12-28 20:59:01 -06:00
glubsy
07eba09ec2 Fix error after merging branches 2020-12-29 01:01:26 +01:00
glubsy
7f19647e4b Remove unused lines 2020-12-29 00:56:25 +01:00
bf7d720126 Merge pull request #746 from glubsy/PR_iconpath
Make icon path relative
2020-12-28 14:47:34 -06:00
glubsy
6bc619055e Change version to 4.1.0 2020-12-06 20:13:03 +01:00
glubsy
452d1604bd Make icon path relative
* Removes the hardcoded path to the icon in the .desktop file
* Allows themes to override the default application icon (icons are searched for in theme paths first)
* Debian: create symbolic link in /usr/share/pixmaps that points to the icon file
* Arch: the same thing is done by PKGBUILD maintainers downstream
2020-12-06 18:36:52 +01:00
glubsy
680cb581c1 Merge branch 'master' into exclude_list 2020-10-28 03:58:05 +01:00
1d05f8910d Merge pull request #701 from glubsy/PR_ref_row_background_color
Change reference row background color
2020-10-27 21:53:53 -05:00
glubsy
bd09b30468 Merge branch 'master' into PR_ref_row_background_color 2020-10-28 03:50:13 +01:00
8d9933d035 Merge pull request #683 from glubsy/details_dialog_improvements
Add image comparison features to details dialog
2020-10-27 21:28:23 -05:00
glubsy
cf5ba038d7 Remove icon credits from about box
* Moved credits to CREDITS file
* Updated exchange icon with higher hue contrast for better visibility on dark backgrounds
2020-10-28 02:18:41 +01:00
glubsy
59ce740369 Remove print debug statements 2020-10-28 01:50:49 +01:00
glubsy
92feba5f08 Remove obsolete UI setup code 2020-10-28 01:48:39 +01:00
glubsy
a265b71d36 Improve comment reflecting modification of function 2020-10-28 01:45:03 +01:00
8d26c921a0 Merge pull request #706 from glubsy/save_directories
Save/Load directories in Directories
2020-10-27 19:10:11 -05:00
glubsy
32d66cd19b Move up to 4.0.5
* Initial push to 4.0.5 milestone
* Update changelog
2020-10-27 19:38:51 +01:00
glubsy
735ba2fd0e Update error dialog traceback message for users
* Encourage users to look for already existing issues
* Also invite them to test the very latest version available first
2020-10-27 18:23:14 +01:00
glubsy
b16b6ecf4d Fix error after merging branches 2020-10-27 18:15:15 +01:00
glubsy
2875448c71 Merge branch 'save_directories' into dev 2020-10-27 16:23:49 +01:00
glubsy
51b76385c0 Merge branch 'exclude_list' into dev 2020-10-27 16:23:43 +01:00
glubsy
b9f8dd6ea0 Merge branch 'PR_ref_row_background_color' into dev 2020-10-27 16:23:35 +01:00
glubsy
6623b04403 Merge branch 'details_dialog_improvements' into dev 2020-10-27 16:23:23 +01:00
glubsy
424d34a7ed Add desktop.ini to filter list 2020-09-04 19:07:07 +02:00
glubsy
2a032d24bc Save/Load directories in Directories
* Add the ability to save / load directories as XML, just like the last_directories.xml which gets loaded on program start.
2020-09-04 18:56:25 +02:00
glubsy
b8af2a4eb5 Don't show parent window's context menu on viewers
* When right clicking on image viewers while they are docked, the context menu of the Results window showed up.
* This also enables capture of right click and middle click buttons to drag around images, which solves a conflict with some theme engines that enable left mouse button click to drag a window's position regardless of where the event happens, hence blocking the panning.
* Probably unnecessary to check which button is released.
2020-09-03 01:44:01 +02:00
glubsy
a55e02b36d Fix table maximum size being off by a few pixels
* Sometimes, the splitter doesn't fully reach the table maximum height, and the scrollbar is still displayed on the right because a few pixels are still hidden.
* It seems the splitter handle counts towards the total height of the widget (the table), so we add it to the maximum height of the table
* The scrollbar disappears when we reach just above the actual table's height
2020-09-02 23:45:31 +02:00
glubsy
18c933b4bf Prevent widget from stretching in layout
* In some themes, the color picker widgets get stretched, while the color picker for the details dialog group doesn't.
This should keep them a bit more consistent across themes.
2020-09-02 20:26:23 +02:00
glubsy
ea11a566af Highlight rows when testing regex string
* Add testing feature to Exclusion dialog to allow users to test regexes against an arbitrary string.
* Fixed test suites.
* Improve comments and help dialog box.
2020-09-01 23:02:58 +02:00
glubsy
584e9c92d9 Fix duplicate items in menubar
* When recreating the Results window, the menubar had duplicate items added each time.
* Removing the underlying C++ object is apparently enough to fix the issue.
* SetParent(None) can still be used in case of floating windows
2020-08-31 21:23:53 +02:00
glubsy
4a1641e39d Add test suite, fix bugs 2020-08-31 20:35:56 +02:00
glubsy
26d18945b1 Fix tab indices not aligned with stackwidget's
* The custom QStackWidget+QTabBar class did not manage the tabs properly because the indices in the stackwidget were not aligned with the ones in the tab bar.
* Properly disable exclude list action when it is the currently displayed widget.
* Merge action callbacks for triggering ignore list or exclude list to avoid repeating code and remove unused checks for tab visibility.
* Remove unused SetTabVisible() function.
2020-08-23 16:49:43 +02:00
glubsy
3382bd5e5b Fix crash when recreating Results window/tab
* We need to set the Details Dialog's previous instance to None when recreating a new Results window
otherwise Qt crashes since we are probably dereferencing a dangling reference.
* Also fixes Results tab not showing up when selecting it from the View menu.
2020-08-20 17:12:39 +02:00
glubsy
9f223f3964 Concatenate regexes prior to compilation
* Concatenating regexes into one Pattern might yield better performance under (un)certain conditions.
* Filenames are tested against regexes with no os.sep in them. This may or may not be what we want to do.
An alternative would be to test against the whole (absolute) path of each file, which would filter more aggressively.
2020-08-20 02:46:06 +02:00
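
A sketch of the concatenation idea (patterns are illustrative):

    import re

    def compile_combined(patterns):
        # One alternation compiled once can beat looping over many compiled regexes
        return re.compile("|".join("(?:%s)" % p for p in patterns))

    combined = compile_combined([r".*\.tmp", r"desktop\.ini"])
    assert combined.fullmatch("desktop.ini") is not None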
glubsy
2eaf7e7893 Implement exclude list dialog on the Qt side 2020-08-17 05:54:59 +02:00
glubsy
a26de27c47 Implement dialog and base classes for model/view 2020-08-14 20:19:47 +02:00
glubsy
21e62b7374 Colorize background for reference row
As per issue #647, highlight background color for reference for better readability.
2020-08-12 21:37:29 +02:00
9e6b117327 Merge pull request #698 from glubsy/fix_630
Workaround for #630
2020-08-06 23:16:02 -05:00
glubsy
3333d26557 Try to handle conversion to int or fail gracefully 2020-08-07 00:37:37 +02:00
glubsy
6e81042989 Workaround for #630
* In some cases, the function dump_IFD() in core/pe/exif.py assigns a string instead of an int as "values".
* This value is then used as _cached_orientation in core/pe/photo.py in _get_orientation().
* The method _plat_get_blocks() in qt/pe/photo.py was only expecting an integer for the orientation argument, so we work around the issue for now by ignoring the value if it's a string.
2020-08-06 00:23:49 +02:00
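
A sketch of the guard described above (the neutral fallback value is illustrative):

    def safe_orientation(value):
        # EXIF parsing sometimes yields a string here instead of an int,
        # so ignore anything that is not an integer rather than crashing
        return value if isinstance(value, int) else 0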
glubsy
470307aa3c Ignore path and filename based on regex
* Added initial draft for test suite
* Fixed small logging bug
2020-08-03 16:19:27 +02:00
glubsy
089f00adb8 Fix typo in class member reference 2020-08-03 16:18:15 +02:00
glubsy
76fbfc2822 Fix adding new Result tab if already existed
* Whenever the Result Window already existed and its tab was in second position, and if the ignore list tab was in 3rd position, asking to show the Result window through the View menu would add a new tab and push the Result tab to the third position (ignore list tab would then become 2nd position).
* Fix view menu Directories entry not switching to index "0" in custom tab bar.
2020-08-02 16:12:47 +02:00
glubsy
866bf996cf Prevent Directories tab from closing on MacOS
* The close button on custom tabs cannot be hidden on MacOS for some reason.
* Prevent the directories tab from closing if the close button was clicked by mistake
2020-08-01 19:35:12 +02:00
glubsy
0104d8922c Fix alignment for combo box's label 2020-08-01 19:11:37 +02:00
glubsy
fbd7c4fe5f Tweak visuals for cache selection item 2020-08-01 19:07:45 +02:00
glubsy
de5e61293b Add stretch to bottom of General pref tab 2020-08-01 19:02:04 +02:00
glubsy
a3e402a3af Group general interface options together
* Use QGroupBox to keep items together on the display tab in the preference dialog just like for the other options.
* It is probably not necessary to keep these as class members
2020-08-01 18:50:44 +02:00
glubsy
056fa819cc Revert stretching last section in Result window
* It seems that stretching the last section automatically is a bit inconvenient on MacOS as it will grow beyond the window border.
* Keep it as it was before for now until a better solution is devised.
2020-08-01 18:42:46 +02:00
glubsy
3be1ee87c6 Merge branch 'master' into details_dialog_improvements 2020-08-01 18:29:22 +02:00
glubsy
628d772766 Use FormLayout instead of GridLayout
QFormLayout should adhere to each platform's style better. It also simplifies the code a bit since we don't have to set up the labels, etc.
2020-08-01 17:40:31 +02:00
glubsy
acdeb01206 Tweak preference layout for better readability
* We use GroupBoxes to group items together and surround them in a frame
* Remove separator lines to avoid cluttering
* Adjust columns and their stretch factors for better alignment of buttons
2020-08-01 16:42:14 +02:00
ab402d4024 Merge pull request #688 from glubsy/tab_window
Use tabs instead of floating windows
2020-07-31 22:11:31 -05:00
glubsy
d2cdcc989b Fix 1 pixel sized color in color picker buttons
* On Linux, even with 1 pixel size, the button is filled entirely with the color selected
* On MacOS, the color pixmap stays at 1 pixel so we hard code the size to 16x16
2020-08-01 02:09:38 +02:00
glubsy
2620d0080c Fix layout error
* Avoid attempting to add a QLayout to DetailsDialog which already has a layout by removing superfluous layout setup.
2020-07-31 22:37:18 +02:00
glubsy
63a9f00552 Add minor change to variable names 2020-07-31 22:27:18 +02:00
glubsy
87f9317805 Place tab bar below menu bar by default 2020-07-31 16:59:34 +02:00
glubsy
a542168a0d Reorganize view menu entries and keep consistency 2020-07-31 16:57:18 +02:00
glubsy
86e1b55b02 Fix menu items being wrongly disabled
* Add Directories to the View menu.
* View menu items should be disabled properly depending on whether they point to the current page/tab.
* Keep "Load scan results" actions active while viewing pages other than the Directories tab.
2020-07-31 05:08:08 +02:00
glubsy
1b3b40543b Fix ignore list view menu entry being disabled 2020-07-31 03:59:37 +02:00
glubsy
dd6ffe08d7 Add option to place tab bar below main menu 2020-07-31 01:32:29 +02:00
glubsy
11254381a8 Save dock panel position on quit
* Restore the details dialog dock position if it was previously docked (i.e. not floating).
* Since the details_dialog instance was not deleted after closing by default, the previous instances were still saving their own geometry. We now delete them explicitly if we have to recreate a new instance to avoid the signal triggering the callback to save the geometry.
* Since restoreGeometry() and saveGeometry() are only called in our QDockWidget, it should be safe to modify the methods for the Preferences class (in qtlib).
2020-07-30 20:25:20 +02:00
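
A sketch of saving the dock geometry only while floating (the settings key is illustrative):

    from PyQt5.QtCore import QSettings

    def save_details_geometry(dock, settings: QSettings):
        # Persist geometry only for the floating state; saving it while docked
        # would overwrite the useful floating geometry with odd values
        if dock.isFloating():
            settings.setValue("DetailsDialogGeometry", dock.saveGeometry())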
glubsy
23642815f6 Remove unused properties in details table headers 2020-07-30 15:38:37 +02:00
glubsy
7e4f371841 Avoid crash when quitting
* If the details dialog failed to be created for some reason, avoid crashing by not dereferencing a null pointer
2020-07-30 15:30:09 +02:00
glubsy
9b8637ffc8 Stretch last header section in Result window 2020-07-30 15:16:31 +02:00
glubsy
79613f9b1e Fix crash quitting while details dialog active
* While the details dialog is opened, if quit is triggered, the error message "'DetailsPanel' object has no attribute '_table'" is reported
* A workaround is to cleanly close the dialog before tear down
2020-07-30 03:22:13 +02:00
glubsy
fa54e93236 Add preference to turn off scrollbars in viewers
Refactor preference Display page to only include PE specific preferences in the PE mode.
2020-07-30 03:13:58 +02:00
glubsy
8fb82ae3d8 Merge branch 'master' into tab_window 2020-07-29 21:48:32 +02:00
glubsy
eab5003e61 Add color preference for delta in details table 2020-07-29 21:43:45 +02:00
glubsy
da8c493c9f Toggle visibility of details dialog
* When using the Ctrl+I shortcut or the "Details" button in the Results window, toggle the details dialog on/off.
* This works also while it is docked.
2020-07-29 20:43:18 +02:00
glubsy
9795f14176 Fix title bar toggling on/off when dialog 2020-07-29 20:00:27 +02:00
glubsy
1937120ad7 Fix toggling details view via menu or shortcut
* Using Ctrl+I would toggle the title bar on/off
2020-07-29 04:51:03 +02:00
glubsy
1823575af4 Fix swapping table view columns
We now have only two columns to swap, not 3.
2020-07-29 04:26:40 +02:00
glubsy
7dc9f25b06 Merge branch 'master' into details_dialog_improvements 2020-07-29 04:20:16 +02:00
5502b48089 Merge pull request #685 from glubsy/fix_result_window_action
Fix updating result window action upon creation
2020-07-28 20:05:10 -05:00
f02b66fd54 Merge pull request #682 from glubsy/details_table_tweaks
Colorize details table differences, allow moving around of rows
2020-07-28 19:33:21 -05:00
d2235f9bc9 Merge pull request #694 from glubsy/fix_matchblock_freeze
Work around frozen progress dialog
2020-07-28 18:10:24 -05:00
glubsy
5f5f9232c1 Properly wait for multiprocesses to exit
* Fix for #693
2020-07-28 16:44:06 +02:00
c36fd84512 Merge pull request #691 from glubsy/fix_package_script
Fix error in package script for (Arch) Linux
2020-07-28 00:51:17 -05:00
glubsy
63b2f95cfa Work around frozen progress dialog
* It seems that matchblock.getmatches() returns too early and the (multi-)processes become zombies
* This is a workaround which seems to work by sleeping for one second and avoid zombie processes
2020-07-25 23:37:41 +02:00
glubsy
d193e1fd12 Fix typo in error message 2020-07-24 03:50:08 +02:00
glubsy
f0adf35db4 Add helpful message if build files are missing 2020-07-24 03:48:07 +02:00
glubsy
49a1beb225 Avoid using workarounds in package script
* Just like the Windows package function counterpart, better to abort building the package if the help and locale files have not been built instead of ignoring the error
2020-07-24 03:33:13 +02:00
glubsy
f19b5d6ea6 Fix error in package script for (Arch) Linux
* While packaging, the "build/help" and "build/locale" directories are not found.
* Work around the issue with try/except statements.
2020-07-24 03:23:03 +02:00
glubsy
730fadf63f Merge branch 'preferences_tabs' into details_dialog_improvements 2020-07-22 22:41:22 +02:00
glubsy
9ae0d7e5cf Add color picker buttons to preferences dialog
* Buttons display the color currently in use
* Result table uses selected colors accordingly
* Keep items aligned with GridLayouts in preference dialog
* Reordering of items in a more logical manner
2020-07-22 22:12:46 +02:00
1167519730 Merge pull request #687 from glubsy/ignore_list_wordwrap
Fix word wrap in ignore list dialog
2020-07-21 20:39:14 -05:00
glubsy
cf64565012 Add option to use internal icons in details dialog
* On Windows and MacOS it is unclear how themes work, so only Linux is allowed to use theme icons
* Internal icons are used by default on non-Linux platforms
2020-07-21 03:52:15 +02:00
glubsy
298f659f6e Fix Restore Default Preferences button
* When clicking the "Restore Default" button in the preferences dialog, only affect the preferences displayed in the current tab. The hidden tab should not be affected by this button.
2020-07-20 05:04:25 +02:00
glubsy
3539263437 Add tabs to preference dialog. 2020-07-20 03:10:06 +02:00
glubsy
6213d50670 Squashed commit of the following:
commit ac941037ff
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Thu Jul 16 22:21:24 2020 +0200

    Fix resize of top frame not updating scaled pixmap

    * Also limit viewing features such as zoom levels when files have different dimensions
    * GraphicsViewImageViewer is still a bit buggy: the scrollbars are toggled on when the pixmap is null in the reference viewer (we do not use that class right now anyway)

commit 733b3b0ed4
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Thu Jul 16 01:31:24 2020 +0200

    Prevent zoom for images of differing dimensions

    * If images are not the same size, prevent zooming features from being used by disabling the normal size button, only enable swap

commit 9168d72f38
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 22:47:32 2020 +0200

    Update preferences on show(), not in constructor

    * If the dialog window shouldn't have a titlebar during construction, update accordingly only when showing to fix Windows displaying a window without titlebar on first show
    * Only save geometry if the window is floating. Otherwise geometry while docked is saved which gives weird results on subsequent starts, since it may be floating by default anyway (at least on Linux where titlebar being disabled is allowed while floating)
    * Vertical title bar doesn't seem to work on Windows, add note in preferences dialog

commit 75621cc816
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 22:04:19 2020 +0200

    Prevent Windows from floating if no decoration

    * Windows users cannot move a window which has no native decorations. Toggling a dock widget's titlebar off also removes native decorations on a floating window. Until we implement a replacement titlebar by overriding paintEvents, simply force the floating window to go back to docked state after we toggled the titlebar off.

commit 3c816b2f11
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 21:43:01 2020 +0200

    Fix computing and setting offset to 0 for tableview

commit 85d6e05cd4
Merge: 66127d02 3eddeb6a
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 21:25:44 2020 +0200

    Merge branch 'dockable_windows' into details_dialog_improvements_dev

commit 66127d025e
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 20:22:13 2020 +0200

    Add credit for icons used, upscale exchange icon

    * Jason Cho gave his express permission to use the icon (it was made 10 years ago and he doesn't have the source files anymore)
    * Used waifu2x to upscale the icon
    * Used GIMP to draw dark outline around the icon
    * Source files are included

commit 58c675d1fa
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 05:25:47 2020 +0200

    Add custom icons

    * Use custom icons on platforms which do not provide theme
    * Old zoom icons credits to "schollidesign" from icon pack Office and Entertainment (GPL licence).
    * Exchange icon credit to Jason Cho (Unknown license).
    * Use hack to resize viewers on first show() as well

commit 95b8406c7b
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 04:14:24 2020 +0200

    Fix scrollbar displayed while splitter maxed out

    * For some reason the table's height is a few pixels longer on Windows so we work around the issue by adding a small offset to the maximum height hint.
    * No idea about MacOS yet but this might need the same treatment.

commit 3eddeb6aeb
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Tue Jul 14 17:37:48 2020 +0200

    Fix ME/SE details dialogs, add preferences

    * Fix ME and SE versions of details dialog not displaying their content properly after change to QDockWidget
    * Add option to toggle titlebar and orientation of titlebar in preferences dialog
    * Fix setting layout on PE details dialog window while layout already set, by removing the self (parent) reference in constructing the QSplitter

commit 56912a7108
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Mon Jul 13 05:06:04 2020 +0200

    Make details dialog dockable
2020-07-16 22:31:54 +02:00
glubsy
ac941037ff Fix resize of top frame not updating scaled pixmap
* Also limit viewing features such as zoom levels when files have different dimensions
* GraphicsViewImageViewer is still a bit buggy: the scrollbars are toggled on when the pixmap is null in the reference viewer (we do not use that class right now anyway)
2020-07-16 22:21:24 +02:00
glubsy
733b3b0ed4 Prevent zoom for images of differing dimensions
* If images are not the same size, prevent zooming features from being used by disabling the normal size button, only enable swap
2020-07-16 01:31:24 +02:00
glubsy
9168d72f38 Update preferences on show(), not in constructor
* If the dialog window shouldn't have a titlebar during construction, update accordingly only when showing to fix Windows displaying a window without titlebar on first show
* Only save geometry if the window is floating. Otherwise geometry while docked is saved which gives weird results on subsequent starts, since it may be floating by default anyway (at least on Linux where titlebar being disabled is allowed while floating)
* Vertical title bar doesn't seem to work on Windows, add note in preferences dialog
2020-07-15 23:00:55 +02:00
glubsy
75621cc816 Prevent Windows from floating if no decoration
* Windows users cannot move a window which has no native decorations. Toggling a dock widget's titlebar off also removes native decorations on a floating window. Until we implement a replacement titlebar by overriding paintEvents, simply force the floating window to go back to docked state after we toggled the titlebar off.
2020-07-15 22:12:19 +02:00
glubsy
3c816b2f11 Fix computing and setting offset to 0 for tableview 2020-07-15 21:48:11 +02:00
glubsy
85d6e05cd4 Merge branch 'dockable_windows' into details_dialog_improvements_dev 2020-07-15 21:25:44 +02:00
glubsy
66127d025e Add credit for icons used, upscale exchange icon
* Jason Cho gave his express permission to use the icon (it was made 10 years ago and he doesn't have the source files anymore)
* Used waifu2x to upscale the icon
* Used GIMP to draw dark outline around the icon
* Source files are included
2020-07-15 20:22:13 +02:00
glubsy
58c675d1fa Add custom icons
* Use custom icons on platforms which do not provide theme
* Old zoom icons credits to "schollidesign" from icon pack Office and Entertainment (GPL licence).
* Exchange icon credit to Jason Cho (Unknown license).
* Use hack to resize viewers on first show() as well
2020-07-15 05:25:47 +02:00
glubsy
95b8406c7b Fix scrollbar displayed while splitter maxed out
* For some reason the table's height is a few pixels longer on Windows so we work around the issue by adding a small offset to the maximum height hint.
* No idea about MacOS yet but this might need the same treatment.
2020-07-15 04:14:24 +02:00
glubsy
3eddeb6aeb Fix ME/SE details dialogs, add preferences
* Fix ME and SE versions of details dialog not displaying their content properly after change to QDockWidget
* Add option to toggle titlebar and orientation of titlebar in preferences dialog
* Fix setting layout on PE details dialog window while layout already set, by removing the self (parent) reference in constructing the QSplitter
2020-07-14 17:37:48 +02:00
glubsy
56912a7108 Make details dialog dockable 2020-07-13 05:06:04 +02:00
glubsy
7ab299874d Merge commit 'b0a256f0' 2020-07-12 17:54:51 +02:00
glubsy
a4265e7fff Use tabs instead of floating windows
* Directories dialog, Results window and ignore list dialog are the three dialog windows which can now be tabbed instead of previously floating.
* Menus are automatically updated depending on the type of dialog as the current tab. Menu items which do not apply to the currently displayed tab are disabled but not hidden.
* The floating windows logic is preserved in case we want to use them again later (I don't see why though)
* There are two different versions of the tab bar: the default one used in TabBarWindow class places the tabs next to the top menu to save screen real estate. The other option is to use TabWindow which uses a regular QTabWidget where the tab bar is placed right on top of the displayed window.
* There is a toggle option in the View menu to hide the tabs, the windows can still be navigated to with the View menu items.
2020-07-12 17:23:35 +02:00
glubsy
db228ec8a3 Fix word wrap in ignore list dialog 2020-07-12 16:17:18 +02:00
glubsy
61fc4f07ae Fix updating result window action upon creation
* Result Window action was not being properly updated
after the ResultWindow had been created.
There was no way of retrieving the window after it had been closed.
2020-07-07 16:54:08 +02:00
glubsy
b0a256f0d4 Fix flake8 minor issues 2020-07-02 23:09:02 +02:00
glubsy
4ee9479a5f Add image comparison features to details dialog
* Add splitter in order to hide the details table.
* Add a toolbar to the Details Dialog window to allow for better image
comparisons: zoom in/out, swap pixmaps in place, best-fit-to-viewport.
Scrollbars and viewports are synchronized.
2020-07-02 22:52:47 +02:00
glubsy
e7b3252534 Cleanup of details table 2020-07-02 22:36:57 +02:00
glubsy
36ab84423a Move buttons into the toolbar class.
* Moved the QToolbar into the image viewer's translation unit.
* QActions are still attached to the dialog window for shortcuts to work
2020-07-02 22:36:57 +02:00
glubsy
370b582c9b Add working zoom functions to GraphicsView viewers. 2020-07-02 22:36:57 +02:00
glubsy
9f15139d5f Fix view resetting when selecting reference only.
* Needed to ignore the scrollbar changes in the disabled
panel, since a null pixmap would reset the bars to 0 and affect
the selected viewer.
* Keep the view at the same scale across entries from the same group.
2020-07-02 22:36:57 +02:00
glubsy
011939f5ee Keep scale across files of the same dupe group.
* Also fix scaled down pixmap when updating pixmap in the same group
* Fix ignoring mouse wheel event when max scale has been reached
* Fix toggle scrollbars when asking for normal size
2020-07-02 22:36:57 +02:00
glubsy
977c20f7c4 Add QSplitter to hide TableView in DetailsDialog 2020-07-02 22:36:57 +02:00
glubsy
aa79b31aae Work around resizing down offset by 1 pixel. 2020-07-02 22:36:57 +02:00
glubsy
970bb5e19d Add mostly working ScrollArea image viewers
* Work around flickering of scrollbars due to
GridLayout resizing on odd pixels by disabling
the scrollbars while BestFit is active
* Also setting minimum column width to work around
the issue above.
* Avoid updating scrollbar values twice by using a
simple boolean lock
2020-07-02 22:36:57 +02:00
glubsy
a706d0ebe5 Implement mostly working ScrollArea viewer
Using a QWidget inside the QScrollArea mostly works
but we only move around the pixmap inside the QWidget,
not the QWidget itself, which doesn't update scrollbars.
Need a better implementation.
2020-07-02 22:36:57 +02:00
glubsy
b7abcf2989 Use native QPixmap swap() method instead of manual setPixmap()
When swapping images, use getters to hopefully get a reference to
each pixmap and swap them within a single slot.
2020-07-02 22:36:57 +02:00
glubsy
8103cb3664 Disable unused methods from controller
* setPixmap() now disables the QWidget automatically if the pixmap passed is null.
* the controller relays repaint events to the other widget
2020-07-02 22:36:57 +02:00
glubsy
c3797918d2 Controller class to decouple from the dialog class
The controller singleton acts as a proxy to relay
signals from each widget to the other.
It should help encapsulate things better if we need to
use a different class for image viewers in the future.
2020-07-02 22:36:57 +02:00
glubsy
60ddb9b596 Working synchronized views. 2020-07-02 22:36:57 +02:00
glubsy
a29f3fb407 only update delta when mouse is being dragged to reduce paint events 2020-07-02 22:36:57 +02:00
glubsy
c6162914ed working synchronized panning 2020-07-02 22:36:57 +02:00
glubsy
02bd822ca0 working zoom functions, mouse wheel event 2020-07-02 22:36:57 +02:00
glubsy
ea6197626b drag mouse with ImageViewer class 2020-07-02 22:36:57 +02:00
glubsy
468a736bfb add normal size button 2020-07-02 22:36:57 +02:00
glubsy
f42df12a29 attempt at double click on Qlabel 2020-07-02 22:36:57 +02:00
glubsy
9b48e1851d add zoom and swap buttons to details dialog 2020-07-02 22:36:57 +02:00
glubsy
c973224fa4 Fix flake8 indentation warnings 2020-07-01 03:05:59 +02:00
092cf1471b Add details to commented out tests. 2020-06-30 12:25:23 -05:00
glubsy
5cbe342d5b Ignore formatting if no data returned from model 2020-06-30 18:32:20 +02:00
4f252480d3 Fix pywin32 dependency 2020-06-30 00:52:04 -05:00
5cc439d846 Clean up rest of DeprecationWarnings 2020-06-30 00:51:06 -05:00
glubsy
c6f5031dd8 Add color and bold font if difference in model
* Could be better optimized if there is a way to
set those variables earlier in the model or somewhere
in the viewer when it requests the data.
* Right now it compares strings(?) many times for every role
we handle, which is not ideal.
2020-06-30 04:20:27 +02:00
glubsy
eb6946343b Remove superfluous top-left corner button 2020-06-30 01:19:25 +02:00
glubsy
e41a6b878c Allow moving rows around in details table
* Replaces the "Attribute" column with a horizontal header
* We ignore the first value in each row from the model and instead
populate a horizontal header with the value in order to allow moving rows around
2020-06-30 01:02:56 +02:00
ee2671a5f3 More Test and Flake8 Cleanup
- Allow flake8 to check more files as well.
2020-06-27 01:08:12 -05:00
e05c72ad8c Upgrade to latest pytest
- Currently some incompatibility in the hscommon tests, commented out
the ones with issues temporarily
- Also updated some deprecation warnings, still more to do
2020-06-25 23:26:48 -05:00
7658cdafbc Merge pull request #665 from KIAaze/fix_packaging_ubu20.04
Fix packaging on *ubuntu 20.04 (more specifically python version >=3.8)
2020-06-24 18:47:09 -05:00
ecf005fad0 Add distro to requirements and use for packaging
- Add distro as a requirement
- Use distro.id() to get the id as it is a bit cleaner than distro.linux_distribution()
2020-06-24 18:39:06 -05:00
de0542d2a8 Merge pull request #677 from glubsy/fix_folder
Fix standard mode folder comparison view generating "---" in results table
2020-06-24 18:30:30 -05:00
glubsy
bcb26507fe Remove superfluous argument 2020-06-25 01:23:03 +02:00
c35db7f698 Merge pull request #672 from jpvlsmv/variable_fix_trivial
Rename an ell variable into something that flake8 doesn't complain about
2020-06-24 17:18:49 -05:00
d2193328a7 Add e to lin 2020-06-24 17:11:09 -05:00
glubsy
ed64428c80 Add missing file class for folder type.
* results.py doesn't set the proper type for dupes at the line
"file = get_file(path)" so we add it on top
* Perhaps it could have been added to _get_fileclasses() in core.app.py too
but I have not tested it
2020-06-24 23:32:04 +02:00
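The change that eventually landed (visible in the core/app.py diff further down) adds the standard-edition Folder class to the file classes used when recreating dupes from a results file; roughly, assuming fs and se are already imported in that module:

```python
# Sketch of DupeGuru._create_file after the fix
def _create_file(self, path):
    # Include se.fs.Folder so folder paths loaded from a results file get the
    # concrete standard-edition type instead of the abstract core.fs.Folder.
    return fs.get_file(path, self.fileclasses + [se.fs.Folder])
```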
glubsy
e89156e55c Add temporary workaround for bug #676
* In standard mode, for folder comparison, dupe type is wrongly set as core.fs.Folder
while it should be core.se.fs.Folder.
* Catching the NotImplementedError exception redirects to the appropriate handler
* This is only a temporary workaround until a better fix is implemented
2020-06-24 22:01:30 +02:00
4c9309ea9c Add changelog to pkg/debian
May try some other way of doing this later, but for now this will
let the PPA build make some progress.
2020-06-16 20:45:48 -05:00
1c00331bc2 Remove Old Issue Template 2020-06-15 23:28:31 -05:00
427e32f406 Update issue templates
Change to the new issue template flow.
2020-06-15 23:18:13 -05:00
Joe Moore
b048fa5968 Rename an ell variable into something that flake8 doesn't complain about 2020-06-05 19:44:08 -04:00
d5a6ca7488 Merge pull request #669 from jpvlsmv/refactor_ci
Refactor ci a little bit
2020-06-01 11:57:58 -05:00
Joe Moore
d15dea7aa0 Move flake8 requirement out of .txt into tox environment spec 2020-05-30 09:49:17 -04:00
Joe Moore
ccb1c75f22 Call style-checker tox environment 2020-05-30 09:40:23 -04:00
Joe Moore
dffbed8e22 Use build and package scripts on windows 2020-05-30 09:34:03 -04:00
Joe Moore
50ce010212 Move flake8 to a separate tox env 2020-05-30 09:33:35 -04:00
KIAaze
0e8cd32a6e Changed to the -F option to build everything (source and binary packages). 2020-05-20 23:15:49 +01:00
KIAaze
ea191a8924 Fixed AttributeError in the packaging script when using python>=3.8.
platform.dist() is deprecated since python version 3.5 and was removed in version 3.8.
Added exception to use the distro package in that case, as suggested by the python documentation:
https://docs.python.org/3.7/library/platform.html?highlight=platform#module-platform
2020-05-20 23:13:11 +01:00
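A hedged sketch of that fallback: call platform.dist() where it still exists and switch to the third-party distro package on Python 3.8+, where the call is gone (the commit above, ecf005fad0, later drops the fallback and uses distro.id() directly):

```python
import platform

try:
    distro_id = platform.dist()[0].lower()  # deprecated, removed in Python 3.8
except AttributeError:
    import distro                           # third-party 'distro' package
    distro_id = distro.id()
```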
6abcedddda Merge pull request #656 from glubsy/selected_shortcut_description
Add shortcut description to mark selected action
2020-05-13 20:17:41 -05:00
debf309a9a Merge pull request #655 from glubsy/fix_row_trimming
Fix row trimming
2020-05-08 22:07:38 -05:00
glubsy
4b1c925ab1 use a QKeySequence instead 2020-05-07 16:24:07 +02:00
glubsy
1c0990f610 Add shortcut description to mark selected action 2020-05-07 15:37:21 +02:00
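A minimal illustration of attaching the shortcut to an action with a QKeySequence, which Qt then displays next to the menu entry; the key chosen here is only an example, not dupeGuru's actual binding:

```python
from PyQt5.QtGui import QKeySequence
from PyQt5.QtWidgets import QAction


def make_mark_selected_action(parent):
    action = QAction("Mark Selected", parent)
    # Setting the shortcut via QKeySequence lets Qt show it alongside the
    # action's text in menus.
    action.setShortcut(QKeySequence("Ctrl+M"))  # illustrative key
    return action
```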
glubsy
89f2dc3b15 prevent word wrapping from truncating row too aggressively 2020-05-07 14:55:01 +02:00
glubsy
ffae58040d prevent rows in the details panel from being trimmed too short 2020-05-07 14:53:09 +02:00
0cc1cb4cb8 Merge pull request #646 from glubsy/bold_font
Add a preference option to disable bold font on reference row.
2020-05-05 22:03:41 -05:00
glubsy
dab762f05e Add a preference option to disable bold font on reference row. 2020-04-27 01:36:27 +02:00
c4a6958ef0 Merge pull request #628 from nikmartin/linuxBuild
remove 'm' from SO var on Linux and OSX
2020-03-04 19:34:49 -06:00
98c6f12b39 Merge pull request #627 from ptman/patch-1
Fix handling of filenames with space
2020-03-04 19:34:38 -06:00
5d21454789 Update .travis.yml
Remove python 3.5 and add 3.8
2020-03-04 19:30:30 -06:00
3e4fe5b765 Update tox.ini
Remove python 3.5 add 3.8
2020-03-04 19:29:01 -06:00
Nik Martin
bd0f53bcbe remove 'm' from SO var on Linux and OSX 2020-02-26 15:39:39 -06:00
Paul Tötterman
d820fcc7b1 Fix handling of filenames with space
I got spaces in CURDIR for some reason
2020-02-21 16:02:30 +02:00
de8a0a21b2 Update Packaging
- Add changes from OSX build to local hscommon/build.py
- Update package.py & srcpkg.sh
  - Remove invalid submodule references
  - Update srcpkg.sh to use xz
- Update package.py pyinstaller configuration
  - Call PyInstaller inline
  - Add --noconfirm option to be more script friendly
  - Add UCRT Redist location to path should fix #545 as now all the dlls
    are included
2019-12-31 21:36:52 -06:00
7ba8aa3514 Format files with black
- Format all files with black
- Update tox.ini flake8 arguments to be compatible
- Add black to requirements-extra.txt
- Reduce ignored flake8 rules and fix a few violations
2019-12-31 20:16:27 -06:00
359d6498f7 Update documentation & CI
- Remove references to submodules as they are no longer used
- Update top level readme with more recent status
- Update travis configuration to use python 3.7 instead of latest for now
2019-12-31 17:33:17 -06:00
2ea02bd7b5 Update hscommon/build.py
Update changelog format to use changes from
https://github.com/hsoft/hscommon/pull/6/.  This allows for changes from
 #593 to work correctly.
2019-11-06 20:25:20 -06:00
8506d482af Merge pull request #593 from eugenesan/master
Update packaging for 4.0.4
2019-10-08 20:14:49 -05:00
411d0d6e4a Cross platform fix for makefile #575 2019-09-09 20:23:37 -05:00
95ff6b6b76 Add files from hscommon and qtlib 2019-09-09 19:54:28 -05:00
334f6fe989 Remove qtlib and hscommon submodules 2019-09-09 19:45:58 -05:00
Eugene San (eugenesan)
080bb8935c Update packaging for 4.0.4
* Fix main version (Don't use spaces and capitals in versions!)
* Change debian changelog format in hscommon
* Fix build cleanup
* Switch to XZ compression
* Update build instructions
* Build single package for both Debian/Ubuntu
* Update packaging
2019-08-29 14:50:41 -07:00
ad2a07a289 Merge pull request #572 from jpvlsmv/issue-570
Issue 570 - CI process improvements
2019-05-23 18:08:41 -05:00
Joe Moore
c61a7f1aaf Use 3-ending python names consistently 2019-05-23 10:43:28 -04:00
Joe Moore
f536f32b19 Reference standard dependencies on Windows 2019-05-23 10:40:42 -04:00
Joe Moore
8cdff7b48c Define tox windows test environment 2019-05-22 11:31:07 -04:00
Joe Moore
718e99e880 Explicitly call tox windows environment on windows 2019-05-22 11:29:37 -04:00
Joe Moore
3c2ef97ee2 Install requisites in install task, move tox-travis into -extras 2019-05-21 10:45:02 -04:00
Joe Moore
2f439d0fc7 Install requisites in install task, move tox-travis into -extras 2019-05-21 10:44:40 -04:00
Joe Moore
4f234f272f Increase tox verbosity 2019-05-21 10:19:04 -04:00
Joe Moore
18acaae888 Attempt to build dupeguru before running the tox cases 2019-05-21 10:18:41 -04:00
Joe Moore
be7d558dfe Add Windows build to the matrix 2019-05-18 14:36:43 -04:00
Joe Moore
0b12236537 Switch to explicit matrix build 2019-05-18 14:35:10 -04:00
Joe Moore
ed2a0bcd4d Drop python 3.4 and test py 3.7 instead 2019-05-18 13:50:24 -04:00
318 changed files with 37919 additions and 10268 deletions

13
.github/FUNDING.yml vendored Normal file

@@ -0,0 +1,13 @@
# These are supported funding model platforms
github: arsenetar
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']


@@ -1,24 +0,0 @@
# Instructions
1. Provide a short descriptive title for the issue. A good example is 'Results window appears off screen.', a non-optimal example is 'Problem with App'.
2. Please fill out either the 'Bug / Issue' or the 'Feature Request' section. Replace values in ` `.
3. Delete these instructions and the unused sections.
# Bug / issue Report
System Information:
- DupeGuru Version: `version`
- Operating System: `Windows/Linux/OSX` `distribution` `version`
If using the source distribution and building yourself also provide (otherwise remove):
- Python Version: `version ex. 3.6.6` `32/64bit`
- Complier: `gcc/llvm/msvc` `version`
## Description
`Provide a detailed description of the issue to help reproduce it. If it happens after a specific sequence of events provide them here.`
## Debug Log
```
If reporting an error provide the debug log and/or the error message information. If the debug log is short < 40 lines you can provide it here, otherwise attach the text file to this issue.
```
# Feature Requests
`Provide a detailed description of the feature.`

31
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file

@@ -0,0 +1,31 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. Windows 10 / OSX 10.15 / Ubuntu 20.04 / Arch Linux]
- Version [e.g. 4.1.0]
**Additional context**
Add any other context about the problem here. You may include the debug log although it is normally best to attach it as a file.


@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: feature
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

50
.github/workflows/codeql-analysis.yml vendored Normal file

@@ -0,0 +1,50 @@
name: "CodeQL"
on:
push:
branches: [master]
pull_request:
# The branches below must be a subset of the branches above
branches: [master]
schedule:
- cron: "24 20 * * 2"
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: ["cpp", "python"]
steps:
- name: Checkout repository
uses: actions/checkout@v2
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v1
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main
- if: matrix.language == 'cpp'
name: Build Cpp
run: |
sudo apt-get update
sudo apt-get install python3-pyqt5
make modules
- if: matrix.language == 'python'
name: Autobuild
uses: github/codeql-action/autobuild@v1
# Analysis
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v1

84
.github/workflows/default.yml vendored Normal file

@@ -0,0 +1,84 @@
# Workflow lints, and checks format in parallel then runs tests on all platforms
name: Default CI/CD
on:
push:
branches: [master]
pull_request:
branches: [master]
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.10
uses: actions/setup-python@v2
with:
python-version: "3.10"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt -r requirements-extra.txt
- name: Lint with flake8
run: |
flake8 .
format:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python 3.10
uses: actions/setup-python@v2
with:
python-version: "3.10"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt -r requirements-extra.txt
- name: Check format with black
run: |
black .
test:
needs: [lint, format]
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: [3.7, 3.8, 3.9, "3.10"]
exclude:
- os: macos-latest
python-version: 3.7
- os: macos-latest
python-version: 3.8
- os: macos-latest
python-version: 3.9
- os: windows-latest
python-version: 3.7
- os: windows-latest
python-version: 3.8
- os: windows-latest
python-version: 3.9
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt -r requirements-extra.txt
- name: Build python modules
run: |
python build.py --modules
- name: Run tests
run: |
pytest core hscommon
- name: Upload Artifacts
if: matrix.os == 'ubuntu-latest'
uses: actions/upload-artifact@v3
with:
name: modules ${{ matrix.python-version }}
path: ${{ github.workspace }}/**/*.so

120
.gitignore vendored

@@ -1,25 +1,111 @@
.DS_Store
__pycache__
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.waf*
.lock-waf*
.tox
/tags
#*.pot
build
dist
env*
/deps
cocoa/autogen
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
/run.py
/cocoa/*/Info.plist
/cocoa/*/build
# Environments
.env
.venv
env*/
venv/
ENV/
env.bak/
venv.bak/
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# macOS
.DS_Store
# Visual Studio Code
.vscode/*
!.vscode/settings.json
#!.vscode/tasks.json
#!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets
# Local History for Visual Studio Code
.history/
# Built Visual Studio Code Extensions
*.vsix
# dupeGuru Specific
/qt/*_rc.py
/help/*/conf.py
/help/*/changelog.rst
cocoa/autogen
/cocoa/*/Info.plist
/cocoa/*/build
*.pyd
*.exe
*.spec
*.waf*
.lock-waf*
/tags

6
.gitmodules vendored

@@ -1,6 +0,0 @@
[submodule "qtlib"]
path = qtlib
url = https://github.com/hsoft/qtlib.git
[submodule "hscommon"]
path = hscommon
url = https://github.com/hsoft/hscommon.git

1
.sonarcloud.properties Normal file

@@ -0,0 +1 @@
sonar.python.version=3.7, 3.8, 3.9, 3.10


@@ -1,11 +0,0 @@
sudo: false
dist: xenial
language: python
python:
- "3.4"
- "3.5"
- "3.6"
- "3.7"
install: pip install tox-travis
script: tox


@@ -1,19 +1,25 @@
[main]
host = https://www.transifex.com
[dupeguru.core]
file_filter = locale/<lang>/LC_MESSAGES/core.po
source_file = locale/core.pot
source_lang = en
type = PO
[dupeguru.columns]
[o:voltaicideas:p:dupeguru-1:r:columns]
file_filter = locale/<lang>/LC_MESSAGES/columns.po
source_file = locale/columns.pot
source_lang = en
type = PO
[dupeguru.ui]
[o:voltaicideas:p:dupeguru-1:r:core]
file_filter = locale/<lang>/LC_MESSAGES/core.po
source_file = locale/core.pot
source_lang = en
type = PO
[o:voltaicideas:p:dupeguru-1:r:qtlib]
file_filter = qtlib/locale/<lang>/LC_MESSAGES/qtlib.po
source_file = qtlib/locale/qtlib.pot
source_lang = en
type = PO
[o:voltaicideas:p:dupeguru-1:r:ui]
file_filter = locale/<lang>/LC_MESSAGES/ui.po
source_file = locale/ui.pot
source_lang = en

10
.vscode/extensions.json vendored Normal file

@@ -0,0 +1,10 @@
{
// List of extensions which should be recommended for users of this workspace.
"recommendations": [
"redhat.vscode-yaml",
"ms-python.vscode-pylance",
"ms-python.python"
],
// List of extensions recommended by VS Code that should not be recommended for users of this workspace.
"unwantedRecommendations": []
}

12
.vscode/settings.json vendored Normal file

@@ -0,0 +1,12 @@
{
"python.formatting.provider": "black",
"cSpell.words": [
"Dupras",
"hscommon"
],
"python.languageServer": "Pylance",
"yaml.schemaStore.enable": true,
"yaml.schemas": {
"https://json.schemastore.org/github-workflow.json": ".github/workflows/*.yml"
}
}

88
CONTRIBUTING.md Normal file

@@ -0,0 +1,88 @@
# Contributing to dupeGuru
The following is a set of guidelines and information for contributing to dupeGuru.
#### Table of Contents
[Things to Know Before Starting](#things-to-know-before-starting)
[Ways to Contribute](#ways-to-contribute)
* [Reporting Bugs](#reporting-bugs)
* [Suggesting Enhancements](#suggesting-enhancements)
* [Localization](#localization)
* [Code Contribution](#code-contribution)
* [Pull Requests](#pull-requests)
[Style Guides](#style-guides)
* [Git Commit Messages](#git-commit-messages)
* [Python Style Guide](#python-style-guide)
* [Documentation Style Guide](#documentation-style-guide)
[Additional Notes](#additional-notes)
* [Issue and Pull Request Labels](#issue-and-pull-request-labels)
## Things to Know Before Starting
**TODO**
## Ways to contribute
### Reporting Bugs
**TODO**
### Suggesting Enhancements
**TODO**
### Localization
**TODO**
### Code Contribution
**TODO**
### Pull Requests
Please follow these steps to have your contribution considered by the maintainers:
1. Keep Pull Request specific to one feature or bug.
2. Follow the [style guides](#style-guides)
3. After you submit your pull request, verify that all [status checks](https://help.github.com/articles/about-status-checks/) are passing <details><summary>What if the status checks are failing?</summary>If a status check is failing, and you believe that the failure is unrelated to your change, please leave a comment on the pull request explaining why you believe the failure is unrelated. A maintainer will re-run the status check for you. If we conclude that the failure was a false positive, then we will open an issue to track that problem with our status check suite.</details>
While the prerequisites above must be satisfied prior to having your pull request reviewed, the reviewer(s) may ask you to complete additional design work, tests, or other changes before your pull request can be ultimately accepted.
## Style Guides
### Git Commit Messages
- Use the present tense ("Add feature" not "Added feature")
- Use the imperative mood ("Move cursor to..." not "Moves cursor to...")
- Limit the first line to 72 characters or less
- Reference issues and pull requests liberally after the first line
### Python Style Guide
- All files are formatted with [Black](https://github.com/psf/black)
- Follow [PEP 8](https://peps.python.org/pep-0008/) as much as practical
- Pass [flake8](https://flake8.pycqa.org/en/latest/) linting
- Include [PEP 484](https://peps.python.org/pep-0484/) type hints (new code)
### Documentation Style Guide
**TODO**
## Additional Notes
### Issue and Pull Request Labels
This section lists and describes the various labels used with issues and pull requests. Each of the labels is listed with a search link as well.
#### Issue Type and Status
| Label name | Search | Description |
|------------|--------|-------------|
| `enhancement` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aenhancement) | Feature requests and enhancements. |
| `bug` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Abug) | Bug reports. |
| `duplicate` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aduplicate) | Issue is a duplicate of existing issue. |
| `needs-reproduction` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aneeds-reproduction) | A bug that has not been able to be reproduced. |
| `needs-information` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aneeds-information) | More information needs to be collected about these problems or feature requests (e.g. steps to reproduce). |
| `blocked` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Ablocked) | Issue blocked by other issues. |
| `beginner` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Abeginner) | Less complex issues for users who want to start contributing. |
#### Category Labels
| Label name | Search | Description |
|------------|--------|-------------|
| `3rd party` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3A%223rd%20party%22) | Related to a 3rd party dependency. |
| `crash` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Acrash) | Related to crashes (complete, or unhandled). |
| `documentation` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Adocumentation) | Related to any documentation. |
| `linux` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3linux) | Related to running on Linux. |
| `mac` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Amac) | Related to running on macOS. |
| `performance` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aperformance) | Related to the performance. |
| `ui` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aui)| Related to the visual design. |
| `windows` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Awindows) | Related to running on Windows. |
#### Pull Request Labels
None at this time, if the volume of Pull Requests increase labels may be added to manage.


@@ -1,6 +1,8 @@
To know who contributed to dupeGuru, you can look at the commit log, but not all contributions
result in a commit. This file lists contributors who don't necessarily appear in the commit log.
* Jason Cho, Exchange icon
* schollidesign (https://findicons.com/pack/1035/human_o2), Zoom-in, Zoom-out, Zoom-best-fit, Zoom-original icons
* Jérôme Cantin, Main icon
* Gregor Tätzner, German localization
* Frank Weber, German localization

6
MANIFEST.in Normal file

@@ -0,0 +1,6 @@
recursive-include core *.h
recursive-include core *.m
include run.py
graft locale
graft help
graft qtlib/locale


@@ -1,7 +1,7 @@
PYTHON ?= python3
PYTHON_VERSION_MINOR := $(shell ${PYTHON} -c "import sys; print(sys.version_info.minor)")
PYRCC5 ?= pyrcc5
REQ_MINOR_VERSION = 4
REQ_MINOR_VERSION = 7
PREFIX ?= /usr/local
# Windows compatibility via Msys2
@@ -9,13 +9,13 @@ PREFIX ?= /usr/local
# - compile generates .pyd instead of .so
# - venv with --system-site-packages has issues on windows as well...
ifeq ($(shell uname -o), Msys)
ifeq ($(shell ${PYTHON} -c "import platform; print(platform.system())"), Windows)
BIN = Scripts
SO = *.pyd
VENV_OPTIONS =
else
BIN = bin
SO = cpython-3$(PYTHON_VERSION_MINOR)m*.so
SO = *.so
VENV_OPTIONS = --system-site-packages
endif
@@ -34,7 +34,6 @@ endif
# Our build scripts are not very "make like" yet and perform their task in a bundle. For now, we
# use one of each file to act as a representative, a target, of these groups.
submodules_target = hscommon/__init__.py
packages = hscommon qtlib core qt
localedirs = $(wildcard locale/*/LC_MESSAGES)
@@ -44,17 +43,17 @@ mofiles = $(patsubst %.po,%.mo,$(pofiles))
vpath %.po $(localedirs)
vpath %.mo $(localedirs)
all : | env i18n modules qt/dg_rc.py
all: | env i18n modules qt/dg_rc.py
@echo "Build complete! You can run dupeGuru with 'make run'"
run:
$(VENV_PYTHON) run.py
pyc:
${PYTHON} -m compileall ${packages}
pyc: | env
${VENV_PYTHON} -m compileall ${packages}
reqs :
ifneq ($(shell test $(PYTHON_VERSION_MINOR) -gt $(REQ_MINOR_VERSION); echo $$?),0)
reqs:
ifneq ($(shell test $(PYTHON_VERSION_MINOR) -ge $(REQ_MINOR_VERSION); echo $$?),0)
$(error "Python 3.${REQ_MINOR_VERSION}+ required. Aborting.")
endif
ifndef NO_VENV
@@ -64,12 +63,7 @@ endif
@${PYTHON} -c 'import PyQt5' >/dev/null 2>&1 || \
{ echo "PyQt 5.4+ required. Install it and try again. Aborting"; exit 1; }
# Ensure that submodules are initialized
$(submodules_target) :
git submodule init
git submodule update
env : | $(submodules_target) reqs
env: | reqs
ifndef NO_VENV
@echo "Creating our virtualenv"
${PYTHON} -m venv env
@@ -79,40 +73,26 @@ ifndef NO_VENV
${PYTHON} -m venv --upgrade ${VENV_OPTIONS} env
endif
build/help : | env
build/help: | env
$(VENV_PYTHON) build.py --doc
qt/dg_rc.py : qt/dg.qrc
qt/dg_rc.py: qt/dg.qrc
$(PYRCC5) qt/dg.qrc > qt/dg_rc.py
i18n: $(mofiles)
%.mo : %.po
%.mo: %.po
msgfmt -o $@ $<
core/pe/_block.$(SO) : core/pe/modules/block.c core/pe/modules/common.c
$(PYTHON) hscommon/build_ext.py $^ _block
mv _block.$(SO) core/pe
modules: | env
$(VENV_PYTHON) build.py --modules
core/pe/_cache.$(SO) : core/pe/modules/cache.c core/pe/modules/common.c
$(PYTHON) hscommon/build_ext.py $^ _cache
mv _cache.$(SO) core/pe
qt/pe/_block_qt.$(SO) : qt/pe/modules/block.c
$(PYTHON) hscommon/build_ext.py $^ _block_qt
mv _block_qt.$(SO) qt/pe
modules : core/pe/_block.$(SO) core/pe/_cache.$(SO) qt/pe/_block_qt.$(SO)
mergepot :
mergepot: | env
$(VENV_PYTHON) build.py --mergepot
normpo :
normpo: | env
$(VENV_PYTHON) build.py --normpo
srcpkg :
./scripts/srcpkg.sh
install: all pyc
mkdir -p ${DESTDIR}${PREFIX}/share/dupeguru
cp -rf ${packages} locale ${DESTDIR}${PREFIX}/share/dupeguru
@@ -129,7 +109,7 @@ installdocs: build/help
mkdir -p ${DESTDIR}${PREFIX}/share/dupeguru
cp -rf build/help ${DESTDIR}${PREFIX}/share/dupeguru
uninstall :
uninstall:
rm -rf "${DESTDIR}${PREFIX}/share/dupeguru"
rm -f "${DESTDIR}${PREFIX}/bin/dupeguru"
rm -f "${DESTDIR}${PREFIX}/share/applications/dupeguru.desktop"
@@ -140,4 +120,4 @@ clean:
-rm locale/*/LC_MESSAGES/*.mo
-rm core/pe/*.$(SO) qt/pe/*.$(SO)
.PHONY : clean srcpkg normpo mergepot modules i18n reqs run pyc install uninstall all
.PHONY: clean normpo mergepot modules i18n reqs run pyc install uninstall all


@@ -1,68 +1,90 @@
# dupeGuru
[dupeGuru][dupeguru] is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in
a system. It's written mostly in Python 3 and has the peculiarity of using
a system. It is written mostly in Python 3 and has the peculiarity of using
[multiple GUI toolkits][cross-toolkit], all using the same core Python code. On OS X, the UI layer
is written in Objective-C and uses Cocoa. On Linux, it's written in Python and uses Qt5.
is written in Objective-C and uses Cocoa. On Linux, it is written in Python and uses Qt5.
The Cocoa UI of dupeGuru is hosted in a separate repo: https://github.com/hsoft/dupeguru-cocoa
The Cocoa UI of dupeGuru is hosted in a separate repo: https://github.com/arsenetar/dupeguru-cocoa
## Current status: Additional Maintainers Wanted (/ Note on Things in General)
When I started contributing to dupeGuru, it was to help provide an updated Windows build for dupeGuru. I hoped to contribute more over time and help work through some of the general issues as well. Since Virgil Dupras left as the lead maintainer, I have not been able to devote enough time to work through as many issues as I had hoped. Now I am going to be devoting a more consistent amount of time each month to work on dupeGuru, however I will not be able to get to all issues. Additionally there are a few specific areas where additional help would be appreciated:
- OSX maintenance
- UI issues (I have no experience with cocoa)
- General issues & releases (I lack OSX environments / hardware to develop and test on, looking into doing builds through Travis CI.)
- Linux maintenance
- Packaging (I have not really done much linux packaging yet, although will be spending some time trying to get at least .deb and potentially ppa's updated.)
I am still working to update the new site & update links within the help and the repository to use the new urls. Additionally, hoping to get a 4.0.4 release out this year for at least Windows and Linux.
Thanks,
Andrew Senetar
## Current status
Still looking for additional help especially with regards to:
* OSX maintenance: reproducing bugs & cocoa version, building package with Cocoa UI.
* Linux maintenance: reproducing bugs, maintaining PPA repository, Debian package.
* Translations: updating missing strings, transifex project at https://www.transifex.com/voltaicideas/dupeguru-1
* Documentation: keeping it up-to-date.
## Contents of this folder
This folder contains the source for dupeGuru. Its documentation is in `help`, but is also
[available online][documentation] in its built form. Here's how this source tree is organised:
[available online][documentation] in its built form. Here's how this source tree is organized:
* core: Contains the core logic code for dupeGuru. It's Python code.
* qt: UI code for the Qt toolkit. It's written in Python and uses PyQt.
* images: Images used by the different UI codebases.
* pkg: Skeleton files required to create different packages
* help: Help document, written for Sphinx.
* locale: .po files for localisation.
There are also other sub-folder that comes from external repositories and are part of this repo as
git submodules:
* locale: .po files for localization.
* hscommon: A collection of helpers used across HS applications.
* qtlib: A collection of helpers used across Qt UI codebases of HS applications.
## How to build dupeGuru from source
### Windows
### Windows & macOS specific additional instructions
For windows instructions see the [Windows Instructions](Windows.md).
### Prerequisites
For macos instructions (qt version) see the [macOS Instructions](macos.md).
* [Python 3.4+][python]
### Prerequisites
* [Python 3.7+][python]
* PyQt5
### make
### System Setup
When running in a linux based environment the following system packages or equivalents are needed to build:
* python3-pyqt5
* pyqt5-dev-tools (on some systems, see note)
* python3-wheel (for hsaudiotag3k)
* python3-venv (only if using a virtual environment)
* python3-dev
* build-essential
dupeGuru is built with "make":
$ make
$ make run
Note: On some linux systems pyrcc5 is not put on the path when installing python3-pyqt5, this will cause some issues with the resource files (and icons). These systems should have a respective pyqt5-dev-tools package, which should also be installed. The presence of pyrcc5 can be checked with `which pyrcc5`. Debian based systems need the extra package, and Arch does not.
### Generate Ubuntu packages
To create packages the following are also needed:
* python3-setuptools
* debhelper
$ bash -c "pyvenv --system-site-packages env && source env/bin/activate && pip install -r requirements.txt && python3 build.py --clean && python3 package.py"
### Building with Make
dupeGuru comes with a makefile that can be used to build and run:
### Running tests
$ make && make run
### Building without Make
$ cd <dupeGuru directory>
$ python3 -m venv --system-site-packages ./env
$ source ./env/bin/activate
$ pip install -r requirements.txt
$ python build.py
$ python run.py
### Generating Debian/Ubuntu package
To generate packages the extra requirements in requirements-extra.txt must be installed, the
steps are as follows:
$ cd <dupeGuru directory>
$ python3 -m venv --system-site-packages ./env
$ source ./env/bin/activate
$ pip install -r requirements.txt -r requirements-extra.txt
$ python build.py --clean
$ python package.py
This can be made a one-liner (once in the directory) as:
$ bash -c "python3 -m venv --system-site-packages env && source env/bin/activate && pip install -r requirements.txt -r requirements-extra.txt && python build.py --clean && python package.py"
## Running tests
The complete test suite is run with [Tox 1.7+][tox]. If you have it installed system-wide, you
don't even need to set up a virtualenv. Just `cd` into the root project folder and run `tox`.


@@ -2,28 +2,26 @@
### Prerequisites
- [Python 3.5+][python]
- [Visual Studio 2017][vs] or [Visual Studio Build Tools 2017][vsBuildTools] with the Windows 10 SDK
- [Python 3.7+][python]
- [Visual Studio 2019][vs] or [Visual Studio Build Tools 2019][vsBuildTools] with the Windows 10 SDK
- [nsis][nsis] (for installer creation)
- [msys2][msys2] (for using makefile method)
When installing Visual Studio or the Visual Studio Build Tools with the Windows 10 SDK on versions of Windows below 10 be sure to make sure that the Universal CRT is installed before installing Visual studio as noted in the [Windows 10 SDK Notes][win10sdk] and found at [KB2999226][KB2999226].
NOTE: When installing Visual Studio or the Visual Studio Build Tools with the Windows 10 SDK on versions of Windows below 10 be sure to make sure that the Universal CRT is installed before installing Visual studio as noted in the [Windows 10 SDK Notes][win10sdk] and found at [KB2999226][KB2999226].
After installing python it is recommended to update setuptools before compiling packages. To update run (example is for python launcher and 3.5):
After installing python it is recommended to update setuptools before compiling packages. To update run (example is for python launcher and 3.8):
$ py -3.5 -m pip install --upgrade setuptools
$ py -3.8 -m pip install --upgrade setuptools
More details on setting up python for compiling packages on windows can be found on the [python wiki][pythonWindowsCompilers]
More details on setting up python for compiling packages on windows can be found on the [python wiki][pythonWindowsCompilers] Take note of the required vc++ versions.
### With build.py (preferred)
To build with a different python version 3.5 vs 3.6 or 32 bit vs 64 bit specify that version instead of -3.5 to the `py` command below. If you want to build additional versions while keeping all virtual environments setup use a different location for each vritual environment.
To build with a different python version 3.7 vs 3.8 or 32 bit vs 64 bit specify that version instead of -3.8 to the `py` command below. If you want to build additional versions while keeping all virtual environments setup use a different location for each virtual environment.
$ cd <dupeGuru directory>
$ git submodule init
$ git submodule update
$ py -3.5 -m venv .\env
$ py -3.8 -m venv .\env
$ .\env\Scripts\activate
$ pip install -r requirements.txt -r requirements-windows.txt
$ pip install -r requirements.txt
$ python build.py
$ python run.py
@@ -36,23 +34,21 @@ It is possible to build dupeGuru with the makefile on windows using a compatable
Then the following execution of the makefile should work. Pass the correct value for PYTHON to the makefile if not on the path as python3.
$ cd <dupeGuru directory>
$ make PYTHON='py -3.5'
$ make PYTHON='py -3.8'
$ make run
NOTE: Install PyQt5 & cx-Freeze with requirements-windows.txt into the venv before running the packaging scripts in the section below.
### Generate Windows Installer Packages
You need to use the respective x86 or x64 version of python to build the 32 bit and 64 bit versions. The build scripts will automatically detect the python architecture for you. When using build.py make sure the resulting python works before continuing to package.py. NOTE: package.py looks for the 'makensis' executable in the default location for a 64 bit windows system. Run the following in the respective virtual environment.
You need to use the respective x86 or x64 version of python to build the 32 bit and 64 bit versions. The build scripts will automatically detect the python architecture for you. When using build.py make sure the resulting python works before continuing to package.py. NOTE: package.py looks for the 'makensis' executable in the default location for a 64 bit windows system. The extra requirements need to be installed to run packaging: `pip install -r requirements-extra.txt`. Run the following in the respective virtual environment.
$ python package.py
### Running tests
The complete test suite can be run with tox just like on linux.
The complete test suite can be run with tox just like on linux. NOTE: The extra requirements need to be installed to run unit tests: `pip install -r requirements-extra.txt`.
[python]: http://www.python.org/
[nsis]: http://nsis.sourceforge.net/Main_Page
[vs]: https://www.visualstudio.com/downloads/#visual-studio-community-2017
[vsBuildTools]: https://www.visualstudio.com/downloads/#build-tools-for-visual-studio-2017
[vs]: https://www.visualstudio.com/downloads/#visual-studio-community-2019
[vsBuildTools]: https://www.visualstudio.com/downloads/#build-tools-for-visual-studio-2019
[win10sdk]: https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk
[KB2999226]: https://support.microsoft.com/en-us/help/2999226/update-for-universal-c-runtime-in-windows
[pythonWindowsCompilers]: https://wiki.python.org/moin/WindowsCompilers

173
build.py

@@ -4,136 +4,162 @@
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
import os
import os.path as op
from pathlib import Path
import sys
from optparse import OptionParser
import shutil
from multiprocessing import Pool
from setuptools import setup, Extension
from setuptools import sandbox
from hscommon import sphinxgen
from hscommon.build import (
add_to_pythonpath, print_and_do, move_all, fix_qt_resource_file,
add_to_pythonpath,
print_and_do,
fix_qt_resource_file,
)
from hscommon import loc
def parse_args():
usage = "usage: %prog [options]"
parser = OptionParser(usage=usage)
parser.add_option(
'--clean', action='store_true', dest='clean',
help="Clean build folder before building"
"--clean",
action="store_true",
dest="clean",
help="Clean build folder before building",
)
parser.add_option("--doc", action="store_true", dest="doc", help="Build only the help file (en)")
parser.add_option("--alldoc", action="store_true", dest="all_doc", help="Build only the help file in all languages")
parser.add_option("--loc", action="store_true", dest="loc", help="Build only localization")
parser.add_option(
"--updatepot",
action="store_true",
dest="updatepot",
help="Generate .pot files from source code.",
)
parser.add_option(
'--doc', action='store_true', dest='doc',
help="Build only the help file"
"--mergepot",
action="store_true",
dest="mergepot",
help="Update all .po files based on .pot files.",
)
parser.add_option(
'--loc', action='store_true', dest='loc',
help="Build only localization"
"--normpo",
action="store_true",
dest="normpo",
help="Normalize all PO files (do this before commit).",
)
parser.add_option(
'--updatepot', action='store_true', dest='updatepot',
help="Generate .pot files from source code."
)
parser.add_option(
'--mergepot', action='store_true', dest='mergepot',
help="Update all .po files based on .pot files."
)
parser.add_option(
'--normpo', action='store_true', dest='normpo',
help="Normalize all PO files (do this before commit)."
"--modules",
action="store_true",
dest="modules",
help="Build the python modules.",
)
(options, args) = parser.parse_args()
return options
def build_one_help(language):
print("Generating Help in {}".format(language))
current_path = Path(".").absolute()
changelog_path = current_path.joinpath("help", "changelog")
tixurl = "https://github.com/arsenetar/dupeguru/issues/{}"
changelogtmpl = current_path.joinpath("help", "changelog.tmpl")
conftmpl = current_path.joinpath("help", "conf.tmpl")
help_basepath = current_path.joinpath("help", language)
help_destpath = current_path.joinpath("build", "help", language)
confrepl = {"language": language}
sphinxgen.gen(
help_basepath,
help_destpath,
changelog_path,
tixurl,
confrepl,
conftmpl,
changelogtmpl,
)
def build_help():
print("Generating Help")
current_path = op.abspath('.')
help_basepath = op.join(current_path, 'help', 'en')
help_destpath = op.join(current_path, 'build', 'help')
changelog_path = op.join(current_path, 'help', 'changelog')
tixurl = "https://github.com/hsoft/dupeguru/issues/{}"
confrepl = {'language': 'en'}
changelogtmpl = op.join(current_path, 'help', 'changelog.tmpl')
conftmpl = op.join(current_path, 'help', 'conf.tmpl')
sphinxgen.gen(help_basepath, help_destpath, changelog_path, tixurl, confrepl, conftmpl, changelogtmpl)
languages = ["en", "de", "fr", "hy", "ru", "uk"]
# Running with Pools as for some reason sphinx seems to cross contaminate the output otherwise
with Pool(len(languages)) as p:
p.map(build_one_help, languages)
def build_qt_localizations():
loc.compile_all_po(op.join('qtlib', 'locale'))
loc.merge_locale_dir(op.join('qtlib', 'locale'), 'locale')
loc.compile_all_po(Path("qtlib", "locale"))
loc.merge_locale_dir(Path("qtlib", "locale"), "locale")
def build_localizations():
loc.compile_all_po('locale')
loc.compile_all_po("locale")
build_qt_localizations()
locale_dest = op.join('build', 'locale')
if op.exists(locale_dest):
locale_dest = Path("build", "locale")
if locale_dest.exists():
shutil.rmtree(locale_dest)
shutil.copytree('locale', locale_dest, ignore=shutil.ignore_patterns('*.po', '*.pot'))
shutil.copytree("locale", locale_dest, ignore=shutil.ignore_patterns("*.po", "*.pot"))
def build_updatepot():
print("Building .pot files from source files")
print("Building core.pot")
loc.generate_pot(['core'], op.join('locale', 'core.pot'), ['tr'])
loc.generate_pot(["core"], Path("locale", "core.pot"), ["tr"])
print("Building columns.pot")
loc.generate_pot(['core'], op.join('locale', 'columns.pot'), ['coltr'])
loc.generate_pot(["core"], Path("locale", "columns.pot"), ["coltr"])
print("Building ui.pot")
# When we're not under OS X, we don't want to overwrite ui.pot because it contains Cocoa locs
# We want to merge the generated pot with the old pot in the most preserving way possible.
ui_packages = ['qt', op.join('cocoa', 'inter')]
loc.generate_pot(ui_packages, op.join('locale', 'ui.pot'), ['tr'], merge=True)
ui_packages = ["qt", Path("cocoa", "inter")]
loc.generate_pot(ui_packages, Path("locale", "ui.pot"), ["tr"], merge=True)
print("Building qtlib.pot")
loc.generate_pot(['qtlib'], op.join('qtlib', 'locale', 'qtlib.pot'), ['tr'])
loc.generate_pot(["qtlib"], Path("qtlib", "locale", "qtlib.pot"), ["tr"])
def build_mergepot():
print("Updating .po files using .pot files")
loc.merge_pots_into_pos('locale')
loc.merge_pots_into_pos(op.join('qtlib', 'locale'))
loc.merge_pots_into_pos(op.join('cocoalib', 'locale'))
loc.merge_pots_into_pos("locale")
loc.merge_pots_into_pos(Path("qtlib", "locale"))
# loc.merge_pots_into_pos(Path("cocoalib", "locale"))
def build_normpo():
loc.normalize_all_pos('locale')
loc.normalize_all_pos(op.join('qtlib', 'locale'))
loc.normalize_all_pos(op.join('cocoalib', 'locale'))
loc.normalize_all_pos("locale")
loc.normalize_all_pos(Path("qtlib", "locale"))
# loc.normalize_all_pos(Path("cocoalib", "locale"))
def build_pe_modules():
print("Building PE Modules")
exts = [
Extension(
"_block",
[op.join('core', 'pe', 'modules', 'block.c'), op.join('core', 'pe', 'modules', 'common.c')]
),
Extension(
"_cache",
[op.join('core', 'pe', 'modules', 'cache.c'), op.join('core', 'pe', 'modules', 'common.c')]
),
]
exts.append(Extension("_block_qt", [op.join('qt', 'pe', 'modules', 'block.c')]))
setup(
script_args=['build_ext', '--inplace'],
ext_modules=exts,
)
move_all('_block_qt*', op.join('qt', 'pe'))
move_all('_block*', op.join('core', 'pe'))
move_all('_cache*', op.join('core', 'pe'))
# Leverage setup.py to build modules
sandbox.run_setup("setup.py", ["build_ext", "--inplace"])
def build_normal():
print("Building dupeGuru with UI qt")
add_to_pythonpath('.')
add_to_pythonpath(".")
print("Building dupeGuru")
build_pe_modules()
print("Building localizations")
build_localizations()
print("Building Qt stuff")
print_and_do("pyrcc5 {0} > {1}".format(op.join('qt', 'dg.qrc'), op.join('qt', 'dg_rc.py')))
fix_qt_resource_file(op.join('qt', 'dg_rc.py'))
print_and_do("pyrcc5 {0} > {1}".format(Path("qt", "dg.qrc"), Path("qt", "dg_rc.py")))
fix_qt_resource_file(Path("qt", "dg_rc.py"))
build_help()
def main():
if sys.version_info < (3, 7):
sys.exit("Python < 3.7 is unsupported.")
options = parse_args()
if not op.exists('build'):
os.mkdir('build')
if options.clean and Path("build").exists():
shutil.rmtree("build")
if not Path("build").exists():
Path("build").mkdir()
if options.doc:
build_one_help("en")
elif options.all_doc:
build_help()
elif options.loc:
build_localizations()
@@ -143,8 +169,11 @@ def main():
build_mergepot()
elif options.normpo:
build_normpo()
elif options.modules:
build_pe_modules()
else:
build_normal()
if __name__ == '__main__':
if __name__ == "__main__":
main()


@@ -1,3 +1,2 @@
__version__ = '4.0.4 RC'
__appname__ = 'dupeGuru'
__version__ = "4.2.1"
__appname__ = "dupeGuru"


@@ -26,16 +26,18 @@ from .pe.photo import get_delta_dimensions
from .util import cmp_value, fix_surrogate_encoding
from . import directories, results, export, fs, prioritize
from .ignore import IgnoreList
from .exclude import ExcludeDict as ExcludeList
from .scanner import ScanType
from .gui.deletion_options import DeletionOptions
from .gui.details_panel import DetailsPanel
from .gui.directory_tree import DirectoryTree
from .gui.ignore_list_dialog import IgnoreListDialog
from .gui.exclude_list_dialog import ExcludeListDialogCore
from .gui.problem_dialog import ProblemDialog
from .gui.stats_label import StatsLabel
HAD_FIRST_LAUNCH_PREFERENCE = 'HadFirstLaunch'
DEBUG_MODE_PREFERENCE = 'DebugMode'
HAD_FIRST_LAUNCH_PREFERENCE = "HadFirstLaunch"
DEBUG_MODE_PREFERENCE = "DebugMode"
MSG_NO_MARKED_DUPES = tr("There are no marked duplicates. Nothing has been done.")
MSG_NO_SELECTED_DUPES = tr("There are no selected duplicates. Nothing has been done.")
@@ -44,31 +46,36 @@ MSG_MANY_FILES_TO_OPEN = tr(
"files are opened with, doing so can create quite a mess. Continue?"
)
class DestType:
Direct = 0
Relative = 1
Absolute = 2
DIRECT = 0
RELATIVE = 1
ABSOLUTE = 2
class JobType:
Scan = 'job_scan'
Load = 'job_load'
Move = 'job_move'
Copy = 'job_copy'
Delete = 'job_delete'
SCAN = "job_scan"
LOAD = "job_load"
MOVE = "job_move"
COPY = "job_copy"
DELETE = "job_delete"
class AppMode:
Standard = 0
Music = 1
Picture = 2
STANDARD = 0
MUSIC = 1
PICTURE = 2
JOBID2TITLE = {
JobType.Scan: tr("Scanning for duplicates"),
JobType.Load: tr("Loading"),
JobType.Move: tr("Moving"),
JobType.Copy: tr("Copying"),
JobType.Delete: tr("Sending to Trash"),
JobType.SCAN: tr("Scanning for duplicates"),
JobType.LOAD: tr("Loading"),
JobType.MOVE: tr("Moving"),
JobType.COPY: tr("Copying"),
JobType.DELETE: tr("Sending to Trash"),
}
class DupeGuru(Broadcaster):
"""Holds everything together.
@@ -100,7 +107,8 @@ class DupeGuru(Broadcaster):
Instance of :mod:`meta-gui <core.gui>` table listing the results from :attr:`results`
"""
#--- View interface
# --- View interface
# get_default(key_name)
# set_default(key_name, value)
# show_message(msg)
@@ -116,37 +124,41 @@ class DupeGuru(Broadcaster):
NAME = PROMPT_NAME = "dupeGuru"
PICTURE_CACHE_TYPE = 'sqlite' # set to 'shelve' for a ShelveCache
PICTURE_CACHE_TYPE = "sqlite" # set to 'shelve' for a ShelveCache
def __init__(self, view):
def __init__(self, view, portable=False):
if view.get_default(DEBUG_MODE_PREFERENCE):
logging.getLogger().setLevel(logging.DEBUG)
logging.debug("Debug mode enabled")
Broadcaster.__init__(self)
self.view = view
self.appdata = desktop.special_folder_path(desktop.SpecialFolder.AppData, appname=self.NAME)
self.appdata = desktop.special_folder_path(desktop.SpecialFolder.APPDATA, appname=self.NAME, portable=portable)
if not op.exists(self.appdata):
os.makedirs(self.appdata)
self.app_mode = AppMode.Standard
self.app_mode = AppMode.STANDARD
self.discarded_file_count = 0
self.directories = directories.Directories()
self.exclude_list = ExcludeList()
hash_cache_file = op.join(self.appdata, "hash_cache.db")
fs.filesdb.connect(hash_cache_file)
self.directories = directories.Directories(self.exclude_list)
self.results = results.Results(self)
self.ignore_list = IgnoreList()
# In addition to "app-level" options, this dictionary also holds options that will be
# sent to the scanner. They don't have default values because those defaults values are
# defined in the scanner class.
self.options = {
'escape_filter_regexp': True,
'clean_empty_dirs': False,
'ignore_hardlink_matches': False,
'copymove_dest_type': DestType.Relative,
'picture_cache_type': self.PICTURE_CACHE_TYPE
"escape_filter_regexp": True,
"clean_empty_dirs": False,
"ignore_hardlink_matches": False,
"copymove_dest_type": DestType.RELATIVE,
"picture_cache_type": self.PICTURE_CACHE_TYPE,
}
self.selected_dupes = []
self.details_panel = DetailsPanel(self)
self.directory_tree = DirectoryTree(self)
self.problem_dialog = ProblemDialog(self)
self.ignore_list_dialog = IgnoreListDialog(self)
self.exclude_list_dialog = ExcludeListDialogCore(self)
self.stats_label = StatsLabel(self)
self.result_table = None
self.deletion_options = DeletionOptions()
@@ -155,13 +167,13 @@ class DupeGuru(Broadcaster):
for child in children:
child.connect()
#--- Private
# --- Private
def _recreate_result_table(self):
if self.result_table is not None:
self.result_table.disconnect()
if self.app_mode == AppMode.Picture:
if self.app_mode == AppMode.PICTURE:
self.result_table = pe.result_table.ResultTable(self)
elif self.app_mode == AppMode.Music:
elif self.app_mode == AppMode.MUSIC:
self.result_table = me.result_table.ResultTable(self)
else:
self.result_table = se.result_table.ResultTable(self)
@@ -169,26 +181,24 @@ class DupeGuru(Broadcaster):
self.view.create_results_window()
def _get_picture_cache_path(self):
cache_type = self.options['picture_cache_type']
cache_name = 'cached_pictures.shelve' if cache_type == 'shelve' else 'cached_pictures.db'
cache_type = self.options["picture_cache_type"]
cache_name = "cached_pictures.shelve" if cache_type == "shelve" else "cached_pictures.db"
return op.join(self.appdata, cache_name)
def _get_dupe_sort_key(self, dupe, get_group, key, delta):
if self.app_mode in (AppMode.Music, AppMode.Picture):
if key == 'folder_path':
dupe_folder_path = getattr(dupe, 'display_folder_path', dupe.folder_path)
if self.app_mode in (AppMode.MUSIC, AppMode.PICTURE) and key == "folder_path":
dupe_folder_path = getattr(dupe, "display_folder_path", dupe.folder_path)
return str(dupe_folder_path).lower()
if self.app_mode == AppMode.Picture:
if delta and key == 'dimensions':
if self.app_mode == AppMode.PICTURE and delta and key == "dimensions":
r = cmp_value(dupe, key)
ref_value = cmp_value(get_group().ref, key)
return get_delta_dimensions(r, ref_value)
if key == 'marked':
if key == "marked":
return self.results.is_marked(dupe)
if key == 'percentage':
if key == "percentage":
m = get_group().get_match_of(dupe)
return m.percentage
elif key == 'dupe_count':
elif key == "dupe_count":
return 0
else:
result = cmp_value(dupe, key)
@@ -202,15 +212,14 @@ class DupeGuru(Broadcaster):
return result
def _get_group_sort_key(self, group, key):
if self.app_mode in (AppMode.Music, AppMode.Picture):
if key == 'folder_path':
dupe_folder_path = getattr(group.ref, 'display_folder_path', group.ref.folder_path)
if self.app_mode in (AppMode.MUSIC, AppMode.PICTURE) and key == "folder_path":
dupe_folder_path = getattr(group.ref, "display_folder_path", group.ref.folder_path)
return str(dupe_folder_path).lower()
if key == 'percentage':
if key == "percentage":
return group.percentage
if key == 'dupe_count':
if key == "dupe_count":
return len(group)
if key == 'marked':
if key == "marked":
return len([dupe for dupe in group.dupes if self.results.is_marked(dupe)])
return cmp_value(group.ref, key)
@@ -243,7 +252,7 @@ class DupeGuru(Broadcaster):
def _create_file(self, path):
# We add fs.Folder to fileclasses in case the file we're loading contains folder paths.
return fs.get_file(path, self.fileclasses + [fs.Folder])
return fs.get_file(path, self.fileclasses + [se.fs.Folder])
def _get_file(self, str_path):
path = Path(str_path)
@@ -257,10 +266,7 @@ class DupeGuru(Broadcaster):
return None
def _get_export_data(self):
columns = [
col for col in self.result_table.columns.ordered_columns
if col.visible and col.name != 'marked'
]
columns = [col for col in self.result_table._columns.ordered_columns if col.visible and col.name != "marked"]
colnames = [col.display for col in columns]
rows = []
for group_id, group in enumerate(self.results.groups):
@@ -272,11 +278,8 @@ class DupeGuru(Broadcaster):
return colnames, rows
def _results_changed(self):
self.selected_dupes = [
d for d in self.selected_dupes
if self.results.get_group_of_duplicate(d) is not None
]
self.notify('results_changed')
self.selected_dupes = [d for d in self.selected_dupes if self.results.get_group_of_duplicate(d) is not None]
self.notify("results_changed")
def _start_job(self, jobid, func, args=()):
title = JOBID2TITLE[jobid]
@@ -290,32 +293,36 @@ class DupeGuru(Broadcaster):
self.view.show_message(msg)
def _job_completed(self, jobid):
if jobid == JobType.Scan:
if jobid == JobType.SCAN:
self._results_changed()
fs.filesdb.commit()
if not self.results.groups:
self.view.show_message(tr("No duplicates found."))
else:
self.view.show_results_window()
if jobid in {JobType.Move, JobType.Delete}:
if jobid in {JobType.MOVE, JobType.DELETE}:
self._results_changed()
if jobid == JobType.Load:
if jobid == JobType.LOAD:
self._recreate_result_table()
self._results_changed()
self.view.show_results_window()
if jobid in {JobType.Copy, JobType.Move, JobType.Delete}:
if jobid in {JobType.COPY, JobType.MOVE, JobType.DELETE}:
if self.results.problems:
self.problem_dialog.refresh()
self.view.show_problem_dialog()
else:
msg = {
JobType.Copy: tr("All marked files were copied successfully."),
JobType.Move: tr("All marked files were moved successfully."),
JobType.Delete: tr("All marked files were successfully sent to Trash."),
}[jobid]
if jobid == JobType.COPY:
msg = tr("All marked files were copied successfully.")
elif jobid == JobType.MOVE:
msg = tr("All marked files were moved successfully.")
elif jobid == JobType.DELETE and self.deletion_options.direct:
msg = tr("All marked files were deleted successfully.")
else:
msg = tr("All marked files were successfully sent to Trash.")
self.view.show_message(msg)
def _job_error(self, jobid, err):
if jobid == JobType.Load:
if jobid == JobType.LOAD:
msg = tr("Could not load file: {}").format(err)
self.view.show_message(msg)
return False
@@ -341,26 +348,26 @@ class DupeGuru(Broadcaster):
if dupes == self.selected_dupes:
return
self.selected_dupes = dupes
self.notify('dupes_selected')
self.notify("dupes_selected")
#--- Protected
# --- Protected
def _get_fileclasses(self):
if self.app_mode == AppMode.Picture:
if self.app_mode == AppMode.PICTURE:
return [pe.photo.PLAT_SPECIFIC_PHOTO_CLASS]
elif self.app_mode == AppMode.Music:
elif self.app_mode == AppMode.MUSIC:
return [me.fs.MusicFile]
else:
return [se.fs.File]
def _prioritization_categories(self):
if self.app_mode == AppMode.Picture:
if self.app_mode == AppMode.PICTURE:
return pe.prioritize.all_categories()
elif self.app_mode == AppMode.Music:
elif self.app_mode == AppMode.MUSIC:
return me.prioritize.all_categories()
else:
return prioritize.all_categories()
#--- Public
# --- Public
def add_directory(self, d):
"""Adds folder ``d`` to :attr:`directories`.
@@ -370,15 +377,14 @@ class DupeGuru(Broadcaster):
"""
try:
self.directories.add_path(Path(d))
self.notify('directories_changed')
self.notify("directories_changed")
except directories.AlreadyThereError:
self.view.show_message(tr("'{}' already is in the list.").format(d))
except directories.InvalidPathError:
self.view.show_message(tr("'{}' does not exist.").format(d))
def add_selected_to_ignore_list(self):
"""Adds :attr:`selected_dupes` to :attr:`ignore_list`.
"""
"""Adds :attr:`selected_dupes` to :attr:`ignore_list`."""
dupes = self.without_ref(self.selected_dupes)
if not dupes:
self.view.show_message(MSG_NO_SELECTED_DUPES)
@@ -390,25 +396,25 @@ class DupeGuru(Broadcaster):
g = self.results.get_group_of_duplicate(dupe)
for other in g:
if other is not dupe:
self.ignore_list.Ignore(str(other.path), str(dupe.path))
self.ignore_list.ignore(str(other.path), str(dupe.path))
self.remove_duplicates(dupes)
self.ignore_list_dialog.refresh()
def apply_filter(self, filter):
def apply_filter(self, result_filter):
"""Apply a filter ``filter`` to the results so that it shows only dupe groups that match it.
:param str filter: filter to apply
"""
self.results.apply_filter(None)
if self.options['escape_filter_regexp']:
filter = escape(filter, set('()[]\\.|+?^'))
filter = escape(filter, '*', '.')
self.results.apply_filter(filter)
if self.options["escape_filter_regexp"]:
result_filter = escape(result_filter, set("()[]\\.|+?^"))
result_filter = escape(result_filter, "*", ".")
self.results.apply_filter(result_filter)
self._results_changed()
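A small sketch of what the two escape() calls above do to a user-typed filter; the sample input is illustrative and the expected outputs assume hscommon.util.escape prefixes each character from the given set with the escape string:
from hscommon.util import escape

result_filter = "IMG_*.jpg"                                # hypothetical user input
result_filter = escape(result_filter, set("()[]\\.|+?^"))  # -> "IMG_*\.jpg" (regex metachars neutralized)
result_filter = escape(result_filter, "*", ".")            # -> "IMG_.*\.jpg" (glob "*" becomes ".*")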
def clean_empty_dirs(self, path):
if self.options['clean_empty_dirs']:
while delete_if_empty(path, ['.DS_Store']):
if self.options["clean_empty_dirs"]:
while delete_if_empty(path, [".DS_Store"]):
path = path.parent()
def clear_picture_cache(self):
@@ -417,14 +423,17 @@ class DupeGuru(Broadcaster):
except FileNotFoundError:
pass # we don't care
def clear_hash_cache(self):
fs.filesdb.clear()
def copy_or_move(self, dupe, copy: bool, destination: str, dest_type: DestType):
source_path = dupe.path
location_path = first(p for p in self.directories if dupe.path in p)
dest_path = Path(destination)
if dest_type in {DestType.Relative, DestType.Absolute}:
if dest_type in {DestType.RELATIVE, DestType.ABSOLUTE}:
# no filename, no windows drive letter
source_base = source_path.remove_drive_letter().parent()
if dest_type == DestType.Relative:
if dest_type == DestType.RELATIVE:
source_base = source_base[location_path:]
dest_path = dest_path[source_base]
if not dest_path.exists():
@@ -444,6 +453,7 @@ class DupeGuru(Broadcaster):
:param bool copy: If True, duplicates will be copied instead of moved
"""
def do(j):
def op(dupe):
j.add_progress()
@@ -455,28 +465,30 @@ class DupeGuru(Broadcaster):
if not self.results.mark_count:
self.view.show_message(MSG_NO_MARKED_DUPES)
return
opname = tr("copy") if copy else tr("move")
prompt = tr("Select a directory to {} marked files to").format(opname)
destination = self.view.select_dest_folder(prompt)
destination = self.view.select_dest_folder(
tr("Select a directory to copy marked files to")
if copy
else tr("Select a directory to move marked files to")
)
if destination:
desttype = self.options['copymove_dest_type']
jobid = JobType.Copy if copy else JobType.Move
desttype = self.options["copymove_dest_type"]
jobid = JobType.COPY if copy else JobType.MOVE
self._start_job(jobid, do)
def delete_marked(self):
"""Start an async job to send marked duplicates to the trash.
"""
"""Start an async job to send marked duplicates to the trash."""
if not self.results.mark_count:
self.view.show_message(MSG_NO_MARKED_DUPES)
return
if not self.deletion_options.show(self.results.mark_count):
return
args = [
self.deletion_options.link_deleted, self.deletion_options.use_hardlinks,
self.deletion_options.direct
self.deletion_options.link_deleted,
self.deletion_options.use_hardlinks,
self.deletion_options.direct,
]
logging.debug("Starting deletion job with args %r", args)
self._start_job(JobType.Delete, self._do_delete, args=args)
self._start_job(JobType.DELETE, self._do_delete, args=args)
def export_to_xhtml(self):
"""Export current results to XHTML.
@@ -495,7 +507,7 @@ class DupeGuru(Broadcaster):
The columns and their order in the resulting CSV file is determined in the same way as in
:meth:`export_to_xhtml`.
"""
dest_file = self.view.select_dest_file(tr("Select a destination for your exported CSV"), 'csv')
dest_file = self.view.select_dest_file(tr("Select a destination for your exported CSV"), "csv")
if dest_file:
colnames, rows = self._get_export_data()
try:
@@ -505,13 +517,14 @@ class DupeGuru(Broadcaster):
def get_display_info(self, dupe, group, delta=False):
def empty_data():
return {c.name: '---' for c in self.result_table.COLUMNS[1:]}
return {c.name: "---" for c in self.result_table.COLUMNS[1:]}
if (dupe is None) or (group is None):
return empty_data()
try:
return dupe.get_display_info(group, delta)
except Exception as e:
logging.warning("Exception on GetDisplayInfo for %s: %s", str(dupe.path), str(e))
logging.warning("Exception (type: %s) on GetDisplayInfo for %s: %s", type(e), str(dupe.path), str(e))
return empty_data()
def invoke_custom_command(self):
@@ -521,19 +534,19 @@ class DupeGuru(Broadcaster):
is replaced with that dupe's ref file. If there's no selection, the command is not invoked.
If the dupe is a ref, ``%d`` and ``%r`` will be the same.
"""
cmd = self.view.get_default('CustomCommand')
cmd = self.view.get_default("CustomCommand")
if not cmd:
msg = tr("You have no custom command set up. Set it up in your preferences.")
self.view.show_message(msg)
return
if not self.selected_dupes:
return
dupe = self.selected_dupes[0]
group = self.results.get_group_of_duplicate(dupe)
ref = group.ref
cmd = cmd.replace('%d', str(dupe.path))
cmd = cmd.replace('%r', str(ref.path))
match = re.match(r'"([^"]+)"(.*)', cmd)
dupes = self.selected_dupes
refs = [self.results.get_group_of_duplicate(dupe).ref for dupe in dupes]
for dupe, ref in zip(dupes, refs):
dupe_cmd = cmd.replace("%d", str(dupe.path))
dupe_cmd = dupe_cmd.replace("%r", str(ref.path))
match = re.match(r'"([^"]+)"(.*)', dupe_cmd)
if match is not None:
# This code is here because subprocess.Popen doesn't seem to accept, under Windows,
# executable paths with spaces in them, *even* when they're enclosed in "". So this is
@@ -542,7 +555,7 @@ class DupeGuru(Broadcaster):
path, exename = op.split(exepath)
subprocess.Popen(exename + args, shell=True, cwd=path)
else:
subprocess.Popen(cmd, shell=True)
subprocess.Popen(dupe_cmd, shell=True)
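A short sketch of the substitution and splitting described in the docstring above; the command string and paths are purely illustrative:
import re

cmd = '"C:\\Program Files\\Meld\\meld.exe" %d %r'      # hypothetical CustomCommand preference
dupe_cmd = cmd.replace("%d", "C:\\pics\\copy.jpg").replace("%r", "C:\\pics\\orig.jpg")
match = re.match(r'"([^"]+)"(.*)', dupe_cmd)           # same split as above: quoted exe path vs. arguments
exepath, args = match.group(1), match.group(2)
# exepath -> C:\Program Files\Meld\meld.exe ; args -> " C:\pics\copy.jpg C:\pics\orig.jpg"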
def load(self):
"""Load directory selection and ignore list from files in appdata.
@@ -551,20 +564,31 @@ class DupeGuru(Broadcaster):
is persistent data, is the same as when the last session was closed (when :meth:`save` was
called).
"""
self.directories.load_from_file(op.join(self.appdata, 'last_directories.xml'))
self.notify('directories_changed')
p = op.join(self.appdata, 'ignore_list.xml')
self.directories.load_from_file(op.join(self.appdata, "last_directories.xml"))
self.notify("directories_changed")
p = op.join(self.appdata, "ignore_list.xml")
self.ignore_list.load_from_xml(p)
self.ignore_list_dialog.refresh()
p = op.join(self.appdata, "exclude_list.xml")
self.exclude_list.load_from_xml(p)
self.exclude_list_dialog.refresh()
def load_directories(self, filepath):
# Clear out previous entries
self.directories.__init__()
self.directories.load_from_file(filepath)
self.notify("directories_changed")
def load_from(self, filename):
"""Start an async job to load results from ``filename``.
:param str filename: path of the XML file (created with :meth:`save_as`) to load
"""
def do(j):
self.results.load_from_xml(filename, self._get_file, j)
self._start_job(JobType.Load, do)
self._start_job(JobType.LOAD, do)
def make_selected_reference(self):
"""Promote :attr:`selected_dupes` to reference position within their respective groups.
@@ -577,8 +601,7 @@ class DupeGuru(Broadcaster):
changed_groups = set()
for dupe in dupes:
g = self.results.get_group_of_duplicate(dupe)
if g not in changed_groups:
if self.results.make_ref(dupe):
if g not in changed_groups and self.results.make_ref(dupe):
changed_groups.add(g)
# It's not always obvious to users what this action does, so to make it a bit clearer,
# we change our selection to the ref of all changed groups. However, we also want to keep
@@ -588,35 +611,31 @@ class DupeGuru(Broadcaster):
if not self.result_table.power_marker:
if changed_groups:
self.selected_dupes = [
d for d in self.selected_dupes
if self.results.get_group_of_duplicate(d).ref is d
d for d in self.selected_dupes if self.results.get_group_of_duplicate(d).ref is d
]
self.notify('results_changed')
self.notify("results_changed")
else:
# If we're in "Dupes Only" mode (previously called Power Marker), things are a bit
# different. The refs are not shown in the table, and if our operation is successful,
# this means that there's no way to follow our dupe selection. Then, the best thing to
# do is to keep our selection index-wise (different dupe selection, but same index
# selection).
self.notify('results_changed_but_keep_selection')
self.notify("results_changed_but_keep_selection")
def mark_all(self):
"""Set all dupes in the results as marked.
"""
"""Set all dupes in the results as marked."""
self.results.mark_all()
self.notify('marking_changed')
self.notify("marking_changed")
def mark_none(self):
"""Set all dupes in the results as unmarked.
"""
"""Set all dupes in the results as unmarked."""
self.results.mark_none()
self.notify('marking_changed')
self.notify("marking_changed")
def mark_invert(self):
"""Invert the marked state of all dupes in the results.
"""
"""Invert the marked state of all dupes in the results."""
self.results.mark_invert()
self.notify('marking_changed')
self.notify("marking_changed")
def mark_dupe(self, dupe, marked):
"""Change marked status of ``dupe``.
@@ -629,21 +648,18 @@ class DupeGuru(Broadcaster):
self.results.mark(dupe)
else:
self.results.unmark(dupe)
self.notify('marking_changed')
self.notify("marking_changed")
def open_selected(self):
"""Open :attr:`selected_dupes` with their associated application.
"""
if len(self.selected_dupes) > 10:
if not self.view.ask_yes_no(MSG_MANY_FILES_TO_OPEN):
"""Open :attr:`selected_dupes` with their associated application."""
if len(self.selected_dupes) > 10 and not self.view.ask_yes_no(MSG_MANY_FILES_TO_OPEN):
return
for dupe in self.selected_dupes:
desktop.open_path(dupe.path)
def purge_ignore_list(self):
"""Remove files that don't exist from :attr:`ignore_list`.
"""
self.ignore_list.Filter(lambda f, s: op.exists(f) and op.exists(s))
"""Remove files that don't exist from :attr:`ignore_list`."""
self.ignore_list.filter(lambda f, s: op.exists(f) and op.exists(s))
self.ignore_list_dialog.refresh()
def remove_directories(self, indexes):
@@ -656,7 +672,7 @@ class DupeGuru(Broadcaster):
indexes = sorted(indexes, reverse=True)
for index in indexes:
del self.directories[index]
self.notify('directories_changed')
self.notify("directories_changed")
except IndexError:
pass
@@ -669,11 +685,10 @@ class DupeGuru(Broadcaster):
:type duplicates: list of :class:`~core.fs.File`
"""
self.results.remove_duplicates(self.without_ref(duplicates))
self.notify('results_changed_but_keep_selection')
self.notify("results_changed_but_keep_selection")
def remove_marked(self):
"""Removed marked duplicates from the results (without touching the files themselves).
"""
"""Removed marked duplicates from the results (without touching the files themselves)."""
if not self.results.mark_count:
self.view.show_message(MSG_NO_MARKED_DUPES)
return
@@ -684,8 +699,7 @@ class DupeGuru(Broadcaster):
self._results_changed()
def remove_selected(self):
"""Removed :attr:`selected_dupes` from the results (without touching the files themselves).
"""
"""Removed :attr:`selected_dupes` from the results (without touching the files themselves)."""
dupes = self.without_ref(self.selected_dupes)
if not dupes:
self.view.show_message(MSG_NO_SELECTED_DUPES)
@@ -723,6 +737,8 @@ class DupeGuru(Broadcaster):
for group in self.results.groups:
if group.prioritize(key_func=sort_key):
count += 1
if count:
self.results.refresh_required = True
self._results_changed()
msg = tr("{} duplicate groups were changed by the re-prioritization.").format(count)
self.view.show_message(msg)
@@ -734,10 +750,15 @@ class DupeGuru(Broadcaster):
def save(self):
if not op.exists(self.appdata):
os.makedirs(self.appdata)
self.directories.save_to_file(op.join(self.appdata, 'last_directories.xml'))
p = op.join(self.appdata, 'ignore_list.xml')
self.directories.save_to_file(op.join(self.appdata, "last_directories.xml"))
p = op.join(self.appdata, "ignore_list.xml")
self.ignore_list.save_to_xml(p)
self.notify('save_session')
p = op.join(self.appdata, "exclude_list.xml")
self.exclude_list.save_to_xml(p)
self.notify("save_session")
def close(self):
fs.filesdb.close()
def save_as(self, filename):
"""Save results in ``filename``.
@@ -749,6 +770,16 @@ class DupeGuru(Broadcaster):
except OSError as e:
self.view.show_message(tr("Couldn't write to file: {}").format(str(e)))
def save_directories_as(self, filename):
"""Save directories in ``filename``.
:param str filename: path of the file to save directories (as XML) to.
"""
try:
self.directories.save_to_file(filename)
except OSError as e:
self.view.show_message(tr("Couldn't write to file: {}").format(str(e)))
def start_scanning(self):
"""Starts an async job to scan for duplicates.
@@ -762,7 +793,7 @@ class DupeGuru(Broadcaster):
for k, v in self.options.items():
if hasattr(scanner, k):
setattr(scanner, k, v)
if self.app_mode == AppMode.Picture:
if self.app_mode == AppMode.PICTURE:
scanner.cache_path = self._get_picture_cache_path()
self.results.groups = []
self._recreate_result_table()
@@ -770,17 +801,17 @@ class DupeGuru(Broadcaster):
def do(j):
j.set_progress(0, tr("Collecting files to scan"))
if scanner.scan_type == ScanType.Folders:
if scanner.scan_type == ScanType.FOLDERS:
files = list(self.directories.get_folders(folderclass=se.fs.Folder, j=j))
else:
files = list(self.directories.get_files(fileclasses=self.fileclasses, j=j))
if self.options['ignore_hardlink_matches']:
if self.options["ignore_hardlink_matches"]:
files = self._remove_hardlink_dupes(files)
logging.info('Scanning %d files' % len(files))
logging.info("Scanning %d files" % len(files))
self.results.groups = scanner.get_dupe_groups(files, self.ignore_list, j)
self.discarded_file_count = scanner.discarded_file_count
self._start_job(JobType.Scan, do)
self._start_job(JobType.SCAN, do)
def toggle_selected_mark_state(self):
selected = self.without_ref(self.selected_dupes)
@@ -792,11 +823,10 @@ class DupeGuru(Broadcaster):
markfunc = self.results.mark
for dupe in selected:
markfunc(dupe)
self.notify('marking_changed')
self.notify("marking_changed")
def without_ref(self, dupes):
"""Returns ``dupes`` with all reference elements removed.
"""
"""Returns ``dupes`` with all reference elements removed."""
return [dupe for dupe in dupes if self.results.get_group_of_duplicate(dupe).ref is not dupe]
def get_default(self, key, fallback_value=None):
@@ -812,7 +842,7 @@ class DupeGuru(Broadcaster):
def set_default(self, key, value):
self.view.set_default(key, value)
#--- Properties
# --- Properties
@property
def stat_line(self):
result = self.results.stat_line
@@ -826,22 +856,31 @@ class DupeGuru(Broadcaster):
@property
def SCANNER_CLASS(self):
if self.app_mode == AppMode.Picture:
if self.app_mode == AppMode.PICTURE:
return pe.scanner.ScannerPE
elif self.app_mode == AppMode.Music:
elif self.app_mode == AppMode.MUSIC:
return me.scanner.ScannerME
else:
return se.scanner.ScannerSE
@property
def METADATA_TO_READ(self):
if self.app_mode == AppMode.Picture:
return ['size', 'mtime', 'dimensions', 'exif_timestamp']
elif self.app_mode == AppMode.Music:
if self.app_mode == AppMode.PICTURE:
return ["size", "mtime", "dimensions", "exif_timestamp"]
elif self.app_mode == AppMode.MUSIC:
return [
'size', 'mtime', 'duration', 'bitrate', 'samplerate', 'title', 'artist',
'album', 'genre', 'year', 'track', 'comment'
"size",
"mtime",
"duration",
"bitrate",
"samplerate",
"title",
"artist",
"album",
"genre",
"year",
"track",
"comment",
]
else:
return ['size', 'mtime']
return ["size", "mtime"]

@@ -11,16 +11,18 @@ import logging
from hscommon.jobprogress import job
from hscommon.path import Path
from hscommon.util import FileOrPath
from hscommon.trans import tr
from . import fs
__all__ = [
'Directories',
'DirectoryState',
'AlreadyThereError',
'InvalidPathError',
"Directories",
"DirectoryState",
"AlreadyThereError",
"InvalidPathError",
]
class DirectoryState:
"""Enum describing how a folder should be considered.
@@ -28,16 +30,20 @@ class DirectoryState:
* DirectoryState.Reference: Scan files, but make sure never to delete any of them
* DirectoryState.Excluded: Don't scan this folder
"""
Normal = 0
Reference = 1
Excluded = 2
NORMAL = 0
REFERENCE = 1
EXCLUDED = 2
class AlreadyThereError(Exception):
"""The path being added is already in the directory list"""
class InvalidPathError(Exception):
"""The path being added is invalid"""
class Directories:
"""Holds user folder selection.
@@ -47,11 +53,13 @@ class Directories:
Then, when the user starts the scan, :meth:`get_files` is called to retrieve all files (wrapped
in :mod:`core.fs`) that have to be scanned according to the chosen folders/states.
"""
#---Override
def __init__(self):
# ---Override
def __init__(self, exclude_list=None):
self._dirs = []
# {path: state}
self.states = {}
self._exclude_list = exclude_list
def __contains__(self, path):
for p in self._dirs:
@@ -68,38 +76,56 @@ class Directories:
def __len__(self):
return len(self._dirs)
#---Private
# ---Private
def _default_state_for_path(self, path):
# New logic with regex filters
if self._exclude_list is not None and self._exclude_list.mark_count > 0:
# We iterate even if we only have one item here
for denied_path_re in self._exclude_list.compiled:
if denied_path_re.match(str(path.name)):
return DirectoryState.EXCLUDED
# return # We still use the old logic to force state on hidden dirs
# Override this in subclasses to specify the state of some special folders.
if path.name.startswith('.'): # hidden
return DirectoryState.Excluded
if path.name.startswith("."):
return DirectoryState.EXCLUDED
def _get_files(self, from_path, fileclasses, j):
for root, dirs, files in os.walk(str(from_path)):
j.check_if_cancelled()
root = Path(root)
state = self.get_state(root)
if state == DirectoryState.Excluded:
root_path = Path(root)
state = self.get_state(root_path)
if state == DirectoryState.EXCLUDED and not any(p[: len(root_path)] == root_path for p in self.states):
# Recursively getting files from folders with lots of subfolders is expensive. However, there
# might be a subfolder in this path that is not excluded. What we want to do is skim
# through self.states and see if we must continue, or if we can stop right here to save time.
if not any(p[:len(root)] == root for p in self.states):
del dirs[:]
try:
if state != DirectoryState.Excluded:
found_files = [fs.get_file(root + f, fileclasses=fileclasses) for f in files]
if state != DirectoryState.EXCLUDED:
# Old logic
if self._exclude_list is None or not self._exclude_list.mark_count:
found_files = [fs.get_file(root_path + f, fileclasses=fileclasses) for f in files]
else:
found_files = []
# print(f"len of files: {len(files)} {files}")
for f in files:
if not self._exclude_list.is_excluded(root, f):
found_files.append(fs.get_file(root_path + f, fileclasses=fileclasses))
found_files = [f for f in found_files if f is not None]
# In some cases, directories can be considered as files by dupeGuru, which is
# why we have this loop below. In fact, there is only one case: Bundle files under
# OS X... In other situations, this for loop will do nothing.
for d in dirs[:]:
f = fs.get_file(root + d, fileclasses=fileclasses)
f = fs.get_file(root_path + d, fileclasses=fileclasses)
if f is not None:
found_files.append(f)
dirs.remove(d)
logging.debug("Collected %d files in folder %s", len(found_files), str(from_path))
logging.debug(
"Collected %d files in folder %s",
len(found_files),
str(root_path),
)
for file in found_files:
file.is_ref = state == DirectoryState.Reference
file.is_ref = state == DirectoryState.REFERENCE
yield file
except (EnvironmentError, fs.InvalidPath):
pass
@@ -111,14 +137,14 @@ class Directories:
for folder in self._get_folders(subfolder, j):
yield folder
state = self.get_state(from_folder.path)
if state != DirectoryState.Excluded:
from_folder.is_ref = state == DirectoryState.Reference
if state != DirectoryState.EXCLUDED:
from_folder.is_ref = state == DirectoryState.REFERENCE
logging.debug("Yielding Folder %r state: %d", from_folder, state)
yield from_folder
except (EnvironmentError, fs.InvalidPath):
pass
#---Public
# ---Public
def add_path(self, path):
"""Adds ``path`` to self, if not already there.
@@ -157,8 +183,12 @@ class Directories:
"""
if fileclasses is None:
fileclasses = [fs.File]
file_count = 0
for path in self._dirs:
for file in self._get_files(path, fileclasses=fileclasses, j=j):
file_count += 1
if type(j) != job.NullJob:
j.set_progress(-1, tr("Collected {} files to scan").format(file_count))
yield file
def get_folders(self, folderclass=None, j=job.nulljob):
@@ -168,9 +198,13 @@ class Directories:
"""
if folderclass is None:
folderclass = fs.Folder
folder_count = 0
for path in self._dirs:
from_folder = folderclass(path)
for folder in self._get_folders(from_folder, j):
folder_count += 1
if type(j) != job.NullJob:
j.set_progress(-1, tr("Collected {} folders to scan").format(folder_count))
yield folder
def get_state(self, path):
@@ -181,9 +215,15 @@ class Directories:
# direct match? easy result.
if path in self.states:
return self.states[path]
state = self._default_state_for_path(path) or DirectoryState.Normal
state = self._default_state_for_path(path) or DirectoryState.NORMAL
# Save non-default states in cache, necessary for _get_files()
if state != DirectoryState.NORMAL:
self.states[path] = state
return state
prevlen = 0
# we loop through the states to find the longest matching prefix
# if the parent has a state in cache, return that state
for p, s in self.states.items():
if p.is_parent_of(path) and len(p) > prevlen:
prevlen = len(p)
@@ -212,21 +252,21 @@ class Directories:
root = ET.parse(infile).getroot()
except Exception:
return
for rdn in root.getiterator('root_directory'):
for rdn in root.iter("root_directory"):
attrib = rdn.attrib
if 'path' not in attrib:
if "path" not in attrib:
continue
path = attrib['path']
path = attrib["path"]
try:
self.add_path(Path(path))
except (AlreadyThereError, InvalidPathError):
pass
for sn in root.getiterator('state'):
for sn in root.iter("state"):
attrib = sn.attrib
if not ('path' in attrib and 'value' in attrib):
if not ("path" in attrib and "value" in attrib):
continue
path = attrib['path']
state = attrib['value']
path = attrib["path"]
state = attrib["value"]
self.states[Path(path)] = int(state)
def save_to_file(self, outfile):
@@ -234,17 +274,17 @@ class Directories:
:param file outfile: path or file pointer to XML file to save to.
"""
with FileOrPath(outfile, 'wb') as fp:
root = ET.Element('directories')
with FileOrPath(outfile, "wb") as fp:
root = ET.Element("directories")
for root_path in self:
root_path_node = ET.SubElement(root, 'root_directory')
root_path_node.set('path', str(root_path))
root_path_node = ET.SubElement(root, "root_directory")
root_path_node.set("path", str(root_path))
for path, state in self.states.items():
state_node = ET.SubElement(root, 'state')
state_node.set('path', str(path))
state_node.set('value', str(state))
state_node = ET.SubElement(root, "state")
state_node.set("path", str(path))
state_node.set("value", str(state))
tree = ET.ElementTree(root)
tree.write(fp, encoding='utf-8')
tree.write(fp, encoding="utf-8")
def set_state(self, path, state):
"""Set the state of folder at ``path``.
@@ -259,4 +299,3 @@ class Directories:
if path.is_parent_of(iter_path):
del self.states[iter_path]
self.states[path] = state
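A minimal sketch of the state inheritance implemented by get_state()/set_state() above, assuming the core package is importable; paths are illustrative, and note that add_path() itself requires an existing directory:
from hscommon.path import Path
from core.directories import Directories, DirectoryState

d = Directories()
d.set_state(Path("/photos"), DirectoryState.REFERENCE)
d.get_state(Path("/photos"))        # 1 (REFERENCE, direct match)
d.get_state(Path("/photos/2021"))   # 1 (inherited from the closest parent with a state)
d.set_state(Path("/photos/2021"), DirectoryState.EXCLUDED)
d.get_state(Path("/photos/2021"))   # 2 (the child's own state overrides the parent's)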

@@ -24,18 +24,33 @@ from hscommon.jobprogress import job
) = range(3)
JOB_REFRESH_RATE = 100
PROGRESS_MESSAGE = tr("%d matches found from %d groups")
def getwords(s):
# We decompose the string so that ascii letters with accents can be part of the word.
s = normalize('NFD', s)
s = multi_replace(s, "-_&+():;\\[]{}.,<>/?~!@#$*", ' ').lower()
s = ''.join(c for c in s if c in string.ascii_letters + string.digits + string.whitespace)
return [_f for _f in s.split(' ') if _f] # remove empty elements
s = normalize("NFD", s)
s = multi_replace(s, "-_&+():;\\[]{}.,<>/?~!@#$*", " ").lower()
# logging.debug(f"DEBUG chars for: {s}\n"
# f"{[c for c in s if ord(c) != 32]}\n"
# f"{[ord(c) for c in s if ord(c) != 32]}")
# HACK We shouldn't ignore non-ascii characters altogether. Any Unicode char
# above common European characters that cannot be "sanitized" (i.e. stripped
# of its accents, etc.) is preserved as is. The arbitrary limit is
# obtained from this one: ord("\u037e") GREEK QUESTION MARK
s = "".join(
c
for c in s
if (ord(c) <= 894 and c in string.ascii_letters + string.digits + string.whitespace) or ord(c) > 894
)
return [_f for _f in s.split(" ") if _f] # remove empty elements
def getfields(s):
fields = [getwords(field) for field in s.split(' - ')]
fields = [getwords(field) for field in s.split(" - ")]
return [_f for _f in fields if _f]
def unpack_fields(fields):
result = []
for field in fields:
@@ -45,6 +60,7 @@ def unpack_fields(fields):
result.append(field)
return result
def compare(first, second, flags=()):
"""Returns the % of words that match between ``first`` and ``second``
@@ -55,11 +71,11 @@ def compare(first, second, flags=()):
return 0
if any(isinstance(element, list) for element in first):
return compare_fields(first, second, flags)
second = second[:] #We must use a copy of second because we remove items from it
second = second[:] # We must use a copy of second because we remove items from it
match_similar = MATCH_SIMILAR_WORDS in flags
weight_words = WEIGHT_WORDS in flags
joined = first + second
total_count = (sum(len(word) for word in joined) if weight_words else len(joined))
total_count = sum(len(word) for word in joined) if weight_words else len(joined)
match_count = 0
in_order = True
for word in first:
@@ -71,12 +87,13 @@ def compare(first, second, flags=()):
if second[0] != word:
in_order = False
second.remove(word)
match_count += (len(word) if weight_words else 1)
match_count += len(word) if weight_words else 1
result = round(((match_count * 2) / total_count) * 100)
if (result == 100) and (not in_order):
result = 99 # We cannot consider a match exact unless the ordering is the same
return result
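A few worked examples of the scoring above (module path assumed to be core.engine):
from core.engine import compare, WEIGHT_WORDS

compare(["foo", "bar"], ["bar", "bleh"])                   # 50: round((1 match * 2) / 4 words * 100)
compare(["foo", "bar"], ["bar", "bleh"], (WEIGHT_WORDS,))  # 46: round((3 chars * 2) / 13 chars * 100)
compare(["foo", "bar"], ["bar", "foo"])                    # 99: every word matches but the order differs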
def compare_fields(first, second, flags=()):
"""Returns the score for the lowest matching :ref:`fields`.
@@ -87,23 +104,24 @@ def compare_fields(first, second, flags=()):
return 0
if NO_FIELD_ORDER in flags:
results = []
#We don't want to remove field directly in the list. We must work on a copy.
# We don't want to remove field directly in the list. We must work on a copy.
second = second[:]
for field1 in first:
max = 0
max_score = 0
matched_field = None
for field2 in second:
r = compare(field1, field2, flags)
if r > max:
max = r
if r > max_score:
max_score = r
matched_field = field2
results.append(max)
results.append(max_score)
if matched_field:
second.remove(matched_field)
else:
results = [compare(field1, field2, flags) for field1, field2 in zip(first, second)]
return min(results) if results else 0
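A short example of the "lowest field wins" rule and the NO_FIELD_ORDER flag (module path assumed to be core.engine):
from core.engine import compare_fields, getfields, NO_FIELD_ORDER

first = getfields("foo bar - baz")     # [["foo", "bar"], ["baz"]]
second = getfields("foo bar - bleh")   # [["foo", "bar"], ["bleh"]]
compare_fields(first, second)          # 0: min(100, 0), the weakest field decides
compare_fields(getfields("a - b"), getfields("b - a"), (NO_FIELD_ORDER,))  # 100: fields matched to best counterpart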
def build_word_dict(objects, j=job.nulljob):
"""Returns a dict of objects mapped by their words.
@@ -113,11 +131,12 @@ def build_word_dict(objects, j=job.nulljob):
The result will be a dict with words as keys, lists of objects as values.
"""
result = defaultdict(set)
for object in j.iter_with_progress(objects, 'Prepared %d/%d files', JOB_REFRESH_RATE):
for object in j.iter_with_progress(objects, "Prepared %d/%d files", JOB_REFRESH_RATE):
for word in unpack_fields(object.words):
result[word].add(object)
return result
def merge_similar_words(word_dict):
"""Take all keys in ``word_dict`` that are similar, and merge them together.
@@ -126,7 +145,7 @@ def merge_similar_words(word_dict):
a word equal to the other.
"""
keys = list(word_dict.keys())
keys.sort(key=len)# we want the shortest word to stay
keys.sort(key=len) # we want the shortest word to stay
while keys:
key = keys.pop(0)
similars = difflib.get_close_matches(key, keys, 100, 0.8)
@@ -138,6 +157,7 @@ def merge_similar_words(word_dict):
del word_dict[similar]
keys.remove(similar)
def reduce_common_words(word_dict, threshold):
"""Remove all objects from ``word_dict`` values where the object count >= ``threshold``
@@ -159,11 +179,13 @@ def reduce_common_words(word_dict, threshold):
else:
del word_dict[word]
# Writing docstrings in a namedtuple is tricky. From Python 3.3, it's possible to set __doc__, but
# some research allowed me to find a more elegant solution, which is what is done here. See
# http://stackoverflow.com/questions/1606436/adding-docstrings-to-namedtuples-in-python
class Match(namedtuple('Match', 'first second percentage')):
class Match(namedtuple("Match", "first second percentage")):
"""Represents a match between two :class:`~core.fs.File`.
Regardless of the matching method, when two files are determined to match, a Match pair is created,
@@ -182,16 +204,24 @@ class Match(namedtuple('Match', 'first second percentage')):
their match level according to the scan method which found the match. int from 1 to 100. For
exact scan methods, such as Contents scans, this will always be 100.
"""
__slots__ = ()
def get_match(first, second, flags=()):
#it is assumed here that first and second both have a "words" attribute
# it is assumed here that first and second both have a "words" attribute
percentage = compare(first.words, second.words, flags)
return Match(first, second, percentage)
def getmatches(
objects, min_match_percentage=0, match_similar_words=False, weight_words=False,
no_field_order=False, j=job.nulljob):
objects,
min_match_percentage=0,
match_similar_words=False,
weight_words=False,
no_field_order=False,
j=job.nulljob,
):
"""Returns a list of :class:`Match` within ``objects`` after fuzzily matching their words.
:param objects: List of :class:`~core.fs.File` to match.
@@ -206,7 +236,7 @@ def getmatches(
j = j.start_subjob(2)
sj = j.start_subjob(2)
for o in objects:
if not hasattr(o, 'words'):
if not hasattr(o, "words"):
o.words = getwords(o.name)
word_dict = build_word_dict(objects, sj)
reduce_common_words(word_dict, COMMON_WORD_THRESHOLD)
@@ -219,10 +249,11 @@ def getmatches(
match_flags.append(MATCH_SIMILAR_WORDS)
if no_field_order:
match_flags.append(NO_FIELD_ORDER)
j.start_job(len(word_dict), tr("0 matches found"))
j.start_job(len(word_dict), PROGRESS_MESSAGE % (0, 0))
compared = defaultdict(set)
result = []
try:
word_count = 0
# This whole 'popping' thing is there to avoid taking too much memory at the same time.
while word_dict:
items = word_dict.popitem()[1]
@@ -237,39 +268,53 @@ def getmatches(
result.append(m)
if len(result) >= LIMIT:
return result
j.add_progress(desc=tr("%d matches found") % len(result))
word_count += 1
j.add_progress(desc=PROGRESS_MESSAGE % (len(result), word_count))
except MemoryError:
# This is the place where the memory usage is at its peak during the scan.
# Just continue the process with an incomplete list of matches.
del compared # This should give us enough room to call logging.
logging.warning('Memory Overflow. Matches: %d. Word dict: %d' % (len(result), len(word_dict)))
logging.warning("Memory Overflow. Matches: %d. Word dict: %d" % (len(result), len(word_dict)))
return result
return result
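A minimal end-to-end sketch of getmatches(), using a bare stub instead of core.fs.File; the function only needs hashable objects with a name and a writable words attribute, and the module path is assumed to be core.engine:
from core.engine import getmatches

class NamedObject:
    def __init__(self, name):
        self.name = name  # .words is filled in by getmatches() itself

objects = [NamedObject("foo bar"), NamedObject("bar bleh"), NamedObject("a b c foo")]
for m in getmatches(objects):
    print(m.first.name, m.second.name, m.percentage)
# e.g. "foo bar" / "bar bleh" -> 50 and "foo bar" / "a b c foo" -> 33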
def getmatches_by_contents(files, j=job.nulljob):
def getmatches_by_contents(files, bigsize=0, j=job.nulljob):
"""Returns a list of :class:`Match` within ``files`` if their contents is the same.
:param bigsize: The size in bytes over which we consider files big enough to
justify taking samples of md5. If 0, compute md5 as usual.
:param j: A :ref:`job progress instance <jobs>`.
"""
size2files = defaultdict(set)
for f in files:
if f.size:
size2files[f.size].add(f)
del files
possible_matches = [files for files in size2files.values() if len(files) > 1]
del size2files
result = []
j.start_job(len(possible_matches), tr("0 matches found"))
j.start_job(len(possible_matches), PROGRESS_MESSAGE % (0, 0))
group_count = 0
for group in possible_matches:
for first, second in itertools.combinations(group, 2):
if first.is_ref and second.is_ref:
continue # Don't spend time comparing two ref pics together.
if first.size == 0 and second.size == 0:
# skip md5 for zero length files
result.append(Match(first, second, 100))
continue
if first.md5partial == second.md5partial:
if bigsize > 0 and first.size > bigsize:
if first.md5samples == second.md5samples:
result.append(Match(first, second, 100))
else:
if first.md5 == second.md5:
result.append(Match(first, second, 100))
j.add_progress(desc=tr("%d matches found") % len(result))
group_count += 1
j.add_progress(desc=PROGRESS_MESSAGE % (len(result), group_count))
return result
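A sketch of the bigsize behaviour with stub objects instead of core.fs.File; attribute values are illustrative, only the attributes read by the function are provided, and the module path is assumed to be core.engine:
from core.engine import getmatches_by_contents

class FileStub:
    def __init__(self, size, md5partial, md5, md5samples, is_ref=False):
        self.size, self.is_ref = size, is_ref
        self.md5partial, self.md5, self.md5samples = md5partial, md5, md5samples

big = 100 * 1024 * 1024
a = FileStub(2 * big, b"p", b"full-a", b"sampled")
b = FileStub(2 * big, b"p", b"full-b", b"sampled")
getmatches_by_contents([a, b], bigsize=big)  # 1 match: files over bigsize are compared on md5samples only
getmatches_by_contents([a, b], bigsize=0)    # no match: full md5 values differ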
class Group:
"""A group of :class:`~core.fs.File` that match together.
@@ -297,7 +342,8 @@ class Group:
Average match percentage of match pairs containing :attr:`ref`.
"""
#---Override
# ---Override
def __init__(self):
self._clear()
@@ -313,7 +359,7 @@ class Group:
def __len__(self):
return len(self.ordered)
#---Private
# ---Private
def _clear(self):
self._percentage = None
self._matches_for_ref = None
@@ -328,7 +374,7 @@ class Group:
self._matches_for_ref = [match for match in self.matches if ref in match]
return self._matches_for_ref
#---Public
# ---Public
def add_match(self, match):
"""Adds ``match`` to internal match list and possibly add duplicates to the group.
@@ -339,6 +385,7 @@ class Group:
:param tuple match: pair of :class:`~core.fs.File` to add
"""
def add_candidate(item, match):
matches = self.candidates[item]
matches.add(match)
@@ -368,8 +415,7 @@ class Group:
return discarded
def get_match_of(self, item):
"""Returns the match pair between ``item`` and :attr:`ref`.
"""
"""Returns the match pair between ``item`` and :attr:`ref`."""
if item is self.ref:
return
for m in self._get_matches_for_ref():
@@ -385,8 +431,7 @@ class Group:
"""
# tie_breaker(ref, dupe) --> True if dupe should be ref
# Returns True if anything changed during prioritization.
master_key_func = lambda x: (-x.is_ref, key_func(x))
new_order = sorted(self.ordered, key=master_key_func)
new_order = sorted(self.ordered, key=lambda x: (-x.is_ref, key_func(x)))
changed = new_order != self.ordered
self.ordered = new_order
if tie_breaker is None:
@@ -409,7 +454,7 @@ class Group:
self.unordered.remove(item)
self._percentage = None
self._matches_for_ref = None
if (len(self) > 1) and any(not getattr(item, 'is_ref', False) for item in self):
if (len(self) > 1) and any(not getattr(item, "is_ref", False) for item in self):
if discard_matches:
self.matches = set(m for m in self.matches if item not in m)
else:
@@ -418,8 +463,7 @@ class Group:
pass
def switch_ref(self, with_dupe):
"""Make the :attr:`ref` dupe of the group switch position with ``with_dupe``.
"""
"""Make the :attr:`ref` dupe of the group switch position with ``with_dupe``."""
if self.ref.is_ref:
return False
try:
@@ -485,7 +529,7 @@ def get_groups(matches):
del dupe2group
del matches
# should free enough memory to continue
logging.warning('Memory Overflow. Groups: {0}'.format(len(groups)))
logging.warning("Memory Overflow. Groups: {0}".format(len(groups)))
# Now that we have groups, we have to discard the groups' matches and see if there are any "orphan"
# matches, that is, matches that were candidates in a group but whose two files were not both
# accepted in the group. With these orphan matches, it's safe to build additional groups
@@ -493,8 +537,7 @@ def get_groups(matches):
orphan_matches = []
for group in groups:
orphan_matches += {
m for m in group.discard_matches()
if not any(obj in matched_files for obj in [m.first, m.second])
m for m in group.discard_matches() if not any(obj in matched_files for obj in [m.first, m.second])
}
if groups and orphan_matches:
groups += get_groups(orphan_matches) # no job, as it isn't supposed to take a long time

core/exclude.py (new file, 513 lines)

@@ -0,0 +1,513 @@
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
from .markable import Markable
from xml.etree import ElementTree as ET
# TODO: perhaps use regex module for better Unicode support? https://pypi.org/project/regex/
# also https://pypi.org/project/re2/
# TODO update the Result list with newly added regexes if possible
import re
from os import sep
import logging
import functools
from hscommon.util import FileOrPath
from hscommon.plat import ISWINDOWS
import time
default_regexes = [
r"^thumbs\.db$", # Obsolete after WindowsXP
r"^desktop\.ini$", # Windows metadata
r"^\.DS_Store$", # MacOS metadata
r"^\.Trash\-.*", # Linux trash directories
r"^\$Recycle\.Bin$", # Windows
r"^\..*", # Hidden files on Unix-like
]
# These are too broad
forbidden_regexes = [r".*", r"\/.*", r".*\/.*", r".*\\\\.*", r".*\..*"]
def timer(func):
@functools.wraps(func)
def wrapper_timer(*args):
start = time.perf_counter_ns()
value = func(*args)
end = time.perf_counter_ns()
print(f"DEBUG: func {func.__name__!r} took {end - start} ns.")
return value
return wrapper_timer
def memoize(func):
func.cache = dict()
@functools.wraps(func)
def _memoize(*args):
if args not in func.cache:
func.cache[args] = func(*args)
return func.cache[args]
return _memoize
class AlreadyThereException(Exception):
"""Expression already in the list"""
def __init__(self, arg="Expression is already in excluded list."):
super().__init__(arg)
class ExcludeList(Markable):
"""A list of lists holding regular expression strings and the compiled re.Pattern"""
# Used to filter out directories and files that we would rather avoid scanning.
# The list() class allows us to preserve item order without too much hassle.
# The downside is we have to compare strings every time we look for an item in the list
# since we use regex strings as keys.
# If _use_union is True, the compiled regexes will be combined into one single
# Pattern instead of separate Patterns which may or may not give better
# performance compared to looping through each Pattern individually.
# ---Override
def __init__(self, union_regex=True):
Markable.__init__(self)
self._use_union = union_regex
# list([str regex, bool iscompilable, re.error exception, Pattern compiled], ...)
self._excluded = []
self._excluded_compiled = set()
self._dirty = True
def __iter__(self):
"""Iterate in order."""
for item in self._excluded:
regex = item[0]
yield self.is_marked(regex), regex
def __contains__(self, item):
return self.has_entry(item)
def __len__(self):
"""Returns the total number of regexes regardless of mark status."""
return len(self._excluded)
def __getitem__(self, key):
"""Returns the list item corresponding to key."""
for item in self._excluded:
if item[0] == key:
return item
raise KeyError(f"Key {key} is not in exclusion list.")
def __setitem__(self, key, value):
# TODO if necessary
pass
def __delitem__(self, key):
# TODO if necessary
pass
def get_compiled(self, key):
"""Returns the (precompiled) Pattern for key"""
return self.__getitem__(key)[3]
def is_markable(self, regex):
return self._is_markable(regex)
def _is_markable(self, regex):
"""Return the cached result of "compilable" property"""
for item in self._excluded:
if item[0] == regex:
return item[1]
return False # should not be necessary, the regex SHOULD be in there
def _did_mark(self, regex):
self._add_compiled(regex)
def _did_unmark(self, regex):
self._remove_compiled(regex)
def _add_compiled(self, regex):
self._dirty = True
if self._use_union:
return
for item in self._excluded:
# FIXME probably faster to just rebuild the set from the compiled instead of comparing strings
if item[0] == regex:
# no need to test if already present since it's a set()
self._excluded_compiled.add(item[3])
break
def _remove_compiled(self, regex):
self._dirty = True
if self._use_union:
return
for item in self._excluded_compiled:
if regex in item.pattern:
self._excluded_compiled.remove(item)
break
# @timer
@memoize
def _do_compile(self, expr):
return re.compile(expr)
# @timer
# @memoize # probably not worth memoizing this one if we memoize the above
def compile_re(self, regex):
compiled = None
try:
compiled = self._do_compile(regex)
except Exception as e:
return False, e, compiled
return True, None, compiled
def error(self, regex):
"""Return the compilation error Exception for regex.
It should have a "msg" attr."""
for item in self._excluded:
if item[0] == regex:
return item[2]
def build_compiled_caches(self, union=False):
if not union:
self._cached_compiled_files = [x for x in self._excluded_compiled if not has_sep(x.pattern)]
self._cached_compiled_paths = [x for x in self._excluded_compiled if has_sep(x.pattern)]
self._dirty = False
return
marked_count = [x for marked, x in self if marked]
# If there is no item, the compiled Pattern will be '' and match everything!
if not marked_count:
self._cached_compiled_union_all = []
self._cached_compiled_union_files = []
self._cached_compiled_union_paths = []
else:
# HACK returned as a tuple to get a free iterator and keep interface
# the same regardless of whether the client asked for union or not
self._cached_compiled_union_all = (re.compile("|".join(marked_count)),)
files_marked = [x for x in marked_count if not has_sep(x)]
if not files_marked:
self._cached_compiled_union_files = tuple()
else:
self._cached_compiled_union_files = (re.compile("|".join(files_marked)),)
paths_marked = [x for x in marked_count if has_sep(x)]
if not paths_marked:
self._cached_compiled_union_paths = tuple()
else:
self._cached_compiled_union_paths = (re.compile("|".join(paths_marked)),)
self._dirty = False
@property
def compiled(self):
"""Should be used by other classes to retrieve the up-to-date list of patterns."""
if self._use_union:
if self._dirty:
self.build_compiled_caches(self._use_union)
return self._cached_compiled_union_all
return self._excluded_compiled
@property
def compiled_files(self):
"""When matching against filenames only, we probably won't be seeing any
directory separator, so we filter out regexes with os.sep in them.
The interface should be expected to be a generator, even if it returns only
one item (one Pattern in the union case)."""
if self._dirty:
self.build_compiled_caches(self._use_union)
return self._cached_compiled_union_files if self._use_union else self._cached_compiled_files
@property
def compiled_paths(self):
"""Returns patterns with only separators in them, for more precise filtering."""
if self._dirty:
self.build_compiled_caches(self._use_union)
return self._cached_compiled_union_paths if self._use_union else self._cached_compiled_paths
# ---Public
def add(self, regex, forced=False):
"""This interface should throw exceptions if there is an error during
regex compilation"""
if self.has_entry(regex):
# This exception should never be ignored
raise AlreadyThereException()
if regex in forbidden_regexes:
raise ValueError("Forbidden (dangerous) expression.")
iscompilable, exception, compiled = self.compile_re(regex)
if not iscompilable and not forced:
# This exception can be ignored, but taken into account
# to avoid adding to compiled set
raise exception
else:
self._do_add(regex, iscompilable, exception, compiled)
def _do_add(self, regex, iscompilable, exception, compiled):
# We need to insert at the top
self._excluded.insert(0, [regex, iscompilable, exception, compiled])
@property
def marked_count(self):
"""Returns the number of marked regexes only."""
return len([x for marked, x in self if marked])
def has_entry(self, regex):
for item in self._excluded:
if regex == item[0]:
return True
return False
def is_excluded(self, dirname, filename):
"""Return True if the file or the absolute path to file is supposed to be
filtered out, False otherwise."""
matched = False
for expr in self.compiled_files:
if expr.fullmatch(filename):
matched = True
break
if not matched:
for expr in self.compiled_paths:
if expr.fullmatch(dirname + sep + filename):
matched = True
break
return matched
def remove(self, regex):
for item in self._excluded:
if item[0] == regex:
self._excluded.remove(item)
self._remove_compiled(regex)
def rename(self, regex, newregex):
if regex == newregex:
return
found = False
was_marked = False
is_compilable = False
for item in self._excluded:
if item[0] == regex:
found = True
was_marked = self.is_marked(regex)
is_compilable, exception, compiled = self.compile_re(newregex)
# We overwrite the found entry
self._excluded[self._excluded.index(item)] = [newregex, is_compilable, exception, compiled]
self._remove_compiled(regex)
break
if not found:
return
if is_compilable:
self._add_compiled(newregex)
if was_marked:
# Not marked by default when added, add it back
self.mark(newregex)
# def change_index(self, regex, new_index):
# """Internal list must be a list, not dict."""
# item = self._excluded.pop(regex)
# self._excluded.insert(new_index, item)
def restore_defaults(self):
for _, regex in self:
if regex not in default_regexes:
self.unmark(regex)
for default_regex in default_regexes:
if not self.has_entry(default_regex):
self.add(default_regex)
self.mark(default_regex)
def load_from_xml(self, infile):
"""Loads the ignore list from a XML created with save_to_xml.
infile can be a file object or a filename.
"""
try:
root = ET.parse(infile).getroot()
except Exception as e:
logging.warning(f"Error while loading {infile}: {e}")
self.restore_defaults()
return e
marked = set()
exclude_elems = (e for e in root if e.tag == "exclude")
for exclude_item in exclude_elems:
regex_string = exclude_item.get("regex")
if not regex_string:
continue
try:
# "forced" avoids compilation exceptions and adds anyway
self.add(regex_string, forced=True)
except AlreadyThereException:
logging.error(
f'Regex "{regex_string}" \
loaded from XML was already present in the list.'
)
continue
if exclude_item.get("marked") == "y":
marked.add(regex_string)
for item in marked:
self.mark(item)
def save_to_xml(self, outfile):
"""Create a XML file that can be used by load_from_xml.
outfile can be a file object or a filename."""
root = ET.Element("exclude_list")
# reversed in order to keep order of entries when reloading from xml later
for item in reversed(self._excluded):
exclude_node = ET.SubElement(root, "exclude")
exclude_node.set("regex", str(item[0]))
exclude_node.set("marked", ("y" if self.is_marked(item[0]) else "n"))
tree = ET.ElementTree(root)
with FileOrPath(outfile, "wb") as fp:
tree.write(fp, encoding="utf-8")
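A minimal usage sketch of the class above (module path assumed to be core.exclude):
from core.exclude import ExcludeList

el = ExcludeList()              # union_regex=True: marked patterns are OR-ed into one Pattern
el.add(r"^\.DS_Store$")
el.mark(r"^\.DS_Store$")        # only marked entries participate in filtering
el.add(r".*\.tmp$")
el.mark(r".*\.tmp$")

el.is_excluded("/home/user", ".DS_Store")   # True  (matched by a filename-only pattern)
el.is_excluded("/home/user", "notes.txt")   # False
el.marked_count                             # 2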
class ExcludeDict(ExcludeList):
"""Exclusion list holding a set of regular expressions as keys, the compiled
Pattern, compilation error and compilable boolean as values."""
# Implementation around a dictionary instead of a list, which implies
# keeping the index of each string-key as its sub-element and keeping it updated
# whenever an insert/remove is done.
def __init__(self, union_regex=False):
Markable.__init__(self)
self._use_union = union_regex
# { "regex string":
# {
# "index": int,
# "compilable": bool,
# "error": str,
# "compiled": Pattern or None
# }
# }
self._excluded = {}
self._excluded_compiled = set()
self._dirty = True
def __iter__(self):
"""Iterate in order."""
for regex in ordered_keys(self._excluded):
yield self.is_marked(regex), regex
def __getitem__(self, key):
"""Returns the dict item correponding to key"""
return self._excluded.__getitem__(key)
def get_compiled(self, key):
"""Returns the compiled item for key"""
return self.__getitem__(key).get("compiled")
def is_markable(self, regex):
return self._is_markable(regex)
def _is_markable(self, regex):
"""Return the cached result of "compilable" property"""
exists = self._excluded.get(regex)
if exists:
return exists.get("compilable")
return False
def _add_compiled(self, regex):
self._dirty = True
if self._use_union:
return
try:
self._excluded_compiled.add(self._excluded.get(regex).get("compiled"))
except Exception as e:
logging.error(f"Exception while adding regex {regex} to compiled set: {e}")
return
def is_compilable(self, regex):
"""Returns the cached "compilable" value"""
return self._excluded[regex]["compilable"]
def error(self, regex):
"""Return the compilation error message for regex string"""
return self._excluded.get(regex).get("error")
# ---Public
def _do_add(self, regex, iscompilable, exception, compiled):
# We always insert at the top, so index should be 0
# and other indices should be pushed by one
for value in self._excluded.values():
value["index"] += 1
self._excluded[regex] = {"index": 0, "compilable": iscompilable, "error": exception, "compiled": compiled}
def has_entry(self, regex):
if regex in self._excluded.keys():
return True
return False
def remove(self, regex):
old_value = self._excluded.pop(regex)
# Bring down all indices which were above it
index = old_value["index"]
if index == len(self._excluded) - 1: # we start at 0...
# Old index was at the end, no need to update other indices
self._remove_compiled(regex)
return
for value in self._excluded.values():
if value.get("index") > old_value["index"]:
value["index"] -= 1
self._remove_compiled(regex)
def rename(self, regex, newregex):
if regex == newregex or regex not in self._excluded.keys():
return
was_marked = self.is_marked(regex)
previous = self._excluded.pop(regex)
iscompilable, error, compiled = self.compile_re(newregex)
self._excluded[newregex] = {
"index": previous.get("index"),
"compilable": iscompilable,
"error": error,
"compiled": compiled,
}
self._remove_compiled(regex)
if iscompilable:
self._add_compiled(newregex)
if was_marked:
self.mark(newregex)
def save_to_xml(self, outfile):
"""Create a XML file that can be used by load_from_xml.
outfile can be a file object or a filename.
"""
root = ET.Element("exclude_list")
# reversed in order to keep order of entries when reloading from xml later
reversed_list = []
for key in ordered_keys(self._excluded):
reversed_list.append(key)
for item in reversed(reversed_list):
exclude_node = ET.SubElement(root, "exclude")
exclude_node.set("regex", str(item))
exclude_node.set("marked", ("y" if self.is_marked(item) else "n"))
tree = ET.ElementTree(root)
with FileOrPath(outfile, "wb") as fp:
tree.write(fp, encoding="utf-8")
def ordered_keys(_dict):
"""Returns an iterator over the keys of dictionary sorted by "index" key"""
if not len(_dict):
return
list_of_items = []
for item in _dict.items():
list_of_items.append(item)
list_of_items.sort(key=lambda x: x[1].get("index"))
for item in list_of_items:
yield item[0]
if ISWINDOWS:
def has_sep(regexp):
return "\\" + sep in regexp
else:
def has_sep(regexp):
return sep in regexp

@@ -114,36 +114,38 @@ ROW_TEMPLATE = """
CELL_TEMPLATE = """<td>{value}</td>"""
def export_to_xhtml(colnames, rows):
# a row is a list of values with the first value being a flag indicating if the row should be indented
if rows:
assert len(rows[0]) == len(colnames) + 1 # + 1 is for the "indented" flag
colheaders = ''.join(COLHEADERS_TEMPLATE.format(name=name) for name in colnames)
colheaders = "".join(COLHEADERS_TEMPLATE.format(name=name) for name in colnames)
rendered_rows = []
previous_group_id = None
for row in rows:
# [2:] is to remove the indented flag + filename
if row[0] != previous_group_id:
# We've just changed dupe group, which means that this dupe is a ref. We don't indent it.
indented = ''
indented = ""
else:
indented = 'indented'
indented = "indented"
filename = row[1]
cells = ''.join(CELL_TEMPLATE.format(value=value) for value in row[2:])
cells = "".join(CELL_TEMPLATE.format(value=value) for value in row[2:])
rendered_rows.append(ROW_TEMPLATE.format(indented=indented, filename=filename, cells=cells))
previous_group_id = row[0]
rendered_rows = ''.join(rendered_rows)
rendered_rows = "".join(rendered_rows)
# The main template can't use format because the css code uses {}
content = MAIN_TEMPLATE.replace('$colheaders', colheaders).replace('$rows', rendered_rows)
content = MAIN_TEMPLATE.replace("$colheaders", colheaders).replace("$rows", rendered_rows)
folder = mkdtemp()
destpath = op.join(folder, 'export.htm')
fp = open(destpath, 'wt', encoding='utf-8')
destpath = op.join(folder, "export.htm")
fp = open(destpath, "wt", encoding="utf-8")
fp.write(content)
fp.close()
return destpath
def export_to_csv(dest, colnames, rows):
writer = csv.writer(open(dest, 'wt', encoding='utf-8'))
writer = csv.writer(open(dest, "wt", encoding="utf-8"))
writer.writerow(["Group ID"] + colnames)
for row in rows:
writer.writerow(row)

@@ -12,24 +12,38 @@
# and I'm doing it now.
import hashlib
from math import floor
import logging
import sqlite3
from threading import Lock
from typing import Any
from hscommon.path import Path
from hscommon.util import nonone, get_file_ext
__all__ = [
'File',
'Folder',
'get_file',
'get_files',
'FSError',
'AlreadyExistsError',
'InvalidPath',
'InvalidDestinationError',
'OperationError',
"File",
"Folder",
"get_file",
"get_files",
"FSError",
"AlreadyExistsError",
"InvalidPath",
"InvalidDestinationError",
"OperationError",
]
NOT_SET = object()
# The goal here is to not run out of memory on really big files. However, the chunk
# size has to be large enough so that the python loop isn't too costly in terms of
# CPU.
CHUNK_SIZE = 1024 * 1024 # 1 MiB
# Minimum size below which partial hashes don't need to be computed
MIN_FILE_SIZE = 3 * CHUNK_SIZE # 3MiB, because we take 3 samples
class FSError(Exception):
cls_message = "An error has occured on '{name}' in '{parent}'"
@@ -40,8 +54,8 @@ class FSError(Exception):
elif isinstance(fsobject, File):
name = fsobject.name
else:
name = ''
parentname = str(parent) if parent is not None else ''
name = ""
parentname = str(parent) if parent is not None else ""
Exception.__init__(self, message.format(name=name, parent=parentname))
@@ -49,32 +63,109 @@ class AlreadyExistsError(FSError):
"The directory or file name we're trying to add already exists"
cls_message = "'{name}' already exists in '{parent}'"
class InvalidPath(FSError):
"The path of self is invalid, and cannot be worked with."
cls_message = "'{name}' is invalid."
class InvalidDestinationError(FSError):
"""A copy/move operation has been called, but the destination is invalid."""
cls_message = "'{name}' is an invalid destination for this operation."
class OperationError(FSError):
"""A copy/move/delete operation has been called, but the checkup after the
operation shows that it didn't work."""
cls_message = "Operation on '{name}' failed."
class File:
"""Represents a file and holds metadata to be used for scanning.
class FilesDB:
create_table_query = "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, entry_dt DATETIME, md5 BLOB, md5partial BLOB)"
drop_table_query = "DROP TABLE files;"
select_query = "SELECT {key} FROM files WHERE path=:path AND size=:size and mtime_ns=:mtime_ns"
insert_query = """
INSERT INTO files (path, size, mtime_ns, entry_dt, {key}) VALUES (:path, :size, :mtime_ns, datetime('now'), :value)
ON CONFLICT(path) DO UPDATE SET size=:size, mtime_ns=:mtime_ns, entry_dt=datetime('now'), {key}=:value;
"""
INITIAL_INFO = {
'size': 0,
'mtime': 0,
'md5': '',
'md5partial': '',
}
def __init__(self):
self.conn = None
self.cur = None
self.lock = None
def connect(self, path):
# type: (str, ) -> None
self.conn = sqlite3.connect(path, check_same_thread=False)
self.cur = self.conn.cursor()
self.cur.execute(self.create_table_query)
self.lock = Lock()
def clear(self):
# type: () -> None
with self.lock:
self.cur.execute(self.drop_table_query)
self.cur.execute(self.create_table_query)
def get(self, path, key):
# type: (Path, str) -> bytes
stat = path.stat()
size = stat.st_size
mtime_ns = stat.st_mtime_ns
with self.lock:
self.cur.execute(self.select_query.format(key=key), {"path": str(path), "size": size, "mtime_ns": mtime_ns})
result = self.cur.fetchone()
if result:
return result[0]
return None
def put(self, path, key, value):
# type: (Path, str, Any) -> None
stat = path.stat()
size = stat.st_size
mtime_ns = stat.st_mtime_ns
with self.lock:
self.cur.execute(
self.insert_query.format(key=key),
{"path": str(path), "size": size, "mtime_ns": mtime_ns, "value": value},
)
def commit(self):
# type: () -> None
with self.lock:
self.conn.commit()
def close(self):
# type: () -> None
with self.lock:
self.cur.close()
self.conn.close()
filesdb = FilesDB() # Singleton
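A small sketch of the cache round-trip that File._read_info() performs below, using an in-memory database and a temporary file (illustrative only; module path assumed to be core.fs):
import hashlib
import tempfile

from hscommon.path import Path
from core import fs

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"some file content")

path = Path(tmp.name)
fs.filesdb.connect(":memory:")
fs.filesdb.get(path, "md5")                     # None: nothing cached for this path/size/mtime yet
digest = hashlib.md5(b"some file content").digest()
fs.filesdb.put(path, "md5", digest)             # rows are keyed on path and invalidated by size/mtime_ns
fs.filesdb.get(path, "md5") == digest           # True
fs.filesdb.commit()
fs.filesdb.close()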
class File:
"""Represents a file and holds metadata to be used for scanning."""
INITIAL_INFO = {"size": 0, "mtime": 0, "md5": b"", "md5partial": b"", "md5samples": b""}
# Slots for File make us save quite a bit of memory. In a memory test I've made with a lot of
# files, I saved 35% memory usage with "unread" files (no _read_info() call) and gains become
# even greater when we take into account read attributes (70%!). Yeah, it's worth it.
__slots__ = ('path', 'is_ref', 'words') + tuple(INITIAL_INFO.keys())
__slots__ = ("path", "is_ref", "words") + tuple(INITIAL_INFO.keys())
def __init__(self, path):
self.path = path
@@ -96,30 +187,10 @@ class File:
result = self.INITIAL_INFO[attrname]
return result
#This offset is where we should start reading the file to get a partial md5
#For audio file, it should be where audio data starts
def _get_md5partial_offset_and_size(self):
return (0x4000, 0x4000) #16Kb
def _calc_md5(self):
# type: () -> bytes
def _read_info(self, field):
if field in ('size', 'mtime'):
stats = self.path.stat()
self.size = nonone(stats.st_size, 0)
self.mtime = nonone(stats.st_mtime, 0)
elif field == 'md5partial':
try:
fp = self.path.open('rb')
offset, size = self._get_md5partial_offset_and_size()
fp.seek(offset)
partialdata = fp.read(size)
md5 = hashlib.md5(partialdata)
self.md5partial = md5.digest()
fp.close()
except Exception:
pass
elif field == 'md5':
try:
fp = self.path.open('rb')
with self.path.open("rb") as fp:
md5 = hashlib.md5()
# The goal here is to not run out of memory on really big files. However, the chunk
# size has to be large enough so that the python loop isn't too costly in terms of
@@ -129,10 +200,68 @@ class File:
while filedata:
md5.update(filedata)
filedata = fp.read(CHUNK_SIZE)
self.md5 = md5.digest()
fp.close()
except Exception:
pass
return md5.digest()
def _calc_md5partial(self):
# type: () -> bytes
# This offset is where we should start reading the file to get a partial md5
# For audio file, it should be where audio data starts
offset, size = (0x4000, 0x4000)
with self.path.open("rb") as fp:
fp.seek(offset)
partialdata = fp.read(size)
return hashlib.md5(partialdata).digest()
def _read_info(self, field):
# print(f"_read_info({field}) for {self}")
if field in ("size", "mtime"):
stats = self.path.stat()
self.size = nonone(stats.st_size, 0)
self.mtime = nonone(stats.st_mtime, 0)
elif field == "md5partial":
try:
self.md5partial = filesdb.get(self.path, "md5partial")
if self.md5partial is None:
self.md5partial = self._calc_md5partial()
filesdb.put(self.path, "md5partial", self.md5partial)
except Exception as e:
logging.warning("Couldn't get md5partial for %s: %s", self.path, e)
elif field == "md5":
try:
self.md5 = filesdb.get(self.path, "md5")
if self.md5 is None:
self.md5 = self._calc_md5()
filesdb.put(self.path, "md5", self.md5)
except Exception as e:
logging.warning("Couldn't get md5 for %s: %s", self.path, e)
elif field == "md5samples":
try:
with self.path.open("rb") as fp:
size = self.size
# Might as well hash such small files entirely.
if size <= MIN_FILE_SIZE:
setattr(self, field, self.md5)
return
# Chunk at 25% of the file
fp.seek(floor(size * 25 / 100), 0)
filedata = fp.read(CHUNK_SIZE)
md5 = hashlib.md5(filedata)
# Chunk at 60% of the file
fp.seek(floor(size * 60 / 100), 0)
filedata = fp.read(CHUNK_SIZE)
md5.update(filedata)
# Last chunk of the file
fp.seek(-CHUNK_SIZE, 2)
filedata = fp.read(CHUNK_SIZE)
md5.update(filedata)
setattr(self, field, md5.digest())
except Exception as e:
logging.error(f"Error computing md5samples: {e}")
def _read_all_info(self, attrnames=None):
"""Cache all possible info.
@@ -144,11 +273,10 @@ class File:
for attrname in attrnames:
getattr(self, attrname)
#--- Public
# --- Public
@classmethod
def can_handle(cls, path):
"""Returns whether this file wrapper class can handle ``path``.
"""
"""Returns whether this file wrapper class can handle ``path``."""
return not path.islink() and path.isfile()
def rename(self, newname):
@@ -166,11 +294,10 @@ class File:
self.path = destpath
def get_display_info(self, group, delta):
"""Returns a display-ready dict of dupe's data.
"""
"""Returns a display-ready dict of dupe's data."""
raise NotImplementedError()
#--- Properties
# --- Properties
@property
def extension(self):
return get_file_ext(self.name)
@@ -187,9 +314,10 @@ class File:
class Folder(File):
"""A wrapper around a folder path.
It has the size/md5 info of a File, but it's value are the sum of its subitems.
It has the size/md5 info of a File, but its value is the sum of its subitems.
"""
__slots__ = File.__slots__ + ('_subfolders', )
__slots__ = File.__slots__ + ("_subfolders",)
def __init__(self, path):
File.__init__(self, path)
@@ -201,20 +329,22 @@ class Folder(File):
return folders + files
def _read_info(self, field):
if field in {'size', 'mtime'}:
# print(f"_read_info({field}) for Folder {self}")
if field in {"size", "mtime"}:
size = sum((f.size for f in self._all_items()), 0)
self.size = size
stats = self.path.stat()
self.mtime = nonone(stats.st_mtime, 0)
elif field in {'md5', 'md5partial'}:
elif field in {"md5", "md5partial", "md5samples"}:
# What's sensitive here is that we must make sure that subfiles'
# md5 are always added up in the same order, but we also want a
# different md5 if a file gets moved in a different subdirectory.
def get_dir_md5_concat():
items = self._all_items()
items.sort(key=lambda f: f.path)
md5s = [getattr(f, field) for f in items]
return b''.join(md5s)
return b"".join(md5s)
md5 = hashlib.md5(get_dir_md5_concat())
digest = md5.digest()
@@ -244,6 +374,7 @@ def get_file(path, fileclasses=[File]):
if fileclass.can_handle(path):
return fileclass(path)
def get_files(path, fileclasses=[File]):
"""Returns a list of :class:`File` for each file contained in ``path``.

@@ -13,4 +13,3 @@ blue, which is supposed to be orange, does the sorting logic, holds selection, e
.. _cross-toolkit: http://www.hardcoded.net/articles/cross-toolkit-software
"""

@@ -8,23 +8,28 @@
from hscommon.notify import Listener
class DupeGuruGUIObject(Listener):
def __init__(self, app):
Listener.__init__(self, app)
self.app = app
def directories_changed(self):
# Implemented in child classes
pass
def dupes_selected(self):
# Implemented in child classes
pass
def marking_changed(self):
# Implemented in child classes
pass
def results_changed(self):
# Implemented in child classes
pass
def results_changed_but_keep_selection(self):
# Implemented in child classes
pass

@@ -10,6 +10,7 @@ import os
from hscommon.gui.base import GUIObject
from hscommon.trans import tr
class DeletionOptionsView:
"""Expected interface for :class:`DeletionOptions`'s view.
@@ -26,9 +27,9 @@ class DeletionOptionsView:
Other than the flags, there's also a prompt message which has a dynamic content, defined by
:meth:`update_msg`.
"""
def update_msg(self, msg: str):
"""Update the dialog's prompt with ``str``.
"""
"""Update the dialog's prompt with ``str``."""
def show(self):
"""Show the dialog in a modal fashion.
@@ -37,8 +38,8 @@ class DeletionOptionsView:
"""
def set_hardlink_option_enabled(self, is_enabled: bool):
"""Enable or disable the widget controlling :attr:`DeletionOptions.use_hardlinks`.
"""
"""Enable or disable the widget controlling :attr:`DeletionOptions.use_hardlinks`."""
class DeletionOptions(GUIObject):
"""Present the user with deletion options before proceeding.
@@ -46,6 +47,7 @@ class DeletionOptions(GUIObject):
When the user activates "Send to trash", we present him with a couple of options that changes
the behavior of that deletion operation.
"""
def __init__(self):
GUIObject.__init__(self)
#: Whether symlinks or hardlinks are used when doing :attr:`link_deleted`.
@@ -71,8 +73,7 @@ class DeletionOptions(GUIObject):
return self.view.show()
def supports_links(self):
"""Returns whether our platform supports symlinks.
"""
"""Returns whether our platform supports symlinks."""
# When on a platform that doesn't implement it, calling os.symlink() (with the wrong number
# of arguments) raises NotImplementedError, which allows us to gracefully check for the
# feature.
@@ -103,5 +104,3 @@ class DeletionOptions(GUIObject):
self._link_deleted = value
hardlinks_enabled = value and self.supports_links()
self.view.set_hardlink_option_enabled(hardlinks_enabled)

@@ -9,6 +9,7 @@
from hscommon.gui.base import GUIObject
from .base import DupeGuruGUIObject
class DetailsPanel(GUIObject, DupeGuruGUIObject):
def __init__(self, app):
GUIObject.__init__(self, multibind=True)
@@ -19,7 +20,7 @@ class DetailsPanel(GUIObject, DupeGuruGUIObject):
self._refresh()
self.view.refresh()
#--- Private
# --- Private
def _refresh(self):
if self.app.selected_dupes:
dupe = self.app.selected_dupes[0]
@@ -34,15 +35,13 @@ class DetailsPanel(GUIObject, DupeGuruGUIObject):
columns = self.app.result_table.COLUMNS[1:] # first column is the 'marked' column
self._table = [(c.display, data1[c.name], data2[c.name]) for c in columns]
#--- Public
# --- Public
def row_count(self):
return len(self._table)
def row(self, row_index):
return self._table[row_index]
#--- Event Handlers
# --- Event Handlers
def dupes_selected(self):
self._refresh()
self.view.refresh()
self._view_updated()

@@ -11,7 +11,8 @@ from hscommon.gui.tree import Tree, Node
from ..directories import DirectoryState
from .base import DupeGuruGUIObject
STATE_ORDER = [DirectoryState.Normal, DirectoryState.Reference, DirectoryState.Excluded]
STATE_ORDER = [DirectoryState.NORMAL, DirectoryState.REFERENCE, DirectoryState.EXCLUDED]
# Lazily loads children
class DirectoryNode(Node):
@@ -55,7 +56,7 @@ class DirectoryNode(Node):
class DirectoryTree(Tree, DupeGuruGUIObject):
#--- model -> view calls:
# --- model -> view calls:
# refresh()
# refresh_states() # when only states label need to be refreshed
#
@@ -85,9 +86,9 @@ class DirectoryTree(Tree, DupeGuruGUIObject):
else:
# All selected nodes or on second-or-more level, exclude them.
nodes = self.selected_nodes
newstate = DirectoryState.Excluded
if all(node.state == DirectoryState.Excluded for node in nodes):
newstate = DirectoryState.Normal
newstate = DirectoryState.EXCLUDED
if all(node.state == DirectoryState.EXCLUDED for node in nodes):
newstate = DirectoryState.NORMAL
for node in nodes:
node.state = newstate
@@ -100,8 +101,6 @@ class DirectoryTree(Tree, DupeGuruGUIObject):
node.update_all_states()
self.view.refresh_states()
#--- Event Handlers
# --- Event Handlers
def directories_changed(self):
self._refresh()
self.view.refresh()
self._view_updated()

@@ -0,0 +1,90 @@
# Created On: 2012/03/13
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
#
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
from .exclude_list_table import ExcludeListTable
from core.exclude import has_sep
from os import sep
import logging
class ExcludeListDialogCore:
def __init__(self, app):
self.app = app
self.exclude_list = self.app.exclude_list # Markable from exclude.py
self.exclude_list_table = ExcludeListTable(self, app) # GUITable, this is the "model"
def restore_defaults(self):
self.exclude_list.restore_defaults()
self.refresh()
def refresh(self):
self.exclude_list_table.refresh()
def remove_selected(self):
for row in self.exclude_list_table.selected_rows:
self.exclude_list_table.remove(row)
self.exclude_list.remove(row.regex)
self.refresh()
def rename_selected(self, newregex):
"""Rename the selected regex to ``newregex``.
If there is more than one selected row, the first one is used.
:param str newregex: The regex to rename the row's regex to.
:return bool: true if success, false if error.
"""
try:
r = self.exclude_list_table.selected_rows[0]
self.exclude_list.rename(r.regex, newregex)
self.refresh()
return True
except Exception as e:
logging.warning(f"Error while renaming regex to {newregex}: {e}")
return False
def add(self, regex):
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
self.exclude_list_table.add(regex)
def test_string(self, test_string):
"""Set the highlight property on each row when its regex matches the
test_string supplied. Return True if any row matched."""
matched = False
for row in self.exclude_list_table.rows:
compiled_regex = self.exclude_list.get_compiled(row.regex)
if self.is_match(test_string, compiled_regex):
row.highlight = True
matched = True
else:
row.highlight = False
return matched
def is_match(self, test_string, compiled_regex):
# This method is like an inverted version of ExcludeList.is_excluded()
if not compiled_regex:
return False
matched = False
# Test only the filename portion of the path
if not has_sep(compiled_regex.pattern) and sep in test_string:
filename = test_string.rsplit(sep, 1)[1]
if compiled_regex.fullmatch(filename):
matched = True
return matched
# Test the entire path + filename
if compiled_regex.fullmatch(test_string):
matched = True
return matched
def reset_rows_highlight(self):
for row in self.exclude_list_table.rows:
row.highlight = False
def show(self):
self.view.show()
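As an illustration of the matching rule implemented in is_match() above (made-up pattern and path, not taken from the diff): a regex without a path separator is tested against the filename only, while one containing a separator must match the full path.

import re
from os import sep

name_only = re.compile(r".*\.tmp")  # no separator: tested against the filename portion
whole_path = re.compile(r".*" + re.escape(sep) + r"build" + re.escape(sep) + r".*")

path = sep.join(["", "home", "user", "build", "output.tmp"])
assert name_only.fullmatch("output.tmp")  # filename matches
assert whole_path.fullmatch(path)         # full path matches because the pattern has a separator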

@@ -0,0 +1,96 @@
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
from .base import DupeGuruGUIObject
from hscommon.gui.table import GUITable, Row
from hscommon.gui.column import Column, Columns
from hscommon.trans import trget
tr = trget("ui")
class ExcludeListTable(GUITable, DupeGuruGUIObject):
COLUMNS = [Column("marked", ""), Column("regex", tr("Regular Expressions"))]
def __init__(self, exclude_list_dialog, app):
GUITable.__init__(self)
DupeGuruGUIObject.__init__(self, app)
self._columns = Columns(self)
self.dialog = exclude_list_dialog
def rename_selected(self, newname):
row = self.selected_row
if row is None:
return False
row._data = None
return self.dialog.rename_selected(newname)
# --- Virtual
def _do_add(self, regex):
"""(Virtual) Creates a new row, adds it in the table.
Returns ``(row, insert_index)``."""
# Return index 0 to insert at the top
return ExcludeListRow(self, self.dialog.exclude_list.is_marked(regex), regex), 0
def _do_delete(self):
self.dialog.exclude_list.remove(self.selected_row.regex)
# --- Override
def add(self, regex):
row, insert_index = self._do_add(regex)
self.insert(insert_index, row)
self.view.refresh()
def _fill(self):
for enabled, regex in self.dialog.exclude_list:
self.append(ExcludeListRow(self, enabled, regex))
def refresh(self, refresh_view=True):
"""Override to avoid keeping previous selection in case of multiple rows
selected previously."""
self.cancel_edits()
del self[:]
self._fill()
if refresh_view:
self.view.refresh()
class ExcludeListRow(Row):
def __init__(self, table, enabled, regex):
Row.__init__(self, table)
self._app = table.app
self._data = None
self.enabled = str(enabled)
self.regex = str(regex)
self.highlight = False
@property
def data(self):
if self._data is None:
self._data = {"marked": self.enabled, "regex": self.regex}
return self._data
@property
def markable(self):
return self._app.exclude_list.is_markable(self.regex)
@property
def marked(self):
return self._app.exclude_list.is_marked(self.regex)
@marked.setter
def marked(self, value):
if value:
self._app.exclude_list.mark(self.regex)
else:
self._app.exclude_list.unmark(self.regex)
@property
def error(self):
# This assumes error() returns an Exception()
message = self._app.exclude_list.error(self.regex)
if hasattr(message, "msg"):
return self._app.exclude_list.error(self.regex).msg
else:
return message # Exception object

@@ -8,22 +8,23 @@
from hscommon.trans import tr
from .ignore_list_table import IgnoreListTable
class IgnoreListDialog:
#--- View interface
# --- View interface
# show()
#
def __init__(self, app):
self.app = app
self.ignore_list = self.app.ignore_list
self.ignore_list_table = IgnoreListTable(self)
self.ignore_list_table = IgnoreListTable(self) # GUITable
def clear(self):
if not self.ignore_list:
return
msg = tr("Do you really want to remove all %d items from the ignore list?") % len(self.ignore_list)
if self.app.view.ask_yes_no(msg):
self.ignore_list.Clear()
self.ignore_list.clear()
self.refresh()
def refresh(self):
@@ -36,4 +37,3 @@ class IgnoreListDialog:
def show(self):
self.view.show()

@@ -10,22 +10,23 @@ from hscommon.gui.table import GUITable, Row
from hscommon.gui.column import Column, Columns
from hscommon.trans import trget
coltr = trget('columns')
coltr = trget("columns")
class IgnoreListTable(GUITable):
COLUMNS = [
# the str concat below saves us needless localization.
Column('path1', coltr("File Path") + " 1"),
Column('path2', coltr("File Path") + " 2"),
Column("path1", coltr("File Path") + " 1"),
Column("path2", coltr("File Path") + " 2"),
]
def __init__(self, ignore_list_dialog):
GUITable.__init__(self)
self.columns = Columns(self)
self._columns = Columns(self)
self.view = None
self.dialog = ignore_list_dialog
#--- Override
# --- Override
def _fill(self):
for path1, path2 in self.dialog.ignore_list:
self.append(IgnoreListRow(self, path1, path2))
@@ -38,4 +39,3 @@ class IgnoreListRow(Row):
self.path2_original = path2
self.path1 = str(path1)
self.path2 = str(path2)

@@ -9,6 +9,7 @@
from hscommon.gui.base import GUIObject
from hscommon.gui.selectable_list import GUISelectableList
class CriterionCategoryList(GUISelectableList):
def __init__(self, dialog):
self.dialog = dialog
@@ -18,6 +19,7 @@ class CriterionCategoryList(GUISelectableList):
self.dialog.select_category(self.dialog.categories[self.selected_index])
GUISelectableList._update_selection(self)
class PrioritizationList(GUISelectableList):
def __init__(self, dialog):
self.dialog = dialog
@@ -41,6 +43,7 @@ class PrioritizationList(GUISelectableList):
del prilist[i]
self._refresh_contents()
class PrioritizeDialog(GUIObject):
def __init__(self, app):
GUIObject.__init__(self)
@@ -52,15 +55,15 @@ class PrioritizeDialog(GUIObject):
self.prioritizations = []
self.prioritization_list = PrioritizationList(self)
#--- Override
# --- Override
def _view_updated(self):
self.category_list.select(0)
#--- Private
# --- Private
def _sort_key(self, dupe):
return tuple(crit.sort_key(dupe) for crit in self.prioritizations)
#--- Public
# --- Public
def select_category(self, category):
self.criteria = category.criteria_list()
self.criteria_list[:] = [c.display_value for c in self.criteria]
@@ -69,13 +72,15 @@ class PrioritizeDialog(GUIObject):
# Add selected criteria in criteria_list to prioritization_list.
if self.criteria_list.selected_index is None:
return
crit = self.criteria[self.criteria_list.selected_index]
for i in self.criteria_list.selected_indexes:
crit = self.criteria[i]
self.prioritizations.append(crit)
del crit
self.prioritization_list[:] = [crit.display for crit in self.prioritizations]
def remove_selected(self):
self.prioritization_list.remove_selected()
self.prioritization_list.select([])
def perform_reprioritization(self):
self.app.reprioritize_groups(self._sort_key)

@@ -10,6 +10,7 @@ from hscommon import desktop
from .problem_table import ProblemTable
class ProblemDialog:
def __init__(self, app):
self.app = app
@@ -26,4 +27,3 @@ class ProblemDialog:
def select_dupe(self, dupe):
self._selected_dupe = dupe

@@ -10,20 +10,21 @@ from hscommon.gui.table import GUITable, Row
from hscommon.gui.column import Column, Columns
from hscommon.trans import trget
coltr = trget('columns')
coltr = trget("columns")
class ProblemTable(GUITable):
COLUMNS = [
Column('path', coltr("File Path")),
Column('msg', coltr("Error Message")),
Column("path", coltr("File Path")),
Column("msg", coltr("Error Message")),
]
def __init__(self, problem_dialog):
GUITable.__init__(self)
self.columns = Columns(self)
self._columns = Columns(self)
self.dialog = problem_dialog
#--- Override
# --- Override
def _update_selection(self):
row = self.selected_row
dupe = row.dupe if row is not None else None
@@ -41,4 +42,3 @@ class ProblemRow(Row):
self.dupe = dupe
self.msg = msg
self.path = str(dupe.path)

@@ -13,6 +13,7 @@ from hscommon.gui.column import Columns
from .base import DupeGuruGUIObject
class DupeRow(Row):
def __init__(self, table, group, dupe):
Row.__init__(self, table)
@@ -40,6 +41,8 @@ class DupeRow(Row):
# table.DELTA_COLUMNS are always "delta"
self._delta_columns = self.table.DELTA_COLUMNS.copy()
dupe_info = self.data
if self._group.ref is None:
return False
ref_info = self._group.ref.get_display_info(group=self._group, delta=False)
for key, value in dupe_info.items():
if (key not in self._delta_columns) and (ref_info[key].lower() != value.lower()):
@@ -79,12 +82,12 @@ class ResultTable(GUITable, DupeGuruGUIObject):
def __init__(self, app):
GUITable.__init__(self)
DupeGuruGUIObject.__init__(self, app)
self.columns = Columns(self, prefaccess=app, savename='ResultTable')
self._columns = Columns(self, prefaccess=app, savename="ResultTable")
self._power_marker = False
self._delta_values = False
self._sort_descriptors = ('name', True)
self._sort_descriptors = ("name", True)
#--- Override
# --- Override
def _view_updated(self):
self._refresh_with_view()
@@ -96,7 +99,7 @@ class ResultTable(GUITable, DupeGuruGUIObject):
def _update_selection(self):
rows = self.selected_rows
self.app._select_dupes(list(map(attrgetter('_dupe'), rows)))
self.app._select_dupes(list(map(attrgetter("_dupe"), rows)))
def _fill(self):
if not self.power_marker:
@@ -113,12 +116,12 @@ class ResultTable(GUITable, DupeGuruGUIObject):
self.refresh()
self.view.show_selected_row()
#--- Public
# --- Public
def get_row_value(self, index, column):
try:
row = self[index]
except IndexError:
return '---'
return "---"
if self.delta_values:
return row.data_delta[column]
else:
@@ -142,7 +145,7 @@ class ResultTable(GUITable, DupeGuruGUIObject):
self._sort_descriptors = (key, asc)
self._refresh_with_view()
#--- Properties
# --- Properties
@property
def power_marker(self):
return self._power_marker
@@ -171,7 +174,7 @@ class ResultTable(GUITable, DupeGuruGUIObject):
def selected_dupe_count(self):
return sum(1 for row in self.selected_rows if not row.isref)
#--- Event Handlers
# --- Event Handlers
def marking_changed(self):
self.view.invalidate_markings()
@@ -187,5 +190,4 @@ class ResultTable(GUITable, DupeGuruGUIObject):
self.view.refresh()
def save_session(self):
self.columns.save_columns()
self._columns.save_columns()

@@ -8,6 +8,7 @@
from .base import DupeGuruGUIObject
class StatsLabel(DupeGuruGUIObject):
def _view_updated(self):
self.view.refresh()
@@ -18,4 +19,5 @@ class StatsLabel(DupeGuruGUIObject):
def results_changed(self):
self.view.refresh()
marking_changed = results_changed

@@ -10,16 +10,17 @@ from xml.etree import ElementTree as ET
from hscommon.util import FileOrPath
class IgnoreList:
"""An ignore list implementation that is iterable, filterable and exportable to XML.
Call Ignore to add an ignore list entry, and AreIgnored to check if 2 items are in the list.
When iterated, 2 sized tuples will be returned, the tuples containing 2 items ignored together.
"""
#---Override
# ---Override
def __init__(self):
self._ignored = {}
self._count = 0
self.clear()
def __iter__(self):
for first, seconds in self._ignored.items():
@@ -29,8 +30,8 @@ class IgnoreList:
def __len__(self):
return self._count
#---Public
def AreIgnored(self, first, second):
# ---Public
def are_ignored(self, first, second):
def do_check(first, second):
try:
matches = self._ignored[first]
@@ -40,23 +41,23 @@ class IgnoreList:
return do_check(first, second) or do_check(second, first)
def Clear(self):
def clear(self):
self._ignored = {}
self._count = 0
def Filter(self, func):
def filter(self, func):
"""Applies a filter on all ignored items, and remove all matches where func(first,second)
doesn't return True.
"""
filtered = IgnoreList()
for first, second in self:
if func(first, second):
filtered.Ignore(first, second)
filtered.ignore(first, second)
self._ignored = filtered._ignored
self._count = filtered._count
def Ignore(self, first, second):
if self.AreIgnored(first, second):
def ignore(self, first, second):
if self.are_ignored(first, second):
return
try:
matches = self._ignored[first]
@@ -86,8 +87,7 @@ class IgnoreList:
except KeyError:
return False
if not inner(first, second):
if not inner(second, first):
if not inner(first, second) and not inner(second, first):
raise ValueError()
def load_from_xml(self, infile):
@@ -99,31 +99,29 @@ class IgnoreList:
root = ET.parse(infile).getroot()
except Exception:
return
file_elems = (e for e in root if e.tag == 'file')
file_elems = (e for e in root if e.tag == "file")
for fn in file_elems:
file_path = fn.get('path')
file_path = fn.get("path")
if not file_path:
continue
subfile_elems = (e for e in fn if e.tag == 'file')
subfile_elems = (e for e in fn if e.tag == "file")
for sfn in subfile_elems:
subfile_path = sfn.get('path')
subfile_path = sfn.get("path")
if subfile_path:
self.Ignore(file_path, subfile_path)
self.ignore(file_path, subfile_path)
def save_to_xml(self, outfile):
"""Create a XML file that can be used by load_from_xml.
outfile can be a file object or a filename.
"""
root = ET.Element('ignore_list')
root = ET.Element("ignore_list")
for filename, subfiles in self._ignored.items():
file_node = ET.SubElement(root, 'file')
file_node.set('path', filename)
file_node = ET.SubElement(root, "file")
file_node.set("path", filename)
for subfilename in subfiles:
subfile_node = ET.SubElement(file_node, 'file')
subfile_node.set('path', subfilename)
subfile_node = ET.SubElement(file_node, "file")
subfile_node.set("path", subfilename)
tree = ET.ElementTree(root)
with FileOrPath(outfile, 'wb') as fp:
tree.write(fp, encoding='utf-8')
with FileOrPath(outfile, "wb") as fp:
tree.write(fp, encoding="utf-8")
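A brief usage sketch (not in the diff) of the renamed snake_case API introduced above; the paths are made up:

il = IgnoreList()
il.ignore("/photos/a.jpg", "/photos/b.jpg")
assert il.are_ignored("/photos/b.jpg", "/photos/a.jpg")  # lookup is symmetric
il.filter(lambda first, second: "a.jpg" not in first)    # keeps only pairs where the predicate is True
il.save_to_xml("ignore_list.xml")                        # accepts a filename or a file object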

@@ -6,19 +6,22 @@
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
class Markable:
def __init__(self):
self.__marked = set()
self.__inverted = False
#---Virtual
#About did_mark and did_unmark: They only happen what an object is actually added/removed
# ---Virtual
# About did_mark and did_unmark: They only happen when an object is actually added/removed
# in self.__marked, and is not affected by __inverted. Thus, self.mark while __inverted
#is True will launch _DidUnmark.
# is True will launch _DidUnmark.
def _did_mark(self, o):
# Implemented in child classes
pass
def _did_unmark(self, o):
# Implemented in child classes
pass
def _get_markable_count(self):
@@ -27,7 +30,7 @@ class Markable:
def _is_markable(self, o):
return True
#---Protected
# ---Protected
def _remove_mark_flag(self, o):
try:
self.__marked.remove(o)
@@ -35,7 +38,7 @@ class Markable:
except KeyError:
pass
#---Public
# ---Public
def is_marked(self, o):
if not self._is_markable(o):
return False
@@ -92,7 +95,7 @@ class Markable:
for o in objects:
self.unmark(o)
#--- Properties
# --- Properties
@property
def mark_count(self):
if self.__inverted:
@@ -104,6 +107,7 @@ class Markable:
def mark_inverted(self):
return self.__inverted
class MarkableList(list, Markable):
def __init__(self):
list.__init__(self)

@@ -6,39 +6,54 @@
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
from hsaudiotag import auto
import mutagen
from hscommon.util import get_file_ext, format_size, format_time
from core.util import format_timestamp, format_perc, format_words, format_dupe_count
from core import fs
TAG_FIELDS = {
'audiosize', 'duration', 'bitrate', 'samplerate', 'title', 'artist',
'album', 'genre', 'year', 'track', 'comment'
"audiosize",
"duration",
"bitrate",
"samplerate",
"title",
"artist",
"album",
"genre",
"year",
"track",
"comment",
}
# This is a temporary workaround for migration from hsaudiotag for the can_handle method
SUPPORTED_EXTS = {"mp3", "wma", "m4a", "m4p", "ogg", "flac", "aif", "aiff", "aifc"}
class MusicFile(fs.File):
INITIAL_INFO = fs.File.INITIAL_INFO.copy()
INITIAL_INFO.update({
'audiosize': 0,
'bitrate': 0,
'duration': 0,
'samplerate': 0,
'artist': '',
'album': '',
'title': '',
'genre': '',
'comment': '',
'year': '',
'track': 0,
})
INITIAL_INFO.update(
{
"audiosize": 0,
"bitrate": 0,
"duration": 0,
"samplerate": 0,
"artist": "",
"album": "",
"title": "",
"genre": "",
"comment": "",
"year": "",
"track": 0,
}
)
__slots__ = fs.File.__slots__ + tuple(INITIAL_INFO.keys())
@classmethod
def can_handle(cls, path):
if not fs.File.can_handle(path):
return False
return get_file_ext(path.name) in auto.EXT2CLASS
return get_file_ext(path.name) in SUPPORTED_EXTS
def get_display_info(self, group, delta):
size = self.size
@@ -60,45 +75,46 @@ class MusicFile(fs.File):
else:
percentage = group.percentage
dupe_count = len(group.dupes)
dupe_folder_path = getattr(self, 'display_folder_path', self.folder_path)
dupe_folder_path = getattr(self, "display_folder_path", self.folder_path)
return {
'name': self.name,
'folder_path': str(dupe_folder_path),
'size': format_size(size, 2, 2, False),
'duration': format_time(duration, with_hours=False),
'bitrate': str(bitrate),
'samplerate': str(samplerate),
'extension': self.extension,
'mtime': format_timestamp(mtime, delta and m),
'title': self.title,
'artist': self.artist,
'album': self.album,
'genre': self.genre,
'year': self.year,
'track': str(self.track),
'comment': self.comment,
'percentage': format_perc(percentage),
'words': format_words(self.words) if hasattr(self, 'words') else '',
'dupe_count': format_dupe_count(dupe_count),
"name": self.name,
"folder_path": str(dupe_folder_path),
"size": format_size(size, 2, 2, False),
"duration": format_time(duration, with_hours=False),
"bitrate": str(bitrate),
"samplerate": str(samplerate),
"extension": self.extension,
"mtime": format_timestamp(mtime, delta and m),
"title": self.title,
"artist": self.artist,
"album": self.album,
"genre": self.genre,
"year": self.year,
"track": str(self.track),
"comment": self.comment,
"percentage": format_perc(percentage),
"words": format_words(self.words) if hasattr(self, "words") else "",
"dupe_count": format_dupe_count(dupe_count),
}
def _get_md5partial_offset_and_size(self):
f = auto.File(str(self.path))
return (f.audio_offset, f.audio_size)
# No longer calculating the offset and audio size, just whole file
size = self.path.stat().st_size
return (0, size)
def _read_info(self, field):
fs.File._read_info(self, field)
if field in TAG_FIELDS:
f = auto.File(str(self.path))
self.audiosize = f.audio_size
self.bitrate = f.bitrate
self.duration = f.duration
self.samplerate = f.sample_rate
self.artist = f.artist
self.album = f.album
self.title = f.title
self.genre = f.genre
self.comment = f.comment
self.year = f.year
self.track = f.track
# The various conversions here are to make this look like the previous implementation
file = mutagen.File(str(self.path), easy=True)
self.audiosize = self.path.stat().st_size
self.bitrate = file.info.bitrate / 1000
self.duration = file.info.length
self.samplerate = file.info.sample_rate
self.artist = ", ".join(file.tags.get("artist") or [])
self.album = ", ".join(file.tags.get("album") or [])
self.title = ", ".join(file.tags.get("title") or [])
self.genre = ", ".join(file.tags.get("genre") or [])
self.comment = ", ".join(file.tags.get("comment") or [""])
self.year = ", ".join(file.tags.get("date") or [])
self.track = (file.tags.get("tracknumber") or [""])[0]
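For context on the conversions above, a rough sketch (assuming a local "song.mp3" with ID3 tags; not part of the diff) of what mutagen's easy-tag interface returns:

import mutagen

audio = mutagen.File("song.mp3", easy=True)
print(audio.info.length)          # duration in seconds (float)
print(audio.info.bitrate / 1000)  # mutagen reports bits/s, hence the /1000 above
print(audio.tags.get("artist"))   # values come back as lists, e.g. ['Some Artist'], or None if absent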

@@ -8,11 +8,16 @@
from hscommon.trans import trget
from core.prioritize import (
KindCategory, FolderCategory, FilenameCategory, NumericalCategory,
SizeCategory, MtimeCategory
KindCategory,
FolderCategory,
FilenameCategory,
NumericalCategory,
SizeCategory,
MtimeCategory,
)
coltr = trget('columns')
coltr = trget("columns")
class DurationCategory(NumericalCategory):
NAME = coltr("Duration")
@@ -20,21 +25,29 @@ class DurationCategory(NumericalCategory):
def extract_value(self, dupe):
return dupe.duration
class BitrateCategory(NumericalCategory):
NAME = coltr("Bitrate")
def extract_value(self, dupe):
return dupe.bitrate
class SamplerateCategory(NumericalCategory):
NAME = coltr("Samplerate")
def extract_value(self, dupe):
return dupe.samplerate
def all_categories():
return [
KindCategory, FolderCategory, FilenameCategory, SizeCategory, DurationCategory,
BitrateCategory, SamplerateCategory, MtimeCategory
KindCategory,
FolderCategory,
FilenameCategory,
SizeCategory,
DurationCategory,
BitrateCategory,
SamplerateCategory,
MtimeCategory,
]

@@ -10,28 +10,29 @@ from hscommon.trans import trget
from core.gui.result_table import ResultTable as ResultTableBase
coltr = trget('columns')
coltr = trget("columns")
class ResultTable(ResultTableBase):
COLUMNS = [
Column('marked', ''),
Column('name', coltr("Filename")),
Column('folder_path', coltr("Folder"), visible=False, optional=True),
Column('size', coltr("Size (MB)"), optional=True),
Column('duration', coltr("Time"), optional=True),
Column('bitrate', coltr("Bitrate"), optional=True),
Column('samplerate', coltr("Sample Rate"), visible=False, optional=True),
Column('extension', coltr("Kind"), optional=True),
Column('mtime', coltr("Modification"), visible=False, optional=True),
Column('title', coltr("Title"), visible=False, optional=True),
Column('artist', coltr("Artist"), visible=False, optional=True),
Column('album', coltr("Album"), visible=False, optional=True),
Column('genre', coltr("Genre"), visible=False, optional=True),
Column('year', coltr("Year"), visible=False, optional=True),
Column('track', coltr("Track Number"), visible=False, optional=True),
Column('comment', coltr("Comment"), visible=False, optional=True),
Column('percentage', coltr("Match %"), optional=True),
Column('words', coltr("Words Used"), visible=False, optional=True),
Column('dupe_count', coltr("Dupe Count"), visible=False, optional=True),
Column("marked", ""),
Column("name", coltr("Filename")),
Column("folder_path", coltr("Folder"), visible=False, optional=True),
Column("size", coltr("Size (MB)"), optional=True),
Column("duration", coltr("Time"), optional=True),
Column("bitrate", coltr("Bitrate"), optional=True),
Column("samplerate", coltr("Sample Rate"), visible=False, optional=True),
Column("extension", coltr("Kind"), optional=True),
Column("mtime", coltr("Modification"), visible=False, optional=True),
Column("title", coltr("Title"), visible=False, optional=True),
Column("artist", coltr("Artist"), visible=False, optional=True),
Column("album", coltr("Album"), visible=False, optional=True),
Column("genre", coltr("Genre"), visible=False, optional=True),
Column("year", coltr("Year"), visible=False, optional=True),
Column("track", coltr("Track Number"), visible=False, optional=True),
Column("comment", coltr("Comment"), visible=False, optional=True),
Column("percentage", coltr("Match %"), optional=True),
Column("words", coltr("Words Used"), visible=False, optional=True),
Column("dupe_count", coltr("Dupe Count"), visible=False, optional=True),
]
DELTA_COLUMNS = {'size', 'duration', 'bitrate', 'samplerate', 'mtime'}
DELTA_COLUMNS = {"size", "duration", "bitrate", "samplerate", "mtime"}

@@ -8,6 +8,7 @@ from hscommon.trans import tr
from core.scanner import Scanner as ScannerBase, ScanOption, ScanType
class ScannerME(ScannerBase):
@staticmethod
def _key_func(dupe):
@@ -16,11 +17,9 @@ class ScannerME(ScannerBase):
@staticmethod
def get_scan_options():
return [
ScanOption(ScanType.Filename, tr("Filename")),
ScanOption(ScanType.Fields, tr("Filename - Fields")),
ScanOption(ScanType.FieldsNoOrder, tr("Filename - Fields (No Order)")),
ScanOption(ScanType.Tag, tr("Tags")),
ScanOption(ScanType.Contents, tr("Contents")),
ScanOption(ScanType.FILENAME, tr("Filename")),
ScanOption(ScanType.FIELDS, tr("Filename - Fields")),
ScanOption(ScanType.FIELDSNOORDER, tr("Filename - Fields (No Order)")),
ScanOption(ScanType.TAG, tr("Tags")),
ScanOption(ScanType.CONTENTS, tr("Contents")),
]

@@ -1 +1,12 @@
from . import block, cache, exif, iphoto_plist, matchblock, matchexif, photo, prioritize, result_table, scanner # noqa
from . import ( # noqa
block,
cache,
exif,
iphoto_plist,
matchblock,
matchexif,
photo,
prioritize,
result_table,
scanner,
)

@@ -6,13 +6,15 @@
from ._cache import string_to_colors # noqa
def colors_to_string(colors):
"""Transform the 3 sized tuples 'colors' into a hex string.
[(0,100,255)] --> 0064ff
[(1,2,3),(4,5,6)] --> 010203040506
"""
return ''.join('%02x%02x%02x' % (r, g, b) for r, g, b in colors)
return "".join("%02x%02x%02x" % (r, g, b) for r, g, b in colors)
# This function is an important bottleneck of dupeGuru PE. It has been converted to C.
# def string_to_colors(s):
@@ -23,4 +25,3 @@ def colors_to_string(colors):
# number = int(s[i:i+6], 16)
# result.append((number >> 16, (number >> 8) & 0xff, number & 0xff))
# return result
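For reference, a pure-Python round trip matching the commented-out implementation above (illustrative only; the shipped version is the C _cache.string_to_colors):

def string_to_colors_py(s):
    # Inverse of colors_to_string(): every 6 hex chars become one (r, g, b) tuple.
    result = []
    for i in range(0, len(s), 6):
        number = int(s[i:i + 6], 16)
        result.append((number >> 16, (number >> 8) & 0xFF, number & 0xFF))
    return result

assert colors_to_string([(0, 100, 255)]) == "0064ff"
assert string_to_colors_py("010203040506") == [(1, 2, 3), (4, 5, 6)]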

@@ -12,29 +12,35 @@ from collections import namedtuple
from .cache import string_to_colors, colors_to_string
def wrap_path(path):
return 'path:{}'.format(path)
return "path:{}".format(path)
def unwrap_path(key):
return key[5:]
def wrap_id(path):
return 'id:{}'.format(path)
return "id:{}".format(path)
def unwrap_id(key):
return int(key[3:])
CacheRow = namedtuple('CacheRow', 'id path blocks mtime')
CacheRow = namedtuple("CacheRow", "id path blocks mtime")
class ShelveCache:
"""A class to cache picture blocks in a shelve backend.
"""
"""A class to cache picture blocks in a shelve backend."""
def __init__(self, db=None, readonly=False):
self.istmp = db is None
if self.istmp:
self.dtmp = tempfile.mkdtemp()
self.ftmp = db = op.join(self.dtmp, 'tmpdb')
flag = 'r' if readonly else 'c'
self.ftmp = db = op.join(self.dtmp, "tmpdb")
flag = "r" if readonly else "c"
self.shelve = shelve.open(db, flag)
self.maxid = self._compute_maxid()
@@ -54,10 +60,10 @@ class ShelveCache:
return string_to_colors(self.shelve[skey].blocks)
def __iter__(self):
return (unwrap_path(k) for k in self.shelve if k.startswith('path:'))
return (unwrap_path(k) for k in self.shelve if k.startswith("path:"))
def __len__(self):
return sum(1 for k in self.shelve if k.startswith('path:'))
return sum(1 for k in self.shelve if k.startswith("path:"))
def __setitem__(self, path_str, blocks):
blocks = colors_to_string(blocks)
@@ -74,7 +80,7 @@ class ShelveCache:
self.shelve[wrap_id(rowid)] = wrap_path(path_str)
def _compute_maxid(self):
return max((unwrap_id(k) for k in self.shelve if k.startswith('id:')), default=1)
return max((unwrap_id(k) for k in self.shelve if k.startswith("id:")), default=1)
def _get_new_id(self):
self.maxid += 1
@@ -133,4 +139,3 @@ class ShelveCache:
# #402 and #439. I don't think it hurts to silently ignore the error, so that's
# what we do
pass

@@ -11,10 +11,11 @@ import sqlite3 as sqlite
from .cache import string_to_colors, colors_to_string
class SqliteCache:
"""A class to cache picture blocks in a sqlite backend.
"""
def __init__(self, db=':memory:', readonly=False):
"""A class to cache picture blocks in a sqlite backend."""
def __init__(self, db=":memory:", readonly=False):
# readonly is not used in the sqlite version of the cache
self.dbname = db
self.con = None
@@ -67,9 +68,9 @@ class SqliteCache:
try:
self.con.execute(sql, [blocks, mtime, path_str])
except sqlite.OperationalError:
logging.warning('Picture cache could not set value for key %r', path_str)
logging.warning("Picture cache could not set value for key %r", path_str)
except sqlite.DatabaseError as e:
logging.warning('DatabaseError while setting value for key %r: %s', path_str, str(e))
logging.warning("DatabaseError while setting value for key %r: %s", path_str, str(e))
def _create_con(self, second_try=False):
def create_tables():
@@ -87,14 +88,14 @@ class SqliteCache:
except sqlite.DatabaseError as e: # corrupted db
if second_try:
raise # Something really strange is happening
logging.warning('Could not create picture cache because of an error: %s', str(e))
logging.warning("Could not create picture cache because of an error: %s", str(e))
self.con.close()
os.remove(self.dbname)
self._create_con(second_try=True)
def clear(self):
self.close()
if self.dbname != ':memory:':
if self.dbname != ":memory:":
os.remove(self.dbname)
self._create_con()
@@ -117,7 +118,7 @@ class SqliteCache:
raise ValueError(path)
def get_multiple(self, rowids):
sql = "select rowid, blocks from pictures where rowid in (%s)" % ','.join(map(str, rowids))
sql = "select rowid, blocks from pictures where rowid in (%s)" % ",".join(map(str, rowids))
cur = self.con.execute(sql)
return ((rowid, string_to_colors(blocks)) for rowid, blocks in cur)
@@ -138,6 +139,5 @@ class SqliteCache:
continue
todelete.append(rowid)
if todelete:
sql = "delete from pictures where rowid in (%s)" % ','.join(map(str, todelete))
sql = "delete from pictures where rowid in (%s)" % ",".join(map(str, todelete))
self.con.execute(sql)

@@ -148,17 +148,18 @@ GPS_TA0GS = {
0x1B: "GPSProcessingMethod",
0x1C: "GPSAreaInformation",
0x1D: "GPSDateStamp",
0x1E: "GPSDifferential"
0x1E: "GPSDifferential",
}
INTEL_ENDIAN = ord('I')
MOTOROLA_ENDIAN = ord('M')
INTEL_ENDIAN = ord("I")
MOTOROLA_ENDIAN = ord("M")
# About MAX_COUNT: It's possible to have corrupted exif tags where the entry count is way too high
# and thus makes us loop, not endlessly, but for heck of a long time for nothing. Therefore, we put
# an arbitrary limit on the entry count we'll allow ourselves to read and any IFD reporting more
# entries than that will be considered corrupt.
MAX_COUNT = 0xffff
MAX_COUNT = 0xFFFF
def s2n_motorola(bytes):
x = 0
@@ -166,6 +167,7 @@ def s2n_motorola(bytes):
x = (x << 8) | c
return x
def s2n_intel(bytes):
x = 0
y = 0
@@ -174,13 +176,14 @@ def s2n_intel(bytes):
y = y + 8
return x
class Fraction:
def __init__(self, num, den):
self.num = num
self.den = den
def __repr__(self):
return '%d/%d' % (self.num, self.den)
return "%d/%d" % (self.num, self.den)
class TIFF_file:
@@ -190,16 +193,22 @@ class TIFF_file:
self.s2nfunc = s2n_intel if self.endian == INTEL_ENDIAN else s2n_motorola
def s2n(self, offset, length, signed=0, debug=False):
slice = self.data[offset:offset+length]
val = self.s2nfunc(slice)
data_slice = self.data[offset : offset + length]
val = self.s2nfunc(data_slice)
# Sign extension ?
if signed:
msb = 1 << (8*length - 1)
msb = 1 << (8 * length - 1)
if val & msb:
val = val - (msb << 1)
if debug:
logging.debug(self.endian)
logging.debug("Slice for offset %d length %d: %r and value: %d", offset, length, slice, val)
logging.debug(
"Slice for offset %d length %d: %r and value: %d",
offset,
length,
data_slice,
val,
)
return val
def first_IFD(self):
@@ -225,82 +234,84 @@ class TIFF_file:
return []
a = []
for i in range(entries):
entry = ifd + 2 + 12*i
entry = ifd + 2 + 12 * i
tag = self.s2n(entry, 2)
type = self.s2n(entry+2, 2)
if not 1 <= type <= 10:
entry_type = self.s2n(entry + 2, 2)
if not 1 <= entry_type <= 10:
continue # not handled
typelen = [1, 1, 2, 4, 8, 1, 1, 2, 4, 8][type-1]
count = self.s2n(entry+4, 4)
typelen = [1, 1, 2, 4, 8, 1, 1, 2, 4, 8][entry_type - 1]
count = self.s2n(entry + 4, 4)
if count > MAX_COUNT:
logging.debug("Probably corrupt. Aborting.")
return []
offset = entry+8
if count*typelen > 4:
offset = entry + 8
if count * typelen > 4:
offset = self.s2n(offset, 4)
if type == 2:
if entry_type == 2:
# Special case: nul-terminated ASCII string
values = str(self.data[offset:offset+count-1], encoding='latin-1')
values = str(self.data[offset : offset + count - 1], encoding="latin-1")
else:
values = []
signed = (type == 6 or type >= 8)
for j in range(count):
if type in {5, 10}:
signed = entry_type == 6 or entry_type >= 8
for _ in range(count):
if entry_type in {5, 10}:
# The type is either 5 or 10
value_j = Fraction(self.s2n(offset, 4, signed),
self.s2n(offset+4, 4, signed))
value_j = Fraction(self.s2n(offset, 4, signed), self.s2n(offset + 4, 4, signed))
else:
# Not a fraction
value_j = self.s2n(offset, typelen, signed)
values.append(value_j)
offset = offset + typelen
# Now "values" is either a string or an array
a.append((tag, type, values))
a.append((tag, entry_type, values))
return a
def read_exif_header(fp):
# If `fp`'s first bytes are not exif, it tries to find it in the next 4kb
def isexif(data):
return data[0:4] == b'\377\330\377\341' and data[6:10] == b'Exif'
return data[0:4] == b"\377\330\377\341" and data[6:10] == b"Exif"
data = fp.read(12)
if isexif(data):
return data
# ok, not exif, try to find it
large_data = fp.read(4096)
try:
index = large_data.index(b'Exif')
data = large_data[index-6:index+6]
index = large_data.index(b"Exif")
data = large_data[index - 6 : index + 6]
# large_data omits the first 12 bytes, and the index is at the middle of the header, so we
# must seek index + 18
fp.seek(index+18)
fp.seek(index + 18)
return data
except ValueError:
raise ValueError("Not an Exif file")
def get_fields(fp):
data = read_exif_header(fp)
length = data[4] * 256 + data[5]
logging.debug("Exif header length: %d bytes", length)
data = fp.read(length-8)
data = fp.read(length - 8)
data_format = data[0]
logging.debug("%s format", {INTEL_ENDIAN: 'Intel', MOTOROLA_ENDIAN: 'Motorola'}[data_format])
logging.debug("%s format", {INTEL_ENDIAN: "Intel", MOTOROLA_ENDIAN: "Motorola"}[data_format])
T = TIFF_file(data)
# There may be more than one IFD per file, but we only read the first one because others are
# most likely thumbnails.
main_IFD_offset = T.first_IFD()
main_ifd_offset = T.first_IFD()
result = {}
def add_tag_to_result(tag, values):
try:
stag = EXIF_TAGS[tag]
except KeyError:
stag = '0x%04X' % tag
stag = "0x%04X" % tag
if stag in result:
return # don't overwrite data
result[stag] = values
logging.debug("IFD at offset %d", main_IFD_offset)
IFD = T.dump_IFD(main_IFD_offset)
logging.debug("IFD at offset %d", main_ifd_offset)
IFD = T.dump_IFD(main_ifd_offset)
exif_off = gps_off = 0
for tag, type, values in IFD:
if tag == 0x8769:

@@ -8,17 +8,19 @@
import plistlib
class IPhotoPlistParser(plistlib._PlistParser):
"""A parser for iPhoto plists.
iPhoto plists tend to be malformed, so we have to subclass the built-in parser to be a bit more
lenient.
"""
def __init__(self):
plistlib._PlistParser.__init__(self, use_builtin_types=True, dict_type=dict)
# For debugging purposes, we remember the last bit of data to be analyzed so that we can
# log it in case of an exception
self.lastdata = ''
self.lastdata = ""
def get_data(self):
self.lastdata = plistlib._PlistParser.get_data(self)

@@ -48,14 +48,18 @@ except Exception:
logging.warning("Had problems to determine cpu count on launch.")
RESULTS_QUEUE_LIMIT = 8
def get_cache(cache_path, readonly=False):
if cache_path.endswith('shelve'):
if cache_path.endswith("shelve"):
from .cache_shelve import ShelveCache
return ShelveCache(cache_path, readonly=readonly)
else:
from .cache_sqlite import SqliteCache
return SqliteCache(cache_path, readonly=readonly)
def prepare_pictures(pictures, cache_path, with_dimensions, j=job.nulljob):
# The MemoryError handlers in there use logging without first caring about whether or not
# there is enough memory left to carry on the operation because it is assumed that the
@@ -86,14 +90,19 @@ def prepare_pictures(pictures, cache_path, with_dimensions, j=job.nulljob):
except (IOError, ValueError) as e:
logging.warning(str(e))
except MemoryError:
logging.warning("Ran out of memory while reading %s of size %d", picture.unicode_path, picture.size)
logging.warning(
"Ran out of memory while reading %s of size %d",
picture.unicode_path,
picture.size,
)
if picture.size < 10 * 1024 * 1024: # We're really running out of memory
raise
except MemoryError:
logging.warning('Ran out of memory while preparing pictures')
logging.warning("Ran out of memory while preparing pictures")
cache.close()
return prepared
def get_chunks(pictures):
min_chunk_count = multiprocessing.cpu_count() * 2 # have enough chunks to feed all subprocesses
chunk_count = len(pictures) // DEFAULT_CHUNK_SIZE
@@ -101,17 +110,21 @@ def get_chunks(pictures):
chunk_size = (len(pictures) // chunk_count) + 1
chunk_size = max(MIN_CHUNK_SIZE, chunk_size)
logging.info(
"Creating %d chunks with a chunk size of %d for %d pictures", chunk_count,
chunk_size, len(pictures)
"Creating %d chunks with a chunk size of %d for %d pictures",
chunk_count,
chunk_size,
len(pictures),
)
chunks = [pictures[i:i+chunk_size] for i in range(0, len(pictures), chunk_size)]
chunks = [pictures[i : i + chunk_size] for i in range(0, len(pictures), chunk_size)]
return chunks
def get_match(first, second, percentage):
if percentage < 0:
percentage = 0
return Match(first, second, percentage)
def async_compare(ref_ids, other_ids, dbname, threshold, picinfo):
# The list of ids in ref_ids have to be compared to the list of ids in other_ids. other_ids
# can be None. In this case, ref_ids has to be compared with itself
@@ -142,6 +155,7 @@ def async_compare(ref_ids, other_ids, dbname, threshold, picinfo):
cache.close()
return results
def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljob):
def get_picinfo(p):
if match_scaled:
@@ -160,7 +174,10 @@ def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljo
async_results.remove(result)
comparison_count += 1
# About the NOQA below: I think there's a bug in pyflakes. To investigate...
progress_msg = tr("Performed %d/%d chunk matches") % (comparison_count, len(comparisons_to_do)) # NOQA
progress_msg = tr("Performed %d/%d chunk matches") % (
comparison_count,
len(comparisons_to_do),
) # NOQA
j.set_progress(comparison_count, progress_msg)
j = j.start_subjob([3, 7])
@@ -175,7 +192,7 @@ def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljo
except ValueError:
pass
cache.close()
pictures = [p for p in pictures if hasattr(p, 'cache_id')]
pictures = [p for p in pictures if hasattr(p, "cache_id")]
pool = multiprocessing.Pool()
async_results = []
matches = []
@@ -203,9 +220,13 @@ def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljo
# some wiggle room, log about the incident, and stop matching right here. We then process
# the matches we have. The rest of the process doesn't allocate much and we should be
# alright.
del comparisons_to_do, chunks, pictures # some wiggle room for the next statements
del (
comparisons_to_do,
chunks,
pictures,
) # some wiggle room for the next statements
logging.warning("Ran out of memory when scanning! We had %d matches.", len(matches))
del matches[-len(matches)//3:] # some wiggle room to ensure we don't run out of memory again.
del matches[-len(matches) // 3 :] # some wiggle room to ensure we don't run out of memory again.
pool.close()
result = []
myiter = j.iter_with_progress(
@@ -223,7 +244,8 @@ def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljo
ref.dimensions # pre-read dimensions for display in results
other.dimensions
result.append(get_match(ref, other, percentage))
pool.join()
return result
multiprocessing.freeze_support()
multiprocessing.freeze_support()

@@ -13,14 +13,15 @@ from hscommon.trans import tr
from core.engine import Match
def getmatches(files, match_scaled, j):
timestamp2pic = defaultdict(set)
for picture in j.iter_with_progress(files, tr("Read EXIF of %d/%d pictures")):
timestamp = picture.exif_timestamp
if timestamp:
timestamp2pic[timestamp].add(picture)
if '0000:00:00 00:00:00' in timestamp2pic: # very likely false matches
del timestamp2pic['0000:00:00 00:00:00']
if "0000:00:00 00:00:00" in timestamp2pic: # very likely false matches
del timestamp2pic["0000:00:00 00:00:00"]
matches = []
for pictures in timestamp2pic.values():
for p1, p2 in combinations(pictures, 2):
@@ -28,4 +29,3 @@ def getmatches(files, match_scaled, j):
continue
matches.append(Match(p1, p2, 100))
return matches

@@ -2,9 +2,9 @@
* Created On: 2010-01-30
* Copyright 2014 Hardcoded Software (http://www.hardcoded.net)
*
* This software is licensed under the "BSD" License as described in the "LICENSE" file,
* which should be included with this package. The terms are also available at
* http://www.hardcoded.net/licenses/bsd_license
* This software is licensed under the "BSD" License as described in the
* "LICENSE" file, which should be included with this package. The terms are
* also available at http://www.hardcoded.net/licenses/bsd_license
*/
#include "common.h"
@@ -17,8 +17,7 @@ static PyObject *DifferentBlockCountError;
/* Returns a 3 sized tuple containing the mean color of 'image'.
* image: a PIL image or crop.
*/
static PyObject* getblock(PyObject *image)
{
static PyObject *getblock(PyObject *image) {
int i, totr, totg, totb;
Py_ssize_t pixel_count;
PyObject *ppixels;
@@ -30,7 +29,7 @@ static PyObject* getblock(PyObject *image)
}
pixel_count = PySequence_Length(ppixels);
for (i=0; i<pixel_count; i++) {
for (i = 0; i < pixel_count; i++) {
PyObject *ppixel, *pr, *pg, *pb;
int r, g, b;
@@ -65,8 +64,7 @@ static PyObject* getblock(PyObject *image)
/* Returns the difference between the first block and the second.
* It returns an absolute sum of the 3 differences (RGB).
*/
static int diff(PyObject *first, PyObject *second)
{
static int diff(PyObject *first, PyObject *second) {
int r1, g1, b1, r2, b2, g2;
PyObject *pr, *pg, *pb;
pr = PySequence_ITEM(first, 0);
@@ -93,7 +91,7 @@ static int diff(PyObject *first, PyObject *second)
}
PyDoc_STRVAR(block_getblocks2_doc,
"Returns a list of blocks (3 sized tuples).\n\
"Returns a list of blocks (3 sized tuples).\n\
\n\
image: A PIL image to base the blocks on.\n\
block_count_per_side: This integer determine the number of blocks the function will return.\n\
@@ -101,8 +99,7 @@ If it is 10, for example, 100 blocks will be returns (10 width, 10 height). The
necessarely cover square areas. The area covered by each block will be proportional to the image\n\
itself.\n");
static PyObject* block_getblocks2(PyObject *self, PyObject *args)
{
static PyObject *block_getblocks2(PyObject *self, PyObject *args) {
int block_count_per_side, width, height, block_width, block_height, ih;
PyObject *image;
PyObject *pimage_size, *pwidth, *pheight;
@@ -128,23 +125,23 @@ static PyObject* block_getblocks2(PyObject *self, PyObject *args)
block_width = max(width / block_count_per_side, 1);
block_height = max(height / block_count_per_side, 1);
result = PyList_New(block_count_per_side * block_count_per_side);
result = PyList_New((Py_ssize_t)block_count_per_side * block_count_per_side);
if (result == NULL) {
return NULL;
}
for (ih=0; ih<block_count_per_side; ih++) {
for (ih = 0; ih < block_count_per_side; ih++) {
int top, bottom, iw;
top = min(ih*block_height, height-block_height);
top = min(ih * block_height, height - block_height);
bottom = top + block_height;
for (iw=0; iw<block_count_per_side; iw++) {
for (iw = 0; iw < block_count_per_side; iw++) {
int left, right;
PyObject *pbox;
PyObject *pmethodname;
PyObject *pcrop;
PyObject *pblock;
left = min(iw*block_width, width-block_width);
left = min(iw * block_width, width - block_width);
right = left + block_width;
pbox = inttuple(4, left, top, right, bottom);
pmethodname = PyUnicode_FromString("crop");
@@ -161,7 +158,7 @@ static PyObject* block_getblocks2(PyObject *self, PyObject *args)
Py_DECREF(result);
return NULL;
}
PyList_SET_ITEM(result, ih*block_count_per_side+iw, pblock);
PyList_SET_ITEM(result, ih * block_count_per_side + iw, pblock);
}
}
@@ -169,19 +166,19 @@ static PyObject* block_getblocks2(PyObject *self, PyObject *args)
}
PyDoc_STRVAR(block_avgdiff_doc,
"Returns the average diff between first blocks and seconds.\n\
"Returns the average diff between first blocks and seconds.\n\
\n\
If the result surpasses limit, limit + 1 is returned, except if less than min_iterations\n\
iterations have been made in the blocks.\n");
static PyObject* block_avgdiff(PyObject *self, PyObject *args)
{
static PyObject *block_avgdiff(PyObject *self, PyObject *args) {
PyObject *first, *second;
int limit, min_iterations;
Py_ssize_t count;
int sum, i, result;
if (!PyArg_ParseTuple(args, "OOii", &first, &second, &limit, &min_iterations)) {
if (!PyArg_ParseTuple(args, "OOii", &first, &second, &limit,
&min_iterations)) {
return NULL;
}
@@ -196,7 +193,7 @@ static PyObject* block_avgdiff(PyObject *self, PyObject *args)
}
sum = 0;
for (i=0; i<count; i++) {
for (i = 0; i < count; i++) {
int iteration_count;
PyObject *item1, *item2;
@@ -206,7 +203,8 @@ static PyObject* block_avgdiff(PyObject *self, PyObject *args)
sum += diff(item1, item2);
Py_DECREF(item1);
Py_DECREF(item2);
if ((sum > limit*iteration_count) && (iteration_count >= min_iterations)) {
if ((sum > limit * iteration_count) &&
(iteration_count >= min_iterations)) {
return PyLong_FromLong(limit + 1);
}
}
@@ -224,8 +222,7 @@ static PyMethodDef BlockMethods[] = {
{NULL, NULL, 0, NULL} /* Sentinel */
};
static struct PyModuleDef BlockDef = {
PyModuleDef_HEAD_INIT,
static struct PyModuleDef BlockDef = {PyModuleDef_HEAD_INIT,
"_block",
NULL,
-1,
@@ -233,12 +230,9 @@ static struct PyModuleDef BlockDef = {
NULL,
NULL,
NULL,
NULL
};
NULL};
PyObject *
PyInit__block(void)
{
PyObject *PyInit__block(void) {
PyObject *m = PyModule_Create(&BlockDef);
if (m == NULL) {
return NULL;
@@ -246,7 +240,8 @@ PyInit__block(void)
NoBlocksError = PyErr_NewException("_block.NoBlocksError", NULL, NULL);
PyModule_AddObject(m, "NoBlocksError", NoBlocksError);
DifferentBlockCountError = PyErr_NewException("_block.DifferentBlockCountError", NULL, NULL);
DifferentBlockCountError =
PyErr_NewException("_block.DifferentBlockCountError", NULL, NULL);
PyModule_AddObject(m, "DifferentBlockCountError", DifferentBlockCountError);
return m;

@@ -10,6 +10,8 @@
#include "common.h"
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>
#import <ImageIO/ImageIO.h>
#define RADIANS( degrees ) ( degrees * M_PI / 180 )

@@ -14,23 +14,22 @@ from . import exif
# This global value is set by the platform-specific subclasser of the Photo base class
PLAT_SPECIFIC_PHOTO_CLASS = None
def format_dimensions(dimensions):
return '%d x %d' % (dimensions[0], dimensions[1])
return "%d x %d" % (dimensions[0], dimensions[1])
def get_delta_dimensions(value, ref_value):
return (value[0]-ref_value[0], value[1]-ref_value[1])
return (value[0] - ref_value[0], value[1] - ref_value[1])
class Photo(fs.File):
INITIAL_INFO = fs.File.INITIAL_INFO.copy()
INITIAL_INFO.update({
'dimensions': (0, 0),
'exif_timestamp': '',
})
INITIAL_INFO.update({"dimensions": (0, 0), "exif_timestamp": ""})
__slots__ = fs.File.__slots__ + tuple(INITIAL_INFO.keys())
# These extensions are supported on all platforms
HANDLED_EXTS = {'png', 'jpg', 'jpeg', 'gif', 'bmp', 'tiff', 'tif'}
HANDLED_EXTS = {"png", "jpg", "jpeg", "gif", "bmp", "tiff", "tif"}
def _plat_get_dimensions(self):
raise NotImplementedError()
@@ -39,12 +38,12 @@ class Photo(fs.File):
raise NotImplementedError()
def _get_orientation(self):
if not hasattr(self, '_cached_orientation'):
if not hasattr(self, "_cached_orientation"):
try:
with self.path.open('rb') as fp:
with self.path.open("rb") as fp:
exifdata = exif.get_fields(fp)
# the value is a list (probably one-sized) of ints
orientations = exifdata['Orientation']
orientations = exifdata["Orientation"]
self._cached_orientation = orientations[0]
except Exception: # Couldn't read EXIF data, no transforms
self._cached_orientation = 0
@@ -52,12 +51,12 @@ class Photo(fs.File):
def _get_exif_timestamp(self):
try:
with self.path.open('rb') as fp:
with self.path.open("rb") as fp:
exifdata = exif.get_fields(fp)
return exifdata['DateTimeOriginal']
return exifdata["DateTimeOriginal"]
except Exception:
logging.info("Couldn't read EXIF of picture: %s", self.path)
return ''
return ""
@classmethod
def can_handle(cls, path):
@@ -79,28 +78,27 @@ class Photo(fs.File):
else:
percentage = group.percentage
dupe_count = len(group.dupes)
dupe_folder_path = getattr(self, 'display_folder_path', self.folder_path)
dupe_folder_path = getattr(self, "display_folder_path", self.folder_path)
return {
'name': self.name,
'folder_path': str(dupe_folder_path),
'size': format_size(size, 0, 1, False),
'extension': self.extension,
'dimensions': format_dimensions(dimensions),
'exif_timestamp': self.exif_timestamp,
'mtime': format_timestamp(mtime, delta and m),
'percentage': format_perc(percentage),
'dupe_count': format_dupe_count(dupe_count),
"name": self.name,
"folder_path": str(dupe_folder_path),
"size": format_size(size, 0, 1, False),
"extension": self.extension,
"dimensions": format_dimensions(dimensions),
"exif_timestamp": self.exif_timestamp,
"mtime": format_timestamp(mtime, delta and m),
"percentage": format_perc(percentage),
"dupe_count": format_dupe_count(dupe_count),
}
def _read_info(self, field):
fs.File._read_info(self, field)
if field == 'dimensions':
if field == "dimensions":
self.dimensions = self._plat_get_dimensions()
if self._get_orientation() in {5, 6, 7, 8}:
self.dimensions = (self.dimensions[1], self.dimensions[0])
elif field == 'exif_timestamp':
elif field == "exif_timestamp":
self.exif_timestamp = self._get_exif_timestamp()
def get_blocks(self, block_count_per_side):
return self._plat_get_blocks(block_count_per_side, self._get_orientation())


@@ -8,11 +8,16 @@
from hscommon.trans import trget
from core.prioritize import (
KindCategory, FolderCategory, FilenameCategory, NumericalCategory,
SizeCategory, MtimeCategory
KindCategory,
FolderCategory,
FilenameCategory,
NumericalCategory,
SizeCategory,
MtimeCategory,
)
coltr = trget('columns')
coltr = trget("columns")
class DimensionsCategory(NumericalCategory):
NAME = coltr("Dimensions")
@@ -24,8 +29,13 @@ class DimensionsCategory(NumericalCategory):
width, height = value
return (-width, -height)
def all_categories():
return [
KindCategory, FolderCategory, FilenameCategory, SizeCategory, DimensionsCategory,
MtimeCategory
KindCategory,
FolderCategory,
FilenameCategory,
SizeCategory,
DimensionsCategory,
MtimeCategory,
]


@@ -10,19 +10,20 @@ from hscommon.trans import trget
from core.gui.result_table import ResultTable as ResultTableBase
coltr = trget('columns')
coltr = trget("columns")
class ResultTable(ResultTableBase):
COLUMNS = [
Column('marked', ''),
Column('name', coltr("Filename")),
Column('folder_path', coltr("Folder"), optional=True),
Column('size', coltr("Size (KB)"), optional=True),
Column('extension', coltr("Kind"), visible=False, optional=True),
Column('dimensions', coltr("Dimensions"), optional=True),
Column('exif_timestamp', coltr("EXIF Timestamp"), visible=False, optional=True),
Column('mtime', coltr("Modification"), visible=False, optional=True),
Column('percentage', coltr("Match %"), optional=True),
Column('dupe_count', coltr("Dupe Count"), visible=False, optional=True),
Column("marked", ""),
Column("name", coltr("Filename")),
Column("folder_path", coltr("Folder"), optional=True),
Column("size", coltr("Size (KB)"), optional=True),
Column("extension", coltr("Kind"), visible=False, optional=True),
Column("dimensions", coltr("Dimensions"), optional=True),
Column("exif_timestamp", coltr("EXIF Timestamp"), visible=False, optional=True),
Column("mtime", coltr("Modification"), visible=False, optional=True),
Column("percentage", coltr("Match %"), optional=True),
Column("dupe_count", coltr("Dupe Count"), visible=False, optional=True),
]
DELTA_COLUMNS = {'size', 'dimensions', 'mtime'}
DELTA_COLUMNS = {"size", "dimensions", "mtime"}


@@ -10,6 +10,7 @@ from core.scanner import Scanner, ScanType, ScanOption
from . import matchblock, matchexif
class ScannerPE(Scanner):
cache_path = None
match_scaled = False
@@ -17,21 +18,20 @@ class ScannerPE(Scanner):
@staticmethod
def get_scan_options():
return [
ScanOption(ScanType.FuzzyBlock, tr("Contents")),
ScanOption(ScanType.ExifTimestamp, tr("EXIF Timestamp")),
ScanOption(ScanType.FUZZYBLOCK, tr("Contents")),
ScanOption(ScanType.EXIFTIMESTAMP, tr("EXIF Timestamp")),
]
def _getmatches(self, files, j):
if self.scan_type == ScanType.FuzzyBlock:
if self.scan_type == ScanType.FUZZYBLOCK:
return matchblock.getmatches(
files,
cache_path=self.cache_path,
threshold=self.min_match_percentage,
match_scaled=self.match_scaled,
j=j
j=j,
)
elif self.scan_type == ScanType.ExifTimestamp:
elif self.scan_type == ScanType.EXIFTIMESTAMP:
return matchexif.getmatches(files, self.match_scaled, j)
else:
raise Exception("Invalid scan type")
raise ValueError("Invalid scan type")


@@ -9,7 +9,8 @@
from hscommon.util import dedupe, flatten, rem_file_ext
from hscommon.trans import trget, tr
coltr = trget('columns')
coltr = trget("columns")
class CriterionCategory:
NAME = "Undefined"
@@ -17,7 +18,7 @@ class CriterionCategory:
def __init__(self, results):
self.results = results
#--- Virtual
# --- Virtual
def extract_value(self, dupe):
raise NotImplementedError()
@@ -30,6 +31,7 @@ class CriterionCategory:
def criteria_list(self):
raise NotImplementedError()
class Criterion:
def __init__(self, category, value):
self.category = category
@@ -68,6 +70,7 @@ class KindCategory(ValueListCategory):
value = tr("None")
return value
class FolderCategory(ValueListCategory):
NAME = coltr("Folder")
@@ -79,11 +82,12 @@ class FolderCategory(ValueListCategory):
def sort_key(self, dupe, crit_value):
value = self.extract_value(dupe)
if value[:len(crit_value)] == crit_value:
if value[: len(crit_value)] == crit_value:
return 0
else:
return 1
class FilenameCategory(CriterionCategory):
NAME = coltr("Filename")
ENDS_WITH_NUMBER = 0
@@ -117,12 +121,16 @@ class FilenameCategory(CriterionCategory):
return value
def criteria_list(self):
return [Criterion(self, crit_value) for crit_value in [
return [
Criterion(self, crit_value)
for crit_value in [
self.ENDS_WITH_NUMBER,
self.DOESNT_END_WITH_NUMBER,
self.LONGEST,
self.SHORTEST,
]]
]
]
class NumericalCategory(CriterionCategory):
HIGHEST = 0
@@ -143,12 +151,14 @@ class NumericalCategory(CriterionCategory):
def criteria_list(self):
return [Criterion(self, self.HIGHEST), Criterion(self, self.LOWEST)]
class SizeCategory(NumericalCategory):
NAME = coltr("Size")
def extract_value(self, dupe):
return dupe.size
class MtimeCategory(NumericalCategory):
NAME = coltr("Modification")
@@ -158,5 +168,6 @@ class MtimeCategory(NumericalCategory):
def format_criterion_value(self, value):
return tr("Newest") if value == self.HIGHEST else tr("Oldest")
def all_categories():
return [KindCategory, FolderCategory, FilenameCategory, SizeCategory, MtimeCategory]


@@ -20,6 +20,7 @@ from hscommon.trans import tr
from . import engine
from .markable import Markable
class Results(Markable):
"""Manages a collection of duplicate :class:`~core.engine.Group`.
@@ -34,7 +35,8 @@ class Results(Markable):
A list of all duplicates (:class:`~core.fs.File` instances), without ref, contained in the
currently managed :attr:`groups`.
"""
#---Override
# ---Override
def __init__(self, app):
Markable.__init__(self)
self.__groups = []
@@ -50,6 +52,7 @@ class Results(Markable):
self.app = app
self.problems = [] # (dupe, error_msg)
self.is_modified = False
self.refresh_required = False
def _did_mark(self, dupe):
self.__marked_size += dupe.size
@@ -90,15 +93,17 @@ class Results(Markable):
else:
Markable.mark_none(self)
#---Private
# ---Private
def __get_dupe_list(self):
if self.__dupes is None:
if self.__dupes is None or self.refresh_required:
self.__dupes = flatten(group.dupes for group in self.groups)
self.refresh_required = False
if None in self.__dupes:
# This is debug logging to try to figure out #44
logging.warning(
"There is a None value in the Results' dupe list. dupes: %r groups: %r",
self.__dupes, self.groups
self.__dupes,
self.groups,
)
if self.__filtered_dupes:
self.__dupes = [dupe for dupe in self.__dupes if dupe in self.__filtered_dupes]
@@ -133,7 +138,7 @@ class Results(Markable):
format_size(total_size, 2),
)
if self.__filters:
result += tr(" filter: %s") % ' --> '.join(self.__filters)
result += tr(" filter: %s") % " --> ".join(self.__filters)
return result
def __recalculate_stats(self):
@@ -151,7 +156,7 @@ class Results(Markable):
for g in self.__groups:
for dupe in g:
self.__group_of_duplicate[dupe] = g
if not hasattr(dupe, 'is_ref'):
if not hasattr(dupe, "is_ref"):
dupe.is_ref = False
self.is_modified = bool(self.__groups)
old_filters = nonone(self.__filters, [])
@@ -159,7 +164,7 @@ class Results(Markable):
for filter_str in old_filters:
self.apply_filter(filter_str)
#---Public
# ---Public
def apply_filter(self, filter_str):
"""Applies a filter ``filter_str`` to :attr:`groups`
@@ -198,8 +203,7 @@ class Results(Markable):
self.__dupes = None
def get_group_of_duplicate(self, dupe):
"""Returns :class:`~core.engine.Group` in which ``dupe`` belongs.
"""
"""Returns :class:`~core.engine.Group` in which ``dupe`` belongs."""
try:
return self.__group_of_duplicate[dupe]
except (TypeError, KeyError):
@@ -214,6 +218,7 @@ class Results(Markable):
:param get_file: a function f(path) returning a :class:`~core.fs.File` wrapping the path.
:param j: A :ref:`job progress instance <jobs>`.
"""
def do_match(ref_file, other_files, group):
if not other_files:
return
@@ -223,31 +228,31 @@ class Results(Markable):
self.apply_filter(None)
root = ET.parse(infile).getroot()
group_elems = list(root.getiterator('group'))
group_elems = list(root.iter("group"))
groups = []
marked = set()
for group_elem in j.iter_with_progress(group_elems, every=100):
group = engine.Group()
dupes = []
for file_elem in group_elem.getiterator('file'):
path = file_elem.get('path')
words = file_elem.get('words', '')
for file_elem in group_elem.iter("file"):
path = file_elem.get("path")
words = file_elem.get("words", "")
if not path:
continue
file = get_file(path)
if file is None:
continue
file.words = words.split(',')
file.is_ref = file_elem.get('is_ref') == 'y'
file.words = words.split(",")
file.is_ref = file_elem.get("is_ref") == "y"
dupes.append(file)
if file_elem.get('marked') == 'y':
if file_elem.get("marked") == "y":
marked.add(file)
for match_elem in group_elem.getiterator('match'):
for match_elem in group_elem.iter("match"):
try:
attrs = match_elem.attrib
first_file = dupes[int(attrs['first'])]
second_file = dupes[int(attrs['second'])]
percentage = int(attrs['percentage'])
first_file = dupes[int(attrs["first"])]
second_file = dupes[int(attrs["second"])]
percentage = int(attrs["percentage"])
group.add_match(engine.Match(first_file, second_file, percentage))
except (IndexError, KeyError, ValueError):
# Covers missing attr, non-int values and indexes out of bounds
@@ -264,8 +269,7 @@ class Results(Markable):
self.is_modified = False
def make_ref(self, dupe):
"""Make ``dupe`` take the :attr:`~core.engine.Group.ref` position of its group.
"""
"""Make ``dupe`` take the :attr:`~core.engine.Group.ref` position of its group."""
g = self.get_group_of_duplicate(dupe)
r = g.ref
if not g.switch_ref(dupe):
@@ -339,9 +343,9 @@ class Results(Markable):
:param outfile: file object or path.
"""
self.apply_filter(None)
root = ET.Element('results')
root = ET.Element("results")
for g in self.groups:
group_elem = ET.SubElement(root, 'group')
group_elem = ET.SubElement(root, "group")
dupe2index = {}
for index, d in enumerate(g):
dupe2index[d] = index
@@ -349,24 +353,24 @@ class Results(Markable):
words = engine.unpack_fields(d.words)
except AttributeError:
words = ()
file_elem = ET.SubElement(group_elem, 'file')
file_elem = ET.SubElement(group_elem, "file")
try:
file_elem.set('path', str(d.path))
file_elem.set('words', ','.join(words))
file_elem.set("path", str(d.path))
file_elem.set("words", ",".join(words))
except ValueError: # If there's an invalid character, just skip the file
file_elem.set('path', '')
file_elem.set('is_ref', ('y' if d.is_ref else 'n'))
file_elem.set('marked', ('y' if self.is_marked(d) else 'n'))
file_elem.set("path", "")
file_elem.set("is_ref", ("y" if d.is_ref else "n"))
file_elem.set("marked", ("y" if self.is_marked(d) else "n"))
for match in g.matches:
match_elem = ET.SubElement(group_elem, 'match')
match_elem.set('first', str(dupe2index[match.first]))
match_elem.set('second', str(dupe2index[match.second]))
match_elem.set('percentage', str(int(match.percentage)))
match_elem = ET.SubElement(group_elem, "match")
match_elem.set("first", str(dupe2index[match.first]))
match_elem.set("second", str(dupe2index[match.second]))
match_elem.set("percentage", str(int(match.percentage)))
tree = ET.ElementTree(root)
def do_write(outfile):
with FileOrPath(outfile, 'wb') as fp:
tree.write(fp, encoding='utf-8')
with FileOrPath(outfile, "wb") as fp:
tree.write(fp, encoding="utf-8")
try:
do_write(outfile)
@@ -392,8 +396,10 @@ class Results(Markable):
"""
if not self.__dupes:
self.__get_dupe_list()
keyfunc = lambda d: self.app._get_dupe_sort_key(d, lambda: self.get_group_of_duplicate(d), key, delta)
self.__dupes.sort(key=keyfunc, reverse=not asc)
self.__dupes.sort(
key=lambda d: self.app._get_dupe_sort_key(d, lambda: self.get_group_of_duplicate(d), key, delta),
reverse=not asc,
)
self.__dupes_sort_descriptor = (key, asc, delta)
def sort_groups(self, key, asc=True):
@@ -404,12 +410,10 @@ class Results(Markable):
:param str key: key attribute name to sort with.
:param bool asc: If false, sorting is reversed.
"""
keyfunc = lambda g: self.app._get_group_sort_key(g, key)
self.groups.sort(key=keyfunc, reverse=not asc)
self.groups.sort(key=lambda g: self.app._get_group_sort_key(g, key), reverse=not asc)
self.__groups_sort_descriptor = (key, asc)
#---Properties
# ---Properties
dupes = property(__get_dupe_list)
groups = property(__get_groups, __set_groups)
stat_line = property(__get_stat_line)
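For orientation, the XML loading and saving code in this file reads and writes a small document shaped roughly like the sketch below; paths, words and percentages are invented for illustration:

<results>
  <group>
    <file path="/pics/beach.jpg" words="beach" is_ref="y" marked="n"/>
    <file path="/pics/beach copy.jpg" words="beach,copy" is_ref="n" marked="y"/>
    <match first="0" second="1" percentage="100"/>
  </group>
</results>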


@@ -19,31 +19,35 @@ from . import engine
# there will be some nasty bugs popping up (ScanType is used in core when in should exclusively be
# used in core_*). One day I'll clean this up.
class ScanType:
Filename = 0
Fields = 1
FieldsNoOrder = 2
Tag = 3
Folders = 4
Contents = 5
FILENAME = 0
FIELDS = 1
FIELDSNOORDER = 2
TAG = 3
FOLDERS = 4
CONTENTS = 5
#PE
FuzzyBlock = 10
ExifTimestamp = 11
# PE
FUZZYBLOCK = 10
EXIFTIMESTAMP = 11
ScanOption = namedtuple('ScanOption', 'scan_type label')
SCANNABLE_TAGS = ['track', 'artist', 'album', 'title', 'genre', 'year']
ScanOption = namedtuple("ScanOption", "scan_type label")
SCANNABLE_TAGS = ["track", "artist", "album", "title", "genre", "year"]
RE_DIGIT_ENDING = re.compile(r"\d+|\(\d+\)|\[\d+\]|{\d+}")
RE_DIGIT_ENDING = re.compile(r'\d+|\(\d+\)|\[\d+\]|{\d+}')
def is_same_with_digit(name, refname):
# Returns True if name is the same as refname, but with digits (with brackets or not) at the end
if not name.startswith(refname):
return False
end = name[len(refname):].strip()
end = name[len(refname) :].strip()
return RE_DIGIT_ENDING.match(end) is not None
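A few illustrative calls to the helper above, assuming the module is importable as core.scanner (file names invented for illustration):

from core.scanner import is_same_with_digit  # assumed import path

assert is_same_with_digit("holiday (2)", "holiday")       # bracketed digits at the end
assert is_same_with_digit("holiday 2", "holiday")         # bare trailing digits also count
assert not is_same_with_digit("holiday copy", "holiday")  # non-numeric suffix
assert not is_same_with_digit("beach", "holiday")         # different base name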
def remove_dupe_paths(files):
# Returns files with duplicates-by-path removed. Files with the exact same path are considered
# duplicates and only the first file to have a path is kept. In certain cases, we have files
@@ -67,32 +71,43 @@ def remove_dupe_paths(files):
result.append(f)
return result
class Scanner:
def __init__(self):
self.discarded_file_count = 0
def _getmatches(self, files, j):
if self.size_threshold or self.scan_type in {ScanType.Contents, ScanType.Folders}:
if (
self.size_threshold
or self.large_size_threshold
or self.scan_type
in {
ScanType.CONTENTS,
ScanType.FOLDERS,
}
):
j = j.start_subjob([2, 8])
for f in j.iter_with_progress(files, tr("Read size of %d/%d files")):
f.size # pre-read, makes a smoother progress if read here (especially for bundles)
if self.size_threshold:
files = [f for f in files if f.size >= self.size_threshold]
if self.scan_type in {ScanType.Contents, ScanType.Folders}:
return engine.getmatches_by_contents(files, j=j)
if self.large_size_threshold:
files = [f for f in files if f.size <= self.large_size_threshold]
if self.scan_type in {ScanType.CONTENTS, ScanType.FOLDERS}:
return engine.getmatches_by_contents(files, bigsize=self.big_file_size_threshold, j=j)
else:
j = j.start_subjob([2, 8])
kw = {}
kw['match_similar_words'] = self.match_similar_words
kw['weight_words'] = self.word_weighting
kw['min_match_percentage'] = self.min_match_percentage
if self.scan_type == ScanType.FieldsNoOrder:
self.scan_type = ScanType.Fields
kw['no_field_order'] = True
kw["match_similar_words"] = self.match_similar_words
kw["weight_words"] = self.word_weighting
kw["min_match_percentage"] = self.min_match_percentage
if self.scan_type == ScanType.FIELDSNOORDER:
self.scan_type = ScanType.FIELDS
kw["no_field_order"] = True
func = {
ScanType.Filename: lambda f: engine.getwords(rem_file_ext(f.name)),
ScanType.Fields: lambda f: engine.getfields(rem_file_ext(f.name)),
ScanType.Tag: lambda f: [
ScanType.FILENAME: lambda f: engine.getwords(rem_file_ext(f.name)),
ScanType.FIELDS: lambda f: engine.getfields(rem_file_ext(f.name)),
ScanType.TAG: lambda f: [
engine.getwords(str(getattr(f, attrname)))
for attrname in SCANNABLE_TAGS
if attrname in self.scanned_tags
@@ -111,9 +126,9 @@ class Scanner:
def _tie_breaker(ref, dupe):
refname = rem_file_ext(ref.name).lower()
dupename = rem_file_ext(dupe.name).lower()
if 'copy' in dupename:
if "copy" in dupename:
return False
if 'copy' in refname:
if "copy" in refname:
return True
if is_same_with_digit(dupename, refname):
return False
@@ -130,19 +145,19 @@ class Scanner:
raise NotImplementedError()
def get_dupe_groups(self, files, ignore_list=None, j=job.nulljob):
for f in (f for f in files if not hasattr(f, 'is_ref')):
for f in (f for f in files if not hasattr(f, "is_ref")):
f.is_ref = False
files = remove_dupe_paths(files)
logging.info("Getting matches. Scan type: %d", self.scan_type)
matches = self._getmatches(files, j)
logging.info('Found %d matches' % len(matches))
logging.info("Found %d matches" % len(matches))
j.set_progress(100, tr("Almost done! Fiddling with results..."))
# In removing what we call here "false matches", we first want to remove, if we scan by
# folders, we want to remove folder matches for which the parent is also in a match (they're
# "duplicated duplicates if you will). Then, we also don't want mixed file kinds if the
# option isn't enabled, we want matches for which both files exist and, lastly, we don't
# want matches with both files as ref.
if self.scan_type == ScanType.Folders and matches:
if self.scan_type == ScanType.FOLDERS and matches:
allpath = {m.first.path for m in matches}
allpath |= {m.second.path for m in matches}
sortedpaths = sorted(allpath)
@@ -159,13 +174,15 @@ class Scanner:
matches = [m for m in matches if m.first.path.exists() and m.second.path.exists()]
matches = [m for m in matches if not (m.first.is_ref and m.second.is_ref)]
if ignore_list:
matches = [
m for m in matches
if not ignore_list.AreIgnored(str(m.first.path), str(m.second.path))
]
logging.info('Grouping matches')
matches = [m for m in matches if not ignore_list.are_ignored(str(m.first.path), str(m.second.path))]
logging.info("Grouping matches")
groups = engine.get_groups(matches)
if self.scan_type in {ScanType.Filename, ScanType.Fields, ScanType.FieldsNoOrder, ScanType.Tag}:
if self.scan_type in {
ScanType.FILENAME,
ScanType.FIELDS,
ScanType.FIELDSNOORDER,
ScanType.TAG,
}:
matched_files = dedupe([m.first for m in matches] + [m.second for m in matches])
self.discarded_file_count = len(matched_files) - sum(len(g) for g in groups)
else:
@@ -181,7 +198,7 @@ class Scanner:
# reporting discarded matches.
self.discarded_file_count = 0
groups = [g for g in groups if any(not f.is_ref for f in g)]
logging.info('Created %d groups' % len(groups))
logging.info("Created %d groups" % len(groups))
for g in groups:
g.prioritize(self._key_func, self._tie_breaker)
return groups
@@ -189,8 +206,9 @@ class Scanner:
match_similar_words = False
min_match_percentage = 80
mix_file_kind = True
scan_type = ScanType.Filename
scanned_tags = {'artist', 'title'}
scan_type = ScanType.FILENAME
scanned_tags = {"artist", "title"}
size_threshold = 0
large_size_threshold = 0
big_file_size_threshold = 0
word_weighting = False


@@ -11,6 +11,7 @@ from hscommon.util import format_size
from core import fs
from core.util import format_timestamp, format_perc, format_words, format_dupe_count
def get_display_info(dupe, group, delta):
size = dupe.size
mtime = dupe.mtime
@@ -26,16 +27,17 @@ def get_display_info(dupe, group, delta):
percentage = group.percentage
dupe_count = len(group.dupes)
return {
'name': dupe.name,
'folder_path': str(dupe.folder_path),
'size': format_size(size, 0, 1, False),
'extension': dupe.extension,
'mtime': format_timestamp(mtime, delta and m),
'percentage': format_perc(percentage),
'words': format_words(dupe.words) if hasattr(dupe, 'words') else '',
'dupe_count': format_dupe_count(dupe_count),
"name": dupe.name,
"folder_path": str(dupe.folder_path),
"size": format_size(size, 0, 1, False),
"extension": dupe.extension,
"mtime": format_timestamp(mtime, delta and m),
"percentage": format_perc(percentage),
"words": format_words(dupe.words) if hasattr(dupe, "words") else "",
"dupe_count": format_dupe_count(dupe_count),
}
class File(fs.File):
def get_display_info(self, group, delta):
return get_display_info(self, group, delta)
@@ -44,4 +46,3 @@ class File(fs.File):
class Folder(fs.Folder):
def get_display_info(self, group, delta):
return get_display_info(self, group, delta)


@@ -10,18 +10,19 @@ from hscommon.trans import trget
from core.gui.result_table import ResultTable as ResultTableBase
coltr = trget('columns')
coltr = trget("columns")
class ResultTable(ResultTableBase):
COLUMNS = [
Column('marked', ''),
Column('name', coltr("Filename")),
Column('folder_path', coltr("Folder"), optional=True),
Column('size', coltr("Size (KB)"), optional=True),
Column('extension', coltr("Kind"), visible=False, optional=True),
Column('mtime', coltr("Modification"), visible=False, optional=True),
Column('percentage', coltr("Match %"), optional=True),
Column('words', coltr("Words Used"), visible=False, optional=True),
Column('dupe_count', coltr("Dupe Count"), visible=False, optional=True),
Column("marked", ""),
Column("name", coltr("Filename")),
Column("folder_path", coltr("Folder"), optional=True),
Column("size", coltr("Size (KB)"), optional=True),
Column("extension", coltr("Kind"), visible=False, optional=True),
Column("mtime", coltr("Modification"), visible=False, optional=True),
Column("percentage", coltr("Match %"), optional=True),
Column("words", coltr("Words Used"), visible=False, optional=True),
Column("dupe_count", coltr("Dupe Count"), visible=False, optional=True),
]
DELTA_COLUMNS = {'size', 'mtime'}
DELTA_COLUMNS = {"size", "mtime"}


@@ -8,12 +8,12 @@ from hscommon.trans import tr
from core.scanner import Scanner as ScannerBase, ScanOption, ScanType
class ScannerSE(ScannerBase):
@staticmethod
def get_scan_options():
return [
ScanOption(ScanType.Filename, tr("Filename")),
ScanOption(ScanType.Contents, tr("Contents")),
ScanOption(ScanType.Folders, tr("Folders")),
ScanOption(ScanType.FILENAME, tr("Filename")),
ScanOption(ScanType.CONTENTS, tr("Contents")),
ScanOption(ScanType.FOLDERS, tr("Folders")),
]


@@ -8,7 +8,7 @@ import os
import os.path as op
import logging
from pytest import mark
import pytest
from hscommon.path import Path
import hscommon.conflict
import hscommon.util
@@ -20,93 +20,98 @@ from .results_test import GetTestGroups
from .. import app, fs, engine
from ..scanner import ScanType
def add_fake_files_to_directories(directories, files):
directories.get_files = lambda j=None: iter(files)
directories._dirs.append('this is just so Scan() doesnt return 3')
directories._dirs.append("this is just so Scan() doesn't return 3")
class TestCaseDupeGuru:
def test_apply_filter_calls_results_apply_filter(self, monkeypatch):
dgapp = TestApp().app
monkeypatch.setattr(dgapp.results, 'apply_filter', log_calls(dgapp.results.apply_filter))
dgapp.apply_filter('foo')
monkeypatch.setattr(dgapp.results, "apply_filter", log_calls(dgapp.results.apply_filter))
dgapp.apply_filter("foo")
eq_(2, len(dgapp.results.apply_filter.calls))
call = dgapp.results.apply_filter.calls[0]
assert call['filter_str'] is None
assert call["filter_str"] is None
call = dgapp.results.apply_filter.calls[1]
eq_('foo', call['filter_str'])
eq_("foo", call["filter_str"])
def test_apply_filter_escapes_regexp(self, monkeypatch):
dgapp = TestApp().app
monkeypatch.setattr(dgapp.results, 'apply_filter', log_calls(dgapp.results.apply_filter))
dgapp.apply_filter('()[]\\.|+?^abc')
monkeypatch.setattr(dgapp.results, "apply_filter", log_calls(dgapp.results.apply_filter))
dgapp.apply_filter("()[]\\.|+?^abc")
call = dgapp.results.apply_filter.calls[1]
eq_('\\(\\)\\[\\]\\\\\\.\\|\\+\\?\\^abc', call['filter_str'])
dgapp.apply_filter('(*)') # In "simple mode", we want the * to behave as a wilcard
eq_("\\(\\)\\[\\]\\\\\\.\\|\\+\\?\\^abc", call["filter_str"])
dgapp.apply_filter("(*)") # In "simple mode", we want the * to behave as a wildcard
call = dgapp.results.apply_filter.calls[3]
eq_(r'\(.*\)', call['filter_str'])
dgapp.options['escape_filter_regexp'] = False
dgapp.apply_filter('(abc)')
eq_(r"\(.*\)", call["filter_str"])
dgapp.options["escape_filter_regexp"] = False
dgapp.apply_filter("(abc)")
call = dgapp.results.apply_filter.calls[5]
eq_('(abc)', call['filter_str'])
eq_("(abc)", call["filter_str"])
def test_copy_or_move(self, tmpdir, monkeypatch):
# The goal here is just to have a test for a previous blowup I had. I know my test coverage
# for this unit is pathetic. What's done is done. My approach now is to add tests for
# every change I want to make. The blowup was caused by a missing import.
p = Path(str(tmpdir))
p['foo'].open('w').close()
monkeypatch.setattr(hscommon.conflict, 'smart_copy', log_calls(lambda source_path, dest_path: None))
p["foo"].open("w").close()
monkeypatch.setattr(
hscommon.conflict,
"smart_copy",
log_calls(lambda source_path, dest_path: None),
)
# XXX This monkeypatch is temporary. will be fixed in a better monkeypatcher.
monkeypatch.setattr(app, 'smart_copy', hscommon.conflict.smart_copy)
monkeypatch.setattr(os, 'makedirs', lambda path: None) # We don't want the test to create that fake directory
monkeypatch.setattr(app, "smart_copy", hscommon.conflict.smart_copy)
monkeypatch.setattr(os, "makedirs", lambda path: None) # We don't want the test to create that fake directory
dgapp = TestApp().app
dgapp.directories.add_path(p)
[f] = dgapp.directories.get_files()
dgapp.copy_or_move(f, True, 'some_destination', 0)
dgapp.copy_or_move(f, True, "some_destination", 0)
eq_(1, len(hscommon.conflict.smart_copy.calls))
call = hscommon.conflict.smart_copy.calls[0]
eq_(call['dest_path'], op.join('some_destination', 'foo'))
eq_(call['source_path'], f.path)
eq_(call["dest_path"], op.join("some_destination", "foo"))
eq_(call["source_path"], f.path)
def test_copy_or_move_clean_empty_dirs(self, tmpdir, monkeypatch):
tmppath = Path(str(tmpdir))
sourcepath = tmppath['source']
sourcepath = tmppath["source"]
sourcepath.mkdir()
sourcepath['myfile'].open('w')
sourcepath["myfile"].open("w")
app = TestApp().app
app.directories.add_path(tmppath)
[myfile] = app.directories.get_files()
monkeypatch.setattr(app, 'clean_empty_dirs', log_calls(lambda path: None))
app.copy_or_move(myfile, False, tmppath['dest'], 0)
monkeypatch.setattr(app, "clean_empty_dirs", log_calls(lambda path: None))
app.copy_or_move(myfile, False, tmppath["dest"], 0)
calls = app.clean_empty_dirs.calls
eq_(1, len(calls))
eq_(sourcepath, calls[0]['path'])
eq_(sourcepath, calls[0]["path"])
def test_Scan_with_objects_evaluating_to_false(self):
def test_scan_with_objects_evaluating_to_false(self):
class FakeFile(fs.File):
def __bool__(self):
return False
# At some point, any() was used in a wrong way that made Scan() wrongly return 1
app = TestApp().app
f1, f2 = [FakeFile('foo') for i in range(2)]
f1, f2 = [FakeFile("foo") for _ in range(2)]
f1.is_ref, f2.is_ref = (False, False)
assert not (bool(f1) and bool(f2))
add_fake_files_to_directories(app.directories, [f1, f2])
app.start_scanning() # no exception
@mark.skipif("not hasattr(os, 'link')")
@pytest.mark.skipif("not hasattr(os, 'link')")
def test_ignore_hardlink_matches(self, tmpdir):
# If the ignore_hardlink_matches option is set, don't match files hardlinking to the same
# inode.
tmppath = Path(str(tmpdir))
tmppath['myfile'].open('w').write('foo')
os.link(str(tmppath['myfile']), str(tmppath['hardlink']))
tmppath["myfile"].open("w").write("foo")
os.link(str(tmppath["myfile"]), str(tmppath["hardlink"]))
app = TestApp().app
app.directories.add_path(tmppath)
app.options['scan_type'] = ScanType.Contents
app.options['ignore_hardlink_matches'] = True
app.options["scan_type"] = ScanType.CONTENTS
app.options["ignore_hardlink_matches"] = True
app.start_scanning()
eq_(len(app.results.groups), 0)
@@ -116,27 +121,33 @@ class TestCaseDupeGuru:
# making the selected row None. Don't crash when it happens.
dgapp = TestApp().app
# selected_row is None because there's no result.
assert not dgapp.result_table.rename_selected('foo') # no crash
assert not dgapp.result_table.rename_selected("foo") # no crash
class TestCaseDupeGuru_clean_empty_dirs:
def pytest_funcarg__do_setup(self, request):
monkeypatch = request.getfuncargvalue('monkeypatch')
monkeypatch.setattr(hscommon.util, 'delete_if_empty', log_calls(lambda path, files_to_delete=[]: None))
class TestCaseDupeGuruCleanEmptyDirs:
@pytest.fixture
def do_setup(self, request):
monkeypatch = request.getfixturevalue("monkeypatch")
monkeypatch.setattr(
hscommon.util,
"delete_if_empty",
log_calls(lambda path, files_to_delete=[]: None),
)
# XXX This monkeypatch is temporary. will be fixed in a better monkeypatcher.
monkeypatch.setattr(app, 'delete_if_empty', hscommon.util.delete_if_empty)
monkeypatch.setattr(app, "delete_if_empty", hscommon.util.delete_if_empty)
self.app = TestApp().app
def test_option_off(self, do_setup):
self.app.clean_empty_dirs(Path('/foo/bar'))
self.app.clean_empty_dirs(Path("/foo/bar"))
eq_(0, len(hscommon.util.delete_if_empty.calls))
def test_option_on(self, do_setup):
self.app.options['clean_empty_dirs'] = True
self.app.clean_empty_dirs(Path('/foo/bar'))
self.app.options["clean_empty_dirs"] = True
self.app.clean_empty_dirs(Path("/foo/bar"))
calls = hscommon.util.delete_if_empty.calls
eq_(1, len(calls))
eq_(Path('/foo/bar'), calls[0]['path'])
eq_(['.DS_Store'], calls[0]['files_to_delete'])
eq_(Path("/foo/bar"), calls[0]["path"])
eq_([".DS_Store"], calls[0]["files_to_delete"])
def test_recurse_up(self, do_setup, monkeypatch):
# delete_if_empty must be recursively called up in the path until it returns False
@@ -144,20 +155,21 @@ class TestCaseDupeGuru_clean_empty_dirs:
def mock_delete_if_empty(path, files_to_delete=[]):
return len(path) > 1
monkeypatch.setattr(hscommon.util, 'delete_if_empty', mock_delete_if_empty)
monkeypatch.setattr(hscommon.util, "delete_if_empty", mock_delete_if_empty)
# XXX This monkeypatch is temporary. will be fixed in a better monkeypatcher.
monkeypatch.setattr(app, 'delete_if_empty', mock_delete_if_empty)
self.app.options['clean_empty_dirs'] = True
self.app.clean_empty_dirs(Path('not-empty/empty/empty'))
monkeypatch.setattr(app, "delete_if_empty", mock_delete_if_empty)
self.app.options["clean_empty_dirs"] = True
self.app.clean_empty_dirs(Path("not-empty/empty/empty"))
calls = hscommon.util.delete_if_empty.calls
eq_(3, len(calls))
eq_(Path('not-empty/empty/empty'), calls[0]['path'])
eq_(Path('not-empty/empty'), calls[1]['path'])
eq_(Path('not-empty'), calls[2]['path'])
eq_(Path("not-empty/empty/empty"), calls[0]["path"])
eq_(Path("not-empty/empty"), calls[1]["path"])
eq_(Path("not-empty"), calls[2]["path"])
class TestCaseDupeGuruWithResults:
def pytest_funcarg__do_setup(self, request):
@pytest.fixture
def do_setup(self, request):
app = TestApp()
self.app = app.app
self.objects, self.matches, self.groups = GetTestGroups()
@@ -166,13 +178,13 @@ class TestCaseDupeGuruWithResults:
self.dtree = app.dtree
self.rtable = app.rtable
self.rtable.refresh()
tmpdir = request.getfuncargvalue('tmpdir')
tmpdir = request.getfixturevalue("tmpdir")
tmppath = Path(str(tmpdir))
tmppath['foo'].mkdir()
tmppath['bar'].mkdir()
tmppath["foo"].mkdir()
tmppath["bar"].mkdir()
self.app.directories.add_path(tmppath)
def test_GetObjects(self, do_setup):
def test_get_objects(self, do_setup):
objects = self.objects
groups = self.groups
r = self.rtable[0]
@@ -185,10 +197,10 @@ class TestCaseDupeGuruWithResults:
assert r._group is groups[1]
assert r._dupe is objects[4]
def test_GetObjects_after_sort(self, do_setup):
def test_get_objects_after_sort(self, do_setup):
objects = self.objects
groups = self.groups[:] # we need an un-sorted reference
self.rtable.sort('name', False)
self.rtable.sort("name", False)
r = self.rtable[1]
assert r._group is groups[1]
assert r._dupe is objects[4]
@@ -200,7 +212,7 @@ class TestCaseDupeGuruWithResults:
# The first 2 dupes have been removed. The 3rd one is a ref. it stays there, in first pos.
eq_(self.rtable.selected_indexes, [1]) # no exception
def test_selectResultNodePaths(self, do_setup):
def test_select_result_node_paths(self, do_setup):
app = self.app
objects = self.objects
self.rtable.select([1, 2])
@@ -208,7 +220,7 @@ class TestCaseDupeGuruWithResults:
assert app.selected_dupes[0] is objects[1]
assert app.selected_dupes[1] is objects[2]
def test_selectResultNodePaths_with_ref(self, do_setup):
def test_select_result_node_paths_with_ref(self, do_setup):
app = self.app
objects = self.objects
self.rtable.select([1, 2, 3])
@@ -217,12 +229,12 @@ class TestCaseDupeGuruWithResults:
assert app.selected_dupes[1] is objects[2]
assert app.selected_dupes[2] is self.groups[1].ref
def test_selectResultNodePaths_after_sort(self, do_setup):
def test_select_result_node_paths_after_sort(self, do_setup):
app = self.app
objects = self.objects
groups = self.groups[:] #To keep the old order in memory
self.rtable.sort('name', False) #0
#Now, the group order is supposed to be reversed
groups = self.groups[:] # To keep the old order in memory
self.rtable.sort("name", False) # 0
# Now, the group order is supposed to be reversed
self.rtable.select([1, 2, 3])
eq_(len(app.selected_dupes), 3)
assert app.selected_dupes[0] is objects[4]
@@ -244,11 +256,11 @@ class TestCaseDupeGuruWithResults:
app.remove_selected()
eq_(self.rtable.selected_indexes, []) # no exception
def test_selectPowerMarkerRows_after_sort(self, do_setup):
def test_select_powermarker_rows_after_sort(self, do_setup):
app = self.app
objects = self.objects
self.rtable.power_marker = True
self.rtable.sort('name', False)
self.rtable.sort("name", False)
self.rtable.select([0, 1, 2])
eq_(len(app.selected_dupes), 3)
assert app.selected_dupes[0] is objects[4]
@@ -283,15 +295,15 @@ class TestCaseDupeGuruWithResults:
app.toggle_selected_mark_state()
eq_(app.results.mark_count, 0)
def test_refreshDetailsWithSelected(self, do_setup):
def test_refresh_details_with_selected(self, do_setup):
self.rtable.select([1, 4])
eq_(self.dpanel.row(0), ('Filename', 'bar bleh', 'foo bar'))
self.dpanel.view.check_gui_calls(['refresh'])
eq_(self.dpanel.row(0), ("Filename", "bar bleh", "foo bar"))
self.dpanel.view.check_gui_calls(["refresh"])
self.rtable.select([])
eq_(self.dpanel.row(0), ('Filename', '---', '---'))
self.dpanel.view.check_gui_calls(['refresh'])
eq_(self.dpanel.row(0), ("Filename", "---", "---"))
self.dpanel.view.check_gui_calls(["refresh"])
def test_makeSelectedReference(self, do_setup):
def test_make_selected_reference(self, do_setup):
app = self.app
objects = self.objects
groups = self.groups
@@ -300,17 +312,17 @@ class TestCaseDupeGuruWithResults:
assert groups[0].ref is objects[1]
assert groups[1].ref is objects[4]
def test_makeSelectedReference_by_selecting_two_dupes_in_the_same_group(self, do_setup):
def test_make_selected_reference_by_selecting_two_dupes_in_the_same_group(self, do_setup):
app = self.app
objects = self.objects
groups = self.groups
self.rtable.select([1, 2, 4])
#Only [0, 0] and [1, 0] must go ref, not [0, 1] because it is a part of the same group
# Only [0, 0] and [1, 0] must go ref, not [0, 1] because it is a part of the same group
app.make_selected_reference()
assert groups[0].ref is objects[1]
assert groups[1].ref is objects[4]
def test_removeSelected(self, do_setup):
def test_remove_selected(self, do_setup):
app = self.app
self.rtable.select([1, 4])
app.remove_selected()
@@ -318,7 +330,7 @@ class TestCaseDupeGuruWithResults:
app.remove_selected()
eq_(len(app.results.dupes), 0)
def test_addDirectory_simple(self, do_setup):
def test_add_directory_simple(self, do_setup):
# There's already a directory in self.app, so adding another once makes 2 of em
app = self.app
# any other path that isn't a parent or child of the already added path
@@ -326,7 +338,7 @@ class TestCaseDupeGuruWithResults:
app.add_directory(otherpath)
eq_(len(app.directories), 2)
def test_addDirectory_already_there(self, do_setup):
def test_add_directory_already_there(self, do_setup):
app = self.app
otherpath = Path(op.dirname(__file__))
app.add_directory(otherpath)
@@ -334,46 +346,46 @@ class TestCaseDupeGuruWithResults:
eq_(len(app.view.messages), 1)
assert "already" in app.view.messages[0]
def test_addDirectory_does_not_exist(self, do_setup):
def test_add_directory_does_not_exist(self, do_setup):
app = self.app
app.add_directory('/does_not_exist')
app.add_directory("/does_not_exist")
eq_(len(app.view.messages), 1)
assert "exist" in app.view.messages[0]
def test_ignore(self, do_setup):
app = self.app
self.rtable.select([4]) #The dupe of the second, 2 sized group
self.rtable.select([4]) # The dupe of the second, 2 sized group
app.add_selected_to_ignore_list()
eq_(len(app.ignore_list), 1)
self.rtable.select([1]) #first dupe of the 3 dupes group
self.rtable.select([1]) # first dupe of the 3 dupes group
app.add_selected_to_ignore_list()
#BOTH the ref and the other dupe should have been added
# BOTH the ref and the other dupe should have been added
eq_(len(app.ignore_list), 3)
def test_purgeIgnoreList(self, do_setup, tmpdir):
def test_purge_ignorelist(self, do_setup, tmpdir):
app = self.app
p1 = str(tmpdir.join('file1'))
p2 = str(tmpdir.join('file2'))
open(p1, 'w').close()
open(p2, 'w').close()
dne = '/does_not_exist'
app.ignore_list.Ignore(dne, p1)
app.ignore_list.Ignore(p2, dne)
app.ignore_list.Ignore(p1, p2)
p1 = str(tmpdir.join("file1"))
p2 = str(tmpdir.join("file2"))
open(p1, "w").close()
open(p2, "w").close()
dne = "/does_not_exist"
app.ignore_list.ignore(dne, p1)
app.ignore_list.ignore(p2, dne)
app.ignore_list.ignore(p1, p2)
app.purge_ignore_list()
eq_(1, len(app.ignore_list))
assert app.ignore_list.AreIgnored(p1, p2)
assert not app.ignore_list.AreIgnored(dne, p1)
assert app.ignore_list.are_ignored(p1, p2)
assert not app.ignore_list.are_ignored(dne, p1)
def test_only_unicode_is_added_to_ignore_list(self, do_setup):
def FakeIgnore(first, second):
def fake_ignore(first, second):
if not isinstance(first, str):
self.fail()
if not isinstance(second, str):
self.fail()
app = self.app
app.ignore_list.Ignore = FakeIgnore
app.ignore_list.ignore = fake_ignore
self.rtable.select([4])
app.add_selected_to_ignore_list()
@@ -401,21 +413,22 @@ class TestCaseDupeGuruWithResults:
# Ref #238
self.rtable.delta_values = True
self.rtable.power_marker = True
self.rtable.sort('dupe_count', False)
self.rtable.sort("dupe_count", False)
# don't crash
self.rtable.sort('percentage', False)
self.rtable.sort("percentage", False)
# don't crash
class TestCaseDupeGuru_renameSelected:
def pytest_funcarg__do_setup(self, request):
tmpdir = request.getfuncargvalue('tmpdir')
class TestCaseDupeGuruRenameSelected:
@pytest.fixture
def do_setup(self, request):
tmpdir = request.getfixturevalue("tmpdir")
p = Path(str(tmpdir))
fp = open(str(p['foo bar 1']), mode='w')
fp = open(str(p["foo bar 1"]), mode="w")
fp.close()
fp = open(str(p['foo bar 2']), mode='w')
fp = open(str(p["foo bar 2"]), mode="w")
fp.close()
fp = open(str(p['foo bar 3']), mode='w')
fp = open(str(p["foo bar 3"]), mode="w")
fp.close()
files = fs.get_files(p)
for f in files:
@@ -437,46 +450,47 @@ class TestCaseDupeGuru_renameSelected:
app = self.app
g = self.groups[0]
self.rtable.select([1])
assert app.rename_selected('renamed')
assert app.rename_selected("renamed")
names = [p.name for p in self.p.listdir()]
assert 'renamed' in names
assert 'foo bar 2' not in names
eq_(g.dupes[0].name, 'renamed')
assert "renamed" in names
assert "foo bar 2" not in names
eq_(g.dupes[0].name, "renamed")
def test_none_selected(self, do_setup, monkeypatch):
app = self.app
g = self.groups[0]
self.rtable.select([])
monkeypatch.setattr(logging, 'warning', log_calls(lambda msg: None))
assert not app.rename_selected('renamed')
msg = logging.warning.calls[0]['msg']
eq_('dupeGuru Warning: list index out of range', msg)
monkeypatch.setattr(logging, "warning", log_calls(lambda msg: None))
assert not app.rename_selected("renamed")
msg = logging.warning.calls[0]["msg"]
eq_("dupeGuru Warning: list index out of range", msg)
names = [p.name for p in self.p.listdir()]
assert 'renamed' not in names
assert 'foo bar 2' in names
eq_(g.dupes[0].name, 'foo bar 2')
assert "renamed" not in names
assert "foo bar 2" in names
eq_(g.dupes[0].name, "foo bar 2")
def test_name_already_exists(self, do_setup, monkeypatch):
app = self.app
g = self.groups[0]
self.rtable.select([1])
monkeypatch.setattr(logging, 'warning', log_calls(lambda msg: None))
assert not app.rename_selected('foo bar 1')
msg = logging.warning.calls[0]['msg']
assert msg.startswith('dupeGuru Warning: \'foo bar 1\' already exists in')
monkeypatch.setattr(logging, "warning", log_calls(lambda msg: None))
assert not app.rename_selected("foo bar 1")
msg = logging.warning.calls[0]["msg"]
assert msg.startswith("dupeGuru Warning: 'foo bar 1' already exists in")
names = [p.name for p in self.p.listdir()]
assert 'foo bar 1' in names
assert 'foo bar 2' in names
eq_(g.dupes[0].name, 'foo bar 2')
assert "foo bar 1" in names
assert "foo bar 2" in names
eq_(g.dupes[0].name, "foo bar 2")
class TestAppWithDirectoriesInTree:
def pytest_funcarg__do_setup(self, request):
tmpdir = request.getfuncargvalue('tmpdir')
@pytest.fixture
def do_setup(self, request):
tmpdir = request.getfixturevalue("tmpdir")
p = Path(str(tmpdir))
p['sub1'].mkdir()
p['sub2'].mkdir()
p['sub3'].mkdir()
p["sub1"].mkdir()
p["sub2"].mkdir()
p["sub3"].mkdir()
app = TestApp()
self.app = app.app
self.dtree = app.dtree
@@ -488,11 +502,9 @@ class TestAppWithDirectoriesInTree:
# refreshed.
node = self.dtree[0]
eq_(len(node), 3) # a len() call is required for subnodes to be loaded
subnode = node[0]
node.state = 1 # the state property is a state index
node = self.dtree[0]
eq_(len(node), 3)
subnode = node[0]
eq_(subnode.state, 1)
self.dtree.view.check_gui_calls(['refresh_states'])
self.dtree.view.check_gui_calls(["refresh_states"])


@@ -17,6 +17,7 @@ from ..app import DupeGuru as DupeGuruBase
from ..gui.result_table import ResultTable as ResultTableBase
from ..gui.prioritize_dialog import PrioritizeDialog
class DupeGuruView:
JOB = nulljob
@@ -44,23 +45,27 @@ class DupeGuruView:
def create_results_window(self):
pass
class ResultTable(ResultTableBase):
COLUMNS = [
Column('marked', ''),
Column('name', 'Filename'),
Column('folder_path', 'Directory'),
Column('size', 'Size (KB)'),
Column('extension', 'Kind'),
Column("marked", ""),
Column("name", "Filename"),
Column("folder_path", "Directory"),
Column("size", "Size (KB)"),
Column("extension", "Kind"),
]
DELTA_COLUMNS = {'size', }
DELTA_COLUMNS = {
"size",
}
class DupeGuru(DupeGuruBase):
NAME = 'dupeGuru'
METADATA_TO_READ = ['size']
NAME = "dupeGuru"
METADATA_TO_READ = ["size"]
def __init__(self):
DupeGuruBase.__init__(self, DupeGuruView())
self.appdata = '/tmp'
self.appdata = "/tmp"
self._recreate_result_table()
def _prioritization_categories(self):
@@ -78,17 +83,18 @@ class NamedObject:
def __init__(self, name="foobar", with_words=False, size=1, folder=None):
self.name = name
if folder is None:
folder = 'basepath'
folder = "basepath"
self._folder = Path(folder)
self.size = size
self.md5partial = name
self.md5 = name
self.md5samples = name
if with_words:
self.words = getwords(name)
self.is_ref = False
def __bool__(self):
return False #Make sure that operations are made correctly when the bool value of files is false.
return False # Make sure that operations are made correctly when the bool value of files is false.
def get_display_info(self, group, delta):
size = self.size
@@ -97,10 +103,10 @@ class NamedObject:
r = group.ref
size -= r.size
return {
'name': self.name,
'folder_path': str(self.folder_path),
'size': format_size(size, 0, 1, False),
'extension': self.extension if hasattr(self, 'extension') else '---',
"name": self.name,
"folder_path": str(self.folder_path),
"size": format_size(size, 0, 1, False),
"extension": self.extension if hasattr(self, "extension") else "---",
}
@property
@@ -115,6 +121,7 @@ class NamedObject:
def extension(self):
return get_file_ext(self.name)
# Returns a group set that looks like that:
# "foo bar" (1)
# "bar bleh" (1024)
@@ -127,22 +134,25 @@ def GetTestGroups():
NamedObject("bar bleh"),
NamedObject("foo bleh"),
NamedObject("ibabtu"),
NamedObject("ibabtu")
NamedObject("ibabtu"),
]
objects[1].size = 1024
matches = engine.getmatches(objects) #we should have 5 matches
groups = engine.get_groups(matches) #We should have 2 groups
matches = engine.getmatches(objects) # we should have 5 matches
groups = engine.get_groups(matches) # We should have 2 groups
for g in groups:
g.prioritize(lambda x: objects.index(x)) #We want the dupes to be in the same order as the list is
g.prioritize(lambda x: objects.index(x)) # We want the dupes to be in the same order as the list is
groups.sort(key=len, reverse=True) # We want the group with 3 members to be first.
return (objects, matches, groups)
class TestApp(TestAppBase):
__test__ = False
def __init__(self):
def link_gui(gui):
gui.view = self.make_logger()
if hasattr(gui, 'columns'): # tables
gui.columns.view = self.make_logger()
if hasattr(gui, "_columns"): # tables
gui._columns.view = self.make_logger()
return gui
TestAppBase.__init__(self)
@@ -166,7 +176,7 @@ class TestApp(TestAppBase):
# rtable is a property because its instance can be replaced during execution
return self.app.result_table
#--- Helpers
# --- Helpers
def select_pri_criterion(self, name):
# Select a main prioritize criterion by name instead of by index. Makes tests more
# maintainable.


@@ -13,13 +13,16 @@ try:
except ImportError:
skip("Can't import the block module, probably hasn't been compiled.")
def my_avgdiff(first, second, limit=768, min_iter=3): # this is so I don't have to re-write every call
return avgdiff(first, second, limit, min_iter)
BLACK = (0, 0, 0)
RED = (0xff, 0, 0)
GREEN = (0, 0xff, 0)
BLUE = (0, 0, 0xff)
RED = (0xFF, 0, 0)
GREEN = (0, 0xFF, 0)
BLUE = (0, 0, 0xFF)
class FakeImage:
def __init__(self, size, data):
@@ -37,16 +40,20 @@ class FakeImage:
pixels.append(pixel)
return FakeImage((box[2] - box[0], box[3] - box[1]), pixels)
def empty():
return FakeImage((0, 0), [])
def single_pixel(): #one red pixel
return FakeImage((1, 1), [(0xff, 0, 0)])
def single_pixel(): # one red pixel
return FakeImage((1, 1), [(0xFF, 0, 0)])
def four_pixels():
pixels = [RED, (0, 0x80, 0xff), (0x80, 0, 0), (0, 0x40, 0x80)]
pixels = [RED, (0, 0x80, 0xFF), (0x80, 0, 0), (0, 0x40, 0x80)]
return FakeImage((2, 2), pixels)
class TestCasegetblock:
def test_single_pixel(self):
im = single_pixel()
@@ -60,104 +67,12 @@ class TestCasegetblock:
def test_four_pixels(self):
im = four_pixels()
[b] = getblocks2(im, 1)
meanred = (0xff + 0x80) // 4
meanred = (0xFF + 0x80) // 4
meangreen = (0x80 + 0x40) // 4
meanblue = (0xff + 0x80) // 4
meanblue = (0xFF + 0x80) // 4
eq_((meanred, meangreen, meanblue), b)
# class TCdiff(unittest.TestCase):
# def test_diff(self):
# b1 = (10, 20, 30)
# b2 = (1, 2, 3)
# eq_(9 + 18 + 27, diff(b1, b2))
#
# def test_diff_negative(self):
# b1 = (10, 20, 30)
# b2 = (1, 2, 3)
# eq_(9 + 18 + 27, diff(b2, b1))
#
# def test_diff_mixed_positive_and_negative(self):
# b1 = (1, 5, 10)
# b2 = (10, 1, 15)
# eq_(9 + 4 + 5, diff(b1, b2))
#
# class TCgetblocks(unittest.TestCase):
# def test_empty_image(self):
# im = empty()
# blocks = getblocks(im, 1)
# eq_(0, len(blocks))
#
# def test_one_block_image(self):
# im = four_pixels()
# blocks = getblocks2(im, 1)
# eq_(1, len(blocks))
# block = blocks[0]
# meanred = (0xff + 0x80) // 4
# meangreen = (0x80 + 0x40) // 4
# meanblue = (0xff + 0x80) // 4
# eq_((meanred, meangreen, meanblue), block)
#
# def test_not_enough_height_to_fit_a_block(self):
# im = FakeImage((2, 1), [BLACK, BLACK])
# blocks = getblocks(im, 2)
# eq_(0, len(blocks))
#
# def xtest_dont_include_leftovers(self):
# # this test is disabled because getblocks is not used and getblock in cdeffed
# pixels = [
# RED,(0, 0x80, 0xff), BLACK,
# (0x80, 0, 0),(0, 0x40, 0x80), BLACK,
# BLACK, BLACK, BLACK
# ]
# im = FakeImage((3, 3), pixels)
# blocks = getblocks(im, 2)
# block = blocks[0]
# #Because the block is smaller than the image, only blocksize must be considered.
# meanred = (0xff + 0x80) // 4
# meangreen = (0x80 + 0x40) // 4
# meanblue = (0xff + 0x80) // 4
# eq_((meanred, meangreen, meanblue), block)
#
# def xtest_two_blocks(self):
# # this test is disabled because getblocks is not used and getblock in cdeffed
# pixels = [BLACK for i in xrange(4 * 2)]
# pixels[0] = RED
# pixels[1] = (0, 0x80, 0xff)
# pixels[4] = (0x80, 0, 0)
# pixels[5] = (0, 0x40, 0x80)
# im = FakeImage((4, 2), pixels)
# blocks = getblocks(im, 2)
# eq_(2, len(blocks))
# block = blocks[0]
# #Because the block is smaller than the image, only blocksize must be considered.
# meanred = (0xff + 0x80) // 4
# meangreen = (0x80 + 0x40) // 4
# meanblue = (0xff + 0x80) // 4
# eq_((meanred, meangreen, meanblue), block)
# eq_(BLACK, blocks[1])
#
# def test_four_blocks(self):
# pixels = [BLACK for i in xrange(4 * 4)]
# pixels[0] = RED
# pixels[1] = (0, 0x80, 0xff)
# pixels[4] = (0x80, 0, 0)
# pixels[5] = (0, 0x40, 0x80)
# im = FakeImage((4, 4), pixels)
# blocks = getblocks2(im, 2)
# eq_(4, len(blocks))
# block = blocks[0]
# #Because the block is smaller than the image, only blocksize must be considered.
# meanred = (0xff + 0x80) // 4
# meangreen = (0x80 + 0x40) // 4
# meanblue = (0xff + 0x80) // 4
# eq_((meanred, meangreen, meanblue), block)
# eq_(BLACK, blocks[1])
# eq_(BLACK, blocks[2])
# eq_(BLACK, blocks[3])
#
class TestCasegetblocks2:
def test_empty_image(self):
im = empty()
@@ -169,9 +84,9 @@ class TestCasegetblocks2:
blocks = getblocks2(im, 1)
eq_(1, len(blocks))
block = blocks[0]
meanred = (0xff + 0x80) // 4
meanred = (0xFF + 0x80) // 4
meangreen = (0x80 + 0x40) // 4
meanblue = (0xff + 0x80) // 4
meanblue = (0xFF + 0x80) // 4
eq_((meanred, meangreen, meanblue), block)
def test_four_blocks_all_black(self):
@@ -225,25 +140,25 @@ class TestCaseavgdiff:
my_avgdiff([b, b], [b])
def test_first_arg_is_empty_but_not_second(self):
#Don't return 0 (as when the 2 lists are empty), raise!
# Don't return 0 (as when the 2 lists are empty), raise!
b = (0, 0, 0)
with raises(DifferentBlockCountError):
my_avgdiff([], [b])
def test_limit(self):
ref = (0, 0, 0)
b1 = (10, 10, 10) #avg 30
b2 = (20, 20, 20) #avg 45
b3 = (30, 30, 30) #avg 60
b1 = (10, 10, 10) # avg 30
b2 = (20, 20, 20) # avg 45
b3 = (30, 30, 30) # avg 60
blocks1 = [ref, ref, ref]
blocks2 = [b1, b2, b3]
eq_(45, my_avgdiff(blocks1, blocks2, 44))
def test_min_iterations(self):
ref = (0, 0, 0)
b1 = (10, 10, 10) #avg 30
b2 = (20, 20, 20) #avg 45
b3 = (10, 10, 10) #avg 40
b1 = (10, 10, 10) # avg 30
b2 = (20, 20, 20) # avg 45
b3 = (10, 10, 10) # avg 40
blocks1 = [ref, ref, ref]
blocks2 = [b1, b2, b3]
eq_(40, my_avgdiff(blocks1, blocks2, 45 - 1, 3))
@@ -262,8 +177,8 @@ class TestCaseavgdiff:
def test_return_at_least_1_at_the_slightest_difference(self):
ref = (0, 0, 0)
b1 = (1, 0, 0)
blocks1 = [ref for i in range(250)]
blocks2 = [ref for i in range(250)]
blocks1 = [ref for _ in range(250)]
blocks2 = [ref for _ in range(250)]
blocks2[0] = b1
eq_(1, my_avgdiff(blocks1, blocks2))
@@ -272,41 +187,3 @@ class TestCaseavgdiff:
blocks1 = [ref, ref]
blocks2 = [ref, ref]
eq_(0, my_avgdiff(blocks1, blocks2))
# class TCmaxdiff(unittest.TestCase):
# def test_empty(self):
# self.assertRaises(NoBlocksError, maxdiff,[],[])
#
# def test_two_blocks(self):
# b1 = (5, 10, 15)
# b2 = (255, 250, 245)
# b3 = (0, 0, 0)
# b4 = (255, 0, 255)
# blocks1 = [b1, b2]
# blocks2 = [b3, b4]
# expected1 = 5 + 10 + 15
# expected2 = 0 + 250 + 10
# expected = max(expected1, expected2)
# eq_(expected, maxdiff(blocks1, blocks2))
#
# def test_blocks_not_the_same_size(self):
# b = (0, 0, 0)
# self.assertRaises(DifferentBlockCountError, maxdiff,[b, b],[b])
#
# def test_first_arg_is_empty_but_not_second(self):
# #Don't return 0 (as when the 2 lists are empty), raise!
# b = (0, 0, 0)
# self.assertRaises(DifferentBlockCountError, maxdiff,[],[b])
#
# def test_limit(self):
# b1 = (5, 10, 15)
# b2 = (255, 250, 245)
# b3 = (0, 0, 0)
# b4 = (255, 0, 255)
# blocks1 = [b1, b2]
# blocks2 = [b3, b4]
# expected1 = 5 + 10 + 15
# expected2 = 0 + 250 + 10
# eq_(expected1, maxdiff(blocks1, blocks2, expected1 - 1))
#


@@ -16,34 +16,35 @@ try:
except ImportError:
skip("Can't import the cache module, probably hasn't been compiled.")
class TestCasecolors_to_string:
class TestCaseColorsToString:
def test_no_color(self):
eq_('', colors_to_string([]))
eq_("", colors_to_string([]))
def test_single_color(self):
eq_('000000', colors_to_string([(0, 0, 0)]))
eq_('010101', colors_to_string([(1, 1, 1)]))
eq_('0a141e', colors_to_string([(10, 20, 30)]))
eq_("000000", colors_to_string([(0, 0, 0)]))
eq_("010101", colors_to_string([(1, 1, 1)]))
eq_("0a141e", colors_to_string([(10, 20, 30)]))
def test_two_colors(self):
eq_('000102030405', colors_to_string([(0, 1, 2), (3, 4, 5)]))
eq_("000102030405", colors_to_string([(0, 1, 2), (3, 4, 5)]))
class TestCasestring_to_colors:
class TestCaseStringToColors:
def test_empty(self):
eq_([], string_to_colors(''))
eq_([], string_to_colors(""))
def test_single_color(self):
eq_([(0, 0, 0)], string_to_colors('000000'))
eq_([(2, 3, 4)], string_to_colors('020304'))
eq_([(10, 20, 30)], string_to_colors('0a141e'))
eq_([(0, 0, 0)], string_to_colors("000000"))
eq_([(2, 3, 4)], string_to_colors("020304"))
eq_([(10, 20, 30)], string_to_colors("0a141e"))
def test_two_colors(self):
eq_([(10, 20, 30), (40, 50, 60)], string_to_colors('0a141e28323c'))
eq_([(10, 20, 30), (40, 50, 60)], string_to_colors("0a141e28323c"))
def test_incomplete_color(self):
# don't return anything if it's not a complete color
eq_([], string_to_colors('102'))
eq_([], string_to_colors("102"))
class BaseTestCaseCache:
@@ -54,58 +55,58 @@ class BaseTestCaseCache:
c = self.get_cache()
eq_(0, len(c))
with raises(KeyError):
c['foo']
c["foo"]
def test_set_then_retrieve_blocks(self):
c = self.get_cache()
b = [(0, 0, 0), (1, 2, 3)]
c['foo'] = b
eq_(b, c['foo'])
c["foo"] = b
eq_(b, c["foo"])
def test_delitem(self):
c = self.get_cache()
c['foo'] = ''
del c['foo']
assert 'foo' not in c
c["foo"] = ""
del c["foo"]
assert "foo" not in c
with raises(KeyError):
del c['foo']
del c["foo"]
def test_persistance(self, tmpdir):
DBNAME = tmpdir.join('hstest.db')
DBNAME = tmpdir.join("hstest.db")
c = self.get_cache(str(DBNAME))
c['foo'] = [(1, 2, 3)]
c["foo"] = [(1, 2, 3)]
del c
c = self.get_cache(str(DBNAME))
eq_([(1, 2, 3)], c['foo'])
eq_([(1, 2, 3)], c["foo"])
def test_filter(self):
c = self.get_cache()
c['foo'] = ''
c['bar'] = ''
c['baz'] = ''
c.filter(lambda p: p != 'bar') #only 'bar' is removed
c["foo"] = ""
c["bar"] = ""
c["baz"] = ""
c.filter(lambda p: p != "bar") # only 'bar' is removed
eq_(2, len(c))
assert 'foo' in c
assert 'baz' in c
assert 'bar' not in c
assert "foo" in c
assert "baz" in c
assert "bar" not in c
def test_clear(self):
c = self.get_cache()
c['foo'] = ''
c['bar'] = ''
c['baz'] = ''
c["foo"] = ""
c["bar"] = ""
c["baz"] = ""
c.clear()
eq_(0, len(c))
assert 'foo' not in c
assert 'baz' not in c
assert 'bar' not in c
assert "foo" not in c
assert "baz" not in c
assert "bar" not in c
def test_by_id(self):
# it's possible to use the cache by referring to the files by their row_id
c = self.get_cache()
b = [(0, 0, 0), (1, 2, 3)]
c['foo'] = b
foo_id = c.get_id('foo')
c["foo"] = b
foo_id = c.get_id("foo")
eq_(c[foo_id], b)
@@ -120,16 +121,16 @@ class TestCaseSqliteCache(BaseTestCaseCache):
# If we don't do this monkeypatching, we get a weird exception about trying to flush a
# closed file. I've tried setting logging level and stuff, but nothing worked. So, there we
# go, a dirty monkeypatch.
monkeypatch.setattr(logging, 'warning', lambda *args, **kw: None)
dbname = str(tmpdir.join('foo.db'))
fp = open(dbname, 'w')
fp.write('invalid sqlite content')
monkeypatch.setattr(logging, "warning", lambda *args, **kw: None)
dbname = str(tmpdir.join("foo.db"))
fp = open(dbname, "w")
fp.write("invalid sqlite content")
fp.close()
c = self.get_cache(dbname) # should not raise a DatabaseError
c['foo'] = [(1, 2, 3)]
c["foo"] = [(1, 2, 3)]
del c
c = self.get_cache(dbname)
eq_(c['foo'], [(1, 2, 3)])
eq_(c["foo"], [(1, 2, 3)])
class TestCaseShelveCache(BaseTestCaseCache):
@@ -161,4 +162,3 @@ class TestCaseCacheSQLEscape:
del c["foo'bar"]
except KeyError:
assert False
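As a side note, the hex round-trip these cache tests rely on is simple enough to sketch. The shipped colors_to_string / string_to_colors come from the cache module that the skip() guard at the top of this file may fail to import, so the stand-ins below are only an illustration of the behaviour the tests expect.

def sketch_colors_to_string(colors):
    # each (R, G, B) tuple becomes six lowercase hex digits, concatenated
    return "".join("%02x%02x%02x" % color for color in colors)


def sketch_string_to_colors(s):
    # inverse operation; an incomplete trailing chunk is simply dropped
    result = []
    for i in range(0, len(s) - len(s) % 6, 6):
        chunk = s[i : i + 6]
        result.append((int(chunk[0:2], 16), int(chunk[2:4], 16), int(chunk[4:6], 16)))
    return result


assert sketch_colors_to_string([(10, 20, 30)]) == "0a141e"
assert sketch_string_to_colors("0a141e28323c") == [(10, 20, 30), (40, 50, 60)]
assert sketch_string_to_colors("102") == []  # incomplete color, nothing returned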

@@ -1 +1 @@
from hscommon.testutil import pytest_funcarg__app # noqa
from hscommon.testutil import app # noqa

@@ -12,93 +12,109 @@ import shutil
from pytest import raises
from hscommon.path import Path
from hscommon.testutil import eq_
from hscommon.plat import ISWINDOWS
from ..fs import File
from ..directories import Directories, DirectoryState, AlreadyThereError, InvalidPathError
from ..directories import (
Directories,
DirectoryState,
AlreadyThereError,
InvalidPathError,
)
from ..exclude import ExcludeList, ExcludeDict
def create_fake_fs(rootpath):
# We have it as a separate function because other units are using it.
rootpath = rootpath['fs']
rootpath = rootpath["fs"]
rootpath.mkdir()
rootpath['dir1'].mkdir()
rootpath['dir2'].mkdir()
rootpath['dir3'].mkdir()
fp = rootpath['file1.test'].open('w')
fp.write('1')
rootpath["dir1"].mkdir()
rootpath["dir2"].mkdir()
rootpath["dir3"].mkdir()
fp = rootpath["file1.test"].open("w")
fp.write("1")
fp.close()
fp = rootpath['file2.test'].open('w')
fp.write('12')
fp = rootpath["file2.test"].open("w")
fp.write("12")
fp.close()
fp = rootpath['file3.test'].open('w')
fp.write('123')
fp = rootpath["file3.test"].open("w")
fp.write("123")
fp.close()
fp = rootpath['dir1']['file1.test'].open('w')
fp.write('1')
fp = rootpath["dir1"]["file1.test"].open("w")
fp.write("1")
fp.close()
fp = rootpath['dir2']['file2.test'].open('w')
fp.write('12')
fp = rootpath["dir2"]["file2.test"].open("w")
fp.write("12")
fp.close()
fp = rootpath['dir3']['file3.test'].open('w')
fp.write('123')
fp = rootpath["dir3"]["file3.test"].open("w")
fp.write("123")
fp.close()
return rootpath
testpath = None
def setup_module(module):
# In this unit, we have tests depending on two directory structures. One with only one file in it
# and another with a more complex structure.
testpath = Path(tempfile.mkdtemp())
module.testpath = testpath
rootpath = testpath['onefile']
rootpath = testpath["onefile"]
rootpath.mkdir()
fp = rootpath['test.txt'].open('w')
fp.write('test_data')
fp = rootpath["test.txt"].open("w")
fp.write("test_data")
fp.close()
create_fake_fs(testpath)
def teardown_module(module):
shutil.rmtree(str(module.testpath))
def test_empty():
d = Directories()
eq_(len(d), 0)
assert 'foobar' not in d
assert "foobar" not in d
def test_add_path():
d = Directories()
p = testpath['onefile']
p = testpath["onefile"]
d.add_path(p)
eq_(1, len(d))
assert p in d
assert (p['foobar']) in d
assert (p["foobar"]) in d
assert p.parent() not in d
p = testpath['fs']
p = testpath["fs"]
d.add_path(p)
eq_(2, len(d))
assert p in d
def test_AddPath_when_path_is_already_there():
def test_add_path_when_path_is_already_there():
d = Directories()
p = testpath['onefile']
p = testpath["onefile"]
d.add_path(p)
with raises(AlreadyThereError):
d.add_path(p)
with raises(AlreadyThereError):
d.add_path(p['foobar'])
d.add_path(p["foobar"])
eq_(1, len(d))
def test_add_path_containing_paths_already_there():
d = Directories()
d.add_path(testpath['onefile'])
d.add_path(testpath["onefile"])
eq_(1, len(d))
d.add_path(testpath)
eq_(len(d), 1)
eq_(d[0], testpath)
def test_AddPath_non_latin(tmpdir):
def test_add_path_non_latin(tmpdir):
p = Path(str(tmpdir))
to_add = p['unicode\u201a']
to_add = p["unicode\u201a"]
os.mkdir(str(to_add))
d = Directories()
try:
@@ -106,63 +122,69 @@ def test_AddPath_non_latin(tmpdir):
except UnicodeDecodeError:
assert False
def test_del():
d = Directories()
d.add_path(testpath['onefile'])
d.add_path(testpath["onefile"])
try:
del d[1]
assert False
except IndexError:
pass
d.add_path(testpath['fs'])
d.add_path(testpath["fs"])
del d[1]
eq_(1, len(d))
def test_states():
d = Directories()
p = testpath['onefile']
p = testpath["onefile"]
d.add_path(p)
eq_(DirectoryState.Normal, d.get_state(p))
d.set_state(p, DirectoryState.Reference)
eq_(DirectoryState.Reference, d.get_state(p))
eq_(DirectoryState.Reference, d.get_state(p['dir1']))
eq_(DirectoryState.NORMAL, d.get_state(p))
d.set_state(p, DirectoryState.REFERENCE)
eq_(DirectoryState.REFERENCE, d.get_state(p))
eq_(DirectoryState.REFERENCE, d.get_state(p["dir1"]))
eq_(1, len(d.states))
eq_(p, list(d.states.keys())[0])
eq_(DirectoryState.Reference, d.states[p])
eq_(DirectoryState.REFERENCE, d.states[p])
def test_get_state_with_path_not_there():
# When the path's not there, just return DirectoryState.Normal
d = Directories()
d.add_path(testpath['onefile'])
eq_(d.get_state(testpath), DirectoryState.Normal)
d.add_path(testpath["onefile"])
eq_(d.get_state(testpath), DirectoryState.NORMAL)
def test_states_overwritten_when_larger_directory_eat_smaller_ones():
# ref #248
# When setting the state of a folder, we overwrite previously set states for subfolders.
d = Directories()
p = testpath['onefile']
p = testpath["onefile"]
d.add_path(p)
d.set_state(p, DirectoryState.Excluded)
d.set_state(p, DirectoryState.EXCLUDED)
d.add_path(testpath)
d.set_state(testpath, DirectoryState.Reference)
eq_(d.get_state(p), DirectoryState.Reference)
eq_(d.get_state(p['dir1']), DirectoryState.Reference)
eq_(d.get_state(testpath), DirectoryState.Reference)
d.set_state(testpath, DirectoryState.REFERENCE)
eq_(d.get_state(p), DirectoryState.REFERENCE)
eq_(d.get_state(p["dir1"]), DirectoryState.REFERENCE)
eq_(d.get_state(testpath), DirectoryState.REFERENCE)
def test_get_files():
d = Directories()
p = testpath['fs']
p = testpath["fs"]
d.add_path(p)
d.set_state(p['dir1'], DirectoryState.Reference)
d.set_state(p['dir2'], DirectoryState.Excluded)
d.set_state(p["dir1"], DirectoryState.REFERENCE)
d.set_state(p["dir2"], DirectoryState.EXCLUDED)
files = list(d.get_files())
eq_(5, len(files))
for f in files:
if f.path.parent() == p['dir1']:
if f.path.parent() == p["dir1"]:
assert f.is_ref
else:
assert not f.is_ref
def test_get_files_with_folders():
# When fileclasses handle folders, return them and stop recursing!
class FakeFile(File):
@@ -171,106 +193,120 @@ def test_get_files_with_folders():
return True
d = Directories()
p = testpath['fs']
p = testpath["fs"]
d.add_path(p)
files = list(d.get_files(fileclasses=[FakeFile]))
# We have the 3 root files and the 3 root dirs
eq_(6, len(files))
def test_get_folders():
d = Directories()
p = testpath['fs']
p = testpath["fs"]
d.add_path(p)
d.set_state(p['dir1'], DirectoryState.Reference)
d.set_state(p['dir2'], DirectoryState.Excluded)
d.set_state(p["dir1"], DirectoryState.REFERENCE)
d.set_state(p["dir2"], DirectoryState.EXCLUDED)
folders = list(d.get_folders())
eq_(len(folders), 3)
ref = [f for f in folders if f.is_ref]
not_ref = [f for f in folders if not f.is_ref]
eq_(len(ref), 1)
eq_(ref[0].path, p['dir1'])
eq_(ref[0].path, p["dir1"])
eq_(len(not_ref), 2)
eq_(ref[0].size, 1)
def test_get_files_with_inherited_exclusion():
d = Directories()
p = testpath['onefile']
p = testpath["onefile"]
d.add_path(p)
d.set_state(p, DirectoryState.Excluded)
d.set_state(p, DirectoryState.EXCLUDED)
eq_([], list(d.get_files()))
def test_save_and_load(tmpdir):
d1 = Directories()
d2 = Directories()
p1 = Path(str(tmpdir.join('p1')))
p1 = Path(str(tmpdir.join("p1")))
p1.mkdir()
p2 = Path(str(tmpdir.join('p2')))
p2 = Path(str(tmpdir.join("p2")))
p2.mkdir()
d1.add_path(p1)
d1.add_path(p2)
d1.set_state(p1, DirectoryState.Reference)
d1.set_state(p1['dir1'], DirectoryState.Excluded)
tmpxml = str(tmpdir.join('directories_testunit.xml'))
d1.set_state(p1, DirectoryState.REFERENCE)
d1.set_state(p1["dir1"], DirectoryState.EXCLUDED)
tmpxml = str(tmpdir.join("directories_testunit.xml"))
d1.save_to_file(tmpxml)
d2.load_from_file(tmpxml)
eq_(2, len(d2))
eq_(DirectoryState.Reference, d2.get_state(p1))
eq_(DirectoryState.Excluded, d2.get_state(p1['dir1']))
eq_(DirectoryState.REFERENCE, d2.get_state(p1))
eq_(DirectoryState.EXCLUDED, d2.get_state(p1["dir1"]))
def test_invalid_path():
d = Directories()
p = Path('does_not_exist')
p = Path("does_not_exist")
with raises(InvalidPathError):
d.add_path(p)
eq_(0, len(d))
def test_set_state_on_invalid_path():
d = Directories()
try:
d.set_state(Path('foobar',), DirectoryState.Normal)
d.set_state(
Path(
"foobar",
),
DirectoryState.NORMAL,
)
except LookupError:
assert False
def test_load_from_file_with_invalid_path(tmpdir):
#This test simulates a load from file resulting in a
#InvalidPath raise. Other directories must be loaded.
# This test simulates a load from file resulting in a
# InvalidPath raise. Other directories must be loaded.
d1 = Directories()
d1.add_path(testpath['onefile'])
#Will raise InvalidPath upon loading
p = Path(str(tmpdir.join('toremove')))
d1.add_path(testpath["onefile"])
# Will raise InvalidPath upon loading
p = Path(str(tmpdir.join("toremove")))
p.mkdir()
d1.add_path(p)
p.rmdir()
tmpxml = str(tmpdir.join('directories_testunit.xml'))
tmpxml = str(tmpdir.join("directories_testunit.xml"))
d1.save_to_file(tmpxml)
d2 = Directories()
d2.load_from_file(tmpxml)
eq_(1, len(d2))
def test_unicode_save(tmpdir):
d = Directories()
p1 = Path(str(tmpdir))['hello\xe9']
p1 = Path(str(tmpdir))["hello\xe9"]
p1.mkdir()
p1['foo\xe9'].mkdir()
p1["foo\xe9"].mkdir()
d.add_path(p1)
d.set_state(p1['foo\xe9'], DirectoryState.Excluded)
tmpxml = str(tmpdir.join('directories_testunit.xml'))
d.set_state(p1["foo\xe9"], DirectoryState.EXCLUDED)
tmpxml = str(tmpdir.join("directories_testunit.xml"))
try:
d.save_to_file(tmpxml)
except UnicodeDecodeError:
assert False
def test_get_files_refreshes_its_directories():
d = Directories()
p = testpath['fs']
p = testpath["fs"]
d.add_path(p)
files = d.get_files()
eq_(6, len(list(files)))
time.sleep(1)
os.remove(str(p['dir1']['file1.test']))
os.remove(str(p["dir1"]["file1.test"]))
files = d.get_files()
eq_(5, len(list(files)))
def test_get_files_does_not_choke_on_non_existing_directories(tmpdir):
d = Directories()
p = Path(str(tmpdir))
@@ -278,36 +314,259 @@ def test_get_files_does_not_choke_on_non_existing_directories(tmpdir):
p.rmtree()
eq_([], list(d.get_files()))
def test_get_state_returns_excluded_by_default_for_hidden_directories(tmpdir):
d = Directories()
p = Path(str(tmpdir))
hidden_dir_path = p['.foo']
p['.foo'].mkdir()
hidden_dir_path = p[".foo"]
p[".foo"].mkdir()
d.add_path(p)
eq_(d.get_state(hidden_dir_path), DirectoryState.Excluded)
eq_(d.get_state(hidden_dir_path), DirectoryState.EXCLUDED)
# But it can be overridden
d.set_state(hidden_dir_path, DirectoryState.Normal)
eq_(d.get_state(hidden_dir_path), DirectoryState.Normal)
d.set_state(hidden_dir_path, DirectoryState.NORMAL)
eq_(d.get_state(hidden_dir_path), DirectoryState.NORMAL)
def test_default_path_state_override(tmpdir):
# It's possible for a subclass to override the default state of a path
class MyDirectories(Directories):
def _default_state_for_path(self, path):
if 'foobar' in path:
return DirectoryState.Excluded
if "foobar" in path:
return DirectoryState.EXCLUDED
d = MyDirectories()
p1 = Path(str(tmpdir))
p1['foobar'].mkdir()
p1['foobar/somefile'].open('w').close()
p1['foobaz'].mkdir()
p1['foobaz/somefile'].open('w').close()
p1["foobar"].mkdir()
p1["foobar/somefile"].open("w").close()
p1["foobaz"].mkdir()
p1["foobaz/somefile"].open("w").close()
d.add_path(p1)
eq_(d.get_state(p1['foobaz']), DirectoryState.Normal)
eq_(d.get_state(p1['foobar']), DirectoryState.Excluded)
eq_(d.get_state(p1["foobaz"]), DirectoryState.NORMAL)
eq_(d.get_state(p1["foobar"]), DirectoryState.EXCLUDED)
eq_(len(list(d.get_files())), 1) # only the 'foobaz' file is there
# However, the default state can be changed
d.set_state(p1['foobar'], DirectoryState.Normal)
eq_(d.get_state(p1['foobar']), DirectoryState.Normal)
d.set_state(p1["foobar"], DirectoryState.NORMAL)
eq_(d.get_state(p1["foobar"]), DirectoryState.NORMAL)
eq_(len(list(d.get_files())), 2)
class TestExcludeList:
def setup_method(self, method):
self.d = Directories(exclude_list=ExcludeList(union_regex=False))
def get_files_and_expect_num_result(self, num_result):
"""Calls get_files(), get the filenames only, print for debugging.
num_result is how many files are expected as a result."""
print(
f"EXCLUDED REGEX: paths {self.d._exclude_list.compiled_paths} \
files: {self.d._exclude_list.compiled_files} all: {self.d._exclude_list.compiled}"
)
files = list(self.d.get_files())
files = [file.name for file in files]
print(f"FINAL FILES {files}")
eq_(len(files), num_result)
return files
def test_exclude_recycle_bin_by_default(self, tmpdir):
regex = r"^.*Recycle\.Bin$"
self.d._exclude_list.add(regex)
self.d._exclude_list.mark(regex)
p1 = Path(str(tmpdir))
p1["$Recycle.Bin"].mkdir()
p1["$Recycle.Bin"]["subdir"].mkdir()
self.d.add_path(p1)
eq_(self.d.get_state(p1["$Recycle.Bin"]), DirectoryState.EXCLUDED)
# By default, subdirs should be excluded too, but this can be overridden separately
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdir"]), DirectoryState.EXCLUDED)
self.d.set_state(p1["$Recycle.Bin"]["subdir"], DirectoryState.NORMAL)
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdir"]), DirectoryState.NORMAL)
def test_exclude_refined(self, tmpdir):
regex1 = r"^\$Recycle\.Bin$"
self.d._exclude_list.add(regex1)
self.d._exclude_list.mark(regex1)
p1 = Path(str(tmpdir))
p1["$Recycle.Bin"].mkdir()
p1["$Recycle.Bin"]["somefile.png"].open("w").close()
p1["$Recycle.Bin"]["some_unwanted_file.jpg"].open("w").close()
p1["$Recycle.Bin"]["subdir"].mkdir()
p1["$Recycle.Bin"]["subdir"]["somesubdirfile.png"].open("w").close()
p1["$Recycle.Bin"]["subdir"]["unwanted_subdirfile.gif"].open("w").close()
p1["$Recycle.Bin"]["subdar"].mkdir()
p1["$Recycle.Bin"]["subdar"]["somesubdarfile.jpeg"].open("w").close()
p1["$Recycle.Bin"]["subdar"]["unwanted_subdarfile.png"].open("w").close()
self.d.add_path(p1["$Recycle.Bin"])
# Filter should set the default state to Excluded
eq_(self.d.get_state(p1["$Recycle.Bin"]), DirectoryState.EXCLUDED)
# The subdir should inherit its parent state
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdir"]), DirectoryState.EXCLUDED)
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdar"]), DirectoryState.EXCLUDED)
# Override a child path's state
self.d.set_state(p1["$Recycle.Bin"]["subdir"], DirectoryState.NORMAL)
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdir"]), DirectoryState.NORMAL)
# Parent should keep its default state, and the other child too
eq_(self.d.get_state(p1["$Recycle.Bin"]), DirectoryState.EXCLUDED)
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdar"]), DirectoryState.EXCLUDED)
# print(f"get_folders(): {[x for x in self.d.get_folders()]}")
# only the 2 files directly under the Normal directory
files = self.get_files_and_expect_num_result(2)
assert "somefile.png" not in files
assert "some_unwanted_file.jpg" not in files
assert "somesubdarfile.jpeg" not in files
assert "unwanted_subdarfile.png" not in files
assert "somesubdirfile.png" in files
assert "unwanted_subdirfile.gif" in files
# Overriding the parent should enable all children
self.d.set_state(p1["$Recycle.Bin"], DirectoryState.NORMAL)
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdar"]), DirectoryState.NORMAL)
# all files there
files = self.get_files_and_expect_num_result(6)
assert "somefile.png" in files
assert "some_unwanted_file.jpg" in files
# This should still filter out files under the directory, despite the Normal state
regex2 = r".*unwanted.*"
self.d._exclude_list.add(regex2)
self.d._exclude_list.mark(regex2)
files = self.get_files_and_expect_num_result(3)
assert "somefile.png" in files
assert "some_unwanted_file.jpg" not in files
assert "unwanted_subdirfile.gif" not in files
assert "unwanted_subdarfile.png" not in files
if ISWINDOWS:
regex3 = r".*Recycle\.Bin\\.*unwanted.*subdirfile.*"
else:
regex3 = r".*Recycle\.Bin\/.*unwanted.*subdirfile.*"
self.d._exclude_list.rename(regex2, regex3)
assert self.d._exclude_list.error(regex3) is None
# print(f"get_folders(): {[x for x in self.d.get_folders()]}")
# Directory shouldn't change its state here, unless explicitly done by the user
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdir"]), DirectoryState.NORMAL)
files = self.get_files_and_expect_num_result(5)
assert "unwanted_subdirfile.gif" not in files
assert "unwanted_subdarfile.png" in files
# using the end of line character should only filter the directory, or a file ending with subdir
regex4 = r".*subdir$"
self.d._exclude_list.rename(regex3, regex4)
assert self.d._exclude_list.error(regex4) is None
p1["$Recycle.Bin"]["subdar"]["file_ending_with_subdir"].open("w").close()
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdir"]), DirectoryState.EXCLUDED)
files = self.get_files_and_expect_num_result(4)
assert "file_ending_with_subdir" not in files
assert "somesubdarfile.jpeg" in files
assert "somesubdirfile.png" not in files
assert "unwanted_subdirfile.gif" not in files
self.d.set_state(p1["$Recycle.Bin"]["subdir"], DirectoryState.NORMAL)
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdir"]), DirectoryState.NORMAL)
# print(f"get_folders(): {[x for x in self.d.get_folders()]}")
files = self.get_files_and_expect_num_result(6)
assert "file_ending_with_subdir" not in files
assert "somesubdirfile.png" in files
assert "unwanted_subdirfile.gif" in files
regex5 = r".*subdir.*"
self.d._exclude_list.rename(regex4, regex5)
# Files containing substring should be filtered
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdir"]), DirectoryState.NORMAL)
# Only the filename should match, not the path; the "subdir" in the directory name shouldn't matter
p1["$Recycle.Bin"]["subdir"]["file_which_shouldnt_match"].open("w").close()
files = self.get_files_and_expect_num_result(5)
assert "somesubdirfile.png" not in files
assert "unwanted_subdirfile.gif" not in files
assert "file_ending_with_subdir" not in files
assert "file_which_shouldnt_match" in files
# This should match the directory only
regex6 = r".*/.*subdir.*/.*"
if ISWINDOWS:
regex6 = r".*\\.*subdir.*\\.*"
assert os.sep in regex6
self.d._exclude_list.rename(regex5, regex6)
self.d._exclude_list.remove(regex1)
eq_(len(self.d._exclude_list.compiled), 1)
assert regex1 not in self.d._exclude_list
assert regex5 not in self.d._exclude_list
assert self.d._exclude_list.error(regex6) is None
assert regex6 in self.d._exclude_list
# This still should not be affected
eq_(self.d.get_state(p1["$Recycle.Bin"]["subdir"]), DirectoryState.NORMAL)
files = self.get_files_and_expect_num_result(5)
# These files are under the "/subdir" directory
assert "somesubdirfile.png" not in files
assert "unwanted_subdirfile.gif" not in files
# This file under "subdar" directory should not be filtered out
assert "file_ending_with_subdir" in files
# This file is in a directory that should be filtered out
assert "file_which_shouldnt_match" not in files
def test_japanese_unicode(self, tmpdir):
p1 = Path(str(tmpdir))
p1["$Recycle.Bin"].mkdir()
p1["$Recycle.Bin"]["somerecycledfile.png"].open("w").close()
p1["$Recycle.Bin"]["some_unwanted_file.jpg"].open("w").close()
p1["$Recycle.Bin"]["subdir"].mkdir()
p1["$Recycle.Bin"]["subdir"]["過去白濁物語~]_カラー.jpg"].open("w").close()
p1["$Recycle.Bin"]["思叫物語"].mkdir()
p1["$Recycle.Bin"]["思叫物語"]["なししろ会う前"].open("w").close()
p1["$Recycle.Bin"]["思叫物語"]["堂~ロ"].open("w").close()
self.d.add_path(p1["$Recycle.Bin"])
regex3 = r".*物語.*"
self.d._exclude_list.add(regex3)
self.d._exclude_list.mark(regex3)
# print(f"get_folders(): {[x for x in self.d.get_folders()]}")
eq_(self.d.get_state(p1["$Recycle.Bin"]["思叫物語"]), DirectoryState.EXCLUDED)
files = self.get_files_and_expect_num_result(2)
assert "過去白濁物語~]_カラー.jpg" not in files
assert "なししろ会う前" not in files
assert "堂~ロ" not in files
# using the end of line character should only filter that directory, without affecting its files
regex4 = r".*物語$"
self.d._exclude_list.rename(regex3, regex4)
assert self.d._exclude_list.error(regex4) is None
self.d.set_state(p1["$Recycle.Bin"]["思叫物語"], DirectoryState.NORMAL)
files = self.get_files_and_expect_num_result(5)
assert "過去白濁物語~]_カラー.jpg" in files
assert "なししろ会う前" in files
assert "堂~ロ" in files
def test_get_state_returns_excluded_for_hidden_directories_and_files(self, tmpdir):
# This regex only works for files, not paths
regex = r"^\..*$"
self.d._exclude_list.add(regex)
self.d._exclude_list.mark(regex)
p1 = Path(str(tmpdir))
p1["foobar"].mkdir()
p1["foobar"][".hidden_file.txt"].open("w").close()
p1["foobar"][".hidden_dir"].mkdir()
p1["foobar"][".hidden_dir"]["foobar.jpg"].open("w").close()
p1["foobar"][".hidden_dir"][".hidden_subfile.png"].open("w").close()
self.d.add_path(p1["foobar"])
# It should not inherit its parent's state originally
eq_(self.d.get_state(p1["foobar"][".hidden_dir"]), DirectoryState.EXCLUDED)
self.d.set_state(p1["foobar"][".hidden_dir"], DirectoryState.NORMAL)
# The files should still be filtered
files = self.get_files_and_expect_num_result(1)
eq_(len(self.d._exclude_list.compiled_paths), 0)
eq_(len(self.d._exclude_list.compiled_files), 1)
assert ".hidden_file.txt" not in files
assert ".hidden_subfile.png" not in files
assert "foobar.jpg" in files
class TestExcludeDict(TestExcludeList):
def setup_method(self, method):
self.d = Directories(exclude_list=ExcludeDict(union_regex=False))
class TestExcludeListunion(TestExcludeList):
def setup_method(self, method):
self.d = Directories(exclude_list=ExcludeList(union_regex=True))
class TestExcludeDictunion(TestExcludeList):
def setup_method(self, method):
self.d = Directories(exclude_list=ExcludeDict(union_regex=True))
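Taken together, the directory tests above boil down to a small usage pattern. The sketch below is assembled only from calls the tests themselves make (add_path, set_state, get_files, plus an ExcludeList); the directory names are made up, and the core/hscommon import paths assume the package layout used by core/tests.

import tempfile

from hscommon.path import Path
from core.directories import Directories, DirectoryState
from core.exclude import ExcludeList

root = Path(tempfile.mkdtemp())
root["kept"].mkdir()
root["kept"]["a.txt"].open("w").close()
root["skipped"].mkdir()
root["skipped"]["b.txt"].open("w").close()

d = Directories(exclude_list=ExcludeList(union_regex=True))
d.add_path(root)
d.set_state(root["skipped"], DirectoryState.EXCLUDED)
# Files under an EXCLUDED directory are skipped; files under a REFERENCE
# directory would come back with is_ref set.
print([f.name for f in d.get_files()])  # expected: ['a.txt']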

@@ -13,13 +13,28 @@ from hscommon.testutil import eq_, log_calls
from .base import NamedObject
from .. import engine
from ..engine import (
get_match, getwords, Group, getfields, unpack_fields, compare_fields, compare, WEIGHT_WORDS,
MATCH_SIMILAR_WORDS, NO_FIELD_ORDER, build_word_dict, get_groups, getmatches, Match,
getmatches_by_contents, merge_similar_words, reduce_common_words
get_match,
getwords,
Group,
getfields,
unpack_fields,
compare_fields,
compare,
WEIGHT_WORDS,
MATCH_SIMILAR_WORDS,
NO_FIELD_ORDER,
build_word_dict,
get_groups,
getmatches,
Match,
getmatches_by_contents,
merge_similar_words,
reduce_common_words,
)
no = NamedObject
def get_match_triangle():
o1 = NamedObject(with_words=True)
o2 = NamedObject(with_words=True)
@@ -29,6 +44,7 @@ def get_match_triangle():
m3 = get_match(o2, o3)
return [m1, m2, m3]
def get_test_group():
m1, m2, m3 = get_match_triangle()
result = Group()
@@ -37,6 +53,7 @@ def get_test_group():
result.add_match(m3)
return result
def assert_match(m, name1, name2):
# When testing matches, whether objects are in first or second position very often doesn't
# matter. This function makes this test more convenient.
@@ -46,53 +63,57 @@ def assert_match(m, name1, name2):
eq_(m.first.name, name2)
eq_(m.second.name, name1)
class TestCasegetwords:
def test_spaces(self):
eq_(['a', 'b', 'c', 'd'], getwords("a b c d"))
eq_(['a', 'b', 'c', 'd'], getwords(" a b c d "))
eq_(["a", "b", "c", "d"], getwords("a b c d"))
eq_(["a", "b", "c", "d"], getwords(" a b c d "))
def test_unicode(self):
eq_(["e", "c", "0", "a", "o", "u", "e", "u"], getwords("é ç 0 à ö û è ¤ ù"))
eq_(["02", "君のこころは輝いてるかい?", "国木田花丸", "solo", "ver"], getwords("02 君のこころは輝いてるかい? 国木田花丸 Solo Ver"))
def test_splitter_chars(self):
eq_(
[chr(i) for i in range(ord('a'), ord('z')+1)],
getwords("a-b_c&d+e(f)g;h\\i[j]k{l}m:n.o,p<q>r/s?t~u!v@w#x$y*z")
[chr(i) for i in range(ord("a"), ord("z") + 1)],
getwords("a-b_c&d+e(f)g;h\\i[j]k{l}m:n.o,p<q>r/s?t~u!v@w#x$y*z"),
)
def test_joiner_chars(self):
eq_(["aec"], getwords("a'e\u0301c"))
def test_empty(self):
eq_([], getwords(''))
eq_([], getwords(""))
def test_returns_lowercase(self):
eq_(['foo', 'bar'], getwords('FOO BAR'))
eq_(["foo", "bar"], getwords("FOO BAR"))
def test_decompose_unicode(self):
eq_(getwords('foo\xe9bar'), ['fooebar'])
eq_(["fooebar"], getwords("foo\xe9bar"))
class TestCasegetfields:
def test_simple(self):
eq_([['a', 'b'], ['c', 'd', 'e']], getfields('a b - c d e'))
eq_([["a", "b"], ["c", "d", "e"]], getfields("a b - c d e"))
def test_empty(self):
eq_([], getfields(''))
eq_([], getfields(""))
def test_cleans_empty_fields(self):
expected = [['a', 'bc', 'def']]
actual = getfields(' - a bc def')
expected = [["a", "bc", "def"]]
actual = getfields(" - a bc def")
eq_(expected, actual)
expected = [['bc', 'def']]
class TestCaseunpack_fields:
class TestCaseUnpackFields:
def test_with_fields(self):
expected = ['a', 'b', 'c', 'd', 'e', 'f']
actual = unpack_fields([['a'], ['b', 'c'], ['d', 'e', 'f']])
expected = ["a", "b", "c", "d", "e", "f"]
actual = unpack_fields([["a"], ["b", "c"], ["d", "e", "f"]])
eq_(expected, actual)
def test_without_fields(self):
expected = ['a', 'b', 'c', 'd', 'e', 'f']
actual = unpack_fields(['a', 'b', 'c', 'd', 'e', 'f'])
expected = ["a", "b", "c", "d", "e", "f"]
actual = unpack_fields(["a", "b", "c", "d", "e", "f"])
eq_(expected, actual)
def test_empty(self):
@@ -101,127 +122,140 @@ class TestCaseunpack_fields:
class TestCaseWordCompare:
def test_list(self):
eq_(100, compare(['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']))
eq_(86, compare(['a', 'b', 'c', 'd'], ['a', 'b', 'c']))
eq_(100, compare(["a", "b", "c", "d"], ["a", "b", "c", "d"]))
eq_(86, compare(["a", "b", "c", "d"], ["a", "b", "c"]))
def test_unordered(self):
#Sometimes, users don't want fuzzy matching too much. When they set the slider
#to 100, they don't expect a filename with the same words, but not the same order, to match.
#Thus, we want to return 99 in that case.
eq_(99, compare(['a', 'b', 'c', 'd'], ['d', 'b', 'c', 'a']))
# Sometimes, users don't want fuzzy matching too much. When they set the slider
# to 100, they don't expect a filename with the same words, but not the same order, to match.
# Thus, we want to return 99 in that case.
eq_(99, compare(["a", "b", "c", "d"], ["d", "b", "c", "a"]))
def test_word_occurs_twice(self):
#if a word occurs twice in first, but once in second, we want the word to be only counted once
eq_(89, compare(['a', 'b', 'c', 'd', 'a'], ['d', 'b', 'c', 'a']))
# if a word occurs twice in first, but once in second, we want the word to be only counted once
eq_(89, compare(["a", "b", "c", "d", "a"], ["d", "b", "c", "a"]))
def test_uses_copy_of_lists(self):
first = ['foo', 'bar']
second = ['bar', 'bleh']
first = ["foo", "bar"]
second = ["bar", "bleh"]
compare(first, second)
eq_(['foo', 'bar'], first)
eq_(['bar', 'bleh'], second)
eq_(["foo", "bar"], first)
eq_(["bar", "bleh"], second)
def test_word_weight(self):
eq_(int((6.0 / 13.0) * 100), compare(['foo', 'bar'], ['bar', 'bleh'], (WEIGHT_WORDS, )))
eq_(
int((6.0 / 13.0) * 100),
compare(["foo", "bar"], ["bar", "bleh"], (WEIGHT_WORDS,)),
)
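# (A worked reading of that expected value, as these tests suggest: with
#  WEIGHT_WORDS each word counts for its length, "bar" matches on both sides
#  for 3 + 3 = 6 letters out of 6 + 7 = 13 letters total, so int(6 / 13 * 100).)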
def test_similar_words(self):
eq_(100, compare(['the', 'white', 'stripes'], ['the', 'whites', 'stripe'], (MATCH_SIMILAR_WORDS, )))
eq_(
100,
compare(
["the", "white", "stripes"],
["the", "whites", "stripe"],
(MATCH_SIMILAR_WORDS,),
),
)
def test_empty(self):
eq_(0, compare([], []))
def test_with_fields(self):
eq_(67, compare([['a', 'b'], ['c', 'd', 'e']], [['a', 'b'], ['c', 'd', 'f']]))
eq_(67, compare([["a", "b"], ["c", "d", "e"]], [["a", "b"], ["c", "d", "f"]]))
def test_propagate_flags_with_fields(self, monkeypatch):
def mock_compare(first, second, flags):
eq_((0, 1, 2, 3, 5), flags)
monkeypatch.setattr(engine, 'compare_fields', mock_compare)
compare([['a']], [['a']], (0, 1, 2, 3, 5))
monkeypatch.setattr(engine, "compare_fields", mock_compare)
compare([["a"]], [["a"]], (0, 1, 2, 3, 5))
class TestCaseWordCompareWithFields:
def test_simple(self):
eq_(67, compare_fields([['a', 'b'], ['c', 'd', 'e']], [['a', 'b'], ['c', 'd', 'f']]))
eq_(
67,
compare_fields([["a", "b"], ["c", "d", "e"]], [["a", "b"], ["c", "d", "f"]]),
)
def test_empty(self):
eq_(0, compare_fields([], []))
def test_different_length(self):
eq_(0, compare_fields([['a'], ['b']], [['a'], ['b'], ['c']]))
eq_(0, compare_fields([["a"], ["b"]], [["a"], ["b"], ["c"]]))
def test_propagates_flags(self, monkeypatch):
def mock_compare(first, second, flags):
eq_((0, 1, 2, 3, 5), flags)
monkeypatch.setattr(engine, 'compare_fields', mock_compare)
compare_fields([['a']], [['a']], (0, 1, 2, 3, 5))
monkeypatch.setattr(engine, "compare_fields", mock_compare)
compare_fields([["a"]], [["a"]], (0, 1, 2, 3, 5))
def test_order(self):
first = [['a', 'b'], ['c', 'd', 'e']]
second = [['c', 'd', 'f'], ['a', 'b']]
first = [["a", "b"], ["c", "d", "e"]]
second = [["c", "d", "f"], ["a", "b"]]
eq_(0, compare_fields(first, second))
def test_no_order(self):
first = [['a', 'b'], ['c', 'd', 'e']]
second = [['c', 'd', 'f'], ['a', 'b']]
eq_(67, compare_fields(first, second, (NO_FIELD_ORDER, )))
first = [['a', 'b'], ['a', 'b']] #a field can only be matched once.
second = [['c', 'd', 'f'], ['a', 'b']]
eq_(0, compare_fields(first, second, (NO_FIELD_ORDER, )))
first = [['a', 'b'], ['a', 'b', 'c']]
second = [['c', 'd', 'f'], ['a', 'b']]
eq_(33, compare_fields(first, second, (NO_FIELD_ORDER, )))
first = [["a", "b"], ["c", "d", "e"]]
second = [["c", "d", "f"], ["a", "b"]]
eq_(67, compare_fields(first, second, (NO_FIELD_ORDER,)))
first = [["a", "b"], ["a", "b"]] # a field can only be matched once.
second = [["c", "d", "f"], ["a", "b"]]
eq_(0, compare_fields(first, second, (NO_FIELD_ORDER,)))
first = [["a", "b"], ["a", "b", "c"]]
second = [["c", "d", "f"], ["a", "b"]]
eq_(33, compare_fields(first, second, (NO_FIELD_ORDER,)))
def test_compare_fields_without_order_doesnt_alter_fields(self):
#The NO_ORDER comp type altered the fields!
first = [['a', 'b'], ['c', 'd', 'e']]
second = [['c', 'd', 'f'], ['a', 'b']]
eq_(67, compare_fields(first, second, (NO_FIELD_ORDER, )))
eq_([['a', 'b'], ['c', 'd', 'e']], first)
eq_([['c', 'd', 'f'], ['a', 'b']], second)
# The NO_ORDER comp type altered the fields!
first = [["a", "b"], ["c", "d", "e"]]
second = [["c", "d", "f"], ["a", "b"]]
eq_(67, compare_fields(first, second, (NO_FIELD_ORDER,)))
eq_([["a", "b"], ["c", "d", "e"]], first)
eq_([["c", "d", "f"], ["a", "b"]], second)
class TestCasebuild_word_dict:
class TestCaseBuildWordDict:
def test_with_standard_words(self):
l = [NamedObject('foo bar', True)]
l.append(NamedObject('bar baz', True))
l.append(NamedObject('baz bleh foo', True))
d = build_word_dict(l)
item_list = [NamedObject("foo bar", True)]
item_list.append(NamedObject("bar baz", True))
item_list.append(NamedObject("baz bleh foo", True))
d = build_word_dict(item_list)
eq_(4, len(d))
eq_(2, len(d['foo']))
assert l[0] in d['foo']
assert l[2] in d['foo']
eq_(2, len(d['bar']))
assert l[0] in d['bar']
assert l[1] in d['bar']
eq_(2, len(d['baz']))
assert l[1] in d['baz']
assert l[2] in d['baz']
eq_(1, len(d['bleh']))
assert l[2] in d['bleh']
eq_(2, len(d["foo"]))
assert item_list[0] in d["foo"]
assert item_list[2] in d["foo"]
eq_(2, len(d["bar"]))
assert item_list[0] in d["bar"]
assert item_list[1] in d["bar"]
eq_(2, len(d["baz"]))
assert item_list[1] in d["baz"]
assert item_list[2] in d["baz"]
eq_(1, len(d["bleh"]))
assert item_list[2] in d["bleh"]
def test_unpack_fields(self):
o = NamedObject('')
o.words = [['foo', 'bar'], ['baz']]
o = NamedObject("")
o.words = [["foo", "bar"], ["baz"]]
d = build_word_dict([o])
eq_(3, len(d))
eq_(1, len(d['foo']))
eq_(1, len(d["foo"]))
def test_words_are_unaltered(self):
o = NamedObject('')
o.words = [['foo', 'bar'], ['baz']]
o = NamedObject("")
o.words = [["foo", "bar"], ["baz"]]
build_word_dict([o])
eq_([['foo', 'bar'], ['baz']], o.words)
eq_([["foo", "bar"], ["baz"]], o.words)
def test_object_instances_can_only_be_once_in_words_object_list(self):
o = NamedObject('foo foo', True)
o = NamedObject("foo foo", True)
d = build_word_dict([o])
eq_(1, len(d['foo']))
eq_(1, len(d["foo"]))
def test_job(self):
def do_progress(p, d=''):
def do_progress(p, d=""):
self.log.append(p)
return True
@@ -234,54 +268,53 @@ class TestCasebuild_word_dict:
eq_(100, self.log[1])
class TestCasemerge_similar_words:
class TestCaseMergeSimilarWords:
def test_some_similar_words(self):
d = {
'foobar': set([1]),
'foobar1': set([2]),
'foobar2': set([3]),
"foobar": set([1]),
"foobar1": set([2]),
"foobar2": set([3]),
}
merge_similar_words(d)
eq_(1, len(d))
eq_(3, len(d['foobar']))
eq_(3, len(d["foobar"]))
class TestCasereduce_common_words:
class TestCaseReduceCommonWords:
def test_typical(self):
d = {
'foo': set([NamedObject('foo bar', True) for i in range(50)]),
'bar': set([NamedObject('foo bar', True) for i in range(49)])
"foo": set([NamedObject("foo bar", True) for _ in range(50)]),
"bar": set([NamedObject("foo bar", True) for _ in range(49)]),
}
reduce_common_words(d, 50)
assert 'foo' not in d
eq_(49, len(d['bar']))
assert "foo" not in d
eq_(49, len(d["bar"]))
def test_dont_remove_objects_with_only_common_words(self):
d = {
'common': set([NamedObject("common uncommon", True) for i in range(50)] + [NamedObject("common", True)]),
'uncommon': set([NamedObject("common uncommon", True)])
"common": set([NamedObject("common uncommon", True) for _ in range(50)] + [NamedObject("common", True)]),
"uncommon": set([NamedObject("common uncommon", True)]),
}
reduce_common_words(d, 50)
eq_(1, len(d['common']))
eq_(1, len(d['uncommon']))
eq_(1, len(d["common"]))
eq_(1, len(d["uncommon"]))
def test_values_still_are_set_instances(self):
d = {
'common': set([NamedObject("common uncommon", True) for i in range(50)] + [NamedObject("common", True)]),
'uncommon': set([NamedObject("common uncommon", True)])
"common": set([NamedObject("common uncommon", True) for _ in range(50)] + [NamedObject("common", True)]),
"uncommon": set([NamedObject("common uncommon", True)]),
}
reduce_common_words(d, 50)
assert isinstance(d['common'], set)
assert isinstance(d['uncommon'], set)
assert isinstance(d["common"], set)
assert isinstance(d["uncommon"], set)
def test_dont_raise_KeyError_when_a_word_has_been_removed(self):
#If a word has been removed by the reduce, an object in a subsequent common word that
#contains the word that has been removed would cause a KeyError.
def test_dont_raise_keyerror_when_a_word_has_been_removed(self):
# If a word has been removed by the reduce, an object in a subsequent common word that
# contains the word that has been removed would cause a KeyError.
d = {
'foo': set([NamedObject('foo bar baz', True) for i in range(50)]),
'bar': set([NamedObject('foo bar baz', True) for i in range(50)]),
'baz': set([NamedObject('foo bar baz', True) for i in range(49)])
"foo": set([NamedObject("foo bar baz", True) for _ in range(50)]),
"bar": set([NamedObject("foo bar baz", True) for _ in range(50)]),
"baz": set([NamedObject("foo bar baz", True) for _ in range(49)]),
}
try:
reduce_common_words(d, 50)
@@ -289,45 +322,43 @@ class TestCasereduce_common_words:
self.fail()
def test_unpack_fields(self):
#object.words may be fields.
# object.words may be fields.
def create_it():
o = NamedObject('')
o.words = [['foo', 'bar'], ['baz']]
o = NamedObject("")
o.words = [["foo", "bar"], ["baz"]]
return o
d = {
'foo': set([create_it() for i in range(50)])
}
d = {"foo": set([create_it() for _ in range(50)])}
try:
reduce_common_words(d, 50)
except TypeError:
self.fail("must support fields.")
def test_consider_a_reduced_common_word_common_even_after_reduction(self):
#There was a bug in the code that caused a word that has already been reduced not to
#be counted as a common word for subsequent words. For example, if 'foo' is processed
#as a common word, keeping a "foo bar" file in it, and then 'bar' is processed, "foo bar"
#would not stay in 'bar' because 'foo' is not a common word anymore.
only_common = NamedObject('foo bar', True)
# There was a bug in the code that caused a word that has already been reduced not to
# be counted as a common word for subsequent words. For example, if 'foo' is processed
# as a common word, keeping a "foo bar" file in it, and then 'bar' is processed, "foo bar"
# would not stay in 'bar' because 'foo' is not a common word anymore.
only_common = NamedObject("foo bar", True)
d = {
'foo': set([NamedObject('foo bar baz', True) for i in range(49)] + [only_common]),
'bar': set([NamedObject('foo bar baz', True) for i in range(49)] + [only_common]),
'baz': set([NamedObject('foo bar baz', True) for i in range(49)])
"foo": set([NamedObject("foo bar baz", True) for _ in range(49)] + [only_common]),
"bar": set([NamedObject("foo bar baz", True) for _ in range(49)] + [only_common]),
"baz": set([NamedObject("foo bar baz", True) for _ in range(49)]),
}
reduce_common_words(d, 50)
eq_(1, len(d['foo']))
eq_(1, len(d['bar']))
eq_(49, len(d['baz']))
eq_(1, len(d["foo"]))
eq_(1, len(d["bar"]))
eq_(49, len(d["baz"]))
class TestCaseget_match:
class TestCaseGetMatch:
def test_simple(self):
o1 = NamedObject("foo bar", True)
o2 = NamedObject("bar bleh", True)
m = get_match(o1, o2)
eq_(50, m.percentage)
eq_(['foo', 'bar'], m.first.words)
eq_(['bar', 'bleh'], m.second.words)
eq_(["foo", "bar"], m.first.words)
eq_(["bar", "bleh"], m.second.words)
assert m.first is o1
assert m.second is o2
@@ -340,7 +371,7 @@ class TestCaseget_match:
assert object() not in m
def test_word_weight(self):
m = get_match(NamedObject("foo bar", True), NamedObject("bar bleh", True), (WEIGHT_WORDS, ))
m = get_match(NamedObject("foo bar", True), NamedObject("bar bleh", True), (WEIGHT_WORDS,))
eq_(m.percentage, int((6.0 / 13.0) * 100))
@@ -349,54 +380,63 @@ class TestCaseGetMatches:
eq_(getmatches([]), [])
def test_simple(self):
l = [NamedObject("foo bar"), NamedObject("bar bleh"), NamedObject("a b c foo")]
r = getmatches(l)
item_list = [
NamedObject("foo bar"),
NamedObject("bar bleh"),
NamedObject("a b c foo"),
]
r = getmatches(item_list)
eq_(2, len(r))
m = first(m for m in r if m.percentage == 50) #"foo bar" and "bar bleh"
assert_match(m, 'foo bar', 'bar bleh')
m = first(m for m in r if m.percentage == 33) #"foo bar" and "a b c foo"
assert_match(m, 'foo bar', 'a b c foo')
m = first(m for m in r if m.percentage == 50) # "foo bar" and "bar bleh"
assert_match(m, "foo bar", "bar bleh")
m = first(m for m in r if m.percentage == 33) # "foo bar" and "a b c foo"
assert_match(m, "foo bar", "a b c foo")
def test_null_and_unrelated_objects(self):
l = [NamedObject("foo bar"), NamedObject("bar bleh"), NamedObject(""), NamedObject("unrelated object")]
r = getmatches(l)
item_list = [
NamedObject("foo bar"),
NamedObject("bar bleh"),
NamedObject(""),
NamedObject("unrelated object"),
]
r = getmatches(item_list)
eq_(len(r), 1)
m = r[0]
eq_(m.percentage, 50)
assert_match(m, 'foo bar', 'bar bleh')
assert_match(m, "foo bar", "bar bleh")
def test_twice_the_same_word(self):
l = [NamedObject("foo foo bar"), NamedObject("bar bleh")]
r = getmatches(l)
item_list = [NamedObject("foo foo bar"), NamedObject("bar bleh")]
r = getmatches(item_list)
eq_(1, len(r))
def test_twice_the_same_word_when_preworded(self):
l = [NamedObject("foo foo bar", True), NamedObject("bar bleh", True)]
r = getmatches(l)
item_list = [NamedObject("foo foo bar", True), NamedObject("bar bleh", True)]
r = getmatches(item_list)
eq_(1, len(r))
def test_two_words_match(self):
l = [NamedObject("foo bar"), NamedObject("foo bar bleh")]
r = getmatches(l)
item_list = [NamedObject("foo bar"), NamedObject("foo bar bleh")]
r = getmatches(item_list)
eq_(1, len(r))
def test_match_files_with_only_common_words(self):
#If a word occurs more than 50 times, it is excluded from the matching process
#The problem with the common_word_threshold is that the files containing only common
#words will never be matched together. We *should* match them.
# This test assumes that the common word threashold const is 50
l = [NamedObject("foo") for i in range(50)]
r = getmatches(l)
# If a word occurs more than 50 times, it is excluded from the matching process
# The problem with the common_word_threshold is that the files containing only common
# words will never be matched together. We *should* match them.
# This test assumes that the common word threshold const is 50
item_list = [NamedObject("foo") for _ in range(50)]
r = getmatches(item_list)
eq_(1225, len(r))
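# (1225 is simply 50 * 49 / 2: every possible pair among the 50 identical files.)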
def test_use_words_already_there_if_there(self):
o1 = NamedObject('foo')
o2 = NamedObject('bar')
o2.words = ['foo']
o1 = NamedObject("foo")
o2 = NamedObject("bar")
o2.words = ["foo"]
eq_(1, len(getmatches([o1, o2])))
def test_job(self):
def do_progress(p, d=''):
def do_progress(p, d=""):
self.log.append(p)
return True
@@ -409,28 +449,28 @@ class TestCaseGetMatches:
eq_(100, self.log[-1])
def test_weight_words(self):
l = [NamedObject("foo bar"), NamedObject("bar bleh")]
m = getmatches(l, weight_words=True)[0]
item_list = [NamedObject("foo bar"), NamedObject("bar bleh")]
m = getmatches(item_list, weight_words=True)[0]
eq_(int((6.0 / 13.0) * 100), m.percentage)
def test_similar_word(self):
l = [NamedObject("foobar"), NamedObject("foobars")]
eq_(len(getmatches(l, match_similar_words=True)), 1)
eq_(getmatches(l, match_similar_words=True)[0].percentage, 100)
l = [NamedObject("foobar"), NamedObject("foo")]
eq_(len(getmatches(l, match_similar_words=True)), 0) #too far
l = [NamedObject("bizkit"), NamedObject("bizket")]
eq_(len(getmatches(l, match_similar_words=True)), 1)
l = [NamedObject("foobar"), NamedObject("foosbar")]
eq_(len(getmatches(l, match_similar_words=True)), 1)
item_list = [NamedObject("foobar"), NamedObject("foobars")]
eq_(len(getmatches(item_list, match_similar_words=True)), 1)
eq_(getmatches(item_list, match_similar_words=True)[0].percentage, 100)
item_list = [NamedObject("foobar"), NamedObject("foo")]
eq_(len(getmatches(item_list, match_similar_words=True)), 0) # too far
item_list = [NamedObject("bizkit"), NamedObject("bizket")]
eq_(len(getmatches(item_list, match_similar_words=True)), 1)
item_list = [NamedObject("foobar"), NamedObject("foosbar")]
eq_(len(getmatches(item_list, match_similar_words=True)), 1)
def test_single_object_with_similar_words(self):
l = [NamedObject("foo foos")]
eq_(len(getmatches(l, match_similar_words=True)), 0)
item_list = [NamedObject("foo foos")]
eq_(len(getmatches(item_list, match_similar_words=True)), 0)
def test_double_words_get_counted_only_once(self):
l = [NamedObject("foo bar foo bleh"), NamedObject("foo bar bleh bar")]
m = getmatches(l)[0]
item_list = [NamedObject("foo bar foo bleh"), NamedObject("foo bar bleh bar")]
m = getmatches(item_list)[0]
eq_(75, m.percentage)
def test_with_fields(self):
@@ -450,13 +490,13 @@ class TestCaseGetMatches:
eq_(m.percentage, 50)
def test_only_match_similar_when_the_option_is_set(self):
l = [NamedObject("foobar"), NamedObject("foobars")]
eq_(len(getmatches(l, match_similar_words=False)), 0)
item_list = [NamedObject("foobar"), NamedObject("foobars")]
eq_(len(getmatches(item_list, match_similar_words=False)), 0)
def test_dont_recurse_do_match(self):
# with nosetests, the stack is increased. The number has to be high enough not to fail falsely
sys.setrecursionlimit(200)
files = [NamedObject('foo bar') for i in range(201)]
files = [NamedObject("foo bar") for _ in range(201)]
try:
getmatches(files)
except RuntimeError:
@@ -465,34 +505,60 @@ class TestCaseGetMatches:
sys.setrecursionlimit(1000)
def test_min_match_percentage(self):
l = [NamedObject("foo bar"), NamedObject("bar bleh"), NamedObject("a b c foo")]
r = getmatches(l, min_match_percentage=50)
eq_(1, len(r)) #Only "foo bar" / "bar bleh" should match
item_list = [
NamedObject("foo bar"),
NamedObject("bar bleh"),
NamedObject("a b c foo"),
]
r = getmatches(item_list, min_match_percentage=50)
eq_(1, len(r)) # Only "foo bar" / "bar bleh" should match
def test_MemoryError(self, monkeypatch):
def test_memory_error(self, monkeypatch):
@log_calls
def mocked_match(first, second, flags):
if len(mocked_match.calls) > 42:
raise MemoryError()
return Match(first, second, 0)
objects = [NamedObject() for i in range(10)] # results in 45 matches
monkeypatch.setattr(engine, 'get_match', mocked_match)
objects = [NamedObject() for _ in range(10)] # results in 45 matches
monkeypatch.setattr(engine, "get_match", mocked_match)
try:
r = getmatches(objects)
except MemoryError:
self.fail('MemorryError must be handled')
self.fail("MemoryError must be handled")
eq_(42, len(r))
class TestCaseGetMatchesByContents:
def test_dont_compare_empty_files(self):
o1, o2 = no(size=0), no(size=0)
assert not getmatches_by_contents([o1, o2])
def test_big_file_partial_hashes(self):
smallsize = 1
bigsize = 100 * 1024 * 1024 # 100MB
f = [
no("bigfoo", size=bigsize),
no("bigbar", size=bigsize),
no("smallfoo", size=smallsize),
no("smallbar", size=smallsize),
]
f[0].md5 = f[0].md5partial = f[0].md5samples = "foobar"
f[1].md5 = f[1].md5partial = f[1].md5samples = "foobar"
f[2].md5 = f[2].md5partial = "bleh"
f[3].md5 = f[3].md5partial = "bleh"
r = getmatches_by_contents(f, bigsize=bigsize)
eq_(len(r), 2)
# User disabled optimization for big files, compute hashes as usual
r = getmatches_by_contents(f, bigsize=0)
eq_(len(r), 2)
# Other file is now slightly different, md5partial is still the same
f[1].md5 = f[1].md5samples = "foobardiff"
r = getmatches_by_contents(f, bigsize=bigsize)
# Successfully filter it out
eq_(len(r), 1)
r = getmatches_by_contents(f, bigsize=0)
eq_(len(r), 1)
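# (Reading of the above, hedged: for files of at least `bigsize` bytes the
#  matcher is expected to rely on md5samples, presumably hashes of a few
#  sampled chunks, instead of a full-file md5; passing bigsize=0 disables
#  that shortcut and falls back to full hashes.)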
class TestCaseGroup:
def test_empy(self):
def test_empty(self):
g = Group()
eq_(None, g.ref)
eq_([], g.dupes)
@@ -599,7 +665,7 @@ class TestCaseGroup:
eq_([o1], g.dupes)
g.switch_ref(o2)
assert o2 is g.ref
g.switch_ref(NamedObject('', True))
g.switch_ref(NamedObject("", True))
assert o2 is g.ref
def test_switch_ref_from_ref_dir(self):
@@ -620,11 +686,11 @@ class TestCaseGroup:
m = g.get_match_of(o)
assert g.ref in m
assert o in m
assert g.get_match_of(NamedObject('', True)) is None
assert g.get_match_of(NamedObject("", True)) is None
assert g.get_match_of(g.ref) is None
def test_percentage(self):
#percentage should return the avg percentage in relation to the ref
# percentage should return the avg percentage in relation to the ref
m1, m2, m3 = get_match_triangle()
m1 = Match(m1[0], m1[1], 100)
m2 = Match(m2[0], m2[1], 50)
@@ -651,9 +717,9 @@ class TestCaseGroup:
o1 = m1.first
o2 = m1.second
o3 = m2.second
o1.name = 'c'
o2.name = 'b'
o3.name = 'a'
o1.name = "c"
o2.name = "b"
o3.name = "a"
g = Group()
g.add_match(m1)
g.add_match(m2)
@@ -666,8 +732,7 @@ class TestCaseGroup:
# if the ref has the same key as one or more of the dupe, run the tie_breaker func among them
g = get_test_group()
o1, o2, o3 = g.ordered
tie_breaker = lambda ref, dupe: dupe is o3
g.prioritize(lambda x: 0, tie_breaker)
g.prioritize(lambda x: 0, lambda ref, dupe: dupe is o3)
assert g.ref is o3
def test_prioritize_with_tie_breaker_runs_on_all_dupes(self):
@@ -678,8 +743,7 @@ class TestCaseGroup:
o1.foo = 1
o2.foo = 2
o3.foo = 3
tie_breaker = lambda ref, dupe: dupe.foo > ref.foo
g.prioritize(lambda x: 0, tie_breaker)
g.prioritize(lambda x: 0, lambda ref, dupe: dupe.foo > ref.foo)
assert g.ref is o3
def test_prioritize_with_tie_breaker_runs_only_on_tie_dupes(self):
@@ -692,9 +756,7 @@ class TestCaseGroup:
o1.bar = 1
o2.bar = 2
o3.bar = 3
key_func = lambda x: -x.foo
tie_breaker = lambda ref, dupe: dupe.bar > ref.bar
g.prioritize(key_func, tie_breaker)
g.prioritize(lambda x: -x.foo, lambda ref, dupe: dupe.bar > ref.bar)
assert g.ref is o2
def test_prioritize_with_ref_dupe(self):
@@ -709,9 +771,9 @@ class TestCaseGroup:
def test_prioritize_nothing_changes(self):
# prioritize() returns False when nothing changes in the group.
g = get_test_group()
g[0].name = 'a'
g[1].name = 'b'
g[2].name = 'c'
g[0].name = "a"
g[1].name = "b"
g[2].name = "c"
assert not g.prioritize(lambda x: x.name)
def test_list_like(self):
@@ -723,7 +785,11 @@ class TestCaseGroup:
def test_discard_matches(self):
g = Group()
o1, o2, o3 = (NamedObject("foo", True), NamedObject("bar", True), NamedObject("baz", True))
o1, o2, o3 = (
NamedObject("foo", True),
NamedObject("bar", True),
NamedObject("baz", True),
)
g.add_match(get_match(o1, o2))
g.add_match(get_match(o1, o3))
g.discard_matches()
@@ -731,14 +797,14 @@ class TestCaseGroup:
eq_(0, len(g.candidates))
class TestCaseget_groups:
class TestCaseGetGroups:
def test_empty(self):
r = get_groups([])
eq_([], r)
def test_simple(self):
l = [NamedObject("foo bar"), NamedObject("bar bleh")]
matches = getmatches(l)
item_list = [NamedObject("foo bar"), NamedObject("bar bleh")]
matches = getmatches(item_list)
m = matches[0]
r = get_groups(matches)
eq_(1, len(r))
@@ -747,28 +813,39 @@ class TestCaseget_groups:
eq_([m.second], g.dupes)
def test_group_with_multiple_matches(self):
#This results in 3 matches
l = [NamedObject("foo"), NamedObject("foo"), NamedObject("foo")]
matches = getmatches(l)
# This results in 3 matches
item_list = [NamedObject("foo"), NamedObject("foo"), NamedObject("foo")]
matches = getmatches(item_list)
r = get_groups(matches)
eq_(1, len(r))
g = r[0]
eq_(3, len(g))
def test_must_choose_a_group(self):
l = [NamedObject("a b"), NamedObject("a b"), NamedObject("b c"), NamedObject("c d"), NamedObject("c d")]
#There will be 2 groups here: group "a b" and group "c d"
#"b c" can go either of them, but not both.
matches = getmatches(l)
item_list = [
NamedObject("a b"),
NamedObject("a b"),
NamedObject("b c"),
NamedObject("c d"),
NamedObject("c d"),
]
# There will be 2 groups here: group "a b" and group "c d"
# "b c" can go either of them, but not both.
matches = getmatches(item_list)
r = get_groups(matches)
eq_(2, len(r))
eq_(5, len(r[0])+len(r[1]))
eq_(5, len(r[0]) + len(r[1]))
def test_should_all_go_in_the_same_group(self):
l = [NamedObject("a b"), NamedObject("a b"), NamedObject("a b"), NamedObject("a b")]
#There will be 2 groups here: group "a b" and group "c d"
#"b c" can fit in both, but it must be in only one of them
matches = getmatches(l)
item_list = [
NamedObject("a b"),
NamedObject("a b"),
NamedObject("a b"),
NamedObject("a b"),
]
# There will be 2 groups here: group "a b" and group "c d"
# "b c" can fit in both, but it must be in only one of them
matches = getmatches(item_list)
r = get_groups(matches)
eq_(1, len(r))
@@ -787,8 +864,8 @@ class TestCaseget_groups:
assert o3 in g
def test_four_sized_group(self):
l = [NamedObject("foobar") for i in range(4)]
m = getmatches(l)
item_list = [NamedObject("foobar") for _ in range(4)]
m = getmatches(item_list)
r = get_groups(m)
eq_(1, len(r))
eq_(4, len(r[0]))
@@ -819,4 +896,3 @@ class TestCaseget_groups:
assert B in g1
assert C in g2
assert D in g2
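For orientation, the pipeline these engine tests exercise can be strung together roughly as follows. Thing is a made-up stand-in for the NamedObject test helper; getmatches and get_groups are the real functions imported at the top of the file, assumed importable as core.engine outside the test package.

from core.engine import getmatches, get_groups


class Thing:
    def __init__(self, name):
        self.name = name
        self.words = name.split()  # pre-worded, like NamedObject(..., True)


files = [Thing("foo bar"), Thing("bar bleh"), Thing("a b c foo")]
matches = getmatches(files)  # pairwise Match(first, second, percentage)
groups = get_groups(matches)  # each object ends up in at most one group
for g in groups:
    print(g.ref.name, [dupe.name for dupe in g.dupes])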

core/tests/exclude_test.py (new file, 435 lines)

@@ -0,0 +1,435 @@
# Copyright 2016 Hardcoded Software (http://www.hardcoded.net)
#
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
import io
from xml.etree import ElementTree as ET
from hscommon.testutil import eq_
from hscommon.plat import ISWINDOWS
from .base import DupeGuru
from ..exclude import ExcludeList, ExcludeDict, default_regexes, AlreadyThereException
from re import error
# Two slightly different implementations here, one around a list of lists,
# and another around a dictionary.
class TestCaseListXMLLoading:
def setup_method(self, method):
self.exclude_list = ExcludeList()
def test_load_non_existant_file(self):
# Loads the pre-defined regexes
self.exclude_list.load_from_xml("non_existant.xml")
eq_(len(default_regexes), len(self.exclude_list))
# they should also be marked by default
eq_(len(default_regexes), self.exclude_list.marked_count)
def test_save_to_xml(self):
f = io.BytesIO()
self.exclude_list.save_to_xml(f)
f.seek(0)
doc = ET.parse(f)
root = doc.getroot()
eq_("exclude_list", root.tag)
def test_save_and_load(self, tmpdir):
e1 = ExcludeList()
e2 = ExcludeList()
eq_(len(e1), 0)
e1.add(r"one")
e1.mark(r"one")
e1.add(r"two")
tmpxml = str(tmpdir.join("exclude_testunit.xml"))
e1.save_to_xml(tmpxml)
e2.load_from_xml(tmpxml)
# We should have the default regexes
assert r"one" in e2
assert r"two" in e2
eq_(len(e2), 2)
eq_(e2.marked_count, 1)
def test_load_xml_with_garbage_and_missing_elements(self):
root = ET.Element("foobar") # The root element shouldn't matter
exclude_node = ET.SubElement(root, "bogus")
exclude_node.set("regex", "None")
exclude_node.set("marked", "y")
exclude_node = ET.SubElement(root, "exclude")
exclude_node.set("regex", "one")
# marked field invalid
exclude_node.set("markedddd", "y")
exclude_node = ET.SubElement(root, "exclude")
exclude_node.set("regex", "two")
# missing marked field
exclude_node = ET.SubElement(root, "exclude")
exclude_node.set("regex", "three")
exclude_node.set("markedddd", "pazjbjepo")
f = io.BytesIO()
tree = ET.ElementTree(root)
tree.write(f, encoding="utf-8")
f.seek(0)
self.exclude_list.load_from_xml(f)
print(f"{[x for x in self.exclude_list]}")
# only the three "exclude" nodes should be added,
eq_(3, len(self.exclude_list))
# None should be marked
eq_(0, self.exclude_list.marked_count)
class TestCaseDictXMLLoading(TestCaseListXMLLoading):
def setup_method(self, method):
self.exclude_list = ExcludeDict()
class TestCaseListEmpty:
def setup_method(self, method):
self.app = DupeGuru()
self.app.exclude_list = ExcludeList(union_regex=False)
self.exclude_list = self.app.exclude_list
def test_add_mark_and_remove_regex(self):
regex1 = r"one"
regex2 = r"two"
self.exclude_list.add(regex1)
assert regex1 in self.exclude_list
self.exclude_list.add(regex2)
self.exclude_list.mark(regex1)
self.exclude_list.mark(regex2)
eq_(len(self.exclude_list), 2)
eq_(len(self.exclude_list.compiled), 2)
compiled_files = [x for x in self.exclude_list.compiled_files]
eq_(len(compiled_files), 2)
self.exclude_list.remove(regex2)
assert regex2 not in self.exclude_list
eq_(len(self.exclude_list), 1)
def test_add_duplicate(self):
self.exclude_list.add(r"one")
eq_(1, len(self.exclude_list))
try:
self.exclude_list.add(r"one")
except Exception:
pass
eq_(1, len(self.exclude_list))
def test_add_not_compilable(self):
# Trying to add a non-valid regex should not work and raise exception
regex = r"one))"
try:
self.exclude_list.add(regex)
except Exception as e:
# Make sure we raise a re.error so that the interface can process it
eq_(type(e), error)
added = self.exclude_list.mark(regex)
eq_(added, False)
eq_(len(self.exclude_list), 0)
eq_(len(self.exclude_list.compiled), 0)
compiled_files = [x for x in self.exclude_list.compiled_files]
eq_(len(compiled_files), 0)
def test_force_add_not_compilable(self):
"""Used when loading from XML for example"""
regex = r"one))"
self.exclude_list.add(regex, forced=True)
marked = self.exclude_list.mark(regex)
eq_(marked, False) # can't be marked since not compilable
eq_(len(self.exclude_list), 1)
eq_(len(self.exclude_list.compiled), 0)
compiled_files = [x for x in self.exclude_list.compiled_files]
eq_(len(compiled_files), 0)
# adding a duplicate
regex = r"one))"
try:
self.exclude_list.add(regex, forced=True)
except Exception as e:
# we should have this exception, and it shouldn't be added
assert type(e) is AlreadyThereException
eq_(len(self.exclude_list), 1)
eq_(len(self.exclude_list.compiled), 0)
def test_rename_regex(self):
regex = r"one"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
regex_renamed = r"one))"
# Not compilable, can't be marked
self.exclude_list.rename(regex, regex_renamed)
assert regex not in self.exclude_list
assert regex_renamed in self.exclude_list
eq_(self.exclude_list.is_marked(regex_renamed), False)
self.exclude_list.mark(regex_renamed)
eq_(self.exclude_list.is_marked(regex_renamed), False)
regex_renamed_compilable = r"two"
self.exclude_list.rename(regex_renamed, regex_renamed_compilable)
assert regex_renamed_compilable in self.exclude_list
eq_(self.exclude_list.is_marked(regex_renamed), False)
self.exclude_list.mark(regex_renamed_compilable)
eq_(self.exclude_list.is_marked(regex_renamed_compilable), True)
eq_(len(self.exclude_list), 1)
# Should still be marked after rename
regex_compilable = r"three"
self.exclude_list.rename(regex_renamed_compilable, regex_compilable)
eq_(self.exclude_list.is_marked(regex_compilable), True)
def test_rename_regex_file_to_path(self):
regex = r".*/one.*"
if ISWINDOWS:
regex = r".*\\one.*"
regex2 = r".*one.*"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
compiled_re = [x.pattern for x in self.exclude_list._excluded_compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex in compiled_re
assert regex not in files_re
assert regex in paths_re
self.exclude_list.rename(regex, regex2)
compiled_re = [x.pattern for x in self.exclude_list._excluded_compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex not in compiled_re
assert regex2 in compiled_re
assert regex2 in files_re
assert regex2 not in paths_re
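# Illustrative sketch (not part of the test suite): the assertions above suggest that a
# pattern containing the platform's path separator is compiled into the "paths" group and
# matched against full paths, while separator-free patterns only match file names.
# targets_full_path is a hypothetical helper used purely for illustration.
def targets_full_path(pattern, sep="/"):
    # On Windows the separator inside a regex is the escaped backslash, r"\\".
    return sep in pattern

assert targets_full_path(r".*/one.*")
assert not targets_full_path(r".*one.*")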
def test_restore_default(self):
"""Only unmark previously added regexes and mark the pre-defined ones"""
regex = r"one"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
self.exclude_list.restore_defaults()
eq_(len(default_regexes), self.exclude_list.marked_count)
# added regex shouldn't be marked
eq_(self.exclude_list.is_marked(regex), False)
# added regex shouldn't be in compiled list either
compiled = [x for x in self.exclude_list.compiled]
assert regex not in compiled
# Only default regexes marked and in compiled list
for re in default_regexes:
assert self.exclude_list.is_marked(re)
found = False
for compiled_re in compiled:
if compiled_re.pattern == re:
found = True
if not found:
raise Exception(f"Default RE {re} not found in compiled list.")
eq_(len(default_regexes), len(self.exclude_list.compiled))
class TestCaseListEmptyUnion(TestCaseListEmpty):
"""Same but with union regex"""
def setup_method(self, method):
self.app = DupeGuru()
self.app.exclude_list = ExcludeList(union_regex=True)
self.exclude_list = self.app.exclude_list
def test_add_mark_and_remove_regex(self):
regex1 = r"one"
regex2 = r"two"
self.exclude_list.add(regex1)
assert regex1 in self.exclude_list
self.exclude_list.add(regex2)
self.exclude_list.mark(regex1)
self.exclude_list.mark(regex2)
eq_(len(self.exclude_list), 2)
eq_(len(self.exclude_list.compiled), 1)
compiled_files = [x for x in self.exclude_list.compiled_files]
eq_(len(compiled_files), 1) # Two patterns joined together into one
assert "|" in compiled_files[0].pattern
self.exclude_list.remove(regex2)
assert regex2 not in self.exclude_list
eq_(len(self.exclude_list), 1)
def test_rename_regex_file_to_path(self):
regex = r".*/one.*"
if ISWINDOWS:
regex = r".*\\one.*"
regex2 = r".*one.*"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
eq_(len([x for x in self.exclude_list]), 1)
compiled_re = [x.pattern for x in self.exclude_list.compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex in compiled_re
assert regex not in files_re
assert regex in paths_re
self.exclude_list.rename(regex, regex2)
eq_(len([x for x in self.exclude_list]), 1)
compiled_re = [x.pattern for x in self.exclude_list.compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex not in compiled_re
assert regex2 in compiled_re
assert regex2 in files_re
assert regex2 not in paths_re
def test_restore_default(self):
"""Only unmark previously added regexes and mark the pre-defined ones"""
regex = r"one"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
self.exclude_list.restore_defaults()
eq_(len(default_regexes), self.exclude_list.marked_count)
# added regex shouldn't be marked
eq_(self.exclude_list.is_marked(regex), False)
# added regex shouldn't be in compiled list either
compiled = [x for x in self.exclude_list.compiled]
assert regex not in compiled
# Need to escape both to get the same strings after compilation
compiled_escaped = set([x.encode("unicode-escape").decode() for x in compiled[0].pattern.split("|")])
default_escaped = set([x.encode("unicode-escape").decode() for x in default_regexes])
assert compiled_escaped == default_escaped
eq_(len(default_regexes), len(compiled[0].pattern.split("|")))
class TestCaseDictEmpty(TestCaseListEmpty):
"""Same, but with dictionary implementation"""
def setup_method(self, method):
self.app = DupeGuru()
self.app.exclude_list = ExcludeDict(union_regex=False)
self.exclude_list = self.app.exclude_list
class TestCaseDictEmptyUnion(TestCaseDictEmpty):
"""Same, but with union regex"""
def setup_method(self, method):
self.app = DupeGuru()
self.app.exclude_list = ExcludeDict(union_regex=True)
self.exclude_list = self.app.exclude_list
def test_add_mark_and_remove_regex(self):
regex1 = r"one"
regex2 = r"two"
self.exclude_list.add(regex1)
assert regex1 in self.exclude_list
self.exclude_list.add(regex2)
self.exclude_list.mark(regex1)
self.exclude_list.mark(regex2)
eq_(len(self.exclude_list), 2)
eq_(len(self.exclude_list.compiled), 1)
compiled_files = [x for x in self.exclude_list.compiled_files]
# two patterns joined into one
eq_(len(compiled_files), 1)
self.exclude_list.remove(regex2)
assert regex2 not in self.exclude_list
eq_(len(self.exclude_list), 1)
def test_rename_regex_file_to_path(self):
regex = r".*/one.*"
if ISWINDOWS:
regex = r".*\\one.*"
regex2 = r".*one.*"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
marked_re = [x for marked, x in self.exclude_list if marked]
eq_(len(marked_re), 1)
compiled_re = [x.pattern for x in self.exclude_list.compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex in compiled_re
assert regex not in files_re
assert regex in paths_re
self.exclude_list.rename(regex, regex2)
compiled_re = [x.pattern for x in self.exclude_list.compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex not in compiled_re
assert regex2 in compiled_re
assert regex2 in files_re
assert regex2 not in paths_re
def test_restore_default(self):
"""Only unmark previously added regexes and mark the pre-defined ones"""
regex = r"one"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
self.exclude_list.restore_defaults()
eq_(len(default_regexes), self.exclude_list.marked_count)
# added regex shouldn't be marked
eq_(self.exclude_list.is_marked(regex), False)
# added regex shouldn't be in compiled list either
compiled = [x for x in self.exclude_list.compiled]
assert regex not in compiled
# Need to escape both to get the same strings after compilation
compiled_escaped = set([x.encode("unicode-escape").decode() for x in compiled[0].pattern.split("|")])
default_escaped = set([x.encode("unicode-escape").decode() for x in default_regexes])
assert compiled_escaped == default_escaped
eq_(len(default_regexes), len(compiled[0].pattern.split("|")))
def split_union(pattern_object):
"""Returns list of strings for each union pattern"""
return [x for x in pattern_object.pattern.split("|")]
class TestCaseCompiledList:
"""Test consistency between union or and separate versions."""
def setup_method(self, method):
self.e_separate = ExcludeList(union_regex=False)
self.e_separate.restore_defaults()
self.e_union = ExcludeList(union_regex=True)
self.e_union.restore_defaults()
def test_same_number_of_expressions(self):
# We only get one union Pattern item in a tuple, which is made of however many parts
eq_(len(split_union(self.e_union.compiled[0])), len(default_regexes))
# We get as many as there are marked items
eq_(len(self.e_separate.compiled), len(default_regexes))
exprs = split_union(self.e_union.compiled[0])
# We should have the same number and the same expressions
eq_(len(exprs), len(self.e_separate.compiled))
for expr in self.e_separate.compiled:
assert expr.pattern in exprs
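# Illustrative sketch (not part of the test suite): with union_regex=True the marked
# patterns are assumed to be joined into one alternation, so a boolean search() should
# agree with searching each separately compiled pattern.
import re

patterns = [r"foo.*", r".*bar$"]
union = re.compile("|".join(patterns))
separate = [re.compile(p) for p in patterns]
for candidate in ("foobar", "a bar", "baz"):
    assert bool(union.search(candidate)) == any(p.search(candidate) for p in separate)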
def test_compiled_files(self):
# is path separator checked properly to yield the output
if ISWINDOWS:
regex1 = r"test\\one\\sub"
else:
regex1 = r"test/one/sub"
self.e_separate.add(regex1)
self.e_separate.mark(regex1)
self.e_union.add(regex1)
self.e_union.mark(regex1)
separate_compiled_dirs = self.e_separate.compiled
separate_compiled_files = [x for x in self.e_separate.compiled_files]
# HACK we need to call compiled property FIRST to generate the cache
union_compiled_dirs = self.e_union.compiled
# print(f"type: {type(self.e_union.compiled_files[0])}")
# A generator returning only one item... ugh
union_compiled_files = [x for x in self.e_union.compiled_files][0]
print(f"compiled files: {union_compiled_files}")
# Separate should give several plus the one added
eq_(len(separate_compiled_dirs), len(default_regexes) + 1)
# regex1 shouldn't be in the "files" version
eq_(len(separate_compiled_files), len(default_regexes))
# Only one Pattern returned, which when split should be however many + 1
eq_(len(split_union(union_compiled_dirs[0])), len(default_regexes) + 1)
# regex1 shouldn't be here either
eq_(len(split_union(union_compiled_files)), len(default_regexes))
class TestCaseCompiledDict(TestCaseCompiledList):
"""Test the dictionary version"""
def setup_method(self, method):
self.e_separate = ExcludeDict(union_regex=False)
self.e_separate.restore_defaults()
self.e_union = ExcludeDict(union_regex=True)
self.e_union.restore_defaults()

View File

@@ -7,6 +7,7 @@
# http://www.gnu.org/licenses/gpl-3.0.html
import hashlib
from os import urandom
from hscommon.path import Path
from hscommon.testutil import eq_
@@ -14,32 +15,95 @@ from core.tests.directories_test import create_fake_fs
from .. import fs
def create_fake_fs_with_random_data(rootpath):
rootpath = rootpath["fs"]
rootpath.mkdir()
rootpath["dir1"].mkdir()
rootpath["dir2"].mkdir()
rootpath["dir3"].mkdir()
fp = rootpath["file1.test"].open("wb")
data1 = urandom(200 * 1024) # 200KiB
data2 = urandom(1024 * 1024) # 1MiB
data3 = urandom(10 * 1024 * 1024) # 10MiB
fp.write(data1)
fp.close()
fp = rootpath["file2.test"].open("wb")
fp.write(data2)
fp.close()
fp = rootpath["file3.test"].open("wb")
fp.write(data3)
fp.close()
fp = rootpath["dir1"]["file1.test"].open("wb")
fp.write(data1)
fp.close()
fp = rootpath["dir2"]["file2.test"].open("wb")
fp.write(data2)
fp.close()
fp = rootpath["dir3"]["file3.test"].open("wb")
fp.write(data3)
fp.close()
return rootpath
def test_size_aggregates_subfiles(tmpdir):
p = create_fake_fs(Path(str(tmpdir)))
b = fs.Folder(p)
eq_(b.size, 12)
def test_md5_aggregate_subfiles_sorted(tmpdir):
#dir.allfiles can return child in any order. Thus, bundle.md5 must aggregate
#all files' md5 it contains, but it must make sure that it does so in the
#same order everytime.
p = create_fake_fs(Path(str(tmpdir)))
# dir.allfiles can return children in any order. Thus, bundle.md5 must aggregate
# the md5 of all the files it contains, but it must make sure that it does so in the
# same order every time.
p = create_fake_fs_with_random_data(Path(str(tmpdir)))
b = fs.Folder(p)
md51 = fs.File(p['dir1']['file1.test']).md5
md52 = fs.File(p['dir2']['file2.test']).md5
md53 = fs.File(p['dir3']['file3.test']).md5
md54 = fs.File(p['file1.test']).md5
md55 = fs.File(p['file2.test']).md5
md56 = fs.File(p['file3.test']).md5
md51 = fs.File(p["dir1"]["file1.test"]).md5
md52 = fs.File(p["dir2"]["file2.test"]).md5
md53 = fs.File(p["dir3"]["file3.test"]).md5
md54 = fs.File(p["file1.test"]).md5
md55 = fs.File(p["file2.test"]).md5
md56 = fs.File(p["file3.test"]).md5
# The expected md5 is the md5 of md5s for folders and the direct md5 for files
folder_md51 = hashlib.md5(md51).digest()
folder_md52 = hashlib.md5(md52).digest()
folder_md53 = hashlib.md5(md53).digest()
md5 = hashlib.md5(folder_md51+folder_md52+folder_md53+md54+md55+md56)
md5 = hashlib.md5(folder_md51 + folder_md52 + folder_md53 + md54 + md55 + md56)
eq_(b.md5, md5.digest())
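# Illustrative sketch (not part of the test suite): because a folder's children can be
# listed in any order, the folder digest is assumed to combine the child digests in a
# fixed, sorted order so the result is deterministic. folder_digest is hypothetical.
import hashlib

def folder_digest(child_digests):
    md5 = hashlib.md5()
    for digest in sorted(child_digests):
        md5.update(digest)
    return md5.digest()

a = hashlib.md5(b"one").digest()
b = hashlib.md5(b"two").digest()
assert folder_digest([a, b]) == folder_digest([b, a])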
def test_partial_md5_aggregate_subfile_sorted(tmpdir):
p = create_fake_fs_with_random_data(Path(str(tmpdir)))
b = fs.Folder(p)
md51 = fs.File(p["dir1"]["file1.test"]).md5partial
md52 = fs.File(p["dir2"]["file2.test"]).md5partial
md53 = fs.File(p["dir3"]["file3.test"]).md5partial
md54 = fs.File(p["file1.test"]).md5partial
md55 = fs.File(p["file2.test"]).md5partial
md56 = fs.File(p["file3.test"]).md5partial
# The expected md5 is the md5 of md5s for folders and the direct md5 for files
folder_md51 = hashlib.md5(md51).digest()
folder_md52 = hashlib.md5(md52).digest()
folder_md53 = hashlib.md5(md53).digest()
md5 = hashlib.md5(folder_md51 + folder_md52 + folder_md53 + md54 + md55 + md56)
eq_(b.md5partial, md5.digest())
md51 = fs.File(p["dir1"]["file1.test"]).md5samples
md52 = fs.File(p["dir2"]["file2.test"]).md5samples
md53 = fs.File(p["dir3"]["file3.test"]).md5samples
md54 = fs.File(p["file1.test"]).md5samples
md55 = fs.File(p["file2.test"]).md5samples
md56 = fs.File(p["file3.test"]).md5samples
# The expected md5 is the md5 of md5s for folders and the direct md5 for files
folder_md51 = hashlib.md5(md51).digest()
folder_md52 = hashlib.md5(md52).digest()
folder_md53 = hashlib.md5(md53).digest()
md5 = hashlib.md5(folder_md51 + folder_md52 + folder_md53 + md54 + md55 + md56)
eq_(b.md5samples, md5.digest())
def test_has_file_attrs(tmpdir):
#a Folder must behave like a file, so it must have mtime attributes
# a Folder must behave like a file, so it must have mtime attributes
b = fs.Folder(Path(str(tmpdir)))
assert b.mtime > 0
eq_(b.extension, '')
eq_(b.extension, "")

View File

@@ -12,79 +12,87 @@ from hscommon.testutil import eq_
from ..ignore import IgnoreList
def test_empty():
il = IgnoreList()
eq_(0, len(il))
assert not il.AreIgnored('foo', 'bar')
assert not il.are_ignored("foo", "bar")
def test_simple():
il = IgnoreList()
il.Ignore('foo', 'bar')
assert il.AreIgnored('foo', 'bar')
assert il.AreIgnored('bar', 'foo')
assert not il.AreIgnored('foo', 'bleh')
assert not il.AreIgnored('bleh', 'bar')
il.ignore("foo", "bar")
assert il.are_ignored("foo", "bar")
assert il.are_ignored("bar", "foo")
assert not il.are_ignored("foo", "bleh")
assert not il.are_ignored("bleh", "bar")
eq_(1, len(il))
def test_multiple():
il = IgnoreList()
il.Ignore('foo', 'bar')
il.Ignore('foo', 'bleh')
il.Ignore('bleh', 'bar')
il.Ignore('aybabtu', 'bleh')
assert il.AreIgnored('foo', 'bar')
assert il.AreIgnored('bar', 'foo')
assert il.AreIgnored('foo', 'bleh')
assert il.AreIgnored('bleh', 'bar')
assert not il.AreIgnored('aybabtu', 'bar')
il.ignore("foo", "bar")
il.ignore("foo", "bleh")
il.ignore("bleh", "bar")
il.ignore("aybabtu", "bleh")
assert il.are_ignored("foo", "bar")
assert il.are_ignored("bar", "foo")
assert il.are_ignored("foo", "bleh")
assert il.are_ignored("bleh", "bar")
assert not il.are_ignored("aybabtu", "bar")
eq_(4, len(il))
def test_clear():
il = IgnoreList()
il.Ignore('foo', 'bar')
il.Clear()
assert not il.AreIgnored('foo', 'bar')
assert not il.AreIgnored('bar', 'foo')
il.ignore("foo", "bar")
il.clear()
assert not il.are_ignored("foo", "bar")
assert not il.are_ignored("bar", "foo")
eq_(0, len(il))
def test_add_same_twice():
il = IgnoreList()
il.Ignore('foo', 'bar')
il.Ignore('bar', 'foo')
il.ignore("foo", "bar")
il.ignore("bar", "foo")
eq_(1, len(il))
def test_save_to_xml():
il = IgnoreList()
il.Ignore('foo', 'bar')
il.Ignore('foo', 'bleh')
il.Ignore('bleh', 'bar')
il.ignore("foo", "bar")
il.ignore("foo", "bleh")
il.ignore("bleh", "bar")
f = io.BytesIO()
il.save_to_xml(f)
f.seek(0)
doc = ET.parse(f)
root = doc.getroot()
eq_(root.tag, 'ignore_list')
eq_(root.tag, "ignore_list")
eq_(len(root), 2)
eq_(len([c for c in root if c.tag == 'file']), 2)
eq_(len([c for c in root if c.tag == "file"]), 2)
f1, f2 = root[:]
subchildren = [c for c in f1 if c.tag == 'file'] + [c for c in f2 if c.tag == 'file']
subchildren = [c for c in f1 if c.tag == "file"] + [c for c in f2 if c.tag == "file"]
eq_(len(subchildren), 3)
def test_SaveThenLoad():
def test_save_then_load():
il = IgnoreList()
il.Ignore('foo', 'bar')
il.Ignore('foo', 'bleh')
il.Ignore('bleh', 'bar')
il.Ignore('\u00e9', 'bar')
il.ignore("foo", "bar")
il.ignore("foo", "bleh")
il.ignore("bleh", "bar")
il.ignore("\u00e9", "bar")
f = io.BytesIO()
il.save_to_xml(f)
f.seek(0)
il = IgnoreList()
il.load_from_xml(f)
eq_(4, len(il))
assert il.AreIgnored('\u00e9', 'bar')
assert il.are_ignored("\u00e9", "bar")
def test_LoadXML_with_empty_file_tags():
def test_load_xml_with_empty_file_tags():
f = io.BytesIO()
f.write(b'<?xml version="1.0" encoding="utf-8"?><ignore_list><file><file/></file></ignore_list>')
f.seek(0)
@@ -92,72 +100,80 @@ def test_LoadXML_with_empty_file_tags():
il.load_from_xml(f)
eq_(0, len(il))
def test_AreIgnore_works_when_a_child_is_a_key_somewhere_else():
def test_are_ignore_works_when_a_child_is_a_key_somewhere_else():
il = IgnoreList()
il.Ignore('foo', 'bar')
il.Ignore('bar', 'baz')
assert il.AreIgnored('bar', 'foo')
il.ignore("foo", "bar")
il.ignore("bar", "baz")
assert il.are_ignored("bar", "foo")
def test_no_dupes_when_a_child_is_a_key_somewhere_else():
il = IgnoreList()
il.Ignore('foo', 'bar')
il.Ignore('bar', 'baz')
il.Ignore('bar', 'foo')
il.ignore("foo", "bar")
il.ignore("bar", "baz")
il.ignore("bar", "foo")
eq_(2, len(il))
def test_iterate():
#It must be possible to iterate through ignore list
# It must be possible to iterate through ignore list
il = IgnoreList()
expected = [('foo', 'bar'), ('bar', 'baz'), ('foo', 'baz')]
expected = [("foo", "bar"), ("bar", "baz"), ("foo", "baz")]
for i in expected:
il.Ignore(i[0], i[1])
il.ignore(i[0], i[1])
for i in il:
expected.remove(i) #No exception should be raised
assert not expected #expected should be empty
expected.remove(i) # No exception should be raised
assert not expected # expected should be empty
def test_filter():
il = IgnoreList()
il.Ignore('foo', 'bar')
il.Ignore('bar', 'baz')
il.Ignore('foo', 'baz')
il.Filter(lambda f, s: f == 'bar')
il.ignore("foo", "bar")
il.ignore("bar", "baz")
il.ignore("foo", "baz")
il.filter(lambda f, s: f == "bar")
eq_(1, len(il))
assert not il.AreIgnored('foo', 'bar')
assert il.AreIgnored('bar', 'baz')
assert not il.are_ignored("foo", "bar")
assert il.are_ignored("bar", "baz")
def test_save_with_non_ascii_items():
il = IgnoreList()
il.Ignore('\xac', '\xbf')
il.ignore("\xac", "\xbf")
f = io.BytesIO()
try:
il.save_to_xml(f)
except Exception as e:
raise AssertionError(str(e))
def test_len():
il = IgnoreList()
eq_(0, len(il))
il.Ignore('foo', 'bar')
il.ignore("foo", "bar")
eq_(1, len(il))
def test_nonzero():
il = IgnoreList()
assert not il
il.Ignore('foo', 'bar')
il.ignore("foo", "bar")
assert il
def test_remove():
il = IgnoreList()
il.Ignore('foo', 'bar')
il.Ignore('foo', 'baz')
il.remove('bar', 'foo')
il.ignore("foo", "bar")
il.ignore("foo", "baz")
il.remove("bar", "foo")
eq_(len(il), 1)
assert not il.AreIgnored('foo', 'bar')
assert not il.are_ignored("foo", "bar")
def test_remove_non_existant():
il = IgnoreList()
il.Ignore('foo', 'bar')
il.Ignore('foo', 'baz')
il.ignore("foo", "bar")
il.ignore("foo", "baz")
with raises(ValueError):
il.remove('foo', 'bleh')
il.remove("foo", "bleh")

View File

@@ -8,33 +8,39 @@ from hscommon.testutil import eq_
from ..markable import MarkableList, Markable
def gen():
ml = MarkableList()
ml.extend(list(range(10)))
return ml
def test_unmarked():
ml = gen()
for i in ml:
assert not ml.is_marked(i)
def test_mark():
ml = gen()
assert ml.mark(3)
assert ml.is_marked(3)
assert not ml.is_marked(2)
def test_unmark():
ml = gen()
ml.mark(4)
assert ml.unmark(4)
assert not ml.is_marked(4)
def test_unmark_unmarked():
ml = gen()
assert not ml.unmark(4)
assert not ml.is_marked(4)
def test_mark_twice_and_unmark():
ml = gen()
assert ml.mark(5)
@@ -42,6 +48,7 @@ def test_mark_twice_and_unmark():
ml.unmark(5)
assert not ml.is_marked(5)
def test_mark_toggle():
ml = gen()
ml.mark_toggle(6)
@@ -51,22 +58,25 @@ def test_mark_toggle():
ml.mark_toggle(6)
assert ml.is_marked(6)
def test_is_markable():
class Foobar(Markable):
def _is_markable(self, o):
return o == 'foobar'
return o == "foobar"
f = Foobar()
assert not f.is_marked('foobar')
assert not f.mark('foo')
assert not f.is_marked('foo')
f.mark_toggle('foo')
assert not f.is_marked('foo')
f.mark('foobar')
assert f.is_marked('foobar')
assert not f.is_marked("foobar")
assert not f.mark("foo")
assert not f.is_marked("foo")
f.mark_toggle("foo")
assert not f.is_marked("foo")
f.mark("foobar")
assert f.is_marked("foobar")
ml = gen()
ml.mark(11)
assert not ml.is_marked(11)
def test_change_notifications():
class Foobar(Markable):
def _did_mark(self, o):
@@ -77,13 +87,14 @@ def test_change_notifications():
f = Foobar()
f.log = []
f.mark('foo')
f.mark('foo')
f.mark_toggle('bar')
f.unmark('foo')
f.unmark('foo')
f.mark_toggle('bar')
eq_([(True, 'foo'), (True, 'bar'), (False, 'foo'), (False, 'bar')], f.log)
f.mark("foo")
f.mark("foo")
f.mark_toggle("bar")
f.unmark("foo")
f.unmark("foo")
f.mark_toggle("bar")
eq_([(True, "foo"), (True, "bar"), (False, "foo"), (False, "bar")], f.log)
def test_mark_count():
ml = gen()
@@ -93,6 +104,7 @@ def test_mark_count():
ml.mark(11)
eq_(1, ml.mark_count)
def test_mark_none():
log = []
ml = gen()
@@ -104,6 +116,7 @@ def test_mark_none():
eq_(0, ml.mark_count)
eq_([1, 2], log)
def test_mark_all():
ml = gen()
eq_(0, ml.mark_count)
@@ -111,6 +124,7 @@ def test_mark_all():
eq_(10, ml.mark_count)
assert ml.is_marked(1)
def test_mark_invert():
ml = gen()
ml.mark(1)
@@ -118,6 +132,7 @@ def test_mark_invert():
assert not ml.is_marked(1)
assert ml.is_marked(2)
def test_mark_while_inverted():
log = []
ml = gen()
@@ -134,6 +149,7 @@ def test_mark_while_inverted():
eq_(7, ml.mark_count)
eq_([(True, 1), (False, 1), (True, 2), (True, 1), (True, 3)], log)
def test_remove_mark_flag():
ml = gen()
ml.mark(1)
@@ -145,10 +161,12 @@ def test_remove_mark_flag():
ml._remove_mark_flag(1)
assert ml.is_marked(1)
def test_is_marked_returns_false_if_object_not_markable():
class MyMarkableList(MarkableList):
def _is_markable(self, o):
return o != 4
ml = MyMarkableList()
ml.extend(list(range(10)))
ml.mark_invert()

View File

@@ -14,6 +14,7 @@ from ..engine import Group, Match
no = NamedObject
def app_with_dupes(dupes):
# Creates an app with specified dupes. dupes is a list of lists, each list in the list being
# a dupe group. We cheat a little bit by creating dupe groups manually instead of running a
@@ -29,22 +30,25 @@ def app_with_dupes(dupes):
app.app._results_changed()
return app
#---
# ---
def app_normal_results():
# Just some results, with different extensions and size, for good measure.
dupes = [
[
no('foo1.ext1', size=1, folder='folder1'),
no('foo2.ext2', size=2, folder='folder2')
no("foo1.ext1", size=1, folder="folder1"),
no("foo2.ext2", size=2, folder="folder2"),
],
]
return app_with_dupes(dupes)
@with_app(app_normal_results)
def test_kind_subcrit(app):
# The subcriteria of the "Kind" criteria is a list of extensions contained in the dupes.
app.select_pri_criterion("Kind")
eq_(app.pdialog.criteria_list[:], ['ext1', 'ext2'])
eq_(app.pdialog.criteria_list[:], ["ext1", "ext2"])
@with_app(app_normal_results)
def test_kind_reprioritization(app):
@@ -54,12 +58,14 @@ def test_kind_reprioritization(app):
app.pdialog.criteria_list.select([1]) # ext2
app.pdialog.add_selected()
app.pdialog.perform_reprioritization()
eq_(app.rtable[0].data['name'], 'foo2.ext2')
eq_(app.rtable[0].data["name"], "foo2.ext2")
@with_app(app_normal_results)
def test_folder_subcrit(app):
app.select_pri_criterion("Folder")
eq_(app.pdialog.criteria_list[:], ['folder1', 'folder2'])
eq_(app.pdialog.criteria_list[:], ["folder1", "folder2"])
@with_app(app_normal_results)
def test_folder_reprioritization(app):
@@ -67,7 +73,8 @@ def test_folder_reprioritization(app):
app.pdialog.criteria_list.select([1]) # folder2
app.pdialog.add_selected()
app.pdialog.perform_reprioritization()
eq_(app.rtable[0].data['name'], 'foo2.ext2')
eq_(app.rtable[0].data["name"], "foo2.ext2")
@with_app(app_normal_results)
def test_prilist_display(app):
@@ -88,10 +95,12 @@ def test_prilist_display(app):
]
eq_(app.pdialog.prioritization_list[:], expected)
@with_app(app_normal_results)
def test_size_subcrit(app):
app.select_pri_criterion("Size")
eq_(app.pdialog.criteria_list[:], ['Highest', 'Lowest'])
eq_(app.pdialog.criteria_list[:], ["Highest", "Lowest"])
@with_app(app_normal_results)
def test_size_reprioritization(app):
@@ -99,7 +108,8 @@ def test_size_reprioritization(app):
app.pdialog.criteria_list.select([0]) # highest
app.pdialog.add_selected()
app.pdialog.perform_reprioritization()
eq_(app.rtable[0].data['name'], 'foo2.ext2')
eq_(app.rtable[0].data["name"], "foo2.ext2")
@with_app(app_normal_results)
def test_reorder_prioritizations(app):
@@ -112,6 +122,7 @@ def test_reorder_prioritizations(app):
]
eq_(app.pdialog.prioritization_list[:], expected)
@with_app(app_normal_results)
def test_remove_crit_from_list(app):
app.add_pri_criterion("Kind", 0)
@@ -123,46 +134,43 @@ def test_remove_crit_from_list(app):
]
eq_(app.pdialog.prioritization_list[:], expected)
@with_app(app_normal_results)
def test_add_crit_without_selection(app):
# Adding a criterion without having made a selection doesn't cause a crash.
app.pdialog.add_selected() # no crash
#---
# ---
def app_one_name_ends_with_number():
dupes = [
[
no('foo.ext'),
no('foo1.ext'),
],
[no("foo.ext"), no("foo1.ext")],
]
return app_with_dupes(dupes)
@with_app(app_one_name_ends_with_number)
def test_filename_reprioritization(app):
app.add_pri_criterion("Filename", 0) # Ends with a number
app.pdialog.perform_reprioritization()
eq_(app.rtable[0].data['name'], 'foo1.ext')
eq_(app.rtable[0].data["name"], "foo1.ext")
#---
# ---
def app_with_subfolders():
dupes = [
[
no('foo1', folder='baz'),
no('foo2', folder='foo/bar'),
],
[
no('foo3', folder='baz'),
no('foo4', folder='foo'),
],
[no("foo1", folder="baz"), no("foo2", folder="foo/bar")],
[no("foo3", folder="baz"), no("foo4", folder="foo")],
]
return app_with_dupes(dupes)
@with_app(app_with_subfolders)
def test_folder_crit_is_sorted(app):
# Folder subcriteria are sorted.
app.select_pri_criterion("Folder")
eq_(app.pdialog.criteria_list[:], ['baz', 'foo', op.join('foo', 'bar')])
eq_(app.pdialog.criteria_list[:], ["baz", "foo", op.join("foo", "bar")])
@with_app(app_with_subfolders)
def test_folder_crit_includes_subfolders(app):
@@ -171,27 +179,27 @@ def test_folder_crit_includes_subfolders(app):
app.add_pri_criterion("Folder", 1) # foo
app.pdialog.perform_reprioritization()
# Both foo and foo/bar dupes will be prioritized
eq_(app.rtable[0].data['name'], 'foo2')
eq_(app.rtable[2].data['name'], 'foo4')
eq_(app.rtable[0].data["name"], "foo2")
eq_(app.rtable[2].data["name"], "foo4")
@with_app(app_with_subfolders)
def test_display_something_on_empty_extensions(app):
# When there's no extension, display "None" instead of nothing at all.
app.select_pri_criterion("Kind")
eq_(app.pdialog.criteria_list[:], ['None'])
eq_(app.pdialog.criteria_list[:], ["None"])
#---
# ---
def app_one_name_longer_than_the_other():
dupes = [
[
no('shortest.ext'),
no('loooongest.ext'),
],
[no("shortest.ext"), no("loooongest.ext")],
]
return app_with_dupes(dupes)
@with_app(app_one_name_longer_than_the_other)
def test_longest_filename_prioritization(app):
app.add_pri_criterion("Filename", 2) # Longest
app.pdialog.perform_reprioritization()
eq_(app.rtable[0].data['name'], 'loooongest.ext')
eq_(app.rtable[0].data["name"], "loooongest.ext")

View File

@@ -8,6 +8,7 @@
from .base import TestApp, GetTestGroups
def app_with_results():
app = TestApp()
objects, matches, groups = GetTestGroups()
@@ -15,23 +16,26 @@ def app_with_results():
app.rtable.refresh()
return app
def test_delta_flags_delta_mode_off():
app = app_with_results()
# When the delta mode is off, we never have delta values flags
app.rtable.delta_values = False
# Ref file, always false anyway
assert not app.rtable[0].is_cell_delta('size')
assert not app.rtable[0].is_cell_delta("size")
# False because delta mode is off
assert not app.rtable[1].is_cell_delta('size')
assert not app.rtable[1].is_cell_delta("size")
def test_delta_flags_delta_mode_on_delta_columns():
# When the delta mode is on, delta columns always have a delta flag, except for ref rows
app = app_with_results()
app.rtable.delta_values = True
# Ref file, always false anyway
assert not app.rtable[0].is_cell_delta('size')
assert not app.rtable[0].is_cell_delta("size")
# But for a dupe, the flag is on
assert app.rtable[1].is_cell_delta('size')
assert app.rtable[1].is_cell_delta("size")
def test_delta_flags_delta_mode_on_non_delta_columns():
# When the delta mode is on, non-delta columns have a delta flag if their value differs from
@@ -39,11 +43,12 @@ def test_delta_flags_delta_mode_on_non_delta_columns():
app = app_with_results()
app.rtable.delta_values = True
# "bar bleh" != "foo bar", flag on
assert app.rtable[1].is_cell_delta('name')
assert app.rtable[1].is_cell_delta("name")
# "ibabtu" row, but it's a ref, flag off
assert not app.rtable[3].is_cell_delta('name')
assert not app.rtable[3].is_cell_delta("name")
# "ibabtu" == "ibabtu", flag off
assert not app.rtable[4].is_cell_delta('name')
assert not app.rtable[4].is_cell_delta("name")
def test_delta_flags_delta_mode_on_non_delta_columns_case_insensitive():
# Comparison that occurs for non-numeric columns to check whether they're delta is case
@@ -53,4 +58,4 @@ def test_delta_flags_delta_mode_on_non_delta_columns_case_insensitive():
app.app.results.groups[1].dupes[0].name = "IBaBTU"
app.rtable.delta_values = True
# "ibAbtu" == "IBaBTU", flag off
assert not app.rtable[4].is_cell_delta('name')
assert not app.rtable[4].is_cell_delta("name")
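# Illustrative sketch (not part of the test suite): the delta flag behaviour exercised
# above can be summarized as: no flag when delta mode is off or the row is a ref,
# otherwise flag whenever the value differs from the ref row's value, case-insensitively
# for strings. is_cell_delta_like is a hypothetical stand-in for rtable's own logic.
def is_cell_delta_like(value, ref_value, is_ref, delta_mode):
    if not delta_mode or is_ref:
        return False
    if isinstance(value, str) and isinstance(ref_value, str):
        return value.lower() != ref_value.lower()
    return value != ref_value

assert not is_cell_delta_like("ibAbtu", "IBaBTU", is_ref=False, delta_mode=True)
assert is_cell_delta_like("bar bleh", "foo bar", is_ref=False, delta_mode=True)
assert not is_cell_delta_like(1024, 1, is_ref=True, delta_mode=True)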

View File

@@ -17,6 +17,7 @@ from .. import engine
from .base import NamedObject, GetTestGroups, DupeGuru
from ..results import Results
class TestCaseResultsEmpty:
def setup_method(self, method):
self.app = DupeGuru()
@@ -24,7 +25,7 @@ class TestCaseResultsEmpty:
def test_apply_invalid_filter(self):
# If the applied filter is an invalid regexp, just ignore the filter.
self.results.apply_filter('[') # invalid
self.results.apply_filter("[") # invalid
self.test_stat_line() # make sure that the stats line isn't saying we applied a '[' filter
def test_stat_line(self):
@@ -34,7 +35,7 @@ class TestCaseResultsEmpty:
eq_(0, len(self.results.groups))
def test_get_group_of_duplicate(self):
assert self.results.get_group_of_duplicate('foo') is None
assert self.results.get_group_of_duplicate("foo") is None
def test_save_to_xml(self):
f = io.BytesIO()
@@ -42,7 +43,7 @@ class TestCaseResultsEmpty:
f.seek(0)
doc = ET.parse(f)
root = doc.getroot()
eq_('results', root.tag)
eq_("results", root.tag)
def test_is_modified(self):
assert not self.results.is_modified
@@ -59,10 +60,10 @@ class TestCaseResultsEmpty:
# would have been some kind of feedback to the user, but the work involved for something
# that simply never happens (I never received a report of this crash, I experienced it
# while fooling around) is too much. Instead, use standard name conflict resolution.
folderpath = tmpdir.join('foo')
folderpath = tmpdir.join("foo")
folderpath.mkdir()
self.results.save_to_xml(str(folderpath)) # no crash
assert tmpdir.join('[000] foo').check()
assert tmpdir.join("[000] foo").check()
class TestCaseResultsWithSomeGroups:
@@ -116,18 +117,18 @@ class TestCaseResultsWithSomeGroups:
assert d is g.ref
def test_sort_groups(self):
self.results.make_ref(self.objects[1]) #We want to make the 1024 sized object to go ref.
self.results.make_ref(self.objects[1]) # We want to make the 1024 sized object to go ref.
g1, g2 = self.groups
self.results.sort_groups('size')
self.results.sort_groups("size")
assert self.results.groups[0] is g2
assert self.results.groups[1] is g1
self.results.sort_groups('size', False)
self.results.sort_groups("size", False)
assert self.results.groups[0] is g1
assert self.results.groups[1] is g2
def test_set_groups_when_sorted(self):
self.results.make_ref(self.objects[1]) #We want to make the 1024 sized object to go ref.
self.results.sort_groups('size')
self.results.make_ref(self.objects[1]) # We want to make the 1024 sized object to go ref.
self.results.sort_groups("size")
objects, matches, groups = GetTestGroups()
g1, g2 = groups
g1.switch_ref(objects[1])
@@ -158,9 +159,9 @@ class TestCaseResultsWithSomeGroups:
o3.size = 3
o4.size = 2
o5.size = 1
self.results.sort_dupes('size')
self.results.sort_dupes("size")
eq_([o5, o3, o2], self.results.dupes)
self.results.sort_dupes('size', False)
self.results.sort_dupes("size", False)
eq_([o2, o3, o5], self.results.dupes)
def test_dupe_list_remember_sort(self):
@@ -170,25 +171,25 @@ class TestCaseResultsWithSomeGroups:
o3.size = 3
o4.size = 2
o5.size = 1
self.results.sort_dupes('size')
self.results.sort_dupes("size")
self.results.make_ref(o2)
eq_([o5, o3, o1], self.results.dupes)
def test_dupe_list_sort_delta_values(self):
o1, o2, o3, o4, o5 = self.objects
o1.size = 10
o2.size = 2 #-8
o3.size = 3 #-7
o2.size = 2 # -8
o3.size = 3 # -7
o4.size = 20
o5.size = 1 #-19
self.results.sort_dupes('size', delta=True)
o5.size = 1 # -19
self.results.sort_dupes("size", delta=True)
eq_([o5, o2, o3], self.results.dupes)
def test_sort_empty_list(self):
#There was an infinite loop when sorting an empty list.
# There was an infinite loop when sorting an empty list.
app = DupeGuru()
r = app.results
r.sort_dupes('name')
r.sort_dupes("name")
eq_([], r.dupes)
def test_dupe_list_update_on_remove_duplicates(self):
@@ -236,7 +237,7 @@ class TestCaseResultsWithSomeGroups:
# "aaa" makes our dupe go first in alphabetical order, but since we have the same value as
# ref, we're going last.
g2r.name = g2d1.name = "aaa"
self.results.sort_dupes('name', delta=True)
self.results.sort_dupes("name", delta=True)
eq_("aaa", self.results.dupes[2].name)
def test_dupe_list_sort_delta_values_nonnumeric_case_insensitive(self):
@@ -244,9 +245,10 @@ class TestCaseResultsWithSomeGroups:
g1r, g1d1, g1d2, g2r, g2d1 = self.objects
g2r.name = "AaA"
g2d1.name = "aAa"
self.results.sort_dupes('name', delta=True)
self.results.sort_dupes("name", delta=True)
eq_("aAa", self.results.dupes[2].name)
class TestCaseResultsWithSavedResults:
def setup_method(self, method):
self.app = DupeGuru()
@@ -299,7 +301,7 @@ class TestCaseResultsMarkings:
self.results.mark(self.objects[2])
self.results.mark(self.objects[4])
eq_("2 / 3 (2.00 B / 1.01 KB) duplicates marked.", self.results.stat_line)
self.results.mark(self.objects[0]) #this is a ref, it can't be counted
self.results.mark(self.objects[0]) # this is a ref, it can't be counted
eq_("2 / 3 (2.00 B / 1.01 KB) duplicates marked.", self.results.stat_line)
self.results.groups = self.groups
eq_("0 / 3 (0.00 B / 1.01 KB) duplicates marked.", self.results.stat_line)
@@ -335,7 +337,7 @@ class TestCaseResultsMarkings:
def log_object(o):
log.append(o)
if o is self.objects[1]:
raise EnvironmentError('foobar')
raise EnvironmentError("foobar")
log = []
self.results.mark_all()
@@ -350,7 +352,7 @@ class TestCaseResultsMarkings:
eq_(len(self.results.problems), 1)
dupe, msg = self.results.problems[0]
assert dupe is self.objects[1]
eq_(msg, 'foobar')
eq_(msg, "foobar")
def test_perform_on_marked_with_ref(self):
def log_object(o):
@@ -400,7 +402,7 @@ class TestCaseResultsMarkings:
self.results.make_ref(d)
eq_("0 / 3 (0.00 B / 3.00 B) duplicates marked.", self.results.stat_line)
def test_SaveXML(self):
def test_save_xml(self):
self.results.mark(self.objects[1])
self.results.mark_invert()
f = io.BytesIO()
@@ -408,20 +410,20 @@ class TestCaseResultsMarkings:
f.seek(0)
doc = ET.parse(f)
root = doc.getroot()
g1, g2 = root.getiterator('group')
d1, d2, d3 = g1.getiterator('file')
eq_('n', d1.get('marked'))
eq_('n', d2.get('marked'))
eq_('y', d3.get('marked'))
d1, d2 = g2.getiterator('file')
eq_('n', d1.get('marked'))
eq_('y', d2.get('marked'))
g1, g2 = root.iter("group")
d1, d2, d3 = g1.iter("file")
eq_("n", d1.get("marked"))
eq_("n", d2.get("marked"))
eq_("y", d3.get("marked"))
d1, d2 = g2.iter("file")
eq_("n", d1.get("marked"))
eq_("y", d2.get("marked"))
def test_LoadXML(self):
def test_load_xml(self):
def get_file(path):
return [f for f in self.objects if str(f.path) == path][0]
self.objects[4].name = 'ibabtu 2' #we can't have 2 files with the same path
self.objects[4].name = "ibabtu 2" # we can't have 2 files with the same path
self.results.mark(self.objects[1])
self.results.mark_invert()
f = io.BytesIO()
@@ -449,46 +451,46 @@ class TestCaseResultsXML:
def test_save_to_xml(self):
self.objects[0].is_ref = True
self.objects[0].words = [['foo', 'bar']]
self.objects[0].words = [["foo", "bar"]]
f = io.BytesIO()
self.results.save_to_xml(f)
f.seek(0)
doc = ET.parse(f)
root = doc.getroot()
eq_('results', root.tag)
eq_("results", root.tag)
eq_(2, len(root))
eq_(2, len([c for c in root if c.tag == 'group']))
eq_(2, len([c for c in root if c.tag == "group"]))
g1, g2 = root
eq_(6, len(g1))
eq_(3, len([c for c in g1 if c.tag == 'file']))
eq_(3, len([c for c in g1 if c.tag == 'match']))
d1, d2, d3 = [c for c in g1 if c.tag == 'file']
eq_(op.join('basepath', 'foo bar'), d1.get('path'))
eq_(op.join('basepath', 'bar bleh'), d2.get('path'))
eq_(op.join('basepath', 'foo bleh'), d3.get('path'))
eq_('y', d1.get('is_ref'))
eq_('n', d2.get('is_ref'))
eq_('n', d3.get('is_ref'))
eq_('foo,bar', d1.get('words'))
eq_('bar,bleh', d2.get('words'))
eq_('foo,bleh', d3.get('words'))
eq_(3, len([c for c in g1 if c.tag == "file"]))
eq_(3, len([c for c in g1 if c.tag == "match"]))
d1, d2, d3 = [c for c in g1 if c.tag == "file"]
eq_(op.join("basepath", "foo bar"), d1.get("path"))
eq_(op.join("basepath", "bar bleh"), d2.get("path"))
eq_(op.join("basepath", "foo bleh"), d3.get("path"))
eq_("y", d1.get("is_ref"))
eq_("n", d2.get("is_ref"))
eq_("n", d3.get("is_ref"))
eq_("foo,bar", d1.get("words"))
eq_("bar,bleh", d2.get("words"))
eq_("foo,bleh", d3.get("words"))
eq_(3, len(g2))
eq_(2, len([c for c in g2 if c.tag == 'file']))
eq_(1, len([c for c in g2 if c.tag == 'match']))
d1, d2 = [c for c in g2 if c.tag == 'file']
eq_(op.join('basepath', 'ibabtu'), d1.get('path'))
eq_(op.join('basepath', 'ibabtu'), d2.get('path'))
eq_('n', d1.get('is_ref'))
eq_('n', d2.get('is_ref'))
eq_('ibabtu', d1.get('words'))
eq_('ibabtu', d2.get('words'))
eq_(2, len([c for c in g2 if c.tag == "file"]))
eq_(1, len([c for c in g2 if c.tag == "match"]))
d1, d2 = [c for c in g2 if c.tag == "file"]
eq_(op.join("basepath", "ibabtu"), d1.get("path"))
eq_(op.join("basepath", "ibabtu"), d2.get("path"))
eq_("n", d1.get("is_ref"))
eq_("n", d2.get("is_ref"))
eq_("ibabtu", d1.get("words"))
eq_("ibabtu", d2.get("words"))
def test_LoadXML(self):
def test_load_xml(self):
def get_file(path):
return [f for f in self.objects if str(f.path) == path][0]
self.objects[0].is_ref = True
self.objects[4].name = 'ibabtu 2' #we can't have 2 files with the same path
self.objects[4].name = "ibabtu 2" # we can't have 2 files with the same path
f = io.BytesIO()
self.results.save_to_xml(f)
f.seek(0)
@@ -504,36 +506,36 @@ class TestCaseResultsXML:
assert g1[0] is self.objects[0]
assert g1[1] is self.objects[1]
assert g1[2] is self.objects[2]
eq_(['foo', 'bar'], g1[0].words)
eq_(['bar', 'bleh'], g1[1].words)
eq_(['foo', 'bleh'], g1[2].words)
eq_(["foo", "bar"], g1[0].words)
eq_(["bar", "bleh"], g1[1].words)
eq_(["foo", "bleh"], g1[2].words)
eq_(2, len(g2))
assert not g2[0].is_ref
assert not g2[1].is_ref
assert g2[0] is self.objects[3]
assert g2[1] is self.objects[4]
eq_(['ibabtu'], g2[0].words)
eq_(['ibabtu'], g2[1].words)
eq_(["ibabtu"], g2[0].words)
eq_(["ibabtu"], g2[1].words)
def test_LoadXML_with_filename(self, tmpdir):
def test_load_xml_with_filename(self, tmpdir):
def get_file(path):
return [f for f in self.objects if str(f.path) == path][0]
filename = str(tmpdir.join('dupeguru_results.xml'))
self.objects[4].name = 'ibabtu 2' #we can't have 2 files with the same path
filename = str(tmpdir.join("dupeguru_results.xml"))
self.objects[4].name = "ibabtu 2" # we can't have 2 files with the same path
self.results.save_to_xml(filename)
app = DupeGuru()
r = Results(app)
r.load_from_xml(filename, get_file)
eq_(2, len(r.groups))
def test_LoadXML_with_some_files_that_dont_exist_anymore(self):
def test_load_xml_with_some_files_that_dont_exist_anymore(self):
def get_file(path):
if path.endswith('ibabtu 2'):
if path.endswith("ibabtu 2"):
return None
return [f for f in self.objects if str(f.path) == path][0]
self.objects[4].name = 'ibabtu 2' #we can't have 2 files with the same path
self.objects[4].name = "ibabtu 2" # we can't have 2 files with the same path
f = io.BytesIO()
self.results.save_to_xml(f)
f.seek(0)
@@ -543,40 +545,40 @@ class TestCaseResultsXML:
eq_(1, len(r.groups))
eq_(3, len(r.groups[0]))
def test_LoadXML_missing_attributes_and_bogus_elements(self):
def test_load_xml_missing_attributes_and_bogus_elements(self):
def get_file(path):
return [f for f in self.objects if str(f.path) == path][0]
root = ET.Element('foobar') #The root element shouldn't matter, really.
group_node = ET.SubElement(root, 'group')
dupe_node = ET.SubElement(group_node, 'file') #Perfectly correct file
dupe_node.set('path', op.join('basepath', 'foo bar'))
dupe_node.set('is_ref', 'y')
dupe_node.set('words', 'foo, bar')
dupe_node = ET.SubElement(group_node, 'file') #is_ref missing, default to 'n'
dupe_node.set('path', op.join('basepath', 'foo bleh'))
dupe_node.set('words', 'foo, bleh')
dupe_node = ET.SubElement(group_node, 'file') #words are missing, valid.
dupe_node.set('path', op.join('basepath', 'bar bleh'))
dupe_node = ET.SubElement(group_node, 'file') #path is missing, invalid.
dupe_node.set('words', 'foo, bleh')
dupe_node = ET.SubElement(group_node, 'foobar') #Invalid element name
dupe_node.set('path', op.join('basepath', 'bar bleh'))
dupe_node.set('is_ref', 'y')
dupe_node.set('words', 'bar, bleh')
match_node = ET.SubElement(group_node, 'match') # match pointing to a bad index
match_node.set('first', '42')
match_node.set('second', '45')
match_node = ET.SubElement(group_node, 'match') # match with missing attrs
match_node = ET.SubElement(group_node, 'match') # match with non-int values
match_node.set('first', 'foo')
match_node.set('second', 'bar')
match_node.set('percentage', 'baz')
group_node = ET.SubElement(root, 'foobar') #invalid group
group_node = ET.SubElement(root, 'group') #empty group
root = ET.Element("foobar") # The root element shouldn't matter, really.
group_node = ET.SubElement(root, "group")
dupe_node = ET.SubElement(group_node, "file") # Perfectly correct file
dupe_node.set("path", op.join("basepath", "foo bar"))
dupe_node.set("is_ref", "y")
dupe_node.set("words", "foo, bar")
dupe_node = ET.SubElement(group_node, "file") # is_ref missing, default to 'n'
dupe_node.set("path", op.join("basepath", "foo bleh"))
dupe_node.set("words", "foo, bleh")
dupe_node = ET.SubElement(group_node, "file") # words are missing, valid.
dupe_node.set("path", op.join("basepath", "bar bleh"))
dupe_node = ET.SubElement(group_node, "file") # path is missing, invalid.
dupe_node.set("words", "foo, bleh")
dupe_node = ET.SubElement(group_node, "foobar") # Invalid element name
dupe_node.set("path", op.join("basepath", "bar bleh"))
dupe_node.set("is_ref", "y")
dupe_node.set("words", "bar, bleh")
match_node = ET.SubElement(group_node, "match") # match pointing to a bad index
match_node.set("first", "42")
match_node.set("second", "45")
match_node = ET.SubElement(group_node, "match") # match with missing attrs
match_node = ET.SubElement(group_node, "match") # match with non-int values
match_node.set("first", "foo")
match_node.set("second", "bar")
match_node.set("percentage", "baz")
group_node = ET.SubElement(root, "foobar") # invalid group
group_node = ET.SubElement(root, "group") # empty group
f = io.BytesIO()
tree = ET.ElementTree(root)
tree.write(f, encoding='utf-8')
tree.write(f, encoding="utf-8")
f.seek(0)
app = DupeGuru()
r = Results(app)
@@ -586,16 +588,16 @@ class TestCaseResultsXML:
def test_xml_non_ascii(self):
def get_file(path):
if path == op.join('basepath', '\xe9foo bar'):
if path == op.join("basepath", "\xe9foo bar"):
return objects[0]
if path == op.join('basepath', 'bar bleh'):
if path == op.join("basepath", "bar bleh"):
return objects[1]
objects = [NamedObject("\xe9foo bar", True), NamedObject("bar bleh", True)]
matches = engine.getmatches(objects) #we should have 5 matches
groups = engine.get_groups(matches) #We should have 2 groups
matches = engine.getmatches(objects) # we should have 5 matches
groups = engine.get_groups(matches) # We should have 2 groups
for g in groups:
g.prioritize(lambda x: objects.index(x)) #We want the dupes to be in the same order as the list is
g.prioritize(lambda x: objects.index(x)) # We want the dupes to be in the same order as the list is
app = DupeGuru()
results = Results(app)
results.groups = groups
@@ -607,11 +609,11 @@ class TestCaseResultsXML:
r.load_from_xml(f, get_file)
g = r.groups[0]
eq_("\xe9foo bar", g[0].name)
eq_(['efoo', 'bar'], g[0].words)
eq_(["efoo", "bar"], g[0].words)
def test_load_invalid_xml(self):
f = io.BytesIO()
f.write(b'<this is invalid')
f.write(b"<this is invalid")
f.seek(0)
app = DupeGuru()
r = Results(app)
@@ -623,7 +625,7 @@ class TestCaseResultsXML:
app = DupeGuru()
r = Results(app)
with raises(IOError):
r.load_from_xml('does_not_exist.xml', None)
r.load_from_xml("does_not_exist.xml", None)
eq_(0, len(r.groups))
def test_remember_match_percentage(self):
@@ -643,12 +645,12 @@ class TestCaseResultsXML:
results.load_from_xml(f, self.get_file)
group = results.groups[0]
d1, d2, d3 = group
match = group.get_match_of(d2) #d1 - d2
match = group.get_match_of(d2) # d1 - d2
eq_(42, match[2])
match = group.get_match_of(d3) #d1 - d3
match = group.get_match_of(d3) # d1 - d3
eq_(43, match[2])
group.switch_ref(d2)
match = group.get_match_of(d3) #d2 - d3
match = group.get_match_of(d3) # d2 - d3
eq_(46, match[2])
def test_save_and_load(self):
@@ -661,12 +663,12 @@ class TestCaseResultsXML:
def test_apply_filter_works_on_paths(self):
# apply_filter() searches on the whole path, not just on the filename.
self.results.apply_filter('basepath')
self.results.apply_filter("basepath")
eq_(len(self.results.groups), 2)
def test_save_xml_with_invalid_characters(self):
# Don't crash when saving files that have invalid xml characters in their path
self.objects[0].name = 'foo\x19'
self.objects[0].name = "foo\x19"
self.results.save_to_xml(io.BytesIO()) # don't crash
@@ -676,7 +678,7 @@ class TestCaseResultsFilter:
self.results = self.app.results
self.objects, self.matches, self.groups = GetTestGroups()
self.results.groups = self.groups
self.results.apply_filter(r'foo')
self.results.apply_filter(r"foo")
def test_groups(self):
eq_(1, len(self.results.groups))
@@ -694,7 +696,7 @@ class TestCaseResultsFilter:
def test_dupes_reconstructed_filtered(self):
# make_ref resets self.__dupes to None. When it's reconstructed, we want it filtered
dupe = self.results.dupes[0] #3rd object
dupe = self.results.dupes[0] # 3rd object
self.results.make_ref(dupe)
eq_(1, len(self.results.dupes))
assert self.results.dupes[0] is self.objects[0]
@@ -702,23 +704,23 @@ class TestCaseResultsFilter:
def test_include_ref_dupes_in_filter(self):
# When only the ref of a group match the filter, include it in the group
self.results.apply_filter(None)
self.results.apply_filter(r'foo bar')
self.results.apply_filter(r"foo bar")
eq_(1, len(self.results.groups))
eq_(0, len(self.results.dupes))
def test_filters_build_on_one_another(self):
self.results.apply_filter(r'bar')
self.results.apply_filter(r"bar")
eq_(1, len(self.results.groups))
eq_(0, len(self.results.dupes))
def test_stat_line(self):
expected = '0 / 1 (0.00 B / 1.00 B) duplicates marked. filter: foo'
expected = "0 / 1 (0.00 B / 1.00 B) duplicates marked. filter: foo"
eq_(expected, self.results.stat_line)
self.results.apply_filter(r'bar')
expected = '0 / 0 (0.00 B / 0.00 B) duplicates marked. filter: foo --> bar'
self.results.apply_filter(r"bar")
expected = "0 / 0 (0.00 B / 0.00 B) duplicates marked. filter: foo --> bar"
eq_(expected, self.results.stat_line)
self.results.apply_filter(None)
expected = '0 / 3 (0.00 B / 1.01 KB) duplicates marked.'
expected = "0 / 3 (0.00 B / 1.01 KB) duplicates marked."
eq_(expected, self.results.stat_line)
def test_mark_count_is_filtered_as_well(self):
@@ -726,8 +728,8 @@ class TestCaseResultsFilter:
# We don't want to perform mark_all() because we want the mark list to contain objects
for dupe in self.results.dupes:
self.results.mark(dupe)
self.results.apply_filter(r'foo')
expected = '1 / 1 (1.00 B / 1.00 B) duplicates marked. filter: foo'
self.results.apply_filter(r"foo")
expected = "1 / 1 (1.00 B / 1.00 B) duplicates marked. filter: foo"
eq_(expected, self.results.stat_line)
def test_mark_all_only_affects_filtered_items(self):
@@ -741,20 +743,20 @@ class TestCaseResultsFilter:
self.results.apply_filter(None)
self.results.make_ref(self.objects[1]) # to have the 1024 B object as ref
g1, g2 = self.groups
self.results.apply_filter('a') # Matches both group
self.results.sort_groups('size')
self.results.apply_filter("a") # Matches both group
self.results.sort_groups("size")
assert self.results.groups[0] is g2
assert self.results.groups[1] is g1
self.results.apply_filter(None)
assert self.results.groups[0] is g2
assert self.results.groups[1] is g1
self.results.sort_groups('size', False)
self.results.apply_filter('a')
self.results.sort_groups("size", False)
self.results.apply_filter("a")
assert self.results.groups[1] is g2
assert self.results.groups[0] is g1
def test_set_group(self):
#We want the new group to be filtered
# We want the new group to be filtered
self.objects, self.matches, self.groups = GetTestGroups()
self.results.groups = self.groups
eq_(1, len(self.results.groups))
@@ -764,12 +766,12 @@ class TestCaseResultsFilter:
def get_file(path):
return [f for f in self.objects if str(f.path) == path][0]
filename = str(tmpdir.join('dupeguru_results.xml'))
self.objects[4].name = 'ibabtu 2' #we can't have 2 files with the same path
filename = str(tmpdir.join("dupeguru_results.xml"))
self.objects[4].name = "ibabtu 2" # we can't have 2 files with the same path
self.results.save_to_xml(filename)
app = DupeGuru()
r = Results(app)
r.apply_filter('foo')
r.apply_filter("foo")
r.load_from_xml(filename, get_file)
eq_(2, len(r.groups))
@@ -778,7 +780,7 @@ class TestCaseResultsFilter:
self.results.apply_filter(None)
eq_(2, len(self.results.groups))
eq_(2, len(self.results.dupes))
self.results.apply_filter('ibabtu')
self.results.apply_filter("ibabtu")
self.results.remove_duplicates([self.results.dupes[0]])
self.results.apply_filter(None)
eq_(1, len(self.results.groups))
@@ -786,7 +788,7 @@ class TestCaseResultsFilter:
def test_filter_is_case_insensitive(self):
self.results.apply_filter(None)
self.results.apply_filter('FOO')
self.results.apply_filter("FOO")
eq_(1, len(self.results.dupes))
def test_make_ref_on_filtered_out_doesnt_mess_stats(self):
@@ -797,10 +799,10 @@ class TestCaseResultsFilter:
bar_bleh = g1[1] # The "bar bleh" dupe is filtered out
self.results.make_ref(bar_bleh)
# Now the stats should display *2* markable dupes (instead of 1)
expected = '0 / 2 (0.00 B / 2.00 B) duplicates marked. filter: foo'
expected = "0 / 2 (0.00 B / 2.00 B) duplicates marked. filter: foo"
eq_(expected, self.results.stat_line)
self.results.apply_filter(None) # Now let's make sure our unfiltered results aren't fucked up
expected = '0 / 3 (0.00 B / 3.00 B) duplicates marked.'
expected = "0 / 3 (0.00 B / 3.00 B) duplicates marked."
eq_(expected, self.results.stat_line)
@@ -814,6 +816,5 @@ class TestCaseResultsRefFile:
self.results.groups = self.groups
def test_stat_line(self):
expected = '0 / 2 (0.00 B / 2.00 B) duplicates marked.'
expected = "0 / 2 (0.00 B / 2.00 B) duplicates marked."
eq_(expected, self.results.stat_line)

View File

@@ -4,6 +4,8 @@
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
import pytest
from hscommon.jobprogress import job
from hscommon.path import Path
from hscommon.testutil import eq_
@@ -14,6 +16,7 @@ from ..ignore import IgnoreList
from ..scanner import Scanner, ScanType
from ..me.scanner import ScannerME
class NamedObject:
def __init__(self, name="foobar", size=1, path=None):
if path is None:
@@ -26,64 +29,85 @@ class NamedObject:
self.words = getwords(name)
def __repr__(self):
return '<NamedObject %r %r>' % (self.name, self.path)
return "<NamedObject %r %r>" % (self.name, self.path)
no = NamedObject
def pytest_funcarg__fake_fileexists(request):
@pytest.fixture
def fake_fileexists(request):
# This is a hack to avoid invalidating all previous tests since the scanner started to test
# for file existence before doing the match grouping.
monkeypatch = request.getfuncargvalue('monkeypatch')
monkeypatch.setattr(Path, 'exists', lambda _: True)
monkeypatch = request.getfixturevalue("monkeypatch")
monkeypatch.setattr(Path, "exists", lambda _: True)
def test_empty(fake_fileexists):
s = Scanner()
r = s.get_dupe_groups([])
eq_(r, [])
def test_default_settings(fake_fileexists):
s = Scanner()
eq_(s.min_match_percentage, 80)
eq_(s.scan_type, ScanType.Filename)
eq_(s.scan_type, ScanType.FILENAME)
eq_(s.mix_file_kind, True)
eq_(s.word_weighting, False)
eq_(s.match_similar_words, False)
eq_(s.size_threshold, 0)
eq_(s.large_size_threshold, 0)
eq_(s.big_file_size_threshold, 0)
def test_simple_with_default_settings(fake_fileexists):
s = Scanner()
f = [no('foo bar', path='p1'), no('foo bar', path='p2'), no('foo bleh')]
f = [no("foo bar", path="p1"), no("foo bar", path="p2"), no("foo bleh")]
r = s.get_dupe_groups(f)
eq_(len(r), 1)
g = r[0]
#'foo bleh' cannot be in the group because the default min match % is 80
# 'foo bleh' cannot be in the group because the default min match % is 80
eq_(len(g), 2)
assert g.ref in f[:2]
assert g.dupes[0] in f[:2]
def test_simple_with_lower_min_match(fake_fileexists):
s = Scanner()
s.min_match_percentage = 50
f = [no('foo bar', path='p1'), no('foo bar', path='p2'), no('foo bleh')]
f = [no("foo bar", path="p1"), no("foo bar", path="p2"), no("foo bleh")]
r = s.get_dupe_groups(f)
eq_(len(r), 1)
g = r[0]
eq_(len(g), 3)
def test_trim_all_ref_groups(fake_fileexists):
# When all files of a group are ref, don't include that group in the results, but also don't
# count the files from that group as discarded.
s = Scanner()
f = [no('foo', path='p1'), no('foo', path='p2'), no('bar', path='p1'), no('bar', path='p2')]
f = [
no("foo", path="p1"),
no("foo", path="p2"),
no("bar", path="p1"),
no("bar", path="p2"),
]
f[2].is_ref = True
f[3].is_ref = True
r = s.get_dupe_groups(f)
eq_(len(r), 1)
eq_(s.discarded_file_count, 0)
def test_priorize(fake_fileexists):
def test_prioritize(fake_fileexists):
s = Scanner()
f = [no('foo', path='p1'), no('foo', path='p2'), no('bar', path='p1'), no('bar', path='p2')]
f = [
no("foo", path="p1"),
no("foo", path="p2"),
no("bar", path="p1"),
no("bar", path="p2"),
]
f[1].size = 2
f[2].size = 3
f[3].is_ref = True
@@ -94,36 +118,112 @@ def test_priorize(fake_fileexists):
assert f[3] in (g1.ref, g2.ref)
assert f[2] in (g1.dupes[0], g2.dupes[0])
def test_content_scan(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Contents
f = [no('foo'), no('bar'), no('bleh')]
f[0].md5 = f[0].md5partial = 'foobar'
f[1].md5 = f[1].md5partial = 'foobar'
f[2].md5 = f[2].md5partial = 'bleh'
s.scan_type = ScanType.CONTENTS
f = [no("foo"), no("bar"), no("bleh")]
f[0].md5 = f[0].md5partial = f[0].md5samples = "foobar"
f[1].md5 = f[1].md5partial = f[1].md5samples = "foobar"
f[2].md5 = f[2].md5partial = f[2].md5samples = "bleh"
r = s.get_dupe_groups(f)
eq_(len(r), 1)
eq_(len(r[0]), 2)
eq_(s.discarded_file_count, 0) # don't count the different md5 as discarded!
def test_content_scan_compare_sizes_first(fake_fileexists):
class MyFile(no):
@property
def md5(file):
def md5(self):
raise AssertionError()
s = Scanner()
s.scan_type = ScanType.Contents
f = [MyFile('foo', 1), MyFile('bar', 2)]
s.scan_type = ScanType.CONTENTS
f = [MyFile("foo", 1), MyFile("bar", 2)]
eq_(len(s.get_dupe_groups(f)), 0)
def test_ignore_file_size(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.CONTENTS
small_size = 10 # 10KB
s.size_threshold = 0
large_size = 100 * 1024 * 1024 # 100MB
s.large_size_threshold = 0
f = [
no("smallignore1", small_size - 1),
no("smallignore2", small_size - 1),
no("small1", small_size),
no("small2", small_size),
no("large1", large_size),
no("large2", large_size),
no("largeignore1", large_size + 1),
no("largeignore2", large_size + 1),
]
f[0].md5 = f[0].md5partial = f[0].md5samples = "smallignore"
f[1].md5 = f[1].md5partial = f[1].md5samples = "smallignore"
f[2].md5 = f[2].md5partial = f[2].md5samples = "small"
f[3].md5 = f[3].md5partial = f[3].md5samples = "small"
f[4].md5 = f[4].md5partial = f[4].md5samples = "large"
f[5].md5 = f[5].md5partial = f[5].md5samples = "large"
f[6].md5 = f[6].md5partial = f[6].md5samples = "largeignore"
f[7].md5 = f[7].md5partial = f[7].md5samples = "largeignore"
r = s.get_dupe_groups(f)
# No ignores
eq_(len(r), 4)
# Ignore smaller
s.size_threshold = small_size
r = s.get_dupe_groups(f)
eq_(len(r), 3)
# Ignore larger
s.size_threshold = 0
s.large_size_threshold = large_size
r = s.get_dupe_groups(f)
eq_(len(r), 3)
# Ignore both
s.size_threshold = small_size
r = s.get_dupe_groups(f)
eq_(len(r), 2)
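
As an aside, a minimal sketch of the bracketing behaviour this test exercises (a hypothetical helper, not the Scanner's actual code): files smaller than size_threshold and files larger than large_size_threshold are both skipped, and a threshold of 0 disables that check.

# Hypothetical illustration only -- not dupeGuru's implementation.
def is_scannable(file_size, size_threshold=0, large_size_threshold=0):
    if size_threshold and file_size < size_threshold:
        return False  # below the minimum size: ignored
    if large_size_threshold and file_size > large_size_threshold:
        return False  # above the "ignore large files" threshold: ignored
    return True
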
def test_big_file_partial_hashes(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.CONTENTS
smallsize = 1
bigsize = 100 * 1024 * 1024 # 100MB
s.big_file_size_threshold = bigsize
f = [no("bigfoo", bigsize), no("bigbar", bigsize), no("smallfoo", smallsize), no("smallbar", smallsize)]
f[0].md5 = f[0].md5partial = f[0].md5samples = "foobar"
f[1].md5 = f[1].md5partial = f[1].md5samples = "foobar"
f[2].md5 = f[2].md5partial = "bleh"
f[3].md5 = f[3].md5partial = "bleh"
r = s.get_dupe_groups(f)
eq_(len(r), 2)
# md5partial is still the same, but the file is actually different
f[1].md5 = f[1].md5samples = "difffoobar"
# here we compare the full md5s, as the user disabled the optimization
s.big_file_size_threshold = 0
r = s.get_dupe_groups(f)
eq_(len(r), 1)
# here we should compare the md5samples, and see they are different
s.big_file_size_threshold = bigsize
r = s.get_dupe_groups(f)
eq_(len(r), 1)
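
A rough sketch of the tiered comparison idea the test above exercises (illustrative only, not dupeGuru's actual matching code): cheap checks first, sampled hashes for files over the big-file threshold, full hashes otherwise.

# Illustrative sketch: how md5partial, md5samples and md5 could be consulted
# in order of increasing cost; not the real implementation.
def contents_equal(a, b, big_file_size_threshold=0):
    if a.size != b.size:
        return False                                # cheapest check
    if a.md5partial != b.md5partial:
        return False                                # hash of the first chunk only
    if big_file_size_threshold and a.size >= big_file_size_threshold:
        return a.md5samples == b.md5samples         # sampled chunks for big files
    return a.md5 == b.md5                           # full hash otherwise
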
def test_min_match_perc_doesnt_matter_for_content_scan(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Contents
f = [no('foo'), no('bar'), no('bleh')]
f[0].md5 = f[0].md5partial = 'foobar'
f[1].md5 = f[1].md5partial = 'foobar'
f[2].md5 = f[2].md5partial = 'bleh'
s.scan_type = ScanType.CONTENTS
f = [no("foo"), no("bar"), no("bleh")]
f[0].md5 = f[0].md5partial = f[0].md5samples = "foobar"
f[1].md5 = f[1].md5partial = f[1].md5samples = "foobar"
f[2].md5 = f[2].md5partial = f[2].md5samples = "bleh"
s.min_match_percentage = 101
r = s.get_dupe_groups(f)
eq_(len(r), 1)
@@ -133,157 +233,177 @@ def test_min_match_perc_doesnt_matter_for_content_scan(fake_fileexists):
eq_(len(r), 1)
eq_(len(r[0]), 2)
def test_content_scan_doesnt_put_md5_in_words_at_the_end(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Contents
f = [no('foo'), no('bar')]
f[0].md5 = f[0].md5partial = '\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f'
f[1].md5 = f[1].md5partial = '\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f'
s.scan_type = ScanType.CONTENTS
f = [no("foo"), no("bar")]
f[0].md5 = f[0].md5partial = f[0].md5samples = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
f[1].md5 = f[1].md5partial = f[1].md5samples = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
r = s.get_dupe_groups(f)
# FIXME looks like we are missing something here?
r[0]
def test_extension_is_not_counted_in_filename_scan(fake_fileexists):
s = Scanner()
s.min_match_percentage = 100
f = [no('foo.bar'), no('foo.bleh')]
f = [no("foo.bar"), no("foo.bleh")]
r = s.get_dupe_groups(f)
eq_(len(r), 1)
eq_(len(r[0]), 2)
def test_job(fake_fileexists):
def do_progress(progress, desc=''):
def do_progress(progress, desc=""):
log.append(progress)
return True
s = Scanner()
log = []
f = [no('foo bar'), no('foo bar'), no('foo bleh')]
f = [no("foo bar"), no("foo bar"), no("foo bleh")]
s.get_dupe_groups(f, j=job.Job(1, do_progress))
eq_(log[0], 0)
eq_(log[-1], 100)
def test_mix_file_kind(fake_fileexists):
s = Scanner()
s.mix_file_kind = False
f = [no('foo.1'), no('foo.2')]
f = [no("foo.1"), no("foo.2")]
r = s.get_dupe_groups(f)
eq_(len(r), 0)
def test_word_weighting(fake_fileexists):
s = Scanner()
s.min_match_percentage = 75
s.word_weighting = True
f = [no('foo bar'), no('foo bar bleh')]
f = [no("foo bar"), no("foo bar bleh")]
r = s.get_dupe_groups(f)
eq_(len(r), 1)
g = r[0]
m = g.get_match_of(g.dupes[0])
eq_(m.percentage, 75) # 16 letters, 12 matching
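
To spell out the arithmetic behind that comment: "foo bar" contributes 6 letters and "foo bar bleh" contributes 10, for 16 in total; the shared words "foo" and "bar" account for 12 of them, so the weighted match is 12/16 = 75%.
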
def test_similar_words(fake_fileexists):
s = Scanner()
s.match_similar_words = True
f = [no('The White Stripes'), no('The Whites Stripe'), no('Limp Bizkit'), no('Limp Bizkitt')]
f = [
no("The White Stripes"),
no("The Whites Stripe"),
no("Limp Bizkit"),
no("Limp Bizkitt"),
]
r = s.get_dupe_groups(f)
eq_(len(r), 2)
def test_fields(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Fields
f = [no('The White Stripes - Little Ghost'), no('The White Stripes - Little Acorn')]
s.scan_type = ScanType.FIELDS
f = [no("The White Stripes - Little Ghost"), no("The White Stripes - Little Acorn")]
r = s.get_dupe_groups(f)
eq_(len(r), 0)
def test_fields_no_order(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.FieldsNoOrder
f = [no('The White Stripes - Little Ghost'), no('Little Ghost - The White Stripes')]
s.scan_type = ScanType.FIELDSNOORDER
f = [no("The White Stripes - Little Ghost"), no("Little Ghost - The White Stripes")]
r = s.get_dupe_groups(f)
eq_(len(r), 1)
def test_tag_scan(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Tag
o1 = no('foo')
o2 = no('bar')
o1.artist = 'The White Stripes'
o1.title = 'The Air Near My Fingers'
o2.artist = 'The White Stripes'
o2.title = 'The Air Near My Fingers'
s.scan_type = ScanType.TAG
o1 = no("foo")
o2 = no("bar")
o1.artist = "The White Stripes"
o1.title = "The Air Near My Fingers"
o2.artist = "The White Stripes"
o2.title = "The Air Near My Fingers"
r = s.get_dupe_groups([o1, o2])
eq_(len(r), 1)
def test_tag_with_album_scan(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Tag
s.scanned_tags = set(['artist', 'album', 'title'])
o1 = no('foo')
o2 = no('bar')
o3 = no('bleh')
o1.artist = 'The White Stripes'
o1.title = 'The Air Near My Fingers'
o1.album = 'Elephant'
o2.artist = 'The White Stripes'
o2.title = 'The Air Near My Fingers'
o2.album = 'Elephant'
o3.artist = 'The White Stripes'
o3.title = 'The Air Near My Fingers'
o3.album = 'foobar'
s.scan_type = ScanType.TAG
s.scanned_tags = set(["artist", "album", "title"])
o1 = no("foo")
o2 = no("bar")
o3 = no("bleh")
o1.artist = "The White Stripes"
o1.title = "The Air Near My Fingers"
o1.album = "Elephant"
o2.artist = "The White Stripes"
o2.title = "The Air Near My Fingers"
o2.album = "Elephant"
o3.artist = "The White Stripes"
o3.title = "The Air Near My Fingers"
o3.album = "foobar"
r = s.get_dupe_groups([o1, o2, o3])
eq_(len(r), 1)
def test_that_dash_in_tags_dont_create_new_fields(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Tag
s.scanned_tags = set(['artist', 'album', 'title'])
s.scan_type = ScanType.TAG
s.scanned_tags = set(["artist", "album", "title"])
s.min_match_percentage = 50
o1 = no('foo')
o2 = no('bar')
o1.artist = 'The White Stripes - a'
o1.title = 'The Air Near My Fingers - a'
o1.album = 'Elephant - a'
o2.artist = 'The White Stripes - b'
o2.title = 'The Air Near My Fingers - b'
o2.album = 'Elephant - b'
o1 = no("foo")
o2 = no("bar")
o1.artist = "The White Stripes - a"
o1.title = "The Air Near My Fingers - a"
o1.album = "Elephant - a"
o2.artist = "The White Stripes - b"
o2.title = "The Air Near My Fingers - b"
o2.album = "Elephant - b"
r = s.get_dupe_groups([o1, o2])
eq_(len(r), 1)
def test_tag_scan_with_different_scanned(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Tag
s.scanned_tags = set(['track', 'year'])
o1 = no('foo')
o2 = no('bar')
o1.artist = 'The White Stripes'
o1.title = 'some title'
o1.track = 'foo'
o1.year = 'bar'
o2.artist = 'The White Stripes'
o2.title = 'another title'
o2.track = 'foo'
o2.year = 'bar'
s.scan_type = ScanType.TAG
s.scanned_tags = set(["track", "year"])
o1 = no("foo")
o2 = no("bar")
o1.artist = "The White Stripes"
o1.title = "some title"
o1.track = "foo"
o1.year = "bar"
o2.artist = "The White Stripes"
o2.title = "another title"
o2.track = "foo"
o2.year = "bar"
r = s.get_dupe_groups([o1, o2])
eq_(len(r), 1)
def test_tag_scan_only_scans_existing_tags(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Tag
s.scanned_tags = set(['artist', 'foo'])
o1 = no('foo')
o2 = no('bar')
o1.artist = 'The White Stripes'
o1.foo = 'foo'
o2.artist = 'The White Stripes'
o2.foo = 'bar'
s.scan_type = ScanType.TAG
s.scanned_tags = set(["artist", "foo"])
o1 = no("foo")
o2 = no("bar")
o1.artist = "The White Stripes"
o1.foo = "foo"
o2.artist = "The White Stripes"
o2.foo = "bar"
r = s.get_dupe_groups([o1, o2])
eq_(len(r), 1) # Because 'foo' is not scanned, they match
def test_tag_scan_converts_to_str(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Tag
s.scanned_tags = set(['track'])
o1 = no('foo')
o2 = no('bar')
s.scan_type = ScanType.TAG
s.scanned_tags = set(["track"])
o1 = no("foo")
o2 = no("bar")
o1.track = 42
o2.track = 42
try:
@@ -292,31 +412,33 @@ def test_tag_scan_converts_to_str(fake_fileexists):
raise AssertionError()
eq_(len(r), 1)
def test_tag_scan_non_ascii(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.Tag
s.scanned_tags = set(['title'])
o1 = no('foo')
o2 = no('bar')
o1.title = 'foobar\u00e9'
o2.title = 'foobar\u00e9'
s.scan_type = ScanType.TAG
s.scanned_tags = set(["title"])
o1 = no("foo")
o2 = no("bar")
o1.title = "foobar\u00e9"
o2.title = "foobar\u00e9"
try:
r = s.get_dupe_groups([o1, o2])
except UnicodeEncodeError:
raise AssertionError()
eq_(len(r), 1)
def test_ignore_list(fake_fileexists):
s = Scanner()
f1 = no('foobar')
f2 = no('foobar')
f3 = no('foobar')
f1.path = Path('dir1/foobar')
f2.path = Path('dir2/foobar')
f3.path = Path('dir3/foobar')
f1 = no("foobar")
f2 = no("foobar")
f3 = no("foobar")
f1.path = Path("dir1/foobar")
f2.path = Path("dir2/foobar")
f3.path = Path("dir3/foobar")
ignore_list = IgnoreList()
ignore_list.Ignore(str(f1.path), str(f2.path))
ignore_list.Ignore(str(f1.path), str(f3.path))
ignore_list.ignore(str(f1.path), str(f2.path))
ignore_list.ignore(str(f1.path), str(f3.path))
r = s.get_dupe_groups([f1, f2, f3], ignore_list=ignore_list)
eq_(len(r), 1)
g = r[0]
@@ -327,19 +449,20 @@ def test_ignore_list(fake_fileexists):
# Ignored matches are not counted as discarded
eq_(s.discarded_file_count, 0)
def test_ignore_list_checks_for_unicode(fake_fileexists):
#scanner was calling path_str for ignore list checks. Since the Path changes, it must
#be unicode(path)
# scanner was calling path_str for ignore list checks. Since the Path changes, it must
# be unicode(path)
s = Scanner()
f1 = no('foobar')
f2 = no('foobar')
f3 = no('foobar')
f1.path = Path('foo1\u00e9')
f2.path = Path('foo2\u00e9')
f3.path = Path('foo3\u00e9')
f1 = no("foobar")
f2 = no("foobar")
f3 = no("foobar")
f1.path = Path("foo1\u00e9")
f2.path = Path("foo2\u00e9")
f3.path = Path("foo3\u00e9")
ignore_list = IgnoreList()
ignore_list.Ignore(str(f1.path), str(f2.path))
ignore_list.Ignore(str(f1.path), str(f3.path))
ignore_list.ignore(str(f1.path), str(f2.path))
ignore_list.ignore(str(f1.path), str(f3.path))
r = s.get_dupe_groups([f1, f2, f3], ignore_list=ignore_list)
eq_(len(r), 1)
g = r[0]
@@ -348,6 +471,7 @@ def test_ignore_list_checks_for_unicode(fake_fileexists):
assert f2 in g
assert f3 in g
def test_file_evaluates_to_false(fake_fileexists):
# A very wrong way to use any() was added at some point, causing the resulting group list
# to be empty.
@@ -355,19 +479,19 @@ def test_file_evaluates_to_false(fake_fileexists):
def __bool__(self):
return False
s = Scanner()
f1 = FalseNamedObject('foobar', path='p1')
f2 = FalseNamedObject('foobar', path='p2')
f1 = FalseNamedObject("foobar", path="p1")
f2 = FalseNamedObject("foobar", path="p2")
r = s.get_dupe_groups([f1, f2])
eq_(len(r), 1)
def test_size_threshold(fake_fileexists):
# Only files equal to or larger in size than size_threshold are scanned
s = Scanner()
f1 = no('foo', 1, path='p1')
f2 = no('foo', 2, path='p2')
f3 = no('foo', 3, path='p3')
f1 = no("foo", 1, path="p1")
f2 = no("foo", 2, path="p2")
f3 = no("foo", 3, path="p3")
s.size_threshold = 2
groups = s.get_dupe_groups([f1, f2, f3])
eq_(len(groups), 1)
@@ -377,48 +501,52 @@ def test_size_threshold(fake_fileexists):
assert f2 in group
assert f3 in group
def test_tie_breaker_path_deepness(fake_fileexists):
# If there is a tie in prioritization, path deepness is used as a tie breaker
s = Scanner()
o1, o2 = no('foo'), no('foo')
o1.path = Path('foo')
o2.path = Path('foo/bar')
o1, o2 = no("foo"), no("foo")
o1.path = Path("foo")
o2.path = Path("foo/bar")
[group] = s.get_dupe_groups([o1, o2])
assert group.ref is o2
def test_tie_breaker_copy(fake_fileexists):
# if copy is in the words used (even if it has a deeper path), it becomes a dupe
s = Scanner()
o1, o2 = no('foo bar Copy'), no('foo bar')
o1.path = Path('deeper/path')
o2.path = Path('foo')
o1, o2 = no("foo bar Copy"), no("foo bar")
o1.path = Path("deeper/path")
o2.path = Path("foo")
[group] = s.get_dupe_groups([o1, o2])
assert group.ref is o2
def test_tie_breaker_same_name_plus_digit(fake_fileexists):
# if ref has the same words as the dupe but just one extra word which is a digit, it
# becomes a dupe
s = Scanner()
o1 = no('foo bar 42')
o2 = no('foo bar [42]')
o3 = no('foo bar (42)')
o4 = no('foo bar {42}')
o5 = no('foo bar')
o1 = no("foo bar 42")
o2 = no("foo bar [42]")
o3 = no("foo bar (42)")
o4 = no("foo bar {42}")
o5 = no("foo bar")
# all numbered names have deeper paths, so they'll end up ref if the digits aren't correctly
# used as tie breakers
o1.path = Path('deeper/path')
o2.path = Path('deeper/path')
o3.path = Path('deeper/path')
o4.path = Path('deeper/path')
o5.path = Path('foo')
o1.path = Path("deeper/path")
o2.path = Path("deeper/path")
o3.path = Path("deeper/path")
o4.path = Path("deeper/path")
o5.path = Path("foo")
[group] = s.get_dupe_groups([o1, o2, o3, o4, o5])
assert group.ref is o5
def test_partial_group_match(fake_fileexists):
# Count the number of discarded matches (when a file doesn't match all other dupes of the
# group) in Scanner.discarded_file_count
s = Scanner()
o1, o2, o3 = no('a b'), no('a'), no('b')
o1, o2, o3 = no("a b"), no("a"), no("b")
s.min_match_percentage = 50
[group] = s.get_dupe_groups([o1, o2, o3])
eq_(len(group), 2)
@@ -431,16 +559,17 @@ def test_partial_group_match(fake_fileexists):
assert o3 in group
eq_(s.discarded_file_count, 1)
def test_dont_group_files_that_dont_exist(tmpdir):
# when creating groups, check that files exist first. It's possible that these files have
# been moved during the scan by the user.
# In this test, we have to delete one of the files between the get_matches() part and the
# get_groups() part.
s = Scanner()
s.scan_type = ScanType.Contents
s.scan_type = ScanType.CONTENTS
p = Path(str(tmpdir))
p['file1'].open('w').write('foo')
p['file2'].open('w').write('foo')
p["file1"].open("w").write("foo")
p["file2"].open("w").write("foo")
file1, file2 = fs.get_files(p)
def getmatches(*args, **kw):
@@ -451,61 +580,64 @@ def test_dont_group_files_that_dont_exist(tmpdir):
assert not s.get_dupe_groups([file1, file2])
def test_folder_scan_exclude_subfolder_matches(fake_fileexists):
# when doing a Folders scan type, don't include matches for folders whose parent folder already
# matches.
s = Scanner()
s.scan_type = ScanType.Folders
s.scan_type = ScanType.FOLDERS
topf1 = no("top folder 1", size=42)
topf1.md5 = topf1.md5partial = b"some_md5_1"
topf1.path = Path('/topf1')
topf1.md5 = topf1.md5partial = topf1.md5samples = b"some_md5_1"
topf1.path = Path("/topf1")
topf2 = no("top folder 2", size=42)
topf2.md5 = topf2.md5partial = b"some_md5_1"
topf2.path = Path('/topf2')
topf2.md5 = topf2.md5partial = topf2.md5samples = b"some_md5_1"
topf2.path = Path("/topf2")
subf1 = no("sub folder 1", size=41)
subf1.md5 = subf1.md5partial = b"some_md5_2"
subf1.path = Path('/topf1/sub')
subf1.md5 = subf1.md5partial = subf1.md5samples = b"some_md5_2"
subf1.path = Path("/topf1/sub")
subf2 = no("sub folder 2", size=41)
subf2.md5 = subf2.md5partial = b"some_md5_2"
subf2.path = Path('/topf2/sub')
subf2.md5 = subf2.md5partial = subf2.md5samples = b"some_md5_2"
subf2.path = Path("/topf2/sub")
eq_(len(s.get_dupe_groups([topf1, topf2, subf1, subf2])), 1) # only top folders
# however, if another folder matches a subfolder, keep it in the matches
otherf = no("other folder", size=41)
otherf.md5 = otherf.md5partial = b"some_md5_2"
otherf.path = Path('/otherfolder')
otherf.md5 = otherf.md5partial = otherf.md5samples = b"some_md5_2"
otherf.path = Path("/otherfolder")
eq_(len(s.get_dupe_groups([topf1, topf2, subf1, subf2, otherf])), 2)
def test_ignore_files_with_same_path(fake_fileexists):
# It's possible that the scanner is fed with two file instances pointing to the same path. One
# of these files has to be ignored
s = Scanner()
f1 = no('foobar', path='path1/foobar')
f2 = no('foobar', path='path1/foobar')
f1 = no("foobar", path="path1/foobar")
f2 = no("foobar", path="path1/foobar")
eq_(s.get_dupe_groups([f1, f2]), [])
def test_dont_count_ref_files_as_discarded(fake_fileexists):
# To speed up the scan, we don't bother comparing contents of files that are both ref files.
# However, this causes problems in "discarded" counting and we make sure here that we don't
# report discarded matches in exact duplicate scans.
s = Scanner()
s.scan_type = ScanType.Contents
s.scan_type = ScanType.CONTENTS
o1 = no("foo", path="p1")
o2 = no("foo", path="p2")
o3 = no("foo", path="p3")
o1.md5 = o1.md5partial = 'foobar'
o2.md5 = o2.md5partial = 'foobar'
o3.md5 = o3.md5partial = 'foobar'
o1.md5 = o1.md5partial = o1.md5samples = "foobar"
o2.md5 = o2.md5partial = o2.md5samples = "foobar"
o3.md5 = o3.md5partial = o3.md5samples = "foobar"
o1.is_ref = True
o2.is_ref = True
eq_(len(s.get_dupe_groups([o1, o2, o3])), 1)
eq_(s.discarded_file_count, 0)
def test_priorize_me(fake_fileexists):
# in ScannerME, bitrate goes first (right after is_ref) in priorization
def test_prioritize_me(fake_fileexists):
# in ScannerME, bitrate goes first (right after is_ref) in prioritization
s = ScannerME()
o1, o2 = no('foo', path='p1'), no('foo', path='p2')
o1, o2 = no("foo", path="p1"), no("foo", path="p2")
o1.bitrate = 1
o2.bitrate = 2
[group] = s.get_dupe_groups([o1, o2])
assert group.ref is o2


@@ -5,38 +5,46 @@
# http://www.gnu.org/licenses/gpl-3.0.html
import time
import sys
import os
from hscommon.util import format_time_decimal
def format_timestamp(t, delta):
if delta:
return format_time_decimal(t)
else:
if t > 0:
return time.strftime('%Y/%m/%d %H:%M:%S', time.localtime(t))
return time.strftime("%Y/%m/%d %H:%M:%S", time.localtime(t))
else:
return '---'
return "---"
def format_words(w):
def do_format(w):
if isinstance(w, list):
return '(%s)' % ', '.join(do_format(item) for item in w)
return "(%s)" % ", ".join(do_format(item) for item in w)
else:
return w.replace('\n', ' ')
return w.replace("\n", " ")
return ", ".join(do_format(item) for item in w)
return ', '.join(do_format(item) for item in w)
def format_perc(p):
return "%0.0f" % p
def format_dupe_count(c):
return str(c) if c else '---'
return str(c) if c else "---"
def cmp_value(dupe, attrname):
value = getattr(dupe, attrname, '')
value = getattr(dupe, attrname, "")
return value.lower() if isinstance(value, str) else value
def fix_surrogate_encoding(s, encoding='utf-8'):
def fix_surrogate_encoding(s, encoding="utf-8"):
# ref #210. It's possible to end up with file paths that, while correct unicode strings, are
# decoded with the 'surrogateescape' option, which makes the string unencodable to utf-8. We fix
# these strings here by trying to encode them and, if it fails, we do an encode/decode dance
@@ -49,8 +57,10 @@ def fix_surrogate_encoding(s, encoding='utf-8'):
try:
s.encode(encoding)
except UnicodeEncodeError:
return s.encode(encoding, 'replace').decode(encoding)
return s.encode(encoding, "replace").decode(encoding)
else:
return s
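
A minimal usage sketch of the behaviour described above, assuming fix_surrogate_encoding is importable from this module; the exact replacement character comes from the "replace" error handler.

# Minimal sketch, hypothetical usage only.
bad = b"caf\xe9".decode("utf-8", "surrogateescape")  # 'caf\udce9', contains a lone surrogate
# bad.encode("utf-8") would raise UnicodeEncodeError here.
fixed = fix_surrogate_encoding(bad)                   # should yield something like 'caf?'
fixed.encode("utf-8")                                 # no longer raises
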
def executable_folder():
return os.path.dirname(os.path.abspath(sys.argv[0]))


@@ -1,3 +1,77 @@
=== 4.2.1 (2022-03-25)
* Default to English on unsupported system language (#976)
* Fix image viewer zoom datatype issue (#978)
* Fix errors from window change event (#937, #980)
* Fix deprecation warning from SQLite
* Enforce minimum Windows version in installer (#983)
* Fix help path for local files
* Drop python 3.6 support
* VS Code project settings added, yaml validation for GitHub actions
=== 4.2.0 (2022-01-24)
* Add Malay and Turkish
* Add dark style for windows (#900)
* Add caching md5 file hashes (#942)
* Add feature to partially hash large files, with user adjustable preference (#908)
* Add portable mode (store settings next to executable)
* Add file association for .dupeguru files on windows
* Add ability to pass .dupeguru file to load on startup (#902)
* Add ability to reveal in explorer/finder (#895)
* Switch audio tag processing from hsaudiotag to mutagen (#440)
* Add ability to use Qt dialogs instead of native OS dialogs for some file selection operations
* Add OS and Python details to error dialog to assist in troubleshooting
* Add preference to ignore large files with threshold (#430)
* Fix error on close from DetailsPanel (#857, #873)
* Change reference background color (#894, #898)
* Remove stripping of unicode characters when matching names (#879)
* Fix exception when deleting in delta view (#863, #905)
* Fix dupes only view not updating after re-prioritize results (#757, #910, #911)
* Fix ability to drag'n'drop file/folder with certain characters in name (#897)
* Fix window position opening partially offscreen (#653)
* Fix TypeError in photo mode (#551)
* Change message for when files are deleted directly (#904)
* Add more feedback during scan (#700)
* Add Python version check to build.py (#589)
* General code cleanups
* Improvements to using standardized build tooling
* Moved CI/CD to github actions, added codeql, SonarCloud
=== 4.1.1 (2021-03-21)
* Add Japanese
* Update internationalization and translations to be up to date with current UI.
* Minor translation and UI language updates
* Fix language selection issues on Windows (#760)
* Add some additional notes about builds on Linux based systems
* Add import from transifex export to build.py
=== 4.1.0 (2020-12-29)
* Use tabs instead of separate windows (#688)
* Show the shortcut for "mark selected" in results dialog (#656, #641)
* Add image comparison features to details dialog (#683)
* Add the ability to use regex based exclusion filters (#705)
* Change reference row background color, and allow user to adjust the color (#701)
* Save / Load directories as XML (#706)
* Workaround for EXIF IFD type mismatch in parsing function (#630, #698)
* Progress dialog stuck at "Verified X/X matches" (#693, #694)
* Fix word wrap in ignore list dialog (#687)
* Fix issue with result window action on creation (#685)
* Colorize details table differences, allow moving rows (#682)
* Fix loading Result of 'Scan Type: Folders' shows only '---' in every table cell (#677, #676)
* Fix issue with details and results dialog row trimming (#655, #654)
* Add option to enable/disable bold font (#646, #314)
* Use relative icon path for themes to override more easily (#746)
* Fix issues with Python 3.8 compatibility (#665)
* Fix flake8 issues (#672)
* Update to use newer pytest and expand flake8 checking, cleanup various Deprecation Warnings
* Add warnings to packaging script when files are not built (#691)
* Update Packaging for Ubuntu (#593)
* Minor Build Updates (#627, #575, #628, #614)
* Update CI builds and add windows CI (#572, #669)
=== 4.0.4 (2019-05-13)
* Update qt/platform.py to support other Unix style OSes (#444)


@@ -1,7 +1,7 @@
Häufig gestellte Fragen
==========================
.. topic:: What is |appname|?
.. topic:: What is dupeGuru?
.. only:: edition_se
@@ -25,7 +25,7 @@ Häufig gestellte Fragen
.. topic:: Was sind die Demo-Einschränkungen von dupeGuru?
Keine, |appname| ist `Fairware <http://open.hardcoded.net/about/>`_.
Keine, dupeGuru ist `Fairware <http://open.hardcoded.net/about/>`_.
.. topic:: Die Markierungsbox einer Datei, die ich löschen möchte, ist deaktiviert. Was muss ich tun?


@@ -1,21 +1,13 @@
|appname| Hilfe
dupeGuru Hilfe
===============
.. only:: edition_se
Dieses Dokument ist auch auf `Englisch <http://www.hardcoded.net/dupeguru/help/en/>`__ und `Französisch <http://www.hardcoded.net/dupeguru/help/fr/>`__ verfügbar.
.. only:: edition_me
Dieses Dokument ist auch auf `Englisch <http://www.hardcoded.net/dupeguru/help/en/>`__ und `Französisch <http://www.hardcoded.net/dupeguru_me/help/fr/>`__ verfügbar.
.. only:: edition_pe
Dieses Dokument ist auch auf `Englisch <http://www.hardcoded.net/dupeguru/help/en/>`__ und `Französisch <http://www.hardcoded.net/dupeguru_pe/help/fr/>`__ verfügbar.
Dieses Dokument ist auch auf `Englisch <http://dupeguru.voltaicideas.net/help/en/>`__ und `Französisch <http://dupeguru.voltaicideas.net/help/fr/>`__ verfügbar.
.. only:: edition_se or edition_me
|appname| ist ein Tool zum Auffinden von Duplikaten auf Ihrem Computer. Es kann entweder Dateinamen oder Inhalte scannen. Der Dateiname-Scan stellt einen lockeren Suchalgorithmus zur Verfügung, der sogar Duplikate findet, die nicht den exakten selben Namen haben.
dupeGuru ist ein Tool zum Auffinden von Duplikaten auf Ihrem Computer. Es kann entweder Dateinamen oder Inhalte scannen. Der Dateiname-Scan stellt einen lockeren Suchalgorithmus zur Verfügung, der sogar Duplikate findet, die nicht den exakten selben Namen haben.
.. only:: edition_pe
@@ -23,7 +15,7 @@
Obwohl dupeGuru auch leicht ohne Dokumentation genutzt werden kann, ist es sinnvoll die Hilfe zu lesen. Wenn Sie nach einer Führung für den ersten Duplikatscan suchen, werfen Sie einen Blick auf die :doc:`Schnellstart <quick_start>` Sektion
Es ist eine gute Idee |appname| aktuell zu halten. Sie können die neueste Version auf der `homepage`_ finden.
Es ist eine gute Idee dupeGuru aktuell zu halten. Sie können die neueste Version auf der http://dupeguru.voltaicideas.net finden.
Inhalte:


@@ -12,7 +12,7 @@ a community around this project.
So, whatever your skills, if you're interested in contributing to dupeGuru, please do so. Normally,
this documentation should be enough to get you started, but if it isn't, then **please**,
`let me know`_ because it's a problem that I'm committed to fix. If there's any situation where you'd
open a discussion at https://github.com/arsenetar/dupeguru/discussions. If there's any situation where you'd
wish to contribute but some doubt you're having prevent you from going forward, please contact me.
I'd much prefer to spend the time figuring out with you whether (and how) you can contribute than
taking the chance of missing that opportunity.
@@ -82,10 +82,9 @@ agree on what should be added to the documentation.
dupeGuru. For more information about how to do that, you can refer to the `translator guide`_.
.. _been open source: https://www.hardcoded.net/articles/free-as-in-speech-fair-as-in-trade
.. _let me know: mailto:hsoft@hardcoded.net
.. _Source code repository: https://github.com/hsoft/dupeguru
.. _Issue Tracker: https://github.com/hsoft/dupeguru/issues
.. _Issue labels meaning: https://github.com/hsoft/dupeguru/wiki/issue-labels
.. _Source code repository: https://github.com/arsenetar/dupeguru
.. _Issue Tracker: https://github.com/arsenetar/issues
.. _Issue labels meaning: https://github.com/arsenetar/wiki/issue-labels
.. _Sphinx: http://sphinx-doc.org/
.. _reST: http://en.wikipedia.org/wiki/ReStructuredText
.. _translator guide: https://github.com/hsoft/dupeguru/wiki/Translator-Guide
.. _translator guide: https://github.com/arsenetar/wiki/Translator-Guide


@@ -1,12 +0,0 @@
hscommon.jobprogress.qt
=======================
.. automodule:: hscommon.jobprogress.qt
.. autosummary::
Progress
.. autoclass:: Progress
:members:


@@ -151,8 +151,6 @@ delete files" option that is offered to you when you activate Send to Trash. Thi
files to the Trash, but delete them immediately. In some cases, for example on network storage
(NAS), this has been known to work when normal deletion didn't.
If this fail, `HS forums`_ might be of some help.
Why is Picture mode's contents scan so slow?
--------------------------------------------
@@ -178,7 +176,6 @@ Preferences are stored elsewhere:
* Linux: ``~/.config/Hardcoded Software/dupeGuru.conf``
* Mac OS X: In the built-in ``defaults`` system, as ``com.hardcoded-software.dupeguru``
.. _HS forums: https://forum.hardcoded.net/
.. _Github: https://github.com/hsoft/dupeguru
.. _open an issue: https://github.com/hsoft/dupeguru/wiki/issue-labels
.. _Github: https://github.com/arsenetar/dupeguru
.. _open an issue: https://github.com/arsenetar/dupeguru/wiki/issue-labels


@@ -3,11 +3,11 @@ dupeGuru help
This help document is also available in these languages:
* `French <http://www.hardcoded.net/dupeguru/help/fr>`__
* `German <http://www.hardcoded.net/dupeguru/help/de>`__
* `Armenian <http://www.hardcoded.net/dupeguru/help/hy>`__
* `Russian <http://www.hardcoded.net/dupeguru/help/ru>`__
* `Ukrainian <http://www.hardcoded.net/dupeguru/help/uk>`__
* `French <http://dupeguru.voltaicideas.net/help/fr>`__
* `German <http://dupeguru.voltaicideas.net/help/de>`__
* `Armenian <http://dupeguru.voltaicideas.net/help/hy>`__
* `Russian <http://dupeguru.voltaicideas.net/help/ru>`__
* `Ukrainian <http://dupeguru.voltaicideas.net/help/uk>`__
dupeGuru is a tool to find duplicate files on your computer. It has three
modes, Standard, Music and Picture, with each mode having its own scan types
@@ -42,4 +42,4 @@ Indices and tables
* :ref:`genindex`
* :ref:`search`
.. _homepage: https://www.hardcoded.net/dupeguru
.. _homepage: https://dupeguru.voltaicideas.net/


@@ -3,7 +3,7 @@ Foire aux questions
.. contents::
Qu'est-ce que |appname|?
Qu'est-ce que dupeGuru?
------------------------
.. only:: edition_se


@@ -1,21 +1,13 @@
Aide |appname|
Aide dupeGuru
===============
.. only:: edition_se
Ce document est aussi disponible en `anglais <http://www.hardcoded.net/dupeguru/help/en/>`__, en `allemand <http://www.hardcoded.net/dupeguru/help/de/>`__ et en `arménien <http://www.hardcoded.net/dupeguru/help/hy/>`__.
.. only:: edition_me
Ce document est aussi disponible en `anglais <http://www.hardcoded.net/dupeguru_me/help/en/>`__, en `allemand <http://www.hardcoded.net/dupeguru_me/help/de/>`__ et en `arménien <http://www.hardcoded.net/dupeguru_me/help/hy/>`__.
.. only:: edition_pe
Ce document est aussi disponible en `anglais <http://www.hardcoded.net/dupeguru_pe/help/en/>`__, en `allemand <http://www.hardcoded.net/dupeguru_pe/help/de/>`__ et en `arménien <http://www.hardcoded.net/dupeguru_pe/help/hy/>`__.
Ce document est aussi disponible en `anglais <http://dupeguru.voltaicideas.net/help/en/>`__, en `allemand <http://dupeguru.voltaicideas.net/help/de/>`__ et en `arménien <http://dupeguru.voltaicideas.net/help/hy/>`__.
.. only:: edition_se or edition_me
|appname| est un outil pour trouver des doublons parmi vos fichiers. Il peut comparer soit les noms de fichiers, soit le contenu. Le comparateur de nom de fichier peut trouver des doublons même si les noms ne sont pas exactement pareils.
dupeGuru est un outil pour trouver des doublons parmi vos fichiers. Il peut comparer soit les noms de fichiers, soit le contenu. Le comparateur de nom de fichier peut trouver des doublons même si les noms ne sont pas exactement pareils.
.. only:: edition_pe
@@ -23,7 +15,7 @@ Aide |appname|
Bien que dupeGuru puisse être utilisé sans lire l'aide, une telle lecture vous permettra de bien comprendre comment l'application fonctionne. Pour un guide rapide pour une première utilisation, référez vous à la section :doc:`Démarrage Rapide <quick_start>`.
C'est toujours une bonne idée de garder |appname| à jour. Vous pouvez télécharger la dernière version sur sa `page web`_.
C'est toujours une bonne idée de garder dupeGuru à jour. Vous pouvez télécharger la dernière version sur sa http://dupeguru.voltaicideas.net.
Contents:


@@ -1,7 +1,7 @@
Հաճախ Տրվող Հարցեր
==========================
.. topic:: Ի՞նչ է |appname|-ը:
.. topic:: Ի՞նչ է dupeGuru-ը:
.. only:: edition_se


@@ -1,21 +1,13 @@
|appname| help
dupeGuru help
===============
.. only:: edition_se
Այս փաստաթուղթը հասանելի է նաև՝ `Ֆրանսերեն <http://www.hardcoded.net/dupeguru/help/fr/>`__ և `Գերմաներեն <http://www.hardcoded.net/dupeguru/help/de/>`__.
.. only:: edition_me
Այս փաստաթուղթը հասանելի է նաև՝ `Ֆրանսերեն <http://www.hardcoded.net/dupeguru_me/help/fr/>`__ և `Գերմաներեն <http://www.hardcoded.net/dupeguru_me/help/de/>`__.
.. only:: edition_pe
Այս փաստաթուղթը հասանելի է նաև՝ `Ֆրանսերեն <http://www.hardcoded.net/dupeguru_pe/help/fr/>`__ և `Գերմաներեն <http://www.hardcoded.net/dupeguru_pe/help/de/>`__.
Այս փաստաթուղթը հասանելի է նաև՝ `Ֆրանսերեն <http://dupeguru.voltaicideas.net/help/fr/>`__ և `Գերմաներեն <http://dupeguru.voltaicideas.net/help/de/>`__.
.. only:: edition_se or edition_me
|appname| ծրագիր է՝ գտնելու կրկնօրինակ ունեցող ֆայլեր Ձեր համակարգչում: Այն կարող է անգամ ստուգել ֆայլի անունները կան բովանդակությունը: Ֆայլի անվան ստուգման հնարավորությունները ոչ ճշգրիտ համընկման ալգորիթմով, որը կարող է գտնել ֆայլի անվան կրկնօրինակներ, անգամ եթե դրանք նույնը չեն:
dupeGuru ծրագիր է՝ գտնելու կրկնօրինակ ունեցող ֆայլեր Ձեր համակարգչում: Այն կարող է անգամ ստուգել ֆայլի անունները կան բովանդակությունը: Ֆայլի անվան ստուգման հնարավորությունները ոչ ճշգրիտ համընկման ալգորիթմով, որը կարող է գտնել ֆայլի անվան կրկնօրինակներ, անգամ եթե դրանք նույնը չեն:
.. only:: edition_pe
@@ -23,7 +15,7 @@
Չնայած dupeGuru-ն կարող է հեշտությամբ օգտագործվել առանց օգնության, այնուհանդերձ եթե կարդաք այս ֆայլը, այն մեծապես կօգնի Ձեզ ընկալելու ծրագրի աշխատանքը: Եթե Դուք նայում եք ձեռնարկը կրկնօրինակների առաջին ստուգման համար, ապա կարող եք ընտրել :doc:`Արագ Սկիզբ <quick_start>` հատվածը:
Շատ լավ միտք է պահելու |appname| թարմացված: Կարող եք բեռնել վեբ կայքի համապատասխան էջից `homepage`_:
Շատ լավ միտք է պահելու dupeGuru թարմացված: Կարող եք բեռնել վեբ կայքի համապատասխան էջից http://dupeguru.voltaicideas.net:
Պարունակությունը.


@@ -1,7 +1,7 @@
Часто задаваемые вопросы
==========================
.. topic:: Что такое |appname|?
.. topic:: Что такое dupeGuru?
.. only:: edition_se


@@ -1,21 +1,11 @@
|appname| help
dupeGuru help
===============
.. only:: edition_se
Этот документ также доступна на `французском <http://www.hardcoded.net/dupeguru/help/fr/>`__, `немецком <http://www.hardcoded.net/dupeguru/help/de/>`__ и `армянский <http://www.hardcoded.net/dupeguru/help/hy/>`__.
.. only:: edition_me
Этот документ также доступна на `французском <http://www.hardcoded.net/dupeguru_me/help/fr/>`__, `немецкий <http://www.hardcoded.net/dupeguru_me/help/de/>`__ и `армянский <http://www.hardcoded.net/dupeguru_me/help/hy/>`__.
.. only:: edition_pe
Этот документ также доступна на `французском <http://www.hardcoded.net/dupeguru_pe/help/fr/>`__, `немецкий <http://www.hardcoded.net/dupeguru_pe/help/de/>`__ и `армянский <http://www.hardcoded.net/dupeguru_pe/help/hy/>`__.
Этот документ также доступна на `французском <http://dupeguru.voltaicideas.net/help/fr/>`__, `немецком <http://dupeguru.voltaicideas.net/help/de/>`__ и `армянский <http://dupeguru.voltaicideas.net/help/hy/>`__.
.. only:: edition_se or edition_me
|appname| есть инструмент для поиска дубликатов файлов на вашем компьютере. Он может сканировать либо имен файлов или содержимого.Имя файла функций сканирования нечеткого соответствия алгоритма, который позволяет найти одинаковые имена файлов, даже если они не совсем то же самое.
dupeGuru есть инструмент для поиска дубликатов файлов на вашем компьютере. Он может сканировать либо имен файлов или содержимого.Имя файла функций сканирования нечеткого соответствия алгоритма, который позволяет найти одинаковые имена файлов, даже если они не совсем то же самое.
.. only:: edition_pe
@@ -23,7 +13,7 @@
Хотя dupeGuru может быть легко использована без документации, чтение этого файла поможет вам освоить его. Если вы ищете руководство для вашей первой дублировать сканирования, вы можете взглянуть на раздел :doc:`Быстрый <quick_start>` Начало.
Это хорошая идея, чтобы сохранить |appname| обновлен. Вы можете скачать последнюю версию на своей `homepage`_.
Это хорошая идея, чтобы сохранить dupeGuru обновлен. Вы можете скачать последнюю версию на своей http://dupeguru.voltaicideas.net.
Содержание:
.. toctree::


@@ -1,7 +1,7 @@
Часті питання
==========================
.. topic:: Що таке |appname|?
.. topic:: Що таке dupeGuru?
.. only:: edition_se


@@ -1,21 +1,13 @@
|appname| help
dupeGuru help
===============
.. only:: edition_se
Цей документ також доступна на `французькому <http://www.hardcoded.net/dupeguru/help/fr/>`__, `німецький <http://www.hardcoded.net/dupeguru/help/de/>`__ і `Вірменський <http://www.hardcoded.net/dupeguru/help/hy/>`__.
.. only:: edition_me
Цей документ також доступна на `французькому <http://www.hardcoded.net/dupeguru_me/help/fr/>`__, `німецький <http://www.hardcoded.net/dupeguru_me/help/de/>`__ і `Вірменський <http://www.hardcoded.net/dupeguru_me/help/hy/>`__.
.. only:: edition_pe
Цей документ також доступна на `французькому <http://www.hardcoded.net/dupeguru_pe/help/fr/>`__, `німецький <http://www.hardcoded.net/dupeguru_pe/help/de/>`__ і `Вірменський <http://www.hardcoded.net/dupeguru_pe/help/hy/>`__.
Цей документ також доступна на `французькому <http://dupeguru.voltaicideas.net/help/fr/>`__, `німецький <http://dupeguru.voltaicideas.net/help/de/>`__ і `Вірменський <http://dupeguru.voltaicideas.net/help/hy/>`__.
.. only:: edition_se or edition_me
|appname| це інструмент для пошуку дублікатів файлів на вашому комп'ютері. Він може сканувати або імен файлів або вмісту. Файл функцій сканування нечіткого відповідності алгоритму, який дозволяє знайти однакові імена файлів, навіть якщо вони не зовсім те ж саме.
dupeGuru це інструмент для пошуку дублікатів файлів на вашому комп'ютері. Він може сканувати або імен файлів або вмісту. Файл функцій сканування нечіткого відповідності алгоритму, який дозволяє знайти однакові імена файлів, навіть якщо вони не зовсім те ж саме.
.. only:: edition_pe
@@ -23,7 +15,7 @@
Хоча dupeGuru може бути легко використана без документації, читання цього файлу допоможе вам освоїти його. Якщо ви шукаєте керівництво для вашої першої дублювати сканування, ви можете поглянути на: :doc:`Quick Start <quick_start>`
Це гарна ідея, щоб зберегти |appname| оновлено. Ви можете завантажити останню версію на своєму `homepage`_.
Це гарна ідея, щоб зберегти dupeGuru оновлено. Ви можете завантажити останню версію на своєму http://dupeguru.voltaicideas.net.
Contents:

Submodule hscommon deleted from a56aee2f08

hscommon/.gitignore (new file, 5 lines)

@@ -0,0 +1,5 @@
*.pyc
*.mo
*.so
.DS_Store
/docs_html

Some files were not shown because too many files have changed in this diff.