mirror of https://github.com/arsenetar/dupeguru.git synced 2026-01-25 16:11:39 +00:00

Compare commits


419 Commits
4.0.4 ... 4.3.1

Author SHA1 Message Date
1f1dfa88dc Update version & changelog for 4.3.1 release 2022-07-07 22:06:06 -05:00
916c5204cf Update translations from transifex 2022-07-07 21:57:59 -05:00
71af825b37 Move try/except of cache db to get() and put()
- Move the try/except of cache db calls to the calls themselves.
- Add some additional information to logging statements on cache db
  exception to improve troubleshooting.
2022-07-07 21:52:22 -05:00
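
A minimal sketch of the pattern described above, with the try/except and the extra logging detail living inside get() and put() themselves; the class name, schema, and key format here are illustrative, not dupeGuru's actual cache code.

    import logging
    import sqlite3

    class HashCache:
        def __init__(self, path=":memory:"):
            self.conn = sqlite3.connect(path)
            self.conn.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, digest BLOB)")

        def get(self, key):
            # The try/except lives in the call itself, so a corrupt or locked
            # db degrades to a cache miss instead of aborting the scan.
            try:
                row = self.conn.execute("SELECT digest FROM cache WHERE key = ?", (key,)).fetchone()
                return row[0] if row else None
            except sqlite3.Error:
                logging.exception("Couldn't get key %r from the hash cache", key)
                return None

        def put(self, key, digest):
            try:
                self.conn.execute("REPLACE INTO cache (key, digest) VALUES (?, ?)", (key, digest))
                self.conn.commit()
            except sqlite3.Error:
                logging.exception("Couldn't put key %r (digest %r) into the hash cache", key, digest)
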
97f490b8b7 Fix typo in engine.py 2022-07-07 19:06:35 -05:00
d369bcddd7 Updates from investigation of #1015
- Add protection for empty hash digests in comparison of non-zero size
  files
- Bump version to 4.3.1-dev for identification
2022-07-07 19:00:09 -05:00
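
A hedged sketch of the kind of guard described above; the attribute names (size, digest) are hypothetical stand-ins for the real file objects.

    def digests_match(file1, file2):
        # Two non-zero size files whose digests could not be computed (empty
        # digest) must not be reported as identical.
        if file1.size != file2.size:
            return False
        if file1.size > 0 and (not file1.digest or not file2.digest):
            return False
        return file1.digest == file2.digest
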
360dceca7b Update to version 4.3.0, update changelog 2022-06-30 23:27:14 -05:00
92b27801c3 Update translations, remove iphoto_plist.py 2022-06-30 23:03:40 -05:00
Marcus Yanello
b9aabb8545 Redirect stdout from custom command to the log files (#1008)
Send the logs for the custom command subprocess to the logs
Closes #1007
2022-06-13 21:04:40 -05:00
d5eeab4a17 Additional type hints in hscommon 2022-05-11 00:50:34 -05:00
7865e4aeac Type hinting hscommon & cleanup 2022-05-09 23:36:39 -05:00
58863b1728 Change to use a real temporary directory for test
app_test was not using a real temporary location originally
2022-05-09 01:46:42 -05:00
e382683f66 Replace all relative imports 2022-05-09 01:40:08 -05:00
f7ed1c801c Add type hinting to desktop.py 2022-05-09 01:15:25 -05:00
f587c7b5d8 Removed unused code in hscommon/util
Also added type hints throughout
2022-05-09 00:47:57 -05:00
40ff40bea8 Move create_qsettings() out of preferences
- Load order was impacting translations
- Fix by moving create_qsettings() for now
2022-05-08 20:33:31 -05:00
7a44c72a0a Complete removal of qtlib locale files 2022-05-08 19:52:25 -05:00
66aff9f74e Update pot files
This "moves" the translation points from qtlib.pot to ui.pot.
Needs further updates to propagate across.
2022-05-08 19:28:37 -05:00
5451f55219 Move qtlib localization files to top level 2022-05-08 19:23:13 -05:00
36280b01e6 Finish moving all qtlib py files to qt 2022-05-08 19:22:08 -05:00
18359c3ea6 Start flattening Qtlib into qt
- Remove app.py from qtlib (unused)
- Remove .gitignore from qtlib (unnecessary)
- Move contents of preferences.py in qtlib to qt, clean up references
- Simplify language dropdown code
2022-05-08 18:51:10 -05:00
0a4e61edf5 Additional cleanup per mypy
- Add Callable type to hasher (should really be more specific...)
- Add type hint to COLUMNS in qtlib/table.py
- Use Qt.ItemFlag.ItemIsEnabled instead of Qt.itemIsEnabled in qtlib/table.py
2022-04-30 05:16:46 -05:00
d73a85b82e Add type hints for compiled modules 2022-04-30 05:11:54 -05:00
81c593399e Format changes with black 2022-04-27 20:59:20 -05:00
6a732a79a8 Remove old tx config 2022-04-27 20:58:30 -05:00
63dd4d4561 Apply pyupgrade changes 2022-04-27 20:53:12 -05:00
e0061d7bc1 Fix #989, typo in debian control file 2022-04-02 16:43:19 -05:00
c5818b1d1f Add option to profile scans
- Add preference for profiling scans
- Move debug options to tab in preferences
- Add label with clickable link to debug output (appdata) to debug tab in preferences
- Update translation source files
2022-03-31 00:16:37 -05:00
a470a8de25 Update fs.py to optimize stat() calls
- Update to get size and mtime at time of class creation when os.DirEntry is used for initialization.
- Folders still calculate size later for folder scans.
- Ref #962, #959
2022-03-30 22:58:01 -05:00
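
A rough illustration of the idea, assuming a simplified File class; the real fs.File in dupeGuru is more involved.

    import os

    class File:
        def __init__(self, entry):
            # When constructed from an os.DirEntry, reuse its stat() result
            # right away instead of issuing another os.stat() later.
            if isinstance(entry, os.DirEntry):
                self.path = entry.path
                st = entry.stat()
            else:  # plain path fallback
                self.path = os.fspath(entry)
                st = os.stat(self.path)
            self.size = st.st_size
            self.mtime = st.st_mtime
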
a37b5b0eeb Fix #988 2022-03-30 01:06:51 -05:00
efd500ecc1 Update directory scanning to use os.scandir()
- Change to use os.scandir() instead of os.walk() to leverage DirEntry objects.
- Avoids extra calls to stat() on files during fs.can_handle()
- See 3x speed improvement on Windows in some cases
2022-03-29 23:37:56 -05:00
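
A minimal scandir()-based walk showing why this avoids extra stat() calls: DirEntry objects carry cached type and stat information. This is a generic sketch, not the actual Directories code.

    import os

    def iter_files(root):
        # Yield os.DirEntry objects so callers can reuse the cached stat()
        # data (size, mtime) without re-stat()ing every file.
        with os.scandir(root) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    yield from iter_files(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry
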
43fcc52291 Replace pathlib.glob() with os.scandir() in fs.py 2022-03-29 22:35:38 -05:00
50f5db1543 Update fs to support DirEntry on get_file() 2022-03-29 22:32:36 -05:00
a5b0ccdd02 Improve performance of Directories.get_state() 2022-03-29 21:48:14 -05:00
143147cb8e Remove Cocoa specific and other unused code 2022-03-28 00:47:46 -05:00
ebb81d9f03 Remove pathlib function added in Python 3.9 2022-03-28 00:06:32 -05:00
da9f8b2b9d Squashed commit of the following:
commit 8b15fe9a502ebf4841c6529e7098cef03a6a5e6f
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Sun Mar 27 23:48:15 2022 -0500

    Finish up changes to copy_or_move

commit 21f6a32cf3186a400af8f30e67ad2743dc9a49bd
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Thu Mar 17 23:56:52 2022 -0500

    Migrate from hscommon.path to pathlib
    - Part one, this gets all hscommon and core tests passing
    - App appears to be able to load directories and complete scans, need further testing
    - app.py copy_or_move needs some additional work
2022-03-27 23:50:03 -05:00
5ed5eddde6 Add polib back to requirements.txt 2022-03-27 22:35:34 -05:00
9f40e4e786 Squashed commit of the following:
commit 5eb515f666bfa1ff06c2e96bdc351a4b7456580e
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Sun Mar 27 22:19:39 2022 -0500

    Add fallback to md5 if xxhash not available

    Mainly here for the case when distributions have not packaged python3-xxhash.

commit 51b18d4c84
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Sat Mar 19 15:25:46 2022 -0500

    Switch file hashing to xxhash instead of md5

    - Improves performance significantly in some cases
    - Add xxhash to requirements.txt and sort requirements
    - Rename md5 based members to digest
    - Update all tests to use new member names and hashing methods
    - Update hash db code to upgrade schema

    NOTE: May consider supporting multiple hashing algorithms in the future.
2022-03-27 22:27:13 -05:00
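
A sketch of the fallback described above; the choice of xxh128 and the chunk size are assumptions for illustration, not necessarily what dupeGuru uses.

    import hashlib

    try:
        import xxhash
        hasher = xxhash.xxh128      # fast path when python3-xxhash is packaged
    except ImportError:
        hasher = hashlib.md5        # fallback for distributions without it

    def file_digest(path, chunk_size=1024 * 1024):
        h = hasher()
        with open(path, "rb") as fp:
            for chunk in iter(lambda: fp.read(chunk_size), b""):
                h.update(chunk)
        return h.digest()
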
86bf9b39d0 Add update check function and call from about
- Implement an update check against the GitHub releases via the API
- Add semantic-version dependency
- Add automatic check when opening about dialog
2022-03-27 21:13:27 -05:00
c0be0aecbd Minor documentation update 2022-03-27 21:04:37 -05:00
c408873d20 Update changelog 2022-03-25 23:37:46 -05:00
bbcdfbf698 Add vscode extension recommendation 2022-03-21 22:27:16 -05:00
8cee1a9467 Fix internal links in CONTRIBUTING.md 2022-03-21 22:19:58 -05:00
448d33dcb6 Add workflow yml validation settings
- Add yml validation to project for vscode
- Allow .vscode/settings.json
- Apply formatting to workflow files
2022-03-21 22:18:22 -05:00
8d414cadac Add initial partial CONTRIBUTING.md
- Adopt a CONTRIBUTING.md format similar to that used by atom/atom.
- Add label section as replacement to wiki
- Add style guide section
- Setup basic document structure

TODO:
- Migrate some existing wiki information here where applicable.
- Migrate some existing help information here.
- Finish up remaining sections.
2022-03-21 22:04:45 -05:00
f902ee889a Add configuration for isort to pyproject.toml 2022-03-21 00:25:36 -05:00
bc89e71935 Update .gitignore
- Pull from github/gitignore to cover some things better
- Organize remaining items
- Remove a few no longer relevant items
2022-03-20 23:25:01 -05:00
17b83c8001 Move polib to setup_requires instead of install_requires 2022-03-20 22:48:03 -05:00
0f845ee67a Update min python version in Makefile 2022-03-20 01:23:01 -05:00
d40e32a143 Update transifex config & pull latest updates
- Update transifex configuration to new format
- Pull translation updates
2022-03-19 20:21:14 -05:00
1bc206e62d Bump version to 4.2.1 2022-03-19 19:02:41 -05:00
106a0feaba Add sponsor information 2022-03-19 17:46:12 -05:00
984e0c4094 Fix help path for local files and some help doc updates 2022-03-19 17:43:11 -05:00
9321e811d7 Enforce minimum Windows version ref #983 2022-03-19 17:01:54 -05:00
a64fcbfb5c Fix deprecation warning from sqlite 2022-03-19 17:01:53 -05:00
cff07a12d6 Black formatter changes 2022-03-19 17:01:53 -05:00
Alfonso Montero
b9c7832c4a Apply @arsenetar's proposed change to fix errors on window change event. Solves #937. (#980) 2022-03-15 20:47:48 -05:00
b9dfeac2f3 Drop Python 3.6 Support 2022-03-15 05:10:41 -05:00
efc99eee96 Merge pull request #978 from glubsy/fix_zoom_scrollbar
Fix image viewer scrollbar zoom
2022-03-14 20:43:40 -05:00
glubsy
ff7733bb73 Fix image viewer
When zooming in or out, the value computed might be a float instead
of an int, which is what the QScrollBar expects for its setValue method.
Simply casting to int should be enough here.
2022-03-12 22:36:17 +01:00
4b2fbe87ea Default to English on unsupported system language Fix #976
- Add check for supported language to system locale detection
- Fall back to English when the locale is not supported
2022-03-12 04:36:13 -06:00
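
A sketch of the fallback, assuming a hypothetical SUPPORTED_LANGUAGES set; the real check lives in the Qt startup code.

    from PyQt5.QtCore import QLocale

    SUPPORTED_LANGUAGES = {"en", "fr", "de", "ja", "tr"}  # illustrative subset

    def ui_language():
        lang = QLocale.system().name()[:2]  # e.g. "fr_FR" -> "fr"
        # Fall back to English when the system locale is not a supported UI language.
        return lang if lang in SUPPORTED_LANGUAGES else "en"
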
9e4b41feb5 Fix BASE_PATH for frozen macOS app 2022-03-09 06:50:41 -06:00
cbfa8720f1 Update imports for objc module 2022-03-09 05:01:12 -06:00
a02c5e5b9b Add built modules as artifacts 2022-03-04 01:14:01 -06:00
35e6ffd6af Fix macOS packaging issue 2022-02-09 22:33:41 -06:00
e957f840da Fix python version check in makefile, close #971 2022-02-09 21:59:35 -06:00
85e22089bd Black formatting changes 2022-02-09 21:49:51 -06:00
b7d68b4458 Update debian control template depends 2022-02-09 21:45:45 -06:00
8f440603ee Add Python 3.10 to tox.ini 2022-01-25 10:39:52 -06:00
5d8e559ca3 Fix issue introduced in fix for #900 2022-01-25 10:39:08 -06:00
2c11eecf97 Update version and changelog to 4.2.0 2022-01-24 22:28:40 -06:00
02803f738b Update translation files including Malay 2022-01-24 21:05:33 -06:00
db27e6a645 Add Malay to language selection 2022-01-24 21:02:57 -06:00
c9c35cc60d Add translation source file for dark style change. 2022-01-24 19:33:42 -06:00
880205dbc8 Fix python 3.10 in default action 2022-01-24 19:30:42 -06:00
6456e64328 Update python versions for CI/CD
- Update python versions for Default action
- Set python versions for sonarcloud
2022-01-24 19:27:29 -06:00
f6a0c0cc6d Add initial dark style for use in Windows
- Other platforms can achieve this with the OS theme, so it is not enabled for them at this time.
- Adds preference in display options to use dark style, default is false.
2022-01-24 19:14:30 -06:00
eb57d269fc Update translation source files 2021-11-23 21:11:30 -06:00
34f41dc522 Merge pull request #942 from Dobatymo/hash-cache
Implement hash cache for md5 hash based on sqlite
2021-11-23 21:08:22 -06:00
Dobatymo
77460045c4 clean up abstraction 2021-10-29 15:24:47 +08:00
Dobatymo
9753afba74 change FilesDB to singleton class
move hash calculation back in to Files class
clear cache now clears hash cache in addition to picture cache
2021-10-29 15:12:40 +08:00
Dobatymo
1ea108fc2b changed cache filename 2021-10-29 15:12:40 +08:00
Dobatymo
2f02a6010d implement hash cache for md5 hash based on sqlite 2021-10-29 15:12:40 +08:00
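
A simplified sketch of a singleton, sqlite-backed hash cache in the spirit of these commits; the file name, schema, and key columns are illustrative only.

    import sqlite3

    class FilesDB:
        _instance = None

        def __new__(cls):
            # Singleton: every caller shares the same connection and cache.
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance.conn = sqlite3.connect("hash_cache.db")
                cls._instance.conn.execute(
                    "CREATE TABLE IF NOT EXISTS files "
                    "(path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, digest BLOB)"
                )
            return cls._instance

        def get(self, path, size, mtime_ns):
            row = self.conn.execute(
                "SELECT digest FROM files WHERE path=? AND size=? AND mtime_ns=?",
                (str(path), size, mtime_ns),
            ).fetchone()
            return row[0] if row else None

        def put(self, path, size, mtime_ns, digest):
            self.conn.execute(
                "REPLACE INTO files (path, size, mtime_ns, digest) VALUES (?, ?, ?, ?)",
                (str(path), size, mtime_ns, digest),
            )
            self.conn.commit()

        def clear(self):
            # Invoked by the app's clear-cache action (alongside the picture cache).
            self.conn.execute("DELETE FROM files")
            self.conn.commit()
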
b80489fd66 Update translation source files 2021-09-15 20:15:09 -05:00
1d60e124ee Update invoke_custom_command to run for all selected items 2021-09-02 20:48:25 -05:00
e22d7d2fc9 Remove filtering of 0 size files in engine
File size can already be filtered at a higher level, and some users
may decide to see zero-length files. Fix #321.
2021-08-28 18:16:22 -05:00
0a0694e095 Expand fix for #630 to fix #551 2021-08-28 17:29:25 -05:00
3da9d5d869 Update documentation files, add multi-language doc build
- Update links in documentation, and fix some errors
- Remove non-existent page
- Update build to build all languages with --alldoc flag
- Fix one minor debugging change introduced in package.py
2021-08-28 17:07:18 -05:00
78fb052d77 Add more progress details to getmatches, ref #700 2021-08-28 04:58:22 -05:00
9805cba10d Use different message for direct delete success, close #904 2021-08-28 04:27:34 -05:00
4c3dfe2f1f Provide more feedback during scans
- Add output for number of collected files / folders
- Update to allow indeterminate progress bar
- Remove unused hscommon\jobprogress\qt.py
2021-08-28 04:05:07 -05:00
b0baa5bfd6 Add windows position handling at open, fix #653
- Move offscreen windows back on screen
- Restore maximized state without impacting restored size
- Fullscreen comes back on primary screen, needs further work to support
  restore on other screens
2021-08-27 23:26:19 -05:00
22996ee914 Merge pull request #935 from chchia/master
resize preference dialog file size box
2021-08-27 21:57:03 -05:00
chchia
31ec9c667f resize preference dialog file size box 2021-08-28 10:28:06 +08:00
3045361243 Add preference to ignore large files, close #430 2021-08-27 05:35:54 -05:00
809116c764 Fix CodeQL Alerts
- Cast int to Py_ssize_t for multiplication
2021-08-26 03:43:31 -05:00
83f401595d Minor Updates
- Cleanup extension modules in setup.py to use correct namespaces
- Update build.py to leverage setup.py for modules
- Roll mutagen required version back to 1.44.0 to support more distros
- Change build.py and sphinxgen.py to use pathlib
- Remove hsaudiotag from package list for debian and arch
2021-08-26 03:29:24 -05:00
814d145366 Updates to setup files
- Include additional non-python files in MANIFEST.in (package_data in
  setup.cfg was not including the files)
- Update requirements in setup.cfg
2021-08-25 04:10:38 -05:00
efb76c7686 Add OS and Python Information to error dialog 2021-08-25 02:05:18 -05:00
47dbe805bb More cleanup and fixed a flake8 build issue 2021-08-25 01:11:24 -05:00
f11fccc889 More cleanups
- Cleanup columns.py and tables
- Other misc cleanups
- Remove text_field.py from qtlib as it is not used
- Remove unused variables from image_viewer method
2021-08-25 00:46:33 -05:00
2e13c4ccb5 Update internationalization files 2021-08-24 03:54:54 -05:00
da72ffd1fd Add ability to use non-native dialog for directories
- Add preference for native dialogs
- Add non-native directory selection to allow selecting multiple folders
  fixes #874 when using non-native.
2021-08-24 03:52:43 -05:00
2c9437bef4 Fix #897 2021-08-24 03:13:03 -05:00
f9085386a6 First pass code cleanup in qt/qtlib 2021-08-24 00:12:23 -05:00
d576a7043c Code cleanups in core and other affected files 2021-08-21 18:02:02 -05:00
1ef5f56158 Code cleanups in hscommon & external effects 2021-08-21 16:56:27 -05:00
f9316de244 Code cleanups in hscommon\tests 2021-08-21 16:25:33 -05:00
0189c29f47 Misc cleanups in core/tests 2021-08-21 03:52:09 -05:00
b4fa1d68f0 Add check for python version to build.py, close #589 2021-08-20 23:49:20 -05:00
16df882481 Update requirements.txt for previous change 2021-08-19 00:17:46 -05:00
58c04ff9ad Switch from hsaudiotag to mutagen, close #440
- This opens up the ability to support more tags and audio information
- Also makes progress on #333
2021-08-19 00:14:26 -05:00
6b8f85e39a Reveal in Explorer / Finder, close #895 2021-08-18 20:51:45 -05:00
2fff1a3436 Add ability to load results at start, closes #902
- Add ability to load .dupeguru file at start by passing it as the first argument
- Add file association to .dupeguru file in Windows at install
2021-08-18 19:24:14 -05:00
a685524dd5 Add files for more standardized build tools
- Prior investigation into linux packaging (not using pyinstaller) suggested
having setuptools files could make packaging easier and automatable
- Add setup.cfg and setup.py as initial starting point
- Add MANIFEST.in (at least temporarily)

Currently with the python build module this almost works for the main application.
It does not include all the extra data files right now.
2021-08-18 04:12:38 -05:00
74918e2c56 Attempt to fix apt-get failure 2021-08-18 03:07:47 -05:00
18895d983b Fix syntax error in codeql-analysis.yml 2021-08-18 03:04:44 -05:00
fe720208ea Add minimum custom build for codeql cpp 2021-08-18 02:49:20 -05:00
091d9e9239 Create codeql-analysis.yml
Test out codeql
2021-08-18 02:33:40 -05:00
5a4958cff9 Update translation .pot files 2021-08-17 21:18:47 -05:00
be10b462fc Add portable mode
If settings.ini is present next to the executable, dupeGuru will run in portable mode.
This results in settings, data, and cache all being in the same folder as dupeGuru.
2021-08-17 21:12:32 -05:00
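
A minimal sketch of how such a check might look; the helper name and the frozen-app handling are assumptions, not the actual dupeGuru code.

    import os
    import sys

    def is_portable_mode():
        # Portable mode: a settings.ini sitting next to the executable means
        # settings, data, and cache all stay in that folder.
        if getattr(sys, "frozen", False):       # e.g. a PyInstaller build
            base = os.path.dirname(sys.executable)
        else:
            base = os.path.dirname(os.path.abspath(__file__))
        return os.path.isfile(os.path.join(base, "settings.ini"))
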
d62b13bcdb Removing travis
- All CI is now covered by Github Actions
- Remove .travis.yml
- Remove tox-travis in requirements-extra.txt
2021-08-17 18:16:20 -05:00
06eca11f0b Remove format check from lint job 2021-08-17 00:52:14 -05:00
2879f18e0d Run linting and formatting check in parallel before test 2021-08-17 00:50:41 -05:00
3ee21771f9 Fix workflow file format 2021-08-17 00:33:54 -05:00
c0ba6fb57a Test out github actions
Add a workflow to test
2021-08-17 00:31:15 -05:00
bc942b8263 Add black format check to tox runs 2021-08-15 04:10:46 -05:00
ffe6b7047c Format all files with black correcting line length 2021-08-15 04:10:18 -05:00
9446f37fad Remove flake8 E731 Errors
Note: black formatting is now applying correctly as well.
2021-08-15 03:53:43 -05:00
af19660c18 Update flake8 and black configuration
- Update black to now use 120 lines
- Update flake8 to use recommended settings for black integration
2021-08-15 03:32:31 -05:00
99ad297906 Change preferences to use spinboxes where applicable
- Change LineEdit to Spinbox for minimum file size 0-1,000,000KB
- Change LineEdit to Spinbox for big file size 0-1,000,000MB
2021-08-15 02:11:42 -05:00
e11f996dfc Merge pull request #908 from glubsy/hash_sample_optimization
Hash sample optimization
2021-08-13 23:41:17 -05:00
glubsy
e95306e58f Fix flake 8 2021-08-14 02:52:00 +02:00
glubsy
891a875990 Cache constant expression
Perhaps the python byte code is already optimized, but just in case it is not, keep the constant expression pre-computed.
2021-08-13 21:33:21 +02:00
glubsy
545a5a75fb Fix for older python versions
The "walrus" operator is only available in python 3.8 and later. Fall back to more traditional notation.
2021-08-13 20:56:33 +02:00
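
For illustration, the kind of rewrite involved; CHUNK_SIZE and the reading loop are hypothetical stand-ins.

    CHUNK_SIZE = 1024 * 1024

    # Python 3.8+ walrus form:
    #     while chunk := fp.read(CHUNK_SIZE):
    #         digest.update(chunk)

    def update_digest(digest, fp):
        # Equivalent pre-3.8 notation.
        chunk = fp.read(CHUNK_SIZE)
        while chunk:
            digest.update(chunk)
            chunk = fp.read(CHUNK_SIZE)
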
glubsy
7b764f183e Avoid partially hashing small files
Computing 3 hash samples for files less than 3MiB (3 * CHUNK_SIZE) is not efficient since spans of later samples would overlap a previous one.
Therefore we can simply return the hash of the entire small file instead.
2021-08-13 20:47:01 +02:00
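
A sketch of the behaviour described in this commit, using md5 and a hypothetical 1 MiB CHUNK_SIZE purely for illustration.

    import hashlib

    CHUNK_SIZE = 1024 * 1024  # assumed sample size

    def partial_digest(path, size):
        h = hashlib.md5()
        with open(path, "rb") as fp:
            if size <= 3 * CHUNK_SIZE:
                # Samples would overlap anyway, so hash the whole small file.
                h.update(fp.read())
            else:
                # Sample the beginning, the middle, and the end of the file.
                for offset in (0, size // 2, size - CHUNK_SIZE):
                    fp.seek(offset)
                    h.update(fp.read(CHUNK_SIZE))
        return h.digest()
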
fdc8a17d26 Update .travis.yml
- Windows test uses 3.9.6 now
- Indentation changes
2021-08-07 19:35:57 -05:00
cb3bbbec6e Upgrade Requirement Minimums
- Upgrade requirements to specify more current minimums
- Remove compatibility code from sphinxgen for old versions
- Upgrade pyinstaller to a minimum version that works with latest macOS
2021-08-07 19:28:41 -05:00
c51a82a2ce Fix Issues from Translation Update
- Add Qtlib to transifex config
- Pull latest qtlib translations
- Fix flake8 error
- Remove code for manual translation import, use transifex-client instead
2021-08-06 22:21:35 -05:00
0cd8f5e948 Update translation pot files 2021-08-06 21:41:52 -05:00
9c09607c08 Add Turkish & Updates from Transifex
- Pull updates from Transifex
- Add Turkish
- Sort language lists in code
- Remove old locale conversion code as it appears to work correctly on
windows without different conversions.
2021-08-06 21:41:52 -05:00
3bd342770c Update configurations
- Enable Unicode for NSIS Installer
- Update transifex config to new project
2021-08-06 21:41:52 -05:00
14b456dcf9 Merge pull request #927 from glubsy/fix_directories_tests
Fix Directories regex test
2021-08-06 20:08:27 -05:00
glubsy
3dccb686e2 Fix Directories regex test
The entire path to the file would match unless another path separator is added.
2021-08-06 17:18:23 +02:00
0db66baace Merge pull request #907 from glubsy/missing_renamed_regex
Missing renamed regex
2021-08-03 22:26:08 -05:00
e3828ae2ca Merge pull request #911 from glubsy/fix_757_fix_regression
Fix infinite recursion
2021-06-22 22:44:12 -05:00
glubsy
23c59787e5 Fix infinite recursion
Force the Results to update its internal __dupes list whenever at least one group has re-prioritized and changed its dupes/ref.
2021-06-23 05:36:10 +02:00
2f8d603251 Merge pull request #910 from glubsy/757_fix
Fix refs appearing in dupes-only view
2021-06-22 21:54:49 -05:00
glubsy
a51f263632 Fix refs appearing in dupes-only view
* Some refs appeared in the dupes-only view after a re-prioritization was done a second time.
* It seems the core.Results.__dupes list was not properly updated whenever core.app.Dupeguru.reprioritize_groups() -> core.Results.sort_dupes() was called.
When a re-prioritization is done, some refs become dupes, and some dupes become refs in their place. So we need to update the new state of the internal list of dupes kept by the Results object, instead of relying on the outdated cached one.
* Fix #757.
2021-06-22 22:57:57 +02:00
glubsy
718ca5b313 Remove unused import 2021-06-22 02:41:33 +02:00
glubsy
277bc3fbb8 Add unit tests for hash sample optimization
* Instead of keeping md5 samples separate, merge them as one hash computed from the various selected chunks we picked.
* We don't need to keep a boolean to see whether or not the user chose to optimize; we can simply compare the value of the threshold, since 0 means no optimization currently active.
2021-06-21 22:44:05 +02:00
glubsy
e07dfd5955 Add partial hashes optimization for big files
* Big files above the user selected threshold can be partially hashed in 3 places.
* If the user is willing to take the risk, we consider files with identical md5samples as being identical.
2021-06-21 19:03:21 +02:00
4641bd6ec9 Merge pull request #905 from glubsy/fix_863
Fix exception when deleting while in delta view
2021-06-19 20:29:47 -05:00
glubsy
a6f83ad3d7 Fix missing regexp after rename
* Doing a full match should be safer to avoid partial results which would result in overly aggressive filtering.
* Add new tests to test suite to cover this issue.
* Fixes #903.
2021-06-19 02:00:25 +02:00
glubsy
ab8750eedb Fix partial regex match yielding false positive 2021-06-17 03:49:59 +02:00
glubsy
22033211d6 Fix exception when deleting while in delta view 2021-05-31 23:49:21 +02:00
0b46ca2222 Merge pull request #879 from glubsy/fix_unicode
Fix stripping (japanese) unicode characters
2021-05-25 19:11:19 -05:00
72e0f76242 Merge pull request #898 from AlttiRi/master
Change reference background color #894
2021-05-25 19:10:31 -05:00
[Alt'tiRi]
65c1d463f8 Change reference background color #894 2021-05-22 02:52:41 +03:00
e6c791ab0a Merge pull request #884 from samusz/master
Small typo
2021-05-09 23:32:32 -05:00
Sacha Muszlak
78f5088101 Merge pull request #1 from samusz/samusz-patch-1
typo correction
2021-05-07 09:41:47 +02:00
Sacha Muszlak
095df5eb95 typo correction 2021-05-07 09:40:08 +02:00
glubsy
f1ae478433 Fix including character at the border 2021-04-29 05:29:35 +02:00
glubsy
c4dcfd3d4b Fix stripping (japanese) unicode characters
* Accents are getting removed from Unicode characters to generate similar "words".
* Non-latin characters which cannot be processed that way (e.g. Japanese, Greek, Russian, etc.) should not be filtered out at all, otherwise files are erroneously skipped or detected as dupes if only some characters make it past the filter.
* Starting from an arbitrary Unicode codepoint (converted to decimal), above which we know it is pointless to try any sort of processing, we leave the characters as-is.
* Fix #878.
2021-04-29 05:15:34 +02:00
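
A rough sketch of the approach, with a hypothetical cutoff codepoint; the actual value and normalization path in dupeGuru may differ.

    import unicodedata

    CUTOFF = 0x2E80  # hypothetical: roughly where the CJK blocks start

    def strip_accents(text):
        out = []
        for c in text:
            if ord(c) >= CUTOFF:
                # Past the cutoff (e.g. Japanese), processing is pointless:
                # keep the character as-is so the word is not emptied out.
                out.append(c)
                continue
            # Below the cutoff, decompose and drop combining accent marks.
            decomposed = unicodedata.normalize("NFKD", c)
            out.append("".join(ch for ch in decomposed if not unicodedata.combining(ch)))
        return "".join(out)
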
0840104edf Merge pull request #873 from glubsy/fix_857
Fix 857
2021-04-20 20:05:05 -05:00
glubsy
6b4b436251 Fix crash on shutdown
* Fixes "'DetailsPanel' object has no attribute '_table'" error on shutdown if the Results table is updated (item removed) while the Details Dialog is shown as a floating window.
* It seems that QApplication.quit() triggers some sort of refresh on the floating QDockWidget, which in turn makes calls to the underlying model that is possibly being destroyed, ie. there might be a race condition here.
* Closing or hiding the QDockWidget before the call to quit() is a workaround. Similarly, this is already done in the quitTriggered() method anyway.
* This fixes #857.
2021-04-16 17:54:49 +02:00
glubsy
d18b8c10ec Remove redundant assignment
The "app" field is already set in the parent class.
2021-04-15 18:03:00 +02:00
4a40b346a4 Update to 4.1.1 2021-03-21 22:50:33 -05:00
035cdc23b1 Update translations from Transifex 2021-03-21 22:45:19 -05:00
fbdb333457 Update a few translation items
- Add Japanese as a selectable language
- Wrap a few missed strings in tr()
- Regenerate .pot files
2021-03-17 20:21:29 -05:00
e36aab177c Add import feature to build.py for translations 2021-03-17 19:55:00 -05:00
77116ba94b Bring in the languages that came incorrect last import again 2021-03-17 19:44:16 -05:00
d7f79aefd2 Remove translations imported incorrectly 2021-03-17 19:40:47 -05:00
4c939f379c Update translations from transifex 2021-03-09 21:16:37 -06:00
d098fe2281 Update translation pot files 2021-03-09 20:38:03 -06:00
09cfbad38d Merge pull request #844 from glubsy/translation_fixes
Fix problematic string for translations
2021-03-09 20:19:08 -06:00
glubsy
528dedd813 Fix problematic string for translations
Some languages have very different phrase syntaxes depending on which word is used.
Better to use two separate strings than a dynamically created one.
2021-02-09 01:40:00 +01:00
b30d67b834 Merge pull request #775 from glubsy/PR_typo_fix
Fix label strings
2021-02-02 19:08:28 -06:00
glubsy
3e6e74e2a9 Update URL 2021-01-30 22:17:43 +01:00
glubsy
b919b3ddc8 Fix typo 2021-01-30 04:20:22 +01:00
glubsy
be3862fa8f fix typo 2021-01-29 18:56:29 +01:00
glubsy
da09920553 Update exclusion filter help string 2021-01-29 17:57:44 +01:00
glubsy
2baba3bfa0 Fix selection label 2021-01-29 17:38:37 +01:00
a659a70dbe Add transifex project link to readme 2021-01-28 23:04:44 -06:00
c9e48a5e3b Update pyrcc5 note with new information
Added new information about the other system package which resolves the dependency.
This was brought up in #766.
2021-01-21 19:08:59 -06:00
68711162d1 Add note about pyrcc5 2021-01-21 18:49:44 -06:00
0b0fd36629 Revert "Update ReadMe and requirements"
This reverts commit bf5d151799.
2021-01-21 18:33:40 -06:00
bf5d151799 Update ReadMe and requirements
- On Linux (Debian-based), pyrcc5 does not make it onto the path, so
updating the notes here to take care of this behavior and updating requirements
so virtual environments load it correctly.
- Fix #766
2021-01-21 18:13:17 -06:00
e29a427caf Update translation files 2021-01-11 22:38:03 -06:00
95ccbad92b Fix #760, issue with language on windows
Fix the issue related to run.py qsettings not using the same options as
in preferences.py
2021-01-11 21:41:14 -06:00
421a58a61c Merge pull request #758 from serg-z/serg-z/prioritize-dialog-multi-selections
Prioritize dialog: adding/removing multiple items, adding/removing on double clicking an item, drag-n-drop fix
2021-01-11 18:50:15 -06:00
Sergey Zhuravlevich
b5a3313f80 Prioritize dialog: fix drag-n-drop putting items before the last item
When the items in the prioritizations list were drag-n-dropped to the
empty space, the row was equal to -1 and the dropped items ended up
being moved to the position before the last item. Fixing the row value
helps to avoid that behavior.

Signed-off-by: Sergey Zhuravlevich <sergey@zhur.xyz>
2021-01-07 17:42:43 +01:00
Sergey Zhuravlevich
116ac18e13 Prioritize dialog: add/remove criteria on double clicking an item
Signed-off-by: Sergey Zhuravlevich <sergey@zhur.xyz>
2021-01-07 17:42:43 +01:00
Sergey Zhuravlevich
32dcd90b50 Prioritize dialog: allow removing multiple prioritizations at once
Removing prioritizations one-by-one can be tedious. This commit enables
extended selection in the prioritizations list. Multiple items can be
selected with conventional methods, such as holding down Ctrl or Shift
key and clicking the items or holding down the left mouse button and
hovering the cursor over the list. All items also can be selected with
Ctrl+A.

Multiple items drag-n-drop is also possible.

To avoid confusion, the selection in the prioritizations list is cleared
after the items are removed or drag-n-dropped.

Signed-off-by: Sergey Zhuravlevich <sergey@zhur.xyz>
2021-01-07 17:42:30 +01:00
Sergey Zhuravlevich
c2fef8d624 Prioritize dialog: allow adding multiple criteria at once
Adding criteria to the prioritizations list one-by-one can be tedious.
This commit enables extended selection in the criteria list and
implements adding multiple items. Multiple criteria can be selected with
conventional methods, such as holding down Ctrl or Shift keys and
clicking the items or holding down the left mouse button and hovering
the cursor over the list. All items also can be selected with Ctrl+A.

Signed-off-by: Sergey Zhuravlevich <sergey@zhur.xyz>
2021-01-07 17:42:07 +01:00
fd0adc77b3 Update Readme notes for system setup 2021-01-06 12:22:15 -06:00
6a03e1e399 Update URLs 2021-01-05 23:21:44 -06:00
ae51842007 Update README.md 2021-01-05 23:04:42 -06:00
ab6acd9e88 Merge pull request #733 from glubsy/dev
Increment version to 4.1.0
2021-01-05 22:48:21 -06:00
6a2c1eb293 Fix flake8 issues introduced in package.py 2020-12-30 20:04:14 -06:00
7b4c31d262 Update for macos Qt version
- Update package.py to include a pyinstaller based packaging
- Update requirements and requirements-extra
- Add icon for macos
- Add macos.md for instructions
2020-12-30 16:44:27 -06:00
glubsy
5553414205 Fix updating QTableView on input
* When clicking on the test regex button or editing the test input field, the tableView doesn't update its data properly.
* Somehow QTableView.update() doesn't request the data from the model.
* The workaround is to call refresh on the model directly, which will in turn update its view.
2020-12-30 23:18:42 +01:00
glubsy
b138dfad33 Fix exception when testing invalid regex
* If a regex in the table is invalid and failed to compile, its "compiled" property is None.
* Only test against the regex if its compilation worked.
2020-12-30 22:50:42 +01:00
701e6d4bb2 Merge pull request #755 from glubsy/packaging
Fix Debian packaging issues
2020-12-30 14:41:34 -06:00
b44d1652b6 Change windows to use ini in AppData 2020-12-30 12:43:10 -06:00
glubsy
990eaaa797 Update requirements.txt
* Recently, the "hsaudiotag3k" on pypi has changed name slightly
* The actual version is now "1.1.3.post1"
* This avoids errors when invoking `pip -r requirements.txt`
2020-12-30 18:52:37 +01:00
glubsy
348ce95f83 Remove comment
* There is a bug with pyqt5<=5.14 where the table does not update after a call to update() and needs to receive a mouse click event in order to repaint as expected.
* This does not affect only Windows, as this is a Qt5 bug.
* This seems to be fixed with pyqt5>=5.15.1.
2020-12-30 18:44:38 +01:00
glubsy
3255bdf0a2 Fix incorrect path 2020-12-30 17:55:53 +01:00
glubsy
1058247b44 Fix missing application icon
Should be placed in /usr/share/pixmaps for the .desktop file to point to it.
2020-12-30 00:24:15 +01:00
glubsy
7414f82e28 Fix missing directory for pixmap symlink in Debian 2020-12-29 23:57:10 +01:00
glubsy
8105bb709f Fix debian src package build
Workaround "dpkg-source: error: can't build with source format '3.0 (native)': native package version may not have a revision" error as mentioned in #753
2020-12-29 23:45:15 +01:00
ec628751af Minor cleanup to Windows.md 2020-12-29 14:56:37 -06:00
glubsy
288023d03e Update changelog 2020-12-29 21:51:16 +01:00
glubsy
7740dfca0e Update Readme 2020-12-29 21:31:36 +01:00
1e12ad8d4c Clean up Makefile & unused files
- Remove requirements-windows.txt as no longer used
- Remove srcpkg.sh as not up to date and not used
- Minor cleanup in makefile
- Update minimum python version to 3.6 in makefile
2020-12-29 14:08:37 -06:00
glubsy
c1d94d6771 Merge branch 'master' into dev 2020-12-29 20:10:42 +01:00
7f691d3c31 Merge pull request #705 from glubsy/exclude_list
Add Exclusion Filters
2020-12-29 12:56:44 -06:00
glubsy
a93bd3aeee Add missing translation hooks 2020-12-29 18:52:22 +01:00
glubsy
39d353d073 Add comment about Win7 bug
* For some reason the table view doesn't update properly after the test string button is clicked nor when the input field is edited
* The table rows only get repainted properly after receiving a mouse click event
* This doesn't happen on Linux
2020-12-29 18:28:30 +01:00
glubsy
b76e86686a Tweak green color on exclude table 2020-12-29 16:41:34 +01:00
glubsy
b5f59d27c9 Brighten up validation color
Dark green lacks contrast against black foreground font
2020-12-29 16:31:03 +01:00
glubsy
f0d3dec517 Fix exclude tests 2020-12-29 16:07:55 +01:00
glubsy
90c7c067b7 Merge branch 'master' into exclude_list 2020-12-29 15:55:44 +01:00
c8cfa954d5 Minor packaging cleanups
- Fix issue with newline in pkg/debian/source/format
- Update pyinstaller requirement to support python 3.8/3.9
2020-12-28 22:51:09 -06:00
glubsy
e533a396fb Remove redundant check 2020-12-29 05:39:26 +01:00
glubsy
4b4cc04e87 Fix directories tests on Windows
Regexes did not match properly because the separator for Windows is '\\'
2020-12-29 05:35:30 +01:00
e822a67b38 Force correct python environment for tox on windows 2020-12-28 21:18:16 -06:00
c30c3400d4 Fix typo in .travis.yml 2020-12-28 21:07:49 -06:00
d539517525 Update Windows Requirements & CI
- Merge windows requirements into requirements.txt and requirements-extra.txt
- Update tox.ini to always use build.py
- Update build.py to have module only option
- Update tox.ini to test python 3.9
- Update .travis.yml to test 3.8 and 3.9 on newer Ubuntu LTS
- Update .travis.yml to work with changes to windows tox
(also update windows to 3.8)
2020-12-28 20:59:01 -06:00
glubsy
07eba09ec2 Fix error after merging branches 2020-12-29 01:01:26 +01:00
glubsy
7f19647e4b Remove unused lines 2020-12-29 00:56:25 +01:00
bf7d720126 Merge pull request #746 from glubsy/PR_iconpath
Make icon path relative
2020-12-28 14:47:34 -06:00
glubsy
6bc619055e Change version to 4.1.0 2020-12-06 20:13:03 +01:00
glubsy
452d1604bd Make icon path relative
* Removes the hardcoded path to the icon in the .desktop file
* Allows themes to override the default application icon (icons are searched for in theme paths first)
* Debian: create symbolic link in /usr/share/pixmaps that points to the icon file
* Arch: the same thing is done by PKGBUILD maintainers downstream
2020-12-06 18:36:52 +01:00
glubsy
680cb581c1 Merge branch 'master' into exclude_list 2020-10-28 03:58:05 +01:00
1d05f8910d Merge pull request #701 from glubsy/PR_ref_row_background_color
Change reference row background color
2020-10-27 21:53:53 -05:00
glubsy
bd09b30468 Merge branch 'master' into PR_ref_row_background_color 2020-10-28 03:50:13 +01:00
8d9933d035 Merge pull request #683 from glubsy/details_dialog_improvements
Add image comparison features to details dialog
2020-10-27 21:28:23 -05:00
glubsy
cf5ba038d7 Remove icon credits from about box
* Moved credits to CREDITS file
* Updated exchange icon with higher hue contrast for better visibility on dark backgrounds
2020-10-28 02:18:41 +01:00
glubsy
59ce740369 Remove print debug statements 2020-10-28 01:50:49 +01:00
glubsy
92feba5f08 Remove obsolete UI setup code 2020-10-28 01:48:39 +01:00
glubsy
a265b71d36 Improve comment reflecting modification of function 2020-10-28 01:45:03 +01:00
8d26c921a0 Merge pull request #706 from glubsy/save_directories
Save/Load directories in Directories
2020-10-27 19:10:11 -05:00
glubsy
32d66cd19b Move up to 4.0.5
* Initial push to 4.0.5 milestone
* Update changelog
2020-10-27 19:38:51 +01:00
glubsy
735ba2fd0e Update error dialog traceback message for users
* Encourage users to look for already existing issues
* Also invite them to test the very latest version available first
2020-10-27 18:23:14 +01:00
glubsy
b16b6ecf4d Fix error after merging branches 2020-10-27 18:15:15 +01:00
glubsy
2875448c71 Merge branch 'save_directories' into dev 2020-10-27 16:23:49 +01:00
glubsy
51b76385c0 Merge branch 'exclude_list' into dev 2020-10-27 16:23:43 +01:00
glubsy
b9f8dd6ea0 Merge branch 'PR_ref_row_background_color' into dev 2020-10-27 16:23:35 +01:00
glubsy
6623b04403 Merge branch 'details_dialog_improvements' into dev 2020-10-27 16:23:23 +01:00
glubsy
424d34a7ed Add desktop.ini to filter list 2020-09-04 19:07:07 +02:00
glubsy
2a032d24bc Save/Load directories in Directories
* Add the ability to save / load directories as XML, just like the last_directories.xml which gets loaded on program start.
2020-09-04 18:56:25 +02:00
glubsy
b8af2a4eb5 Don't show parent window's context menu on viewers
* When right clicking on image viewers while they are docked, the context menu of the Results window showed up.
* This also enables capture of right click and middle click buttons to drag around images, which solves a conflict with some theme engines that enable left mouse button click to drag a window's position regardless of where the event happens, hence blocking the panning.
* Probably unnecessary to check which button is released.
2020-09-03 01:44:01 +02:00
glubsy
a55e02b36d Fix table maximum size being off by a few pixels
* Sometimes, the splitter doesn't fully reach the table maximum height, and the scrollbar is still displayed on the right because a few pixels are still hidden.
* It seems the splitter handle counts towards the total height of the widget (the table), so we add it to the maximum height of the table
* The scrollbar disappears when we reach just above the actual table's height
2020-09-02 23:45:31 +02:00
glubsy
18c933b4bf Prevent widget from stretching in layout
* In some themes, the color picker widgets get stretched, while the color picker for the details dialog group doesn't.
This should keep them a bit more consistent across themes.
2020-09-02 20:26:23 +02:00
glubsy
ea11a566af Highlight rows when testing regex string
* Add testing feature to Exclusion dialog to allow users to test regexes against an arbitrary string.
* Fixed test suites.
* Improve comments and help dialog box.
2020-09-01 23:02:58 +02:00
glubsy
584e9c92d9 Fix duplicate items in menubar
* When recreating the Results window, the menubar had duplicate items added each time.
* Removing the underlying C++ object is apparently enough to fix the issue.
* SetParent(None) can still be used in case of floating windows
2020-08-31 21:23:53 +02:00
glubsy
4a1641e39d Add test suite, fix bugs 2020-08-31 20:35:56 +02:00
glubsy
26d18945b1 Fix tab indices not aligned with stackwidget's
* The custom QStackWidget+QTabBar class did not manage the tabs properly because the indices in the stackwidget were not aligned with the ones in the tab bar.
* Properly disable exclude list action when it is the currently displayed widget.
* Merge action callbacks for triggering ignore list or exclude list to avoid repeating code and remove unused checks for tab visibility.
* Remove unused SetTabVisible() function.
2020-08-23 16:49:43 +02:00
glubsy
3382bd5e5b Fix crash when recreating Results window/tab
* We need to set the Details Dialog's previous instance to None when recreating a new Results window
otherwise Qt crashes since we are probably dereferencing a dangling reference.
* Also fixes Results tab not showing up when selecting it from the View menu.
2020-08-20 17:12:39 +02:00
glubsy
9f223f3964 Concatenate regexes prior to compilation
* Concatenating regexes into one Pattern might yield better performance under (un)certain conditions.
* Filenames are tested against regexes with no os.sep in them. This may or may not be what we want to do.
An alternative would be to test against the whole (absolute) path of each file, which would filter more aggressively.
2020-08-20 02:46:06 +02:00
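
A generic sketch of concatenating the exclusion regexes into one compiled Pattern, with illustrative patterns.

    import re

    def compile_patterns(patterns):
        # One alternation means each filename is tested against a single
        # compiled Pattern instead of a loop over many of them.
        combined = "|".join(f"(?:{p})" for p in patterns)
        return re.compile(combined, re.IGNORECASE)

    excluded = compile_patterns([r".*thumbs\.db", r"desktop\.ini"])
    print(bool(excluded.fullmatch("Thumbs.db")))  # True
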
glubsy
2eaf7e7893 Implement exclude list dialog on the Qt side 2020-08-17 05:54:59 +02:00
glubsy
a26de27c47 Implement dialog and base classes for model/view 2020-08-14 20:19:47 +02:00
glubsy
21e62b7374 Colorize background for reference row
As per issue #647, highlight background color for reference for better readability.
2020-08-12 21:37:29 +02:00
9e6b117327 Merge pull request #698 from glubsy/fix_630
Workaround for #630
2020-08-06 23:16:02 -05:00
glubsy
3333d26557 Try to handle conversion to int or fail gracefully 2020-08-07 00:37:37 +02:00
glubsy
6e81042989 Workaround for #630
* In some cases, the function dump_IFD() in core/pe/exif.py assigns a string instead of an int as "values".
* This value is then used as _cached_orientation in core/pe/photo.py in _get_orientation().
* The method _plat_get_blocks() in qt/pe/photo.py was only expecting an integer for the orientation argument, so we work around the issue for now by ignoring the value if it's a string.
2020-08-06 00:23:49 +02:00
glubsy
470307aa3c Ignore path and filename based on regex
* Added initial draft for test suite
* Fixed small logging bug
2020-08-03 16:19:27 +02:00
glubsy
089f00adb8 Fix typo in class member reference 2020-08-03 16:18:15 +02:00
glubsy
76fbfc2822 Fix adding new Result tab if already existed
* Whenever the Result Window already existed and its tab was in second position, and if the ignore list tab was in 3rd position, asking to show the Result window through the View menu would add a new tab and push the Result tab to the third position (ignore list tab would then become 2nd position).
* Fix view menu Directories entry not switching to index "0" in custom tab bar.
2020-08-02 16:12:47 +02:00
glubsy
866bf996cf Prevent Directories tab from closing on MacOS
* The close button on custom tabs cannot be hidden on MacOS for some reason.
* Prevent the directories tab from closing if the close button was clicked by mistake
2020-08-01 19:35:12 +02:00
glubsy
0104d8922c Fix alignment for combo box's label 2020-08-01 19:11:37 +02:00
glubsy
fbd7c4fe5f Tweak visuals for cache selection item 2020-08-01 19:07:45 +02:00
glubsy
de5e61293b Add stretch to bottom of General pref tab 2020-08-01 19:02:04 +02:00
glubsy
a3e402a3af Group general interface options together
* Use QGroupBox to keep items together on the display tab in the preference dialog just like for the other options.
* It is probably not necessary to keep these as class members
2020-08-01 18:50:44 +02:00
glubsy
056fa819cc Revert stretching last section in Result window
* It seems that stretching the last section automatically is a bit inconvenient on MacOS as it will grow beyond the window border.
* Keep it as it was before for now until a better solution is devised.
2020-08-01 18:42:46 +02:00
glubsy
3be1ee87c6 Merge branch 'master' into details_dialog_improvements 2020-08-01 18:29:22 +02:00
glubsy
628d772766 Use FormLayout instead of GridLayout
QFormLayout should adhere to each platform's style better. It also simplifies the code a bit since we don't have to set up the labels, etc.
2020-08-01 17:40:31 +02:00
glubsy
acdeb01206 Tweak preference layout for better readability
* We use GroupBoxes to group items together and surround them in a frame
* Remove separator lines to avoid cluttering
* Adjust columns and their stretch factors for better alignment of buttons
2020-08-01 16:42:14 +02:00
ab402d4024 Merge pull request #688 from glubsy/tab_window
Use tabs instead of floating windows
2020-07-31 22:11:31 -05:00
glubsy
d2cdcc989b Fix 1 pixel sized color in color picker buttons
* On Linux, even with 1 pixel size, the button is filled entirely with the color selected
* On MacOS, the color pixmap stays at 1 pixel so we hard code the size to 16x16
2020-08-01 02:09:38 +02:00
glubsy
2620d0080c Fix layout error
* Avoid attempting to add a QLayout to DetailsDialog which already has a layout by removing superfluous layout setup.
2020-07-31 22:37:18 +02:00
glubsy
63a9f00552 Add minor change to variable names 2020-07-31 22:27:18 +02:00
glubsy
87f9317805 Place tab bar below menu bar by default 2020-07-31 16:59:34 +02:00
glubsy
a542168a0d Reorganize view menu entries and keep consistency 2020-07-31 16:57:18 +02:00
glubsy
86e1b55b02 Fix menu items being wrongly disabled
* Add Directories to the View menu.
* View menu items should be disabled properly depending on whether they point to the current page/tab.
* Keep "Load scan results" actions active while viewing pages other than the Directories tab.
2020-07-31 05:08:08 +02:00
glubsy
1b3b40543b Fix ignore list view menu entry being disabled 2020-07-31 03:59:37 +02:00
glubsy
dd6ffe08d7 Add option to place tab bar below main menu 2020-07-31 01:32:29 +02:00
glubsy
11254381a8 Save dock panel position on quit
* Restore the details dialog dock position if it was previously docked (i.e. not floating).
* Since the details_dialog instance was not deleted after closing by default, the previous instances were still saving their own geometry. We now delete them explicitly if we have to recreate a new instance to avoid the signal triggering the callback to save the geometry.
* Since restoreGeometry() and saveGeometry() are only called in our QDockWidget, it should be safe to modify the methods for the Preferences class (in qtlib).
2020-07-30 20:25:20 +02:00
glubsy
23642815f6 Remove unused properties in details table headers 2020-07-30 15:38:37 +02:00
glubsy
7e4f371841 Avoid crash when quitting
* If the details dialog failed to be created for some reason, avoid crashing due to dereferencing a null pointer
2020-07-30 15:30:09 +02:00
glubsy
9b8637ffc8 Stretch last header section in Result window 2020-07-30 15:16:31 +02:00
glubsy
79613f9b1e Fix crash quitting while details dialog active
* While the details dialog is opened, if quit is triggered, the error message "'DetailsPanel' object has no attribute '_table'" is reported
* A workaround is to cleanly close the dialog before tear down
2020-07-30 03:22:13 +02:00
glubsy
fa54e93236 Add preference to turn off scrollbars in viewers
Refactor preference Display page to only include PE specific preferences in the PE mode.
2020-07-30 03:13:58 +02:00
glubsy
8fb82ae3d8 Merge branch 'master' into tab_window 2020-07-29 21:48:32 +02:00
glubsy
eab5003e61 Add color preference for delta in details table 2020-07-29 21:43:45 +02:00
glubsy
da8c493c9f Toggle visibility of details dialog
* When using the Ctrl+I shortcut or the "Details" button in the Results window, toggle the details dialog on/off.
* This works also while it is docked.
2020-07-29 20:43:18 +02:00
glubsy
9795f14176 Fix title bar toggling on/off when dialog 2020-07-29 20:00:27 +02:00
glubsy
1937120ad7 Fix toggling details view via menu or shortcut
* Using Ctrl+I would toggle the title bar on/off
2020-07-29 04:51:03 +02:00
glubsy
1823575af4 Fix swapping table view columns
We now have only two columns to swap, not 3.
2020-07-29 04:26:40 +02:00
glubsy
7dc9f25b06 Merge branch 'master' into details_dialog_improvements 2020-07-29 04:20:16 +02:00
5502b48089 Merge pull request #685 from glubsy/fix_result_window_action
Fix updating result window action upon creation
2020-07-28 20:05:10 -05:00
f02b66fd54 Merge pull request #682 from glubsy/details_table_tweaks
Colorize details table differences, allow moving around of rows
2020-07-28 19:33:21 -05:00
d2235f9bc9 Merge pull request #694 from glubsy/fix_matchblock_freeze
Work around frozen progress dialog
2020-07-28 18:10:24 -05:00
glubsy
5f5f9232c1 Properly wait for multiprocesses to exit
* Fix for #693
2020-07-28 16:44:06 +02:00
c36fd84512 Merge pull request #691 from glubsy/fix_package_script
Fix error in package script for (Arch) Linux
2020-07-28 00:51:17 -05:00
glubsy
63b2f95cfa Work around frozen progress dialog
* It seems that matchblock.getmatches() returns too early and the (multi-)processes become zombies
* This is a workaround which seems to work by sleeping for one second to avoid zombie processes
2020-07-25 23:37:41 +02:00
glubsy
d193e1fd12 Fix typo in error message 2020-07-24 03:50:08 +02:00
glubsy
f0adf35db4 Add helpful message if build files are missing 2020-07-24 03:48:07 +02:00
glubsy
49a1beb225 Avoid using workarounds in package script
* Just like the Windows package function counterpart, better abort building the package if the help and locale files have not been built instead of ignoring the error
2020-07-24 03:33:13 +02:00
glubsy
f19b5d6ea6 Fix error in package script for (Arch) Linux
* While packaging, the "build/help" and "build/locale" directories are not found.
* Work around the issue with try/except statements.
2020-07-24 03:23:03 +02:00
glubsy
730fadf63f Merge branch 'preferences_tabs' into details_dialog_improvements 2020-07-22 22:41:22 +02:00
glubsy
9ae0d7e5cf Add color picker buttons to preferences dialog
* Buttons display the color currently in use
* Result table uses selected colors accordingly
* Keep items aligned with GridLayouts in preference dialog
* Reorder items in a more logical manner
2020-07-22 22:12:46 +02:00
1167519730 Merge pull request #687 from glubsy/ignore_list_wordwrap
Fix word wrap in ignore list dialog
2020-07-21 20:39:14 -05:00
glubsy
cf64565012 Add option to use internal icons in details dialog
* On Windows and MacOS, no idea how themes work so only allow Linux to use their theme icons
* Internal icons are used by default on non-Linux platforms
2020-07-21 03:52:15 +02:00
glubsy
298f659f6e Fix Restore Default Preferences button
* When clicking the "Restore Default" in the preferences dialog, only affect the preferences displayed in the current tab. The hidden tab should not be affected by this button.
2020-07-20 05:04:25 +02:00
glubsy
3539263437 Add tabs to preference dialog. 2020-07-20 03:10:06 +02:00
glubsy
6213d50670 Squashed commit of the following:
commit ac941037ff
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Thu Jul 16 22:21:24 2020 +0200

    Fix resize of top frame not updating scaled pixmap

    * Also limit viewing features such as zoom levels when files have different dimensions
    * GraphicsViewImageViewer is still a bit buggy: the scrollbars are toggled on when the pixmap is null in the reference viewer (we do not use that class right now anyway)

commit 733b3b0ed4
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Thu Jul 16 01:31:24 2020 +0200

    Prevent zoom for images of differing dimensions

    * If images are not the same size, prevent zooming features from being used by disabling the normal size button, only enable swap

commit 9168d72f38
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 22:47:32 2020 +0200

    Update preferences on show(), not in constructor

    * If the dialog window shouldn't have a titlebar during construction, update accordingly only when showing to fix Windows displaying a window without titlebar on first show
    * Only save geometry if the window is floating. Otherwise geometry while docked is saved which gives weird results on subsequent starts, since it may be floating by default anyway (at least on Linux where titlebar being disabled is allowed while floating)
    * Vertical title bar doesn't seem to work on Windows, add note in preferences dialog

commit 75621cc816
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 22:04:19 2020 +0200

    Prevent Windows from floating if no decoration

    * Windows users cannot move a window which has no native decorations. Toggling a dock widget's titlebar off also removes native decorations on a floating window. Until we implement a replacement titlebar by overriding paintEvents, simply force the floating window to go back to docked state after we toggled the titlebar off.

commit 3c816b2f11
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 21:43:01 2020 +0200

    Fix computing and setting offset to 0 for tableview

commit 85d6e05cd4
Merge: 66127d02 3eddeb6a
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 21:25:44 2020 +0200

    Merge branch 'dockable_windows' into details_dialog_improvements_dev

commit 66127d025e
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 20:22:13 2020 +0200

    Add credit for icons used, upscale exchange icon

    * Jason Cho gave his express permission to use the icon (it was made 10 years ago and he doesn't have the source files anymore)
    * Used waifu2x to upscale the icon
    * Used GIMP to draw dark outline around the icon
    * Source files are included

commit 58c675d1fa
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 05:25:47 2020 +0200

    Add custom icons

    * Use custom icons on platforms which do not provide theme
    * Old zoom icons credits to "schollidesign" from icon pack Office and Entertainment (GPL licence).
    * Exchange icon credit to Jason Cho (Unknown license).
    * Use hack to resize viewers on first show() as well

commit 95b8406c7b
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Wed Jul 15 04:14:24 2020 +0200

    Fix scrollbar displayed while splitter maxed out

    * For some reason the table's height is a few pixel longer on Windows so we work around the issue by adding a small offset to the maximum height hint.
    * No idea about MacOS yet but this might need the same treatment.

commit 3eddeb6aeb
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Tue Jul 14 17:37:48 2020 +0200

    Fix ME/SE details dialogs, add preferences

    * Fix ME and SE versions of details dialog not displaying their content properly after change to QDockWidget
    * Add option to toggle titlebar and orientation of titlebar in preferences dialog
    * Fix setting layout on PE details dialog window while layout already set, by removing the self (parent) reference in constructing the QSplitter

commit 56912a7108
Author: glubsy <glubsy@users.noreply.github.com>
Date:   Mon Jul 13 05:06:04 2020 +0200

    Make details dialog dockable
2020-07-16 22:31:54 +02:00
glubsy
ac941037ff Fix resize of top frame not updating scaled pixmap
* Also limit viewing features such as zoom levels when files have different dimensions
* GraphicsViewImageViewer is still a bit buggy: the scrollbars are toggled on when the pixmap is null in the reference viewer (we do not use that class right now anyway)
2020-07-16 22:21:24 +02:00
glubsy
733b3b0ed4 Prevent zoom for images of differing dimensions
* If images are not the same size, prevent zooming features from being used by disabling the normal size button, only enable swap
2020-07-16 01:31:24 +02:00
glubsy
9168d72f38 Update preferences on show(), not in constructor
* If the dialog window shouldn't have a titlebar during construction, update accordingly only when showing to fix Windows displaying a window without titlebar on first show
* Only save geometry if the window is floating. Otherwise geometry while docked is saved which gives weird results on subsequent starts, since it may be floating by default anyway (at least on Linux where titlebar being disabled is allowed while floating)
* Vertical title bar doesn't seem to work on Windows, add note in preferences dialog
2020-07-15 23:00:55 +02:00
glubsy
75621cc816 Prevent Windows from floating if no decoration
* Windows users cannot move a window which has no native decorations. Toggling a dock widget's titlebar off also removes native decorations on a floating window. Until we implement a replacement titlebar by overriding paintEvents, simply force the floating window to go back to docked state after we toggled the titlebar off.
2020-07-15 22:12:19 +02:00
glubsy
3c816b2f11 Fix computing and setting offset to 0 for tableview 2020-07-15 21:48:11 +02:00
glubsy
85d6e05cd4 Merge branch 'dockable_windows' into details_dialog_improvements_dev 2020-07-15 21:25:44 +02:00
glubsy
66127d025e Add credit for icons used, upscale exchange icon
* Jason Cho gave his express permission to use the icon (it was made 10 years ago and he doesn't have the source files anymore)
* Used waifu2x to upscale the icon
* Used GIMP to draw dark outline around the icon
* Source files are included
2020-07-15 20:22:13 +02:00
glubsy
58c675d1fa Add custom icons
* Use custom icons on platforms which do not provide theme
* Old zoom icons credits to "schollidesign" from icon pack Office and Entertainment (GPL licence).
* Exchange icon credit to Jason Cho (Unknown license).
* Use hack to resize viewers on first show() as well
2020-07-15 05:25:47 +02:00
glubsy
95b8406c7b Fix scrollbar displayed while splitter maxed out
* For some reason the table's height is a few pixel longer on Windows so we work around the issue by adding a small offset to the maximum height hint.
* No idea about MacOS yet but this might need the same treatment.
2020-07-15 04:14:24 +02:00
glubsy
3eddeb6aeb Fix ME/SE details dialogs, add preferences
* Fix ME and SE versions of details dialog not displaying their content properly after change to QDockWidget
* Add option to toggle titlebar and orientation of titlebar in preferences dialog
* Fix setting layout on PE details dialog window while layout already set, by removing the self (parent) reference in constructing the QSplitter
2020-07-14 17:37:48 +02:00
glubsy
56912a7108 Make details dialog dockable 2020-07-13 05:06:04 +02:00
glubsy
7ab299874d Merge commit 'b0a256f0' 2020-07-12 17:54:51 +02:00
glubsy
a4265e7fff Use tabs instead of floating windows
* The Directories dialog, Results window and Ignore List dialog are the three windows which can now be shown as tabs instead of floating as they did previously.
* Menus are automatically updated depending on which type of dialog is the current tab. Menu items which do not apply to the currently displayed tab are disabled but not hidden.
* The floating-window logic is preserved in case we want to use it again later (I don't see why we would, though)
* There are two versions of the tab bar: the default one, used by the TabBarWindow class, places the tabs next to the top menu to save screen real estate; the other option, TabWindow, uses a regular QTabWidget whose tab bar sits directly above the displayed window (a rough sketch follows below).
* There is a toggle option in the View menu to hide the tabs; the windows can still be reached through the View menu items.
2020-07-12 17:23:35 +02:00
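A hypothetical sketch of the TabWindow variant described above; the real implementation also wires menu updates and view toggling:

```python
from PyQt5.QtWidgets import QMainWindow, QTabWidget, QWidget

class TabWindowSketch(QMainWindow):
    """Host the three main views as tabs instead of separate floating windows."""

    def __init__(self, directories_view: QWidget, results_view: QWidget, ignore_list_view: QWidget):
        super().__init__()
        tabs = QTabWidget(self)
        tabs.addTab(directories_view, "Directories")
        tabs.addTab(results_view, "Results")
        tabs.addTab(ignore_list_view, "Ignore List")
        self.setCentralWidget(tabs)
        # Menu items would be enabled/disabled from tabs.currentChanged
        # depending on which view is active, as the commit describes.
```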
glubsy
db228ec8a3 Fix word wrap in ignore list dialog 2020-07-12 16:17:18 +02:00
glubsy
61fc4f07ae Fix updating result window action upon creation
* Result Window action was not being properly updated
after the ResultWindow had been created.
There was no way of retrieving the window after it had been closed.
2020-07-07 16:54:08 +02:00
glubsy
b0a256f0d4 Fix flake8 minor issues 2020-07-02 23:09:02 +02:00
glubsy
4ee9479a5f Add image comparison features to details dialog
* Add splitter in order to hide the details table.
* Add a toolbar to the Details Dialog window to allow for better image
comparisons: zoom in/out, swap pixmaps in place, best-fit-to-viewport.
Scrollbars and viewports are synchronized.
2020-07-02 22:52:47 +02:00
glubsy
e7b3252534 Cleanup of details table 2020-07-02 22:36:57 +02:00
glubsy
36ab84423a Move buttons into the toolbar class.
* Moved the QToolBar into the image viewer's module.
* QAction are still attached to the dialog window for shortcuts to work
2020-07-02 22:36:57 +02:00
glubsy
370b582c9b Add working zoom functions to GraphicsView viewers. 2020-07-02 22:36:57 +02:00
glubsy
9f15139d5f Fix view resetting when selecting reference only.
* Needed to ignore the scrollbar changes in the disabled
panel, since a null pixmap would reset the bars to 0 and affect
the selected viewer.
* Keep the view at the same scale across entries from the same group.
2020-07-02 22:36:57 +02:00
glubsy
011939f5ee Keep scale across files of the same dupe group.
* Also fix the scaled-down pixmap when updating the pixmap within the same group
* Fix ignoring mouse wheel event when max scale has been reached
* Fix toggle scrollbars when asking for normal size
2020-07-02 22:36:57 +02:00
glubsy
977c20f7c4 Add QSplitter to hide TableView in DetailsDialog 2020-07-02 22:36:57 +02:00
glubsy
aa79b31aae Work around resizing down offset by 1 pixel. 2020-07-02 22:36:57 +02:00
glubsy
970bb5e19d Add mostly working ScrollArea image viewers
* Work around flickering scrollbars, caused by the GridLayout resizing on odd pixels, by disabling the scrollbars while BestFit is active
* Also set a minimum column width to work around the issue above
* Avoid updating scrollbar values twice by using a simple boolean lock (see the sketch below)
2020-07-02 22:36:57 +02:00
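A minimal sketch of the "simple boolean lock" mentioned above, assuming two scrollbar-like objects with a valueChanged signal and a setValue slot (illustrative only, not the project's code):

```python
class ScrollBarSync:
    """Mirror two scrollbars while ignoring the echo produced when we set
    the other bar's value ourselves."""

    def __init__(self, first, second):
        self._updating = False  # the "simple boolean lock"
        first.valueChanged.connect(lambda value: self._propagate(second, value))
        second.valueChanged.connect(lambda value: self._propagate(first, value))

    def _propagate(self, target, value):
        if self._updating:
            return  # the change originated from us; don't bounce it back
        self._updating = True
        try:
            target.setValue(value)
        finally:
            self._updating = False
```

One instance per axis would be enough, e.g. one for the two viewers' vertical scrollbars and one for the horizontal ones.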
glubsy
a706d0ebe5 Implement mostly working ScrollArea viewer
Using a QWidget inside the QScrollArea mostly works
but we only move around the pixmap inside the QWidget,
not the QWidget itself, which doesn't update scrollbars.
Need a better implementation.
2020-07-02 22:36:57 +02:00
glubsy
b7abcf2989 Use native QPixmap swap() method instead of manual setPixmap()
When swapping images, use getters to hopefully get a reference to
each pixmap and swap them within a single slot.
2020-07-02 22:36:57 +02:00
glubsy
8103cb3664 Disable unused methods from controller
* setPixmap() now disables the QWidget automatically if the pixmap passed is null.
* the controller relays repaint events to the other widget
2020-07-02 22:36:57 +02:00
glubsy
c3797918d2 Controller class to decouple from the dialog class
The controller singleton acts as a proxy to relay
signals from each widget to the other.
It should help encapsulate things better if we need to
use a different class for image viewers in the future (see the sketch below).
2020-07-02 22:36:57 +02:00
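A hedged sketch of the controller idea described above; the viewer method `pan_by` is a hypothetical placeholder, not the project's API:

```python
from PyQt5.QtCore import QObject

class ViewerController(QObject):
    """Proxy between the two image viewers: each viewer reports events to the
    controller, which relays them to the other viewer."""

    def __init__(self, left_viewer, right_viewer):
        super().__init__()
        self._left = left_viewer
        self._right = right_viewer

    def _other(self, source):
        return self._right if source is self._left else self._left

    def on_repaint(self, source):
        # Relay repaint events to the other widget, as the commit describes.
        self._other(source).update()

    def on_panned(self, source, dx, dy):
        # `pan_by` is a hypothetical viewer method used for illustration.
        self._other(source).pan_by(dx, dy)
```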
glubsy
60ddb9b596 Working synchronized views. 2020-07-02 22:36:57 +02:00
glubsy
a29f3fb407 only update delta when mouse is being dragged to reduce paint events 2020-07-02 22:36:57 +02:00
glubsy
c6162914ed working synchronized panning 2020-07-02 22:36:57 +02:00
glubsy
02bd822ca0 working zoom functions, mouse wheel event 2020-07-02 22:36:57 +02:00
glubsy
ea6197626b drag mouse with ImageViewer class 2020-07-02 22:36:57 +02:00
glubsy
468a736bfb add normal size button 2020-07-02 22:36:57 +02:00
glubsy
f42df12a29 attempt at double click on QLabel 2020-07-02 22:36:57 +02:00
glubsy
9b48e1851d add zoom and swap buttons to details dialog 2020-07-02 22:36:57 +02:00
glubsy
c973224fa4 Fix flake8 indentation warnings 2020-07-01 03:05:59 +02:00
092cf1471b Add details to commented out tests. 2020-06-30 12:25:23 -05:00
glubsy
5cbe342d5b Ignore formatting if no data returned from model 2020-06-30 18:32:20 +02:00
4f252480d3 Fix pywin32 dependency 2020-06-30 00:52:04 -05:00
5cc439d846 Clean up rest of DeprecationWarnings 2020-06-30 00:51:06 -05:00
glubsy
c6f5031dd8 Add color and bold font if difference in model
* Could be better optimized if there is a way to
set those variables earlier in the model or somewhere
in the viewer when it requests the data.
* Right now it compares strings(?) many times for every role
we handle, which is not ideal (a rough sketch of the role handling follows below).
2020-06-30 04:20:27 +02:00
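A small, hypothetical illustration of the role handling described above: return a bold font and a colored brush only when a cell's value differs from the reference row's value (not the project's actual model code):

```python
from PyQt5.QtCore import Qt
from PyQt5.QtGui import QBrush, QFont

def cell_data(value, reference_value, role):
    """Return data for a details-table cell, highlighting differences."""
    if role == Qt.DisplayRole:
        return value
    if value != reference_value:
        if role == Qt.FontRole:
            font = QFont()
            font.setBold(True)
            return font
        if role == Qt.ForegroundRole:
            return QBrush(Qt.red)
    return None
```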
glubsy
eb6946343b Remove superfluous top-left corner button 2020-06-30 01:19:25 +02:00
glubsy
e41a6b878c Allow moving rows around in details table
* Replaces the "Attribute" column with a horizontal header
* We ignore the first value in each row from the model and instead
populate a horizontal header with that value, in order to allow rows to be moved around
2020-06-30 01:02:56 +02:00
ee2671a5f3 More Test and Flake8 Cleanup
- Allow flake8 to check more files as well.
2020-06-27 01:08:12 -05:00
e05c72ad8c Upgrade to latest pytest
- Currently some incompatibility in the hscommon tests, commented out
the ones with issues temporarily
- Also updated some deprecation warnings, still more to do
2020-06-25 23:26:48 -05:00
7658cdafbc Merge pull request #665 from KIAaze/fix_packaging_ubu20.04
Fix packaging on *ubuntu 20.04 (more specifically python version >=3.8)
2020-06-24 18:47:09 -05:00
ecf005fad0 Add distro to requirements and use for packaging
- Add distro as a requirement
- Use distro.id() to get the id as it is a bit cleaner than distro.linux_distribution()
2020-06-24 18:39:06 -05:00
de0542d2a8 Merge pull request #677 from glubsy/fix_folder
Fix standard mode folder comparison view generating "---" in results table
2020-06-24 18:30:30 -05:00
glubsy
bcb26507fe Remove superfluous argument 2020-06-25 01:23:03 +02:00
c35db7f698 Merge pull request #672 from jpvlsmv/variable_fix_trivial
Rename an ell variable into something that flake8 doesn't complain about
2020-06-24 17:18:49 -05:00
d2193328a7 Add e to lin 2020-06-24 17:11:09 -05:00
glubsy
ed64428c80 Add missing file class for folder type.
* results.py doesn't set the proper type for dupes at the line
"file = get_file(path)" so we add it on top
* Perhaps it could have been added to _get_fileclasses() in core.app.py too,
but I have not tested it
2020-06-24 23:32:04 +02:00
glubsy
e89156e55c Add temporary workaround for bug #676
* In standard mode, for folder comparison, dupe type is wrongly set as core.fs.Folder
while it should be core.se.fs.Folder.
* Catching the NotImplementedError exception redirects to the appropriate handler
* This is only a temporary workaround until a better fix is implemented
2020-06-24 22:01:30 +02:00
4c9309ea9c Add changelog to pkg/debian
May try some other way of doing this later, but for now this will
let the PPA build make some progress.
2020-06-16 20:45:48 -05:00
1c00331bc2 Remove Old Issue Template 2020-06-15 23:28:31 -05:00
427e32f406 Update issue templates
Change to the new issue template flow.
2020-06-15 23:18:13 -05:00
Joe Moore
b048fa5968 Rename an ell variable into something that flake8 doesn't complain about 2020-06-05 19:44:08 -04:00
d5a6ca7488 Merge pull request #669 from jpvlsmv/refactor_ci
Refactor ci a little bit
2020-06-01 11:57:58 -05:00
Joe Moore
d15dea7aa0 Move flake8 requirement out of .txt into tox environment spec 2020-05-30 09:49:17 -04:00
Joe Moore
ccb1c75f22 Call style-checker tox environment 2020-05-30 09:40:23 -04:00
Joe Moore
dffbed8e22 Use build and package scripts on windows 2020-05-30 09:34:03 -04:00
Joe Moore
50ce010212 Move flake8 to a separate tox env 2020-05-30 09:33:35 -04:00
KIAaze
0e8cd32a6e Changed to -F option to build everything (source and binary packages). 2020-05-20 23:15:49 +01:00
KIAaze
ea191a8924 Fixed AttributeError in the packaging script when using python>=3.8.
platform.dist() is deprecated since python version 3.5 and was removed in version 3.8.
Added an exception handler that falls back to the distro package in that case, as suggested by the Python documentation (see the sketch below):
https://docs.python.org/3.7/library/platform.html?highlight=platform#module-platform
2020-05-20 23:13:11 +01:00
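A minimal sketch of that fallback; the packaging script's actual logic may differ:

```python
# platform.dist() was deprecated in Python 3.5 and removed in 3.8, so fall
# back to the third-party `distro` package when it is unavailable.
try:
    from platform import dist  # raises ImportError on Python 3.8+
    distribution_id = dist()[0].lower()
except ImportError:
    import distro
    distribution_id = distro.id()
```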
6abcedddda Merge pull request #656 from glubsy/selected_shortcut_description
Add shortcut description to mark selected action
2020-05-13 20:17:41 -05:00
debf309a9a Merge pull request #655 from glubsy/fix_row_trimming
Fix row trimming
2020-05-08 22:07:38 -05:00
glubsy
4b1c925ab1 use a QKeySequence instead 2020-05-07 16:24:07 +02:00
glubsy
1c0990f610 Add shortcut description to mark selected action 2020-05-07 15:37:21 +02:00
glubsy
89f2dc3b15 prevent word wrapping from truncating rows too aggressively 2020-05-07 14:55:01 +02:00
glubsy
ffae58040d prevent rows in the details panel from being trimmed too short 2020-05-07 14:53:09 +02:00
0cc1cb4cb8 Merge pull request #646 from glubsy/bold_font
Add a preference option to disable bold font on reference row.
2020-05-05 22:03:41 -05:00
glubsy
dab762f05e Add a preference option to disable bold font on reference row. 2020-04-27 01:36:27 +02:00
c4a6958ef0 Merge pull request #628 from nikmartin/linuxBuild
remove 'm' from SO var on Linux and OSX
2020-03-04 19:34:49 -06:00
98c6f12b39 Merge pull request #627 from ptman/patch-1
Fix handling of filenames with space
2020-03-04 19:34:38 -06:00
5d21454789 Update .travis.yml
Remove python 3.5 and add 3.8
2020-03-04 19:30:30 -06:00
3e4fe5b765 Update tox.ini
Remove python 3.5 add 3.8
2020-03-04 19:29:01 -06:00
Nik Martin
bd0f53bcbe remove 'm' from SO var on Linux and OSX 2020-02-26 15:39:39 -06:00
Paul Tötterman
d820fcc7b1 Fix handling of filenames with space
I got spaces in CURDIR for some reason
2020-02-21 16:02:30 +02:00
de8a0a21b2 Update Packaging
- Add changes from OSX build to local hscommon/build.py
- Update package.py & srcpkg.sh
  - Remove invalid submodule references
  - Update srcpkg.sh to use xz
- Update package.py pyinstaller configuration
  - Call PyInstaller inline
  - Add --noconfirm option to be more script friendly
  - Add UCRT Redist location to path should fix #545 as now all the dlls
    are included
2019-12-31 21:36:52 -06:00
7ba8aa3514 Format files with black
- Format all files with black
- Update tox.ini flake8 arguments to be compatible
- Add black to requirements-extra.txt
- Reduce ignored flake8 rules and fix a few violations
2019-12-31 20:16:27 -06:00
359d6498f7 Update documentation & CI
- Remove references to submodules as they are no longer used
- Update top level readme with more recent status
- Update travis configuration to use python 3.7 instead of latest for now
2019-12-31 17:33:17 -06:00
2ea02bd7b5 Update hscommon/build.py
Update changelog format to use changes from
https://github.com/hsoft/hscommon/pull/6/.  This allows for changes from
 #593 to work correctly.
2019-11-06 20:25:20 -06:00
8506d482af Merge pull request #593 from eugenesan/master
Update packaging for 4.0.4
2019-10-08 20:14:49 -05:00
411d0d6e4a Cross platform fix for makefile #575 2019-09-09 20:23:37 -05:00
95ff6b6b76 Add files from hscommon and qtlib 2019-09-09 19:54:28 -05:00
334f6fe989 Remove qtlib and hscommon submodules 2019-09-09 19:45:58 -05:00
Eugene San (eugenesan)
080bb8935c Update packaging for 4.0.4
* Fix main version (Don't use spaces and capitals in versions!)
* Change debian changelog format in hscommon
* Fix build cleanup
* Switch to XZ compression
* Update build instructions
* Build single package for both Debian/Ubuntu
* Update packaging
2019-08-29 14:50:41 -07:00
ad2a07a289 Merge pull request #572 from jpvlsmv/issue-570
Issue 570 - CI process improvements
2019-05-23 18:08:41 -05:00
Joe Moore
c61a7f1aaf Use 3-ending python names consistently 2019-05-23 10:43:28 -04:00
Joe Moore
f536f32b19 Reference standard dependencies on Windows 2019-05-23 10:40:42 -04:00
Joe Moore
8cdff7b48c Define tox windows test environment 2019-05-22 11:31:07 -04:00
Joe Moore
718e99e880 Explicitly call tox windows environment on windows 2019-05-22 11:29:37 -04:00
Joe Moore
3c2ef97ee2 Install requisites in install task, move tox-travis into -extras 2019-05-21 10:45:02 -04:00
Joe Moore
2f439d0fc7 Install requisites in install task, move tox-travis into -extras 2019-05-21 10:44:40 -04:00
Joe Moore
4f234f272f Increase tox verbosity 2019-05-21 10:19:04 -04:00
Joe Moore
18acaae888 Attempt to build dupeguru before running the tox cases 2019-05-21 10:18:41 -04:00
Joe Moore
be7d558dfe Add Windows build to the matrix 2019-05-18 14:36:43 -04:00
Joe Moore
0b12236537 Switch to explicit matrix build 2019-05-18 14:35:10 -04:00
Joe Moore
ed2a0bcd4d Drop python 3.4 and test py 3.7 instead 2019-05-18 13:50:24 -04:00
288 changed files with 36392 additions and 10285 deletions

.github/FUNDING.yml vendored Normal file

@@ -0,0 +1,13 @@
# These are supported funding model platforms
github: arsenetar
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: # Replace with a single Ko-fi username
tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
liberapay: # Replace with a single Liberapay username
issuehunt: # Replace with a single IssueHunt username
otechie: # Replace with a single Otechie username
lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']


@@ -1,24 +0,0 @@
# Instructions
1. Provide a short descriptive title for the issue. A good example is 'Results window appears off screen.', a non-optimal example is 'Problem with App'.
2. Please fill out either the 'Bug / Issue' or the 'Feature Request' section. Replace values in ` `.
3. Delete these instructions and the unused sections.
# Bug / issue Report
System Information:
- DupeGuru Version: `version`
- Operating System: `Windows/Linux/OSX` `distribution` `version`
If using the source distribution and building yourself also provide (otherwise remove):
- Python Version: `version ex. 3.6.6` `32/64bit`
- Complier: `gcc/llvm/msvc` `version`
## Description
`Provide a detailed description of the issue to help reproduce it. If it happens after a specific sequence of events provide them here.`
## Debug Log
```
If reporting an error provide the debug log and/or the error message information. If the debug log is short < 40 lines you can provide it here, otherwise attach the text file to this issue.
```
# Feature Requests
`Provide a detailed description of the feature.`

.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file

@@ -0,0 +1,31 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. Windows 10 / OSX 10.15 / Ubuntu 20.04 / Arch Linux]
- Version [e.g. 4.1.0]
**Additional context**
Add any other context about the problem here. You may include the debug log although it is normally best to attach it as a file.


@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: feature
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

.github/workflows/codeql-analysis.yml vendored Normal file

@@ -0,0 +1,50 @@
name: "CodeQL"
on:
push:
branches: [master]
pull_request:
# The branches below must be a subset of the branches above
branches: [master]
schedule:
- cron: "24 20 * * 2"
jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write
strategy:
fail-fast: false
matrix:
language: ["cpp", "python"]
steps:
- name: Checkout repository
uses: actions/checkout@v2
# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v1
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# queries: ./path/to/local/query, your-org/your-repo/queries@main
- if: matrix.language == 'cpp'
name: Build Cpp
run: |
sudo apt-get update
sudo apt-get install python3-pyqt5
make modules
- if: matrix.language == 'python'
name: Autobuild
uses: github/codeql-action/autobuild@v1
# Analysis
- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v1

.github/workflows/default.yml vendored Normal file

@@ -0,0 +1,84 @@
# Workflow lints, and checks format in parallel then runs tests on all platforms
name: Default CI/CD
on:
  push:
    branches: [master]
  pull_request:
    branches: [master]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.10
        uses: actions/setup-python@v2
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt -r requirements-extra.txt
      - name: Lint with flake8
        run: |
          flake8 .
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.10
        uses: actions/setup-python@v2
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt -r requirements-extra.txt
      - name: Check format with black
        run: |
          black .
  test:
    needs: [lint, format]
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: [3.7, 3.8, 3.9, "3.10"]
        exclude:
          - os: macos-latest
            python-version: 3.7
          - os: macos-latest
            python-version: 3.8
          - os: macos-latest
            python-version: 3.9
          - os: windows-latest
            python-version: 3.7
          - os: windows-latest
            python-version: 3.8
          - os: windows-latest
            python-version: 3.9
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt -r requirements-extra.txt
      - name: Build python modules
        run: |
          python build.py --modules
      - name: Run tests
        run: |
          pytest core hscommon
      - name: Upload Artifacts
        if: matrix.os == 'ubuntu-latest'
        uses: actions/upload-artifact@v3
        with:
          name: modules ${{ matrix.python-version }}
          path: ${{ github.workspace }}/**/*.so

.gitignore vendored

@@ -1,25 +1,111 @@
.DS_Store # Byte-compiled / optimized / DLL files
__pycache__ __pycache__/
*.py[cod]
*$py.class
# C extensions
*.so *.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo *.mo
*.waf* #*.pot
.lock-waf*
.tox
/tags
build # PEP 582; used by e.g. github.com/David-OConnor/pyflow
dist __pypackages__/
env*
/deps
cocoa/autogen
/run.py # Environments
/cocoa/*/Info.plist .env
/cocoa/*/build .venv
env*/
venv/
ENV/
env.bak/
venv.bak/
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# macOS
.DS_Store
# Visual Studio Code
.vscode/*
!.vscode/settings.json
#!.vscode/tasks.json
#!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets
# Local History for Visual Studio Code
.history/
# Built Visual Studio Code Extensions
*.vsix
# dupeGuru Specific
/qt/*_rc.py /qt/*_rc.py
/help/*/conf.py /help/*/conf.py
/help/*/changelog.rst /help/*/changelog.rst
cocoa/autogen
/cocoa/*/Info.plist
/cocoa/*/build
*.pyd *.waf*
*.exe .lock-waf*
*.spec /tags

.gitmodules vendored

@@ -1,6 +0,0 @@
[submodule "qtlib"]
path = qtlib
url = https://github.com/hsoft/qtlib.git
[submodule "hscommon"]
path = hscommon
url = https://github.com/hsoft/hscommon.git

.sonarcloud.properties Normal file

@@ -0,0 +1 @@
sonar.python.version=3.7, 3.8, 3.9, 3.10


@@ -1,11 +0,0 @@
sudo: false
dist: xenial
language: python
python:
- "3.4"
- "3.5"
- "3.6"
- "3.7"
install: pip install tox-travis
script: tox


@@ -1,21 +1,21 @@
 [main]
 host = https://www.transifex.com
-[dupeguru.core]
-file_filter = locale/<lang>/LC_MESSAGES/core.po
-source_file = locale/core.pot
-source_lang = en
-type = PO
-[dupeguru.columns]
+[o:voltaicideas:p:dupeguru-1:r:columns]
 file_filter = locale/<lang>/LC_MESSAGES/columns.po
 source_file = locale/columns.pot
 source_lang = en
 type = PO
-[dupeguru.ui]
+[o:voltaicideas:p:dupeguru-1:r:core]
+file_filter = locale/<lang>/LC_MESSAGES/core.po
+source_file = locale/core.pot
+source_lang = en
+type = PO
+[o:voltaicideas:p:dupeguru-1:r:ui]
 file_filter = locale/<lang>/LC_MESSAGES/ui.po
 source_file = locale/ui.pot
 source_lang = en
 type = PO

.vscode/extensions.json vendored Normal file

@@ -0,0 +1,10 @@
{
// List of extensions which should be recommended for users of this workspace.
"recommendations": [
"redhat.vscode-yaml",
"ms-python.vscode-pylance",
"ms-python.python"
],
// List of extensions recommended by VS Code that should not be recommended for users of this workspace.
"unwantedRecommendations": []
}

.vscode/settings.json vendored Normal file

@@ -0,0 +1,12 @@
{
"python.formatting.provider": "black",
"cSpell.words": [
"Dupras",
"hscommon"
],
"python.languageServer": "Pylance",
"yaml.schemaStore.enable": true,
"yaml.schemas": {
"https://json.schemastore.org/github-workflow.json": ".github/workflows/*.yml"
}
}

CONTRIBUTING.md Normal file

@@ -0,0 +1,88 @@
# Contributing to dupeGuru
The following is a set of guidelines and information for contributing to dupeGuru.
#### Table of Contents
[Things to Know Before Starting](#things-to-know-before-starting)
[Ways to Contribute](#ways-to-contribute)
* [Reporting Bugs](#reporting-bugs)
* [Suggesting Enhancements](#suggesting-enhancements)
* [Localization](#localization)
* [Code Contribution](#code-contribution)
* [Pull Requests](#pull-requests)
[Style Guides](#style-guides)
* [Git Commit Messages](#git-commit-messages)
* [Python Style Guide](#python-style-guide)
* [Documentation Style Guide](#documentation-style-guide)
[Additional Notes](#additional-notes)
* [Issue and Pull Request Labels](#issue-and-pull-request-labels)
## Things to Know Before Starting
**TODO**
## Ways to contribute
### Reporting Bugs
**TODO**
### Suggesting Enhancements
**TODO**
### Localization
**TODO**
### Code Contribution
**TODO**
### Pull Requests
Please follow these steps to have your contribution considered by the maintainers:
1. Keep Pull Request specific to one feature or bug.
2. Follow the [style guides](#style-guides)
3. After you submit your pull request, verify that all [status checks](https://help.github.com/articles/about-status-checks/) are passing <details><summary>What if the status checks are failing?</summary>If a status check is failing, and you believe that the failure is unrelated to your change, please leave a comment on the pull request explaining why you believe the failure is unrelated. A maintainer will re-run the status check for you. If we conclude that the failure was a false positive, then we will open an issue to track that problem with our status check suite.</details>
While the prerequisites above must be satisfied prior to having your pull request reviewed, the reviewer(s) may ask you to complete additional design work, tests, or other changes before your pull request can be ultimately accepted.
## Style Guides
### Git Commit Messages
- Use the present tense ("Add feature" not "Added feature")
- Use the imperative mood ("Move cursor to..." not "Moves cursor to...")
- Limit the first line to 72 characters or less
- Reference issues and pull requests liberally after the first line
### Python Style Guide
- All files are formatted with [Black](https://github.com/psf/black)
- Follow [PEP 8](https://peps.python.org/pep-0008/) as much as practical
- Pass [flake8](https://flake8.pycqa.org/en/latest/) linting
- Include [PEP 484](https://peps.python.org/pep-0484/) type hints (new code)
### Documentation Style Guide
**TODO**
## Additional Notes
### Issue and Pull Request Labels
This section lists and describes the various labels used with issues and pull requests. Each of the labels is listed with a search link as well.
#### Issue Type and Status
| Label name | Search | Description |
|------------|--------|-------------|
| `enhancement` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aenhancement) | Feature requests and enhancements. |
| `bug` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Abug) | Bug reports. |
| `duplicate` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aduplicate) | Issue is a duplicate of existing issue. |
| `needs-reproduction` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aneeds-reproduction) | A bug that has not been able to be reproduced. |
| `needs-information` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aneeds-information) | More information needs to be collected about these problems or feature requests (e.g. steps to reproduce). |
| `blocked` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Ablocked) | Issue blocked by other issues. |
| `beginner` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Abeginner) | Less complex issues for users who want to start contributing. |
#### Category Labels
| Label name | Search | Description |
|------------|--------|-------------|
| `3rd party` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3A%223rd%20party%22) | Related to a 3rd party dependency. |
| `crash` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Acrash) | Related to crashes (complete, or unhandled). |
| `documentation` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Adocumentation) | Related to any documentation. |
| `linux` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3linux) | Related to running on Linux. |
| `mac` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Amac) | Related to running on macOS. |
| `performance` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aperformance) | Related to the performance. |
| `ui` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Aui)| Related to the visual design. |
| `windows` | [search](https://github.com/arsenetar/dupeguru/issues?q=is%3Aopen+is%3Aissue+label%3Awindows) | Related to running on Windows. |
#### Pull Request Labels
None at this time, if the volume of Pull Requests increase labels may be added to manage.


@@ -1,6 +1,8 @@
 To know who contributed to dupeGuru, you can look at the commit log, but not all contributions
 result in a commit. This file lists contributors who don't necessarily appear in the commit log.
+* Jason Cho, Exchange icon
+* schollidesign (https://findicons.com/pack/1035/human_o2), Zoom-in, Zoom-out, Zoom-best-fit, Zoom-original icons
 * Jérôme Cantin, Main icon
 * Gregor Tätzner, German localization
 * Frank Weber, German localization

MANIFEST.in Normal file

@@ -0,0 +1,5 @@
recursive-include core *.h
recursive-include core *.m
include run.py
graft locale
graft help


@@ -1,7 +1,7 @@
PYTHON ?= python3 PYTHON ?= python3
PYTHON_VERSION_MINOR := $(shell ${PYTHON} -c "import sys; print(sys.version_info.minor)") PYTHON_VERSION_MINOR := $(shell ${PYTHON} -c "import sys; print(sys.version_info.minor)")
PYRCC5 ?= pyrcc5 PYRCC5 ?= pyrcc5
REQ_MINOR_VERSION = 4 REQ_MINOR_VERSION = 7
PREFIX ?= /usr/local PREFIX ?= /usr/local
# Window compatability via Msys2 # Window compatability via Msys2
@@ -9,13 +9,13 @@ PREFIX ?= /usr/local
# - compile generates .pyd instead of .so # - compile generates .pyd instead of .so
# - venv with --sytem-site-packages has issues on windows as well... # - venv with --sytem-site-packages has issues on windows as well...
ifeq ($(shell uname -o), Msys) ifeq ($(shell ${PYTHON} -c "import platform; print(platform.system())"), Windows)
BIN = Scripts BIN = Scripts
SO = *.pyd SO = *.pyd
VENV_OPTIONS = VENV_OPTIONS =
else else
BIN = bin BIN = bin
SO = cpython-3$(PYTHON_VERSION_MINOR)m*.so SO = *.so
VENV_OPTIONS = --system-site-packages VENV_OPTIONS = --system-site-packages
endif endif
@@ -34,9 +34,8 @@ endif
# Our build scripts are not very "make like" yet and perform their task in a bundle. For now, we # Our build scripts are not very "make like" yet and perform their task in a bundle. For now, we
# use one of each file to act as a representative, a target, of these groups. # use one of each file to act as a representative, a target, of these groups.
submodules_target = hscommon/__init__.py
packages = hscommon qtlib core qt packages = hscommon core qt
localedirs = $(wildcard locale/*/LC_MESSAGES) localedirs = $(wildcard locale/*/LC_MESSAGES)
pofiles = $(wildcard locale/*/LC_MESSAGES/*.po) pofiles = $(wildcard locale/*/LC_MESSAGES/*.po)
mofiles = $(patsubst %.po,%.mo,$(pofiles)) mofiles = $(patsubst %.po,%.mo,$(pofiles))
@@ -44,17 +43,17 @@ mofiles = $(patsubst %.po,%.mo,$(pofiles))
vpath %.po $(localedirs) vpath %.po $(localedirs)
vpath %.mo $(localedirs) vpath %.mo $(localedirs)
all : | env i18n modules qt/dg_rc.py all: | env i18n modules qt/dg_rc.py
@echo "Build complete! You can run dupeGuru with 'make run'" @echo "Build complete! You can run dupeGuru with 'make run'"
run: run:
$(VENV_PYTHON) run.py $(VENV_PYTHON) run.py
pyc: pyc: | env
${PYTHON} -m compileall ${packages} ${VENV_PYTHON} -m compileall ${packages}
reqs : reqs:
ifneq ($(shell test $(PYTHON_VERSION_MINOR) -gt $(REQ_MINOR_VERSION); echo $$?),0) ifneq ($(shell test $(PYTHON_VERSION_MINOR) -ge $(REQ_MINOR_VERSION); echo $$?),0)
$(error "Python 3.${REQ_MINOR_VERSION}+ required. Aborting.") $(error "Python 3.${REQ_MINOR_VERSION}+ required. Aborting.")
endif endif
ifndef NO_VENV ifndef NO_VENV
@@ -64,12 +63,7 @@ endif
@${PYTHON} -c 'import PyQt5' >/dev/null 2>&1 || \ @${PYTHON} -c 'import PyQt5' >/dev/null 2>&1 || \
{ echo "PyQt 5.4+ required. Install it and try again. Aborting"; exit 1; } { echo "PyQt 5.4+ required. Install it and try again. Aborting"; exit 1; }
# Ensure that submodules are initialized env: | reqs
$(submodules_target) :
git submodule init
git submodule update
env : | $(submodules_target) reqs
ifndef NO_VENV ifndef NO_VENV
@echo "Creating our virtualenv" @echo "Creating our virtualenv"
${PYTHON} -m venv env ${PYTHON} -m venv env
@@ -79,40 +73,26 @@ ifndef NO_VENV
${PYTHON} -m venv --upgrade ${VENV_OPTIONS} env ${PYTHON} -m venv --upgrade ${VENV_OPTIONS} env
endif endif
build/help : | env build/help: | env
$(VENV_PYTHON) build.py --doc $(VENV_PYTHON) build.py --doc
qt/dg_rc.py : qt/dg.qrc qt/dg_rc.py: qt/dg.qrc
$(PYRCC5) qt/dg.qrc > qt/dg_rc.py $(PYRCC5) qt/dg.qrc > qt/dg_rc.py
i18n: $(mofiles) i18n: $(mofiles)
%.mo : %.po %.mo: %.po
msgfmt -o $@ $< msgfmt -o $@ $<
core/pe/_block.$(SO) : core/pe/modules/block.c core/pe/modules/common.c modules: | env
$(PYTHON) hscommon/build_ext.py $^ _block $(VENV_PYTHON) build.py --modules
mv _block.$(SO) core/pe
core/pe/_cache.$(SO) : core/pe/modules/cache.c core/pe/modules/common.c mergepot: | env
$(PYTHON) hscommon/build_ext.py $^ _cache
mv _cache.$(SO) core/pe
qt/pe/_block_qt.$(SO) : qt/pe/modules/block.c
$(PYTHON) hscommon/build_ext.py $^ _block_qt
mv _block_qt.$(SO) qt/pe
modules : core/pe/_block.$(SO) core/pe/_cache.$(SO) qt/pe/_block_qt.$(SO)
mergepot :
$(VENV_PYTHON) build.py --mergepot $(VENV_PYTHON) build.py --mergepot
normpo : normpo: | env
$(VENV_PYTHON) build.py --normpo $(VENV_PYTHON) build.py --normpo
srcpkg :
./scripts/srcpkg.sh
install: all pyc install: all pyc
mkdir -p ${DESTDIR}${PREFIX}/share/dupeguru mkdir -p ${DESTDIR}${PREFIX}/share/dupeguru
cp -rf ${packages} locale ${DESTDIR}${PREFIX}/share/dupeguru cp -rf ${packages} locale ${DESTDIR}${PREFIX}/share/dupeguru
@@ -129,7 +109,7 @@ installdocs: build/help
mkdir -p ${DESTDIR}${PREFIX}/share/dupeguru mkdir -p ${DESTDIR}${PREFIX}/share/dupeguru
cp -rf build/help ${DESTDIR}${PREFIX}/share/dupeguru cp -rf build/help ${DESTDIR}${PREFIX}/share/dupeguru
uninstall : uninstall:
rm -rf "${DESTDIR}${PREFIX}/share/dupeguru" rm -rf "${DESTDIR}${PREFIX}/share/dupeguru"
rm -f "${DESTDIR}${PREFIX}/bin/dupeguru" rm -f "${DESTDIR}${PREFIX}/bin/dupeguru"
rm -f "${DESTDIR}${PREFIX}/share/applications/dupeguru.desktop" rm -f "${DESTDIR}${PREFIX}/share/applications/dupeguru.desktop"
@@ -140,4 +120,4 @@ clean:
-rm locale/*/LC_MESSAGES/*.mo -rm locale/*/LC_MESSAGES/*.mo
-rm core/pe/*.$(SO) qt/pe/*.$(SO) -rm core/pe/*.$(SO) qt/pe/*.$(SO)
.PHONY : clean srcpkg normpo mergepot modules i18n reqs run pyc install uninstall all .PHONY: clean normpo mergepot modules i18n reqs run pyc install uninstall all


@@ -1,68 +1,83 @@
# dupeGuru # dupeGuru
[dupeGuru][dupeguru] is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in [dupeGuru][dupeguru] is a cross-platform (Linux, OS X, Windows) GUI tool to find duplicate files in
a system. It's written mostly in Python 3 and has the peculiarity of using a system. It is written mostly in Python 3 and uses [qt](https://www.qt.io/) for the UI.
[multiple GUI toolkits][cross-toolkit], all using the same core Python code. On OS X, the UI layer
is written in Objective-C and uses Cocoa. On Linux, it's written in Python and uses Qt5.
The Cocoa UI of dupeGuru is hosted in a separate repo: https://github.com/hsoft/dupeguru-cocoa ## Current status
Still looking for additional help especially with regards to:
## Current status: Additional Maintainers Wanted (/ Note on Things in General) * OSX maintenance: reproducing bugs, packaging verification.
* Linux maintenance: reproducing bugs, maintaining PPA repository, Debian package, rpm package.
When I started contributing to dupeGuru, it was to help provide an updated Windows build for dupeGuru. I hoped to contribute more over time and help work through some of the general issues as well. Since Virgil Dupras left as the lead maintainer, I have not been able to devote enough time to work through as many issues as I had hoped. Now I am going to be devoting a more consistent amount of time each month to work on dupeGuru, however I will not be able to get to all issues. Additionally there are a few specific areas where additional help would be appreciated: * Translations: updating missing strings, transifex project at https://www.transifex.com/voltaicideas/dupeguru-1
* Documentation: keeping it up-to-date.
- OSX maintenance
- UI issues (I have no experience with cocoa)
- General issues & releases (I lack OSX environments / hardware to develop and test on, looking into doing builds through Travis CI.)
- Linux maintenance
- Packaging (I have not really done much linux packaging yet, although will be spending some time trying to get at least .deb and potentially ppa's updated.)
I am still working to update the new site & update links within the help and the repository to use the new urls. Additionally, hoping to get a 4.0.4 release out this year for at least Windows and Linux.
Thanks,
Andrew Senetar
## Contents of this folder ## Contents of this folder
This folder contains the source for dupeGuru. Its documentation is in `help`, but is also This folder contains the source for dupeGuru. Its documentation is in `help`, but is also
[available online][documentation] in its built form. Here's how this source tree is organised: [available online][documentation] in its built form. Here's how this source tree is organized:
* core: Contains the core logic code for dupeGuru. It's Python code. * core: Contains the core logic code for dupeGuru. It's Python code.
* qt: UI code for the Qt toolkit. It's written in Python and uses PyQt. * qt: UI code for the Qt toolkit. It's written in Python and uses PyQt.
* images: Images used by the different UI codebases. * images: Images used by the different UI codebases.
* pkg: Skeleton files required to create different packages * pkg: Skeleton files required to create different packages
* help: Help document, written for Sphinx. * help: Help document, written for Sphinx.
* locale: .po files for localisation. * locale: .po files for localization.
There are also other sub-folder that comes from external repositories and are part of this repo as
git submodules:
* hscommon: A collection of helpers used across HS applications. * hscommon: A collection of helpers used across HS applications.
* qtlib: A collection of helpers used across Qt UI codebases of HS applications.
## How to build dupeGuru from source ## How to build dupeGuru from source
### Windows ### Windows & macOS specific additional instructions
For windows instructions see the [Windows Instructions](Windows.md). For windows instructions see the [Windows Instructions](Windows.md).
### Prerequisites For macos instructions (qt version) see the [macOS Instructions](macos.md).
* [Python 3.4+][python] ### Prerequisites
* [Python 3.7+][python]
* PyQt5 * PyQt5
### make ### System Setup
When running in a linux based environment the following system packages or equivalents are needed to build:
* python3-pyqt5
* pyqt5-dev-tools (on some systems, see note)
* python3-venv (only if using a virtual environment)
* python3-dev
* build-essential
dupeGuru is built with "make": Note: On some linux systems pyrcc5 is not put on the path when installing python3-pyqt5, this will cause some issues with the resource files (and icons). These systems should have a respective pyqt5-dev-tools package, which should also be installed. The presence of pyrcc5 can be checked with `which pyrcc5`. Debian based systems need the extra package, and Arch does not.
$ make To create packages the following are also needed:
$ make run * python3-setuptools
* debhelper
### Generate Ubuntu packages ### Building with Make
dupeGuru comes with a makefile that can be used to build and run:
$ bash -c "pyvenv --system-site-packages env && source env/bin/activate && pip install -r requirements.txt && python3 build.py --clean && python3 package.py" $ make && make run
### Running tests ### Building without Make
$ cd <dupeGuru directory>
$ python3 -m venv --system-site-packages ./env
$ source ./env/bin/activate
$ pip install -r requirements.txt
$ python build.py
$ python run.py
### Generating Debian/Ubuntu package
To generate packages the extra requirements in requirements-extra.txt must be installed, the
steps are as follows:
$ cd <dupeGuru directory>
$ python3 -m venv --system-site-packages ./env
$ source ./env/bin/activate
$ pip install -r requirements.txt -r requirements-extra.txt
$ python build.py --clean
$ python package.py
This can be made a one-liner (once in the directory) as:
$ bash -c "python3 -m venv --system-site-packages env && source env/bin/activate && pip install -r requirements.txt -r requirements-extra.txt && python build.py --clean && python package.py"
## Running tests
The complete test suite is run with [Tox 1.7+][tox]. If you have it installed system-wide, you The complete test suite is run with [Tox 1.7+][tox]. If you have it installed system-wide, you
don't even need to set up a virtualenv. Just `cd` into the root project folder and run `tox`. don't even need to set up a virtualenv. Just `cd` into the root project folder and run `tox`.


@@ -2,28 +2,26 @@
### Prerequisites ### Prerequisites
- [Python 3.5+][python] - [Python 3.7+][python]
- [Visual Studio 2017][vs] or [Visual Studio Build Tools 2017][vsBuildTools] with the Windows 10 SDK - [Visual Studio 2019][vs] or [Visual Studio Build Tools 2019][vsBuildTools] with the Windows 10 SDK
- [nsis][nsis] (for installer creation) - [nsis][nsis] (for installer creation)
- [msys2][msys2] (for using makefile method) - [msys2][msys2] (for using makefile method)
When installing Visual Studio or the Visual Studio Build Tools with the Windows 10 SDK on versions of Windows below 10 be sure to make sure that the Universal CRT is installed before installing Visual studio as noted in the [Windows 10 SDK Notes][win10sdk] and found at [KB2999226][KB2999226]. NOTE: When installing Visual Studio or the Visual Studio Build Tools with the Windows 10 SDK on versions of Windows below 10 be sure to make sure that the Universal CRT is installed before installing Visual studio as noted in the [Windows 10 SDK Notes][win10sdk] and found at [KB2999226][KB2999226].
After installing python it is recommended to update setuptools before compiling packages. To update run (example is for python launcher and 3.5): After installing python it is recommended to update setuptools before compiling packages. To update run (example is for python launcher and 3.8):
$ py -3.5 -m pip install --upgrade setuptools $ py -3.8 -m pip install --upgrade setuptools
More details on setting up python for compiling packages on windows can be found on the [python wiki][pythonWindowsCompilers] More details on setting up python for compiling packages on windows can be found on the [python wiki][pythonWindowsCompilers] Take note of the required vc++ versions.
### With build.py (preferred) ### With build.py (preferred)
To build with a different python version 3.5 vs 3.6 or 32 bit vs 64 bit specify that version instead of -3.5 to the `py` command below. If you want to build additional versions while keeping all virtual environments setup use a different location for each vritual environment. To build with a different python version 3.7 vs 3.8 or 32 bit vs 64 bit specify that version instead of -3.8 to the `py` command below. If you want to build additional versions while keeping all virtual environments setup use a different location for each virtual environment.
$ cd <dupeGuru directory> $ cd <dupeGuru directory>
$ git submodule init $ py -3.8 -m venv .\env
$ git submodule update
$ py -3.5 -m venv .\env
$ .\env\Scripts\activate $ .\env\Scripts\activate
$ pip install -r requirements.txt -r requirements-windows.txt $ pip install -r requirements.txt
$ python build.py $ python build.py
$ python run.py $ python run.py
@@ -36,23 +34,21 @@ It is possible to build dupeGuru with the makefile on windows using a compatable
Then the following execution of the makefile should work. Pass the correct value for PYTHON to the makefile if not on the path as python3. Then the following execution of the makefile should work. Pass the correct value for PYTHON to the makefile if not on the path as python3.
$ cd <dupeGuru directory> $ cd <dupeGuru directory>
$ make PYTHON='py -3.5' $ make PYTHON='py -3.8'
$ make run $ make run
NOTE: Install PyQt5 & cx-Freeze with requirements-windows.txt into the venv before runing the packaging scripts in the section below.
### Generate Windows Installer Packages ### Generate Windows Installer Packages
You need to use the respective x86 or x64 version of python to build the 32 bit and 64 bit versions. The build scripts will automatically detect the python architecture for you. When using build.py make sure the resulting python works before continuing to package.py. NOTE: package.py looks for the 'makensis' executable in the default location for a 64 bit windows system. Run the following in the respective virtual environment. You need to use the respective x86 or x64 version of python to build the 32 bit and 64 bit versions. The build scripts will automatically detect the python architecture for you. When using build.py make sure the resulting python works before continuing to package.py. NOTE: package.py looks for the 'makensis' executable in the default location for a 64 bit windows system. The extra requirements need to be installed to run packaging: `pip install -r requirements-extra.txt`. Run the following in the respective virtual environment.
$ python package.py $ python package.py
### Running tests ### Running tests
The complete test suite can be run with tox just like on linux. The complete test suite can be run with tox just like on linux. NOTE: The extra requirements need to be installed to run unit tests: `pip install -r requirements-extra.txt`.
[python]: http://www.python.org/ [python]: http://www.python.org/
[nsis]: http://nsis.sourceforge.net/Main_Page [nsis]: http://nsis.sourceforge.net/Main_Page
[vs]: https://www.visualstudio.com/downloads/#visual-studio-community-2017 [vs]: https://www.visualstudio.com/downloads/#visual-studio-community-2019
[vsBuildTools]: https://www.visualstudio.com/downloads/#build-tools-for-visual-studio-2017 [vsBuildTools]: https://www.visualstudio.com/downloads/#build-tools-for-visual-studio-2019
[win10sdk]: https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk [win10sdk]: https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk
[KB2999226]: https://support.microsoft.com/en-us/help/2999226/update-for-universal-c-runtime-in-windows [KB2999226]: https://support.microsoft.com/en-us/help/2999226/update-for-universal-c-runtime-in-windows
[pythonWindowsCompilers]: https://wiki.python.org/moin/WindowsCompilers [pythonWindowsCompilers]: https://wiki.python.org/moin/WindowsCompilers

build.py

@@ -4,136 +4,147 @@
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
import os from pathlib import Path
import os.path as op import sys
from optparse import OptionParser from optparse import OptionParser
import shutil import shutil
from multiprocessing import Pool
from setuptools import setup, Extension from setuptools import sandbox
from hscommon import sphinxgen from hscommon import sphinxgen
from hscommon.build import ( from hscommon.build import (
add_to_pythonpath, print_and_do, move_all, fix_qt_resource_file, add_to_pythonpath,
print_and_do,
fix_qt_resource_file,
) )
from hscommon import loc from hscommon import loc
def parse_args(): def parse_args():
usage = "usage: %prog [options]" usage = "usage: %prog [options]"
parser = OptionParser(usage=usage) parser = OptionParser(usage=usage)
parser.add_option( parser.add_option(
'--clean', action='store_true', dest='clean', "--clean",
help="Clean build folder before building" action="store_true",
dest="clean",
help="Clean build folder before building",
)
parser.add_option("--doc", action="store_true", dest="doc", help="Build only the help file (en)")
parser.add_option("--alldoc", action="store_true", dest="all_doc", help="Build only the help file in all languages")
parser.add_option("--loc", action="store_true", dest="loc", help="Build only localization")
parser.add_option(
"--updatepot",
action="store_true",
dest="updatepot",
help="Generate .pot files from source code.",
) )
parser.add_option( parser.add_option(
'--doc', action='store_true', dest='doc', "--mergepot",
help="Build only the help file" action="store_true",
dest="mergepot",
help="Update all .po files based on .pot files.",
) )
parser.add_option( parser.add_option(
'--loc', action='store_true', dest='loc', "--normpo",
help="Build only localization" action="store_true",
dest="normpo",
help="Normalize all PO files (do this before commit).",
) )
parser.add_option( parser.add_option(
'--updatepot', action='store_true', dest='updatepot', "--modules",
help="Generate .pot files from source code." action="store_true",
) dest="modules",
parser.add_option( help="Build the python modules.",
'--mergepot', action='store_true', dest='mergepot',
help="Update all .po files based on .pot files."
)
parser.add_option(
'--normpo', action='store_true', dest='normpo',
help="Normalize all PO files (do this before commit)."
) )
(options, args) = parser.parse_args() (options, args) = parser.parse_args()
return options return options
def build_help():
print("Generating Help")
current_path = op.abspath('.')
help_basepath = op.join(current_path, 'help', 'en')
help_destpath = op.join(current_path, 'build', 'help')
changelog_path = op.join(current_path, 'help', 'changelog')
tixurl = "https://github.com/hsoft/dupeguru/issues/{}"
confrepl = {'language': 'en'}
changelogtmpl = op.join(current_path, 'help', 'changelog.tmpl')
conftmpl = op.join(current_path, 'help', 'conf.tmpl')
sphinxgen.gen(help_basepath, help_destpath, changelog_path, tixurl, confrepl, conftmpl, changelogtmpl)
def build_qt_localizations(): def build_one_help(language):
loc.compile_all_po(op.join('qtlib', 'locale')) print(f"Generating Help in {language}")
loc.merge_locale_dir(op.join('qtlib', 'locale'), 'locale') current_path = Path(".").absolute()
changelog_path = current_path.joinpath("help", "changelog")
tixurl = "https://github.com/arsenetar/dupeguru/issues/{}"
changelogtmpl = current_path.joinpath("help", "changelog.tmpl")
conftmpl = current_path.joinpath("help", "conf.tmpl")
help_basepath = current_path.joinpath("help", language)
help_destpath = current_path.joinpath("build", "help", language)
confrepl = {"language": language}
sphinxgen.gen(
help_basepath,
help_destpath,
changelog_path,
tixurl,
confrepl,
conftmpl,
changelogtmpl,
)
def build_help():
languages = ["en", "de", "fr", "hy", "ru", "uk"]
# Running with Pools as for some reason sphinx seems to cross contaminate the output otherwise
with Pool(len(languages)) as p:
p.map(build_one_help, languages)
def build_localizations(): def build_localizations():
loc.compile_all_po('locale') loc.compile_all_po("locale")
build_qt_localizations() locale_dest = Path("build", "locale")
locale_dest = op.join('build', 'locale') if locale_dest.exists():
if op.exists(locale_dest):
shutil.rmtree(locale_dest) shutil.rmtree(locale_dest)
shutil.copytree('locale', locale_dest, ignore=shutil.ignore_patterns('*.po', '*.pot')) shutil.copytree("locale", locale_dest, ignore=shutil.ignore_patterns("*.po", "*.pot"))
def build_updatepot(): def build_updatepot():
print("Building .pot files from source files") print("Building .pot files from source files")
print("Building core.pot") print("Building core.pot")
loc.generate_pot(['core'], op.join('locale', 'core.pot'), ['tr']) loc.generate_pot(["core"], Path("locale", "core.pot"), ["tr"])
print("Building columns.pot") print("Building columns.pot")
loc.generate_pot(['core'], op.join('locale', 'columns.pot'), ['coltr']) loc.generate_pot(["core"], Path("locale", "columns.pot"), ["coltr"])
print("Building ui.pot") print("Building ui.pot")
# When we're not under OS X, we don't want to overwrite ui.pot because it contains Cocoa locs loc.generate_pot(["qt"], Path("locale", "ui.pot"), ["tr"], merge=True)
# We want to merge the generated pot with the old pot in the most preserving way possible.
ui_packages = ['qt', op.join('cocoa', 'inter')]
loc.generate_pot(ui_packages, op.join('locale', 'ui.pot'), ['tr'], merge=True)
print("Building qtlib.pot")
loc.generate_pot(['qtlib'], op.join('qtlib', 'locale', 'qtlib.pot'), ['tr'])
def build_mergepot(): def build_mergepot():
print("Updating .po files using .pot files") print("Updating .po files using .pot files")
loc.merge_pots_into_pos('locale') loc.merge_pots_into_pos("locale")
loc.merge_pots_into_pos(op.join('qtlib', 'locale'))
loc.merge_pots_into_pos(op.join('cocoalib', 'locale'))
def build_normpo(): def build_normpo():
loc.normalize_all_pos('locale') loc.normalize_all_pos("locale")
loc.normalize_all_pos(op.join('qtlib', 'locale'))
loc.normalize_all_pos(op.join('cocoalib', 'locale'))
def build_pe_modules(): def build_pe_modules():
print("Building PE Modules") print("Building PE Modules")
exts = [ # Leverage setup.py to build modules
Extension( sandbox.run_setup("setup.py", ["build_ext", "--inplace"])
"_block",
[op.join('core', 'pe', 'modules', 'block.c'), op.join('core', 'pe', 'modules', 'common.c')]
),
Extension(
"_cache",
[op.join('core', 'pe', 'modules', 'cache.c'), op.join('core', 'pe', 'modules', 'common.c')]
),
]
exts.append(Extension("_block_qt", [op.join('qt', 'pe', 'modules', 'block.c')]))
setup(
script_args=['build_ext', '--inplace'],
ext_modules=exts,
)
move_all('_block_qt*', op.join('qt', 'pe'))
move_all('_block*', op.join('core', 'pe'))
move_all('_cache*', op.join('core', 'pe'))
def build_normal(): def build_normal():
print("Building dupeGuru with UI qt") print("Building dupeGuru with UI qt")
add_to_pythonpath('.') add_to_pythonpath(".")
print("Building dupeGuru") print("Building dupeGuru")
build_pe_modules() build_pe_modules()
print("Building localizations") print("Building localizations")
build_localizations() build_localizations()
print("Building Qt stuff") print("Building Qt stuff")
print_and_do("pyrcc5 {0} > {1}".format(op.join('qt', 'dg.qrc'), op.join('qt', 'dg_rc.py'))) print_and_do("pyrcc5 {} > {}".format(Path("qt", "dg.qrc"), Path("qt", "dg_rc.py")))
fix_qt_resource_file(op.join('qt', 'dg_rc.py')) fix_qt_resource_file(Path("qt", "dg_rc.py"))
build_help() build_help()
def main(): def main():
if sys.version_info < (3, 7):
sys.exit("Python < 3.7 is unsupported.")
options = parse_args() options = parse_args()
if not op.exists('build'): if options.clean and Path("build").exists():
os.mkdir('build') shutil.rmtree("build")
if not Path("build").exists():
Path("build").mkdir()
if options.doc: if options.doc:
build_one_help("en")
elif options.all_doc:
build_help() build_help()
elif options.loc: elif options.loc:
build_localizations() build_localizations()
@@ -143,8 +154,11 @@ def main():
build_mergepot() build_mergepot()
elif options.normpo: elif options.normpo:
build_normpo() build_normpo()
elif options.modules:
build_pe_modules()
else: else:
build_normal() build_normal()
if __name__ == '__main__':
if __name__ == "__main__":
main() main()


@@ -1,3 +1,2 @@
-__version__ = '4.0.4 RC'
+__version__ = "4.3.1"
-__appname__ = 'dupeGuru'
+__appname__ = "dupeGuru"


@@ -4,38 +4,42 @@
 # which should be included with this package. The terms are also available at
 # http://www.gnu.org/licenses/gpl-3.0.html

+import cProfile
+import datetime
 import os
 import os.path as op
 import logging
 import subprocess
 import re
 import shutil
+from pathlib import Path

 from send2trash import send2trash
 from hscommon.jobprogress import job
 from hscommon.notify import Broadcaster
-from hscommon.path import Path
 from hscommon.conflict import smart_move, smart_copy
 from hscommon.gui.progress_window import ProgressWindow
 from hscommon.util import delete_if_empty, first, escape, nonone, allsame
 from hscommon.trans import tr
 from hscommon import desktop

-from . import se, me, pe
-from .pe.photo import get_delta_dimensions
-from .util import cmp_value, fix_surrogate_encoding
-from . import directories, results, export, fs, prioritize
-from .ignore import IgnoreList
-from .scanner import ScanType
-from .gui.deletion_options import DeletionOptions
-from .gui.details_panel import DetailsPanel
-from .gui.directory_tree import DirectoryTree
-from .gui.ignore_list_dialog import IgnoreListDialog
-from .gui.problem_dialog import ProblemDialog
-from .gui.stats_label import StatsLabel
+from core import se, me, pe
+from core.pe.photo import get_delta_dimensions
+from core.util import cmp_value, fix_surrogate_encoding
+from core import directories, results, export, fs, prioritize
+from core.ignore import IgnoreList
+from core.exclude import ExcludeDict as ExcludeList
+from core.scanner import ScanType
+from core.gui.deletion_options import DeletionOptions
+from core.gui.details_panel import DetailsPanel
+from core.gui.directory_tree import DirectoryTree
+from core.gui.ignore_list_dialog import IgnoreListDialog
+from core.gui.exclude_list_dialog import ExcludeListDialogCore
+from core.gui.problem_dialog import ProblemDialog
+from core.gui.stats_label import StatsLabel
HAD_FIRST_LAUNCH_PREFERENCE = 'HadFirstLaunch' HAD_FIRST_LAUNCH_PREFERENCE = "HadFirstLaunch"
DEBUG_MODE_PREFERENCE = 'DebugMode' DEBUG_MODE_PREFERENCE = "DebugMode"
MSG_NO_MARKED_DUPES = tr("There are no marked duplicates. Nothing has been done.") MSG_NO_MARKED_DUPES = tr("There are no marked duplicates. Nothing has been done.")
MSG_NO_SELECTED_DUPES = tr("There are no selected duplicates. Nothing has been done.") MSG_NO_SELECTED_DUPES = tr("There are no selected duplicates. Nothing has been done.")
@@ -44,31 +48,36 @@ MSG_MANY_FILES_TO_OPEN = tr(
"files are opened with, doing so can create quite a mess. Continue?" "files are opened with, doing so can create quite a mess. Continue?"
) )
class DestType: class DestType:
Direct = 0 DIRECT = 0
Relative = 1 RELATIVE = 1
Absolute = 2 ABSOLUTE = 2
class JobType: class JobType:
Scan = 'job_scan' SCAN = "job_scan"
Load = 'job_load' LOAD = "job_load"
Move = 'job_move' MOVE = "job_move"
Copy = 'job_copy' COPY = "job_copy"
Delete = 'job_delete' DELETE = "job_delete"
class AppMode: class AppMode:
Standard = 0 STANDARD = 0
Music = 1 MUSIC = 1
Picture = 2 PICTURE = 2
JOBID2TITLE = { JOBID2TITLE = {
JobType.Scan: tr("Scanning for duplicates"), JobType.SCAN: tr("Scanning for duplicates"),
JobType.Load: tr("Loading"), JobType.LOAD: tr("Loading"),
JobType.Move: tr("Moving"), JobType.MOVE: tr("Moving"),
JobType.Copy: tr("Copying"), JobType.COPY: tr("Copying"),
JobType.Delete: tr("Sending to Trash"), JobType.DELETE: tr("Sending to Trash"),
} }
class DupeGuru(Broadcaster): class DupeGuru(Broadcaster):
"""Holds everything together. """Holds everything together.
@@ -100,7 +109,8 @@ class DupeGuru(Broadcaster):
Instance of :mod:`meta-gui <core.gui>` table listing the results from :attr:`results` Instance of :mod:`meta-gui <core.gui>` table listing the results from :attr:`results`
""" """
#--- View interface
# --- View interface
# get_default(key_name) # get_default(key_name)
# set_default(key_name, value) # set_default(key_name, value)
# show_message(msg) # show_message(msg)
@@ -116,37 +126,41 @@ class DupeGuru(Broadcaster):
NAME = PROMPT_NAME = "dupeGuru" NAME = PROMPT_NAME = "dupeGuru"
PICTURE_CACHE_TYPE = 'sqlite' # set to 'shelve' for a ShelveCache PICTURE_CACHE_TYPE = "sqlite" # set to 'shelve' for a ShelveCache
-    def __init__(self, view):
+    def __init__(self, view, portable=False):
         if view.get_default(DEBUG_MODE_PREFERENCE):
             logging.getLogger().setLevel(logging.DEBUG)
             logging.debug("Debug mode enabled")
         Broadcaster.__init__(self)
         self.view = view
-        self.appdata = desktop.special_folder_path(desktop.SpecialFolder.AppData, appname=self.NAME)
+        self.appdata = desktop.special_folder_path(desktop.SpecialFolder.APPDATA, portable=portable)
         if not op.exists(self.appdata):
             os.makedirs(self.appdata)
-        self.app_mode = AppMode.Standard
+        self.app_mode = AppMode.STANDARD
         self.discarded_file_count = 0
-        self.directories = directories.Directories()
+        self.exclude_list = ExcludeList()
+        hash_cache_file = op.join(self.appdata, "hash_cache.db")
+        fs.filesdb.connect(hash_cache_file)
+        self.directories = directories.Directories(self.exclude_list)
self.results = results.Results(self) self.results = results.Results(self)
self.ignore_list = IgnoreList() self.ignore_list = IgnoreList()
# In addition to "app-level" options, this dictionary also holds options that will be # In addition to "app-level" options, this dictionary also holds options that will be
# sent to the scanner. They don't have default values because those defaults values are # sent to the scanner. They don't have default values because those defaults values are
# defined in the scanner class. # defined in the scanner class.
self.options = { self.options = {
'escape_filter_regexp': True, "escape_filter_regexp": True,
'clean_empty_dirs': False, "clean_empty_dirs": False,
'ignore_hardlink_matches': False, "ignore_hardlink_matches": False,
'copymove_dest_type': DestType.Relative, "copymove_dest_type": DestType.RELATIVE,
'picture_cache_type': self.PICTURE_CACHE_TYPE "picture_cache_type": self.PICTURE_CACHE_TYPE,
} }
self.selected_dupes = [] self.selected_dupes = []
self.details_panel = DetailsPanel(self) self.details_panel = DetailsPanel(self)
self.directory_tree = DirectoryTree(self) self.directory_tree = DirectoryTree(self)
self.problem_dialog = ProblemDialog(self) self.problem_dialog = ProblemDialog(self)
self.ignore_list_dialog = IgnoreListDialog(self) self.ignore_list_dialog = IgnoreListDialog(self)
self.exclude_list_dialog = ExcludeListDialogCore(self)
self.stats_label = StatsLabel(self) self.stats_label = StatsLabel(self)
self.result_table = None self.result_table = None
self.deletion_options = DeletionOptions() self.deletion_options = DeletionOptions()
@@ -155,13 +169,13 @@ class DupeGuru(Broadcaster):
for child in children: for child in children:
child.connect() child.connect()
#--- Private # --- Private
def _recreate_result_table(self): def _recreate_result_table(self):
if self.result_table is not None: if self.result_table is not None:
self.result_table.disconnect() self.result_table.disconnect()
if self.app_mode == AppMode.Picture: if self.app_mode == AppMode.PICTURE:
self.result_table = pe.result_table.ResultTable(self) self.result_table = pe.result_table.ResultTable(self)
elif self.app_mode == AppMode.Music: elif self.app_mode == AppMode.MUSIC:
self.result_table = me.result_table.ResultTable(self) self.result_table = me.result_table.ResultTable(self)
else: else:
self.result_table = se.result_table.ResultTable(self) self.result_table = se.result_table.ResultTable(self)
@@ -169,26 +183,24 @@ class DupeGuru(Broadcaster):
self.view.create_results_window() self.view.create_results_window()
def _get_picture_cache_path(self): def _get_picture_cache_path(self):
cache_type = self.options['picture_cache_type'] cache_type = self.options["picture_cache_type"]
cache_name = 'cached_pictures.shelve' if cache_type == 'shelve' else 'cached_pictures.db' cache_name = "cached_pictures.shelve" if cache_type == "shelve" else "cached_pictures.db"
return op.join(self.appdata, cache_name) return op.join(self.appdata, cache_name)
     def _get_dupe_sort_key(self, dupe, get_group, key, delta):
-        if self.app_mode in (AppMode.Music, AppMode.Picture):
-            if key == 'folder_path':
-                dupe_folder_path = getattr(dupe, 'display_folder_path', dupe.folder_path)
-                return str(dupe_folder_path).lower()
-        if self.app_mode == AppMode.Picture:
-            if delta and key == 'dimensions':
-                r = cmp_value(dupe, key)
-                ref_value = cmp_value(get_group().ref, key)
-                return get_delta_dimensions(r, ref_value)
-        if key == 'marked':
+        if self.app_mode in (AppMode.MUSIC, AppMode.PICTURE) and key == "folder_path":
+            dupe_folder_path = getattr(dupe, "display_folder_path", dupe.folder_path)
+            return str(dupe_folder_path).lower()
+        if self.app_mode == AppMode.PICTURE and delta and key == "dimensions":
+            r = cmp_value(dupe, key)
+            ref_value = cmp_value(get_group().ref, key)
+            return get_delta_dimensions(r, ref_value)
+        if key == "marked":
             return self.results.is_marked(dupe)
-        if key == 'percentage':
+        if key == "percentage":
             m = get_group().get_match_of(dupe)
             return m.percentage
-        elif key == 'dupe_count':
+        elif key == "dupe_count":
             return 0
         else:
             result = cmp_value(dupe, key)
@@ -202,15 +214,14 @@ class DupeGuru(Broadcaster):
return result return result
def _get_group_sort_key(self, group, key): def _get_group_sort_key(self, group, key):
if self.app_mode in (AppMode.Music, AppMode.Picture): if self.app_mode in (AppMode.MUSIC, AppMode.PICTURE) and key == "folder_path":
if key == 'folder_path': dupe_folder_path = getattr(group.ref, "display_folder_path", group.ref.folder_path)
dupe_folder_path = getattr(group.ref, 'display_folder_path', group.ref.folder_path) return str(dupe_folder_path).lower()
return str(dupe_folder_path).lower() if key == "percentage":
if key == 'percentage':
return group.percentage return group.percentage
if key == 'dupe_count': if key == "dupe_count":
return len(group) return len(group)
if key == 'marked': if key == "marked":
return len([dupe for dupe in group.dupes if self.results.is_marked(dupe)]) return len([dupe for dupe in group.dupes if self.results.is_marked(dupe)])
return cmp_value(group.ref, key) return cmp_value(group.ref, key)
@@ -233,17 +244,17 @@ class DupeGuru(Broadcaster):
else: else:
os.remove(str_path) os.remove(str_path)
else: else:
send2trash(str_path) # Raises OSError when there's a problem send2trash(str_path) # Raises OSError when there's a problem
if link_deleted: if link_deleted:
group = self.results.get_group_of_duplicate(dupe) group = self.results.get_group_of_duplicate(dupe)
ref = group.ref ref = group.ref
linkfunc = os.link if use_hardlinks else os.symlink linkfunc = os.link if use_hardlinks else os.symlink
linkfunc(str(ref.path), str_path) linkfunc(str(ref.path), str_path)
self.clean_empty_dirs(dupe.path.parent()) self.clean_empty_dirs(dupe.path.parent)
def _create_file(self, path): def _create_file(self, path):
# We add fs.Folder to fileclasses in case the file we're loading contains folder paths. # We add fs.Folder to fileclasses in case the file we're loading contains folder paths.
return fs.get_file(path, self.fileclasses + [fs.Folder]) return fs.get_file(path, self.fileclasses + [se.fs.Folder])
def _get_file(self, str_path): def _get_file(self, str_path):
path = Path(str_path) path = Path(str_path)
@@ -253,14 +264,11 @@ class DupeGuru(Broadcaster):
try: try:
f._read_all_info(attrnames=self.METADATA_TO_READ) f._read_all_info(attrnames=self.METADATA_TO_READ)
return f return f
except EnvironmentError: except OSError:
return None return None
def _get_export_data(self): def _get_export_data(self):
columns = [ columns = [col for col in self.result_table._columns.ordered_columns if col.visible and col.name != "marked"]
col for col in self.result_table.columns.ordered_columns
if col.visible and col.name != 'marked'
]
colnames = [col.display for col in columns] colnames = [col.display for col in columns]
rows = [] rows = []
for group_id, group in enumerate(self.results.groups): for group_id, group in enumerate(self.results.groups):
@@ -272,11 +280,8 @@ class DupeGuru(Broadcaster):
return colnames, rows return colnames, rows
def _results_changed(self): def _results_changed(self):
self.selected_dupes = [ self.selected_dupes = [d for d in self.selected_dupes if self.results.get_group_of_duplicate(d) is not None]
d for d in self.selected_dupes self.notify("results_changed")
if self.results.get_group_of_duplicate(d) is not None
]
self.notify('results_changed')
def _start_job(self, jobid, func, args=()): def _start_job(self, jobid, func, args=()):
title = JOBID2TITLE[jobid] title = JOBID2TITLE[jobid]
@@ -290,32 +295,36 @@ class DupeGuru(Broadcaster):
self.view.show_message(msg) self.view.show_message(msg)
def _job_completed(self, jobid): def _job_completed(self, jobid):
if jobid == JobType.Scan: if jobid == JobType.SCAN:
self._results_changed() self._results_changed()
fs.filesdb.commit()
if not self.results.groups: if not self.results.groups:
self.view.show_message(tr("No duplicates found.")) self.view.show_message(tr("No duplicates found."))
else: else:
self.view.show_results_window() self.view.show_results_window()
if jobid in {JobType.Move, JobType.Delete}: if jobid in {JobType.MOVE, JobType.DELETE}:
self._results_changed() self._results_changed()
if jobid == JobType.Load: if jobid == JobType.LOAD:
self._recreate_result_table() self._recreate_result_table()
self._results_changed() self._results_changed()
self.view.show_results_window() self.view.show_results_window()
if jobid in {JobType.Copy, JobType.Move, JobType.Delete}: if jobid in {JobType.COPY, JobType.MOVE, JobType.DELETE}:
if self.results.problems: if self.results.problems:
self.problem_dialog.refresh() self.problem_dialog.refresh()
self.view.show_problem_dialog() self.view.show_problem_dialog()
else: else:
-                msg = {
-                    JobType.Copy: tr("All marked files were copied successfully."),
-                    JobType.Move: tr("All marked files were moved successfully."),
-                    JobType.Delete: tr("All marked files were successfully sent to Trash."),
-                }[jobid]
+                if jobid == JobType.COPY:
+                    msg = tr("All marked files were copied successfully.")
+                elif jobid == JobType.MOVE:
+                    msg = tr("All marked files were moved successfully.")
+                elif jobid == JobType.DELETE and self.deletion_options.direct:
+                    msg = tr("All marked files were deleted successfully.")
+                else:
+                    msg = tr("All marked files were successfully sent to Trash.")
self.view.show_message(msg) self.view.show_message(msg)
def _job_error(self, jobid, err): def _job_error(self, jobid, err):
if jobid == JobType.Load: if jobid == JobType.LOAD:
msg = tr("Could not load file: {}").format(err) msg = tr("Could not load file: {}").format(err)
self.view.show_message(msg) self.view.show_message(msg)
return False return False
@@ -341,26 +350,26 @@ class DupeGuru(Broadcaster):
if dupes == self.selected_dupes: if dupes == self.selected_dupes:
return return
self.selected_dupes = dupes self.selected_dupes = dupes
self.notify('dupes_selected') self.notify("dupes_selected")
#--- Protected # --- Protected
def _get_fileclasses(self): def _get_fileclasses(self):
if self.app_mode == AppMode.Picture: if self.app_mode == AppMode.PICTURE:
return [pe.photo.PLAT_SPECIFIC_PHOTO_CLASS] return [pe.photo.PLAT_SPECIFIC_PHOTO_CLASS]
elif self.app_mode == AppMode.Music: elif self.app_mode == AppMode.MUSIC:
return [me.fs.MusicFile] return [me.fs.MusicFile]
else: else:
return [se.fs.File] return [se.fs.File]
def _prioritization_categories(self): def _prioritization_categories(self):
if self.app_mode == AppMode.Picture: if self.app_mode == AppMode.PICTURE:
return pe.prioritize.all_categories() return pe.prioritize.all_categories()
elif self.app_mode == AppMode.Music: elif self.app_mode == AppMode.MUSIC:
return me.prioritize.all_categories() return me.prioritize.all_categories()
else: else:
return prioritize.all_categories() return prioritize.all_categories()
#--- Public # --- Public
def add_directory(self, d): def add_directory(self, d):
"""Adds folder ``d`` to :attr:`directories`. """Adds folder ``d`` to :attr:`directories`.
@@ -370,15 +379,14 @@ class DupeGuru(Broadcaster):
""" """
try: try:
self.directories.add_path(Path(d)) self.directories.add_path(Path(d))
self.notify('directories_changed') self.notify("directories_changed")
except directories.AlreadyThereError: except directories.AlreadyThereError:
self.view.show_message(tr("'{}' already is in the list.").format(d)) self.view.show_message(tr("'{}' already is in the list.").format(d))
except directories.InvalidPathError: except directories.InvalidPathError:
self.view.show_message(tr("'{}' does not exist.").format(d)) self.view.show_message(tr("'{}' does not exist.").format(d))
def add_selected_to_ignore_list(self): def add_selected_to_ignore_list(self):
"""Adds :attr:`selected_dupes` to :attr:`ignore_list`. """Adds :attr:`selected_dupes` to :attr:`ignore_list`."""
"""
dupes = self.without_ref(self.selected_dupes) dupes = self.without_ref(self.selected_dupes)
if not dupes: if not dupes:
self.view.show_message(MSG_NO_SELECTED_DUPES) self.view.show_message(MSG_NO_SELECTED_DUPES)
@@ -390,60 +398,64 @@ class DupeGuru(Broadcaster):
g = self.results.get_group_of_duplicate(dupe) g = self.results.get_group_of_duplicate(dupe)
for other in g: for other in g:
if other is not dupe: if other is not dupe:
self.ignore_list.Ignore(str(other.path), str(dupe.path)) self.ignore_list.ignore(str(other.path), str(dupe.path))
self.remove_duplicates(dupes) self.remove_duplicates(dupes)
self.ignore_list_dialog.refresh() self.ignore_list_dialog.refresh()
def apply_filter(self, filter): def apply_filter(self, result_filter):
"""Apply a filter ``filter`` to the results so that it shows only dupe groups that match it. """Apply a filter ``filter`` to the results so that it shows only dupe groups that match it.
:param str filter: filter to apply :param str filter: filter to apply
""" """
self.results.apply_filter(None) self.results.apply_filter(None)
if self.options['escape_filter_regexp']: if self.options["escape_filter_regexp"]:
filter = escape(filter, set('()[]\\.|+?^')) result_filter = escape(result_filter, set("()[]\\.|+?^"))
filter = escape(filter, '*', '.') result_filter = escape(result_filter, "*", ".")
self.results.apply_filter(filter) self.results.apply_filter(result_filter)
self._results_changed() self._results_changed()
def clean_empty_dirs(self, path): def clean_empty_dirs(self, path):
if self.options['clean_empty_dirs']: if self.options["clean_empty_dirs"]:
while delete_if_empty(path, ['.DS_Store']): while delete_if_empty(path, [".DS_Store"]):
path = path.parent() path = path.parent
def clear_picture_cache(self): def clear_picture_cache(self):
try: try:
os.remove(self._get_picture_cache_path()) os.remove(self._get_picture_cache_path())
except FileNotFoundError: except FileNotFoundError:
pass # we don't care pass # we don't care
+    def clear_hash_cache(self):
+        fs.filesdb.clear()
+
     def copy_or_move(self, dupe, copy: bool, destination: str, dest_type: DestType):
         source_path = dupe.path
-        location_path = first(p for p in self.directories if dupe.path in p)
+        location_path = first(p for p in self.directories if p in dupe.path.parents)
         dest_path = Path(destination)
-        if dest_type in {DestType.Relative, DestType.Absolute}:
+        if dest_type in {DestType.RELATIVE, DestType.ABSOLUTE}:
             # no filename, no windows drive letter
-            source_base = source_path.remove_drive_letter().parent()
-            if dest_type == DestType.Relative:
-                source_base = source_base[location_path:]
-            dest_path = dest_path[source_base]
+            source_base = source_path.relative_to(source_path.anchor).parent
+            if dest_type == DestType.RELATIVE:
+                source_base = source_base.relative_to(location_path.relative_to(location_path.anchor))
+            dest_path = dest_path.joinpath(source_base)
             if not dest_path.exists():
-                dest_path.makedirs()
+                dest_path.mkdir(parents=True)
         # Add filename to dest_path. For file move/copy, it's not required, but for folders, yes.
-        dest_path = dest_path[source_path.name]
+        dest_path = dest_path.joinpath(source_path.name)
         logging.debug("Copy/Move operation from '%s' to '%s'", source_path, dest_path)
         # Raises an EnvironmentError if there's a problem
         if copy:
             smart_copy(source_path, dest_path)
         else:
             smart_move(source_path, dest_path)
-        self.clean_empty_dirs(source_path.parent())
+        self.clean_empty_dirs(source_path.parent)
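For readers less familiar with the pathlib calls used above, here is an illustrative sketch of how the destination path is derived in the DestType.RELATIVE case; all paths are made-up POSIX examples, not values from the changeset:

    from pathlib import Path

    source_path = Path("/home/user/photos/2021/img.jpg")   # hypothetical duplicate
    location_path = Path("/home/user/photos")              # the scanned root that contains it
    dest_path = Path("/tmp/dupes")                          # folder picked by the user

    # Drop the anchor ("/" or a drive letter) and the filename.
    source_base = source_path.relative_to(source_path.anchor).parent    # home/user/photos/2021
    # RELATIVE: keep only the part below the scanned root.
    source_base = source_base.relative_to(location_path.relative_to(location_path.anchor))  # 2021
    dest_path = dest_path.joinpath(source_base)             # /tmp/dupes/2021
    dest_path = dest_path.joinpath(source_path.name)        # /tmp/dupes/2021/img.jpg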
def copy_or_move_marked(self, copy): def copy_or_move_marked(self, copy):
"""Start an async move (or copy) job on marked duplicates. """Start an async move (or copy) job on marked duplicates.
:param bool copy: If True, duplicates will be copied instead of moved :param bool copy: If True, duplicates will be copied instead of moved
""" """
def do(j): def do(j):
def op(dupe): def op(dupe):
j.add_progress() j.add_progress()
@@ -455,28 +467,30 @@ class DupeGuru(Broadcaster):
if not self.results.mark_count: if not self.results.mark_count:
self.view.show_message(MSG_NO_MARKED_DUPES) self.view.show_message(MSG_NO_MARKED_DUPES)
return return
-        opname = tr("copy") if copy else tr("move")
-        prompt = tr("Select a directory to {} marked files to").format(opname)
-        destination = self.view.select_dest_folder(prompt)
+        destination = self.view.select_dest_folder(
+            tr("Select a directory to copy marked files to")
+            if copy
+            else tr("Select a directory to move marked files to")
+        )
         if destination:
-            desttype = self.options['copymove_dest_type']
-            jobid = JobType.Copy if copy else JobType.Move
+            desttype = self.options["copymove_dest_type"]
+            jobid = JobType.COPY if copy else JobType.MOVE
             self._start_job(jobid, do)
def delete_marked(self): def delete_marked(self):
"""Start an async job to send marked duplicates to the trash. """Start an async job to send marked duplicates to the trash."""
"""
if not self.results.mark_count: if not self.results.mark_count:
self.view.show_message(MSG_NO_MARKED_DUPES) self.view.show_message(MSG_NO_MARKED_DUPES)
return return
if not self.deletion_options.show(self.results.mark_count): if not self.deletion_options.show(self.results.mark_count):
return return
args = [ args = [
self.deletion_options.link_deleted, self.deletion_options.use_hardlinks, self.deletion_options.link_deleted,
self.deletion_options.direct self.deletion_options.use_hardlinks,
self.deletion_options.direct,
] ]
logging.debug("Starting deletion job with args %r", args) logging.debug("Starting deletion job with args %r", args)
self._start_job(JobType.Delete, self._do_delete, args=args) self._start_job(JobType.DELETE, self._do_delete, args=args)
def export_to_xhtml(self): def export_to_xhtml(self):
"""Export current results to XHTML. """Export current results to XHTML.
@@ -495,7 +509,7 @@ class DupeGuru(Broadcaster):
The columns and their order in the resulting CSV file is determined in the same way as in The columns and their order in the resulting CSV file is determined in the same way as in
:meth:`export_to_xhtml`. :meth:`export_to_xhtml`.
""" """
dest_file = self.view.select_dest_file(tr("Select a destination for your exported CSV"), 'csv') dest_file = self.view.select_dest_file(tr("Select a destination for your exported CSV"), "csv")
if dest_file: if dest_file:
colnames, rows = self._get_export_data() colnames, rows = self._get_export_data()
try: try:
@@ -505,13 +519,14 @@ class DupeGuru(Broadcaster):
def get_display_info(self, dupe, group, delta=False): def get_display_info(self, dupe, group, delta=False):
def empty_data(): def empty_data():
return {c.name: '---' for c in self.result_table.COLUMNS[1:]} return {c.name: "---" for c in self.result_table.COLUMNS[1:]}
if (dupe is None) or (group is None): if (dupe is None) or (group is None):
return empty_data() return empty_data()
try: try:
return dupe.get_display_info(group, delta) return dupe.get_display_info(group, delta)
except Exception as e: except Exception as e:
logging.warning("Exception on GetDisplayInfo for %s: %s", str(dupe.path), str(e)) logging.warning("Exception (type: %s) on GetDisplayInfo for %s: %s", type(e), str(dupe.path), str(e))
return empty_data() return empty_data()
def invoke_custom_command(self): def invoke_custom_command(self):
@@ -521,28 +536,32 @@ class DupeGuru(Broadcaster):
is replaced with that dupe's ref file. If there's no selection, the command is not invoked. is replaced with that dupe's ref file. If there's no selection, the command is not invoked.
If the dupe is a ref, ``%d`` and ``%r`` will be the same. If the dupe is a ref, ``%d`` and ``%r`` will be the same.
""" """
cmd = self.view.get_default('CustomCommand') cmd = self.view.get_default("CustomCommand")
if not cmd: if not cmd:
msg = tr("You have no custom command set up. Set it up in your preferences.") msg = tr("You have no custom command set up. Set it up in your preferences.")
self.view.show_message(msg) self.view.show_message(msg)
return return
if not self.selected_dupes: if not self.selected_dupes:
return return
-        dupe = self.selected_dupes[0]
-        group = self.results.get_group_of_duplicate(dupe)
-        ref = group.ref
-        cmd = cmd.replace('%d', str(dupe.path))
-        cmd = cmd.replace('%r', str(ref.path))
-        match = re.match(r'"([^"]+)"(.*)', cmd)
-        if match is not None:
-            # This code here is because subprocess. Popen doesn't seem to accept, under Windows,
-            # executable paths with spaces in it, *even* when they're enclosed in "". So this is
-            # a workaround to make the damn thing work.
-            exepath, args = match.groups()
-            path, exename = op.split(exepath)
-            subprocess.Popen(exename + args, shell=True, cwd=path)
-        else:
-            subprocess.Popen(cmd, shell=True)
+        dupes = self.selected_dupes
+        refs = [self.results.get_group_of_duplicate(dupe).ref for dupe in dupes]
+        for dupe, ref in zip(dupes, refs):
+            dupe_cmd = cmd.replace("%d", str(dupe.path))
+            dupe_cmd = dupe_cmd.replace("%r", str(ref.path))
+            match = re.match(r'"([^"]+)"(.*)', dupe_cmd)
+            if match is not None:
+                # This code here is because subprocess. Popen doesn't seem to accept, under Windows,
+                # executable paths with spaces in it, *even* when they're enclosed in "". So this is
+                # a workaround to make the damn thing work.
+                exepath, args = match.groups()
+                path, exename = op.split(exepath)
+                p = subprocess.Popen(exename + args, shell=True, cwd=path, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+                output = p.stdout.read()
+                logging.info("Custom command %s %s: %s", exename, args, output)
+            else:
+                p = subprocess.Popen(dupe_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+                output = p.stdout.read()
+                logging.info("Custom command %s: %s", dupe_cmd, output)
def load(self): def load(self):
"""Load directory selection and ignore list from files in appdata. """Load directory selection and ignore list from files in appdata.
@@ -551,20 +570,31 @@ class DupeGuru(Broadcaster):
is persistent data, is the same as when the last session was closed (when :meth:`save` was is persistent data, is the same as when the last session was closed (when :meth:`save` was
called). called).
""" """
self.directories.load_from_file(op.join(self.appdata, 'last_directories.xml')) self.directories.load_from_file(op.join(self.appdata, "last_directories.xml"))
self.notify('directories_changed') self.notify("directories_changed")
p = op.join(self.appdata, 'ignore_list.xml') p = op.join(self.appdata, "ignore_list.xml")
self.ignore_list.load_from_xml(p) self.ignore_list.load_from_xml(p)
self.ignore_list_dialog.refresh() self.ignore_list_dialog.refresh()
p = op.join(self.appdata, "exclude_list.xml")
self.exclude_list.load_from_xml(p)
self.exclude_list_dialog.refresh()
def load_directories(self, filepath):
# Clear out previous entries
self.directories.__init__()
self.directories.load_from_file(filepath)
self.notify("directories_changed")
def load_from(self, filename): def load_from(self, filename):
"""Start an async job to load results from ``filename``. """Start an async job to load results from ``filename``.
:param str filename: path of the XML file (created with :meth:`save_as`) to load :param str filename: path of the XML file (created with :meth:`save_as`) to load
""" """
def do(j): def do(j):
self.results.load_from_xml(filename, self._get_file, j) self.results.load_from_xml(filename, self._get_file, j)
self._start_job(JobType.Load, do)
self._start_job(JobType.LOAD, do)
def make_selected_reference(self): def make_selected_reference(self):
"""Promote :attr:`selected_dupes` to reference position within their respective groups. """Promote :attr:`selected_dupes` to reference position within their respective groups.
@@ -577,9 +607,8 @@ class DupeGuru(Broadcaster):
changed_groups = set() changed_groups = set()
for dupe in dupes: for dupe in dupes:
g = self.results.get_group_of_duplicate(dupe) g = self.results.get_group_of_duplicate(dupe)
if g not in changed_groups: if g not in changed_groups and self.results.make_ref(dupe):
if self.results.make_ref(dupe): changed_groups.add(g)
changed_groups.add(g)
# It's not always obvious to users what this action does, so to make it a bit clearer, # It's not always obvious to users what this action does, so to make it a bit clearer,
# we change our selection to the ref of all changed groups. However, we also want to keep # we change our selection to the ref of all changed groups. However, we also want to keep
# the files that were ref before and weren't changed by the action. In effect, what this # the files that were ref before and weren't changed by the action. In effect, what this
@@ -588,35 +617,31 @@ class DupeGuru(Broadcaster):
if not self.result_table.power_marker: if not self.result_table.power_marker:
if changed_groups: if changed_groups:
self.selected_dupes = [ self.selected_dupes = [
d for d in self.selected_dupes d for d in self.selected_dupes if self.results.get_group_of_duplicate(d).ref is d
if self.results.get_group_of_duplicate(d).ref is d
] ]
self.notify('results_changed') self.notify("results_changed")
else: else:
# If we're in "Dupes Only" mode (previously called Power Marker), things are a bit # If we're in "Dupes Only" mode (previously called Power Marker), things are a bit
# different. The refs are not shown in the table, and if our operation is successful, # different. The refs are not shown in the table, and if our operation is successful,
# this means that there's no way to follow our dupe selection. Then, the best thing to # this means that there's no way to follow our dupe selection. Then, the best thing to
# do is to keep our selection index-wise (different dupe selection, but same index # do is to keep our selection index-wise (different dupe selection, but same index
# selection). # selection).
self.notify('results_changed_but_keep_selection') self.notify("results_changed_but_keep_selection")
def mark_all(self): def mark_all(self):
"""Set all dupes in the results as marked. """Set all dupes in the results as marked."""
"""
self.results.mark_all() self.results.mark_all()
self.notify('marking_changed') self.notify("marking_changed")
def mark_none(self): def mark_none(self):
"""Set all dupes in the results as unmarked. """Set all dupes in the results as unmarked."""
"""
self.results.mark_none() self.results.mark_none()
self.notify('marking_changed') self.notify("marking_changed")
def mark_invert(self): def mark_invert(self):
"""Invert the marked state of all dupes in the results. """Invert the marked state of all dupes in the results."""
"""
self.results.mark_invert() self.results.mark_invert()
self.notify('marking_changed') self.notify("marking_changed")
def mark_dupe(self, dupe, marked): def mark_dupe(self, dupe, marked):
"""Change marked status of ``dupe``. """Change marked status of ``dupe``.
@@ -629,21 +654,18 @@ class DupeGuru(Broadcaster):
self.results.mark(dupe) self.results.mark(dupe)
else: else:
self.results.unmark(dupe) self.results.unmark(dupe)
self.notify('marking_changed') self.notify("marking_changed")
def open_selected(self): def open_selected(self):
"""Open :attr:`selected_dupes` with their associated application. """Open :attr:`selected_dupes` with their associated application."""
""" if len(self.selected_dupes) > 10 and not self.view.ask_yes_no(MSG_MANY_FILES_TO_OPEN):
if len(self.selected_dupes) > 10: return
if not self.view.ask_yes_no(MSG_MANY_FILES_TO_OPEN):
return
for dupe in self.selected_dupes: for dupe in self.selected_dupes:
desktop.open_path(dupe.path) desktop.open_path(dupe.path)
def purge_ignore_list(self): def purge_ignore_list(self):
"""Remove files that don't exist from :attr:`ignore_list`. """Remove files that don't exist from :attr:`ignore_list`."""
""" self.ignore_list.filter(lambda f, s: op.exists(f) and op.exists(s))
self.ignore_list.Filter(lambda f, s: op.exists(f) and op.exists(s))
self.ignore_list_dialog.refresh() self.ignore_list_dialog.refresh()
def remove_directories(self, indexes): def remove_directories(self, indexes):
@@ -656,7 +678,7 @@ class DupeGuru(Broadcaster):
indexes = sorted(indexes, reverse=True) indexes = sorted(indexes, reverse=True)
for index in indexes: for index in indexes:
del self.directories[index] del self.directories[index]
self.notify('directories_changed') self.notify("directories_changed")
except IndexError: except IndexError:
pass pass
@@ -669,11 +691,10 @@ class DupeGuru(Broadcaster):
:type duplicates: list of :class:`~core.fs.File` :type duplicates: list of :class:`~core.fs.File`
""" """
self.results.remove_duplicates(self.without_ref(duplicates)) self.results.remove_duplicates(self.without_ref(duplicates))
self.notify('results_changed_but_keep_selection') self.notify("results_changed_but_keep_selection")
def remove_marked(self): def remove_marked(self):
"""Removed marked duplicates from the results (without touching the files themselves). """Removed marked duplicates from the results (without touching the files themselves)."""
"""
if not self.results.mark_count: if not self.results.mark_count:
self.view.show_message(MSG_NO_MARKED_DUPES) self.view.show_message(MSG_NO_MARKED_DUPES)
return return
@@ -684,8 +705,7 @@ class DupeGuru(Broadcaster):
self._results_changed() self._results_changed()
def remove_selected(self): def remove_selected(self):
"""Removed :attr:`selected_dupes` from the results (without touching the files themselves). """Removed :attr:`selected_dupes` from the results (without touching the files themselves)."""
"""
dupes = self.without_ref(self.selected_dupes) dupes = self.without_ref(self.selected_dupes)
if not dupes: if not dupes:
self.view.show_message(MSG_NO_SELECTED_DUPES) self.view.show_message(MSG_NO_SELECTED_DUPES)
@@ -723,6 +743,8 @@ class DupeGuru(Broadcaster):
for group in self.results.groups: for group in self.results.groups:
if group.prioritize(key_func=sort_key): if group.prioritize(key_func=sort_key):
count += 1 count += 1
if count:
self.results.refresh_required = True
self._results_changed() self._results_changed()
msg = tr("{} duplicate groups were changed by the re-prioritization.").format(count) msg = tr("{} duplicate groups were changed by the re-prioritization.").format(count)
self.view.show_message(msg) self.view.show_message(msg)
@@ -734,10 +756,15 @@ class DupeGuru(Broadcaster):
def save(self): def save(self):
if not op.exists(self.appdata): if not op.exists(self.appdata):
os.makedirs(self.appdata) os.makedirs(self.appdata)
self.directories.save_to_file(op.join(self.appdata, 'last_directories.xml')) self.directories.save_to_file(op.join(self.appdata, "last_directories.xml"))
p = op.join(self.appdata, 'ignore_list.xml') p = op.join(self.appdata, "ignore_list.xml")
self.ignore_list.save_to_xml(p) self.ignore_list.save_to_xml(p)
self.notify('save_session') p = op.join(self.appdata, "exclude_list.xml")
self.exclude_list.save_to_xml(p)
self.notify("save_session")
def close(self):
fs.filesdb.close()
def save_as(self, filename): def save_as(self, filename):
"""Save results in ``filename``. """Save results in ``filename``.
@@ -749,7 +776,17 @@ class DupeGuru(Broadcaster):
except OSError as e: except OSError as e:
self.view.show_message(tr("Couldn't write to file: {}").format(str(e))) self.view.show_message(tr("Couldn't write to file: {}").format(str(e)))
def start_scanning(self): def save_directories_as(self, filename):
"""Save directories in ``filename``.
:param str filename: path of the file to save directories (as XML) to.
"""
try:
self.directories.save_to_file(filename)
except OSError as e:
self.view.show_message(tr("Couldn't write to file: {}").format(str(e)))
def start_scanning(self, profile_scan=False):
"""Starts an async job to scan for duplicates. """Starts an async job to scan for duplicates.
Scans folders selected in :attr:`directories` and put the results in :attr:`results` Scans folders selected in :attr:`directories` and put the results in :attr:`results`
@@ -762,25 +799,31 @@ class DupeGuru(Broadcaster):
for k, v in self.options.items(): for k, v in self.options.items():
if hasattr(scanner, k): if hasattr(scanner, k):
setattr(scanner, k, v) setattr(scanner, k, v)
if self.app_mode == AppMode.Picture: if self.app_mode == AppMode.PICTURE:
scanner.cache_path = self._get_picture_cache_path() scanner.cache_path = self._get_picture_cache_path()
self.results.groups = [] self.results.groups = []
self._recreate_result_table() self._recreate_result_table()
self._results_changed() self._results_changed()
         def do(j):
+            if profile_scan:
+                pr = cProfile.Profile()
+                pr.enable()
             j.set_progress(0, tr("Collecting files to scan"))
-            if scanner.scan_type == ScanType.Folders:
+            if scanner.scan_type == ScanType.FOLDERS:
                 files = list(self.directories.get_folders(folderclass=se.fs.Folder, j=j))
             else:
                 files = list(self.directories.get_files(fileclasses=self.fileclasses, j=j))
-            if self.options['ignore_hardlink_matches']:
+            if self.options["ignore_hardlink_matches"]:
                 files = self._remove_hardlink_dupes(files)
-            logging.info('Scanning %d files' % len(files))
+            logging.info("Scanning %d files" % len(files))
             self.results.groups = scanner.get_dupe_groups(files, self.ignore_list, j)
             self.discarded_file_count = scanner.discarded_file_count
+            if profile_scan:
+                pr.disable()
+                pr.dump_stats(op.join(self.appdata, f"{datetime.datetime.now():%Y-%m-%d_%H-%M-%S}.profile"))

-        self._start_job(JobType.Scan, do)
+        self._start_job(JobType.SCAN, do)
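A dump produced by pr.dump_stats() above can later be read back with the standard library's pstats module; a minimal sketch, where the file name is only an example of the timestamped pattern used:

    import pstats

    stats = pstats.Stats("2022-01-01_12-00-00.profile")
    stats.sort_stats("cumulative").print_stats(20)   # show the 20 most expensive calls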
def toggle_selected_mark_state(self): def toggle_selected_mark_state(self):
selected = self.without_ref(self.selected_dupes) selected = self.without_ref(self.selected_dupes)
@@ -792,11 +835,10 @@ class DupeGuru(Broadcaster):
markfunc = self.results.mark markfunc = self.results.mark
for dupe in selected: for dupe in selected:
markfunc(dupe) markfunc(dupe)
self.notify('marking_changed') self.notify("marking_changed")
def without_ref(self, dupes): def without_ref(self, dupes):
"""Returns ``dupes`` with all reference elements removed. """Returns ``dupes`` with all reference elements removed."""
"""
return [dupe for dupe in dupes if self.results.get_group_of_duplicate(dupe).ref is not dupe] return [dupe for dupe in dupes if self.results.get_group_of_duplicate(dupe).ref is not dupe]
def get_default(self, key, fallback_value=None): def get_default(self, key, fallback_value=None):
@@ -812,7 +854,7 @@ class DupeGuru(Broadcaster):
def set_default(self, key, value): def set_default(self, key, value):
self.view.set_default(key, value) self.view.set_default(key, value)
#--- Properties # --- Properties
@property @property
def stat_line(self): def stat_line(self):
result = self.results.stat_line result = self.results.stat_line
@@ -826,22 +868,31 @@ class DupeGuru(Broadcaster):
@property @property
def SCANNER_CLASS(self): def SCANNER_CLASS(self):
if self.app_mode == AppMode.Picture: if self.app_mode == AppMode.PICTURE:
return pe.scanner.ScannerPE return pe.scanner.ScannerPE
elif self.app_mode == AppMode.Music: elif self.app_mode == AppMode.MUSIC:
return me.scanner.ScannerME return me.scanner.ScannerME
else: else:
return se.scanner.ScannerSE return se.scanner.ScannerSE
@property @property
def METADATA_TO_READ(self): def METADATA_TO_READ(self):
if self.app_mode == AppMode.Picture: if self.app_mode == AppMode.PICTURE:
return ['size', 'mtime', 'dimensions', 'exif_timestamp'] return ["size", "mtime", "dimensions", "exif_timestamp"]
elif self.app_mode == AppMode.Music: elif self.app_mode == AppMode.MUSIC:
return [ return [
'size', 'mtime', 'duration', 'bitrate', 'samplerate', 'title', 'artist', "size",
'album', 'genre', 'year', 'track', 'comment' "mtime",
"duration",
"bitrate",
"samplerate",
"title",
"artist",
"album",
"genre",
"year",
"track",
"comment",
] ]
else: else:
return ['size', 'mtime'] return ["size", "mtime"]


@@ -7,20 +7,22 @@
import os import os
from xml.etree import ElementTree as ET from xml.etree import ElementTree as ET
import logging import logging
from pathlib import Path
from hscommon.jobprogress import job from hscommon.jobprogress import job
from hscommon.path import Path
from hscommon.util import FileOrPath from hscommon.util import FileOrPath
from hscommon.trans import tr
from . import fs from core import fs
__all__ = [ __all__ = [
'Directories', "Directories",
'DirectoryState', "DirectoryState",
'AlreadyThereError', "AlreadyThereError",
'InvalidPathError', "InvalidPathError",
] ]
class DirectoryState: class DirectoryState:
"""Enum describing how a folder should be considered. """Enum describing how a folder should be considered.
@@ -28,16 +30,20 @@ class DirectoryState:
* DirectoryState.Reference: Scan files, but make sure never to delete any of them * DirectoryState.Reference: Scan files, but make sure never to delete any of them
* DirectoryState.Excluded: Don't scan this folder * DirectoryState.Excluded: Don't scan this folder
""" """
Normal = 0
Reference = 1 NORMAL = 0
Excluded = 2 REFERENCE = 1
EXCLUDED = 2
class AlreadyThereError(Exception): class AlreadyThereError(Exception):
"""The path being added is already in the directory list""" """The path being added is already in the directory list"""
class InvalidPathError(Exception): class InvalidPathError(Exception):
"""The path being added is invalid""" """The path being added is invalid"""
class Directories: class Directories:
"""Holds user folder selection. """Holds user folder selection.
@@ -47,15 +53,17 @@ class Directories:
Then, when the user starts the scan, :meth:`get_files` is called to retrieve all files (wrapped Then, when the user starts the scan, :meth:`get_files` is called to retrieve all files (wrapped
in :mod:`core.fs`) that have to be scanned according to the chosen folders/states. in :mod:`core.fs`) that have to be scanned according to the chosen folders/states.
""" """
#---Override
def __init__(self): # ---Override
def __init__(self, exclude_list=None):
self._dirs = [] self._dirs = []
# {path: state} # {path: state}
self.states = {} self.states = {}
self._exclude_list = exclude_list
def __contains__(self, path): def __contains__(self, path):
for p in self._dirs: for p in self._dirs:
if path in p: if path == p or p in path.parents:
return True return True
return False return False
@@ -68,57 +76,74 @@ class Directories:
def __len__(self): def __len__(self):
return len(self._dirs) return len(self._dirs)
#---Private # ---Private
def _default_state_for_path(self, path): def _default_state_for_path(self, path):
# New logic with regex filters
if self._exclude_list is not None and self._exclude_list.mark_count > 0:
# We iterate even if we only have one item here
for denied_path_re in self._exclude_list.compiled:
if denied_path_re.match(str(path.name)):
return DirectoryState.EXCLUDED
# return # We still use the old logic to force state on hidden dirs
# Override this in subclasses to specify the state of some special folders. # Override this in subclasses to specify the state of some special folders.
if path.name.startswith('.'): # hidden if path.name.startswith("."):
return DirectoryState.Excluded return DirectoryState.EXCLUDED
     def _get_files(self, from_path, fileclasses, j):
-        for root, dirs, files in os.walk(str(from_path)):
-            j.check_if_cancelled()
-            root = Path(root)
-            state = self.get_state(root)
-            if state == DirectoryState.Excluded:
-                # Recursively get files from folders with lots of subfolder is expensive. However, there
-                # might be a subfolder in this path that is not excluded. What we want to do is to skim
-                # through self.states and see if we must continue, or we can stop right here to save time
-                if not any(p[:len(root)] == root for p in self.states):
-                    del dirs[:]
-            try:
-                if state != DirectoryState.Excluded:
-                    found_files = [fs.get_file(root + f, fileclasses=fileclasses) for f in files]
-                    found_files = [f for f in found_files if f is not None]
-                    # In some cases, directories can be considered as files by dupeGuru, which is
-                    # why we have this line below. In fact, there only one case: Bundle files under
-                    # OS X... In other situations, this forloop will do nothing.
-                    for d in dirs[:]:
-                        f = fs.get_file(root + d, fileclasses=fileclasses)
-                        if f is not None:
-                            found_files.append(f)
-                            dirs.remove(d)
-                logging.debug("Collected %d files in folder %s", len(found_files), str(from_path))
-                for file in found_files:
-                    file.is_ref = state == DirectoryState.Reference
-                    yield file
-            except (EnvironmentError, fs.InvalidPath):
-                pass
+        try:
+            with os.scandir(from_path) as iter:
+                root_path = Path(from_path)
+                state = self.get_state(root_path)
+                # if we have no un-excluded dirs under this directory skip going deeper
+                skip_dirs = state == DirectoryState.EXCLUDED and not any(
+                    p.parts[: len(root_path.parts)] == root_path.parts for p in self.states
+                )
+                count = 0
+                for item in iter:
+                    j.check_if_cancelled()
+                    try:
+                        if item.is_dir():
+                            if skip_dirs:
+                                continue
+                            yield from self._get_files(item.path, fileclasses, j)
+                            continue
+                        elif state == DirectoryState.EXCLUDED:
+                            continue
+                        # File excluding or not
+                        if (
+                            self._exclude_list is None
+                            or not self._exclude_list.mark_count
+                            or not self._exclude_list.is_excluded(str(from_path), item.name)
+                        ):
+                            file = fs.get_file(item, fileclasses=fileclasses)
+                            if file:
+                                file.is_ref = state == DirectoryState.REFERENCE
+                                count += 1
+                                yield file
+                    except (OSError, fs.InvalidPath):
+                        pass
+                logging.debug(
+                    "Collected %d files in folder %s",
+                    count,
+                    str(root_path),
+                )
+        except OSError:
+            pass
def _get_folders(self, from_folder, j): def _get_folders(self, from_folder, j):
j.check_if_cancelled() j.check_if_cancelled()
try: try:
for subfolder in from_folder.subfolders: for subfolder in from_folder.subfolders:
for folder in self._get_folders(subfolder, j): yield from self._get_folders(subfolder, j)
yield folder
state = self.get_state(from_folder.path) state = self.get_state(from_folder.path)
if state != DirectoryState.Excluded: if state != DirectoryState.EXCLUDED:
from_folder.is_ref = state == DirectoryState.Reference from_folder.is_ref = state == DirectoryState.REFERENCE
logging.debug("Yielding Folder %r state: %d", from_folder, state) logging.debug("Yielding Folder %r state: %d", from_folder, state)
yield from_folder yield from_folder
except (EnvironmentError, fs.InvalidPath): except (OSError, fs.InvalidPath):
pass pass
#---Public # ---Public
def add_path(self, path): def add_path(self, path):
"""Adds ``path`` to self, if not already there. """Adds ``path`` to self, if not already there.
@@ -133,7 +158,7 @@ class Directories:
raise AlreadyThereError() raise AlreadyThereError()
if not path.exists(): if not path.exists():
raise InvalidPathError() raise InvalidPathError()
self._dirs = [p for p in self._dirs if p not in path] self._dirs = [p for p in self._dirs if path not in p.parents]
self._dirs.append(path) self._dirs.append(path)
@staticmethod @staticmethod
@@ -144,10 +169,10 @@ class Directories:
:rtype: list of Path :rtype: list of Path
""" """
try: try:
subpaths = [p for p in path.listdir() if p.isdir()] subpaths = [p for p in path.glob("*") if p.is_dir()]
subpaths.sort(key=lambda x: x.name.lower()) subpaths.sort(key=lambda x: x.name.lower())
return subpaths return subpaths
except EnvironmentError: except OSError:
return [] return []
def get_files(self, fileclasses=None, j=job.nulljob): def get_files(self, fileclasses=None, j=job.nulljob):
@@ -157,8 +182,12 @@ class Directories:
""" """
if fileclasses is None: if fileclasses is None:
fileclasses = [fs.File] fileclasses = [fs.File]
file_count = 0
for path in self._dirs: for path in self._dirs:
for file in self._get_files(path, fileclasses=fileclasses, j=j): for file in self._get_files(path, fileclasses=fileclasses, j=j):
file_count += 1
if type(j) != job.NullJob:
j.set_progress(-1, tr("Collected {} files to scan").format(file_count))
yield file yield file
def get_folders(self, folderclass=None, j=job.nulljob): def get_folders(self, folderclass=None, j=job.nulljob):
@@ -168,9 +197,13 @@ class Directories:
""" """
if folderclass is None: if folderclass is None:
folderclass = fs.Folder folderclass = fs.Folder
folder_count = 0
for path in self._dirs: for path in self._dirs:
from_folder = folderclass(path) from_folder = folderclass(path)
for folder in self._get_folders(from_folder, j): for folder in self._get_folders(from_folder, j):
folder_count += 1
if type(j) != job.NullJob:
j.set_progress(-1, tr("Collected {} folders to scan").format(folder_count))
yield folder yield folder
def get_state(self, path): def get_state(self, path):
@@ -181,13 +214,16 @@ class Directories:
# direct match? easy result. # direct match? easy result.
if path in self.states: if path in self.states:
return self.states[path] return self.states[path]
-        state = self._default_state_for_path(path) or DirectoryState.Normal
-        prevlen = 0
-        # we loop through the states to find the longest matching prefix
-        for p, s in self.states.items():
-            if p.is_parent_of(path) and len(p) > prevlen:
-                prevlen = len(p)
-                state = s
+        state = self._default_state_for_path(path) or DirectoryState.NORMAL
+        # Save non-default states in cache, necessary for _get_files()
+        if state != DirectoryState.NORMAL:
+            self.states[path] = state
+            return state
+        # find the longest parent path that is in states and return that state if found
+        # NOTE: path.parents is ordered longest to shortest
+        for parent_path in path.parents:
+            if parent_path in self.states:
+                return self.states[parent_path]
         return state
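As a quick illustration of the NOTE above: pathlib orders parents from the deepest directory up to the root, so the first entry found in self.states is the longest matching prefix (made-up POSIX example):

    from pathlib import Path

    list(Path("/home/user/photos/2021").parents)
    # [PosixPath('/home/user/photos'), PosixPath('/home/user'), PosixPath('/home'), PosixPath('/')]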
def has_any_file(self): def has_any_file(self):
@@ -212,21 +248,21 @@ class Directories:
root = ET.parse(infile).getroot() root = ET.parse(infile).getroot()
except Exception: except Exception:
return return
for rdn in root.getiterator('root_directory'): for rdn in root.iter("root_directory"):
attrib = rdn.attrib attrib = rdn.attrib
if 'path' not in attrib: if "path" not in attrib:
continue continue
path = attrib['path'] path = attrib["path"]
try: try:
self.add_path(Path(path)) self.add_path(Path(path))
except (AlreadyThereError, InvalidPathError): except (AlreadyThereError, InvalidPathError):
pass pass
for sn in root.getiterator('state'): for sn in root.iter("state"):
attrib = sn.attrib attrib = sn.attrib
if not ('path' in attrib and 'value' in attrib): if not ("path" in attrib and "value" in attrib):
continue continue
path = attrib['path'] path = attrib["path"]
state = attrib['value'] state = attrib["value"]
self.states[Path(path)] = int(state) self.states[Path(path)] = int(state)
def save_to_file(self, outfile): def save_to_file(self, outfile):
@@ -234,17 +270,17 @@ class Directories:
:param file outfile: path or file pointer to XML file to save to. :param file outfile: path or file pointer to XML file to save to.
""" """
with FileOrPath(outfile, 'wb') as fp: with FileOrPath(outfile, "wb") as fp:
root = ET.Element('directories') root = ET.Element("directories")
for root_path in self: for root_path in self:
root_path_node = ET.SubElement(root, 'root_directory') root_path_node = ET.SubElement(root, "root_directory")
root_path_node.set('path', str(root_path)) root_path_node.set("path", str(root_path))
for path, state in self.states.items(): for path, state in self.states.items():
state_node = ET.SubElement(root, 'state') state_node = ET.SubElement(root, "state")
state_node.set('path', str(path)) state_node.set("path", str(path))
state_node.set('value', str(state)) state_node.set("value", str(state))
tree = ET.ElementTree(root) tree = ET.ElementTree(root)
tree.write(fp, encoding='utf-8') tree.write(fp, encoding="utf-8")
def set_state(self, path, state): def set_state(self, path, state):
"""Set the state of folder at ``path``. """Set the state of folder at ``path``.
@@ -256,7 +292,6 @@ class Directories:
if self.get_state(path) == state: if self.get_state(path) == state:
return return
for iter_path in list(self.states.keys()): for iter_path in list(self.states.keys()):
if path.is_parent_of(iter_path): if path in iter_path.parents:
del self.states[iter_path] del self.states[iter_path]
self.states[path] = state self.states[path] = state


@@ -24,18 +24,33 @@ from hscommon.jobprogress import job
) = range(3) ) = range(3)
JOB_REFRESH_RATE = 100 JOB_REFRESH_RATE = 100
PROGRESS_MESSAGE = tr("%d matches found from %d groups")
 def getwords(s):
     # We decompose the string so that ascii letters with accents can be part of the word.
-    s = normalize('NFD', s)
-    s = multi_replace(s, "-_&+():;\\[]{}.,<>/?~!@#$*", ' ').lower()
-    s = ''.join(c for c in s if c in string.ascii_letters + string.digits + string.whitespace)
-    return [_f for _f in s.split(' ') if _f] # remove empty elements
+    s = normalize("NFD", s)
+    s = multi_replace(s, "-_&+():;\\[]{}.,<>/?~!@#$*", " ").lower()
+    # logging.debug(f"DEBUG chars for: {s}\n"
+    # f"{[c for c in s if ord(c) != 32]}\n"
+    # f"{[ord(c) for c in s if ord(c) != 32]}")
+    # HACK We shouldn't ignore non-ascii characters altogether. Any Unicode char
+    # above common european characters that cannot be "sanitized" (ie. stripped
+    # of their accents, etc.) are preserved as is. The arbitrary limit is
+    # obtained from this one: ord("\u037e") GREEK QUESTION MARK
+    s = "".join(
+        c
+        for c in s
+        if (ord(c) <= 894 and c in string.ascii_letters + string.digits + string.whitespace) or ord(c) > 894
+    )
+    return [_f for _f in s.split(" ") if _f]  # remove empty elements
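For illustration only (these calls are not in the changeset), the net effect of the new getwords() is roughly:

    getwords("Déjà Vu")   # -> ["deja", "vu"]   combining accents (below ord 894) are stripped
    getwords("Αθήνα")     # -> ["αθηνα"]        letters above ord 894 are kept, accents still dropped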
def getfields(s): def getfields(s):
fields = [getwords(field) for field in s.split(' - ')] fields = [getwords(field) for field in s.split(" - ")]
return [_f for _f in fields if _f] return [_f for _f in fields if _f]
def unpack_fields(fields): def unpack_fields(fields):
result = [] result = []
for field in fields: for field in fields:
@@ -45,6 +60,7 @@ def unpack_fields(fields):
result.append(field) result.append(field)
return result return result
def compare(first, second, flags=()): def compare(first, second, flags=()):
"""Returns the % of words that match between ``first`` and ``second`` """Returns the % of words that match between ``first`` and ``second``
@@ -55,11 +71,11 @@ def compare(first, second, flags=()):
return 0 return 0
if any(isinstance(element, list) for element in first): if any(isinstance(element, list) for element in first):
return compare_fields(first, second, flags) return compare_fields(first, second, flags)
second = second[:] #We must use a copy of second because we remove items from it second = second[:] # We must use a copy of second because we remove items from it
match_similar = MATCH_SIMILAR_WORDS in flags match_similar = MATCH_SIMILAR_WORDS in flags
weight_words = WEIGHT_WORDS in flags weight_words = WEIGHT_WORDS in flags
joined = first + second joined = first + second
total_count = (sum(len(word) for word in joined) if weight_words else len(joined)) total_count = sum(len(word) for word in joined) if weight_words else len(joined)
match_count = 0 match_count = 0
in_order = True in_order = True
for word in first: for word in first:
@@ -71,12 +87,13 @@ def compare(first, second, flags=()):
if second[0] != word: if second[0] != word:
in_order = False in_order = False
second.remove(word) second.remove(word)
match_count += (len(word) if weight_words else 1) match_count += len(word) if weight_words else 1
result = round(((match_count * 2) / total_count) * 100) result = round(((match_count * 2) / total_count) * 100)
if (result == 100) and (not in_order): if (result == 100) and (not in_order):
result = 99 # We cannot consider a match exact unless the ordering is the same result = 99 # We cannot consider a match exact unless the ordering is the same
return result return result
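A quick worked example of the scoring formula above, with no flags set: comparing ["foo", "bar"] against ["bar", "foo"], both words of the first list are found in the second, so match_count is 2 and total_count is 4, which gives round((2 * 2 / 4) * 100) = 100; because the words are not in the same order, the result is capped at 99.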
def compare_fields(first, second, flags=()): def compare_fields(first, second, flags=()):
"""Returns the score for the lowest matching :ref:`fields`. """Returns the score for the lowest matching :ref:`fields`.
@@ -87,23 +104,24 @@ def compare_fields(first, second, flags=()):
return 0 return 0
if NO_FIELD_ORDER in flags: if NO_FIELD_ORDER in flags:
results = [] results = []
#We don't want to remove field directly in the list. We must work on a copy. # We don't want to remove field directly in the list. We must work on a copy.
second = second[:] second = second[:]
for field1 in first: for field1 in first:
max = 0 max_score = 0
matched_field = None matched_field = None
for field2 in second: for field2 in second:
r = compare(field1, field2, flags) r = compare(field1, field2, flags)
if r > max: if r > max_score:
max = r max_score = r
matched_field = field2 matched_field = field2
results.append(max) results.append(max_score)
if matched_field: if matched_field:
second.remove(matched_field) second.remove(matched_field)
else: else:
results = [compare(field1, field2, flags) for field1, field2 in zip(first, second)] results = [compare(field1, field2, flags) for field1, field2 in zip(first, second)]
return min(results) if results else 0 return min(results) if results else 0
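For illustration (not taken from the source): with first = [["leonard", "cohen"], ["hallelujah"]] and second = [["hallelujah"], ["leonard", "cohen"]], the strict zip pairing compares mismatched fields and both scores are 0, so compare_fields returns 0; with NO_FIELD_ORDER in flags, each field is paired with its best-scoring counterpart and the result is 100.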
def build_word_dict(objects, j=job.nulljob): def build_word_dict(objects, j=job.nulljob):
"""Returns a dict of objects mapped by their words. """Returns a dict of objects mapped by their words.
@@ -113,11 +131,12 @@ def build_word_dict(objects, j=job.nulljob):
The result will be a dict with words as keys, lists of objects as values. The result will be a dict with words as keys, lists of objects as values.
""" """
result = defaultdict(set) result = defaultdict(set)
for object in j.iter_with_progress(objects, 'Prepared %d/%d files', JOB_REFRESH_RATE): for object in j.iter_with_progress(objects, "Prepared %d/%d files", JOB_REFRESH_RATE):
for word in unpack_fields(object.words): for word in unpack_fields(object.words):
result[word].add(object) result[word].add(object)
return result return result
def merge_similar_words(word_dict): def merge_similar_words(word_dict):
"""Take all keys in ``word_dict`` that are similar, and merge them together. """Take all keys in ``word_dict`` that are similar, and merge them together.
@@ -126,7 +145,7 @@ def merge_similar_words(word_dict):
a word equal to the other. a word equal to the other.
""" """
keys = list(word_dict.keys()) keys = list(word_dict.keys())
keys.sort(key=len)# we want the shortest word to stay keys.sort(key=len) # we want the shortest word to stay
while keys: while keys:
key = keys.pop(0) key = keys.pop(0)
similars = difflib.get_close_matches(key, keys, 100, 0.8) similars = difflib.get_close_matches(key, keys, 100, 0.8)
@@ -138,6 +157,7 @@ def merge_similar_words(word_dict):
del word_dict[similar] del word_dict[similar]
keys.remove(similar) keys.remove(similar)
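A minimal illustration of the merge step above: if word_dict maps "maiden" to {file_a} and "maidens" to {file_b} (file_a and file_b being arbitrary file objects), difflib rates "maidens" well above the 0.8 cutoff against the shorter key "maiden", so its objects are folded into "maiden" and the longer key is removed, leaving {"maiden": {file_a, file_b}}.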
def reduce_common_words(word_dict, threshold): def reduce_common_words(word_dict, threshold):
"""Remove all objects from ``word_dict`` values where the object count >= ``threshold`` """Remove all objects from ``word_dict`` values where the object count >= ``threshold``
@@ -146,7 +166,7 @@ def reduce_common_words(word_dict, threshold):
The exception to this removal are the objects where all the words of the object are common. The exception to this removal are the objects where all the words of the object are common.
Because if we remove them, we will miss some duplicates! Because if we remove them, we will miss some duplicates!
""" """
uncommon_words = set(word for word, objects in word_dict.items() if len(objects) < threshold) uncommon_words = {word for word, objects in word_dict.items() if len(objects) < threshold}
for word, objects in list(word_dict.items()): for word, objects in list(word_dict.items()):
if len(objects) < threshold: if len(objects) < threshold:
continue continue
@@ -159,11 +179,13 @@ def reduce_common_words(word_dict, threshold):
else: else:
del word_dict[word] del word_dict[word]
# Writing docstrings in a namedtuple is tricky. From Python 3.3, it's possible to set __doc__, but # Writing docstrings in a namedtuple is tricky. From Python 3.3, it's possible to set __doc__, but
# some research allowed me to find a more elegant solution, which is what is done here. See # some research allowed me to find a more elegant solution, which is what is done here. See
# http://stackoverflow.com/questions/1606436/adding-docstrings-to-namedtuples-in-python # http://stackoverflow.com/questions/1606436/adding-docstrings-to-namedtuples-in-python
class Match(namedtuple('Match', 'first second percentage')):
class Match(namedtuple("Match", "first second percentage")):
"""Represents a match between two :class:`~core.fs.File`. """Represents a match between two :class:`~core.fs.File`.
Regardless of the matching method, when two files are determined to match, a Match pair is created,
@@ -182,16 +204,24 @@ class Match(namedtuple('Match', 'first second percentage')):
their match level according to the scan method which found the match. int from 1 to 100. For their match level according to the scan method which found the match. int from 1 to 100. For
exact scan methods, such as Contents scans, this will always be 100. exact scan methods, such as Contents scans, this will always be 100.
""" """
__slots__ = () __slots__ = ()
def get_match(first, second, flags=()): def get_match(first, second, flags=()):
#it is assumed here that first and second both have a "words" attribute # it is assumed here that first and second both have a "words" attribute
percentage = compare(first.words, second.words, flags) percentage = compare(first.words, second.words, flags)
return Match(first, second, percentage) return Match(first, second, percentage)
def getmatches( def getmatches(
objects, min_match_percentage=0, match_similar_words=False, weight_words=False, objects,
no_field_order=False, j=job.nulljob): min_match_percentage=0,
match_similar_words=False,
weight_words=False,
no_field_order=False,
j=job.nulljob,
):
"""Returns a list of :class:`Match` within ``objects`` after fuzzily matching their words. """Returns a list of :class:`Match` within ``objects`` after fuzzily matching their words.
:param objects: List of :class:`~core.fs.File` to match. :param objects: List of :class:`~core.fs.File` to match.
@@ -206,7 +236,7 @@ def getmatches(
j = j.start_subjob(2) j = j.start_subjob(2)
sj = j.start_subjob(2) sj = j.start_subjob(2)
for o in objects: for o in objects:
if not hasattr(o, 'words'): if not hasattr(o, "words"):
o.words = getwords(o.name) o.words = getwords(o.name)
word_dict = build_word_dict(objects, sj) word_dict = build_word_dict(objects, sj)
reduce_common_words(word_dict, COMMON_WORD_THRESHOLD) reduce_common_words(word_dict, COMMON_WORD_THRESHOLD)
@@ -219,10 +249,11 @@ def getmatches(
match_flags.append(MATCH_SIMILAR_WORDS) match_flags.append(MATCH_SIMILAR_WORDS)
if no_field_order: if no_field_order:
match_flags.append(NO_FIELD_ORDER) match_flags.append(NO_FIELD_ORDER)
j.start_job(len(word_dict), tr("0 matches found")) j.start_job(len(word_dict), PROGRESS_MESSAGE % (0, 0))
compared = defaultdict(set) compared = defaultdict(set)
result = [] result = []
try: try:
word_count = 0
# This whole 'popping' thing is there to avoid taking too much memory at the same time. # This whole 'popping' thing is there to avoid taking too much memory at the same time.
while word_dict: while word_dict:
items = word_dict.popitem()[1] items = word_dict.popitem()[1]
@@ -237,39 +268,54 @@ def getmatches(
result.append(m) result.append(m)
if len(result) >= LIMIT: if len(result) >= LIMIT:
return result return result
j.add_progress(desc=tr("%d matches found") % len(result)) word_count += 1
j.add_progress(desc=PROGRESS_MESSAGE % (len(result), word_count))
except MemoryError: except MemoryError:
# This is the place where the memory usage is at its peak during the scan. # This is the place where the memory usage is at its peak during the scan.
# Just continue the process with an incomplete list of matches. # Just continue the process with an incomplete list of matches.
del compared # This should give us enough room to call logging. del compared # This should give us enough room to call logging.
logging.warning('Memory Overflow. Matches: %d. Word dict: %d' % (len(result), len(word_dict))) logging.warning("Memory Overflow. Matches: %d. Word dict: %d" % (len(result), len(word_dict)))
return result return result
return result return result
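A small usage sketch of the fuzzy matcher above; NamedObject is an invented stand-in, since getmatches only needs objects exposing a name attribute (or precomputed words):

from core.engine import getmatches

class NamedObject:
    def __init__(self, name):
        self.name = name

songs = [NamedObject("Air - Sexy Boy"), NamedObject("Air - Sexy Boy (remix)"), NamedObject("Unrelated")]
for match in getmatches(songs, min_match_percentage=50):
    # 3 shared words out of 7 in total: round(3 * 2 / 7 * 100) == 86, above the 50% threshold
    print(match.first.name, "<->", match.second.name, match.percentage)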
def getmatches_by_contents(files, j=job.nulljob):
def getmatches_by_contents(files, bigsize=0, j=job.nulljob):
"""Returns a list of :class:`Match` within ``files`` if their contents is the same. """Returns a list of :class:`Match` within ``files`` if their contents is the same.
:param bigsize: The size in bytes over which we consider files big enough to
justify taking samples of the file for hashing. If 0, compute digest as usual.
:param j: A :ref:`job progress instance <jobs>`. :param j: A :ref:`job progress instance <jobs>`.
""" """
size2files = defaultdict(set) size2files = defaultdict(set)
for f in files: for f in files:
if f.size: size2files[f.size].add(f)
size2files[f.size].add(f)
del files del files
possible_matches = [files for files in size2files.values() if len(files) > 1] possible_matches = [files for files in size2files.values() if len(files) > 1]
del size2files del size2files
result = [] result = []
j.start_job(len(possible_matches), tr("0 matches found")) j.start_job(len(possible_matches), PROGRESS_MESSAGE % (0, 0))
group_count = 0
for group in possible_matches: for group in possible_matches:
for first, second in itertools.combinations(group, 2): for first, second in itertools.combinations(group, 2):
if first.is_ref and second.is_ref: if first.is_ref and second.is_ref:
continue # Don't spend time comparing two ref pics together. continue # Don't spend time comparing two ref pics together.
if first.md5partial == second.md5partial: if first.size == 0 and second.size == 0:
if first.md5 == second.md5: # skip hashing for zero length files
result.append(Match(first, second, 100)) result.append(Match(first, second, 100))
j.add_progress(desc=tr("%d matches found") % len(result)) continue
# if digests are the same (and not None) then files match
if first.digest_partial == second.digest_partial and first.digest_partial is not None:
if bigsize > 0 and first.size > bigsize:
if first.digest_samples == second.digest_samples and first.digest_samples is not None:
result.append(Match(first, second, 100))
else:
if first.digest == second.digest and first.digest is not None:
result.append(Match(first, second, 100))
group_count += 1
j.add_progress(desc=PROGRESS_MESSAGE % (len(result), group_count))
return result return result
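A hedged usage sketch of the content scan above. The folder path and the 100 MiB bigsize threshold are made up for illustration, and is_ref is normally assigned by core.directories rather than by hand:

from pathlib import Path
from core import fs
from core.engine import getmatches_by_contents

fs.filesdb.connect(":memory:")               # the real app points this at a cache file in appdata
files = fs.get_files(Path("/some/folder"))   # hypothetical folder to scan
for f in files:
    f.is_ref = False                         # normally set while collecting files from Directories
matches = getmatches_by_contents(files, bigsize=100 * 1024 * 1024)
for m in matches:
    print(m.first.path, "<->", m.second.path)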
class Group: class Group:
"""A group of :class:`~core.fs.File` that match together. """A group of :class:`~core.fs.File` that match together.
@@ -297,7 +343,8 @@ class Group:
Average match percentage of match pairs containing :attr:`ref`. Average match percentage of match pairs containing :attr:`ref`.
""" """
#---Override
# ---Override
def __init__(self): def __init__(self):
self._clear() self._clear()
@@ -313,7 +360,7 @@ class Group:
def __len__(self): def __len__(self):
return len(self.ordered) return len(self.ordered)
#---Private # ---Private
def _clear(self): def _clear(self):
self._percentage = None self._percentage = None
self._matches_for_ref = None self._matches_for_ref = None
@@ -328,7 +375,7 @@ class Group:
self._matches_for_ref = [match for match in self.matches if ref in match] self._matches_for_ref = [match for match in self.matches if ref in match]
return self._matches_for_ref return self._matches_for_ref
#---Public # ---Public
def add_match(self, match): def add_match(self, match):
"""Adds ``match`` to internal match list and possibly add duplicates to the group. """Adds ``match`` to internal match list and possibly add duplicates to the group.
@@ -339,6 +386,7 @@ class Group:
:param tuple match: pair of :class:`~core.fs.File` to add :param tuple match: pair of :class:`~core.fs.File` to add
""" """
def add_candidate(item, match): def add_candidate(item, match):
matches = self.candidates[item] matches = self.candidates[item]
matches.add(match) matches.add(match)
@@ -362,14 +410,13 @@ class Group:
You can call this after the duplicate scanning process to free a bit of memory. You can call this after the duplicate scanning process to free a bit of memory.
""" """
discarded = set(m for m in self.matches if not all(obj in self.unordered for obj in [m.first, m.second])) discarded = {m for m in self.matches if not all(obj in self.unordered for obj in [m.first, m.second])}
self.matches -= discarded self.matches -= discarded
self.candidates = defaultdict(set) self.candidates = defaultdict(set)
return discarded return discarded
def get_match_of(self, item): def get_match_of(self, item):
"""Returns the match pair between ``item`` and :attr:`ref`. """Returns the match pair between ``item`` and :attr:`ref`."""
"""
if item is self.ref: if item is self.ref:
return return
for m in self._get_matches_for_ref(): for m in self._get_matches_for_ref():
@@ -385,8 +432,7 @@ class Group:
""" """
# tie_breaker(ref, dupe) --> True if dupe should be ref # tie_breaker(ref, dupe) --> True if dupe should be ref
# Returns True if anything changed during prioritization. # Returns True if anything changed during prioritization.
master_key_func = lambda x: (-x.is_ref, key_func(x)) new_order = sorted(self.ordered, key=lambda x: (-x.is_ref, key_func(x)))
new_order = sorted(self.ordered, key=master_key_func)
changed = new_order != self.ordered changed = new_order != self.ordered
self.ordered = new_order self.ordered = new_order
if tie_breaker is None: if tie_breaker is None:
@@ -409,17 +455,16 @@ class Group:
self.unordered.remove(item) self.unordered.remove(item)
self._percentage = None self._percentage = None
self._matches_for_ref = None self._matches_for_ref = None
if (len(self) > 1) and any(not getattr(item, 'is_ref', False) for item in self): if (len(self) > 1) and any(not getattr(item, "is_ref", False) for item in self):
if discard_matches: if discard_matches:
self.matches = set(m for m in self.matches if item not in m) self.matches = {m for m in self.matches if item not in m}
else: else:
self._clear() self._clear()
except ValueError: except ValueError:
pass pass
def switch_ref(self, with_dupe): def switch_ref(self, with_dupe):
"""Make the :attr:`ref` dupe of the group switch position with ``with_dupe``. """Make the :attr:`ref` dupe of the group switch position with ``with_dupe``."""
"""
if self.ref.is_ref: if self.ref.is_ref:
return False return False
try: try:
@@ -485,7 +530,7 @@ def get_groups(matches):
del dupe2group del dupe2group
del matches del matches
# should free enough memory to continue # should free enough memory to continue
logging.warning('Memory Overflow. Groups: {0}'.format(len(groups))) logging.warning(f"Memory Overflow. Groups: {len(groups)}")
# Now that we have a group, we have to discard groups' matches and see if there're any "orphan" # Now that we have a group, we have to discard groups' matches and see if there're any "orphan"
# matches, that is, matches that were candidate in a group but that none of their 2 files were # matches, that is, matches that were candidate in a group but that none of their 2 files were
# accepted in the group. With these orphan groups, it's safe to build additional groups # accepted in the group. With these orphan groups, it's safe to build additional groups
@@ -493,9 +538,8 @@ def get_groups(matches):
orphan_matches = [] orphan_matches = []
for group in groups: for group in groups:
orphan_matches += { orphan_matches += {
m for m in group.discard_matches() m for m in group.discard_matches() if not any(obj in matched_files for obj in [m.first, m.second])
if not any(obj in matched_files for obj in [m.first, m.second])
} }
if groups and orphan_matches: if groups and orphan_matches:
groups += get_groups(orphan_matches) # no job, as it isn't supposed to take a long time groups += get_groups(orphan_matches) # no job, as it isn't supposed to take a long time
return groups return groups

core/exclude.py Normal file

@@ -0,0 +1,513 @@
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
from core.markable import Markable
from xml.etree import ElementTree as ET
# TODO: perhaps use regex module for better Unicode support? https://pypi.org/project/regex/
# also https://pypi.org/project/re2/
# TODO update the Result list with newly added regexes if possible
import re
from os import sep
import logging
import functools
from hscommon.util import FileOrPath
from hscommon.plat import ISWINDOWS
import time
default_regexes = [
r"^thumbs\.db$", # Obsolete after WindowsXP
r"^desktop\.ini$", # Windows metadata
r"^\.DS_Store$", # MacOS metadata
r"^\.Trash\-.*", # Linux trash directories
r"^\$Recycle\.Bin$", # Windows
r"^\..*", # Hidden files on Unix-like
]
# These are too broad
forbidden_regexes = [r".*", r"\/.*", r".*\/.*", r".*\\\\.*", r".*\..*"]
def timer(func):
@functools.wraps(func)
def wrapper_timer(*args):
start = time.perf_counter_ns()
value = func(*args)
end = time.perf_counter_ns()
print(f"DEBUG: func {func.__name__!r} took {end - start} ns.")
return value
return wrapper_timer
def memoize(func):
func.cache = dict()
@functools.wraps(func)
def _memoize(*args):
if args not in func.cache:
func.cache[args] = func(*args)
return func.cache[args]
return _memoize
class AlreadyThereException(Exception):
"""Expression already in the list"""
def __init__(self, arg="Expression is already in excluded list."):
super().__init__(arg)
class ExcludeList(Markable):
"""A list of lists holding regular expression strings and the compiled re.Pattern"""
# Used to filter out directories and files that we would rather avoid scanning.
# The list() class allows us to preserve item order without too much hassle.
# The downside is we have to compare strings every time we look for an item in the list
# since we use regex strings as keys.
# If _use_union is True, the compiled regexes will be combined into one single
# Pattern instead of separate Patterns which may or may not give better
# performance compared to looping through each Pattern individually.
# ---Override
def __init__(self, union_regex=True):
Markable.__init__(self)
self._use_union = union_regex
# list([str regex, bool iscompilable, re.error exception, Pattern compiled], ...)
self._excluded = []
self._excluded_compiled = set()
self._dirty = True
def __iter__(self):
"""Iterate in order."""
for item in self._excluded:
regex = item[0]
yield self.is_marked(regex), regex
def __contains__(self, item):
return self.has_entry(item)
def __len__(self):
"""Returns the total number of regexes regardless of mark status."""
return len(self._excluded)
def __getitem__(self, key):
"""Returns the list item corresponding to key."""
for item in self._excluded:
if item[0] == key:
return item
raise KeyError(f"Key {key} is not in exclusion list.")
def __setitem__(self, key, value):
# TODO if necessary
pass
def __delitem__(self, key):
# TODO if necessary
pass
def get_compiled(self, key):
"""Returns the (precompiled) Pattern for key"""
return self.__getitem__(key)[3]
def is_markable(self, regex):
return self._is_markable(regex)
def _is_markable(self, regex):
"""Return the cached result of "compilable" property"""
for item in self._excluded:
if item[0] == regex:
return item[1]
return False # should not be necessary, the regex SHOULD be in there
def _did_mark(self, regex):
self._add_compiled(regex)
def _did_unmark(self, regex):
self._remove_compiled(regex)
def _add_compiled(self, regex):
self._dirty = True
if self._use_union:
return
for item in self._excluded:
# FIXME probably faster to just rebuild the set from the compiled instead of comparing strings
if item[0] == regex:
# no need to test if already present since it's a set()
self._excluded_compiled.add(item[3])
break
def _remove_compiled(self, regex):
self._dirty = True
if self._use_union:
return
for item in self._excluded_compiled:
if regex in item.pattern:
self._excluded_compiled.remove(item)
break
# @timer
@memoize
def _do_compile(self, expr):
return re.compile(expr)
# @timer
# @memoize # probably not worth memoizing this one if we memoize the above
def compile_re(self, regex):
compiled = None
try:
compiled = self._do_compile(regex)
except Exception as e:
return False, e, compiled
return True, None, compiled
def error(self, regex):
"""Return the compilation error Exception for regex.
It should have a "msg" attr."""
for item in self._excluded:
if item[0] == regex:
return item[2]
def build_compiled_caches(self, union=False):
if not union:
self._cached_compiled_files = [x for x in self._excluded_compiled if not has_sep(x.pattern)]
self._cached_compiled_paths = [x for x in self._excluded_compiled if has_sep(x.pattern)]
self._dirty = False
return
marked_count = [x for marked, x in self if marked]
# If there is no item, the compiled Pattern will be '' and match everything!
if not marked_count:
self._cached_compiled_union_all = []
self._cached_compiled_union_files = []
self._cached_compiled_union_paths = []
else:
# HACK returned as a tuple to get a free iterator and keep interface
# the same regardless of whether the client asked for union or not
self._cached_compiled_union_all = (re.compile("|".join(marked_count)),)
files_marked = [x for x in marked_count if not has_sep(x)]
if not files_marked:
self._cached_compiled_union_files = tuple()
else:
self._cached_compiled_union_files = (re.compile("|".join(files_marked)),)
paths_marked = [x for x in marked_count if has_sep(x)]
if not paths_marked:
self._cached_compiled_union_paths = tuple()
else:
self._cached_compiled_union_paths = (re.compile("|".join(paths_marked)),)
self._dirty = False
@property
def compiled(self):
"""Should be used by other classes to retrieve the up-to-date list of patterns."""
if self._use_union:
if self._dirty:
self.build_compiled_caches(self._use_union)
return self._cached_compiled_union_all
return self._excluded_compiled
@property
def compiled_files(self):
"""When matching against filenames only, we probably won't be seeing any
directory separator, so we filter out regexes with os.sep in them.
The interface should be expected to be a generator, even if it returns only
one item (one Pattern in the union case)."""
if self._dirty:
self.build_compiled_caches(self._use_union)
return self._cached_compiled_union_files if self._use_union else self._cached_compiled_files
@property
def compiled_paths(self):
"""Returns patterns with only separators in them, for more precise filtering."""
if self._dirty:
self.build_compiled_caches(self._use_union)
return self._cached_compiled_union_paths if self._use_union else self._cached_compiled_paths
# ---Public
def add(self, regex, forced=False):
"""This interface should throw exceptions if there is an error during
regex compilation"""
if self.has_entry(regex):
# This exception should never be ignored
raise AlreadyThereException()
if regex in forbidden_regexes:
raise ValueError("Forbidden (dangerous) expression.")
iscompilable, exception, compiled = self.compile_re(regex)
if not iscompilable and not forced:
# This exception can be ignored, but taken into account
# to avoid adding to compiled set
raise exception
else:
self._do_add(regex, iscompilable, exception, compiled)
def _do_add(self, regex, iscompilable, exception, compiled):
# We need to insert at the top
self._excluded.insert(0, [regex, iscompilable, exception, compiled])
@property
def marked_count(self):
"""Returns the number of marked regexes only."""
return len([x for marked, x in self if marked])
def has_entry(self, regex):
for item in self._excluded:
if regex == item[0]:
return True
return False
def is_excluded(self, dirname, filename):
"""Return True if the file or the absolute path to file is supposed to be
filtered out, False otherwise."""
matched = False
for expr in self.compiled_files:
if expr.fullmatch(filename):
matched = True
break
if not matched:
for expr in self.compiled_paths:
if expr.fullmatch(dirname + sep + filename):
matched = True
break
return matched
def remove(self, regex):
for item in self._excluded:
if item[0] == regex:
self._excluded.remove(item)
self._remove_compiled(regex)
def rename(self, regex, newregex):
if regex == newregex:
return
found = False
was_marked = False
is_compilable = False
for item in self._excluded:
if item[0] == regex:
found = True
was_marked = self.is_marked(regex)
is_compilable, exception, compiled = self.compile_re(newregex)
# We overwrite the found entry
self._excluded[self._excluded.index(item)] = [newregex, is_compilable, exception, compiled]
self._remove_compiled(regex)
break
if not found:
return
if is_compilable:
self._add_compiled(newregex)
if was_marked:
# Not marked by default when added, add it back
self.mark(newregex)
# def change_index(self, regex, new_index):
# """Internal list must be a list, not dict."""
# item = self._excluded.pop(regex)
# self._excluded.insert(new_index, item)
def restore_defaults(self):
for _, regex in self:
if regex not in default_regexes:
self.unmark(regex)
for default_regex in default_regexes:
if not self.has_entry(default_regex):
self.add(default_regex)
self.mark(default_regex)
def load_from_xml(self, infile):
"""Loads the ignore list from a XML created with save_to_xml.
infile can be a file object or a filename.
"""
try:
root = ET.parse(infile).getroot()
except Exception as e:
logging.warning(f"Error while loading {infile}: {e}")
self.restore_defaults()
return e
marked = set()
exclude_elems = (e for e in root if e.tag == "exclude")
for exclude_item in exclude_elems:
regex_string = exclude_item.get("regex")
if not regex_string:
continue
try:
# "forced" avoids compilation exceptions and adds anyway
self.add(regex_string, forced=True)
except AlreadyThereException:
logging.error(
f'Regex "{regex_string}" \
loaded from XML was already present in the list.'
)
continue
if exclude_item.get("marked") == "y":
marked.add(regex_string)
for item in marked:
self.mark(item)
def save_to_xml(self, outfile):
"""Create a XML file that can be used by load_from_xml.
outfile can be a file object or a filename."""
root = ET.Element("exclude_list")
# reversed in order to keep order of entries when reloading from xml later
for item in reversed(self._excluded):
exclude_node = ET.SubElement(root, "exclude")
exclude_node.set("regex", str(item[0]))
exclude_node.set("marked", ("y" if self.is_marked(item[0]) else "n"))
tree = ET.ElementTree(root)
with FileOrPath(outfile, "wb") as fp:
tree.write(fp, encoding="utf-8")
class ExcludeDict(ExcludeList):
"""Exclusion list holding a set of regular expressions as keys, the compiled
Pattern, compilation error and compilable boolean as values."""
# Implementation around a dictionary instead of a list, which implies
# to keep the index of each string-key as its sub-element and keep it updated
# whenever insert/remove is done.
def __init__(self, union_regex=False):
Markable.__init__(self)
self._use_union = union_regex
# { "regex string":
# {
# "index": int,
# "compilable": bool,
# "error": str,
# "compiled": Pattern or None
# }
# }
self._excluded = {}
self._excluded_compiled = set()
self._dirty = True
def __iter__(self):
"""Iterate in order."""
for regex in ordered_keys(self._excluded):
yield self.is_marked(regex), regex
def __getitem__(self, key):
"""Returns the dict item correponding to key"""
return self._excluded.__getitem__(key)
def get_compiled(self, key):
"""Returns the compiled item for key"""
return self.__getitem__(key).get("compiled")
def is_markable(self, regex):
return self._is_markable(regex)
def _is_markable(self, regex):
"""Return the cached result of "compilable" property"""
exists = self._excluded.get(regex)
if exists:
return exists.get("compilable")
return False
def _add_compiled(self, regex):
self._dirty = True
if self._use_union:
return
try:
self._excluded_compiled.add(self._excluded.get(regex).get("compiled"))
except Exception as e:
logging.error(f"Exception while adding regex {regex} to compiled set: {e}")
return
def is_compilable(self, regex):
"""Returns the cached "compilable" value"""
return self._excluded[regex]["compilable"]
def error(self, regex):
"""Return the compilation error message for regex string"""
return self._excluded.get(regex).get("error")
# ---Public
def _do_add(self, regex, iscompilable, exception, compiled):
# We always insert at the top, so index should be 0
# and other indices should be pushed by one
for value in self._excluded.values():
value["index"] += 1
self._excluded[regex] = {"index": 0, "compilable": iscompilable, "error": exception, "compiled": compiled}
def has_entry(self, regex):
if regex in self._excluded.keys():
return True
return False
def remove(self, regex):
old_value = self._excluded.pop(regex)
# Bring down all indices which were above it
index = old_value["index"]
if index == len(self._excluded) - 1: # we start at 0...
# Old index was at the end, no need to update other indices
self._remove_compiled(regex)
return
for value in self._excluded.values():
if value.get("index") > old_value["index"]:
value["index"] -= 1
self._remove_compiled(regex)
def rename(self, regex, newregex):
if regex == newregex or regex not in self._excluded.keys():
return
was_marked = self.is_marked(regex)
previous = self._excluded.pop(regex)
iscompilable, error, compiled = self.compile_re(newregex)
self._excluded[newregex] = {
"index": previous.get("index"),
"compilable": iscompilable,
"error": error,
"compiled": compiled,
}
self._remove_compiled(regex)
if iscompilable:
self._add_compiled(newregex)
if was_marked:
self.mark(newregex)
def save_to_xml(self, outfile):
"""Create a XML file that can be used by load_from_xml.
outfile can be a file object or a filename.
"""
root = ET.Element("exclude_list")
# reversed in order to keep order of entries when reloading from xml later
reversed_list = []
for key in ordered_keys(self._excluded):
reversed_list.append(key)
for item in reversed(reversed_list):
exclude_node = ET.SubElement(root, "exclude")
exclude_node.set("regex", str(item))
exclude_node.set("marked", ("y" if self.is_marked(item) else "n"))
tree = ET.ElementTree(root)
with FileOrPath(outfile, "wb") as fp:
tree.write(fp, encoding="utf-8")
def ordered_keys(_dict):
"""Returns an iterator over the keys of dictionary sorted by "index" key"""
if not len(_dict):
return
list_of_items = []
for item in _dict.items():
list_of_items.append(item)
list_of_items.sort(key=lambda x: x[1].get("index"))
for item in list_of_items:
yield item[0]
if ISWINDOWS:
def has_sep(regexp):
return "\\" + sep in regexp
else:
def has_sep(regexp):
return sep in regexp
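A brief usage sketch of the exclusion list above; the regex and paths are invented for illustration:

exclude_list = ExcludeList(union_regex=True)
exclude_list.restore_defaults()              # adds and marks the default_regexes
exclude_list.add(r"^.*\.bak$")               # raises AlreadyThereException / ValueError when refused
exclude_list.mark(r"^.*\.bak$")              # only marked expressions take part in filtering
print(exclude_list.is_excluded("/home/user/docs", "notes.bak"))  # True
print(exclude_list.is_excluded("/home/user/docs", "notes.txt"))  # False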


@@ -114,36 +114,38 @@ ROW_TEMPLATE = """
CELL_TEMPLATE = """<td>{value}</td>""" CELL_TEMPLATE = """<td>{value}</td>"""
def export_to_xhtml(colnames, rows): def export_to_xhtml(colnames, rows):
# a row is a list of values with the first value being a flag indicating if the row should be indented # a row is a list of values with the first value being a flag indicating if the row should be indented
if rows: if rows:
assert len(rows[0]) == len(colnames) + 1 # + 1 is for the "indented" flag assert len(rows[0]) == len(colnames) + 1 # + 1 is for the "indented" flag
colheaders = ''.join(COLHEADERS_TEMPLATE.format(name=name) for name in colnames) colheaders = "".join(COLHEADERS_TEMPLATE.format(name=name) for name in colnames)
rendered_rows = [] rendered_rows = []
previous_group_id = None previous_group_id = None
for row in rows: for row in rows:
# [2:] is to remove the indented flag + filename # [2:] is to remove the indented flag + filename
if row[0] != previous_group_id: if row[0] != previous_group_id:
# We've just changed dupe group, which means that this dupe is a ref. We don't indent it. # We've just changed dupe group, which means that this dupe is a ref. We don't indent it.
indented = '' indented = ""
else: else:
indented = 'indented' indented = "indented"
filename = row[1] filename = row[1]
cells = ''.join(CELL_TEMPLATE.format(value=value) for value in row[2:]) cells = "".join(CELL_TEMPLATE.format(value=value) for value in row[2:])
rendered_rows.append(ROW_TEMPLATE.format(indented=indented, filename=filename, cells=cells)) rendered_rows.append(ROW_TEMPLATE.format(indented=indented, filename=filename, cells=cells))
previous_group_id = row[0] previous_group_id = row[0]
rendered_rows = ''.join(rendered_rows) rendered_rows = "".join(rendered_rows)
# The main template can't use format because the css code uses {} # The main template can't use format because the css code uses {}
content = MAIN_TEMPLATE.replace('$colheaders', colheaders).replace('$rows', rendered_rows) content = MAIN_TEMPLATE.replace("$colheaders", colheaders).replace("$rows", rendered_rows)
folder = mkdtemp() folder = mkdtemp()
destpath = op.join(folder, 'export.htm') destpath = op.join(folder, "export.htm")
fp = open(destpath, 'wt', encoding='utf-8') fp = open(destpath, "wt", encoding="utf-8")
fp.write(content) fp.write(content)
fp.close() fp.close()
return destpath return destpath
def export_to_csv(dest, colnames, rows): def export_to_csv(dest, colnames, rows):
writer = csv.writer(open(dest, 'wt', encoding='utf-8')) writer = csv.writer(open(dest, "wt", encoding="utf-8"))
writer.writerow(["Group ID"] + colnames) writer.writerow(["Group ID"] + colnames)
for row in rows: for row in rows:
writer.writerow(row) writer.writerow(row)
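A short example of driving the two exporters above; the column names and row values are invented, and each row is [group id, filename, remaining column values] so that len(row) == len(colnames) + 1 as the assertion requires:

colnames = ["Filename", "Size (KB)"]
rows = [
    [0, "photo.jpg", "2048"],        # group 0, reference row (not indented in the HTML output)
    [0, "photo_copy.jpg", "2048"],   # same group id, rendered indented
    [1, "song.mp3", "5120"],
]
html_path = export_to_xhtml(colnames, rows)      # returns the path of export.htm in a temp dir
export_to_csv("/tmp/dupes.csv", colnames, rows)  # writes a "Group ID" header plus the raw rows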


@@ -11,25 +11,50 @@
# resulting needless complexity and memory usage. It's been a while since I wanted to do that fork, # resulting needless complexity and memory usage. It's been a while since I wanted to do that fork,
# and I'm doing it now. # and I'm doing it now.
import hashlib import os
import logging
from math import floor
import logging
import sqlite3
from threading import Lock
from typing import Any, AnyStr, Union, Callable
from pathlib import Path
from hscommon.util import nonone, get_file_ext from hscommon.util import nonone, get_file_ext
hasher: Callable
try:
import xxhash
hasher = xxhash.xxh128
except ImportError:
import hashlib
hasher = hashlib.md5
__all__ = [ __all__ = [
'File', "File",
'Folder', "Folder",
'get_file', "get_file",
'get_files', "get_files",
'FSError', "FSError",
'AlreadyExistsError', "AlreadyExistsError",
'InvalidPath', "InvalidPath",
'InvalidDestinationError', "InvalidDestinationError",
'OperationError', "OperationError",
] ]
NOT_SET = object() NOT_SET = object()
# The goal here is to not run out of memory on really big files. However, the chunk
# size has to be large enough so that the python loop isn't too costly in terms of
# CPU.
CHUNK_SIZE = 1024 * 1024 # 1 MiB
# Minimum size below which partial hashing is not used
MIN_FILE_SIZE = 3 * CHUNK_SIZE # 3MiB, because we take 3 samples
class FSError(Exception): class FSError(Exception):
cls_message = "An error has occured on '{name}' in '{parent}'" cls_message = "An error has occured on '{name}' in '{parent}'"
@@ -40,8 +65,8 @@ class FSError(Exception):
elif isinstance(fsobject, File): elif isinstance(fsobject, File):
name = fsobject.name name = fsobject.name
else: else:
name = '' name = ""
parentname = str(parent) if parent is not None else '' parentname = str(parent) if parent is not None else ""
Exception.__init__(self, message.format(name=name, parent=parentname)) Exception.__init__(self, message.format(name=name, parent=parentname))
@@ -49,40 +74,137 @@ class AlreadyExistsError(FSError):
"The directory or file name we're trying to add already exists" "The directory or file name we're trying to add already exists"
cls_message = "'{name}' already exists in '{parent}'" cls_message = "'{name}' already exists in '{parent}'"
class InvalidPath(FSError): class InvalidPath(FSError):
"The path of self is invalid, and cannot be worked with." "The path of self is invalid, and cannot be worked with."
cls_message = "'{name}' is invalid." cls_message = "'{name}' is invalid."
class InvalidDestinationError(FSError): class InvalidDestinationError(FSError):
"""A copy/move operation has been called, but the destination is invalid.""" """A copy/move operation has been called, but the destination is invalid."""
cls_message = "'{name}' is an invalid destination for this operation." cls_message = "'{name}' is an invalid destination for this operation."
class OperationError(FSError): class OperationError(FSError):
"""A copy/move/delete operation has been called, but the checkup after the """A copy/move/delete operation has been called, but the checkup after the
operation shows that it didn't work.""" operation shows that it didn't work."""
cls_message = "Operation on '{name}' failed." cls_message = "Operation on '{name}' failed."
class File:
"""Represents a file and holds metadata to be used for scanning. class FilesDB:
schema_version = 1
schema_version_description = "Changed from md5 to xxhash if available."
create_table_query = "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, entry_dt DATETIME, digest BLOB, digest_partial BLOB, digest_samples BLOB)"
drop_table_query = "DROP TABLE IF EXISTS files;"
select_query = "SELECT {key} FROM files WHERE path=:path AND size=:size and mtime_ns=:mtime_ns"
insert_query = """
INSERT INTO files (path, size, mtime_ns, entry_dt, {key}) VALUES (:path, :size, :mtime_ns, datetime('now'), :value)
ON CONFLICT(path) DO UPDATE SET size=:size, mtime_ns=:mtime_ns, entry_dt=datetime('now'), {key}=:value;
""" """
INITIAL_INFO = {
'size': 0, def __init__(self):
'mtime': 0, self.conn = None
'md5': '', self.cur = None
'md5partial': '', self.lock = None
}
def connect(self, path: Union[AnyStr, os.PathLike]) -> None:
self.conn = sqlite3.connect(path, check_same_thread=False)
self.cur = self.conn.cursor()
self.lock = Lock()
self._check_upgrade()
def _check_upgrade(self) -> None:
with self.lock:
has_schema = self.cur.execute(
"SELECT NAME FROM sqlite_master WHERE type='table' AND name='schema_version'"
).fetchall()
version = None
if has_schema:
version = self.cur.execute("SELECT version FROM schema_version ORDER BY version DESC").fetchone()[0]
else:
self.cur.execute("CREATE TABLE schema_version (version int PRIMARY KEY, description TEXT)")
if version != self.schema_version:
self.cur.execute(self.drop_table_query)
self.cur.execute(
"INSERT OR REPLACE INTO schema_version VALUES (:version, :description)",
{"version": self.schema_version, "description": self.schema_version_description},
)
self.cur.execute(self.create_table_query)
self.conn.commit()
def clear(self) -> None:
with self.lock:
self.cur.execute(self.drop_table_query)
self.cur.execute(self.create_table_query)
def get(self, path: Path, key: str) -> Union[bytes, None]:
stat = path.stat()
size = stat.st_size
mtime_ns = stat.st_mtime_ns
try:
with self.lock:
self.cur.execute(
self.select_query.format(key=key), {"path": str(path), "size": size, "mtime_ns": mtime_ns}
)
result = self.cur.fetchone()
if result:
return result[0]
except Exception as ex:
logging.warning(f"Couldn't get {key} for {path} w/{size}, {mtime_ns}: {ex}")
return None
def put(self, path: Path, key: str, value: Any) -> None:
stat = path.stat()
size = stat.st_size
mtime_ns = stat.st_mtime_ns
try:
with self.lock:
self.cur.execute(
self.insert_query.format(key=key),
{"path": str(path), "size": size, "mtime_ns": mtime_ns, "value": value},
)
except Exception as ex:
logging.warning(f"Couldn't put {key} for {path} w/{size}, {mtime_ns}: {ex}")
def commit(self) -> None:
with self.lock:
self.conn.commit()
def close(self) -> None:
with self.lock:
self.cur.close()
self.conn.close()
filesdb = FilesDB() # Singleton
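A minimal sketch exercising the cache above on its own; the database location and file path are only for illustration, and in dupeGuru the File class drives this through the filesdb singleton:

from pathlib import Path
from core.fs import filesdb

filesdb.connect(":memory:")                  # the real app uses a cache file under the appdata dir
target = Path("/tmp/example.bin")            # hypothetical file; it must exist because stat() is called
filesdb.put(target, "digest", b"\x01\x02")   # the row is keyed on path plus size and mtime_ns
print(filesdb.get(target, "digest"))         # b'\x01\x02' as long as size and mtime are unchanged
filesdb.commit()
filesdb.close()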
class File:
"""Represents a file and holds metadata to be used for scanning."""
INITIAL_INFO = {"size": 0, "mtime": 0, "digest": b"", "digest_partial": b"", "digest_samples": b""}
# Slots for File make us save quite a bit of memory. In a memory test I've made with a lot of # Slots for File make us save quite a bit of memory. In a memory test I've made with a lot of
# files, I saved 35% memory usage with "unread" files (no _read_info() call) and gains become # files, I saved 35% memory usage with "unread" files (no _read_info() call) and gains become
# even greater when we take into account read attributes (70%!). Yeah, it's worth it. # even greater when we take into account read attributes (70%!). Yeah, it's worth it.
__slots__ = ('path', 'is_ref', 'words') + tuple(INITIAL_INFO.keys()) __slots__ = ("path", "is_ref", "words") + tuple(INITIAL_INFO.keys())
def __init__(self, path): def __init__(self, path):
self.path = path
for attrname in self.INITIAL_INFO: for attrname in self.INITIAL_INFO:
setattr(self, attrname, NOT_SET) setattr(self, attrname, NOT_SET)
if type(path) is os.DirEntry:
self.path = Path(path.path)
self.size = nonone(path.stat().st_size, 0)
self.mtime = nonone(path.stat().st_mtime, 0)
else:
self.path = path
def __repr__(self): def __repr__(self):
return "<{} {}>".format(self.__class__.__name__, str(self.path)) return f"<{self.__class__.__name__} {str(self.path)}>"
def __getattribute__(self, attrname): def __getattribute__(self, attrname):
result = object.__getattribute__(self, attrname) result = object.__getattribute__(self, attrname)
@@ -96,43 +218,78 @@ class File:
result = self.INITIAL_INFO[attrname] result = self.INITIAL_INFO[attrname]
return result return result
#This offset is where we should start reading the file to get a partial md5 def _calc_digest(self):
#For audio file, it should be where audio data starts # type: () -> bytes
def _get_md5partial_offset_and_size(self):
return (0x4000, 0x4000) #16Kb with self.path.open("rb") as fp:
file_hash = hasher()
# The goal here is to not run out of memory on really big files. However, the chunk
# size has to be large enough so that the python loop isn't too costly in terms of
# CPU.
CHUNK_SIZE = 1024 * 1024 # 1 mb
filedata = fp.read(CHUNK_SIZE)
while filedata:
file_hash.update(filedata)
filedata = fp.read(CHUNK_SIZE)
return file_hash.digest()
def _calc_digest_partial(self):
# type: () -> bytes
# This offset is where we should start reading the file to get a partial hash
# For audio file, it should be where audio data starts
offset, size = (0x4000, 0x4000)
with self.path.open("rb") as fp:
fp.seek(offset)
partial_data = fp.read(size)
return hasher(partial_data).digest()
def _calc_digest_samples(self) -> bytes:
size = self.size
with self.path.open("rb") as fp:
# Chunk at 25% of the file
fp.seek(floor(size * 25 / 100), 0)
file_data = fp.read(CHUNK_SIZE)
file_hash = hasher(file_data)
# Chunk at 60% of the file
fp.seek(floor(size * 60 / 100), 0)
file_data = fp.read(CHUNK_SIZE)
file_hash.update(file_data)
# Last chunk of the file
fp.seek(-CHUNK_SIZE, 2)
file_data = fp.read(CHUNK_SIZE)
file_hash.update(file_data)
return file_hash.digest()
def _read_info(self, field): def _read_info(self, field):
if field in ('size', 'mtime'): # print(f"_read_info({field}) for {self}")
if field in ("size", "mtime"):
stats = self.path.stat() stats = self.path.stat()
self.size = nonone(stats.st_size, 0) self.size = nonone(stats.st_size, 0)
self.mtime = nonone(stats.st_mtime, 0) self.mtime = nonone(stats.st_mtime, 0)
elif field == 'md5partial': elif field == "digest_partial":
try: self.digest_partial = filesdb.get(self.path, "digest_partial")
fp = self.path.open('rb') if self.digest_partial is None:
offset, size = self._get_md5partial_offset_and_size() self.digest_partial = self._calc_digest_partial()
fp.seek(offset) filesdb.put(self.path, "digest_partial", self.digest_partial)
partialdata = fp.read(size) elif field == "digest":
md5 = hashlib.md5(partialdata) self.digest = filesdb.get(self.path, "digest")
self.md5partial = md5.digest() if self.digest is None:
fp.close() self.digest = self._calc_digest()
except Exception: filesdb.put(self.path, "digest", self.digest)
pass elif field == "digest_samples":
elif field == 'md5': size = self.size
try: # Might as well hash such small files entirely.
fp = self.path.open('rb') if size <= MIN_FILE_SIZE:
md5 = hashlib.md5() setattr(self, field, self.digest)
# The goal here is to not run out of memory on really big files. However, the chunk return
# size has to be large enough so that the python loop isn't too costly in terms of self.digest_samples = filesdb.get(self.path, "digest_samples")
# CPU. if self.digest_samples is None:
CHUNK_SIZE = 1024 * 1024 # 1 mb self.digest_samples = self._calc_digest_samples()
filedata = fp.read(CHUNK_SIZE) filesdb.put(self.path, "digest_samples", self.digest_samples)
while filedata:
md5.update(filedata)
filedata = fp.read(CHUNK_SIZE)
self.md5 = md5.digest()
fp.close()
except Exception:
pass
def _read_all_info(self, attrnames=None): def _read_all_info(self, attrnames=None):
"""Cache all possible info. """Cache all possible info.
@@ -144,33 +301,31 @@ class File:
for attrname in attrnames: for attrname in attrnames:
getattr(self, attrname) getattr(self, attrname)
#--- Public # --- Public
@classmethod @classmethod
def can_handle(cls, path): def can_handle(cls, path):
"""Returns whether this file wrapper class can handle ``path``. """Returns whether this file wrapper class can handle ``path``."""
""" return not path.is_symlink() and path.is_file()
return not path.islink() and path.isfile()
def rename(self, newname): def rename(self, newname):
if newname == self.name: if newname == self.name:
return return
destpath = self.path.parent()[newname] destpath = self.path.parent.joinpath(newname)
if destpath.exists(): if destpath.exists():
raise AlreadyExistsError(newname, self.path.parent()) raise AlreadyExistsError(newname, self.path.parent)
try: try:
self.path.rename(destpath) self.path.rename(destpath)
except EnvironmentError: except OSError:
raise OperationError(self) raise OperationError(self)
if not destpath.exists(): if not destpath.exists():
raise OperationError(self) raise OperationError(self)
self.path = destpath self.path = destpath
def get_display_info(self, group, delta): def get_display_info(self, group, delta):
"""Returns a display-ready dict of dupe's data. """Returns a display-ready dict of dupe's data."""
"""
raise NotImplementedError() raise NotImplementedError()
#--- Properties # --- Properties
@property @property
def extension(self): def extension(self):
return get_file_ext(self.name) return get_file_ext(self.name)
@@ -181,18 +336,20 @@ class File:
@property @property
def folder_path(self): def folder_path(self):
return self.path.parent() return self.path.parent
class Folder(File): class Folder(File):
"""A wrapper around a folder path. """A wrapper around a folder path.
It has the size/md5 info of a File, but it's value are the sum of its subitems. It has the size/digest info of a File, but its value is the sum of its subitems.
""" """
__slots__ = File.__slots__ + ('_subfolders', )
__slots__ = File.__slots__ + ("_subfolders",)
def __init__(self, path): def __init__(self, path):
File.__init__(self, path) File.__init__(self, path)
self.size = NOT_SET
self._subfolders = None self._subfolders = None
def _all_items(self): def _all_items(self):
@@ -201,35 +358,37 @@ class Folder(File):
return folders + files return folders + files
def _read_info(self, field): def _read_info(self, field):
if field in {'size', 'mtime'}: # print(f"_read_info({field}) for Folder {self}")
if field in {"size", "mtime"}:
size = sum((f.size for f in self._all_items()), 0) size = sum((f.size for f in self._all_items()), 0)
self.size = size self.size = size
stats = self.path.stat() stats = self.path.stat()
self.mtime = nonone(stats.st_mtime, 0) self.mtime = nonone(stats.st_mtime, 0)
elif field in {'md5', 'md5partial'}: elif field in {"digest", "digest_partial", "digest_samples"}:
# What's sensitive here is that we must make sure that subfiles' # What's sensitive here is that we must make sure that subfiles'
# md5 are always added up in the same order, but we also want a # digest are always added up in the same order, but we also want a
# different md5 if a file gets moved in a different subdirectory. # different digest if a file gets moved in a different subdirectory.
def get_dir_md5_concat():
def get_dir_digest_concat():
items = self._all_items() items = self._all_items()
items.sort(key=lambda f: f.path) items.sort(key=lambda f: f.path)
md5s = [getattr(f, field) for f in items] digests = [getattr(f, field) for f in items]
return b''.join(md5s) return b"".join(digests)
md5 = hashlib.md5(get_dir_md5_concat()) digest = hasher(get_dir_digest_concat()).digest()
digest = md5.digest()
setattr(self, field, digest) setattr(self, field, digest)
@property @property
def subfolders(self): def subfolders(self):
if self._subfolders is None: if self._subfolders is None:
subfolders = [p for p in self.path.listdir() if not p.islink() and p.isdir()] with os.scandir(self.path) as iter:
subfolders = [p for p in iter if not p.is_symlink() and p.is_dir()]
self._subfolders = [self.__class__(p) for p in subfolders] self._subfolders = [self.__class__(p) for p in subfolders]
return self._subfolders return self._subfolders
@classmethod @classmethod
def can_handle(cls, path): def can_handle(cls, path):
return not path.islink() and path.isdir() return not path.is_symlink() and path.is_dir()
def get_file(path, fileclasses=[File]): def get_file(path, fileclasses=[File]):
@@ -244,6 +403,7 @@ def get_file(path, fileclasses=[File]):
if fileclass.can_handle(path): if fileclass.can_handle(path):
return fileclass(path) return fileclass(path)
def get_files(path, fileclasses=[File]): def get_files(path, fileclasses=[File]):
"""Returns a list of :class:`File` for each file contained in ``path``. """Returns a list of :class:`File` for each file contained in ``path``.
@@ -253,10 +413,11 @@ def get_files(path, fileclasses=[File]):
assert all(issubclass(fileclass, File) for fileclass in fileclasses) assert all(issubclass(fileclass, File) for fileclass in fileclasses)
try: try:
result = [] result = []
for path in path.listdir(): with os.scandir(path) as iter:
file = get_file(path, fileclasses=fileclasses) for item in iter:
if file is not None: file = get_file(item, fileclasses=fileclasses)
result.append(file) if file is not None:
result.append(file)
return result return result
except EnvironmentError: except OSError:
raise InvalidPath(path) raise InvalidPath(path)


@@ -13,4 +13,3 @@ blue, which is supposed to be orange, does the sorting logic, holds selection, e
.. _cross-toolkit: http://www.hardcoded.net/articles/cross-toolkit-software .. _cross-toolkit: http://www.hardcoded.net/articles/cross-toolkit-software
""" """


@@ -8,23 +8,28 @@
from hscommon.notify import Listener from hscommon.notify import Listener
class DupeGuruGUIObject(Listener): class DupeGuruGUIObject(Listener):
def __init__(self, app): def __init__(self, app):
Listener.__init__(self, app) Listener.__init__(self, app)
self.app = app self.app = app
def directories_changed(self): def directories_changed(self):
# Implemented in child classes
pass pass
def dupes_selected(self): def dupes_selected(self):
# Implemented in child classes
pass pass
def marking_changed(self): def marking_changed(self):
# Implemented in child classes
pass pass
def results_changed(self): def results_changed(self):
# Implemented in child classes
pass pass
def results_changed_but_keep_selection(self): def results_changed_but_keep_selection(self):
# Implemented in child classes
pass pass


@@ -1,8 +1,8 @@
# Created On: 2012-05-30 # Created On: 2012-05-30
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
import os import os
@@ -10,42 +10,44 @@ import os
from hscommon.gui.base import GUIObject from hscommon.gui.base import GUIObject
from hscommon.trans import tr from hscommon.trans import tr
class DeletionOptionsView: class DeletionOptionsView:
"""Expected interface for :class:`DeletionOptions`'s view. """Expected interface for :class:`DeletionOptions`'s view.
*Not actually used in the code. For documentation purposes only.* *Not actually used in the code. For documentation purposes only.*
Our view presents the user with an appropriate way (probably a mix of checkboxes and radio Our view presents the user with an appropriate way (probably a mix of checkboxes and radio
buttons) to set the different flags in :class:`DeletionOptions`. Note that buttons) to set the different flags in :class:`DeletionOptions`. Note that
:attr:`DeletionOptions.use_hardlinks` is only relevant if :attr:`DeletionOptions.link_deleted` :attr:`DeletionOptions.use_hardlinks` is only relevant if :attr:`DeletionOptions.link_deleted`
is true. This is why we toggle the "enabled" state of that flag. is true. This is why we toggle the "enabled" state of that flag.
We expect the view to set :attr:`DeletionOptions.link_deleted` immediately as the user changes We expect the view to set :attr:`DeletionOptions.link_deleted` immediately as the user changes
its value because it will toggle :meth:`set_hardlink_option_enabled` its value because it will toggle :meth:`set_hardlink_option_enabled`
Other than the flags, there's also a prompt message which has a dynamic content, defined by Other than the flags, there's also a prompt message which has a dynamic content, defined by
:meth:`update_msg`. :meth:`update_msg`.
""" """
def update_msg(self, msg: str): def update_msg(self, msg: str):
"""Update the dialog's prompt with ``str``. """Update the dialog's prompt with ``str``."""
"""
def show(self): def show(self):
"""Show the dialog in a modal fashion. """Show the dialog in a modal fashion.
Returns whether the dialog was "accepted" (the user pressed OK). Returns whether the dialog was "accepted" (the user pressed OK).
""" """
def set_hardlink_option_enabled(self, is_enabled: bool): def set_hardlink_option_enabled(self, is_enabled: bool):
"""Enable or disable the widget controlling :attr:`DeletionOptions.use_hardlinks`. """Enable or disable the widget controlling :attr:`DeletionOptions.use_hardlinks`."""
"""
class DeletionOptions(GUIObject): class DeletionOptions(GUIObject):
"""Present the user with deletion options before proceeding. """Present the user with deletion options before proceeding.
When the user activates "Send to trash", we present him with a couple of options that changes When the user activates "Send to trash", we present him with a couple of options that changes
the behavior of that deletion operation. the behavior of that deletion operation.
""" """
def __init__(self): def __init__(self):
GUIObject.__init__(self) GUIObject.__init__(self)
#: Whether symlinks or hardlinks are used when doing :attr:`link_deleted`. #: Whether symlinks or hardlinks are used when doing :attr:`link_deleted`.
@@ -54,10 +56,10 @@ class DeletionOptions(GUIObject):
#: Delete dupes directly and don't send to trash. #: Delete dupes directly and don't send to trash.
#: *bool*. *get/set* #: *bool*. *get/set*
self.direct = False self.direct = False
def show(self, mark_count): def show(self, mark_count):
"""Prompt the user with a modal dialog offering our deletion options. """Prompt the user with a modal dialog offering our deletion options.
:param int mark_count: Number of dupes marked for deletion. :param int mark_count: Number of dupes marked for deletion.
:rtype: bool :rtype: bool
:returns: Whether the user accepted the dialog (we cancel deletion if false). :returns: Whether the user accepted the dialog (we cancel deletion if false).
@@ -69,10 +71,9 @@ class DeletionOptions(GUIObject):
msg = tr("You are sending {} file(s) to the Trash.").format(mark_count) msg = tr("You are sending {} file(s) to the Trash.").format(mark_count)
self.view.update_msg(msg) self.view.update_msg(msg)
return self.view.show() return self.view.show()
def supports_links(self): def supports_links(self):
"""Returns whether our platform supports symlinks. """Returns whether our platform supports symlinks."""
"""
# When on a platform that doesn't implement it, calling os.symlink() (with the wrong number # When on a platform that doesn't implement it, calling os.symlink() (with the wrong number
# of arguments) raises NotImplementedError, which allows us to gracefully check for the # of arguments) raises NotImplementedError, which allows us to gracefully check for the
# feature. # feature.
@@ -87,21 +88,19 @@ class DeletionOptions(GUIObject):
except TypeError: except TypeError:
# wrong number of arguments # wrong number of arguments
return True return True
@property @property
def link_deleted(self): def link_deleted(self):
"""Replace deleted dupes with symlinks (or hardlinks) to the dupe group reference. """Replace deleted dupes with symlinks (or hardlinks) to the dupe group reference.
*bool*. *get/set* *bool*. *get/set*
Whether the link is a symlink or hardlink is decided by :attr:`use_hardlinks`. Whether the link is a symlink or hardlink is decided by :attr:`use_hardlinks`.
""" """
return self._link_deleted return self._link_deleted
@link_deleted.setter @link_deleted.setter
def link_deleted(self, value): def link_deleted(self, value):
self._link_deleted = value self._link_deleted = value
hardlinks_enabled = value and self.supports_links() hardlinks_enabled = value and self.supports_links()
self.view.set_hardlink_option_enabled(hardlinks_enabled) self.view.set_hardlink_option_enabled(hardlinks_enabled)
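The comment in supports_links() above relies on a detail of os.symlink(): calling it with the wrong number of arguments raises NotImplementedError on platforms that lack symlink support and TypeError everywhere else, so the feature can be probed without ever touching the disk. A minimal standalone sketch of that probe (not the exact dupeGuru implementation):

import os

def supports_links() -> bool:
    # Deliberately wrong number of arguments: nothing is ever created on disk.
    try:
        os.symlink()
    except NotImplementedError:
        # The platform does not implement symlinks at all.
        return False
    except TypeError:
        # A wrong-argument error means the function exists, so symlinks are supported.
        return True
    return False  # defensive; the bad call above always raises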


@@ -7,7 +7,8 @@
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon.gui.base import GUIObject from hscommon.gui.base import GUIObject
from .base import DupeGuruGUIObject from core.gui.base import DupeGuruGUIObject
class DetailsPanel(GUIObject, DupeGuruGUIObject): class DetailsPanel(GUIObject, DupeGuruGUIObject):
def __init__(self, app): def __init__(self, app):
@@ -19,7 +20,7 @@ class DetailsPanel(GUIObject, DupeGuruGUIObject):
self._refresh() self._refresh()
self.view.refresh() self.view.refresh()
#--- Private # --- Private
def _refresh(self): def _refresh(self):
if self.app.selected_dupes: if self.app.selected_dupes:
dupe = self.app.selected_dupes[0] dupe = self.app.selected_dupes[0]
@@ -31,18 +32,16 @@ class DetailsPanel(GUIObject, DupeGuruGUIObject):
# we don't want the two sides of the table to display the stats for the same file # we don't want the two sides of the table to display the stats for the same file
ref = group.ref if group is not None and group.ref is not dupe else None ref = group.ref if group is not None and group.ref is not dupe else None
data2 = self.app.get_display_info(ref, group, False) data2 = self.app.get_display_info(ref, group, False)
columns = self.app.result_table.COLUMNS[1:] # first column is the 'marked' column columns = self.app.result_table.COLUMNS[1:] # first column is the 'marked' column
self._table = [(c.display, data1[c.name], data2[c.name]) for c in columns] self._table = [(c.display, data1[c.name], data2[c.name]) for c in columns]
#--- Public # --- Public
def row_count(self): def row_count(self):
return len(self._table) return len(self._table)
def row(self, row_index): def row(self, row_index):
return self._table[row_index] return self._table[row_index]
#--- Event Handlers # --- Event Handlers
def dupes_selected(self): def dupes_selected(self):
self._refresh() self._view_updated()
self.view.refresh()


@@ -1,17 +1,18 @@
# Created By: Virgil Dupras # Created By: Virgil Dupras
# Created On: 2010-02-06 # Created On: 2010-02-06
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon.gui.tree import Tree, Node from hscommon.gui.tree import Tree, Node
from ..directories import DirectoryState from core.directories import DirectoryState
from .base import DupeGuruGUIObject from core.gui.base import DupeGuruGUIObject
STATE_ORDER = [DirectoryState.NORMAL, DirectoryState.REFERENCE, DirectoryState.EXCLUDED]
STATE_ORDER = [DirectoryState.Normal, DirectoryState.Reference, DirectoryState.Excluded]
# Lazily loads children # Lazily loads children
class DirectoryNode(Node): class DirectoryNode(Node):
@@ -21,29 +22,29 @@ class DirectoryNode(Node):
self._directory_path = path self._directory_path = path
self._loaded = False self._loaded = False
self._state = STATE_ORDER.index(self._tree.app.directories.get_state(path)) self._state = STATE_ORDER.index(self._tree.app.directories.get_state(path))
def __len__(self): def __len__(self):
if not self._loaded: if not self._loaded:
self._load() self._load()
return Node.__len__(self) return Node.__len__(self)
def _load(self): def _load(self):
self.clear() self.clear()
subpaths = self._tree.app.directories.get_subfolders(self._directory_path) subpaths = self._tree.app.directories.get_subfolders(self._directory_path)
for path in subpaths: for path in subpaths:
self.append(DirectoryNode(self._tree, path, path.name)) self.append(DirectoryNode(self._tree, path, path.name))
self._loaded = True self._loaded = True
def update_all_states(self): def update_all_states(self):
self._state = STATE_ORDER.index(self._tree.app.directories.get_state(self._directory_path)) self._state = STATE_ORDER.index(self._tree.app.directories.get_state(self._directory_path))
for node in self: for node in self:
node.update_all_states() node.update_all_states()
# The state property is an index to the combobox # The state property is an index to the combobox
@property @property
def state(self): def state(self):
return self._state return self._state
@state.setter @state.setter
def state(self, value): def state(self, value):
if value == self._state: if value == self._state:
@@ -52,29 +53,29 @@ class DirectoryNode(Node):
state = STATE_ORDER[value] state = STATE_ORDER[value]
self._tree.app.directories.set_state(self._directory_path, state) self._tree.app.directories.set_state(self._directory_path, state)
self._tree.update_all_states() self._tree.update_all_states()
class DirectoryTree(Tree, DupeGuruGUIObject): class DirectoryTree(Tree, DupeGuruGUIObject):
#--- model -> view calls: # --- model -> view calls:
# refresh() # refresh()
# refresh_states() # when only state labels need to be refreshed # refresh_states() # when only state labels need to be refreshed
# #
def __init__(self, app): def __init__(self, app):
Tree.__init__(self) Tree.__init__(self)
DupeGuruGUIObject.__init__(self, app) DupeGuruGUIObject.__init__(self, app)
def _view_updated(self): def _view_updated(self):
self._refresh() self._refresh()
self.view.refresh() self.view.refresh()
def _refresh(self): def _refresh(self):
self.clear() self.clear()
for path in self.app.directories: for path in self.app.directories:
self.append(DirectoryNode(self, path, str(path))) self.append(DirectoryNode(self, path, str(path)))
def add_directory(self, path): def add_directory(self, path):
self.app.add_directory(path) self.app.add_directory(path)
def remove_selected(self): def remove_selected(self):
selected_paths = self.selected_paths selected_paths = self.selected_paths
if not selected_paths: if not selected_paths:
@@ -85,23 +86,21 @@ class DirectoryTree(Tree, DupeGuruGUIObject):
else: else:
# All selected nodes or on second-or-more level, exclude them. # All selected nodes or on second-or-more level, exclude them.
nodes = self.selected_nodes nodes = self.selected_nodes
newstate = DirectoryState.Excluded newstate = DirectoryState.EXCLUDED
if all(node.state == DirectoryState.Excluded for node in nodes): if all(node.state == DirectoryState.EXCLUDED for node in nodes):
newstate = DirectoryState.Normal newstate = DirectoryState.NORMAL
for node in nodes: for node in nodes:
node.state = newstate node.state = newstate
def select_all(self): def select_all(self):
self.selected_nodes = list(self) self.selected_nodes = list(self)
self.view.refresh() self.view.refresh()
def update_all_states(self): def update_all_states(self):
for node in self: for node in self:
node.update_all_states() node.update_all_states()
self.view.refresh_states() self.view.refresh_states()
#--- Event Handlers # --- Event Handlers
def directories_changed(self): def directories_changed(self):
self._refresh() self._view_updated()
self.view.refresh()


@@ -0,0 +1,90 @@
# Created On: 2012/03/13
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
#
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
from core.gui.exclude_list_table import ExcludeListTable
from core.exclude import has_sep
from os import sep
import logging
class ExcludeListDialogCore:
def __init__(self, app):
self.app = app
self.exclude_list = self.app.exclude_list # Markable from exclude.py
self.exclude_list_table = ExcludeListTable(self, app) # GUITable, this is the "model"
def restore_defaults(self):
self.exclude_list.restore_defaults()
self.refresh()
def refresh(self):
self.exclude_list_table.refresh()
def remove_selected(self):
for row in self.exclude_list_table.selected_rows:
self.exclude_list_table.remove(row)
self.exclude_list.remove(row.regex)
self.refresh()
def rename_selected(self, newregex):
"""Rename the selected regex to ``newregex``.
If there is more than one selected row, the first one is used.
:param str newregex: The regex to rename the row's regex to.
:return bool: true if success, false if error.
"""
try:
r = self.exclude_list_table.selected_rows[0]
self.exclude_list.rename(r.regex, newregex)
self.refresh()
return True
except Exception as e:
logging.warning(f"Error while renaming regex to {newregex}: {e}")
return False
def add(self, regex):
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
self.exclude_list_table.add(regex)
def test_string(self, test_string):
"""Set the highlight property on each row when its regex matches the
test_string supplied. Return True if any row matched."""
matched = False
for row in self.exclude_list_table.rows:
compiled_regex = self.exclude_list.get_compiled(row.regex)
if self.is_match(test_string, compiled_regex):
row.highlight = True
matched = True
else:
row.highlight = False
return matched
def is_match(self, test_string, compiled_regex):
# This method is like an inverted version of ExcludeList.is_excluded()
if not compiled_regex:
return False
matched = False
# Test only the filename portion of the path
if not has_sep(compiled_regex.pattern) and sep in test_string:
filename = test_string.rsplit(sep, 1)[1]
if compiled_regex.fullmatch(filename):
matched = True
return matched
# Test the entire path + filename
if compiled_regex.fullmatch(test_string):
matched = True
return matched
def reset_rows_highlight(self):
for row in self.exclude_list_table.rows:
row.highlight = False
def show(self):
self.view.show()
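The is_match() method above applies a different rule depending on the pattern: a regex without a path separator is matched against the filename only, while any other regex must fullmatch the whole path. A hedged sketch of the same rule for a single pattern, outside the dialog (has_sep is assumed to simply test whether os.sep occurs in the pattern):

import re
from os import sep

def matches(test_string: str, pattern: str) -> bool:
    # Sketch of the dialog's is_match() rule for one compiled pattern.
    compiled = re.compile(pattern)
    # Patterns without a path separator are tested against the filename only.
    if sep not in compiled.pattern and sep in test_string:
        filename = test_string.rsplit(sep, 1)[1]
        return compiled.fullmatch(filename) is not None
    # Otherwise the whole path must match.
    return compiled.fullmatch(test_string) is not None

# With POSIX separators:
# matches("/home/user/Thumbs.db", r".*\.db")        -> True  (filename-only match)
# matches("/home/user/cache/f.txt", r".*/cache/.*") -> True  (full-path match)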


@@ -0,0 +1,96 @@
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
from core.gui.base import DupeGuruGUIObject
from hscommon.gui.table import GUITable, Row
from hscommon.gui.column import Column, Columns
from hscommon.trans import trget
tr = trget("ui")
class ExcludeListTable(GUITable, DupeGuruGUIObject):
COLUMNS = [Column("marked", ""), Column("regex", tr("Regular Expressions"))]
def __init__(self, exclude_list_dialog, app):
GUITable.__init__(self)
DupeGuruGUIObject.__init__(self, app)
self._columns = Columns(self)
self.dialog = exclude_list_dialog
def rename_selected(self, newname):
row = self.selected_row
if row is None:
return False
row._data = None
return self.dialog.rename_selected(newname)
# --- Virtual
def _do_add(self, regex):
"""(Virtual) Creates a new row, adds it in the table.
Returns ``(row, insert_index)``."""
# Return index 0 to insert at the top
return ExcludeListRow(self, self.dialog.exclude_list.is_marked(regex), regex), 0
def _do_delete(self):
self.dialog.exclude_list.remove(self.selected_row.regex)
# --- Override
def add(self, regex):
row, insert_index = self._do_add(regex)
self.insert(insert_index, row)
self.view.refresh()
def _fill(self):
for enabled, regex in self.dialog.exclude_list:
self.append(ExcludeListRow(self, enabled, regex))
def refresh(self, refresh_view=True):
"""Override to avoid keeping previous selection in case of multiple rows
selected previously."""
self.cancel_edits()
del self[:]
self._fill()
if refresh_view:
self.view.refresh()
class ExcludeListRow(Row):
def __init__(self, table, enabled, regex):
Row.__init__(self, table)
self._app = table.app
self._data = None
self.enabled = str(enabled)
self.regex = str(regex)
self.highlight = False
@property
def data(self):
if self._data is None:
self._data = {"marked": self.enabled, "regex": self.regex}
return self._data
@property
def markable(self):
return self._app.exclude_list.is_markable(self.regex)
@property
def marked(self):
return self._app.exclude_list.is_marked(self.regex)
@marked.setter
def marked(self, value):
if value:
self._app.exclude_list.mark(self.regex)
else:
self._app.exclude_list.unmark(self.regex)
@property
def error(self):
# This assumes error() returns an Exception()
message = self._app.exclude_list.error(self.regex)
if hasattr(message, "msg"):
return self._app.exclude_list.error(self.regex).msg
else:
return message # Exception object


@@ -6,24 +6,25 @@
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon.trans import tr from hscommon.trans import tr
from .ignore_list_table import IgnoreListTable from core.gui.ignore_list_table import IgnoreListTable
class IgnoreListDialog: class IgnoreListDialog:
#--- View interface # --- View interface
# show() # show()
# #
def __init__(self, app): def __init__(self, app):
self.app = app self.app = app
self.ignore_list = self.app.ignore_list self.ignore_list = self.app.ignore_list
self.ignore_list_table = IgnoreListTable(self) self.ignore_list_table = IgnoreListTable(self) # GUITable
def clear(self): def clear(self):
if not self.ignore_list: if not self.ignore_list:
return return
msg = tr("Do you really want to remove all %d items from the ignore list?") % len(self.ignore_list) msg = tr("Do you really want to remove all %d items from the ignore list?") % len(self.ignore_list)
if self.app.view.ask_yes_no(msg): if self.app.view.ask_yes_no(msg):
self.ignore_list.Clear() self.ignore_list.clear()
self.refresh() self.refresh()
def refresh(self): def refresh(self):
@@ -36,4 +37,3 @@ class IgnoreListDialog:
def show(self): def show(self):
self.view.show() self.view.show()


@@ -1,35 +1,36 @@
# Created By: Virgil Dupras # Created By: Virgil Dupras
# Created On: 2012-03-13 # Created On: 2012-03-13
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon.gui.table import GUITable, Row from hscommon.gui.table import GUITable, Row
from hscommon.gui.column import Column, Columns from hscommon.gui.column import Column, Columns
from hscommon.trans import trget from hscommon.trans import trget
coltr = trget('columns') coltr = trget("columns")
class IgnoreListTable(GUITable): class IgnoreListTable(GUITable):
COLUMNS = [ COLUMNS = [
# the str concat below saves us needless localization. # the str concat below saves us needless localization.
Column('path1', coltr("File Path") + " 1"), Column("path1", coltr("File Path") + " 1"),
Column('path2', coltr("File Path") + " 2"), Column("path2", coltr("File Path") + " 2"),
] ]
def __init__(self, ignore_list_dialog): def __init__(self, ignore_list_dialog):
GUITable.__init__(self) GUITable.__init__(self)
self.columns = Columns(self) self._columns = Columns(self)
self.view = None self.view = None
self.dialog = ignore_list_dialog self.dialog = ignore_list_dialog
#--- Override # --- Override
def _fill(self): def _fill(self):
for path1, path2 in self.dialog.ignore_list: for path1, path2 in self.dialog.ignore_list:
self.append(IgnoreListRow(self, path1, path2)) self.append(IgnoreListRow(self, path1, path2))
class IgnoreListRow(Row): class IgnoreListRow(Row):
def __init__(self, table, path1, path2): def __init__(self, table, path1, path2):
@@ -38,4 +39,3 @@ class IgnoreListRow(Row):
self.path2_original = path2 self.path2_original = path2
self.path1 = str(path1) self.path1 = str(path1)
self.path2 = str(path2) self.path2 = str(path2)


@@ -9,6 +9,7 @@
from hscommon.gui.base import GUIObject from hscommon.gui.base import GUIObject
from hscommon.gui.selectable_list import GUISelectableList from hscommon.gui.selectable_list import GUISelectableList
class CriterionCategoryList(GUISelectableList): class CriterionCategoryList(GUISelectableList):
def __init__(self, dialog): def __init__(self, dialog):
self.dialog = dialog self.dialog = dialog
@@ -18,6 +19,7 @@ class CriterionCategoryList(GUISelectableList):
self.dialog.select_category(self.dialog.categories[self.selected_index]) self.dialog.select_category(self.dialog.categories[self.selected_index])
GUISelectableList._update_selection(self) GUISelectableList._update_selection(self)
class PrioritizationList(GUISelectableList): class PrioritizationList(GUISelectableList):
def __init__(self, dialog): def __init__(self, dialog):
self.dialog = dialog self.dialog = dialog
@@ -41,6 +43,7 @@ class PrioritizationList(GUISelectableList):
del prilist[i] del prilist[i]
self._refresh_contents() self._refresh_contents()
class PrioritizeDialog(GUIObject): class PrioritizeDialog(GUIObject):
def __init__(self, app): def __init__(self, app):
GUIObject.__init__(self) GUIObject.__init__(self)
@@ -52,15 +55,15 @@ class PrioritizeDialog(GUIObject):
self.prioritizations = [] self.prioritizations = []
self.prioritization_list = PrioritizationList(self) self.prioritization_list = PrioritizationList(self)
#--- Override # --- Override
def _view_updated(self): def _view_updated(self):
self.category_list.select(0) self.category_list.select(0)
#--- Private # --- Private
def _sort_key(self, dupe): def _sort_key(self, dupe):
return tuple(crit.sort_key(dupe) for crit in self.prioritizations) return tuple(crit.sort_key(dupe) for crit in self.prioritizations)
#--- Public # --- Public
def select_category(self, category): def select_category(self, category):
self.criteria = category.criteria_list() self.criteria = category.criteria_list()
self.criteria_list[:] = [c.display_value for c in self.criteria] self.criteria_list[:] = [c.display_value for c in self.criteria]
@@ -69,13 +72,15 @@ class PrioritizeDialog(GUIObject):
# Add selected criteria in criteria_list to prioritization_list. # Add selected criteria in criteria_list to prioritization_list.
if self.criteria_list.selected_index is None: if self.criteria_list.selected_index is None:
return return
crit = self.criteria[self.criteria_list.selected_index] for i in self.criteria_list.selected_indexes:
self.prioritizations.append(crit) crit = self.criteria[i]
del crit self.prioritizations.append(crit)
del crit
self.prioritization_list[:] = [crit.display for crit in self.prioritizations] self.prioritization_list[:] = [crit.display for crit in self.prioritizations]
def remove_selected(self): def remove_selected(self):
self.prioritization_list.remove_selected() self.prioritization_list.remove_selected()
self.prioritization_list.select([])
def perform_reprioritization(self): def perform_reprioritization(self):
self.app.reprioritize_groups(self._sort_key) self.app.reprioritize_groups(self._sort_key)


@@ -1,29 +1,29 @@
# Created By: Virgil Dupras # Created By: Virgil Dupras
# Created On: 2010-04-12 # Created On: 2010-04-12
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon import desktop from hscommon import desktop
from .problem_table import ProblemTable from core.gui.problem_table import ProblemTable
class ProblemDialog: class ProblemDialog:
def __init__(self, app): def __init__(self, app):
self.app = app self.app = app
self._selected_dupe = None self._selected_dupe = None
self.problem_table = ProblemTable(self) self.problem_table = ProblemTable(self)
def refresh(self): def refresh(self):
self._selected_dupe = None self._selected_dupe = None
self.problem_table.refresh() self.problem_table.refresh()
def reveal_selected_dupe(self): def reveal_selected_dupe(self):
if self._selected_dupe is not None: if self._selected_dupe is not None:
desktop.reveal_path(self._selected_dupe.path) desktop.reveal_path(self._selected_dupe.path)
def select_dupe(self, dupe): def select_dupe(self, dupe):
self._selected_dupe = dupe self._selected_dupe = dupe


@@ -1,39 +1,40 @@
# Created By: Virgil Dupras # Created By: Virgil Dupras
# Created On: 2010-04-12 # Created On: 2010-04-12
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon.gui.table import GUITable, Row from hscommon.gui.table import GUITable, Row
from hscommon.gui.column import Column, Columns from hscommon.gui.column import Column, Columns
from hscommon.trans import trget from hscommon.trans import trget
coltr = trget('columns') coltr = trget("columns")
class ProblemTable(GUITable): class ProblemTable(GUITable):
COLUMNS = [ COLUMNS = [
Column('path', coltr("File Path")), Column("path", coltr("File Path")),
Column('msg', coltr("Error Message")), Column("msg", coltr("Error Message")),
] ]
def __init__(self, problem_dialog): def __init__(self, problem_dialog):
GUITable.__init__(self) GUITable.__init__(self)
self.columns = Columns(self) self._columns = Columns(self)
self.dialog = problem_dialog self.dialog = problem_dialog
#--- Override # --- Override
def _update_selection(self): def _update_selection(self):
row = self.selected_row row = self.selected_row
dupe = row.dupe if row is not None else None dupe = row.dupe if row is not None else None
self.dialog.select_dupe(dupe) self.dialog.select_dupe(dupe)
def _fill(self): def _fill(self):
problems = self.dialog.app.results.problems problems = self.dialog.app.results.problems
for dupe, msg in problems: for dupe, msg in problems:
self.append(ProblemRow(self, dupe, msg)) self.append(ProblemRow(self, dupe, msg))
class ProblemRow(Row): class ProblemRow(Row):
def __init__(self, table, dupe, msg): def __init__(self, table, dupe, msg):
@@ -41,4 +42,3 @@ class ProblemRow(Row):
self.dupe = dupe self.dupe = dupe
self.msg = msg self.msg = msg
self.path = str(dupe.path) self.path = str(dupe.path)


@@ -1,9 +1,9 @@
# Created By: Virgil Dupras # Created By: Virgil Dupras
# Created On: 2010-02-11 # Created On: 2010-02-11
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from operator import attrgetter from operator import attrgetter
@@ -11,7 +11,8 @@ from operator import attrgetter
from hscommon.gui.table import GUITable, Row from hscommon.gui.table import GUITable, Row
from hscommon.gui.column import Columns from hscommon.gui.column import Columns
from .base import DupeGuruGUIObject from core.gui.base import DupeGuruGUIObject
class DupeRow(Row): class DupeRow(Row):
def __init__(self, table, group, dupe): def __init__(self, table, group, dupe):
@@ -22,14 +23,14 @@ class DupeRow(Row):
self._data = None self._data = None
self._data_delta = None self._data_delta = None
self._delta_columns = None self._delta_columns = None
def is_cell_delta(self, column_name): def is_cell_delta(self, column_name):
"""Returns whether a cell is in delta mode (orange color). """Returns whether a cell is in delta mode (orange color).
If the result table is in delta mode, returns True if the column is one of the "delta If the result table is in delta mode, returns True if the column is one of the "delta
columns", that is, one of the columns that display a a differential value rather than an columns", that is, one of the columns that display a a differential value rather than an
absolute value. absolute value.
If not, returns True if the dupe's value is different from its ref value. If not, returns True if the dupe's value is different from its ref value.
""" """
if not self.table.delta_values: if not self.table.delta_values:
@@ -40,64 +41,66 @@ class DupeRow(Row):
# table.DELTA_COLUMNS are always "delta" # table.DELTA_COLUMNS are always "delta"
self._delta_columns = self.table.DELTA_COLUMNS.copy() self._delta_columns = self.table.DELTA_COLUMNS.copy()
dupe_info = self.data dupe_info = self.data
if self._group.ref is None:
return False
ref_info = self._group.ref.get_display_info(group=self._group, delta=False) ref_info = self._group.ref.get_display_info(group=self._group, delta=False)
for key, value in dupe_info.items(): for key, value in dupe_info.items():
if (key not in self._delta_columns) and (ref_info[key].lower() != value.lower()): if (key not in self._delta_columns) and (ref_info[key].lower() != value.lower()):
self._delta_columns.add(key) self._delta_columns.add(key)
return column_name in self._delta_columns return column_name in self._delta_columns
@property @property
def data(self): def data(self):
if self._data is None: if self._data is None:
self._data = self._app.get_display_info(self._dupe, self._group, False) self._data = self._app.get_display_info(self._dupe, self._group, False)
return self._data return self._data
@property @property
def data_delta(self): def data_delta(self):
if self._data_delta is None: if self._data_delta is None:
self._data_delta = self._app.get_display_info(self._dupe, self._group, True) self._data_delta = self._app.get_display_info(self._dupe, self._group, True)
return self._data_delta return self._data_delta
@property @property
def isref(self): def isref(self):
return self._dupe is self._group.ref return self._dupe is self._group.ref
@property @property
def markable(self): def markable(self):
return self._app.results.is_markable(self._dupe) return self._app.results.is_markable(self._dupe)
@property @property
def marked(self): def marked(self):
return self._app.results.is_marked(self._dupe) return self._app.results.is_marked(self._dupe)
@marked.setter @marked.setter
def marked(self, value): def marked(self, value):
self._app.mark_dupe(self._dupe, value) self._app.mark_dupe(self._dupe, value)
class ResultTable(GUITable, DupeGuruGUIObject): class ResultTable(GUITable, DupeGuruGUIObject):
def __init__(self, app): def __init__(self, app):
GUITable.__init__(self) GUITable.__init__(self)
DupeGuruGUIObject.__init__(self, app) DupeGuruGUIObject.__init__(self, app)
self.columns = Columns(self, prefaccess=app, savename='ResultTable') self._columns = Columns(self, prefaccess=app, savename="ResultTable")
self._power_marker = False self._power_marker = False
self._delta_values = False self._delta_values = False
self._sort_descriptors = ('name', True) self._sort_descriptors = ("name", True)
#--- Override # --- Override
def _view_updated(self): def _view_updated(self):
self._refresh_with_view() self._refresh_with_view()
def _restore_selection(self, previous_selection): def _restore_selection(self, previous_selection):
if self.app.selected_dupes: if self.app.selected_dupes:
to_find = set(self.app.selected_dupes) to_find = set(self.app.selected_dupes)
indexes = [i for i, r in enumerate(self) if r._dupe in to_find] indexes = [i for i, r in enumerate(self) if r._dupe in to_find]
self.selected_indexes = indexes self.selected_indexes = indexes
def _update_selection(self): def _update_selection(self):
rows = self.selected_rows rows = self.selected_rows
self.app._select_dupes(list(map(attrgetter('_dupe'), rows))) self.app._select_dupes(list(map(attrgetter("_dupe"), rows)))
def _fill(self): def _fill(self):
if not self.power_marker: if not self.power_marker:
for group in self.app.results.groups: for group in self.app.results.groups:
@@ -108,22 +111,22 @@ class ResultTable(GUITable, DupeGuruGUIObject):
for dupe in self.app.results.dupes: for dupe in self.app.results.dupes:
group = self.app.results.get_group_of_duplicate(dupe) group = self.app.results.get_group_of_duplicate(dupe)
self.append(DupeRow(self, group, dupe)) self.append(DupeRow(self, group, dupe))
def _refresh_with_view(self): def _refresh_with_view(self):
self.refresh() self.refresh()
self.view.show_selected_row() self.view.show_selected_row()
#--- Public # --- Public
def get_row_value(self, index, column): def get_row_value(self, index, column):
try: try:
row = self[index] row = self[index]
except IndexError: except IndexError:
return '---' return "---"
if self.delta_values: if self.delta_values:
return row.data_delta[column] return row.data_delta[column]
else: else:
return row.data[column] return row.data[column]
def rename_selected(self, newname): def rename_selected(self, newname):
row = self.selected_row row = self.selected_row
if row is None: if row is None:
@@ -133,7 +136,7 @@ class ResultTable(GUITable, DupeGuruGUIObject):
row._data = None row._data = None
row._data_delta = None row._data_delta = None
return self.app.rename_selected(newname) return self.app.rename_selected(newname)
def sort(self, key, asc): def sort(self, key, asc):
if self.power_marker: if self.power_marker:
self.app.results.sort_dupes(key, asc, self.delta_values) self.app.results.sort_dupes(key, asc, self.delta_values)
@@ -141,12 +144,12 @@ class ResultTable(GUITable, DupeGuruGUIObject):
self.app.results.sort_groups(key, asc) self.app.results.sort_groups(key, asc)
self._sort_descriptors = (key, asc) self._sort_descriptors = (key, asc)
self._refresh_with_view() self._refresh_with_view()
#--- Properties # --- Properties
@property @property
def power_marker(self): def power_marker(self):
return self._power_marker return self._power_marker
@power_marker.setter @power_marker.setter
def power_marker(self, value): def power_marker(self, value):
if value == self._power_marker: if value == self._power_marker:
@@ -155,29 +158,29 @@ class ResultTable(GUITable, DupeGuruGUIObject):
key, asc = self._sort_descriptors key, asc = self._sort_descriptors
self.sort(key, asc) self.sort(key, asc)
# no need to refresh, it has happened in sort() # no need to refresh, it has happened in sort()
@property @property
def delta_values(self): def delta_values(self):
return self._delta_values return self._delta_values
@delta_values.setter @delta_values.setter
def delta_values(self, value): def delta_values(self, value):
if value == self._delta_values: if value == self._delta_values:
return return
self._delta_values = value self._delta_values = value
self.refresh() self.refresh()
@property @property
def selected_dupe_count(self): def selected_dupe_count(self):
return sum(1 for row in self.selected_rows if not row.isref) return sum(1 for row in self.selected_rows if not row.isref)
#--- Event Handlers # --- Event Handlers
def marking_changed(self): def marking_changed(self):
self.view.invalidate_markings() self.view.invalidate_markings()
def results_changed(self): def results_changed(self):
self._refresh_with_view() self._refresh_with_view()
def results_changed_but_keep_selection(self): def results_changed_but_keep_selection(self):
# What we want to do here is that instead of restoring selected *dupes* after refresh, we # What we want to do here is that instead of restoring selected *dupes* after refresh, we
# restore selected *paths*. # restore selected *paths*.
@@ -185,7 +188,6 @@ class ResultTable(GUITable, DupeGuruGUIObject):
self.refresh(refresh_view=False) self.refresh(refresh_view=False)
self.select(indexes) self.select(indexes)
self.view.refresh() self.view.refresh()
def save_session(self): def save_session(self):
self.columns.save_columns() self._columns.save_columns()


@@ -1,21 +1,23 @@
# Created By: Virgil Dupras # Created By: Virgil Dupras
# Created On: 2010-02-11 # Created On: 2010-02-11
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from .base import DupeGuruGUIObject from core.gui.base import DupeGuruGUIObject
class StatsLabel(DupeGuruGUIObject): class StatsLabel(DupeGuruGUIObject):
def _view_updated(self): def _view_updated(self):
self.view.refresh() self.view.refresh()
@property @property
def display(self): def display(self):
return self.app.stat_line return self.app.stat_line
def results_changed(self): def results_changed(self):
self.view.refresh() self.view.refresh()
marking_changed = results_changed marking_changed = results_changed


@@ -10,16 +10,17 @@ from xml.etree import ElementTree as ET
from hscommon.util import FileOrPath from hscommon.util import FileOrPath
class IgnoreList: class IgnoreList:
"""An ignore list implementation that is iterable, filterable and exportable to XML. """An ignore list implementation that is iterable, filterable and exportable to XML.
Call Ignore to add an ignore list entry, and AreIgnored to check if 2 items are in the list. Call Ignore to add an ignore list entry, and AreIgnored to check if 2 items are in the list.
When iterated, 2-sized tuples are returned, each containing 2 items ignored together. When iterated, 2-sized tuples are returned, each containing 2 items ignored together.
""" """
#---Override
# ---Override
def __init__(self): def __init__(self):
self._ignored = {} self.clear()
self._count = 0
def __iter__(self): def __iter__(self):
for first, seconds in self._ignored.items(): for first, seconds in self._ignored.items():
@@ -29,8 +30,8 @@ class IgnoreList:
def __len__(self): def __len__(self):
return self._count return self._count
#---Public # ---Public
def AreIgnored(self, first, second): def are_ignored(self, first, second):
def do_check(first, second): def do_check(first, second):
try: try:
matches = self._ignored[first] matches = self._ignored[first]
@@ -40,23 +41,23 @@ class IgnoreList:
return do_check(first, second) or do_check(second, first) return do_check(first, second) or do_check(second, first)
def Clear(self): def clear(self):
self._ignored = {} self._ignored = {}
self._count = 0 self._count = 0
def Filter(self, func): def filter(self, func):
"""Applies a filter on all ignored items, and remove all matches where func(first,second) """Applies a filter on all ignored items, and remove all matches where func(first,second)
doesn't return True. doesn't return True.
""" """
filtered = IgnoreList() filtered = IgnoreList()
for first, second in self: for first, second in self:
if func(first, second): if func(first, second):
filtered.Ignore(first, second) filtered.ignore(first, second)
self._ignored = filtered._ignored self._ignored = filtered._ignored
self._count = filtered._count self._count = filtered._count
def Ignore(self, first, second): def ignore(self, first, second):
if self.AreIgnored(first, second): if self.are_ignored(first, second):
return return
try: try:
matches = self._ignored[first] matches = self._ignored[first]
@@ -86,9 +87,8 @@ class IgnoreList:
except KeyError: except KeyError:
return False return False
if not inner(first, second): if not inner(first, second) and not inner(second, first):
if not inner(second, first): raise ValueError()
raise ValueError()
def load_from_xml(self, infile): def load_from_xml(self, infile):
"""Loads the ignore list from a XML created with save_to_xml. """Loads the ignore list from a XML created with save_to_xml.
@@ -99,31 +99,29 @@ class IgnoreList:
root = ET.parse(infile).getroot() root = ET.parse(infile).getroot()
except Exception: except Exception:
return return
file_elems = (e for e in root if e.tag == 'file') file_elems = (e for e in root if e.tag == "file")
for fn in file_elems: for fn in file_elems:
file_path = fn.get('path') file_path = fn.get("path")
if not file_path: if not file_path:
continue continue
subfile_elems = (e for e in fn if e.tag == 'file') subfile_elems = (e for e in fn if e.tag == "file")
for sfn in subfile_elems: for sfn in subfile_elems:
subfile_path = sfn.get('path') subfile_path = sfn.get("path")
if subfile_path: if subfile_path:
self.Ignore(file_path, subfile_path) self.ignore(file_path, subfile_path)
def save_to_xml(self, outfile): def save_to_xml(self, outfile):
"""Create a XML file that can be used by load_from_xml. """Create a XML file that can be used by load_from_xml.
outfile can be a file object or a filename. outfile can be a file object or a filename.
""" """
root = ET.Element('ignore_list') root = ET.Element("ignore_list")
for filename, subfiles in self._ignored.items(): for filename, subfiles in self._ignored.items():
file_node = ET.SubElement(root, 'file') file_node = ET.SubElement(root, "file")
file_node.set('path', filename) file_node.set("path", filename)
for subfilename in subfiles: for subfilename in subfiles:
subfile_node = ET.SubElement(file_node, 'file') subfile_node = ET.SubElement(file_node, "file")
subfile_node.set('path', subfilename) subfile_node.set("path", subfilename)
tree = ET.ElementTree(root) tree = ET.ElementTree(root)
with FileOrPath(outfile, 'wb') as fp: with FileOrPath(outfile, "wb") as fp:
tree.write(fp, encoding='utf-8') tree.write(fp, encoding="utf-8")
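The hunks above rename the IgnoreList API from CamelCase to snake_case: Ignore() becomes ignore(), AreIgnored() becomes are_ignored(), Clear() becomes clear() and Filter() becomes filter(). A short usage sketch of the renamed API; the import path is an assumption, since the file header is not visible in this diff:

from core.ignore import IgnoreList  # assumed module path

il = IgnoreList()
il.ignore("/photos/a.jpg", "/photos/b.jpg")               # was Ignore()
assert il.are_ignored("/photos/b.jpg", "/photos/a.jpg")   # was AreIgnored(); order does not matter
assert len(il) == 1

# filter() (was Filter()) keeps only the pairs for which the predicate returns True.
il.filter(lambda first, second: "a.jpg" not in first)
assert len(il) == 0

il.clear()                                                 # was Clear()
il.save_to_xml("ignore_list.xml")                          # XML export, as the docstring says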


@@ -2,40 +2,43 @@
# Created On: 2006/02/23 # Created On: 2006/02/23
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
class Markable: class Markable:
def __init__(self): def __init__(self):
self.__marked = set() self.__marked = set()
self.__inverted = False self.__inverted = False
#---Virtual # ---Virtual
#About did_mark and did_unmark: They only happen when an object is actually added/removed # About did_mark and did_unmark: They only happen when an object is actually added/removed
# in self.__marked, and is not affected by __inverted. Thus, self.mark while __inverted # in self.__marked, and is not affected by __inverted. Thus, self.mark while __inverted
#is True will launch _DidUnmark. # is True will launch _DidUnmark.
def _did_mark(self, o): def _did_mark(self, o):
# Implemented in child classes
pass pass
def _did_unmark(self, o): def _did_unmark(self, o):
# Implemented in child classes
pass pass
def _get_markable_count(self): def _get_markable_count(self):
return 0 return 0
def _is_markable(self, o): def _is_markable(self, o):
return True return True
#---Protected # ---Protected
def _remove_mark_flag(self, o): def _remove_mark_flag(self, o):
try: try:
self.__marked.remove(o) self.__marked.remove(o)
self._did_unmark(o) self._did_unmark(o)
except KeyError: except KeyError:
pass pass
#---Public # ---Public
def is_marked(self, o): def is_marked(self, o):
if not self._is_markable(o): if not self._is_markable(o):
return False return False
@@ -43,31 +46,31 @@ class Markable:
if self.__inverted: if self.__inverted:
is_marked = not is_marked is_marked = not is_marked
return is_marked return is_marked
def mark(self, o): def mark(self, o):
if self.is_marked(o): if self.is_marked(o):
return False return False
if not self._is_markable(o): if not self._is_markable(o):
return False return False
return self.mark_toggle(o) return self.mark_toggle(o)
def mark_multiple(self, objects): def mark_multiple(self, objects):
for o in objects: for o in objects:
self.mark(o) self.mark(o)
def mark_all(self): def mark_all(self):
self.mark_none() self.mark_none()
self.__inverted = True self.__inverted = True
def mark_invert(self): def mark_invert(self):
self.__inverted = not self.__inverted self.__inverted = not self.__inverted
def mark_none(self): def mark_none(self):
for o in self.__marked: for o in self.__marked:
self._did_unmark(o) self._did_unmark(o)
self.__marked = set() self.__marked = set()
self.__inverted = False self.__inverted = False
def mark_toggle(self, o): def mark_toggle(self, o):
try: try:
self.__marked.remove(o) self.__marked.remove(o)
@@ -78,32 +81,33 @@ class Markable:
self.__marked.add(o) self.__marked.add(o)
self._did_mark(o) self._did_mark(o)
return True return True
def mark_toggle_multiple(self, objects): def mark_toggle_multiple(self, objects):
for o in objects: for o in objects:
self.mark_toggle(o) self.mark_toggle(o)
def unmark(self, o): def unmark(self, o):
if not self.is_marked(o): if not self.is_marked(o):
return False return False
return self.mark_toggle(o) return self.mark_toggle(o)
def unmark_multiple(self, objects): def unmark_multiple(self, objects):
for o in objects: for o in objects:
self.unmark(o) self.unmark(o)
#--- Properties # --- Properties
@property @property
def mark_count(self): def mark_count(self):
if self.__inverted: if self.__inverted:
return self._get_markable_count() - len(self.__marked) return self._get_markable_count() - len(self.__marked)
else: else:
return len(self.__marked) return len(self.__marked)
@property @property
def mark_inverted(self): def mark_inverted(self):
return self.__inverted return self.__inverted
class MarkableList(list, Markable): class MarkableList(list, Markable):
def __init__(self): def __init__(self):
list.__init__(self) list.__init__(self)
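The comment above about did_mark/did_unmark is the key to this class: mark_all() does not add every object to the internal set, it clears the set and flips an inverted flag, so the hooks only fire when the set itself changes. A hedged sketch of the observable behavior with a minimal subclass (the module path and the idea of a fixed object pool are assumptions for illustration):

from core.markable import Markable  # assumed module path

class MarkablePool(Markable):
    # Minimal subclass: a fixed pool of markable objects.
    def __init__(self, objects):
        Markable.__init__(self)
        self._objects = list(objects)

    def _get_markable_count(self):
        return len(self._objects)

    def _is_markable(self, o):
        return o in self._objects

pool = MarkablePool(["a", "b", "c"])
pool.mark("a")
assert pool.mark_count == 1

pool.mark_all()                 # clears the set and flips the inverted flag
assert pool.mark_count == 3 and pool.mark_inverted
assert pool.is_marked("b")      # marked by inversion, not by an explicit entry

pool.mark_invert()              # flip back: no explicit marks remain
assert pool.mark_count == 0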


@@ -1 +1 @@
from . import fs, prioritize, result_table, scanner # noqa from core.me import fs, prioritize, result_table, scanner # noqa


@@ -6,39 +6,54 @@
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hsaudiotag import auto import mutagen
from hscommon.util import get_file_ext, format_size, format_time from hscommon.util import get_file_ext, format_size, format_time
from core.util import format_timestamp, format_perc, format_words, format_dupe_count from core.util import format_timestamp, format_perc, format_words, format_dupe_count
from core import fs from core import fs
TAG_FIELDS = { TAG_FIELDS = {
'audiosize', 'duration', 'bitrate', 'samplerate', 'title', 'artist', "audiosize",
'album', 'genre', 'year', 'track', 'comment' "duration",
"bitrate",
"samplerate",
"title",
"artist",
"album",
"genre",
"year",
"track",
"comment",
} }
# This is a temporary workaround for migration from hsaudiotag for the can_handle method
SUPPORTED_EXTS = {"mp3", "wma", "m4a", "m4p", "ogg", "flac", "aif", "aiff", "aifc"}
class MusicFile(fs.File): class MusicFile(fs.File):
INITIAL_INFO = fs.File.INITIAL_INFO.copy() INITIAL_INFO = fs.File.INITIAL_INFO.copy()
INITIAL_INFO.update({ INITIAL_INFO.update(
'audiosize': 0, {
'bitrate': 0, "audiosize": 0,
'duration': 0, "bitrate": 0,
'samplerate': 0, "duration": 0,
'artist': '', "samplerate": 0,
'album': '', "artist": "",
'title': '', "album": "",
'genre': '', "title": "",
'comment': '', "genre": "",
'year': '', "comment": "",
'track': 0, "year": "",
}) "track": 0,
}
)
__slots__ = fs.File.__slots__ + tuple(INITIAL_INFO.keys()) __slots__ = fs.File.__slots__ + tuple(INITIAL_INFO.keys())
@classmethod @classmethod
def can_handle(cls, path): def can_handle(cls, path):
if not fs.File.can_handle(path): if not fs.File.can_handle(path):
return False return False
return get_file_ext(path.name) in auto.EXT2CLASS return get_file_ext(path.name) in SUPPORTED_EXTS
def get_display_info(self, group, delta): def get_display_info(self, group, delta):
size = self.size size = self.size
@@ -60,45 +75,41 @@ class MusicFile(fs.File):
else: else:
percentage = group.percentage percentage = group.percentage
dupe_count = len(group.dupes) dupe_count = len(group.dupes)
dupe_folder_path = getattr(self, 'display_folder_path', self.folder_path) dupe_folder_path = getattr(self, "display_folder_path", self.folder_path)
return { return {
'name': self.name, "name": self.name,
'folder_path': str(dupe_folder_path), "folder_path": str(dupe_folder_path),
'size': format_size(size, 2, 2, False), "size": format_size(size, 2, 2, False),
'duration': format_time(duration, with_hours=False), "duration": format_time(duration, with_hours=False),
'bitrate': str(bitrate), "bitrate": str(bitrate),
'samplerate': str(samplerate), "samplerate": str(samplerate),
'extension': self.extension, "extension": self.extension,
'mtime': format_timestamp(mtime, delta and m), "mtime": format_timestamp(mtime, delta and m),
'title': self.title, "title": self.title,
'artist': self.artist, "artist": self.artist,
'album': self.album, "album": self.album,
'genre': self.genre, "genre": self.genre,
'year': self.year, "year": self.year,
'track': str(self.track), "track": str(self.track),
'comment': self.comment, "comment": self.comment,
'percentage': format_perc(percentage), "percentage": format_perc(percentage),
'words': format_words(self.words) if hasattr(self, 'words') else '', "words": format_words(self.words) if hasattr(self, "words") else "",
'dupe_count': format_dupe_count(dupe_count), "dupe_count": format_dupe_count(dupe_count),
} }
def _get_md5partial_offset_and_size(self):
f = auto.File(str(self.path))
return (f.audio_offset, f.audio_size)
def _read_info(self, field): def _read_info(self, field):
fs.File._read_info(self, field) fs.File._read_info(self, field)
if field in TAG_FIELDS: if field in TAG_FIELDS:
f = auto.File(str(self.path)) # The various conversions here are to make this look like the previous implementation
self.audiosize = f.audio_size file = mutagen.File(str(self.path), easy=True)
self.bitrate = f.bitrate self.audiosize = self.path.stat().st_size
self.duration = f.duration self.bitrate = file.info.bitrate / 1000
self.samplerate = f.sample_rate self.duration = file.info.length
self.artist = f.artist self.samplerate = file.info.sample_rate
self.album = f.album self.artist = ", ".join(file.tags.get("artist") or [])
self.title = f.title self.album = ", ".join(file.tags.get("album") or [])
self.genre = f.genre self.title = ", ".join(file.tags.get("title") or [])
self.comment = f.comment self.genre = ", ".join(file.tags.get("genre") or [])
self.year = f.year self.comment = ", ".join(file.tags.get("comment") or [""])
self.track = f.track self.year = ", ".join(file.tags.get("date") or [])
self.track = (file.tags.get("tracknumber") or [""])[0]
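The hunk above replaces hsaudiotag with mutagen and converts the values back into the shapes the old implementation produced: bitrate in kbit/s rather than bit/s, tag lists joined into single strings, and the file size standing in for the audio stream size. A hedged sketch of the same conversions on a single file, outside the MusicFile class (it assumes the path points at an audio format mutagen can open):

import os
import mutagen

def read_audio_info(path: str) -> dict:
    # mutagen.File() returns None for unsupported files; a supported format is assumed here.
    file = mutagen.File(path, easy=True)
    tags = file.tags or {}
    return {
        "audiosize": os.path.getsize(path),         # whole-file size stands in for audio size
        "bitrate": file.info.bitrate / 1000,        # mutagen reports bit/s; dupeGuru shows kbit/s
        "duration": file.info.length,               # seconds
        "samplerate": file.info.sample_rate,
        "artist": ", ".join(tags.get("artist") or []),   # "easy" tags are lists of strings
        "album": ", ".join(tags.get("album") or []),
        "title": ", ".join(tags.get("title") or []),
        "year": ", ".join(tags.get("date") or []),       # "date" replaces the old year field
        "track": (tags.get("tracknumber") or [""])[0],
    }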


@@ -8,11 +8,16 @@
from hscommon.trans import trget from hscommon.trans import trget
from core.prioritize import ( from core.prioritize import (
KindCategory, FolderCategory, FilenameCategory, NumericalCategory, KindCategory,
SizeCategory, MtimeCategory FolderCategory,
FilenameCategory,
NumericalCategory,
SizeCategory,
MtimeCategory,
) )
coltr = trget('columns') coltr = trget("columns")
class DurationCategory(NumericalCategory): class DurationCategory(NumericalCategory):
NAME = coltr("Duration") NAME = coltr("Duration")
@@ -20,21 +25,29 @@ class DurationCategory(NumericalCategory):
def extract_value(self, dupe): def extract_value(self, dupe):
return dupe.duration return dupe.duration
class BitrateCategory(NumericalCategory): class BitrateCategory(NumericalCategory):
NAME = coltr("Bitrate") NAME = coltr("Bitrate")
def extract_value(self, dupe): def extract_value(self, dupe):
return dupe.bitrate return dupe.bitrate
class SamplerateCategory(NumericalCategory): class SamplerateCategory(NumericalCategory):
NAME = coltr("Samplerate") NAME = coltr("Samplerate")
def extract_value(self, dupe): def extract_value(self, dupe):
return dupe.samplerate return dupe.samplerate
def all_categories(): def all_categories():
return [ return [
KindCategory, FolderCategory, FilenameCategory, SizeCategory, DurationCategory, KindCategory,
BitrateCategory, SamplerateCategory, MtimeCategory FolderCategory,
FilenameCategory,
SizeCategory,
DurationCategory,
BitrateCategory,
SamplerateCategory,
MtimeCategory,
] ]


@@ -1,8 +1,8 @@
# Created On: 2011-11-27 # Created On: 2011-11-27
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon.gui.column import Column from hscommon.gui.column import Column
@@ -10,28 +10,29 @@ from hscommon.trans import trget
from core.gui.result_table import ResultTable as ResultTableBase from core.gui.result_table import ResultTable as ResultTableBase
coltr = trget('columns') coltr = trget("columns")
class ResultTable(ResultTableBase): class ResultTable(ResultTableBase):
COLUMNS = [ COLUMNS = [
Column('marked', ''), Column("marked", ""),
Column('name', coltr("Filename")), Column("name", coltr("Filename")),
Column('folder_path', coltr("Folder"), visible=False, optional=True), Column("folder_path", coltr("Folder"), visible=False, optional=True),
Column('size', coltr("Size (MB)"), optional=True), Column("size", coltr("Size (MB)"), optional=True),
Column('duration', coltr("Time"), optional=True), Column("duration", coltr("Time"), optional=True),
Column('bitrate', coltr("Bitrate"), optional=True), Column("bitrate", coltr("Bitrate"), optional=True),
Column('samplerate', coltr("Sample Rate"), visible=False, optional=True), Column("samplerate", coltr("Sample Rate"), visible=False, optional=True),
Column('extension', coltr("Kind"), optional=True), Column("extension", coltr("Kind"), optional=True),
Column('mtime', coltr("Modification"), visible=False, optional=True), Column("mtime", coltr("Modification"), visible=False, optional=True),
Column('title', coltr("Title"), visible=False, optional=True), Column("title", coltr("Title"), visible=False, optional=True),
Column('artist', coltr("Artist"), visible=False, optional=True), Column("artist", coltr("Artist"), visible=False, optional=True),
Column('album', coltr("Album"), visible=False, optional=True), Column("album", coltr("Album"), visible=False, optional=True),
Column('genre', coltr("Genre"), visible=False, optional=True), Column("genre", coltr("Genre"), visible=False, optional=True),
Column('year', coltr("Year"), visible=False, optional=True), Column("year", coltr("Year"), visible=False, optional=True),
Column('track', coltr("Track Number"), visible=False, optional=True), Column("track", coltr("Track Number"), visible=False, optional=True),
Column('comment', coltr("Comment"), visible=False, optional=True), Column("comment", coltr("Comment"), visible=False, optional=True),
Column('percentage', coltr("Match %"), optional=True), Column("percentage", coltr("Match %"), optional=True),
Column('words', coltr("Words Used"), visible=False, optional=True), Column("words", coltr("Words Used"), visible=False, optional=True),
Column('dupe_count', coltr("Dupe Count"), visible=False, optional=True), Column("dupe_count", coltr("Dupe Count"), visible=False, optional=True),
] ]
DELTA_COLUMNS = {'size', 'duration', 'bitrate', 'samplerate', 'mtime'} DELTA_COLUMNS = {"size", "duration", "bitrate", "samplerate", "mtime"}


@@ -8,6 +8,7 @@ from hscommon.trans import tr
from core.scanner import Scanner as ScannerBase, ScanOption, ScanType from core.scanner import Scanner as ScannerBase, ScanOption, ScanType
class ScannerME(ScannerBase): class ScannerME(ScannerBase):
@staticmethod @staticmethod
def _key_func(dupe): def _key_func(dupe):
@@ -16,11 +17,9 @@ class ScannerME(ScannerBase):
@staticmethod @staticmethod
def get_scan_options(): def get_scan_options():
return [ return [
ScanOption(ScanType.Filename, tr("Filename")), ScanOption(ScanType.FILENAME, tr("Filename")),
ScanOption(ScanType.Fields, tr("Filename - Fields")), ScanOption(ScanType.FIELDS, tr("Filename - Fields")),
ScanOption(ScanType.FieldsNoOrder, tr("Filename - Fields (No Order)")), ScanOption(ScanType.FIELDSNOORDER, tr("Filename - Fields (No Order)")),
ScanOption(ScanType.Tag, tr("Tags")), ScanOption(ScanType.TAG, tr("Tags")),
ScanOption(ScanType.Contents, tr("Contents")), ScanOption(ScanType.CONTENTS, tr("Contents")),
] ]


@@ -1 +1,11 @@
from . import block, cache, exif, iphoto_plist, matchblock, matchexif, photo, prioritize, result_table, scanner # noqa from core.pe import ( # noqa
block,
cache,
exif,
matchblock,
matchexif,
photo,
prioritize,
result_table,
scanner,
)


@@ -6,7 +6,7 @@
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from ._block import NoBlocksError, DifferentBlockCountError, avgdiff, getblocks2 # NOQA from core.pe._block import NoBlocksError, DifferentBlockCountError, avgdiff, getblocks2 # NOQA
# Converted to C # Converted to C
# def getblock(image): # def getblock(image):

core/pe/block.pyi

@@ -0,0 +1,13 @@
from typing import Tuple, List, Union, Sequence
_block = Tuple[int, int, int]
class NoBlocksError(Exception): ... # noqa: E302, E701
class DifferentBlockCountError(Exception): ... # noqa E701
def getblock(image: object) -> Union[_block, None]: ... # noqa: E302
def getblocks2(image: object, block_count_per_side: int) -> Union[List[_block], None]: ...
def diff(first: _block, second: _block) -> int: ...
def avgdiff( # noqa: E302
first: Sequence[_block], second: Sequence[_block], limit: int = 768, min_iterations: int = 1
) -> Union[int, None]: ...


@@ -4,7 +4,8 @@
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from ._cache import string_to_colors # noqa from core.pe._cache import string_to_colors # noqa
def colors_to_string(colors): def colors_to_string(colors):
"""Transform the 3 sized tuples 'colors' into a hex string. """Transform the 3 sized tuples 'colors' into a hex string.
@@ -12,7 +13,8 @@ def colors_to_string(colors):
[(0,100,255)] --> 0064ff [(0,100,255)] --> 0064ff
[(1,2,3),(4,5,6)] --> 010203040506 [(1,2,3),(4,5,6)] --> 010203040506
""" """
return ''.join('%02x%02x%02x' % (r, g, b) for r, g, b in colors) return "".join("{:02x}{:02x}{:02x}".format(r, g, b) for r, g, b in colors)
# This function is an important bottleneck of dupeGuru PE. It has been converted to C. # This function is an important bottleneck of dupeGuru PE. It has been converted to C.
# def string_to_colors(s): # def string_to_colors(s):
@@ -23,4 +25,3 @@ def colors_to_string(colors):
# number = int(s[i:i+6], 16) # number = int(s[i:i+6], 16)
# result.append((number >> 16, (number >> 8) & 0xff, number & 0xff)) # result.append((number >> 16, (number >> 8) & 0xff, number & 0xff))
# return result # return result
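For reference, a pure-Python inverse that mirrors the commented-out string_to_colors above (the shipped decoder is the C version imported from core.pe._cache), with a round-trip check against the docstring examples:

from core.pe.cache import colors_to_string

def string_to_colors_py(s):
    # Decode 6 hex characters per (r, g, b) tuple, mirroring colors_to_string().
    result = []
    for i in range(0, len(s), 6):
        number = int(s[i:i + 6], 16)
        result.append((number >> 16, (number >> 8) & 0xFF, number & 0xFF))
    return result

assert colors_to_string([(0, 100, 255), (1, 2, 3)]) == "0064ff010203"
assert string_to_colors_py("0064ff010203") == [(0, 100, 255), (1, 2, 3)]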

core/pe/cache.pyi Normal file

@@ -0,0 +1,6 @@
from typing import Union, Tuple, List
_block = Tuple[int, int, int]
def colors_to_string(colors: List[_block]) -> str: ... # noqa: E302
def string_to_colors(s: str) -> Union[List[_block], None]: ...


@@ -10,31 +10,37 @@ import shelve
import tempfile import tempfile
from collections import namedtuple from collections import namedtuple
from .cache import string_to_colors, colors_to_string from core.pe.cache import string_to_colors, colors_to_string
def wrap_path(path): def wrap_path(path):
return 'path:{}'.format(path) return f"path:{path}"
def unwrap_path(key): def unwrap_path(key):
return key[5:] return key[5:]
def wrap_id(path): def wrap_id(path):
return 'id:{}'.format(path) return f"id:{path}"
def unwrap_id(key): def unwrap_id(key):
return int(key[3:]) return int(key[3:])
CacheRow = namedtuple('CacheRow', 'id path blocks mtime')
CacheRow = namedtuple("CacheRow", "id path blocks mtime")
class ShelveCache: class ShelveCache:
"""A class to cache picture blocks in a shelve backend. """A class to cache picture blocks in a shelve backend."""
"""
def __init__(self, db=None, readonly=False): def __init__(self, db=None, readonly=False):
self.istmp = db is None self.istmp = db is None
if self.istmp: if self.istmp:
self.dtmp = tempfile.mkdtemp() self.dtmp = tempfile.mkdtemp()
self.ftmp = db = op.join(self.dtmp, 'tmpdb') self.ftmp = db = op.join(self.dtmp, "tmpdb")
flag = 'r' if readonly else 'c' flag = "r" if readonly else "c"
self.shelve = shelve.open(db, flag) self.shelve = shelve.open(db, flag)
self.maxid = self._compute_maxid() self.maxid = self._compute_maxid()
@@ -54,10 +60,10 @@ class ShelveCache:
return string_to_colors(self.shelve[skey].blocks) return string_to_colors(self.shelve[skey].blocks)
def __iter__(self): def __iter__(self):
return (unwrap_path(k) for k in self.shelve if k.startswith('path:')) return (unwrap_path(k) for k in self.shelve if k.startswith("path:"))
def __len__(self): def __len__(self):
return sum(1 for k in self.shelve if k.startswith('path:')) return sum(1 for k in self.shelve if k.startswith("path:"))
def __setitem__(self, path_str, blocks): def __setitem__(self, path_str, blocks):
blocks = colors_to_string(blocks) blocks = colors_to_string(blocks)
@@ -74,7 +80,7 @@ class ShelveCache:
self.shelve[wrap_id(rowid)] = wrap_path(path_str) self.shelve[wrap_id(rowid)] = wrap_path(path_str)
def _compute_maxid(self): def _compute_maxid(self):
return max((unwrap_id(k) for k in self.shelve if k.startswith('id:')), default=1) return max((unwrap_id(k) for k in self.shelve if k.startswith("id:")), default=1)
def _get_new_id(self): def _get_new_id(self):
self.maxid += 1 self.maxid += 1
@@ -133,4 +139,3 @@ class ShelveCache:
# #402 and #439. I don't think it hurts to silently ignore the error, so that's # #402 and #439. I don't think it hurts to silently ignore the error, so that's
# what we do # what we do
pass pass
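ShelveCache multiplexes two key namespaces into a single shelf, so path lookups and rowid lookups coexist in one file. A hedged illustration using plain shelve, with placeholder values rather than real CacheRow contents:

import os.path as op
import shelve
import tempfile

d = tempfile.mkdtemp()
with shelve.open(op.join(d, "tmpdb"), "c") as db:
    db["path:/photos/a.jpg"] = "CacheRow-placeholder"     # the real cache stores a CacheRow here
    db["id:1"] = "path:/photos/a.jpg"                     # reverse mapping from rowid to path key
    paths = [k[5:] for k in db if k.startswith("path:")]  # what __iter__/__len__ scan for
assert paths == ["/photos/a.jpg"]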


@@ -9,12 +9,13 @@ import os.path as op
import logging import logging
import sqlite3 as sqlite import sqlite3 as sqlite
from .cache import string_to_colors, colors_to_string from core.pe.cache import string_to_colors, colors_to_string
class SqliteCache: class SqliteCache:
"""A class to cache picture blocks in a sqlite backend. """A class to cache picture blocks in a sqlite backend."""
"""
def __init__(self, db=':memory:', readonly=False): def __init__(self, db=":memory:", readonly=False):
# readonly is not used in the sqlite version of the cache # readonly is not used in the sqlite version of the cache
self.dbname = db self.dbname = db
self.con = None self.con = None
@@ -67,9 +68,9 @@ class SqliteCache:
try: try:
self.con.execute(sql, [blocks, mtime, path_str]) self.con.execute(sql, [blocks, mtime, path_str])
except sqlite.OperationalError: except sqlite.OperationalError:
logging.warning('Picture cache could not set value for key %r', path_str) logging.warning("Picture cache could not set value for key %r", path_str)
except sqlite.DatabaseError as e: except sqlite.DatabaseError as e:
logging.warning('DatabaseError while setting value for key %r: %s', path_str, str(e)) logging.warning("DatabaseError while setting value for key %r: %s", path_str, str(e))
def _create_con(self, second_try=False): def _create_con(self, second_try=False):
def create_tables(): def create_tables():
@@ -82,19 +83,19 @@ class SqliteCache:
self.con = sqlite.connect(self.dbname, isolation_level=None) self.con = sqlite.connect(self.dbname, isolation_level=None)
try: try:
self.con.execute("select path, mtime, blocks from pictures where 1=2") self.con.execute("select path, mtime, blocks from pictures where 1=2")
except sqlite.OperationalError: # new db except sqlite.OperationalError: # new db
create_tables() create_tables()
except sqlite.DatabaseError as e: # corrupted db except sqlite.DatabaseError as e: # corrupted db
if second_try: if second_try:
raise # Something really strange is happening raise # Something really strange is happening
logging.warning('Could not create picture cache because of an error: %s', str(e)) logging.warning("Could not create picture cache because of an error: %s", str(e))
self.con.close() self.con.close()
os.remove(self.dbname) os.remove(self.dbname)
self._create_con(second_try=True) self._create_con(second_try=True)
def clear(self): def clear(self):
self.close() self.close()
if self.dbname != ':memory:': if self.dbname != ":memory:":
os.remove(self.dbname) os.remove(self.dbname)
self._create_con() self._create_con()
@@ -117,7 +118,7 @@ class SqliteCache:
raise ValueError(path) raise ValueError(path)
def get_multiple(self, rowids): def get_multiple(self, rowids):
sql = "select rowid, blocks from pictures where rowid in (%s)" % ','.join(map(str, rowids)) sql = "select rowid, blocks from pictures where rowid in (%s)" % ",".join(map(str, rowids))
cur = self.con.execute(sql) cur = self.con.execute(sql)
return ((rowid, string_to_colors(blocks)) for rowid, blocks in cur) return ((rowid, string_to_colors(blocks)) for rowid, blocks in cur)
@@ -138,6 +139,5 @@ class SqliteCache:
continue continue
todelete.append(rowid) todelete.append(rowid)
if todelete: if todelete:
sql = "delete from pictures where rowid in (%s)" % ','.join(map(str, todelete)) sql = "delete from pictures where rowid in (%s)" % ",".join(map(str, todelete))
self.con.execute(sql) self.con.execute(sql)


@@ -83,17 +83,17 @@ EXIF_TAGS = {
0xA003: "PixelYDimension", 0xA003: "PixelYDimension",
0xA004: "RelatedSoundFile", 0xA004: "RelatedSoundFile",
0xA005: "InteroperabilityIFDPointer", 0xA005: "InteroperabilityIFDPointer",
0xA20B: "FlashEnergy", # 0x920B in TIFF/EP 0xA20B: "FlashEnergy", # 0x920B in TIFF/EP
0xA20C: "SpatialFrequencyResponse", # 0x920C - - 0xA20C: "SpatialFrequencyResponse", # 0x920C - -
0xA20E: "FocalPlaneXResolution", # 0x920E - - 0xA20E: "FocalPlaneXResolution", # 0x920E - -
0xA20F: "FocalPlaneYResolution", # 0x920F - - 0xA20F: "FocalPlaneYResolution", # 0x920F - -
0xA210: "FocalPlaneResolutionUnit", # 0x9210 - - 0xA210: "FocalPlaneResolutionUnit", # 0x9210 - -
0xA214: "SubjectLocation", # 0x9214 - - 0xA214: "SubjectLocation", # 0x9214 - -
0xA215: "ExposureIndex", # 0x9215 - - 0xA215: "ExposureIndex", # 0x9215 - -
0xA217: "SensingMethod", # 0x9217 - - 0xA217: "SensingMethod", # 0x9217 - -
0xA300: "FileSource", 0xA300: "FileSource",
0xA301: "SceneType", 0xA301: "SceneType",
0xA302: "CFAPattern", # 0x828E in TIFF/EP 0xA302: "CFAPattern", # 0x828E in TIFF/EP
0xA401: "CustomRendered", 0xA401: "CustomRendered",
0xA402: "ExposureMode", 0xA402: "ExposureMode",
0xA403: "WhiteBalance", 0xA403: "WhiteBalance",
@@ -148,17 +148,18 @@ GPS_TA0GS = {
0x1B: "GPSProcessingMethod", 0x1B: "GPSProcessingMethod",
0x1C: "GPSAreaInformation", 0x1C: "GPSAreaInformation",
0x1D: "GPSDateStamp", 0x1D: "GPSDateStamp",
0x1E: "GPSDifferential" 0x1E: "GPSDifferential",
} }
INTEL_ENDIAN = ord('I') INTEL_ENDIAN = ord("I")
MOTOROLA_ENDIAN = ord('M') MOTOROLA_ENDIAN = ord("M")
# About MAX_COUNT: It's possible to have corrupted exif tags where the entry count is way too high # About MAX_COUNT: It's possible to have corrupted exif tags where the entry count is way too high
# and thus makes us loop, not endlessly, but for heck of a long time for nothing. Therefore, we put # and thus makes us loop, not endlessly, but for heck of a long time for nothing. Therefore, we put
# an arbitrary limit on the entry count we'll allow ourselves to read and any IFD reporting more # an arbitrary limit on the entry count we'll allow ourselves to read and any IFD reporting more
# entries than that will be considered corrupt. # entries than that will be considered corrupt.
MAX_COUNT = 0xffff MAX_COUNT = 0xFFFF
def s2n_motorola(bytes): def s2n_motorola(bytes):
x = 0 x = 0
@@ -166,6 +167,7 @@ def s2n_motorola(bytes):
x = (x << 8) | c x = (x << 8) | c
return x return x
def s2n_intel(bytes): def s2n_intel(bytes):
x = 0 x = 0
y = 0 y = 0
@@ -174,13 +176,14 @@ def s2n_intel(bytes):
y = y + 8 y = y + 8
return x return x
class Fraction: class Fraction:
def __init__(self, num, den): def __init__(self, num, den):
self.num = num self.num = num
self.den = den self.den = den
def __repr__(self): def __repr__(self):
return '%d/%d' % (self.num, self.den) return "%d/%d" % (self.num, self.den)
class TIFF_file: class TIFF_file:
@@ -190,16 +193,22 @@ class TIFF_file:
self.s2nfunc = s2n_intel if self.endian == INTEL_ENDIAN else s2n_motorola self.s2nfunc = s2n_intel if self.endian == INTEL_ENDIAN else s2n_motorola
def s2n(self, offset, length, signed=0, debug=False): def s2n(self, offset, length, signed=0, debug=False):
slice = self.data[offset:offset+length] data_slice = self.data[offset : offset + length]
val = self.s2nfunc(slice) val = self.s2nfunc(data_slice)
# Sign extension ? # Sign extension ?
if signed: if signed:
msb = 1 << (8*length - 1) msb = 1 << (8 * length - 1)
if val & msb: if val & msb:
val = val - (msb << 1) val = val - (msb << 1)
if debug: if debug:
logging.debug(self.endian) logging.debug(self.endian)
logging.debug("Slice for offset %d length %d: %r and value: %d", offset, length, slice, val) logging.debug(
"Slice for offset %d length %d: %r and value: %d",
offset,
length,
data_slice,
val,
)
return val return val
def first_IFD(self): def first_IFD(self):
@@ -225,82 +234,84 @@ class TIFF_file:
return [] return []
a = [] a = []
for i in range(entries): for i in range(entries):
entry = ifd + 2 + 12*i entry = ifd + 2 + 12 * i
tag = self.s2n(entry, 2) tag = self.s2n(entry, 2)
type = self.s2n(entry+2, 2) entry_type = self.s2n(entry + 2, 2)
if not 1 <= type <= 10: if not 1 <= entry_type <= 10:
continue # not handled continue # not handled
typelen = [1, 1, 2, 4, 8, 1, 1, 2, 4, 8][type-1] typelen = [1, 1, 2, 4, 8, 1, 1, 2, 4, 8][entry_type - 1]
count = self.s2n(entry+4, 4) count = self.s2n(entry + 4, 4)
if count > MAX_COUNT: if count > MAX_COUNT:
logging.debug("Probably corrupt. Aborting.") logging.debug("Probably corrupt. Aborting.")
return [] return []
offset = entry+8 offset = entry + 8
if count*typelen > 4: if count * typelen > 4:
offset = self.s2n(offset, 4) offset = self.s2n(offset, 4)
if type == 2: if entry_type == 2:
# Special case: nul-terminated ASCII string # Special case: nul-terminated ASCII string
values = str(self.data[offset:offset+count-1], encoding='latin-1') values = str(self.data[offset : offset + count - 1], encoding="latin-1")
else: else:
values = [] values = []
signed = (type == 6 or type >= 8) signed = entry_type == 6 or entry_type >= 8
for j in range(count): for _ in range(count):
if type in {5, 10}: if entry_type in {5, 10}:
# The type is either 5 or 10 # The type is either 5 or 10
value_j = Fraction(self.s2n(offset, 4, signed), value_j = Fraction(self.s2n(offset, 4, signed), self.s2n(offset + 4, 4, signed))
self.s2n(offset+4, 4, signed))
else: else:
# Not a fraction # Not a fraction
value_j = self.s2n(offset, typelen, signed) value_j = self.s2n(offset, typelen, signed)
values.append(value_j) values.append(value_j)
offset = offset + typelen offset = offset + typelen
# Now "values" is either a string or an array # Now "values" is either a string or an array
a.append((tag, type, values)) a.append((tag, entry_type, values))
return a return a
def read_exif_header(fp): def read_exif_header(fp):
# If `fp`'s first bytes are not exif, it tries to find it in the next 4kb # If `fp`'s first bytes are not exif, it tries to find it in the next 4kb
def isexif(data): def isexif(data):
return data[0:4] == b'\377\330\377\341' and data[6:10] == b'Exif' return data[0:4] == b"\377\330\377\341" and data[6:10] == b"Exif"
data = fp.read(12) data = fp.read(12)
if isexif(data): if isexif(data):
return data return data
# ok, not exif, try to find it # ok, not exif, try to find it
large_data = fp.read(4096) large_data = fp.read(4096)
try: try:
index = large_data.index(b'Exif') index = large_data.index(b"Exif")
data = large_data[index-6:index+6] data = large_data[index - 6 : index + 6]
# large_data omits the first 12 bytes, and the index is at the middle of the header, so we # large_data omits the first 12 bytes, and the index is at the middle of the header, so we
# must seek index + 18 # must seek index + 18
fp.seek(index+18) fp.seek(index + 18)
return data return data
except ValueError: except ValueError:
raise ValueError("Not an Exif file") raise ValueError("Not an Exif file")
def get_fields(fp): def get_fields(fp):
data = read_exif_header(fp) data = read_exif_header(fp)
length = data[4] * 256 + data[5] length = data[4] * 256 + data[5]
logging.debug("Exif header length: %d bytes", length) logging.debug("Exif header length: %d bytes", length)
data = fp.read(length-8) data = fp.read(length - 8)
data_format = data[0] data_format = data[0]
logging.debug("%s format", {INTEL_ENDIAN: 'Intel', MOTOROLA_ENDIAN: 'Motorola'}[data_format]) logging.debug("%s format", {INTEL_ENDIAN: "Intel", MOTOROLA_ENDIAN: "Motorola"}[data_format])
T = TIFF_file(data) T = TIFF_file(data)
# There may be more than one IFD per file, but we only read the first one because others are # There may be more than one IFD per file, but we only read the first one because others are
# most likely thumbnails. # most likely thumbnails.
main_IFD_offset = T.first_IFD() main_ifd_offset = T.first_IFD()
result = {} result = {}
def add_tag_to_result(tag, values): def add_tag_to_result(tag, values):
try: try:
stag = EXIF_TAGS[tag] stag = EXIF_TAGS[tag]
except KeyError: except KeyError:
stag = '0x%04X' % tag stag = "0x%04X" % tag
if stag in result: if stag in result:
return # don't overwrite data return # don't overwrite data
result[stag] = values result[stag] = values
logging.debug("IFD at offset %d", main_IFD_offset) logging.debug("IFD at offset %d", main_ifd_offset)
IFD = T.dump_IFD(main_IFD_offset) IFD = T.dump_IFD(main_ifd_offset)
exif_off = gps_off = 0 exif_off = gps_off = 0
for tag, type, values in IFD: for tag, type, values in IFD:
if tag == 0x8769: if tag == 0x8769:
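The two s2n helpers differ only in byte order (Motorola is big-endian, Intel is little-endian). A worked example, with the Intel accumulator line reconstructed from the surrounding hunk:

def s2n_motorola(data):
    x = 0
    for c in data:
        x = (x << 8) | c
    return x

def s2n_intel(data):
    x = 0
    y = 0
    for c in data:
        x = x | (c << y)   # reconstructed line; low byte arrives first
        y = y + 8
    return x

assert s2n_motorola(b"\x01\x02") == 0x0102   # 258
assert s2n_intel(b"\x01\x02") == 0x0201      # 513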


@@ -1,31 +0,0 @@
# Created By: Virgil Dupras
# Created On: 2014-03-15
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
#
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
import plistlib
class IPhotoPlistParser(plistlib._PlistParser):
"""A parser for iPhoto plists.
iPhoto plists tend to be malformed, so we have to subclass the built-in parser to be a bit more
lenient.
"""
def __init__(self):
plistlib._PlistParser.__init__(self, use_builtin_types=True, dict_type=dict)
# For debugging purposes, we remember the last bit of data to be analyzed so that we can
# log it in case of an exception
self.lastdata = ''
def get_data(self):
self.lastdata = plistlib._PlistParser.get_data(self)
return self.lastdata
def end_integer(self):
try:
self.add_object(int(self.get_data()))
except ValueError:
self.add_object(0)


@@ -15,7 +15,7 @@ from hscommon.trans import tr
from hscommon.jobprogress import job from hscommon.jobprogress import job
from core.engine import Match from core.engine import Match
from .block import avgdiff, DifferentBlockCountError, NoBlocksError from core.pe.block import avgdiff, DifferentBlockCountError, NoBlocksError
# OPTIMIZATION NOTES: # OPTIMIZATION NOTES:
# The bottleneck of the matching phase is CPU, which is why we use multiprocessing. However, another # The bottleneck of the matching phase is CPU, which is why we use multiprocessing. However, another
@@ -48,14 +48,18 @@ except Exception:
logging.warning("Had problems to determine cpu count on launch.") logging.warning("Had problems to determine cpu count on launch.")
RESULTS_QUEUE_LIMIT = 8 RESULTS_QUEUE_LIMIT = 8
def get_cache(cache_path, readonly=False): def get_cache(cache_path, readonly=False):
if cache_path.endswith('shelve'): if cache_path.endswith("shelve"):
from .cache_shelve import ShelveCache from core.pe.cache_shelve import ShelveCache
return ShelveCache(cache_path, readonly=readonly) return ShelveCache(cache_path, readonly=readonly)
else: else:
from .cache_sqlite import SqliteCache from core.pe.cache_sqlite import SqliteCache
return SqliteCache(cache_path, readonly=readonly) return SqliteCache(cache_path, readonly=readonly)
def prepare_pictures(pictures, cache_path, with_dimensions, j=job.nulljob): def prepare_pictures(pictures, cache_path, with_dimensions, j=job.nulljob):
# The MemoryError handlers in there use logging without first caring about whether or not # The MemoryError handlers in there use logging without first caring about whether or not
# there is enough memory left to carry on the operation because it is assumed that the # there is enough memory left to carry on the operation because it is assumed that the
@@ -63,7 +67,7 @@ def prepare_pictures(pictures, cache_path, with_dimensions, j=job.nulljob):
# time that MemoryError is raised. # time that MemoryError is raised.
cache = get_cache(cache_path) cache = get_cache(cache_path)
cache.purge_outdated() cache.purge_outdated()
prepared = [] # only pictures for which there was no error getting blocks prepared = [] # only pictures for which there was no error getting blocks
try: try:
for picture in j.iter_with_progress(pictures, tr("Analyzed %d/%d pictures")): for picture in j.iter_with_progress(pictures, tr("Analyzed %d/%d pictures")):
if not picture.path: if not picture.path:
@@ -77,41 +81,50 @@ def prepare_pictures(pictures, cache_path, with_dimensions, j=job.nulljob):
picture.unicode_path = str(picture.path) picture.unicode_path = str(picture.path)
logging.debug("Analyzing picture at %s", picture.unicode_path) logging.debug("Analyzing picture at %s", picture.unicode_path)
if with_dimensions: if with_dimensions:
picture.dimensions # pre-read dimensions picture.dimensions # pre-read dimensions
try: try:
if picture.unicode_path not in cache: if picture.unicode_path not in cache:
blocks = picture.get_blocks(BLOCK_COUNT_PER_SIDE) blocks = picture.get_blocks(BLOCK_COUNT_PER_SIDE)
cache[picture.unicode_path] = blocks cache[picture.unicode_path] = blocks
prepared.append(picture) prepared.append(picture)
except (IOError, ValueError) as e: except (OSError, ValueError) as e:
logging.warning(str(e)) logging.warning(str(e))
except MemoryError: except MemoryError:
logging.warning("Ran out of memory while reading %s of size %d", picture.unicode_path, picture.size) logging.warning(
if picture.size < 10 * 1024 * 1024: # We're really running out of memory "Ran out of memory while reading %s of size %d",
picture.unicode_path,
picture.size,
)
if picture.size < 10 * 1024 * 1024: # We're really running out of memory
raise raise
except MemoryError: except MemoryError:
logging.warning('Ran out of memory while preparing pictures') logging.warning("Ran out of memory while preparing pictures")
cache.close() cache.close()
return prepared return prepared
def get_chunks(pictures): def get_chunks(pictures):
min_chunk_count = multiprocessing.cpu_count() * 2 # have enough chunks to feed all subprocesses min_chunk_count = multiprocessing.cpu_count() * 2 # have enough chunks to feed all subprocesses
chunk_count = len(pictures) // DEFAULT_CHUNK_SIZE chunk_count = len(pictures) // DEFAULT_CHUNK_SIZE
chunk_count = max(min_chunk_count, chunk_count) chunk_count = max(min_chunk_count, chunk_count)
chunk_size = (len(pictures) // chunk_count) + 1 chunk_size = (len(pictures) // chunk_count) + 1
chunk_size = max(MIN_CHUNK_SIZE, chunk_size) chunk_size = max(MIN_CHUNK_SIZE, chunk_size)
logging.info( logging.info(
"Creating %d chunks with a chunk size of %d for %d pictures", chunk_count, "Creating %d chunks with a chunk size of %d for %d pictures",
chunk_size, len(pictures) chunk_count,
chunk_size,
len(pictures),
) )
chunks = [pictures[i:i+chunk_size] for i in range(0, len(pictures), chunk_size)] chunks = [pictures[i : i + chunk_size] for i in range(0, len(pictures), chunk_size)]
return chunks return chunks
def get_match(first, second, percentage): def get_match(first, second, percentage):
if percentage < 0: if percentage < 0:
percentage = 0 percentage = 0
return Match(first, second, percentage) return Match(first, second, percentage)
def async_compare(ref_ids, other_ids, dbname, threshold, picinfo): def async_compare(ref_ids, other_ids, dbname, threshold, picinfo):
# The list of ids in ref_ids have to be compared to the list of ids in other_ids. other_ids # The list of ids in ref_ids have to be compared to the list of ids in other_ids. other_ids
# can be None. In this case, ref_ids has to be compared with itself # can be None. In this case, ref_ids has to be compared with itself
@@ -142,6 +155,7 @@ def async_compare(ref_ids, other_ids, dbname, threshold, picinfo):
cache.close() cache.close()
return results return results
def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljob): def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljob):
def get_picinfo(p): def get_picinfo(p):
if match_scaled: if match_scaled:
@@ -160,7 +174,10 @@ def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljo
async_results.remove(result) async_results.remove(result)
comparison_count += 1 comparison_count += 1
# About the NOQA below: I think there's a bug in pyflakes. To investigate... # About the NOQA below: I think there's a bug in pyflakes. To investigate...
progress_msg = tr("Performed %d/%d chunk matches") % (comparison_count, len(comparisons_to_do)) # NOQA progress_msg = tr("Performed %d/%d chunk matches") % (
comparison_count,
len(comparisons_to_do),
) # NOQA
j.set_progress(comparison_count, progress_msg) j.set_progress(comparison_count, progress_msg)
j = j.start_subjob([3, 7]) j = j.start_subjob([3, 7])
@@ -175,7 +192,7 @@ def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljo
except ValueError: except ValueError:
pass pass
cache.close() cache.close()
pictures = [p for p in pictures if hasattr(p, 'cache_id')] pictures = [p for p in pictures if hasattr(p, "cache_id")]
pool = multiprocessing.Pool() pool = multiprocessing.Pool()
async_results = [] async_results = []
matches = [] matches = []
@@ -203,9 +220,13 @@ def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljo
# some wiggle room, log about the incident, and stop matching right here. We then process # some wiggle room, log about the incident, and stop matching right here. We then process
# the matches we have. The rest of the process doesn't allocate much and we should be # the matches we have. The rest of the process doesn't allocate much and we should be
# alright. # alright.
del comparisons_to_do, chunks, pictures # some wiggle room for the next statements del (
comparisons_to_do,
chunks,
pictures,
) # some wiggle room for the next statements
logging.warning("Ran out of memory when scanning! We had %d matches.", len(matches)) logging.warning("Ran out of memory when scanning! We had %d matches.", len(matches))
del matches[-len(matches)//3:] # some wiggle room to ensure we don't run out of memory again. del matches[-len(matches) // 3 :] # some wiggle room to ensure we don't run out of memory again.
pool.close() pool.close()
result = [] result = []
myiter = j.iter_with_progress( myiter = j.iter_with_progress(
@@ -217,13 +238,14 @@ def getmatches(pictures, cache_path, threshold, match_scaled=False, j=job.nulljo
for ref_id, other_id, percentage in myiter: for ref_id, other_id, percentage in myiter:
ref = id2picture[ref_id] ref = id2picture[ref_id]
other = id2picture[other_id] other = id2picture[other_id]
if percentage == 100 and ref.md5 != other.md5: if percentage == 100 and ref.digest != other.digest:
percentage = 99 percentage = 99
if percentage >= threshold: if percentage >= threshold:
ref.dimensions # pre-read dimensions for display in results ref.dimensions # pre-read dimensions for display in results
other.dimensions other.dimensions
result.append(get_match(ref, other, percentage)) result.append(get_match(ref, other, percentage))
pool.join()
return result return result
multiprocessing.freeze_support()
multiprocessing.freeze_support()
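get_chunks() above sizes the work units so every worker process stays busy. A hedged walk-through of the arithmetic with assumed constants (DEFAULT_CHUNK_SIZE and MIN_CHUNK_SIZE are module constants not shown in this hunk):

DEFAULT_CHUNK_SIZE = 1000   # assumed value, for illustration only
MIN_CHUNK_SIZE = 100        # assumed value, for illustration only
cpu_count = 4               # assumed machine
pictures = list(range(2500))

min_chunk_count = cpu_count * 2                                          # 8 chunks minimum
chunk_count = max(min_chunk_count, len(pictures) // DEFAULT_CHUNK_SIZE)  # max(8, 2) -> 8
chunk_size = max(MIN_CHUNK_SIZE, (len(pictures) // chunk_count) + 1)     # max(100, 313) -> 313
chunks = [pictures[i:i + chunk_size] for i in range(0, len(pictures), chunk_size)]
assert len(chunks) == 8 and sum(len(c) for c in chunks) == 2500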


@@ -13,14 +13,15 @@ from hscommon.trans import tr
from core.engine import Match from core.engine import Match
def getmatches(files, match_scaled, j): def getmatches(files, match_scaled, j):
timestamp2pic = defaultdict(set) timestamp2pic = defaultdict(set)
for picture in j.iter_with_progress(files, tr("Read EXIF of %d/%d pictures")): for picture in j.iter_with_progress(files, tr("Read EXIF of %d/%d pictures")):
timestamp = picture.exif_timestamp timestamp = picture.exif_timestamp
if timestamp: if timestamp:
timestamp2pic[timestamp].add(picture) timestamp2pic[timestamp].add(picture)
if '0000:00:00 00:00:00' in timestamp2pic: # very likely false matches if "0000:00:00 00:00:00" in timestamp2pic: # very likely false matches
del timestamp2pic['0000:00:00 00:00:00'] del timestamp2pic["0000:00:00 00:00:00"]
matches = [] matches = []
for pictures in timestamp2pic.values(): for pictures in timestamp2pic.values():
for p1, p2 in combinations(pictures, 2): for p1, p2 in combinations(pictures, 2):
@@ -28,4 +29,3 @@ def getmatches(files, match_scaled, j):
continue continue
matches.append(Match(p1, p2, 100)) matches.append(Match(p1, p2, 100))
return matches return matches


@@ -2,9 +2,9 @@
* Created On: 2010-01-30 * Created On: 2010-01-30
* Copyright 2014 Hardcoded Software (http://www.hardcoded.net) * Copyright 2014 Hardcoded Software (http://www.hardcoded.net)
* *
* This software is licensed under the "BSD" License as described in the "LICENSE" file, * This software is licensed under the "BSD" License as described in the
* which should be included with this package. The terms are also available at * "LICENSE" file, which should be included with this package. The terms are
* http://www.hardcoded.net/licenses/bsd_license * also available at http://www.hardcoded.net/licenses/bsd_license
*/ */
#include "common.h" #include "common.h"
@@ -14,86 +14,84 @@ static PyObject *NoBlocksError;
/* avgdiff/maxdiff has been called with 2 block lists of different size. */ /* avgdiff/maxdiff has been called with 2 block lists of different size. */
static PyObject *DifferentBlockCountError; static PyObject *DifferentBlockCountError;
/* Returns a 3 sized tuple containing the mean color of 'image'. /* Returns a 3 sized tuple containing the mean color of 'image'.
* image: a PIL image or crop. * image: a PIL image or crop.
*/ */
static PyObject* getblock(PyObject *image) static PyObject *getblock(PyObject *image) {
{ int i, totr, totg, totb;
int i, totr, totg, totb; Py_ssize_t pixel_count;
Py_ssize_t pixel_count; PyObject *ppixels;
PyObject *ppixels;
totr = totg = totb = 0;
totr = totg = totb = 0; ppixels = PyObject_CallMethod(image, "getdata", NULL);
ppixels = PyObject_CallMethod(image, "getdata", NULL); if (ppixels == NULL) {
if (ppixels == NULL) { return NULL;
return NULL; }
}
pixel_count = PySequence_Length(ppixels);
pixel_count = PySequence_Length(ppixels); for (i = 0; i < pixel_count; i++) {
for (i=0; i<pixel_count; i++) { PyObject *ppixel, *pr, *pg, *pb;
PyObject *ppixel, *pr, *pg, *pb; int r, g, b;
int r, g, b;
ppixel = PySequence_ITEM(ppixels, i);
ppixel = PySequence_ITEM(ppixels, i); pr = PySequence_ITEM(ppixel, 0);
pr = PySequence_ITEM(ppixel, 0); pg = PySequence_ITEM(ppixel, 1);
pg = PySequence_ITEM(ppixel, 1); pb = PySequence_ITEM(ppixel, 2);
pb = PySequence_ITEM(ppixel, 2); Py_DECREF(ppixel);
Py_DECREF(ppixel); r = PyLong_AsLong(pr);
r = PyLong_AsLong(pr); g = PyLong_AsLong(pg);
g = PyLong_AsLong(pg); b = PyLong_AsLong(pb);
b = PyLong_AsLong(pb); Py_DECREF(pr);
Py_DECREF(pr); Py_DECREF(pg);
Py_DECREF(pg); Py_DECREF(pb);
Py_DECREF(pb);
totr += r;
totr += r; totg += g;
totg += g; totb += b;
totb += b; }
}
Py_DECREF(ppixels);
Py_DECREF(ppixels);
if (pixel_count) {
if (pixel_count) { totr /= pixel_count;
totr /= pixel_count; totg /= pixel_count;
totg /= pixel_count; totb /= pixel_count;
totb /= pixel_count; }
}
return inttuple(3, totr, totg, totb);
return inttuple(3, totr, totg, totb);
} }
/* Returns the difference between the first block and the second. /* Returns the difference between the first block and the second.
* It returns an absolute sum of the 3 differences (RGB). * It returns an absolute sum of the 3 differences (RGB).
*/ */
static int diff(PyObject *first, PyObject *second) static int diff(PyObject *first, PyObject *second) {
{ int r1, g1, b1, r2, b2, g2;
int r1, g1, b1, r2, b2, g2; PyObject *pr, *pg, *pb;
PyObject *pr, *pg, *pb; pr = PySequence_ITEM(first, 0);
pr = PySequence_ITEM(first, 0); pg = PySequence_ITEM(first, 1);
pg = PySequence_ITEM(first, 1); pb = PySequence_ITEM(first, 2);
pb = PySequence_ITEM(first, 2); r1 = PyLong_AsLong(pr);
r1 = PyLong_AsLong(pr); g1 = PyLong_AsLong(pg);
g1 = PyLong_AsLong(pg); b1 = PyLong_AsLong(pb);
b1 = PyLong_AsLong(pb); Py_DECREF(pr);
Py_DECREF(pr); Py_DECREF(pg);
Py_DECREF(pg); Py_DECREF(pb);
Py_DECREF(pb);
pr = PySequence_ITEM(second, 0);
pr = PySequence_ITEM(second, 0); pg = PySequence_ITEM(second, 1);
pg = PySequence_ITEM(second, 1); pb = PySequence_ITEM(second, 2);
pb = PySequence_ITEM(second, 2); r2 = PyLong_AsLong(pr);
r2 = PyLong_AsLong(pr); g2 = PyLong_AsLong(pg);
g2 = PyLong_AsLong(pg); b2 = PyLong_AsLong(pb);
b2 = PyLong_AsLong(pb); Py_DECREF(pr);
Py_DECREF(pr); Py_DECREF(pg);
Py_DECREF(pg); Py_DECREF(pb);
Py_DECREF(pb);
return abs(r1 - r2) + abs(g1 - g2) + abs(b1 - b2);
return abs(r1 - r2) + abs(g1 - g2) + abs(b1 - b2);
} }
PyDoc_STRVAR(block_getblocks2_doc, PyDoc_STRVAR(block_getblocks2_doc,
"Returns a list of blocks (3 sized tuples).\n\ "Returns a list of blocks (3 sized tuples).\n\
\n\ \n\
image: A PIL image to base the blocks on.\n\ image: A PIL image to base the blocks on.\n\
block_count_per_side: This integer determine the number of blocks the function will return.\n\ block_count_per_side: This integer determine the number of blocks the function will return.\n\
@@ -101,153 +99,150 @@ If it is 10, for example, 100 blocks will be returns (10 width, 10 height). The
necessarely cover square areas. The area covered by each block will be proportional to the image\n\ necessarely cover square areas. The area covered by each block will be proportional to the image\n\
itself.\n"); itself.\n");
static PyObject* block_getblocks2(PyObject *self, PyObject *args) static PyObject *block_getblocks2(PyObject *self, PyObject *args) {
{ int block_count_per_side, width, height, block_width, block_height, ih;
int block_count_per_side, width, height, block_width, block_height, ih; PyObject *image;
PyObject *image; PyObject *pimage_size, *pwidth, *pheight;
PyObject *pimage_size, *pwidth, *pheight; PyObject *result;
PyObject *result;
if (!PyArg_ParseTuple(args, "Oi", &image, &block_count_per_side)) {
if (!PyArg_ParseTuple(args, "Oi", &image, &block_count_per_side)) { return NULL;
}
pimage_size = PyObject_GetAttrString(image, "size");
pwidth = PySequence_ITEM(pimage_size, 0);
pheight = PySequence_ITEM(pimage_size, 1);
width = PyLong_AsLong(pwidth);
height = PyLong_AsLong(pheight);
Py_DECREF(pimage_size);
Py_DECREF(pwidth);
Py_DECREF(pheight);
if (!(width && height)) {
return PyList_New(0);
}
block_width = max(width / block_count_per_side, 1);
block_height = max(height / block_count_per_side, 1);
result = PyList_New((Py_ssize_t)block_count_per_side * block_count_per_side);
if (result == NULL) {
return NULL;
}
for (ih = 0; ih < block_count_per_side; ih++) {
int top, bottom, iw;
top = min(ih * block_height, height - block_height);
bottom = top + block_height;
for (iw = 0; iw < block_count_per_side; iw++) {
int left, right;
PyObject *pbox;
PyObject *pmethodname;
PyObject *pcrop;
PyObject *pblock;
left = min(iw * block_width, width - block_width);
right = left + block_width;
pbox = inttuple(4, left, top, right, bottom);
pmethodname = PyUnicode_FromString("crop");
pcrop = PyObject_CallMethodObjArgs(image, pmethodname, pbox, NULL);
Py_DECREF(pmethodname);
Py_DECREF(pbox);
if (pcrop == NULL) {
Py_DECREF(result);
return NULL; return NULL;
} }
pblock = getblock(pcrop);
pimage_size = PyObject_GetAttrString(image, "size"); Py_DECREF(pcrop);
pwidth = PySequence_ITEM(pimage_size, 0); if (pblock == NULL) {
pheight = PySequence_ITEM(pimage_size, 1); Py_DECREF(result);
width = PyLong_AsLong(pwidth);
height = PyLong_AsLong(pheight);
Py_DECREF(pimage_size);
Py_DECREF(pwidth);
Py_DECREF(pheight);
if (!(width && height)) {
return PyList_New(0);
}
block_width = max(width / block_count_per_side, 1);
block_height = max(height / block_count_per_side, 1);
result = PyList_New(block_count_per_side * block_count_per_side);
if (result == NULL) {
return NULL; return NULL;
}
PyList_SET_ITEM(result, ih * block_count_per_side + iw, pblock);
} }
}
for (ih=0; ih<block_count_per_side; ih++) {
int top, bottom, iw; return result;
top = min(ih*block_height, height-block_height);
bottom = top + block_height;
for (iw=0; iw<block_count_per_side; iw++) {
int left, right;
PyObject *pbox;
PyObject *pmethodname;
PyObject *pcrop;
PyObject *pblock;
left = min(iw*block_width, width-block_width);
right = left + block_width;
pbox = inttuple(4, left, top, right, bottom);
pmethodname = PyUnicode_FromString("crop");
pcrop = PyObject_CallMethodObjArgs(image, pmethodname, pbox, NULL);
Py_DECREF(pmethodname);
Py_DECREF(pbox);
if (pcrop == NULL) {
Py_DECREF(result);
return NULL;
}
pblock = getblock(pcrop);
Py_DECREF(pcrop);
if (pblock == NULL) {
Py_DECREF(result);
return NULL;
}
PyList_SET_ITEM(result, ih*block_count_per_side+iw, pblock);
}
}
return result;
} }
PyDoc_STRVAR(block_avgdiff_doc, PyDoc_STRVAR(block_avgdiff_doc,
"Returns the average diff between first blocks and seconds.\n\ "Returns the average diff between first blocks and seconds.\n\
\n\ \n\
If the result surpasses limit, limit + 1 is returned, except if less than min_iterations\n\ If the result surpasses limit, limit + 1 is returned, except if less than min_iterations\n\
iterations have been made in the blocks.\n"); iterations have been made in the blocks.\n");
static PyObject* block_avgdiff(PyObject *self, PyObject *args) static PyObject *block_avgdiff(PyObject *self, PyObject *args) {
{ PyObject *first, *second;
PyObject *first, *second; int limit, min_iterations;
int limit, min_iterations; Py_ssize_t count;
Py_ssize_t count; int sum, i, result;
int sum, i, result;
if (!PyArg_ParseTuple(args, "OOii", &first, &second, &limit,
if (!PyArg_ParseTuple(args, "OOii", &first, &second, &limit, &min_iterations)) { &min_iterations)) {
return NULL; return NULL;
}
count = PySequence_Length(first);
if (count != PySequence_Length(second)) {
PyErr_SetString(DifferentBlockCountError, "");
return NULL;
}
if (!count) {
PyErr_SetString(NoBlocksError, "");
return NULL;
}
sum = 0;
for (i = 0; i < count; i++) {
int iteration_count;
PyObject *item1, *item2;
iteration_count = i + 1;
item1 = PySequence_ITEM(first, i);
item2 = PySequence_ITEM(second, i);
sum += diff(item1, item2);
Py_DECREF(item1);
Py_DECREF(item2);
if ((sum > limit * iteration_count) &&
(iteration_count >= min_iterations)) {
return PyLong_FromLong(limit + 1);
} }
}
count = PySequence_Length(first);
if (count != PySequence_Length(second)) { result = sum / count;
PyErr_SetString(DifferentBlockCountError, ""); if (!result && sum) {
return NULL; result = 1;
} }
if (!count) { return PyLong_FromLong(result);
PyErr_SetString(NoBlocksError, "");
return NULL;
}
sum = 0;
for (i=0; i<count; i++) {
int iteration_count;
PyObject *item1, *item2;
iteration_count = i + 1;
item1 = PySequence_ITEM(first, i);
item2 = PySequence_ITEM(second, i);
sum += diff(item1, item2);
Py_DECREF(item1);
Py_DECREF(item2);
if ((sum > limit*iteration_count) && (iteration_count >= min_iterations)) {
return PyLong_FromLong(limit + 1);
}
}
result = sum / count;
if (!result && sum) {
result = 1;
}
return PyLong_FromLong(result);
} }
static PyMethodDef BlockMethods[] = { static PyMethodDef BlockMethods[] = {
{"getblocks2", block_getblocks2, METH_VARARGS, block_getblocks2_doc}, {"getblocks2", block_getblocks2, METH_VARARGS, block_getblocks2_doc},
{"avgdiff", block_avgdiff, METH_VARARGS, block_avgdiff_doc}, {"avgdiff", block_avgdiff, METH_VARARGS, block_avgdiff_doc},
{NULL, NULL, 0, NULL} /* Sentinel */ {NULL, NULL, 0, NULL} /* Sentinel */
}; };
static struct PyModuleDef BlockDef = { static struct PyModuleDef BlockDef = {PyModuleDef_HEAD_INIT,
PyModuleDef_HEAD_INIT, "_block",
"_block", NULL,
NULL, -1,
-1, BlockMethods,
BlockMethods, NULL,
NULL, NULL,
NULL, NULL,
NULL, NULL};
NULL
};
PyObject * PyObject *PyInit__block(void) {
PyInit__block(void) PyObject *m = PyModule_Create(&BlockDef);
{ if (m == NULL) {
PyObject *m = PyModule_Create(&BlockDef); return NULL;
if (m == NULL) { }
return NULL;
}
NoBlocksError = PyErr_NewException("_block.NoBlocksError", NULL, NULL);
PyModule_AddObject(m, "NoBlocksError", NoBlocksError);
DifferentBlockCountError = PyErr_NewException("_block.DifferentBlockCountError", NULL, NULL);
PyModule_AddObject(m, "DifferentBlockCountError", DifferentBlockCountError);
return m; NoBlocksError = PyErr_NewException("_block.NoBlocksError", NULL, NULL);
PyModule_AddObject(m, "NoBlocksError", NoBlocksError);
DifferentBlockCountError =
PyErr_NewException("_block.DifferentBlockCountError", NULL, NULL);
PyModule_AddObject(m, "DifferentBlockCountError", DifferentBlockCountError);
return m;
} }
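A hedged Python sketch of the control flow implemented by the C avgdiff() above: a running sum with an early exit once the partial average can no longer come back under limit. The shipped implementation is the C module; this is for reading along only:

from core.pe.block import NoBlocksError, DifferentBlockCountError

def avgdiff_py(first, second, limit, min_iterations):
    if len(first) != len(second):
        raise DifferentBlockCountError()
    if not first:
        raise NoBlocksError()
    total = 0
    for i, (b1, b2) in enumerate(zip(first, second), start=1):
        total += sum(abs(c1 - c2) for c1, c2 in zip(b1, b2))  # same as the C diff() helper
        if total > limit * i and i >= min_iterations:
            return limit + 1                                  # blocks differ too much, bail out early
    result = total // len(first)
    return 1 if (result == 0 and total) else result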


@@ -10,6 +10,8 @@
#include "common.h" #include "common.h"
#import <Foundation/Foundation.h> #import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>
#import <ImageIO/ImageIO.h>
#define RADIANS( degrees ) ( degrees * M_PI / 180 ) #define RADIANS( degrees ) ( degrees * M_PI / 180 )


@@ -9,28 +9,27 @@ from hscommon.util import get_file_ext, format_size
from core.util import format_timestamp, format_perc, format_dupe_count from core.util import format_timestamp, format_perc, format_dupe_count
from core import fs from core import fs
from . import exif from core.pe import exif
# This global value is set by the platform-specific subclasser of the Photo base class # This global value is set by the platform-specific subclasser of the Photo base class
PLAT_SPECIFIC_PHOTO_CLASS = None PLAT_SPECIFIC_PHOTO_CLASS = None
def format_dimensions(dimensions): def format_dimensions(dimensions):
return '%d x %d' % (dimensions[0], dimensions[1]) return "%d x %d" % (dimensions[0], dimensions[1])
def get_delta_dimensions(value, ref_value): def get_delta_dimensions(value, ref_value):
return (value[0]-ref_value[0], value[1]-ref_value[1]) return (value[0] - ref_value[0], value[1] - ref_value[1])
class Photo(fs.File): class Photo(fs.File):
INITIAL_INFO = fs.File.INITIAL_INFO.copy() INITIAL_INFO = fs.File.INITIAL_INFO.copy()
INITIAL_INFO.update({ INITIAL_INFO.update({"dimensions": (0, 0), "exif_timestamp": ""})
'dimensions': (0, 0),
'exif_timestamp': '',
})
__slots__ = fs.File.__slots__ + tuple(INITIAL_INFO.keys()) __slots__ = fs.File.__slots__ + tuple(INITIAL_INFO.keys())
# These extensions are supported on all platforms # These extensions are supported on all platforms
HANDLED_EXTS = {'png', 'jpg', 'jpeg', 'gif', 'bmp', 'tiff', 'tif'} HANDLED_EXTS = {"png", "jpg", "jpeg", "gif", "bmp", "tiff", "tif"}
def _plat_get_dimensions(self): def _plat_get_dimensions(self):
raise NotImplementedError() raise NotImplementedError()
@@ -39,25 +38,25 @@ class Photo(fs.File):
raise NotImplementedError() raise NotImplementedError()
def _get_orientation(self): def _get_orientation(self):
if not hasattr(self, '_cached_orientation'): if not hasattr(self, "_cached_orientation"):
try: try:
with self.path.open('rb') as fp: with self.path.open("rb") as fp:
exifdata = exif.get_fields(fp) exifdata = exif.get_fields(fp)
# the value is a list (probably one-sized) of ints # the value is a list (probably one-sized) of ints
orientations = exifdata['Orientation'] orientations = exifdata["Orientation"]
self._cached_orientation = orientations[0] self._cached_orientation = orientations[0]
except Exception: # Couldn't read EXIF data, no transforms except Exception: # Couldn't read EXIF data, no transforms
self._cached_orientation = 0 self._cached_orientation = 0
return self._cached_orientation return self._cached_orientation
def _get_exif_timestamp(self): def _get_exif_timestamp(self):
try: try:
with self.path.open('rb') as fp: with self.path.open("rb") as fp:
exifdata = exif.get_fields(fp) exifdata = exif.get_fields(fp)
return exifdata['DateTimeOriginal'] return exifdata["DateTimeOriginal"]
except Exception: except Exception:
logging.info("Couldn't read EXIF of picture: %s", self.path) logging.info("Couldn't read EXIF of picture: %s", self.path)
return '' return ""
@classmethod @classmethod
def can_handle(cls, path): def can_handle(cls, path):
@@ -79,28 +78,27 @@ class Photo(fs.File):
else: else:
percentage = group.percentage percentage = group.percentage
dupe_count = len(group.dupes) dupe_count = len(group.dupes)
dupe_folder_path = getattr(self, 'display_folder_path', self.folder_path) dupe_folder_path = getattr(self, "display_folder_path", self.folder_path)
return { return {
'name': self.name, "name": self.name,
'folder_path': str(dupe_folder_path), "folder_path": str(dupe_folder_path),
'size': format_size(size, 0, 1, False), "size": format_size(size, 0, 1, False),
'extension': self.extension, "extension": self.extension,
'dimensions': format_dimensions(dimensions), "dimensions": format_dimensions(dimensions),
'exif_timestamp': self.exif_timestamp, "exif_timestamp": self.exif_timestamp,
'mtime': format_timestamp(mtime, delta and m), "mtime": format_timestamp(mtime, delta and m),
'percentage': format_perc(percentage), "percentage": format_perc(percentage),
'dupe_count': format_dupe_count(dupe_count), "dupe_count": format_dupe_count(dupe_count),
} }
def _read_info(self, field): def _read_info(self, field):
fs.File._read_info(self, field) fs.File._read_info(self, field)
if field == 'dimensions': if field == "dimensions":
self.dimensions = self._plat_get_dimensions() self.dimensions = self._plat_get_dimensions()
if self._get_orientation() in {5, 6, 7, 8}: if self._get_orientation() in {5, 6, 7, 8}:
self.dimensions = (self.dimensions[1], self.dimensions[0]) self.dimensions = (self.dimensions[1], self.dimensions[0])
elif field == 'exif_timestamp': elif field == "exif_timestamp":
self.exif_timestamp = self._get_exif_timestamp() self.exif_timestamp = self._get_exif_timestamp()
def get_blocks(self, block_count_per_side): def get_blocks(self, block_count_per_side):
return self._plat_get_blocks(block_count_per_side, self._get_orientation()) return self._plat_get_blocks(block_count_per_side, self._get_orientation())


@@ -8,11 +8,16 @@
from hscommon.trans import trget from hscommon.trans import trget
from core.prioritize import ( from core.prioritize import (
KindCategory, FolderCategory, FilenameCategory, NumericalCategory, KindCategory,
SizeCategory, MtimeCategory FolderCategory,
FilenameCategory,
NumericalCategory,
SizeCategory,
MtimeCategory,
) )
coltr = trget('columns') coltr = trget("columns")
class DimensionsCategory(NumericalCategory): class DimensionsCategory(NumericalCategory):
NAME = coltr("Dimensions") NAME = coltr("Dimensions")
@@ -24,8 +29,13 @@ class DimensionsCategory(NumericalCategory):
width, height = value width, height = value
return (-width, -height) return (-width, -height)
def all_categories(): def all_categories():
return [ return [
KindCategory, FolderCategory, FilenameCategory, SizeCategory, DimensionsCategory, KindCategory,
MtimeCategory FolderCategory,
FilenameCategory,
SizeCategory,
DimensionsCategory,
MtimeCategory,
] ]


@@ -1,8 +1,8 @@
# Created On: 2011-11-27 # Created On: 2011-11-27
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon.gui.column import Column from hscommon.gui.column import Column
@@ -10,19 +10,20 @@ from hscommon.trans import trget
from core.gui.result_table import ResultTable as ResultTableBase from core.gui.result_table import ResultTable as ResultTableBase
coltr = trget('columns') coltr = trget("columns")
class ResultTable(ResultTableBase): class ResultTable(ResultTableBase):
COLUMNS = [ COLUMNS = [
Column('marked', ''), Column("marked", ""),
Column('name', coltr("Filename")), Column("name", coltr("Filename")),
Column('folder_path', coltr("Folder"), optional=True), Column("folder_path", coltr("Folder"), optional=True),
Column('size', coltr("Size (KB)"), optional=True), Column("size", coltr("Size (KB)"), optional=True),
Column('extension', coltr("Kind"), visible=False, optional=True), Column("extension", coltr("Kind"), visible=False, optional=True),
Column('dimensions', coltr("Dimensions"), optional=True), Column("dimensions", coltr("Dimensions"), optional=True),
Column('exif_timestamp', coltr("EXIF Timestamp"), visible=False, optional=True), Column("exif_timestamp", coltr("EXIF Timestamp"), visible=False, optional=True),
Column('mtime', coltr("Modification"), visible=False, optional=True), Column("mtime", coltr("Modification"), visible=False, optional=True),
Column('percentage', coltr("Match %"), optional=True), Column("percentage", coltr("Match %"), optional=True),
Column('dupe_count', coltr("Dupe Count"), visible=False, optional=True), Column("dupe_count", coltr("Dupe Count"), visible=False, optional=True),
] ]
DELTA_COLUMNS = {'size', 'dimensions', 'mtime'} DELTA_COLUMNS = {"size", "dimensions", "mtime"}


@@ -8,7 +8,8 @@ from hscommon.trans import tr
from core.scanner import Scanner, ScanType, ScanOption from core.scanner import Scanner, ScanType, ScanOption
from . import matchblock, matchexif from core.pe import matchblock, matchexif
class ScannerPE(Scanner): class ScannerPE(Scanner):
cache_path = None cache_path = None
@@ -17,21 +18,20 @@ class ScannerPE(Scanner):
@staticmethod @staticmethod
def get_scan_options(): def get_scan_options():
return [ return [
ScanOption(ScanType.FuzzyBlock, tr("Contents")), ScanOption(ScanType.FUZZYBLOCK, tr("Contents")),
ScanOption(ScanType.ExifTimestamp, tr("EXIF Timestamp")), ScanOption(ScanType.EXIFTIMESTAMP, tr("EXIF Timestamp")),
] ]
def _getmatches(self, files, j): def _getmatches(self, files, j):
if self.scan_type == ScanType.FuzzyBlock: if self.scan_type == ScanType.FUZZYBLOCK:
return matchblock.getmatches( return matchblock.getmatches(
files, files,
cache_path=self.cache_path, cache_path=self.cache_path,
threshold=self.min_match_percentage, threshold=self.min_match_percentage,
match_scaled=self.match_scaled, match_scaled=self.match_scaled,
j=j j=j,
) )
elif self.scan_type == ScanType.ExifTimestamp: elif self.scan_type == ScanType.EXIFTIMESTAMP:
return matchexif.getmatches(files, self.match_scaled, j) return matchexif.getmatches(files, self.match_scaled, j)
else: else:
raise Exception("Invalid scan type") raise ValueError("Invalid scan type")


@@ -1,48 +1,50 @@
# Created By: Virgil Dupras # Created By: Virgil Dupras
# Created On: 2011/09/07 # Created On: 2011/09/07
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon.util import dedupe, flatten, rem_file_ext from hscommon.util import dedupe, flatten, rem_file_ext
from hscommon.trans import trget, tr from hscommon.trans import trget, tr
coltr = trget('columns') coltr = trget("columns")
class CriterionCategory: class CriterionCategory:
NAME = "Undefined" NAME = "Undefined"
def __init__(self, results): def __init__(self, results):
self.results = results self.results = results
#--- Virtual # --- Virtual
def extract_value(self, dupe): def extract_value(self, dupe):
raise NotImplementedError() raise NotImplementedError()
def format_criterion_value(self, value): def format_criterion_value(self, value):
return value return value
def sort_key(self, dupe, crit_value): def sort_key(self, dupe, crit_value):
raise NotImplementedError() raise NotImplementedError()
def criteria_list(self): def criteria_list(self):
raise NotImplementedError() raise NotImplementedError()
class Criterion: class Criterion:
def __init__(self, category, value): def __init__(self, category, value):
self.category = category self.category = category
self.value = value self.value = value
self.display_value = category.format_criterion_value(value) self.display_value = category.format_criterion_value(value)
def sort_key(self, dupe): def sort_key(self, dupe):
return self.category.sort_key(dupe, self.value) return self.category.sort_key(dupe, self.value)
@property @property
def display(self): def display(self):
return "{} ({})".format(self.category.NAME, self.display_value) return f"{self.category.NAME} ({self.display_value})"
class ValueListCategory(CriterionCategory): class ValueListCategory(CriterionCategory):
def sort_key(self, dupe, crit_value): def sort_key(self, dupe, crit_value):
@@ -52,37 +54,41 @@ class ValueListCategory(CriterionCategory):
return 0 return 0
else: else:
return 1 return 1
def criteria_list(self): def criteria_list(self):
dupes = flatten(g[:] for g in self.results.groups) dupes = flatten(g[:] for g in self.results.groups)
values = sorted(dedupe(self.extract_value(d) for d in dupes)) values = sorted(dedupe(self.extract_value(d) for d in dupes))
return [Criterion(self, value) for value in values] return [Criterion(self, value) for value in values]
class KindCategory(ValueListCategory): class KindCategory(ValueListCategory):
NAME = coltr("Kind") NAME = coltr("Kind")
def extract_value(self, dupe): def extract_value(self, dupe):
value = dupe.extension value = dupe.extension
if not value: if not value:
value = tr("None") value = tr("None")
return value return value
class FolderCategory(ValueListCategory): class FolderCategory(ValueListCategory):
NAME = coltr("Folder") NAME = coltr("Folder")
def extract_value(self, dupe): def extract_value(self, dupe):
return dupe.folder_path return dupe.folder_path
def format_criterion_value(self, value): def format_criterion_value(self, value):
return str(value) return str(value)
def sort_key(self, dupe, crit_value): def sort_key(self, dupe, crit_value):
value = self.extract_value(dupe) value = self.extract_value(dupe)
if value[:len(crit_value)] == crit_value: # This is instead of using is_relative_to() which was added in py 3.9
return 0 try:
else: value.relative_to(crit_value)
except ValueError:
return 1 return 1
return 0
class FilenameCategory(CriterionCategory): class FilenameCategory(CriterionCategory):
NAME = coltr("Filename") NAME = coltr("Filename")
@@ -90,7 +96,7 @@ class FilenameCategory(CriterionCategory):
DOESNT_END_WITH_NUMBER = 1 DOESNT_END_WITH_NUMBER = 1
LONGEST = 2 LONGEST = 2
SHORTEST = 3 SHORTEST = 3
def format_criterion_value(self, value): def format_criterion_value(self, value):
return { return {
self.ENDS_WITH_NUMBER: tr("Ends with number"), self.ENDS_WITH_NUMBER: tr("Ends with number"),
@@ -98,10 +104,10 @@ class FilenameCategory(CriterionCategory):
self.LONGEST: tr("Longest"), self.LONGEST: tr("Longest"),
self.SHORTEST: tr("Shortest"), self.SHORTEST: tr("Shortest"),
}[value] }[value]
def extract_value(self, dupe): def extract_value(self, dupe):
return rem_file_ext(dupe.name) return rem_file_ext(dupe.name)
def sort_key(self, dupe, crit_value): def sort_key(self, dupe, crit_value):
value = self.extract_value(dupe) value = self.extract_value(dupe)
if crit_value in {self.ENDS_WITH_NUMBER, self.DOESNT_END_WITH_NUMBER}: if crit_value in {self.ENDS_WITH_NUMBER, self.DOESNT_END_WITH_NUMBER}:
@@ -113,50 +119,57 @@ class FilenameCategory(CriterionCategory):
else: else:
value = len(value) value = len(value)
if crit_value == self.LONGEST: if crit_value == self.LONGEST:
value *= -1 # We want the biggest values on top value *= -1 # We want the biggest values on top
return value return value
def criteria_list(self): def criteria_list(self):
return [Criterion(self, crit_value) for crit_value in [ return [
self.ENDS_WITH_NUMBER, Criterion(self, crit_value)
self.DOESNT_END_WITH_NUMBER, for crit_value in [
self.LONGEST, self.ENDS_WITH_NUMBER,
self.SHORTEST, self.DOESNT_END_WITH_NUMBER,
]] self.LONGEST,
self.SHORTEST,
]
]
class NumericalCategory(CriterionCategory): class NumericalCategory(CriterionCategory):
HIGHEST = 0 HIGHEST = 0
LOWEST = 1 LOWEST = 1
def format_criterion_value(self, value): def format_criterion_value(self, value):
return tr("Highest") if value == self.HIGHEST else tr("Lowest") return tr("Highest") if value == self.HIGHEST else tr("Lowest")
def invert_numerical_value(self, value): # Virtual def invert_numerical_value(self, value): # Virtual
return value * -1 return value * -1
def sort_key(self, dupe, crit_value): def sort_key(self, dupe, crit_value):
value = self.extract_value(dupe) value = self.extract_value(dupe)
if crit_value == self.HIGHEST: # we want highest values on top if crit_value == self.HIGHEST: # we want highest values on top
value = self.invert_numerical_value(value) value = self.invert_numerical_value(value)
return value return value
def criteria_list(self): def criteria_list(self):
return [Criterion(self, self.HIGHEST), Criterion(self, self.LOWEST)] return [Criterion(self, self.HIGHEST), Criterion(self, self.LOWEST)]
class SizeCategory(NumericalCategory): class SizeCategory(NumericalCategory):
NAME = coltr("Size") NAME = coltr("Size")
def extract_value(self, dupe): def extract_value(self, dupe):
return dupe.size return dupe.size
class MtimeCategory(NumericalCategory): class MtimeCategory(NumericalCategory):
NAME = coltr("Modification") NAME = coltr("Modification")
def extract_value(self, dupe): def extract_value(self, dupe):
return dupe.mtime return dupe.mtime
def format_criterion_value(self, value): def format_criterion_value(self, value):
return tr("Newest") if value == self.HIGHEST else tr("Oldest") return tr("Newest") if value == self.HIGHEST else tr("Oldest")
def all_categories(): def all_categories():
return [KindCategory, FolderCategory, FilenameCategory, SizeCategory, MtimeCategory] return [KindCategory, FolderCategory, FilenameCategory, SizeCategory, MtimeCategory]
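The FolderCategory.sort_key change above swaps a prefix comparison for relative_to(), which only succeeds for true subpaths. A hedged illustration with pathlib.PurePath (the real code operates on the duplicates' folder_path values):

from pathlib import PurePath

def folder_sort_key(dupe_folder, crit_folder):
    try:
        PurePath(dupe_folder).relative_to(PurePath(crit_folder))
    except ValueError:
        return 1   # outside the chosen folder, sorts after
    return 0       # inside the chosen folder, sorts first

assert folder_sort_key("/photos/2021/trip", "/photos/2021") == 0
assert folder_sort_key("/photos/2021-backup", "/photos/2021") == 1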


@@ -17,8 +17,9 @@ from hscommon.conflict import get_conflicted_name
from hscommon.util import flatten, nonone, FileOrPath, format_size from hscommon.util import flatten, nonone, FileOrPath, format_size
from hscommon.trans import tr from hscommon.trans import tr
from . import engine from core import engine
from .markable import Markable from core.markable import Markable
class Results(Markable): class Results(Markable):
"""Manages a collection of duplicate :class:`~core.engine.Group`. """Manages a collection of duplicate :class:`~core.engine.Group`.
@@ -34,22 +35,24 @@ class Results(Markable):
A list of all duplicates (:class:`~core.fs.File` instances), without ref, contained in the A list of all duplicates (:class:`~core.fs.File` instances), without ref, contained in the
currently managed :attr:`groups`. currently managed :attr:`groups`.
""" """
#---Override
# ---Override
def __init__(self, app): def __init__(self, app):
Markable.__init__(self) Markable.__init__(self)
self.__groups = [] self.__groups = []
self.__group_of_duplicate = {} self.__group_of_duplicate = {}
self.__groups_sort_descriptor = None # This is a tuple (key, asc) self.__groups_sort_descriptor = None # This is a tuple (key, asc)
self.__dupes = None self.__dupes = None
self.__dupes_sort_descriptor = None # This is a tuple (key, asc, delta) self.__dupes_sort_descriptor = None # This is a tuple (key, asc, delta)
self.__filters = None self.__filters = None
self.__filtered_dupes = None self.__filtered_dupes = None
self.__filtered_groups = None self.__filtered_groups = None
self.__recalculate_stats() self.__recalculate_stats()
self.__marked_size = 0 self.__marked_size = 0
self.app = app self.app = app
self.problems = [] # (dupe, error_msg) self.problems = [] # (dupe, error_msg)
self.is_modified = False self.is_modified = False
self.refresh_required = False
def _did_mark(self, dupe): def _did_mark(self, dupe):
self.__marked_size += dupe.size self.__marked_size += dupe.size
@@ -90,15 +93,17 @@ class Results(Markable):
else: else:
Markable.mark_none(self) Markable.mark_none(self)
#---Private # ---Private
def __get_dupe_list(self): def __get_dupe_list(self):
if self.__dupes is None: if self.__dupes is None or self.refresh_required:
self.__dupes = flatten(group.dupes for group in self.groups) self.__dupes = flatten(group.dupes for group in self.groups)
self.refresh_required = False
if None in self.__dupes: if None in self.__dupes:
# This is debug logging to try to figure out #44 # This is debug logging to try to figure out #44
logging.warning( logging.warning(
"There is a None value in the Results' dupe list. dupes: %r groups: %r", "There is a None value in the Results' dupe list. dupes: %r groups: %r",
self.__dupes, self.groups self.__dupes,
self.groups,
) )
if self.__filtered_dupes: if self.__filtered_dupes:
self.__dupes = [dupe for dupe in self.__dupes if dupe in self.__filtered_dupes] self.__dupes = [dupe for dupe in self.__dupes if dupe in self.__filtered_dupes]
@@ -133,7 +138,7 @@ class Results(Markable):
format_size(total_size, 2), format_size(total_size, 2),
) )
if self.__filters: if self.__filters:
result += tr(" filter: %s") % ' --> '.join(self.__filters) result += tr(" filter: %s") % " --> ".join(self.__filters)
return result return result
def __recalculate_stats(self): def __recalculate_stats(self):
@@ -151,7 +156,7 @@ class Results(Markable):
for g in self.__groups: for g in self.__groups:
for dupe in g: for dupe in g:
self.__group_of_duplicate[dupe] = g self.__group_of_duplicate[dupe] = g
if not hasattr(dupe, 'is_ref'): if not hasattr(dupe, "is_ref"):
dupe.is_ref = False dupe.is_ref = False
self.is_modified = bool(self.__groups) self.is_modified = bool(self.__groups)
old_filters = nonone(self.__filters, []) old_filters = nonone(self.__filters, [])
@@ -159,7 +164,7 @@ class Results(Markable):
for filter_str in old_filters: for filter_str in old_filters:
self.apply_filter(filter_str) self.apply_filter(filter_str)
#---Public # ---Public
def apply_filter(self, filter_str): def apply_filter(self, filter_str):
"""Applies a filter ``filter_str`` to :attr:`groups` """Applies a filter ``filter_str`` to :attr:`groups`
@@ -182,11 +187,11 @@ class Results(Markable):
try: try:
filter_re = re.compile(filter_str, re.IGNORECASE) filter_re = re.compile(filter_str, re.IGNORECASE)
except re.error: except re.error:
return # don't apply this filter. return # don't apply this filter.
self.__filters.append(filter_str) self.__filters.append(filter_str)
if self.__filtered_dupes is None: if self.__filtered_dupes is None:
self.__filtered_dupes = flatten(g[:] for g in self.groups) self.__filtered_dupes = flatten(g[:] for g in self.groups)
self.__filtered_dupes = set(dupe for dupe in self.__filtered_dupes if filter_re.search(str(dupe.path))) self.__filtered_dupes = {dupe for dupe in self.__filtered_dupes if filter_re.search(str(dupe.path))}
filtered_groups = set() filtered_groups = set()
for dupe in self.__filtered_dupes: for dupe in self.__filtered_dupes:
filtered_groups.add(self.get_group_of_duplicate(dupe)) filtered_groups.add(self.get_group_of_duplicate(dupe))
@@ -198,8 +203,7 @@ class Results(Markable):
self.__dupes = None self.__dupes = None
def get_group_of_duplicate(self, dupe): def get_group_of_duplicate(self, dupe):
"""Returns :class:`~core.engine.Group` in which ``dupe`` belongs. """Returns :class:`~core.engine.Group` in which ``dupe`` belongs."""
"""
try: try:
return self.__group_of_duplicate[dupe] return self.__group_of_duplicate[dupe]
except (TypeError, KeyError): except (TypeError, KeyError):
@@ -214,6 +218,7 @@ class Results(Markable):
:param get_file: a function f(path) returning a :class:`~core.fs.File` wrapping the path. :param get_file: a function f(path) returning a :class:`~core.fs.File` wrapping the path.
:param j: A :ref:`job progress instance <jobs>`. :param j: A :ref:`job progress instance <jobs>`.
""" """
def do_match(ref_file, other_files, group): def do_match(ref_file, other_files, group):
if not other_files: if not other_files:
return return
@@ -223,31 +228,31 @@ class Results(Markable):
self.apply_filter(None) self.apply_filter(None)
root = ET.parse(infile).getroot() root = ET.parse(infile).getroot()
group_elems = list(root.getiterator('group')) group_elems = list(root.iter("group"))
groups = [] groups = []
marked = set() marked = set()
for group_elem in j.iter_with_progress(group_elems, every=100): for group_elem in j.iter_with_progress(group_elems, every=100):
group = engine.Group() group = engine.Group()
dupes = [] dupes = []
for file_elem in group_elem.getiterator('file'): for file_elem in group_elem.iter("file"):
path = file_elem.get('path') path = file_elem.get("path")
words = file_elem.get('words', '') words = file_elem.get("words", "")
if not path: if not path:
continue continue
file = get_file(path) file = get_file(path)
if file is None: if file is None:
continue continue
file.words = words.split(',') file.words = words.split(",")
file.is_ref = file_elem.get('is_ref') == 'y' file.is_ref = file_elem.get("is_ref") == "y"
dupes.append(file) dupes.append(file)
if file_elem.get('marked') == 'y': if file_elem.get("marked") == "y":
marked.add(file) marked.add(file)
for match_elem in group_elem.getiterator('match'): for match_elem in group_elem.iter("match"):
try: try:
attrs = match_elem.attrib attrs = match_elem.attrib
first_file = dupes[int(attrs['first'])] first_file = dupes[int(attrs["first"])]
second_file = dupes[int(attrs['second'])] second_file = dupes[int(attrs["second"])]
percentage = int(attrs['percentage']) percentage = int(attrs["percentage"])
group.add_match(engine.Match(first_file, second_file, percentage)) group.add_match(engine.Match(first_file, second_file, percentage))
except (IndexError, KeyError, ValueError): except (IndexError, KeyError, ValueError):
# Covers missing attr, non-int values and indexes out of bounds # Covers missing attr, non-int values and indexes out of bounds
@@ -264,8 +269,7 @@ class Results(Markable):
self.is_modified = False self.is_modified = False
def make_ref(self, dupe): def make_ref(self, dupe):
"""Make ``dupe`` take the :attr:`~core.engine.Group.ref` position of its group. """Make ``dupe`` take the :attr:`~core.engine.Group.ref` position of its group."""
"""
g = self.get_group_of_duplicate(dupe) g = self.get_group_of_duplicate(dupe)
r = g.ref r = g.ref
if not g.switch_ref(dupe): if not g.switch_ref(dupe):
@@ -297,7 +301,7 @@ class Results(Markable):
try: try:
func(dupe) func(dupe)
to_remove.append(dupe) to_remove.append(dupe)
except (EnvironmentError, UnicodeEncodeError) as e: except (OSError, UnicodeEncodeError) as e:
self.problems.append((dupe, str(e))) self.problems.append((dupe, str(e)))
if remove_from_results: if remove_from_results:
self.remove_duplicates(to_remove) self.remove_duplicates(to_remove)
@@ -339,9 +343,9 @@ class Results(Markable):
:param outfile: file object or path. :param outfile: file object or path.
""" """
self.apply_filter(None) self.apply_filter(None)
root = ET.Element('results') root = ET.Element("results")
for g in self.groups: for g in self.groups:
group_elem = ET.SubElement(root, 'group') group_elem = ET.SubElement(root, "group")
dupe2index = {} dupe2index = {}
for index, d in enumerate(g): for index, d in enumerate(g):
dupe2index[d] = index dupe2index[d] = index
@@ -349,29 +353,29 @@ class Results(Markable):
words = engine.unpack_fields(d.words) words = engine.unpack_fields(d.words)
except AttributeError: except AttributeError:
words = () words = ()
file_elem = ET.SubElement(group_elem, 'file') file_elem = ET.SubElement(group_elem, "file")
try: try:
file_elem.set('path', str(d.path)) file_elem.set("path", str(d.path))
file_elem.set('words', ','.join(words)) file_elem.set("words", ",".join(words))
except ValueError: # If there's an invalid character, just skip the file except ValueError: # If there's an invalid character, just skip the file
file_elem.set('path', '') file_elem.set("path", "")
file_elem.set('is_ref', ('y' if d.is_ref else 'n')) file_elem.set("is_ref", ("y" if d.is_ref else "n"))
file_elem.set('marked', ('y' if self.is_marked(d) else 'n')) file_elem.set("marked", ("y" if self.is_marked(d) else "n"))
for match in g.matches: for match in g.matches:
match_elem = ET.SubElement(group_elem, 'match') match_elem = ET.SubElement(group_elem, "match")
match_elem.set('first', str(dupe2index[match.first])) match_elem.set("first", str(dupe2index[match.first]))
match_elem.set('second', str(dupe2index[match.second])) match_elem.set("second", str(dupe2index[match.second]))
match_elem.set('percentage', str(int(match.percentage))) match_elem.set("percentage", str(int(match.percentage)))
tree = ET.ElementTree(root) tree = ET.ElementTree(root)
def do_write(outfile): def do_write(outfile):
with FileOrPath(outfile, 'wb') as fp: with FileOrPath(outfile, "wb") as fp:
tree.write(fp, encoding='utf-8') tree.write(fp, encoding="utf-8")
try: try:
do_write(outfile) do_write(outfile)
except IOError as e: except OSError as e:
# If our IOError is because dest is already a directory, we want to handle that. 21 is # If our OSError is because dest is already a directory, we want to handle that. 21 is
# the code we get on OS X and Linux, 13 is what we get on Windows. # the code we get on OS X and Linux, 13 is what we get on Windows.
if e.errno in {21, 13}: if e.errno in {21, 13}:
p = str(outfile) p = str(outfile)
@@ -392,8 +396,10 @@ class Results(Markable):
""" """
if not self.__dupes: if not self.__dupes:
self.__get_dupe_list() self.__get_dupe_list()
keyfunc = lambda d: self.app._get_dupe_sort_key(d, lambda: self.get_group_of_duplicate(d), key, delta) self.__dupes.sort(
self.__dupes.sort(key=keyfunc, reverse=not asc) key=lambda d: self.app._get_dupe_sort_key(d, lambda: self.get_group_of_duplicate(d), key, delta),
reverse=not asc,
)
self.__dupes_sort_descriptor = (key, asc, delta) self.__dupes_sort_descriptor = (key, asc, delta)
def sort_groups(self, key, asc=True): def sort_groups(self, key, asc=True):
@@ -404,12 +410,10 @@ class Results(Markable):
:param str key: key attribute name to sort with. :param str key: key attribute name to sort with.
:param bool asc: If false, sorting is reversed. :param bool asc: If false, sorting is reversed.
""" """
keyfunc = lambda g: self.app._get_group_sort_key(g, key) self.groups.sort(key=lambda g: self.app._get_group_sort_key(g, key), reverse=not asc)
self.groups.sort(key=keyfunc, reverse=not asc)
self.__groups_sort_descriptor = (key, asc) self.__groups_sort_descriptor = (key, asc)
#---Properties # ---Properties
dupes = property(__get_dupe_list) dupes = property(__get_dupe_list)
groups = property(__get_groups, __set_groups) groups = property(__get_groups, __set_groups)
stat_line = property(__get_stat_line) stat_line = property(__get_stat_line)
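The refresh_required flag introduced above turns __get_dupe_list into a cached list with explicit invalidation: the flat dupe list is rebuilt only when it has never been computed or when something marked it stale. A simplified, hypothetical sketch of that pattern (not the real Results class):

class CachedDupes:
    def __init__(self, groups):
        self.groups = groups            # lists standing in for Group objects
        self._dupes = None
        self.refresh_required = False

    @property
    def dupes(self):
        if self._dupes is None or self.refresh_required:
            # stand-in for flatten(group.dupes for group in self.groups)
            self._dupes = [d for g in self.groups for d in g]
            self.refresh_required = False
        return self._dupes

r = CachedDupes([["a", "b"], ["c"]])
r.dupes                    # computed once: ["a", "b", "c"]
r.refresh_required = True  # next access recomputes the list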


@@ -13,37 +13,41 @@ from hscommon.jobprogress import job
from hscommon.util import dedupe, rem_file_ext, get_file_ext from hscommon.util import dedupe, rem_file_ext, get_file_ext
from hscommon.trans import tr from hscommon.trans import tr
from . import engine from core import engine
# It's quite ugly to have scan types from all editions all put in the same class, but because there's # It's quite ugly to have scan types from all editions all put in the same class, but because there's
# there will be some nasty bugs popping up (ScanType is used in core when in should exclusively be # there will be some nasty bugs popping up (ScanType is used in core when in should exclusively be
# used in core_*). One day I'll clean this up. # used in core_*). One day I'll clean this up.
class ScanType: class ScanType:
Filename = 0 FILENAME = 0
Fields = 1 FIELDS = 1
FieldsNoOrder = 2 FIELDSNOORDER = 2
Tag = 3 TAG = 3
Folders = 4 FOLDERS = 4
Contents = 5 CONTENTS = 5
#PE # PE
FuzzyBlock = 10 FUZZYBLOCK = 10
ExifTimestamp = 11 EXIFTIMESTAMP = 11
ScanOption = namedtuple('ScanOption', 'scan_type label')
SCANNABLE_TAGS = ['track', 'artist', 'album', 'title', 'genre', 'year'] ScanOption = namedtuple("ScanOption", "scan_type label")
SCANNABLE_TAGS = ["track", "artist", "album", "title", "genre", "year"]
RE_DIGIT_ENDING = re.compile(r"\d+|\(\d+\)|\[\d+\]|{\d+}")
RE_DIGIT_ENDING = re.compile(r'\d+|\(\d+\)|\[\d+\]|{\d+}')
def is_same_with_digit(name, refname): def is_same_with_digit(name, refname):
# Returns True if name is the same as refname, but with digits (with brackets or not) at the end # Returns True if name is the same as refname, but with digits (with brackets or not) at the end
if not name.startswith(refname): if not name.startswith(refname):
return False return False
end = name[len(refname):].strip() end = name[len(refname) :].strip()
return RE_DIGIT_ENDING.match(end) is not None return RE_DIGIT_ENDING.match(end) is not None
def remove_dupe_paths(files): def remove_dupe_paths(files):
# Returns files with duplicates-by-path removed. Files with the exact same path are considered # Returns files with duplicates-by-path removed. Files with the exact same path are considered
# duplicates and only the first file to have a path is kept. In certain cases, we have files # duplicates and only the first file to have a path is kept. In certain cases, we have files
@@ -57,42 +61,53 @@ def remove_dupe_paths(files):
if normalized in path2file: if normalized in path2file:
try: try:
if op.samefile(normalized, str(path2file[normalized].path)): if op.samefile(normalized, str(path2file[normalized].path)):
continue # same file, it's a dupe continue # same file, it's a dupe
else: else:
pass # We don't treat them as dupes pass # We don't treat them as dupes
except OSError: except OSError:
continue # File doesn't exist? Well, treat them as dupes continue # File doesn't exist? Well, treat them as dupes
else: else:
path2file[normalized] = f path2file[normalized] = f
result.append(f) result.append(f)
return result return result
class Scanner: class Scanner:
def __init__(self): def __init__(self):
self.discarded_file_count = 0 self.discarded_file_count = 0
def _getmatches(self, files, j): def _getmatches(self, files, j):
if self.size_threshold or self.scan_type in {ScanType.Contents, ScanType.Folders}: if (
self.size_threshold
or self.large_size_threshold
or self.scan_type
in {
ScanType.CONTENTS,
ScanType.FOLDERS,
}
):
j = j.start_subjob([2, 8]) j = j.start_subjob([2, 8])
for f in j.iter_with_progress(files, tr("Read size of %d/%d files")): for f in j.iter_with_progress(files, tr("Read size of %d/%d files")):
f.size # pre-read, makes a smoother progress if read here (especially for bundles) f.size # pre-read, makes a smoother progress if read here (especially for bundles)
if self.size_threshold: if self.size_threshold:
files = [f for f in files if f.size >= self.size_threshold] files = [f for f in files if f.size >= self.size_threshold]
if self.scan_type in {ScanType.Contents, ScanType.Folders}: if self.large_size_threshold:
return engine.getmatches_by_contents(files, j=j) files = [f for f in files if f.size <= self.large_size_threshold]
if self.scan_type in {ScanType.CONTENTS, ScanType.FOLDERS}:
return engine.getmatches_by_contents(files, bigsize=self.big_file_size_threshold, j=j)
else: else:
j = j.start_subjob([2, 8]) j = j.start_subjob([2, 8])
kw = {} kw = {}
kw['match_similar_words'] = self.match_similar_words kw["match_similar_words"] = self.match_similar_words
kw['weight_words'] = self.word_weighting kw["weight_words"] = self.word_weighting
kw['min_match_percentage'] = self.min_match_percentage kw["min_match_percentage"] = self.min_match_percentage
if self.scan_type == ScanType.FieldsNoOrder: if self.scan_type == ScanType.FIELDSNOORDER:
self.scan_type = ScanType.Fields self.scan_type = ScanType.FIELDS
kw['no_field_order'] = True kw["no_field_order"] = True
func = { func = {
ScanType.Filename: lambda f: engine.getwords(rem_file_ext(f.name)), ScanType.FILENAME: lambda f: engine.getwords(rem_file_ext(f.name)),
ScanType.Fields: lambda f: engine.getfields(rem_file_ext(f.name)), ScanType.FIELDS: lambda f: engine.getfields(rem_file_ext(f.name)),
ScanType.Tag: lambda f: [ ScanType.TAG: lambda f: [
engine.getwords(str(getattr(f, attrname))) engine.getwords(str(getattr(f, attrname)))
for attrname in SCANNABLE_TAGS for attrname in SCANNABLE_TAGS
if attrname in self.scanned_tags if attrname in self.scanned_tags
@@ -111,15 +126,15 @@ class Scanner:
def _tie_breaker(ref, dupe): def _tie_breaker(ref, dupe):
refname = rem_file_ext(ref.name).lower() refname = rem_file_ext(ref.name).lower()
dupename = rem_file_ext(dupe.name).lower() dupename = rem_file_ext(dupe.name).lower()
if 'copy' in dupename: if "copy" in dupename:
return False return False
if 'copy' in refname: if "copy" in refname:
return True return True
if is_same_with_digit(dupename, refname): if is_same_with_digit(dupename, refname):
return False return False
if is_same_with_digit(refname, dupename): if is_same_with_digit(refname, dupename):
return True return True
return len(dupe.path) > len(ref.path) return len(dupe.path.parts) > len(ref.path.parts)
@staticmethod @staticmethod
def get_scan_options(): def get_scan_options():
@@ -130,26 +145,26 @@ class Scanner:
raise NotImplementedError() raise NotImplementedError()
def get_dupe_groups(self, files, ignore_list=None, j=job.nulljob): def get_dupe_groups(self, files, ignore_list=None, j=job.nulljob):
for f in (f for f in files if not hasattr(f, 'is_ref')): for f in (f for f in files if not hasattr(f, "is_ref")):
f.is_ref = False f.is_ref = False
files = remove_dupe_paths(files) files = remove_dupe_paths(files)
logging.info("Getting matches. Scan type: %d", self.scan_type) logging.info("Getting matches. Scan type: %d", self.scan_type)
matches = self._getmatches(files, j) matches = self._getmatches(files, j)
logging.info('Found %d matches' % len(matches)) logging.info("Found %d matches" % len(matches))
j.set_progress(100, tr("Almost done! Fiddling with results...")) j.set_progress(100, tr("Almost done! Fiddling with results..."))
# In removing what we call here "false matches", we first want to remove, if we scan by # In removing what we call here "false matches", we first want to remove, if we scan by
# folders, we want to remove folder matches for which the parent is also in a match (they're # folders, we want to remove folder matches for which the parent is also in a match (they're
# "duplicated duplicates if you will). Then, we also don't want mixed file kinds if the # "duplicated duplicates if you will). Then, we also don't want mixed file kinds if the
# option isn't enabled, we want matches for which both files exist and, lastly, we don't # option isn't enabled, we want matches for which both files exist and, lastly, we don't
# want matches with both files as ref. # want matches with both files as ref.
if self.scan_type == ScanType.Folders and matches: if self.scan_type == ScanType.FOLDERS and matches:
allpath = {m.first.path for m in matches} allpath = {m.first.path for m in matches}
allpath |= {m.second.path for m in matches} allpath |= {m.second.path for m in matches}
sortedpaths = sorted(allpath) sortedpaths = sorted(allpath)
toremove = set() toremove = set()
last_parent_path = sortedpaths[0] last_parent_path = sortedpaths[0]
for p in sortedpaths[1:]: for p in sortedpaths[1:]:
if p in last_parent_path: if last_parent_path in p.parents:
toremove.add(p) toremove.add(p)
else: else:
last_parent_path = p last_parent_path = p
@@ -159,13 +174,15 @@ class Scanner:
matches = [m for m in matches if m.first.path.exists() and m.second.path.exists()] matches = [m for m in matches if m.first.path.exists() and m.second.path.exists()]
matches = [m for m in matches if not (m.first.is_ref and m.second.is_ref)] matches = [m for m in matches if not (m.first.is_ref and m.second.is_ref)]
if ignore_list: if ignore_list:
matches = [ matches = [m for m in matches if not ignore_list.are_ignored(str(m.first.path), str(m.second.path))]
m for m in matches logging.info("Grouping matches")
if not ignore_list.AreIgnored(str(m.first.path), str(m.second.path))
]
logging.info('Grouping matches')
groups = engine.get_groups(matches) groups = engine.get_groups(matches)
if self.scan_type in {ScanType.Filename, ScanType.Fields, ScanType.FieldsNoOrder, ScanType.Tag}: if self.scan_type in {
ScanType.FILENAME,
ScanType.FIELDS,
ScanType.FIELDSNOORDER,
ScanType.TAG,
}:
matched_files = dedupe([m.first for m in matches] + [m.second for m in matches]) matched_files = dedupe([m.first for m in matches] + [m.second for m in matches])
self.discarded_file_count = len(matched_files) - sum(len(g) for g in groups) self.discarded_file_count = len(matched_files) - sum(len(g) for g in groups)
else: else:
@@ -181,7 +198,7 @@ class Scanner:
# reporting discarded matches. # reporting discarded matches.
self.discarded_file_count = 0 self.discarded_file_count = 0
groups = [g for g in groups if any(not f.is_ref for f in g)] groups = [g for g in groups if any(not f.is_ref for f in g)]
logging.info('Created %d groups' % len(groups)) logging.info("Created %d groups" % len(groups))
for g in groups: for g in groups:
g.prioritize(self._key_func, self._tie_breaker) g.prioritize(self._key_func, self._tie_breaker)
return groups return groups
@@ -189,8 +206,9 @@ class Scanner:
match_similar_words = False match_similar_words = False
min_match_percentage = 80 min_match_percentage = 80
mix_file_kind = True mix_file_kind = True
scan_type = ScanType.Filename scan_type = ScanType.FILENAME
scanned_tags = {'artist', 'title'} scanned_tags = {"artist", "title"}
size_threshold = 0 size_threshold = 0
large_size_threshold = 0
big_file_size_threshold = 0
word_weighting = False word_weighting = False
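The size filtering added to _getmatches above now has both a lower and an upper bound: files below size_threshold or above large_size_threshold are dropped before matching, and content scans pass big_file_size_threshold through to engine.getmatches_by_contents. A standalone sketch of the threshold filtering with made-up numbers (a default of 0 disables a bound, as in the scanner):

size_threshold = 1024            # 0 disables the lower bound
large_size_threshold = 10**8     # 0 disables the upper bound
sizes = [512, 4096, 5 * 10**8]   # stand-ins for f.size

if size_threshold:
    sizes = [s for s in sizes if s >= size_threshold]
if large_size_threshold:
    sizes = [s for s in sizes if s <= large_size_threshold]
# sizes == [4096]: only files inside both bounds go on to matching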


@@ -1 +1 @@
from . import fs, result_table, scanner # noqa from core.se import fs, result_table, scanner # noqa


@@ -11,6 +11,7 @@ from hscommon.util import format_size
from core import fs from core import fs
from core.util import format_timestamp, format_perc, format_words, format_dupe_count from core.util import format_timestamp, format_perc, format_words, format_dupe_count
def get_display_info(dupe, group, delta): def get_display_info(dupe, group, delta):
size = dupe.size size = dupe.size
mtime = dupe.mtime mtime = dupe.mtime
@@ -26,16 +27,17 @@ def get_display_info(dupe, group, delta):
percentage = group.percentage percentage = group.percentage
dupe_count = len(group.dupes) dupe_count = len(group.dupes)
return { return {
'name': dupe.name, "name": dupe.name,
'folder_path': str(dupe.folder_path), "folder_path": str(dupe.folder_path),
'size': format_size(size, 0, 1, False), "size": format_size(size, 0, 1, False),
'extension': dupe.extension, "extension": dupe.extension,
'mtime': format_timestamp(mtime, delta and m), "mtime": format_timestamp(mtime, delta and m),
'percentage': format_perc(percentage), "percentage": format_perc(percentage),
'words': format_words(dupe.words) if hasattr(dupe, 'words') else '', "words": format_words(dupe.words) if hasattr(dupe, "words") else "",
'dupe_count': format_dupe_count(dupe_count), "dupe_count": format_dupe_count(dupe_count),
} }
class File(fs.File): class File(fs.File):
def get_display_info(self, group, delta): def get_display_info(self, group, delta):
return get_display_info(self, group, delta) return get_display_info(self, group, delta)
@@ -44,4 +46,3 @@ class File(fs.File):
class Folder(fs.Folder): class Folder(fs.Folder):
def get_display_info(self, group, delta): def get_display_info(self, group, delta):
return get_display_info(self, group, delta) return get_display_info(self, group, delta)
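get_display_info above renders every column as a string; when delta is on, numeric columns such as size are shown relative to the group's reference file (the same subtraction appears in the test NamedObject further down). A toy illustration with hypothetical numbers:

ref_size, dupe_size = 1024, 1030   # hypothetical group.ref.size and dupe.size
delta = True
shown = dupe_size - ref_size if delta else dupe_size
# shown == 6: the dupe's size is displayed relative to its reference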


@@ -1,8 +1,8 @@
# Created On: 2011-11-27 # Created On: 2011-11-27
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon.gui.column import Column from hscommon.gui.column import Column
@@ -10,18 +10,19 @@ from hscommon.trans import trget
from core.gui.result_table import ResultTable as ResultTableBase from core.gui.result_table import ResultTable as ResultTableBase
coltr = trget('columns') coltr = trget("columns")
class ResultTable(ResultTableBase): class ResultTable(ResultTableBase):
COLUMNS = [ COLUMNS = [
Column('marked', ''), Column("marked", ""),
Column('name', coltr("Filename")), Column("name", coltr("Filename")),
Column('folder_path', coltr("Folder"), optional=True), Column("folder_path", coltr("Folder"), optional=True),
Column('size', coltr("Size (KB)"), optional=True), Column("size", coltr("Size (KB)"), optional=True),
Column('extension', coltr("Kind"), visible=False, optional=True), Column("extension", coltr("Kind"), visible=False, optional=True),
Column('mtime', coltr("Modification"), visible=False, optional=True), Column("mtime", coltr("Modification"), visible=False, optional=True),
Column('percentage', coltr("Match %"), optional=True), Column("percentage", coltr("Match %"), optional=True),
Column('words', coltr("Words Used"), visible=False, optional=True), Column("words", coltr("Words Used"), visible=False, optional=True),
Column('dupe_count', coltr("Dupe Count"), visible=False, optional=True), Column("dupe_count", coltr("Dupe Count"), visible=False, optional=True),
] ]
DELTA_COLUMNS = {'size', 'mtime'} DELTA_COLUMNS = {"size", "mtime"}


@@ -8,12 +8,12 @@ from hscommon.trans import tr
from core.scanner import Scanner as ScannerBase, ScanOption, ScanType from core.scanner import Scanner as ScannerBase, ScanOption, ScanType
class ScannerSE(ScannerBase): class ScannerSE(ScannerBase):
@staticmethod @staticmethod
def get_scan_options(): def get_scan_options():
return [ return [
ScanOption(ScanType.Filename, tr("Filename")), ScanOption(ScanType.FILENAME, tr("Filename")),
ScanOption(ScanType.Contents, tr("Contents")), ScanOption(ScanType.CONTENTS, tr("Contents")),
ScanOption(ScanType.Folders, tr("Folders")), ScanOption(ScanType.FOLDERS, tr("Folders")),
] ]
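With the constants renamed to uppercase, callers pick a scan mode through these ScanOption entries instead of hard-coding numbers. A self-contained sketch mirroring the namedtuple and constant values shown earlier in this diff (hypothetical list, not the translated labels returned at runtime):

from collections import namedtuple

ScanOption = namedtuple("ScanOption", "scan_type label")  # mirrors the definition above
FILENAME, FOLDERS, CONTENTS = 0, 4, 5                      # mirrors ScanType values
options = [ScanOption(FILENAME, "Filename"), ScanOption(CONTENTS, "Contents"), ScanOption(FOLDERS, "Folders")]
contents = next(o for o in options if o.scan_type == CONTENTS)
# contents.label == "Contents"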


@@ -7,106 +7,113 @@
import os import os
import os.path as op import os.path as op
import logging import logging
import tempfile
from pytest import mark import pytest
from hscommon.path import Path from pathlib import Path
import hscommon.conflict import hscommon.conflict
import hscommon.util import hscommon.util
from hscommon.testutil import eq_, log_calls from hscommon.testutil import eq_, log_calls
from hscommon.jobprogress.job import Job from hscommon.jobprogress.job import Job
from .base import TestApp from core.tests.base import TestApp
from .results_test import GetTestGroups from core.tests.results_test import GetTestGroups
from .. import app, fs, engine from core import app, fs, engine
from ..scanner import ScanType from core.scanner import ScanType
def add_fake_files_to_directories(directories, files): def add_fake_files_to_directories(directories, files):
directories.get_files = lambda j=None: iter(files) directories.get_files = lambda j=None: iter(files)
directories._dirs.append('this is just so Scan() doesnt return 3') directories._dirs.append("this is just so Scan() doesn't return 3")
class TestCaseDupeGuru: class TestCaseDupeGuru:
def test_apply_filter_calls_results_apply_filter(self, monkeypatch): def test_apply_filter_calls_results_apply_filter(self, monkeypatch):
dgapp = TestApp().app dgapp = TestApp().app
monkeypatch.setattr(dgapp.results, 'apply_filter', log_calls(dgapp.results.apply_filter)) monkeypatch.setattr(dgapp.results, "apply_filter", log_calls(dgapp.results.apply_filter))
dgapp.apply_filter('foo') dgapp.apply_filter("foo")
eq_(2, len(dgapp.results.apply_filter.calls)) eq_(2, len(dgapp.results.apply_filter.calls))
call = dgapp.results.apply_filter.calls[0] call = dgapp.results.apply_filter.calls[0]
assert call['filter_str'] is None assert call["filter_str"] is None
call = dgapp.results.apply_filter.calls[1] call = dgapp.results.apply_filter.calls[1]
eq_('foo', call['filter_str']) eq_("foo", call["filter_str"])
def test_apply_filter_escapes_regexp(self, monkeypatch): def test_apply_filter_escapes_regexp(self, monkeypatch):
dgapp = TestApp().app dgapp = TestApp().app
monkeypatch.setattr(dgapp.results, 'apply_filter', log_calls(dgapp.results.apply_filter)) monkeypatch.setattr(dgapp.results, "apply_filter", log_calls(dgapp.results.apply_filter))
dgapp.apply_filter('()[]\\.|+?^abc') dgapp.apply_filter("()[]\\.|+?^abc")
call = dgapp.results.apply_filter.calls[1] call = dgapp.results.apply_filter.calls[1]
eq_('\\(\\)\\[\\]\\\\\\.\\|\\+\\?\\^abc', call['filter_str']) eq_("\\(\\)\\[\\]\\\\\\.\\|\\+\\?\\^abc", call["filter_str"])
dgapp.apply_filter('(*)') # In "simple mode", we want the * to behave as a wilcard dgapp.apply_filter("(*)") # In "simple mode", we want the * to behave as a wildcard
call = dgapp.results.apply_filter.calls[3] call = dgapp.results.apply_filter.calls[3]
eq_(r'\(.*\)', call['filter_str']) eq_(r"\(.*\)", call["filter_str"])
dgapp.options['escape_filter_regexp'] = False dgapp.options["escape_filter_regexp"] = False
dgapp.apply_filter('(abc)') dgapp.apply_filter("(abc)")
call = dgapp.results.apply_filter.calls[5] call = dgapp.results.apply_filter.calls[5]
eq_('(abc)', call['filter_str']) eq_("(abc)", call["filter_str"])
def test_copy_or_move(self, tmpdir, monkeypatch): def test_copy_or_move(self, tmpdir, monkeypatch):
# The goal here is just to have a test for a previous blowup I had. I know my test coverage # The goal here is just to have a test for a previous blowup I had. I know my test coverage
# for this unit is pathetic. What's done is done. My approach now is to add tests for # for this unit is pathetic. What's done is done. My approach now is to add tests for
# every change I want to make. The blowup was caused by a missing import. # every change I want to make. The blowup was caused by a missing import.
p = Path(str(tmpdir)) p = Path(str(tmpdir))
p['foo'].open('w').close() p.joinpath("foo").touch()
monkeypatch.setattr(hscommon.conflict, 'smart_copy', log_calls(lambda source_path, dest_path: None)) monkeypatch.setattr(
hscommon.conflict,
"smart_copy",
log_calls(lambda source_path, dest_path: None),
)
# XXX This monkeypatch is temporary. will be fixed in a better monkeypatcher. # XXX This monkeypatch is temporary. will be fixed in a better monkeypatcher.
monkeypatch.setattr(app, 'smart_copy', hscommon.conflict.smart_copy) monkeypatch.setattr(app, "smart_copy", hscommon.conflict.smart_copy)
monkeypatch.setattr(os, 'makedirs', lambda path: None) # We don't want the test to create that fake directory monkeypatch.setattr(os, "makedirs", lambda path: None) # We don't want the test to create that fake directory
dgapp = TestApp().app dgapp = TestApp().app
dgapp.directories.add_path(p) dgapp.directories.add_path(p)
[f] = dgapp.directories.get_files() [f] = dgapp.directories.get_files()
dgapp.copy_or_move(f, True, 'some_destination', 0) with tempfile.TemporaryDirectory() as tmp_dir:
eq_(1, len(hscommon.conflict.smart_copy.calls)) dgapp.copy_or_move(f, True, tmp_dir, 0)
call = hscommon.conflict.smart_copy.calls[0] eq_(1, len(hscommon.conflict.smart_copy.calls))
eq_(call['dest_path'], op.join('some_destination', 'foo')) call = hscommon.conflict.smart_copy.calls[0]
eq_(call['source_path'], f.path) eq_(call["dest_path"], Path(tmp_dir, "foo"))
eq_(call["source_path"], f.path)
def test_copy_or_move_clean_empty_dirs(self, tmpdir, monkeypatch): def test_copy_or_move_clean_empty_dirs(self, tmpdir, monkeypatch):
tmppath = Path(str(tmpdir)) tmppath = Path(str(tmpdir))
sourcepath = tmppath['source'] sourcepath = tmppath.joinpath("source")
sourcepath.mkdir() sourcepath.mkdir()
sourcepath['myfile'].open('w') sourcepath.joinpath("myfile").touch()
app = TestApp().app app = TestApp().app
app.directories.add_path(tmppath) app.directories.add_path(tmppath)
[myfile] = app.directories.get_files() [myfile] = app.directories.get_files()
monkeypatch.setattr(app, 'clean_empty_dirs', log_calls(lambda path: None)) monkeypatch.setattr(app, "clean_empty_dirs", log_calls(lambda path: None))
app.copy_or_move(myfile, False, tmppath['dest'], 0) app.copy_or_move(myfile, False, tmppath.joinpath("dest"), 0)
calls = app.clean_empty_dirs.calls calls = app.clean_empty_dirs.calls
eq_(1, len(calls)) eq_(1, len(calls))
eq_(sourcepath, calls[0]['path']) eq_(sourcepath, calls[0]["path"])
def test_Scan_with_objects_evaluating_to_false(self): def test_scan_with_objects_evaluating_to_false(self):
class FakeFile(fs.File): class FakeFile(fs.File):
def __bool__(self): def __bool__(self):
return False return False
# At some point, any() was used in a wrong way that made Scan() wrongly return 1 # At some point, any() was used in a wrong way that made Scan() wrongly return 1
app = TestApp().app app = TestApp().app
f1, f2 = [FakeFile('foo') for i in range(2)] f1, f2 = (FakeFile("foo") for _ in range(2))
f1.is_ref, f2.is_ref = (False, False) f1.is_ref, f2.is_ref = (False, False)
assert not (bool(f1) and bool(f2)) assert not (bool(f1) and bool(f2))
add_fake_files_to_directories(app.directories, [f1, f2]) add_fake_files_to_directories(app.directories, [f1, f2])
app.start_scanning() # no exception app.start_scanning() # no exception
@mark.skipif("not hasattr(os, 'link')") @pytest.mark.skipif("not hasattr(os, 'link')")
def test_ignore_hardlink_matches(self, tmpdir): def test_ignore_hardlink_matches(self, tmpdir):
# If the ignore_hardlink_matches option is set, don't match files hardlinking to the same # If the ignore_hardlink_matches option is set, don't match files hardlinking to the same
# inode. # inode.
tmppath = Path(str(tmpdir)) tmppath = Path(str(tmpdir))
tmppath['myfile'].open('w').write('foo') tmppath.joinpath("myfile").open("wt").write("foo")
os.link(str(tmppath['myfile']), str(tmppath['hardlink'])) os.link(str(tmppath.joinpath("myfile")), str(tmppath.joinpath("hardlink")))
app = TestApp().app app = TestApp().app
app.directories.add_path(tmppath) app.directories.add_path(tmppath)
app.options['scan_type'] = ScanType.Contents app.options["scan_type"] = ScanType.CONTENTS
app.options['ignore_hardlink_matches'] = True app.options["ignore_hardlink_matches"] = True
app.start_scanning() app.start_scanning()
eq_(len(app.results.groups), 0) eq_(len(app.results.groups), 0)
@@ -116,48 +123,55 @@ class TestCaseDupeGuru:
# making the selected row None. Don't crash when it happens. # making the selected row None. Don't crash when it happens.
dgapp = TestApp().app dgapp = TestApp().app
# selected_row is None because there's no result. # selected_row is None because there's no result.
assert not dgapp.result_table.rename_selected('foo') # no crash assert not dgapp.result_table.rename_selected("foo") # no crash
class TestCaseDupeGuru_clean_empty_dirs:
def pytest_funcarg__do_setup(self, request): class TestCaseDupeGuruCleanEmptyDirs:
monkeypatch = request.getfuncargvalue('monkeypatch') @pytest.fixture
monkeypatch.setattr(hscommon.util, 'delete_if_empty', log_calls(lambda path, files_to_delete=[]: None)) def do_setup(self, request):
monkeypatch = request.getfixturevalue("monkeypatch")
monkeypatch.setattr(
hscommon.util,
"delete_if_empty",
log_calls(lambda path, files_to_delete=[]: None),
)
# XXX This monkeypatch is temporary. will be fixed in a better monkeypatcher. # XXX This monkeypatch is temporary. will be fixed in a better monkeypatcher.
monkeypatch.setattr(app, 'delete_if_empty', hscommon.util.delete_if_empty) monkeypatch.setattr(app, "delete_if_empty", hscommon.util.delete_if_empty)
self.app = TestApp().app self.app = TestApp().app
def test_option_off(self, do_setup): def test_option_off(self, do_setup):
self.app.clean_empty_dirs(Path('/foo/bar')) self.app.clean_empty_dirs(Path("/foo/bar"))
eq_(0, len(hscommon.util.delete_if_empty.calls)) eq_(0, len(hscommon.util.delete_if_empty.calls))
def test_option_on(self, do_setup): def test_option_on(self, do_setup):
self.app.options['clean_empty_dirs'] = True self.app.options["clean_empty_dirs"] = True
self.app.clean_empty_dirs(Path('/foo/bar')) self.app.clean_empty_dirs(Path("/foo/bar"))
calls = hscommon.util.delete_if_empty.calls calls = hscommon.util.delete_if_empty.calls
eq_(1, len(calls)) eq_(1, len(calls))
eq_(Path('/foo/bar'), calls[0]['path']) eq_(Path("/foo/bar"), calls[0]["path"])
eq_(['.DS_Store'], calls[0]['files_to_delete']) eq_([".DS_Store"], calls[0]["files_to_delete"])
def test_recurse_up(self, do_setup, monkeypatch): def test_recurse_up(self, do_setup, monkeypatch):
# delete_if_empty must be recursively called up in the path until it returns False # delete_if_empty must be recursively called up in the path until it returns False
@log_calls @log_calls
def mock_delete_if_empty(path, files_to_delete=[]): def mock_delete_if_empty(path, files_to_delete=[]):
return len(path) > 1 return len(path.parts) > 1
monkeypatch.setattr(hscommon.util, 'delete_if_empty', mock_delete_if_empty) monkeypatch.setattr(hscommon.util, "delete_if_empty", mock_delete_if_empty)
# XXX This monkeypatch is temporary. will be fixed in a better monkeypatcher. # XXX This monkeypatch is temporary. will be fixed in a better monkeypatcher.
monkeypatch.setattr(app, 'delete_if_empty', mock_delete_if_empty) monkeypatch.setattr(app, "delete_if_empty", mock_delete_if_empty)
self.app.options['clean_empty_dirs'] = True self.app.options["clean_empty_dirs"] = True
self.app.clean_empty_dirs(Path('not-empty/empty/empty')) self.app.clean_empty_dirs(Path("not-empty/empty/empty"))
calls = hscommon.util.delete_if_empty.calls calls = hscommon.util.delete_if_empty.calls
eq_(3, len(calls)) eq_(3, len(calls))
eq_(Path('not-empty/empty/empty'), calls[0]['path']) eq_(Path("not-empty/empty/empty"), calls[0]["path"])
eq_(Path('not-empty/empty'), calls[1]['path']) eq_(Path("not-empty/empty"), calls[1]["path"])
eq_(Path('not-empty'), calls[2]['path']) eq_(Path("not-empty"), calls[2]["path"])
class TestCaseDupeGuruWithResults: class TestCaseDupeGuruWithResults:
def pytest_funcarg__do_setup(self, request): @pytest.fixture
def do_setup(self, request):
app = TestApp() app = TestApp()
self.app = app.app self.app = app.app
self.objects, self.matches, self.groups = GetTestGroups() self.objects, self.matches, self.groups = GetTestGroups()
@@ -166,13 +180,13 @@ class TestCaseDupeGuruWithResults:
self.dtree = app.dtree self.dtree = app.dtree
self.rtable = app.rtable self.rtable = app.rtable
self.rtable.refresh() self.rtable.refresh()
tmpdir = request.getfuncargvalue('tmpdir') tmpdir = request.getfixturevalue("tmpdir")
tmppath = Path(str(tmpdir)) tmppath = Path(str(tmpdir))
tmppath['foo'].mkdir() tmppath.joinpath("foo").mkdir()
tmppath['bar'].mkdir() tmppath.joinpath("bar").mkdir()
self.app.directories.add_path(tmppath) self.app.directories.add_path(tmppath)
def test_GetObjects(self, do_setup): def test_get_objects(self, do_setup):
objects = self.objects objects = self.objects
groups = self.groups groups = self.groups
r = self.rtable[0] r = self.rtable[0]
@@ -185,10 +199,10 @@ class TestCaseDupeGuruWithResults:
assert r._group is groups[1] assert r._group is groups[1]
assert r._dupe is objects[4] assert r._dupe is objects[4]
def test_GetObjects_after_sort(self, do_setup): def test_get_objects_after_sort(self, do_setup):
objects = self.objects objects = self.objects
groups = self.groups[:] # we need an un-sorted reference groups = self.groups[:] # we need an un-sorted reference
self.rtable.sort('name', False) self.rtable.sort("name", False)
r = self.rtable[1] r = self.rtable[1]
assert r._group is groups[1] assert r._group is groups[1]
assert r._dupe is objects[4] assert r._dupe is objects[4]
@@ -198,9 +212,9 @@ class TestCaseDupeGuruWithResults:
self.rtable.select([1, 2, 3]) self.rtable.select([1, 2, 3])
self.app.remove_selected() self.app.remove_selected()
# The first 2 dupes have been removed. The 3rd one is a ref. it stays there, in first pos. # The first 2 dupes have been removed. The 3rd one is a ref. it stays there, in first pos.
eq_(self.rtable.selected_indexes, [1]) # no exception eq_(self.rtable.selected_indexes, [1]) # no exception
def test_selectResultNodePaths(self, do_setup): def test_select_result_node_paths(self, do_setup):
app = self.app app = self.app
objects = self.objects objects = self.objects
self.rtable.select([1, 2]) self.rtable.select([1, 2])
@@ -208,7 +222,7 @@ class TestCaseDupeGuruWithResults:
assert app.selected_dupes[0] is objects[1] assert app.selected_dupes[0] is objects[1]
assert app.selected_dupes[1] is objects[2] assert app.selected_dupes[1] is objects[2]
def test_selectResultNodePaths_with_ref(self, do_setup): def test_select_result_node_paths_with_ref(self, do_setup):
app = self.app app = self.app
objects = self.objects objects = self.objects
self.rtable.select([1, 2, 3]) self.rtable.select([1, 2, 3])
@@ -217,12 +231,12 @@ class TestCaseDupeGuruWithResults:
assert app.selected_dupes[1] is objects[2] assert app.selected_dupes[1] is objects[2]
assert app.selected_dupes[2] is self.groups[1].ref assert app.selected_dupes[2] is self.groups[1].ref
def test_selectResultNodePaths_after_sort(self, do_setup): def test_select_result_node_paths_after_sort(self, do_setup):
app = self.app app = self.app
objects = self.objects objects = self.objects
groups = self.groups[:] #To keep the old order in memory groups = self.groups[:] # To keep the old order in memory
self.rtable.sort('name', False) #0 self.rtable.sort("name", False) # 0
#Now, the group order is supposed to be reversed # Now, the group order is supposed to be reversed
self.rtable.select([1, 2, 3]) self.rtable.select([1, 2, 3])
eq_(len(app.selected_dupes), 3) eq_(len(app.selected_dupes), 3)
assert app.selected_dupes[0] is objects[4] assert app.selected_dupes[0] is objects[4]
@@ -242,13 +256,13 @@ class TestCaseDupeGuruWithResults:
self.rtable.power_marker = True self.rtable.power_marker = True
self.rtable.select([0, 1, 2]) self.rtable.select([0, 1, 2])
app.remove_selected() app.remove_selected()
eq_(self.rtable.selected_indexes, []) # no exception eq_(self.rtable.selected_indexes, []) # no exception
def test_selectPowerMarkerRows_after_sort(self, do_setup): def test_select_powermarker_rows_after_sort(self, do_setup):
app = self.app app = self.app
objects = self.objects objects = self.objects
self.rtable.power_marker = True self.rtable.power_marker = True
self.rtable.sort('name', False) self.rtable.sort("name", False)
self.rtable.select([0, 1, 2]) self.rtable.select([0, 1, 2])
eq_(len(app.selected_dupes), 3) eq_(len(app.selected_dupes), 3)
assert app.selected_dupes[0] is objects[4] assert app.selected_dupes[0] is objects[4]
@@ -283,15 +297,15 @@ class TestCaseDupeGuruWithResults:
app.toggle_selected_mark_state() app.toggle_selected_mark_state()
eq_(app.results.mark_count, 0) eq_(app.results.mark_count, 0)
def test_refreshDetailsWithSelected(self, do_setup): def test_refresh_details_with_selected(self, do_setup):
self.rtable.select([1, 4]) self.rtable.select([1, 4])
eq_(self.dpanel.row(0), ('Filename', 'bar bleh', 'foo bar')) eq_(self.dpanel.row(0), ("Filename", "bar bleh", "foo bar"))
self.dpanel.view.check_gui_calls(['refresh']) self.dpanel.view.check_gui_calls(["refresh"])
self.rtable.select([]) self.rtable.select([])
eq_(self.dpanel.row(0), ('Filename', '---', '---')) eq_(self.dpanel.row(0), ("Filename", "---", "---"))
self.dpanel.view.check_gui_calls(['refresh']) self.dpanel.view.check_gui_calls(["refresh"])
def test_makeSelectedReference(self, do_setup): def test_make_selected_reference(self, do_setup):
app = self.app app = self.app
objects = self.objects objects = self.objects
groups = self.groups groups = self.groups
@@ -300,25 +314,25 @@ class TestCaseDupeGuruWithResults:
assert groups[0].ref is objects[1] assert groups[0].ref is objects[1]
assert groups[1].ref is objects[4] assert groups[1].ref is objects[4]
def test_makeSelectedReference_by_selecting_two_dupes_in_the_same_group(self, do_setup): def test_make_selected_reference_by_selecting_two_dupes_in_the_same_group(self, do_setup):
app = self.app app = self.app
objects = self.objects objects = self.objects
groups = self.groups groups = self.groups
self.rtable.select([1, 2, 4]) self.rtable.select([1, 2, 4])
#Only [0, 0] and [1, 0] must go ref, not [0, 1] because it is a part of the same group # Only [0, 0] and [1, 0] must go ref, not [0, 1] because it is a part of the same group
app.make_selected_reference() app.make_selected_reference()
assert groups[0].ref is objects[1] assert groups[0].ref is objects[1]
assert groups[1].ref is objects[4] assert groups[1].ref is objects[4]
def test_removeSelected(self, do_setup): def test_remove_selected(self, do_setup):
app = self.app app = self.app
self.rtable.select([1, 4]) self.rtable.select([1, 4])
app.remove_selected() app.remove_selected()
eq_(len(app.results.dupes), 1) # the first path is now selected eq_(len(app.results.dupes), 1) # the first path is now selected
app.remove_selected() app.remove_selected()
eq_(len(app.results.dupes), 0) eq_(len(app.results.dupes), 0)
def test_addDirectory_simple(self, do_setup): def test_add_directory_simple(self, do_setup):
# There's already a directory in self.app, so adding another once makes 2 of em # There's already a directory in self.app, so adding another once makes 2 of em
app = self.app app = self.app
# any other path that isn't a parent or child of the already added path # any other path that isn't a parent or child of the already added path
@@ -326,7 +340,7 @@ class TestCaseDupeGuruWithResults:
app.add_directory(otherpath) app.add_directory(otherpath)
eq_(len(app.directories), 2) eq_(len(app.directories), 2)
def test_addDirectory_already_there(self, do_setup): def test_add_directory_already_there(self, do_setup):
app = self.app app = self.app
otherpath = Path(op.dirname(__file__)) otherpath = Path(op.dirname(__file__))
app.add_directory(otherpath) app.add_directory(otherpath)
@@ -334,46 +348,46 @@ class TestCaseDupeGuruWithResults:
eq_(len(app.view.messages), 1) eq_(len(app.view.messages), 1)
assert "already" in app.view.messages[0] assert "already" in app.view.messages[0]
def test_addDirectory_does_not_exist(self, do_setup): def test_add_directory_does_not_exist(self, do_setup):
app = self.app app = self.app
app.add_directory('/does_not_exist') app.add_directory("/does_not_exist")
eq_(len(app.view.messages), 1) eq_(len(app.view.messages), 1)
assert "exist" in app.view.messages[0] assert "exist" in app.view.messages[0]
def test_ignore(self, do_setup): def test_ignore(self, do_setup):
app = self.app app = self.app
self.rtable.select([4]) #The dupe of the second, 2 sized group self.rtable.select([4]) # The dupe of the second, 2 sized group
app.add_selected_to_ignore_list() app.add_selected_to_ignore_list()
eq_(len(app.ignore_list), 1) eq_(len(app.ignore_list), 1)
self.rtable.select([1]) #first dupe of the 3 dupes group self.rtable.select([1]) # first dupe of the 3 dupes group
app.add_selected_to_ignore_list() app.add_selected_to_ignore_list()
#BOTH the ref and the other dupe should have been added # BOTH the ref and the other dupe should have been added
eq_(len(app.ignore_list), 3) eq_(len(app.ignore_list), 3)
def test_purgeIgnoreList(self, do_setup, tmpdir): def test_purge_ignorelist(self, do_setup, tmpdir):
app = self.app app = self.app
p1 = str(tmpdir.join('file1')) p1 = str(tmpdir.join("file1"))
p2 = str(tmpdir.join('file2')) p2 = str(tmpdir.join("file2"))
open(p1, 'w').close() open(p1, "w").close()
open(p2, 'w').close() open(p2, "w").close()
dne = '/does_not_exist' dne = "/does_not_exist"
app.ignore_list.Ignore(dne, p1) app.ignore_list.ignore(dne, p1)
app.ignore_list.Ignore(p2, dne) app.ignore_list.ignore(p2, dne)
app.ignore_list.Ignore(p1, p2) app.ignore_list.ignore(p1, p2)
app.purge_ignore_list() app.purge_ignore_list()
eq_(1, len(app.ignore_list)) eq_(1, len(app.ignore_list))
assert app.ignore_list.AreIgnored(p1, p2) assert app.ignore_list.are_ignored(p1, p2)
assert not app.ignore_list.AreIgnored(dne, p1) assert not app.ignore_list.are_ignored(dne, p1)
def test_only_unicode_is_added_to_ignore_list(self, do_setup): def test_only_unicode_is_added_to_ignore_list(self, do_setup):
def FakeIgnore(first, second): def fake_ignore(first, second):
if not isinstance(first, str): if not isinstance(first, str):
self.fail() self.fail()
if not isinstance(second, str): if not isinstance(second, str):
self.fail() self.fail()
app = self.app app = self.app
app.ignore_list.Ignore = FakeIgnore app.ignore_list.ignore = fake_ignore
self.rtable.select([4]) self.rtable.select([4])
app.add_selected_to_ignore_list() app.add_selected_to_ignore_list()
@@ -381,9 +395,9 @@ class TestCaseDupeGuruWithResults:
# When doing a scan with results being present prior to the scan, correctly invalidate the # When doing a scan with results being present prior to the scan, correctly invalidate the
# results table. # results table.
app = self.app app = self.app
app.JOB = Job(1, lambda *args, **kw: False) # Cancels the task app.JOB = Job(1, lambda *args, **kw: False) # Cancels the task
add_fake_files_to_directories(app.directories, self.objects) # We want the scan to at least start add_fake_files_to_directories(app.directories, self.objects) # We want the scan to at least start
app.start_scanning() # will be cancelled immediately app.start_scanning() # will be cancelled immediately
eq_(len(app.result_table), 0) eq_(len(app.result_table), 0)
def test_selected_dupes_after_removal(self, do_setup): def test_selected_dupes_after_removal(self, do_setup):
@@ -401,22 +415,20 @@ class TestCaseDupeGuruWithResults:
# Ref #238 # Ref #238
self.rtable.delta_values = True self.rtable.delta_values = True
self.rtable.power_marker = True self.rtable.power_marker = True
self.rtable.sort('dupe_count', False) self.rtable.sort("dupe_count", False)
# don't crash # don't crash
self.rtable.sort('percentage', False) self.rtable.sort("percentage", False)
# don't crash # don't crash
class TestCaseDupeGuru_renameSelected: class TestCaseDupeGuruRenameSelected:
def pytest_funcarg__do_setup(self, request): @pytest.fixture
tmpdir = request.getfuncargvalue('tmpdir') def do_setup(self, request):
tmpdir = request.getfixturevalue("tmpdir")
p = Path(str(tmpdir)) p = Path(str(tmpdir))
fp = open(str(p['foo bar 1']), mode='w') p.joinpath("foo bar 1").touch()
fp.close() p.joinpath("foo bar 2").touch()
fp = open(str(p['foo bar 2']), mode='w') p.joinpath("foo bar 3").touch()
fp.close()
fp = open(str(p['foo bar 3']), mode='w')
fp.close()
files = fs.get_files(p) files = fs.get_files(p)
for f in files: for f in files:
f.is_ref = False f.is_ref = False
@@ -437,46 +449,47 @@ class TestCaseDupeGuru_renameSelected:
app = self.app app = self.app
g = self.groups[0] g = self.groups[0]
self.rtable.select([1]) self.rtable.select([1])
assert app.rename_selected('renamed') assert app.rename_selected("renamed")
names = [p.name for p in self.p.listdir()] names = [p.name for p in self.p.glob("*")]
assert 'renamed' in names assert "renamed" in names
assert 'foo bar 2' not in names assert "foo bar 2" not in names
eq_(g.dupes[0].name, 'renamed') eq_(g.dupes[0].name, "renamed")
def test_none_selected(self, do_setup, monkeypatch): def test_none_selected(self, do_setup, monkeypatch):
app = self.app app = self.app
g = self.groups[0] g = self.groups[0]
self.rtable.select([]) self.rtable.select([])
monkeypatch.setattr(logging, 'warning', log_calls(lambda msg: None)) monkeypatch.setattr(logging, "warning", log_calls(lambda msg: None))
assert not app.rename_selected('renamed') assert not app.rename_selected("renamed")
msg = logging.warning.calls[0]['msg'] msg = logging.warning.calls[0]["msg"]
eq_('dupeGuru Warning: list index out of range', msg) eq_("dupeGuru Warning: list index out of range", msg)
names = [p.name for p in self.p.listdir()] names = [p.name for p in self.p.glob("*")]
assert 'renamed' not in names assert "renamed" not in names
assert 'foo bar 2' in names assert "foo bar 2" in names
eq_(g.dupes[0].name, 'foo bar 2') eq_(g.dupes[0].name, "foo bar 2")
def test_name_already_exists(self, do_setup, monkeypatch): def test_name_already_exists(self, do_setup, monkeypatch):
app = self.app app = self.app
g = self.groups[0] g = self.groups[0]
self.rtable.select([1]) self.rtable.select([1])
monkeypatch.setattr(logging, 'warning', log_calls(lambda msg: None)) monkeypatch.setattr(logging, "warning", log_calls(lambda msg: None))
assert not app.rename_selected('foo bar 1') assert not app.rename_selected("foo bar 1")
msg = logging.warning.calls[0]['msg'] msg = logging.warning.calls[0]["msg"]
assert msg.startswith('dupeGuru Warning: \'foo bar 1\' already exists in') assert msg.startswith("dupeGuru Warning: 'foo bar 1' already exists in")
names = [p.name for p in self.p.listdir()] names = [p.name for p in self.p.glob("*")]
assert 'foo bar 1' in names assert "foo bar 1" in names
assert 'foo bar 2' in names assert "foo bar 2" in names
eq_(g.dupes[0].name, 'foo bar 2') eq_(g.dupes[0].name, "foo bar 2")
class TestAppWithDirectoriesInTree: class TestAppWithDirectoriesInTree:
def pytest_funcarg__do_setup(self, request): @pytest.fixture
tmpdir = request.getfuncargvalue('tmpdir') def do_setup(self, request):
tmpdir = request.getfixturevalue("tmpdir")
p = Path(str(tmpdir)) p = Path(str(tmpdir))
p['sub1'].mkdir() p.joinpath("sub1").mkdir()
p['sub2'].mkdir() p.joinpath("sub2").mkdir()
p['sub3'].mkdir() p.joinpath("sub3").mkdir()
app = TestApp() app = TestApp()
self.app = app.app self.app = app.app
self.dtree = app.dtree self.dtree = app.dtree
@@ -487,12 +500,10 @@ class TestAppWithDirectoriesInTree:
# Setting a node state to something also affect subnodes. These subnodes must be correctly # Setting a node state to something also affect subnodes. These subnodes must be correctly
# refreshed. # refreshed.
node = self.dtree[0] node = self.dtree[0]
eq_(len(node), 3) # a len() call is required for subnodes to be loaded eq_(len(node), 3) # a len() call is required for subnodes to be loaded
subnode = node[0] node.state = 1 # the state property is a state index
node.state = 1 # the state property is a state index
node = self.dtree[0] node = self.dtree[0]
eq_(len(node), 3) eq_(len(node), 3)
subnode = node[0] subnode = node[0]
eq_(subnode.state, 1) eq_(subnode.state, 1)
self.dtree.view.check_gui_calls(['refresh_states']) self.dtree.view.check_gui_calls(["refresh_states"])


@@ -4,18 +4,18 @@
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from hscommon.testutil import TestApp as TestAppBase, CallLogger, eq_, with_app # noqa from hscommon.testutil import TestApp as TestAppBase, CallLogger, eq_, with_app # noqa
from hscommon.path import Path from pathlib import Path
from hscommon.util import get_file_ext, format_size from hscommon.util import get_file_ext, format_size
from hscommon.gui.column import Column from hscommon.gui.column import Column
from hscommon.jobprogress.job import nulljob, JobCancelled from hscommon.jobprogress.job import nulljob, JobCancelled
from .. import engine from core import engine, prioritize
from .. import prioritize from core.engine import getwords
from ..engine import getwords from core.app import DupeGuru as DupeGuruBase
from ..app import DupeGuru as DupeGuruBase from core.gui.result_table import ResultTable as ResultTableBase
from ..gui.result_table import ResultTable as ResultTableBase from core.gui.prioritize_dialog import PrioritizeDialog
from ..gui.prioritize_dialog import PrioritizeDialog
class DupeGuruView: class DupeGuruView:
JOB = nulljob JOB = nulljob
@@ -39,28 +39,32 @@ class DupeGuruView:
self.messages.append(msg) self.messages.append(msg)
def ask_yes_no(self, prompt): def ask_yes_no(self, prompt):
return True # always answer yes return True # always answer yes
def create_results_window(self): def create_results_window(self):
pass pass
class ResultTable(ResultTableBase): class ResultTable(ResultTableBase):
COLUMNS = [ COLUMNS = [
Column('marked', ''), Column("marked", ""),
Column('name', 'Filename'), Column("name", "Filename"),
Column('folder_path', 'Directory'), Column("folder_path", "Directory"),
Column('size', 'Size (KB)'), Column("size", "Size (KB)"),
Column('extension', 'Kind'), Column("extension", "Kind"),
] ]
DELTA_COLUMNS = {'size', } DELTA_COLUMNS = {
"size",
}
class DupeGuru(DupeGuruBase): class DupeGuru(DupeGuruBase):
NAME = 'dupeGuru' NAME = "dupeGuru"
METADATA_TO_READ = ['size'] METADATA_TO_READ = ["size"]
def __init__(self): def __init__(self):
DupeGuruBase.__init__(self, DupeGuruView()) DupeGuruBase.__init__(self, DupeGuruView())
self.appdata = '/tmp' self.appdata = "/tmp"
self._recreate_result_table() self._recreate_result_table()
def _prioritization_categories(self): def _prioritization_categories(self):
@@ -78,17 +82,18 @@ class NamedObject:
def __init__(self, name="foobar", with_words=False, size=1, folder=None): def __init__(self, name="foobar", with_words=False, size=1, folder=None):
self.name = name self.name = name
if folder is None: if folder is None:
folder = 'basepath' folder = "basepath"
self._folder = Path(folder) self._folder = Path(folder)
self.size = size self.size = size
self.md5partial = name self.digest_partial = name
self.md5 = name self.digest = name
self.digest_samples = name
if with_words: if with_words:
self.words = getwords(name) self.words = getwords(name)
self.is_ref = False self.is_ref = False
def __bool__(self): def __bool__(self):
return False #Make sure that operations are made correctly when the bool value of files is false. return False # Make sure that operations are made correctly when the bool value of files is false.
def get_display_info(self, group, delta): def get_display_info(self, group, delta):
size = self.size size = self.size
@@ -97,24 +102,25 @@ class NamedObject:
r = group.ref r = group.ref
size -= r.size size -= r.size
return { return {
'name': self.name, "name": self.name,
'folder_path': str(self.folder_path), "folder_path": str(self.folder_path),
'size': format_size(size, 0, 1, False), "size": format_size(size, 0, 1, False),
'extension': self.extension if hasattr(self, 'extension') else '---', "extension": self.extension if hasattr(self, "extension") else "---",
} }
@property @property
def path(self): def path(self):
return self._folder[self.name] return self._folder.joinpath(self.name)
@property @property
def folder_path(self): def folder_path(self):
return self.path.parent() return self.path.parent
@property @property
def extension(self): def extension(self):
return get_file_ext(self.name) return get_file_ext(self.name)
# Returns a group set that looks like that: # Returns a group set that looks like that:
# "foo bar" (1) # "foo bar" (1)
# "bar bleh" (1024) # "bar bleh" (1024)
@@ -127,22 +133,25 @@ def GetTestGroups():
NamedObject("bar bleh"), NamedObject("bar bleh"),
NamedObject("foo bleh"), NamedObject("foo bleh"),
NamedObject("ibabtu"), NamedObject("ibabtu"),
NamedObject("ibabtu") NamedObject("ibabtu"),
] ]
objects[1].size = 1024 objects[1].size = 1024
matches = engine.getmatches(objects) #we should have 5 matches matches = engine.getmatches(objects) # we should have 5 matches
groups = engine.get_groups(matches) #We should have 2 groups groups = engine.get_groups(matches) # We should have 2 groups
for g in groups: for g in groups:
g.prioritize(lambda x: objects.index(x)) #We want the dupes to be in the same order as the list is g.prioritize(lambda x: objects.index(x)) # We want the dupes to be in the same order as the list is
groups.sort(key=len, reverse=True) # We want the group with 3 members to be first. groups.sort(key=len, reverse=True) # We want the group with 3 members to be first.
return (objects, matches, groups) return (objects, matches, groups)
class TestApp(TestAppBase): class TestApp(TestAppBase):
__test__ = False
def __init__(self): def __init__(self):
def link_gui(gui): def link_gui(gui):
gui.view = self.make_logger() gui.view = self.make_logger()
if hasattr(gui, 'columns'): # tables if hasattr(gui, "_columns"): # tables
gui.columns.view = self.make_logger() gui._columns.view = self.make_logger()
return gui return gui
TestAppBase.__init__(self) TestAppBase.__init__(self)
@@ -166,7 +175,7 @@ class TestApp(TestAppBase):
# rtable is a property because its instance can be replaced during execution # rtable is a property because its instance can be replaced during execution
return self.app.result_table return self.app.result_table
#--- Helpers # --- Helpers
def select_pri_criterion(self, name): def select_pri_criterion(self, name):
# Select a main prioritize criterion by name instead of by index. Makes tests more # Select a main prioritize criterion by name instead of by index. Makes tests more
# maintainable. # maintainable.


@@ -9,17 +9,20 @@ from pytest import raises, skip
from hscommon.testutil import eq_ from hscommon.testutil import eq_
try: try:
from ..pe.block import avgdiff, getblocks2, NoBlocksError, DifferentBlockCountError from core.pe.block import avgdiff, getblocks2, NoBlocksError, DifferentBlockCountError
except ImportError: except ImportError:
skip("Can't import the block module, probably hasn't been compiled.") skip("Can't import the block module, probably hasn't been compiled.")
def my_avgdiff(first, second, limit=768, min_iter=3): # this is so I don't have to re-write every call
def my_avgdiff(first, second, limit=768, min_iter=3): # this is so I don't have to re-write every call
return avgdiff(first, second, limit, min_iter) return avgdiff(first, second, limit, min_iter)
BLACK = (0, 0, 0) BLACK = (0, 0, 0)
RED = (0xff, 0, 0) RED = (0xFF, 0, 0)
GREEN = (0, 0xff, 0) GREEN = (0, 0xFF, 0)
BLUE = (0, 0, 0xff) BLUE = (0, 0, 0xFF)
class FakeImage: class FakeImage:
def __init__(self, size, data): def __init__(self, size, data):
@@ -37,16 +40,20 @@ class FakeImage:
pixels.append(pixel) pixels.append(pixel)
return FakeImage((box[2] - box[0], box[3] - box[1]), pixels) return FakeImage((box[2] - box[0], box[3] - box[1]), pixels)
def empty(): def empty():
return FakeImage((0, 0), []) return FakeImage((0, 0), [])
def single_pixel(): #one red pixel
return FakeImage((1, 1), [(0xff, 0, 0)]) def single_pixel(): # one red pixel
return FakeImage((1, 1), [(0xFF, 0, 0)])
def four_pixels(): def four_pixels():
pixels = [RED, (0, 0x80, 0xff), (0x80, 0, 0), (0, 0x40, 0x80)] pixels = [RED, (0, 0x80, 0xFF), (0x80, 0, 0), (0, 0x40, 0x80)]
return FakeImage((2, 2), pixels) return FakeImage((2, 2), pixels)
class TestCasegetblock: class TestCasegetblock:
def test_single_pixel(self): def test_single_pixel(self):
im = single_pixel() im = single_pixel()
@@ -60,104 +67,12 @@ class TestCasegetblock:
def test_four_pixels(self): def test_four_pixels(self):
im = four_pixels() im = four_pixels()
[b] = getblocks2(im, 1) [b] = getblocks2(im, 1)
meanred = (0xff + 0x80) // 4 meanred = (0xFF + 0x80) // 4
meangreen = (0x80 + 0x40) // 4 meangreen = (0x80 + 0x40) // 4
meanblue = (0xff + 0x80) // 4 meanblue = (0xFF + 0x80) // 4
eq_((meanred, meangreen, meanblue), b) eq_((meanred, meangreen, meanblue), b)
# class TCdiff(unittest.TestCase):
# def test_diff(self):
# b1 = (10, 20, 30)
# b2 = (1, 2, 3)
# eq_(9 + 18 + 27, diff(b1, b2))
#
# def test_diff_negative(self):
# b1 = (10, 20, 30)
# b2 = (1, 2, 3)
# eq_(9 + 18 + 27, diff(b2, b1))
#
# def test_diff_mixed_positive_and_negative(self):
# b1 = (1, 5, 10)
# b2 = (10, 1, 15)
# eq_(9 + 4 + 5, diff(b1, b2))
#
# class TCgetblocks(unittest.TestCase):
# def test_empty_image(self):
# im = empty()
# blocks = getblocks(im, 1)
# eq_(0, len(blocks))
#
# def test_one_block_image(self):
# im = four_pixels()
# blocks = getblocks2(im, 1)
# eq_(1, len(blocks))
# block = blocks[0]
# meanred = (0xff + 0x80) // 4
# meangreen = (0x80 + 0x40) // 4
# meanblue = (0xff + 0x80) // 4
# eq_((meanred, meangreen, meanblue), block)
#
# def test_not_enough_height_to_fit_a_block(self):
# im = FakeImage((2, 1), [BLACK, BLACK])
# blocks = getblocks(im, 2)
# eq_(0, len(blocks))
#
# def xtest_dont_include_leftovers(self):
# # this test is disabled because getblocks is not used and getblock in cdeffed
# pixels = [
# RED,(0, 0x80, 0xff), BLACK,
# (0x80, 0, 0),(0, 0x40, 0x80), BLACK,
# BLACK, BLACK, BLACK
# ]
# im = FakeImage((3, 3), pixels)
# blocks = getblocks(im, 2)
# block = blocks[0]
# #Because the block is smaller than the image, only blocksize must be considered.
# meanred = (0xff + 0x80) // 4
# meangreen = (0x80 + 0x40) // 4
# meanblue = (0xff + 0x80) // 4
# eq_((meanred, meangreen, meanblue), block)
#
# def xtest_two_blocks(self):
# # this test is disabled because getblocks is not used and getblock in cdeffed
# pixels = [BLACK for i in xrange(4 * 2)]
# pixels[0] = RED
# pixels[1] = (0, 0x80, 0xff)
# pixels[4] = (0x80, 0, 0)
# pixels[5] = (0, 0x40, 0x80)
# im = FakeImage((4, 2), pixels)
# blocks = getblocks(im, 2)
# eq_(2, len(blocks))
# block = blocks[0]
# #Because the block is smaller than the image, only blocksize must be considered.
# meanred = (0xff + 0x80) // 4
# meangreen = (0x80 + 0x40) // 4
# meanblue = (0xff + 0x80) // 4
# eq_((meanred, meangreen, meanblue), block)
# eq_(BLACK, blocks[1])
#
# def test_four_blocks(self):
# pixels = [BLACK for i in xrange(4 * 4)]
# pixels[0] = RED
# pixels[1] = (0, 0x80, 0xff)
# pixels[4] = (0x80, 0, 0)
# pixels[5] = (0, 0x40, 0x80)
# im = FakeImage((4, 4), pixels)
# blocks = getblocks2(im, 2)
# eq_(4, len(blocks))
# block = blocks[0]
# #Because the block is smaller than the image, only blocksize must be considered.
# meanred = (0xff + 0x80) // 4
# meangreen = (0x80 + 0x40) // 4
# meanblue = (0xff + 0x80) // 4
# eq_((meanred, meangreen, meanblue), block)
# eq_(BLACK, blocks[1])
# eq_(BLACK, blocks[2])
# eq_(BLACK, blocks[3])
#
class TestCasegetblocks2: class TestCasegetblocks2:
def test_empty_image(self): def test_empty_image(self):
im = empty() im = empty()
@@ -169,9 +84,9 @@ class TestCasegetblocks2:
blocks = getblocks2(im, 1) blocks = getblocks2(im, 1)
eq_(1, len(blocks)) eq_(1, len(blocks))
block = blocks[0] block = blocks[0]
meanred = (0xff + 0x80) // 4 meanred = (0xFF + 0x80) // 4
meangreen = (0x80 + 0x40) // 4 meangreen = (0x80 + 0x40) // 4
meanblue = (0xff + 0x80) // 4 meanblue = (0xFF + 0x80) // 4
eq_((meanred, meangreen, meanblue), block) eq_((meanred, meangreen, meanblue), block)
def test_four_blocks_all_black(self): def test_four_blocks_all_black(self):
@@ -225,25 +140,25 @@ class TestCaseavgdiff:
my_avgdiff([b, b], [b]) my_avgdiff([b, b], [b])
def test_first_arg_is_empty_but_not_second(self): def test_first_arg_is_empty_but_not_second(self):
#Don't return 0 (as when the 2 lists are empty), raise! # Don't return 0 (as when the 2 lists are empty), raise!
b = (0, 0, 0) b = (0, 0, 0)
with raises(DifferentBlockCountError): with raises(DifferentBlockCountError):
my_avgdiff([], [b]) my_avgdiff([], [b])
def test_limit(self): def test_limit(self):
ref = (0, 0, 0) ref = (0, 0, 0)
b1 = (10, 10, 10) #avg 30 b1 = (10, 10, 10) # avg 30
b2 = (20, 20, 20) #avg 45 b2 = (20, 20, 20) # avg 45
b3 = (30, 30, 30) #avg 60 b3 = (30, 30, 30) # avg 60
blocks1 = [ref, ref, ref] blocks1 = [ref, ref, ref]
blocks2 = [b1, b2, b3] blocks2 = [b1, b2, b3]
eq_(45, my_avgdiff(blocks1, blocks2, 44)) eq_(45, my_avgdiff(blocks1, blocks2, 44))
def test_min_iterations(self): def test_min_iterations(self):
ref = (0, 0, 0) ref = (0, 0, 0)
b1 = (10, 10, 10) #avg 30 b1 = (10, 10, 10) # avg 30
b2 = (20, 20, 20) #avg 45 b2 = (20, 20, 20) # avg 45
b3 = (10, 10, 10) #avg 40 b3 = (10, 10, 10) # avg 40
blocks1 = [ref, ref, ref] blocks1 = [ref, ref, ref]
blocks2 = [b1, b2, b3] blocks2 = [b1, b2, b3]
eq_(40, my_avgdiff(blocks1, blocks2, 45 - 1, 3)) eq_(40, my_avgdiff(blocks1, blocks2, 45 - 1, 3))
@@ -262,8 +177,8 @@ class TestCaseavgdiff:
def test_return_at_least_1_at_the_slightest_difference(self): def test_return_at_least_1_at_the_slightest_difference(self):
ref = (0, 0, 0) ref = (0, 0, 0)
b1 = (1, 0, 0) b1 = (1, 0, 0)
blocks1 = [ref for i in range(250)] blocks1 = [ref for _ in range(250)]
blocks2 = [ref for i in range(250)] blocks2 = [ref for _ in range(250)]
blocks2[0] = b1 blocks2[0] = b1
eq_(1, my_avgdiff(blocks1, blocks2)) eq_(1, my_avgdiff(blocks1, blocks2))
@@ -272,41 +187,3 @@ class TestCaseavgdiff:
blocks1 = [ref, ref] blocks1 = [ref, ref]
blocks2 = [ref, ref] blocks2 = [ref, ref]
eq_(0, my_avgdiff(blocks1, blocks2)) eq_(0, my_avgdiff(blocks1, blocks2))
# class TCmaxdiff(unittest.TestCase):
# def test_empty(self):
# self.assertRaises(NoBlocksError, maxdiff,[],[])
#
# def test_two_blocks(self):
# b1 = (5, 10, 15)
# b2 = (255, 250, 245)
# b3 = (0, 0, 0)
# b4 = (255, 0, 255)
# blocks1 = [b1, b2]
# blocks2 = [b3, b4]
# expected1 = 5 + 10 + 15
# expected2 = 0 + 250 + 10
# expected = max(expected1, expected2)
# eq_(expected, maxdiff(blocks1, blocks2))
#
# def test_blocks_not_the_same_size(self):
# b = (0, 0, 0)
# self.assertRaises(DifferentBlockCountError, maxdiff,[b, b],[b])
#
# def test_first_arg_is_empty_but_not_second(self):
# #Don't return 0 (as when the 2 lists are empty), raise!
# b = (0, 0, 0)
# self.assertRaises(DifferentBlockCountError, maxdiff,[],[b])
#
# def test_limit(self):
# b1 = (5, 10, 15)
# b2 = (255, 250, 245)
# b3 = (0, 0, 0)
# b4 = (255, 0, 255)
# blocks1 = [b1, b2]
# blocks2 = [b3, b4]
# expected1 = 5 + 10 + 15
# expected2 = 0 + 250 + 10
# eq_(expected1, maxdiff(blocks1, blocks2, expected1 - 1))
#


@@ -10,40 +10,41 @@ from pytest import raises, skip
from hscommon.testutil import eq_ from hscommon.testutil import eq_
try: try:
from ..pe.cache import colors_to_string, string_to_colors from core.pe.cache import colors_to_string, string_to_colors
from ..pe.cache_sqlite import SqliteCache from core.pe.cache_sqlite import SqliteCache
from ..pe.cache_shelve import ShelveCache from core.pe.cache_shelve import ShelveCache
except ImportError: except ImportError:
skip("Can't import the cache module, probably hasn't been compiled.") skip("Can't import the cache module, probably hasn't been compiled.")
class TestCasecolors_to_string:
class TestCaseColorsToString:
def test_no_color(self): def test_no_color(self):
eq_('', colors_to_string([])) eq_("", colors_to_string([]))
def test_single_color(self): def test_single_color(self):
eq_('000000', colors_to_string([(0, 0, 0)])) eq_("000000", colors_to_string([(0, 0, 0)]))
eq_('010101', colors_to_string([(1, 1, 1)])) eq_("010101", colors_to_string([(1, 1, 1)]))
eq_('0a141e', colors_to_string([(10, 20, 30)])) eq_("0a141e", colors_to_string([(10, 20, 30)]))
def test_two_colors(self): def test_two_colors(self):
eq_('000102030405', colors_to_string([(0, 1, 2), (3, 4, 5)])) eq_("000102030405", colors_to_string([(0, 1, 2), (3, 4, 5)]))
class TestCasestring_to_colors: class TestCaseStringToColors:
def test_empty(self): def test_empty(self):
eq_([], string_to_colors('')) eq_([], string_to_colors(""))
def test_single_color(self): def test_single_color(self):
eq_([(0, 0, 0)], string_to_colors('000000')) eq_([(0, 0, 0)], string_to_colors("000000"))
eq_([(2, 3, 4)], string_to_colors('020304')) eq_([(2, 3, 4)], string_to_colors("020304"))
eq_([(10, 20, 30)], string_to_colors('0a141e')) eq_([(10, 20, 30)], string_to_colors("0a141e"))
def test_two_colors(self): def test_two_colors(self):
eq_([(10, 20, 30), (40, 50, 60)], string_to_colors('0a141e28323c')) eq_([(10, 20, 30), (40, 50, 60)], string_to_colors("0a141e28323c"))
def test_incomplete_color(self): def test_incomplete_color(self):
# don't return anything if it's not a complete color # don't return anything if it's not a complete color
eq_([], string_to_colors('102')) eq_([], string_to_colors("102"))
class BaseTestCaseCache: class BaseTestCaseCache:
@@ -54,58 +55,58 @@ class BaseTestCaseCache:
c = self.get_cache() c = self.get_cache()
eq_(0, len(c)) eq_(0, len(c))
with raises(KeyError): with raises(KeyError):
c['foo'] c["foo"]
def test_set_then_retrieve_blocks(self): def test_set_then_retrieve_blocks(self):
c = self.get_cache() c = self.get_cache()
b = [(0, 0, 0), (1, 2, 3)] b = [(0, 0, 0), (1, 2, 3)]
c['foo'] = b c["foo"] = b
eq_(b, c['foo']) eq_(b, c["foo"])
def test_delitem(self): def test_delitem(self):
c = self.get_cache() c = self.get_cache()
c['foo'] = '' c["foo"] = ""
del c['foo'] del c["foo"]
assert 'foo' not in c assert "foo" not in c
with raises(KeyError): with raises(KeyError):
del c['foo'] del c["foo"]
def test_persistance(self, tmpdir): def test_persistance(self, tmpdir):
DBNAME = tmpdir.join('hstest.db') DBNAME = tmpdir.join("hstest.db")
c = self.get_cache(str(DBNAME)) c = self.get_cache(str(DBNAME))
c['foo'] = [(1, 2, 3)] c["foo"] = [(1, 2, 3)]
del c del c
c = self.get_cache(str(DBNAME)) c = self.get_cache(str(DBNAME))
eq_([(1, 2, 3)], c['foo']) eq_([(1, 2, 3)], c["foo"])
def test_filter(self): def test_filter(self):
c = self.get_cache() c = self.get_cache()
c['foo'] = '' c["foo"] = ""
c['bar'] = '' c["bar"] = ""
c['baz'] = '' c["baz"] = ""
c.filter(lambda p: p != 'bar') #only 'bar' is removed c.filter(lambda p: p != "bar") # only 'bar' is removed
eq_(2, len(c)) eq_(2, len(c))
assert 'foo' in c assert "foo" in c
assert 'baz' in c assert "baz" in c
assert 'bar' not in c assert "bar" not in c
def test_clear(self): def test_clear(self):
c = self.get_cache() c = self.get_cache()
c['foo'] = '' c["foo"] = ""
c['bar'] = '' c["bar"] = ""
c['baz'] = '' c["baz"] = ""
c.clear() c.clear()
eq_(0, len(c)) eq_(0, len(c))
assert 'foo' not in c assert "foo" not in c
assert 'baz' not in c assert "baz" not in c
assert 'bar' not in c assert "bar" not in c
def test_by_id(self): def test_by_id(self):
# it's possible to use the cache by referring to the files by their row_id # it's possible to use the cache by referring to the files by their row_id
c = self.get_cache() c = self.get_cache()
b = [(0, 0, 0), (1, 2, 3)] b = [(0, 0, 0), (1, 2, 3)]
c['foo'] = b c["foo"] = b
foo_id = c.get_id('foo') foo_id = c.get_id("foo")
eq_(c[foo_id], b) eq_(c[foo_id], b)
@@ -120,16 +121,16 @@ class TestCaseSqliteCache(BaseTestCaseCache):
# If we don't do this monkeypatching, we get a weird exception about trying to flush a # If we don't do this monkeypatching, we get a weird exception about trying to flush a
# closed file. I've tried setting logging level and stuff, but nothing worked. So, there we # closed file. I've tried setting logging level and stuff, but nothing worked. So, there we
# go, a dirty monkeypatch. # go, a dirty monkeypatch.
monkeypatch.setattr(logging, 'warning', lambda *args, **kw: None) monkeypatch.setattr(logging, "warning", lambda *args, **kw: None)
dbname = str(tmpdir.join('foo.db')) dbname = str(tmpdir.join("foo.db"))
fp = open(dbname, 'w') fp = open(dbname, "w")
fp.write('invalid sqlite content') fp.write("invalid sqlite content")
fp.close() fp.close()
c = self.get_cache(dbname) # should not raise a DatabaseError c = self.get_cache(dbname) # should not raise a DatabaseError
c['foo'] = [(1, 2, 3)] c["foo"] = [(1, 2, 3)]
del c del c
c = self.get_cache(dbname) c = self.get_cache(dbname)
eq_(c['foo'], [(1, 2, 3)]) eq_(c["foo"], [(1, 2, 3)])
class TestCaseShelveCache(BaseTestCaseCache): class TestCaseShelveCache(BaseTestCaseCache):
@@ -161,4 +162,3 @@ class TestCaseCacheSQLEscape:
del c["foo'bar"] del c["foo'bar"]
except KeyError: except KeyError:
assert False assert False


@@ -1 +1 @@
from hscommon.testutil import pytest_funcarg__app # noqa from hscommon.testutil import app # noqa


@@ -10,95 +10,104 @@ import tempfile
import shutil import shutil
from pytest import raises from pytest import raises
from hscommon.path import Path from pathlib import Path
from hscommon.testutil import eq_ from hscommon.testutil import eq_
from hscommon.plat import ISWINDOWS
from core.fs import File
from core.directories import (
Directories,
DirectoryState,
AlreadyThereError,
InvalidPathError,
)
from core.exclude import ExcludeList, ExcludeDict
from ..fs import File
from ..directories import Directories, DirectoryState, AlreadyThereError, InvalidPathError
def create_fake_fs(rootpath): def create_fake_fs(rootpath):
# We have it as a separate function because other units are using it. # We have it as a separate function because other units are using it.
rootpath = rootpath['fs'] rootpath = rootpath.joinpath("fs")
rootpath.mkdir() rootpath.mkdir()
rootpath['dir1'].mkdir() rootpath.joinpath("dir1").mkdir()
rootpath['dir2'].mkdir() rootpath.joinpath("dir2").mkdir()
rootpath['dir3'].mkdir() rootpath.joinpath("dir3").mkdir()
fp = rootpath['file1.test'].open('w') with rootpath.joinpath("file1.test").open("wt") as fp:
fp.write('1') fp.write("1")
fp.close() with rootpath.joinpath("file2.test").open("wt") as fp:
fp = rootpath['file2.test'].open('w') fp.write("12")
fp.write('12') with rootpath.joinpath("file3.test").open("wt") as fp:
fp.close() fp.write("123")
fp = rootpath['file3.test'].open('w') with rootpath.joinpath("dir1", "file1.test").open("wt") as fp:
fp.write('123') fp.write("1")
fp.close() with rootpath.joinpath("dir2", "file2.test").open("wt") as fp:
fp = rootpath['dir1']['file1.test'].open('w') fp.write("12")
fp.write('1') with rootpath.joinpath("dir3", "file3.test").open("wt") as fp:
fp.close() fp.write("123")
fp = rootpath['dir2']['file2.test'].open('w')
fp.write('12')
fp.close()
fp = rootpath['dir3']['file3.test'].open('w')
fp.write('123')
fp.close()
return rootpath return rootpath
testpath = None testpath = None
def setup_module(module): def setup_module(module):
# In this unit, we have tests depending on two directory structures. One with only one file in it # In this unit, we have tests depending on two directory structures. One with only one file in it
# and another with a more complex structure. # and another with a more complex structure.
testpath = Path(tempfile.mkdtemp()) testpath = Path(tempfile.mkdtemp())
module.testpath = testpath module.testpath = testpath
rootpath = testpath['onefile'] rootpath = testpath.joinpath("onefile")
rootpath.mkdir() rootpath.mkdir()
fp = rootpath['test.txt'].open('w') with rootpath.joinpath("test.txt").open("wt") as fp:
fp.write('test_data') fp.write("test_data")
fp.close()
create_fake_fs(testpath) create_fake_fs(testpath)
def teardown_module(module): def teardown_module(module):
shutil.rmtree(str(module.testpath)) shutil.rmtree(str(module.testpath))
def test_empty(): def test_empty():
d = Directories() d = Directories()
eq_(len(d), 0) eq_(len(d), 0)
assert 'foobar' not in d assert "foobar" not in d
def test_add_path(): def test_add_path():
d = Directories() d = Directories()
p = testpath['onefile'] p = testpath.joinpath("onefile")
d.add_path(p) d.add_path(p)
eq_(1, len(d)) eq_(1, len(d))
assert p in d assert p in d
assert (p['foobar']) in d assert (p.joinpath("foobar")) in d
assert p.parent() not in d assert p.parent not in d
p = testpath['fs'] p = testpath.joinpath("fs")
d.add_path(p) d.add_path(p)
eq_(2, len(d)) eq_(2, len(d))
assert p in d assert p in d
def test_AddPath_when_path_is_already_there():
def test_add_path_when_path_is_already_there():
d = Directories() d = Directories()
p = testpath['onefile'] p = testpath.joinpath("onefile")
d.add_path(p) d.add_path(p)
with raises(AlreadyThereError): with raises(AlreadyThereError):
d.add_path(p) d.add_path(p)
with raises(AlreadyThereError): with raises(AlreadyThereError):
d.add_path(p['foobar']) d.add_path(p.joinpath("foobar"))
eq_(1, len(d)) eq_(1, len(d))
def test_add_path_containing_paths_already_there(): def test_add_path_containing_paths_already_there():
d = Directories() d = Directories()
d.add_path(testpath['onefile']) d.add_path(testpath.joinpath("onefile"))
eq_(1, len(d)) eq_(1, len(d))
d.add_path(testpath) d.add_path(testpath)
eq_(len(d), 1) eq_(len(d), 1)
eq_(d[0], testpath) eq_(d[0], testpath)
def test_AddPath_non_latin(tmpdir):
def test_add_path_non_latin(tmpdir):
p = Path(str(tmpdir)) p = Path(str(tmpdir))
to_add = p['unicode\u201a'] to_add = p.joinpath("unicode\u201a")
os.mkdir(str(to_add)) os.mkdir(str(to_add))
d = Directories() d = Directories()
try: try:
@@ -106,63 +115,69 @@ def test_AddPath_non_latin(tmpdir):
except UnicodeDecodeError: except UnicodeDecodeError:
assert False assert False
def test_del(): def test_del():
d = Directories() d = Directories()
d.add_path(testpath['onefile']) d.add_path(testpath.joinpath("onefile"))
try: try:
del d[1] del d[1]
assert False assert False
except IndexError: except IndexError:
pass pass
d.add_path(testpath['fs']) d.add_path(testpath.joinpath("fs"))
del d[1] del d[1]
eq_(1, len(d)) eq_(1, len(d))
def test_states(): def test_states():
d = Directories() d = Directories()
p = testpath['onefile'] p = testpath.joinpath("onefile")
d.add_path(p) d.add_path(p)
eq_(DirectoryState.Normal, d.get_state(p)) eq_(DirectoryState.NORMAL, d.get_state(p))
d.set_state(p, DirectoryState.Reference) d.set_state(p, DirectoryState.REFERENCE)
eq_(DirectoryState.Reference, d.get_state(p)) eq_(DirectoryState.REFERENCE, d.get_state(p))
eq_(DirectoryState.Reference, d.get_state(p['dir1'])) eq_(DirectoryState.REFERENCE, d.get_state(p.joinpath("dir1")))
eq_(1, len(d.states)) eq_(1, len(d.states))
eq_(p, list(d.states.keys())[0]) eq_(p, list(d.states.keys())[0])
eq_(DirectoryState.Reference, d.states[p]) eq_(DirectoryState.REFERENCE, d.states[p])
def test_get_state_with_path_not_there(): def test_get_state_with_path_not_there():
# When the path's not there, just return DirectoryState.Normal # When the path's not there, just return DirectoryState.Normal
d = Directories() d = Directories()
d.add_path(testpath['onefile']) d.add_path(testpath.joinpath("onefile"))
eq_(d.get_state(testpath), DirectoryState.Normal) eq_(d.get_state(testpath), DirectoryState.NORMAL)
def test_states_overwritten_when_larger_directory_eat_smaller_ones(): def test_states_overwritten_when_larger_directory_eat_smaller_ones():
# ref #248 # ref #248
# When setting the state of a folder, we overwrite previously set states for subfolders. # When setting the state of a folder, we overwrite previously set states for subfolders.
d = Directories() d = Directories()
p = testpath['onefile'] p = testpath.joinpath("onefile")
d.add_path(p) d.add_path(p)
d.set_state(p, DirectoryState.Excluded) d.set_state(p, DirectoryState.EXCLUDED)
d.add_path(testpath) d.add_path(testpath)
d.set_state(testpath, DirectoryState.Reference) d.set_state(testpath, DirectoryState.REFERENCE)
eq_(d.get_state(p), DirectoryState.Reference) eq_(d.get_state(p), DirectoryState.REFERENCE)
eq_(d.get_state(p['dir1']), DirectoryState.Reference) eq_(d.get_state(p.joinpath("dir1")), DirectoryState.REFERENCE)
eq_(d.get_state(testpath), DirectoryState.Reference) eq_(d.get_state(testpath), DirectoryState.REFERENCE)
def test_get_files(): def test_get_files():
d = Directories() d = Directories()
p = testpath['fs'] p = testpath.joinpath("fs")
d.add_path(p) d.add_path(p)
d.set_state(p['dir1'], DirectoryState.Reference) d.set_state(p.joinpath("dir1"), DirectoryState.REFERENCE)
d.set_state(p['dir2'], DirectoryState.Excluded) d.set_state(p.joinpath("dir2"), DirectoryState.EXCLUDED)
files = list(d.get_files()) files = list(d.get_files())
eq_(5, len(files)) eq_(5, len(files))
for f in files: for f in files:
if f.path.parent() == p['dir1']: if f.path.parent == p.joinpath("dir1"):
assert f.is_ref assert f.is_ref
else: else:
assert not f.is_ref assert not f.is_ref
def test_get_files_with_folders(): def test_get_files_with_folders():
# When fileclasses handle folders, return them and stop recursing! # When fileclasses handle folders, return them and stop recursing!
class FakeFile(File): class FakeFile(File):
@@ -171,143 +186,380 @@ def test_get_files_with_folders():
return True return True
d = Directories() d = Directories()
p = testpath['fs'] p = testpath.joinpath("fs")
d.add_path(p) d.add_path(p)
files = list(d.get_files(fileclasses=[FakeFile])) files = list(d.get_files(fileclasses=[FakeFile]))
# We have the 3 root files and the 3 root dirs # We have the 3 root files and the 3 root dirs
eq_(6, len(files)) eq_(6, len(files))
def test_get_folders(): def test_get_folders():
d = Directories() d = Directories()
p = testpath['fs'] p = testpath.joinpath("fs")
d.add_path(p) d.add_path(p)
d.set_state(p['dir1'], DirectoryState.Reference) d.set_state(p.joinpath("dir1"), DirectoryState.REFERENCE)
d.set_state(p['dir2'], DirectoryState.Excluded) d.set_state(p.joinpath("dir2"), DirectoryState.EXCLUDED)
folders = list(d.get_folders()) folders = list(d.get_folders())
eq_(len(folders), 3) eq_(len(folders), 3)
ref = [f for f in folders if f.is_ref] ref = [f for f in folders if f.is_ref]
not_ref = [f for f in folders if not f.is_ref] not_ref = [f for f in folders if not f.is_ref]
eq_(len(ref), 1) eq_(len(ref), 1)
eq_(ref[0].path, p['dir1']) eq_(ref[0].path, p.joinpath("dir1"))
eq_(len(not_ref), 2) eq_(len(not_ref), 2)
eq_(ref[0].size, 1) eq_(ref[0].size, 1)
def test_get_files_with_inherited_exclusion(): def test_get_files_with_inherited_exclusion():
d = Directories() d = Directories()
p = testpath['onefile'] p = testpath.joinpath("onefile")
d.add_path(p) d.add_path(p)
d.set_state(p, DirectoryState.Excluded) d.set_state(p, DirectoryState.EXCLUDED)
eq_([], list(d.get_files())) eq_([], list(d.get_files()))
def test_save_and_load(tmpdir): def test_save_and_load(tmpdir):
d1 = Directories() d1 = Directories()
d2 = Directories() d2 = Directories()
p1 = Path(str(tmpdir.join('p1'))) p1 = Path(str(tmpdir.join("p1")))
p1.mkdir() p1.mkdir()
p2 = Path(str(tmpdir.join('p2'))) p2 = Path(str(tmpdir.join("p2")))
p2.mkdir() p2.mkdir()
d1.add_path(p1) d1.add_path(p1)
d1.add_path(p2) d1.add_path(p2)
d1.set_state(p1, DirectoryState.Reference) d1.set_state(p1, DirectoryState.REFERENCE)
d1.set_state(p1['dir1'], DirectoryState.Excluded) d1.set_state(p1.joinpath("dir1"), DirectoryState.EXCLUDED)
tmpxml = str(tmpdir.join('directories_testunit.xml')) tmpxml = str(tmpdir.join("directories_testunit.xml"))
d1.save_to_file(tmpxml) d1.save_to_file(tmpxml)
d2.load_from_file(tmpxml) d2.load_from_file(tmpxml)
eq_(2, len(d2)) eq_(2, len(d2))
eq_(DirectoryState.Reference, d2.get_state(p1)) eq_(DirectoryState.REFERENCE, d2.get_state(p1))
eq_(DirectoryState.Excluded, d2.get_state(p1['dir1'])) eq_(DirectoryState.EXCLUDED, d2.get_state(p1.joinpath("dir1")))
def test_invalid_path(): def test_invalid_path():
d = Directories() d = Directories()
p = Path('does_not_exist') p = Path("does_not_exist")
with raises(InvalidPathError): with raises(InvalidPathError):
d.add_path(p) d.add_path(p)
eq_(0, len(d)) eq_(0, len(d))
def test_set_state_on_invalid_path(): def test_set_state_on_invalid_path():
d = Directories() d = Directories()
try: try:
d.set_state(Path('foobar',), DirectoryState.Normal) d.set_state(
Path(
"foobar",
),
DirectoryState.NORMAL,
)
except LookupError: except LookupError:
assert False assert False
def test_load_from_file_with_invalid_path(tmpdir): def test_load_from_file_with_invalid_path(tmpdir):
#This test simulates a load from file resulting in a # This test simulates a load from file resulting in a
#InvalidPath raise. Other directories must be loaded. # InvalidPath raise. Other directories must be loaded.
d1 = Directories() d1 = Directories()
d1.add_path(testpath['onefile']) d1.add_path(testpath.joinpath("onefile"))
#Will raise InvalidPath upon loading # Will raise InvalidPath upon loading
p = Path(str(tmpdir.join('toremove'))) p = Path(str(tmpdir.join("toremove")))
p.mkdir() p.mkdir()
d1.add_path(p) d1.add_path(p)
p.rmdir() p.rmdir()
tmpxml = str(tmpdir.join('directories_testunit.xml')) tmpxml = str(tmpdir.join("directories_testunit.xml"))
d1.save_to_file(tmpxml) d1.save_to_file(tmpxml)
d2 = Directories() d2 = Directories()
d2.load_from_file(tmpxml) d2.load_from_file(tmpxml)
eq_(1, len(d2)) eq_(1, len(d2))
def test_unicode_save(tmpdir): def test_unicode_save(tmpdir):
d = Directories() d = Directories()
p1 = Path(str(tmpdir))['hello\xe9'] p1 = Path(str(tmpdir), "hello\xe9")
p1.mkdir() p1.mkdir()
p1['foo\xe9'].mkdir() p1.joinpath("foo\xe9").mkdir()
d.add_path(p1) d.add_path(p1)
d.set_state(p1['foo\xe9'], DirectoryState.Excluded) d.set_state(p1.joinpath("foo\xe9"), DirectoryState.EXCLUDED)
tmpxml = str(tmpdir.join('directories_testunit.xml')) tmpxml = str(tmpdir.join("directories_testunit.xml"))
try: try:
d.save_to_file(tmpxml) d.save_to_file(tmpxml)
except UnicodeDecodeError: except UnicodeDecodeError:
assert False assert False
def test_get_files_refreshes_its_directories(): def test_get_files_refreshes_its_directories():
d = Directories() d = Directories()
p = testpath['fs'] p = testpath.joinpath("fs")
d.add_path(p) d.add_path(p)
files = d.get_files() files = d.get_files()
eq_(6, len(list(files))) eq_(6, len(list(files)))
time.sleep(1) time.sleep(1)
os.remove(str(p['dir1']['file1.test'])) os.remove(str(p.joinpath("dir1", "file1.test")))
files = d.get_files() files = d.get_files()
eq_(5, len(list(files))) eq_(5, len(list(files)))
def test_get_files_does_not_choke_on_non_existing_directories(tmpdir): def test_get_files_does_not_choke_on_non_existing_directories(tmpdir):
d = Directories() d = Directories()
p = Path(str(tmpdir)) p = Path(str(tmpdir))
d.add_path(p) d.add_path(p)
p.rmtree() shutil.rmtree(str(p))
eq_([], list(d.get_files())) eq_([], list(d.get_files()))
def test_get_state_returns_excluded_by_default_for_hidden_directories(tmpdir): def test_get_state_returns_excluded_by_default_for_hidden_directories(tmpdir):
d = Directories() d = Directories()
p = Path(str(tmpdir)) p = Path(str(tmpdir))
hidden_dir_path = p['.foo'] hidden_dir_path = p.joinpath(".foo")
p['.foo'].mkdir() p.joinpath(".foo").mkdir()
d.add_path(p) d.add_path(p)
eq_(d.get_state(hidden_dir_path), DirectoryState.Excluded) eq_(d.get_state(hidden_dir_path), DirectoryState.EXCLUDED)
# But it can be overridden # But it can be overridden
d.set_state(hidden_dir_path, DirectoryState.Normal) d.set_state(hidden_dir_path, DirectoryState.NORMAL)
eq_(d.get_state(hidden_dir_path), DirectoryState.Normal) eq_(d.get_state(hidden_dir_path), DirectoryState.NORMAL)
def test_default_path_state_override(tmpdir): def test_default_path_state_override(tmpdir):
# It's possible for a subclass to override the default state of a path # It's possible for a subclass to override the default state of a path
class MyDirectories(Directories): class MyDirectories(Directories):
def _default_state_for_path(self, path): def _default_state_for_path(self, path):
if 'foobar' in path: if "foobar" in path.parts:
return DirectoryState.Excluded return DirectoryState.EXCLUDED
d = MyDirectories() d = MyDirectories()
p1 = Path(str(tmpdir)) p1 = Path(str(tmpdir))
p1['foobar'].mkdir() p1.joinpath("foobar").mkdir()
p1['foobar/somefile'].open('w').close() p1.joinpath("foobar/somefile").touch()
p1['foobaz'].mkdir() p1.joinpath("foobaz").mkdir()
p1['foobaz/somefile'].open('w').close() p1.joinpath("foobaz/somefile").touch()
d.add_path(p1) d.add_path(p1)
eq_(d.get_state(p1['foobaz']), DirectoryState.Normal) eq_(d.get_state(p1.joinpath("foobaz")), DirectoryState.NORMAL)
eq_(d.get_state(p1['foobar']), DirectoryState.Excluded) eq_(d.get_state(p1.joinpath("foobar")), DirectoryState.EXCLUDED)
eq_(len(list(d.get_files())), 1) # only the 'foobaz' file is there eq_(len(list(d.get_files())), 1) # only the 'foobaz' file is there
# However, the default state can be changed # However, the default state can be changed
d.set_state(p1['foobar'], DirectoryState.Normal) d.set_state(p1.joinpath("foobar"), DirectoryState.NORMAL)
eq_(d.get_state(p1['foobar']), DirectoryState.Normal) eq_(d.get_state(p1.joinpath("foobar")), DirectoryState.NORMAL)
eq_(len(list(d.get_files())), 2) eq_(len(list(d.get_files())), 2)
class TestExcludeList:
def setup_method(self, method):
self.d = Directories(exclude_list=ExcludeList(union_regex=False))
def get_files_and_expect_num_result(self, num_result):
"""Calls get_files(), get the filenames only, print for debugging.
num_result is how many files are expected as a result."""
print(
f"EXCLUDED REGEX: paths {self.d._exclude_list.compiled_paths} \
files: {self.d._exclude_list.compiled_files} all: {self.d._exclude_list.compiled}"
)
files = list(self.d.get_files())
files = [file.name for file in files]
print(f"FINAL FILES {files}")
eq_(len(files), num_result)
return files
def test_exclude_recycle_bin_by_default(self, tmpdir):
regex = r"^.*Recycle\.Bin$"
self.d._exclude_list.add(regex)
self.d._exclude_list.mark(regex)
p1 = Path(str(tmpdir))
p1.joinpath("$Recycle.Bin").mkdir()
p1.joinpath("$Recycle.Bin", "subdir").mkdir()
self.d.add_path(p1)
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin")), DirectoryState.EXCLUDED)
# By default, subdirs should be excluded too, but this can be overridden separately
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdir")), DirectoryState.EXCLUDED)
self.d.set_state(p1.joinpath("$Recycle.Bin", "subdir"), DirectoryState.NORMAL)
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdir")), DirectoryState.NORMAL)
def test_exclude_refined(self, tmpdir):
regex1 = r"^\$Recycle\.Bin$"
self.d._exclude_list.add(regex1)
self.d._exclude_list.mark(regex1)
p1 = Path(str(tmpdir))
p1.joinpath("$Recycle.Bin").mkdir()
p1.joinpath("$Recycle.Bin", "somefile.png").touch()
p1.joinpath("$Recycle.Bin", "some_unwanted_file.jpg").touch()
p1.joinpath("$Recycle.Bin", "subdir").mkdir()
p1.joinpath("$Recycle.Bin", "subdir", "somesubdirfile.png").touch()
p1.joinpath("$Recycle.Bin", "subdir", "unwanted_subdirfile.gif").touch()
p1.joinpath("$Recycle.Bin", "subdar").mkdir()
p1.joinpath("$Recycle.Bin", "subdar", "somesubdarfile.jpeg").touch()
p1.joinpath("$Recycle.Bin", "subdar", "unwanted_subdarfile.png").touch()
self.d.add_path(p1.joinpath("$Recycle.Bin"))
# Filter should set the default state to Excluded
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin")), DirectoryState.EXCLUDED)
# The subdir should inherit its parent state
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdir")), DirectoryState.EXCLUDED)
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdar")), DirectoryState.EXCLUDED)
# Override a child path's state
self.d.set_state(p1.joinpath("$Recycle.Bin", "subdir"), DirectoryState.NORMAL)
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdir")), DirectoryState.NORMAL)
# Parent should keep its default state, and the other child too
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin")), DirectoryState.EXCLUDED)
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdar")), DirectoryState.EXCLUDED)
# print(f"get_folders(): {[x for x in self.d.get_folders()]}")
# only the 2 files directly under the Normal directory
files = self.get_files_and_expect_num_result(2)
assert "somefile.png" not in files
assert "some_unwanted_file.jpg" not in files
assert "somesubdarfile.jpeg" not in files
assert "unwanted_subdarfile.png" not in files
assert "somesubdirfile.png" in files
assert "unwanted_subdirfile.gif" in files
# Overriding the parent should enable all children
self.d.set_state(p1.joinpath("$Recycle.Bin"), DirectoryState.NORMAL)
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdar")), DirectoryState.NORMAL)
# all files there
files = self.get_files_and_expect_num_result(6)
assert "somefile.png" in files
assert "some_unwanted_file.jpg" in files
# This should still filter out files under the directory, despite the Normal state
regex2 = r".*unwanted.*"
self.d._exclude_list.add(regex2)
self.d._exclude_list.mark(regex2)
files = self.get_files_and_expect_num_result(3)
assert "somefile.png" in files
assert "some_unwanted_file.jpg" not in files
assert "unwanted_subdirfile.gif" not in files
assert "unwanted_subdarfile.png" not in files
if ISWINDOWS:
regex3 = r".*Recycle\.Bin\\.*unwanted.*subdirfile.*"
else:
regex3 = r".*Recycle\.Bin\/.*unwanted.*subdirfile.*"
self.d._exclude_list.rename(regex2, regex3)
assert self.d._exclude_list.error(regex3) is None
# print(f"get_folders(): {[x for x in self.d.get_folders()]}")
# Directory shouldn't change its state here, unless explicitly done by user
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdir")), DirectoryState.NORMAL)
files = self.get_files_and_expect_num_result(5)
assert "unwanted_subdirfile.gif" not in files
assert "unwanted_subdarfile.png" in files
# using the end-of-line character should only filter the directory, or a file ending with "subdir"
regex4 = r".*subdir$"
self.d._exclude_list.rename(regex3, regex4)
assert self.d._exclude_list.error(regex4) is None
p1.joinpath("$Recycle.Bin", "subdar", "file_ending_with_subdir").touch()
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdir")), DirectoryState.EXCLUDED)
files = self.get_files_and_expect_num_result(4)
assert "file_ending_with_subdir" not in files
assert "somesubdarfile.jpeg" in files
assert "somesubdirfile.png" not in files
assert "unwanted_subdirfile.gif" not in files
self.d.set_state(p1.joinpath("$Recycle.Bin", "subdir"), DirectoryState.NORMAL)
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdir")), DirectoryState.NORMAL)
# print(f"get_folders(): {[x for x in self.d.get_folders()]}")
files = self.get_files_and_expect_num_result(6)
assert "file_ending_with_subdir" not in files
assert "somesubdirfile.png" in files
assert "unwanted_subdirfile.gif" in files
regex5 = r".*subdir.*"
self.d._exclude_list.rename(regex4, regex5)
# Files containing substring should be filtered
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdir")), DirectoryState.NORMAL)
# The path should not match, only the filename; the "subdir" in the directory name shouldn't matter
p1.joinpath("$Recycle.Bin", "subdir", "file_which_shouldnt_match").touch()
files = self.get_files_and_expect_num_result(5)
assert "somesubdirfile.png" not in files
assert "unwanted_subdirfile.gif" not in files
assert "file_ending_with_subdir" not in files
assert "file_which_shouldnt_match" in files
# This should match the directory only
regex6 = r".*/.*subdir.*/.*"
if ISWINDOWS:
regex6 = r".*\\.*subdir.*\\.*"
assert os.sep in regex6
self.d._exclude_list.rename(regex5, regex6)
self.d._exclude_list.remove(regex1)
eq_(len(self.d._exclude_list.compiled), 1)
assert regex1 not in self.d._exclude_list
assert regex5 not in self.d._exclude_list
assert self.d._exclude_list.error(regex6) is None
assert regex6 in self.d._exclude_list
# This still should not be affected
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "subdir")), DirectoryState.NORMAL)
files = self.get_files_and_expect_num_result(5)
# These files are under the "/subdir" directory
assert "somesubdirfile.png" not in files
assert "unwanted_subdirfile.gif" not in files
# This file under "subdar" directory should not be filtered out
assert "file_ending_with_subdir" in files
# This file is in a directory that should be filtered out
assert "file_which_shouldnt_match" not in files
def test_japanese_unicode(self, tmpdir):
p1 = Path(str(tmpdir))
p1.joinpath("$Recycle.Bin").mkdir()
p1.joinpath("$Recycle.Bin", "somerecycledfile.png").touch()
p1.joinpath("$Recycle.Bin", "some_unwanted_file.jpg").touch()
p1.joinpath("$Recycle.Bin", "subdir").mkdir()
p1.joinpath("$Recycle.Bin", "subdir", "過去白濁物語~]_カラー.jpg").touch()
p1.joinpath("$Recycle.Bin", "思叫物語").mkdir()
p1.joinpath("$Recycle.Bin", "思叫物語", "なししろ会う前").touch()
p1.joinpath("$Recycle.Bin", "思叫物語", "堂~ロ").touch()
self.d.add_path(p1.joinpath("$Recycle.Bin"))
regex3 = r".*物語.*"
self.d._exclude_list.add(regex3)
self.d._exclude_list.mark(regex3)
# print(f"get_folders(): {[x for x in self.d.get_folders()]}")
eq_(self.d.get_state(p1.joinpath("$Recycle.Bin", "思叫物語")), DirectoryState.EXCLUDED)
files = self.get_files_and_expect_num_result(2)
assert "過去白濁物語~]_カラー.jpg" not in files
assert "なししろ会う前" not in files
assert "堂~ロ" not in files
# using the end-of-line character should only filter that directory, without affecting its files
regex4 = r".*物語$"
self.d._exclude_list.rename(regex3, regex4)
assert self.d._exclude_list.error(regex4) is None
self.d.set_state(p1.joinpath("$Recycle.Bin", "思叫物語"), DirectoryState.NORMAL)
files = self.get_files_and_expect_num_result(5)
assert "過去白濁物語~]_カラー.jpg" in files
assert "なししろ会う前" in files
assert "堂~ロ" in files
def test_get_state_returns_excluded_for_hidden_directories_and_files(self, tmpdir):
# This regex only works for files, not paths
regex = r"^\..*$"
self.d._exclude_list.add(regex)
self.d._exclude_list.mark(regex)
p1 = Path(str(tmpdir))
p1.joinpath("foobar").mkdir()
p1.joinpath("foobar", ".hidden_file.txt").touch()
p1.joinpath("foobar", ".hidden_dir").mkdir()
p1.joinpath("foobar", ".hidden_dir", "foobar.jpg").touch()
p1.joinpath("foobar", ".hidden_dir", ".hidden_subfile.png").touch()
self.d.add_path(p1.joinpath("foobar"))
# It should not inherit its parent's state originally
eq_(self.d.get_state(p1.joinpath("foobar", ".hidden_dir")), DirectoryState.EXCLUDED)
self.d.set_state(p1.joinpath("foobar", ".hidden_dir"), DirectoryState.NORMAL)
# The files should still be filtered
files = self.get_files_and_expect_num_result(1)
eq_(len(self.d._exclude_list.compiled_paths), 0)
eq_(len(self.d._exclude_list.compiled_files), 1)
assert ".hidden_file.txt" not in files
assert ".hidden_subfile.png" not in files
assert "foobar.jpg" in files
class TestExcludeDict(TestExcludeList):
def setup_method(self, method):
self.d = Directories(exclude_list=ExcludeDict(union_regex=False))
class TestExcludeListunion(TestExcludeList):
def setup_method(self, method):
self.d = Directories(exclude_list=ExcludeList(union_regex=True))
class TestExcludeDictunion(TestExcludeList):
def setup_method(self, method):
self.d = Directories(exclude_list=ExcludeDict(union_regex=True))


@@ -10,16 +10,31 @@ from hscommon.jobprogress import job
from hscommon.util import first from hscommon.util import first
from hscommon.testutil import eq_, log_calls from hscommon.testutil import eq_, log_calls
from .base import NamedObject from core.tests.base import NamedObject
from .. import engine from core import engine
from ..engine import ( from core.engine import (
get_match, getwords, Group, getfields, unpack_fields, compare_fields, compare, WEIGHT_WORDS, get_match,
MATCH_SIMILAR_WORDS, NO_FIELD_ORDER, build_word_dict, get_groups, getmatches, Match, getwords,
getmatches_by_contents, merge_similar_words, reduce_common_words Group,
getfields,
unpack_fields,
compare_fields,
compare,
WEIGHT_WORDS,
MATCH_SIMILAR_WORDS,
NO_FIELD_ORDER,
build_word_dict,
get_groups,
getmatches,
Match,
getmatches_by_contents,
merge_similar_words,
reduce_common_words,
) )
no = NamedObject no = NamedObject
def get_match_triangle(): def get_match_triangle():
o1 = NamedObject(with_words=True) o1 = NamedObject(with_words=True)
o2 = NamedObject(with_words=True) o2 = NamedObject(with_words=True)
@@ -29,6 +44,7 @@ def get_match_triangle():
m3 = get_match(o2, o3) m3 = get_match(o2, o3)
return [m1, m2, m3] return [m1, m2, m3]
def get_test_group(): def get_test_group():
m1, m2, m3 = get_match_triangle() m1, m2, m3 = get_match_triangle()
result = Group() result = Group()
@@ -37,6 +53,7 @@ def get_test_group():
result.add_match(m3) result.add_match(m3)
return result return result
def assert_match(m, name1, name2): def assert_match(m, name1, name2):
# When testing matches, whether objects are in first or second position very often doesn't # When testing matches, whether objects are in first or second position very often doesn't
# matter. This function makes this test more convenient. # matter. This function makes this test more convenient.
@@ -46,53 +63,57 @@ def assert_match(m, name1, name2):
eq_(m.first.name, name2) eq_(m.first.name, name2)
eq_(m.second.name, name1) eq_(m.second.name, name1)
class TestCasegetwords: class TestCasegetwords:
def test_spaces(self): def test_spaces(self):
eq_(['a', 'b', 'c', 'd'], getwords("a b c d")) eq_(["a", "b", "c", "d"], getwords("a b c d"))
eq_(['a', 'b', 'c', 'd'], getwords(" a b c d ")) eq_(["a", "b", "c", "d"], getwords(" a b c d "))
def test_unicode(self):
eq_(["e", "c", "0", "a", "o", "u", "e", "u"], getwords("é ç 0 à ö û è ¤ ù"))
eq_(["02", "君のこころは輝いてるかい?", "国木田花丸", "solo", "ver"], getwords("02 君のこころは輝いてるかい? 国木田花丸 Solo Ver"))
def test_splitter_chars(self): def test_splitter_chars(self):
eq_( eq_(
[chr(i) for i in range(ord('a'), ord('z')+1)], [chr(i) for i in range(ord("a"), ord("z") + 1)],
getwords("a-b_c&d+e(f)g;h\\i[j]k{l}m:n.o,p<q>r/s?t~u!v@w#x$y*z") getwords("a-b_c&d+e(f)g;h\\i[j]k{l}m:n.o,p<q>r/s?t~u!v@w#x$y*z"),
) )
def test_joiner_chars(self): def test_joiner_chars(self):
eq_(["aec"], getwords("a'e\u0301c")) eq_(["aec"], getwords("a'e\u0301c"))
def test_empty(self): def test_empty(self):
eq_([], getwords('')) eq_([], getwords(""))
def test_returns_lowercase(self): def test_returns_lowercase(self):
eq_(['foo', 'bar'], getwords('FOO BAR')) eq_(["foo", "bar"], getwords("FOO BAR"))
def test_decompose_unicode(self): def test_decompose_unicode(self):
eq_(getwords('foo\xe9bar'), ['fooebar']) eq_(["fooebar"], getwords("foo\xe9bar"))
class TestCasegetfields: class TestCasegetfields:
def test_simple(self): def test_simple(self):
eq_([['a', 'b'], ['c', 'd', 'e']], getfields('a b - c d e')) eq_([["a", "b"], ["c", "d", "e"]], getfields("a b - c d e"))
def test_empty(self): def test_empty(self):
eq_([], getfields('')) eq_([], getfields(""))
def test_cleans_empty_fields(self): def test_cleans_empty_fields(self):
expected = [['a', 'bc', 'def']] expected = [["a", "bc", "def"]]
actual = getfields(' - a bc def') actual = getfields(" - a bc def")
eq_(expected, actual) eq_(expected, actual)
expected = [['bc', 'def']]
class TestCaseunpack_fields: class TestCaseUnpackFields:
def test_with_fields(self): def test_with_fields(self):
expected = ['a', 'b', 'c', 'd', 'e', 'f'] expected = ["a", "b", "c", "d", "e", "f"]
actual = unpack_fields([['a'], ['b', 'c'], ['d', 'e', 'f']]) actual = unpack_fields([["a"], ["b", "c"], ["d", "e", "f"]])
eq_(expected, actual) eq_(expected, actual)
def test_without_fields(self): def test_without_fields(self):
expected = ['a', 'b', 'c', 'd', 'e', 'f'] expected = ["a", "b", "c", "d", "e", "f"]
actual = unpack_fields(['a', 'b', 'c', 'd', 'e', 'f']) actual = unpack_fields(["a", "b", "c", "d", "e", "f"])
eq_(expected, actual) eq_(expected, actual)
def test_empty(self): def test_empty(self):
@@ -101,127 +122,140 @@ class TestCaseunpack_fields:
class TestCaseWordCompare: class TestCaseWordCompare:
def test_list(self): def test_list(self):
eq_(100, compare(['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd'])) eq_(100, compare(["a", "b", "c", "d"], ["a", "b", "c", "d"]))
eq_(86, compare(['a', 'b', 'c', 'd'], ['a', 'b', 'c'])) eq_(86, compare(["a", "b", "c", "d"], ["a", "b", "c"]))
def test_unordered(self): def test_unordered(self):
#Sometimes, users don't want fuzzy matching too much When they set the slider # Sometimes, users don't want fuzzy matching too much When they set the slider
#to 100, they don't expect a filename with the same words, but not the same order, to match. # to 100, they don't expect a filename with the same words, but not the same order, to match.
#Thus, we want to return 99 in that case. # Thus, we want to return 99 in that case.
eq_(99, compare(['a', 'b', 'c', 'd'], ['d', 'b', 'c', 'a'])) eq_(99, compare(["a", "b", "c", "d"], ["d", "b", "c", "a"]))
def test_word_occurs_twice(self): def test_word_occurs_twice(self):
#if a word occurs twice in first, but once in second, we want the word to be only counted once # if a word occurs twice in first, but once in second, we want the word to be only counted once
eq_(89, compare(['a', 'b', 'c', 'd', 'a'], ['d', 'b', 'c', 'a'])) eq_(89, compare(["a", "b", "c", "d", "a"], ["d", "b", "c", "a"]))
def test_uses_copy_of_lists(self): def test_uses_copy_of_lists(self):
first = ['foo', 'bar'] first = ["foo", "bar"]
second = ['bar', 'bleh'] second = ["bar", "bleh"]
compare(first, second) compare(first, second)
eq_(['foo', 'bar'], first) eq_(["foo", "bar"], first)
eq_(['bar', 'bleh'], second) eq_(["bar", "bleh"], second)
def test_word_weight(self): def test_word_weight(self):
eq_(int((6.0 / 13.0) * 100), compare(['foo', 'bar'], ['bar', 'bleh'], (WEIGHT_WORDS, ))) eq_(
int((6.0 / 13.0) * 100),
compare(["foo", "bar"], ["bar", "bleh"], (WEIGHT_WORDS,)),
)
def test_similar_words(self): def test_similar_words(self):
eq_(100, compare(['the', 'white', 'stripes'], ['the', 'whites', 'stripe'], (MATCH_SIMILAR_WORDS, ))) eq_(
100,
compare(
["the", "white", "stripes"],
["the", "whites", "stripe"],
(MATCH_SIMILAR_WORDS,),
),
)
def test_empty(self): def test_empty(self):
eq_(0, compare([], [])) eq_(0, compare([], []))
def test_with_fields(self): def test_with_fields(self):
eq_(67, compare([['a', 'b'], ['c', 'd', 'e']], [['a', 'b'], ['c', 'd', 'f']])) eq_(67, compare([["a", "b"], ["c", "d", "e"]], [["a", "b"], ["c", "d", "f"]]))
def test_propagate_flags_with_fields(self, monkeypatch): def test_propagate_flags_with_fields(self, monkeypatch):
def mock_compare(first, second, flags): def mock_compare(first, second, flags):
eq_((0, 1, 2, 3, 5), flags) eq_((0, 1, 2, 3, 5), flags)
monkeypatch.setattr(engine, 'compare_fields', mock_compare) monkeypatch.setattr(engine, "compare_fields", mock_compare)
compare([['a']], [['a']], (0, 1, 2, 3, 5)) compare([["a"]], [["a"]], (0, 1, 2, 3, 5))
class TestCaseWordCompareWithFields: class TestCaseWordCompareWithFields:
def test_simple(self): def test_simple(self):
eq_(67, compare_fields([['a', 'b'], ['c', 'd', 'e']], [['a', 'b'], ['c', 'd', 'f']])) eq_(
67,
compare_fields([["a", "b"], ["c", "d", "e"]], [["a", "b"], ["c", "d", "f"]]),
)
def test_empty(self): def test_empty(self):
eq_(0, compare_fields([], [])) eq_(0, compare_fields([], []))
def test_different_length(self): def test_different_length(self):
eq_(0, compare_fields([['a'], ['b']], [['a'], ['b'], ['c']])) eq_(0, compare_fields([["a"], ["b"]], [["a"], ["b"], ["c"]]))
def test_propagates_flags(self, monkeypatch): def test_propagates_flags(self, monkeypatch):
def mock_compare(first, second, flags): def mock_compare(first, second, flags):
eq_((0, 1, 2, 3, 5), flags) eq_((0, 1, 2, 3, 5), flags)
monkeypatch.setattr(engine, 'compare_fields', mock_compare) monkeypatch.setattr(engine, "compare_fields", mock_compare)
compare_fields([['a']], [['a']], (0, 1, 2, 3, 5)) compare_fields([["a"]], [["a"]], (0, 1, 2, 3, 5))
def test_order(self): def test_order(self):
first = [['a', 'b'], ['c', 'd', 'e']] first = [["a", "b"], ["c", "d", "e"]]
second = [['c', 'd', 'f'], ['a', 'b']] second = [["c", "d", "f"], ["a", "b"]]
eq_(0, compare_fields(first, second)) eq_(0, compare_fields(first, second))
def test_no_order(self): def test_no_order(self):
first = [['a', 'b'], ['c', 'd', 'e']] first = [["a", "b"], ["c", "d", "e"]]
second = [['c', 'd', 'f'], ['a', 'b']] second = [["c", "d", "f"], ["a", "b"]]
eq_(67, compare_fields(first, second, (NO_FIELD_ORDER, ))) eq_(67, compare_fields(first, second, (NO_FIELD_ORDER,)))
first = [['a', 'b'], ['a', 'b']] #a field can only be matched once. first = [["a", "b"], ["a", "b"]] # a field can only be matched once.
second = [['c', 'd', 'f'], ['a', 'b']] second = [["c", "d", "f"], ["a", "b"]]
eq_(0, compare_fields(first, second, (NO_FIELD_ORDER, ))) eq_(0, compare_fields(first, second, (NO_FIELD_ORDER,)))
first = [['a', 'b'], ['a', 'b', 'c']] first = [["a", "b"], ["a", "b", "c"]]
second = [['c', 'd', 'f'], ['a', 'b']] second = [["c", "d", "f"], ["a", "b"]]
eq_(33, compare_fields(first, second, (NO_FIELD_ORDER, ))) eq_(33, compare_fields(first, second, (NO_FIELD_ORDER,)))
def test_compare_fields_without_order_doesnt_alter_fields(self): def test_compare_fields_without_order_doesnt_alter_fields(self):
#The NO_ORDER comp type altered the fields! # The NO_ORDER comp type altered the fields!
first = [['a', 'b'], ['c', 'd', 'e']] first = [["a", "b"], ["c", "d", "e"]]
second = [['c', 'd', 'f'], ['a', 'b']] second = [["c", "d", "f"], ["a", "b"]]
eq_(67, compare_fields(first, second, (NO_FIELD_ORDER, ))) eq_(67, compare_fields(first, second, (NO_FIELD_ORDER,)))
eq_([['a', 'b'], ['c', 'd', 'e']], first) eq_([["a", "b"], ["c", "d", "e"]], first)
eq_([['c', 'd', 'f'], ['a', 'b']], second) eq_([["c", "d", "f"], ["a", "b"]], second)
class TestCasebuild_word_dict: class TestCaseBuildWordDict:
def test_with_standard_words(self): def test_with_standard_words(self):
l = [NamedObject('foo bar', True)] item_list = [NamedObject("foo bar", True)]
l.append(NamedObject('bar baz', True)) item_list.append(NamedObject("bar baz", True))
l.append(NamedObject('baz bleh foo', True)) item_list.append(NamedObject("baz bleh foo", True))
d = build_word_dict(l) d = build_word_dict(item_list)
eq_(4, len(d)) eq_(4, len(d))
eq_(2, len(d['foo'])) eq_(2, len(d["foo"]))
assert l[0] in d['foo'] assert item_list[0] in d["foo"]
assert l[2] in d['foo'] assert item_list[2] in d["foo"]
eq_(2, len(d['bar'])) eq_(2, len(d["bar"]))
assert l[0] in d['bar'] assert item_list[0] in d["bar"]
assert l[1] in d['bar'] assert item_list[1] in d["bar"]
eq_(2, len(d['baz'])) eq_(2, len(d["baz"]))
assert l[1] in d['baz'] assert item_list[1] in d["baz"]
assert l[2] in d['baz'] assert item_list[2] in d["baz"]
eq_(1, len(d['bleh'])) eq_(1, len(d["bleh"]))
assert l[2] in d['bleh'] assert item_list[2] in d["bleh"]
def test_unpack_fields(self): def test_unpack_fields(self):
o = NamedObject('') o = NamedObject("")
o.words = [['foo', 'bar'], ['baz']] o.words = [["foo", "bar"], ["baz"]]
d = build_word_dict([o]) d = build_word_dict([o])
eq_(3, len(d)) eq_(3, len(d))
eq_(1, len(d['foo'])) eq_(1, len(d["foo"]))
def test_words_are_unaltered(self): def test_words_are_unaltered(self):
o = NamedObject('') o = NamedObject("")
o.words = [['foo', 'bar'], ['baz']] o.words = [["foo", "bar"], ["baz"]]
build_word_dict([o]) build_word_dict([o])
eq_([['foo', 'bar'], ['baz']], o.words) eq_([["foo", "bar"], ["baz"]], o.words)
def test_object_instances_can_only_be_once_in_words_object_list(self): def test_object_instances_can_only_be_once_in_words_object_list(self):
o = NamedObject('foo foo', True) o = NamedObject("foo foo", True)
d = build_word_dict([o]) d = build_word_dict([o])
eq_(1, len(d['foo'])) eq_(1, len(d["foo"]))
def test_job(self): def test_job(self):
def do_progress(p, d=''): def do_progress(p, d=""):
self.log.append(p) self.log.append(p)
return True return True
@@ -234,54 +268,53 @@ class TestCasebuild_word_dict:
eq_(100, self.log[1]) eq_(100, self.log[1])
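Read together, these tests describe build_word_dict as an inverted index: each word maps to the set of objects whose words contain it, field lists are flattened, and a repeated word registers an object only once. A minimal sketch under that reading (not the project's actual implementation):

from collections import defaultdict

def build_word_dict_sketch(objects):
    index = defaultdict(set)
    for obj in objects:
        words = obj.words
        if words and isinstance(words[0], list):
            # "fields" are lists of word lists; flatten them for indexing
            words = [word for field in words for word in field]
        for word in words:
            index[word].add(obj)  # a set, so duplicates register only once
    return dict(index)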
class TestCasemerge_similar_words: class TestCaseMergeSimilarWords:
def test_some_similar_words(self): def test_some_similar_words(self):
d = { d = {
'foobar': set([1]), "foobar": {1},
'foobar1': set([2]), "foobar1": {2},
'foobar2': set([3]), "foobar2": {3},
} }
merge_similar_words(d) merge_similar_words(d)
eq_(1, len(d)) eq_(1, len(d))
eq_(3, len(d['foobar'])) eq_(3, len(d["foobar"]))
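What the test above expects of merge_similar_words is that near-identical keys ('foobar', 'foobar1', 'foobar2') collapse into the shortest one, pooling their object sets. One way to get that behaviour with difflib (an assumption about the matching rule, not the engine's code):

import difflib

def merge_similar_words_sketch(word_dict):
    keys = sorted(word_dict, key=len)  # keep the shortest spelling
    while keys:
        key = keys.pop(0)
        for similar in difflib.get_close_matches(key, keys, n=100, cutoff=0.8):
            word_dict[key] |= word_dict.pop(similar)  # pool the object sets
            keys.remove(similar)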
class TestCaseReduceCommonWords:
class TestCasereduce_common_words:
def test_typical(self): def test_typical(self):
d = { d = {
'foo': set([NamedObject('foo bar', True) for i in range(50)]), "foo": {NamedObject("foo bar", True) for _ in range(50)},
'bar': set([NamedObject('foo bar', True) for i in range(49)]) "bar": {NamedObject("foo bar", True) for _ in range(49)},
} }
reduce_common_words(d, 50) reduce_common_words(d, 50)
assert 'foo' not in d assert "foo" not in d
eq_(49, len(d['bar'])) eq_(49, len(d["bar"]))
def test_dont_remove_objects_with_only_common_words(self): def test_dont_remove_objects_with_only_common_words(self):
d = { d = {
'common': set([NamedObject("common uncommon", True) for i in range(50)] + [NamedObject("common", True)]), "common": set([NamedObject("common uncommon", True) for _ in range(50)] + [NamedObject("common", True)]),
'uncommon': set([NamedObject("common uncommon", True)]) "uncommon": {NamedObject("common uncommon", True)},
} }
reduce_common_words(d, 50) reduce_common_words(d, 50)
eq_(1, len(d['common'])) eq_(1, len(d["common"]))
eq_(1, len(d['uncommon'])) eq_(1, len(d["uncommon"]))
def test_values_still_are_set_instances(self): def test_values_still_are_set_instances(self):
d = { d = {
'common': set([NamedObject("common uncommon", True) for i in range(50)] + [NamedObject("common", True)]), "common": set([NamedObject("common uncommon", True) for _ in range(50)] + [NamedObject("common", True)]),
'uncommon': set([NamedObject("common uncommon", True)]) "uncommon": {NamedObject("common uncommon", True)},
} }
reduce_common_words(d, 50) reduce_common_words(d, 50)
assert isinstance(d['common'], set) assert isinstance(d["common"], set)
assert isinstance(d['uncommon'], set) assert isinstance(d["uncommon"], set)
def test_dont_raise_KeyError_when_a_word_has_been_removed(self): def test_dont_raise_keyerror_when_a_word_has_been_removed(self):
#If a word has been removed by the reduce, an object in a subsequent common word that # If a word has been removed by the reduce, an object in a subsequent common word that
#contains the word that has been removed would cause a KeyError. # contains the word that has been removed would cause a KeyError.
d = { d = {
'foo': set([NamedObject('foo bar baz', True) for i in range(50)]), "foo": {NamedObject("foo bar baz", True) for _ in range(50)},
'bar': set([NamedObject('foo bar baz', True) for i in range(50)]), "bar": {NamedObject("foo bar baz", True) for _ in range(50)},
'baz': set([NamedObject('foo bar baz', True) for i in range(49)]) "baz": {NamedObject("foo bar baz", True) for _ in range(49)},
} }
try: try:
reduce_common_words(d, 50) reduce_common_words(d, 50)
@@ -289,45 +322,43 @@ class TestCasereduce_common_words:
self.fail() self.fail()
def test_unpack_fields(self): def test_unpack_fields(self):
#object.words may be fields. # object.words may be fields.
def create_it(): def create_it():
o = NamedObject('') o = NamedObject("")
o.words = [['foo', 'bar'], ['baz']] o.words = [["foo", "bar"], ["baz"]]
return o return o
d = { d = {"foo": {create_it() for _ in range(50)}}
'foo': set([create_it() for i in range(50)])
}
try: try:
reduce_common_words(d, 50) reduce_common_words(d, 50)
except TypeError: except TypeError:
self.fail("must support fields.") self.fail("must support fields.")
def test_consider_a_reduced_common_word_common_even_after_reduction(self):
# There was a bug in the code that caused a word that has already been reduced not to
# be counted as a common word for subsequent words. For example, if 'foo' is processed
# as a common word, keeping a "foo bar" file in it, and then 'bar' is processed, "foo bar"
# would not stay in 'bar' because 'foo' is not a common word anymore.
only_common = NamedObject('foo bar', True) only_common = NamedObject("foo bar", True)
d = { d = {
'foo': set([NamedObject('foo bar baz', True) for i in range(49)] + [only_common]), "foo": set([NamedObject("foo bar baz", True) for _ in range(49)] + [only_common]),
'bar': set([NamedObject('foo bar baz', True) for i in range(49)] + [only_common]), "bar": set([NamedObject("foo bar baz", True) for _ in range(49)] + [only_common]),
'baz': set([NamedObject('foo bar baz', True) for i in range(49)]) "baz": {NamedObject("foo bar baz", True) for _ in range(49)},
} }
reduce_common_words(d, 50) reduce_common_words(d, 50)
eq_(1, len(d['foo'])) eq_(1, len(d["foo"]))
eq_(1, len(d['bar'])) eq_(1, len(d["bar"]))
eq_(49, len(d['baz'])) eq_(49, len(d["baz"]))
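Taken together, the reduce_common_words tests pin down two rules: a word referenced by at least `threshold` objects is considered common and gets thinned out, but an object whose words are all common is left in place so it stays reachable. A simplified sketch consistent with these assertions (not the actual core.engine function):

def reduce_common_words_sketch(word_dict, threshold):
    common = {word for word, objects in word_dict.items() if len(objects) >= threshold}
    for word in common:
        objects = word_dict[word]
        for obj in list(objects):
            words = obj.words
            if words and isinstance(words[0], list):
                words = [w for field in words for w in field]  # unpack fields
            # Drop the object from this common word only if it also has an
            # uncommon word through which it can still be matched.
            if any(w not in common for w in words):
                objects.remove(obj)
        if not objects:
            del word_dict[word]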
class TestCaseget_match: class TestCaseGetMatch:
def test_simple(self): def test_simple(self):
o1 = NamedObject("foo bar", True) o1 = NamedObject("foo bar", True)
o2 = NamedObject("bar bleh", True) o2 = NamedObject("bar bleh", True)
m = get_match(o1, o2) m = get_match(o1, o2)
eq_(50, m.percentage) eq_(50, m.percentage)
eq_(['foo', 'bar'], m.first.words) eq_(["foo", "bar"], m.first.words)
eq_(['bar', 'bleh'], m.second.words) eq_(["bar", "bleh"], m.second.words)
assert m.first is o1 assert m.first is o1
assert m.second is o2 assert m.second is o2
@@ -340,7 +371,7 @@ class TestCaseget_match:
assert object() not in m assert object() not in m
def test_word_weight(self): def test_word_weight(self):
m = get_match(NamedObject("foo bar", True), NamedObject("bar bleh", True), (WEIGHT_WORDS, )) m = get_match(NamedObject("foo bar", True), NamedObject("bar bleh", True), (WEIGHT_WORDS,))
eq_(m.percentage, int((6.0 / 13.0) * 100)) eq_(m.percentage, int((6.0 / 13.0) * 100))
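The expected value above is easier to read once spelled out: with WEIGHT_WORDS, a matched word counts for its length on both sides, so the only shared word "bar" contributes 6 letters out of the 13 letters present in total.

matched = len("bar") * 2                                     # "bar" appears on both sides -> 6
total = len("foo") + len("bar") + len("bar") + len("bleh")   # 13 letters overall
assert int(matched / total * 100) == 46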
@@ -349,54 +380,63 @@ class TestCaseGetMatches:
eq_(getmatches([]), []) eq_(getmatches([]), [])
def test_simple(self): def test_simple(self):
l = [NamedObject("foo bar"), NamedObject("bar bleh"), NamedObject("a b c foo")] item_list = [
r = getmatches(l) NamedObject("foo bar"),
NamedObject("bar bleh"),
NamedObject("a b c foo"),
]
r = getmatches(item_list)
eq_(2, len(r)) eq_(2, len(r))
m = first(m for m in r if m.percentage == 50) #"foo bar" and "bar bleh" m = first(m for m in r if m.percentage == 50) # "foo bar" and "bar bleh"
assert_match(m, 'foo bar', 'bar bleh') assert_match(m, "foo bar", "bar bleh")
m = first(m for m in r if m.percentage == 33) #"foo bar" and "a b c foo" m = first(m for m in r if m.percentage == 33) # "foo bar" and "a b c foo"
assert_match(m, 'foo bar', 'a b c foo') assert_match(m, "foo bar", "a b c foo")
def test_null_and_unrelated_objects(self): def test_null_and_unrelated_objects(self):
l = [NamedObject("foo bar"), NamedObject("bar bleh"), NamedObject(""), NamedObject("unrelated object")] item_list = [
r = getmatches(l) NamedObject("foo bar"),
NamedObject("bar bleh"),
NamedObject(""),
NamedObject("unrelated object"),
]
r = getmatches(item_list)
eq_(len(r), 1) eq_(len(r), 1)
m = r[0] m = r[0]
eq_(m.percentage, 50) eq_(m.percentage, 50)
assert_match(m, 'foo bar', 'bar bleh') assert_match(m, "foo bar", "bar bleh")
def test_twice_the_same_word(self): def test_twice_the_same_word(self):
l = [NamedObject("foo foo bar"), NamedObject("bar bleh")] item_list = [NamedObject("foo foo bar"), NamedObject("bar bleh")]
r = getmatches(l) r = getmatches(item_list)
eq_(1, len(r)) eq_(1, len(r))
def test_twice_the_same_word_when_preworded(self): def test_twice_the_same_word_when_preworded(self):
l = [NamedObject("foo foo bar", True), NamedObject("bar bleh", True)] item_list = [NamedObject("foo foo bar", True), NamedObject("bar bleh", True)]
r = getmatches(l) r = getmatches(item_list)
eq_(1, len(r)) eq_(1, len(r))
def test_two_words_match(self): def test_two_words_match(self):
l = [NamedObject("foo bar"), NamedObject("foo bar bleh")] item_list = [NamedObject("foo bar"), NamedObject("foo bar bleh")]
r = getmatches(l) r = getmatches(item_list)
eq_(1, len(r)) eq_(1, len(r))
def test_match_files_with_only_common_words(self):
# If a word occurs more than 50 times, it is excluded from the matching process
# The problem with the common_word_threshold is that the files containing only common
# words will never be matched together. We *should* match them.
# This test assumes that the common word threshold const is 50
l = [NamedObject("foo") for i in range(50)] item_list = [NamedObject("foo") for _ in range(50)]
r = getmatches(l) r = getmatches(item_list)
eq_(1225, len(r)) eq_(1225, len(r))
def test_use_words_already_there_if_there(self): def test_use_words_already_there_if_there(self):
o1 = NamedObject('foo') o1 = NamedObject("foo")
o2 = NamedObject('bar') o2 = NamedObject("bar")
o2.words = ['foo'] o2.words = ["foo"]
eq_(1, len(getmatches([o1, o2]))) eq_(1, len(getmatches([o1, o2])))
def test_job(self): def test_job(self):
def do_progress(p, d=''): def do_progress(p, d=""):
self.log.append(p) self.log.append(p)
return True return True
@@ -409,28 +449,28 @@ class TestCaseGetMatches:
eq_(100, self.log[-1]) eq_(100, self.log[-1])
def test_weight_words(self): def test_weight_words(self):
l = [NamedObject("foo bar"), NamedObject("bar bleh")] item_list = [NamedObject("foo bar"), NamedObject("bar bleh")]
m = getmatches(l, weight_words=True)[0] m = getmatches(item_list, weight_words=True)[0]
eq_(int((6.0 / 13.0) * 100), m.percentage) eq_(int((6.0 / 13.0) * 100), m.percentage)
def test_similar_word(self): def test_similar_word(self):
l = [NamedObject("foobar"), NamedObject("foobars")] item_list = [NamedObject("foobar"), NamedObject("foobars")]
eq_(len(getmatches(l, match_similar_words=True)), 1) eq_(len(getmatches(item_list, match_similar_words=True)), 1)
eq_(getmatches(l, match_similar_words=True)[0].percentage, 100) eq_(getmatches(item_list, match_similar_words=True)[0].percentage, 100)
l = [NamedObject("foobar"), NamedObject("foo")] item_list = [NamedObject("foobar"), NamedObject("foo")]
eq_(len(getmatches(l, match_similar_words=True)), 0) #too far eq_(len(getmatches(item_list, match_similar_words=True)), 0) # too far
l = [NamedObject("bizkit"), NamedObject("bizket")] item_list = [NamedObject("bizkit"), NamedObject("bizket")]
eq_(len(getmatches(l, match_similar_words=True)), 1) eq_(len(getmatches(item_list, match_similar_words=True)), 1)
l = [NamedObject("foobar"), NamedObject("foosbar")] item_list = [NamedObject("foobar"), NamedObject("foosbar")]
eq_(len(getmatches(l, match_similar_words=True)), 1) eq_(len(getmatches(item_list, match_similar_words=True)), 1)
def test_single_object_with_similar_words(self): def test_single_object_with_similar_words(self):
l = [NamedObject("foo foos")] item_list = [NamedObject("foo foos")]
eq_(len(getmatches(l, match_similar_words=True)), 0) eq_(len(getmatches(item_list, match_similar_words=True)), 0)
def test_double_words_get_counted_only_once(self): def test_double_words_get_counted_only_once(self):
l = [NamedObject("foo bar foo bleh"), NamedObject("foo bar bleh bar")] item_list = [NamedObject("foo bar foo bleh"), NamedObject("foo bar bleh bar")]
m = getmatches(l)[0] m = getmatches(item_list)[0]
eq_(75, m.percentage) eq_(75, m.percentage)
def test_with_fields(self): def test_with_fields(self):
@@ -450,13 +490,13 @@ class TestCaseGetMatches:
eq_(m.percentage, 50) eq_(m.percentage, 50)
def test_only_match_similar_when_the_option_is_set(self): def test_only_match_similar_when_the_option_is_set(self):
l = [NamedObject("foobar"), NamedObject("foobars")] item_list = [NamedObject("foobar"), NamedObject("foobars")]
eq_(len(getmatches(l, match_similar_words=False)), 0) eq_(len(getmatches(item_list, match_similar_words=False)), 0)
def test_dont_recurse_do_match(self): def test_dont_recurse_do_match(self):
# with nosetests, the stack is increased. The number has to be high enough not to be failing falsely # with nosetests, the stack is increased. The number has to be high enough not to be failing falsely
sys.setrecursionlimit(200) sys.setrecursionlimit(200)
files = [NamedObject('foo bar') for i in range(201)] files = [NamedObject("foo bar") for _ in range(201)]
try: try:
getmatches(files) getmatches(files)
except RuntimeError: except RuntimeError:
@@ -465,34 +505,60 @@ class TestCaseGetMatches:
sys.setrecursionlimit(1000) sys.setrecursionlimit(1000)
def test_min_match_percentage(self): def test_min_match_percentage(self):
l = [NamedObject("foo bar"), NamedObject("bar bleh"), NamedObject("a b c foo")] item_list = [
r = getmatches(l, min_match_percentage=50) NamedObject("foo bar"),
eq_(1, len(r)) #Only "foo bar" / "bar bleh" should match NamedObject("bar bleh"),
NamedObject("a b c foo"),
]
r = getmatches(item_list, min_match_percentage=50)
eq_(1, len(r)) # Only "foo bar" / "bar bleh" should match
def test_memory_error(self, monkeypatch):
@log_calls
def mocked_match(first, second, flags):
if len(mocked_match.calls) > 42:
raise MemoryError()
return Match(first, second, 0)
objects = [NamedObject() for _ in range(10)]  # results in 45 matches
monkeypatch.setattr(engine, "get_match", mocked_match)
try:
r = getmatches(objects)
except MemoryError:
self.fail("MemoryError must be handled")
eq_(42, len(r))
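The expectation of 42 results implies that match collection keeps whatever it has already gathered when memory runs out instead of propagating the error. Schematically (a sketch of the failure handling only, not the real getmatches loop):

def collect_matches_sketch(candidate_pairs, match_func):
    results = []
    try:
        for first, second in candidate_pairs:
            results.append(match_func(first, second))
    except MemoryError:
        pass  # return the matches gathered so far
    return results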
class TestCaseGetMatchesByContents: class TestCaseGetMatchesByContents:
def test_dont_compare_empty_files(self): def test_big_file_partial_hashing(self):
o1, o2 = no(size=0), no(size=0) smallsize = 1
assert not getmatches_by_contents([o1, o2]) bigsize = 100 * 1024 * 1024 # 100MB
f = [
no("bigfoo", size=bigsize),
no("bigbar", size=bigsize),
no("smallfoo", size=smallsize),
no("smallbar", size=smallsize),
]
f[0].digest = f[0].digest_partial = f[0].digest_samples = "foobar"
f[1].digest = f[1].digest_partial = f[1].digest_samples = "foobar"
f[2].digest = f[2].digest_partial = "bleh"
f[3].digest = f[3].digest_partial = "bleh"
r = getmatches_by_contents(f, bigsize=bigsize)
eq_(len(r), 2)
# User disabled optimization for big files, compute digests as usual
r = getmatches_by_contents(f, bigsize=0)
eq_(len(r), 2)
# Other file is now slightly different, digest_partial is still the same
f[1].digest = f[1].digest_samples = "foobardiff"
r = getmatches_by_contents(f, bigsize=bigsize)
# Successfully filter it out
eq_(len(r), 1)
r = getmatches_by_contents(f, bigsize=0)
eq_(len(r), 1)
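The new test documents a tiered content comparison: size first, then the cheap partial digest, and only then a more expensive digest, where files at or above `bigsize` fall back to a sampled digest instead of hashing everything (and `bigsize=0` turns that shortcut off). A hypothetical per-pair check along those lines (the exact threshold comparison and the helper name are assumptions):

def contents_match_sketch(a, b, bigsize):
    if a.size != b.size:
        return False
    if a.digest_partial != b.digest_partial:
        return False               # cheap early exit
    if bigsize and a.size >= bigsize:
        return a.digest_samples == b.digest_samples  # sample a big file instead of reading it all
    return a.digest == b.digest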
class TestCaseGroup:
def test_empty(self):
g = Group()
eq_(None, g.ref)
eq_([], g.dupes)
@@ -599,7 +665,7 @@ class TestCaseGroup:
eq_([o1], g.dupes) eq_([o1], g.dupes)
g.switch_ref(o2) g.switch_ref(o2)
assert o2 is g.ref assert o2 is g.ref
g.switch_ref(NamedObject('', True)) g.switch_ref(NamedObject("", True))
assert o2 is g.ref assert o2 is g.ref
def test_switch_ref_from_ref_dir(self): def test_switch_ref_from_ref_dir(self):
@@ -620,11 +686,11 @@ class TestCaseGroup:
m = g.get_match_of(o) m = g.get_match_of(o)
assert g.ref in m assert g.ref in m
assert o in m assert o in m
assert g.get_match_of(NamedObject('', True)) is None assert g.get_match_of(NamedObject("", True)) is None
assert g.get_match_of(g.ref) is None assert g.get_match_of(g.ref) is None
def test_percentage(self): def test_percentage(self):
#percentage should return the avg percentage in relation to the ref # percentage should return the avg percentage in relation to the ref
m1, m2, m3 = get_match_triangle() m1, m2, m3 = get_match_triangle()
m1 = Match(m1[0], m1[1], 100) m1 = Match(m1[0], m1[1], 100)
m2 = Match(m2[0], m2[1], 50) m2 = Match(m2[0], m2[1], 50)
@@ -651,9 +717,9 @@ class TestCaseGroup:
o1 = m1.first o1 = m1.first
o2 = m1.second o2 = m1.second
o3 = m2.second o3 = m2.second
o1.name = 'c' o1.name = "c"
o2.name = 'b' o2.name = "b"
o3.name = 'a' o3.name = "a"
g = Group() g = Group()
g.add_match(m1) g.add_match(m1)
g.add_match(m2) g.add_match(m2)
@@ -666,8 +732,7 @@ class TestCaseGroup:
# if the ref has the same key as one or more of the dupe, run the tie_breaker func among them # if the ref has the same key as one or more of the dupe, run the tie_breaker func among them
g = get_test_group() g = get_test_group()
o1, o2, o3 = g.ordered o1, o2, o3 = g.ordered
tie_breaker = lambda ref, dupe: dupe is o3 g.prioritize(lambda x: 0, lambda ref, dupe: dupe is o3)
g.prioritize(lambda x: 0, tie_breaker)
assert g.ref is o3 assert g.ref is o3
def test_prioritize_with_tie_breaker_runs_on_all_dupes(self): def test_prioritize_with_tie_breaker_runs_on_all_dupes(self):
@@ -678,8 +743,7 @@ class TestCaseGroup:
o1.foo = 1 o1.foo = 1
o2.foo = 2 o2.foo = 2
o3.foo = 3 o3.foo = 3
tie_breaker = lambda ref, dupe: dupe.foo > ref.foo g.prioritize(lambda x: 0, lambda ref, dupe: dupe.foo > ref.foo)
g.prioritize(lambda x: 0, tie_breaker)
assert g.ref is o3 assert g.ref is o3
def test_prioritize_with_tie_breaker_runs_only_on_tie_dupes(self): def test_prioritize_with_tie_breaker_runs_only_on_tie_dupes(self):
@@ -692,9 +756,7 @@ class TestCaseGroup:
o1.bar = 1 o1.bar = 1
o2.bar = 2 o2.bar = 2
o3.bar = 3 o3.bar = 3
key_func = lambda x: -x.foo g.prioritize(lambda x: -x.foo, lambda ref, dupe: dupe.bar > ref.bar)
tie_breaker = lambda ref, dupe: dupe.bar > ref.bar
g.prioritize(key_func, tie_breaker)
assert g.ref is o2 assert g.ref is o2
def test_prioritize_with_ref_dupe(self): def test_prioritize_with_ref_dupe(self):
@@ -709,9 +771,9 @@ class TestCaseGroup:
def test_prioritize_nothing_changes(self): def test_prioritize_nothing_changes(self):
# prioritize() returns False when nothing changes in the group. # prioritize() returns False when nothing changes in the group.
g = get_test_group() g = get_test_group()
g[0].name = 'a' g[0].name = "a"
g[1].name = 'b' g[1].name = "b"
g[2].name = 'c' g[2].name = "c"
assert not g.prioritize(lambda x: x.name) assert not g.prioritize(lambda x: x.name)
def test_list_like(self): def test_list_like(self):
@@ -723,7 +785,11 @@ class TestCaseGroup:
def test_discard_matches(self): def test_discard_matches(self):
g = Group() g = Group()
o1, o2, o3 = (NamedObject("foo", True), NamedObject("bar", True), NamedObject("baz", True)) o1, o2, o3 = (
NamedObject("foo", True),
NamedObject("bar", True),
NamedObject("baz", True),
)
g.add_match(get_match(o1, o2)) g.add_match(get_match(o1, o2))
g.add_match(get_match(o1, o3)) g.add_match(get_match(o1, o3))
g.discard_matches() g.discard_matches()
@@ -731,14 +797,14 @@ class TestCaseGroup:
eq_(0, len(g.candidates)) eq_(0, len(g.candidates))
class TestCaseget_groups: class TestCaseGetGroups:
def test_empty(self): def test_empty(self):
r = get_groups([]) r = get_groups([])
eq_([], r) eq_([], r)
def test_simple(self): def test_simple(self):
l = [NamedObject("foo bar"), NamedObject("bar bleh")] item_list = [NamedObject("foo bar"), NamedObject("bar bleh")]
matches = getmatches(l) matches = getmatches(item_list)
m = matches[0] m = matches[0]
r = get_groups(matches) r = get_groups(matches)
eq_(1, len(r)) eq_(1, len(r))
@@ -747,28 +813,39 @@ class TestCaseget_groups:
eq_([m.second], g.dupes) eq_([m.second], g.dupes)
def test_group_with_multiple_matches(self): def test_group_with_multiple_matches(self):
#This results in 3 matches # This results in 3 matches
l = [NamedObject("foo"), NamedObject("foo"), NamedObject("foo")] item_list = [NamedObject("foo"), NamedObject("foo"), NamedObject("foo")]
matches = getmatches(l) matches = getmatches(item_list)
r = get_groups(matches) r = get_groups(matches)
eq_(1, len(r)) eq_(1, len(r))
g = r[0] g = r[0]
eq_(3, len(g)) eq_(3, len(g))
def test_must_choose_a_group(self): def test_must_choose_a_group(self):
l = [NamedObject("a b"), NamedObject("a b"), NamedObject("b c"), NamedObject("c d"), NamedObject("c d")] item_list = [
#There will be 2 groups here: group "a b" and group "c d" NamedObject("a b"),
#"b c" can go either of them, but not both. NamedObject("a b"),
matches = getmatches(l) NamedObject("b c"),
NamedObject("c d"),
NamedObject("c d"),
]
# There will be 2 groups here: group "a b" and group "c d"
# "b c" can go either of them, but not both.
matches = getmatches(item_list)
r = get_groups(matches) r = get_groups(matches)
eq_(2, len(r)) eq_(2, len(r))
eq_(5, len(r[0])+len(r[1])) eq_(5, len(r[0]) + len(r[1]))
def test_should_all_go_in_the_same_group(self): def test_should_all_go_in_the_same_group(self):
l = [NamedObject("a b"), NamedObject("a b"), NamedObject("a b"), NamedObject("a b")] item_list = [
#There will be 2 groups here: group "a b" and group "c d" NamedObject("a b"),
#"b c" can fit in both, but it must be in only one of them NamedObject("a b"),
matches = getmatches(l) NamedObject("a b"),
NamedObject("a b"),
]
# There will be 2 groups here: group "a b" and group "c d"
# "b c" can fit in both, but it must be in only one of them
matches = getmatches(item_list)
r = get_groups(matches) r = get_groups(matches)
eq_(1, len(r)) eq_(1, len(r))
@@ -787,8 +864,8 @@ class TestCaseget_groups:
assert o3 in g assert o3 in g
def test_four_sized_group(self): def test_four_sized_group(self):
l = [NamedObject("foobar") for i in range(4)] item_list = [NamedObject("foobar") for _ in range(4)]
m = getmatches(l) m = getmatches(item_list)
r = get_groups(m) r = get_groups(m)
eq_(1, len(r)) eq_(1, len(r))
eq_(4, len(r[0])) eq_(4, len(r[0]))
@@ -807,11 +884,11 @@ class TestCaseget_groups:
# If, with a (A, B, C, D) set, all match with A, but C and D don't match with B and that the # If, with a (A, B, C, D) set, all match with A, but C and D don't match with B and that the
# (A, B) match is the highest (thus resulting in an (A, B) group), still match C and D # (A, B) match is the highest (thus resulting in an (A, B) group), still match C and D
# in a separate group instead of discarding them. # in a separate group instead of discarding them.
A, B, C, D = [NamedObject() for _ in range(4)] A, B, C, D = (NamedObject() for _ in range(4))
m1 = Match(A, B, 90) # This is the strongest "A" match m1 = Match(A, B, 90) # This is the strongest "A" match
m2 = Match(A, C, 80) # Because C doesn't match with B, it won't be in the group m2 = Match(A, C, 80) # Because C doesn't match with B, it won't be in the group
m3 = Match(A, D, 80) # Same thing for D m3 = Match(A, D, 80) # Same thing for D
m4 = Match(C, D, 70) # However, because C and D match, they should have their own group. m4 = Match(C, D, 70) # However, because C and D match, they should have their own group.
groups = get_groups([m1, m2, m3, m4]) groups = get_groups([m1, m2, m3, m4])
eq_(len(groups), 2) eq_(len(groups), 2)
g1, g2 = groups g1, g2 = groups
@@ -819,4 +896,3 @@ class TestCaseget_groups:
assert B in g1 assert B in g1
assert C in g2 assert C in g2
assert D in g2 assert D in g2
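The last few assertions describe the grouping pass: matches are consumed strongest first, an object joins an existing group only if it matches every member already in it, and matches rejected that way can still seed a new group among otherwise ungrouped objects. A sketch of only that behaviour (not core.engine.get_groups itself):

def get_groups_sketch(matches):
    pairs = {frozenset((m.first, m.second)) for m in matches}
    groups, grouped, dropped = [], set(), []
    for m in sorted(matches, key=lambda m: -m.percentage):
        in_first, in_second = m.first in grouped, m.second in grouped
        if not in_first and not in_second:
            groups.append([m.first, m.second])
            grouped.update((m.first, m.second))
        elif in_first and in_second:
            continue  # both already placed; the weaker match is discarded
        else:
            anchor, newcomer = (m.first, m.second) if in_first else (m.second, m.first)
            group = next(g for g in groups if anchor in g)
            # join only if the newcomer matches every current member
            if all(frozenset((newcomer, member)) in pairs for member in group):
                group.append(newcomer)
                grouped.add(newcomer)
            else:
                dropped.append(m)
    leftovers = [m for m in dropped if m.first not in grouped and m.second not in grouped]
    if leftovers:
        groups.extend(get_groups_sketch(leftovers))
    return groups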

core/tests/exclude_test.py (new file)
@@ -0,0 +1,435 @@
# Copyright 2016 Hardcoded Software (http://www.hardcoded.net)
#
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html
import io
from xml.etree import ElementTree as ET
from hscommon.testutil import eq_
from hscommon.plat import ISWINDOWS
from core.tests.base import DupeGuru
from core.exclude import ExcludeList, ExcludeDict, default_regexes, AlreadyThereException
from re import error
# Two slightly different implementations here, one around a list of lists,
# and another around a dictionary.
class TestCaseListXMLLoading:
def setup_method(self, method):
self.exclude_list = ExcludeList()
def test_load_non_existant_file(self):
# Loads the pre-defined regexes
self.exclude_list.load_from_xml("non_existant.xml")
eq_(len(default_regexes), len(self.exclude_list))
# they should also be marked by default
eq_(len(default_regexes), self.exclude_list.marked_count)
def test_save_to_xml(self):
f = io.BytesIO()
self.exclude_list.save_to_xml(f)
f.seek(0)
doc = ET.parse(f)
root = doc.getroot()
eq_("exclude_list", root.tag)
def test_save_and_load(self, tmpdir):
e1 = ExcludeList()
e2 = ExcludeList()
eq_(len(e1), 0)
e1.add(r"one")
e1.mark(r"one")
e1.add(r"two")
tmpxml = str(tmpdir.join("exclude_testunit.xml"))
e1.save_to_xml(tmpxml)
e2.load_from_xml(tmpxml)
# We should have the default regexes
assert r"one" in e2
assert r"two" in e2
eq_(len(e2), 2)
eq_(e2.marked_count, 1)
def test_load_xml_with_garbage_and_missing_elements(self):
root = ET.Element("foobar") # The root element shouldn't matter
exclude_node = ET.SubElement(root, "bogus")
exclude_node.set("regex", "None")
exclude_node.set("marked", "y")
exclude_node = ET.SubElement(root, "exclude")
exclude_node.set("regex", "one")
# marked field invalid
exclude_node.set("markedddd", "y")
exclude_node = ET.SubElement(root, "exclude")
exclude_node.set("regex", "two")
# missing marked field
exclude_node = ET.SubElement(root, "exclude")
exclude_node.set("regex", "three")
exclude_node.set("markedddd", "pazjbjepo")
f = io.BytesIO()
tree = ET.ElementTree(root)
tree.write(f, encoding="utf-8")
f.seek(0)
self.exclude_list.load_from_xml(f)
print(f"{[x for x in self.exclude_list]}")
# only the three "exclude" nodes should be added,
eq_(3, len(self.exclude_list))
# None should be marked
eq_(0, self.exclude_list.marked_count)
class TestCaseDictXMLLoading(TestCaseListXMLLoading):
def setup_method(self, method):
self.exclude_list = ExcludeDict()
class TestCaseListEmpty:
def setup_method(self, method):
self.app = DupeGuru()
self.app.exclude_list = ExcludeList(union_regex=False)
self.exclude_list = self.app.exclude_list
def test_add_mark_and_remove_regex(self):
regex1 = r"one"
regex2 = r"two"
self.exclude_list.add(regex1)
assert regex1 in self.exclude_list
self.exclude_list.add(regex2)
self.exclude_list.mark(regex1)
self.exclude_list.mark(regex2)
eq_(len(self.exclude_list), 2)
eq_(len(self.exclude_list.compiled), 2)
compiled_files = [x for x in self.exclude_list.compiled_files]
eq_(len(compiled_files), 2)
self.exclude_list.remove(regex2)
assert regex2 not in self.exclude_list
eq_(len(self.exclude_list), 1)
def test_add_duplicate(self):
self.exclude_list.add(r"one")
eq_(1, len(self.exclude_list))
try:
self.exclude_list.add(r"one")
except Exception:
pass
eq_(1, len(self.exclude_list))
def test_add_not_compilable(self):
# Trying to add a non-valid regex should not work and raise exception
regex = r"one))"
try:
self.exclude_list.add(regex)
except Exception as e:
# Make sure we raise a re.error so that the interface can process it
eq_(type(e), error)
added = self.exclude_list.mark(regex)
eq_(added, False)
eq_(len(self.exclude_list), 0)
eq_(len(self.exclude_list.compiled), 0)
compiled_files = [x for x in self.exclude_list.compiled_files]
eq_(len(compiled_files), 0)
def test_force_add_not_compilable(self):
"""Used when loading from XML for example"""
regex = r"one))"
self.exclude_list.add(regex, forced=True)
marked = self.exclude_list.mark(regex)
eq_(marked, False) # can't be marked since not compilable
eq_(len(self.exclude_list), 1)
eq_(len(self.exclude_list.compiled), 0)
compiled_files = [x for x in self.exclude_list.compiled_files]
eq_(len(compiled_files), 0)
# adding a duplicate
regex = r"one))"
try:
self.exclude_list.add(regex, forced=True)
except Exception as e:
# we should have this exception, and it shouldn't be added
assert type(e) is AlreadyThereException
eq_(len(self.exclude_list), 1)
eq_(len(self.exclude_list.compiled), 0)
def test_rename_regex(self):
regex = r"one"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
regex_renamed = r"one))"
# Not compilable, can't be marked
self.exclude_list.rename(regex, regex_renamed)
assert regex not in self.exclude_list
assert regex_renamed in self.exclude_list
eq_(self.exclude_list.is_marked(regex_renamed), False)
self.exclude_list.mark(regex_renamed)
eq_(self.exclude_list.is_marked(regex_renamed), False)
regex_renamed_compilable = r"two"
self.exclude_list.rename(regex_renamed, regex_renamed_compilable)
assert regex_renamed_compilable in self.exclude_list
eq_(self.exclude_list.is_marked(regex_renamed), False)
self.exclude_list.mark(regex_renamed_compilable)
eq_(self.exclude_list.is_marked(regex_renamed_compilable), True)
eq_(len(self.exclude_list), 1)
# Should still be marked after rename
regex_compilable = r"three"
self.exclude_list.rename(regex_renamed_compilable, regex_compilable)
eq_(self.exclude_list.is_marked(regex_compilable), True)
def test_rename_regex_file_to_path(self):
regex = r".*/one.*"
if ISWINDOWS:
regex = r".*\\one.*"
regex2 = r".*one.*"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
compiled_re = [x.pattern for x in self.exclude_list._excluded_compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex in compiled_re
assert regex not in files_re
assert regex in paths_re
self.exclude_list.rename(regex, regex2)
compiled_re = [x.pattern for x in self.exclude_list._excluded_compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex not in compiled_re
assert regex2 in compiled_re
assert regex2 in files_re
assert regex2 not in paths_re
def test_restore_default(self):
"""Only unmark previously added regexes and mark the pre-defined ones"""
regex = r"one"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
self.exclude_list.restore_defaults()
eq_(len(default_regexes), self.exclude_list.marked_count)
# added regex shouldn't be marked
eq_(self.exclude_list.is_marked(regex), False)
# added regex shouldn't be in compiled list either
compiled = [x for x in self.exclude_list.compiled]
assert regex not in compiled
# Only default regexes marked and in compiled list
for re in default_regexes:
assert self.exclude_list.is_marked(re)
found = False
for compiled_re in compiled:
if compiled_re.pattern == re:
found = True
if not found:
raise (Exception(f"Default RE {re} not found in compiled list."))
eq_(len(default_regexes), len(self.exclude_list.compiled))
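The rename tests above also document how an expression is routed: patterns containing a path separator are applied to full paths (compiled_paths), everything else to bare file names (compiled_files). A sketch of that routing rule, inferred from the assertions rather than taken from ExcludeList itself:

import os
import re

def classify_patterns_sketch(patterns):
    files, paths = [], []
    for pattern in patterns:
        # Assumed rule: a path separator inside the expression means it should be
        # matched against the whole path rather than the bare file name.
        target = paths if os.sep in pattern else files
        target.append(re.compile(pattern))
    return files, paths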
class TestCaseListEmptyUnion(TestCaseListEmpty):
"""Same but with union regex"""
def setup_method(self, method):
self.app = DupeGuru()
self.app.exclude_list = ExcludeList(union_regex=True)
self.exclude_list = self.app.exclude_list
def test_add_mark_and_remove_regex(self):
regex1 = r"one"
regex2 = r"two"
self.exclude_list.add(regex1)
assert regex1 in self.exclude_list
self.exclude_list.add(regex2)
self.exclude_list.mark(regex1)
self.exclude_list.mark(regex2)
eq_(len(self.exclude_list), 2)
eq_(len(self.exclude_list.compiled), 1)
compiled_files = [x for x in self.exclude_list.compiled_files]
eq_(len(compiled_files), 1) # Two patterns joined together into one
assert "|" in compiled_files[0].pattern
self.exclude_list.remove(regex2)
assert regex2 not in self.exclude_list
eq_(len(self.exclude_list), 1)
def test_rename_regex_file_to_path(self):
regex = r".*/one.*"
if ISWINDOWS:
regex = r".*\\one.*"
regex2 = r".*one.*"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
eq_(len([x for x in self.exclude_list]), 1)
compiled_re = [x.pattern for x in self.exclude_list.compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex in compiled_re
assert regex not in files_re
assert regex in paths_re
self.exclude_list.rename(regex, regex2)
eq_(len([x for x in self.exclude_list]), 1)
compiled_re = [x.pattern for x in self.exclude_list.compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex not in compiled_re
assert regex2 in compiled_re
assert regex2 in files_re
assert regex2 not in paths_re
def test_restore_default(self):
"""Only unmark previously added regexes and mark the pre-defined ones"""
regex = r"one"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
self.exclude_list.restore_defaults()
eq_(len(default_regexes), self.exclude_list.marked_count)
# added regex shouldn't be marked
eq_(self.exclude_list.is_marked(regex), False)
# added regex shouldn't be in compiled list either
compiled = [x for x in self.exclude_list.compiled]
assert regex not in compiled
# Need to escape both to get the same strings after compilation
compiled_escaped = {x.encode("unicode-escape").decode() for x in compiled[0].pattern.split("|")}
default_escaped = {x.encode("unicode-escape").decode() for x in default_regexes}
assert compiled_escaped == default_escaped
eq_(len(default_regexes), len(compiled[0].pattern.split("|")))
class TestCaseDictEmpty(TestCaseListEmpty):
"""Same, but with dictionary implementation"""
def setup_method(self, method):
self.app = DupeGuru()
self.app.exclude_list = ExcludeDict(union_regex=False)
self.exclude_list = self.app.exclude_list
class TestCaseDictEmptyUnion(TestCaseDictEmpty):
"""Same, but with union regex"""
def setup_method(self, method):
self.app = DupeGuru()
self.app.exclude_list = ExcludeDict(union_regex=True)
self.exclude_list = self.app.exclude_list
def test_add_mark_and_remove_regex(self):
regex1 = r"one"
regex2 = r"two"
self.exclude_list.add(regex1)
assert regex1 in self.exclude_list
self.exclude_list.add(regex2)
self.exclude_list.mark(regex1)
self.exclude_list.mark(regex2)
eq_(len(self.exclude_list), 2)
eq_(len(self.exclude_list.compiled), 1)
compiled_files = [x for x in self.exclude_list.compiled_files]
# two patterns joined into one
eq_(len(compiled_files), 1)
self.exclude_list.remove(regex2)
assert regex2 not in self.exclude_list
eq_(len(self.exclude_list), 1)
def test_rename_regex_file_to_path(self):
regex = r".*/one.*"
if ISWINDOWS:
regex = r".*\\one.*"
regex2 = r".*one.*"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
marked_re = [x for marked, x in self.exclude_list if marked]
eq_(len(marked_re), 1)
compiled_re = [x.pattern for x in self.exclude_list.compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex in compiled_re
assert regex not in files_re
assert regex in paths_re
self.exclude_list.rename(regex, regex2)
compiled_re = [x.pattern for x in self.exclude_list.compiled]
files_re = [x.pattern for x in self.exclude_list.compiled_files]
paths_re = [x.pattern for x in self.exclude_list.compiled_paths]
assert regex not in compiled_re
assert regex2 in compiled_re
assert regex2 in files_re
assert regex2 not in paths_re
def test_restore_default(self):
"""Only unmark previously added regexes and mark the pre-defined ones"""
regex = r"one"
self.exclude_list.add(regex)
self.exclude_list.mark(regex)
self.exclude_list.restore_defaults()
eq_(len(default_regexes), self.exclude_list.marked_count)
# added regex shouldn't be marked
eq_(self.exclude_list.is_marked(regex), False)
# added regex shouldn't be in compiled list either
compiled = [x for x in self.exclude_list.compiled]
assert regex not in compiled
# Need to escape both to get the same strings after compilation
compiled_escaped = {x.encode("unicode-escape").decode() for x in compiled[0].pattern.split("|")}
default_escaped = {x.encode("unicode-escape").decode() for x in default_regexes}
assert compiled_escaped == default_escaped
eq_(len(default_regexes), len(compiled[0].pattern.split("|")))
def split_union(pattern_object):
"""Returns list of strings for each union pattern"""
return [x for x in pattern_object.pattern.split("|")]
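The union_regex=True variants above, and the consistency checks below, come down to folding every marked, compilable expression into a single alternation so only one Pattern has to be run per candidate. A minimal sketch of that idea (the real ExcludeList additionally keeps separate file and path unions):

import re

def compile_union_sketch(marked_patterns):
    compilable = []
    for pattern in marked_patterns:
        try:
            re.compile(pattern)
        except re.error:
            continue  # uncompilable patterns stay in the list but are never compiled
        compilable.append(pattern)
    return [re.compile("|".join(compilable))] if compilable else []

# compile_union_sketch([r"one", r"two"])[0].pattern == "one|two"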
class TestCaseCompiledList:
"""Test consistency between union or and separate versions."""
def setup_method(self, method):
self.e_separate = ExcludeList(union_regex=False)
self.e_separate.restore_defaults()
self.e_union = ExcludeList(union_regex=True)
self.e_union.restore_defaults()
def test_same_number_of_expressions(self):
# We only get one union Pattern item in a tuple, which is made of however many parts
eq_(len(split_union(self.e_union.compiled[0])), len(default_regexes))
# We get as many as there are marked items
eq_(len(self.e_separate.compiled), len(default_regexes))
exprs = split_union(self.e_union.compiled[0])
# We should have the same number and the same expressions
eq_(len(exprs), len(self.e_separate.compiled))
for expr in self.e_separate.compiled:
assert expr.pattern in exprs
def test_compiled_files(self):
# is path separator checked properly to yield the output
if ISWINDOWS:
regex1 = r"test\\one\\sub"
else:
regex1 = r"test/one/sub"
self.e_separate.add(regex1)
self.e_separate.mark(regex1)
self.e_union.add(regex1)
self.e_union.mark(regex1)
separate_compiled_dirs = self.e_separate.compiled
separate_compiled_files = [x for x in self.e_separate.compiled_files]
# HACK we need to call compiled property FIRST to generate the cache
union_compiled_dirs = self.e_union.compiled
# print(f"type: {type(self.e_union.compiled_files[0])}")
# A generator returning only one item... ugh
union_compiled_files = [x for x in self.e_union.compiled_files][0]
print(f"compiled files: {union_compiled_files}")
# Separate should give several plus the one added
eq_(len(separate_compiled_dirs), len(default_regexes) + 1)
# regex1 shouldn't be in the "files" version
eq_(len(separate_compiled_files), len(default_regexes))
# Only one Pattern returned, which when split should be however many + 1
eq_(len(split_union(union_compiled_dirs[0])), len(default_regexes) + 1)
# regex1 shouldn't be here either
eq_(len(split_union(union_compiled_files)), len(default_regexes))
class TestCaseCompiledDict(TestCaseCompiledList):
"""Test the dictionary version"""
def setup_method(self, method):
self.e_separate = ExcludeDict(union_regex=False)
self.e_separate.restore_defaults()
self.e_union = ExcludeDict(union_regex=True)
self.e_union.restore_defaults()

core/tests/fs_test.py
@@ -1,45 +1,113 @@
# Created By: Virgil Dupras # Created By: Virgil Dupras
# Created On: 2009-10-23 # Created On: 2009-10-23
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
import hashlib import typing
from os import urandom
from hscommon.path import Path from pathlib import Path
from hscommon.testutil import eq_ from hscommon.testutil import eq_
from core.tests.directories_test import create_fake_fs from core.tests.directories_test import create_fake_fs
from .. import fs from core import fs
hasher: typing.Callable
try:
import xxhash
hasher = xxhash.xxh128
except ImportError:
import hashlib
hasher = hashlib.md5
def create_fake_fs_with_random_data(rootpath):
rootpath = rootpath.joinpath("fs")
rootpath.mkdir()
rootpath.joinpath("dir1").mkdir()
rootpath.joinpath("dir2").mkdir()
rootpath.joinpath("dir3").mkdir()
data1 = urandom(200 * 1024) # 200KiB
data2 = urandom(1024 * 1024) # 1MiB
data3 = urandom(10 * 1024 * 1024) # 10MiB
with rootpath.joinpath("file1.test").open("wb") as fp:
fp.write(data1)
with rootpath.joinpath("file2.test").open("wb") as fp:
fp.write(data2)
with rootpath.joinpath("file3.test").open("wb") as fp:
fp.write(data3)
with rootpath.joinpath("dir1", "file1.test").open("wb") as fp:
fp.write(data1)
with rootpath.joinpath("dir2", "file2.test").open("wb") as fp:
fp.write(data2)
with rootpath.joinpath("dir3", "file3.test").open("wb") as fp:
fp.write(data3)
return rootpath
def test_size_aggregates_subfiles(tmpdir): def test_size_aggregates_subfiles(tmpdir):
p = create_fake_fs(Path(str(tmpdir))) p = create_fake_fs(Path(str(tmpdir)))
b = fs.Folder(p) b = fs.Folder(p)
eq_(b.size, 12) eq_(b.size, 12)
def test_md5_aggregate_subfiles_sorted(tmpdir):
#dir.allfiles can return child in any order. Thus, bundle.md5 must aggregate def test_digest_aggregate_subfiles_sorted(tmpdir):
#all files' md5 it contains, but it must make sure that it does so in the # dir.allfiles can return child in any order. Thus, bundle.digest must aggregate
#same order everytime. # all files' digests it contains, but it must make sure that it does so in the
p = create_fake_fs(Path(str(tmpdir))) # same order everytime.
p = create_fake_fs_with_random_data(Path(str(tmpdir)))
b = fs.Folder(p) b = fs.Folder(p)
md51 = fs.File(p['dir1']['file1.test']).md5 digest1 = fs.File(p.joinpath("dir1", "file1.test")).digest
md52 = fs.File(p['dir2']['file2.test']).md5 digest2 = fs.File(p.joinpath("dir2", "file2.test")).digest
md53 = fs.File(p['dir3']['file3.test']).md5 digest3 = fs.File(p.joinpath("dir3", "file3.test")).digest
md54 = fs.File(p['file1.test']).md5 digest4 = fs.File(p.joinpath("file1.test")).digest
md55 = fs.File(p['file2.test']).md5 digest5 = fs.File(p.joinpath("file2.test")).digest
md56 = fs.File(p['file3.test']).md5 digest6 = fs.File(p.joinpath("file3.test")).digest
# The expected md5 is the md5 of md5s for folders and the direct md5 for files # The expected digest is the hash of digests for folders and the direct digest for files
folder_md51 = hashlib.md5(md51).digest() folder_digest1 = hasher(digest1).digest()
folder_md52 = hashlib.md5(md52).digest() folder_digest2 = hasher(digest2).digest()
folder_md53 = hashlib.md5(md53).digest() folder_digest3 = hasher(digest3).digest()
md5 = hashlib.md5(folder_md51+folder_md52+folder_md53+md54+md55+md56) digest = hasher(folder_digest1 + folder_digest2 + folder_digest3 + digest4 + digest5 + digest6).digest()
eq_(b.md5, md5.digest()) eq_(b.digest, digest)
def test_partial_digest_aggregate_subfile_sorted(tmpdir):
p = create_fake_fs_with_random_data(Path(str(tmpdir)))
b = fs.Folder(p)
digest1 = fs.File(p.joinpath("dir1", "file1.test")).digest_partial
digest2 = fs.File(p.joinpath("dir2", "file2.test")).digest_partial
digest3 = fs.File(p.joinpath("dir3", "file3.test")).digest_partial
digest4 = fs.File(p.joinpath("file1.test")).digest_partial
digest5 = fs.File(p.joinpath("file2.test")).digest_partial
digest6 = fs.File(p.joinpath("file3.test")).digest_partial
# The expected digest is the hash of digests for folders and the direct digest for files
folder_digest1 = hasher(digest1).digest()
folder_digest2 = hasher(digest2).digest()
folder_digest3 = hasher(digest3).digest()
digest = hasher(folder_digest1 + folder_digest2 + folder_digest3 + digest4 + digest5 + digest6).digest()
eq_(b.digest_partial, digest)
digest1 = fs.File(p.joinpath("dir1", "file1.test")).digest_samples
digest2 = fs.File(p.joinpath("dir2", "file2.test")).digest_samples
digest3 = fs.File(p.joinpath("dir3", "file3.test")).digest_samples
digest4 = fs.File(p.joinpath("file1.test")).digest_samples
digest5 = fs.File(p.joinpath("file2.test")).digest_samples
digest6 = fs.File(p.joinpath("file3.test")).digest_samples
# The expected digest is the digest of digests for folders and the direct digest for files
folder_digest1 = hasher(digest1).digest()
folder_digest2 = hasher(digest2).digest()
folder_digest3 = hasher(digest3).digest()
digest = hasher(folder_digest1 + folder_digest2 + folder_digest3 + digest4 + digest5 + digest6).digest()
eq_(b.digest_samples, digest)
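Both aggregation tests encode the same rule: a folder's digest is the hash of its children's digests taken in a deterministic order, with each sub-folder's digest re-hashed once before concatenation. A sketch of that rule, reusing the same xxhash/md5 fallback the test module sets up (illustrative only):

try:
    import xxhash
    _hasher = xxhash.xxh128
except ImportError:
    import hashlib
    _hasher = hashlib.md5

def folder_digest_sketch(subfolder_digests, file_digests):
    # `subfolder_digests` and `file_digests` are assumed to be given in a
    # stable (sorted) child order, as the assertions above rely on.
    parts = [_hasher(d).digest() for d in subfolder_digests] + list(file_digests)
    return _hasher(b"".join(parts)).digest()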
def test_has_file_attrs(tmpdir): def test_has_file_attrs(tmpdir):
#a Folder must behave like a file, so it must have mtime attributes # a Folder must behave like a file, so it must have mtime attributes
b = fs.Folder(Path(str(tmpdir))) b = fs.Folder(Path(str(tmpdir)))
assert b.mtime > 0 assert b.mtime > 0
eq_(b.extension, '') eq_(b.extension, "")

core/tests/ignore_test.py
@@ -10,81 +10,89 @@ from xml.etree import ElementTree as ET
from pytest import raises from pytest import raises
from hscommon.testutil import eq_ from hscommon.testutil import eq_
from ..ignore import IgnoreList from core.ignore import IgnoreList
def test_empty(): def test_empty():
il = IgnoreList() il = IgnoreList()
eq_(0, len(il)) eq_(0, len(il))
assert not il.AreIgnored('foo', 'bar') assert not il.are_ignored("foo", "bar")
def test_simple(): def test_simple():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
assert il.AreIgnored('foo', 'bar') assert il.are_ignored("foo", "bar")
assert il.AreIgnored('bar', 'foo') assert il.are_ignored("bar", "foo")
assert not il.AreIgnored('foo', 'bleh') assert not il.are_ignored("foo", "bleh")
assert not il.AreIgnored('bleh', 'bar') assert not il.are_ignored("bleh", "bar")
eq_(1, len(il)) eq_(1, len(il))
def test_multiple(): def test_multiple():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
il.Ignore('foo', 'bleh') il.ignore("foo", "bleh")
il.Ignore('bleh', 'bar') il.ignore("bleh", "bar")
il.Ignore('aybabtu', 'bleh') il.ignore("aybabtu", "bleh")
assert il.AreIgnored('foo', 'bar') assert il.are_ignored("foo", "bar")
assert il.AreIgnored('bar', 'foo') assert il.are_ignored("bar", "foo")
assert il.AreIgnored('foo', 'bleh') assert il.are_ignored("foo", "bleh")
assert il.AreIgnored('bleh', 'bar') assert il.are_ignored("bleh", "bar")
assert not il.AreIgnored('aybabtu', 'bar') assert not il.are_ignored("aybabtu", "bar")
eq_(4, len(il)) eq_(4, len(il))
def test_clear(): def test_clear():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
il.Clear() il.clear()
assert not il.AreIgnored('foo', 'bar') assert not il.are_ignored("foo", "bar")
assert not il.AreIgnored('bar', 'foo') assert not il.are_ignored("bar", "foo")
eq_(0, len(il)) eq_(0, len(il))
def test_add_same_twice(): def test_add_same_twice():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
il.Ignore('bar', 'foo') il.ignore("bar", "foo")
eq_(1, len(il)) eq_(1, len(il))
def test_save_to_xml(): def test_save_to_xml():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
il.Ignore('foo', 'bleh') il.ignore("foo", "bleh")
il.Ignore('bleh', 'bar') il.ignore("bleh", "bar")
f = io.BytesIO() f = io.BytesIO()
il.save_to_xml(f) il.save_to_xml(f)
f.seek(0) f.seek(0)
doc = ET.parse(f) doc = ET.parse(f)
root = doc.getroot() root = doc.getroot()
eq_(root.tag, 'ignore_list') eq_(root.tag, "ignore_list")
eq_(len(root), 2) eq_(len(root), 2)
eq_(len([c for c in root if c.tag == 'file']), 2) eq_(len([c for c in root if c.tag == "file"]), 2)
f1, f2 = root[:] f1, f2 = root[:]
subchildren = [c for c in f1 if c.tag == 'file'] + [c for c in f2 if c.tag == 'file'] subchildren = [c for c in f1 if c.tag == "file"] + [c for c in f2 if c.tag == "file"]
eq_(len(subchildren), 3) eq_(len(subchildren), 3)
def test_SaveThenLoad():
def test_save_then_load():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
il.Ignore('foo', 'bleh') il.ignore("foo", "bleh")
il.Ignore('bleh', 'bar') il.ignore("bleh", "bar")
il.Ignore('\u00e9', 'bar') il.ignore("\u00e9", "bar")
f = io.BytesIO() f = io.BytesIO()
il.save_to_xml(f) il.save_to_xml(f)
f.seek(0) f.seek(0)
il = IgnoreList() il = IgnoreList()
il.load_from_xml(f) il.load_from_xml(f)
eq_(4, len(il)) eq_(4, len(il))
assert il.AreIgnored('\u00e9', 'bar') assert il.are_ignored("\u00e9", "bar")
def test_LoadXML_with_empty_file_tags():
def test_load_xml_with_empty_file_tags():
f = io.BytesIO() f = io.BytesIO()
f.write(b'<?xml version="1.0" encoding="utf-8"?><ignore_list><file><file/></file></ignore_list>') f.write(b'<?xml version="1.0" encoding="utf-8"?><ignore_list><file><file/></file></ignore_list>')
f.seek(0) f.seek(0)
@@ -92,72 +100,80 @@ def test_LoadXML_with_empty_file_tags():
il.load_from_xml(f) il.load_from_xml(f)
eq_(0, len(il)) eq_(0, len(il))
def test_AreIgnore_works_when_a_child_is_a_key_somewhere_else():
def test_are_ignore_works_when_a_child_is_a_key_somewhere_else():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
il.Ignore('bar', 'baz') il.ignore("bar", "baz")
assert il.AreIgnored('bar', 'foo') assert il.are_ignored("bar", "foo")
def test_no_dupes_when_a_child_is_a_key_somewhere_else(): def test_no_dupes_when_a_child_is_a_key_somewhere_else():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
il.Ignore('bar', 'baz') il.ignore("bar", "baz")
il.Ignore('bar', 'foo') il.ignore("bar", "foo")
eq_(2, len(il)) eq_(2, len(il))
def test_iterate(): def test_iterate():
#It must be possible to iterate through ignore list # It must be possible to iterate through ignore list
il = IgnoreList() il = IgnoreList()
expected = [('foo', 'bar'), ('bar', 'baz'), ('foo', 'baz')] expected = [("foo", "bar"), ("bar", "baz"), ("foo", "baz")]
for i in expected: for i in expected:
il.Ignore(i[0], i[1]) il.ignore(i[0], i[1])
for i in il: for i in il:
expected.remove(i) #No exception should be raised expected.remove(i) # No exception should be raised
assert not expected #expected should be empty assert not expected # expected should be empty
def test_filter(): def test_filter():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
il.Ignore('bar', 'baz') il.ignore("bar", "baz")
il.Ignore('foo', 'baz') il.ignore("foo", "baz")
il.Filter(lambda f, s: f == 'bar') il.filter(lambda f, s: f == "bar")
eq_(1, len(il)) eq_(1, len(il))
assert not il.AreIgnored('foo', 'bar') assert not il.are_ignored("foo", "bar")
assert il.AreIgnored('bar', 'baz') assert il.are_ignored("bar", "baz")
def test_save_with_non_ascii_items(): def test_save_with_non_ascii_items():
il = IgnoreList() il = IgnoreList()
il.Ignore('\xac', '\xbf') il.ignore("\xac", "\xbf")
f = io.BytesIO() f = io.BytesIO()
try: try:
il.save_to_xml(f) il.save_to_xml(f)
except Exception as e: except Exception as e:
raise AssertionError(str(e)) raise AssertionError(str(e))
def test_len(): def test_len():
il = IgnoreList() il = IgnoreList()
eq_(0, len(il)) eq_(0, len(il))
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
eq_(1, len(il)) eq_(1, len(il))
def test_nonzero(): def test_nonzero():
il = IgnoreList() il = IgnoreList()
assert not il assert not il
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
assert il assert il
def test_remove(): def test_remove():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
il.Ignore('foo', 'baz') il.ignore("foo", "baz")
il.remove('bar', 'foo') il.remove("bar", "foo")
eq_(len(il), 1) eq_(len(il), 1)
assert not il.AreIgnored('foo', 'bar') assert not il.are_ignored("foo", "bar")
def test_remove_non_existant(): def test_remove_non_existant():
il = IgnoreList() il = IgnoreList()
il.Ignore('foo', 'bar') il.ignore("foo", "bar")
il.Ignore('foo', 'baz') il.ignore("foo", "baz")
with raises(ValueError): with raises(ValueError):
il.remove('foo', 'bleh') il.remove("foo", "bleh")
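The renamed API exercised above (ignore/are_ignored/clear/filter) treats every ignored pair as unordered and deduplicated. A tiny sketch of such a symmetric store (an illustration, not core.ignore.IgnoreList, which also handles XML persistence):

class IgnorePairsSketch:
    def __init__(self):
        self._pairs = set()

    def ignore(self, first, second):
        self._pairs.add(frozenset((first, second)))  # order does not matter

    def are_ignored(self, first, second):
        return frozenset((first, second)) in self._pairs

    def __len__(self):
        return len(self._pairs)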

core/tests/markable_test.py
@@ -6,35 +6,41 @@
from hscommon.testutil import eq_ from hscommon.testutil import eq_
from ..markable import MarkableList, Markable from core.markable import MarkableList, Markable
def gen(): def gen():
ml = MarkableList() ml = MarkableList()
ml.extend(list(range(10))) ml.extend(list(range(10)))
return ml return ml
def test_unmarked(): def test_unmarked():
ml = gen() ml = gen()
for i in ml: for i in ml:
assert not ml.is_marked(i) assert not ml.is_marked(i)
def test_mark(): def test_mark():
ml = gen() ml = gen()
assert ml.mark(3) assert ml.mark(3)
assert ml.is_marked(3) assert ml.is_marked(3)
assert not ml.is_marked(2) assert not ml.is_marked(2)
def test_unmark(): def test_unmark():
ml = gen() ml = gen()
ml.mark(4) ml.mark(4)
assert ml.unmark(4) assert ml.unmark(4)
assert not ml.is_marked(4) assert not ml.is_marked(4)
def test_unmark_unmarked(): def test_unmark_unmarked():
ml = gen() ml = gen()
assert not ml.unmark(4) assert not ml.unmark(4)
assert not ml.is_marked(4) assert not ml.is_marked(4)
def test_mark_twice_and_unmark(): def test_mark_twice_and_unmark():
ml = gen() ml = gen()
assert ml.mark(5) assert ml.mark(5)
@@ -42,6 +48,7 @@ def test_mark_twice_and_unmark():
ml.unmark(5) ml.unmark(5)
assert not ml.is_marked(5) assert not ml.is_marked(5)
def test_mark_toggle(): def test_mark_toggle():
ml = gen() ml = gen()
ml.mark_toggle(6) ml.mark_toggle(6)
@@ -51,22 +58,25 @@ def test_mark_toggle():
ml.mark_toggle(6) ml.mark_toggle(6)
assert ml.is_marked(6) assert ml.is_marked(6)
def test_is_markable(): def test_is_markable():
class Foobar(Markable): class Foobar(Markable):
def _is_markable(self, o): def _is_markable(self, o):
return o == 'foobar' return o == "foobar"
f = Foobar() f = Foobar()
assert not f.is_marked('foobar') assert not f.is_marked("foobar")
assert not f.mark('foo') assert not f.mark("foo")
assert not f.is_marked('foo') assert not f.is_marked("foo")
f.mark_toggle('foo') f.mark_toggle("foo")
assert not f.is_marked('foo') assert not f.is_marked("foo")
f.mark('foobar') f.mark("foobar")
assert f.is_marked('foobar') assert f.is_marked("foobar")
ml = gen() ml = gen()
ml.mark(11) ml.mark(11)
assert not ml.is_marked(11) assert not ml.is_marked(11)
def test_change_notifications(): def test_change_notifications():
class Foobar(Markable): class Foobar(Markable):
def _did_mark(self, o): def _did_mark(self, o):
@@ -77,13 +87,14 @@ def test_change_notifications():
f = Foobar() f = Foobar()
f.log = [] f.log = []
f.mark('foo') f.mark("foo")
f.mark('foo') f.mark("foo")
f.mark_toggle('bar') f.mark_toggle("bar")
f.unmark('foo') f.unmark("foo")
f.unmark('foo') f.unmark("foo")
f.mark_toggle('bar') f.mark_toggle("bar")
eq_([(True, 'foo'), (True, 'bar'), (False, 'foo'), (False, 'bar')], f.log) eq_([(True, "foo"), (True, "bar"), (False, "foo"), (False, "bar")], f.log)
def test_mark_count(): def test_mark_count():
ml = gen() ml = gen()
@@ -93,6 +104,7 @@ def test_mark_count():
ml.mark(11) ml.mark(11)
eq_(1, ml.mark_count) eq_(1, ml.mark_count)
def test_mark_none(): def test_mark_none():
log = [] log = []
ml = gen() ml = gen()
@@ -104,6 +116,7 @@ def test_mark_none():
eq_(0, ml.mark_count) eq_(0, ml.mark_count)
eq_([1, 2], log) eq_([1, 2], log)
def test_mark_all(): def test_mark_all():
ml = gen() ml = gen()
eq_(0, ml.mark_count) eq_(0, ml.mark_count)
@@ -111,6 +124,7 @@ def test_mark_all():
eq_(10, ml.mark_count) eq_(10, ml.mark_count)
assert ml.is_marked(1) assert ml.is_marked(1)
def test_mark_invert(): def test_mark_invert():
ml = gen() ml = gen()
ml.mark(1) ml.mark(1)
@@ -118,6 +132,7 @@ def test_mark_invert():
assert not ml.is_marked(1) assert not ml.is_marked(1)
assert ml.is_marked(2) assert ml.is_marked(2)
def test_mark_while_inverted(): def test_mark_while_inverted():
log = [] log = []
ml = gen() ml = gen()
@@ -134,6 +149,7 @@ def test_mark_while_inverted():
eq_(7, ml.mark_count) eq_(7, ml.mark_count)
eq_([(True, 1), (False, 1), (True, 2), (True, 1), (True, 3)], log) eq_([(True, 1), (False, 1), (True, 2), (True, 1), (True, 3)], log)
def test_remove_mark_flag(): def test_remove_mark_flag():
ml = gen() ml = gen()
ml.mark(1) ml.mark(1)
@@ -145,10 +161,12 @@ def test_remove_mark_flag():
ml._remove_mark_flag(1) ml._remove_mark_flag(1)
assert ml.is_marked(1) assert ml.is_marked(1)
def test_is_marked_returns_false_if_object_not_markable(): def test_is_marked_returns_false_if_object_not_markable():
class MyMarkableList(MarkableList): class MyMarkableList(MarkableList):
def _is_markable(self, o): def _is_markable(self, o):
return o != 4 return o != 4
ml = MyMarkableList() ml = MyMarkableList()
ml.extend(list(range(10))) ml.extend(list(range(10)))
ml.mark_invert() ml.mark_invert()


@@ -1,19 +1,20 @@
# Created By: Virgil Dupras # Created By: Virgil Dupras
# Created On: 2011/09/07 # Created On: 2011/09/07
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
import os.path as op import os.path as op
from itertools import combinations from itertools import combinations
from .base import TestApp, NamedObject, with_app, eq_ from core.tests.base import TestApp, NamedObject, with_app, eq_
from ..engine import Group, Match from core.engine import Group, Match
no = NamedObject no = NamedObject
def app_with_dupes(dupes): def app_with_dupes(dupes):
# Creates an app with specified dupes. dupes is a list of lists, each list in the list being # Creates an app with specified dupes. dupes is a list of lists, each list in the list being
# a dupe group. We cheat a little bit by creating dupe groups manually instead of running a # a dupe group. We cheat a little bit by creating dupe groups manually instead of running a
@@ -29,57 +30,63 @@ def app_with_dupes(dupes):
app.app._results_changed() app.app._results_changed()
return app return app
#---
# ---
def app_normal_results(): def app_normal_results():
# Just some results, with different extensions and size, for good measure. # Just some results, with different extensions and size, for good measure.
dupes = [ dupes = [
[ [
no('foo1.ext1', size=1, folder='folder1'), no("foo1.ext1", size=1, folder="folder1"),
no('foo2.ext2', size=2, folder='folder2') no("foo2.ext2", size=2, folder="folder2"),
], ],
] ]
return app_with_dupes(dupes) return app_with_dupes(dupes)
@with_app(app_normal_results) @with_app(app_normal_results)
def test_kind_subcrit(app): def test_kind_subcrit(app):
# The subcriteria of the "Kind" criteria is a list of extensions contained in the dupes. # The subcriteria of the "Kind" criteria is a list of extensions contained in the dupes.
app.select_pri_criterion("Kind") app.select_pri_criterion("Kind")
eq_(app.pdialog.criteria_list[:], ['ext1', 'ext2']) eq_(app.pdialog.criteria_list[:], ["ext1", "ext2"])
@with_app(app_normal_results) @with_app(app_normal_results)
def test_kind_reprioritization(app): def test_kind_reprioritization(app):
# Just a simple test of the system as a whole. # Just a simple test of the system as a whole.
# select a criterion, and perform re-prioritization and see if it worked. # select a criterion, and perform re-prioritization and see if it worked.
app.select_pri_criterion("Kind") app.select_pri_criterion("Kind")
app.pdialog.criteria_list.select([1]) # ext2 app.pdialog.criteria_list.select([1]) # ext2
app.pdialog.add_selected() app.pdialog.add_selected()
app.pdialog.perform_reprioritization() app.pdialog.perform_reprioritization()
eq_(app.rtable[0].data['name'], 'foo2.ext2') eq_(app.rtable[0].data["name"], "foo2.ext2")
@with_app(app_normal_results) @with_app(app_normal_results)
def test_folder_subcrit(app): def test_folder_subcrit(app):
app.select_pri_criterion("Folder") app.select_pri_criterion("Folder")
eq_(app.pdialog.criteria_list[:], ['folder1', 'folder2']) eq_(app.pdialog.criteria_list[:], ["folder1", "folder2"])
@with_app(app_normal_results) @with_app(app_normal_results)
def test_folder_reprioritization(app): def test_folder_reprioritization(app):
app.select_pri_criterion("Folder") app.select_pri_criterion("Folder")
app.pdialog.criteria_list.select([1]) # folder2 app.pdialog.criteria_list.select([1]) # folder2
app.pdialog.add_selected() app.pdialog.add_selected()
app.pdialog.perform_reprioritization() app.pdialog.perform_reprioritization()
eq_(app.rtable[0].data['name'], 'foo2.ext2') eq_(app.rtable[0].data["name"], "foo2.ext2")
@with_app(app_normal_results) @with_app(app_normal_results)
def test_prilist_display(app): def test_prilist_display(app):
# The prioritization list displays selected criteria correctly. # The prioritization list displays selected criteria correctly.
app.select_pri_criterion("Kind") app.select_pri_criterion("Kind")
app.pdialog.criteria_list.select([1]) # ext2 app.pdialog.criteria_list.select([1]) # ext2
app.pdialog.add_selected() app.pdialog.add_selected()
app.select_pri_criterion("Folder") app.select_pri_criterion("Folder")
app.pdialog.criteria_list.select([1]) # folder2 app.pdialog.criteria_list.select([1]) # folder2
app.pdialog.add_selected() app.pdialog.add_selected()
app.select_pri_criterion("Size") app.select_pri_criterion("Size")
app.pdialog.criteria_list.select([1]) # Lowest app.pdialog.criteria_list.select([1]) # Lowest
app.pdialog.add_selected() app.pdialog.add_selected()
expected = [ expected = [
"Kind (ext2)", "Kind (ext2)",
@@ -88,23 +95,26 @@ def test_prilist_display(app):
] ]
eq_(app.pdialog.prioritization_list[:], expected) eq_(app.pdialog.prioritization_list[:], expected)
@with_app(app_normal_results) @with_app(app_normal_results)
def test_size_subcrit(app): def test_size_subcrit(app):
app.select_pri_criterion("Size") app.select_pri_criterion("Size")
eq_(app.pdialog.criteria_list[:], ['Highest', 'Lowest']) eq_(app.pdialog.criteria_list[:], ["Highest", "Lowest"])
@with_app(app_normal_results) @with_app(app_normal_results)
def test_size_reprioritization(app): def test_size_reprioritization(app):
app.select_pri_criterion("Size") app.select_pri_criterion("Size")
app.pdialog.criteria_list.select([0]) # highest app.pdialog.criteria_list.select([0]) # highest
app.pdialog.add_selected() app.pdialog.add_selected()
app.pdialog.perform_reprioritization() app.pdialog.perform_reprioritization()
eq_(app.rtable[0].data['name'], 'foo2.ext2') eq_(app.rtable[0].data["name"], "foo2.ext2")
@with_app(app_normal_results) @with_app(app_normal_results)
def test_reorder_prioritizations(app): def test_reorder_prioritizations(app):
app.add_pri_criterion("Kind", 0) # ext1 app.add_pri_criterion("Kind", 0) # ext1
app.add_pri_criterion("Kind", 1) # ext2 app.add_pri_criterion("Kind", 1) # ext2
app.pdialog.prioritization_list.move_indexes([1], 0) app.pdialog.prioritization_list.move_indexes([1], 0)
expected = [ expected = [
"Kind (ext2)", "Kind (ext2)",
@@ -112,6 +122,7 @@ def test_reorder_prioritizations(app):
] ]
eq_(app.pdialog.prioritization_list[:], expected) eq_(app.pdialog.prioritization_list[:], expected)
@with_app(app_normal_results) @with_app(app_normal_results)
def test_remove_crit_from_list(app): def test_remove_crit_from_list(app):
app.add_pri_criterion("Kind", 0) app.add_pri_criterion("Kind", 0)
@@ -123,75 +134,72 @@ def test_remove_crit_from_list(app):
] ]
eq_(app.pdialog.prioritization_list[:], expected) eq_(app.pdialog.prioritization_list[:], expected)
@with_app(app_normal_results) @with_app(app_normal_results)
def test_add_crit_without_selection(app): def test_add_crit_without_selection(app):
# Adding a criterion without having made a selection doesn't cause a crash. # Adding a criterion without having made a selection doesn't cause a crash.
app.pdialog.add_selected() # no crash app.pdialog.add_selected() # no crash
#---
# ---
def app_one_name_ends_with_number(): def app_one_name_ends_with_number():
dupes = [ dupes = [
[ [no("foo.ext"), no("foo1.ext")],
no('foo.ext'),
no('foo1.ext'),
],
] ]
return app_with_dupes(dupes) return app_with_dupes(dupes)
@with_app(app_one_name_ends_with_number) @with_app(app_one_name_ends_with_number)
def test_filename_reprioritization(app): def test_filename_reprioritization(app):
app.add_pri_criterion("Filename", 0) # Ends with a number app.add_pri_criterion("Filename", 0) # Ends with a number
app.pdialog.perform_reprioritization() app.pdialog.perform_reprioritization()
eq_(app.rtable[0].data['name'], 'foo1.ext') eq_(app.rtable[0].data["name"], "foo1.ext")
#---
# ---
def app_with_subfolders(): def app_with_subfolders():
dupes = [ dupes = [
[ [no("foo1", folder="baz"), no("foo2", folder="foo/bar")],
no('foo1', folder='baz'), [no("foo3", folder="baz"), no("foo4", folder="foo")],
no('foo2', folder='foo/bar'),
],
[
no('foo3', folder='baz'),
no('foo4', folder='foo'),
],
] ]
return app_with_dupes(dupes) return app_with_dupes(dupes)
@with_app(app_with_subfolders) @with_app(app_with_subfolders)
def test_folder_crit_is_sorted(app): def test_folder_crit_is_sorted(app):
# Folder subcriteria are sorted. # Folder subcriteria are sorted.
app.select_pri_criterion("Folder") app.select_pri_criterion("Folder")
eq_(app.pdialog.criteria_list[:], ['baz', 'foo', op.join('foo', 'bar')]) eq_(app.pdialog.criteria_list[:], ["baz", "foo", op.join("foo", "bar")])
@with_app(app_with_subfolders) @with_app(app_with_subfolders)
def test_folder_crit_includes_subfolders(app): def test_folder_crit_includes_subfolders(app):
# When selecting a folder crit, dupes in a subfolder are also considered as affected by that # When selecting a folder crit, dupes in a subfolder are also considered as affected by that
# crit. # crit.
app.add_pri_criterion("Folder", 1) # foo app.add_pri_criterion("Folder", 1) # foo
app.pdialog.perform_reprioritization() app.pdialog.perform_reprioritization()
# Both foo and foo/bar dupes will be prioritized # Both foo and foo/bar dupes will be prioritized
eq_(app.rtable[0].data['name'], 'foo2') eq_(app.rtable[0].data["name"], "foo2")
eq_(app.rtable[2].data['name'], 'foo4') eq_(app.rtable[2].data["name"], "foo4")
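A note on what the folder-criterion tests above pin down: when a folder is chosen as a prioritization criterion, dupes located in any subfolder of that folder count as affected by it too, which is why both foo2 (in foo/bar) and foo4 (in foo) end up prioritized. The sketch below only illustrates that idea with a simple path-prefix check; folder_criterion_matches is a hypothetical helper, not part of dupeGuru's code.

import os.path as op

def folder_criterion_matches(criterion_folder, dupe_folder):
    # A dupe is affected by a folder criterion when it sits in that folder
    # or anywhere below it, e.g. "foo" covers both "foo" and "foo/bar".
    criterion_folder = op.normpath(criterion_folder)
    dupe_folder = op.normpath(dupe_folder)
    return dupe_folder == criterion_folder or dupe_folder.startswith(criterion_folder + op.sep)

assert folder_criterion_matches("foo", op.join("foo", "bar"))
assert folder_criterion_matches("foo", "foo")
assert not folder_criterion_matches("foo", "baz")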
@with_app(app_with_subfolders) @with_app(app_with_subfolders)
def test_display_something_on_empty_extensions(app): def test_display_something_on_empty_extensions(app):
# When there's no extension, display "None" instead of nothing at all. # When there's no extension, display "None" instead of nothing at all.
app.select_pri_criterion("Kind") app.select_pri_criterion("Kind")
eq_(app.pdialog.criteria_list[:], ['None']) eq_(app.pdialog.criteria_list[:], ["None"])
#---
# ---
def app_one_name_longer_than_the_other(): def app_one_name_longer_than_the_other():
dupes = [ dupes = [
[ [no("shortest.ext"), no("loooongest.ext")],
no('shortest.ext'),
no('loooongest.ext'),
],
] ]
return app_with_dupes(dupes) return app_with_dupes(dupes)
@with_app(app_one_name_longer_than_the_other) @with_app(app_one_name_longer_than_the_other)
def test_longest_filename_prioritization(app): def test_longest_filename_prioritization(app):
app.add_pri_criterion("Filename", 2) # Longest app.add_pri_criterion("Filename", 2) # Longest
app.pdialog.perform_reprioritization() app.pdialog.perform_reprioritization()
eq_(app.rtable[0].data['name'], 'loooongest.ext') eq_(app.rtable[0].data["name"], "loooongest.ext")


@@ -1,12 +1,13 @@
# Created By: Virgil Dupras # Created By: Virgil Dupras
# Created On: 2013-07-28 # Created On: 2013-07-28
# Copyright 2015 Hardcoded Software (http://www.hardcoded.net) # Copyright 2015 Hardcoded Software (http://www.hardcoded.net)
# #
# This software is licensed under the "GPLv3" License as described in the "LICENSE" file, # This software is licensed under the "GPLv3" License as described in the "LICENSE" file,
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
from .base import TestApp, GetTestGroups from core.tests.base import TestApp, GetTestGroups
def app_with_results(): def app_with_results():
app = TestApp() app = TestApp()
@@ -15,23 +16,26 @@ def app_with_results():
app.rtable.refresh() app.rtable.refresh()
return app return app
def test_delta_flags_delta_mode_off(): def test_delta_flags_delta_mode_off():
app = app_with_results() app = app_with_results()
# When the delta mode is off, we never have delta values flags # When the delta mode is off, we never have delta values flags
app.rtable.delta_values = False app.rtable.delta_values = False
# Ref file, always false anyway # Ref file, always false anyway
assert not app.rtable[0].is_cell_delta('size') assert not app.rtable[0].is_cell_delta("size")
# False because delta mode is off # False because delta mode is off
assert not app.rtable[1].is_cell_delta('size') assert not app.rtable[1].is_cell_delta("size")
def test_delta_flags_delta_mode_on_delta_columns(): def test_delta_flags_delta_mode_on_delta_columns():
# When the delta mode is on, delta columns always have a delta flag, except for ref rows # When the delta mode is on, delta columns always have a delta flag, except for ref rows
app = app_with_results() app = app_with_results()
app.rtable.delta_values = True app.rtable.delta_values = True
# Ref file, always false anyway # Ref file, always false anyway
assert not app.rtable[0].is_cell_delta('size') assert not app.rtable[0].is_cell_delta("size")
# But for a dupe, the flag is on # But for a dupe, the flag is on
assert app.rtable[1].is_cell_delta('size') assert app.rtable[1].is_cell_delta("size")
def test_delta_flags_delta_mode_on_non_delta_columns(): def test_delta_flags_delta_mode_on_non_delta_columns():
# When the delta mode is on, non-delta columns have a delta flag if their value differs from # When the delta mode is on, non-delta columns have a delta flag if their value differs from
@@ -39,11 +43,12 @@ def test_delta_flags_delta_mode_on_non_delta_columns():
app = app_with_results() app = app_with_results()
app.rtable.delta_values = True app.rtable.delta_values = True
# "bar bleh" != "foo bar", flag on # "bar bleh" != "foo bar", flag on
assert app.rtable[1].is_cell_delta('name') assert app.rtable[1].is_cell_delta("name")
# "ibabtu" row, but it's a ref, flag off # "ibabtu" row, but it's a ref, flag off
assert not app.rtable[3].is_cell_delta('name') assert not app.rtable[3].is_cell_delta("name")
# "ibabtu" == "ibabtu", flag off # "ibabtu" == "ibabtu", flag off
assert not app.rtable[4].is_cell_delta('name') assert not app.rtable[4].is_cell_delta("name")
def test_delta_flags_delta_mode_on_non_delta_columns_case_insensitive(): def test_delta_flags_delta_mode_on_non_delta_columns_case_insensitive():
# Comparison that occurs for non-numeric columns to check whether they're delta is case # Comparison that occurs for non-numeric columns to check whether they're delta is case
@@ -53,4 +58,4 @@ def test_delta_flags_delta_mode_on_non_delta_columns_case_insensitive():
app.app.results.groups[1].dupes[0].name = "IBaBTU" app.app.results.groups[1].dupes[0].name = "IBaBTU"
app.rtable.delta_values = True app.rtable.delta_values = True
# "ibAbtu" == "IBaBTU", flag off # "ibAbtu" == "IBaBTU", flag off
assert not app.rtable[4].is_cell_delta('name') assert not app.rtable[4].is_cell_delta("name")
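Summarizing the behaviour these delta-flag tests describe: with delta mode off, or on a reference row, a cell never carries a delta flag; with delta mode on, numeric delta columns are always flagged for dupe rows, and other columns are flagged only when the dupe's value differs from the ref's, compared case-insensitively for strings. The snippet below is a rough sketch of that rule under those assumptions, not the actual result-table implementation; is_cell_delta_sketch and DELTA_COLUMNS are invented for illustration.

DELTA_COLUMNS = {"size", "mtime"}  # assumption: the numeric columns shown as deltas

def is_cell_delta_sketch(column, dupe_value, ref_value, is_ref_row, delta_mode):
    # Ref rows, and delta mode being off, never produce delta flags.
    if is_ref_row or not delta_mode:
        return False
    # Numeric delta columns are always flagged for dupe rows.
    if column in DELTA_COLUMNS:
        return True
    # Other columns: flag only when the value differs from the ref, case-insensitively.
    return str(dupe_value).lower() != str(ref_value).lower()

assert is_cell_delta_sketch("size", 2, 1, False, True)
assert not is_cell_delta_sketch("name", "IBaBTU", "ibAbtu", False, True)
assert is_cell_delta_sketch("name", "bar bleh", "foo bar", False, True)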


@@ -12,10 +12,10 @@ from xml.etree import ElementTree as ET
from pytest import raises from pytest import raises
from hscommon.testutil import eq_ from hscommon.testutil import eq_
from hscommon.util import first from hscommon.util import first
from core import engine
from core.tests.base import NamedObject, GetTestGroups, DupeGuru
from core.results import Results
from .. import engine
from .base import NamedObject, GetTestGroups, DupeGuru
from ..results import Results
class TestCaseResultsEmpty: class TestCaseResultsEmpty:
def setup_method(self, method): def setup_method(self, method):
@@ -24,8 +24,8 @@ class TestCaseResultsEmpty:
def test_apply_invalid_filter(self): def test_apply_invalid_filter(self):
# If the applied filter is an invalid regexp, just ignore the filter. # If the applied filter is an invalid regexp, just ignore the filter.
self.results.apply_filter('[') # invalid self.results.apply_filter("[") # invalid
self.test_stat_line() # make sure that the stats line isn't saying we applied a '[' filter self.test_stat_line() # make sure that the stats line isn't saying we applied a '[' filter
def test_stat_line(self): def test_stat_line(self):
eq_("0 / 0 (0.00 B / 0.00 B) duplicates marked.", self.results.stat_line) eq_("0 / 0 (0.00 B / 0.00 B) duplicates marked.", self.results.stat_line)
@@ -34,7 +34,7 @@ class TestCaseResultsEmpty:
eq_(0, len(self.results.groups)) eq_(0, len(self.results.groups))
def test_get_group_of_duplicate(self): def test_get_group_of_duplicate(self):
assert self.results.get_group_of_duplicate('foo') is None assert self.results.get_group_of_duplicate("foo") is None
def test_save_to_xml(self): def test_save_to_xml(self):
f = io.BytesIO() f = io.BytesIO()
@@ -42,7 +42,7 @@ class TestCaseResultsEmpty:
f.seek(0) f.seek(0)
doc = ET.parse(f) doc = ET.parse(f)
root = doc.getroot() root = doc.getroot()
eq_('results', root.tag) eq_("results", root.tag)
def test_is_modified(self): def test_is_modified(self):
assert not self.results.is_modified assert not self.results.is_modified
@@ -59,10 +59,10 @@ class TestCaseResultsEmpty:
# would have been some kind of feedback to the user, but the work involved for something # would have been some kind of feedback to the user, but the work involved for something
# that simply never happens (I never received a report of this crash, I experienced it # that simply never happens (I never received a report of this crash, I experienced it
# while fooling around) is too much. Instead, use standard name conflict resolution. # while fooling around) is too much. Instead, use standard name conflict resolution.
folderpath = tmpdir.join('foo') folderpath = tmpdir.join("foo")
folderpath.mkdir() folderpath.mkdir()
self.results.save_to_xml(str(folderpath)) # no crash self.results.save_to_xml(str(folderpath)) # no crash
assert tmpdir.join('[000] foo').check() assert tmpdir.join("[000] foo").check()
class TestCaseResultsWithSomeGroups: class TestCaseResultsWithSomeGroups:
@@ -116,18 +116,18 @@ class TestCaseResultsWithSomeGroups:
assert d is g.ref assert d is g.ref
def test_sort_groups(self): def test_sort_groups(self):
self.results.make_ref(self.objects[1]) #We want to make the 1024 sized object to go ref. self.results.make_ref(self.objects[1]) # We want to make the 1024 sized object to go ref.
g1, g2 = self.groups g1, g2 = self.groups
self.results.sort_groups('size') self.results.sort_groups("size")
assert self.results.groups[0] is g2 assert self.results.groups[0] is g2
assert self.results.groups[1] is g1 assert self.results.groups[1] is g1
self.results.sort_groups('size', False) self.results.sort_groups("size", False)
assert self.results.groups[0] is g1 assert self.results.groups[0] is g1
assert self.results.groups[1] is g2 assert self.results.groups[1] is g2
def test_set_groups_when_sorted(self): def test_set_groups_when_sorted(self):
self.results.make_ref(self.objects[1]) #We want to make the 1024 sized object to go ref. self.results.make_ref(self.objects[1]) # We want to make the 1024 sized object to go ref.
self.results.sort_groups('size') self.results.sort_groups("size")
objects, matches, groups = GetTestGroups() objects, matches, groups = GetTestGroups()
g1, g2 = groups g1, g2 = groups
g1.switch_ref(objects[1]) g1.switch_ref(objects[1])
@@ -158,9 +158,9 @@ class TestCaseResultsWithSomeGroups:
o3.size = 3 o3.size = 3
o4.size = 2 o4.size = 2
o5.size = 1 o5.size = 1
self.results.sort_dupes('size') self.results.sort_dupes("size")
eq_([o5, o3, o2], self.results.dupes) eq_([o5, o3, o2], self.results.dupes)
self.results.sort_dupes('size', False) self.results.sort_dupes("size", False)
eq_([o2, o3, o5], self.results.dupes) eq_([o2, o3, o5], self.results.dupes)
def test_dupe_list_remember_sort(self): def test_dupe_list_remember_sort(self):
@@ -170,25 +170,25 @@ class TestCaseResultsWithSomeGroups:
o3.size = 3 o3.size = 3
o4.size = 2 o4.size = 2
o5.size = 1 o5.size = 1
self.results.sort_dupes('size') self.results.sort_dupes("size")
self.results.make_ref(o2) self.results.make_ref(o2)
eq_([o5, o3, o1], self.results.dupes) eq_([o5, o3, o1], self.results.dupes)
def test_dupe_list_sort_delta_values(self): def test_dupe_list_sort_delta_values(self):
o1, o2, o3, o4, o5 = self.objects o1, o2, o3, o4, o5 = self.objects
o1.size = 10 o1.size = 10
o2.size = 2 #-8 o2.size = 2 # -8
o3.size = 3 #-7 o3.size = 3 # -7
o4.size = 20 o4.size = 20
o5.size = 1 #-19 o5.size = 1 # -19
self.results.sort_dupes('size', delta=True) self.results.sort_dupes("size", delta=True)
eq_([o5, o2, o3], self.results.dupes) eq_([o5, o2, o3], self.results.dupes)
def test_sort_empty_list(self): def test_sort_empty_list(self):
#There was an infinite loop when sorting an empty list. # There was an infinite loop when sorting an empty list.
app = DupeGuru() app = DupeGuru()
r = app.results r = app.results
r.sort_dupes('name') r.sort_dupes("name")
eq_([], r.dupes) eq_([], r.dupes)
def test_dupe_list_update_on_remove_duplicates(self): def test_dupe_list_update_on_remove_duplicates(self):
@@ -209,7 +209,7 @@ class TestCaseResultsWithSomeGroups:
f = io.BytesIO() f = io.BytesIO()
self.results.save_to_xml(f) self.results.save_to_xml(f)
assert not self.results.is_modified assert not self.results.is_modified
self.results.groups = self.groups # sets the flag back self.results.groups = self.groups # sets the flag back
f.seek(0) f.seek(0)
self.results.load_from_xml(f, get_file) self.results.load_from_xml(f, get_file)
assert not self.results.is_modified assert not self.results.is_modified
@@ -236,7 +236,7 @@ class TestCaseResultsWithSomeGroups:
# "aaa" makes our dupe go first in alphabetical order, but since we have the same value as # "aaa" makes our dupe go first in alphabetical order, but since we have the same value as
# ref, we're going last. # ref, we're going last.
g2r.name = g2d1.name = "aaa" g2r.name = g2d1.name = "aaa"
self.results.sort_dupes('name', delta=True) self.results.sort_dupes("name", delta=True)
eq_("aaa", self.results.dupes[2].name) eq_("aaa", self.results.dupes[2].name)
def test_dupe_list_sort_delta_values_nonnumeric_case_insensitive(self): def test_dupe_list_sort_delta_values_nonnumeric_case_insensitive(self):
@@ -244,9 +244,10 @@ class TestCaseResultsWithSomeGroups:
g1r, g1d1, g1d2, g2r, g2d1 = self.objects g1r, g1d1, g1d2, g2r, g2d1 = self.objects
g2r.name = "AaA" g2r.name = "AaA"
g2d1.name = "aAa" g2d1.name = "aAa"
self.results.sort_dupes('name', delta=True) self.results.sort_dupes("name", delta=True)
eq_("aAa", self.results.dupes[2].name) eq_("aAa", self.results.dupes[2].name)
class TestCaseResultsWithSavedResults: class TestCaseResultsWithSavedResults:
def setup_method(self, method): def setup_method(self, method):
self.app = DupeGuru() self.app = DupeGuru()
@@ -266,7 +267,7 @@ class TestCaseResultsWithSavedResults:
def get_file(path): def get_file(path):
return [f for f in self.objects if str(f.path) == path][0] return [f for f in self.objects if str(f.path) == path][0]
self.results.groups = self.groups # sets the flag back self.results.groups = self.groups # sets the flag back
self.results.load_from_xml(self.f, get_file) self.results.load_from_xml(self.f, get_file)
assert not self.results.is_modified assert not self.results.is_modified
@@ -299,7 +300,7 @@ class TestCaseResultsMarkings:
self.results.mark(self.objects[2]) self.results.mark(self.objects[2])
self.results.mark(self.objects[4]) self.results.mark(self.objects[4])
eq_("2 / 3 (2.00 B / 1.01 KB) duplicates marked.", self.results.stat_line) eq_("2 / 3 (2.00 B / 1.01 KB) duplicates marked.", self.results.stat_line)
self.results.mark(self.objects[0]) #this is a ref, it can't be counted self.results.mark(self.objects[0]) # this is a ref, it can't be counted
eq_("2 / 3 (2.00 B / 1.01 KB) duplicates marked.", self.results.stat_line) eq_("2 / 3 (2.00 B / 1.01 KB) duplicates marked.", self.results.stat_line)
self.results.groups = self.groups self.results.groups = self.groups
eq_("0 / 3 (0.00 B / 1.01 KB) duplicates marked.", self.results.stat_line) eq_("0 / 3 (0.00 B / 1.01 KB) duplicates marked.", self.results.stat_line)
@@ -335,7 +336,7 @@ class TestCaseResultsMarkings:
def log_object(o): def log_object(o):
log.append(o) log.append(o)
if o is self.objects[1]: if o is self.objects[1]:
raise EnvironmentError('foobar') raise OSError("foobar")
log = [] log = []
self.results.mark_all() self.results.mark_all()
@@ -350,7 +351,7 @@ class TestCaseResultsMarkings:
eq_(len(self.results.problems), 1) eq_(len(self.results.problems), 1)
dupe, msg = self.results.problems[0] dupe, msg = self.results.problems[0]
assert dupe is self.objects[1] assert dupe is self.objects[1]
eq_(msg, 'foobar') eq_(msg, "foobar")
def test_perform_on_marked_with_ref(self): def test_perform_on_marked_with_ref(self):
def log_object(o): def log_object(o):
@@ -400,7 +401,7 @@ class TestCaseResultsMarkings:
self.results.make_ref(d) self.results.make_ref(d)
eq_("0 / 3 (0.00 B / 3.00 B) duplicates marked.", self.results.stat_line) eq_("0 / 3 (0.00 B / 3.00 B) duplicates marked.", self.results.stat_line)
def test_SaveXML(self): def test_save_xml(self):
self.results.mark(self.objects[1]) self.results.mark(self.objects[1])
self.results.mark_invert() self.results.mark_invert()
f = io.BytesIO() f = io.BytesIO()
@@ -408,20 +409,20 @@ class TestCaseResultsMarkings:
f.seek(0) f.seek(0)
doc = ET.parse(f) doc = ET.parse(f)
root = doc.getroot() root = doc.getroot()
g1, g2 = root.getiterator('group') g1, g2 = root.iter("group")
d1, d2, d3 = g1.getiterator('file') d1, d2, d3 = g1.iter("file")
eq_('n', d1.get('marked')) eq_("n", d1.get("marked"))
eq_('n', d2.get('marked')) eq_("n", d2.get("marked"))
eq_('y', d3.get('marked')) eq_("y", d3.get("marked"))
d1, d2 = g2.getiterator('file') d1, d2 = g2.iter("file")
eq_('n', d1.get('marked')) eq_("n", d1.get("marked"))
eq_('y', d2.get('marked')) eq_("y", d2.get("marked"))
def test_LoadXML(self): def test_load_xml(self):
def get_file(path): def get_file(path):
return [f for f in self.objects if str(f.path) == path][0] return [f for f in self.objects if str(f.path) == path][0]
self.objects[4].name = 'ibabtu 2' #we can't have 2 files with the same path self.objects[4].name = "ibabtu 2" # we can't have 2 files with the same path
self.results.mark(self.objects[1]) self.results.mark(self.objects[1])
self.results.mark_invert() self.results.mark_invert()
f = io.BytesIO() f = io.BytesIO()
@@ -444,51 +445,51 @@ class TestCaseResultsXML:
self.objects, self.matches, self.groups = GetTestGroups() self.objects, self.matches, self.groups = GetTestGroups()
self.results.groups = self.groups self.results.groups = self.groups
def get_file(self, path): # use this as a callback for load_from_xml def get_file(self, path): # use this as a callback for load_from_xml
return [o for o in self.objects if o.path == path][0] return [o for o in self.objects if str(o.path) == path][0]
def test_save_to_xml(self): def test_save_to_xml(self):
self.objects[0].is_ref = True self.objects[0].is_ref = True
self.objects[0].words = [['foo', 'bar']] self.objects[0].words = [["foo", "bar"]]
f = io.BytesIO() f = io.BytesIO()
self.results.save_to_xml(f) self.results.save_to_xml(f)
f.seek(0) f.seek(0)
doc = ET.parse(f) doc = ET.parse(f)
root = doc.getroot() root = doc.getroot()
eq_('results', root.tag) eq_("results", root.tag)
eq_(2, len(root)) eq_(2, len(root))
eq_(2, len([c for c in root if c.tag == 'group'])) eq_(2, len([c for c in root if c.tag == "group"]))
g1, g2 = root g1, g2 = root
eq_(6, len(g1)) eq_(6, len(g1))
eq_(3, len([c for c in g1 if c.tag == 'file'])) eq_(3, len([c for c in g1 if c.tag == "file"]))
eq_(3, len([c for c in g1 if c.tag == 'match'])) eq_(3, len([c for c in g1 if c.tag == "match"]))
d1, d2, d3 = [c for c in g1 if c.tag == 'file'] d1, d2, d3 = (c for c in g1 if c.tag == "file")
eq_(op.join('basepath', 'foo bar'), d1.get('path')) eq_(op.join("basepath", "foo bar"), d1.get("path"))
eq_(op.join('basepath', 'bar bleh'), d2.get('path')) eq_(op.join("basepath", "bar bleh"), d2.get("path"))
eq_(op.join('basepath', 'foo bleh'), d3.get('path')) eq_(op.join("basepath", "foo bleh"), d3.get("path"))
eq_('y', d1.get('is_ref')) eq_("y", d1.get("is_ref"))
eq_('n', d2.get('is_ref')) eq_("n", d2.get("is_ref"))
eq_('n', d3.get('is_ref')) eq_("n", d3.get("is_ref"))
eq_('foo,bar', d1.get('words')) eq_("foo,bar", d1.get("words"))
eq_('bar,bleh', d2.get('words')) eq_("bar,bleh", d2.get("words"))
eq_('foo,bleh', d3.get('words')) eq_("foo,bleh", d3.get("words"))
eq_(3, len(g2)) eq_(3, len(g2))
eq_(2, len([c for c in g2 if c.tag == 'file'])) eq_(2, len([c for c in g2 if c.tag == "file"]))
eq_(1, len([c for c in g2 if c.tag == 'match'])) eq_(1, len([c for c in g2 if c.tag == "match"]))
d1, d2 = [c for c in g2 if c.tag == 'file'] d1, d2 = (c for c in g2 if c.tag == "file")
eq_(op.join('basepath', 'ibabtu'), d1.get('path')) eq_(op.join("basepath", "ibabtu"), d1.get("path"))
eq_(op.join('basepath', 'ibabtu'), d2.get('path')) eq_(op.join("basepath", "ibabtu"), d2.get("path"))
eq_('n', d1.get('is_ref')) eq_("n", d1.get("is_ref"))
eq_('n', d2.get('is_ref')) eq_("n", d2.get("is_ref"))
eq_('ibabtu', d1.get('words')) eq_("ibabtu", d1.get("words"))
eq_('ibabtu', d2.get('words')) eq_("ibabtu", d2.get("words"))
def test_LoadXML(self): def test_load_xml(self):
def get_file(path): def get_file(path):
return [f for f in self.objects if str(f.path) == path][0] return [f for f in self.objects if str(f.path) == path][0]
self.objects[0].is_ref = True self.objects[0].is_ref = True
self.objects[4].name = 'ibabtu 2' #we can't have 2 files with the same path self.objects[4].name = "ibabtu 2" # we can't have 2 files with the same path
f = io.BytesIO() f = io.BytesIO()
self.results.save_to_xml(f) self.results.save_to_xml(f)
f.seek(0) f.seek(0)
@@ -504,36 +505,36 @@ class TestCaseResultsXML:
assert g1[0] is self.objects[0] assert g1[0] is self.objects[0]
assert g1[1] is self.objects[1] assert g1[1] is self.objects[1]
assert g1[2] is self.objects[2] assert g1[2] is self.objects[2]
eq_(['foo', 'bar'], g1[0].words) eq_(["foo", "bar"], g1[0].words)
eq_(['bar', 'bleh'], g1[1].words) eq_(["bar", "bleh"], g1[1].words)
eq_(['foo', 'bleh'], g1[2].words) eq_(["foo", "bleh"], g1[2].words)
eq_(2, len(g2)) eq_(2, len(g2))
assert not g2[0].is_ref assert not g2[0].is_ref
assert not g2[1].is_ref assert not g2[1].is_ref
assert g2[0] is self.objects[3] assert g2[0] is self.objects[3]
assert g2[1] is self.objects[4] assert g2[1] is self.objects[4]
eq_(['ibabtu'], g2[0].words) eq_(["ibabtu"], g2[0].words)
eq_(['ibabtu'], g2[1].words) eq_(["ibabtu"], g2[1].words)
def test_LoadXML_with_filename(self, tmpdir): def test_load_xml_with_filename(self, tmpdir):
def get_file(path): def get_file(path):
return [f for f in self.objects if str(f.path) == path][0] return [f for f in self.objects if str(f.path) == path][0]
filename = str(tmpdir.join('dupeguru_results.xml')) filename = str(tmpdir.join("dupeguru_results.xml"))
self.objects[4].name = 'ibabtu 2' #we can't have 2 files with the same path self.objects[4].name = "ibabtu 2" # we can't have 2 files with the same path
self.results.save_to_xml(filename) self.results.save_to_xml(filename)
app = DupeGuru() app = DupeGuru()
r = Results(app) r = Results(app)
r.load_from_xml(filename, get_file) r.load_from_xml(filename, get_file)
eq_(2, len(r.groups)) eq_(2, len(r.groups))
def test_LoadXML_with_some_files_that_dont_exist_anymore(self): def test_load_xml_with_some_files_that_dont_exist_anymore(self):
def get_file(path): def get_file(path):
if path.endswith('ibabtu 2'): if path.endswith("ibabtu 2"):
return None return None
return [f for f in self.objects if str(f.path) == path][0] return [f for f in self.objects if str(f.path) == path][0]
self.objects[4].name = 'ibabtu 2' #we can't have 2 files with the same path self.objects[4].name = "ibabtu 2" # we can't have 2 files with the same path
f = io.BytesIO() f = io.BytesIO()
self.results.save_to_xml(f) self.results.save_to_xml(f)
f.seek(0) f.seek(0)
@@ -543,40 +544,40 @@ class TestCaseResultsXML:
eq_(1, len(r.groups)) eq_(1, len(r.groups))
eq_(3, len(r.groups[0])) eq_(3, len(r.groups[0]))
def test_LoadXML_missing_attributes_and_bogus_elements(self): def test_load_xml_missing_attributes_and_bogus_elements(self):
def get_file(path): def get_file(path):
return [f for f in self.objects if str(f.path) == path][0] return [f for f in self.objects if str(f.path) == path][0]
root = ET.Element('foobar') #The root element shouldn't matter, really. root = ET.Element("foobar") # The root element shouldn't matter, really.
group_node = ET.SubElement(root, 'group') group_node = ET.SubElement(root, "group")
dupe_node = ET.SubElement(group_node, 'file') #Perfectly correct file dupe_node = ET.SubElement(group_node, "file") # Perfectly correct file
dupe_node.set('path', op.join('basepath', 'foo bar')) dupe_node.set("path", op.join("basepath", "foo bar"))
dupe_node.set('is_ref', 'y') dupe_node.set("is_ref", "y")
dupe_node.set('words', 'foo, bar') dupe_node.set("words", "foo, bar")
dupe_node = ET.SubElement(group_node, 'file') #is_ref missing, default to 'n' dupe_node = ET.SubElement(group_node, "file") # is_ref missing, default to 'n'
dupe_node.set('path', op.join('basepath', 'foo bleh')) dupe_node.set("path", op.join("basepath", "foo bleh"))
dupe_node.set('words', 'foo, bleh') dupe_node.set("words", "foo, bleh")
dupe_node = ET.SubElement(group_node, 'file') #words are missing, valid. dupe_node = ET.SubElement(group_node, "file") # words are missing, valid.
dupe_node.set('path', op.join('basepath', 'bar bleh')) dupe_node.set("path", op.join("basepath", "bar bleh"))
dupe_node = ET.SubElement(group_node, 'file') #path is missing, invalid. dupe_node = ET.SubElement(group_node, "file") # path is missing, invalid.
dupe_node.set('words', 'foo, bleh') dupe_node.set("words", "foo, bleh")
dupe_node = ET.SubElement(group_node, 'foobar') #Invalid element name dupe_node = ET.SubElement(group_node, "foobar") # Invalid element name
dupe_node.set('path', op.join('basepath', 'bar bleh')) dupe_node.set("path", op.join("basepath", "bar bleh"))
dupe_node.set('is_ref', 'y') dupe_node.set("is_ref", "y")
dupe_node.set('words', 'bar, bleh') dupe_node.set("words", "bar, bleh")
match_node = ET.SubElement(group_node, 'match') # match pointing to a bad index match_node = ET.SubElement(group_node, "match") # match pointing to a bad index
match_node.set('first', '42') match_node.set("first", "42")
match_node.set('second', '45') match_node.set("second", "45")
match_node = ET.SubElement(group_node, 'match') # match with missing attrs match_node = ET.SubElement(group_node, "match") # match with missing attrs
match_node = ET.SubElement(group_node, 'match') # match with non-int values match_node = ET.SubElement(group_node, "match") # match with non-int values
match_node.set('first', 'foo') match_node.set("first", "foo")
match_node.set('second', 'bar') match_node.set("second", "bar")
match_node.set('percentage', 'baz') match_node.set("percentage", "baz")
group_node = ET.SubElement(root, 'foobar') #invalid group group_node = ET.SubElement(root, "foobar") # invalid group
group_node = ET.SubElement(root, 'group') #empty group group_node = ET.SubElement(root, "group") # empty group
f = io.BytesIO() f = io.BytesIO()
tree = ET.ElementTree(root) tree = ET.ElementTree(root)
tree.write(f, encoding='utf-8') tree.write(f, encoding="utf-8")
f.seek(0) f.seek(0)
app = DupeGuru() app = DupeGuru()
r = Results(app) r = Results(app)
@@ -586,16 +587,16 @@ class TestCaseResultsXML:
def test_xml_non_ascii(self): def test_xml_non_ascii(self):
def get_file(path): def get_file(path):
if path == op.join('basepath', '\xe9foo bar'): if path == op.join("basepath", "\xe9foo bar"):
return objects[0] return objects[0]
if path == op.join('basepath', 'bar bleh'): if path == op.join("basepath", "bar bleh"):
return objects[1] return objects[1]
objects = [NamedObject("\xe9foo bar", True), NamedObject("bar bleh", True)] objects = [NamedObject("\xe9foo bar", True), NamedObject("bar bleh", True)]
matches = engine.getmatches(objects) #we should have 5 matches matches = engine.getmatches(objects) # we should have 5 matches
groups = engine.get_groups(matches) #We should have 2 groups groups = engine.get_groups(matches) # We should have 2 groups
for g in groups: for g in groups:
g.prioritize(lambda x: objects.index(x)) #We want the dupes to be in the same order as the list is g.prioritize(lambda x: objects.index(x)) # We want the dupes to be in the same order as the list is
app = DupeGuru() app = DupeGuru()
results = Results(app) results = Results(app)
results.groups = groups results.groups = groups
@@ -607,11 +608,11 @@ class TestCaseResultsXML:
r.load_from_xml(f, get_file) r.load_from_xml(f, get_file)
g = r.groups[0] g = r.groups[0]
eq_("\xe9foo bar", g[0].name) eq_("\xe9foo bar", g[0].name)
eq_(['efoo', 'bar'], g[0].words) eq_(["efoo", "bar"], g[0].words)
def test_load_invalid_xml(self): def test_load_invalid_xml(self):
f = io.BytesIO() f = io.BytesIO()
f.write(b'<this is invalid') f.write(b"<this is invalid")
f.seek(0) f.seek(0)
app = DupeGuru() app = DupeGuru()
r = Results(app) r = Results(app)
@@ -623,7 +624,7 @@ class TestCaseResultsXML:
app = DupeGuru() app = DupeGuru()
r = Results(app) r = Results(app)
with raises(IOError): with raises(IOError):
r.load_from_xml('does_not_exist.xml', None) r.load_from_xml("does_not_exist.xml", None)
eq_(0, len(r.groups)) eq_(0, len(r.groups))
def test_remember_match_percentage(self): def test_remember_match_percentage(self):
@@ -643,12 +644,12 @@ class TestCaseResultsXML:
results.load_from_xml(f, self.get_file) results.load_from_xml(f, self.get_file)
group = results.groups[0] group = results.groups[0]
d1, d2, d3 = group d1, d2, d3 = group
match = group.get_match_of(d2) #d1 - d2 match = group.get_match_of(d2) # d1 - d2
eq_(42, match[2]) eq_(42, match[2])
match = group.get_match_of(d3) #d1 - d3 match = group.get_match_of(d3) # d1 - d3
eq_(43, match[2]) eq_(43, match[2])
group.switch_ref(d2) group.switch_ref(d2)
match = group.get_match_of(d3) #d2 - d3 match = group.get_match_of(d3) # d2 - d3
eq_(46, match[2]) eq_(46, match[2])
def test_save_and_load(self): def test_save_and_load(self):
@@ -661,13 +662,13 @@ class TestCaseResultsXML:
def test_apply_filter_works_on_paths(self): def test_apply_filter_works_on_paths(self):
# apply_filter() searches on the whole path, not just on the filename. # apply_filter() searches on the whole path, not just on the filename.
self.results.apply_filter('basepath') self.results.apply_filter("basepath")
eq_(len(self.results.groups), 2) eq_(len(self.results.groups), 2)
def test_save_xml_with_invalid_characters(self): def test_save_xml_with_invalid_characters(self):
# Don't crash when saving files that have invalid xml characters in their path # Don't crash when saving files that have invalid xml characters in their path
self.objects[0].name = 'foo\x19' self.objects[0].name = "foo\x19"
self.results.save_to_xml(io.BytesIO()) # don't crash self.results.save_to_xml(io.BytesIO()) # don't crash
class TestCaseResultsFilter: class TestCaseResultsFilter:
@@ -676,7 +677,7 @@ class TestCaseResultsFilter:
self.results = self.app.results self.results = self.app.results
self.objects, self.matches, self.groups = GetTestGroups() self.objects, self.matches, self.groups = GetTestGroups()
self.results.groups = self.groups self.results.groups = self.groups
self.results.apply_filter(r'foo') self.results.apply_filter(r"foo")
def test_groups(self): def test_groups(self):
eq_(1, len(self.results.groups)) eq_(1, len(self.results.groups))
@@ -694,7 +695,7 @@ class TestCaseResultsFilter:
def test_dupes_reconstructed_filtered(self): def test_dupes_reconstructed_filtered(self):
# make_ref resets self.__dupes to None. When it's reconstructed, we want it filtered # make_ref resets self.__dupes to None. When it's reconstructed, we want it filtered
dupe = self.results.dupes[0] #3rd object dupe = self.results.dupes[0] # 3rd object
self.results.make_ref(dupe) self.results.make_ref(dupe)
eq_(1, len(self.results.dupes)) eq_(1, len(self.results.dupes))
assert self.results.dupes[0] is self.objects[0] assert self.results.dupes[0] is self.objects[0]
@@ -702,23 +703,23 @@ class TestCaseResultsFilter:
def test_include_ref_dupes_in_filter(self): def test_include_ref_dupes_in_filter(self):
# When only the ref of a group match the filter, include it in the group # When only the ref of a group match the filter, include it in the group
self.results.apply_filter(None) self.results.apply_filter(None)
self.results.apply_filter(r'foo bar') self.results.apply_filter(r"foo bar")
eq_(1, len(self.results.groups)) eq_(1, len(self.results.groups))
eq_(0, len(self.results.dupes)) eq_(0, len(self.results.dupes))
def test_filters_build_on_one_another(self): def test_filters_build_on_one_another(self):
self.results.apply_filter(r'bar') self.results.apply_filter(r"bar")
eq_(1, len(self.results.groups)) eq_(1, len(self.results.groups))
eq_(0, len(self.results.dupes)) eq_(0, len(self.results.dupes))
def test_stat_line(self): def test_stat_line(self):
expected = '0 / 1 (0.00 B / 1.00 B) duplicates marked. filter: foo' expected = "0 / 1 (0.00 B / 1.00 B) duplicates marked. filter: foo"
eq_(expected, self.results.stat_line) eq_(expected, self.results.stat_line)
self.results.apply_filter(r'bar') self.results.apply_filter(r"bar")
expected = '0 / 0 (0.00 B / 0.00 B) duplicates marked. filter: foo --> bar' expected = "0 / 0 (0.00 B / 0.00 B) duplicates marked. filter: foo --> bar"
eq_(expected, self.results.stat_line) eq_(expected, self.results.stat_line)
self.results.apply_filter(None) self.results.apply_filter(None)
expected = '0 / 3 (0.00 B / 1.01 KB) duplicates marked.' expected = "0 / 3 (0.00 B / 1.01 KB) duplicates marked."
eq_(expected, self.results.stat_line) eq_(expected, self.results.stat_line)
def test_mark_count_is_filtered_as_well(self): def test_mark_count_is_filtered_as_well(self):
@@ -726,8 +727,8 @@ class TestCaseResultsFilter:
# We don't want to perform mark_all() because we want the mark list to contain objects # We don't want to perform mark_all() because we want the mark list to contain objects
for dupe in self.results.dupes: for dupe in self.results.dupes:
self.results.mark(dupe) self.results.mark(dupe)
self.results.apply_filter(r'foo') self.results.apply_filter(r"foo")
expected = '1 / 1 (1.00 B / 1.00 B) duplicates marked. filter: foo' expected = "1 / 1 (1.00 B / 1.00 B) duplicates marked. filter: foo"
eq_(expected, self.results.stat_line) eq_(expected, self.results.stat_line)
def test_mark_all_only_affects_filtered_items(self): def test_mark_all_only_affects_filtered_items(self):
@@ -739,22 +740,22 @@ class TestCaseResultsFilter:
def test_sort_groups(self): def test_sort_groups(self):
self.results.apply_filter(None) self.results.apply_filter(None)
self.results.make_ref(self.objects[1]) # to have the 1024 b object as ref
g1, g2 = self.groups g1, g2 = self.groups
self.results.apply_filter('a') # Matches both groups self.results.apply_filter("a") # Matches both groups
self.results.sort_groups('size') self.results.sort_groups("size")
assert self.results.groups[0] is g2 assert self.results.groups[0] is g2
assert self.results.groups[1] is g1 assert self.results.groups[1] is g1
self.results.apply_filter(None) self.results.apply_filter(None)
assert self.results.groups[0] is g2 assert self.results.groups[0] is g2
assert self.results.groups[1] is g1 assert self.results.groups[1] is g1
self.results.sort_groups('size', False) self.results.sort_groups("size", False)
self.results.apply_filter('a') self.results.apply_filter("a")
assert self.results.groups[1] is g2 assert self.results.groups[1] is g2
assert self.results.groups[0] is g1 assert self.results.groups[0] is g1
def test_set_group(self): def test_set_group(self):
#We want the new group to be filtered # We want the new group to be filtered
self.objects, self.matches, self.groups = GetTestGroups() self.objects, self.matches, self.groups = GetTestGroups()
self.results.groups = self.groups self.results.groups = self.groups
eq_(1, len(self.results.groups)) eq_(1, len(self.results.groups))
@@ -764,12 +765,12 @@ class TestCaseResultsFilter:
def get_file(path): def get_file(path):
return [f for f in self.objects if str(f.path) == path][0] return [f for f in self.objects if str(f.path) == path][0]
filename = str(tmpdir.join('dupeguru_results.xml')) filename = str(tmpdir.join("dupeguru_results.xml"))
self.objects[4].name = 'ibabtu 2' #we can't have 2 files with the same path self.objects[4].name = "ibabtu 2" # we can't have 2 files with the same path
self.results.save_to_xml(filename) self.results.save_to_xml(filename)
app = DupeGuru() app = DupeGuru()
r = Results(app) r = Results(app)
r.apply_filter('foo') r.apply_filter("foo")
r.load_from_xml(filename, get_file) r.load_from_xml(filename, get_file)
eq_(2, len(r.groups)) eq_(2, len(r.groups))
@@ -778,7 +779,7 @@ class TestCaseResultsFilter:
self.results.apply_filter(None) self.results.apply_filter(None)
eq_(2, len(self.results.groups)) eq_(2, len(self.results.groups))
eq_(2, len(self.results.dupes)) eq_(2, len(self.results.dupes))
self.results.apply_filter('ibabtu') self.results.apply_filter("ibabtu")
self.results.remove_duplicates([self.results.dupes[0]]) self.results.remove_duplicates([self.results.dupes[0]])
self.results.apply_filter(None) self.results.apply_filter(None)
eq_(1, len(self.results.groups)) eq_(1, len(self.results.groups))
@@ -786,7 +787,7 @@ class TestCaseResultsFilter:
def test_filter_is_case_insensitive(self): def test_filter_is_case_insensitive(self):
self.results.apply_filter(None) self.results.apply_filter(None)
self.results.apply_filter('FOO') self.results.apply_filter("FOO")
eq_(1, len(self.results.dupes)) eq_(1, len(self.results.dupes))
def test_make_ref_on_filtered_out_doesnt_mess_stats(self): def test_make_ref_on_filtered_out_doesnt_mess_stats(self):
@@ -794,13 +795,13 @@ class TestCaseResultsFilter:
# When calling make_ref on such a dupe, the total size and dupecount stats gets messed up # When calling make_ref on such a dupe, the total size and dupecount stats gets messed up
# because they are *not* counted in the stats in the first place. # because they are *not* counted in the stats in the first place.
g1, g2 = self.groups g1, g2 = self.groups
bar_bleh = g1[1] # The "bar bleh" dupe is filtered out bar_bleh = g1[1] # The "bar bleh" dupe is filtered out
self.results.make_ref(bar_bleh) self.results.make_ref(bar_bleh)
# Now the stats should display *2* markable dupes (instead of 1) # Now the stats should display *2* markable dupes (instead of 1)
expected = '0 / 2 (0.00 B / 2.00 B) duplicates marked. filter: foo' expected = "0 / 2 (0.00 B / 2.00 B) duplicates marked. filter: foo"
eq_(expected, self.results.stat_line) eq_(expected, self.results.stat_line)
self.results.apply_filter(None) # Now let's make sure our unfiltered results aren't fucked up self.results.apply_filter(None) # Now let's make sure our unfiltered results aren't fucked up
expected = '0 / 3 (0.00 B / 3.00 B) duplicates marked.' expected = "0 / 3 (0.00 B / 3.00 B) duplicates marked."
eq_(expected, self.results.stat_line) eq_(expected, self.results.stat_line)
@@ -814,6 +815,5 @@ class TestCaseResultsRefFile:
self.results.groups = self.groups self.results.groups = self.groups
def test_stat_line(self): def test_stat_line(self):
expected = '0 / 2 (0.00 B / 2.00 B) duplicates marked.' expected = "0 / 2 (0.00 B / 2.00 B) duplicates marked."
eq_(expected, self.results.stat_line) eq_(expected, self.results.stat_line)
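Taken together, the filter tests in this file spell out apply_filter()'s contract: the argument is a regular expression matched case-insensitively against the whole path (not just the file name), successive filters narrow one another (the stat line shows "foo --> bar"), passing None clears them, and an invalid pattern is silently ignored. The sketch below covers only that matching rule and assumes these simplified helpers; it is not the Results class itself.

import re

def compile_filter(pattern):
    # An invalid regular expression is ignored rather than raised.
    if pattern is None:
        return None
    try:
        return re.compile(pattern, re.IGNORECASE)
    except re.error:
        return None

def path_matches(compiled, path):
    # The search covers the whole path, not just the file name.
    return compiled is None or compiled.search(path) is not None

assert path_matches(compile_filter("FOO"), "basepath/foo bar")      # case-insensitive
assert path_matches(compile_filter("basepath"), "basepath/ibabtu")  # whole path
assert path_matches(compile_filter("["), "anything")                # invalid pattern -> no filtering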


@@ -4,86 +4,110 @@
# which should be included with this package. The terms are also available at # which should be included with this package. The terms are also available at
# http://www.gnu.org/licenses/gpl-3.0.html # http://www.gnu.org/licenses/gpl-3.0.html
import pytest
from hscommon.jobprogress import job from hscommon.jobprogress import job
from hscommon.path import Path from pathlib import Path
from hscommon.testutil import eq_ from hscommon.testutil import eq_
from .. import fs from core import fs
from ..engine import getwords, Match from core.engine import getwords, Match
from ..ignore import IgnoreList from core.ignore import IgnoreList
from ..scanner import Scanner, ScanType from core.scanner import Scanner, ScanType
from ..me.scanner import ScannerME from core.me.scanner import ScannerME
class NamedObject: class NamedObject:
def __init__(self, name="foobar", size=1, path=None): def __init__(self, name="foobar", size=1, path=None):
if path is None: if path is None:
path = Path(name) path = Path(name)
else: else:
path = Path(path)[name] path = Path(path, name)
self.name = name self.name = name
self.size = size self.size = size
self.path = path self.path = path
self.words = getwords(name) self.words = getwords(name)
def __repr__(self): def __repr__(self):
return '<NamedObject %r %r>' % (self.name, self.path) return "<NamedObject {!r} {!r}>".format(self.name, self.path)
no = NamedObject no = NamedObject
def pytest_funcarg__fake_fileexists(request):
@pytest.fixture
def fake_fileexists(request):
# This is a hack to avoid invalidating all previous tests since the scanner started to test # This is a hack to avoid invalidating all previous tests since the scanner started to test
# for file existence before doing the match grouping. # for file existence before doing the match grouping.
monkeypatch = request.getfuncargvalue('monkeypatch') monkeypatch = request.getfixturevalue("monkeypatch")
monkeypatch.setattr(Path, 'exists', lambda _: True) monkeypatch.setattr(Path, "exists", lambda _: True)
def test_empty(fake_fileexists): def test_empty(fake_fileexists):
s = Scanner() s = Scanner()
r = s.get_dupe_groups([]) r = s.get_dupe_groups([])
eq_(r, []) eq_(r, [])
def test_default_settings(fake_fileexists): def test_default_settings(fake_fileexists):
s = Scanner() s = Scanner()
eq_(s.min_match_percentage, 80) eq_(s.min_match_percentage, 80)
eq_(s.scan_type, ScanType.Filename) eq_(s.scan_type, ScanType.FILENAME)
eq_(s.mix_file_kind, True) eq_(s.mix_file_kind, True)
eq_(s.word_weighting, False) eq_(s.word_weighting, False)
eq_(s.match_similar_words, False) eq_(s.match_similar_words, False)
eq_(s.size_threshold, 0)
eq_(s.large_size_threshold, 0)
eq_(s.big_file_size_threshold, 0)
def test_simple_with_default_settings(fake_fileexists): def test_simple_with_default_settings(fake_fileexists):
s = Scanner() s = Scanner()
f = [no('foo bar', path='p1'), no('foo bar', path='p2'), no('foo bleh')] f = [no("foo bar", path="p1"), no("foo bar", path="p2"), no("foo bleh")]
r = s.get_dupe_groups(f) r = s.get_dupe_groups(f)
eq_(len(r), 1) eq_(len(r), 1)
g = r[0] g = r[0]
#'foo bleh' cannot be in the group because the default min match % is 80 # 'foo bleh' cannot be in the group because the default min match % is 80
eq_(len(g), 2) eq_(len(g), 2)
assert g.ref in f[:2] assert g.ref in f[:2]
assert g.dupes[0] in f[:2] assert g.dupes[0] in f[:2]
def test_simple_with_lower_min_match(fake_fileexists): def test_simple_with_lower_min_match(fake_fileexists):
s = Scanner() s = Scanner()
s.min_match_percentage = 50 s.min_match_percentage = 50
f = [no('foo bar', path='p1'), no('foo bar', path='p2'), no('foo bleh')] f = [no("foo bar", path="p1"), no("foo bar", path="p2"), no("foo bleh")]
r = s.get_dupe_groups(f) r = s.get_dupe_groups(f)
eq_(len(r), 1) eq_(len(r), 1)
g = r[0] g = r[0]
eq_(len(g), 3) eq_(len(g), 3)
def test_trim_all_ref_groups(fake_fileexists): def test_trim_all_ref_groups(fake_fileexists):
# When all files of a group are ref, don't include that group in the results, but also don't # When all files of a group are ref, don't include that group in the results, but also don't
# count the files from that group as discarded. # count the files from that group as discarded.
s = Scanner() s = Scanner()
f = [no('foo', path='p1'), no('foo', path='p2'), no('bar', path='p1'), no('bar', path='p2')] f = [
no("foo", path="p1"),
no("foo", path="p2"),
no("bar", path="p1"),
no("bar", path="p2"),
]
f[2].is_ref = True f[2].is_ref = True
f[3].is_ref = True f[3].is_ref = True
r = s.get_dupe_groups(f) r = s.get_dupe_groups(f)
eq_(len(r), 1) eq_(len(r), 1)
eq_(s.discarded_file_count, 0) eq_(s.discarded_file_count, 0)
def test_priorize(fake_fileexists):
def test_prioritize(fake_fileexists):
s = Scanner() s = Scanner()
f = [no('foo', path='p1'), no('foo', path='p2'), no('bar', path='p1'), no('bar', path='p2')] f = [
no("foo", path="p1"),
no("foo", path="p2"),
no("bar", path="p1"),
no("bar", path="p2"),
]
f[1].size = 2 f[1].size = 2
f[2].size = 3 f[2].size = 3
f[3].is_ref = True f[3].is_ref = True
@@ -94,36 +118,112 @@ def test_priorize(fake_fileexists):
assert f[3] in (g1.ref, g2.ref) assert f[3] in (g1.ref, g2.ref)
assert f[2] in (g1.dupes[0], g2.dupes[0]) assert f[2] in (g1.dupes[0], g2.dupes[0])
def test_content_scan(fake_fileexists): def test_content_scan(fake_fileexists):
s = Scanner() s = Scanner()
s.scan_type = ScanType.Contents s.scan_type = ScanType.CONTENTS
f = [no('foo'), no('bar'), no('bleh')] f = [no("foo"), no("bar"), no("bleh")]
f[0].md5 = f[0].md5partial = 'foobar' f[0].digest = f[0].digest_partial = f[0].digest_samples = "foobar"
f[1].md5 = f[1].md5partial = 'foobar' f[1].digest = f[1].digest_partial = f[1].digest_samples = "foobar"
f[2].md5 = f[2].md5partial = 'bleh' f[2].digest = f[2].digest_partial = f[1].digest_samples = "bleh"
r = s.get_dupe_groups(f) r = s.get_dupe_groups(f)
eq_(len(r), 1) eq_(len(r), 1)
eq_(len(r[0]), 2) eq_(len(r[0]), 2)
eq_(s.discarded_file_count, 0) # don't count the different md5 as discarded! eq_(s.discarded_file_count, 0) # don't count the different digest as discarded!
def test_content_scan_compare_sizes_first(fake_fileexists): def test_content_scan_compare_sizes_first(fake_fileexists):
class MyFile(no): class MyFile(no):
@property @property
def md5(file): def digest(self):
raise AssertionError() raise AssertionError()
s = Scanner() s = Scanner()
s.scan_type = ScanType.Contents s.scan_type = ScanType.CONTENTS
f = [MyFile('foo', 1), MyFile('bar', 2)] f = [MyFile("foo", 1), MyFile("bar", 2)]
eq_(len(s.get_dupe_groups(f)), 0) eq_(len(s.get_dupe_groups(f)), 0)
def test_ignore_file_size(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.CONTENTS
small_size = 10 # 10KB
s.size_threshold = 0
large_size = 100 * 1024 * 1024 # 100MB
s.large_size_threshold = 0
f = [
no("smallignore1", small_size - 1),
no("smallignore2", small_size - 1),
no("small1", small_size),
no("small2", small_size),
no("large1", large_size),
no("large2", large_size),
no("largeignore1", large_size + 1),
no("largeignore2", large_size + 1),
]
f[0].digest = f[0].digest_partial = f[0].digest_samples = "smallignore"
f[1].digest = f[1].digest_partial = f[1].digest_samples = "smallignore"
f[2].digest = f[2].digest_partial = f[2].digest_samples = "small"
f[3].digest = f[3].digest_partial = f[3].digest_samples = "small"
f[4].digest = f[4].digest_partial = f[4].digest_samples = "large"
f[5].digest = f[5].digest_partial = f[5].digest_samples = "large"
f[6].digest = f[6].digest_partial = f[6].digest_samples = "largeignore"
f[7].digest = f[7].digest_partial = f[7].digest_samples = "largeignore"
r = s.get_dupe_groups(f)
# No ignores
eq_(len(r), 4)
# Ignore smaller
s.size_threshold = small_size
r = s.get_dupe_groups(f)
eq_(len(r), 3)
# Ignore larger
s.size_threshold = 0
s.large_size_threshold = large_size
r = s.get_dupe_groups(f)
eq_(len(r), 3)
# Ignore both
s.size_threshold = small_size
r = s.get_dupe_groups(f)
eq_(len(r), 2)
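The thresholds exercised here act like an inclusive size window: a file is kept only if it is at least size_threshold bytes and no larger than large_size_threshold, with 0 disabling the corresponding limit. A minimal sketch of that rule, under the assumption that the check reduces to a single predicate (the helper name is invented for illustration):

def passes_size_thresholds(size, size_threshold=0, large_size_threshold=0):
    # 0 disables the corresponding limit.
    if size_threshold and size < size_threshold:
        return False  # too small, ignored
    if large_size_threshold and size > large_size_threshold:
        return False  # too large, ignored
    return True

small, large = 10, 100 * 1024 * 1024
assert not passes_size_thresholds(small - 1, size_threshold=small)
assert passes_size_thresholds(small, size_threshold=small)
assert passes_size_thresholds(large, large_size_threshold=large)
assert not passes_size_thresholds(large + 1, large_size_threshold=large)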
def test_big_file_partial_hashes(fake_fileexists):
s = Scanner()
s.scan_type = ScanType.CONTENTS
smallsize = 1
bigsize = 100 * 1024 * 1024 # 100MB
s.big_file_size_threshold = bigsize
f = [no("bigfoo", bigsize), no("bigbar", bigsize), no("smallfoo", smallsize), no("smallbar", smallsize)]
f[0].digest = f[0].digest_partial = f[0].digest_samples = "foobar"
f[1].digest = f[1].digest_partial = f[1].digest_samples = "foobar"
f[2].digest = f[2].digest_partial = "bleh"
f[3].digest = f[3].digest_partial = "bleh"
r = s.get_dupe_groups(f)
eq_(len(r), 2)
# digest_partial is still the same, but the file is actually different
f[1].digest = f[1].digest_samples = "difffoobar"
# here we compare the full digests, as the user disabled the optimization
s.big_file_size_threshold = 0
r = s.get_dupe_groups(f)
eq_(len(r), 1)
# here we should compare the digest_samples, and see they are different
s.big_file_size_threshold = bigsize
r = s.get_dupe_groups(f)
eq_(len(r), 1)
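What this test is checking: when big_file_size_threshold is set and a file reaches that size, the content scan compares digest_samples (hashes of a few sampled chunks) rather than the full digest, and with the threshold at 0 the full digest is always used. The helper below is a hypothetical sketch of that selection only, not the scanner's real code.

def digest_for_comparison(file_size, digest, digest_samples, big_file_size_threshold=0):
    # Big files fall back to sampled-chunk hashes to avoid reading the whole file.
    if big_file_size_threshold and file_size >= big_file_size_threshold:
        return digest_samples
    return digest

bigsize = 100 * 1024 * 1024
# With the optimization enabled, a big file is compared by its sampled hashes...
assert digest_for_comparison(bigsize, "full", "sampled", bigsize) == "sampled"
# ...while small files, or a threshold of 0, always use the full digest.
assert digest_for_comparison(1, "full", "sampled", bigsize) == "full"
assert digest_for_comparison(bigsize, "full", "sampled", 0) == "full"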

def test_min_match_perc_doesnt_matter_for_content_scan(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.CONTENTS
    f = [no("foo"), no("bar"), no("bleh")]
    f[0].digest = f[0].digest_partial = f[0].digest_samples = "foobar"
    f[1].digest = f[1].digest_partial = f[1].digest_samples = "foobar"
    f[2].digest = f[2].digest_partial = f[2].digest_samples = "bleh"
    s.min_match_percentage = 101
    r = s.get_dupe_groups(f)
    eq_(len(r), 1)
@@ -133,157 +233,181 @@ def test_min_match_perc_doesnt_matter_for_content_scan(fake_fileexists):
    eq_(len(r), 1)
    eq_(len(r[0]), 2)

def test_content_scan_doesnt_put_digest_in_words_at_the_end(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.CONTENTS
    f = [no("foo"), no("bar")]
    f[0].digest = f[0].digest_partial = f[0].digest_samples = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
    f[1].digest = f[1].digest_partial = f[1].digest_samples = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
    r = s.get_dupe_groups(f)
    # FIXME looks like we are missing something here?
    r[0]

def test_extension_is_not_counted_in_filename_scan(fake_fileexists):
    s = Scanner()
    s.min_match_percentage = 100
    f = [no("foo.bar"), no("foo.bleh")]
    r = s.get_dupe_groups(f)
    eq_(len(r), 1)
    eq_(len(r[0]), 2)

def test_job(fake_fileexists):
    def do_progress(progress, desc=""):
        log.append(progress)
        return True

    s = Scanner()
    log = []
    f = [no("foo bar"), no("foo bar"), no("foo bleh")]
    s.get_dupe_groups(f, j=job.Job(1, do_progress))
    eq_(log[0], 0)
    eq_(log[-1], 100)

def test_mix_file_kind(fake_fileexists):
    s = Scanner()
    s.mix_file_kind = False
    f = [no("foo.1"), no("foo.2")]
    r = s.get_dupe_groups(f)
    eq_(len(r), 0)

def test_word_weighting(fake_fileexists):
    s = Scanner()
    s.min_match_percentage = 75
    s.word_weighting = True
    f = [no("foo bar"), no("foo bar bleh")]
    r = s.get_dupe_groups(f)
    eq_(len(r), 1)
    g = r[0]
    m = g.get_match_of(g.dupes[0])
    eq_(m.percentage, 75)  # 16 letters, 12 matching

def test_similar_words(fake_fileexists):
    s = Scanner()
    s.match_similar_words = True
    f = [
        no("The White Stripes"),
        no("The Whites Stripe"),
        no("Limp Bizkit"),
        no("Limp Bizkitt"),
    ]
    r = s.get_dupe_groups(f)
    eq_(len(r), 2)

def test_fields(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.FIELDS
    f = [no("The White Stripes - Little Ghost"), no("The White Stripes - Little Acorn")]
    r = s.get_dupe_groups(f)
    eq_(len(r), 0)

def test_fields_no_order(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.FIELDSNOORDER
    f = [no("The White Stripes - Little Ghost"), no("Little Ghost - The White Stripes")]
    r = s.get_dupe_groups(f)
    eq_(len(r), 1)

def test_tag_scan(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.TAG
    o1 = no("foo")
    o2 = no("bar")
    o1.artist = "The White Stripes"
    o1.title = "The Air Near My Fingers"
    o2.artist = "The White Stripes"
    o2.title = "The Air Near My Fingers"
    r = s.get_dupe_groups([o1, o2])
    eq_(len(r), 1)

def test_tag_with_album_scan(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.TAG
    s.scanned_tags = {"artist", "album", "title"}
    o1 = no("foo")
    o2 = no("bar")
    o3 = no("bleh")
    o1.artist = "The White Stripes"
    o1.title = "The Air Near My Fingers"
    o1.album = "Elephant"
    o2.artist = "The White Stripes"
    o2.title = "The Air Near My Fingers"
    o2.album = "Elephant"
    o3.artist = "The White Stripes"
    o3.title = "The Air Near My Fingers"
    o3.album = "foobar"
    r = s.get_dupe_groups([o1, o2, o3])
    eq_(len(r), 1)

def test_that_dash_in_tags_dont_create_new_fields(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.TAG
    s.scanned_tags = {"artist", "album", "title"}
    s.min_match_percentage = 50
    o1 = no("foo")
    o2 = no("bar")
    o1.artist = "The White Stripes - a"
    o1.title = "The Air Near My Fingers - a"
    o1.album = "Elephant - a"
    o2.artist = "The White Stripes - b"
    o2.title = "The Air Near My Fingers - b"
    o2.album = "Elephant - b"
    r = s.get_dupe_groups([o1, o2])
    eq_(len(r), 1)

def test_tag_scan_with_different_scanned(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.TAG
    s.scanned_tags = {"track", "year"}
    o1 = no("foo")
    o2 = no("bar")
    o1.artist = "The White Stripes"
    o1.title = "some title"
    o1.track = "foo"
    o1.year = "bar"
    o2.artist = "The White Stripes"
    o2.title = "another title"
    o2.track = "foo"
    o2.year = "bar"
    r = s.get_dupe_groups([o1, o2])
    eq_(len(r), 1)

def test_tag_scan_only_scans_existing_tags(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.TAG
    s.scanned_tags = {"artist", "foo"}
    o1 = no("foo")
    o2 = no("bar")
    o1.artist = "The White Stripes"
    o1.foo = "foo"
    o2.artist = "The White Stripes"
    o2.foo = "bar"
    r = s.get_dupe_groups([o1, o2])
    eq_(len(r), 1)  # Because 'foo' is not scanned, they match

def test_tag_scan_converts_to_str(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.TAG
    s.scanned_tags = {"track"}
    o1 = no("foo")
    o2 = no("bar")
    o1.track = 42
    o2.track = 42
    try:
@@ -292,31 +416,33 @@ def test_tag_scan_converts_to_str(fake_fileexists):
        raise AssertionError()
    eq_(len(r), 1)

def test_tag_scan_non_ascii(fake_fileexists):
    s = Scanner()
    s.scan_type = ScanType.TAG
    s.scanned_tags = {"title"}
    o1 = no("foo")
    o2 = no("bar")
    o1.title = "foobar\u00e9"
    o2.title = "foobar\u00e9"
    try:
        r = s.get_dupe_groups([o1, o2])
    except UnicodeEncodeError:
        raise AssertionError()
    eq_(len(r), 1)

def test_ignore_list(fake_fileexists):
    s = Scanner()
    f1 = no("foobar")
    f2 = no("foobar")
    f3 = no("foobar")
    f1.path = Path("dir1/foobar")
    f2.path = Path("dir2/foobar")
    f3.path = Path("dir3/foobar")
    ignore_list = IgnoreList()
    ignore_list.ignore(str(f1.path), str(f2.path))
    ignore_list.ignore(str(f1.path), str(f3.path))
    r = s.get_dupe_groups([f1, f2, f3], ignore_list=ignore_list)
    eq_(len(r), 1)
    g = r[0]
@@ -327,19 +453,20 @@ def test_ignore_list(fake_fileexists):
    # Ignored matches are not counted as discarded
    eq_(s.discarded_file_count, 0)

def test_ignore_list_checks_for_unicode(fake_fileexists):
    # scanner was calling path_str for ignore list checks. Since the Path changes, it must
    # be unicode(path)
    s = Scanner()
    f1 = no("foobar")
    f2 = no("foobar")
    f3 = no("foobar")
    f1.path = Path("foo1\u00e9")
    f2.path = Path("foo2\u00e9")
    f3.path = Path("foo3\u00e9")
    ignore_list = IgnoreList()
    ignore_list.ignore(str(f1.path), str(f2.path))
    ignore_list.ignore(str(f1.path), str(f3.path))
    r = s.get_dupe_groups([f1, f2, f3], ignore_list=ignore_list)
    eq_(len(r), 1)
    g = r[0]
@@ -348,6 +475,7 @@ def test_ignore_list_checks_for_unicode(fake_fileexists):
    assert f2 in g
    assert f3 in g

def test_file_evaluates_to_false(fake_fileexists):
    # A very wrong way to use any() was added at some point, causing the resulting group list
    # to be empty.
@@ -355,19 +483,19 @@ def test_file_evaluates_to_false(fake_fileexists):
        def __bool__(self):
            return False

    s = Scanner()
    f1 = FalseNamedObject("foobar", path="p1")
    f2 = FalseNamedObject("foobar", path="p2")
    r = s.get_dupe_groups([f1, f2])
    eq_(len(r), 1)

def test_size_threshold(fake_fileexists):
    # Only files equal to or larger than size_threshold are scanned
    s = Scanner()
    f1 = no("foo", 1, path="p1")
    f2 = no("foo", 2, path="p2")
    f3 = no("foo", 3, path="p3")
    s.size_threshold = 2
    groups = s.get_dupe_groups([f1, f2, f3])
    eq_(len(groups), 1)
@@ -377,48 +505,52 @@ def test_size_threshold(fake_fileexists):
    assert f2 in group
    assert f3 in group

def test_tie_breaker_path_deepness(fake_fileexists):
    # If there is a tie in prioritization, path deepness is used as a tie breaker
    s = Scanner()
    o1, o2 = no("foo"), no("foo")
    o1.path = Path("foo")
    o2.path = Path("foo/bar")
    [group] = s.get_dupe_groups([o1, o2])
    assert group.ref is o2

def test_tie_breaker_copy(fake_fileexists):
    # if copy is in the words used (even if it has a deeper path), it becomes a dupe
    s = Scanner()
    o1, o2 = no("foo bar Copy"), no("foo bar")
    o1.path = Path("deeper/path")
    o2.path = Path("foo")
    [group] = s.get_dupe_groups([o1, o2])
    assert group.ref is o2

def test_tie_breaker_same_name_plus_digit(fake_fileexists):
    # if ref has the same words as dupe, but has just one extra word which is a digit, it
    # becomes a dupe
    s = Scanner()
    o1 = no("foo bar 42")
    o2 = no("foo bar [42]")
    o3 = no("foo bar (42)")
    o4 = no("foo bar {42}")
    o5 = no("foo bar")
    # all numbered names have deeper paths, so they'll end up ref if the digits aren't correctly
    # used as tie breakers
    o1.path = Path("deeper/path")
    o2.path = Path("deeper/path")
    o3.path = Path("deeper/path")
    o4.path = Path("deeper/path")
    o5.path = Path("foo")
    [group] = s.get_dupe_groups([o1, o2, o3, o4, o5])
    assert group.ref is o5

def test_partial_group_match(fake_fileexists):
    # Count the number of discarded matches (when a file doesn't match all other dupes of the
    # group) in Scanner.discarded_file_count
    s = Scanner()
    o1, o2, o3 = no("a b"), no("a"), no("b")
    s.min_match_percentage = 50
    [group] = s.get_dupe_groups([o1, o2, o3])
    eq_(len(group), 2)
@@ -431,81 +563,87 @@ def test_partial_group_match(fake_fileexists):
    assert o3 in group
    eq_(s.discarded_file_count, 1)

def test_dont_group_files_that_dont_exist(tmpdir):
    # when creating groups, check that files exist first. It's possible that these files have
    # been moved during the scan by the user.
    # In this test, we have to delete one of the files between the get_matches() part and the
    # get_groups() part.
    s = Scanner()
    s.scan_type = ScanType.CONTENTS
    p = Path(str(tmpdir))
    with p.joinpath("file1").open("w") as fp:
        fp.write("foo")
    with p.joinpath("file2").open("w") as fp:
        fp.write("foo")
    file1, file2 = fs.get_files(p)

    def getmatches(*args, **kw):
        file2.path.unlink()
        return [Match(file1, file2, 100)]

    s._getmatches = getmatches
    assert not s.get_dupe_groups([file1, file2])

def test_folder_scan_exclude_subfolder_matches(fake_fileexists):
    # when doing a Folders scan type, don't include matches for folders whose parent folder already
    # matches.
    s = Scanner()
    s.scan_type = ScanType.FOLDERS
    topf1 = no("top folder 1", size=42)
    topf1.digest = topf1.digest_partial = topf1.digest_samples = b"some_digest__1"
    topf1.path = Path("/topf1")
    topf2 = no("top folder 2", size=42)
    topf2.digest = topf2.digest_partial = topf2.digest_samples = b"some_digest__1"
    topf2.path = Path("/topf2")
    subf1 = no("sub folder 1", size=41)
    subf1.digest = subf1.digest_partial = subf1.digest_samples = b"some_digest__2"
    subf1.path = Path("/topf1/sub")
    subf2 = no("sub folder 2", size=41)
    subf2.digest = subf2.digest_partial = subf2.digest_samples = b"some_digest__2"
    subf2.path = Path("/topf2/sub")
    eq_(len(s.get_dupe_groups([topf1, topf2, subf1, subf2])), 1)  # only top folders
    # however, if another folder matches a subfolder, keep it in the matches
    otherf = no("other folder", size=41)
    otherf.digest = otherf.digest_partial = otherf.digest_samples = b"some_digest__2"
    otherf.path = Path("/otherfolder")
    eq_(len(s.get_dupe_groups([topf1, topf2, subf1, subf2, otherf])), 2)

def test_ignore_files_with_same_path(fake_fileexists):
    # It's possible that the scanner is fed with two file instances pointing to the same path. One
    # of these files has to be ignored
    s = Scanner()
    f1 = no("foobar", path="path1/foobar")
    f2 = no("foobar", path="path1/foobar")
    eq_(s.get_dupe_groups([f1, f2]), [])

def test_dont_count_ref_files_as_discarded(fake_fileexists):
    # To speed up the scan, we don't bother comparing contents of files that are both ref files.
    # However, this causes problems in "discarded" counting and we make sure here that we don't
    # report discarded matches in exact duplicate scans.
    s = Scanner()
    s.scan_type = ScanType.CONTENTS
    o1 = no("foo", path="p1")
    o2 = no("foo", path="p2")
    o3 = no("foo", path="p3")
    o1.digest = o1.digest_partial = o1.digest_samples = "foobar"
    o2.digest = o2.digest_partial = o2.digest_samples = "foobar"
    o3.digest = o3.digest_partial = o3.digest_samples = "foobar"
    o1.is_ref = True
    o2.is_ref = True
    eq_(len(s.get_dupe_groups([o1, o2, o3])), 1)
    eq_(s.discarded_file_count, 0)

def test_prioritize_me(fake_fileexists):
    # in ScannerME, bitrate goes first (right after is_ref) in prioritization
    s = ScannerME()
    o1, o2 = no("foo", path="p1"), no("foo", path="p2")
    o1.bitrate = 1
    o2.bitrate = 2
    [group] = s.get_dupe_groups([o1, o2])
    assert group.ref is o2


@@ -5,38 +5,52 @@
# http://www.gnu.org/licenses/gpl-3.0.html

import time
import sys
import os
import urllib.request
import urllib.error
import json
import semantic_version
import logging
from typing import Union

from hscommon.util import format_time_decimal


def format_timestamp(t, delta):
    if delta:
        return format_time_decimal(t)
    else:
        if t > 0:
            return time.strftime("%Y/%m/%d %H:%M:%S", time.localtime(t))
        else:
            return "---"

def format_words(w):
    def do_format(w):
        if isinstance(w, list):
            return "(%s)" % ", ".join(do_format(item) for item in w)
        else:
            return w.replace("\n", " ")

    return ", ".join(do_format(item) for item in w)


def format_perc(p):
    return "%0.0f" % p


def format_dupe_count(c):
    return str(c) if c else "---"


def cmp_value(dupe, attrname):
    value = getattr(dupe, attrname, "")
    return value.lower() if isinstance(value, str) else value
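# Illustrative examples (not part of the diff) of what the formatting helpers above
# return; the input values below are made up for demonstration only:
#   format_words(["foo", ["bar", "baz"]])  -> "foo, (bar, baz)"
#   format_perc(73.6)                      -> "74"
#   format_dupe_count(0)                   -> "---"
#   format_timestamp(0, delta=False)       -> "---"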

def fix_surrogate_encoding(s, encoding="utf-8"):
    # ref #210. It's possible to end up with file paths that, while correct unicode strings, are
    # decoded with the 'surrogateescape' option, which makes the string unencodable to utf-8. We fix
    # these strings here by trying to encode them and, if it fails, we do an encode/decode dance
@@ -49,8 +63,41 @@ def fix_surrogate_encoding(s, encoding='utf-8'):
    try:
        s.encode(encoding)
    except UnicodeEncodeError:
        return s.encode(encoding, "replace").decode(encoding)
    else:
        return s
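# Illustrative sketch (not part of the diff): how a surrogate-escaped path becomes
# unencodable and how fix_surrogate_encoding() makes it safe again. The byte string
# below is an assumed example, not taken from the project:
#   broken = b"caf\xe9".decode("utf-8", "surrogateescape")   # -> 'caf\udce9'
#   broken.encode("utf-8")                                   # raises UnicodeEncodeError
#   fix_surrogate_encoding(broken)                           # -> 'caf?' (encodable again)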

def executable_folder():
    return os.path.dirname(os.path.abspath(sys.argv[0]))


def check_for_update(current_version: str, include_prerelease: bool = False) -> Union[None, dict]:
    request = urllib.request.Request(
        "https://api.github.com/repos/arsenetar/dupeguru/releases",
        headers={"Accept": "application/vnd.github.v3+json"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            if response.status != 200:
                logging.warning(f"Error retrieving updates. Status: {response.status}")
                return None
            try:
                response_json = json.loads(response.read())
            except json.JSONDecodeError as ex:
                logging.warning(f"Error parsing updates. {ex.msg}")
                return None
    except urllib.error.URLError as ex:
        logging.warning(f"Error retrieving updates. {ex.reason}")
        return None
    new_version = semantic_version.Version(current_version)
    new_url = None
    for release in response_json:
        release_version = semantic_version.Version(release["name"])
        if new_version < release_version and (include_prerelease or not release_version.prerelease):
            new_version = release_version
            new_url = release["html_url"]
    if new_url is not None:
        return {"version": new_version, "url": new_url}
    else:
        return None
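# Minimal usage sketch (not part of the diff), assuming the module is run directly and
# semantic_version is installed; the "4.3.1" version string is only an example:
if __name__ == "__main__":
    result = check_for_update("4.3.1", include_prerelease=False)
    if result is not None:
        print(f"Update available: {result['version']} at {result['url']}")
    else:
        print("No update available.")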


@@ -1,3 +1,95 @@
=== 4.3.1 (2022-07-08)
* Fix issue where cache db exceptions could prevent files being hashed (#1015)
* Add extra guard for non-zero length files without digests to prevent false duplicates
* Update Italian translations
=== 4.3.0 (2022-07-01)
* Redirect stdout from custom command to the log files (#1008)
* Update translations
* Fix typo in debian control file (#989)
* Add option to profile scans
* Update fs.py to optimize stat() calls
* Fix error when deleting after scan (#988)
* Update directory scanning to use os.scandir() and DirEntry objects
* Improve performance of Directories.get_state()
* Migrate from hscommon.path to pathlib
* Switch file hashing to xxhash with fallback to md5
* Add update check feature to about box
=== 4.2.1 (2022-03-25)
* Default to English on unsupported system language (#976)
* Fix image viewer zoom datatype issue (#978)
* Fix errors from window change event (#937, #980)
* Fix deprecation warning from SQLite
* Enforce minimum Windows version in installer (#983)
* Fix help path for local files
* Drop python 3.6 support
* VS Code project settings added, yaml validation for GitHub actions
=== 4.2.0 (2021-01-24)
* Add Malay and Turkish
* Add dark style for windows (#900)
* Add caching md5 file hashes (#942)
* Add feature to partially hash large files, with user adjustable preference (#908)
* Add portable mode (store settings next to executable)
* Add file association for .dupeguru files on windows
* Add ability to pass .dupeguru file to load on startup (#902)
* Add ability to reveal in explorer/finder (#895)
* Switch audio tag processing from hsaudiotag to mutagen (#440)
* Add ability to use Qt dialogs instead of native OS dialogs for some file selection operations
* Add OS and Python details to error dialog to assist in troubleshooting
* Add preference to ignore large files with threshold (#430)
* Fix error on close from DetailsPanel (#857, #873)
* Change reference background color (#894, #898)
* Remove stripping of unicode characters when matching names (#879)
* Fix exception when deleting in delta view (#863, #905)
* Fix dupes only view not updating after re-prioritize results (#757, #910, #911)
* Fix ability to drag'n'drop file/folder with certain characters in name (#897)
* Fix window position opening partially offscreen (#653)
* Fix TypeError in photo mode (#551)
* Change message for when files are deleted directly (#904)
* Add more feedback during scan (#700)
* Add Python version check to build.py (#589)
* General code cleanups
* Improvements to using standardized build tooling
* Moved CI/CD to github actions, added codeql, SonarCloud
=== 4.1.1 (2021-03-21)
* Add Japanese
* Update internationalization and translations to be up to date with current UI.
* Minor translation and UI language updates
* Fix language selection issues on Windows (#760)
* Add some additional notes about builds on Linux based systems
* Add import from transifex export to build.py
=== 4.1.0 (2020-12-29)
* Use tabs instead of separate windows (#688)
* Show the shortcut for "mark selected" in results dialog (#656, #641)
* Add image comparison features to details dialog (#683)
* Add the ability to use regex based exclusion filters (#705)
* Change reference row background color, and allow user to adjust the color (#701)
* Save / Load directories as XML (#706)
* Workaround for EXIF IFD type mismatch in parsing function (#630, #698)
* Progress dialog stuck at "Verified X/X matches" (#693, #694)
* Fix word wrap in ignore list dialog (#687)
* Fix issue with result window action on creation (#685)
* Colorize details table differences, allow moving rows (#682)
* Fix loading Result of 'Scan Type: Folders' shows only '---' in every table cell (#677, #676)
* Fix issue with details and results dialog row trimming (#655, #654)
* Add option to enable/disable bold font (#646, #314)
* Use relative icon path for themes to override more easily (#746)
* Fix issues with Python 3.8 compatibility (#665)
* Fix flake8 issues (#672)
* Update to use newer pytest and expand flake8 checking, cleanup various Deprecation Warnings
* Add warnings to packaging script when files are not built (#691)
* Update Packaging for Ubuntu (#593)
* Minor Build Updates (#627, #575, #628, #614)
* Update CI builds and add windows CI (#572, #669)
=== 4.0.4 (2019-05-13)
* Update qt/platform.py to support other Unix style OSes (#444)


@@ -1,7 +1,7 @@
Frequently Asked Questions
==========================

.. topic:: What is dupeGuru?

    .. only:: edition_se
@@ -25,7 +25,7 @@ Häufig gestellte Fragen
.. topic:: What are the demo limitations of dupeGuru?

    None, dupeGuru is `Fairware <http://open.hardcoded.net/about/>`_.

.. topic:: The mark box of a file I want to delete is disabled. What do I have to do?


@@ -1,21 +1,13 @@
dupeGuru Help
===============

.. only:: edition_se

    This document is also available in `English <http://dupeguru.voltaicideas.net/help/en/>`__ and `French <http://dupeguru.voltaicideas.net/help/fr/>`__.

.. only:: edition_se or edition_me

    dupeGuru is a tool for finding duplicates on your computer. It can scan either file names or contents. The filename scan offers a fuzzy matching algorithm that can even find duplicates that do not have exactly the same name.

.. only:: edition_pe
@@ -23,7 +15,7 @@
Although dupeGuru can easily be used without documentation, it is worth reading this help. If you are looking for a guide to your first duplicate scan, have a look at the :doc:`Quick Start <quick_start>` section.

It is a good idea to keep dupeGuru up to date. You can find the latest version at http://dupeguru.voltaicideas.net.

Contents:


@@ -12,7 +12,7 @@ a community around this project.
So, whatever your skills, if you're interested in contributing to dupeGuru, please do so. Normally,
this documentation should be enough to get you started, but if it isn't, then **please**,
open a discussion at https://github.com/arsenetar/dupeguru/discussions. If there's any situation where you'd
wish to contribute but some doubt you're having prevents you from going forward, please contact me.
I'd much prefer to spend the time figuring out with you whether (and how) you can contribute than
taking the chance of missing that opportunity.
@@ -82,10 +82,9 @@ agree on what should be added to the documentation.
dupeGuru. For more information about how to do that, you can refer to the `translator guide`_.

.. _been open source: https://www.hardcoded.net/articles/free-as-in-speech-fair-as-in-trade
.. _Source code repository: https://github.com/arsenetar/dupeguru
.. _Issue Tracker: https://github.com/arsenetar/dupeguru/issues
.. _Issue labels meaning: https://github.com/arsenetar/dupeguru/wiki/issue-labels
.. _Sphinx: http://sphinx-doc.org/
.. _reST: http://en.wikipedia.org/wiki/ReStructuredText
.. _translator guide: https://github.com/arsenetar/dupeguru/wiki/Translator-Guide


@@ -1,12 +0,0 @@
hscommon.jobprogress.qt
=======================

.. automodule:: hscommon.jobprogress.qt

.. autosummary::

    Progress

.. autoclass:: Progress
    :members:


@@ -151,8 +151,6 @@ delete files" option that is offered to you when you activate Send to Trash. Thi
files to the Trash, but delete them immediately. In some cases, for example on network storage
(NAS), this has been known to work when normal deletion didn't.

Why is Picture mode's contents scan so slow?
--------------------------------------------
@@ -178,7 +176,6 @@ Preferences are stored elsewhere:
* Linux: ``~/.config/Hardcoded Software/dupeGuru.conf``
* Mac OS X: In the built-in ``defaults`` system, as ``com.hardcoded-software.dupeguru``

.. _Github: https://github.com/arsenetar/dupeguru
.. _open an issue: https://github.com/arsenetar/dupeguru/wiki/issue-labels


@@ -3,11 +3,11 @@ dupeGuru help
This help document is also available in these languages:

* `French <http://dupeguru.voltaicideas.net/help/fr>`__
* `German <http://dupeguru.voltaicideas.net/help/de>`__
* `Armenian <http://dupeguru.voltaicideas.net/help/hy>`__
* `Russian <http://dupeguru.voltaicideas.net/help/uk>`__

dupeGuru is a tool to find duplicate files on your computer. It has three
modes, Standard, Music and Picture, with each mode having its own scan types
@@ -42,4 +42,4 @@ Indices and tables
* :ref:`genindex`
* :ref:`search`

.. _homepage: https://dupeguru.voltaicideas.net/


@@ -3,7 +3,7 @@ Foire aux questions
.. contents::

What is dupeGuru?
------------------------

.. only:: edition_se


@@ -1,21 +1,13 @@
dupeGuru Help
===============

.. only:: edition_se

    This document is also available in `English <http://dupeguru.voltaicideas.net/help/en/>`__, `German <http://dupeguru.voltaicideas.net/help/de/>`__ and `Armenian <http://dupeguru.voltaicideas.net/help/hy/>`__.

.. only:: edition_se or edition_me

    dupeGuru is a tool for finding duplicate files. It can compare either file names or contents. The filename comparison can find duplicates even when the names are not exactly the same.

.. only:: edition_pe
@@ -23,7 +15,7 @@ Aide |appname|
Although dupeGuru can be used without reading the help, doing so will give you a good understanding of how the application works. For a quick guide to a first scan, see the :doc:`Quick Start <quick_start>` section.

It is always a good idea to keep dupeGuru up to date. You can download the latest version at http://dupeguru.voltaicideas.net.

Contents:


@@ -1,7 +1,7 @@
Frequently Asked Questions
==========================

.. topic:: What is dupeGuru?

    .. only:: edition_se


@@ -1,21 +1,13 @@
dupeGuru help
===============

.. only:: edition_se

    This document is also available in `French <http://dupeguru.voltaicideas.net/help/fr/>`__ and `German <http://dupeguru.voltaicideas.net/help/de/>`__.

.. only:: edition_se or edition_me

    dupeGuru is a program for finding duplicate files on your computer. It can check either file names or contents. The filename check uses a fuzzy matching algorithm that can find duplicate file names even when they are not exactly the same.

.. only:: edition_pe
@@ -23,7 +15,7 @@
Although dupeGuru can easily be used without help, reading this file will greatly help you understand how the program works. If you are looking for a guide for your first duplicate scan, see the :doc:`Quick Start <quick_start>` section.

It is a very good idea to keep dupeGuru up to date. You can download the latest version from http://dupeguru.voltaicideas.net.

Contents:


@@ -1,7 +1,7 @@
Frequently Asked Questions
==========================

.. topic:: What is dupeGuru?

    .. only:: edition_se
