* match all orientations
* use rotation as option
---------
Co-authored-by: Andrew Senetar <arsenetar@gmail.com>
Co-authored-by: Luke <byunghun.hyun26@gmail.com>
- Update NullJob to subclass Job
- Remove unnecessary size pre-read in _getMatches() as file sizes are
already loaded during file scan via stat call
- Skip the ref check for contents scans, as the scan already prevents this
from happening. Some of the other scans do things differently and need to
be reviewed before removing this post step completely
- Add guard on partial hashing to just hash the whole file if smaller
than the offset and size and use the value for both the partial digest
and digest
- Add "safe" existence check to files which catches OSErrors that may
occur when trying to stat files
- Use "safe" existence check during final existence check
- Remove lock on read operations, only needed for write operations
- Change to use context manager for sqlite connection
- Remove long lived cursor object and use short lived cursors instead
Fixes #1080
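A minimal sketch of the pattern described in the three items above: the lock is taken only for writes, the connection is used as a context manager, and each read gets its own short-lived cursor. The `HashCache` class, the table name and its columns are hypothetical and are not taken from dupeGuru's code.

```python
import sqlite3
import threading
from contextlib import closing


class HashCache:
    """Hypothetical cache wrapper; the schema is illustrative only."""

    def __init__(self, path=":memory:"):
        self.con = sqlite3.connect(path, check_same_thread=False)
        self.lock = threading.Lock()
        with self.con:
            self.con.execute(
                "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, digest BLOB)"
            )

    def put(self, path, digest):
        # Writes take the lock. `with self.con` commits on success and
        # rolls back if an exception is raised; it does not close the connection.
        with self.lock, self.con:
            self.con.execute(
                "INSERT OR REPLACE INTO files (path, digest) VALUES (?, ?)",
                (path, digest),
            )

    def get(self, path):
        # Reads use a short-lived cursor and no lock.
        with closing(self.con.cursor()) as cur:
            cur.execute("SELECT digest FROM files WHERE path = ?", (path,))
            row = cur.fetchone()
        return row[0] if row else None
```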
- Add exclude pattern for flake8 when running with pre-commit as it does
not fully honor the exclude paths.
- Cleanup exclude paths for flake8 in tox.ini
- Re-enable line length check and correct three affected files
- Add option to include file existence check at end of scan. Disabling it
speeds up the end of scan considerably; however, if the user has removed
or moved files since starting a scan, there could be later errors when
interacting with results. Defaults to the existing behavior of including
the check, until it can be verified that later dialogs and actions handle
non-existent items better.
- Add option to ignore differences in mtime when checking hash cache.
Option is present in advanced tab of preferences. Closes #1022.
- Regenerate pot files for translations
- Move the try/except of cache db calls to the calls themselves.
- Add some additional information to logging statements on cache db
exception to improve troubleshooting.
- Add Callable type to hasher (should really be more specific...)
- Add type hint to COLUMNS in qtlib/table.py
- Use Qt.ItemFlag.ItemIsEnabled instead of Qt.itemIsEnabled in qtlib/table.py
- Update to get size and mtime at time of class creation when os.DirEntry is used for initialization.
- Folders still calculate size later for folder scans.
- Ref #962, #959
- Change to use os.scandir() instead of os.walk() to leverage DirEntry objects.
- Avoids extra calls to stat() on files during fs.can_handle()
- See 3x speed improvement on Windows in some cases
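As a rough illustration of why `os.scandir()` helps here, the sketch below walks a tree and reads size and mtime from each `DirEntry`. The `iter_files` helper is hypothetical and is not dupeGuru's implementation.

```python
import os


def iter_files(root):
    """Yield (path, size, mtime) for every regular file under root."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files(entry.path)
            elif entry.is_file(follow_symlinks=False):
                # The stat result is cached on the DirEntry, so later
                # accesses do not hit the filesystem again. On Windows the
                # directory listing usually provides it without an extra call.
                info = entry.stat(follow_symlinks=False)
                yield entry.path, info.st_size, info.st_mtime
```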
commit 8b15fe9a502ebf4841c6529e7098cef03a6a5e6f
Author: Andrew Senetar <arsenetar@gmail.com>
Date: Sun Mar 27 23:48:15 2022 -0500
Finish up changes to copy_or_move
commit 21f6a32cf3186a400af8f30e67ad2743dc9a49bd
Author: Andrew Senetar <arsenetar@gmail.com>
Date: Thu Mar 17 23:56:52 2022 -0500
Migrate from hscommon.path to pathlib
- Part one, this gets all hscommon and core tests passing
- App appears to be able to load directories and complete scans, need further testing
- app.py copy_or_move needs some additional work
commit 5eb515f666bfa1ff06c2e96bdc351a4b7456580e
Author: Andrew Senetar <arsenetar@gmail.com>
Date: Sun Mar 27 22:19:39 2022 -0500
Add fallback to md5 if xxhash not available
Mainly here for the case when distributions have not packaged python3-xxhash.
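One common way to express such a fallback is a guarded import. This is a sketch only; the `hasher` and `digest_of` names are illustrative.

```python
import hashlib

try:
    import xxhash  # third-party and optional

    hasher = xxhash.xxh128
except ImportError:
    # Distribution did not package python3-xxhash: fall back to md5.
    hasher = hashlib.md5


def digest_of(data: bytes) -> bytes:
    """Return a digest of data using whichever hasher is available."""
    return hasher(data).digest()
```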
commit 51b18d4c84
Author: Andrew Senetar <arsenetar@gmail.com>
Date: Sat Mar 19 15:25:46 2022 -0500
Switch file hashing to xxhash instead of md5
- Improves performance significantly in some cases
- Add xxhash to requirements.txt and sort requirements
- Rename md5 based members to digest
- Update all tests to use new member names and hashing methods
- Update hash db code to upgrade schema
NOTE: May consider supporting multiple hashing algorithms in the future.
Computing 3 hash samples for files smaller than 3 MiB (3 * CHUNK_SIZE) is not efficient, since the spans of later samples would overlap an earlier one.
We can therefore simply return the hash of the entire small file instead.
* Instead of keeping md5 samples separate, merge them as one hash computed from the various selected chunks we picked.
* We don't need to keep a boolean to see whether or not the user chose to optimize; we can simply compare the value of the threshold, since 0 means no optimization currently active.
* Big files above the user selected threshold can be partially hashed in 3 places.
* If the user is willing to take the risk, we consider files with identical md5samples as being identical.
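The sampling scheme described above could look roughly like the following sketch. The sample positions (start, middle, end), the function names and the use of `hashlib.md5` are assumptions made for illustration; the notes above only state that three places are hashed and merged into one digest.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB, so 3 * CHUNK_SIZE is the 3 MiB mentioned above


def sample_digest(path, chunk_size=CHUNK_SIZE):
    """Return one digest built from three chunks of the file."""
    size = os.path.getsize(path)
    file_hash = hashlib.md5()
    with open(path, "rb") as fp:
        if size < 3 * chunk_size:
            # Samples would overlap, so hash the whole (small) file instead.
            file_hash.update(fp.read())
            return file_hash.digest()
        # Assumed positions: first chunk, middle chunk, last chunk.
        for offset in (0, (size - chunk_size) // 2, size - chunk_size):
            fp.seek(offset)
            file_hash.update(fp.read(chunk_size))
    return file_hash.digest()


def digest_for_scan(path, threshold=0, chunk_size=CHUNK_SIZE):
    """Sample only when a non-zero threshold is set and the file exceeds it."""
    if threshold > 0 and os.path.getsize(path) > threshold:
        return sample_digest(path, chunk_size)
    # threshold == 0 means the optimization is off: hash everything.
    # Reading the file in one call keeps the sketch short.
    with open(path, "rb") as fp:
        return hashlib.md5(fp.read()).digest()
```

Because all three chunks feed a single hash object, two files compare equal under this scheme only if all sampled regions match, which is the risk the user accepts when enabling the threshold.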
- Format all files with black
- Update tox.ini flake8 arguments to be compatible
- Add black to requirements-extra.txt
- Reduce ignored flake8 rules and fix a few violations
Refactored dupeGuru to make hscommon.path's API a bit closer to pathlib's
API. It's not 100% compatible yet, but it's much better than before.
This is more of a hscommon refactoring than a dupeguru one, but since
dupeGuru is the main user of Path, it was the driver behind the
refactoring.
This refactoring also sees the introduction of @pathify, which ensures
Path arguments. Previously, we were often unsure of whether the caller
of a function was passing a Path or a str. This problem is now solved
and this allows us to remove hscommon.io, an ill-conceived attempt to
solve that same ambiguity problem.
Fixes #235.
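To illustrate the idea behind a decorator like @pathify, here is a hypothetical sketch that converts arguments annotated as `Path` into `Path` instances. It uses `pathlib.Path` as a stand-in for hscommon's own Path class and is not the actual hscommon implementation.

```python
import functools
import inspect
from pathlib import Path


def pathify(func):
    """Convert arguments annotated as Path into Path instances."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in list(bound.arguments.items()):
            annotated_as_path = sig.parameters[name].annotation is Path
            if annotated_as_path and not isinstance(value, Path):
                bound.arguments[name] = Path(value)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@pathify
def extension_of(path: Path) -> str:
    # Inside the function, `path` is always a Path, even if a str was passed.
    return path.suffix
```

With this in place, `extension_of("a/b.txt")` and `extension_of(Path("a/b.txt"))` behave the same, so callers no longer need to care which type they hold.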