1
0
mirror of https://github.com/arsenetar/dupeguru.git synced 2024-11-05 07:49:02 +00:00
Commit Graph

53 Commits

Author SHA1 Message Date
99ec4e0f27
fix: Minor cleanups and fixes
- Update NullJob to subclass Job
- Remove unnecessary size pre-read in _getMatches() as file sizes are
  already loaded during file scan via stat call
- Skip ref check if contents scan as the scan already prevents this from
  happening, some of the other scans do things differently and need to
  be reviewed before removing this post step completely
- Add guard on partial hashing to just hash the whole file if smaller
  than the offset and size and use the value for both the partial digest
  and digest
2023-06-08 01:14:52 -05:00
057be0294a
fix: Prevent exception during existence check
- Add "safe" existence check to files which catches OSErrors that may
  occur when trying to stat files
- Use "safe" existence check during final existence check
2023-01-11 23:07:06 -06:00
81daddd072
refactor: Improve digest cache db method performance
- Remove lock on read operations, only needed for write operations
- Change to use context manager for sqlite connection
- Remove long lived cursor object and use short lived cursors instead

Fixes #1080
2023-01-11 00:58:29 -06:00
6db2fa2be6
fix: Correct flake8 config
- Add exclude pattern for flake8 when running with pre-commit as it does
  not fully honor the exclude paths.
- Cleanup exclude paths for flake8 in tox.ini
- Re-enable line length check and correct three affected files
2023-01-09 22:35:12 -06:00
e30a135451
feat: Add additional scan time options
- Add option to include file existence check at end of scan, speeds up
  end of scan operation time considerably, however if user has removed
  or moved files since starting a scan there could be later errors when
  interacting with results.  Defaults to existing behavior of including
  the check, until it can be verified later dialogs and actions handle
  non-existent items better.
- Add option to ignore differences in mtime when checking hash cache.
  Option is present in advanced tab of preferences.  Closes #1022.
- Regenerate pot files for translations
2023-01-05 23:01:16 -06:00
71af825b37
Move try/except of cache db to get() and put()
- Move the try/except of cache db calls to the calls themselves.
- Add some additional information to logging statements on cache db
  exception to improve troubleshooting.
2022-07-07 21:52:22 -05:00
0a4e61edf5
Additional cleanup per mypy
- Add Callable type to hasher (should realy be more specific...)
- Add type hint to COLUMNS in qtlib/table.py
- Use Qt.ItemFlag.ItemIsEnabled instead of Qt.itemIsEnabled in qtlib/table.py
2022-04-30 05:16:46 -05:00
63dd4d4561
Apply pyupgrade changes 2022-04-27 20:53:12 -05:00
a470a8de25
Update fs.py to optimize stat() calls
- Update to get size and mtime at time of class creation when os.DirEntry is used for initialization.
- Folders still calculate size later for folder scans.
- Ref #962, #959
2022-03-30 22:58:01 -05:00
efd500ecc1
Update directory scanning to use os.scandir()
- Change to use os.scandir() instead of os.walk() to leverage DirEntry objects.
- Avoids extra calls to stat() on files during fs.can_handle()
- See 3x speed improvement on Windows in some cases
2022-03-29 23:37:56 -05:00
43fcc52291
Replace pathlib.glob() with os.scandir() in fs.py 2022-03-29 22:35:38 -05:00
50f5db1543
Update fs to support DirEntry on get_file() 2022-03-29 22:32:36 -05:00
da9f8b2b9d
Squashed commit of the following:
commit 8b15fe9a502ebf4841c6529e7098cef03a6a5e6f
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Sun Mar 27 23:48:15 2022 -0500

    Finish up changes to copy_or_move

commit 21f6a32cf3186a400af8f30e67ad2743dc9a49bd
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Thu Mar 17 23:56:52 2022 -0500

    Migrate from hscommon.path to pathlib
    - Part one, this gets all hscommon and core tests passing
    - App appears to be able to load directories and complete scans, need further testing
    - app.py copy_or_move needs some additional work
2022-03-27 23:50:03 -05:00
9f40e4e786
Squashed commit of the following:
commit 5eb515f666bfa1ff06c2e96bdc351a4b7456580e
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Sun Mar 27 22:19:39 2022 -0500

    Add fallback to md5 if xxhash not available

    Mainly here for the case when distributions have not packaged python3-xxhash.

commit 51b18d4c84
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Sat Mar 19 15:25:46 2022 -0500

    Switch file hashing to xxhash instead of md5

    - Improves performance significantly in some cases
    - Add xxhash to requirements.txt and sort requirements
    - Rename md5 based members to digest
    - Update all tests to use new member names and hashing methods
    - Update hash db code to upgrade schema

    NOTE: May consider supporting multiple hashing algorithms in the future.
2022-03-27 22:27:13 -05:00
Dobatymo
9753afba74 change FilesDB to singleton class
move hash calculation back in to Files class
clear cache now clears hash cache in addition to picture cache
2021-10-29 15:12:40 +08:00
Dobatymo
2f02a6010d implement hash cache for md5 hash based on sqlite 2021-10-29 15:12:40 +08:00
d576a7043c
Code cleanups in core and other affected files 2021-08-21 18:02:02 -05:00
ffe6b7047c
Format all files with black correcting line length 2021-08-15 04:10:18 -05:00
glubsy
e95306e58f Fix flake 8 2021-08-14 02:52:00 +02:00
glubsy
891a875990 Cache constant expression
Perhaps the python byte code is already optimized, but just in case it is not, keep pre-compute the constant expression.
2021-08-13 21:33:21 +02:00
glubsy
545a5a75fb Fix for older python versions
The "walrus" operator is only available in python 3.8 and later. Fall back to more traditional notation.
2021-08-13 20:56:33 +02:00
glubsy
7b764f183e Avoid partially hashing small files
Computing 3 hash samples for files less than 3MiB (3 * CHUNK_SIZE) is not efficient since spans of later samples would overlap a previous one.
Therefore we can simply return the hash of the entire small file instead.
2021-08-13 20:47:01 +02:00
glubsy
718ca5b313 Remove unused import 2021-06-22 02:41:33 +02:00
glubsy
277bc3fbb8 Add unit tests for hash sample optimization
* Instead of keeping md5 samples separate, merge them as one hash computed from the various selected chunks we picked.
* We don't need to keep a boolean to see whether or not the user chose to optimize; we can simply compare the value of the threshold, since 0 means no optimization currently active.
2021-06-21 22:44:05 +02:00
glubsy
e07dfd5955 Add partial hashes optimization for big files
* Big files above the user selected threshold can be partially hashed in 3 places.
* If the user is willing to take the risk, we consider files with identical md5samples as being identical.
2021-06-21 19:03:21 +02:00
7ba8aa3514
Format files with black
- Format all files with black
- Update tox.ini flake8 arguments to be compatible
- Add black to requirements-extra.txt
- Reduce ignored flake8 rules and fix a few violations
2019-12-31 20:16:27 -06:00
Virgil Dupras
334f4dd2ae Increase md5 reading buffer to 1mb
This makes md5 computing faster without using too much memory.
2016-06-08 12:23:10 -04:00
Virgil Dupras
e7076bc3bd Change license from BSD to GPLv3
See http://www.hardcoded.net/archive2014#2014-12-28 for context
2015-01-03 16:33:16 -05:00
Virgil Dupras
fc16ea8c49 Change copyright year to 2015 2015-01-03 16:30:57 -05:00
Virgil Dupras
2166a0996c Added tox configuration
... and fixed pep8 warnings. There's a lot of them that are still
ignored, but that's because it's too much of a step to take at once.
2014-10-13 15:08:59 -04:00
Virgil Dupras
ca709a60cf Updated copyright year to 2014 2014-04-19 12:19:11 -04:00
Virgil Dupras
10dbfa9b38 Refactoring: Path API compatibility with pathlib
Refactored dupeGuru to make hscommon.path's API a bit close to pathlib's
API. It's not 100% compatible yet, but it's much better than before.

This is more of a hscommon refactoring than a dupeguru one, but since
duepGuru is the main user of Path, it was the driver behind the
refactoring.

This refactoring also see the introduction of @pathify, which ensure
Path arguments. Previously, we were often unsure of whether the caller
of a function was passing a Path or a str. This problem is now solved
and this allows us to remove hscommon.io, an ill-conceived attempt to
solve that same ambiguity problem.

Fixes #235.
2013-11-16 12:06:16 -05:00
Virgil Dupras
be8efea081 Fixed folder scanning in SE, which was completely broken
Oops
2013-08-18 20:50:31 -04:00
Virgil Dupras
7e8f9036d8 Began serious code documentation effort
Enabled the autodoc Sphinx extension and started adding docstrings to
classes, methods, etc.. It's quickly becoming quite interesting...
2013-08-18 18:36:09 -04:00
Virgil Dupras
7891fb5396 Refactoring: Moved some code from app.DupeGuru to fs.File.
Moved DupeGuru._get_display_info() to File.get_display_info().
This method used none of the app's global state or methods
and had nothing to do there.
2013-07-14 17:43:58 -04:00
Virgil Dupras
4a8ce9b6c4 Updated copyright year to 2013. 2013-04-28 10:35:51 -04:00
Virgil Dupras
df30a31782 Refactoring: Began to phase out to the use of hscommon.io in favor of Path methods. 2012-08-09 10:53:24 -04:00
Virgil Dupras
1171705921 Made core.fs.File slotted to save a lot of memory usage. 2012-05-29 17:39:54 -04:00
Virgil Dupras
657f6743c2 Changed copyright year to 2012 2012-03-15 14:28:40 -04:00
Virgil Dupras
56207f4dbb [#161 state:fixed] Fixed folder sorting. 2011-06-15 11:58:33 -04:00
Virgil Dupras
0b20b35ffb Fixed copying operations for folders which didn't work. 2011-04-14 12:55:50 +02:00
Virgil Dupras
279d44b7f3 [#89 state:fixed] Added a Folders scan type in dgse.
--HG--
rename : core_se/tests/fs_test.py => core/tests/fs_test.py
2011-04-12 13:22:29 +02:00
Virgil Dupras
0fea59007c Updated copyright year to 2011. 2011-04-12 10:04:01 +02:00
Virgil Dupras
eefe464fba Replaced dependencies from hsutil to hscommon. 2011-01-11 13:36:05 +01:00
Virgil Dupras
33c0ba808c Changed references to what has already been moved from hsutil to hscommon (io, path, testutil). 2011-01-11 11:59:53 +01:00
Virgil Dupras
4886982d43 Re-licensed to BSD 2010-09-30 12:17:41 +02:00
Virgil Dupras
565c990687 [#101 state:fixed] Remove the Creation Time column. 2010-08-13 09:26:38 +02:00
Virgil Dupras
854d194f88 Converted to py3k. There's probably some bugs still. So far, I managed to run dupeGuru SE under pyobjc and qt. 2010-08-11 16:39:06 +02:00
Virgil Dupras
b372974437 [#84 state:hold] Added debug logging to fs.get_files() to eventually figure out the cause of this bug. 2010-02-05 17:55:47 +01:00
Virgil Dupras
9f006ec08a [#75 state:fixed] md5 hashes are now computed incrementally. 2010-01-13 08:59:44 +01:00