mirror of https://github.com/arsenetar/dupeguru.git synced 2024-10-31 22:05:58 +00:00

52 Commits

Author SHA1 Message Date
057be0294a
fix: Prevent exception during existence check
- Add "safe" existence check to files which catches OSErrors that may
  occur when trying to stat files
- Use "safe" existence check during final existence check
2023-01-11 23:07:06 -06:00
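A minimal sketch of what a "safe" existence check might look like, assuming a pathlib-based file path. The helper name `safe_exists` is hypothetical and is not taken from the dupeGuru source; it only illustrates catching OSError around the stat() call.

```python
from pathlib import Path


def safe_exists(path: Path) -> bool:
    """Return True if the path exists; treat an OSError raised while stat()-ing as 'missing'."""
    try:
        return path.exists()
    except OSError:
        # Errors such as permission problems or unreachable network paths
        # are swallowed here so the caller does not have to handle them.
        return False
```

With a helper like this, a final existence pass over scan results can drop unreachable entries instead of aborting on the first failing stat().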
81daddd072
refactor: Improve digest cache db method performance
- Remove lock on read operations, only needed for write operations
- Change to use context manager for sqlite connection
- Remove long lived cursor object and use short lived cursors instead

Fixes #1080
2023-01-11 00:58:29 -06:00
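The three changes listed in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: the class name `DigestCache`, the table layout, and the column names are all hypothetical.

```python
import sqlite3
from threading import Lock


class DigestCache:
    """Hypothetical digest cache: lock on writes only, short-lived cursors."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path, check_same_thread=False)
        self.lock = Lock()
        # The connection context manager commits on success and rolls back on error.
        with self.conn as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS digests "
                "(path TEXT PRIMARY KEY, mtime_ns INTEGER, digest BLOB)"
            )

    def get(self, path, mtime_ns):
        # Reads take no lock and use a cursor that lives only for this call.
        cur = self.conn.execute(
            "SELECT digest FROM digests WHERE path = ? AND mtime_ns = ?",
            (path, mtime_ns),
        )
        row = cur.fetchone()
        cur.close()
        return row[0] if row else None

    def put(self, path, mtime_ns, digest):
        # Writes are serialized with the lock and committed by the context manager.
        with self.lock, self.conn as conn:
            conn.execute(
                "INSERT OR REPLACE INTO digests VALUES (?, ?, ?)",
                (path, mtime_ns, digest),
            )
```

Note that `with conn:` in sqlite3 manages the transaction only; it does not close the connection.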
6db2fa2be6
fix: Correct flake8 config
- Add exclude pattern for flake8 when running with pre-commit as it does
  not fully honor the exclude paths.
- Cleanup exclude paths for flake8 in tox.ini
- Re-enable line length check and correct three affected files
2023-01-09 22:35:12 -06:00
e30a135451
feat: Add additional scan time options
- Add option to include file existence check at end of scan. Skipping the
  check speeds up the end of a scan considerably; however, if the user has
  removed or moved files since starting the scan, there could be later errors
  when interacting with results.  Defaults to the existing behavior of
  including the check until it can be verified that later dialogs and actions
  handle non-existent items better.
- Add option to ignore differences in mtime when checking hash cache.
  Option is present in advanced tab of preferences.  Closes #1022.
- Regenerate pot files for translations
2023-01-05 23:01:16 -06:00
71af825b37
Move try/except of cache db to get() and put()
- Move the try/except of cache db calls to the calls themselves.
- Add some additional information to logging statements on cache db
  exception to improve troubleshooting.
2022-07-07 21:52:22 -05:00
0a4e61edf5
Additional cleanup per mypy
- Add Callable type to hasher (should really be more specific...)
- Add type hint to COLUMNS in qtlib/table.py
- Use Qt.ItemFlag.ItemIsEnabled instead of Qt.itemIsEnabled in qtlib/table.py
2022-04-30 05:16:46 -05:00
63dd4d4561
Apply pyupgrade changes 2022-04-27 20:53:12 -05:00
a470a8de25
Update fs.py to optimize stat() calls
- Update to get size and mtime at time of class creation when os.DirEntry is used for initialization.
- Folders still calculate size later for folder scans.
- Ref #962, #959
2022-03-30 22:58:01 -05:00
efd500ecc1
Update directory scanning to use os.scandir()
- Change to use os.scandir() instead of os.walk() to leverage DirEntry objects.
- Avoids extra calls to stat() on files during fs.can_handle()
- See 3x speed improvement on Windows in some cases
2022-03-29 23:37:56 -05:00
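The benefit of os.scandir() described above comes from the DirEntry objects it yields, which cache file type and (on Windows) stat information from the directory listing itself. A minimal sketch, with a hypothetical `iter_files` helper that is not part of dupeGuru:

```python
import os


def iter_files(root):
    """Recursively yield (path, size, mtime) for regular files under root."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files(entry.path)
            elif entry.is_file(follow_symlinks=False):
                # entry.stat() is cached on the DirEntry; on Windows it usually
                # needs no extra system call beyond the directory listing.
                st = entry.stat(follow_symlinks=False)
                yield entry.path, st.st_size, st.st_mtime
```

Capturing size and mtime at this point is also what lets a file object be initialized from a DirEntry without a later stat() call.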
43fcc52291
Replace pathlib.glob() with os.scandir() in fs.py 2022-03-29 22:35:38 -05:00
50f5db1543
Update fs to support DirEntry on get_file() 2022-03-29 22:32:36 -05:00
da9f8b2b9d
Squashed commit of the following:
commit 8b15fe9a502ebf4841c6529e7098cef03a6a5e6f
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Sun Mar 27 23:48:15 2022 -0500

    Finish up changes to copy_or_move

commit 21f6a32cf3186a400af8f30e67ad2743dc9a49bd
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Thu Mar 17 23:56:52 2022 -0500

    Migrate from hscommon.path to pathlib
    - Part one, this gets all hscommon and core tests passing
    - App appears to be able to load directories and complete scans, need further testing
    - app.py copy_or_move needs some additional work
2022-03-27 23:50:03 -05:00
9f40e4e786
Squashed commit of the following:
commit 5eb515f666bfa1ff06c2e96bdc351a4b7456580e
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Sun Mar 27 22:19:39 2022 -0500

    Add fallback to md5 if xxhash not available

    Mainly here for the case when distributions have not packaged python3-xxhash.

commit 51b18d4c84
Author: Andrew Senetar <arsenetar@gmail.com>
Date:   Sat Mar 19 15:25:46 2022 -0500

    Switch file hashing to xxhash instead of md5

    - Improves performance significantly in some cases
    - Add xxhash to requirements.txt and sort requirements
    - Rename md5 based members to digest
    - Update all tests to use new member names and hashing methods
    - Update hash db code to upgrade schema

    NOTE: May consider supporting multiple hashing algorithms in the future.
2022-03-27 22:27:13 -05:00
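The fallback described in this squashed commit can be sketched as an import guard. The choice of `xxhash.xxh128` and the helper name `digest_bytes` are assumptions for illustration; the commit message does not say which xxhash variant is used.

```python
import hashlib

try:
    import xxhash

    # Assumption: the 128-bit variant; the commit does not name the variant.
    hasher = xxhash.xxh128
except ImportError:
    # Fallback for distributions that have not packaged python3-xxhash.
    hasher = hashlib.md5


def digest_bytes(data: bytes) -> bytes:
    """Hash data with whichever algorithm was selected at import time."""
    h = hasher()
    h.update(data)
    return h.digest()
```

Because both objects expose `update()` and `digest()`, the rest of the code can stay agnostic about which algorithm is active. Digests from the two algorithms are not interchangeable, which is presumably why the hash db schema had to be upgraded.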
Dobatymo
9753afba74 change FilesDB to singleton class
move hash calculation back in to Files class
clear cache now clears hash cache in addition to picture cache
2021-10-29 15:12:40 +08:00
Dobatymo
2f02a6010d implement hash cache for md5 hash based on sqlite 2021-10-29 15:12:40 +08:00
d576a7043c
Code cleanups in core and other affected files 2021-08-21 18:02:02 -05:00
ffe6b7047c
Format all files with black correcting line length 2021-08-15 04:10:18 -05:00
glubsy
e95306e58f Fix flake 8 2021-08-14 02:52:00 +02:00
glubsy
891a875990 Cache constant expression
Perhaps the Python bytecode is already optimized, but just in case it is not, pre-compute the constant expression.
2021-08-13 21:33:21 +02:00
glubsy
545a5a75fb Fix for older python versions
The "walrus" operator is only available in python 3.8 and later. Fall back to more traditional notation.
2021-08-13 20:56:33 +02:00
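For illustration, a chunked read loop written without the walrus operator might look like the following. The function name `iter_chunks` is hypothetical; the Python 3.8+ form is shown only as a comment.

```python
import io


def iter_chunks(fp, chunk_size):
    """Yield successive chunks from fp until it is exhausted."""
    # Python 3.8+ equivalent:
    #     while chunk := fp.read(chunk_size):
    #         yield chunk
    chunk = fp.read(chunk_size)
    while chunk:
        yield chunk
        chunk = fp.read(chunk_size)


chunks = list(iter_chunks(io.BytesIO(b"abcdefgh"), 3))
# chunks == [b"abc", b"def", b"gh"]
```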
glubsy
7b764f183e Avoid partially hashing small files
Computing 3 hash samples for files less than 3MiB (3 * CHUNK_SIZE) is not efficient since spans of later samples would overlap a previous one.
Therefore we can simply return the hash of the entire small file instead.
2021-08-13 20:47:01 +02:00
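A rough sketch of the idea in this commit, assuming a 1 MiB CHUNK_SIZE and samples taken at the start, middle, and end of the file. The sample positions and the function name `digest_samples` are assumptions; the commit only states that three samples are taken and that files under 3 * CHUNK_SIZE are hashed in full.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB


def digest_samples(path, chunk_size=CHUNK_SIZE):
    """Hash three samples of a large file, or the whole file if it is small."""
    size = os.path.getsize(path)
    h = hashlib.md5()
    with open(path, "rb") as fp:
        if size < 3 * chunk_size:
            # Samples would overlap, so hash the entire file incrementally.
            for chunk in iter(lambda: fp.read(chunk_size), b""):
                h.update(chunk)
            return h.digest()
        # Assumed sample offsets: start, middle, and end of the file.
        for offset in (0, (size - chunk_size) // 2, size - chunk_size):
            fp.seek(offset)
            h.update(fp.read(chunk_size))
    return h.digest()
```

All three samples feed a single hash object, matching the earlier change that merged the separate md5 samples into one hash.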
glubsy
718ca5b313 Remove unused import 2021-06-22 02:41:33 +02:00
glubsy
277bc3fbb8 Add unit tests for hash sample optimization
* Instead of keeping md5 samples separate, merge them as one hash computed from the various selected chunks we picked.
* We don't need to keep a boolean to see whether or not the user chose to optimize; we can simply compare the value of the threshold, since 0 means no optimization currently active.
2021-06-21 22:44:05 +02:00
glubsy
e07dfd5955 Add partial hashes optimization for big files
* Big files above the user selected threshold can be partially hashed in 3 places.
* If the user is willing to take the risk, we consider files with identical md5samples as being identical.
2021-06-21 19:03:21 +02:00
7ba8aa3514
Format files with black
- Format all files with black
- Update tox.ini flake8 arguments to be compatible
- Add black to requirements-extra.txt
- Reduce ignored flake8 rules and fix a few violations
2019-12-31 20:16:27 -06:00
Virgil Dupras
334f4dd2ae Increase md5 reading buffer to 1mb
This makes md5 computing faster without using too much memory.
2016-06-08 12:23:10 -04:00
Virgil Dupras
e7076bc3bd Change license from BSD to GPLv3
See http://www.hardcoded.net/archive2014#2014-12-28 for context
2015-01-03 16:33:16 -05:00
Virgil Dupras
fc16ea8c49 Change copyright year to 2015 2015-01-03 16:30:57 -05:00
Virgil Dupras
2166a0996c Added tox configuration
... and fixed pep8 warnings. There's a lot of them that are still
ignored, but that's because it's too much of a step to take at once.
2014-10-13 15:08:59 -04:00
Virgil Dupras
ca709a60cf Updated copyright year to 2014 2014-04-19 12:19:11 -04:00
Virgil Dupras
10dbfa9b38 Refactoring: Path API compatibility with pathlib
Refactored dupeGuru to make hscommon.path's API a bit closer to pathlib's
API. It's not 100% compatible yet, but it's much better than before.

This is more of a hscommon refactoring than a dupeguru one, but since
dupeGuru is the main user of Path, it was the driver behind the
refactoring.

This refactoring also sees the introduction of @pathify, which ensures
Path arguments. Previously, we were often unsure of whether the caller
of a function was passing a Path or a str. This problem is now solved
and this allows us to remove hscommon.io, an ill-conceived attempt to
solve that same ambiguity problem.

Fixes #235.
2013-11-16 12:06:16 -05:00
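To illustrate the idea behind a decorator like @pathify, here is a sketch that converts arguments annotated as Path. It uses pathlib.Path rather than hscommon.path, and it is an assumption about how such a decorator could work, not hscommon's actual implementation; `parent_name` is a hypothetical example function.

```python
import functools
import inspect
from pathlib import Path


def pathify(func):
    """Convert arguments annotated as Path into Path instances before the call."""
    sig = inspect.signature(func)
    path_params = {name for name, p in sig.parameters.items() if p.annotation is Path}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name in path_params:
            value = bound.arguments.get(name)
            if value is not None and not isinstance(value, Path):
                bound.arguments[name] = Path(value)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@pathify
def parent_name(path: Path) -> str:
    # The body can rely on `path` being a Path, whatever the caller passed.
    return path.parent.name
```

Callers may then pass either a str or a Path, and the decorated function no longer needs to guess which one it received.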
Virgil Dupras
be8efea081 Fixed folder scanning in SE, which was completely broken
Oops
2013-08-18 20:50:31 -04:00
Virgil Dupras
7e8f9036d8 Began serious code documentation effort
Enabled the autodoc Sphinx extension and started adding docstrings to
classes, methods, etc. It's quickly becoming quite interesting...
2013-08-18 18:36:09 -04:00
Virgil Dupras
7891fb5396 Refactoring: Moved some code from app.DupeGuru to fs.File.
Moved DupeGuru._get_display_info() to File.get_display_info().
This method used none of the app's global state or methods
and had nothing to do there.
2013-07-14 17:43:58 -04:00
Virgil Dupras
4a8ce9b6c4 Updated copyright year to 2013. 2013-04-28 10:35:51 -04:00
Virgil Dupras
df30a31782 Refactoring: Began to phase out the use of hscommon.io in favor of Path methods. 2012-08-09 10:53:24 -04:00
Virgil Dupras
1171705921 Made core.fs.File slotted to save a lot of memory usage. 2012-05-29 17:39:54 -04:00
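As a reminder of what "slotted" means here, the sketch below declares `__slots__` on a simplified class. The field names are hypothetical and do not reflect the real core.fs.File attributes.

```python
class File:
    """Simplified stand-in for a slotted file class."""

    # With __slots__, instances have no per-instance __dict__, which saves
    # memory when a scan creates a very large number of File objects.
    __slots__ = ("path", "size", "mtime")

    def __init__(self, path, size, mtime):
        self.path = path
        self.size = size
        self.mtime = mtime
```

The trade-off is that attributes not listed in `__slots__` cannot be assigned on instances.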
Virgil Dupras
657f6743c2 Changed copyright year to 2012 2012-03-15 14:28:40 -04:00
Virgil Dupras
56207f4dbb [#161 state:fixed] Fixed folder sorting. 2011-06-15 11:58:33 -04:00
Virgil Dupras
0b20b35ffb Fixed copying operations for folders which didn't work. 2011-04-14 12:55:50 +02:00
Virgil Dupras
279d44b7f3 [#89 state:fixed] Added a Folders scan type in dgse.
--HG--
rename : core_se/tests/fs_test.py => core/tests/fs_test.py
2011-04-12 13:22:29 +02:00
Virgil Dupras
0fea59007c Updated copyright year to 2011. 2011-04-12 10:04:01 +02:00
Virgil Dupras
eefe464fba Replaced dependencies from hsutil to hscommon. 2011-01-11 13:36:05 +01:00
Virgil Dupras
33c0ba808c Changed references to what has already been moved from hsutil to hscommon (io, path, testutil). 2011-01-11 11:59:53 +01:00
Virgil Dupras
4886982d43 Re-licensed to BSD 2010-09-30 12:17:41 +02:00
Virgil Dupras
565c990687 [#101 state:fixed] Remove the Creation Time column. 2010-08-13 09:26:38 +02:00
Virgil Dupras
854d194f88 Converted to py3k. There are probably still some bugs. So far, I managed to run dupeGuru SE under pyobjc and qt. 2010-08-11 16:39:06 +02:00
Virgil Dupras
b372974437 [#84 state:hold] Added debug logging to fs.get_files() to eventually figure out the cause of this bug. 2010-02-05 17:55:47 +01:00
Virgil Dupras
9f006ec08a [#75 state:fixed] md5 hashes are now computed incrementally. 2010-01-13 08:59:44 +01:00
Virgil Dupras
d62ff40bed Removed svn keywords. 2010-01-02 16:52:18 +01:00