dupeguru

mirror of https://github.com/arsenetar/dupeguru.git synced 2024-11-09 01:29:01 +00:00

Author	SHA1	Message	Date
Bruno Cabral	85a4557525	match all orientations (#1127) * match all orientations * use rotation as option --------- Co-authored-by: Andrew Senetar <arsenetar@gmail.com> Co-authored-by: Luke <byunghun.hyun26@gmail.com>	2024-02-19 09:19:33 -06:00
Luca Falavigna	007404f46a	Use isolation_level=None mode for GNU Hurd	2024-02-12 18:02:13 +01:00
Andrew Senetar	99ec4e0f27	fix: Minor cleanups and fixes - Update NullJob to subclass Job - Remove unnecessary size pre-read in _getMatches() as file sizes are already loaded during file scan via stat call - Skip ref check if contents scan as the scan already prevents this from happening, some of the other scans do things differently and need to be reviewed before removing this post step completely - Add guard on partial hashing to just hash the whole file if smaller than the offset and size and use the value for both the partial digest and digest	2023-06-08 01:14:52 -05:00
Andrew Senetar	057be0294a	fix: Prevent exception during existence check - Add "safe" existence check to files which catches OSErrors that may occur when trying to stat files - Use "safe" existence check during final existence check	2023-01-11 23:07:06 -06:00
Andrew Senetar	81daddd072	refactor: Improve digest cache db method performance - Remove lock on read operations, only needed for write operations - Change to use context manager for sqlite connection - Remove long lived cursor object and use short lived cursors instead Fixes #1080	2023-01-11 00:58:29 -06:00
Andrew Senetar	6db2fa2be6	fix: Correct flake8 config - Add exclude pattern for flake8 when running with pre-commit as it does not fully honor the exclude paths. - Cleanup exclude paths for flake8 in tox.ini - Re-enable line length check and correct three affected files	2023-01-09 22:35:12 -06:00
Andrew Senetar	e30a135451	feat: Add additional scan time options - Add option to include file existence check at end of scan, speeds up end of scan operation time considerably, however if user has removed or moved files since starting a scan there could be later errors when interacting with results. Defaults to existing behavior of including the check, until it can be verified later dialogs and actions handle non-existent items better. - Add option to ignore differences in mtime when checking hash cache. Option is present in advanced tab of preferences. Closes #1022. - Regenerate pot files for translations	2023-01-05 23:01:16 -06:00
Andrew Senetar	71af825b37	Move try/except of cache db to get() and put() - Move the try/except of cache db calls to the calls themselves. - Add some additional information to logging statements on cache db exception to improve troubleshooting.	2022-07-07 21:52:22 -05:00
Andrew Senetar	0a4e61edf5	Additional cleanup per mypy - Add Callable type to hasher (should really be more specific...) - Add type hint to COLUMNS in qtlib/table.py - Use Qt.ItemFlag.ItemIsEnabled instead of Qt.itemIsEnabled in qtlib/table.py	2022-04-30 05:16:46 -05:00
Andrew Senetar	63dd4d4561	Apply pyupgrade changes	2022-04-27 20:53:12 -05:00
Andrew Senetar	a470a8de25	Update fs.py to optimize stat() calls - Update to get size and mtime at time of class creation when os.DirEntry is used for initialization. - Folders still calculate size later for folder scans. - Ref #962, #959	2022-03-30 22:58:01 -05:00
Andrew Senetar	efd500ecc1	Update directory scanning to use os.scandir() - Change to use os.scandir() instead of os.walk() to leverage DirEntry objects. - Avoids extra calls to stat() on files during fs.can_handle() - See 3x speed improvement on Windows in some cases	2022-03-29 23:37:56 -05:00
Andrew Senetar	43fcc52291	Replace pathlib.glob() with os.scandir() in fs.py	2022-03-29 22:35:38 -05:00
Andrew Senetar	50f5db1543	Update fs to support DirEntry on get_file()	2022-03-29 22:32:36 -05:00
Andrew Senetar	da9f8b2b9d	Squashed commit of the following: commit 8b15fe9a502ebf4841c6529e7098cef03a6a5e6f Author: Andrew Senetar <arsenetar@gmail.com> Date: Sun Mar 27 23:48:15 2022 -0500 Finish up changes to copy_or_move commit 21f6a32cf3186a400af8f30e67ad2743dc9a49bd Author: Andrew Senetar <arsenetar@gmail.com> Date: Thu Mar 17 23:56:52 2022 -0500 Migrate from hscommon.path to pathlib - Part one, this gets all hscommon and core tests passing - App appears to be able to load directories and complete scans, need further testing - app.py copy_or_move needs some additional work	2022-03-27 23:50:03 -05:00
Andrew Senetar	9f40e4e786	Squashed commit of the following: commit 5eb515f666bfa1ff06c2e96bdc351a4b7456580e Author: Andrew Senetar <arsenetar@gmail.com> Date: Sun Mar 27 22:19:39 2022 -0500 Add fallback to md5 if xxhash not available Mainly here for the case when distributions have not packaged python3-xxhash. commit 51b18d4c84 Author: Andrew Senetar <arsenetar@gmail.com> Date: Sat Mar 19 15:25:46 2022 -0500 Switch file hashing to xxhash instead of md5 - Improves performance significantly in some cases - Add xxhash to requirements.txt and sort requirements - Rename md5 based members to digest - Update all tests to use new member names and hashing methods - Update hash db code to upgrade schema NOTE: May consider supporting multiple hashing algorithms in the future.	2022-03-27 22:27:13 -05:00
Dobatymo	9753afba74	change FilesDB to singleton class move hash calculation back in to Files class clear cache now clears hash cache in addition to picture cache	2021-10-29 15:12:40 +08:00
Dobatymo	2f02a6010d	implement hash cache for md5 hash based on sqlite	2021-10-29 15:12:40 +08:00
Andrew Senetar	d576a7043c	Code cleanups in core and other affected files	2021-08-21 18:02:02 -05:00
Andrew Senetar	ffe6b7047c	Format all files with black correcting line length	2021-08-15 04:10:18 -05:00
glubsy	e95306e58f	Fix flake 8	2021-08-14 02:52:00 +02:00
glubsy	891a875990	Cache constant expression Perhaps the python byte code is already optimized, but just in case it is not, pre-compute the constant expression.	2021-08-13 21:33:21 +02:00
glubsy	545a5a75fb	Fix for older python versions The "walrus" operator is only available in python 3.8 and later. Fall back to more traditional notation.	2021-08-13 20:56:33 +02:00
glubsy	7b764f183e	Avoid partially hashing small files Computing 3 hash samples for files less than 3MiB (3 * CHUNK_SIZE) is not efficient since spans of later samples would overlap a previous one. Therefore we can simply return the hash of the entire small file instead.	2021-08-13 20:47:01 +02:00
glubsy	718ca5b313	Remove unused import	2021-06-22 02:41:33 +02:00
glubsy	277bc3fbb8	Add unit tests for hash sample optimization * Instead of keeping md5 samples separate, merge them as one hash computed from the various selected chunks we picked. * We don't need to keep a boolean to see whether or not the user chose to optimize; we can simply compare the value of the threshold, since 0 means no optimization currently active.	2021-06-21 22:44:05 +02:00
glubsy	e07dfd5955	Add partial hashes optimization for big files * Big files above the user selected threshold can be partially hashed in 3 places. * If the user is willing to take the risk, we consider files with identical md5samples as being identical.	2021-06-21 19:03:21 +02:00
Andrew Senetar	7ba8aa3514	Format files with black - Format all files with black - Update tox.ini flake8 arguments to be compatible - Add black to requirements-extra.txt - Reduce ignored flake8 rules and fix a few violations	2019-12-31 20:16:27 -06:00
Virgil Dupras	334f4dd2ae	Increase md5 reading buffer to 1mb This makes md5 computing faster without using too much memory.	2016-06-08 12:23:10 -04:00
Virgil Dupras	e7076bc3bd	Change license from BSD to GPLv3 See http://www.hardcoded.net/archive2014#2014-12-28 for context	2015-01-03 16:33:16 -05:00
Virgil Dupras	fc16ea8c49	Change copyright year to 2015	2015-01-03 16:30:57 -05:00
Virgil Dupras	2166a0996c	Added tox configuration ... and fixed pep8 warnings. There's a lot of them that are still ignored, but that's because it's too much of a step to take at once.	2014-10-13 15:08:59 -04:00
Virgil Dupras	ca709a60cf	Updated copyright year to 2014	2014-04-19 12:19:11 -04:00
Virgil Dupras	10dbfa9b38	Refactoring: Path API compatibility with pathlib Refactored dupeGuru to make hscommon.path's API a bit closer to pathlib's API. It's not 100% compatible yet, but it's much better than before. This is more of a hscommon refactoring than a dupeguru one, but since dupeGuru is the main user of Path, it was the driver behind the refactoring. This refactoring also sees the introduction of @pathify, which ensures Path arguments. Previously, we were often unsure of whether the caller of a function was passing a Path or a str. This problem is now solved and this allows us to remove hscommon.io, an ill-conceived attempt to solve that same ambiguity problem. Fixes #235.	2013-11-16 12:06:16 -05:00
Virgil Dupras	be8efea081	Fixed folder scanning in SE, which was completely broken Oops	2013-08-18 20:50:31 -04:00
Virgil Dupras	7e8f9036d8	Began serious code documentation effort Enabled the autodoc Sphinx extension and started adding docstrings to classes, methods, etc. It's quickly becoming quite interesting...	2013-08-18 18:36:09 -04:00
Virgil Dupras	7891fb5396	Refactoring: Moved some code from app.DupeGuru to fs.File. Moved DupeGuru._get_display_info() to File.get_display_info(). This method used none of the app's global state or methods and had nothing to do there.	2013-07-14 17:43:58 -04:00
Virgil Dupras	4a8ce9b6c4	Updated copyright year to 2013.	2013-04-28 10:35:51 -04:00
Virgil Dupras	df30a31782	Refactoring: Began to phase out the use of hscommon.io in favor of Path methods.	2012-08-09 10:53:24 -04:00
Virgil Dupras	1171705921	Made core.fs.File slotted to save a lot of memory usage.	2012-05-29 17:39:54 -04:00
Virgil Dupras	657f6743c2	Changed copyright year to 2012	2012-03-15 14:28:40 -04:00
Virgil Dupras	56207f4dbb	[#161 state:fixed] Fixed folder sorting.	2011-06-15 11:58:33 -04:00
Virgil Dupras	0b20b35ffb	Fixed copying operations for folders which didn't work.	2011-04-14 12:55:50 +02:00
Virgil Dupras	279d44b7f3	[#89 state:fixed] Added a Folders scan type in dgse. --HG-- rename : core_se/tests/fs_test.py => core/tests/fs_test.py	2011-04-12 13:22:29 +02:00
Virgil Dupras	0fea59007c	Updated copyright year to 2011.	2011-04-12 10:04:01 +02:00
Virgil Dupras	eefe464fba	Replaced dependencies from hsutil to hscommon.	2011-01-11 13:36:05 +01:00
Virgil Dupras	33c0ba808c	Changed references to what has already been moved from hsutil to hscommon (io, path, testutil).	2011-01-11 11:59:53 +01:00
Virgil Dupras	4886982d43	Re-licensed to BSD	2010-09-30 12:17:41 +02:00
Virgil Dupras	565c990687	[#101 state:fixed] Remove the Creation Time column.	2010-08-13 09:26:38 +02:00
Virgil Dupras	854d194f88	Converted to py3k. There's probably some bugs still. So far, I managed to run dupeGuru SE under pyobjc and qt.	2010-08-11 16:39:06 +02:00

55 Commits
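
Several entries above (e07dfd5955, 277bc3fbb8, 7b764f183e, 99ec4e0f27) describe partial hashing: large files are sampled in 3 places, the samples are merged into one hash, and files smaller than 3 * CHUNK_SIZE are hashed whole. The sketch below illustrates that idea only; it is not the repository's code. The function names, the choice of sample positions (start, middle, end), and the use of hashlib.md5 from the standard library are assumptions made for illustration. The 1 MiB chunk size follows from the "3MiB (3 * CHUNK_SIZE)" remark in the log.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB, implied by "3MiB (3 * CHUNK_SIZE)" in the log


def full_digest(path):
    """Hash the whole file, reading it CHUNK_SIZE bytes at a time."""
    hasher = hashlib.md5()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(CHUNK_SIZE), b""):
            hasher.update(chunk)
    return hasher.digest()


def sample_digest(path):
    """Hash three CHUNK_SIZE samples of a large file into a single digest.

    Files smaller than 3 * CHUNK_SIZE are hashed whole, because the three
    samples would overlap. Sample positions here are an assumption.
    """
    size = os.path.getsize(path)
    if size < 3 * CHUNK_SIZE:
        return full_digest(path)
    # Hypothetical sample offsets: first chunk, a middle chunk, last chunk.
    offsets = (0, (size - CHUNK_SIZE) // 2, size - CHUNK_SIZE)
    hasher = hashlib.md5()
    with open(path, "rb") as fp:
        for offset in offsets:
            fp.seek(offset)
            hasher.update(fp.read(CHUNK_SIZE))
    return hasher.digest()
```

Two files with equal sample digests can still differ in the regions that were not read, which is the risk the e07dfd5955 message refers to.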