mirror of https://github.com/arsenetar/dupeguru.git synced 2026-01-22 06:37:17 +00:00

Update site to include help

- Add sphinx documentation generated from build to help
- Add link to help (in english) in header
- Add link to github in header
2018-04-08 11:25:34 -05:00
parent 10f06999ed
commit bacba3f0a5
431 changed files with 117335 additions and 0 deletions


@@ -0,0 +1,705 @@
:tocdepth: 1
Changelog
=========
**About the word "crash":** When reading this changelog, you might be alarmed at the number of fixes
for "crashes". Be aware that when the word "crash" is used here, it refers to "soft crashes" which
don't cause the application to quit. You simply get an error window that asks you if you want to
send the crash report to Hardcoded Software. Crashes that cause the application to quit are called
"hard crashes" in this changelog.
4.0.3 (2016-11-24)
----------------------
* Add new picture cache backend: shelve
* Make shelve picture cache backend the active one on MacOS to fix `#394 <https://github.com/hsoft/dupeguru/issues/394>`__ more elegantly. [cocoa]
* Remove Sparkle (auto-updates) due to technical limitations. [cocoa]
4.0.2 (2016-10-09)
----------------------
* Fix systematic crash in Picture Mode under MacOS Sierra. (`#394 <https://github.com/hsoft/dupeguru/issues/394>`__)
* No change for Linux. Just keeping version in sync.
4.0.1 (2016-08-24)
----------------------
* Add Greek localization, by Gabriel Koutilellis. (`#382 <https://github.com/hsoft/dupeguru/issues/382>`__)
* Fix localization base path. [qt] (`#378 <https://github.com/hsoft/dupeguru/issues/378>`__)
* Fix broken load results dialog. [qt]
* Fix crash on load results. [cocoa] (`#380 <https://github.com/hsoft/dupeguru/issues/380>`__)
* Save preferences more predictably. [qt] (`#379 <https://github.com/hsoft/dupeguru/issues/379>`__)
* Fix picture mode's fuzzy block scanner threshold. (`#387 <https://github.com/hsoft/dupeguru/issues/387>`__)
4.0.0 (2016-07-01)
----------------------
* Merge Standard, Music and Picture editions in the same application!
* Improve documentation. (`#294 <https://github.com/hsoft/dupeguru/issues/294>`__)
* Add Polish, Korean, Spanish and Dutch localizations.
* qt: Fix wrong use_regexp option propagation to core. (`#295 <https://github.com/hsoft/dupeguru/issues/295>`__)
* qt: Fix progress window mistakenly showing up on startup. (`#357 <https://github.com/hsoft/dupeguru/issues/357>`__)
* Bump Python requirement to v3.4.
* Bump OS X requirement to 10.8
* Drop Windows support, maybe temporarily. `Details <https://www.hardcoded.net/archive2015#2015-11-01>`_
* cocoa: Drop iPhoto, Aperture and iTunes support. Was unmaintained and obsolete.
* Drop "Audio Contents" scan type. Was confusing and seldom useful.
* Change license to GPLv3
3.9.1 (2014-10-17)
----------------------
* Fixed ``AttributeError: 'ComboboxModel' object has no attribute 'reset'``. [Linux, Windows] (`#254 <https://github.com/hsoft/dupeguru/issues/254>`__)
* Fixed ``PermissionError`` on saving results. (`#266 <https://github.com/hsoft/dupeguru/issues/266>`__)
* Fixed a build problem introduced by Sphinx 1.2.3.
* Updated German localisation, by Frank Weber.
3.9.0 (2014-04-19)
----------------------
* This is mostly a dependencies upgrade.
* Upgraded to Python 3.3.
* Upgraded to Qt 5.
* Minimum Windows version is now Windows 7 64bit.
* Minimum Ubuntu version is now 14.04.
* Minimum OS X version is now 10.7 (Lion).
* ... But with a couple of little improvements.
* Improved documentation.
* Overwrite subfolders' state when setting states in folder dialog (`#248 <https://github.com/hsoft/dupeguru/issues/248>`__)
* The error report dialog now brings the user to Github issues.
3.8.0 (2013-12-07)
----------------------
* Disable symlink/hardlink deletion option when not relevant. (`#247 <https://github.com/hsoft/dupeguru/issues/247>`__)
* Make Cmd+A select all folders in the Folder Selection dialog. [Mac] (`#228 <https://github.com/hsoft/dupeguru/issues/228>`__)
* Make non-numeric delta comparison case insensitive. (`#239 <https://github.com/hsoft/dupeguru/issues/239>`__)
* Fix surrogate-related UnicodeEncodeError on CSV export. (`#210 <https://github.com/hsoft/dupeguru/issues/210>`__)
* Fixed crash on Dupe Count sorting with Delta + Dupes Only. (`#238 <https://github.com/hsoft/dupeguru/issues/238>`__)
* Improved documentation.
* Important internal refactorings.
* Dropped Ubuntu 12.04 and 12.10 support.
* Removed the fairware dialog (`More Info <http://www.hardcoded.net/articles/phasing-out-fairware>`__).
3.7.1 (2013-08-19)
----------------------
* Fixed folder scan type, which was broken in v3.7.0.
3.7.0 (2013-08-17)
----------------------
* Improved delta values to support non-numerical values. (`#213 <https://github.com/hsoft/dupeguru/issues/213>`__)
* Improved the Re-Prioritize dialog's UI. (`#224 <https://github.com/hsoft/dupeguru/issues/224>`__)
* Added hardlink/symlink support on Windows Vista+. (`#220 <https://github.com/hsoft/dupeguru/issues/220>`__)
* Dropped 32bit support on Mac OS X.
* Added Vietnamese localization by Phan Anh.
3.6.1 (2013-04-28)
----------------------
* Improved "Make Selection Reference" to make it clearer. (`#222 <https://github.com/hsoft/dupeguru/issues/222>`__)
* Improved "Open Selected" to allow opening more than one file at once. (`#142 <https://github.com/hsoft/dupeguru/issues/142>`__)
* Fixed a few typos here and there. (`#216 <https://github.com/hsoft/dupeguru/issues/216>`__ `#225 <https://github.com/hsoft/dupeguru/issues/225>`__)
* Tweaked the fairware dialog (`More Info <http://www.hardcoded.net/articles/phasing-out-fairware>`__).
* Added Arch Linux packaging
* Added a 64-bit build for Windows.
* Improved Russian localization by Kyrill Detinov.
* Improved Brazilian localization by Victor Figueiredo.
3.6.0 (2012-08-08)
----------------------
* Added "Export to CSV". (`#189 <https://github.com/hsoft/dupeguru/issues/189>`__)
* Added "Replace with symlinks" to complement "Replace with hardlinks". [Mac, Linux] (`#194 <https://github.com/hsoft/dupeguru/issues/194>`__)
* dupeGuru now tells how many duplicates were affected after each re-prioritization operation. (`#204 <https://github.com/hsoft/dupeguru/issues/204>`__)
* Added Longest/Shortest filename criteria in the re-prioritize dialog. (`#198 <https://github.com/hsoft/dupeguru/issues/198>`__)
* Fixed result table cells which mistakenly became writable in v3.5.0. [Mac] (`#203 <https://github.com/hsoft/dupeguru/issues/203>`__)
* Fixed "Rename Selected" which was broken since v3.5.0. [Mac] (`#202 <https://github.com/hsoft/dupeguru/issues/202>`__)
* Fixed a bug where "Reset to Defaults" in the Columns menu wouldn't refresh menu items' marked state.
* Added Brazilian localization by Victor Figueiredo.
3.5.0 (2012-06-01)
----------------------
* Added a Deletion Options panel.
* Greatly improved memory usage for big scans.
* Added a keybinding for the filter field. (`#182 <https://github.com/hsoft/dupeguru/issues/182>`__) [Mac]
* Upgraded minimum requirements for Ubuntu to 12.04.
3.4.1 (2012-04-14)
----------------------
* Fixed the "Folders" scan type. [Mac]
* Fixed localization issues. [Windows, Linux]
3.4.0 (2012-03-29)
----------------------
* Improved results window UI. [Windows, Linux]
* Added a dialog to edit the Ignore List.
* Added the ability to sort results by "marked" status.
* Fixed "Open with default application". (`#190 <https://github.com/hsoft/dupeguru/issues/190>`__)
* Fixed a bug where there would be a false reporting of discarded matches. (`#195 <https://github.com/hsoft/dupeguru/issues/195>`__)
* Fixed various localization glitches.
* Fixed hard crashes on crash reporting. (`#196 <https://github.com/hsoft/dupeguru/issues/196>`__)
* Fixed bug where the details panel would show up at inconvenient places in the screen. [Windows, Linux]
3.3.3 (2012-02-01)
----------------------
* Fixed crash on adding some folders. [Mac OS X]
* Added Ukrainian localization by Yuri Petrashko.
3.3.2 (2012-01-16)
----------------------
* Fixed random hard crashes (yeah, again). [Mac OS X]
* Fixed crash on Export to HTML. [Windows, Linux]
* Added Armenian localization by Hrant Ohanyan.
* Added Russian localization by Igor Pavlov.
3.3.1 (2011-12-02)
----------------------
* Fixed a couple of nasty crashes.
3.3.0 (2011-11-30)
----------------------
* Added multiple-selection in folder selection dialog for a more efficient folder removal. (`#179 <https://github.com/hsoft/dupeguru/issues/179>`__)
* Fixed a crash in the prioritize dialog. (`#178 <https://github.com/hsoft/dupeguru/issues/178>`__)
* Fixed a bug where mass marking with a filter would mark more than filtered duplicates. (`#181 <https://github.com/hsoft/dupeguru/issues/181>`__)
* Fixed random hard crashes. [Mac OS X] (`#183 <https://github.com/hsoft/dupeguru/issues/183>`__ `#184 <https://github.com/hsoft/dupeguru/issues/184>`__)
* Added Czech localization by Aleš Nehyba.
* Added Italian localization by Paolo Rossi.
3.2.1 (2011-10-02)
----------------------
* Fixed a couple of broken action bindings from v3.2.0.
3.2.0 (2011-09-27)
----------------------
* Added duplicate re-prioritization dialog. (`#138 <https://github.com/hsoft/dupeguru/issues/138>`__)
* Added font size preference for duplicate table. (`#82 <https://github.com/hsoft/dupeguru/issues/82>`__)
* Added Quicklook support. [Mac OS X] (`#21 <https://github.com/hsoft/dupeguru/issues/21>`__)
* Improved behavior of Mark Selected. (`#139 <https://github.com/hsoft/dupeguru/issues/139>`__)
* Improved filename sorting. (`#169 <https://github.com/hsoft/dupeguru/issues/169>`__)
* Added Chinese (Simplified) localization by Eric Dee.
* Tweaked the fairware system.
* Upgraded minimum requirements to OS X 10.6 and Ubuntu 11.04.
3.1.2 (2011-08-25)
----------------------
* Fixed a bug preventing the Folders scan from working. (`#172 <https://github.com/hsoft/dupeguru/issues/172>`__)
3.1.1 (2011-08-24)
----------------------
* Added German localization by Gregor Tätzner.
* Improved OS X Lion compatibility. [Mac OS X]
* Made the file collection phase cancellable. (`#168 <https://github.com/hsoft/dupeguru/issues/168>`__)
* Fixed glitch in folder window upon selecting a folder state. [Windows, Linux] (`#165 <https://github.com/hsoft/dupeguru/issues/165>`__)
* Fixed a text coloring glitch in the results. (`#156 <https://github.com/hsoft/dupeguru/issues/156>`__)
* Fixed glitch in the sorting feature of the Folder column. (`#161 <https://github.com/hsoft/dupeguru/issues/161>`__)
* Make sure that saved results have the ".dupeguru" extension. [Linux] (`#157 <https://github.com/hsoft/dupeguru/issues/157>`__)
3.1.0 (2011-04-16)
----------------------
* Added the "Folders" scan type. (`#89 <https://github.com/hsoft/dupeguru/issues/89>`__)
* Fixed a couple of crashes. (`#140 <https://github.com/hsoft/dupeguru/issues/140>`__ `#149 <https://github.com/hsoft/dupeguru/issues/149>`__)
3.0.2 (2011-03-16)
----------------------
* Fixed crash after removing marked dupes. (`#140 <https://github.com/hsoft/dupeguru/issues/140>`__)
* Fixed crash on error handling. [Windows] (`#144 <https://github.com/hsoft/dupeguru/issues/144>`__)
* Fixed crash on copy/move. [Windows] (`#148 <https://github.com/hsoft/dupeguru/issues/148>`__)
* Fixed crash when launching dupeGuru from a very long folder name. [Mac OS X] (`#119 <https://github.com/hsoft/dupeguru/issues/119>`__)
* Fixed a refresh bug in directory panel. (`#153 <https://github.com/hsoft/dupeguru/issues/153>`__)
* Improved reliability of the "Send to Trash" operation. [Linux]
* Tweaked Fairware reminders.
3.0.1 (2011-01-27)
----------------------
* Restored the context menu which had been broken in 3.0.0. [Mac OS X] (`#133 <https://github.com/hsoft/dupeguru/issues/133>`__)
* Fixed a bug where an "unsaved results" warning would be issued on quit even with empty results. (`#134 <https://github.com/hsoft/dupeguru/issues/134>`__)
* Removed focus from the cancel button in the progress dialog to avoid accidental cancellations. [Mac OS X] (`#135 <https://github.com/hsoft/dupeguru/issues/135>`__)
* Folders added through drag and drop are added to the recent folders list. (`#136 <https://github.com/hsoft/dupeguru/issues/136>`__)
* Added a debugging mode. (`#132 <https://github.com/hsoft/dupeguru/issues/132>`__)
* Fixed French localization glitches.
3.0.0 (2011-01-24)
----------------------
* Re-designed the UI. (`#129 <https://github.com/hsoft/dupeguru/issues/129>`__)
* Internationalized dupeGuru and localized it to French. (`#32 <https://github.com/hsoft/dupeguru/issues/32>`__)
* Changed the format of the help file. (`#130 <https://github.com/hsoft/dupeguru/issues/130>`__)
2.12.3 (2011-01-01)
----------------------
* Fixed bug causing results to be corrupted after a scan cancellation. (`#120 <https://github.com/hsoft/dupeguru/issues/120>`__)
* Fixed crash when fetching Fairware unpaid hours. (`#121 <https://github.com/hsoft/dupeguru/issues/121>`__)
* Fixed crash when replacing files with hardlinks. (`#122 <https://github.com/hsoft/dupeguru/issues/122>`__)
2.12.2 (2010-10-05)
----------------------
* Fixed delta column colors which were broken since 2.12.0.
* Fixed column sorting crash. (`#108 <https://github.com/hsoft/dupeguru/issues/108>`__)
* Fixed occasional crash during scan. (`#106 <https://github.com/hsoft/dupeguru/issues/106>`__)
2.12.1 (2010-09-30)
----------------------
* Re-licensed dupeGuru to BSD and made it `Fairware <http://open.hardcoded.net/about/>`__.
2.12.0 (2010-09-26)
----------------------
* Improved UI with a little revamp.
* Added the possibility to place hardlinks to references after having deleted duplicates. [Mac OS X, Linux] (`#91 <https://github.com/hsoft/dupeguru/issues/91>`__)
* Added an option to ignore duplicates hardlinking to the same file. [Mac OS X, Linux] (`#92 <https://github.com/hsoft/dupeguru/issues/92>`__)
* Added multiple selection in the "Add Directory" dialog. [Mac OS X] (`#105 <https://github.com/hsoft/dupeguru/issues/105>`__)
* Fixed a bug preventing drag & drop from working in the Directories panel. [Windows, Linux]
2.11.1 (2010-08-26)
----------------------
* Fixed HTML exporting which was broken in 2.11.0.
2.11.0 (2010-08-18)
----------------------
* Added the ability to save results (and reload them) at arbitrary locations.
* Improved the way reference files in dupe groups are chosen. (`#15 <https://github.com/hsoft/dupeguru/issues/15>`__)
* Remember size/position of all windows between launches. (`#102 <https://github.com/hsoft/dupeguru/issues/102>`__)
* Fixed a bug sometimes preventing dupeGuru from reloading previous results.
* Fixed a bug sometimes causing the progress dialog to be stuck there. [Mac OS X] (`#103 <https://github.com/hsoft/dupeguru/issues/103>`__)
* Removed the Creation Date column, which wasn't displaying the correct value anyway. (`#101 <https://github.com/hsoft/dupeguru/issues/101>`__)
2.10.1 (2010-07-15)
----------------------
* Fixed a couple of crashes. (`#95 <https://github.com/hsoft/dupeguru/issues/95>`__, `#97 <https://github.com/hsoft/dupeguru/issues/97>`__, `#100 <https://github.com/hsoft/dupeguru/issues/100>`__)
2.10.0 (2010-04-13)
----------------------
* Improved error messages when files can't be sent to trash, moved or copied.
* Added a custom command invocation action. (`#12 <https://github.com/hsoft/dupeguru/issues/12>`__)
* Filters are now applied on whole paths. (`#4 <https://github.com/hsoft/dupeguru/issues/4>`__)
2.9.2 (2010-02-10)
----------------------
* dupeGuru is now 64-bit on Mac OS X!
* Fixed a crash upon quitting when support folder is not present. (`#83 <https://github.com/hsoft/dupeguru/issues/83>`__)
* Fixed a crash during sorting. (`#85 <https://github.com/hsoft/dupeguru/issues/85>`__)
* Fixed selection glitches, especially while renaming. (`#93 <https://github.com/hsoft/dupeguru/issues/93>`__)
2.9.1 (2010-01-13)
----------------------
* Improved memory usage for Contents scans. (`#75 <https://github.com/hsoft/dupeguru/issues/75>`__)
* Improved scanning speed when ref directories are involved. (`#77 <https://github.com/hsoft/dupeguru/issues/77>`__)
* Show a message dialog at the end of the scan if no duplicates are found. (`#81 <https://github.com/hsoft/dupeguru/issues/81>`__)
* Fixed a bug sometimes causing the small files threshold pref to be ignored. [Mac OS X] (`#75 <https://github.com/hsoft/dupeguru/issues/75>`__)
2.9.0 (2009-11-03)
----------------------
* Significantly improved speed and memory usage of big contents-based scans.
* Added drag & drop support in the Directories panel. (`#9 <https://github.com/hsoft/dupeguru/issues/9>`__)
* Fixed a bug causing dupeGuru to be confused if a scanned file was moved during the scan. (`#72 <https://github.com/hsoft/dupeguru/issues/72>`__)
* Dropped support for Mac OS X 10.4 (Tiger)
2.8.2 (2009-10-14)
----------------------
* Improved directory selection in the Directories panel (Windows). (`#56 <https://github.com/hsoft/dupeguru/issues/56>`__)
* Fixed a bug preventing dupeGuru from starting on certain machines (Windows). (`#68 <https://github.com/hsoft/dupeguru/issues/68>`__)
* Fixed a crash during very big scans. (`#70 <https://github.com/hsoft/dupeguru/issues/70>`__)
2.8.1 (2009-10-02)
----------------------
* Fixed crash with filtering when regular expressions were enabled. (`#60 <https://github.com/hsoft/dupeguru/issues/60>`__)
* Fixed crash when setting directories' state. (Mac OS X) (`#66 <https://github.com/hsoft/dupeguru/issues/66>`__)
* Fixed crash with Make Reference when certain filters are applied. (Mac OS X) (`#55 <https://github.com/hsoft/dupeguru/issues/55>`__)
* Improved error handling during delete/move/copy actions. (`#62 <https://github.com/hsoft/dupeguru/issues/62>`__ `#65 <https://github.com/hsoft/dupeguru/issues/65>`__)
2.8.0 (2009-09-07)
----------------------
* Added support for all kinds of bundle (not just applications) (Mac OS X) (`#11 <https://github.com/hsoft/dupeguru/issues/11>`__)
* Re-introduced the Export to XHTML feature to Windows. (`#14 <https://github.com/hsoft/dupeguru/issues/14>`__)
* Improved Export to XHTML speed. (`#14 <https://github.com/hsoft/dupeguru/issues/14>`__)
* Improved Contents scanning speed for large files. (`#33 <https://github.com/hsoft/dupeguru/issues/33>`__)
* Improved the grouping algorithm to reduce the number of discarded files in non-exact scans. (`#51 <https://github.com/hsoft/dupeguru/issues/51>`__)
* Stopped showing the same file on the 2 sides of the details panel when a ref file is selected. (`#50 <https://github.com/hsoft/dupeguru/issues/50>`__)
* Fixed crashes in the Directories panel. (`#46 <https://github.com/hsoft/dupeguru/issues/46>`__)
2.7.3 (2009-06-20)
----------------------
* Fixed bugs with selection being jumpy during "Make Reference" actions and Power Marker switches. (`#3 <https://github.com/hsoft/dupeguru/issues/3>`__)
* Fixed crash happening when a file with non-roman characters couldn't be analyzed. (`#30 <https://github.com/hsoft/dupeguru/issues/30>`__)
* Fixed crash sometimes happening during the file collection phase in scanning. (`#38 <https://github.com/hsoft/dupeguru/issues/38>`__)
* Restored double-click and right-click behavior lost in the PyQt move (Windows). (`#34 <https://github.com/hsoft/dupeguru/issues/34>`__ `#35 <https://github.com/hsoft/dupeguru/issues/35>`__)
2.7.2 (2009-06-10)
----------------------
* Fixed an occasional crash on Copy/Move operations. (`#16 <https://github.com/hsoft/dupeguru/issues/16>`__)
* Added automatic exclusion for sensitive folders (like system folders). (`#20 <https://github.com/hsoft/dupeguru/issues/20>`__)
* Fixed an occasional crash when application files were part of the results (Mac OS X). (`#25 <https://github.com/hsoft/dupeguru/issues/25>`__)
2.7.1 (2009-05-29)
----------------------
* Fixed a bug causing crashes when having application files in the results.
* Fixed a bug causing a GUI freeze at the beginning of a scan with a lot of files.
* Fixed a bug that sometimes caused a crash when an action was cancelled, and then started again.
2.7.0 (2009-05-25)
----------------------
* Converted the Windows GUI to Qt.
* Improved the reliability of the scanning process.
2.6.1 (2009-03-27)
----------------------
* **Fixed** an occasional crash caused by permission issues.
* **Fixed** a bug where the "X discarded" notice would show too large a number of discarded duplicates.
2.6.0 (2008-09-10)
----------------------
* **Added** a small file threshold preference.
* **Added** a notice in the status bar when matches were discarded during the scan.
* **Improved** duplicate prioritization (smartly chooses which file you will keep).
* **Improved** scan progress feedback.
* **Improved** responsiveness of the user interface for certain actions.
2.5.4 (2008-08-10)
----------------------
* **Improved** the speed of results loading and saving.
* **Fixed** a crash sometimes occurring during duplicate deletion.
2.5.3 (2008-07-08)
----------------------
* **Improved** unicode handling for filenames. dupeGuru will now find a lot more duplicates if your files have non-ASCII characters in their names.
* **Fixed** "Clear Ignore List" crash in Windows.
2.5.2 (2008-01-10)
----------------------
* **Improved** the handling of low memory situations.
* **Improved** the directory panel. The "Remove" button changes to "Put Back" when an excluded directory is selected.
* **Improved** scan, delete and move speed in situations where there were a lot of duplicates.
* **Fixed** occasional crashes when moving bundles (such as .app files).
* **Fixed** occasional crashes when moving a lot of files at once.
2.5.1 (2007-11-22)
----------------------
* **Added** the "Remove empty folders" option.
* **Fixed** results load/save issues.
* **Fixed** occasional status bar inaccuracies when the results are filtered.
2.5.0 (2007-09-15)
----------------------
* **Added** post scan filtering.
* **Fixed** issues with the rename feature under Windows
* **Fixed** some user interface annoyances under Windows
2.4.8 (2007-04-14)
----------------------
* **Improved** UI responsiveness (using threads) under Mac OS X.
* **Improved** result load/save speed and memory usage.
2.4.7 (2007-03-10)
----------------------
* **Fixed** a "bad file descriptor" error occasionally popping up.
* **Fixed** a bug with non-latin directory names.
2.4.6 (2007-02-10)
----------------------
* **Added** Re-orderable columns. In fact, I re-added the feature which was lost in the C# conversion in 2.4.0 (Windows).
* **Changed** the behavior of the scanning engine when setting the hardness to 100. It will now only match files that have their words in the same order.
* **Fixed** a bug with all the Delete/Move/Copy actions with certain kinds of files.
2.4.5 (2007-01-11)
----------------------
* **Fixed** a bug with the Move action.
2.4.4 (2007-01-07)
----------------------
* **Fixed** a "ghosting" bug. Dupes deleted by dupeGuru would sometimes come back in subsequent scans (Windows).
* **Fixed** bugs sometimes making dupeGuru crash when marking a dupe (Windows).
* **Fixed** some minor visual glitches (Windows).
2.4.3 (2006-12-08)
----------------------
* **Fixed** a mishandling of ".app" files (OS X).
* **Fixed** a bug preventing files from "reference" directories from being displayed in blue in the results (Windows).
* **Fixed** a bug preventing some files from being sent to the recycle bin (Windows).
* **Fixed** a bug in the packaging preventing dupeGuru from starting at all on certain Windows configurations.
2.4.2 (2006-11-18)
----------------------
* **Fixed** a bug with directory states.
2.4.1 (2006-11-15)
----------------------
* **Fixed** a bug causing the ignore list not to be saved.
* **Fixed** a bug sometimes making delete and move operations stall.
2.4.0 (2006-11-10)
----------------------
* **Changed** the Windows interface. It is now .NET based.
* **Added** an auto-update feature to the Windows version.
* **Changed** the way power marking works. It is now a mode instead of a separate window.
* **Changed** the "Size (MB)" column to a "Size (KB)" column. The values are now "ceiled" instead of rounded. Therefore, a size of "0" now really means 0 bytes, not just a value too small to be rounded up. The same applies to delta values.
* **Removed** the min word length/count options. These came from Mp3 Filter, and just aren't used anymore. Word weighting does pretty much the same job.
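
The difference between "ceiled" and rounded sizes mentioned above can be illustrated with a minimal sketch. It is not dupeGuru's actual code: the function names are hypothetical, and it assumes that 1 KB means 1024 bytes.

```python
import math

BYTES_PER_KB = 1024  # assumption: the changelog does not say whether KB means 1000 or 1024 bytes


def size_kb_rounded(size_bytes: int) -> int:
    """Hypothetical old behavior: round to the nearest KB, so tiny files show as 0."""
    return round(size_bytes / BYTES_PER_KB)


def size_kb_ceiled(size_bytes: int) -> int:
    """Hypothetical new behavior: round up, so only a truly empty file shows as 0."""
    return math.ceil(size_bytes / BYTES_PER_KB)


if __name__ == "__main__":
    for size in (0, 1, 1024, 1025):
        print(size, size_kb_rounded(size), size_kb_ceiled(size))
```

With rounding, a 1-byte file and an empty file both display as 0. With the ceiling, a displayed 0 can only mean an empty file.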
2.3.4 (2006-11-07)
----------------------
* **Improved** speed and memory usage of the scanning engine, again. Does it mean there were a lot of improvements to be made? Nah...
2.3.3 (2006-11-02)
----------------------
* **Improved** speed and memory usage of the scanning engine, especially when the scan results in a lot of duplicates.
* Now I wonder if Sparkle is going to work well...
2.3.2 (2006-10-16)
----------------------
* **Added** an auto-update feature in the Mac OS X version (with Sparkle).
* **Fixed** a bug preventing some duplicate reports from being created correctly under Windows.
2.3.1 (2006-10-02)
----------------------
* **Fixed** a bug preventing some duplicates from being found, especially when scanning lots of files.
2.3.0 (2006-09-22)
----------------------
* **Added** XHTML export feature.
2.2.10 (2006-08-31)
----------------------
* **Added** sticky columns.
* **Fixed** an issue with file caching between scans.
* **Fixed** an issue preventing some duplicates from being deleted/moved/copied.
2.2.9 (2006-08-27)
----------------------
* **Fixed** an issue with ignore list and unicode.
* **Fixed** an issue with file attribute fetching sometimes causing dupeGuru to crash.
* **Fixed** an issue in the directories panel under Windows.
2.2.8 (2006-08-17)
----------------------
* **Fixed** an issue in the duplicate seeking engine preventing some duplicates from being found.
2.2.7 (2006-08-12)
----------------------
* **Improved** unicode support.
* **Improved** the "Reveal in Finder" ("Open Containing Folder" in Windows) feature so it selects the file in the folder it opens.
2.2.6 (2006-08-07)
----------------------
* **Improved** the ignore list system.
* dupeGuru is now a Universal application on Mac OS X.
2.2.5 (2006-07-26)
----------------------
* **Improved** application (.app) dupe detection on Mac OS X.
* **Fixed** an issue that occasionally made dupeGuru crash on startup.
2.2.4 (2006-06-27)
----------------------
* **Fixed** an issue with Move and Copy features.
2.2.3 (2006-06-15)
----------------------
* **Improved** duplicate scanning speed.
* **Added** a warning that a file couldn't be renamed if a file with the same name already exists.
2.2.2 (2006-06-07)
----------------------
* **Added** "Rename Selected" feature.
* **Fixed** some minor issues with "Reload Last Results" feature.
* **Fixed** ignore list issues.
2.2.1 (2006-05-22)
----------------------
* **Fixed** occasional progress bar woes under Windows.
* **Fixed** a bug in the registration system under Windows.
* Nothing has been changed in the Mac OS X version, but I want to keep version in sync.
2.2.0 (2006-05-10)
----------------------
* **Added** destination path re-creation options.
* **Added** an ignore list.
* **Changed** the main icon.
* **Improved** the delta values feature dramatically.
2.1.2 (2006-04-18)
----------------------
* **Added** the "Match similar words" option.
* **Fixed** Power marking issues under Mac.
2.1.1 (2006-04-14)
----------------------
* **Added** the "Display delta values" option.
* **Improved** Power marking sorting speed under Mac.
* **Fixed** Power marking sorting issues.
2.1.0 (2006-04-03)
----------------------
* **Added** the Power Marker feature.
* **Fixed** a column sorting bug. The results would sometimes lose their sort order.
* **Fixed** a bug with the Make Reference feature. The results sometimes weren't correctly refreshed after the reference switch.
2.0.1 (2006-03-23)
----------------------
* **Fixed** an issue occasionally occurring when trying to reload results from removable media that is no longer present.
2.0.0 (2006-03-17)
----------------------
* Complete rewrite.
* Now runs on Mac OS X.
1.0.0 (2004-09-24)
----------------------
* Initial release.


@@ -0,0 +1,91 @@
Contribute to dupeGuru
======================
dupeGuru was started as shareware (thus proprietary) so it doesn't have a legacy of
community-building. It's `been open source`_ for a while now and, although I've ("I" being Virgil
Dupras, author of the software) always wanted to have people other than me working on dupeGuru, I've
failed at attracting them.
Since the end of 2013, I've been putting a lot of effort into dupeGuru's
:doc:`developer documentation </developer/index>` and I'm more serious about my commitment to creating
a community around this project.
So, whatever your skills, if you're interested in contributing to dupeGuru, please do so. Normally,
this documentation should be enough to get you started, but if it isn't, then **please**,
`let me know`_ because it's a problem that I'm committed to fixing. If there's any situation where you'd
wish to contribute but some doubt you're having prevents you from going forward, please contact me.
I'd much prefer to spend the time figuring out with you whether (and how) you can contribute than
taking the chance of missing that opportunity.
Development process
-------------------
* `Source code repository`_
* `Issue Tracker`_
* `Issue labels meaning`_
dupeGuru's source code is on Github and thus managed in a Git repository. At all times, you should
be able to build from source a fresh checkout of the ``master`` branch using instructions from the
``README.md`` file at the root of this project. If you can't, it's a bug. Please report it.
``master`` is the main development branch, and thus represents what is going to be included in the
next feature release. When needed, we create maintenance branches for bugfixes of the current
feature release.
When implementing a big feature, it's possible that it gets its own branch until
it's stable enough to merge into ``master``.
Every release is tagged, the tag name containing the edition (for old versions) and its version.
For example, release 6.6.0 of dupeGuru ME is tagged ``me6.6.0``. Newer releases are tagged only
with the version number (because editions don't exist anymore), for example ``4.0.0``.
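
The tag naming scheme described above can be sketched as a small parser. This is only an illustration and not part of the dupeGuru code base: ``parse_release_tag`` and ``ReleaseTag`` are hypothetical names, and the sketch assumes that an edition prefix is a run of lowercase letters (only ``me`` is confirmed by the text above).

```python
import re
from typing import NamedTuple, Optional, Tuple


class ReleaseTag(NamedTuple):
    edition: Optional[str]    # e.g. "me" for old per-edition tags, None for newer tags
    version: Tuple[int, ...]  # e.g. (6, 6, 0)


# Assumption: an optional lowercase edition prefix followed by a dotted version number.
_TAG_RE = re.compile(r"^(?P<edition>[a-z]+)?(?P<version>\d+(?:\.\d+)*)$")


def parse_release_tag(tag: str) -> ReleaseTag:
    """Split a release tag such as 'me6.6.0' or '4.0.0' into edition and version."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a recognized release tag: {tag!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return ReleaseTag(match.group("edition"), version)


if __name__ == "__main__":
    print(parse_release_tag("me6.6.0"))
    print(parse_release_tag("4.0.0"))
```

Storing the version as a tuple of integers makes tags from the same edition sort in release order, which a plain string comparison would not do (``"10.0.0" < "9.0.0"`` as strings).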
Once you're past building the software, the :doc:`developer documentation </developer/index>` should
be enough to get you started with actual development. Then again, proper documentation is a very
difficult task and, in the case of dupeGuru, this documentation was practically nonexistent until
late in the project, so it's still lacking.
However, I'm committed to fixing this situation, so if you're in a situation where you lack proper
documentation to figure something out about this code, please contact me.
Tasks for non-developers
------------------------
**Create and comment on issues**. The single most useful way for a user who is not a developer to
contribute to a software project is by thoroughly documenting a bug or a feature request. Most of
the time, what we get as developers are emails like "the app crashes" and we spend a lot of time
trying to figure out the cause of that bug. By properly describing the nature and context of a crash
(we learn to do that with experience as a user who reports bugs), you help developers so immensely,
you have no idea.
It's the same thing with feature requests. Descriptions of feature requests, when thought has
already been given to how such a feature would fit in the current design, are precious to developers
and help them figure out a clear roadmap for the project.
So, even if you're not a developer, you can always open a Github account and create or comment on issues.
Your contribution will be much appreciated.
**Documentation**. This is a bit trickier because dupeGuru's documentation is written with a rather
complex markup language, `Sphinx`_ (based on `reST`_). To properly work within the documentation,
you have to know that language. I don't think that learning this language is outside the realm of
possibility for a non-developer, but it might be a daunting task.
That being said, if it's a minor modification to the documentation, nothing stops you from opening
an issue (there's a label for documentation issues, so this kind of issue is relevant to the
tracker) describing the change you propose to make and I'll be happy to make the change myself (if
relevant, of course).
Even if it's a bigger contribution to the documentation you want to make, I probably wouldn't mind
doing the formatting myself. But in that case, it's better to contact me first to make sure that we
agree on what should be added to the documentation.
**Translation**. Creating or improving an existing translation is a very good way to contribute to
dupeGuru. For more information about how to do that, you can refer to the `translator guide`_.
.. _been open source: https://www.hardcoded.net/articles/free-as-in-speech-fair-as-in-trade
.. _let me know: mailto:hsoft@hardcoded.net
.. _Source code repository: https://github.com/hsoft/dupeguru
.. _Issue Tracker: https://github.com/hsoft/dupeguru/issues
.. _Issue labels meaning: https://github.com/hsoft/dupeguru/wiki/issue-labels
.. _Sphinx: http://sphinx-doc.org/
.. _reST: http://en.wikipedia.org/wiki/ReStructuredText
.. _translator guide: https://github.com/hsoft/dupeguru/wiki/Translator-Guide


@@ -0,0 +1,5 @@
core.app
========
.. automodule:: core.app
:members:


@@ -0,0 +1,5 @@
core.directories
================
.. automodule:: core.directories
:members:


@@ -0,0 +1,36 @@
core.engine
===========
.. automodule:: core.engine
.. autoclass:: Match
.. autoclass:: Group
:members:
.. autofunction:: build_word_dict
.. autofunction:: compare
.. autofunction:: compare_fields
.. autofunction:: getmatches
.. autofunction:: getmatches_by_contents
.. autofunction:: get_groups
.. autofunction:: merge_similar_words
.. autofunction:: reduce_common_words
.. _fields:
Fields
------
Fields are groups of words which each represent a significant part of the whole name. This concept
is significant in music file names, where we often have names like "My Artist - a very long title
with many many words".
This title has 10 words. If you run a scan with a bit of tolerance, let's say 90%, you'll be able
to find a dupe that has only one "many" in the song title. However, you would also get false
duplicates from a title like "My Giraffe - a very long title with many many words", which is of
course a very different song and it doesn't make sense to match them.
When matching by fields, each field (separated by "-") is considered as a separate string to match
independently. After all fields are matched, the lowest result is kept. In the "Giraffe" example we
gave, the result would be 50% instead of 90% in normal mode.
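As a rough illustration of the arithmetic above, here is a minimal Python sketch. It is not the
code in ``core.engine``: the helper names (``split_words``, ``word_similarity``,
``field_similarity``) and the scoring formula (shared words counted on both sides, divided by the
total word count) are assumptions, chosen because they reproduce the 90% and 50% figures of the
example.

```python
from collections import Counter


def split_words(text):
    """Lower-case a string and split it into words, ignoring "-" separators."""
    return [word for word in text.lower().split() if word != "-"]


def word_similarity(first, second):
    """Return the percentage (0-100) of words shared by two word lists."""
    total = len(first) + len(second)
    if total == 0:
        return 0
    # Counter intersection keeps repeated words ("many many") honest.
    shared = sum((Counter(first) & Counter(second)).values())
    return shared * 200 // total


def field_similarity(first, second, separator=" - "):
    """Match each field independently and keep the lowest score."""
    first_fields = first.split(separator)
    second_fields = second.split(separator)
    if len(first_fields) != len(second_fields):
        return 0  # assumption: names with a different field count never match
    return min(
        word_similarity(split_words(a), split_words(b))
        for a, b in zip(first_fields, second_fields)
    )


first = "My Artist - a very long title with many many words"
second = "My Giraffe - a very long title with many many words"
normal_score = word_similarity(split_words(first), split_words(second))
fields_score = field_similarity(first, second)
print(normal_score, fields_score)  # 90 50
```

With these assumptions, the "Giraffe" title passes a 90% threshold in normal mode but is rejected
when matching by fields, because its first field only scores 50%.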


@@ -0,0 +1,5 @@
core.fs
=======
.. automodule:: core.fs
:members:


@@ -0,0 +1,5 @@
core.gui.deletion_options
=========================
.. automodule:: core.gui.deletion_options
:members:


@@ -0,0 +1,10 @@
core.gui
========
.. automodule:: core.gui
:members:
.. toctree::
:maxdepth: 2
deletion_options


@@ -0,0 +1,12 @@
core
====
.. toctree::
:maxdepth: 2
app
fs
engine
directories
results
gui/index


@@ -0,0 +1,5 @@
core.results
============
.. automodule:: core.results
:members:


@@ -0,0 +1,5 @@
hscommon.build
==============
.. automodule:: hscommon.build
:members:


@@ -0,0 +1,5 @@
hscommon.conflict
=================
.. automodule:: hscommon.conflict
:members:


@@ -0,0 +1,5 @@
hscommon.desktop
================
.. automodule:: hscommon.desktop
:members:


@@ -0,0 +1,12 @@
hscommon.gui.base
=================
.. automodule:: hscommon.gui.base
.. autosummary::
GUIObject
.. autoclass:: GUIObject
:members:
:private-members:


@@ -0,0 +1,25 @@
hscommon.gui.column
============================
.. automodule:: hscommon.gui.column
.. autosummary::
Columns
Column
ColumnsView
PrefAccessInterface
.. autoclass:: Columns
:members:
:private-members:
.. autoclass:: Column
:members:
:private-members:
.. autoclass:: ColumnsView
:members:
.. autoclass:: PrefAccessInterface
:members:


@@ -0,0 +1,18 @@
hscommon.gui.progress_window
============================
.. automodule:: hscommon.gui.progress_window
.. autosummary::
ProgressWindow
ProgressWindowView
.. autoclass:: ProgressWindow
:members:
:private-members:
.. autoclass:: ProgressWindowView
:members:
:private-members:


@@ -0,0 +1,26 @@
hscommon.gui.selectable_list
============================
.. automodule:: hscommon.gui.selectable_list
.. autosummary::
Selectable
SelectableList
GUISelectableList
GUISelectableListView
.. autoclass:: Selectable
:members:
:private-members:
.. autoclass:: SelectableList
:members:
:private-members:
.. autoclass:: GUISelectableList
:members:
:private-members:
.. autoclass:: GUISelectableListView
:members:


@@ -0,0 +1,26 @@
hscommon.gui.table
==================
.. automodule:: hscommon.gui.table
.. autosummary::
Table
Row
GUITable
GUITableView
.. autoclass:: Table
:members:
:private-members:
.. autoclass:: Row
:members:
:private-members:
.. autoclass:: GUITable
:members:
:private-members:
.. autoclass:: GUITableView
:members:


@@ -0,0 +1,16 @@
hscommon.gui.text_field
=======================
.. automodule:: hscommon.gui.text_field
.. autosummary::
TextField
TextFieldView
.. autoclass:: TextField
:members:
:private-members:
.. autoclass:: TextFieldView
:members:


@@ -0,0 +1,18 @@
hscommon.gui.tree
=================
.. automodule:: hscommon.gui.tree
.. autosummary::
Tree
Node
.. autoclass:: Tree
:members:
:private-members:
.. autoclass:: Node
:members:
:private-members:


@@ -0,0 +1,16 @@
hscommon
========
.. toctree::
:maxdepth: 2
:glob:
build
conflict
desktop
notify
path
util
jobprogress/*
gui/*


@@ -0,0 +1,17 @@
hscommon.jobprogress.job
========================
.. automodule:: hscommon.jobprogress.job
.. autosummary::
Job
NullJob
.. autoclass:: Job
:members:
:private-members:
.. autoclass:: NullJob
:members:


@@ -0,0 +1,12 @@
hscommon.jobprogress.performer
==============================
.. automodule:: hscommon.jobprogress.performer
.. autosummary::
ThreadedJobPerformer
.. autoclass:: ThreadedJobPerformer
:members:


@@ -0,0 +1,12 @@
hscommon.jobprogress.qt
=======================
.. automodule:: hscommon.jobprogress.qt
.. autosummary::
Progress
.. autoclass:: Progress
:members:


@@ -0,0 +1,5 @@
hscommon.notify
===============
.. automodule:: hscommon.notify
:members:


@@ -0,0 +1,5 @@
hscommon.path
=============
.. automodule:: hscommon.path
:members:


@@ -0,0 +1,5 @@
hscommon.util
=============
.. automodule:: hscommon.util
:members:


@@ -0,0 +1,74 @@
Developer Guide
===============
When looking at a non-trivial codebase for the first time, it's very difficult to understand
anything about it until you get the "Big Picture". This page is meant to help you get
dupeGuru's big picture.
Branches and tags
-----------------
The git repo has one main branch, ``master``. It represents the latest "stable development commit",
that is, the latest commit that doesn't include in-progress features. This branch should always
be buildable, and ``tox`` should always run without errors on it.
When a feature/bugfix has an atomicity of a single commit, it's alright to commit right into
``master``. However, if a feature/bugfix needs more than a commit, it should live in a separate
topic branch until it's ready.
Every release is tagged with the version number. For example, there's a ``2.8.2`` tag for the
v2.8.2 release.
Model/View/Controller... nope!
------------------------------
dupeGuru's codebase has quite a few design flaws. The Model, View and Controller roles are filled by
different classes, scattered around. If you're aware of that, it might help you to understand what
the heck is going on.
The central piece of dupeGuru is :class:`core.app.DupeGuru`. It's the GUI code's only
interface to the Python code. A duplicate scan is started with
:meth:`core.app.DupeGuru.start_scanning()`, directories are added through
:meth:`core.app.DupeGuru.add_directory()`, etc.
Much of the app's functionality is implemented in the platform-specific subclasses of
:class:`core.app.DupeGuru`, like ``DupeGuru`` in ``cocoa/inter/app.py``, or the ``DupeGuru`` class
in ``qt/base/app.py``. For example, "Remove Selected From Results" is performed by
``RemoveSelected()`` on the cocoa side and by ``remove_duplicates()`` on the PyQt side.
.. _jobs:
Jobs
----
A lot of operations in dupeGuru take a significant amount of time. This is why there's a generalized
threaded job mechanism built into :class:`~core.app.DupeGuru`. First, :class:`~core.app.DupeGuru` has
a ``progress`` member which is an instance of
:class:`~hscommon.jobprogress.performer.ThreadedJobPerformer`. It lets the GUI code know of the progress
of the current threaded job. When :class:`~core.app.DupeGuru` needs to start a job, it calls
``_start_job()`` and the platform specific subclass deals with the details of starting the job.
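The division of labor described above can be pictured with the following sketch. It only
illustrates the pattern: ``JobProgress``, ``CoreApp`` and ``ToolkitApp`` are hypothetical
stand-ins written for this example, not the real ``ThreadedJobPerformer`` or ``DupeGuru``
classes, and the fake scan does no real work.

```python
import threading


class JobProgress:
    """Hypothetical stand-in for the object the GUI polls for progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self.percent = 0
        self.thread = None

    def set_progress(self, percent):
        with self._lock:
            self.percent = percent

    def run_threaded(self, target):
        # Run the job in a worker thread so the GUI stays responsive.
        self.thread = threading.Thread(target=target)
        self.thread.start()


class CoreApp:
    """Toolkit-independent core: knows what the job is, not how to run it."""

    def __init__(self):
        self.progress = JobProgress()
        self.results = None

    def _start_job(self, func):
        raise NotImplementedError("platform-specific subclasses override this")

    def start_scanning(self):
        def do_scan():
            for step in range(1, 5):
                self.progress.set_progress(step * 25)  # pretend to work
            self.results = ["group"]

        self._start_job(do_scan)


class ToolkitApp(CoreApp):
    """Platform-specific subclass: decides how the job is actually started."""

    def _start_job(self, func):
        self.progress.run_threaded(func)


app = ToolkitApp()
app.start_scanning()
app.progress.thread.join()  # a real GUI would poll progress instead of joining
```

The point of the split is that the core never has to know which GUI toolkit is showing the
progress window.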
Core principles
---------------
The core of the duplicate matching takes place (for SE and ME, not PE) in :mod:`core.engine`.
There's :func:`core.engine.getmatches` which takes a list of :class:`core.fs.File` instances and
returns a list of ``(firstfile, secondfile, match_percentage)`` matches. Then, there's
:func:`core.engine.get_groups` which takes a list of matches and returns a list of
:class:`.Group` instances (a :class:`.Group` is basically a list of :class:`.File` matching
together).
When a scan is over, the final result (the list of groups from :func:`.get_groups`) is placed into
:attr:`core.app.DupeGuru.results`, which is a :class:`core.results.Results` instance. The
:class:`~.Results` instance is where all the dupe marking, sorting, removing, power marking, etc.
takes place.
API
---
.. toctree::
:maxdepth: 2
core/index
hscommon/index


@@ -0,0 +1,184 @@
Frequently Asked Questions
==========================
.. contents::
What is dupeGuru?
-----------------
dupeGuru is a tool to find duplicate files on your computer. It has three operational modes:
Standard, Music and Picture. Each mode has its own specialized preferences.
Each mode has multiple scan types, such as filename, contents and tags. Some scan types feature an
advanced fuzzy matching algorithm, allowing you to find duplicates that other more rigid duplicate
scanners can't.
What makes it special?
----------------------
It's mostly about customizability. There are a lot of scanning options that allow you to get the
type of results you're really looking for.
How safe is it to use dupeGuru?
-------------------------------
Very safe. dupeGuru has been designed to make sure you don't delete files you didn't mean to delete.
First, there is the reference folder system that lets you define folders from which you absolutely
**don't** want dupeGuru to let you delete files, and then there is the group reference system
that makes sure that you will **always** keep at least one member of the duplicate group.
How can I report a bug or suggest a feature?
--------------------------------------------
dupeGuru is hosted on `Github`_ and it's also where issues are tracked. The best way to report a
bug or suggest a feature is to sign up on Github and `open an issue`_.
The mark box of a file I want to delete is disabled. What must I do?
--------------------------------------------------------------------
You cannot mark the reference (the first file) of a duplicate group. However, what you can do is to
promote a duplicate file to reference. Thus, if a file you want to mark is reference, select a
duplicate file in the group that you want to promote to reference, and click on
**Actions-->Make Selected into Reference**. If the reference file is from a reference folder
(filename written in blue letters), you cannot remove it from the reference position.
I have a folder from which I really don't want to delete files.
---------------------------------------------------------------
If you want to be sure that dupeGuru will never delete files from a particular folder, make sure to
set its state to **Reference** in :doc:`folders`.
What is this '(X discarded)' notice in the status bar?
------------------------------------------------------
In some cases, some matches are not included in the final results for safety reasons. Let me use
an example. We have 3 files: A, B and C. We scan them using a low filter hardness. The scanner
determines that A matches with B, A matches with C, but B does **not** match with C. Here, dupeGuru
has kind of a problem. It cannot create a duplicate group with A, B and C in it because not all
files in the group would match together. It could create 2 groups: one A-B group and then one A-C
group, but it will not, for safety reasons. Let's think about it: if B doesn't match with C, it
probably means that either B, C or both are not actually duplicates. If there were 2 groups (A-B
and A-C), you would end up deleting both B and C. And if one of them is not a duplicate, that is
really not what you want to do, right? So what dupeGuru does in a case like this is to discard the
A-C match (and add a notice in the status bar). Thus, if you delete B and re-run a scan, you will
have an A-C match in your next results.
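The A, B, C example can be replayed with a small sketch. This is only an illustration of the rule
"every member of a group must match every other member"; ``group_matches`` is a hypothetical
helper written for this page, not dupeGuru's actual grouping code.

```python
def group_matches(matches):
    """Build groups from (first, second) pairs.

    A file only joins a group if it matches every file already in it.
    Matches that cannot be honored are returned as "discarded".
    """
    matched_with = {}
    for first, second in matches:
        matched_with.setdefault(first, set()).add(second)
        matched_with.setdefault(second, set()).add(first)

    groups, group_of, discarded = [], {}, []
    for first, second in matches:
        g1, g2 = group_of.get(first), group_of.get(second)
        if g1 is None and g2 is None:
            group = [first, second]
            groups.append(group)
            group_of[first] = group_of[second] = group
        elif g1 is not None and g2 is not None:
            if g1 is not g2:
                discarded.append((first, second))  # files sit in different groups
        else:
            group, newcomer = (g1, second) if g1 is not None else (g2, first)
            if all(member in matched_with[newcomer] for member in group):
                group.append(newcomer)
                group_of[newcomer] = group
            else:
                discarded.append((first, second))
    return groups, discarded


groups, discarded = group_matches([("A", "B"), ("A", "C")])
print(groups, discarded)  # [['A', 'B']] [('A', 'C')]
```

If B also matched C, all three files would end up in a single group and nothing would be
discarded.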
I want to mark all files from a specific folder. What can I do?
---------------------------------------------------------------
Enable the :doc:`Dupes Only <results>` mode and click on the Folder column to sort your duplicates
by folder. It will then be easy for you to select all duplicates from the same folder, and then
press Space to mark all selected duplicates.
I want to remove all files that are more than 300 KB away from their reference file. What can I do?
---------------------------------------------------------------------------------------------------
* Enable the :doc:`Dupes Only <results>` mode.
* Enable the **Delta Values** mode.
* Click on the "Size" column to sort the results by size.
* Select all duplicates below -300.
* Click on **Remove Selected from Results**.
* Select all duplicates over 300.
* Click on **Remove Selected from Results**.
I want to make my latest modified files reference files. What can I do?
-----------------------------------------------------------------------
* Enable the :doc:`Dupes Only <results>` mode.
* Enable the **Delta Values** mode.
* Click on the "Modification" column to sort the results by modification date.
* Click on the "Modification" column again to reverse the sort order.
* Select all duplicates over 0.
* Click on **Make Selected into Reference**.
I want to mark all duplicates containing the word "copy". How do I do that?
---------------------------------------------------------------------------
* Type "copy" in the "Filter" field in the top-right corner of the result window.
* Click on **Mark --> Mark All**.
I want to remove all songs that are more than 3 seconds away from their reference file. What can I do?
------------------------------------------------------------------------------------------------------
* Enable the :doc:`Dupes Only <results>` mode.
* Enable the **Delta Values** mode.
* Click on the "Time" column to sort the results by time.
* Select all duplicates below -00:03.
* Click on **Remove Selected from Results**.
* Select all duplicates over 00:03.
* Click on **Remove Selected from Results**.
I want to make my highest bitrate songs reference files. What can I do?
-----------------------------------------------------------------------
* Enable the :doc:`Dupes Only <results>` mode.
* Enable the **Delta Values** mode.
* Click on the "Bitrate" column to sort the results by bitrate.
* Click on the "Bitrate" column again to reverse the sort order.
* Select all duplicates over 0.
* Click on **Make Selected into Reference**.
I don't want [live] and [remix] versions of my songs counted as duplicates. How do I do that?
---------------------------------------------------------------------------------------------
If your comparison threshold is low enough, you will probably end up with live and remix
versions of your songs in your results. There's nothing you can do to prevent that, but there's
something you can do to easily remove them from your results after the scan: post-scan
filtering. If, for example, you want to remove every song with anything inside square brackets
[]:
* Type "[*]" in the "Filter" field in the top-right corner of the result window.
* Click on **Mark --> Mark All**.
* Click on **Actions --> Remove Selected from Results**.
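To picture what such a filter selects, here is a small sketch. It assumes that a plain filter
(without the regular expression preference) treats ``*`` as a wildcard and every other character
literally; the ``filter_to_regex`` helper is hypothetical and only illustrates that idea.

```python
import re


def filter_to_regex(filter_text):
    """Turn a wildcard filter such as "[*]" into a compiled regular expression."""
    escaped = re.escape(filter_text)  # "[*]" becomes r"\[\*\]"
    # Assumption: "*" means "any run of characters"; everything else is literal.
    return re.compile(escaped.replace(r"\*", ".*"), re.IGNORECASE)


pattern = filter_to_regex("[*]")
names = ["My Song.mp3", "My Song [live].mp3", "My Song [Remix].mp3"]
filtered = [name for name in names if pattern.search(name)]
print(filtered)  # ['My Song [live].mp3', 'My Song [Remix].mp3']
```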
The "Filter Hardness" slider in the preferences won't move!
-----------------------------------------------------------
This slider is only relevant for scan types that support "fuzziness". Many scan types, such as the
"Contents" type, only support exact matches. When these types are selected, the slider is disabled.
On some operating systems, the fact that it's disabled is harder to see than on others, but if you
can't move the slider, it means that this preference is irrelevant in your current scan type.
I've tried to send my duplicates to Trash, but dupeGuru is telling me it can't do it. Why? What can I do?
---------------------------------------------------------------------------------------------------------
Most of the time, the reason why dupeGuru can't send files to Trash is because of file permissions.
You need *write* permissions on files you want to send to Trash.
If dupeGuru still gives you trouble after fixing your permissions, try enabling the "Directly
delete files" option that is offered to you when you activate Send to Trash. This will not send
files to the Trash, but delete them immediately. In some cases, for example on network storage
(NAS), this has been known to work when normal deletion didn't.
If this fails, `HS forums`_ might be of some help.
Why is Picture mode's contents scan so slow?
--------------------------------------------
This scanning method is very different from other methods. It can detect duplicate photos even if they
are not exactly the same. This very cool capability has a cost: time. Every picture has to be
individually and fuzzily matched to all others, and this takes a lot of CPU power.
If all you need to find is exact duplicates, just use the standard mode of dupeGuru with the
Contents scan method. If your photos have EXIF tags, you can also try the "EXIF" scan method which
is much faster.
Where are user files located?
-----------------------------
For some reason, you'd like to remove or edit dupeGuru's user files (debug logs, caches, etc.).
Where they're located depends on your platform:
* Linux: ``~/.local/share/data/Hardcoded Software/dupeGuru``
* Mac OS X: ``~/Library/Application Support/dupeGuru``
Preferences are stored elsewhere:
* Linux: ``~/.config/Hardcoded Software/dupeGuru.conf``
* Mac OS X: In the built-in ``defaults`` system, as ``com.hardcoded-software.dupeguru``
.. _HS forums: https://forum.hardcoded.net/
.. _Github: https://github.com/hsoft/dupeguru
.. _open an issue: https://github.com/hsoft/dupeguru/wiki/issue-labels


@@ -0,0 +1,76 @@
Folder Selection
================
The first window you see when you launch dupeGuru is the folder selection window. This window
contains the basic input dupeGuru needs to start a scan:
* An Application Mode selection
* A Scan Type selection
* Folders to scan
Application Mode
----------------
dupeGuru has three main modes: Standard, Music and Picture.
Standard is for any type of files. This makes this mode the most versatile, but it lacks
specialized features other modes have.
Music mode scans only music files, but it supports tags comparison and its results window has many
audio-related informational columns.
Picture mode scans only pictures, but its contents scan type is a powerful fuzzy matcher that can
find pictures that are similar without being exactly the same.
Choosing an application mode not only changes available scan types in the selector below, but also
changes available options in the preferences panel. Thus, if you want to fine tune your scan, be
sure to open the preferences panel **after** you've selected the application mode.
Scan Type
---------
This selector determines the type of the scan we'll do. See :doc:`scan` for details about scan
types.
Folder List
-----------
To add a folder, click on the **+** button. If you have added folders before, a popup
menu with a list of recent folders you added will appear. You can click on one of
them to add it directly to your list. If you click on the first item of the
popup menu, **Add New Folder...**, you will be prompted for a folder to add. If
you have never added a folder, no menu will appear and you will directly be prompted
for a new folder to add.
An alternate way to add folders to the list is to drag them into the list.
To remove a folder, select the folder to remove and click on **-**. If a subfolder is selected when
you click the button, the selected folder will be set to **excluded** state (see below) instead of
being removed.
Folder states
-------------
Every folder can be in one of these 3 states:
**Normal:**
Duplicates found in this folder can be deleted.
**Reference:**
Duplicates found in this folder **cannot** be deleted. Files from this folder can
only end up in **reference** position in the dupe group. If more than one file from reference
folders end up in the same dupe group, only one will be kept. The others will be removed from
the group.
**Excluded:**
Files in this directory will not be included in the scan.
The default state of a folder is, of course, **Normal**. You can use **Reference** state for a
folder if you want to be sure that you won't delete any file from it.
When you set the state of a directory, all subfolders of this folder automatically inherit this
state unless you explicitly set a subfolder's state.
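The inheritance rule can be sketched as a lookup that walks up from a folder to its parents until
it finds an explicitly set state. This is only an illustration: ``effective_state`` and the state
constants are hypothetical names, not dupeGuru's internal API.

```python
from pathlib import PurePosixPath

NORMAL, REFERENCE, EXCLUDED = "normal", "reference", "excluded"


def effective_state(path, explicit_states):
    """Return the state of a folder, inherited from its closest configured parent."""
    path = PurePosixPath(path)
    for candidate in (path, *path.parents):
        state = explicit_states.get(str(candidate))
        if state is not None:
            return state  # the closest explicit setting wins
    return NORMAL  # default state when nothing was set


states = {
    "/music": REFERENCE,
    "/music/incoming": NORMAL,  # explicit override of the inherited state
    "/music/tmp": EXCLUDED,
}
print(effective_state("/music/albums/live", states))  # reference
print(effective_state("/music/incoming/new", states))  # normal
```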
Scan
----
When you're ready, click on the **Scan** button to initiate the scanning process. When it's done,
you'll be shown the :doc:`results`.


@@ -0,0 +1,45 @@
dupeGuru help
=============
This help document is also available in these languages:
* `French <http://www.hardcoded.net/dupeguru/help/fr>`__
* `German <http://www.hardcoded.net/dupeguru/help/de>`__
* `Armenian <http://www.hardcoded.net/dupeguru/help/hy>`__
* `Russian <http://www.hardcoded.net/dupeguru/help/ru>`__
* `Ukrainian <http://www.hardcoded.net/dupeguru/help/uk>`__
dupeGuru is a tool to find duplicate files on your computer. It has three
modes, Standard, Music and Picture, with each mode having its own scan types
and little features.
Although dupeGuru can easily be used without documentation, reading this file
will help you to master it. If you are looking for guidance for your first
duplicate scan, you can take a look at the :doc:`Quick Start <quick_start>`
section.
It is a good idea to keep dupeGuru updated. You can download the latest version on its `homepage`_.
Contents:
.. toctree::
:maxdepth: 2
contribute
quick_start
folders
preferences
scan
results
reprioritize
faq
developer/index
changelog
Indices and tables
==================
* :ref:`genindex`
* :ref:`search`
.. _homepage: https://www.hardcoded.net/dupeguru


@@ -0,0 +1,78 @@
Preferences
===========
**Tags to scan:**
When using the **Tags** scan type, you can select the tags that will be used for comparison.
**Word weighting:**
See :ref:`word-weighting`.
**Match similar words:**
See :ref:`similarity-matching`.
**Match pictures of different dimensions:**
If you check this box, pictures of different dimensions will be allowed in the same
duplicate group.
.. _filter-hardness:
**Filter Hardness:**
The threshold needed for two files to be considered duplicates. A lower value means more
duplicates. The meaning of the threshold depends on the scanning type (see :doc:`scan`).
Only works for :ref:`worded <worded-scan>` and :ref:`picture blocks <picture-blocks-scan>`
scans.
**Can mix file kind:**
If you check this box, duplicate groups are allowed to have files with different extensions. If
you don't check it, well, they aren't!
**Ignore duplicates hardlinking to the same file:**
If this option is enabled, dupeGuru will verify duplicates to see if they refer to the same
`inode`_. If they do, they will not be considered duplicates. (Only for OS X and Linux)
**Use regular expressions when filtering:**
If you check this box, the filtering feature will treat your filter query as a
**regular expression**. Explaining regular expressions is beyond the scope of this document. A
good place to start learning them is `regular-expressions.info`_.
**Remove empty folders after delete or move:**
When this option is enabled, a folder is deleted if it is left empty after a file in it is deleted
or moved.
**Copy and Move:**
Determines how the Copy and Move operations (in the Action menu) will behave.
* **Right in destination:** All files will be sent directly into the selected destination, without
trying to recreate the source path at all.
* **Recreate relative path:** The source file's path will be re-created in the destination folder up
to the root selection in the Directories panel. For example, if you added
``/Users/foobar/SomeFolder`` to your Directories panel and you move
``/Users/foobar/SomeFolder/SubFolder/SomeFile.ext`` to the destination
``/Users/foobar/MyDestination``, the final destination for the file will be
``/Users/foobar/MyDestination/SubFolder`` (``SomeFolder`` has been trimmed from the source's path in
the final destination).
* **Recreate absolute path:** The source file's path will be re-created in the destination folder in
its entirety. For example, if you move ``/Users/foobar/SomeFolder/SubFolder/SomeFile.ext`` to the
destination ``/Users/foobar/MyDestination``, the final destination for the file will be
``/Users/foobar/MyDestination/Users/foobar/SomeFolder/SubFolder``.
In all cases, dupeGuru nicely handles naming conflicts by prepending a number to the destination
filename if the filename already exists in the destination.
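The three Copy and Move behaviors can be summed up with a short sketch. It is an illustration
only: ``destination_for`` and the mode names ``"direct"``, ``"relative"`` and ``"absolute"`` are
hypothetical, and naming-conflict handling is left out.

```python
from pathlib import PurePosixPath


def destination_for(source, root, destination, mode):
    """Compute where a file would land for each Copy and Move setting."""
    source = PurePosixPath(source)
    destination = PurePosixPath(destination)
    if mode == "direct":  # "Right in destination"
        folder = destination
    elif mode == "relative":  # "Recreate relative path"
        folder = destination / source.parent.relative_to(root)
    elif mode == "absolute":  # "Recreate absolute path"
        folder = destination / source.parent.relative_to(source.anchor)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return folder / source.name


source = "/Users/foobar/SomeFolder/SubFolder/SomeFile.ext"
root = "/Users/foobar/SomeFolder"
dest = "/Users/foobar/MyDestination"
for mode in ("direct", "relative", "absolute"):
    print(mode, destination_for(source, root, dest, mode))
```

With the paths of the example above, the "relative" mode yields a file under
``/Users/foobar/MyDestination/SubFolder`` and the "absolute" mode a file under
``/Users/foobar/MyDestination/Users/foobar/SomeFolder/SubFolder``.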
**Custom Command:**
This preference determines the command that will be invoked by the "Invoke Custom Command"
action. You can invoke any external application through this action. This can be useful if,
for example, you have a nice diffing application installed.
The format of the command is the same as what you would write in the command line, except that there
are 2 placeholders: **%d** and **%r**. These placeholders will be replaced by the path of the
selected dupe (%d) and the path of the selected dupe's reference file (%r).
If the path to your executable contains space characters, you should enclose it in "" quotes. You
should also enclose placeholders in quotes because it's very possible that paths to dupes and refs
will contain spaces. Here's an example custom command::
"C:\Program Files\SuperDiffProg\SuperDiffProg.exe" "%d" "%r"
.. _inode: http://en.wikipedia.org/wiki/Inode
.. _regular-expressions.info: http://www.regular-expressions.info
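
To see why quoting the placeholders of a custom command matters, here is a small sketch. It assumes
that the placeholders are substituted first and that the result is then split like a POSIX shell
command line; ``build_command`` is a hypothetical helper, not the way dupeGuru necessarily invokes
the command.

```python
import shlex


def build_command(template, dupe_path, ref_path):
    """Substitute %d and %r, then split the command line into arguments."""
    # Assumption: paths contain no quote characters and no "%r"/"%d" sequences.
    command = template.replace("%d", dupe_path).replace("%r", ref_path)
    return shlex.split(command)


dupe = "/tmp/my file.txt"
ref = "/tmp/ref.txt"
quoted = build_command('"/opt/diff tool/difftool" "%d" "%r"', dupe, ref)
unquoted = build_command("difftool %d %r", dupe, ref)
print(quoted)    # ['/opt/diff tool/difftool', '/tmp/my file.txt', '/tmp/ref.txt']
print(unquoted)  # ['difftool', '/tmp/my', 'file.txt', '/tmp/ref.txt']
```

Without quotes, the path containing a space is broken into two arguments, which is rarely what the
external program expects.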


@@ -0,0 +1,14 @@
Quick Start
===========
To get you quickly started with dupeGuru, let's just make a standard scan using default preferences.
* Launch dupeGuru.
* Add folders to scan with either drag & drop or the "+" button.
* Click on **Scan**.
* Wait until the scan process is over.
* Look at every duplicate (the files that are indented) and verify that it is indeed a duplicate of the group's reference (the file above the duplicate that is not indented and has a disabled mark box).
* If a file is a false duplicate, select it and click on **Actions-->Remove Selected from Results**.
* Once you are sure that there are no false duplicates in your results, click on **Edit-->Mark All**, and then **Actions-->Send Marked to Recycle bin**.
That is only a basic scan. There is a lot of tweaking you can do to get different results, and there are several methods of examining and modifying your results. To learn about them, just read the rest of this help file.


@@ -0,0 +1,25 @@
Re-Prioritizing duplicates
==========================
dupeGuru tries to automatically determine which duplicate should go in each group's reference
position, but sometimes it gets it wrong. In many cases, clever dupe sorting with "Delta Values"
and "Dupes Only" options in addition to the "Make Selected into Reference" action does the trick,
but sometimes, a more powerful option is needed. This is where the Re-Prioritization dialog comes
into play. You can summon it through the "Re-Prioritize Results" item in the "Actions" menu.
This dialog allows you to select criteria according to which a reference dupe will be selected in
each dupe group. The list of available criteria is on the left and the list of criteria you've
selected is on the right.
A criterion is a category followed by an argument. For example, "Size (Highest)" means that the dupe
with the biggest size will win. "Folder (/foo/bar)" means that dupes in this folder will win. To add
a criterion to the rightmost list, first select a category in the combobox, then select a
subargument in the list below, and then click on the right pointing arrow button.
The order of the list on the right is important (you can re-order items through drag & drop). When
picking a dupe for reference position, the first criterion is used. If there's a tie, the second
criterion is used and so on and so on. For example, if your criteria are "Size (Highest)" and then
"Filename (Doesn't end with a number)", the reference file that will be picked in a group will be
the biggest file, and if two or more files have the same size, the one that has a filename that
doesn't end with a number will be used. When all criteria result in ties, the order in which dupes
previously were in the group will be used.
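The tie-breaking order described above behaves like sorting on a tuple of keys. The sketch below
illustrates this with two hypothetical criteria; ``Dupe``, ``size_highest``,
``filename_no_trailing_number`` and ``pick_reference`` are names made up for the example, not
dupeGuru's own classes.

```python
import re
from collections import namedtuple

Dupe = namedtuple("Dupe", "name size")


def size_highest(dupe):
    """"Size (Highest)": bigger files get a smaller (better) key."""
    return -dupe.size


def filename_no_trailing_number(dupe):
    """"Filename (Doesn't end with a number)": 0 is better than 1."""
    stem = dupe.name.rsplit(".", 1)[0]
    return 1 if re.search(r"\d$", stem) else 0


def pick_reference(group, criteria):
    """Pick the dupe with the best key; later criteria only break ties."""
    # min() returns the first of equal candidates, so a full tie keeps
    # the order in which the dupes previously were in the group.
    return min(group, key=lambda dupe: tuple(c(dupe) for c in criteria))


group = [Dupe("song 2.mp3", 500), Dupe("song.mp3", 500), Dupe("song small.mp3", 100)]
reference = pick_reference(group, [size_highest, filename_no_trailing_number])
print(reference.name)  # song.mp3
```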


@@ -0,0 +1,196 @@
Results
=======
.. contents::
When dupeGuru is finished scanning for duplicates, it will show its results in the form of a duplicate group list.
About duplicate groups
----------------------
A duplicate group is a group of files that all match together. Every group has a **reference file** and one or more **duplicate files**. The reference file is the first file of the group. Its mark box is disabled. Below it, and indented, are the duplicate files.
You can mark duplicate files, but you can never mark the reference file of a group. This is a security measure to prevent dupeGuru from deleting not only duplicate files, but their reference. You sure don't want that, do you?
What determines which files are reference and which files are duplicates is first their folder state. A file from a reference folder will always be reference in a duplicate group. If all files are from normal folders, the size determines which file will be the reference of a duplicate group. dupeGuru assumes that you always want to keep the biggest file, so the biggest file will take the reference position.
You can change the reference file of a group manually. To do so, select the duplicate file you want
to promote to reference, and click on **Actions-->Make Selected into Reference**.
Reviewing results
-----------------
Although you can just click on **Edit-->Mark All** and then **Actions-->Send Marked to Recycle bin** to quickly delete all duplicate files in your results, it is always recommended to review all duplicates before deleting them.
To help you review the results, you can bring up the **Details panel**. This panel shows all the details of the currently selected file as well as its reference's details. This is very handy to quickly determine if a duplicate really is a duplicate. You can also double-click on a file to open it with its associated application.
If you have more false duplicates than true duplicates (if your filter hardness is very low), the best way to proceed would be to review duplicates, mark true duplicates and then click on **Actions-->Send Marked to Recycle bin**. If you have more true duplicates than false duplicates, you can instead mark all files that are false duplicates, and use **Actions-->Remove Marked from Results**.
Marking and Selecting
---------------------
A **marked** duplicate is a duplicate with the little box next to it having a check-mark. A **selected** duplicate is a duplicate being highlighted. The multiple selection actions can be performed in dupeGuru in the standard way (Shift/Command/Control click). You can toggle all selected duplicates' mark state by pressing **space**.
Show Dupes Only
---------------
When this mode is enabled, the duplicates are shown without their respective reference file. You can select, mark and sort this list, just like in normal mode.
The dupeGuru results, when in normal mode, are sorted according to duplicate groups' **reference file**. This means that if you want, for example, to mark all duplicates with the "exe" extension, you cannot just sort the results by "Kind" to have all exe duplicates together because a group can be composed of more than one kind of file. That is where Dupes Only mode comes into play. To mark all your "exe" duplicates, you just have to:
* Enable the Dupes Only mode.
* Add the "Kind" column with the "Columns" menu.
* Click on that "Kind" column to sort the list by kind.
* Locate the first duplicate with an "exe" kind.
* Select it.
* Scroll down the list to locate the last duplicate with an "exe" kind.
* Hold Shift and click on it.
* Press Space to mark all selected duplicates.
.. _deltavalues:
Delta Values
------------
If you turn this switch on, numerical columns will display the value relative to the duplicate's
reference instead of the absolute values. These delta values will also be displayed in a different
color, orange, so you can spot them easily. For example, if a duplicate is 1.2 MB and its reference
is 1.4 MB, the Size column will display -0.2 MB.
Moreover, non-numerical values will also be in orange if their value is different from their
reference, and stay black if their value is the same. Combined with column sorting in Dupes Only
mode, this allows for very powerful post-scan filtering.
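As a rough sketch of what the switch computes, assuming sizes are stored in bytes and that 1 MB is 1,000,000 bytes (both assumptions of this example, as are the function names):

```python
MB = 1_000_000  # assumption: decimal megabytes


def numeric_delta(dupe_value, ref_value):
    # Numerical columns show the value relative to the reference.
    return dupe_value - ref_value


def is_highlighted(dupe_value, ref_value):
    # Non-numerical columns turn orange when they differ from the reference.
    return dupe_value != ref_value


size_delta = numeric_delta(1_200_000, 1_400_000)          # bytes
size_delta_mb = size_delta / MB                            # displayed as -0.2 MB
name_is_orange = is_highlighted("song copy.mp3", "song.mp3")
```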
Dupes Only and Delta Values
---------------------------
The Dupes Only mode unveils its true power when you use it with the Delta Values switch turned on.
When you turn it on, relative values will be displayed instead of absolute ones. So if, for example,
you want to remove from your results all duplicates that are more than 300 KB away from their
reference, you could sort the dupes only results by Size, select all duplicates under -300 in the
Size column, delete them, and then do the same for duplicates over 300 at the bottom of the list.
Same thing for non-numerical values: When Dupes Only and Delta Values are enabled at the same time,
column sorting groups rows depending on whether they're orange or not. Example: You ran a contents
scan, but you would only like to delete duplicates that have the same filename? Sort by filename
and all dupes with their filename attribute being the same as the reference will be grouped
together, their value being in black.
You could also use it to change the reference priority of your duplicate list. When you make a fresh
scan, if there are no reference folders, the reference file of every group is the biggest file. If
you want to change that, for example, to the latest modification time, you can sort the dupes only
results by modification time in **descending** order, select all duplicates with a modification time
delta value higher than 0 and click on **Make Selected into Reference**. The reason why you must
make the sort order descending is because if 2 files among the same duplicate group are selected
when you click on **Make Selected into Reference**, only the first of the list will be made
reference, the other will be ignored. And since you want the last modified file to be reference,
having the sort order descending assures you that the first item of the list will be the last
modified.
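The reasoning behind the descending sort can be simulated with plain data. The rows below are made up for the example (group id, filename, modification time delta), and the loop only imitates the "first selected duplicate of each group wins" behavior described above:

```python
# Hypothetical Dupes Only rows: (group id, filename, modification time delta).
rows = [
    (1, "a_old.txt", -50),
    (1, "a_new.txt", 120),
    (1, "a_newer.txt", 300),
    (2, "b_copy.txt", 40),
    (3, "c_copy.txt", -10),
]

# Sort by modification time delta, in descending order.
ordered = sorted(rows, key=lambda row: row[2], reverse=True)

# Select every duplicate that is more recent than its reference.
selected = [row for row in ordered if row[2] > 0]

# Only the first selected duplicate of each group gets promoted.
promoted = {}
for group_id, name, _delta in selected:
    promoted.setdefault(group_id, name)
```

Group 1 has two selected duplicates, but because the list is sorted in descending order, the most recently modified one comes first and is the one that gets promoted.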
Filtering
---------
dupeGuru supports post-scan filtering. With it, you can narrow down your results so you can perform
actions on a subset of it. For example, you could easily mark all duplicates with their filename
containing "copy" from your results using the filter.
To use the filtering feature, type your filter in the "Filter" search field at the top-right corner
of the results window. What you type in that box will be applied to the *whole path* of every
duplicate in the results. Only duplicate *groups* having at least one duplicate matching the filter
will be shown.
When a group contains duplicates that don't match the filter, we still show all duplicates of
the group. However, non-matching duplicates are in "reference mode". Therefore, you can perform
actions like "Mark All" and be sure to only mark filtered duplicates.
To go back to unfiltered results, blank out the field or click on the "X".
In simple mode (the default mode), whatever you type as the filter is the string used to perform the
actual filtering, with the exception of one wildcard: **\***. Thus, if you type "[*]" as your
filter, it will match anything with [] brackets in it, whatever is in between those brackets.
For more advanced filtering, you can turn "Use regular expressions when filtering" on. The filtering
feature will then use **regular expressions**. A regular expression is a language for matching text.
Explaining them is beyond the scope of this document. A good place to start learning them is
`regular-expressions.info`_.
Matches are case insensitive in both simple and regexp mode.
For the filter to match, your regular expression doesn't have to match the whole filename; the
filename just has to contain a string matching the expression.
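The matching rules above (literal text plus the ``*`` wildcard in simple mode, case insensitive, "contains" rather than "equals") can be approximated with Python's ``re`` module. This is a sketch under those assumptions; ``compile_filter`` and ``matches`` are names made up for the example and the real implementation may differ.

```python
import re


def compile_filter(text, use_regex=False):
    if use_regex:
        pattern = text
    else:
        # Simple mode: everything is literal, except "*" which matches anything.
        pattern = re.escape(text).replace(r"\*", ".*")
    return re.compile(pattern, re.IGNORECASE)


def matches(path, text, use_regex=False):
    # search() succeeds if any part of the path matches the filter.
    return compile_filter(text, use_regex).search(path) is not None


bracketed = matches("/music/Song [live].mp3", "[*]")
plain = matches("/music/Song.mp3", "[*]")
any_case = matches("/docs/My COPY.txt", "copy")
with_regex = matches("/music/Song.mp3", r"\.mp[34]$", use_regex=True)
```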
Action Menu
-----------
**Clear Ignore List:**
Remove all ignored matches you added. You have to start a new scan for the
newly cleared ignore list to be effective.
**Export Results to XHTML:**
Take the current results, and create an XHTML file out of it. The
columns that are visible when you click on this button will be the columns present in the XHTML
file. The file will automatically be opened in your default browser.
**Send Marked to Trash:**
Send all marked duplicates to trash, obviously. Before proceeding,
you'll be presented deletion options (see below).
**Move Marked to...:**
Prompt you for a destination, and then move all marked files to that
destination. Source file's path might be re-created in destination, depending on the
"Copy and Move" preference.
**Copy Marked to...:**
Prompt you for a destination, and then copy all marked files to that
destination. Source file's path might be re-created in destination, depending on the
"Copy and Move" preference.
**Remove Marked from Results:**
Remove all marked duplicates from results. The actual files will
not be touched and will stay where they are.
**Remove Selected from Results:**
Remove all selected duplicates from results. Note that all
selected reference files will be ignored, only duplicates can be removed with this action.
**Make Selected into Reference:**
Promote all selected duplicates to reference. If a duplicate is
a part of a group having a reference file coming from a reference folder (in blue color), no
action will be taken for this duplicate. If more than one duplicate among the same group are
selected, only the first of each group will be promoted.
**Add Selected to Ignore List:**
This first removes all selected duplicates from results, and
then adds the match of that duplicate and the current reference to the ignore list. This match
will not come up again in further scans. The duplicate itself might come back, but it will be
matched with another reference file. You can clear the ignore list with the Clear Ignore List
command.
**Open Selected with Default Application:**
Open the file with the application associated with selected file's type.
**Reveal Selected in Finder:**
Open the folder containing selected file.
**Invoke Custom Command:**
Invokes the external application you've set up in your preferences using the current selection
as arguments in the invocation.
**Rename Selected:**
Prompts you for a new name, and then renames the selected file.
Deletion Options
----------------
These options affect how duplicate deletion takes place. Most of the time, you don't need to enable
any of them.
**Link deleted files:**
The deleted files are replaced by a link to the reference file. You have a choice of replacing
it either with a `symlink`_ or a `hardlink`_. It's better to read the whole
Wikipedia pages about them to make an informed choice, but in short, a symlink is a shortcut to
the file's path. If the original file is deleted or moved, the link is broken. A hardlink is a
link to the file *itself*. That link is as good as a "real" file. Only when *all* hardlinks to a
file are deleted is the file itself deleted.
On OSX and Linux, this feature is supported fully, but under Windows, it's a bit complicated.
Windows XP doesn't support it, but Vista and up support it. However, for the feature to work,
dupeGuru has to run with administrative privileges.
**Directly delete files:**
Instead of sending files to trash, directly delete them. This is used
for troubleshooting and you normally don't need to enable this unless dupeGuru has problems
deleting files normally, something that can happen when you try to delete files on network
storage (NAS).
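To illustrate the difference between the two link types mentioned under **Link deleted files**, here is a small experiment using Python's ``os.symlink`` and ``os.link`` in a temporary folder. The ``replace_with_link`` helper is hypothetical and is not how dupeGuru is known to do it; on Windows, creating symlinks may require elevated privileges.

```python
import os
import tempfile


def replace_with_link(duplicate, reference, hard=False):
    # Delete the duplicate, then put a link to the reference in its place.
    os.remove(duplicate)
    if hard:
        os.link(reference, duplicate)
    else:
        os.symlink(reference, duplicate)


with tempfile.TemporaryDirectory() as folder:
    reference = os.path.join(folder, "reference.txt")
    soft = os.path.join(folder, "dupe_soft.txt")
    hard = os.path.join(folder, "dupe_hard.txt")
    for path in (reference, soft, hard):
        with open(path, "w") as f:
            f.write("same contents")

    replace_with_link(soft, reference)
    replace_with_link(hard, reference, hard=True)
    soft_is_link = os.path.islink(soft)
    hard_is_same_file = os.path.samefile(hard, reference)

    # Deleting the original breaks the symlink, but not the hardlink.
    os.remove(reference)
    soft_is_broken = not os.path.exists(soft)
    with open(hard) as f:
        hard_survives = f.read() == "same contents"
```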
.. _regular-expressions.info: http://www.regular-expressions.info
.. _hardlink: http://en.wikipedia.org/wiki/Hard_link
.. _symlink: http://en.wikipedia.org/wiki/Symbolic_link

@@ -0,0 +1,168 @@
The scanning process
====================
.. contents::
dupeGuru has 3 basic ways of scanning: :ref:`worded-scan`, :ref:`contents-scan` and
:ref:`picture blocks <picture-blocks-scan>`. The first two types are for the Standard and Music
modes, the last is for the Picture mode. The scanning process is configured through the
:doc:`Preference pane <preferences>`.
.. _worded-scan:
Worded scans
------------
Worded scans extract a string from each file and split it into words. The string can come from two
different sources: **Filename** or **Tags** (Music Edition only).
When our source is music tags, we have to choose which tags to use. If, for example, we choose to
analyse *artist* and *title* tags, we'd end up with strings like
"The White Stripes - Seven Nation Army".
Words are split by space characters, with all punctuation removed (some are replaced by spaces, some
by nothing) and all words lowercased. For example, the string "This guy's song(remix)" yields
*this*, *guys*, *song* and *remix*.
Once this is done, the scanning dance begins. Finding duplicates is only a matter of finding how
many words in common two given strings have. If the :ref:`filter hardness <filter-hardness>` is,
for example, ``80``, it means that 80% of the words of two strings must match. To determine the
matching percentage, dupeGuru first counts the total number of words in **both** strings, then counts
the number of words matching (every matching word counts as 2), and then divides the number of words
matching by the total number of words. If the result is higher than or equal to the filter hardness,
we have a duplicate match. For example, "a b c d" and "c d e" have a matching percentage of 57
(4 words matching, 7 total words).
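The word splitting and percentage computation described above can be sketched as follows. The exact list of punctuation characters replaced by spaces is an assumption of this example, as are the function names; the percentage is left unrounded on purpose.

```python
import string

# Assumption: these characters become spaces, other punctuation is simply dropped.
TO_SPACE = "-_&+():;[]{}.,<>/?~!@#$*"
ALLOWED = string.ascii_letters + string.digits + " "


def get_words(text):
    for char in TO_SPACE:
        text = text.replace(char, " ")
    text = "".join(c for c in text.lower() if c in ALLOWED)
    return text.split()


def match_percentage(first, second):
    first_words, remaining = get_words(first), get_words(second)
    total = len(first_words) + len(remaining)
    if total == 0:
        return 0.0
    matching = 0
    for word in first_words:
        if word in remaining:
            remaining.remove(word)  # a word can only be matched once
            matching += 2           # every matching word counts as 2
    return 100 * matching / total


words = get_words("This guy's song(remix)")
score = match_percentage("a b c d", "c d e")
is_duplicate = score >= 80  # with a filter hardness of 80
```

With a filter hardness of 80, the two example strings would not be considered duplicates, since their score is about 57.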
Fields
^^^^^^
Song filenames often come with multiple and distinct parts and this can cause problems. For example,
let's take these two songs: "Dolly Parton - I Will Always Love You" and
"Whitney Houston - I Will Always Love You". They are clearly not the same song (they come from
different artists), but they still have a matching score of 71%! This means that, with a naive
scanning method, we would get these songs as a false positive as soon as we try to dig a bit deeper
in our dupe hunt by lowering the threshold a bit.
This is why we have the "Fields" concept. Fields are separated by dashes (``-``). When the
"Filename - Fields" scan type is chosen, each field is compared separately. Our final matching score
will only be the lowest of all the fields. In our example, the title has a 100% match, but the
artist has a 0% match, making our final match score 0.
Sometimes, our song filename policy isn't completely homogeneous, which means that we can end up with
"The White Stripes - Seven Nation Army" and "Seven Nation Army - The White Stripes". This is why
we have the "Filename - Fields (No Order)" scan type. With this scan type, all fields are compared
with each other, and the highest score is kept. Then, the final matching score is the lowest of them
all. In our case, the final matching score is 100.
Note: Each field is used once. Thus, "The White Stripes - The White Stripes" and
"The White Stripes - Seven Nation Army" have a match score of 0 because the second
"The White Stripes" can't be compared with the first field of the other name because it has already
been "used up" by the first field. Our final match score would be 0.
*Tags* scanning method is always "fielded". When choosing this scan method, we also choose which
tags are going to be compared, each being a field.
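Here is a simplified sketch of both fielded scan types. It assumes a plain word comparison for each field (lowercased, split on spaces), and it assumes that names with a different number of fields score 0; neither assumption is confirmed by this document. All function names are made up for the example.

```python
def word_score(first, second):
    first_words, remaining = first.lower().split(), second.lower().split()
    total = len(first_words) + len(remaining)
    matching = 0
    for word in first_words:
        if word in remaining:
            remaining.remove(word)
            matching += 2
    return 100 * matching / total if total else 0.0


def split_fields(name):
    # Fields are separated by dashes.
    return [field.strip() for field in name.split("-")]


def fields_score(first, second, ordered=True):
    first_fields, second_fields = split_fields(first), split_fields(second)
    if len(first_fields) != len(second_fields):
        return 0.0  # assumption: different field counts never match
    if ordered:
        scores = [word_score(a, b) for a, b in zip(first_fields, second_fields)]
    else:
        available = list(second_fields)
        scores = []
        for field in first_fields:
            # Keep the highest score among the fields that are still unused.
            best = max(available, key=lambda other: word_score(field, other))
            scores.append(word_score(field, best))
            if scores[-1] > 0:
                available.remove(best)  # each field is used once
    # The final score is the lowest of all field scores.
    return min(scores)


same_title = fields_score("Dolly Parton - I Will Always Love You",
                          "Whitney Houston - I Will Always Love You")
swapped = fields_score("The White Stripes - Seven Nation Army",
                       "Seven Nation Army - The White Stripes", ordered=False)
used_up = fields_score("The White Stripes - The White Stripes",
                       "The White Stripes - Seven Nation Army", ordered=False)
```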
.. _word-weighting:
Word weighting
^^^^^^^^^^^^^^
When enabled, this option slightly changes how matching percentage is calculated by making bigger
words worth more. With word weighting, instead of having a value of 1 in the duplicate count and
total word count, every word has a value equal to its number of characters. With word
weighting, "ab cde fghi" and "ab cde fghij" would have a matching percentage of 53% (19 total
characters, 10 characters matching (4 for "ab" and 6 for "cde")).
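The arithmetic of that example can be checked with a short sketch. The function name is hypothetical and words are simply split on spaces:

```python
def weighted_match_percentage(first, second):
    first_words, remaining = first.split(), second.split()
    # Every word is worth its number of characters.
    total = sum(len(word) for word in first_words + remaining)
    matching = 0
    for word in first_words:
        if word in remaining:
            remaining.remove(word)
            matching += 2 * len(word)  # counted once for each string
    return 100 * matching / total if total else 0.0


score = weighted_match_percentage("ab cde fghi", "ab cde fghij")
```

Without word weighting, the same pair would have 4 matching words out of 6, so the long non-matching words pull the weighted score down.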
.. _similarity-matching:
Similarity matching
^^^^^^^^^^^^^^^^^^^
When enabled, similar words will be counted as matches. For example "The White Stripes" and
"The White Stripe" would have a match score of 100 instead of 66 with that option turned on.
Two words are considered similar if they can be made equal with only a few edit operations (removing
a letter, adding one etc.). The process used is not unlike the
`Levenshtein distance`_. For the technically inclined, the actual function used is
Python's `get_close_matches`_ with a ``0.8`` cutoff.
**Warning:** Use this option with caution. It is likely that you will get a lot of false positives
in your results when turning it on. However, it will help you to find duplicates that you wouldn't
have found otherwise. The scan process also is significantly slower with this option turned on.
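A minimal sketch of similar-word matching, built on ``difflib.get_close_matches`` with the ``0.8`` cutoff mentioned above. How the lookup is wired into the word comparison is an assumption of this example, and the function name is made up:

```python
import difflib


def similar_match_percentage(first, second, cutoff=0.8):
    first_words, remaining = first.lower().split(), second.lower().split()
    total = len(first_words) + len(remaining)
    matching = 0
    for word in first_words:
        if word not in remaining:
            # Look for the closest remaining word that is similar enough.
            close = difflib.get_close_matches(word, remaining, n=1, cutoff=cutoff)
            if close:
                word = close[0]
        if word in remaining:
            remaining.remove(word)
            matching += 2
    return 100 * matching / total if total else 0.0


close_enough = similar_match_percentage("The White Stripes", "The White Stripe")
too_different = similar_match_percentage("The White Stripes", "The White Album")
```

"stripes" and "stripe" are similar enough to count as a match, while "stripes" and "album" are not.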
.. _contents-scan:
Contents scans
--------------
Contents scans are much simpler than worded scans. We read files and if the contents are exactly the
same, we consider the two files duplicates.
This is, of course, much slower than comparing filenames and, to avoid needlessly reading whole
file contents, we start by looking at file sizes. After having grouped our files by size, we discard
every file that is alone in its group. Then, we proceed to read the contents of our remaining files.
MD5 hashes are used to compare contents. Yes, it is widely known that forging files having
the same MD5 hash is easy, but such files have to be knowingly forged. The probability of two files
having the same MD5 hash *and* the same size by accident is still very, very small.
The :ref:`filter hardness <filter-hardness>` preference is ignored in this scan.
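The two steps above (group by size first, then hash only what remains) can be sketched with the standard library. This is an illustration of the approach, not dupeGuru's code; the function names and the chunk size are choices made for the example.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def md5_of(path, chunk_size=1024 * 1024):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so that big files don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_content_duplicates(paths):
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # alone in its size group: never read
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[md5_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


with tempfile.TemporaryDirectory() as folder:
    contents = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world", "d.txt": b"hi"}
    paths = []
    for name, data in contents.items():
        path = os.path.join(folder, name)
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    duplicate_names = [
        [os.path.basename(p) for p in group] for group in find_content_duplicates(paths)
    ]
```

In this example, ``d.txt`` is discarded on size alone, and ``c.txt`` is read but rejected because its hash differs.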
Folders
^^^^^^^
This is a special Contents scan type. It works like a normal contents scan, but
instead of trying to find duplicate files, it tries to find duplicate folders.
A folder is a duplicate of another if all files it contains have the same
contents as the other folder's files.
This scan is, of course, recursive and subfolders are checked. dupeGuru keeps only the biggest
fishes. Therefore, if two folders that are considered as matching contain subfolders, these
subfolders will not be included in the final results.
With this mode, we end up with folders as results instead of files.
.. _picture-blocks-scan:
Picture blocks
--------------
dupeGuru Picture mode stands apart from its two friends. Its scan types are completely different.
The first one is its "Contents" scan, which is a bit too generic, hence the name we use here,
"Picture blocks".
We start by opening every picture in RGB bitmap mode, then we "blockify" the picture. We create a
15x15 grid and compute the average color of each grid tile. This is the "picture analysis" phase.
It's very time consuming and the result is cached in a database (the "picture cache").
Once we've done that, we can start comparing them. Each tile in the grid (an average color) is
compared to its corresponding tile on the other picture and a color diff is computed (it's simply
a sum of the difference of R, G and B on each side). All these sums are added up to a final "score".
If that score is smaller than or equal to ``100 - threshold``, we have a match.
A threshold of 100 adds an additional constraint that pictures have to be exactly the same (it's
possible, due to averaging, that the tile comparison yields ``0`` for pictures that aren't exactly
the same, but since "100%" suggests "exactly the same", we discard those occurrences). If you want
to get pictures that are very, very similar but still allow a bit of fuzzy differences, go for 99%.
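To make the "blockify, then compare" idea more concrete, here is a pure Python sketch that follows the description above literally. It assumes pictures are given as rows of ``(R, G, B)`` tuples whose dimensions are multiples of the grid size, and that the "difference" is an absolute difference; the function names are made up, and the real implementation may well compute things differently. A tiny 2x2 grid is used so the numbers stay readable.

```python
def blockify(pixels, grid=15):
    # Assumption: width and height are multiples of ``grid``.
    tile_h, tile_w = len(pixels) // grid, len(pixels[0]) // grid
    blocks = []
    for row in range(grid):
        for col in range(grid):
            tile = [
                pixels[y][x]
                for y in range(row * tile_h, (row + 1) * tile_h)
                for x in range(col * tile_w, (col + 1) * tile_w)
            ]
            # Average color of the tile, one channel at a time.
            blocks.append(tuple(sum(p[c] for p in tile) // len(tile) for c in range(3)))
    return blocks


def diff_score(blocks_a, blocks_b):
    # Sum of the R, G and B differences of every pair of corresponding tiles.
    return sum(abs(a[c] - b[c]) for a, b in zip(blocks_a, blocks_b) for c in range(3))


def is_match(blocks_a, blocks_b, threshold):
    return diff_score(blocks_a, blocks_b) <= 100 - threshold


plain = [[(100, 100, 100)] * 4 for _ in range(4)]
tweaked = [list(row) for row in plain]
tweaked[0][0] = (108, 100, 100)  # one slightly redder pixel

plain_blocks = blockify(plain, grid=2)
tweaked_blocks = blockify(tweaked, grid=2)
score = diff_score(plain_blocks, tweaked_blocks)
```

The modified pixel shifts the average red of its tile from 100 to 102, giving a score of 2: a match with a threshold of 95, but not with a threshold of 99.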
This second part of the scan is CPU intensive and can take quite a bit of time. This task has been
made to take advantage of multi-core CPUs and has been optimized to the best of my abilities, but
the fact of the matter is that, due to the fuzziness of the task, we still have to compare every picture
to every other, making the algorithm quadratic (if ``N`` is the number of pictures to compare, the
number of comparisons to perform is on the order of ``N*N``).
This algorithm is very naive, but in the field, it works rather well. If you master a better
algorithm and want to improve dupeGuru, by all means, let me know!
EXIF Timestamp
--------------
This one is easy. We read the EXIF information of every picture and extract the ``DateTimeOriginal``
tag. If the tag is the same for two pictures, they're considered duplicates.
**Warning:** Modified pictures often keep the same EXIF timestamp, so watch out for false positives
when you use that scan type.
.. _Levenshtein distance: http://en.wikipedia.org/wiki/Levenshtein_distance
.. _get_close_matches: http://docs.python.org/3/library/difflib.html#difflib.get_close_matches