mklinks: tool for finding and hardlinking identical files

Latest release 20250530: Big refactor to use cs.hashindex for the content checks and better internal data structures.

Mklinks walks supplied paths looking for files with the same content, based on a cryptographic checksum of their content. It hardlinks all such files found, keeping the oldest version.

Unlike some rather naive tools out there, mklinks only compares files with other files of the same size, and is hardlink aware; a partially hardlinked tree is processed efficiently and correctly.
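
The approach described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the mklinks implementation: the names `file_digest`, `plan_links` and `READ_SIZE` are hypothetical, SHA-256 is an assumed choice of checksum, and "oldest" is taken to mean the smallest modification time. The sketch only plans the links; it does not modify anything.

```python
import hashlib
import os
import stat
from collections import defaultdict

READ_SIZE = 1024 * 1024  # read files in 1 MiB chunks (assumed value)


def file_digest(path, hashname="sha256"):
    """Return the hex digest of a file's content, read in chunks."""
    h = hashlib.new(hashname)
    with open(path, "rb") as f:
        while chunk := f.read(READ_SIZE):
            h.update(chunk)
    return h.hexdigest()


def plan_links(top):
    """Return (keep_path, replace_path) pairs for identical files under top."""
    # One entry per inode, so already-hardlinked paths are hashed only once:
    # (dev, ino) -> (size, mtime, [paths])
    inodes = {}
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets, ...
            entry = inodes.setdefault(
                (st.st_dev, st.st_ino), (st.st_size, st.st_mtime, [])
            )
            entry[2].append(path)
    # Only inodes of the same size on the same device can be merged.
    by_size = defaultdict(list)
    for (dev, _ino), (size, mtime, paths) in inodes.items():
        by_size[dev, size].append((mtime, paths))
    plan = []
    for group in by_size.values():
        if len(group) < 2:
            continue  # a unique size never needs its content read
        by_digest = defaultdict(list)
        for mtime, paths in group:
            by_digest[file_digest(paths[0])].append((mtime, paths))
        for same in by_digest.values():
            same.sort(key=lambda item: item[0])  # oldest inode first
            keep = same[0][1][0]
            for _mtime, paths in same[1:]:
                plan.extend((keep, path) for path in paths)
    return plan
```

Calling `plan_links("some/dir")` returns the pairs that a linking pass would act on; files with a unique size are never opened, and paths that already share an inode cost a single read.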

Short summary:

  • Inode: Information about a particular inode.
  • Linker: The class which links files with identical content.
  • main: CLI for mklinks.
  • MKLinksCommand: Hard link files with identical contents.

Module contents:

  • Class Inode(cs.fs.HasFSPath, cs.deco.Promotable): Information about a particular inode.

Inode.assimilate(self, other: 'Inode', *, dry_run=False, runstate: Optional[cs.resources.RunState] = <default supplied by uses_runstate>): Link our primary path to all the paths from other. Return success.

Inode.from_str(fspath: str, *, hashname: str): Promote a filesystem path to an Inode.

Inode.key: A (ino,dev) 2-tuple.

Inode.same_dev(self, other): Test whether two Inodes are on the same filesystem.

Inode.same_file(self, other): Test whether two Inodes refer to the same file.

Inode.samefs(self, other): Test whether two Inodes are on the same filesystem (same .dev values).

Inode.stat_key(fspath: str): Compute the (ino,dev) 2-tuple from os.stat(fspath).

  • Class Linker: The class which links files with identical content.

Linker.addpath(self, path): Add a new path to the data structures.

Linker.merge(self, *, dry_run=False, runstate: Optional[cs.resources.RunState] = <default supplied by uses_runstate>): Merge files with equivalent content.

Linker.scan(self, path, *, runstate: Optional[cs.resources.RunState] = <default supplied by uses_runstate>): Scan the file tree.

  • main(argv=None): CLI for mklinks.

  • Class MKLinksCommand(cs.cmdutils.BaseCommand): Hard link files with identical contents.

    Usage summary:

    Usage: mklinks [common-options...] subcommand [options...]
      Hard link files with identical contents.
    

MKLinksCommand.Options

MKLinksCommand.main(self, argv): Usage: {cmd} paths... Hard link files with identical contents.
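
To make the (ino,dev) key and the dry_run option listed above more concrete, here is a minimal standard-library sketch. It does not use or reproduce the cs.app.mklinks code: `stat_key` mirrors only the documented description of Inode.stat_key, while `same_file`, `relink` and the temporary-name scheme are hypothetical helpers written for this example.

```python
import os


def stat_key(fspath):
    """Return the (ino, dev) 2-tuple identifying the inode behind fspath."""
    st = os.stat(fspath)
    return st.st_ino, st.st_dev


def same_file(path1, path2):
    """Test whether two paths refer to the same inode."""
    return stat_key(path1) == stat_key(path2)


def relink(primary, other, *, dry_run=False):
    """Replace `other` with a hard link to `primary`. Return success."""
    if same_file(primary, other):
        return True  # already linked, nothing to do
    if os.stat(primary).st_dev != os.stat(other).st_dev:
        return False  # hard links cannot cross filesystems
    if dry_run:
        print("ln", primary, other)
        return True
    # Link under a temporary name beside `other`, then rename over it,
    # so that `other` never goes missing part way through.
    dirpath = os.path.dirname(other) or "."
    tmp = os.path.join(
        dirpath, ".relink-%d-%s" % (os.getpid(), os.path.basename(other))
    )
    try:
        os.link(primary, tmp)
        os.replace(tmp, other)
    except OSError:
        if os.path.lexists(tmp):
            os.unlink(tmp)
        return False
    return True
```

The link-then-rename order is a design choice for this sketch: if the link step fails, the original file is still in place.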

Release Log

Release 20250530: Big refactor to use cs.hashindex for the content checks and better internal data structures.

Release 20221228:

  • Modernise command, pass a RunState to the merge methods.
  • merge: check runstate more frequently, tweak progress bar.
  • assimilate: plumb runstate.

Release 20210404:

  • FileInfo.checksum: bump read size to 1MiB.
  • Requirements bump to match cs.cmdutils change.

Release 20210401: Major bugfix: subdirectory file paths were computed incorrectly.

Release 20210306: Use cs.cmdutils.BaseCommand for main programme, add better progress reporting.

Release 20171228: Initial PyPI release of cs.app.mklinks.
