ISCC-SUM User Guide#
This comprehensive guide covers all features of the iscc-sum command-line tool for generating and verifying
ISCC (International Standard Content Code) checksums according to ISO 24138:2024.
New to ISCC-SUM?
If you're just getting started, check out our Quick Start Guide for a gentle introduction. This user guide covers all features in detail for when you need more advanced functionality.
Installation#
The recommended way to install iscc-sum is using UV, a fast Python package
manager:
Verify the installation:
Installation Troubleshooting
"Command not found" after installation
- Close and reopen your terminal to refresh the PATH
- Check if UV's bin directory is in your PATH:
  - Linux/macOS: ~/.local/bin
  - Windows: %USERPROFILE%\.local\bin
Permission errors during installation
- Don't use sudo with UV; UV installs tools in your user directory by default
Core Features#
Basic Checksum Generation#
Generate ISCC checksums for your files with simple commands:
Single File#
Output:
Output Format
The default output format follows GNU coreutils conventions:
- ISCC:... - The ISCC checksum
- * - Binary mode indicator (always present)
- filename - Path to the file
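When a script needs to consume this format, each line can be split into its checksum and filename. The following is a minimal sketch, not part of iscc-sum: the names `ChecksumLine` and `parse_default_line` are hypothetical, and it assumes exactly one space separates the checksum from the `*` indicator, as in the sample output in this guide.

```python
from typing import NamedTuple


class ChecksumLine(NamedTuple):
    """One parsed line of default-format output (hypothetical helper type)."""
    iscc: str      # checksum including the "ISCC:" prefix
    filename: str  # path exactly as printed


def parse_default_line(line: str) -> ChecksumLine:
    """Split '<ISCC> *<filename>' into its two fields."""
    # The checksum contains no spaces, so splitting on the first " *"
    # keeps filenames that themselves contain " *" intact.
    iscc, sep, filename = line.rstrip("\n").partition(" *")
    if not sep or not iscc.startswith("ISCC:"):
        raise ValueError(f"not a default-format checksum line: {line!r}")
    return ChecksumLine(iscc, filename)


parsed = parse_default_line(
    "ISCC:KACWSO4JFISTQSRVMCWDRBTS5AX5E2XD7H3PRFMBTNGBD6PZQJNQ *file1.txt"
)
print(parsed.filename)  # file1.txt
```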
Multiple Files#
Process multiple files in one command:
Output:
ISCC:KACWSO4JFISTQSRVMCWDRBTS5AX5E2XD7H3PRFMBTNGBD6PZQJNQ *file1.txt
ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY *file2.pdf
ISCC:KACSUH2DZ5SLMG4YGH3LUW7J7ERVQZFMV5LYS4KAHCUSM6EPUAFA *image.jpg
Directory Processing#
Process all files in a directory recursively:
This will:
- Recursively find all files
- Process them in deterministic order
- Respect .isccignore patterns
Deterministic Ordering
Directory traversal uses the TREEWALK-ISCC algorithm, ensuring consistent output across different platforms and filesystems.
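To see why a defined order matters: the operating system returns directory entries in an arbitrary, filesystem-dependent order, so a tool must sort them itself to get repeatable results. The sketch below is a simplified stand-in that illustrates the idea. It is not the TREEWALK-ISCC algorithm, it ignores .isccignore handling, and the function name `walk_sorted` is hypothetical.

```python
import os
from pathlib import Path
from typing import Iterator


def walk_sorted(root: Path) -> Iterator[Path]:
    """Yield every regular file under root in a repeatable order."""
    # os.scandir gives entries in whatever order the filesystem stores them,
    # so sort by the encoded bytes of each name before visiting anything.
    with os.scandir(root) as it:
        entries = sorted(it, key=lambda e: os.fsencode(e.name))
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            yield from walk_sorted(Path(entry.path))
        elif entry.is_file(follow_symlinks=False):
            yield Path(entry.path)
```

Sorting on bytes rather than on locale-aware strings avoids differences between systems with different locale settings.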
Standard Input#
Process data from pipes or redirects:
Or from a file:
Output Formats#
ISCC-SUM supports multiple output formats for different use cases:
Default Format#
The standard format compatible with GNU coreutils tools:
BSD-Style Format#
Use --tag for BSD-style output:
iscc-sum --tag file.txt
# ISCC-SUM (file.txt) = ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY
Narrow Format (128-bit)#
For ISO 24138:2024 conformant 128-bit ISCCs:
Narrow vs Extended Format
- Narrow (128-bit): Shorter, ISO standard compliant
- Extended (256-bit): Default, more collision resistant
Component Display#
Show individual Data-Code and Instance-Code components:
Output:
ISCC:KACYPXW445FTYNJ3CYSXHAFJMA2HUWULUNRFE3BLHRSCXYH2XHGQY *file.txt
ISCC:EAAW4BQTJSTJSHAI27AJSAGMGHNUKSKRTK3E6OZ5CXUS57SWQZXJQ
ISCC:IABXF3ZHYL6O6PM5P2HGV677CS3RBHINZSXEJCITE3WNOTQ2CYXRA
The components are:
- First line: Combined ISCC checksum
- Second line: Data-Code (content similarity)
- Third line: Instance-Code (exact match)
Zero-Terminated Output#
For script processing, use NUL-terminated lines:
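NUL termination matters because filenames may legally contain newlines, which would break line-based parsing. On the consuming side, a script can split on the NUL byte instead. This is a minimal sketch under two assumptions: the output is UTF-8 encoded, and every record ends with a NUL. The function name `split_nul_records` and the `ISCC:AAA`/`ISCC:BBB` values are placeholders, not real checksums.

```python
def split_nul_records(data: bytes) -> list[str]:
    """Split NUL-terminated output into one string per record."""
    # Every record ends with b"\0", so the last split element is empty.
    records = data.split(b"\0")
    if records and records[-1] == b"":
        records.pop()
    return [record.decode("utf-8") for record in records]


# Placeholder records; the second filename contains a newline.
sample = b"ISCC:AAA *first file.txt\0ISCC:BBB *second\nfile.txt\0"
for record in split_nul_records(sample):
    print(repr(record))
```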
Cross-Platform Checksum Files#
New Feature
The -o/--output option ensures consistent checksum files across platforms.
Create portable checksum files that work on Windows, Linux, and macOS:
This ensures:
- UTF-8 encoding
- LF line endings (Unix-style)
- Cross-platform compatibility
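For scripts that write their own checksum files, the same two guarantees can be reproduced in Python. The sketch below only illustrates what "UTF-8 with LF endings" means in practice; it is not how iscc-sum implements the option, and `write_portable` is a hypothetical helper name.

```python
from pathlib import Path


def write_portable(path: Path, lines: list[str]) -> None:
    """Write lines as UTF-8 with LF endings, regardless of platform."""
    # newline="\n" disables the "\n" -> os.linesep translation that would
    # otherwise produce CRLF line endings on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        for line in lines:
            fh.write(line + "\n")
```

Shell redirection, by contrast, uses whatever encoding and line endings the shell and platform default to, which is why redirected files can differ between systems.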
Verification Mode#
Verify file integrity by checking against saved checksums:
Creating Checksum Files#
Save checksums for later verification:
# Save to a file (cross-platform safe)
iscc-sum -o project-checksums.txt src/**/*.py
# Or use shell redirection (platform-specific line endings)
iscc-sum *.doc > checksums.txt
Basic Verification#
Check if files match their saved checksums:
Output:
src/main.py: OK
src/utils.py: OK
src/config.py: FAILED
iscc-sum: WARNING: 1 computed checksum did NOT match
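If a script needs the per-file results rather than just the exit code, the report can be parsed line by line. This is a minimal sketch based only on the sample output above: it assumes each result line has the form `path: STATUS` and recognizes just the two statuses shown in this guide (`OK` and `FAILED`). The function name `parse_check_report` is hypothetical.

```python
KNOWN_STATUSES = {"OK", "FAILED"}  # only the statuses shown in this guide


def parse_check_report(text: str) -> dict[str, str]:
    """Map each reported path to its status."""
    results = {}
    for line in text.splitlines():
        # Split on the last ": " so paths containing ": " stay intact.
        path, sep, status = line.rpartition(": ")
        # Summary lines (e.g. the WARNING line) do not end in a known
        # status and are skipped.
        if sep and status in KNOWN_STATUSES:
            results[path] = status
    return results


report = """src/main.py: OK
src/utils.py: OK
src/config.py: FAILED
iscc-sum: WARNING: 1 computed checksum did NOT match
"""
failed = [p for p, s in parse_check_report(report).items() if s == "FAILED"]
print(failed)  # ['src/config.py']
```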
Verification Options#
Quiet Mode#
Only show failures:
Status Mode#
Silent operation, check exit code only:
Perfect for scripts:
if iscc-sum -c --status checksums.txt; then
    echo "All files verified successfully"
else
    echo "Verification failed!"
fi
Strict Mode#
Exit immediately on format errors:
Format Warnings#
Show warnings about improperly formatted lines:
Similarity Detection#
Unique Feature
Unlike traditional checksums, ISCC enables finding similar files through its Data-Code component.
How It Works#
ISCC's Data-Code captures content structure, allowing similarity comparison:
- Similar content → Similar Data-Codes
- Measured by Hamming distance
- Default threshold: 12 bits difference
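Hamming distance is simply the number of bit positions in which two equal-length codes differ. The sketch below shows the calculation on raw bytes. It does not decode real ISCC strings, the function names are hypothetical, and treating the threshold as inclusive (distance of 12 or less counts as similar) is an assumption.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # XOR leaves a 1 bit wherever the inputs disagree; count those bits.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def is_similar(a: bytes, b: bytes, threshold: int = 12) -> bool:
    """Assumed rule: similar when the distance does not exceed the threshold."""
    return hamming_distance(a, b) <= threshold


code_a = bytes.fromhex("ff00ff00ff00ff00")
code_b = bytes.fromhex("ff00ff00ff00ff07")  # differs in the last 3 bits
print(hamming_distance(code_a, code_b))  # 3
```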
Finding Similar Files#
Group files by content similarity:
Output:
document_v1.txt
~08 document_v2.txt
~12 document_draft.txt
report_2024.txt
~06 report_2024_final.txt
The numbers (e.g., ~08) indicate bit differences - lower means more similar.
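The grouped output can be turned into a data structure for further processing. This sketch assumes the layout shown in the sample above: a reference filename on its own line, followed by member lines of the form `~NN filename`. It would misread a reference file whose name starts with `~`, and the function name `parse_similarity_groups` is hypothetical.

```python
def parse_similarity_groups(text: str) -> dict[str, list[tuple[int, str]]]:
    """Map each reference file to its (bit distance, filename) members."""
    groups: dict[str, list[tuple[int, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # tolerate indentation and blank separator lines
        if not line:
            continue
        if line.startswith("~"):
            distance, _, name = line[1:].partition(" ")
            if current is None or not distance.isdigit():
                raise ValueError(f"unexpected member line: {raw!r}")
            groups[current].append((int(distance), name))
        else:
            current = line
            groups[current] = []
    return groups


sample = """document_v1.txt
~08 document_v2.txt
~12 document_draft.txt
report_2024.txt
~06 report_2024_final.txt
"""
groups = parse_similarity_groups(sample)
print(groups["report_2024.txt"])  # [(6, 'report_2024_final.txt')]
```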
Adjusting Similarity Threshold#
Raise the threshold to also match files that are less closely related:
Or lower the threshold to find only very similar files:
Choosing Thresholds
- 0-5: Nearly identical files
- 6-12: Likely similar content (default)
- 13-20: Probably somewhat similar
Similarity with Other Options#
Combine similarity detection with other formats:
# BSD-style output with similarity grouping
iscc-sum --similar --tag *.doc
# Narrow format similarity
iscc-sum --similar --narrow --threshold 8 images/*
Tree Mode#
Generate a single checksum for an entire directory structure:
Output:
Directory Indicator
The trailing slash (/) indicates this is a directory checksum, not a file.
What Tree Mode Captures#
Tree mode creates a composite checksum of:
- All file contents
- Directory structure
- File ordering
- .isccignore patterns (ignored files are excluded)
Use Cases for Tree Mode#
Practical Examples#
Example: Build Reproducibility#
Ensure your build outputs are consistent:
# Before build
iscc-sum --tree src/ -o src-checksum.txt
# After build
make clean && make
# Verify source unchanged
iscc-sum -c src-checksum.txt
# Check build outputs
iscc-sum --tree build/ -o build-checksum.txt
Example: Finding Duplicate Images#
Identify duplicate or near-duplicate images:
# Find very similar images
iscc-sum --similar --threshold 5 photos/*.jpg > duplicates.txt
# Review groups
grep -B1 "~0[0-5]" duplicates.txt
Example: Cross-Platform File Transfer#
Ensure files transfer correctly between systems:
Example: Continuous Integration#
Add file integrity checks to your CI pipeline:
# .github/workflows/verify.yml
steps:
  - name: Checkout
    uses: actions/checkout@v4
  - name: Verify test fixtures
    run: |
      iscc-sum -c tests/fixtures/checksums.txt --status || {
        echo "Test fixtures corrupted!"
        exit 1
      }
Command Reference#
Synopsis#
Options Reference#
| Option | Short | Description |
|---|---|---|
| --help | | Show help message and exit |
| --version | | Show version number and exit |
| Generation | | |
| --narrow | | Generate 128-bit checksums (ISO standard) |
| --tag | | Use BSD-style output format |
| --units | | Show Data-Code and Instance-Code components |
| --zero | -z | End lines with NUL instead of newline |
| --output FILE | -o | Write to FILE with consistent encoding |
| Verification | | |
| --check | -c | Read checksums and verify files |
| --quiet | -q | Don't print OK for each file |
| --status | | Don't output anything, exit code only |
| --warn | -w | Warn about format errors |
| --strict | | Exit on first format error |
| Advanced | | |
| --similar | | Find files with similar content |
| --threshold N | | Hamming distance for similarity (default: 12) |
| --tree | -t | Single checksum for entire directory |
Exit Codes#
| Code | Meaning |
|---|---|
| 0 | Success - all operations completed successfully |
| 1 | Verification failure - one or more files failed |
| 2 | Error - I/O error, invalid format, or other issue |
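When iscc-sum is driven from Python rather than a shell, the exit codes in the table can be mapped to named values. This is a minimal sketch: the `ExitCode` enum and `run_and_classify` helper are hypothetical names, and any exit code outside the table raises `ValueError`.

```python
import subprocess
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as listed in the table above."""
    SUCCESS = 0
    VERIFICATION_FAILURE = 1
    ERROR = 2


def run_and_classify(argv: list[str]) -> ExitCode:
    """Run a command and translate its exit status into an ExitCode."""
    # check=False: a non-zero status is an expected outcome here,
    # not an exception.
    completed = subprocess.run(argv, check=False)
    return ExitCode(completed.returncode)
```

Using only flags from the options reference, a verification call would look like `run_and_classify(["iscc-sum", "-c", "--status", "checksums.txt"])`, with the result compared against `ExitCode.SUCCESS`.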
Performance Tips#
Performance Optimization
- Large Files: Processed in 2MB chunks for memory efficiency
- Many Files: Use directory arguments instead of wildcards for better performance
- Network Storage: Create checksums locally, then transfer for faster processing
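Chunked reading keeps memory use constant no matter how large the file is. The sketch below illustrates the pattern with SHA-256 as a stand-in hash; it does not compute an ISCC, the helper name `digest_in_chunks` is hypothetical, and interpreting "2MB" as 2 MiB (2 × 1024 × 1024 bytes) is an assumption.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 2 * 1024 * 1024  # assumed meaning of "2MB" (2 MiB)


def digest_in_chunks(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file without loading it into memory all at once."""
    hasher = hashlib.sha256()  # stand-in hash, NOT the ISCC algorithm
    with open(path, "rb") as fh:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := fh.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()
```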
Troubleshooting#
Common Issues and Solutions
"No such file or directory"
- Check file path spelling and case (especially on Linux/macOS)
- Use tab completion to verify paths
- For spaces in names: use quotes or escape with backslash
"Permission denied"
- Check file permissions: ls -l filename
- For system files, consider whether you really need to checksum them
- Never use sudo unless absolutely necessary
Checksum mismatches
- Verify the file hasn't been modified: check timestamps
- Ensure consistent line endings (use the -o option)
- Check for hidden characters in filenames
Performance issues
- For many small files, process directory instead of wildcards
- Use --tree mode for full directory comparison
- Consider --narrow format for faster processing
See Also#
- Quick Start Guide - Getting started with ISCC-SUM
- Developer Guide - Using ISCC-SUM in your Python code
- Specifications - Technical details and standards
- GitHub Repository - Source code and issue tracking