Short answer: Use analyzeMFT for pure-Python parsing when you can install it, libmft when you want a typed object model, or shell out to omerbenamram/mft when you care about speed. Pure-Python parsing is ~10–50× slower than the Rust crate but is fine for one-off scripts.
What you are reading
The NTFS Master File Table is a sequence of fixed-size records, 1,024 bytes each on most volumes (some volumes use 4,096-byte records). To parse it from Python you only need to:
- Open the $MFT file (or read it from a disk image).
- Step through it 1,024 bytes at a time.
- Apply the fixup array to each record. See the record anatomy for the byte-level layout.
- Walk the attribute stream inside each record.
The libraries below handle all four steps. Most analysts only fall back to raw struct.unpack when a library does not expose a field they need.
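If you do fall back to raw struct.unpack, the four steps can be sketched roughly as follows. This is a minimal illustration, not how any of the libraries below implement it. It assumes 1,024-byte records and 512-byte sectors, and the helper names (apply_fixup, iter_attribute_types, iter_records) are made up for this example. The header offsets it reads (update sequence array offset and count at byte 4, first-attribute offset at 0x14) follow the standard FILE record layout.

```python
import struct

RECORD_SIZE = 1024  # assumption: typical record size; some volumes use 4,096
SECTOR_SIZE = 512   # assumption: fixups are applied per 512-byte sector


def apply_fixup(raw: bytes) -> bytearray:
    """Return a copy of one FILE record with the update sequence array applied."""
    rec = bytearray(raw)
    usa_offset, usa_count = struct.unpack_from("<HH", rec, 4)
    usn = rec[usa_offset:usa_offset + 2]  # value stamped at each sector end
    for i in range(1, usa_count):
        end = i * SECTOR_SIZE
        if end > len(rec):
            break
        if rec[end - 2:end] != usn:
            raise ValueError(f"fixup mismatch in sector {i - 1}")
        src = usa_offset + 2 * i
        rec[end - 2:end] = rec[src:src + 2]  # restore the original two bytes
    return rec


def iter_attribute_types(rec: bytes):
    """Yield (type_code, length) for each attribute header in a fixed-up record."""
    (offset,) = struct.unpack_from("<H", rec, 0x14)
    while offset + 8 <= len(rec):
        attr_type, attr_len = struct.unpack_from("<II", rec, offset)
        if attr_type == 0xFFFFFFFF or attr_len == 0:  # end marker or bad length
            break
        yield attr_type, attr_len
        offset += attr_len


def iter_records(stream, record_size: int = RECORD_SIZE):
    """Yield (index, record) for every chunk that carries the FILE signature."""
    index = 0
    while True:
        raw = stream.read(record_size)
        if len(raw) < record_size:
            break
        if raw[:4] == b"FILE":
            yield index, apply_fixup(raw)
        index += 1
```

To use it, open the $MFT in binary mode, loop over iter_records(f), and pass each record to iter_attribute_types. Type code 0x10 is $STANDARD_INFORMATION and 0x30 is $FILE_NAME. Decoding the attribute bodies is left out here, which is the part the libraries save you from writing.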
Option 1: analyzeMFT
analyzeMFT is the classic pure-Python MFT parser, originally by David Kovar and still maintained. CLI-first, but importable.
# pip install analyzeMFT
from analyzeMFT.mft_analyzer import MFTAnalyzer
analyzer = MFTAnalyzer(mft_file="path/to/$MFT", output_file="out.csv")
analyzer.analyze()
The CSV it produces has one row per record with timestamps from both $STANDARD_INFORMATION and $FILE_NAME. Good enough for spreadsheet-driven triage.
When to use: small $MFT files, ad-hoc scripts, no native dependencies allowed.
Limits: slow on multi-gigabyte inputs (single-threaded pure Python), and the object model is geared toward CSV emission rather than programmatic walks.
Option 2: libmft (typed object model)
If you want to query records as Python objects, libmft exposes a typed model close to the on-disk structure.
# pip install libmft
from libmft.api import MFT
with open("path/to/$MFT", "rb") as f:
    mft = MFT(f)
    for entry in mft:
        # Skip live entries: this loop reports deleted records only
        if not entry.is_deleted():
            continue
        name = entry.get_full_path()
        si = entry.get_attributes(0x10)[0]  # $STANDARD_INFORMATION
        print(name, si.created, si.modified)
libmft resolves parent references so you can ask each entry for its full path without writing the traversal yourself. It also handles $ATTRIBUTE_LIST extension records transparently — something analyzeMFT's CSV layer hides from you.
When to use: you want to write logic that walks records, filters by attribute, and emits a custom shape.
Option 3: shell out to a Rust parser
When the $MFT is large (~1 GB+) or you are batching across many disks, the fastest practical option is to shell out from Python to a native parser and read its JSON.
import json
import subprocess
# omerbenamram/mft — `cargo install mft` or download a release binary
proc = subprocess.run(
    ["mft_dump", "-o", "json", "path/to/$MFT"],
    capture_output=True, check=True,  # note: buffers all of stdout in memory
)
for line in proc.stdout.splitlines():
    record = json.loads(line)
    # Check these field names against the output of your mft_dump version
    if record["header"]["flags"] & 0x1 == 0:  # IN_USE clear → deleted
        print(record["entry"], record["file_name"]["name"])
mft_dump emits JSON Lines (one record per line), which can be consumed line by line without loading the full output into memory, provided you read from a pipe rather than use capture_output, which buffers everything. Compared with analyzeMFT on the same input, the Rust parser is typically 10–50× faster and uses about a tenth of the memory.
When to use: production pipelines, large inputs, or anywhere parsing time matters.
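For large inputs, a streaming variant of the snippet above might look like the sketch below. It is an illustration only: iter_json_lines is a hypothetical helper, not part of mft_dump or any library, and it assumes the child process writes one UTF-8 JSON object per line.

```python
import json
import subprocess
from typing import Iterator, Sequence


def iter_json_lines(cmd: Sequence[str]) -> Iterator[dict]:
    """Run cmd and yield one parsed JSON object per non-empty stdout line."""
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, encoding="utf-8"
    ) as proc:
        for line in proc.stdout:  # reads as the child produces output
            line = line.strip()
            if line:
                yield json.loads(line)
    # Popen's context manager has waited for the child by this point
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, list(cmd))
```

Called with the same command line as above, for example iter_json_lines(["mft_dump", "-o", "json", "path/to/$MFT"]), it keeps only one record in memory at a time. The trade-off is that a non-zero exit status is only reported after the last line has been consumed.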
Reading $MFT straight from a disk image
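If you have a raw .dd or .E01 image rather than an extracted $MFT file, use pytsk3 (Python bindings for The Sleuth Kit) to seek to $MFT on the volume and read its bytes. An .E01 usually also needs EWF support, for example through libewf's pyewf bindings; the example below opens a raw image: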
import pytsk3
img = pytsk3.Img_Info("disk.dd")
fs = pytsk3.FS_Info(img, offset=0) # use the NTFS partition offset
mft_file = fs.open_meta(inode=0) # $MFT is always inode 0
size = mft_file.info.meta.size
data = mft_file.read_random(0, size)
# data now contains $MFT; feed it to libmft or write to disk
This is the cleanest approach when the volume is encrypted at the partition level but mounted via a decryptor that gives you a raw image.
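The snippet above reads the whole $MFT into memory in one call, which can be uncomfortable for multi-gigabyte tables. A chunked copy is one way around that. The sketch below is a generic helper (copy_in_chunks is a hypothetical name, and the 1 MiB chunk size is arbitrary) that works with any callable taking (offset, length), such as the read_random method used above.

```python
from typing import BinaryIO, Callable

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice


def copy_in_chunks(
    read_at: Callable[[int, int], bytes],
    size: int,
    out: BinaryIO,
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Copy size bytes to out via read_at(offset, length); return bytes written."""
    offset = 0
    while offset < size:
        chunk = read_at(offset, min(chunk_size, size - offset))
        if not chunk:
            break  # short read: stop rather than loop forever
        out.write(chunk)
        offset += len(chunk)
    return offset
```

With the objects from the pytsk3 example, you would open an output file in "wb" mode and call copy_in_chunks(mft_file.read_random, size, out). Comparing the return value with size tells you whether the copy was complete.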
Common pitfalls
- Forgetting the fixup array. Reading raw 1,024-byte chunks without applying the update sequence array (USA) leaves placeholder bytes, not the real data, at offsets 510 and 1022 of every record. Every library above does this for you. Only roll your own parser if you understand the fixup mechanism (see the record anatomy post).
- Treating record number as identity. Record numbers are reused. The 64-bit file reference (48-bit record number plus 16-bit sequence number) is the identifier that does not collide.
- Confusing the two timestamp sets. Every record carries timestamps in both $STANDARD_INFORMATION (updated frequently) and $FILE_NAME (mostly stable). For timestomping detection, you need both; see the four MFT timestamps.
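Two of these pitfalls come down to small pieces of arithmetic, sketched below. The function names are hypothetical. The layout assumed is the standard one: a file reference keeps the record number in its low 48 bits and the sequence number in its high 16 bits, and NTFS timestamps are FILETIME values (100-nanosecond ticks since 1601-01-01 UTC).

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def split_file_reference(ref: int) -> Tuple[int, int]:
    """Split a 64-bit file reference into (record_number, sequence_number)."""
    return ref & 0xFFFFFFFFFFFF, ref >> 48


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME tick count to a timezone-aware UTC datetime."""
    # datetime resolves microseconds, so the last 100 ns digit is dropped
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
```

Keying records on the full (record_number, sequence_number) pair avoids mixing up a deleted file with whatever later reused its slot. Converting both timestamp sets to datetimes makes them directly comparable when looking for timestomping.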
When to skip Python entirely
For one-off interactive analysis without any installation, drop the $MFT onto the browser parser on this site. It runs the same omerbenamram/mft crate compiled to WebAssembly, filters and searches client-side, and exports CSV — no Python required.