Parsing $MFT from Python without losing your weekend

Most of the Python on a forensics workstation is glue. You acquire artifacts with native tools, parse them with libraries someone else wrote, and use Python to stitch the results into reports and timelines. $MFT is one of the parsers where the temptation to roll your own is highest because the format is small and the libraries on PyPI have rough edges. Resist the temptation. There are better options.

This is the practical Python MFT parsing post: which library to use when, with the code I actually use.

What you are reading

The NTFS Master File Table is a sequence of fixed-size records, 1,024 bytes each on most volumes (4,096 on some 4K-native disks). To parse it from Python you have to:

Open the $MFT file (or read it from a disk image).
Step through it 1,024 bytes at a time.
Apply the fixup array to each record (the torn-write detection mechanism; see the record anatomy post).
Walk the attribute stream inside each record.

The libraries below handle all four steps. Falling back to struct.unpack is only worthwhile when a library does not expose a field you need.
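
For the rare case where struct.unpack is justified, the stepping and fixup steps above can be sketched as follows. This is a minimal illustration, not a replacement for the libraries: the helper names apply_fixup and iter_records are hypothetical, and it assumes 1,024-byte records on 512-byte sectors with the standard header layout (update sequence array offset at byte 4, entry count at byte 6).

```python
import struct

RECORD_SIZE = 1024   # assumed record size; some volumes use 4096
SECTOR_SIZE = 512    # fixups are applied per 512-byte sector


def apply_fixup(record: bytes) -> bytes:
    """Return the record with the original sector-end bytes restored."""
    if record[:4] != b"FILE":
        raise ValueError("not a FILE record")
    # Offset and entry count of the update sequence array (USA).
    usa_offset, usa_count = struct.unpack_from("<HH", record, 4)
    usn = record[usa_offset:usa_offset + 2]  # update sequence number
    buf = bytearray(record)
    # Entry 0 is the USN itself; entries 1..count-1 hold the saved bytes.
    for i in range(1, usa_count):
        end = i * SECTOR_SIZE
        if end > len(buf):
            break
        if buf[end - 2:end] != usn:
            raise ValueError(f"fixup mismatch in sector {i} (torn write?)")
        src = usa_offset + 2 * i
        buf[end - 2:end] = record[src:src + 2]
    return bytes(buf)


def iter_records(stream, record_size: int = RECORD_SIZE):
    """Yield (record_number, raw_bytes) for each full record in a binary stream."""
    index = 0
    while True:
        chunk = stream.read(record_size)
        if len(chunk) < record_size:
            return
        yield index, chunk
        index += 1
```

A caller would open the $MFT in binary mode, loop over iter_records, skip chunks that do not start with FILE, and pass the rest through apply_fixup before walking attributes.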

Option 1: analyzeMFT (pure Python, easy to deploy)

analyzeMFT is the classic pure-Python parser, originally by David Kovar, still maintained. CLI-first, importable. Slow, but reliable on the records it understands.

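# pip install analyzeMFT
from analyzeMFT.mft_analyzer import MFTAnalyzer

analyzer = MFTAnalyzer(mft_file="path/to/$MFT", output_file="out.csv")
# In some releases analyze() is a coroutine; if so, call
# asyncio.run(analyzer.analyze()) instead.
analyzer.analyze()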

The CSV it produces has one row per record with timestamps from both $STANDARD_INFORMATION and $FILE_NAME. Good enough for spreadsheet-driven triage when the MFT is small.

Use it when:

The $MFT is small (a few hundred MB or less).
You are working in an air-gapped Python-only environment.
You want a simple CSV without touching native dependencies.

Skip it when:

Inputs are multi-GB. analyzeMFT is single-threaded pure Python: a 4 GB MFT can take 20+ minutes, where the Rust parser finishes in about 30 seconds.
You want to write logic that walks records programmatically. The object model is geared toward CSV emission, not analysis.

Option 2: libmft (typed object model)

If you want to query records as Python objects, libmft exposes a typed model close to the on-disk structure.

# pip install libmft
from libmft.api import MFT

with open("path/to/$MFT", "rb") as f:
    mft = MFT(f)
    for entry in mft:
        if not entry.is_deleted():
            continue
        name = entry.get_full_path()
        si = entry.get_attributes(0x10)[0]  # $STANDARD_INFORMATION
        print(name, si.created, si.modified)

libmft resolves parent references so you can ask each entry for its full path without writing the traversal yourself. It also follows $ATTRIBUTE_LIST extension records transparently, a detail that analyzeMFT's flat CSV output does not surface.

Use it when:

You want to write logic that walks records, filters by attribute, and emits a custom shape.
You need access to the typed object model (security descriptors, reparse points, runlists) rather than flat CSV.

Skip it when:

Performance matters. libmft is faster than analyzeMFT but still pure Python; expect 5 to 10 minutes on a 4 GB MFT.

Option 3: shell out to a Rust parser

When the MFT is large or you are batching across many disks, the fastest practical option is to shell out to omerbenamram/mft_dump and read its JSON Lines output.

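import json
import subprocess

# omerbenamram/mft: `cargo install mft` or download a release binary
# "jsonl" asks for one compact JSON object per line; "json" pretty-prints.
proc = subprocess.Popen(
    ["mft_dump", "-o", "jsonl", "path/to/$MFT"],
    stdout=subprocess.PIPE, text=True,
)

# Key names can differ between mft_dump versions; check a sample line
# from the binary you pin before relying on them.
for line in proc.stdout:
    record = json.loads(line)
    if (record["header"]["flags"] & 0x1) == 0:  # IN_USE clear → deleted
        print(record["entry"], record["file_name"]["name"])

# Surface parser failures instead of silently accepting partial output.
if proc.wait() != 0:
    raise RuntimeError(f"mft_dump exited with {proc.returncode}")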

mft_dump emits one record per line, which streams cleanly into Python without loading the full output into memory. Compared with analyzeMFT on the same input, the Rust parser is typically 10 to 50x faster and uses a tenth of the memory.
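
Because the output is one record per line, the decoding step can be separated into a small generator that works on any iterable of lines. This is a sketch under that assumption; iter_jsonl is a hypothetical helper name, and it raises on malformed lines instead of skipping them, since silently dropped records are hard to notice in an investigation.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded object per non-blank line of JSON Lines input."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line failed so the raw output can be inspected.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
```

Passing the subprocess's stdout to iter_jsonl keeps memory flat, because only one line is decoded at a time.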

Use it when:

You are running production pipelines.
Inputs are large.
Parsing time matters at all.

The only catch: you depend on the binary being installed. Pin a version, ship it alongside your tooling, and document the install in your runbook.

Reading $MFT straight from a disk image

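If you have a disk image rather than an extracted $MFT file, use pytsk3 (Python bindings for The Sleuth Kit) to open $MFT on the volume and read its bytes. A raw .dd opens directly; an .E01 generally needs libewf's Python bindings (pyewf) wrapped in an Img_Info subclass first. The raw case looks like this: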

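import pytsk3

img = pytsk3.Img_Info("disk.dd")
fs = pytsk3.FS_Info(img, offset=0)  # byte offset of the NTFS partition
mft_file = fs.open_meta(inode=0)    # $MFT is always inode 0
size = mft_file.info.meta.size

# Copy $MFT out in 1 MiB chunks rather than one multi-GB read.
with open("extracted_MFT.bin", "wb") as out:
    offset = 0
    while offset < size:
        chunk = mft_file.read_random(offset, min(1024 * 1024, size - offset))
        if not chunk:
            break  # short read: stop instead of looping forever
        out.write(chunk)
        offset += len(chunk)
# extracted_MFT.bin can now be fed to any of the parsers above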

This is the cleanest approach when the volume is encrypted at the partition level but mounted via a decryptor that gives you a raw image. It is also the right tool when the image contains VSS snapshots and you want to extract $MFT from each one. Combine with libvshadow for the snapshot enumeration.

A short script I keep around

This is roughly the script I reach for first when looking at an unfamiliar MFT. It finds deleted records with resident $DATA and dumps their contents.

import json
import subprocess

# "jsonl" gives one compact JSON object per line; "json" pretty-prints.
proc = subprocess.Popen(
    ["mft_dump", "-o", "jsonl", "path/to/$MFT"],
    stdout=subprocess.PIPE, text=True,
)

# Key names can differ between mft_dump versions; check a sample line
# from the binary you pin before relying on them.
for line in proc.stdout:
    rec = json.loads(line)
    if rec["header"]["flags"] & 0x1:
        continue  # in use
    for attr in rec.get("attributes", []):
        if attr["header"]["type_code"] != 0x80:
            continue  # not $DATA
        if not attr["header"]["is_resident"]:
            continue  # data lives elsewhere
        # Resident, deleted, has $DATA inline. The interesting case.
        data = bytes.fromhex(attr["data"]["resident_data"])
        print(f"rec={rec['entry']} seq={rec['header']['sequence']} "
              f"name={rec.get('file_name', {}).get('name')} "
              f"bytes={len(data)}")
        # Write to a file named by record number for review.
        with open(f"deleted_resident_{rec['entry']}.bin", "wb") as f:
            f.write(data)

# Surface parser failures instead of silently accepting partial output.
if proc.wait() != 0:
    raise RuntimeError(f"mft_dump exited with {proc.returncode}")

That single script has surfaced enough deleted scripts, configs, and one-liner droppers across investigations to justify itself many times over. Resident data sits in MFT records people never think to check. See resident data for what fits.

Common pitfalls

Forgetting the fixup array. Reading raw 1,024-byte chunks without applying the update sequence array (USA) leaves the update sequence number, not the real data, in the last two bytes of each 512-byte sector (offsets 510-511 and 1022-1023 of every record). The libraries above do this for you. Only roll your own parser if you understand the fixup mechanism in the record anatomy post.
Treating record number as identity. Record numbers are reused. The 64-bit file reference (record number plus sequence number) is the identifier that does not collide. If your script groups by record number alone, it will silently conflate deleted predecessors with their reusing successors.
Confusing the two timestamp sets. Every record carries timestamps in $STANDARD_INFORMATION (updated frequently) and $FILE_NAME (mostly stable). For timestomping detection you need both. See the four MFT timestamps.
Not handling extension records. A file whose attributes overflow one record has an $ATTRIBUTE_LIST (0x20) pointing at extension records. Many naive scripts emit the base record and miss attributes that live in the extensions. libmft handles this; if you roll your own walk, do not forget.
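
Two of the pitfalls above come down to a few lines of arithmetic. The sketch below assumes the standard NTFS layout of a file reference (record number in the low 48 bits, sequence number in the high 16) and uses two common timestomping heuristics; the function names are hypothetical and the timestamp check is a triage hint, not proof of tampering.

```python
from datetime import datetime


def split_file_reference(ref: int) -> tuple[int, int]:
    """Split a 64-bit NTFS file reference into (record_number, sequence)."""
    record_number = ref & 0xFFFF_FFFF_FFFF  # low 48 bits
    sequence = (ref >> 48) & 0xFFFF         # high 16 bits
    return record_number, sequence


def looks_timestomped(si_created: datetime, fn_created: datetime) -> bool:
    """Heuristic: flag records whose $STANDARD_INFORMATION creation time
    predates the $FILE_NAME creation time, or has no sub-second part."""
    return si_created < fn_created or si_created.microsecond == 0
```

Grouping by the (record_number, sequence) tuple rather than the record number alone keeps a deleted file separate from whatever later reused its slot.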

When to skip Python entirely

For one-off interactive analysis without any installation, drop the $MFT onto the browser parser on this site. It runs the same omerbenamram/mft crate compiled to WebAssembly, filters and searches client-side, and exports CSV. No Python required.
