A PDF creation library written in Rust, designed for SaaS and web applications. Low memory and CPU consumption — even for documents with hundreds of pages.
merge_pdfs combines two or more existing PDF files into a single output file.
Pages from each source are appended in document order: all pages from the first
source, then all pages from the second, and so on.
This is useful for assembling multi-part documents — for example, combining a cover page, a body report, and an appendix that were generated separately.
The merge operation relies on the pub(crate) infrastructure added in Issue 27:

1. Enumerate pages (`PdfReader.page_object_numbers`) to get the leaf page object
   numbers in document order.
2. Collect each page's closure (`collect_closure`): a BFS from the
   page node through all indirect references, gathering every object the page
   depends on: content streams, resource dictionaries, fonts, images, etc.
3. Build a remapping table that assigns each collected object a new output
   number (`source_obj_num → output_obj_num`).
4. Read each object's bytes with `raw_object_bytes`, then scan for `N G R`
   (indirect reference) and `N G obj` (object header) patterns and substitute
   using the remapping table. Stream bodies are copied verbatim to preserve
   compressed binary content.
5. Write a new Catalog and Pages tree for the merged pages, and reference the
   new Catalog from the trailer's `/Root`.

Re-serialising objects would require a full PDF object model. Copying raw bytes and rewriting only the integer tokens that appear in reference patterns is far simpler and avoids a dependency on a full PDF parsing library. The only tokens that must change are object numbers; all other content (stream operators, name dictionaries, encoding tables) is copied byte-for-byte.
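As a rough illustration of the token-rewriting idea, the sketch below scans the bytes of an object's dictionary for `N G R` and `N G obj` patterns, replaces the object number from a remapping table, and writes generation 0. The names `renumber_refs`, `match_ref`, and the other helpers are hypothetical and are not the crate's internals. The sketch assumes the input slice stops before any stream body, and it does not special-case string literals, so treat it as a simplification.

```rust
use std::collections::HashMap;

// PDF whitespace and delimiter bytes.
fn is_ws(b: u8) -> bool {
    matches!(b, 0x00 | 0x09 | 0x0A | 0x0C | 0x0D | 0x20)
}

fn is_delim(b: u8) -> bool {
    matches!(b, b'(' | b')' | b'<' | b'>' | b'[' | b']' | b'{' | b'}' | b'/' | b'%')
}

// True at end of input or on a whitespace/delimiter byte.
fn at_boundary(buf: &[u8], i: usize) -> bool {
    i >= buf.len() || is_ws(buf[i]) || is_delim(buf[i])
}

// Parses an unsigned integer at `i`; returns (value, index after the digits).
fn parse_uint(buf: &[u8], i: usize) -> Option<(u32, usize)> {
    let mut j = i;
    let mut value: u32 = 0;
    while j < buf.len() && buf[j].is_ascii_digit() {
        value = value.checked_mul(10)?.checked_add((buf[j] - b'0') as u32)?;
        j += 1;
    }
    if j > i { Some((value, j)) } else { None }
}

// Skips at least one whitespace byte; returns the index after the run.
fn skip_ws(buf: &[u8], start: usize) -> Option<usize> {
    let mut i = start;
    while i < buf.len() && is_ws(buf[i]) {
        i += 1;
    }
    if i > start { Some(i) } else { None }
}

// Tries to match `N G R` or `N G obj` at `i`; returns (N, keyword, end index).
fn match_ref(buf: &[u8], i: usize) -> Option<(u32, &'static str, usize)> {
    let (num, j) = parse_uint(buf, i)?;
    let j = skip_ws(buf, j)?;
    let (_generation, j) = parse_uint(buf, j)?;
    let j = skip_ws(buf, j)?;
    for kw in ["obj", "R"] {
        let end = j + kw.len();
        if buf[j..].starts_with(kw.as_bytes()) && at_boundary(buf, end) {
            return Some((num, kw, end));
        }
    }
    None
}

/// Hypothetical sketch: rewrites mapped object numbers, leaves all other bytes alone.
pub fn renumber_refs(dict: &[u8], map: &HashMap<u32, u32>) -> Vec<u8> {
    let mut out = Vec::with_capacity(dict.len());
    let mut i = 0;
    while i < dict.len() {
        // Only try a match where a new token can start.
        let token_start = i == 0 || at_boundary(dict, i - 1);
        if token_start && dict[i].is_ascii_digit() {
            if let Some((num, kw, end)) = match_ref(dict, i) {
                if let Some(new_num) = map.get(&num) {
                    // Output objects always use generation 0.
                    out.extend_from_slice(format!("{} 0 {}", new_num, kw).as_bytes());
                    i = end;
                    continue;
                }
            }
        }
        out.push(dict[i]);
        i += 1;
    }
    out
}
```

References whose object number is missing from the table are copied unchanged in this sketch; a real implementation would probably treat that as an error.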
collect_closure is seeded with the leaf page objects. Because each page
references its parent Pages node (/Parent N G R), the source’s Pages tree
nodes are included in the closure and copied to the output. These copied nodes
are not referenced by the new merged Catalog and are effectively orphaned — they
waste a small amount of space but do not affect correctness for any PDF operation
that follows the standard Catalog → Pages → Kids traversal.
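To see why a page's parent Pages node ends up in the closure, here is a minimal BFS over a toy reference graph. `RefGraph` and `closure_of` are illustrative stand-ins; the real `collect_closure` signature is not shown in this document, and the real function reads references out of PDF objects rather than a prebuilt map.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Hypothetical model: object number -> object numbers it references.
type RefGraph = HashMap<u32, Vec<u32>>;

/// Breadth-first walk from `seed` through every reference edge.
fn closure_of(graph: &RefGraph, seed: u32) -> BTreeSet<u32> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::new();
    seen.insert(seed);
    queue.push_back(seed);
    while let Some(obj) = queue.pop_front() {
        if let Some(refs) = graph.get(&obj) {
            for &next in refs {
                // `insert` returns true only the first time an object is seen,
                // so cycles such as page -> parent -> page terminate.
                if seen.insert(next) {
                    queue.push_back(next);
                }
            }
        }
    }
    seen
}
```

With a page object 4 that references its parent 2, a content stream 5, and a font 6, and a parent whose kids list points back at 4, the closure of 4 is {2, 4, 5, 6}: the Pages node 2 is pulled in through the `/Parent` edge.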

```rust
use pdf_core::{merge_pdfs, MergeOptions, PdfMergeError};

merge_pdfs(
    &["report.pdf", "appendix.pdf"],
    "combined.pdf",
    MergeOptions::default(),
)?;
```

```php
$opts = new MergeOptions(); // flattenForms defaults to false
merge_pdfs(['report.pdf', 'appendix.pdf'], 'combined.pdf', $opts);
```

| Field | PHP property | Default | Description |
|---|---|---|---|
| `flatten_forms` | `flattenForms` | `false` | Flatten interactive form fields. Not yet implemented. |

Setting `flatten_forms = true` returns `PdfMergeError::NotSupported` (Rust) or
throws an exception (PHP). Full support is deferred until form field reading and
writing are implemented.

`merge_pdfs` returns `Result<(), PdfMergeError>`.

| Variant | Meaning |
|---|---|
| `NotSupported` | An unsupported option was requested (e.g. `flatten_forms`) |
| `ReadError(PdfReadError)` | A source PDF could not be read or parsed |
| `Io(String)` | The output file could not be written |

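A caller will usually want to tell these variants apart. The sketch below shows one way to do that with a `match`. To stay self-contained it declares local stand-in enums that mirror the variants listed in the table above; the real types live in `pdf_core`. The `Other` variant and the `describe` helper are hypothetical and exist only for illustration, since this document does not list the remaining `PdfReadError` variants.

```rust
// Local stand-ins so the sketch compiles on its own. Only the variant names
// from the table are taken from the documentation.
#[derive(Debug)]
enum PdfReadError {
    XrefStreamNotSupported,
    Other(String), // hypothetical placeholder for variants not listed here
}

#[derive(Debug)]
enum PdfMergeError {
    NotSupported,
    ReadError(PdfReadError),
    Io(String),
}

/// Hypothetical helper: turns a merge error into a message for the caller.
fn describe(err: &PdfMergeError) -> String {
    match err {
        PdfMergeError::NotSupported => "an unsupported option was requested".to_string(),
        PdfMergeError::ReadError(PdfReadError::XrefStreamNotSupported) => {
            "a source PDF uses cross-reference streams, which are not supported".to_string()
        }
        PdfMergeError::ReadError(other) => {
            format!("a source PDF could not be read: {:?}", other)
        }
        PdfMergeError::Io(msg) => format!("the output file could not be written: {}", msg),
    }
}
```

Matching the specific `XrefStreamNotSupported` case before the general `ReadError` arm lets an application suggest a workaround, such as re-saving the source file, instead of reporting a generic read failure.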
Known limitations:

- Source PDFs that use cross-reference streams are not supported; merging them
  fails with `PdfMergeError::ReadError(PdfReadError::XrefStreamNotSupported)`.
  This affects many PDFs from Adobe Acrobat and LibreOffice.
- `flatten_forms = true` is not yet implemented.

Examples:

- `examples/rust/generate_merge.rs`: merges `rust-tables.pdf` and
  `rust-invoice.pdf`, prints the merged page count.
- `examples/php/generate_merge.php`: mirrors the Rust example.

```sh
# Rust
cargo run --example generate_tables -p pdf-examples
cargo run --example generate_invoice -p pdf-examples
cargo run --example generate_merge -p pdf-examples

# PHP
php -d extension=target/release/libpdf_php.so examples/php/generate_tables.php
php -d extension=target/release/libpdf_php.so examples/php/generate_invoice.php
php -d extension=target/release/libpdf_php.so examples/php/generate_merge.php
```

All source objects are renumbered from a single global counter. This guarantees no two objects from different sources share an ID in the output, without needing to scan for conflicts.
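A minimal sketch of such a counter follows. The `Renumberer` type and its methods are hypothetical names chosen for illustration; the only idea taken from the text is that one counter is shared by every source, with the key being the pair of source index and source object number.

```rust
use std::collections::HashMap;

/// Hypothetical helper that hands out output object numbers from one counter
/// shared by every source file.
struct Renumberer {
    next: u32,
    // (source index, source object number) -> output object number
    map: HashMap<(usize, u32), u32>,
}

impl Renumberer {
    fn new(first_free: u32) -> Self {
        Renumberer { next: first_free, map: HashMap::new() }
    }

    /// Returns the output number for an object, allocating one on first sight.
    fn assign(&mut self, source: usize, obj_num: u32) -> u32 {
        if let Some(&existing) = self.map.get(&(source, obj_num)) {
            return existing;
        }
        let assigned = self.next;
        self.next += 1;
        self.map.insert((source, obj_num), assigned);
        assigned
    }
}
```

Object 5 from source 0 and object 5 from source 1 receive different output numbers because the counter only ever moves forward, and asking again for an object that was already seen returns the number it was given the first time.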
Binary-compressed stream content may accidentally contain byte sequences that
look like N G R. Scanning stream content for references would corrupt it.
The renumber pass detects the stream keyword (at a word boundary) and copies
everything up to endstream verbatim. This is safe because real indirect
references can only appear in the object’s dictionary, not inside the stream
body.
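The split between the rewritable dictionary and the verbatim stream body might look like the sketch below. The function names are hypothetical, and the exact word-boundary rule is an assumption: here `stream` must be preceded by whitespace or `>` and followed by a CR or LF, which also keeps `endstream` from matching. For simplicity the sketch copies everything from the `stream` keyword to the end of the object, on the assumption that only `endstream` and `endobj` follow the body.

```rust
/// Returns the offset of the `stream` keyword, if the object has a stream body.
fn find_stream_keyword(obj: &[u8]) -> Option<usize> {
    const KW: &[u8] = b"stream";
    obj.windows(KW.len()).enumerate().find_map(|(i, window)| {
        // Assumed boundary rule: whitespace or `>` before, CR or LF after.
        let before_ok = i == 0 || obj[i - 1].is_ascii_whitespace() || obj[i - 1] == b'>';
        let after_ok = matches!(obj.get(i + KW.len()).copied(), Some(b'\r') | Some(b'\n'));
        if window == KW && before_ok && after_ok {
            Some(i)
        } else {
            None
        }
    })
}

/// Applies `rewrite` to the bytes before the stream keyword only; the keyword
/// and everything after it are copied verbatim.
fn rewrite_outside_stream<F>(obj: &[u8], rewrite: F) -> Vec<u8>
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    match find_stream_keyword(obj) {
        Some(pos) => {
            let mut out = rewrite(&obj[..pos]);
            out.extend_from_slice(&obj[pos..]);
            out
        }
        // No stream body: the whole object is dictionary-like content.
        None => rewrite(obj),
    }
}
```

The `rewrite` closure is where a reference-renumbering pass would plug in; bytes inside the stream body that happen to look like `1 0 R` never reach it.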
All output objects use generation number 0. PDFs with objects at generation > 0 (the result of incremental updates that delete and reuse object numbers) are rare in practice, and resetting to 0 is always valid for a freshly written PDF.
Rust API: `merge_pdfs`, `MergeOptions`, `PdfMergeError`. PHP bindings via the
`merge_pdfs()` function and the `MergeOptions` class. Depends on the pub(crate)
infrastructure from Issue 27.