If you’ve ever used a template engine for Microsoft Word, you’ve likely encountered the “split run” nightmare. You type ${name} in Word, but internally Word may save it as three separate XML runs, shown here in simplified form: <w:r>${</w:r>, <w:r>na</w:r>, <w:r>me}</w:r>.
When this happens, a search-and-replace engine that inspects one run at a time fails because no single run contains the full expression. This month, I’ve introduced a structural solution to this problem in office-stamper: Smart Tag Preprocessing.
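To make the failure concrete, here is a minimal sketch in plain Java. It models each run as a bare string, which is an assumption for illustration only (real runs are docx4j objects), and the class and method names are hypothetical rather than part of office-stamper.

```java
import java.util.List;

final class SplitRunDemo {

    // Naive approach: look for the placeholder inside each run separately.
    static boolean foundPerRun(List<String> runs, String placeholder) {
        for (String run : runs) {
            if (run.contains(placeholder)) {
                return true;
            }
        }
        return false;
    }

    // Paragraph-level approach: join the run texts first, then search.
    static boolean foundInJoinedText(List<String> runs, String placeholder) {
        return String.join("", runs).contains(placeholder);
    }

    public static void main(String[] args) {
        // The three fragments Word may store for a typed "${name}".
        List<String> runs = List.of("${", "na", "me}");
        System.out.println(foundPerRun(runs, "${name}"));       // false
        System.out.println(foundInJoinedText(runs, "${name}")); // true
    }
}
```

Joining the text finds the expression, but it loses the link back to the individual runs, which is the gap a structural wrapper is meant to close.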
Why Smart Tags?
When looking for a container to “shield” our placeholders from Word’s fragmentation, I evaluated several options:
- Bookmarks: Too restrictive; they can’t easily hold complex XML structures.
- CustomXML: Deprecated in newer Word versions and complex to manage.
- Run Merging: Hard to implement reliably without breaking formatting.
I chose Smart Tags (<w:smartTag>). They are flexible, can carry arbitrary properties, and, most importantly, act as a stable wrapper. By wrapping ${expression} in a Smart Tag before the main processing pass, we ensure that the expression stays logically grouped no matter how Word splits the underlying runs.
The PreparePlaceholders Pre-processor
The implementation uses a new pre-processing phase. It scans the document for ${...} patterns and wraps them:
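import java.util.regex.Pattern;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
// PreProcessor comes from office-stamper.

public class PreparePlaceholders implements PreProcessor {
    @Override
    public void process(WordprocessingMLPackage document) {
        // Matches "${", then one or more characters other than "{" (lazily), then "}".
        // DOTALL only affects ".", which this pattern does not use.
        var pattern = Pattern.compile("(\\$\\{([^{]+?)})", Pattern.DOTALL);
        // ... scans paragraphs and wraps matches in Smart Tags ...
    }
}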
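The scanning step is elided in the snippet, so here is one way it could work, as a sketch rather than the actual office-stamper code. It assumes each run is reduced to its text, joins those texts, applies the same regex, and maps every match back to the range of runs it covers. The class PlaceholderSpans and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderSpans {

    // Same expression as in the post.
    static final Pattern PLACEHOLDER =
            Pattern.compile("(\\$\\{([^{]+?)})", Pattern.DOTALL);

    // For each placeholder, returns {firstRunIndex, lastRunIndex}, both inclusive.
    static List<int[]> runRanges(List<String> runs) {
        // starts[i] is the offset of run i in the joined text;
        // starts[runs.size()] is the total length.
        int[] starts = new int[runs.size() + 1];
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < runs.size(); i++) {
            starts[i] = joined.length();
            joined.append(runs.get(i));
        }
        starts[runs.size()] = joined.length();

        List<int[]> ranges = new ArrayList<>();
        Matcher matcher = PLACEHOLDER.matcher(joined);
        while (matcher.find()) {
            int first = runIndexAt(starts, matcher.start());
            int last = runIndexAt(starts, matcher.end() - 1);
            ranges.add(new int[] {first, last});
        }
        return ranges;
    }

    // Index of the run that contains the character at the given offset.
    private static int runIndexAt(int[] starts, int offset) {
        int index = 0;
        while (starts[index + 1] <= offset) {
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> runs = List.of("Hello ", "${", "na", "me}", "!");
        for (int[] range : runRanges(runs)) {
            // Prints "wrap runs 1..3"
            System.out.println("wrap runs " + range[0] + ".." + range[1]);
        }
    }
}
```

A real implementation would then move the runs in each range under a single wrapper element. This sketch also ignores the case where a placeholder starts or ends in the middle of a run, which would require splitting that run first.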
Extensibility for the Community
This isn’t just a hardcoded fix. I’ve refined the PreProcessor and PostProcessor interfaces that were introduced earlier this year.
I envision a future where users can write their own processors to:
- Translate templates on the fly before stamping.
- Obfuscate sensitive placeholder names.
- Apply custom formatting logic that the standard engine doesn’t cover.
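As an illustration of the second idea, here is a hedged sketch of an obfuscating processor. To stay self-contained it uses a hypothetical TextPreProcessor interface that works on a plain string; the real office-stamper PreProcessor receives a WordprocessingMLPackage, so treat this as the shape of the logic rather than a drop-in class.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a pre-processing hook (not the office-stamper API).
interface TextPreProcessor {
    String process(String templateText);
}

final class ObfuscatingPreProcessor implements TextPreProcessor {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^{}]+)\\}");

    // Maps each original expression to its alias, in order of first appearance.
    private final Map<String, String> aliases = new LinkedHashMap<>();

    @Override
    public String process(String templateText) {
        Matcher matcher = PLACEHOLDER.matcher(templateText);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String original = matcher.group(1);
            String alias = aliases.get(original);
            if (alias == null) {
                alias = "p" + aliases.size();
                aliases.put(original, alias);
            }
            // quoteReplacement keeps "$" and "{" literal in the replacement text.
            matcher.appendReplacement(out, Matcher.quoteReplacement("${" + alias + "}"));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    // Exposes the mapping so the caller can bind aliases to real values later.
    Map<String, String> aliases() {
        return aliases;
    }

    public static void main(String[] args) {
        ObfuscatingPreProcessor processor = new ObfuscatingPreProcessor();
        // Prints "Dear ${p0}, order ${p1} for ${p0}"
        System.out.println(processor.process(
                "Dear ${customer.name}, order ${order.id} for ${customer.name}"));
    }
}
```

The same expression always receives the same alias, so the stamping context only needs the alias map to resolve values afterwards.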
Impact: Predictability for Everyone
- For Template Designers: You don’t have to change anything! Keep writing your expressions as usual. The engine now handles the “XML magic” to make them robust.
- For Processor Authors: This is a shift in mindset. Instead of looking for raw text runs, you should now leverage the structure provided by these tags.
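For processor authors, the shift could look roughly like the sketch below. The DocNode, TextRun, and TagNode classes are a hypothetical, heavily simplified document model, not the docx4j or office-stamper types. The point is only that a processor visits wrapper nodes instead of pattern-matching raw run text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified document model for illustration only.
abstract class DocNode {}

final class TextRun extends DocNode {
    final String text;

    TextRun(String text) {
        this.text = text;
    }
}

final class TagNode extends DocNode {
    final List<TextRun> runs;

    TagNode(List<TextRun> runs) {
        this.runs = runs;
    }

    // The tag keeps its runs together, so joining them yields the full expression.
    String expression() {
        StringBuilder joined = new StringBuilder();
        for (TextRun run : runs) {
            joined.append(run.text);
        }
        return joined.toString();
    }
}

final class PlaceholderFinder {

    // Collects placeholders by visiting tag nodes; plain runs are skipped.
    static List<String> placeholders(List<DocNode> paragraph) {
        List<String> found = new ArrayList<>();
        for (DocNode node : paragraph) {
            if (node instanceof TagNode) {
                found.add(((TagNode) node).expression());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<DocNode> paragraph = List.of(
                new TextRun("Hello "),
                new TagNode(List.of(new TextRun("${"), new TextRun("na"), new TextRun("me}"))),
                new TextRun("!"));
        System.out.println(placeholders(paragraph)); // [${name}]
    }
}
```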
By moving this logic to a dedicated PreProcessor, we’ve kept the core engine clean while significantly increasing its “it just works” factor.
December 8th 2025 — Commit Summary:
- Feat: Added PreparePlaceholders pre-processor using Smart Tags.
- Refactor: Improved PreProcessor and PostProcessor interface definitions.
- Logic: Implemented robust regex-based placeholder identification and wrapping.
- Test: Added comprehensive tests for fragmented run scenarios.