8 min read · By Daniel Shilansky, Founder, TomeVox

How to Prepare Your Manuscript for AI Narration (Checklist)

To prepare a manuscript for AI narration, clean it so the narrator reads only what should be heard: spell out URLs and symbols, mark how numbers and dates should be read, expand acronyms, fix all-caps and stray punctuation, tidy dialogue and scene breaks, standardise chapter headings, and write a short pronunciation guide.

An AI narrator is literal: it reads exactly what is on the page. Anything the text leaves ambiguous becomes a coin-flip in the audio. A web address may be read character by character, a "Dr." could be "doctor" or "drive", and a centred row of asterisks marking a scene break might be spelled out as "asterisk asterisk asterisk". None of these is hard to fix, and all are far cheaper to fix in the document than in finished audio. This checklist walks through the eight passes that catch the issues most likely to surface in an AI-narrated audiobook.

Preparing a manuscript for AI narration is mostly a tidy-up, not a rewrite. You are not changing your prose; you are removing visual shorthand that only makes sense to the eye and replacing it with cues a narrator can voice. Work through the steps below in order, keep a running pronunciation list as you go, and finish by hearing a free first-chapter preview before committing to the full book. For the wider picture, the AI audiobook production guide sets out the full workflow.

Do you need to prepare a manuscript for AI narration at all?

You do need light preparation, because an AI narrator reads the manuscript verbatim and cannot infer intent the way a human narrator scanning the page would. A skilled human narrator silently normalises a date, skips a footnote marker, and reads "&" as "and" without being told; an AI narrator does only what the text and your instructions specify. Preparation is the act of writing down those silent decisions so the audio matches what you hear when you read your own book.

A clean manuscript pays off twice: it produces a clean first generation, which means fewer chapters to re-do, and the same cleanup helps every downstream step. The checklist below is also exactly what TomeVox needs from you on submission, so working through it doubles as onboarding.

How do you clean URLs, emails, and symbols (Step 1)?

Step one is to replace or remove anything a reader skims with their eyes but a narrator must voice: web addresses, email addresses, and symbols such as &, %, @, #, $, and +. A bare URL like "example.com/audiobook-guide" can be read out character by character, which is unlistenable, so decide whether to say it in plain words ("our website, example dot com"), summarise it ("the link in the description"), or cut it from the spoken text entirely and keep it only in the ebook.

Symbols are the same problem in miniature. Write "and" instead of "&" in body prose, spell out "percent", "at", "dollars", and "number" where those are meant, and check that any currency or measurement reads naturally — "$49" should usually be written "forty-nine dollars" in narration-facing text if you want it spoken that way. Footnote and endnote markers deserve a decision too: most fiction and trade non-fiction drop them from the audio, and if you keep them you should say how they are read.
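If your manuscript is available as plain text, a short script can list the items that need a decision before you edit anything. The following is a minimal Python sketch, not a TomeVox tool: the regular expressions, the `flag_items` name, and the spoken form chosen for "+" are assumptions for illustration, and the URL pattern only covers a few common endings.

```python
import re

# Illustrative patterns only (assumptions, not a complete URL or email grammar).
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
URL_RE = re.compile(
    r"\b(?:https?://|www\.)\S+|\b[\w-]+\.(?:com|org|net)\b(?:/\S*)?",
    re.IGNORECASE,
)
# Spoken forms for the symbols named in this step; "plus" is our own choice.
SYMBOL_WORDS = {"&": "and", "%": "percent", "@": "at",
                "#": "number", "$": "dollars", "+": "plus"}


def flag_items(line):
    """Return (kind, text) pairs for everything that needs a narration decision."""
    flags = [("email", m.group()) for m in EMAIL_RE.finditer(line)]
    rest = EMAIL_RE.sub(" ", line)  # hide emails so their "@" is not flagged twice
    flags += [("url", m.group().rstrip(".,;:!?)\"'")) for m in URL_RE.finditer(rest)]
    rest = URL_RE.sub(" ", rest)    # hide URLs so symbols inside them are skipped
    flags += [("symbol", ch) for ch in rest if ch in SYMBOL_WORDS]
    return flags


for kind, text in flag_items("See example.com/audiobook-guide & save 10%."):
    print(kind, text, SYMBOL_WORDS.get(text, ""))
```

The script only reports what it finds. Whether a URL is spoken, summarised, or cut remains an editorial choice.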

How should AI narration handle numbers and dates (Step 2)?

Step two is to decide how every meaningful number, date, time, and measurement should be read, because many figures are genuinely ambiguous. The year 1900 can be "nineteen hundred" or "one thousand nine hundred"; 1,200 can be "twelve hundred" or "one thousand two hundred"; and "1980s" might be "nineteen eighties" or "nineteen-eighty-s". Phone numbers, ranges, fractions, and Roman numerals all carry the same risk. For any number whose spoken form matters to meaning or rhythm, write it the way you want it heard, or note it in your pronunciation guide.

A simple rule helps: if reading the digits aloud produces only one natural result, leave them; if you hesitated, spell it out. Times of day ("9:30"), sports scores, version numbers, and addresses are common trip-points, and measurements need care too, since "5 km" could be "five kilometres" or "five k-m". This step removes the single most common source of re-generations.
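The "if you hesitated, spell it out" rule can be supported by a quick scan that surfaces the figures worth a second look. This is a rough Python sketch under our own assumptions: the category names, the unit list, and the `find_ambiguous_numbers` helper are hypothetical, and the patterns will miss cases such as phone numbers and Roman numerals.

```python
import re

# Hypothetical categories, ordered from most to least specific.
PATTERNS = [
    ("decade", re.compile(r"\b\d{4}s\b")),
    ("time", re.compile(r"\b\d{1,2}:\d{2}\b")),
    ("measurement", re.compile(r"\b\d+(?:\.\d+)?\s?(?:km|kg|cm|mm|mph)\b")),
    ("thousands", re.compile(r"\b\d{1,3}(?:,\d{3})+\b")),
    ("four_digit", re.compile(r"\b\d{4}\b")),
]


def find_ambiguous_numbers(text):
    """Return (category, figure) pairs in reading order for manual review."""
    found = []
    claimed = []  # spans already matched by an earlier, more specific pattern
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            if any(m.start() < end and start < m.end() for start, end in claimed):
                continue
            claimed.append(m.span())
            found.append((m.start(), label, m.group()))
    return [(label, figure) for _, label, figure in sorted(found)]


sample = "In 1900 she ran 5 km at 9:30, paid 1,200 and loved the 1980s."
for label, figure in find_ambiguous_numbers(sample):
    print(f"{label}: {figure}")
```

Each hit is a prompt to decide on a spoken form, either by rewriting the figure in the text or by adding it to the pronunciation guide.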

How do you handle abbreviations and acronyms (Step 3)?

Step three is to tell the narrator, for each abbreviation and acronym, whether it is spoken as letters or as a word. NASA is said as a word, FBI as letters, and a brand-new acronym you invented could go either way, so it must be specified. Titles and shorthand are the other half of this pass: "Dr." may be "doctor" or "drive", "St." may be "saint" or "street", "Mt." may be "mount" or "mountain", and "Jan." could be a month or a name. Expand any of these that your context leaves unclear.

The cleanest approach is to write the spoken form into the text on first use and add anything unusual to your pronunciation guide. Recurring initialisms need to be defined only once if you list them there, which keeps the narration consistent without you editing every instance by hand.
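For shorthand such as "Dr." and "St.", position often hints at the intended reading. The Python sketch below applies one simple heuristic of our own: a following capitalised word suggests a title, a preceding one suggests an address, and anything ambiguous is left alone. The `AMBIGUOUS` table and `expand_shorthand` function are hypothetical, and the output should still be read through by a person.

```python
import re

# Hypothetical rules: abbreviation -> (reading before a name, reading after a name).
AMBIGUOUS = {
    "Dr.": ("Doctor", "Drive"),
    "St.": ("Saint", "Street"),
}
PATTERN = re.compile(r"([A-Z][a-z]+ )?\b(Dr\.|St\.)( [A-Z][a-z]+)?")


def expand_shorthand(text):
    """Expand 'Dr.' and 'St.' where the neighbouring words make the reading clear."""
    def replace(match):
        before = match.group(1) or ""
        after = match.group(3) or ""
        title, address = AMBIGUOUS[match.group(2)]
        if before and after:
            return match.group(0)  # capitalised words on both sides: leave for review
        if after:
            return f"{title}{after}"
        if before:
            return f"{before}{address}"
        return match.group(0)      # no usable context: leave for review
    return PATTERN.sub(replace, text)


print(expand_shorthand("Dr. Lee lives at 12 Elm Dr. near St. Mary church."))
```

A heuristic like this is deliberately conservative. It is meant to reduce the manual work, not to replace the check against context that this step describes.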

How do you fix em-dashes, ellipses, and all-caps (Step 4)?

Step four covers punctuation and emphasis that an AI narrator interprets as timing or stress. Em-dashes and ellipses generally signal pauses, which is usually what you want, so make sure you have used them deliberately rather than as a catch-all; long strings of dashes or dots can produce odd, drawn-out gaps. Check that interrupted dialogue and trailing thoughts use the punctuation that gives the pause you intend.

All-caps is the bigger trap. A word in capitals for emphasis, such as "she said NO", can be read letter by letter ("N-O") rather than shouted, so convert emphasis-caps to italics or normal case and reserve all-caps for genuine initialisms you have already defined. The same goes for stylised spacing such as "s p a c e d o u t", which a narrator cannot voice as intended and should be rewritten in plain words that describe the effect.
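Both traps are easy to search for mechanically. Here is a small Python sketch, assuming you keep your defined initialisms in a set; the function names and the four-letter minimum for spaced-out text are our own choices.

```python
import re

SPACED_RE = re.compile(r"\b(?:[A-Za-z] ){3,}[A-Za-z]\b")


def find_emphasis_caps(text, known_initialisms):
    """Return all-caps words of two or more letters that are not defined initialisms."""
    words = re.findall(r"\b[A-Z]{2,}\b", text)
    return [word for word in words if word not in known_initialisms]


def find_spaced_out(text):
    """Return runs of single letters separated by spaces, such as 's p a c e d'."""
    return [m.group() for m in SPACED_RE.finditer(text)]


known = {"FBI", "NASA"}
print(find_emphasis_caps("She said NO to the FBI agent, and I was SO tired.", known))
print(find_spaced_out("The sign read s p a c e d o u t in neon."))
```

Anything the first function returns is either emphasis to convert or an initialism that still needs defining.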

How do you clean up dialogue and scene breaks (Steps 5 and 6)?

Step five is dialogue. Make sure every line has clear attribution so the narrator does not blur two speakers into one, use consistent quotation marks throughout (straight or curly, not a mix), and check that dialogue tags sit where they belong. A single AI voice narrates all characters, so attribution carries the work that distinct character voices would do in a full-cast production — if you are weighing how a single narrator handles dialogue, the guide to choosing an audiobook voice is a useful companion. Remove any stray stage-direction artefacts, tracked-changes remnants, or comment bubbles that survived from editing.

Step six is scene breaks. In-chapter section breaks are usually shown with a centred #, three asterisks, or an ornament, and you want the narrator to insert a pause there, not to read the symbol aloud. Pick one consistent marker for soft scene breaks across the whole book and keep it distinct from your chapter headings, so the structure is unambiguous. Consistency is what lets the production process turn a marker into a clean beat of silence rather than a spoken character.
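Two of these checks lend themselves to a script: unifying scene-break markers and spotting mixed quotation marks. The Python sketch below assumes the manuscript has been split into lines of plain text. The `***` house marker and the set of recognised break styles are our assumptions, and decorative ornaments would need to be added to the pattern by hand.

```python
import re

SCENE_BREAK = "***"  # hypothetical house marker; use whichever one you picked
# A line holding only "#", three or more asterisks (spaced or not),
# or a run of three or more dashes, tildes, or underscores.
BREAK_RE = re.compile(r"^\s*(?:#|\*(?:\s*\*){2,}|[-~_]{3,})\s*$")


def normalise_scene_breaks(lines):
    """Replace every recognised scene-break line with the single house marker."""
    return [SCENE_BREAK if BREAK_RE.match(line) else line for line in lines]


def has_mixed_quotes(text):
    """Return True when straight and curly double quotes both appear."""
    straight = '"' in text
    curly = "\u201c" in text or "\u201d" in text
    return straight and curly


print(normalise_scene_breaks(["She left.", "   #   ", "* * *", "Morning came."]))
print(has_mixed_quotes('"Hi," she said. \u201cBye.\u201d'))
```

Keeping the marker in one constant makes it simple to confirm that it differs from whatever style your chapter headings use.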

How should you set up chapter headings and front matter (Step 7)?

Step seven is structure. Give every chapter a consistent heading style — applied as a real heading in DOCX or EPUB, not just bold text — because that structure is what tells the system where each chapter begins and lets it produce per-chapter MP3 files plus an M4B with chapter markers. Number or name chapters uniformly ("Chapter One", "Chapter 1", or your titled scheme) and avoid burying the body text inside images or complex tables a narrator cannot read in order.
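Uniform numbering can be checked once you have the chapter headings as a list of strings, for example copied from the table of contents. This Python sketch classifies each heading and reports the ones that differ from the majority style. The three style labels and both function names are hypothetical.

```python
import re
from collections import Counter


def heading_style(heading):
    """Classify a heading as 'digit' (Chapter 1), 'word' (Chapter One), or 'titled'."""
    if re.fullmatch(r"Chapter \d+", heading):
        return "digit"
    if re.fullmatch(r"Chapter [A-Z][a-z]+(?:-[A-Za-z]+)?", heading):
        return "word"
    return "titled"


def inconsistent_headings(headings):
    """Return the headings whose style differs from the most common style."""
    if not headings:
        return []
    styles = [heading_style(h) for h in headings]
    majority = Counter(styles).most_common(1)[0][0]
    return [h for h, style in zip(headings, styles) if style != majority]


print(inconsistent_headings(["Chapter 1", "Chapter 2", "Chapter Three", "Chapter 4"]))
```

The check covers the wording only. Whether each heading is applied as a real heading style still has to be confirmed in the DOCX or EPUB itself.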

Front and back matter need a deliberate decision rather than a copy-paste of the print pages. Name the book's title and author once at the opening so the audiobook identifies itself, and keep opening credits short: there is no need to read the full copyright page aloud, and a long blurb does not belong in opening credits. Decide what closing material (acknowledgements, an author note, a call to action) actually works as audio. For where the finished file goes next, the guide to where to sell an AI audiobook and the Google Play Books walkthrough cover distribution once the audio is ready.

How do you build a pronunciation guide and preview the result (Step 8)?

Step eight is the pronunciation guide, the single most valuable asset you can hand a narrator. List every proper noun, place name, invented word, foreign term, and piece of jargon that could be said more than one way, and beside each one write a plain phonetic respelling of how it should sound — for example "Siobhan = shiv-AWN" or "Caius = KYE-us". A short table works well, and you only need to define each term once for it to stay consistent across the whole book.

Word in manuscript | Type | Say it like
Siobhan | Character name | shiv-AWN
Worcestershire | Place name | WUUS-ter-sher
Daenys | Invented (fantasy) | DAY-nis
EBITDA | Acronym (nonfiction) | ee-BIT-dah
1900 (the year) | Number | nineteen hundred

The takeaway from the table is that one short respelling per ambiguous term removes almost all pronunciation risk before a single second of audio is generated, and it scales: a fantasy novel with many invented names benefits most, while a contemporary memoir may need only a handful of entries. Once the guide is done, hear the result. TomeVox offers a free first-chapter preview with no credit card, so you can listen to the actual voice on your actual text and confirm the manuscript is ready. Every audiobook is automatically checked for technical quality before delivery, and if a chapter still needs a fix you can re-generate that chapter at no extra cost after updating your guide.
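A guide is only useful if it is complete, so it helps to check the manuscript for names it does not yet cover. The Python sketch below stores the word entries from the table above in a dictionary and lists capitalised mid-sentence words with no entry. The dictionary layout, the `undefined_names` helper, and the sample name "Maelis" are hypothetical, and the result is a list of candidates to review, since ordinary capitalised words such as weekdays will also appear.

```python
import re

# Word entries copied from the table above; the dict layout is an assumption.
GUIDE = {
    "Siobhan": "shiv-AWN",
    "Worcestershire": "WUUS-ter-sher",
    "Daenys": "DAY-nis",
    "EBITDA": "ee-BIT-dah",
}


def undefined_names(text, guide):
    """List capitalised mid-sentence words that have no guide entry, in first-seen order."""
    candidates = re.findall(r"(?<=[a-z,;] )[A-Z][A-Za-z]+", text)
    missing = []
    for word in candidates:
        if word not in guide and word not in missing:
            missing.append(word)
    return missing


sample = "Then Siobhan met Maelis in Worcestershire, and Maelis smiled at Siobhan."
print(undefined_names(sample, GUIDE))
```

Words at the start of a sentence are skipped on purpose, because their capital letter says nothing about whether they are names.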

The 8-step pre-submission checklist

1. Spell out or cut URLs, emails, and symbols (&, %, @, #, $).

2. Decide how every ambiguous number, date, time, and measurement is read.

3. Mark each abbreviation/acronym as letters or a word; expand Dr., St., Mt.

4. Convert emphasis all-caps to italics; check em-dashes and ellipses signal the right pauses.

5. Give every line of dialogue clear attribution and consistent quotation marks.

6. Use one consistent scene-break marker so it becomes a pause, not a spoken symbol.

7. Apply real, uniform chapter headings; name title/author once; trim front and back matter.

8. Build a phonetic pronunciation guide, then run a free first-chapter preview.

What does a clean manuscript get you?

A clean manuscript gets you a faster, better audiobook: fewer re-generated chapters, a narration that matches your intent, and a file that is ready to distribute without surprises. TomeVox charges a flat early-bird fee ($49 up to 60,000 words, $79 up to 100,000 words, and $99 up to 150,000 words, with $0.0005 per word only above 150,000) and delivers an M4B with chapter markers plus per-chapter MP3 files, usually within 48 hours. Because the fee is flat, the time you spend preparing the manuscript pays off as a cleaner first pass rather than a cheaper invoice. The preparation is the part of the process most within your control.

Preparation also keeps your options open afterwards. You receive full commercial rights with no exclusivity, so a well-prepared file can go directly to Google Play Books and Kobo, or wide to Apple Books and Spotify through an AI-friendly aggregator such as PublishDrive or Author's Republic, which also unlocks Chirp. Disclose AI or digital-voice narration wherever a platform asks, and as best practice everywhere. Standard ACX submission still requires human narration and Audible's third-party-AI acceptance is not yet open to all indie authors, so plan accordingly; the ACX requirements guide and the guide to selling audiobooks direct cover those routes.

Frequently asked questions

Do I need to format my manuscript before AI narration?

Yes, light preparation noticeably improves the result. An AI narrator reads exactly what is on the page, so URLs, symbols, ambiguous numbers, all-caps, and undefined acronyms can be mispronounced or spelled out unless you decide in advance how they should be read. Spending an hour cleaning the manuscript and writing a short pronunciation list usually means the audiobook comes out clean on the first generation, with fewer chapters to re-do.

How does AI narration read numbers and dates?

AI narration converts numbers using general rules, but many figures are ambiguous, so it is safest to mark your intent. The year 1900 can be read as nineteen hundred or one thousand nine hundred, and a figure like 1,200 can be twelve hundred or one thousand two hundred. For any number whose spoken form matters, write it out the way you want it heard, or note it in your pronunciation guide so the narrator reads it consistently.

What file format should I submit for AI narration?

A clean DOCX or EPUB with clearly styled chapter headings is ideal, because the structure tells the system where chapters begin and end. Avoid embedding the body text inside images, complex tables, or footnote-heavy layouts that an AI narrator cannot read in order. TomeVox uses the manuscript's chapter structure to produce per-chapter MP3 files plus a single M4B with chapter markers.

Can I fix a mispronunciation after the audiobook is generated?

Yes. With TomeVox, every audiobook is automatically checked for technical quality before delivery, and if a name or term is mispronounced you can re-generate that chapter at no extra cost after adding the correct pronunciation to your guide. Preparing the manuscript well upfront simply reduces how many fixes are needed, so most books are clean after the first pass.

How long does manuscript preparation take?

For a typical full-length book, working through this checklist takes about thirty to sixty minutes, plus a little more if you have many proper nouns or invented words to list. The pronunciation guide is the only step that scales with content. A free first-chapter preview then confirms the manuscript is ready before you commit to the full audiobook.

Hear your prepared manuscript narrated, free

Run your cleaned-up manuscript through TomeVox, choose a voice, and get a free first-chapter preview with no credit card. Like it? Get the full audiobook as an M4B + per-chapter MP3, usually within 48 hours, for a flat $49–$99, with full rights and no exclusivity.
