9 min read · By Daniel Shilansky, Founder, TomeVox

Images, Tables, and Footnotes in an Audiobook: How to Handle Visual Content

To handle images and footnotes in an audiobook, treat each by intent: read short content footnotes inline, batch long or numerous content notes at a chapter's end, and cut bare source citations. Replace charts and tables with a one- or two-sentence spoken summary, and point listeners to a downloadable supplemental PDF for detail that truly needs the page.

Visual content is one of the most common reasons nonfiction authors stall before making an audiobook, because a footnote, a chart, or a table cannot simply be "read aloud" the way a paragraph can. A listener cannot see a figure, cannot click a link, and cannot glance back at a table to re-check a number. The fix is not to abandon the audiobook; it is to decide, element by element, whether each visual should be described in audio, pointed to in a supplemental PDF, or cut because it served only the eye. This guide gives the judgment rules and example scripts to make those decisions confidently.

The principle underneath every rule below is that an audiobook is an adaptation, not a recording of the page. Print can use the page layout to do work — a superscript number, a boxed table, a clickable URL — that audio has to accomplish with words and structure instead. Once you accept that turning visual scaffolding into spoken meaning is editing rather than loss, the decisions get much easier, and the result is an audiobook that stands on its own without a single "see figure 3" left dangling.

How do you handle footnotes and endnotes in an audiobook?

Footnotes and endnotes in an audiobook are handled by sorting them into two kinds and treating each differently. A content footnote carries an aside, an anecdote, a qualification, or part of the argument; it adds meaning when heard, so it usually deserves a place in the audio. A reference footnote only points to a source — a citation, a page number, a "see also" — and adds nothing a listener can act on, so it is usually cut. Sorting your notes into these two buckets before you decide anything else removes most of the difficulty.

For content footnotes, you then choose between reading them inline and batching them. Read a note inline — woven into the sentence or set off with a brief spoken cue like "a quick aside" — when it is short and the listener would lose the thread by waiting. Batch notes at the end of the chapter when there are many of them, or when each is long enough that an inline read would shatter the flow of the main text. Whichever you choose, keep it consistent across the whole book: a listener quickly learns that your asides come at chapter's end, but is jolted if the pattern changes from chapter to chapter.

For reference footnotes, cutting is the default, with one caveat: if a citation matters to the book's credibility, such as a contested claim or a key study an attentive reader will want, keep the substance in spoken form ("a 2019 study in The Lancet found...") rather than reading the bibliographic string. Page numbers, "ibid", and volume-and-issue details are visual-only scaffolding and should go. Heavily footnoted academic and reference titles benefit most from this discipline, a theme covered in the nonfiction AI audiobook narration guide.
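
The sorting rule above can be expressed as a small triage function. This is a minimal sketch, not a TomeVox feature: the Footnote fields, the 30-word "short" limit, the five-notes-per-chapter trigger, and the example notes are all hypothetical, chosen only for illustration. Note that it picks one inline-or-batch mode for the whole book, so the pattern never drifts between chapters.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    INLINE = "read inline"
    BATCH = "batch at chapter end"
    SPOKEN_CITATION = "keep the substance in spoken form"
    CUT = "cut"


@dataclass
class Footnote:
    text: str
    is_reference: bool                  # True if the note only points to a source
    credibility_critical: bool = False  # e.g. a key study or a contested claim


# Hypothetical thresholds for illustration; tune them for your own book.
SHORT_NOTE_WORDS = 30
MANY_NOTES_PER_CHAPTER = 5


def book_mode(chapters: list[list[Footnote]]) -> Treatment:
    """Choose ONE mode for every content note in the book so the pattern stays consistent."""
    for notes in chapters:
        content = [n for n in notes if not n.is_reference]
        too_many = len(content) >= MANY_NOTES_PER_CHAPTER
        too_long = any(len(n.text.split()) > SHORT_NOTE_WORDS for n in content)
        if too_many or too_long:
            return Treatment.BATCH
    return Treatment.INLINE


def triage(chapters: list[list[Footnote]]) -> list[list[Treatment]]:
    """Return a treatment for each footnote, chapter by chapter."""
    mode = book_mode(chapters)
    result = []
    for notes in chapters:
        row = []
        for n in notes:
            if not n.is_reference:
                row.append(mode)                       # content note: inline or batch
            elif n.credibility_critical:
                row.append(Treatment.SPOKEN_CITATION)  # say what the study found
            else:
                row.append(Treatment.CUT)              # bare citation, page no., ibid
        result.append(row)
    return result


# Invented example notes for two chapters.
chapters = [
    [Footnote("Ibid., p. 212.", is_reference=True),
     Footnote("My grandmother told this story differently.", is_reference=False)],
    [Footnote("Journal citation for the key 2019 study.", is_reference=True,
              credibility_critical=True)],
]
print([[t.name for t in row] for row in triage(chapters)])
# prints [['CUT', 'INLINE'], ['SPOKEN_CITATION']]
```

Deciding the mode once, from the whole book, is the design choice that enforces the consistency rule: a single chapter with many or long asides switches every chapter to batching.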

How do you describe images, charts, and tables in an audiobook?

You describe an image, chart, or table by speaking the point it was making, not by enumerating its contents. The visual existed to communicate something — a trend, a comparison, a structure — and your job is to deliver that something in a sentence or two. Reading every cell of a table or every data label of a chart is the most common mistake; it is tedious to hear and impossible to remember. Capture the takeaway, give the one or two numbers that actually matter, and move on.

Concrete describe-in-audio scripts make this easier. For a line chart showing revenue over time, instead of reading axes and points, narrate: "Revenue climbed steadily from about two million in 2019 to roughly nine million in 2024 — more than a fourfold increase across five years." For a comparison table of three pricing tiers, narrate the shape: "The three plans differ mainly on word count: the entry plan covers short books, the middle plan doubles that, and the top plan handles full-length manuscripts." For a photograph or diagram that sets a scene, a single descriptive line — "a hand-drawn map of the valley, with the river splitting it north to south" — does the work.
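
To keep a chart description to the takeaway, it can help to compute the change before writing the line. The sketch below assumes you have already read the first and last values off the chart; the function name and sentence template are hypothetical. Because it looks only at the endpoints, a word like "steadily" still needs a human look at the curve.

```python
def describe_trend(metric: str, start_year: int, start_value: float,
                   end_year: int, end_value: float, unit: str = "million") -> str:
    """Draft a one-sentence spoken summary from a chart's first and last data points."""
    if end_value == start_value:
        return f"{metric} held steady at about {start_value:g} {unit} from {start_year} to {end_year}."
    direction = "climbed" if end_value > start_value else "fell"
    sentence = (f"{metric} {direction} from about {start_value:g} {unit} in {start_year} "
                f"to roughly {end_value:g} {unit} in {end_year}")
    # Mention a multiple only when the growth is large enough to be worth saying aloud.
    if start_value > 0 and end_value / start_value >= 2:
        fold = end_value / start_value
        sentence += f", about {fold:.1f} times the starting level over {end_year - start_year} years"
    return sentence + "."


print(describe_trend("Revenue", 2019, 2, 2024, 9))
```

For the revenue example this prints "Revenue climbed from about 2 million in 2019 to roughly 9 million in 2024, about 4.5 times the starting level over 5 years.", which confirms that "more than a fourfold increase" is accurate before you polish the wording by hand.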

When a visual genuinely cannot be summarized — a detailed data table a reader needs to study, a worksheet, a complex figure — point the listener to a downloadable supplemental PDF. Narrate a short pointer such as: "The complete figures are in the free companion PDF for this book; you'll find the link in the book's description." Most distributors let you host or link a companion PDF from the title's description or your own site, so the listener keeps the detail without you reading a spreadsheet aloud. Tables that are short and comparative, however, are better spoken as prose than offloaded — only push to the PDF what truly needs the page.

How do you read URLs and email addresses in an audiobook?

You read a URL or email address in an audiobook by judging whether the listener could realistically remember and use it. A short, brandable link — "tomevox dot com" — can be spoken as plain words, dropping the "https colon slash slash" that means nothing aloud. A long or parameter-heavy link ("example dot com slash 2026 slash q3 slash report dash final dot pdf") should never be read character by character; nobody can transcribe that from audio while listening. Speak only what a person could plausibly retain.

For anything longer than a bare domain, route the listener to one place instead of reading raw strings. The standard pattern is a single companion page or a supplemental PDF that lists every link, resource, and email address in the book, introduced once with a line like: "Every link and resource mentioned in this book is collected on one page — see the companion page linked in the description." This respects the medium: a listener cannot click audio, so you give them one easy destination rather than a dozen unspeakable URLs. Email addresses follow the same rule — spell out a short, brandable address if it must be heard, and otherwise send the listener to the companion page.
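
The speak-or-route rule for links can be sketched with Python's standard urllib.parse module. The 20-character limit, the "www." stripping, and the pointer wording are assumptions for illustration, not platform rules; tune them to what you think a listener can retain.

```python
from urllib.parse import urlparse

# Hypothetical wording and threshold; adapt both to your book.
COMPANION_POINTER = "see the companion page linked in the description"
MAX_SPOKEN_CHARS = 20


def _speak(text: str) -> str:
    # "tomevox.com" -> "tomevox dot com"; hyphens become "dash"
    return text.replace("-", " dash ").replace(".", " dot ")


def spoken_form(link: str) -> str:
    """Return words a narrator can say, or a pointer to the companion page."""
    link = link.strip()
    if "@" in link and "/" not in link:  # treat as an email address
        local, _, domain = link.partition("@")
        if len(link) <= MAX_SPOKEN_CHARS:
            return f"{_speak(local)} at {_speak(domain)}"
        return COMPANION_POINTER
    # urlparse only fills netloc when "//" is present, so add it to bare domains.
    parsed = urlparse(link if "//" in link else "//" + link)
    host = parsed.netloc.lower().removeprefix("www.")  # assumption: "www." is not spoken
    extra = parsed.path.strip("/") or parsed.query or parsed.fragment
    if not host or extra or len(host) > MAX_SPOKEN_CHARS:
        return COMPANION_POINTER  # paths, parameters, or long hosts cannot be heard reliably
    return _speak(host)  # the scheme ("https://") is dropped entirely


for link in ["https://tomevox.com", "example.com/2026/q3/report-final.pdf", "hello@tomevox.com"]:
    print(link, "->", spoken_form(link))
```

Anything with a path, query string, or fragment is routed to the companion page by design, since those are exactly the parts a listener cannot transcribe by ear.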

What front and back matter belongs in an audiobook?

Front and back matter in an audiobook keeps what carries meaning aloud and drops what only organizes the page. Keep the title and author at the very start — every audiobook should open by stating its title and author, a compliance point worth checking before delivery — plus a short dedication and a substantive author's note if it adds context. A foreword or introduction that argues something stays; a copyright page read line by line does not. The opening of a finished audiobook should sound like a book beginning, not like a legal notice.

Drop or compress the page-bound scaffolding. The table of contents is redundant in audio because chapter markers do the navigating, so it is not read aloud line by line. The index is purely a page-lookup tool and is always cut. Acknowledgements can be read if short and warm, or trimmed if they run to pages of names. The same describe-or-cut judgment you applied to footnotes and figures applies here, and the broader prep checklist lives in the EPUB to audiobook conversion guide, which covers how chapter structure carries over from the manuscript.

Which elements to cut, describe, or move to a PDF — the decision table

The table below maps each common visual or print-only element to its recommended audiobook treatment. Use it as a first pass over your manuscript: tag every footnote, figure, table, link, and front-matter section with its treatment before you finalize the audio script, so nothing is decided ad hoc during narration.

| Element | Recommended treatment | Why |
| --- | --- | --- |
| Content footnote (aside, anecdote, argument) | Read inline, or batch at chapter end | Carries meaning a listener wants to hear |
| Reference footnote (citation, page no., ibid) | Cut; keep substance only if credibility-critical | Points to a page the listener cannot use |
| Chart or graph | Describe the trend + key numbers in 1–2 sentences | The takeaway matters, not every data point |
| Short comparison table | Summarize as spoken prose | The comparison is the point; speak it |
| Large or detailed data table | Move to a supplemental PDF, narrate a pointer | Detail needs the page; most distributors allow a companion PDF |
| Photo, diagram, or illustration | One descriptive line, or skip if decorative | Set the scene without over-describing |
| Short, brandable URL or email | Speak as words ("tomevox dot com") | Memorable enough to retain by ear |
| Long or complex URL | Point to a companion page that lists all links | Cannot be transcribed from audio |
| Title page (title + author) | Always read at the start | Required opening; identifies the book |
| Table of contents | Cut; chapter markers replace it | Navigation is handled by the M4B chapter markers |
| Index | Cut | Pure page-lookup tool; useless in audio |

The takeaway from the table is that almost every element falls into one of three actions — describe it, point to it, or cut it — and the deciding question is always the same: does this carry meaning when heard, or does it only work on the page? Content footnotes, chart takeaways, short tables, and the title page carry meaning and stay (adapted). Indexes, tables of contents, bare citations, and long URLs work only on the page and either get cut or moved to a companion PDF or page. Tagging your manuscript against this table once, up front, turns a vague worry into a finished checklist.
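
Tagging a manuscript against the table can be as simple as a lookup. In this sketch the element-kind labels are hypothetical tags you would assign while reading, and "speak" stands for anything kept in adapted spoken form (read inline, batched, or described); any kind missing from the table is flagged for review rather than guessed.

```python
from collections import defaultdict

# One action per element kind, mirroring the decision table above.
TREATMENT = {
    "content_footnote": "speak",
    "reference_footnote": "cut",   # unless credibility-critical: then speak the substance
    "chart": "speak",
    "short_table": "speak",
    "large_table": "point",        # supplemental PDF plus a narrated pointer
    "photo": "speak",
    "decorative_image": "cut",
    "short_url": "speak",
    "long_url": "point",           # companion page that lists every link
    "title_page": "speak",
    "table_of_contents": "cut",    # chapter markers replace it
    "index": "cut",
}


def build_checklist(elements: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (kind, label) pairs by action; unknown kinds go to REVIEW instead of being guessed."""
    checklist = defaultdict(list)
    for kind, label in elements:
        checklist[TREATMENT.get(kind, "REVIEW")].append(label)
    return dict(checklist)


# Invented example elements from a manuscript.
manuscript = [
    ("title_page", "Opening credits"),
    ("chart", "Figure 3: revenue 2019-2024"),
    ("large_table", "Table 7: full survey results"),
    ("index", "Index"),
    ("sidebar", "Chapter 2 sidebar"),
]
for action, labels in build_checklist(manuscript).items():
    print(action, labels)
```

Sending unknown kinds to REVIEW keeps the first pass honest: nothing gets a treatment by default, which is the point of deciding up front rather than during narration.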

How TomeVox handles a visual-heavy nonfiction book

TomeVox is built to narrate an audio-ready script, so the visual decisions you make using the rules above flow straight into production. You prepare the manuscript by tagging footnotes for inline-or-batch-or-cut, writing the one- or two-sentence describe-in-audio lines for charts and tables, and adding the pointer lines for any supplemental PDF or companion page. TomeVox then turns that script into a finished M4B with chapter markers plus per-chapter MP3 files, usually within 48 hours, for a flat early-bird fee: $49 up to 60,000 words, $79 up to 100,000 words, and $99 up to 150,000 words, with a per-word rate of $0.0005 applying only above 150,000 words.

The pre-delivery pass is where visual content gets caught. Every audiobook is automatically checked for technical quality, and a focused listen of the converted sections is exactly where an awkwardly spoken URL, a chart description that reads as a list of numbers, or a footnote pattern that drifts between chapters gets flagged. If a chapter's description or spoken link needs another pass, you can re-generate that chapter at no extra cost rather than redoing the book, and a free first-chapter preview with no credit card lets you hear how your front matter and first figures sound before paying. TomeVox supports 13 languages, gives you full commercial rights with no exclusivity, and is EU-based in Berlin under GDPR.

Because the finished file is yours outright, you control where the audiobook and its companion PDF go. You can upload the audio directly to Google Play Books and Kobo, or distribute wide to Apple Books, Spotify, and more through an aggregator that accepts AI narration, such as PublishDrive or Author's Republic; Author's Republic also unlocks Chirp. Standard ACX still requires human narration, and while Audible has announced it will accept third-party AI narration, that option is not yet open to all indie authors, so check the ACX audiobook submission requirements before counting on that channel. Disclose digital-voice narration wherever a platform asks, and everywhere else as a best practice; likewise, check each platform's rules before hosting or linking a companion PDF. For a fuller map of channels, see where to sell an AI audiobook and the AI audiobook commercial rights guide.

Frequently asked questions

How do you handle footnotes in an audiobook?

Footnotes in an audiobook are handled one of three ways: read inline if the note is short and adds meaning, batched at the end of the chapter if there are many content notes or they run long, or cut if the note is purely a source citation that does not change the listener's understanding. The rule of thumb is to keep content footnotes that carry argument or anecdote, and omit reference footnotes that only point to a page or source. Whichever route you choose, apply it consistently across the whole book so the listener learns what to expect.

How do you describe images, charts, and tables in an audiobook?

You replace each image, chart, or table with a short spoken description that conveys the point the visual was making, not every data cell. For a chart, state the trend and the one or two numbers that matter; for a table, summarize the comparison in prose rather than reading every row. For anything that genuinely needs the visual, narrate a one-line pointer to a downloadable supplemental PDF — for example, saying the full table is in the companion PDF — which most distributors allow you to host or link from the book's description.

How do you read a URL or email address in an audiobook?

Read a short, memorable URL aloud as plain words, for example saying "tomevox dot com" rather than spelling out the protocol. For long or complex links, do not read the raw string; instead point the listener to a short companion page or a supplemental PDF that lists every link, since a listener cannot click audio. Spell email addresses only if they are short and brandable, and otherwise route the listener to the same companion page.

Should you cut anything when turning a nonfiction book into an audiobook?

Yes. Cut elements that exist only to serve the eye and add nothing when heard: the index, the table of contents read line by line, page-number cross-references, decorative figures, and bare source-citation footnotes. Keep and adapt anything that carries meaning, such as content footnotes, the substance of a chart, and front matter like the title, author, and a short dedication. Cutting visual-only scaffolding is editing, not loss, because it removes what the listener cannot use.

Can an AI audiobook handle a complex nonfiction book with lots of visuals?

Yes, when the manuscript is prepared for audio first. You decide how each footnote, chart, table, and link should be treated, write short describe-in-audio lines where a visual carries meaning, and reference a supplemental PDF for anything that needs the page. TomeVox then narrates that audio-ready script, every audiobook is automatically checked for technical quality before delivery, and you can re-generate any chapter at no extra cost if a description or a spoken URL needs another pass.

Turn your nonfiction book into an audiobook the right way

Upload your manuscript to TomeVox, choose a voice, and get a free first-chapter preview with no credit card — hear how your front matter and figures sound before you pay. Like it? Get the full audiobook as an M4B + per-chapter MP3 within 48 hours for a flat $49–$99, with full rights and no exclusivity.

Try TomeVox Free