Why Interview Prep Determines Edit Speed

I have a rule that I tell every new editor I work with: the time you spend in the timeline is directly proportional to the time you did not spend on prep. A well-prepped interview project takes 2.5 to 3.5 hours, prep included. The same interview with no prep takes 6 to 8 hours. The footage is identical. The edit quality is identical. The only difference is whether you organized the material before or during the edit.

Interview footage is uniquely demanding because the source material is long and unstructured. A 45-minute interview generates 45 minutes of continuous footage where the usable content is scattered throughout. Unlike a scripted shoot where you know which takes are good, an interview requires you to evaluate every minute of footage to find the strongest moments.

Without prep, you do this evaluation while editing — scrubbing through the timeline, listening to the interview in real time, trying to remember where the good quotes were, going back and forth between sections you vaguely remember being strong. This is not editing. This is searching. And it is the most wasteful use of your editing time possible.

Proper prep separates the search from the edit. By the time you open your NLE, you already know which soundbites you are using, what order they go in, where the b-roll covers transitions, and what the final structure looks like. The actual editing becomes assembly and polish — the creative work that is actually worth your time.

Transcription as the Foundation

Every interview prep workflow starts with a transcript. Reading is faster than listening. You can scan a 45-minute interview transcript in 10 to 15 minutes, identifying the strongest moments, the weak sections, and the natural structure of the conversation. Doing the same by watching the footage takes 45 minutes minimum — and you will miss things because listening is more passive than reading.

AI transcription has made this step fast. Feed the interview audio to a transcription tool and you have a timestamped, speaker-labeled transcript in 5 to 10 minutes. Accuracy on clean interview audio is typically 96 to 98 percent. That still leaves errors to correct, usually proper nouns and technical terms, but the foundation is solid.

The critical requirements for interview transcription:

Speaker labels. The transcript must distinguish between interviewer and guest. Without speaker labels, you cannot quickly scan for guest responses versus interviewer questions. Most AI transcription tools handle this automatically through speaker diarization.

Timestamps. Every paragraph or segment needs a timecode reference. When you find a strong soundbite in the transcript, you need to jump directly to that moment in the footage without guessing. Timestamps at 30-second intervals are the minimum. Word-level timestamps are ideal.

Paragraph breaks at natural pauses. A transcript that is one continuous block of text is harder to scan than one broken into paragraphs at topic changes and natural pauses. Good transcription tools handle this automatically. If yours does not, a quick manual pass to add paragraph breaks is worth the 5 minutes.
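If you want to script a first pass over an exported transcript, the sketch below shows why speaker labels and timestamps matter: together they make guest answers filterable and seekable. It assumes a hypothetical plain-text export with one "[HH:MM:SS] SPEAKER: text" line per segment. Real tools export other formats (SRT, WebVTT, JSON), so treat the pattern and the helper names parse_transcript and guest_answers as placeholders, not any tool's API.

```python
import re
from dataclasses import dataclass

# Hypothetical export format, one segment per line: [HH:MM:SS] SPEAKER: text
# Adjust LINE_RE to whatever your transcription tool actually exports.
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+([^:]+):\s*(.*)$")


@dataclass
class Segment:
    start_s: int  # seconds from the start of the recording, for seeking
    speaker: str
    text: str


def parse_transcript(raw: str) -> list[Segment]:
    """Turn a timestamped, speaker-labeled transcript into segments."""
    segments = []
    for line in raw.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines or lines not in the expected format
        hours, minutes, seconds, speaker, text = m.groups()
        start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        segments.append(Segment(start, speaker.strip().upper(), text.strip()))
    return segments


def guest_answers(segments: list[Segment], guest: str = "GUEST") -> list[Segment]:
    """Keep only the guest's segments: the pool you scan for soundbites."""
    return [seg for seg in segments if seg.speaker == guest.upper()]


sample = """
[00:00:05] INTERVIEWER: How did it start?
[00:00:09] GUEST: I quit my job on a Tuesday and started building on Wednesday.
[00:14:22] GUEST: Nobody was solving this for small teams.
"""

for seg in guest_answers(parse_transcript(sample)):
    print(seg.start_s, seg.text)  # start_s is the seek point in seconds
```

Storing the start as plain seconds keeps the seek step trivial; converting back to a display timecode is a one-line division if you need it.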

EDITOR'S TAKE

I print my interview transcripts. This sounds old-fashioned but it works. I highlight strong soundbites in yellow, mark sections to cut with a strikethrough, and write structural notes in the margins. By the time I am done with the transcript, I have a physical paper edit that I can reference while building the timeline. The tactile process of reading and marking a printed transcript engages a different kind of attention than screen reading, and I catch things I would miss on a monitor.

Marking Selects and Kills

Once you have the transcript, the first pass is binary: is this section a potential select or a definite kill?

Selects are any section that might make the final cut. Be generous on this pass. A soundbite that seems marginal now might become essential when you discover a structural gap later. Mark anything that is coherent, on-topic, and delivered well as a potential select.

Kills are sections that will definitely not be used. Off-topic tangents that went nowhere. Technical interruptions ("hold on, let me fix my mic"). Repeated answers where the guest covered the same ground twice (keep the better version, kill the weaker one). Rambling sections where the guest lost the thread.

A typical 45-minute interview yields 15 to 20 minutes of selects and 25 to 30 minutes of kills. For a final video of 8 to 12 minutes, you are working from those 15 to 20 minutes of selects to build a tight, compelling piece.

In the transcript, I use a simple marking system. Square brackets around selects with a quality rating: [A] for must-use moments, [B] for strong but not essential, [C] for usable if needed. Strikethrough for kills. After this pass, I can see at a glance how much strong material exists and whether I have enough for the target duration.
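The "do I have enough material?" check is simple arithmetic once each marked section carries in and out points. A minimal sketch, assuming you record those points in seconds; the sample sections are made up, and counting only A and B selects toward the target by default is an illustrative choice, not part of the marking system itself.

```python
from collections import defaultdict

# Illustrative marked sections: (in_seconds, out_seconds, mark).
# mark is "A", "B" or "C" for a select, or "KILL".
sections = [
    (0, 120, "A"),
    (120, 420, "KILL"),
    (840, 1020, "B"),
    (1320, 1380, "C"),
    (2280, 2400, "A"),
]


def minutes_by_mark(sections):
    """Total minutes of footage under each mark."""
    totals = defaultdict(float)
    for tc_in, tc_out, mark in sections:
        totals[mark] += (tc_out - tc_in) / 60
    return dict(totals)


def enough_material(sections, target_min, marks=("A", "B")):
    """True if the chosen select grades add up to at least the target duration."""
    totals = minutes_by_mark(sections)
    return sum(totals.get(mark, 0.0) for mark in marks) >= target_min


print(minutes_by_mark(sections))
print(enough_material(sections, target_min=10))  # False: only 7 min of A and B
```

Passing marks=("A", "B", "C") answers the follow-up question of whether the C material would close the gap.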

AI can assist with this step by identifying candidate selects based on the content of the dialogue and the delivery quality. It will not replace your editorial judgment about what makes a strong soundbite for your specific audience, but it can flag sections where the speaker is making clear, quotable statements versus rambling or repeating themselves. The guide on creating paper edits with AI transcription covers this in detail.

Organizing Soundbites by Theme

Interviews rarely follow a clean, linear structure. The guest might mention a key insight at minute 5, return to the same topic at minute 22, and add a clarifying detail at minute 38. If you edit the interview chronologically, these related moments are scattered. If you organize soundbites by theme first, you can assemble them into coherent sections regardless of when they were recorded.

After marking selects, group them into thematic clusters. For a business interview, themes might include: background and origin story, the problem they solved, how the product works, customer results, and future plans. For a creative interview: influences, creative process, challenges and failures, breakthrough moments, and advice for others.

Each theme becomes a section in your final video. The order of themes is your video structure. This is where editorial judgment matters most: you, not the guest, are deciding what story to tell and how to tell it. The guest provided the raw material. You are shaping it into a narrative.
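Clustering by theme is a group-by followed by a sort. The sketch below assumes each select has already been tagged with a theme and an A/B/C rating; the Select record, cluster_by_theme, and the sample clips are hypothetical. It mirrors the example of a topic raised at minutes 5, 22, and 38: all three land in one cluster, strongest first, regardless of recording order.

```python
from dataclasses import dataclass

RATING_RANK = {"A": 0, "B": 1, "C": 2}  # lower rank = stronger select


@dataclass
class Select:
    theme: str
    rating: str   # "A", "B" or "C"
    start_s: int  # where the soundbite starts in the source, in seconds
    text: str


def cluster_by_theme(selects, theme_order):
    """Group selects into themes, keyed in the narrative order you chose.

    Within each theme, clips are sorted strongest first and then by source
    time, so the first clip in each list is that theme's best soundbite.
    Selects tagged with a theme outside theme_order are returned separately
    rather than dropped silently.
    """
    clusters = {theme: [] for theme in theme_order}
    unplaced = []
    for sel in selects:
        if sel.theme in clusters:
            clusters[sel.theme].append(sel)
        else:
            unplaced.append(sel)
    for clips in clusters.values():
        clips.sort(key=lambda s: (RATING_RANK[s.rating], s.start_s))
    return clusters, unplaced


selects = [
    Select("Problem they solved", "B", 5 * 60, "first mention"),
    Select("Problem they solved", "A", 22 * 60, "Nobody was solving this for small teams"),
    Select("Problem they solved", "C", 38 * 60, "clarifying detail"),
    Select("Origin story", "A", 60, "I quit my job on a Tuesday and started building on Wednesday"),
    Select("Hiring", "C", 25 * 60, "aside about hiring"),
]
clusters, unplaced = cluster_by_theme(selects, ["Origin story", "Problem they solved"])
for theme, clips in clusters.items():
    print(theme, "->", clips[0].text)
```

Returning the unplaced clips is deliberate: a select that fits no theme is a prompt to either add a theme or demote the clip, not something to lose quietly.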

| Theme | Selects | Best soundbite | Duration |
|---|---|---|---|
| Origin story | 3 clips | "I quit my job on a Tuesday and started building on Wednesday" [A] | ~2 min |
| Problem they solved | 4 clips | "Nobody was solving this for small teams" [A] | ~3 min |
| How it works | 2 clips | "It takes 10 minutes instead of three days" [B] | ~2 min |
| Customer results | 3 clips | "One customer saved $40,000 in the first month" [A] | ~2 min |
| What is next | 2 clips | "We are launching in Europe by summer" [B] | ~1.5 min |

This table gives you a clear picture of the final video before you have opened your NLE. You know the structure, the duration, the strongest moments, and where you might need supplementary material or b-roll to cover gaps.
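Summing the theme table is a quick sanity check on runtime before you commit. The sketch uses the rows from the theme table, treating the "~" durations as estimates; the one-minute tolerance and the summarize helper are assumptions for illustration. The five sections add up to 10.5 minutes across 14 select clips, which is close to a 10-minute target.

```python
# Rows from the theme table: (theme, number of select clips, estimated minutes).
plan = [
    ("Origin story", 3, 2.0),
    ("Problem they solved", 4, 3.0),
    ("How it works", 2, 2.0),
    ("Customer results", 3, 2.0),
    ("What is next", 2, 1.5),
]


def summarize(plan, target_min, tolerance_min=1.0):
    """Return (total minutes, total clips, within tolerance of the target?)."""
    total_min = sum(minutes for _, _, minutes in plan)
    total_clips = sum(clips for _, clips, _ in plan)
    within = abs(total_min - target_min) <= tolerance_min
    return total_min, total_clips, within


print(summarize(plan, target_min=10))  # (10.5, 14, True)
```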

Building a Paper Edit

A paper edit is the written version of your final video. It lists every soundbite in order, with timecodes, transitions, and b-roll notes. It is the blueprint that turns your timeline session from creative exploration into efficient assembly.

The format I use:

Section header: The theme and target duration.
Soundbite entries: Timecode in, timecode out, transcript text, quality rating, and any editing notes ("cut the first sentence," "trim the pause at the start," "needs b-roll cover for the jump cut at 22:34").
Transition notes: How each section connects to the next. Does the guest naturally bridge between topics? Do you need a question from the interviewer as a transition? Does b-roll cover the topic shift?
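If you keep paper edits in a script or spreadsheet export rather than on paper, the format above maps onto a small data model. This is a hypothetical sketch, not a standard interchange format: it assumes timecodes written as MM:SS or HH:MM:SS without frames, and uses them to check each section's planned length against its target duration.

```python
from dataclasses import dataclass, field


def tc_to_seconds(tc: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" to seconds. Frames are not handled."""
    seconds = 0
    for part in tc.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


@dataclass
class Soundbite:
    tc_in: str
    tc_out: str
    text: str
    rating: str      # "A", "B" or "C"
    notes: str = ""  # e.g. "cut the first sentence"

    @property
    def duration_s(self) -> int:
        return tc_to_seconds(self.tc_out) - tc_to_seconds(self.tc_in)


@dataclass
class Section:
    theme: str
    target_s: int
    soundbites: list[Soundbite] = field(default_factory=list)
    transition: str = ""  # how this section hands off to the next one

    def planned_s(self) -> int:
        return sum(sb.duration_s for sb in self.soundbites)

    def over_target_s(self) -> int:
        """Positive means the section runs long; negative means it runs short."""
        return self.planned_s() - self.target_s


origin = Section(
    theme="Origin story",
    target_s=120,
    soundbites=[
        Soundbite("00:09", "00:58", "I quit my job on a Tuesday...", "A",
                  "trim the pause at the start"),
        Soundbite("22:10", "22:34", "...", "B",
                  "needs b-roll cover for the jump cut at 22:34"),
    ],
    transition="Guest bridges naturally into the problem",
)
print(origin.planned_s(), origin.over_target_s())  # 73 -47
```

A negative over_target_s, as here, tells you the section needs another select before you open the NLE, not halfway through the assembly.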

A complete paper edit for a 10-minute interview video takes 20 to 30 minutes to build from a marked-up transcript. That 30-minute investment typically saves 2 to 3 hours in the timeline because every editing decision has already been made on paper.

AI tools can generate draft paper edits from transcripts. You describe the target structure and duration, and the AI suggests a soundbite selection and ordering based on the transcript content. This is a starting point, not a finished product — you will always need to review and adjust based on your editorial vision. But it compresses the initial structuring work from 30 minutes to 5 minutes, leaving you with more time for creative refinement.

B-Roll Mapping to Soundbites

Interview videos that are nothing but talking heads lose viewers. B-roll breaks up the visual monotony, covers jump cuts between non-sequential soundbites, and adds visual context to what the speaker is discussing.

Map b-roll to your paper edit before you start editing. For each soundbite, note whether it needs b-roll coverage and what type.

Jump cut covers. When you cut two non-sequential soundbites together, the visual jump between them is jarring. B-roll over the cut point masks the transition. Note every jump cut in your paper edit and assign b-roll to cover it.

Illustrative b-roll. When the guest talks about their product, show the product. When they discuss their team, show the team. When they reference data, show a graphic. This b-roll adds context and keeps viewers engaged during long answers.

Atmospheric b-roll. Establishing shots, location footage, and environmental cutaways set the scene and give the video a sense of place. These are particularly important at the beginning of the video and at major section transitions.
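Finding the jump cuts that need cover is mechanical once the paper edit lists source timecodes in timeline order: any soundbite that does not start where the previous one ended is a cut point. A minimal sketch, assuming single-camera footage and MM:SS or HH:MM:SS timecodes; find_jump_cuts and the tolerance parameter are hypothetical.

```python
def tc_to_seconds(tc: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" to seconds."""
    seconds = 0
    for part in tc.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def find_jump_cuts(edit, tolerance_s=0):
    """Return the cut points in a paper edit that need b-roll cover.

    edit is a list of (tc_in, tc_out) source timecodes in timeline order.
    A cut is flagged when a soundbite does not start where the previous one
    ended, i.e. source material was removed or reordered between them.
    Assumes single-camera interview footage.
    """
    cuts = []
    for i in range(1, len(edit)):
        prev_out = tc_to_seconds(edit[i - 1][1])
        next_in = tc_to_seconds(edit[i][0])
        if abs(next_in - prev_out) > tolerance_s:
            cuts.append((i, edit[i - 1][1], edit[i][0]))
    return cuts


edit = [("00:09", "00:58"), ("00:58", "01:30"), ("22:10", "22:34"), ("05:02", "05:40")]
for index, out_tc, in_tc in find_jump_cuts(edit):
    print(f"cut before soundbite {index}: {out_tc} -> {in_tc}, assign b-roll")
```

The first pair runs continuously and is not flagged; the jumps from 01:30 to 22:10 and from 22:34 back to 05:02 are, which is exactly where you would list jump cut cover in the paper edit.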

If you are working with a creator's existing footage organized by scene type, finding the right b-roll is much faster than digging through an unstructured library. AI scene detection and tagging turn a 15-minute b-roll hunt into a 2-minute search.

The Complete Interview Prep Workflow

INTERVIEW PREP WORKFLOW

01. Generate Timestamped Transcript
Run interview audio through AI transcription with speaker labels and timestamps. Review and correct proper nouns, technical terms, and any misheard sections. Time: 15 minutes.

02. Mark Selects and Kills
Read the transcript and mark each section as a select (A, B, or C quality) or kill. Be generous with selects on this first pass. Time: 10 to 15 minutes for a 45-minute interview.

03. Group Selects by Theme
Organize marked selects into thematic clusters. Identify 4 to 6 themes that will become sections in your final video. Note the strongest soundbite in each theme. Time: 10 minutes.

04. Build Paper Edit
Arrange themes in narrative order. List specific soundbites with timecodes within each section. Add transition notes and b-roll mapping. Time: 20 to 30 minutes.

05. Assemble in NLE
Follow the paper edit to build your timeline. Every cut has been pre-decided. Focus on execution, pacing refinement, and polish. Time: 1.5 to 2.5 hours for a 10-minute final video.

Total prep time: 55 to 70 minutes. Total edit time: 1.5 to 2.5 hours. Total project time: roughly 2.5 to 3.5 hours for a polished 10-minute interview video (2 hours 25 minutes at the low end, 3 hours 40 minutes at the high end). Without prep, the same video takes 6 to 8 hours.
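The totals are the step ranges from the workflow added up. This sketch does the sum explicitly (the dictionary keys are made-up labels); the exact extremes come out at 145 and 220 minutes, which is where the rounded "roughly 2.5 to 3.5 hours" figure comes from.

```python
# Step times in minutes as (low, high), taken from the workflow steps.
PREP_STEPS = {
    "transcript": (15, 15),
    "selects_and_kills": (10, 15),
    "themes": (10, 10),
    "paper_edit": (20, 30),
}
ASSEMBLY = (90, 150)  # 1.5 to 2.5 hours in the NLE


def budget(prep_steps, assembly):
    """Return ((prep_low, prep_high), (total_low, total_high)) in minutes."""
    prep_low = sum(low for low, _ in prep_steps.values())
    prep_high = sum(high for _, high in prep_steps.values())
    return (prep_low, prep_high), (prep_low + assembly[0], prep_high + assembly[1])


prep, total = budget(PREP_STEPS, ASSEMBLY)
print(prep, total)  # (55, 70) (145, 220)
```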

Tips for Same-Day and Next-Day Turnarounds

Some interview projects have aggressive deadlines. Conference interviews, breaking news responses, and live event content often need to be published within hours. Here is how to accelerate the workflow when time is critical.

Transcribe during the interview. If you are recording remotely, run a live transcription tool alongside the recording. By the time the interview ends, your transcript is already 90 percent done. This eliminates the 5- to 10-minute post-interview transcription wait.

Take notes during the interview. If you are present during the recording (as the interviewer or as a producer), mark timestamps for strong moments in real time. A quick note like "14:22 — great quote about hiring" during the interview saves you from discovering it during the transcript review. These live notes become the skeleton of your paper edit.
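Live notes in that shape are easy to turn into a sorted skeleton. The sketch below assumes each note starts with an MM:SS or H:MM:SS timestamp, optionally followed by a hyphen, en dash, or em dash; notes_to_skeleton is a hypothetical helper, not a feature of any particular note-taking or transcription tool.

```python
import re

# A note starts with a timestamp, then an optional dash, then the note text.
NOTE_RE = re.compile(r"^\s*(\d{1,2}(?::\d{2}){1,2})\s*[-\u2013\u2014]?\s*(.+?)\s*$")


def tc_to_seconds(tc: str) -> int:
    """Convert "MM:SS" or "H:MM:SS" to seconds for sorting."""
    seconds = 0
    for part in tc.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def notes_to_skeleton(lines):
    """Parse live timestamp notes and return (timecode, note) in recording order.

    Lines that do not start with a timestamp are skipped.
    """
    entries = []
    for line in lines:
        m = NOTE_RE.match(line)
        if m:
            tc, note = m.groups()
            entries.append((tc_to_seconds(tc), tc, note))
    entries.sort()
    return [(tc, note) for _, tc, note in entries]


notes = [
    "38:05 - clarifying detail on pricing",
    "14:22 \u2014 great quote about hiring",
    "not a timestamped note",
    "05:10 origin story",
]
for tc, note in notes_to_skeleton(notes):
    print(tc, note)
```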

Use a shortened prep process. For same-day turnarounds, skip the full paper edit. Go straight from transcript review to a prioritized selects list: the 5 to 7 strongest soundbites in rough narrative order. Build the timeline from this shortlist and add structural connective tissue as you edit.
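A rough way to produce that shortlist from rated selects: rank by A/B/C, keep the top few, then re-sort the survivors into your theme order. The tuple layout, sample clips, and the rating-only ranking are assumptions for illustration; in practice you would override the ranking wherever a B clip carries the story better than an A.

```python
RATING_RANK = {"A": 0, "B": 1, "C": 2}  # lower rank = stronger select


def shortlist(selects, theme_order, max_clips=7):
    """Pick the strongest selects, then put them back in rough narrative order.

    selects: list of (theme, rating, start_seconds, text) tuples.
    Ties within a rating are broken by source time. Selects whose theme is
    not in theme_order are ignored.
    """
    placed = [s for s in selects if s[0] in theme_order]
    strongest = sorted(placed, key=lambda s: (RATING_RANK[s[1]], s[2]))[:max_clips]
    return sorted(strongest, key=lambda s: (theme_order.index(s[0]), s[2]))


theme_order = ["Origin story", "Problem they solved", "Customer results"]
selects = [
    ("Customer results", "A", 2000, "One customer saved $40,000 in the first month"),
    ("Origin story", "B", 60, "origin detail"),
    ("Problem they solved", "A", 900, "Nobody was solving this for small teams"),
    ("Problem they solved", "C", 1200, "tangent"),
    ("Origin story", "A", 30, "I quit my job on a Tuesday and started building on Wednesday"),
    ("Customer results", "B", 2100, "second customer story"),
    ("Hiring", "A", 10, "off-theme aside"),
]
for theme, rating, start, text in shortlist(selects, theme_order, max_clips=4):
    print(f"{theme} [{rating}] {start}s: {text}")
```

Selecting on strength first and ordering second keeps the two decisions separate, which matches the order of the shortened process: find the best material, then give it a rough shape.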

Limit b-roll to essentials. Fast turnarounds do not have time for elaborate b-roll packages. Use talking head footage as the primary visual and limit b-roll to jump cut covers and one or two establishing shots. Viewers will forgive less visual variety if the content is strong and timely.

Pre-build templates. Create a Premiere Pro template with your lower thirds, intro bumper, end screen, and audio processing chain already in place. When a fast-turnaround interview comes in, you open the template and drop in footage rather than building everything from scratch.

EDITOR'S TAKE

The fastest interview turnaround I have done was 90 minutes from end of recording to published video. That was a 5-minute conference interview with a live transcript, pre-built template, and minimal b-roll. It was not my best work creatively, but it was on time, on brand, and strong enough to get 50,000 views because it was first to publish. For fast turnarounds, good enough on time beats perfect too late. Every time.

Interview editing does not have to be the time sink that most editors accept as normal. The prep workflow in this guide is the difference between dreading interview projects and completing them efficiently. Every minute you invest in prep saves two to three minutes in the timeline. Build the habit, and fast turnaround interviews become a strength rather than a stress.


Frequently asked questions

How long should interview prep take?

For a 45-minute interview, prep should take 55 to 70 minutes including transcription review, marking selects, theme organization, and building a paper edit. This investment saves 3 to 5 hours in the timeline compared to editing without prep.

What is a paper edit?

A paper edit is a written blueprint of your final video. It lists every soundbite in order with timecodes, transition notes, and b-roll mapping. Building a paper edit before opening your NLE turns the editing session from creative exploration into efficient assembly.

How do you mark selects in an interview transcript?

Read the transcript and rate each section: A for must-use moments, B for strong but not essential, C for usable if needed. Mark everything else as a kill. A typical 45-minute interview yields 15 to 20 minutes of selects that are then organized by theme for the final edit.

Can AI help with interview prep?

Yes. AI handles transcription with speaker labels and timestamps in minutes. It can also identify candidate selects based on dialogue content and delivery quality, and generate draft paper edits from transcripts. Human editorial judgment is still needed for final selection and creative structuring.

How long does it take to edit a 10-minute interview video?

With proper prep, a 10-minute interview video takes approximately 2.5 to 3.5 hours total including prep and editing. For same-day turnarounds, a shortened prep process with a pre-built template can take a short interview, such as a 5-minute conference piece, from end of recording to published in as little as 90 minutes.

Daniel Pearson
Co-Founder & CEO, Wideframe
Daniel Pearson is the co-founder & CEO of Wideframe. Before founding Wideframe, he founded an agency that made thousands of video ads. He has a deep interest in the intersection of video creativity and AI. He co-founded Wideframe to arm humans with AI tools that save them time and expand what is creatively possible for them.
This article was written with AI assistance and reviewed by the author.