The Lower Thirds Pain Point

Lower thirds seem simple. A name, a title, maybe a company name, displayed in the bottom third of the frame for a few seconds. How hard can it be? If you have ever had to create lower thirds for a 12-person corporate interview series, you know the answer: surprisingly hard and incredibly tedious.

Here is the typical workflow without AI. You get the list of interviewees from the client (names sometimes misspelled). You create a text template in Premiere Pro or After Effects. You manually type each name and title. You manually place each lower third at the correct timecode. You manually adjust duration. You realize the client's VP of Marketing actually goes by "Vice President, Brand Marketing and Strategic Communications" and your template does not have enough room. You resize everything. Multiply by 12 people who each appear three to four times throughout the video.

For freelancers, lower thirds are the kind of task that takes 30 minutes on paper and 90 minutes in practice. They are too simple to justify charging extra for but too time-consuming to absorb comfortably. They are the tax of corporate video editing.

AI changes this by automating the parts that are mechanical (speaker detection, timecode placement, duration calculation) while still letting you control the parts that are creative (design, animation, brand styling).

How AI Speaker Detection Enables Auto Lower Thirds

The foundation of automated lower thirds is knowing who is speaking and when. This is where AI speaker detection (also called speaker diarization) comes in.

When you run AI analysis on your footage using a tool like Wideframe, it does several things at once. It transcribes the audio. It identifies unique speakers by their voice characteristics. It maps each segment of speech to a specific speaker. And it can match speakers across multiple clips, so if the CEO appears in four different interview clips, the AI knows it is the same person.

Once you have speaker detection data, generating lower thirds becomes automatic. The AI knows that Speaker A starts talking at 00:01:23, so you can tell it to place a lower third there with the name "Sarah Chen" and the title "CEO, TechCorp" for a duration of four seconds. Multiply that across every speaker appearance and you have all your lower thirds placed in seconds.
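
To make that concrete, here is a minimal sketch of the placement logic, assuming a simple segment format. The data shapes and function names are illustrative, not Wideframe's actual API; the point is how little logic the mechanical part requires once diarization has done its job.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker_id: str   # diarization label, e.g. "SPEAKER_1"
    start: float      # seconds from sequence start
    end: float

@dataclass
class LowerThird:
    name: str
    title: str
    start: float      # seconds
    duration: float   # seconds

def first_appearance_lower_thirds(segments, directory, duration=6.0):
    """Place one lower third at each speaker's first speaking segment.

    duration defaults to six seconds for first appearances, per the
    design principles later in this article.
    """
    placements, seen = [], set()
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.speaker_id in seen or seg.speaker_id not in directory:
            continue
        seen.add(seg.speaker_id)
        name, title = directory[seg.speaker_id]
        placements.append(LowerThird(name, title, seg.start, duration))
    return placements
```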

The accuracy of speaker detection is the critical factor. In my testing, modern AI speaker detection is about 95 percent accurate for two-person conversations and about 88 to 90 percent accurate for larger groups. The errors usually occur during crosstalk or when speakers have similar vocal characteristics. A quick review pass catches these easily.

EDITOR'S TAKE — DANIEL PEARSON

Speaker detection for lower thirds is one of those features that sounds like a nice-to-have until you use it on a real project. I edited a corporate town hall video with 14 speakers last month. The AI identified every speaker, and I just needed to assign the correct names and titles. Without that automation, I would have spent over an hour just placing and timing lower thirds. It took 10 minutes instead.

Design Principles for Effective Lower Thirds

Before diving into the AI workflow, let me share the design principles I follow for lower thirds. These apply whether you are creating them manually or with AI assistance.

Keep them readable at mobile size. More than half of video views happen on phones. If your lower third text is not legible at 375 pixels wide, it is too small or too detailed. Use a minimum of 24-point text equivalent and high contrast against the background.

Two lines maximum. Name on the first line, title on the second. If the title does not fit on one line, abbreviate it. Nobody reads three-line lower thirds; they just look cluttered.

Left-aligned, lower-left corner. This is the standard position for a reason. It does not interfere with center-framed subjects and stays clear of most platform UI elements (like YouTube's progress bar and subscribe button).

Four to six seconds duration. Long enough to read, short enough not to overstay. For the first appearance of a speaker, go six seconds. For subsequent appearances in the same video, four seconds is enough.

Animate in and out. A simple fade or slide-on looks professional. A 30-frame animation is plenty. Avoid flashy, complex animations that distract from the speaker.

Match the project's visual identity. Lower thirds should feel like they belong to the same design system as the rest of the video. Use the same font family, color palette, and design language as your titles and graphics.
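
If you want these principles in a form a tool or script can consume, here is one way I might capture them as a template spec. Every field name and value here is a placeholder for your own design system, not a required format.

```python
# The design principles above as a reusable spec. All field names are
# illustrative; adapt them to whatever your tool or .mogrt expects.
LOWER_THIRD_SPEC = {
    "font_family": "YourBrandFont",   # match the project's title font
    "min_point_size": 24,             # readable at ~375 px mobile width
    "max_lines": 2,                   # name on line one, title on line two
    "position": "lower_left",         # clear of platform UI elements
    "duration_first_s": 6.0,          # first appearance of a speaker
    "duration_repeat_s": 4.0,         # later re-identifications
    "animation": {"type": "fade", "frames": 30},
}
```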

AI Lower Third Generation Workflow

Here is my complete workflow for generating lower thirds using AI tools and Premiere Pro.

AI LOWER THIRD WORKFLOW
01
Run Speaker Analysis
Import your footage into Wideframe and run the full media analysis. This identifies unique speakers, maps their speech segments, and generates transcripts with speaker labels.
02
Create a Speaker Directory
Map each detected speaker to their real name and title. If you have 12 speakers, create a lookup table: Speaker 1 = Sarah Chen, CEO. Speaker 2 = Mike Torres, VP Engineering. And so on. This directory is sketched in code after this list.
03
Define Your Lower Third Template
Specify the design: font, size, color, position, animation style, and duration. If you have a brand guidelines document or a Motion Graphics Template (.mogrt), reference that.
04
Generate and Place Lower Thirds
Use natural language to tell the AI to create lower thirds for each speaker's first appearance in the sequence, with a six-second duration and your specified design. The AI places them at the correct timecodes.
05
Review and Adjust in Premiere Pro
Open the generated sequence in Premiere Pro. Play through to verify each lower third appears at the right time, displays the correct information, and does not overlap with other graphics or important visual content.
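
Continuing the sketch from earlier, steps 02 and 04 reduce to a small data structure and one function call. The labels, names, and timings below are examples from this article, not real project data.

```python
# Step 02: the speaker directory, a single source of truth for spelling.
SPEAKER_DIRECTORY = {
    "SPEAKER_1": ("Sarah Chen", "CEO, TechCorp"),
    "SPEAKER_2": ("Mike Torres", "VP Engineering"),
    # ...one entry per detected speaker
}

# Step 04: generate placements (sample segment timings for illustration).
segments = [
    Segment("SPEAKER_1", start=83.0, end=101.5),
    Segment("SPEAKER_2", start=101.5, end=130.0),
]
placements = first_appearance_lower_thirds(segments, SPEAKER_DIRECTORY)
```

The directory is also what makes corrections cheap: when a client sends a fixed spelling or title, you edit one entry and regenerate, and every instance updates at once.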

Maintaining Brand Consistency Across Projects

If you work with repeat clients, maintaining consistent lower third design across projects is important. Here is how I handle this with AI tools.

Save brand presets. Create a saved preset for each client that includes their brand fonts, colors, logo placement, and animation style. When starting a new project for that client, load the preset and the AI uses it as the template for all generated lower thirds.

Use Motion Graphics Templates. If your client has existing .mogrt templates for their lower thirds, you can reference these in your AI workflow. Wideframe can populate .mogrt templates with speaker data and place them in the sequence, maintaining exact design consistency with previous videos.

Document the specifications. I keep a simple spec sheet for each client's lower thirds: font name and size, hex color codes, position coordinates, animation type and duration, line break rules for long titles. When the client changes their branding, I update the spec sheet once and all future projects automatically use the new design.
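
As one example of a spec-sheet rule made executable, here is a hypothetical helper that applies a client's approved abbreviations, in order, until a long title fits on one line. The substitution map and character limit are illustrative.

```python
# Approved substitutions for one client, applied in order. Note the
# spaces around " and " so "Brand" is never mangled into "Br&".
ABBREVIATIONS = {"Vice President": "VP", " and ": " & ", "Communications": "Comms"}

def fit_title(title: str, max_chars: int = 32) -> str:
    """Abbreviate a title until it fits on one line, or run out of rules."""
    for long_form, short_form in ABBREVIATIONS.items():
        if len(title) <= max_chars:
            break
        title = title.replace(long_form, short_form)
    return title

print(fit_title("Vice President, Brand Marketing and Strategic Communications"))
# -> "VP, Brand Marketing & Strategic Comms" (still over 32 characters,
#    so you would flag this one for manual review)
```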

Brand consistency is one of those details that separates a freelancer who gets repeat work from one who does not. When a client sees that their new video has the exact same lower third style as the last five videos you edited for them, it builds confidence in your attention to detail. AI makes this consistency effortless.

Working with Motion Graphics Templates

Motion Graphics Templates (.mogrt files) are the professional standard for lower thirds in Premiere Pro. They are created in After Effects with editable text fields, customizable colors, and built-in animations. When placed in a Premiere Pro sequence, editors can modify the text and basic properties without opening After Effects.

AI tools that support .mogrt templates provide the best of both worlds: professional-quality animated lower thirds with automated placement. Here is how this works in practice.

You install a .mogrt template (one you created, one the client provided, or one from a marketplace like Envato). You tell the AI tool which template to use and map the speaker data to the template's text fields. The AI then places instances of the template throughout your sequence, each populated with the correct speaker information.
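
Continuing the earlier sketch, the mapping step might look like the following. The template path and the field names "Name" and "Title" are placeholders; use whatever editable fields your .mogrt actually exposes, and note that how the instances get placed in the sequence is tool-specific.

```python
# Pair each computed placement with the template field values to inject.
# "placements" is the list of LowerThird objects from the earlier sketch.
def mogrt_placements(placements, template="client_lower_third.mogrt"):
    return [
        {"template": template,
         "at_seconds": p.start,
         "fields": {"Name": p.name, "Title": p.title}}
        for p in placements
    ]
```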

For freelancers who do not have After Effects skills, there are thousands of pre-made .mogrt templates available. Many are free. The quality varies, but a good template with AI-automated placement produces results that look hand-crafted by a motion designer.

If you want to create your own .mogrt templates for maximum flexibility, the initial setup takes two to three hours in After Effects. But once created, you can reuse the template across hundreds of projects with AI-automated population. The time investment pays for itself quickly.

Handling Multi-Speaker Projects

Multi-speaker projects are where AI lower thirds really prove their value. Here are the specific challenges and how to handle them.

Speaker re-identification. When the same person appears in multiple clips recorded at different times, the AI needs to recognize them as the same speaker. Wideframe handles this by analyzing voice characteristics across all clips in the project. If the AI creates separate speaker labels for the same person, you merge them in the speaker directory.

Panel discussions. In roundtable or panel formats, speakers often talk in rapid succession. Place lower thirds only at meaningful speaking turns, not every time someone interjects a one-word response. A good rule: show the lower third when a speaker holds the floor for more than five seconds.
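
Here is that rule as a filter, reusing the Segment shape from the earlier sketch: merge consecutive segments from the same speaker into one turn, then keep only the turns that hold the floor long enough.

```python
def meaningful_turns(segments, min_hold=5.0):
    """Drop one-word interjections: keep turns of at least min_hold seconds."""
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker_id == seg.speaker_id:
            # Same speaker continues: extend the current turn.
            turns[-1] = Segment(seg.speaker_id, turns[-1].start, seg.end)
        else:
            turns.append(Segment(seg.speaker_id, seg.start, seg.end))
    return [t for t in turns if t.end - t.start >= min_hold]
```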

Name changes and corrections. Clients frequently send corrections after you have already placed lower thirds. When using AI-generated lower thirds in a template system, you can update the speaker directory and regenerate. All lower thirds update at once. This is dramatically faster than finding and editing each text instance manually.

Guest introductions. Sometimes you want a lower third to appear at a specific narrative moment rather than the first time a speaker talks. For example, in a documentary-style video, you might want the lower third to appear when the narrator introduces the person, not when they first appear on screen. AI can handle this with specific prompts like "place the lower third when the narrator says the speaker's name."
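
If your tool does not support that kind of prompt, the same idea is easy to sketch against a word-level transcript. The (word, start) tuple format here is an assumption about your transcript export.

```python
def placement_at_mention(words, name, fallback=None):
    """Return the timecode of the first spoken mention of a speaker's name.

    words: list of (word, start_seconds) tuples from the transcript.
    """
    first_name = name.split()[0].lower()   # match on the first name
    for word, start in words:
        if word.strip(".,!?").lower() == first_name:
            return start
    return fallback   # e.g. fall back to the first speaking turn
```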

Common Lower Third Mistakes to Avoid

After years of creating lower thirds for client projects, here are the mistakes I see most often, along with how AI tools help you avoid them.

Misspelled names. This is the most embarrassing and most common mistake. When you are typing names manually across 15 instances, typos happen. AI-generated lower thirds pull from a single speaker directory, so you only need to spell each name correctly once.

Incorrect titles. People change roles. The speaker directory from the pre-production phase might be outdated by the time you edit. Always verify titles with the client before finalizing. I send a simple confirmation email: "Please confirm these names and titles are correct."

Overlapping graphics. Lower thirds that appear while other on-screen text is visible create visual clutter. AI placement tools can check for conflicts with other graphics layers and adjust timing to avoid overlaps.
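
A minimal version of that conflict check looks like this: if a proposed lower third overlaps a span already occupied by another graphic, nudge it to start just after that span. The interval format is illustrative.

```python
def resolve_overlap(start, duration, busy, gap=0.5):
    """busy: list of (start_s, end_s) spans already occupied by graphics."""
    for b_start, b_end in sorted(busy):
        if start < b_end and start + duration > b_start:  # intervals overlap
            start = b_end + gap                           # push past the conflict
    return start
```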

Too many appearances. In a 10-minute video, the audience does not need to see the same person's lower third every time they speak. First appearance plus one re-identification later in the video is usually sufficient. The AI can be configured to limit lower third frequency per speaker.
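
One way to encode that cap, reusing the turn structure from the earlier sketches: keep each speaker's first turn, plus the first of their turns past the video's midpoint as the reminder.

```python
def capped_appearances(turns, video_length_s):
    """First appearance plus one re-identification in the second half."""
    keep, first_seen, reminded = [], set(), set()
    for t in sorted(turns, key=lambda s: s.start):
        sid = t.speaker_id
        if sid not in first_seen:
            first_seen.add(sid)
            keep.append(t)
        elif sid not in reminded and t.start >= video_length_s / 2:
            reminded.add(sid)
            keep.append(t)
    return keep
```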

EDITOR'S TAKE — DANIEL PEARSON

The single best lower third tip I can give you: send the client a screenshot of every lower third before the final export. Not a list, an actual screenshot showing how it looks in the video. This catches misspellings, wrong titles, and design issues before they become revision requests. It takes five minutes and prevents the most common round of revisions on corporate video projects.

Lower thirds are a small detail that makes a big difference in perceived production quality. With AI handling the tedious parts of detection, placement, and population, you can focus on making them look great and ensuring they serve the story rather than just checking a box.

TRY IT

Stop scrubbing. Start creating.

Wideframe gives your team an AI agent that searches, organizes, and assembles Premiere Pro sequences from your footage. 7-day free trial.

REQUIRES APPLE SILICON
Daniel Pearson
Co-Founder & CEO, Wideframe
Daniel Pearson is the co-founder & CEO of Wideframe. Before founding Wideframe, he founded an agency that made thousands of video ads. He has a deep interest in the intersection of video creativity and AI, and is building Wideframe to arm humans with AI tools that save them time and expand what’s creatively possible for them.
This article was written with AI assistance and reviewed by the author.

Frequently asked questions

How do you automatically generate lower thirds in Premiere Pro?

Use an AI tool like Wideframe to analyze your footage for speaker detection, then map detected speakers to names and titles. The AI automatically places lower thirds at the correct timecodes in your Premiere Pro sequence using your specified design template or Motion Graphics Template.

Can AI detect who is speaking and place lower thirds automatically?

Yes. AI speaker diarization identifies unique speakers by voice characteristics and maps their speaking segments throughout the video. Once you provide the names and titles for each detected speaker, the AI places lower thirds at each speaker's first appearance automatically.

What size and duration should lower thirds be?

Use a minimum of 24-point equivalent text for mobile readability. Keep to two lines maximum: name on line one, title on line two. Duration should be four to six seconds, with six seconds for first appearances and four seconds for re-identifications later in the video.

Can AI tools work with Motion Graphics Templates (.mogrt)?

Yes. AI tools like Wideframe can populate .mogrt template text fields with speaker data and place template instances throughout your Premiere Pro sequence. This combines professional After Effects animations with automated AI placement.

How accurate is AI speaker detection?

Modern AI speaker detection is about 95 percent accurate for two-person conversations and 88 to 90 percent accurate for larger groups. Errors typically occur during crosstalk or with speakers who have similar vocal characteristics, and are easy to correct in a review pass.