Video Frame Sizes Visualization
Visualize Compressed Video
In this tutorial, we’ll walk through how to use Python and Bokeh to visualize video frame sizes (H.264/AVC, H.265/HEVC, etc.) and calculate moving average bitrate values from an MPEG TS file. We’ll use ffprobe to extract frame information and pandas for data manipulation. Finally, we’ll create interactive plots with Bokeh.
Here is what we’ll have as the result.
❗ The final Python script with click for CLI parsing can be found here: [gist].
Prerequisites
Make sure you have the following installed:
- Python 3.x
- ffprobe (part of the FFmpeg suite)
- pandas library
- bokeh library
You can install the required Python libraries using pip:
pip install pandas bokeh
Step 1. Extract Frame Information with FFprobe
First, use FFprobe to extract frame sizes and types from your MPEG TS file. FFprobe is a perfect tool for this: it is free, well maintained and supports a wide range of codecs.
ffprobe -v error -select_streams v:0 -show_entries frame=pkt_pos,pkt_size,pict_type -of csv=p=0 input_file.ts > frame_info.csv
Replace input_file.ts with the path to your MPEG TS file. This command outputs the frame position (pkt_pos, a byte offset in the file), size (pkt_size, in bytes) and frame type (pict_type) in CSV format using FFprobe’s CSV writer.
Each row of the resulting file contains the frame position, frame size and picture type, followed by two empty columns that FFprobe emits. Note that frames are listed in display order. The frame position column can be used to rearrange the data in stream order.
222968,46380,I,,
288768,9202,B,,
298356,5113,B,,
303620,3164,B,,
270908,17308,P,,
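If you prefer to run the extraction from Python rather than from the shell, the same command can be wrapped with subprocess. This is a minimal sketch: the helper names build_ffprobe_command and extract_frame_info are hypothetical (not part of the original script), and it assumes ffprobe is available on your PATH.

```python
import subprocess


def build_ffprobe_command(input_path):
    """Return the ffprobe argument list with the same flags as the shell command above."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "frame=pkt_pos,pkt_size,pict_type",
        "-of", "csv=p=0",
        input_path,
    ]


def extract_frame_info(input_path, csv_path):
    """Run ffprobe and write its CSV output to csv_path (hypothetical helper)."""
    with open(csv_path, "w") as csv_file:
        # check=True raises CalledProcessError if ffprobe exits with an error.
        subprocess.run(build_ffprobe_command(input_path), stdout=csv_file, check=True)
```

Calling extract_frame_info("input_file.ts", "frame_info.csv") should then produce the same file as the shell redirect.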
Step 2: Load and Process the CSV Data
Next, we’ll load the CSV data into a pandas DataFrame and process it to calculate moving average bitrates.
2.1 Load CSV Data
Let’s create a function that loads the CSV data and drops the two empty columns produced by FFprobe. Additionally, the DataFrame is sorted by the frame_pos column to get stream order. The function returns a pandas DataFrame.
import pandas as pd

def load_csv(file_path):
    """Load CSV file into a DataFrame and drop empty columns."""
    df = pd.read_csv(file_path, header=None, names=['frame_pos', 'frame_size', 'pict_type', 'empty1', 'empty2'])
    df = df.drop(columns=['empty1', 'empty2'])
    # Convert display order to stream order. Drop the old index to use the new order.
    df = df.sort_values(by='frame_pos').reset_index(drop=True)
    return df
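To see what the reordering does, here is a small self-contained sketch that parses the five sample rows shown in Step 1 from an in-memory string and compares display order with stream order. The SAMPLE_CSV constant and variable names are introduced only for this illustration.

```python
import io

import pandas as pd

# The five sample rows from Step 1, in display order as written by ffprobe.
SAMPLE_CSV = """222968,46380,I,,
288768,9202,B,,
298356,5113,B,,
303620,3164,B,,
270908,17308,P,,
"""

columns = ['frame_pos', 'frame_size', 'pict_type', 'empty1', 'empty2']
display_order = pd.read_csv(io.StringIO(SAMPLE_CSV), header=None, names=columns)
display_order = display_order.drop(columns=['empty1', 'empty2'])

# Sorting by byte position in the file restores stream order.
stream_order = display_order.sort_values(by='frame_pos').reset_index(drop=True)

print(''.join(display_order['pict_type']))  # IBBBP
print(''.join(stream_order['pict_type']))   # IPBBB
```

The P frame is displayed after the three B frames, but it is stored before them in the stream because the B frames reference it.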
2.2. Calculate Bitrate
Let’s add a function to calculate video bitrate values (moving average) using 3-second, 1-second and 100-millisecond rolling windows. An additional argument fps (frames per second) is required. It is used as the window size in frames, so it must be an integer of at least 10.
def calculate_bitrates(df, fps):
    """Calculate 3s, 1s and 100 ms rolling average bitrate values."""
    # Number of frames in a 100 ms window (integer division, e.g. 2 frames at 25 fps).
    fps_div_10 = fps // 10
    # Calculate the rolling sum of 'frame_size' values converted from bytes to megabits.
    df['mbps_mavg_1s'] = df['frame_size'].rolling(window=fps).sum() * 8 / 1e6
    # Similar for 3s moving average: megabits over 3 seconds divided by 3.
    df['mbps_mavg_3s'] = df['frame_size'].rolling(window=3*fps).sum() * 8 / 1e6 / 3
    # The same for the 100 ms window: megabits divided by the window duration (fps_div_10 / fps seconds).
    df['mbps_mavg_100ms'] = df['frame_size'].rolling(window=fps_div_10).sum() * 8 / 1e6 * fps / fps_div_10
    return df
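All three columns follow the same rule: megabits summed over a window of N frames, divided by the window duration N / fps. The sketch below checks that rule on synthetic data where the answer is known in advance. The helper rolling_mbps and the constant frame size are assumptions made for this illustration, not part of the original script.

```python
import pandas as pd


def rolling_mbps(frame_sizes, window, fps):
    """Average bitrate in Mbps over a sliding window of `window` frames."""
    megabits = frame_sizes.rolling(window=window).sum() * 8 / 1e6
    window_seconds = window / fps
    return megabits / window_seconds


fps = 25
# 5000 bytes per frame at 25 fps is exactly 1 Mbps (5000 * 8 * 25 bits per second).
sizes = pd.Series([5000] * 100)

mbps_1s = rolling_mbps(sizes, window=fps, fps=fps)
mbps_3s = rolling_mbps(sizes, window=3 * fps, fps=fps)
# At 25 fps the "100 ms" window holds 2 frames, which is really 80 ms.
mbps_100ms = rolling_mbps(sizes, window=fps // 10, fps=fps)

print(mbps_1s.dropna().round(6).unique())
print(mbps_3s.dropna().round(6).unique())
print(mbps_100ms.dropna().round(6).unique())
```

For a constant frame size every window should report the same bitrate, which makes this a convenient sanity check before plotting real streams.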
The first fps - 1 values of the 1-second moving average are missing (NaN) because the window is not yet full. Optionally, they can be filled using an expanding sum, or by extrapolating the bitrate from the frames available so far.
# Fill only the missing leading values of 'mbps_mavg_1s' with the
# expanding (cumulative) sum of frame sizes converted to megabits.
expanding_mbits = df['frame_size'].expanding(min_periods=1).sum() * 8 / 1e6
df['mbps_mavg_1s'] = df['mbps_mavg_1s'].fillna(expanding_mbits)
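The extrapolation option could look like the following sketch: average the frame sizes seen so far (up to one second of them) and scale by fps, so even the very first frame yields a bitrate estimate. The helper name extrapolated_mbps is hypothetical, and this is only one of several reasonable ways to extrapolate.

```python
import pandas as pd


def extrapolated_mbps(frame_sizes, fps):
    """1-second moving average bitrate in Mbps with no leading NaN values."""
    # min_periods=1 lets the window produce a value from the first frame on.
    mean_frame_bytes = frame_sizes.rolling(window=fps, min_periods=1).mean()
    # Mean bytes per frame * frames per second * 8 bits, expressed in megabits.
    return mean_frame_bytes * fps * 8 / 1e6


fps = 25
sizes = pd.Series([5000] * 50)  # synthetic 1 Mbps stream
mbps = extrapolated_mbps(sizes, fps)
print(mbps.isna().sum())
```

Once the window is full, the mean multiplied by fps equals the rolling sum, so the values match the plain 1-second moving average from that point on.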
2.3. Add Kilobyte Frame Sizes
Let’s convert frame sizes to kilobytes to have a nicer plot.
def calculate_kbyte_frame_sizes(df):
    # Convert to kilobytes
    df['frame_size_kBytes'] = df['frame_size'] / 1000
    return df
2.4. Add Color Column
To distinguish frame types in the plot, let’s assign a color to each of them:
- Red for an I frame,
- Blue for a P frame,
- Green for a B frame.
def add_color_column(df):
    """Add a color column based on pict_type."""
    colors = {'I': 'red', 'P': 'blue', 'B': 'green'}
    df['color'] = df['pict_type'].map(colors)
    return df
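Any pict_type value outside I, P and B is mapped to NaN by the dictionary lookup, which leaves that bar without a color. A possible variation, shown here as a sketch, assigns a fallback color instead. The function name add_color_column_with_fallback and the gray default are assumptions for this example.

```python
import pandas as pd

FRAME_COLORS = {'I': 'red', 'P': 'blue', 'B': 'green'}


def add_color_column_with_fallback(df, fallback='gray'):
    """Return a copy of df with a color column; unknown frame types get `fallback`."""
    df = df.copy()
    df['color'] = df['pict_type'].map(FRAME_COLORS).fillna(fallback)
    return df


frames = pd.DataFrame({'pict_type': ['I', 'P', 'B', '?']})
colored = add_color_column_with_fallback(frames)
print(list(colored['color']))  # ['red', 'blue', 'green', 'gray']
```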
Step 3: Create Interactive Plots with Bokeh
Let’s visualize the data using Bokeh – a Python library for creating interactive visualizations.
The following imports are required.
from bokeh.plotting import figure, show, output_file
from bokeh.io import output_notebook
from bokeh.transform import factor_cmap
from bokeh.models import Legend, LegendItem
from bokeh.layouts import column
Let’s create a color-coded bar chart for frame sizes.
def create_bar_chart(df):
    """Create a bar chart for frame sizes with color legend."""
    # Create the bar chart figure.
    p1 = figure(x_axis_label='Frame Index', y_axis_label='Frame Size, kbytes', title='Frame Sizes by Type', width=1000, height=400)
    # Add bars to the bar chart figure.
    bars = p1.vbar(x=df.index, top=df['frame_size_kBytes'], width=0.9, color=df['color'])
    # Add a legend entry per frame type, pointing at the first bar of that type.
    # Frame types absent from the stream (e.g. no B frames) are skipped.
    legend_items = []
    for pict_type in ('I', 'P', 'B'):
        type_index = df[df['pict_type'] == pict_type].index
        if len(type_index) > 0:
            legend_items.append(LegendItem(label=f"{pict_type} Frame", renderers=[bars], index=int(type_index[0])))
    legend1 = Legend(items=legend_items, location="top_right")
    p1.add_layout(legend1)
    return p1
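As an alternative to building legend items by hand, Bokeh can group legend entries by a data column when the glyph is drawn from a ColumnDataSource. The sketch below uses the legend_group argument with a small made-up DataFrame. It is not the approach used in the original script, and the legend labels become the raw pict_type values (I, P, B).

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up frame data with the same columns the tutorial builds.
frames = pd.DataFrame({
    'frame_size_kBytes': [46.4, 17.3, 9.2, 5.1, 3.2],
    'pict_type': ['I', 'P', 'B', 'B', 'B'],
    'color': ['red', 'blue', 'green', 'green', 'green'],
})
# The DataFrame index is exposed to Bokeh as the 'index' column.
source = ColumnDataSource(frames)

p = figure(x_axis_label='Frame Index', y_axis_label='Frame Size, kbytes',
           title='Frame Sizes by Type', width=1000, height=400)
# legend_group creates one legend entry per distinct value of 'pict_type'.
p.vbar(x='index', top='frame_size_kBytes', width=0.9, color='color',
       legend_group='pict_type', source=source)
```

This variant handles streams without B frames automatically, at the cost of less descriptive legend labels.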
Also let’s create a plot for bitrate values. It will be placed right below the bar chart. The x axis of both plots will be synchronized.
def create_bitrates_plot(df, x_range):
    """Create a plot for bitrate values with toggleable lines."""
    # Create the moving average plot figure
    p2 = figure(x_axis_label='Frame Index', y_axis_label='Mbps', title='Bitrate', width=1000, height=400, x_range=x_range)
    # Add the Mbps moving average line to the moving average plot figure
    line1 = p2.line(df.index, df['mbps_mavg_3s'], line_width=2, color='black', legend_label='MAvg Bitrate (3s), Mbps')
    line2 = p2.line(df.index, df['mbps_mavg_1s'], line_width=2, color='green', legend_label='MAvg Bitrate (1s), Mbps')
    # Add the rolling average bitrate line to the moving average plot figure
    line3 = p2.line(df.index, df['mbps_mavg_100ms'], line_width=2, color='blue', legend_label='MAvg Bitrate (100ms), Mbps')
    p2.legend.click_policy = "hide"
    return p2
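The synchronization comes from both figures sharing one range object: panning or zooming one plot updates the range, and the other plot follows. A minimal sketch with two toy figures (the data and titles are placeholders) shows the idea in isolation.

```python
from bokeh.plotting import figure

top = figure(width=1000, height=200, title='Top')
# Passing the first figure's x_range makes both figures use the same range object.
bottom = figure(width=1000, height=200, title='Bottom', x_range=top.x_range)

top.vbar(x=[0, 1, 2], top=[3, 1, 2], width=0.9)
bottom.line([0, 1, 2], [1.0, 1.2, 0.9])

print(bottom.x_range is top.x_range)  # True
```

The y axes stay independent, which is what we want here because frame sizes and bitrates use different units.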
Now a function to create a plot of frame sizes and bitrate values.
def plot_frame_sizes(df):
    """A function to plot frame sizes using a bar chart and bitrate values right below."""
    p1 = create_bar_chart(df)
    p2 = create_bitrates_plot(df, x_range=p1.x_range)
    # Uncomment if you want to use it in a notebook.
    #output_notebook()
    show(column(p1, p2))
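show() opens the result in a browser, which is not always convenient, for example on a remote machine. One option, sketched below, is to write the layout to a standalone HTML file with bokeh.io.save. The helper name save_plots and the temporary output path are assumptions for this example, and the demo figure is a placeholder for the two plots built above.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.layouts import column
from bokeh.plotting import figure
from bokeh.resources import CDN


def save_plots(plots, html_path):
    """Stack the given figures vertically and write them to an HTML file."""
    # resources=CDN makes the page load BokehJS from the CDN when opened.
    return save(column(*plots), filename=html_path, resources=CDN, title='Frame sizes')


# Placeholder figure standing in for the bar chart and bitrate plot.
demo = figure(width=400, height=200, title='Demo')
demo.line([0, 1, 2], [1.0, 1.2, 0.9])

html_path = os.path.join(tempfile.mkdtemp(), 'frame_sizes.html')
saved_path = save_plots([demo], html_path)
print(saved_path)
```

The saved file keeps the interactive tools, so it can be opened later or shared without rerunning the script.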
Result
def plot_frame_sizes_from_csv(file_path, fps):
    """A function to plot frame sizes and bitrate values from a CSV file."""
    df = load_csv(file_path)
    # fps is required by calculate_bitrates to size the rolling windows.
    df = calculate_bitrates(df, fps)
    df = calculate_kbyte_frame_sizes(df)
    df = add_color_column(df)
    plot_frame_sizes(df)
Calling this function builds an interactive plot with the frame size bar chart on top and the bitrate lines below.
❗ The final Python script with click for CLI parsing can be found here: [gist].
Conclusion
By following these steps, you can visualize the frame sizes of H.264/AVC, H.265/HEVC and other compressed video, and calculate moving average bitrate values from an MPEG TS file. The interactive plots created with Bokeh let you zoom into individual frames and toggle the bitrate lines to explore how frame sizes and bitrates change over the stream.
Feel free to modify the code to suit your needs and experiment with different datasets. Happy coding!