First Plots with Altair
In the previous section, we saw why visualization is important and got a glimpse of Altair's basic structure. Now, let's dive into creating common chart types. We'll focus on understanding how to map your data to visual marks and encodings.
Loading and Preparing Data
Altair works seamlessly with Pandas DataFrames. Recent Altair releases also accept Polars DataFrames directly (older releases may need a `.to_pandas()` conversion first). For our examples, we'll primarily use datasets from the `vega_datasets` package and convert them to Polars DataFrames to align with the data manipulation tools you've learned.
Loading a Sample Dataset (`cars` dataset):
The `vega_datasets` library provides a variety of sample datasets. Let's consider the `cars` dataset.
Quick Polars Recap for Pre-Visualization:
Before plotting, you might need to select specific columns, filter rows, or handle missing data using Polars.
Basic Chart Types and Encodings
The core of creating any plot in Altair involves:
- Initializing a `Chart` object with your data.
- Choosing a `mark` type (e.g., `mark_point()`, `mark_bar()`, `mark_line()`).
- Defining encodings with `encode()` to map data fields to visual properties.
Altair Data Types: When specifying fields in encodings, Altair can often infer the data type from a DataFrame's column types (it cannot do this when the data is given as a URL). However, it's good practice to be explicit using shorthands:
- `:Q` for Quantitative (numerical values that can be measured, e.g., temperature, horsepower).
- `:N` for Nominal (categorical data with no inherent order, e.g., car origin, movie genre).
- `:O` for Ordinal (categorical data with a meaningful order, e.g., t-shirt size S/M/L, ratings).
- `:T` for Temporal (date/time values).
1. Scatter Plot
Scatter plots are used to visualize the relationship between two quantitative variables. Each point represents an observation.
- Mark: `mark_point()` or `mark_circle()`
- Key Encodings:
  - `x='field_name:Q'`: Maps a quantitative field to the x-axis.
  - `y='field_name:Q'`: Maps a quantitative field to the y-axis.
  - `color='field_name:N'`: Colors points by a nominal field (category).
  - `size='field_name:Q'`: Varies point size by a quantitative field.
  - `tooltip=['field1', 'field2']`: Shows specified fields on hover.
Example:
```python
import altair as alt
# Assuming cars_pl_df is your Polars DataFrame
scatter_plot = alt.Chart(cars_pl_df.drop_nulls(subset=['Horsepower', 'Miles_per_Gallon'])).mark_circle(size=60).encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N',  # Color by country of origin
    tooltip=['Name:N', 'Horsepower:Q', 'Miles_per_Gallon:Q']
).properties(
    title='Horsepower vs. Miles per Gallon'
)
# scatter_plot.show()  # To display
```
2. Bar Chart
Bar charts are excellent for comparing a quantitative measure across different categories or showing counts.
- Mark: `mark_bar()`
- Key Encodings:
  - `x='category_field:N'`: Nominal field for categories on the x-axis.
  - `y='quantitative_field:Q'`: Quantitative field for bar height (or `y='count():Q'` for frequencies).
  - Or, `x='quantitative_field:Q'`, `y='category_field:N'` for horizontal bars.
  - `color='category_field:N'`: Colors bars by category (often redundant if categories are already on an axis, but useful for stacked/grouped bars).
Example (Average MPG per Origin): Altair can perform simple aggregations.
```python
avg_mpg_bar_chart = alt.Chart(cars_pl_df.drop_nulls(subset=['Miles_per_Gallon', 'Origin'])).mark_bar().encode(
    x='Origin:N',
    y='average(Miles_per_Gallon):Q',  # Altair aggregation
    color='Origin:N',
    tooltip=['Origin:N', 'average(Miles_per_Gallon):Q']
).properties(
    title='Average Miles per Gallon by Origin'
)
```
Alternatively, you can pre-aggregate the data in Polars and let Altair plot the result as-is:

```python
import polars as pl

# Polars aggregation: mean MPG per origin
avg_mpg_origin_pl = (
    cars_pl_df.drop_nulls(subset=['Miles_per_Gallon', 'Origin'])
    .group_by('Origin')
    .agg(pl.col('Miles_per_Gallon').mean().alias('avg_MPG'))
)

avg_mpg_bar_chart_polars = alt.Chart(avg_mpg_origin_pl).mark_bar().encode(
    x='Origin:N',
    y='avg_MPG:Q',
    color='Origin:N',
    tooltip=['Origin:N', 'avg_MPG:Q']
).properties(
    title='Average Miles per Gallon by Origin (Polars Pre-aggregated)'
)
```
3. Line Chart
Line charts are ideal for showing trends over a continuous or ordered sequence, typically time.
- Mark: `mark_line()`
- Key Encodings:
  - `x='ordered_field:T'` (Temporal), or `:O` (Ordinal), or `:Q` (for a continuous quantitative sequence).
  - `y='quantitative_field:Q'`.
  - `color='category_field:N'`: For plotting multiple lines from different categories.
Example (Seattle Weather - Max Temperature Over Time):
We'll use the `seattle-weather` dataset.
4. Histogram
Histograms visualize the distribution of a single quantitative variable by dividing the data range into bins and showing the frequency of observations in each bin.
- Mark: `mark_bar()`
- Key Encodings:
  - `x=alt.X('quantitative_field:Q', bin=True)`, or `alt.X('quantitative_field:Q', bin=alt.Bin(maxbins=20))` for more control over the number of bins.
  - `y='count():Q'`: the count aggregate, which Altair (through Vega-Lite) computes for each bin.
Example (Distribution of Horsepower):
Workflow Tip: Incremental Development
When building visualizations, especially complex ones:
- Start Simple: Get your data loaded and create the most basic version of your chart (e.g., `alt.Chart(data).mark_point().encode(x='col_A', y='col_B')`).
- Add Encodings Gradually: Introduce color, size, tooltips, etc., one by one. Check the output at each step.
- Refine Properties: Adjust titles, labels, sizes, and scales last.
- Tweak the DataFrame: Use Polars to adjust properties that are tied directly to the data values. For example, legend labels for categorical variables are easier to change by renaming the category values in Polars than by configuring them in Altair.

This iterative approach makes debugging easier and helps you build intuition for how each component affects the final visualization.