A quick overview of Seaborn

Seaborn. A wrapper on top of matplotlib. Used to make plots, and to make them quicker, easier, and more beautiful.

Thank you for your service, matplotlib. Despite your flaws, you’ve guided us this far.

But it’s time to step aside.

Types of Seaborn plots

  • sns.boxplot() | generic boxplot
  • sns.distplot() | histogram and kernel density estimate (KDE) plotted together (deprecated since Seaborn 0.11 in favour of sns.histplot() and sns.displot())
    • sns.distplot(rug=True) | rugplot
  • sns.kdeplot() | kernel density estimate plot
    • sns.kdeplot(n_levels) | set the n_levels parameter high to make the KDE finer
  • sns.rugplot() | rugplot again
  • sns.jointplot() | show a scatterplot and marginal histograms for two-dimensional data.
    • sns.jointplot(kind='hex') | hexbin plot, like a two-dimensional histogram.
    • sns.jointplot(kind='kde') | two-dimensional KDE (might take a while to plot for large datasets).
    • sns.jointplot(kind='reg') | scatterplot, regression line and confidence interval.
    • The sns.jointplot() function returns a JointGrid object, which you can save and then add further layers to. Some examples:
# Save the JointGrid
g = sns.jointplot(x="x", y="y", data=df, kind="kde", color="m")

# Use plot_joint to add a scatter plot overlay 
g.plot_joint(plt.scatter, c='w', s=1)

# Or a regression line: 
g.plot_joint(sns.regplot)
  • sns.pairplot() | used for exploring the relationships between variables in a data frame. By default, plots a scatterplot matrix on off-diagonals and histograms on diagonals. Similar to the R function ggpairs() in the GGally package.
    • Similar to how jointplot() returns a JointGrid, pairplot() returns a PairGrid with its own set of methods available to it. You can use this to change what graphs are plotted:
# Store the PairGrid object
g = sns.PairGrid(iris)

# Change the plots down the diagonal 
g.map_diag(sns.kdeplot)

# Change the plots down the offdiagonals
g.map_offdiag(sns.kdeplot, cmap="Blues_d", n_levels=6)
  • sns.stripplot() | like a scatterplot, but one of the variables is categorical
    • sns.stripplot(jitter=True) | stops the points from overlapping as much
  • sns.swarmplot() | beeswarm plot that works like stripplot() above, but avoids overlap entirely.
    • sns.swarmplot(hue) | set the hue parameter to use colour to distinguish levels of a variable e.g. blue for male, red for female
  • sns.violinplot() | draw a violinplot with a boxplot inside it.
    • sns.violinplot(hue, split=True) | if the hue variable has two levels, you can split the violins so that each half shows one level, rather than drawing symmetrical violins
    • sns.violinplot(inner='stick') | show the individual observations inside the violin plot, rather than a boxplot
  • sns.barplot() | standard barplot, complete with bootstrapped confidence intervals
  • sns.countplot() | histogram over a categorical variable, as opposed to the regular histogram which is over a continuous variable
  • sns.pointplot() | plot the interaction between variables using scatter plot glyphs:

An example pointplot using the Titanic dataset.
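As a rough sketch of the categorical plots above, the snippet below draws a stripplot, a swarmplot, a split violinplot and a pointplot side by side. The data frame and its column names (group, sex, score) are made up for illustration and are not the Titanic dataset. The Agg backend is set only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: a numeric score for two groups and two sexes.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 40),
    "sex": np.tile(["male", "female"], 40),
    "score": rng.normal(loc=0.0, scale=1.0, size=80),
})

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Categorical scatterplot, jittered so the points overlap less.
sns.stripplot(data=df, x="group", y="score", jitter=True, ax=axes[0, 0])

# Beeswarm plot, coloured by the levels of "sex".
sns.swarmplot(data=df, x="group", y="score", hue="sex", ax=axes[0, 1])

# Split violins: each half shows one level of the two-level hue variable,
# with individual observations drawn as sticks instead of a boxplot.
sns.violinplot(data=df, x="group", y="score", hue="sex",
               split=True, inner="stick", ax=axes[1, 0])

# Point estimates with confidence intervals, one line per hue level.
sns.pointplot(data=df, x="group", y="score", hue="sex", ax=axes[1, 1])

fig.tight_layout()
```

Passing ax= places each plot on its own subplot; without it, these functions draw on the current matplotlib Axes.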

  • sns.factorplot() | draw multiple plots on different facets of your data. Combines plots (like the ones above) with a FacetGrid, which is a subplot grid that comes with a range of methods. Renamed to sns.catplot() in Seaborn 0.9.
    • sns.factorplot(kind) | specify the type of plot. Choose between point, bar, count, box, violin and strip. According to the official tutorial, swarm works too.
  • sns.regplot() | plot a scatterplot, simple linear regression line and 95% confidence intervals around the regression line. Accepts x and y variables in a variety of formats. sns.lmplot() is built on top of it.
  • sns.lmplot() | like sns.regplot(), but requires a data parameter, with the columns to plot specified as strings.
    • sns.lmplot(x_jitter) | add jitter in the x-direction. Useful when making plots where one of the variables takes discrete values.
    • sns.lmplot(x_estimator) | instead of points, plot an estimate of central tendency (like a mean) and a range
    • sns.lmplot(order) | fit non-linear trends with a polynomial (applies to regplot too)
    • sns.lmplot(robust=True) | fit robust regression, down-weighting the impact of outliers
    • sns.lmplot(logistic=True) | logistic regression
    • sns.lmplot(lowess=True) | fit a scatterplot smoother
    • sns.lmplot(hue) | fit separate regression lines to levels of a categorical variable
    • sns.lmplot(col) | create facets along levels of a categorical variable
  • sns.residplot() | fits a simple linear regression, calculates residuals and then plots them
  • sns.heatmap() | takes rectangular data and plots a heatmap
  • sns.clustermap() | hierarchically clustered heatmap
  • sns.tsplot() | time series plotting function. Has the option to include uncertainty, bootstrap resamples, a range of estimators and error bars. Deprecated in Seaborn 0.9 in favour of sns.lineplot().
  • sns.lvplot() | letter value plot, which is like a better boxplot for when you have a high number of data points. Renamed to sns.boxenplot() in Seaborn 0.9.
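To make the regression and matrix entries above a little more concrete, here is a minimal sketch on made-up data. The columns x, y and group are assumptions for illustration only. It shows regplot() drawing onto an existing Axes, lmplot() creating its own faceted figure, and residplot() and heatmap() in their simplest form.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: y depends roughly linearly on x, in three groups.
rng = np.random.default_rng(1)
n = 120
df = pd.DataFrame({
    "x": rng.uniform(0, 10, size=n),
    "group": np.tile(["a", "b", "c"], n // 3),
})
df["y"] = 2.0 * df["x"] + rng.normal(scale=3.0, size=n)

# regplot draws on a single Axes; order=2 fits a second-order polynomial.
fig, ax = plt.subplots()
sns.regplot(data=df, x="x", y="y", order=2, ax=ax)

# lmplot creates its own figure with one facet per group and returns a FacetGrid.
g = sns.lmplot(data=df, x="x", y="y", col="group")

# Residuals of a simple linear regression of y on x.
fig_resid, ax_resid = plt.subplots()
sns.residplot(data=df, x="x", y="y", ax=ax_resid)

# Heatmap of the correlation matrix of the numeric columns.
corr = df[["x", "y"]].corr()
fig_heat, ax_heat = plt.subplots()
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, ax=ax_heat)
```

The practical difference is that regplot() fits into a figure you are already building, while lmplot() is the one to reach for when you want facets via col or hue.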

Miscellaneous functions

  • sns.get_dataset_names() | list all the toy datasets available on the Seaborn online repository
  • sns.load_dataset() | load a dataset from the Seaborn online repository
  • sns.FacetGrid, sns.PairGrid, sns.JointGrid | grids of subplots used for plotting, each somewhat different and each with their own set of methods
  • sns.despine() | remove the top and right spines (the axis border lines), making the plot look better

Controlling aesthetics

  • sns.set() | set plotting options to seaborn defaults. Can use to reset plot parameters to the default values.
  • sns.set_style() | change the default plot theme
  • sns.set_context() | change the default plot context. Used to scale the plots up and down. Options are paper, notebook, talk and poster, in order from smallest to largest scale.
  • sns.axes_style() | return the parameters for a plot theme. Used in a with statement, it sets them temporarily, which is handy for styling a single plot. For example:

```py
with sns.axes_style("white"):
    sns.jointplot(x=x, y=y, kind="hex", color="k")
```
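The difference between the global setters and the temporary with block can be checked directly against matplotlib's rcParams. The sketch below does that. It also uses sns.plotting_context(), which is not covered above, only to read out the font size each context would set.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

# Global settings: every later plot uses this theme and scale.
sns.set_style("whitegrid")
sns.set_context("talk")
grid_before = plt.rcParams["axes.grid"]

# Temporary settings: only plots created inside the block use the "white" theme.
with sns.axes_style("white"):
    grid_inside = plt.rcParams["axes.grid"]
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])

# Once the block exits, the "whitegrid" settings are back.
grid_after = plt.rcParams["axes.grid"]

# plotting_context() returns the parameters a context would set,
# without applying them.
font_sizes = [sns.plotting_context(name)["font.size"]
              for name in ["paper", "notebook", "talk", "poster"]]

# Reset everything to the seaborn defaults.
sns.set()
```

Here the grid is switched on before and after the block and off inside it, and font_sizes increases from paper to poster.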

Working with colour

  • sns.color_palette() | return the list of colours in the current palette
    • The hls colour palette is one option; see the list of colours with sns.palplot(sns.color_palette("hls", 8)) .
    • Another (better) option is the husl system; see the list of colours with sns.palplot(sns.color_palette("husl", 8))
    • Pass the name of a ColorBrewer palette, such as Paired, to use its colours: sns.palplot(sns.color_palette("Paired")). You can also pass the number of colours you want; for example, sns.palplot(sns.color_palette("Set2", 10)) for the Set2 palette.
    • Tack on _r with ColorBrewer palettes to reverse the colour order. Compare the difference between sns.palplot(sns.color_palette("BuGn_r")) and sns.palplot(sns.color_palette("BuGn")) .
    • Tack on _d with ColorBrewer palettes to create darker palettes than usual. See sns.palplot(sns.color_palette("GnBu_d")) compared to sns.palplot(sns.color_palette("GnBu"))
  • sns.palplot() | plot colours in a palette in a horizontal array
  • sns.hls_palette() | more customisation of the hls palette
  • sns.husl_palette() | more customisation of the husl palette
  • sns.cubehelix_palette() | more customisation of the cubehelix palette
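  • sns.light_palette() and sns.dark_palette() | sequential palettes for sequential data, blending from a light or dark base to a colour of your choice.
  • sns.diverging_palette() | build a diverging palette between two colours, for data with a meaningful midpoint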
  • sns.choose_colorbrewer_palette() | launch an interactive widget to help you choose ColorBrewer palettes. Must be used in a Jupyter notebook.
  • sns.choose_cubehelix_palette() | similar to sns.choose_colorbrewer_palette() , but for the cubehelix colour palette.
  • sns.choose_light_palette() and sns.choose_dark_palette() | launch interactive widget to aid the choice of palette.
  • sns.choose_diverging_palette() | the same kind of interactive widget, for diverging palettes
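A palette is just a list of RGB tuples, so you can inspect one without plotting anything. The sketch below builds a few of the palettes listed above. The specific colour names and hue values (seagreen, 220, 20) are arbitrary choices for illustration.

```python
import seaborn as sns

# Evenly spaced hues in the hls and husl colour systems.
hls = sns.color_palette("hls", 8)
husl = sns.color_palette("husl", 8)

# A ColorBrewer sequential palette and its reversed (_r) variant.
bugn = sns.color_palette("BuGn", 6)
bugn_r = sns.color_palette("BuGn_r", 6)

# Sequential palette running from a light base to a named colour.
light = sns.light_palette("seagreen", n_colors=5)

# Diverging palette between two hues (given as angles on the colour wheel).
diverging = sns.diverging_palette(220, 20, n=7)

# as_cmap=True returns a matplotlib colormap instead of a list of colours.
cmap = sns.cubehelix_palette(as_cmap=True)

# Each colour is an (r, g, b) tuple with values between 0 and 1.
first_colour = hls[0]
```

Wrapping any of the list palettes in sns.palplot() draws them as a row of colour swatches.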

Using colour palettes

Use the cmap argument to pass a colour palette to a Seaborn plotting function:

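import numpy as np
import seaborn as sns
x, y = np.random.multivariate_normal([0, 0], [[1, -.5], [-.5, 1]], size=300).T
cmap = sns.cubehelix_palette(light=1, as_cmap=True)
# Seaborn 0.11+ spelling; older releases used sns.kdeplot(x, y, cmap=cmap, shade=True)
sns.kdeplot(x=x, y=y, cmap=cmap, fill=True)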

You can also use the set_palette() function, which changes the default matplotlib parameters so that the palette is applied to all subsequent plots:

sns.set_palette("husl")
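The two approaches can be sketched together as follows. The heatmap data is random and purely illustrative. The with block relies on the palette returned by color_palette() also working as a context manager, which is not covered above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# A colormap object goes to any plotting function that accepts cmap.
rng = np.random.default_rng(3)
matrix = rng.normal(size=(6, 6))
cmap = sns.cubehelix_palette(light=1, as_cmap=True)
fig, ax = plt.subplots()
sns.heatmap(matrix, cmap=cmap, ax=ax)

# set_palette changes the default colour cycle for every later plot.
sns.set_palette("husl", 4)
default_colours = sns.color_palette()  # no arguments: the current palette

# A palette can also be applied temporarily inside a with block.
with sns.color_palette("Set2", 3):
    temporary_colours = sns.color_palette()

# Outside the block, the husl palette is active again.
restored_colours = sns.color_palette()
```

Roughly speaking, cmap is for plots that map numbers to colours (heatmaps, KDEs), while set_palette() controls the discrete colours that lines, bars and points cycle through.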