As a developer on the vegan package for R, one of the most FAQs is how to customise ordination diagrams, usually to colour the sample points according to an external grouping variable. Now, just because we get asked how to do this a lot is not really a reflection on the quality of the plot() methods available in vegan.
Ordination diagrams are difficult beasts to handle with plotting code without having an unwieldy number of arguments etc. There are potentially five sets of scores that need to be plotted so the number of arguments could quickly get out of hand if we allowed the user to pass all the relevant graphical parameters to the various sets of scores. We’ve provided a basic plot() method that is set up with useful defaults that works for all the ordination methods in vegan. This method is really there to allow the quick visualisation of the fitted ordination; I use vegan on almost a daily basis and none of my presentation or publication figures use the default plot method at all. However, we have provided all the tools needed to draw custom ordination plots within vegan. How you use them also provides a useful guide to building up base graphics plots from lower-level plotting functions. In this post I intend to show two examples of building up a simple PCA biplot from the basic building blocks available in vegan and R’s base graphics.
To get going, start R and load the vegan package. For this example I will use the classic Dutch Dune Meadow data set which I also load. A simple PCA of the species data is then fitted and stored in mod. To finish the basic example, I use the basic plot() method to plot the PCA biplot. Note that I’m using symmetric scaling here; I tend to prefer this scaling for general diagrams as it preserves many of the features of biplots without focusing on one or other sets of scores. (Here I’m also using it to illustrate how to select a scaling when you are building the plot from scratch.)
## load vegan
require("vegan")
## load the Dune data
data(dune, dune.env)
## PCA of the Dune data
mod <- rda(dune, scale = TRUE)
## plot the PCA
plot(mod, scaling = 3)
The resulting PCA biplot is shown below

The result of a call to the plot.cca() method results in a simple biplot of the Dune Meadows PCA but is not very customisable.
Building a biplot using vegan methods
The first example of a customised biplot I will show uses low-level plotting methods provided by vegan. These include points() and text() methods for objects of class "cca". (The "cca" object is one of the base ordination classes in vegan; its name is a bit unfortunate as it is the base representation for PCA, CA, RDA etc… objects — read more about this object via ?cca.object.) I am going to plot the basic PCA biplot, but colour the sites according to the land-use variable dune.env$Use, which has three levels
> with(dune.env, levels(Use)) [1] "Hayfield" "Haypastu" "Pasture"
Start by defining an object holding the desired ordination scaling and a vector of colours with which to do the plotting
scl <- 3 ## scaling = 3
colvec <- c("red2", "green4", "mediumblue")
The basic plot() method can be coerced into setting up the basic plot axes, limits etc for us by using type = "n", so we use that short cut and pass along our desired scaling
plot(mod, type = "n", scaling = scl)
The next step is to add the site scores. Here I use the points() method rather than draw the site scores using their sample ID (as is the default in the plot() method).
with(dune.env, points(mod, display = "sites", col = colvec[Use],
scaling = scl, pch = 21, bg = colvec[Use]))
The key point to note in the code chunk above is how I colour each site according to its land-use. I index into the vector of colours created earlier using the factor Use. Use is essentially a vector of 1s, 2s, and 3s (there are three levels remember). The colvec[Use] evaluates to a vector the same length as the number of sites, where each element is one of the pre-specified colours
> head(with(dune.env, colvec[Use])) [1] "green4" "green4" "green4" "mediumblue" [5] "green4" "green4"
The display argument selects the type of scores to plot. The remainder of the arguments are the scaling for the scores (so they match the base plot) and arguments to style the plotted points.
Next I add the species scores, but this time I want to label them with (abbreviated) species names. For this I use the text() method with argument display = "species"
text(mod, display = "species", scaling = scl, cex = 0.8, col = "darkcyan")
To finish the plot I add a legend. It is important to get the ordering and labelling of the colours correct here. When I drew the site scores I used the Use factor to index the vector of plotting colours. To ensure we get the correct ordering in the legend, it is best to extract the levels as the lables, which is what I do below
with(dune.env, legend("topright", legend = levels(Use), bty = "n",
col = colvec, pch = 21, pt.bg = colvec))
If you want to provide custom labels, look at the levels of the factor and provide them to legend() in the correct order.
The biplot should now look like the one below
Building a biplot using base graphics directly
If you want the ultimate level of control over the plots then you will want to build the plot up from scratch using lower-level plotting functions provided by base graphics. In this second example I’ll recreate the plot I made above but from the ground up using basic plotting functions.
First, start by extracting the scores needed from the ordination object. Here the scaling required for the plot needs to be provided and the sets of scores specified
scrs <- scores(mod, display = c("sites", "species"), scaling = scl)
This results in a list with two components; sites and species
str(scrs, max = 1)
Each component is a matrix with two columns containing the scores on the first and second principal components respectively. If you want to extract scores on different axes provide the choices argument to the scores() function with a numeric vector of the axes you wish to extract. Do note that the code below assumes only two axes are extracted.
Next compute the axis limits, which need to cover the range of the site and species scores on the first (x-axis) and second (y-axis) principal components
xlim <- with(scrs, range(species[,1], sites[,1])) ylim <- with(scrs, range(species[,2], sites[,2]))
Now everything is ready to do some actual plotting. Start preparing the plot device be starting a new plot with plot.new() and then set up the coordination system via a call to plot.window() supplying the axis limits created above. A crucial aspect of the call to plot.window() is the graphical parameter asp which controls the aspect ratio of the plot. Here we set the aspect ratio equal to 1 to preserve the relationships between scores on the different axes and the distance interpretation of the biplot
plot.new() plot.window(xlim = xlim, ylim = ylim, asp = 1)
The reference guides (dotted lines going through the point (0,0) are added first so as to not lie on top of any of the site or species scores. Two calls to the abline() function are used
abline(h = 0, lty = "dotted") abline(v = 0, lty = "dotted")
The next two lines of code use the default methods for points() and text() rather than the "cca" methods used above
with(dune.env, points(scrs$sites, col = colvec[Use],
pch = 21, bg = colvec[Use]))
with(scrs, text(species, labels = rownames(species),
col = "darkcyan", cex = 0.8))
The legend is added in same way as before
with(dune.env, legend("topright", legend = levels(Use), bty = "n",
col = colvec, pch = 21, pt.bg = colvec))
All that remains is to add the plot furniture; the axes, axis labels and the plot frame
axis(side = 1) axis(side = 2) title(xlab = "PC 1", ylab = "PC 2") box()
The fruits of our labours are shown below
I don’t claim these are perfect; many of the labels lie on top of one another for example. Vegan has some functions to help with this but I’ll leave exemplifying those for another post.
The full code for the two examples is shown below
## load vegan
require("vegan")
## load the Dune data
data(dune, dune.env)
## PCA of the Dune data
mod <- rda(dune, scale = TRUE)
## plot the PCA
plot(mod, scaling = 3)
## build the plot up via vegan methods
scl <- 3 ## scaling == 3
colvec <- c("red2", "green4", "mediumblue")
plot(mod, type = "n", scaling = scl)
with(dune.env, points(mod, display = "sites", col = colvec[Use],
scaling = scl, pch = 21, bg = colvec[Use]))
text(mod, display = "species", scaling = scl, cex = 0.8, col = "darkcyan")
with(dune.env, legend("topright", legend = levels(Use), bty = "n",
col = colvec, pch = 21, pt.bg = colvec))
## or via base graphics methods
scrs <- scores(mod, display = c("sites", "species"), scaling = scl)
str(scrs, max = 1)
xlim <- with(scrs, range(species[,1], sites[,1]))
ylim <- with(scrs, range(species[,2], sites[,2]))
plot.new()
plot.window(xlim = xlim, ylim = ylim, asp = 1)
abline(h = 0, lty = "dotted")
abline(v = 0, lty = "dotted")
with(dune.env, points(scrs$sites, col = colvec[Use],
pch = 21, bg = colvec[Use]))
with(scrs, text(species, labels = rownames(species),
col = "darkcyan", cex = 0.8))
axis(1)
axis(2)
title(xlab = "PC 1", ylab = "PC 2")
with(dune.env, legend("topright", legend = levels(Use), bty = "n",
col = colvec, pch = 21, pt.bg = colvec))
box()




Reblogged this on Stats in the Wild.
Have you tried pointLabel for the text? The overlap of text in the plotting is one thing that drives me nuts. I haven’t tried to use pointLabel yet, but I’m going to give it a whirl to see how it works.
http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=maptools:pointLabel
Try
ordipointlabel()in vegan. Jari shamelessly copied (and acknowledged)pointLabel()as he didn’t want to have vegan depend on a big chunk of the spatial stack just for this one thing. This is one of the things I will do a a post on in the near future (drives me bonkers too!)Hi ucfagls, this post is helpful. Do you have good idea how to display a “subset” of species using ordipointlabel() in vegan?
For instance, in text(), I could select subset of species using “select” arguments. However, ordipointlable() does not support “select” arguments.
To be honest, if it were me I’d hack the function to subset the scores before the function tries to optimise placement of labels. Shouldn’t be too hard to add to vegan itself. Will take a look once I finish teaching and get back from vacation in early June.
Ucfagls, Thanks a lots. :D I could adjust label manually for now, so please take your time.
I have discussed this with Jari and have now checked in some code to the svn sources on R-Forge. You want a build of revision 2178 (r2178) or later to pick up this change. It works almost the same as
text.ccaand you can only use it when you are plotting a single set of scores. We also check you supply the correct length forselectso no recycling is allowed etc. Note sure when this will make it into the CRAN version…Updated the post as I’d not documented a key point. The aspect ratio needs to be set to 1 in the call to
plot.window()to preserve the biplot aspects of the figure.First, Great Post. What about the vector arrows? How do you add that to the plot?
text(mod,dis="cn")should do the job. Just figured it out :)I doubt that will do what you what – that should be for plotting constraint centroids where constraints are nominal factors. If you vector arrows for species, see the
biplot()method for ideas (depends if you want to draw the arrow tip or the label at the species score.If you want biplot arrows for constraints (environmental data) then see
?envfitfor unconstrained or fit a constrained ordination viacca(),rda(), orcapscale()and then you can the “scores” for the biplot arrows viadisplay = "bp".If you explain more what you want/mean I can help further.
That adds the arrows but not the titles for the vectors– can you let us know how to label them too? I am doing the same thing but want to show which environmental parameters go with which vectors.
Thank you in advance for your help.
Never mind. I figured it out. Same as above, but using with(dat.env,text(cca,display=”bp”,”scaling….))
Thank you! It is a wonderful post!
@Margaret You will probably want to push the labels somewhat further from the arrow head than the code you show below will do. Essentially the arrow heads are drawn to biplot score and then you are labelling the exact same coordinate with the text label. You can do that by extracting the biplot scores on the axes you want to plot, and then plot text at those coordinates plus a little bit of padding using the base function
text()rather than thevegan:::plot.ccamethod you used below. If you look at the code forvegan:::biplot.rdayou’ll see lots of lines of code trying to handle this better. It isn’t perfect but might help for future projects. Also, IIRC I added something similar tovegan:::plot.envfitwhich tried to arrange for extra space on the plot to accommodate the labels drawn at the biplot arrow heads of class centroids.Vector arrows for what – the species?
I meant the arrows for constrained environmental data. I think your previous post answered my question as well. I used constrained ordinaton via capscale() to fit the environmental data into my species X sample data. I was able to color the sites but, was struggling to figure out how to get the environmental data to be displayed as well. I was missing an obvious point of using display = “bp”
Hello,
This post is really helpful!
I have also problems with many of the labels lie on top of one another . Which vegan functions can I use to solve this problem? It is really annoys me! Thank you!
There are several and I have a blog post in the works on those.
ordipointlabel(), orditorp() orordilabel()are things to look at.Thank you very much!
Hello,
Another question when I use your scriy and add
text(mod, display = “sites”, cex = 0.8, col = “darkblue”), name of the sites fall very far from the points. What could it be the reason? Distribution looks totally different.
Thank you!
You forgot the scaling – I used symmetric scaling
scaling = sclwheresclwas defined earlier to bescl <- 3. the default isscaling = 2hence the offset.Thank you for a great post! I (a new user of R) successfully applied this method to my data, except changing the order of items in legend. I tried many different way (an example below), but without success.
with(dune.env, legend(“topright”, legend = levels(Use),
c(“Pasture”,”Haypastu”,Hayfield”),
bty = “n”, col = colvec, pch = 21, pt.bg = colvec))
Any help would be greatly appreciated. Thank you.
Kiyoshi
What were you trying to do? You’ve added
c(“Pasture”,”Haypastu”,Hayfield”)as another argument which most likely won’t make sense. Where thelegend = levels(Use)you need to specify your levels/group labels in place oflevels(Use)Thank you for your prompt reply. Using your example, I’m trying to change the order of the items in the legend produced by default in your codes, from “Hayfield, Haypastu, Pasture” to “Pasture, Haypastu, Hayfield”. Here is what I tried:
with(dune.env, legend(“topright”, legend = c(“Pasture”,”Haypastu”,”Hayfield”), bty = “n”,
col = colvec, pch = 21, pt.bg = colvec))
It’s not working. I don’t have the basic knowledge and cannot figure out how to change the code you provided. Could you please provide the entire code so that I can learn from an example?
Sorry for asking such a basic question. Thank you.
Kiyoshi
I did provide all the code!? What you show should work in the sense that the labels in the legend will now be in the order you want, unfortunately the colours will not be correct. You need to reorder
colvectoo. Where you seecolvecin the code you showed above (that’s two places) replace it withcolvec[3:1]as that is the change to the ordering you want. Note you can do this more easily withrev(colvec))but using[3:1]is more general allowing you to specify exactly what order you want. Do read?legendto understand what and howlegend()works. If that fails, this blog isn’t a good place to provide further support. Post a Question on the http://stackoverflow.com/questions/tagged/r under the R tag and someone will help – I am busy during the day tomorrow but will have a look it you post a full example on StackOverflow and post the link here.I’m sorry for asking questions in an inappropriate place like this blog. I’ll try your advise and study more about function(). Thank you.
I just wanted to let you know that I finally figured out and thank you for all your helpful input. Thank you very much!
Hi, I have a couple questions about fitting environmental (land use factors, plant species presence-absence, and soil variables) constraints to my CCA biplot. 1. After successfully plotting species and site scores in my CCA, I have been trying to insert the biplot arrows of the environmental constraints in my data set using the text() function. When I do that, the plot changes completely. Is there some code or a sample script you could let me know about? 2. I would like to include only the environment constraints that are significant at conf=0.95, but am not sure that I can do that in a CCA biplot. I was hoping that this way I could make my plot less crowded.
Thanks!
That’s a bit too involved and somewhat tangential to the posting to handle here. Do you know about the vegan help “forum”? Post your Q there and I’ll take a look if no-one else does (I’m in the process of moving/emigrating, so may be delay in replying).
Pingback: Decluttering ordination plots in vegan part 1: ordilabel() | From the bottom of the heap
Hello. Great post, thanks. I have two questions. First is about the symmetric scaling (scaling = 3) which you have used and that dictates how we should interpret the ordination diagram. Scaling 3 looks best for my plot but what ramifications does this have for the interpretation of the graph?
Second, why are axes on ordination plots not between -1 and 1?
Thanks for your comment and questions Steve.
Q1; most of the relationships you would look for an interpret on a species scaling plot or a samples scaled plot (scalings 1 and 2) are preserved very well on the symmetric scaling biplot. IIRC they show about 90% of what would have been shown if you’d done a scaling 1 or scaling 2 plot without having to do separate plots for the species and sites. You’d need to look up the paper by Gabriel (2002) on symmetric scaling for the details as I don’t have my text books to hand and so can’t help at the moment.
Q2: the axes are not between -1 and 1 because of the rescaling. If you do
scaling = 0(i.e. no scaling) then you’ll see the configuration lies between -1 and 1.Regarding Q1: Gabriel (2002) says ” The symmetric biplot … provides an optimal fit to the data themselves but neither to the variance nor to the form”. But I’m still unclear how to interpret the plot using scalings other than 1 or 2. For example, here I see a succinct description of ‘rules’ to interpret triplots using these scalings: http://evol.bio.lmu.de/_statgen/Multivariate/11SS/rda.pdf. However, I cannot find a description of how to interpret plots with symmetric (scaling = 3) or no scaling. Can I treat my plot as a distance or correlation triplot? What do distances between points mean, what about angles between lines?
Regarding Q2: Reviewers have asked why the axes are not between -1 and 1, and if they can project arrows onto these axes to get correlation coefficients. So if I use no scaling, or symmetric scaling is this possible to do?
Thanks so much for the help.
Gabriel (2002) also says “Hence the symmetric biplot, in addition to its optimal fit of the data, proportionally fits the form and variance almost optimally and is an excellent candidate for general usage, unless one requires representation of the actual magnitudes.”
Symmetric scaling can, as far as I can tell, be interpreted essentially as you would both scaling = 1 or scaling = 2 biplots. The correlations between species are approximated to ~95% of that one would get with the correlation biplot for example. There is a table of how to read biplots for the various scalings somewhere, but I can’t recall if it is in the Canoco manual or the text book of Jeps & Smilauer, both of which are in a packing crate at the moment (I am between homes, jobs, & countries).
You might do well to email the ORDNEWS email list with this question and Q2. Re the latter, if you mean’t why aren’t they between -1 and 1 for symmetric scaling then that is just the scaling. The UseR series book by Borcard et al has an example of adding a unit correlation circle to a biplot produced from vegan. If you get no luck on ORDNEWS, I’d check out that book or the bible Legendre & Legendre’s Numerical Ecology which now has a new edition out.
I saw you’d emailed the vegan lists with these Qs. Let’s see if Jari chips in there too.
My custom axis labels for RDA are showing up at top and right when they should be bottom and left. how can I adjust their position?
You’ll need to show the code you’re using; I’m not clairvoyant you know ? ;-) If you can’t post a big chunk of code here, consider emailing the R-SIG-Ecology mailing list or post something on the Vegan list on R-Forge.
#plot myrdaTreatment, colored by interactive Effects
plot(myrdaTreatment,display=”sites”,scaling=1,type=”p”,main=”Treatment Effect on Endophytes”,cex.main=1.8)
title(sub=”(Geum)”,line=-27,cex.sub=1.5)
points(myrdaTreatment,pch=22,col=env$fertFac,bg=env$removalFac,cex=2,scaling=1) #removals are black, controls are open; N is red X is green
text(scores(myrdaTreatment,scaling=1)$centroids,c(“DN”,”DX”,”XN”,”XX”),col=”blue”)
I have tried adding axis labels to title:
title(sub=”(Geum)”,line=-27,cex.sub=1.5, xlab=”Axis 1 (12.97%)”, ylab=”Axis 2 (4.69%)”)
but the titles don’t end up anywhere close to the axes (which are automatically placed bottom and left)
The
line = -27is most likely the problem here. You are pushing the labelling so far away from the line that would normally be used. Why do you even have this there? What happens if you leave that bit out?then the subtitle ends up at the bottom of the graph instead of just under the main title
That is where it is suppose to be. Anyway,
lineis used to draw all the labels in the call totitle(), so have a separate call for thesuband another for thexlabandylab. Only useline = -27for thesubone.ok. I need the sub title just below the title (I have several RDA plots, each for a different plant spp, so I just denote the spp beneath the RDA title). How do I seperate the line notation for different labels? just “line.sub”?
Have two calls to
title()– you can’t make arguments up! Something likeThanks! That worked for placement, but it overlays my custom text with the automatic labels “RDA1″ and “RDA2″. How do I get rid of those labels?
When you
plot()the RDA object, includexlab = "", ylab = ""in the call.