<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>From the bottom of the heap</title>
	<atom:link href="http://ucfagls.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://ucfagls.wordpress.com</link>
	<description>The musings of a geographer</description>
	<lastBuildDate>Fri, 24 May 2013 15:35:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='ucfagls.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>From the bottom of the heap</title>
		<link>http://ucfagls.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://ucfagls.wordpress.com/osd.xml" title="From the bottom of the heap" />
	<atom:link rel='hub' href='http://ucfagls.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Decluttering ordination plots in vegan part 2: orditorp()</title>
		<link>http://ucfagls.wordpress.com/2013/01/13/decluttering-ordination-plots-in-vegan-part-2-orditorp/</link>
		<comments>http://ucfagls.wordpress.com/2013/01/13/decluttering-ordination-plots-in-vegan-part-2-orditorp/#comments</comments>
		<pubDate>Sun, 13 Jan 2013 13:13:43 +0000</pubDate>
		<dc:creator>ucfagls</dc:creator>
				<category><![CDATA[Plotting]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[vegan]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[ordination]]></category>
		<category><![CDATA[PCA]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://ucfagls.wordpress.com/?p=640</guid>
		<description><![CDATA[In the earlier post in this series I looked at the ordilabel() function to help tidy up ordination biplots in vegan. An alternative function vegan provides is orditorp(), the last four letters abbreviating the words text or points. That is &#8230; <a href="http://ucfagls.wordpress.com/2013/01/13/decluttering-ordination-plots-in-vegan-part-2-orditorp/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ucfagls.wordpress.com&#038;blog=14744973&#038;post=640&#038;subd=ucfagls&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>In the <a href="http://ucfagls.wordpress.com/2013/01/12/decluttering-ordination-plots-in-vegan-part-1-ordilabel/" title="Decluttering ordination plots in vegan part 1:&nbsp;ordilabel()">earlier post in this series</a> I looked at the <code>ordilabel()</code> function to help tidy up ordination biplots in <a href="http://cran.r-project.org/package=vegan">vegan</a>. An alternative function vegan provides is <code>orditorp()</code>, the last four letters abbreviating the words <em><strong>t</strong>ext <strong>or</strong> <strong>p</strong>oints</em>. That is a pretty good description of what <code>orditorp()</code> does; it draws sample or species labels using text where there is room and, where there isn&#8217;t, a plotting character is drawn instead. Essentially it boils down to being a one-stop shop for calls to <code>text()</code> or <code>points()</code> as needed. Let&#8217;s see how it works&#8230;<span id="more-640"></span></p>
<p>As with last time out, I&#8217;ll illustrate how <code>orditorp()</code> works via a <acronym title="Principal Components Analysis">PCA</acronym> biplot for the Dutch dune meadow data.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## load vegan and the data
require(vegan)
data(dune)
ord &lt;- rda(dune) ## PCA of Dune data

## species priority; which species drawn last, i.e. on top
priSpp &lt;- diversity(dune, index = &quot;invsimpson&quot;, MARGIN = 2)
## sample priority
priSite &lt;- diversity(dune, index = &quot;invsimpson&quot;, MARGIN = 1)

## scaling to use
scl &lt;- 3
</pre>
<p>I won&#8217;t explain any of the code above; it is the same as that used in the <a href="http://ucfagls.wordpress.com/2013/01/12/decluttering-ordination-plots-in-vegan-part-1-ordilabel/" title="Decluttering ordination plots in vegan part 1:&nbsp;ordilabel()">earlier post</a> where an explanation was also provided.</p>
<p><code>orditorp()</code> takes an ordination object as the first argument and in addition the <code>display</code> argument controls which set of scores is displayed. Note that <code>orditorp()</code> can only plot one set of scores at a time, which as we&#8217;ll see in a minute is neither ideal nor foolproof. As with <code>ordilabel()</code>, you are free to specify the importance of each sample or species via argument <code>priority</code>. In <code>ordilabel()</code> the <code>priority</code> controlled the plotting order such that those samples or species with high priority were plotted last (uppermost). In contrast, <code>orditorp()</code> tries to draw labels for the samples or species with the highest priority first.</p>
<p>So that we have something to talk about, let&#8217;s recreate the basic samples and species biplot used in the previous post, updated to use <code>orditorp()</code></p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
plot(ord, type = &quot;n&quot;, scaling = scl)
orditorp(ord, display = &quot;sites&quot;, priority = priSite, scaling = scl,
         col = &quot;blue&quot;, cex = 1, pch = 19)
## You may prefer separate plots, but here add species as well
orditorp(ord, display = &quot;species&quot;, priority = priSpp, scaling = scl,
         col = &quot;forestgreen&quot;, pch = 2, cex = 1)
</pre>
<div id="attachment_646" class="wp-caption aligncenter" style="width: 510px"><a href="http://ucfagls.files.wordpress.com/2013/01/orditorp_figure_combined.png"><img src="http://ucfagls.files.wordpress.com/2013/01/orditorp_figure_combined.png?w=640" alt="PCA biplot of the Dutch dune meadow data produced using &lt;code&gt;orditorp()&lt;/code&gt;"   class="size-full wp-image-646" /></a><p class="wp-caption-text">PCA biplot of the Dutch dune meadow data produced using <code>orditorp()</code></p></div>
<p>The behaviour of <code>orditorp()</code> should now be reasonably clear; labels are drawn for samples or species only if there is room to do so, with a point being used otherwise. <code>orditorp()</code> isn&#8217;t perfect by any means. Because it can only draw one set of scores at a time, there is no easy way to stop the species labels plotting over the sample labels and vice versa.</p>
<p>Here is how it works. First, <code>orditorp()</code> calculates the heights and widths of the labels and adds a bit of space to these (more on this later). It then works out whether the box given by the current sample or species label width and height, centred on the score coordinates, would obscure the boxes of any labels previously drawn. If the label box doesn&#8217;t obscure any previous label boxes, the label is drawn at the sample or species score coordinates. If it does obscure an existing label then a point is drawn instead. <code>orditorp()</code> works through the labels in order of <code>priority</code>, checking each one against the labels already drawn before deciding whether to draw it.</p>
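<p>The checking described above can be sketched in a few lines of R. This is a simplified illustration rather than the code used in vegan: the function name <code>labelOrPoint()</code> and its arguments are hypothetical, and the label widths and heights are supplied directly (in plot units) instead of being measured from the label strings.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## Hypothetical sketch of the text-or-point decision; not the vegan implementation
labelOrPoint &lt;- function(x, y, w, h, priority, air = 1) {
    ## half-extents of the box that each label must keep clear
    hw &lt;- w / 2 * air
    hh &lt;- h / 2 * air
    box &lt;- cbind(left = x - hw, right = x + hw,
                 bottom = y - hh, top = y + hh)
    drawn &lt;- logical(length(x))
    ## visit the scores from highest to lowest priority
    for (i in order(priority, decreasing = TRUE)) {
        prev &lt;- which(drawn) ## labels already placed
        ## TRUE where box i lies entirely to one side of a placed box
        clear &lt;- box[i, &quot;left&quot;] &gt; box[prev, &quot;right&quot;] |
            box[prev, &quot;left&quot;] &gt; box[i, &quot;right&quot;] |
            box[i, &quot;bottom&quot;] &gt; box[prev, &quot;top&quot;] |
            box[prev, &quot;bottom&quot;] &gt; box[i, &quot;top&quot;]
        drawn[i] &lt;- all(clear) ## label if clear of all, otherwise a point
    }
    drawn ## TRUE = label, FALSE = point, in the original order
}

## three scores; the second sits almost on top of the first
labelOrPoint(x = c(0, 0.1, 5), y = c(0, 0, 0), w = 1, h = 1,
             priority = c(3, 2, 1))
</pre>
<p>Here the second score would be drawn as a point because its box overlaps that of the higher-priority first score, whilst the third is far enough away to get a label.</p>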
<p>This process isn&#8217;t infallible of course; for example the second highest priority sample or species could lie very close to the highest priority one in ordination space and if so <code>orditorp()</code> would not draw a label for this second highest priority sample or species because it would obscure the label of the highest priority one.</p>
<p>The amount of spacing or padding <em>around</em> each label is specified via the <code>air</code> argument which has a default of <code>1</code>. <code>air</code> is interpreted as a multiplier applied to half the label width or height, which gives the extent of the protected box either side of the plotting coordinate. The default of <code>1</code> therefore means that in fact there is no additional spacing beyond the confines of the box that encloses the label. If <code>air</code> is greater than 1 proportionally more padding is added whilst values less than 1 indicate that labels can overlap. The figure below shows the species scores only with two values for <code>air</code>. In the left hand panel <code>air = 2</code> is used and the protected box extends the <em>entire</em> string width or height either side of the plotting coordinate, so each label is padded by half the string width or height beyond its own edges. The right hand panel uses <code>air = 0.5</code> which allows labels to overlap by up to a quarter of the string width or height in any direction from the plotting coordinate (in other words, the box that cannot be obscured when plotting subsequent labels is half the string width wide and half the string height high, centred on the plotting coordinates for the label).</p>
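<p>To make the arithmetic concrete, the short calculation below works through the box sizes for a single label, assuming that the protected box extends <code>air</code> times half the string width or height either side of the plotting coordinate. The label dimensions are made-up values in plot units, not measurements from a real plot, and the object names are mine.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## made-up label dimensions, in plot units
strW &lt;- 1.0
strH &lt;- 0.2
air &lt;- c(0.5, 1, 2)
## extent of the protected box either side of the plotting coordinate
halfW &lt;- strW / 2 * air
halfH &lt;- strH / 2 * air
boxes &lt;- data.frame(air = air,
                    boxWidth = 2 * halfW,
                    boxHeight = 2 * halfH,
                    ## padding beyond the label edge; negative values allow overlap
                    padW = halfW - strW / 2)
boxes
</pre>
<p>Under that assumption, <code>padW</code> is negative for <code>air = 0.5</code> (a quarter of the string width of overlap is tolerated), zero for the default and positive for <code>air = 2</code>.</p>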
<pre class="brush: r; title: ; toolbar: false; notranslate">
layout(matrix(1:2, ncol = 2))
op &lt;- par(mar = c(5,4,4,1) + 0.1)
## species scores, air = 2
plot(ord, type = &quot;n&quot;, scaling = scl, main = expression(air == 2), cex = 1)
orditorp(ord, display = &quot;species&quot;, priority = priSpp, scaling = scl,
         col = &quot;forestgreen&quot;, cex = 1, pch = 2, air = 2)
## species scores, air = 0.5
plot(ord, type = &quot;n&quot;, scaling = scl, main = expression(air == 0.5), cex = 1)
orditorp(ord, display = &quot;species&quot;, priority = priSpp, scaling = scl,
         col = &quot;forestgreen&quot;, pch = 2, cex = 1, air = 0.5)
par(op)
layout(1)
</pre>
<p><div id="attachment_645" class="wp-caption aligncenter" style="width: 650px"><a href="http://ucfagls.files.wordpress.com/2013/01/orditorp_figure_air.png"><img src="http://ucfagls.files.wordpress.com/2013/01/orditorp_figure_air.png?w=640&#038;h=320" alt="PCA species plot of the Dutch dune meadow data produced using &lt;code&gt;orditorp()&lt;/code&gt; showing the effect of changing argument &lt;code&gt;air&lt;/code&gt;." width="640" height="320" class="size-large wp-image-645" /></a><p class="wp-caption-text">PCA species plot of the Dutch dune meadow data produced using <code>orditorp()</code> showing the effect of changing argument <code>air</code>.</p></div><br />
One point that should be noted is that <code>orditorp()</code> doesn&#8217;t stop labels and points from overlaying one another, though as the labels are drawn after the points they shouldn&#8217;t get obscured too much. We could improve the situation a bit by drawing an opaque, or even partially transparent, box around each label so that the label always stands out from the plotting points, although we&#8217;d then run the risk of hiding points under labels and thus hiding information from the person looking at the figure.</p>
<p>One additional point to make is that <code>orditorp()</code> returns a logical vector indicating which sample or species scores were drawn with labels (<code>TRUE</code>) or points (<code>FALSE</code>), which might be useful for further plotting or adding to the diagram.</p>
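<p>As a minimal sketch of how the returned vector might be used, the code below stores it and then highlights the species that ended up as points. It repeats the set-up from the start of the post so that it runs on its own, and it assumes that the returned vector is in the same order as the rows of the species scores; the object names <code>drawn</code> and <code>spp</code> are mine.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
require(vegan)
data(dune)
ord &lt;- rda(dune)
priSpp &lt;- diversity(dune, index = &quot;invsimpson&quot;, MARGIN = 2)
scl &lt;- 3

plot(ord, type = &quot;n&quot;, scaling = scl)
drawn &lt;- orditorp(ord, display = &quot;species&quot;, priority = priSpp,
                  scaling = scl, col = &quot;forestgreen&quot;, pch = 2, cex = 1)
table(drawn) ## number of points (FALSE) and labels (TRUE)

## assumed: drawn is in the same order as the rows of the species scores
spp &lt;- scores(ord, display = &quot;species&quot;, scaling = scl)
rownames(spp)[!drawn] ## species shown as points
points(spp[!drawn, , drop = FALSE], pch = 2, col = &quot;red&quot;)
</pre>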
<p>So there we have <code>orditorp()</code>. Next time I&#8217;ll take a look at <code>ordipointlabel()</code> which tackles the problem of producing a tidy ordination diagram in a far more complex way than either <code>ordilabel()</code> or <code>orditorp()</code>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ucfagls.wordpress.com/2013/01/13/decluttering-ordination-plots-in-vegan-part-2-orditorp/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/af0cc2f46bd679e92029bc489cdde955?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ucfagls</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2013/01/orditorp_figure_combined.png" medium="image">
			<media:title type="html">PCA biplot of the Dutch dune meadow data produced using &#60;code&#62;orditorp()&#60;/code&#62;</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2013/01/orditorp_figure_air.png?w=640" medium="image">
			<media:title type="html">PCA species plot of the Dutch dune meadow data produced using &#60;code&#62;orditorp()&#60;/code&#62; showing the effect of changing argument &#60;code&#62;air&#60;/code&#62;.</media:title>
		</media:content>
	</item>
		<item>
		<title>Decluttering ordination plots in vegan part 1: ordilabel()</title>
		<link>http://ucfagls.wordpress.com/2013/01/12/decluttering-ordination-plots-in-vegan-part-1-ordilabel/</link>
		<comments>http://ucfagls.wordpress.com/2013/01/12/decluttering-ordination-plots-in-vegan-part-1-ordilabel/#comments</comments>
		<pubDate>Sat, 12 Jan 2013 18:16:43 +0000</pubDate>
		<dc:creator>ucfagls</dc:creator>
				<category><![CDATA[Plotting]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[vegan]]></category>
		<category><![CDATA[Biplot]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[ordination]]></category>
		<category><![CDATA[PCA]]></category>
		<category><![CDATA[principal components]]></category>

		<guid isPermaLink="false">http://ucfagls.wordpress.com/?p=617</guid>
		<description><![CDATA[In an earlier post I showed how to customise ordination diagrams produced by our vegan package for R through use of colours and plotting symbols. In a series of short posts I want to cover some of the options available &#8230; <a href="http://ucfagls.wordpress.com/2013/01/12/decluttering-ordination-plots-in-vegan-part-1-ordilabel/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ucfagls.wordpress.com&#038;blog=14744973&#038;post=617&#038;subd=ucfagls&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>In an <a href="http://ucfagls.wordpress.com/2012/04/11/customising-vegans-ordination-plots/" title="Customising vegan’s ordination&nbsp;plots">earlier post</a> I showed how to customise ordination diagrams produced by our <a href="http://cran.r-project.org/web/packages/vegan/index.html">vegan</a> package for <a href="http://www.r-project.org">R</a> through use of colours and plotting symbols. In a series of short posts I want to cover some of the options available in vegan that can be used to help in producing better, clearer, less cluttered ordination diagrams.</p>
<p>First up we have <code>ordilabel()</code>. <span id="more-617"></span></p>
<p>One of the problems that ordination results pose is that there is a lot of information that we want to convey using a relatively small number of pixels. What we often end up with is a jumbled mess and, because of the way the sample or species scores are plotted, the important observations could very well end up covered by all the rare species or odd samples just by virtue of their ordering in the data set.</p>
<p>The simplest tool that vegan provides to help in this regard is <code>ordilabel()</code>; it won&#8217;t produce a publication-ready, uncluttered ordination diagram but it will help you focus on the &#8220;important&#8221;<sup><a href="#note1">1</a></sup> things.</p>
<p><code>ordilabel()</code> draws sample or species scores with their label (site ID or species name/code) taken from the <code>dimnames</code> of the data used to fit the ordination. To help their display, however, <code>ordilabel()</code> draws each label in a box with an opaque background, so that labels plotted later (i.e. above) cover earlier labels whilst remaining legible themselves. <code>ordilabel()</code> also allows you to specify the importance of the samples or species via the <code>priority</code> argument, which in effect controls the drawing order: low-priority labels are drawn first, beneath all the others, and high-priority labels are drawn last, on top.</p>
<p>Here I&#8217;ll use a <acronym title="Principal Components Analysis">PCA</acronym> of the famous Dune Meadow data<sup><a href="#note2">2</a></sup>. First, we load vegan and the data and perform the ordination</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
require(vegan)
data(dune)
ord &lt;- rda(dune) ## PCA of Dune data
</pre>
<p>In this example, I want to give plotting priority to those species or samples that are most abundant or most diverse, respectively. For this I will use Hill&#8217;s N<sub>2</sub> for both the species and the samples, both of which can be computed via the <code>diversity()</code> function</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## species priority; which species drawn last, i.e. on top
priSpp &lt;- diversity(dune, index = &quot;invsimpson&quot;, MARGIN = 2)
## sample priority
priSite &lt;- diversity(dune, index = &quot;invsimpson&quot;, MARGIN = 1)
</pre>
<p>The <code>MARGIN</code> argument refers to which dimension or margin of the data is used; <code>1</code> means rows, <code>2</code> means columns. Hill&#8217;s N<sub>2</sub> is equal to the inverse (or reciprocal) of the <a href="http://en.wikipedia.org/wiki/Diversity_index#Simpson_index">Simpson diversity</a> measure.</p>
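<p>For a vector of abundances with proportions p<sub>i</sub>, the inverse Simpson index is 1 / &#8721; p<sub>i</sub><sup>2</sup>. The few lines of base R below compute this by hand for a small made-up abundance matrix, as a sketch of what is being calculated along each margin; the helper <code>hillN2()</code> and the matrix <code>m</code> are illustrative names and are not part of vegan.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## inverse Simpson (Hill N2) for a single vector of abundances
hillN2 &lt;- function(x) {
    p &lt;- x / sum(x) ## proportional abundances
    1 / sum(p^2)
}

## made-up data: 2 samples (rows) by 3 species (columns)
m &lt;- matrix(c(10, 5, 5,
               1, 1, 1), nrow = 2, byrow = TRUE,
            dimnames = list(c(&quot;site1&quot;, &quot;site2&quot;),
                            c(&quot;sppA&quot;, &quot;sppB&quot;, &quot;sppC&quot;)))
apply(m, 1, hillN2) ## one value per row (sample), as with MARGIN = 1
apply(m, 2, hillN2) ## one value per column (species), as with MARGIN = 2
</pre>
<p>The second sample, in which all three species are equally abundant, takes the largest value possible with three species, namely 3.</p>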
<p>Throughout I&#8217;m going to use symmetric scaling of the two sets of scores for use in the biplot. As it is important to make sure the same scaling is used at each stage it is handy to store the scaling in an object and then refer to that object throughout. That way you can easily change the scaling used by altering the value in the object. Here I use <code>scl</code> and symmetric scaling is indicated by the number <code>3</code></p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## scaling to use
scl &lt;- 3
</pre>
<p><code>ordilabel()</code> adds labels to an existing plot, so first set up the plotting region for the PCA biplot using the <code>plot()</code> method with <code>type = "n"</code> to not plot any of the data</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
plot(ord, type = &quot;n&quot;, scaling = scl)
</pre>
<p>Now we are ready to add labels to the plot. <code>ordilabel()</code> takes the ordination object as the first argument and extracts the scores indicated by the <code>display</code> argument from the fitted object. There are a number of standard plotting arguments to control the look and feel of the labels, but the important argument is <code>priority</code> to control the plotting order. Here we set it to the Hill&#8217;s N<sub>2</sub> values we computed earlier. The code chunk below adds both the sample and the species labels to the base plot we just generated</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
ordilabel(ord, display = &quot;sites&quot;, font = 3, fill = &quot;hotpink&quot;,
          col = &quot;blue&quot;, priority = priSite, scaling = scl)
## You may prefer separate plots, but here add species as well
ordilabel(ord, display = &quot;species&quot;, font = 2,
          priority = priSpp, scaling = scl)
</pre>
<p>The resulting biplot should look similar to the one below<br />
<div id="attachment_624" class="wp-caption aligncenter" style="width: 510px"><a href="http://ucfagls.files.wordpress.com/2013/01/ordilabel_figure_combined.png"><img src="http://ucfagls.files.wordpress.com/2013/01/ordilabel_figure_combined.png?w=640" alt="PCA biplot of the dune meadow data with labels added by  ordilabel()"   class="size-full wp-image-624" /></a><p class="wp-caption-text">PCA biplot of the dune meadow data with labels added by  <code>ordilabel()</code></p></div></p>
<p>Not perfect, but better than the standard <code>plot()</code> method in vegan.</p>
<p>Alternatively, one might wish to draw side by side biplots of the sample and species scores. This can be done simply with a call to <code>layout()</code> to split the current plot device into two plot regions, which we fill using very similar plotting commands as described above</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
layout(matrix(1:2, ncol = 2))
plot(ord, type = &quot;n&quot;, scaling = scl)
ordilabel(ord, display = &quot;sites&quot;, font = 3, fill = &quot;hotpink&quot;,
          col = &quot;blue&quot;, priority = priSite, scaling = scl)
plot(ord, type = &quot;n&quot;, scaling = scl)
ordilabel(ord, display = &quot;species&quot;, font = 2,
          priority = priSpp, scaling = scl)
layout(1)
</pre>
<div id="attachment_625" class="wp-caption aligncenter" style="width: 650px"><a href="http://ucfagls.files.wordpress.com/2013/01/ordilabel_figure_side_by_side.png"><img src="http://ucfagls.files.wordpress.com/2013/01/ordilabel_figure_side_by_side.png?w=640&#038;h=320" alt="Side-by-side PCA biplots of the dune meadow data with labels added by ordilabel()" width="640" height="320" class="size-large wp-image-625" /></a><p class="wp-caption-text">Side-by-side PCA biplots of the dune meadow data with labels added by <code>ordilabel()</code></p></div>
<p>You may notice some warnings about <code>scaling</code> not being a graphical parameter. These are harmless and arise because we pass <code>scaling</code> along as part of the <code>...</code> argument which we also pass on to the plotting functions used to build the plot. We&#8217;ve tried hard to stop these warnings in vegan <a href="http://ucfagls.wordpress.com/2011/07/23/passing-non-graphical-parameters-to-graphical-functions-using/" title="Passing non-graphical parameters to graphical functions using&nbsp;...">using a technique</a> I blogged about a while back, but it looks like we missed a few of these. It will be fixed in a later version of vegan and the warnings will go away.</p>
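<p>The origin of warnings of this kind is easy to reproduce with base graphics alone. In the sketch below, which does not use vegan at all, an argument that is not a graphical parameter is handed to <code>plot()</code> and travels via <code>...</code> to the lower-level plotting functions, which complain about it. Only the first warning is captured, and the object name <code>msg</code> is mine.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## capture the first warning raised when a non-graphical argument is passed on
msg &lt;- tryCatch(plot(1:10, scaling = 3),
                warning = function(w) conditionMessage(w))
msg
</pre>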
<p>Next time we&#8217;ll look at <code>orditorp()</code>.</p>
<p><strong>Notes:</strong><br />
<sup id="note1">1</sup>Whatever &#8220;important&#8221; means&#8230;<br />
<sup id="note2">2</sup>Not that I think this is the best way to analyse these data, it is just for show!</p>
]]></content:encoded>
			<wfw:commentRss>http://ucfagls.wordpress.com/2013/01/12/decluttering-ordination-plots-in-vegan-part-1-ordilabel/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/af0cc2f46bd679e92029bc489cdde955?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ucfagls</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2013/01/ordilabel_figure_combined.png" medium="image">
			<media:title type="html">PCA biplot of the dune meadow data with labels added by  ordilabel()</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2013/01/ordilabel_figure_side_by_side.png?w=640" medium="image">
			<media:title type="html">Side-by-side PCA biplots of the dune meadow data with labels added by ordilabel()</media:title>
		</media:content>
	</item>
		<item>
		<title>Shading regions under a curve</title>
		<link>http://ucfagls.wordpress.com/2013/01/11/shading-regions-under-a-curve/</link>
		<comments>http://ucfagls.wordpress.com/2013/01/11/shading-regions-under-a-curve/#comments</comments>
		<pubDate>Fri, 11 Jan 2013 15:38:31 +0000</pubDate>
		<dc:creator>ucfagls</dc:creator>
				<category><![CDATA[Plotting]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[area under a curve]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[plotting]]></category>
		<category><![CDATA[polygon]]></category>

		<guid isPermaLink="false">http://ucfagls.wordpress.com/?p=593</guid>
		<description><![CDATA[Over on the Clastic Detritus blog, Brian Romans posted a nice introduction to plotting in R. At the end of his post, Brian mentioned he would like to colour in areas under the data curve corresponding to particular ranges of &#8230; <a href="http://ucfagls.wordpress.com/2013/01/11/shading-regions-under-a-curve/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ucfagls.wordpress.com&#038;blog=14744973&#038;post=593&#038;subd=ucfagls&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Over on the <a href="http://wp.me/p6FnH-13u">Clastic Detritus blog</a>, Brian Romans posted a nice introduction to plotting in R. At the end of his post, Brian mentioned he would like to colour in areas under the data curve corresponding to particular ranges of grain sizes. The comment area on a blog isn&#8217;t really amenable to giving a full answer to the problem posed so I gave a few pointers. Other commenters also suggested solutions.</p>
<p>The problem is how to shade or colour in areas under a curve. The more general problem is how to do this when you don&#8217;t have any data that fall on the margins of the regions you wish to shade. Here is my solution to that more general problem. <span id="more-593"></span></p>
<p>As I don&#8217;t have Brian&#8217;s data, let&#8217;s generate some similar data with the help of the <strong>mgcv</strong> package</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
library(mgcv)
set.seed(2) ## simulate some data... 
dat &lt;- gamSim(1, n = 400, dist = &quot;normal&quot;, scale = 2)
b &lt;- gam(y ~ s(x2), data = dat)
set.seed(42)
newX &lt;- with(dat, data.frame(x2 = sort(runif(100, min = min(x2), max = max(x2)))))
pred &lt;- predict(b, newdata = newX)

bed &lt;- data.frame(Volume = pred, Diameter = newX[,1])
</pre>
<p>I&#8217;m not going to explain that as it is but a means to an end. The resulting data can be plotted nicely via</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
plot(Volume ~ Diameter, data = bed, type = &quot;o&quot;, pch = 19)
</pre>
<p>with the resulting plot shown in Figure 1, below.</p>
<div id="attachment_604" class="wp-caption aligncenter" style="width: 410px"><a href="http://ucfagls.files.wordpress.com/2013/01/polygon_under_curve_data_figure_1.png"><img src="http://ucfagls.files.wordpress.com/2013/01/polygon_under_curve_data_figure_1.png?w=640" alt="Example data used to illustrate shading areas under a curve"   class="size-full wp-image-604" /></a><p class="wp-caption-text">Figure 1: Example data used to illustrate shading areas under a curve</p></div>
<p>To illustrate, let&#8217;s assume we want to shade the regions under the curve as defined by the following start and end points of four regions</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
from &lt;- c(0.1, 0.25, 0.37, 0.78)
to &lt;- c(0.25, 0.37, 0.63, 0.84)
</pre>
<p>To cut to the chase, here is my solution to the problem:</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
polyCurve &lt;- function(x, y, from, to, n = 50, miny,
                      col = &quot;red&quot;, border = col) {
    drawPoly &lt;- function(fun, from, to, n = 50, miny, col, border) {
        Sq &lt;- seq(from = from, to = to, length.out = n)
        polygon(x = c(Sq[1], Sq, Sq[n]),
                y = c(miny, fun(Sq), miny),
                col = col, border = border)
    }
    lf &lt;- length(from)
    stopifnot(identical(lf, length(to)))
    if(length(col) != lf)
        col &lt;- rep(col, length.out = lf)
    if(length(border) != lf)
        border &lt;- rep(border, length.out = lf)
    if(missing(miny))
        miny &lt;- min(y)
    interp &lt;- approxfun(x = x, y = y)
    mapply(drawPoly, from = from, to = to, col = col, border = border,
           MoreArgs = list(fun = interp, n = n, miny = miny))
    invisible()
}
</pre>
<p>Don&#8217;t worry, I&#8217;ll explain all of that in a minute; first let&#8217;s see <code>polyCurve()</code> in use</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
cols &lt;- c(&quot;red&quot;, &quot;forestgreen&quot;, &quot;navyblue&quot;, &quot;orange&quot;)
with(bed, plot(Diameter, Volume, type = &quot;o&quot;, pch = 19,
               panel.first =
               polyCurve(Diameter, Volume, from = from, to = to,
                         col = cols, border = &quot;black&quot;)))
</pre>
<p>The resulting plot should look like the one in Figure 2 below.</p>
<div id="attachment_605" class="wp-caption aligncenter" style="width: 410px"><a href="http://ucfagls.files.wordpress.com/2013/01/final_plot_polygon_under_the_curve.png"><img src="http://ucfagls.files.wordpress.com/2013/01/final_plot_polygon_under_the_curve.png?w=640" alt="Final plot showing the example data and four regions under the curve shaded"   class="size-full wp-image-605" /></a><p class="wp-caption-text">Figure 2: Final plot showing the example data and four regions under the curve shaded</p></div>
<p>Now the nitty gritty. <code>polyCurve()</code> takes the x and y data points, the start and end points of the areas to be shaded in (arguments <code>from</code> and <code>to</code>), an option to override the minimum value on the y axis to which the shading will extend (can be missing), plus vectors of colours for the fill and border of each polygon drawn.</p>
<p>Lines 3&ndash;8 define an internal function <code>drawPoly()</code>, which will actually draw a single polygon over the region under the curve defined by its arguments <code>from</code> and <code>to</code>. The first argument to <code>drawPoly()</code> is <code>fun</code>, which is a function that returns values on the curve for a vector of locations in the x variable/axis. We&#8217;ll see how this function is derived later. Notice that we pass in here some of the arguments previously described that control the look of the polygons and how far down on the y-axis the polygon will be drawn (<code>miny</code>).</p>
<p>The first line of <code>drawPoly()</code> generates an equally-spaced sequence of <code>n</code> values over the range of the polygon on the x-axis. <code>n</code> defaults to 50 values but this can be changed if needed especially if the data in that region are very wiggly or the region quite wide. The next part of <code>drawPoly()</code> uses the <code>polygon()</code> function to actually draw the polygon. We pass it the x coordinate as the sequence of values just created, with the first and last points in the sequence repeated. The y coordinates are supplied as the output from <code>fun()</code> for the sequence of points, augmented at the start and end by the value of <code>miny</code>. And that&#8217;s it.</p>
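<p>To see what is handed to <code>polygon()</code>, the vertices for a single region can be built by hand without drawing anything. This is only an illustration: the curve <code>fun</code> (a simple quadratic), the region limits <code>lo</code> and <code>hi</code>, and the value of <code>miny</code> are all made up, and <code>n</code> is kept small so that the result is easy to read.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
fun &lt;- function(x) x^2 ## stand-in for the interpolating function
lo &lt;- 0                ## start of the region on the x-axis
hi &lt;- 2                ## end of the region on the x-axis
n &lt;- 5
miny &lt;- -1             ## how far down the y-axis the polygon extends

Sq &lt;- seq(from = lo, to = hi, length.out = n)
vx &lt;- c(Sq[1], Sq, Sq[n])     ## first and last x repeated
vy &lt;- c(miny, fun(Sq), miny)  ## dropped to miny at both ends
cbind(x = vx, y = vy)
</pre>
<p>The repeated first and last x values, paired with <code>miny</code>, give the two bottom corners of the polygon; the <code>n</code> points in between trace the curve.</p>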
<p>Lines 9&ndash;16 of <code>polyCurve()</code> do some sanity checking and house keeping</p>
<ul>
<li>First the lengths of <code>from</code> and <code>to</code> are checked to see if they are equal</li>
<li>Then we check the lengths of the vectors of fill and border colours, <code>col</code> and <code>border</code> to see if these match up with the number of polygons to be drawn. If the lengths don&#8217;t match, we extend each vector to match the number of polygons to be drawn. This is a nice little feature that allows for a single colour to be supplied and have <code>polyCurve()</code> still work.</li>
<li>Finally, we check to see if argument <code>miny</code> was set by the user and if not we assign to it the minimum value taken by <code>y</code>.</li>
</ul>
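<p>As a rough illustration of these checks, here is a minimal sketch. <code>checkArgs()</code> is a hypothetical helper written for this example; it is not the code inside <code>polyCurve()</code>, and stopping with an error when the lengths differ is an assumption.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## hypothetical helper, for illustration only
checkArgs &lt;- function(from, to, y, col = NA, border = &quot;black&quot;, miny) {
    n &lt;- length(from)                     # number of polygons to draw
    if (length(to) != n)                  # assumption: a mismatch is an error
        stop(&quot;'from' and 'to' must be the same length&quot;)
    col &lt;- rep(col, length.out = n)       # recycle a single colour if needed
    border &lt;- rep(border, length.out = n)
    if (missing(miny))                    # default to the lowest y value
        miny &lt;- min(y)
    list(col = col, border = border, miny = miny)
}
chk &lt;- checkArgs(from = c(1, 3, 5), to = c(2, 4, 6), y = c(5, 3, 9),
                 col = &quot;red&quot;)
chk$col   # &quot;red&quot; repeated three times
chk$miny  # 3
</pre>
<p>Using <code>rep()</code> with <code>length.out</code> means a single colour and a full vector of colours are handled by the same line of code.</p>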
<p>The next line is where some R magic happens. Recall that <code>drawPoly()</code> takes a function as its first argument, which returns interpolated values along the data curve at specified x variable locations. This is where that function is created. <code>approxfun()</code> is one of those great little R functions that really saves a lot of time and coding. Essentially, <code>approxfun()</code> linearly (by default) interpolates a set of x and y coordinates. But crucially, and here is the kicker, it returns a <em>function</em> that, if given new locations for the x coordinate, will return interpolated values for the y coordinate.</p>
<p>Rather than interpolating the data curve for each region for which we want to draw a polygon, we interpolate the entire data curve with <code>approxfun()</code> and then reuse that function to generate the interpolated values we need when drawing each region&#8217;s polygon.</p>
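<p>A toy example may make the idea of a function that returns a function more concrete. The data below are made up for illustration and are not the <code>bed</code> data.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## made-up data, not the bed data used in this post
xx &lt;- c(0, 1, 2, 4)
yy &lt;- c(0, 2, 1, 5)
f &lt;- approxfun(xx, yy)  # f is itself a function of x
f(c(0.5, 3))            # linearly interpolated y values: 1 and 3
</pre>
<p>Once <code>f()</code> exists it can be reused for any set of x locations within the range of the data, without interpolating again.</p>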
<p>The last major piece of <code>polyCurve()</code> code is where we repeatedly call <code>drawPoly()</code>, once per region to be covered by a polygon. I could have done this part with a <code>for()</code> loop, iterating over the regions and calling <code>drawPoly()</code> with the appropriate <code>from</code>, <code>to</code>, <code>col</code> and <code>border</code>, etc. That would be relatively easy to do, but it is not really R-like. R provides a family of functions, known as the <code>apply</code> functions, which in many cases allow one to do away with an explicit <code>for()</code>. [Note that the loop is still there; it is just hidden away from view and in some cases done in compiled code rather than interpreted R code.]</p>
<p>We want to call <code>drawPoly()</code> for each set of corresponding <code>from</code>, <code>to</code>, <code>col</code> and <code>border</code> values. For this we need the <em>multivariate</em> <code>apply</code> function <code>mapply()</code>. We pass <code>mapply()</code> the function we wish to repeatedly call. After this we give the arguments we wish to call our function with. <code>mapply()</code> will call our function once for each set of corresponding elements of these arguments. That is, it will call <code>drawPoly()</code> with <code>from[i]</code>, <code>to[i]</code>, <code>col[i]</code> and <code>border[i]</code>, where <code>i</code> takes the value 1, 2, 3, &#8230; in turn. The final part of our <code>mapply()</code> call is to pass some extra arguments needed for <code>drawPoly()</code>; these arguments don&#8217;t change with each polygon so they are supplied as a list object to the <code>MoreArgs</code> argument. Notice that I name the elements of this list using the name of the <code>drawPoly()</code> argument I want each element passed on to.</p>
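<p>The same pattern can be seen in a small example that has nothing to do with plotting. <code>describeRegion()</code> is a hypothetical function made up for this illustration; the vectorised arguments are stepped through in parallel, while those in <code>MoreArgs</code> are passed unchanged to every call.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## hypothetical function, for illustration only
describeRegion &lt;- function(from, to, col, prefix, sep) {
    paste0(prefix, from, sep, to, &quot; in &quot;, col)
}
labs &lt;- mapply(describeRegion,
               from = c(1, 3), to = c(2, 4), col = c(&quot;red&quot;, &quot;blue&quot;),
               MoreArgs = list(prefix = &quot;region &quot;, sep = &quot;-&quot;))
labs  # &quot;region 1-2 in red&quot; &quot;region 3-4 in blue&quot;
</pre>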
<p>The final line is the last bit of housekeeping; <code>polyCurve()</code> returns nothing and does so invisibly.</p>
<p>Let&#8217;s return to the code we used to actually draw Figure 2 above</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
cols &lt;- c(&quot;red&quot;, &quot;forestgreen&quot;, &quot;navyblue&quot;, &quot;orange&quot;)
with(bed, plot(Diameter, Volume, type = &quot;o&quot;, pch = 19,
               panel.first =
               polyCurve(Diameter, Volume, from = from, to = to,
                         col = cols, border = &quot;black&quot;)))
</pre>
<p>This is a fairly standard call to <code>plot()</code>. The non-standard part is the use of the <code>panel.first</code> argument. This is actually an argument of <code>plot.default()</code>, the default <code>plot</code> method. It takes an R expression, a bit of R code, that will be run after the plotting region has been defined and axes drawn, but crucially <em>before</em> the data for the plot are actually drawn. This is where we want <code>polyCurve()</code> to be run, so the coloured polygons end up being drawn <em>underneath</em> the actual data. This produces a nicer looking plot than having the polygons drawn over the top of the data.</p>
<p>It is worth noting that there is a corresponding <code>panel.last</code> argument which works the same way but is only run once all the other plotting is complete. A further point to note is that these two arguments work nicely when the default <code>plot</code> is called, but they can break when other <code>plot</code> methods are called first. Things break because the expression supplied to <code>panel.first</code> might end up getting evaluated (run) <em>before</em> any plotting has even taken place, because the argument is being evaluated in the wrong place (at the wrong time). At the very least, <code>panel.first</code> will have no effect, but it might raise an error in some situations.</p>
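<p>A minimal sketch of the two arguments working together with the default <code>plot</code> method is shown below; the sine curve is made-up data chosen only for illustration. The grid is drawn underneath the data and the dashed reference line on top.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## made-up data, for illustration only
xx &lt;- seq(0, 2 * pi, length.out = 50)
plot(xx, sin(xx), type = &quot;o&quot;, pch = 19,
     panel.first = grid(),                         # drawn before the data
     panel.last = abline(h = 0, lty = &quot;dashed&quot;))   # drawn after the data
</pre>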
<p>So there we have it. Interpolating the data allows for a relatively concise solution to the problem of shading areas under a curve. It is a general solution not requiring one to have data at the boundaries of the regions to be shaded and as such doesn&#8217;t require any selection of data points within the region to draw the polygon through.</p>
<p>If you are still with me, it might be useful to visualise how <code>drawPoly()</code> and <code>polyCurve()</code> work, to see what each part of the process is doing.</p>
<p>First, set up a base plot onto which we can draw; this shows the data as before, but with the data points drawn in a smaller size.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
plot(Volume ~ Diameter, data = bed, type = &quot;o&quot;, pch = 19, col = &quot;black&quot;,
     cex = 0.5, main = &quot;Interpolated points on the\ndata curve&quot;)
</pre>
<p>Next, use <code>approxfun()</code> to produce an interpolation function for the data</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
FUN &lt;- with(bed, approxfun(Diameter, Volume))
</pre>
<p><code>FUN()</code> takes a single argument, the locations on the x variable for which interpolated y coordinates are to be returned, e.g.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
&gt; FUN((1:10) * 0.1) 
 [1]  8.394756 12.740232 12.004102  8.834530  7.239627  8.145023
 [7]  7.831734  5.340844  4.948277        NA
</pre>
<p>Notice that <code>NA</code> is returned for values outside the range of the data; this is the default behaviour of <code>approxfun()</code>. It can be changed via argument <code>rule</code> (with <code>rule = 2</code> the value at the nearest data extreme is returned instead), but we can&#8217;t get it to extrapolate beyond the range of the data.</p>
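<p>The effect of <code>rule</code> is easy to see with a few made-up points (again, not the <code>bed</code> data):</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## made-up data, for illustration only
xx &lt;- c(0, 1, 2, 4)
yy &lt;- c(0, 2, 1, 5)
f1 &lt;- approxfun(xx, yy)            # default, rule = 1
f2 &lt;- approxfun(xx, yy, rule = 2)  # use the value at the nearest extreme
f1(c(-1, 5))  # NA NA
f2(c(-1, 5))  # 0 5
</pre>
<p>Neither setting extends the line segments beyond the data; <code>rule = 2</code> simply holds the end values constant.</p>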
<p>Now, generate a set of x coordinates for the region of the curve we want to interpolate. Here I use the bit of the curve between the two peaks in the data.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
Sq &lt;- seq(from[3], to[3], length = 20)
</pre>
<p>We use 20 values here, so the plot we will produce in a minute isn&#8217;t overly crowded, but the more values you draw over the region, the smoother the fit to the data curve itself.</p>
<p>The interpolated values for this sequence of coordinates are given by <code>FUN()</code></p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
&gt; FUN(Sq)
 [1] 9.724055 9.300774 8.907145 8.565961 8.233448 7.924078
 [7] 7.675439 7.471385 7.315091 7.263203 7.216052 7.210351
[13] 7.339273 7.468195 7.600860 7.801945 7.996272 8.180440
[19] 8.347901 8.429620
</pre>
<p>and we can draw these locations on the plot via a call to <code>points()</code>, giving it the x coordinates, <code>Sq</code>, and the output from <code>FUN(Sq)</code></p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
points(Sq, FUN(Sq), col = &quot;#FF000088&quot;, pch = 19, type = &quot;o&quot;)
</pre>
<p>The points were drawn in red, with some alpha transparency so that the data and curve show through from underneath. The resulting plot should look like the one in the left hand panel of Figure 3 below.</p>
<div id="attachment_608" class="wp-caption aligncenter" style="width: 610px"><a href="http://ucfagls.files.wordpress.com/2013/01/working_polygon_under_curve1.png"><img src="http://ucfagls.files.wordpress.com/2013/01/working_polygon_under_curve1.png?w=640" alt="Illustrating the steps involved in interpolating the data curve and drawing a polygon under the curve"   class="size-full wp-image-608" /></a><p class="wp-caption-text">Figure 3: Illustrating the steps involved in interpolating the data curve and drawing a polygon under the curve</p></div>
<p>Now that we have a good handle on what <code>approxfun()</code> is doing, we can move on to the drawing of the polygon that will shade in the area under the curve defined by our region. Start a new plot as before</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
plot(Volume ~ Diameter, data = bed, type = &quot;o&quot;, pch = 19, col = &quot;black&quot;,
     cex = 0.5, main = &quot;The final polygon&quot;)
FUN &lt;- with(bed, approxfun(Diameter, Volume))
Sq &lt;- seq(from[3], to[3], length = 20)
</pre>
<p>We now need to do a few housekeeping steps that will make the subsequent plotting much easier.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
miny &lt;- with(bed, min(Volume))
xvals &lt;- c(Sq[1], Sq, Sq[20])
yvals &lt;- c(miny, FUN(Sq), miny)
col &lt;- &quot;#FF000088&quot;
</pre>
<p>First we looked up the minimum value of the y coordinate, <code>Volume</code>. Then we created the set of x and y coordinates through which we want the polygon drawn. For the x coordinates, notice how we extend the sequence by prepending the first element of <code>Sq</code> and appending the last element on to the vector of x coordinates <code>Sq</code>. We do this because we have two points with the same x coordinate at the edges of the region we want to cover in a polygon; one on the curve and one at the bottom of the plot. The y coordinates were generated by calling our interpolation function <code>FUN()</code>, and as with the x coordinates, we pad this vector of coordinates at both ends with the minimum value of <code>Volume</code>. This takes care of the vertices of the polygon that fall to the bottom of the plot. We also store the plotting colour so we don&#8217;t have to keep repeating it in the steps to follow.</p>
<p>Having done that bit of housekeeping, we can draw the polygon. In the code below I draw the actual polygon and overlay on it the vertices of the polygon through which R actually draws the line of the polygon</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
polygon(xvals, yvals, border = col)
points(xvals, yvals, col = col, pch = 19, type = &quot;o&quot;)
</pre>
<p>(Note that the behaviour of <code>polygon()</code> is to join the first and last vertices, hence we didn&#8217;t need to do that bit ourselves.)</p>
<p>At this point the plot should look like the right hand panel of Figure 3, above.</p>
<p>The entirety of Figure 3 can be reproduced via the following code</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
layout(matrix(1:2, ncol = 2))
op &lt;- par(mar = c(5,4,4,2) + 0.1)
## plot1
plot(Volume ~ Diameter, data = bed, type = &quot;o&quot;, pch = 19, col = &quot;black&quot;,
     cex = 0.5, main = &quot;Interpolated points on the\ndata curve&quot;)
FUN &lt;- with(bed, approxfun(Diameter, Volume))
Sq &lt;- seq(from[3], to[3], length = 20)
points(Sq, FUN(Sq), col = &quot;#FF000088&quot;, pch = 19, type = &quot;o&quot;)
## plot2
plot(Volume ~ Diameter, data = bed, type = &quot;o&quot;, pch = 19, col = &quot;black&quot;,
     cex = 0.5, main = &quot;The final polygon&quot;)
miny &lt;- with(bed, min(Volume))
xvals &lt;- c(Sq[1], Sq, Sq[20])
yvals &lt;- c(miny, FUN(Sq), miny)
col &lt;- &quot;#FF000088&quot;
polygon(xvals, yvals, border = col)
points(xvals, yvals, col = col, pch = 19, type = &quot;o&quot;)
par(op)
layout(1)
</pre>
]]></content:encoded>
			<wfw:commentRss>http://ucfagls.wordpress.com/2013/01/11/shading-regions-under-a-curve/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/af0cc2f46bd679e92029bc489cdde955?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ucfagls</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2013/01/polygon_under_curve_data_figure_1.png" medium="image">
			<media:title type="html">Example data used to illustrate shading areas under a curve</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2013/01/final_plot_polygon_under_the_curve.png" medium="image">
			<media:title type="html">Final plot showing the example data and four regions under the curve shaded</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2013/01/working_polygon_under_curve1.png" medium="image">
			<media:title type="html">Illustrating the steps involved in interpolating the data curve and drawing a polygon under the curve</media:title>
		</media:content>
	</item>
		<item>
		<title>Monotonic deshrinking in weighted averaging models</title>
		<link>http://ucfagls.wordpress.com/2013/01/05/monotonic-deshrinking-in-weighted-averaging-models/</link>
		<comments>http://ucfagls.wordpress.com/2013/01/05/monotonic-deshrinking-in-weighted-averaging-models/#comments</comments>
		<pubDate>Sat, 05 Jan 2013 17:03:32 +0000</pubDate>
		<dc:creator>ucfagls</dc:creator>
				<category><![CDATA[analogue]]></category>
		<category><![CDATA[Palaeoecology]]></category>
		<category><![CDATA[Palaeolimnology]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[deshrinking]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[transfer function]]></category>
		<category><![CDATA[weighted averaging]]></category>

		<guid isPermaLink="false">http://ucfagls.wordpress.com/?p=574</guid>
		<description><![CDATA[Weighted averaging regression and calibration is the most widely used method for developing a palaeolimnological transfer function. Such models are used to reconstruct properties of the past lake environment such as pH, total phosphorus, and water temperature with, it has &#8230; <a href="http://ucfagls.wordpress.com/2013/01/05/monotonic-deshrinking-in-weighted-averaging-models/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ucfagls.wordpress.com&#038;blog=14744973&#038;post=574&#038;subd=ucfagls&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Weighted averaging regression and calibration is the most widely used method for developing a palaeolimnological transfer function. Such models are used to reconstruct properties of the past lake environment such as pH, total phosphorus, and water temperature with, it has to be said, varying degrees of success and usefulness.</p>
<p>In simple weighted averaging (<acronym title="Weighted Averaging">WA</acronym>) there is little to specify other than the predictors (the species or other proxy data) and the response (the thing you wish to build a model for and predict). The one user-specified option in a simple WA is the type of deshrinking to use.</p>
<p>Why deshrinking?<span id="more-574"></span> Well, in a WA model, averages of the response are effectively taken twice; i) first when the WA optimum of the response variable for each taxon is computed, and ii) a second time when a weighted average of the optima for the species in each sample is computed to give the raw WA estimate of the response for each sample. Note that the weights here are the values in the predictor data matrix; usually these are species or taxon abundances (often in the form of proportions or percentages).</p>
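<p>The two averaging steps can be sketched in a few lines of base R. The abundance matrix <code>toyY</code> and environmental values <code>toyEnv</code> below are made up for illustration (three sites, three taxa); all of the object names are hypothetical.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## made-up abundances: rows are sites, columns are taxa
toyY &lt;- matrix(c(10, 5, 0,
                   6, 6, 2,
                   0, 4, 9), nrow = 3, byrow = TRUE,
               dimnames = list(paste0(&quot;site&quot;, 1:3), paste0(&quot;taxon&quot;, 1:3)))
toyEnv &lt;- c(4.5, 6.0, 7.5)  # made-up observed values, one per site

## step i: abundance-weighted average of the environment for each taxon
toyOpt &lt;- colSums(toyY * toyEnv) / colSums(toyY)
## step ii: abundance-weighted average of the optima for each site
toyRaw &lt;- ((toyY %*% toyOpt) / rowSums(toyY))[, 1]

## the range narrows at each step
rbind(observed = range(toyEnv), optima = range(toyOpt), rawWA = range(toyRaw))
</pre>
<p>Because each step is a weighted average, the optima cannot lie outside the range of the observed values, and the raw estimates cannot lie outside the range of the optima.</p>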
<p>The effect of taking averages twice is to shrink the range of possible estimates from a WA model. This can be illustrated using some of the tools from <a href="http://bit.ly/VFk6Le">analogue</a>. First load the package and the SWAP example data set</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## load analogue...
require(analogue)
## ...and some example data
data(swapdiat)
data(swappH)
</pre>
<p>Next, compute the WA pH optima for each diatom taxon</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
opt &lt;- optima(swapdiat, swappH) ## compute WA optima
</pre>
<p>The final step is to use some simple matrix algebra to give the raw WA estimates of the pH for each SWAP sample (site) [note that the <code>swapdiat</code> object needs to be cast as a matrix for the matrix algebra step, which we do using <code>data.matrix()</code>]</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
diat &lt;- data.matrix(swapdiat)  ## convert to a matrix
pred &lt;- ((diat %*% opt) / rowSums(diat))[, 1] ## compute raw WA estimates
</pre>
<p>I won&#8217;t go into the details of the last step; it is a reasonably optimised bit of R code to compute a weighted average of the pH optima for each species in a given sample. [The first part of the code involving the matrix multiplication operator <code>%*%</code> forms a weighted sum of the pH optima, with weights given by the abundances of the diatom taxa.]</p>
<p>From the above we can see that there are three types of pH value:</p>
<ul>
<li>the observed pH in <code>swappH</code></li>
<li>the pH optima for each taxon in <code>opt</code>, and</li>
<li>the WA estimate of pH for each site in <code>pred</code>.</li>
</ul>
<p>A quick look at the range of the pH values for each stage shows the shrinkage problem resulting from averaging</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
r.obs &lt;- range(swappH)
r.opt &lt;- range(opt)
r.wa &lt;- range(pred)

&gt; rbind(r.obs, r.opt, r.wa)
          [,1]    [,2]
r.obs 4.330000 7.25000
r.opt 4.500000 7.25000
r.wa  4.875662 6.62395
</pre>
<p>About a whole pH unit has been lost due to the repeated taking of averages.</p>
<p>Clearly, this is a problem for WA models, one that is addressed through the use of a deshrinking step.</p>
<p>Traditionally there were two approaches to deshrinking: <em>inverse</em> and <em>classical</em> deshrinking. In the inverse approach, we fit a model of the form</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cmathrm%7Bobserved%7D+%3D+%5Cbeta_0+%2B+%5Cbeta_1%5Cmathrm%7Bwa_%7Best%7D%7D+%2B+%5Cvarepsilon&amp;bg=ffffff&amp;fg=333333&amp;s=1' alt='&#92;mathrm{observed} = &#92;beta_0 + &#92;beta_1&#92;mathrm{wa_{est}} + &#92;varepsilon' title='&#92;mathrm{observed} = &#92;beta_0 + &#92;beta_1&#92;mathrm{wa_{est}} + &#92;varepsilon' class='latex' />,</p>
<p>which is a simple linear regression with the observed values as the response and the raw WA estimates as the predictor. The classical deshrinking approach flips the roles of the response and the predictor such that the model is</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cmathrm%7Bwa_%7Best%7D%7D+%3D+%5Cbeta_0+%2B+%5Cbeta_1%5Cmathrm%7Bobserved%7D+%2B+%5Cvarepsilon&amp;bg=ffffff&amp;fg=333333&amp;s=1' alt='&#92;mathrm{wa_{est}} = &#92;beta_0 + &#92;beta_1&#92;mathrm{observed} + &#92;varepsilon' title='&#92;mathrm{wa_{est}} = &#92;beta_0 + &#92;beta_1&#92;mathrm{observed} + &#92;varepsilon' class='latex' />.</p>
<p>Expanded WA estimates are achieved by taking the usual predicted values from the inverse model or by inverting the classical model equation.</p>
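<p>As a rough sketch of what these two steps amount to, both can be written with <code>lm()</code>. The data here are simulated purely for illustration and the object names are hypothetical; this is not the code inside <code>deshrink()</code>.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## simulated data: shrunken, noisy versions of the &quot;observed&quot; values
set.seed(42)
simObs &lt;- runif(50, min = 4.3, max = 7.3)
simWA &lt;- 5.8 + 0.6 * (simObs - 5.8) + rnorm(50, sd = 0.15)

## inverse: regress observed on raw WA and take the fitted values
invFit &lt;- unname(fitted(lm(simObs ~ simWA)))

## classical: regress raw WA on observed, then invert the equation
claCoef &lt;- unname(coef(lm(simWA ~ simObs)))
claFit &lt;- (simWA - claCoef[1]) / claCoef[2]

rbind(raw = range(simWA), inverse = range(invFit), classical = range(claFit))
</pre>
<p>Both sets of deshrunk values are linear functions of the raw estimates. The classical values are spread at least as widely as the inverse ones, because the product of the two regression slopes is the squared correlation, which cannot exceed 1.</p>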
<p>Both the inverse and classical deshrinking approaches are linear. The idea of using a non-linear deshrinking step has long been proposed in the literature, all the way back to ter Braak and Juggins (1993) and Marchetto (1994), but in practice the idea has not caught on, presumably because of a lack of widely-available and user-friendly software that implements the technique.</p>
<p>In the non-linear deshrinking approach the following model is used, which is very similar to that of the inverse approach</p>
<p><img src='http://s0.wp.com/latex.php?latex=%5Cmathrm%7Bobserved%7D+%5Csim+%5Cbeta_0+%2B+f_1%28%5Cmathrm%7Bwa_%7Best%7D%7D%29+%2B+%5Cvarepsilon&amp;bg=ffffff&amp;fg=333333&amp;s=1' alt='&#92;mathrm{observed} &#92;sim &#92;beta_0 + f_1(&#92;mathrm{wa_{est}}) + &#92;varepsilon' title='&#92;mathrm{observed} &#92;sim &#92;beta_0 + f_1(&#92;mathrm{wa_{est}}) + &#92;varepsilon' class='latex' /></p>
<p>where <img src='http://s0.wp.com/latex.php?latex=f_1%28%29&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='f_1()' title='f_1()' class='latex' /> is a smooth, monotonic function. The monotonicity constraint is important; as the raw WA estimate increases in value the expanded estimate should also increase. In other words we don&#8217;t allow for the expanded estimate to get smaller (bigger) as the raw WA estimate gets bigger (smaller).</p>
<p>New code in <strong>analogue</strong> implements this monotonic deshrinking idea. The seed of the actual implementation used here is murky.<sup><a href="#footnote1">1</a></sup> Briefly, both <strong>analogue</strong> and <strong>rioja</strong> fit a cubic regression spline via <code>s(wa.est, bs = "cr")</code>, but as the standard <code>gam()</code> function won&#8217;t constrain the smooth to be monotonic some additional steps have to be performed, and we end up fitting a penalised regression with monotonicity constraints invoked. The code to show how to do this in <strong>mgcv</strong> was sitting there in one of Simon&#8217;s man pages!</p>
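<p>To give a flavour of the approach, the following is a minimal sketch adapted from the monotonic spline example in the help page for <code>pcls()</code> in <strong>mgcv</strong>. The data are simulated, the object names are hypothetical, and this is not necessarily the code used inside <strong>analogue</strong> or <strong>rioja</strong>.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
library(mgcv)
set.seed(1)
## simulated data with an increasing trend, not the SWAP data
x &lt;- sort(runif(100) * 4 - 1)
y &lt;- exp(4 * x) / (1 + exp(4 * x)) + rnorm(100) * 0.1
dat &lt;- data.frame(x = x, y = y)

## unconstrained fit, used here only to supply a smoothing parameter
f.ug &lt;- gam(y ~ s(x, k = 10, bs = &quot;cr&quot;), data = dat)

## set up the cubic regression spline basis and the monotonicity constraints
sm &lt;- smoothCon(s(x, k = 10, bs = &quot;cr&quot;), dat, knots = NULL)[[1]]
con &lt;- mono.con(sm$xp)  # constraints for an increasing spline
G &lt;- list(X = sm$X, C = matrix(0, 0, 0), sp = f.ug$sp, p = sm$xp,
          y = y, w = y * 0 + 1, Ain = con$A, bin = con$b, S = sm$S, off = 0)

## penalised constrained least squares fit, then fitted values
p &lt;- pcls(G)
fv &lt;- drop(Predict.matrix(sm, data.frame(x = x)) %*% p)
</pre>
<p>The starting parameter values <code>sm$xp</code> are the knot locations, which are increasing and so satisfy the constraints from the outset. The smoothing parameter is simply borrowed from the unconstrained fit.</p>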
<p>So what does this look like? The figure below shows the raw WA pH estimates and the observed pH values for the SWAP diatom data we looked at earlier. The thick green line fitted through the points is the monotonic cubic regression spline fitted via <strong>mgcv</strong>.<br />
<div id="attachment_584" class="wp-caption aligncenter" style="width: 610px"><a href="http://ucfagls.files.wordpress.com/2013/01/monotonic_deshrinking_figure_1.png"><img src="http://ucfagls.files.wordpress.com/2013/01/monotonic_deshrinking_figure_1.png?w=640" alt="Monotonic deshrinking spline for the SWAP diatom pH data set"   class="size-full wp-image-584" /></a><p class="wp-caption-text">Figure 1: Monotonic deshrinking spline for the SWAP diatom pH data set</p></div><br />
The monotonic deshrinking and the plot were produced using</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## do the monotonic deshrinking
mono &lt;- deshrink(swappH, pred, type = &quot;monotonic&quot;)$env

## need to get things in increasing order for plotting
ord &lt;- order(pred)

## draw the data and the fitted monotonic cubic regression spline
plot(pred, swappH, asp = 1, ylim = r.obs, xlim = r.obs,
     panel.first = abline(0, 1, col = &quot;darkgrey&quot;, lwd = 2),
     ylab = &quot;Observed pH&quot;, xlab = &quot;Raw WA pH&quot;)
lines(mono[ord] ~ pred[ord], type = &quot;l&quot;, col = &quot;forestgreen&quot;, lwd = 2)
</pre>
<p>In this example, there appear to be two relationships between the raw WA estimates and the observed pH; a steeper one up to pH 6.0 and a somewhat less strong one thereafter.</p>
<p>Figure 2 shows a comparison of the three deshrinking methods discussed above.<br />
<div id="attachment_586" class="wp-caption aligncenter" style="width: 610px"><a href="http://ucfagls.files.wordpress.com/2013/01/monotonic_deshrinking_figure_2.png"><img src="http://ucfagls.files.wordpress.com/2013/01/monotonic_deshrinking_figure_2.png?w=640" alt="Comparison of inverse, classical and monotonic deshrinking for the SWAP diatom pH data set"   class="size-full wp-image-586" /></a><p class="wp-caption-text">Figure 2: Comparison of inverse, classical and monotonic deshrinking for the SWAP diatom pH data set</p></div><br />
The inverse and classical deshrinking values were computed using</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
inv &lt;- deshrink(swappH, pred, type = &quot;inverse&quot;)$env
cla &lt;- deshrink(swappH, pred, type = &quot;classical&quot;)$env
</pre>
<p>The figure was produced using</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
ylim &lt;- range(r.obs, r.wa, mono, inv, cla)
plot(pred, swappH, asp = 1, ylim = ylim, xlim = ylim,
     panel.first = abline(0, 1, col = &quot;darkgrey&quot;, lwd = 2),
     ylab = &quot;Observed pH&quot;, xlab = &quot;Raw WA pH&quot;)
ord &lt;- order(pred)
lines(inv[ord] ~ pred[ord], type = &quot;l&quot;, col = &quot;darkorange&quot;, lwd = 2)
lines(cla[ord] ~ pred[ord], type = &quot;l&quot;, col = &quot;navyblue&quot;, lwd = 2)
lines(mono[ord] ~ pred[ord], type = &quot;l&quot;, col = &quot;forestgreen&quot;, lwd = 2)
legend(&quot;topleft&quot;, legend = c(&quot;Monotonic&quot;, &quot;Inverse&quot;, &quot;Classical&quot;),
       col = c(&quot;forestgreen&quot;,&quot;darkorange&quot;,&quot;navyblue&quot;),
       lwd = 3, bty = &quot;n&quot;)
</pre>
<p>The monotonic deshrinking curve is quite similar to the inverse deshrinking one; this is not surprising as monotonic deshrinking is a local version of the inverse deshrinking model. The two curves only deviate from one another above pH 5.5.</p>
<p>There does seem to be a slight improvement in WA model performance when using monotonic deshrinking over the other deshrinking techniques, for the SWAP data set at least. Well, that was the conclusion of the paper John and I have been working on for a special issue of JoPL. But you&#8217;ll have to wait for the paper (and accompanying blog post) for the full details when it is accepted and in press.</p>
<p><strong>Notes</strong><br />
<sup id="footnote1">1</sup>Steve Juggins discussed the idea briefly in a presentation he gave at UCL a number of years ago and I had at the back of my mind been mulling how to do this in R without having to code the entire thing(!) myself. John Birks recently suggested that I use monotonic deshrinking in a comparison of transfer function methods we were doing for a special issue of the Journal of Paleolimnology (JoPL). It turns out that Richard Telford, a colleague of John&#8217;s, had discussed this with both John and Steve too, including the implementation used in <strong>analogue</strong> and, as it turned out, also in Steve&#8217;s <strong>rioja</strong> R package. It seems Steve and I implemented almost the same idea, independently; me after I&#8217;d been scouring Simon Wood&#8217;s exceedingly useful man pages for his <strong>mgcv</strong> package trying to find out how to do something with <code>gam()</code> to analyse time series data.</p>
<p><strong>References</strong><br />
Marchetto (1994; <em>Journal of Paleolimnology</em> <strong>12</strong>, 155&ndash;162)<br />
ter Braak &amp; Juggins (1993; <em>Hydrobiologia</em>, 269/270, 485&ndash;502)</p>
]]></content:encoded>
			<wfw:commentRss>http://ucfagls.wordpress.com/2013/01/05/monotonic-deshrinking-in-weighted-averaging-models/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/af0cc2f46bd679e92029bc489cdde955?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ucfagls</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2013/01/monotonic_deshrinking_figure_1.png" medium="image">
			<media:title type="html">Monotonic deshrinking spline for the SWAP diatom pH data set</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2013/01/monotonic_deshrinking_figure_2.png" medium="image">
			<media:title type="html">Comparison of inverse, classical and monotonic deshrinking for the SWAP diatom pH data set</media:title>
		</media:content>
	</item>
		<item>
		<title>A new version of analogue for a new year</title>
		<link>http://ucfagls.wordpress.com/2013/01/04/a-new-version-of-analogue-for-a-new-year/</link>
		<comments>http://ucfagls.wordpress.com/2013/01/04/a-new-version-of-analogue-for-a-new-year/#comments</comments>
		<pubDate>Fri, 04 Jan 2013 21:00:57 +0000</pubDate>
		<dc:creator>ucfagls</dc:creator>
				<category><![CDATA[analogue]]></category>
		<category><![CDATA[Palaeoecology]]></category>
		<category><![CDATA[Palaeolimnology]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://ucfagls.wordpress.com/?p=569</guid>
		<description><![CDATA[Yesterday I rolled up a new version (0.10-0) of analogue, my R package for analysing palaeoecological data. It is now available from CRAN. There were lots of incremental changes to Stratiplot() to improve the quality of the stratigraphic diagrams produced &#8230; <a href="http://ucfagls.wordpress.com/2013/01/04/a-new-version-of-analogue-for-a-new-year/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ucfagls.wordpress.com&#038;blog=14744973&#038;post=569&#038;subd=ucfagls&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Yesterday I rolled up a new version (0.10-0) of <a href="http://cran.r-project.org/web/packages/analogue/index.html">analogue</a>, my R package for analysing palaeoecological data. It is now available from CRAN.</p>
<p>There were lots of incremental changes to <code>Stratiplot()</code> to improve the quality of the stratigraphic diagrams produced and fix several annoying bugs. Also the definition of the standard error of MAT reconstructions was fixed; it is essentially a weighted variance but the original version assumed the weights summed to 1, which is not the case for dissimilarities of the <em>k</em>-NN.</p>
<p>Several new functions and additional functionality were added to the package:</p>
<ul>
<li><code>caterpillarPlot()</code> produces caterpillar plots showing species WA optima and tolerance ranges</li>
<li><code>splitSample()</code> is a convenience function for sampling a subset of training set samples whilst ensuring that the entire environmental gradient of interest in the training set is evenly sampled.</li>
<li>The <code>wa()</code> function received a lot of love in this iteration. The main addition is to allow non-linear deshrinking of the raw WA estimates alongside the more common inverse and classical deshrinking techniques. The deshrinking is achieved using a cubic regression spline fitted using the <code>gam()</code> function in package <a href="http://cran.r-project.org/web/packages/mgcv/index.html">mgcv</a>. The spline is constrained to be monotonic to make sure that the deshrunk values for increasing raw values are likewise increasing. Small tolerance handling in <code>wa()</code> with tolerance downweighting gained the option to replace small tolerances with the mean of all taxon tolerances.</li>
<li><code>logitreg()</code>, which applies a logistic regression to the problem of identifying a critical threshold in compositional dissimilarity for MAT models, saw a major update. The returned object was substantially altered to allow a wider range of information to be supplied to the user. <code>fitted()</code> and <code>predict()</code> methods for class <code>"logitreg"</code> were also added. These compute the fitted probabilities for the training set samples and for new (e.g. fossil) samples respectively. The probabilities are with respect to the analogue-ness of samples to the groups in the training set (e.g. vegetation biomes in the case of pollen data).<br />
These changes allow an analysis similar in spirit to that of <a href="http://dx.doi.org/10.1016/S0033-5894(03)00088-7">Gavin et al (2003, Quaternary Research 60; 356–367)</a> in their Figure 8. Here though logistic regression fits are used and not the ROC method they use.</li>
</ul>
<p>I&#8217;ll be writing more on these ideas, especially the monotonic deshrinking and the logistic regression approach to dissimilarity threshold choice in future posts.</p>
]]></content:encoded>
			<wfw:commentRss>http://ucfagls.wordpress.com/2013/01/04/a-new-version-of-analogue-for-a-new-year/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/af0cc2f46bd679e92029bc489cdde955?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ucfagls</media:title>
		</media:content>
	</item>
		<item>
		<title>Processing sample labels using regular expressions in R</title>
		<link>http://ucfagls.wordpress.com/2012/08/15/processing-sample-labels-using-regular-expressions-in-r/</link>
		<comments>http://ucfagls.wordpress.com/2012/08/15/processing-sample-labels-using-regular-expressions-in-r/#comments</comments>
		<pubDate>Wed, 15 Aug 2012 09:49:29 +0000</pubDate>
		<dc:creator>ucfagls</dc:creator>
				<category><![CDATA[Palaeoecology]]></category>
		<category><![CDATA[Palaeolimnology]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[data processing]]></category>
		<category><![CDATA[regular expression]]></category>
		<category><![CDATA[sediment core]]></category>

		<guid isPermaLink="false">http://ucfagls.wordpress.com/?p=555</guid>
		<description><![CDATA[I am often found in possession of palaeo core data where the sample identifiers contain a core code or label plus the sample depth. Often these are things generated by colleagues who have used other software where for one reason &#8230; <a href="http://ucfagls.wordpress.com/2012/08/15/processing-sample-labels-using-regular-expressions-in-r/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ucfagls.wordpress.com&#038;blog=14744973&#038;post=555&#038;subd=ucfagls&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>I am often found in possession of palaeo core data where the sample identifiers contain a core code or label plus the sample depth. Often these are things generated by colleagues who have used other software where for one reason or another they don&#8217;t want to store the depth information as a separate numeric variable. I also generate such data sets, not because I want to but because the software supplied with lab equipment often records data/measurements using a single character identifier variable (the most recent example is the Thermo Flash EA/Delta V I&#8217;ve been running stable N and C isotope measurements on).</p>
<p>The information in these labels is useful and I really don&#8217;t want to type out all the depths again and it&#8217;s not just because I am lazy; the more times you have to enter data the more opportunities for transcription errors to creep into your work and analysis. So I have things like this</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; (eg1 &lt;- paste0(&quot;CORE&quot;, 0:10 + 0.5))
 [1] &quot;CORE0.5&quot;  &quot;CORE1.5&quot;  &quot;CORE2.5&quot;  &quot;CORE3.5&quot;  &quot;CORE4.5&quot;  &quot;CORE5.5&quot;
 [7] &quot;CORE6.5&quot;  &quot;CORE7.5&quot;  &quot;CORE8.5&quot;  &quot;CORE9.5&quot;  &quot;CORE10.5&quot;
R&gt; (eg2 &lt;- paste0(&quot;FOO_&quot;, 0:10 + 0.5))
 [1] &quot;FOO_0.5&quot;  &quot;FOO_1.5&quot;  &quot;FOO_2.5&quot;  &quot;FOO_3.5&quot;  &quot;FOO_4.5&quot;  &quot;FOO_5.5&quot;
 [7] &quot;FOO_6.5&quot;  &quot;FOO_7.5&quot;  &quot;FOO_8.5&quot;  &quot;FOO_9.5&quot;  &quot;FOO_10.5&quot;
</pre>
<p>What can be done to process these sorts of data with R to extract the useful information?</p>
<p><span id="more-555"></span></p>
<p>With <code>eg2</code> we could split the strings on <code>_</code> using <code>strsplit()</code> and process the resulting components. For example</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; as.numeric(sapply(strsplit(eg2, &quot;_&quot;), `[`, 2))
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
</pre>
<p>To see how that code works, note that <code>strsplit()</code> returns a list with as many components as elements in the character vector supplied (e.g. <code>length(eg2)</code>). Each component of the list contains the individual character strings created by splitting.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; head(spl &lt;- strsplit(eg2, &quot;_&quot;), 2)
[[1]]
[1] &quot;FOO&quot; &quot;0.5&quot;

[[2]]
[1] &quot;FOO&quot; &quot;1.5&quot;
</pre>
<p>Notice that the depth information is in the second element of each list component. To access this information for the first component we might use <code>spl[[1]][2]</code> and the second one via <code>spl[[2]][2]</code>. Notice that the only thing that is changing here is the number in the <code>[[ ]]</code>. To each of the components of <code>spl</code> we are applying the <code>[</code> function with argument <code>2</code>; that can be automated via <code>sapply()</code> as shown above. The last part of the example just coerces the character vector of depths to a numeric one.</p>
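<p>A slightly stricter variant of the same idea, offered as a sketch rather than a recommendation, swaps <code>sapply()</code> for <code>vapply()</code> so that R checks each extracted piece really is a single character string, and uses <code>fixed = TRUE</code> so the <code>_</code> is treated as a literal string and not as a regular expression. The object names <code>parts</code> and <code>depth</code> are mine.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
eg2 &lt;- paste0(&quot;FOO_&quot;, 0:10 + 0.5)
## split each label on a literal underscore
parts &lt;- strsplit(eg2, &quot;_&quot;, fixed = TRUE)
## take the second piece of every component; vapply() fails loudly
## if any piece is not a length-1 character vector
depth &lt;- as.numeric(vapply(parts, `[`, character(1), 2))
depth
</pre>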
<p>All of that is a bit of a faff and won't work for <code>eg1</code> because there is nothing to split on. An alternative solution is to use regular expressions. I'm no regular expression expert and if there is anything in computing that will warp your feeble little mind it is a regular expression. However, these things are incredibly useful for matching or extracting bits of data from strings.</p>
<p>A <a href="http://en.wikipedia.org/wiki/Regular_expression">regular expression</a> contains placeholders or entities that you want to match or find within a given set of strings. For example, here is a modified version of <code>eg1</code> where the last element has a different format to the rest</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; (eg3 &lt;- c(eg1, &quot;12.5CORE&quot;))
 [1] &quot;CORE0.5&quot;  &quot;CORE1.5&quot;  &quot;CORE2.5&quot;  &quot;CORE3.5&quot;  &quot;CORE4.5&quot;  &quot;CORE5.5&quot;
 [7] &quot;CORE6.5&quot;  &quot;CORE7.5&quot;  &quot;CORE8.5&quot;  &quot;CORE9.5&quot;  &quot;CORE10.5&quot; &quot;12.5CORE&quot;
</pre>
<p>To match only those with one or more alphabetical characters at the start of the string we can use <code>"^[A-Za-z]+"</code> as our regular expression and the <code>grep()</code> function to do the matching</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; grep(&quot;^[A-Za-z]+&quot;, eg3, value = TRUE)
 [1] &quot;CORE0.5&quot;  &quot;CORE1.5&quot;  &quot;CORE2.5&quot;  &quot;CORE3.5&quot;  &quot;CORE4.5&quot;  &quot;CORE5.5&quot;
 [7] &quot;CORE6.5&quot;  &quot;CORE7.5&quot;  &quot;CORE8.5&quot;  &quot;CORE9.5&quot;  &quot;CORE10.5&quot;
</pre>
<p>The <code>[A-Za-z]</code> means match anything that is a letter in the English language alphabet. I added a qualifier, the <code>+</code>, which means match <em>one or more</em> of these letters. The last bit of the regular expression is the <code>^</code>, which anchors the match to the start of the string; anything that doesn't begin with one or more letters will not be matched. If you look carefully at the result, <code>"12.5CORE"</code> is missing because it doesn't start with one or more letters.<br />
To match one or more letters at the end of a string, the <code>$</code> can be used, e.g.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; grep(&quot;[A-Za-z]+$&quot;, eg3, value = TRUE)
[1] &quot;12.5CORE&quot;
</pre>
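<p>The two anchors can also be combined to check that a label has the expected format from start to finish. The sketch below uses <code>grepl()</code>, which returns a logical vector rather than the matching strings, to flag any labels that do not fit; the name <code>ok</code> is just for illustration.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
eg3 &lt;- c(paste0(&quot;CORE&quot;, 0:10 + 0.5), &quot;12.5CORE&quot;)
## TRUE where the whole label is letters followed by digits and/or dots
ok &lt;- grepl(&quot;^[A-Za-z]+[0-9.]+$&quot;, eg3)
## labels that do not follow the expected format
eg3[!ok]
</pre>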
<p>Let's turn our attention back to <code>eg1</code>. A regular expression that would match each component of the strings could be <code>"([A-Za-z]+)([0-9\\.]+)"</code>. The parentheses group the various parts of the expression, which we'll use in a moment. The first set of parentheses matches one or more letters whilst the second set matches one or more digits or decimal points. The decimal point has been escaped (which in R requires two backslashes, not the usual one) as it is a regular expression meta character (like <code>+</code> and <code>*</code>) that matches any single character. We want a literal <code>.</code> so we escape its usual meaning; strictly, a <code>.</code> inside a character class is already literal, so the escape here is harmless rather than necessary. As we now have a regular expression that will match the format of our sample labels we can proceed to manipulate them. This is where the parentheses come in. As I said, these group matches within the single expression. The matches within the parentheses can be referred to using backreferences. So I could use <code>\\1</code> to refer to the strings matched by the first set of parentheses and <code>\\2</code> to matches in the second set. Note we need to double backslash here as this is R.</p>
<p>To achieve our final goal of extracting the depth information from the sample labels we can combine this regular expression with the <code>gsub()</code> function, which does string replacement using regular expressions. If we think about what we want to do, we want to essentially replace the sample label with the extracted depth information to form a new set of strings. So we can match the two parts of our sample labels using our regular expression and replace them with a <a class="zem_slink" title="Regular expression" href="http://en.wikipedia.org/wiki/Regular_expression" rel="wikipedia" target="_blank">backreference</a> to the depth part matched by the second set of parentheses. For example:</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; gsub(&quot;([A-Za-z]+)([0-9\\.]+)&quot;, &quot;\\2&quot;, eg1)
 [1] &quot;0.5&quot;  &quot;1.5&quot;  &quot;2.5&quot;  &quot;3.5&quot;  &quot;4.5&quot;  &quot;5.5&quot;  &quot;6.5&quot;  &quot;7.5&quot;  &quot;8.5&quot;  &quot;9.5&quot;
[11] &quot;10.5&quot;
</pre>
<p>All that remains is to coerce that to numeric and we have our depth data</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; as.numeric(gsub(&quot;([A-Za-z]+)([0-9\\.]+)&quot;, &quot;\\2&quot;, eg1))
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
</pre>
<p><code>eg2</code> can be handled in a similar way but we need to add <code>_</code> to the characters matched by the first set of parentheses</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; as.numeric(gsub(&quot;([A-Za-z_]+)([0-9\\.]+)&quot;, &quot;\\2&quot;, eg2))
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
</pre>
<p>or add it as a literal <code>_</code> between the two sets</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; as.numeric(gsub(&quot;([A-Za-z]+)_([0-9\\.]+)&quot;, &quot;\\2&quot;, eg2))
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5
</pre>
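<p>One thing to watch with the <code>gsub()</code> approach is that a label which does not match the pattern is returned unchanged, and <code>as.numeric()</code> then turns it into <code>NA</code> with a warning. A defensive version might look something like the sketch below; <code>extractDepth()</code> is a hypothetical helper written for this post, not a function from any package, and it assumes labels are letters (and possibly underscores) followed by the depth.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## hypothetical helper: depth from labels such as &quot;CORE0.5&quot; or &quot;FOO_1.5&quot;
extractDepth &lt;- function(labels, pattern = &quot;^[A-Za-z_]+([0-9.]+)$&quot;) {
    depth &lt;- rep(NA_real_, length(labels))
    ok &lt;- grepl(pattern, labels)       # which labels fit the format?
    ## keep only the part matched by the parentheses
    depth[ok] &lt;- as.numeric(sub(pattern, &quot;\\1&quot;, labels[ok]))
    depth
}
extractDepth(c(&quot;CORE0.5&quot;, &quot;FOO_1.5&quot;, &quot;12.5CORE&quot;))
</pre>
<p>Labels that do not fit the format come back as <code>NA</code> without a coercion warning, which makes them easy to find and check by hand.</p>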
<p>If you had a more complicated data set with several cores in the same file, identified by a different core code, regular expressions can be used to extract the core and depth information. For example, given</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">

R&gt; set.seed(1)
R&gt; dat &lt;- data.frame(Label = paste0(rep(c(&quot;WAST&quot;, &quot;NAGA&quot;), each = 3), rep(0:2 + 0.5, 2)),
+                    Value = runif(6))
R&gt; dat
    Label     Value
1 WAST0.5 0.2655087
2 WAST1.5 0.3721239
3 WAST2.5 0.5728534
4 NAGA0.5 0.9082078
5 NAGA1.5 0.2016819
6 NAGA2.5 0.8983897
</pre>
<p>we could add site and depth data using</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
R&gt; rexp &lt;- &quot;([A-Za-z]+)([0-9\\.]+)&quot;
R&gt; dat &lt;- transform(dat, Site  = gsub(rexp, &quot;\\1&quot;, Label),
+                        Depth = as.numeric(gsub(rexp, &quot;\\2&quot;, Label)))
R&gt; dat
    Label     Value Site Depth
1 WAST0.5 0.2655087 WAST   0.5
2 WAST1.5 0.3721239 WAST   1.5
3 WAST2.5 0.5728534 WAST   2.5
4 NAGA0.5 0.9082078 NAGA   0.5
5 NAGA1.5 0.2016819 NAGA   1.5
6 NAGA2.5 0.8983897 NAGA   2.5
</pre>
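<p>If you are running R 3.4.0 or later (an assumption; the function is not available in earlier versions), <code>strcapture()</code> in the <strong>utils</strong> package offers another route to the same result: it takes a pattern with parenthesised groups plus a prototype data frame, and returns one column per group, already coerced to the prototype's types. A minimal sketch, with <code>proto</code> and <code>parts</code> as my own names:</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
dat &lt;- data.frame(Label = paste0(rep(c(&quot;WAST&quot;, &quot;NAGA&quot;), each = 3),
                                 rep(0:2 + 0.5, 2)),
                  stringsAsFactors = FALSE)
## prototype: one column per set of parentheses, with the desired types
proto &lt;- data.frame(Site = character(), Depth = numeric(),
                    stringsAsFactors = FALSE)
parts &lt;- strcapture(&quot;^([A-Za-z]+)([0-9.]+)$&quot;, dat$Label, proto)
dat &lt;- cbind(dat, parts)
dat
</pre>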
<p>These are just some very simple regular expressions but hopefully you can see their power and utility for manipulations of character data that palaeo-types often have to handle.</p>
]]></content:encoded>
			<wfw:commentRss>http://ucfagls.wordpress.com/2012/08/15/processing-sample-labels-using-regular-expressions-in-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/af0cc2f46bd679e92029bc489cdde955?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ucfagls</media:title>
		</media:content>
	</item>
		<item>
		<title>What&#8217;s wrong with LOESS for palaeo data?</title>
		<link>http://ucfagls.wordpress.com/2012/07/24/whats-wrong-with-loess-for-palaeo-data/</link>
		<comments>http://ucfagls.wordpress.com/2012/07/24/whats-wrong-with-loess-for-palaeo-data/#comments</comments>
		<pubDate>Tue, 24 Jul 2012 10:00:36 +0000</pubDate>
		<dc:creator>ucfagls</dc:creator>
				<category><![CDATA[Palaeoecology]]></category>
		<category><![CDATA[Palaeolimnology]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[Time series]]></category>
		<category><![CDATA[autocorrelation]]></category>
		<category><![CDATA[cross-validation]]></category>
		<category><![CDATA[GCV]]></category>
		<category><![CDATA[LOESS]]></category>
		<category><![CDATA[overfitting]]></category>

		<guid isPermaLink="false">http://ucfagls.wordpress.com/?p=523</guid>
		<description><![CDATA[Locally weighted scatterplot smoothing (LOWESS) or local regression (LOESS) is widely used to highlight &#8220;signal&#8221; in variables from stratigraphic sequences. It is a user-friendly way of fitting a local model that derives its&#160;form from the data themselves rather than having &#8230; <a href="http://ucfagls.wordpress.com/2012/07/24/whats-wrong-with-loess-for-palaeo-data/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ucfagls.wordpress.com&#038;blog=14744973&#038;post=523&#038;subd=ucfagls&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Local_regression">Locally weighted scatterplot smoothing</a> (LOWESS) or local regression (LOESS) is widely used to highlight &#8220;signal&#8221; in variables from stratigraphic sequences. It is a user-friendly way of fitting a local model that derives its&nbsp;form from the data themselves rather than having to be specified <em>a priori</em> by the user. There are generally two things that a user has to specify when using LOESS; <img src='http://s0.wp.com/latex.php?latex=%5Clambda&amp;bg=ffffff&amp;fg=333333&amp;s=1' alt='&#92;lambda' title='&#92;lambda' class='latex' /> the span or bandwidth of the local window and <img src='http://s0.wp.com/latex.php?latex=%5Calpha&amp;bg=ffffff&amp;fg=333333&amp;s=1' alt='&#92;alpha' title='&#92;alpha' class='latex' /> the degree of polynomial used in the local regression. Both control the smoothness of the fitted model, with smaller spans and higher degree polynomials giving less-smooth (more-rough) models. Usually it is just the span value that is changed, for expedience.</p>
<p>What could I possibly have against this? What <em>is</em> wrong with using LOESS for palaeo data?<br />
<span id="more-523"></span><br />
The problem as I see it stems from the way palaeolimnologists choose the LOESS parameters when fitting stratigraphic data. Quite often the default is chosen in whatever software is used. Some people will play around with the span testing out some values until they get a fit that they are happy with. The more statistically savvy palaeoecologist might use a cross-validation (CV) to choose the value of span that provides the best out-of-sample predictions of the observed data. For the latter, generalised cross-validation (GCV) would normally be applied to avoid repeated fitting to each CV fold or subset.</p>
<p>Using the default or the value that gives you a fit that appeals to you is simply not justifiable science. The default may be totally inappropriate for the data to hand and the signal one is expecting. Furthermore, the human brain is great at seeing pattern even where none exists. The smoothness of the fitted LOESS model needs to be chosen to avoid overfitting the observed data. You can&#8217;t do that by eye!</p>
<p>CV or GCV should help avoid overfitting the data but importantly can only do this for <em>independent</em> observations in their normal incarnations. Palaeo data are far from independent observations. I&#8217;ve blogged <a title="Smoothing temporally correlated&nbsp;data" href="http://ucfagls.wordpress.com/2011/07/21/smoothing-temporally-correlated-data/">before</a> about the problems of smoothing temporally correlated data. Those problems apply just as well to LOESS though they are harder to solve.</p>
<p>Why is this an issue? Well, the reason for fitting LOESS (or any smoother/model) to the stratigraphic data is to show any pattern or trend. With LOESS, what pattern or trend you get is determined by the data <em>and crucially</em> by the parameters chosen for the fit. Once you have the LOESS fit you <em>are</em> going to look at it, ponder what it means, interpret it in light of some other factors, posit a plausible mechanism for its generation. But what if the pattern or trend you&#8217;ve lovingly produced isn&#8217;t real? What if the features being pondered are statistically indistinguishable from no trend or pattern? LOESS makes it really easy to extract a pattern or trend and because it is a <q>proper</q> stats method it is often taken for granted that the pattern so derived is meaningful.</p>
<p>Consider the example data from the <a title="Smoothing temporally correlated&nbsp;data" href="http://ucfagls.wordpress.com/2011/07/21/smoothing-temporally-correlated-data/">earlier blog post</a>&nbsp;from Kohn et al (2000), where the aim is to uncover the known model from observations drawn from this model with moderate AR(1) noise. The data sample is generated using the following code:</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
set.seed(321)
n &lt;- 100
time &lt;- 1:n
xt &lt;- time/n
Y &lt;- (1280 * xt^4) * (1- xt)^4
y &lt;- as.numeric(Y + arima.sim(list(ar = 0.3713), n = n))
</pre>
<p>Several R functions can fit LOESS-like models (e.g. <code>lowess()</code> and <code>loess()</code> in base R but note these two are not the same type of LOESS model). The code chunk and figure below show three fits using different values for the span parameter.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## fit LOESS models
lo1 &lt;- loess(y ~ xt) ## span = 0.75
lo2 &lt;- update(lo1, span = 0.25)
lo3 &lt;- update(lo1, span = 0.5)
</pre>
<div id="attachment_531" class="wp-caption aligncenter" style="width: 610px"><a href="http://ucfagls.files.wordpress.com/2012/07/loess_span_examples3.png"><img class="size-full wp-image-531" title="loess_span_examples" src="http://ucfagls.files.wordpress.com/2012/07/loess_span_examples3.png?w=640" alt="Three LOESS fits to the example data using span = 0.75, 0.25 and 0.5"   /></a><p class="wp-caption-text">Three LOESS fits to the example data using span = 0.75, 0.25 and 0.5</p></div>
<p>The plot was produced using</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
COL &lt;- &quot;darkorange1&quot;
layout(matrix(1:4, nrow = 2))
plot(y ~ xt, xlab = expression(x[t]), ylab = expression(y[t]),
     main = expression(lambda == 0.75))
lines(Y ~ xt, lty = &quot;dashed&quot;, lwd = 1)
lines(fitted(lo1) ~ xt, col = COL, lwd = 2)
plot(y ~ xt, xlab = expression(x[t]), ylab = expression(y[t]),
     main = expression(lambda == 0.25))
lines(Y ~ xt, lty = &quot;dashed&quot;, lwd = 1)
lines(fitted(lo2) ~ xt, col = COL, lwd = 2)
plot(y ~ xt, xlab = expression(x[t]), ylab = expression(y[t]),
     main = expression(lambda == 0.5))
lines(Y ~ xt, lty = &quot;dashed&quot;, lwd = 1)
lines(fitted(lo3) ~ xt, col = COL, lwd = 2)
layout(1)
</pre>
<p>There is little to choose between <img src='http://s0.wp.com/latex.php?latex=%5Clambda&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lambda' title='&#92;lambda' class='latex' /> = 0.75 and <img src='http://s0.wp.com/latex.php?latex=%5Clambda&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lambda' title='&#92;lambda' class='latex' /> = 0.5 if we are comparing them with the known model (the dashed line), but suppose we don&#8217;t know the true model?</p>
<p>The optimal <img src='http://s0.wp.com/latex.php?latex=%5Clambda&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lambda' title='&#92;lambda' class='latex' /> according to GCV can be determined using a function modified from a posting to R-Help by Michael Friendly (the original function did more than compute GCV)</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
loessGCV &lt;- function (x) {
    ## Modified from code by Michael Friendly
    ## http://tolstoy.newcastle.edu.au/R/help/05/11/15899.html
    if (!(inherits(x,&quot;loess&quot;))) stop(&quot;Error: argument must be a loess object&quot;)
    ## extract values from loess object
    span &lt;- x$pars$span
    n &lt;- x$n
    traceL &lt;- x$trace.hat
    sigma2 &lt;- sum(resid(x)^2) / (n-1)
    gcv  &lt;- n*sigma2 / (n-traceL)^2
    result &lt;- list(span=span, gcv=gcv)
    result
}
</pre>
<p>The <code>optimize()</code> function can be used to find the value of <img src='http://s0.wp.com/latex.php?latex=%5Clambda&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lambda' title='&#92;lambda' class='latex' /> that achieves minimal GCV. A small wrapper function is required to link <code>optimize()</code> with <code>loessGCV()</code></p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
bestLoess &lt;- function(model, spans = c(.05, .95)) {
    f &lt;- function(span) {
        mod &lt;- update(model, span = span)
        loessGCV(mod)[[&quot;gcv&quot;]]
    }
    result &lt;- optimize(f, spans)
    result
}
</pre>
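<p>Because <code>optimize()</code> searches an interval and may settle on a local minimum if the criterion has more than one dip, it can be worth evaluating the same quantity over a grid of spans as a sanity check. The sketch below is self-contained: it regenerates the example data and computes the criterion with the same definition as <code>loessGCV()</code>. The helper name <code>gcvAt()</code> and the grid of spans are arbitrary choices of mine.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## regenerate the example data
set.seed(321)
n &lt;- 100
xt &lt;- (1:n) / n
Y &lt;- (1280 * xt^4) * (1 - xt)^4
y &lt;- as.numeric(Y + arima.sim(list(ar = 0.3713), n = n))

## hypothetical helper: the GCV criterion, as defined in loessGCV(),
## for a LOESS fit with the given span
gcvAt &lt;- function(span) {
    mod &lt;- loess(y ~ xt, span = span)
    sigma2 &lt;- sum(resid(mod)^2) / (mod$n - 1)
    mod$n * sigma2 / (mod$n - mod$trace.hat)^2
}

spans &lt;- seq(0.1, 0.95, by = 0.05)   # arbitrary grid
gcv &lt;- sapply(spans, gcvAt)
plot(spans, gcv, type = &quot;b&quot;, xlab = &quot;span&quot;, ylab = &quot;GCV&quot;)
spans[which.min(gcv)]                 # grid value with the smallest GCV
</pre>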
<p>The optimal <img src='http://s0.wp.com/latex.php?latex=%5Clambda&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lambda' title='&#92;lambda' class='latex' /> is chosen using <code>bestLoess()</code>. The optimal <img src='http://s0.wp.com/latex.php?latex=%5Clambda&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lambda' title='&#92;lambda' class='latex' /> is about 0.18 with a GCV of around 0.009</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
&gt; best &lt;- bestLoess(lo1)
&gt; best
$minimum
[1] 0.1813552

$objective
[1] 0.009433405
</pre>
<p>Our original LOESS model can be updated to use this <img src='http://s0.wp.com/latex.php?latex=%5Clambda&amp;bg=ffffff&amp;fg=333333&amp;s=0' alt='&#92;lambda' title='&#92;lambda' class='latex' /></p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
lo.gcv &lt;- update(lo1, span = best$minimum)
</pre>
<p>The fit this gives is shown in the figure below</p>
<div id="attachment_538" class="wp-caption aligncenter" style="width: 610px"><a href="http://ucfagls.files.wordpress.com/2012/07/loess_gcv_fit.png"><img class="size-full wp-image-538" title="loess_gcv_fit" src="http://ucfagls.files.wordpress.com/2012/07/loess_gcv_fit.png?w=640" alt="Optimal LOESS fit as determined  by GCV"   /></a><p class="wp-caption-text">Optimal LOESS fit as determined by GCV</p></div>
<p>This model clearly overfits, a result of the GCV criterion not knowing that the data are temporally autocorrelated. The whole process assumes that the data are independent and clearly palaeo data often violate this critical assumption. Without knowing the real underlying model, the average palaeo type would already be penning their next paper on remarkable variation in [INSERT TIME PERIOD] climate from [INSERT SITE]. Yet all that wiggliness, the signal, just isn&#8217;t real. Take another sample of data and you would get about the same level of wiggliness but in different places; the signal is just a figment of the sample of data you happen to have collected.</p>
<p>There are solutions to this problem of course; <em>h</em>-block CV has been suggested as a more appropriate means of CV for time series, where <em>h</em> observations either side of the target observation are left out from the data used to fit the model to predict the target. There are variations on this approach too, as <em>h</em>-block CV tends to over-fit in some situations. I&#8217;ll go into this in a bit more detail in a later posting.</p>
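<p>To give a flavour of the idea, here is a rough, self-contained sketch of <em>h</em>-block CV for a LOESS fit. It is only my reading of the description above, not a tested or published implementation: <code>hblockCV()</code> is a hypothetical helper, <code>h = 5</code> is an arbitrary choice, and <code>surface = &quot;direct&quot;</code> is used so that <code>predict()</code> can extrapolate when the left-out block sits at either end of the series.</p>
<pre class="brush: r; title: ; toolbar: false; notranslate">
## regenerate the example data
set.seed(321)
n &lt;- 100
xt &lt;- (1:n) / n
Y &lt;- (1280 * xt^4) * (1 - xt)^4
y &lt;- as.numeric(Y + arima.sim(list(ar = 0.3713), n = n))
dat &lt;- data.frame(xt = xt, y = y)

## hypothetical helper: mean squared prediction error under h-block CV
hblockCV &lt;- function(span, h, data) {
    n &lt;- nrow(data)
    err &lt;- numeric(n)
    for (i in seq_len(n)) {
        ## leave out the target and h observations either side of it
        out &lt;- max(1, i - h):min(n, i + h)
        fit &lt;- loess(y ~ xt, data = data[-out, ], span = span,
                     control = loess.control(surface = &quot;direct&quot;))
        err[i] &lt;- data$y[i] - predict(fit, newdata = data[i, , drop = FALSE])
    }
    mean(err^2)
}

spans &lt;- c(0.25, 0.5, 0.75)
cv &lt;- sapply(spans, hblockCV, h = 5, data = dat)
cbind(span = spans, cv = cv)
</pre>
<p>The span with the smallest value of <code>cv</code> would be the one preferred by this criterion, though the answer will depend on the value of <em>h</em> chosen.</p>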
<p>Be very careful using LOESS for palaeo data!</p>
<h2>References</h2>
<p>Kohn R., Schimek M.G., Smith M. (2000) Spline and kernel regression for dependent data. In Schimek M.G. (Ed) (2000) <em>Smoothing and Regression: approaches, computation and application</em>. John Wiley &amp; Sons, Inc.</p>
]]></content:encoded>
			<wfw:commentRss>http://ucfagls.wordpress.com/2012/07/24/whats-wrong-with-loess-for-palaeo-data/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/af0cc2f46bd679e92029bc489cdde955?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ucfagls</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2012/07/loess_span_examples3.png" medium="image">
			<media:title type="html">loess_span_examples</media:title>
		</media:content>

		<media:content url="http://ucfagls.files.wordpress.com/2012/07/loess_gcv_fit.png" medium="image">
			<media:title type="html">loess_gcv_fit</media:title>
		</media:content>
	</item>
		<item>
		<title>Sometimes I am lost for words&#8230;</title>
		<link>http://ucfagls.wordpress.com/2012/07/17/sometimes-i-am-lost-for-words/</link>
		<comments>http://ucfagls.wordpress.com/2012/07/17/sometimes-i-am-lost-for-words/#comments</comments>
		<pubDate>Tue, 17 Jul 2012 13:00:28 +0000</pubDate>
		<dc:creator>ucfagls</dc:creator>
				<category><![CDATA[OpenAccess]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[CC BY]]></category>
		<category><![CDATA[geoscience data]]></category>
		<category><![CDATA[John Wiley & Sons]]></category>
		<category><![CDATA[Open Access]]></category>
		<category><![CDATA[RCUK]]></category>
		<category><![CDATA[Research Council UK]]></category>
		<category><![CDATA[Royal Meteorological Society]]></category>
		<category><![CDATA[UK Government]]></category>
		<category><![CDATA[Wiley]]></category>

		<guid isPermaLink="false">http://ucfagls.wordpress.com/?p=493</guid>
		<description><![CDATA[Following on from yesterday&#8217;s momentous announcements&#160;from UK Government and RCUK on opening up access to research outputs funded by the British tax payer, John Wiley &#38; Sons Inc. have announced the Geoscience Data Journal&#160;in partnership with the Royal Meteorological Society, &#8230; <a href="http://ucfagls.wordpress.com/2012/07/17/sometimes-i-am-lost-for-words/">Continue reading <span class="meta-nav">&#8594;</span></a><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=ucfagls.wordpress.com&#038;blog=14744973&#038;post=493&#038;subd=ucfagls&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Following on from yesterday&#8217;s momentous announcements&nbsp;from UK Government and RCUK on opening up access to research outputs funded by the British tax payer, <a class="zem_slink" title="John Wiley &amp; Sons" href="http://www.wiley.com/" rel="homepage" target="_blank">John Wiley &amp; Sons</a> Inc. have <a href="http://eu.wiley.com/WileyCDA/PressRelease/pressReleaseId-104139.html">announced</a> the <a href="http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2049-6060">Geoscience Data Journal</a>&nbsp;in partnership with the <a class="zem_slink" title="Royal Meteorological Society" href="http://en.wikipedia.org/wiki/Royal_Meteorological_Society" rel="wikipedia" target="_blank">Royal Meteorological Society</a>, the Met Office and <a class="zem_slink" title="Natural Environment Research Council" href="http://www.nerc.ac.uk" rel="homepage" target="_blank">NERC</a>.</p>
<p>I first heard about this exciting new initiative via a submitted abstract from the EGU conference this summer. There need to be incentives for scientists to do more than just deposit data in repositories and data papers are a way to document their collection and processing and to acquire citations that indicate to the bean counters that scientists are making worthwhile contributions; having <em>impact</em> in the common parlance.</p>
<p>So I should be happy, right? Nope, not one bit!<br />
<span id="more-493"></span><br />
Recall that RCUK and UK Government have mandated that no more restrictions than those covered by the <a class="zem_slink" title="Creative Commons licenses" href="http://www.creativecommons.org/" rel="homepage" target="_blank">Creative Commons By Attribution</a> (<a class="zem_slink" title="CC BY" href="http://creativecommons.org/licenses/by/3.0/" rel="homepage" target="_blank">CC BY</a>) licence may be imposed on tax-payer funded research published in the scientific literature. CC BY is also the industry standard amongst scientists being compatible with the <a class="zem_slink" title="Budapest Open Access Initiative" href="http://en.wikipedia.org/wiki/Budapest_Open_Access_Initiative" rel="wikipedia" target="_blank">Budapest Open Access Initiative</a> (BOAI) definition of open access. Guess what licence Wiley have chosen for the Geoscience Data Journal? CC BY-NC, the non-commercial&nbsp;derivative&nbsp;of CC BY.</p>
<p>And this is where words start to fail me&#8230;</p>
<p>NERC is a major supporter of the Geoscience Data Journal. It is also part of RCUK and so bound by the recent announcements on open access. UK scientists in receipt of NERC funds will no longer be allowed to publish outputs from that research in the very journal NERC is supporting and was involved in setting up. This is staggeringly short-sighted of NERC, the editorial board and Wiley. It is not as if RCUK&#8217;s decision has come out of the blue; a draft has been available for many months for consultation (I even <a title="Thoughts on the proposed RCUK policy on Open Access to research&nbsp;outputs" href="http://ucfagls.wordpress.com/2012/04/10/thoughts-on-the-propose-rcuk-policy-on-open-access-to-research-outputs/">blogged</a> about it a while ago).</p>
<p>Quite why Wiley want to retain commercial rights to papers published in the journal is beyond me. They are obviously not&nbsp;satisfied&nbsp;with pocketing the £1000 ($1500) they will charge authors to publish papers in the journal.</p>
<p>This should have been a PR win for Wiley and a win for the whole geosciences community. Instead it paints a picture of a publisher out of touch and out of step with the open access landscape within which it is operating.</p>
<p>I emailed Wiley&#8217;s Science Press Room and Rob Allan, the journal&#8217;s editor, making these points and requesting that the licence be reconsidered. If you want to do the same, email Sciencenewsroom@Wiley.com for the attention of Ben Norman, and Rob Allan (rob.allan AT metoffice.gov.uk).&nbsp;My email is appended below. I do hope that they listen&#8230;</p>
<pre style="font-size:.8em;">Dear Ben, Rob,

I was alerted via Twitter to the press release from Wiley regarding the
new Geoscience Data Journal. I was excited to learn of the new journal
from an EGU abstract earlier in the year and have been looking forward
to further announcements. What should be a positive addition to the open
access landscape for the geoscience community now comes across as badly
out-dated and suggests that Wiley and the people and organisations
related to this announcement (NERC, Royal Meteorological Society etc)
are way behind the times when it comes to open access.

Why? The Geoscience Data Journal is licensed under a CC BY-NC licence.
Firstly this is not a true open access licence according to the Budapest
Open Access Initiative (BOAI) nor is it compatible with the new policy
on open access announced by Research Councils UK (of which NERC is a
part) yesterday. The BOAI has adopted CC BY as the level of restriction
that should be placed on open access research "publications". This
standard has been upheld by RCUK and the UK Government as the standard
to which it will hold recipients of its research funds. It is the height
of irony that, as it stands, no one in receipt of NERC funding would be
able to publish those funded outputs in this new journal, which NERC has
been instrumental in setting up. Similar restrictions related to
publishing EU-funded work/data are being discussed ahead of the next
major funding programme.

Springer, one of your major rivals, has adopted CC BY as the licence for
all its open access journals. Elsevier has been publicly lambasted by
the open access movement for its adoption of non-BOAI-compliant "open"
licences, which is but one reason that so many academics have pledged to
boycott that publisher.

By adopting the CC BY-NC licence for this journal you are severely
restricting the potential pool of authors, undermining the commonly-held
standard of CC BY for open access publishing, and going against the
ground swell of recent announcements from the UK Government and other
researcher funding agencies.

I do hope that Wiley will take another look at the licence under which
submissions to Geoscience Data Journal, and all its open access
offerings, are made available with a view to immediately adopting CC BY
instead.

I wish you well with this exciting new initiative, but unless the
licence is reconsidered, I and many other scientists will be unable,
morally or otherwise, to submit our data to the new journal.

Yours,

Dr. Gavin Simpson</pre>
]]></content:encoded>
			<wfw:commentRss>http://ucfagls.wordpress.com/2012/07/17/sometimes-i-am-lost-for-words/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/af0cc2f46bd679e92029bc489cdde955?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ucfagls</media:title>
		</media:content>
	</item>
		<item>
		<title>Quantitative palaeolimnology: my book chapters are finally out!</title>
		<link>http://ucfagls.wordpress.com/2012/04/23/quantitative-palaeolimnology-my-book-chapters-are-finally-out/</link>
		<comments>http://ucfagls.wordpress.com/2012/04/23/quantitative-palaeolimnology-my-book-chapters-are-finally-out/#comments</comments>
		<pubDate>Mon, 23 Apr 2012 13:33:16 +0000</pubDate>
		<dc:creator>ucfagls</dc:creator>
				<category><![CDATA[analogue]]></category>
		<category><![CDATA[Palaeoecology]]></category>
		<category><![CDATA[Palaeolimnology]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[vegan]]></category>
		<category><![CDATA[Numerical analysis]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://ucfagls.wordpress.com/?p=461</guid>
		<description><![CDATA[Today I received confirmation that the delayed fifth volume in the Developments in Palaeoenvironmental Research series has been published. The book is titled Data Handling and Numerical Techniques, though it covers more of the latter and, IMHO, is far more interesting than &#8230; <a href="http://ucfagls.wordpress.com/2012/04/23/quantitative-palaeolimnology-my-book-chapters-are-finally-out/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Today I received confirmation that the delayed <a href="http://www.springerlink.com/content/978-94-007-2744-1/">fifth volume</a> in the <a href="http://www.springerlink.com/content/1571-5299/">Developments in Palaeoenvironmental Research</a> series has been published. The book is titled <em>Data Handling and Numerical Techniques</em>, though it covers more of the latter and, IMHO, is far more interesting than the dry title would suggest (who gets excited by <em>Data Handling</em>? Well, one or two people perhaps ;-)</p>
<p>A <a href="http://www.springerlink.com/content/978-94-007-2744-1/#section=1058967&amp;page=12">full table of contents</a> can be found on the <a class="zem_slink" title="Springer Science+Business Media" href="http://www.springer.com" rel="homepage" target="_blank">SpringerLink</a> website, though in their infinite wisdom, this material is not available as <a class="zem_slink" title="HTML" href="http://en.wikipedia.org/wiki/HTML" rel="wikipedia" target="_blank">HTML</a> but as embedded previews of the pages, which you can download as PDFs (as you can each chapter but for the proper fee).</p>
<p>I authored or co-authored three of the 21 chapters:</p>
<ul>
<li>Chapter 9 <a href="http://dx.doi.org/10.1007/978-94-007-2745-8_9">Statistical Learning in Palaeolimnology</a> (with <a href="http://www.uib.no/persons/John.Birks">John Birks</a>)</li>
<li>Chapter 15 <a href="http://dx.doi.org/10.1007/978-94-007-2745-8_15">Analogue Methods in Palaeolimnology</a></li>
<li>Chapter 19 <a href="http://dx.doi.org/10.1007/978-94-007-2745-8_19">Human Impacts: Applications of Numerical Methods to Evaluate Surface-Water Acidification and Eutrophication</a> (with <a href="http://biology.uwaterloo.ca/people/roland-hall">Roland Hall</a>)</li>
</ul>
<div id="attachment_473" class="wp-caption aligncenter" style="width: 610px"><a href="https://ucfagls.files.wordpress.com/2012/04/classification_tree_scp_example.png"><img class="size-full wp-image-473" title="classification_tree_scp_example" src="https://ucfagls.files.wordpress.com/2012/04/classification_tree_scp_example.png?w=640" alt="Classification tree fitted to the SCP chemical data in DPER Chapter 9"   /></a><p class="wp-caption-text">Pruned classification tree fitted to the three-fuel spheroidal carbonaceous particle (SCP) example data. The predicted fuel types for each terminal node are shown, as are the split variables and thresholds that define the prediction rules.</p></div>
<p>All three chapters relied heavily upon R; the analyses for the first two were conducted entirely in R. Chapter 19 was written such a long time ago now that it wasn&#8217;t all done in R (WA-PLS and maximum likelihood transfer function methods weren&#8217;t then available in R, nor were some of the ordination-based methods I used). However, I&#8217;m confident the entire thing (at least the acidification parts) could be done using R now.</p>
<p>I have scripts for all the analyses performed using R. Some need a little work before I post them (mainly for Chapter 9) but I aim to maintain up-to-date scripts on my blog. Details soon.</p>
<p>Although writing these chapters took on a life of its own and used up far more time than it should have done, I am genuinely pleased with the results. I certainly learned a huge amount more about the statistics that underlie the techniques that palaeolimnologists and palaeoecologists use day in, day out.</p>
<p>To pre-empt requests for PDFs of the chapters: I don&#8217;t have any, so don&#8217;t ask. I&#8217;m not sure what arrangements were made originally with the publisher in that regard. I haven&#8217;t even seen the final book yet either; still waiting on my copy to pop through the letter box. If you want a copy and didn&#8217;t get in a pre-order at the discount rate, I suspect the best bet would be to pick one up later this summer at the <a href="http://paleolim.org/ips2012/">International Paleolimnology Symposium in Glasgow</a>, where I&#8217;m sure there will be a conference discount. If I hear of any offers in the meantime I&#8217;ll post something here.</p>
]]></content:encoded>
			<wfw:commentRss>http://ucfagls.wordpress.com/2012/04/23/quantitative-palaeolimnology-my-book-chapters-are-finally-out/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/af0cc2f46bd679e92029bc489cdde955?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ucfagls</media:title>
		</media:content>

		<media:content url="https://ucfagls.files.wordpress.com/2012/04/classification_tree_scp_example.png" medium="image">
			<media:title type="html">classification_tree_scp_example</media:title>
		</media:content>
	</item>
		<item>
		<title>Customising vegan&#8217;s ordination plots</title>
		<link>http://ucfagls.wordpress.com/2012/04/11/customising-vegans-ordination-plots/</link>
		<comments>http://ucfagls.wordpress.com/2012/04/11/customising-vegans-ordination-plots/#comments</comments>
		<pubDate>Wed, 11 Apr 2012 13:15:48 +0000</pubDate>
		<dc:creator>ucfagls</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[vegan]]></category>
		<category><![CDATA[Biplot]]></category>
		<category><![CDATA[ordination]]></category>
		<category><![CDATA[PCA]]></category>
		<category><![CDATA[Principal component analysis]]></category>

		<guid isPermaLink="false">http://ucfagls.wordpress.com/?p=437</guid>
		<description><![CDATA[As a developer on the vegan package for R, one of the questions I am asked most often is how to customise ordination diagrams, usually to colour the sample points according to an external grouping variable. Now, just because we get asked how to &#8230; <a href="http://ucfagls.wordpress.com/2012/04/11/customising-vegans-ordination-plots/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>As a developer on the <a href="http://vegan.r-forge.r-project.org">vegan</a> package for R, one of the questions I am asked most often is how to customise <a class="zem_slink" title="Ordination" href="http://en.wikipedia.org/wiki/Ordination_(statistics)" rel="wikipedia" target="_blank">ordination</a> diagrams, usually to colour the sample points according to an external grouping variable. Now, the fact that we get asked how to do this a lot is not really a reflection on the quality of the <code>plot()</code> methods available in vegan.</p>
<p>Ordination diagrams are difficult beasts to handle with plotting code without having an unwieldy number of arguments etc. There are potentially five sets of scores that need to be plotted so the number of arguments could quickly get out of hand if we allowed the user to pass all the relevant graphical parameters to the various sets of scores. We&#8217;ve provided a basic <code>plot()</code> method that is set up with useful defaults and works for all the ordination methods in vegan. This method is really there to allow the quick visualisation of the fitted ordination; I use vegan on almost a daily basis and none of my presentation or publication figures use the default plot method at all. However, we have provided all the tools needed to draw custom ordination plots within vegan. How you use them also provides a useful guide to building up base graphics plots from lower-level plotting functions. In this post I intend to show two examples of building up a simple <a class="zem_slink" title="Principal component analysis" href="http://en.wikipedia.org/wiki/Principal_component_analysis" rel="wikipedia" target="_blank">PCA</a> <a class="zem_slink" title="Biplot" href="http://en.wikipedia.org/wiki/Biplot" rel="wikipedia" target="_blank">biplot</a> from the basic building blocks available in vegan and R&#8217;s base graphics.</p>
<p><span id="more-437"></span></p>
<p>To get going, start R and load the vegan package. For this example I will use the classic Dutch Dune Meadow data set which I also load. A simple PCA of the species data is then fitted and stored in <code>mod</code>. To finish the basic example, I use the basic <code>plot()</code> method to plot the PCA biplot. Note that I&#8217;m using symmetric scaling here; I tend to prefer this scaling for general diagrams as it preserves many of the features of biplots without focusing on one or other sets of scores. (Here I&#8217;m also using it to illustrate how to select a scaling when you are building the plot from scratch.)</p>
<pre class="brush: r; light: true; title: ; notranslate">
## load vegan
require(&quot;vegan&quot;)

## load the Dune data
data(dune, dune.env)

## PCA of the Dune data
mod &lt;- rda(dune, scale = TRUE)

## plot the PCA
plot(mod, scaling = 3)
</pre>
<p>The resulting PCA biplot is shown below</p>
<div id="attachment_443" class="wp-caption aligncenter" style="width: 490px"><a href="https://ucfagls.files.wordpress.com/2012/04/custom_ordination_basic_plot_method.png"><img class="size-full wp-image-443" title="custom_ordination_basic_plot_method" src="https://ucfagls.files.wordpress.com/2012/04/custom_ordination_basic_plot_method.png?w=640" alt="Basic plot() method for ordinations in vegan"   /></a><p class="wp-caption-text">A call to the plot.cca() method produces a simple biplot of the Dune Meadows PCA, but the result is not very customisable.</p></div>
<h2>Building a biplot using vegan methods</h2>
<p>The first example of a customised biplot I will show uses low-level plotting methods provided by vegan. These include <code>points()</code> and <code>text()</code> methods for objects of class <code>"cca"</code>. (The <code>"cca"</code> object is one of the base ordination classes in vegan; its name is a bit unfortunate as it is the base representation for PCA, CA, RDA etc&#8230; objects — read more about this object via <code>?cca.object</code>.) I am going to plot the basic PCA biplot, but colour the sites according to the land-use variable <code>dune.env$Use</code>, which has three levels</p>
<pre class="brush: r; light: true; title: ; notranslate">
&gt; with(dune.env, levels(Use))
[1] &quot;Hayfield&quot; &quot;Haypastu&quot; &quot;Pasture&quot;
</pre>
<p>Start by defining an object holding the desired ordination scaling and a vector of colours with which to do the plotting</p>
<pre class="brush: r; light: true; title: ; notranslate">
scl &lt;- 3 ## scaling = 3
colvec &lt;- c(&quot;red2&quot;, &quot;green4&quot;, &quot;mediumblue&quot;)
</pre>
<p>The basic <code>plot()</code> method can be coerced into setting up the basic plot axes, limits etc for us by using <code>type = "n"</code>, so we use that short cut and pass along our desired scaling</p>
<pre class="brush: r; light: true; title: ; notranslate">
plot(mod, type = &quot;n&quot;, scaling = scl)
</pre>
<p>The next step is to add the site scores. Here I use the <code>points()</code> method rather than draw the site scores using their sample ID (as is the default in the <code>plot()</code> method).</p>
<pre class="brush: r; light: true; title: ; notranslate">
with(dune.env, points(mod, display = &quot;sites&quot;, col = colvec[Use],
                      scaling = scl, pch = 21, bg = colvec[Use]))
</pre>
<p>The key point to note in the code chunk above is how I colour each site according to its land-use. I index into the vector of colours created earlier using the factor <code>Use</code>. Internally, <code>Use</code> is stored as a vector of integer codes (1s, 2s, and 3s; there are three levels, remember). The expression <code>colvec[Use]</code> therefore evaluates to a vector the same length as the number of sites, where each element is one of the pre-specified colours</p>
<pre class="brush: r; light: true; title: ; notranslate">
 &gt; head(with(dune.env, colvec[Use]))
[1] &quot;green4&quot;     &quot;green4&quot;     &quot;green4&quot;     &quot;mediumblue&quot;
[5] &quot;green4&quot;     &quot;green4&quot;
</pre>
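<p>To see why indexing by a factor works, here is a minimal sketch using a small made-up factor rather than the Dune data; the objects <code>landuse</code> and <code>namedcols</code> are purely illustrative and not part of the example above. It relies only on base R, where a factor used as an index is treated as its underlying integer codes</p>
<pre class="brush: r; light: true; title: ; notranslate">
## toy factor with the same three levels as Use (made-up values)
landuse &lt;- factor(c(&quot;Pasture&quot;, &quot;Hayfield&quot;, &quot;Haypastu&quot;, &quot;Hayfield&quot;))
colvec &lt;- c(&quot;red2&quot;, &quot;green4&quot;, &quot;mediumblue&quot;)

## levels are sorted alphabetically; the codes index into them
levels(landuse)
codes &lt;- as.integer(landuse)

## indexing by the factor is the same as indexing by its codes
cols &lt;- colvec[landuse]
identical(cols, colvec[codes])

## an alternative: name the colours after the levels and index by name
namedcols &lt;- setNames(colvec, levels(landuse))
cols2 &lt;- namedcols[as.character(landuse)]
</pre>
<p>Indexing by name is a little more verbose, but it keeps working if the order of the colours and the order of the levels ever get out of step.</p>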
<p>The <code>display</code> argument selects the type of scores to plot. The remainder of the arguments are the scaling for the scores (so they match the base plot) and arguments to style the plotted points.</p>
<p>Next I add the species scores, but this time I want to label them with (abbreviated) species names. For this I use the <code>text()</code> method with argument <code>display = "species"</code></p>
<pre class="brush: r; light: true; title: ; notranslate">
text(mod, display = &quot;species&quot;, scaling = scl, cex = 0.8, col = &quot;darkcyan&quot;)
</pre>
<p>To finish the plot I add a legend. It is important to get the ordering and labelling of the colours correct here. When I drew the site scores I used the <code>Use</code> factor to index the vector of plotting colours. To ensure we get the correct ordering in the legend, it is best to extract the levels as the labels, which is what I do below</p>
<pre class="brush: r; light: true; title: ; notranslate">
with(dune.env, legend(&quot;topright&quot;, legend = levels(Use), bty = &quot;n&quot;,
                      col = colvec, pch = 21, pt.bg = colvec))
</pre>
<p>If you want to provide custom labels, look at the levels of the factor and provide them to <code>legend()</code> in the correct order.</p>
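<p>One way to keep custom labels in step with the colours is sketched below. The label text and the objects <code>labs</code> and <code>leglabs</code> are hypothetical, chosen only for illustration; the idea is to key the labels by the factor levels and then reorder them by <code>levels(Use)</code> so they line up with <code>colvec</code>. The legend is drawn on an empty plot here simply to keep the chunk self-contained</p>
<pre class="brush: r; light: true; title: ; notranslate">
library(&quot;vegan&quot;)
data(dune.env)
colvec &lt;- c(&quot;red2&quot;, &quot;green4&quot;, &quot;mediumblue&quot;)

## hypothetical display labels, named by the levels of Use
labs &lt;- c(Pasture = &quot;Pasture&quot;, Hayfield = &quot;Hay field&quot;,
          Haypastu = &quot;Hay pasture&quot;)

## guard against a missing or misspelled level name
stopifnot(setequal(names(labs), levels(dune.env$Use)))

## reorder the labels into level order, matching colvec
leglabs &lt;- unname(labs[levels(dune.env$Use)])

## empty plot region, then the legend with the custom labels
plot.new()
legend(&quot;topright&quot;, legend = leglabs, bty = &quot;n&quot;,
       col = colvec, pch = 21, pt.bg = colvec)
</pre>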
<p>The biplot should now look like the one below</p>
<div id="attachment_445" class="wp-caption aligncenter" style="width: 490px"><a href="https://ucfagls.files.wordpress.com/2012/04/custom_ordination_vegan_version.png"><img class="size-full wp-image-445" title="custom_ordination_vegan_version" src="https://ucfagls.files.wordpress.com/2012/04/custom_ordination_vegan_version.png?w=640" alt="Customised ordination built from vegan lower-level plot methods"   /></a><p class="wp-caption-text">A customised ordination built from vegan lower-level plot methods</p></div>
<h2>Building a biplot using base graphics directly</h2>
<p>If you want the ultimate level of control over the plots then you will want to build the plot up from scratch using lower-level plotting functions provided by base graphics. In this second example I&#8217;ll recreate the plot I made above but from the ground up using basic plotting functions.</p>
<p>First, start by extracting the scores needed from the ordination object. Here the scaling required for the plot needs to be provided and the sets of scores specified</p>
<pre class="brush: r; light: true; title: ; notranslate">
scrs &lt;- scores(mod, display = c(&quot;sites&quot;, &quot;species&quot;), scaling = scl)
</pre>
<p>This results in a list with two components; <code>sites</code> and <code>species</code></p>
<pre class="brush: r; light: true; title: ; notranslate">
str(scrs, max.level = 1)
</pre>
<p>Each component is a matrix with two columns containing the scores on the first and second principal components respectively. If you want to extract scores on different axes provide the <code>choices</code> argument to the <code>scores()</code> function with a numeric vector of the axes you wish to extract. Do note that the code below assumes only two axes are extracted.</p>
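<p>As a brief sketch of the <code>choices</code> argument, the chunk below extracts scores on the second and third axes instead. The object name <code>scrs23</code> is just an illustration and is not used elsewhere in this post; if you plot from it, the first column goes on the x-axis and the axis labels need changing to match</p>
<pre class="brush: r; light: true; title: ; notranslate">
library(&quot;vegan&quot;)
data(dune)
mod &lt;- rda(dune, scale = TRUE)

## site and species scores on axes 2 and 3, symmetric scaling
scrs23 &lt;- scores(mod, display = c(&quot;sites&quot;, &quot;species&quot;),
                 choices = c(2, 3), scaling = 3)

## still a list of two-column matrices, now for PC2 and PC3
colnames(scrs23$sites)
colnames(scrs23$species)
</pre>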
<p>Next compute the axis limits, which need to cover the range of the site and species scores on the first (x-axis) and second (y-axis) principal components</p>
<pre class="brush: r; light: true; title: ; notranslate">
xlim &lt;- with(scrs, range(species[,1], sites[,1]))
ylim &lt;- with(scrs, range(species[,2], sites[,2]))
</pre>
<p>Now everything is ready to do some actual plotting. Start preparing the plot device by starting a new plot with <code>plot.new()</code> and then set up the coordinate system via a call to <code>plot.window()</code>, supplying the axis limits created above. A crucial aspect of the call to <code>plot.window()</code> is the graphical parameter <code>asp</code>, which controls the aspect ratio of the plot. Here we set the aspect ratio equal to 1 to preserve the relationships between scores on the different axes and the distance interpretation of the biplot</p>
<pre class="brush: r; light: true; title: ; notranslate">
plot.new()
plot.window(xlim = xlim, ylim = ylim, asp = 1)
</pre>
<p>The reference guides (dotted lines going through the point (0,0)) are added first so as to not lie on top of any of the site or species scores. Two calls to the <code>abline()</code> function are used</p>
<pre class="brush: r; light: true; title: ; notranslate">
abline(h = 0, lty = &quot;dotted&quot;)
abline(v = 0, lty = &quot;dotted&quot;)
</pre>
<p>The next two lines of code use the default methods for <code>points()</code> and <code>text()</code> rather than the <code>"cca"</code> methods used above</p>
<pre class="brush: r; light: true; title: ; notranslate">
with(dune.env, points(scrs$sites, col = colvec[Use],
                      pch = 21, bg = colvec[Use]))
with(scrs, text(species, labels = rownames(species),
                col = &quot;darkcyan&quot;, cex = 0.8))
</pre>
<p>The legend is added in the same way as before</p>
<pre class="brush: r; light: true; title: ; notranslate">
with(dune.env, legend(&quot;topright&quot;, legend = levels(Use), bty = &quot;n&quot;,
                      col = colvec, pch = 21, pt.bg = colvec))
</pre>
<p>All that remains is to add the plot furniture; the axes, axis labels and the plot frame</p>
<pre class="brush: r; light: true; title: ; notranslate">
axis(side = 1)
axis(side = 2)
title(xlab = &quot;PC 1&quot;, ylab = &quot;PC 2&quot;)
box()
</pre>
<p>The fruits of our labours are shown below<br />
<div id="attachment_456" class="wp-caption aligncenter" style="width: 490px"><a href="https://ucfagls.files.wordpress.com/2012/04/custom_ordination_base_graphics_version1.png"><img src="https://ucfagls.files.wordpress.com/2012/04/custom_ordination_base_graphics_version1.png?w=640" alt="PCA ordination built up directly from base graphics elements" title="custom_ordination_base_graphics_version"   class="size-full wp-image-456" /></a><p class="wp-caption-text">The customised ordination built directly from base graphics elements</p></div></p>
<p>I don&#8217;t claim these are perfect; many of the labels lie on top of one another for example. Vegan has some functions to help with this but I&#8217;ll leave exemplifying those for another post.</p>
<p>The full code for the two examples is shown below</p>
<pre class="brush: r; light: true; title: ; notranslate">
## load vegan
require(&quot;vegan&quot;)

## load the Dune data
data(dune, dune.env)

## PCA of the Dune data
mod &lt;- rda(dune, scale = TRUE)

## plot the PCA
plot(mod, scaling = 3)

## build the plot up via vegan methods
scl &lt;- 3 ## scaling == 3
colvec &lt;- c(&quot;red2&quot;, &quot;green4&quot;, &quot;mediumblue&quot;)
plot(mod, type = &quot;n&quot;, scaling = scl)
with(dune.env, points(mod, display = &quot;sites&quot;, col = colvec[Use],
                      scaling = scl, pch = 21, bg = colvec[Use]))
text(mod, display = &quot;species&quot;, scaling = scl, cex = 0.8, col = &quot;darkcyan&quot;)
with(dune.env, legend(&quot;topright&quot;, legend = levels(Use), bty = &quot;n&quot;,
                      col = colvec, pch = 21, pt.bg = colvec))

## or via base graphics methods
scrs &lt;- scores(mod, display = c(&quot;sites&quot;, &quot;species&quot;), scaling = scl)
str(scrs, max.level = 1)

xlim &lt;- with(scrs, range(species[,1], sites[,1]))
ylim &lt;- with(scrs, range(species[,2], sites[,2]))

plot.new()
plot.window(xlim = xlim, ylim = ylim, asp = 1)
abline(h = 0, lty = &quot;dotted&quot;)
abline(v = 0, lty = &quot;dotted&quot;)
with(dune.env, points(scrs$sites, col = colvec[Use],
                      pch = 21, bg = colvec[Use]))
with(scrs, text(species, labels = rownames(species),
                col = &quot;darkcyan&quot;, cex = 0.8))
axis(1)
axis(2)
title(xlab = &quot;PC 1&quot;, ylab = &quot;PC 2&quot;)
with(dune.env, legend(&quot;topright&quot;, legend = levels(Use), bty = &quot;n&quot;,
                      col = colvec, pch = 21, pt.bg = colvec))
box()
</pre>
]]></content:encoded>
			<wfw:commentRss>http://ucfagls.wordpress.com/2012/04/11/customising-vegans-ordination-plots/feed/</wfw:commentRss>
		<slash:comments>45</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/af0cc2f46bd679e92029bc489cdde955?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">ucfagls</media:title>
		</media:content>

		<media:content url="https://ucfagls.files.wordpress.com/2012/04/custom_ordination_basic_plot_method.png" medium="image">
			<media:title type="html">custom_ordination_basic_plot_method</media:title>
		</media:content>

		<media:content url="https://ucfagls.files.wordpress.com/2012/04/custom_ordination_vegan_version.png" medium="image">
			<media:title type="html">custom_ordination_vegan_version</media:title>
		</media:content>

		<media:content url="https://ucfagls.files.wordpress.com/2012/04/custom_ordination_base_graphics_version1.png" medium="image">
			<media:title type="html">custom_ordination_base_graphics_version</media:title>
		</media:content>
	</item>
	</channel>
</rss>
