A better way of saving and loading objects in R

Hadley Wickham (@hadleywickham) mentioned on Twitter this week that he prefers saveRDS() to the more familiar save(). The function was new to me, so I thought I’d take a look…

save() and load() will be familiar to many R users. They allow you to save a named R object to a file or other connection and to restore that object later. When loaded, the object is restored to the current environment (in general use this is the global environment, i.e. the workspace) under the same name it had when saved, silently replacing any existing object of that name. This is annoying when, for example, you have a saved model object from a previous fit and you want to compare it with the model object returned when the R code is rerun. Unless you change the name of the model fit object in your script, you can’t have both the saved object and the newly created one available in the same environment at the same time.

Here’s an example of what I mean.

> require(mgcv)
Loading required package: mgcv
This is mgcv 1.7-13. For overview type 'help("mgcv-package")'.
> mod <- gam(Ozone ~ s(Wind), data = airquality, method = "REML")
> mod

Family: gaussian
Link function: identity

Formula:
Ozone ~ s(Wind)

Estimated degrees of freedom:
3.529  total = 4.529002

REML score: 529.4881
> save(mod, file = "mymodel.rda")
> ls()
[1] "mod"
> load(file = "mymodel.rda")
> ls()
[1] "mod"
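The clash is easier to see with something smaller than a model. The following is a minimal sketch, assuming a plain vector called x and a temporary file (both are illustrative choices, not part of the session above); it shows load() replacing an object that happens to share the saved name.

```r
# Sketch: load() restores objects under their saved names, replacing
# anything of the same name in the environment it loads into.
path <- tempfile(fileext = ".rda")   # hypothetical file location

x <- 1:3
save(x, file = path)

x <- "something else"                # reuse the name for a new object
restored <- load(path)               # load() invisibly returns the names it restored

restored                             # the character vector "x"
x                                    # the saved 1:3 again; the string is gone
```

Capturing the return value of load() is a cheap way to find out which names were just written into the environment.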

saveRDS() provides a far better solution to this problem and to the general one of saving and loading objects created with R. saveRDS() serializes an R object into a format that can be saved. Wikipedia describes this thus: “…serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and ‘resurrected’ later in the same or another computer environment.” save() does the same thing, but with one important difference: saveRDS() doesn’t save both the object and its name; it saves only a representation of the object. As a result, the saved object can be read back with readRDS() and assigned to a name different from the one it had when originally serialized.

We can illustrate this using the model fitted earlier:

> ls()
[1] "mod"
> saveRDS(mod, "mymodel.rds")
> mod2 <- readRDS("mymodel.rds")
> ls()
[1] "mod"  "mod2"
> identical(mod, mod2, ignore.environment = TRUE)
[1] TRUE

(Note that the two model objects have different environments within their representations so we have to ignore this when testing their identity.)
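For the use-case that motivated this post, comparing a saved fit with a refit, the workflow might look like the sketch below. It assumes lm() from base R in place of gam() so that no extra package is needed, and the object names and temporary file are hypothetical.

```r
# Sketch: keep a saved fit and a fresh fit side by side.
path <- tempfile(fileext = ".rds")   # hypothetical file location

fit_old <- lm(Ozone ~ Wind, data = airquality)
saveRDS(fit_old, path)

# Later (or in a fresh session): refit, then read the old fit under a new name
fit_new   <- lm(Ozone ~ Wind, data = airquality)
fit_saved <- readRDS(path)

# Both objects now exist in the same environment and can be compared
all.equal(coef(fit_saved), coef(fit_new))
```

Because readRDS() returns the object rather than assigning it, the name on the left-hand side is entirely up to you.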

You’ll notice that in the call to saveRDS() I named the file with the extension .rds. This appears to be the convention for serialized objects of this sort; R itself uses this representation often, for example for package meta-data and the databases used by help.search(). In contrast, the extension .rda is often used for objects serialized via save().
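As a rough illustration of R using .rds files internally, the sketch below reads the meta-data file of the installed stats package. The Meta/package.rds location is an assumption about how installed packages are laid out, so treat this as exploratory rather than a documented interface.

```r
# Sketch: peek at the serialized meta-data of an installed package.
# Assumes the installed package keeps a file at Meta/package.rds.
meta_file <- system.file("Meta", "package.rds", package = "stats")

meta <- readRDS(meta_file)

names(meta)                       # components stored in the file
meta$DESCRIPTION[["Package"]]     # package name as recorded in the meta-data
```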

So there you have it; saveRDS() and readRDS() are the newest additions to my day-to-day workflow.

Update: As Gabor points out in the comments, saveRDS() isn’t a drop-in replacement for save(). The main difference is that save() can save many objects to a file in a single call, whilst saveRDS(), being a lower-level function, works with a single object at a time. This is a feature for me given the above use-case, but if you find yourself saving more than a couple of objects at a time, saveRDS() may not be ideal for you. The second significant difference is that saveRDS() forgets the original name of the object; in the use-case above I see this as an advantage too. If maintaining the original name is important to you, then there is no reason to switch from save() to saveRDS().
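If you occasionally need several objects in one file but still want to choose the name on loading, one possible workaround is to bundle them into a named list and save that single list. This is a sketch under that assumption; the objects a and b and the temporary file are purely illustrative.

```r
# Sketch: store several objects in one .rds file by wrapping them in a list.
path <- tempfile(fileext = ".rds")   # hypothetical file location

a <- 1:10
b <- letters[1:3]

saveRDS(list(a = a, b = b), path)

bundle <- readRDS(path)              # any name will do for the restored list
names(bundle)                        # the list names chosen when saving
identical(bundle$a, a)               # individual objects are recovered by name
```

The list names stand in for the object names that save() would have recorded, so nothing is lost, but you still decide where the bundle lands.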


11 Responses to A better way of saving and loading objects in R

  1. G. Grothendieck says:

    Try

    save(mod, file = "mymodel.Rdata")
    load("mymodel.Rdata", envir = e <- new.env())
    identical(mod, e$mod, ignore.environment = TRUE)
    
    • ucfagls says:

      Indeed, thanks Gabor, and hence my qualification …in the same environment at the same time. I know there are ways round this using environments but the average user shouldn’t need them. saveRDS() allows, in my opinion, a nicer, cleaner, more general solution to saving and loading objects in R.

  2. G. Grothendieck says:

    saveRDS also has the deficiency of only being able to handle a single object at a time and you lose the name of the object in the process. If you only intend to save one object anyways and if retaining its name is unimportant then saveRDS is useful but otherwise save is better.

    • ucfagls says:

      You say deficiency, I say feature ;-) I have long stopped saving multiple objects (unless they are very small) into a single .rda file because I rarely found I had need for all the saved objects but loading them all was taking some time. Your points are highly relevant though; it occurs to me that the post could be taken as a suggestion that it was either or and I favoured saveRDS(). Both it and save() have their particular uses; the use-case I showed above was one I come across a lot in my work and so saveRDS() will prove useful to me but until Hadley mentioned it I wasn’t aware it existed.

  3. Gene says:

    I totally agree, the “serialized” saving is so important. There have been help requests asking for such a feature, but R seems committed to the current save / load paradigm that uses environments.

    I wrote my own function for this purpose. It works, but is crude. It depends on having only one saved object in your RData file.
    https://github.com/geneorama/geneorama/blob/master/R/loader.R

    Still, I always only save individual objects, and only use “loader”. I used to use “saver” too (in the same package), but as you mentioned in your use of Hadley’s package, save is sufficient.

  4. Rob says:

    I use this but I think I like Gabor’s solution better:

    rdx.file.contents <- function(rdx.filename)
    {
        load(rdx.filename)
        rm(rdx.filename)
        what <- ls()
        result <- lapply(X = what, FUN = get, envir = environment())
        names(result) <- what
        return(result)
    }

  5. G. Grothendieck says:

    Also in thinking about it a bit more if the function argument begins with a dot it won’t be included so we can reduce it further to just this:

    rdx.file.contents <- function(.rdx.filename) { load(.rdx.filename); as.list(environment()) }

  6. I’ve used the following which allows one to choose whether to return the contents of an rdx file as a list, an environment (with or without the function’s environment as parent) or to “spray” it around the global environment as `load` does.

    Load <- function (file, to.list = TRUE, Unlist = TRUE, spray = FALSE, delete.parent = FALSE) {
        NE <- if (spray) .GlobalEnv else new.env()
        load(file, NE)
        if (to.list) {
            NE <- as.list(NE)
            if (Unlist && length(NE) == 1) NE <- NE[[1]]
        }
        if (delete.parent) parent.env(NE) <- emptyenv()
        NE
    }

    • Unlist is for loading files containing only one object – if Unlist == TRUE then the single object is returned and its original name is lost.

    • … and it’s obvious that I’ve not used all options … e.g., if spray==TRUE it should return nothing immediately after second line, i.e add if(spray) return(invisible(NULL)) after load(file, NE). But of course, the “spray” argument here is not entirely serious.
