Question
I am running into issues trying to use large objects in R. For example:
> memory.limit(4000)
> a = matrix(NA, 1500000, 60)
> a = matrix(NA, 2500000, 60)
> a = matrix(NA, 3500000, 60)
Error: cannot allocate vector of size 801.1 Mb
> a = matrix(NA, 2500000, 60)
Error: cannot allocate vector of size 572.2 Mb # Can't go smaller anymore
> rm(list=ls(all=TRUE))
> a = matrix(NA, 3500000, 60) # Now it works
> b = matrix(NA, 3500000, 60)
Error: cannot allocate vector of size 801.1 Mb # But that is all there is room for
I understand that this is related to the difficulty of obtaining contiguous blocks of memory (from [here](http://stat.ethz.ch/R-manual/R-patched/library/base/html/Memory-limits.html)):
Error messages beginning cannot allocate vector of size indicate a failure to obtain memory, either because the size exceeded the address-space limit for a process or, more likely, because the system was unable to provide the memory. Note that on a 32-bit build there may well be enough free memory available, but not a large enough contiguous block of address space into which to map it.
How can I get around this? My main difficulty is that I get to a certain point in my script and R can't allocate 200-300 Mb for an object. I can't really pre-allocate the block because I need the memory for other processing. This happens even when I diligently remove unneeded objects.
EDIT: Yes, sorry: Windows XP SP3, 4Gb RAM, R 2.12.0:
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_Caribbean.1252 LC_CTYPE=English_Caribbean.1252
[3] LC_MONETARY=English_Caribbean.1252 LC_NUMERIC=C
[5] LC_TIME=English_Caribbean.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
Answer
Consider whether you really need all this data explicitly, or whether the matrix can be sparse. R has good support for sparse matrices (see the Matrix package, for example).
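To illustrate the sparse option, here is a minimal sketch using `sparseMatrix()` from the Matrix package. It borrows the dimensions from the question, but the 1000 non-zero cells and the variable names are made up for illustration. It only helps if most of your entries really are zero.

```r
library(Matrix)  # recommended package, shipped with standard R installations

# Dimensions taken from the question
n_row <- 3500000
n_col <- 60

# Hypothetical data: 1000 non-zero cells, everything else is an implicit zero
i <- 1:1000
j <- rep(1:n_col, length.out = 1000)
sparse <- sparseMatrix(i = i, j = j, x = 1, dims = c(n_row, n_col))

# A dense numeric matrix of the same shape would need 8 bytes per cell
dense_bytes <- n_row * n_col * 8
sparse_bytes <- as.numeric(object.size(sparse))

dense_bytes / 1024^2    # size of the dense version in Mb (never allocated here)
sparse_bytes / 1024     # size of the sparse version in Kb
```

The dense size is computed by arithmetic rather than by allocating the matrix, so the sketch runs even on a machine that could not hold the dense version.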
Keep all other processes and objects in R to a minimum when you need to make objects of this size. Use `gc()` to release memory that is no longer used, or, better, create only the object you need in a given session.
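A rough sketch of that clean-up routine, together with the size arithmetic behind the error message in the question. `big_obj` is a hypothetical stand-in, deliberately much smaller than the matrices in the question.

```r
# matrix(NA, ...) creates a logical matrix, which uses 4 bytes per element,
# so the size can be estimated before trying to allocate it
bytes_needed <- 3500000 * 60 * 4
bytes_needed / 1024^2          # about 801.1, the figure in the error message

# Small stand-in object (hypothetical name)
big_obj <- matrix(NA, 1000, 60)
print(object.size(big_obj), units = "Kb")

# Drop the reference, then ask R to run a garbage collection
rm(big_obj)
invisible(gc())
```

Note that `gc()` can only return memory that nothing refers to any more, and on a 32-bit build it cannot guarantee that the freed address space ends up in one contiguous block.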
If the above cannot help, get a 64-bit machine with as much RAM as you can afford, and install 64-bit R.
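If you are not sure which build you are currently running, a quick check along these lines should tell you (the variable name `is_64bit` is just for illustration):

```r
# Pointer size is 8 bytes on a 64-bit build of R and 4 bytes on a 32-bit build
is_64bit <- .Machine$sizeof.pointer == 8
is_64bit

# The architecture string gives the same information, e.g. "i386" or "x86_64"
R.version$arch
```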
If you cannot do that, there are many online services for remote computing.
If you cannot do that, memory-mapping tools such as the ff package (or bigmemory, as Sascha mentions) will help you build a new solution. In my limited experience ff is the more advanced package, but you should read the High Performance Computing topic on CRAN Task Views.
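As a starting point for the ff route, here is a minimal sketch that creates a file-backed logical matrix with the dimensions from the question. It assumes ff is installed from CRAN (it is not part of base R), so the code is guarded and does nothing otherwise. The variable names are hypothetical.

```r
if (requireNamespace("ff", quietly = TRUE)) {
  library(ff)

  # File-backed matrix: the data live in a file on disk, not in R's address space
  big <- ff(vmode = "logical", dim = c(3500000, 60))

  # Ordinary matrix-style indexing pulls only the requested cells into RAM
  big[1:5, 1] <- TRUE
  print(big[1:5, 1:2])

  ff_dims <- dim(big)
  print(ff_dims)

  # Remove the backing file and the R-side handle when done
  delete(big)
  rm(big)
} else {
  message("Package 'ff' is not installed; install.packages('ff') to try this sketch.")
}
```

Treat this as a sketch rather than a recipe: how you chunk your processing over the file-backed object matters far more than the allocation itself.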