rmongodb 1.8.0

· · Read in about 2 min · (353 words)

Today I’m introducing new version of rmongodb (which I started to maintain) – v1.8.0. Install it from github:

library(devtools)
install_github("mongosoup/[email protected]")

Release version will be uploaded to CRAN shortly. This release brings a lot of improvements to rmongodb:

  1. Now rmongodb correctly handles arrays.
    • mongo.bson.to.list() rewritten from scratch. R’s unnamed lists are treated as arrays, named lists as objects. Also it has an option – whether to try to simplify vanilla lists to arrays or not.
    • mongo.bson.from.list() updated.
  2. mongo.cursor.to.list() rewritten and has slightly changed behavior – it doesn’t produce any type coercions while fetching data from cursor.
  3. mongo.aggregation() has new options to match MongoDB 2.6+ features. Also second argument now called pipeline (as it is called in MongoDB command).
  4. new function mongo.index.TTLcreate() – creating indexes with “time to live” property.
  5. R’s NA values now converted into MongoDB null values.
  6. many bug fixes (including troubles with installation on Windows) – see full list

I want to highlight some of changes.
The first most important is that now rmongodb correctly handles arrays. This issue was very annoying for many users (including me :-). Moreover about half of rmongodb related questions at stackoverflow were caused by this issue. In new version of package, mongo.bson.to.list() is rewritten from scratch and mongo.bson.from.list() fixed. I heavily tested new behaviour and all works very smooth. Still it’s quite big internal change, because these fucntions are workhorses for many other high-level rmongodb functions. Please test it, your feedback is very wellcome. For example here is convertion of complex JSON into BSON using mongo.bson.from.JSON() (which internally call mongo.bson.from.list()):

library(rmongodb)
json_string <- '{"_id": "dummyID", "arr":["string",3.14,[1,"2",[3],{"four":4}],{"mol":42}]}'
bson <- mongo.bson.from.JSON (json_string)

This will produce following MongoDB document:

{"_id": "dummyID", "arr":["string",3.14,[1,"2",[3],{"four":4}],{"mol":42}]}  

The second one is that mongo.cursor.to.list() has new behaviour: it returns plain list of objects without any coercion. Each element of list corresponds to a document of underlying query result. Additional improvement is that mongo.cursor.to.list() uses R’s environments to avoid extra copying, so now it is much more efficient than previous version (especially when fetching a lot of records from MongoDB).

In the next few releases I have plans to upgrade underlying mongo-c-driver-legacy to latest version 0.8.1.