A quick tutorial on how to use Azure Cosmos DB from R.

What is Azure Cosmos DB?

Azure Cosmos DB is Microsoft’s globally-distributed multi-model low-latency database service. The Azure Cosmos DB API can be accessed via SQL, Javascript, Gremlin, MongoDB, Cassandra and Azure Table Storage.


Create A New Azure Cosmos DB Instance

Login to the Microsoft Azure Console and create a new resource.

Next, click Databases then Azure Cosmos DB

In the New account dialog we will be specifying:

  • ID - Unique name for this Azure Cosmos DB instance
  • API - Which API we will use to access Azure Cosmos DB.
  • Subscription - Which subscription this instance will run under
  • Resource Group - A new or existing resource group Azure Cosmos DB should run in
  • Location - Which Azure datacenter to run this instance of Azure Cosmos DB in.
  • Geo-redundancy - Optionally enable geo-redundancy.

Setting Description
ID Unique name used to identify this Azure Cosmos DB instance. The ID can contain only lowercase letters, numbers, and the hyphen (-) character, and it must contain 3 to 50 characters.
API API determines which type of account to create. There are 5 possible choices to select from. SQL (document database), MongoDB (document database), Gremlin (graph database), Azure Table and Cassandra.
Subscription Select which account to associate this Azure Cosmos DB instance too.
Resource Group Select or create a new resource group for this instance.
Location Select which geographic region to host your Azure Cosmos DB instance.
Enable geo-redundancy If selected, this will create a replicated version of your database in a second region.

After you click Create, Azure will start provisioning your new Cosmos DB instance. This process will take a few minutes.

After the provisioning process is complete, if you click on the All resources menu option, you should see something like the image below.

Azure Cosmos DB Credentials and Connection String

Before we’re able to login, you’ll need to view your Azure Cosmos DB credentials and connection settings by clicking on your newly created Azure Cosmos DB instance from the All resources menu. Then click Connection String under the Settings menu.

Create Azure CosmosDB Database and Collection

Now click on Browse under the Collections section.

Click the Add Collection button.

Next we will enter the names for our new database and collection. We’ll go ahead and use the defaults for the rest of this form. Click OK to create your new database and collection.

You might want to adjust the collections Throughput to 2,000-4,000 if you plan on inserting a large amount of data quickly initially into this collection. Throughput can also be adjusted downward later. See Request Units in Azure Cosmos DB for more information.

Connecting to Azure Cosmos DB from R

The R notebook for this demo is located on my Github account here:

For this tutorial, we’ll be using the following R packages.


The dataset we’ll be using the SpaceX Launch Data on Kaggle. This dataset contains launch, payload and outcome information for SpaceX missions.

Let’s load this dataset into R.

launch_df <- read.csv(here("input", "spacex_launch_data.csv"), stringsAsFactors=FALSE)

I won’t be doing any feature engineering or exploratory data analysis on this dataset, but lets quickly look at the shape of our data.

## Observations: 51
## Variables: 11
## $ Flight.Number      "1", "2", "3", "4", "5", "6", "7", "8", "9",...
## $ Date               "June 4, 2010", "December 8, 2010", "May 22,...
## $ Time..UTC.         "18:45", "15:43", "7:44", "0:35", "15:10", "...
## $ Booster.Version    "F9 v1.0", "F9 v1.0", "F9 v1.0", "F9 v1.0", ...
## $ Launch.Site        "CCAFS LC-40", "CCAFS LC-40", "CCAFS LC-40",...
## $ Payload            "Dragon Spacecraft Qualification Unit", "Dra...
## $ Payload.Mass..kg.  "", "", "525Â ", "500Â ", "677Â ", "500Â ", ...
## $ Orbit              "LEO", "LEO", "LEO", "LEO", "LEO", "Polar or...
## $ Customer           "SpaceX", "NASA (COTS) NRO", "NASA (COTS)", ...
## $ Mission.Outcome    "Success", "Success", "Success", "Success", ...
## $ Landing.Outcome    "Failure (parachutes)", "Failure (parachutes...

We can quickly see that we have 51 observations or rows that each have 11 columns. Lets go ahead and insert this data into the collection we previously created in Azure Cosmos DB.

mgo <- mongo(db = "demo", collection = "launch", url="[YOUR AZURE COSMOSDB PRIMARY CONNECTION STRING])
> mgo$insert(launch_df)
List of 5
 $ nInserted  : num 51
 $ nMatched   : num 0
 $ nRemoved   : num 0
 $ nUpserted  : num 0
 $ writeErrors: list()


Looks like our insert work. You can verify this by going to the Azure Cosmos DB Console and click on Data Explorer and expand the collection name you created for this tutorial. You should see something like this:

Querying data from Azure Cosmos DB

Now, that we have data to work in Azure Cosmos DB, lets perfom a few operations with our data.

Count number of records

[1] 51

Get all records

all <- mgo$find('{}')
   Flight_Number               Date Time__UTC_                               Booster_Version
1              1       June 4, 2010      18:45                                       F9 v1.0
2              2   December 8, 2010      15:43                                       F9 v1.0
3              3       May 22, 2012       7:44                                       F9 v1.0
4              4    October 8, 2012       0:35                                       F9 v1.0
5              5      March 1, 2013      15:10                                       F9 v1.0

Limiting returned results

limit <- mgo$find(limit=3)
  Flight_Number             Date Time__UTC_ Booster_Version Launch_Site
1             1     June 4, 2010      18:45         F9 v1.0 CCAFS LC-40
2             2 December 8, 2010      15:43         F9 v1.0 CCAFS LC-40
3             3     May 22, 2012       7:44         F9 v1.0 CCAFS LC-40

Find all GTO orbit missions

gto <- mgo$find('{"Orbit": "GTO"}')
   Flight_Number              Date Time__UTC_ Booster_Version  Launch_Site
1              7  December 3, 2013      22:41         F9 v1.1  CCAFS LC-40
2              8   January 6, 2014      22:06         F9 v1.1  CCAFS LC-40
3             11    August 5, 2014       8:00         F9 v1.1  CCAFS LC-40
4             12 September 7, 2014       5:00         F9 v1.1  CCAFS LC-40
5             16     March 2, 2015       3:50         F9 v1.1  CCAFS LC-40

Find all missions that had failed landings

fail <- mgo$find('{"Landing_Outcome": { "$regex": "^Fail", "$options": "i"}}')
   Flight_Number               Date Time__UTC_                               Booster_Version
1              1       June 4, 2010      18:45                                       F9 v1.0
2              2   December 8, 2010      15:43                                       F9 v1.0
3              6 September 29, 2013      16:00                                       F9 v1.1
4             13 September 21, 2014       5:52                                       F9 v1.1
5             14   January 10, 2015       9:47                                       F9 v1.1

Result projection

fail_projection <- mgo$find(
  query = '{"Landing_Outcome": { "$regex": "^Fail", "$options": "i"}}',
  fields = '{"Flight_Number": true, "Landing_Outcome": true, "_id": false}')

   Flight_Number                                                  Landing_Outcome
1              1                                             Failure (parachutes)
2              2                                             Failure (parachutes)
3              6                                                  Failure (ocean)
4             13                                                  Failure (ocean)
5             14                                             Failure (drone ship)
6             17                                             Failure (drone ship)
7             21                                             Failure (drone ship)
8             22                                             Failure (drone ship)
9             26                                             Failure (drone ship)
10          FH 1 Failure (drone ship), Success (ground pad), Success (ground pad)

See the Mongolite User Manual for more information on the R Mongolite syntax.


I’ve been very impressed with Azure Cosmos DB. Onboarding was quite easy. Most of my difficulties were finding the right syntax to use with the Mongolite driver.

Very impressed with the speed of Azure Cosmos DB. I did run into a few errors with inserting millions of records as fast as possible. That had to due with my Throughput being too low. Would be nice if there was an option to automatically scale throughput to a max value and warn me if that max value has been exceeded over a period of time.

Would love to hear your experiences with using Azure Cosmos DB.