How to Test Connections to the Complete PDSR Curated Data Set Using R - Enclave Linux VDI

How to Test Connections to the Complete PDSR Curated Data Set Using R - Enclave Linux VDI

Table of Contents

  • Introduction
  • Requirements or Prerequisites
  • Step-by-Step Instructions
    • Step 1. Install R library – odbc and other modules
    • Step 2. Configuring the User DSN in ~/.odbc.ini
    • Step 3. Use a connection token to access the SQL database
      • Example 1. Get the Access Token and Connect to PDSR
      • Results
  • Limitations
  • Relevant References

Purpose


This document outlines the requirements and step-by-step setup instructions for connecting to the Complete PDSR Curated Data Set Using R from the
Enclave Linux VDI or supported browsers.

Introduction

This quick start guide outlines the instructions for the successful connection to the Complete PDSR Curated Data Set (a SQL database in Azure) using R in the Enclave Linux VDI.

This solution is optimized for connecting to Azure Microsoft SQL Server with odbc using an access token from a Linux environment.

Requirements or Prerequisites

Before attempting to connect to the Azure Microsoft SQL Server from your Linux VDI, the following requirements are needed:

  • An Enclave Linux VDI domain account. If you do not have one, please complete and submit the Access Request Form.
  • Install required R packages and Azure packages (see Install Azure required packages below).
  • A Terminal (Command Line)
  • Familiarity with basic Linux commands
  • Basic understanding of SQL Server Management Studio (SSMS) and how to create queries and execute SQL commands.

Step-by-Step Instructions

Step 1. Install R library – odbc and other modules

You will need to install the R library – odbc. odbc is a popular open-source R package that facilitates the access of ODBC databases.

  • Open a Terminal window on your Linux Virtual Desktop
  • Install odbc R package

You can use the install.package command in the RStudio terminals to install the package. It will download and compile the odbc source code. Some related components and source files must be available for the compile to succeed.


On Ubuntu systems, all you need to do is run the following command in a terminal:
install.packages("odbc")

install.packages("odbc")
  • Check Microsoft ODBC Driver 18 configuration

You can use odbc via Microsoft ODBC Driver 18. This driver is preinstalled in the Linux VDI.

msodbcsql18_18.1.2.1-1_amd64.deb 03-Nov-2022

The configuration of the Microsoft ODBC Driver 18 is in odbcinst.ini. You can see the attached odbcinst.ini in the /etc folder.

Check the configuration of Microsoft ODBC Driver 18 by running the following command:

cat/etc/odbcinst.ini

It should look like this:

 

  • Install Azure required packages

In the RStudio terminals, you can install the following Azure required packages.


Install the AzureAuth package in R using the install.packages("AzureAuth") command. AzureAuth provides Azure Active Directory (AAD) authentication functionality for R users of Microsoft's Azure cloud. Use this package to obtain OAuth 2.0 tokens for Azure services including Azure Resource Manager, Azure Storage and others. Both AAD v1.0 and v2.0 are supported.


Install AzureRMR package in R using the install.packages("AzureRMR") command. AzureRMR is a package for interacting with Azure Resource Manager: list subscriptions, manage resource groups, deploy and delete templates andresources. It calls the Resource Manager REST API directly, so you don't need to have PowerShell or Python installed. Azure Active Directory OAuth tokens are obtained using the AzureAuth package.

The httpuv package must be installed to use the authorization_code method, as this requires a web server to listen on the (local) redirect URI.

Step 2. Configuring the User DSN in ~/.odbc.ini

In the odbc.ini file, you can configure your DSN name, SQL Server and port with Driver=Microsoft ODBC Driver 18 for SQL Server information.

You can copy the following example odbc.ini to your home folder as ~/.odbc.ini and modify it as needed.

Make sure to include the following lines to configure ~/.odbc.ini

In this example, the following entries are used:

database = PDSR_Monthly
driver= ODBC Driver 18 for SQL Server

Server = mgb-risc-lds-prod-e2-sqlmi.bce5b1eaf95b.database.windows.net


Both odbc.ini and odbcinst.ini are to be copy to /etc folder in the Linux VDI.

Step 3. Use a connection token to access the SQL database

Programming Examples Using odbc

Example 1. Get the Access Token and Connect to PDSR

The following examples uses Interactive Browser to ask for the user's managed identity and get the access token from Azure and connect to SQL Database, PDSR, in Azure.

Program

You can create your own test program by copying Connect to PDSR in Azure.R as below in RStudio.

# Load the libraries:

# Load the AzureAuth package into your R session using the library(AzureAuth) command.

# The main function in AzureAuth is get_azure_token, which obtains an OAuth

# token from AAD. The token is cached in a user-specific directory using the

# rappdirs package, and future requests will use the cached token without

# needing you to reauthenticate.

library(AzureAuth)

# Under the hood, AzureRMR uses a similar authentication process to the Azure CLI.

# The first time you authenticate with a given Azure Active Directory tenant,

# you call create_azure_login() and supply your credentials.

# AzureRMR will prompt you for permission to create a special data directory

# in which to cache the obtained authentication token and Resource Manager login.

# Once this information is saved on your machine, it can be retrieved in

# subsequent R sessions with get_azure_login(). Your credentials will be

# automatically refreshed so you don't have to reauthenticate.

library(AzureRMR)

# Use the OAuth token to connect to the Microsoft SQL Server in Azure.

# You can use any R package that supports connecting to SQL Server, such as odbc or RODBC.

library(odbc)

# on first use to authenticate with Azure AD:
# - on first login to this client, call create_azure_login()
# - on subsequent logins, call get_azure_login()
az <- create_azure_login()
list_azure_tokens()

#get the access token

resource <- "https://database.windows.net"

tenant <- "720edb1f-5c4e-4043-8141-214a63a7ead5"

app<- "04b07795-8ddb-461a-bbee-02f9e1bf7b46"

# Get Azure authentication token interactively

auth <- get_azure_token(resource,tenant,app)

# using auth as access token

accessToken <- auth$credentials$access_token;

conn <- dbConnect(odbc::odbc(), dsn = "PDSR_Monthly", Encrypt = "yes", attributes = list("azure_token" = accessToken))

# Perform SQL query

query <- "SELECT getdate()"

result <- dbGetQuery(conn, query)

#print the result

print(result)

You can verify your connection to the PDSR database in Azure by hitting the run button.

Results

The test program will ask you to pick your Azure account, which will be your Mass General Brigham-associated email address.

And then ask you to enter your MGB username and password in the pop-up window.

After verifying your credentials, you will see the message: “Authenticated with Azure Directory. Please close this page to return to R.


After closing this web page and returning to R, you should see the results (get date and time by getdate() function) .

Remember: The access token has a time limit before it expires. It lasts for ~60 minutes.

If you encounter an error describing the access token error, you will need to rerun this program to re-generate a new token

Limitations

Please Note: The above code was tested to run successfully under R 3.7.6 and ODBC Driver 18 for SQL Server

Relevant References

  1. How to connect Azure SQL database from R Function App using managed identity or access token - Microsoft Community Hub
  2. Microsoft Doc - How to connect Azure SQL database from Azure App Service Windows using managed identity. - Microsoft Community Hub
  3. PDSR LDS Dashboard in Collibra
  4. GitHub - Azure/AzureAuth: R package for OAuth 2.0 authentication with Azure Active Directory
  5. GitHub - Azure/AzureRMR: R package for interacting with Azure Resource ManagerEdit Labels
  6. https://cran.r-project.org/web/packages/odbc/odbc.pdf
  7. https://rdrr.io/github/rstats-db/odbc/src/R/Connection.R

 

Go to KB0040356 in the IS Service Desk

Related articles

Analytics Enclave

IDEA Platform