Working with APIs from R

Steven Mortimer
November 15, 2016

2 Common Webservice API Types

Resources:
The Difference Between SOAP and REST
SOAP vs REST Challenges

SOAP
Simple Object Access Protocol

Protocol agnostic
(HTTP, SMTP, TCP, or JMS)
Typically XML
Definitions provided by WSDL
(Web Service Description Language)

REST
Representational State Transfer

Noun-Verb Paradigm
(HTTP GET/POST/PUT/DELETE)
Typically formatted as JSON
(JavaScript Object Notation)

2 Common API Data Formats

XML

<person>
  <firstname>Rick</firstname>
  <lastname>James</lastname>
  <occupation>legend</occupation>
</person>

Favored by SOAP APIs
Traditional format

JSON

{
  "person": {
    "firstname": "Rick",
    "lastname": "James",
    "occupation": "legend"
  }
}

Favored by REST APIs
A more modern, flexible approach
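
Both records above can be read into R with the packages this deck loads later (xml2 and jsonlite). The following is a minimal sketch; the variable names are illustrative only.

```r
library(xml2)
library(jsonlite)

# The same person record in both formats, copied from the slides above
xml_txt <- "<person>
  <firstname>Rick</firstname>
  <lastname>James</lastname>
  <occupation>legend</occupation>
</person>"

json_txt <- '{
  "person": {
    "firstname": "Rick",
    "lastname": "James",
    "occupation": "legend"
  }
}'

# XML: parse the document, then pull the text of a child node by name
person_xml <- read_xml(xml_txt)
xml_first  <- xml_text(xml_find_first(person_xml, "firstname"))

# JSON: parse into a nested list and index by name
person_json <- fromJSON(json_txt)
json_first  <- person_json$person$firstname

xml_first
json_first
```

With XML you navigate nodes; with JSON you usually get a nested R list that can be indexed directly.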

HTTP

Resources:
Extracting data from the web: APIs and beyond

Verbs for different actions
- GET, PUT, POST, DELETE
Authentication
- None, Basic, OAuth 2.0
Use httr for managing HTTP requests in R

install.packages('httr')
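
Each HTTP verb has a function of the same name in httr. As a rough sketch, the hypothetical helper below (try_verbs is not part of httr) sends one request per verb to httpbin.org, the public echo service used later in this deck, and collects the status codes. It needs network access, so the call is guarded.

```r
library(httr)

# Hypothetical helper: one request per verb against httpbin.org
try_verbs <- function(base = "http://httpbin.org") {
  responses <- list(
    GET    = GET(paste0(base, "/get"), query = list(q = "godfather")),
    POST   = POST(paste0(base, "/post"),
                  body = list(title = "The Godfather"), encode = "json"),
    PUT    = PUT(paste0(base, "/put"),
                 body = list(year = 1972), encode = "json"),
    DELETE = DELETE(paste0(base, "/delete"))
  )
  # Return the HTTP status code of each response, named by verb
  sapply(responses, status_code)
}

# Only run when called from an interactive session with network access
if (interactive()) print(try_verbs())
```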

Required Packages

Before you get started, run the snippet of code below:

options(stringsAsFactors = FALSE)

library(dplyr)
library(purrr)
library(httr)
library(jsonlite)
library(xml2)

options(httr_oauth_cache = TRUE)

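# all of these packages are installed with the tidyverse:
# install.packages('tidyverse')
# note that library(tidyverse) attaches only the core packages
# (including dplyr and purrr), so httr, jsonlite and xml2
# still need their own library() calls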

API with Simple Authentication (Key in URL)

The Open Movie Database API at: http://www.omdbapi.com/

resp <- GET(paste0('http://www.omdbapi.com/',
                   '?t=The+Godfather&plot=short&r=xml&apikey=4439909d'))
resp

Response [http://www.omdbapi.com/?t=The+Godfather&plot=short&r=xml&apikey=4439909d]
  Date: 2019-06-10 16:08
  Status: 200
  Content-Type: text/xml; charset=utf-8
  Size: 852 B
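
Pasting the query string by hand works, but httr can also build and escape it from a named list. This is a sketch under one assumption: the environment variable OMDB_API_KEY is a hypothetical name, and "YOUR_KEY" is a placeholder for your own key.

```r
library(httr)

# Hypothetical: read the key from an environment variable instead of
# hard-coding it in the script
api_key <- Sys.getenv("OMDB_API_KEY", unset = "YOUR_KEY")

# modify_url() assembles and URL-encodes the query parameters
omdb_url <- modify_url(
  "http://www.omdbapi.com/",
  query = list(t = "The Godfather", plot = "short", r = "xml", apikey = api_key)
)
omdb_url

# Round trip: parse_url() decodes the query back into a named list
parse_url(omdb_url)$query$t
```

The same list can be passed straight to the request, as in GET('http://www.omdbapi.com/', query = list(...)).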

parsed_xml <- read_xml(content(resp, as="raw"))
parsed_xml

{xml_document}
<root response="True">
[1] <movie title="The Godfather" year="1972" rated="R" released="24 Mar  ...
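
The movie's fields are stored as attributes of the movie node, so they can be read one at a time with xml_attr(). The sketch below uses a shortened copy of the response printed above rather than a live request, so it runs offline.

```r
library(xml2)

# Shortened stand-in for the API response shown above
sample_xml <- '<root response="True">
  <movie title="The Godfather" year="1972" rated="R"/>
</root>'

doc   <- read_xml(sample_xml)
movie <- xml_find_first(doc, "movie")

# Attributes come back as character strings; convert types as needed
movie_title <- xml_attr(movie, "title")
movie_year  <- as.integer(xml_attr(movie, "year"))

movie_title
movie_year
```

With a live response, calling stop_for_status(resp) before parsing turns HTTP errors into R errors instead of failing later on unexpected content.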

Parsing Many Elements

resp <- GET(paste0('http://www.omdbapi.com/',
                   '?s=The+Godfather&plot=short&r=xml&apikey=4439909d'))
parsed_xml <- read_xml(content(resp, as="raw"))
parsed_xml

{xml_document}
<root totalResults="67" response="True">
 [1] <result title="The Godfather" year="1972" imdbID="tt0068646" type=" ...
 [2] <result title="The Godfather: Part II" year="1974" imdbID="tt007156 ...
 [3] <result title="The Godfather Part III" year="1990" imdbID="tt009967 ...
 [4] <result title="The Godfather Trilogy: 1901-1980" year="1992" imdbID ...
 [5] <result title="The Godfather Saga" year="1977" imdbID="tt0809488" t ...
 [6] <result title="The Godfather" year="2006" imdbID="tt0442674" type=" ...
 [7] <result title="The Last Godfather" year="2010" imdbID="tt1584131" t ...
 [8] <result title="The Godfather Family: A Look Inside" year="1990" imd ...
 [9] <result title="The Godfather II" year="2009" imdbID="tt1198207" typ ...
[10] <result title="The Black Godfather" year="1974" imdbID="tt0071225"  ...

Parsing Many Elements (cont.)

# the values of a single result are stored
# as attributes inside the XML
# always test your strategies on one record
one_record <- parsed_xml %>% 
  xml_find_all('result') %>% 
  map(as_list) %>% .[[1]]

as.data.frame(attributes(one_record))[c('title', 'year', 'imdbID', 'type')]

          title year    imdbID  type
1 The Godfather 1972 tt0068646 movie

Parsing Many Elements (cont.)

# now work on them all
search_results <- parsed_xml %>% 
  xml_find_all('result') %>% 
  map(as_list) %>%
  map_df(function(x) as.data.frame(attributes(x)))

head(search_results[c('title', 'year', 'imdbID', 'type')])

                             title year    imdbID   type
1                    The Godfather 1972 tt0068646  movie
2           The Godfather: Part II 1974 tt0071562  movie
3           The Godfather Part III 1990 tt0099674  movie
4 The Godfather Trilogy: 1901-1980 1992 tt0150742  movie
5               The Godfather Saga 1977 tt0809488 series
6                    The Godfather 2006 tt0442674   game
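
An alternative to as_list() plus attributes() is xml_attrs(), which returns the attributes of every node directly. This is a sketch of that variant, not the approach used on the slides; it runs on a three-row excerpt of the results shown above.

```r
library(xml2)
library(dplyr)

# Three-row excerpt of the search response shown earlier
sample_xml <- '<root totalResults="67" response="True">
  <result title="The Godfather" year="1972" imdbID="tt0068646" type="movie"/>
  <result title="The Godfather: Part II" year="1974" imdbID="tt0071562" type="movie"/>
  <result title="The Godfather Saga" year="1977" imdbID="tt0809488" type="series"/>
</root>'

parsed <- read_xml(sample_xml)

# One named character vector of attributes per <result> node
attrs <- xml_attrs(xml_find_all(parsed, "result"))

# Stack into one data frame and convert year from character to integer
results <- bind_rows(lapply(attrs, as.list)) %>%
  mutate(year = as.integer(year))

# The root node reports how many matches exist in total
total <- as.integer(xml_attr(parsed, "totalResults"))

results
total
```

The total (67 in the output above) exceeds the ten results printed for this search, so collecting everything would take further requests.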

We just learned:

Grabbing Data from an API with
Simple (Key in URL) Authentication

API with Basic Authentication

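Basic Authentication means accessing the API with a username and password. The credentials are only Base64-encoded, not encrypted, so they are protected in transit only when the request is sent over HTTPS.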

url <- 'http://httpbin.org/basic-auth/user/passwd'
username <- "user"
password <- "passwd"
# pass the variables defined above rather than repeating the literals
resp <- GET(url, config = authenticate(username, password, type = "basic"))
content(resp, as="parsed")

$authenticated
[1] TRUE

$user
[1] "user"
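
To see why Basic Authentication offers no secrecy on its own, the sketch below rebuilds the Authorization header value by hand with jsonlite's Base64 helpers. httr does this for you; the code is only an illustration.

```r
library(jsonlite)

username <- "user"
password <- "passwd"

# Basic auth joins the credentials with a colon and Base64-encodes them
encoded     <- base64_enc(charToRaw(paste0(username, ":", password)))
auth_header <- paste("Basic", encoded)
auth_header

# Anyone who sees the header can reverse the encoding
decoded <- rawToChar(base64_dec(encoded))
decoded
```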

API with Basic (Digest) Authentication

Digest Authentication means that the server first replies with a one-time challenge value (a nonce). The client then sends back a hash computed from the credentials and that nonce, so the password itself is never transmitted. This makes it more secure than basic authentication.

# there is nothing inherently different about
# using digest authentication with httr compared to basic
# it's all managed behind the scenes

url <- 'http://httpbin.org/digest-auth/qop/user/passwd'
username <- "user"
password <- "passwd"
# only the type argument changes compared to basic authentication
resp <- GET(url, config = authenticate(username, password, type = "digest"))
content(resp, as="parsed")

$authenticated
[1] TRUE

$user
[1] "user"

We just learned:

Grabbing Data from an API with
Basic Authentication

API with Token/OAuth 2.0 Authentication

OAuth setup is specific to each provider (Google, Facebook, Twitter). The user authorizes a “scope” (the breadth of services the application may access) and receives a token in return.

# Using personal key and secret.
# Create your own at: https://console.developers.google.com/apis
key <- "526767977974-i8pn4vvaga2utiqmeblfnpakflgq964n.apps.googleusercontent.com"
secret <- "tNJixXCExE30f_ARBzb6e4hC"
myapp <- oauth_app("google", key, secret)
myapp

<oauth_app> google
  key:    526767977974-i8pn4vvaga2utiqmeblfnpakflgq964n.apps.googleusercontent.com
  secret: <hidden>

API with Token/OAuth 2.0 Authentication

Error in httpuv::startServer(use$host, use$port, list(call = listen)) : 
  Failed to create server
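
The code for this slide did not survive; only the error did. That error typically appears when httr cannot start the local web server it uses to receive the OAuth callback, for example in a non-interactive session. The sketch below follows the Google demo that ships with httr. The environment variable names and placeholder credentials are hypothetical, and the token request is guarded so it only runs interactively.

```r
library(httr)

# Hypothetical: supply your own credentials through environment variables
key    <- Sys.getenv("GOOGLE_CLIENT_ID", unset = "your-client-id")
secret <- Sys.getenv("GOOGLE_CLIENT_SECRET", unset = "your-client-secret")
myapp  <- oauth_app("google", key = key, secret = secret)

# httr ships the authorize/access URLs for several well-known providers
google_endpoint <- oauth_endpoints("google")

# The scope names what the token may access (here: basic profile info)
scope <- "https://www.googleapis.com/auth/userinfo.profile"

# The OAuth dance opens a browser and needs a local callback server,
# so only attempt it in an interactive session
if (interactive()) {
  token <- oauth2.0_token(google_endpoint, myapp, scope = scope)
  resp  <- GET("https://www.googleapis.com/oauth2/v1/userinfo",
               config(token = token))
  stop_for_status(resp)
  str(content(resp, as = "parsed"))
}
```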
