Working with APIs from R

Steven Mortimer
November 15, 2016

2 Common Webservice API Types

SOAP
Simple Object Access Protocol

  • Protocol agnostic
    (HTTP, SMTP, TCP, or JMS)
  • Typically XML
  • Definitions provided by WSDL
    (Web Service Description Language)

REST
Representational State Transfer

  • Noun-Verb Paradigm
    (URLs name resources; HTTP GET/POST/PUT/DELETE act on them)
  • Typically formatted as JSON
    (JavaScript Object Notation)
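
To make the contrast concrete, here is a minimal sketch of how the two styles might look from R. The `GetPerson` operation, the `/people/` path, and the helper names `call_soap` and `get_person_rest` are hypothetical; only the SOAP 1.1 envelope namespace and the httr/xml2 calls are real.

```r
library(xml2)
library(httr)

# A SOAP 1.1 envelope; "GetPerson" and its child element are hypothetical
soap_envelope <- paste0(
  '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">',
  '<soapenv:Body>',
  '<GetPerson><firstname>Rick</firstname></GetPerson>',
  '</soapenv:Body>',
  '</soapenv:Envelope>'
)

# Parse it locally to confirm the envelope is well-formed XML
envelope <- read_xml(soap_envelope)
xml_name(envelope)

# SOAP over HTTP: typically a POST of the whole envelope to a single endpoint
call_soap <- function(endpoint, envelope_text, action) {
  POST(endpoint,
       body = envelope_text,
       content_type("text/xml; charset=utf-8"),
       add_headers(SOAPAction = action))
}

# REST: the URL names the resource and the HTTP verb names the action
get_person_rest <- function(base_url, id) {
  GET(paste0(base_url, "/people/", id), accept_json())
}
```

Neither helper is called here because both endpoints are placeholders; the point is only the shape of each request.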

2 Common API Data Formats

XML

<person>
  <firstname>Rick</firstname>
  <lastname>James</lastname>
  <occupation>legend</occupation>
</person> 
  • Favored by SOAP APIs
  • Traditional format

JSON

{
  "person": {
    "firstname": "Rick",
    "lastname": "James",
    "occupation": "legend"
  }
}
  • Favored by REST APIs
  • A more modern, flexible approach
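
As a small illustration, the same record can be read from either format with the packages used in these slides. This is a sketch that parses literal strings rather than a live API response; the variable names are chosen only for the example.

```r
library(xml2)
library(jsonlite)

person_xml_text <- paste0(
  "<person>",
  "<firstname>Rick</firstname>",
  "<lastname>James</lastname>",
  "<occupation>legend</occupation>",
  "</person>"
)
person_json_text <- '{"person": {"firstname": "Rick", "lastname": "James", "occupation": "legend"}}'

# XML: parse the document, then locate a child node by name and read its text
person_xml <- read_xml(person_xml_text)
xml_first <- xml_text(xml_find_first(person_xml, "firstname"))

# JSON: parse straight into a nested R list and index into it
person_json <- fromJSON(person_json_text)
json_first <- person_json$person$firstname

c(xml = xml_first, json = json_first)
```

The JSON version maps directly onto R lists, while the XML version needs an explicit lookup step for each value.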

HTTP

  • Verbs for different actions
    • GET, PUT, POST, DELETE
  • Authentication
    • None, Basic, OAuth 2.0
  • Use httr for managing HTTP requests in R
    install.packages('httr')
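
A minimal sketch of the four verbs in httr, assuming the public echo service httpbin.org (used later in these slides) is reachable. The helper name `send_all_verbs` and the request bodies are made up for illustration.

```r
library(httr)

base_url <- "http://httpbin.org"

# One request per verb; each httr function is named after its HTTP method
send_all_verbs <- function(base = base_url) {
  list(
    get    = GET(paste0(base, "/get"), query = list(name = "Rick")),
    post   = POST(paste0(base, "/post"), body = list(name = "Rick"), encode = "json"),
    put    = PUT(paste0(base, "/put"), body = list(name = "James"), encode = "json"),
    delete = DELETE(paste0(base, "/delete"))
  )
}

# Requires network access, so only run it in an interactive session
if (interactive()) {
  responses <- send_all_verbs()
  sapply(responses, status_code)
}
```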

Required Packages


Before you get started, run the snippet of code below:

options(stringsAsFactors = FALSE)

library(dplyr)
library(purrr)
library(httr)
library(jsonlite)
library(xml2)

options(httr_oauth_cache = TRUE)

# install.packages('tidyverse') installs all of these packages,
# but library(tidyverse) attaches only the core ones (dplyr, purrr);
# httr, jsonlite, and xml2 still need their own library() calls

API Simple Authentication (Key in URL)

resp <- GET(paste0('http://www.omdbapi.com/',
                   '?t=The+Godfather&plot=short&r=xml&apikey=4439909d'))
resp
Response [http://www.omdbapi.com/?t=The+Godfather&plot=short&r=xml&apikey=4439909d]
  Date: 2019-06-10 16:08
  Status: 200
  Content-Type: text/xml; charset=utf-8
  Size: 852 B
parsed_xml <- read_xml(content(resp, as="raw"))
parsed_xml
{xml_document}
<root response="True">
[1] <movie title="The Godfather" year="1972" rated="R" released="24 Mar  ...
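
The same request could also be built from named query parameters, which lets httr handle URL encoding and keeps the key out of the source file. This is a sketch under assumptions: `OMDB_API_KEY` is a hypothetical environment variable name, and `build_omdb_url` and `get_movie` are illustrative helpers, not part of httr.

```r
library(httr)

# Build the request URL from named query parameters
build_omdb_url <- function(title, api_key) {
  modify_url("http://www.omdbapi.com/",
             query = list(t = title, plot = "short", r = "xml", apikey = api_key))
}

# OMDB_API_KEY is a hypothetical environment variable name
get_movie <- function(title, api_key = Sys.getenv("OMDB_API_KEY")) {
  resp <- GET(build_omdb_url(title, api_key))
  stop_for_status(resp)  # turn HTTP errors (4xx/5xx) into R errors
  resp
}

# Inspect the URL that would be requested, without touching the network
example_url <- build_omdb_url("The Godfather", "my-key")
parse_url(example_url)$query
```

Checking the status before parsing avoids feeding an error page to `read_xml()`.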

Parsing Many Elements

resp <- GET(paste0('http://www.omdbapi.com/',
                   '?s=The+Godfather&plot=short&r=xml&apikey=4439909d'))
parsed_xml <- read_xml(content(resp, as="raw"))
parsed_xml
{xml_document}
<root totalResults="67" response="True">
 [1] <result title="The Godfather" year="1972" imdbID="tt0068646" type=" ...
 [2] <result title="The Godfather: Part II" year="1974" imdbID="tt007156 ...
 [3] <result title="The Godfather Part III" year="1990" imdbID="tt009967 ...
 [4] <result title="The Godfather Trilogy: 1901-1980" year="1992" imdbID ...
 [5] <result title="The Godfather Saga" year="1977" imdbID="tt0809488" t ...
 [6] <result title="The Godfather" year="2006" imdbID="tt0442674" type=" ...
 [7] <result title="The Last Godfather" year="2010" imdbID="tt1584131" t ...
 [8] <result title="The Godfather Family: A Look Inside" year="1990" imd ...
 [9] <result title="The Godfather II" year="2009" imdbID="tt1198207" typ ...
[10] <result title="The Black Godfather" year="1974" imdbID="tt0071225"  ...

Parsing Many Elements (cont.)


# the values of a single result are stored
# as attributes inside the XML
# always test your strategies on one record
one_record <- parsed_xml %>% 
  xml_find_all('result') %>% 
  map(as_list) %>% .[[1]]

as.data.frame(attributes(one_record))[c('title', 'year', 'imdbID', 'type')]
          title year    imdbID  type
1 The Godfather 1972 tt0068646 movie

Parsing Many Elements (cont.)


# now work on them all
search_results <- parsed_xml %>% 
  xml_find_all('result') %>% 
  map(as_list) %>%
  map_df(function(x) as.data.frame(attributes(x)))

head(search_results[c('title', 'year', 'imdbID', 'type')])
                             title year    imdbID   type
1                    The Godfather 1972 tt0068646  movie
2           The Godfather: Part II 1974 tt0071562  movie
3           The Godfather Part III 1990 tt0099674  movie
4 The Godfather Trilogy: 1901-1980 1992 tt0150742  movie
5               The Godfather Saga 1977 tt0809488 series
6                    The Godfather 2006 tt0442674   game
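
An alternative sketch reads the attributes directly with `xml_attrs()` instead of converting each node with `as_list()`. To keep it runnable offline it parses a two-result excerpt of the response shown above, trimmed to four attributes; the real response carries more.

```r
library(xml2)
library(purrr)

# A trimmed stand-in for the search response, using values shown above
sample_xml <- read_xml(paste0(
  '<root totalResults="2" response="True">',
  '<result title="The Godfather" year="1972" imdbID="tt0068646" type="movie"/>',
  '<result title="The Godfather: Part II" year="1974" imdbID="tt0071562" type="movie"/>',
  '</root>'
))

# xml_attrs() returns one named character vector per node;
# each vector becomes a one-row data frame, and map_df() stacks the rows
results <- sample_xml %>%
  xml_find_all("result") %>%
  xml_attrs() %>%
  map_df(function(x) as.data.frame(as.list(x), stringsAsFactors = FALSE))

results
```

Every column comes back as character, so fields such as `year` would still need `as.integer()` before any numeric work.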

We just learned:

Grabbing Data from an API with
Simple Authentication (Key in URL)

API with Basic Authentication

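Basic Authentication means accessing the API with a username and password. The credentials are only Base64-encoded, not encrypted, so they are protected in transit only when the request is sent over HTTPS.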

url <- 'http://httpbin.org/basic-auth/user/passwd'
username <- "user"
password <- "passwd"
# pass the variables defined above rather than repeating the literals
resp <- GET(url, config = authenticate(username, password, type = "basic"))
content(resp, as="parsed")
$authenticated
[1] TRUE

$user
[1] "user"
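
To see what is actually sent, the header can be rebuilt by hand. This sketch uses `base64_enc()` and `base64_dec()` from jsonlite; `get_with_manual_basic` is an illustrative helper, and in practice `authenticate()` does this for you.

```r
library(httr)
library(jsonlite)

username <- "user"
password <- "passwd"

# Basic auth sends "username:password", Base64-encoded, in the Authorization header
encoded <- base64_enc(charToRaw(paste0(username, ":", password)))
auth_header <- paste("Basic", encoded)
auth_header

# Anyone who sees the header can reverse it: encoding is not encryption
decoded <- rawToChar(base64_dec(encoded))
decoded

# Hand-built equivalent of authenticate(username, password, type = "basic")
get_with_manual_basic <- function(url) {
  GET(url, add_headers(Authorization = auth_header))
}
```

Because the credentials can be decoded by anyone who sees the header, the plain `http://` URL above is fine for a demo but not for real credentials.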

API with Digest Authentication

Digest Authentication means that the server first sends back a one-time value (a nonce). The client then sends a hash of the username, password, and nonce instead of the password itself, which makes it more secure than Basic Authentication.

# there is nothing inherently different about
# using digest authentication with httr compared to basic
# it's all managed behind the scenes

url <- 'http://httpbin.org/digest-auth/qop/user/passwd'
username <- "user"
password <- "passwd"
# pass the variables defined above rather than repeating the literals
resp <- GET(url, config = authenticate(username, password, type = "digest"))
content(resp, as="parsed")
$authenticated
[1] TRUE

$user
[1] "user"
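
For the curious, the hash that is computed behind the scenes can be sketched by hand. This follows the MD5 scheme with `qop = "auth"` from the HTTP Digest specification and uses the openssl package (installed as a dependency of httr). The realm, nonce, nc, and cnonce values are made up; a real server supplies the realm and nonce in its challenge.

```r
library(openssl)

md5_hex <- function(x) as.character(md5(x))

digest_response <- function(username, password, realm, method, uri,
                            nonce, nc, cnonce, qop = "auth") {
  ha1 <- md5_hex(paste(username, realm, password, sep = ":"))  # who is asking
  ha2 <- md5_hex(paste(method, uri, sep = ":"))                # what is asked for
  # the nonce ties the final hash to this one challenge
  md5_hex(paste(ha1, nonce, nc, cnonce, qop, ha2, sep = ":"))
}

# realm, nonce, nc, and cnonce below are made-up illustrative values
resp_hash <- digest_response("user", "passwd", realm = "example-realm",
                             method = "GET", uri = "/digest-auth/qop/user/passwd",
                             nonce = "abc123", nc = "00000001", cnonce = "xyz789")
resp_hash
```

Only this hash travels over the wire, so the password itself is never sent.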

We just learned:

Grabbing Data from an API with
Basic Authentication

API with Token/OAuth 2.0 Authentication

OAuth is application-specific (Google, Facebook, Twitter). The user authorizes a “scope” (the breadth of services the application may access) to get a token.

# Using personal key and secret.
# Create your own at: https://console.developers.google.com/apis
key <- "526767977974-i8pn4vvaga2utiqmeblfnpakflgq964n.apps.googleusercontent.com"
secret <- "tNJixXCExE30f_ARBzb6e4hC"
myapp <- oauth_app("google", key, secret)
myapp
<oauth_app> google
  key:    526767977974-i8pn4vvaga2utiqmeblfnpakflgq964n.apps.googleusercontent.com
  secret: <hidden>

API with Token/OAuth 2.0 Authentication
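
A sketch of the token step, modeled on httr's Google OAuth demo. The key and secret are placeholders, `demo_app` and `fetch_google_profile` are names chosen for this example, and the scope and userinfo URL are assumed to still be offered by Google. `oauth2.0_token()` opens a browser and listens on a local port through httpuv, so it needs an interactive session.

```r
library(httr)

# Placeholders: substitute the key and secret of your own Google app
demo_app <- oauth_app("google",
                      key = "your-client-id",
                      secret = "your-client-secret")

fetch_google_profile <- function(app) {
  # Opens a browser so the user can approve the requested scope
  token <- oauth2.0_token(
    oauth_endpoints("google"),
    app,
    scope = "https://www.googleapis.com/auth/userinfo.profile"
  )
  # Attach the token to the request
  resp <- GET("https://www.googleapis.com/oauth2/v1/userinfo",
              config(token = token))
  stop_for_status(resp)
  content(resp, as = "parsed")
}

# Needs a browser and a free local port, so only run it interactively
if (interactive()) {
  profile <- fetch_google_profile(demo_app)
}
```

With `options(httr_oauth_cache = TRUE)` set as in the setup slide, the token is cached in a local `.httr-oauth` file, so the browser step is skipped on later runs.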
