Whether you own your own business or consult for a business using Square to capture payment data, Square can offer some amazing opportunities to gain insights by leveraging their Connect v1 & v2 APIs. The Square data backend operates much like a CRM system that holds information about transactions that customers make when purchasing items offered at a location. Naturally, there are API endpoints for each of the object types italicized above (Locations, Customers, Transactions, and Items) and many more endpoints for administrative tasks (employees, roles, timecards, refunds, etc.).
While working with Square data I decided to develop an R package squareupr that makes it easier to retrieve Square data from R so that you can focus on the analysis. After installing the squareupr package you must authenticate by supplying a personal access token (PAT) or using an OAuth 2.0 flow. You can find your PAT by logging into the Square Dashboard -> Apps -> My Apps. Create an app or click “Manage App” if you’ve already created one and there you should see your personal access token:
# The squareupr package is not yet available on CRAN so you must install from GitHub
# install.packages("devtools")
devtools::install_github("StevenMMortimer/squareupr")
library(tidyverse)
library(squareupr)
# authenticate using your Personal Access Token (PAT)
sq_auth(personal_access_token = "sq-Th1s1sMyPers0nalAcessT0ken")
The package also offers OAuth 2.0 authentication. More information is available <a target="_blank" href-“https://github.com/StevenMMortimer/squareupr#authenticate”>here.
As mentioned above there are endpoints for every major type of data stored by Square. The API documentation does a very good job at laying out how requests should be made to each of those endpoints. One thing to note is that it is important to first pull down the location details for your business because the location is often required when searching for things like transactions and items.
our_locations <- sq_list_locations()
our_locations$name <- "{HIDDEN}"
our_locations %>% select(id, name, address, timezone,
capabilities, status, created_at)
#> # A tibble: 5 x 7
#> id name address timezone capabilities status created_at
#> <chr> <chr> <list> <chr> <list> <chr> <chr>
#> 1 46FYN9N9RQS54 {HIDDEN} <list [… America/… <list [1]> ACTIVE 2017-04-2…
#> 2 DRDCJ2X8E2PMV {HIDDEN} <list [… America/… <list [1]> ACTIVE 2016-09-2…
#> 3 8T1TYXE840S00 {HIDDEN} <list [… America/… <list [1]> ACTIVE 2016-09-2…
#> 4 1AWPRVVVFWGQF {HIDDEN} <list [… America/… <list [1]> ACTIVE 2017-04-1…
#> 5 50X1GNAWEC8V0 {HIDDEN} <list [… America/… <list [1]> ACTIVE 2017-03-0…
Now that you have the location ids if you would like to pull all of the transactions
during a given timeframe, you would use the function sq_list_transactions()
.
# list all transactions for our 2nd location on May 11, 2018
# by default, if a date is provided with no time, then the time component is set to midnight
our_transactions <- sq_list_transactions(location = our_locations$id[2],
begin_time = as.Date('2018-05-11'),
end_time = as.Date('2018-05-12'))
our_transactions
#> # A tibble: 245 x 6
#> id location_id created_at tenders product client_id
#> <chr> <chr> <chr> <list> <chr> <chr>
#> 1 bUjFGVjBvN… DRDCJ2X8E2P… 2018-05-12T0… <list … REGIST… D5528FBA-E5DE-4…
#> 2 5PZP31N5Zs… DRDCJ2X8E2P… 2018-05-11T2… <list … REGIST… A3A1FF51-325A-4…
#> 3 BTrGydD6he… DRDCJ2X8E2P… 2018-05-11T2… <list … REGIST… 2B3D32EB-8E58-4…
#> 4 XsqOAHl68z… DRDCJ2X8E2P… 2018-05-11T2… <list … REGIST… C50AF3D7-BE32-4…
#> 5 vmLRzrwByS… DRDCJ2X8E2P… 2018-05-11T2… <list … REGIST… 52E40E1B-2333-4…
#> 6 pTbzQApZW7… DRDCJ2X8E2P… 2018-05-11T2… <list … REGIST… 962766FF-1436-4…
#> 7 lnE20zklpP… DRDCJ2X8E2P… 2018-05-11T2… <list … REGIST… A02191CC-9AC9-4…
#> 8 DSumrqQW0L… DRDCJ2X8E2P… 2018-05-11T2… <list … REGIST… 1135FF4F-9B89-4…
#> 9 tPwFXetIwe… DRDCJ2X8E2P… 2018-05-11T2… <list … REGIST… 0D95E79D-B44C-4…
#> 10 bqUuFrzH71… DRDCJ2X8E2P… 2018-05-11T2… <list … REGIST… 48FD6A49-80A9-4…
#> # ... with 235 more rows
At first glance there does not appear to be very much detail on the transaction record.
However, the tender
field represents a method of payment used in a Square transaction
so it contains information regarding the amount of money paid in total, in Square fees,
and tip. The tender
field even contains information regarding the customer_id
and
credit card information. In the following I will loop through the transactions in
April 2018 and determine the total spend for each customer in the dataset. Note:
The call to sq_list_transactions
may take a couple minutes to complete if you are pulling
thousands of transactions.
april_transactions <- sq_list_transactions(location = our_locations$id[2],
begin_time = as.Date('2018-04-01'),
end_time = as.Date('2018-05-01'))
In order to extract the customer ID and money spent I create a function
that checks for the tender
object on the transaction and if it exists tries
to extract the data into a tbl_df
. When I supply this function as
map_df(extract_cust_info_func) %>%
I get the data from each transaction stacked
into a single tbl_df
that’s ready to analyze.
# create a function that will extract out just the customer id and money spent
extract_cust_info_func <- function(x){
if(!is.null(x$tender)){
tibble(customer_id = sq_null_to_na(x$tender[[1]]$customer_id),
money_spent = sq_null_to_na(x$tender[[1]]$amount_money$amount))
} else {
tibble(customer_id = NA_character_,
money_spent = NA_integer_)
}
}
april_customer_spend <- april_transactions %>%
transpose() %>%
# pull out just the information we want from each transaction
map_df(extract_cust_info_func) %>%
group_by(customer_id) %>%
summarize(total_spend = sum(money_spent, na.rm=TRUE))
april_customer_spend
#> # A tibble: 208 x 2
#> customer_id total_spend
#> <chr> <int>
#> 1 064HFDQG0N52AHDBSG00C1BAC8 1700
#> 2 07VNWH1V4S6W4W2EJ4AN7SEJNR 0
#> 3 08M453QNJ97BCT97SM09TN7QK4 0
#> 4 08XE43X8FS0MPX8P2W4N0DEQY0 350
#> 5 0CZ78CVRW12V7AZET8S3S82GGW 675
#> 6 0G1J81148H42GGMTMQKRWSJHGC 0
#> 7 0V1Y1BX23WYRK889ERVBE2T0KM 900
#> 8 13HHFBFZTD33RX0RSJNAZQKV5M 0
#> 9 1BFCHB9MK91GQ39HTD7QK6R7ZR 0
#> 10 1DR0AK5GKX57H9ER9SG1JF01P0 2900
#> # ... with 198 more rows
The amounts in april_customer_spend
may seem large, but the Square APIs return
money as integers that represent whole cents. If you divide by 100, then you will
have the money amounts in dollars.
Square has this concept of “groups” that customers belong to. These groups can be
fashioned to do marketing campaigns complete with email blasts. In our analysis
let’s further determine which groups these customers belong to. The Square API has
an endpoint to retrieve one customer at a time; however, with large lists you
may get subjected to rate limiting. Rate limiting is errors on your requests because
too many are coming from the same application or access token. I would recommend
pulling down the entire list of customers with sq_list_customers()
and then
matching them up in R.
cust_groups <- sq_list_customers() %>%
select(id, groups) %>%
sq_extract_cust_groups() %>%
# filter to only the groups designated automatically by Square
filter(grepl("^CQ689YH4KCJMY", groups.id))
cust_groups
#> # A tibble: 13,444 x 3
#> id groups.id groups.name
#> <chr> <chr> <chr>
#> 1 M1RBDFRK7S1Q1EP6EZFJFV3CBW CQ689YH4KCJMY.LOYALTY_ALL Loyalty Participa…
#> 2 58MK9F1HQ5447D1QZDX60NHTP4 CQ689YH4KCJMY.CHURN_RISK Lapsed
#> 3 58MK9F1HQ5447D1QZDX60NHTP4 CQ689YH4KCJMY.REACHABLE Reachable
#> 4 MBSJA4QV4WX6N2XV8WV9VJJTG8 CQ689YH4KCJMY.LOYALTY_ALL Loyalty Participa…
#> 5 MBSJA4QV4WX6N2XV8WV9VJJTG8 CQ689YH4KCJMY.REACHABLE Reachable
#> 6 ZCBZJ234217KTV812WX4DP2404 CQ689YH4KCJMY.REACHABLE Reachable
#> 7 FKEMR8KZCN3BH98RV78PKHKQ1R CQ689YH4KCJMY.LOYALTY_ALL Loyalty Participa…
#> 8 FKEMR8KZCN3BH98RV78PKHKQ1R CQ689YH4KCJMY.LOYAL Regulars
#> 9 78VMJPJNK959AHH0ZQPXDXEG3C CQ689YH4KCJMY.LOYALTY_ALL Loyalty Participa…
#> 10 QASM1G54VX0QN2S15YS6KHEFCC CQ689YH4KCJMY.LOYAL Regulars
#> # ... with 13,434 more rows
Now that we know the customer memberships, let’s join back with the transaction data to determine the average total spend across the different membership groups.
cust_groups %>%
# bring in the spend data
inner_join(., april_customer_spend, by=c('id'='customer_id')) %>%
# group by the customer groups to find average spend per group
group_by(groups.name) %>%
summarize(avg_spend = mean(total_spend / 100, na.rm=TRUE))
#> # A tibble: 6 x 2
#> groups.name avg_spend
#> <chr> <dbl>
#> 1 Cards on File 569.
#> 2 Lapsed 20.3
#> 3 Loyalty Participants 13.5
#> 4 Nearing a Loyalty Reward 15.3
#> 5 Reachable 51.0
#> 6 Regulars 27.4
What is great about this analysis is that we can use the Square APIs to quickly and reliably pull down transaction data, match it to customer information and see how certain customer groupings or campaigns are performing. However, I did find some quirks and drawbacks that I wish the Square team would consider:
First, it would be nice to implement, as part of the query parameters, the ability to
only return certain fields (e.g. the id
and groups
fields from the Customer object).
This would help for two reasons: 1) It would improve the speed since only the required
data would be passed across and 2) For analysts who do not want to deal with personally
identifiable information (even in RAM) you would not be forced to pull information like
phone number, email, names, and even credit card information.
Second, the API will not allow you to update the customer groups programmatically. This means that you are stuck using the dashboard to create and assign customers to the groups you want to analyze. It is a drag when you want to create a reproducible research workflow to assign customer groups but the API hinders your ability to do so.
Finally, it appears that the customer ID associated to a transaction is not reliably captured on every transaction’s tender object. This means that you cannot get a complete picture of all the transactions. I believe part of this is an issue with the merchant capturing the customer information at the point of sale, but also partly Square’s fault since they seem to be able to piece everything together in their dashboard. However, this is not the case with the APIs. Overall, the Square APIs are a rich data resource for helping run a business and they should only get better with time as development progresses.