• Vendor

    Thanks for this, Dale! The Census Bureau also has batch geocoding, up to 10,000 records at a time, but I like the Google API approach -- I can visualize automating the ggmap R process as part of a daily routine to process new members and address changes.

    Census Geocoder
    Information about geocoding addresses using the Census geocoder
  • CU Employee Community Chair

    You bet, Naveen. Here's a little primer for anyone unfamiliar, given my humble experience.

    Essentially, geocoding involves the process of tagging the addresses of members or prospective members (or sites of interest, including competitor locations) with longitude and latitude coordinates so that they can be visualized on maps -- either as points (like geographic scatterplots) or as shadings of geographic areas (known as choropleths) or (with smoothing) as geographic "heat maps." In the past, our credit union has done this exercise in one-off ways to:

    1. serve as one element in studies to guide branch location decisions
    2. assess geographic patterns (if any) in product ownership or marketing campaign results
    3. connect our memberships with other geographic datasets of interest (neighborhood median home values, etc.) for even better targeting

    What we've NOT done so far is to integrate geocoding into our data repository as a persistently-maintained set of fields. I'd typically figure them to be extensions of household address information in our warehouse's address dimension. So that's where Larry and I would turn to a service like the Google Geocoding API, where a file of addresses is submitted to the service and for each record the longitude and latitude coordinates (plus some bonus information) are returned.

    Those of us who use open source R and RStudio can engage the Google Geocoding API by means of a package called ggmap that's available on CRAN. Install it with the command install.packages("ggmap"), ready it with the command library("ggmap"), then submit addresses using the geocode() function. A single-line example might be stcu_hq <- geocode("1620 n signal dr, liberty lake, washington", output = "more"), where we've submitted the address of STCU's HQ and we're storing the results of the geocoding in the variable stcu_hq. Inspecting those results reveals they contain a data frame with the lon (-116.8953), the lat (47.70299), the type (premise), the loctype (rooftop), the address (1620 n signal dr, liberty lake, wa 99019, usa), north/south/east/west coordinates (47.67382/47.67112/-117.0992/-117.1019), various address components, the name of the county, and the postal code. Setting the parameter output = "all" returns even more data. The default, output = "latlon", returns just lon and lat.
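    For anyone who wants to try the call described above, here is a minimal sketch rather than a definitive recipe. It assumes the ggmap package is installed, that you have network access, and that a Google API key is stored in an environment variable I've called GOOGLE_MAPS_API_KEY (a hypothetical name). Current ggmap releases want the key registered with register_google() before geocode() will work; the guard simply skips the live call when either piece is missing.

```r
# Sketch only: needs ggmap, network access, and a Google API key.
# Assumption: the key is stored in an environment variable named
# GOOGLE_MAPS_API_KEY (hypothetical name chosen for this example).
address <- "1620 n signal dr, liberty lake, washington"
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY")

if (requireNamespace("ggmap", quietly = TRUE) && nzchar(api_key)) {
  ggmap::register_google(key = api_key)                 # register the key for this session
  stcu_hq <- ggmap::geocode(address, output = "more")   # one row per address submitted
  str(stcu_hq)                                          # inspect the returned fields
} else {
  message("ggmap or an API key is not available; skipping the live geocode call")
}
```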

    The API limits the number of calls you can make against it in any particular day (two), and the total number of records (2500). Go beyond the limits, and the API returns the string "OVER_QUERY_LIMIT" -- so the best practice is to submit a batch of up to 2500 addresses in a single call. That's where the code that Shane Lynn shared comes in (see it here). Using his technique, I'd be geocoding 2500 addresses a day for about 60 days to fully capture all the geographic data on STCU's 150k households... his code facilitates that.
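    To make the 2500-a-day arithmetic concrete, here is a rough sketch of the batching idea. It is not Shane Lynn's code, just an illustration of the same approach. The function names are mine, and geocode_fn is a hypothetical stand-in for whatever wrapper you put around the real geocoding call; it is assumed to return a list with status, lon, and lat.

```r
# Split a character vector of addresses into batches no larger than the daily quota.
make_daily_batches <- function(addresses, daily_limit = 2500) {
  batch_id <- ceiling(seq_along(addresses) / daily_limit)
  split(addresses, batch_id)
}

# Geocode one batch, stopping as soon as the service reports the quota is used up.
# `geocode_fn` is a hypothetical stand-in: it takes one address and returns a list
# with elements `status`, `lon`, and `lat`.
geocode_batch <- function(batch, geocode_fn) {
  results <- vector("list", length(batch))
  for (i in seq_along(batch)) {
    res <- geocode_fn(batch[[i]])
    if (identical(res$status, "OVER_QUERY_LIMIT")) {
      break                                   # keep what we have; resume tomorrow
    }
    results[[i]] <- data.frame(address = batch[[i]], lon = res$lon, lat = res$lat,
                               stringsAsFactors = FALSE)
  }
  done <- Filter(Negate(is.null), results)    # drop the slots we never reached
  if (length(done) == 0) NULL else do.call(rbind, done)
}

# 150k households at 2500 a day works out to 60 batches.
length(make_daily_batches(character(150000)))
```

    Each day's results can be appended to a file, so a run that gets cut off by the quota just picks up with the next unprocessed address.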

    Once we have addresses geocoded, THEN folks familiar with R and RStudio can use other functions in ggmap to fetch actual map graphics and plot points or shade areas. I won't go into detail on those function calls, but they're well documented. It's worthwhile to note that ggmap is, I believe, an extension of the "grammar of graphics" visualization scheme wired into the package ggplot2, SO annotation of maps and control of aesthetics follows a very similar approach. I expect folks who are keen to become handy with ggmap will do well to take a few minutes to refresh their memories about the (splendid) particulars of ggplot2.

    There are other mapping tools that use different processes to geocode and then take the results and render maps. Two that I'm only very slightly familiar with -- but I expect are the most commonly used by credit unions -- are ESRI and MapInfo. They have their fanbases, and I don't doubt for good reason. If a credit union figures to make mapping work a central aspect of their analytics, you'll want to find out more about these solutions. In some cases, reporting tools already have integrations with them. STCU leans on Information Builders for enterprise reporting, and they make ESRI mapping an integrated feature of their platform... I've just not personally invested time in learning how to leverage it. It's on my list of "to do's." :-)

    So, that's geographical data in a nutshell. I'm confident others in the group are MUCH further along in leveraging it in even more creative ways than I have. Is anyone willing to share particularly useful or insightful findings that your credit union has arrived at, that go beyond the basics I've described above? Let's use this thread to talk best practices! Thanks!

    Dale Davaz
    STCU R&D Strategist
    CULytics Community Chair 

    Batch Geocoding with R and Google maps
    I’ve recently wanted to geocode a large number of addresses (think circa 60k) in Ireland as part of a visualisation of the Irish property market. Geo…
  • CU Employee CULytics Founder

    How do you plan to use the Geocoded data about your members?

  • CU Employee Community Chair

    Great question, Larry!

    I can't speak for others, but I'm just adventurous enough (or stingy enough) that I'm headed down the path of doing just 2500 addresses a day with the Google Geocoding API as implemented in the "geocode" function within the ggmap package (by David Kahle and Hadley Wickham) for R/RStudio. I'll be the first to admit that it's a drip/drip/drip approach to solving a problem, but I also know I could spend 60 days horsing around hunting more complicated solutions with higher price tags, and in the same time I could have this task done, substantially automated with some clever scripting. Then, once the whole membership job is behind me, the daily record limit is no longer a hardship when it comes to geocoding new/changed addresses.
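    As a sketch of what the daily new/changed-address step might look like once the backlog is done: compare today's address extract with the table of addresses already geocoded, and send only the difference to the API. The column name address and the function name are assumptions for illustration, and the matching here is deliberately simple (case and surrounding whitespace only).

```r
# Return the addresses that still need geocoding: those in today's extract that
# do not appear in the table of already-geocoded addresses.
# Assumption: both data frames carry a character column named `address`.
addresses_to_geocode <- function(current, geocoded, daily_limit = 2500) {
  normalize <- function(x) tolower(trimws(x))          # ignore case and stray spaces
  is_new <- !(normalize(current$address) %in% normalize(geocoded$address))
  pending <- unique(current$address[is_new])           # submit each address once
  head(pending, daily_limit)                           # stay inside the daily quota
}

# Tiny made-up example: one address is already geocoded, one is duplicated.
current  <- data.frame(address = c("1 A St", "2 B Ave", "3 C Rd", "3 C Rd"),
                       stringsAsFactors = FALSE)
geocoded <- data.frame(address = "1 a st ", lon = -117.1, lat = 47.7,
                       stringsAsFactors = FALSE)
addresses_to_geocode(current, geocoded)
```

    Real address matching usually needs more normalization than this (abbreviations, unit numbers), so treat the comparison as a placeholder.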

    Anyone else have a favorite solution? If I'm not mistaken, ESRI exposes a web service that might not suffer the Google caps, but figuring out how to implement it within the context of an ESRI mapping server solution is just enough added complication that I'm apt to take the scrappy route with RStudio.

    Dale Davaz
    STCU R&D Strategist
    CULytics Community Chair

    Getting Started  |  Google Maps Geocoding API  |  Google Developers
    Geocoding converts addresses into geographic coordinates to be placed on a map. Reverse Geocoding finds an address based on geographic coordinates or…