Scraping data from a JavaScript chart and drop-down menu

I'm trying to scrape data from https://www.snowyhydro.com.au/our-energy/water/storages/lake-levels-calculator/ using R. Specifically, I want the lake levels for different years, which the page exposes through a drop-down menu. At the moment I'm struggling to find a starting point: I have searched through various code examples online, but none of them show how to get yearly values for the different lakes.

I tried using SelectorGadget here, but it's not working, I reckon because the charts are JavaScript-based.

library('rvest')

url <- 'https://www.snowyhydro.com.au/our-energy/water/storages/lake-levels-calculator/'
webpage <- read_html(url)

I'm looking for tabular results of the daily storage levels for all the lakes.

1 answer

  • answered 2019-02-10 14:04 clmarquart

    I was able to find a better URL to use for requesting the data: "https://www.snowyhydro.com.au/wp-content/themes/basic/get_dataxml.php"

    The JSON response of the request doesn't map cleanly onto a table, but I think the functions here should accomplish that for you:

    library(httr)
    library(jsonlite)
    
    # This function is called from within the other to convert each day 
    # to its own dataframe, creating extra columns for the year, month, and day
    entry.to.row <- function(entry) {
      date = entry[["-date"]]
      entry.df = data.frame(
        matrix(unlist(entry$lake), nrow=length(entry$lake), byrow = TRUE), 
        stringsAsFactors = FALSE
      )
      colnames(entry.df) = c("LakeName", "Date","Measurement")
      entry.df$Date = date
    
      date.split = strsplit(date, split = "-")[[1]]
      entry.df$Year = date.split[1]
      entry.df$Month = date.split[2]
      entry.df$Day = date.split[3]
      entry.df
    }
    
    # Fetch the data for two years and convert them into two data.frames which 
    # we will then merge into a single data.frame
    fetch.data <- function(
      base.url = "https://www.snowyhydro.com.au/wp-content/themes/basic/get_dataxml.php",
      current,
      past
    ) {
      fetched = httr::POST(
        url = base.url, 
        body = list("year_current"=current, "year_pass"=past)
      )
    
      datJSON = fromJSON(content(fetched, as = "text"), simplifyVector = FALSE)
    
      pastJSON = datJSON$year_pass$snowyhydro$level
      pastEntries = do.call("rbind", lapply(pastJSON, entry.to.row))
    
      currentJSON = datJSON$year_current$snowyhydro$level
      currentEntries = do.call("rbind", lapply(currentJSON, entry.to.row))
    
      rbind(pastEntries, currentEntries)
    }
    
    # Fetch the data for 2019 and 2018
    dat = fetch.data(current=2019, past=2018)
    
    
    > head(dat)
                  LakeName       Date Measurement Year Month Day
    1       Lake Eucumbene 2018-01-01       46.40 2018    01  01
    2       Lake Jindabyne 2018-01-01       85.80 2018    01  01
    3 Tantangara Reservoir 2018-01-01       42.94 2018    01  01
    4       Lake Eucumbene 2018-01-02       46.41 2018    01  02
    5       Lake Jindabyne 2018-01-02       85.72 2018    01  02
    6 Tantangara Reservoir 2018-01-02       42.98 2018    01  02
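
Since the rows are built with matrix(unlist(...)), the columns of dat appear to come back as character strings rather than dates and numbers. As a follow-up, here is a minimal sketch of converting the types and reshaping to one row per day with one column per lake. It does not call the website: the toy data frame below just copies the six rows printed above, and the name `wide` is only illustrative.

```r
# Toy rows copied from the head(dat) output above; all columns are character,
# which is what matrix(unlist(...)) is assumed to produce.
dat <- data.frame(
  LakeName = rep(c("Lake Eucumbene", "Lake Jindabyne", "Tantangara Reservoir"), 2),
  Date = rep(c("2018-01-01", "2018-01-02"), each = 3),
  Measurement = c("46.40", "85.80", "42.94", "46.41", "85.72", "42.98"),
  stringsAsFactors = FALSE
)

# Convert the text columns to proper types
dat$Date <- as.Date(dat$Date)
dat$Measurement <- as.numeric(dat$Measurement)

# One row per day, one column per lake (base R reshape, no extra packages)
wide <- reshape(
  dat[, c("Date", "LakeName", "Measurement")],
  idvar = "Date", timevar = "LakeName", direction = "wide"
)

# reshape() names the new columns "Measurement.<lake>"; strip that prefix
names(wide) <- sub("^Measurement\\.", "", names(wide))
rownames(wide) <- NULL

print(wide)
```

The same conversion should work on the full result of fetch.data(), provided every lake has at most one measurement per date.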