I need advice on speeding up this Rails method that involves many queries

I'm trying to display a table that counts webhooks and arranges the various counts into cells by date_sent, sending_ip, and esp (email service provider). Within each cell, the controller needs to count the webhooks that are labelled with the "opened" event and the "sent" event. Our database currently contains several million webhooks and adds at least 100k per day. This process already takes so long that running the index method is practically useless.

I was hoping that Rails could break down the enormous model into smaller lists using a line like this:

@today_hooks = @m_webhooks.where(:date_sent => this_date)

I thought that the queries after this line would only look at the partial list, instead of the full model. Unfortunately, running this index method generates hundreds of SQL statements, and they all look like this:

SELECT COUNT(*) FROM "m_webhooks" WHERE "m_webhooks"."date_sent" = $1 AND "m_webhooks"."sending_ip" = $2 AND (m_webhooks.esp LIKE 'hotmail') AND (m_webhooks.event LIKE 'sent') 

It appears that the "date_sent" condition is repeated in every one of these queries, which implies that each query is searching through all several million records instead of a pre-filtered subset.

I've read over a dozen articles about increasing performance in Rails queries, but none of the tips that I've found there have reduced the time it takes to complete this method. Thank you in advance for any insight.


def index
  @m_webhooks = MWebhook.select("date_sent", "sending_ip", "esp", "event", "email").all
  @event = params[:event] || "unique_opened"

  @m_list_of_ips = [#List of three ip addresses]

  end_date = Date.today
  start_date = Date.today - 10.days
  date_range = (end_date - start_date).to_i
  @count_array = []
  date_range.times do |n|
    this_date = end_date - n.days
    @today_hooks = @m_webhooks.where(:date_sent => this_date)
    @count_array[n] = {:this_date => this_date}
    @m_list_of_ips.each_with_index do |ip, index|
      thip = @today_hooks.where(:sending_ip => ip)  # Stands for "Today's Hooks for this IP"
      @count_array[n][index] = set_sub_count_hash(thip)
    end
  end
end

def set_sub_count_hash(thip)
  {
    gmail_hooks:   {opened: a = thip.gmail.send(@event).size,   total_sent: b = thip.gmail.sent.size,   perc_opened: find_perc(a, b)},
    hotmail_hooks: {opened: a = thip.hotmail.send(@event).size, total_sent: b = thip.hotmail.sent.size, perc_opened: find_perc(a, b)},
    yahoo_hooks:   {opened: a = thip.yahoo.send(@event).size,   total_sent: b = thip.yahoo.sent.size,   perc_opened: find_perc(a, b)},
    other_hooks:   {opened: a = thip.other.send(@event).size,   total_sent: b = thip.other.sent.size,   perc_opened: find_perc(a, b)}
  }
end

1 answer

  • answered 2018-07-11 04:58 Tiago Farias

    Well, your problem is actually quite simple. You have to remember that when you use where(condition), the query is not immediately executed against the DB.

    Rails is smart enough to keep chaining your queries lazily until it detects that you need a concrete result (a list, an object, or a count/#size as in your case). In your code, you keep chaining conditions onto the main query inside a loop (date_range). And it gets worse: you start another loop inside that one, adding conditions to each query created by the first loop.

    Then you pass the relation (not concrete yet; it has not been executed and has no results!) to the method set_sub_count_hash, which goes on to execute that same query many times.

    Therefore you have something like:

    10 (date_range) * 3 (IP list) * 8 (times the relation is materialized in #set_sub_count_hash)

    and then you have a problem.
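    Spelled out, that multiplication looks like this (the 8 comes from the four ESP scopes times the two .size calls each in set_sub_count_hash):

```ruby
dates = 10            # days in the range
ips = 3               # sending IPs in @m_list_of_ips
counts_per_cell = 8   # 4 ESPs x 2 .size calls in set_sub_count_hash
total_queries = dates * ips * counts_per_cell
puts total_queries    # 240 COUNT queries per page load, versus one grouped query
```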

    What you want is to run the whole query at once and group it by date, IP and ESP. That gives you a hash structure, which you can then pass to #set_sub_count_hash and do some Ruby gymnastics on to get the counts you're looking for.

    I imagine the query something like:

    main_query = @m_webhooks.where('date_sent > ?', 10.days.ago.to_date)

    Ok, now you have one query, which is nice, but I think you should split it into 4 (gmail, hotmail, yahoo and other), which gives you 4 queries (the first one, the main_query, will not be executed until you ask for materialized results, don't forget it). Still, something like 100 times faster.

    I think this is the result that should be grouped, mapped and passed to #set_sub_count_hash, instead of passing the raw query and calling methods on it many times. It will be a little work to do the grouping, mapping and counting for sure, but hey, it's faster. =)
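    A plain-Ruby sketch of those gymnastics, under the assumption of a single grouped query: the counts hash below is made-up sample data standing in for what ActiveRecord's group(:date_sent, :sending_ip, :esp, :event).count would return, and find_perc is an assumed implementation mirroring the helper from the question.

```ruby
require "date"

# Hypothetical sample of what this would return in the controller:
#   MWebhook.where('date_sent > ?', 10.days.ago.to_date)
#           .group(:date_sent, :sending_ip, :esp, :event).count
counts = {
  [Date.new(2018, 7, 10), "1.2.3.4", "gmail",   "sent"]          => 120,
  [Date.new(2018, 7, 10), "1.2.3.4", "gmail",   "unique_opened"] => 30,
  [Date.new(2018, 7, 10), "1.2.3.4", "hotmail", "sent"]          => 80,
  [Date.new(2018, 7, 10), "1.2.3.4", "hotmail", "unique_opened"] => 8,
}

# Assumed implementation of the question's find_perc helper
def find_perc(opened, sent)
  return 0 if sent.zero?
  (opened * 100.0 / sent).round(1)
end

event = "unique_opened"

# Fold the flat hash into {date => {ip => {gmail_hooks: {...}, ...}}}
summary = {}
counts.each do |(date, ip, esp, evt), n|
  cell = ((summary[date] ||= {})[ip] ||= {})[:"#{esp}_hooks"] ||= {opened: 0, total_sent: 0}
  cell[:opened]     += n if evt == event
  cell[:total_sent] += n if evt == "sent"
end

# Add the percentage to each cell once the counts are in
summary.each_value do |by_ip|
  by_ip.each_value do |by_esp|
    by_esp.each_value do |cell|
      cell[:perc_opened] = find_perc(cell[:opened], cell[:total_sent])
    end
  end
end

cell = summary[Date.new(2018, 7, 10)]["1.2.3.4"][:gmail_hooks]
puts cell[:opened]       # 30
puts cell[:total_sent]   # 120
puts cell[:perc_opened]  # 25.0
```

    The grouped query does the counting in the database in one pass, so this loop only touches a few hundred hash entries in memory instead of issuing a COUNT per cell.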