Ruby storing data for queries

I have a string


This is outbound telephone call data, where each line represents one phone call in the format Call From, Call To, Duration, Line Type. I want to store this data in a way that lets me query a specific number and get a string containing the number, its line type, its total minutes used, and all the outbound calls it made. I want to do this in a single Ruby file.

Thus, querying the number 4813243948 should output something like this:



4813243948, Type 2, 3.9 Minutes total
 1234433948, 1.3
 2435677524, 1.3
 5245654367, 1.3

I am wondering whether I should store the values in arrays, or create a custom class, make each number an object of that class, and append the calls to each number. I am not sure how to do the class approach. Having a separate array for each number seems like it would get cluttered, as there are thousands of numbers and millions of calls. Of course, the provided input string is only a very small portion of the real source.

2 answers

  • answered 2018-03-13 21:45 Roma149

    If you only want to make queries for the number the call originated from, you could store the data in a hash where the keys are the "call from" numbers and the value is an array, or another hash, containing the rest of the data. For example:

    { '4813243948' => [{ call_to: '1234433948', duration: 1.3, line_type: 'Type2' }, ...], ... }

    If the dataset is very large, or you need more complex queries, it might be better to store it in a database and just query it directly.
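
    As a rough sketch of the hash approach, the snippet below builds the lookup table straight from the raw string. The `Call` struct, the `calls_by_number` name, and the two sample rows are only illustrative assumptions, not part of the original data set.

```ruby
# One outbound call; the struct and its field names are illustrative.
Call = Struct.new(:to, :duration, :line_type)

# Two sample rows in the question's format: from, to, duration, line type.
data = "4813243948,1234433948,1.3,Type2\n1234433948,4813243948,1.3,Type1"

# Each missing key starts out as an empty array, so calls can be appended directly.
calls_by_number = Hash.new { |hash, key| hash[key] = [] }

data.each_line do |line|
  from, to, duration, line_type = line.strip.split(',')
  next if from.nil? # skip blank lines

  calls_by_number[from] << Call.new(to, duration.to_f, line_type)
end

# Looking up an originating number is a single hash access.
# `fetch` with a default avoids creating an empty entry for an unknown number.
calls_by_number.fetch('4813243948', []).each do |call|
  puts "#{call.to}, #{call.duration}, #{call.line_type}"
end
```

    With thousands of numbers this stays one hash rather than one variable per number, which addresses the clutter concern in the question.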

  • answered 2018-03-13 22:14 Sergio Tulentsev

    I have a string "4813243948,1234433948,1.3,Type2 1234433948,4813243948,1.3,Type1

    This looks like a CSV. If you slap some headers on top, you can parse it into an array of hashes.

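    require 'csv'

    # Two sample rows: from, to, length (minutes), line type.
    str = "4813243948,1234433948,1.3,Type2\n1234433948,4813243948,1.3,Type1"

    # The data has no header row, so supply the column names explicitly.
    calls = CSV.parse(str, headers: %w[from to length type], header_converters: :symbol).map(&:to_h)
    # => [{:from=>"4813243948", :to=>"1234433948", :length=>"1.3", :type=>"Type2"},
    #     {:from=>"1234433948", :to=>"4813243948", :length=>"1.3", :type=>"Type1"}]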

    This is essentially the same as your original string, only it trades some memory for ease of access. You can now "query" this dataset like this:

    calls.select { |c| c[:from] == '4813243948' }

    And then aggregate for presentation however you wish.
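
    One possible way to do that aggregation is sketched below. The `report_for` helper is a made-up name, the sample rows are taken from the numbers in the question, and the line type is printed exactly as stored (`Type2`), so adjust the formatting to taste.

```ruby
# Sample rows in the same shape as the parsed array of hashes.
calls = [
  { from: '4813243948', to: '1234433948', length: '1.3', type: 'Type2' },
  { from: '4813243948', to: '2435677524', length: '1.3', type: 'Type2' },
  { from: '4813243948', to: '5245654367', length: '1.3', type: 'Type2' },
  { from: '1234433948', to: '4813243948', length: '1.3', type: 'Type1' }
]

# Hypothetical helper: builds the report text for one originating number.
def report_for(calls, number)
  made = calls.select { |c| c[:from] == number }
  return "#{number}: no outbound calls" if made.empty?

  # Durations are strings after CSV parsing, so convert before summing.
  # Rounding hides floating-point noise in the total.
  total = made.sum { |c| c[:length].to_f }.round(1)

  header = "#{number}, #{made.first[:type]}, #{total} Minutes total"
  lines = made.map { |c| " #{c[:to]}, #{c[:length]}" }
  ([header] + lines).join("\n")
end

puts report_for(calls, '4813243948')
```

    This assumes the line type is the same on every row for a given originating number, so it simply reads it from the first matching call.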

    Naturally, searching through this array takes linear time, so if you have millions of calls you might want to organize them in a more efficient search structure (like a B-Tree) or move the whole dataset to a real database.
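
    Short of a B-Tree or a database, a simpler stand-in is to sort the array once and binary-search it, which makes each lookup logarithmic instead of linear. This is only a sketch of that idea, not a B-Tree; `calls_from` is a hypothetical helper and the rows are sample data.

```ruby
# Sample rows in the same shape as the parsed array of hashes.
calls = [
  { from: '4813243948', to: '1234433948', length: '1.3', type: 'Type2' },
  { from: '1234433948', to: '4813243948', length: '1.3', type: 'Type1' },
  { from: '4813243948', to: '2435677524', length: '1.3', type: 'Type2' },
  { from: '4813243948', to: '5245654367', length: '1.3', type: 'Type2' }
]

# Sort once by originating number (plain string order is enough here).
sorted = calls.sort_by { |c| c[:from] }

# Hypothetical helper: all calls made by `number`, located by binary search.
def calls_from(sorted, number)
  # Find-minimum mode: index of the first row whose :from is >= number.
  start = sorted.bsearch_index { |c| c[:from] >= number }
  return [] if start.nil?

  # Matching rows are contiguous because the array is sorted.
  sorted[start..-1].take_while { |c| c[:from] == number }
end

p calls_from(sorted, '4813243948').map { |c| c[:to] }
```

    The sort is paid once up front; if the data changes often or queries get more complex, a database is still the more practical choice.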