MySQL InnoDB SELECT query takes too long to return data

I have a MySQL InnoDB table which uses a per-day table structure (a single table stores one day's data). Each table consists of 50 columns and holds around 16 million rows. The table cannot be normalized, since it is used to store CDR (call detail record, telco-related) data.

This is the table schema:

CREATE TABLE IF NOT EXISTS smsc_cdr_2018_07_08 ( 
 event int(4) NOT NULL,
 cdr_timestamp timestamp NOT NULL,
 smsc_ref varchar(255) NOT NULL,
 attempts int(4) NOT NULL,
 relay_attempted varchar(255) NOT NULL,
 origination varchar(255),
 ori_origination varchar(255),
 destination varchar(255),
 ori_destination varchar(255),
 vas_type varchar(255),
 accept_time varchar(255),
 sc_addr varchar(255),
 dest_imsi varchar(255),
 fno varchar(255),
 fdfr varchar(255),
 udl varchar(255),
 expiry_time varchar(255),
 pid varchar(255),
 dcs varchar(255),
 orig_ton varchar(255),
 orig_npi varchar(255),
 dest_ton varchar(255),
 dest_npi varchar(255),
 calling_gt varchar(255),
 called_gt varchar(255),
 orig_imsi varchar(255),
 seq_no varchar(255),
 reserved_26 varchar(255),
 reserved_27 varchar(255),
 reserved_28 varchar(255),
 reserved_29 varchar(255),
 reserved_30 varchar(255),
 reserved_31 varchar(255),
 ref_num varchar(255),
 tot_seg varchar(255),
 seq_num varchar(255),
 srr_sms varchar(255),
 fnft varchar(255),
 fner varchar(255),
 fnfc varchar(255),
 srs varchar(255),
 udhi varchar(255),
 charge_info varchar(255),
 operator varchar(255),
 source varchar(255),
 dest varchar(255),
 esme varchar(255),
 pri varchar(255),
 vas_info varchar(255),
 smsc_id varchar(255),
 PRIMARY KEY (event, cdr_timestamp, smsc_ref, attempts, relay_attempted)
) ROW_FORMAT=COMPACT ENGINE=MEMORY;

There are 4 fields that can be used in the WHERE clause of the query.

select 
    cast(event as char),
    cdr_timestamp,
    smsc_ref,
    cast(attempts as char),
    origination,
    ori_origination,
    destination,
    ori_destination,
    vas_type,
    accept_time,
    sc_addr,
    dest_imsi,
    fno,
    fdfr,
    udl,
    expiry_time,
    pid,
    dcs,
    orig_ton,
    orig_npi,
    dest_ton,
    dest_npi,
    calling_gt,
    called_gt,
    ref_num,
    tot_seg,
    seq_num,
    relay_attempted,
    srr_sms,
    fnft,
    fner,
    fnfc,
    srs,
    udhi,
    charge_info,
    operator,
    source,
    dest,
    esme,
    pri,
    vas_info,
    smsc_id 
FROM 
    smsc_cdr_2018_07_01 
WHERE 
    cdr_timestamp >= '2018-07-01 00:00:00' AND 
    cdr_timestamp <= '2018-07-01 02:00:00'

The from time and to time are sent from the front-end, and there are three more filters that are added to the WHERE clause only if the front-end sends them.

This query takes more than 5 minutes to return results, which is a problem, and that is only for a single day. Whenever the front-end requests data for multiple days, it takes more than 10 minutes.

1 answer

  • answered 2018-07-11 16:55 Rick James

    MEMORY? Or InnoDB? MEMORY is volatile; don't use MEMORY except for staging.
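
    A quick way to verify which engine the table really uses, and to convert it if it is indeed MEMORY (a sketch using the table name from the question):

    SHOW CREATE TABLE smsc_cdr_2018_07_08;
    ALTER TABLE smsc_cdr_2018_07_08 ENGINE=InnoDB;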

    Table per day -- Usually a bad design.

    "More filters added" -- Let's see them. Any solution for the query you provided may not help for other situations.

    You want 2 hours plus 1 second? Check the range test.
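
    If the intent is exactly two hours, a half-open range avoids the extra second and also works cleanly across day boundaries (a sketch of the revised condition):

    WHERE cdr_timestamp >= '2018-07-01 00:00:00'
      AND cdr_timestamp <  '2018-07-01 02:00:00'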

    cast(event as char) -- unnecessary; the output is indistinguishable from a string, even without the cast.

    Without an index starting with cdr_timestamp, that query will have to search the entire table. Adding an index may not help since the query is looking at perhaps 2/24ths of the table.
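
    If you do want to try it, the index itself is simple to add (a sketch, assuming the table ends up on InnoDB; whether the optimizer chooses it for a 2-hour slice of ~16 million rows is a separate question):

    ALTER TABLE smsc_cdr_2018_07_08 ADD INDEX idx_cdr_timestamp (cdr_timestamp);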

    Normalization can and should be applied. Sure, the data comes in as strings, but you should convert more of the columns before storing in the 'Fact' table. It will take an extra step or two. See http://mysql.rjweb.org/doc.php/staging_table
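
    As a rough illustration of that normalization (a hypothetical lookup table; the operator column is taken from the question's schema):

    CREATE TABLE operators (
        operator_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
        operator    VARCHAR(64) NOT NULL,
        PRIMARY KEY (operator_id),
        UNIQUE KEY (operator)
    ) ENGINE=InnoDB;
    -- the Fact table then stores a 2-byte operator_id instead of a VARCHAR(255)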

    That link provides several tips that would help you. You are inserting about 200 rows per second, which is close to the limit without taking extra steps; the link gives you some of those steps. It provides a 2-SQL normalization technique to do the conversion very efficiently. If some of your queries are "Summary reports", it also delves into how to make them faster. Etc.
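
    The 2-SQL idea looks roughly like this (a sketch of the idea, not the article's exact recipe; staging and smsc_cdr_fact are hypothetical names, and only a few columns are shown):

    -- 1. add any operator strings not already in the lookup table
    INSERT IGNORE INTO operators (operator)
        SELECT DISTINCT operator FROM staging;

    -- 2. move the batch into the Fact table, swapping each string for its id
    INSERT INTO smsc_cdr_fact (cdr_timestamp, operator_id, destination)
        SELECT s.cdr_timestamp, o.operator_id, s.destination
        FROM staging s
        JOIN operators o ON o.operator = s.operator;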

    Even the blind use of VARCHAR(255) for that many columns imposes a significant performance burden; temporary tables and sorts may allocate the full declared width for each of them.

    Using appropriate datatypes, plus normalization, will significantly shrink the disk footprint, thereby buying back some performance.
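
    For example, a few of the columns might shrink like this (a sketch; the right types depend on what the data actually contains):

    ALTER TABLE smsc_cdr_2018_07_08
        MODIFY attempts TINYINT UNSIGNED NOT NULL,   -- if attempts never exceeds 255
        MODIFY orig_ton TINYINT UNSIGNED,            -- TON/NPI values are small integers
        MODIFY dest_ton TINYINT UNSIGNED,
        MODIFY smsc_ref VARCHAR(64) NOT NULL;        -- sized from real data, not a blind 255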

    5 columns for the PRIMARY KEY -- that is a lot. Is that the minimal combination of columns that uniquely identify the row?

    What will you do with the million rows of output?? Can't you employ MySQL to summarize/condense/whatever the data? 5 minutes is not that unreasonable for locating and shoveling that much data!
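
    For instance, if the front-end ultimately needs counts rather than raw CDRs, something like this returns dozens of rows instead of a million (a sketch; the grouping columns are an assumption):

    SELECT DATE_FORMAT(cdr_timestamp, '%Y-%m-%d %H:00') AS hr,
           operator,
           COUNT(*) AS cdr_count
        FROM smsc_cdr_2018_07_01
        WHERE cdr_timestamp >= '2018-07-01 00:00:00'
          AND cdr_timestamp <  '2018-07-01 02:00:00'
        GROUP BY hr, operator;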

    reserved... -- In the long run, you will probably find out that that kludge is not the best way to do things.

    Study the values you are getting from each column to see whether they are worth normalizing. And to see (for strings) what length makes more sense:

    SELECT COUNT(*) AS total_rows,
           COUNT(DISTINCT vas_type) AS distinct_values,
           MAX(CHAR_LENGTH(vas_type)) AS longest
        FROM tbl;
    

    If distinct_values / total_rows is less than, say, 40%, normalize (you pick the cutoff).
    Use longest, plus a fudge factor, in VARCHAR(...).
    Use CHARACTER SET latin1 unless there is some need for utf8.