SQL: Find whether column values are in the right order

I have a table with StudentID, Date and CourseStatus.

If I order by StudentID and Date, the CourseStatus values should only appear in a particular order (ENROLLED -> STARTED -> FINISHED -> TESTPASSED). But there are some StudentIDs where the CourseStatus values are in the wrong order.

How can I find the StudentIDs where CourseStatus is not strictly in the expected order when ordered by date?

Example:

StudentID - Date         - CourseStatus
---------------------------------------
Student1  -  2019-01-01  - "ENROLLED"
Student1  -  2019-03-01  - "STARTED"
Student1  -  2019-05-01  - "FINISHED"
Student1  -  2019-08-01  - "TESTPASSED"
Student4  -  2019-02-15  - "ENROLLED"
Student4  -  2019-03-30  - "FINISHED"   <-- Incorrect value / sequence
Student4  -  2019-05-01  - "STARTED"    <-- Incorrect value / sequence
Student4  -  2019-09-01  - "TESTPASSED"

SQL output should be:

Student4  -  2019-02-15  - "ENROLLED"
Student4  -  2019-03-30  - "FINISHED"
Student4  -  2019-05-01  - "STARTED"
Student4  -  2019-09-01  - "TESTPASSED"

2 answers

  • answered 2020-02-12 23:01 GMB

    If you are running MySQL 8.0, you can use window functions for this:

    select 
        StudentID,
        Date,
        CourseStatus
    from (
        select 
            t.*,
            s.seq,
            -- rank of the previous and next status for the same student
            lag(s.seq) over(partition by StudentID order by Date) lag_seq,
            lead(s.seq) over(partition by StudentID order by Date) lead_seq
        from mytable t
        inner join (
            -- mapping table: status name -> position in the expected sequence
            select 0 seq, 'ENROLLED' CourseStatus
            union all select 1, 'STARTED' 
            union all select 2, 'FINISHED' 
            union all select 3, 'TESTPASSED' 
        ) s on s.CourseStatus = t.CourseStatus
    ) t
    where not (
        (seq = lag_seq + 1 or (seq = 0 and lag_seq = 3) or lag_seq is null)
        and (seq + 1 = lead_seq or (seq = 3 and lead_seq = 0) or lead_seq is null)
    )
    

    The subquery uses lead() and lag() to recover the previous and next status. It is tedious to manipulate strings for sequence comparisons here, so I used a mapping table that translates the strings to integers.

    The outer query then filters on records that do not follow the pre-defined sequence. I enumerated all possible cases (increasing sequence, end of a sequence and start of a new one, first/last record per student).
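
To try the lag()/lead() check on the question's sample data without a MySQL server, here is a minimal sketch that loads the rows into an in-memory SQLite database and runs a variant of the query above. Assumptions: SQLite 3.25 or later (the first version with window functions) stands in for MySQL 8.0, the function name `find_out_of_order` is made up for this illustration, and the select list returns `CourseStatus`.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("Student1", "2019-01-01", "ENROLLED"),
    ("Student1", "2019-03-01", "STARTED"),
    ("Student1", "2019-05-01", "FINISHED"),
    ("Student1", "2019-08-01", "TESTPASSED"),
    ("Student4", "2019-02-15", "ENROLLED"),
    ("Student4", "2019-03-30", "FINISHED"),
    ("Student4", "2019-05-01", "STARTED"),
    ("Student4", "2019-09-01", "TESTPASSED"),
]

QUERY = """
select StudentID, Date, CourseStatus
from (
    select
        t.StudentID as StudentID,
        t.Date as Date,
        t.CourseStatus as CourseStatus,
        s.seq as seq,
        lag(s.seq)  over (partition by t.StudentID order by t.Date) as lag_seq,
        lead(s.seq) over (partition by t.StudentID order by t.Date) as lead_seq
    from mytable t
    inner join (
        select 0 as seq, 'ENROLLED' as CourseStatus
        union all select 1, 'STARTED'
        union all select 2, 'FINISHED'
        union all select 3, 'TESTPASSED'
    ) s on s.CourseStatus = t.CourseStatus
) x
where not (
    (lag_seq is null or seq = lag_seq + 1 or (seq = 0 and lag_seq = 3))
    and (lead_seq is null or seq + 1 = lead_seq or (seq = 3 and lead_seq = 0))
)
order by StudentID, Date
"""


def find_out_of_order(rows):
    """Return the rows whose status does not follow its neighbours in sequence."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable (StudentID text, Date text, CourseStatus text)"
        )
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


offending = find_out_of_order(ROWS)
for row in offending:
    print(row)
```

On this sample, all four Student4 rows are returned, because each of them has a neighbour that breaks the sequence. In general the query returns only the rows adjacent to a break, not necessarily every row of the student.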

  • answered 2020-02-12 23:36 Gordon Linoff

    If you could live with the values in a single row, you can use aggregation:

    select studentid,
           group_concat(coursestatus order by date) as statuses
    from t
    group by studentid
    having statuses <> 'ENROLLED,STARTED,FINISHED,TESTPASSED';
    

    To get the original rows, you can use a join:

    select t.*
    from t join
         (select studentid,
                 group_concat(coursestatus order by date) as statuses
          from t
          group by studentid
          having statuses <> 'ENROLLED,STARTED,FINISHED,TESTPASSED'
         ) ss
         on t.studentid = ss.studentid;
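
The aggregation idea can also be sketched outside the database, which makes its behaviour easy to inspect. The following is a plain-Python analogue, not the answer's SQL: it sorts by student and date, joins each student's statuses with commas (the role of `group_concat`), compares the result with the expected string (the role of `having`), and then keeps the original rows of the mismatching students (the role of the join). The name `rows_for_flagged_students` and the Student7 rows are hypothetical, added only for illustration.

```python
from itertools import groupby
from operator import itemgetter

EXPECTED = "ENROLLED,STARTED,FINISHED,TESTPASSED"

ROWS = [
    ("Student1", "2019-01-01", "ENROLLED"),
    ("Student1", "2019-03-01", "STARTED"),
    ("Student1", "2019-05-01", "FINISHED"),
    ("Student1", "2019-08-01", "TESTPASSED"),
    ("Student4", "2019-02-15", "ENROLLED"),
    ("Student4", "2019-03-30", "FINISHED"),
    ("Student4", "2019-05-01", "STARTED"),
    ("Student4", "2019-09-01", "TESTPASSED"),
    # Hypothetical student (not in the question): in order, but not finished yet.
    ("Student7", "2019-02-01", "ENROLLED"),
    ("Student7", "2019-04-01", "STARTED"),
]


def rows_for_flagged_students(rows):
    """Return all rows of students whose joined status string differs from EXPECTED."""
    # ISO dates sort correctly as strings, so (student, date) ordering is enough.
    ordered = sorted(rows, key=itemgetter(0, 1))
    flagged_ids = set()
    for student, group in groupby(ordered, key=itemgetter(0)):
        statuses = ",".join(status for _, _, status in group)
        if statuses != EXPECTED:
            flagged_ids.add(student)
    return [row for row in ordered if row[0] in flagged_ids]


flagged = rows_for_flagged_students(ROWS)
for row in flagged:
    print(row)
```

The exact string comparison flags Student4 as intended, but it also flags the hypothetical Student7, whose history is in order yet incomplete. If partial or repeated sequences are valid in your data, the comparison would need to be adapted.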