Count length of consecutive duplicate values for each id

I have a table as shown in the screenshot (first two columns) and I need to create a column like the last one. I'm trying to calculate the length of each sequence of consecutive values for each id.

For this, the last column is required. I played around with

row_number() over (partition by id, value)

but did not have much success, since the circled number was (quite predictably) computed as 2 instead of 1.

Please help!

Initial table & column I need to fill

4 answers

  • answered 2019-10-06 10:32 hardaxan

    IMHO, this is easier to do with cursor and loop.

    may be there is a way to do the job with selfjoin

    declare @t table (id int, val int)
    insert into @t (id, val)
    
                 select 1 as id, 1 as val
       union all select 1, 0
       union all select 1, 0
       union all select 1, 1
       union all select 1, 1
       union all select 1, 1
    
    ;with cte1 (id , val , num ) as
    (
        select id, val, row_number() over (ORDER BY (SELECT 1)) as num from @t
    )
    , cte2 (id, val, num, N) as
    (
       select id, val, num, 1 from cte1 where num = 1
       union all
       select t1.id, t1.val, t1.num, 
        case when t1.id=t2.id and t1.val=t2.val then t2.N + 1 else 1 end 
       from cte1 t1 inner join cte2 t2 on t1.num = t2.num + 1 where t1.num > 1
    )
    
    select * from cte2
    

  • answered 2019-10-06 10:36 Piotr Findeisen

    You want results that are dependent on actual data ordering in the data source. In SQL you operate on relations, sometimes on ordered set of relations rows. Your desired end result is not well-defined in terms of SQL, unless you introduce an additional column in your source table, over which your data is ordered (e.g. auto-increment or some timestamp column).

  • answered 2019-10-06 10:48 gotqn

    First of all, we need to have a way to defined how the rows are ordered. For example, in your sample data there is not way to be sure that 'first' row (1, 1) will be always displayed before the 'second' row (1,0).

    That's why in my sample data I have added an identity column. In your real case, the details can be order by row ID, date column or something else, but you need to ensure the rows can be sorted via unique criteria.

    So, the task is pretty simple:

    1. calculate trigger switch - when value is changed
    2. calculate groups
    3. calculate rows

    That's it. I have used common table expression and leave all columns in order to be easy for you to understand the logic. You are free to break this in separate statements and remove some of the columns.

    DECLARE @DataSource TABLE
    ( 
        [RowID] INT IDENTITY(1, 1)
       ,[ID]INT
       ,[value] INT
    );
    
    INSERT INTO @DataSource ([ID], [value])
    VALUES (1, 1)
          ,(1, 0)
          ,(1, 0)
          ,(1, 1)
          ,(1, 1)
          ,(1, 1)
          --
          ,(2, 0)
          ,(2, 1)
          ,(2, 0)
          ,(2, 0);
    
    WITH DataSourceWithSwitch AS
    (
        SELECT *
              ,IIF(LAG([value]) OVER (PARTITION BY [ID] ORDER BY [RowID]) = [value], 0, 1) AS [Switch]
        FROM @DataSource
    ), DataSourceWithGroup AS
    (
        SELECT *
              ,SUM([Switch]) OVER (PARTITION BY [ID] ORDER BY [RowID]) AS [Group]
        FROM DataSourceWithSwitch
    )
    SELECT *
          ,ROW_NUMBER() OVER (PARTITION BY [ID], [Group] ORDER BY [RowID]) AS [GroupRowID]
    FROM DataSourceWithGroup
    ORDER BY [RowID];
    

    enter image description here

  • answered 2019-10-06 10:56 cars10m

    One way to solve it could be through a recursive CTE:

    create table #tmp (i int identity,id int, value int, rn int);
    insert into #tmp (id,value) VALUES
      (1,1),(1,0),(1,0),(1,1),(1,1),(1,1),
      (2,0),(2,1),(2,0),(2,0);
    WITH numbered AS (
     SELECT i,id,value, 1 seq FROM #tmp WHERE i=1 UNION ALL
     SELECT a.i,a.id,a.value, CASE WHEN a.id=b.id AND a.value=b.value THEN b.seq+1 ELSE 1 END
     FROM #tmp a INNER JOIN numbered b ON a.i=b.i+1
    )
    SELECT * FROM numbered -- OPTION (MAXRECURSION 1000)
    

    This will return the following:

    i   id  value   seq
    1   1   1       1
    2   1   0       1
    3   1   0       2
    4   1   1       1
    5   1   1       2
    6   1   1       3
    7   2   0       1
    8   2   1       1
    9   2   0       1
    10  2   0       2
    

    See my little demo here: https://rextester.com/ZZEIU93657

    A prerequisite for the CTE to work is a sequenced table (e. g. a table with an identitycolumn in it) as a source. In my example I introduced the column i for this. As a starting point I need to find the first entry of the source table. In my case this was the entry with i=1.

    For a longer source table you might run into a recursion-limit error as the default for MAXRECURSION is 100. In this case you should uncomment the OPTION setting behind my SELECT clause above. You can either set it to a higher value (like shown) or switch it off completely by setting it to 0.