How to skip the null value of a column while creating the dense rank in pandas dataframe?

I have to create dense rank based on the value of a column. That column has some null value and some float values. I have to skip null value and concentrate on float values. I am not sure how to do this? Can you please help me for doing this?

Explain:

df:

  comb_fld_order
0     NULL
1     NULL
2     NULL
3     1.1
4     1.1
5     NULL

I am using below code to creating dense rank-

df['dense_rank'] = df.groupby('comb_fld_order', sort=True).ngroup()+1

It is giving me output:

   comb_fld_order   dense_rank
0     NULL            2
1     NULL            2
2     NULL            2
3     1.1             1
4     1.1             1
5     NULL            2

But I want below output:

   comb_fld_order   dense_rank
0     NULL            
1     NULL            
2     NULL            
3     1.1             1
4     1.1             1
5     NULL            

2 answers

  • answered 2020-09-14 05:20 David Erickson

    1. Filter out NaN values with .notnull() prior to using .groupby
    2. Then, use fillna('') if you want blanks instead of NaN

    import pandas as pd
    df['dense_rank'] = df[df['comb_fld_order'].notnull()].groupby('comb_fld_order', sort=True).ngroup()+1
    df['dense_rank'] = df['dense_rank'].fillna('')
    df
    Out[1]: 
       comb_fld_order dense_rank
    0             NaN           
    1             NaN           
    2             NaN           
    3             1.1          1
    4             1.1          1
    5             NaN           
    

    If NULL is a string, then you can do:

    df['dense_rank'] = df[df['comb_fld_order'] != 'NULL'].groupby('comb_fld_order', sort=True).ngroup()+1
    

  • answered 2020-09-14 05:23 Joe Ferndz

    You can just give dropna=True

    Here's what I did.

    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame({'comb_fld_order':[np.nan, np.nan, np.nan, 1.1, 1.1, np.nan]})
    >>> df
       comb_fld_order
    0             NaN
    1             NaN
    2             NaN
    3             1.1
    4             1.1
    5             NaN
    >>> df['dense_rank'] = df.groupby('comb_fld_order', sort=True,dropna=True).ngroup()+1
    >>> df
       comb_fld_order  dense_rank
    0             NaN           0
    1             NaN           0
    2             NaN           0
    3             1.1           1
    4             1.1           1
    5             NaN           0