# Drop a column if it exceeds a specific number of NA values

I want to write a program that drops a column if it contains more than a specific number of NA values. This is what I did:

``````
def check(x):
    for column in df:
        if df.column.isnull().sum() > 2:
            df.drop(column,axis=1)
``````

Defining the function above raises no error, but calling `df.apply(check)` produces a ton of errors.

P.S.: I know about the `thresh` argument in `df.dropna(thresh, axis)`.

Any tips? Why isn't my code working?

Thanks

I think the best approach here is to use `dropna` with the `thresh` parameter:

thresh : int, optional

Require that many non-NA values.

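Because `thresh` is the minimum number of non-NA values a column needs in order to be kept, a vectorized solution subtracts the allowed number of NAs from the length of the `DataFrame`: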

``````
N = 2
df = df.dropna(thresh=len(df)-N, axis=1)
print (df)
   A  D    E  F
0  a  1  5.0  a
1  b  3  3.0  a
2  c  5  6.0  a
3  d  7  9.0  b
4  e  1  2.0  b
5  f  0  NaN  b
``````
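As a minimal sketch of the same idea, the conversion from "maximum allowed NAs" to `thresh` can be wrapped in a small helper. The function name `drop_sparse_columns` and its `max_na` parameter are hypothetical and used only for illustration; the sample data is the frame used elsewhere in this thread.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(frame, max_na):
    """Return a new frame without the columns that hold more than max_na NA values."""
    # thresh is the required number of non-NA values, so convert the NA limit.
    return frame.dropna(axis=1, thresh=len(frame) - max_na)


df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [np.nan, np.nan, np.nan, 5, 5, np.nan],
    'C': [np.nan, 8, np.nan, np.nan, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, np.nan],
    'F': list('aaabbb'),
})

result = drop_sparse_columns(df, max_na=2)
print(result.columns.tolist())
```

Because `dropna` returns a new object here, the original `df` keeps all six columns, while `result` keeps only `A`, `D`, `E` and `F`.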

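If you want to keep your own function, I suggest using `DataFrame.pipe`, which passes the whole `DataFrame` to the function (whereas `df.apply` calls it once per column with a `Series`). Also change `df.column` to `df[column]`: dot notation does not work with a column name stored in a variable, because it tries to select a column literally named `column`. Finally, `drop` returns a new `DataFrame` unless `inplace=True` is passed: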

``````
import pandas as pd
import numpy as np

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[np.nan,np.nan,np.nan,5,5,np.nan],
                   'C':[np.nan,8,np.nan,np.nan,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,np.nan],
                   'F':list('aaabbb')})

print (df)
   A    B    C  D    E  F
0  a  NaN  NaN  1  5.0  a
1  b  NaN  8.0  3  3.0  a
2  c  NaN  NaN  5  6.0  a
3  d  5.0  NaN  7  9.0  b
4  e  5.0  2.0  1  2.0  b
5  f  NaN  3.0  0  NaN  b

def check(df):
    for column in df:
        # bracket notation looks up the column named by the variable
        if df[column].isnull().sum() > 2:
            df.drop(column,axis=1, inplace=True)
    return df

print (df.pipe(check))
   A  D    E  F
0  a  1  5.0  a
1  b  3  3.0  a
2  c  5  6.0  a
3  d  7  9.0  b
4  e  1  2.0  b
5  f  0  NaN  b
``````
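To illustrate the difference between dot notation and bracket notation, here is a small sketch on a hypothetical two-column frame (the frame and the variable names `attribute_lookup_failed` and `na_count` are made up for this example). It assumes the frame has no column literally called `column`, which is the situation in the question.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; it has no column literally named 'column'.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.nan, np.nan, 5.0]})

column = 'B'

# Dot notation ignores the variable and looks for an attribute called 'column'.
try:
    df.column
    attribute_lookup_failed = False
except AttributeError:
    attribute_lookup_failed = True

# Bracket notation uses the value stored in the variable, here 'B'.
na_count = int(df[column].isnull().sum())

print(attribute_lookup_failed, na_count)
```

The attribute lookup raises `AttributeError`, while `df[column]` selects column `B` and counts its two missing values.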

Although jezrael's answer works, looping over the columns is not the approach I would recommend. Instead, create a boolean mask, `~df.isnull().sum().gt(2)`, and apply it with `.loc[:, m]` to select the columns to keep.

Full example:

``````
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A':list('abcdef'),
    'B':[np.nan,np.nan,np.nan,5,5,np.nan],
    'C':[np.nan,8,np.nan,np.nan,2,3],
    'D':[1,3,5,7,1,0],
    'E':[5,3,6,9,2,np.nan],
    'F':list('aaabbb')
})

m = ~df.isnull().sum().gt(2)
df = df.loc[:,m]

print(df)
``````

Returns:

``````
   A  D    E  F
0  a  1  5.0  a
1  b  3  3.0  a
2  c  5  6.0  a
3  d  7  9.0  b
4  e  1  2.0  b
5  f  0  NaN  b
``````

Explanation

Assume we print the columns and the mask before applying it.

``````
print(df.columns.tolist())
print(m.tolist())
``````

It would return this:

``````
['A', 'B', 'C', 'D', 'E', 'F']
[True, False, False, True, True, True]
``````

Columns B and C are unwanted (False). They are removed when the mask is applied.
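As a rough sketch of how the mask is derived, the intermediate per-column NA counts can be inspected before they are compared with the limit. The names `na_counts` and `kept` are introduced here only for illustration; the frame is rebuilt so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [np.nan, np.nan, np.nan, 5, 5, np.nan],
    'C': [np.nan, 8, np.nan, np.nan, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, np.nan],
    'F': list('aaabbb'),
})

# Step 1: number of missing values per column.
na_counts = df.isnull().sum()

# Step 2: True for columns to keep (no more than 2 missing values).
m = ~na_counts.gt(2)

# Step 3: select the kept columns without modifying df.
kept = df.loc[:, m]

print(na_counts)
print(kept.columns.tolist())
```

Here `B` has 4 missing values and `C` has 3, so both exceed the limit of 2 and are masked out, while `E` with a single missing value is kept.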

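Alternatively, you can use `count`, which counts the non-null values in each column. A column with at most 2 NAs has at least `len(df.index) - 2` non-null values, so `ge` (rather than `gt`) matches the "more than 2 NAs" rule from the question; on the sample data both give the same result:

``````
df.loc[:, df.count().ge(len(df.index) - 2)]
``````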