What are the pros/cons of using pd.Index vs df.loc?

What is the difference between selecting columns with a pd.Index and with df.loc? Are they effectively the same thing?

import pandas as pd

idx = pd.Index(('a', 'b'))
df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})

print(df.loc[:, ('a', 'b')])
print(df[idx])

Output (both calls print the same frame):
   a  b
0  0  2
1  1  3
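A quick way to confirm that the two selections really are the same for this frame is to compare them with DataFrame.equals. This is a minimal sketch reusing the question's df and idx; the variable names by_getitem and by_loc are just illustrative.

```python
import pandas as pd

idx = pd.Index(('a', 'b'))
df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})

by_getitem = df[idx]     # column selection through df[...]
by_loc = df.loc[:, idx]  # the same columns through label-based loc

# Same labels, dtypes and values, so equals() reports True.
print(by_getitem.equals(by_loc))   # True
print(list(by_getitem.columns))    # ['a', 'b']
```

For plain column selection on a frame like this, the results match; the differences discussed in the answers are about what else each form can do and how the result behaves when you modify it.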

2 answers

  • answered 2022-01-23 02:55 BENY

    With loc you can select rows, columns, or both in a single call, whereas passing a pd.Index to df[...] only selects columns:

    df.loc[[0]]
       a  b  c
    0  0  2  0
    
    df.loc[[0],['a','b']]
       a  b
    0  0  2
    

    IMO, loc is more flexible, and I would choose it because it keeps the code clearer in the long run and easier to check when you come back to it later.
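To illustrate the "rows and columns together" point, here is a minimal sketch using the question's df and idx. The row condition df['c'] > 0 is an arbitrary example mask, not something from the original answer.

```python
import pandas as pd

idx = pd.Index(('a', 'b'))
df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})

# loc accepts a row selector and a column selector in one call.
# Rows are picked with a boolean mask (rows where 'c' > 0).
subset = df.loc[df['c'] > 0, idx]
print(subset)

# With df[...] the same result needs two chained steps:
# first select the columns, then filter the rows.
chained = df[idx][df['c'] > 0]
print(subset.equals(chained))  # True
```

Both give the same frame here, but the loc version states the whole selection in one indexing operation, which is also what matters once you start assigning into the result.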

  • answered 2022-01-23 04:32 mozway

    The documentation describes loc as the preferred method. Chained indexing (selecting with df[...] and then modifying the result) can lead to a SettingWithCopyWarning:

    idx = ['a', 'b']
    d = df[idx]
    d.iloc[0,0] = 9
    
    SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame
    
    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
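If you actually want an independent frame to modify, one option is to say so explicitly with .copy(). This is a sketch assuming the same df as above; note that whether the warning shown here appears at all depends on the pandas version (with Copy-on-Write enabled, this warning is not emitted), but the explicit copy behaves the same way either way.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})
idx = ['a', 'b']

# .copy() makes it explicit that d is a separate DataFrame,
# so editing d never touches df.
d = df[idx].copy()
d.iloc[0, 0] = 9

print(d.iloc[0, 0])   # 9
print(df.iloc[0, 0])  # 0 (the original is untouched)
```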
    

    In contrast, using loc doesn't trigger the SettingWithCopyWarning:

    idx = ['a', 'b']
    d = df.loc[:,idx]
    d.iloc[0,0] = 9
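Note that the loc selection above is still a new object: editing d does not write back into df. A minimal sketch of the difference, assuming the goal is sometimes to change df itself, in which case the row and column selectors go into a single loc assignment:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})
idx = ['a', 'b']

# Selecting columns with a list via loc returns a new object,
# so editing it changes only the selection.
d = df.loc[:, idx]
d.iloc[0, 0] = 9
print(df.loc[0, 'a'])  # 0

# To modify df itself, put both selectors in one loc call.
df.loc[0, idx] = 9
print(df.loc[0].tolist())  # [9, 9, 0]
```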
    

    Of note, loc also lets you pass the axis as a parameter:

    df.loc(axis=1)[idx]
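A small sketch of what loc(axis=1) does with the question's df and idx: the single key that follows is applied to the columns, so the ':' for the rows can be dropped. The label slice 'a':'b' is an extra example, not from the original answer.

```python
import pandas as pd

idx = pd.Index(('a', 'b'))
df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})

# loc(axis=1)[key] selects columns, like df.loc[:, key].
by_axis = df.loc(axis=1)[idx]
print(by_axis.equals(df.loc[:, idx]))  # True

# Label slices work too; loc slices include both endpoints.
print(list(df.loc(axis=1)['a':'b'].columns))  # ['a', 'b']
```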
    
