Let's say we have a 100-row pandas DataFrame "frame", and then we define a function:
    def test(a_dataframe):
        a_dataframe["new_col"] = "new_value"
        a_dataframe = a_dataframe.iloc[0:10, :]
If we run test(frame), the frame object has the "new_col" column but still has 100 rows.
Could anybody explain why the function test can add a new column to a DataFrame but cannot subset it?
Thanks
I thought the "test" function would add a new column to the DataFrame and also subset it to the first 10 rows.
CodePudding user response:
When you call the function with test(frame), the local variable a_dataframe inside the function will initially contain a reference to the frame object that exists outside of the function. Now the two lines within the body of the function do very different things:
1. a_dataframe["new_col"] = "new_value" does not change the value of the local variable a_dataframe. Instead, it invokes the __setitem__ method on the dataframe that is referenced by that variable. So the frame outside the function is changed accordingly.
2. a_dataframe = a_dataframe.iloc[0:10,:] does change the value of the local variable a_dataframe. This has nothing to do with the iloc method. It is simply because with a_dataframe = <anything>, you assign a new value to the local variable a_dataframe, thus overwriting the reference to frame it initially contained.
If you do want to drop rows from frame from within the function, you could use something like a_dataframe.drop(range(10, 100), inplace=True). This would work similarly to case 1. above, calling a method on the dataframe that is referenced by the local variable. Note that the first argument of the drop method refers to index values, which are not necessarily identical to the row numbers that iloc refers to.
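As a rough sketch of the in-place approach, the example below drops rows by label while choosing them by position, so it does not depend on the index being 0 to 99. The helper name keep_first_ten and the index starting at 1000 are hypothetical, and the sketch assumes the index labels are unique.

```python
import pandas as pd

def keep_first_ten(a_dataframe):
    # a_dataframe.index[10:] yields the labels of every row after the first
    # ten positions; drop() then removes those labels from the same object.
    # Assumes unique index labels, otherwise duplicates of a label would all go.
    a_dataframe.drop(a_dataframe.index[10:], inplace=True)

# Hypothetical frame whose index labels (1000..1099) differ from row positions.
frame = pd.DataFrame({"x": range(100)}, index=range(1000, 1100))
keep_first_ten(frame)

print(len(frame))  # 10
```

Because no assignment to the local name happens inside the function, the change is visible through frame after the call.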
