I know that .itertuples() and .iterrows() are slow, but how can I speed them up if I need to use and process data one row at a time, as shown below?
import pandas as pd

df = pd.read_csv('example.csv')
posts = []
for row in df.itertuples():
    post = Post(title=row.title, text=row.text, ...)
    posts.append(post)
CodePudding user response:
If your DataFrame columns have the same names as your class's constructor parameters, you can use a list comprehension with keyword-argument unpacking (**kwargs). An example is shown below.
df = pd.DataFrame({"title": ["fizz", "buzz"], "text": ["aaaa", "bbbb"]})
posts = [Post(**kwargs) for kwargs in df.to_dict("records")]
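For a version that runs on its own, here is a minimal sketch. The Post dataclass below is a hypothetical stand-in for the asker's class (the original Post is not shown), and it assumes the constructor takes only title and text.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Post:
    # Hypothetical stand-in for the asker's Post class.
    title: str
    text: str


df = pd.DataFrame({"title": ["fizz", "buzz"], "text": ["aaaa", "bbbb"]})

# to_dict("records") returns one {column: value} dict per row;
# each dict is unpacked into Post's keyword arguments.
posts = [Post(**kwargs) for kwargs in df.to_dict("records")]
print(posts)
```

If the DataFrame has extra columns that Post does not accept, select the needed columns first, for example df[["title", "text"]].to_dict("records").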
CodePudding user response:
I usually use the apply function with axis=1, which calls a function on each row.
import pandas as pd
df = pd.DataFrame(dict(title=["title1", "title2", "title3"], text=["text1", "text2", "text3"]))
df["Posts"] = df.apply(lambda x: dict(title=x["title"], text=x["text"]), axis=1)
posts = list(df["Posts"])
print(posts)
Output:
[{'title': 'title1', 'text': 'text1'}, {'title': 'title2', 'text': 'text2'}, {'title': 'title3', 'text': 'text3'}]
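It's generally better to avoid an explicit for loop when another method is available. Keep in mind that apply with axis=1 still calls the function once per row in Python, so it is worth timing it against itertuples() on your own data.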
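Another option, not mentioned above, is to iterate over the columns directly instead of over rows. The sketch below assumes a hypothetical Post dataclass with only title and text fields. It shows two variants: zip over the column Series, and itertuples(index=False, name=None), which yields plain tuples rather than namedtuples. Neither is guaranteed to be faster in every case, so treat this as something to benchmark.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Post:
    # Hypothetical stand-in for the asker's Post class.
    title: str
    text: str


df = pd.DataFrame(
    {
        "title": ["title1", "title2", "title3"],
        "text": ["text1", "text2", "text3"],
    }
)

# Variant 1: walk the two columns in parallel; no per-row object is built.
posts = [Post(title=t, text=x) for t, x in zip(df["title"], df["text"])]

# Variant 2: itertuples with name=None yields plain tuples in column order,
# so the columns are selected explicitly to match Post's field order.
posts_from_tuples = [
    Post(*row)
    for row in df[["title", "text"]].itertuples(index=False, name=None)
]

print(posts)
```

Both variants build the same list; the zip form keeps the column-to-argument mapping explicit, which helps when the DataFrame has more columns than the class needs.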
