Apply and Lambda Transform
In this notebook we will learn to perform column operations using the apply() method together with lambda functions.
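In pandas, apply() exists on both Series and DataFrame, and the lambda receives different things in each case. Series.apply calls it once per value. DataFrame.apply passes it a whole column (axis=0, the default) or a whole row (axis=1). The sketch below shows all three cases on a small made-up frame. The frame and its column names are illustrative only.

```python
import pandas as pd

# Small made-up frame for illustration only (not part of the Titanic data).
frame = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Series.apply: the lambda sees one scalar at a time.
doubled = frame["A"].apply(lambda x: 2 * x)

# DataFrame.apply with axis=0 (the default): the lambda sees one column (a Series).
col_range = frame.apply(lambda col: col.max() - col.min())

# DataFrame.apply with axis=1: the lambda sees one row, indexed by column name.
row_sum = frame.apply(lambda row: row["A"] + row["B"], axis=1)

print(doubled.tolist())    # [2, 4, 6]
print(col_range.tolist())  # [2, 20]
print(row_sum.tolist())    # [11, 22, 33]
```

The rest of this notebook uses the Series form, where the lambda acts on one column's values.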
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
Load data
titanic = pd.read_csv('data/titanic.csv')
df1 = titanic.set_index('Name')
df1.head(2)
| Name | PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Braund, Mr. Owen Harris | 1 | 0 | 3 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| Cumings, Mrs. John Bradley (Florence Briggs Thayer) | 2 | 1 | 1 | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
1. Implementing apply() with lambda functions
- Apply a lambda function to the Age column.
# Apply to the whole column (NaN ages stay NaN); limit rows only when displaying
df1['remaining-age'] = df1['Age'].apply(lambda x: 100 - x)
df1.head(5)
| Name | PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | remaining-age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Braund, Mr. Owen Harris | 1 | 0 | 3 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | 78.0 |
| Cumings, Mrs. John Bradley (Florence Briggs Thayer) | 2 | 1 | 1 | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | 62.0 |
| Heikkinen, Miss. Laina | 3 | 1 | 3 | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | 74.0 |
| Futrelle, Mrs. Jacques Heath (Lily May Peel) | 4 | 1 | 1 | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | 65.0 |
| Allen, Mr. William Henry | 5 | 0 | 3 | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S | 65.0 |
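Assigning a Series to a column aligns it on the index. If the right-hand side is truncated (for example with `.head(5)`), only the rows with matching labels are filled, and every other row becomes NaN. The minimal sketch below shows this on a small made-up frame; the names and ages are illustrative, not Titanic rows. It also shows the vectorized form `100 - df['Age']`, which gives the same result for plain arithmetic.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; names and ages are made up for this sketch.
people = pd.DataFrame(
    {"Age": [22.0, 38.0, 26.0, np.nan]},
    index=["Ann", "Bob", "Cid", "Dee"],
)

# Truncated right-hand side: assignment aligns on the index,
# so the rows missing from the truncated Series become NaN.
people["partial"] = people["Age"].apply(lambda x: 100 - x).head(2)

# Full right-hand side: every row gets a value (a NaN age stays NaN).
people["remaining"] = people["Age"].apply(lambda x: 100 - x)

# Vectorized equivalent; for plain arithmetic this avoids a Python call per row.
people["remaining_vec"] = 100 - people["Age"]

print(people)
```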
- Apply a lambda function to the Fare column to transform each value.
df1['Fare'].apply(lambda x: (10*x**2 + 2*x +4)/10).head(5)
Name
Braund, Mr. Owen Harris 54.412500
Cumings, Mrs. John Bradley (Florence Briggs Thayer) 5095.965519
Heikkinen, Miss. Laina 64.790625
Futrelle, Mrs. Jacques Heath (Lily May Peel) 2830.630000
Allen, Mr. William Henry 66.812500
Name: Fare, dtype: float64
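The lambda above is called once per value. The same formula can also be written directly on the Series, and pandas then evaluates it for all values at once. The sketch below uses the five Fare values from the table above and checks that both forms agree. It is an illustration of the equivalence, not a benchmark.

```python
import numpy as np
import pandas as pd

# Fare values copied from the first five rows shown above.
fare = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05], name="Fare")

# Element-wise: the lambda is called once per value.
via_apply = fare.apply(lambda x: (10 * x**2 + 2 * x + 4) / 10)

# Vectorized: the same formula evaluated on the whole Series.
via_vector = (10 * fare**2 + 2 * fare + 4) / 10

print(np.allclose(via_apply, via_vector))  # True
```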
- Instead of a lambda, we can write a named function and pass it to apply().
def newfeature(x):
    # Transform a single value: 10 + x/3 + x^2
    return 10 + x/3 + x**2

df1['Fare'].apply(newfeature).head(4)
Name
Braund, Mr. Owen Harris 64.979167
Cumings, Mrs. John Bradley (Florence Briggs Thayer) 5115.069959
Heikkinen, Miss. Laina 75.447292
Futrelle, Mrs. Jacques Heath (Lily May Peel) 2847.310000
Name: Fare, dtype: float64
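A named function can also take extra parameters. Series.apply forwards the tuple given in `args` as extra positional arguments and passes any remaining keyword arguments through to the function. In the sketch below, `scaled_feature` is a hypothetical helper, not part of the original notebook. It makes the constant and the divisor of `newfeature` configurable, so with `offset=10` and `divisor=3` it should reproduce the values above.

```python
import pandas as pd

# Fare values copied from the first three rows shown above.
fare = pd.Series([7.25, 71.2833, 7.925], name="Fare")

# Hypothetical helper: same shape as newfeature, but the offset and
# divisor are parameters instead of hard-coded constants.
def scaled_feature(x, offset, divisor=3):
    return offset + x / divisor + x**2

# `args` supplies extra positional arguments; keyword arguments are passed through.
result = fare.apply(scaled_feature, args=(10,), divisor=3)
print(result.round(6).tolist())
```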
2. Column Operations with Lambda Functions
- Let's create a new random DataFrame to experiment with.
# Values are random and unseeded, so your numbers will differ from the output below
dates = pd.date_range('1/1/2000', periods=100)
df = pd.DataFrame(np.random.randn(100, 4),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.head()
| | A | B | C | D |
|---|---|---|---|---|
| 2000-01-01 | -0.656738 | -0.461095 | -0.259647 | 0.890244 |
| 2000-01-02 | 0.652611 | 0.906148 | -0.527606 | -0.106089 |
| 2000-01-03 | -0.067463 | 1.407429 | 1.414694 | -1.266369 |
| 2000-01-04 | 0.301058 | 0.624163 | -0.144190 | -1.177690 |
| 2000-01-05 | 1.557796 | -1.497422 | 0.545636 | -1.006319 |
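The frame above comes from an unseeded `np.random.randn`, so it changes on every run. If you want repeatable numbers while experimenting, one option is a seeded NumPy Generator. In the sketch below, `make_frame` is a hypothetical helper and the seed 42 is arbitrary. The resulting values will not match the table above.

```python
import numpy as np
import pandas as pd

def make_frame(seed):
    # A seeded Generator makes the "random" values repeatable across runs.
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2000-01-01", periods=100)
    return pd.DataFrame(rng.standard_normal((100, 4)),
                        index=dates, columns=["A", "B", "C", "D"])

first = make_frame(42)
second = make_frame(42)
print(first.equals(second))  # True
```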
- We can add, subtract, multiply, and divide columns directly, as long as their data types are compatible (for example, numeric). These operations are element-wise and aligned on the index.
df['E'] = (df['A'] + df['B'])/df['C']
df.head()
| | A | B | C | D | E |
|---|---|---|---|---|---|
| 2000-01-01 | -0.656738 | -0.461095 | -0.259647 | 0.890244 | 4.305207 |
| 2000-01-02 | 0.652611 | 0.906148 | -0.527606 | -0.106089 | -2.954399 |
| 2000-01-03 | -0.067463 | 1.407429 | 1.414694 | -1.266369 | 0.947177 |
| 2000-01-04 | 0.301058 | 0.624163 | -0.144190 | -1.177690 | -6.416660 |
| 2000-01-05 | 1.557796 | -1.497422 | 0.545636 | -1.006319 | 0.110649 |
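Column arithmetic matches values by index label, not by position. Within one DataFrame, all columns share the same index, so every row lines up. Between Series with different indexes, a label present in only one operand produces NaN. Also, dividing by zero yields `inf` instead of raising an error. The two small made-up Series below show both behaviours. They also show that int64 and float64 values can be combined.

```python
import numpy as np
import pandas as pd

# Made-up Series for illustration; note the indexes only partly overlap.
left = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])    # float64
right = pd.Series([10, 20, 0], index=["b", "c", "d"])       # int64

total = left + right   # aligned on labels: "a" and "d" have no partner -> NaN
ratio = left / right   # "b": 2/10, "c": 3/20; "a" and "d" -> NaN

# Same index on both sides, with a zero divisor in the first position.
same_index = pd.Series([1.0, 2.0], index=["x", "y"]) / pd.Series([0, 4], index=["x", "y"])

print(total)
print(ratio)
print(same_index)  # "x" -> inf, "y" -> 0.5
```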
- One can use lambda functions to transform the columns before the column operation.
df['F'] = df['A'].apply(lambda x : 10+x) + df['E'].apply(lambda x: x+20 if x>0 else x)
df.head()
| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 2000-01-01 | -0.656738 | -0.461095 | -0.259647 | 0.890244 | 4.305207 | 33.648469 |
| 2000-01-02 | 0.652611 | 0.906148 | -0.527606 | -0.106089 | -2.954399 | 7.698212 |
| 2000-01-03 | -0.067463 | 1.407429 | 1.414694 | -1.266369 | 0.947177 | 30.879714 |
| 2000-01-04 | 0.301058 | 0.624163 | -0.144190 | -1.177690 | -6.416660 | 3.884397 |
| 2000-01-05 | 1.557796 | -1.497422 | 0.545636 | -1.006319 | 0.110649 | 31.668445 |
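The conditional lambda `x+20 if x>0 else x` can also be written in vectorized form with `np.where`, which picks between two arrays element by element. The sketch below builds a small seeded frame to compare the two forms. It uses an arbitrary seed, so the values differ from the table above. It then checks that the `apply` version and the `np.where` version of column F agree. This is one alternative formulation, not the notebook's own method.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # arbitrary seed; values differ from the table above
frame = pd.DataFrame(rng.standard_normal((100, 3)), columns=["A", "B", "C"])
frame["E"] = (frame["A"] + frame["B"]) / frame["C"]

# Element-wise version, as in the cell above.
f_apply = frame["A"].apply(lambda x: 10 + x) + frame["E"].apply(lambda x: x + 20 if x > 0 else x)

# Vectorized version: np.where takes E + 20 where E > 0, otherwise E.
f_where = (10 + frame["A"]) + np.where(frame["E"] > 0, frame["E"] + 20, frame["E"])

print(np.allclose(f_apply, f_where))  # True
```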