Selecting, Slicing and Filtering data in a Pandas DataFrame

Last updated on 16th October 2019

One of the essential features that a data analysis tool must provide users for working with large data-sets is the ability to select, slice, and filter data easily. Pandas provide this feature through the use of DataFrames. A data frame consists of data, which is arranged in rows and columns, and row and column labels. You can easily select, slice or take a subset of the data in several different ways, for example by using labels, by index location, by value and so on. Here we demonstrate some of these operations using a sample DataFrame.

First and foremost, let's create a DataFrame with a dataset that contains 5 rows and 4 columns and values from ranging from 0 to 19. We will use the arange() and reshape() functions from NumPy library to create a two-dimensional array and this array is passed to the Pandas DataFrame constructor function.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5,4), columns=["A","B","C","D"])
print(df)

Output:

    A   B   C   D
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19

Select rows and columns using labels

You can select rows and columns in a Pandas DataFrame by using their corresponding labels.

To select a single column.

df.loc[:,"A"]
  or
df["A"]
  or
df.A

Output:

0     0
1     4
2     8
3    12
4    16
Name: A, dtype: int32

To select multiple columns.

df.loc[:, ["A", "C"]]
  or
df[["A", "C"]]

Output:

Select a row by its label.

 df.loc[0]

Output:

A    0
B    1
C    2
D    3
Name: 0, dtype: int32

Select multiple rows by label.

df.loc[[0,1]]

Output:

   A  B  C  D
0  0  1  2  3
1  4  5  6  7

Accessing values by row and column label.
```
df.loc[0,"D"]
```
Output:
```
3
```

Accessing values from multiple columns of same row.

df.loc[1,["A", "C"]]

Output:

A    4
C    6
Name: 1, dtype: int32

Accessing values from multiple rows but same column.
```
df.loc[[0,1],"B"]
```
Output:
```
0    1
1    5
Name: B, dtype: int32
```

Select by Index Position

You can select data from a Pandas DataFrame by its location. Note, Pandas indexing starts from zero.

Select a row by index location.

df.iloc[0]

Output:

A    0
B    1
C    2
D    3
Name: 0, dtype: int32

Select a column by index location.

df.iloc[:, 3]

Output:

0     3
1     7
2    11
3    15
4    19
Name: D, dtype: int32

Select data at the specified row and column location.
```
df.iloc[0,3]
```
Output:
```
3
```

Select list of rows and columns.

df.iloc[[1,2],[0, 1]]

Output:

   A  B
1  4  5
2  8  9

Slicing Rows and Columns using labels

You can select a range of rows or columns using labels or by position. To slice by labels you use loc attribute of the DataFrame.

Slice rows by label.

df.loc[1:3, :]

Output:

    A   B   C   D
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15

Slice columns by label.

df.loc[:, "B":"D"]

Output:

    B   C   D
0   1   2   3
1   5   6   7
2   9  10  11
3  13  14  15
4  17  18  19

Slice rows and columns by label.

 
df.loc[1:3, "B":"D"]

Output:

    B   C   D
1   5   6   7
2   9  10  11
3  13  14  15

Slicing Rows and Columns by position

To slice a Pandas dataframe by position use the iloc attribute. Remember index starts from 0 to (number of rows/columns - 1).

To slice rows by index position.

df.iloc[0:2,:]

Output:

   A  B  C  D
0  0  1  2  3
1  4  5  6  7

To slice columns by index position.

df.iloc[:,1:3]

Output:

To slice row and columns by index position.

df.iloc[1:2,1:3]

Output:

   B  C
1  5  6

df.iloc[:2,:2]

Output:

   A  B
0  0  1
1  4  5

Subsetting by boolean conditions

You can use boolean conditions to obtain a subset of the data from the DataFrame.

Select rows based on column value

To select all rows whose column contain the specified value(s).

df[df.B == 9]
 or
df.loc[df.B == 9]

Output:

  A  B   C   D
2  8  9  10  11

You can also use the isin() method


df[df.B.isin([9,13])]

Output:

    A   B   C   D
2   8   9  10  11
3  12  13  14  15

Rows that match multiple boolean conditions.

df[(df.B == 5) | (df.C == 10)]

Output:

   A  B   C   D
1  4  5   6   7
2  8  9  10  11

Select rows whose column does not contain the specified values.

 df[~df.B.isin([9,13])]

Output:

    A   B   C   D
0   0   1   2   3
1   4   5   6   7
4  16  17  18  19

Select columns based on row value

To select columns whose rows contain the specified value.

df.loc[:,df.isin([9,12]).any()]

Output:

Subsetting using filter method

Subsets can be created using the filter method like below.

df.filter(items=["A","D"])

Output:

df.filter(like="2", axis=0)

Output:

   A  B   C   D
2  8  9  10  11

df.filter(regex="[^AB]")

Output:

Selecting, Slicing and Filtering data in a Pandas DataFrame

Select rows and columns using labels

Select by Index Position

Slicing Rows and Columns using labels

Slicing Rows and Columns by position

Subsetting by boolean conditions

Select rows based on column value

Select columns based on row value

Subsetting using filter method

Related Articles

Post a comment

Comments

Related Articles

Reading data from MySQL to Pandas Dataframe

Different ways to create a Pandas DataFrame

Writing data from a Pandas Dataframe to a MySQL table