Understanding Pandas in Python
Pandas is a powerful library used for data manipulation and analysis in Python. It provides data structures and functions designed to efficiently handle structured data, including tabular data such as spreadsheets and SQL tables.
Installing Pandas
Before you can use pandas, you need to install it. Run the following command in your terminal or command prompt:
pip install pandas
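To confirm that the installation worked, a quick check along these lines can help. It is only a sketch: it imports the library and prints the installed version, and the exact version string depends on your environment.

```python
import pandas as pd

# Print the installed pandas version; the value varies by environment
version = pd.__version__
print(version)
```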
Basic Pandas Data Structures
Pandas has two primary data structures: Series and DataFrame.
Series
A Series is a one-dimensional labeled array. It’s similar to a list, but each value also has an index label that you can use to access it. For example, the following code creates a Series with custom labels:
import pandas as pd
# Creating a series
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(s)
Output:
a    1
b    2
c    3
d    4
e    5
dtype: int64
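As a minimal sketch of how the index is used to access values, the example below reads from the same Series by label with loc and by integer position with iloc. The variable names are chosen for illustration only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Access a single value by its index label
by_label = s.loc['c']

# Access a single value by its integer position
by_position = s.iloc[0]

# Select several labels at once; the result is another Series
subset = s.loc[['a', 'e']]

print(by_label)         # 3
print(by_position)      # 1
print(subset.tolist())  # [1, 5]
```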
DataFrame
A DataFrame is a two-dimensional labeled data structure whose columns can hold different types. It’s similar to an Excel spreadsheet or a SQL table. For example, the following code builds a DataFrame from a dictionary of lists:
import pandas as pd
# Creating a dataframe
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'UK', 'Australia', 'Germany']}
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age    Country
0   John   28        USA
1   Anna   24         UK
2  Peter   35  Australia
3  Linda   32    Germany
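To illustrate the point about columns of different types, the sketch below inspects the shape and per-column dtypes of the same DataFrame. The dtype reported for the text columns depends on the pandas version, so no output is shown for that line.

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'UK', 'Australia', 'Germany']}
df = pd.DataFrame(data)

# Number of rows and columns
print(df.shape)  # (4, 3)

# Each column has its own dtype: 'Age' is an integer column, while the
# text columns use a string or object dtype depending on the pandas version
print(df.dtypes)

# Selecting one column returns a Series
ages = df['Age']
print(ages.mean())  # 29.75
```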
Renaming Columns in a DataFrame
Now that we have our DataFrame, let’s talk about renaming columns. When you create a DataFrame from a dictionary, pandas uses the dictionary keys as the column headers. You can inspect the current headers through the columns attribute.
import pandas as pd
# Creating a dataframe whose column headers come from the dictionary keys
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'UK', 'Australia', 'Germany']}
df = pd.DataFrame(data)
print(df.columns)
Output:
Index(['Name', 'Age', 'Country'], dtype='object')
As you can see, the column headers are simply the keys of our data dictionary. To give them custom names, we can assign a list to the columns attribute. The list must contain exactly one name per column, in the existing column order.
# Renaming columns
df.columns = ['Employee Name', 'Age Group', 'Nationality']
print(df.columns)
Output:
Index(['Employee Name', 'Age Group', 'Nationality'], dtype='object')
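One caveat with assigning to columns is that the list has to match the number of columns. The sketch below, using a small DataFrame made up for this example, shows that a list of the wrong length is rejected with a ValueError and the headers stay as they were.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'],
                   'Age': [28, 24],
                   'Country': ['USA', 'UK']})

# Three columns but only two names: pandas rejects the assignment
try:
    df.columns = ['Employee Name', 'Age Group']
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print(f"Assignment failed: {exc}")

# The original headers are left as they were
print(df.columns.tolist())  # ['Name', 'Age', 'Country']
```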
Assigning to columns changes the headers in place, but you have to list every name in the right order. The rename method instead takes a mapping from old names to new names and returns a new DataFrame. Starting again from the original data:
# Renaming columns using rename
df = pd.DataFrame(data)
df = df.rename(columns={'Name': 'Employee Name', 'Age': 'Age Group', 'Country': 'Nationality'})
print(df.columns)
Output:
Index(['Employee Name', 'Age Group', 'Nationality'], dtype='object')
Or, if you want to rename only some columns and keep the other names as they are, include just those columns in the mapping:
# Renaming one column while keeping the rest unchanged
df = pd.DataFrame(data)
df = df.rename(columns={'Name': 'Employee Name'})
print(df.columns)
Output:
Index(['Employee Name', 'Age', 'Country'], dtype='object')
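It is worth noting why the examples assign the result back to df: rename returns a new DataFrame and leaves the original untouched. The sketch below, with a small DataFrame made up for illustration, shows both objects side by side.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# rename returns a new DataFrame; the original keeps its headers
renamed = df.rename(columns={'Name': 'Employee Name'})

print(df.columns.tolist())       # ['Name', 'Age']
print(renamed.columns.tolist())  # ['Employee Name', 'Age']

# The cell values are identical; only the label differs
same_values = renamed['Employee Name'].tolist() == df['Name'].tolist()
print(same_values)  # True
```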
Renaming a Specific Column
Now, let’s say we have a DataFrame with a column named ‘Data Field’ that we want to rename to ‘Column1’.
import pandas as pd
# Creating a dataframe
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Data Field': ['USA', 'UK', 'Australia', 'Germany']}
df = pd.DataFrame(data)
print(df.columns)
Output:
Index(['Name', 'Age', 'Data Field'], dtype='object')
As you can see, the column header is ‘Data Field’. To rename this column to ‘Column1’, we can use the rename method:
# Renaming a specific column
df = df.rename(columns={'Data Field': 'Column1'})
print(df.columns)
Output:
Index(['Name', 'Age', 'Column1'], dtype='object')
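Columns that are not listed in the mapping are left untouched. By default, keys that don’t match any column are also ignored, so running the same call again on the already renamed DataFrame changes nothing:
# Repeating the rename is a no-op because 'Data Field' no longer exists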
df = df.rename(columns={'Data Field': 'Column1'})
print(df.columns)
Output:
Index(['Name', 'Age', 'Column1'], dtype='object')
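Because rename silently ignores keys that match no column, a typo in a column name can go unnoticed. A minimal sketch of how to surface that, using the errors parameter of rename on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# Default behavior: the unknown key is ignored and nothing changes
unchanged = df.rename(columns={'Data Field': 'Column1'})
print(unchanged.columns.tolist())  # ['Name', 'Age']

# With errors='raise', a key that matches no column raises KeyError
try:
    df.rename(columns={'Data Field': 'Column1'}, errors='raise')
    key_error_raised = False
except KeyError as exc:
    key_error_raised = True
    print(f"Rename failed: {exc}")
```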
Renaming the Index Name
Renaming a column with rename only changes that column’s header. It does not touch the name of the row index, which is a separate label attached to df.index.
To set the name of the row index, assign to index.name:
# Setting the name of the row index
df.index.name = 'Column1'
This sets the name of the row index to ‘Column1’. The name is shown above the row labels when the DataFrame is printed, and it is independent of the column headers, so no column is added or renamed.
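The difference between the index name and the column headers is easier to see in a small example. In the sketch below, 'row_id' is an arbitrary label chosen for illustration; the index name only becomes a column header once reset_index moves the index into the columns.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# Name the row index; the column headers are not affected
df.index.name = 'row_id'

print(df.index.name)        # row_id
print(df.columns.tolist())  # ['Name', 'Age']

# reset_index turns the named index into a regular column with that name
flat = df.reset_index()
print(flat.columns.tolist())  # ['row_id', 'Name', 'Age']
```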
Why Does This Happen?
When we rename a column using rename, pandas replaces only the label of that column in the DataFrame it returns. The data stored in the column is not changed.
Similarly, setting index.name only attaches a name to the existing row index. The index values themselves stay the same.
So, if you want to rename both a column header and the index name, you need a separate step for each:
# Renaming a column header and the index name separately
df = df.rename(columns={'Column1': 'New Column Name'})
df.index.name = 'New Index Name'
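The two steps can also be written as one chained expression. This is a sketch rather than the only way to do it: it uses rename for the column header and rename_axis for the index name, and 'person_id' is a hypothetical name picked for the example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'],
                   'Data Field': ['USA', 'UK']})

# rename changes the column header; rename_axis names the row index.
# Both return new DataFrames, so the calls can be chained.
result = (
    df.rename(columns={'Data Field': 'Column1'})
      .rename_axis('person_id')
)

print(result.columns.tolist())  # ['Name', 'Column1']
print(result.index.name)        # person_id
```

Since neither call modifies df, the original DataFrame still has its old header and an unnamed index afterwards.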
Or, if you want to rename a column by its position rather than by its current name, note that the Index object behind df.columns is immutable, so you can’t assign to df.columns[0] directly. Copy the headers to a list, change the entry you need, and assign the list back:
# Renaming the first column by position while keeping the rest unchanged
cols = df.columns.tolist()
cols[0] = 'First Column Name'
df.columns = cols
This approach doesn’t change the data stored in the column or the order of the columns. It only replaces the header at position 0.
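As a sketch of renaming by position, the example below first shows that item assignment on df.columns raises a TypeError, then looks up the current label at position 0 and passes it to rename. The DataFrame and the new name are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# Index objects are immutable, so item assignment raises TypeError
try:
    df.columns[0] = 'Employee Name'
    immutable_error = False
except TypeError:
    immutable_error = True

# Look up the current label at position 0 and pass it to rename
renamed = df.rename(columns={df.columns[0]: 'Employee Name'})
print(renamed.columns.tolist())  # ['Employee Name', 'Age']
```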
Conclusion
In this article, we’ve learned how to rename columns in a pandas DataFrame. We covered assigning new names through the columns attribute, using rename to rename all or only specific columns, and setting the index name separately from the column headers.
Last modified on 2024-05-07