Series & DataFrame

Pandas is an open-source data manipulation and analysis library for Python. It provides a powerful data structure called DataFrame, which is similar to a table in a relational database, and allows you to easily manipulate and analyze data.

Some of the key features of the Pandas library include:

Data cleaning and preparation: Pandas provides functions to handle missing or incomplete data, and allows you to transform data into the desired format.
Data indexing and selection: You can easily select, filter, and slice data using Pandas.
Data aggregation and grouping: Pandas provides powerful functions for grouping and summarizing data.
Data visualization: Pandas integrates with other visualization libraries such as Matplotlib to create rich visualizations of your data.

Pandas is widely used in data science, machine learning, and finance for data analysis and manipulation tasks.
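
To make the features listed above a little more concrete, here is a minimal sketch that touches cleaning, selection, and aggregation in a few lines. The column names (team, score) and the values are made up for illustration and are not part of any real dataset.

```python
import pandas as pd

# Hypothetical sample data; None marks a missing score.
raw = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "score": [10.0, None, 30.0, 45.0],
})

# Data cleaning: replace the missing score with 0.
clean = raw.fillna({"score": 0.0})

# Selection: keep only the rows with a positive score.
positive = clean[clean["score"] > 0]

# Aggregation: total score per team.
totals = clean.groupby("team")["score"].sum()
print(totals)
```

Filling missing values with 0 is only one possible choice; depending on the data, dropping the rows with dropna() may be more appropriate.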

Series

A Pandas Series is a one-dimensional, array-like object that can hold any data type. It is similar to a column in a spreadsheet or a database table. The steps below show how to create and use one:

First, import the Pandas library:

python
import pandas as pd

Create a Series object by passing a list of values to the pd.Series() function:

python
data = [10, 20, 30, 40, 50]
s = pd.Series(data)

This creates a Series object with five elements (10, 20, 30, 40, and 50).

Here's an example:

python
import pandas as pd

data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)

Output:

text
0    10
1    20
2    30
3    40
4    50
dtype: int64

By default, Pandas assigns a numeric index starting from 0 to the elements of the Series.
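
As a small sketch of what that default index looks like, the snippet below inspects it directly. With this default index, looking up by label and looking up by position give the same element.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# The default index is a RangeIndex covering the positions 0..4.
print(s.index)        # RangeIndex(start=0, stop=5, step=1)
print(list(s.index))  # [0, 1, 2, 3, 4]

# With the default index, the label 0 and the position 0 are the same element.
first_by_label = s[0]
first_by_position = s.iloc[0]
print(first_by_label, first_by_position)  # 10 10
```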

You can also specify custom index labels for the Series elements:

python
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
s = pd.Series(data, index=index)
print(s)

This creates a Series object with five elements (10, 20, 30, 40, and 50) and custom index labels ('a', 'b', 'c', 'd', and 'e').

Output:

text
a    10
b    20
c    30
d    40
e    50
dtype: int64
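
Another way to get custom labels, shown here as a short sketch, is to build the Series from a dictionary: the keys become the index and the values become the data. The fruit names and prices are invented for the example.

```python
import pandas as pd

# Dictionary keys become the index labels, values become the data.
# (Hypothetical prices, for illustration only.)
prices = pd.Series({"apple": 1.5, "banana": 0.25, "cherry": 4.0})

print(prices.index.tolist())  # ['apple', 'banana', 'cherry']
print(prices["banana"])       # 0.25
```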

Accessing elements of a Series:

You can access individual elements of the Series using the index labels:

python
s['c'] # returns 30

You can perform various operations on the Series object such as filtering, indexing, and arithmetic operations:

python
# Filtering
s[s > 30]  # returns a new Series with elements greater than 30
# Indexing
s[['a', 'c', 'e']]  # returns a new Series with elements at index 'a', 'c', and 'e'
# Arithmetic operations
s * 2  # multiplies each element of the Series by 2

These are just a few examples of what you can do with a Pandas Series. Refer to the Pandas documentation for more details and examples.
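
Two details of the operations above are easy to miss, so here is a hedged sketch of them: the explicit .loc (label) and .iloc (position) accessors, and the way arithmetic between two Series lines values up by label. The second Series, other, is a made-up example.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# .loc selects by label, .iloc selects by integer position.
by_label = s.loc["c"]     # 30
by_position = s.iloc[2]   # 30

# Label slices include the end label; positional slices exclude the end.
label_slice = s.loc["b":"d"]   # labels b, c, d
position_slice = s.iloc[1:3]   # labels b, c

# Arithmetic aligns on labels: only 'a' exists in both Series,
# so every other label in the result is NaN.
other = pd.Series([1, 2], index=["a", "z"])
total = s + other
print(total)
```

Because of this alignment, adding two Series with different labels never raises an error by itself; it is worth checking the result for NaN values.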

DataFrame

To use a Pandas DataFrame, first import the Pandas library into your Python environment by adding the following line at the beginning of your script:

python
import pandas as pd

This imports the Pandas library under the alias pd, the conventional short name.

Next, you can create a DataFrame by passing a dictionary or a list of lists to the pd.DataFrame() function. Here's an example of creating a DataFrame from a dictionary:

python
data = {'name': ['John', 'Jane', 'Bob', 'Emily'],
        'age': [28, 25, 20, 32],
        'city': ['New York', 'San Francisco', 'Chicago', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)

This creates a DataFrame with three columns (name, age, and city) and four rows of data.

Output:

text
    name  age           city
0   John   28       New York
1   Jane   25  San Francisco
2    Bob   20        Chicago
3  Emily   32    Los Angeles
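
The same table can also be built from a list of lists, the other input mentioned earlier. In this sketch each inner list is one row, and the column names are passed separately through the columns parameter.

```python
import pandas as pd

# Each inner list is one row of the table.
rows = [
    ["John", 28, "New York"],
    ["Jane", 25, "San Francisco"],
    ["Bob", 20, "Chicago"],
    ["Emily", 32, "Los Angeles"],
]

# A list of lists carries no column names, so they are supplied explicitly.
df = pd.DataFrame(rows, columns=["name", "age", "city"])
print(df.shape)  # (4, 3)
```

Without the columns argument, Pandas would label the columns 0, 1, and 2.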

You can also load data from a file using functions like pd.read_csv(), pd.read_excel(), or pd.read_sql().
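
As a rough illustration of pd.read_csv(), the sketch below reads CSV text from an in-memory buffer so that it runs without any file on disk. In a real script you would normally pass a file path instead; the CSV content here is invented for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk.
csv_text = """name,age,city
John,28,New York
Jane,25,San Francisco
"""

# The first line is used as the header row by default.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```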

Once you have a DataFrame, you can perform various operations on it such as selecting columns, filtering rows, grouping data, merging/joining multiple DataFrames, etc. Here are a few examples:

python
# Selecting columns
df['name']  # returns a Series containing the 'name' column
df[['name', 'age']]  # returns a DataFrame containing the 'name' and 'age' columns
# Filtering rows
df[df['age'] > 25]  # returns all rows where the 'age' column is greater than 25
# Grouping data (select 'age' first so the text column 'name' is not averaged)
df.groupby('city')['age'].mean()  # groups the data by the 'city' column and returns the mean age for each group
# Merging/joining DataFrames
df1 = pd.DataFrame({'name': ['John', 'Jane', 'Bob'],
                    'age': [28, 25, 20]})
df2 = pd.DataFrame({'name': ['Bob', 'Emily', 'John'],
                    'city': ['Chicago', 'Los Angeles', 'New York']})
merged_df = pd.merge(df1, df2, on='name')  # merges the two DataFrames on the 'name' column

These are just a few examples of what you can do with Pandas DataFrame. You can refer to the Pandas documentation for more details and examples.
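
One thing worth knowing about the merge shown earlier is that pd.merge() performs an inner join by default, so names that appear in only one DataFrame are dropped. The sketch below reuses the same two DataFrames and compares the default with how="outer", which keeps every name and fills the gaps with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["John", "Jane", "Bob"],
                    "age": [28, 25, 20]})
df2 = pd.DataFrame({"name": ["Bob", "Emily", "John"],
                    "city": ["Chicago", "Los Angeles", "New York"]})

# Default inner join: only names present in both DataFrames (John and Bob).
inner = pd.merge(df1, df2, on="name")

# Outer join: all four names; Jane has no city and Emily has no age.
outer = pd.merge(df1, df2, on="name", how="outer")

print(inner)
print(outer)
```

Which join to use depends on whether unmatched rows should be kept; how="left" and how="right" are the other common options.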
