Series & DataFrame

Pandas is an open-source data manipulation and analysis library for Python. Its two core data structures are the Series, a one-dimensional labeled array, and the DataFrame, a two-dimensional table similar to a table in a relational database. Together they let you manipulate and analyze data easily.

Some of the key features of the Pandas library include:

  1. Data cleaning and preparation: Pandas provides functions to handle missing or incomplete data, and allows you to transform data into the desired format.

  2. Data indexing and selection: You can easily select, filter, and slice data using Pandas.

  3. Data aggregation and grouping: Pandas provides powerful functions for grouping and summarizing data.

  4. Data visualization: Pandas integrates with visualization libraries such as Matplotlib to create rich visualizations of your data.

Pandas is widely used in data science, machine learning, and finance for data analysis and manipulation tasks.
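
As a rough illustration of the first three features listed above, here is a minimal sketch on made-up data. The column names "team" and "score" and the choice to fill the gap with 0 are assumptions for the example, not a recommended cleaning strategy:

```python
import pandas as pd

# Hypothetical sample data; None marks a missing score.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [1.0, None, 3.0, 5.0],
})

# 1. Cleaning: replace the missing score with 0.
cleaned = df.fillna({"score": 0.0})

# 2. Selection: keep only the rows whose score is greater than 0.
selected = cleaned[cleaned["score"] > 0]

# 3. Aggregation: total score per team.
totals = cleaned.groupby("team")["score"].sum()

print(selected)
print(totals)
```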

Series


A Pandas Series is a one-dimensional, array-like object that can hold any data type. It is similar to a column in a spreadsheet or a database table. Here's how to use a Series in code:

First, import the Pandas library:

python
import pandas as pd

Create a Series object by passing a list of values to the pd.Series() function:

python
data = [10, 20, 30, 40, 50]
s = pd.Series(data)

This creates a Series object with five elements (10, 20, 30, 40, and 50).

Here's the complete example, with the result printed:

python
import pandas as pd

data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)

Output:

text
0    10
1    20
2    30
3    40
4    50
dtype: int64

By default, Pandas assigns a numeric index starting from 0 to the elements of the Series.

You can also specify custom index labels for the Series elements:

python
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
s = pd.Series(data, index=index)
print(s)

This creates a Series object with five elements (10, 20, 30, 40, and 50) and custom index labels ('a', 'b', 'c', 'd', and 'e').

Output:

text
a    10
b    20
c    30
d    40
e    50
dtype: int64

Accessing elements of a Series:

You can access individual elements of the Series using the index labels:

python
s['c'] # returns 30
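
Besides plain square brackets, a Series can be read by label with .loc or by position with .iloc. The sketch below rebuilds the same Series so it runs on its own. Note that label slices include both endpoints, while position slices exclude the end:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = s.loc["c"]          # label-based lookup
by_position = s.iloc[2]        # position-based lookup (third element)
label_slice = s.loc["b":"d"]   # label slice: both 'b' and 'd' are included
pos_slice = s.iloc[1:3]        # position slice: position 3 is excluded

print(by_label, by_position)
print(label_slice.tolist())
print(pos_slice.tolist())
```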

You can perform various operations on the Series object such as filtering, indexing, and arithmetic operations:

python
# Filtering
s[s > 30] # returns a new Series with elements greater than 30

# Indexing
s[['a', 'c', 'e']] # returns a new Series with elements at index 'a', 'c', and 'e'

# Arithmetic operations
s * 2 # multiplies each element of the Series by 2
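
To see what these operations return, here is a self-contained version that stores each result and prints it. The combined condition at the end is an extra illustration, not part of the snippet above. Each comparison needs its own parentheses when conditions are joined with &:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

filtered = s[s > 30]                # boolean mask keeps 'd' and 'e'
picked = s[["a", "c", "e"]]         # select several labels at once
doubled = s * 2                     # element-wise arithmetic
between = s[(s > 10) & (s < 50)]    # two conditions combined with &

print(list(filtered.index), filtered.tolist())
print(picked.tolist())
print(doubled.tolist())
print(between.tolist())
```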

These are just a few examples of what you can do with a Pandas Series. Refer to the Pandas documentation for more details and examples.

DataFrame


To use Pandas DataFrame, you need to first import the Pandas library into your Python environment. You can do this by typing the following command at the beginning of your Python script:

python
import pandas as pd

This imports the Pandas library under the alias "pd", the conventional short name.

Next, you can create a DataFrame by passing a dictionary or a list of lists to the pd.DataFrame() function. Here's an example of creating a DataFrame from a dictionary:

python
data = {'name': ['John', 'Jane', 'Bob', 'Emily'],
        'age': [28, 25, 20, 32],
        'city': ['New York', 'San Francisco', 'Chicago', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)

This creates a DataFrame with three columns (name, age, and city) and four rows of data.

Output:

text
    name  age           city
0   John   28       New York
1   Jane   25  San Francisco
2    Bob   20        Chicago
3  Emily   32    Los Angeles
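
The list-of-lists form mentioned earlier works the same way, except that each inner list is one row and the column names are passed separately. A minimal sketch with the same data (the variable names rows and df_from_rows are just for this example):

```python
import pandas as pd

# Each inner list is one row: name, age, city.
rows = [
    ["John", 28, "New York"],
    ["Jane", 25, "San Francisco"],
    ["Bob", 20, "Chicago"],
    ["Emily", 32, "Los Angeles"],
]

# Column names are supplied through the columns argument.
df_from_rows = pd.DataFrame(rows, columns=["name", "age", "city"])

print(df_from_rows.shape)  # (4, 3): four rows, three columns
print(df_from_rows)
```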

You can also load data from external sources: pd.read_csv() and pd.read_excel() read files, and pd.read_sql() reads the result of a database query.
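
As a small sketch of pd.read_csv(), the example below reads CSV text from an in-memory buffer so it runs without a file on disk. With a real file you would pass its path instead. The CSV content here is made up for illustration:

```python
import io

import pandas as pd

# Made-up CSV content; the first line is the header row.
csv_text = """name,age,city
John,28,New York
Jane,25,San Francisco
"""

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv)
```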

Once you have a DataFrame, you can perform various operations on it such as selecting columns, filtering rows, grouping data, merging/joining multiple DataFrames, etc. Here are a few examples:

python
# Selecting columns
df['name'] # returns a Series containing the 'name' column
df[['name', 'age']] # returns a DataFrame containing the 'name' and 'age' columns

# Filtering rows
df[df['age'] > 25] # returns all rows where the 'age' column is greater than 25

# Grouping data
df.groupby('city')['age'].mean() # groups the data by the 'city' column and returns the mean age for each group

# Merging/joining DataFrames
df1 = pd.DataFrame({'name': ['John', 'Jane', 'Bob'],
                    'age': [28, 25, 20]})

df2 = pd.DataFrame({'name': ['Bob', 'Emily', 'John'],
                    'city': ['Chicago', 'Los Angeles', 'New York']})

merged_df = pd.merge(df1, df2, on='name') # merges the two DataFrames on the 'name' column
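
By default pd.merge() performs an inner join, so the merge above keeps only the names found in both DataFrames. The sketch below compares that with how="outer", which keeps every name and fills the gaps with NaN. The variable names inner and outer are just for this example:

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["John", "Jane", "Bob"],
                    "age": [28, 25, 20]})
df2 = pd.DataFrame({"name": ["Bob", "Emily", "John"],
                    "city": ["Chicago", "Los Angeles", "New York"]})

# Default how="inner": only names present in both frames survive.
inner = pd.merge(df1, df2, on="name")

# how="outer": every name is kept; missing values become NaN.
outer = pd.merge(df1, df2, on="name", how="outer")

print(sorted(inner["name"]))  # ['Bob', 'John']
print(sorted(outer["name"]))  # ['Bob', 'Emily', 'Jane', 'John']
print(outer)
```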

These are just a few examples of what you can do with a Pandas DataFrame. Refer to the Pandas documentation for more details and examples.
