Series & DataFrame
Pandas is an open-source data manipulation and analysis library for Python. It provides two core data structures: Series, a one-dimensional labeled array, and DataFrame, a two-dimensional table similar to a table in a relational database. Together they let you manipulate and analyze data with little code.
Some of the key features of the Pandas library include:
Data cleaning and preparation: Pandas provides functions to handle missing or incomplete data, and allows you to transform data into the desired format.
Data indexing and selection: You can easily select, filter, and slice data using Pandas.
Data aggregation and grouping: Pandas provides powerful functions for grouping and summarizing data.
Data visualization: Pandas has built-in plotting methods (such as DataFrame.plot()) that use Matplotlib by default to create visualizations of your data.
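As a minimal sketch of the missing-data handling mentioned in the list above, the example below counts, drops, and fills missing values. The column names and values are made up purely for illustration. `isna()`, `dropna()` and `fillna()` are standard pandas methods.

```python
import numpy as np
import pandas as pd

# Illustrative data only: np.nan and None both mark missing values
df = pd.DataFrame({
    "score": [90, np.nan, 75, np.nan],
    "grade": ["A", "B", None, "C"],
})

# Count the missing values in each column
missing_counts = df.isna().sum()

# Drop every row that contains at least one missing value
complete_rows = df.dropna()

# Fill missing scores with the column mean; other columns are left unchanged
filled = df.fillna({"score": df["score"].mean()})

print(missing_counts)
print(complete_rows)
print(filled)
```

Passing a dictionary to `fillna()` lets you choose a different fill value per column instead of one value for the whole table.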
Pandas is widely used in data science, machine learning, and finance for data analysis and manipulation tasks.
Series
A Pandas Series is a one-dimensional array-like object that can hold any data type. It is similar to a column in a spreadsheet or a database table. Here's how to create and use a Series in code:
First, import the Pandas library:
```python
import pandas as pd
```
Create a Series object by passing a list of values to the pd.Series() function:
```python
data = [10, 20, 30, 40, 50]
s = pd.Series(data)
```
This creates a Series object with five elements (10, 20, 30, 40, and 50).
Here's an example:
```python
import pandas as pd

data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)
```
Output:
```
0    10
1    20
2    30
3    40
4    50
dtype: int64
```
By default, Pandas assigns a numeric index starting from 0 to the elements of the Series.
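To check the default index described above, you can inspect a Series' index, values, and dtype directly. The following is a minimal sketch that uses the same list of values as the example.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# The default index is a RangeIndex covering positions 0 through 4
print(s.index)

# The underlying values as a NumPy array, and their data type
print(s.to_numpy())
print(s.dtype)
```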
You can also specify custom index labels for the Series elements:
```python
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']
s = pd.Series(data, index=index)
print(s)
```
This creates a Series object with five elements (10, 20, 30, 40, and 50) and custom index labels ('a', 'b', 'c', 'd', and 'e').
Output:
```
a    10
b    20
c    30
d    40
e    50
dtype: int64
```
Accessing elements of a Series:
You can access individual elements of the Series using the index labels:
```python
s['c']  # returns 30
```
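Square-bracket access like `s['c']` looks elements up by label. To be explicit about whether you mean a label or a position, pandas provides `.loc` (label-based) and `.iloc` (position-based). The following minimal sketch uses the labelled Series from above.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# .loc selects by index label
by_label = s.loc["c"]

# .iloc selects by 0-based integer position, regardless of the labels
by_position = s.iloc[2]

# A label slice includes its end label; a position slice excludes its end position
label_slice = s.loc["b":"d"]     # labels b, c, d
position_slice = s.iloc[1:3]     # positions 1 and 2, i.e. labels b, c

print(by_label, by_position)
print(label_slice)
print(position_slice)
```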
You can perform various operations on the Series object such as filtering, indexing, and arithmetic operations:
```python
# Filtering: keep only the elements greater than 30
s[s > 30]  # returns a new Series with the elements 40 and 50

# Selecting by a list of labels
s[['a', 'c', 'e']]  # returns a new Series with the elements at labels 'a', 'c', and 'e'

# Arithmetic operations
s * 2  # returns a new Series with each element multiplied by 2 (s itself is unchanged)
```
These are just a few examples of what you can do with a Pandas Series. Refer to the Pandas documentation for more details and examples.
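One detail worth knowing about Series arithmetic: when you combine two Series, pandas aligns them on their index labels rather than on position. The following is a minimal sketch with two made-up Series whose labels only partly overlap.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Values are matched by label; labels found in only one Series ("x", "w") give NaN
total = a + b
print(total)

# fill_value treats a label missing from one side as 0 instead of producing NaN
total_filled = a.add(b, fill_value=0)
print(total_filled)
```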
DataFrame
To use a Pandas DataFrame, first import the Pandas library into your Python environment by adding the following line at the beginning of your Python script:
```python
import pandas as pd
```
This imports the Pandas library under the conventional alias pd, so you can write pd.DataFrame instead of pandas.DataFrame.
Next, you can create a DataFrame by passing a dictionary or a list of lists to the pd.DataFrame() function. Here's an example of creating a DataFrame from a dictionary:
```python
data = {'name': ['John', 'Jane', 'Bob', 'Emily'],
        'age': [28, 25, 20, 32],
        'city': ['New York', 'San Francisco', 'Chicago', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)
```
This creates a DataFrame with three columns (name, age, and city) and four rows of data.
Output:
```
    name  age           city
0   John   28       New York
1   Jane   25  San Francisco
2    Bob   20        Chicago
3  Emily   32    Los Angeles
```
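The text above also mentions building a DataFrame from a list of lists. In that case each inner list is one row, and the column names are passed separately through the `columns` parameter. The following minimal sketch reuses the same data.

```python
import pandas as pd

# Each inner list is one row of the table
rows = [
    ["John", 28, "New York"],
    ["Jane", 25, "San Francisco"],
    ["Bob", 20, "Chicago"],
    ["Emily", 32, "Los Angeles"],
]
df = pd.DataFrame(rows, columns=["name", "age", "city"])

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type inferred for each column
```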
You can also load data from files or databases using functions such as pd.read_csv() and pd.read_excel() (for files) or pd.read_sql() (for a SQL query run against a database connection).
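Here is a minimal sketch of `pd.read_csv()`. To keep the example self-contained, it reads hypothetical CSV text from an in-memory buffer (`io.StringIO`). In practice you would normally pass a file path instead.

```python
import io

import pandas as pd

# Hypothetical CSV contents; with a real file you would call pd.read_csv("path/to/file.csv")
csv_text = """name,age,city
John,28,New York
Jane,25,San Francisco
"""

# The first line is used as the header; column types are inferred from the values
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```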
Once you have a DataFrame, you can perform various operations on it such as selecting columns, filtering rows, grouping data, merging/joining multiple DataFrames, etc. Here are a few examples:
```python
# Selecting columns
df['name']  # returns a Series containing the 'name' column
df[['name', 'age']]  # returns a DataFrame containing the 'name' and 'age' columns

# Filtering rows
df[df['age'] > 25]  # returns all rows where the 'age' column is greater than 25

# Grouping data: select 'age' first so mean() is not applied to the non-numeric 'name' column
df.groupby('city')['age'].mean()  # returns the mean age for each city

# Merging/joining DataFrames
df1 = pd.DataFrame({'name': ['John', 'Jane', 'Bob'], 'age': [28, 25, 20]})
df2 = pd.DataFrame({'name': ['Bob', 'Emily', 'John'], 'city': ['Chicago', 'Los Angeles', 'New York']})
merged_df = pd.merge(df1, df2, on='name')  # inner join on 'name' (the default): keeps only names in both
```
These are just a few examples of what you can do with a Pandas DataFrame. Refer to the Pandas documentation for more details and examples.
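By default, `pd.merge()` performs an inner join, so names that appear in only one DataFrame are dropped. The `how` parameter selects other join types. The following minimal sketch shows the difference, using the same two example DataFrames.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['John', 'Jane', 'Bob'], 'age': [28, 25, 20]})
df2 = pd.DataFrame({'name': ['Bob', 'Emily', 'John'],
                    'city': ['Chicago', 'Los Angeles', 'New York']})

# Inner join (the default): only names present in both DataFrames (John, Bob)
inner = pd.merge(df1, df2, on='name', how='inner')

# Left join: every row of df1; 'city' is NaN where df2 has no match (Jane)
left = pd.merge(df1, df2, on='name', how='left')

# Outer join: every name from either DataFrame, with NaN for missing values
outer = pd.merge(df1, df2, on='name', how='outer')

print(inner)
print(left)
print(outer)
```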