# Time Series Analysis 101 - Part 1

Compiled by endeesa. Last update 23/April/2022

## 1. Pandas refresher

The following notes revisit a few pandas concepts that are important for doing time series analysis in Python. Suppose you have created a pandas dataframe like the one shown in Figure 1, and named it df.

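| Date       | Sales |
|------------|-------|
| 22/04/2022 | 70    |
| 23/04/2022 | 50    |
| 24/04/2022 | 77    |
| 24/04/2022 | 90    |

Figure 1: the dataframe df (Date is the index, Sales the only column)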
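If you want to run the examples that follow, here is a minimal sketch that rebuilds the Figure 1 dataframe by hand. In practice df would usually be loaded from a file (for example with pd.read_csv), so treat this construction as an assumption made purely for illustration.

```python
import pandas as pd

# Rebuild the Figure 1 data by hand; the dates start out as plain strings
df = pd.DataFrame(
    {"Sales": [70, 50, 77, 90]},
    index=pd.Index(
        ["22/04/2022", "23/04/2022", "24/04/2022", "24/04/2022"], name="Date"
    ),
)

print(df)
print(df.index.dtype)  # the index holds strings, not datetimes, at this point
```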

These are some common operations that you might want to perform:

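### i. Convert string index values into datetime objects using pd.to_datetime()

```python
# The Figure 1 dates are day/month/year strings, so pass the format explicitly
df.index = pd.to_datetime(df.index, format='%d/%m/%Y')
```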
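Why pass a format at all? A short sketch using a hypothetical ambiguous string, "03/04/2022", which does not appear in Figure 1: pandas parses such strings month-first by default, so an explicit format (or dayfirst=True) is the safest way to get day-first dates.

```python
import pandas as pd

# Ambiguous on its own: 3 April or 4 March?
ambiguous = "03/04/2022"

# Default parsing is month-first, giving 4 March 2022
month_first = pd.to_datetime(ambiguous)

# An explicit day/month/year format gives 3 April 2022
day_first = pd.to_datetime(ambiguous, format="%d/%m/%Y")

# The same format works on a whole index of strings, like the one in Figure 1
index = pd.to_datetime(pd.Index(["22/04/2022", "23/04/2022"]), format="%d/%m/%Y")
print(month_first.date(), day_first.date(), list(index.day))
```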

### ii. Plot a time series on a line graph

```python
# Produces a matplotlib line plot with one line per column
df.plot(grid=True)
```
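df.plot() returns a matplotlib Axes object, so the chart can be customised after it is drawn. A minimal sketch, assuming matplotlib is installed; the title, marker and axis labels are illustrative choices, and the Agg backend is selected only so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Figure 1 data with a proper DatetimeIndex
df = pd.DataFrame(
    {"Sales": [70, 50, 77, 90]},
    index=pd.to_datetime(
        ["22/04/2022", "23/04/2022", "24/04/2022", "24/04/2022"], format="%d/%m/%Y"
    ),
)

# One line per column; the returned Axes can be customised further
ax = df.plot(grid=True, marker="o", title="Daily sales")
ax.set_xlabel("Date")
ax.set_ylabel("Sales")

# In an interactive session, matplotlib.pyplot.show() displays the figure
```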

### iii. Index slicing

• Recall that slicing is used to filter a subset of the data based on position
• Similarly, you can slice a pandas datetime index with partial date strings to filter data by year, month, day, etc.
```python
# Pandas datetime indexing examples (use .loc to select rows by date)

# Filter by year 2022
ts_2022 = df.loc['2022']

# Filter values from April 2022
ts_2022_april = df.loc['2022-04']
```
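Partial date strings also work as the endpoints of a range slice. A minimal sketch on the Figure 1 data: label-based slices with .loc include both endpoints, and sorting the index first is a sensible precaution because range slicing on an unsorted datetime index can fail.

```python
import pandas as pd

df = pd.DataFrame(
    {"Sales": [70, 50, 77, 90]},
    index=pd.to_datetime(
        ["22/04/2022", "23/04/2022", "24/04/2022", "24/04/2022"], format="%d/%m/%Y"
    ),
)
df = df.sort_index()  # range slicing is most reliable on a sorted index

april_2022 = df.loc["2022-04"]              # every row in April 2022
window = df.loc["2022-04-23":"2022-04-24"]  # both endpoints are included
print(len(april_2022), len(window))  # 4 3
```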

### iv. Frequency conversion

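• Sometimes we may wish to change the frequency of our readings: downsampling aggregates them into a lower frequency (e.g. daily to monthly, quarterly or yearly), while upsampling spreads them over a higher one (e.g. monthly to daily)
• This functionality is available through the built-in resample() method of pandas DataFrame and Series objects
```python
# Example: convert daily readings to MONTHLY readings using the median
# ('M' means month end; pandas 2.2 and later prefer the alias 'ME')
df.resample('M').median()
```
• Other popular frequencies: Q (quarter), D (day), W (week), A (year) and T (minute). In pandas 2.2 and later, the quarter-end, year-end and minute aliases are written 'QE', 'YE' and 'min'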
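Downsampling needs an aggregation (several rows collapse into one), while upsampling needs a fill rule (new, empty rows appear). A minimal sketch on made-up daily values; it passes pd.offsets.MonthEnd(), the offset object behind the month-end alias, instead of the 'M'/'ME' string so that it runs on pandas versions before and after the 2.2 alias change.

```python
import numpy as np
import pandas as pd

# Made-up daily readings for the first quarter of 2022: 0, 1, 2, ...
idx = pd.date_range("2022-01-01", "2022-03-31", freq="D")
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Downsample: all the days in each month collapse into one month-end row
monthly = daily.resample(pd.offsets.MonthEnd()).median()

# Upsample: back to one row per day; each new day carries forward
# the most recent monthly value
back_to_daily = monthly.resample("D").ffill()

print(monthly)
print(back_to_daily.head())
```

Forward filling is only one option; interpolate() is a common alternative when a smooth transition between readings makes more sense.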

### v. Merge multiple dataframes

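```python
# Assume we have another dataframe df2 indexed by date, like df
# Combine the columns of the 2 dataframes, keeping only dates present in both;
# the suffixes are needed because both dataframes have a 'Sales' column
df.join(other=df2, how='inner', on=None, lsuffix='_df', rsuffix='_df2')
```

Note that if we don't specify a value for the on argument, the two dataframes are matched on their index. See the pandas documentation for DataFrame.join for more details.

• If instead we wanted to stack the rows of the two dataframes, we would use pd.concat([df, df2])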
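Both operations on a pair of small, made-up dataframes (the store names and values are hypothetical). Because df and df2 share a Sales column, join() needs lsuffix and rsuffix to tell the two columns apart; without them pandas raises an error about overlapping columns.

```python
import pandas as pd

dates = pd.to_datetime(["2022-04-22", "2022-04-23", "2022-04-24"])

# Hypothetical daily sales from two stores over overlapping dates
df = pd.DataFrame({"Sales": [70, 50, 77]}, index=dates)
df2 = pd.DataFrame({"Sales": [65, 80]}, index=dates[1:])

# Columns side by side, matched on the index; 'inner' keeps only shared dates
side_by_side = df.join(df2, how="inner", lsuffix="_store1", rsuffix="_store2")

# Rows stacked vertically, then re-sorted so the dates stay in order
stacked = pd.concat([df, df2]).sort_index()

print(side_by_side)
print(stacked)
```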

### vi. Calculating correlation and autocorrelation

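• Correlation measures how strongly the values in two columns move together. Series.corr() computes the Pearson coefficient by default, which ranges from -1 (the values move in opposite directions) to 1 (they move together), with 0 meaning no linear relationship
```python
# Assume you have a dataframe named 'stocks' with stock prices for Microsoft and Google
# The columns are named 'MSFT' and 'GOOGL' respectively

correlation = stocks['MSFT'].corr(stocks['GOOGL'])
```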

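When dealing with time series data, we typically do not calculate the correlation on the actual prices, but on their percentage changes (returns) instead: two prices that both trend upward over time can look strongly correlated even when their day-to-day movements are unrelated. Use the ‘pct_change()’ method to convert the values before computing the correlation.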
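To see the difference, here is a small constructed example (the prices are made up, not real MSFT or GOOGL data). Both series trend upward, so their price levels are positively correlated, yet on every day one rises while the other falls, which only the correlation of returns reveals.

```python
import pandas as pd

# Made-up prices: both drift upward, but their daily moves are opposite
stocks = pd.DataFrame(
    {
        "MSFT": [100, 102, 101, 104, 103, 106, 105, 108, 107, 110],
        "GOOGL": [200, 199, 202, 201, 204, 203, 206, 205, 208, 207],
    },
    index=pd.bdate_range("2022-04-01", periods=10),
    dtype=float,
)

# Correlation of price levels is dominated by the shared upward trend
price_corr = stocks["MSFT"].corr(stocks["GOOGL"])

# Correlation of daily percentage changes reflects how the moves line up
returns = stocks.pct_change()
returns_corr = returns["MSFT"].corr(returns["GOOGL"])

print(f"prices:  {price_corr:.2f}")    # positive
print(f"returns: {returns_corr:.2f}")  # strongly negative
```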

• If we are interested in the correlation of a time series with a delayed (lagged) version of itself, we can calculate its autocorrelation as follows:
```python
# First convert the actual prices to returns
msft_returns = stocks['MSFT'].pct_change()

# Then compute the autocorrelation (lag 1 by default)
msft_returns.autocorr()
```
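Series.autocorr() uses lag 1 by default, and passing lag=k compares the series with a copy of itself shifted by k periods. A minimal sketch with made-up prices that zigzag upward, checking that autocorr(lag=2) matches a manual corr() against shift(2).

```python
import pandas as pd

# Made-up prices that rise and fall on alternate days
prices = pd.Series(
    [100, 102, 101, 104, 103, 106, 105, 108, 107, 110],
    index=pd.bdate_range("2022-04-01", periods=10),
    dtype=float,
)
returns = prices.pct_change()

lag1 = returns.autocorr()       # default lag=1: each day vs the previous day
lag2 = returns.autocorr(lag=2)  # each day vs two days earlier

# Same number computed by hand: correlate the series with a shifted copy
manual_lag2 = returns.corr(returns.shift(2))

print(f"lag 1: {lag1:.2f}, lag 2: {lag2:.2f}, manual lag 2: {manual_lag2:.2f}")
```

The alternating pattern shows up as a negative lag 1 and a positive lag 2 autocorrelation.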

## Knowledge check

Try putting the concepts covered above into practice with the following short exercise:

• Download the oil prices dataset from here
• Read the data into a pandas dataframe
• Set the index of the dataframe to be the date column
• Plot the oil prices from 2000 to 2020
• Create a new dataframe with 2019 data only and change its frequency to quarters
• Calculate the lag-2 autocorrelation of the oil prices in 2019
• Plot the autocorrelation function of the oil prices for 2020 (optional)

Once completed, move on to Part 2 of this series, where we will cover EDA (exploratory data analysis) methods applicable to time series data.