May 03, 2022
Compiled by endeesa. Last updated 23 April 2022.
The following notes revisit a few pandas concepts that are important for time series analysis in Python. Suppose you have created a pandas dataframe like the one shown in figure 1, and that it is named `df`.
Date | Sales
---|---
22/04/2022 | 70
23/04/2022 | 50
24/04/2022 | 77
24/04/2022 | 90
These are some common operations that you might want to perform:
```python
import pandas as pd
df.index = pd.to_datetime(df.index, format='%d/%m/%Y')  # figure 1 dates are dd/mm/yyyy
```
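The dates in figure 1 are written day-first, which is why the format argument matters. The minimal sketch below rebuilds the figure 1 dataframe, assuming the dates arrived as strings, converts the index with an explicit format, and then shows how an ambiguous string such as '01/04/2022' is read with and without `dayfirst=True`.

```python
import pandas as pd

# Rebuild the dataframe from figure 1, assuming the dates arrived as strings
df = pd.DataFrame(
    {'Sales': [70, 50, 77, 90]},
    index=pd.Index(['22/04/2022', '23/04/2022', '24/04/2022', '24/04/2022'],
                   name='Date'),
)

# An explicit format removes any guessing about day vs month order
df.index = pd.to_datetime(df.index, format='%d/%m/%Y')
print(df)

# Without a format, an ambiguous date string is parsed month-first by default
month_first = pd.to_datetime('01/04/2022')                # 4 January 2022
# dayfirst=True asks pandas to read the day first instead
day_first = pd.to_datetime('01/04/2022', dayfirst=True)   # 1 April 2022
```

Passing `format` is the stricter choice: a string that does not match raises an error instead of being silently parsed the other way round.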
```python
# Draw a matplotlib line plot with one line per column
df.plot(grid=True)
```
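```python
# Pandas datetime indexing examples (use .loc to select rows by date;
# plain df['2022'] is treated as a column lookup in pandas 2.x)
# Filter by year 2022
ts_2022 = df.loc['2022']
# Filter values from April 2022
ts_2022_april = df.loc['2022-04']
```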
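Partial-string indexing also works inside slices. The sketch below uses a hypothetical five-day series that crosses the March/April boundary to show month selection and a label slice; note that both endpoints of a label slice are included.

```python
import pandas as pd

# Hypothetical daily series from 30 March to 3 April 2022
df = pd.DataFrame(
    {'Sales': [10, 20, 30, 40, 50]},
    index=pd.date_range('2022-03-30', periods=5, freq='D', name='Date'),
)

march = df.loc['2022-03']                     # every row in March 2022
april = df.loc['2022-04']                     # every row in April 2022
window = df.loc['2022-03-31':'2022-04-02']    # label slice, both ends included
print(window)
```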
```python
# Example: convert daily readings to monthly readings using the median
# ('M' bins by month end; pandas 2.2 and later prefer the alias 'ME')
df.resample('M').median()
```
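To see what resampling actually computes, here is a small sketch on hypothetical daily readings (0, 1, 2, ...) covering January and February 2022. It uses the 'MS' (month start) alias, which labels each bin with the first day of the month and sidesteps the 'M' to 'ME' rename, and it also shows `agg` for several statistics per month.

```python
import pandas as pd

# Hypothetical daily readings: 0 on 1 January up to 58 on 28 February
dates = pd.date_range('2022-01-01', '2022-02-28', freq='D', name='Date')
daily = pd.DataFrame({'Sales': range(len(dates))}, index=dates)

# One row per month, labelled with the first day of that month
monthly_median = daily.resample('MS').median()

# Several statistics per month in one pass
monthly_stats = daily['Sales'].resample('MS').agg(['min', 'median', 'max'])
print(monthly_stats)
```

January holds the values 0 to 30, so its median is 15; February holds 31 to 58, so its median is 44.5.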
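```python
# Assume we have another dataframe df2 with the same structure as df.
# Join the columns of the two dataframes on their indexes; because both
# have a 'Sales' column, suffixes are needed to keep the names distinct.
df.join(other=df2, how='inner', on=None, lsuffix='_df', rsuffix='_df2')
```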
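Note that if we don’t specify a value for the `on` argument, the two dataframes are matched on their indexes. With `how='inner'`, only the index labels present in both dataframes are kept. See the pandas documentation for `DataFrame.join` for more details.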
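The effect of `how` is easiest to see on two short, partly overlapping series. In this sketch, `df` reuses the figure 1 values (with the duplicate date dropped) and `df2` is hypothetical sales from a second store; the suffixes `_store1` and `_store2` are illustrative names.

```python
import pandas as pd

# Figure 1 values for 22-24 April, and hypothetical values for 23-25 April
df = pd.DataFrame(
    {'Sales': [70, 50, 77]},
    index=pd.to_datetime(['2022-04-22', '2022-04-23', '2022-04-24']),
)
df2 = pd.DataFrame(
    {'Sales': [65, 80, 91]},
    index=pd.to_datetime(['2022-04-23', '2022-04-24', '2022-04-25']),
)

# inner: only dates present in both frames (23 and 24 April)
inner = df.join(df2, how='inner', lsuffix='_store1', rsuffix='_store2')
# outer: every date from either frame, with NaN where a frame has no value
outer = df.join(df2, how='outer', lsuffix='_store1', rsuffix='_store2')
print(inner)
print(outer)
```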
vi. Calculating correlation and autocorrelation
```python
# Assume you have a dataframe named 'stocks' with stock prices for Microsoft and Google
# The columns are named 'MSFT' and 'GOOGL' respectively
# Pearson correlation between the two price columns
correlation = stocks['MSFT'].corr(stocks['GOOGL'])
```
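With more than two columns, `DataFrame.corr()` returns every pairwise correlation at once. The sketch below uses hypothetical prices chosen so the answers are obvious: 'GOOGL' is exactly half of 'MSFT', and a third hypothetical column 'AAPL' equals 80 minus half of 'MSFT'.

```python
import pandas as pd

# Hypothetical prices; 'AAPL' is added purely for illustration
stocks = pd.DataFrame(
    {
        'MSFT': [100.0, 102.0, 101.0, 105.0, 107.0],
        'GOOGL': [50.0, 51.0, 50.5, 52.5, 53.5],   # MSFT / 2
        'AAPL': [30.0, 29.0, 29.5, 27.5, 26.5],    # 80 - MSFT / 2
    },
    index=pd.date_range('2022-04-18', periods=5, freq='D'),
)

# Correlation of a single pair of columns
pair = stocks['MSFT'].corr(stocks['GOOGL'])

# Full pairwise correlation matrix (method='pearson' is the default)
matrix = stocks.corr()
print(matrix.round(2))
```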
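When dealing with time series of prices, we typically do not calculate the correlation on the actual prices but on their percentage changes (returns) instead, because two series that both trend over time can look strongly correlated even when their day-to-day movements are unrelated. Use the `pct_change()` method to convert the prices to returns before computing the correlation.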
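The trend effect can be reproduced with synthetic data. This sketch assumes two independent random walks (seeded, not real market prices) that both drift upward by about 1% per day; their prices end up highly correlated while their daily returns are not.

```python
import numpy as np
import pandas as pd

# Synthetic data: two independent random walks with the same upward drift
rng = np.random.default_rng(seed=0)
n_days = 250
dates = pd.date_range('2022-01-03', periods=n_days, freq='D')
factors = 1 + 0.01 + 0.01 * rng.standard_normal((n_days, 2))
stocks = pd.DataFrame(100 * np.cumprod(factors, axis=0),
                      index=dates, columns=['MSFT', 'GOOGL'])

# Correlation of raw prices is dominated by the shared trend
price_corr = stocks['MSFT'].corr(stocks['GOOGL'])

# Correlation of daily percentage changes reflects day-to-day co-movement
returns = stocks.pct_change()
return_corr = returns['MSFT'].corr(returns['GOOGL'])

print(f'price correlation:  {price_corr:.2f}')
print(f'return correlation: {return_corr:.2f}')
```

The exact numbers depend on the seed, but the price correlation stays close to 1 while the return correlation stays close to 0.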
```python
# First convert the actual prices to returns (the first value is NaN)
msft_returns = stocks['MSFT'].pct_change()
# Then compute the autocorrelation (lag=1 by default)
msft_returns.autocorr()
```
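Autocorrelation is the correlation of a series with a lagged copy of itself. In this sketch the prices are hypothetical and alternately rise and fall by 10%, so each return is the opposite of the previous one (lag-1 autocorrelation of -1) and equal to the one two days earlier (lag-2 autocorrelation of 1).

```python
import pandas as pd

# Hypothetical prices that alternately rise 10% and fall 10%
prices = 100 * pd.Series([1.1, 0.9] * 5).cumprod()
returns = prices.pct_change()          # first value is NaN

lag1 = returns.autocorr()              # lag=1 is the default
lag2 = returns.autocorr(lag=2)

# The same lag-1 value, computed by correlating with a shifted copy
manual = returns.corr(returns.shift(1))
print(lag1, lag2, manual)
```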
Try putting the concepts covered above into practice with the following short exercise.