RL and trading in financial markets 101
In this tutorial we will focus on using machine learning techniques, particularly reinforcement learning, for trading in financial markets. The goal will be to build a trading agent that learns to maximize total profit over a given period.
We will be using OpenAI Gym's environment model, so if you are not already familiar with it, you can check the previous tutorials.
Part 1: Setting Up the Trading Environment
Before we can train an agent, we need to set up the trading environment. We’ll use pandas_datareader to download historical stock price data from Yahoo Finance.
First, let’s install the required libraries.
pip install gym pandas_datareader yfinance
Next, we can import the required libraries:
import gym
from gym import spaces
import numpy as np
import pandas as pd
from pandas_datareader import data as pdr
import yfinance as yf
yf.pdr_override()
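Before defining the environment, it is worth a quick sanity check that the data download works. The ticker and date range below are just placeholders:
# Quick sanity check: download a short window of prices through pandas_datareader/yfinance
df = pdr.get_data_yahoo('AAPL', start='2020-01-01', end='2020-03-01')
print(df[['Adj Close']].head())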
Next, let’s define our custom trading environment. The environment will have a continuous action space representing the proportion of our portfolio to invest in the stock. A positive action value indicates a long position, while a negative action value indicates a short position. The state will be the latest stock price and the current portfolio value.
class TradingEnv(gym.Env):
    def __init__(self, stock_symbol, start_date, end_date, initial_balance=10000):
        super(TradingEnv, self).__init__()
        # Load data
        self.stock_data = pdr.get_data_yahoo(stock_symbol, start=start_date, end=end_date)
        self.stock_prices = self.stock_data['Adj Close'].values
        self.initial_balance = initial_balance
        self.n_steps = len(self.stock_prices)
        # Define action and state space
        # They must be gym.spaces objects
        # Example when using discrete actions:
        # self.action_space = spaces.Discrete(N_DISCRETE_ACTIONS)
        self.action_space = spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0, high=np.inf, shape=(2,), dtype=np.float32)

    def step(self, action):
        # Execute one time step within the environment
        pass

    def reset(self):
        # Reset the state of the environment to an initial state
        pass

    def render(self, mode='human'):
        # Render the environment to the screen
        pass
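The step, reset, and render bodies are left empty above. As a rough sketch (one possible implementation, not the only one), they could follow the design described earlier: the action is the fraction of the portfolio held in the stock over the next step, the reward is the change in portfolio value, and transaction costs are ignored. These methods would go inside TradingEnv:
    def reset(self):
        # Start from the first price with the full initial balance
        self.current_step = 0
        self.portfolio_value = self.initial_balance
        return np.array([self.stock_prices[self.current_step], self.portfolio_value])

    def step(self, action):
        # Interpret the action as the fraction of the portfolio held in the stock
        position = float(np.clip(np.ravel(action)[0], -1.0, 1.0))
        price_now = self.stock_prices[self.current_step]
        self.current_step += 1
        price_next = self.stock_prices[self.current_step]

        # The invested fraction earns the stock return; the rest sits in cash
        stock_return = (price_next - price_now) / price_now
        new_value = self.portfolio_value * (1 + position * stock_return)
        reward = new_value - self.portfolio_value
        self.portfolio_value = new_value

        done = self.current_step >= self.n_steps - 1
        obs = np.array([price_next, self.portfolio_value])
        return obs, reward, done, {}

    def render(self, mode='human'):
        print("step: {}, portfolio value: {:.2f}".format(self.current_step, self.portfolio_value))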
Part 2: Implementing the Reinforcement Learning Agent
The trading agent will be implemented using a DQN (Deep Q-Network) similar to the one we used for playing the CartPole game. The agent will be trained to maximize the total reward, where each step's reward is defined as the change in portfolio value. Since a DQN selects from a discrete set of actions, the continuous action range defined above has to be discretized into a small set of position sizes in practice.
import random
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from collections import deque
class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95    # discount rate
        self.epsilon = 1.0   # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()
    def _build_model(self):
        # Neural network for Deep-Q learning Model
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse', optimizer=Adam(learning_rate=self.learning_rate))
        return model

    # Other methods like `remember`, `act`, `replay` would be similar to what we defined earlier in the CartPole example
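For reference, here is a sketch of what those methods could look like, adapted from the CartPole DQN. One caveat: a DQN picks from a discrete set of actions, so this sketch assumes the continuous action range is discretized into a fixed grid of positions (the 5-level grid below is an arbitrary illustrative choice). Under that assumption, the agent would be created with DQNAgent(env.observation_space.shape[0], len(POSITIONS)), and the index returned by act would be translated to a position with POSITIONS[action] before calling env.step:
POSITIONS = np.linspace(-1, 1, 5)  # e.g. full short, half short, flat, half long, full long

# Methods to add inside DQNAgent (same pattern as the CartPole agent):
    def remember(self, state, action_index, reward, next_state, done):
        self.memory.append((state, action_index, reward, next_state, done))

    def act(self, state):
        # Epsilon-greedy choice over the discrete position grid
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        q_values = self.model.predict(state, verbose=0)
        return int(np.argmax(q_values[0]))

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action_index, reward, next_state, done in minibatch:
            target = reward
            if not done:
                target = reward + self.gamma * np.amax(self.model.predict(next_state, verbose=0)[0])
            target_f = self.model.predict(state, verbose=0)
            target_f[0][action_index] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay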
Part 3: Training the Agent
Finally, we can create an instance of our trading environment and our DQN agent, and train the agent using historical data. The agent learns by observing the outcomes of its actions, storing them in memory, and replaying those experiences during training.
# Specify the stock symbol and date range
stock_symbol = 'AAPL'
start_date = '2015-01-01'
end_date = '2020-12-31'

# Create the trading environment
env = TradingEnv(stock_symbol, start_date, end_date)

# Create the DQN agent
agent = DQNAgent(env.observation_space.shape[0], env.action_space.shape[0])

# Training the agent
episodes = 100
batch_size = 32  # replay batch size
for e in range(episodes):
    state = env.reset()
    state = np.reshape(state, [1, agent.state_size])
    for time in range(env.n_steps):
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        reward = reward if not done else -10
        next_state = np.reshape(next_state, [1, agent.state_size])
        agent.remember(state, action, reward, next_state, done)
        state = next_state
        if done:
            print("episode: {}/{}, score: {}".format(e, episodes, time))
            break
        if len(agent.memory) > batch_size:
            agent.replay(batch_size)
Note that the trading environment and DQN agent are not fully implemented in this tutorial. The step, reset, and render methods in the TradingEnv class and the remember, act, and replay methods in the DQNAgent class need to be implemented along the lines of the CartPole example (the sketches above show one possible way to do this).
Also, note that this is a very simplified model that doesn’t take into account transaction costs, uses only the price data to make decisions, and assumes that the agent can trade any fractional amount of the stock, which might not be possible in real-world trading scenarios.
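As a simple illustration (not part of the environment above), a proportional transaction cost could be subtracted from the reward whenever the position changes; the helper function and the 0.1% rate below are hypothetical:
def transaction_cost(prev_position, new_position, portfolio_value, rate=0.001):
    # Proportional cost on the value traded when moving from prev_position to new_position
    return rate * abs(new_position - prev_position) * portfolio_value
Inside step, you would track the previous position (initialized in reset) and subtract this cost from the reward.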
Finally, remember that this model is trained entirely on historical data, and as the common saying in finance goes, “past performance is not indicative of future results”. Therefore, caution should be taken when using reinforcement learning models for actual trading.