Reinforcement learning for power scheduling in a grid-tied PV-battery electric vehicles charging station
Master Thesis
2021
Permanent link to this Item: http://hdl.handle.net/11427/36821
Authors: Arwa, E.O.
Department: Department of Electrical Engineering, Faculty of Engineering and the Built Environment
Abstract
Grid-tied renewable energy source (RES) based electric vehicle (EV) charging stations are an example of a distributed generator behind-the-meter system (DGBMS), which characterizes much of modern power infrastructure. To perform power scheduling in such a DGBMS, stochastic variables such as the load profile of the charging station, the output profile of the RES and the tariff profile of the utility must be considered at every decision step. The stochasticity of this kind of optimization environment makes power scheduling a challenging task that deserves substantial research attention. This dissertation investigates the application of reinforcement learning (RL) techniques to the power scheduling problem in a grid-tied PV-powered EV charging station that incorporates a battery energy storage system. RL is a reward-motivated optimization technique derived from the way animals learn to optimize their behavior in a new environment. Unlike other optimization methods, such as numerical and soft-computing techniques, RL does not require an accurate model of the optimization environment in order to arrive at an optimal solution. This study developed two RL algorithms, namely an asynchronous Q-learning algorithm and an advantage actor-critic (A2C) algorithm, and evaluated their feasibility for power scheduling in the EV charging station under static conditions. To assess the performance of the proposed algorithms, the conventional Q-learning and actor-critic algorithms were implemented for comparison of global cost convergence and learning characteristics. First, the power scheduling problem was expressed as a sequential decision-making process, and an asynchronous Q-learning algorithm was developed to solve it. An advantage actor-critic (A2C) algorithm was then developed and applied to the same problem. The two algorithms were tested using 24-hour load, generation and utility grid tariff profiles under static optimization conditions. The asynchronous Q-learning algorithm was compared with the conventional Q-learning method in terms of global cost, stability and scalability; likewise, the A2C was compared with the conventional actor-critic method in terms of stability, scalability and convergence. Simulation results showed that both developed algorithms converged to lower global costs and displayed more stable learning characteristics than their conventional counterparts. The research established that properly restricting the action space of a Q-learning algorithm improves its stability and convergence, although such a restriction may compromise computational speed and scalability. Of the four algorithms analyzed, the A2C produced the power schedule with the lowest global cost and the best usage of the battery energy storage system.
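To make the sequential decision-making formulation concrete, the sketch below shows tabular Q-learning on a toy version of the scheduling problem. It is an illustrative assumption throughout, not the thesis implementation: the 24-hour PV, load and time-of-use tariff profiles are synthetic placeholders, the battery is discretised into five state-of-charge (SOC) levels moved in assumed 2 kW steps, and the reward is simply the negative grid-import cost. The `feasible()` mask restricts the action space to SOC-preserving moves, echoing the abstract's observation that restricting the action space improves stability.

```python
import numpy as np

# Hypothetical sketch: tabular Q-learning for a toy PV-battery EV charging
# station. Profiles, power steps and cost terms are assumptions for
# illustration only, not the data or settings used in the thesis.

rng = np.random.default_rng(0)

HOURS = 24
SOC_LEVELS = 5                  # discretised battery state of charge
ACTIONS = np.array([-1, 0, 1])  # discharge / idle / charge one SOC step
P_BATT = 2.0                    # kW moved per SOC step (assumed)

# Assumed 24-hour PV output, EV load (kW) and time-of-use tariff ($/kWh)
pv = 10.0 * np.clip(np.sin(np.linspace(0.0, np.pi, HOURS)), 0.0, None)
load = 4.0 + 3.0 * rng.random(HOURS)
tariff = np.where((np.arange(HOURS) >= 17) & (np.arange(HOURS) < 21), 0.30, 0.12)

def feasible(soc):
    """Restrict the action space so the battery never leaves its SOC bounds."""
    return [i for i, a in enumerate(ACTIONS) if 0 <= soc + a < SOC_LEVELS]

def cost(hour, action):
    """Grid-import cost for one hour; charging adds load, discharging offsets it."""
    grid_import = load[hour] - pv[hour] + ACTIONS[action] * P_BATT
    return tariff[hour] * max(grid_import, 0.0)

Q = np.zeros((HOURS, SOC_LEVELS, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(5000):
    soc = SOC_LEVELS // 2
    for h in range(HOURS):
        acts = feasible(soc)
        if rng.random() < eps:                  # epsilon-greedy exploration
            a = int(rng.choice(acts))
        else:
            a = acts[int(np.argmax(Q[h, soc, acts]))]
        r = -cost(h, a)                         # reward = negative grid cost
        soc_next = soc + ACTIONS[a]
        if h < HOURS - 1:
            target = r + gamma * np.max(Q[h + 1, soc_next, feasible(soc_next)])
        else:
            target = r                          # end of the 24-hour horizon
        Q[h, soc, a] += alpha * (target - Q[h, soc, a])
        soc = soc_next

# Greedy rollout after training: the learned charge/discharge decision per hour
soc, schedule = SOC_LEVELS // 2, []
for h in range(HOURS):
    acts = feasible(soc)
    a = acts[int(np.argmax(Q[h, soc, acts]))]
    schedule.append(int(ACTIONS[a]))
    soc += ACTIONS[a]
print(schedule)
```

An actor-critic variant such as the A2C discussed above would replace the Q-table with separate policy and value estimators and update the policy from the advantage signal; the environment loop, however, would follow the same hour-by-hour structure shown here.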
Reference:
Arwa, E.O. 2021. Reinforcement learning for power scheduling in a grid-tied PV-battery electric vehicles charging station. Faculty of Engineering and the Built Environment, Department of Electrical Engineering. http://hdl.handle.net/11427/36821