Deep Reinforcement Learning
Deep reinforcement learning (DRL) is a class of autonomous learners (artificial intelligences) in which neural networks independently explore possible solutions and try them out during training.
The strategies (policies) learnt in this way are very versatile and do not necessarily have to be supervised or restricted by experts. They can represent complex courses of action and, after training, can be applied directly to the respective use case.
Because a trained policy only needs to be evaluated instead of solving an optimisation problem from scratch, DRL algorithms are much faster to apply than conventional optimisation methods.
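To make the interaction between agent and environment concrete, the following minimal sketch shows the training loop that every DRL algorithm builds on, here using the gymnasium API; the environment name and the random action choice are placeholders for an energy-domain environment and a trained policy.

```python
# Minimal sketch of the agent-environment loop behind DRL training.
# The environment and the random "policy" are placeholders; in practice the
# policy is a neural network that is updated from the collected rewards.
import gymnasium as gym

env = gym.make("CartPole-v1")           # stand-in for an energy-domain environment
obs, info = env.reset(seed=0)

for step in range(1000):
    action = env.action_space.sample()  # a trained policy would choose the action here
    obs, reward, terminated, truncated, info = env.step(action)
    # a DRL algorithm would now use (obs, action, reward) to improve the policy
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```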
Adaptability
Conventional Deep Learning models are trained for a specific task, such as classifying images of dogs or cats. They are able to achieve superhuman accuracy, but as soon as the task changes slightly, e.g. when images of horses also need to be recognized, these models usually fail to generalize.
At the same time, humans often find it easy to adapt to new circumstances. Even young children can distinguish new objects from other items after being provided with only a few examples. They can successfully transfer knowledge from other tasks to the new one or they have already learned what the task, e.g. image classification, is generally about.
These two explanatory approaches motivate the research areas of transfer learning and meta-learning.
Transfer-Learning
Assume two similar machine learning tasks T₁ and T₂, i.e. two tasks that differ only in their loss functions, domains, or domain dynamics. Transfer learning aims to transfer knowledge from the first task, the source task, to the second, the target task.
This is done by training a learner to solve the source task T₁ and then transferring the knowledge gained to a learner processing the target task T₂. Usually, the source task is assumed to contain enough training data to be learned well, while the target task does not provide sufficient training data to be learned on its own.
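As an illustration, the following sketch shows a common form of transfer learning in PyTorch: a feature extractor trained on the data-rich source task is reused for the target task, where only a new output head is trained. All layer sizes and task dimensions are purely illustrative.

```python
# Sketch of transfer learning in PyTorch: train a network on a data-rich
# source task T1, then reuse its feature extractor for a target task T2
# that has little data. Layer sizes and task details are illustrative.
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                                  nn.Linear(64, 64), nn.ReLU())
source_head = nn.Linear(64, 10)   # output layer for the source task
target_head = nn.Linear(64, 3)    # new output layer for the target task

source_model = nn.Sequential(feature_extractor, source_head)
# ... train source_model on the (large) source dataset ...

# Transfer: keep the learned features, train only the new head on the target task.
for param in feature_extractor.parameters():
    param.requires_grad = False    # optionally freeze the transferred layers

target_model = nn.Sequential(feature_extractor, target_head)
optimizer = torch.optim.Adam(target_head.parameters(), lr=1e-3)
# ... fine-tune target_model on the (small) target dataset ...
```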
Meta-Learning
In contrast, a meta-learning model aims to iteratively learn from an increasing number of tasks Tᵢ drawn from a distribution of tasks that are similar in the way described above. As a result, it generalizes effectively and fast, i.e. in few shots, to unseen tasks from the same distribution.
This is why the meta-learning paradigm is often called "learning to learn".
Instead of merely transferring one specific solution to another, a meta-learning model identifies the properties shared by all tasks from the task distribution and can then fine-tune to a particular task within a few shots.
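The sketch below illustrates this idea with a first-order, MAML-style meta-learning loop in PyTorch: the meta-model is adapted to each sampled task with a few gradient steps, and the meta-parameters are updated so that this adaptation works well across the task distribution. The task sampler, network, and hyperparameters are placeholders, not a reference implementation.

```python
# First-order sketch of model-agnostic meta-learning (MAML-style).
# sample_tasks() and the regression loss stand in for a concrete task distribution.
import copy
import torch
import torch.nn as nn

meta_model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
meta_optimizer = torch.optim.Adam(meta_model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def sample_tasks(n):
    """Placeholder: yields (support_x, support_y, query_x, query_y) per task."""
    for _ in range(n):
        x = torch.randn(16, 8)
        yield x[:8], torch.randn(8, 1), x[8:], torch.randn(8, 1)

for meta_step in range(100):
    meta_optimizer.zero_grad()
    for sx, sy, qx, qy in sample_tasks(4):
        task_model = copy.deepcopy(meta_model)           # per-task "fast" weights
        inner_opt = torch.optim.SGD(task_model.parameters(), lr=1e-2)
        for _ in range(3):                               # few-shot inner adaptation
            inner_opt.zero_grad()
            loss_fn(task_model(sx), sy).backward()
            inner_opt.step()
        # First-order meta-update: gradients of the adapted model on the query
        # set are accumulated onto the meta-parameters (FOMAML-style approximation).
        query_loss = loss_fn(task_model(qx), qy)
        grads = torch.autograd.grad(query_loss, task_model.parameters())
        for p, g in zip(meta_model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_optimizer.step()
```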
Adaptability in the Energy System
The energy system is constantly exposed to strong external influences: be it crises that cause unpredictable price fluctuations on the energy market, extreme weather events that put a strain on the energy grid or even political decisions such as nodal pricing that could fundamentally change the pricing mechanism on the energy markets.
All these changes pose considerable challenges to conventional Deep Learning models that, for instance, control the grid or compensate for weather-induced energy fluctuations via the energy market.
Therefore, we have set ourselves the goal of developing algorithms that can adapt to such drastic changes within a very short time. With the help of transfer and meta-reinforcement learning models, we want to make the energy system more adaptable and therefore more resilient. This will be our contribution to Germany's energy transition in the coming years.
Trustworthy AI
An important area of research in artificial intelligence is the development of trustworthy AI systems that operate transparently, safely and reliably. Trustworthy AI is particularly important for power supply systems, where malfunctions or misuse of AI can lead to significant economic losses or power outages. Two important areas in this field of research are Explainable AI and Adversarial Machine Learning (AML).
Explainable Artificial Intelligence
Explainable AI is a subfield of Trustworthy AI that focuses on developing AI systems which can provide clear and understandable explanations for their decision-making processes. These explanations are particularly important in energy systems, where various stakeholders (including regulators, operators, and the public) need to understand and justify the actions of AI systems.
Most research in Explainable AI focuses on post-hoc explanation methods, which provide explanations for black-box models after a model has made a decision. Black-box machine learning (ML) models, such as random forests and deep neural networks, are functions too complex for humans to understand. However, post-hoc explanations are not always complete and precise, which makes it difficult to trust both these explanations and the black-box model they attempt to explain. Moreover, black-box models (and thus post-hoc explanations) are often not even necessary: especially for structured data with meaningful features, inherently interpretable models (such as linear regression or decision trees) tend to work just as well as black-box models.
Unlike black-box models, inherently interpretable AI models are designed to be transparent from the start and provide reliable explanations as a core part of their function. However, for the large quantities of unstructured data from power system applications (e.g., audio, images, and time series), deep learning (DL) offers significant performance improvements. Therefore, there is a need for inherently interpretable DL models that can process unstructured data with high performance while remaining transparent in their decision making.
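The following sketch contrasts the two approaches on tabular data: a black-box random forest explained post hoc via permutation importance, and a linear model whose coefficients are the explanation. The synthetic data and the interpretation of the features are placeholders.

```python
# Sketch contrasting a post-hoc explanation for a black-box model with the
# built-in transparency of an inherently interpretable model on tabular data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # e.g. load, wind feed-in, temperature
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Black-box model: needs a post-hoc method to explain its behaviour.
black_box = RandomForestRegressor(random_state=0).fit(X, y)
post_hoc = permutation_importance(black_box, X, y, n_repeats=10, random_state=0)
print("post-hoc importances:", post_hoc.importances_mean)

# Inherently interpretable model: its coefficients are the explanation.
interpretable = LinearRegression().fit(X, y)
print("linear coefficients:  ", interpretable.coef_)
```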
Adversarial Machine Learning
Another important aspect of Trustworthy AI is adversarial robustness. It refers to the ability of an AI system to withstand attempts by attackers to manipulate or disrupt its operation. Adversarial robustness is particularly important for AI applications in power supply systems, where a successful attack can cause significant damage, such as power outages. Research in adversarial machine learning focuses on developing methods to generate attacks and techniques to defend against them. However, most of this research takes place in the areas of computer vision and natural language processing. There is currently little research on AML in areas relevant to AI applications in power supply systems, such as regression and reinforcement learning tasks.
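As a simple illustration of the attack side, the sketch below applies the fast gradient sign method (FGSM), a standard adversarial attack, to a small classifier; the model, input, and perturbation budget are illustrative placeholders.

```python
# Sketch of the fast gradient sign method (FGSM): the input is perturbed in the
# direction that increases the loss, which can flip the model's prediction.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 10, requires_grad=True)   # e.g. a feature vector from a grid sensor
y = torch.tensor([1])                        # true label
epsilon = 0.1                                # perturbation budget

loss = loss_fn(model(x), y)
loss.backward()
x_adv = x + epsilon * x.grad.sign()          # adversarial example

print("clean prediction:      ", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```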
Complex Observation and Action Spaces
Observation Space
In both energy trading and grid control, there are a large number of input variables (e.g. current weather conditions, weather and renewable energy forecasts, order book information, feed-in, etc.) that a deep reinforcement learning algorithm has to deal with. For DRL applications, these input variables enter the DRL environment via so-called observation spaces. Based on the observation space, the DRL agent can estimate the current state and derive meaningful actions. A very large observation space, however, hinders the development of DRL agents. For this reason, the input variables must be pre-processed and scaled, for example through parameterisation or upstream neural networks. As part of RL4CES, techniques such as imitation learning are also being examined more closely in order to pre-train the DRL agent so that it can handle the input variables better.
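As an illustration of how such heterogeneous, scaled inputs can be exposed to an agent, the following sketch defines an observation space with gymnasium's space definitions; the variable names, shapes, and bounds are assumptions made for the sake of the example.

```python
# Sketch of an observation space that bundles heterogeneous, normalised inputs
# for a DRL agent. Names, shapes, and bounds are illustrative.
import numpy as np
from gymnasium import spaces

observation_space = spaces.Dict({
    # current weather and renewable feed-in, scaled to [0, 1]
    "weather":   spaces.Box(low=0.0, high=1.0, shape=(4,), dtype=np.float32),
    "feed_in":   spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),
    # forecasts for the next 24 hours, scaled to [0, 1]
    "forecast":  spaces.Box(low=0.0, high=1.0, shape=(24,), dtype=np.float32),
    # top levels of the order book (price, volume), scaled to [-1, 1]
    "orderbook": spaces.Box(low=-1.0, high=1.0, shape=(10, 2), dtype=np.float32),
})

sample = observation_space.sample()
print({key: value.shape for key, value in sample.items()})
```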
Action Space
In addition to the observation spaces, the action spaces are equally important for the performance of DRL agents. The action spaces describe the options available to the DRL agent. Both in the area of energy grids (grid switching, switching power on and off, etc.) and in energy trading (portfolio optimisation, forecast changes, etc.), there are complex action spaces that grow exponentially with the number of possible actions. Depending on the state of the environment, intelligent solutions must restrict some of the actions without ruling out valid options from the outset.
Accordingly, one area of research in RL4CES focusses on how to deal with the complex action spaces that arise when DRL is used in energy grids and energy trading. To this end, various algorithms in combination with rule-based approaches are being analysed and tested in these applications. Furthermore, approaches such as action masking are being examined in more detail.
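As a minimal illustration of action masking, the sketch below removes currently invalid actions before the policy's action probabilities are computed; the mask itself stands in for a domain-specific validity check.

```python
# Sketch of action masking: invalid actions are excluded before the policy's
# probabilities are computed, so the agent can never select them.
import torch

logits = torch.tensor([1.2, 0.3, -0.5, 0.8])     # policy output for 4 discrete actions
mask = torch.tensor([True, False, True, True])   # False = action currently invalid

masked_logits = logits.masked_fill(~mask, float("-inf"))
probs = torch.softmax(masked_logits, dim=0)      # invalid action gets probability 0
action = torch.multinomial(probs, num_samples=1).item()
print(probs, action)
```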
Graph Neural Networks
GNNs are deep learning models specialized in processing data with a graph structure. Unlike traditional neural networks, they can model and learn complex relationships and dependencies between entities. The data is represented as a graph consisting of nodes connected by edges. In power grids, for example, these nodes correspond to network components such as generators or buses, linked by transmission lines.
GNNs process such graphs through multiple layers in which information is exchanged between neighboring nodes. Each node learns an abstract representation while the graph structure is preserved, meaning the output graph can also reflect relationships between nodes. Generally, GNNs are used for classification or regression at the node, edge, or graph level, as well as for predicting connections within the graph.
GNNs have proven to be effective models for a wide range of tasks. Since power grids naturally have a graph structure, with nodes that strongly influence each other, GNNs are particularly well-suited for information extraction and predictive modeling in this domain. The application of GNNs to energy networks is a rapidly evolving and relatively young field, and we are working on developing GNNs that enhance both network calculations and control strategies. In this effort, we collaborate closely with the GAIN research group (https://www.gain-group.de/).
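As a minimal illustration, the following sketch defines a small graph neural network for node-level regression on a toy power-grid-like graph using PyTorch Geometric; the graph, feature dimensions, and prediction target are assumptions for the example.

```python
# Sketch of a small GNN for node-level regression on a toy grid graph,
# using PyTorch Geometric. The graph (4 buses, 4 lines) is illustrative.
import torch
from torch_geometric.nn import GCNConv

# 4 nodes (buses) with 3 features each, e.g. load, generation, voltage setpoint
x = torch.randn(4, 3)
# undirected lines 0-1, 1-2, 2-3, 3-0, given as directed edge pairs
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 0],
                           [1, 0, 2, 1, 3, 2, 0, 3]], dtype=torch.long)

class GridGNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(3, 16)   # message passing over neighbouring buses
        self.conv2 = GCNConv(16, 1)   # one regression target per node

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

model = GridGNN()
print(model(x, edge_index).shape)     # -> torch.Size([4, 1])
```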