xLSTM: Extended Long Short-Term Memory.

Enhancing LSTM for Time Series Forecasting: Introducing xLSTM

Long Short-Term Memory (LSTM) networks are widely used for sequence modeling tasks such as time series forecasting, but they face well-known challenges. Gradients can explode or vanish, particularly over long sequences, and training cannot be parallelized across time steps because each step depends on the hidden state produced by the previous one. LSTMs also traditionally employ sigmoid functions to control their gates; since sigmoid outputs saturate below one, multiplying by them repeatedly across many steps can exacerbate the vanishing gradient problem.
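To make the vanishing-gradient issue concrete, here is a minimal NumPy sketch (illustrative values only, not taken from the paper): backpropagating through a sigmoid-gated recurrence multiplies the gradient by a gate activation below one at every step, so the overall scale factor decays toward zero on long sequences.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200                                                    # sequence length (illustrative)
forget_gates = 1 / (1 + np.exp(-rng.normal(0.0, 1.0, T)))  # sigmoid outputs, all in (0, 1)

# Factor multiplying a gradient backpropagated through t steps:
grad_scale = np.cumprod(forget_gates)
print(grad_scale[9], grad_scale[49], grad_scale[-1])       # shrinks rapidly toward zero
```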

To address these issues, researchers have developed a variant known as xLSTM (Extended LSTM), which modifies the traditional LSTM architecture to improve its performance and stability.

xLSTM Architecture and Innovations

xLSTM combines two distinct architectural blocks: sLSTM and mLSTM. The sLSTM block tackles the vanishing gradient problem by replacing the sigmoid in the input and forget gates with an exponential function. While this keeps gradients from collapsing, the exponential can produce excessively large values, so the cell output is normalized by a normalizer state n that accumulates the gate activations, and a stabilizer state keeps the exponentials within a numerically safe range. These adjustments keep outputs and gradients stable throughout training. The sLSTM block also restructures memory mixing: the recurrent connections are organized into block-diagonal matrices, so mixing happens within heads rather than through the full matrices used in traditional recurrent computations.
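As an illustration, here is a minimal NumPy sketch of one sLSTM-style cell step; the function name slstm_step and the stacked parameters W, R, b are my own choices rather than the paper's reference code, and the block-diagonal, per-head structure of the recurrent matrix R is omitted for brevity.

```python
import numpy as np

def slstm_step(x, h_prev, c_prev, n_prev, m_prev, W, R, b):
    """One sLSTM-style step; W, R, b stack the parameters of the i, f, o, z gates."""
    pre = W @ x + R @ h_prev + b              # gate pre-activations
    i_pre, f_pre, o_pre, z_pre = np.split(pre, 4)

    m = np.maximum(f_pre + m_prev, i_pre)     # stabilizer state
    i = np.exp(i_pre - m)                     # stabilized exponential input gate
    f = np.exp(f_pre + m_prev - m)            # stabilized exponential forget gate
    o = 1 / (1 + np.exp(-o_pre))              # output gate stays sigmoid
    z = np.tanh(z_pre)                        # cell input

    c = f * c_prev + i * z                    # cell state
    n = f * n_prev + i                        # normalizer state
    h = o * (c / n)                           # normalized hidden state
    return h, c, n, m

# Toy usage with hidden size 4 and input size 3.
d_h, d_x = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * d_h, d_x))
R = rng.normal(size=(4 * d_h, d_h))
b = np.zeros(4 * d_h)
h = c = n = m = np.zeros(d_h)
for x in rng.normal(size=(5, d_x)):
    h, c, n, m = slstm_step(x, h, c, n, m, W, R, b)
print(h)
```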

mLSTM, on the other hand, replaces the scalar cell state with a matrix memory: keys and values computed from the input are written into the memory as outer products, and a query, also computed from the input, reads the memory back out. Like sLSTM, it uses exponential gating and normalizes its retrieved output to prevent runaway values. Because mLSTM drops the hidden-to-hidden recurrence (no memory mixing), its computations can be parallelized across time steps, addressing the second limitation of classical LSTMs.
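Again as a rough sketch (the function mlstm_step and its parameter names are my own, not the paper's reference code), one mLSTM-style step writes the value-key outer product into a matrix memory C and reads it back with a query computed from the current input; stabilization of the exponential input gate is omitted here for brevity.

```python
import numpy as np

def mlstm_step(x, C_prev, n_prev, Wq, Wk, Wv, Wo, w_i, w_f, d):
    q = Wq @ x                                  # query from the current input
    k = (Wk @ x) / np.sqrt(d)                   # key, scaled as in attention
    v = Wv @ x                                  # value

    i = np.exp(w_i @ x)                         # exponential input gate (unstabilized here)
    f = 1 / (1 + np.exp(-(w_f @ x)))            # scalar forget gate
    o = 1 / (1 + np.exp(-(Wo @ x)))             # output gate (vector)

    C = f * C_prev + i * np.outer(v, k)         # write: add the value-key outer product
    n = f * n_prev + i * k                      # normalizer vector
    h_tilde = (C @ q) / max(abs(n @ q), 1.0)    # read: normalized retrieval with the query
    return o * h_tilde, C, n

# Toy usage with head dimension 4 and input dimension 3.
d, d_x = 4, 3
rng = np.random.default_rng(0)
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d_x)) for _ in range(4))
w_i, w_f = rng.normal(size=d_x), rng.normal(size=d_x)
C, n = np.zeros((d, d)), np.zeros(d)
for x in rng.normal(size=(5, d_x)):
    h, C, n = mlstm_step(x, C, n, Wq, Wk, Wv, Wo, w_i, w_f, d)
print(h)
```

Because no step depends on the previous hidden state, these updates can in principle be computed for all time steps in parallel during training.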

Comparative Performance

While xLSTM introduces significant improvements over the traditional LSTM, it does not consistently outperform all competing models. In comparative studies against models such as Llama (a Transformer) and Mamba (a state-space model), xLSTM is not always the top performer, but it remains a strong contender against both families. This competitiveness is particularly notable in applications where traditional LSTM strengths matter most, such as time series forecasting.


The introduction of xLSTM represents a pivotal enhancement in the field of recurrent neural networks, offering a viable alternative to both traditional LSTMs and more recent architectures like Transformers. By addressing core limitations of LSTMs and adapting their functionality to modern computational needs, xLSTM stands as a testament to the ongoing evolution in neural network design, particularly for challenging applications like time series analysis.




https://arxiv.org/abs/2405.04517
