LSTM pseudocode

by Maria Jane Poncardas, July 19, 2019


Forecasting models are generated by training on the historical data of six (6) <confidential hydroelectric company> plants, covering January 1, 2011 to May 4, 2014. The author used a Python Jupyter notebook to write the training code for the RNN-LSTM forecasting method. Before training, missing values had to be treated, so three imputation methods were employed: Kalman filtering, substitution of each plant's maximum rated capacity, and triple exponential smoothing. The following steps outline the forecasting code:


PREPROCESSING:


  1. Import the necessary packages and modules for the training (a sketch follows this list):

    1. pandas is used to extract the generated-power values from the consolidated Excel dataset.

    2. TensorFlow's Keras is used to build the RNN architecture and to initiate the model.

    3. sklearn is used to transform the data by scaling each feature to a given range, in this case between zero and one.

  2. The dataset (one per <hydroelectric> river) was divided into two parts: a training set and a test set. The author held out the last 24 observations from the tail of the dataset, corresponding to one day's cycle, as the test set; the rest were used for training.

  3. The dataset was scaled to the range zero to one.
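
A minimal sketch of these preprocessing steps; the file and column names are hypothetical, since the post does not show the actual dataset schema:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical file and column names for the consolidated dataset.
df = pd.read_excel('consolidated_dataset.xlsx')
series = df['power_generated'].values.reshape(-1, 1)

# Hold out the last 24 observations (one day's cycle) for testing.
train_set = series[:-24]
test_set = series[-24:]

# Scale the training data to the range [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train_set)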


ARCHITECTURE SETUP:


  4. An LSTM works on timesteps: to make a prediction, it observes the data from the previous timesteps, in this case fifty (50) timesteps.


  5. Each LSTM memory cell requires a 3D input tensor of shape (samples, timesteps, features). With this, the author reshaped the 2D array into 3D, as sketched below.
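
A sketch of the 50-timestep windowing and the 2D-to-3D reshape, continuing from the scaled training series train_scaled above:

import numpy as np

TIMESTEPS = 50  # look-back window: each prediction sees the previous 50 steps

# Build overlapping windows: each sample holds the 50 previous values,
# and the target is the value at the next timestep.
X_train, y_train = [], []
for i in range(TIMESTEPS, len(train_scaled)):
    X_train.append(train_scaled[i - TIMESTEPS:i, 0])
    y_train.append(train_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

# Reshape (samples, timesteps) into the 3D tensor
# (samples, timesteps, features) that LSTM memory cells expect.
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))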

  6. The author implements a stacked LSTM in Keras. A stacked LSTM architecture is an LSTM model comprising multiple LSTM layers; in this study, the author added four stacked hidden LSTM layers:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

regressor = Sequential()

# Four stacked hidden LSTM layers with 50 neurons each;
# return_sequences=True passes the full sequence on to the next LSTM layer.
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))
regressor.add(LSTM(units=50))
regressor.add(Dropout(0.2))

# Single-unit dense output; sigmoid matches the [0, 1] scaled target.
regressor.add(Dense(units=1, activation='sigmoid'))

regressor.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])


  7. Dropout is applied to every added LSTM layer. It works by probabilistically dropping inputs or activations from the previous layer during training, and is used to prevent overfitting.


  8. The dense layer is a regular layer of neurons in which each neuron receives input from all the neurons in the previous layer; hence it is densely connected.


  9. The model was then compiled with the "adam" optimizer, short for adaptive moment estimation, a stochastic gradient-based optimization method that maintains adaptive per-parameter learning rates.


  10. Having built the architecture, the model is ready for training by fitting. The training data were divided into batches for the machine's memory management, as sketched below.
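
A sketch of the fitting step; the epoch count and batch size here are illustrative, as the post does not state the values the author used:

# Illustrative hyperparameters; the author's actual values are not given.
regressor.fit(X_train, y_train, epochs=100, batch_size=32)

# Persist the trained model (path is hypothetical) so it can be
# reloaded later for additional datasets, as described under TESTING.
regressor.save('lstm_plant_model.h5')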


TESTING:


  11. The test data were also scaled and reshaped, for consistency with the training preprocessing, and then fed to the model so that the accuracy of its forecasts could be compared against the actual values (sketched below).
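
A sketch of the test preparation and prediction, continuing the variable names assumed above; the last TIMESTEPS training values are prepended so the first test sample has a full 50-step history:

# Scale and window the test portion exactly as the training data were.
inputs = scaler.transform(series[len(series) - len(test_set) - TIMESTEPS:])

X_test = []
for i in range(TIMESTEPS, TIMESTEPS + len(test_set)):
    X_test.append(inputs[i - TIMESTEPS:i, 0])
X_test = np.reshape(np.array(X_test), (len(test_set), TIMESTEPS, 1))

# Predict and invert the scaling to compare against the actual values.
predicted = scaler.inverse_transform(regressor.predict(X_test))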


  12. To test an additional dataset, say August 1 to 15, 2014, the new observations were appended to the dataset (previously split for training and testing), and the previously generated model was loaded and used for prediction (sketched below).
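
A sketch of reloading a saved model for an appended dataset; the file name is hypothetical, and this shows a single one-step-ahead forecast rather than the author's full procedure:

from tensorflow.keras.models import load_model

# Hypothetical path; a separate model exists per plant and imputation technique.
model = load_model('lstm_plant_model.h5')

# new_values: the additional observations (e.g. August 1 to 15, 2014) as an
# (n, 1) array, appended so each forecast window has 50 steps of history.
extended = np.concatenate([series, new_values])
window = scaler.transform(extended[-TIMESTEPS:]).reshape(1, TIMESTEPS, 1)
next_step = scaler.inverse_transform(model.predict(window))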


Every <hydroelectric> plant, with its data treated by each of the three imputation techniques, has its own uniquely trained model.


(Number of models = 6 <hydroelectric> plants × 3 imputation techniques = 18)



===================================================================


Summary:


  1. Data were preprocessed per <hydroelectric> plant: gaps longer than 48 hours were omitted, and the remaining missing data were imputed with three different imputation techniques.

  2. The data were scaled from zero to one, timesteps were generated for the preprocessed training and test sets, and the arrays were reshaped into 3D tensors.

  3. The architecture was a stacked LSTM with four hidden layers, followed by a dense output layer in which each neuron receives input from all the neurons in the previous layer.

  4. Sigmoid was used as the activation function since the scaled data range is between 0 and 1.

  5. The compiled architecture was then fitted to the training data in batches for memory management.

  6. After fitting/training, each trained model (per <hydroelectric> plant and imputation technique) is used to predict the test sets, and its accuracy (MAPE) is checked against the actual values.
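
A minimal sketch of the MAPE check, assuming the actual test values and the model's predictions from the testing step:

import numpy as np

def mape(actual, predicted):
    # Mean absolute percentage error between actual and forecast values.
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

print(mape(test_set, predicted))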
