# Self-Normalizing Net for Numeric Data

Perform classification or regression on numeric data

Released in 2017, self-normalizing neural networks were reported to outperform other fully connected networks on a variety of classification and regression tasks on numeric data. Models in this class keep activations close to zero mean and unit variance as they propagate through many layers. This is achieved with a special activation function, the Scaled Exponential Linear Unit (SELU), and a special type of dropout named Alpha Dropout.
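
The self-normalizing property can be checked numerically outside the Wolfram Language. The following is a minimal Python sketch, not code from this model: the constants are the published SELU values, while the helper names (`selu`, `alpha_dropout`, `mean_and_variance`) and the 5% drop rate are chosen here for illustration. It applies SELU and then Alpha Dropout to standard normal samples and measures the resulting mean and variance, which should both stay close to 0 and 1:

```python
import math
import random

# Published SELU constants (Klambauer et al., 2017).
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x):
    """Scaled Exponential Linear Unit applied to a single number."""
    return SCALE * (x if x > 0 else ALPHA * (math.exp(x) - 1.0))


def alpha_dropout(values, drop_prob, rng):
    """Alpha Dropout, assuming the inputs have zero mean and unit variance.

    Dropped units are set to the negative saturation value of SELU instead
    of zero, then an affine map restores zero mean and unit variance.
    """
    keep = 1.0 - drop_prob
    saturation = -SCALE * ALPHA
    a = (keep + saturation ** 2 * keep * drop_prob) ** -0.5
    b = -a * drop_prob * saturation
    return [a * (v if rng.random() < keep else saturation) + b for v in values]


def mean_and_variance(values):
    """Return the mean and the population variance of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


rng = random.Random(0)
inputs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

activated = [selu(x) for x in inputs]
dropped = alpha_dropout(activated, drop_prob=0.05, rng=rng)

act_mean, act_var = mean_and_variance(activated)
drop_mean, drop_var = mean_and_variance(dropped)
print(f"after SELU:          mean={act_mean:.3f}, variance={act_var:.3f}")
print(f"after Alpha Dropout: mean={drop_mean:.3f}, variance={drop_var:.3f}")
```

Ordinary dropout sets dropped units to zero, which would shift the variance away from 1; replacing them with the SELU saturation value and rescaling is what keeps the statistics stable in this sketch.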

Number of models: 2

## Examples

### Resource retrieval

Get the uninitialized net (there are no pre-trained nets in this model):

 In[1]:=
 Out[1]=

### NetModel parameters

This model consists of a family of individual nets, each identified by a specific parameter combination. Inspect the available parameters:

 In[2]:=
 Out[2]=

Pick a non-default model by specifying the parameters:

 In[3]:=
 Out[3]=

Check the default parameter combination:

 In[4]:=
 Out[4]=

### Basic usage

Classification on numerical data: In this example, we use an eight-layer self-normalizing network to perform classification on the UCI Letter dataset. First, obtain the training and test data:

 In[5]:=

View two random training examples:

 In[6]:=
 Out[6]=

Self-normalizing nets assume that the input data has a mean of 0 and variance of 1. Standardize the test and training data:

 In[7]:=
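
As an illustration of the standardization step, here is a minimal Python sketch, not the Wolfram Language code from this example. It assumes that the per-feature mean and standard deviation are computed on the training data and then reused for the test data, and it uses the population standard deviation; the original may use a different convention. The function names and the toy rows are hypothetical:

```python
import statistics


def fit_standardizer(rows):
    """Compute per-column mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(column) for column in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [statistics.pstdev(column) or 1.0 for column in columns]
    return means, stds


def standardize(rows, means, stds):
    """Shift and scale every column using previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy data: two numeric features per example.
train = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
test = [[4.0, 20.0]]

means, stds = fit_standardizer(train)
train_std = standardize(train, means, stds)
test_std = standardize(test, means, stds)  # reuses the training statistics
```

Reusing the training statistics for the test set keeps both sets on the same scale and avoids leaking test information into preprocessing.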

Get the training net:

 In[8]:=
 Out[8]=

Specify a decoder for the net:

 In[9]:=
 Out[9]=

Train the net for 150 rounds, leaving 5% of the data for a validation set:

 In[10]:=
 Out[10]=
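
Holding out a fraction of the training data for validation can be pictured as a random split. The sketch below is plain Python and only illustrates the idea; it makes no claim about how the training function used above selects its validation examples. The name `split_validation` and the fixed seed are assumptions made here:

```python
import random


def split_validation(examples, fraction, seed=0):
    """Randomly hold out `fraction` of the examples for validation."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_validation = max(1, round(len(shuffled) * fraction))
    # The first n_validation shuffled items are held out; the rest train.
    return shuffled[n_validation:], shuffled[:n_validation]


examples = list(range(200))  # stand-ins for (input, label) pairs
train_part, validation_part = split_validation(examples, fraction=0.05)
print(len(train_part), len(validation_part))
```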

Obtain the accuracy of the trained net on the standardized test set:

 In[11]:=
 Out[11]=

Compare the accuracy against all the methods in Classify:

 In[12]:=
 Out[12]=

Obtain a random sample of standardized test data:

 In[13]:=
 Out[13]=

Test the trained net on a sample of standardized test data:

 In[14]:=
 Out[14]=

### Improving accuracy of the classifier net

Using the same UCI Letter data as in the previous example, first obtain the dataset:

 In[15]:=

Standardize the test and training data in the same way:

 In[16]:=

Get the training net:

 In[17]:=
 Out[17]=

Specify a decoder for the net:

 In[18]:=
 Out[18]=

To improve the final accuracy, it is possible to average multiple trained networks obtained from different training runs. The following function runs multiple trainings and creates an ensemble network, which averages the outputs of the trained nets:

 In[19]:=
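
The averaging idea can be sketched independently of the net framework. The following Python fragment is an illustration under stated assumptions, not the function from this example: each trained net is represented only by the class-probability vector it produced for one input, and `average_outputs`, `decode` and the probability values are invented for the demonstration:

```python
def average_outputs(per_model_probabilities):
    """Average class-probability vectors produced by several models."""
    n_models = len(per_model_probabilities)
    return [sum(column) / n_models for column in zip(*per_model_probabilities)]


def decode(probabilities, labels):
    """Return the label with the highest probability."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best]


labels = ["A", "B", "C"]

# Hypothetical outputs of three independently trained nets for one input.
outputs = [
    [0.50, 0.40, 0.10],
    [0.20, 0.70, 0.10],
    [0.45, 0.35, 0.20],
]

individual = [decode(p, labels) for p in outputs]
ensemble = average_outputs(outputs)
prediction = decode(ensemble, labels)
print(individual, prediction)
```

In this toy case two of the three nets favor "A" by a small margin, while the averaged probabilities favor "B". Averaging takes each net's confidence into account, which is one reason an ensemble can differ from, and sometimes improve on, its members.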

Specify the number of nets in the ensemble and create the ensemble network:

 In[20]:=
 Out[21]=

Obtain the accuracy of the ensemble network:

 In[22]:=
 Out[22]=

Compare it with the accuracy of the individual nets:

 In[23]:=
 Out[23]=

### Classification on nominal data

In this example, we use an eight-layer self-normalizing network to perform classification on the Mushroom Classification dataset. First, obtain the training and test data:

 In[24]:=

View two random training examples:

 In[25]:=
 Out[25]=

To standardize this data, we first need to convert all the nominal input classes into indicator vectors:

 In[26]:=
 Out[27]=
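
Converting nominal values into indicator (one-hot) vectors can be sketched as follows. This is an illustrative Python version, not the code used in this example; the feature values are made up rather than taken from the Mushroom dataset, and the helper names are hypothetical:

```python
def build_vocabularies(rows):
    """Collect the sorted distinct categories of every nominal column."""
    return [sorted(set(column)) for column in zip(*rows)]


def to_indicator_vector(row, vocabularies):
    """Concatenate one indicator vector per nominal feature."""
    vector = []
    for value, vocabulary in zip(row, vocabularies):
        vector.extend(1.0 if value == category else 0.0 for category in vocabulary)
    return vector


# Made-up nominal examples with two features each.
rows = [("convex", "brown"), ("flat", "white"), ("convex", "white")]

vocabularies = build_vocabularies(rows)
encoded = [to_indicator_vector(row, vocabularies) for row in rows]
for row, vector in zip(rows, encoded):
    print(row, vector)
```

Each nominal feature contributes as many numeric entries as it has categories, so the resulting vectors can be standardized like any other numeric data.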

Then standardize the numeric vectors:

 In[28]:=
 Out[28]=

Create the standardized test and training dataset:

 In[29]:=

Get the training net:

 In[30]:=
 Out[30]=

Specify a decoder for the net:

 In[31]:=
 Out[31]=

Train the net for 150 rounds, leaving 5% of the data for a validation set:

 In[32]:=
 Out[32]=
 In[33]:=
 Out[33]=

Compare the accuracy against all the methods in Classify:

 In[34]:=
 Out[34]=

Obtain a sample of standardized test data and view the actual class labels:

 In[35]:=
 Out[36]=

Test the trained net on a sample of standardized test data:

 In[37]:=
 Out[37]=

### Regression on numerical data

In this example, we use an eight-layer self-normalizing network to predict the median value of properties in a neighborhood of Boston, given some features of the neighborhood. First, obtain the training and test data:

 In[38]:=

View two random training examples:

 In[39]:=
 Out[39]=

Self-normalizing nets assume that the input data has a mean of 0 and variance of 1. Standardize the test and training data:

 In[40]:=

Get the training net:

 In[41]:=
 Out[41]=

Train the net for 250 rounds, leaving 7% of the data for a validation set, and return both the trained net and the lowest validation loss:

 In[42]:=
 Out[42]=

Compute the test-set standard deviation:

 In[43]:=
 Out[43]=
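
One common reading of a test-set standard deviation for a regression model is the root mean square of the residuals. The Python sketch below assumes that definition; the measure computed above may use a different normalization, and if the net was trained on standardized targets the result would still have to be rescaled to the original units. The function name and the numbers are hypothetical:

```python
import math


def residual_standard_deviation(predictions, targets):
    """Root mean square of the prediction errors."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted and actual values for four test examples.
predictions = [21.0, 30.0, 18.0, 25.0]
targets = [20.0, 33.0, 18.0, 23.0]

deviation = residual_standard_deviation(predictions, targets)
print(round(deviation, 3))
```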

Compare the standard deviation against all the methods in Predict:

 In[44]:=
 Out[44]=

Obtain a sample of standardized test data and view the actual target values:

 In[45]:=
 Out[46]=

Test the trained net on a sample of standardized test data:

 In[47]:=
 Out[47]=

### Regression on nominal data

Create a dataset of the average monthly temperature (in degrees Celsius) as a function of the city, the year and the month:

 In[48]:=

View two random examples:

 In[49]:=
 Out[49]=

Split the data into training (80%) and test (20%) sets:

 In[50]:=

To standardize this data, we first need to convert all the nominal classes into indicator vectors:

 In[51]:=
 Out[53]=

Then standardize the numeric vector:

 In[54]:=
 Out[54]=

Create the standardized test and training dataset:

 In[55]:=

Get the training net:

 In[56]:=
 Out[56]=

Train the net for 1000 rounds, leaving 7% of the data for a validation set, and return both the trained net and the lowest validation loss:

 In[57]:=
 Out[57]=

Compute the test-set standard deviation:

 In[58]:=
 Out[58]=

Compare the standard deviation against all the methods in Predict:

 In[59]:=
 Out[59]=

Obtain a sample of standardized test data and view the actual target values:

 In[60]:=
 Out[61]=

Test the trained net on a sample of standardized test data:

 In[62]:=
 Out[62]=

### Net information

Obtain the layer type counts:

 In[63]:=
 Out[63]=

Display the summary graphic:

 In[64]:=
 Out[64]=

### Export to MXNet

Export the net into a format that can be opened in MXNet. Input and output size must be specified before exporting:

 In[65]:=
 Out[65]=
 In[66]:=
 Out[66]=

Represent the MXNet net as a graph:

 In[67]:=
 Out[67]=

## Requirements

Wolfram Language 12.0 (April 2019) or above