Snort +AI Project

FAQ

1. What should I analyze before using Neural Networks to solve a problem?

You have to identify a pattern that is hard to recognize with "conventional programming". The inputs for your intelligent system should be easy to encode as numbers, which are the only things a NN can understand. Given these numbers, your NN is nothing more than additions and multiplications that produce another number, which then has to be interpreted to present a human-readable response...
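To make the "additions and multiplications" point concrete, here is a minimal sketch of a forward pass through a small network. The weights, the layer sizes, and the sigmoid activation are assumptions chosen for illustration; they are not the values or topology used by PortscanAI.

```python
import math

def sigmoid(x):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Run inputs through one hidden layer and one output layer."""
    return layer(layer(inputs, hidden_w, hidden_b), out_w, out_b)

# Hypothetical weights for a 2-input, 2-hidden, 1-output network.
HIDDEN_W = [[0.5, -0.3], [0.8, 0.2]]
HIDDEN_B = [0.0, -0.1]
OUT_W = [[1.2, -0.7]]
OUT_B = [0.05]

output = forward([0.47, 0.19], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
```

The result is a single number between 0 and 1, which still has to be interpreted (for example, compared against a threshold) before it means anything to a person.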

...As mentioned above, your inputs must be encoded. That means you should use counts of occurrences, percentages, rates, and similar measures. In our case we found that a portscan attack always produces a huge amount of traffic in a short period of time, so we chose several variables related to the number of connection attempts to one or more hosts and to the combination of TCP flags in the packets. The point is that we found something that is always present, though in different forms (a pattern), and that this pattern could be turned into numbers...
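As a rough illustration of turning traffic into numbers, the sketch below computes a few rate-style features over one time window. The `Packet` tuple, the feature names, and the choice of features are hypothetical; they are not the actual inputs used by PortscanAI.

```python
from collections import namedtuple

# Hypothetical packet record: capture time in seconds, addresses,
# destination port, and TCP flags as a string such as "S" or "SA".
Packet = namedtuple("Packet", "time src dst dport flags")

def window_features(packets):
    """Turn a non-empty list of packets from one time window into numbers."""
    duration = max(p.time for p in packets) - min(p.time for p in packets)
    syn_only = [p for p in packets if p.flags == "S"]
    distinct_ports = {(p.dst, p.dport) for p in packets}
    return {
        # Connection attempts per second (guard against a zero-length window).
        "attempts_per_second": len(syn_only) / max(duration, 1e-6),
        # Share of packets that are bare SYNs.
        "syn_only_fraction": len(syn_only) / len(packets),
        # Share of packets that hit a distinct (host, port) pair.
        "distinct_port_ratio": len(distinct_ports) / len(packets),
    }

# Simulated scan: 100 SYN packets to 100 different ports within 0.1 s.
scan = [Packet(0.001 * i, "10.0.0.9", "10.0.0.1", 1 + i, "S") for i in range(100)]
features = window_features(scan)
```

Every value is a count, a ratio, or a rate, which is the kind of input a NN can work with.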

...Remember that if something is difficult to recognize even for a brain, it will be even more difficult for a system that tries to emulate it.

2. How are portscans detected?

Portscans share common characteristics in the number of connections and their timing. Generally, a portscan implies the initiation of a large number of connections in a short time. So the inputs of the ANN (Artificial Neural Network) are meant to capture the number of connections and their timing, among other characteristics.

3. Are the Neural Network weights specifically defined for portscans? If so, what is the policy for defining the weights?

MLP and Elman Artificial Neural Networks (ANN) need supervised training, so the ANN has two stages: the training stage and the execution stage. Weights are defined in the training stage. We used an ANN simulator called Stuttgart Neural Network Simulator (SNNS) to build and train the ANN. The simulator receives the corresponding input and output data sets.

Data sets are the examples the ANN learns from at training time. The knowledge learned by the ANN is stored in its weights. So the weights are not defined by a person; they are defined by the simulator using a learning function (in this case Backpropagation for MLP and JE_BP for Elman).
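To show how a learning function arrives at the weights, here is a minimal sketch of online backpropagation for a one-hidden-layer MLP. It is a hand-written stand-in, not SNNS and not its implementation. The hidden layer size, learning rate, epoch count, and the single output unit are assumptions; the two training rows are copied from the sample data set entries under question 4.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(samples, n_hidden=3, epochs=2000, rate=0.5, seed=0):
    """Train a one-hidden-layer MLP with plain online backpropagation.

    samples is a list of (inputs, target) pairs with one target in [0, 1].
    Returns a predict(inputs) function that uses the learned weights.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # The last entry of each weight row is the bias weight.
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        xb = list(x) + [1.0]
        hid = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
        out = sigmoid(sum(w * v for w, v in zip(w_out, hid + [1.0])))
        return hid, out

    for _ in range(epochs):
        for x, target in samples:
            hid, out = forward(x)
            # Error signal at the output, scaled by the sigmoid derivative.
            d_out = (target - out) * out * (1.0 - out)
            # Propagate the error back through the (not yet updated) output weights.
            d_hid = [d_out * w_out[j] * hid[j] * (1.0 - hid[j])
                     for j in range(n_hidden)]
            for j, v in enumerate(hid + [1.0]):
                w_out[j] += rate * d_out * v
            for j in range(n_hidden):
                for i, v in enumerate(list(x) + [1.0]):
                    w_hid[j][i] += rate * d_hid[j] * v

    return lambda x: forward(x)[1]

# Target 0.0 means normal traffic, 1.0 means portscan (a simplification:
# the real data set uses two output values).
NORMAL = [0.470588, 0.187500, 1.930544, 5.086774, 1.0, 1.0, 0.375]
SCAN = [1.0, 1.0, 0.001508, 0.033645, 1.0, 0.996689, 0.5375]
predict = train([(NORMAL, 0.0), (SCAN, 1.0)])
```

Nobody sets the weights by hand here either: they start random and are nudged after every example until the outputs match the targets.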

4. How is the ANN trained?

To train the ANN we needed a data set. To create it we used a modified PortscanAI preprocessor, which derives sets of input values for the ANN from sniffed network packets. Data sets need output values too; these were defined by an administrator.

For example, to create a data set for portscans we need two groups of examples: 1) a group showing normal traffic, and 2) a group showing portscan traffic. To generate the first group, the administrator just sniffs normal traffic to generate inputs and specifies an output value symbolizing normal traffic. To generate the second group, the administrator runs a controlled portscan, sniffs it to create inputs, and defines an output value symbolizing that a portscan is happening.

After the data set is created (stored in a .txt file), we use it in SNNS to train the ANN. The trained ANN is then converted into C code and integrated into PortscanAI.

Below are two examples of data set entries:

For Normal traffic:

#In: h_dst h_src a_r_time a_s_time n_resp rh/has rh/had
0.470588 0.187500 1.930544 5.086774 1.000000 1.000000 0.375000
#Out: Normal portscan
1 0

For portscan traffic:

#In: h_dst h_src a_r_time a_s_time n_resp rh/has rh/had
1.000000 1.000000 0.001508 0.033645 1.000000 0.996689 0.537500
#Out: Normal attack
0 1
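For readers who want to load entries like these into their own tools, here is a minimal parsing sketch. It assumes the file looks exactly like the examples above (a `#In:` header line, a line of values, an `#Out:` header line, a line of values); the real file may differ, and the function name is hypothetical.

```python
def parse_samples(text):
    """Parse '#In:' / '#Out:' blocks into (inputs, outputs) pairs of dicts."""
    samples = []
    pending = None   # (kind, column names) from the most recent header line
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#In:") or line.startswith("#Out:"):
            kind, _, rest = line[1:].partition(":")
            pending = (kind.lower(), rest.split())
        elif pending is not None:
            kind, columns = pending
            current[kind] = dict(zip(columns, map(float, line.split())))
            if kind == "out":
                # An '#Out:' block closes the sample started by '#In:'.
                samples.append((current.get("in", {}), current["out"]))
                current = {}
            pending = None
    return samples

SAMPLE = """#In: h_dst h_src a_r_time a_s_time n_resp rh/has rh/had
0.470588 0.187500 1.930544 5.086774 1.000000 1.000000 0.375000
#Out: Normal portscan
1 0
"""

parsed = parse_samples(SAMPLE)
```

Each parsed sample pairs the named input values with the named output values, which makes it easy to check that the labels match the traffic that was sniffed.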

5. Are specific thresholds used for portscan detection?

A trained ANN doesn't give an exact output value, just an approximation, so we defined thresholds for the output values. If the portscan output of the ANN is greater than 0.5 (for example), PortscanAI generates an alarm message indicating possible portscan activity.
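The threshold check itself is tiny. A minimal sketch, using the example value 0.5 from the answer above; the function name and the message text are hypothetical and do not reproduce PortscanAI's actual alarm format.

```python
PORTSCAN_THRESHOLD = 0.5  # example value from the text; tune for your network

def alert_for(portscan_output, threshold=PORTSCAN_THRESHOLD):
    """Return an alarm message if the output exceeds the threshold, else None."""
    if portscan_output > threshold:
        return f"possible portscan activity (output={portscan_output:.3f})"
    return None

high = alert_for(0.93)   # well above the threshold: alarm
low = alert_for(0.12)    # well below the threshold: no alarm
```

Raising the threshold reduces false alarms at the cost of missing weaker scans; lowering it does the opposite.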

6. Is it possible to detect DoS attacks such as a SYN flood, knowing the DoS attack pattern and choosing appropriate weights/thresholds?

I think you can detect DoS traffic by generating a new data set specific to that kind of attack. The most important parameters are the inputs and outputs of the ANN. Once the ANN gives you an output value, you can set whatever thresholds you want.

7. Would that imply changes to the neural networks?

I don't think so, unless you change the ANN topology.

8. Is it possible to customize the neural network to detect DoS (there are many DoS attacks, creating different conditions on the network)? Is there any way I can trace back to the IP generating the attack or, more generally, to the packet itself so I can drop it?

Yes, you can detect the originating IP and drop packets. For example, if, after packet number 1000, the system detects a DoS, you can configure it to drop that packet and packets 1001, 1002, and so on.
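The drop logic from this example can be sketched as a small piece of state that remembers flagged sources. This is a standalone illustration, not Snort's API: the class name, the `handle` method, and the `is_attack` flag (standing in for the detector's verdict) are all hypothetical.

```python
class SourceBlocker:
    """Remembers sources flagged as attackers and rejects their later packets."""

    def __init__(self):
        self.blocked = set()

    def handle(self, src_ip, is_attack):
        """Return True to forward the packet, False to drop it."""
        if is_attack:
            self.blocked.add(src_ip)
        return src_ip not in self.blocked

# Simulate packets 1..1002 from one source; detection fires on packet 1000.
blocker = SourceBlocker()
forwarded = [
    n for n in range(1, 1003)
    if blocker.handle("10.0.0.9", is_attack=(n == 1000))
]
```

Packets 1 to 999 are forwarded; packet 1000 and everything after it from that source is dropped.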

9. Can the ANN detect all kinds of attacks if properly trained?

I think it's better to have a specialized NN for each kind of attack, but the best way to answer that question is to do your own analysis with your own neural network (your brain). If you can find a pattern that is common to all kinds of DoS under different network conditions, you need just one NN that recognizes that pattern.
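The "one specialized NN per attack" option could be wired up roughly as follows. The two detectors here are simple stand-in functions, not trained networks, and the feature names and the 0.5 threshold are assumptions for illustration.

```python
def run_detectors(features, detectors, threshold=0.5):
    """Return the names of all detectors whose output exceeds the threshold."""
    return sorted(
        name for name, net in detectors.items() if net(features) > threshold
    )

# Stand-ins for trained networks: each maps a feature dict to a score in [0, 1].
DETECTORS = {
    "portscan": lambda f: 0.9 if f["distinct_port_ratio"] > 0.8 else 0.1,
    "syn_flood": lambda f: (
        0.9
        if f["syn_only_fraction"] > 0.8 and f["distinct_port_ratio"] < 0.2
        else 0.1
    ),
}

# Many bare SYNs aimed at very few ports: looks like a flood, not a scan.
alerts = run_detectors(
    {"syn_only_fraction": 1.0, "distinct_port_ratio": 0.05}, DETECTORS
)
```

Each detector sees the same inputs but answers only its own question, so adding a new attack type means training and registering one more network rather than retraining a single large one.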