Prelude LML

Prelude-LML (Prelude Log Monitoring Lackey) is the part of the Prelude project that handles host-based intrusion detection through log analysis. It can monitor files written by a syslog daemon that collects messages from different hosts on heterogeneous platforms, monitor other types of single-line event logs, or act as a syslog server on its own. Any system that generates logs can therefore benefit from Prelude-LML's analysis engine.

  • Some example platforms are:
    • Unix systems
    • Switches and routers
    • Firewalls
    • Printers
    • Other systems that can log in syslog format (such as Windows NT/2K/XP with tools like Ntsyslog)

By placing Prelude-LML on a network and configuring other machines to send log messages to the syslog daemon, it is possible to monitor the logs of an entire network of machines.

  • Prelude-LML has two modes of operation:
    • Watch log files on the host where it is running (syslog or any other).
    • Receive UDP syslog messages from other hosts on the network.

Prelude-LML's primary function is log analysis. Logs on a local system or logs monitored over the network (if configured to accept syslog messages from other hosts) can be processed and analyzed in order to discover security anomalies.

Prelude-LML has a plugin system which actually performs all analysis and monitoring.

One of these plugins, called Pcre, is a regular expression engine powered by the PCRE (Perl Compatible Regular Expressions) library. Prelude-LML uses this plugin to match log entries against sets of regular expressions (in common terms, signatures). Each ruleset provides regular expression matching for a particular purpose. For example, the Netfilter ruleset 'watches' for netfilter messages, and the GRSecurity ruleset 'watches' for GRSecurity messages. The configuration file tells Prelude-LML which analysis plugins to load and use to process logs.

Here is a non-exhaustive list of log sources for which the Prelude-LML Pcre plugin has rulesets:
  • APC Environmental Monitoring Unit
  • arpwatch
  • F5 Big-IP
  • Cisco PIX
  • Cisco Router
  • Cisco VPN Concentrator
  • Clam Antivirus
  • Dell OpenManage
  • GRSecurity
  • Honeyd
  • IPChains
  • IPFW
  • Checkpoint IPSO
  • mod_security
  • Norton Antivirus Corporate Edition
  • NetApp ONTAP
  • Netfilter
  • Windows NT/200x/XP
  • PAM
  • pcAnywhere
  • SentryTools Portsentry
  • Postfix
  • ProFTPd
  • QPopper
  • SELinux
  • Sendmail
  • GNU Shadow Utils
  • Squid Proxy
  • OpenSSH sshd
  • GNU sudo
  • Open Source Tripwire
  • Vigor
  • Vpopmail
  • Linksys WAP11
  • Webmin
  • WU-FTPd
  • Exim
  • Oracle
  • tftpd
  • P3Scan
  • D-Link Wireless Router

With the power of PCRE (the regexp engine that the signature engine uses), writing additional rules is an easy task.

Multiple log files, different formats

If you use multiple log files with different formatting, you can configure LML so that it knows how to handle each log format.
This is done using a format section in the Prelude-LML configuration file:

[format=syslog]
time-format = "%b %d %H:%M:%S" 
prefix-regex = "^(?P<timestamp>.{15}) (?P<hostname>\S+) (?:(?P<process>\S+?)(?:\[(?P<pid>[0-9]+)\])?: )?" 
file = /var/log/messages
file = /var/log/auth.log
udp-server = 0.0.0.0:514

Additionally, LML can accept different formats from a single log source (be it a file or the UDP server). For example, with the following configuration, LML will know how to parse either of the specified formats from /var/log/mylogfile and from UDP 0.0.0.0:514:

[format=syslog]
time-format = "%b %d %H:%M:%S" 
prefix-regex = "^(?P<timestamp>.{15}) (?P<hostname>\S+) (?:(?P<process>\S+?)(?:\[(?P<pid>[0-9]+)\])?: )?" 
file = /var/log/mylogfile
udp-server = 0.0.0.0:514

[format=apache]
time-format = "%d/%b/%Y:%H:%M:%S" 
prefix-regex = "^(?P<hostname>\S+) - - \[(?P<timestamp>.{20}) \+.{4}\] " 
file = /var/log/mylogfile
udp-server = 0.0.0.0:514

The format section allows several options:

  • prefix-regex

This tells LML how to handle the log header. With this option, you can bind the variables that LML uses to fill specific fields in the generated IDMEF Alert. The variables you can use are:

Variable    Usage
timestamp   Bind to the DetectTime information of an IDMEF Alert
hostname    Bind to the Target node information in an IDMEF Alert
process     Bind to the Target process name in an IDMEF Alert
pid         Bind to the Target process pid in an IDMEF Alert

Here is an example of how it works, using the following prefix-regex:

prefix-regex = "^(?P<timestamp>.{15}) (?P<hostname>\S+) (?:(?P<process>\S+?)(?:\[(?P<pid>[0-9]+)\])?: )?" 

Together with the following log entry:

Dec 30 20:09:03 hacklab honeyd[5711]: this is a log entry

When LML parses this log entry using the above prefix-regex, the timestamp, hostname, process, and pid variables will be set to the following values:

Variable    Value
timestamp   Dec 30 20:09:03
hostname    hacklab
process     honeyd
pid         5711

Each of these values will be assigned to the relevant IDMEF field of the generated alert.
Please note that the timestamp variable is specific in that you have to specify an additional time-format option, so that LML is able to parse the time representation.
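
The extraction described above can be reproduced outside LML as a quick check. The following is a minimal sketch in Python, whose re module happens to accept the same (?P<name>...) named-group syntax. The split_prefix helper is hypothetical and not part of LML, and the sample line assumes the usual process[pid] syslog form.

```python
import re

# Same pattern as the prefix-regex option; Python's `re` module accepts
# the (?P<name>...) named-group syntax used here.
PREFIX_REGEX = re.compile(
    r"^(?P<timestamp>.{15}) (?P<hostname>\S+) "
    r"(?:(?P<process>\S+?)(?:\[(?P<pid>[0-9]+)\])?: )?"
)


def split_prefix(line):
    """Return (named header fields, remaining message), or None if no match."""
    m = PREFIX_REGEX.match(line)
    if m is None:
        return None
    return m.groupdict(), line[m.end():]


fields, message = split_prefix(
    "Dec 30 20:09:03 hacklab honeyd[5711]: this is a log entry"
)
print(fields)
print(message)
```

Because the process and pid groups are optional, a line without a "process[pid]: " part still matches, and those two variables are simply left unset.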

  • time-format

Should be set so that LML knows how to parse the timestamp in your log entry. It is used to interpret the content of the timestamp variable defined through the prefix-regex option.

Sequence Description
%% The % character
%a or %A The weekday name according to the current locale, in abbreviated form or the full name.
%b or %B or %h The month name according to the current locale, in abbreviated form or the full name.
%c The date and time representation for the current locale.
%C The century number (0-99).
%d or %e The day of month (1-31).
%D Equivalent to %m/%d/%y. (This is the American style date, very confusing to non-Americans, especially since %d/%m/%y is widely used in Europe. The ISO 8601 standard format is %Y-%m-%d.)
%H The hour (0-23).
%I The hour on a 12-hour clock (1-12).
%j The day number in the year (1-366).
%m The month number (1-12).
%M The minute (0-59).
%n Arbitrary whitespace.
%p The locale's equivalent of AM or PM. (Note: there may be none.)
%r The 12-hour clock time (using the locale's AM or PM). In the POSIX locale equivalent to %I:%M:%S %p. If t_fmt_ampm is empty in the LC_TIME part of the current locale then the behaviour is undefined.
%R Equivalent to %H:%M.
%S The second (0-60; 60 may occur for leap seconds; earlier also 61 was allowed).
%t Arbitrary whitespace.
%T Equivalent to %H:%M:%S.
%U The week number with Sunday the first day of the week (0-53). The first Sunday of January is the first day of week 1.
%w The weekday number (0-6) with Sunday = 0.
%W The week number with Monday the first day of the week (0-53). The first Monday of January is the first day of week 1.
%x The date, using the locale's date format.
%X The time, using the locale's time format.
%y The year within century (0-99). When a century is not otherwise specified, values in the range 69-99 refer to years in the twentieth century (1969-1999); values in the range 00-68 refer to years in the twenty-first century (2000-2068).
%Y The year, including century (for example, 1991).
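
As a rough illustration of how a time-format string maps onto a timestamp, here is a minimal sketch using Python's datetime.strptime, which understands many (though not necessarily all) of the sequences listed above. The parse_timestamp helper and its default_year argument are hypothetical: syslog timestamps carry no year, so this sketch supplies one explicitly, which is an assumption and not a description of how LML resolves the year.

```python
from datetime import datetime


def parse_timestamp(text, time_format, default_year=None):
    """Parse `text` with a strptime-style format string.

    `default_year` is a hypothetical convenience: when given, the year is
    appended to both the text and the format before parsing.
    """
    if default_year is not None:
        text = f"{text} {default_year}"
        time_format = f"{time_format} %Y"
    return datetime.strptime(text, time_format)


# The two time-format values used in the configuration examples.
syslog_time = parse_timestamp("Dec 30 20:09:03", "%b %d %H:%M:%S", default_year=2008)
apache_time = parse_timestamp("30/Dec/2008:20:09:03", "%d/%b/%Y:%H:%M:%S")
print(syslog_time.isoformat())
print(apache_time.isoformat())
```

If the text does not fit the format, datetime.strptime raises ValueError, which is a convenient way to test a time-format value against real log lines before putting it in the configuration file.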

Some field descriptors can be modified by the E or O modifier characters to indicate that an alternative format or specification should be used.
If the alternative format or specification does not exist in the current locale, the unmodified field descriptor is used.

The E modifier specifies that the input string may contain alternative locale-dependent versions of the date and time representation:

Sequence Description
%Ec The locale's alternative date and time representation.
%EC The name of the base year (period) in the locale's alternative representation.
%Ex The locale's alternative date representation.
%EX The locale's alternative time representation.
%Ey The offset from %EC (year only) in the locale's alternative representation.
%EY The full alternative year representation.

The O modifier specifies that the numerical input may be in an alternative locale-dependent format:

Sequence Description
%Od or %Oe The day of the month using the locale's alternative numeric symbols; leading zeros are permitted but not required.
%OH The hour (24-hour clock) using the locale's alternative numeric symbols.
%OI The hour (12-hour clock) using the locale's alternative numeric symbols.
%Om The month using the locale's alternative numeric symbols.
%OM The minutes using the locale's alternative numeric symbols.
%OS The seconds using the locale's alternative numeric symbols.
%OU The week number of the year (Sunday as the first day of the week) using the locale's alternative numeric symbols.
%Ow The number of the weekday (Sunday=0) using the locale's alternative numeric symbols.
%OW The week number of the year (Monday as the first day of the week) using the locale's alternative numeric symbols.
%Oy The year (offset from %C) using the locale's alternative numeric symbols.

  • file

A file to monitor. This option may be set several times if you want to monitor multiple files with this format.

  • udp-server

Create a UDP server that handles this format and listens on the specified address.

  • idmef-alter and idmef-alter-force

Recent Prelude-LML versions allow you to include static values in events generated from a given format:

[format=syslog]
idmef-alter = alert.analyzer(-1).node.location = My Location 

Using the above example, any events generated from a syslog format source will have the alert.analyzer(-1).node.location IDMEFPath set to My Location, unless the path is already set. You might use the idmef-alter-force option in case you want to overwrite a path that is already set.
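
The difference between the two options can be sketched in a few lines of Python. This is a toy model only: a plain dict stands in for the IDMEF message, and apply_alter is a hypothetical helper, not an LML or libprelude function.

```python
def apply_alter(event, path, value, force=False):
    """Set `path` on `event`, a dict standing in for an IDMEF message.

    force=False models idmef-alter: a value that is already set is kept.
    force=True models idmef-alter-force: the value is always overwritten.
    """
    if force or event.get(path) is None:
        event[path] = value
    return event


PATH = "alert.analyzer(-1).node.location"

unset = apply_alter({}, PATH, "My Location")
kept = apply_alter({PATH: "Server room"}, PATH, "My Location")
forced = apply_alter({PATH: "Server room"}, PATH, "My Location", force=True)
print(unset[PATH], "|", kept[PATH], "|", forced[PATH])
```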

Metadata

In the default mode of operation, Prelude-LML keeps track of the last offset analyzed in a given log file. If Prelude-LML is restarted, it resumes analyzing each log file at its last known analyzed position.

You can override this behavior with the 'metadata' option (available from the command line or the configuration file). This option accepts one or more of the following values:

Value     Description
tail      Analyze from the tail of the file
head      Analyze from the head of the file
last      Analyze from the last known file position (default)
nowrite   Do not write file metadata (prevents the last keyword from working)

Example: if you want to analyze a log from the beginning of the file, without writing any metadata to be used for resuming operation, you could start Prelude-LML with the following options:

 prelude-lml --metadata=head,nowrite
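
The meaning of these keywords can be summarized with a small Python model. It is only a sketch of the behavior described above: parse_metadata and start_offset are hypothetical names, and the fallback used when no position has been saved yet (start at the end of the file) is an assumption of this sketch.

```python
def parse_metadata(option):
    """Split a value such as "head,nowrite" into (start position, write flag)."""
    start, write = "last", True  # `last` is the documented default
    for keyword in option.split(","):
        keyword = keyword.strip()
        if keyword in ("head", "tail", "last"):
            start = keyword
        elif keyword == "nowrite":
            write = False
        elif keyword:
            raise ValueError(f"unknown metadata keyword: {keyword!r}")
    return start, write


def start_offset(start, file_size, saved_offset=None):
    """Byte offset at which analysis would begin in this toy model."""
    if start == "head":
        return 0
    if start == "tail":
        return file_size
    # `last`: resume from the saved position. Falling back to the end of the
    # file when nothing was saved is an assumption, not documented behavior.
    return saved_offset if saved_offset is not None else file_size


start, write = parse_metadata("head,nowrite")
print(start, write, start_offset(start, file_size=1000, saved_offset=400))
```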

Ruleset Tuning

To get the best performance out of Prelude-LML, you need to tune it after installation.
The most important thing to realize about the default Prelude-LML rulesets is that they cover a wide range of devices, most of which are probably not present in your environment. Each device ruleset that you leave enabled takes CPU cycles away from the LML parser.

The easiest way to ensure that LML runs efficiently is to turn off those rulesets you don't need. There are two places to do this:

  • In pcre.rules, comment out the rulesets that don't apply.
  • In single.rules, comment out the stand-alone rules that don't apply.

The largest performance gain will come from pruning single.rules. Each ruleset in pcre.rules is guarded by a required regex that a log line must match before the rules in that ruleset are evaluated. The rules in single.rules, on the other hand, are each evaluated for every incoming log line, and their regexes are larger and more complex than the pre-ruleset regexes that guard the other rulesets.

Another way to speed up LML is to study your per-device event rates and reorder the rulesets in pcre.rules to give preference to high event-rate devices. For instance, in Linux-heavy environments, you might place the rulesets whose pre-ruleset regex is regex=kernel or regex=sshd at the top of your ruleset list. In an environment where you mostly monitor PIX firewalls, you would put the regex=%PIX line at the top. This ensures that high-rate events hit a match and leave the parsing queue more quickly, resulting in better overall performance.
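
To make the cost argument concrete, here is a simplified Python model that counts regex evaluations for one log line. It is a sketch of the reasoning above, not of LML's actual engine: count_regex_tests is a hypothetical function, the sample regexes are invented for illustration, and the model assumes that processing stops at the first matching rule.

```python
import re


def count_regex_tests(line, rulesets, single_rules):
    """Count regex evaluations for one log line in a simplified model.

    `rulesets` is an ordered list of (gate_regex, [rule_regex, ...]) pairs;
    the gate plays the role of the pre-ruleset regex in pcre.rules.
    `single_rules` is a list of regexes that are evaluated for every line.
    """
    tests = 0
    for rule in single_rules:      # every single rule sees every line
        tests += 1
        re.search(rule, line)
    for gate, rules in rulesets:   # rulesets are tried in file order
        tests += 1
        if not re.search(gate, line):
            continue               # gate failed: the whole ruleset is skipped
        for rule in rules:
            tests += 1
            if re.search(rule, line):
                return tests       # assumption: a match ends processing
    return tests


# Invented sample rulesets, for illustration only.
pix = (r"%PIX", [r"Deny tcp", r"Deny udp"])
ssh = (r"sshd", [r"Failed password", r"Accepted publickey"])
line = "sshd[32112]: Accepted publickey for root from 12.34.56.78 port 56634 ssh2"

pix_first = count_regex_tests(line, [pix, ssh], single_rules=[])
ssh_first = count_regex_tests(line, [ssh, pix], single_rules=[])
print(pix_first, ssh_first)
```

In this model, every entry left in single_rules adds one evaluation to every line regardless of ordering, which is why pruning single.rules tends to pay off first.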

Using more than one Pcre (modifying prelude-lml.conf and plugins.rules)

In some cases you may want to apply different rules to different log files. To do so, you must change both prelude-lml.conf and plugins.rules. By default, if you do not specify a Pcre instance name, there is a single Pcre instance named default.

Example:

- You have two syslog files, "first.log" and "second.log", and three rules files, "1.rules", "2.rules", and "3.rules".

- You need to create two different Pcre instances.

Original plugins.rules:

#   filename   plugin-name-list   pcre-option   regex
#   *          Debug              -             .*
    *          Pcre               -             .*

New plugins.rules:

#   filename   plugin-name-list   pcre-option   regex
    /var/log/first.log  Pcre[first]        -             .*
    /var/log/second.log Pcre[second]       -             .*

And the new prelude-lml.conf:

include = /usr/local/etc/prelude/default/idmef-client.conf

# Next, specify the files to monitor (if no files are listed in prelude-lml.conf, nothing will be checked)

#syslog
time-format = "%b %d %H:%M:%S" 
prefix-regex= "^(?P<timestamp>.{15}) (?P<hostname>\S+) (?:(?P<process>\S+?)(?:\[(?P<pid>[0-9]+)\])?: )?"

file= /var/log/first.log
file= /var/log/second.log

[Pcre=first]
ruleset= /usr/local/etc/prelude-lml/ruleset/1.rules
ruleset= /usr/local/etc/prelude-lml/ruleset/2.rules

[Pcre=second]
ruleset= /usr/local/etc/prelude-lml/ruleset/2.rules
ruleset= /usr/local/etc/prelude-lml/ruleset/3.rules

When you launch prelude-lml, you should see:

Subscribing plugin pcre[first]
Monitoring /var/log/first.log
Subscribing plugin pcre[second]
Monitoring /var/log/second.log

Filtering specific events

In order to filter certain events, you need to create a rule dedicated to matching the event you wish to filter out.
As an example, if you want to skip all PAM events concerning a successful login by alex, triggered by the following log entry:

Apr 10 16:40:05 bigamd sshd(pam_unix)[10566]: session opened for user alex by (uid=0)

You could add the following rule before the one triggering an alert:

regex=session opened for user alex by (\S*)\(uid=(\d*)\); last

The last keyword tells the engine to stop further processing if this rule matches.
Please note that rules are evaluated from top to bottom, so this rule must be inserted before the rule that would otherwise match and raise the alert.
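
The effect of rule ordering combined with last can be sketched as follows. This Python model is an illustration under stated assumptions, not LML's engine: the RULES table and the evaluate function are hypothetical, the second rule is an invented stand-in for the rule that triggers the alert, and a rule with no alert text is assumed to report nothing.

```python
import re

# Each rule is (regex, alert text or None, last flag). Illustrative only.
RULES = [
    (r"session opened for user alex by (\S*)\(uid=(\d*)\)", None, True),
    (r"session opened for user (\S+) by (\S*)\(uid=(\d*)\)", "Login", False),
]


def evaluate(message, rules=RULES):
    """Walk the rules top to bottom; stop after a matched rule marked last."""
    alerts = []
    for regex, alert_text, last in rules:
        if re.search(regex, message) is None:
            continue
        if alert_text is not None:
            alerts.append(alert_text)
        if last:
            break
    return alerts


print(evaluate("session opened for user alex by (uid=0)"))
print(evaluate("session opened for user bob by (uid=0)"))
```

Reversing the order of the two rules lets the generic rule see the alex event first, so the filter no longer has any effect.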

Creating and contributing rules

Rulesets that you contribute to the Prelude-LML maintainer should follow these guidelines:
  • Avoid using .+ or .* in regex entries unless actually necessary. They make your rule CPU-costly to evaluate.
  • Avoid capturing variables which you don't use. This causes unnecessary memory consumption.
  • At a minimum, include regex, classification.text, assessment.impact.severity, assessment.impact.type, assessment.impact.description.
  • If it's correct for this application, use the last keyword.
  • For readability, put only a single field on each line of your rules.
  • Include a sample log entry with each rule. This is very important for regression testing.
  • Gather as many pieces of data, and fill as many IDMEF fields as possible from the log entry.
  • If a similar rule exists in another ruleset (same function, different software), use the classification.text from the other rule.
  • Use only the actual log message, none of the syslog headers (this generally includes timestamp, originating node, originating process, and pid).
  • Submit new rulesets to the prelude-devel mailing list for consideration.

Rule format

An LML rule consists of a regular expression, used to match a given log entry, followed by IDMEFPath assignments (which may use values captured by the regular expression) that create an alert:

 regex=your_regex; idmef.path = value; idmef.path2 = value2;

Here is a working example:

#LOG:Dec  8 14:45:17 itguxweb1 sshd[32112]: Accepted publickey for root from 12.34.56.78 port 56634 ssh2
#LOG:Jan 14 03:30:44 mail sshd[20298]: Accepted publickey for root from fec0:0:201::3 port 63018 ssh2

regex=Accepted (\S+) for root from (\S+) port (\d+); \
 classification.text=Admin login; \
 id=1908; \
 revision=3; \
 analyzer(0).name=sshd; \
 analyzer(0).manufacturer=OpenSSH; \
 analyzer(0).class=Authentication; \
 assessment.impact.severity=medium; \
 assessment.impact.completion=succeeded; \
 assessment.impact.type=admin; \
 assessment.impact.description=Root logged in from $2 port $3 using the $1 method; \
 source(0).node.address(0).address=$2; \
 source(0).service.port=$3; \
 source(0).service.iana_protocol_name=tcp; \
 source(0).service.iana_protocol_number=6; \
 target(0).service.port=22; \
 target(0).service.name=ssh; \
 target(0).service.iana_protocol_name=tcp; \
 target(0).service.iana_protocol_number=6; \
 target(0).user.category=os-device; \
 target(0).user.user_id(0).type=target-user; \
 target(0).user.user_id(0).name=root; \
 additional_data(0).type=string; \
 additional_data(0).meaning=Authentication method; \
 additional_data(0).data=$1; \
 last;
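
The way captured values flow into IDMEF paths can be illustrated with a short Python sketch. It reuses the regex and a few of the assignments from the rule above. The expand and build_alert helpers are hypothetical, and a plain dict stands in for the generated alert.

```python
import re

RULE_REGEX = re.compile(r"Accepted (\S+) for root from (\S+) port (\d+)")

# A subset of the assignments from the rule, as (IDMEF path, template) pairs.
ASSIGNMENTS = [
    ("classification.text", "Admin login"),
    ("assessment.impact.description",
     "Root logged in from $2 port $3 using the $1 method"),
    ("source(0).node.address(0).address", "$2"),
    ("source(0).service.port", "$3"),
    ("additional_data(0).data", "$1"),
]


def expand(template, match):
    """Replace each $N in `template` with capture group N of `match`."""
    return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), template)


def build_alert(message):
    """Return a dict of IDMEF path -> value, or None if the rule does not match."""
    match = RULE_REGEX.search(message)
    if match is None:
        return None
    return {path: expand(template, match) for path, template in ASSIGNMENTS}


alert = build_alert("Accepted publickey for root from 12.34.56.78 port 56634 ssh2")
for path, value in alert.items():
    print(f"{path} = {value}")
```

Note that the message passed in is the log entry without its syslog header, in line with the contribution guidelines.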

Prelude-LML can accept any IDMEF field in the form of an IDMEFPath. Below, you will find some examples of IDMEF fields.

For list-valued IDMEF fields (annotated with (x) below), indexing starts at 0. For example, an event with multiple targets would have the first target listed as target(0), followed by whatever IDMEF fields you use. See the existing rulesets for examples.

  • regex: A PCRE regex that should be matched to trigger the alert.
  • classification.text: The name of the alert, from one of the origins listed below.
  • classification.reference(x).origin: The type of reference, permitted values are: unknown, vendor-specific, user-specific, bugtraqid, cve, osvdb
  • classification.reference(x).name: Exactly one string containing the name of the reference from the source.
  • classification.reference(x).meaning: A brief description of the alert, intended for the manager (or the human operator of the manager).
  • classification.reference(x).url: A URL at which the manager (or the human operator of the manager) can find additional information about the alert. The document pointed to by the URL may include an in-depth description of the attack, appropriate countermeasures, or other information deemed relevant by the vendor.
  • assessment.impact.severity: An estimate of the relative severity of the event (Possible values are: info, low, medium, high).
  • assessment.impact.completion: An indication of whether the analyzer believes the attempt that the event describes was successful or not. The permitted values are: failed, succeeded.
  • assessment.impact.type: The type of attempt represented by this event, in relatively broad categories. The permitted values are: admin, dos, file, recon, user, other.
  • assessment.impact.description: May contain a textual description of the impact, if the analyzer is able to provide additional details.
  • source(x).node.address(y).address, target(x).node.address(y).address: Address that issued the attack/Address that has been attacked. There can be more than one.
  • source(x).node.address(y).category, target(x).node.address(y).category: The type of address provided. Possible values: unknown, atm, e-mail, lotus-notes, mac, sna, vm, ipv4-addr, ipv4-addr-hex, ipv6-addr, ipv6-addr-hex, ipv6-net, ipv6-net-mask.
  • source(x).node.address(y).vlan_name, target(x).node.address(y).vlan_name: The name of the Virtual LAN to which the address belongs.
  • source(x).node.address(y).vlan_num, target(x).node.address(y).vlan_num: The number of the Virtual LAN to which the address belongs.
  • source(x).node.name, target(x).node.name: The name of the equipment. This information MUST be provided if no Address information is given.
  • source(x).node.category, target(x).node.category: The domain from which the name information was obtained. Possible values are: unknown, ads, afs, coda, dfs, dns, hosts, kerberos, nds, nis, nisplus, nt, wfw.
  • source(x).node.location, target(x).node.location: The location of the equipment.
  • source(x).spoofed, target(x).decoy: An indication of whether the source is spoofed/the target is a decoy. The permitted values are: unknown, yes, no.
  • source(x).interface, target(x).interface: May be used by a network-based analyzer with multiple interfaces to indicate which interfaces this source/target was seen on.
  • source(x).service.name, target(x).service.name: The name of the service. Whenever possible, the name from the IANA list of well-known ports SHOULD be used.
  • source(x).service.port, target(x).service.port: The port number being used.
  • source(x).service.iana_protocol_name, target(x).service.iana_protocol_name: The protocol being used.
  • source(x).service.portlist, target(x).service.portlist: A list of port numbers being used.
  • source(x).user.category, target(x).user.category: The type of user represented (unknown, application, os-device).
  • source(x).user.user_id(y).type, target(x).user.user_id(y).type: The type of user information represented (current-user, original-user, target-user, user-privs, current-group, group-privs, other-privs).
  • source(x).user.user_id(y).name, target(x).user.user_id(y).name: A user or group name.
  • source(x).user.user_id(y).number, target(x).user.user_id(y).number: A user or group number.
  • source(x).process.name, target(x).process.name: A process name
  • source(x).process.pid, target(x).process.pid: A process PID.

Rule flow control

  • last: if a rule using the last keyword is matched, Prelude-LML won't process further rules.
  • silent: if a rule using the silent keyword is matched, Prelude-LML will remain silent and won't generate an alert.
  • chained: A rule marked as chained can only be called from other rules, using the goto or optgoto keyword.
  • goto: Calls a specific rule from the currently matched rule: goto = rule_id. If the called rule doesn't match, the current rule fails.
  • optgoto: Same as goto, but the called rule remains optional.
  • optregex: Specifies an optional regex.
  • min-optgoto-match: Requires at least the specified number of optgoto rules to match for the current rule to be considered a match.
  • min-optregex-match: Same as min-optgoto-match, but for optregex.
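
The interplay of chained, goto, optgoto, and min-optgoto-match can be sketched with a toy Python model. Everything here is an assumption made for illustration: the RULES table, its regexes, and the rule_matches and top_level_ids functions are hypothetical, and the model only follows the descriptions given in the list above.

```python
import re

# Hypothetical rule table keyed by rule id; keys mirror the keywords above.
RULES = {
    1: {"regex": r"login failed", "goto": [2], "optgoto": [3, 4],
        "min_optgoto_match": 1},
    2: {"regex": r"user=\w+", "chained": True},
    3: {"regex": r"from \d+\.\d+\.\d+\.\d+", "chained": True},
    4: {"regex": r"tty=\S+", "chained": True},
}


def rule_matches(rule_id, message, rules=RULES):
    """Return True when the rule and the rules it calls match in this model."""
    rule = rules[rule_id]
    if re.search(rule["regex"], message) is None:
        return False
    # goto: every called rule must match, otherwise the calling rule fails.
    if not all(rule_matches(r, message, rules) for r in rule.get("goto", [])):
        return False
    # optgoto: called rules are optional, subject to min-optgoto-match.
    hits = sum(rule_matches(r, message, rules) for r in rule.get("optgoto", []))
    return hits >= rule.get("min_optgoto_match", 0)


def top_level_ids(rules=RULES):
    """Chained rules are reachable only through goto/optgoto, so skip them."""
    return [rule_id for rule_id, rule in rules.items() if not rule.get("chained")]


print(top_level_ids())
print(rule_matches(1, "login failed user=bob from 10.0.0.1"))
```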