Tutorial - Custom Parser
Parser Development
The purpose of a Parser is to process raw content collected by the Client and map it into a format that is usable by Combiners and Rules. Raw content is content obtained directly from a system file or command, and may be collected by Insights Client, or from some other source such as a SOS Report. The following examples will demonstrate development of different types of parsers.
You can find the complete implementation of the parser and test code in the
directory insights-core-tutorials/insights_examples/parsers.
Secure Shell Parser
Overview
Secure Shell or ssh (“SSH”) is a commonly used tool to access and interact
with remote systems. The SSH server is configured on a system using the
/etc/ssh/sshd_config file. Red Hat Enterprise Linux utilizes OpenSSH, and the
/etc/ssh/sshd_config file is documented in the sshd_config(5) manual page.
Here is a portion of the configuration file showing the syntax:
# $OpenBSD: sshd_config,v 1.93 2014/01/10 05:59:19 djm Exp $
Port 22
#AddressFamily any
ListenAddress 10.110.0.1
#ListenAddress ::
# The default requires explicit activation of protocol 1
#Protocol 2
Many lines begin with a # indicating comments, and blank lines are used
to aid readability. The important lines have a configuration keyword followed
by a space and then a configuration value. So in the parser we want to make sure
we capture the important lines and ignore the comments and blank lines.
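The filtering described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the helper the tutorial relies on later; the function name active_lines and the SAMPLE string are assumptions made for this sketch.

```python
SAMPLE = """\
# $OpenBSD: sshd_config,v 1.93 2014/01/10 05:59:19 djm Exp $
Port 22
#AddressFamily any
ListenAddress 10.110.0.1

#Protocol 2
"""


def active_lines(text):
    """Return the non-blank, non-comment lines of text, stripped."""
    result = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and lines whose first character is '#'
        if not line or line.startswith("#"):
            continue
        result.append(line)
    return result


print(active_lines(SAMPLE))  # ['Port 22', 'ListenAddress 10.110.0.1']
```

Only the two keyword lines survive; everything a rule would not care about is dropped before any parsing happens.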
Creating the Initial Parser Files
First we need to create the parser file. Parser files are implemented in modules.
Each module should be limited to one type of application. In this case we are
working with the ssh application so we will create a secure_shell module.
Create the module file ~/work/insights-core-tutorials/mycomponents/parsers/secure_shell.py
in the parsers directory:
(env)[userone@hostone ~]$ cd ~/work/insights-core-tutorials/mycomponents/parsers
(env)[userone@hostone parsers]$ touch secure_shell.py
Now edit the file and create the parser skeleton:
from insights import Parser, parser
from insights.specs import Specs


@parser(Specs.sshd_config)
class SSHDConfig(Parser):

    def parse_content(self, content):
        pass
We start by importing the Parser class and the parser decorator. Our
parser will inherit from the Parser class and it will be associated with
the Specs.sshd_config data source using the parser decorator. Finally we
need to implement the parse_content method, which is required to parse
and store the input data in our class. The base class Parser implements a
constructor that will invoke our parse_content method when an instance of
the class is created.
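The relationship between the base class constructor and parse_content can be illustrated with a small stand-in. The classes SketchParser and LineCounter below are hypothetical and exist only for this sketch; the real insights Parser is constructed from a context object rather than a plain list, and does more work than shown here.

```python
class SketchParser(object):
    """Hypothetical stand-in for a parser base class (not the insights API)."""

    def __init__(self, content):
        # The base constructor hands the raw lines to the subclass hook
        self.parse_content(content)

    def parse_content(self, content):
        raise NotImplementedError("subclasses must implement parse_content")


class LineCounter(SketchParser):
    def parse_content(self, content):
        # Store whatever the subclass needs as instance attributes
        self.count = len(content)


counter = LineCounter(["Port 22", "Protocol 1"])
print(counter.count)  # 2
```

Because the constructor calls the hook, the attributes set in parse_content are available as soon as the object exists.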
Next we’ll create the parser test file ~/work/insights-core-tutorials/mycomponents/tests/parsers/test_secure_shell.py
as a skeleton that will aid in the parser development process:
from mycomponents.parsers.secure_shell import SSHDConfig


def test_sshd_config():
    pass
Once you have created and saved both of these files, run the test to make sure everything is set up correctly:
(env)[userone@hostone ~]$ cd ~/work/insights-core-tutorials
(env)[userone@hostone insights-core-tutorials]$ pytest -k secure_shell
======================== test session starts =============================
platform linux2 -- Python 2.7.15, pytest-3.0.6, py-1.7.0, pluggy-0.4.0
rootdir: /home/userone/work/mycomponents, inifile:
plugins: cov-2.4.0
collected 3 items
mycomponents/tests/parsers/test_secure_shell.py ...
===================== 3 passed in 1.26 seconds ===========================
Hint
You may sometimes see a message that pytest cannot be found,
or see some other related message that doesn’t make sense. The first
thing to check is that you have activated your virtual environment by
executing the command source bin/activate from the root directory
of your insights-core-tutorials project. You can deactivate the virtual
environment by typing deactivate. You can find more information
about virtual environments here:
http://docs.python-guide.org/en/latest/dev/virtualenvs/
Parser Implementation
Typically parser and combiner development is driven by rules that need facts generated by the parsers and combiners. Regardless of the specific requirements, it is important (1) to implement basic functionality by getting the raw data into a usable format, and (2) to not overdo the implementation because we can’t anticipate every use of the parser output. In our example we will eventually be implementing the rules that will warn us about systems that are not configured properly. Initially our parser implementation will be parsing the input data into key/value pairs. We may later discover that we can optimize rules by moving duplicate or complex processing into the parser.
Test Code
Referring back to our sample SSHD input we will start by creating a test for the output that we want from our parser:
from mycomponents.parsers.secure_shell import SSHDConfig
from insights.tests import context_wrap

SSHD_CONFIG_INPUT = """
# $OpenBSD: sshd_config,v 1.93 2014/01/10 05:59:19 djm Exp $

Port 22
#AddressFamily any
ListenAddress 10.110.0.1
Port 22
ListenAddress 10.110.1.1
#ListenAddress ::

# The default requires explicit activation of protocol 1
#Protocol 2
Protocol 1
"""


def test_sshd_config():
    sshd_config = SSHDConfig(context_wrap(SSHD_CONFIG_INPUT))
    assert sshd_config is not None
    assert 'Port' in sshd_config
    assert 'PORT' in sshd_config
    assert sshd_config['port'] == ['22', '22']
    assert 'ListenAddress' in sshd_config
    assert sshd_config['ListenAddress'] == ['10.110.0.1', '10.110.1.1']
    assert sshd_config['Protocol'] == ['1']
    assert 'AddressFamily' not in sshd_config
    ports = [l for l in sshd_config if l.keyword == 'Port']
    assert len(ports) == 2
    assert ports[0].value == '22'
First we added an import for the helper function context_wrap, which we’ll
use to put our input data into a Context object to pass to our class
constructor:
from mycomponents.parsers.secure_shell import SSHDConfig
from insights.tests import context_wrap
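Next we include the sample data that will be used for the test. The
triple-quoted string begins and ends with a newline. The resulting blank
lines are harmless here because the parser ignores blank lines, and calling
the strip() function on the string would remove all white space at the
beginning and end of the data: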
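The effect of strip() on a triple-quoted string can be checked directly. The sketch below uses a shortened, hypothetical sample named RAW rather than the full test input.

```python
RAW = """
Port 22
Protocol 1
"""

# Without strip() the first element is an empty string, because the
# text starts with a newline right after the opening quotes
unstripped = RAW.splitlines()

# With strip() only the two configuration lines remain
stripped = RAW.strip().splitlines()

print(unstripped)  # ['', 'Port 22', 'Protocol 1']
print(stripped)    # ['Port 22', 'Protocol 1']
```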
SSHD_CONFIG_INPUT = """
# $OpenBSD: sshd_config,v 1.93 2014/01/10 05:59:19 djm Exp $

Port 22
#AddressFamily any
ListenAddress 10.110.0.1
Port 22
ListenAddress 10.110.1.1
#ListenAddress ::

# The default requires explicit activation of protocol 1
#Protocol 2
Protocol 1
"""
Next, to the body of the test, we add code to create an instance of our parser class:
def test_sshd_config():
    sshd_config = SSHDConfig(context_wrap(SSHD_CONFIG_INPUT))
Finally we add our tests using the attributes that we want to be able to access in our rules. First, some assumptions about the data:
- some keywords may be present more than once in the config file
- we want to access keywords in a case-insensitive way
- the order of the keywords matters
- we are not trying to validate the configuration file, so we won’t parse the values or analyze the sequence of keywords
Now here are the tests:
    assert sshd_config is not None
    assert 'Port' in sshd_config
    assert 'PORT' in sshd_config
    assert sshd_config['port'] == ['22', '22']
    assert 'ListenAddress' in sshd_config
    assert sshd_config['ListenAddress'] == ['10.110.0.1', '10.110.1.1']
    assert sshd_config['Protocol'] == ['1']
    assert 'AddressFamily' not in sshd_config
    ports = [l for l in sshd_config if l.keyword == 'Port']
    assert len(ports) == 2
    assert ports[0].value == '22'
Our tests assume that we want to know whether a particular keyword is present, regardless of character case used in the keyword, and we want to know the values of the keyword if present. We don’t want our rules to have to assume any particular case of characters in keywords so we can make it easy by performing case insensitive compares and assuming all lowercase for access. This may not always work, but in this example it is a safe assumption.
Parser Code
The method parse_content is responsible for parsing the input data and
storing the results in class attributes. You may choose the attributes that
are necessary for your parser; there are no requirements to use specific names
or types. Some general recommendations for parser class implementation are:
- Choose attributes that make sense for use by actual rules, or for how you anticipate rules will use the information. If rules need to iterate over the information then a list might be best, or if rules could access it via keywords then a dict might be better.
- Choose attribute types that are not so complex they cannot be easily understood or serialized. Unless you know you need something complex, keep it simple.
- Use the @property decorator to create read-only getters and simplify access to information.
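As a rough illustration of the last recommendation, the sketch below shows a read-only getter built with @property. The class HostInfo and its attribute names are hypothetical and are not part of the tutorial's parser.

```python
class HostInfo(object):
    """Hypothetical parser-like class used only for this illustration."""

    def __init__(self, lines):
        # Simple, easily serialized attribute: a list of strings
        self.lines = list(lines)

    @property
    def first(self):
        """Read-only access to the first stored line, or None if empty."""
        return self.lines[0] if self.lines else None


info = HostInfo(["Port 22", "Protocol 1"])
print(info.first)  # Port 22

try:
    info.first = "Port 2222"
    assigned = True
except AttributeError:
    # A property without a setter rejects assignment
    assigned = False

print(assigned)  # False
```

Rules can read info.first like a plain attribute, while accidental assignment fails loudly instead of silently changing parser state.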
Now we need to implement the parser that will satisfy our tests.
from collections import namedtuple
from insights import Parser, parser, get_active_lines
from insights.core.spec_factory import SpecSet, simple_file
import os


class LocalSpecs(SpecSet):
    """ Datasources for collection from local host """
    conf_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'sshd_config')

    sshd_config = simple_file(conf_file)


@parser(LocalSpecs.sshd_config)
class SSHDConfig(Parser):

    KeyValue = namedtuple('KeyValue', ['keyword', 'value', 'kw_lower'])

    def parse_content(self, content):
        self.lines = []
        # get_active_lines drops blank lines and comments
        for line in get_active_lines(content):
            # Split on the first run of white space only
            kw, val = line.split(None, 1)
            self.lines.append(self.KeyValue(kw.strip(), val.strip(), kw.lower().strip()))
        self.keywords = set([k.kw_lower for k in self.lines])

    def __contains__(self, keyword):
        return keyword.lower() in self.keywords

    def __iter__(self):
        for line in self.lines:
            yield line

    def __getitem__(self, keyword):
        kw = keyword.lower()
        # Implicitly returns None when the keyword is not present
        if kw in self.keywords:
            return [kv.value for kv in self.lines if kv.kw_lower == kw]
We added imports to our skeleton to utilize get_active_lines() and
namedtuple. get_active_lines() is one of the many helper methods
that you can find in insights/parsers/__init__.py, insights/core/__init__.py,
and insights/util/__init__.py. get_active_lines() will remove all
blank lines and comments from the input, which simplifies your parser's
parsing logic.
from collections import namedtuple
from insights import Parser, parser, get_active_lines
from insights.core.spec_factory import SpecSet, simple_file
import os
Since the sshd_config spec requires root access to read the
/etc/ssh/sshd_config file, we created a local SpecSet class called
LocalSpecs that contains a local sshd_config spec. That spec uses a local
sshd_config file that does not require root access to read.
class LocalSpecs(SpecSet):
    """ Datasources for collection from local host """
    conf_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'sshd_config')

    sshd_config = simple_file(conf_file)
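The conf_file expression builds a path to a file that sits in the same directory as the module itself. A minimal sketch of that path arithmetic, using only os.path, is shown here; the helper name sibling_path and the example module location are assumptions made for illustration.

```python
import os


def sibling_path(module_file, name):
    """Return the absolute path of name in the directory holding module_file."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), name)


# Hypothetical module location used only for illustration
path = sibling_path('/tmp/mycomponents/parsers/secure_shell.py', 'sshd_config')
print(path)
```

Inside a real module the first argument would be __file__, so the result follows the module wherever the project is checked out.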
To get the sshd_config file needed for the local sshd_config spec you can
copy it from ~/work/insights-core-tutorials/insights_examples/parsers/sshd_config
to the ~/work/insights-core-tutorials/mycomponents/parsers directory as shown below.
(env)[userone@hostone insights-core-tutorials]$ cp ./insights_examples/parsers/sshd_config ./mycomponents/parsers/
We can use a namedtuple to help simplify access to the information we
are storing in our parser by creating a namedtuple with the named attributes
keyword, value, and kw_lower, where kw_lower is the lowercase
version of the keyword.
    KeyValue = namedtuple('KeyValue', ['keyword', 'value', 'kw_lower'])
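For readers new to namedtuple, this short sketch shows how a KeyValue entry behaves once created. The sample values are taken from the test input; nothing here depends on insights.

```python
from collections import namedtuple

KeyValue = namedtuple('KeyValue', ['keyword', 'value', 'kw_lower'])

entry = KeyValue('ListenAddress', '10.110.0.1', 'listenaddress')

# Fields can be read by name or by position
print(entry.value)  # 10.110.0.1
print(entry[1])     # 10.110.0.1

# A namedtuple unpacks like an ordinary tuple
keyword, value, kw_lower = entry
print(kw_lower)     # listenaddress
```

Named access keeps rule code readable (line.keyword rather than line[0]) while the object stays as simple to serialize as a plain tuple.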
In this particular parser we have chosen to store all lines (self.lines)
as KeyValue named tuples since we don’t know what future rules might need.
We are also storing the set of lowercase keywords (self.keywords)
to make it easier to determine if a keyword is present in the data. The
values are left unparsed as we don’t know how a rule might need to evaluate them.
    def parse_content(self, content):
        self.lines = []
        for line in get_active_lines(content):
            kw, val = line.split(None, 1)
            self.lines.append(self.KeyValue(kw.strip(), val.strip(), kw.lower().strip()))
        self.keywords = set([k.kw_lower for k in self.lines])
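The call line.split(None, 1) splits on the first run of white space and leaves the rest of the line intact, so values that contain spaces stay whole. The tutorial parser unpacks the result into exactly two names, which would raise a ValueError for a line holding a keyword and nothing else. The helper below, split_keyword, is a hypothetical defensive variant written for this sketch and is not part of the tutorial's parser.

```python
def split_keyword(line):
    """Split a configuration line into (keyword, value).

    Hypothetical helper for this sketch: returns an empty value
    when the line holds a keyword and nothing else. It assumes the
    line is not blank, as is the case after blank lines are filtered.
    """
    parts = line.split(None, 1)
    keyword = parts[0]
    value = parts[1].strip() if len(parts) > 1 else ''
    return keyword, value


print(split_keyword("ListenAddress 10.110.0.1"))  # ('ListenAddress', '10.110.0.1')
print(split_keyword("AllowUsers alice bob"))      # ('AllowUsers', 'alice bob')
print(split_keyword("UsePAM"))                    # ('UsePAM', '')
```

Whether a parser should tolerate keyword-only lines or fail on them is a design choice; the tutorial keeps the simpler two-name unpacking.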
Finally we implement some “dunder” methods to simplify use of the class.
__contains__ enables the in operator for keyword checking.
__iter__ enables iteration over the contents of self.lines. And
__getitem__ enables access to all values of a keyword.
    def __contains__(self, keyword):
        return keyword.lower() in self.keywords

    def __iter__(self):
        for line in self.lines:
            yield line

    def __getitem__(self, keyword):
        kw = keyword.lower()
        if kw in self.keywords:
            return [kv.value for kv in self.lines if kv.kw_lower == kw]
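To see the three dunder methods at work without installing insights, here is a small stand-in. The class KeywordStore is hypothetical; it keeps plain (keyword, value) tuples instead of KeyValue entries, but mirrors the same lookup behavior, including returning None for an unknown keyword.

```python
class KeywordStore(object):
    """Hypothetical stand-in showing the three dunder methods in isolation."""

    def __init__(self, pairs):
        # pairs: list of (keyword, value) tuples, in file order
        self.pairs = list(pairs)
        self.keywords = set(k.lower() for k, _ in self.pairs)

    def __contains__(self, keyword):
        # Supports: 'Port' in store
        return keyword.lower() in self.keywords

    def __iter__(self):
        # Supports: for pair in store
        return iter(self.pairs)

    def __getitem__(self, keyword):
        # Supports: store['port']
        kw = keyword.lower()
        if kw in self.keywords:
            return [v for k, v in self.pairs if k.lower() == kw]
        return None


store = KeywordStore([('Port', '22'), ('Protocol', '1'), ('Port', '2222')])
print('PORT' in store)           # True
print(store['port'])             # ['22', '2222']
print(store['AddressFamily'])    # None
print([k for k, _ in store])     # ['Port', 'Protocol', 'Port']
```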
We now have a complete implementation of our parser. It could certainly perform further analysis of the data and provide more methods for access, but it is better to keep the parser simple in the beginning. Once it is in use by rules it will be easy to add functionality to the parser to allow simplification of the rules.
Parser Documentation
The last step to complete implementation of our parser is to create the documentation. The guidelines and examples for parser documentation are provided in the section Documentation Guidelines.
The following shows our completed parser including documentation.
"""
secure_shell - Files for configuration of `ssh`
===============================================

The ``secure_shell`` module provides parsing for the ``sshd_config``
file.  The ``SSHDConfig`` class implements the parsing and
provides a ``list`` of all configuration lines present in
the file.

Sample content from the ``/etc/ssh/sshd_config`` file is::

    # $OpenBSD: sshd_config,v 1.93 2014/01/10 05:59:19 djm Exp $

    Port 22
    #AddressFamily any
    ListenAddress 10.110.0.1
    Port 22
    ListenAddress 10.110.1.1
    #ListenAddress ::

    # The default requires explicit activation of protocol 1
    #Protocol 2
    Protocol 1

Examples:
    >>> 'Port' in sshd_config
    True
    >>> 'PORT' in sshd_config  # items are stored case-insensitive
    True
    >>> 'AddressFamily' in sshd_config  # comments are ignored
    False
    >>> sshd_config['port']  # All values stored by keyword in lists
    ['22', '22']
    >>> sshd_config['Protocol']  # Single items have one list element
    ['1']
    >>> [line for line in sshd_config if line.keyword == 'Port']  # can be used as an iterator
    [KeyValue(keyword='Port', value='22', kw_lower='port'), KeyValue(keyword='Port', value='22', kw_lower='port')]
    >>> sshd_config.last('ListenAddress')  # Easy way of finding the current configuration for a single item
    '10.110.1.1'
"""
from collections import namedtuple
from insights import Parser, parser, get_active_lines
from insights.specs import Specs
from insights.core.spec_factory import SpecSet, simple_file
import os


class LocalSpecs(SpecSet):
    """ Datasources for collection from local host """
    conf_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'sshd_config')

    sshd_config = simple_file(conf_file)


@parser(LocalSpecs.sshd_config)
class SSHDConfig(Parser):
    """Parsing for ``sshd_config`` file.

    Attributes:
        lines (list): List of `KeyValue` namedtuples for each line in
            the configuration file.
        keywords (set): Set of keywords present in the configuration
            file, each keyword has been converted to lowercase.
    """

    KeyValue = namedtuple('KeyValue', ['keyword', 'value', 'kw_lower'])
    """namedtuple: Represents a keyword/value pair, plus the lowercase keyword."""

    def parse_content(self, content):
        self.lines = []
        for line in get_active_lines(content):
            kw, val = (w.strip() for w in line.split(None, 1))
            self.lines.append(self.KeyValue(kw, val, kw.lower()))
        self.keywords = set([k.kw_lower for k in self.lines])

    def __contains__(self, keyword):
        return keyword.lower() in self.keywords

    def __iter__(self):
        for line in self.lines:
            yield line

    def __getitem__(self, keyword):
        kw = keyword.lower()
        if kw in self.keywords:
            return [kv.value for kv in self.lines if kv.kw_lower == kw]

    def last(self, keyword):
        """str: Returns the value of the last keyword found in config."""
        entries = self.__getitem__(keyword)
        if entries:
            return entries[-1]
Parser Testing
It is important that we ensure our tests will run successfully after any
change to our parser. We are able to do that in two ways: first by using
doctest to test the Examples section of the secure_shell module, and
second by writing tests that can be run automatically using pytest. We start
by adding import doctest and an import of the secure_shell module to our
original test code:
from mycomponents.parsers.secure_shell import SSHDConfig
# Import the module we wrote so that doctest can read its docstring
from mycomponents.parsers import secure_shell
from insights.tests import context_wrap
import doctest

SSHD_CONFIG_INPUT = """
# $OpenBSD: sshd_config,v 1.93 2014/01/10 05:59:19 djm Exp $

Port 22
#AddressFamily any
ListenAddress 10.110.0.1
Port 22
ListenAddress 10.110.1.1
#ListenAddress ::

# The default requires explicit activation of protocol 1
#Protocol 2
Protocol 1
"""


def test_sshd_config():
    sshd_config = SSHDConfig(context_wrap(SSHD_CONFIG_INPUT))
    assert sshd_config is not None
    assert 'Port' in sshd_config
    assert 'PORT' in sshd_config
    assert sshd_config['port'] == ['22', '22']
    assert 'ListenAddress' in sshd_config
    assert sshd_config['ListenAddress'] == ['10.110.0.1', '10.110.1.1']
    assert sshd_config['Protocol'] == ['1']
    assert 'AddressFamily' not in sshd_config
    ports = [l for l in sshd_config if l.keyword == 'Port']
    assert len(ports) == 2
    assert ports[0].value == '22'
To test the documentation, we can then use doctest:
def test_sshd_documentation():
    """
    Here we test the examples in the documentation automatically using
    doctest.  We set up an environment which is similar to what a
    rule writer might see - a 'sshd_config' variable that has been
    passed in as a parameter to the rule declaration.  This saves doing
    this setup in the example code.
    """
    env = {
        'sshd_config': SSHDConfig(context_wrap(SSHD_CONFIG_INPUT)),
    }
    failed, total = doctest.testmod(secure_shell, globs=env)
    assert failed == 0
The environment setup allows us to ‘hide’ the set-up of the environment that is normally provided to the rule, which is the context in which the example code is written. There’s no easy way to show the declaration of the rule, nor the parameter that is created with the parser object, but it’s good practice to supply an obvious name that rule writers might then use in their code.
The assert line here makes sure that any failures in the examples are
detected by pytest. This will also include the testing output from doctest,
showing where the code failed to evaluate or where the output differed from
what was given.
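The idea of supplying names to doctest examples through a globals dictionary can be tried without any parser at all. The sketch below runs a hypothetical docstring (EXAMPLES) against a plain dict named settings. It uses doctest.DocTestParser and doctest.DocTestRunner instead of doctest.testmod so that the example needs no separate module file; the failure count it checks plays the same role as failed in the test above.

```python
import doctest

# Hypothetical docstring standing in for a module's Examples section
EXAMPLES = """
>>> 'port' in settings
True
>>> settings['port']
['22', '22']
"""

# Names the examples may use without defining them, like a rule's arguments
env = {'settings': {'port': ['22', '22']}}

test = doctest.DocTestParser().get_doctest(EXAMPLES, env, 'examples', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed)     # 0
print(results.attempted)  # 2
```

If an example's output differed from the docstring, the runner would print a report and results.failed would be non-zero, which is what the assert in the pytest function catches.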
Because this code essentially duplicates many of the things previously
tested explicitly in the test_sshd_config function, we can remove some
of those tests and only test the ‘corner cases’:
SSHD_DOCS_EXAMPLE = '''
Port 22
Port 22
'''


def test_sshd_corner_cases():
    """
    Here we test any corner cases for behavior we expect to deal with
    in the parser but doesn't make a good example.
    """
    config = SSHDConfig(context_wrap(SSHD_DOCS_EXAMPLE))
    assert config.last('AddressFamily') is None
    assert config['AddressFamily'] is None
    ports = [l for l in config if l.keyword == 'Port']
    assert len(ports) == 2
    assert ports[0].value == '22'
The final version of our test now looks like this:
from mycomponents.parsers.secure_shell import SSHDConfig
# Import the module we wrote so that doctest can read its docstring
from mycomponents.parsers import secure_shell
from insights.tests import context_wrap
import doctest

SSHD_CONFIG_INPUT = """
# $OpenBSD: sshd_config,v 1.93 2014/01/10 05:59:19 djm Exp $

Port 22
#AddressFamily any
ListenAddress 10.110.0.1
Port 22
ListenAddress 10.110.1.1
#ListenAddress ::

# The default requires explicit activation of protocol 1
#Protocol 2
Protocol 1
"""


def test_sshd_config():
    sshd_config = SSHDConfig(context_wrap(SSHD_CONFIG_INPUT))
    assert sshd_config is not None
    assert 'Port' in sshd_config
    assert 'PORT' in sshd_config
    assert sshd_config['port'] == ['22', '22']
    assert 'ListenAddress' in sshd_config
    assert sshd_config['ListenAddress'] == ['10.110.0.1', '10.110.1.1']
    assert sshd_config['Protocol'] == ['1']
    assert 'AddressFamily' not in sshd_config
    ports = [l for l in sshd_config if l.keyword == 'Port']
    assert len(ports) == 2
    assert ports[0].value == '22'


def test_sshd_documentation():
    """
    Here we test the examples in the documentation automatically using
    doctest.  We set up an environment which is similar to what a
    rule writer might see - a 'sshd_config' variable that has been
    passed in as a parameter to the rule declaration.  This saves doing
    this setup in the example code.
    """
    env = {
        'sshd_config': SSHDConfig(context_wrap(SSHD_CONFIG_INPUT)),
    }
    failed, total = doctest.testmod(secure_shell, globs=env)
    assert failed == 0


SSHD_DOCS_EXAMPLE = '''
Port 22
Port 22
'''


def test_sshd_corner_cases():
    """
    Here we test any corner cases for behavior we expect to deal with
    in the parser but doesn't make a good example.
    """
    config = SSHDConfig(context_wrap(SSHD_DOCS_EXAMPLE))
    assert config.last('AddressFamily') is None
    assert config['AddressFamily'] is None
    ports = [l for l in config if l.keyword == 'Port']
    assert len(ports) == 2
    assert ports[0].value == '22'
To run pytest on just the completed secure_shell parser, execute the following command:
(env)[userone@hostone ~]$ cd ~/work/insights-core-tutorials
(env)[userone@hostone insights-core-tutorials]$ pytest -k secure_shell
Once your tests all run successfully your parser is complete.