Running OESS

Accessing the Admin Section

Upon successful login to the OESS UI, a user with Administrator privileges (a member of a workgroup that is marked as administrator) will see a button called "Admin" at the top right of the page.  The Admin button is available from all pages inside the OESS application and is the gateway to making system-level modifications.

Adding a new user

Creating a new user in OESS requires access to the Admin section.  A user can log in via multiple usernames (allowing for a shared account, for example); however, the username must match the REMOTE_USER environment variable passed through from Apache.  For example, if the username contains a domain name, then the user in OESS must also contain that username.

Usernames are comma (',') separated.  The email address is where circuit notifications are sent for all circuit events (create, remove, edit, failover, down, restoration).
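The username matching described above can be sketched as follows. This is only an illustration, not OESS code: the function names and the trimming of whitespace around each name are assumptions, and only the comma-separated format and the exact match against REMOTE_USER come from the text.

```python
def parse_usernames(field):
    """Split a comma-separated username field into a list of names."""
    # Trimming whitespace and dropping empty entries is an assumption.
    return [name.strip() for name in field.split(",") if name.strip()]


def user_matches(remote_user, field):
    """Return True if remote_user exactly matches one of the usernames."""
    return remote_user in parse_usernames(field)


# Hypothetical user with a plain and a domain-qualified username.
usernames = "jdoe,jdoe@example.edu"
print(user_matches("jdoe@example.edu", usernames))
print(user_matches("jdoe@other.org", usernames))
```

The match is deliberately exact: a REMOTE_USER value that carries a domain only matches if the same domain-qualified name is listed for the user.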

Adding a new workgroup

To create a new workgroup in OESS a user must have access to the admin section.  In the admin section there is a workgroup tab.  When looking at the tab there is a list of existing workgroups, and a new workgroup button.  Click the new workgroup button to create a new workgroup.

Creating a new workgroup requires a workgroup name.  The external ID allows for integration with other applications that may be assisting with managing the OESS instance (for instance, a billing application).

There are 3 workgroup types to choose from:

Normal - A normal workgroup that will only have permissions to access the ports specified

Admin - Can see and edit any circuit on the network

Demo - Cannot provision on the network at all

Replacing Node / DPID Change

I've had to do this in testing (because the Juniper backup REs have different DPIDs).

The change is fairly simple, depending on what has already happened.  If you know the DPID before the node connects, convert the DPID into its integer form.  Then update the node_instantiation table in OESS so that the dpid of the end_epoch = -1 record for that node_id is set to the new dpid (integer form).
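The conversion to integer form can be sketched as below. This assumes the DPID is available as a hexadecimal string, either colon-separated or plain; the function name and the example value are made up for illustration.

```python
def dpid_to_int(dpid):
    """Convert a hexadecimal DPID string to its integer form."""
    # Accept colon-separated or plain hex, with an optional 0x prefix.
    cleaned = dpid.strip().lower().replace(":", "")
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    return int(cleaned, 16)


# Hypothetical DPID; 0x010a is 266 in decimal.
print(dpid_to_int("00:00:00:00:00:00:01:0a"))
```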

If the node has already joined, the process is slightly more complex.  You can copy the dpid value from the new unapproved node; however, you will need to delete the interface, link, and node records associated with that node.

delete from link_instantiation where interface_a_id in (select interface_id from interface where node_id = X)

delete from link_instantiation where interface_z_id in (select interface_id from interface where node_id = X)

delete from interface_instantiation where interface_id in (select interface_id from interface where node_id = X)

delete from interface where node_id = X

delete from node_instantiation where node_id = X

delete from node where node_id = X

Finally, for both cases, this is the query to run to update the dpid:

update node_instantiation set dpid = Y where node_id = X and end_epoch = -1
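As a sketch, the statements above can be kept in order and parameterized rather than edited by hand. The helper names are hypothetical, and the `%s` placeholders assume a MySQL DB-API driver (for example PyMySQL) whose cursor accepts that style; nothing here connects to a database.

```python
def cleanup_statements(node_id):
    """Return the delete statements from the text, in order, with parameters."""
    sub = "(select interface_id from interface where node_id = %s)"
    return [
        ("delete from link_instantiation where interface_a_id in " + sub, (node_id,)),
        ("delete from link_instantiation where interface_z_id in " + sub, (node_id,)),
        ("delete from interface_instantiation where interface_id in " + sub, (node_id,)),
        ("delete from interface where node_id = %s", (node_id,)),
        ("delete from node_instantiation where node_id = %s", (node_id,)),
        ("delete from node where node_id = %s", (node_id,)),
    ]


def update_dpid_statement(node_id, dpid):
    """Return the final update statement with its parameters."""
    return (
        "update node_instantiation set dpid = %s "
        "where node_id = %s and end_epoch = -1",
        (dpid, node_id),
    )


# Print the statements for a hypothetical node_id; with a real cursor each
# pair would be passed as cursor.execute(sql, params) inside one transaction.
for sql, params in cleanup_statements(42):
    print(sql, params)
print(*update_dpid_statement(7, 266))
```

Keeping the deletes in this order matters because the instantiation rows are located through the interface table, which is removed only afterwards.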

Adding a new Switch

When a new switch starts communicating with OESS, it waits in a pending approval state.  The pending approval state allows an administrator to configure the profile of the device and verify that the device is supposed to be part of the network.  This prevents unintended devices from becoming a part of the OESS network.

OpenFlow

The first step in adding a switch to OESS is to configure the switch so that its controller points to the OESS controller IP address.  Once the connection is established between the switch and the controller, the device will appear in the Admin->Discovery section.  Clicking the device will open the Device Details panel and allow you to confirm the device.

Confirming the device requires a few parameters:

The Name of the device specifies the name as it will appear in the OESS UI; in most cases this should be the fully qualified domain name.

The Latitude and Longitude fields are used to specify the location of the device.

The VLAN Range field contains the allowed VLANs used on all trunk interfaces on that node.  This is specifically used when slicing the network or when using hybrid mode.  This range does not affect end interfaces.

Default Forward LLDP to controller controls whether or not to send the forward LLDP rule to the switch.  Without this rule your switch will not auto discover properly.  Only change this setting if you know what you are doing.

Default Drop Rule installs a rule that prevents all flows that do not currently match from being punted to the controller (the default behavior per the OpenFlow 1.0 spec).

Maximum Number of Flow Mods specifies the maximum number of flows that can be installed on a switch.  FWDCTL will prevent any flows over this limit from being installed.  (This is useful when slicing.)

Send Bulk Flow Rules specifies whether the switch is capable of rate limiting internally or whether a Barrier message must be sent between each flow rule.

Once all of these have been selected properly click the "Confirm Device" button at the bottom of the popup to approve the device.
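Before confirming a device it can help to sanity-check the values you plan to enter. The sketch below is not part of OESS: the `DeviceProfile` class, its field names, and the 1-4095 VLAN bounds (taken from the range example used elsewhere in this document) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeviceProfile:
    name: str
    latitude: float
    longitude: float
    vlan_range: str   # for example "1-4095" or "1-19,31-4095"
    max_flows: int


def parse_vlan_range(text):
    """Expand a range string such as "1-19,31-4095" into a set of VLAN IDs."""
    vlans = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(v) for v in part.split("-", 1))
        else:
            low = high = int(part)
        if not 1 <= low <= high <= 4095:
            raise ValueError("invalid VLAN range: " + part)
        vlans.update(range(low, high + 1))
    return vlans


def validate(profile):
    """Return a list of problems found in the profile (empty if none)."""
    errors = []
    if not profile.name:
        errors.append("name is required")
    if not -90 <= profile.latitude <= 90:
        errors.append("latitude out of range")
    if not -180 <= profile.longitude <= 180:
        errors.append("longitude out of range")
    try:
        parse_vlan_range(profile.vlan_range)
    except ValueError as exc:
        errors.append(str(exc))
    if profile.max_flows <= 0:
        errors.append("max_flows must be positive")
    return errors


# Hypothetical device values.
print(validate(DeviceProfile("sw1.example.net", 39.17, -86.52, "1-4095", 4000)))
```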

NETCONF

Devices that speak NETCONF must be added manually using the Add MPLS switch button.

Name takes the hostname of the device that will be displayed to the user.

Short name is the name of the device as it appears in show isis adjacencies.

The Latitude and Longitude fields are used to specify the location of the device. 

Vendor, Model, and Software Version define which NETCONF module is used by OESS to connect to the device.

After entering the above parameters, click Add MPLS Device.

Dual Stack

It's possible to run both OpenFlow and NETCONF at the same time. Under the Network tab, select your desired OpenFlow device. Under the Update Device dialog box, enable the MPLS Enabled check box. Provide the MPLS related field values as described under the NETCONF section, and then click Update Device.

Switch Diffs

Diffing is the process of comparing OESS's expected state of the network against the actual network. This happens at a regular interval for both OpenFlow and NETCONF enabled devices.

OpenFlow

All diffing, as it relates to OpenFlow, is handled by OESS. There is no way for the user to influence this behavior.

NETCONF

Diffing for NETCONF enabled devices is similar to diffing for OpenFlow enabled devices, except that in some cases a user must influence the diffing behavior. This occurs when the size of a diff exceeds a preconfigured threshold. In these cases an administrator must navigate to the Config Changes section of the admin interface and manually approve the diff for the nodes marked Pending Approval.
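The threshold rule can be pictured with the sketch below. How OESS measures the size of a diff is not stated here, so the unit, the function name, and the node names are assumptions; the only behavior taken from the text is that a diff exceeding the threshold waits for approval.

```python
def classify_diffs(diff_sizes, threshold):
    """Split nodes into those applied automatically and those pending approval.

    diff_sizes maps a node name to the size of its diff (unit is assumed).
    """
    applied, pending = [], []
    for node, size in sorted(diff_sizes.items()):
        # Only diffs that exceed the threshold need manual approval.
        (pending if size > threshold else applied).append(node)
    return applied, pending


# Hypothetical sizes and threshold.
applied, pending = classify_diffs({"mx1": 3, "mx2": 250}, threshold=100)
print("applied:", applied)
print("pending approval:", pending)
```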

Insert a node in the middle

  1. Approve the new node in the OESS UI, and verify the proper parameters are set (Maximum Flows, Message Delay, VLAN range)
  2. Break your existing link and insert the new node
  3. Verify in the Admin interface that 2 new links were discovered
  4. In the Admin section click the Network tab and click the circuit to be modified
  5. Click the decom link button
  6. OESS will prompt with a message "can not decom, but detected a node in the middle, would you like to migrate"; click yes
  7. Maintenance complete!
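The topology condition behind step 6 can be illustrated as follows: the old link between two nodes has been replaced by two new links that share exactly one intermediate node. This is only a sketch of that condition with hypothetical names; it does not describe how OESS itself performs the detection.

```python
def find_inserted_node(old_link, new_links):
    """Return the node inserted into old_link, or None if there is none.

    Links are (node_a, node_z) tuples; direction is ignored.
    """
    a, z = old_link
    neighbors_a, neighbors_z = set(), set()
    for link in new_links:
        if a in link:
            neighbors_a.update(set(link) - {a})
        if z in link:
            neighbors_z.update(set(link) - {z})
    # The inserted node is adjacent to both ends of the old link.
    candidates = (neighbors_a & neighbors_z) - {a, z}
    return candidates.pop() if len(candidates) == 1 else None


print(find_inserted_node(("A", "B"), [("A", "C"), ("C", "B")]))
```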

Automatic Backbone Move

OESS can detect intraswitch link moves.  If a circuit is going from switch A port 1 to switch B port 10, and a technician moves the link to switch A port 2, OESS will detect the move and automatically shift all traffic going over the link to the new port.

No approval or other administrative task is required for this to happen automatically.

Managing Workgroups

To modify a workgroup's permissions, go to the admin section of the OESS UI.  Click on the Workgroups tab on the left.  There will be a table in the center with the names of all of the workgroups.  Find the workgroup you wish to modify, and then click it.

In the new window that opens, 2 separate lists appear.  The left list contains all of the users currently part of the workgroup.  The right list contains all of the edge interfaces the workgroup is currently allowed to provision on.

Adding Users

Underneath each of the tables is an add button. The add users button will provide a list of all users currently configured in OESS.  Find the user to add to the workgroup (if the user does not exist, see the Adding a new user section). Clicking the user in the table adds the user to the Users list.

Adding Interfaces

The add interface button opens up a map of the Network.  Clicking a node on the map will show a list of all the interfaces on the device.  Clicking an interface in the list will add that interface to the workgroup.

When running a node in Dual Stack mode (OpenFlow and NETCONF), you will find two of each interface. One interface was discovered using OpenFlow, and the second using NETCONF. Be sure to include both interfaces if you wish to enable provisioning of OpenFlow and NETCONF circuits.

Removing Users

To remove a user from a workgroup click the remove button next to their name in the user table.

Removing Interfaces

To remove an interface from a workgroup click the remove button next to the interface in the Owned Interfaces table.

Editing Workgroup

At the top of the workgroup page is the Edit Workgroup Details button. Clicking this button displays a dialog that allows you to edit the Name, External ID, Node MAC Address Limit, Circuit Limit, and Circuit Endpoint Limit of a workgroup.

Link Weights / Metrics

By default, a circuit's shortest path is determined by the hop count between the A and Z endpoints. However, this behaviour can be altered by adding a weight to the intermediate links via the "metric" field. If a weight is added, the path with the lowest aggregate weight will be the shortest path. To modify the metric of a link, click the Network tab in the Admin section and click the link to be modified. Enter the desired value in the metric field and then click the "Update Link" button.

If there are multiple links between two nodes clicking the line representing the links will cause the Select Link panel to appear. Choose a link and click the Select button to open the link details panel for the link.
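The effect of link metrics on path selection described in this section can be sketched with a standard Dijkstra search. This is an illustration only: the source does not say which algorithm OESS uses, and modelling the default hop count as a metric of 1 per link is an assumption.

```python
import heapq


def shortest_path(links, source, target):
    """Return (total_metric, path) for the lowest aggregate metric, or None.

    links is a list of (node_a, node_z, metric) tuples, treated as bidirectional.
    """
    graph = {}
    for a, z, metric in links:
        graph.setdefault(a, []).append((z, metric))
        graph.setdefault(z, []).append((a, metric))
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, metric in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + metric, neighbor, path + [neighbor]))
    return None


# With every metric at 1 the direct link wins on hop count.
hop_count = [("A", "Z", 1), ("A", "B", 1), ("B", "Z", 1)]
print(shortest_path(hop_count, "A", "Z"))

# Raising the metric on the direct link moves the path through B.
weighted = [("A", "Z", 5), ("A", "B", 1), ("B", "Z", 1)]
print(shortest_path(weighted, "A", "Z"))
```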

Managing Interface ACL Rules

To manage interface ACL rules from the admin section, click on the Network tab. Click on the node in the map that contains the interface whose ACL rules you wish to modify.

A dialog box that contains the node's information will appear. At the bottom of the dialog is a table of all the interfaces contained within the node. Click the "View ACLs" button in the last column of the table to open a dialog that contains the interface's ACL information.

 

From here you can follow the Using the Frontend-ACL documentation for information on how to add, edit, remove, and reorder ACL rules.

Working with Hybrid Mode Switches

Working with hybrid mode switches may mean that you need to restrict VLAN ranges or change the VLAN where discovery happens.  Most restrictions exist in the Admin Section -> Networking tab.  Click on a node to set its restrictions.  For example, if you have protected VLANs 20-30 for non-OpenFlow use, you will want to change the VLAN range on the nodes to 1-19,31-4095.  If a switch does not support untagged traffic when in hybrid mode, the discovery VLAN can be set by editing the /etc/oess/database.xml file and adding <discovery_vlan>XXX</discovery_vlan> to the configuration.  Once this is done, restart OESS for the change to take effect.
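Working out the VLAN range string by hand is error-prone when several protected blocks exist. The helper below is a hypothetical sketch (it is not part of OESS) that removes protected ranges from 1-4095 and prints the string to enter as the node's VLAN range.

```python
def allowed_vlan_range(protected, low=1, high=4095):
    """Return a range string covering low..high minus the protected ranges.

    protected is a list of (start, end) tuples, both ends inclusive.
    """
    blocked = set()
    for start, end in protected:
        blocked.update(range(start, end + 1))
    parts = []
    run_start = None
    # Walk one past the end so the final run is always closed.
    for vlan in range(low, high + 2):
        usable = vlan <= high and vlan not in blocked
        if usable and run_start is None:
            run_start = vlan
        elif not usable and run_start is not None:
            run_end = vlan - 1
            parts.append(str(run_start) if run_start == run_end
                         else f"{run_start}-{run_end}")
            run_start = None
    return ",".join(parts)


# Protecting VLANs 20-30 reproduces the example from the text.
print(allowed_vlan_range([(20, 30)]))
```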

Running OESS In A Highly-Available Environment

Using standard Linux HA tools such as Pacemaker, Corosync, and DRBD, OESS can operate in an active/passive failover configuration, allowing for quick automatic failover in the event of a problem on the primary server. Note that the specifics of these technologies are beyond the scope of this document; the place to start if you are looking to operate an HA OESS instance is the documentation of the relevant technologies.

This document specifically targets Pacemaker 1.0, Corosync 1.2.7, and DRBD 8.4.1 running on Red Hat Enterprise Linux 6. As RPMs for these packages are not typically provided by Red Hat, you will need to be able to build your own binary RPMs from source.

You will need at a minimum two systems, each with an available disk partition to use for DRBD, and a third IP address to be assigned by Pacemaker to the active host.

Pacemaker and Corosync - Used for cluster management:

http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/index.html

DRBD: Real-time filesystem replication:

http://www.drbd.org/users-guide-8.4/

Instructions:

  1. Install DRBD per the DRBD documentation on both hosts.
    1. Use your available partition as the DRBD volume.
    2. Sync and choose one host to be primary.
    3. Once you have verified DRBD is working, unmount the filesystem, stop the service, and remove it from startup (Pacemaker will manage this instead).
  2. Install MySQL on both hosts.
    1. MySQL should be configured with multimaster replication, communicating over SSL. Please see the MySQL documentation as needed.
  3. Install Apache on both hosts; no special configuration is needed.
  4. Install Pacemaker/Corosync per the Cluster Labs documentation. Create a cluster with your hosts and verify they all can join.
  5. Stop MessageBus on each host and remove it from startup. It will need to be managed by Pacemaker, as OESS requires it.
  6. Install OESS on both hosts.
    1. Be sure to run oess_setup.pl only on the primary, and copy the database.xml to the secondary.
    2. Ensure the DB users get created manually on the secondary.
    3. The /SNMP directory created by the setup should be changed to be a symlink into the DRBD volume on both hosts.
  7. Configure the cluster for the non-OESS services.
    1. For a three-node cluster with a primary, backup, and quorum node for breaking ties, this will look like the following:
            node srv1.domain.com
            node srv2.domain.com
            node srv3.domain.com
            primitive ClusterIP ocf:heartbeat:IPaddr2 \
               params ip="SHARED_IP_HERE" cidr_netmask="32" \
               op monitor interval="10s" \
               meta target-role="Started"
            primitive UsageData ocf:linbit:drbd \
               params drbd_resource="usagedata" \
               op monitor interval="10s"
            primitive messagebus lsb:messagebus \
               op monitor interval="10s"
            primitive mysqld lsb:mysqld \
               op monitor interval="10s"
            primitive usageFS ocf:heartbeat:Filesystem \
             params device="/dev/drbd/by-res/usagedata"        directory="/drbd" fstype="ext4" \
             meta target-role="Started"
            primitive website ocf:heartbeat:apache \
               params configfile="/etc/httpd/conf/httpd.conf" \
               op monitor interval="60s"
            ms UsageDataClone UsageData \
               meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
            clone messagebusClone messagebus \
               meta target-role="Started"
            clone mysqldClone mysqld \
               meta clone-max="2" target-role="Started" is-managed="true"
            clone websiteClone website \
               meta clone-max="2" target-role="Started"
            location ip_prefer_srv1 ClusterIP 100: srv1.domain.com
            location ip_prefer_srv2 ClusterIP 100: srv2.domain.com
            location messagebus_prefer_srv1 messagebusClone 100: srv1.domain.com
            location messagebus_prefer_srv2 messagebusClone 100: srv2.domain.com
            location messagebus_prefer_srv3 messagebusClone 100: srv3.domain.com
            location mysqld_prefer_srv1 mysqldClone 100: srv1.domain.com
            location mysqld_prefer_srv2 mysqldClone 100: srv2.domain.com
            location usagedataclone_prefer_srv1 UsageDataClone 100: srv1.domain.com
            location usagedataclone_prefer_srv2 UsageDataClone 100: srv2.domain.com
            location usagefs_prefer_srv1 usageFS 100: srv1.domain.com
            location usagefs_prefer_srv2 usageFS 100: srv2.domain.com
            location website_prefer_srv1 websiteClone 100: srv1.domain.com
            location website_prefer_srv2 websiteClone 100: srv2.domain.com
            colocation fs_on_drbd inf: usageFS UsageDataClone:Master
            colocation ip_on_fs inf: ClusterIP usageFS
            order drbd_before_fs inf: UsageDataClone:promote usageFS:start
            order ip_before_website inf: ClusterIP websiteClone
            order mysqld_before_website inf: mysqldClone websiteClone
            property $id="cib-bootstrap-options" \
                  dc-version="1.0.11-XXXXX" \
                  cluster-infrastructure="openais" \
                  expected-quorum-votes="3" \
                  stonith-enabled="false" \
                  symmetric-cluster="false" \
                  last-lrm-refresh="XXXXXXXXX"
  1. Verify the cluster is operating and running these services correctly. Per the above config, MessageBus should be running on all three nodes, MySQL and Apache on the primary and secondary, and the DRBD volume/filesystem and shared IP on the primary only.
  2. Finally add the OESS services to the cluster.
    1. For the cluster described above, add the following:
      primitive fwdctl lsb:oess-fwdctl \
            op monitor interval="10s" \
            meta target-role="Started" is-managed="true"
      primitive notify lsb:oess-notification \
            meta target-role="Started" \
            op monitor interval="10s"
      primitive nox-controller lsb:nox_cored \
            op monitor interval="10s" \
            meta target-role="Started" is-managed="true"
      primitive topo lsb:oess-topo \
            op monitor interval="10s" \
            meta target-role="Started" is-managed="true"
      primitive vlan-stats lsb:oess-vlan_stats \
            op monitor interval="20s" \
            meta is-managed="true" target-role="Started"
      location fwctl_prefer_srv1 fwdctl 100: srv1.domain.com
      location fwctl_prefer_srv2 fwdctl 100: srv2.domain.com
      location notify_prefer_srv1 notify 100: srv1.domain.com
      location notify_prefer_srv2 notify 100: srv2.domain.com
      location nox_prefer_srv1 nox-controller 100: srv1.domain.com
      location nox_prefer_srv2 nox-controller 100: srv2.domain.com
      location topo_prefer_srv1 topo 100: srv1.domain.com
      location topo_prefer_srv2 topo 100: srv2.domain.com
      location vlan_stats_prefer_srv1 vlan-stats 100: srv1.domain.com
      location vlan_stats_prefer_srv2 vlan-stats 100: srv2.domain.com
      colocation fwdctl_on_ip inf: fwdctl ClusterIP
      colocation ip_on_fs inf: ClusterIP usageFS
      colocation notify_on_ip inf: notify ClusterIP
      colocation nox_on_ip inf: nox-controller ClusterIP
      colocation topo_on_ip inf: topo ClusterIP
      colocation vlan-stats_on_ip inf: vlan-stats ClusterIP
      order fs_before_vlan-stats inf: usageFS vlan-stats
      order fwdctl_before_notify inf: fwdctl notify
      order fwdctl_before_vlan-stats inf: fwdctl vlan-stats
      order ip_before_topo inf: ClusterIP topo
      order ip_before_website inf: ClusterIP websiteClone
      order messagebus_before_fwdctl inf: messagebusClone fwdctl
      order messagebus_before_notify inf: messagebusClone notify
      order messagebus_before_nox inf: messagebusClone nox-controller
      order messagebus_before_topo inf: messagebusClone topo
      order messagebus_before_vlan-stats inf: messagebusClone vlan-stats
      order mysqld_before_fwdctl inf: mysqldClone fwdctl
      order mysqld_before_notify inf: mysqldClone notify
      order mysqld_before_topo inf: mysqldClone topo
      order mysqld_before_vlan-stats inf: mysqldClone vlan-stats
      order mysqld_before_website inf: mysqldClone websiteClone
      order nox_before_fwdctl inf: nox-controller fwdctl
      order topo_before_nox inf: topo nox-controller
  1. Commit and verify the services start.
  2. Your output of “crm resource status” should look something close to this:
    ClusterIP   (ocf::heartbeat:IPaddr2) Started
    fwdctl      (lsb:oess-fwdctl) Started
    nox-controller    (lsb:nox_cored) Started
    topo  (lsb:oess-topo) Started
    usageFS     (ocf::heartbeat:Filesystem) Started
    vlan-stats  (lsb:oess-vlan_stats) Started
    Master/Slave Set: UsageDataClone
                Masters: [ srv1.domain.com ]
                Slaves: [ srv2.domain.com ]
    Clone Set: messagebusClone
                Started: [ srv1.domain.com srv2.domain.com srv3.domain.com ]
    Clone Set: mysqldClone
                Started: [ srv1.domain.com srv2.domain.com ]
    Clone Set: websiteClone
                Started: [ srv1.domain.com srv2.domain.com ]
    notify      (lsb:oess-notification) Started
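A quick way to compare your own output against the listing above is to pull out the primitive resources and flag any that are not Started. The sketch below only parses text in the layout shown above; the function names are hypothetical, and the "Stopped" state in the sample is an assumed value for a resource that is not running.

```python
import re

# Matches lines such as: fwdctl      (lsb:oess-fwdctl) Started
PRIMITIVE = re.compile(r"^\s*(\S+)\s+\(([^)]+)\)\s+(\S+)")


def primitive_states(status_text):
    """Map each primitive resource name to its reported state."""
    states = {}
    for line in status_text.splitlines():
        match = PRIMITIVE.match(line)
        if match:
            name, _agent, state = match.groups()
            states[name] = state
    return states


def not_started(status_text):
    """Return the sorted names of primitives whose state is not Started."""
    return sorted(name for name, state in primitive_states(status_text).items()
                  if state != "Started")


sample = """\
ClusterIP   (ocf::heartbeat:IPaddr2) Started
fwdctl      (lsb:oess-fwdctl) Started
topo  (lsb:oess-topo) Stopped
Master/Slave Set: UsageDataClone
            Masters: [ srv1.domain.com ]
"""
print(not_started(sample))
```

Clone and master/slave sets are skipped on purpose, since their lines do not follow the "name (agent) state" layout.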

 

The diagram below is a helpful visualization of the OESS dependencies and start order as managed by the cluster.

 

Circuit Loops

In OESS 1.1.8+ the Circuit Loop feature allows you to loop all traffic that is received on a node back to the source.  This disrupts traffic forwarding on that circuit.

Enabling this is fairly simple: select the circuit you want to loop and then click the Loop Circuit button.  This will take you to a page that warns that you will disrupt traffic forwarding for this circuit if you continue.  You must then select a node in the path at which to loop all traffic.

Once you select a node and confirm that you wish to do this, OESS will install flows that send all traffic received on that node for that circuit back to the source of the traffic.  This may be useful to test a link in a path.

When you loop a circuit you will see a purple circuit indicating the node that was looped, and the circuit status will be looped.

Link/Node Maintenances

Link and Node maintenances perform a "soft" down of a link or, in the case of Nodes, of all links attached to the node.  This proactively causes circuits to "fail over" to their backup path if one is configured.  It then prevents the circuits from restoring to primary and suppresses notifications for flapping links/nodes.  Putting a node or link into maintenance mode does not disrupt forwarding for circuits that have no backup path or cannot be moved to an alternate path.  There is a momentary disruption while circuits change paths.

Upon completion of the maintenance, the engineer clicks the "Complete Maintenance" button and OESS will signal link up events, and restore circuits to the primary path.
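The failover behavior described above can be pictured with the sketch below. The circuit dictionaries and the function name are hypothetical and do not reflect OESS internals; the only rules taken from the text are that a circuit moves when a usable backup path exists and is left alone otherwise.

```python
def plan_maintenance(circuits, link):
    """Return (moved, unchanged) circuit names for a link entering maintenance.

    Each circuit is a dict with "name", "primary" (list of links) and an
    optional "backup" (list of links).
    """
    moved, unchanged = [], []
    for circuit in circuits:
        uses_link = link in circuit["primary"]
        backup = circuit.get("backup")
        # A circuit only moves if its backup path avoids the link.
        if uses_link and backup and link not in backup:
            moved.append(circuit["name"])
        else:
            unchanged.append(circuit["name"])
    return moved, unchanged


# Hypothetical circuits and link names.
circuits = [
    {"name": "c1", "primary": ["A-B"], "backup": ["A-C", "C-B"]},
    {"name": "c2", "primary": ["A-B"], "backup": None},
    {"name": "c3", "primary": ["C-D"], "backup": ["C-E", "E-D"]},
]
print(plan_maintenance(circuits, "A-B"))
```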

To put a Link or Node into maintenance mode, go to the Admin section of OESS and click the Network tab.  Click on the link or node you want to put into maintenance mode.  At the bottom of the popup is a button that says "put device/link into maintenance".

You can see what devices / links are in maintenance mode by going to the Maintenance tab on the admin section.  This is where you can see what maintenances are currently happening, and complete them if they are ready to be completed.