Assessing and determining “scope”
The goal scope of your program should encompass all networks and systems across all geographic locations that are owned by your organization. You will find most individuals scoff at this goal, which is fair. It is a very daunting task to achieve and I would argue a majority of organizations never get near 99% coverage.
This can seem daunting at first, but it is necessary to properly ascertain the true risks to the organization. Again, you can de-scope and work outwards after determining what is most important to your organization. Don’t start with systems that don’t matter; start with the systems that hold your most critical data and keep your most critical processes in place. If you already have data classification in place, that is a great barometer to show whether your priorities align with what the business believes to be the most critical data.
This will vary greatly by organization, and there is never a one-size-fits-all solution. Without knowledge of the business, you will have a very tough time achieving the full value of the program. This is why partnerships with the different business units are key.
Here are some examples of items that could be included in your scope to give you some ideas. This is not an exhaustive list. Try to keep it at a high level and let the procedures and processes address specific subcomponents. For example, if you have 30 web applications, you may not want to list them all individually here.
- Corporate Networks
- Production
- Development
- Test
- Physical locations
- Headquarters
- Satellite Offices
- Data centers
- Co-location
- Cloud Instances
- Azure
- AWS
- Linode
- Telecommunications
- VOIP
- PBX
- Internal and customer facing apps
- WebApp
- MobileApp
- Third party controlled environments
- Service providers
- Cloud hosting
- Third party developers
- Staff augmentation
Explaining how to approach the program
It is important to lay out expectations for how the program document should be used.
Define a few things, such as whether it needs to be reviewed at certain intervals, the goal of the document, and why it is important.
You want to make sure to lay out the end goal, which in terms of security, is to bring balance to confidentiality, integrity and availability while increasing IT infrastructure resiliency. The objects being protected are the organization’s data and systems used to collect, process, and maintain that data.
A common language
Words are important and carry meaning. Establishing a baseline for core concepts will help reduce confusion and miscommunication. After explaining the approach, take time to define terms such as “vulnerability” and “managing vulnerabilities”. This could be as simple as pulling definitions from reference material or as deep as distinguishing technical from non-technical vulnerabilities. It is also important to lay out when vulnerabilities should be managed and who has the authority to manage them. Be careful, as this document will be handed to business units: provide instructions they can understand and avoid technical jargon where possible.
Lay out the options to address “risk”
As part of your common language, it is imperative to define and spell out which options are valid once risk is discovered. Traditionally these have been reduction/mitigation, avoidance, transfer, or acceptance. The definitions should include what each option means, with examples, so the audience can relate to the individual options for management. Setting expectations here will also make the process easier. One example is labeling “acceptance” as the option of last resort and stating how accepted risks are handled, such as annual reviews, addition to a risk register, and reporting to the board if necessary.
Establishing a methodology
Don’t reinvent the wheel. There are teams of people much smarter than us who have created frameworks and methodologies to provide structure. Which ones fit your organization can vary greatly by geography or industry.
Common frameworks used in the United States are NIST 800-115 and CIS. These frameworks can be used as a roadmap for how to address vulnerability management within the organization. They will also help you baseline your organization’s posture against others in the same industry. This is a very common way to size up against your peers so management and the board can see how their organization stacks up. Remember, though, that however good or bad the comparison feels, a lot of postures are self-reported and therefore may be skewed.
Laying out a maturity model and roadmap
There are different resources for determining an organization’s current maturity and looking ahead to what’s necessary to move up to the next maturity level. I have found the SANS vulnerability management maturity model to be an excellent resource. While there are others out there, this one is free and, in my opinion, well written.
Many maturity models have not accounted for cloud computing, which is another reason I like the SANS maturity model as it has a separate baseline for cloud compared to traditional on-premise vulnerability management. This can be very helpful as organizations move towards heavier cloud adoption.
References: https://www.sans.org/blog/vulnerability-management-maturity-model/ https://www.sans.org/posters/ciso-mind-map-and-vulnerability-management-maturity-model/
A big piece of having a maturity model is setting achievable goals. This should be stated in the program, such as “This organization’s goal is to reach level three in the next three years.” The key word in the first sentence is achievable. Do not set yourself up for failure. Underpromising and overdelivering will make you a rockstar without burning you out.
Patch management
Now that there is an understanding of what vulnerabilities are and how we handle them at a high level in the organization, it’s time to get into the “how”.
I have found that explaining patch management alongside the life cycle of vulnerabilities also explains patching nomenclature such as “0-days”. A great reference for this is a paper called “Lifecycle of a Zero-day” by Bilge and Dumitras published in 2012.
It lays out the entire lifetime of vulnerabilities with a timeline and labels to help the reader understand why certain patches have higher risk scores than others. By taking different attributes into account such as public/bastioned systems, actively exploited, and the time of vulnerability exposure, you can create a narrative to help teams understand priority.
Vulnerability rating
An established ruleset for how to classify vulnerabilities should be laid out. Using a CVSS score without business context is not the way to do this. Business context was intended to be a key part of determining a CVSS score’s true impact on your organization, but unfortunately many people take the numbers at face value, to the detriment of their organization. This is not to say CVSS is useless, but it requires further investigation to provide true value.
Your ratings could be as simple as:
- Critical
- weaponized and automated with no or little user interaction
- remotely exploitable
- exploitable by attacker with minimal or no skill
- High
- weaponized, but requires hands on keyboard
- can be remotely exploitable, but not necessary
- exploitable by attacker with some skills
- Medium
- exploitation published, but difficult to exploit
- exploitability is mitigated to a high degree by current defenses
- Low
- exploitation unpublished or extremely difficult
- minimal impact
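As a rough illustration, the example ratings above can be expressed as a small decision function. This is only a sketch: the `Finding` fields and the `rate` function are hypothetical names, and the rules are a simplification of the bullet points rather than a complete classification scheme.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical attributes describing one vulnerability."""
    weaponized: bool          # a working exploit is available
    automated: bool           # needs little or no user interaction
    remote: bool              # exploitable over the network
    exploit_published: bool   # exploit details are public
    mitigated: bool           # current defenses largely block exploitation


def rate(f: Finding) -> str:
    """Map a finding to one of the four example ratings (simplified)."""
    if f.weaponized and f.automated and f.remote and not f.mitigated:
        return "Critical"
    if f.weaponized and not f.mitigated:
        return "High"
    if f.exploit_published:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    worm_like = Finding(True, True, True, True, False)
    print(rate(worm_like))
```

In practice the inputs would come from threat intelligence and asset context rather than hand-set flags, and your own ruleset will likely need more attributes than these five.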
Set timelines for accountability
Once the ratings are defined, setting an achievable timeline for each will help ensure accountability. The key word here is achievable. An unachievable goal is a great way to sink a program at the start. This timeline should cover both compensating controls and available patches. Limiting it to patches will leave systems exposed and vulnerable.
Examples
- Critical - 2 weeks from disclosure for internal systems, 48 hours for externally available systems
- Severe - 1 month from disclosure, 2 weeks for externally available systems
- Moderate - 60 days from disclosure
- Low - 1 year from disclosure
- Out of band (0-days) - 1 week from disclosure, 48 hours for externally available systems
- Unsupported patches - decision made by management within 1 month
- All security patches must be installed within 90 days of release
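To make timelines like these enforceable, it can help to compute a due date for each finding. The sketch below encodes the example values above; the names `REMEDIATION_DAYS`, `due_date`, and `is_overdue` are hypothetical, and it assumes "1 month" means 30 days, "1 year" means 365 days, and "48 hours" means 2 calendar days.

```python
from datetime import date, timedelta

# Days allowed per rating: (internal systems, externally available systems).
# Values mirror the example timelines; month/year lengths are assumptions.
REMEDIATION_DAYS = {
    "Critical": (14, 2),
    "Severe": (30, 14),
    "Moderate": (60, 60),
    "Low": (365, 365),
    "Out of band": (7, 2),
}


def due_date(rating: str, disclosed: date, external: bool) -> date:
    """Return the remediation deadline for a finding."""
    internal_days, external_days = REMEDIATION_DAYS[rating]
    days = external_days if external else internal_days
    return disclosed + timedelta(days=days)


def is_overdue(rating: str, disclosed: date, external: bool, today: date) -> bool:
    """True once the deadline has passed without remediation."""
    return today > due_date(rating, disclosed, external)


if __name__ == "__main__":
    print(due_date("Critical", date(2024, 1, 1), external=True))
```

A report built on something like `is_overdue` gives management a simple view of which teams are meeting the timelines and which are not.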
Define systems and applications in scope for the program
Spell out what in your environment falls under the patch management program. There may be IoT devices that update automatically and only require auditing, while other systems require care and maintenance. Also spell out the process for end-of-life systems. Forever days exist and must be addressed where possible. Consider, as an example, that dusty box that everyone refuses to touch and that only has replacement parts on eBay.
Example of a list:
- Server class
- RHEL
- Windows
- zOS
- Workstation class
- Windows
- Mac
- Linux
- Network
- Cisco routers
- Juniper switches
- Palo Alto firewalls
- Databases
- SQL
- NoSQL
- Supported applications
- Adobe
- DNS
- DHCP
- Chrome
- Java
Context is key, classification is necessary
Calling out the requirement for asset categories and classifications provides necessary context for vulnerability and patch management. This could be as simple as whether the system holds public or private data and whether it is bastioned or publicly accessible. Assigning ratings based on this helps immensely when competing priorities come up. There are only so many hours in the day.
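A simple two-attribute classification like the one described above can be turned into a patching order. The following is a minimal sketch; the function names, the asset names, and the choice to rank public exposure above data sensitivity are all assumptions made for illustration.

```python
def asset_priority(private_data: bool, publicly_accessible: bool) -> int:
    """Return 1 (patch first) through 4 (patch last). Ordering is an assumption."""
    if private_data and publicly_accessible:
        return 1
    if publicly_accessible:
        return 2
    if private_data:
        return 3
    return 4


def patch_order(assets):
    """Sort (name, private_data, publicly_accessible) tuples by priority."""
    ranked = sorted(assets, key=lambda a: asset_priority(a[1], a[2]))
    return [name for name, _, _ in ranked]


if __name__ == "__main__":
    # Hypothetical inventory entries.
    inventory = [
        ("intranet-wiki", False, False),
        ("customer-portal", True, True),
        ("hr-database", True, False),
        ("marketing-site", False, True),
    ]
    print(patch_order(inventory))
```

Your organization may reasonably weigh these attributes differently; the point is that the ordering is written down before the competing priorities arrive.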
Lay out how vulnerabilities are found
State the ways vulnerabilities are found in your organization such as:
- Vulnerability assessments
- Penetration testing
- Threat Intelligence Feeds
- Vendor patch RSS feeds or email notifications
- Risk assessments
Patch management lifecycle
Define how patch management is done and define the individual components. Do not create a long process. A great example would be:
- Assess
- Create and maintain a system and software inventory
- Assess patch management architecture
- Review infrastructure / configuration
- Identify
- Identify new patches
- Determine patch relevance, including threat assessment
- Verify patch authenticity and integrity, such as checking the hash or certificate
- Evaluate & Plan
- Test the patch, if possible
- Applying patches to systems without Test environments carries inherent risk
- Perform a risk assessment for possible repercussions from patching or not patching
- Obtain approval from the appropriate stakeholders to deploy the patch
- Plan the patch release process and notify affected parties
- Deploy
- Distribute and install the patch(es)
- Report on progress to the appropriate stakeholders
- Handle exceptions with a coordinated mitigation plan
- Review deployment for future process improvement
Give examples of processes and questions to answer for each step.
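One concrete example for the Identify step is checking a downloaded patch against the hash published by the vendor. The sketch below uses Python's standard `hashlib` module; the function names are hypothetical, and it assumes the vendor publishes a SHA-256 digest as a hex string.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def patch_matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest to the vendor-published value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A matching hash only shows the file was not altered in transit. It does not prove who published it, so a signature or certificate check is still worthwhile where the vendor supports one.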
Exceptions
Exceptions will happen. Hopefully they will be few and far between, and made for appropriate reasons. The best way to handle this is to require documentation in a ticket with authorization from management after review by the security team. Exceptions must be reviewed at predefined intervals or they are lost forever, creating risk and, worse, technical debt. I would advise the following be included:
- All appropriate vulnerabilities listed
- Justification and mitigating controls available
- Exceptions authorized by the appropriate level of management
- Timeframe of exception acceptance and review
- At least annually, preferably quarterly
A common recurring example is however many days your organization has deemed acceptable for remediating Microsoft Patch Tuesday releases. This may vary based on exposure and sensitivity; for example, “external systems have 48 hours to patch critical” while “internal sensitive systems have 7 days to patch”.
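The exception fields listed above map naturally onto a small record that can report when its next review is due. This is a sketch only: the `PatchException` class, its field names, and the 90-day default (roughly quarterly) are assumptions, and the ticket and CVE identifiers in the example are placeholders.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class PatchException:
    """Hypothetical record for one approved exception."""
    ticket: str
    vulnerabilities: List[str]
    justification: str
    mitigating_controls: str
    approved_by: str
    approved_on: date
    review_interval_days: int = 90  # roughly quarterly; an assumption

    def next_review(self) -> date:
        return self.approved_on + timedelta(days=self.review_interval_days)

    def review_due(self, today: date) -> bool:
        """True when the exception must be re-examined."""
        return today >= self.next_review()


if __name__ == "__main__":
    exc = PatchException(
        ticket="TICKET-0001",              # placeholder identifier
        vulnerabilities=["CVE-0000-0000"],  # placeholder identifier
        justification="Vendor patch breaks a legacy integration",
        mitigating_controls="Host isolated on a restricted VLAN",
        approved_by="IT Director",
        approved_on=date(2024, 1, 1),
    )
    print(exc.next_review())
```

Running a check like `review_due` on a schedule is one way to keep exceptions from being lost forever.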
Patching considerations
You should lay out considerations for patching in your environment. Some examples to cover:
- Testing environment
- The primary function of a test environment is to mitigate risk prior to implementing changes to the production environment. An additional benefit of a test environment is that it facilitates operator training on new configurations, checklist development, and evaluation of procedures prior to deployment on production systems.
- Backups
- A backup of the existing stable system should be captured before production patching is conducted. It is recommended that asset custodians test backups to ensure viability on a consistent basis.
- Contingency
- Do not lay out all contingencies. Instead, state that the asset owners/custodians should work together to determine what happens in different failure scenarios.
- Regulatory requirements
- State that any regulatory requirements with stricter timelines will be adhered to for any impacted systems.