Why, When and How to Do a Sitecore Quality Assessment or Audit

Introduction

A Sitecore assessment or audit is a tedious exercise, and here I’m trying to provide a high-level framework for carrying it out.
Of course, this is not a one-size-fits-all template, but it can certainly be used as a guide when conducting such audits. Bear in mind that quality assessment templates need to be tailored to suit a particular project, considering the environment, technologies used, Sitecore version, topology, system integrations and so on.

When to do a Sitecore Assessment or Audit?

·         If the current partner did not deliver a quality implementation and you are thinking of switching to a new implementation partner, it is certainly a good idea to conduct a Sitecore audit to understand how bad the situation is and to decide on the next steps
·         As a precursor to any Sitecore upgrade or cloud migration activity
·         As regular hygiene, conduct a Sitecore assessment or audit every 2 years to check for deviations from standard practice or signs that the implementation is not running as smoothly as it used to

Why to do a Sitecore Assessment or Audit?

·         Capture the AS-IS solution architecture, design, and process aspects
·         Quality checks carried out on various aspects such as Infrastructure, Sitecore, Application, Integrations and DevOps processes provide a clear picture of deviations from best practices
·         Recommendations help plan the next steps and separate must-have fixes from nice-to-have ones

How to conduct a Sitecore Quality Assessment or Audit?

Current Architecture and Design

The first step is to capture the current infrastructure and solution architecture and design. If documents already exist and provide accurate information about the current landscape, that is a great starting point. If not, it is better to reverse engineer the solution and create an infrastructure architecture diagram, a solution architecture diagram, the current specification of the deployment environments, current usage stats, traffic volumes, the overall technology stack, etc. This serves as a foundation for carrying out the quality assessments and helps with future maintenance and knowledge sharing.

 

Infrastructure

App Service environment

 

                     If ASE environments are used for deployment, check if there are avenues for cost optimization

                     For example, just 2 ASEs, i.e., one for Production and one for Non-Production, should be sufficient. However, this depends entirely on the demands of the project.

                      

 

App Service Plans and App Services

Check if the following best practices are followed

·         Enable Azure "local cache" feature

·         Enable ASP.NET "Application Initialization" feature

·         Disable Azure Proactive Auto-heal

·         Prevent unexpected application pool recycling

·         Disable Dynamic Cache

·         Static Content compression

·         Dynamic content compression

·         Keep Alive & Always On

·         Expire Web Content
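
A few of the items above map to App Service application settings, so part of this check can be scripted. The sketch below is a minimal example that assumes the app settings have already been exported into a dictionary; the three setting names are the Azure App Service settings as I understand them and should be verified against the current Azure documentation, and the sample values are made up.

```python
# Expected values for a few App Service settings tied to the checklist above.
# Assumption: these setting names match the current Azure documentation.
EXPECTED_APP_SETTINGS = {
    "WEBSITE_LOCAL_CACHE_OPTION": "Always",         # local cache enabled
    "WEBSITE_PROACTIVE_AUTOHEAL_ENABLED": "false",  # proactive auto-heal disabled
    "WEBSITE_DYNAMIC_CACHE": "0",                   # dynamic cache disabled
}


def audit_app_settings(app_settings):
    """Return a list of findings for settings that are missing or differ."""
    findings = []
    for key, expected in EXPECTED_APP_SETTINGS.items():
        actual = app_settings.get(key)
        if actual is None:
            findings.append(f"{key} is not set (expected {expected})")
        elif actual.strip().lower() != expected.lower():
            findings.append(f"{key} is {actual} (expected {expected})")
    return findings


# Hypothetical export of one CD app service's settings.
sample_settings = {"WEBSITE_LOCAL_CACHE_OPTION": "Always", "WEBSITE_DYNAMIC_CACHE": "1"}
for finding in audit_app_settings(sample_settings):
    print(finding)
```

The remaining items (compression, Always On, content expiry) live in web.config or the site configuration and need a separate check.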

 

                     Check if the following Security best practices are followed

 

·         Run App Service applications in a fully isolated and dedicated environment, Azure App Service Environment (ASE)

·         A WAF is used to provide centralized inbound protection of web applications from the most common exploits and vulnerabilities.

·         Content Delivery servers run behind the WAF, with IP restrictions on the Web App that limit access to traffic coming from the Application Gateway

·         With Application Gateway, allow only IPs from countries where the application needs to be accessible. This helps avert DDoS attacks. IP restriction is an ongoing activity, and malicious IPs are added to the list as and when they are identified

·         Secure keys and credentials - Azure Key Vault safeguards keys and secrets by encrypting authentication keys, storage account keys, data encryption keys, .pfx files, and passwords using keys that are protected by HSMs.

·         Use RBAC to assign permissions to users, groups, and applications at a certain scope.

·         Restrict incoming source IP addresses - App Service Environment’s virtual network integration feature helps restrict incoming source IP addresses through network security groups. Virtual networks make it possible to place Azure resources in a non-internet-routable network to which access can be controlled.

·         Azure Active Directory authentication (instead of SQL Server authentication) to connect to databases - This can stop the proliferation of user identities across database servers.

·         Azure SQL firewall - SQL Database default source IP address restrictions allow access from any Azure address, including other subscriptions and tenants. This must be restricted so that only the IP addresses of the web application instances can access the database.

·         Encrypt data at rest - Transparent Data Encryption (TDE) is enabled by default. TDE transparently encrypts SQL Server, Azure SQL Database, and Azure SQL Data Warehouse data and log files. TDE protects against a compromise of direct access to the files or their backup. This makes it possible to encrypt data at rest without changing existing applications.

·         Connection strings are stored in App Settings, so they are encrypted at rest and in transit.

·         Access is limited via deny-anonymous-access web.config rules. For CD servers, anonymous access is denied to:

o   /App_Config

o   /xsl

o   /sitecore/modules/Shell

o   /sitecore/modules/debug

o   /sitecore

·         App Service will not serve requests for .configs via default request filtering rules

·         Non-HTTPS requests are caught & redirected to HTTPS

·         Request Validation is enabled by default.
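
The Azure SQL firewall item above is easy to review from an exported list of firewall rules. The following is a rough sketch, assuming the rules are available as name/start/end records. A rule whose start and end addresses are both 0.0.0.0 is how Azure represents the "allow Azure services" option; the rule names, the sample addresses and the `max_range_size` cut-off are illustrative assumptions.

```python
import ipaddress


def audit_sql_firewall(rules, max_range_size=256):
    """Flag rules that open the server to all Azure services or to wide IP ranges."""
    findings = []
    for rule in rules:
        start = int(ipaddress.ip_address(rule["start"]))
        end = int(ipaddress.ip_address(rule["end"]))
        size = end - start + 1
        if start == 0 and end == 0:
            # 0.0.0.0 - 0.0.0.0 means "allow Azure services and resources"
            findings.append(f"{rule['name']}: allows access from any Azure address")
        elif size > max_range_size:
            findings.append(f"{rule['name']}: wide range of {size} addresses")
    return findings


# Hypothetical firewall rules exported from an Azure SQL server.
sample_rules = [
    {"name": "AllowAllWindowsAzureIps", "start": "0.0.0.0", "end": "0.0.0.0"},
    {"name": "cd-outbound", "start": "203.0.113.10", "end": "203.0.113.10"},
    {"name": "legacy-range", "start": "10.0.0.0", "end": "10.255.255.255"},
]
for finding in audit_sql_firewall(sample_rules):
    print(finding)
```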


Sitecore XP Roles

Check if the config files in each app service have the correct role assigned.

For example:

·         The CD app service should have the ContentDelivery role

·         The CM app service should have the ContentManagement role, etc.

Also check that the search engine is configured correctly; sometimes the default Lucene search remains in the configs unnoticed.
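
On Sitecore 9 and later, the role and search provider are normally declared through the `role:define` and `search:define` keys in the `<appSettings>` section of web.config, so this check can be sketched in a few lines. The snippet below assumes the web.config content has been read into a string; the sample config and the expected values are illustrative only.

```python
import xml.etree.ElementTree as ET


def read_app_settings(web_config_xml):
    """Return the <appSettings> key/value pairs from a web.config string."""
    root = ET.fromstring(web_config_xml.strip())
    return {add.get("key"): add.get("value") for add in root.findall("./appSettings/add")}


def audit_role(web_config_xml, expected_roles, expected_search):
    """Compare role:define and search:define with what the app service should run."""
    settings = read_app_settings(web_config_xml)
    findings = []
    raw_roles = settings.get("role:define") or ""
    roles = {r.strip().lower() for r in raw_roles.split(",") if r.strip()}
    if roles != {r.lower() for r in expected_roles}:
        findings.append(f"role:define is '{raw_roles}', expected {sorted(expected_roles)}")
    search = settings.get("search:define") or ""
    if search.lower() != expected_search.lower():
        findings.append(f"search:define is '{search}', expected '{expected_search}'")
    return findings


# Hypothetical CD web.config fragment that still points at the default Lucene provider.
sample_config = """
<configuration>
  <appSettings>
    <add key="role:define" value="ContentDelivery" />
    <add key="search:define" value="Lucene" />
  </appSettings>
</configuration>
"""
print(audit_role(sample_config, {"ContentDelivery"}, "Solr"))
```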

Security risk assessment and mitigation

Check the STRIDE threats and mitigations

·         Spoofing should be mitigated by HTTPS connections

·         Tampering should be mitigated by integrity controls

·         Repudiation should be mitigated by enabling Azure monitoring and diagnostics

·         Information disclosure should be mitigated by encrypting sensitive data

·         Denial of Service should be mitigated by monitoring performance metrics for potential DoS and implementing connection filters

·         Elevation of Privilege should be mitigated by using AD authentication for content authoring and custom authorization for customers

 

Azure Key Vault

 

·         Credentials/ Secrets should be stored in Azure Key Vault and not in web.config as clear text.

·         Valid certificates should be stored in Key Vault

 

Azure Redis Cache for Session management

Check the following settings

a)  operationTimeoutInMilliseconds="50"
b)  retryTimeoutInMilliseconds="16000"
c)  connectionTimeoutInMilliseconds="3000"
d)  pollingMaxExpiredSessionsPerSecond="20"
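
These attributes sit on the Redis session state provider element in web.config, so they can be compared against the checklist with a short script. This is only a sketch: the expected values are copied as printed in the list above and should be confirmed against Sitecore's guidance for your version, and the sample provider element is made up.

```python
import xml.etree.ElementTree as ET

# Values copied as printed in the checklist above; confirm them for your Sitecore version.
CHECKLIST_VALUES = {
    "operationTimeoutInMilliseconds": "50",
    "retryTimeoutInMilliseconds": "16000",
    "connectionTimeoutInMilliseconds": "3000",
    "pollingMaxExpiredSessionsPerSecond": "20",
}


def audit_redis_provider(provider_xml, expected=CHECKLIST_VALUES):
    """Compare the provider's attributes with the expected values."""
    provider = ET.fromstring(provider_xml)
    findings = []
    for attribute, wanted in expected.items():
        actual = provider.get(attribute)
        if actual != wanted:
            findings.append(f"{attribute}: found {actual!r}, checklist says {wanted!r}")
    return findings


# Hypothetical provider element taken from a sessionState section.
sample_provider = (
    '<add name="redis" operationTimeoutInMilliseconds="50" '
    'retryTimeoutInMilliseconds="16000" connectionTimeoutInMilliseconds="5000" />'
)
for finding in audit_redis_provider(sample_provider):
    print(finding)
```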

Application Insights

Check if the Sitecore logs and performance counters are configured to flow into Application Insights

Azure Web app Backups

Check if there are Azure backups scheduled for web apps. If not, define a backup policy and retention period

Azure Search

If Azure Search is used as the search engine, then check the following

                     If <indexAllFields>true</indexAllFields> is set, all fields will be indexed.

                     1K field limit per index – This is a limitation in Azure Search; any impact due to this needs to be analyzed.

                     Recommend SolrCloud

 

 

Sitecore Environments

Check if each of the below environments exists and is in use for the intended purposes.

#   Environment                 Purpose

1   Development / Integration   Source is merged; smoke tests and automated tests are run

2   Testing / QA                Functional / manual testing; regression testing

3   Staging / UAT               Replica of Production; load testing and deployment testing; warm failover during a production outage; UAT testing

4   Production                  Live website

 

Disaster Recovery

Check if there is a separate DR environment. If not, check if DR can be designed to be created on demand; with the SLA for Azure App Service at 99.95%, there is very little need for a separate DR environment.

 

Monitoring, Alerts & Automated Maintenance

Check if the important events are monitored

For example:

                     High rate of 5xx HTTP codes in the WebApp.

                     Average response time of the target WebApp exceeds an acceptable threshold.

                     High rate of CPU utilization

                     Persistent connection failures.

                     High rate of concurrent requests

                     High rate of resource utilization
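
To make "high rate" concrete, each alert needs an agreed threshold. The sketch below shows one way to evaluate three of the conditions over a window of request and CPU samples; the thresholds (5% error rate, 2000 ms, 80% CPU) and the sample data are purely illustrative assumptions, and in practice these rules would be defined as Azure Monitor alerts rather than in a script.

```python
from statistics import mean


def evaluate_alerts(requests, cpu_percent_samples,
                    max_5xx_rate=0.05, max_avg_response_ms=2000, max_avg_cpu=80):
    """Return the alerts that would fire for one monitoring window."""
    alerts = []
    if requests:
        errors = sum(1 for r in requests if 500 <= r["status"] <= 599)
        error_rate = errors / len(requests)
        if error_rate > max_5xx_rate:
            alerts.append(f"5xx rate {error_rate:.0%} exceeds {max_5xx_rate:.0%}")
        avg_ms = mean(r["response_ms"] for r in requests)
        if avg_ms > max_avg_response_ms:
            alerts.append(f"average response time {avg_ms:.0f} ms exceeds {max_avg_response_ms} ms")
    if cpu_percent_samples and mean(cpu_percent_samples) > max_avg_cpu:
        alerts.append(f"average CPU {mean(cpu_percent_samples):.0f}% exceeds {max_avg_cpu}%")
    return alerts


# Hypothetical samples for one window.
sample_requests = [
    {"status": 200, "response_ms": 300},
    {"status": 500, "response_ms": 4000},
    {"status": 200, "response_ms": 350},
    {"status": 503, "response_ms": 5000},
]
for alert in evaluate_alerts(sample_requests, [40, 55, 60]):
    print(alert)
```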


Sitecore / Application

Sitecore application "security hardening"

Check if the following security hardening practices are followed.

https://doc.sitecore.com/xp/en/developers/93/platform-administration-and-architecture/security-tasks.html

 

Sitecore Client

Check if the following best practices are followed.

·         Check Long Running Validators

·         Check Excessive Item Versions

·         Check Excessive Items per Node

·         Disable WebDAV

·         Disable performance counters

·         Disable memory monitor

·         Remove Unreferenced Media Items from Sitecore

Administration

Check if the following best practices are followed.

·         Enable sticky sessions in load-balanced environments

·         Disable Sitecore UploadWatcher

·         Remove the “master” database in CD environment

·         Restrict access to the Client editing tools in CD environment

·         Set log4net to only record errors

·         Configure Sitecore to replace spaces with dashes in item names.

·         Use a dash for the media URL instead of ~

·         Disable the Publish Site option

·         Ensure availability of the Preview on CM servers

·         Setup continuous integration

·         Harden the CD servers

Best Practices

Check if the following best practices are followed.

·         Separate custom configuration

·         Replace the file extension with the forward slash

·         Utilize Developer Strip in Content Editor

Experience Platform

Check if the following best practices are followed.

·         Ensure analytics data is captured in the environments

·         Ensure that the server roles are properly configured as CM/CD/Processing Server/Reporting Role

·         Sitecore recommends that the collection database has a dedicated SQL server. On high-traffic websites, a lot of stress is placed on this database, and it can experience performance problems if adequate resources are not available

·         Sitecore recommends that there is at least one separate Processing server. On high-traffic websites, a lot of stress is placed on processing, and Content Authoring servers can experience performance problems if adequate resources are not available

Other Configurations

Check if the following best practices are followed.

·         Use App_Config/Include Files

·         Sitecore assemblies to be consistent across environments

·         Debugging must be turned off. The “debug” attribute of the <compilation> element should be set to “false” in the QA environment for the CM server and both CD servers

·         ContentEditor.RenderCollapsedSections

·          Disable prefetching of collapsed sections in Sitecore UI

·         ContentEditor.CheckHasChildrenOnTreeNodes

·         ContentEditor.CheckSecurityOnTreeNodes

·         MemoryMonitorHook CheckInterval

·         Analytics.PerformLookup

·         Media.MediaLinkPrefix

·         Possibility of hiding the following UI elements

1.     The Pages Bar

2.     The Quick Action Bar

3.     The Validator Bar

4.     The Quick Info Section

5.     Individual tabs in Ribbon

·         Workbox Data

·         Set languageEmbedding to “always” or “never”

·         Sublayout caching
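
Several of the items above are plain web.config values, and the debug flag is the simplest one to verify automatically. Here is a minimal sketch, assuming the web.config of each server has been read into a string; the sample fragment is made up.

```python
import xml.etree.ElementTree as ET


def debug_enabled(web_config_xml):
    """Return True if any <compilation> element has debug="true"."""
    root = ET.fromstring(web_config_xml.strip())
    for compilation in root.iter("compilation"):
        if compilation.get("debug", "false").strip().lower() == "true":
            return True
    return False


# Hypothetical web.config fragment from a QA CD server.
sample_config = """
<configuration>
  <system.web>
    <compilation defaultLanguage="c#" debug="true" targetFramework="4.8" />
  </system.web>
</configuration>
"""
if debug_enabled(sample_config):
    print("Debugging is turned on; set debug to false")
```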

Code

Check whether the Helix component-based architecture is followed, and highlight any deviations

 

Coding Standards that need to be corrected for maintainability

·         Hardcoding

·         Naming conventions not followed

·         Single Responsibility

·         Open-Closed Principle

·         Liskov Substitution

·         Interface Segregation

·         Dependency Inversion Principle

 

Check that the following code optimizations are followed to improve performance

·         API controllers that can be moved out of Sitecore

·         Avoid usage of GetItem, nested method calls and nested foreach loops; use lambda / LINQ instead of foreach

·         Optimize pipeline processors

·         Libraries that may need an upgrade – jQuery etc.

 

 

Integrations

Check if the following best practices are followed.

·         Code Optimization to reduce API calls

·         REST API endpoints follow best practices for naming

·         Check if API client credentials are stored in Azure Key Vault

·         Lookup data can be stored in Sitecore itself or cached to avoid API calls, thereby improving performance

·         iFrame Integration

·         Re-direction through Links

 

 

Exception handling and logging

·         Proper error pages setup

·         Unless required, all the environments should have the Logging level set as ERROR.

 

Databases

Database Properties

·         Check Compatibility Level

·         Check Auto Shrink Property Set to False

·         Check Recovery Model Set to Simple

Query: SELECT name, compatibility_level, is_auto_shrink_on, recovery_model_desc
FROM sys.databases
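
Once the query results are exported, the three property checks can be applied row by row. The sketch below assumes each row is a dictionary keyed by the column names in the query; the minimum compatibility level of 130 and the sample rows are placeholders to be replaced with values appropriate to your SQL Server version.

```python
def audit_database_properties(rows, min_compatibility_level=130,
                              expected_recovery_model="SIMPLE"):
    """Flag databases that deviate from the three property checks above."""
    findings = []
    for row in rows:
        name = row["name"]
        if row["compatibility_level"] < min_compatibility_level:
            findings.append(
                f"{name}: compatibility level {row['compatibility_level']} "
                f"is below {min_compatibility_level}"
            )
        if row["is_auto_shrink_on"]:
            findings.append(f"{name}: auto shrink is enabled")
        if row["recovery_model_desc"].upper() != expected_recovery_model:
            findings.append(
                f"{name}: recovery model is {row['recovery_model_desc']}, "
                f"expected {expected_recovery_model}"
            )
    return findings


# Hypothetical rows returned by the query against sys.databases.
sample_rows = [
    {"name": "Sitecore_Core", "compatibility_level": 140,
     "is_auto_shrink_on": False, "recovery_model_desc": "SIMPLE"},
    {"name": "Sitecore_Master", "compatibility_level": 110,
     "is_auto_shrink_on": True, "recovery_model_desc": "FULL"},
]
for finding in audit_database_properties(sample_rows):
    print(finding)
```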

Database Usage metrics

§  Databases          

·         Max DTU             

·         Avg DTU Usage

·         Max DTU Usage

·         Avg DTU percentage

·         Max DTU percentage

·         Number of instances when there was a spike above 75%

·         Check if the Web database is a single isolated database

·         Check if one elastic pool is used for Sitecore XM databases such as core, master, forms, etc.

·         Check if another elastic pool is used for XP databases

·         Check regular SQL health maintenance - To reduce the performance impact of fragmented indexes, the indexes must be rebuilt at regular intervals.

·         Check if database backups are enabled

·         Check whether the following Azure features are enabled

o   An Azure Active Directory administrator should be provisioned for SQL servers

o   Auditing on SQL server should be enabled

o   Private endpoint connections on Azure SQL Database should be enabled

o   Azure Defender for SQL should be enabled for unprotected Azure SQL servers

o   Public network access on Azure SQL Database should be disabled

o   Vulnerability assessment should be enabled on your SQL servers

·         Check if database maintenance is set up for the following activities

o   Rebuild Index Task

o   Check Database Integrity

o   Update Statistics Task

o   Cleanup Database Tables

o   Check Database Cleanup Agents
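
The DTU usage metrics listed above can be derived from a series of DTU percentage samples exported from Azure Monitor. The following is a small sketch of that arithmetic; it counts a spike once per continuous run of samples above the threshold, which is one possible interpretation of "instances", and the sample values are made up.

```python
def summarize_dtu(percent_samples, spike_threshold=75.0):
    """Return average, maximum and number of spikes for DTU percentage samples."""
    if not percent_samples:
        raise ValueError("at least one sample is required")
    spikes = 0
    in_spike = False
    for value in percent_samples:
        above = value > spike_threshold
        if above and not in_spike:
            spikes += 1  # a new run above the threshold starts here
        in_spike = above
    return {
        "avg_percent": sum(percent_samples) / len(percent_samples),
        "max_percent": max(percent_samples),
        "spikes_above_threshold": spikes,
    }


# Hypothetical DTU percentage samples for one database.
sample_dtu = [20, 35, 80, 90, 40, 76, 30]
print(summarize_dtu(sample_dtu))
# {'avg_percent': 53.0, 'max_percent': 90, 'spikes_above_threshold': 2}
```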

DevOps

·         Check if an industry-standard practice such as GitFlow is followed as the branching and merging strategy

·         Check if Blue/Green or Active/Passive deployment is set up for any of the environments

·         Check if the Sitecore base folder (vanilla instance files and configs) is part of the pipeline

·         Check if the role-specific configuration files are part of the CI/CD pipeline and appropriate environment- and role-specific transformations are set up

 

General

·         Check the communications between the Sitecore app services

·         Check the communication between Sitecore and other integrating systems, API gateway, third parties, proxies, or any other systems in general

·         Configs, App Settings, Connection Strings, etc. - Many of these settings need adjustment for different Sitecore roles and deployment slots. This needs due diligence; ensure they are correctly configured in DevOps.

·         Typically, resources are under-utilized. Careful planning is required to ensure that an optimal number of Azure resources is used

·         Check if quantifiable and measurable NFRs are defined and if the current design satisfies them

For example:

·         The solution must be able to support X peak users per hour

·         The website and all underlying functions required for account management (including Search) must be available 99.5% of the time, measured 24 hours a day, 365 days a year.
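
An availability NFR is easier to discuss once it is translated into a downtime budget. The short calculation below does that for the 99.5% example, and for comparison for the 99.95% App Service SLA mentioned in the Disaster Recovery section; it assumes a 365-day year and counts all downtime, planned or not.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, assuming a non-leap year


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Return the downtime budget in hours for a given availability target."""
    return period_hours * (1 - availability_percent / 100)


for target in (99.5, 99.95):
    print(f"{target}% availability allows {allowed_downtime_hours(target):.2f} hours of downtime per year")
# 99.5% availability allows 43.80 hours of downtime per year
# 99.95% availability allows 4.38 hours of downtime per year
```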
