
System Configuration Analysis

openshift-eng

About

This skill helps developers analyze extracted sosreport archives to diagnose system configuration issues. It examines OS details, installed packages, systemd service status, and security policies such as SELinux and AppArmor. Use it to investigate service failures, verify package versions, or check the security settings captured in a sosreport.

Documentation

System Configuration Analysis Skill

This skill provides detailed guidance for analyzing system configuration from sosreport archives, including OS information, installed packages, systemd services, and SELinux/AppArmor settings.

When to Use This Skill

Use this skill when:

  • Running the system configuration phase of the /sosreport:analyze command
  • Investigating service failures or misconfigurations
  • Verifying package versions and updates
  • Checking security policy settings (SELinux/AppArmor)
  • Understanding system state and configuration

Prerequisites

  • The sosreport archive must be extracted to a working directory
  • The path to the sosreport root directory must be known
  • Familiarity with Linux system administration

Key Configuration Data Locations in Sosreport

  1. System Information:

    • uname - Kernel version
    • etc/os-release - OS distribution and version
    • uptime - System uptime
    • proc/uptime - Uptime in seconds
    • sos_commands/release/ - Release information
  2. Package Information:

    • installed-rpms - RPM packages (RHEL/Fedora/CentOS)
    • installed-debs - DEB packages (Debian/Ubuntu)
    • sos_commands/yum/ or sos_commands/dnf/ - Yum/DNF information
    • sos_commands/rpm/ - RPM database queries
  3. Service Status:

    • sos_commands/systemd/systemctl_list-units - All units
    • sos_commands/systemd/systemctl_list-units_--failed - Failed units
    • sos_commands/systemd/systemctl_status_--all - Detailed service status
    • sos_commands/systemd/systemctl_list-unit-files - Unit files
  4. SELinux:

    • sos_commands/selinux/sestatus - SELinux status
    • sos_commands/selinux/getenforce - Current enforcement mode
    • sos_commands/selinux/selinux-policy - Policy information
    • var/log/audit/audit.log - SELinux denials
  5. AppArmor (if applicable):

    • sos_commands/apparmor/ - AppArmor configuration
    • etc/apparmor.d/ - AppArmor profiles
  6. System Configuration Files:

    • etc/ - System-wide configuration
    • etc/sysctl.conf or etc/sysctl.d/ - Kernel parameters
    • etc/security/limits.conf - Resource limits

Implementation Steps

Step 1: Analyze System Information

  1. Check OS version and distribution:

    if [ -f etc/os-release ]; then
      cat etc/os-release
    fi
    
  2. Get kernel version:

    if [ -f uname ]; then
      cat uname
    elif [ -f proc/version ]; then
      cat proc/version
    fi
    
  3. Check system uptime:

    if [ -f uptime ]; then
      cat uptime
    elif [ -f proc/uptime ]; then
      # Parse uptime from proc/uptime (seconds)
      awk '{printf "%.2f days\n", $1/86400}' proc/uptime
    fi
    
  4. Extract key system details:

    • OS name and version
    • Kernel version
    • System architecture (x86_64, aarch64, etc.)
    • Uptime (days)
  5. Check for outdated kernel or OS:

    • Compare the kernel version with the latest kernel available for that OS release
    • Note if the system has not been rebooted in a very long time (>365 days)
    • Identify if OS version is EOL
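The checks in this step can be combined into a single helper. The following is a minimal sketch. It assumes the sosreport layout listed above: etc/os-release, a root-level uname file holding `uname -a` output, and proc/uptime. The function name summarize_system_info is hypothetical and is not part of the skill.

```shell
# Hypothetical helper: print OS, kernel, architecture, and uptime from an
# extracted sosreport. Pass the sosreport root directory as $1.
summarize_system_info() {
  local root="${1:-.}"
  local os="" kernel="unknown" arch="unknown" uptime_days="unknown"

  if [ -f "$root/etc/os-release" ]; then
    # PRETTY_NAME="Red Hat Enterprise Linux 8.8 (Ootpa)" -> drop key and quotes
    os=$(sed -n 's/^PRETTY_NAME=//p' "$root/etc/os-release" | tr -d '"')
  fi

  if [ -f "$root/uname" ]; then
    # uname -a output: field 3 is the kernel release. The last field is the
    # OS name (e.g. GNU/Linux), and the field before it is normally the
    # hardware name (x86_64, aarch64, ...).
    kernel=$(awk '{print $3}' "$root/uname")
    arch=$(awk '{print $(NF-1)}' "$root/uname")
  fi

  if [ -f "$root/proc/uptime" ]; then
    # The first field is uptime in seconds
    uptime_days=$(awk '{printf "%.1f", $1/86400}' "$root/proc/uptime")
  fi

  printf 'OS: %s\nKernel: %s\nArchitecture: %s\nUptime: %s days\n' \
    "${os:-unknown}" "$kernel" "$arch" "$uptime_days"

  # Flag systems that have not been rebooted for more than a year
  if [ "$uptime_days" != "unknown" ] &&
     awk -v d="$uptime_days" 'BEGIN { exit !(d > 365) }'; then
    echo "WARNING: uptime exceeds 365 days"
  fi
}
```

Call it as `summarize_system_info /path/to/sosreport`. Any field that was not collected prints as `unknown`. The architecture is read from the second-to-last field because `uname -a` may omit the processor and hardware-platform fields when they are unknown.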

Step 2: Analyze Installed Packages

  1. List installed packages:

    # For RPM-based systems
    if [ -f installed-rpms ]; then
      cat installed-rpms
    fi
    
    # For DEB-based systems
    if [ -f installed-debs ]; then
      cat installed-debs
    fi
    
  2. Extract key package versions:

    # Important system packages
    grep -E "^(kernel|systemd|glibc|openssh|openssl)" installed-rpms 2>/dev/null
    
    # Or print only the first column (NAME-VERSION-RELEASE.ARCH), dropping trailing columns such as install dates
    awk '{print $1}' installed-rpms | head -20
    
  3. Check for known problematic versions:

    • Security vulnerabilities (if known CVEs)
    • Buggy package versions
    • Compatibility issues
  4. Identify package manager issues:

    # Search the yum/dnf command output collected by sos for errors
    for d in sos_commands/yum sos_commands/dnf; do
      [ -d "$d" ] && grep -ri "error\|fail" "$d" 2>/dev/null
    done
    
  5. Count packages and categorize:

    • Total packages installed
    • Key package versions (kernel, systemd, glibc, etc.)
    • Recently updated packages (if timestamps available)
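Here is a sketch of package counting and key-version extraction for RPM-based reports. It assumes each line of installed-rpms begins with NAME-VERSION-RELEASE.ARCH, and it ignores any install date that follows. The function name rpm_package_summary and the list of key packages are illustrative choices, not part of the skill. installed-debs uses a different layout and would need its own parser.

```shell
# Hypothetical helper: count packages and print versions of selected
# key packages from an installed-rpms file.
rpm_package_summary() {
  local rpms="${1:-installed-rpms}"
  [ -f "$rpms" ] || { echo "installed-rpms not found" >&2; return 1; }

  # One package per non-empty line
  echo "Total Packages: $(awk 'NF' "$rpms" | wc -l | tr -d ' ')"

  echo "Key Package Versions:"
  awk '{print $1}' "$rpms" | while read -r nvra; do
    # Strip the last two "-" fields (VERSION and RELEASE.ARCH) to get NAME,
    # e.g. openssl-libs-1.1.1k-7.el8_6.x86_64 -> openssl-libs
    name=${nvra%-*-*}
    case "$name" in
      kernel|kernel-core|systemd|glibc|openssl|openssh-server)
        printf '  %s: %s\n' "$name" "${nvra#"$name"-}" ;;
    esac
  done
}
```

Matching on the parsed name rather than a `^openssl` prefix keeps related packages such as openssl-libs out of the key-version list.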

Step 3: Analyze Service Status

  1. List all systemd units:

    if [ -f sos_commands/systemd/systemctl_list-units ]; then
      cat sos_commands/systemd/systemctl_list-units
    fi
    
  2. Identify failed services:

    if [ -f sos_commands/systemd/systemctl_list-units_--failed ]; then
      cat sos_commands/systemd/systemctl_list-units_--failed
    elif [ -f sos_commands/systemd/systemctl_list-units ]; then
      grep "failed" sos_commands/systemd/systemctl_list-units
    fi
    
  3. Check service details:

    # Show each unit's header line and its Active: line
    if [ -f sos_commands/systemd/systemctl_status_--all ]; then
      # Newer systemd marks inactive units with "○" and failed units with "×"; "*" appears in non-UTF-8 output
      grep -E "^(●|○|×|\*) |Active:" sos_commands/systemd/systemctl_status_--all | head -50
    fi
    
  4. Count services by state:

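    # Count units by ACTIVE/SUB state (e.g. active/running, failed/failed, inactive/dead)
    if [ -f sos_commands/systemd/systemctl_list-units ]; then
      # Strip the leading "●" (or "*") on failed units so the columns line up; unit names contain a dot
      sed -E 's/^(●|\*) +//' sos_commands/systemd/systemctl_list-units |
        awk '$1 ~ /\./ && NF >= 4 {print $3 "/" $4}' | sort | uniq -c
    fi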
    
  5. Identify critical service failures:

    • System services (systemd-*, dbus, NetworkManager)
    • Application services (httpd, nginx, postgresql, etc.)
    • Custom services
  6. Extract failure reasons from logs:

    # For each failed service, find related log entries
    grep -i "failed to start\|service.*failed" sos_commands/logs/journalctl_--no-pager 2>/dev/null | head -20
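
The per-unit lookup in this step can be written as a loop over the failed-units file. This is a minimal sketch that assumes the sosreport file names used above. It also assumes a unit's journal lines mention either the unit name (httpd.service) or the process name followed by a PID (httpd[12345]). failed_unit_reasons is a hypothetical helper, and its error keywords are a heuristic.

```shell
# Hypothetical helper: for each unit in the failed-units list, print the
# last few journal lines that look like errors for that unit.
failed_unit_reasons() {
  local root="${1:-.}"
  local failed="$root/sos_commands/systemd/systemctl_list-units_--failed"
  local journal="$root/sos_commands/logs/journalctl_--no-pager"
  [ -f "$failed" ] || return 0

  # Drop the leading "●" (or "*" in non-UTF-8 output) and keep first-column
  # entries that look like unit names (name.service, name.mount, ...)
  sed -E 's/^(●|\*) +//' "$failed" |
    awk '$1 ~ /\.(service|socket|mount|automount|swap|target|path|timer|scope|slice|device)$/ { print $1 }' |
    while read -r unit; do
      echo "● $unit"
      [ -f "$journal" ] || continue
      # Match "httpd.service" and "httpd[PID]", then keep error-looking lines only
      grep -F -e "$unit" -e "${unit%.*}[" "$journal" |
        grep -iE 'fail|error|could not|denied' | tail -n 5 | sed 's/^/    /'
    done
}
```

Matching on `name[` as well as the full unit name catches messages the daemon writes itself, such as the httpd bind error in Example 1, which do not mention the unit name.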
    

Step 4: Analyze SELinux Configuration

  1. Check SELinux status:

    if [ -f sos_commands/selinux/sestatus ]; then
      cat sos_commands/selinux/sestatus
    fi
    
  2. Get SELinux mode:

    if [ -f sos_commands/selinux/getenforce ]; then
      cat sos_commands/selinux/getenforce
    fi
    
  3. Check for SELinux denials:

    # Look for AVC denials in audit log
    if [ -f var/log/audit/audit.log ]; then
      grep "avc.*denied" var/log/audit/audit.log | head -50
    fi
    
    # Or in journald logs
    grep -i "selinux.*denied\|avc.*denied" sos_commands/logs/journalctl_--no-pager 2>/dev/null | head -20
    
  4. Parse denial information:

    • Extract denied operations (read, write, execute, etc.)
    • Identify source and target contexts
    • Note which services are affected
  5. Check for SELinux booleans:

    if [ -f sos_commands/selinux/getsebool_-a ]; then
      cat sos_commands/selinux/getsebool_-a
    fi
    
  6. Identify SELinux issues:

    • SELinux in permissive mode (denials are logged but not enforced, which can mask policy problems)
    • SELinux disabled (security concern)
    • Frequent AVC denials (policy may need adjustment)
    • Context mismatches
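The denial parsing and mode checks above can be sketched as two small helpers. Both assume the audit.log record format shown in Example 2 and the getenforce file path used earlier. The names avc_summary and selinux_mode_status are hypothetical. The grouping key (permission, target class:type, source type) is one reasonable choice, not a prescribed one.

```shell
# Hypothetical helper: count AVC denials and group them by permission,
# target class:type, and source type.
avc_summary() {
  local audit="${1:-var/log/audit/audit.log}"
  if [ ! -f "$audit" ]; then
    echo "AVC Denials: audit.log not collected"
    return 0
  fi

  echo "AVC Denials: $(grep -c 'avc: *denied' "$audit") denials found"
  echo "Top Denied Operations:"
  grep 'avc: *denied' "$audit" | awk '
    {
      perm = ""; src = ""; tgt = ""; cls = ""
      # Permissions sit between braces, e.g. "{ write }" or "{ read open }"
      if (match($0, /[{][^}]*[}]/)) perm = substr($0, RSTART + 2, RLENGTH - 4)
      for (i = 1; i <= NF; i++) {
        # Contexts are user:role:type:level; keep only the type (3rd part)
        if ($i ~ /^scontext=/) { split($i, a, ":"); src = a[3] }
        if ($i ~ /^tcontext=/) { split($i, a, ":"); tgt = a[3] }
        if ($i ~ /^tclass=/)   cls = substr($i, 8)
      }
      print perm " on " cls ":" tgt " by " src
    }' | sort | uniq -c | sort -rn | head -n 10 |
    awk '{ n = $1; $1 = ""; printf "  [%dx]%s\n", n, $0 }'
}

# Hypothetical helper: map the collected getenforce output to a status.
selinux_mode_status() {
  local f="${1:-.}/sos_commands/selinux/getenforce"
  if [ ! -f "$f" ]; then
    echo "UNKNOWN: getenforce output not collected"
    return 0
  fi
  case "$(tr -d '[:space:]' < "$f")" in
    Enforcing)  echo "OK: SELinux enforcing" ;;
    Permissive) echo "WARNING: SELinux permissive (denials logged, not enforced)" ;;
    Disabled)   echo "CRITICAL: SELinux disabled" ;;
    *)          echo "UNKNOWN: unexpected getenforce output" ;;
  esac
}
```

Run `avc_summary var/log/audit/audit.log` and `selinux_mode_status .` from the sosreport root. The output lines follow the "[{count}x] {operation} on {target} by {source}" shape used in the output template.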

Step 5: Check System Configuration

  1. Review kernel parameters:

    # sysctl -a output shows runtime values; the config files show persistent settings
    if [ -f sos_commands/kernel/sysctl_-a ]; then
      cat sos_commands/kernel/sysctl_-a
    else
      cat etc/sysctl.conf etc/sysctl.d/*.conf 2>/dev/null
    fi
    
  2. Check resource limits:

    if [ -f etc/security/limits.conf ]; then
      grep -v "^#\|^$" etc/security/limits.conf
    fi
    
    # Check limits.d directory
    if [ -d etc/security/limits.d ]; then
      cat etc/security/limits.d/*.conf 2>/dev/null
    fi
    
  3. Review boot parameters:

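    # proc/cmdline: parameters the running kernel was booted with
    if [ -f proc/cmdline ]; then
      cat proc/cmdline
    fi
    # grub2 environment block: saved settings used on the next boot
    if [ -f sos_commands/boot/grub2-editenv_list ]; then
      cat sos_commands/boot/grub2-editenv_list
    fi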
    
  4. Check systemd configuration:

    # Look for systemd configuration overrides
    if [ -d etc/systemd/system ]; then
      find etc/systemd/system -name "*.conf" 2>/dev/null
    fi
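
The kernel-parameter and resource-limit checks can be reduced to the figures used in the summary template. This is a minimal sketch under several assumptions. It expects sysctl -a output in "key = value" form. It treats lines starting with "#" or ";" in sysctl configuration files as comments. It expects limits.conf entries in "<domain> <type> <item> <value>" form. The function names and the three sysctl keys shown are illustrative.

```shell
# Hypothetical helper: print selected runtime sysctl values and count
# persistent overrides in etc/sysctl.conf and etc/sysctl.d/*.conf.
sysctl_summary() {
  local root="${1:-.}"
  local live="$root/sos_commands/kernel/sysctl_-a"
  local key val count

  echo "Key sysctl Settings:"
  for key in vm.swappiness net.ipv4.ip_forward kernel.panic; do
    # sysctl -a prints "key = value"; these keys hold a single value
    val=$(awk -v k="$key" '$1 == k && $2 == "=" { print $3 }' "$live" 2>/dev/null) || true
    printf '  %s: %s\n' "$key" "${val:-not collected}"
  done

  # Persistent overrides: non-blank lines that are not comments
  count=$(cat "$root/etc/sysctl.conf" "$root"/etc/sysctl.d/*.conf 2>/dev/null |
    grep -Ecv '^[[:space:]]*([#;]|$)') || true
  echo "Custom Parameters: ${count:-0} custom settings found"
}

# Hypothetical helper: list non-comment entries from limits.conf and limits.d.
limits_summary() {
  local root="${1:-.}"
  local entries
  entries=$(cat "$root/etc/security/limits.conf" \
    "$root"/etc/security/limits.d/*.conf 2>/dev/null |
    grep -Ev '^[[:space:]]*(#|$)') || true

  if [ -z "$entries" ]; then
    echo "Custom Limits Found: 0"
    return 0
  fi
  echo "Custom Limits Found: $(printf '%s\n' "$entries" | wc -l | tr -d ' ')"
  printf '%s\n' "$entries" | awk '{ printf "%s  %s  %s  %s\n", $1, $2, $3, $4 }'
}
```

A runtime value that differs from a persistent setting, such as vm.swappiness set to 10 in sysctl.conf but 30 at runtime, is worth noting as possible configuration drift.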
    

Step 6: Generate System Configuration Summary

Create a structured summary with the following sections:

  1. System Information:

    • OS name and version
    • Kernel version
    • Architecture
    • System uptime
    • Last boot time
  2. Package Summary:

    • Total packages installed
    • Key package versions (kernel, systemd, glibc, openssl, openssh)
    • Known problematic packages (if any)
    • Package manager issues
  3. Service Status:

    • Total services
    • Running services count
    • Failed services count
    • List of failed services with reasons
    • Critical service status
  4. SELinux/AppArmor:

    • SELinux status (enabled/disabled)
    • SELinux mode (enforcing/permissive)
    • Denial count
    • Top denied operations
    • Policy recommendations
  5. Configuration Issues:

    • Kernel parameter anomalies
    • Resource limit issues
    • Boot parameter problems
    • Configuration file errors

Error Handling

  1. Missing configuration files:

    • Different distributions have different file locations
    • Some files may not be collected based on sosreport options
    • Document missing data in summary
  2. Package manager variations:

    • Handle both RPM and DEB systems
    • Account for different package naming conventions
    • Support multiple package managers (yum, dnf, apt)
  3. SELinux vs AppArmor:

    • Check which MAC system is in use
    • Analyze accordingly
    • Note if both or neither are present
  4. Systemd vs. SysV init:

    • Older systems may use SysV init instead of systemd
    • Check for both service management systems
    • Adapt analysis based on what's present
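A sketch of how the analysis could detect which variant it is dealing with before choosing a parser. It assumes that sos writes installed-rpms or installed-debs depending on the package format. It also assumes that the selinux, apparmor, and systemd plugin directories exist only when the corresponding data was collected. Note that collected SELinux data does not prove SELinux is enabled, so the mode still needs checking. detect_environment is a hypothetical name.

```shell
# Hypothetical helper: report package format, MAC system, and init system
# based on which files and plugin directories exist in the sosreport.
detect_environment() {
  local root="${1:-.}"
  local pkg="unknown" mac="none" init selinux=no apparmor=no

  # Package format
  if [ -f "$root/installed-rpms" ]; then
    pkg="rpm"
  elif [ -f "$root/installed-debs" ]; then
    pkg="deb"
  fi

  # MAC system: presence of collected data only; check the mode separately
  [ -d "$root/sos_commands/selinux" ] && selinux=yes
  { [ -d "$root/sos_commands/apparmor" ] || [ -d "$root/etc/apparmor.d" ]; } && apparmor=yes
  case "$selinux/$apparmor" in
    yes/yes) mac="both" ;;
    yes/no)  mac="SELinux" ;;
    no/yes)  mac="AppArmor" ;;
  esac

  # Init system: systemd plugin output implies systemd
  if [ -d "$root/sos_commands/systemd" ]; then
    init="systemd"
  else
    init="non-systemd (look for SysV init data)"
  fi

  printf 'Packages: %s\nMAC: %s\nInit: %s\n' "$pkg" "$mac" "$init"
}
```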

Output Format

The system configuration analysis should produce:

SYSTEM CONFIGURATION SUMMARY
============================

SYSTEM INFORMATION
------------------
OS: {os_name} {os_version}
Kernel: {kernel_version}
Architecture: {arch}
Uptime: {uptime_days} days ({last_boot_time})

Status: {OK|WARNING|CRITICAL}
Notes:
  - {system_info_note}

INSTALLED PACKAGES
------------------
Total Packages: {count}

Key Package Versions:
  kernel: {version}
  systemd: {version}
  glibc: {version}
  openssl: {version}
  openssh-server: {version}

Status: {OK|WARNING|CRITICAL}
Issues:
  - {package_issue_description}

SYSTEMD SERVICES
----------------
Total Units: {total}
Active: {active_count}
Failed: {failed_count}
Inactive: {inactive_count}

Failed Services:
  ● {service_name}.service - {description}
    Reason: {failure_reason}
    Last Failed: {timestamp}

  ● {service_name}.service - {description}
    Reason: {failure_reason}
    Last Failed: {timestamp}

Status: {OK|WARNING|CRITICAL}
Recommendations:
  - {service_recommendation}

SELINUX
-------
Status: {enabled|disabled}
Mode: {enforcing|permissive|disabled}
Policy: {policy_name}

AVC Denials: {count} denials found

Top Denied Operations:
  [{count}x] {operation} on {target} by {source}
  [{count}x] {operation} on {target} by {source}

SELinux Booleans: {count} custom settings

Status: {OK|WARNING|CRITICAL}
Issues:
  - {selinux_issue_description}

Recommendations:
  - {selinux_recommendation}

KERNEL PARAMETERS
-----------------
Key sysctl Settings:
  vm.swappiness: {value}
  net.ipv4.ip_forward: {value}
  kernel.panic: {value}

Custom Parameters: {count} custom settings found

Status: {OK|WARNING|CRITICAL}
Notes:
  - {kernel_param_note}

RESOURCE LIMITS
---------------
Custom Limits Found: {count}

{user_or_group}  {type}  {item}  {value}

Status: {OK|WARNING}
Notes:
  - {limits_note}

CRITICAL CONFIGURATION ISSUES
-----------------------------
{severity}: {issue_description}
  Evidence: {file_path}
  Impact: {impact_description}
  Recommendation: {remediation_action}

RECOMMENDATIONS
---------------
1. {actionable_recommendation}
2. {actionable_recommendation}

DATA SOURCES
------------
- OS Info: {sosreport_path}/etc/os-release
- Kernel: {sosreport_path}/uname
- Packages: {sosreport_path}/installed-rpms
- Services: {sosreport_path}/sos_commands/systemd/systemctl_list-units
- SELinux: {sosreport_path}/sos_commands/selinux/sestatus
- Audit Log: {sosreport_path}/var/log/audit/audit.log

Examples

Example 1: Failed Service Analysis

# List failed services
$ cat sos_commands/systemd/systemctl_list-units_--failed
  UNIT                    LOAD   ACTIVE SUB    DESCRIPTION
● httpd.service          loaded failed failed Apache Web Server
● postgresql.service     loaded failed failed PostgreSQL database

# Find failure reason in logs
$ grep "httpd.service" sos_commands/logs/journalctl_--no-pager | grep -i "failed\|error"
Jan 15 10:23:45 server systemd[1]: httpd.service: Main process exited, code=exited, status=1/FAILURE
Jan 15 10:23:45 server systemd[1]: httpd.service: Failed with result 'exit-code'
Jan 15 10:23:45 server httpd[12345]: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:80

# Interpretation: httpd failed because port 80 is already in use

Example 2: SELinux Denial Analysis

# Check for AVC denials
$ grep "avc.*denied" var/log/audit/audit.log | head -5
type=AVC msg=audit(1705320245.123:456): avc: denied { write } for pid=1234 comm="httpd" name="index.html" dev="sda1" ino=789012 scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:object_r:user_home_t:s0 tclass=file permissive=0

# Interpretation:
# - httpd (web server) was denied write access
# - Target file: index.html with context user_home_t
# - Issue: the file is labeled user_home_t, which httpd_t is not allowed to write
# - Solution: restore the expected file context (e.g. with restorecon) or move the file to a properly labeled location

Example 3: Package Version Check

# Check for specific package versions
$ grep "^openssl" installed-rpms
openssl-1.1.1k-7.el8_6.x86_64
openssl-libs-1.1.1k-7.el8_6.x86_64

$ grep "^kernel" installed-rpms
kernel-4.18.0-425.el8.x86_64
kernel-4.18.0-477.el8.x86_64
kernel-core-4.18.0-425.el8.x86_64
kernel-core-4.18.0-477.el8.x86_64

# Interpretation:
# - OpenSSL version 1.1.1k (check for known CVEs)
# - Multiple kernels installed (good for rollback)
# - Current kernel is 4.18.0-477 (from uname)
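
The last interpretation step compares the running kernel with the installed kernels, and it can be automated. This is a minimal sketch. It assumes the uname file holds `uname -a` output and that installed-rpms lines begin with NAME-VERSION-RELEASE.ARCH. It also relies on GNU `sort -V` for version ordering. check_pending_kernel_reboot is a hypothetical name.

```shell
# Hypothetical helper: compare the running kernel (from uname) with the
# newest "kernel" package listed in installed-rpms.
check_pending_kernel_reboot() {
  local root="${1:-.}"
  local running newest
  running=$(awk '{print $3}' "$root/uname" 2>/dev/null) || true
  # Keep only the plain "kernel" package (not kernel-core, kernel-headers, ...)
  # and strip the "kernel-" prefix so it matches the uname release string
  newest=$(awk '$1 ~ /^kernel-[0-9]/ { sub(/^kernel-/, "", $1); print $1 }' \
    "$root/installed-rpms" 2>/dev/null | sort -V | tail -n 1)

  echo "Running kernel: ${running:-unknown}"
  echo "Newest installed kernel: ${newest:-none found}"
  if [ -n "$running" ] && [ -n "$newest" ] && [ "$running" != "$newest" ]; then
    echo "WARNING: running kernel is not the newest installed kernel (reboot pending?)"
  fi
}
```

With the package list above and uname reporting 4.18.0-425.el8.x86_64, the helper flags the host. If uname reports 4.18.0-477.el8.x86_64, it does not.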

Tips for Effective Analysis

  1. Check service dependencies: a failed service may be caused by a failed dependency
  2. Correlate with logs: service failures usually leave detailed errors in the logs
  3. Verify configurations: check service config files for syntax errors
  4. Consider timing: note when the service failed and correlate it with other system events
  5. SELinux context matters: file contexts must match what the policy expects
  6. Package versions: compare with known good and known bad versions
  7. Uptime significance: a very long uptime may mean kernel and security updates have not been applied

Common Configuration Patterns and Issues

  1. Service dependency failure: ServiceB fails because ServiceA is not running
  2. Port conflict: a service fails to bind because the port is already in use
  3. Permission denied: a service cannot access required files or directories
  4. SELinux blocking: a service is denied access by SELinux policy
  5. Missing dependencies: a required package is not installed
  6. Configuration error: a syntax error in a config file
  7. Resource limits: a service hits a ulimit (open files, processes, etc.)
  8. Outdated kernel: the running kernel is older than the newest installed kernel package (reboot pending)

Configuration Issue Severity Classification

Issue Type                 Severity   Impact
-------------------------  ---------  ---------------------------------
Critical service failed    High       Core functionality unavailable
Optional service failed    Low        Non-essential feature unavailable
SELinux in permissive      Warning    Reduced security, hiding issues
SELinux disabled           Critical   No mandatory access control
Kernel very outdated       High       Missing security fixes
EOL OS version             Critical   No security updates
Many AVC denials           Warning    Policy may need tuning

See Also

  • Logs Analysis Skill: For detailed service failure log analysis
  • Resource Analysis Skill: For resource limit issues
  • Network Analysis Skill: For network service configuration

Quick Install

/plugin add https://github.com/openshift-eng/ai-helpers/tree/main/system-config-analysis

Copy and paste this command in Claude Code to install this skill

GitHub Repository

openshift-eng/ai-helpers
Path: plugins/sosreport/skills/system-config-analysis
