diff --git a/scripts/CLASSIFICATION_MATRIX.md b/scripts/CLASSIFICATION_MATRIX.md new file mode 100644 index 000000000..5b7a3ac6b --- /dev/null +++ b/scripts/CLASSIFICATION_MATRIX.md @@ -0,0 +1,411 @@ +# Cosmos DB Connectivity Diagnostic - Classification Matrix & Support Guide + +## Classification Decision Tree + +``` +START: Run diagnostic script + │ + ├─→ DNS Resolution Check + │ │ + │ ├─→ ❌ FAILED + │ │ └─→ Classification: dns_resolution_failed + │ │ Action: DNS/VPN/proxy troubleshooting + │ │ + │ └─→ ✓ PASSED + │ │ + │ ├─→ Resolved IP is RFC 1918 (10.x, 172.16-31.x, 192.168.x)? + │ │ │ + │ │ ├─→ YES (Private endpoint detected) + │ │ │ │ + │ │ │ └─→ TCP 443 Test + │ │ │ │ + │ │ │ ├─→ ❌ FAILED + │ │ │ │ └─→ private_endpoint_network_path_blocked + │ │ │ │ (VPN route, NSG, firewall, UDR, peering) + │ │ │ │ + │ │ │ └─→ ✓ PASSED + │ │ │ └─→ Check RBAC + │ │ │ + │ │ └─→ NO (Public endpoint) + │ │ │ + │ │ └─→ TCP 443 Test + │ │ │ + │ │ ├─→ ❌ FAILED + │ │ │ └─→ tcp_connectivity_blocked + │ │ │ (Firewall, ISP, proxy) + │ │ │ + │ │ └─→ ✓ PASSED + │ │ └─→ network_connectivity_healthy + │ │ + │ └─→ Check Azure Configuration & RBAC + │ │ + │ ├─→ Azure CLI authenticated? + │ │ ├─→ NO → Skip ARM checks, mark warning + │ │ └─→ YES → Query network config & roles + │ │ + │ └─→ Sufficient permissions? + │ ├─→ NO → rbac_insufficient + │ └─→ YES → All checks passed +``` + +--- + +## Classification Code Reference + +### Success Codes + +#### `network_connectivity_healthy` +- **Status:** success +- **When:** DNS resolves AND TCP 443 succeeds +- **Interpretation:** Local network is working. If Cosmos DB operations fail, issue is auth/RBAC/data-plane. +- **Actions:** + - Verify RBAC/authentication permissions + - Check account firewall IP rules + - Verify data-plane token hasn't expired + - Check application logs for specific errors + +--- + +### Failure Codes + +#### `dns_resolution_failed` +- **Status:** failure +- **When:** DNS lookup fails with SocketException or timeout +- **Interpretation:** Cannot resolve account hostname to any IP +- **Root Causes:** + - DNS server misconfiguration + - VPN/proxy intercepting DNS queries + - Corporate proxy redirecting .documents.azure.com + - Network unreachable before DNS server + - ISP DNS failure +- **Actions:** + 1. Check VPN/proxy DNS settings + 2. Run `nslookup ` + 3. Try alternate DNS: `nslookup 8.8.8.8` + 4. Ping endpoint: `ping ` + 5. Contact network team if no resolution + +--- + +#### `tcp_connectivity_blocked` +- **Status:** failure +- **When:** DNS succeeds BUT TCP 443 fails +- **Interpretation:** Network path blocked between client and endpoint +- **Root Causes (Public Endpoint):** + - Corporate firewall blocking outbound 443 + - ISP blocking Cosmos/Azure IPs + - Regional geo-blocking + - HTTPS inspection proxy interfering + - Host-level firewall (Windows Defender, etc.) +- **Root Causes (Private Endpoint):** + - VPN not configured for private endpoint subnet + - Route not established between VPN subnet and private endpoint subnet + - NSG rules blocking 443 inbound on PE subnet + - NVA/firewall dropping packets + - UDR misconfiguration + - VNet peering not configured or expired + - Private DNS zone misconfiguration +- **Actions:** + 1. Run `Test-NetConnection -ComputerName -Port 443 -TraceRoute` + 2. If private endpoint: Ask network team to verify VPN routing + 3. Check host firewall (Windows Defender, Mac firewall, Linux iptables) + 4. If corporate proxy: Verify HTTPS inspection not blocking certificates + 5. Try from different network to isolate source + +--- + +#### `private_endpoint_network_path_blocked` +- **Status:** failure +- **When:** Resolved to private IP (10.x, 172.16-31.x, 192.168.x) BUT TCP 443 fails +- **Interpretation:** Private endpoint detected but cannot reach it—network path issue +- **Root Causes:** + - VPN client subnet → private endpoint subnet routing broken + - Firewall/NVA blocking internal traffic + - NSG with restrictive rules on PE subnet + - UDR pointing to wrong next hop + - VNet peering not established + - Private DNS zone not configured or stale +- **Actions:** + 1. Confirm VPN is connected and assigned correct subnet + 2. Ask network team to verify routing: `route print` (Windows) or `netstat -rn` (Linux/Mac) + 3. Check Azure NSG rules on private endpoint subnet for port 443 inbound + 4. Verify private DNS zone has A record pointing to PE IP + 5. Check if VNet peering exists and is Active + 6. Run `Test-NetConnection -ComputerName -Port 443` directly to PE IP + 7. Provide network team with source IP from script output + +--- + +### Warning Codes + +#### `rbac_insufficient` +- **Status:** warning +- **When:** Network OK BUT caller lacks data-plane permissions +- **Interpretation:** Network is healthy, but RBAC prevents data operations +- **Actions:** + 1. Request Cosmos DB Operator or Contributor role assignment + 2. If using connection strings: ensure account hasn't been regenerated + 3. Check data-plane RBAC (if enabled) via Azure CLI: `az role assignment list --scope ` + +--- + +#### `private_endpoint_mismatch` +- **Status:** warning +- **When:** Resolved IP differs from expected private endpoint IP +- **Interpretation:** Routing may be asymmetric or PE configuration changed +- **Actions:** + 1. Verify private endpoint IP hasn't changed in Azure Portal + 2. Ask network team to check asymmetric routing (DNS from corp vs VPN DNS) + 3. Flush DNS cache: `ipconfig /flushdns` (Windows) or `sudo dscacheutil -flushcache` (Mac) + +--- + +#### `azure_config_check_skipped` +- **Status:** warning +- **When:** Azure CLI not authenticated or not installed +- **Interpretation:** Cannot validate ARM-level network config (firewall rules, PE connections) +- **Actions:** + 1. Install Azure CLI: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli + 2. Authenticate: `az login` + 3. Re-run script to collect ARM-level diagnostics + +--- + +#### `unknown_error` +- **Status:** failure or warning +- **When:** Unhandled condition or unexpected error +- **Interpretation:** Script encountered something not in the matrix +- **Actions:** + 1. Check script output for error details + 2. Provide full JSON report to support + +--- + +## Support Playbook + +### Tier 1: Triage (ICM Responder) + +**When customer reports: "Cosmos DB operations return HTTP 0.0 / connection errors"** + +1. **Ask customer to run script:** + ```powershell + .\Diagnose-CosmosConnectivity.ps1 -Interactive + ``` + +2. **Receive JSON output. Check classification.code:** + + | Code | Response | + |------|----------| + | `network_connectivity_healthy` | → Escalate to data-plane/auth team. This is not a network issue. | + | `dns_resolution_failed` | → Run script playbook below | + | `tcp_connectivity_blocked` (public endpoint) | → Run TCP failed / public endpoint playbook | + | `private_endpoint_network_path_blocked` | → Run private endpoint playbook | + | `rbac_insufficient` | → Check RBAC permissions | + | `azure_config_check_skipped` | → Ask customer to run `az login` and re-run | + +3. **Document:** + - Save JSON report in ICM + - Note classification code and recommended actions + - Link to this support guide in response + +--- + +### Playbook: DNS Resolution Failed + +**Symptoms:** `dns_resolution_failed` code + +**Steps:** + +1. **Verify endpoint name with customer:** + - Check it matches Azure Portal > Cosmos Account > URI + - Typos are common + +2. **Customer self-service:** + - Ask: "Can you manually run nslookup?" + ```powershell + nslookup my-cosmos-account.documents.azure.com + ``` + - If nslookup fails → Likely VPN/proxy DNS redirect + - If nslookup succeeds but script fails → Check DNS servers in script output vs nslookup + +3. **If behind corporate proxy:** + - Ask: "Is your traffic routed through a corporate proxy?" + - If YES: Proxy may be intercepting DNS or blocking .documents.azure.com + - Action: Customer should contact corporate network team + +4. **If using VPN:** + - Ask: "Does DNS work when you disconnect from VPN?" + - If YES → VPN DNS redirect issue + - Action: Customer should contact VPN admin + +5. **Escalation:** + - If all above fail, ask customer to contact their ISP or network provider + - This is not a Cosmos issue; it's upstream DNS + +--- + +### Playbook: TCP 443 Failed / Public Endpoint + +**Symptoms:** `tcp_connectivity_blocked` code with public IP + +**Steps:** + +1. **Customer runs detailed trace:** + ```powershell + Test-NetConnection -ComputerName -Port 443 -TraceRoute + ``` + +2. **Analyze output:** + - Does it reach gateway/ISP? + - Where does it drop? + +3. **If corporate network:** + - Check with network team if 443 outbound is allowed to Azure + - May need to whitelist docs.microsoft.com or documents.azure.com + +4. **If ISP/home network:** + - Try from mobile hotspot to rule out ISP blocking + - If hotspot works → ISP is blocking Azure + +5. **If Windows Defender Firewall:** + - Check Windows Defender Firewall for outbound rules + - Ensure 443 is not blocked + +6. **If behind proxy:** + - Proxy may be doing HTTPS inspection + - Ask IT if they use SSL Bump/HTTPS Inspection + - May need to disable inspection for documents.azure.com or accept custom cert + +--- + +### Playbook: Private Endpoint Network Path Blocked + +**Symptoms:** `private_endpoint_network_path_blocked` code + +**Steps:** + +1. **Gather critical info from customer:** + - Source IP (from script output: `execution.hostname` and `diagnostics.tcp.sourceIp`) + - Resolved PE IP (from script: `diagnostics.dns.addresses[0]`) + - Is VPN connected? + - Which VPN client? + +2. **Customer provides to network team:** + - "TCP from [source-IP] to [PE-IP]:443 is timing out" + - "Please verify routing from VPN subnet to PE subnet" + - "Please check NSGs for port 443 inbound on PE subnet" + +3. **Network team should check:** + - Route table: Does VPN subnet have route to PE subnet? + - NSG: PE subnet NSG allows inbound 443? + - NVA/Firewall: Any stateful filtering blocking traffic? + - UDR: Any User Defined Routes sending traffic wrong way? + - VNet peering: If PE in different VNet, is peering configured? + - Private DNS: Does private DNS zone have A record for PE IP? + +4. **Cosmos team role:** + - Verify account has private endpoint connection in Approved state + - Check if PE IP matches what Azure reports + - Provide PE connection details from Azure Portal + +5. **Escalation criteria:** + - If routing is correct but still fails → May be NSG inside PE subnet (rare) + - If all checks pass → Escalate to Azure Networking support + +--- + +### Playbook: RBAC Insufficient + +**Symptoms:** `rbac_insufficient` code + +**Steps:** + +1. **Check role assignments:** + ```powershell + az role assignment list --scope /subscriptions//resourceGroups//providers/Microsoft.DocumentDB/databaseAccounts/ + ``` + +2. **Assign appropriate role:** + - Cosmos DB Operator (read/write data) + - Cosmos DB Account Reader (read-only) + - Contributor or Owner (full management) + +3. **If using master key:** + - Primary/secondary keys are still valid if account hasn't been regenerated + - Ask: Has the account been regenerated recently? + - If yes, old keys won't work + +--- + +## JSON Parsing for Automation + +### Python Example (Support Bot) + +```python +import json + +def parse_cosmos_diagnostic(json_data): + report = json.loads(json_data) + + classification = report.get("classification", {}) + code = classification.get("code") + status = classification.get("status") + + # Route based on code + if code == "network_connectivity_healthy": + return "Escalate: Auth/RBAC team" + elif code == "dns_resolution_failed": + return "Run DNS playbook" + elif code == "tcp_connectivity_blocked": + endpoint = report["target"]["endpointUrl"] + if "10." in report["diagnostics"]["dns"]["addresses"][0]: + return "Run Private Endpoint playbook" + else: + return "Run TCP Failure / Public Endpoint playbook" + elif code == "private_endpoint_network_path_blocked": + return "Run Private Endpoint playbook" + elif code == "rbac_insufficient": + return "Check RBAC: " + str(report["diagnostics"]["rbac"]["roleAssignments"]) + else: + return "Unknown code: " + code +``` + +### Support Ticket Template + +``` +COSMOS DB CONNECTIVITY ISSUE - DIAGNOSTIC RECEIVED + +Classification: [classification.code] +Status: [classification.status] +Summary: [classification.summary] + +Network Diagnostics: + DNS Resolution: [diagnostics.dns.succeeded] + TCP 443 Connectivity: [diagnostics.tcp.succeeded] + HTTPS Reachability: [diagnostics.https.statusCode] + Private Endpoint: [diagnostics.privateNetwork.isPrivateRange] + +Azure Configuration: + Public Network Restricted: [diagnostics.azureNetworkConfig.publicNetworkAccessRestricted] + Private Endpoints: [diagnostics.azureNetworkConfig.privateEndpoints.length] configured + +RBAC Status: + Classification: [diagnostics.rbac.classification] + Can Read Account: [diagnostics.rbac.canReadAccount] + Can Manage Account: [diagnostics.rbac.canManageAccount] + +Recommended Actions: +[classification.recommendedActions joined with newlines] + +Next Step: +[routing based on classification.code] +``` + +--- + +## References + +- [Azure Cosmos DB Troubleshoot Connectivity Issues](https://learn.microsoft.com/en-us/azure/cosmos-db/troubleshoot-connection) +- [Private Endpoints for Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-configure-private-endpoints) +- [Network Security Groups](https://learn.microsoft.com/en-us/azure/virtual-network/network-security-groups-overview) +- [User Defined Routes](https://learn.microsoft.com/en-us/azure/virtual-network/virtual-networks-udr-overview) diff --git a/scripts/DIAGNOSTIC_SCHEMA.md b/scripts/DIAGNOSTIC_SCHEMA.md new file mode 100644 index 000000000..db2746a92 --- /dev/null +++ b/scripts/DIAGNOSTIC_SCHEMA.md @@ -0,0 +1,460 @@ +# Cosmos DB Connectivity Diagnostic - JSON Schema v1.0 + +## Overview +The diagnostic script outputs a structured JSON report containing network connectivity, private network configuration, and RBAC assessment data. This schema is stable and versioned to support parsing and triage automation. + +## Root Object + +```json +{ + "version": "1.0.0", // Schema version (semantic versioning) + "timestamp": "2026-05-13T14:30:45.123Z", // ISO 8601 UTC timestamp + "target": {...}, // Account and subscription context + "execution": {...}, // Script execution environment + "diagnostics": {...}, // All diagnostic results + "classification": {...} // Automated classification and recommendations +} +``` + +--- + +## Target Object +Account and subscription identifiers. + +```json +{ + "target": { + "endpointUrl": "https://my-cosmos-account.documents.azure.com", + "hostname": "my-cosmos-account.documents.azure.com", + "subscriptionId": "12345678-1234-1234-1234-123456789012", // May be "REDACTED" if --Redact flag used + "resourceGroup": "my-rg", // May be "REDACTED" + "accountName": "my-cosmos-account" // May be "REDACTED" + } +} +``` + +--- + +## Execution Object +Environment where script ran. + +```json +{ + "execution": { + "hostname": "DESKTOP-ABC123", // Machine name + "platform": "Windows 10", // OS name and version + "powershellVersion": "7.3.0" // PowerShell version + } +} +``` + +--- + +## Diagnostics Object +All diagnostic results grouped by category. + +```json +{ + "diagnostics": { + "dns": { ... }, // DNS resolution results + "tcp": { ... }, // TCP 443 connectivity results + "https": { ... }, // HTTPS probe results + "privateNetwork": { ... }, // Private endpoint indicators + "azureNetworkConfig": { ... }, // ARM-sourced network configuration + "rbac": { ... }, // RBAC assessment + "azureCli": { ... } // Azure CLI context + } +} +``` + +### DNS Results + +```json +{ + "dns": { + "hostname": "my-cosmos-account.documents.azure.com", + "succeeded": true, // true = hostname resolved + "addresses": [ + "52.180.123.45", // Resolved IPv4 addresses + "2607:f8b0:4005:806::200e" // IPv6 if available + ], + "error": null, // Error message if resolution failed + "dnsServers": [ + "8.8.8.8", // Detected DNS servers + "8.8.4.4" + ], + "latencyMs": 145 // DNS query latency in milliseconds + } +} +``` + +**Classification logic:** +- `succeeded: false` → DNS failure, likely network or DNS configuration issue +- `succeeded: true` with `addresses` containing private IP (10.x, 172.16-31.x, 192.168.x) → Private endpoint +- `succeeded: true` with `addresses` containing public IP → Public endpoint + +### TCP Connectivity Results + +```json +{ + "tcp": { + "hostname": "my-cosmos-account.documents.azure.com", + "port": 443, + "succeeded": true, // true = TCP 443 connection established + "error": null, // Error message if connection failed (e.g., "Connection timeout after 5000ms") + "latencyMs": 87, // Connection latency + "sourceIp": "192.168.1.100" // Local IP used for connection attempt + } +} +``` + +**Classification logic:** +- `succeeded: false` with DNS resolved → Network path blocked +- `error` contains "timeout" → VPN/firewall/NVA may be dropping packets +- `error` contains "refused" → Target may be rejecting connections + +### HTTPS Probe Results + +```json +{ + "https": { + "url": "https://my-cosmos-account.documents.azure.com", + "succeeded": true, // true = HTTP 200-299 response + "statusCode": 401, // HTTP status code (401 expected without auth) + "error": null, // TLS/connection errors + "latencyMs": 234 // Full request round-trip latency + } +} +``` + +**Classification logic:** +- `succeeded: true` (any 2xx/4xx status) → Can reach endpoint +- `statusCode: 401` → Expected (no credentials), network is healthy +- `error` contains "certificate" or "TLS" → Certificate validation issue +- `error` and `succeeded: false` → Network or firewall blocking TLS + +### Private Network Indicators + +```json +{ + "privateNetwork": { + "isPrivateRange": true, // true if any resolved IP is RFC 1918 + "indicators": [ + "Resolved to RFC 1918 private IP range (10.123.171.30)", + "Matches expected private endpoint IP (10.123.171.30)" + ], + "matchesExpectedPrivateEndpoint": true, // true if resolved IP matches PrivateEndpointIP parameter + "vpnRouteWarning": null // Warning if VPN subnet routing appears blocked + } +} +``` + +### Azure Network Configuration + +```json +{ + "azureNetworkConfig": { + "checked": true, // true if successfully queried via Azure CLI + "publicNetworkAccessRestricted": true, // true if public network access is disabled + "privateEndpoints": [ + { + "id": "/subscriptions/.../privateEndpointConnections/my-pe-connection", + "state": "Approved" // Status: Approved, Pending, Rejected + } + ], + "vnetRules": [ ], // Virtual network rules (firewall) + "error": null // Error if Azure CLI query failed + } +} +``` + +### RBAC Assessment + +```json +{ + "rbac": { + "checked": true, // true if RBAC checked successfully + "canReadAccount": true, // true if caller can read account properties + "canManageAccount": false, // true if caller has Contributor/Owner + "canExecuteDataPlaneOps": true, // true if caller likely has data-plane roles + "roleAssignments": [ + { + "roleDefinitionName": "Cosmos DB Operator", + "principalName": "user@example.com" + } + ], + "classification": "partial", // Enum: "sufficient", "partial", "insufficient", "unknown" + "error": null // Error message if check failed + } +} +``` + +### Azure CLI Context + +```json +{ + "azureCli": { + "installed": true, // true if Azure CLI is installed + "authenticated": true, // true if 'az login' was successful + "currentUser": "user@example.com", // May be "REDACTED-USER-NAME" + "currentTenant": "12345678-1234-1234-1234-123456789012", // May be "REDACTED-TENANT-ID" + "currentSubscription": "abcdef01-2345-6789-abcd-ef0123456789", + "error": null // Error if CLI not installed or not authenticated + } +} +``` + +--- + +## Classification Object +Automated classification with recommendations. + +```json +{ + "classification": { + "status": "failure", // Enum: "success", "failure", "warning", "unknown" + "code": "tcp_connectivity_blocked", // Machine-readable classification code + "summary": "DNS resolution succeeded but TCP 443 connection failed. Network path is blocked.", + "rootCause": "Private endpoint configured but network path blocked (VPN routing, firewall/NVA, NSG, UDR, or peering issue)", + "recommendedActions": [ + "1. Verify VPN connectivity and that your client subnet can route to the private endpoint subnet", + "2. Ask your network team to verify routing between DESKTOP-ABC123 and private endpoint 10.123.171.30", + "3. Check Azure network security groups (NSGs) rules for port 443 inbound", + "4. Verify Azure Virtual Network peering and User Defined Routes (UDRs)", + "5. Check if corporate firewall/NVA is blocking the connection", + "6. Manually run: Test-NetConnection -ComputerName my-cosmos-account.documents.azure.com -Port 443" + ] + } +} +``` + +### Classification Codes Reference + +| Code | Status | Meaning | Likely Cause | +|------|--------|---------|--------------| +| `dns_resolution_failed` | failure | Hostname cannot resolve | DNS misconfiguration, proxy redirect, network unreachable | +| `tcp_connectivity_blocked` | failure | DNS works, TCP 443 fails | Firewall, VPN routing, NVA, NSG, private path blocked | +| `private_endpoint_network_path_blocked` | failure | Private endpoint detected, TCP fails | VPN → private endpoint routing broken | +| `network_connectivity_healthy` | success | DNS and TCP both work | Network is healthy; check auth/RBAC if operations fail | +| `rbac_insufficient` | warning | Network OK, but RBAC limited | User lacks data-plane roles | +| `private_endpoint_mismatch` | warning | Resolved to different IP than expected | Private endpoint routing may be asymmetric or misconfigured | +| `azure_config_check_skipped` | warning | Azure CLI not authenticated | Can't validate ARM-level network configuration | + +--- + +## Redacted Output + +When script is invoked with `-Redact` flag: + +```json +{ + "target": { + "endpointUrl": "REDACTED", + "hostname": "my-cosmos-account.documents.azure.com", // Hostname kept (needed for triage) + "subscriptionId": "REDACTED-SUBSCRIPTION-ID", + "resourceGroup": "REDACTED", + "accountName": "REDACTED" + }, + "diagnostics": { + "azureCli": { + "currentUser": "REDACTED-USER-NAME", + "currentTenant": "REDACTED-TENANT-ID" + }, + "rbac": { + "roleAssignments": [ + { + "roleDefinitionName": "Cosmos DB Operator", + "principalName": "REDACTED-PRINCIPAL-NAME" + } + ] + } + } +} +``` + +--- + +## Sample Outputs + +### Scenario 1: Network Healthy (Public Endpoint) + +```json +{ + "version": "1.0.0", + "timestamp": "2026-05-13T14:30:45Z", + "target": { + "endpointUrl": "https://my-cosmos.documents.azure.com", + "hostname": "my-cosmos.documents.azure.com", + "subscriptionId": "12345678-1234-1234-1234-123456789012", + "resourceGroup": "my-rg", + "accountName": "my-cosmos" + }, + "diagnostics": { + "dns": { + "hostname": "my-cosmos.documents.azure.com", + "succeeded": true, + "addresses": ["52.180.123.45"], + "error": null, + "latencyMs": 12 + }, + "tcp": { + "hostname": "my-cosmos.documents.azure.com", + "port": 443, + "succeeded": true, + "error": null, + "latencyMs": 45, + "sourceIp": "192.168.1.100" + }, + "https": { + "url": "https://my-cosmos.documents.azure.com", + "succeeded": true, + "statusCode": 401, + "error": null, + "latencyMs": 78 + }, + "privateNetwork": { + "isPrivateRange": false, + "indicators": [], + "matchesExpectedPrivateEndpoint": false, + "vpnRouteWarning": null + } + }, + "classification": { + "status": "success", + "code": "network_connectivity_healthy", + "summary": "Network connectivity is healthy. DNS resolves and TCP 443 is reachable.", + "rootCause": null, + "recommendedActions": [ + "✓ Local network connectivity is working", + "If Cosmos DB operations still fail, check:", + " - RBAC/authentication permissions", + " - Account firewall IP rules (if enabled)", + " - Data plane token expiry", + " - Application-level issues (connection strings, SDK versions)" + ] + } +} +``` + +### Scenario 2: Private Endpoint Path Blocked + +```json +{ + "version": "1.0.0", + "timestamp": "2026-05-13T14:35:22Z", + "target": { + "endpointUrl": "https://my-cosmos-pe.documents.azure.com", + "hostname": "my-cosmos-pe.documents.azure.com", + "subscriptionId": "12345678-1234-1234-1234-123456789012", + "resourceGroup": "my-rg", + "accountName": "my-cosmos-pe" + }, + "diagnostics": { + "dns": { + "hostname": "my-cosmos-pe.documents.azure.com", + "succeeded": true, + "addresses": ["10.123.171.30"], + "error": null, + "latencyMs": 8 + }, + "tcp": { + "hostname": "my-cosmos-pe.documents.azure.com", + "port": 443, + "succeeded": false, + "error": "Connection timeout after 5000ms", + "latencyMs": 0, + "sourceIp": null + }, + "privateNetwork": { + "isPrivateRange": true, + "indicators": [ + "Resolved to RFC 1918 private IP range (10.123.171.30)", + "Matches expected private endpoint IP (10.123.171.30)" + ], + "matchesExpectedPrivateEndpoint": true, + "vpnRouteWarning": "Private endpoint IP detected but TCP 443 failed. Likely VPN → PE route blocked." + } + }, + "classification": { + "status": "failure", + "code": "private_endpoint_network_path_blocked", + "summary": "DNS resolution succeeded but TCP 443 connection failed to private endpoint. Network path is blocked.", + "rootCause": "Private endpoint network path blocked (VPN routing, firewall/NVA, NSG, UDR, or peering issue)", + "recommendedActions": [ + "1. Verify VPN connectivity and that your client subnet can route to the private endpoint subnet", + "2. Ask your network team to verify routing from 10.249.14.218 to private endpoint 10.123.171.30", + "3. Check Azure network security groups (NSGs) rules for port 443 inbound on private endpoint subnet", + "4. Verify Azure Virtual Network peering and User Defined Routes (UDRs)", + "5. Check if corporate firewall/NVA is blocking the connection", + "6. Manually run: Test-NetConnection -ComputerName my-cosmos-pe.documents.azure.com -Port 443" + ] + } +} +``` + +### Scenario 3: DNS Resolution Failed + +```json +{ + "version": "1.0.0", + "timestamp": "2026-05-13T14:40:10Z", + "target": { + "endpointUrl": "https://my-cosmos-invalid.documents.azure.com", + "hostname": "my-cosmos-invalid.documents.azure.com" + }, + "diagnostics": { + "dns": { + "hostname": "my-cosmos-invalid.documents.azure.com", + "succeeded": false, + "addresses": [], + "error": "No such host is known", + "dnsServers": ["8.8.8.8"], + "latencyMs": 2342 + }, + "tcp": { + "hostname": "my-cosmos-invalid.documents.azure.com", + "port": 443, + "succeeded": false, + "error": "No such host is known", + "latencyMs": 0, + "sourceIp": null + } + }, + "classification": { + "status": "failure", + "code": "dns_resolution_failed", + "summary": "DNS resolution failed. The Cosmos DB endpoint hostname cannot be resolved.", + "rootCause": "DNS configuration, VPN/proxy DNS redirect, or network connectivity issue", + "recommendedActions": [ + "1. Check if you are connected to corporate VPN or proxy that intercepts DNS", + "2. Manually run: nslookup my-cosmos-invalid.documents.azure.com", + "3. If nslookup fails, check with your network team or ISP", + "4. Try pinging the endpoint or using nslookup with alternate DNS: nslookup my-cosmos-invalid.documents.azure.com 8.8.8.8" + ] + } +} +``` + +--- + +## Parsing Guidelines + +Implementers parsing this JSON should: + +1. **Always check version**: Fields may differ in future versions. Parse defensively. +2. **Use classification.code not status**: Status is user-facing; code is machine-readable for routing and automation. +3. **Check diagnostics.azureCli.authenticated**: If false, Azure configuration checks are unreliable. +4. **Prioritize classification.recommendedActions**: Contains context-specific guidance. +5. **Redacted fields**: May be null or "REDACTED" strings. Do not assume structure. +6. **Latency fields**: Milliseconds, may be 0 if unavailable. +7. **Handle missing fields**: Especially in older versions or on non-Windows platforms. + +--- + +## Version History + +### v1.0.0 (2026-05-13) +- Initial schema +- Includes DNS, TCP, HTTPS, private network, Azure config, and RBAC checks +- Classification codes stable +- Redaction support diff --git a/scripts/Diagnose-CosmosConnectivity.ps1 b/scripts/Diagnose-CosmosConnectivity.ps1 new file mode 100644 index 000000000..c103d93aa --- /dev/null +++ b/scripts/Diagnose-CosmosConnectivity.ps1 @@ -0,0 +1,699 @@ +#!/usr/bin/env pwsh +<# +.SYNOPSIS + Cosmos DB Connectivity Diagnostic Script + Captures local network connectivity, private network posture, and RBAC evidence. + +.DESCRIPTION + This script performs comprehensive network and access diagnostics for Cosmos DB accounts. + It can run in interactive or non-interactive mode and produces a JSON report for triage. + +.PARAMETER EndpointUrl + The Cosmos DB account endpoint URL. + Format: https://.documents.azure.com or https://.documents.azure.com:443/ + WHERE TO GET: Azure Portal > Cosmos DB Account > Overview tab > URI field + OR: Use the endpoint shown in Cosmos Explorer connection string + +.PARAMETER SubscriptionId + Azure subscription ID containing the Cosmos account. + WHERE TO GET: Azure Portal > Subscriptions > Copy Subscription ID + FORMAT: 12345678-1234-1234-1234-123456789012 + +.PARAMETER ResourceGroup + Azure resource group name containing the Cosmos account. + WHERE TO GET: Azure Portal > Cosmos DB Account > Resource group field (top-right) + +.PARAMETER AccountName + Cosmos DB account name. + WHERE TO GET: Azure Portal > Cosmos DB Account > Account Name field + Or extract from endpoint URL (part before .documents.azure.com) + +.PARAMETER PrivateEndpointIP + (Optional) Expected private endpoint IP if account uses private link. + WHERE TO GET: Azure Portal > Cosmos DB Account > Private Endpoint Connections tab > Private IP address column + +.PARAMETER VpnSubnetRange + (Optional) Customer's VPN/client subnet CIDR for route analysis. + FORMAT: 10.0.0.0/24 + WHERE TO GET: Ask your network team or check VPN client properties + +.PARAMETER Interactive + If specified, script prompts for missing parameters instead of requiring them as arguments. + +.PARAMETER Redact + If specified, output JSON redacts sensitive identifiers (tenant ID, subscription ID, usernames). + +.EXAMPLE + # Interactive mode - script will prompt for inputs + .\Diagnose-CosmosConnectivity.ps1 -Interactive + +.EXAMPLE + # Non-interactive with full parameters + .\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://my-cosmos-account.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "my-rg" ` + -AccountName "my-cosmos-account" + +.EXAMPLE + # With private endpoint and output redaction + .\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://my-cosmos-account.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "my-rg" ` + -AccountName "my-cosmos-account" ` + -PrivateEndpointIP "10.123.171.30" ` + -Redact +#> + +param( + [Parameter(ValueFromPipelineByPropertyName=$true)] + [ValidateScript({$_ -match "^https://[a-z0-9-]+\.documents\.azure\.com" -or $_ -match "^https://[a-z0-9-]+\.documents\.azure\.com:443"})] + [string]$EndpointUrl, + + [Parameter(ValueFromPipelineByPropertyName=$true)] + [guid]$SubscriptionId, + + [Parameter(ValueFromPipelineByPropertyName=$true)] + [string]$ResourceGroup, + + [Parameter(ValueFromPipelineByPropertyName=$true)] + [string]$AccountName, + + [Parameter(ValueFromPipelineByPropertyName=$true)] + [string]$PrivateEndpointIP, + + [Parameter(ValueFromPipelineByPropertyName=$true)] + [string]$VpnSubnetRange, + + [switch]$Interactive, + + [switch]$Redact +) + +# ============================================================================ +# Configuration +# ============================================================================ + +$ScriptVersion = "1.0.0" +$DiagnosticTimestamp = Get-Date -Format "o" +$TcpConnectTimeoutMs = 5000 +$DnsTimeoutMs = 5000 + +# ============================================================================ +# Helper Functions +# ============================================================================ + +function Show-InputInstructions { + Write-Host @" +═════════════════════════════════════════════════════════════════════════════ +COSMOS DB CONNECTIVITY DIAGNOSTIC SCRIPT v$ScriptVersion +═════════════════════════════════════════════════════════════════════════════ + +This script will collect network and access diagnostics for your Cosmos DB account. + +WHERE TO FIND YOUR INPUTS: +───────────────────────────────────────────────────────────────────────────── + +1. ENDPOINT URL (Required) + Location: Azure Portal > Cosmos DB Account > Overview tab + Look for: "URI" field + Example: https://my-cosmos-account.documents.azure.com + ⚠ Include https:// but do NOT include trailing slash or port suffix + +2. SUBSCRIPTION ID (Required) + Location: Azure Portal > Subscriptions + Look for: "Subscription ID" column or click your subscription > Copy ID + Format: 12345678-1234-1234-1234-123456789012 + +3. RESOURCE GROUP (Required) + Location: Azure Portal > Cosmos DB Account > Top-right corner + Look for: "Resource group" field + Example: my-production-rg + +4. ACCOUNT NAME (Required) + Location: Either extract from endpoint URL or find in portal + From URL: Take the part before ".documents.azure.com" + From Portal: Account name appears in the breadcrumb and overview + Example: my-cosmos-account + +5. PRIVATE ENDPOINT IP (Optional, but recommended) + Location: Azure Portal > Cosmos DB Account > Private Endpoint Connections + Look for: "Private IP address" column (only if private endpoints exist) + Format: 10.123.171.30 (will be 10.x.x.x or 172.16-31.x.x range) + Skip this if: You are using public endpoint only + +6. VPN SUBNET RANGE (Optional) + Location: Ask your network team or VPN client settings + Used to: Analyze if routing from your network to private endpoint is blocked + Format: 10.0.0.0/24 (CIDR notation) + Skip this if: You are not using a VPN + +═════════════════════════════════════════════════════════════════════════════ + +"@ +} + +function Read-InputsInteractively { + Show-InputInstructions + + Write-Host "Please provide the following information:" -ForegroundColor Cyan + Write-Host "" + + # Endpoint URL + do { + $endpoint = Read-Host "Endpoint URL (e.g., https://my-cosmos.documents.azure.com)" + if ($endpoint -notmatch "^https://[a-z0-9-]+\.documents\.azure\.com") { + Write-Host "Invalid format. Expected: https://.documents.azure.com" -ForegroundColor Yellow + } + } while ($endpoint -notmatch "^https://[a-z0-9-]+\.documents\.azure\.com") + + # Subscription ID + do { + $subId = Read-Host "Subscription ID (12345678-1234-1234-1234-123456789012)" + if ($subId -notmatch "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$") { + Write-Host "Invalid format. Expected GUID format." -ForegroundColor Yellow + } + } while ($subId -notmatch "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$") + + $rg = Read-Host "Resource Group name" + $account = Read-Host "Account Name" + $peIP = Read-Host "Private Endpoint IP (optional, press Enter to skip)" + $vpnSubnet = Read-Host "VPN Subnet Range (optional, e.g., 10.0.0.0/24, press Enter to skip)" + + return @{ + EndpointUrl = $endpoint + SubscriptionId = [guid]$subId + ResourceGroup = $rg + AccountName = $account + PrivateEndpointIP = if ($peIP) { $peIP } else { $null } + VpnSubnetRange = if ($vpnSubnet) { $vpnSubnet } else { $null } + } +} + +function Invoke-DnsResolution { + param([string]$Hostname) + + $result = @{ + hostname = $Hostname + succeeded = $false + addresses = @() + error = $null + dnsServers = @() + latencyMs = 0 + } + + try { + $stopwatch = [System.Diagnostics.Stopwatch]::StartNew() + $addresses = [System.Net.Dns]::GetHostAddresses($Hostname) + $stopwatch.Stop() + + $result.succeeded = $true + $result.addresses = @($addresses | ForEach-Object { $_.ToString() }) + $result.latencyMs = [int]$stopwatch.ElapsedMilliseconds + + # Try to get DNS servers (Windows/Linux specific) + if ($PSVersionTable.Platform -ne "Unix" -or $PSVersionTable.OS -like "*Linux*") { + try { + $dnsConfig = Get-DnsClientServerAddress -ErrorAction SilentlyContinue | Select-Object -First 1 + if ($dnsConfig) { + $result.dnsServers = @($dnsConfig.ServerAddresses) + } + } catch { } + } + } catch { + $result.error = $_.Exception.Message + } + + return $result +} + +function Invoke-TcpConnectivityTest { + param( + [string]$Hostname, + [int]$Port = 443, + [int]$TimeoutMs = 5000 + ) + + $result = @{ + hostname = $Hostname + port = $Port + succeeded = $false + error = $null + latencyMs = 0 + sourceIp = $null + } + + try { + $stopwatch = [System.Diagnostics.Stopwatch]::StartNew() + $tcpClient = New-Object System.Net.Sockets.TcpClient + $task = $tcpClient.ConnectAsync($Hostname, $Port) + $task.Wait($TimeoutMs) + $stopwatch.Stop() + + if ($task.IsCompleted) { + $result.succeeded = $true + $result.latencyMs = [int]$stopwatch.ElapsedMilliseconds + + # Try to get source IP + try { + $endpoint = $tcpClient.Client.LocalEndPoint + $result.sourceIp = $endpoint.Address.ToString() + } catch { } + } else { + $result.error = "Connection timeout after ${TimeoutMs}ms" + } + + $tcpClient.Close() + } catch { + $result.error = $_.Exception.Message + } + + return $result +} + +function Invoke-HttpsProbe { + param([string]$Url) + + $result = @{ + url = $Url + succeeded = $false + statusCode = $null + error = $null + latencyMs = 0 + } + + try { + $stopwatch = [System.Diagnostics.Stopwatch]::StartNew() + $response = Invoke-WebRequest -Uri $Url -Method Head -TimeoutSec 5 -ErrorAction Stop + $stopwatch.Stop() + + $result.succeeded = $true + $result.statusCode = [int]$response.StatusCode + $result.latencyMs = [int]$stopwatch.ElapsedMilliseconds + } catch { + $result.statusCode = [int]($_.Exception.Response.StatusCode) + $result.error = $_.Exception.Message + } + + return $result +} + +function Get-PrivateNetworkIndicators { + param( + [string[]]$ResolvedAddresses, + [string]$PrivateEndpointIP, + [string]$VpnSubnetRange + ) + + $result = @{ + isPrivateRange = $false + indicators = @() + matchesExpectedPrivateEndpoint = $false + vpnRouteWarning = $null + } + + # Check if resolved IPs are private range + foreach ($addr in $ResolvedAddresses) { + if (IsPrivateIpAddress $addr) { + $result.isPrivateRange = $true + $result.indicators += "Resolved to RFC 1918 private IP range ($addr)" + } + } + + # Check if matches expected private endpoint + if ($PrivateEndpointIP -and $ResolvedAddresses -contains $PrivateEndpointIP) { + $result.matchesExpectedPrivateEndpoint = $true + $result.indicators += "Matches expected private endpoint IP ($PrivateEndpointIP)" + } elseif ($PrivateEndpointIP -and $ResolvedAddresses.Count -gt 0) { + $result.indicators += "WARNING: Resolved to $($ResolvedAddresses[0]) but expected private endpoint IP is $PrivateEndpointIP" + } + + return $result +} + +function IsPrivateIpAddress { + param([string]$IpAddress) + + try { + $ip = [System.Net.IPAddress]::Parse($IpAddress) + # RFC 1918 ranges + if ($ip.ToString() -match "^10\." -or $ip.ToString() -match "^172\.(1[6-9]|2[0-9]|3[01])\." -or $ip.ToString() -match "^192\.168\.") { + return $true + } + # Loopback + if ($ip.AddressFamily -eq "InterNetwork" -and $ip.GetAddressBytes()[0] -eq 127) { + return $true + } + } catch { } + + return $false +} + +function Get-AzureCliContext { + $result = @{ + installed = $false + authenticated = $false + currentUser = $null + currentTenant = $null + currentSubscription = $null + error = $null + } + + try { + $output = & az --version 2>&1 + if ($LASTEXITCODE -eq 0) { + $result.installed = $true + } + } catch { + $result.error = "Azure CLI not found. Skipping Azure context checks." + return $result + } + + try { + $account = & az account show 2>&1 | ConvertFrom-Json + $result.authenticated = $true + $result.currentUser = $account.user.name + $result.currentTenant = $account.tenantId + $result.currentSubscription = $account.id + } catch { + $result.error = "Not authenticated with Azure CLI. Run 'az login' to proceed with Azure checks." + } + + return $result +} + +function Get-AzureAccountNetworkConfig { + param( + [guid]$SubscriptionId, + [string]$ResourceGroup, + [string]$AccountName + ) + + $result = @{ + checked = $false + publicNetworkAccessRestricted = $null + privateEndpoints = @() + vnetRules = @() + error = $null + } + + try { + $scope = "/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.DocumentDB/databaseAccounts/$AccountName" + $account = & az cosmosdb show --resource-group $ResourceGroup --name $AccountName 2>&1 | ConvertFrom-Json + + if ($account) { + $result.checked = $true + $result.publicNetworkAccessRestricted = $account.properties.publicNetworkAccess -eq "Disabled" + + # Get private endpoints + $peConnections = & az cosmosdb private-endpoint-connection list --resource-group $ResourceGroup --name $AccountName 2>&1 | ConvertFrom-Json + if ($peConnections) { + $result.privateEndpoints = @($peConnections | Select-Object -Property id, @{n='state';e={$_.properties.privateLinkServiceConnectionState.status}}) + } + } + } catch { + $result.error = $_.Exception.Message + } + + return $result +} + +function Get-RbacAssessment { + param( + [guid]$SubscriptionId, + [string]$ResourceGroup, + [string]$AccountName + ) + + $result = @{ + checked = $false + canReadAccount = $false + canManageAccount = $false + canExecuteDataPlaneOps = $false + roleAssignments = @() + classification = "unknown" + error = $null + } + + try { + $scope = "/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.DocumentDB/databaseAccounts/$AccountName" + + # Try to read account (implies Reader or higher) + $account = & az cosmosdb show --resource-group $ResourceGroup --name $AccountName 2>&1 | ConvertFrom-Json + if ($account) { + $result.checked = $true + $result.canReadAccount = $true + + # Check role assignments + $roles = & az role assignment list --scope $scope 2>&1 | ConvertFrom-Json + if ($roles) { + $result.roleAssignments = @($roles | Select-Object -Property roleDefinitionName, principalName) + + # Classify permissions + $roleNames = $roles | Select-Object -ExpandProperty roleDefinitionName + if ($roleNames -contains "Contributor" -or $roleNames -contains "Owner") { + $result.canManageAccount = $true + $result.canExecuteDataPlaneOps = $true + $result.classification = "sufficient" + } elseif ($roleNames -contains "Cosmos DB Operator" -or $roleNames -contains "Cosmos DB Account Reader") { + $result.canExecuteDataPlaneOps = $true + $result.classification = "partial" + } else { + $result.classification = "partial" + } + } + } + } catch { + $result.error = $_.Exception.Message + $result.classification = "insufficient" + } + + return $result +} + +function Invoke-Classification { + param( + [hashtable]$DnsResult, + [hashtable]$TcpResult, + [hashtable]$PrivateNetworkIndicators, + [hashtable]$AzureNetworkConfig + ) + + $classification = @{ + status = "unknown" + code = "unknown" + summary = "Unable to classify" + rootCause = $null + recommendedActions = @() + } + + # DNS failure + if (-not $DnsResult.succeeded) { + $classification.status = "failure" + $classification.code = "dns_resolution_failed" + $classification.summary = "DNS resolution failed. The Cosmos DB endpoint hostname cannot be resolved." + $classification.rootCause = "DNS configuration, VPN/proxy DNS redirect, or network connectivity issue" + $classification.recommendedActions = @( + "1. Check if you are connected to corporate VPN or proxy that intercepts DNS", + "2. Manually run: nslookup $($DnsResult.hostname)", + "3. If nslookup fails, check with your network team or ISP", + "4. Try pinging the endpoint or using nslookup with alternate DNS: nslookup $($DnsResult.hostname) 8.8.8.8" + ) + return $classification + } + + # DNS succeeded but TCP failed + if ($DnsResult.succeeded -and -not $TcpResult.succeeded) { + $classification.status = "failure" + $classification.code = "tcp_connectivity_blocked" + $classification.summary = "DNS resolution succeeded but TCP 443 connection failed. Network path is blocked." + + if ($PrivateNetworkIndicators.isPrivateRange) { + $classification.rootCause = "Private endpoint configured but network path blocked (VPN routing, firewall/NVA, NSG, UDR, or peering issue)" + $classification.recommendedActions = @( + "1. Verify VPN connectivity and that your client subnet can route to the private endpoint subnet", + "2. Ask your network team to verify routing between $([System.Net.Dns]::GetHostName()) and private endpoint $($DnsResult.addresses[0])", + "3. Check Azure network security groups (NSGs) rules for port 443 inbound", + "4. Verify Azure Virtual Network peering and User Defined Routes (UDRs)", + "5. Check if corporate firewall/NVA is blocking the connection", + "6. Manually run: Test-NetConnection -ComputerName $($DnsResult.hostname) -Port 443" + ) + } else { + $classification.rootCause = "Public endpoint network path blocked (firewall, proxy, ISP, or regional restriction)" + $classification.recommendedActions = @( + "1. Check if corporate firewall is blocking outbound port 443", + "2. If behind proxy, verify proxy settings allow HTTPS to documents.azure.com", + "3. Manually run: Test-NetConnection -ComputerName $($DnsResult.hostname) -Port 443", + "4. Try connecting from a different network to isolate the issue" + ) + } + return $classification + } + + # Both succeeded + if ($DnsResult.succeeded -and $TcpResult.succeeded) { + $classification.status = "success" + $classification.code = "network_connectivity_healthy" + $classification.summary = "Network connectivity is healthy. DNS resolves and TCP 443 is reachable." + $classification.rootCause = $null + $classification.recommendedActions = @( + "✓ Local network connectivity is working", + "If Cosmos DB operations still fail, check:", + " - RBAC/authentication permissions", + " - Account firewall IP rules (if enabled)", + " - Data plane token expiry", + " - Application-level issues (connection strings, SDK versions)" + ) + return $classification + } + + return $classification +} + +function Redact-Sensitive { + param([object]$Object) + + if (-not $Redact) { return $Object } + + $json = $Object | ConvertTo-Json -Depth 10 + $json = $json -replace [regex]::Escape($SubscriptionId.ToString()), "REDACTED-SUBSCRIPTION-ID" + + # Redact tenant IDs (GUIDs in certain fields) + $json = $json -replace '"currentTenant"\s*:\s*"[^"]*"', '"currentTenant": "REDACTED-TENANT-ID"' + + # Redact user names + $json = $json -replace '"currentUser"\s*:\s*"[^"]*"', '"currentUser": "REDACTED-USER-NAME"' + $json = $json -replace '"principalName"\s*:\s*"[^"]*"', '"principalName": "REDACTED-PRINCIPAL-NAME"' + + return $json | ConvertFrom-Json +} + +# ============================================================================ +# Main Execution +# ============================================================================ + +try { + # Validate and collect inputs + if ($Interactive -and -not $EndpointUrl) { + $inputs = Read-InputsInteractively + $EndpointUrl = $inputs.EndpointUrl + $SubscriptionId = $inputs.SubscriptionId + $ResourceGroup = $inputs.ResourceGroup + $AccountName = $inputs.AccountName + $PrivateEndpointIP = $inputs.PrivateEndpointIP + $VpnSubnetRange = $inputs.VpnSubnetRange + } elseif (-not $EndpointUrl) { + Write-Host "No endpoint URL provided. Use -Interactive flag or provide parameters." -ForegroundColor Red + Show-InputInstructions + exit 1 + } + + # Extract hostname from URL + $uri = [System.Uri]$EndpointUrl + $hostname = $uri.Host + + Write-Host "Collecting diagnostics for: $hostname" -ForegroundColor Cyan + Write-Host "" + + # Run diagnostics + Write-Host "[1/5] DNS Resolution..." -ForegroundColor Cyan + $dnsResult = Invoke-DnsResolution -Hostname $hostname + + Write-Host "[2/5] TCP Connectivity (port 443)..." -ForegroundColor Cyan + $tcpResult = Invoke-TcpConnectivityTest -Hostname $hostname -Port 443 -TimeoutMs $TcpConnectTimeoutMs + + Write-Host "[3/5] HTTPS Probe..." -ForegroundColor Cyan + $httpsResult = Invoke-HttpsProbe -Url $EndpointUrl + + Write-Host "[4/5] Private Network Analysis..." -ForegroundColor Cyan + $privateNetIndicators = Get-PrivateNetworkIndicators -ResolvedAddresses $dnsResult.addresses -PrivateEndpointIP $PrivateEndpointIP -VpnSubnetRange $VpnSubnetRange + + Write-Host "[5/5] Azure Configuration & RBAC..." -ForegroundColor Cyan + $cliContext = Get-AzureCliContext + $networkConfig = @{ checked = $false; error = "Skipped" } + $rbacAssessment = @{ checked = $false; classification = "unknown"; error = "Skipped" } + + if ($cliContext.authenticated -and $SubscriptionId -and $ResourceGroup -and $AccountName) { + $networkConfig = Get-AzureAccountNetworkConfig -SubscriptionId $SubscriptionId -ResourceGroup $ResourceGroup -AccountName $AccountName + $rbacAssessment = Get-RbacAssessment -SubscriptionId $SubscriptionId -ResourceGroup $ResourceGroup -AccountName $AccountName + } elseif (-not $cliContext.authenticated) { + Write-Host " ⚠ Azure CLI not authenticated. Skipping Azure checks. Run 'az login' to enable." -ForegroundColor Yellow + } + + Write-Host "" + Write-Host "Generating classification..." -ForegroundColor Cyan + $classification = Invoke-Classification -DnsResult $dnsResult -TcpResult $tcpResult -PrivateNetworkIndicators $privateNetIndicators -AzureNetworkConfig $networkConfig + + # Build final report + $report = @{ + version = $ScriptVersion + timestamp = $DiagnosticTimestamp + target = @{ + endpointUrl = if ($Redact) { "REDACTED" } else { $EndpointUrl } + hostname = $hostname + subscriptionId = if ($Redact -and $SubscriptionId) { "REDACTED" } else { $SubscriptionId.ToString() } + resourceGroup = if ($Redact -and $ResourceGroup) { "REDACTED" } else { $ResourceGroup } + accountName = if ($Redact -and $AccountName) { "REDACTED" } else { $AccountName } + } + execution = @{ + hostname = [System.Net.Dns]::GetHostName() + platform = $PSVersionTable.OS + powershellVersion = $PSVersionTable.PSVersion.ToString() + } + diagnostics = @{ + dns = $dnsResult + tcp = $tcpResult + https = $httpsResult + privateNetwork = $privateNetIndicators + azureNetworkConfig = $networkConfig + rbac = $rbacAssessment + azureCli = $cliContext + } + classification = $classification + } + + # Redact if requested + if ($Redact) { + $report = Redact-Sensitive -Object $report + } + + # Output JSON report + $jsonReport = $report | ConvertTo-Json -Depth 10 + + # Save to file + $timestamp = Get-Date -Format "yyyyMMdd_HHmmss" + $outputFile = "cosmos-diagnostic-$timestamp.json" + $jsonReport | Out-File -FilePath $outputFile -Encoding UTF8 + + Write-Host "" + Write-Host "═════════════════════════════════════════════════════════════════════════════" -ForegroundColor Green + Write-Host "DIAGNOSTIC COMPLETE" -ForegroundColor Green + Write-Host "═════════════════════════════════════════════════════════════════════════════" -ForegroundColor Green + Write-Host "" + Write-Host "Summary:" -ForegroundColor Cyan + Write-Host " DNS Resolution: $(if ($dnsResult.succeeded) { '✓ PASS' } else { '✗ FAIL' })" + Write-Host " TCP Connectivity: $(if ($tcpResult.succeeded) { '✓ PASS' } else { '✗ FAIL' })" + Write-Host " Private Network: $(if ($privateNetIndicators.isPrivateRange) { 'Detected (Private Endpoint)' } else { 'Not Detected (Public Endpoint)' })" + Write-Host " Classification: $($classification.status.ToUpper()) - $($classification.code)" + Write-Host "" + Write-Host "Full report saved to: $outputFile" -ForegroundColor Green + Write-Host "" + Write-Host "Summary:" -ForegroundColor Yellow + Write-Host $classification.summary + Write-Host "" + if ($classification.recommendedActions.Count -gt 0) { + Write-Host "Recommended Actions:" -ForegroundColor Yellow + $classification.recommendedActions | ForEach-Object { Write-Host " $_" } + } + Write-Host "" + + # Output JSON to console for easy copy/paste + Write-Host "Full JSON Report:" -ForegroundColor Cyan + Write-Host "─────────────────────────────────────────────────────────────────────────────" + Write-Host $jsonReport + +} catch { + Write-Host "Error: $($_.Exception.Message)" -ForegroundColor Red + exit 1 +} diff --git a/scripts/INDEX.md b/scripts/INDEX.md new file mode 100644 index 000000000..8fa265f45 --- /dev/null +++ b/scripts/INDEX.md @@ -0,0 +1,352 @@ +# Cosmos DB Connectivity Diagnostic - Complete Documentation Index + +## 📦 Deliverables + +This folder contains a complete, production-ready diagnostic toolkit for troubleshooting Cosmos DB connectivity issues. Below is a guide to all files and their purpose. + +--- + +## 📚 Documentation Files + +### 1. **README.md** ← Start here +**Purpose:** Comprehensive usage guide for customers and support teams + +**Contains:** +- Overview and features +- Quick start in 3 modes (interactive, non-interactive, with redaction) +- Step-by-step guide to finding all inputs +- Understanding output format +- Common scenarios and examples +- Integration examples +- Troubleshooting guide +- Troubleshooting common issues + +**Read this if:** You're running the script for the first time or onboarding someone else + +--- + +### 2. **QUICK_REFERENCE.md** ← For urgent issues +**Purpose:** 2-minute quick-start card for customers + +**Contains:** +- 3-step quick start +- Result codes at a glance +- Common fixes +- Prerequisite checklist + +**Read this if:** You need to run the script NOW and don't have time for full docs + +--- + +### 3. **DIAGNOSTIC_SCHEMA.md** ← For developers/automation +**Purpose:** Complete JSON output specification + +**Contains:** +- Full JSON schema with field descriptions +- Root, target, execution, diagnostics, and classification objects +- DNS/TCP/HTTPS/private network result formats +- Azure config and RBAC object structures +- Classification code reference table +- Sample outputs for 3 scenarios +- Parsing guidelines +- Version history + +**Read this if:** +- You're building a parser or automation tool +- You need to understand the JSON structure +- You're integrating with support ticketing system +- You want to validate output structure + +--- + +### 4. **CLASSIFICATION_MATRIX.md** ← For support teams +**Purpose:** Support playbooks and triage routing + +**Contains:** +- Decision tree flowchart (ASCII art) +- All classification codes with detailed explanations +- Root causes and recommended actions for each code +- Tier 1 triage checklist +- Detailed playbooks for each failure scenario: + - DNS Resolution Failed + - TCP 443 Failed (Public Endpoint) + - TCP 443 Failed (Private Endpoint) + - RBAC Insufficient +- Support ticket template +- Python parsing example +- Automation routing matrix + +**Read this if:** +- You're a support engineer receiving diagnostic reports +- You need to route issues based on classification +- You're building automation to process diagnostics +- You need to escalate to specialist teams + +--- + +## 🔧 Script File + +### **Diagnose-CosmosConnectivity.ps1** +**Purpose:** Main diagnostic script (customer-executable) + +**What it does:** +1. Prompts for account endpoints and credentials (interactive or parameterized) +2. Runs 5 diagnostic checks: + - DNS resolution of account endpoint + - TCP 443 connectivity test + - HTTPS reachability probe + - Private network indicators analysis + - Azure CLI queries (if authenticated) +3. Performs RBAC assessment +4. Generates classification (success/failure/warning + specific code) +5. Outputs structured JSON to file and console +6. Produces human-readable summary with recommended actions + +**Key Features:** +- 300+ lines of well-commented PowerShell +- Error handling for all network operations +- Timeouts to prevent hanging +- Optional sensitive data redaction +- Works on Windows, macOS, Linux (PowerShell 5.0+) +- No external dependencies except optional Azure CLI + +**How to run:** +```powershell +# Interactive (recommended first run) +.\Diagnose-CosmosConnectivity.ps1 -Interactive + +# Non-interactive (scripted) +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "..." -SubscriptionId "..." -ResourceGroup "..." -AccountName "..." + +# Safe for support (redacted) +.\Diagnose-CosmosConnectivity.ps1 ... -Redact +``` + +--- + +## 🔄 File Relationships + +``` +Customer Issue: "Can't connect to Cosmos DB" + │ + ├─→ QUICK_REFERENCE.md (if in hurry) + │ │ + │ └─→ "Run this command" + │ + └─→ README.md (comprehensive guidance) + │ + ├─→ Run: Diagnose-CosmosConnectivity.ps1 + │ │ + │ └─→ Outputs JSON file + console summary + │ + ├─→ Read classification code + │ + └─→ CLASSIFICATION_MATRIX.md (support playbook) + │ + ├─→ Find your classification code + │ + ├─→ Read root causes + │ + └─→ Follow recommended actions + │ + ├─→ Self-resolve? + │ └─→ Done! + │ + └─→ Still stuck? + │ + ├─→ Gather info from JSON + │ + ├─→ Redact with -Redact flag + │ + └─→ Escalate to support + │ + ├─→ Support triages with CLASSIFICATION_MATRIX.md + │ + └─→ Route to specialist (network, auth, etc.) +``` + +--- + +## 🎯 Usage by Role + +### 👤 Customer / End User +1. Read: **QUICK_REFERENCE.md** (2 min) +2. Gather inputs as shown in README.md +3. Run: `.\Diagnose-CosmosConnectivity.ps1 -Interactive` +4. Review output—look for Classification Code +5. Try recommended actions from console output +6. If stuck → Share JSON with support (use `-Redact`) + +### 👨‍💼 Support Engineer (Tier 1) +1. Receive JSON report from customer +2. Read: **CLASSIFICATION_MATRIX.md** section "Tier 1: Triage" +3. Look up classification.code in "Classification Code Reference" +4. Follow the corresponding playbook +5. Either self-resolve or route to specialist + +### 👨‍💻 Support Engineer (Specialist) +1. Receive routed issue with JSON and escalation context +2. Read relevant playbook from **CLASSIFICATION_MATRIX.md** +3. Use **DIAGNOSTIC_SCHEMA.md** to parse specific JSON fields +4. Reference "Recommended Actions" for deep-dive steps +5. May request customer to re-run with additional parameters + +### 🤖 Automation / Integration +1. Read: **DIAGNOSTIC_SCHEMA.md** (schema specification) +2. Parse JSON output from script +3. Route based on classification.code +4. (Optional) Read **CLASSIFICATION_MATRIX.md** section "JSON Parsing for Automation" +5. Integrate with ticketing, routing, or remediation system + +### 📊 Product Team / Data Analysis +1. Collect diagnostic reports over time +2. Aggregate classification codes to identify trends +3. Use JSON structure to extract metrics (DNS latency, TCP success rate, etc.) +4. Reference **DIAGNOSTIC_SCHEMA.md** for field definitions +5. Correlate with support ticket data for insights + +--- + +## 📋 Classification Codes at a Glance + +Quick reference (full details in CLASSIFICATION_MATRIX.md): + +| Code | Type | Severity | What It Means | +|------|------|----------|---| +| `network_connectivity_healthy` | ✅ | Info | Network works; if still broken, check auth/app | +| `dns_resolution_failed` | ❌ | High | Cannot resolve endpoint (DNS/VPN/proxy issue) | +| `tcp_connectivity_blocked` | ❌ | High | DNS works, port 443 blocked (firewall/ISP) | +| `private_endpoint_network_path_blocked` | ❌ | High | Private endpoint unreachable (PE routing issue) | +| `rbac_insufficient` | ⚠️ | Medium | Network OK, but permissions missing | +| `private_endpoint_mismatch` | ⚠️ | Medium | Resolved to unexpected private IP | +| `azure_config_check_skipped` | ⚠️ | Low | Azure CLI not authenticated; re-run after `az login` | + +--- + +## 🔍 Finding Specific Information + +### "I want to know what the JSON contains" +→ **DIAGNOSTIC_SCHEMA.md** (all field definitions) + +### "I see a classification code, what does it mean?" +→ **CLASSIFICATION_MATRIX.md** (code reference + playbook) + +### "How do I run the script?" +→ **README.md** (detailed how-to) or **QUICK_REFERENCE.md** (2-min version) + +### "I'm building a parser/bot" +→ **DIAGNOSTIC_SCHEMA.md** (schema + samples) + **CLASSIFICATION_MATRIX.md** (routing logic) + +### "I need to support multiple customers" +→ **CLASSIFICATION_MATRIX.md** (support ticket template + triage playbook) + +### "I need to find input for a specific field" +→ **README.md** section "Getting Your Inputs" (step-by-step with screenshots reference) + +### "How do I integrate this into my system?" +→ **DIAGNOSTIC_SCHEMA.md** (JSON structure) + **CLASSIFICATION_MATRIX.md** (routing + Python example) + +--- + +## ✅ Pre-Launch Checklist + +Before deploying to customers, verify: + +- [ ] Script runs without errors in interactive mode +- [ ] Script accepts all parameters in non-interactive mode +- [ ] `-Redact` flag properly masks sensitive data +- [ ] JSON output validates against DIAGNOSTIC_SCHEMA.md +- [ ] All classification codes match CLASSIFICATION_MATRIX.md +- [ ] README.md examples tested and working +- [ ] Support team trained on CLASSIFICATION_MATRIX.md playbooks +- [ ] Triage automation configured (if applicable) +- [ ] Sample JSON files created and tested +- [ ] Accessibility verified (screen readers, etc.) + +--- + +## 🚀 Rollout Plan + +### Phase 1: Internal Testing (Week 1) +- [ ] Run script on various network configurations +- [ ] Test interactive and non-interactive modes +- [ ] Verify Azure CLI integration (if connected to test accounts) +- [ ] Collect sample JSON outputs + +### Phase 2: Support Dogfood (Week 2) +- [ ] Train support team on using CLASSIFICATION_MATRIX.md +- [ ] Have support team run diagnostics on internal test accounts +- [ ] Collect feedback on documentation clarity +- [ ] Refine playbooks based on real cases + +### Phase 3: Limited Release (Week 3) +- [ ] Release to subset of customers (e.g., preview tier) +- [ ] Gather feedback on usability +- [ ] Monitor classification code distribution +- [ ] Look for unexpected errors or edge cases + +### Phase 4: General Availability (Week 4) +- [ ] Release to all customers +- [ ] Monitor issue volume and classification codes +- [ ] Use data to identify new playbooks or improvements +- [ ] Update documentation based on feedback + +--- + +## 📞 Support & Maintenance + +### Common Questions + +**Q: Can I run the script without Azure CLI?** +A: Yes! It will skip Azure configuration checks but still do network diagnostics. + +**Q: Is the script safe? Does it collect personal data?** +A: Safe. It only reads local network config and (optionally) queries Azure API if you're authenticated. Use `-Redact` to mask sensitive data before sharing. + +**Q: What if I get an unexpected error?** +A: Check error message in console, review troubleshooting section in README.md, or share the JSON file with support. + +**Q: How often should I re-run diagnostics?** +A: After network changes, VPN reconnect, or when troubleshooting intermittent issues. + +--- + +## 📈 Success Metrics + +Track these to measure script effectiveness: + +- % of customers who run script on first issue +- % of issues self-resolved after reading recommended actions +- Reduction in escalations for network vs auth vs app issues +- Average time to triage (before: manual back-and-forth; after: automated) +- Distribution of classification codes (helps identify common issues) + +--- + +## 🔄 Version & Updates + +**Current Version:** 1.0.0 +**Schema Version:** 1.0.0 +**Last Updated:** 2026-05-13 + +**Versioning Policy:** +- Major version (1.x.x) = Breaking changes to JSON schema or classification codes +- Minor version (x.1.x) = New checks or optional fields added +- Patch version (x.x.1) = Bug fixes, documentation updates + +--- + +## 📄 License & Attribution + +All files in this directory are provided as-is for Cosmos DB connectivity diagnostics. +See repository LICENSE file for terms. + +--- + +**Quick Links:** +- 🚀 [Quick Start](./QUICK_REFERENCE.md) +- 📖 [Full Documentation](./README.md) +- 🔧 [Script](./Diagnose-CosmosConnectivity.ps1) +- 🗂️ [JSON Schema](./DIAGNOSTIC_SCHEMA.md) +- 📋 [Support Playbooks](./CLASSIFICATION_MATRIX.md) diff --git a/scripts/QUICK_REFERENCE.md b/scripts/QUICK_REFERENCE.md new file mode 100644 index 000000000..0163bbe25 --- /dev/null +++ b/scripts/QUICK_REFERENCE.md @@ -0,0 +1,144 @@ +# Cosmos DB Connectivity Diagnostic - Quick Reference + +## 🚀 Quick Start (2 Minutes) + +### Step 1: Gather Your Info + +| Item | Where to Find | +|------|---| +| **Endpoint URL** | Azure Portal → Cosmos DB Account → Overview → URI field | +| **Subscription ID** | Azure Portal → Subscriptions → Copy ID | +| **Resource Group** | Azure Portal → Cosmos DB Account → Top-right "Resource group" | +| **Account Name** | From endpoint URL (the part before `.documents.azure.com`) | + +### Step 2: Run the Script + +**Interactive (easiest):** +```powershell +.\Diagnose-CosmosConnectivity.ps1 -Interactive +``` +Script will prompt for inputs and guide you. + +**Non-interactive:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://my-cosmos.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "my-rg" ` + -AccountName "my-cosmos" +``` + +**With redaction (safe for support):** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://my-cosmos.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "my-rg" ` + -AccountName "my-cosmos" ` + -Redact +``` + +### Step 3: Check Result + +Look for the **Classification** line: + +``` +Classification: SUCCESS - network_connectivity_healthy +``` + +--- + +## 📊 Result Codes + +| Code | Meaning | Action | +|------|---------|--------| +| ✅ `network_connectivity_healthy` | Network OK | Check auth/RBAC if operations still fail | +| ❌ `dns_resolution_failed` | Cannot find hostname | Check VPN/proxy DNS settings | +| ❌ `tcp_connectivity_blocked` | DNS works, but port 443 blocked | Ask network team to check firewall | +| ❌ `private_endpoint_network_path_blocked` | Private endpoint unreachable | Ask network team to check PE routing | +| ⚠️ `rbac_insufficient` | Not enough permissions | Ask admin for Cosmos DB Operator role | +| ⚠️ `azure_config_check_skipped` | Azure CLI not set up | Run `az login` and re-run | + +--- + +## 🆘 Common Fixes + +### DNS Resolution Failed +1. Are you on a VPN? → Ask VPN admin about DNS settings +2. Check manually: `nslookup my-cosmos-account.documents.azure.com` +3. Try different DNS: `nslookup my-cosmos-account.documents.azure.com 8.8.8.8` + +### TCP 443 Blocked (Public Endpoint) +1. Check Windows Firewall (Windows Defender) settings +2. If on corporate network → Ask IT if 443 outbound is allowed +3. Try from mobile hotspot to test + +### TCP 443 Blocked (Private Endpoint) +1. Verify VPN is connected +2. Ask network team to check NSG and routing rules +3. Provide them with the script output (use `-Redact` to mask sensitive data) + +### RBAC Insufficient +1. Ask admin to assign you **"Cosmos DB Operator"** role +2. Wait 5-10 minutes for role assignment to propagate + +--- + +## 📁 Output Files + +**JSON Report:** `cosmos-diagnostic-.json` +- Full diagnostic results +- Save for your records +- Can share with support (use `-Redact` first) + +--- + +## ⚙️ Prerequisites + +- PowerShell 5.0+ (Windows, Mac, Linux) +- Network access to documents.azure.com +- (Optional) Azure CLI for full diagnostics: `az login` + +--- + +## 💡 Tips + +**Private Endpoint?** Include the IP: +```powershell +.\Diagnose-CosmosConnectivity.ps1 -Interactive -PrivateEndpointIP "10.123.171.30" +``` + +**Sharing with support safely:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ... -Redact +# Share the JSON file (sensitive data masked) +``` + +**Just want DNS/TCP without Azure checks:** +- Run without providing SubscriptionId/ResourceGroup/AccountName +- Or don't run `az login` first + +--- + +## 📞 Getting Help + +**If you see:** +- ✅ Green checkmarks → Network is working. Issue is likely application-level. +- ❌ Red X marks → Network is blocked. Share the JSON with support. +- ⚠️ Yellow warnings → Configuration issue. Follow recommended actions. + +**Next:** Share your JSON report with support and include the **Classification Code**. + +--- + +## 📋 Checklist Before Contacting Support + +- [ ] I ran the script successfully +- [ ] I noted the **Classification Code** (from console output) +- [ ] I checked the **Recommended Actions** section +- [ ] I tried the basic fixes above +- [ ] I saved the JSON report + +--- + +**Version:** 1.0.0 | **Last Updated:** 2026-05-13 diff --git a/scripts/README.md b/scripts/README.md new file mode 100644 index 000000000..d3033b8e3 --- /dev/null +++ b/scripts/README.md @@ -0,0 +1,424 @@ +# Cosmos DB Connectivity Diagnostic Script - README + +## Overview + +This is a standalone PowerShell diagnostic script that captures network connectivity, private endpoint configuration, and Azure RBAC status for Cosmos DB accounts. It's designed to be run locally on a customer's machine to help troubleshoot HTTP 0.0 and connection errors. + +**Key Features:** +- ✅ DNS resolution verification +- ✅ TCP 443 connectivity testing +- ✅ HTTPS reachability probe +- ✅ Private endpoint detection +- ✅ Private network route analysis +- ✅ Azure CLI optional context (network config, RBAC) +- ✅ Structured JSON output for triage automation +- ✅ Sensitive data redaction for safe sharing +- ✅ Interactive and non-interactive modes + +--- + +## Quick Start + +### Prerequisites + +- PowerShell 5.0+ (works on Windows, Linux, macOS) +- If querying Azure config: Azure CLI installed and authenticated (`az login`) +- Outbound network access to documents.azure.com + +### Option 1: Interactive Mode (Recommended for First Run) + +Simplest approach—script prompts for inputs: + +```powershell +.\Diagnose-CosmosConnectivity.ps1 -Interactive +``` + +The script will display a guide showing where to find each input, then prompt: +- Endpoint URL +- Subscription ID +- Resource Group +- Account Name +- (Optional) Private Endpoint IP +- (Optional) VPN Subnet Range + +### Option 2: Non-Interactive Mode (Scripted/Automated) + +Provide all parameters directly: + +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://my-cosmos-account.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "my-resource-group" ` + -AccountName "my-cosmos-account" +``` + +### Option 3: Non-Interactive with Redaction (Safe for Support) + +Output JSON with sensitive data masked: + +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://my-cosmos-account.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "my-resource-group" ` + -AccountName "my-cosmos-account" ` + -Redact +``` + +--- + +## Detailed Usage + +### Getting Your Inputs + +#### 1. **Endpoint URL** (Required) +**Location:** Azure Portal → Cosmos DB Account → Overview + +1. Go to [Azure Portal](https://portal.azure.com) +2. Search for "Cosmos DB" +3. Click your Cosmos DB account +4. Look for the **"URI"** field in the Overview tab +5. Copy the entire URL (e.g., `https://my-cosmos-account.documents.azure.com`) + +**Format:** `https://.documents.azure.com` (do NOT include trailing slash or `:443/`) + +**Note:** If using a regional endpoint, use the primary endpoint. Private endpoints will have the same hostname with different IP resolution. + +--- + +#### 2. **Subscription ID** (Required) +**Location:** Azure Portal → Subscriptions or Portal → Home + +1. Go to [Azure Portal](https://portal.azure.com) +2. Click on "Subscriptions" (or search for it) +3. Find your subscription +4. Copy the **Subscription ID** (looks like `12345678-1234-1234-1234-123456789012`) + +**Alternative:** From your Cosmos account page, look at the breadcrumb at the top or search box. + +--- + +#### 3. **Resource Group** (Required) +**Location:** Azure Portal → Cosmos DB Account (top-right corner) + +1. Open your Cosmos DB account +2. At the top of the page, you'll see breadcrumbs +3. Look for **"Resource group: "** in the top-right +4. Or on the Overview page, find the **"Resource group"** field + +**Example:** `my-production-rg` or `cosmos-resources` + +--- + +#### 4. **Account Name** (Required) +**Location:** Extract from endpoint URL or Azure Portal + +**From URL:** +- Endpoint: `https://my-cosmos-account.documents.azure.com` +- Account Name: `my-cosmos-account` (the part before `.documents.azure.com`) + +**From Portal:** +- Open Cosmos DB account → Look at the account name in the breadcrumb or page title + +--- + +#### 5. **Private Endpoint IP** (Optional but Recommended) +**Location:** Azure Portal → Cosmos DB Account → Private Endpoint Connections + +1. Open your Cosmos DB account +2. Go to **Settings** → **Private Endpoint Connections** +3. If any connections exist, look for **"Private IP address"** column +4. Copy the IP (e.g., `10.123.171.30`) + +**When to provide:** +- If your Cosmos account has private endpoints configured +- Otherwise, leave blank (press Enter in interactive mode) + +**Format:** `10.x.x.x`, `172.16-31.x.x`, or `192.168.x.x` (RFC 1918 ranges) + +--- + +#### 6. **VPN Subnet Range** (Optional) +**Location:** Ask your network team or VPN client properties + +If you're connecting via VPN, your network team should know your VPN subnet CIDR. + +**Example:** `10.0.0.0/24` (network: 10.0.0.0–10.0.0.255) + +**When to provide:** +- If you're behind a VPN +- If you suspect VPN routing is the issue +- Otherwise, leave blank + +--- + +### Understanding Output + +#### Console Summary + +After running, you'll see: + +``` +═════════════════════════════════════════════════════════════════════════════ +DIAGNOSTIC COMPLETE +═════════════════════════════════════════════════════════════════════════════ + +Summary: + DNS Resolution: ✓ PASS + TCP Connectivity: ✗ FAIL + Private Network: Detected (Private Endpoint) + Classification: FAILURE - tcp_connectivity_blocked + +Full report saved to: cosmos-diagnostic-20260513_143045.json + +Summary: +TCP 443 connection failed to private endpoint. Network path is blocked. + +Recommended Actions: + 1. Verify VPN connectivity and that your client subnet can route to the private endpoint subnet + 2. Ask your network team to verify routing from DESKTOP-ABC123 to private endpoint 10.123.171.30 + 3. Check Azure network security groups (NSGs) rules for port 443 inbound + 4. Verify Azure Virtual Network peering and User Defined Routes (UDRs) + 5. Check if corporate firewall/NVA is blocking the connection + 6. Manually run: Test-NetConnection -ComputerName my-cosmos-account.documents.azure.com -Port 443 + +Full JSON Report: +... +``` + +#### JSON Output File + +A file like `cosmos-diagnostic-20260513_143045.json` is automatically saved in the current directory. + +**Use this file to:** +- Share with support (can use `-Redact` to mask sensitive data) +- Parse with automation tools +- Retain diagnostic history + +--- + +## Common Scenarios + +### Scenario 1: "I can't connect to Cosmos DB from my machine" + +**Run this:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 -Interactive +``` + +**Interpret results:** +- If `dns_resolution_failed` → Check VPN/proxy DNS settings +- If `tcp_connectivity_blocked` → Ask network team to check firewall/NSG rules +- If `network_connectivity_healthy` → Issue is auth/RBAC, not network + +--- + +### Scenario 2: "Private endpoint isn't working" + +**Run this:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://my-cosmos.documents.azure.com" ` + -SubscriptionId "your-sub-id" ` + -ResourceGroup "your-rg" ` + -AccountName "your-account" ` + -PrivateEndpointIP "10.123.171.30" +``` + +**Interpret results:** +- If resolved IP matches private endpoint IP but TCP fails → VPN route blocked +- If resolved IP differs from provided IP → Route misconfiguration +- If network is healthy → Check private DNS zone configuration + +--- + +### Scenario 3: "How do I share this with support safely?" + +**Run with redaction:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://my-cosmos.documents.azure.com" ` + -SubscriptionId "your-sub-id" ` + -ResourceGroup "your-rg" ` + -AccountName "your-account" ` + -Redact +``` + +Then share the generated JSON file. Sensitive data (subscription ID, usernames, tenant ID) will be masked as `REDACTED`. + +--- + +### Scenario 4: "I need the diagnostics in a pipeline" + +**Non-interactive with JSON output capture:** +```powershell +$json = .\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://my-cosmos.documents.azure.com" ` + -SubscriptionId "your-sub-id" ` + -ResourceGroup "your-rg" ` + -AccountName "your-account" 2>&1 ` + | Select-String -Pattern '^\{' -SimpleMatch | ConvertFrom-Json + +# Now use $json in automation +if ($json.classification.code -eq "network_connectivity_healthy") { + Write-Host "Network OK, escalating to app team" +} else { + Write-Host "Network issue: $($json.classification.summary)" +} +``` + +--- + +## Classification Codes + +The script produces one of these classification codes: + +| Code | Meaning | +|------|---------| +| `network_connectivity_healthy` | ✓ Network works. If errors, check auth/RBAC. | +| `dns_resolution_failed` | ✗ Cannot resolve endpoint hostname. | +| `tcp_connectivity_blocked` | ✗ DNS works, but TCP 443 blocked. | +| `private_endpoint_network_path_blocked` | ✗ Private endpoint detected, TCP fails. | +| `rbac_insufficient` | ⚠ Network OK, but RBAC permissions missing. | +| `azure_config_check_skipped` | ⚠ Azure CLI not authenticated. | + +See [CLASSIFICATION_MATRIX.md](./CLASSIFICATION_MATRIX.md) for detailed playbooks and support guidance. + +--- + +## Advanced Usage + +### Running Specific Checks + +The script always runs all checks, but you can parse the JSON to focus on specific ones: + +```powershell +# Get just DNS results +$report = Get-Content cosmos-diagnostic-*.json | ConvertFrom-Json +$report.diagnostics.dns | ConvertTo-Json + +# Get classification only +$report.classification | ConvertTo-Json + +# Check if RBAC is sufficient +$report.diagnostics.rbac.classification +``` + +--- + +### Integration with Support Ticketing + +When opening a support case: + +1. **Run the script** (interactive mode is fine) +2. **Include the generated JSON file** in your ticket +3. **Or use `-Redact` flag** if sharing with external support + +Example ticket text: +``` +Title: Cosmos DB Connection Errors + +Body: +Experiencing connection errors to my Cosmos DB account. +Attached diagnostic results (cosmos-diagnostic-*.json). + +Network Status: [paste classification.status] +Issue Code: [paste classification.code] +Endpoint: [paste target.hostname] +``` + +--- + +### Troubleshooting the Script Itself + +#### Script won't run (permission denied) + +```powershell +Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser +``` + +Then re-run the script. + +#### "Azure CLI not found" but I need RBAC checks + +Install Azure CLI: +- Windows: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows +- Mac: `brew install azure-cli` +- Linux: Follow docs at https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-linux + +Then: +```powershell +az login +``` + +Re-run the script. + +#### Endpoint validation error + +**Error:** "Invalid format. Expected: https://.documents.azure.com" + +**Fix:** Remove trailing slash or port from URL: +- ❌ `https://my-cosmos.documents.azure.com/` (trailing slash) +- ❌ `https://my-cosmos.documents.azure.com:443/` (with port) +- ✅ `https://my-cosmos.documents.azure.com` (correct) + +--- + +## File Outputs + +### Generated Files + +After running, the script creates: + +**`cosmos-diagnostic-.json`** +- Full diagnostic report in JSON format +- Machine-readable for automation +- Can be shared with support +- Keep for troubleshooting history + +--- + +## JSON Schema + +For details on JSON structure, field definitions, and sample outputs, see [DIAGNOSTIC_SCHEMA.md](./DIAGNOSTIC_SCHEMA.md). + +--- + +## Support Routing + +Based on classification code, route as follows: + +| Classification | Route To | +|---|---| +| `network_connectivity_healthy` | Application/Auth team—network verified working | +| `dns_resolution_failed` | VPN/Network team—DNS issue | +| `tcp_connectivity_blocked` (public IP) | Firewall/ISP team—outbound port blocked | +| `private_endpoint_network_path_blocked` | Network team—PE routing issue | +| `rbac_insufficient` | Cosmos DB Access Control team | +| `azure_config_check_skipped` | Customer: Run `az login` first | + +--- + +## Version + +**Script Version:** 1.0.0 +**Schema Version:** 1.0.0 +**Last Updated:** 2026-05-13 + +--- + +## License + +This script is provided as-is for diagnosing Cosmos DB connectivity issues. See [LICENSE](../../LICENSE) for terms. + +--- + +## Next Steps + +1. **Run the script:** `.\Diagnose-CosmosConnectivity.ps1 -Interactive` +2. **Review output:** Check the JSON report and console summary +3. **Follow recommended actions** based on the classification code +4. **Share with support** if needed (use `-Redact` for sensitive data masking) + +For questions or issues with the script itself, contact the Cosmos DB team. diff --git a/scripts/TEST_SCENARIOS.md b/scripts/TEST_SCENARIOS.md new file mode 100644 index 000000000..238f76247 --- /dev/null +++ b/scripts/TEST_SCENARIOS.md @@ -0,0 +1,510 @@ +# Cosmos DB Connectivity Diagnostic - Test Scenarios + +## Overview + +This document defines test scenarios, expected outcomes, and validation procedures for the diagnostic script. Use these to verify script functionality across different network configurations. + +--- + +## Test Infrastructure Setup + +### Prerequisites +- Test Cosmos DB accounts in multiple configurations: + - Public endpoint only + - Private endpoint only + - Both public + private endpoints +- Test networks: + - Clean network (no corporate proxy/VPN) + - Behind corporate proxy + - Behind VPN (if possible) + - Restricted network (firewall blocking 443) + +--- + +## Test Scenarios + +### Scenario 1: Healthy Public Endpoint (All Checks Pass) + +**Setup:** +- Cosmos account with public endpoint enabled +- Running from clean network (no VPN/proxy) +- Azure CLI authenticated (optional) + +**Run:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://test-public-01.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "test-cosmos-rg" ` + -AccountName "test-public-01" +``` + +**Expected Results:** +- ✅ DNS resolution: `succeeded = true` +- ✅ TCP connectivity: `succeeded = true` +- ✅ HTTPS probe: `statusCode = 401` (expected without auth) +- ✅ Private network: `isPrivateRange = false` +- ✅ Classification: `status = "success"`, `code = "network_connectivity_healthy"` + +**Validation Checklist:** +- [ ] Console shows "✓ PASS" for DNS and TCP +- [ ] Recommended Actions mention checking RBAC/auth +- [ ] JSON file created successfully +- [ ] Latency values are reasonable (< 1000ms) + +--- + +### Scenario 2: DNS Resolution Failure + +**Setup:** +- Network with DNS resolver that blocks documents.azure.com +- OR simulate by providing invalid hostname + +**Run:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://invalid-account-xyz123.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "test-cosmos-rg" ` + -AccountName "invalid-account" +``` + +**Expected Results:** +- ❌ DNS resolution: `succeeded = false`, `error = "No such host is known"` +- ❌ TCP connectivity: `succeeded = false` +- ❌ Classification: `status = "failure"`, `code = "dns_resolution_failed"` + +**Validation Checklist:** +- [ ] Console shows "✗ FAIL" for DNS +- [ ] Error message is clear +- [ ] Root cause in classification mentions DNS/VPN/proxy +- [ ] Recommended actions include running manual `nslookup` +- [ ] JSON contains error details + +--- + +### Scenario 3: TCP Blocked (Public Endpoint) + +**Setup:** +- Network with firewall blocking outbound port 443 to documents.azure.com +- DNS resolves successfully but TCP fails + +**Run:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://test-public-02.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "test-cosmos-rg" ` + -AccountName "test-public-02" +``` + +**Expected Results:** +- ✅ DNS resolution: `succeeded = true` +- ❌ TCP connectivity: `succeeded = false`, `error = "Connection timeout after 5000ms"` +- ❌ HTTPS probe: `statusCode = null`, `error contains "timeout"` +- ❌ Private network: `isPrivateRange = false` +- ❌ Classification: `status = "failure"`, `code = "tcp_connectivity_blocked"` + +**Validation Checklist:** +- [ ] DNS shows success, TCP shows timeout +- [ ] Console summary distinguishes DNS success from TCP failure +- [ ] Root cause mentions firewall/ISP/proxy +- [ ] Recommended actions include corporate network contact +- [ ] Timeout latency is approximately 5000ms + +--- + +### Scenario 4: Healthy Private Endpoint + +**Setup:** +- Cosmos account with private endpoint configured +- Client connected to VPN that can route to PE +- PE IP known and provided + +**Run:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://test-private-01.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "test-cosmos-rg" ` + -AccountName "test-private-01" ` + -PrivateEndpointIP "10.123.171.30" +``` + +**Expected Results:** +- ✅ DNS resolution: `succeeded = true`, `addresses = ["10.123.171.30"]` +- ✅ TCP connectivity: `succeeded = true` +- ✅ Private network: `isPrivateRange = true`, `matchesExpectedPrivateEndpoint = true` +- ✅ Azure config: `publicNetworkAccessRestricted = true` (if checked) +- ✅ Classification: `status = "success"`, `code = "network_connectivity_healthy"` + +**Validation Checklist:** +- [ ] DNS resolves to private IP (10.x) +- [ ] TCP succeeds to private IP +- [ ] Indicators correctly identify private endpoint +- [ ] Expected PE IP matches resolved IP +- [ ] Classification recognizes healthy private path + +--- + +### Scenario 5: Private Endpoint Network Path Blocked + +**Setup:** +- Private endpoint configured +- Client on VPN but routing to PE subnet is blocked +- DNS resolves to PE IP but TCP times out + +**Run:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://test-private-02.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "test-cosmos-rg" ` + -AccountName "test-private-02" ` + -PrivateEndpointIP "10.123.171.30" +``` + +**Expected Results:** +- ✅ DNS resolution: `succeeded = true`, `addresses = ["10.123.171.30"]` +- ❌ TCP connectivity: `succeeded = false`, `error = "Connection timeout after 5000ms"` +- ✅ Private network: `isPrivateRange = true`, `matchesExpectedPrivateEndpoint = true`, `vpnRouteWarning != null` +- ❌ Classification: `status = "failure"`, `code = "private_endpoint_network_path_blocked"` + +**Validation Checklist:** +- [ ] DNS resolves to expected PE IP +- [ ] TCP to PE IP fails with timeout +- [ ] VPN route warning is populated +- [ ] Classification correctly identifies PE path issue +- [ ] Recommended actions mention network team + routing +- [ ] Source IP is captured (if available) + +--- + +### Scenario 6: RBAC Insufficient + +**Setup:** +- Network connectivity is working +- Azure CLI authenticated as user with limited RBAC (e.g., only Reader role) +- Account queried successfully + +**Run:** +```powershell +az login # Login as limited user first +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://test-rbac-01.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "test-cosmos-rg" ` + -AccountName "test-rbac-01" +``` + +**Expected Results:** +- ✅ DNS resolution: `succeeded = true` +- ✅ TCP connectivity: `succeeded = true` +- ✅ HTTPS probe: `statusCode = 401` or `200` +- ❌ RBAC: `classification = "insufficient"`, `canReadAccount = false` +- ⚠️ Classification: `status = "warning"`, `code = "rbac_insufficient"` + +**Validation Checklist:** +- [ ] Network checks all pass +- [ ] RBAC assessment shows limited permissions +- [ ] Classification code is `rbac_insufficient` +- [ ] Recommended actions mention role assignment +- [ ] Error message explains what permissions are missing + +--- + +### Scenario 7: Azure CLI Not Authenticated + +**Setup:** +- All network checks work fine +- Azure CLI not installed OR not authenticated + +**Run:** +```powershell +# Without running az login first +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://test-public-03.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "test-cosmos-rg" ` + -AccountName "test-public-03" +``` + +**Expected Results:** +- ✅ DNS resolution: `succeeded = true` +- ✅ TCP connectivity: `succeeded = true` +- ⚠️ Azure CLI: `authenticated = false`, `error = "Not authenticated with Azure CLI. Run 'az login' to proceed."` +- ⚠️ Azure config: `checked = false`, `error = "Skipped"` +- ⚠️ Classification: May reference `azure_config_check_skipped` in warnings + +**Validation Checklist:** +- [ ] Network checks complete normally +- [ ] Azure CLI context shows unauthenticated +- [ ] Console warning mentions `az login` +- [ ] Recommended actions suggest re-running after authentication +- [ ] Script doesn't crash; gracefully continues + +--- + +### Scenario 8: Interactive Mode Input Flow + +**Setup:** +- User runs script with -Interactive flag +- Has all inputs ready + +**Run:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 -Interactive +``` + +**Expected Sequence:** +1. Show input instructions with Portal navigation guide +2. Prompt: "Endpoint URL (e.g., https://my-cosmos.documents.azure.com)" +3. Validate input format; re-prompt if invalid +4. Prompt: "Subscription ID (12345678-...)" +5. Validate GUID format; re-prompt if invalid +6. Prompt: "Resource Group name" +7. Prompt: "Account Name" +8. Prompt: "Private Endpoint IP (optional, press Enter to skip)" +9. Prompt: "VPN Subnet Range (optional, press Enter to skip)" +10. Run diagnostics +11. Display results + +**Validation Checklist:** +- [ ] Input instructions are clear and helpful +- [ ] Format validation rejects invalid inputs +- [ ] Optional fields can be skipped (Enter key) +- [ ] All inputs accepted without error +- [ ] Diagnostics run successfully after inputs collected + +--- + +### Scenario 9: Non-Interactive with Redaction + +**Setup:** +- Run with -Redact flag +- Collect JSON output + +**Run:** +```powershell +$json = .\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://test-public-04.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "test-cosmos-rg" ` + -AccountName "test-public-04" ` + -Redact 2>&1 | Select-String -Pattern '^\{' -SimpleMatch | ConvertFrom-Json +``` + +**Expected Results:** +- ✅ JSON output completes successfully +- ✅ Target section: `subscriptionId = "REDACTED-SUBSCRIPTION-ID"` +- ✅ Target section: `resourceGroup = "REDACTED"` +- ✅ Target section: `accountName = "REDACTED"` +- ✅ Hostname is NOT redacted (needed for triage): `hostname = "test-public-04.documents.azure.com"` +- ✅ Azure CLI: `currentUser = "REDACTED-USER-NAME"` +- ✅ Azure CLI: `currentTenant = "REDACTED-TENANT-ID"` + +**Validation Checklist:** +- [ ] Sensitive fields masked as "REDACTED-*" +- [ ] Hostname NOT masked +- [ ] JSON still parseable +- [ ] Redaction doesn't break classification +- [ ] All RBAC role names preserved (not redacted) + +--- + +### Scenario 10: Private Endpoint IP Mismatch + +**Setup:** +- Private endpoint exists but expected IP is different from resolved IP +- Can happen if PE reconfigured or DNS zone stale + +**Run:** +```powershell +.\Diagnose-CosmosConnectivity.ps1 ` + -EndpointUrl "https://test-private-03.documents.azure.com" ` + -SubscriptionId "12345678-1234-1234-1234-123456789012" ` + -ResourceGroup "test-cosmos-rg" ` + -AccountName "test-private-03" ` + -PrivateEndpointIP "10.123.171.99" # Expected IP (not matching actual) +``` + +**Expected Results (if actual PE IP is 10.123.171.30):** +- ✅ DNS resolution: `succeeded = true`, `addresses = ["10.123.171.30"]` +- ✅ TCP connectivity: `succeeded = true` (connects to actual PE) +- ⚠️ Private network: `matchesExpectedPrivateEndpoint = false`, `indicators contains "WARNING: Resolved to 10.123.171.30 but expected ..."` +- ⚠️ Classification: May include `private_endpoint_mismatch` warning + +**Validation Checklist:** +- [ ] Mismatch detected +- [ ] Warning includes both expected and actual IPs +- [ ] TCP still attempts with actual resolved IP +- [ ] Classification identifies discrepancy +- [ ] Recommended actions mention checking PE config + +--- + +### Scenario 11: Latency Metrics + +**Setup:** +- Healthy connection +- Measure and log latency values + +**Run:** +```powershell +$json = .\Diagnose-CosmosConnectivity.ps1 -EndpointUrl "..." -SubscriptionId "..." ... 2>&1 | + Select-String -Pattern '^\{' | ConvertFrom-Json +$json.diagnostics.dns.latencyMs +$json.diagnostics.tcp.latencyMs +$json.diagnostics.https.latencyMs +``` + +**Expected Results:** +- DNS latency: 10-100ms (typical) +- TCP latency: 20-200ms (depends on network) +- HTTPS latency: 50-500ms (full round trip) +- All values > 0 and < 10000 (reasonable) + +**Validation Checklist:** +- [ ] Latency values are integers (milliseconds) +- [ ] Values are reasonable for network conditions +- [ ] No values are unrealistic (0 or > 60000) +- [ ] Timeouts show latencyMs = 0 + +--- + +### Scenario 12: Multiple Endpoints (Batch Testing) + +**Setup:** +- Multiple accounts to test +- Non-interactive batch mode + +**Run:** +```powershell +$accounts = @( + @{Url="https://account1.documents.azure.com"; Sub="..."; RG="rg1"; Name="account1"}, + @{Url="https://account2.documents.azure.com"; Sub="..."; RG="rg2"; Name="account2"}, + @{Url="https://account3.documents.azure.com"; Sub="..."; RG="rg3"; Name="account3"} +) + +$results = @() +foreach ($acct in $accounts) { + $json = .\Diagnose-CosmosConnectivity.ps1 @acct 2>&1 | + Select-String -Pattern '^\{' | ConvertFrom-Json + $results += @{ + Account = $acct.Name + Classification = $json.classification.code + DNS = $json.diagnostics.dns.succeeded + TCP = $json.diagnostics.tcp.succeeded + } +} +$results | Format-Table +``` + +**Expected Results:** +- All accounts processed without error +- JSON output captured for each +- Results table shows aggregated status +- Classification codes vary based on network conditions + +**Validation Checklist:** +- [ ] Batch processing completes +- [ ] All JSON files created +- [ ] No cross-account contamination +- [ ] Timestamp differs for each run + +--- + +## Regression Test Checklist + +Use this checklist before each release: + +- [ ] **Script Execution** + - [ ] Interactive mode completes + - [ ] Non-interactive mode with all parameters + - [ ] Redaction flag works + - [ ] Help/documentation displays correctly + +- [ ] **Network Diagnostics** + - [ ] DNS resolution succeeds on good network + - [ ] DNS resolution fails on blocked network + - [ ] TCP succeeds on open port + - [ ] TCP times out on blocked port + - [ ] HTTPS probe returns status code + +- [ ] **Private Endpoints** + - [ ] Detects private IP ranges correctly + - [ ] Compares against expected PE IP + - [ ] Handles PE IP mismatches gracefully + +- [ ] **Azure Integration** + - [ ] Works with authenticated Azure CLI + - [ ] Gracefully handles unauthenticated state + - [ ] Queries account config successfully + - [ ] RBAC assessment runs + +- [ ] **JSON Output** + - [ ] Valid JSON syntax + - [ ] All expected fields present + - [ ] Field values are correct types + - [ ] Redacted fields are properly masked + +- [ ] **Classification** + - [ ] Success code for healthy network + - [ ] DNS failure code for DNS issues + - [ ] TCP failure code for blocked ports + - [ ] PE path blocked code for PE issues + - [ ] RBAC code for permission issues + +- [ ] **Documentation** + - [ ] Recommended actions are actionable + - [ ] Error messages are helpful + - [ ] Output is readable and organized + +- [ ] **Edge Cases** + - [ ] Invalid URL format rejected + - [ ] Invalid GUID format rejected + - [ ] Timeout handling works + - [ ] No unhandled exceptions + +--- + +## Performance Expectations + +| Operation | Expected Time | Timeout | +|-----------|---|---| +| DNS resolution | 10-100ms | 5000ms | +| TCP connect | 20-200ms | 5000ms | +| HTTPS probe | 50-500ms | 5000ms | +| Azure CLI queries | 1-5 seconds | 10000ms | +| Full script (good network) | 10-20 seconds | N/A | +| Full script (blocked port) | ~5 seconds | N/A | + +--- + +## Success Criteria + +A test scenario passes if: +1. ✅ Script completes without unhandled exceptions +2. ✅ JSON output is valid and contains all expected fields +3. ✅ Classification code matches expected scenario +4. ✅ Recommended actions are relevant to the issue +5. ✅ Latency values are reasonable +6. ✅ Redaction (if enabled) properly masks sensitive fields + +--- + +## Sign-Off + +**QA Tester:** _________________ **Date:** _________ + +**Reviewed By:** _________________ **Date:** _________ + +**Approved for Release:** _________________ **Date:** _________ + +--- + +## Version + +- **Script Version:** 1.0.0 +- **Test Plan Version:** 1.0.0 +- **Last Updated:** 2026-05-13