mirror of
https://github.com/Azure/cosmos-explorer.git
synced 2026-05-15 01:37:37 +01:00
network connectivity
This commit is contained in:
@@ -0,0 +1,411 @@
# Cosmos DB Connectivity Diagnostic - Classification Matrix & Support Guide

## Classification Decision Tree

```
START: Run diagnostic script
│
├─→ DNS Resolution Check
│   │
│   ├─→ ❌ FAILED
│   │   └─→ Classification: dns_resolution_failed
│   │       Action: DNS/VPN/proxy troubleshooting
│   │
│   └─→ ✓ PASSED
│       │
│       ├─→ Resolved IP is RFC 1918 (10.x, 172.16-31.x, 192.168.x)?
│       │   │
│       │   ├─→ YES (Private endpoint detected)
│       │   │   │
│       │   │   └─→ TCP 443 Test
│       │   │       │
│       │   │       ├─→ ❌ FAILED
│       │   │       │   └─→ private_endpoint_network_path_blocked
│       │   │       │       (VPN route, NSG, firewall, UDR, peering)
│       │   │       │
│       │   │       └─→ ✓ PASSED
│       │   │           └─→ Check RBAC
│       │   │
│       │   └─→ NO (Public endpoint)
│       │       │
│       │       └─→ TCP 443 Test
│       │           │
│       │           ├─→ ❌ FAILED
│       │           │   └─→ tcp_connectivity_blocked
│       │           │       (Firewall, ISP, proxy)
│       │           │
│       │           └─→ ✓ PASSED
│       │               └─→ network_connectivity_healthy
│       │
│       └─→ Check Azure Configuration & RBAC
│           │
│           ├─→ Azure CLI authenticated?
│           │   ├─→ NO → Skip ARM checks, mark warning
│           │   └─→ YES → Query network config & roles
│           │
│           └─→ Sufficient permissions?
│               ├─→ NO → rbac_insufficient
│               └─→ YES → All checks passed
```
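
The tree can be read as a short chain of checks. The following is a minimal Python sketch of that reading, not the script's actual implementation: the function name `classify`, its boolean parameters, and the order in which the two warning codes are evaluated are assumptions made for illustration.

```python
def classify(dns_ok, is_private, tcp_ok, cli_authenticated=True, rbac_sufficient=True):
    """Return a classification code by walking the decision tree top to bottom.

    All parameters are hypothetical booleans standing in for the script's checks.
    """
    if not dns_ok:
        return "dns_resolution_failed"
    if not tcp_ok:
        # The same TCP failure maps to a different code for private endpoints.
        if is_private:
            return "private_endpoint_network_path_blocked"
        return "tcp_connectivity_blocked"
    if not cli_authenticated:
        # Assumed: skipping the ARM checks surfaces as this warning code.
        return "azure_config_check_skipped"
    if not rbac_sufficient:
        return "rbac_insufficient"
    return "network_connectivity_healthy"
```

For example, `classify(True, True, False)` follows the private endpoint branch, while `classify(True, False, False)` follows the public endpoint branch.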

---

## Classification Code Reference

### Success Codes

#### `network_connectivity_healthy`
- **Status:** success
- **When:** DNS resolves AND TCP 443 succeeds
- **Interpretation:** Local network is working. If Cosmos DB operations fail, the issue is auth/RBAC/data-plane.
- **Actions:**
  - Verify RBAC/authentication permissions
  - Check account firewall IP rules
  - Verify data-plane token hasn't expired
  - Check application logs for specific errors

---

### Failure Codes

#### `dns_resolution_failed`
- **Status:** failure
- **When:** DNS lookup fails with SocketException or timeout
- **Interpretation:** Cannot resolve account hostname to any IP
- **Root Causes:**
  - DNS server misconfiguration
  - VPN/proxy intercepting DNS queries
  - Corporate proxy redirecting .documents.azure.com
  - Network unreachable before DNS server
  - ISP DNS failure
- **Actions:**
  1. Check VPN/proxy DNS settings
  2. Run `nslookup <endpoint-hostname>`
  3. Try alternate DNS: `nslookup <endpoint-hostname> 8.8.8.8`
  4. Ping endpoint: `ping <endpoint-hostname>` (ICMP replies are often blocked, so use this only to confirm that the name resolves)
  5. Contact the network team if the hostname still does not resolve
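
The DNS check can also be reproduced outside the script. The sketch below is a hedged illustration using Python's standard `socket.getaddrinfo`; the function name `check_dns` and the injectable `resolver` parameter are assumptions, and the returned keys only loosely mirror the report's `diagnostics.dns` fields.

```python
import socket
import time


def check_dns(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname and return a dict loosely shaped like diagnostics.dns."""
    started = time.monotonic()
    try:
        infos = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
        addresses = sorted({info[4][0] for info in infos})
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        addresses, error = [], str(exc)
    return {
        "hostname": hostname,
        "succeeded": bool(addresses),
        "addresses": addresses,
        "error": error,
        "latencyMs": int((time.monotonic() - started) * 1000),
    }
```

Passing a different `resolver` makes it possible to exercise the failure path without touching the network.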

---

#### `tcp_connectivity_blocked`
- **Status:** failure
- **When:** DNS succeeds BUT TCP 443 fails
- **Interpretation:** Network path blocked between client and endpoint
- **Root Causes (Public Endpoint):**
  - Corporate firewall blocking outbound 443
  - ISP blocking Cosmos/Azure IPs
  - Regional geo-blocking
  - HTTPS inspection proxy interfering
  - Host-level firewall (Windows Defender, etc.)
- **Root Causes (Private Endpoint):**
  - VPN not configured for private endpoint subnet
  - Route not established between VPN subnet and private endpoint subnet
  - NSG rules blocking 443 inbound on PE subnet
  - NVA/firewall dropping packets
  - UDR misconfiguration
  - VNet peering not configured or in a Disconnected state
  - Private DNS zone misconfiguration
- **Actions:**
  1. Run `Test-NetConnection -ComputerName <hostname> -Port 443`, then `Test-NetConnection -ComputerName <hostname> -TraceRoute` (`-Port` and `-TraceRoute` belong to different parameter sets and cannot be combined)
  2. If private endpoint: Ask network team to verify VPN routing
  3. Check host firewall (Windows Defender, Mac firewall, Linux iptables)
  4. If corporate proxy: Verify HTTPS inspection is not blocking or replacing certificates
  5. Try from a different network to isolate the source

---

#### `private_endpoint_network_path_blocked`
- **Status:** failure
- **When:** Resolved to private IP (10.x, 172.16-31.x, 192.168.x) BUT TCP 443 fails
- **Interpretation:** Private endpoint detected but unreachable; this is a network path issue
- **Root Causes:**
  - VPN client subnet → private endpoint subnet routing broken
  - Firewall/NVA blocking internal traffic
  - NSG with restrictive rules on PE subnet
  - UDR pointing to wrong next hop
  - VNet peering not established
  - Private DNS zone not configured or stale
- **Actions:**
  1. Confirm VPN is connected and assigned correct subnet
  2. Ask network team to verify routing: `route print` (Windows) or `netstat -rn` (Linux/Mac)
  3. Check Azure NSG rules on private endpoint subnet for port 443 inbound
  4. Verify private DNS zone has A record pointing to PE IP
  5. Check that VNet peering exists and its state is Connected
  6. Run `Test-NetConnection -ComputerName <pe-ip> -Port 443` directly to PE IP
  7. Provide network team with source IP from script output

---

### Warning Codes

#### `rbac_insufficient`
- **Status:** warning
- **When:** Network OK BUT caller lacks data-plane permissions
- **Interpretation:** Network is healthy, but RBAC prevents data operations
- **Actions:**
  1. Request a role that grants data access, for example Cosmos DB Built-in Data Contributor when data-plane RBAC is used. Cosmos DB Operator manages the account but does not grant access to keys or data.
  2. If using connection strings: ensure the account keys haven't been regenerated
  3. Check role assignments via Azure CLI: `az role assignment list --scope <account-id>` for Azure (control-plane) roles, or `az cosmosdb sql role assignment list --account-name <account> --resource-group <rg>` for data-plane RBAC (if enabled)

---

#### `private_endpoint_mismatch`
- **Status:** warning
- **When:** Resolved IP differs from expected private endpoint IP
- **Interpretation:** Routing may be asymmetric or PE configuration changed
- **Actions:**
  1. Verify private endpoint IP hasn't changed in Azure Portal
  2. Ask network team to check asymmetric routing (DNS from corp vs VPN DNS)
  3. Flush DNS cache: `ipconfig /flushdns` (Windows) or `sudo dscacheutil -flushcache` (Mac)

---

#### `azure_config_check_skipped`
- **Status:** warning
- **When:** Azure CLI not authenticated or not installed
- **Interpretation:** Cannot validate ARM-level network config (firewall rules, PE connections)
- **Actions:**
  1. Install Azure CLI: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli
  2. Authenticate: `az login`
  3. Re-run script to collect ARM-level diagnostics

---

#### `unknown_error`
- **Status:** failure or warning
- **When:** Unhandled condition or unexpected error
- **Interpretation:** Script encountered something not in the matrix
- **Actions:**
  1. Check script output for error details
  2. Provide full JSON report to support

---

## Support Playbook

### Tier 1: Triage (ICM Responder)

**When customer reports: "Cosmos DB operations return HTTP 0.0 / connection errors"**

1. **Ask customer to run script:**
   ```powershell
   .\Diagnose-CosmosConnectivity.ps1 -Interactive
   ```

2. **Receive JSON output. Check classification.code:**

   | Code | Response |
   |------|----------|
   | `network_connectivity_healthy` | → Escalate to data-plane/auth team. This is not a network issue. |
   | `dns_resolution_failed` | → Run DNS Resolution Failed playbook below |
   | `tcp_connectivity_blocked` (public endpoint) | → Run TCP 443 Failed / Public Endpoint playbook |
   | `private_endpoint_network_path_blocked` | → Run Private Endpoint Network Path Blocked playbook |
   | `rbac_insufficient` | → Check RBAC permissions |
   | `azure_config_check_skipped` | → Ask customer to run `az login` and re-run |

3. **Document:**
   - Save JSON report in ICM
   - Note classification code and recommended actions
   - Link to this support guide in response

---

### Playbook: DNS Resolution Failed

**Symptoms:** `dns_resolution_failed` code

**Steps:**

1. **Verify endpoint name with customer:**
   - Check it matches Azure Portal > Cosmos Account > URI
   - Typos are common

2. **Customer self-service:**
   - Ask: "Can you manually run nslookup?"
     ```powershell
     nslookup my-cosmos-account.documents.azure.com
     ```
   - If nslookup fails → Likely VPN/proxy DNS redirect
   - If nslookup succeeds but script fails → Check DNS servers in script output vs nslookup

3. **If behind corporate proxy:**
   - Ask: "Is your traffic routed through a corporate proxy?"
   - If YES: Proxy may be intercepting DNS or blocking .documents.azure.com
   - Action: Customer should contact corporate network team

4. **If using VPN:**
   - Ask: "Does DNS work when you disconnect from VPN?"
   - If YES → VPN DNS redirect issue
   - Action: Customer should contact VPN admin

5. **Escalation:**
   - If all above fail, ask customer to contact their ISP or network provider
   - This is not a Cosmos issue; it's upstream DNS

---

### Playbook: TCP 443 Failed / Public Endpoint

**Symptoms:** `tcp_connectivity_blocked` code with public IP

**Steps:**

1. **Customer runs port test and trace** (`-Port` and `-TraceRoute` cannot be combined in one call):
   ```powershell
   Test-NetConnection -ComputerName <hostname> -Port 443
   Test-NetConnection -ComputerName <hostname> -TraceRoute
   ```

2. **Analyze output:**
   - Does it reach gateway/ISP?
   - Where does it drop?

3. **If corporate network:**
   - Check with network team if 443 outbound is allowed to Azure
   - May need to allowlist `*.documents.azure.com`

4. **If ISP/home network:**
   - Try from mobile hotspot to rule out ISP blocking
   - If hotspot works → the ISP or home router is likely blocking Azure

5. **If Windows Defender Firewall:**
   - Check Windows Defender Firewall for outbound rules
   - Ensure 443 is not blocked

6. **If behind proxy:**
   - Proxy may be doing HTTPS inspection
   - Ask IT if they use SSL Bump/HTTPS Inspection
   - May need to disable inspection for documents.azure.com or accept custom cert

---

### Playbook: Private Endpoint Network Path Blocked

**Symptoms:** `private_endpoint_network_path_blocked` code

**Steps:**

1. **Gather critical info from customer:**
   - Source IP (from script output: `execution.hostname` and `diagnostics.tcp.sourceIp`)
   - Resolved PE IP (from script: `diagnostics.dns.addresses[0]`)
   - Is VPN connected?
   - Which VPN client?

2. **Customer provides to network team:**
   - "TCP from [source-IP] to [PE-IP]:443 is timing out"
   - "Please verify routing from VPN subnet to PE subnet"
   - "Please check NSGs for port 443 inbound on PE subnet"

3. **Network team should check:**
   - Route table: Does VPN subnet have route to PE subnet?
   - NSG: PE subnet NSG allows inbound 443?
   - NVA/Firewall: Any stateful filtering blocking traffic?
   - UDR: Any User Defined Routes sending traffic wrong way?
   - VNet peering: If PE in different VNet, is peering configured?
   - Private DNS: Does private DNS zone have A record for PE IP?

4. **Cosmos team role:**
   - Verify account has private endpoint connection in Approved state
   - Check if PE IP matches what Azure reports
   - Provide PE connection details from Azure Portal

5. **Escalation criteria:**
   - If routing is correct but still fails → May be NSG inside PE subnet (rare)
   - If all checks pass → Escalate to Azure Networking support

---

### Playbook: RBAC Insufficient

**Symptoms:** `rbac_insufficient` code

**Steps:**

1. **Check role assignments:**
   ```powershell
   az role assignment list --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.DocumentDB/databaseAccounts/<account>
   ```

2. **Assign appropriate role:**
   - Cosmos DB Built-in Data Contributor (read/write data; data-plane RBAC)
   - Cosmos DB Built-in Data Reader (read-only data; data-plane RBAC)
   - Cosmos DB Operator (account management only; no access to keys or data)
   - Cosmos DB Account Reader (read-only access to account metadata)
   - Contributor or Owner (full management)

3. **If using master key:**
   - Primary/secondary keys stay valid until they are regenerated
   - Ask: Have the account keys been regenerated recently?
   - If yes, old keys won't work

---

## JSON Parsing for Automation

### Python Example (Support Bot)

```python
import ipaddress
import json

# RFC 1918 private ranges used to detect a private endpoint
PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private(address):
    """Return True if the address is an IPv4 address in an RFC 1918 range."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    return ip.version == 4 and any(ip in net for net in PRIVATE_NETWORKS)


def parse_cosmos_diagnostic(json_data):
    """Map a diagnostic JSON report to the next support action."""
    report = json.loads(json_data)
    code = report.get("classification", {}).get("code")
    diagnostics = report.get("diagnostics", {})

    # Route based on code
    if code == "network_connectivity_healthy":
        return "Escalate: Auth/RBAC team"
    elif code == "dns_resolution_failed":
        return "Run DNS playbook"
    elif code == "tcp_connectivity_blocked":
        # A private resolved address means the private endpoint playbook applies
        addresses = diagnostics.get("dns", {}).get("addresses", [])
        if any(is_private(a) for a in addresses):
            return "Run Private Endpoint playbook"
        return "Run TCP Failure / Public Endpoint playbook"
    elif code == "private_endpoint_network_path_blocked":
        return "Run Private Endpoint playbook"
    elif code == "rbac_insufficient":
        roles = diagnostics.get("rbac", {}).get("roleAssignments", [])
        return "Check RBAC: " + str(roles)
    else:
        # str() keeps this safe when the code is missing (None)
        return "Unknown code: " + str(code)
```

### Support Ticket Template

```
COSMOS DB CONNECTIVITY ISSUE - DIAGNOSTIC RECEIVED

Classification: [classification.code]
Status: [classification.status]
Summary: [classification.summary]

Network Diagnostics:
  DNS Resolution: [diagnostics.dns.succeeded]
  TCP 443 Connectivity: [diagnostics.tcp.succeeded]
  HTTPS Reachability: [diagnostics.https.statusCode]
  Private Endpoint: [diagnostics.privateNetwork.isPrivateRange]

Azure Configuration:
  Public Network Restricted: [diagnostics.azureNetworkConfig.publicNetworkAccessRestricted]
  Private Endpoints: [diagnostics.azureNetworkConfig.privateEndpoints.length] configured

RBAC Status:
  Classification: [diagnostics.rbac.classification]
  Can Read Account: [diagnostics.rbac.canReadAccount]
  Can Manage Account: [diagnostics.rbac.canManageAccount]

Recommended Actions:
[classification.recommendedActions joined with newlines]

Next Step:
[routing based on classification.code]
```
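
Filling the template from a parsed report could look roughly like the sketch below. The helper names `dig` and `render_ticket` are hypothetical, only part of the template is rendered to keep the example short, and missing fields are shown as `n/a`, which is an assumption rather than documented behaviour.

```python
def dig(report, path, default="n/a"):
    """Follow a dotted path such as 'diagnostics.dns.succeeded' through nested dicts."""
    node = report
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return default if node is None else node


def render_ticket(report):
    """Fill part of the ticket template; absent or null fields fall back to a default."""
    endpoints = dig(report, "diagnostics.azureNetworkConfig.privateEndpoints", [])
    actions = dig(report, "classification.recommendedActions", [])
    lines = [
        "COSMOS DB CONNECTIVITY ISSUE - DIAGNOSTIC RECEIVED",
        "",
        f"Classification: {dig(report, 'classification.code')}",
        f"Status: {dig(report, 'classification.status')}",
        f"Summary: {dig(report, 'classification.summary')}",
        "",
        "Network Diagnostics:",
        f"  DNS Resolution: {dig(report, 'diagnostics.dns.succeeded')}",
        f"  TCP 443 Connectivity: {dig(report, 'diagnostics.tcp.succeeded')}",
        f"  HTTPS Reachability: {dig(report, 'diagnostics.https.statusCode')}",
        f"  Private Endpoint: {dig(report, 'diagnostics.privateNetwork.isPrivateRange')}",
        "",
        "Azure Configuration:",
        f"  Private Endpoints: {len(endpoints)} configured",
        "",
        "Recommended Actions:",
    ]
    lines.extend(str(action) for action in actions)
    return "\n".join(lines)
```

The dotted-path helper keeps the template mapping readable and tolerates reports where optional sections such as `azureNetworkConfig` are absent.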

---

## References

- [Azure Cosmos DB Troubleshoot Connectivity Issues](https://learn.microsoft.com/en-us/azure/cosmos-db/troubleshoot-connection)
- [Private Endpoints for Azure Cosmos DB](https://learn.microsoft.com/en-us/azure/cosmos-db/how-to-configure-private-endpoints)
- [Network Security Groups](https://learn.microsoft.com/en-us/azure/virtual-network/network-security-groups-overview)
- [User Defined Routes](https://learn.microsoft.com/en-us/azure/virtual-network/virtual-networks-udr-overview)
@@ -0,0 +1,460 @@
# Cosmos DB Connectivity Diagnostic - JSON Schema v1.0

## Overview
The diagnostic script outputs a structured JSON report containing network connectivity, private network configuration, and RBAC assessment data. This schema is stable and versioned to support parsing and triage automation.

## Root Object

```json
{
  "version": "1.0.0",  // Schema version (semantic versioning)
  "timestamp": "2026-05-13T14:30:45.123Z",  // ISO 8601 UTC timestamp
  "target": {...},  // Account and subscription context
  "execution": {...},  // Script execution environment
  "diagnostics": {...},  // All diagnostic results
  "classification": {...}  // Automated classification and recommendations
}
```
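
Because the schema is versioned, a consumer may want to check the version and top-level keys before reading anything else. The following is a minimal sketch under stated assumptions: `load_report` and `REQUIRED_KEYS` are hypothetical names, and only a subset of the root keys is required because the sample outputs in this document omit `execution`.

```python
import json

# Assumed minimum set of root keys; "timestamp" and "execution" are treated as optional here.
REQUIRED_KEYS = ("version", "target", "diagnostics", "classification")


def load_report(text, supported_major=1):
    """Parse a report and reject it if keys are missing or the major version differs."""
    report = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in report]
    if missing:
        raise ValueError("report is missing keys: " + ", ".join(missing))
    # Semantic versioning: only a major version change is treated as breaking.
    major = int(str(report["version"]).split(".")[0])
    if major != supported_major:
        raise ValueError(f"unsupported schema major version {major}")
    return report
```

Note that the inline `//` comments in the examples here are documentation only; `json.loads` would reject them, so this sketch assumes the script emits plain JSON.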

---

## Target Object
Account and subscription identifiers.

```json
{
  "target": {
    "endpointUrl": "https://my-cosmos-account.documents.azure.com",
    "hostname": "my-cosmos-account.documents.azure.com",
    "subscriptionId": "12345678-1234-1234-1234-123456789012",  // May be "REDACTED-SUBSCRIPTION-ID" if -Redact flag used
    "resourceGroup": "my-rg",  // May be "REDACTED"
    "accountName": "my-cosmos-account"  // May be "REDACTED"
  }
}
```

---

## Execution Object
Environment in which the script ran.

```json
{
  "execution": {
    "hostname": "DESKTOP-ABC123",  // Machine name
    "platform": "Windows 10",  // OS name and version
    "powershellVersion": "7.3.0"  // PowerShell version
  }
}
```

---

## Diagnostics Object
All diagnostic results grouped by category.

```json
{
  "diagnostics": {
    "dns": { ... },  // DNS resolution results
    "tcp": { ... },  // TCP 443 connectivity results
    "https": { ... },  // HTTPS probe results
    "privateNetwork": { ... },  // Private endpoint indicators
    "azureNetworkConfig": { ... },  // ARM-sourced network configuration
    "rbac": { ... },  // RBAC assessment
    "azureCli": { ... }  // Azure CLI context
  }
}
```

### DNS Results

```json
{
  "dns": {
    "hostname": "my-cosmos-account.documents.azure.com",
    "succeeded": true,  // true = hostname resolved
    "addresses": [
      "52.180.123.45",  // Resolved IPv4 addresses
      "2607:f8b0:4005:806::200e"  // IPv6 if available
    ],
    "error": null,  // Error message if resolution failed
    "dnsServers": [
      "8.8.8.8",  // Detected DNS servers
      "8.8.4.4"
    ],
    "latencyMs": 145  // DNS query latency in milliseconds
  }
}
```

**Classification logic:**
- `succeeded: false` → DNS failure, likely network or DNS configuration issue
- `succeeded: true` with `addresses` containing private IP (10.x, 172.16-31.x, 192.168.x) → Private endpoint
- `succeeded: true` with `addresses` containing public IP → Public endpoint
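
The private-range test is easy to get wrong with string matching (for example, `52.10.0.1` contains `10.`). A hedged sketch using Python's standard `ipaddress` module is shown below; `is_rfc1918` and `endpoint_kind` are hypothetical helper names, and the returned labels are illustrative rather than codes emitted by the script.

```python
import ipaddress

# The three RFC 1918 ranges listed in the classification logic.
RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_rfc1918(address):
    """Return True only for IPv4 addresses inside an RFC 1918 range."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # not a valid IP literal
    return ip.version == 4 and any(ip in net for net in RFC1918_NETWORKS)


def endpoint_kind(dns):
    """Apply the three bullets above to a dns result dict."""
    if not dns.get("succeeded"):
        return "dns_failure"
    if any(is_rfc1918(a) for a in dns.get("addresses", [])):
        return "private_endpoint"
    return "public_endpoint"
```

Explicit networks are used instead of `ip.is_private` because that property also covers loopback, link-local, and other reserved ranges that are not private endpoint candidates.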

### TCP Connectivity Results

```json
{
  "tcp": {
    "hostname": "my-cosmos-account.documents.azure.com",
    "port": 443,
    "succeeded": true,  // true = TCP 443 connection established
    "error": null,  // Error message if connection failed (e.g., "Connection timeout after 5000ms")
    "latencyMs": 87,  // Connection latency
    "sourceIp": "192.168.1.100"  // Local IP used for connection attempt
  }
}
```

**Classification logic:**
- `succeeded: false` with DNS resolved → Network path blocked
- `error` contains "timeout" → VPN/firewall/NVA may be dropping packets
- `error` contains "refused" → Target may be rejecting connections
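
As an illustration of how these rules might be applied to a parsed `tcp` object, here is a small sketch. The function name `interpret_tcp` and the returned hint labels are hypothetical, and matching on "timed out" in addition to "timeout" is an added assumption about possible error wording.

```python
def interpret_tcp(tcp, dns_succeeded):
    """Return a short hint for a tcp result dict, following the bullets above."""
    if tcp.get("succeeded"):
        return "tcp_ok"
    error = (tcp.get("error") or "").lower()
    if "timeout" in error or "timed out" in error:
        # Silent drops usually point at a VPN, firewall, or NVA on the path.
        return "packets_likely_dropped"
    if "refused" in error:
        # An active reset means something answered and rejected the connection.
        return "target_rejecting_connections"
    return "network_path_blocked" if dns_succeeded else "dns_not_resolved"
```

The timeout and refused cases are checked first because they carry more specific information than the generic "path blocked" conclusion.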

### HTTPS Probe Results

```json
{
  "https": {
    "url": "https://my-cosmos-account.documents.azure.com",
    "succeeded": true,  // true = an HTTP response was received (any 2xx/4xx status)
    "statusCode": 401,  // HTTP status code (401 expected without auth)
    "error": null,  // TLS/connection errors
    "latencyMs": 234  // Full request round-trip latency
  }
}
```

**Classification logic:**
- `succeeded: true` (any 2xx/4xx status) → Can reach endpoint
- `statusCode: 401` → Expected (no credentials), network is healthy
- `error` contains "certificate" or "TLS" → Certificate validation issue
- `error` and `succeeded: false` → Network or firewall blocking TLS
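
A possible way to encode these rules is sketched below. `interpret_https` and its return labels are hypothetical names chosen for illustration; the order of the checks (status code first, then certificate errors, then generic errors) is an assumption about precedence that the list above does not spell out.

```python
def interpret_https(https):
    """Return a short hint for an https probe result dict."""
    status = https.get("statusCode")
    error = https.get("error") or ""
    lowered = error.lower()
    if status is not None and 200 <= status < 500:
        # Any 2xx/4xx answer, including the expected 401, proves the endpoint is reachable.
        return "endpoint_reachable"
    if "certificate" in lowered or "tls" in lowered:
        return "certificate_validation_issue"
    if error and not https.get("succeeded"):
        return "tls_blocked_by_network_or_firewall"
    return "inconclusive"
```

Treating 401 as a success signal matters for triage: an unauthenticated probe is expected to be rejected, so the status only confirms that the request reached Cosmos DB.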

### Private Network Indicators

```json
{
  "privateNetwork": {
    "isPrivateRange": true,  // true if any resolved IP is RFC 1918
    "indicators": [
      "Resolved to RFC 1918 private IP range (10.123.171.30)",
      "Matches expected private endpoint IP (10.123.171.30)"
    ],
    "matchesExpectedPrivateEndpoint": true,  // true if resolved IP matches PrivateEndpointIP parameter
    "vpnRouteWarning": null  // Warning if VPN subnet routing appears blocked
  }
}
```

### Azure Network Configuration

```json
{
  "azureNetworkConfig": {
    "checked": true,  // true if successfully queried via Azure CLI
    "publicNetworkAccessRestricted": true,  // true if public network access is disabled
    "privateEndpoints": [
      {
        "id": "/subscriptions/.../privateEndpointConnections/my-pe-connection",
        "state": "Approved"  // Status: Approved, Pending, Rejected
      }
    ],
    "vnetRules": [ ],  // Virtual network rules (firewall)
    "error": null  // Error if Azure CLI query failed
  }
}
```

### RBAC Assessment

```json
{
  "rbac": {
    "checked": true,  // true if RBAC checked successfully
    "canReadAccount": true,  // true if caller can read account properties
    "canManageAccount": false,  // true if caller has Contributor/Owner
    "canExecuteDataPlaneOps": true,  // true if caller likely has data-plane roles
    "roleAssignments": [
      {
        "roleDefinitionName": "Cosmos DB Operator",
        "principalName": "user@example.com"
      }
    ],
    "classification": "partial",  // Enum: "sufficient", "partial", "insufficient", "unknown"
    "error": null  // Error message if check failed
  }
}
```

### Azure CLI Context

```json
{
  "azureCli": {
    "installed": true,  // true if Azure CLI is installed
    "authenticated": true,  // true if 'az login' was successful
    "currentUser": "user@example.com",  // May be "REDACTED-USER-NAME"
    "currentTenant": "12345678-1234-1234-1234-123456789012",  // May be "REDACTED-TENANT-ID"
    "currentSubscription": "abcdef01-2345-6789-abcd-ef0123456789",
    "error": null  // Error if CLI not installed or not authenticated
  }
}
```

---

## Classification Object
Automated classification with recommendations.

```json
{
  "classification": {
    "status": "failure",  // Enum: "success", "failure", "warning", "unknown"
    "code": "tcp_connectivity_blocked",  // Machine-readable classification code
    "summary": "DNS resolution succeeded but TCP 443 connection failed. Network path is blocked.",
    "rootCause": "Private endpoint configured but network path blocked (VPN routing, firewall/NVA, NSG, UDR, or peering issue)",
    "recommendedActions": [
      "1. Verify VPN connectivity and that your client subnet can route to the private endpoint subnet",
      "2. Ask your network team to verify routing between DESKTOP-ABC123 and private endpoint 10.123.171.30",
      "3. Check Azure network security groups (NSGs) rules for port 443 inbound",
      "4. Verify Azure Virtual Network peering and User Defined Routes (UDRs)",
      "5. Check if corporate firewall/NVA is blocking the connection",
      "6. Manually run: Test-NetConnection -ComputerName my-cosmos-account.documents.azure.com -Port 443"
    ]
  }
}
```

### Classification Codes Reference

| Code | Status | Meaning | Likely Cause |
|------|--------|---------|--------------|
| `dns_resolution_failed` | failure | Hostname cannot resolve | DNS misconfiguration, proxy redirect, network unreachable |
| `tcp_connectivity_blocked` | failure | DNS works, TCP 443 fails | Firewall, VPN routing, NVA, NSG, private path blocked |
| `private_endpoint_network_path_blocked` | failure | Private endpoint detected, TCP fails | VPN → private endpoint routing broken |
| `network_connectivity_healthy` | success | DNS and TCP both work | Network is healthy; check auth/RBAC if operations fail |
| `rbac_insufficient` | warning | Network OK, but RBAC limited | User lacks data-plane roles |
| `private_endpoint_mismatch` | warning | Resolved to different IP than expected | Private endpoint routing may be asymmetric or misconfigured |
| `azure_config_check_skipped` | warning | Azure CLI not authenticated | Can't validate ARM-level network configuration |
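
For automation, the table can be held as a lookup and used to sanity-check incoming reports. This is a sketch only: `EXPECTED_STATUS` and `status_mismatch` are hypothetical names, and codes outside the table (such as `unknown_error`, whose status may be failure or warning) are simply reported as unrecognised.

```python
# Code -> status pairs copied from the table above.
EXPECTED_STATUS = {
    "dns_resolution_failed": "failure",
    "tcp_connectivity_blocked": "failure",
    "private_endpoint_network_path_blocked": "failure",
    "network_connectivity_healthy": "success",
    "rbac_insufficient": "warning",
    "private_endpoint_mismatch": "warning",
    "azure_config_check_skipped": "warning",
}


def status_mismatch(classification):
    """Return a description of any code/status inconsistency, or None if consistent."""
    code = classification.get("code")
    expected = EXPECTED_STATUS.get(code)
    if expected is None:
        return f"unrecognised code: {code!r}"
    actual = classification.get("status")
    if actual != expected:
        return f"{code}: expected status {expected!r}, got {actual!r}"
    return None
```

A non-`None` result is a cue to attach the full JSON report for manual review rather than routing automatically.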

---

## Redacted Output

When the script is invoked with the `-Redact` flag:

```json
{
  "target": {
    "endpointUrl": "REDACTED",
    "hostname": "my-cosmos-account.documents.azure.com",  // Hostname kept (needed for triage)
    "subscriptionId": "REDACTED-SUBSCRIPTION-ID",
    "resourceGroup": "REDACTED",
    "accountName": "REDACTED"
  },
  "diagnostics": {
    "azureCli": {
      "currentUser": "REDACTED-USER-NAME",
      "currentTenant": "REDACTED-TENANT-ID"
    },
    "rbac": {
      "roleAssignments": [
        {
          "roleDefinitionName": "Cosmos DB Operator",
          "principalName": "REDACTED-PRINCIPAL-NAME"
        }
      ]
    }
  }
}
```
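
If a report arrives unredacted and needs to be shared further, the same masking can be approximated after the fact. The sketch below is an assumption-laden illustration: the function name `redact` is hypothetical, and it only masks the fields shown in the example above, using the same placeholder strings.

```python
import copy

# Field -> placeholder pairs taken from the redacted example above.
TARGET_PLACEHOLDERS = {
    "endpointUrl": "REDACTED",
    "subscriptionId": "REDACTED-SUBSCRIPTION-ID",
    "resourceGroup": "REDACTED",
    "accountName": "REDACTED",
}
CLI_PLACEHOLDERS = {
    "currentUser": "REDACTED-USER-NAME",
    "currentTenant": "REDACTED-TENANT-ID",
}


def redact(report):
    """Return a redacted deep copy of a report; the input is left unchanged."""
    out = copy.deepcopy(report)
    diagnostics = out.get("diagnostics", {})
    for section, placeholders in (
        (out.get("target", {}), TARGET_PLACEHOLDERS),
        (diagnostics.get("azureCli", {}), CLI_PLACEHOLDERS),
    ):
        for key, placeholder in placeholders.items():
            if key in section:
                section[key] = placeholder
    for assignment in diagnostics.get("rbac", {}).get("roleAssignments", []):
        if "principalName" in assignment:
            assignment["principalName"] = "REDACTED-PRINCIPAL-NAME"
    return out
```

The hostname is deliberately left in place, matching the note in the example that it is needed for triage.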

---

## Sample Outputs

### Scenario 1: Network Healthy (Public Endpoint)

```json
{
  "version": "1.0.0",
  "timestamp": "2026-05-13T14:30:45Z",
  "target": {
    "endpointUrl": "https://my-cosmos.documents.azure.com",
    "hostname": "my-cosmos.documents.azure.com",
    "subscriptionId": "12345678-1234-1234-1234-123456789012",
    "resourceGroup": "my-rg",
    "accountName": "my-cosmos"
  },
  "diagnostics": {
    "dns": {
      "hostname": "my-cosmos.documents.azure.com",
      "succeeded": true,
      "addresses": ["52.180.123.45"],
      "error": null,
      "latencyMs": 12
    },
    "tcp": {
      "hostname": "my-cosmos.documents.azure.com",
      "port": 443,
      "succeeded": true,
      "error": null,
      "latencyMs": 45,
      "sourceIp": "192.168.1.100"
    },
    "https": {
      "url": "https://my-cosmos.documents.azure.com",
      "succeeded": true,
      "statusCode": 401,
      "error": null,
      "latencyMs": 78
    },
    "privateNetwork": {
      "isPrivateRange": false,
      "indicators": [],
      "matchesExpectedPrivateEndpoint": false,
      "vpnRouteWarning": null
    }
  },
  "classification": {
    "status": "success",
    "code": "network_connectivity_healthy",
    "summary": "Network connectivity is healthy. DNS resolves and TCP 443 is reachable.",
    "rootCause": null,
    "recommendedActions": [
      "✓ Local network connectivity is working",
      "If Cosmos DB operations still fail, check:",
      " - RBAC/authentication permissions",
      " - Account firewall IP rules (if enabled)",
      " - Data plane token expiry",
      " - Application-level issues (connection strings, SDK versions)"
    ]
  }
}
```

### Scenario 2: Private Endpoint Path Blocked

```json
{
  "version": "1.0.0",
  "timestamp": "2026-05-13T14:35:22Z",
  "target": {
    "endpointUrl": "https://my-cosmos-pe.documents.azure.com",
    "hostname": "my-cosmos-pe.documents.azure.com",
    "subscriptionId": "12345678-1234-1234-1234-123456789012",
    "resourceGroup": "my-rg",
    "accountName": "my-cosmos-pe"
  },
  "diagnostics": {
    "dns": {
      "hostname": "my-cosmos-pe.documents.azure.com",
      "succeeded": true,
      "addresses": ["10.123.171.30"],
      "error": null,
      "latencyMs": 8
    },
    "tcp": {
      "hostname": "my-cosmos-pe.documents.azure.com",
      "port": 443,
      "succeeded": false,
      "error": "Connection timeout after 5000ms",
      "latencyMs": 0,
      "sourceIp": null
    },
    "privateNetwork": {
      "isPrivateRange": true,
      "indicators": [
        "Resolved to RFC 1918 private IP range (10.123.171.30)",
        "Matches expected private endpoint IP (10.123.171.30)"
      ],
      "matchesExpectedPrivateEndpoint": true,
      "vpnRouteWarning": "Private endpoint IP detected but TCP 443 failed. Likely VPN → PE route blocked."
    }
  },
  "classification": {
    "status": "failure",
    "code": "private_endpoint_network_path_blocked",
    "summary": "DNS resolution succeeded but TCP 443 connection failed to private endpoint. Network path is blocked.",
    "rootCause": "Private endpoint network path blocked (VPN routing, firewall/NVA, NSG, UDR, or peering issue)",
    "recommendedActions": [
      "1. Verify VPN connectivity and that your client subnet can route to the private endpoint subnet",
      "2. Ask your network team to verify routing from 10.249.14.218 to private endpoint 10.123.171.30",
      "3. Check Azure network security groups (NSGs) rules for port 443 inbound on private endpoint subnet",
      "4. Verify Azure Virtual Network peering and User Defined Routes (UDRs)",
      "5. Check if corporate firewall/NVA is blocking the connection",
      "6. Manually run: Test-NetConnection -ComputerName my-cosmos-pe.documents.azure.com -Port 443"
    ]
  }
}
```
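
The branch this scenario exercises (DNS resolves to an RFC 1918 address, then the TCP 443 test fails) can also be reproduced outside the script. The Python below is a minimal sketch of that decision, not the diagnostic script's implementation. The names `is_rfc1918` and `classify` are hypothetical, and the sketch covers only the network codes from the decision tree, ignoring the Azure configuration and RBAC checks.

```python
import ipaddress

# The three RFC 1918 private IPv4 blocks named in the decision tree.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if the string is an IPv4 address inside an RFC 1918 block."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False  # not an IP address at all
    return ip.version == 4 and any(ip in net for net in RFC1918_NETWORKS)


def classify(dns_ok: bool, addresses: list, tcp_ok: bool) -> str:
    """Map DNS/TCP outcomes to a classification code (hypothetical helper)."""
    if not dns_ok:
        return "dns_resolution_failed"
    if tcp_ok:
        return "network_connectivity_healthy"
    # TCP failed: a private address points at the private endpoint path.
    if any(is_rfc1918(a) for a in addresses):
        return "private_endpoint_network_path_blocked"
    return "tcp_connectivity_blocked"
```

Called with the values from this scenario, `classify(True, ["10.123.171.30"], False)` selects the private endpoint code, while the same TCP failure against a public address falls through to `tcp_connectivity_blocked`.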

### Scenario 3: DNS Resolution Failed

```json
{
  "version": "1.0.0",
  "timestamp": "2026-05-13T14:40:10Z",
  "target": {
    "endpointUrl": "https://my-cosmos-invalid.documents.azure.com",
    "hostname": "my-cosmos-invalid.documents.azure.com"
  },
  "diagnostics": {
    "dns": {
      "hostname": "my-cosmos-invalid.documents.azure.com",
      "succeeded": false,
      "addresses": [],
      "error": "No such host is known",
      "dnsServers": ["8.8.8.8"],
      "latencyMs": 2342
    },
    "tcp": {
      "hostname": "my-cosmos-invalid.documents.azure.com",
      "port": 443,
      "succeeded": false,
      "error": "No such host is known",
      "latencyMs": 0,
      "sourceIp": null
    }
  },
  "classification": {
    "status": "failure",
    "code": "dns_resolution_failed",
    "summary": "DNS resolution failed. The Cosmos DB endpoint hostname cannot be resolved.",
    "rootCause": "DNS configuration, VPN/proxy DNS redirect, or network connectivity issue",
    "recommendedActions": [
      "1. Check if you are connected to corporate VPN or proxy that intercepts DNS",
      "2. Manually run: nslookup my-cosmos-invalid.documents.azure.com",
      "3. If nslookup fails, check with your network team or ISP",
      "4. Try pinging the endpoint or using nslookup with alternate DNS: nslookup my-cosmos-invalid.documents.azure.com 8.8.8.8"
    ]
  }
}
```

---

## Parsing Guidelines

Implementers parsing this JSON should:

1. **Always check version**: Fields may differ in future versions. Parse defensively.
2. **Use classification.code, not status**: `status` is user-facing; `code` is machine-readable and intended for routing and automation.
3. **Check diagnostics.azureCli.authenticated**: If false, the Azure configuration and RBAC checks were skipped, so do not draw conclusions from those sections.
4. **Prioritize classification.recommendedActions**: Contains context-specific guidance.
5. **Redacted fields**: May be null or "REDACTED" strings. Do not assume structure.
6. **Latency fields**: Values are in milliseconds and may be 0 if unavailable.
7. **Handle missing fields**: Especially in older versions or on non-Windows platforms.
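
The guidelines above can be sketched as a small defensive reader. This is an illustrative Python example rather than an official client: the names `triage`, `as_dict`, and `KNOWN_MAJOR_VERSION` are hypothetical, and treating a matching major version as "supported" is an assumption, since the schema does not define a compatibility policy.

```python
import json

KNOWN_MAJOR_VERSION = "1"  # assumption: only the 1.x schema is understood


def as_dict(value):
    """Return value if it is a JSON object, else {} (covers null and "REDACTED")."""
    return value if isinstance(value, dict) else {}


def triage(report_text: str) -> dict:
    """Extract routing information from a diagnostic report without assuming structure."""
    report = as_dict(json.loads(report_text))
    classification = as_dict(report.get("classification"))
    diagnostics = as_dict(report.get("diagnostics"))
    azure_cli = as_dict(diagnostics.get("azureCli"))
    dns = as_dict(diagnostics.get("dns"))

    version = str(report.get("version") or "")
    actions = classification.get("recommendedActions")

    return {
        # Guideline 1: check the version before trusting field layout.
        "version_supported": version.split(".")[0] == KNOWN_MAJOR_VERSION,
        # Guideline 2: route on the machine-readable code, not on status.
        "code": classification.get("code") or "unknown",
        # Guideline 3: Azure sections are meaningful only when authenticated.
        "azure_checks_usable": azure_cli.get("authenticated") is True,
        # Guideline 4: surface the context-specific guidance.
        "actions": list(actions) if isinstance(actions, list) else [],
        # Guideline 6: a latency of 0 means no measurement is available.
        "dns_latency_ms": dns.get("latencyMs") or None,
    }
```

Automation can then branch on `triage(text)["code"]` and fall back to a manual review path when `version_supported` is false or the code is `"unknown"`.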

---

## Version History

### v1.0.0 (2026-05-13)
- Initial schema
- Includes DNS, TCP, HTTPS, private network, Azure config, and RBAC checks
- Classification codes stable
- Redaction support
@@ -0,0 +1,699 @@
#!/usr/bin/env pwsh
<#
.SYNOPSIS
    Cosmos DB Connectivity Diagnostic Script
    Captures local network connectivity, private network posture, and RBAC evidence.

.DESCRIPTION
    This script performs comprehensive network and access diagnostics for Cosmos DB accounts.
    It can run in interactive or non-interactive mode and produces a JSON report for triage.

.PARAMETER EndpointUrl
    The Cosmos DB account endpoint URL.
    Format: https://<account-name>.documents.azure.com or https://<account-name>.documents.azure.com:443/
    WHERE TO GET: Azure Portal > Cosmos DB Account > Overview tab > URI field
    OR: Use the endpoint shown in Cosmos Explorer connection string

.PARAMETER SubscriptionId
    Azure subscription ID containing the Cosmos account.
    WHERE TO GET: Azure Portal > Subscriptions > Copy Subscription ID
    FORMAT: 12345678-1234-1234-1234-123456789012

.PARAMETER ResourceGroup
    Azure resource group name containing the Cosmos account.
    WHERE TO GET: Azure Portal > Cosmos DB Account > Resource group field (top-right)

.PARAMETER AccountName
    Cosmos DB account name.
    WHERE TO GET: Azure Portal > Cosmos DB Account > Account Name field
    Or extract from endpoint URL (part before .documents.azure.com)

.PARAMETER PrivateEndpointIP
    (Optional) Expected private endpoint IP if account uses private link.
    WHERE TO GET: Azure Portal > Cosmos DB Account > Private Endpoint Connections tab > Private IP address column

.PARAMETER VpnSubnetRange
    (Optional) Customer's VPN/client subnet CIDR for route analysis.
    FORMAT: 10.0.0.0/24
    WHERE TO GET: Ask your network team or check VPN client properties

.PARAMETER Interactive
    If specified, script prompts for missing parameters instead of requiring them as arguments.

.PARAMETER Redact
    If specified, output JSON redacts sensitive identifiers (tenant ID, subscription ID, usernames).

.EXAMPLE
    # Interactive mode - script will prompt for inputs
    .\Diagnose-CosmosConnectivity.ps1 -Interactive

.EXAMPLE
    # Non-interactive with full parameters
    .\Diagnose-CosmosConnectivity.ps1 `
        -EndpointUrl "https://my-cosmos-account.documents.azure.com" `
        -SubscriptionId "12345678-1234-1234-1234-123456789012" `
        -ResourceGroup "my-rg" `
        -AccountName "my-cosmos-account"

.EXAMPLE
    # With private endpoint and output redaction
    .\Diagnose-CosmosConnectivity.ps1 `
        -EndpointUrl "https://my-cosmos-account.documents.azure.com" `
        -SubscriptionId "12345678-1234-1234-1234-123456789012" `
        -ResourceGroup "my-rg" `
        -AccountName "my-cosmos-account" `
        -PrivateEndpointIP "10.123.171.30" `
        -Redact
#>

param(
    # Accepts the bare endpoint, optionally followed by :443 and/or a trailing slash.
    [Parameter(ValueFromPipelineByPropertyName=$true)]
    [ValidateScript({ $_ -match '^https://[a-z0-9-]+\.documents\.azure\.com(:443)?/?$' })]
    [string]$EndpointUrl,

    [Parameter(ValueFromPipelineByPropertyName=$true)]
    [guid]$SubscriptionId,

    [Parameter(ValueFromPipelineByPropertyName=$true)]
    [string]$ResourceGroup,

    [Parameter(ValueFromPipelineByPropertyName=$true)]
    [string]$AccountName,

    [Parameter(ValueFromPipelineByPropertyName=$true)]
    [string]$PrivateEndpointIP,

    [Parameter(ValueFromPipelineByPropertyName=$true)]
    [string]$VpnSubnetRange,

    [switch]$Interactive,

    [switch]$Redact
)

# ============================================================================
# Configuration
# ============================================================================

$ScriptVersion = "1.0.0"
# UTC timestamp in the same form as the report schema examples (e.g. 2026-05-13T14:35:22Z)
$DiagnosticTimestamp = (Get-Date).ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ", [System.Globalization.CultureInfo]::InvariantCulture)
$TcpConnectTimeoutMs = 5000
$DnsTimeoutMs = 5000

# ============================================================================
# Helper Functions
# ============================================================================

function Show-InputInstructions {
    Write-Host @"
═════════════════════════════════════════════════════════════════════════════
COSMOS DB CONNECTIVITY DIAGNOSTIC SCRIPT v$ScriptVersion
═════════════════════════════════════════════════════════════════════════════

This script will collect network and access diagnostics for your Cosmos DB account.

WHERE TO FIND YOUR INPUTS:
─────────────────────────────────────────────────────────────────────────────

1. ENDPOINT URL (Required)
   Location: Azure Portal > Cosmos DB Account > Overview tab
   Look for: "URI" field
   Example: https://my-cosmos-account.documents.azure.com
   ⚠ Include https://; a trailing slash or :443 port suffix is accepted but not required

2. SUBSCRIPTION ID (Required)
   Location: Azure Portal > Subscriptions
   Look for: "Subscription ID" column or click your subscription > Copy ID
   Format: 12345678-1234-1234-1234-123456789012

3. RESOURCE GROUP (Required)
   Location: Azure Portal > Cosmos DB Account > Top-right corner
   Look for: "Resource group" field
   Example: my-production-rg

4. ACCOUNT NAME (Required)
   Location: Either extract from endpoint URL or find in portal
   From URL: Take the part before ".documents.azure.com"
   From Portal: Account name appears in the breadcrumb and overview
   Example: my-cosmos-account

5. PRIVATE ENDPOINT IP (Optional, but recommended)
   Location: Azure Portal > Cosmos DB Account > Private Endpoint Connections
   Look for: "Private IP address" column (only if private endpoints exist)
   Format: 10.123.171.30 (will be in the 10.x.x.x, 172.16-31.x.x, or 192.168.x.x range)
   Skip this if: You are using public endpoint only

6. VPN SUBNET RANGE (Optional)
   Location: Ask your network team or VPN client settings
   Used to: Analyze if routing from your network to private endpoint is blocked
   Format: 10.0.0.0/24 (CIDR notation)
   Skip this if: You are not using a VPN

═════════════════════════════════════════════════════════════════════════════

"@
}

function Read-InputsInteractively {
    Show-InputInstructions

    Write-Host "Please provide the following information:" -ForegroundColor Cyan
    Write-Host ""

    # Endpoint URL
    do {
        $endpoint = Read-Host "Endpoint URL (e.g., https://my-cosmos.documents.azure.com)"
        if ($endpoint -notmatch "^https://[a-z0-9-]+\.documents\.azure\.com") {
            Write-Host "Invalid format. Expected: https://<account-name>.documents.azure.com" -ForegroundColor Yellow
        }
    } while ($endpoint -notmatch "^https://[a-z0-9-]+\.documents\.azure\.com")

    # Subscription ID
    do {
        $subId = Read-Host "Subscription ID (12345678-1234-1234-1234-123456789012)"
        if ($subId -notmatch "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$") {
            Write-Host "Invalid format. Expected GUID format." -ForegroundColor Yellow
        }
    } while ($subId -notmatch "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")

    $rg = Read-Host "Resource Group name"
    $account = Read-Host "Account Name"
    $peIP = Read-Host "Private Endpoint IP (optional, press Enter to skip)"
    $vpnSubnet = Read-Host "VPN Subnet Range (optional, e.g., 10.0.0.0/24, press Enter to skip)"

    return @{
        EndpointUrl       = $endpoint
        SubscriptionId    = [guid]$subId
        ResourceGroup     = $rg
        AccountName       = $account
        PrivateEndpointIP = if ($peIP) { $peIP } else { $null }
        VpnSubnetRange    = if ($vpnSubnet) { $vpnSubnet } else { $null }
    }
}

function Invoke-DnsResolution {
    param([string]$Hostname)

    $result = @{
        hostname   = $Hostname
        succeeded  = $false
        addresses  = @()
        error      = $null
        dnsServers = @()
        latencyMs  = 0
    }

    # Record the configured DNS servers first so they are reported even when
    # resolution fails. Get-DnsClientServerAddress is available only on Windows.
    if ($PSVersionTable.PSEdition -eq 'Desktop' -or $IsWindows) {
        try {
            $dnsConfig = Get-DnsClientServerAddress -ErrorAction SilentlyContinue |
                Where-Object { $_.ServerAddresses } |
                Select-Object -First 1
            if ($dnsConfig) {
                $result.dnsServers = @($dnsConfig.ServerAddresses)
            }
        } catch { }
    }

    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    try {
        $addresses = [System.Net.Dns]::GetHostAddresses($Hostname)
        $stopwatch.Stop()

        $result.succeeded = $true
        $result.addresses = @($addresses | ForEach-Object { $_.ToString() })
        $result.latencyMs = [int]$stopwatch.ElapsedMilliseconds
    } catch {
        $stopwatch.Stop()
        # Report how long the failed lookup took and unwrap the .NET exception
        # so the message reads e.g. "No such host is known".
        $result.latencyMs = [int]$stopwatch.ElapsedMilliseconds
        $result.error = $_.Exception.GetBaseException().Message
    }

    return $result
}

function Invoke-TcpConnectivityTest {
    param(
        [string]$Hostname,
        [int]$Port = 443,
        [int]$TimeoutMs = 5000
    )

    $result = @{
        hostname  = $Hostname
        port      = $Port
        succeeded = $false
        error     = $null
        latencyMs = 0
        sourceIp  = $null
    }

    $tcpClient = New-Object System.Net.Sockets.TcpClient
    try {
        $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
        $task = $tcpClient.ConnectAsync($Hostname, $Port)
        # Capture Wait's Boolean result: it is $false on timeout, and leaving it
        # unassigned would add it to this function's output alongside $result.
        $completed = $task.Wait($TimeoutMs)
        $stopwatch.Stop()

        if ($completed -and $tcpClient.Connected) {
            $result.succeeded = $true
            $result.latencyMs = [int]$stopwatch.ElapsedMilliseconds

            # Try to get source IP
            try {
                $endpoint = $tcpClient.Client.LocalEndPoint
                $result.sourceIp = $endpoint.Address.ToString()
            } catch { }
        } else {
            $result.error = "Connection timeout after ${TimeoutMs}ms"
        }
    } catch {
        $result.error = $_.Exception.GetBaseException().Message
    } finally {
        # Always release the socket, including after a timeout or error.
        $tcpClient.Close()
    }

    return $result
}

function Invoke-HttpsProbe {
    param([string]$Url)

    $result = @{
        url        = $Url
        succeeded  = $false
        statusCode = $null
        error      = $null
        latencyMs  = 0
    }

    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    try {
        $response = Invoke-WebRequest -Uri $Url -Method Head -TimeoutSec 5 -UseBasicParsing -ErrorAction Stop
        $stopwatch.Stop()

        $result.succeeded = $true
        $result.statusCode = [int]$response.StatusCode
        $result.latencyMs = [int]$stopwatch.ElapsedMilliseconds
    } catch {
        $stopwatch.Stop()
        # Exception.Response is set only when the server answered with an HTTP error
        # status. Leave statusCode as $null for DNS, TCP, or TLS failures.
        $errorResponse = $_.Exception.Response
        if ($errorResponse) {
            $result.statusCode = [int]$errorResponse.StatusCode
            $result.latencyMs = [int]$stopwatch.ElapsedMilliseconds
        }
        $result.error = $_.Exception.Message
    }

    return $result
}

function Get-PrivateNetworkIndicators {
    param(
        [string[]]$ResolvedAddresses,
        [string]$PrivateEndpointIP,
        [string]$VpnSubnetRange
    )

    # Note: $VpnSubnetRange is accepted but not evaluated here, and
    # vpnRouteWarning is left as $null by this function.
    $result = @{
        isPrivateRange                 = $false
        indicators                     = @()
        matchesExpectedPrivateEndpoint = $false
        vpnRouteWarning                = $null
    }

    # Check if resolved IPs are private range
    foreach ($addr in $ResolvedAddresses) {
        if (IsPrivateIpAddress $addr) {
            $result.isPrivateRange = $true
            $result.indicators += "Resolved to RFC 1918 private IP range ($addr)"
        }
    }

    # Check if matches expected private endpoint
    if ($PrivateEndpointIP -and $ResolvedAddresses -contains $PrivateEndpointIP) {
        $result.matchesExpectedPrivateEndpoint = $true
        $result.indicators += "Matches expected private endpoint IP ($PrivateEndpointIP)"
    } elseif ($PrivateEndpointIP -and $ResolvedAddresses.Count -gt 0) {
        $result.indicators += "WARNING: Resolved to $($ResolvedAddresses[0]) but expected private endpoint IP is $PrivateEndpointIP"
    }

    return $result
}

function IsPrivateIpAddress {
    param([string]$IpAddress)

    try {
        $ip = [System.Net.IPAddress]::Parse($IpAddress)
        # RFC 1918 ranges
        if ($ip.ToString() -match "^10\." -or $ip.ToString() -match "^172\.(1[6-9]|2[0-9]|3[01])\." -or $ip.ToString() -match "^192\.168\.") {
            return $true
        }
        # Loopback
        if ($ip.AddressFamily -eq "InterNetwork" -and $ip.GetAddressBytes()[0] -eq 127) {
            return $true
        }
    } catch { }

    return $false
}

function Get-AzureCliContext {
    $result = @{
        installed           = $false
        authenticated       = $false
        currentUser         = $null
        currentTenant       = $null
        currentSubscription = $null
        error               = $null
    }

    try {
        $null = & az --version 2>&1
        if ($LASTEXITCODE -eq 0) {
            $result.installed = $true
        }
    } catch {
        $result.error = "Azure CLI not found. Skipping Azure context checks."
        return $result
    }

    try {
        # Discard stderr so CLI warnings cannot corrupt the JSON written to stdout.
        $accountJson = & az account show --output json 2>$null
        if ($LASTEXITCODE -ne 0 -or -not $accountJson) {
            throw "az account show failed"
        }
        $account = $accountJson | Out-String | ConvertFrom-Json
        $result.authenticated = $true
        $result.currentUser = $account.user.name
        $result.currentTenant = $account.tenantId
        $result.currentSubscription = $account.id
    } catch {
        $result.error = "Not authenticated with Azure CLI. Run 'az login' to proceed with Azure checks."
    }

    return $result
}

function Get-AzureAccountNetworkConfig {
    param(
        [guid]$SubscriptionId,
        [string]$ResourceGroup,
        [string]$AccountName
    )

    $result = @{
        checked                       = $false
        publicNetworkAccessRestricted = $null
        privateEndpoints              = @()
        vnetRules                     = @()
        error                         = $null
    }

    try {
        # Query the requested subscription explicitly instead of the CLI default.
        $azArgs = @('cosmosdb', 'show', '--resource-group', $ResourceGroup, '--name', $AccountName, '--output', 'json')
        if ($SubscriptionId -ne [guid]::Empty) {
            $azArgs += @('--subscription', $SubscriptionId.ToString())
        }
        $accountJson = & az @azArgs 2>$null
        if ($LASTEXITCODE -ne 0 -or -not $accountJson) {
            throw "az cosmosdb show failed with exit code $LASTEXITCODE"
        }
        $account = $accountJson | Out-String | ConvertFrom-Json

        if ($account) {
            $result.checked = $true

            # Assumption: the CLI may return the setting at the top level or under
            # 'properties', so both shapes are checked.
            $publicAccess = if ($null -ne $account.publicNetworkAccess) { $account.publicNetworkAccess } else { $account.properties.publicNetworkAccess }
            $result.publicNetworkAccessRestricted = $publicAccess -eq "Disabled"

            # Get private endpoints from the account object (assumes the
            # 'privateEndpointConnections' array is included in the show output).
            $peConnections = @($account.privateEndpointConnections | Where-Object { $_ })
            if ($peConnections.Count -gt 0) {
                $result.privateEndpoints = @($peConnections | Select-Object -Property id, @{n='state';e={
                    if ($_.privateLinkServiceConnectionState) { $_.privateLinkServiceConnectionState.status }
                    else { $_.properties.privateLinkServiceConnectionState.status }
                }})
            }
        }
    } catch {
        $result.error = $_.Exception.Message
    }

    return $result
}

function Get-RbacAssessment {
    param(
        [guid]$SubscriptionId,
        [string]$ResourceGroup,
        [string]$AccountName
    )

    $result = @{
        checked                = $false
        canReadAccount         = $false
        canManageAccount       = $false
        canExecuteDataPlaneOps = $false
        roleAssignments        = @()
        classification         = "unknown"
        error                  = $null
    }

    try {
        $scope = "/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroup/providers/Microsoft.DocumentDB/databaseAccounts/$AccountName"

        # Try to read account (implies Reader or higher)
        $accountJson = & az cosmosdb show --resource-group $ResourceGroup --name $AccountName --output json 2>$null
        if ($LASTEXITCODE -ne 0 -or -not $accountJson) {
            throw "Unable to read the Cosmos DB account with the current credentials."
        }
        $account = $accountJson | Out-String | ConvertFrom-Json
        if ($account) {
            $result.checked = $true
            $result.canReadAccount = $true

            # Check role assignments for the signed-in principal only, including
            # assignments inherited from the resource group or subscription.
            $assignee = & az account show --query user.name --output tsv 2>$null
            $listArgs = @('role', 'assignment', 'list', '--scope', $scope, '--include-inherited', '--output', 'json')
            if ($assignee) { $listArgs += @('--assignee', $assignee) }
            $rolesJson = & az @listArgs 2>$null
            $roles = if ($LASTEXITCODE -eq 0 -and $rolesJson) { $rolesJson | Out-String | ConvertFrom-Json } else { $null }
            if ($roles) {
                $result.roleAssignments = @($roles | Select-Object -Property roleDefinitionName, principalName)

                # Classify permissions
                $roleNames = @($roles | Select-Object -ExpandProperty roleDefinitionName)
                if ($roleNames -contains "Contributor" -or $roleNames -contains "Owner") {
                    $result.canManageAccount = $true
                    $result.canExecuteDataPlaneOps = $true
                    $result.classification = "sufficient"
                } elseif ($roleNames -contains "Cosmos DB Operator") {
                    # Cosmos DB Operator can manage accounts but cannot access keys or data.
                    $result.canManageAccount = $true
                    $result.classification = "partial"
                } elseif ($roleNames -contains "Cosmos DB Account Reader Role") {
                    # Read-only access to account data through the read-only keys.
                    $result.canExecuteDataPlaneOps = $true
                    $result.classification = "partial"
                } else {
                    $result.classification = "partial"
                }
            }
        }
    } catch {
        $result.error = $_.Exception.Message
        $result.classification = "insufficient"
    }

    return $result
}

function Invoke-Classification {
    param(
        [hashtable]$DnsResult,
        [hashtable]$TcpResult,
        [hashtable]$PrivateNetworkIndicators,
        [hashtable]$AzureNetworkConfig
    )

    $classification = @{
        status             = "unknown"
        code               = "unknown"
        summary            = "Unable to classify"
        rootCause          = $null
        recommendedActions = @()
    }

    # DNS failure
    if (-not $DnsResult.succeeded) {
        $classification.status = "failure"
        $classification.code = "dns_resolution_failed"
        $classification.summary = "DNS resolution failed. The Cosmos DB endpoint hostname cannot be resolved."
        $classification.rootCause = "DNS configuration, VPN/proxy DNS redirect, or network connectivity issue"
        $classification.recommendedActions = @(
            "1. Check if you are connected to corporate VPN or proxy that intercepts DNS",
            "2. Manually run: nslookup $($DnsResult.hostname)",
            "3. If nslookup fails, check with your network team or ISP",
            "4. Try pinging the endpoint or using nslookup with alternate DNS: nslookup $($DnsResult.hostname) 8.8.8.8"
        )
        return $classification
    }

    # DNS succeeded but TCP failed
    if ($DnsResult.succeeded -and -not $TcpResult.succeeded) {
        $classification.status = "failure"

        if ($PrivateNetworkIndicators.isPrivateRange) {
            # Private endpoint path: emit the dedicated code from the classification matrix.
            $classification.code = "private_endpoint_network_path_blocked"
            $classification.summary = "DNS resolution succeeded but TCP 443 connection failed to private endpoint. Network path is blocked."
            $classification.rootCause = "Private endpoint network path blocked (VPN routing, firewall/NVA, NSG, UDR, or peering issue)"
            $classification.recommendedActions = @(
                "1. Verify VPN connectivity and that your client subnet can route to the private endpoint subnet",
                "2. Ask your network team to verify routing between $([System.Net.Dns]::GetHostName()) and private endpoint $($DnsResult.addresses[0])",
                "3. Check Azure network security groups (NSGs) rules for port 443 inbound on private endpoint subnet",
                "4. Verify Azure Virtual Network peering and User Defined Routes (UDRs)",
                "5. Check if corporate firewall/NVA is blocking the connection",
                "6. Manually run: Test-NetConnection -ComputerName $($DnsResult.hostname) -Port 443"
            )
        } else {
            $classification.code = "tcp_connectivity_blocked"
            $classification.summary = "DNS resolution succeeded but TCP 443 connection failed. Network path is blocked."
            $classification.rootCause = "Public endpoint network path blocked (firewall, proxy, ISP, or regional restriction)"
            $classification.recommendedActions = @(
                "1. Check if corporate firewall is blocking outbound port 443",
                "2. If behind proxy, verify proxy settings allow HTTPS to documents.azure.com",
                "3. Manually run: Test-NetConnection -ComputerName $($DnsResult.hostname) -Port 443",
                "4. Try connecting from a different network to isolate the issue"
            )
        }
        return $classification
    }

    # Both succeeded
    if ($DnsResult.succeeded -and $TcpResult.succeeded) {
        $classification.status = "success"
        $classification.code = "network_connectivity_healthy"
        $classification.summary = "Network connectivity is healthy. DNS resolves and TCP 443 is reachable."
        $classification.rootCause = $null
        $classification.recommendedActions = @(
            "✓ Local network connectivity is working",
            "If Cosmos DB operations still fail, check:",
            " - RBAC/authentication permissions",
            " - Account firewall IP rules (if enabled)",
            " - Data plane token expiry",
            " - Application-level issues (connection strings, SDK versions)"
        )
        return $classification
    }

    return $classification
}

function Redact-Sensitive {
    param([object]$Object)

    if (-not $Redact) { return $Object }

    $json = $Object | ConvertTo-Json -Depth 10
    $json = $json -replace [regex]::Escape($SubscriptionId.ToString()), "REDACTED-SUBSCRIPTION-ID"

    # Redact tenant IDs (GUIDs in certain fields)
    $json = $json -replace '"currentTenant"\s*:\s*"[^"]*"', '"currentTenant": "REDACTED-TENANT-ID"'

    # Redact user names
    $json = $json -replace '"currentUser"\s*:\s*"[^"]*"', '"currentUser": "REDACTED-USER-NAME"'
    $json = $json -replace '"principalName"\s*:\s*"[^"]*"', '"principalName": "REDACTED-PRINCIPAL-NAME"'

    return $json | ConvertFrom-Json
}

# ============================================================================
# Main Execution
# ============================================================================

try {
    # Validate and collect inputs
    if ($Interactive -and -not $EndpointUrl) {
        $inputs = Read-InputsInteractively
        $EndpointUrl = $inputs.EndpointUrl
        $SubscriptionId = $inputs.SubscriptionId
        $ResourceGroup = $inputs.ResourceGroup
        $AccountName = $inputs.AccountName
        $PrivateEndpointIP = $inputs.PrivateEndpointIP
        $VpnSubnetRange = $inputs.VpnSubnetRange
    } elseif (-not $EndpointUrl) {
        Write-Host "No endpoint URL provided. Use -Interactive flag or provide parameters." -ForegroundColor Red
        Show-InputInstructions
        exit 1
    }

    # Extract hostname from URL
    $uri = [System.Uri]$EndpointUrl
    $hostname = $uri.Host

    Write-Host "Collecting diagnostics for: $hostname" -ForegroundColor Cyan
    Write-Host ""

    # Run diagnostics
    Write-Host "[1/5] DNS Resolution..." -ForegroundColor Cyan
    $dnsResult = Invoke-DnsResolution -Hostname $hostname

    Write-Host "[2/5] TCP Connectivity (port 443)..." -ForegroundColor Cyan
    $tcpResult = Invoke-TcpConnectivityTest -Hostname $hostname -Port 443 -TimeoutMs $TcpConnectTimeoutMs

    Write-Host "[3/5] HTTPS Probe..." -ForegroundColor Cyan
    $httpsResult = Invoke-HttpsProbe -Url $EndpointUrl

    Write-Host "[4/5] Private Network Analysis..." -ForegroundColor Cyan
    $privateNetIndicators = Get-PrivateNetworkIndicators -ResolvedAddresses $dnsResult.addresses -PrivateEndpointIP $PrivateEndpointIP -VpnSubnetRange $VpnSubnetRange

    Write-Host "[5/5] Azure Configuration & RBAC..." -ForegroundColor Cyan
    $cliContext = Get-AzureCliContext
    $networkConfig = @{ checked = $false; error = "Skipped" }
    $rbacAssessment = @{ checked = $false; classification = "unknown"; error = "Skipped" }

    if ($cliContext.authenticated -and $SubscriptionId -and $ResourceGroup -and $AccountName) {
        $networkConfig = Get-AzureAccountNetworkConfig -SubscriptionId $SubscriptionId -ResourceGroup $ResourceGroup -AccountName $AccountName
        $rbacAssessment = Get-RbacAssessment -SubscriptionId $SubscriptionId -ResourceGroup $ResourceGroup -AccountName $AccountName
    } elseif (-not $cliContext.authenticated) {
        Write-Host " ⚠ Azure CLI not authenticated. Skipping Azure checks. Run 'az login' to enable." -ForegroundColor Yellow
    }

    Write-Host ""
    Write-Host "Generating classification..." -ForegroundColor Cyan
    $classification = Invoke-Classification -DnsResult $dnsResult -TcpResult $tcpResult -PrivateNetworkIndicators $privateNetIndicators -AzureNetworkConfig $networkConfig

    # Build final report
    $report = @{
        version = $ScriptVersion
        timestamp = $DiagnosticTimestamp
        target = @{
            endpointUrl = if ($Redact) { "REDACTED" } else { $EndpointUrl }
            hostname = $hostname
            subscriptionId = if ($Redact -and $SubscriptionId) { "REDACTED" } else { $SubscriptionId.ToString() }
            resourceGroup = if ($Redact -and $ResourceGroup) { "REDACTED" } else { $ResourceGroup }
            accountName = if ($Redact -and $AccountName) { "REDACTED" } else { $AccountName }
        }
        execution = @{
            hostname = [System.Net.Dns]::GetHostName()
            platform = $PSVersionTable.OS
            powershellVersion = $PSVersionTable.PSVersion.ToString()
        }
        diagnostics = @{
            dns = $dnsResult
            tcp = $tcpResult
            https = $httpsResult
            privateNetwork = $privateNetIndicators
            azureNetworkConfig = $networkConfig
            rbac = $rbacAssessment
            azureCli = $cliContext
        }
        classification = $classification
    }

    # Redact if requested
    if ($Redact) {
        $report = Redact-Sensitive -Object $report
    }

    # Output JSON report
    $jsonReport = $report | ConvertTo-Json -Depth 10

    # Save to file
    $timestamp = Get-Date -Format "yyyyMMdd_HHmmss"
    $outputFile = "cosmos-diagnostic-$timestamp.json"
    $jsonReport | Out-File -FilePath $outputFile -Encoding UTF8
|
||||||
|
|
||||||
|
Write-Host ""
|
||||||
|
Write-Host "═════════════════════════════════════════════════════════════════════════════" -ForegroundColor Green
|
||||||
|
Write-Host "DIAGNOSTIC COMPLETE" -ForegroundColor Green
|
||||||
|
Write-Host "═════════════════════════════════════════════════════════════════════════════" -ForegroundColor Green
|
||||||
|
Write-Host ""
|
||||||
|
Write-Host "Summary:" -ForegroundColor Cyan
|
||||||
|
Write-Host " DNS Resolution: $(if ($dnsResult.succeeded) { '✓ PASS' } else { '✗ FAIL' })"
|
||||||
|
Write-Host " TCP Connectivity: $(if ($tcpResult.succeeded) { '✓ PASS' } else { '✗ FAIL' })"
|
||||||
|
Write-Host " Private Network: $(if ($privateNetIndicators.isPrivateRange) { 'Detected (Private Endpoint)' } else { 'Not Detected (Public Endpoint)' })"
|
||||||
|
Write-Host " Classification: $($classification.status.ToUpper()) - $($classification.code)"
|
||||||
|
Write-Host ""
|
||||||
|
Write-Host "Full report saved to: $outputFile" -ForegroundColor Green
|
||||||
|
Write-Host ""
|
||||||
|
Write-Host "Summary:" -ForegroundColor Yellow
|
||||||
|
Write-Host $classification.summary
|
||||||
|
Write-Host ""
|
||||||
|
if ($classification.recommendedActions.Count -gt 0) {
|
||||||
|
Write-Host "Recommended Actions:" -ForegroundColor Yellow
|
||||||
|
$classification.recommendedActions | ForEach-Object { Write-Host " $_" }
|
||||||
|
}
|
||||||
|
Write-Host ""
|
||||||
|
|
||||||
|
# Output JSON to console for easy copy/paste
|
||||||
|
Write-Host "Full JSON Report:" -ForegroundColor Cyan
|
||||||
|
Write-Host "─────────────────────────────────────────────────────────────────────────────"
|
||||||
|
Write-Host $jsonReport
|
||||||
|
|
||||||
|
} catch {
|
||||||
|
Write-Host "Error: $($_.Exception.Message)" -ForegroundColor Red
|
||||||
|
exit 1
|
||||||
|
}
|
||||||
@@ -0,0 +1,352 @@
|
|||||||
|
# Cosmos DB Connectivity Diagnostic - Complete Documentation Index
|
||||||
|
|
||||||
|
## 📦 Deliverables
|
||||||
|
|
||||||
|
This folder contains a complete, production-ready diagnostic toolkit for troubleshooting Cosmos DB connectivity issues. Below is a guide to all files and their purpose.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📚 Documentation Files
|
||||||
|
|
||||||
|
### 1. **README.md** ← Start here
|
||||||
|
**Purpose:** Comprehensive usage guide for customers and support teams
|
||||||
|
|
||||||
|
**Contains:**
|
||||||
|
- Overview and features
|
||||||
|
- Quick start in 3 modes (interactive, non-interactive, with redaction)
|
||||||
|
- Step-by-step guide to finding all inputs
|
||||||
|
- Understanding output format
|
||||||
|
- Common scenarios and examples
|
||||||
|
- Integration examples
|
||||||
|
- Troubleshooting guide for common issues
|
||||||
|
|
||||||
|
**Read this if:** You're running the script for the first time or onboarding someone else
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. **QUICK_REFERENCE.md** ← For urgent issues
|
||||||
|
**Purpose:** 2-minute quick-start card for customers
|
||||||
|
|
||||||
|
**Contains:**
|
||||||
|
- 3-step quick start
|
||||||
|
- Result codes at a glance
|
||||||
|
- Common fixes
|
||||||
|
- Prerequisite checklist
|
||||||
|
|
||||||
|
**Read this if:** You need to run the script NOW and don't have time for full docs
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 3. **DIAGNOSTIC_SCHEMA.md** ← For developers/automation
|
||||||
|
**Purpose:** Complete JSON output specification
|
||||||
|
|
||||||
|
**Contains:**
|
||||||
|
- Full JSON schema with field descriptions
|
||||||
|
- Root, target, execution, diagnostics, and classification objects
|
||||||
|
- DNS/TCP/HTTPS/private network result formats
|
||||||
|
- Azure config and RBAC object structures
|
||||||
|
- Classification code reference table
|
||||||
|
- Sample outputs for 3 scenarios
|
||||||
|
- Parsing guidelines
|
||||||
|
- Version history
|
||||||
|
|
||||||
|
**Read this if:**
|
||||||
|
- You're building a parser or automation tool
|
||||||
|
- You need to understand the JSON structure
|
||||||
|
- You're integrating with support ticketing system
|
||||||
|
- You want to validate output structure
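
A parser can start by checking the shape of a report before reading individual fields. The sketch below is a hedged illustration, not part of the toolkit: `Test-ReportShape` is a hypothetical helper, the sample JSON is invented, and the key list is taken from the top-level keys the script places in its report (`version`, `timestamp`, `target`, `execution`, `diagnostics`, `classification`).

```powershell
function Test-ReportShape {
    param([Parameter(Mandatory)] [string] $Json)

    # Top-level keys the diagnostic script writes into its report.
    $expectedKeys = 'version', 'timestamp', 'target', 'execution', 'diagnostics', 'classification'

    $report = $Json | ConvertFrom-Json
    $present = $report.PSObject.Properties.Name

    # Emit every expected key that is absent; no output means the shape looks right.
    $expectedKeys | Where-Object { $_ -notin $present }
}

# Hypothetical, trimmed-down report used only for illustration.
$sample = '{"version":"1.0.0","timestamp":"2026-05-13T14:30:45Z","target":{},"execution":{},"diagnostics":{},"classification":{"status":"success","code":"network_connectivity_healthy"}}'
$missing = @(Test-ReportShape -Json $sample)
"Missing keys: $($missing.Count)"
```

A stricter validator would also check nested objects against DIAGNOSTIC_SCHEMA.md; this only guards against truncated or unrelated files.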
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 4. **CLASSIFICATION_MATRIX.md** ← For support teams
|
||||||
|
**Purpose:** Support playbooks and triage routing
|
||||||
|
|
||||||
|
**Contains:**
|
||||||
|
- Decision tree flowchart (ASCII art)
|
||||||
|
- All classification codes with detailed explanations
|
||||||
|
- Root causes and recommended actions for each code
|
||||||
|
- Tier 1 triage checklist
|
||||||
|
- Detailed playbooks for each failure scenario:
|
||||||
|
- DNS Resolution Failed
|
||||||
|
- TCP 443 Failed (Public Endpoint)
|
||||||
|
- TCP 443 Failed (Private Endpoint)
|
||||||
|
- RBAC Insufficient
|
||||||
|
- Support ticket template
|
||||||
|
- Python parsing example
|
||||||
|
- Automation routing matrix
|
||||||
|
|
||||||
|
**Read this if:**
|
||||||
|
- You're a support engineer receiving diagnostic reports
|
||||||
|
- You need to route issues based on classification
|
||||||
|
- You're building automation to process diagnostics
|
||||||
|
- You need to escalate to specialist teams
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔧 Script File
|
||||||
|
|
||||||
|
### **Diagnose-CosmosConnectivity.ps1**
|
||||||
|
**Purpose:** Main diagnostic script (customer-executable)
|
||||||
|
|
||||||
|
**What it does:**
|
||||||
|
1. Collects the account endpoint and Azure identifiers (subscription ID, resource group, account name), interactively or via parameters
|
||||||
|
2. Runs 5 diagnostic checks:
|
||||||
|
- DNS resolution of account endpoint
|
||||||
|
- TCP 443 connectivity test
|
||||||
|
- HTTPS reachability probe
|
||||||
|
- Private network indicators analysis
|
||||||
|
- Azure CLI queries (if authenticated)
|
||||||
|
3. Performs RBAC assessment
|
||||||
|
4. Generates classification (success/failure/warning + specific code)
|
||||||
|
5. Outputs structured JSON to file and console
|
||||||
|
6. Produces human-readable summary with recommended actions
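
The first two checks in the list above (DNS resolution, then a TCP 443 connection with a timeout) can be sketched as follows. This is an illustrative approximation under stated assumptions, not the code inside `Diagnose-CosmosConnectivity.ps1`: `Test-Endpoint` and its parameters are hypothetical names, and it uses only .NET types that ship with PowerShell.

```powershell
function Test-Endpoint {
    param(
        [Parameter(Mandatory)] [string] $HostName,
        [int] $Port = 443,
        [int] $TimeoutMs = 5000
    )

    $addresses = @()
    $tcpOk = $false

    # Step 1: DNS resolution. If this fails there is nothing to connect to.
    try {
        $addresses = @([System.Net.Dns]::GetHostAddresses($HostName) |
            ForEach-Object { $_.ToString() })
    } catch {
        return [pscustomobject]@{ dns = $false; tcp = $false; addresses = @() }
    }

    # Step 2: TCP connect with a timeout so silently dropped packets cannot hang the check.
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $pending = $client.ConnectAsync($HostName, $Port)
        $tcpOk = $pending.Wait($TimeoutMs) -and $client.Connected
    } catch {
        $tcpOk = $false
    } finally {
        $client.Close()
    }

    [pscustomobject]@{ dns = ($addresses.Count -gt 0); tcp = $tcpOk; addresses = $addresses }
}
```

Calling `Test-Endpoint -HostName 'my-cosmos-account.documents.azure.com'` returns an object whose `dns` and `tcp` flags correspond to the PASS/FAIL lines in the console summary.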
|
||||||
|
|
||||||
|
**Key Features:**
|
||||||
|
- 300+ lines of well-commented PowerShell
|
||||||
|
- Error handling for all network operations
|
||||||
|
- Timeouts to prevent hanging
|
||||||
|
- Optional sensitive data redaction
|
||||||
|
- Works on Windows (Windows PowerShell 5.0+) and on macOS and Linux (cross-platform PowerShell 6+; Windows PowerShell 5.x is Windows-only)
|
||||||
|
- No external dependencies except optional Azure CLI
|
||||||
|
|
||||||
|
**How to run:**
|
||||||
|
```powershell
# Interactive (recommended first run)
.\Diagnose-CosmosConnectivity.ps1 -Interactive

# Non-interactive (scripted)
.\Diagnose-CosmosConnectivity.ps1 `
  -EndpointUrl "..." -SubscriptionId "..." -ResourceGroup "..." -AccountName "..."

# Safe for support (redacted)
.\Diagnose-CosmosConnectivity.ps1 ... -Redact
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔄 File Relationships
|
||||||
|
|
||||||
|
```
Customer Issue: "Can't connect to Cosmos DB"
│
├─→ QUICK_REFERENCE.md (if in hurry)
│   │
│   └─→ "Run this command"
│
└─→ README.md (comprehensive guidance)
    │
    ├─→ Run: Diagnose-CosmosConnectivity.ps1
    │   │
    │   └─→ Outputs JSON file + console summary
    │
    ├─→ Read classification code
    │
    └─→ CLASSIFICATION_MATRIX.md (support playbook)
        │
        ├─→ Find your classification code
        │
        ├─→ Read root causes
        │
        └─→ Follow recommended actions
            │
            ├─→ Self-resolve?
            │   └─→ Done!
            │
            └─→ Still stuck?
                │
                ├─→ Gather info from JSON
                │
                ├─→ Redact with -Redact flag
                │
                └─→ Escalate to support
                    │
                    ├─→ Support triages with CLASSIFICATION_MATRIX.md
                    │
                    └─→ Route to specialist (network, auth, etc.)
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🎯 Usage by Role
|
||||||
|
|
||||||
|
### 👤 Customer / End User
|
||||||
|
1. Read: **QUICK_REFERENCE.md** (2 min)
|
||||||
|
2. Gather inputs as shown in README.md
|
||||||
|
3. Run: `.\Diagnose-CosmosConnectivity.ps1 -Interactive`
|
||||||
|
4. Review output—look for Classification Code
|
||||||
|
5. Try recommended actions from console output
|
||||||
|
6. If stuck → Share JSON with support (use `-Redact`)
|
||||||
|
|
||||||
|
### 👨💼 Support Engineer (Tier 1)
|
||||||
|
1. Receive JSON report from customer
|
||||||
|
2. Read: **CLASSIFICATION_MATRIX.md** section "Tier 1: Triage"
|
||||||
|
3. Look up classification.code in "Classification Code Reference"
|
||||||
|
4. Follow the corresponding playbook
|
||||||
|
5. Either self-resolve or route to specialist
|
||||||
|
|
||||||
|
### 👨💻 Support Engineer (Specialist)
|
||||||
|
1. Receive routed issue with JSON and escalation context
|
||||||
|
2. Read relevant playbook from **CLASSIFICATION_MATRIX.md**
|
||||||
|
3. Use **DIAGNOSTIC_SCHEMA.md** to parse specific JSON fields
|
||||||
|
4. Reference "Recommended Actions" for deep-dive steps
|
||||||
|
5. May request customer to re-run with additional parameters
|
||||||
|
|
||||||
|
### 🤖 Automation / Integration
|
||||||
|
1. Read: **DIAGNOSTIC_SCHEMA.md** (schema specification)
|
||||||
|
2. Parse JSON output from script
|
||||||
|
3. Route based on classification.code
|
||||||
|
4. (Optional) Read **CLASSIFICATION_MATRIX.md** section "JSON Parsing for Automation"
|
||||||
|
5. Integrate with ticketing, routing, or remediation system
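
Routing on `classification.code` can be as small as a lookup table. The sketch below is a hedged example: the codes are the ones documented in this toolkit, while the queue names and the `Get-TriageQueue` helper are hypothetical placeholders to be replaced with whatever the ticketing system uses.

```powershell
# Classification codes mapped to hypothetical queue names.
$routes = @{
    'dns_resolution_failed'                 = 'network-dns'
    'tcp_connectivity_blocked'              = 'network-firewall'
    'private_endpoint_network_path_blocked' = 'network-private-endpoint'
    'private_endpoint_mismatch'             = 'network-private-endpoint'
    'rbac_insufficient'                     = 'identity-access'
    'azure_config_check_skipped'            = 'customer-follow-up'
    'network_connectivity_healthy'          = 'application-support'
}

function Get-TriageQueue {
    param([Parameter(Mandatory)] $Report)

    $code = $Report.classification.code
    # Unknown or missing codes fall through to a human instead of being guessed at.
    if ($code -and $routes.ContainsKey($code)) { $routes[$code] } else { 'manual-triage' }
}

$report = '{"classification":{"status":"failure","code":"dns_resolution_failed"}}' | ConvertFrom-Json
Get-TriageQueue -Report $report
```

Sending unrecognized codes to a manual queue keeps the router safe when a newer script version introduces codes it does not know yet.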
|
||||||
|
|
||||||
|
### 📊 Product Team / Data Analysis
|
||||||
|
1. Collect diagnostic reports over time
|
||||||
|
2. Aggregate classification codes to identify trends
|
||||||
|
3. Use JSON structure to extract metrics (DNS latency, TCP success rate, etc.)
|
||||||
|
4. Reference **DIAGNOSTIC_SCHEMA.md** for field definitions
|
||||||
|
5. Correlate with support ticket data for insights
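
Aggregating classification codes across collected reports might look like the sketch below. It assumes the reports sit in one folder and keep the script's `cosmos-diagnostic-<timestamp>.json` naming; `Get-ClassificationSummary` is a hypothetical helper name.

```powershell
function Get-ClassificationSummary {
    param([Parameter(Mandatory)] [string] $Directory)

    # Read each report, pull out its classification code, and count occurrences.
    Get-ChildItem -Path $Directory -Filter 'cosmos-diagnostic-*.json' |
        ForEach-Object { (Get-Content -Path $_.FullName -Raw | ConvertFrom-Json).classification.code } |
        Group-Object |
        Sort-Object Count -Descending |
        Select-Object Name, Count
}
```

Running `Get-ClassificationSummary -Directory .\reports` (a hypothetical folder) lists each code with its count, most frequent first.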
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📋 Classification Codes at a Glance
|
||||||
|
|
||||||
|
Quick reference (full details in CLASSIFICATION_MATRIX.md):
|
||||||
|
|
||||||
|
| Code | Type | Severity | What It Means |
|------|------|----------|---------------|
| `network_connectivity_healthy` | ✅ | Info | Network works; if still broken, check auth/app |
| `dns_resolution_failed` | ❌ | High | Cannot resolve endpoint (DNS/VPN/proxy issue) |
| `tcp_connectivity_blocked` | ❌ | High | DNS works, port 443 blocked (firewall/ISP) |
| `private_endpoint_network_path_blocked` | ❌ | High | Private endpoint unreachable (PE routing issue) |
| `rbac_insufficient` | ⚠️ | Medium | Network OK, but permissions missing |
| `private_endpoint_mismatch` | ⚠️ | Medium | Resolved to unexpected private IP |
| `azure_config_check_skipped` | ⚠️ | Low | Azure CLI not authenticated; re-run after `az login` |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔍 Finding Specific Information
|
||||||
|
|
||||||
|
### "I want to know what the JSON contains"
|
||||||
|
→ **DIAGNOSTIC_SCHEMA.md** (all field definitions)
|
||||||
|
|
||||||
|
### "I see a classification code, what does it mean?"
|
||||||
|
→ **CLASSIFICATION_MATRIX.md** (code reference + playbook)
|
||||||
|
|
||||||
|
### "How do I run the script?"
|
||||||
|
→ **README.md** (detailed how-to) or **QUICK_REFERENCE.md** (2-min version)
|
||||||
|
|
||||||
|
### "I'm building a parser/bot"
|
||||||
|
→ **DIAGNOSTIC_SCHEMA.md** (schema + samples) + **CLASSIFICATION_MATRIX.md** (routing logic)
|
||||||
|
|
||||||
|
### "I need to support multiple customers"
|
||||||
|
→ **CLASSIFICATION_MATRIX.md** (support ticket template + triage playbook)
|
||||||
|
|
||||||
|
### "I need to find input for a specific field"
|
||||||
|
→ **README.md** section "Getting Your Inputs" (step-by-step Azure Portal instructions)
|
||||||
|
|
||||||
|
### "How do I integrate this into my system?"
|
||||||
|
→ **DIAGNOSTIC_SCHEMA.md** (JSON structure) + **CLASSIFICATION_MATRIX.md** (routing + Python example)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ✅ Pre-Launch Checklist
|
||||||
|
|
||||||
|
Before deploying to customers, verify:
|
||||||
|
|
||||||
|
- [ ] Script runs without errors in interactive mode
|
||||||
|
- [ ] Script accepts all parameters in non-interactive mode
|
||||||
|
- [ ] `-Redact` flag properly masks sensitive data
|
||||||
|
- [ ] JSON output validates against DIAGNOSTIC_SCHEMA.md
|
||||||
|
- [ ] All classification codes match CLASSIFICATION_MATRIX.md
|
||||||
|
- [ ] README.md examples tested and working
|
||||||
|
- [ ] Support team trained on CLASSIFICATION_MATRIX.md playbooks
|
||||||
|
- [ ] Triage automation configured (if applicable)
|
||||||
|
- [ ] Sample JSON files created and tested
|
||||||
|
- [ ] Accessibility verified (screen readers, etc.)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🚀 Rollout Plan
|
||||||
|
|
||||||
|
### Phase 1: Internal Testing (Week 1)
|
||||||
|
- [ ] Run script on various network configurations
|
||||||
|
- [ ] Test interactive and non-interactive modes
|
||||||
|
- [ ] Verify Azure CLI integration (if connected to test accounts)
|
||||||
|
- [ ] Collect sample JSON outputs
|
||||||
|
|
||||||
|
### Phase 2: Support Dogfood (Week 2)
|
||||||
|
- [ ] Train support team on using CLASSIFICATION_MATRIX.md
|
||||||
|
- [ ] Have support team run diagnostics on internal test accounts
|
||||||
|
- [ ] Collect feedback on documentation clarity
|
||||||
|
- [ ] Refine playbooks based on real cases
|
||||||
|
|
||||||
|
### Phase 3: Limited Release (Week 3)
|
||||||
|
- [ ] Release to subset of customers (e.g., preview tier)
|
||||||
|
- [ ] Gather feedback on usability
|
||||||
|
- [ ] Monitor classification code distribution
|
||||||
|
- [ ] Look for unexpected errors or edge cases
|
||||||
|
|
||||||
|
### Phase 4: General Availability (Week 4)
|
||||||
|
- [ ] Release to all customers
|
||||||
|
- [ ] Monitor issue volume and classification codes
|
||||||
|
- [ ] Use data to identify new playbooks or improvements
|
||||||
|
- [ ] Update documentation based on feedback
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📞 Support & Maintenance
|
||||||
|
|
||||||
|
### Common Questions
|
||||||
|
|
||||||
|
**Q: Can I run the script without Azure CLI?**
|
||||||
|
A: Yes! It will skip Azure configuration checks but still do network diagnostics.
|
||||||
|
|
||||||
|
**Q: Is the script safe? Does it collect personal data?**
|
||||||
|
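A: Yes. It probes your Cosmos DB endpoint (DNS, TCP 443, HTTPS) from your machine and, if you're authenticated, queries Azure through the Azure CLI. The report is saved locally and includes your machine's hostname; use `-Redact` to mask sensitive data before sharing.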
|
||||||
|
|
||||||
|
**Q: What if I get an unexpected error?**
|
||||||
|
A: Check the error message in the console, review the troubleshooting section in README.md, or share the JSON file with support.
|
||||||
|
|
||||||
|
**Q: How often should I re-run diagnostics?**
|
||||||
|
A: After network changes, VPN reconnect, or when troubleshooting intermittent issues.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📈 Success Metrics
|
||||||
|
|
||||||
|
Track these to measure script effectiveness:
|
||||||
|
|
||||||
|
- % of customers who run script on first issue
|
||||||
|
- % of issues self-resolved after reading recommended actions
|
||||||
|
- Reduction in escalations for network vs auth vs app issues
|
||||||
|
- Average time to triage (before: manual back-and-forth; after: automated)
|
||||||
|
- Distribution of classification codes (helps identify common issues)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🔄 Version & Updates
|
||||||
|
|
||||||
|
**Current Version:** 1.0.0
|
||||||
|
**Schema Version:** 1.0.0
|
||||||
|
**Last Updated:** 2026-05-13
|
||||||
|
|
||||||
|
**Versioning Policy:**
|
||||||
|
- Major version (1.x.x) = Breaking changes to JSON schema or classification codes
|
||||||
|
- Minor version (x.1.x) = New checks or optional fields added
|
||||||
|
- Patch version (x.x.1) = Bug fixes, documentation updates
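
Under this policy a consumer only has to reject reports whose major version differs from the one it was written for. A minimal sketch, assuming the report's `version` field is a plain `major.minor.patch` string; `Test-SchemaCompatible` is a hypothetical helper name.

```powershell
function Test-SchemaCompatible {
    param(
        [Parameter(Mandatory)] [string] $ReportVersion,
        [string] $SupportedVersion = '1.0.0'
    )

    # Only a major-version change is breaking; minor and patch changes are additive.
    ([version]$ReportVersion).Major -eq ([version]$SupportedVersion).Major
}

Test-SchemaCompatible -ReportVersion '1.2.3'   # same major version as 1.0.0
Test-SchemaCompatible -ReportVersion '2.0.0'   # different major version
```

The `[version]` cast would reject pre-release suffixes such as `1.0.0-beta`, so a parser that expects those needs its own handling.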
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📄 License & Attribution
|
||||||
|
|
||||||
|
All files in this directory are provided as-is for Cosmos DB connectivity diagnostics.
|
||||||
|
See repository LICENSE file for terms.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Quick Links:**
|
||||||
|
- 🚀 [Quick Start](./QUICK_REFERENCE.md)
|
||||||
|
- 📖 [Full Documentation](./README.md)
|
||||||
|
- 🔧 [Script](./Diagnose-CosmosConnectivity.ps1)
|
||||||
|
- 🗂️ [JSON Schema](./DIAGNOSTIC_SCHEMA.md)
|
||||||
|
- 📋 [Support Playbooks](./CLASSIFICATION_MATRIX.md)
|
||||||
@@ -0,0 +1,144 @@
|
|||||||
|
# Cosmos DB Connectivity Diagnostic - Quick Reference
|
||||||
|
|
||||||
|
## 🚀 Quick Start (2 Minutes)
|
||||||
|
|
||||||
|
### Step 1: Gather Your Info
|
||||||
|
|
||||||
|
| Item | Where to Find |
|------|---------------|
| **Endpoint URL** | Azure Portal → Cosmos DB Account → Overview → URI field |
| **Subscription ID** | Azure Portal → Subscriptions → Copy ID |
| **Resource Group** | Azure Portal → Cosmos DB Account → Top-right "Resource group" |
| **Account Name** | From endpoint URL (the part before `.documents.azure.com`) |
|
||||||
|
|
||||||
|
### Step 2: Run the Script
|
||||||
|
|
||||||
|
**Interactive (easiest):**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 -Interactive
|
||||||
|
```
|
||||||
|
Script will prompt for inputs and guide you.
|
||||||
|
|
||||||
|
**Non-interactive:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://my-cosmos.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "my-rg" `
|
||||||
|
-AccountName "my-cosmos"
|
||||||
|
```
|
||||||
|
|
||||||
|
**With redaction (safe for support):**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://my-cosmos.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "my-rg" `
|
||||||
|
-AccountName "my-cosmos" `
|
||||||
|
-Redact
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 3: Check Result
|
||||||
|
|
||||||
|
Look for the **Classification** line:
|
||||||
|
|
||||||
|
```
|
||||||
|
Classification: SUCCESS - network_connectivity_healthy
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📊 Result Codes
|
||||||
|
|
||||||
|
| Code | Meaning | Action |
|------|---------|--------|
| ✅ `network_connectivity_healthy` | Network OK | Check auth/RBAC if operations still fail |
| ❌ `dns_resolution_failed` | Cannot find hostname | Check VPN/proxy DNS settings |
| ❌ `tcp_connectivity_blocked` | DNS works, but port 443 blocked | Ask network team to check firewall |
| ❌ `private_endpoint_network_path_blocked` | Private endpoint unreachable | Ask network team to check PE routing |
| ⚠️ `rbac_insufficient` | Not enough permissions | Ask admin for Cosmos DB Operator role |
| ⚠️ `azure_config_check_skipped` | Azure CLI not set up | Run `az login` and re-run |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 🆘 Common Fixes
|
||||||
|
|
||||||
|
### DNS Resolution Failed
|
||||||
|
1. Are you on a VPN? → Ask VPN admin about DNS settings
|
||||||
|
2. Check manually: `nslookup my-cosmos-account.documents.azure.com`
|
||||||
|
3. Try different DNS: `nslookup my-cosmos-account.documents.azure.com 8.8.8.8`
|
||||||
|
|
||||||
|
### TCP 443 Blocked (Public Endpoint)
|
||||||
|
1. Check Windows Firewall (Windows Defender) settings
|
||||||
|
2. If on corporate network → Ask IT if 443 outbound is allowed
|
||||||
|
3. Try from mobile hotspot to test
|
||||||
|
|
||||||
|
### TCP 443 Blocked (Private Endpoint)
|
||||||
|
1. Verify VPN is connected
|
||||||
|
2. Ask network team to check NSG and routing rules
|
||||||
|
3. Provide them with the script output (use `-Redact` to mask sensitive data)
|
||||||
|
|
||||||
|
### RBAC Insufficient
|
||||||
|
1. Ask admin to assign you **"Cosmos DB Operator"** role
|
||||||
|
2. Wait 5-10 minutes for role assignment to propagate
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📁 Output Files
|
||||||
|
|
||||||
|
**JSON Report:** `cosmos-diagnostic-<timestamp>.json`
|
||||||
|
- Full diagnostic results
|
||||||
|
- Save for your records
|
||||||
|
- Can share with support (use `-Redact` first)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ⚙️ Prerequisites
|
||||||
|
|
||||||
|
- PowerShell 5.0+ on Windows; PowerShell 6+ on macOS or Linux (Windows PowerShell 5.x is Windows-only)
|
||||||
|
- Network access to documents.azure.com
|
||||||
|
- (Optional) Azure CLI for full diagnostics: `az login`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 💡 Tips
|
||||||
|
|
||||||
|
**Private Endpoint?** Include the IP:
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 -Interactive -PrivateEndpointIP "10.123.171.30"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Sharing with support safely:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 ... -Redact
|
||||||
|
# Share the JSON file (sensitive data masked)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Just want DNS/TCP without Azure checks:**
|
||||||
|
- Run without providing SubscriptionId/ResourceGroup/AccountName
|
||||||
|
- Or don't run `az login` first
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📞 Getting Help
|
||||||
|
|
||||||
|
**If you see:**
|
||||||
|
- ✅ Green checkmarks → Network is working. Issue is likely application-level.
|
||||||
|
- ❌ Red X marks → Network is blocked. Share the JSON with support.
|
||||||
|
- ⚠️ Yellow warnings → Configuration issue. Follow recommended actions.
|
||||||
|
|
||||||
|
**Next:** Share your JSON report with support and include the **Classification Code**.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 📋 Checklist Before Contacting Support
|
||||||
|
|
||||||
|
- [ ] I ran the script successfully
|
||||||
|
- [ ] I noted the **Classification Code** (from console output)
|
||||||
|
- [ ] I checked the **Recommended Actions** section
|
||||||
|
- [ ] I tried the basic fixes above
|
||||||
|
- [ ] I saved the JSON report
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Version:** 1.0.0 | **Last Updated:** 2026-05-13
|
||||||
@@ -0,0 +1,424 @@
|
|||||||
|
# Cosmos DB Connectivity Diagnostic Script - README
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
This is a standalone PowerShell diagnostic script that captures network connectivity, private endpoint configuration, and Azure RBAC status for Cosmos DB accounts. It's designed to be run locally on a customer's machine to help troubleshoot HTTP 0.0 and connection errors.
|
||||||
|
|
||||||
|
**Key Features:**
|
||||||
|
- ✅ DNS resolution verification
|
||||||
|
- ✅ TCP 443 connectivity testing
|
||||||
|
- ✅ HTTPS reachability probe
|
||||||
|
- ✅ Private endpoint detection
|
||||||
|
- ✅ Private network route analysis
|
||||||
|
- ✅ Azure CLI optional context (network config, RBAC)
|
||||||
|
- ✅ Structured JSON output for triage automation
|
||||||
|
- ✅ Sensitive data redaction for safe sharing
|
||||||
|
- ✅ Interactive and non-interactive modes
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
- PowerShell 5.0+ on Windows; PowerShell 6+ on Linux or macOS (Windows PowerShell 5.x is Windows-only)
|
||||||
|
- If querying Azure config: Azure CLI installed and authenticated (`az login`)
|
||||||
|
- Outbound network access to documents.azure.com
|
||||||
|
|
||||||
|
### Option 1: Interactive Mode (Recommended for First Run)
|
||||||
|
|
||||||
|
Simplest approach—script prompts for inputs:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 -Interactive
|
||||||
|
```
|
||||||
|
|
||||||
|
The script will display a guide showing where to find each input, then prompt:
|
||||||
|
- Endpoint URL
|
||||||
|
- Subscription ID
|
||||||
|
- Resource Group
|
||||||
|
- Account Name
|
||||||
|
- (Optional) Private Endpoint IP
|
||||||
|
- (Optional) VPN Subnet Range
|
||||||
|
|
||||||
|
### Option 2: Non-Interactive Mode (Scripted/Automated)
|
||||||
|
|
||||||
|
Provide all parameters directly:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://my-cosmos-account.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "my-resource-group" `
|
||||||
|
-AccountName "my-cosmos-account"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Option 3: Non-Interactive with Redaction (Safe for Support)
|
||||||
|
|
||||||
|
Output JSON with sensitive data masked:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://my-cosmos-account.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "my-resource-group" `
|
||||||
|
-AccountName "my-cosmos-account" `
|
||||||
|
-Redact
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Detailed Usage
|
||||||
|
|
||||||
|
### Getting Your Inputs
|
||||||
|
|
||||||
|
#### 1. **Endpoint URL** (Required)
|
||||||
|
**Location:** Azure Portal → Cosmos DB Account → Overview
|
||||||
|
|
||||||
|
1. Go to [Azure Portal](https://portal.azure.com)
|
||||||
|
2. Search for "Cosmos DB"
|
||||||
|
3. Click your Cosmos DB account
|
||||||
|
4. Look for the **"URI"** field in the Overview tab
|
||||||
|
5. Copy the URL and drop any trailing `:443/` the portal shows (e.g., `https://my-cosmos-account.documents.azure.com`)
|
||||||
|
|
||||||
|
**Format:** `https://<account-name>.documents.azure.com` (do NOT include trailing slash or `:443/`)
|
||||||
|
|
||||||
|
**Note:** If using a regional endpoint, use the primary endpoint. Private endpoints will have the same hostname with different IP resolution.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 2. **Subscription ID** (Required)
|
||||||
|
**Location:** Azure Portal → Subscriptions or Portal → Home
|
||||||
|
|
||||||
|
1. Go to [Azure Portal](https://portal.azure.com)
|
||||||
|
2. Click on "Subscriptions" (or search for it)
|
||||||
|
3. Find your subscription
|
||||||
|
4. Copy the **Subscription ID** (looks like `12345678-1234-1234-1234-123456789012`)
|
||||||
|
|
||||||
|
**Alternative:** Open your Cosmos DB account; the Overview page lists the **Subscription ID** next to the resource group.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 3. **Resource Group** (Required)
|
||||||
|
**Location:** Azure Portal → Cosmos DB Account (top-right corner)
|
||||||
|
|
||||||
|
1. Open your Cosmos DB account
|
||||||
|
2. At the top of the page, you'll see breadcrumbs
|
||||||
|
3. Look for **"Resource group: <name>"** in the top-right
|
||||||
|
4. Or on the Overview page, find the **"Resource group"** field
|
||||||
|
|
||||||
|
**Example:** `my-production-rg` or `cosmos-resources`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 4. **Account Name** (Required)
|
||||||
|
**Location:** Extract from endpoint URL or Azure Portal
|
||||||
|
|
||||||
|
**From URL:**
|
||||||
|
- Endpoint: `https://my-cosmos-account.documents.azure.com`
|
||||||
|
- Account Name: `my-cosmos-account` (the part before `.documents.azure.com`)
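
Deriving the account name from the endpoint can also be done programmatically. The sketch below assumes the public-cloud suffix `.documents.azure.com` used throughout this guide; `Get-CosmosAccountName` is a hypothetical helper, not a function of the diagnostic script.

```powershell
function Get-CosmosAccountName {
    param([Parameter(Mandatory)] [string] $EndpointUrl)

    $uri = [uri]$EndpointUrl
    if (-not $uri.IsAbsoluteUri) {
        throw "Endpoint '$EndpointUrl' is not an absolute URL"
    }

    # Uri.Host excludes the scheme, port, and path, so ':443/' does not matter here.
    $hostName = $uri.Host
    $suffix = '.documents.azure.com'
    if (-not $hostName.EndsWith($suffix, [System.StringComparison]::OrdinalIgnoreCase)) {
        throw "Endpoint '$EndpointUrl' does not end with $suffix"
    }

    $hostName.Substring(0, $hostName.Length - $suffix.Length)
}

Get-CosmosAccountName -EndpointUrl 'https://my-cosmos-account.documents.azure.com'
```

Throwing on an unexpected suffix is deliberate: guessing an account name from an unrelated hostname would send the Azure queries to the wrong resource.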
|
||||||
|
|
||||||
|
**From Portal:**
|
||||||
|
- Open Cosmos DB account → Look at the account name in the breadcrumb or page title
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 5. **Private Endpoint IP** (Optional but Recommended)
|
||||||
|
**Location:** Azure Portal → Cosmos DB Account → Private Endpoint Connections
|
||||||
|
|
||||||
|
1. Open your Cosmos DB account
|
||||||
|
2. Go to **Settings** → **Private Endpoint Connections**
|
||||||
|
3. If any connections exist, look for **"Private IP address"** column
|
||||||
|
4. Copy the IP (e.g., `10.123.171.30`)
|
||||||
|
|
||||||
|
**When to provide:**
|
||||||
|
- If your Cosmos account has private endpoints configured
|
||||||
|
- Otherwise, leave blank (press Enter in interactive mode)
|
||||||
|
|
||||||
|
**Format:** `10.x.x.x`, `172.16-31.x.x`, or `192.168.x.x` (RFC 1918 ranges)
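
To check whether an address falls inside those RFC 1918 ranges before passing it to the script, something like the following works. It is a sketch with a hypothetical function name (`Test-Rfc1918Address`), handles IPv4 only, and is not necessarily how the script performs its own check.

```powershell
function Test-Rfc1918Address {
    param([Parameter(Mandatory)] [string] $IpAddress)

    $parsed = $null
    if (-not [System.Net.IPAddress]::TryParse($IpAddress, [ref]$parsed)) { return $false }
    if ($parsed.AddressFamily -ne [System.Net.Sockets.AddressFamily]::InterNetwork) { return $false }

    $b = $parsed.GetAddressBytes()
    # 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
    ($b[0] -eq 10) -or
    ($b[0] -eq 172 -and $b[1] -ge 16 -and $b[1] -le 31) -or
    ($b[0] -eq 192 -and $b[1] -eq 168)
}

Test-Rfc1918Address -IpAddress '10.123.171.30'
Test-Rfc1918Address -IpAddress '172.32.0.1'
```

The first call falls in `10.0.0.0/8`; the second sits just outside `172.16.0.0/12`, a common source of confusion when reading the `172.16-31.x.x` shorthand.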
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 6. **VPN Subnet Range** (Optional)
|
||||||
|
**Location:** Ask your network team or VPN client properties
|
||||||
|
|
||||||
|
If you're connecting via VPN, your network team should know your VPN subnet CIDR.
|
||||||
|
|
||||||
|
**Example:** `10.0.0.0/24` (network: 10.0.0.0–10.0.0.255)
|
||||||
|
|
||||||
|
**When to provide:**
|
||||||
|
- If you're behind a VPN
|
||||||
|
- If you suspect VPN routing is the issue
|
||||||
|
- Otherwise, leave blank
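
If the CIDR notation is unfamiliar, the address range it covers can be computed directly. This is an illustrative IPv4-only sketch; `Get-CidrRange` and `ConvertTo-DottedQuad` are hypothetical helper names and are not parameters or functions of the diagnostic script.

```powershell
function ConvertTo-DottedQuad([long] $Value) {
    # Peel off the four octets from most to least significant.
    $octets = foreach ($shift in 24, 16, 8, 0) {
        [int]([math]::Floor($Value / [math]::Pow(2, $shift)) % 256)
    }
    $octets -join '.'
}

function Get-CidrRange {
    param([Parameter(Mandatory)] [string] $Cidr)

    $parts = $Cidr.Split('/')
    $prefix = [int]$parts[1]
    if ($prefix -lt 0 -or $prefix -gt 32) { throw "Invalid prefix length in '$Cidr'" }

    # Fold the four address bytes into one number (most significant byte first).
    [long] $address = 0
    foreach ($b in [System.Net.IPAddress]::Parse($parts[0]).GetAddressBytes()) {
        $address = $address * 256 + $b
    }

    $size = [long][math]::Pow(2, 32 - $prefix)   # number of addresses in the block
    $first = $address - ($address % $size)       # clear the host bits
    $last = $first + $size - 1

    [pscustomobject]@{
        First = ConvertTo-DottedQuad $first
        Last  = ConvertTo-DottedQuad $last
        Count = $size
    }
}

Get-CidrRange -Cidr '10.0.0.0/24'
```

For `10.0.0.0/24` this yields the same 10.0.0.0 to 10.0.0.255 range given in the example above.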
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Understanding Output
|
||||||
|
|
||||||
|
#### Console Summary
|
||||||
|
|
||||||
|
After running, you'll see:
|
||||||
|
|
||||||
|
```
|
||||||
|
═════════════════════════════════════════════════════════════════════════════
|
||||||
|
DIAGNOSTIC COMPLETE
|
||||||
|
═════════════════════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
Summary:
|
||||||
|
DNS Resolution: ✓ PASS
|
||||||
|
TCP Connectivity: ✗ FAIL
|
||||||
|
Private Network: Detected (Private Endpoint)
|
||||||
|
Classification: FAILURE - private_endpoint_network_path_blocked
|
||||||
|
|
||||||
|
Full report saved to: cosmos-diagnostic-20260513_143045.json
|
||||||
|
|
||||||
|
Summary:
|
||||||
|
TCP 443 connection failed to private endpoint. Network path is blocked.
|
||||||
|
|
||||||
|
Recommended Actions:
|
||||||
|
1. Verify VPN connectivity and that your client subnet can route to the private endpoint subnet
|
||||||
|
2. Ask your network team to verify routing from DESKTOP-ABC123 to private endpoint 10.123.171.30
|
||||||
|
3. Check Azure network security groups (NSGs) rules for port 443 inbound
|
||||||
|
4. Verify Azure Virtual Network peering and User Defined Routes (UDRs)
|
||||||
|
5. Check if corporate firewall/NVA is blocking the connection
|
||||||
|
6. Manually run: Test-NetConnection -ComputerName my-cosmos-account.documents.azure.com -Port 443
|
||||||
|
|
||||||
|
Full JSON Report:
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
#### JSON Output File
|
||||||
|
|
||||||
|
A file like `cosmos-diagnostic-20260513_143045.json` is automatically saved in the current directory.
|
||||||
|
|
||||||
|
**Use this file to:**
|
||||||
|
- Share with support (can use `-Redact` to mask sensitive data)
|
||||||
|
- Parse with automation tools
|
||||||
|
- Retain diagnostic history
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Common Scenarios
|
||||||
|
|
||||||
|
### Scenario 1: "I can't connect to Cosmos DB from my machine"
|
||||||
|
|
||||||
|
**Run this:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 -Interactive
|
||||||
|
```
|
||||||
|
|
||||||
|
**Interpret results:**
|
||||||
|
- If `dns_resolution_failed` → Check VPN/proxy DNS settings
|
||||||
|
- If `tcp_connectivity_blocked` → Ask network team to check firewall/NSG rules
|
||||||
|
- If `network_connectivity_healthy` → Issue is auth/RBAC, not network
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 2: "Private endpoint isn't working"
|
||||||
|
|
||||||
|
**Run this:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://my-cosmos.documents.azure.com" `
|
||||||
|
-SubscriptionId "your-sub-id" `
|
||||||
|
-ResourceGroup "your-rg" `
|
||||||
|
-AccountName "your-account" `
|
||||||
|
-PrivateEndpointIP "10.123.171.30"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Interpret results:**
|
||||||
|
- If resolved IP matches private endpoint IP but TCP fails → VPN route blocked
|
||||||
|
- If resolved IP differs from provided IP → DNS or private endpoint misconfiguration (stale private DNS zone record or reconfigured endpoint)
|
||||||
|
- If network is healthy → Check private DNS zone configuration
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 3: "How do I share this with support safely?"
|
||||||
|
|
||||||
|
**Run with redaction:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://my-cosmos.documents.azure.com" `
|
||||||
|
-SubscriptionId "your-sub-id" `
|
||||||
|
-ResourceGroup "your-rg" `
|
||||||
|
-AccountName "your-account" `
|
||||||
|
-Redact
|
||||||
|
```
|
||||||
|
|
||||||
|
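Then share the generated JSON file. Sensitive data (subscription ID, resource group, account name, user name, tenant ID) is masked with `REDACTED` placeholders such as `REDACTED-SUBSCRIPTION-ID`; the endpoint hostname is kept because it is needed for triage.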
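
Before attaching a redacted report, it can be worth confirming that no GUID-shaped values slipped through. The following is a minimal sketch, not part of the diagnostic script: `Find-UnredactedGuid` is a hypothetical helper name, and it only catches GUIDs (subscription and tenant IDs), not user names or resource names.

```powershell
# Hypothetical helper (not part of Diagnose-CosmosConnectivity.ps1):
# returns every GUID-shaped string still present in the report text.
function Find-UnredactedGuid {
    param([Parameter(Mandatory)][string]$ReportText)

    $guidPattern = '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
    [regex]::Matches($ReportText, $guidPattern) |
        ForEach-Object { $_.Value } |
        Sort-Object -Unique
}

# Usage: an empty result means no subscription or tenant GUID was found.
# $leaks = Find-UnredactedGuid -ReportText (Get-Content -Raw .\cosmos-diagnostic-20260513_143045.json)
# if ($leaks) { Write-Warning "Unredacted IDs found: $($leaks -join ', ')" }
```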
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 4: "I need the diagnostics in a pipeline"
|
||||||
|
|
||||||
|
**Non-interactive with JSON output capture:**
|
||||||
|
```powershell
|
||||||
|
$json = .\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://my-cosmos.documents.azure.com" `
|
||||||
|
-SubscriptionId "your-sub-id" `
|
||||||
|
-ResourceGroup "your-rg" `
|
||||||
|
-AccountName "your-account" 2>&1 `
|
||||||
|
| Select-String -Pattern '^\{' | ConvertFrom-Json
|
||||||
|
|
||||||
|
# Now use $json in automation
|
||||||
|
if ($json.classification.code -eq "network_connectivity_healthy") {
|
||||||
|
Write-Host "Network OK, escalating to app team"
|
||||||
|
} else {
|
||||||
|
Write-Host "Network issue: $($json.classification.summary)"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Classification Codes
|
||||||
|
|
||||||
|
The script produces one of these classification codes:
|
||||||
|
|
||||||
|
| Code | Meaning |
|
||||||
|
|------|---------|
|
||||||
|
| `network_connectivity_healthy` | ✓ Network works. If errors, check auth/RBAC. |
|
||||||
|
| `dns_resolution_failed` | ✗ Cannot resolve endpoint hostname. |
|
||||||
|
| `tcp_connectivity_blocked` | ✗ DNS works, but TCP 443 blocked. |
|
||||||
|
| `private_endpoint_network_path_blocked` | ✗ Private endpoint detected, TCP fails. |
|
||||||
|
| `rbac_insufficient` | ⚠ Network OK, but RBAC permissions missing. |
|
||||||
|
| `azure_config_check_skipped` | ⚠ Azure CLI not authenticated. |
|
||||||
|
|
||||||
|
See [CLASSIFICATION_MATRIX.md](./CLASSIFICATION_MATRIX.md) for detailed playbooks and support guidance.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Advanced Usage
|
||||||
|
|
||||||
|
### Running Specific Checks
|
||||||
|
|
||||||
|
The script always runs all checks, but you can parse the JSON to focus on specific ones:
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
# Get just DNS results
|
||||||
|
$report = Get-ChildItem cosmos-diagnostic-*.json | Sort-Object LastWriteTime | Select-Object -Last 1 | Get-Content -Raw | ConvertFrom-Json
|
||||||
|
$report.diagnostics.dns | ConvertTo-Json
|
||||||
|
|
||||||
|
# Get classification only
|
||||||
|
$report.classification | ConvertTo-Json
|
||||||
|
|
||||||
|
# Check if RBAC is sufficient
|
||||||
|
$report.diagnostics.rbac.classification
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Integration with Support Ticketing
|
||||||
|
|
||||||
|
When opening a support case:
|
||||||
|
|
||||||
|
1. **Run the script** (interactive mode is fine)
|
||||||
|
2. **Include the generated JSON file** in your ticket
|
||||||
|
3. **Re-run with the `-Redact` flag** if sharing with external support
|
||||||
|
|
||||||
|
Example ticket text:
|
||||||
|
```
|
||||||
|
Title: Cosmos DB Connection Errors
|
||||||
|
|
||||||
|
Body:
|
||||||
|
Experiencing connection errors to my Cosmos DB account.
|
||||||
|
Attached diagnostic results (cosmos-diagnostic-*.json).
|
||||||
|
|
||||||
|
Network Status: [paste classification.status]
|
||||||
|
Issue Code: [paste classification.code]
|
||||||
|
Endpoint: [paste target.hostname]
|
||||||
|
```
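
The template can also be filled in from the saved report rather than pasted by hand. This is a sketch under the assumption that the report exposes `classification.status`, `classification.code`, and `target.hostname` as named in the template; `New-SupportTicketText` is a hypothetical helper, not something the script ships.

```powershell
# Hypothetical helper: fills the ticket template from a parsed report.
# Field names are the ones referenced in the ticket template.
function New-SupportTicketText {
    param([Parameter(Mandatory)]$Report)

    @(
        'Title: Cosmos DB Connection Errors'
        ''
        'Body:'
        'Experiencing connection errors to my Cosmos DB account.'
        'Attached diagnostic results (cosmos-diagnostic-*.json).'
        ''
        "Network Status: $($Report.classification.status)"
        "Issue Code: $($Report.classification.code)"
        "Endpoint: $($Report.target.hostname)"
    ) -join [Environment]::NewLine
}

# Usage:
# $report = Get-Content -Raw .\cosmos-diagnostic-20260513_143045.json | ConvertFrom-Json
# New-SupportTicketText -Report $report | Set-Clipboard
```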
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Troubleshooting the Script Itself
|
||||||
|
|
||||||
|
#### Script won't run (permission denied)
|
||||||
|
|
||||||
|
```powershell
|
||||||
|
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
|
||||||
|
```
|
||||||
|
|
||||||
|
Then re-run the script.
|
||||||
|
|
||||||
|
#### "Azure CLI not found" but I need RBAC checks
|
||||||
|
|
||||||
|
Install Azure CLI:
|
||||||
|
- Windows: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows
|
||||||
|
- macOS: `brew install azure-cli`
|
||||||
|
- Linux: Follow docs at https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-linux
|
||||||
|
|
||||||
|
Then:
|
||||||
|
```powershell
|
||||||
|
az login
|
||||||
|
```
|
||||||
|
|
||||||
|
Re-run the script.
|
||||||
|
|
||||||
|
#### Endpoint validation error
|
||||||
|
|
||||||
|
**Error:** "Invalid format. Expected: https://<account-name>.documents.azure.com"
|
||||||
|
|
||||||
|
**Fix:** Remove trailing slash or port from URL:
|
||||||
|
- ❌ `https://my-cosmos.documents.azure.com/` (trailing slash)
|
||||||
|
- ❌ `https://my-cosmos.documents.azure.com:443/` (with port)
|
||||||
|
- ✅ `https://my-cosmos.documents.azure.com` (correct)
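
If endpoint URLs come from copy-paste or another tool, they can be normalized before being passed to the script. The sketch below strips a port and trailing slash and then checks the result against the expected format. `ConvertTo-CosmosEndpointUrl` is a hypothetical helper, and the character set allowed in the account name is an assumption rather than the script's own validation rule.

```powershell
# Hypothetical helper: trims the port and trailing slash, then checks the
# result against the format quoted in the error message.
function ConvertTo-CosmosEndpointUrl {
    param([Parameter(Mandatory)][string]$Url)

    $clean = $Url.Trim() -replace ':\d+(?=/|$)', '' -replace '/+$', ''
    # Assumption: account names consist of letters, digits, and hyphens.
    if ($clean -notmatch '^https://[a-z0-9-]+\.documents\.azure\.com$') {
        throw "Invalid format. Expected: https://<account-name>.documents.azure.com"
    }
    $clean
}

# Usage:
# $url = ConvertTo-CosmosEndpointUrl 'https://my-cosmos.documents.azure.com:443/'
# .\Diagnose-CosmosConnectivity.ps1 -EndpointUrl $url ...
```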
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## File Outputs
|
||||||
|
|
||||||
|
### Generated Files
|
||||||
|
|
||||||
|
After running, the script creates:
|
||||||
|
|
||||||
|
**`cosmos-diagnostic-<timestamp>.json`**
|
||||||
|
- Full diagnostic report in JSON format
|
||||||
|
- Machine-readable for automation
|
||||||
|
- Can be shared with support
|
||||||
|
- Keep for troubleshooting history
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## JSON Schema
|
||||||
|
|
||||||
|
For details on JSON structure, field definitions, and sample outputs, see [DIAGNOSTIC_SCHEMA.md](./DIAGNOSTIC_SCHEMA.md).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Support Routing
|
||||||
|
|
||||||
|
Based on classification code, route as follows:
|
||||||
|
|
||||||
|
| Classification | Route To |
|
||||||
|
|---|---|
|
||||||
|
| `network_connectivity_healthy` | Application/Auth team—network verified working |
|
||||||
|
| `dns_resolution_failed` | VPN/Network team—DNS issue |
|
||||||
|
| `tcp_connectivity_blocked` (public IP) | Firewall/ISP team—outbound port blocked |
|
||||||
|
| `private_endpoint_network_path_blocked` | Network team—PE routing issue |
|
||||||
|
| `rbac_insufficient` | Cosmos DB Access Control team |
|
||||||
|
| `azure_config_check_skipped` | Customer: Run `az login` first |
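
For automated triage, the routing table can be expressed as a lookup. This is a sketch only: the team names are copied from the table, while `$SupportRoutes` and `Get-SupportRoute` are hypothetical names, and the fallback for unknown codes is an assumption.

```powershell
# Routing table copied from the support routing matrix; names are hypothetical.
$SupportRoutes = @{
    network_connectivity_healthy          = 'Application/Auth team'
    dns_resolution_failed                 = 'VPN/Network team'
    tcp_connectivity_blocked              = 'Firewall/ISP team'
    private_endpoint_network_path_blocked = 'Network team'
    rbac_insufficient                     = 'Cosmos DB Access Control team'
    azure_config_check_skipped            = 'Customer: run az login first'
}

function Get-SupportRoute {
    param([Parameter(Mandatory)][string]$Code)

    if ($SupportRoutes.ContainsKey($Code)) {
        $SupportRoutes[$Code]
    } else {
        # Assumed fallback; the matrix does not define one.
        "Unrecognized code '$Code'; triage manually"
    }
}

# Usage:
# Get-SupportRoute -Code $report.classification.code
```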
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Version
|
||||||
|
|
||||||
|
**Script Version:** 1.0.0
|
||||||
|
**Schema Version:** 1.0.0
|
||||||
|
**Last Updated:** 2026-05-13
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
This script is provided as-is for diagnosing Cosmos DB connectivity issues. See [LICENSE](../../LICENSE) for terms.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next Steps
|
||||||
|
|
||||||
|
1. **Run the script:** `.\Diagnose-CosmosConnectivity.ps1 -Interactive`
|
||||||
|
2. **Review output:** Check the JSON report and console summary
|
||||||
|
3. **Follow recommended actions** based on the classification code
|
||||||
|
4. **Share with support** if needed (use `-Redact` for sensitive data masking)
|
||||||
|
|
||||||
|
For questions or issues with the script itself, contact the Cosmos DB team.
|
||||||
@@ -0,0 +1,510 @@
|
|||||||
|
# Cosmos DB Connectivity Diagnostic - Test Scenarios
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
This document defines test scenarios, expected outcomes, and validation procedures for the diagnostic script. Use these to verify script functionality across different network configurations.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Test Infrastructure Setup
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
- Test Cosmos DB accounts in multiple configurations:
|
||||||
|
- Public endpoint only
|
||||||
|
- Private endpoint only
|
||||||
|
- Both public + private endpoints
|
||||||
|
- Test networks:
|
||||||
|
- Clean network (no corporate proxy/VPN)
|
||||||
|
- Behind corporate proxy
|
||||||
|
- Behind VPN (if possible)
|
||||||
|
- Restricted network (firewall blocking 443)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Test Scenarios
|
||||||
|
|
||||||
|
### Scenario 1: Healthy Public Endpoint (All Checks Pass)
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Cosmos account with public endpoint enabled
|
||||||
|
- Running from clean network (no VPN/proxy)
|
||||||
|
- Azure CLI authenticated (optional)
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://test-public-01.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "test-cosmos-rg" `
|
||||||
|
-AccountName "test-public-01"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results:**
|
||||||
|
- ✅ DNS resolution: `succeeded = true`
|
||||||
|
- ✅ TCP connectivity: `succeeded = true`
|
||||||
|
- ✅ HTTPS probe: `statusCode = 401` (expected without auth)
|
||||||
|
- ✅ Private network: `isPrivateRange = false`
|
||||||
|
- ✅ Classification: `status = "success"`, `code = "network_connectivity_healthy"`
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] Console shows "✓ PASS" for DNS and TCP
|
||||||
|
- [ ] Recommended Actions mention checking RBAC/auth
|
||||||
|
- [ ] JSON file created successfully
|
||||||
|
- [ ] Latency values are reasonable (< 1000ms)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 2: DNS Resolution Failure
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Network with DNS resolver that blocks documents.azure.com
|
||||||
|
- OR simulate by providing invalid hostname
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://invalid-account-xyz123.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "test-cosmos-rg" `
|
||||||
|
-AccountName "invalid-account-xyz123"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results:**
|
||||||
|
- ❌ DNS resolution: `succeeded = false`, `error = "No such host is known"`
|
||||||
|
- ❌ TCP connectivity: `succeeded = false`
|
||||||
|
- ❌ Classification: `status = "failure"`, `code = "dns_resolution_failed"`
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] Console shows "✗ FAIL" for DNS
|
||||||
|
- [ ] Error message is clear
|
||||||
|
- [ ] Root cause in classification mentions DNS/VPN/proxy
|
||||||
|
- [ ] Recommended actions include running manual `nslookup`
|
||||||
|
- [ ] JSON contains error details
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 3: TCP Blocked (Public Endpoint)
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Network with firewall blocking outbound port 443 to documents.azure.com
|
||||||
|
- DNS resolves successfully but TCP fails
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://test-public-02.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "test-cosmos-rg" `
|
||||||
|
-AccountName "test-public-02"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results:**
|
||||||
|
- ✅ DNS resolution: `succeeded = true`
|
||||||
|
- ❌ TCP connectivity: `succeeded = false`, `error = "Connection timeout after 5000ms"`
|
||||||
|
- ❌ HTTPS probe: `statusCode = null`, `error contains "timeout"`
|
||||||
|
- ✅ Private network: `isPrivateRange = false`
|
||||||
|
- ❌ Classification: `status = "failure"`, `code = "tcp_connectivity_blocked"`
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] DNS shows success, TCP shows timeout
|
||||||
|
- [ ] Console summary distinguishes DNS success from TCP failure
|
||||||
|
- [ ] Root cause mentions firewall/ISP/proxy
|
||||||
|
- [ ] Recommended actions include corporate network contact
|
||||||
|
- [ ] Timeout latency is approximately 5000ms
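
To cross-check a TCP result independently of the script, a connect-with-timeout probe can be run by hand. The sketch below uses `System.Net.Sockets.TcpClient`, so it also works where `Test-NetConnection` is unavailable. `Test-TcpPort` is a hypothetical helper; the 5000 ms default and the output field names mirror the expected results above but are not taken from the script's source.

```powershell
# Hypothetical helper: attempts a TCP connection and gives up after TimeoutMs.
function Test-TcpPort {
    param(
        [Parameter(Mandatory)][string]$HostName,
        [int]$Port = 443,
        [int]$TimeoutMs = 5000
    )

    $client = [System.Net.Sockets.TcpClient]::new()
    $watch = [System.Diagnostics.Stopwatch]::StartNew()
    try {
        $task = $client.ConnectAsync($HostName, $Port)
        # Wait returns $false on timeout and throws if the connect attempt failed.
        $completed = $task.Wait($TimeoutMs)
        $ok = $completed -and $client.Connected
        $err = if ($completed) { $null } else { "Connection timeout after ${TimeoutMs}ms" }
    } catch {
        $ok = $false
        $err = $_.Exception.GetBaseException().Message
    } finally {
        $watch.Stop()
        $client.Dispose()
    }

    [pscustomobject]@{
        succeeded = $ok
        latencyMs = [int]$watch.ElapsedMilliseconds
        error     = $err
    }
}

# Usage:
# Test-TcpPort -HostName 'test-public-02.documents.azure.com' -Port 443
```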
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 4: Healthy Private Endpoint
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Cosmos account with private endpoint configured
|
||||||
|
- Client connected to VPN that can route to PE
|
||||||
|
- PE IP known and provided
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://test-private-01.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "test-cosmos-rg" `
|
||||||
|
-AccountName "test-private-01" `
|
||||||
|
-PrivateEndpointIP "10.123.171.30"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results:**
|
||||||
|
- ✅ DNS resolution: `succeeded = true`, `addresses = ["10.123.171.30"]`
|
||||||
|
- ✅ TCP connectivity: `succeeded = true`
|
||||||
|
- ✅ Private network: `isPrivateRange = true`, `matchesExpectedPrivateEndpoint = true`
|
||||||
|
- ✅ Azure config: `publicNetworkAccessRestricted = true` (if checked)
|
||||||
|
- ✅ Classification: `status = "success"`, `code = "network_connectivity_healthy"`
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] DNS resolves to private IP (10.x)
|
||||||
|
- [ ] TCP succeeds to private IP
|
||||||
|
- [ ] Indicators correctly identify private endpoint
|
||||||
|
- [ ] Expected PE IP matches resolved IP
|
||||||
|
- [ ] Classification recognizes healthy private path
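
When validating the `isPrivateRange` indicator, the expected value can be computed separately from the resolved address. The following sketch checks the three RFC 1918 ranges; `Test-IsPrivateIPv4` is a hypothetical helper and may not match how the script itself performs the check.

```powershell
# Hypothetical helper: true when an IPv4 address falls in an RFC 1918 range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
function Test-IsPrivateIPv4 {
    param([Parameter(Mandatory)][string]$Address)

    $ip = $null
    if (-not [System.Net.IPAddress]::TryParse($Address, [ref]$ip)) { return $false }
    if ($ip.AddressFamily -ne [System.Net.Sockets.AddressFamily]::InterNetwork) { return $false }

    $b = $ip.GetAddressBytes()
    ($b[0] -eq 10) -or
    ($b[0] -eq 172 -and $b[1] -ge 16 -and $b[1] -le 31) -or
    ($b[0] -eq 192 -and $b[1] -eq 168)
}

# Usage: compare against the report's indicator.
# Test-IsPrivateIPv4 -Address '10.123.171.30'
```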
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 5: Private Endpoint Network Path Blocked
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Private endpoint configured
|
||||||
|
- Client on VPN but routing to PE subnet is blocked
|
||||||
|
- DNS resolves to PE IP but TCP times out
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://test-private-02.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "test-cosmos-rg" `
|
||||||
|
-AccountName "test-private-02" `
|
||||||
|
-PrivateEndpointIP "10.123.171.30"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results:**
|
||||||
|
- ✅ DNS resolution: `succeeded = true`, `addresses = ["10.123.171.30"]`
|
||||||
|
- ❌ TCP connectivity: `succeeded = false`, `error = "Connection timeout after 5000ms"`
|
||||||
|
- ✅ Private network: `isPrivateRange = true`, `matchesExpectedPrivateEndpoint = true`, `vpnRouteWarning != null`
|
||||||
|
- ❌ Classification: `status = "failure"`, `code = "private_endpoint_network_path_blocked"`
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] DNS resolves to expected PE IP
|
||||||
|
- [ ] TCP to PE IP fails with timeout
|
||||||
|
- [ ] VPN route warning is populated
|
||||||
|
- [ ] Classification correctly identifies PE path issue
|
||||||
|
- [ ] Recommended actions mention network team + routing
|
||||||
|
- [ ] Source IP is captured (if available)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 6: RBAC Insufficient
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Network connectivity is working
|
||||||
|
- Azure CLI authenticated as user with limited RBAC (e.g., only Reader role)
|
||||||
|
- Account queried successfully
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
az login  # Log in as the limited user first
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://test-rbac-01.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "test-cosmos-rg" `
|
||||||
|
-AccountName "test-rbac-01"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results:**
|
||||||
|
- ✅ DNS resolution: `succeeded = true`
|
||||||
|
- ✅ TCP connectivity: `succeeded = true`
|
||||||
|
- ✅ HTTPS probe: `statusCode = 401` or `200`
|
||||||
|
- ❌ RBAC: `classification = "insufficient"`, `canReadAccount = false`
|
||||||
|
- ⚠️ Classification: `status = "warning"`, `code = "rbac_insufficient"`
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] Network checks all pass
|
||||||
|
- [ ] RBAC assessment shows limited permissions
|
||||||
|
- [ ] Classification code is `rbac_insufficient`
|
||||||
|
- [ ] Recommended actions mention role assignment
|
||||||
|
- [ ] Error message explains what permissions are missing
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 7: Azure CLI Not Authenticated
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- All network checks work fine
|
||||||
|
- Azure CLI not installed OR not authenticated
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
# Without running az login first
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://test-public-03.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "test-cosmos-rg" `
|
||||||
|
-AccountName "test-public-03"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results:**
|
||||||
|
- ✅ DNS resolution: `succeeded = true`
|
||||||
|
- ✅ TCP connectivity: `succeeded = true`
|
||||||
|
- ⚠️ Azure CLI: `authenticated = false`, `error = "Not authenticated with Azure CLI. Run 'az login' to proceed."`
|
||||||
|
- ⚠️ Azure config: `checked = false`, `error = "Skipped"`
|
||||||
|
- ⚠️ Classification: May reference `azure_config_check_skipped` in warnings
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] Network checks complete normally
|
||||||
|
- [ ] Azure CLI context shows unauthenticated
|
||||||
|
- [ ] Console warning mentions `az login`
|
||||||
|
- [ ] Recommended actions suggest re-running after authentication
|
||||||
|
- [ ] Script doesn't crash; gracefully continues
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 8: Interactive Mode Input Flow
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- User runs script with -Interactive flag
|
||||||
|
- Has all inputs ready
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 -Interactive
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Sequence:**
|
||||||
|
1. Show input instructions with Portal navigation guide
|
||||||
|
2. Prompt: "Endpoint URL (e.g., https://my-cosmos.documents.azure.com)"
|
||||||
|
3. Validate input format; re-prompt if invalid
|
||||||
|
4. Prompt: "Subscription ID (12345678-...)"
|
||||||
|
5. Validate GUID format; re-prompt if invalid
|
||||||
|
6. Prompt: "Resource Group name"
|
||||||
|
7. Prompt: "Account Name"
|
||||||
|
8. Prompt: "Private Endpoint IP (optional, press Enter to skip)"
|
||||||
|
9. Prompt: "VPN Subnet Range (optional, press Enter to skip)"
|
||||||
|
10. Run diagnostics
|
||||||
|
11. Display results
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] Input instructions are clear and helpful
|
||||||
|
- [ ] Format validation rejects invalid inputs
|
||||||
|
- [ ] Optional fields can be skipped (Enter key)
|
||||||
|
- [ ] All inputs accepted without error
|
||||||
|
- [ ] Diagnostics run successfully after inputs collected
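
The validate-and-re-prompt steps in the sequence above can be exercised with a small format check. This is a sketch of one way to validate the subscription ID, assuming the hyphenated 8-4-4-4-12 GUID form shown in the examples; `Test-SubscriptionIdFormat` is a hypothetical name and the script's own validation may differ.

```powershell
# Hypothetical helper: accepts only the hyphenated 8-4-4-4-12 GUID form.
function Test-SubscriptionIdFormat {
    param([string]$Value)

    $Value -match '^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$'
}

# Re-prompt loop (interactive use only):
# do {
#     $sub = Read-Host 'Subscription ID (12345678-...)'
# } until (Test-SubscriptionIdFormat $sub)
```

A regex is used here instead of `[guid]::TryParse` because `TryParse` also accepts braces and unhyphenated input, which a test for "invalid GUID format rejected" would likely want to fail.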
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 9: Non-Interactive with Redaction
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Run with -Redact flag
|
||||||
|
- Collect JSON output
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
$json = .\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://test-public-04.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "test-cosmos-rg" `
|
||||||
|
-AccountName "test-public-04" `
|
||||||
|
-Redact 2>&1 | Select-String -Pattern '^\{' | ConvertFrom-Json
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results:**
|
||||||
|
- ✅ JSON output completes successfully
|
||||||
|
- ✅ Target section: `subscriptionId = "REDACTED-SUBSCRIPTION-ID"`
|
||||||
|
- ✅ Target section: `resourceGroup = "REDACTED"`
|
||||||
|
- ✅ Target section: `accountName = "REDACTED"`
|
||||||
|
- ✅ Hostname is NOT redacted (needed for triage): `hostname = "test-public-04.documents.azure.com"`
|
||||||
|
- ✅ Azure CLI: `currentUser = "REDACTED-USER-NAME"`
|
||||||
|
- ✅ Azure CLI: `currentTenant = "REDACTED-TENANT-ID"`
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] Sensitive fields masked as "REDACTED-*"
|
||||||
|
- [ ] Hostname NOT masked
|
||||||
|
- [ ] JSON still parseable
|
||||||
|
- [ ] Redaction doesn't break classification
|
||||||
|
- [ ] All RBAC role names preserved (not redacted)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 10: Private Endpoint IP Mismatch
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Private endpoint exists but expected IP is different from resolved IP
|
||||||
|
- Can happen if PE reconfigured or DNS zone stale
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
.\Diagnose-CosmosConnectivity.ps1 `
|
||||||
|
-EndpointUrl "https://test-private-03.documents.azure.com" `
|
||||||
|
-SubscriptionId "12345678-1234-1234-1234-123456789012" `
|
||||||
|
-ResourceGroup "test-cosmos-rg" `
|
||||||
|
-AccountName "test-private-03" `
|
||||||
|
-PrivateEndpointIP "10.123.171.99" # Expected IP (not matching actual)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results (if actual PE IP is 10.123.171.30):**
|
||||||
|
- ✅ DNS resolution: `succeeded = true`, `addresses = ["10.123.171.30"]`
|
||||||
|
- ✅ TCP connectivity: `succeeded = true` (connects to actual PE)
|
||||||
|
- ⚠️ Private network: `matchesExpectedPrivateEndpoint = false`, `indicators contains "WARNING: Resolved to 10.123.171.30 but expected ..."`
|
||||||
|
- ⚠️ Classification: May include `private_endpoint_mismatch` warning
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] Mismatch detected
|
||||||
|
- [ ] Warning includes both expected and actual IPs
|
||||||
|
- [ ] TCP still attempts with actual resolved IP
|
||||||
|
- [ ] Classification identifies discrepancy
|
||||||
|
- [ ] Recommended actions mention checking PE config
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 11: Latency Metrics
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Healthy connection
|
||||||
|
- Measure and log latency values
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
$json = .\Diagnose-CosmosConnectivity.ps1 -EndpointUrl "..." -SubscriptionId "..." ... 2>&1 |
|
||||||
|
Select-String -Pattern '^\{' | ConvertFrom-Json
|
||||||
|
$json.diagnostics.dns.latencyMs
|
||||||
|
$json.diagnostics.tcp.latencyMs
|
||||||
|
$json.diagnostics.https.latencyMs
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results:**
|
||||||
|
- DNS latency: 10-100ms (typical)
|
||||||
|
- TCP latency: 20-200ms (depends on network)
|
||||||
|
- HTTPS latency: 50-500ms (full round trip)
|
||||||
|
- All values > 0 and < 10000 (reasonable)
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] Latency values are integers (milliseconds)
|
||||||
|
- [ ] Values are reasonable for network conditions
|
||||||
|
- [ ] No successful check reports an unrealistic value (0 or > 60000)
|
||||||
|
- [ ] Timeouts show latencyMs = 0
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Scenario 12: Multiple Endpoints (Batch Testing)
|
||||||
|
|
||||||
|
**Setup:**
|
||||||
|
- Multiple accounts to test
|
||||||
|
- Non-interactive batch mode
|
||||||
|
|
||||||
|
**Run:**
|
||||||
|
```powershell
|
||||||
|
$accounts = @(
|
||||||
|
@{Url="https://account1.documents.azure.com"; Sub="..."; RG="rg1"; Name="account1"},
|
||||||
|
@{Url="https://account2.documents.azure.com"; Sub="..."; RG="rg2"; Name="account2"},
|
||||||
|
@{Url="https://account3.documents.azure.com"; Sub="..."; RG="rg3"; Name="account3"}
|
||||||
|
)
|
||||||
|
|
||||||
|
$results = @()
|
||||||
|
foreach ($acct in $accounts) {
|
||||||
|
$json = .\Diagnose-CosmosConnectivity.ps1 -EndpointUrl $acct.Url -SubscriptionId $acct.Sub -ResourceGroup $acct.RG -AccountName $acct.Name 2>&1 |
|
||||||
|
Select-String -Pattern '^\{' | ConvertFrom-Json
|
||||||
|
$results += [pscustomobject]@{
|
||||||
|
Account = $acct.Name
|
||||||
|
Classification = $json.classification.code
|
||||||
|
DNS = $json.diagnostics.dns.succeeded
|
||||||
|
TCP = $json.diagnostics.tcp.succeeded
|
||||||
|
}
|
||||||
|
}
|
||||||
|
$results | Format-Table
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Results:**
|
||||||
|
- All accounts processed without error
|
||||||
|
- JSON output captured for each
|
||||||
|
- Results table shows aggregated status
|
||||||
|
- Classification codes vary based on network conditions
|
||||||
|
|
||||||
|
**Validation Checklist:**
|
||||||
|
- [ ] Batch processing completes
|
||||||
|
- [ ] All JSON files created
|
||||||
|
- [ ] No cross-account contamination
|
||||||
|
- [ ] Timestamp differs for each run
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Regression Test Checklist
|
||||||
|
|
||||||
|
Use this checklist before each release:
|
||||||
|
|
||||||
|
- [ ] **Script Execution**
|
||||||
|
- [ ] Interactive mode completes
|
||||||
|
- [ ] Non-interactive mode with all parameters
|
||||||
|
- [ ] Redaction flag works
|
||||||
|
- [ ] Help/documentation displays correctly
|
||||||
|
|
||||||
|
- [ ] **Network Diagnostics**
|
||||||
|
- [ ] DNS resolution succeeds on good network
|
||||||
|
- [ ] DNS resolution fails on blocked network
|
||||||
|
- [ ] TCP succeeds on open port
|
||||||
|
- [ ] TCP times out on blocked port
|
||||||
|
- [ ] HTTPS probe returns status code
|
||||||
|
|
||||||
|
- [ ] **Private Endpoints**
|
||||||
|
- [ ] Detects private IP ranges correctly
|
||||||
|
- [ ] Compares against expected PE IP
|
||||||
|
- [ ] Handles PE IP mismatches gracefully
|
||||||
|
|
||||||
|
- [ ] **Azure Integration**
|
||||||
|
- [ ] Works with authenticated Azure CLI
|
||||||
|
- [ ] Gracefully handles unauthenticated state
|
||||||
|
- [ ] Queries account config successfully
|
||||||
|
- [ ] RBAC assessment runs
|
||||||
|
|
||||||
|
- [ ] **JSON Output**
|
||||||
|
- [ ] Valid JSON syntax
|
||||||
|
- [ ] All expected fields present
|
||||||
|
- [ ] Field values are correct types
|
||||||
|
- [ ] Redacted fields are properly masked
|
||||||
|
|
||||||
|
- [ ] **Classification**
|
||||||
|
- [ ] Success code for healthy network
|
||||||
|
- [ ] DNS failure code for DNS issues
|
||||||
|
- [ ] TCP failure code for blocked ports
|
||||||
|
- [ ] PE path blocked code for PE issues
|
||||||
|
- [ ] RBAC code for permission issues
|
||||||
|
|
||||||
|
- [ ] **Documentation**
|
||||||
|
- [ ] Recommended actions are actionable
|
||||||
|
- [ ] Error messages are helpful
|
||||||
|
- [ ] Output is readable and organized
|
||||||
|
|
||||||
|
- [ ] **Edge Cases**
|
||||||
|
- [ ] Invalid URL format rejected
|
||||||
|
- [ ] Invalid GUID format rejected
|
||||||
|
- [ ] Timeout handling works
|
||||||
|
- [ ] No unhandled exceptions
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Performance Expectations
|
||||||
|
|
||||||
|
| Operation | Expected Time | Timeout |
|
||||||
|
|-----------|---|---|
|
||||||
|
| DNS resolution | 10-100ms | 5000ms |
|
||||||
|
| TCP connect | 20-200ms | 5000ms |
|
||||||
|
| HTTPS probe | 50-500ms | 5000ms |
|
||||||
|
| Azure CLI queries | 1-5 seconds | 10000ms |
|
||||||
|
| Full script (good network) | 10-20 seconds | N/A |
|
||||||
|
| Full script (blocked port) | ~5 seconds | N/A |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Success Criteria
|
||||||
|
|
||||||
|
A test scenario passes if:
|
||||||
|
1. ✅ Script completes without unhandled exceptions
|
||||||
|
2. ✅ JSON output is valid and contains all expected fields
|
||||||
|
3. ✅ Classification code matches expected scenario
|
||||||
|
4. ✅ Recommended actions are relevant to the issue
|
||||||
|
5. ✅ Latency values are reasonable
|
||||||
|
6. ✅ Redaction (if enabled) properly masks sensitive fields
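
Criteria 2 and 3 lend themselves to an automated check. The sketch below assumes the report has top-level `classification`, `diagnostics`, and `target` sections, as referenced elsewhere in this plan; `Test-ScenarioResult` is a hypothetical helper, and it does not cover the remaining criteria.

```powershell
# Hypothetical helper: checks that the expected sections exist and that the
# classification code matches the scenario's expected code.
function Test-ScenarioResult {
    param(
        [Parameter(Mandatory)]$Report,
        [Parameter(Mandatory)][string]$ExpectedCode
    )

    $failures = @()
    foreach ($section in 'classification', 'diagnostics', 'target') {
        if ($null -eq $Report.$section) { $failures += "Missing section: $section" }
    }
    if ($Report.classification.code -ne $ExpectedCode) {
        $failures += "Expected code '$ExpectedCode', got '$($Report.classification.code)'"
    }

    [pscustomobject]@{
        Passed   = ($failures.Count -eq 0)
        Failures = $failures
    }
}

# Usage:
# Test-ScenarioResult -Report $json -ExpectedCode 'tcp_connectivity_blocked'
```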
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Sign-Off
|
||||||
|
|
||||||
|
**QA Tester:** _________________ **Date:** _________
|
||||||
|
|
||||||
|
**Reviewed By:** _________________ **Date:** _________
|
||||||
|
|
||||||
|
**Approved for Release:** _________________ **Date:** _________
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Version
|
||||||
|
|
||||||
|
- **Script Version:** 1.0.0
|
||||||
|
- **Test Plan Version:** 1.0.0
|
||||||
|
- **Last Updated:** 2026-05-13
|
||||||