Cosmos DB Connectivity Diagnostic - Complete Documentation Index
📦 Deliverables
This folder contains a complete, production-ready diagnostic toolkit for troubleshooting Cosmos DB connectivity issues. Below is a guide to all files and their purpose.
📚 Documentation Files
1. README.md ← Start here
Purpose: Comprehensive usage guide for customers and support teams
Contains:
- Overview and features
- Quick start in 3 modes (interactive, non-interactive, with redaction)
- Step-by-step guide to finding all inputs
- Understanding output format
- Common scenarios and examples
- Integration examples
- Troubleshooting guide for common issues
Read this if: You're running the script for the first time or onboarding someone else
2. QUICK_REFERENCE.md ← For urgent issues
Purpose: 2-minute quick-start card for customers
Contains:
- 3-step quick start
- Result codes at a glance
- Common fixes
- Prerequisite checklist
Read this if: You need to run the script NOW and don't have time for full docs
3. DIAGNOSTIC_SCHEMA.md ← For developers/automation
Purpose: Complete JSON output specification
Contains:
- Full JSON schema with field descriptions
- Root, target, execution, diagnostics, and classification objects
- DNS/TCP/HTTPS/private network result formats
- Azure config and RBAC object structures
- Classification code reference table
- Sample outputs for 3 scenarios
- Parsing guidelines
- Version history
Read this if:
- You're building a parser or automation tool
- You need to understand the JSON structure
- You're integrating with a support ticketing system
- You want to validate output structure
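As a rough illustration of what validating the output structure could look like, here is a minimal Python sketch. The lowercase key names are assumptions inferred from the object names listed above (only `classification.code` is referenced elsewhere in this index); DIAGNOSTIC_SCHEMA.md remains the authoritative definition.

```python
import json

# Assumed top-level keys, inferred from the object names listed above.
# Check DIAGNOSTIC_SCHEMA.md for the real field names before relying on this.
REQUIRED_OBJECTS = ("target", "execution", "diagnostics", "classification")


def check_report_structure(report):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    for key in REQUIRED_OBJECTS:
        if not isinstance(report.get(key), dict):
            problems.append(f"missing or non-object field: {key}")
    classification = report.get("classification")
    if isinstance(classification, dict) and not isinstance(classification.get("code"), str):
        problems.append("classification.code is missing or not a string")
    return problems


# Hypothetical, heavily trimmed report used only to exercise the check.
sample = json.loads(
    '{"target": {}, "execution": {}, "diagnostics": {}, '
    '"classification": {"code": "dns_resolution_failed"}}'
)
print(check_report_structure(sample))
```

A check like this only catches missing top-level objects; it is not a substitute for validating against the full schema.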
4. CLASSIFICATION_MATRIX.md ← For support teams
Purpose: Support playbooks and triage routing
Contains:
- Decision tree flowchart (ASCII art)
- All classification codes with detailed explanations
- Root causes and recommended actions for each code
- Tier 1 triage checklist
- Detailed playbooks for each failure scenario:
  - DNS Resolution Failed
  - TCP 443 Failed (Public Endpoint)
  - TCP 443 Failed (Private Endpoint)
  - RBAC Insufficient
- Support ticket template
- Python parsing example
- Automation routing matrix
Read this if:
- You're a support engineer receiving diagnostic reports
- You need to route issues based on classification
- You're building automation to process diagnostics
- You need to escalate to specialist teams
🔧 Script File
Diagnose-CosmosConnectivity.ps1
Purpose: Main diagnostic script (customer-executable)
What it does:
- Prompts for account endpoints and credentials (interactive or parameterized)
- Runs 5 diagnostic checks:
  - DNS resolution of account endpoint
  - TCP 443 connectivity test
  - HTTPS reachability probe
  - Private network indicators analysis
  - Azure CLI queries (if authenticated)
- Performs RBAC assessment
- Generates classification (success/failure/warning + specific code)
- Outputs structured JSON to file and console
- Produces human-readable summary with recommended actions
Key Features:
- 300+ lines of well-commented PowerShell
- Error handling for all network operations
- Timeouts to prevent hanging
- Optional sensitive data redaction
- Works on Windows, macOS, and Linux (PowerShell 5.0+ on Windows; macOS and Linux need the cross-platform PowerShell 6 or later)
- No external dependencies except optional Azure CLI
How to run:
# Interactive (recommended first run)
.\Diagnose-CosmosConnectivity.ps1 -Interactive
# Non-interactive (scripted)
.\Diagnose-CosmosConnectivity.ps1 `
-EndpointUrl "..." -SubscriptionId "..." -ResourceGroup "..." -AccountName "..."
# Safe for support (redacted)
.\Diagnose-CosmosConnectivity.ps1 ... -Redact
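For teams that launch the script from another tool, the non-interactive invocation above could be assembled in Python roughly as follows. This is a sketch, not part of the toolkit: it assumes the cross-platform `pwsh` executable is on the PATH, and the endpoint and identifiers in the example call are placeholders.

```python
def build_command(endpoint_url, subscription_id, resource_group, account_name,
                  redact=False, script_path="./Diagnose-CosmosConnectivity.ps1"):
    """Build an argv list for a non-interactive run through `pwsh`."""
    cmd = [
        "pwsh", "-NoProfile", "-File", script_path,
        "-EndpointUrl", endpoint_url,
        "-SubscriptionId", subscription_id,
        "-ResourceGroup", resource_group,
        "-AccountName", account_name,
    ]
    if redact:
        cmd.append("-Redact")  # switch parameter, takes no value
    return cmd


# Placeholder values; substitute the inputs described in README.md.
cmd = build_command(
    "https://example-account.documents.azure.com:443/",
    "00000000-0000-0000-0000-000000000000",
    "example-rg",
    "example-account",
    redact=True,
)
print(cmd)
# To actually run it: subprocess.run(cmd, check=False)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with values that contain spaces.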
🔄 File Relationships
Customer Issue: "Can't connect to Cosmos DB"
    │
    ├─→ QUICK_REFERENCE.md (if in a hurry)
    │   │
    │   └─→ "Run this command"
    │
    └─→ README.md (comprehensive guidance)
        │
        ├─→ Run: Diagnose-CosmosConnectivity.ps1
        │   │
        │   └─→ Outputs JSON file + console summary
        │
        ├─→ Read classification code
        │
        └─→ CLASSIFICATION_MATRIX.md (support playbook)
            │
            ├─→ Find your classification code
            │
            ├─→ Read root causes
            │
            └─→ Follow recommended actions
                │
                ├─→ Self-resolve?
                │   └─→ Done!
                │
                └─→ Still stuck?
                    │
                    ├─→ Gather info from JSON
                    │
                    ├─→ Redact with -Redact flag
                    │
                    └─→ Escalate to support
                        │
                        ├─→ Support triages with CLASSIFICATION_MATRIX.md
                        │
                        └─→ Route to specialist (network, auth, etc.)
🎯 Usage by Role
👤 Customer / End User
- Read: QUICK_REFERENCE.md (2 min)
- Gather inputs as shown in README.md
- Run: .\Diagnose-CosmosConnectivity.ps1 -Interactive
- Review the output and look for the Classification Code
- Try recommended actions from console output
- If stuck → Share JSON with support (use -Redact)
👨‍💼 Support Engineer (Tier 1)
- Receive JSON report from customer
- Read: CLASSIFICATION_MATRIX.md section "Tier 1: Triage"
- Look up classification.code in "Classification Code Reference"
- Follow the corresponding playbook
- Either self-resolve or route to specialist
👨‍💻 Support Engineer (Specialist)
- Receive routed issue with JSON and escalation context
- Read relevant playbook from CLASSIFICATION_MATRIX.md
- Use DIAGNOSTIC_SCHEMA.md to parse specific JSON fields
- Reference "Recommended Actions" for deep-dive steps
- May request customer to re-run with additional parameters
🤖 Automation / Integration
- Read: DIAGNOSTIC_SCHEMA.md (schema specification)
- Parse JSON output from script
- Route based on classification.code
- (Optional) Read CLASSIFICATION_MATRIX.md section "JSON Parsing for Automation"
- Integrate with ticketing, routing, or remediation system
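The parse-and-route steps above might look something like this in Python. The queue names are hypothetical and the grouping is only a guess; the actual routing matrix is defined in CLASSIFICATION_MATRIX.md.

```python
import json

# Hypothetical queue names, for illustration only.
ROUTES = {
    "dns_resolution_failed": "network",
    "tcp_connectivity_blocked": "network",
    "private_endpoint_network_path_blocked": "network",
    "private_endpoint_mismatch": "network",
    "rbac_insufficient": "identity",
    "azure_config_check_skipped": "customer-followup",
    "network_connectivity_healthy": "application",
}


def route_report(raw_json):
    """Pick a queue from classification.code; unknown or missing codes go to a human."""
    report = json.loads(raw_json)
    classification = report.get("classification") if isinstance(report, dict) else None
    code = classification.get("code") if isinstance(classification, dict) else None
    return ROUTES.get(code, "manual-triage")


print(route_report('{"classification": {"code": "rbac_insufficient"}}'))
```

Falling back to a manual queue for unrecognized codes keeps the integration safe if a later script version introduces new classifications.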
📊 Product Team / Data Analysis
- Collect diagnostic reports over time
- Aggregate classification codes to identify trends
- Use JSON structure to extract metrics (DNS latency, TCP success rate, etc.)
- Reference DIAGNOSTIC_SCHEMA.md for field definitions
- Correlate with support ticket data for insights
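Aggregating classification codes across collected reports can be sketched as below. It assumes the reports are stored as individual `.json` files in one directory (a hypothetical layout) and reads only the `classification.code` field.

```python
import json
from collections import Counter
from pathlib import Path


def count_codes(report_dir):
    """Tally classification.code across every *.json file in report_dir."""
    counts = Counter()
    for path in sorted(Path(report_dir).glob("*.json")):
        try:
            # utf-8-sig tolerates the byte-order mark that some tools write.
            report = json.loads(path.read_text(encoding="utf-8-sig"))
        except (OSError, ValueError):
            counts["<unreadable>"] += 1
            continue
        classification = report.get("classification") if isinstance(report, dict) else None
        code = classification.get("code") if isinstance(classification, dict) else None
        counts[code or "<missing>"] += 1
    return counts
```

Calling `count_codes("reports").most_common()` would then list the codes from most to least frequent, which is one simple way to spot trends.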
📋 Classification Codes at a Glance
Quick reference (full details in CLASSIFICATION_MATRIX.md):
| Code | Type | Severity | What It Means |
|---|---|---|---|
| network_connectivity_healthy | ✅ | Info | Network works; if still broken, check auth/app |
| dns_resolution_failed | ❌ | High | Cannot resolve endpoint (DNS/VPN/proxy issue) |
| tcp_connectivity_blocked | ❌ | High | DNS works, port 443 blocked (firewall/ISP) |
| private_endpoint_network_path_blocked | ❌ | High | Private endpoint unreachable (PE routing issue) |
| rbac_insufficient | ⚠️ | Medium | Network OK, but permissions missing |
| private_endpoint_mismatch | ⚠️ | Medium | Resolved to unexpected private IP |
| azure_config_check_skipped | ⚠️ | Low | Azure CLI not authenticated; re-run after az login |
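If several reports arrive at once, the severities in this table can be used to order them. The sketch below copies the code-to-severity mapping from the table; the ranking order and the choice to put unknown codes first are assumptions, not something the toolkit prescribes.

```python
# Code-to-severity mapping copied from the table above.
SEVERITY = {
    "network_connectivity_healthy": "Info",
    "dns_resolution_failed": "High",
    "tcp_connectivity_blocked": "High",
    "private_endpoint_network_path_blocked": "High",
    "rbac_insufficient": "Medium",
    "private_endpoint_mismatch": "Medium",
    "azure_config_check_skipped": "Low",
}

# Assumed ordering: lower rank is handled first.
RANK = {"High": 0, "Medium": 1, "Low": 2, "Info": 3}


def sort_by_severity(codes):
    """Order codes from most to least severe; unknown codes sort first for review."""
    return sorted(codes, key=lambda code: RANK.get(SEVERITY.get(code), -1))


print(sort_by_severity([
    "azure_config_check_skipped",
    "rbac_insufficient",
    "dns_resolution_failed",
]))
```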
🔍 Finding Specific Information
"I want to know what the JSON contains"
→ DIAGNOSTIC_SCHEMA.md (all field definitions)
"I see a classification code, what does it mean?"
→ CLASSIFICATION_MATRIX.md (code reference + playbook)
"How do I run the script?"
→ README.md (detailed how-to) or QUICK_REFERENCE.md (2-min version)
"I'm building a parser/bot"
→ DIAGNOSTIC_SCHEMA.md (schema + samples) + CLASSIFICATION_MATRIX.md (routing logic)
"I need to support multiple customers"
→ CLASSIFICATION_MATRIX.md (support ticket template + triage playbook)
"I need to find input for a specific field"
→ README.md section "Getting Your Inputs" (step-by-step, with screenshot references)
"How do I integrate this into my system?"
→ DIAGNOSTIC_SCHEMA.md (JSON structure) + CLASSIFICATION_MATRIX.md (routing + Python example)
✅ Pre-Launch Checklist
Before deploying to customers, verify:
- Script runs without errors in interactive mode
- Script accepts all parameters in non-interactive mode
- -Redact flag properly masks sensitive data
- JSON output validates against DIAGNOSTIC_SCHEMA.md
- All classification codes match CLASSIFICATION_MATRIX.md
- README.md examples tested and working
- Support team trained on CLASSIFICATION_MATRIX.md playbooks
- Triage automation configured (if applicable)
- Sample JSON files created and tested
- Accessibility verified (screen readers, etc.)
🚀 Rollout Plan
Phase 1: Internal Testing (Week 1)
- Run script on various network configurations
- Test interactive and non-interactive modes
- Verify Azure CLI integration (if connected to test accounts)
- Collect sample JSON outputs
Phase 2: Support Dogfood (Week 2)
- Train support team on using CLASSIFICATION_MATRIX.md
- Have support team run diagnostics on internal test accounts
- Collect feedback on documentation clarity
- Refine playbooks based on real cases
Phase 3: Limited Release (Week 3)
- Release to subset of customers (e.g., preview tier)
- Gather feedback on usability
- Monitor classification code distribution
- Look for unexpected errors or edge cases
Phase 4: General Availability (Week 4)
- Release to all customers
- Monitor issue volume and classification codes
- Use data to identify new playbooks or improvements
- Update documentation based on feedback
📞 Support & Maintenance
Common Questions
Q: Can I run the script without Azure CLI?
A: Yes. It skips the Azure configuration checks but still runs the network diagnostics.
Q: Is the script safe? Does it collect personal data?
A: It is safe to run. It only reads local network configuration and, if you are authenticated, queries the Azure API. Use -Redact to mask sensitive data before sharing.
Q: What if I get an unexpected error?
A: Check the error message in the console, review the troubleshooting section in README.md, or share the JSON file with support.
Q: How often should I re-run diagnostics?
A: After network changes, a VPN reconnect, or when troubleshooting intermittent issues.
📈 Success Metrics
Track these to measure script effectiveness:
- % of customers who run script on first issue
- % of issues self-resolved after reading recommended actions
- Reduction in escalations for network vs auth vs app issues
- Average time to triage (before: manual back-and-forth; after: automated)
- Distribution of classification codes (helps identify common issues)
🔄 Version & Updates
Current Version: 1.0.0
Schema Version: 1.0.0
Last Updated: 2026-05-13
Versioning Policy:
- Major version (1.x.x) = Breaking changes to JSON schema or classification codes
- Minor version (x.1.x) = New checks or optional fields added
- Patch version (x.x.1) = Bug fixes, documentation updates
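Under this policy, a parser only needs to reject reports whose major version differs from the one it was built for, since minor and patch releases add optional fields or fixes. A minimal sketch of that check follows; how the version string is read from a report is left out, because the field name is defined in DIAGNOSTIC_SCHEMA.md and not in this index.

```python
def parse_version(text):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(report_version, supported_version="1.0.0"):
    """True when both versions share a major number, i.e. no breaking change between them."""
    return parse_version(report_version)[0] == parse_version(supported_version)[0]


print(is_compatible("1.4.2"), is_compatible("2.0.0"))
```

A parser written this way should also ignore fields it does not recognize, so that optional fields added in a minor release do not break it.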
📄 License & Attribution
All files in this directory are provided as-is for Cosmos DB connectivity diagnostics. See repository LICENSE file for terms.