Estimation of Urine Specific Gravity from Serum Chemistry, CBC, and Patient Age | Feline-only clinic | April 2026
Urine Specific Gravity (USG) is a cornerstone of feline renal assessment. IRIS staging of Chronic Kidney Disease (CKD) is based on blood creatinine and SDMA. IRIS also lists inadequate urine concentrating ability, without an identifiable non-renal cause, among the findings that support a diagnosis of non-azotemic (Stage 1) CKD. Loss of concentrating ability (USG <1.035 in cats) is frequently among the earliest detectable signs of tubular dysfunction, often preceding azotemia.
However, urine collection is not always achievable at the time of presentation. An empty bladder, patient temperament, contraindications to cystocentesis (coagulopathy, abdominal masses), or client constraints may preclude urinalysis.
This model estimates USG from serum chemistry, CBC, and patient age. These values are already available at the visit: chemistry and CBC come from the same blood draw, and age comes from the patient record. The model therefore provides a screening estimate of concentrating ability at no additional cost, procedure time, or client charge.
| Aspect | v1.5a | v1.6 |
|---|---|---|
| Architecture | Single model | Multi-model ensemble |
| Validation | Row-level CV (patient overlap) | Patient-grouped CV + temporal holdout |
| Uncertainty | None | Conformal prediction (90% coverage) |
| Optimization Target | Balanced accuracy | ≥90% sensitivity |
| Hyperparameter Search | Limited | Extensive GPU-accelerated search |
| Holdout Sensitivity | 81.0%* | 92% |
| Holdout Specificity | 73.9%* | 60% |
| Holdout AUC-ROC | — | 0.86 |
| Holdout NPV | — | 90% |
| Input Features | 5 | 5 (unchanged) |
*v1.5a used row-level cross-validation where the same patient’s multiple visits could appear in both training and evaluation. Its metrics are not directly comparable to v1.6’s patient-independent evaluation.
The lower specificity of v1.6 is a deliberate design decision, not a trade-off we stumbled into. v1.6 was explicitly optimized for ≥90% sensitivity because it is a screening tool. The fundamental question is:
“Which error is more acceptable: sending a healthy cat for a minimal-cost urinalysis, or sending a sick cat home without detection?”
With 92% sensitivity on the holdout set, only 9 of 107 impaired cats were missed. The trade-off is that 57 of 141 healthy cats were flagged for a urinalysis they may not have needed. For those 57 cats, the consequence is a minimal-cost add-on test that confirms adequate concentrating ability. For the 98 impaired cats correctly flagged, the consequence is early detection and intervention.
v1.6 implements a rigorous validation strategy designed to eliminate optimistic bias and produce publication-grade metrics:
| Level | Purpose | Size | Access |
|---|---|---|---|
| Development Set | Model training + threshold tuning | 1,086 cases | Used during training via cross-validation |
| Temporal Holdout | Final unbiased evaluation | 248 cases | Never touched until final evaluation |
The holdout set is constructed by taking each patient’s most recent visit and reserving it for final evaluation. This simulates real deployment: the model trains on historical data and is evaluated on the most recent encounter for each cat — exactly the scenario it faces in clinical use.
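The holdout rule and the patient-grouped folds can be sketched in plain Python. This is a minimal illustration, not the v1.6 pipeline. The visit tuple layout and the round-robin fold assignment are assumptions, and the sketch omits the stratification step of the actual patient-grouped stratified CV. The patient key follows the dataset table's pet_name + owner grouping.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit layout: (pet_name, owner, visit_date, row_id).
# The real v1.6 schema is not described in this document.


def patient_key(visit):
    """Patient identity = pet_name + owner, normalized for case and whitespace."""
    pet_name, owner, _, _ = visit
    return (pet_name.strip().lower(), owner.strip().lower())


def temporal_holdout_split(visits):
    """Send each patient's most recent visit to holdout; earlier visits go to development."""
    by_patient = defaultdict(list)
    for visit in visits:
        by_patient[patient_key(visit)].append(visit)
    development, holdout = [], []
    for patient_visits in by_patient.values():
        patient_visits.sort(key=lambda v: v[2])  # oldest -> newest
        holdout.append(patient_visits[-1])
        development.extend(patient_visits[:-1])
    return development, holdout


def grouped_fold_ids(development, n_folds=7):
    """Assign whole patients to folds (round-robin, unstratified) so no patient spans two folds."""
    patients = sorted({patient_key(v) for v in development})
    fold_of = {p: i % n_folds for i, p in enumerate(patients)}
    return [fold_of[patient_key(v)] for v in development]


# Toy data for illustration only.
visits = [
    ("Milo", "Smith", date(2023, 1, 5), 1),
    ("Milo", "Smith", date(2025, 3, 2), 2),
    ("Luna", "Jones", date(2024, 6, 1), 3),
    ("Luna", "Jones", date(2022, 2, 1), 4),
    ("Luna", "Jones", date(2023, 8, 9), 5),
]
development, holdout = temporal_holdout_split(visits)
fold_ids = grouped_fold_ids(development, n_folds=2)
```

Under this sketch's rule, a patient with a single visit contributes only to the holdout. All of a patient's earlier visits share one fold.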
v1.6 uses an ensemble of independently-trained machine learning models. Each model is trained on a different cross-validation fold, meaning each sees the development data from a different angle. At prediction time, all models produce score estimates that are averaged for the final prediction.
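Averaging the fold models can be sketched as follows. The sketch assumes each model returns a score in [0, 1] for one feature vector. The callables stand in for the trained models, whose type and feature order are not specified here, and the toy scores are illustrative.

```python
from statistics import fmean
from typing import Callable, Sequence

# Placeholder type: one trained fold model mapping a feature vector
# [BUN, Creatinine, HGB, Abs. Lymphocytes, Age] (hypothetical order) to a score.
Model = Callable[[Sequence[float]], float]


def ensemble_score(models: Sequence[Model], features: Sequence[float]) -> float:
    """Final prediction = unweighted mean of the per-model scores."""
    if not models:
        raise ValueError("ensemble is empty")
    return fmean(model(features) for model in models)


# Seven toy "models" with fixed outputs, standing in for the fold models.
toy_models = [lambda x, s=s: s for s in (0.55, 0.60, 0.62, 0.58, 0.65, 0.61, 0.59)]
score = ensemble_score(toy_models, [38.0, 2.1, 11.5, 1200.0, 15.0])
```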
Same 5 features as v1.5a — 3 lab values from a standard chemistry + CBC panel, plus patient age. No Amylase, Cholesterol, T4, electrolyte panel, or SDMA required.
| Feature | Unit |
|---|---|
| BUN | mg/dL |
| Creatinine | mg/dL |
| Hemoglobin (HGB) | g/dL |
| Abs. Lymphocytes | /μL |
| Patient Age | years |
Hyperparameters were tuned with an extensive GPU-accelerated search over tree structure, regularization, and class weighting. The objective minimizes a clinically weighted error cost with asymmetric penalties that favor sensitivity, subject to a minimum sensitivity of 90%.
Binary screening classification with uncertainty:
Temporal holdout (248 cases): each patient’s most recent visit, held out completely from training. This is the primary metric set because it represents expected real-world performance.
Out-of-fold (OOF) development set (1,086 cases): each prediction comes from the one model (out of 7) that never saw that patient during training. This validates consistency across the full development set.
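Assembling the out-of-fold predictions amounts to routing each development case to the model trained without its fold. The following is a minimal sketch. It assumes fold model k was trained on every fold except k and that fold ids come from the patient-grouped split. The toy models are placeholders.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def oof_predictions(
    fold_models: Sequence[Model],
    rows: Sequence[Sequence[float]],
    fold_ids: Sequence[int],
) -> List[float]:
    """Score each row with the single model that never saw its fold during training."""
    if len(rows) != len(fold_ids):
        raise ValueError("rows and fold_ids must align")
    return [fold_models[k](x) for x, k in zip(rows, fold_ids)]


# Seven placeholder models; model k simply returns k / 10 so the routing is visible.
fold_models = [lambda x, k=k: k / 10 for k in range(7)]
rows = [[38.0, 2.1, 11.5, 1200.0, 15.0]] * 3
preds = oof_predictions(fold_models, rows, fold_ids=[0, 3, 6])
```

A holdout score averages all seven models, whereas each OOF prediction uses exactly one.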
OOF and holdout metrics are consistent (sensitivity differs by less than 2 percentage points). This suggests the model generalizes to cases not seen during training within this clinic's population. AUC-ROC of 0.86–0.87 across both sets demonstrates strong discrimination ability independent of the chosen threshold.
An AUC-ROC of 0.862 has a concrete interpretation. If you randomly pick one impaired cat and one adequate cat from the holdout set, the model ranks the impaired cat as higher risk 86.2% of the time, with ties counted as half. This measures the model’s fundamental ability to distinguish impaired from adequate concentrators, regardless of where the decision threshold is set.
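The pairwise interpretation can be checked directly. Counting correctly ordered impaired/adequate pairs gives the same value as the area under the ROC curve (the Mann-Whitney formulation). The brute-force sketch below uses toy scores. For 107 × 141 pairs, the quadratic loop is still cheap.

```python
from typing import Sequence


def pairwise_auc(scores_impaired: Sequence[float], scores_adequate: Sequence[float]) -> float:
    """AUC-ROC as P(random impaired cat scores higher than random adequate cat).

    Ties count as half a correct ranking (the Mann-Whitney U convention).
    """
    wins = 0.0
    for si in scores_impaired:
        for sa in scores_adequate:
            if si > sa:
                wins += 1.0
            elif si == sa:
                wins += 0.5
    return wins / (len(scores_impaired) * len(scores_adequate))


# Toy scores, illustrative only: 7 clear wins + 1 tie out of 9 pairs.
auc = pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
```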
The threshold determines the sensitivity/specificity operating point along the ROC curve. We chose a point that meets the ≥90% sensitivity target because high sensitivity is what a screening tool requires.
The decision threshold was selected by a clinically weighted search. Its objective is to minimize missed impaired cats while keeping the false-flag rate clinically manageable, subject to sensitivity ≥90%.
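One way to express this search is as a constrained sweep over candidate thresholds. The sweep discards thresholds whose sensitivity falls below 90%, then keeps the one with the lowest weighted error cost. Two choices here are assumptions for illustration: the cost weights (one missed impaired cat counted as five false flags) and the rule that a score at or above the threshold triggers a flag. The actual v1.6 weights are not given in this document.

```python
from typing import Optional, Sequence, Tuple


def choose_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    cost_fn: float = 5.0,
    cost_fp: float = 1.0,
    min_sensitivity: float = 0.90,
) -> Optional[Tuple[float, float]]:
    """Lowest-cost threshold among those meeting the sensitivity floor.

    labels: 1 = impaired, 0 = adequate. A case is flagged when score >= threshold.
    cost_fn / cost_fp are hypothetical weights, not v1.6's published values.
    Returns (threshold, cost), or None if no threshold meets the floor.
    """
    positives = sum(labels)
    if positives == 0:
        return None
    best = None
    for t in sorted(set(scores)):  # ascending, so cost ties keep the lower threshold
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        if tp / positives < min_sensitivity:
            continue  # misses too many impaired cats
        cost = cost_fn * (positives - tp) + cost_fp * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Toy out-of-fold scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
best = choose_threshold(scores, labels)
```

The search runs on development-set (out-of-fold) scores rather than on the holdout. This keeps the holdout untouched until final evaluation, consistent with the development set's stated role in threshold tuning.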
In screening contexts, the consequences of errors are asymmetric:
| Error Type | What Happens | Consequence | Cost |
|---|---|---|---|
| False Negative (missed sick cat) | Impaired cat sent home without urinalysis | Delayed CKD diagnosis. Disease progresses unmonitored. Potential for irreversible nephron loss before next visit. | HIGH |
| False Positive (unnecessary flag) | Healthy cat recommended for urinalysis | Routine urinalysis performed. Cat confirmed healthy. Client gets peace of mind. No medical downside. | LOW |
This asymmetry is not unique to our tool; it underlies clinical screening programs generally. Mammography, PSA testing, and fecal occult blood testing all accept a substantial number of false positives, which are resolved by confirmatory follow-up. They do so because a missed diagnosis is judged to cost more than additional testing.
| Operating Point | Sensitivity | Specificity | Missed Cats (per 107) | Unnecessary UAs (per 141) |
|---|---|---|---|---|
| v1.6 (current) | 92% | 60% | 9 | 57 |
| Balanced threshold | ~80% | ~75% | ~21 | ~35 |
| High-specificity | ~70% | ~85% | ~32 | ~21 |
Moving from v1.6’s operating point to a “balanced” threshold would reduce unnecessary UAs by ~22 but miss ~12 additional impaired cats. Those cats may not be diagnosed until their next visit, possibly months later, after further nephron loss.
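The counts in the table follow from applying sensitivity and specificity to the holdout's 107 impaired and 141 adequate cats. A small helper reproduces the arithmetic. The rows other than v1.6 are approximate operating points, so their counts are approximate too.

```python
def expected_errors(sensitivity, specificity, n_impaired=107, n_adequate=141):
    """Return (missed impaired cats, unnecessary urinalyses) at an operating point."""
    missed = round((1 - sensitivity) * n_impaired)
    unnecessary = round((1 - specificity) * n_adequate)
    return missed, unnecessary


current = expected_errors(98 / 107, 84 / 141)  # exact holdout rates behind 92% / 60%
balanced = expected_errors(0.80, 0.75)
extra_missed = balanced[0] - current[0]
uas_saved = current[1] - balanced[1]
```

The example uses the exact holdout fractions for v1.6. Plugging in the rounded 60% specificity would give 56 rather than 57 unnecessary UAs.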
New in v1.6: every prediction includes a conformal prediction set with a mathematically guaranteed coverage rate. This is a distribution-free method that tells you not just what the model predicts, but how confident it is.
Instead of a single hard classification, the conformal layer outputs a set of possible labels:
| Prediction Set | Meaning | Clinical Action |
|---|---|---|
| {impaired} | Model is confident this cat has impaired concentration | Strong recommendation for urinalysis |
| {adequate} | Model is confident this cat has adequate concentration | Low priority for urinalysis |
| {impaired, adequate} | Model is uncertain — cannot reliably distinguish | Urinalysis recommended (borderline case) |
The conformal layer guarantees that the true label is contained in the prediction set at least 90% of the time (α = 0.10). The guarantee is marginal, meaning it holds on average across cases rather than for each individual cat. It is also distribution-free: it holds regardless of the underlying data distribution, requiring only that calibration and future cases be exchangeable.
Conformal prediction calibrated on out-of-fold predictions from the development set. Target coverage: 90%.
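The following is a minimal split-conformal sketch for the binary case. It assumes the nonconformity score is 1 minus the model's score for the true label, a common choice; the score function v1.6 uses is not stated here. Calibration uses out-of-fold scores as described above. Both labels are returned when the impaired score lies between 1 - qhat and qhat. If qhat falls below 0.5, this score can also produce an empty set, a case the table above does not list.

```python
import math
from typing import List, Sequence


def conformal_quantile(cal_true_label_scores: Sequence[float], alpha: float = 0.10) -> float:
    """Split-conformal cutoff qhat from calibration cases.

    cal_true_label_scores: model score assigned to each calibration case's TRUE label
    (e.g. out-of-fold development predictions). Nonconformity = 1 - that score.
    """
    nonconformity = sorted(1.0 - s for s in cal_true_label_scores)
    n = len(nonconformity)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return 1.0  # too few calibration cases: every label is admitted
    return nonconformity[rank - 1]


def prediction_set(p_impaired: float, qhat: float) -> List[str]:
    """All labels whose nonconformity (1 - score for that label) is <= qhat."""
    labels = []
    if 1.0 - p_impaired <= qhat:
        labels.append("impaired")
    if p_impaired <= qhat:  # score for "adequate" is 1 - p_impaired
        labels.append("adequate")
    return labels


# Toy calibration scores (score given to the true label), illustrative only.
cal = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.30, 0.20]
qhat = conformal_quantile(cal, alpha=0.10)  # n = 9 -> rank 9 -> largest score, 0.8
```

With this toy cutoff, a cat scored 0.90 gets {impaired}, one scored 0.50 gets {impaired, adequate}, and one scored 0.10 gets {adequate}.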
Confusion matrix, temporal holdout (n=248):
| Actual | Predicted Adequate | Predicted Impaired |
|---|---|---|
| Adequate (n=141) | 84 | 57 |
| Impaired (n=107) | 9 | 98 |
Confusion matrix, out-of-fold development set (n=1,086):
| Actual | Predicted Adequate | Predicted Impaired |
|---|---|---|
| Adequate (n=508) | 291 | 217 |
| Impaired (n=578) | 55 | 523 |
OOF results confirm the holdout pattern: 90.5% sensitivity (523/578), 57.3% specificity (291/508). Consistent performance across both evaluation sets.
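The headline metrics follow directly from the two confusion matrices, with impaired as the positive class. The short check below recomputes them from those counts. The reported 92% / 60% / 90% are the holdout values rounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 counts with 'impaired' (USG < 1.035) as the positive class."""
    tn: int  # adequate, not flagged
    fp: int  # adequate, flagged for urinalysis
    fn: int  # impaired, missed
    tp: int  # impaired, flagged

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


holdout = Confusion(tn=84, fp=57, fn=9, tp=98)
oof = Confusion(tn=291, fp=217, fn=55, tp=523)

for name, cm in (("holdout", holdout), ("OOF", oof)):
    print(f"{name}: sensitivity {cm.sensitivity:.1%}, "
          f"specificity {cm.specificity:.1%}, NPV {cm.npv:.1%}")
# holdout: sensitivity 91.6%, specificity 59.6%, NPV 90.3%
# OOF: sensitivity 90.5%, specificity 57.3%, NPV 84.1%
```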
With only 5 features, the model concentrates its predictive power on the strongest renal and hematologic markers. BUN, Age, and Creatinine account for over 70% of the model’s total importance:
| Analyte | Importance | Physiological Link to Urine Concentration |
|---|---|---|
| BUN | 30.0% | Primary marker of glomerular filtration rate. As GFR declines, BUN rises and concentrating ability diminishes. BUN also contributes to the medullary concentration gradient via urea recycling — elevated BUN paradoxically reflects the failing kidney’s inability to maintain this gradient. |
| Patient Age | 22.0% | CKD is progressive and age-dependent. In this dataset, 98% of cats over 18 years had impaired concentration vs 15% of cats aged 5–10. Age captures the cumulative renal decline that bloodwork alone may not fully reflect, including subclinical nephron loss. |
| Creatinine | 20.0% | Muscle-derived GFR marker. Co-regulated with BUN through renal excretion. Together with BUN, captures the primary renal axis. |
| Abs. Lymphocytes | 15.0% | Hematologic marker of immune status and systemic illness chronicity. CKD cats often develop lymphopenia as part of the chronic disease syndrome. Low lymphocyte counts correlate with disease severity and duration. |
| Hemoglobin | 13.0% | Reflects hydration status and erythropoietin production. Dehydrated cats have higher HGB and more concentrated urine. CKD cats develop non-regenerative anemia (low HGB) with concurrent loss of concentrating ability. |
Every prediction includes a per-feature explanation breakdown showing how each bloodwork value contributed to the result. In v1.6, explanation values are averaged across all ensemble models for more stable, reliable attributions.
This transparency helps veterinarians understand which bloodwork values are driving the recommendation, rather than treating the model as a black box. For example, a cat might be flagged primarily because of elevated BUN and advanced age, even though its creatinine is still within normal range — the explanation chart makes this reasoning visible.
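Averaging explanations across the ensemble can be sketched as a per-feature mean of each model's contribution values. The attribution method behind v1.6's explanation chart is not named here; SHAP-style additive contributions are one common option. The dict-per-model input format, the three-model toy ensemble (v1.6 uses seven), and the values are all assumptions.

```python
from typing import Dict, List, Sequence

FEATURES = ["BUN", "Creatinine", "HGB", "Abs. Lymphocytes", "Age"]


def average_attributions(per_model: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Mean contribution of each feature across ensemble members."""
    n = len(per_model)
    return {f: sum(m[f] for m in per_model) / n for f in FEATURES}


def top_drivers(attributions: Dict[str, float], k: int = 2) -> List[str]:
    """Features with the largest absolute contribution, largest first."""
    return sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)[:k]


# Toy contributions for one cat (positive = pushes toward "impaired").
per_model = [
    {"BUN": 0.30, "Creatinine": 0.02, "HGB": -0.05, "Abs. Lymphocytes": 0.04, "Age": 0.22},
    {"BUN": 0.26, "Creatinine": 0.04, "HGB": -0.03, "Abs. Lymphocytes": 0.06, "Age": 0.25},
    {"BUN": 0.28, "Creatinine": 0.03, "HGB": -0.04, "Abs. Lymphocytes": 0.05, "Age": 0.19},
]
mean_attr = average_attributions(per_model)
drivers = top_drivers(mean_attr)  # BUN and Age, with creatinine contributing little
```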
| Version | Features | Sensitivity | Specificity | Key Change |
|---|---|---|---|---|
| v1.0 | 10 | — | — | Initial model — bloodwork only |
| v1.1 | 11 | — | — | Added patient age |
| v1.2 | 14 | — | — | Full feature set; regression + classification |
| v1.3 | 7 | 85.6% | 70.0% | Reduced to 7 fields; clinically-weighted error costs |
| v1.4 | 7 | 85.6% | 73.7% | Hyperparameter tuning; classification only |
| v1.5a | 5 | 84.9%† | 75.5%† | Dropped Amylase & Cholesterol; per-prediction explanations |
| v1.6 | 5 | 92% | 60% | Multi-model ensemble; patient-independent validation; temporal holdout; conformal prediction; ≥90% sensitivity target |
†v1.5a used row-level cross-validation with patient overlap between splits. v1.6 is the first version with publication-grade patient-independent validation.
| Limitation | Clinical Impact | Mitigation | Status |
|---|---|---|---|
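| Patient overlap in validation | v1.5a used row-level splits that allowed the same patient in both training and evaluation, which can inflate reported metrics. | Resolved in v1.6: strict patient-level separation + temporal holdout. | Complete |
| Analytes not on every panel | v1.3–v1.4 also required Amylase and Cholesterol. | Resolved in v1.5a: dropped to 5 universal features. | Complete |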
| ~40% false flag rate on healthy cats | 60% specificity means ~40% of healthy cats are flagged for urinalysis. This is a deliberate trade-off for ≥90% sensitivity. | Each false flag results in a routine urinalysis with no medical downside and gives the client peace of mind. The threshold is adjustable per clinic: clinics that want fewer flags can raise it at the cost of sensitivity. Conformal prediction identifies uncertain cases to help triage. | By design |
| Single-practice, single-species dataset | Developed from 3,642 feline lab reports from one hospital, of which 1,334 cases with complete bloodwork and USG were used for v1.6. External validation is required before broader deployment. | Pursuing collaboration with Texas A&M Veterinary Medical Teaching Hospital. Target: 2–3 external validation datasets. | Oct 2026 |
| Not a urinalysis replacement | USG is one component of urinalysis. Sediment, protein, culture, and pH provide independent diagnostic information. | By design. This is a screening triage tool, not a UA replacement. | N/A |
| Pre-renal and post-renal effects | Dehydration elevates BUN and concentrates urine simultaneously. The model may conflate pre-renal and intrinsic renal causes. | Add hydration status and recent fluid therapy as optional inputs in a future version. Explore BUN/Creatinine ratio as engineered feature. | Q1 2027 |
| No prospective outcome data | No data yet showing that flagging cats leads to earlier diagnosis or improved outcomes. | Feline-only clinic pilot tracking every flag: UA performed, USG result, diagnosis at 6 and 12 months. | Mar 2027 |
| Parameter | Value |
|---|---|
| Source | Feline-only clinic, Houston, TX |
| Date Range | January 2022 – February 2026 |
| Species | 100% Feline |
| Total Lab Reports | 3,642 |
| Reports with Urinalysis | 1,506 (41%) |
| Cases Used for v1.6 | 1,334 (complete bloodwork + USG) |
| Development Set | 1,086 cases (patient-grouped CV) |
| Temporal Holdout | 248 cases (most recent visit per patient) |
| USG Range in Dataset | 1.005 – 1.086 |
| USG Mean / Median | 1.036 / 1.034 |
| Development Class Balance | 578 Impaired / 508 Adequate (53% / 47%) |
| Holdout Class Balance | 107 Impaired / 141 Adequate (43% / 57%) |
| Validation Method | Patient-grouped stratified CV + temporal holdout |
| Patient Grouping | pet_name + owner (prevents same cat in train + val) |
| Holdout Strategy | Smart temporal: last visit per patient → holdout |
| Age Group | n | Mean USG | % Impaired (<1.035) |
|---|---|---|---|
| Under 5 years | 5 | 1.049 | 20% |
| 5–10 years | 100 | 1.047 | 15% |
| 10–14 years | 556 | 1.043 | 29% |
| 14–18 years | 553 | 1.028 | 73% |
| Over 18 years | 81 | 1.019 | 98% |
Model v1.6 | April 2026 | Feline-only clinic | For investigational and research use | Not validated for clinical deployment