Python & AI/ML Coding Standards
Feb 2026
1. Code Style & Formatting
Automated consistency, zero debates
Black + isort on every save — Required
Configure Black (line-length 88) and isort (profile=black) in pyproject.toml. Add pre-commit hooks so unformatted code never reaches the repo. Zero config debates.
Tools: Black, isort, pre-commit
Ruff as your single linter — Required
Ruff replaces Flake8, Pylint, and pycodestyle with a single Rust-powered tool (10-100x faster). Enable rules: E, F, W, I, N, UP, S, B, A, C4, SIM, TCH, RUF. Block merge on errors.
Tools: Ruff
Type hints everywhere + mypy strict — Required
Type hints on all function signatures and class attributes. Enable mypy --strict in CI. For ML: annotate tensor shapes in docstrings (e.g., # shape: (batch, seq_len, d_model)).
Tools: mypy, pyright
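A minimal sketch of the expected style, assuming PyTorch; the function and its dimensions are illustrative, with tensor shapes documented in the docstring as this rule asks:

```python
import torch


def attention_scores(query: torch.Tensor, key: torch.Tensor) -> torch.Tensor:
    """Compute scaled dot-product attention scores.

    Args:
        query: shape (batch, seq_len, d_model)
        key: shape (batch, seq_len, d_model)

    Returns:
        Scores, shape (batch, seq_len, seq_len)
    """
    d_model = query.size(-1)
    return (query @ key.transpose(-2, -1)) / d_model**0.5
```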
Organized imports: stdlib then third-party then local — Required
Group imports: (1) standard library, (2) third-party (numpy, torch, sklearn), (3) local project. One blank line between groups. isort handles this automatically. No wildcard imports.
Tools: isort
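The resulting layout looks roughly like this; the local module and class names are hypothetical:

```python
# 1. Standard library
import logging
from pathlib import Path

# 2. Third-party
import numpy as np
import torch
from sklearn.model_selection import train_test_split

# 3. Local project (hypothetical modules)
from my_project.data import load_transactions
from my_project.models import ChurnClassifier
```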
pyproject.toml as single config source — Recommended
Consolidate all tool configs into pyproject.toml. No scattered setup.cfg, .flake8, mypy.ini. Pin Python version with requires-python. Use hatchling, setuptools, or flit.
2. Naming & Project Structure
Predictable, searchable, self-documenting
PEP 8 naming with no exceptions — Required
snake_case for functions/variables/modules. PascalCase for classes. UPPER_SNAKE for constants. _private prefix for internal APIs. Booleans as questions: is_trained, has_converged.
Name by intent, not type — Required
Avoid: df, model, X_train. Prefer: customer_transactions, churn_classifier, training_features. Exception: short-lived loop vars and well-known ML conventions (X, y) in small scopes.
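A before/after sketch using pandas; the file and column names are made up:

```python
import pandas as pd

# Vague: the reader must guess what the frame represents.
df = pd.read_csv("transactions.csv")

# Intent-revealing: the name carries the domain meaning.
customer_transactions = pd.read_csv("transactions.csv")

# Accepted ML convention, fine in a small local scope:
X = customer_transactions.drop(columns=["churned"])
y = customer_transactions["churned"]
```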
Feature-based project layout — Recommended
src/<project>/ with subpackages by feature. Co-locate tests beside source. ML-specific: separate /notebooks, /configs, /data (gitignored), /models (gitignored), /src for production.
No magic numbers, use constants or configs — Required
Extract all hyperparameters to config files (YAML/TOML) or dataclasses. Use Hydra, OmegaConf, or Pydantic Settings for config management. Makes experiments reproducible.
Tools: Hydra, OmegaConf, Pydantic
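One lightweight version of this uses a frozen dataclass; Hydra, OmegaConf, or Pydantic Settings can populate the same structure from a YAML/TOML file. The hyperparameter names and defaults below are placeholders:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float = 3e-4
    batch_size: int = 64
    num_epochs: int = 20
    early_stopping_patience: int = 3


def train(config: TrainingConfig) -> None:
    # Every tunable comes from the config object, never from literals in the code.
    for epoch in range(config.num_epochs):
        ...
```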
Separate notebooks from production code — Required
Notebooks for exploration only, never production logic. Extract reusable code into .py modules immediately. Use nbstripout to strip outputs from committed notebooks.
Tools: nbstripout
3. Error Handling & Logging
Fail gracefully, debug quickly
Never silently swallow exceptions — Required
Every except block must log with context, re-raise, or return a meaningful error; except: pass is forbidden. For ML: catch specific failures (data loading, GPU OOM) and include enough context to reproduce them.
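A sketch of the pattern with stdlib logging; the loading function, path, and failure modes are illustrative:

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def load_training_data(path: str) -> pd.DataFrame:
    """Illustrative data-loading step; the path and format are placeholders."""
    try:
        return pd.read_parquet(path)
    except FileNotFoundError:
        # Log with enough context to reproduce, then re-raise.
        # A bare `except: pass` would hide the failure until training starts.
        logger.error("Training data not found: path=%s", path)
        raise
    except ValueError as exc:
        # Corrupt or wrong-format file: surface the cause instead of swallowing it.
        logger.error("Failed to parse training data at %s: %s", path, exc)
        raise
```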
Structured logging with structlog — Required
Use structlog or loguru instead of print(). Log as JSON in production. Include: timestamp, severity, experiment_id, model_version. Add GPU memory and training step to ML logs.
Tools: structlog, loguru
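A minimal structlog sketch; the event and field names (experiment_id, model_version, gpu_mem_gb) are examples of the fields this standard asks for:

```python
import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.JSONRenderer(),  # JSON output for production
    ]
)

log = structlog.get_logger()
log.info(
    "epoch_finished",          # event name instead of a free-form message
    experiment_id="exp-042",   # illustrative identifiers
    model_version="1.3.0",
    epoch=7,
    train_loss=0.213,
    gpu_mem_gb=11.4,
)
```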
Custom exception hierarchy — Recommended
Create: DataValidationError, ModelNotTrainedError, PipelineTimeoutError, InferenceError. Enables precise handling and better error messages. Map to HTTP codes at API boundaries.
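The hierarchy can stay small; the base-class name below is only a suggestion:

```python
class MLPipelineError(Exception):
    """Base class for all project-specific errors."""


class DataValidationError(MLPipelineError):
    """Raised when an input dataset fails schema or range checks."""


class ModelNotTrainedError(MLPipelineError):
    """Raised when predict() is called before the model has been trained."""


class InferenceError(MLPipelineError):
    """Raised when a prediction request cannot be served."""
```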
Validate inputs at boundaries with Pydantic — Required
Pydantic BaseModel for API inputs, configs, pipeline interfaces. Validate data schemas before training with pandera. Fail fast: reject bad data before a 3-hour training run.
Tools: Pydantic, pandera
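A sketch of boundary validation with Pydantic; the request model, field names, and constraints are illustrative:

```python
from pydantic import BaseModel, Field


class PredictionRequest(BaseModel):
    customer_id: str
    tenure_months: int = Field(ge=0)
    monthly_spend: float = Field(ge=0.0)


# Raises pydantic.ValidationError immediately on bad input,
# instead of failing deep inside the model code.
request = PredictionRequest(customer_id="c-123", tenure_months=14, monthly_spend=42.5)
```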
Never log secrets, PII, or model weights — Required
Sanitize logs: no API keys, user data, or raw model parameters. Be cautious with training samples containing PII, and audit log output regularly. This is a compliance requirement (GDPR, CCPA).
4. Testing & Code Review
Ship with confidence
Test behavior, not implementation — Required
Use pytest. Test what code does, not how it does it: assert that outputs are correct for given inputs, following the Arrange-Act-Assert pattern. For ML: test output shapes, prediction ranges, and preprocessing determinism.
Tools: pytest
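An Arrange-Act-Assert example; normalize_amounts is a hypothetical function under test:

```python
import numpy as np


def normalize_amounts(amounts: np.ndarray) -> np.ndarray:
    """Scale values to the [0, 1] range (illustrative function under test)."""
    return (amounts - amounts.min()) / (amounts.max() - amounts.min())


def test_normalize_amounts_maps_values_into_unit_interval() -> None:
    # Arrange
    amounts = np.array([10.0, 55.0, 100.0])
    # Act
    normalized = normalize_amounts(amounts)
    # Assert: check observable behavior, not internal steps.
    assert normalized.min() == 0.0
    assert normalized.max() == 1.0
    assert normalized.shape == amounts.shape
```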
Testing pyramid: unit, integration, E2E — Required
Many fast unit tests, some integration tests (API, DB, pipeline stages), few E2E tests (full train-to-inference). Target 70-80% coverage on business logic. @pytest.mark.slow for heavy tests.
Tools: pytest-cov
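One way to keep heavy tests out of the fast PR loop, assuming the slow marker is registered under [tool.pytest.ini_options] in pyproject.toml:

```python
import pytest


@pytest.mark.slow
def test_full_training_pipeline_end_to_end() -> None:
    # Expensive E2E check: run nightly with `pytest -m slow`,
    # exclude from the fast PR suite with `pytest -m "not slow"`.
    ...
```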
ML-specific: test pipelines and model contracts — Required
Test: data loading schema, deterministic preprocessing (set seeds), model input shapes, valid prediction ranges, saved model reload produces same output. Use fixtures for synthetic data.
Tools: pytest fixtures
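A sketch of a contract test built on a synthetic-data fixture; the model choice and dimensions are placeholders:

```python
import numpy as np
import pytest
from sklearn.linear_model import LogisticRegression


@pytest.fixture
def synthetic_features() -> tuple[np.ndarray, np.ndarray]:
    rng = np.random.default_rng(seed=0)  # deterministic synthetic data
    X = rng.normal(size=(100, 5))
    y = (X[:, 0] > 0).astype(int)
    return X, y


def test_classifier_respects_output_contract(synthetic_features) -> None:
    X, y = synthetic_features
    model = LogisticRegression().fit(X, y)
    probabilities = model.predict_proba(X)

    assert probabilities.shape == (100, 2)                       # expected output shape
    assert np.all((probabilities >= 0) & (probabilities <= 1))   # valid prediction range
```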
CI blocks merge on any failure — Required
Every PR triggers: Ruff lint, mypy check, pytest. Single failure blocks merge. Keep unit tests <5 min. Flaky tests are bugs. Use GitHub Actions or GitLab CI.
Tools: GitHub Actions
Small PRs, review for logic not style — Required
Style enforced by Black + Ruff. Humans review for: correctness, edge cases, error handling, security, performance. 1 approval required. PRs <400 lines. Use PR templates.
5. AI/ML Best Practices
Reproducible, responsible, production-ready
Pin seeds everywhere for reproducibility — Required
Set seeds: random, numpy, torch, tensorflow, PYTHONHASHSEED. Use deterministic algorithms (torch.use_deterministic_algorithms). Log full env: Python version, packages, GPU, CUDA.
Tools: random.seed, torch.manual_seed
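A typical seed-everything helper, assuming PyTorch; note that PYTHONHASHSEED only affects new processes, so export it before launching the interpreter as well:

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    # Only affects subprocesses; export PYTHONHASHSEED before launch for the main process.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trades speed for bitwise reproducibility; warn_only avoids hard errors
    # on ops that have no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)
```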
Version data, models, configs, and code — Required
DVC or MLflow for data/model versioning. Track experiments with W&B, MLflow, or Neptune. Configs alongside code (Hydra). Every experiment reproducible from commit hash + config.
Tools: DVC, MLflow, W&B, Hydra
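A minimal MLflow-style sketch of what "reproducible from commit hash + config" looks like in a training script; the parameter names, metric, and artifact path are examples:

```python
import mlflow

with mlflow.start_run():
    mlflow.log_params({"learning_rate": 3e-4, "batch_size": 64, "git_commit": "abc1234"})
    # ... training loop ...
    mlflow.log_metric("val_auc", 0.91, step=10)
    mlflow.log_artifact("configs/train.yaml")  # store the exact config file used
```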
Separate training, eval, and inference code — Required
Clean interfaces: Trainer.train(), Evaluator.evaluate(), Predictor.predict(). Each independently testable. Makes it trivial to swap models or deploy to different targets.
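A sketch of the intended separation; the class and method shapes are suggestions, not a prescribed API:

```python
from typing import Protocol

import numpy as np


class Predictor(Protocol):
    def predict(self, features: np.ndarray) -> np.ndarray: ...


class Trainer:
    def train(self, features: np.ndarray, labels: np.ndarray) -> Predictor:
        """Fit a model and return something that satisfies the Predictor contract."""
        ...


class Evaluator:
    def evaluate(self, model: Predictor, features: np.ndarray, labels: np.ndarray) -> dict[str, float]:
        """Compute metrics without touching training or serving code."""
        ...
```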
Validate data quality before and after transforms — Required
Use pandera or great_expectations for data schemas. Validate: column types, value ranges, nulls, distribution drift. Run on raw input AND after preprocessing. Fail pipeline on check failure.
Tools: pandera, great_expectations
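A pandera schema sketch; the column names, dtypes, checks, and file path are placeholders for your own data contract:

```python
import pandas as pd
import pandera as pa

transactions_schema = pa.DataFrameSchema(
    {
        "customer_id": pa.Column(str, nullable=False),
        "amount": pa.Column(float, pa.Check.ge(0.0)),
        "churned": pa.Column(int, pa.Check.isin([0, 1])),
    }
)

raw_transactions = pd.read_parquet("data/transactions.parquet")  # hypothetical path
# Validate the raw input AND the preprocessed frame; raises a SchemaError on violation.
validated = transactions_schema.validate(raw_transactions)
```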
Treat AI-generated code as untrusted input — Required
Always review Copilot/Claude output for correctness, security, standards adherence. Run full lint + type check + tests. AI does not know your architecture. Never blindly accept.
Document model cards and ethics — Recommended
Every deployed model needs a model card: intended use, limitations, training data summary, eval metrics, bias analysis, failure modes. Consider fairness metrics for user-facing models.
Tools: Model Cards, SHAP, LIME
Containerize and pin for deployment — Required
Docker for reproducible environments. Pin ALL deps with pip-compile or Poetry lock. Pin CUDA/cuDNN in Dockerfile. No 'latest' tags. Test inference in container before deploy.
Tools: Docker, pip-tools, Poetry
Summary
| Section | Total | Required | Recommended |
|---|---|---|---|
| Code Style & Formatting | 5 | 4 | 1 |
| Naming & Project Structure | 5 | 4 | 1 |
| Error Handling & Logging | 5 | 4 | 1 |
| Testing & Code Review | 5 | 5 | 0 |
| AI/ML Best Practices | 7 | 6 | 1 |
| Total | 27 | 23 | 4 |