← Back
Release History

Changelog

A record of improvements, fixes, and new features — newest first.

0.6.1

2026-04-16

Fixed

  • Tibetan syllable parser falsely flagged **འོ** (achung + o-vowel) on valid forms such as **བྱའོ**, **པོའོ**, and **མཐོའོ**. The **འི** path is now generalized: shared `ACHUNG_SUFFIX_VOWELS` (ི, ོ), one detect/parse pipeline, and completeness rules aligned so two vowel marks are allowed when the second sits on suffix **འ**.
  • Spelling rules page: **རུ་** locative example updated from **རྒྱལ་ཁབ་རུ་** to **བཀའ་རུ་**. A **བ** suffix takes **ཏུ་** (e.g. **རྒྱལ་ཁབ་ཏུ་** “to/in the kingdom”), not **རུ་**, which only follows **འ** or no suffix—consistent with the particle rules in the spellchecker.

0.6.0

2026-04-15

Fixed

  • Docker Compose startup failure caused by `confidence_score` index referencing a non-existent column in `schema.sql` and `001_add_spelling_reference.sql`

Added

  • `TabButton`, `PDFDownloadLink`, and `ResetLink` extracted as shared components
  • `Button` component gains `disabled` and `type` props

Changed

  • `TextInput` Clear and Submit buttons now use the shared `Button` component
  • `spellcheck.tsx` `useEffect`-to-reset-state anti-pattern replaced with `checkedText`/`isStale` derived flag
  • `PDFUpload` internal `selectedFile` state removed; parent-owned state passed as prop
  • `ResultLink` (spellcheck.tsx) and `DownloadButton` (JobStatus.tsx) consolidated into shared `PDFDownloadLink`

0.5.0

2026-04-15

Added

  • Upload size cap (50 MB) on PDF endpoints; returns HTTP 413 on oversized files
  • `max_length=100_000` on text spellcheck input to prevent event-loop blocking
  • Async page limit restored: PDFs ≤ 15 pages sync, larger PDFs routed to background queue

Changed

  • `TibetanSpellChecker` and `DictionaryService` now shared as module-level singletons — DB corpus loaded once at startup instead of per request
  • CPU-bound PDF processing steps (`extract_pdf`, `annotate_pdf`, `build_docx`, `check_text`) offloaded to thread pool via `asyncio.to_thread` to keep the event loop unblocked

Removed

  • Unused `checkHealth` function and `PDFResultURLs`, `SpellCheckRequest` interfaces from frontend API client
  • Unused `EmailStr` import from spellcheck schemas; `email-validator` dependency dropped
  • Unused `pyctcdecode` and `thin-plate-spline` dependencies
  • Stale archive docs: `REFACTORING_SUMMARY.md`, `MIGRATION_COMPLETE.md`, `SETUP_COMPLETE.md`, `RTF_QUICK_START.md`
  • Stale conversational artifact from `WORD_CORPUS_PLAN.md`

0.4.0

2026-04-14

Added

  • Dictionary-based word lookup (Phase 2) — structurally valid syllables are now checked against a corpus of attested Tibetan words and flagged as `unknown_word` if absent, surfaced as yellow warnings distinct from red structural errors
  • `spelling_reference` database table with normalized lookup, source tracking, dialect and Sanskrit flags, and four indexes for fast access
  • `DictionaryService` — loads corpus at startup, extracts every valid syllable from multi-syllable entries, O(1) frozenset lookup; degrades silently when no database is configured
  • Corpus build scripts: `build_corpus.py` with scaffolded extractors for THDL, Rangjung Yeshe, and Monlam; `seed_corpus.py` with ~55 hand-verified words for pipeline testing
  • `corpus_hit` field on Phase 1 structural errors — flags when a flagged syllable is nonetheless attested in the corpus, useful for monitoring potential false positives over time
  • Structured JSON logging on every spell check (`spellcheck_result` tag) with aggregate counts for analysis via `grep | jq`
  • `GET /api/v1/corpus/stats` endpoint

0.3.0

2026-04-01

Added

  • PDF spellcheck pipeline — upload a Tibetan PDF, get back an annotated DOCX with each structurally invalid syllable marked; supports both digital PDFs and scanned documents via OCR fallback
  • Async job support for PDF processing — upload returns a job ID; frontend polls until complete
  • BDRC ONNX OCR engine for scanned PDFs, replacing Tesseract; model files excluded from version control
  • Broken-CMap detection for digital PDFs — automatically falls back to OCR when the PDF's character map is corrupt or missing (ADR-015)
  • OCR/copy-paste parity tests — fixture-based regression tests verifying that the OCR and digital extraction paths produce equivalent spellcheck results
  • PDF spell check UI — upload form, job progress polling, email capture, results display
  • Shared `PageTitle` and `SectionHeading` typography components adopted across static pages

Changed

  • Particle validation tightened: ས, ར, ཞིག, and ཅིག marked as lenient to reduce false positives; ཏུ handling improved; locative suggestions refined
  • PDF annotation and DOCX export refactored to syllable granularity
  • Digital PDF extraction migrated from pdfplumber to fitz for better CMap support

Removed

  • JSON download endpoint removed from PDF spellcheck pipeline

0.2.0

2026-03-12

Added

  • Particle spelling rules — context-aware validation of relational, agentive, locative, and indefinite article particles against the suffix of the preceding word, with suggestions for the correct form
  • Excepted words — ནའང་ and ལའང་ are now recognized as grammatically valid by convention and no longer flagged
  • Tibetan numeral syllables (U+0F20–U+0F33) are now silently skipped; numbers no longer produce false positives or break particle context
  • Tibetan punctuation and mark characters (yig mgo openers ༄ ༅, decorative shad variants ༑ ༒, gter tsheg ༔, etc.) are now silently skipped
  • Spelling Rules page — learner-facing reference documenting syllable anatomy, particle rules by category, and an explanation of each error type
  • Changelog page — release notes published on the frontend at /changelog

Fixed

  • ཧཧིབ་ was not being flagged as invalid; ཧ (ha) cannot act as a prefix and is now correctly detected as an unparsed character
  • "Relational" used throughout in place of "genitive" for the bdag gi sgra particle category, matching standard English terminology for this community

Changed

  • Subscript validation re-enabled after confirming the rule data is correct and no false positives occur on attested Tibetan words

0.1.0

2026-02-09

Added

  • Tibetan syllable spellchecker — validates the structural correctness of Tibetan syllables against classical spelling rules
  • Validates all five prefix consonants (ག་ ད་ བ་ མ་ ར་) against valid root consonant combinations
  • Validates superscripts (ར་ ལ་ ས་), subscripts (ྱ ྲ ླ ྭ), suffixes, and post-suffixes
  • Detects impossible syllable structures: subscripts appearing after vowels or suffixes, multiple vowel groups, unusual mark positions
  • Detects syllables that exceed the maximum valid length
  • Web interface for checking Tibetan text inline
  • Wylie transliteration shown alongside each flagged syllable for reference

Resources for the Orgyen Khandroling Sangha

© 2026 Butter Dots Dot Com