# transcript browser

This project started as a **transcript browser** for subtitle/text files and later diverged into an attempt to index and search PDF content. The transcript side is the core that works best right now; PDF support exists but is still rough.

## Current status

- Personal/experimental codebase, now being published as part of a project archive.
- Primary value: fast local search across subtitle/text-like files with quick jump-to-source actions.
- PDF indexing was added later and is incomplete.
- Build setup works, but is currently clunky and hard to follow.

## What the app does

- Loads files from a folder (`.srt`, `.txt`, `.html`, `.pdf`).
- Indexes file content into memory.
- Lets you search from a single query box.
- Shows matching snippets.
- Opens results in external tools:
  - `.srt` -> media player at subtitle timestamp
  - `.txt`/`.html` -> text editor
  - `.pdf` -> PDF viewer at page

## What is not great yet

- **PDF parsing quality:** extraction is token-based and does not robustly handle Unicode/text layout.
- **Build system readability:** custom two-stage build flow with many hardcoded source/library entries.
- **Platform assumptions:** strongly Windows-oriented defaults (paths, commands, Win32 backend).
- **Some UX/engineering TODOs remain:** error handling and configuration polish are still in progress.

## Repository map (excluding external modules)

- `build.bat` - bootstrap script for the custom build tool.
- `build_file.cpp` - project-specific build recipe (compiles app and dependency objects).
- `src/transcript_browser/main.cpp` - UI/event loop and app entry point.
- `src/transcript_browser/loading_thread.cpp` - folder scanning + parsing jobs.
- `src/transcript_browser/searching_thread.cpp` - asynchronous query matching.
- `src/transcript_browser/read_srt.cpp` - SRT parsing.
- `src/transcript_browser/read_pdf.cpp` - PDF text extraction attempt.
- `src/transcript_browser/config.cpp` - config parsing/serialization and launch commands.
- `src/basic/` - shared utilities (arena, arrays, filesystem/process/thread helpers).
- `src/build_tool/` - custom build tool sources.

## Build and run (current flow)

This project currently expects a Windows + MSVC environment.

1. Open a Developer Command Prompt (so `cl.exe` is available).
2. From repo root, run:

   ```bat
   build.bat
   ```

3. Run the built executable from `build/`:

   ```bat
   build\transcript_browser.exe
   ```

Notes:

- `build.bat` first builds `build/build_tool.exe` (if missing), then executes it.
- The build tool compiles and runs `build_file.cpp` to produce `transcript_browser.exe`.
- Build outputs and object files are placed in `build/`.

## Runtime usage

- Start the app.
- In the input field, load a folder with:

  ```text
  read=C:/path/to/folder
  ```

- Press Enter to enqueue parsing.
- Type any query to search loaded content.
- Use:
  - `F1` to toggle loaded files view
  - `F2` to edit config commands

## Configuration

The app stores config next to the executable as `transcript_browser.config`.

Keys:

- `SRTCommand`
- `PDFCommand`
- `TXTCommand`
- `ReadOnStart`

Supported placeholders used in commands include:

- `{video}`
- `{time_in_seconds}`
- `{file}`
- `{page}`
- `{line}`

If a path contains spaces, wrap it in quotes.
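
To illustrate how placeholders in a command might be filled in before launching an external tool, here is a minimal C++ sketch. It is not taken from `config.cpp`: the function name `expand_placeholders`, the map-based interface, and the choice to leave unknown placeholders untouched are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not the project's actual code): replaces each
// "{name}" in a command template with the matching entry from `values`.
// Unknown placeholders and unmatched '{' are copied through unchanged.
std::string expand_placeholders(const std::string& command_template,
                                const std::map<std::string, std::string>& values) {
    std::string result;
    std::size_t pos = 0;
    while (pos < command_template.size()) {
        std::size_t open = command_template.find('{', pos);
        if (open == std::string::npos) {
            // No more placeholders: copy the remainder and stop.
            result.append(command_template, pos, std::string::npos);
            break;
        }
        // Copy the literal text before the '{'.
        result.append(command_template, pos, open - pos);
        std::size_t close = command_template.find('}', open + 1);
        if (close == std::string::npos) {
            // Unmatched '{': keep the rest of the template as-is.
            result.append(command_template, open, std::string::npos);
            break;
        }
        std::string name = command_template.substr(open + 1, close - open - 1);
        auto it = values.find(name);
        if (it != values.end()) {
            result += it->second;  // known placeholder: substitute its value
        } else {
            // Unknown placeholder: keep it, braces included.
            result.append(command_template, open, close - open + 1);
        }
        pos = close + 1;
    }
    return result;
}
```

With a hypothetical template such as `player --start={time_in_seconds} "{video}"`, passing `time_in_seconds` and `video` values would yield a command line in which the quoted path survives spaces, which is why the quoting advice above matters.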