diff --git a/README.md b/README.md
new file mode 100644
index 0000000..ffbc793
--- /dev/null
+++ b/README.md
@@ -0,0 +1,109 @@
+# transcript browser
+
+This project started as a **transcript browser** for subtitle/text files and later diverged into an attempt to index and search PDF content. The transcript side is the core that works best right now; PDF support exists but is still rough.
+
+## Current status
+
+- Personal/experimental codebase, now being published as part of a project archive.
+- Primary value: fast local search across subtitle/text-like files with quick jump-to-source actions.
+- PDF indexing was added later and is incomplete.
+- Build setup works, but is currently clunky and hard to follow.
+
+## What the app does
+
+- Loads files from a folder (`.srt`, `.txt`, `.html`, `.pdf`).
+- Indexes file content into memory.
+- Lets you search from a single query box.
+- Shows matching snippets.
+- Opens results in external tools:
+  - `.srt` -> media player at subtitle timestamp
+  - `.txt`/`.html` -> text editor
+  - `.pdf` -> PDF viewer at page
+
+## What is not great yet
+
+- **PDF parsing quality:** extraction is token-based and does not robustly handle Unicode/text layout.
+- **Build system readability:** custom two-stage build flow with many hardcoded source/library entries.
+- **Platform assumptions:** strongly Windows-oriented defaults (paths, commands, Win32 backend).
+- **Some UX/engineering TODOs remain:** error handling and configuration polish are still in progress.
+
+## Repository map (excluding external modules)
+
+- `build.bat` - bootstrap script for the custom build tool.
+- `build_file.cpp` - project-specific build recipe (compiles app and dependency objects).
+- `src/transcript_browser/main.cpp` - UI/event loop and app entry point.
+- `src/transcript_browser/loading_thread.cpp` - folder scanning + parsing jobs.
+- `src/transcript_browser/searching_thread.cpp` - asynchronous query matching.
+- `src/transcript_browser/read_srt.cpp` - SRT parsing.
+- `src/transcript_browser/read_pdf.cpp` - PDF text extraction attempt.
+- `src/transcript_browser/config.cpp` - config parsing/serialization and launch commands.
+- `src/basic/` - shared utilities (arena, arrays, filesystem/process/thread helpers).
+- `src/build_tool/` - custom build tool sources.
+
+## Build and run (current flow)
+
+This project currently expects a Windows + MSVC environment.
+
+1. Open a Developer Command Prompt (so `cl.exe` is available).
+2. From repo root, run:
+
+```bat
+build.bat
+```
+
+3. Run the built executable from `build/`:
+
+```bat
+build\transcript_browser.exe
+```
+
+Notes:
+
+- `build.bat` first builds `build/build_tool.exe` (if missing), then executes it.
+- The build tool compiles and runs `build_file.cpp` to produce `transcript_browser.exe`.
+- Build outputs and object files are placed in `build/`.
+
+## Runtime usage
+
+- Start the app.
+- In the input field, load a folder with:
+
+```text
+read=C:/path/to/folder
+```
+
+- Press Enter to enqueue parsing.
+- Type any query to search loaded content.
+- Use:
+  - `F1` to toggle loaded files view
+  - `F2` to edit config commands
+
+## Configuration
+
+The app stores config next to the executable as `transcript_browser.config`.
+
+Keys:
+
+- `SRTCommand`
+- `PDFCommand`
+- `TXTCommand`
+- `ReadOnStart`
+
+Supported placeholders used in commands include:
+
+- `{video}`
+- `{time_in_seconds}`
+- `{file}`
+- `{page}`
+- `{line}`
+
+If a path contains spaces, wrap it in quotes.
+
+## Build-system cleanup ideas
+
+If this project gets another iteration, high-impact cleanup would be:
+
+1. Replace or simplify the custom build chain (e.g., CMake/Meson or a smaller single-step script).
+2. Separate third-party dependency build concerns from app build logic.
+3. Remove hardcoded absolute defaults and make platform-specific commands explicit in config/docs.
+4. Add a minimal regression test path for parsing/search behavior.