Add README
109
README.md
Normal file
@@ -0,0 +1,109 @@
# transcript browser

This project started as a **transcript browser** for subtitle/text files and later diverged into an attempt to index and search PDF content. The transcript side is the core that works best right now; PDF support exists but is still rough.

## Current status

- Personal/experimental codebase, now being published as part of a project archive.
- Primary value: fast local search across subtitle/text-like files with quick jump-to-source actions.
- PDF indexing was added later and is incomplete.
- Build setup works, but is currently clunky and hard to follow.

## What the app does

- Loads files from a folder (`.srt`, `.txt`, `.html`, `.pdf`).
- Indexes file content into memory.
- Lets you search from a single query box.
- Shows matching snippets.
- Opens results in external tools:
  - `.srt` -> media player at subtitle timestamp
  - `.txt`/`.html` -> text editor
  - `.pdf` -> PDF viewer at page
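
The extension-to-tool mapping above could be sketched roughly as follows. This is an illustration only, not the code in `src/transcript_browser/`: `FileKind`, `lowercase_extension`, and `classify` are hypothetical names, and the case-insensitive matching is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical classification of the four extensions listed above.
enum class FileKind { Subtitle, Text, Pdf, Unsupported };

// Returns the lowercase extension of `path` including the dot, or "" if none.
std::string lowercase_extension(const std::string& path) {
    const std::size_t slash = path.find_last_of("/\\");
    const std::size_t dot = path.find_last_of('.');
    // A dot that belongs to a directory name is not a file extension.
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash)) {
        return "";
    }
    std::string ext = path.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Maps a path to the kind of external tool that would open it.
FileKind classify(const std::string& path) {
    const std::string ext = lowercase_extension(path);
    if (ext == ".srt") return FileKind::Subtitle;                // media player at timestamp
    if (ext == ".txt" || ext == ".html") return FileKind::Text;  // text editor
    if (ext == ".pdf") return FileKind::Pdf;                     // PDF viewer at page
    return FileKind::Unsupported;
}
```

With this sketch, `classify("C:/talks/ep01.SRT")` yields `FileKind::Subtitle`, and a path without an extension yields `FileKind::Unsupported`.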

## What is not great yet

- **PDF parsing quality:** extraction is token-based and does not robustly handle Unicode/text layout.
- **Build system readability:** custom two-stage build flow with many hardcoded source/library entries.
- **Platform assumptions:** strongly Windows-oriented defaults (paths, commands, Win32 backend).
- **Some UX/engineering TODOs remain:** error handling and configuration polish are still in progress.

## Repository map (excluding external modules)

- `build.bat` - bootstrap script for the custom build tool.
- `build_file.cpp` - project-specific build recipe (compiles app and dependency objects).
- `src/transcript_browser/main.cpp` - UI/event loop and app entry point.
- `src/transcript_browser/loading_thread.cpp` - folder scanning + parsing jobs.
- `src/transcript_browser/searching_thread.cpp` - asynchronous query matching.
- `src/transcript_browser/read_srt.cpp` - SRT parsing.
- `src/transcript_browser/read_pdf.cpp` - PDF text extraction attempt.
- `src/transcript_browser/config.cpp` - config parsing/serialization and launch commands.
- `src/basic/` - shared utilities (arena, arrays, filesystem/process/thread helpers).
- `src/build_tool/` - custom build tool sources.
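
To give a feel for the kind of work SRT parsing involves, here is a minimal sketch that converts one SRT timestamp (`HH:MM:SS,mmm`) into seconds, which is the sort of value a media player needs to jump to a subtitle. It is not taken from `read_srt.cpp`; the function name `parse_srt_timestamp` and its signature are assumptions made for illustration.

```cpp
#include <cstdio>
#include <string>

// Parses an SRT timestamp such as "00:01:02,500" into seconds.
// Returns false (leaving `out_seconds` untouched) if the text does not
// start with four integers in the HH:MM:SS,mmm layout or a field is out of range.
bool parse_srt_timestamp(const std::string& text, double& out_seconds) {
    int hours = 0, minutes = 0, seconds = 0, millis = 0;
    if (std::sscanf(text.c_str(), "%d:%d:%d,%d",
                    &hours, &minutes, &seconds, &millis) != 4) {
        return false;
    }
    if (hours < 0 || minutes < 0 || minutes > 59 ||
        seconds < 0 || seconds > 59 || millis < 0 || millis > 999) {
        return false;
    }
    out_seconds = hours * 3600.0 + minutes * 60.0 + seconds + millis / 1000.0;
    return true;
}
```

A full parser would also need to read the cue index line, the `-->` separator between start and end times, and the subtitle text that follows; those parts are omitted here.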

## Build and run (current flow)

This project currently expects a Windows + MSVC environment.

1. Open a Developer Command Prompt (so `cl.exe` is available).
2. From repo root, run:

```bat
build.bat
```

3. Run the built executable from `build/`:

```bat
build\transcript_browser.exe
```

Notes:

- `build.bat` first builds `build/build_tool.exe` (if missing), then executes it.
- The build tool compiles and runs `build_file.cpp` to produce `transcript_browser.exe`.
- Build outputs and object files are placed in `build/`.

## Runtime usage

- Start the app.
- In the input field, load a folder with:

```text
read=C:/path/to/folder
```

- Press Enter to enqueue parsing.
- Type any query to search loaded content.
- Use:
  - `F1` to toggle loaded files view
  - `F2` to edit config commands
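
Since the same input field accepts both the `read=` command and plain queries, the distinction might be made along these lines. This is a hedged sketch, not the logic in `main.cpp`: `InputAction` and `interpret_input` are hypothetical names, and treating everything without the `read=` prefix as a query is an assumption.

```cpp
#include <string>

// Hypothetical result of interpreting one line from the input field.
struct InputAction {
    bool is_read;       // true for "read=<folder>", false for a search query
    std::string value;  // folder path or query text
};

// Splits off the "read=" prefix if present; otherwise returns the line as a query.
InputAction interpret_input(const std::string& line) {
    const std::string prefix = "read=";
    if (line.compare(0, prefix.size(), prefix) == 0) {
        return {true, line.substr(prefix.size())};
    }
    return {false, line};
}
```

For example, `interpret_input("read=C:/path/to/folder")` gives `is_read == true` with `value == "C:/path/to/folder"`, while `interpret_input("hello")` is returned unchanged as a query.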

## Configuration

The app stores config next to the executable as `transcript_browser.config`.

Keys:

- `SRTCommand`
- `PDFCommand`
- `TXTCommand`
- `ReadOnStart`

Supported placeholders used in commands include:

- `{video}`
- `{time_in_seconds}`
- `{file}`
- `{page}`
- `{line}`

If a path contains spaces, wrap it in quotes.
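
Placeholder expansion of this kind can be sketched as a simple find-and-replace over the command string. The code below is an illustration under assumptions, not the implementation in `config.cpp`: `expand_placeholders` is a hypothetical name, and leaving unknown placeholders untouched is a design choice made here.

```cpp
#include <map>
#include <string>

// Replaces every "{name}" in `command` whose name is a key in `values`.
// Placeholders without a matching key are left as they are.
std::string expand_placeholders(std::string command,
                                const std::map<std::string, std::string>& values) {
    for (const auto& entry : values) {
        const std::string token = "{" + entry.first + "}";
        std::size_t pos = 0;
        while ((pos = command.find(token, pos)) != std::string::npos) {
            command.replace(pos, token.size(), entry.second);
            pos += entry.second.size();  // continue after the inserted text
        }
    }
    return command;
}
```

Given a made-up command such as `viewer.exe --page {page} "{file}"` (the viewer name and flag are placeholders, not a real tool), supplying `page = 12` and `file = C:/docs/my notes.pdf` produces `viewer.exe --page 12 "C:/docs/my notes.pdf"`. The quotes around `{file}` are what keep the path with a space intact.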

## Build-system cleanup ideas

If this project gets another iteration, high-impact cleanup would be:

1. Replace or simplify the custom build chain (e.g., CMake/Meson or a smaller single-step script).
2. Separate third-party dependency build concerns from app build logic.
3. Remove hardcoded absolute defaults and make platform-specific commands explicit in config/docs.
4. Add a minimal regression test path for parsing/search behavior.
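
As a starting point for the regression tests mentioned in item 4, a small pure function for search behavior is easy to pin down with assertions. The sketch below is hypothetical and is not the matching logic in `searching_thread.cpp`: `to_lower_ascii` and `first_snippet` are invented names, and ASCII-only case folding is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lowercases ASCII letters; in the default "C" locale other bytes pass through unchanged.
std::string to_lower_ascii(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Returns the first case-insensitive match of `query` in `content`, padded by up to
// `context` characters on each side, or "" when the query is empty or not found.
std::string first_snippet(const std::string& content, const std::string& query,
                          std::size_t context) {
    if (query.empty()) return "";
    const std::size_t hit = to_lower_ascii(content).find(to_lower_ascii(query));
    if (hit == std::string::npos) return "";
    const std::size_t begin = hit > context ? hit - context : 0;
    const std::size_t end = std::min(content.size(), hit + query.size() + context);
    return content.substr(begin, end - begin);
}
```

A regression test could then assert, for instance, that searching `"The quick brown fox"` for `"QUICK"` with four characters of context returns `"The quick bro"`, and that a query with no match returns an empty string.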