transcript browser
This project started as a transcript browser for subtitle/text files and later diverged into an attempt to index and search PDF content. The transcript side is the core that works best right now; PDF support exists but is still rough.
Current status
- Personal/experimental codebase, now being published as part of a project archive.
- Primary value: fast local search across subtitle/text-like files with quick jump-to-source actions.
- PDF indexing was added later and is incomplete.
- Build setup works, but is currently clunky and hard to follow.
What the app does
- Loads files from a folder (.srt, .txt, .html, .pdf).
- Indexes file content into memory.
- Lets you search from a single query box.
- Shows matching snippets.
- Opens results in external tools:
  - .srt -> media player at subtitle timestamp
  - .txt/.html -> text editor
  - .pdf -> PDF viewer at page
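The index, search, and snippet steps above can be sketched roughly as follows. This is a minimal illustration that assumes a line-based in-memory index and case-insensitive substring matching; `IndexedLine`, `Match`, `search_index`, and `make_snippet` are hypothetical names, not taken from the project's sources, and the actual matcher in searching_thread.cpp may work differently.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: one line of text from a loaded file.
struct IndexedLine {
    std::string file;  // source file path
    std::size_t line;  // 1-based line number within the file
    std::string text;  // line content kept in memory
};

// Hypothetical search hit: which indexed line matched, and where.
struct Match {
    std::size_t index;   // position in the index vector
    std::size_t offset;  // byte offset of the match within the line
};

// ASCII-only lowercase copy (assumption: no Unicode case folding).
static std::string to_lower_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Linear scan over the index; reports the first match in each line.
std::vector<Match> search_index(const std::vector<IndexedLine>& index,
                                const std::string& query) {
    std::vector<Match> hits;
    if (query.empty()) return hits;
    const std::string needle = to_lower_ascii(query);
    for (std::size_t i = 0; i < index.size(); ++i) {
        const std::size_t pos = to_lower_ascii(index[i].text).find(needle);
        if (pos != std::string::npos) hits.push_back({i, pos});
    }
    return hits;
}

// Cut a snippet of up to `context` bytes on each side of the match.
std::string make_snippet(const std::string& text, std::size_t offset,
                         std::size_t length, std::size_t context) {
    const std::size_t begin = offset > context ? offset - context : 0;
    const std::size_t end = std::min(text.size(), offset + length + context);
    return text.substr(begin, end - begin);
}
```

Each `Match` keeps the index position, so a caller could look up the `file` and `line` fields to build a jump-to-source action.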
What is not great yet
- PDF parsing quality: extraction is token-based and does not robustly handle Unicode/text layout.
- Build system readability: custom two-stage build flow with many hardcoded source/library entries.
- Platform assumptions: strongly Windows-oriented defaults (paths, commands, Win32 backend).
- Some UX/engineering TODOs remain: error handling and configuration polish are still in progress.
Repository map (excluding external modules)
- build.bat - bootstrap script for the custom build tool.
- build_file.cpp - project-specific build recipe (compiles app and dependency objects).
- src/transcript_browser/main.cpp - UI/event loop and app entry point.
- src/transcript_browser/loading_thread.cpp - folder scanning + parsing jobs.
- src/transcript_browser/searching_thread.cpp - asynchronous query matching.
- src/transcript_browser/read_srt.cpp - SRT parsing.
- src/transcript_browser/read_pdf.cpp - PDF text extraction attempt.
- src/transcript_browser/config.cpp - config parsing/serialization and launch commands.
- src/basic/ - shared utilities (arena, arrays, filesystem/process/thread helpers).
- src/build_tool/ - custom build tool sources.
Build and run (current flow)
This project currently expects a Windows + MSVC environment.
- Open a Developer Command Prompt (so cl.exe is available).
- From repo root, run:
build.bat
- Run the built executable from build/:
build\transcript_browser.exe
Notes:
- build.bat first builds build/build_tool.exe (if missing), then executes it.
- The build tool compiles and runs build_file.cpp to produce transcript_browser.exe.
- Build outputs and object files are placed in build/.
Runtime usage
- Start the app.
- In the input field, load a folder with:
read=C:/path/to/folder
- Press Enter to enqueue parsing.
- Type any query to search loaded content.
- Use:
  - F1 to toggle loaded files view
  - F2 to edit config commands
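One way the single input field could tell a `read=` load command apart from a search query is sketched below. The names `InputKind`, `ParsedInput`, and `parse_input` are hypothetical, and the sketch assumes the prefix is matched case-sensitively after trimming whitespace; the real handling in main.cpp is not documented here and may differ.

```cpp
#include <cstddef>
#include <string>

// Hypothetical classification of what was typed into the input field.
enum class InputKind { Empty, LoadFolder, Query };

struct ParsedInput {
    InputKind kind;
    std::string value;  // folder path for LoadFolder, query text for Query
};

// Strip leading and trailing ASCII whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Input starting with "read=" is treated as a folder load request;
// anything else that is non-empty is treated as a search query.
// Note: "read=" with nothing after it yields LoadFolder with an empty path,
// which a caller would need to reject.
ParsedInput parse_input(const std::string& raw) {
    const std::string text = trim(raw);
    if (text.empty()) return {InputKind::Empty, ""};
    const std::string prefix = "read=";
    if (text.compare(0, prefix.size(), prefix) == 0) {
        return {InputKind::LoadFolder, trim(text.substr(prefix.size()))};
    }
    return {InputKind::Query, text};
}
```

With this sketch, `parse_input("read=C:/path/to/folder")` yields a `LoadFolder` result carrying the path, while any other non-empty text is passed through as a query.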
Configuration
The app stores config next to the executable as transcript_browser.config.
Keys:
- SRTCommand
- PDFCommand
- TXTCommand
- ReadOnStart
Supported placeholders used in commands include:
- {video}
- {time_in_seconds}
- {file}
- {page}
- {line}
If a path contains spaces, wrap it in quotes.
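To illustrate how placeholders like these might be expanded into a launch command, here is a minimal sketch. `quote_if_needed` and `expand_command` are hypothetical helpers, not functions from config.cpp, and the sketch assumes that unknown placeholders are left untouched. Whether the app quotes substituted values itself is not stated above, so quoting is shown as a separate, explicit step.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Wrap a value in double quotes if it contains a space and is not
// already quoted (assumption: no escaping of embedded quotes).
std::string quote_if_needed(const std::string& value) {
    const bool has_space = value.find(' ') != std::string::npos;
    const bool quoted = value.size() >= 2 && value.front() == '"' &&
                        value.back() == '"';
    return (has_space && !quoted) ? "\"" + value + "\"" : value;
}

// Replace each {name} in the template with its value from the map.
// Placeholders with no entry, and an unterminated '{', are copied verbatim.
std::string expand_command(const std::string& tmpl,
                           const std::map<std::string, std::string>& values) {
    std::string out;
    std::size_t pos = 0;
    while (pos < tmpl.size()) {
        const std::size_t open = tmpl.find('{', pos);
        if (open == std::string::npos) {
            out.append(tmpl, pos, std::string::npos);
            break;
        }
        const std::size_t close = tmpl.find('}', open + 1);
        if (close == std::string::npos) {
            out.append(tmpl, pos, std::string::npos);
            break;
        }
        out.append(tmpl, pos, open - pos);  // literal text before '{'
        const std::string name = tmpl.substr(open + 1, close - open - 1);
        const auto it = values.find(name);
        if (it != values.end()) {
            out += it->second;
        } else {
            out.append(tmpl, open, close - open + 1);  // keep "{name}" as-is
        }
        pos = close + 1;
    }
    return out;
}
```

For example, with a made-up template `player.exe {video} {time_in_seconds}` and a video path that contains a space, passing the path through `quote_if_needed` before expansion keeps it as a single argument on the resulting command line.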