Krzosa Karol 0f443282a9 Update README
2026-03-19 23:40:11 +01:00

transcript browser

This project started as a transcript browser for subtitle/text files and later diverged into an attempt to index and search PDF content. The transcript side is the most mature part and works best right now; PDF support exists but is still rough.

Current status

  • Personal/experimental codebase, now being published as part of a project archive.
  • Primary value: fast local search across subtitle/text-like files with quick jump-to-source actions.
  • PDF indexing was added later and is incomplete.
  • Build setup works, but is currently clunky and hard to follow.

What the app does

  • Loads files from a folder (.srt, .txt, .html, .pdf).
  • Indexes file content into memory.
  • Lets you search from a single query box.
  • Shows matching snippets.
  • Opens results in external tools:
    • .srt -> media player at subtitle timestamp
    • .txt/.html -> text editor
    • .pdf -> PDF viewer at page
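The README does not describe the in-memory index or the matching rules. As a rough illustration of "index into memory, search, show snippets", here is a minimal sketch. It assumes each loaded file is split into text entries and matched by case-insensitive substring search. `Entry`, `Match`, and `search` are hypothetical names, not the project's code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical searchable unit: e.g. one subtitle cue or one PDF page.
struct Entry {
    std::string file;
    std::string text;
};

// A hit plus a short excerpt of surrounding text.
struct Match {
    const Entry* entry;
    std::string snippet;
};

// Case-insensitive ASCII character comparison used by std::search.
static bool iequal(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Returns every entry containing `query` (first occurrence only), with up to
// `context` characters on each side of the hit as the snippet.
std::vector<Match> search(const std::vector<Entry>& entries,
                          const std::string& query, size_t context = 20) {
    std::vector<Match> out;
    if (query.empty()) return out;
    for (const Entry& e : entries) {
        auto it = std::search(e.text.begin(), e.text.end(),
                              query.begin(), query.end(), iequal);
        if (it == e.text.end()) continue;
        size_t pos = static_cast<size_t>(it - e.text.begin());
        size_t start = pos > context ? pos - context : 0;
        size_t end = std::min(e.text.size(), pos + query.size() + context);
        out.push_back({&e, e.text.substr(start, end - start)});
    }
    return out;
}
```

A linear scan like this is simple and is usually fast enough for a local folder of transcripts. An inverted index would only pay off for much larger corpora.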

What is not great yet

  • PDF parsing quality: extraction is token-based and does not robustly handle Unicode/text layout.
  • Build system readability: custom two-stage build flow with many hardcoded source/library entries.
  • Platform assumptions: strongly Windows-oriented defaults (paths, commands, Win32 backend).
  • Some UX/engineering TODOs remain: error handling and configuration polish are still in progress.
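To show why token-based PDF extraction struggles with Unicode and layout, here is an illustrative sketch. It is not the contents of read_pdf.cpp. It assumes an already-decompressed page content stream. It collects literal strings shown by the `Tj`/`TJ` operators, skips hex strings without decoding them (hex strings are common with Unicode/CID fonts), ignores font encodings, and drops positioning operators. As a result, words can run together or disappear. `read_literal` and `extract_text` are hypothetical names.

```cpp
#include <cctype>
#include <string>

// Reads a PDF literal string starting at s[i] == '(' and advances i past it.
// Handles nested parentheses; "\n" becomes a newline and any other escape
// keeps the escaped character as-is (octal escapes are not decoded).
static std::string read_literal(const std::string& s, size_t& i) {
    std::string out;
    int depth = 0;
    for (; i < s.size(); ++i) {
        char c = s[i];
        if (c == '\\' && i + 1 < s.size()) {
            char n = s[++i];
            out += (n == 'n') ? '\n' : n;
        } else if (c == '(') {
            if (depth++ > 0) out += c;   // keep nested parentheses
        } else if (c == ')') {
            if (--depth == 0) { ++i; break; }
            out += c;
        } else {
            out += c;
        }
    }
    return out;
}

// Walks a content stream and concatenates the text shown by Tj/TJ.
std::string extract_text(const std::string& s) {
    std::string text, pending;
    size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isspace(c) || c == '[' || c == ']') {
            ++i;                                   // separators / TJ arrays
        } else if (c == '(') {
            pending += read_literal(s, i);         // operand for a later operator
        } else if (c == '<') {
            while (i < s.size() && s[i] != '>') ++i;  // hex string: skipped, not decoded
            ++i;
        } else if (c == '/' || c == '-' || c == '.' || std::isdigit(c)) {
            ++i;                                   // names and numbers: ignored operands
            while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i])) &&
                   s[i] != '(' && s[i] != '[' && s[i] != ']' && s[i] != '/' && s[i] != '<')
                ++i;
        } else {
            size_t start = i;                      // operator keyword, e.g. BT, Td, Tj
            while (i < s.size() && std::isalpha(static_cast<unsigned char>(s[i]))) ++i;
            if (i == start) { ++i; continue; }     // unhandled delimiter
            std::string op = s.substr(start, i - start);
            if (op == "Tj" || op == "TJ") text += pending;  // no spacing or layout added
            pending.clear();
        }
    }
    return text;
}
```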

Repository map (excluding external modules)

  • build.bat - bootstrap script for the custom build tool.
  • build_file.cpp - project-specific build recipe (compiles app and dependency objects).
  • src/transcript_browser/main.cpp - UI/event loop and app entry point.
  • src/transcript_browser/loading_thread.cpp - folder scanning + parsing jobs.
  • src/transcript_browser/searching_thread.cpp - asynchronous query matching.
  • src/transcript_browser/read_srt.cpp - SRT parsing.
  • src/transcript_browser/read_pdf.cpp - PDF text extraction attempt.
  • src/transcript_browser/config.cpp - config parsing/serialization and launch commands.
  • src/basic/ - shared utilities (arena, arrays, filesystem/process/thread helpers).
  • src/build_tool/ - custom build tool sources.
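The SRT side feeds the "media player at subtitle timestamp" action. A minimal sketch of that kind of parsing is shown below. It is not taken from read_srt.cpp. It assumes the standard SubRip layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text lines, blank separator) and keeps only each cue's start time in seconds. `Cue`, `parse_timestamp`, and `parse_srt` are hypothetical names.

```cpp
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

struct Cue {
    double start_seconds;  // a value like this could fill {time_in_seconds}
    std::string text;      // cue lines joined with spaces
};

// Parses "HH:MM:SS,mmm" into seconds; returns false on malformed input.
static bool parse_timestamp(const std::string& s, double& seconds) {
    int h, m, sec, ms;
    if (std::sscanf(s.c_str(), "%d:%d:%d,%d", &h, &m, &sec, &ms) != 4) return false;
    seconds = h * 3600.0 + m * 60.0 + sec + ms / 1000.0;
    return true;
}

std::vector<Cue> parse_srt(const std::string& data) {
    std::vector<Cue> cues;
    std::istringstream in(data);
    std::string line;
    Cue cur{};
    bool in_cue = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF files
        if (line.empty()) {                       // blank line ends a cue
            if (in_cue) cues.push_back(cur);
            cur = Cue{};
            in_cue = false;
        } else if (!in_cue && line.find("-->") != std::string::npos) {
            in_cue = parse_timestamp(line.substr(0, line.find("-->")), cur.start_seconds);
        } else if (in_cue) {
            if (!cur.text.empty()) cur.text += ' ';
            cur.text += line;
        }
        // Any other line (the numeric cue index) is skipped.
    }
    if (in_cue) cues.push_back(cur);  // last cue may lack a trailing blank line
    return cues;
}
```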

Build and run (current flow)

This project currently expects a Windows + MSVC environment.

  1. Open a Developer Command Prompt (so cl.exe is available).
  2. From the repo root, run:
build.bat
  3. Run the built executable from build/:
build\transcript_browser.exe

Notes:

  • build.bat first builds build/build_tool.exe (if missing), then executes it.
  • The build tool compiles and runs build_file.cpp to produce transcript_browser.exe.
  • Build outputs and object files are placed in build/.

Runtime usage

  • Start the app.
  • In the input field, load a folder with:
read=C:/path/to/folder
  • Press Enter to enqueue parsing.
  • Type any query to search loaded content.
  • Use:
    • F1 to toggle the loaded-files view
    • F2 to edit the config commands
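The single input field serves two purposes: loading a folder and searching. A minimal sketch of how such input might be interpreted follows. It assumes the only command is the `read=` prefix described above and treats everything else as a query. `InputAction` and `interpret_input` are hypothetical names, not the app's code.

```cpp
#include <string>

// Hypothetical result of interpreting the input field.
struct InputAction {
    enum Kind { Search, ReadFolder } kind;
    std::string value;  // folder path or query text
};

// "read=<folder>" requests a folder load; anything else is a search query.
InputAction interpret_input(const std::string& input) {
    const std::string prefix = "read=";
    if (input.compare(0, prefix.size(), prefix) == 0)
        return {InputAction::ReadFolder, input.substr(prefix.size())};
    return {InputAction::Search, input};
}
```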

Configuration

The app stores config next to the executable as transcript_browser.config.

Keys:

  • SRTCommand
  • PDFCommand
  • TXTCommand
  • ReadOnStart

Supported placeholders used in commands include:

  • {video}
  • {time_in_seconds}
  • {file}
  • {page}
  • {line}

If a path contains spaces, wrap it in quotes.
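The README lists the placeholders but not how they are expanded. The following is a minimal sketch that assumes simple textual substitution of `{name}` tokens. Any quotes written in the template pass through unchanged, which is why paths containing spaces should be quoted in the command itself. `expand_command` and the `player.exe` template in the test are hypothetical.

```cpp
#include <map>
#include <string>

// Replaces each "{name}" in `tmpl` with values[name]. Unknown placeholders are
// left as-is so that a typo stays visible in the launched command.
std::string expand_command(const std::string& tmpl,
                           const std::map<std::string, std::string>& values) {
    std::string out;
    size_t i = 0;
    while (i < tmpl.size()) {
        size_t open = tmpl.find('{', i);
        if (open == std::string::npos) { out += tmpl.substr(i); break; }
        size_t close = tmpl.find('}', open);
        if (close == std::string::npos) { out += tmpl.substr(i); break; }
        out += tmpl.substr(i, open - i);                  // literal text before "{"
        auto it = values.find(tmpl.substr(open + 1, close - open - 1));
        out += (it != values.end()) ? it->second
                                    : tmpl.substr(open, close - open + 1);
        i = close + 1;
    }
    return out;
}
```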
