# wasm_transcript_browser

This project generates a searchable transcript browser from local `.srt` files and ships it as a WebAssembly web app.

Type a word or phrase, click a match, and the app opens the matching YouTube video at the exact timestamp. You can also copy a direct timestamped link.

## What it does

- Parses subtitle files (`.srt`) from a local folder.
- Generates a packed transcript index (`build/entries.inc`) at build time.
- Compiles a C codebase to `main.wasm` and serves it with a small HTML/JS shell.
- Provides instant text search over all transcript text.
- Creates timestamped YouTube links from search hits.

## How the pipeline works

1. `build_file.c` defines a hard-coded source folder:
   - `folder_to_create_transcript_for`
2. During build, `src/prototype/prototype.meta.c`:
   - scans `.srt` files in that folder,
   - parses subtitle entries,
   - normalizes text (lowercases it, removes punctuation, turns `-` into spaces),
   - writes generated data to `build/entries.inc`.
3. `src/prototype/main.c` includes that generated file and compiles to WASM.
4. The browser app (`package/index.html` + `package/main.wasm`) renders a custom UI and handles link open/copy actions.

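
The normalization in step 2 can be sketched roughly as follows. This is a minimal illustration, not the code in `prototype.meta.c`: the function name `normalize_text` is hypothetical, and it assumes plain ASCII punctuation rules from `<ctype.h>` with in-place rewriting.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the project: normalizes `text` in
 * place and returns the new length. Letters are lowercased, '-' becomes a
 * space, other ASCII punctuation is dropped, everything else is kept. */
size_t normalize_text(char *text)
{
    size_t out = 0;
    for (size_t in = 0; text[in] != '\0'; in++) {
        unsigned char c = (unsigned char)text[in];
        if (c == '-') {
            text[out++] = ' ';              /* hyphen turns into a space */
        } else if (ispunct(c)) {
            continue;                       /* drop other punctuation */
        } else {
            text[out++] = (char)tolower(c); /* lowercase; spaces, digits kept */
        }
    }
    text[out] = '\0';
    return out;
}
```

Under these assumptions, `Well-known FACT.` becomes `well known fact`. Normalizing the search query the same way would let a plain substring comparison match regardless of case or punctuation.
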
## Transcript filename format

To build correct YouTube links, each filename must end (before the extension) with the 11-character YouTube video ID, wrapped in one delimiter character on each side (commonly square brackets).

Examples:

- `My Video Title [dQw4w9WgXcQ].en.srt`
- `My Video Title [dQw4w9WgXcQ].srt`

The app extracts the video ID from that trailing token and creates links of the form:

- `https://youtu.be/<id>?feature=shared&t=<seconds>`

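
The filename convention and link pattern above could be handled along these lines. This is a sketch under stated assumptions, not the project's actual code: the helper names (`extract_video_id`, `srt_timestamp_to_seconds`, `build_link`) are hypothetical, the ID is assumed to consist of letters, digits, `-` and `_`, and milliseconds are assumed to be truncated.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define VIDEO_ID_LEN 11

/* Characters assumed to be valid in a video ID. */
static int is_id_char(char c)
{
    return isalnum((unsigned char)c) || c == '-' || c == '_';
}

/* Hypothetical helper: scans from the right for an 11-character ID that is
 * wrapped in one non-ID character on each side and followed by '.' or the
 * end of the name. Copies the ID into `id` and returns 1 on success. */
int extract_video_id(const char *filename, char id[VIDEO_ID_LEN + 1])
{
    size_t len = strlen(filename);
    for (size_t end = len; end >= VIDEO_ID_LEN + 2; end--) {
        if (filename[end] != '.' && filename[end] != '\0')
            continue;
        const char *left = filename + end - (VIDEO_ID_LEN + 2);
        const char *right = filename + end - 1;
        if (is_id_char(*left) || is_id_char(*right))
            continue;                 /* wrappers must not look like ID chars */
        size_t i = 0;
        while (i < VIDEO_ID_LEN && is_id_char(left[1 + i]))
            i++;
        if (i == VIDEO_ID_LEN) {
            memcpy(id, left + 1, VIDEO_ID_LEN);
            id[VIDEO_ID_LEN] = '\0';
            return 1;
        }
    }
    return 0;
}

/* Converts an SRT timestamp such as "00:01:23,456" to whole seconds
 * (milliseconds truncated). Returns -1 if the text does not match. */
long srt_timestamp_to_seconds(const char *stamp)
{
    unsigned h, m, s, ms;
    if (sscanf(stamp, "%u:%u:%u,%u", &h, &m, &s, &ms) != 4)
        return -1;
    return (long)h * 3600 + (long)m * 60 + (long)s;
}

/* Formats a timestamped link in the pattern shown above.
 * Returns 1 if the link fit into `out`, 0 otherwise. */
int build_link(char *out, size_t out_size, const char *id, unsigned seconds)
{
    int n = snprintf(out, out_size,
                     "https://youtu.be/%s?feature=shared&t=%u", id, seconds);
    return n > 0 && (size_t)n < out_size;
}
```

With these helpers, `My Video Title [dQw4w9WgXcQ].en.srt` and a cue starting at `00:01:23,456` would yield `https://youtu.be/dQw4w9WgXcQ?feature=shared&t=83`.
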
## Build

### Prerequisites

- `clang` (or `gcc`/MSVC depending on your platform)
- Python 3 (for the local static server)

### Linux

```bash
./build.sh
```

### Windows

```bat
build.bat
```

Build output of interest:

- `build/entries.inc` (generated transcript index)
- `package/index.html`
- `package/main.wasm`

## Run locally

From `package/`:

```bash
python3 -m http.server 8080
```

Then open:

- `http://localhost:8080`

Windows helper script:

- `package/run_server.bat`

## Configuration notes

- Update the transcript source folder (`folder_to_create_transcript_for`) in `build_file.c` before building.
- The current build file also contains hard-coded deploy commands (`ssh`/`scp`) in `build_prototype_wasm_target`; remove or update those for your own environment.

## Project status

This is an older personal project with a custom C build/codegen stack. Rough edges are expected, but the core idea works: local transcript ingestion + fast search + one-click timestamped YouTube links.