couple commits

Krzosa Karol
2026-03-20 00:09:50 +01:00
parent bb3859b537
commit 3d91a1f924
17 changed files with 253 additions and 127 deletions

93
README.md Normal file

@@ -0,0 +1,93 @@
# wasm_transcript_browser
This project generates a searchable transcript browser from local `.srt` files and ships it as a WebAssembly web app.
Type a word or phrase, click a match, and it opens the exact timestamp in the matching YouTube video. You can also copy a direct timestamped link.
## What it does
- Parses subtitle files (`.srt`) from a local folder.
- Generates a packed transcript index (`build/entries.inc`) at build time.
- Compiles a C codebase to `main.wasm` and serves it with a small HTML/JS shell.
- Provides instant text search over all transcript text.
- Creates timestamped YouTube links from search hits.
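The search step can be pictured as a scan over normalized subtitle entries. The following is only a minimal sketch under stated assumptions: the `Entry` struct, its fields, and `find_next` are hypothetical names, and the real layout of `build/entries.inc` is not described in this README. It uses a naive `strstr` scan, which may differ from what the project actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry layout; the real generated index may look different. */
typedef struct {
    const char *video_id; /* 11-character YouTube ID */
    int seconds;          /* subtitle start time, in whole seconds */
    const char *text;     /* subtitle text, already normalized */
} Entry;

/* Returns the index of the first entry at or after `start` whose text
   contains `query`, or -1 if there is no such entry. */
int find_next(const Entry *entries, int count, int start, const char *query) {
    assert(entries != NULL && query != NULL);
    for (int i = start < 0 ? 0 : start; i < count; ++i) {
        if (strstr(entries[i].text, query) != NULL) {
            return i;
        }
    }
    return -1;
}
```

Calling `find_next` repeatedly with `start` set to the previous hit plus one walks through all matches; each hit carries the video ID and second offset needed for a timestamped link.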
## How the pipeline works
1. `build_file.c` defines a hard-coded source folder:
- `folder_to_create_transcript_for`
2. During build, `src/prototype/prototype.meta.c`:
- scans `.srt` files in that folder,
- parses subtitle entries,
- normalizes text (lowercases it, removes punctuation, turns `-` into spaces),
- writes generated data to `build/entries.inc`.
3. `src/prototype/main.c` includes that generated file and compiles to WASM.
4. The browser app (`package/index.html` + `package/main.wasm`) renders a custom UI and handles link open/copy actions.
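The normalization in step 2 can be sketched roughly as follows. This is an illustration, not the code in `src/prototype/prototype.meta.c`: `normalize_text` is a hypothetical name, and the sketch assumes ASCII punctuation as classified by `ispunct` in the default C locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: writes a normalized copy of `src` into `dst`
   (capacity `cap`, including the terminator) and returns the number
   of bytes written, excluding the terminator. */
size_t normalize_text(const char *src, char *dst, size_t cap) {
    assert(src != NULL && dst != NULL && cap > 0);
    size_t n = 0;
    for (; *src != '\0' && n + 1 < cap; ++src) {
        unsigned char c = (unsigned char)*src;
        if (c == '-') {
            dst[n++] = ' ';              /* hyphen becomes a space */
        } else if (ispunct(c)) {
            continue;                    /* other punctuation is dropped */
        } else {
            dst[n++] = (char)tolower(c); /* letters lowercased, rest kept */
        }
    }
    dst[n] = '\0';
    return n;
}
```

The hyphen check comes before the `ispunct` check because `-` is itself punctuation; reversing the order would delete hyphens instead of turning them into spaces.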
## Transcript filename format
To build correct YouTube links, each filename is expected to end (before its extensions) with the 11-character YouTube video ID wrapped in a single delimiter character on each side, typically square brackets.
Examples:
- `My Video Title [dQw4w9WgXcQ].en.srt`
- `My Video Title [dQw4w9WgXcQ].srt`
The app extracts the video ID from the ending token and creates links like:
- `https://youtu.be/<id>?feature=shared&t=<seconds>`
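As a rough sketch of the link construction described above, assuming square brackets as the delimiters and the standard SRT timestamp form `HH:MM:SS,mmm`: the function names `extract_video_id`, `srt_to_seconds`, and `format_link` are hypothetical and not taken from the project source.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copies the 11-character ID found before the last ']' into `out`.
   Returns 1 on success, 0 if the name does not have the expected shape. */
int extract_video_id(const char *filename, char out[12]) {
    assert(filename != NULL && out != NULL);
    const char *close = strrchr(filename, ']');
    if (close == NULL || close - filename < 12) return 0;
    const char *open = close - 12;
    if (*open != '[') return 0;
    for (int i = 0; i < 11; ++i) {
        char c = open[1 + i];
        /* YouTube IDs use letters, digits, '-' and '_' */
        if (!(isalnum((unsigned char)c) || c == '-' || c == '_')) return 0;
        out[i] = c;
    }
    out[11] = '\0';
    return 1;
}

/* Converts "HH:MM:SS,mmm" to whole seconds (milliseconds are discarded).
   Returns -1 if the timestamp cannot be parsed. */
int srt_to_seconds(const char *stamp) {
    int h, m, s, ms;
    if (sscanf(stamp, "%d:%d:%d,%d", &h, &m, &s, &ms) != 4) return -1;
    return h * 3600 + m * 60 + s;
}

/* Writes the link into `dst`; returns 1 if it fit, 0 otherwise. */
int format_link(const char *id, int seconds, char *dst, size_t cap) {
    int n = snprintf(dst, cap, "https://youtu.be/%s?feature=shared&t=%d",
                     id, seconds);
    return n > 0 && (size_t)n < cap;
}
```

For `My Video Title [dQw4w9WgXcQ].en.srt` and a subtitle starting at `00:01:23,456`, this would produce `https://youtu.be/dQw4w9WgXcQ?feature=shared&t=83`.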
## Build
### Prerequisites
- `clang` (or `gcc`/MSVC depending on your platform)
- Python 3 (for the local static server)
### Linux
```bash
./build.sh
```
### Windows
```bat
build.bat
```
Build output of interest:
- `build/entries.inc` (generated transcript index)
- `package/index.html`
- `package/main.wasm`
## Run locally
From `package/`:
```bash
python3 -m http.server 8080
```
Then open:
- `http://localhost:8080`
Windows helper script:
- `package/run_server.bat`
## Configuration notes
- Update the transcript source folder in `build_file.c` before building.
- The current build file also contains hard-coded deploy commands (`ssh`/`scp`) in `build_prototype_wasm_target`; remove or update those for your own environment.
## Project status
This is an older personal project with a custom C build/codegen stack. Expect rough edges, but the core idea works: local transcript ingestion, fast search, and one-click timestamped YouTube links.