regex - How to clean a .srt file through Vim without removing numbers that appear in the closed captions?
It is known that .srt files are structured in blocks having 3 underlying parts, for example:
228
00:39:06,680 --> 00:39:13,460
lorem ipsum dolor sit amet
Now, let us suppose that in the closed captions there are excerpts representing the speech of a speaker quoting the literary opus of someone else. An additional example:
228
00:39:06,680 --> 00:39:13,460
according to Erasmus, book 1, chapter 23...
Problem: I wish to extract the text from the .srt file through Vim by deleting the frame number and the frame duration without erasing, however, the cardinal numbers that appear in quotations within the closed captions.
Attempts: using a regular expression with the substitute command, I have found a way to "delete" the duration line:
:%s/\d\d:\d\d:\d\d,\d\d\d --> \d\d:\d\d:\d\d,\d\d\d/ /g
For the frame numbers I used the same idea, except that I confirm each cardinal number match with the /gc option, so as to bypass the ones amidst the text.
However, I have a considerable amount of such quotations to extract, in which the cardinal numbers should be maintained. Selecting yes/no for all the entries turns into a tedious task.
Since I am lacking skill in using regex, I presume there is, at least, a less "ugly" manner to perform the strategy aforementioned. Perhaps there is a more elegant way that does not delete the unwanted portions, but rather recovers the raw text without the frame and duration lines, like:
lorem ipsum dolor sit amet
according to Erasmus, book 1, chapter 23...
Does someone know how to do that?
- Don't replace the content of the line with nothing; delete the line. Instead of using :s/pattern//g, use :g/pattern/d (see :help :g).
- Anchor your patterns using ^ and $ so that they match only lines that consist entirely of the thing you want to remove.
Put together:
:g/^\d\+$/d
:g/^\d\d:\d\d:\d\d,\d\d\d --> \d\d:\d\d:\d\d,\d\d\d$/d
(Wow, that's a lot of "d".)
This still has the possibility of nuking a "line of dialog" that consists only of digits, but it won't eat numbers in the middle of a line.
To do a better job, I'd suggest using something a little more fit-for-purpose than Vim: either a programming language, or a subtitle editor :)
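As a rough illustration of the "programming language" route, here is a minimal Python sketch. It assumes that blocks are separated by blank lines and that each block starts with a frame number followed by a timing line; the function name srt_to_text and the sample captions 229 and 230 are made up for the example. Because it strips only the first two lines of each block, a caption that consists solely of digits survives.

```python
import re

# Matches an SRT timing line such as "00:39:06,680 --> 00:39:13,460".
TIMING = re.compile(r"^\d\d:\d\d:\d\d,\d\d\d --> \d\d:\d\d:\d\d,\d\d\d")


def srt_to_text(srt: str) -> str:
    """Return only the caption text of SRT content (hypothetical helper)."""
    out = []
    # Assumption: blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # Drop the frame number only when a timing line directly follows it.
        if (len(lines) >= 2 and lines[0].strip().isdigit()
                and TIMING.match(lines[1].strip())):
            lines = lines[2:]
        out.extend(line for line in lines if line.strip())
    return "\n".join(out)


# Made-up sample; block 230 is a caption consisting only of digits.
sample = """228
00:39:06,680 --> 00:39:13,460
lorem ipsum dolor sit amet

229
00:39:13,500 --> 00:39:20,000
according to Erasmus, book 1, chapter 23...

230
00:39:20,100 --> 00:39:22,000
1984
"""

print(srt_to_text(sample))
```

For the sample, the caption "1984" is kept, whereas :g/^\d\+$/d would have deleted it. Files that stray from the assumed layout (for instance, a missing blank line between blocks) would need extra handling.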