[youtube] Support --download-sections with formats --live-from-start #6498

Open · wants to merge 70 commits into base: master

Commits (70)
2fbd6de
[utils] Add hackish 'now' support for --download-sections
elyse0 Mar 9, 2023
439be2b
[utils] Add microseconds to unified_timestamp
Oct 21, 2022
367429e
[common] Extract start and end keys for Dash fragments
elyse0 Mar 9, 2023
1799a6a
[utils] Allow using local timezone for 'now' timestamps
elyse0 Mar 9, 2023
fdb9aaf
Use local timezone for download sections
elyse0 Mar 9, 2023
b83d752
Add fixme in modified parse_chapters function
elyse0 Mar 9, 2023
fba1c39
[youtube] Support --download-sections for YT Livestream from start
elyse0 Mar 9, 2023
e42e256
Create last_segment_url only if necessary
elyse0 Mar 10, 2023
317ba03
Improve parse_chapters comments
elyse0 Mar 10, 2023
9327587
Fix linter
elyse0 Mar 10, 2023
e3b08ba
[extractor/iq] Set more language codes (#6476)
D0LLYNH0 Mar 9, 2023
8df4707
[extractor/opencast] Add ltitools to `_VALID_URL` (#6371)
C0D3D3V Mar 9, 2023
c376b95
[downloader/curl] Fix progress reporting
pukkandan Mar 9, 2023
2d16554
[extractor/youtube] Bypass throttling for `-f17`
pukkandan Mar 9, 2023
c222f6c
[extractor/twitch] Fix `is_live` (#6500)
elyse0 Mar 10, 2023
96f5d29
[extractor/cbc:gem] Update `_VALID_URL` (#6499)
makew0rld Mar 10, 2023
34d3df7
Support loading info.json with a list at its root
pukkandan Mar 10, 2023
9f717b6
[extractor/hidive] Fix login
pukkandan Mar 10, 2023
e3ffdf7
[extractor/opencast] Fix format bug (#6512)
C0D3D3V Mar 11, 2023
f137666
[extractor/rokfin] Re-construct manifest url (#6507)
vampirefrog Mar 11, 2023
db62ffd
[extractor/youtube] Add client name to `format_note` when `-v` (#6254)
Lesmiscore Mar 11, 2023
5ef1a92
[extractor/youtube] Add extractor-arg `include_duplicate_formats`
pukkandan Mar 9, 2023
9fc70f3
[extractor/youtube] Construct fragment list lazily
pukkandan Mar 11, 2023
e6e2eb0
Support negative durations
elyse0 Mar 10, 2023
e40132d
Revert "[utils] Allow using local timezone for 'now' timestamps"
elyse0 Mar 10, 2023
0ed9a73
Add fragment count
elyse0 Mar 10, 2023
a43ba2e
Fix unified_timestamp
elyse0 Mar 12, 2023
cdac764
Remove tz_aware date code
elyse0 Mar 12, 2023
fbae888
Add debug for selected section
elyse0 Mar 12, 2023
3faa1e3
Add initial documentation
elyse0 Mar 12, 2023
79ae58a
Fix linter
elyse0 Mar 12, 2023
5e4699a
Fix linter
elyse0 Mar 12, 2023
6cea8cb
Merge remote-tracking branch 'origin/master' into pr/6498
pukkandan Mar 12, 2023
544836d
Allow days in parse_duration
elyse0 Mar 12, 2023
b131f3d
Improve option documentation
elyse0 Mar 12, 2023
2fbe185
Add some documentation
elyse0 Mar 12, 2023
01f672f
Lock less aggressively
elyse0 Mar 18, 2023
129555b
Fix return values of _extract_sequence_from_mpd
elyse0 Mar 18, 2023
128d304
Always compute last_seq
elyse0 Apr 19, 2023
7f93eb7
Support for epoch timestamps
elyse0 May 7, 2023
78285ee
Update options docs
elyse0 May 7, 2023
4e93198
Restore README.md
elyse0 May 7, 2023
444e02e
Merge remote-tracking branch 'origin/master' into yt-live-from-start-…
elyse0 May 7, 2023
8ee942a
Add warning about --download-sections without --live-from-start
elyse0 May 13, 2023
1f79746
Merge remote-tracking branch 'origin' into yt-live-from-start-range
elyse0 Jun 3, 2023
99e6074
Merge remote-tracking branch 'origin' into yt-live-from-start-range
elyse0 Jun 24, 2023
622c555
Fix bug after merge
elyse0 Jun 24, 2023
1416cee
Update yt_dlp/options.py
bashonly Jul 22, 2023
194bc49
Merge branch 'yt-dlp:master' into pr/6498
bashonly Jul 22, 2023
bd73047
Cleanup
bashonly Jul 22, 2023
2741b58
Merge remote-tracking branch 'origin' into yt-live-from-start-range
elyse0 Oct 8, 2023
fb2b57a
Merge remote-tracking branch 'github/yt-live-from-start-range' into y…
elyse0 Oct 8, 2023
5156a16
Merge branch 'master' into yt-live-from-start-range
bashonly Jan 19, 2024
6fc6349
Merge branch 'master' into yt-live-from-start-range
bashonly Feb 29, 2024
50c943e
Merge branch 'yt-dlp:master' into pr/yt-live-from-start-range
bashonly Mar 19, 2024
cf96b24
Merge branch 'master' into yt-live-from-start-range
bashonly Apr 16, 2024
172dfbe
Merge branch 'yt-dlp:master' into pr/live-sections
bashonly May 10, 2024
54ad67d
Merge branch 'yt-dlp:master' into pr/live-sections
bashonly May 23, 2024
6a84199
Merge branch 'yt-dlp:master' into pr/live-sections
bashonly May 28, 2024
6208f7b
Merge branch 'master' into yt-live-from-start-range
bashonly Jun 12, 2024
66a6e0a
Merge branch 'yt-dlp:master' into pr/live-sections
bashonly Jul 8, 2024
724a6cb
Merge branch 'yt-dlp:master' into pr/live-sections
bashonly Jul 11, 2024
4f1af12
Merge branch 'master' into pr/live-sections
bashonly Jul 21, 2024
c0be43d
Merge branch 'yt-dlp:master' into pr/live-sections
bashonly Jul 25, 2024
160d973
Merge branch 'master' into yt-live-from-start-range
bashonly Aug 17, 2024
9dd8574
Merge branch 'master' into yt-live-from-start-range
bashonly Oct 9, 2024
e05694e
Merge branch 'yt-dlp:master' into pr/live-sections
bashonly Oct 26, 2024
670bafe
Merge branch 'yt-dlp:master' into pr/live-sections
bashonly Nov 7, 2024
03eee53
Merge branch 'yt-dlp:master' into pr/live-sections
bashonly Dec 6, 2024
7d101c8
Merge branch 'yt-dlp:master' into pr/live-sections
bashonly Dec 13, 2024
5 changes: 5 additions & 0 deletions test/test_utils.py
@@ -452,10 +452,15 @@ def test_unified_timestamps(self):
self.assertEqual(unified_timestamp('2018-03-14T08:32:43.1493874+00:00'), 1521016363)
self.assertEqual(unified_timestamp('Sunday, 26 Nov 2006, 19:00'), 1164567600)
self.assertEqual(unified_timestamp('wed, aug 16, 2008, 12:00pm'), 1218931200)
self.assertEqual(unified_timestamp('2022-10-13T02:37:47.831Z'), 1665628667)

self.assertEqual(unified_timestamp('December 31 1969 20:00:01 EDT'), 1)
self.assertEqual(unified_timestamp('Wednesday 31 December 1969 18:01:26 MDT'), 86)
self.assertEqual(unified_timestamp('12/31/1969 20:01:18 EDT', False), 78)
self.assertEqual(unified_timestamp('2023-03-09T18:01:33.646Z', with_milliseconds=True), 1678384893.646)
# ISO8601 spec says that if no timezone is specified, we should use local timezone;
# but yt-dlp uses UTC to keep things consistent
self.assertEqual(unified_timestamp('2023-03-11T06:48:34.008'), 1678517314)

def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
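
The added cases pin down two behaviours: a timestamp without a timezone is interpreted as UTC rather than local time, and `with_milliseconds=True` keeps the fractional second. The expected arithmetic can be checked with the standard library alone (a standalone sketch, not yt-dlp code):

```python
import calendar
import datetime as dt

# '2023-03-11T06:48:34.008' carries no timezone; treating it as UTC gives the tested value.
parsed = dt.datetime.strptime('2023-03-11T06:48:34.008', '%Y-%m-%dT%H:%M:%S.%f')
assert calendar.timegm(parsed.timetuple()) == 1678517314

# With milliseconds kept, '2023-03-09T18:01:33.646Z' maps to 1678384893.646
# (the trailing 'Z' is dropped here because strptime's %f does not accept it).
parsed = dt.datetime.strptime('2023-03-09T18:01:33.646', '%Y-%m-%dT%H:%M:%S.%f')
assert abs(calendar.timegm(parsed.timetuple()) + parsed.microsecond / 1e6 - 1678384893.646) < 1e-6
```
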
9 changes: 7 additions & 2 deletions yt_dlp/YoutubeDL.py
@@ -28,7 +28,12 @@
from .compat import urllib # isort: split
from .compat import urllib_req_to_req
from .cookies import CookieLoadError, LenientSimpleCookie, load_cookies
from .downloader import FFmpegFD, get_suitable_downloader, shorten_protocol_name
from .downloader import (
DashSegmentsFD,
FFmpegFD,
get_suitable_downloader,
shorten_protocol_name,
)
from .downloader.rtmp import rtmpdump_version
from .extractor import gen_extractor_classes, get_info_extractor
from .extractor.common import UnsupportedURLIE
@@ -3373,7 +3378,7 @@ def existing_video_file(*filepaths):
fd, success = None, True
if info_dict.get('protocol') or info_dict.get('url'):
fd = get_suitable_downloader(info_dict, self.params, to_stdout=temp_filename == '-')
if fd != FFmpegFD and 'no-direct-merge' not in self.params['compat_opts'] and (
if fd not in [FFmpegFD, DashSegmentsFD] and 'no-direct-merge' not in self.params['compat_opts'] and (
info_dict.get('section_start') or info_dict.get('section_end')):
msg = ('This format cannot be partially downloaded' if FFmpegFD.available()
else 'You have requested downloading the video partially, but ffmpeg is not installed')
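
Adding `DashSegmentsFD` to the allow-list means a sectioned download of a DASH livestream no longer has to go through ffmpeg, since the fragment downloader can skip fragments outside the range itself. A rough sketch of the resulting decision (simplified names, not the actual YoutubeDL logic):

```python
def partial_download_error(downloader_name, ffmpeg_available):
    """Return an error message if a sectioned download cannot proceed, else None."""
    if downloader_name in ('FFmpegFD', 'DashSegmentsFD'):
        return None  # these downloaders can honour section_start/section_end
    if not ffmpeg_available:
        return 'You have requested downloading the video partially, but ffmpeg is not installed'
    return 'This format cannot be partially downloaded'
```
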
11 changes: 9 additions & 2 deletions yt_dlp/__init__.py
@@ -12,6 +12,7 @@
import optparse
import os
import re
import time
import traceback

from .cookies import SUPPORTED_BROWSERS, SUPPORTED_KEYRINGS, CookieLoadError
@@ -340,12 +341,13 @@ def parse_chapters(name, value, advanced=False):
(?P<end_sign>-?)(?P<end>[^-]+)
)?'''

current_time = time.time()
chapters, ranges, from_url = [], [], False
for regex in value or []:
if advanced and regex == '*from-url':
from_url = True
continue
elif not regex.startswith('*'):
elif not regex.startswith('*') and not regex.startswith('#'):
try:
chapters.append(re.compile(regex))
except re.error as err:
@@ -362,11 +364,16 @@
err = 'Must be of the form "*start-end"'
elif not advanced and any(signs):
err = 'Negative timestamps are not allowed'
else:
elif regex.startswith('*'):
dur[0] *= -1 if signs[0] else 1
dur[1] *= -1 if signs[1] else 1
if dur[1] == float('-inf'):
err = '"-inf" is not a valid end'
elif regex.startswith('#'):
dur[0] = dur[0] * (-1 if signs[0] else 1) + current_time
dur[1] = dur[1] * (-1 if signs[1] else 1) + current_time
if dur[1] == float('-inf'):
err = '"-inf" is not a valid end'
if err:
raise ValueError(f'invalid {name} time range "{regex}". {err}')
ranges.append(dur)
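
The `#` branch turns both endpoints into offsets from the current epoch time, so a range like `#-24hours - 0` resolves to the absolute window `[now - 86400, now]` and `#-1h - 30m` to `[now - 3600, now + 1800]`. A small illustration of that arithmetic (standalone sketch, not the parse_chapters code itself):

```python
import time

def resolve_relative_section(start_offset, end_offset):
    """Convert '#'-style offsets (seconds relative to now) into absolute unix timestamps."""
    now = time.time()
    return now + start_offset, now + end_offset

# '#-24hours - 0': the last 24 hours of the stream, up to the current moment
start, end = resolve_relative_section(-24 * 3600, 0)

# '#-1h - 30m': from one hour ago until 30 minutes from now
start, end = resolve_relative_section(-3600, 30 * 60)
```
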
2 changes: 2 additions & 0 deletions yt_dlp/downloader/dash.py
@@ -36,6 +36,8 @@ def real_download(self, filename, info_dict):
'filename': fmt.get('filepath') or filename,
'live': 'is_from_start' if fmt.get('is_from_start') else fmt.get('is_live'),
'total_frags': fragment_count,
'section_start': info_dict.get('section_start'),
'section_end': info_dict.get('section_end'),
}

if real_downloader:
Expand Down
20 changes: 17 additions & 3 deletions yt_dlp/extractor/common.py
@@ -2747,7 +2747,7 @@ def extract_common(source):
r = int(s.get('r', 0))
ms_info['total_number'] += 1 + r
ms_info['s'].append({
't': int(s.get('t', 0)),
't': int_or_none(s.get('t')),
# @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
'd': int(s.attrib['d']),
'r': r,
@@ -2789,8 +2789,14 @@ def extract_Initialization(source):
return ms_info

mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
availability_start_time = unified_timestamp(
mpd_doc.get('availabilityStartTime'), with_milliseconds=True) or 0
stream_numbers = collections.defaultdict(int)
for period_idx, period in enumerate(mpd_doc.findall(_add_ns('Period'))):
# segmentIngestTime is completely out of spec, but YT Livestream do this
segment_ingest_time = period.get('{http://youtube.com/yt/2012/10/10}segmentIngestTime')
if segment_ingest_time:
availability_start_time = unified_timestamp(segment_ingest_time, with_milliseconds=True)
period_entry = {
'id': period.get('id', f'period-{period_idx}'),
'formats': [],
@@ -2969,13 +2975,17 @@ def add_segment_url():
'Bandwidth': bandwidth,
'Number': segment_number,
}
duration = float_or_none(segment_d, representation_ms_info['timescale'])
start = float_or_none(segment_time, representation_ms_info['timescale'])
representation_ms_info['fragments'].append({
media_location_key: segment_url,
'duration': float_or_none(segment_d, representation_ms_info['timescale']),
'duration': duration,
'start': availability_start_time + start,
'end': availability_start_time + start + duration,
})

for s in representation_ms_info['s']:
segment_time = s.get('t') or segment_time
segment_time = s['t'] if s.get('t') is not None else segment_time
segment_d = s['d']
add_segment_url()
segment_number += 1
@@ -2991,15 +3001,19 @@ def add_segment_url():
fragments = []
segment_index = 0
timescale = representation_ms_info['timescale']
start = 0
for s in representation_ms_info['s']:
duration = float_or_none(s['d'], timescale)
for _ in range(s.get('r', 0) + 1):
segment_uri = representation_ms_info['segment_urls'][segment_index]
fragments.append({
location_key(segment_uri): segment_uri,
'duration': duration,
'start': availability_start_time + start,
'end': availability_start_time + start + duration,
})
segment_index += 1
start += duration
representation_ms_info['fragments'] = fragments
elif 'segment_urls' in representation_ms_info:
# Segment URLs with no SegmentTimeline
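
Taken together, these hunks give every DASH fragment absolute wall-clock `start`/`end` values: the manifest's `availabilityStartTime` (or YouTube's out-of-spec `segmentIngestTime`) plus the running sum of segment durations. A simplified sketch of that bookkeeping (illustrative only, not the extractor code):

```python
def annotate_fragments(segments, availability_start_time, timescale):
    """Attach absolute start/end times (unix seconds) to DASH fragments.

    `segments` mirrors the <S> elements of a SegmentTimeline: dicts with a
    duration 'd' in timescale units and an optional repeat count 'r'.
    """
    fragments, start = [], 0
    for s in segments:
        duration = s['d'] / timescale
        for _ in range(s.get('r', 0) + 1):
            fragments.append({
                'duration': duration,
                'start': availability_start_time + start,
                'end': availability_start_time + start + duration,
            })
            start += duration
    return fragments
```
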
68 changes: 50 additions & 18 deletions yt_dlp/extractor/youtube.py
@@ -2870,17 +2870,17 @@ def refetch_manifest(format_id, delay):
microformats = traverse_obj(
prs, (..., 'microformat', 'playerMicroformatRenderer'),
expected_type=dict)
_, live_status, _, formats, _ = self._list_formats(video_id, microformats, video_details, prs, player_url)
is_live = live_status == 'is_live'
start_time = time.time()
with lock:
_, live_status, _, formats, _ = self._list_formats(video_id, microformats, video_details, prs, player_url)
is_live = live_status == 'is_live'
start_time = time.time()

def mpd_feed(format_id, delay):
"""
@returns (manifest_url, manifest_stream_number, is_live) or None
"""
for retry in self.RetryManager(fatal=False):
with lock:
refetch_manifest(format_id, delay)
refetch_manifest(format_id, delay)

f = next((f for f in formats if f['format_id'] == format_id), None)
if not f:
@@ -2911,6 +2911,11 @@ def _live_dash_fragments(self, video_id, format_id, live_start_time, mpd_feed, m
begin_index = 0
download_start_time = ctx.get('start') or time.time()

section_start = ctx.get('section_start') or 0
section_end = ctx.get('section_end') or math.inf

self.write_debug(f'Selected section: {section_start} -> {section_end}')

lack_early_segments = download_start_time - (live_start_time or download_start_time) > MAX_DURATION
if lack_early_segments:
self.report_warning(bug_reports_message(
@@ -2931,9 +2936,10 @@ def _extract_sequence_from_mpd(refresh_sequence, immediate):
or (mpd_url, stream_number, False))
if not refresh_sequence:
if expire_fast and not is_live:
return False, last_seq
return False
elif old_mpd_url == mpd_url:
return True, last_seq
return True

if manifestless_orig_fmt:
fmt_info = manifestless_orig_fmt
else:
@@ -2944,14 +2950,13 @@ def _extract_sequence_from_mpd(refresh_sequence, immediate):
fmts = None
if not fmts:
no_fragment_score += 2
return False, last_seq
return False
fmt_info = next(x for x in fmts if x['manifest_stream_number'] == stream_number)
fragments = fmt_info['fragments']
fragment_base_url = fmt_info['fragment_base_url']
assert fragment_base_url

_last_seq = int(re.search(r'(?:/|^)sq/(\d+)', fragments[-1]['path']).group(1))
return True, _last_seq
return True

self.write_debug(f'[{video_id}] Generating fragments for format {format_id}')
while is_live:
@@ -2971,11 +2976,19 @@ def _extract_sequence_from_mpd(refresh_sequence, immediate):
last_segment_url = None
continue
else:
should_continue, last_seq = _extract_sequence_from_mpd(True, no_fragment_score > 15)
should_continue = _extract_sequence_from_mpd(True, no_fragment_score > 15)
no_fragment_score += 2
if not should_continue:
continue

last_fragment = fragments[-1]
last_seq = int(re.search(r'(?:/|^)sq/(\d+)', fragments[-1]['path']).group(1))

known_fragment = next(
(fragment for fragment in fragments if f'sq/{known_idx}' in fragment['path']), None)
if known_fragment and known_fragment['end'] > section_end:
break

if known_idx > last_seq:
last_segment_url = None
continue
@@ -2985,20 +2998,36 @@ def _extract_sequence_from_mpd(refresh_sequence, immediate):
if begin_index < 0 and known_idx < 0:
# skip from the start when it's negative value
known_idx = last_seq + begin_index

if lack_early_segments:
known_idx = max(known_idx, last_seq - int(MAX_DURATION // fragments[-1]['duration']))
known_idx = max(known_idx, last_seq - int(MAX_DURATION // last_fragment['duration']))

fragment_count = last_seq - known_idx if section_end == math.inf else int(
(section_end - section_start) // last_fragment['duration'])

try:
for idx in range(known_idx, last_seq):
# do not update sequence here or you'll get skipped some part of it
should_continue, _ = _extract_sequence_from_mpd(False, False)
should_continue = _extract_sequence_from_mpd(False, False)
if not should_continue:
known_idx = idx - 1
raise ExtractorError('breaking out of outer loop')
last_segment_url = urljoin(fragment_base_url, f'sq/{idx}')
yield {
'url': last_segment_url,
'fragment_count': last_seq,
}

frag_duration = last_fragment['duration']
frag_start = last_fragment['start'] - (last_seq - idx) * frag_duration
frag_end = frag_start + frag_duration

if frag_start >= section_start and frag_end <= section_end:
last_segment_url = urljoin(fragment_base_url, f'sq/{idx}')

yield {
'url': last_segment_url,
'fragment_count': fragment_count,
'duration': frag_duration,
'start': frag_start,
'end': frag_end,
}

if known_idx == last_seq:
no_fragment_score += 5
else:
@@ -4199,6 +4228,9 @@ def build_fragments(f):
dct['downloader_options'] = {'http_chunk_size': CHUNK_SIZE}
yield dct

if live_status == 'is_live' and self.get_param('download_ranges') and not self.get_param('live_from_start'):
self.report_warning('For YT livestreams, --download-sections is only supported with --live-from-start')

needs_live_processing = self._needs_live_processing(live_status, duration)
skip_bad_formats = 'incomplete' not in format_types
if self._configuration_arg('include_incomplete_formats'):
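
Since only the newest fragment's absolute timing is known, the window of every earlier fragment is back-computed from it, and a fragment is yielded only when that window falls inside the requested section. A condensed sketch of the selection above (simplified, standalone names):

```python
import math

def select_live_fragments(known_idx, last_seq, newest_start, frag_duration,
                          section_start=0, section_end=math.inf):
    """Yield (sequence number, start, end) for fragments inside the requested section."""
    for idx in range(known_idx, last_seq):
        # Back-compute the wall-clock window of fragment `idx` from the newest fragment.
        frag_start = newest_start - (last_seq - idx) * frag_duration
        frag_end = frag_start + frag_duration
        if frag_start >= section_start and frag_end <= section_end:
            yield idx, frag_start, frag_end
```
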
9 changes: 8 additions & 1 deletion yt_dlp/options.py
@@ -429,7 +429,14 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
general.add_option(
'--live-from-start',
action='store_true', dest='live_from_start',
help='Download livestreams from the start. Currently only supported for YouTube (Experimental)')
help=('Download livestreams from the start. Currently only supported for YouTube (Experimental). '
'Time ranges can be specified using --download-sections to download only a part of the stream. '
'Negative values are allowed for specifying a relative previous time, using the # syntax '
'e.g. --download-sections "#-24hours - 0" (download last 24 hours), '
'e.g. --download-sections "#-1h - 30m" (download from 1 hour ago until the next 30 minutes), '
'e.g. --download-sections "#-3days - -2days" (download from 3 days ago until 2 days ago). '
'It is also possible to specify an exact unix timestamp range, using the * syntax, '
'e.g. --download-sections "*1672531200 - 1672549200" (download between those two timestamps)'))
general.add_option(
'--no-live-from-start',
action='store_false', dest='live_from_start',
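
The same sections should be reachable from the embedding API; the relative arithmetic that the `#` syntax performs on the command line is done by hand here, since `download_ranges` takes absolute timestamps. This is a sketch assuming the documented `download_range_func` helper behaves the same for livestream sections as for regular ranges; the URL is a placeholder:

```python
import time

import yt_dlp
from yt_dlp.utils import download_range_func

now = time.time()
ydl_opts = {
    'live_from_start': True,
    # Rough equivalent of --download-sections "#-1h - 30m": from one hour ago
    # until 30 minutes from now, expressed as absolute unix timestamps.
    'download_ranges': download_range_func(None, [(now - 3600, now + 1800)]),
}

with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=LIVE_STREAM_ID'])  # placeholder URL
```
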
21 changes: 13 additions & 8 deletions yt_dlp/utils/_utils.py
@@ -1250,7 +1250,7 @@ def unified_strdate(date_str, day_first=True):
return str(upload_date)


def unified_timestamp(date_str, day_first=True):
def unified_timestamp(date_str, day_first=True, with_milliseconds=False):
if not isinstance(date_str, str):
return None

@@ -1276,7 +1276,7 @@ def unified_timestamp(date_str, day_first=True):
for expression in date_formats(day_first):
with contextlib.suppress(ValueError):
dt_ = dt.datetime.strptime(date_str, expression) - timezone + dt.timedelta(hours=pm_delta)
return calendar.timegm(dt_.timetuple())
return calendar.timegm(dt_.timetuple()) + (dt_.microsecond / 1e6 if with_milliseconds else 0)

timetuple = email.utils.parsedate_tz(date_str)
if timetuple:
@@ -2071,16 +2071,19 @@ def parse_duration(s):

days, hours, mins, secs, ms = [None] * 5
m = re.match(r'''(?x)
(?P<sign>[+-])?
(?P<before_secs>
(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?
(?P<secs>(?(before_secs)[0-9]{1,2}|[0-9]+))
(?P<ms>[.:][0-9]+)?Z?$
''', s)
if m:
days, hours, mins, secs, ms = m.group('days', 'hours', 'mins', 'secs', 'ms')
sign, days, hours, mins, secs, ms = m.group('sign', 'days', 'hours', 'mins', 'secs', 'ms')
else:
m = re.match(
r'''(?ix)(?:P?
r'''(?ix)(?:
(?P<sign>[+-])?
P?
(?:
[0-9]+\s*y(?:ears?)?,?\s*
)?
@@ -2104,17 +2107,19 @@
(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*s(?:ec(?:ond)?s?)?\s*
)?Z?$''', s)
if m:
days, hours, mins, secs, ms = m.groups()
sign, days, hours, mins, secs, ms = m.groups()
else:
m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)Z?$', s)
m = re.match(r'(?i)(?P<sign>[+-])?(?:(?P<days>[0-9.]+)\s*(?:days?)|(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)Z?$', s)
if m:
hours, mins = m.groups()
sign, days, hours, mins = m.groups()
else:
return None

sign = -1 if sign == '-' else 1

if ms:
ms = ms.replace(':', '.')
return sum(float(part or 0) * mult for part, mult in (
return sign * sum(float(part or 0) * mult for part, mult in (
(days, 86400), (hours, 3600), (mins, 60), (secs, 1), (ms, 1)))
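
With the sign and day handling above, a few expected values (these reflect this PR's parse_duration; a released yt-dlp without the patch rejects the negative forms):

```python
from yt_dlp.utils import parse_duration

assert parse_duration('30m') == 1800
assert parse_duration('-02:03:04') == -7384    # sign applies to the whole clock-style value
assert parse_duration('-24hours') == -86400    # matches the '#-24hours - 0' example in the option help
assert parse_duration('-3days') == -259200     # bare day counts are now accepted
```
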

