[youtube:tab] Fallback to API when webpage fails to download #1122

Merged Oct 8, 2021 (63 commits)

Commits
db89662
api-only channel extraction
coletdjnz Aug 11, 2021
54161df
add extractor arg and support mixes
coletdjnz Aug 12, 2021
841fe34
bug and linter
coletdjnz Aug 12, 2021
3549b38
use hacky getAccountSwitcherEndpoint to get account syncid and sessio…
coletdjnz Aug 13, 2021
59e5b21
linter
coletdjnz Aug 13, 2021
6d5344e
Merge remote-tracking branch 'upstream/master' into api-only-real
coletdjnz Aug 13, 2021
6ee625e
failing to extract resolve_url is usually due to playlist/channel not…
coletdjnz Aug 13, 2021
6126633
initial implementation of multi-js-player support (not tested)
coletdjnz Aug 18, 2021
7a05226
Fix player url related logic
coletdjnz Aug 19, 2021
11fde94
Revert "Fix player url related logic"
coletdjnz Aug 21, 2021
9f712a0
Revert "initial implementation of multi-js-player support (not tested)"
coletdjnz Aug 21, 2021
16af686
Use iframe API to get latest player version
coletdjnz Aug 21, 2021
534f3fb
don't unnecessary fetch for player js if the client doesn't require it
coletdjnz Aug 21, 2021
0c2fe14
remove unused self._player_url
coletdjnz Aug 21, 2021
7b6894c
linter and bug fix
coletdjnz Aug 21, 2021
ba2065b
revert accidental change
coletdjnz Aug 21, 2021
87019f0
add player_js=require/no_require to allow pure api-only mode
coletdjnz Aug 22, 2021
6fb686f
remove redundant check
coletdjnz Aug 22, 2021
371ca87
Add `is_authenticated` wrapper for sapisidhash function, and a warning.
coletdjnz Aug 22, 2021
f08e1fa
remove warning for now as it may cause issues
coletdjnz Aug 22, 2021
3c63303
[youtube] remove annotations and xsrf token extraction
coletdjnz Aug 23, 2021
263dd0b
Merge remote-tracking branch 'upstream/master' into api-only-real
coletdjnz Aug 25, 2021
aad471c
player_skip=player option
coletdjnz Sep 2, 2021
134c781
Merge remote-tracking branch 'upstream/master' into api-only-real
coletdjnz Sep 2, 2021
df861a6
cleanup
coletdjnz Sep 2, 2021
14ca1e8
Merge remote-tracking branch 'upstream/master' into api-only-real
coletdjnz Sep 26, 2021
0014400
move code to `_extract_tab_endpoint`
coletdjnz Sep 26, 2021
785b7d8
use is_authenticated property
coletdjnz Sep 26, 2021
efd5f28
create _extract_data to handle webpage to api fallbacks
coletdjnz Sep 27, 2021
37c274c
pass ytcfg instead of webpage
coletdjnz Sep 27, 2021
3355b33
sorry linter
coletdjnz Sep 27, 2021
f777387
Merge remote-tracking branch 'upstream/master' into api-only-real
coletdjnz Sep 27, 2021
7822665
cleanup which my IDE hates
coletdjnz Sep 27, 2021
5b49019
make extract_webpage non fatal
coletdjnz Sep 27, 2021
3554d68
improve extract_webpage
coletdjnz Sep 27, 2021
7ae1093
oops
coletdjnz Sep 27, 2021
710a44d
linter not happy :(
coletdjnz Sep 27, 2021
464d93a
use check_get_keys
coletdjnz Sep 27, 2021
b93716a
set up base for mp to OLAK resolve
coletdjnz Sep 27, 2021
a613940
fixup resolve MP to OLAK playlist
coletdjnz Sep 29, 2021
7120ce8
fixup
coletdjnz Sep 29, 2021
12ad0ac
only use api to resolve MP playlists
coletdjnz Sep 29, 2021
f663c36
linter
coletdjnz Sep 29, 2021
0b1eba2
minor change
coletdjnz Sep 29, 2021
615ce58
function name change
coletdjnz Sep 29, 2021
042d049
if ytcfg exists then authentication should work
coletdjnz Sep 29, 2021
4f71ecd
Merge remote-tracking branch 'origin/api-only-real' into tab-api-fall…
coletdjnz Sep 29, 2021
2920649
remove exception that I don't think is used anymore
coletdjnz Sep 29, 2021
8d57066
minor tweak
coletdjnz Sep 29, 2021
66ab5e6
errors from API pages are expected most of the time
coletdjnz Sep 30, 2021
9541b78
Fix home page extraction and improve visitorData handling
coletdjnz Sep 30, 2021
76c5d1c
visitorData should be extracted from the latest data
coletdjnz Sep 30, 2021
09bf547
very minor bracket cleanup
coletdjnz Sep 30, 2021
6510871
Merge remote-tracking branch 'upstream/master' into tab-api-fallback
coletdjnz Oct 2, 2021
01c28fa
fix extract_from_tabs thumbnails and tags being same list
coletdjnz Oct 2, 2021
cfb5b69
Lists -> Playlists
coletdjnz Oct 2, 2021
b5c47f3
Failing to resolve album to playlist should be fatal
coletdjnz Oct 2, 2021
2451c74
Merge remote-tracking branch 'origin/tab-api-fallback' into tab-api-f…
coletdjnz Oct 2, 2021
d6e21bc
visitor data doc improve
coletdjnz Oct 2, 2021
465e71a
webpage: break like extract_response when receive an error instead of…
coletdjnz Oct 2, 2021
1712bf4
Add tests and raise error when no endpoint resolved
coletdjnz Oct 4, 2021
f294ced
Add documentation
coletdjnz Oct 5, 2021
7e90581
allow non-fatal exit of `extract_tab_endpoint`
coletdjnz Oct 5, 2021
[youtube] remove annotations and xsrf token extraction
coletdjnz committed Aug 23, 2021
commit 3c63303e099b5e7189d43202ad9e0cb67c552eac
35 changes: 1 addition & 34 deletions yt_dlp/extractor/youtube.py
@@ -3217,40 +3217,7 @@ def process_language(container, base_url, lang_code, sub_name, query):
             needs_auth=info['age_limit'] >= 18,
             is_unlisted=None if is_private is None else is_unlisted)
 
-        # get xsrf for annotations or comments
-        get_annotations = self.get_param('writeannotations', False)
-        get_comments = self.get_param('getcomments', False)
-        if get_annotations or get_comments:
-            xsrf_token = None
-            if master_ytcfg:
-                xsrf_token = try_get(master_ytcfg, lambda x: x['XSRF_TOKEN'], compat_str)
-            if not xsrf_token:
-                xsrf_token = self._search_regex(
-                    r'([\'"])XSRF_TOKEN\1\s*:\s*([\'"])(?P<xsrf_token>(?:(?!\2).)+)\2',
-                    webpage, 'xsrf token', group='xsrf_token', fatal=False)
-
-        # annotations
-        if get_annotations:
-            invideo_url = get_first(
-                player_responses,
-                ('annotations', 0, 'playerAnnotationsUrlsRenderer', 'invideoUrl'),
-                expected_type=str)
-            if xsrf_token and invideo_url:
-                xsrf_field_name = None
-                if master_ytcfg:
-                    xsrf_field_name = try_get(master_ytcfg, lambda x: x['XSRF_FIELD_NAME'], compat_str)
-                if not xsrf_field_name:
-                    xsrf_field_name = self._search_regex(
-                        r'([\'"])XSRF_FIELD_NAME\1\s*:\s*([\'"])(?P<xsrf_field_name>\w+)\2',
-                        webpage, 'xsrf field name',
-                        group='xsrf_field_name', default='session_token')
-                info['annotations'] = self._download_webpage(
-                    self._proto_relative_url(invideo_url),
-                    video_id, note='Downloading annotations',
-                    errnote='Unable to download video annotations', fatal=False,
-                    data=urlencode_postdata({xsrf_field_name: xsrf_token}))
-
-        if get_comments:
+        if self.get_param('getcomments', False):
             info['__post_extractor'] = lambda: self._extract_comments(master_ytcfg, video_id, contents, webpage)
 
         self.mark_watched(video_id, player_responses)
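
For context on what this commit drops: when `master_ytcfg` had no `XSRF_TOKEN`, the removed code fell back to a regex search over the page source. The sketch below exercises that same pattern with nothing but Python's `re` module. The helper name `find_xsrf_token` and the sample page string are hypothetical and used only for illustration; they are not part of yt-dlp.

```python
import re

# Pattern copied verbatim from the removed lines of youtube.py.
# Group 1 and group 2 capture the quote characters so that both the key
# and the value may be single- or double-quoted.
XSRF_TOKEN_RE = r'([\'"])XSRF_TOKEN\1\s*:\s*([\'"])(?P<xsrf_token>(?:(?!\2).)+)\2'


def find_xsrf_token(webpage):
    """Return the XSRF_TOKEN value found in the page source, or None.

    Hypothetical stand-in for the removed self._search_regex(..., fatal=False) call.
    """
    match = re.search(XSRF_TOKEN_RE, webpage)
    return match.group('xsrf_token') if match else None


# Made-up page fragment; real pages embed this inside a larger ytcfg object.
sample = 'ytcfg.set({"XSRF_FIELD_NAME": "session_token", "XSRF_TOKEN": "abc123="});'
print(find_xsrf_token(sample))
print(find_xsrf_token('page without a token'))
```

With annotations extraction gone, nothing in this hunk needs the token any more, which is why the only remaining line simply checks `getcomments` directly.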