mirror of
https://github.com/ytdl-org/youtube-dl.git
synced 2025-01-18 16:52:16 +00:00
CONTRIBUTING.md: text improvements for common mistakes
This commit is contained in:
parent
427cc21531
commit
5347dda5e2
116
CONTRIBUTING.md
116
CONTRIBUTING.md
@ -349,6 +349,13 @@ Use `try_get` for safe metadata extraction from parsed JSON.
|
|||||||
|
|
||||||
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
|
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
|
||||||
|
|
||||||
|
There are also a number of utility functions that behave like parts of the Python standard library, but offer better compatibility or handle errors in a more graceful way. These include:
|
||||||
|
|
||||||
|
* ``re.search`` → ``self._parse_regex``
|
||||||
|
* ``json.loads`` → ``self._parse_json``
|
||||||
|
* ``libxml2.parseMemory`` → ``self._parse_xml``
|
||||||
|
* ``libxml2.xpathEval`` → ``xpath_with_ns``, ``xpath_element``, ``xpath_text``
|
||||||
|
|
||||||
Explore [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
|
Explore [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
|
||||||
|
|
||||||
#### More examples
|
#### More examples
|
||||||
@ -366,3 +373,112 @@ duration = float_or_none(video.get('durationMs'), scale=1000)
|
|||||||
view_count = int_or_none(video.get('views'))
|
view_count = int_or_none(video.get('views'))
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Use Subclasses for Different Extractor Types
|
||||||
|
|
||||||
|
While seems easier at first to have a single extractor class handle multiple different types of URLs, this is difficult to read and maintain. If a website uses multiple URLs, they should be split into different extractors.
|
||||||
|
|
||||||
|
Correct:
|
||||||
|
```python
|
||||||
|
class SpecialVidsIE(InfoExtractor):
|
||||||
|
_VALID_URL = "^https://(?:www\.)specialvideos.com/(?P<id>[A-Za-z0-9-]+)"
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
# ... write code for full URLs...
|
||||||
|
|
||||||
|
class SpecialVidsEmbedLinksIE(SpecialVidsIE):
|
||||||
|
_VALID_URL = "^https://(?:www\.)specialvideos.com/embed/(?P<id>[0-9a-f]+)"
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
# ... write code for embedded URLs...
|
||||||
|
```
|
||||||
|
|
||||||
|
Incorrect:
|
||||||
|
```python
|
||||||
|
class SpecialVidsIE(InfoExtractor):
|
||||||
|
_VALID_URL = "^https://(?:www\.)specialvideos.com/(P<em>(embed)?/)(?P<id>[A-Za-z0-9-]+)"
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
# if embed != "/"
|
||||||
|
# ... code for full URLs ...
|
||||||
|
# else:
|
||||||
|
# ... code for embedded URLs ...
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
### Delegate Multiple URL Formats
|
||||||
|
|
||||||
|
If the code for two different extractors is very similar, the extractors should *delegate* their calls.
|
||||||
|
|
||||||
|
Correct:
|
||||||
|
```python
|
||||||
|
class SpecialVidsIE(InfoExtractor):
|
||||||
|
_VALID_URL = "^https://(?:www\.)specialvideos.com/(?P<id>[A-Za-z0-9-]+)"
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
display_id = self._video_id(url)
|
||||||
|
# ... write the parsing code ...
|
||||||
|
return {
|
||||||
|
'url': stream_url,
|
||||||
|
'title': video_title,
|
||||||
|
'id': real_id,
|
||||||
|
'display_id': display_id,
|
||||||
|
# ... more fields...
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class SpecialVidsEmbedLinksIE(SpecialVidsIE):
|
||||||
|
_VALID_URL = "^https://(?:www\.)specialvideos.com/(P<em>(embed)?/)(?P<id>[A-Za-z0-9-]+)"
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
display_id = self._video_id(url)
|
||||||
|
# ... download the page and parse the full URL ...
|
||||||
|
self.url_result(full_url) # delegate the full URL to the other extractor
|
||||||
|
```
|
||||||
|
|
||||||
|
Incorrect:
|
||||||
|
```python
|
||||||
|
class SpecialVidsIE(InfoExtractor):
|
||||||
|
_VALID_URL = "^https://(?:www\.)specialvideos.com/(?P<id>[A-Za-z0-9-]+)"
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
display_id = self._video_id(url)
|
||||||
|
# ... write the parsing code ...
|
||||||
|
return {
|
||||||
|
'url': stream_url,
|
||||||
|
'title': video_title,
|
||||||
|
'id': real_id,
|
||||||
|
'display_id': display_id,
|
||||||
|
# ... more fields...
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class SpecialVidsEmbedLinksIE(SpecialVidsIE):
|
||||||
|
_VALID_URL = "^https://(?:www\.)specialvideos.com/(P<em>(embed)?/)(?P<id>[A-Za-z0-9-]+)"
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
display_id = self._video_id(url)
|
||||||
|
# ... write the parsing code for this page ...
|
||||||
|
# ... re-fetch the full URL ...
|
||||||
|
return {
|
||||||
|
'url': stream_url,
|
||||||
|
'title': video_title,
|
||||||
|
'id': real_id,
|
||||||
|
# ... more fields...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Don't Repeat Yourself
|
||||||
|
|
||||||
|
The *DRY principle* is very important, because it makes code easier to understand and maintain. "DRY" is a very common code review comment, because of several common mistakes. To avoid them, ask yourself these questions after your code is working:
|
||||||
|
|
||||||
|
1. Are there strings or values repeated that should be constants?
|
||||||
|
2. Are there members or functions copy-pasted between delegate classes?
|
||||||
|
3. Are certain actions the code takes repeated in more than one place?
|
||||||
|
|
||||||
|
### Changes Must Be Testable
|
||||||
|
|
||||||
|
It is often difficult to see all the effects of a change to an extractor. Thus, our rule is that *every change should be testable*. There must be at least one test that fails before the change and passes after the change.
|
||||||
|
|
||||||
|
If an extractor is being updated or created, then there are no failing tests before the change. So this rule means a new test *must* be created. This is especially important when changing the valid URLs accepted by an extractor, a change that seems small but often has far-reaching consequences.
|
||||||
|
|
||||||
|
Don't forget to make sure the test fails before the change, too. Otherwise, nothing has really been fixed.
|
||||||
|
Loading…
Reference in New Issue
Block a user