Scene codec
Third, there is motion tracking. If we go back to our scene of two people throwing a ball around, the ball travels across the scene. For some of its travels it will look exactly the same, so rather than send the same data about the ball again, it is better to just note that the block containing the ball has moved a bit.
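To make this concrete, here is a minimal Python sketch of exhaustive block matching, the simplest form of motion estimation: for one block of the current frame, search a window in the previous frame for the best match and report the offset as a motion vector. This is illustrative only, not any particular codec's implementation, and the frames, block size, and search range are invented.

    import numpy as np

    def motion_vector(prev, curr, top, left, block=16, search=8):
        """Find the (dy, dx) offset in `prev` that best matches the
        block-sized region of `curr` at (top, left), by exhaustive
        search over a +/- `search` pixel window, scored with the
        sum of absolute differences (SAD)."""
        target = curr[top:top + block, left:left + block].astype(np.int32)
        best, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                    continue  # candidate block falls outside the frame
                cand = prev[y:y + block, x:x + block].astype(np.int32)
                sad = np.abs(target - cand).sum()
                if best is None or sad < best:
                    best, best_mv = sad, (dy, dx)
        return best_mv, best

    # Toy frames: a bright "ball" that moves 3 pixels right between frames.
    prev = np.zeros((64, 64), dtype=np.uint8)
    curr = np.zeros((64, 64), dtype=np.uint8)
    prev[24:32, 20:28] = 255
    curr[24:32, 23:31] = 255
    # Prints ((0, -3), 0): a perfect match 3 pixels to the left in the
    # previous frame, i.e. the ball moved 3 pixels to the right.
    print(motion_vector(prev, curr, top=24, left=23))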

Motion vectors can be complex, and finding those vectors and plotting the tracks is time-consuming during encoding, but not during decoding. The supreme battle for a video encoder is to keep the bitrate low and the quality high. As video encoding has progressed over the years, the aim of each successive generation has been to decrease the bitrate while maintaining the same level of quality. At the same time, there has also been an increase in the display resolutions available to consumers.

A high screen resolution also means more pixels to represent, which means more data is needed for each frame. As a starting point, a rule of thumb: the higher the bitrate, the better the quality.

But if you use a low bitrate, the picture quality can disintegrate quickly. When the files are stored on a DVD, a Blu-ray disc, or a hard drive, the bitrate determines the file size.

To make things simple we will ignore any audio tracks and any embedded information inside of a video stream. If a DVD holds roughly 4.7 GB, then a two-hour movie works out to an average bitrate of around 5 Mbit/s. In comparison, a 4K video clip straight out of my Android smartphone in H.264 is recorded at many times that bitrate.
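As a sanity check on that arithmetic, the average bitrate is just the file size in bits divided by the duration in seconds. A small sketch using the 4.7 GB, two-hour figures:

    def avg_bitrate_mbps(size_gb, duration_s):
        """Average bitrate in Mbit/s for a file of `size_gb` gigabytes
        played over `duration_s` seconds."""
        return size_gb * 8_000_000_000 / duration_s / 1_000_000

    # A 4.7 GB single-layer DVD holding a two-hour movie:
    print(round(avg_bitrate_mbps(4.7, 2 * 3600), 1))  # ~5.2 Mbit/s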

Just looking at those very rough numbers, we can see that H.264 achieves a substantial saving over older formats, and the same file encoded in H.265 would be roughly half the size again. These are very much rough estimations of the available compression ratios, because the numbers I have given imply a constant bitrate.

However, some codecs allow videos to be encoded at a variable bitrate governed by a quality setting. This means that the bitrate changes moment by moment, with a predefined maximum bitrate used when the scenes are complex and lower bitrates when things are less cluttered.

It is then this quality setting that determines the overall bitrate. There are various ways to measure quality. You can look at the peak signal-to-noise ratio (PSNR) as well as other statistics. Plus, you can look at the perceptive quality.
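PSNR is straightforward to compute from the mean squared error between the original and compressed frames. A minimal sketch for 8-bit frames; the random "frames" here are stand-ins for real video:

    import numpy as np

    def psnr(original, compressed, peak=255.0):
        """PSNR in dB between two 8-bit frames of the same shape."""
        mse = np.mean((original.astype(np.float64)
                       - compressed.astype(np.float64)) ** 2)
        if mse == 0:
            return float("inf")  # identical frames
        return 10 * np.log10(peak ** 2 / mse)

    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, (1080, 1920), dtype=np.uint8)
    # Simulate mild compression error with a little uniform noise.
    noisy = np.clip(frame.astype(np.int16)
                    + rng.integers(-3, 4, frame.shape), 0, 255).astype(np.uint8)
    print(round(psnr(frame, noisy), 1))  # higher is better; ~42 dB here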

If 20 people watch the same video clips from different encoders, which ones will be ranked higher for quality? From a personal, subjective point of view that is hard to verify and equally hard to dispute. Above is a montage of a single frame from the same video, encoded in three different ways. The top left is the original video.

Next to it on the right is the AV1 encode, followed by the H.264 and H.265 encodes. The original source was 4K. This is a less-than-perfect method to visualize the differences, but it should help illustrate the point. Due to the reduction of the overall resolution (this is a 1,920 x 1,080 image), I find it hard to spot much of a difference between the four images, especially without pixel peeping. Regardless of which of the two intermediate codecs, ProRes or DNx, you pick, you also have to pick which flavor you want.

Start off with the smallest ProRes or DNx codec in the same resolution as your capture codec. If you have lots of extra storage space, think about using the next largest flavor. This means that you transcode your camera files into a codec that is both good for editing and very high-quality (not very lossy). The key to picking a good Direct Intermediate codec is to make sure that you are preserving all of the information from your capture codec. An intermediate codec will never make your images better (more detailed explanation below), but it can definitely make them worse if you choose the wrong codec.

The important thing is to understand the details of your original footage and make sure that your intermediate codec is at least as good as your capture codec in each area.

You want an intermediate codec that at least matches the bit depth and chroma subsampling of your capture codec, for example 8-bit 4:2:0 for most consumer cameras. Going beyond these values will not add information, but it will not hurt either, and several ProRes and DNx flavors qualify. You might think that all you need is to match the camera bitrate in Mbps, but you actually need to greatly exceed the camera bitrate. This is because h.264 is a far more efficient codec than an intraframe editing codec like ProRes: it packs more image quality into each megabit. In order for ProRes to match the image quality of h.264, its bitrate needs to be several times higher. Plain ProRes will probably do just fine, but if you have lots of storage space, then going up to ProRes HQ will have a slight edge.
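One hedged way to express the "at least as good in every area" rule in code follows. The codec entries and the bitrate headroom factor are illustrative assumptions, not authoritative figures:

    # Each spec lists bit depth, chroma subsampling, and bitrate in Mbps.
    CHROMA_RANK = {"4:2:0": 0, "4:2:2": 1, "4:4:4": 2}

    def good_intermediate(capture, intermediate, bitrate_headroom=3.0):
        """True if `intermediate` preserves everything in `capture`.
        The headroom factor reflects that a long-GOP camera codec packs
        more quality per megabit than an intraframe editing codec."""
        return (intermediate["bit_depth"] >= capture["bit_depth"]
                and CHROMA_RANK[intermediate["chroma"]] >= CHROMA_RANK[capture["chroma"]]
                and intermediate["mbps"] >= capture["mbps"] * bitrate_headroom)

    camera = {"bit_depth": 8, "chroma": "4:2:0", "mbps": 100}   # e.g. long-GOP h.264
    prores = {"bit_depth": 10, "chroma": "4:2:2", "mbps": 489}  # illustrative 4K figure
    print(good_intermediate(camera, prores))  # True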

Part of the reason the Direct Intermediate workflow is common is that it used to be a lot harder to use a proxy workflow. The main exception is when you have a lot of mixed footage types. If you have multiple frame rates and frame sizes in the same project, switching back and forth between the proxies and the capture codecs can be a headache.

If you are using third-party tools to help prep and organize your footage before you start cutting, those can also make the relinking process trickier. One common example is software that automatically syncs audio tracks or multicam shoots. And if you were to bake the LUT into your transcodes in a Direct Intermediate workflow, you would lose all of the benefits of recording in log in the first place.

This is very important, because it is commonly misunderstood, and there is a lot of misinformation online: transcoding your footage before you edit will never increase the quality of the output.

There are some extra operations that you could perform during the transcode, such as sophisticated up-res tools, that could increase the image quality in some cases, but a new codec by itself will never increase the quality of your image. That includes going from h.264 to ProRes. It also includes going from 8-bit to 10-bit, and moving up to a finer chroma subsampling. Consider an analogy: a photo of a rose reflected in a water droplet.

Now what if I take a photo of my monitor with a RED Helium 8K camera? This is a beast of a camera. The RED camera has more megapixels, right? I would have a file that is technically higher-resolution, but it would not capture any more of my subject (the rose) than the first one did.

You are making a copy of a copy, taking a photo of a photo. The big caveat is that, if you are doing any processing, any transformation of the image (adding a LUT, for instance), then you definitely do want to transcode into a higher-quality codec, which will retain the new information.

Not ideal for editing. The downside is that you would need a serious amount of storage: peanuts for a big facility, but a significant investment for a solo editor. So you might decide to use a Proxy workflow instead and transcode your files to the ProRes Proxy 4K format. Then your footage would take up a fraction of the space. You can then easily edit off of a single hard drive, and your workflow gets a lot simpler. For instructions on how to calculate bitrates and file sizes, check out this article: The Simple Formula to Calculate Video Bitrates.
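The storage math behind that decision is the same simple formula run in reverse: bitrate times duration gives file size. A sketch with illustrative bitrates (not official figures for any specific codec flavor):

    def size_gb(mbps, hours):
        """File size in gigabytes for a stream of `mbps` lasting `hours`."""
        return mbps * 1_000_000 / 8 * hours * 3600 / 1_000_000_000

    camera_mbps = 400   # illustrative high-bitrate 4K capture
    proxy_mbps = 45     # illustrative ProRes Proxy 4K figure
    for name, rate in [("camera originals", camera_mbps), ("proxies", proxy_mbps)]:
        print(f"{name}: {size_gb(rate, hours=10):.0f} GB for 10 hours")
    # camera originals: 1800 GB for 10 hours
    # proxies: 202 GB for 10 hours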

You might decide to transcode the footage even further down, to ProRes Proxy HD, which would shrink it to the point where sending it over the Internet becomes feasible if you have a fast connection. When the edit is all done, you just relink your project back to the original camera files and export. The big question at this point is whether you want to color-correct straight off the original camera files, or whether you want to transcode.

In order to make good decisions about color, you need the highest-quality image you have available, because you need to be able to see exactly what you have to work with. This is certainly a simple option: if you did a proxy edit, you can relink to the camera files for the finishing process and go to town. This gives you maximum image quality, but remember how the camera files can be slow to work with? The camera files may slow down the process a little, but depending on the software you use and the amount of work you need to do, you might decide that the simplicity is worth a little bit of potential slowdown.

If you have a short edit without a lot of complexity, then this can be a great and easy workflow. You could transcode all of your footage to a high-image-quality codec, link to those files, and then start doing your color-correction.

Fortunately, there is another option. When you consolidate a project, your editing software will make a copy of your project along with a copy of the media, but only the particular files that you ended up using in your sequence.

This cuts down on the storage a lot, which comes in handy at this stage. You can also consolidate down even further so that you only keep the specific portions of each take that you actually used in the edit, discarding the rest.

Now you can take this new consolidated project, after relinking to the originals, transcode all of these files to a very high-quality, high-bitrate codec, and start color-correcting.

The licensing costs may be tied to the number of times the codec is used, the amount of data compressed using the codec, or other factors.

While one codec may provide exceptionally high compression quality, it may also carry a high licensing cost. Indications of the licensing costs for various codecs may be stored within the codec library or at other locations accessible by the comparison module. In one embodiment, the licensing costs are considered only when a number of the top codecs produce similar results, e.g. within some tolerance of one another.

In the depicted example, however, the codec with the highest PSNR score is more than two times more expensive than the codec with the next highest PSNR score, which is, itself, almost three times more expensive than the codec with the third highest PSNR score. In one configuration, the comparison module would select the codec with the third highest PSNR score due to its much lower licensing cost. In other embodiments, the comparison module may create a composite score (not shown) based on the PSNR score, the licensing cost, and other possible factors.
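A sketch of that kind of tie-breaking follows; the PSNR scores, costs, and tolerance are invented for illustration and are not from the described system:

    def pick_codec(results, psnr_tolerance=0.5):
        """Among codecs whose PSNR is within `psnr_tolerance` dB of the
        best, pick the one with the lowest licensing cost. `results`
        maps codec name -> (psnr_db, cost_per_use)."""
        best_psnr = max(p for p, _ in results.values())
        contenders = {name: (p, c) for name, (p, c) in results.items()
                      if best_psnr - p <= psnr_tolerance}
        return min(contenders, key=lambda name: contenders[name][1])

    results = {
        "codec_a": (42.3, 0.060),  # highest PSNR, most expensive
        "codec_b": (42.1, 0.025),
        "codec_c": (41.9, 0.009),  # nearly as good, far cheaper
    }
    print(pick_codec(results))  # codec_c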

In still other embodiments, the comparison module may calculate an anticipated cost (not shown) for the entire transmission and seek to minimize that cost over all of the codec selection decisions. Hence, the comparison module might select a more expensive codec for certain scenes, where a substantial increase in quality is realized, while selecting less expensive codecs for other scenes.

However, there is no guarantee that the destination system will be able to process data that quickly. Moreover, there is no guarantee that the network will always provide the same amount of bandwidth. As a result, there may be a need to periodically change the target data rate within the selection module of the source system, since the target data rate will affect which codecs are selected for various scenes. For example, the destination system may be a cellular telephone. Typically, the bandwidth over cellular networks is limited.

Similarly, the processing power of a cellular telephone is substantially less than that of a personal computer or a dedicated video conferencing system. In one embodiment, in response to receiving a connection request, the destination system provides the source system with a modified target data rate, e.g. one suited to its own capabilities and network conditions.

The modified rate may be communicated to the source system using any standard data structure or technique. Thereafter, depending on the configuration, the target data rate may be replaced by the modified rate. In certain embodiments, an actual data rate is not communicated. Rather, a message is sent specifying one or more constraints or capabilities of the destination system (or network), in which case it would be up to the source system to revise the target data rate as appropriate.

In one embodiment, dynamic streaming may be employed, where no specific message is sent by the destination system. The source system may instead use latency calculations, requests to resend lost packets, and the like to infer an appropriate target data rate.
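One way such inference might look is an additive-increase/multiplicative-decrease loop driven by observed packet loss. All thresholds and step sizes below are invented for illustration, not taken from the described system:

    def adjust_target_rate(rate_kbps, loss_fraction, floor=200, ceiling=8000):
        """AIMD-style sketch: back off sharply on packet loss, probe
        upward slowly when the channel looks clean."""
        if loss_fraction > 0.02:      # resend requests indicate congestion
            rate_kbps *= 0.75
        elif loss_fraction == 0:
            rate_kbps += 100          # cautiously probe for more bandwidth
        return max(floor, min(ceiling, rate_kbps))

    rate = 4000
    for loss in [0.0, 0.0, 0.05, 0.0]:
        rate = adjust_target_rate(rate, loss)
        print(int(rate))  # 4100, 4200, 3150, 3250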

In one configuration, a video frame may be subdivided into sub-frames, as depicted. While the depicted video frame is subdivided into four sub-frames a-d of equal size, the invention is not limited in this respect. For instance, a video frame may be subdivided into any number of sub-frames, although too many sub-frames may adversely affect compression quality.

Moreover, the sub-frames need not be of equal size. For example, sub-frames near the center of the video frame may be smaller due to the relatively greater amount of motion in this area. In certain embodiments, the sub-frames may be defined by objects represented within the video frame. As an example, the head of a person could be defined as a separate object and, hence, a different sub-frame from the background.

Algorithms exist, e.g. for detecting and tracking such objects within a frame. A set of sub-frames a-d within a scene exhibits characteristics a-d and may be treated, for practical purposes, like a complete video frame. Accordingly, using the techniques described above, the characteristics a-d may be used to determine an optimal codec a-d for compressing the respective sub-frames a-d. For example, an AI system (not shown) may be used to determine whether an association exists between a set of characteristics and a particular codec. If no association exists, compression and comparison modules (not shown) may be used to test a plurality of codecs on the respective sub-frames to determine the optimal codec. Thus, different sub-frames a-d of a single scene may be compressed using different codecs a-d.
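A toy sketch of per-sub-frame selection, using simple image statistics as the "characteristics"; the characteristic-to-codec rules are invented for illustration:

    import numpy as np

    def split_quadrants(frame):
        """Subdivide a frame into four equal sub-frames a-d."""
        h, w = frame.shape[0] // 2, frame.shape[1] // 2
        return {"a": frame[:h, :w], "b": frame[:h, w:],
                "c": frame[h:, :w], "d": frame[h:, w:]}

    def choose_codec(sub, prev_sub):
        """Invented association: high motion -> motion-optimized codec,
        flat detail -> cheap block codec, otherwise wavelet codec."""
        motion = np.abs(sub.astype(np.int32) - prev_sub.astype(np.int32)).mean()
        detail = sub.std()
        if motion > 20:
            return "motion-optimized codec"
        return "block codec" if detail < 10 else "wavelet codec"

    rng = np.random.default_rng(1)
    prev = rng.integers(0, 256, (64, 64), dtype=np.uint8)
    curr = prev.copy()
    curr[:32, :32] = rng.integers(0, 256, (32, 32), dtype=np.uint8)  # motion in "a"
    for name, sub in split_quadrants(curr).items():
        print(name, choose_codec(sub, split_quadrants(prev)[name]))
    # "a" gets the motion-optimized codec; the static quadrants do not.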

In the illustrated embodiment, four different codecs a-d are used. While specific embodiments and applications of the present invention have been illustrated and described, it is to be understood that the invention is not limited to the precise configuration and components disclosed herein. Various modifications, changes, and variations apparent to those of skill in the art may be made in the arrangement, operation, and details of the methods and systems of the present invention disclosed herein without departing from the spirit and scope of the present invention.

A media compression method comprising: using a computer to perform the steps of: obtaining a media signal to be communicated to a destination system;. The method of claim 1, wherein the codecs are selected from the group consisting of discrete cosine transform (DCT) codecs, fractal codecs, and wavelet codecs.

The method of claim 1, wherein a first automatically selected codec comprises a discrete cosine transform (DCT) codec and a second automatically selected codec comprises a fractal codec. The method of claim 1, wherein a first automatically selected codec comprises a discrete cosine transform (DCT) codec and a second automatically selected codec comprises a wavelet codec.

The method of claim 1, wherein automatically selecting further comprises: identifying a plurality of characteristics of a scene; and. The method of claim 5, wherein the characteristics are selected from the group consisting of motion characteristics and color characteristics. The method of claim 6, wherein searching further comprises using an Artificial Intelligence (AI) system to locate a codec associated with the identified characteristics of a scene.

The method of claim 7, wherein the AI system comprises a neural network. The method of claim 7, wherein the AI system comprises an expert system.

The method of claim 5, wherein searching further comprises searching for an association between the identified characteristics and a set of parameters to be used with the automatically selected codec; wherein compressing further comprises compressing the scene using the automatically selected codec with the associated set of parameters; and.

The method of claim 1, wherein automatically selecting further comprises: testing at least a subset of the codecs of the codec library on a scene; and. The method of claim 11, wherein testing further comprises: storing a baseline snapshot of the scene; and. The method of claim 12, wherein comparing further comprises comparing the quality according to a Just Noticeable Difference (JND) value.

The method of claim 12, further comprising: identifying a plurality of characteristics of a scene; and. The method of claim 11, wherein testing further comprises testing codecs of the codec library on the scene using different sets of parameters and automatically selecting the codec and set of parameters that produce a highest compression quality for the scene according to a set of criteria without exceeding the target data rate; wherein compressing further comprises compressing the scene using the automatically selected codec with the automatically selected parameters; and.

The method of claim 16, further comprising: identifying a plurality of characteristics of a scene; and. The method of claim 1, further comprising adjusting the target data rate in response to constraints of the destination system.

The method of claim 1, further comprising adjusting the target data rate in response to conditions of a transmission channel to the destination system. The method of claim 1, further comprising adjusting the target data rate in response to a message from the destination system. The method of claim 1, wherein identifying further comprises detecting a scene change in response to one frame of the media signal being sufficiently different from a previous frame.

The method of claim 1, wherein identifying further comprises detecting a scene change in response to the passage of a fixed period of time. The method of claim 1, wherein delivering further comprises streaming each compressed scene to the destination system through a network. The method of claim 1, wherein delivering further comprises storing each compressed scene on a storage medium. The method of claim 1, wherein at least one codec in the library has an associated licensing cost, and wherein selecting further comprises automatically selecting the codec having the least licensing cost in response to two or more codecs producing substantially the same quality of compressed output for a scene.

A media compression method comprising: using a computer to perform the steps of: providing a library of codecs, at least one codec having an associated licensing cost;. A method for communicating a media signal comprising: using a computer to perform the steps of: selectively compressing, by a compression module, at least two scenes of a media signal using different codecs from a codec library, wherein the codecs are automatically selected, by a selection module, to produce a highest compression quality for the respective scenes according to a set of criteria without exceeding a target data rate; and.

A media compression system comprising: an input module to obtain a media signal to be communicated to a destination system;. The system of claim 28, wherein the codecs are automatically selected from the group consisting of discrete cosine transform (DCT) codecs, fractal codecs, and wavelet codecs. The system of claim 28, wherein a first automatically selected codec comprises a block codec and a second automatically selected codec comprises a fractal codec.

The system of claim 28, wherein a first automatically selected codec comprises a block codec and a second automatically selected codec comprises a wavelet codec. The system of claim 28, wherein the identification module is to identify a plurality of characteristics of a scene; and wherein the selection module is to search for a codec in the library that is associated with the identified characteristics of the scene.

The system of claim 32, wherein the characteristics are selected from the group consisting of motion characteristics and color characteristics. The system of claim 33, wherein the selection module comprises an Artificial Intelligence (AI) system to locate a codec associated with the identified characteristics of a scene.

The system of claim 34, wherein the AI system comprises a neural network. The system of claim 34, wherein the AI system comprises an expert system.

max_pixels: Maximum number of pixels per image. This value can be used to avoid out-of-memory failures due to large images.

apply_cropping: Enable cropping if cropping parameters are multiples of the required alignment for the left and top parameters. If the alignment is not met, the cropping will be partially applied to maintain alignment.

Default is 1 (enabled).

When you configure your FFmpeg build, all the supported native decoders are enabled by default. Decoders requiring an external library must be enabled manually via the corresponding --enable-lib option.

You can list all available decoders using the configure option --list-decoders.

libdav1d: Requires the presence of the libdav1d headers and library during configuration. You need to explicitly configure the build with --enable-libdav1d.

framethreads: Set amount of frame threads to use during decoding. The default value is 0 (autodetect). This option is deprecated; use the global option threads instead.

tilethreads: Set amount of tile threads to use during decoding.

filmgrain: Apply film grain to the decoded video if present in the bitstream. Defaults to the internal default of the library.

This option is deprecated and will be removed in the future.

oppoint: Select an operating point of a scalable AV1 bitstream (0 - 31).
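Assuming a build configured with --enable-libdav1d, a hedged command-line sketch follows; file names are placeholders, and decoder private options go before the input they apply to:

    import subprocess

    # Force the libdav1d decoder for an AV1 input; -max_pixels is a
    # generic decoder option used here to fail early on huge images.
    subprocess.run([
        "ffmpeg",
        "-max_pixels", str(3840 * 2160),
        "-c:v", "libdav1d",            # decoder options precede -i
        "-i", "input_av1.mkv",
        "-c:v", "libx264", "output.mp4",
    ])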

libuavs3d: Requires the presence of the libuavs3d headers and library during configuration. You need to explicitly configure the build with --enable-libuavs3d.

drc_scale: Dynamic Range Scale Factor. The factor to apply to dynamic range values from the AC-3 stream. This factor is applied exponentially. The default value is 1. There are 3 notable scale factor ranges: a value of 0 disables DRC and produces full-range audio; values between 0 and 1 leave DRC enabled and apply a fraction of the stream DRC value, so audio reproduction sits between full range and full compression; values above 1 apply the factor asymmetrically, so loud sounds are fully compressed and soft sounds are enhanced.
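For example, to halve the applied dynamic range compression when decoding an AC-3 track (file names are placeholders):

    import subprocess

    # -drc_scale is an AC-3 decoder private option, so it precedes the input.
    subprocess.run([
        "ffmpeg",
        "-drc_scale", "0.5",       # apply half of the stream's DRC values
        "-i", "movie_ac3.mkv",
        "-c:a", "pcm_s16le", "out.wav",
    ])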

use_buggy_lpc: The lavc FLAC encoder used to produce buggy streams with high LPC values (like the default value); this option makes it possible to decode such streams correctly.

ffwavesynth: This decoder generates wave patterns according to predefined sequences. Its use is purely internal, and the format of the data it accepts is not publicly documented.

libcelt: Requires the presence of the libcelt headers and library during configuration. You need to explicitly configure the build with --enable-libcelt.

libgsm: Requires the presence of the libgsm headers and library during configuration. You need to explicitly configure the build with --enable-libgsm.

libilbc: Requires the presence of the libilbc headers and library during configuration. You need to explicitly configure the build with --enable-libilbc.

libopencore-amrnb: Using it requires the presence of the libopencore-amrnb headers and library during configuration. You need to explicitly configure the build with --enable-libopencore-amrnb.

libopencore-amrwb: Using it requires the presence of the libopencore-amrwb headers and library during configuration. You need to explicitly configure the build with --enable-libopencore-amrwb.

libopus: Requires the presence of the libopus headers and library during configuration. You need to explicitly configure the build with --enable-libopus.

aribb24-base-path: Sets the base path for the libaribb24 library. This is utilized for reading of configuration files (for custom unicode conversions), and for dumping of non-text symbols as images under that location.

dvdsub: This codec decodes the bitmap subtitles used in DVDs; the same subtitles can also be found in VobSub file pairs and in some Matroska files.

palette: Specify the global palette used by the bitmaps. When stored in VobSub, the palette is normally specified in the index file; in Matroska, the palette is stored in the codec extra-data in the same format as in VobSub. The format for this option is a string containing 16 24-bit hexadecimal numbers (without 0x prefix) separated by commas.

forced_subs_only: Only decode subtitle entries marked as forced. Some titles have forced and non-forced subtitles in the same track. Setting this flag to 1 will only keep the forced subtitles. Default value is 0.
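For instance, to keep only the forced entries of a DVD subtitle track (file names and the stream index are placeholders; the subtitles are re-encoded so the decoder option actually takes effect):

    import subprocess

    # -forced_subs_only is a dvdsub decoder option, so it goes before -i.
    subprocess.run([
        "ffmpeg",
        "-forced_subs_only", "1",
        "-i", "movie.mkv",
        "-map", "0:s:0",          # first subtitle stream (placeholder)
        "-c:s", "dvdsub",         # decode + re-encode, not a stream copy
        "forced_only.mkv",
    ])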

libzvbi-teletext: Requires the presence of the libzvbi headers and library during configuration. You need to explicitly configure the build with --enable-libzvbi.

txt_page: List of teletext page numbers to decode. Pages that do not match the specified list are dropped.

txt_default_region: Set default character set used for decoding, a value between 0 and 87 (see ETS 300 706, Section 15). Default value is -1, which does not override the libzvbi default.

This option is needed for some legacy level 1.0 transmissions which cannot signal the proper charset.

txt_format: Specifies the format of the decoded subtitles. The default bitmap format should be used for teletext pages, because certain graphics and colors cannot be expressed in simple text or even ASS. The formatted ASS output returns subtitle pages and teletext pages in different styles: subtitle pages are stripped down to text, but an effort is made to keep the text alignment and the formatting.

txt_chop_spaces: Chops leading and trailing spaces and removes empty lines from the generated text.

This option is useful for teletext-based subtitles, where empty spaces may be present at the start or end of the lines, or empty lines may be present between the subtitle lines, because of double-sized teletext characters. Default value is 1.

txt_duration: Sets the display duration of the decoded teletext pages or subtitles in milliseconds. Default value is -1, which means infinity, or until the next subtitle event comes.

txt_transparent: Force transparent background of the generated teletext bitmaps. Default value is 0, which means an opaque background.

txt_opacity: Sets the opacity of the teletext background.

When you configure your FFmpeg build, all the supported native encoders are enabled by default.

Encoders requiring an external library must be enabled manually via the corresponding --enable-lib option. You can list all available encoders using the configure option --list-encoders.

b: Set bit rate in bits per second. Setting this automatically activates constant bit rate (CBR) mode. If this option is unspecified it is set to 128kbps.

q: Set quality for variable bit rate (VBR) mode. This option is valid only when using the ffmpeg command-line tool.

cutoff: Set cutoff frequency. If unspecified, the encoder is allowed to dynamically adjust the cutoff to improve clarity on low bitrates.

aac_coder: Set the AAC encoder coding method. The default twoloop method first sets quantizers depending on band thresholds and then tries to find an optimal combination by adding or subtracting a specific value from all quantizers and adjusting some individual quantizers a little.
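A quick sketch of these rate-control options with the native AAC encoder via the ffmpeg command-line tool (file names are placeholders):

    import subprocess

    # CBR: setting -b:a activates constant bit rate mode.
    subprocess.run(["ffmpeg", "-i", "in.wav", "-c:a", "aac",
                    "-b:a", "128k", "cbr.m4a"])

    # VBR: -q:a selects a quality level instead; -cutoff pins the cutoff
    # frequency rather than letting the encoder adjust it dynamically.
    subprocess.run(["ffmpeg", "-i", "in.wav", "-c:a", "aac",
                    "-q:a", "2", "-cutoff", "18000", "vbr.m4a"])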

anmr: This is an experimental coder which currently produces lower quality, is more unstable, and is slower than the default twoloop coder, but has potential. Not currently recommended.

fast: Worse at low bitrates (less than 64kbps), but better and much faster at higher bitrates.

aac_ms: Sets mid/side coding mode. Can be forced on for all bands using the value "enable", which is mainly useful for debugging, or disabled using "disable".

aac_is: Sets intensity stereo coding tool usage. Can be disabled for debugging by setting the value to "disable".

aac_pns: Uses perceptual noise substitution to replace low-entropy high-frequency bands with imperceptible white noise during the decoding process.

aac_tns: Enables the use of a multitap FIR filter which spans through the high-frequency bands to hide quantization noise during the encoding process, and is reverted by the decoder. As well as decreasing unpleasant artifacts in the high range, this also reduces the entropy in the high bands and allows more bits to be used by the mid-low bands.

aac_ltp: Enables the use of the long-term prediction extension, which increases coding efficiency in very low bandwidth situations such as encoding of voice or solo piano music, by extending constant harmonic peaks in bands throughout frames.

Use this in conjunction with -ar to decrease the samplerate.

aac_pred: Enables the use of a more traditional style of prediction where the spectral coefficients transmitted are replaced by the difference of the current coefficients minus the previous "predicted" coefficients. In theory and sometimes in practice this can improve quality for low- to mid-bitrate audio.

profile: Sets the encoding profile. aac_low is the default AAC "Low-complexity" profile.

It is the most compatible and produces decent quality. The long-term-prediction profile was introduced in MPEG4, and the main-prediction profile in MPEG2.

The AC-3 encoder is available in floating-point and fixed-point variants; this does not mean that one is always faster, just that one or the other may be better suited to a particular system. The AC-3 metadata options are used to set parameters that describe the audio, but in most cases do not affect the audio encoding itself.

Some of the options do directly affect or influence the decoding and playback of the resulting bitstream, while others are just for informational purposes. A few of the options will add bits to the output stream that could otherwise be used for audio data, and will thus affect the quality of the output. Those will be indicated accordingly with a note in the option list below.

per_frame_metadata: Allow Per-Frame Metadata.

Specifies if the encoder should check for changing metadata for each frame.

center_mixlev: Center Mix Level. The amount of gain the decoder should apply to the center channel when downmixing to stereo. This field will only be written to the bitstream if a center channel is present. The value is specified as a scale factor, and there are 3 valid values: 0.707 (-3.0 dB gain, the default), 0.595 (-4.5 dB), and 0.500 (-6.0 dB).

surround_mixlev: Surround Mix Level. The amount of gain the decoder should apply to the surround channel(s) when downmixing to stereo.

This field will only be written to the bitstream if one or more surround channels are present.

Audio Production Information is optional information describing the mixing environment. Either none or both of the fields are written to the bitstream.

mixing_level: Mixing Level. Specifies peak sound pressure level (SPL) in the production environment when the mix was mastered. Valid values are 80 to 111, or -1 for unknown or not indicated.

The default value is -1, but that value cannot be used if the Audio Production Information is written to the bitstream.

room_type: Room Type. Describes the equalization used during the final mixing session at the studio or on the dubbing stage. A large room is a dubbing stage with the industry standard X-curve equalization; a small room has flat equalization.

dialnorm: Dialogue Normalization. This parameter determines a level shift during audio reproduction that sets the average volume of the dialogue to a preset level. The goal is to match volume level between program sources. A value of -31dB will result in no volume level change, relative to the source volume, during audio reproduction.

Valid values are whole numbers in the range -31 to -1, with -31 being the default.

dsur_mode: Dolby Surround Mode. Specifies whether the stereo signal uses Dolby Surround (Pro Logic). This field will only be written to the bitstream if the audio stream is stereo.

original: Original Bit Stream Indicator.

Specifies whether this audio is from the original source and not a copy.
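Putting several of these metadata options together when encoding AC-3 (the values are illustrative; note that the two Audio Production Information fields are set together, as required):

    import subprocess

    subprocess.run([
        "ffmpeg", "-i", "in.wav", "-c:a", "ac3", "-b:a", "448k",
        "-center_mixlev", "0.595",   # -4.5 dB center downmix gain
        "-mixing_level", "100",      # production SPL (valid range 80 to 111)
        "-room_type", "large",       # X-curve dubbing stage
        "-dialnorm", "-27",          # average dialogue level
        "out.ac3",
    ])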


