Our text-based editing approach lays the foundation for better editing tools for movie post-production. Filmed dialogue scenes often require re-timing or editing in response to small script changes, which currently requires tedious manual work. Our editing technique also enables easy adaptation of audio-visual content to specific target audiences: e.g., instruction videos can be fine-tuned to audiences of different backgrounds, or a storyteller video can be adapted to children of different age groups, purely on the basis of textual script edits. In short, our work was developed for storytelling purposes.
However, the availability of such technology — at a quality that some might find indistinguishable from source material — also raises important and valid concerns about the potential for misuse. Although methods for image and video manipulation are as old as the media themselves, the risks of abuse are heightened when applied to a mode of communication that is sometimes considered to be authoritative evidence of thoughts and intents. We acknowledge that bad actors might use such technologies to falsify personal statements and slander prominent individuals. We are concerned about such deception and misuse.
Therefore, we believe it is critical that video synthesized using our tool clearly presents itself as synthetic. The fact that the video is synthesized may be obvious from context (e.g., if the audience understands they are watching a fictional movie), directly stated in the video, or signaled via watermarking. We also believe that it is essential to obtain permission from the performers for any alteration before sharing a resulting video with a broad audience. Finally, it is important that we as a community continue to develop forensics, fingerprinting, and verification techniques (digital and non-digital) to identify manipulated video. Such safeguarding measures would reduce the potential for misuse while allowing creative uses of video editing technologies like ours.
We hope that publication of the technical details of such systems can spread awareness and knowledge regarding their inner workings, sparking and enabling associated research into the aforementioned forgery detection, watermarking and verification systems. Finally, we believe that a robust public conversation is necessary to create a set of appropriate regulations and laws that would balance the risks of misuse of these tools against the importance of creative, consensual use cases.
@article{Fried:2019:TET:3306346.3323028,
  author     = {Fried, Ohad and Tewari, Ayush and Zollh\"{o}fer, Michael and Finkelstein, Adam and Shechtman, Eli and Goldman, Dan B and Genova, Kyle and Jin, Zeyu and Theobalt, Christian and Agrawala, Maneesh},
  title      = {Text-based Editing of Talking-head Video},
  journal    = {ACM Trans. Graph.},
  issue_date = {July 2019},
  volume     = {38},
  number     = {4},
  month      = jul,
  year       = {2019},
  issn       = {0730-0301},
  pages      = {68:1--68:14},
  articleno  = {68},
  numpages   = {14},
  url        = {http://doi.acm.org/10.1145/3306346.3323028},
  doi        = {10.1145/3306346.3323028},
  acmid      = {3323028},
  publisher  = {ACM},
  address    = {New York, NY, USA},
  keywords   = {dubbing, face parameterization, face tracking, neural rendering, talking heads, text-based video editing, visemes},
}