Monday, August 26, 2013

XOFilmmaker: To Mux or To Demux


Tonight I had some open time and managed to get back to my XOFilmmaker development work, which rocks.

Inspiration from Peru
I recently joined OLPC's support mailing list and it's been super interesting to see the chatter on there. I had a conversation with a teacher out in Peru, working with kids on the XO laptops. They said they are very excited to try XOFilmmaker when it is done, that they will "definitely use it", and that they really look forward to making movies with it!! Wow. Talk about the motivation I need to get back into the fray and bend the (increasingly complex) GStreamer library to my will! For the good of the kids, I *will* solve this!!!!!!!!

What were my goals tonight?
My goal for tonight was to jump back in whole-heartedly. I haven't had time for the past 2 months to work on this given my travel in Texas and SIGGRAPH conference talk, so I wanted to pick things up from where I left them back in June.

In June I had written some code which took in an Ogg video file filmed with an XO Laptop's webcam and then successfully spliced a portion of video A with a portion of video B, and then played that video back. Pretty good start. But, there is no audio attached, and I'm not sure why.

I have 2 goals which I was hoping to chip away at tonight, and I managed to make some progress.
  1. Figure out how to get it to play back sync'd audio along with the video.
  2. Write the file to disk, instead of playing it back on screen.
So, how did it go tonight?
At first, slooooooooooowly. I ran into a lot of problems and some roads that looked promising but led nowhere.

I spent a bunch of time digging into GStreamer and GNonLin more, but kept running into more walls.

I started with the attempt to write to disk... it didn't work on the first attempt, though I did get a 0-length file written to disk. So, I'm not exactly sure how to do this, but I probably need to remove the streamer from the playback UI and just run a command which does not sync to the UI. I suspect I need a "filesink" for this, not the playback "autovideosink". Anyway, I decided to leave this for another time and jump into figuring out the missing audio.

With every night like this, I end up pushing through the frustration of not getting anywhere and banging my head against the GStreamer API, making a few silly mistakes, and then eventually learning something and making some minor progress. Just need to keep at it and make sure I make time for lots of these evenings before the end of the year... then, just maybe, I'll be able to get this open source software out the door and into the hands of kids in Peru!

Let's look at where I started:

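import pygst
pygst.require("0.10")  # must run before "import gst"
import gst
import pygtk
pygtk.require("2.0")
import gtk
import gtk.glade

class Main:
    def __init__(self):

        # set up the glade file (the file name is missing from this listing)
        self.wTree = gtk.glade.XML("", "mainwindow")
        signals = {
            "on_play_clicked" : self.OnPlay,
            "on_stop_clicked" : self.OnStop,
            "on_quit_clicked" : self.OnQuit,
        }
        self.wTree.signal_autoconnect(signals)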

        # creating the pipeline
        self.pipeline = gst.Pipeline("mypipeline")

        # creating a gnlcomposition
        self.comp = gst.element_factory_make("gnlcomposition", "mycomposition")
        self.comp.connect("pad-added", self.OnPad)

        # create an audioconvert
        self.compconvert = gst.element_factory_make("audioconvert", "compconvert")

        # create an alsasink
        self.sink = gst.element_factory_make("alsasink", "alsasink")
        # create a gnlfilesource
        self.audio1 = gst.element_factory_make("gnlfilesource", "audio1")

        # set the gnlfilesource properties
        self.audio1.set_property("location", "/home/mjutan/high_1.ogg")
        self.audio1.set_property("start", 0 * gst.SECOND)
        self.audio1.set_property("duration", 5 * gst.SECOND)
        self.audio1.set_property("media-start", 0 * gst.SECOND)
        self.audio1.set_property("media-duration", 5 * gst.SECOND)

        caps = gst.Caps("audio/x-raw-float")
        self.filter = gst.element_factory_make("capsfilter", "filter")
        self.filter.set_property("caps", caps)
        # show the window
        self.window = self.wTree.get_widget("mainwindow")

    def OnPad(self, comp, pad):
        print "pad added!"
        convpad = self.compconvert.get_compatible_pad(pad, pad.get_caps())

    def OnPlay(self, widget):
        print "play"

    def OnStop(self, widget):
        print "stop"

    def OnQuit(self, widget):
        print "quitting"



I spent an impressive amount of time stuck on a Python error. Whoops.

I was trying to force the "type" (or "caps") of the format to be audio/x-raw-int, trying to get the audio layer of the Ogg file extracted and get that to play back.

I should explain: Ogg is just a container. And it contains 2 pieces in my case, a "Vorbis" audio stream, and a "Theora" video stream. I've already successfully extracted the Theora video stream and can splice that together, so that rocks. But I gotta figure out what the deal is with the audio.

As you can see here, the file I recorded with the XO webcam is a combo Ogg file which contains Theora and Vorbis together.

 [mjutan@localhost Downloads]$ ogginfo high_1.ogg
Processing file "high_1.ogg"...

New logical stream (#1, serial: 4428c820): type theora
New logical stream (#2, serial: 35df27fa): type vorbis
Vorbis headers parsed for stream 2, information follows...
Version: 0
Vendor: Xiph.Org libVorbis I 20120203 (Omnipresent)
Channels: 1
Rate: 16000

Nominal bitrate: 48.000000 kb/s
Upper bitrate not set
Lower bitrate not set
User comments section follows...
    ARTIST=Mike Jutan - San Francisco
    TITLE=Video by Mike Jutan - San Francisco
Theora headers parsed for stream 1, information follows...
Version: 3.2.1
Vendor: Xiph.Org libtheora 1.1 20090822 (Thusnelda)
Width: 400
Height: 300
Total image: 400 by 304, crop offset (0, 4)
Framerate 10/1 (10.00 fps)
Pixel aspect ratio 1:1 (1.000000:1)
Frame aspect 4:3
Colourspace unspecified
Pixel format 4:2:0
Target bitrate: 0 kbps
Nominal quality setting (0-63): 16
Vorbis stream 2:
    Total data length: 23413 bytes
    Playback length: 0m:05.767s
    Average bitrate: 32.472954 kb/s
Logical stream 2 ended
Theora stream 1:
    Total data length: 31776 bytes
    Playback length: 0m:05.900s
    Average bitrate: 43.086102 kb/s
Logical stream 1 ended
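
The "Ogg is just a container" idea can also be checked by hand. What follows is a minimal sketch, not what ogginfo itself does: it walks the Ogg pages in a file and, for every beginning-of-stream page, looks at the first bytes of the payload to guess the codec. The function name list_ogg_streams and the CODEC_MAGIC table are made up for this example, and only the Vorbis and Theora signatures are recognised.

```python
import struct

# Magic bytes that open the first packet of each codec's logical stream.
CODEC_MAGIC = {
    b"\x01vorbis": "vorbis",
    b"\x80theora": "theora",
}


def list_ogg_streams(data):
    """Return {serial_number: codec_name} for each logical stream in data."""
    streams = {}
    pos = 0
    while pos + 27 <= len(data):
        if data[pos:pos + 4] != b"OggS":
            raise ValueError("lost sync at byte %d" % pos)
        # Fixed page header after the "OggS" marker: version, flags,
        # granule position, stream serial, page number, CRC, segment count.
        (version, flags, granule, serial,
         sequence, crc, nsegs) = struct.unpack_from("<BBqIIIB", data, pos + 4)
        lacing = data[pos + 27:pos + 27 + nsegs]
        body_start = pos + 27 + nsegs
        body_len = sum(bytearray(lacing))
        if flags & 0x02:  # beginning-of-stream page
            body = data[body_start:body_start + body_len]
            codec = "unknown"
            for magic, name in CODEC_MAGIC.items():
                if body.startswith(magic):
                    codec = name
            streams[serial] = codec
        pos = body_start + body_len
    return streams
```

Fed the bytes of high_1.ogg, this should report two serial numbers, one tagged theora and one tagged vorbis, in line with the two logical streams ogginfo lists above. I have not run it against that file, so treat it as an illustration of the container layout rather than a verified tool.
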
OK, so where next? I got caught up on the syntax for the "Caps", trying to supply a similar arrangement to the code below in Python, which did not work at all.

There is a command-line version of GStreamer that you can access with gst-launch and that allows you to sort-of "prototype" out a streamer Pipeline on the command-line to see if it works before then trying to integrate it into Python. So that's pretty cool.

Problem is, I was getting supppper weird results from the gst-launch command shown further down. It was telling me that GStreamer was missing some plugins, which turned out to be a pretty awesome red herring/wild goose chase.

I also was silly and tried to set a caps value on something using this format:
self.audio1.set_property("caps", "audio/x-vorbis")

And I got a pretty clear TypeError from Python. I was so confused though that I went on a search trying to figure out what I was doing wrong. As it turns out, it was a simple type error and you need to create a Caps object, like this:
        caps = gst.Caps("audio/x-raw-float")
        self.filter = gst.element_factory_make("capsfilter", "filter")
        self.filter.set_property("caps", caps)

But, as it turns out, I may not even need that. D'oh.

Anyhow, this took me on a trip to attempt to install all the other gstreamer plugins, consider building gst-plugins-base myself, and a bunch of other things that didn't work. All because of this weird message about missing plugins.

gst-launch gnlfilesource name=video location=$PWD/high_1.ogg \
start=0 duration=5000000000 \
media-start=0 media-duration=5000000000 \
! identity single-segment=true ! progressreport update-freq=1 ! ffmpegcolorspace \
! theoraenc ! oggmux name=mux ! filesink location=$PWD/outputResult.ogg \
gnlfilesource name=audio caps="audio/x-raw-int" location=$PWD/high_1.ogg \
start=0 duration=5000000000 \
media-start=0 media-duration=5000000000 \
! identity single-segment=true ! audioconvert ! vorbisenc ! mux.

Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
ERROR: from element /GstPipeline:pipeline0/GnlFileSource:audio/GstURIDecodeBin:internal-uridecodebin/GstDecodeBin2:decodebin20: Your GStreamer installation is missing a plug-in.
Additional debug info:
gstdecodebin2.c(3576): gst_decode_bin_expose (): /GstPipeline:pipeline0/GnlFileSource:audio/GstURIDecodeBin:internal-uridecodebin/GstDecodeBin2:decodebin20:
no suitable plugins found
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
Freeing pipeline ...

Bah. So anyway, after that I managed to get through the Python errors and ended up with this pretty hard-to-understand error. It's basically saying that my audioconvert module is failing, and that sort of implied again that maybe I was missing a plugin. Hmm...

** Message: pygobject_register_sinkfunc is deprecated (GstObject)
pad added!
Traceback (most recent call last):
  File "", line 64, in OnPad
TypeError: argument 1 must be gst.Pad, not None
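
A guess at what that traceback means, since the line that raised it is not in the listing above: get_compatible_pad returns None when the element has no pad that fits the caps, and passing that None on to something like pad.link() would produce exactly this TypeError. The sketch below assumes that is what happened. The helper name link_new_pad is hypothetical; it only relies on the get_caps, get_compatible_pad and link methods that pads and elements already have.

```python
def link_new_pad(pad, element):
    """Link pad to element; return False if element has no compatible pad."""
    target = element.get_compatible_pad(pad, pad.get_caps())
    if target is None:
        # Nothing on the element accepts this pad's caps (for example a
        # video pad offered to audioconvert), so skip it instead of crashing.
        return False
    pad.link(target)
    return True
```

Inside OnPad this would be called as link_new_pad(pad, self.compconvert), and a False result at least says which pad was rejected instead of dying with a TypeError.
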

After more searching I finally came across this useful line. And enter the magical demuxer....

gst-launch filesrc location=$PWD/high_1.ogg ! oggdemux name="demuxer" \
  demuxer. ! queue ! vorbisdec ! audioconvert ! audioresample ! autoaudiosink \
  demuxer. ! queue ! theoradec ! ffmpegcolorspace ! autovideosink

I ran this line and... voila!!!!!!!! My video clip played in a viewer, WITH AUDIO!!! Huzzahh!!

So this is great news. I suspect the problem turned out to be that really I just need to essentially "unpack" the Vorbis audio stream from the Ogg container... I can't just "read the audio portion" from the Ogg file... though that seemed to work fine with the video portion. I guess I actually need to "demux" the Ogg file first, essentially unpacking it into its audio and video portions. From there, I should hopefully be able to then compile up the start and end time segments that I want, and then re-compile them into a new Ogg file ("muxing them") and save that out to a filesink on disk. And of course, I'll need to do that in Python.
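
The demux-then-mux plan can be written down as a pipeline description before any real Python integration. This is an untested sketch that only reuses element names from the two gst-launch commands in this post (oggdemux, theoradec, vorbisdec, theoraenc, vorbisenc, oggmux, filesink). The function name remux_description is made up, it does no splicing yet, and it assumes the paths contain no spaces.

```python
def remux_description(src, dst):
    """Build a gst-launch style description that rewrites src into dst."""
    parts = [
        # Unpack the Ogg container into its separate streams.
        "filesrc location=%s ! oggdemux name=demuxer" % src,
        # Video branch: decode Theora, re-encode, hand to the muxer.
        "demuxer. ! queue ! theoradec ! ffmpegcolorspace ! theoraenc"
        " ! queue ! mux.",
        # Audio branch: decode Vorbis, re-encode, hand to the muxer.
        "demuxer. ! queue ! vorbisdec ! audioconvert ! vorbisenc"
        " ! queue ! mux.",
        # The muxer packs both branches into one Ogg file on disk.
        "oggmux name=mux ! filesink location=%s" % dst,
    ]
    return " ".join(parts)
```

The returned string can be pasted after gst-launch on the command line to try it out, or handed to gst.parse_launch() from Python. Whether it survives contact with the XO's plugin set is something I would still have to check.
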

So after a lot of tripping over the wrong things, I feel like I managed to make some progress. The issue with my specific file is likely that I need to demux it first or something like that. I just tried an initial version of this in Python and I don't get any audio playing... but at least there are no failures.

One other odd thing is that this seems to work ok with the filesrc object. But I need to use the "gnlfilesource" object so I can splice it. That seems to hang up the pipeline for who knows what reason...
gst-launch gnlfilesource location=$PWD/high_1.ogg start=0 duration=5000 \
  media-start=0 media-duration=5000 ! oggdemux name="demuxer" \
  demuxer. ! queue ! vorbisdec ! audioconvert ! audioresample ! autoaudiosink
Setting pipeline to PAUSED ...
Pipeline is PREROLLING ...
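
A side note on the units, offered as a guess rather than a diagnosis: the gnlfilesource times are in nanoseconds (the Python listing multiplies by gst.SECOND, and the earlier command used 5000000000), so duration=5000 asks for 5 microseconds, not 5 seconds. It is also possible that gnlfilesource already decodes the file internally, since the earlier error message shows a uridecodebin inside it, in which case there would be nothing left for oggdemux to unpack. The small helper below only addresses the units. The name gnl_timing is hypothetical.

```python
NANOSECONDS_PER_SECOND = 10 ** 9  # same value as gst.SECOND


def gnl_timing(start_s, duration_s, media_start_s=0):
    """Format gnlfilesource timing properties from values given in seconds."""
    def ns(seconds):
        return int(round(seconds * NANOSECONDS_PER_SECOND))
    return ("start=%d duration=%d media-start=%d media-duration=%d"
            % (ns(start_s), ns(duration_s), ns(media_start_s), ns(duration_s)))
```

For a five second clip starting at zero, gnl_timing(0, 5) gives the same property string as the first gst-launch command in this post.
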

Maybe you have to have a gnlcomposition for this to work. Anyway, I should go to bed. 3am on a "school" night :)

Something to try to continue another night. Off to bed to dream of video editing in developing nations.

Mike :)
