How to make a cat want to leave a place

TL;DR: Burn a lot of incense. The cat won’t like the smoke and will leave voluntarily.

What actually happened: One of the cats (a particularly antisocial and uncooperative one) ripped apart the mosquito net at one of the basement windows and jumped inside to give birth to her kittens. She chose a particularly fluffy spot as her new nest: on top of some carpets we had stored there. It was also right next to the boiler of our oil-fired heating.

Leaving the cat there would probably mean that the carpets would eventually have to be thrown away, not to mention that the whole basement would smell bad. What’s worse: my parents had to leave for a few days. If they left the window open, the cat would be able to jump in and out, but so would the thieves. If they closed it, she’d be so upset that she might well rip everything apart. Furthermore, she was already quite ill-tempered before, and she became particularly aggressive when she had to protect her babies, so grabbing her to take her out by force was out of the question. We tried sprinkling some vinegar and even some bleach in safe places, hoping the smell would be unpleasant enough to drive her away, but it didn’t seem to work.

A friend suggested lighting a fire and letting the smoke drive the cat out. However, a room full of carpets and an oil boiler wasn’t exactly the safest place for a fire. I told my parents, and they thought of burning some incense instead. That made the cat move the babies into another room, but still inside the basement. Some progress: we had saved the carpets from the cat, but we’d still have to leave a way in and out for her – and the thieves.

However, that other room made it much easier to burn A LOT of incense, without having to put it in places where an upset cat might knock it over, and without any carpets underneath that could catch fire. In the end it worked: she was spotted outside the basement! The kittens weren’t seen or heard outside the basement, but they weren’t seen or heard inside either, so it’s safe to assume that she finally brought them outside too.

Bug of the day: 2018-04-11

I recently noticed that we had a very big memory leak. My first reaction was to run valgrind. However, valgrind was only showing me some loose change here and there, nothing serious. Just to be sure, I fixed the small things it found, but the memory usage kept increasing. What’s worse: memory usage was pretty much constant when running things under valgrind, but skyrocketed when running things normally. Pretty much the definition of a Heisenbug.

Some time later, I noticed that memory usage was sometimes also just fine when running things normally. Then I noticed that, in those cases, it would fail to create an output file, but wouldn’t bother informing you about it. That’s pretty much the definition of a Mad Girlfriend Bug.

I investigated the Mad Girlfriend Bug a bit more, and found that some element would sometimes fail to transition to PLAYING fast enough, causing it to return FLUSHING and therefore preventing any media data from flowing through the pipeline. It just happened that valgrind changed the timing so that this was always reproducible. That also explains why the memory leak wasn’t appearing under valgrind:

Roll Safe Think meme: You can't leak media data if it's just not flowing in the pipeline
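For those unfamiliar with the terminology, here is a minimal sketch (purely illustrative, not our code) of what “returning flushing” means at the code level: pushing data towards an element that is still flushing returns GST_FLOW_FLUSHING, and the media simply never gets any further.

#include <gst/gst.h>

/* Illustrative only: pushing a buffer to a pad whose downstream element is
 * (still) flushing returns GST_FLOW_FLUSHING, so the data goes nowhere. */
static GstFlowReturn
push_one_buffer (GstPad * srcpad, GstBuffer * buffer)
{
  GstFlowReturn ret = gst_pad_push (srcpad, buffer);

  if (ret == GST_FLOW_FLUSHING)
    GST_WARNING ("downstream is flushing, media data is going nowhere");

  return ret;
}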

In the meantime, I took a look at git diff and found what was causing the memory leak. I was using an appsink/appsrc combination. If you pull a buffer from the appsink and push it into the appsrc, it gets consumed by the appsrc. However, if you pull a sample from the appsink and push it into the appsrc, you need to unref it later. I ended up having to make it obvious in the documentation:

https://bugzilla.gnome.org/show_bug.cgi?id=795150
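For the record, here is a minimal sketch of the two ownership rules, assuming an appsink and an appsrc that already exist somewhere in the pipeline (this is not our actual code):

#include <gst/gst.h>
#include <gst/app/gstappsink.h>
#include <gst/app/gstappsrc.h>

/* Variant 1: push the buffer. gst_app_src_push_buffer() takes ownership of
 * the buffer, so we hand it its own reference and keep none ourselves. */
static void
forward_buffer (GstAppSink * appsink, GstAppSrc * appsrc)
{
  GstSample *sample = gst_app_sink_pull_sample (appsink);

  if (sample == NULL)
    return;                     /* EOS or flushing */

  gst_app_src_push_buffer (appsrc,
      gst_buffer_ref (gst_sample_get_buffer (sample)));
  gst_sample_unref (sample);
}

/* Variant 2: push the whole sample. gst_app_src_push_sample() does NOT
 * consume the sample, so forgetting the unref below is exactly the leak
 * described in this post. */
static void
forward_sample (GstAppSink * appsink, GstAppSrc * appsrc)
{
  GstSample *sample = gst_app_sink_pull_sample (appsink);

  if (sample == NULL)
    return;

  gst_app_src_push_sample (appsrc, sample);
  gst_sample_unref (sample);
}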

Categories of cyclists in Thessaloniki (and not only there)

The Pirate Ship of Captain Jim

Look, the pirate ship of Captain Jim
has its side-lights backwards when it’s shipping
it will take a thousand people tripping
’cause it’s filled with hashish to the brim
(N. Kavvadias, “Armida”)

These cyclists take special care to have the red light at the front and the white one at the back. So, you see them in your lane and wonder who this distracted person riding on the wrong side is, or (even worse) you see them in the opposite lane and think you still have time, until they suddenly appear in front of you. I won’t give any further details at the moment, because they always show up when you least expect them, so you don’t have time to see much.

The Ninja

These are the cyclists who have taken on a secret mission from Splinter or something, under the condition that they absolutely must not be seen; if by any chance someone sees them, they have to pay with their lives. So, the Ninjas always cycle at night without any lights on, they wear black clothes, and they even paint their faces and hands black. If you do see them, you’ll see them at the last moment, because they forgot to cover the metallic parts of their bike with a cloth and those shone a bit under the streetlights. The advantage is that, if they do indeed need to sacrifice their life, it will look like an accident.

“Look, ma, no teeth!”

Unlike the Ninja, the “No teeth” can be detected even from a satellite, thanks to their sigmoid trajectory. They have their hands off the handlebars and their brain busy. They might be riding in the middle lane of the avenue in the middle of the day while playing Pokémon Go, they might be taking romantic selfies while cycling into the sunset, or they might just be eating peanuts, waiting for a chance to scatter them on the asphalt when they suddenly need to brake. [Disclaimer: The aforementioned examples are not a product of my imagination; any resemblance to real events and situations is, unfortunately, not a coincidence.]

The clown car

Last time I looked at the regulations, the maximum number of people allowed in an average car was 5. As we very well know, however, this doesn’t prevent much larger groups of circus clowns from squeezing inside one. Similarly, we can see groups of two or even three cyclists riding the same bike, without hesitating to go out on the avenue. Of course, they can’t go fast at all. If you need to overtake them, you should keep three times the normal distance, because it’s never a good idea to be close to a moving object of mass-ive stupidity.

The Usain Bolt

These are the cyclists who believe that the crowded promenade bike lane during peak hours is the most suitable place to train for the Tour de France. They’re always riding a racing bike, wearing colourful lycra clothes and sunglasses, carrying a water bottle, and reaching a speed that might even exceed the 50 km/h limit for built-up areas. The slaloms they perform in order to overtake are something that even professional skiers would deeply admire. According to eyewitnesses, when they want to overtake, they’re even capable of cycling on the water of any adjacent river or sea.

The White Rabbit

This is a variation of the Usain Bolt, with the difference that the White Rabbits aren’t going fast for sport, but because they’re in a hurry. You’ll see them on an average bike and in everyday clothing, sweating to break the sound barrier in order to make it on time for a lesson, a shop, or a train. A necessary part of their equipment is the backpack, where they carry their books, purchases, or weekend clothes respectively. If by any chance you find a White Rabbit without a backpack, it means they’re rushing to some appointment or date, without minding that their significant other will see them drenched in sweat.

The Paganist

This category thinks that, just like Hercules obtained the strength of the Nemean lion by wearing its skin, they’ll obtain the speed of the Usain Bolt by wearing the same clothes and riding the same bike. The speed they obtain partially justifies the expense, because they can go twice as fast as a five-year-old on a trike outside the bike lane. The only thing that Paganists like more than cycling is discussions about cycling. So, they gather with like-minded people on benches, trying to figure out who managed to find the smallest quality-to-price ratio for their bike and equipment. However, they only seem to care about the denominator of the fraction; the numerator usually doesn’t interest them.

The religious procession

In this case, there are at least two bikes leading the rest of the group. These bikes always go side by side, in order to block the whole width of the bike lane, and their speed is painfully low. You see their mouths moving, and it only makes sense that what can be heard are religious chants. The rest of the group follows them, slowly keeping up with the procession and praying, except they can’t hold candles because the wind would blow them out. At the end of the procession you can see some annoyed people following at the same speed, while looking at their calendars and trying to figure out whether it’s maybe St Patrick’s day today.

Honourable mention: The taxi driver’s brother

In a well-known joke (at least in Greece), a taxi driver proudly declares that “my brother and I always run the red lights”. At the very least, he is considerate enough to stop at the green light, so as not to crash into his brother, in case he’s running the red light from the other side. A simple glance at the city streets will reveal that the family in question has numerous members, and they’re not all as prudent as the taxi driver and his brother, which means they don’t even stop at the green light. Unfortunately, one will notice that these family members don’t restrict themselves to cycling; they’ll often be found driving a motorcycle or a car. Therefore, they won’t be included in the official categories, but will be limited to an honourable mention.

Bug of the day: 2017-10-13

I made some changes near the source of the pipeline, essentially processing some metadata. The result: the RTSP preview worked, but the files written were 0-sized. Well, that sounds like cleaning up your kitchen and causing your bathroom sink to start leaking, except the bathroom is on the other side of the apartment, and your shower is working just fine.

I traced it back to a few lines of code that were supposed to be a small improvement, and harmless at worst. They were about buffer meta, which is just what it says on the box: metadata that travels together with a buffer. Timecodes in GStreamer are such metadata. Another one is the “reference timestamp”: it has some caps, which just act as a label, and a timestamp. Our buffers used to carry only one reference timestamp: stream time. Recently we had also added hardware time. (Don’t try to understand what those mean, they’re just labels for what it’s worth in this context.) When getting the reference timestamp for a buffer, I used to just leave the caps (the label) NULL, meaning “give me whichever comes first”. But now that we also had hardware time, I had replaced the NULL with a proper caps description. With NULL it worked, I could create files. With the proper caps description, I couldn’t. That sounds like going to a bar, getting a beer when you ask for a drink, but getting nothing when you ask for a beer. Makes no sense, does it?
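To make the above a bit more concrete, here is a hedged sketch using the GstReferenceTimestampMeta API as it exists in current GStreamer; the caps strings and timestamp values are made up for illustration, not the ones from our code:

#include <gst/gst.h>

/* Illustrative only: each reference timestamp meta is basically a
 * (caps label, timestamp) pair attached to the buffer. */
static void
reference_timestamp_demo (GstBuffer * buffer)
{
  GstCaps *stream_caps = gst_caps_new_empty_simple ("timestamp/x-stream-time");
  GstCaps *hw_caps = gst_caps_new_empty_simple ("timestamp/x-hardware-time");
  GstReferenceTimestampMeta *meta;

  /* Attach two reference timestamps with different labels (buffer must be
   * writable for this to work). */
  gst_buffer_add_reference_timestamp_meta (buffer, stream_caps,
      1 * GST_SECOND, GST_CLOCK_TIME_NONE);
  gst_buffer_add_reference_timestamp_meta (buffer, hw_caps,
      2 * GST_SECOND, GST_CLOCK_TIME_NONE);

  /* NULL means "give me whichever reference timestamp comes first"... */
  meta = gst_buffer_get_reference_timestamp_meta (buffer, NULL);
  g_print ("first: %" GST_TIME_FORMAT "\n", GST_TIME_ARGS (meta->timestamp));

  /* ...while a specific label only matches a meta carrying those caps. */
  meta = gst_buffer_get_reference_timestamp_meta (buffer, stream_caps);
  g_print ("stream: %" GST_TIME_FORMAT "\n", GST_TIME_ARGS (meta->timestamp));

  gst_caps_unref (stream_caps);
  gst_caps_unref (hw_caps);
}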

I started tracing the buffers back to see where the problem started. The muxer wasn’t outputting any data. It received video, but it didn’t receive audio. Hmm, suspicious. It kind of makes sense though: the RTSP preview is video-only. We were getting notifications about audio levels though, so something must have gotten lost along the way. Weird. I tried to see where the audio buffers got lost, and it turned out to be at the exact spot where the aforementioned lines of code were. Okay, now it’s at least really starting to make sense. That part was synchronising audio with video, though. It wasn’t supposed to eat bread with buffers instead of bread with butter. So what’s wrong here?

A small investigation showed that, while audio buffers had both stream and hardware reference timestamps, video buffers had only hardware reference timestamps at that point. I tried passing NULL to see what would happen, and the log file happily reported “I found the first reference timestamp for both audio and video. It’s a hardware timestamp in both cases!”… Well, okay, at least synchronisation worked in that case, but it was pure coincidence. I then looked at the code that added the reference timestamps and, surprise of all surprises, it was identical for both audio and video. I even added an assertion right after it to make sure that both existed for video. They did. But my video buffers genuinely had no stream reference timestamps by the time I was trying to get that information. So where does the meta go?

I took a look at my pipeline, to see if there were any elements that would delete the meta. There weren’t. The only element that did anything with meta was timecodestamper: it was configured to remove any existing timecode meta and insert its own. It seemed a bit far-fetched to imagine that it would remove the reference timestamp meta as well, but everything else made even less sense, so I gave it a try. There was a function that walked through all the metas and removed the timecode one. I checked it, but nope, both reference timestamps were still intact after its call. What I did find, though, made even less sense: the function that added a timecode meta was removing one of the two other metas!
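For context, here is a sketch (not timecodestamper’s actual code) of what “a function that walks through all the metas and removes the timecode one” looks like with the public API:

#include <gst/gst.h>
#include <gst/video/video.h>

/* gst_buffer_foreach_meta() walks the buffer's metas; setting *meta to NULL
 * drops only the matching one, leaving e.g. reference timestamp metas alone. */
static gboolean
drop_timecode_meta (GstBuffer * buffer, GstMeta ** meta, gpointer user_data)
{
  if ((*meta)->info->api == GST_VIDEO_TIME_CODE_META_API_TYPE)
    *meta = NULL;               /* remove this meta from the buffer */
  return TRUE;                  /* keep iterating over the other metas */
}

static void
remove_existing_timecodes (GstBuffer * buffer)
{
  gst_buffer_foreach_meta (buffer, drop_timecode_meta, NULL);
}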

God enlightened me to talk to Sebastian at that point. He said “oh, Mathieu fixed that one a few days ago”:
https://bugzilla.gnome.org/show_bug.cgi?id=788617

It was essentially a bug in a homegrown linked list implementation, and it took him a lot of head scratching and hair pulling to figure it out. It manifested at a completely different place, too. That bug had existed since something like 2012 and nobody had ever run into it. Until Mathieu did, a few days ago. And now I did too. Think about the odds of that.

Bug of the day: 2017-03-30

We had a pipeline that would capture 4K video, convert it using a high-quality but low-compression-ratio codec, mux it into a .mov container together with raw audio, and write it to a file on a networked drive. The drive is fast enough to easily make use of the 10 Gbit Ethernet card.

One 4K session was running fine. Two 4K sessions were freezing. CPU and memory usage were fine – nothing that seemed to cause things to lock up. I tried taking pipeline dumps during the freeze: the queue right before the encoder was full. It was 600 ms deep and qtmux shouldn’t need more than that for interleaving, so why was the queue getting full? So I examined the timestamps of all incoming buffers. It turned out that audio and video were off by more than 600 ms… but WHY? (Thursday night is here)
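For reference, here is an illustrative sketch (not the real pipeline code) of a queue limited to 600 ms like the one before the encoder; only the time limit is set, so it reports “full” as soon as it holds 600 ms worth of data:

#include <gst/gst.h>

/* Illustrative only: a queue limited purely by time, not by buffer count
 * or byte size. */
static GstElement *
make_600ms_queue (void)
{
  GstElement *queue = gst_element_factory_make ("queue", NULL);

  g_object_set (queue,
      "max-size-time", (guint64) (600 * GST_MSECOND),
      "max-size-buffers", 0u,   /* no limit on the number of buffers */
      "max-size-bytes", 0u,     /* no limit on bytes */
      NULL);

  return queue;
}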

Now some information on how qtmux works: raw audio has no timestamps; qtmux just assumes that it progresses at the given rate. If there’s a gap, the data that comes afterwards simply gets squeezed in, essentially being assigned earlier timestamps and messing up A/V sync in the process.

So, keeping the above in mind, we noticed that several chunks of audio and/or video were going missing, and that was what was causing the queue to get full. But why would they go missing? (Friday night is here)

Examining the data as it came from the capture card, we noticed that the callback that fed us the audio/video data sometimes simply forgot to be called. It would get called at very regular intervals until BOOM, a spike of maybe half a second or even a whole second happened. What’s worse, the drivers have no internal buffering: if you take too long to process the incoming data, you lose the next frame. We thought the scheduler might be messing things up, but setting the capture thread to real-time priority didn’t seem to have any effect either. The thread itself was doing nothing much, just putting the data into a buffer and leaving, so it wasn’t spending more than 1 ms in there. No locking or anything else was happening. CPU usage was still OK and no queues were really full at that point, so what could be wrong? We had a wild idea: does it run out of buffers in the pool? YES! But why? (Monday night is here)

On Tuesday we examined the situation with the buffer pool, looking at pipeline graphs and processing log files. It turned out that the queue right before the file sink was getting full first. After this, the queues between encoder and muxer would also get full, and then the queue before the encoder as well. Everything up to there was zero-copy in my tests, so that’s what caused the capture card drivers to run out of buffers in their pool. Eventually the situation downstream of the encoder would slowly stabilise, but by then it was too late, because we already had the gap that made everything explode. But why on Earth would the file sink refuse to write any further buffers? That was Tuesday. I tried several things, but nothing really seemed to help.

Yesterday we also enlisted a colleague, and he examined the system parameters in /proc etc. He optimised a lot of things – for example, the throughput with “dd” pretty much doubled. However, even with all these changes the situation got better, but the problem didn’t go away.

Today (Thursday) I examined some graphs of how long the writev() call took versus how long it took for it to be called again. It turned out that writev() could take up to half a second! Then it would get called again almost immediately. After a quick search, I ran into this issue here: https://github.com/netty/netty/issues/5360 We had already tried increasing the buffering in the filesink earlier, but it had no real effect. It turned out we weren’t using proper values: even the 4 MB we tried is probably smaller than a single video frame. So I gave it some thought and then set the filesink buffer size to 128 MB (it won’t actually write the data into the file before that much has accumulated). AND IT WORKS!!!
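For the record, here is a sketch of the final workaround, assuming a standard GStreamer filesink element (the variable name is just illustrative):

#include <gst/gst.h>

static void
enlarge_filesink_buffer (GstElement * filesink)
{
  /* 128 MB: nothing is actually written out to the file before that much
   * has accumulated, so writev() gets big chunks at long intervals instead
   * of being hammered with many tiny ones. */
  g_object_set (filesink, "buffer-size", (guint) (128 * 1024 * 1024), NULL);
}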

Bug of the day: 2016-12-22

Somewhere, I was receiving an integer and had to return a filename based on it, where the integer was a sequence number, e.g. 0, 1, 2, etc. I needed to use that integer in a callback, so I was passing it as user_data with GUINT_TO_POINTER (a macro that simply casts the integer to a pointer) and getting it back with GPOINTER_TO_UINT on the other side. The first file was written normally, but the second failed. I tried to find out why and saw this line in the callback:

MyElement *self = MY_ELEMENT (user_data);

where “self” wasn’t even needed in that callback; I had just added it out of habit, because usually user_data is self. The MY_ELEMENT () cast does have some sanity checks, so the first time it worked because user_data was just 0x0 – so self was NULL, alright. The second time, self became (MyElement *) 0x1, the sanity check failed, and kaboom.
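To illustrate the pattern (the signal name and the callback signature below are made up, not our real code): the sequence number travels through user_data, so treating user_data as an object pointer blows up as soon as the number isn’t 0.

#include <gst/gst.h>

/* Illustrative only: the integer index arrives via user_data. */
static void
on_need_filename (GstElement * element, gpointer user_data)
{
  guint index = GPOINTER_TO_UINT (user_data);   /* correct: 0, 1, 2, ... */

  /* MyElement *self = MY_ELEMENT (user_data); */
  /* ^ wrong: user_data is an integer, not an object; the cast's sanity
   *   check only "passes" for 0 (NULL) and explodes for 0x1. */

  g_print ("file number %u\n", index);
}

static void
connect_filename_callback (GstElement * element, guint file_index)
{
  /* Pass the sequence number itself as user_data. */
  g_signal_connect (element, "some-signal",
      G_CALLBACK (on_need_filename), GUINT_TO_POINTER (file_index));
}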

Bug of the day: 2016-12-14

The previous day, I had been looking at the debug logs and noticed a general teardown of the pipeline (setting the state to NULL, removing elements, etc.) whenever something crashed (e.g. wrong parameters => failed to link). I thought this absolutely shouldn’t happen and went to fix it. I could reproduce the pipeline crashing completely, but not the teardown. In any case, I pushed some fixes to make it fail more gracefully and went to sleep.

The next day, the pipeline started crashing for no apparent reason while I was trying to debug something else. I opened the log and, sure enough, there was a teardown. I also thought I could see some “child process killed” debug output on stdout, but that was running under screen, and “how do I scroll under screen?” soon turned into “meh, that must have been just my imagination, I can see the teardown here – plus I’m not getting a core dump”. I tried to track down where the teardown could come from, and it turned out to come from a user command. However, my colleague insisted that the pipeline was simply crashing.

Upon closer inspection, I noticed two processes doing a teardown simultaneously. I asked my colleague and he said “yes, I was running four RTSP previews at once, but I was only recording to a file on one channel”. I tried to see which process had the filesink in it, and noticed it was yet another process, whose last log line was something completely normal and non-erroneous. At that point I scrolled up and, sure enough, the “child process killed” message was there. It was even signal 9 (SIGKILL), which explained the absence of a core dump.

What had really happened: I was already debugging something else, so I had set the debug log level to a high value. My colleague was running four channels at once, which means four different RTSP sessions, each encoding its stream separately on a CPU encoder. When the time came for one of these four sessions to start encoding again in order to write to the file, the machine simply couldn’t handle the load; the child process took longer than the configured timeout to respond and ended up getting killed as “probably deadlocked”.

Ποδηλατώντας στη Θεσσαλονίκη – Cycling in Thessaloniki

Κάποιες μέτριες (όχι ιδιαίτερα άσχημες) διαδρομές στη Θεσσαλονίκη με το ποδήλατο. Η ημερομηνία και η ώρα κάτω αριστερά είναι λάθος, δεν είχα ρυθμίσει την ώρα στην κάμερα.

Some average (not really bad) cycling trips in Thessaloniki. The date and time at the bottom left are wrong, I hadn’t configured the time on the camera.

 

Άδεια χρήσης – License:

Μουσική – Music:

River Valley Breakdown Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/

Βίντεο – Video:

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0
International License: http://creativecommons.org/licenses/by-sa/4.0/