Bug 1074051

Summary: Deadlock in thunar 1.6.13-124.1
Product: [openSUSE] openSUSE Tumbleweed
Reporter: Markus Elfring <Markus.Elfring>
Component: Xfce
Assignee: Markus Elfring <Markus.Elfring>
Status: RESOLVED UPSTREAM
QA Contact: E-mail List <qa-bugs>
Severity: Normal
Priority: P5 - None
CC: seife, vinz
Version: Current
Target Milestone: ---
Hardware: x86-64
OS: SUSE Other
URL: https://bugzilla.xfce.org/show_bug.cgi?id=14122
Whiteboard:
Found By: ---
Services Priority:
Business Priority:
Blocker: ---
Marketing QA Status: ---
IT Deployment: ---

Description Markus Elfring 2017-12-24 10:40:30 UTC
I have noticed once more that the windows of the application “thunar 1.6.13-124.1” no longer respond as expected. Their status bars display the message “Ordnerinhalt wird geladen …” (“Loading folder contents...”).

elfring@Sonne:~> ps -ef|grep Thunar
elfring   3346  3336  0 10:49 tty2     00:00:02 Thunar --sm-client-id 2e621412a-b89a-4db2-9ba2-bd3cc0e1d1ce --daemon
elfring@Sonne:~> strace -p 3346
strace: Process 3346 attached
futex(0x55df5419db28, FUTEX_WAIT_PRIVATE, 2, NULL

How will the affected data synchronisation be fixed?
Comment 1 Markus Elfring 2017-12-24 11:56:37 UTC
The directory contents are correctly displayed if I start another instance of the application “thunar 1.6.13-124.1”.

But a software component got stuck after I logged in to this Xfce session.

elfring@Sonne:~> gdb -p 3346
…
(gdb) bt
#0  0x00007fbafaec7fd9 in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007fbafb43febc in g_mutex_lock_slowpath (mutex=mutex@entry=0x55df5419db28) at gthread-posix.c:1313
#2  0x00007fbafb440702 in g_mutex_lock (mutex=mutex@entry=0x55df5419db28) at gthread-posix.c:1337
#3  0x000055df5230aa49 in thunar_thumbnailer_queue_async_reply (proxy=0x55df53804550, handle=2, error=0x0, user_data=0x55df53eb63a0) at thunar-thumbnailer.c:617
#4  0x000055df5230a63e in thunar_thumbnailer_proxy_queue_async_callback (proxy=0x55df53804550, call=<optimized out>, user_data=0x55df5418e790) at ../thunar/thunar-thumbnailer-proxy.h:36
#5  0x00007fbafbcc98a2 in complete_pending_call_and_unlock (connection=connection@entry=0x55df534fa1e0, pending=0x55df543e4da0, message=message@entry=0x55df543e4310) at dbus-connection.c:2331
#6  0x00007fbafbccd21f in dbus_connection_dispatch (connection=connection@entry=0x55df534fa1e0) at dbus-connection.c:4652
#7  0x00007fbafbf124f5 in message_queue_dispatch (source=source@entry=0x55df534fc090, callback=<optimized out>, user_data=<optimized out>) at dbus-gmain.c:90
#8  0x00007fbafb3faf97 in g_main_dispatch (context=0x55df534818b0) at gmain.c:3148
#9  0x00007fbafb3faf97 in g_main_context_dispatch (context=context@entry=0x55df534818b0) at gmain.c:3813
#10 0x00007fbafb3fb1d0 in g_main_context_iterate (context=0x55df534818b0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3886
#11 0x00007fbafb3fb4e2 in g_main_loop_run (loop=0x55df54142fc0) at gmain.c:4082
#12 0x00007fbafd308ad7 in gtk_main () at /usr/lib64/libgtk-x11-2.0.so.0
#13 0x000055df522b3799 in main (argc=<optimized out>, argv=<optimized out>) at main.c:312
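
The backtrace above shows only the waiting thread: frames #2 and #3 block in g_mutex_lock on the mutex at 0x55df5419db28. It does not show who holds that mutex. As a minimal sketch of how this information could be made visible, the following Python model wraps a lock so that it remembers its holder and waits with a timeout instead of forever. The names OwnerTrackingLock and try_lock_from_worker are hypothetical and exist only for illustration; they are not part of Thunar or GLib, and the sketch does not claim to reproduce the actual cause of this hang.

```python
import threading


class OwnerTrackingLock:
    """Hypothetical debugging wrapper: a plain lock that remembers its holder."""

    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None  # name of the thread currently holding the lock

    def acquire(self, timeout=1.0):
        # Wait for a bounded time so that a stuck waiter can report something.
        if not self._lock.acquire(timeout=timeout):
            return False
        self.owner = threading.current_thread().name
        return True

    def release(self):
        self.owner = None
        self._lock.release()


def try_lock_from_worker(lock, timeout=0.2):
    """Try to take the lock in a second thread.

    Returns (acquired, holder), where holder is the owner name that the
    worker observed after its attempt timed out, or None on success.
    """
    result = {}

    def worker():
        ok = lock.acquire(timeout=timeout)
        result["acquired"] = ok
        result["holder"] = None if ok else lock.owner
        if ok:
            lock.release()

    thread = threading.Thread(target=worker, name="worker")
    thread.start()
    thread.join()
    return result["acquired"], result["holder"]


if __name__ == "__main__":
    lock = OwnerTrackingLock()
    lock.acquire()                     # the calling thread holds the lock
    print(try_lock_from_worker(lock))  # the worker times out and names the holder
    lock.release()
```

For the real process, the comparable information would have to come from the backtraces of all threads rather than from the single thread shown here.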
Comment 2 Markus Elfring 2017-12-24 14:06:45 UTC
(In reply to comment #0)

For comparison, background information from a working instance of Thunar looks like the following.

elfring@Sonne:~> strace -p 12852
…
restart_syscall(<... resuming interrupted poll ...>) = 0
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=18, events=POLLIN}], 5, 44991
…
elfring@Sonne:~> gdb -p 12852
…
(gdb) bt
#0  0x00007f76dea1ff2b in __GI___poll (fds=0x556c9cf47450, nfds=5, timeout=44975) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007f76def58149 in g_main_context_poll (priority=<optimized out>, n_fds=5, fds=0x556c9cf47450, timeout=<optimized out>, context=0x556c9cb6e940) at gmain.c:4187
#2  0x00007f76def58149 in g_main_context_iterate (context=0x556c9cb6e940, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3881
#3  0x00007f76def584e2 in g_main_loop_run (loop=0x556c9ced8cf0) at gmain.c:4082
#4  0x00007f76e0e65ad7 in gtk_main () at /usr/lib64/libgtk-x11-2.0.so.0
#5  0x0000556c9c334847 in main (argc=<optimized out>, argv=<optimized out>) at main.c:312
Comment 3 Stefan Seyfried 2017-12-25 19:03:26 UTC
Since we have no relevant patches in Thunar, you need to report this upstream in order to get it fixed. There is no thunar-specific knowledge among the openSUSE Thunar maintainers.

Please add the number of the upstream bug, so that we can monitor that for a fix.
Comment 4 Markus Elfring 2017-12-25 19:22:17 UTC
(In reply to Stefan Seyfried from comment #3)

* What do you think about forwarding my bug report to the other issue tracker?

* Will this description lead to better quality assurance for a software component like “thunar-thumbnailer”?
Comment 5 Stefan Seyfried 2017-12-25 21:45:30 UTC
I cannot reproduce it, I have no gdb backtrace, so it's pretty useless if I play proxy.
Comment 6 Markus Elfring 2017-12-26 10:23:29 UTC
(In reply to Stefan Seyfried from comment #5)
> I cannot reproduce it,

You might need to try a bit harder. Other development tools might make it easier to find the cause of such a programming mistake.
https://en.wikipedia.org/wiki/Deadlock_prevention_algorithms


> I have no gdb backtrace,

I provided one in my report (see comment 1).


> so it's pretty useless if I play proxy.

I got this software from an openSUSE package. So I assume that there could be more possibilities to consider around the openQA system.

What do you think about experimenting with deadlock detection agents (for example)?
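
To make the idea of deadlock detection more concrete, here is a minimal sketch of one common technique: searching a wait-for graph for a cycle. It assumes that each blocked thread waits for exactly one other thread (the current holder of the lock it wants). The function name find_deadlock_cycle and the input format are hypothetical; this is not an existing openQA or Thunar facility.

```python
def find_deadlock_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    waits_for maps each thread name to the thread it is waiting on.
    A thread that is not blocked is absent or mapped to None.
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the single outgoing edge until the chain ends or repeats.
        while node is not None and node not in seen:
            seen.append(node)
            node = waits_for.get(node)
        if node is not None:
            # node was reached twice: the cycle begins at its first occurrence.
            return seen[seen.index(node):]
    return None


if __name__ == "__main__":
    # Two threads waiting on each other form a cycle.
    print(find_deadlock_cycle({"A": "B", "B": "A"}))
    # A thread waiting on a lock that it holds itself is a cycle of length one.
    print(find_deadlock_cycle({"main": "main"}))
    # A chain that ends in a running thread is not a deadlock.
    print(find_deadlock_cycle({"A": "B", "B": None}))
```

A real detector would need a way to obtain the wait-for relation from the running program, which is the hard part and is not shown here.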
Comment 7 Markus Elfring 2017-12-26 13:11:00 UTC
(In reply to Stefan Seyfried from comment #5)

Would you like to clarify the software situation around the message “Loading folder contents...” further?

Do the implementations of the functions “thunar_standard_view_view_init” and “thunar_standard_view_set_loading” contain details for further development considerations?
https://git.xfce.org/xfce/thunar/tree/thunar/thunar-standard-view.c?id=89de3451891c36b1b86cb626de820d11e0f4bd9e#n1658


Which events could disturb (or block) the data processing for thumbnails?
Comment 8 Markus Elfring 2017-12-26 14:00:39 UTC
(In reply to comment #1)

How often can the macro “_thumbnailer_lock” be reached in the implementation of the function “thunar_thumbnailer_queue_async_reply” before the corresponding statement “_thumbnailer_unlock” has been completely executed?
https://git.xfce.org/xfce/thunar/tree/thunar/thunar-thumbnailer.c?id=89de3451891c36b1b86cb626de820d11e0f4bd9e#n265
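
The question matters because of how a non-recursive lock behaves when the same thread reaches the lock call a second time before unlocking. The following Python sketch illustrates that general behaviour only. It assumes that the lock macro ends up in g_mutex_lock (as frame #3 of the backtrace in comment 1 suggests) and models it with threading.Lock; the helper nested_acquire is a hypothetical name, and a timeout is used so that the demonstration returns instead of hanging.

```python
import threading


def nested_acquire(lock, timeout=0.2):
    """Take `lock`, then try to take it again on the same thread.

    Returns True if the second (nested) acquisition succeeded.
    """
    lock.acquire()
    try:
        second = lock.acquire(timeout=timeout)
        if second:
            lock.release()  # undo the nested acquisition
        return second
    finally:
        lock.release()  # undo the outer acquisition


if __name__ == "__main__":
    # A plain lock cannot be taken twice by the same thread.
    print(nested_acquire(threading.Lock()))
    # A re-entrant lock counts nested acquisitions by its owner.
    print(nested_acquire(threading.RLock()))
```

Without the timeout, the nested call on the plain lock would wait forever, which resembles the futex wait shown in the description. Whether re-entrant execution is what actually happens in Thunar is not established by this sketch.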
Comment 9 Stefan Seyfried 2017-12-26 14:18:46 UTC
Just to make one thing clear:
There are two people doing regular XFCE packaging and maintenance for openSUSE now.
I (and I guess Takashi, too) do not have free resources to discuss your ideas on software quality, design and openQA.

Your questions are implying we (Takashi and me) are doing something wrong or show a lack of interest. That's not the case, even if the two of us cannot single-handedly make the world a better place and fix all your software problems.

I personally take offense at your attitude.

I do not like to clarify _anything_. I'm packaging XFCE in my free time, I'm not going to dig deeper in every piece of software you think you found a bug in. That's something you need to do yourself, especially after you have shown the capability to do so. The same is true for your question in comment 8: please read the code, I will not do this for you.

Please ask these questions upstream (or even better, ask useful questions, in a much less accusing tone, this might increase your chances of success).

If you want more openQA work for XFCE -- sure, go ahead and do it! openSUSE is a project where everyone can contribute, but no one can demand others do your work.
Comment 10 Markus Elfring 2017-12-26 15:22:17 UTC
(In reply to Stefan Seyfried from comment #9)
> There are two people doing regular XFCE packaging and maintenance
> for openSUSE now.

Thanks for another bit of background information.


> Your questions are implying we (Takashi and me) are doing something wrong or
> show a lack of interest.

I find it interesting that you interpret my feedback in this direction.
But our limited development capacity will influence the selection of possibilities which will get another look.


> I personally take offense at your attitude.

I hope that our dialogue can become more constructive again.


> The same is true for your question in comment 8:
> please read the code, I will not do this for you.

I hope that SUSE contributors can share a bit more of their software experience.
Comment 11 Markus Elfring 2017-12-26 16:43:51 UTC
(In reply to Stefan Seyfried from comment #3)

Would you like to track any progress on the bug report “Deadlock during loading of folder contents”?
https://bugzilla.xfce.org/show_bug.cgi?id=14122
Comment 12 Stefan Seyfried 2018-02-18 01:46:11 UTC
https://build.opensuse.org/request/show/577658

I'm updating Thunar to the current maintenance release 1.6.14.
Unfortunately I did not see anything in the changelog / diff that looks like it might solve this issue.
Comment 13 Markus Elfring 2018-03-08 09:10:45 UTC
(In reply to Stefan Seyfried from comment #12)

* Do you observe the message “Loading folder contents...” also in your test environment (as it happens for me again with the software “Thunar 1.6.14-128.3” for the directory “/home/elfring/” at the moment)?

* Did this issue become reproducible more often?


Other directories which are offered in the left places widget are immediately displayed here.
Comment 14 Stefan Seyfried 2018-03-08 14:43:06 UTC
I never see this message at all, but then I'm not a heavy thunar user, I usually just use it to mount and browse a USB stick or similar simple tasks.

So I do not have the faintest idea how to reproduce it.
Comment 15 Stefan Seyfried 2018-03-08 15:00:36 UTC
Ok, now I see the message (I looked in the title bar, not the status bar...) but still cannot reproduce any hang.
Comment 16 Markus Elfring 2018-03-08 15:05:49 UTC
(In reply to Stefan Seyfried from comment #15)

* Can such a hiccup look like unwanted software behaviour?

* Will directory sizes matter for this use case?
Comment 17 Vinzenz Vietzke 2019-03-22 11:23:16 UTC
In Leap 42.3 Thunar is v1.6.10, in Leap 15 it's 1.6.14, TW has 1.8.4. So none of the officially supported distribution versions matches the reportedly problematic version of Thunar.

Is this bug report still relevant?
Comment 18 Markus Elfring 2019-03-22 11:32:09 UTC
(In reply to Vinzenz Vietzke from comment #17)

Can more software development attention be directed at the circumstances around possible re-entrant execution of the function “thunar_thumbnailer_queue_async_reply”?
Comment 19 Vinzenz Vietzke 2019-03-22 11:55:59 UTC
I guess that would be something for upstream development. As I can see, you already opened a bug report there: https://bugzilla.xfce.org/show_bug.cgi?id=14122
Maybe ask in #xfce-dev on Freenode IRC or via the mailing lists if and how someone could help you with this problem.

I'll close this bug report for now but feel free to reopen it anytime.
Comment 20 Markus Elfring 2019-03-22 11:58:44 UTC
(In reply to Vinzenz Vietzke from comment #19)
I would appreciate more constructive feedback for my clarification request.
Comment 21 Vinzenz Vietzke 2019-03-22 13:24:46 UTC
(In reply to Markus Elfring from comment #20)
> I would appreciate more constructive feedback for my clarification request.

Again, this is something you'll have to ask Xfce or (to be more accurate) Thunar devs.