Ticket #117 (closed defect: fixed)
savannah: consecutive resize events not handled correctly
Reported by: | egmont | Owned by: | andrew_b |
---|---|---|---|
Priority: | minor | Milestone: | 4.8.24 |
Component: | mc-tty | Version: | master |
Keywords: | Cc: | ossi, zaytsev, egmont@… | |
Blocked By: | Blocking: | ||
Branch state: | merged | Votes for changeset: | andrew_b |
Description (last modified by ossi) (diff)
Original: http://savannah.gnu.org/bugs/?17822
Submitted by: | Egmont Koblinger <egmont> | Submitted on: | Fri 22 Sep 2006 01:04:44 PM UTC |
Category: | Screen output | Severity: | 3 - Normal |
Status: | None | Privacy: | Public |
Assigned to: | None | Open/Closed: | Open |
Release: | 4.6.1 | Operating System: | GNU/Linux |
Original submission:
Modern desktop environments and window managers usually resize the windows in an opaque way: plenty of resize events are sent to the application while the edge of the window is being dragged. This gives users a much better feedback than the ancient method (only showing where the edges of the window will be, and resizing when the mouse button is released), though it definitely requires much more cpu resources. Unfortunately mc is unable to properly handle when the terminal is resized opaquely. It receives tons of resize events (sigwinch), most likely ignores the new ones while processing an older one. Quite often when I finish resizing my window, mc draws its panels with a different size. Hence if I enlarge my terminal, I might get black columns/rows on its right/bottom part. If I shrink the window, the whole screen can be garbled. Then when I press one key, or click somewhere in the terminal with the mouse, suddenly mc's screen is repainted with the right size, and everything goes on perfectly. It would be nice to eliminate these race conditions and make mc resize correctly to the final terminal size, no matter how many resize events were sent in a very short time. Bonus (reported in #368): if you start mc in an xterm and resize the terminal before mc shows the panels, it will not be adjusted to the terminal's size until you resize again once the panels are there.
Comment 1 by Egmont Koblinger <egmont> at Tue 10 Oct 2006 03:27:46 PM UTC:
Normally mc stays in a select() call in key.c:get_event() which is called by dialog.c:frontend_run_dlg(). frontend_run_dlg() checks if the window size has changed and calls the proper functions in this case. Then it does a lot of other things, then calls get_event() which yet again does a lot of other things and finally arrives at that select(). A subsequent window resize event (SIGWINCH) causes this select to return with -1 EINTR which is later handled correctly. The problem is caused by the lot of code lines executed after checking for a window size change, but before entering the select call. If another window size change event occurs while executing these commands, it will not be handled until select() exits due to a keypress for example. In key.c:get_event(), I placed the following line above the "flag = select (...)" statement: if (winch_flag) return EV_NONE; and then my resizing problems were gone. So now I check again for a resize event right before entering the select function. Though this modification seems to solve my bug, another bug appeared. Press F5 on ".." so an error dialog box appears. Resize the window now. mc enters an infinite loop consuming 100% CPU time and not reacting on any keypress. I don't yet know how to fix this new bug. Swapping the new statement and the enable_interrupt_key() call right before it doesn't help either. On the other hand, even if we didn't have any side effects, my patch still doesn't fix my original problem, only decreases the possibility for this to occur. Still there is a chance that SIGWINCH arrives right after checking for winch_flag but before entering the select() call. I'm just curious how to solve this problem 100% perfectly, without any race condition, without the slightest possibility to misbehave. Now I'm interested in the theory first, not in the implementation details in mc. Let's suppose a single application that only wants to do two things: process data arriving from several file descriptors, and always correctly display the terminal size from its main loop (i.e. not from a signal handler, since doing complex things from a signal handler is just plain wrong). My only idea is the following: create a pipe, the sigwinch handler writes a byte into it, and the select() call checks for its reading end in addition to the other file descriptors it is interested in. This way the select() call immediately exits if there's an unprocessed sigwinch event, no matter if it occured before or during this select call. And of course we have to set the close-on-exec flag on these fd's so that subprocesses don't inherit them.
Comment 2 by Egmont Koblinger <egmont> at Tue 10 Oct 2006 03:40:51 PM UTC:
Oh, I just found the pselect() call which is supposed to solve this problem. However, its manpage says the Linux kernel supports this only since 2.6.16 which is a quite new piece. It says that glibc had an emulation which is vulnerable to the race condition I just mentioned and pselect was just introduced for. The manpage also mentions and recommends the self-pipe trick which is more portable.
Comment 3 by Egmont Koblinger <egmont> at Tue 10 Oct 2006 04:22:16 PM UTC:
Added a resize.patch. This does not suffer from the new bug, though I don't understand what made a difference. Still a rewrite to using pipes is needed to fill a minor race condition.
Comment 4 by Leonard den Ottolander <leonardjo> at Fri 03 Nov 2006 07:14:55 PM UTC:
Shall I commit this patch or should I wait for the piped version?
Comment 5 by Egmont Koblinger <egmont> at Mon 06 Nov 2006 12:01:58 PM UTC:
Please commit it if it seems to be okay for you (I've been using mc 4.6.1 with this patch for a month, and found no side effects, while it makes mc much better when resizing the terminal). But please don't yet close this bugreport to remind us that a proper fix is still needed. This patch only makes it much better, but still there's a small chance for the bug to appear. Until we have a proper fix, it's good to apply a temporary hack to heavily decrease the chance for the bug.
Comment 6 by Leonard den Ottolander <leonardjo> at Mon 06 Nov 2006 10:24:35 PM UTC:
As I understand your patch/hack, eliminating the timeouts on a window resize event significantly reduces the chance mc is still waiting inside the get_event() loop when a new resize event occurs.
Comment 7 by Leonard den Ottolander <leonardjo> at Wed 08 Nov 2006 01:38:24 PM UTC:
Committed. Quite an improvement in behaviour. Thank you.
Comment 8 by Denis Vlasenko <vda> at Mon 01 Oct 2007 03:21:44 PM UTC:
If you run without mouse support (mc -d), mc doesn't detect resizes. You need to press a key for mc to resize itself to new window after window is maximized or resized (press up/down arrow, Ctrl-O, almost anything will work). Tested on 4.6.2-pre1.
Comment 9 by Oswald Buddenhagen <ossi> at Mon 01 Oct 2007 09:03:01 PM UTC:
when a resize is sent before mc is completely up, it doesn't get the geometry right, either - and yes, this happens about every time an xterm -e mc is restored by session management on my system. so the signal handler should be set up before the initial geometry is queried.
Comment 10 by Pavel Tsekov <ptsekov> at Tue 02 Oct 2007 09:18:19 AM UTC:
Please, do not steal existing bug reports.
Attachments
Change History
comment:3 Changed 15 years ago by ossi
i had a quick look at the code, and it is the very definition of "mess". :)
what should be used here is the so-called signal self-pipe trick which is employed to serialize signals into the regular event loop.
a pretty minimal qt-based example can be seen at http://websvn.kde.org/trunk/KDE/kdebase/workspace/kscreensaver/libkscreensaver/main.cpp?r1=737514&r2=737513&pathrev=737514&view=patch
that would also remove the need for "minimal" resize handling and special-casing of slang vs. ncurses. the current code isn't signal-safe anyway - you may still get a random crash during resizing.
the same also applies to the sigchld handler (or rather, the two of them, which is a problem in itself). so maybe it would be appropriate to re-purpose this bug to a general "clean up signal handling".
comment:5 Changed 15 years ago by zaytsev
- Cc zaytsev added
- Version changed from 4.6.1 to master
- Description modified (diff)
- Milestone changed from Future Releases to 4.7
Moved an extra bit from #368 to this report.
comment:6 Changed 13 years ago by andrew_b
- Branch state set to no branch
- Milestone changed from 4.7 to Future Releases
comment:8 follow-up: ↓ 9 Changed 11 years ago by ossi
to fix this bug properly, one needs to make some improvements to the main loop.
what do you guys think of switching to the glib mainloop and using proper socket notifiers and timers instead of the current handcrafted solution?
comment:9 in reply to: ↑ 8 Changed 11 years ago by andrew_b
Replying to ossi:
what do you guys think of switching to the glib mainloop?
Yes, we think about that.
comment:10 Changed 11 years ago by ossi
- Cc ossi added; ossi@… removed
- Description modified (diff)
- Reporter changed from slavazanko to egmont
comment:12 Changed 11 years ago by egmont
- Priority changed from major to minor
- Cc egmont@… added
Now that pselect() is widely available, one quick and easy way to properly fix the problem is to change the main select() call to a pselect() which blocks/unblocks sigwinch. Glib mainloop would of course be much nicer, but probably more work. For what it's worth, the bug never occurred to me since it was "almost fixed", the race condition window is so small.
comment:13 Changed 11 years ago by ossi
i don't really like that - the critical sections would have to be quite large, and it's a nightmare to make sure everything is covered.
note that the glib mainloop is no hard precondition. i just wouldn't want to think of hacking the existing mainloop(s) if a cleaner solution is planned anyway.
comment:14 Changed 5 years ago by andrew_b
- Owner set to andrew_b
- Status changed from new to accepted
- Component changed from mc-core to mc-tty
- Branch state changed from no branch to on review
- Milestone changed from Future Releases to 4.8.24
Branch: 117_sigwinch
changeset:3b124b0d058afbb223264dd8217b4224ae9a4c38
comment:15 Changed 5 years ago by egmont
Is it really worth it? Looks quite complex for a nasty hackish workaround to me, instead of using the proper pselect.
comment:16 Changed 5 years ago by egmont
Also, are these pipe fds closed when launching any external process? I can't test the patch now, sorry, but there's at least no cloexec in it.
comment:17 Changed 5 years ago by ossi
signal serialization via a pipe is the standard way to do things in an event loop based application. indeed, the linux kernel recently added dedicated poll()-able objects to make the workaround with the pipe unnecessary (forkfd, signalfd, eventfd, and some more iirc).
this implementation has the additional feature of signal compression, which is appropriate.
leaked fds should be indeed avoided. see pipe2() and fcntl() with O_CLOEXEC (of course, one can manually track the fds as well, but that's annoying).
comment:18 Changed 5 years ago by ossi
fwiw, the error handling should be a lot less defensive - when pipe() or fcntl() fail, something is seriously wrong, and you should perror() and exit(3) rather than falling back to no action. that also removes the need for conditionals down the path.
comment:19 Changed 5 years ago by egmont
signal serialization via a pipe is the standard way to do things in an event loop based application.
I'd agree that this used to be the standard way (or rather: standard workaround) for this, ages ago, until a cleaner solution was designed.
indeed, the linux kernel recently added dedicated poll()-able objects
According to its manual page, "pselect() is defined in POSIX.1g, and in POSIX.1-2001 and POSIX.1-2008", furthermore, "pselect() was added to Linux in kernel 2.6.16" (which was released in 2006). That's not "recently", that's a long ago.
comment:20 Changed 5 years ago by andrew_b
pselect() has portability problems.
comment:21 Changed 5 years ago by egmont
This function is missing on some platforms:
OpenBSD 3.8
Released in 2005
Minix 3.1.8
2010
AIX 5.1
2001
HP-UX 11.23
2003
IRIX 6.5
2006
OSF/1 5.1
2000
Solaris 9
2002
mingw
last release is 6 years old
MSVC 14
2015 (if I'm looking at the right thing: Visual C++ 2015 a.k.a. C++ 14.0), two new versions since
Interix 3.5
2004
BeOS.
last version in 2000
On some platforms, this function fails to detect invalid fds with EBADF, but only if they lie beyond the current maximum open fd:
FreeBSD 8.2.
2013
Honestly, why do we care about such old systems, or ones that haven't been able to implement a standard for 10+ (but even like 18-ish) years?
Let me put it into a different perspective: Is there any platform where mc currently compiles and runs fine, and pselect would break it?
comment:22 Changed 5 years ago by ossi
how exactly is pselect() a pollable object? i don't think you bothered to research the relevant keywords i posted here ...
even with perfect pselect() support, getting signal handling right in an event-loop based application (and even more a library, like glib and qt) is basically an impossible task. just forget about it.
comment:23 follow-up: ↓ 24 Changed 5 years ago by egmont
how exactly is pselect() a pollable object?
Indeed, my bad, I didn't literally answer to your question, rather meant to answer to the generic concerns about pselect.
getting signal handling right in an event-loop based application (and even more a library, like glib and qt) is basically an impossible task
Could you please elaborate? Especially in the context of mc, which isn't a library, doesn't use glib's main loop, nor qt.
mc spends its idle times in one of its few select() calls. Let's generally block WINCH, and replace these select()s with pselect()s that unblock WINCH for their duration only; let the signal handler flip a bit (as it does now), and let's check for that bit after the pselect() and resize if necessary (as it's pretty much done now). What's the problem?
Given that pselect was specifically designed to solve this very problem we're arguing about, e.g. quoting from its manual page
The reason that pselect() is needed is that if one wants to wait for either a signal or for a file descriptor to become ready, then an atomic test is needed to prevent race conditions.
I'm truly skeptical if anyone says pselect() is a problematic choice.
comment:24 in reply to: ↑ 23 ; follow-up: ↓ 25 Changed 5 years ago by ossi
Replying to egmont:
in the context of mc, which [...] doesn't use glib's main loop
as a matter of fact, it's supposed to start using the glib main loop once someone does the work. the current loops are an utter #ifdef mess.
mc spends its idle times in one of its few select() calls.
"one of its few" is a recipe for "fun".
What's the problem?
none so far, except that you get an extremely rigid structure that gets harder to extend and more fragile each time you (try to) add something to it. sigchld? timers?
comment:25 in reply to: ↑ 24 ; follow-up: ↓ 26 Changed 5 years ago by egmont
as a matter of fact, it's supposed to start using the glib main loop once someone does the work. the current loops are an utter #ifdef mess.
Let's code against current mc, not a future mc that we have in our minds but is unlikely to get implemented in the foreseeable future.
"one of its few" is a recipe for "fun".
The very same kind and amount of "fun" as the current raceable WINCH handling, or the proposed self-pipe solution.
What's the problem?
none so far, except that you get an extremely rigid structure that gets harder to extend and more fragile each time you (try to) add something to it. sigchld? timers?
I truly don't see it justified that replacing the current select() with a pselect() to temporarily unblock WINCH would result in an any more rigid structure than the self-pipe trick, e.g. I don't see it justified that adding a timer to pselect() would be any more complicated than adding it to select().
Mind you, it doesn't hold in the other direction either. It's not that the self-pipe trick is bad per se, nor that it's significantly more complex than pselect. It's just that the self-pipe trick is an ancient workaround for something that has had a properly designed solution for 10+ years. Hence my preference of going with the latter.
comment:26 in reply to: ↑ 25 ; follow-up: ↓ 27 Changed 5 years ago by ossi
Replying to egmont:
Let's code against current mc, not a future mc that we have in our minds
the code is already written and happens to be more compatible with the assumed future direction. it would be plain silly to change the approach now.
comment:27 in reply to: ↑ 26 Changed 5 years ago by egmont
the code is already written
Including the lack of O_CLOEXEC which needs to be fixed before merging to master.
and happens to be more compatible with the assumed future direction.
I still don't see this claim being justified.
it would be plain silly to change the approach now.
Let andrew_b pick whichever of these two approaches, based on the comments so far. Andrew, whichever you pick, thanks for the work!
comment:28 Changed 5 years ago by andrew_b
Sorry, I'm too busy and have no time to fix my code. I'll to that on weekend.
comment:29 Changed 5 years ago by andrew_b
Fixed: [3b124b0d058afbb223264dd8217b4224ae9a4c38].
I hope.
comment:30 follow-up: ↓ 31 Changed 5 years ago by ossi
the pipe handling is now correct afaict.
anyway, i had a somewhat closer look at the rest now, and i'm not sure it makes sense: why is the setup specific to the toolkit? why are you still chaining to slang? isn't it possible to notify it externally, synchronously with the main loop?
the other question i have is how this plays with nested main loops (i.e., modal dialogs). will the whole stack resize, or only the top window?
comment:31 in reply to: ↑ 30 ; follow-up: ↓ 32 Changed 5 years ago by andrew_b
Replying to ossi:
why is the setup specific to the toolkit?
Because we have two different toolkits.
why are you still chaining to slang? isn't it possible to notify it externally, synchronously with the main loop?
What do you mean? I'm afraid I'm not quite understanding you.
the other question i have is how this plays with nested main loops (i.e., modal dialogs). will the whole stack resize, or only the top window?
pipe is just replacement for global flag winch_flag. SIGWINCH handling is unchanged,
comment:32 in reply to: ↑ 31 ; follow-up: ↓ 33 Changed 5 years ago by ossi
Replying to andrew_b:
Because we have two different toolkits.
that doesn't answer the question. if the setup was toolkit-agnostic, there would be no need to treat them separately.
What do you mean?
the slang codepath still passes on sigwinch directly to slang. it is well possible that slang offers an api function to trigger a size adjustment, making the special-cased signal handling unnecessary.
pipe is just replacement for global flag winch_flag. SIGWINCH handling is unchanged,
there is a major difference, which is that now querying the pipe implicitly resets the "flag", while previously it didn't. and the querying is done in several places. i didn't check the implications, just pointing it out.
comment:33 in reply to: ↑ 32 Changed 5 years ago by andrew_b
comment:34 Changed 5 years ago by andrew_b
- Votes for changeset set to andrew_b
- Branch state changed from on review to approved
comment:35 Changed 5 years ago by andrew_b
- Status changed from accepted to testing
- Resolution set to fixed
- Branch state changed from approved to merged
Merged to master: [af65483e97f85eb1bc067dca906d0cba6996a0f1].
comment:37 Changed 5 years ago by andrew_b
Fix is in #4019.
comment:38 Changed 5 years ago by andrew_b
Yet another fix: #4052.
(In #2198) Replying to Spinal:
Definetely, no.
Strange. I've tried 4.6.1 from repo, and it doesn't work also. It seems, subshell resizing in NCurses-based mc didn't work ever.
Replying to egmont:
Correct, it is an NCurses-related bug only. But implementation of window resize handling in S-Lang-based mc is not quite correct.
In general, the implementation of SIGWINCH signal handling in mc should be rewritten. One of possible way is described in #117.