Ticket #4636 (new defect)
Fails to read directory if resized while external editor was running
Reported by: | egmont | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | Future Releases |
Component: | mc-core | Version: | 4.8.33 |
Keywords: | Cc: | ||
Blocked By: | Blocking: | ||
Branch state: | no branch | Votes for changeset: |
Description
Have two different directories in the two panels. Make sure the "other" panel has lots of files.
Edit a file in "this" panel using an external editor by pressing F4. (Note that executing the command "my-editor filename.txt" doesn't trigger the bug, it has to be F4, and it has to be an external editor. Interestingly, F3 with an external viewer doesn't trigger the bug either.)
While running this external editor, resize the terminal.
Quit the external editor. mc's default look of two panels is back.
The "other" panel contains lots of incorrect entries. Many of the files appear with invalid type ('?' before the filename), dated Jan 1 1970, and in red (error) color. As if stat() failed on them.
In fact, strace shows that mc concatenates "this" panel's directory and the "other" panel's filename, and tries to stat() that, which indeed almost always fails.
Here's what happens according to strace, after quitting the external editor:
First, both panel's directory is listed and all the files are stat()ed correctly.
Then, a little bit later, the other directory is listed again (getdents64()), and the first dozen or so files are newfstatat()'ed there as expected, but then this gets interrupted for a while.
The next ~200 lines are about totally different things. stat()'ing /etc/localtime many times, reading /proc/self/mountinfo multiple times, and even doing some UI update stuff, apparently actually painting mc's entire display. (No sign in strace of this being related to a SIGWINCH or other signal arriving.)
Then, after these ~200 lines of strace output, stat()'ing the directory entries continues, but this time under the wrong directory. (Could perhaps some global variable, containing the directory path, changed during those ~200 lines??)
Note that when mc first starts up and everything is okay, stat()'ing the directory entries is then also interrupted at the very same position (after the first dozen-ish entries) for a pselect() call, trying to read from one of its pipe file descriptors with a 0 timeout. But then pselect() doesn't report any data available. When the bug occurs, it begins with that same pselect() which then returns that data is available, and the next step successfully reads a single \0 byte from that pipe. This is the beginning of those ~200 lines of things that somehow break the directory.
I can reproduce this bug with 4.8.31, 4.8.33, and current master (post 4630_resize). Cannot reproduce on another machine with 4.8.25.
Change History
comment:4 Changed 34 hours ago by egmont
It's 789aa0bb2289f22a51215b6002cefe9aca692341.
(You might need to cherry-pick 6a4107b27716511fa8c56c7c327da72996650046 to compile versions thereabouts.)
comment:5 Changed 34 hours ago by egmont
Note that preceding versions also interrupt the flow of stat()'ing all the files after about a dozen of them, and doing that pselect() with a possible ~200 line followup before resuming to those stat()s.
I have no idea what could have been the reason to interrupt this flow, but (without knowing the cause) this sounds like a pretty fragile idea in the first place and I guess it should be addressed.
I guess the bisected commit just sort-of-accidentally happens to bring a bad design into the surface, but is not related to the root cause.
Mind you, I haven't studied what and why the code actually does.
comment:6 Changed 15 hours ago by egmont
Something quite interesting happens on the UI that's pretty hard to spot with the naked eye :)
The "other" panel is first painted with only those 13 (or so) files that it manages to stat. And later it's repainted with all the remaining, incorrectly stat'ed filed.
How to see it?
Start script and then start mc inside that. Reproduce the steps necessary to invoke this bug. When resizing the window (as part of the reproducer steps), make it just slightly bigger than it was before.
Then quit mc and also exit from the shell started by script.
Next, replay the resulting typescript file slowly. One way to do it is using slowcat as found in VTE's source. Get VTE from https://gitlab.gnome.org/GNOME/vte/, compile using meson && ninja.
Replay the recorded output with something like slowcat -t 1000 typescript
Note: The actual resizing of the window isn't replayed. If you just made the window slightly bigger while reproducing the bug, and stick to that bigger size while replaying, it should be okay (with some harmless gap).
comment:7 Changed 15 hours ago by egmont
In src/filemanager/dir.c there are two similar methods that perform an mc_readdir() in a loop.
Both of them call list->callback (DIR_READ, dp) which is src/filemanager/panel.c's panel_dir_list_callback.
This, at the 16th occasion of calling (including "." and ".." which are skipped, I presume) calls rotate_dash (TRUE).
In src/filemanager/layout.c, rotate_dash() calls mc_refresh ().
So, after reading and properly processing 13 directory entries (plus the two that are skipped), as we're about to process the next entry, before processing that an entire UI update kicks in, thereby displaying those 13 files that we've stat()'ed so far.
The idea of updating the UI after reading 13 entries is already quite bad. And, interestingly, there are no subsequent UI updates (not even the rotating dash) per every 16 directory entry.
This is just a tiny part of the whole picture, but this is what I got so far. There are missing pieces inside: why does mc_refresh() break the reference directory? And there are missing pieces outside: why is this triggered only if external F4 and window resize are involved?
comment:8 Changed 15 hours ago by egmont
I have to correct myself:
rotate_dash() is indeed called after every 16 files, but because it has its own internal clock to rotate only after 1/10 seconds, it mostly like doesn't result in further UI updates.
But it still raises questions:
- If it takes ages to load the directory, then in addition to the rotating dash, is it really desired to keep updating the partial file listing, according to the directories we've read so far? Or shall the panel remain empty until everything's loaded? I'm not sure.
- Is it desired to do an update after the first 13 entries? I highly doubt, it's an unpleasant flicker on the UI if the action completes quickly (as most of the time it does).
comment:9 Changed 14 hours ago by egmont
I've eliminated the timing constraint from rotate_dash, and replayed slowly what mc does.
The panel listing is updated after the first 13 entries, but later on it's not updated at every 16 more files, it's only the quickly rotating dash that is updated then.
So, somehow, the first time when panel_dir_list_callback()'s counter reaches 16, it results in an entire UI update (showing the partial file listing we've seen so far). On subsequent occasions when it reaches a next multiple of 16, it's just the rotating dash that's updated.
comment:10 Changed 14 hours ago by zaytsev
If it takes ages to load the directory, then in addition to the rotating dash, is it really desired to keep updating the partial file listing, according to the directories we've read so far? Or shall the panel remain empty until everything's loaded? I'm not sure.
If I remember correctly, this was quite a cool feature in Volkov Commander. I also remember the files unmarking themselves in mc, which I found really sweet. It could be that I'm hallucinating though. Not something we can't live without, if it complicates the system so much that it leads to obscure bugs though... event loop... sigh.
Broke somewhere between 4.8.26 and 4.8.27.