fkr.dev

About


I am Flo, born 1990, developer, hacker, artist and this is my blog. Into lisps, oil painting and esoteric text editors. On the interwebs I like to appear as a badger interested in cryptography. Every other decade I like spinning old jazz/funk records or programming experimental electronic music as Winfried Scratchmann. Currently located in Aix la Chapelle.

On Capturing the release events of modkeys using X11 1

As one will learn when I finally manage to publish my notes or one of the ~3-7 fragments of blog posts concerned with this topic: I tend to stumble upon the question of Ergonomics and ways to improve Unified Command Line Interfacing - the dwimlayer interfaces between humans and computers, especially using keyboards.
With this in mind I find myself every so often wondering why a special kind of menu or UI interaction is barely used, despite being quite an effective mechanism: the ability to terminate a menu interaction on key release. The most prominent example of this might be the behaviour of the menu used to switch the focused window under Windows, triggered using Meta(Alt)-Tab. Pressing this key combination spawns a list of available windows. Pressing the Tab key repeatedly while holding the Meta key selects the next menu element, while releasing the Meta key confirms the selected action and closes the menu. If I remember correctly, pressing the Esc key while the menu is visible closes the menu without triggering the currently selected action (and if this is not the current behaviour, I would expect this from a reasonably designed interface).
This way of interaction seems to be a nice sweet spot between two well established ways of putting in an action:
  • no-menu: you press some key (maybe in combination with a modifier) and an action is immediately executed. Either some menu displaying the possibility of the action is permanently visible or you have to memorize the association between action+key combination.
  • preview/confirm: in this case some menu is available; it might be spawned by your key combination, be permanently visible, or be activated otherwise. Key (sic!) here is that your key combination selects a candidate without triggering its associated action. Advantages: the user is informed about the action that they are about to trigger. This also, and especially, allows the interface/menu designer to show arbitrary information that may dynamically depend on the selected item and the overall state of an application: it may show a preview of the changes the action is going to produce. It may display a longer text detailing the operation. It may even suggest additional key combinations that trigger refined versions of the selected action. The disadvantage at hand is that any selected action needs to be manually confirmed by an additional key press, in many cases effectively doubling the work needed to issue an action.
Note that, although the Alt-Tab action known from Windows does only (?) bind the Tab key as a regular action to iterate through the window list, there is no reason why one should not have arbitrary keys bound to actions that get selected on key press but confirmed on modifier release. My primary hope for this mechanism is to a) improve discoverability of commands: display alternatives and also the “classical” way to activate them through “no-menu” shortcuts, and b) allow previewing the state after activating the action on selection, allowing a user to decide against it and abort the operation by pressing Esc (or, for my preferences: Ctrl-g) before any real harm is done :).
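To make the intended interaction concrete, here is a minimal sketch of it as a small state machine. Everything in it (the `Menu`, `Key`, `Event` and `Outcome` names, the example bindings) is hypothetical and only models the idea of "select on key press, confirm on modifier release, abort on Esc"; it does not talk to X11 at all.

```rust
/// Keys the hypothetical menu reacts to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Modifier,
    Escape,
    Char(char),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Press(Key),
    Release(Key),
}

/// What the menu decided after handling one event.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    /// Menu is open; the payload is the currently previewed action, if any.
    Preview(Option<String>),
    /// The modifier was released with an action selected: run it.
    Confirm(String),
    /// Esc was pressed, or the modifier was released with nothing selected.
    Abort,
    /// The event arrived while the menu was closed.
    Ignored,
}

pub struct Menu {
    bindings: Vec<(char, String)>,
    open: bool,
    selected: Option<usize>,
}

impl Menu {
    pub fn new(bindings: &[(char, &str)]) -> Self {
        Menu {
            bindings: bindings.iter().map(|(k, a)| (*k, a.to_string())).collect(),
            open: false,
            selected: None,
        }
    }

    fn current(&self) -> Option<String> {
        self.selected.map(|i| self.bindings[i].1.clone())
    }

    fn close(&mut self) {
        self.open = false;
        self.selected = None;
    }

    pub fn handle(&mut self, event: Event) -> Outcome {
        match event {
            // Pressing the modifier opens the menu (and is harmless if it is already open).
            Event::Press(Key::Modifier) => {
                if !self.open {
                    self.open = true;
                    self.selected = None;
                }
                Outcome::Preview(self.current())
            }
            _ if !self.open => Outcome::Ignored,
            // A bound key only selects a candidate; nothing is executed yet.
            Event::Press(Key::Char(c)) => {
                if let Some(i) = self.bindings.iter().position(|(k, _)| *k == c) {
                    self.selected = Some(i);
                }
                Outcome::Preview(self.current())
            }
            Event::Press(Key::Escape) => {
                self.close();
                Outcome::Abort
            }
            // Releasing the modifier is the confirmation.
            Event::Release(Key::Modifier) => {
                let chosen = self.current();
                self.close();
                match chosen {
                    Some(action) => Outcome::Confirm(action),
                    None => Outcome::Abort,
                }
            }
            Event::Release(_) => Outcome::Preview(self.current()),
        }
    }
}
```

Feeding it "modifier down, f, t, modifier up" ends in `Confirm` for whatever `t` is bound to, while pressing Esc before letting go of the modifier ends in `Abort`.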
So I was mildly pleased to discover a Hacker News link to someone who replicated the Alt-Tab menu for Linux window managers running under X11. Adhering to the hacker's cultural spirit, I tested the linked application and found it pretty much working. Unfortunately I find the functionality realized through the Windows Alt-Tab menu pretty lacking: it merely allows iterating over the list of all windows, with no filtering, no shortcuts and an unclear order. Thanks to our open source software culture though, there is nothing holding me back (ok, time) from an effort to fix those shortcomings. So I dabbled in the source code and found the relevant pieces:
// Grab Alt+Tab
        xlib::XGrabKey(
            display,
            tab_key,
            alt_mask,
            root_window,
            1,
            xlib::GrabModeAsync,
            xlib::GrabModeAsync,
        );

        loop {
            let mut event: xlib::XEvent = std::mem::zeroed();
            xlib::XNextEvent(display, &mut event);

            match event.get_type() {
                xlib::KeyPress => {
                    log::debug!("Alt+Tab Pressed [X11]");
                    state::IS_VISIBLE.store(true, Ordering::SeqCst);
                    let index = state::SELECTED_INDEX.load(Ordering::SeqCst);
                    state::SELECTED_INDEX.store(index + 1, Ordering::SeqCst);
                    state::SELECTED_INDEX_CHANGED.store(true, Ordering::SeqCst);
                }
                xlib::KeyRelease => {
                    let xkey = xlib::XKeyEvent::from(event);
                    if xkey.keycode == alt_key as u32 {
                        state::IS_VISIBLE.store(false, Ordering::SeqCst);
                        state::SELECTED_INDEX.store(-1, Ordering::SeqCst);
                        state::SELECTED_INDEX_CHANGED.store(true, Ordering::SeqCst);
                    }
                    if xkey.keycode == tab_key as u32 {
                        //
                    }
                }
                _ => {}
            }
        }
It’s written in Rust, but who cares. The code is pretty straightforward: we grab the Tab key when the modifier mask suggests that Alt is pressed. We then loop forever, binding event to the next event issued by the X server. On a key press delivered through the grab, we know Alt-Tab was pressed; on key release we check that the released key is the Alt key (and not Tab). The bound actions obviously manipulate some kind of global state that we want to replace with our own functionality anyway.
I was seriously puzzled to find out that this does not work the way it looks. The key release of the Alt key was just not captured. I dug through the source a bit more and found the following behaviour coded in the menu-display logic.
let controller = EventControllerKey::new();
let window_clone = window.clone();
let tabs_clone = tabs.clone();
controller.connect_key_released(
    move |_, keyval, _, _| match keyval.name().unwrap().as_str() {
        "Alt_L" => {
            log::debug!("Alt_L released [GTK]");
            window_clone.hide();
            state::IS_VISIBLE.store(false, Ordering::SeqCst);

            {
                let mut tabs = tabs_clone.write().unwrap();
                tabs.reorder_prev_first();
            }

            state::SELECTED_INDEX.store(-1, Ordering::SeqCst);

            let surface = window_clone.surface().unwrap();
            let display = window_clone.display();
            let monitor = display.monitor_at_surface(&surface).unwrap();
            let monitor_name = monitor.model().unwrap();

            // ...
        }
        _ => {}
    },
);
window.add_controller(controller);
Here we see an EventControllerKey that is instructed to listen for key release events. The (anonymous) function passed checks whether the released key is indeed the Alt_L (left Alt) key. If this is the case, our global state is manipulated, the window is hidden and some information about the order of the list for the next time it is spawned is stored.
Unfortunately this code depends on the GTK4 library used to display the menu for this app. It also seems to capture the Alt_L key only “locally”, that is, when the menu has focus. In my everlasting quest for generalization though, at no point did I intend to couple my fix to any specific window-rendering library, nor did I intend to comply with the assumption that any UI element needs to have focus when a mod key is released to confirm a selected action.
And so, the poring over random Google search results that I call research had begun again. Sooner rather than later I found a Stack Overflow post with the following answer:
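Looking back at the X11 snippet, one possible explanation for the missing Alt release (an assumption based on my reading of the XGrabKey documentation, not something verified against this app) is that a passive grab on Tab only stays active while Tab itself is held down. The toy model below, with hypothetical `Key`, `Event`, `Receiver` and `route` names and no real X11 calls, shows which client would see each event under that assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Alt,
    Tab,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Press(Key),
    Release(Key),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Receiver {
    /// The client that called XGrabKey for Alt+Tab.
    Grabber,
    /// Whatever window currently has keyboard focus.
    FocusedWindow,
}

/// Toy model of a passive grab on Tab with the Alt modifier.
/// Assumption: the grab activates when Tab is pressed while Alt is down
/// and ends once Tab is released, regardless of the state of Alt.
pub fn route(events: &[Event]) -> Vec<Receiver> {
    let mut alt_down = false;
    let mut grab_active = false;
    let mut receivers = Vec::new();

    for &ev in events {
        // The grab kicks in on the Tab press itself, so that press is
        // already delivered to the grabbing client.
        if ev == Event::Press(Key::Tab) && alt_down {
            grab_active = true;
        }

        receivers.push(if grab_active {
            Receiver::Grabber
        } else {
            Receiver::FocusedWindow
        });

        // Update the keyboard state after delivering the event.
        match ev {
            Event::Press(Key::Alt) => alt_down = true,
            Event::Release(Key::Alt) => alt_down = false,
            // Releasing Tab is still reported to the grabber, then the grab ends.
            Event::Release(Key::Tab) => grab_active = false,
            Event::Press(Key::Tab) => {}
        }
    }
    receivers
}
```

In this model the usual sequence "Alt down, Tab down, Tab up, Alt up" delivers the final Alt release to the focused window rather than to the grabber, which would match the observed behaviour. Only when Alt is released while Tab is still held does the grabber get to see it.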
I can think of two different ways around this.

1. Select KeyReleaseMask for all windows (and keep track of appearing and disappearing windows); or
2. Once you know Alt is pressed, poll the keyboard state with XQueryKeyboard every 0.1 second or so until it's released.
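For reference, the second option could look roughly like the sketch below. As far as I know the Xlib call for reading the keyboard state is XQueryKeymap, which fills a 32-byte bit vector where byte N holds keys 8N to 8N+7, least significant bit first. The sketch keeps X11 out of the picture: `query` is a hypothetical stand-in closure for whatever wraps that call, and `key_is_down` and `wait_for_release` are names made up for illustration.

```rust
use std::thread;
use std::time::Duration;

/// Returns true if `keycode` is marked as pressed in a 32-byte keymap
/// laid out the way XQueryKeymap reports it (bit `keycode % 8` of byte `keycode / 8`).
pub fn key_is_down(keymap: &[u8; 32], keycode: u8) -> bool {
    let byte = keymap[(keycode / 8) as usize];
    byte & (1u8 << (keycode % 8)) != 0
}

/// Polls `query` until `keycode` is no longer pressed, sleeping `interval`
/// between polls. Returns the number of polls it took, or None if the key
/// was still down after `max_polls` attempts.
///
/// `query` is a hypothetical stand-in for a wrapper around XQueryKeymap.
pub fn wait_for_release<F>(
    mut query: F,
    keycode: u8,
    interval: Duration,
    max_polls: usize,
) -> Option<usize>
where
    F: FnMut() -> [u8; 32],
{
    for poll in 1..=max_polls {
        if !key_is_down(&query(), keycode) {
            return Some(poll);
        }
        thread::sleep(interval);
    }
    None
}
```

With the 0.1 second interval from the answer, a release would be noticed up to a tenth of a second late, and the process wakes up ten times per second for as long as the modifier is held.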
Neither option sounded sound to me, but the answering user reported that the first worked in an experiment, and they also delivered code!
#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <stdbool.h>
#include <stdio.h>

void dowin (Display* dpy, Window win, int reg)
{
  Window root, parent;
  Window* children;
  int nchildren, i;

  XSelectInput (dpy, win, reg ? KeyReleaseMask|SubstructureNotifyMask : 0);
  XQueryTree (dpy, win, &root, &parent, &children, &nchildren);

  for (i = 0; i < nchildren; ++i)
  {
    dowin (dpy, children[i], reg);
  }

  XFree(children);
}


int main()
{
    Display*    dpy     = XOpenDisplay(0);
    Window      win     = DefaultRootWindow(dpy);
    XEvent      ev;

    unsigned int    alt_modmask       = Mod1Mask;
    unsigned int    ignored_modmask   = 0; // stub
    KeyCode         tab_keycode       = XKeysymToKeycode(dpy,XK_Tab);
    KeyCode         alt_keycode       = XKeysymToKeycode(dpy,XK_Alt_L);

    dowin (dpy, win, True);

    XGrabKey (dpy,
            tab_keycode,
            alt_modmask | ignored_modmask,
            win,
            True,
            GrabModeAsync, GrabModeAsync);


    while(true)
    {
        ev.xkey.keycode = 0;
        ev.xkey.state = 0;
        ev.xkey.type = 0;

        XNextEvent(dpy, &ev);
        switch(ev.type)
        {
            case KeyPress:
                printf ("Press %lx: %d-%d\n", ev.xkey.window, ev.xkey.state, ev.xkey.keycode);
                break;

            case KeyRelease:
                printf ("Release %lx: %d-%d\n", ev.xkey.window, ev.xkey.state, ev.xkey.keycode);
                break;

            case MapNotify:
                printf ("Mapped %lx\n", ev.xmap.window);
                dowin (dpy, ev.xmap.window, True);
                break;

            case UnmapNotify:
                printf ("Unmapped %lx\n", ev.xunmap.window);
                dowin (dpy, ev.xunmap.window, False);
                break;

            default:
                printf ("Event type %d\n", ev.type);
                break;
        }

    }

    XCloseDisplay(dpy);
    return 0;
}
Welp, that is C source. As I wanted to at least try the presented ’solution’, I was left with the age-old (Gretchen) question: translate or FFI? As the latter part of the code essentially corresponds to the first Rust snippet from above, I decided to rewrite the dowin function in Rust:
use std::os::raw::{c_long, c_uint, c_void};
use x11::xlib::{self, Display, Window, XFree, XQueryTree, XSelectInput};

unsafe fn dowin(dpy: *mut Display, win: Window, reg: bool) {
    let mut root: Window = 0;
    let mut parent: Window = 0;
    let mut children: *mut Window = std::ptr::null_mut();
    let mut nchildren: c_uint = 0;

    // Convert reg to the desired event mask or 0 if reg is false.
    let event_mask: c_long = if reg {
        xlib::KeyReleaseMask | xlib::SubstructureNotifyMask
    } else {
        0
    };

    // Select input events on the current window based on `reg`.
    XSelectInput(dpy, win, event_mask);

    // Query the window tree to get its children.
    let status = XQueryTree(
        dpy,
        win,
        &mut root,
        &mut parent,
        &mut children,
        &mut nchildren,
    );

    // A zero status means the query failed; a null pointer means there are
    // no children. In both cases there is nothing to recurse into or free.
    if status == 0 || children.is_null() {
        return;
    }

    // Recursively call `dowin` on each child.
    for i in 0..nchildren as usize {
        dowin(dpy, *children.add(i), reg);
    }

    // Free the list of children windows to avoid memory leaks.
    XFree(children as *mut c_void);
}
What we do here is: iterate over the tree of all windows and select (not grab) the key release event for each window. We also select the events emitted when windows appear or disappear (get mapped or unmapped). In the loop code above we handle those events by calling the dowin function on the windows they refer to, to either select those too or cancel our earlier selection. I do not really know why this is necessary, how this helps and what differentiates grabbing from selecting events, but adapting this workaround worked!..
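The bookkeeping this workaround implies can be sketched without any X11 at all. The model below is purely illustrative: `WindowTree`, `WindowId` and the set of "selected" windows are hypothetical stand-ins for the X server's window hierarchy and for the effect of XSelectInput, and this `dowin` only mirrors the recursion of the real one.

```rust
use std::collections::{BTreeSet, HashMap};

pub type WindowId = u64;

/// Hypothetical stand-in for the window hierarchy that XQueryTree would report.
pub struct WindowTree {
    children: HashMap<WindowId, Vec<WindowId>>,
}

impl WindowTree {
    pub fn new() -> Self {
        WindowTree {
            children: HashMap::new(),
        }
    }

    pub fn add_child(&mut self, parent: WindowId, child: WindowId) {
        self.children.entry(parent).or_default().push(child);
    }

    fn children_of(&self, win: WindowId) -> &[WindowId] {
        self.children
            .get(&win)
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }
}

/// Mirrors the recursion of the real `dowin`: register (or unregister)
/// `win` and every window below it. Inserting into `selected` stands in
/// for calling XSelectInput with the key release mask, removing for
/// calling it with an empty mask.
pub fn dowin(tree: &WindowTree, win: WindowId, reg: bool, selected: &mut BTreeSet<WindowId>) {
    if reg {
        selected.insert(win);
    } else {
        selected.remove(&win);
    }
    for &child in tree.children_of(win) {
        dowin(tree, child, reg, selected);
    }
}
```

Calling `dowin` once on the root covers every window that exists at startup. After that, each "mapped" notification triggers another call with `reg = true` on the new subtree, and each "unmapped" notification one with `reg = false`, so the set has to be kept in sync for the whole lifetime of the program.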
Well, it worked somehow. It was just that scrolling a website in Firefox using the Vimium C extension now exhibited strange behaviour (Chromium is fine). Selecting links through the shortcut interface did not work robustly either. I also had to learn that capturing any keys while rofi, a menu selection app I intended to use, is running does not work unless I activate the special (and.. experimental, sigh) normal window mode. (Fortunately there is a rofi-rs repo that allows calling rofi from Rust code; unfortunately it does not support normal window mode. Somehow it seems to be the same pattern repeating, over and over.)
On a personal note: I do not think it should be this hard :(. In a follow-up post I describe how I solved this issue (watch me scurry through all the Rust/C/X11 key binding repos at hand), which hack of a bash script the working solution replaces, and discuss whether all of this is a concerted operation to convince me to use Wayland.
As always I appreciate notes on existing solutions or ways to achieve the intended mechanisms in an appropriate manner. Cheers!
