Simplify detection that the selected file is an archive
Submitted by Sergey Ponomarev
Assigned to Jannis Pohlmann
Description
Sorry guys, this is a big issue that involves the Thunar sources themselves. When you right-click an archive file, a context menu with an "Extract here" option is shown. Internally, the plugin has to determine whether the selected file is an archive. To do this, it contains a full list of archive MIME types:
static const gchar TAP_MIME_TYPES[][34] = { "application/x-7z-compressed", "application/x-gtar", "application/zip", ... };
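The list-based check amounts to a linear scan over that table. The minimal plain-C sketch below shows why every new format needs a code change. The helper name tap_is_archive_mime_type() is hypothetical, and the table is a shortened, illustrative subset of the real TAP_MIME_TYPES. With it, application/zstd is rejected only because nobody added it to the list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of the plugin's table; the real TAP_MIME_TYPES
 * list is much longer. */
static const char TAP_MIME_TYPES[][34] = {
  "application/x-7z-compressed",
  "application/x-gtar",
  "application/zip",
};

/* Hypothetical helper: true only if content_type appears verbatim
 * in the hard-coded table above. */
bool
tap_is_archive_mime_type (const char *content_type)
{
  size_t i;

  if (content_type == NULL)
    return false;

  for (i = 0; i < sizeof (TAP_MIME_TYPES) / sizeof (TAP_MIME_TYPES[0]); i++)
    if (strcmp (content_type, TAP_MIME_TYPES[i]) == 0)
      return true;

  /* Any type not listed (e.g. a newly published format) falls through. */
  return false;
}
```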
The problem is that there are more compression algorithms than the list covers. Zstandard (zstd, *.zst) and Brotli (*.br) were published recently, and we need to support them too. I wanted to add them, but then I decided to look for a way to get rid of the TAP_MIME_TYPES list altogether. To determine a file's content type, Thunar uses GLib, which internally relies on the Shared MIME Info database https://www.freedesktop.org/wiki/Software/shared-mime-info/, a kind of libmagic clone plus icons and translations. Unfortunately, shared-mime-info doesn't have an explicit flag saying that a type is an archive. But all archive types (tar, zip, gz etc.) have the same generic icon, "package-x-generic". So we can use the content type's generic icon name to determine whether a file is an archive or a compressed file:
generic_icon_name = g_content_type_get_generic_icon_name (content_type);
file_is_archive = g_strcmp0 (generic_icon_name, "package-x-generic") == 0;
g_free (generic_icon_name);
This looks like a hack, but it works for every archive type that shared-mime-info maps to "package-x-generic", and as soon as zst, br, or any other format is added to shared-mime-info, the plugin will start recognizing it without any code changes.
I have already created a similar patch for the Thunar Send To Email tool: Thunar SendTo Email: improve archives detection https://bugzilla.xfce.org/show_bug.cgi?id=15917
Please take a look at it first, because it will make the changes proposed in this ticket easier to understand.
Implementation:
First, I decided it would be better to move archive detection out of thunar-archive-plugin into thunarx and expose it as a new function, thunarx_file_info_is_archive(): 0001-thunarx-expose-a-new-function-thunarx_file_info_is_a.patch (this is a patch for Thunar itself; don't forget to install the thunarx headers with "cd thunarx; make; make install").
This first implementation simply always returns FALSE, but it already lets thunar-archive-plugin call the function. See the thunar-archive-plugin patches 0001-tap-backend.c-Use-g_list_free_full-where-applicable.patch (refactoring to fix warnings) and 0002-Use-thunarx_file_info_is_archive-instead-of-detectin.patch.
You can see the resulting branch on GitHub: https://github.com/stokito/thunar-archive-plugin/tree/thurarx_archive
Then I implemented the thunar_file_is_archive() function the same way I did for Send To Email:
/**
 * thunar_file_is_archive:
 * @file : a #ThunarFile instance.
 *
 * Checks whether @file refers to an archive, e.g. tar, zip, gz etc.
 *
 * Return value: %TRUE if @file is an archive.
 **/
gboolean
thunar_file_is_archive (const ThunarFile *file)
{
  const gchar *content_type;
  gchar       *generic_icon_name;
  gboolean     file_is_archive;

  if (thunar_file_is_directory (file))
    return FALSE;

  content_type = thunar_file_get_content_type (THUNAR_FILE (file));
  if (content_type == NULL)
    {
      return FALSE;
    }

  /* If the icon for the content type is "package-x-generic", then the type is an archive.
   * GLib internally uses the shared-mime-info database to determine a content type and its description.
   * Unfortunately, shared-mime-info doesn't have an explicit flag that the type is an archive.
   * But all archive types (tar, zip, gz etc.) have the same icon, "package-x-generic".
   * So we can use the content type's icon name to determine that it is an archive or compressed file. */
  generic_icon_name = g_content_type_get_generic_icon_name (content_type);
  if (generic_icon_name == NULL)
    {
      return FALSE;
    }

  file_is_archive = g_strcmp0 (generic_icon_name, "package-x-generic") == 0;
  g_free (generic_icon_name);

  return file_is_archive;
}
See 0002-thunarx-implement-thunarx_file_info_is_archive.patch
I tested it and everything works fine!
You can see the resulting branch on GitHub: https://github.com/stokito/thunar/tree/thunarx_archive_simple
There is also room for performance improvements. First, we can use the same approach as for thunar_file_is_directory() and check a file flag instead of looking up the file's content type on every thunar_file_is_archive() call. We need to add a new value, THUNAR_FILE_FLAG_IS_ARCHIVE, to the ThunarFileFlags bitmask, set it once when the file info is loaded, and then thunar_file_is_archive() can simply check FLAG_IS_SET (file, THUNAR_FILE_FLAG_IS_ARCHIVE).
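A minimal plain-C sketch of the flag-caching idea follows, assuming the archive bit is computed once whenever the file info is (re)loaded. DemoFile, DemoFileFlags, and the demo_* functions are hypothetical stand-ins for ThunarFile, ThunarFileFlags, and Thunar's own code; the FLAG_* macros only imitate the style of Thunar's helpers and are not copied from it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for ThunarFileFlags and ThunarFile. */
typedef enum
{
  DEMO_FILE_FLAG_IS_DIRECTORY = 1 << 0,
  DEMO_FILE_FLAG_IS_ARCHIVE   = 1 << 1,
} DemoFileFlags;

typedef struct
{
  unsigned int flags;
} DemoFile;

#define FLAG_SET(file, flag)    ((file)->flags |= (unsigned int) (flag))
#define FLAG_UNSET(file, flag)  ((file)->flags &= ~(unsigned int) (flag))
#define FLAG_IS_SET(file, flag) (((file)->flags & (unsigned int) (flag)) != 0)

/* Hypothetical hook, called once whenever the file info is loaded:
 * derive the archive bit from the generic icon name and cache it. */
void
demo_file_info_reloaded (DemoFile *file, const char *generic_icon_name)
{
  if (!FLAG_IS_SET (file, DEMO_FILE_FLAG_IS_DIRECTORY)
      && generic_icon_name != NULL
      && strcmp (generic_icon_name, "package-x-generic") == 0)
    FLAG_SET (file, DEMO_FILE_FLAG_IS_ARCHIVE);
  else
    FLAG_UNSET (file, DEMO_FILE_FLAG_IS_ARCHIVE);
}

/* Cheap accessor: a single bit test, no content-type or icon lookup. */
bool
demo_file_is_archive (const DemoFile *file)
{
  return FLAG_IS_SET (file, DEMO_FILE_FLAG_IS_ARCHIVE);
}
```

The trade-off is that the flag must be refreshed whenever the file's info changes, otherwise it can go stale.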
The second improvement: since we are dealing with an icon, note that the same icon is already resolved inside thunar_file_get_icon_name(). There we call g_themed_icon_get_names(), which returns the list of all icon names for the file. That list is built by GLib from the content type and already includes the generic icon name returned by g_content_type_get_generic_icon_name(). So we can simply check whether the returned list contains "package-x-generic" and, if it does, treat the file as an archive.
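The list check itself is a scan over a NULL-terminated string array, which is the shape g_themed_icon_get_names() returns. The plain-C sketch below uses a hypothetical helper name, icon_names_indicate_archive(), and the example icon lists in the usage are illustrative only; the real names depend on the installed GLib and shared-mime-info.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if the NULL-terminated list of
 * icon names contains "package-x-generic". */
bool
icon_names_indicate_archive (const char *const *names)
{
  if (names == NULL)
    return false;

  for (; *names != NULL; names++)
    if (strcmp (*names, "package-x-generic") == 0)
      return true;

  return false;
}
```

For example, a list such as { "application-zip", "package-x-generic", NULL } would be reported as an archive, while { "text-plain", "text-x-generic", NULL } would not. Inside Thunar itself, the loop could likely be replaced by GLib's g_strv_contains() (available since GLib 2.44).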
See 0002-thunarx-implement-thunarx_file_info_is_archive-by-ic.patch with the optimized version.
This should make the check noticeably cheaper, because we already have the icon list and can do the archive check almost for free. Yes, it looks even uglier, but at the same time it needs even less code and is faster.
I'll attach the patches with the fix and the small refactoring as comments, because I don't know how to attach multiple patch files. You can also see them on GitHub:
thunar-archive-plugin: https://github.com/stokito/thunar-archive-plugin/tree/thurarx_archive
thunar (simple): https://github.com/stokito/thunar/tree/thunarx_archive_simple
thunar (optimized): https://github.com/stokito/thunar/tree/thurarx_archive_optimized
Version: 0.4.0