"Smart search" for HTML

This MR was inspired by #107, although it does not fully address that issue.

This is a start towards implementing a more user-friendly content search for rich text formats.

Ordinary end users (i.e., non-developers) typically open files such as .rtf, .odt, and .html in external programs that show only the display text and hide the markup that controls how that text is rendered. They would therefore not expect Catfish to return a file in a content search if the search terms cannot be found with Ctrl-F in a WYSIWYG editor.

I implemented this only for HTML for now because it is more straightforward than the other formats. .odt and .doc will probably be trickier to implement, as those files are not plain text (.odt is a zipped archive and .doc is binary).
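To make the idea concrete, here is a minimal sketch of a display-text search for HTML. It is not the code in this MR; the names `VisibleTextExtractor` and `html_contains` are hypothetical, and it assumes that parsing with the standard library's `html.parser` is acceptable. It skips the contents of `<script>`, `<style>`, and `<title>`, ignores tag names and attributes, and matches keywords case-insensitively against what remains.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect only text a reader would see in the rendered page body.

    Hypothetical helper for illustration; not part of Catfish.
    """

    # Elements whose contents are not shown as page text.
    HIDDEN_TAGS = {"script", "style", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._hidden_depth = 0  # > 0 while inside a hidden element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Tag names and attribute values are deliberately ignored.
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse whitespace runs.
        # Limitation: a word split by inline tags ("He<b>llo</b>")
        # becomes two words here.
        return " ".join(" ".join(self._chunks).split())


def html_contains(markup, keywords):
    """Return True if every keyword occurs in the visible text."""
    parser = VisibleTextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    visible = parser.text().lower()
    return all(keyword.lower() in visible for keyword in keywords)
```

With this sketch, a file containing `<p class="banana">Hello <b>world</b></p>` matches a search for "hello world" but not a search for "banana", because the class name is markup rather than display text. Parsing every candidate file costs more than a raw substring scan, which is the performance trade-off to weigh.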

For now, I want to check that we agree this is a path we wish to take with the program, and that this is a good way to implement it performance-wise, before I continue working on it.
