A few optimizations to `CatfishSearchEngine.py` for better performance and lower memory usage
I think we can slightly optimize catfish.

1. The `else` block at line 292 in method [`CatfishSearchEngine.search_filenames` in CatfishSearchEngine.py](https://gitlab.xfce.org/apps/catfish/-/blob/master/catfish/CatfishSearchEngine.py#L292)

   ```py
   # OLD
   match_list = set()
   for kword in keywords:
       if kword in fname.lower():
           match_list.add(kword)
   if len(set(keywords)) == len(match_list):
       return True

   # NEW
   # Avoids repeated substring searches for duplicates in `keywords`
   # Avoids repeatedly converting `fname` to lower case
   # Avoids repeatedly converting `keywords` to a `set`
   # No need for the extra `set` `match_list` (thus reducing memory usage)
   # Returns immediately if a single `kword` is not present in `fname`
   fname = fname.lower()
   for kword in set(keywords):
       if kword not in fname:
           return False
   return True
   ```

2. In method [`CatfishSearchMethod_Zeitgeist.run` at line 674 of CatfishSearchEngine.py](https://gitlab.xfce.org/apps/catfish/-/blob/master/catfish/CatfishSearchEngine.py#L674)
   - This change avoids repeatedly converting the string `keywords` to lower case inside the nested `for` loops.

   ```py
   # line=676
   # OLD
   keywords = " ".join(keywords)
   # NEW
   keywords = " ".join(keywords).lower()
   ```

   ```py
   # line=707
   # OLD
   if keywords.lower() in filename and \
   # NEW
   if keywords in filename and \
   ```

3. `else` block of method [`CatfishSearchMethod_Fulltext.search_text` at line 557 of CatfishSearchEngine.py](https://gitlab.xfce.org/apps/catfish/-/blob/master/catfish/CatfishSearchEngine.py#L557)
   - I did some performance testing using the attached files `optimization3_.*` and found that the proposed change does improve performance.
   The attached test scripts and some of the results follow:
   - [optimization3_speed_test.py](/uploads/f66d0c13bc2931252f2017fb0b76d76a/optimization3_speed_test.py)
   - [optimization3_speed_test_input_generator.py](/uploads/de76a27b1c89c6e8c6553d0c127dc099/optimization3_speed_test_input_generator.py)

   ```
   [ExecutionTime] "catfish_old_full_search" = 00:00:08.600 (i.e. ~8.6 seconds)
   [ExecutionTime] "new_full_search" = 00:00:00.285 (i.e. ~0.29 seconds)

   [ExecutionTime] "catfish_old_full_search" = 00:00:00.935 (i.e. ~0.9 seconds)
   [ExecutionTime] "new_full_search" = 00:00:00.128 (i.e. ~0.13 seconds)
   ```

   ```py
   # OLD
   def search_text(self, lines, keywords):
       if self.exact:
           for line in lines:
               if " ".join(keywords) in line.lower():
                   return True
       else:
           match_list = set()
           for line in lines:
               for kword in keywords:
                   if kword in line.lower():
                       match_list.add(kword)
               if len(set(keywords)) == len(match_list):
                   return True

   # NEW
   # Avoids repeated substring searches for duplicates in `keywords`
   # Avoids repeatedly converting `keywords` to a `set`
   # Avoids searching again for a keyword that was already found
   # Extra memory requirement drops from a `set[str]` to a `bool` list plus an `int`
   def search_text(self, lines, keywords):
       if self.exact:
           for line in lines:
               if " ".join(keywords) in line.lower():
                   return True
       else:
           keywords = list(set(keywords))
           match_found = [False for i in range(len(keywords))]
           match_found_count = 0
           for line in lines:
               for i in range(len(keywords)):
                   if (not match_found[i]) and keywords[i] in line.lower():
                       match_found[i] = True
                       match_found_count += 1
               if match_found_count == len(keywords):
                   return True
   ```
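To make proposals 1 and 3 easier to review outside catfish, here is a minimal standalone sketch. The free functions `old_match`, `new_match` and `new_search_text` are hypothetical stand-ins I made up for this check, not names from `CatfishSearchEngine.py`. `new_search_text` is only one possible variant of proposal 3: it tracks the keywords still missing in a set and lower-cases each line once, instead of once per keyword.

```python
# Hypothetical stand-ins for the non-exact branches discussed in this issue.
# These are plain functions, not the real catfish methods.


def old_match(fname, keywords):
    """Mirrors the OLD non-exact filename check from proposal 1."""
    match_list = set()
    for kword in keywords:
        if kword in fname.lower():
            match_list.add(kword)
    return len(set(keywords)) == len(match_list)


def new_match(fname, keywords):
    """Mirrors the NEW check; all() stops at the first missing keyword."""
    fname = fname.lower()
    return all(kword in fname for kword in set(keywords))


def new_search_text(lines, keywords):
    """Variant of proposal 3 (non-exact branch only).

    Keeps the set of keywords not seen yet and shrinks it line by line.
    """
    remaining = set(keywords)
    for line in lines:
        line = line.lower()  # lower-case once per line, not once per keyword
        remaining = {kword for kword in remaining if kword not in line}
        if not remaining:
            return True
    return False


if __name__ == "__main__":
    # Made-up sample inputs; keywords are assumed to be lower case already.
    samples = [
        ("Report_Final.TXT", ["report", "final"]),
        ("notes.txt", ["report"]),
        ("a.txt", ["a", "a"]),
        ("x", []),
    ]
    for fname, keywords in samples:
        assert old_match(fname, keywords) == new_match(fname, keywords)
    print("old and new filename checks agree on the sample inputs")
```

The set-based variant trades the `bool` list and counter for a shrinking set, so the memory argument above is slightly different; I have not benchmarked it against the attached scripts.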