Few optimizations to `CatfishSearchEngine.py` for better performance and lesser memory usage
I think we can slightly optimize catfish
1. The `else` block at line 292 in method [`CatfishSearchEngine.search_filenames` in CatfishSearchEngine.py](https://gitlab.xfce.org/apps/catfish/-/blob/master/catfish/CatfishSearchEngine.py#L292)
```py
# OLD
match_list = set()
for kword in keywords:
if kword in fname.lower():
match_list.add(kword)
if len(set(keywords)) == len(match_list):
return True
# NEW
# Avoids repeated substring search for duplicates in `keywords`
# Avoids repeated processing of converting `fname` to lower case
# Avoids repeated conversion of object `keywords` to `set`
# No need of an extra `set` data structure `match_list` (thus reducing memory usage)
# Returns immediately if a single `kword` is not present in `fnanme`
fname = fname.lower()
for kword in set(keywords):
if kword not in fname:
return False
return True
```
2. In method [`CatfishSearchMethod_Zeitgeist.run` at line 674 of CatfishSearchEngine.py](https://gitlab.xfce.org/apps/catfish/-/blob/master/catfish/CatfishSearchEngine.py#L674)
- This change avoid repeated conversion of string `keywords` to lower case in nested `for` loops
```py
# line=676
# OLD
keywords = " ".join(keywords)
# NEW
keywords = " ".join(keywords).lower()
```
```py
# line=707
# OLD
if keywords.lower() in filename and \
# NEW
if keywords in filename and \
```
3. `else` block of method [`CatfishSearchMethod_Fulltext.search_text` at line 557 of CatfishSearchEngine.py](https://gitlab.xfce.org/apps/catfish/-/blob/master/catfish/CatfishSearchEngine.py#L557)
- I did some performance testing using the attached files `optimization3_.*` and found that the proposed change indeed improved the performance. Following are some of the results
- [optimization3_speed_test.py](/uploads/f66d0c13bc2931252f2017fb0b76d76a/optimization3_speed_test.py)
- [optimization3_speed_test_input_generator.py](/uploads/de76a27b1c89c6e8c6553d0c127dc099/optimization3_speed_test_input_generator.py)
```
[ExecutionTime] "catfish_old_full_search" = 00:00:08.600 (i.e. ~8 seconds)
[ExecutionTime] "new_full_search" = 00:00:00.285 (~0.29 seconds)
[ExecutionTime] "catfish_old_full_search" = 00:00:00.935 (i.e. ~0.9 seconds)
[ExecutionTime] "new_full_search" = 00:00:00.128 (i.e. ~0.13 seconds)
```
```py
# OLD
def search_text(self, lines, keywords):
if self.exact:
for line in lines:
if " ".join(keywords) in line.lower():
return True
else:
match_list = set()
for line in lines:
for kword in keywords:
if kword in line.lower():
match_list.add(kword)
if len(set(keywords)) == len(match_list):
return True
# NEW
# Avoids repeated substring search for duplicates in `keywords`
# Avoids repeated conversion of `keywords` object to `set`
# Avoids repeated substring search if the keyword was already found in the past
# Extra memory requirement has reduced from `set[str]` to `bool array + int`
def search_text(self, lines, keywords):
if self.exact:
for line in lines:
if " ".join(keywords) in line.lower():
return True
else:
keywords = list(set(keywords))
match_found = [False for i in range(len(keywords))]
match_found_count = 0
for line in lines:
for i in range(len(keywords)):
if (not match_found[i]) and keywords[i] in line.lower():
match_found[i] = True
match_found_count += 1
if match_found_count == len(keywords):
return True
```
issue