Caching SearXNG


SearXNG installs itself under /usr/local/searxng, with the main source code in the searxng-src sub-directory.

To hack the results, the file to edit is /usr/local/searxng/searxng-src/searx/webapp.py

The relevant function in webapp.py is:

@app.route('/search', methods=['GET', 'POST'])
def search():

A cache could work by...

  • making a directory in the searx folder named cache
  • making sub-folders in the cache directory from a to z and 0 to 9
  • naming the cache files after the search term
  • checking whether a cache file for the term exists when a search is performed (see the sketch after this list)
  • if there is a match, reading in the local file instead and skipping the real search
  • sending the keywords to the maintainers so they can update the cache; they can then crawl the search engines and build a more comprehensive cache
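
A minimal sketch of that lookup, assuming the cache lives in a cache folder under searx and that search terms are sanitised before being used as filenames (both the location and the cache_path()/cached_results() helpers are assumptions, not existing SearXNG code):

import json
import os
import re

CACHE_DIR = '/usr/local/searxng/searxng-src/searx/cache'  # assumed location

def cache_path(search_term):
    """Hypothetical helper: map a search term to its cache file."""
    # Lower-case and sanitise the term so it is safe as a filename
    safe = re.sub(r'[^a-z0-9]+', '_', search_term.lower()).strip('_') or '0'
    # File it under the a-z / 0-9 sub-folder matching its first character
    return os.path.join(CACHE_DIR, safe[0], safe + '.json')

def cached_results(search_term):
    """Return cached results for the term, or None on a cache miss."""
    path = cache_path(search_term)
    if os.path.isfile(path):
        with open(path, encoding='utf-8') as f:
            return json.load(f)
    return None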

Proposed SearXNG options (one way of reading them is sketched after the list):

  • use the cache
  • update the cache
  • disclose the caching to the end user
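
The searx.settings dict is real, but all three cache keys below are hypothetical proposals; none of them exist in settings.yml yet:

from searx import settings

# Hypothetical keys, not existing settings
cache_cfg = settings.get('cache', {})
use_cache = cache_cfg.get('use', False)           # serve results from the cache
update_cache = cache_cfg.get('update', False)     # write fresh results back
disclose_cache = cache_cfg.get('disclose', True)  # tell the user a result was cached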

Benefits:

  • turns SearXNG into a full search engine built from cached results
  • searches run against a local file, which speeds them up significantly
  • offline searching once the cache grows big enough

Files in question:

/usr/local/searxng/searxng-src/searx/search/__init__.py : class Search

/usr/local/searxng/searxng-src/searx/webapp.py : def search() : search = SearchWithPlugins(search_query, request.user_plugins, request)  # pylint: disable=redefined-outer-name

In class Search, the original method

def search_multiple_requests(self, requests):

was duplicated as

def search_multiple_requests2(self, requests):

and an if/else clause, based on whether cached results exist, chooses between returning the cached version and doing the real search.

In class Search, something like this goes into def search_standard(self):

if os.path.isfile(filepath):
    self.search_multiple_requests2(requests)  # cached version
else:
    self.search_multiple_requests(requests)   # do the real search
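
Putting it together, a sketch of how the branch might sit inside search_standard, assuming the hypothetical cache_path() helper from the earlier sketch and that the raw query string is reachable as self.search_query.query:

def search_standard(self):
    requests, self.actual_timeout = self._get_requests()

    # send all search requests, unless the cache already has an answer
    if requests:
        filepath = cache_path(self.search_query.query)  # hypothetical helper
        if os.path.isfile(filepath):
            self.search_multiple_requests2(requests)  # cached version
        else:
            self.search_multiple_requests(requests)   # do the real search
    return True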

The original:

def search_multiple_requests(self, requests):
    # pylint: disable=protected-access
    search_id = str(uuid4())

    for engine_name, query, request_params in requests:
        _search = copy_current_request_context(PROCESSORS[engine_name].search)
        th = threading.Thread(  # pylint: disable=invalid-name
            target=_search,
            args=(query, request_params, self.result_container, self.start_time, self.actual_timeout),
            name=search_id,
        )
        th._timeout = False
        th._engine_name = engine_name
        th.start()

    for th in threading.enumerate():  # pylint: disable=invalid-name
        if th.name == search_id:
            remaining_time = max(0.0, self.actual_timeout - (default_timer() - self.start_time))
            th.join(remaining_time)
            if th.is_alive():
                th._timeout = True
                self.result_container.add_unresponsive_engine(th._engine_name, 'timeout')
                PROCESSORS[th._engine_name].logger.error('engine timeout')

The duplicate function returns mock results instead:

def search_multiple_requests2(self, requests):
    # pylint: disable=protected-access
    search_id = str(uuid4())

    # Skip the actual searching and assign mock results instead
    mock_result_container = ResultContainer()
    web_results = ['Mock Web Result 1', 'Mock Web Result 2', 'Mock Web Result 3']
    # Ensure each result dictionary has a 'content' key
    mock_web_results = [{'url': result, 'content': ''} for result in web_results]
    mock_result_container.extend('web', mock_web_results)
    self.result_container = mock_result_container

The nightmare is populating the result container correctly for the mock search_multiple_requests2 (one possible workaround is sketched below).
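
Rather than hand-building mock results, one way around it might be to cache the real results as plain dicts after a normal search and feed them back through ResultContainer.extend() on a cache hit. The cache_path() helper, the save_results_to_cache() method and the 'cache' engine name are all assumptions; 'url', 'title' and 'content' are the keys a standard web result carries:

import json
import os

def save_results_to_cache(self):
    """Hypothetical: dump the finished results after a real search."""
    filepath = cache_path(self.search_query.query)
    results = [
        {'url': r.get('url'), 'title': r.get('title'), 'content': r.get('content', '')}
        for r in self.result_container.get_ordered_results()
    ]
    os.makedirs(os.path.dirname(filepath), exist_ok=True)
    with open(filepath, 'w', encoding='utf-8') as f:
        json.dump(results, f)

def search_multiple_requests2(self, requests):
    """Hypothetical: serve a cache hit instead of querying any engine."""
    filepath = cache_path(self.search_query.query)
    with open(filepath, encoding='utf-8') as f:
        cached = json.load(f)
    # 'cache' stands in as the engine name shown next to each result;
    # extending the existing container avoids rebuilding its state by hand
    self.result_container.extend('cache', cached)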

  
