Caching SearXNG


SearXNG installs itself under /usr/local/searxng, with the main source code in the searxng-src sub-directory.

To hack the results, the file to edit is /usr/local/searxng/searxng-src/searx/webapp.py

The relevant function in webapp.py is:

@app.route('/search', methods=['GET', 'POST'])
def search():

A cache could work by...

  • making a directory in the searx folder named cache
  • making sub-folders in the cache directory from a to z and 0 to 9
  • naming the cache files after the search term
  • checking whether a cache file for the term exists when a search is performed (see the sketch after this list)
  • if there is a match, reading in the local file instead and skipping the real search
  • sending the keywords to the maintainers so they can update the cache; they can then crawl the search engines and build a more comprehensive cache
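
A minimal sketch of that lookup, assuming the cache lives in a cache folder under searx and that search terms are sanitised before being used as filenames (both the location and the cache_path()/cached_results() helpers are assumptions, not existing SearXNG code):

import json
import os
import re

CACHE_DIR = '/usr/local/searxng/searxng-src/searx/cache'  # assumed location

def cache_path(search_term):
    """Hypothetical helper: map a search term to its cache file."""
    # Lower-case and sanitise the term so it is safe as a filename
    safe = re.sub(r'[^a-z0-9]+', '_', search_term.lower()).strip('_') or '0'
    # File it under the a-z / 0-9 sub-folder matching its first character
    return os.path.join(CACHE_DIR, safe[0], safe + '.json')

def cached_results(search_term):
    """Return cached results for the term, or None on a cache miss."""
    path = cache_path(search_term)
    if os.path.isfile(path):
        with open(path, encoding='utf-8') as f:
            return json.load(f)
    return None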

Proposed SearXNG options (one way of reading them is sketched after the list):

  • use the cache
  • update the cache
  • disclose the caching to the end user
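
The searx.settings dict is real, but all three cache keys below are hypothetical proposals; none of them exist in settings.yml yet:

from searx import settings

# Hypothetical keys, not existing settings
cache_cfg = settings.get('cache', {})
use_cache = cache_cfg.get('use', False)           # serve results from the cache
update_cache = cache_cfg.get('update', False)     # write fresh results back
disclose_cache = cache_cfg.get('disclose', True)  # tell the user a result was cached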

Benefits:

  • turns SearXNG into a full search engine built from cached results
  • searches run against a local file, which speeds them up significantly
  • offline searching once the cache grows big enough

Files in question:

/usr/local/searxng/searxng-src/searx/search/__init__.py : class Search

/usr/local/searxng/searxng-src/searx/webapp.py : def search() : search = SearchWithPlugins(search_query, request.user_plugins, request)  # pylint: disable=redefined-outer-name

In class Search, the original method

def search_multiple_requests(self, requests):

was duplicated as

def search_multiple_requests2(self, requests):

and an if/else clause, based on whether cached results exist, chooses between returning the cached version and doing the real search.

In class Search, something like this goes into def search_standard(self):

if os.path.isfile(filepath):
    self.search_multiple_requests2(requests)  # cached version
else:
    self.search_multiple_requests(requests)   # do the real search
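
Putting it together, a sketch of how the branch might sit inside search_standard, assuming the hypothetical cache_path() helper from the earlier sketch and that the raw query string is reachable as self.search_query.query:

def search_standard(self):
    requests, self.actual_timeout = self._get_requests()

    # send all search requests, unless the cache already has an answer
    if requests:
        filepath = cache_path(self.search_query.query)  # hypothetical helper
        if os.path.isfile(filepath):
            self.search_multiple_requests2(requests)  # cached version
        else:
            self.search_multiple_requests(requests)   # do the real search
    return True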

The original:

def search_multiple_requests(self, requests):
    # pylint: disable=protected-access
    search_id = str(uuid4())

    for engine_name, query, request_params in requests:
        _search = copy_current_request_context(PROCESSORS[engine_name].search)
        th = threading.Thread(  # pylint: disable=invalid-name
            target=_search,
            args=(query, request_params, self.result_container, self.start_time, self.actual_timeout),
            name=search_id,
        )
        th._timeout = False
        th._engine_name = engine_name
        th.start()

    for th in threading.enumerate():  # pylint: disable=invalid-name
        if th.name == search_id:
            remaining_time = max(0.0, self.actual_timeout - (default_timer() - self.start_time))
            th.join(remaining_time)
            if th.is_alive():
                th._timeout = True
                self.result_container.add_unresponsive_engine(th._engine_name, 'timeout')
                PROCESSORS[th._engine_name].logger.error('engine timeout')

The duplicate function returns mock results instead:

def search_multiple_requests2(self, requests):
    # pylint: disable=protected-access
    search_id = str(uuid4())

    # Skip the actual searching and assign mock results instead
    mock_result_container = ResultContainer()
    web_results = ['Mock Web Result 1', 'Mock Web Result 2', 'Mock Web Result 3']
    # Ensure each result dictionary has a 'content' key
    mock_web_results = [{'url': result, 'content': ''} for result in web_results]
    mock_result_container.extend('web', mock_web_results)
    self.result_container = mock_result_container

The nightmare is populating the result container correctly for the mock search_multiple_requests2 (one possible workaround is sketched below).
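
Rather than hand-building mock results, one way around it might be to cache the real results as plain dicts after a normal search and feed them back through ResultContainer.extend() on a cache hit. The cache_path() helper, the save_results_to_cache() method and the 'cache' engine name are all assumptions; 'url', 'title' and 'content' are the keys a standard web result carries:

import json
import os

def save_results_to_cache(self):
    """Hypothetical: dump the finished results after a real search."""
    filepath = cache_path(self.search_query.query)
    results = [
        {'url': r.get('url'), 'title': r.get('title'), 'content': r.get('content', '')}
        for r in self.result_container.get_ordered_results()
    ]
    os.makedirs(os.path.dirname(filepath), exist_ok=True)
    with open(filepath, 'w', encoding='utf-8') as f:
        json.dump(results, f)

def search_multiple_requests2(self, requests):
    """Hypothetical: serve a cache hit instead of querying any engine."""
    filepath = cache_path(self.search_query.query)
    with open(filepath, encoding='utf-8') as f:
        cached = json.load(f)
    # 'cache' stands in as the engine name shown next to each result;
    # extending the existing container avoids rebuilding its state by hand
    self.result_container.extend('cache', cached)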

  
