Search In The Future

1. What

   A conventional search engine, like Google, is retrospective by design. You can find only what has already happened. Moreover, any change must be indexed before it becomes searchable. Indexing typically takes from days to weeks. That is good enough for a historical reference but always too late for important updates.

   The alternative approach is prospective search. It alerts you instantly when something matching your query happens. It works best for getting insights, monitoring mentions in the media, or finding a job or any other offer.

2. Why

   I want to get alerts about my interests as soon as something happens, whether it's a scientific discovery, breaking news or a matching offer. I believe I'm not alone and many of us want the same.

   A human needs seconds to minutes to read and understand a message. So a good end-to-end delivery delay is about 1 second, and the worst acceptable one is about 1 minute. This is quite far from what conventional search engines offer (hours at least).

   Moreover, you have to repeat the search again and again to get new updates. This is a waste of your time, and there's still no guarantee you get anything matching. Instead, I want to submit a query once and just wait for results. A good solution should push a notification when, and only when, a result occurs, without any useless distractions.

2.2. Feeds

   What about feed readers and aggregators like Feedly or Inoreader? Reader applications automatically poll for updates and are able to notify you. However, this is not true search because it lacks content filtering. The only way to filter is to limit the sources you allow to distract you. Then you stay in your limited bubble and miss any serendipitous message from the rest of the world. Additionally, there's no guarantee that a trusted source won't feed you spam or other irrelevant messages. Client-side content filtering is not feasible either, simply because your mobile phone cannot filter a stream of messages from the whole internet. Feeds are always a compromise between missing something important outside your bubble and being distracted by tons of spam.

   Therefore, an effective prospective search solution should find the matching queries for every incoming message. The message can then be pushed to everyone interested. For alert notifications, there are ready-to-use instant messaging applications today. So I see the solution as an integration with an existing messenger.
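
   To illustrate the idea of matching every incoming message against stored queries, here is a minimal sketch in Python. It is not how Awakari is implemented. The class name, the subscription ids and the keyword-only matching are assumptions made for the example.

```python
from collections import defaultdict
import re


class ProspectiveIndex:
    """Hypothetical index: queries are stored first, messages are matched later."""

    def __init__(self):
        # keyword -> ids of the subscriptions interested in that keyword
        self._by_keyword = defaultdict(set)

    def subscribe(self, sub_id, keywords):
        """Register a subscription under each of its keywords."""
        for kw in keywords:
            self._by_keyword[kw.lower()].add(sub_id)

    def match(self, text):
        """Return ids of all subscriptions with at least one keyword in the text."""
        matched = set()
        for word in set(re.findall(r"\w+", text.lower())):
            matched |= self._by_keyword.get(word, set())
        return matched


index = ProspectiveIndex()
index.subscribe("alice-jobs", ["golang", "remote"])
index.subscribe("bob-science", ["exoplanet"])

hits = index.match("New exoplanet found by a remote telescope")
print(sorted(hits))  # ['alice-jobs', 'bob-science']
```

   The lookup cost depends on the number of words in the message rather than on the number of stored queries, which is what makes matching on every incoming message practical in this sketch.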

2.3. Past Solutions Graveyard

   Google had something similar, named Real-Time Search, a long time ago. Unfortunately, it moved to the graveyard in 2011. There is also Google Alerts, but it's based on retrospective search, so the delay is unacceptable (and it might not work at all). So the demand is still there, and I don't know of anything else fulfilling it.

3. How

   Based on the requirements described above, I implemented the Awakari service. It's free to use, so I encourage everyone to try it.

   When you navigate to the awakari.com site, you will first see the search input box. You can use it for a quick keyword query test without signing in. Such a search is text-only and its results are limited to the past. When that's not enough and you want to be notified about future results, you need to subscribe.

   Currently, it's possible to subscribe with a Telegram account. The Telegram integration serves both authentication and notification purposes. The Awakari service itself doesn't use any user data except the id needed to distinguish users. This can be verified because this part is open source. Other delivery and notification options are possible to implement too.

   After signing in, you're welcome to:

    ✓ create and use up to 10 subscriptions

    ✓ publish up to 100 messages per day

    ✓ define advanced query conditions

    ✓ bring your own publishing sources

   Sources added by a user consume that user's daily publishing quota. You can request a limit increase or a dedicated publishing limit for your source. Since the service is completely free, donations are appreciated. They would help handle the load and fund further development.

3.1. Sources

   There are currently around 500 sources producing around 1M messages per month. Awakari supports a wide variety of publishing source types:

    ✓ Feeds:

        ✓ RSS / Atom / JSON

        ✓ schema.org entries:

            ✓ JSON-LD

            ✓ Microdata

        ✓ Microformats

        ✓ HTML5 articles

    ✓ Fediverse (ActivityPub):

        ✓ Mastodon

        ✓ Friendica

        ✓ Hubzilla

        ? others (not tested though)

    ✓ Telegram Channels (public)

    ✓ Sites checked daily

   Awakari recognizes feeds that support WebSub. If a feed has no WebSub, Awakari polls it periodically for updates and collects update frequency statistics. The statistics are later used for adaptive poll scheduling. For other source types like Telegram and Fediverse, updates are consumed in real time without any polling.

   Awakari extracts the message text and performs a text search to find all matching subscriptions. It also uses basic word stemming, so different forms of the same word will match. At the moment, stemming works only for English, but this may be extended later. An exact text match option is available as well.
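
   The adaptive poll scheduling can be pictured roughly as follows. This is only a sketch under my own assumptions: the function name, the mean-interval estimate and the 1 minute / 1 day bounds are illustrative and not taken from the Awakari code.

```python
def next_poll_delay(update_times, min_delay=60.0, max_delay=86400.0):
    """Estimate the delay (seconds) before the next poll of a feed.

    update_times: timestamps (seconds) of the updates observed so far.
    The delay follows the mean interval between updates, clamped to
    the [min_delay, max_delay] range.
    """
    if len(update_times) < 2:
        # Not enough statistics yet: fall back to the slowest schedule.
        return max_delay
    ordered = sorted(update_times)
    mean_interval = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    return max(min_delay, min(max_delay, mean_interval))


# A feed that was updated every 10 minutes is polled every 10 minutes.
print(next_poll_delay([0, 600, 1200, 1800]))  # 600.0
```

   A feed that updates often gets polled often, while a quiet feed drifts towards the daily check.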

   The general recommendation is to use specific keywords longer than 3 characters. When you specify multiple keywords separated by spaces, the search matches any of them. So "funny cat" matches every message containing "funny", "cat", or both. This is a bit different from how conventional search works.
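
   The "any keyword" semantics combined with stemming can be sketched like this. The suffix-stripping function below is a deliberately naive stand-in I made up for the example. It is not the stemmer Awakari uses.

```python
import re

# Naive English suffixes, checked in this order (an assumption for this sketch).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Lowercase a word and strip one common suffix, keeping at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches_any(query, text):
    """True if any space-separated query keyword occurs in the text (by stem)."""
    query_stems = {naive_stem(w) for w in query.split()}
    text_stems = {naive_stem(w) for w in re.findall(r"\w+", text)}
    return bool(query_stems & text_stems)


print(matches_any("funny cat", "Two cats sleeping"))  # True
print(matches_any("funny cat", "Dogs barking"))       # False
```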

   The next important feature is structured search across the message metadata. Messages contain useful attributes like:

    • Subject (Author)

    • Categories

    • Language

    • Location

    • Price

    • Source

   ... and more. You can use them in advanced conditions for additional filtering. For text search conditions, the attribute key may be skipped to look for a match everywhere.

   Note also the "Not" checkbox available for a condition. It negates the condition when you want to exclude something from the results. Negative conditions cannot be used alone and must be combined with others.

   Since some message attributes are numbers, there are numeric search conditions. You choose a message attribute like price, latitude or media duration, then define a comparison like "<", "=", "≥" and a value like 42. Note that a numeric condition requires a message attribute key to be specified.
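
   A numeric condition can be thought of as a key, a comparison and a value. The sketch below shows the idea. The function name, the attribute dictionary and the handling of missing or non-numeric values are assumptions for illustration only.

```python
import operator

# Comparison symbols mapped to Python operators.
OPS = {
    "<": operator.lt,
    "≤": operator.le,
    "=": operator.eq,
    "≥": operator.ge,
    ">": operator.gt,
}


def numeric_match(attrs, key, op, value):
    """Check a numeric condition against a message's attributes.

    A message without the attribute, or with a non-numeric value,
    does not match (an assumption of this sketch).
    """
    raw = attrs.get(key)
    if raw is None:
        return False
    try:
        number = float(raw)
    except (TypeError, ValueError):
        return False
    return OPS[op](number, value)


message = {"price": "39.99", "language": "en"}
print(numeric_match(message, "price", "<", 42))      # True
print(numeric_match(message, "latitude", "≥", 42))   # False
```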

3.5. Group Conditions

   Finally, you can group conditions and define the logic like "And", "Or", "Xor". Nested conditions may also be groups.
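
   Grouped conditions form a tree that is evaluated against each message. Here is a small sketch of such a tree. The Leaf and Group classes are hypothetical, and treating "Xor" as "exactly one child matches" is my assumption for groups with more than two children.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Leaf:
    """A single condition; negate plays the role of the "Not" checkbox."""
    test: Callable[[dict], bool]
    negate: bool = False

    def matches(self, msg):
        return self.test(msg) != self.negate


@dataclass
class Group:
    """A group of conditions; children may be leaves or nested groups."""
    logic: str  # "And", "Or" or "Xor"
    children: list

    def matches(self, msg):
        results = [child.matches(msg) for child in self.children]
        if self.logic == "And":
            return all(results)
        if self.logic == "Or":
            return any(results)
        if self.logic == "Xor":
            return sum(results) == 1  # assumption: exactly one child matches
        raise ValueError(f"unknown logic: {self.logic}")


# "cat" in the text AND (English OR price below 42) AND NOT a given source
condition = Group("And", [
    Leaf(lambda m: "cat" in m.get("text", "").lower()),
    Group("Or", [
        Leaf(lambda m: m.get("language") == "en"),
        Leaf(lambda m: m.get("price", float("inf")) < 42),
    ]),
    Leaf(lambda m: m.get("source") == "spam.example", negate=True),
])

msg = {"text": "Funny cat video", "language": "en", "source": "news.example"}
print(condition.matches(msg))  # True
```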

   Together, all these advanced search features provide rich possibilities to filter content. You're encouraged to experiment with various combinations of conditions. When a subscription produces unexpected results, just update its conditions. For example, a subscription may initially be too general and produce too many results. Just try to make it more specific then.

3.6. Reading Results

   When you create a new subscription, @AwakariBot in Telegram starts to deliver the search results to you. If you have multiple subscriptions, it's recommended to assign every subscription to a separate group for your convenience. To do this, create a simple group, add @AwakariBot and send the /start command to the chatbot. This command always displays the list of your subscriptions to choose from. Clicking on a subscription links it to, or unlinks it from, the current chat. Additional icons next to every subscription show whether it is linked to the current chat or to another one.

   The primary purpose of the chatbot is to read search results. The chatbot may also be used to publish a basic message or to subscribe with a simple text search query. If you're an admin of a public channel and want Awakari to consume posts from your channel, you can add @AwakariBot to your channel and remove all of its admin rights.

   Note the resulting messages also contain the attributes, e.g. "source". You can use this info to fine-tune the subscription conditions.

   Integrations other than Telegram are doable too, so your feedback is important.

   That's it for this time. Thanks for reading.

- Andrei Kurilov, 2024