Most of us expect a search bar on every website we visit. It saves time by taking us straight to what we are looking for. With this in mind, Amazon built a managed search service: CloudSearch.
With support for 34 languages and popular search features such as highlighting, autocomplete, and geospatial search, Amazon CloudSearch sets the bar high.
Curious to know more? Then join us, because there is much more to come. We will look at Amazon CloudSearch and the features that will help you start your CloudSearch journey. So, let’s begin!
What is Amazon CloudSearch?
Amazon CloudSearch is a fully managed service in the AWS Cloud that provides a simple and cost-effective way to set up, manage, and scale a search solution for any website or application. You can use it to search large collections of data such as web pages, document files, forum posts, or product information. You can add fast search capabilities without becoming a search expert, and you do not have to worry about hardware provisioning, setup, or maintenance.
Developing a search solution with Amazon CloudSearch takes three steps:
- Firstly, create and configure a search domain. A search domain consists of your searchable data and the search instances that handle search requests.
- Secondly, upload the data you want to search to the domain. Amazon CloudSearch indexes the data and then deploys the search index to one or more search instances.
- Lastly, search your domain. You send a search request to your domain’s search endpoint as an HTTP/HTTPS GET request.
You can create a search domain and upload the data you want to make searchable from the AWS Management Console. Amazon CloudSearch then provisions the required resources and deploys a tuned search index automatically. You can adjust search parameters, fine-tune relevance, and apply new settings at any time. Now, on to the benefits.
What are the benefits of Amazon CloudSearch?
- Firstly, you can configure and manage an Amazon CloudSearch domain using the AWS Management Console, AWS CLI, and AWS SDKs. You can add or delete index fields and customize search options such as faceting and highlighting.
- Secondly, it provides autoscaling for all search domains. It also lets you control scaling yourself if you know you need more capacity for bulk uploads or expect a surge in search traffic.
- Thirdly, Amazon CloudSearch provides automatic monitoring and recovery for search domains. When Multi-AZ is enabled, Amazon CloudSearch provisions and maintains resources for a search domain in two Availability Zones to provide high availability. Updates are automatically applied to the search instances, and search traffic is distributed across both Availability Zones.
- Next, using automatic sharding and horizontal and vertical autoscaling, it provides low latency and high performance, even at a large scale.
- Fifthly, Amazon CloudSearch is a fully managed custom search service that handles:
- hardware and software provisioning
- setup and configuration
- software patching
- data partitioning
- node monitoring
- scaling
- data durability
- Next, it supports search features like free text, Boolean, and faceted search, autocomplete suggestions, query-time rank expressions, field weighting, geospatial search, and more.
- Amazon CloudSearch offers a low total cost of ownership for your search applications compared to operating a search environment on your own.
- Lastly, it uses strong cryptographic methods to authenticate users and prevent unauthorized access to your domains. Amazon CloudSearch supports HTTPS and integrates with Identity and Access Management (IAM) to control access to the CloudSearch configuration service and each domain’s document, search, and suggest services.
Wondering what’s next? In the next section, we will look at how search works and what it needs from you.
How Does Amazon CloudSearch Work?
The data set used for searching can consist of unstructured full-text documents as well as semi-structured documents organized in markup languages like XML. To make your data searchable, you represent it as a batch of documents in either JSON or XML and upload the batch to your search domain. Amazon CloudSearch then generates a search index from the document data according to the domain’s configuration options. You submit queries against this index to find the documents that meet specific search criteria.
1. Indexing in Amazon CloudSearch
To create a search index from your data, Amazon CloudSearch needs the following information:
- Which document fields do you want to search?
- Which document field values do you want to retrieve with the search results?
- Which document fields represent categories that you want to use to refine and filter search results?
- How should the text within a particular field be processed?
You define this metadata in your domain configuration by configuring indexing options. Indexing options specify the fields included in the search index and control how you can use those fields.
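The four questions above can be pictured as per-field flags. Here is a minimal Python sketch of that idea. It is not the CloudSearch configuration API: the `IndexField` class, its attribute names, and the field types and analysis scheme name used for the sample fields are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexField:
    """Hypothetical model of the options chosen for one index field."""
    name: str
    field_type: str               # assumed type labels such as "text" or "int"
    search: bool = True           # can queries match on this field?
    returned: bool = True         # is its value returned with the results?
    facet: bool = False           # can it be used to refine and filter results?
    analysis_scheme: Optional[str] = None  # how text is processed, if at all


# Illustrative configuration for three movie fields (values are assumptions).
fields = [
    IndexField("title", "text", analysis_scheme="_en_default_"),
    IndexField("genres", "literal-array", facet=True),
    IndexField("year", "int", facet=True),
]

# Fields that answer "which fields represent categories?"
facet_fields = [f.name for f in fields if f.facet]
print(facet_fields)
```

Modelling the options as plain data like this makes it easy to review a domain’s index configuration before applying it.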
2. Facets in Amazon CloudSearch
A facet is an index field that represents a category used for refining and filtering search results. When you submit search requests to Amazon CloudSearch, you can request facet information to find out how many hits share the same value in a facet. You can show this information along with the search results and use it to let users interactively refine their searches.
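To make the idea of "how many hits share the same value" concrete, here is a small Python sketch that counts facet values over a list of hits. CloudSearch computes these counts for you on the server side; the `facet_counts` helper and the sample hits below are made up purely to illustrate the concept.

```python
from collections import Counter

# Made-up hits, each with a multi-valued "genres" field.
hits = [
    {"id": "tt1", "genres": ["Action", "Sci-Fi"]},
    {"id": "tt2", "genres": ["Action"]},
    {"id": "tt3", "genres": ["Drama"]},
]


def facet_counts(hits, field):
    """Count how many hits share each value of the given facet field."""
    counts = Counter()
    for hit in hits:
        # A set ensures each hit is counted once per distinct value.
        for value in set(hit.get(field, [])):
            counts[value] += 1
    return counts


genre_counts = facet_counts(hits, "genres")
print(genre_counts.most_common())
```

A user interface could render these counts next to each genre so that users can click a value to narrow their search.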
3. Text Processing in Amazon CloudSearch
During indexing, Amazon CloudSearch processes the contents of text and text-array fields according to the language-specific analysis scheme configured for the field. An analysis scheme controls how the text is normalized, tokenized, and stemmed. It also specifies any stopwords or synonyms to take into account during indexing.
4. Sorting Results in Amazon CloudSearch
You can customize how search results are ranked by defining expressions that calculate custom values for every document that matches your search criteria. For example, you can define an expression that takes into account the value in a document’s popularity field as well as the default relevance score calculated by Amazon CloudSearch.
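As a rough illustration of the popularity example, the Python sketch below re-ranks a few documents by combining a relevance score with a popularity value. The formula, the 0.1 weight, and the sample documents are assumptions chosen for the example; in CloudSearch itself the expression is evaluated by the service at query time.

```python
def custom_rank(relevance_score, popularity):
    """Hypothetical expression: boost the relevance score by popularity."""
    return relevance_score * (1.0 + 0.1 * popularity)


# Made-up matching documents with their relevance scores.
docs = [
    {"id": "a", "score": 2.0, "popularity": 1.0},
    {"id": "b", "score": 1.8, "popularity": 5.0},
]

ranked = sorted(
    docs,
    key=lambda d: custom_rank(d["score"], d["popularity"]),
    reverse=True,
)
ranked_ids = [d["id"] for d in ranked]
print(ranked_ids)
```

Here document "b" overtakes "a" because its higher popularity outweighs its slightly lower relevance score.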
5. Search Requests in Amazon CloudSearch
You submit search requests to your domain’s search endpoint as HTTP/HTTPS GET requests. You can specify a variety of options to constrain your search, request facet information, control ranking, and specify what you want returned in the results. When you submit a search request, Amazon CloudSearch performs text processing on the search string. The search string is processed by:
- converting all characters to lowercase
- splitting the string into separate terms on whitespace and punctuation boundaries
- removing terms that are on the stopword list for the field being searched
- mapping stems and synonyms according to the stemming and synonym options configured for the field being searched
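The four processing steps above can be sketched in a few lines of Python. This is only an approximation of the idea, not the service’s actual analyzer: the stopword list, stem map, and synonym map are tiny made-up examples, and real stemming is algorithmic rather than a lookup table.

```python
import re

STOPWORDS = {"the", "a", "of"}                  # assumed stopword list
STEMS = {"running": "run", "movies": "movie"}   # assumed stem map
SYNONYMS = {"film": "movie"}                    # assumed synonym map


def process_query(text):
    """Apply the four steps: lowercase, split, drop stopwords, map terms."""
    lowered = text.lower()
    # Split on runs of whitespace and punctuation; drop empty strings.
    terms = [t for t in re.split(r"[\W_]+", lowered) if t]
    terms = [t for t in terms if t not in STOPWORDS]
    processed = []
    for term in terms:
        term = STEMS.get(term, term)      # map the term to its stem
        term = SYNONYMS.get(term, term)   # then map synonyms
        processed.append(term)
    return processed


terms = process_query("The Running Film, movies")
print(terms)
```

Because the same processing is applied at indexing time, a query for "film" can match a document that only contains "movies".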
Amazon CloudSearch: Architecture
You interact with Amazon CloudSearch through three services:
1. Configuration Service
The configuration service is for creating and configuring search domains. To set up a search domain, enter a unique name and configure indexing options, text analysis schemes, availability options, scaling options, suggesters, and expressions:
- Indexing options define the fields to include in your index. To scan your data and automatically configure default indexing options, you can use the AWS Management Console or the Amazon CloudSearch command-line tools.
- Text analysis schemes define language-specific text processing options for text and text-array fields. An analysis scheme lets you:
- specify the stopwords that should be ignored during indexing
- define common synonyms for terms
- specify how terms are mapped to common stems
- Availability options deploy a domain across two Availability Zones to ensure high availability in the event of a service disruption.
- Scaling options prescale your domain by specifying the desired instance type, replication count, and partition count.
- Suggesters retrieve possible matches for an incomplete search query so you can display results as the user types.
- Expressions are numeric expressions that are evaluated at query time and control the ranking of search results. By default, documents are ranked by a relevance score.
2. Document Service
The document service is for making changes to a domain’s searchable data. Each domain has a unique document service HTTP endpoint. To send data to your domain, you need to format it in JSON or XML. Each item that you want returned as a search result is represented as a document. Every document has a unique ID and one or more fields containing the data that you want to search and return in results. Document fields can contain any UTF-8 string data.
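To show what "a unique ID and one or more fields" looks like in practice, here is a minimal Python sketch that builds a JSON document batch. It assumes the add/delete operation layout of the CloudSearch batch format; the `make_batch` helper, the document IDs, and the field values are made up, and treating the 5 MB batch limit as 5 * 1024 * 1024 bytes is also an assumption.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # assumed byte value for the 5 MB batch limit


def make_batch(adds, delete_ids=()):
    """Serialize add and delete operations into one JSON batch (bytes)."""
    operations = [
        {"type": "add", "id": doc_id, "fields": fields}
        for doc_id, fields in adds.items()
    ]
    operations += [{"type": "delete", "id": doc_id} for doc_id in delete_ids]
    payload = json.dumps(operations, ensure_ascii=False).encode("utf-8")
    if len(payload) > MAX_BATCH_BYTES:
        raise ValueError("batch exceeds 5 MB; split it into smaller batches")
    return payload


payload = make_batch(
    {"tt0000001": {"title": "Example Movie", "year": 2001}},  # made-up document
    delete_ids=["tt0000002"],
)
print(payload.decode("utf-8"))
```

The resulting bytes are what you would send to the domain’s document service endpoint; the sketch deliberately stops short of making the HTTP request.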
3. Search Service
The search service handles the search and suggestion requests for a domain. Each domain has a unique search HTTP endpoint. When you send a search or suggest request, the search service returns a list of matching documents. Results can be returned in either JSON or XML. Amazon CloudSearch provides a rich query language for:
- searching within particular fields
- performing complex Boolean searches
- retrieving facet information
- specifying what data you want the results to include
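Since search requests are plain GET requests, the capabilities above map onto query-string parameters. The Python sketch below builds such a URL without sending it. The endpoint host is made up, the `build_search_url` helper is hypothetical, and the parameter names (`q`, `q.parser`, `return`, `facet.FIELD`, `size`) and the `2013-01-01` API version are used here on the assumption that they match the current CloudSearch search API.

```python
from urllib.parse import urlencode

API_VERSION = "2013-01-01"  # assumed API version path segment


def build_search_url(endpoint, query, parser="simple", return_fields=None,
                     facet_field=None, size=10):
    """Build the GET URL for a search request; nothing is sent."""
    params = {"q": query, "q.parser": parser, "size": size}
    if return_fields:
        # Limit which field values come back with each hit.
        params["return"] = ",".join(return_fields)
    if facet_field:
        # "{}" requests facet information with default options.
        params["facet." + facet_field] = "{}"
    return "https://{}/{}/search?{}".format(
        endpoint, API_VERSION, urlencode(params)
    )


url = build_search_url(
    "search-movies-example.us-east-1.cloudsearch.amazonaws.com",  # made-up host
    "star wars",
    return_fields=["title", "year"],
    facet_field="genres",
)
print(url)
```

You could paste a URL built this way into a browser or pass it to any HTTP library, provided the domain’s access policy allows the request.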
Now we are familiar with how Amazon CloudSearch works and what features it offers. In the next sections, we will learn about getting started with Amazon CloudSearch. So, let’s start!
Getting Started with Amazon CloudSearch
Before you get started with Amazon CloudSearch, you need an Amazon Web Services (AWS) account. This account provides access to Amazon CloudSearch and other AWS services. As with other AWS services, you pay only for the Amazon CloudSearch resources you use.
To create an AWS account:
- Firstly, go to https://aws.amazon.com and click Sign Up Now.
- After that, follow the instructions to sign up. You have to enter payment information before you can start using Amazon CloudSearch.
Moving on, we will now learn how to create a CloudSearch domain.
1. Creating an Amazon CloudSearch Domain
An Amazon CloudSearch domain contains:
- the data you want to search
- the search instances that process your search requests
- a configuration that controls how the data is indexed and searched
For each domain, you configure:
- indexing options that describe the fields you want to include in your index and how you want to use them
- analysis schemes that specify language-specific text processing options for individual fields
- expressions that you can use to customize how search results are ranked
- access policies that control access to the domain’s document and search endpoints
To create your movies domain:
- Firstly, go to the Amazon CloudSearch console page.
- Secondly, on that page, click Create Your First Search Domain.
- Thirdly, in the NAME YOUR DOMAIN step, enter a name for your new domain and click Continue. Domain names must start with a letter or number and be at least 3 and no more than 28 characters long. They can contain:
- a-z (lower case)
- 0-9
- - (hyphen)
- Fourthly, in the CONFIGURE INDEX step, click Use a predefined configuration. Then, select IMDb movies and click Continue.
- Then, in the REVIEW INDEX CONFIGURATION step, review the index fields being configured. Eleven fields are configured automatically for the IMDb movie data:
- actors
- directors
- genres
- image_url
- plot
- rank
- rating
- release_date
- running_time_secs
- title
- year
- After that, click Continue.
- Now, in the SET UP ACCESS POLICIES step, select the following settings and click Continue:
- Search and Suggester service: Allow all
- Document Service: Account owner only
- After that, in the CONFIRM step, review the domain configuration. Then, click Confirm to create your domain.
- Lastly, after the domain is created, click OK to exit the Create New Search Domain wizard and go to the domain’s dashboard.
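If you create domains from scripts rather than the console, it can help to check a name against the rules from the NAME YOUR DOMAIN step before submitting it. The Python sketch below encodes exactly those rules (starts with a letter or number, 3 to 28 characters, only a-z, 0-9, and hyphen); the `is_valid_domain_name` helper is hypothetical and not part of any AWS SDK.

```python
import re

# First character: a-z or 0-9. Then 2 to 27 more of a-z, 0-9, or hyphen,
# giving a total length of 3 to 28 characters.
DOMAIN_NAME = re.compile(r"[a-z0-9][a-z0-9-]{2,27}")


def is_valid_domain_name(name):
    """Return True if the name satisfies the naming rules listed above."""
    return DOMAIN_NAME.fullmatch(name) is not None


for candidate in ["movies", "ab", "Movies", "-movies", "imdb-movies-2"]:
    print(candidate, is_valid_domain_name(candidate))
```

The service may enforce additional constraints, so treat this as a quick pre-check rather than a guarantee that a name will be accepted.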
2. Uploading Data to Amazon CloudSearch for Indexing
Uploading your data allows Amazon CloudSearch to build and deploy a searchable index. For indexing, the data must be formatted in either JSON or XML. The Amazon CloudSearch console can automatically convert the following file types to the required format:
- Document batches formatted in JSON or XML (.json, .xml)
- Comma-separated values (.csv)
- Text documents (.txt)
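To get a feel for what converting a CSV file into a document batch involves, here is a small Python sketch that turns each CSV row into an add operation. This is not the console’s actual conversion logic: the `csv_to_batch` helper, the choice of an `id` column, and the sample rows are assumptions, and all values are kept as strings.

```python
import csv
import io


def csv_to_batch(csv_text, id_column):
    """Turn each CSV row into an 'add' operation keyed by id_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    batch = []
    for row in reader:
        fields = dict(row)
        doc_id = fields.pop(id_column)  # the ID is not stored as a field
        batch.append({"type": "add", "id": doc_id, "fields": fields})
    return batch


# Made-up CSV content with a header row.
sample = "id,title,year\ntt1,Example One,2001\ntt2,Example Two,2002\n"
batch = csv_to_batch(sample, "id")
print(batch[0])
```

In a real pipeline you would also convert numeric columns such as year to numbers so they match the types of the corresponding index fields.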
To upload the sample data to your movies domain:
- Firstly, visit the Amazon CloudSearch console page.
- Secondly, click the name of your movies domain in the Navigation panel to view the domain dashboard.
- Thirdly, click the Upload Documents button at the top of the domain dashboard.
- Then, select Predefined data in the DOCUMENT SOURCE step. After that, choose IMDb movies and click Continue.
- Next, review the upload summary in the REVIEW DOCUMENTS step. Then, click Upload Documents to send the data to your domain for indexing.
- Lastly, click Finish in the DOCUMENT SUMMARY step to return to the domain dashboard.
3. Searching Your Amazon CloudSearch Domain
To do this, you can use the search tester in the Amazon CloudSearch console to submit sample search requests and view the results. You can also submit sample search requests using a web browser or cURL. In your application, you can use any HTTP library to send search traffic to your Amazon CloudSearch domain.
Searching with the Search Tester
In the Amazon CloudSearch console, the search tester lets you submit sample search requests using the supported query parsers: simple, structured, lucene, or dismax. You can also define options for the selected parser, filter results, and browse the configured facets. By default, results are sorted by an automatically generated relevance score, _score.
To search your domain:
- Firstly, visit the Amazon CloudSearch console page.
- Secondly, click the name of your movies domain in the Navigation panel. Then, click the Run a Test Search link.
- Thirdly, to perform a simple text search, enter the text you want to search for and click Go. By default, all text and text-array fields are searched.
- Then, to search particular fields, click the More Parameters link and enter a comma-separated list of the fields you want to search in the Search Fields field. You can append a caret (^) and a weight to each field to control its relative importance in the search results.
- Lastly, to use the structured query syntax, select Structured from the Query Parser menu. After selecting the structured query parser, enter your structured query in the search field and click Go.
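Outside the console, the same field weighting can be expressed in a request’s query options. The Python sketch below builds the parameters for a search that weights one field more heavily than another. It assumes that the `q.options` parameter accepts a JSON object with a `fields` list using the caret syntax described above; the `weighted_fields_params` helper and the weights are made up for illustration.

```python
import json
from urllib.parse import urlencode


def weighted_fields_params(query, weights):
    """Build query parameters that search the given fields with weights."""
    # "title^5" means: search the title field with a weight of 5.
    fields = ["{}^{}".format(name, weight) for name, weight in weights.items()]
    return {"q": query, "q.options": json.dumps({"fields": fields})}


params = weighted_fields_params("star", {"title": 5, "plot": 1})
print(urlencode(params))
```

With these weights, a match in the title contributes more to a document’s relevance score than a match in the plot.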
4. Deleting Your Amazon CloudSearch Movies Domain
When you have finished experimenting with your movies domain, delete it to avoid incurring additional usage fees.
To delete your movies domain:
- Firstly, visit the Amazon CloudSearch console page.
- Secondly, click the name of your movies domain in the Navigation panel to view the domain dashboard.
- Thirdly, click the Delete this Domain button at the top of the domain dashboard.
- Lastly, select the Delete the domain option in the Delete Domain dialog box. After that, click OK to permanently remove the domain and all of its data.
Amazon CloudSearch Pricing
Amazon CloudSearch has no set-up fees or upfront commitments. The major portion of a typical domain’s costs comes from search instance usage.
Customers are billed according to their monthly usage across the following dimensions:
1. Search Instances
You are billed hourly for each search instance your domain uses.
2. Batch Uploads
You are billed for the total number of document batches uploaded to your search domain. Uploaded documents are automatically indexed.
- $0.10 per 1,000 batch upload requests. The maximum batch size is 5 MB.
3. IndexDocuments Requests
If you change your index configuration, for example by adding a field, you need to rebuild the index. To do this, you issue an IndexDocuments request using the AWS Management Console, command-line tools, AWS SDKs, or APIs. The charge is:
- $0.98 per GB of data stored in your search domain
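To see how these two rates add up, here is a short worked example in Python. It uses only the prices quoted above ($0.10 per 1,000 batch upload requests and $0.98 per GB for an IndexDocuments request); the usage figures are made up, and the sketch assumes the batch charge is prorated rather than rounded up to the next 1,000 requests.

```python
from decimal import Decimal

BATCH_PRICE_PER_1000 = Decimal("0.10")   # from the Batch Uploads rate above
REINDEX_PRICE_PER_GB = Decimal("0.98")   # from the IndexDocuments rate above


def batch_upload_cost(num_batches):
    """Cost of uploading num_batches document batches (assumed prorated)."""
    return Decimal(num_batches) / Decimal(1000) * BATCH_PRICE_PER_1000


def reindex_cost(stored_gb):
    """Cost of one IndexDocuments request for the given stored data size."""
    return Decimal(stored_gb) * REINDEX_PRICE_PER_GB


# Made-up monthly usage: 50,000 batches and one rebuild of a 10 GB domain.
upload_cost = batch_upload_cost(50_000)
rebuild_cost = reindex_cost(10)
print("Batch uploads:", upload_cost)
print("Index rebuild:", rebuild_cost)
print("Total:", upload_cost + rebuild_cost)
```

Decimal is used instead of float so that the currency amounts stay exact. Search instance hours and data transfer are not included in this estimate.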
4. Data Transfer
Standard AWS data transfer charges apply for data transferred in and out of Amazon CloudSearch. There is no charge for data transferred between Amazon CloudSearch and other AWS services within the same region.
Final Words
With support for many languages and a rich set of search features, Amazon CloudSearch is a strong choice for a managed search solution. It has helped many organizations add rich search capabilities to their websites and applications. Moreover, you can easily manage and control Amazon CloudSearch using the AWS Management Console, AWS CLI, and AWS SDKs. So, go through this blog to familiarize yourself with the features of Amazon CloudSearch and learn how to create a CloudSearch domain.