Transcript for:
Exploring Solr: Features and Architecture

Welcome to my Sitecore Diaries. This is a new series which we are starting today, called Supportive Pages. In this series we are going to talk about non-Sitecore topics which will help you in your Sitecore projects. It can be any .NET topic or some other such topic. The first video is on Solr, so this is the introduction to Solr.
Before moving on, let me introduce myself: I am Jitendra Khanekar. If you have not subscribed to our channel, please subscribe. If you like the video, please click the like button, share it with others, and provide your feedback in the comments.
So let's start the introduction to Solr. Let's see what Solr is. Solr is an open source search platform which is used to build search applications, and it is built on Lucene. Lucene is another technology; some other day we will have a video on Lucene. Basically, Solr is used for content search: if your application has a lot of content and you want to search for anything in that content, Solr is what you use.
That search can be a wildcard search, a phrase search, and so on; Solr supports those kinds of searches. It is used to find the required information in a large data source: if you have a lot of data, Solr is the search technology to use. It can also be used for storage. It is a non-relational database; it is not like SQL, and it doesn't have any relationships. All the data is stored as documents in Solr. It is a scalable, ready-to-deploy search and storage engine optimized to search large volumes. So, to summarize, Solr is basically used for searching content in large volumes.
If we talk about the history, Solr was created by Yonik Seeley in 2004. It was built for the CNET Networks website, and in January 2006 it was made an open source project under the Apache Software Foundation. The latest version of Solr is [...].
Let's see the features of Solr. The first is the RESTful API. Solr provides a RESTful API, which makes a developer's life easy, and that is why it can be integrated with any technology. We can submit documents in different formats such as XML, JSON, and CSV, and get the results in the same formats.
The second is full-text search. What that means is that different options are provided: you can search by token, you can search by phrase; there is spell check, so even if you type the wrong spelling it still gives results; there are wildcards, so you can put a star and get all the matching results; and there is autocomplete.
Next, it is a NoSQL database, not a relational database, which means it can be used for huge volumes. It has a good admin interface; we will have a demo of the admin interface in the next videos. And it is flexible and extensible: by extending the Java classes and the configuration, we can extend Solr. These are the few features which make Solr very useful: the RESTful API, full-text search, a NoSQL database for huge volumes, a good admin interface, and being flexible and extensible.
Let's see Solr's basic concepts and terminology. For this we will take a very simple example. Let's assume you want to search for something in a book. How will you search for it? The first way is to go to each page and look for the term or text you want, page by page, until you finally find the page. The second method is to go to the first pages, the table of contents or the index page, and look for the relevant text there. For example, if you are searching for "Sitecore" in a book, you look up where Sitecore appears, then go to that page and check it, and that is how you find the actual page. This, in a very simple way, is how Solr works as well.
Let's see the different terminology using this example. Whatever you are searching for is your search term. How you search is the query, so you build queries. Just as the book has an index page, indexes are created for your documents. The document is your actual data, and it contains fields. One document can have a lot of fields where your data is stored. In SQL we have columns and rows, but in a document you have field and value, field and value. Then there is something called the index writer, which actually writes the index, while the component that searches the index is called the index searcher.
These are very simple concepts and terminology. Whenever you want to understand Solr in a simple way, this is the best example: you have a book where you want to search for something, you go to the index, you find the page number, and then you go to that page. Similarly, in Solr, for whatever documents get stored, the corresponding indexes get created. The index doesn't store all the data; it holds references. Whatever you search for is looked up in the index, and from that you get the document. That is the basic concept of Solr. Let's see what SolrCloud is.
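Before turning to SolrCloud, the book-index idea above can be sketched in a few lines of Python. This is only a toy illustration of looking a term up in an index and then fetching the documents it points to; it is not how Solr or Lucene is implemented internally, and the sample documents, field names, and function names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sample documents: each one is a set of field/value pairs.
documents = {
    1: {"title": "Sitecore search basics", "body": "Solr builds an index for every document"},
    2: {"title": "Cooking at home", "body": "A recipe index helps you find a dish"},
    3: {"title": "Solr and Lucene", "body": "Lucene is the library underneath Solr"},
}


def build_index(docs):
    """Map each lowercased term to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for value in fields.values():
            for term in value.lower().split():
                index[term].add(doc_id)
    return index


def search(index, docs, term):
    """Look the term up in the index first, then fetch the matching documents."""
    doc_ids = sorted(index.get(term.lower(), set()))
    return [(doc_id, docs[doc_id]["title"]) for doc_id in doc_ids]


index = build_index(documents)
print(search(index, documents, "solr"))
print(search(index, documents, "index"))
```

The index only holds terms and document ids (the "page numbers"); the full field values stay in the documents, which matches the point that the index keeps references instead of all the data.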
SolrCloud is a highly available and fault-tolerant environment, so it is an environment which has both high availability and fault tolerance. SolrCloud is basically Solr running in a clustered, cloud-style setup.
So what is high availability? High availability is the ability of a system to operate continuously without failing for a designated period, so the system should stay up without failing. And what is fault tolerance? Fault tolerance is a design that returns the system to a safe condition in the event of a failure or malfunction, so if your system fails for any reason, it should be back up in no time. These two things are features of a cloud environment, and because Solr runs in that kind of setup, SolrCloud gives you high availability and fault tolerance.
If you go by the definition, SolrCloud is a system in which data is organized into multiple pieces, or shards (we will see what a shard is), that can be hosted on multiple machines, with replicas providing redundancy for both scalability and fault tolerance, and a ZooKeeper server that helps manage the overall structure so that both indexing and search requests can be routed properly. ZooKeeper basically works like a load balancer: if you have multiple instances, how requests should be routed for indexing and search is managed by ZooKeeper. We will see ZooKeeper in detail.
Now let's see the SolrCloud architecture and the different terminology used in it. This is high level; we are not going into detail. In this high-level architecture you have a ZooKeeper cluster, which handles the requests for indexing as well as the requests for searching, and it also handles the load.
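The definition above (data split into shards, shards hosted on several machines, copies kept for redundancy, and requests routed to the right place) can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the shard and node names are invented, and the hash-and-modulo routing is a simple stand-in, not the routing Solr itself performs.

```python
import hashlib

# Hypothetical layout: two shards, each with copies on two different nodes.
CLUSTER = {
    "shard1": ["solr-node-1", "solr-node-2"],
    "shard2": ["solr-node-2", "solr-node-1"],
}


def shard_for(doc_id):
    """Pick a shard from a stable hash of the document id (toy routing only)."""
    shard_names = sorted(CLUSTER)
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    return shard_names[int(digest, 16) % len(shard_names)]


def live_copy(shard, down_nodes=()):
    """Return the first node hosting this shard that is still up."""
    for node in CLUSTER[shard]:
        if node not in down_nodes:
            return node
    raise RuntimeError(f"no live copy of {shard}")


def route_index_request(doc_id, down_nodes=()):
    """An indexing request goes to the one shard that owns the document."""
    shard = shard_for(doc_id)
    return shard, live_copy(shard, down_nodes)


def route_search_request(down_nodes=()):
    """A search needs one live copy of every shard to cover the whole index."""
    return {shard: live_copy(shard, down_nodes) for shard in CLUSTER}


print(route_index_request("page-42"))
print(route_search_request())
print(route_search_request(down_nodes={"solr-node-1"}))
```

With one node marked as down, every shard can still be served from its other copy, which is the redundancy the definition refers to.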
Then you have two Solr instances, and there are different shards, shard 1 and shard 2. Within a shard, one copy acts as the leader and another acts as a replica.
So what does ZooKeeper do? ZooKeeper provides centralized cluster management. You can see Solr instance 1 and Solr instance 2; the cluster they form is managed by ZooKeeper. ZooKeeper tracks each node of the cluster and the state of each core on each node, so it does not only balance the load, it also tracks the nodes. We will see what a core and a node are later.
Configuration files are stored in ZooKeeper, not on the file system. Generally we keep the configuration files on the instance itself, but here we put them in ZooKeeper. When configuration changes are made, they must be uploaded to ZooKeeper, which in turn makes sure the changes reach the nodes. So if you update anything, that is ZooKeeper's responsibility: whatever configuration files you upload should go to ZooKeeper, and it will manage them.
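The idea of uploading configuration once to a central place and letting every node pick up the change can be sketched as follows. This is a toy model only: `ConfigStore` and `Node` are hypothetical classes written for this example, not a ZooKeeper client and not Solr's own mechanism, and the file content is a placeholder.

```python
class ConfigStore:
    """Toy stand-in for a central configuration store (not a ZooKeeper client)."""

    def __init__(self):
        self._files = {}     # file name -> content
        self._watchers = []  # callbacks registered by nodes

    def watch(self, callback):
        """Register a callback that is invoked whenever a file is uploaded."""
        self._watchers.append(callback)

    def upload(self, name, content):
        """Store the file centrally, then notify every registered node."""
        self._files[name] = content
        for notify in self._watchers:
            notify(name, content)


class Node:
    """A node keeps a local view of the configuration it has been told about."""

    def __init__(self, name, store):
        self.name = name
        self.config = {}
        store.watch(self._on_change)

    def _on_change(self, file_name, content):
        self.config[file_name] = content


store = ConfigStore()
nodes = [Node("solr-node-1", store), Node("solr-node-2", store)]

# One upload to the central store reaches both nodes.
store.upload("schema.xml", "<schema name='demo'/>")
for node in nodes:
    print(node.name, sorted(node.config))
```

The point of the design is that nobody edits files on each instance separately; there is a single upload, and the central store is responsible for getting the change to every node.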
For the instances, ZooKeeper also handles load balancing and failover. We have two instances here, which we call nodes, so we have two nodes. Depending on which node has a lot of load, it balances accordingly, so it acts as a load balancer as well.
Collection: a collection is an entire group of cores that represent an index.
Shard: a shard is a logical partition of a collection. This partition stores part of the entire index for a collection. Sharding is handled automatically, simply by telling Solr during collection creation how many shards you would like your collection to have. Index updates are then generally balanced between the shards automatically. So when you create a collection, you specify how many shards you want; here we have two shards, and indexing is balanced between shard 1 and shard 2, so the index is distributed across both shards.
Solr core: a Solr core is a single physical index.
Replica: this is the replica you see here. It is a physical manifestation of a logical shard.
Leader: this is the leader. The leader is one of the replicas; every shard designates one replica as the leader to coordinate indexing for that shard. So if a shard has a number of replicas, one of them is designated as the leader, which coordinates indexing for that shard.
Node: a single instance of Solr is called a node. You see two instances here, so each instance is a node.
Cluster: all the nodes we are using to host Solr cores. You have Solr 1 and Solr 2; that complete group is called a cluster.
Let's see a few configuration files which are used in Solr.
The first is solr.xml. It is a file in the SOLR_HOME directory that contains the SolrCloud-related information, and all the cores are loaded from this file.
Next is solrconfig.xml. This file contains the definitions and core-specific configuration related to request handling and response formatting, along with indexing, configuring, managing memory, and making commits.
Then we have schema.xml. This is a very important file. It contains the whole schema, along with the fields and field types. Whatever documents you are creating, whatever data you are storing, the definitions of which fields are there and what types they have are contained in schema.xml. Take SQL as an example: when you design a SQL table, you give the column names, the limits, and all that information. Similarly, we have the schema file in Solr.
Then core.properties. This file contains the configuration specific to the core. It is used for core discovery, as it contains the name of the core, and the path of the data directory where the data is stored is also given in core.properties. It can be placed in any directory, which will then be treated as the core directory.
These are the four important configuration files in Solr. So we have seen basic Solr and SolrCloud.
As a Sitecore developer, you must have heard the term SearchStax. So what is SearchStax? It is Solr as a service. SearchStax is a product which provides a Solr-as-a-service option. It is basically Solr only, but it is used as a service, so it is managed SolrCloud. For Sitecore solutions it is a preferred product, because you do not have to concentrate on how it works or manage Solr yourself; it is managed by the SearchStax team.
What is the advantage of using a Solr-as-a-service option like SearchStax? It has a built-in monitoring and alerting mechanism, so you do not have to work on setting up monitoring and alerting. It offers cloud automation; as I said, it is SolrCloud only, but managed. It provides backup and disaster recovery: since it is in the cloud, it provides all the cloud-related features such as high availability, clustering, backup, and disaster recovery. For security purposes it creates private cloud instances. And since the infrastructure is managed by SearchStax and not by us, the overall infrastructure cost also gets reduced.
SearchStax provides two products. One is Managed Cloud, which is basically normal Solr hosted on the cloud. The other, which is a very interesting option, is Search Studio. If you have seen Coveo, there you have a UI as well; a similar kind of functionality is provided by Search Studio. It is an AI-driven search service with content recommendations. So these are the two options provided by SearchStax.
So we are done for today. We have seen what Solr is, the basics of Solr, what SolrCloud is, the different terminology used in SolrCloud, what SearchStax is, and the advantage of using SearchStax. This is just an introductory video on Solr. In the next few videos we will talk about how Solr gets installed, and then we will have a demo of the Solr admin module.
If you have not subscribed to the channel, please subscribe, click the like button, share it with your friends, and provide your feedback in the comments. Also, which is very important, click the notification bell icon to get notifications. If you have any questions, you know where I am reachable: this is my Gmail ID and this is my LinkedIn ID. Thank you, and thanks for watching this video.