YaBB - Yet another Bulletin Board
May 17th, 2024, 3:43am


How to build a Search Engine WebServer
Fernando
YaBB Administrator
*****
NY City




Posts: 2320
Gender: male
How to build a Search Engine WebServer
Dec 28th, 2016, 1:34am
 
It's been almost 15 years since my "so-called" friends and I created the BizInfo Plus search engine. It was meant to be a business search engine for the B2B model. The hardware and software I created and modified worked perfectly; for almost 10 years it ran without a problem. Then greed set in, and my friends became stupid idiots.
 
Back in the '90s and early 2000s, there were many search engine companies, some of them still in business: Ask Jeeves, Ask.com, AOL.com, Lycos, and many others. And many of them, including Google and Yahoo, used the free search engine software that was out at the time. In the simplest terms, if you have a computer or server that runs Perl or some other web-accessible language, it can run search engine software.
 
Search engine software comes in two parts: the Spider (or web crawler) with the Indexer, and the Query Engine with the Interface. The Spider crawls the internet, starting from the website links programmed into it, and sends what it finds to the Indexer, which builds the database. The Query Engine searches through the database for the information requested through the Interface, usually a web page like Google's front page.
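The three pieces described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not BIP's code or the code of any package mentioned in this thread: the `WEB` dictionary is a hypothetical stand-in for real HTTP fetching, and the function names are made up for the example. The spider does a breadth-first crawl, the indexer builds an inverted index (each word mapped to the set of URLs containing it), and the query engine intersects those sets.

```python
import re
from collections import deque

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
# A real spider would download each page over HTTP and parse out its links.
WEB = {
    "http://a.example/": ("Acme widgets for business buyers", ["http://b.example/"]),
    "http://b.example/": ("Bulk widgets and B2B supply", ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("Supply chain software", ["http://dead.example/"]),
}


def spider(seeds, web, max_pages=1000):
    """Spider: breadth-first crawl from the seed links, yielding (url, text)."""
    queue = deque(seeds)
    seen = set(seeds)
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if url not in web:  # dead link: nothing to fetch
            continue
        text, links = web[url]
        fetched += 1
        yield url, text
        for link in links:  # queue only links we have not seen yet
            if link not in seen:
                seen.add(link)
                queue.append(link)


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Indexer: inverted index mapping each word to the set of URLs containing it."""
    index = {}
    for url, text in pages:
        for word in tokenize(text):
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Query engine: URLs that contain every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)
```

For example, `index = build_index(spider(["http://a.example/"], WEB))` crawls all three sample pages, and `search(index, "bulk supply")` returns `["http://b.example/"]`. The interface would simply be a web page that passes the user's words to `search` and lists the results.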
 
Like I said, many search engines used the same free search engine software out there, and some still do to this day! Though many search engine packages were made, the main ones I used, and that others used back then, were Ht:Dig, Juggernaut Search Engine, Fluid Dynamics Search Engine, Swish-E and WebGlimpse. Except for Ht:Dig, they all ran on Perl. Ht:Dig was compiled to machine code, but there was a version called Ht://Dig that was written in Perl. There were many others, but I can't find the links for them. I did find this comparison:
http://www.searchtools.com/analysis/free-search-engine-comparison.html
 
First off, you need a list of links to crawl. DMOZ has a list, compiled by its volunteers searching the web, in one huge 8GB text file, and you have to select and comb through what you want from it. The file is so big that no ordinary text editor can open it. You need various file tools to strip the URLs out of it, which takes the 10GB+ file down to an 800MB file. That is still too big for any text editor. But you can keep filtering it and split it into separate files for .com, .net, .org, .edu, and .mil websites. Then you can strip off everything after the .com, .net, etc. ending (the path part of each URL), and the files become manageable. With these separate link lists, you can begin spidering the URLs and indexing the results into your database.
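A minimal Python sketch of that filtering pass, assuming the dump can be read line by line and that a simple regular expression is good enough to pick out URLs (an assumption; the real dump's markup may deserve a proper parser). It streams the file so the whole thing never has to fit in memory, keeps only the host name of each URL (dropping the path), and buckets sites by top-level domain. `bucket_sites`, `write_buckets` and the `sites.com.txt`-style file names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Rough URL pattern: stops at whitespace, quotes and angle brackets (assumption:
# good enough for a one-off cleanup pass, not a full URL parser).
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
KEEP_TLDS = ("com", "net", "org", "edu", "mil")


def bucket_sites(lines):
    """Stream the dump line by line; return {tld: set of host names}."""
    buckets = {tld: set() for tld in KEEP_TLDS}
    for line in lines:
        for url in URL_RE.findall(line):
            try:
                host = urlsplit(url).hostname or ""  # drops path, query and port
            except ValueError:  # malformed URL, e.g. a broken IPv6 literal
                continue
            tld = host.rsplit(".", 1)[-1]
            if tld in buckets:
                buckets[tld].add(host)
    return buckets


def write_buckets(buckets, prefix="sites"):
    """Write one sorted host list per TLD, e.g. sites.com.txt (hypothetical names)."""
    for tld, hosts in buckets.items():
        with open(f"{prefix}.{tld}.txt", "w", encoding="utf-8") as out:
            for host in sorted(hosts):
                out.write(host + "\n")
```

Opening the dump with `open(path, encoding="utf-8", errors="replace")` and passing the file object to `bucket_sites` reads it one line at a time. The de-duplicated host sets still live in memory, so for the full .com list it may be cheaper to write hosts straight to disk and remove duplicates afterwards.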
 
But this can take days, if not weeks, especially the .com list. Back in '02, it took 6 weeks to do all the .com websites; today I'd guess it would take longer. You would need to break the list down and spread it across several machines at once, then merge the results together.
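One way to split the work and merge it back, sketched in Python. It assumes each machine builds its own partial index in a simple word-to-set-of-URLs form; `shard` and `merge_indexes` are hypothetical helpers. The sketch assigns URLs with a CRC32 checksum rather than Python's built-in `hash()`, because string hashing is randomized per process, so two machines using `hash()` could disagree about which shard a URL belongs to.

```python
import zlib


def shard(urls, n_machines):
    """Split the link list into n shards, one per machine.

    CRC32 is stable across processes and machines, so the same URL
    always lands in the same shard."""
    shards = [[] for _ in range(n_machines)]
    for url in urls:
        shards[zlib.crc32(url.encode("utf-8")) % n_machines].append(url)
    return shards


def merge_indexes(partials):
    """Merge per-machine inverted indexes (word -> set of URLs) into one."""
    merged = {}
    for partial in partials:
        for word, urls in partial.items():
            merged.setdefault(word, set()).update(urls)
    return merged
```

Each machine crawls and indexes only its own shard; the partial indexes are then copied to one box and combined with `merge_indexes`.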
 
But then two problems come up. First, the indexed database becomes huge: in '02, the indexed database I created for BIP was 32GB. OK, so it is huge. Then came the second problem: accessing it. On a typical hard drive, a simple search on a database that size took 20 minutes. No one was going to visit your search engine and wait 20 minutes for a response. So how were the online search engines doing it?
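A back-of-the-envelope check on that 20-minute figure, assuming a search ends up reading most of the 32GB file sequentially and assuming a sustained read rate of about 30 MB/s for a hard drive of that era (an assumed number, not one from the post):

```latex
t_{\text{scan}} = \frac{32\ \text{GB}}{30\ \text{MB/s}}
               = \frac{32 \times 1024\ \text{MB}}{30\ \text{MB/s}}
               \approx 1092\ \text{s} \approx 18\ \text{minutes}
```

That lands in the same range as the 20 minutes observed, which is why the storage the database sits on matters so much.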
 
A RAM drive. By taking part of the computer's RAM and turning it into a virtual drive, accessing the file took only microseconds! But not every system can support that much RAM. The second-best option is a RAM drive that connects to the hard drive port. Hyper OS has one, similar to the one I built long ago:
http://www.hyperossystems.com/
 
A third option is a flash drive built from CompactFlash (CF) cards or an SSD. This setup is the slowest of the three, but still many times faster than a hard drive.
 
With the database on a RAM or SSD drive, you can access the information within microseconds. Then all you have to do is see whether your system can take a million hits a month, or even a week. There are paid test sites out there that will load-test your web server. With the database on the RAM drive, it can handle over a million hits a week.
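To see what "a million hits a week" asks of the server, it helps to turn it into requests per second. A small Python sketch; the 30-day month and the 10x burst factor are assumptions for illustration, not measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(hits, days):
    """Average requests per second if `hits` arrive evenly over `days` days."""
    return hits / (days * SECONDS_PER_DAY)


def peak_rate(hits, days, peak_factor=10):
    """Rough peak load; the 10x burst factor is an illustrative assumption."""
    return average_rate(hits, days) * peak_factor


for label, days in (("week", 7), ("month (30 days)", 30)):
    print(f"1,000,000 hits per {label}: {average_rate(1_000_000, days):.2f} req/s average, "
          f"~{peak_rate(1_000_000, days):.0f} req/s at a 10x peak")
```

A million hits a week averages only about 1.65 requests per second; what a load test really checks is whether each query stays fast when traffic bunches up into bursts, which is where the RAM drive earns its keep.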
 
Once you have this much done, then you can put your search engine online on a high-speed connection.
 
There is a lot more to do, like securing the server, but as far as the search engine itself goes, this is it. You're done. At least once a month you should update your database and check your logs.
 
This can work on any system. The original server for the BIP search engine ran on my PowerBook G3 laptop and on a G4 Mac tower co-located at a data warehouse. We later moved up to Xserve Mac servers running in the same space as Google and Yahoo on an OSD23 line.
 
The funny thing is, if done right, a very basic and limited search engine can run on a Raspberry / Banana / Orange / Nano Pi board or any other small-board system. In fact, I would like to see whether it can; in theory it should. It would be limited by its database size.
 
The demise of BIP came from my "so-called" friends. They got offers from Google, Yahoo, IBM and others to sell the search engine and its code. They were offered several million for it, as I would later find out. But I held onto the code I had modified, and I was offered nothing. So I took the code and left. They were never able to recover from that.
 
 
Fernando
YaBB Administrator
*****
NY City




Posts: 2320
Gender: male
Re: How to build a Search Engine WebServer
Reply #1 - Jan 1st, 2017, 10:01pm
 
History as I remember it: AltaVista (bought out by Overture, and then Yahoo, in 2003) used Fluid Dynamics Search Engine, as did Ask Jeeves. Yahoo and Ask Jeeves used a modified version of Ht:Dig. Google experimented with several search engines before making one of their own based on the code of Ht:Dig and Juggernaut.
 
There were many other search engine sites, and they used modified code from the software that was out there. I forget the link, but there was a large site that hosted all the old search engine software. I need to find that website again; a lot of valuable software is (or was) on there.
 
 
Hondo I. Sackett
YaBB Administrator
*****
Behind you!




Posts: 1312
Gender: male
Re: How to build a Search Engine WebServer
Reply #2 - Jan 2nd, 2017, 10:22am
 
Good to know!
 
 

Well the cowboy, like the red man, you had to leave your land
You can't raise your stock and plant your crop in the gumbo and the sand
Greed disguised as progress has put us to the test
They won't be glad until we're gone from our home out in the west
It's sad to see those good old days replaced with greed and doubt
Soon we'll leave the country, the campfire has gone out
Bid 'em all adieu, you can't turn the world about
The cowboy left the country, the campfire has gone out
Fernando
YaBB Administrator
*****
NY City




Posts: 2320
Gender: male
Re: How to build a Search Engine WebServer
Reply #3 - Jan 2nd, 2017, 1:08pm
 
Quote from Fernando on Jan 1st, 2017, 10:01pm:
History as I remember it: AltaVista (bought out by Overture, and then Yahoo, in 2003) used Fluid Dynamics Search Engine, as did Ask Jeeves. Yahoo and Ask Jeeves used a modified version of Ht:Dig. Google experimented with several search engines before making one of their own based on the code of Ht:Dig and Juggernaut.

There were many other search engine sites, and they used modified code from the software that was out there. I forget the link, but there was a large site that hosted all the old search engine software. I need to find that website again; a lot of valuable software is (or was) on there.

 
Correction: that second Ask Jeeves should be Ask.com, not Ask Jeeves.
 
 