How to create a search engine with Google Search API

Well, this isn't really a how-to article, but rather a bit of news about what I've been working on lately.

I created a file search engine using the Google Search API. I first used the Google Search API some time ago, when I wanted to automatically add a bit of content to each page I created. I pulled the first 8 results from the API and displayed the title and description of each, and further down the page I displayed 3 links picked at random from those 8 results.

Judging from my traffic logs and the search queries that brought visitors to my site from Google, I think it worked pretty well.

Since last month I've been using the Google Search API to build a PDF search engine. It's very simple: I display a search box, take whatever the visitor submits, add a "filetype:PDF" operator to it, and send that query to Google. It returns only results whose URL ends with .pdf :-)
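Here is a minimal sketch of that query-building step in Python. It is an illustration, not the site's actual code: the function name `build_file_query` and the whitespace clean-up are assumptions, and the call to the search API itself is left out.

```python
def build_file_query(user_query: str, extension: str = "pdf") -> str:
    """Append a filetype: operator to whatever the visitor typed."""
    # Collapse repeated whitespace so equivalent inputs give the same query.
    cleaned = " ".join(user_query.split())
    # Accept "pdf", ".pdf" or "PDF" and normalise to a bare lowercase extension.
    ext = extension.strip().lstrip(".").lower()
    return f"{cleaned} filetype:{ext}"


print(build_file_query("  python   tutorial "))
# prints: python tutorial filetype:pdf
```

The resulting string is what would be sent as the search query.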

I then display the results, and there you go: I have a file search engine.

I then developed a caching system to limit the number of requests I make to the Google Search API. Here's my method: I read the results, build the HTML from them, and write everything to a file.

The next time someone submits the same request, I check whether the file exists and whether it is less than one year old. If so, I simply read the content of the file and display it. If the file doesn't exist or is too old, I recreate it.
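The caching logic can be sketched roughly like this in Python. The names `get_cached_html` and `build_html` are made up for the example; `build_html` stands in for whatever queries the API and renders the results. The sketch also uses the file's modification time as its age, which is an assumption, although the two amount to the same thing when a file is only ever written once per refresh.

```python
import os
import time

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def get_cached_html(path, build_html, max_age=ONE_YEAR_SECONDS):
    """Return the cached page if it is fresh enough, otherwise rebuild it."""
    if os.path.exists(path):
        age = time.time() - os.path.getmtime(path)
        if age < max_age:
            with open(path, encoding="utf-8") as fh:
                return fh.read()
    # Cache miss or stale file: build the page again and overwrite the cache.
    html = build_html()  # hypothetical callback: call the API, render HTML
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(html)
    return html
```

With this shape, the expensive API call only happens inside `build_html`, so a fresh cache file means no request is made at all.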

As you can imagine, this produces a lot of files in a short time, and keeping them all in a single folder would cause problems for the server's file system. So I spread them out: I take the first 3 characters of the md5 hash of the request and use them as the name of a subfolder inside the cache folder. The same request always lands in the same folder, and the subfolders are created automatically, a few thousand of them at most (3 hexadecimal characters give 4,096 possible names).
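As a rough Python sketch of that folder scheme: the function name `cache_path`, the `cache` root folder, and naming the file after the full hash with an `.html` extension are all assumptions made for the example.

```python
import hashlib
import os


def cache_path(query: str, cache_root: str = "cache") -> str:
    """Map a search request to a cache file inside a hash-named subfolder."""
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    # The first three hex characters of the hash pick the subfolder.
    folder = digest[:3]
    # Assumption: the file itself is named after the full hash.
    return os.path.join(cache_root, folder, digest + ".html")


print(cache_path("python tutorial filetype:pdf"))
```

Because the hash depends only on the request text, the same request always maps to the same path, and different requests spread fairly evenly over the subfolders.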

I also wanted to create lots of pages for search engines to read. So for each PDF file, I take the title, keep every word of 3 letters or more, and display those words as tags, each linking to the corresponding search page. That way, when a search engine bot reads a page, it also follows the links to the other "tag" pages, and simply by visiting them it generates those new pages.
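The tag extraction could look something like the following Python sketch. Lowercasing the words and dropping duplicates are assumptions added for the example, and `title_to_tags` is a made-up name.

```python
import re


def title_to_tags(title: str, min_length: int = 3) -> list[str]:
    """Split a result title into tag words of at least min_length characters."""
    # Runs of letters and digits count as words; punctuation is dropped.
    words = re.findall(r"[^\W_]+", title.lower())
    tags = []
    for word in words:
        if len(word) >= min_length and word not in tags:
            tags.append(word)
    return tags


print(title_to_tags("An Introduction to Python 3 (2nd ed.)"))
# prints: ['introduction', 'python', '2nd']
```

Each tag would then be rendered as a link to the search page for that word.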

Finally, I wanted the typical "top searches" and "latest searches" lists, but without overloading my web server. So I count ONLY the searches made by real visitors, that is, the ones submitted through the search form, not the ones triggered by simply visiting a page. When a request is submitted, I check whether it has been submitted before: if so, I update its counter and its added-date in the database, and if not, I create a new entry. That's it.

To actually display the top and latest searches, I simply read text files. Those text files are updated at most once per hour, and only when a new "real" search is made by a real visitor. At that moment, if the file is more than one hour old, I query the database to refresh the stats: top searches (highest counter first) and latest searches (ordered by the added-date of the entries).
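Here is a minimal sketch of that bookkeeping in Python with SQLite. The table layout, the column names and all function names are assumptions made for the example, since the post doesn't say which database or schema the site uses.

```python
import os
import sqlite3
import time


def open_db(path=":memory:"):
    """Open the stats database and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS searches "
        "(query TEXT PRIMARY KEY, counter INTEGER, added REAL)"
    )
    return conn


def record_search(conn, query, now=None):
    """Count one real search: bump the counter or create a new entry."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE searches SET counter = counter + 1, added = ? WHERE query = ?",
        (now, query),
    )
    if cur.rowcount == 0:  # no existing row was updated, so this query is new
        conn.execute(
            "INSERT INTO searches (query, counter, added) VALUES (?, 1, ?)",
            (query, now),
        )
    conn.commit()


def top_and_latest(conn, limit=10):
    """Return (top searches, latest searches) as two lists of query strings."""
    top = [row[0] for row in conn.execute(
        "SELECT query FROM searches ORDER BY counter DESC, added DESC LIMIT ?",
        (limit,),
    )]
    latest = [row[0] for row in conn.execute(
        "SELECT query FROM searches ORDER BY added DESC LIMIT ?", (limit,)
    )]
    return top, latest


def stats_file_is_stale(path, max_age=3600):
    """True if the stats text file is missing or more than max_age seconds old."""
    return not os.path.exists(path) or time.time() - os.path.getmtime(path) > max_age
```

In this sketch the form handler would call `record_search`, then rewrite the text files from `top_and_latest` only when `stats_file_is_stale` says so. Page views never touch the database.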

So finally I had my PDF file search engine. After that, I thought it would be nice to also create a search engine for all kinds of files. So I simply added a field for the file extension, both in the search form and in the database, and that was pretty much it. I also added a "top extensions" list which shows, based on the number of real searches, which extensions are the most searched for.

So now I have two search engines, one dedicated to PDF files and another for all kinds of files. I decided to also make French versions of both.

That's how I ended up with 4 more sites, each with its own domain name. :-)
I really enjoyed the time I spent developing this search engine; it was very stimulating for the brain.
Now I want to create a mobile version of my English PDF file search engine, because I have never developed a site for mobile phones and I feel it would be a nice experience.

If you too want to develop a similar site, simply add a comment here if you have questions or want to exchange ideas for new functionality. I really like this project and I'm willing to share any information with you.

Hi,
It's very interesting to note your project. Could you share some more details with me?
Regards,

Hello, what kind of details would you be interested in? I may write another post with details later; if you tell me what you'd like to read, I can take it into consideration.

My friend, can you please show the PHP code you developed to get the recent searches and top searches? I have no experience with PHP, and my friend gave me a PDF search engine that depends on the Yahoo API for its PDF results. If you can, please also show in detail the link used with the Google API to get PDF results. Thanks again.

Hi, I'll see what I can do in a future blog post.

Hi,
Very interesting post. Can you provide the link to the search engine you created?

Hey, nice article and congratulations for your results.

I wonder if you could share a more detailed post with the technical details: what language it is developed in, what procedures you use, and which other articles you'd recommend reading before making a similar web page (Google Search API).
Best.
R.

Nice article. I am planning to write code for the same thing, as I want to write a PDF book search ... thanks

Thanks for sharing this informative article. Google is the most-used search engine on the Web. If you want to know more about the Google manual, you can visit this.

you can make a web site like pdf book search

Sure you can :-)
But you're arriving after the battle, you know?
At the end of May / start of June, Google cleaned out all the PDF search engines (and other kinds of file search engines) based on its API.
Even the big sites with PR6 and lots of traffic went down.

So yes, you could build a web site like pdf book search if you wanted to, but what's the point now?

Hi, I've been searching for how to make a search engine for PDF files using the Google API.
Can you share with me how to make it?
