Categories
Tips

Web Scraping in WordPress

I had been waiting for WpVeda to launch ever since my first chat with Rahul some months back. It’s always great to see conscious efforts from team rtCamp to give back to the community. Congrats Rahul, and thanks for mentioning me in your first post.

I, too, strongly advocate open source and community contributions in general, and I have published a few plugins myself.

Today I am writing about one of my plugins, which caters to a very specific niche but has a good user base within it. The niche task this plugin performs is web scraping.

What is Web Scraping?

Web scraping (or Web harvesting, Web data extraction) is a software technique for extracting information from websites. Web scraping focuses on transforming unstructured Web content, typically in HTML format, into structured data that can be formatted and displayed, or stored and analyzed. Web scraping is also related to Web automation, which simulates human Web browsing using computer software. Typical uses of Web scraping include online price comparison, weather data monitoring, market data tracking, web content mashups and web data integration.

WP Web Scraper

WP Web Scraper is an easy-to-use, professional web scraper for WordPress. It can be used to display real-time data from any website directly in your posts, pages or sidebar, such as real-time stock quotes, cricket or soccer scores, or any other generic content. The scraper is built on the time-tested cURL library for scraping and phpQuery for parsing and outputting HTML.
It also has some advanced configuration features such as:

  1. Configurable caching of scraped data. The cache timeout, in minutes, can be set for every scrape.
  2. A custom User-Agent header can be set for every scrape.
  3. Scrape output can be displayed through a custom template tag, or through a shortcode in pages, posts and the sidebar (text widget).
  4. Error handling – silent fail, standard error, custom error message or display expired cache.
  5. Option to clear or replace a given regex pattern in the scraped content before output.
  6. Option to strip one or more HTML tags from the output.
  7. Supports conversion of the scraped HTML between various character sets (charsets).
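
To make options 5 to 7 a little more concrete, here is a rough Python sketch of what such post-processing can look like. It is not the plugin’s own code (WP Web Scraper is written in PHP); the function names clean_scrap and convert_charset, the default charsets, and the decision to keep the text inside stripped tags are illustrative assumptions.

```python
import re


def clean_scrap(html, clear_pattern=None, replace_with="", strip_tags=()):
    """Hypothetical post-processing of a scraped HTML fragment.

    clear_pattern: regex whose matches are replaced by `replace_with`
                   (the default empty string simply clears them).
    strip_tags:    tag names whose opening/closing markup is removed;
                   the text between those tags is kept.
    """
    if clear_pattern:
        html = re.sub(clear_pattern, replace_with, html)
    for tag in strip_tags:
        # \b stops tag "a" from also matching <abbr> or <article>
        html = re.sub(rf"</?{re.escape(tag)}\b[^>]*>", "", html, flags=re.IGNORECASE)
    return html


def convert_charset(raw, source="iso-8859-1", target="utf-8"):
    """Re-encode bytes fetched in the page's charset into the blog's charset."""
    return raw.decode(source).encode(target)


if __name__ == "__main__":
    scraped = '<p>Downloads: <b>1,234</b> <a href="/stats">details</a></p>'
    # Clear the trailing link, then strip the <p> and <b> markup.
    print(clean_scrap(scraped, clear_pattern=r"\s*<a\b.*?</a>", strip_tags=("p", "b")))
    # prints: Downloads: 1,234
```

Clearing runs before stripping here, so a pattern can still target markup (like the link above) before the tags disappear.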

I am sure this all sounds a bit geeky to anyone who has not explored web scraping before.

For a better perspective, what could be a better example than displaying my plugin’s download count from wordpress.org? To do this, all I need to do (assuming the plugin is installed) is insert the following shortcode in a page, a post or the sidebar’s text widget:
[wpws url="http://wordpress.org/extend/plugins/wp-web-scrapper/stats/" selector=".last-child td" cache="60" timeout="3" error="cache"]

Using the Shortcode

Now let’s decode the shortcode. The first parameter, url, specifies the URL from which you intend to scrape data. The second (and perhaps the most important) parameter, selector, specifies exactly where on that page your data is located. It uses good old CSS selector syntax, so if you have worked with CSS or jQuery before, writing a selector should be very easy for you.
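
For readers who have not written selectors before, here is a minimal Python sketch (standard library only) of how a descendant selector such as .last-child td picks out content: it collects the text of every td nested anywhere inside an element whose class list contains last-child. The plugin itself relies on phpQuery, which understands full CSS; this toy parser handles only this one “class, then tag” form, and the names select and DescendantSelector are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DescendantSelector(HTMLParser):
    """Collects the text of every <target> element that has an ancestor
    whose class list contains `cls`, i.e. the CSS selector '.cls target'."""

    def __init__(self, cls, target):
        super().__init__()
        self.cls, self.target = cls, target
        self.stack = []       # open elements as (tag, has_wanted_class)
        self.matches = []     # one text string per matched element
        self.capture = None   # stack depth of the element being captured

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        has_ancestor = any(flag for _, flag in self.stack)
        self.stack.append((tag, self.cls in classes))
        if self.capture is None and tag == self.target and has_ancestor:
            self.capture = len(self.stack)
            self.matches.append("")

    def handle_endtag(self, tag):
        # Close the most recent open element with this name (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if self.capture is not None and len(self.stack) < self.capture:
            self.capture = None

    def handle_data(self, data):
        if self.capture is not None:
            self.matches[-1] += data


def select(html, cls, target):
    """Return the stripped text of all matches for '.cls target'."""
    parser = DescendantSelector(cls, target)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


if __name__ == "__main__":
    page = """
    <table>
      <tr><td>Yesterday</td><td>10</td></tr>
      <tr class="last-child"><td>All time</td><td>1,234</td></tr>
    </table>
    """
    print(select(page, "last-child", "td"))  # prints: ['All time', '1,234']
```

Note that the selector matches every td in that row, not just one, so it pays to make a selector as specific as the page allows.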

In brief, the selector above matches the td elements nested inside an element with the class ‘last-child’. You can easily build such selectors by viewing the source code of the page at your URL. The rest of the parameters are optional. The cache parameter specifies how many minutes the scraped data is cached instead of being fetched on every request, timeout is the maximum time in seconds the scraper will spend on the task, and error determines what happens if the scraper fails (with error set to cache, expired cached data is displayed instead).

I hope this gets you interested enough to start exploring its limitless possibilities. Download your copy, install it and get started. You can find more information on this plugin in its FAQ section or on its official page.

Please note that the content you are scraping might be copyright protected. It’s best to at least credit the content owner with a link back, or better, to obtain written permission. Apart from rights, cURLing in general is a very resource-intensive task. It consumes the bandwidth of your host as well as that of the content owner’s host. It is best not to overdo it.

Categories
Analysis

Google Easter Eggs

The Google website and many of the company’s software programs are loaded with gags, goofs and Easter eggs that have helped Google maintain a fun-loving spirit in the cut-throat world of Web competition. Here is a compilation of a few of them:

Thanks to Google Calculator, you can use the Google search box for serious number crunching: anything from converting currencies to solving advanced maths equations. But things don’t always add up the way you think they will with Google Calculator. Try searching for “answer to life the universe and everything,” “number of horns on a unicorn,” or “once in a blue moon” for unexpected computations.

If you Google “ascii art”, you’ll find an ASCII representation of Google’s logo next to the search box. Likewise, try searching for “recursion”: the results page asks “Did you mean: recursion” even though the word is spelled correctly. It’s a geeky Easter egg that uses the “did you mean” feature to help you understand recursion, since following the suggestion just brings you back to the same search.

Google’s free image editing and management software Picasa has a wild side that few people know about. Open Picasa and press Ctrl-Shift-Y, and a teddy bear will pop up. Keep pressing those keys and watch out for the sloth (or sleuth) of red-bowtied bears that take over the program!

Over and above all this, Google has a dedicated Easter special search page too!

Google also has a tradition of perpetrating April Fools’ Day hoaxes. You can read more about these on its Wikipedia page.

(Image Credits: Google’s easter special page)


[Editor’s Note: This post is submitted by our guest blogger Akshay Raje. Akshay is a self-taught freelance web designer and developer who loves to travel and is a great movie buff too. Web.D is about his projects and experiments with web technologies.

If you, too, would like to write for Devils Workshop, please check this. Details about our revenue sharing programs are here.]

Categories
News

Google Caffeine – Future of Google Web Search

For the last several months, a large team of Googlers have been working on a secret project: a next-generation architecture for Google’s web search. It’s the first step in a process that will push the envelope on size, indexing speed, accuracy, comprehensiveness and other dimensions.

The new infrastructure sits “under the hood” of Google’s search engine, which means that most users won’t notice a difference in search results. But web developers and power searchers might notice a few differences.

This new search tool (codenamed Caffeine) is functional, and a pre-beta release is available at www2.sandbox.google.com. My overall take on Caffeine is as follows:

  1. Speed and index size are definitely improved. In some cases, the results are returned in half the time with almost double the number of estimated results.
  2. Search is more real-time with Caffeine. When you query the names of individuals, you can clearly see their latest Twitter updates too. This is a clear attempt to capture the ‘breaking news’ market that Facebook and Twitter (and even Bing) are trying their hands at.
  3. There seems to be a major tweak in the keyword matching algorithm, which is good for searchers but makes it even more difficult for SEO professionals to catch up.

In the coming week, I will do a more detailed, hands-on review of this tool and post my observations.

(Source: Google Webmaster Central Blog)

(Image Credits: TechTree.com)

Categories
News

Finally – A Google Operating System

Google Chrome has always been a little more than a browser: it’s optimized for running web applications, each tab runs as a separate process, the interface is minimalistic and there’s even a task manager. “We realized that the web had evolved from mainly simple text pages to rich, interactive applications and that we needed to completely rethink the browser. What we really needed was not just a browser, but also a modern platform for web pages and applications, and that’s what we set out to build,” said Google in September 2008 at the launch of Chrome.

Yesterday, Google announced a new project that’s a natural extension of Google Chrome — the Google Chrome Operating System. It’s Google’s attempt to re-think what operating systems should be. Google Chrome OS is an open source, lightweight operating system that will initially be targeted at netbooks. Later this year they will open-source its code, and netbooks running Google Chrome OS will be available for consumers in the second half of 2010.

Speed, simplicity and security are the key aspects of Google Chrome OS. The OS will be fast and lightweight, to start up and get you onto the web in a few seconds. The user interface will be minimal to stay out of your way, and most of the user experience takes place on the web. And as they did for the Google Chrome browser, Google is going back to the basics and completely redesigning the underlying security architecture of the OS so that users don’t have to deal with viruses, malware and security updates.

Google Chrome OS will run on both x86 and ARM chips, and Google is working with multiple OEMs to bring a number of netbooks to market next year. The software architecture is simple: Google Chrome running within a new windowing system on top of a Linux kernel. For application developers, the web will be the platform.

With a new open source communication protocol (Wave), a browser, a bunch of web applications, a mobile OS and now a mainstream OS for desktops and netbooks, Google is surely investing big time! Let’s just hope they stick to their motto of ‘Don’t be evil’ through all these changes.

(Source: Official Google Blog)
(Image Credits: Google Chrome)


Categories
Analysis

An insight on Bing’s real time Twitter search buzz

If you’re a bit of a geek and use the Firefox browser, you can already add Twitter search results to both Bing and Google via Greasemonkey. But it’s pretty rudimentary: just a list of the five most recent Twitter search results for that particular query, pasted atop the regular results. Now, in a nod to the (so-called) increasing importance of real-time search, Microsoft has started adding Twitter updates to its Bing search engine. For now, the Twitter-related results are limited to searches on prominent Twitterers themselves. You can read more about this in Swati’s post.

But wait a minute, isn’t Google already offering much more real-time search results?

For instance, search for CISCO on Google and you will get a live stock quote too. Search for USD to INR and you get the latest conversion rate. Apart from this, Google’s Blog Search crawls blogs almost every 15 minutes. Bing has just done to microblogging what Google did to blogging two years ago 🙂 Nonetheless, I am sure Google will soon implement this too, perhaps across all Twitter accounts.

I may have been a bit too hard on Bing, and that’s because I am completely sold on Google’s ‘Don’t be evil’ stand. Let me know what your take on this is.

(Source: Bing Search Blog)
(Image Credits: Neowin.net)


Categories
Reviews

Firefox 3.5 is officially out!

Yes, as confirmed by Mozilla, Firefox 3.5 has been officially released. My earlier post detailed its features, but now you can experience all of them yourself with the public release of Firefox 3.5. If you are a Firefox fan like me, I am sure you have already upgraded your browser by now. Well, here are some more interesting facts about this magnificent organic software.

Worldwide real-time Firefox downloads is a microsite developed to showcase the amazingly fast canvas rendering capabilities of Firefox 3.5. The page uses no applets, no Flash and no other plugin to plot real-time Firefox downloads on a world map! It draws the graphs in real time at lightning speed, using a blend of server- and client-side magic. Try dragging and scrolling the canvas map; it’s really nice work. Here is another fabulous example of what can be achieved with the canvas element and its improved rendering: a first-person shooter game (remember Wolfenstein 3D?) coded entirely in JavaScript and canvas. What’s surprising is that the code for the whole game is just 13.33 KB!

Had enough of the canvas element? Well, there is more to the new Firefox than canvas and fast JavaScript rendering. Session Restore is one such feature. As Mozilla puts it, “If Firefox or your computer unexpectedly closes, you don’t have to spend time recovering data or retracing your steps through the Web. If you’re in the middle of typing an email, you’ll pick up where you left off, even down to the last word you typed. Session Restore instantly brings back your windows and tabs, restoring text you entered and any in-progress downloads.”

Video tag support is what all of us were expecting; still, it’s really amazing to watch it in action. Check out this embedded video to see what I mean (it will work only in Firefox 3.5). Apart from demonstrating HTML5’s video tag, the video also gives you an overview of many other interesting features.

(Source: mozilla.com/firefox, a bit of Googling and tweets by John Resig)
(Image credits: lifehacker.com)


Categories
News

Google to Promote Web Speed

Google has created a website for developers that is focused exclusively on making Web applications, sites and browsers faster. The site, code.google.com/speed, grew out of Google’s decision to publicly share a set of best practices the search company has developed over the years.

By offering tutorials, tips and performance tools via the new site, Google wants to help make the Web faster by assembling a community of developers interested in online speed and performance. Apart from tutorials and downloadable tools, the site allows developers to submit ideas, suggestions and questions via a discussion forum and through Google’s Moderator tool.

The site currently covers tutorials on various optimization aspects such as:

  • CSS optimization
  • Gzip compression
  • HTTP caching
  • Improving performance with Page Speed
  • Minimizing browser reflow
  • Optimizing JavaScript code
  • Optimizing web graphics
  • PHP performance tips
  • Prefetching resources
  • Properly including stylesheets and scripts

It also has a collection of Google-recommended tools to help you find and fix performance problems on your site.

(Source and Image credits: Google Code)


Categories
Reviews

Firefox 3.5 – Two times faster than 3.0, Ten times faster than 2.0!

Mozilla released Firefox 3.5 Release Candidate 2, which you can download from Mozilla’s Web site. Release Candidate 2 is the first version of Firefox 3.5 that average users might want to run, since it’s faster and more stable than the beta versions were.

Firefox 3.5 boasts a number of significant changes – ranging from new ways to work with the browser features to under-the-hood improvements that Mozilla developers say will make the browser more than twice as fast as Firefox 3 and ten times faster than 2.0 (based on the results of a SunSpider test of JavaScript performance on a Windows XP machine).

Here are some of the new features you’ll find in Firefox 3.5.

What’s new in Firefox 3.5 (Release Candidate 2)

Firefox 3.5 (Release Candidate) is based on the Gecko 1.9.1 rendering platform, which has been under development for the past year. Firefox 3.5 offers many changes over the previous version, supporting new web technologies, improving performance and ease of use, and adding new features for users:

  • This release candidate is now available in more than 70 languages.
  • Improved tools for controlling your private data, including a Private Browsing Mode.
  • Better performance and stability with the new TraceMonkey JavaScript engine.
  • The ability to provide Location Aware Browsing using web standards for geolocation.
  • Support for native JSON, and web worker threads.
  • Improvements to the Gecko layout engine, including speculative parsing for faster content rendering.
  • Support for new web technologies such as: HTML5 <video> and <audio> elements, downloadable fonts and other new CSS properties, JavaScript query selectors, HTML5 offline data storage for applications, and SVG transforms.

Mozilla provides Firefox 3.5 (Release Candidate) for Windows, Linux, and Mac OS X in a variety of languages. You can get the latest version of Firefox 3.5 (Release Candidate) here.

(Source: Mozilla.com/firefox)

(Image credits: PCworld.com)


Categories
Tips

Adding a bit of Google to your Website or Blog

My last article elaborated on Google Wave, which was launched at Google I/O this year. What got washed out by the ‘Wave’ was another launch by Google at I/O called Google Web Elements.

Google Web Elements provide an easy way for you to add Google products to your website or blog. You can add content, such as news, Google Maps, and YouTube videos, along with features like social conversations from Google Friend Connect. Google Web Elements are incredibly easy to use: just choose the features you want, and copy a few lines of code to your website.

Anyone with a blog or other website can use Google Web Elements. For example, if you own a restaurant, you can add the Maps Element to your site to show customers how to find you. If you run a blog about your favorite sports team, you can add the News Element to display customized headlines about how great your team is. If you own a business, you can use the Custom Search Engine Element so customers can easily search your site, and you can even show Ads by Google on your site and share in the revenue from them.

These are the currently available Google Web Elements:

  1. Calendar – Remind visitors of important dates by adding Google Calendar to your site.
  2. Conversation – Let visitors post comments directly to your website by using the Conversation element.
  3. Custom Search – Harness the power of Google to let visitors search your website and other sites you choose.
  4. Maps – Add Google Maps to your site to help visitors find a location.
  5. News – Show the latest Google News articles on your website, based on topics you choose.
  6. Presentations – Embed Google Docs presentations into your webpage so visitors can watch them directly on the page.
  7. Spreadsheets – Make sure your site’s visitors see up-to-date information. Publish Google Docs spreadsheets directly to your site.
  8. YouTube News – Keep visitors engaged by showing videos from YouTube directly on your website.

Needless to say, adding elements to your site is incredibly easy. Here’s how to do it:

  1. Go to the Google Web Elements website (www.google.com/webelements) and choose the element you want to add to your page.
  2. Choose the settings you want for your element. You’ll see a preview of the element exactly as it will appear on your site.
  3. Copy the few lines of code from the page and paste them into your webpage. That’s it!

(Source: Google Web Elements | Official Google Blog)

Categories
News

Radio on iPhone and iPod Touch

There’s just something about radio: that element of surprise, that anticipation of what’s coming next, that man-I-haven’t-heard-that-song-in-years sensation. That’s the promise of the new free Yahoo! Music iPhone app. Powered by CBS Radio, it lets you browse through 300+ stations across more than 20 genres (from Bollywood to Goth to Naughty Comedy). CBS RADIO crammed a lot of great features into the Yahoo! Music app so that you will never be without your favorite music when you’re on the go. Here are just a few of the great features included:

  • Browse through 20+ genres
  • Skip up to six songs an hour
  • Browse stations by genre or find local stations ‘near you’ utilizing GPS
  • Share stations with friends
  • Browse your listening history or recently played stations
  • Buy albums/songs via iTunes
  • Listen to 1010WINS, KROQ, WFAN and more than a hundred other CBS RADIO stations in addition to Yahoo! Music’s 150 music stations
  • Add presets for instant access to your favorite stations

The Y! Music app streams two distinct kinds of stations: interactive and non-interactive. The main difference between the two is the ability to skip songs. Every time you launch an interactive station, you will be listening to a unique music experience. No two streams are the same! With non-interactive stations, you will be listening to the same stream that is being broadcast on the air. If you’re in Los Angeles, you can listen to a station in New York as if you were really there! It’s like being in two places at once!

(Image credits and Source: Yahoo Music Blog)

