Selenium vs cURL Auth Testing

A few months ago I was tasked with setting up a system to unit test website authorization using Selenium.

To fast forward, I also ended up creating the same tests using cURL, which let me compare the two solutions to the problem.

First, a description of Selenium and how it works: Selenium is a system that can automate a browser and perform most of the tasks a user will perform on a website via a browser. It is great technology.

Selenium 2.0 also provides a "hub" that allows scaling up the number of servers you distribute your browsing over. Selenium allows you to "drive" many different browsers across most platforms. Whatever browsers you can use on Linux, you can have Selenium "drive" on that OS, and whatever you can use on Windows, you can have Selenium "drive" on that platform.

One thing Selenium cannot do is have you "drive" a Windows-only browser such as Internet Explorer on a Linux box. The Selenium documentation is not crystal clear about that.

cURL is a command-line tool and library (libcurl), available in PHP through the cURL extension, that allows fetching of web pages, handling of cookies etc. It is lightweight, and like Selenium, you can log on to a password-protected website and browse around using it.

The goal was to automate the authorization testing within our PHP unit testing framework so that any problems could be caught. This would allow hundreds of web page access checks without the manual labor of a person clicking through each one.

Speed

All of our unit tests are currently performed with a visit to one web page. Thousands of unit tests are run here, and very quickly too. It became apparent early on that with Selenium, these were going to take time, so I separated out the web authorization testing to not impact the regular unit testing.

After that, it became apparent that Selenium was quite slow. Although we had it set up with the "hub", meaning we could add a farm of machines to spread out the work, we also knew that cURL could do this sort of thing much more efficiently.

I observed the Selenium logging at work, and the slow part of its work is iterating through each page DOM element until it locates the element you have indicated you need. Fetching the page is fast. We recognized that we could simply search the text of the page that cURL returns, hundreds of times faster, and figure out what we are after without the delay that Selenium introduces.

So Selenium is excellent for emulating a user, but too slow in our books for wholesale web authorization testing. The same task can be done via cURL much faster, and it can be done from the developer's machine, so a separate farm of Selenium boxes is not required.
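The text-search idea described above can be sketched roughly as follows. This is a minimal illustration in Python rather than our PHP code, and it makes assumptions: the marker text "not authorized" and the function names are hypothetical, and the fetch helper is a plain HTTP stand-in for the cURL request that does not handle the logon or cookies.

```python
import urllib.request

# Hypothetical marker text; the actual wording of the "not authorized"
# page is an assumption made for this sketch.
NOT_AUTHORIZED_MARKER = "not authorized"


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Fetch a page body as text (a stand-in for the cURL request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def page_is_authorized(body: str, marker: str = NOT_AUTHORIZED_MARKER) -> bool:
    """Return True when the marker text does not appear in the page body.

    This is a plain substring search over the returned text, with no DOM
    traversal involved.
    """
    return marker.lower() not in body.lower()
```

A test would call something like page_is_authorized(fetch_page(url)) for each page. Because the check is a single substring search, its cost is small next to the page fetch itself.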

One technique to employ if setting up authorization testing is to start out with a new set of permissions. The idea is to make all the pages fail the test, then, one by one, add the proper permissions checks and get them to pass.

Once the above permissions are properly implemented, this is how we implement the testing:

We give the test user just the one permission needed to reach that page, and turn off all other permissions. The user should be able to see that page. We then turn off the one permission needed, and turn on all other permissions. The test user should now not be able to see that page (i.e. this user will be bounced to the "not authorized" page).
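The two-sided check described above can be sketched as below. This is an illustrative Python sketch, not our PHP test code: the permission names, the page paths and the can_view callback are all hypothetical. In a real test, can_view would set the test user's permissions, fetch the page, and look for the "not authorized" bounce.

```python
# Hypothetical permission names and page mapping, for illustration only.
ALL_PERMISSIONS = frozenset({"view_reports", "edit_users", "manage_billing"})
PAGE_REQUIREMENTS = {
    "/reports": "view_reports",
    "/users": "edit_users",
    "/billing": "manage_billing",
}


def simulated_can_view(permissions, page):
    """Stand-in for the real page fetch: allowed only with the required permission."""
    return PAGE_REQUIREMENTS[page] in permissions


def check_page(page, can_view=simulated_can_view):
    """Run both halves of the authorization test for one page."""
    required = PAGE_REQUIREMENTS[page]

    # Half 1: only the required permission is on, so the page must be visible.
    only_required = frozenset({required})
    visible_with_required = can_view(only_required, page)

    # Half 2: everything except the required permission is on, so the page
    # must be refused.
    all_but_required = ALL_PERMISSIONS - {required}
    refused_without_required = not can_view(all_but_required, page)

    return visible_with_required and refused_without_required


def failing_pages(can_view=simulated_can_view):
    """Return the sorted list of pages whose authorization check does not pass."""
    return sorted(p for p in PAGE_REQUIREMENTS if not check_page(p, can_view))
```

Running both halves matters: a page with no permission check at all passes the first half and only fails the second.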

One technique for permissions and test user manipulation is this:

Our test user can log on, but has absolutely no capability beyond that point and cannot do anything at all. The permissions are stored in memcache, in this case Amazon's Memcache system. The test user extends a unique testing class that lets us withhold regular user properties from that user, and lets us change the test user's permissions to suit the testing needs of the particular page.

So essentially, the test user's ability to log on is inconsequential, and for security purposes a regular user will not be privy to this user's limited, unique functionality.

Using memcache allows rapid retrieval of the user's serialized class instance without making repeated database calls across pages to load up the user's permission sets. It is a great way to handle authorization testing, and to speed up a website.
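The caching pattern described above can be sketched like this. It is a simplified Python illustration under stated assumptions, not our PHP implementation: InMemoryCache is a hypothetical stand-in for a memcache client, the key format and the User class are invented for the example, and JSON is used for serialization in place of a serialized PHP class instance.

```python
import json


class InMemoryCache:
    """Hypothetical stand-in for a memcache client, exposing get and set only."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


class User:
    """Minimal user object carrying an id and a set of permission names."""

    def __init__(self, user_id, permissions):
        self.user_id = user_id
        self.permissions = set(permissions)

    def serialize(self):
        payload = {"user_id": self.user_id, "permissions": sorted(self.permissions)}
        return json.dumps(payload)

    @classmethod
    def deserialize(cls, blob):
        data = json.loads(blob)
        return cls(data["user_id"], data["permissions"])


def load_user(cache, user_id, load_from_db):
    """Return the user from the cache, calling the database loader only on a miss."""
    key = "user:%s" % user_id  # key format is an assumption for this sketch
    blob = cache.get(key)
    if blob is not None:
        return User.deserialize(blob)
    user = load_from_db(user_id)
    cache.set(key, user.serialize())
    return user
```

With this shape, a test can change the test user's permissions by writing a new serialized instance under the same key, without touching the database.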