AJAX

Marco Casario from the Italian company Comtaste gave a good presentation at Web 2.0 Expo in Berlin today comparing Adobe Flex to JavaFX, Laszlo, Microsoft Silverlight, AJAX and XUL (the full name of the presentation is “Choosing the Final RIA Path or Choosing the Appropriate RIA Technology”). In this comparison Adobe Flex won on maturity, size of the active community, a gentle learning curve, multimedia features, usability and the spread of the Flash plugin, which now reaches 98% of internet users. Of course your choice of technology depends largely on the specifics of your project, but this is another indicator that Adobe Flex is on fire.

Flex & AIR
Adobe Flex is a way to easily develop Flash programs. It is based on MXML, and when compiled, Flex code becomes a Flash .swf file. That means it can run in any browser that has the Flash plugin installed. What I really like about Flex is that you can write the frontend in Flex and then use whatever language you want for the backend. Using REST services in Flex is super easy, so as long as your backend can talk REST you can connect it to the frontend that way. I like this idea since I would prefer PHP for the backend of whatever app I build: it is widely supported, has a very active developer community and there are lots of PHP programmers all around the world.
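To make the "any backend that can talk REST" idea a bit more concrete, here is a minimal sketch of such a backend. It is written in Python purely for illustration (the same shape works in PHP or anything else). The `/products` path, the `PRODUCTS` data and the JSON response format are all made up for this example; they are not something Flex requires.

```python
import json

# Hypothetical in-memory data; a real backend would read from a database.
PRODUCTS = [
    {"id": 1, "name": "Widget"},
    {"id": 2, "name": "Gadget"},
]


def app(environ, start_response):
    """Tiny WSGI application exposing one read-only REST resource."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "")

    if method == "GET" and path == "/products":
        status = "200 OK"
        body = json.dumps(PRODUCTS).encode("utf-8")
    else:
        status = "404 Not Found"
        body = json.dumps({"error": "not found"}).encode("utf-8")

    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

Any WSGI server can host `app`. For local experiments the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` should be enough, and the frontend then only needs to know the URL.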

AIR stands for Adobe Integrated Runtime (formerly Apollo, see my “How to be cool in a Web 2.0 crowd” for tips on how to use this fact to increase your geek coolness) and it allows you to develop desktop applications using Flex or HTML/JavaScript. AIR applications can run on most platforms (Mac, Windows, Linux etc) and can access local files and so on, just like any other desktop app. The fact that it is now simple to write platform-independent desktop applications using only HTML/JavaScript is really powerful.

Adobe MAX
About a month ago I was at the Adobe MAX conference in Chicago. Since that is where all Adobe fanboys gather there were quite a few Hallelujah moments, and maybe I am somewhat influenced by that. One thing was clear from MAX though: Adobe is pushing Flex and AIR with all its marketing might. After having an almost 100% market share on software for web designers (Photoshop, Illustrator, Dreamweaver etc) they are now really focusing on the web developers. They are doing a great job making powerful tools that make development a charm; their Eclipse-based Flex Builder is one of the best IDEs I have used.

Mashups & Flex
Building Mashups in Flex is easy since the REST support is really good. I gave a presentation at Adobe MAX dealing with the need for mashups and web scraping in general, and with combining Kapow Mashup Server and Adobe Flex in particular…


Most folks working with Mashups just assume that services and APIs will magically appear, but unfortunately there are not that many public APIs around today. Just check out programmableweb and you will see. More are added every day, but it will never reach the level where a majority of systems have an API, especially not if you think about systems behind corporate firewalls. Simply put, there is a painful lack of APIs, and if that is not addressed it will stop the mashup wave in its tracks. Fortunately there are already smart people working on this, and one of the solutions is to start using HTML as if it were an API. That’s right, start using all the data and functionality that is available in HTML today to build new innovative mashups and solutions.

The potential of HTML

All new interesting applications (Skype being the exception that proves the rule) have an HTML interface. And this is true not just for consumer-facing applications, but for Enterprise-level applications as well. So with the millions and millions of HTML pages in existence today it is not unlikely that HTML is one of the world’s most common data formats (I wonder how it compares to printed text and audio, for example). The great thing about HTML is that it does not just contain data, it is also the interface to a whole lot of functionality (when you search Google you do that via HTML, don’t you?). What if we could use HTML as one big API? That would make HTML the world’s most widespread API, and it would give mashup developers and programmers access to more data and more functionality than ever before.

The problem with HTML

Almost no sites on the web today follow the HTML 4 standard, so today’s browsers have become very good at interpreting the tag soup that most pages consist of (ie broken HTML, missing end tags etc). Furthermore HTML is used both to mark up the data in a document, for example with the <title> tag, and to mark up how the data should be presented, for example with the <b> tag. All this together makes HTML documents unstructured documents (by implementation, not by nature) with data in very application-specific formats (microformats will help here, but it will be some time before they are widespread enough to be really useful).

Another problem is of course that there are fewer and fewer pages on the web that use pure (albeit broken) HTML; there is more and more JavaScript around. Especially in Web 2.0 applications, most of the really interesting functionality is available via AJAX. So it is not only HTML but also JavaScript that has to be taken into consideration when one wants to get to a web application’s functionality.

Parsing

So we have huge amounts of data and functionality in HTML and we want to use it to make our latest funky Mashup. The good old approach is to parse the page in question using Perl, although these days it can be done pretty well in almost any modern programming language. There are several problems with parsing though:

It is damn complicated to get working on serious web pages, and once it works it breaks easily
Good luck handling real tag soup; that alone breaks most parsers (an XML parser, for example, simply stops at the first error it encounters)
It is boring to program those parsers (if you haven’t tried, then lucky you)
Parsers cannot access functionality that uses JavaScript and AJAX
It is hard to handle things like logging into a web application (ie session handling) and navigating over several pages
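The tag soup problem in the list above is easy to reproduce. Here is a small sketch using only Python's standard library: a strict XML parser rejects a broken snippet outright, while the lenient `html.parser` keeps going and still hands over the text. The `TAG_SOUP` snippet and the helper names are invented for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A small piece of tag soup: <b> is never closed and the second <p> has no end tag.
TAG_SOUP = "<html><body><p>Hello <b>world</p><p>Second paragraph</body></html>"


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Lenient parser that collects text chunks and does not check tag balance."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def lenient_text(markup):
    """Return the non-empty text chunks found in the markup."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.chunks
```

Here `strict_parse_ok(TAG_SOUP)` is false while `lenient_text(TAG_SOUP)` still returns the three text chunks. Note that this only addresses the broken-markup point; it does nothing for JavaScript, logins or multi-page navigation.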

Still, this is a very common approach to getting at the data and functionality in HTML. But there is a much easier way…

Web Scraping

I bet that a fair portion of the people reading the words “Web Scraping” think of old mainframe terminals and “Screen Scraping” and frown. Don’t worry, technology has moved forward light years since the days of mainframes. Web Scraping means interacting with HTML (including JavaScript, if it is a good scraper) and either extracting data from the HTML or repackaging the functionality in the HTML. The data can be saved into a database or a file, for example, and the functionality can be made available as a REST service, a programming language API or whatever else makes sense. Suddenly HTML is wide open. Just imagine that you wanted to get data from Digg for some reason (before the Digg API that came out a few weeks ago); without an API that would be hard. But using a web scraper you could, for example, build a REST service out of the search on Digg just by accessing the HTML. Web Scrapers are used more and more for things like collecting large amounts of job ads or flight information and then repackaging that data into sites that allow users to search for a job or a cheap flight.
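As a rough sketch of the "extract data, then repackage it" idea, the Python snippet below pulls story links out of a page and returns them as JSON, which is the kind of payload a REST service could serve. The `SAMPLE_PAGE` markup and the `story` class name are invented for this example; they do not reflect Digg's or any other real site's HTML, and a real scraper would also have to fetch the page and cope with JavaScript.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup standing in for a search results page. Note the unclosed
# <div> tags: the parser below does not care about them.
SAMPLE_PAGE = """
<html><body>
<div class="result"><a class="story" href="/story/1">First story</a>
<div class="result"><a class="story" href="/story/2">Second <b>story</b></a>
</body></html>
"""


class StoryScraper(HTMLParser):
    """Collects the URL and title of every <a class="story"> link."""

    def __init__(self):
        super().__init__()
        self.stories = []
        self._current = None  # the story being read, or None outside a link

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("class") == "story":
            self._current = {"url": attributes.get("href", ""), "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse whitespace so nested presentation tags leave no gaps.
            self._current["title"] = " ".join(self._current["title"].split())
            self.stories.append(self._current)
            self._current = None


def scrape_to_json(html):
    """Extract the stories from the HTML and repackage them as a JSON string."""
    scraper = StoryScraper()
    scraper.feed(html)
    scraper.close()
    return json.dumps({"stories": scraper.stories})
```

Calling `scrape_to_json(SAMPLE_PAGE)` gives a JSON document with two stories. The fragile part is the selection rule (`a` tags with class `story`): as soon as the site changes its markup, the scraper has to change too.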

Openkapow

The web scraper of my choice is the one supplied on openkapow.com (disclaimer: I work for Kapow Technologies, the company behind openkapow.com, but trust me that I am not plugging openkapow to make my boss happy, it is really a great product). Using openkapow one can get at the data and functionality on any web page and access it as a REST service or an RSS/Atom feed. Of course JavaScript is handled automatically, and it is possible to navigate multiple pages, log in to restricted pages and have full control over the process flow with conditions and error handling. I recommend that anybody who is interested in how to use HTML as an API take a look at openkapow.

An Eye Opener…

Thinking of HTML as an API significantly expands your horizons as a developer. I have literally seen a light go on in fellow geeks’ eyes when they realize the potential. Suddenly the web is really yours to use in your programs and mashups. And once APIs and services are abundant you can start using the other cool mashup tools around (Teqlo, jMaki etc).

Digitalistic

Mashup or die trying

Adobe Flex is on fire

HTML is the worlds most common API