Most folks that are working with Mashups just assume that services and APIs will magically appear but unfourtunatly there are not that many public APIs around today. Just check out programmableweb and you will see. More and more are added every day, but it will bever reach the level that a majority of systems have an API, especially not if you think about systems within the coporate firewalls. Simply put there is a painful lack of APIs, and if that is not addressed it will stop the mashup wave in it’s tracks. Fortunatly there are already smart people working at this, and one of the solutions is to start using HTML as it is an API. That’s right, start using all the data and functionality that today is available in HTML to build new innovative mashups and solutions.
The potential of HTML
All new interesting applications (Skype being the exception that proves the rule) has an HTML interface. And this is true not just for the consumer facing applications, but for Enterprise level applications as well. So with the millions and millions of HTML pages in existance today it is not unlikely that HTML is one of the worlds most common data formats (I wonder how it compares to printed text and audio for example). The great thing with HTML is that it does not just contain data, it also is the interface to a whole lot of functionality (when you search Google you do that via HTML don’t you?). What if we could use HTML as one big API? That would make HTML the worlds most widespread API and that would give mashup developers and programmers access to more data and more functionality than ever before.
The problem with HTML
Almost not sites on the web today are following the HTML 4 standards. So todays browsers are very good at interpreting the tag soup that most pages consists of (ie broken HTML, missing end tags etc). Furthermore HTML is used to both mark up data in a document, for example with the
<title> tag, and to mark up how the data should be presented, for example the
<b> tag. All this together makes HTML documents unstructured documents (by implementation, not by nature) with data in very application specific formats (microformats will help here, but there will be some time before that is widespread enough to be really usefull).
So we have huge amounts of data and functionality in HTML and we want to use it to make our latest funky Mashup. The good old approach is to try to parse the page in question using Perl, now it can be done pretty well using almost any modern programming language. There are several problems with parsing though:
- It is damn complicated to get to work on serious web pages and once it is done it breaks easily
- Good luck handling a real tag soup, already that breaks most parsers (since using XML parsers for this means that the parser simply stops at the first error it encounters)
- It is boring to program those parsers (if you havent tried then lucky you)
- It is hard to handle things like login into a web application (ie session handling) and to navigate over several pages
Still this is a very usual approach to get to data and functionality in HTML. But there is a much easier way…
An Eye Opener…
Thinking of HTML as an API does significantly expand your horizons as a developer. I have literaly seen a light go on in fellow geeks eyes when they realize the potential. Suddenly the web is really yours to use in your programs and mashups. When suddenly APIs and services are abundant then you can start using the other cool mashup tools around (Teqlo, jMaki etc).