PDF Document Locked for Printing?
June 18th, 2009 by nadav
Now here’s a free web service I find extremely useful.
We get dozens of PDF documents from the university, but some of them (annoyingly enough) are locked for printing. Printing long articles is essential, at least until we have e-book readers that are good at displaying scanned PDF files.
With this free online unlock service for PDF files, you can print locked PDF files or copy parts of them into other documents. You simply upload the file you want to unlock, agree to the don’t-blame-us-we’ll-blame-you Terms of Service, and the service does the rest.
Posted in Technology, Web
Jericho – the incredible HTML parsing machine
February 22nd, 2009 by nadav
Visually, HTML is remarkably similar to XML. No wonder, since both share a long history with SGML. But while XML documents are generally well-formed, real-world HTML is far from perfect. As any (frustrated, man-loathing) web browser developer knows, writing a browser that displays real-world HTML requires a lifetime of work and more than a little patience. After all, HTML is not the achievement of working groups and standards organizations like the W3C. It is first and foremost the brainchild of a bunch of brilliant geeks, and the outcome of the famous browser wars.
Confronted with the task of parsing HTML, and reluctant to roll our own parser, we went looking for the most potent HTML parser out there – preferably one that’s written in Java, or perhaps one that targets the .NET Framework.
Of all the libraries we checked out back then, Jericho stood out from the crowd, for the following reasons:
- It’s not naive. Many libraries out there start out as the experiment of a naive programmer who witnesses the simplicity and elegance of XML and attempts to apply it to HTML. Author Martin Jericho knows what’s out there.
- By default, it does nothing. Unlike JTidy and many other proactive libraries, Jericho only modifies the segments of the document it is instructed to modify. Web pages are generally written to be parsed by the most popular web browsers. Proactive libraries have to “understand” HTML the same way the popular browsers do; until they do, they will keep introducing unintended changes that show up when the page is viewed in a browser.
- It does one thing, and does it well. Jericho does not attempt to fix broken XML. It doesn’t try to fit the latest Web 2.0 AJAX framework. It does one thing – parse and surgically modify HTML documents – and does it well.
It is licensed under both the Eclipse Public License and the LGPL.
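The “surgical” editing described above boils down to splicing replacement text into the original document at known character offsets and copying everything else through verbatim. Here is a minimal JavaScript sketch of that idea, assuming the segments to change have already been located by a parser. It is not Jericho’s API (Jericho is a Java library, where this job falls to its OutputDocument class); the applyReplacements function and its { begin, end, text } segment shape are hypothetical.

```javascript
// Splice replacement text into a document at given character offsets.
// Each segment is { begin, end, text }, with `end` exclusive; segments must not overlap.
// Everything outside the segments is copied through unchanged.
function applyReplacements(source, segments) {
  // Process segments left to right so overlaps can be detected in one pass.
  const sorted = [...segments].sort((a, b) => a.begin - b.begin);
  let out = "";
  let pos = 0; // end of the last segment already handled
  for (const { begin, end, text } of sorted) {
    if (begin < pos || end < begin || end > source.length) {
      throw new RangeError(`Invalid or overlapping segment [${begin}, ${end})`);
    }
    out += source.slice(pos, begin) + text; // untouched text, then the replacement
    pos = end;
  }
  return out + source.slice(pos); // the rest of the document, verbatim
}

// Usage: change only the title text, leaving the sloppy markup around it alone.
const page = "<HTML><title>Old</title><p>unclosed <b>tags<p>stay as they are";
const start = page.indexOf("Old");
console.log(applyReplacements(page, [{ begin: start, end: start + 3, text: "New" }]));
// <HTML><title>New</title><p>unclosed <b>tags<p>stay as they are
```

Because nothing outside the listed segments is touched, the unclosed tags in the example survive exactly as written, which is the property that sets this approach apart from proactive cleaners like JTidy.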
Posted in Technology, Web
JavaScript and the missing toString method
February 20th, 2009 by nadav
Debugging JavaScript code is no pleasant task. Debuggers are buggy, logging libraries are unreliable, and the whole experience is generally very time-consuming, especially for someone who’s used to the Java way of doing things. During one long and painful session of debugging wild¹ JavaScript code, I kept thinking how much time I could save if JavaScript had an equivalent of Java’s ubiquitous toString method or PHP’s useful print_r function.
Apparently I’m not alone. Dutch blogger Kevin van Zonneveld is leading a project called php.js, which aims to port as many PHP functions as possible to JavaScript. One of those functions is print_r, which “[p]rints human-readable information about a variable”. It takes a variable of any data type and prints it (including any fields it may contain) in a format well suited to debugging. Plus, it doesn’t depend on any library or framework. This one definitely joins my JS toolkit.
It’s licensed under the liberal MIT license.
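For the curious, here is a minimal sketch of a print_r-style dump in plain JavaScript. It is not the php.js implementation, and its output only loosely imitates the layout of PHP’s print_r; dump is a hypothetical name. It walks arrays and objects recursively and marks cyclic references instead of looping forever.

```javascript
// Return a human-readable dump of any value, recursing into arrays and objects.
// The layout loosely imitates PHP's print_r; this is not php.js's implementation.
function dump(value, indent = "", seen = new Set()) {
  if (value === null || typeof value !== "object") {
    return String(value); // primitives (and functions) print as themselves
  }
  if (seen.has(value)) return "*RECURSION*"; // object already on the current path: a cycle
  seen.add(value);
  const inner = indent + "    ";
  const lines = Object.keys(value).map(
    (key) => `${inner}[${key}] => ${dump(value[key], inner + "    ", seen)}\n`
  );
  seen.delete(value); // shared (non-cyclic) references may still be printed again
  const label = Array.isArray(value) ? "Array" : "Object";
  return `${label}\n${indent}(\n${lines.join("")}${indent})`;
}

console.log(dump({ a: 1, b: [true, "x"] }));
// Object
// (
//     [a] => 1
//     [b] => Array
//         (
//             [0] => true
//             [1] => x
//         )
// )
```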
¹ Wild site is a term used internally at Creative Calls to refer to a web page that’s out there on the web, as opposed to a page produced by our system and under our control. The term was borrowed from the world of computer viruses.
Posted in Web
IE Quirkology: The META Content-Type Refresh Bug
February 15th, 2009 by nadav
At Creative Calls, we analyze hundreds of web pages from all around the web. In our daily work, we face every browser quirk and web hack known to mankind. Here’s one of the issues we’ve had the pleasure of dealing with – appropriately enough, we’ve named it The META Content-Type Refresh Bug.
Here’s a code sample:
<html>
<head>
<script type="text/javascript">
alert("hello");
</script>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head>
<body>
</body>
</html>
What does this code do when opened for the first time in Internet Explorer? Apparently not what most people think it would do. Click here to see for yourself (make sure you’re using IE7).
To be fair, MSDN has the following advice on this issue:
To apply a character set to an entire document, you must insert the meta element before the body element. For clarity, it should appear as the first element after head, so that all browsers can translate the meta element before the document is parsed.
Though it warns about the behavior of web browsers in general, it doesn’t specifically address the weird behavior exhibited by IE. Not that it would have helped us anyway, since we didn’t even know what to look for.
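The MSDN advice quoted above is easy to check mechanically on pages you control. Here is a rough JavaScript sketch that reports whether the charset-declaring meta element comes before the first script. charsetMetaPrecedesScripts is a hypothetical helper, and because it uses regular expressions rather than a real HTML parser, it is only a quick sanity check.

```javascript
// Rough check: does a charset-declaring <meta> appear before the first <script>?
// Regex-based, not a real HTML parser: comments or strings containing "<script"
// or "<meta" will fool it.
function charsetMetaPrecedesScripts(html) {
  const source = html.toLowerCase();
  const meta = source.search(/<meta[^>]*(?:http-equiv\s*=\s*["']?content-type|charset\s*=)/);
  if (meta === -1) return true; // no charset <meta> at all, so nothing to reorder
  const script = source.indexOf("<script");
  return script === -1 || meta < script;
}

// Usage: the sample page from this post, with the <meta> after the <script>.
const samplePage = [
  "<html>", "<head>",
  '<script type="text/javascript">', 'alert("hello");', "</script>",
  '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">',
  "</head>", "<body>", "</body>", "</html>",
].join("\n");
console.log(charsetMetaPrecedesScripts(samplePage)); // false
```

Moving the meta element up to be the first child of head, as MSDN recommends, makes the check return true.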
Posted in Web
Quiz Yourself on the IPA alphabet
February 12th, 2009 by nadav
The International Phonetic Alphabet is a system of phonetic notation used to represent the sounds of spoken languages. It is used by linguists to transcribe words into unambiguous written representations. The word transient, for instance, is often pronounced [ˈtrænʃənt] (or tran-shuhnt) but may also be pronounced [ˈtrænziənt] (or tran-zee-uhnt). While the non-standard transcription (shown above in round brackets) is suitable for American English, it cannot represent all the sounds of other languages, such as Quechua.
A couple of years ago I threw together a game that helped my friend Eran and me study for the Israeli equivalent of the SAT exam. While that game cannot be made public because of legal issues, I have since adapted it to help study the IPA alphabet. Many of the advanced features are broken, but it’s still mostly usable.
It was written in PHP in a few hours and has quite a few bugs. It’s no longer being developed. Click here to check it out.
Posted in Linguistics
Hacking Canon PowerShot SD450
April 7th, 2008 by nadav
After getting an iPhone with a decent built-in camera, I was under the impression that point-and-shoot cameras, like the Canon IXUS 55 (aka PowerShot SD450) I own, were all but dead. But in an unusually interesting story, CNET’s Leonard Goh reports on an unofficial “expansion pack” for Canon cameras. The expansion pack works around the fact that it is the camera’s firmware, not necessarily its hardware, that limits what the camera can and cannot do.
I installed the CHDK (Canon Hacker’s Development Kit) build that matches my camera model and was amazed to see it in action. Among other things, it provides improved control over shutter speed, enabling one to take these fabulous photos using a relatively cheap camera.
Posted in Photography, Technology
Facebook launches Chat
April 6th, 2008 by nadav
Word on the street is that Facebook Chat is now in public beta, with selected networks already able to use it.
Being a member of the Tel Aviv University network, I’m still unable to use it myself.
Posted in Web
Roll your own home automation solution — from scratch
April 5th, 2008 by nadav
My brother Ori is rolling his own home automation project, which includes designing and manufacturing the hardware and developing a software application that would control the hardware (the part that makes it truly smart). By sending and receiving simple instructions through the serial port, the software would be able to perform such simple tasks as turning on the light when a switch is turned on, as well as more sophisticated stuff like automatically turning on the dishwasher when a random car drives into the driveway.
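To make the serial-port idea concrete, here is a minimal JavaScript sketch of the simplest rule mentioned above, a switch that turns on a light. Everything protocol-specific is an assumption: the frame format, the rule table, and the parseFrame and handleLine helpers are hypothetical, since the real protocol is proprietary, and the write callback stands in for an actual serial port.

```javascript
// Hypothetical sketch: react to switch events read from the serial line by sending
// light commands back. The frame format ("S01:ON" = switch 1 turned on,
// "L03:ON" = turn on light 3) is invented; the real protocol is proprietary.
const rules = [
  // Mirror switch 1's state onto light 3.
  { trigger: { kind: "S", address: 1 }, target: { kind: "L", address: 3 } },
];

// Parse one incoming line into { kind, address, state }, or null if it isn't a frame.
function parseFrame(line) {
  const match = /^([A-Z])(\d{2}):(ON|OFF)$/.exec(line.trim());
  if (!match) return null;
  return { kind: match[1], address: Number(match[2]), state: match[3] };
}

// Handle one line from the port; `write` stands in for writing to the serial port.
function handleLine(line, write) {
  const event = parseFrame(line);
  if (!event) return; // ignore noise and unknown messages
  for (const { trigger, target } of rules) {
    if (trigger.kind === event.kind && trigger.address === event.address) {
      write(`${target.kind}${String(target.address).padStart(2, "0")}:${event.state}\n`);
    }
  }
}

// Usage: feed one incoming line and collect what would be written back.
const sent = [];
handleLine("S01:ON", (frame) => sent.push(frame));
console.log(JSON.stringify(sent)); // ["L03:ON\n"]
```

Keeping the rules in a table rather than in code means new switch-to-appliance pairings are data changes, which fits the goal of adding smarter behavior later.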
Ori is an extremely talented electrical engineer, but he’s a little lazy when it comes to software. A while ago he approached me and suggested that I join the project and take responsibility for the software side of it.
Since the hardware is proprietary (both the hardware and the protocol it supports are completely unique), a question we asked ourselves early on in the project was whether we could use an existing piece of software to control the hardware. One problem with many existing open source solutions, though, is that they tend to mix the implementation of the hardware protocols they support with the rest of the application code, making it hard to reuse existing code or add support for new protocols.
One solution that does look different in this respect is HouseBot, a plain old shareware program. HouseBot has a plug-in architecture built around two main concepts: Hardware Interfaces and Devices. By defining a Hardware Interface that supports our cryptic hardware protocol, and Devices that represent real-world appliances interacting with this Hardware Interface, we can write only the parts necessary to support our unique protocol, while harnessing the user interface and generic application logic provided by HouseBot. At $80 a pop for such a niche product, it would definitely be a bargain.
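The Hardware Interface/Device split is what lets a protocol live in exactly one place. Here is a minimal JavaScript sketch of that separation, assuming any object with a write method can stand in for the serial port. It illustrates the design only and is not HouseBot’s plug-in SDK: HardwareInterface, ExampleProtocolInterface, LightDevice, and the “L03:ON” frame format are all hypothetical.

```javascript
// Hypothetical sketch of the split: a hardware interface knows only the wire
// protocol, and devices know only appliance semantics. None of these names
// come from HouseBot's actual plug-in SDK.
class HardwareInterface {
  // `transport` is anything with a write(string) method (a serial port in practice).
  constructor(transport) {
    this.transport = transport;
  }
  sendCommand(address, action) {
    throw new Error("sendCommand() must be implemented by a protocol subclass");
  }
}

// A made-up line-based protocol, e.g. "L03:ON\n"; the real one is proprietary.
class ExampleProtocolInterface extends HardwareInterface {
  sendCommand(address, action) {
    const frame = `L${String(address).padStart(2, "0")}:${action.toUpperCase()}\n`;
    this.transport.write(frame);
  }
}

class LightDevice {
  constructor(name, address, hardware) {
    this.name = name;
    this.address = address;
    this.hardware = hardware; // any HardwareInterface; the device never sees the wire format
  }
  turnOn() { this.hardware.sendCommand(this.address, "on"); }
  turnOff() { this.hardware.sendCommand(this.address, "off"); }
}

// Usage with an in-memory transport standing in for the serial port.
const written = [];
const fakePort = { write: (data) => written.push(data) };
const kitchen = new LightDevice("Kitchen light", 3, new ExampleProtocolInterface(fakePort));
kitchen.turnOn();
console.log(JSON.stringify(written)); // ["L03:ON\n"]
```

Supporting a different protocol would mean writing another HardwareInterface subclass; LightDevice and anything built on top of it stay the same.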
I’ll post more information on the project as it develops.
Posted in Technology