Season of KDE 2006
Touchscreen application implemented in the K Desktop
A Season of KDE 2006 project by Emmanuel Lesser
Mentored by Olaf Jan Schmidt
Project Details
Anyone who wants the benefits of a touchscreen can either buy such a screen for a desktop computer or opt for a so-called Tablet PC. Both options are very expensive and therefore out of reach for many computer users who could benefit from them, such as people who suffer from RSI, disabled people, or users of specific applications and games.
The objective of this project is to make touchscreen functionality available to every computer user by eliminating the need to invest in expensive new hardware. In fact, no investment in new hardware is required at all. The only requirements are a computer (with screen) and a webcam.
Here's the basic idea:
Once installed on a computer system, the Touchscreen application places small letters, numbers or symbols next to every icon on the desktop. The webcam, which is `looking' at the screen, constantly streams its data feed to the OCR engine of the Touchscreen application.
Now, by touching a certain icon on the screen, the letter next to this icon gets covered (by the finger of the user). The OCR engine, which is expecting to `read' a predefined sequence of letters, detects that one letter is missing. This way, the program can determine which icon was covered, enabling the system to open the correct application for the user.
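The detection step described above can be sketched as follows. This is a minimal illustration in Python, not the proposed C implementation. The function names, the label-to-application mapping and the application names are hypothetical, and the OCR result is assumed to arrive as a plain list of recognised labels.

```python
def find_covered_labels(expected, recognised):
    """Return the expected labels that the OCR pass failed to read, in order."""
    seen = set(recognised)
    return [label for label in expected if label not in seen]


def resolve_touch(label_to_action, recognised):
    """Return the action for the single covered label, or None otherwise."""
    covered = find_covered_labels(list(label_to_action), recognised)
    if len(covered) == 1:
        return label_to_action[covered[0]]
    # Nothing covered, or several labels covered at once: no clear touch.
    return None


# Hypothetical desktop layout: label shown next to the icon -> application.
desktop = {"A": "konqueror", "B": "kmail", "C": "konsole"}

# The webcam still sees "A" and "C", so the finger is covering "B".
touched = resolve_touch(desktop, ["A", "C"])
```

When more than one label is missing, this sketch deliberately reports no touch. Deciding what the user meant in that case is the job of the algorithms discussed under Additional Project Details.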
The system is easiest to implement on the desktop, but will of course also be useful in other applications, such as office suites, games and (web) browsers.
For this reason, the program will be developed in such a way that the main-engine provides the desktop implementation, while support for other applications can be added simply by creating new sub-engines.
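One way to picture this split is a main-engine that handles the desktop itself and forwards everything else to whichever sub-engine has been registered for the active application. The sketch below is in Python for brevity and only illustrates the idea. The class, the method names and the browser sub-engine are assumptions, not part of the proposal.

```python
class MainEngine:
    """Handles desktop touches itself and delegates other applications."""

    def __init__(self, desktop_actions):
        self._desktop_actions = desktop_actions  # label -> action
        self._sub_engines = {}                   # application -> handler

    def register(self, application, handler):
        """Add support for an application by plugging in a sub-engine."""
        self._sub_engines[application] = handler

    def dispatch(self, application, covered_label):
        """Return the action for a covered label in the given application."""
        if application == "desktop":
            return self._desktop_actions.get(covered_label)
        handler = self._sub_engines.get(application)
        if handler is None:
            return None  # no sub-engine installed for this application
        return handler(covered_label)


def browser_sub_engine(label):
    """Hypothetical sub-engine with its own small label database."""
    actions = {"1": "back", "2": "forward", "3": "reload"}
    return actions.get(label)


engine = MainEngine({"A": "open konqueror", "B": "open kmail"})
engine.register("browser", browser_sub_engine)
```

Adding support for another application then only requires a new handler and one `register` call, leaving the main-engine untouched.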
Additional Project Details
Of course, some practical issues arise, like what happens when the hand of the user also covers other parts of the screen, and how to place the webcam in front of the screen to get a good view, without narrowing the workspace of the user.
To solve the first problem, AI algorithms written in Prolog will enable the system to determine by itself which part of the screen the user meant to touch.
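The proposal does not specify these algorithms. As a stand-in, the following Python sketch shows one very simple heuristic: assuming the hand reaches the screen from below, the covered label closest to the top of the screen is taken to be under the fingertip. Both the heuristic and the coordinate layout are assumptions made for illustration only.

```python
def pick_intended_label(covered, positions):
    """Pick the covered label nearest the top of the screen.

    covered   -- labels the OCR pass could not read
    positions -- label -> (x, y) in pixels, with y growing downwards
    """
    if not covered:
        return None
    # Assumption: the hand enters from below, so the fingertip is the
    # highest occluded point and the labels below it are hidden by the palm.
    return min(covered, key=lambda label: positions[label][1])


# Hypothetical column of three desktop icons.
positions = {"A": (10, 20), "B": (10, 80), "C": (10, 140)}

# The hand hides both "B" and "C"; "B" is higher, so it is chosen.
intended = pick_intended_label(["B", "C"], positions)
```

A real implementation would need to cope with hands entering from the side and with touches on icons near the bottom edge, which this heuristic ignores.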
For the `position-of-the-webcam' problem no obvious solution comes to mind. Therefore, the program will request re-calibration of the webcam every time it is moved from its initial position. Of course, any better alternatives can be implemented in later versions of the application.
Another important point is to preserve a stable system, working at a normal speed, which is not easy when running a webcam and an OCR application in the background at the same time. However, this problem is not so relevant for the Touchscreen program, as the database of possible characters to recognise remains quite small.
Deliverables
The Touchscreen application consists of a main-engine, an OCR-engine (which is partially integrated in the main-engine and partially in the sub-engines), some sub-engines and a graphical user interface (GUI).
Some details:
- main-engine: written in C, contains webcam driver support, live data-feed processing, OCR-engine and general communication with the operating system
- sub-engines: written in C and Prolog, contain support and implementation for specific user applications, and AI algorithms to determine the correct necessary action when certain data is obtained from the main-engine; also contain an application-specific database of possible characters to recognise
- GUI: written in C++ and Java, provides easy user/system interaction and user preferences configuration
The application will be delivered as binary files (a separate file for each component). An automatic installer file can also be included (tar).
The source code will of course also be delivered and made public, depending on the preferred publishing license.
Project History
The project idea was initially developed in January 2003. In the next few months, I turned the idea into a working application by writing parts of the program in JavaScript, using an existing OCR-engine and creating a GUI with Multimedia Builder from Mediachance. Although the application turned out to be very slow and extremely system-dependent, some national computer magazines showed interest in publishing an article about it. Eventually one was published in ClickX Magazine, issue 40 (6th May 2003). A copy of the article is available online in PDF format (please refer to the top of this proposal for URL details). An original copy of the magazine can also be provided. Please keep in mind that the article is written in Dutch.
From May 2003 until now, the project hasn't improved much. I stopped working on it once I started studying at my current college, mainly due to a lack of time. When I found out about Google's Summer of Code, it seemed to me the perfect opportunity to continue working on it. I firmly believe that by coding a custom OCR-engine, using more flexible (low-level) languages like C, and with my extended knowledge and experience, this application can become very fast and compatible with virtually any platform.
Plan
- Code a custom OCR-engine
- Manually convert the old JavaScript code into flexible and usable C code
- Create a new and improved GUI
- Develop AI algorithms for intelligent processing of OCR-data
- Code the sub-engine for implementation in an additional application
- Implement a wide range of drivers for popular webcams
- If any time is left: write a brief user's manual and advice on how to position the webcam in front of the screen
See http://stunix.netfirms.com/kde/index.htm for details.