Carnegie Mellon University

The Digital Library of India and Its Journey

It all started when I, as a member of the Scientific Delegation, accompanied the Indian Prime Minister Atal Behari Vajpayee on his visit to the USA in September 2000. After this visit, I met Prof Raj Reddy at CMU, where he unfolded his grand vision of digitizing all of the human race's knowledge and sought India's participation. We did some back-of-the-envelope calculations and found that storage was not the issue, since Prof Raj foresaw the storage revolution; the real challenge was the sheer logistics of scanning and digitizing. As part of this vision, he proposed that we start a “Million Book to the Web Project” (MBP) and said that he would also request China to be a member. Thus the MBP started, initially between the USA, India and China, and later Egypt and Australia also pitched in.

On my return to India, I contacted Dr APJ Abdul Kalam, who was then the Principal Scientific Advisor to the Government of India. He was extremely supportive of the idea and sowed the seed for the Digital Library of India (DLI) with funds to demonstrate its feasibility. Prof Raj was kind enough to provide us with around 250 high-speed scanners and two servers to host the contents. Once we had a full demonstration of the processes for digitizing and hosting, the then Secretary of the Ministry of IT, Mr. Rajiv Ratan Shaw, initiated the project with full funding for the scanning centres. IISc's role was confined to training the people engaged in scanning, providing the necessary software and hosting the contents. By then, Dr APJ Abdul Kalam had become the President of India, and with his support and that of the Ministry of IT, we were able to create the DLI movement with 21 scanning centres spread across the country. This movement was strengthened by the annual MBP meet that Prof Raj organized in the USA, India, Egypt and China, which also created a coherent family of MBP and DLI.

So far, a total of around 547,000 books containing 190.29 million pages have been digitized and hosted in India. These books are currently available in PDF format on the National Digital Library of India portal.

The majority of the books have been OCRed, enabling a multilingual search engine to search their full contents. One of the major challenges that we faced in hosting the contents was ascertaining the copyright status of the books. One lesson we learnt from the contents we created is that in many areas bordering two linguistic states, there are books written in the script of one state but in the language of the other. In fact, there are more than 8,000 books that are in known scripts but in unknown languages, perhaps dialects with which some of us are not familiar. The DLI contents are perhaps the best source for understanding our culture, and this has made them highly popular among Indic scholars across the world.