From rolling out social welfare schemes and setting political agendas to refining electoral rolls and taxpayer databases, big data can be used in numerous ways to transform e-governance and serve citizens better, experts affirm.
Of late, big data has become a buzzword, though for all the wrong reasons.
The world got an inkling of the term after former US National Security Agency (NSA) staffer Edward Snowden's revelations about Prism, one of the biggest surveillance programmes run by the US for cyber espionage. The processing and analysis of the mammoth volume of data collected through the programme, in the form of audio, voice chats, emails and images, relies on big data.
However, though big data is being used for national security purposes, it could be equally useful in improving the interface between government and citizens - the theme of Governance Now's recent conclave on 'Big Data: Transforming e-governance'. The panel discussed the application of big data analytics in several areas, including social welfare schemes, setting the political agenda of the day and refining electoral rolls and taxpayer databases.
The panel also deliberated upon the nuances of using big data, as issues such as privacy, the source of data, the data generation process and standardisation keep coming up.
Setting political agendas
Delivering the special address at the conclave, Dr Arvind Gupta, technology head of the Bharatiya Janata Party (BJP), said big data analytics of conversations on social media can help shape the manifestoes of political parties. Referring to the recent controversy over BJP president Rajnath Singh's statement on the English language vis-a-vis Sanskrit, he said that while the mainstream media was critical of the statement, the remark was quite well liked on social media since it reflected local perceptions to a large extent.
"Let's take an example of unrest in the country and try to observe people's perception over social media, blogs and other portals," he said. "You see and analyse this trend and make an algorithm out of it. Out of 100 data points, if 80 data points start working in (a) single direction, then it shows empirically that there is unrest in a particular territory."
While Gupta stressed that this could lead to preventive action, he also said the issues of privacy and data security need to be debated equally hard. "Who is accessing this data? How secure is this data? These issues need to be discussed and debated," he said. "Before the government moves towards becoming Big Brother, we should address the concern of data security and privacy."
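Gupta's "80 out of 100 data points" rule of thumb can be illustrated with a minimal sketch. It assumes each post or blog entry has already been labelled with a direction (the labels `"unrest"` and `"calm"`, the function names and the 0.8 threshold are illustrative assumptions, not anything presented at the conclave) and simply checks whether one label dominates.

```python
from collections import Counter

def dominant_share(signals):
    """Return (label, share) for the most common label in `signals`."""
    if not signals:
        return None, 0.0
    label, count = Counter(signals).most_common(1)[0]
    return label, count / len(signals)

def shows_unrest(signals, unrest_label="unrest", threshold=0.8):
    """True if the dominant label is `unrest_label` and covers at least `threshold` of the points."""
    label, share = dominant_share(signals)
    return label == unrest_label and share >= threshold

# Hypothetical sample: 80 of 100 labelled data points point the same way.
sample = ["unrest"] * 80 + ["calm"] * 20
print(shows_unrest(sample))  # True
```

How the labels are produced in the first place (the hard part of any real sentiment analysis) is deliberately left out of this sketch.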
Where is the data-led govt?
Sanjay Jaju, secretary, information technology, Andhra Pradesh, said the concept of an information-led society is passé - it's time now for a data-led society. "However, do you think decision-making in government depends on electronic data?" he asked, before going on to answer the query: "Not really."
"To what extent has physical data been converted to electronic data in the government? Not much," he added.
The three Vs of big data
To understand big data, people need to understand three Vs besides the 'veracity' of the data, he said. The first is the velocity of data, which is faster than anything else in the world today. "In the Mee Seva initiative in Andhra Pradesh (literally 'at your service', an e-governance programme that offers various services to citizens), we see around 5 lakh transactions each day." The second 'V', Jaju said, is the variety of data. "We have many layers of data in the government and lots of analytics can be done through those layers," he said.
The third 'V' is volume. "We are one of the most populous countries in the world. One sixth of humanity resides here and if the volume of data is not carefully considered it can lead us to take sub-optimal decisions," he said.
The privacy question
According to Jaju, though privacy is a concern, data is so dynamic at present that information considered private today will no longer be private tomorrow.
He said opening data to the public domain is one of the most crucial aspects of governance. Citing an example from Andhra Pradesh, he said land record data has been made available in the public domain in the state despite much resistance. As a result, he said, 1 crore land records have been given to farmers in the past year.
'Making sense' of data
Dr TCA Anant, secretary, ministry of statistics and programme implementation, said one has to first understand the data generation process before making sense out of data.
Pointing to the limitations of data analytics, he said the information obtained is often not actionable: "A long time ago, Google (had) published an analysis of Google search results on the spread of influenza in the US. It (analysis) predicted the outbreak of influenza in different parts of the US two weeks before the Centers for Disease Control and Prevention (CDC) made any prediction. Though the Google data did come in useful for predicting influenza outbreak, it is still not statistics. What CDC uses is statistics (because it is) actionable.
"Google data, even though extremely innovative, is not actionable - there are many parameters missing from this data to be actionable."
The transition to electronic data is not uniform across government today, he said. The challenge in making public policy is to understand where data is coming from and where it is not. Databases, Anant said, need to talk to each other, and for that common standards need to be accepted.
Vibha Agrawal, vice president, government vertical, CA Technologies, said a huge amount of data is being generated today and the infrastructure has been made to store it for use. However, she said, not enough has been thought about the policies to use this data to improve the lives of people.
"The living standard of people will improve if big data is used to identify the right beneficiaries," she said.
Golak Simili, principal consultant and head of technology in the passport seva project of the ministry of external affairs, talked about the data retiring policy. He said the concept of big data works around the power of consolidation. "(But) how long can you accumulate data? Where is your data retiring policy?" he asked.
Stressing the need to control costs and enhance data sharing to increase productivity, Simili said, "We should avoid duplicating data. For example, if UIDAI is already collecting biometrics, do we need to collect it again for other processes?"
He said the department saved close to Rs 100 crore by avoiding collection of biometrics.
Rajesh Aggarwal, secretary, information technology, Maharashtra, said most databases of beneficiaries of government schemes are erroneous.
"Thirty percent of data is garbage - much of it is nullified," Aggarwal said, though he added that about 10 percent of this is "innocent" inefficiency.
"What do you think is the worst database? It is the Below Poverty Line (BPL) database," the Maharashtra IT secretary said. According to Aggarwal, the BPL database has not been revised - official figures say 70 percent of Maharashtra's population is BPL.
According to Aggarwal, once the beneficiary database of several schemes is integrated with Aadhaar, big data analytics could play a major role in refining the database and help the government in better planning and decision-making.
Database vs data warehouse
Dr CSR Prabhu, deputy director-general at NIC, said that the expression 'big data' is a modern term but the technology for it has been around for some time. Elaborating on the differences between a database and a data warehouse, Prabhu said a database only indicates the current online transactions, while a data warehouse leaves an account of your past transactions.
He gave the example of online railway ticket booking to explain the issue: "A data warehouse will even indicate how many times you have cancelled (tickets)." In other words, it is a past record of one's transaction - an extension of a database.
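The distinction Prabhu draws can be sketched in a few lines. This is only a toy model under stated assumptions: the `BookingStore` class, its method names and the sample passenger are hypothetical, with a dictionary standing in for the "database" (current bookings only) and an append-only list standing in for the "data warehouse" (every past event).

```python
from dataclasses import dataclass, field

@dataclass
class BookingStore:
    current: dict = field(default_factory=dict)  # "database": live bookings only
    history: list = field(default_factory=list)  # "warehouse": every past event

    def book(self, passenger, ticket_id):
        self.current[ticket_id] = passenger
        self.history.append((passenger, ticket_id, "booked"))

    def cancel(self, ticket_id):
        # The live record disappears, but the event is kept in the history.
        passenger = self.current.pop(ticket_id)
        self.history.append((passenger, ticket_id, "cancelled"))

    def cancellations(self, passenger):
        """Count how many times `passenger` has cancelled, using the history only."""
        return sum(1 for p, _, event in self.history
                   if p == passenger and event == "cancelled")

store = BookingStore()
store.book("asha", "T1")
store.book("asha", "T2")
store.cancel("T1")
print(store.current)                # {'T2': 'asha'}
print(store.cancellations("asha"))  # 1
```

The current view no longer knows ticket T1 ever existed; only the history can answer how many times a passenger has cancelled.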
About the importance of analysing data, he said "governments have fallen" on the basis of statistics, adding that data analysis helped former Andhra Pradesh chief minister N Chandrababu Naidu save his government after he used predictions to control the price of onions.
Nabbing the bad guys
Sanjeev Singh, director, income tax (systems)-II, who has worked with the financial intelligence unit (FIU) for seven years, said if there is anything suspicious, especially in terms of financial institutions and monetary transactions, it will be relayed to the authorities in reasonable time using data analytics.
Elaborating on how valuable information is obtained through credit/debit card transactions, Singh said the database does not record the customer's name during such transactions but it can detect the location of the transaction, frequency of withdrawal with that card and whether there is a pattern of withdrawals - e.g., whether the withdrawals only occur at night. Using such patterns, the database can be used to identify the customer.
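The kind of pattern check described here, cards whose withdrawals occur only at night, might look roughly like the sketch below. The night window (10 pm to 5 am), the minimum of three withdrawals and all names and sample records are assumptions made for illustration; they do not describe the income tax department's or the FIU's actual systems.

```python
from datetime import datetime

NIGHT_START, NIGHT_END = 22, 5  # assumed night window: 10 pm to 5 am

def is_night(ts):
    """True if the timestamp falls inside the assumed night window."""
    return ts.hour >= NIGHT_START or ts.hour < NIGHT_END

def night_only_cards(withdrawals, min_count=3):
    """Return sorted card ids with at least `min_count` withdrawals, all at night.

    `withdrawals` is an iterable of (card_id, datetime) pairs; no customer name is needed.
    """
    by_card = {}
    for card_id, ts in withdrawals:
        by_card.setdefault(card_id, []).append(ts)
    return sorted(
        card for card, stamps in by_card.items()
        if len(stamps) >= min_count and all(is_night(t) for t in stamps)
    )

# Hypothetical records: card A withdraws only at night, card B at mixed hours.
records = [
    ("A", datetime(2013, 6, 1, 23, 10)),
    ("A", datetime(2013, 6, 3, 1, 30)),
    ("A", datetime(2013, 6, 5, 2, 45)),
    ("B", datetime(2013, 6, 1, 10, 0)),
    ("B", datetime(2013, 6, 2, 23, 0)),
    ("B", datetime(2013, 6, 4, 14, 0)),
]
print(night_only_cards(records))  # ['A']
```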
"It is a risk-based framework and we have to optimise it," Singh added.
He also elaborated on how big data analytics could be used to figure out fake currency-affected areas: "At the branch level, it is hardly a matter of concern if there is a gap of Rs 5-50. But if it's a jump of 300 percent in one area, it is a matter of concern."
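The 300 percent test Singh mentions comes down to a percentage-change calculation. The sketch below assumes the input is a count of fake notes detected per area in two consecutive periods; the area names, the figures and the function names are hypothetical.

```python
def percent_change(previous, current):
    """Percentage change from `previous` to `current`; None if there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100

def flag_areas(previous_counts, current_counts, threshold_pct=300):
    """Return sorted areas whose count rose by at least `threshold_pct` percent."""
    flagged = []
    for area, prev in previous_counts.items():
        change = percent_change(prev, current_counts.get(area, 0))
        if change is not None and change >= threshold_pct:
            flagged.append(area)
    return sorted(flagged)

# Hypothetical counts of fake notes detected per area in two periods.
last_quarter = {"north": 10, "south": 20}
this_quarter = {"north": 40, "south": 22}
print(flag_areas(last_quarter, this_quarter))  # ['north']
```

Here "north" rises from 10 to 40, a 300 percent jump, while the small change in "south" is ignored, mirroring the contrast between a minor branch-level gap and an area-wide spike.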
Setting electoral roll call
Dr Alok Shukla, deputy election commissioner, election commission of India, raised concerns about the source, or the authenticity, of data used nowadays. Using his experience as secretary of Chhattisgarh's school education department to demonstrate his point, Shukla said: "(Based on certain observations) a proposal was put to the Tribal Advisory Council (chaired by the chief minister) about introducing primary education in the children's mother tongues. Members of the council - the ministers - pounced on the suggestion. They said, 'You have studied in English. Now you want our children to study in their mother tongues. You don't want us to progress'. (But) you have to see where it is all coming from."
Shukla also raised concern over privacy. "Are we happy that the government knows everything about us?" he asked.
Though the electoral rolls are revised every year, Shukla said somehow outrageous information comes out quite often. "In Karnataka, people could (even) be shown to have 1,000 wives. But the biggest one was when the Bishop of Trivandrum was shown with the photo of a woman.
"It is a process (and) we are trying to improve it. It takes time."
According to Shukla, the EC is uploading the entire electoral roll online. A person named PG Bhatt downloaded the entire electoral roll of Bangalore city and found 11 lakh errors of 64 different types. Using different algorithms and going door to door, the EC ultimately found 2 lakh errors - by no means a small achievement - that were corrected before the assembly elections held in May 2013.
Privacy, where art thou in big data?
With US and British intelligence agencies using big data to crack into the personal data, online transactions and emails of millions of people, as revealed by former NSA and CIA staffer Edward Snowden, privacy has become a source of concern for experts even as they advocate the use of big data. We reproduce some of those concerns:
Dr Arvind Gupta, technology head, Bharatiya Janata Party: "Who is accessing (all) this data? How secure is this data? These issues need to be discussed and debated. Before the government moves towards becoming Big Brother, we should address the concern of data security and privacy."
Sanjeev Singh, director, income tax (systems)-II: Elaborating on how valuable information is obtained through debit card transactions, he said the database does not record the customer's name during such transactions but it can detect the location of the transaction, frequency of withdrawal with that card and whether there is a pattern of withdrawals - e.g. whether the withdrawals only occur at night. Using such patterns, the database can be used to identify the customer.
Dr Alok Shukla, deputy election commissioner: "Are we happy that the government knows everything about us?"
Sanjay Jaju, secretary, IT department, Andhra Pradesh: Though privacy is a concern, data is so dynamic at present that information considered private today will no longer be private tomorrow.