Commentary
The California state AI safety bill, SB 1047, known as the most stringent in the nation, has been
vetoed by the California governor. Taking small steps to address the low-hanging fruits in AI regulation is more beneficial than making giant leaps. One crucial step is mandating the traceability of data. California should lead in this effort, similar to its leadership in
data privacy.
Misinformation has been a significant issue affecting the war in
Ukraine. The problem is exacerbated with the rise of Generative Artificial Intelligence (AI). Deepfakes pose a significant
challenge. Governments are keen on combating misinformation and fraud. Mandating data traceability, also known as provenance, is a crucial step in that direction and offers a more comprehensive solution than just requiring watermarks in the metadata of AI-generated photos, as proposed in California’s
AB 3211, which some
companies are endorsing.
Data plays a significant role in our daily lives, particularly in the realm of AI. Preserving the integrity of data is crucial. Data provenance, which ensures traceability, is a key aspect of integrity. It aids in detecting anomalies and errors in data, similar to how tracking money can enhance the integrity of the economy.
Provenance contributes to establishing
trust in digital items found online. Research suggests that provenance is effective in reducing users’ vulnerability to misinformation. It is an important
ethical measure that discourages the misuse of digital items such as photographs. Provenance should include documentation of the method used to generate the data. Mandating the documentation of data lineage, transformations, and contextual information can help reduce the intentional creation and use of fake content, such as the
deep fakes used in scams.
I have personally experienced misinformation in various forms, including on anonymous
websites. While most reviews about my teaching on RateMyProfessors.com are positive, the false ones still have an impact. Rigorous research
studies have shown that anonymous websites like RateMyProfessors.com are inaccurate and biased. Despite this, the website
claims legal immunity for user-generated content. It remains uncertain whether the Federal Trade Commission’s
rule banning fake reviews will impact the website, but mandating data provenance could have a significant effect.
According to
research, on such websites, “there is no guarantee that a ‘student reviewer’ is even a student,” indicating a lack of provenance, making the information unsuitable for decision-making processes. Despite this, many students rely on it for decision-making, and the government seems reluctant to address the issue—unless a platform like “rate my lawmaker” emerges or articles like this make a difference.
Enforcing traceability of information on websites may not eradicate misinformation entirely but can significantly reduce its impact. Scams targeting young adults are on the rise on social media. I reported one such scam to Facebook, but it was not addressed. Content moderators may have overlooked it, possibly due to the use of vernacular language in the conversation on Facebook Messenger. Such scam attempts can be attributed, at least in part, to the lack of traceability.
There are compelling reasons to mandate data provenance. It is hailed as the “secret weapon” to safeguard businesses from
fraud and is seen as a
new beginning in cybersecurity. It aids in investigating data breaches and pinpointing the specific breached information. With the rise of Generative AI, there is a significant debate around
copyright. Data lineage can establish ownership and help resolve intellectual property disputes. Moreover, it is a valuable resource in forensic analysis.
Tracking the origins, movements, and transformations of data enables the assessment of biases and indicates reliability. Reproducibility is crucial in various research and decision-making scenarios. Data provenance plays a role in ensuring the reproducibility of experiments and decision-making processes, and it can assist in recovering lost data due to unforeseen circumstances.
Provenance is essential for the data ecosystem in the age of Generative AI. Mandating it will enhance the reliability, accountability, and trustworthiness of data, the systems built upon it, and the decisions derived from them. The good news is that the technical community is increasingly recognizing the significance of data provenance and actively working towards ensuring it through
standards,
initiatives, and technologies like the
Data Fabric. It is crucial for lawmakers to recognize its importance. In the meantime, industries should step up voluntarily.
Opinions expressed are Vishnu’s and do not represent those of his employer or any other affiliated entity.
The views expressed in this article are the author’s opinions and do not necessarily reflect those of The Epoch Times.