In this podcast, Alexy Thomas, EY India Technology Consulting Partner, talks about synthetic data, which is being considered the future of the artificial intelligence space.
In conversation with:
Alexy Thomas
EY India Technology Consulting Partner
Podcast host Silloo Jangalwala, Associate Director, BMC, speaks to Alexy Thomas from Tech Consulting at EY India, addressing predominant questions surrounding synthetic data, its potential to become the future of AI, its capability to solve privacy concerns and whether it is a one-stop solution for all AI data needs.
Background: industries need large amounts of high-quality data to train new AI models. Because of emerging data privacy concerns and stringent regulations on data sharing, gathering and accessing real, high-quality data is becoming difficult. Synthetic data is generated artificially, with or without the help of real data sets, for the purpose of training AI models. This may address some of the problems faced with real data.
Key takeaways
- While actual data may lack quality, volume, or variety, synthetic data can overcome these limitations and be generated in all the permutations and combinations of any given condition. Real data may also be unavailable for unseen conditions and events.
- Synthetic data can better train AI models and test systems and help build better prototypes than real data sets.
- It can also provide faster turnaround for AI testing, which requires a large number of iterations and inputs. In the coming years, synthetic data is going to overshadow real data in AI models.
- In sectors like financial services, synthetic data can help to evaluate market behavior and to develop new and innovative products, which is what large and small financial services organizations are trying to do.
- Synthetic data comes with significant risks and limitations, since the quality of the synthetic data depends on the quality of the model that created it. So, if the input has errors or biases, the data generated from it will lead to false insights and, in turn, to erroneous decision-making.
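The last takeaway, that errors or biases in the input carry through to the generated data, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the segment labels "A" and "B", the 90/10 split, and the helper names `fit_category_shares` and `generate` are assumptions made for illustration, not something discussed in the podcast.

```python
import random
from collections import Counter


def fit_category_shares(records):
    """Learn the share of each category from the input sample."""
    counts = Counter(records)
    total = len(records)
    return {label: count / total for label, count in counts.items()}


def generate(shares, n, rng):
    """Draw n synthetic records that follow the learned shares."""
    labels = list(shares)
    weights = [shares[label] for label in labels]
    return rng.choices(labels, weights=weights, k=n)


rng = random.Random(42)

# Hypothetical biased input: segment "B" is only 10% of the sample,
# even if it were a much larger part of the real population.
biased_input = ["A"] * 900 + ["B"] * 100

shares = fit_category_shares(biased_input)
synthetic = generate(shares, 10_000, rng)
synthetic_share_b = synthetic.count("B") / len(synthetic)

# The generator reproduces the skew it was given; it cannot correct it.
print(f"Share of B in input: {shares['B']:.2f}")
print(f"Share of B in synthetic data: {synthetic_share_b:.2f}")
```

However many synthetic records are drawn, segment "B" stays close to the 10% share seen in the input, so any model trained on the output inherits the same under-representation.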
Digitally generated data has the same predictive power as real data, as it replicates the statistical characteristics of the existing dataset. Where actual data lacks quality, volume, or variety, synthetic data overcomes these weaknesses, because it can also be generated for unseen conditions and events.
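As a rough illustration of what "replicating the statistical characteristics of the existing dataset" can mean, the Python sketch below fits the means, standard deviations, and correlation of a two-column dataset and then samples new records from a Gaussian with those parameters. This is one simple technique chosen for illustration, not the method discussed in the podcast; the "real" data is itself simulated, and the column meanings (income and spend) and function names are assumptions.

```python
import math
import random
import statistics


def fit_gaussian_2d(rows):
    """Estimate means, standard deviations and correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Sample covariance, then Pearson correlation.
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    rho = cov / (sx * sy)
    return mx, my, sx, sy, rho


def sample_synthetic(params, n, rng):
    """Draw n synthetic rows from a bivariate Gaussian with the fitted parameters."""
    mx, my, sx, sy, rho = params
    # Guard against tiny negative values caused by floating-point rounding.
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        y = my + sy * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


rng = random.Random(0)

# Stand-in for a real dataset (hypothetical): income and a correlated spend.
real = []
for _ in range(2000):
    income = rng.gauss(50, 10)
    spend = 0.6 * income + rng.gauss(0, 4)
    real.append((income, spend))

real_params = fit_gaussian_2d(real)
synthetic = sample_synthetic(real_params, 5000, rng)
synthetic_params = fit_gaussian_2d(synthetic)

print("real (mean_x, mean_y, sd_x, sd_y, rho):", [round(p, 2) for p in real_params])
print("synthetic (same order):", [round(p, 2) for p in synthetic_params])
```

The two printed summaries should be close, which is the sense in which the synthetic rows mirror the original. A Gaussian fit only preserves means, spreads, and linear correlation; production generators typically use richer models to capture more structure.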
For your convenience, a full text transcript of this podcast is available on the link below: Read the transcript.
Podcast
Duration 06m 24s