Introducing Synthetic Data Generation in Edge Impulse
We are proud to announce that synthetic data generation is now available inside the Edge Impulse platform, enabling a new and efficient way to work with LLM-generated images, audio, and voice to enhance your edge AI models.
This article was first published on www.edgeimpulse.com.
Quickly access and generate synthetic data in Edge Impulse
Training effective machine learning models requires a lot of accurate data. However, collecting and curating data that reflects real-world scenarios is often challenging and expensive, and the time, effort, and cost involved can be significant barriers. One option is synthetic data, which has advanced to the point where generated images, audio, and speech are all viable as part of the datasets used for AI training.
As of today, we have integrated three GenAI solutions: DALL-E to generate images, Whisper to create human speech, and ElevenLabs to generate sound effects.
Integrated data generation from multiple applications
New foundation models and services are released every week. To keep these integrations scalable, every user who belongs to an Edge Impulse organization can also create their own synthetic data generation integrations. See our documentation for more details.
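As a rough illustration of what a custom integration might look like, here is a minimal Python sketch of a generation script. Everything in it is an assumption made for illustration: the argument names (`--prompt`, `--label`, `--count`, `--out-directory`), the `generate_sample` placeholder, and the manifest file are hypothetical and are not the documented Edge Impulse block interface. Check the documentation for the actual contract before building a real block.

```python
import argparse
import json
from pathlib import Path


def generate_sample(prompt: str, index: int) -> bytes:
    """Hypothetical placeholder for a call to a generative model or service.

    A real integration would call your provider's API here and return the
    raw image or audio bytes.
    """
    return f"sample {index} for prompt: {prompt}".encode("utf-8")


def run_block(prompt: str, label: str, count: int, out_dir: Path) -> list:
    """Generate `count` samples and write them, plus a manifest, to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        path = out_dir / f"{label}.{i:03d}.bin"
        path.write_bytes(generate_sample(prompt, i))
        written.append(path)
    # Record which prompt produced which files so samples can be traced later.
    manifest = {"prompt": prompt, "label": label, "files": [p.name for p in written]}
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return written


def main(argv=None) -> None:
    # All argument names below are assumptions made for this sketch.
    parser = argparse.ArgumentParser(description="Hypothetical synthetic data block")
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--label", required=True)
    parser.add_argument("--count", type=int, default=3)
    parser.add_argument("--out-directory", type=Path, required=True)
    args = parser.parse_args(argv)
    files = run_block(args.prompt, args.label, args.count, args.out_directory)
    print(f"Wrote {len(files)} samples to {args.out_directory}")
```

To try it, call `main(["--prompt", "Glass breaking impact", "--label", "glass", "--out-directory", "out"])`, or add a call to `main()` at the bottom of the file and run it from the command line. Keeping the model call behind one small function makes it easy to swap providers as new models appear.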
The direct integration of LLM-based data generation in Edge Impulse is available now for Enterprise Plan and Professional Plan users. You can access this new feature directly in your Edge Impulse projects under the “Data acquisition” section, alongside the Dataset, Data explorer, and Data sources options.
The Synthetic Data menu lists the GenAI transformation blocks, both public and private, that have synthetic data capabilities.
Within the new Synthetic Data section, users can add and refine their prompts quickly and efficiently. The output, including images, is then displayed, so users can evaluate the results, refine their prompts until they get the desired dataset, and easily delete unwanted or incorrect samples.
This iterative workflow makes it much easier to find the right prompts for generating data. Additionally, any data that is not deleted is automatically added to the project, ensuring seamless data management as you continue to build and refine your edge AI model.
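The keep-by-default behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch of the workflow, not Edge Impulse code: the `ReviewSession` class, the sample names, and the second prompt are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewSession:
    """Tracks generated samples across prompt iterations (illustrative only)."""
    kept: dict = field(default_factory=dict)   # sample name -> prompt that produced it
    deleted: set = field(default_factory=set)  # names of samples the reviewer pruned

    def add_generated(self, prompt, names):
        # New samples are kept by default, mirroring the behavior where
        # anything that is not deleted ends up in the project.
        for name in names:
            self.kept[name] = prompt

    def delete(self, name):
        # Pruning a sample removes it from what will land in the dataset.
        if name in self.kept:
            del self.kept[name]
            self.deleted.add(name)

    def dataset(self):
        return sorted(self.kept)


session = ReviewSession()
session.add_generated("People wearing hard hats", ["img_001", "img_002", "img_003"])
session.delete("img_002")  # e.g. the image does not actually show a hard hat
# Hypothetical refined prompt after reviewing the first batch.
session.add_generated("Construction workers wearing yellow hard hats", ["img_004"])
print(session.dataset())  # ['img_001', 'img_003', 'img_004']
```

Recording the prompt next to each kept sample is a design choice for the sketch: it makes it easy to see later which prompt wording produced the samples that survived review.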
Sample Workflows
To generate synthetic data, navigate to your project in Edge Impulse. Click “Data acquisition” in the left-hand menu and select the “Synthetic Data” tab in the top navigation bar.
Here are the steps for the currently supported GenAI functions:
Synthetic Images
- From the drop-down, select DALL-E 3 Synthetic Image Generator
- Enter a prompt: For example, “People wearing hard hats”
- Click “Generate data” to create the images
- Review, iterate, and prune images from the newly generated samples
- New samples are automatically saved to your dataset
The DALL-E image generator functionality
Generate Human Speech
- From the drop-down, select Whisper Synthetic Voice Generator
- Enter a text prompt you want converted to digital speech. For example, "Hello, Edge!"
- Click "Generate" to create the speech data
- Review and save the generated audio files
Generate Audio
- Select ElevenLabs Synthetic Audio Generator (more info here)
- Enter a prompt. For example, “Glass breaking impact”
- Click “Generate” to create the sound effects
- Review, iterate, and delete any unwanted samples from the newly generated set
- New samples are automatically saved to your dataset
See the new Synthetic Data feature in use in the ElevenLabs integration video.
Read the docs page for more information on all three options.
This new capability lets users streamline the process of writing and refining prompts to create the desired dataset. It provides an efficient workflow for building models with synthetic data and makes it easier for developers to publish custom GenAI transformation blocks tailored to their own models.