Whenever you import new data, we recommend reviewing the results from our scraping tool in the Response Library. This ensures that:
The formatting of the content is readable by the LLM
Irrelevant page content isn't being scraped (e.g., cookie policy, headers/footers)
Data is grouped and labeled correctly with a topic header
If you skip this QA step and do not check your data or test it in your chat, you risk your chat giving users wrong information.
Troubleshooting Scraped Data Scenarios
My chat is firing a prewritten response, but I want to use my web-imported data instead to generate a response.
Prewritten responses ALWAYS fire before a generated response. To generate responses using your data, you will need to delete the existing prewritten response associated with that topic.
In this example, a user wants to know the price of valet parking:
Before: When Using a Prewritten Response
Prewritten Response about Parking
After: Deleted the Prewritten Response and Generated a Response
Generated Response with Enrichment
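The precedence rule above can be pictured with a minimal Python sketch. This is only an illustration of the behavior described, not the platform's actual implementation: the function choose_reply, the prewritten dictionary, the sample response text, and the generate callback are all hypothetical names invented for this example.

```python
from typing import Callable, Dict

def choose_reply(topic: str,
                 prewritten: Dict[str, str],
                 generate: Callable[[str], str]) -> str:
    """Return the prewritten response for a topic if one exists;
    only fall back to a generated response when none is found."""
    canned = prewritten.get(topic)
    if canned is not None:
        return canned          # a prewritten response always wins
    return generate(topic)     # otherwise, build a reply from imported data

# Hypothetical sample data for illustration only.
prewritten = {"parking": "Parking is available in Lot A."}
generate = lambda topic: f"[generated from imported data about {topic}]"

before = choose_reply("parking", prewritten, generate)

# Deleting the prewritten response lets the generated response through.
del prewritten["parking"]
after = choose_reply("parking", prewritten, generate)
```

In this sketch, before holds the prewritten text and after holds the generated text, mirroring the Before and After screenshots.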
My web-imported data looks correct in my dashboard, but my chat is giving users an incorrect answer.
There are a few things to review:
Ensure that you are not providing overlapping data in another response name.
For example: If you scraped a webpage stating that RV parking is $35, but another response name contains outdated data stating that it's $25, the conflict will confuse your chat and may cause it to give an incorrect answer. Delete the outdated, overlapping information.
Make sure your data has appropriate topic headers to help direct your chat to generate responses.
For example: You scraped the following data from a school fundraising event page: "Any school that wishes to participate must email no later than August 30 with the school name and fundraising coordinator’s name/email address". Without a topic header, the system has no idea what this is for (school field trips, special events, etc.). Fix this by adding a topic header as shown below:
Chunked Data with a Topic Header
Edit scraped data and add additional context as needed within the Response Library.
For example: You scraped the following data from your parking and transportation page: "Small portable electric generators (600W or less) are permitted". Within your Response Library that data may look OK; however, without further clarification, the system doesn't know where generators are permitted. In this case, the system would assume that because generators are permitted, they are permitted everywhere, including inside the stadium. Edit this data in your Response Library to include more details like: "Small portable electric generators (600W or less) are permitted in parking lots. Generators will not be permitted inside the stadium."
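If you keep a copy of your response text outside the dashboard, a quick script can help spot the overlapping data described above. The following is a minimal sketch under stated assumptions: the responses dictionary (response name mapped to its text), the sample response names, and the function find_price_conflicts are hypothetical, and the check only looks at dollar amounts. It is not a feature of the dashboard.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches dollar amounts such as $35 or $35.00.
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")

def find_price_conflicts(responses: Dict[str, str],
                         keyword: str) -> Dict[str, List[str]]:
    """For responses that mention `keyword`, map each distinct price
    to the response names that state it. More than one price in the
    result means the data overlaps and should be reviewed."""
    by_price = defaultdict(list)
    for name, text in responses.items():
        if keyword.lower() in text.lower():
            for price in set(PRICE.findall(text)):
                by_price[price].append(name)
    return dict(by_price)

# Hypothetical sample data for illustration only.
responses = {
    "Parking and Transportation": "RV parking is $35 per vehicle.",
    "Old Parking FAQ": "RV parking costs $25.",
    "Tickets": "Tickets are $15 for the show.",
}

conflicts = find_price_conflicts(responses, "RV parking")
for price, names in conflicts.items():
    print(price, "stated in:", ", ".join(names))
```

Here two different prices are reported for "RV parking", which signals that one of the two response names holds outdated information and should be deleted.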
The scraped data from my website is wrong. How can I determine which webpage it's from?
Go to Studio -> NLP Manager -> Responses
Search for the wrong data using keyword phrases in the Response Library as shown below:
Once in the response, click the three dots in the top right-hand corner and then click Notes
In the Notes section, you will see the URL and when the page was last scraped.
My website content was just updated. Is my previously scraped data updated as well?
No, you will need to scrape the page again to update your data in the response library.
Note: Manual edits can be made within web-scraped data responses in the dashboard. However, if a page is rescraped, those manual edits will be overwritten.
How do I determine if my data is useful or not?
Watch for the following issues when reviewing your data:
Unnecessary Data: You may scrape data like privacy overviews, dynamic content, or other information that is not needed to generate responses. We recommend deleting this data or deleting the entire response name.
Data That's Not Helpful When Building Responses
Bad Chunking: Data like "Tickets are $15 for the show" may make sense to you, but not to your chat. Your chat needs as much context as possible: which show is $15, when it takes place, when tickets go on sale, whether there is kids pricing, and so on.
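The review points in this article can be turned into a rough first-pass check if you work with an exported copy of your data. The sketch below is an assumption-laden illustration, not a built-in tool: the function review_chunk, the BOILERPLATE_TERMS list, and the MIN_WORDS threshold are all hypothetical and should be tuned to your own content. It does not replace reading the data yourself.

```python
from typing import List

# Hypothetical phrases that often indicate unnecessary scraped content.
BOILERPLATE_TERMS = ("cookie policy", "privacy", "all rights reserved")

# Arbitrary threshold: very short chunks often lack context.
MIN_WORDS = 12

def review_chunk(topic_header: str, text: str) -> List[str]:
    """Return a list of review flags for one chunk of scraped data."""
    flags = []
    if not topic_header.strip():
        flags.append("missing topic header")
    lowered = text.lower()
    if any(term in lowered for term in BOILERPLATE_TERMS):
        flags.append("possible boilerplate")
    if len(text.split()) < MIN_WORDS:
        flags.append("short chunk, may lack context")
    return flags

# A chunk with no header and little context gets flagged.
weak = review_chunk("", "Tickets are $15 for the show")

# A chunk with a header and added context passes this basic check.
strong = review_chunk(
    "Generators",
    "Small portable electric generators (600W or less) are permitted "
    "in parking lots. Generators will not be permitted inside the stadium.",
)
```

A flagged chunk is only a prompt to look closer; some short chunks are perfectly fine, and some long ones still need a topic header or extra context.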