# Web-Scraped Data

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FDIgXBUNMjWTWYHAjiHv5%2Fuploads%2FFnNVpIC9SXv2lTQK87j3%2FAdd%20Update%20Delete%20Webscraped%20Data-%20V2.mp4?alt=media&token=00a2a05f-369d-45fe-a3b6-f3636ddd20bf>" fullWidth="true" %}
How to Add, Update and Delete Web-Scraped Data
{% endembed %}

## How to Use the Web Scraping Tool

To add new web-scraped data, follow the steps below:

1. Open your AI Chat
2. Log in using the **!login** command
3. Click 'Get Started'
4. Click the 'Add this URL' or 'Add new URL' command
5. Go to **Studio -> NLP Manager -> Responses** to confirm the data was carried over and is formatted correctly

***

## Edit/Delete Web-Scraped Data

### **Update** Web-Scraped Data

{% embed url="<https://files.gitbook.com/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsxN9AGLZMIc5C8dx3zU7%2Fuploads%2F5qcaH7Tug2vwtqzzhPep%2Fedit%20scraped%20data%20-%20Made%20with%20Clipchamp.mp4?alt=media&token=e6a43631-1d43-4876-99c2-eaecf5d12f31>" fullWidth="true" %}
Update Web-Scraped Data
{% endembed %}

There are two ways to **update** web-scraped data:

#### Rescrape Your Webpage

1. Open your AI Chat
2. Log in using the **!login** command
3. Click 'Get Started'
4. Click **Add this URL** or **Add New URL** to rescrape the page with the updated information
5. Go to **Studio -> NLP Manager -> Responses** to confirm the data was carried over and is formatted correctly

#### Manually Update Your Content

1. While in the [Satisfi Dashboard](https://dashboard.satis.fi/Users/Login), go to **Studio -> NLP Manager -> Responses**
2. Locate the response you'd like to edit
3. Click the response name, then select the three dots in the right-hand corner of the window
4. Uncheck the box next to **Content Subscription**
5. Click the pencil icon
6. Make any necessary edits
7. Click **Save and Publish**, or click **Save** to keep your changes in draft mode

{% hint style="warning" %}
**Why Would You Manually Update Scraped Data?**

* Add topic headers (ex. Topic: Group Ticket Perks)
* Update information that hasn't been updated on your website yet
* Improve the chunking of data
* Remove unnecessary scraped data (webpage alerts, footer text, etc.)
* Add redacted information such as emails/phone numbers when applicable
  {% endhint %}

### **Delete** Web-Scraped Data

1. While in the [Satisfi Dashboard](https://dashboard.satis.fi/Users/Login), go to **Studio -> NLP Manager -> Responses**
2. Locate the response you'd like to delete
3. Click the delete button (trash can icon)

***

## Our Recommendations

### Create Hidden Webpages

Use our web scraping tool without having to surface information publicly on your website. Hidden pages on your website give you a controlled environment that serves as a data source for your AI Chat to train on and curate content from.

Real-World Examples: [**Tampa Bay Buccaneers**](https://www.buccaneers.com/chatbot), [**The Jockey Club**](https://www.thejockeyclub.co.uk/sandown/chat-bot/)

### Create a List of Your URLs

We suggest making a list, in advance, of the URLs you want to scrape and train on. Once you locate the popup on the first URL, keep your list handy and add more URLs by clicking the 'Add new URL' button. This way, you won't need to switch between pages, and you keep complete control over the URLs you've already scraped.
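
If your list starts out as a rough collection of copied links, a small script can tidy it before you begin. The following is a minimal sketch, not part of the scraping tool: the function names `normalize` and `build_scrape_list` are hypothetical, and it assumes that two URLs differing only in letter case of the domain, a trailing slash, or a `#fragment` point to the same page on your site.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Return a comparable form of a URL (assumption: fragments and
    trailing slashes do not change which page is shown)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Lowercase the scheme and host, keep the query string, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def build_scrape_list(urls):
    """Remove blank entries and duplicates while keeping the original order."""
    seen = set()
    result = []
    for url in urls:
        if not url.strip():
            continue  # skip empty lines pasted into the list
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


if __name__ == "__main__":
    # Example URLs are placeholders, not real pages.
    candidates = [
        "https://Example.com/guide/",
        "https://example.com/guide#parking",
        "",
        "https://example.com/things-to-do",
    ]
    for url in build_scrape_list(candidates):
        print(url)
```

Working through the cleaned list top to bottom makes it easier to track which pages you have already added.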

### Focus on What is Important

Don't attempt to scrape every web page you have! Focus your scraping efforts on web pages where your customers typically learn key information, such as A-Z Guides, Things to Do, Directories, etc.

### What to Avoid

Avoid scraping:

* Web pages that are frequently updated (schedules, rosters, stats, etc.). In these cases, we recommend using a prewritten response.
* Any third-party web pages you do not manage.
* Web pages lacking rich content, such as image-heavy pages, landing pages, etc.

### Inspect for Quality

We recommend that you review results from our scraping tool in the response library and also check how responses are generated and exposed within your chat.

{% hint style="danger" %}
If the information you scraped into the dashboard is not being understood by the LLM, this may be due to poor formatting within the response. To fix this:

* Find the corresponding URL-labeled response in the library (usually named after the web page it was scraped from)
* Click **Edit**
* Ensure that:

  * There is a header description related to the data's topic in each section
  * All sections are separated by five hyphens (“-----”)
  * No section is very short or extremely long

  **Note:** Manual edits can be made within web-scraped data responses in the dashboard. However, if a page is rescraped, those manual edits will be overwritten.
  {% endhint %}
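
The formatting checklist above can also be applied to a copy of the response text outside the dashboard. The sketch below is illustrative only and is not a dashboard feature: `check_response` is a hypothetical helper, it assumes each separator sits on its own line, it assumes headers follow the `Topic:` style shown earlier on this page, and the `min_chars` and `max_chars` thresholds are arbitrary placeholders, since this page does not define exact length limits.

```python
SEPARATOR = "-----"  # five hyphens, as described in the checklist


def check_response(text, min_chars=40, max_chars=1500):
    """Return a list of formatting problems found in a scraped response.

    Assumptions (not confirmed by the product): separators sit on their own
    line, headers start with "Topic:", and the length limits are placeholders.
    """
    # Split the text into sections at each separator line.
    sections = [[]]
    for line in text.splitlines():
        if line.strip() == SEPARATOR:
            sections.append([])
        else:
            sections[-1].append(line)

    problems = []
    for number, lines in enumerate(sections, start=1):
        body = [line.strip() for line in lines if line.strip()]
        if not body:
            problems.append(f"Section {number}: empty section")
            continue
        if not body[0].startswith("Topic:"):
            problems.append(f"Section {number}: missing header line")
        length = sum(len(line) for line in body)
        if length < min_chars:
            problems.append(f"Section {number}: very short ({length} characters)")
        elif length > max_chars:
            problems.append(f"Section {number}: very long ({length} characters)")
    return problems


if __name__ == "__main__":
    sample = (
        "Topic: Parking\n"
        "Lots open two hours before each event and close one hour after.\n"
        "-----\n"
        "Cash only."
    )
    for problem in check_response(sample):
        print(problem)
```

An empty result means the text passed these particular checks. It does not guarantee that the LLM will interpret the response correctly, so reviewing the generated answers in your chat is still worthwhile.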

## FAQs

<details>

<summary>I'm not seeing my scraped responses and/or LLM volume in the dashboard</summary>

In the responses library, ensure that the filter is set to “*company name* LLM”. The data should then update to reflect everything within the new experience.

</details>

<details>

<summary>Is there any specific formatting required for emails/phone numbers in scraped data?</summary>

No! If your contact information is in a text format and written clearly (ex. <email@email.com> or #-###-###-####), it will be surfaced to users in generated responses.

</details>

<details>

<summary>Can I scrape text from PDFs, images and/or web banners?</summary>

Unfortunately, this content cannot be scraped. If you'd like to use this data in generated responses, we suggest creating a documented response.

</details>

<details>

<summary>Can multiple people from my team scrape the website using the same popup?</summary>

Yes! If the page has already been scraped, a message will indicate that it’s already listed, and the page will refresh.

</details>

<details>

<summary>How many pages can I scrape?</summary>

You can scrape as many web pages as you need; however, we typically recommend scraping between 10 and 30 pages in total, depending on your website.

</details>

<details>

<summary>Does the chat ever pull in information from beyond our scraped webpages?</summary>

No! Responses are generated only from information that you scraped from websites.

</details>

<details>

<summary>Can I scrape any webpage I want?</summary>

Avoid scraping web pages that you do not manage, as their information may conflict with your organization's policies and procedures.

</details>

<details>

<summary>My chat is not prompting me to log in when I enter !login</summary>

You may already be logged in! Check for the asterisk in your chat's text container. If it's there, that means you're logged in. Type **!commands** and continue scraping.

If you still experience an issue logging into admin mode, reach out to our Product Support team by clicking the **Click Here For More Support** button in the header and asking for a live agent!

![](https://167344003-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsxN9AGLZMIc5C8dx3zU7%2Fuploads%2FIDUqRTgLb2jSo2gLLngi%2Falrady%20logged%20in.png?alt=media\&token=552e25cc-6d4d-43c6-816b-0d8ee1c36967)

</details>

<details>

<summary>The popup doesn’t load on a page I'm trying to scrape</summary>

Load the popup on a different page, then use the “Add New URL” command and enter the URL of the page you want to scrape.

</details>

<details>

<summary>I'm unable to scan my web page because the URL is not reachable</summary>

No worries! Submit a service request and we can assist!

</details>

<details>

<summary>If my website is updated, will it automatically update my scraped response?</summary>

No. Changes to your website are not automatically reflected in your scraped data. Rescrape the updated webpage to ensure your chat is properly trained and creates up-to-date responses for users.

</details>

<details>

<summary>What if I have pages in other languages? Should I scrape them?</summary>

No, you only need to scrape pages in English. If you have other languages as add-ons, we will set up the content in the second language for you.

</details>

<details>

<summary>How can I check the exact URL the content is scraped from?</summary>

Locate the url\_ response and click the **Notes** button.

![](https://167344003-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsxN9AGLZMIc5C8dx3zU7%2Fuploads%2Ft4s0bDbmYQ2nHFm3hzLN%2Fnotes%20button.png?alt=media\&token=faf1555c-29f5-440a-b07f-e6273837cb5c)

</details>

<details>

<summary>When I try to scrape my website, I get an HTTP 500 error</summary>

No worries! Submit a service request and we can assist!

</details>
