Your site might be geographically restricted (geo-blocked), or it may still be in staging. For Veeva customers and agency partners, we can make Veeva Web2PDF available for whitelisting. Contact your Veeva representative.
Hosting settings may prevent Veeva Web2PDF from accessing your site. Confirm that your site is accessible from the public web (see above). If it is, check for delays on your landing page: Veeva Web2PDF crawlers wait only briefly for a response before concluding that a site is missing, so common delays in loading the landing page can cause the crawl to fail.
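As a quick, optional check of landing-page delays, the sketch below times how long your starting page takes to return its initial HTML using Python's standard library. The URL is a placeholder, and the 30-second cutoff simply mirrors the page timeout described in the Summary Report table below; this is an illustration, not Veeva Web2PDF's internal logic.

```python
# Minimal sketch: time how long a landing page takes to return its initial HTML.
# This measures only the HTML response, not external scripts, images, or fonts.
import time
import urllib.request

URL = "https://www.example.com/"  # placeholder: use your site's starting page

start = time.monotonic()
with urllib.request.urlopen(URL, timeout=30) as response:  # 30 s mirrors the page timeout below
    body = response.read()
elapsed = time.monotonic() - start

print(f"HTTP {response.status}: initial HTML returned in {elapsed:.1f} s ({len(body)} bytes)")
```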
Websites often use a robots.txt file to communicate with web crawlers. If there are pages on your site that you do not want crawlers, such as search engines, to access, you can specify them in your robots.txt file, and Veeva Web2PDF fully respects those instructions. The same applies to the Veeva Web2PDF crawler: if there are specific pages you do not want it to access, you can specify them in your robots.txt file. Veeva Web2PDF’s unique user agent is: VeevaWeb2PDFCrawler.
To allow Veeva Web2PDF full access to your website, add VeevaWeb2PDFCrawler as an allowed User-agent in your robots.txt file:
User-agent: VeevaWeb2PDFCrawler
Disallow:
If you’re unsure whether your website has a robots.txt file, you can check by navigating to (yourURL)/robots.txt in your browser. When adding a new robots.txt file to your website, you must always place it in the website’s root.
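If instead there are specific pages you do not want Veeva Web2PDF to capture, you can disallow them for its user agent alone. For example, the following hypothetical rule (the /drafts/ path is an illustration only, not a Veeva requirement) allows the crawler everywhere except that directory:
User-agent: VeevaWeb2PDFCrawler
Disallow: /drafts/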
The Summary Report shows common errors encountered by Veeva Web2PDF while crawling your website. The following table gives more information about these errors.
Error Type | Description |
---|---|
Duplicate Page | A duplicate page has been detected. The original page’s URL and duplicate URL are shown. This may happen when a page can be accessed on your site from multiple URLs. |
Authentication | The specified page requires a login, and either no credentials or incorrect credentials were provided. Verify that you have provided the correct credentials for your website. |
External Link | The specified link is outside the parent path (sub-directory) and will not be crawled. For example, https://www.veeva.com/products/content-management/vault-promomats/ is specified as the starting page and a link is found to https://www.veeva.com/contact-us/. This will not be crawled as it is not within the starting path. Try running Veeva Web2PDF in a higher directory (ex: https://veeva.com). |
External Reference | The specified link is outside of the specified domain and will not be crawled. For example, your website has a link to the FDA’s website. |
File Reference | The specified link references a file. Files will not be captured or included by Veeva Web2PDF. Please add them separately. |
Contact Information | An email address or phone number was found on your website. |
Broken Link | The specified link references an inaccessible page. Please check the link address and your hosting settings. |
Page Timeout | The specified page could not be loaded within 30 seconds. Please check the page and your hosting settings. |
Page Limit Reached | Veeva Web2PDF has reached its 1,000-page limit. These links were found on your site but were not crawled. |
Job Timeout | Veeva Web2PDF has reached its 1-hour timeout. These links were found on your site but were not crawled. |
Malformed URL | An invalid URL has been detected. |
Robots Not Allowed | The robots.txt file prevents the Veeva Web2PDF crawler from accessing this page. See the robots.txt guidance above; a quick way to check a specific URL is sketched below this table. |
Failed to Navigate | Veeva Web2PDF was unable to access and crawl this portion of your website. |
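If you see Robots Not Allowed errors and want to confirm how your robots.txt file is being interpreted, a quick check is sketched below using Python's standard urllib.robotparser module. The site and page URLs are placeholders, and this is only an illustration, not a description of how Veeva Web2PDF itself evaluates robots.txt.

```python
# Minimal sketch: check whether a page is blocked for the VeevaWeb2PDFCrawler
# user agent by your robots.txt. The URLs below are placeholders.
from urllib import robotparser

SITE = "https://www.example.com"        # placeholder: your site's root URL
PAGE = SITE + "/products/overview/"     # placeholder: a page reported as blocked

rp = robotparser.RobotFileParser()
rp.set_url(SITE + "/robots.txt")
rp.read()  # fetch and parse the live robots.txt file

if rp.can_fetch("VeevaWeb2PDFCrawler", PAGE):
    print("Allowed: robots.txt permits the crawler to access this page.")
else:
    print("Blocked: adjust robots.txt if you want this page captured.")
```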
You can also contact us with questions.