Your site might be geographically restricted (geo-blocked), or it may still be in staging. For Veeva customers and agency partners, we can make Veeva Web2PDF available for whitelisting. Contact your Veeva representative.
Hosting settings may prevent Veeva Web2PDF from accessing your site. Confirm that your site is accessible from the public web (see above). If it is, check for delays on your landing page: Veeva Web2PDF crawlers wait only briefly for a response before concluding that a site is missing, so common delays in loading the landing page can cause the crawl to fail.
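As a quick, optional check of landing-page delays, the sketch below times how long your starting page takes to return its initial HTML using Python's standard library. The URL is a placeholder, and the 30-second cutoff simply mirrors the page timeout described in the Summary Report table below; this is an illustration, not Veeva Web2PDF's internal logic.

```python
# Minimal sketch: time how long a landing page takes to return its initial HTML.
# This measures only the HTML response, not external scripts, images, or fonts.
import time
import urllib.request

URL = "https://www.example.com/"  # placeholder: use your site's starting page

start = time.monotonic()
with urllib.request.urlopen(URL, timeout=30) as response:  # 30 s mirrors the page timeout below
    body = response.read()
elapsed = time.monotonic() - start

print(f"HTTP {response.status}: initial HTML returned in {elapsed:.1f} s ({len(body)} bytes)")
```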
Websites often use a robots.txt file to communicate with web crawlers. If there are pages on your site that you do not want crawlers, such as search engines, to access, you can specify them in your robots.txt file, and Veeva Web2PDF fully respects those instructions. The same applies to the Veeva Web2PDF crawler: if there are specific pages you do not want it to access, you can specify them in your robots.txt file. Veeva Web2PDF’s unique user agent is: VeevaWeb2PDFCrawler.
To allow Veeva Web2PDF full access to your website, add VeevaWeb2PDFCrawler as an allowed User-agent in your robots.txt file:
User-agent: VeevaWeb2PDFCrawler
Disallow:
If you’re unsure whether your website has a robots.txt file, you can check by navigating to (yourURL)/robots.txt in your browser. When adding a new robots.txt file to your website, you must always place it in the website’s root.
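If instead there are specific pages you do not want Veeva Web2PDF to capture, you can disallow them for its user agent alone. For example, the following hypothetical rule (the /drafts/ path is an illustration only, not a Veeva requirement) allows the crawler everywhere except that directory:
User-agent: VeevaWeb2PDFCrawler
Disallow: /drafts/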
The Summary Report shows common errors encountered by Veeva Web2PDF while crawling your website. The following table gives more information about these errors.
Error Type | Description |
---|---|
Duplicate Page | A duplicate page has been detected. The original page’s URL and duplicate URL are shown. This may happen when a page can be accessed on your site from multiple URLs. |
Authentication | The specified page requires a login, and either no credentials or incorrect credentials were provided. Verify that you have provided the correct credentials for your website. |
External Link | The specified link is outside the parent path (sub-directory) and will not be crawled. For example, https://www.veeva.com/products/content-management/vault-promomats/ is specified as the starting page and a link is found to https://www.veeva.com/contact-us/. This will not be crawled as it is not within the starting path. Try running Veeva Web2PDF in a higher directory (ex: https://veeva.com). |
External Reference | The specified link is outside of the specified domain and will not be crawled. For example, your website has a link to the FDA’s website. |
File Reference | The specified link references a file. Files will not be captured or included by Veeva Web2PDF. Please add them separately. |
Contact Information | An email address or phone number was found on your website. |
Broken Link | The specified link references an inaccessible page. Please check the link address and your hosting settings. |
Page Timeout | The specified page could not be loaded within 30 seconds. Please check the page and your hosting settings. |
Page Limit Reached | Veeva Web2PDF has reached its 1,000-page limit. These links were found on your site but were not crawled. |
Job Timeout | Veeva Web2PDF has reached its 1-hour timeout. These links were found on your site but were not crawled. |
Malformed URL | An invalid URL has been detected. |
Robots Not Allowed | The robots.txt file prevents the Veeva Web2PDF crawler from accessing this page. See the robots.txt guidance above; a quick way to check a specific URL is sketched below this table. |
Failed to Navigate | Veeva Web2PDF was unable to access and crawl this portion of your website. |
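If you see Robots Not Allowed errors and want to confirm how your robots.txt file is being interpreted, a quick check is sketched below using Python's standard urllib.robotparser module. The site and page URLs are placeholders, and this is only an illustration, not a description of how Veeva Web2PDF itself evaluates robots.txt.

```python
# Minimal sketch: check whether a page is blocked for the VeevaWeb2PDFCrawler
# user agent by your robots.txt. The URLs below are placeholders.
from urllib import robotparser

SITE = "https://www.example.com"        # placeholder: your site's root URL
PAGE = SITE + "/products/overview/"     # placeholder: a page reported as blocked

rp = robotparser.RobotFileParser()
rp.set_url(SITE + "/robots.txt")
rp.read()  # fetch and parse the live robots.txt file

if rp.can_fetch("VeevaWeb2PDFCrawler", PAGE):
    print("Allowed: robots.txt permits the crawler to access this page.")
else:
    print("Blocked: adjust robots.txt if you want this page captured.")
```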
You can also contact us with questions.