How to create a sitemap.xml using GSiteCrawler
Honestly, I had been looking for the perfect tool or site to generate a sitemap (.xml) without any limitation such as a cap on the number of pages. I came across plenty of free sitemap generators through Google, but I couldn't find one that wasn't limited to a certain number of pages. After days of searching, I finally found a desktop application for generating sitemaps, which means you have to install it locally first. Once you have installed GSiteCrawler, you can run it without any limitations, with extra optional features not offered by the other sites out there.
Visit the GSiteCrawler site at the following link: http://gsitecrawler.com/, then download the software and install it on your PC. Once you have successfully done that, double-click the software to run it as usual.
1. The following screen is the initial page you will normally encounter once you launch the software.
2. Click on "Add new project" once you are ready to generate a sitemap with your website. Ensure you have to include the http:// (or https://) prefix and ends with a trailing slash (/).
3. GSiteCrawler will automatically detect whether your site is hosted on a Linux/Unix server or a Windows server, and from that it determines whether the URLs are case-sensitive. Feel free to adjust the other settings as you see fit, but I would leave them at their defaults; including the default filters for session IDs is strongly recommended.
4. If you want the sitemap to be uploaded automatically to your server via FTP once it is generated, you can set that up right here. There's absolutely nothing wrong with doing it manually instead, if that's more convenient for you; a minimal script for the manual route is sketched below.
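For the manual route, any FTP client works, but here is a minimal Python sketch using the standard-library ftplib. The host, credentials, and remote directory are placeholders you would replace with your own; this is just an illustration, not something GSiteCrawler requires.

```python
from ftplib import FTP

# All connection details below are placeholders -- substitute your own.
FTP_HOST = "ftp.example.com"
FTP_USER = "username"
FTP_PASS = "password"

with FTP(FTP_HOST) as ftp:
    ftp.login(FTP_USER, FTP_PASS)
    ftp.cwd("/public_html")  # assumed web root; adjust for your host
    with open("sitemap.xml", "rb") as f:
        # STOR uploads the local file under the given remote name
        ftp.storbinary("STOR sitemap.xml", f)
    print("sitemap.xml uploaded")
```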
5. It has three extra optional features that I consider essential:
- For security purposes, you shouldn't include administration pages or any other sensitive pages in the sitemap file, because you don't want your admin pages indexed by the search engines. GSiteCrawler therefore reads your robots.txt and skips the URLs listed in that file (see the first sketch after this list).
- It's common for servers nowadays to have custom "file-not-found" pages, which can make it hard for a search engine, or even GSiteCrawler, to tell whether a page really exists. Enabling this option checks for that issue (see the second sketch after this list).
- If your website has been up for a while, Google may already know about or have indexed a number of pages on your site. GSiteCrawler can check with Google first, which makes the crawl faster since it skips work that has already been done.
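To illustrate the robots.txt feature, here is a minimal Python sketch using the standard library's urllib.robotparser. The domain and paths are made up, and this is only a conceptual sketch of the technique, not GSiteCrawler's actual code:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site; GSiteCrawler performs an equivalent check internally.
rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()  # fetch and parse the robots.txt file

for path in ["/index.html", "/admin/login.php"]:
    url = "http://www.example.com" + path
    if rp.can_fetch("*", url):
        print("include in sitemap:", url)
    else:
        print("skip (disallowed):", url)
```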
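The "file-not-found" check essentially boils down to requesting a URL that almost certainly doesn't exist and seeing whether the server answers with a real 404. A rough sketch of that idea, again with a made-up domain:

```python
import uuid
from urllib.request import urlopen
from urllib.error import HTTPError

# Probe a URL that almost certainly doesn't exist on the (hypothetical) site.
probe = "http://www.example.com/" + uuid.uuid4().hex

try:
    with urlopen(probe) as resp:
        # A 200 response here means the server serves a custom
        # "not found" page instead of a real 404 (a "soft 404").
        print("soft 404: server returned", resp.status, "for a missing page")
except HTTPError as e:
    if e.code == 404:
        print("server returns proper 404s; missing pages are detectable")
    else:
        print("unexpected status:", e.code)
```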
6. Now the crawler is running. You can manually set the number of crawlers running against your site so that you don't overload your server; a rough sketch of what that setting controls follows below. You can pause the crawler at any time and continue later. If you want to abort the crawl, pause the crawler and clear the total queue to start all over again.
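The "number of crawlers" setting is just a cap on how many pages are fetched in parallel. As a rough illustration of the concept (not GSiteCrawler's code, and with made-up URLs), here is how a crawler might bound its worker count:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = ["http://www.example.com/page%d.html" % i for i in range(10)]  # made up

def fetch(url):
    time.sleep(1)  # small pause so the target server isn't hammered
    try:
        with urlopen(url) as resp:
            return url, resp.status
    except Exception as exc:  # made-up URLs will 404; report instead of crashing
        return url, repr(exc)

# max_workers caps how many pages are fetched at once, which is
# what the "number of crawlers" knob in the GUI controls.
with ThreadPoolExecutor(max_workers=2) as pool:
    for url, result in pool.map(fetch, URLS):
        print(result, url)
```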
7. After all the crawlers have stopped and your site has been fully crawled, click on Generate and choose either the Google sitemap file or the Yahoo URL list. Afterwards, upload the sitemap file manually through FTP software such as FileZilla.
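For reference, the "Google sitemap file" option produces XML in the standard sitemaps.org format. Here is a minimal sketch, with made-up URLs and dates, that builds an equivalent file by hand so you can compare it with GSiteCrawler's real output:

```python
# Minimal sketch of the sitemaps.org format that the "Google sitemap file"
# option produces; the URLs, dates, and priorities are made up.
entries = [
    ("http://www.example.com/", "2010-01-01", "1.0"),
    ("http://www.example.com/about.html", "2010-01-01", "0.5"),
]

lines = ['<?xml version="1.0" encoding="UTF-8"?>',
         '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
for loc, lastmod, priority in entries:
    lines += ["  <url>",
              f"    <loc>{loc}</loc>",
              f"    <lastmod>{lastmod}</lastmod>",
              f"    <priority>{priority}</priority>",
              "  </url>"]
lines.append("</urlset>")

with open("sitemap.xml", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))
```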
Note: You shouldn't use GSiteCrawler if you are on Drupal or WordPress. Drupal has its own sitemap generator, the XML Sitemap module, which is pretty easy and straightforward. When I initially tried GSiteCrawler on my Drupal site, it included unnecessary links by default, such as the login destination, Quicktabs module pages, etc. Believe it or not, my site with just over 300 links ballooned to a large 10,000 links in GSiteCrawler because the Quicktabs module was installed.