Extract Website Content (EWC) was originally an internal tool we developed at Cactix. We later made it publicly available for free, but discontinued the service in mid-2019 because we could no longer support it. We are now releasing the code to the public at https://github.com/cactix/extractwebsitecontent for anyone who wishes to use the tool, or even host it and offer it as a service. As far as we know it is free of bugs, but the core crawler functionality could use some enhancements.
EWC extracts written content in blocks from an entire website or a single page, saves the extracted content to an Excel file, and sends that file by email once done. It is an ideal tool for translators who wish to extract and organize source content and find out the word count. It can also be helpful for marketing agencies that provide content and search engine optimization services.
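To give a rough feel for what "extracting content in blocks" and counting words can look like, here is a minimal Ruby sketch. It is not taken from the EWC code base: the helper names `extract_blocks` and `word_count`, the list of block tags, and the regex-based parsing are all assumptions made for illustration. A real crawler would likely use a proper HTML parser and would write the rows to an Excel file instead of printing them.

```ruby
# Illustrative sketch only; NOT the EWC implementation.
# extract_blocks, word_count and BLOCK_TAGS are hypothetical names.

# Tags treated as "content blocks" in this sketch (an assumption).
BLOCK_TAGS = %w[h1 h2 h3 h4 h5 h6 p li].freeze

# Returns the visible text of each block-level element, in document order.
# Uses a crude regex, so nested block tags are not handled correctly.
def extract_blocks(html)
  pattern = /<(#{BLOCK_TAGS.join('|')})\b[^>]*>(.*?)<\/\1>/mi
  blocks = html.scan(pattern).map do |_tag, inner|
    # Drop inline tags such as <strong>, then collapse whitespace.
    inner.gsub(/<[^>]+>/, ' ').gsub(/\s+/, ' ').strip
  end
  blocks.reject(&:empty?)
end

# Counts whitespace-separated words in a block of text.
def word_count(text)
  text.split.size
end

html = <<~HTML
  <h1>Welcome to our site</h1>
  <p>We build <strong>great</strong> software.</p>
  <ul><li>Fast</li><li>Reliable and simple</li></ul>
HTML

# One row per block: the text plus its word count.
rows = extract_blocks(html).map { |text| { text: text, words: word_count(text) } }
rows.each { |row| puts "#{row[:words]}\t#{row[:text]}" }
puts "Total words: #{rows.sum { |row| row[:words] }}"
```

Keeping one row per block, with its own word count, mirrors the kind of spreadsheet a translator would want: each segment can be reviewed separately and the total is just the sum of the column.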
EWC is built with Ruby on Rails and uses PostgreSQL as its database and Mailgun for transactional email. You can access the repository at https://github.com/cactix/extractwebsitecontent