Toll Free: 1-877-695-7388

GTA: (647) 699-2838

(Almost) Automatically Reveal A Site’s Directory Structure Using Excel

Ruud Hein | October 8th, 2010

I often analyze sites with thousands of pages... per folder.

Large sites.

With nested directories.

Just to make it, you know, more complicated... er, fun.

Although I might have a pretty good mental image of how the site looks, seeing and showing how it is actually structured is much better.

Here's how you can do that with Excel 2007, and of course it works for small(er) sites too.

Get A List Of URLs

Crawl the site, download its XML sitemap, scrape Google: do anything to get as many URLs covering as much ground as you can.

Don't forget that you can import or paste almost anything into Excel and it will make (a lot of) sense of it.

For the purpose of this post I've downloaded the sitemap of Allied American University.
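
If you'd rather pull the URLs out of the sitemap with a script before pasting them into Excel, here is a minimal Python sketch. The sample sitemap, the example.com URLs and the function name `urls_from_sitemap` are all made up for illustration; only the namespace comes from the standard sitemap format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample sitemap; a real one would be downloaded from the site.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/academics/programs/business.aspx</loc></url>
  <url><loc>http://www.example.com/admissions/apply.aspx</loc></url>
</urlset>"""

# Standard sitemap files put their elements in this XML namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> URL found in a sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# One URL per line, ready to paste into a single spreadsheet column.
for url in urls_from_sitemap(SITEMAP_XML):
    print(url)
```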

Split Into Folders

Select the column with the URLs.

On the Data tab, select Text to Columns.

This brings up the first screen of the Convert Text to Columns Wizard.

Make sure Delimited is selected and click Next.

Select Other as the delimiter and enter a forward slash (/), the character that separates folders in a URL. Click Finish.

Click OK.

You'll now have several columns, some containing directories, some containing filenames.
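
The same split can be sketched in Python, for those who want to check the result outside Excel. This is only an approximation of Text to Columns: it assumes you want the host name in the first column and drops the `http:` piece and the empty cells that a plain split on `/` would produce. The function name `split_into_columns` and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit


def split_into_columns(url):
    """Return the host followed by each path segment, like one spreadsheet row."""
    parts = urlsplit(url)
    # Splitting the path on "/" leaves empty strings at the edges; skip them.
    segments = [segment for segment in parts.path.split("/") if segment]
    return [parts.netloc] + segments


row = split_into_columns("http://www.example.com/academics/programs/business.aspx")
print(row)
```

Each item in the returned list corresponds to one column: directories first, with the filename (if any) in the last position.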

Remove Filenames

Select a column and click Filter.

Select Text Filters > Ends With.

In this example we're filtering for filenames ending in .aspx.

Select and delete the filtered results in that column.

Repeat for as many columns as you have.
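
As a rough script equivalent of the filter-and-delete step, the sketch below drops every cell that ends with a given extension. It assumes filenames can be recognized by extension alone, as in the .aspx example; the function name `drop_filenames` and the sample row are made up.

```python
def drop_filenames(row, extensions=(".aspx",)):
    """Return the row without cells that end in one of the given extensions."""
    # str.endswith accepts a tuple, so several extensions can be checked at once.
    return [cell for cell in row if not cell.lower().endswith(extensions)]


row = ["www.example.com", "academics", "programs", "business.aspx"]
print(drop_filenames(row))
```

For a site that mixes file types, pass more extensions, for example `extensions=(".aspx", ".html", ".pdf")`.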

Remove Duplicates

View #1: To get a view of the site structure, select all columns and click Remove Duplicates.

View #2: Or, to get a view of just the different levels/folders, select one column at a time and, when you click Remove Duplicates, don't expand the selection: opt to continue with the current selection.

Click Remove Duplicates, confirm your selection and Excel leaves you with unique directory names only.

For view #1 you're done at this point. For view #2, repeat for as many columns as you have.
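
To make the difference between the two views concrete, here is a small Python sketch of both. It assumes each row is a list of folder names with the filenames already removed; the function names and the sample rows are hypothetical, and like Remove Duplicates it keeps the first occurrence of each value.

```python
def unique_rows(rows):
    """View #1: each distinct directory path once, first occurrence kept."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def unique_per_level(rows):
    """View #2: the distinct folder names found at each depth, column by column."""
    levels = []
    for row in rows:
        for depth, name in enumerate(row):
            if depth == len(levels):
                levels.append([])  # first time this depth is seen
            if name not in levels[depth]:
                levels[depth].append(name)
    return levels


# Made-up rows, as they might look after splitting and removing filenames.
rows = [
    ["www.example.com", "academics", "programs"],
    ["www.example.com", "academics", "programs"],
    ["www.example.com", "admissions"],
    ["www.example.com", "academics", "faculty"],
]

print(unique_rows(rows))
print(unique_per_level(rows))
```

Note that view #2 treats every column independently, so a name in one column no longer lines up with its parent folder in the column before it.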

The Results

View #1

A clear view of the site structure. You can apply various sorts or filters at this point.

View #2

With the columns sorted alphabetically for cleaner presentation and sized for readability, the end result is a clean overview of the different levels of the site.

View #2 can give you a better idea of the many different levels there are to a site. It's a great view for large sites like directories, as a lot of the permutations are removed.

View #1 on the other hand is more true to the actual site structure.

Posted in: Social Media Marketing
Tagged: excel, IA, sitemap, tool

About the Author: Ruud Hein

My paid passion at Search Engine People sees me applying my passions and knowledge to a wide array of problems, ones I usually experience as challenges. People who know me know I love coffee.

4 thoughts on “(Almost) Automatically Reveal A Site’s Directory Structure Using Excel”

  1. Nick Stewart says:
    October 10, 2010 at 9:14 am

    Hey this is pretty cool! I’m pretty good with Excel but I never thought of doing this. This is especially helpful when you’re trying to display the directory structure of your company’s website for a presentation.

    —
    Nick, The Traffic Guy

  2. Gemma says:
    October 11, 2010 at 2:13 am

    This method is great for seeing the overall structure of a site, and useful for understanding how page rank should flow throughout it.

  3. Carl Henry says:
    August 21, 2012 at 12:39 am

    Hi Ruud!

    Thank you very much for sharing this idea. Just like one of your posters above, I have been an Excel user for many, many, many years but learn something new it can do every day.

    Enjoy your coffee 😉

    Best wishes,
    Carl
