How do I extract consecutive capitalized words from a body of text in Microsoft Excel?

Posted by:

How do I extract consecutive capitalized words from a body of text in Microsoft Excel?

How do I extract consecutive capitalized words from a body of text in Microsoft Excel? If you have Excel, most probably you also have Word. The “Find and Replace” windows in Excel does not have anywhere near the power of that available in Word. Nowhere near. And IMO the problem you have posed is easier to handle using the Find and Replace tools in Word. Especially if you only have to do it once (or even a few time, for yourself) and don’t want to spend the time to get VBA right, or use regular expressions. (I personally love regular expressions and have upvoted Brad Yundt’s answer.)

I’ll show you step by step with pictures too. I’m using Word 2010.

Here’s my test data

How do I extract consecutive capitalized words

You see I have put in some test data to show what it will or will not work on.

Then select the table and in the Table Tools section which appears, select Layout and the choose Convert to Text. (Separate text with Paragraph Marks. Convert nested tables does nothing-there are none here.)How do I extract consecutive capitalized words

Now you have:

How do I extract consecutive capitalized words

Notice that Show/Hide ¶ is turned on so you can see the formatting information such as the paragraph mark.

Now select all the text to be analyzed. Then click on the Replace button in the Home ribbon to open the Find/Replace dialog.

In the Find What box enter

*(<[A-Z]@> <[A-Z]@>)*^013

This says to find

How do I extract consecutive capitalized words

And in the Replace with box enter:

\1^p

This mean:

How do I extract consecutive capitalized words

We must click “More>>” and turn on “Use wildcards.”

Then select the text to analyze and click Replace All.

How do I extract consecutive capitalized words

Which produces

How do I extract consecutive capitalized words

So one must be careful of the upper case A starting a sentence as in #2, or the 3 successive upper case words in #4. It would be possible to allow for “words” which are, for example, tags, which have a number in them. Just change the [A-Z] to [A-Z0-9] in both places, or where needed.

So, one can make use of the more powerful Find/Replace tool in Word. After performing these operations just copy the results back to Excel if needed.

I thought this would actually be simpler, but a naïve creation of the Find expression didn’t do what I expected. I originally left what I copied from Excel as a table. Find/Replace did not handle that well. This was in Excel 2010. And the wildcards in this do not allow to find the end of a line (¶). Table structure seemed to confuse it. Then a web search finally showed me that I could use the “^013” for that. In the Replace with one can put the usual “^p” for the end of a paragraph. (https://answers.microsoft.com/en…)

Now actually, if one knows Linux commands or the Windows version of the vi editor, gvim, doing something such as this is equally straightforward. But this was an attempt to stay in the Microsoft Office environment.

0

Add a Comment