Combine Rows Based on Capitalization

D

dogplayingpoker

I have text that looks like this:

###

FOUR SCORE and seven years ago
our fathers brought forth on
this continent
A NEW NATION conceived
in liberty and dedicated
to the proposition
that all men are created equal
NOW WE ARE ENGAGED in a great civil war
testing whether that nation
OR ANY NATION
so conceived and so dedicated
can long endure
WE ARE MET ON A GREAT battle-field
of that war

###

There is no regularity to the number of lower-cased rows between th
capitalized rows, nor is there any regularity to how many letters ar
capitalized at the beginning of the capitalized rows. I want the outpu
to look like this, so that Excel combines rows, adding a space at th
front and starting a new row when it hits a capital letter:

###

FOUR SCORE and seven years ago our fathers brought forth on thi
continent
A NEW NATION conceived in liberty and dedicated to the proposition tha
all men are created equal
NOW WE ARE ENGAGED in a great civil war testing whether that nation
OR ANY NATION so conceived and so dedicated can long endure
WE ARE MET ON A GREAT battle-field of that war

###

And to make things even more complicated, sometimes the lower-cased row
actually start with a single capitalized letter. So really, I want it t
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance
 
D

Don Guillett

I have text that looks like this:

###

FOUR SCORE and seven years ago
our fathers brought forth on
this continent
A NEW NATION conceived
in liberty and dedicated
to the proposition
that all men are created equal
NOW WE ARE ENGAGED in a great civil war
testing whether that nation
OR ANY NATION
so conceived and so dedicated
can long endure
WE ARE MET ON A GREAT battle-field
of that war

###

There is no regularity to the number of lower-cased rows between the
capitalized rows, nor is there any regularity to how many letters are
capitalized at the beginning of the capitalized rows. I want the output
to look like this, so that Excel combines rows, adding a space at the
front and starting a new row when it hits a capital letter:

###

FOUR SCORE and seven years ago our fathers brought forth on this
continent
A NEW NATION conceived in liberty and dedicated to the proposition that
all men are created equal
NOW WE ARE ENGAGED in a great civil war testing whether that nation
OR ANY NATION so conceived and so dedicated can long endure
WE ARE MET ON A GREAT battle-field of that war

###

And to make things even more complicated, sometimes the lower-cased rows
actually start with a single capitalized letter. So really, I want it to
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance.

Sub insertrowiftwocapletters()
Dim i As Long
For i = Cells(Rows.Count, 1).End(xlUp).Row To 1 Step -1
If Left(Cells(i, 1), 2) = UCase(Left(Cells(i, 1), 2)) Then
Cells(i, 1) = " " & Cells(i, 1)
Rows(i).Insert
'MsgBox i
End If
Next i
End Sub
 
D

dogplayingpoker

'Don Guillett[_2_ said:
;1600153']On Friday, March 23, 2012 10:14:32 PM UTC-5, dogplayingpoke
wrote:-
I have text that looks like this:

###

FOUR SCORE and seven years ago
our fathers brought forth on
this continent
A NEW NATION conceived
in liberty and dedicated
to the proposition
that all men are created equal
NOW WE ARE ENGAGED in a great civil war
testing whether that nation
OR ANY NATION
so conceived and so dedicated
can long endure
WE ARE MET ON A GREAT battle-field
of that war

###

There is no regularity to the number of lower-cased rows between the
capitalized rows, nor is there any regularity to how many letters are
capitalized at the beginning of the capitalized rows. I want th output
to look like this, so that Excel combines rows, adding a space at the
front and starting a new row when it hits a capital letter:

###

FOUR SCORE and seven years ago our fathers brought forth on this
continent
A NEW NATION conceived in liberty and dedicated to the propositio that
all men are created equal
NOW WE ARE ENGAGED in a great civil war testing whether that nation
OR ANY NATION so conceived and so dedicated can long endure
WE ARE MET ON A GREAT battle-field of that war

###

And to make things even more complicated, sometimes the lower-case rows
actually start with a single capitalized letter. So really, I want i to
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance.

Sub insertrowiftwocapletters()
Dim i As Long
For i = Cells(Rows.Count, 1).End(xlUp).Row To 1 Step -1
If Left(Cells(i, 1), 2) = UCase(Left(Cells(i, 1), 2)) Then
Cells(i, 1) = " " & Cells(i, 1)
Rows(i).Insert
'MsgBox i
End If
Next i
End Sub

Thanks! This did part of what I want, but its not quite there. There i
now an empty row before each string of two or more capitalized letters
but it didn't combine the rows. I'd like the output to be such tha
every single row starts with a string of the all caps text. Right now i
shows

AAAAAAAAbbbbbbbbbccccccc
ccccccccccccccvvvvvvvvvv

BBBBBBBBccccffffffgghjjkkkk
ggggggggggggggggggggg
ddddddddddddddddd
eeeeeeeeeee

CCCCCCCCCgggggghhhhhh
ffffffffgfgfg

And I'd like it to look like:

AAAAAAAAbbbbbbbbbcccccccccccccccccccccvvvvvvvvvv
BBBBBBBBccccffffffgghjjkkkkgggggggggggggggggggggdddddddddddddddddeeeeeeeeeee
CCCCCCCCCgggggghhhhhhffffffffgfgf
 
R

Ron Rosenfeld

And to make things even more complicated, sometimes the lower-cased rows
actually start with a single capitalized letter. So really, I want it to
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance.

Your request can do with some details.

For example, you write that you want to start a new row only when you have more than TWO capitalized letters in a row, yet your line that starts with:

A NEW NATION conceived in li

only has one capitalized letter followed by a space.

It is also not clear where you want the results to go, or where your source is.

I have made some assumptions:

Your source is in column A, starting with row 1
Your results will go into column B, starting with row 1
You will start a new line if it starts with two consecutive capital letters or a pattern of Capital<space>Capital

There are some other limits in the code, having to do with size, but I doubt they will cause an issue, given the information you have provided.

To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this Macro (Sub), <alt-F8> opens the macro dialog box. Select the macro by name, and <RUN>.

============================================
Option Explicit
Sub SplitTextAtCaps()
Dim rg As Range, c As Range
Dim rDest As Range
Dim s As String
Dim v As Variant, vSrc As Variant
Dim re As Object
Set rg = Range("A1", Cells(Rows.Count, "A").End(xlUp))
Set rDest = rg(1, 1).Offset(columnoffset:=1)
vSrc = rg
For Each v In vSrc
s = s & vbLf & v
Next v
s = Mid(s, 2)

Set re = CreateObject("vbscript.regexp")
With re
.Global = True
.Pattern = vbLf & "(?![A-Z]\s?[A-Z])"
End With

s = re.Replace(s, " ")
v = Split(s, vbLf)

rDest.EntireColumn.Clear
Set rDest = rDest.Resize(rowsize:=UBound(v) + 1)
rDest = WorksheetFunction.Transpose(v)

End Sub
==================================
 
D

Don Guillett

I have text that looks like this:

###

FOUR SCORE and seven years ago
our fathers brought forth on
this continent
A NEW NATION conceived
in liberty and dedicated
to the proposition
that all men are created equal
NOW WE ARE ENGAGED in a great civil war
testing whether that nation
OR ANY NATION
so conceived and so dedicated
can long endure
WE ARE MET ON A GREAT battle-field
of that war

###

There is no regularity to the number of lower-cased rows between the
capitalized rows, nor is there any regularity to how many letters are
capitalized at the beginning of the capitalized rows. I want the output
to look like this, so that Excel combines rows, adding a space at the
front and starting a new row when it hits a capital letter:

###

FOUR SCORE and seven years ago our fathers brought forth on this
continent
A NEW NATION conceived in liberty and dedicated to the proposition that
all men are created equal
NOW WE ARE ENGAGED in a great civil war testing whether that nation
OR ANY NATION so conceived and so dedicated can long endure
WE ARE MET ON A GREAT battle-field of that war

###

And to make things even more complicated, sometimes the lower-cased rows
actually start with a single capitalized letter. So really, I want it to
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance.

Actually, the A as the first didn't pose a problem with the data presented using my macro for the same column.
 
D

dogplayingpoker

'Ron Rosenfeld[_2_ said:
;1600164']On Sat, 24 Mar 2012 03:14:32 +0000, dogplayingpoke
And to make things even more complicated, sometimes the lower-case rows
actually start with a single capitalized letter. So really, I want i to
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance.
-

Your request can do with some details.

For example, you write that you want to start a new row only when yo
have more than TWO capitalized letters in a row, yet your line tha
starts with:

A NEW NATION conceived in li

only has one capitalized letter followed by a space.

It is also not clear where you want the results to go, or where you
source is.

I have made some assumptions:

Your source is in column A, starting with row 1
Your results will go into column B, starting with row 1
You will start a new line if it starts with two consecutive capita
letters or a pattern of Capital<space>Capital

There are some other limits in the code, having to do with size, but
doubt they will cause an issue, given the information you hav
provided.

To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this Macro (Sub), <alt-F8> opens the macro dialog box. Select th
macro by name, and <RUN>.

============================================
Option Explicit
Sub SplitTextAtCaps()
Dim rg As Range, c As Range
Dim rDest As Range
Dim s As String
Dim v As Variant, vSrc As Variant
Dim re As Object
Set rg = Range("A1", Cells(Rows.Count, "A").End(xlUp))
Set rDest = rg(1, 1).Offset(columnoffset:=1)
vSrc = rg
For Each v In vSrc
s = s & vbLf & v
Next v
s = Mid(s, 2)

Set re = CreateObject("vbscript.regexp")
With re
.Global = True
.Pattern = vbLf & "(?![A-Z]\s?[A-Z])"
End With

s = re.Replace(s, " ")
v = Split(s, vbLf)

rDest.EntireColumn.Clear
Set rDest = rDest.Resize(rowsize:=UBound(v) + 1)
rDest = WorksheetFunction.Transpose(v)

End Sub
==================================
Worked perfectly. Thank you very very much
 
G

Gord Dibben

Well done Ron


Gord


'Ron Rosenfeld[_2_ said:
;1600164']On Sat, 24 Mar 2012 03:14:32 +0000, dogplayingpoker
And to make things even more complicated, sometimes the lower-cased rows
actually start with a single capitalized letter. So really, I want it to
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance.
-

Your request can do with some details.

For example, you write that you want to start a new row only when you
have more than TWO capitalized letters in a row, yet your line that
starts with:

A NEW NATION conceived in li

only has one capitalized letter followed by a space.

It is also not clear where you want the results to go, or where your
source is.

I have made some assumptions:

Your source is in column A, starting with row 1
Your results will go into column B, starting with row 1
You will start a new line if it starts with two consecutive capital
letters or a pattern of Capital<space>Capital

There are some other limits in the code, having to do with size, but I
doubt they will cause an issue, given the information you have
provided.

To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this Macro (Sub), <alt-F8> opens the macro dialog box. Select the
macro by name, and <RUN>.

============================================
Option Explicit
Sub SplitTextAtCaps()
Dim rg As Range, c As Range
Dim rDest As Range
Dim s As String
Dim v As Variant, vSrc As Variant
Dim re As Object
Set rg = Range("A1", Cells(Rows.Count, "A").End(xlUp))
Set rDest = rg(1, 1).Offset(columnoffset:=1)
vSrc = rg
For Each v In vSrc
s = s & vbLf & v
Next v
s = Mid(s, 2)

Set re = CreateObject("vbscript.regexp")
With re
.Global = True
.Pattern = vbLf & "(?![A-Z]\s?[A-Z])"
End With

s = re.Replace(s, " ")
v = Split(s, vbLf)

rDest.EntireColumn.Clear
Set rDest = rDest.Resize(rowsize:=UBound(v) + 1)
rDest = WorksheetFunction.Transpose(v)

End Sub
==================================
Worked perfectly. Thank you very very much.
 
R

Ron Rosenfeld

Actually, mine worked just fine with the sample data.

Don,

On my system (W7x64, Excel 2007), copy/pasting the data and your macro, your routine
Inserted a new row prior to any row starting with a capitalized string of words (including prior to the first row)
Did NOT combine the other rows.

So the result I see, after running your macro, which hopefully won't be messed up by the newsreaders:

======================

FOUR SCORE and seven years ago
our fathers brought forth on
this continent

A NEW NATION conceived
in liberty and dedicated
to the proposition
that all men are created equal

NOW WE ARE ENGAGED in a great civil war
testing whether that nation

OR ANY NATION
so conceived and so dedicated
can long endure

WE ARE MET ON A GREAT battle-field
of that war
==========================

In interpreted that what the OP wanted was also to combine the other rows, resulting in:

===============================
FOUR SCORE and seven years ago our fathers brought forth on this continent
A NEW NATION conceived in liberty and dedicated to the proposition that all men are created equal
NOW WE ARE ENGAGED in a great civil war testing whether that nation
OR ANY NATION so conceived and so dedicated can long endure
WE ARE MET ON A GREAT battle-field of that war
=========================================
 
D

Don Guillett

I have text that looks like this:

###

FOUR SCORE and seven years ago
our fathers brought forth on
this continent
A NEW NATION conceived
in liberty and dedicated
to the proposition
that all men are created equal
NOW WE ARE ENGAGED in a great civil war
testing whether that nation
OR ANY NATION
so conceived and so dedicated
can long endure
WE ARE MET ON A GREAT battle-field
of that war

###

There is no regularity to the number of lower-cased rows between the
capitalized rows, nor is there any regularity to how many letters are
capitalized at the beginning of the capitalized rows. I want the output
to look like this, so that Excel combines rows, adding a space at the
front and starting a new row when it hits a capital letter:

###

FOUR SCORE and seven years ago our fathers brought forth on this
continent
A NEW NATION conceived in liberty and dedicated to the proposition that
all men are created equal
NOW WE ARE ENGAGED in a great civil war testing whether that nation
OR ANY NATION so conceived and so dedicated can long endure
WE ARE MET ON A GREAT battle-field of that war

###

And to make things even more complicated, sometimes the lower-cased rows
actually start with a single capitalized letter. So really, I want it to
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance.



I have text that looks like this:

###

FOUR SCORE and seven years ago
our fathers brought forth on
this continent
A NEW NATION conceived
in liberty and dedicated
to the proposition
that all men are created equal
NOW WE ARE ENGAGED in a great civil war
testing whether that nation
OR ANY NATION
so conceived and so dedicated
can long endure
WE ARE MET ON A GREAT battle-field
of that war

###

There is no regularity to the number of lower-cased rows between the
capitalized rows, nor is there any regularity to how many letters are
capitalized at the beginning of the capitalized rows. I want the output
to look like this, so that Excel combines rows, adding a space at the
front and starting a new row when it hits a capital letter:

###

FOUR SCORE and seven years ago our fathers brought forth on this
continent
A NEW NATION conceived in liberty and dedicated to the proposition that
all men are created equal
NOW WE ARE ENGAGED in a great civil war testing whether that nation
OR ANY NATION so conceived and so dedicated can long endure
WE ARE MET ON A GREAT battle-field of that war

###

And to make things even more complicated, sometimes the lower-cased rows
actually start with a single capitalized letter. So really, I want it to
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance.

Ron, You are absolutely correct. I read it wrong. So, still keeping it simple, how about this.

Sub iftwocaplettersbringuprow()
'assumes column A Correct wordwrap if necessary
Dim i As Long
For i = Cells(Rows.Count, 1).End(xlUp).Row To 1 Step -1
If Left(Cells(i, 1), 2) <> UCase(Left(Cells(i, 1), 2)) Then
Cells(i - 1, 1) = Cells(i - 1, 1) & " " & Cells(i, 1)
Rows(i).Delete
End If
Next i
Columns(1).AutoFit
End Sub
 
D

Don Guillett

I have text that looks like this:

###

FOUR SCORE and seven years ago
our fathers brought forth on
this continent
A NEW NATION conceived
in liberty and dedicated
to the proposition
that all men are created equal
NOW WE ARE ENGAGED in a great civil war
testing whether that nation
OR ANY NATION
so conceived and so dedicated
can long endure
WE ARE MET ON A GREAT battle-field
of that war

###

There is no regularity to the number of lower-cased rows between the
capitalized rows, nor is there any regularity to how many letters are
capitalized at the beginning of the capitalized rows. I want the output
to look like this, so that Excel combines rows, adding a space at the
front and starting a new row when it hits a capital letter:

###

FOUR SCORE and seven years ago our fathers brought forth on this
continent
A NEW NATION conceived in liberty and dedicated to the proposition that
all men are created equal
NOW WE ARE ENGAGED in a great civil war testing whether that nation
OR ANY NATION so conceived and so dedicated can long endure
WE ARE MET ON A GREAT battle-field of that war

###

And to make things even more complicated, sometimes the lower-cased rows
actually start with a single capitalized letter. So really, I want it to
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance.
 
R

Ron Rosenfeld

Ron, You are absolutely correct. I read it wrong.

Something I do all to frequently.
So, still keeping it simple, how about this.

Sub iftwocaplettersbringuprow()
'assumes column A Correct wordwrap if necessary
Dim i As Long
For i = Cells(Rows.Count, 1).End(xlUp).Row To 1 Step -1
If Left(Cells(i, 1), 2) <> UCase(Left(Cells(i, 1), 2)) Then
Cells(i - 1, 1) = Cells(i - 1, 1) & " " & Cells(i, 1)
Rows(i).Delete
End If
Next i
Columns(1).AutoFit
End Sub

That seems to work properly here, Don.

Comment:
Because of an exposure to a rather large dataset, 500,000 rows, I've been trying to use procedures where I do most of the "work" within the macro, rather than repeatedly accessing the worksheet. After setting the range, I read it all into a variant variable, e.g. v = range("TheSourceRange") And then do the work within the array; finally writing the array back to the range, e.g. range("TheDestinationRange") = v. It was much faster on the larger DB compared with processing each row one at a time.

There are some caveats with the technique, as some care needs to be taken if TheSourceRange might be just a single cell; or if the data needs to be transposed and there are more than 2^16 elements in a 1D array.
 
D

Don Guillett

I have text that looks like this:

###

FOUR SCORE and seven years ago
our fathers brought forth on
this continent
A NEW NATION conceived
in liberty and dedicated
to the proposition
that all men are created equal
NOW WE ARE ENGAGED in a great civil war
testing whether that nation
OR ANY NATION
so conceived and so dedicated
can long endure
WE ARE MET ON A GREAT battle-field
of that war

###

There is no regularity to the number of lower-cased rows between the
capitalized rows, nor is there any regularity to how many letters are
capitalized at the beginning of the capitalized rows. I want the output
to look like this, so that Excel combines rows, adding a space at the
front and starting a new row when it hits a capital letter:

###

FOUR SCORE and seven years ago our fathers brought forth on this
continent
A NEW NATION conceived in liberty and dedicated to the proposition that
all men are created equal
NOW WE ARE ENGAGED in a great civil war testing whether that nation
OR ANY NATION so conceived and so dedicated can long endure
WE ARE MET ON A GREAT battle-field of that war

###

And to make things even more complicated, sometimes the lower-cased rows
actually start with a single capitalized letter. So really, I want it to
start a new row when it hits more than TWO capital letters in a row.

Any ideas? Thanks in advance.

Good point!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top