![]() |
€uromeinke, FEJ. and Ghoulish Delight RULE!!! NA abides. |
|
![]() |
#1 | |
Chowder Head
Join Date: Jan 2005
Location: Yes
Posts: 18,500
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
Quote:
Smith, John... 1, 2, 3 Smith, John A... 2, 3 Smith, John Adam... 3 Even though they may be "different" John Smiths, they would still be combined anyways. And the likelihood of different John Smiths (with the names I am dealing with) is pretty minimal and not worth worrying about.
__________________
The thing about quotes on the internet is that you cannot verify their validity.
- Abraham Lincoln |
|
![]() |
Submit to Quotes
![]() |
![]() |
#2 |
.
Join Date: Feb 2005
Posts: 13,354
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
So the index lists for each individual needs to be combined into a single list as well?
|
![]() |
Submit to Quotes
![]() |
![]() |
#3 | |
Chowder Head
Join Date: Jan 2005
Location: Yes
Posts: 18,500
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
Quote:
To expand on what I do with this. We create the body of this directory using FileMaker Pro and output it to a PDF document. There is no way within FMP (the way we are running this program) to capture the indexing data as we are running the directory. We found a program called Acrobotics PDF Index Creator, which takes a TXT file of the data to be found and processes it against the PDF file to find all instances of each record in the TXT file. I did try adding quote marks around each separate instance ("Smith, John" etc) but then it didn't find anything. All data is cleaned in Excel before being imported into FMP which makes it simple for me to capture the name data for the TXT file. That is why I am trying to suppress records in Excel.
__________________
The thing about quotes on the internet is that you cannot verify their validity.
- Abraham Lincoln |
|
![]() |
Submit to Quotes
![]() |
![]() |
#4 |
.
Join Date: Feb 2005
Posts: 13,354
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
Here's a brute force method. I'm sure I'll think of something graceful later when I can't actually test it out at work.
Assume your name list is in Column A and consists of the following five names (starting in cell A1): Jones, Greg Smith, John Smith, John A. Smith, John Adam Wilson, Peter Then, on the assumption that a first and last name match is sufficient for your purposes of assuming that they are the same person: In cell B1 enter the following formula: Code:
=IF(ISERR(LEFT(A1,FIND(" ",A1,1))&LEFT(RIGHT(A1,LEN(A1)-FIND(" ",A1,1)),FIND(" ",RIGHT(A1,LEN(A1)-FIND(" ",A1,1)),1))), LEFT(A1,FIND(" ",A1,1)) & TRIM(RIGHT(A1,FIND(" ",A1,1)-3)),TRIM(LEFT(A1,FIND(" ",A1,1))&LEFT(RIGHT(A1,LEN(A1)-FIND(" ",A1,1)),FIND(" ",RIGHT(A1,LEN(A1)-FIND(" ",A1,1)),1)))) This will result in cell B1 saying "Jones, Greg" which is no change from A1. However, when you copy that formula down the column it will result in this: Code:
Jones, Greg Jones, Greg Smith, John Smith, John Smith, John A. Smith, John Smith, John Adam Smith, John Wilson, Peter Wilson, Peter So, all of the names that did consist of more than just first and last are now trimmed. Into cell C2, copy this formula (note, you're leaving C1 blank): Code:
=IF(B2=B1, "DUP", " ") Now what you'll have is: Code:
Jones, Greg Jones, Greg Smith, John Smith, John Smith, John A. Smith, John DUP Smith, John Adam Smith, John DUP Wilson, Peter Wilson, Peter In other words, each row that has a trimmed name that is identical to the row before is now labelled "DUP". You can now quickly scroll through the list and see the duplicates. If you need to manually combine the index lists for each name this may be fine. If you don't need to do that and just want to delete each duplicate row then do this: Insert a new column between B and C. Right click on the "B" at the top of the B column so the entire column is highlighted and press ctrl-c. Right click on the new C column and select "Paste special" from the options list. Select the "Values" radio button from the list of options in the window that results. This will paste the values of the B column into C without the underlying formulas. Update the formula in what is now column D to reference Column C instead of B. Now you can sort the entire spreadsheet by column D, which will group all the duplicate rows and you can mass select and delete them. Resort by Column A to get them back in alphabetical order and voila. Very brute force and lacking grace. |
![]() |
Submit to Quotes
![]() |
![]() |
#5 |
.
Join Date: Feb 2005
Posts: 13,354
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
Here's a brute force method. I'm sure I'll think of something graceful later when I can't actually test it out at work.
Assume your name list is in Column A and consists of the following five names (starting in cell A1): Jones, Greg Smith, John Smith, John A. Smith, John Adam Wilson, Peter Then, on the assumption that a first and last name match is sufficient for your purposes of assuming that they are the same person: In cell B1 enter the following formula: Code:
=IF(ISERR(LEFT(A1,FIND(" ",A1,1))&LEFT(RIGHT(A1,LEN(A1)-FIND(" ",A1,1)),FIND(" ",RIGHT(A1,LEN(A1)-FIND(" ",A1,1)),1))), LEFT(A1,FIND(" ",A1,1)) & TRIM(RIGHT(A1,FIND(" ",A1,1)-3)),TRIM(LEFT(A1,FIND(" ",A1,1))&LEFT(RIGHT(A1,LEN(A1)-FIND(" ",A1,1)),FIND(" ",RIGHT(A1,LEN(A1)-FIND(" ",A1,1)),1)))) This will result in cell B1 saying "Jones, Greg" which is no change from A1. However, when you copy that formula down the column it will result in this: Code:
Jones, Greg Jones, Greg Smith, John Smith, John Smith, John A. Smith, John Smith, John Adam Smith, John Wilson, Peter Wilson, Peter So, all of the names that did consist of more than just first and last are now trimmed. Into cell C2, copy this formula (note, you're leaving C1 blank): Code:
=IF(B2=B1, "DUP", " ") Now what you'll have is: Code:
Jones, Greg Jones, Greg Smith, John Smith, John Smith, John A. Smith, John DUP Smith, John Adam Smith, John DUP Wilson, Peter Wilson, Peter In other words, each row that has a trimmed name that is identical to the row before is now labelled "DUP". You can now quickly scroll through the list and see the duplicates. If you need to manually combine the index lists for each name this may be fine. If you don't need to do that and just want to delete each duplicate row then do this: Insert a new column between B and C. Right click on the "B" at the top of the B column so the entire column is highlighted and press ctrl-c. Right click on the new C column and select "Paste special" from the options list. Select the "Values" radio button from the list of options in the window that results. This will paste the values of the B column into C without the underlying formulas. Update the formula in what is now column D to reference Column C instead of B. Now you can sort the entire spreadsheet by column D, which will group all the duplicate rows and you can mass select and delete them. Resort by Column A to get them back in alphabetical order and voila. Very brute force and lacking grace. |
![]() |
Submit to Quotes
![]() |