TOOLS FORUM

• Zhite89

Hello everyone I was hoping someone here might be able to help me out with this. I am making a NBA model and have a master player sheet with all the players and specific names from various sites. I am running into the problem however of names not matching up, so what I am trying to do is reference the names from one of my sites with the main list of names on my spreadsheet. I am just not sure how to do this. I am looking for the name in on column to reference to the other, I am guessing there is a formula but I do not know it. Thanks in advance for anyone who is able to help.

You should be able to accomplish what you want by using a pair of VLOOKUP functions.

For example, if you want to use DK salaries in a sheet that utilizes NBA.com names (and some of those names don’t match up), you could do the following…

• Create a new sheet that you enter the DK names/salaries into Columns B/C.
• Column F would be DK Names and Column G would be NBA.com names (name that range “DK_NBA” or whatever)
• Formula in A2 would be =VLOOKUP(B2,DK_NBA,2,0) and you would name that range maybe “Salaries” in this example
• Then in your other sheet(s) where you have the players names as listed on NBA.com (in column B for this example), in cell A4 you would use =VLOOKUP(B4,Salaries,3,0) and the salary for that differently named player would appear instead of an error.

Sounds like you already have a master list of different names, so you could substitute that in during Step 2 above. Also, I always nest my functions inside an IFERROR as well if it matters for calculations not having an error value.

Hope that helps and maybe someone else has a better way, but let me know if you need any clarifying or other info.

EDIT: Also, when you’re naming your ranges, I would suggest going with a bunch of extra rows so that you can just add additional players without inserting rows and/or worrying about missing someone from your lookups. So, “DK_NBA” would be like \$F\$1:\$G\$1000 even if you currently only have 400 players.

• gje627

By far the easiest way to reconcile different spellings of names is to use regular expressions and disregard trying to accomplish this task in Excel. If you’re familiar with a high level/scripting programming language like Python or Perl it’s super easy, but if not even a text editor that has regular expression find and replace functionality like Notepad++ will do the trick.

For example, let’s say you have four different spellings you’re trying to reconcile for hypothetical player “Billy Joe Bob” where your master list spells this player’s name as “Billy Joe Bob”, but DK lists his name as “BJ Bob”, FD lists his name “Bill Joseph Bob”, and Yahoo lists his name as “Billy Bob IV”

In Perl, for example, the correct regular expression syntax to reconcile all these various names into the one common name based on your master list name (Billy Joe Bob) would be as follows :

s/BJ Bob|Bill Joseph Bob|Billy Bob IV/Billy Joe Bob/gm;

Undoubtedly, you will have several names to reconcile for each slate; so you can simply list all the names changes as illustrated above into one script file and then run that file against your spreadsheet to change all names to a common name in a single operation.

Since you already have a master list of names with your preferred spelling, running this script each day against your main file is super simple and will literally take less than a single second (yes 1 second) to accomplish regardless of the size of slate or number of names to reconcile…..

• shockermandan

• Moderator

@gje627 said...

By far the easiest way to reconcile different spellings of names is to use regular expressions and disregard trying to accomplish this task in Excel. If you’re familiar with a high level/scripting programming language like Python or Perl it’s super easy, but if not even a text editor that has regular expression find and replace functionality like Notepad++ will do the trick.

At the risk of starting an argument I don’t have time to get involved with, LOL. I have lived on both sides of this fence, and I disagree. GJE’s method will take you a week to learn how to implement, and 45 seconds to implement.

Just use a darn VLOOKUP.

I believe DeadBenjamin’s explanation works, but I also explain in this video. I think your exact question is answered if you start watching around the 5 minute mark.

• gje627

The OP says he/she has a spreadsheet but doesn’t indicate whether or not he/she has Excel.

If in fact, the OP does use Excel I agree that VLOOKUP is a perfectly viable solution but alternatively I don’t agree that it would take a week to learn how to use Perl (for example) for the sole purpose of simple regular expressions. In fact, I would insist that the learning curve here for this sole task is 30 minutes or less.

In this regard, to start using regular expressions all the OP would need to do is take my sample regular expression example above, copy it into a text editor, add the appropriate Perl header (#!/usr/bin/perl) and for each player with a different name on individual sites use this same syntax where individual players are each listed on a separate line (Note: the vertical bar or “pipe” character stands for “or” in regular expressions and is not needed where there is only one alternate spelling/usage for a player name… see example below).

After all players are listed, the “script” would then be saved as a Perl file with the .pl extension, namechange.pl for example:

#!/usr/bin/perl

s/BJ Bob|Bill Joseph Bob|Billy Bob IV/Billy Joe Bob/gm;
s/Tim Hardaway/Tim Hardaway Jr./gm;
s/J.J. Redick/JJ Redick/gm;

This file would include all players on all teams, and for smaller slates only the players playing on any particular slate would be “found and replaced”. In the event a player needs to be added later, say for instance a player is called-up from the G league, a line for that player can then be simply added as necessary.

Next, Perl would need to be installed (and yes Perl is open source, so it’s free). This would likely take 5 minutes as the entire process for basic Perl installation is all automated so no learning curve here. Once installed, the only thing the OP would need to do is make sure her/his Perl directory is in the environment path and the learning curve here is approximately two minutes by simply typing “set environment path for directory in Windows 8” in Google and following the extremely simple instructions that are returned.

Now that all this is done, a command prompt would be opened, then in Windows the directory changed using the following command (or similar command depending on user’s default directory):

cd bin\perl

Now, the script is ready to run with the following command (or similar syntax acceptable in Perl):

perl -pi.bak namechange.pl masterfile.csv

That’s it.

Again, this script would only take a second (literally) to make all necessary changes and can be run each day for different slates, different data sources, etc.

Personally, I’ve used both VLOOKUP functionality and regular expressions in both Perl and Python to perform this same task.

While I much prefer the regular expression method over VLOOKUP for simplicity, efficiency of computing resources and flexibility in the event that I want to change the format of my data file (regular expressions don’t care, but VLOOKUP references and any other absolute and/or relative references in Excel would all need to be rewritten should you decide to make even basic changes in a spreadsheet) either method is perfectly acceptable and choosing one over the other in large part is simply a question of personal preference.

However, and additionally, using regular expressions and other operations in Perl or Python has the added benefit of being able to perform significantly more advanced text processing tasks should the user decide she/he wants to learn a little more at a later date.

P.S.: To reiterate, I’m not disagreeing with any other posters in this thread, just providing an alternative method for the OP to consider.

As such, I should not have included in my initial reply, first sentence, the phrase “disregard trying to accomplish this task in Excel”. Instead I should have simply stated that I personally prefer using regular expressions over VLOOKUP.

My initial reply was written quickly and late last night, and therefore was not well thought-out.

Regardless of my excuse, I sincerely apologize for any confusion…. :)

EDIT: Post updated to escape and thereby disallow the Textile parser from automatically inserting player popups for actual players in example regular expressions above.

@gje627 said...

(regular expressions don’t care, but VLOOKUP references and any other absolute and/or relative references in Excel would all need to be rewritten should you decide to make even basic changes in a spreadsheet)

Also not trying to dissuade any other options (hell, I wish I had some programming knowledge), and hopefully any discussion, adversarial or otherwise, leads to a few people learning something new. The entire goal should be to make things a little easier for the community to understand and apply.

To that end, I do have to disagree a little with the quoted section above, at least as a blanket statement as written. There are definitely some situations where it is true that some formulas may need to be rewritten, but there are also some little tricks and safeguards to help limit, and even eliminate those in most relatively simple situations. I think that Excel in general is also very good at keeping references as long as one is diligent in the process they are using. Not without its flaws, but I think it is great for novices to use and learn in making beginner to intermediate spreadsheets.

• shockermandan

• Moderator

@gje627 said...

P.S.: To reiterate, I’m not disagreeing with any other posters in this thread, just providing an alternative method for the OP to consider.

For sure! I hastily wrote my reply as well. It was pretty early.

Exploring some of these other options (like Perl) are highly valuable not only for DFS, but for life skills too. A lot of techniques I’ve applied to my real job I learned by studying how to make something work for DFS in my spare time.

I LOL’d because I was speaking at a user group meeting for some software I use for my real job, and when I started talking about Regular Expressions, everyone in the room totally glazed over. Funny enough, I was explaining how Reg Ex works and my case study was how to use them to parse NFL data.

• gje627

Personally, I think the regular expression method is in fact easier and like I said more flexible.

Also, I partially agree that Excel is generally good at keeping references, but in instances where absolute references are used Excel is not able to maintain the correct reference when a spreadsheet is changed. Using your example syntax of the absolute reference:

\$F\$1:\$G\$1000

by definition, Excel will not be able to maintain this when the overall layout of the spreadsheet is changed since the dollar signs (\$) are instructing Excel to not automatically change the cell reference in instances where columns/rows are inserted/deleted. Thus, in cases where a spreadsheet is set up with a large number of absolute references all of them will need to be rewritten.

Also, on a different matter regarding this same range example I do agree with your statement in your initial reply that adding extra rows will allow for adding player names at a later date to a spreadsheet but actually Excel has functions that cover this issue precisely, namely the OFFSET function combined with the COUNTA function.

The Excel name for ranges with a variable number of records and therefore variable ranges when additional records are added or removed is dynamic named ranges. Rather, than wasting space and explaining here a simple Google search will provide simple instructions to do this.

• gje627

@shockermandan said...

For sure! I hastily wrote my reply as well. It was pretty early.

Exploring some of these other options (like Perl) are highly valuable not only for DFS, but for life skills too. A lot of techniques I’ve applied to my real job I learned by studying how to make something work for DFS in my spare time.

I LOL’d because I was speaking at a user group meeting for some software I use for my real job, and when I started talking about Regular Expressions, everyone in the room totally glazed over. Funny enough, I was explaining how Reg Ex works and my case study was how to use them to parse NFL data.

Absolutely agree….

Like I said before, I believe that all three of us who have provided comments are correct in accomplishing the task originally posed by the OP.

Again, it’s just a matter of personal preference. :)

@gje627 said...

Also, I partially agree that Excel is generally good at keeping references, but in instances where absolute references are used Excel is not able to maintain the correct reference when a spreadsheet is changed. Using your example syntax of the absolute reference:

\$F\$1:\$G\$1000

by definition, Excel will not be able to maintain this when the overall layout of the spreadsheet is changed since the dollar signs (\$) are instructing Excel to not automatically change the cell reference in instances where columns/rows are inserted/deleted. Thus, in cases where a spreadsheet is set up with a large number of absolute references all of them will need to be rewritten.

If you insert a row (whether in the entire sheet, or just columns F&G), that formula will automatically change to \$F\$1:\$G\$1001

• gje627

If you insert a row (whether in the entire sheet, or just columns F&G), that formula will automatically change to \$F\$1:\$G\$1001

Yes, sorry….. My mistake in how I worded the reply….

What I intended to say is if you make multiple changes to the format of your spreadsheet, the references may not hold true. Let’s say you move column F to column H and move column G to column B. In this case your formula(s) won’t hold. Additionally, on the vertical level, if you add rows (or delete rows) your range of rows 1 to 1001 will not automatically update. This is where the OFFSET and COUNTA functions are helpful.

In this regard, yes, in this context I am also incorrect in distinguishing between relative and absolute data ranges, as in the example you state it does not matter. So again my apologies.

Fundamentally, however, the problem becomes evident when you add or delete a number of variables in your Excel spreadsheet (again, we’re not even sure the OP uses Excel or a different spreadsheet program).

Thus, using VLOOKUP and functions/formulas necessarily limits the manner by which you can modify your spreadsheet without redoing function/formula syntax; whereas you don’t have this limitation using regular expressions.

@gje627 said...

Additionally, on the vertical level, if you add rows (or delete rows) your range of rows 1 to 1001 will not automatically update.

I guess I’m not sure what exactly you’re referring to on this part. Inserting rows/columns will lead to automatic updates of all references to the range of cells affected. Can you give an example of what you mean? I’m probably just misunderstanding what you are trying to say, it’s not exactly easy to discuss this stuff via writing ;)

Inserting a column into a range for a VLOOKUP would mess a regular query up if the insertion was before the column reference, as would moving around the columns to a different order. I’ve started using the COLUMN function to numerically label each column and then you can use that cell in the VLOOKUP formula and it will always update with additional or moved columns. (I would imagine OFFSET is something similar, I just haven’t used that function much, or if at all before)

A Find/Replace can also be used to mass-update formulas if one is familiar with what they are doing and conscious of what may need to be updated. Once again, not as ideal as programming script I’m sure, but not really all that difficult or time-consuming with a little experience.

EDIT: I guess I should clarify that I’m referring to inserting a row/column INSIDE of a range. One could run into problems if inserting on the outside edge of a range. For that reason, I’ll make my ranges one extra row/column on each side of the data I’m actually using. Also, I think it is always a good habit to name ranges as well, makes it a little easier to use in formulas across different sheets & workbooks.

• gje627

Simple find and replace is not ideal because all players would need to be done individually, though VBA could be used but suddenly this is much more complicated than simple regular expressions performed in batch.

To word a different way, let’s assume you have variables with corresponding data in columns:

A B C D E F G

Highlighting the applicable columns of your range the columns read as follows:

A B C D E F G

Now let’s rearrange our data file where you move column F to column H because you added an additional variable and you decided you want to move the variable formerly in column G to column B; thereby shifting what was Column F two places to the right to Column H and what was column G four places to the left to Column B. Doing this your data file now looks like this:

A B C D E F G H

Unless there is a secret that I’m unaware of (please advise if I’m missing something), there is no way for Excel to automatically make this change but instead user input will be required. While this may not be a problem for small data files, for large files where a number of new data elements are added, many variables are moved around, and you have multiple functions/formulas this will be a major headache (not least of which is making your “data” file extremely bloated and potentially difficult to run when the file gets larger and larger).

Regardless, and again my fault, as I think I started a discussion that goes well beyond the scope of the original question posed by the OP.

Overall, my final general comment (and reader beware, this one’s “philosophical”) is that I guess I’m just “old school” inasmuch to me Excel and other spreadsheet programs are just that, spreadsheet programs that excel (no pun intended) at organizing and displaying data but not performing mathematical or other complex operations.

Thus, I prefer to keep my data files “clean” and manipulate and perform mathematical/statistical operations on said flat .csv file or from a database like MySQL (no, Excel is not a database either and no a.csv is not an Excel file type, even though .csv files can be opened and manipulated in Excel).

Regardless, if it helps….

I’ll concede that my method is inferior and VLOOKUP is preferable…..

• shockermandan

• Moderator

@gje627 said...

Unless there is a secret that I’m unaware of (please advise if I’m missing something), there is no way for Excel to automatically make this change but instead user input will be required.

You could use an Index/Match function with locked column references. Sorry I couldn’t help myself. Geek overload in here (myself included).

gje, your method is 100% superior. Easier is just a relative term in this scenario. Folks can get their arms wrapped around Excel a lot easier than MySQL.

The clear route here is for you to make an explainer video to show us all how easy it is.

• gje627

@shockermandan said...

You could use an Index/Match function with locked column references. Sorry I couldn’t help myself. Geek overload in here (myself included).

gje, your method is 100% superior. Easier is just a relative term in this scenario. Folks can get their arms wrapped around Excel a lot easier than MySQL.

The clear route here is for you to make an explainer video to show us all how easy it is.

I love the Geek RG forum threads… by far my favorite !!!

To clarify my comments on regular expressions and MySQL, I would probably agree that VLOOKUP is easier than using my method with MySQL, but not with a .csv file.

As far as doing a video…..

I limit my video production exclusively to performing old show tunes with my cat dancing in the background on YouTube !!!

LOL…. Sorry…. :)

EDIT: One final thing to add here…. I think ALL the advice here is excellent and I hope it helps the OP decide how to proceed…. Thanks for the great conversation guys !!!

@gje627 said...

To word a different way, let’s assume you have variables with corresponding data in columns:

A B C D E F G

Highlighting the applicable columns of your range the columns read as follows:

A B C D E F G

Now let’s rearrange our data file where you move column F to column H because you added an additional variable and you decided you want to move the variable formerly in column G to column B; thereby shifting what was Column F two places to the right to Column H and what was column G four places to the left to Column B. Doing this your data file now looks like this:

A B C D E F G H

Unless there is a secret that I’m unaware of (please advise if I’m missing something), there is no way for Excel to automatically make this change but instead user input will be required. While this may not be a problem for small data files, for large files where a number of new data elements are added, many variables are moved around, and you have multiple functions/formulas this will be a major headache (not least of which is making your “data” file extremely bloated and potentially difficult to run when the file gets larger and larger).

To start with, you are definitely correct in saying that for the purposes of a VLOOKUP, adding & moving columns as above would result in having to do some manual labor, absent any work around. The \$F\$1:\$G\$1001 and inserting rows was more in reference to the question in the OP and inserting new players to the list, so we are kind of talking about different issues, but still applicable to what you are referring to.

So in the above, let’s say that somewhere there is a formula that is VLOOKUP(A5,Players,7,0) that grabbed the value in Col G. Now after moving that column to the place of B, you would be getting the value that was formerly in Col F. So yes, you would have to change the “7” in that formula to a “2” and would have to change other formulas as well if you were pulling data from more than one column in that range. For a small number of columns, like this, not really much of a hassle, the changes and copy/paste of the formula would take a matter of a minute or two at most.

The workaround I figured out for this if dealing with a large number of columns is to use the COLUMN function in an additional row and then using that cell reference in the VLOOKUP formula instead of a set #. So the formula becomes VLOOKUP(A5,Players,B5,0) for example. I can then use the formula on multiple tables with the same variables even with inserting or moving new columns.

A real life example for me is that I have a PGA workbook setup where I have a few different sheets with different DK data. One sheet has the salary and ownership % for each tournament, one has the raw points and points percentile and another has points/\$ and relative points/\$ to the average. The latter utilizes a VLOOKUP formula using data from the first two (pts/salary). So if my headers are Row 4, I’ll have Row 3 be a formula using the COLUMN function that essentially gives each tournament an ID# (the formula might be COLUMN()-2 or something like that) and that ID will stay consistent from sheet to sheet and I won’t have to change any formulas when I add a new tournament. I could also use data from all three sheets in another place… if I wanted to see each of those variables side by side for a particular tournament for example, or maybe pull the same variable from several different tournaments into one place.

• gabrieldlt4492

you got me to attempt to create this perl script!

Whenever I run the script my csv file does not get edited :/

What could I be doing wrong?

I am just testing it out with Tim Hardaway and it aint working.

Thanks!