When looking for matches, why is Geni so utterly uninformed as to suggest e.g. a living "Hans Hansen" possibly matching some other "Hans Hansen" whose father was born 1635? This is a most unlikely match!
By using standard logic for dealing with likely uncertainties in age, Geni could do a LOT better! It could also warn us e.g. when entering birth data for a daughter at a time when the mother was 5 years old.
Am I the only one annoyed by this? And is there any work going on along these lines in the refinement and development of this otherwise excellent tool?
I believe there is, and on several fronts.
The one I'm involved with is merging in the duplicates to improve the integrity of "one tree." The less duplication of historical profiles there is (i.e., one profile to represent every ancestor), the cleaner and more accurate the paths between profiles, and therefore the suggested matches.
One of the techs here. :-) We more or less disregard living status when matching, because we've seen so many historical profiles incorrectly marked as living. It's especially a problem with GEDCOM imports, we've found.
That said, if you have birth years on these profiles, we shouldn't have matched someone born in 1635 with someone born in the past century. Consider, also, that our matches must meet some matching criteria on the immediate family names, so we're not just matching every Hans Hansen with every other one.
Regarding warnings for impossible data, we do some detection but I agree that we ought to do more. Unfortunately there's always a prioritization game with every release, resulting in some obviously-beneficial features never quite bubbling to the top.
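A warning of the kind discussed above, such as a daughter entered with a birth date when the mother was 5 years old, might look like the following minimal Python sketch. The function name and the age bounds are assumptions chosen for illustration, not Geni's actual rules.

```python
from typing import Optional

# Assumed plausibility bounds, for illustration only (not Geni's rules).
MIN_MOTHER_AGE = 12
MAX_MOTHER_AGE = 55


def mother_age_warning(mother_birth_year: Optional[int],
                       child_birth_year: Optional[int]) -> Optional[str]:
    """Return a warning if the mother's age at the birth is implausible."""
    if mother_birth_year is None or child_birth_year is None:
        return None  # nothing to check without both years
    age = child_birth_year - mother_birth_year
    if not MIN_MOTHER_AGE <= age <= MAX_MOTHER_AGE:
        return f"Mother would have been {age} years old at the child's birth."
    return None


# A mother born in 1900 with a daughter born in 1905 triggers the warning.
print(mother_age_warning(1900, 1905))
```

Profiles with a missing year are simply skipped here, so the check never blocks data entry on incomplete profiles.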
Thanks for entering the discussion, Michael. Working in the IT industry myself, I fully understand your view. :)
But providing Geni with some knowledge of expected life lengths of human beings (leaving postulated Methuselahs out :)), and of deducing possible time intervals for birth/death from the surrounding information in ancestors and descendants, would be very useful for pruning out obviously wrong matches.
We've long considered recording an estimated birth date for profiles lacking them, based on other life event dates, and birth dates of siblings, spouses, parents and children. The reason we haven't done it is that most profiles that are missing a birth date are connected to a bunch of other profiles that are also missing a birth date. The likelihood of this benefiting many profiles seems small (but to be fair, we haven't counted).
Right, that's exactly why I think we should do it; if a profile has no birth or death dates and you're evaluating a merge with one born in, say, 1600, it would be helpful to know if the first profile has siblings or spouses born around that time. If he's married to a woman born in 1860, you probably shouldn't merge!
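To make the idea concrete, here is a minimal Python sketch of estimating a birth-year range for an undated profile from the birth years of its relatives. Everything in it is an assumption for illustration: the function name, the parameters, and especially the age-gap constants. It is not how Geni works.

```python
from typing import Iterable, Optional, Tuple

# Assumed generous bounds, for illustration only.
MIN_PARENT_AGE = 12    # youngest plausible age at a child's birth
MAX_PARENT_AGE = 70    # oldest plausible age at a child's birth
MAX_SPOUSE_GAP = 40    # largest plausible age gap between spouses
MAX_SIBLING_GAP = 30   # largest plausible age gap between siblings


def estimate_birth_range(parents: Iterable[int] = (),
                         children: Iterable[int] = (),
                         spouses: Iterable[int] = (),
                         siblings: Iterable[int] = ()) -> Optional[Tuple[int, int]]:
    """Intersect the birth-year windows implied by each relative's birth year.

    Returns None when no relative has a birth year. If the returned earliest
    year is later than the latest year, the relatives' dates contradict
    each other.
    """
    bounds = []
    bounds += [(y + MIN_PARENT_AGE, y + MAX_PARENT_AGE) for y in parents]
    bounds += [(y - MAX_PARENT_AGE, y - MIN_PARENT_AGE) for y in children]
    bounds += [(y - MAX_SPOUSE_GAP, y + MAX_SPOUSE_GAP) for y in spouses]
    bounds += [(y - MAX_SIBLING_GAP, y + MAX_SIBLING_GAP) for y in siblings]
    if not bounds:
        return None
    earliest = max(lo for lo, _ in bounds)
    latest = min(hi for _, hi in bounds)
    return earliest, latest


# An undated profile married to a woman born in 1860:
earliest, latest = estimate_birth_range(spouses=[1860])
print(earliest, latest)            # 1820 1900
print(earliest <= 1600 <= latest)  # False, so a merge with a 1600 birth looks wrong
```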
Just to show the kind of crazy suggestion Geni makes now: a match between two "Peder Jensen" profiles born 500 years apart! Look: http://screencast.com/t/O4RCm4jOE7hU
I mean, this does not take an Einstein to avoid....
Since Erica Howton suggested, in the Pro Users Help discussion http://www.geni.com/discussions/99133?page=7, that I make my point here, I will post it here as well:
1) Over & Over - I said - " it is a Geni Problem & up to GENI "to solve its Merging Problems - Caused by Private Profiles - and Not "non-Paying Users". Just Apply the 99 or 120 year Dead Rule (if or what the law states) - and have the Geni Program "Automatically" - make the Profiles Public. That should have been the Merge Problem solution - Not "Removing Capabilities" - from non-Paying Users, many crucial to our genealogy research and Geni's success !!
2) her response: A date rule doesn't solve the problem of profiles with no date in them.
3) my comment: You are right for some profiles. But for many others (I have seen plenty) - it is a simple algorithm (based on # of generations, etc.) - to determine that the person Must be older than 100 years. It is amazing to me - how many People have to waste their time - in regards to this #1 Problem for Geni & its Users. Program Modification ='s the Solution - Not "Removing Capabilities" - from non-Paying Users - OR more Curators & Pro's performing Merges.
I agree with the comments made here, like adding before/after/between date qualifiers - standard in all PC genealogy programs.
As a former IT manager, consultant with DB marketing mid-size firms (merges, purges, match-code creation, etc), 7 years genealogy (50,000+ names Public web site, with only 1st & Last name for living individuals) - there Must be a Software Solution to Fix the #1 Problem. Automation and the Kiss method - should be used - not more Curators and Pro users performing the Merges (and removing the capability from non-Payers).
Just my 2 cents :)
I posted some rules that could be used to determine if a person is living or not and if dates are probably correct or not in http://www.geni.com/discussions/99067?msg=731345
If there are no dates on a profile, it is still possible to calculate a likely min and max value for dates. Using those to determine the likelihood of a match would help.
The match proposal now always states that the match is with another tree. I'm no pro user so I could not and cannot check this, but I got the impression that a lot of those proposed matches were with profiles that were in my tree (and/or were with profiles that were added by me).
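Once a likely min and max birth year are known for two profiles, one crude way to turn them into a match likelihood is to measure how much the two ranges overlap. The sketch below is only an illustration of that idea; the function name and the scoring formula are assumptions, not anything Geni has described.

```python
from typing import Tuple

YearRange = Tuple[int, int]  # (min_year, max_year), inclusive, min <= max


def overlap_score(a: YearRange, b: YearRange) -> float:
    """Score from 0.0 (ranges are disjoint) to 1.0 (the narrower range
    lies entirely inside the wider one)."""
    overlap = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if overlap <= 0:
        return 0.0
    smaller = min(a[1] - a[0], b[1] - b[0]) + 1
    return overlap / smaller


print(overlap_score((1820, 1900), (1600, 1600)))  # 0.0
print(overlap_score((1800, 1809), (1805, 1814)))  # 0.5
```

A score of 0.0 would be grounds for dropping the suggested match outright; anything above that could be used to rank the remaining candidates.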
A description of current matching logic would be interesting.
Mike Stangel wrote here almost a year ago: "We more or less disregard living status when matching, because we've seen so many historical profiles incorrectly marked as living. It's especially a problem with GEDCOM imports, we've found."
That might have been true then; I am not sure it is now.
I have a recent case of a whole family being marked living when they should have been deceased. No matches were indicated. I started marking them deceased and suddenly matches started appearing.
David,
Were the matches with profiles with a deceased status?
It could be that the match takes into account a difference in status, but it could also mean that Geni does not attempt to match profiles with a living status, or does so with a lower priority (those would be outside the big tree).
It would be nice to know if and how the different name fields are used. There seem to be a large number of profiles where the birth name is not given and where the last name is incorrectly given as the name of the husband. Matching should take that into account.
There is also different use of the first- and middle name fields.
Will 'also known as' and 'display name' be used?
(Rules for using these fields are a bit vague (http://wiki.geni.com/index.php/Naming_Conventions#Alternative_Names...) and don't specify how spelling variants should be registered.)
What about language differences? The wiki states the name field should be in the original language (http://wiki.geni.com/index.php/Naming_Conventions#Names_in_original...)
The matching algorithm has evolved over time and has a lot of little corner cases that would take too long to fully document here, but I'm happy to answer your questions.
* We now require a match on living / deceased
* Last name and birth surname are matched against each other
* Middle name and display name are not matched
* Immediate family first names are matched, requiring usually 2 or more to match (and at least one parent, if any)
* Alphabets are NOT transliterated to Latin before matching, e.g. Иван will not match Ivan
Mike, thanks.
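Read literally, the rules in this list could be sketched in Python as follows. This is only an interpretation for illustration: the `Profile` fields, the function names, and the reading of "at least one parent, if any" as "when both profiles list parents" are all assumptions, and the real algorithm has corner cases not covered here.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set

MIN_FAMILY_MATCHES = 2  # "usually 2 or more" immediate family first names


@dataclass
class Profile:  # hypothetical structure, not Geni's data model
    last_name: str
    birth_surname: str = ""
    living: bool = False
    parent_first_names: Set[str] = field(default_factory=set)
    other_family_first_names: Set[str] = field(default_factory=set)


def _fold(names: Iterable[str]) -> Set[str]:
    """Case-insensitive comparison only; no transliteration is attempted."""
    return {n.casefold() for n in names if n}


def is_candidate(a: Profile, b: Profile) -> bool:
    # Living / deceased status must agree.
    if a.living != b.living:
        return False
    # Last name and birth surname are matched against each other.
    if not (_fold([a.last_name, a.birth_surname])
            & _fold([b.last_name, b.birth_surname])):
        return False
    # At least one parent first name must match when both profiles list parents.
    shared_parents = _fold(a.parent_first_names) & _fold(b.parent_first_names)
    if a.parent_first_names and b.parent_first_names and not shared_parents:
        return False
    # Enough immediate family first names must match overall.
    shared_others = (_fold(a.other_family_first_names)
                     & _fold(b.other_family_first_names))
    return len(shared_parents) + len(shared_others) >= MIN_FAMILY_MATCHES
```

Because names are compared as written, a parent recorded as Иван on one profile and Ivan on the other counts as no match.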
I hope you can add a little info on this as well:
How about the matching of first names? A lot of profiles have more than one name in the first name field; others have some of that information in the middle name field. You state the middle name is not matched, but will two first names match if the first word in the first name field matches, or should the whole first name fields match, or ...
Is the 'also known as' field used? If it is used, can you explain how?
"FirstName" + "MiddleName" will match "FirstName MiddleName" + "" because, as Mike said, the separate middle name is ignored, so the match you get is "FirstName" will match "FirstName MiddleName"
So technically (assuming last name and other rules Mike mentioned match)
✔ "John" will match "John Doe"
✘ "John Doe" will not match "John"
✘ "John Doe" will not match "John" + "Doe"
but unfortunately
✔ "John" will also match "Saint John"
and even worse
✔ "John" + "Edvard" will match "John Doe"
and the same with all Johns with anything in the middle name field
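One rule that reproduces all five examples above is "every word in the source first name field must appear in the target first name field, and middle name fields are ignored." The Python sketch below encodes that guess; it is an assumption inferred from the examples, not a description of Geni's actual code.

```python
def first_name_matches(source_first: str, target_first: str,
                       source_middle: str = "", target_middle: str = "") -> bool:
    """Guessed rule: all words of the source first name occur in the target
    first name. The middle name arguments are accepted but never used."""
    source_words = set(source_first.casefold().split())
    target_words = set(target_first.casefold().split())
    return bool(source_words) and source_words <= target_words


print(first_name_matches("John", "John Doe"))                          # True
print(first_name_matches("John Doe", "John"))                          # False
print(first_name_matches("John Doe", "John", target_middle="Doe"))     # False
print(first_name_matches("John", "Saint John"))                        # True
print(first_name_matches("John", "John Doe", source_middle="Edvard"))  # True
```

Under this rule the result depends on which profile is treated as the source, which is why swapping the two names can change the answer.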
Mike & Bjørn,
Thanks. That helps a lot in understanding why there are so many false matches.
There is still the question about the 'also known as' field. Is that used in any way?
From Mike's post from a year ago in this discussion, I take it that date information is now also used to determine a match. Is that correct?
Is there any information on how that works?
Bjørn - You are now explaining what I have often wondered about - cases where Geni thinks that Profile B is a match for Profile A, but does not think that Profile A is a match for Profile B.
Bjørn's example shows that perfectly:
✔ "John" will match "John Doe"
✘ "John Doe" will not match "John"
But my question for Mike: Is that sensible?
Isn't it simple logic that if Geni thinks that Profile B is a match for Profile A, it should also think that Profile A is a match for Profile B?
Geni should flag both or neither. If B is a match for A, then A should be a match for B. Saying "I think A and B are the same person" is the same as saying "I think B and A are the same person". There should always be such reciprocity.
You can't say: "I think A and B are the same person but I do not think B and A are the same person"
Or am I missing something fundamental here?
Job, yes we will exclude matches on birth / death year where both exist and are not within 4 years of each other.
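In code, the year rule just described might look like this minimal Python sketch. The function name is hypothetical, and treating a difference of exactly 4 years as still acceptable is an assumption.

```python
from typing import Optional

YEAR_TOLERANCE = 4  # "within 4 years of each other"


def years_compatible(year_a: Optional[int], year_b: Optional[int],
                     tolerance: int = YEAR_TOLERANCE) -> bool:
    """A match is excluded only when both years exist and differ too much."""
    if year_a is None or year_b is None:
        return True  # a missing year cannot rule a match out
    return abs(year_a - year_b) <= tolerance


print(years_compatible(1635, 1638))  # True
print(years_compatible(1635, 1935))  # False
print(years_compatible(None, 1935))  # True
```

The same function would be applied once to the birth years and once to the death years of the two profiles.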
David, you're certainly right conceptually, but our search engine isn't very good at high-order thinking. :-) We hope to do some additional work on this but it's not the highest priority.
For engineering, we're building a new back-end architecture that fully exposes revisions and makes it easier to revert to previous values, including undoing merges. We're also working on translatable relationship paths and improving the speed of traversing the tree for things like the Tree tab, path searches, etc.