Return of GEDCOM Import

Started by Mike Stangel on Friday, February 22, 2019

Showing 1-30 of 191 posts

Hi everyone,

We're excited to announce that Geni has brought back its GEDCOM Import feature!

As you may recall, we disabled our original GEDCOM importer after too many users brought in too many trees that overlapped with existing branches on Geni. This created an extraordinary amount of work (mostly for curators) to find and merge the duplicates, resolve discrepancies, etc. So the fundamental difference in the redesigned importer is that it imports a few generations at a time, stops to look for matches, continues where no matches are found, and makes the user resolve matches when they exist.
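A minimal Python sketch of the batch-then-check loop described above, assuming a hypothetical `Person` record with a `parents` list and a hypothetical `has_match` callback standing in for Geni's matching. It is an illustration, not Geni's actual importer. Matches are only checked at batch boundaries, so duplicates inside the first batch can still slip through.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Person:
    """Hypothetical GEDCOM individual; only the ancestor links matter here."""
    name: str
    parents: list[Person] = field(default_factory=list)


def import_in_batches(
    root: Person,
    has_match: Callable[[Person], bool],
    generations_per_batch: int = 5,
) -> tuple[list[Person], list[Person]]:
    """Import ancestors of `root` one batch of generations at a time.

    At each batch boundary, a profile with a match is set aside for the
    user to resolve and its branch is not continued; unmatched branches
    keep going up.
    """
    imported: list[Person] = []
    needs_review: list[Person] = []
    frontier = [root]
    depth = 0
    while frontier:
        next_frontier: list[Person] = []
        at_boundary = depth > 0 and depth % generations_per_batch == 0
        for person in frontier:
            if at_boundary and has_match(person):
                needs_review.append(person)  # pause this branch for the user
                continue
            imported.append(person)
            next_frontier.extend(person.parents)
        frontier = next_frontier
        depth += 1
    return imported, needs_review
```

With 5-generation batches, a match on the sixth generation pauses that branch, while a match within the first five generations is not checked at all.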

There are some restrictions, however -- you cannot import a GEDCOM for someone born before 1800, and all GEDCOM imports will stop once they go back to 1600. Read the full details here:
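The two cut-offs can be sketched as simple checks. This assumes birth years are known integers and that "go back to 1600" means the import stops at the first ancestor born in or before 1600; both are assumptions, and the helper names are hypothetical.

```python
from __future__ import annotations

MIN_START_BIRTH_YEAR = 1800  # an import may not start from someone born earlier
STOP_BIRTH_YEAR = 1600       # assumed: the import stops at ancestors born in/before this year


def can_start_import(birth_year: int) -> bool:
    """True if a GEDCOM import may start from a person born in `birth_year`."""
    return birth_year >= MIN_START_BIRTH_YEAR


def years_imported(ancestor_birth_years: list[int]) -> list[int]:
    """Walk one line of ancestors (nearest first) and keep those that get imported."""
    kept: list[int] = []
    for year in ancestor_birth_years:
        if year <= STOP_BIRTH_YEAR:
            break  # the import stops here; older ancestors are never reached
        kept.append(year)
    return kept
```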

https://www.geni.com/blog/the-return-of-gedcom-imports-on-geni-3103...

Tried it this morning with a tree not connected to the Big tree. So far, so GOOD!

May I ask some questions regarding GEDCOM Import?

I have a mH tree of about 9000 people. Manually or with SmartCopy, I've created about 3500 profiles on Geni, and I know that maybe another 500 are already on Geni but are still unconnected to my part, as I have not yet created the profiles in between. I tend to go one generation up, adding the parents to the current top of the tree, then adding all siblings and their descendants, with spouses/partners where applicable. Thus I have all ancestors up to a certain level, all their descendants, and their spouses in the tree.

What would be the best way to use the new import feature in my case to minimize the number of duplicated profiles, while making sure that the import does not stop before it has imported all the profiles that do not yet exist? I suppose that if I start from myself, it'll stop after matching my family, and some of the top ancestors are pre-1800, so I cannot start from them...

Tigran Łaczinian I think in your case a slow, methodical approach would be best. If you can split up your GEDCOM into each branch that you want to add onto, even if it's someone born before 1800, we can help you circumvent that restriction after confirming that the branch you want to import is not already on Geni (except for the 500 unconnected but you can merge into those when they match). I know that would require a lot of separate GEDCOMs but if you can do that, I think we can get you what you want. Send me an inbox message (from my profile) if you want to coordinate that.

We've found and fixed a race condition that I believe was responsible for the importer ignoring (continuing beyond) profiles that have matches. Now we need to do two things:

1. Watch for new instances (profiles imported after the time of this message)

2. Clean up the old duplicates by cutting them at the lowest matches and then send me a link to any profile in the now-isolated branch so I can delete the whole branch. Since I won't be able to monitor 24/7 for such reports, it's probably also a good idea to mark the isolated branches fictional to avoid them getting merged in.

Mike - I am guessing there is a typo in the word "race" -- did you really mean "We've found and fixed a race condition ..." or did you mean "rare condition" or ??

I haven't seen the term "race condition" used since the late 60s/early 70s when I was designing logic boards using TTL chips within an asynchronous sequential logic scheme. Soon after that we started to use synchronous clocked logic - much more power hungry, but much safer.

Regards,

Mike King

Race conditions are alive and all-too-well. :-) Private User see https://searchstorage.techtarget.com/definition/race-condition -- basically the piece of code that says "don't continue importing here because there are matches" was competing simultaneously with the piece of code that says "find the next place to continue the import and let's GO!"
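A small Python illustration of the general kind of fix, assuming the two pieces of code run as separate threads; `find_matches` and `import_next_batch` are hypothetical stand-ins. Making the "continue" step wait on a `threading.Event` that the match check sets removes the race, because continuing can no longer happen before the check finishes. This is a generic pattern, not Geni's actual fix.

```python
import threading
from typing import Callable


def run_branch(find_matches: Callable[[], list], import_next_batch: Callable[[], None]) -> bool:
    """Run match-checking and continuation concurrently; return True if the import continued."""
    matches_checked = threading.Event()
    result: dict = {}

    def matcher() -> None:
        result["matches"] = find_matches()
        matches_checked.set()  # signal that the match check is complete

    def continuer() -> None:
        matches_checked.wait()  # without this wait, the two threads would race
        if result["matches"]:
            result["continued"] = False  # leave the matches for the user to resolve
        else:
            import_next_batch()
            result["continued"] = True

    # The continuer is deliberately started first; the Event makes it wait anyway.
    threads = [threading.Thread(target=continuer), threading.Thread(target=matcher)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["continued"]
```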

I have learned something new. Thanks!

Are you sure this has been fixed? I spent about 3 hours today merging duplicate profiles that were apparently uploaded by a dear cousin via GEDCOM yesterday (7 March 2019)....and I'm not done yet!

(Frustratingly, some of the duplicate profiles came from a GEDCOM that consisted of entries scraped from Geni... I fear that this kind of recursively-circular tree-generation will only become more of a problem over time.)

I made a test on a small branch cut from my myHeritage tree, and there are some problems with surnames: all of them went into the Birth Surname field instead of Surname, and the latter remained empty. Moreover, two women who had both a birth surname and a different (married) surname got profiles with only the Birth Surname field filled.
Another problem is with the mH export rather than with Geni: I used the option not to export living people, but it looks like it exported those profiles without data instead, so after the import I had several empty profiles on Geni. Maybe the Geni GEDCOM import feature should skip profiles without any data?

Private User if any of those duplicates are still around, I'd like to take a look at them. The importer will import 5 generations of ancestors and their siblings before stopping to look for matches. So it's possible there could be duplicates in that initial import. After that, it should not continue up a branch where there are matches on the profile at the top of the branch, his/her children or grandchildren.
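The rule about where the importer looks can be sketched as a bounded walk down from the top of a branch. `Profile`, its `children` list, and the `has_match` callback are hypothetical; `depth=2` covers the top profile, his/her children and grandchildren, as described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Profile:
    """Hypothetical profile at or below the top of a branch."""
    name: str
    children: list[Profile] = field(default_factory=list)


def branch_has_match(top: Profile, has_match: Callable[[Profile], bool], depth: int = 2) -> bool:
    """True if `top` or any descendant within `depth` generations has a match."""
    level = [top]
    for _ in range(depth + 1):
        if any(has_match(p) for p in level):
            return True  # stop: do not continue up this branch
        level = [child for p in level for child in p.children]
    return False
```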

Tigran Łaczinian GEDCOM has no mechanism to distinguish a married name from a maiden / "birth surname". Based on the advice of our curators, we assume the name in the GEDCOM is the birth surname, unless we can see that it matches the spouse's surname in which case we move that to the last name field and put the father's surname in the birth surname field.
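A sketch of that surname rule in Python. The function name and arguments are hypothetical, and the case-insensitive comparison is an assumption. Consistent with the test report above, a name that does not match the spouse's surname goes to the birth surname field and the last name is left empty.

```python
from typing import Optional, Tuple


def assign_surnames(
    gedcom_surname: str,
    spouse_surname: Optional[str],
    father_surname: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return (last_name, birth_surname) for an imported profile."""
    if spouse_surname and gedcom_surname.casefold() == spouse_surname.casefold():
        # Looks like a married name: keep it as the last name and take the
        # birth surname from the father.
        return gedcom_surname, father_surname
    # Default: treat the GEDCOM surname as the birth surname.
    return "", gedcom_surname
```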

I think skipping blank profiles might be problematic, because they may hold together the tree structure between non-blank profiles.

Hi Mike Stangel , I think I got all of the duplicates. There were ~7 generations in this tree, total. I don't know at which node in the tree the import was initiated, so I don't know how many generations were above that. It's possible that all the duplicates made it through the initial import.

Would anything that gets flagged as a "Tree Match" also have been detected as a duplicate, or do all fields have to match strictly for the GEDCOM importer to identify a profile as a duplicate?

In case it is in some way useful, here's the earliest known patriarch of the family I merged on 3/8, my 3rd great-grandfather:
Moshe Kanter
(The merge record is in the Revisions tab. And you can see the GEDCOM-import detritus in "About Me" indicating that the GEDCOM used Geni/MyHeritage as its source...before being re-imported into Geni as a duplicate profile.)

Mike Stangel -- Any chance you could perhaps add a third choice in the Sources Tab - for the announcements generated by the GEDCOM imports (Source info) and the info generated by SmartCopy (source info) -- instead of having them on the Overview Screen? (or give the GEDCOM Source info its own tab, or ... )

The SmartCopy “footprint” in the about is unobtrusive, and very useful to examine from profile view.

The GEDCOM detritus can be very ugly. Editing it into coherence is beyond me.

On ONE of my profiles, all of the entries below were added as the result of a single merge with a recently created Profile. I assume they are all sources for that profile on the Ancestry tree the GEDCOM came from; does anybody know for sure whether that is true?

GEDCOM Source
Ancestry.com U.S. School Yearbooks Name: Ancestry.com Operations, Inc.; Location: Provo, UT, USA; Date: 2010; @R1@ "U.S., School Yearbooks, 1880-2012"; Year: 1944

GEDCOM Source
Ancestry.com New York, Passenger Lists, 1820-1957 Name: Ancestry.com Operations, Inc.; Location: Provo, UT, USA; Date: 2010; @R1@ Year: 1956; Arrival: New York, New York; Microfilm Serial: T715, 1897-1957; Microfilm Roll: Roll 8781; Line: 14; Page Number: 116

GEDCOM Source
Ancestry.com U.S. City Directories, 1821-1989 (Beta) Name: Ancestry.com Operations, Inc.; Location: Provo, UT, USA; Date: 2011; @R1@

GEDCOM Source
Ancestry.com U.S., Find A Grave Index, 1600s-Current Name: Ancestry.com Operations, Inc.; Location: Provo, UT, USA; Date: 2012; @R1@ Find A Grave

GEDCOM Source
Ancestry.com U.S., Find A Grave Index, 1600s-Current Name: Ancestry.com Operations, Inc.; Location: Provo, UT, USA; Date: 2012; @R1@ Find A Grave

For the Profile LK gives a link to, we have

GEDCOM Source
Geni World Family Tree MyHeritage The Geni World Family Tree is found on <A href="http://www.geni.com&quot; target="_blank">www.Geni.com&lt;/a&gt;. Geni is owned and operated by MyHeritage. https://www.myheritage.com/research/record-40000-116094826/moshe-ka... 23 AUG 2017 Event: Discovery Role: 40000:116094826: Added via a Person Discovery

The first link gets me to my Home Page on Geni, and the second link gets me to a MyHeritage page which tells me:
"Oops, an error occurred
Please retry later, search this collection or start a new search."
Is it actually telling us this was an import from a MyHeritage tree that was itself an import from Geni, but with errors when it comes to linking to the specific places on Geni and on MyHeritage?
Or??

That is, the links on the Profile page get me there; the first link as it appears above just gets a "This site cannot be reached" error.

Erica - not sure how unobtrusive the SmartCopy "footprints" are if there are a bunch of them - nor how useful they are for users with neither a Pro Account nor a Data Subscription - with just a Pro Account, I usually cannot glean anything from them.

But note - I definitely did not suggest eliminating them.

I think the info provided by the SmartCopy "footprint" and the "GEDCOM Source" info can be useful (at the very least, it tells you SmartCopy or GEDCOM import was involved!), but the latter quickly clogs the Overview Screen, and the former can too if SmartCopy is used several times on the same profile (so far, I have definitely seen at least two SmartCopy "footprints" on the same profile).

If they did get moved to a tab other than Overview, I think some signal indicating there is info in that tab would also be advisable.

The issue is editing the information from the GEDCOM and checking the links. So that tab would also need an HTML-enabled text box. I can see that it’s not necessarily so easily done.

Mike Stangel - Since the re-introduction of GEDCOM Imports is also almost certainly creating an uptick in the number of merges among living and recently deceased, the following issues are possibly going to become more of a Problem:

I] When the Primary Profile was created more recently than the Secondary Profile, the View Merge Screen incorrectly shows both Profiles as Created on the Earlier Date

II] When the Primary Profile had the Profile as Living, and the Secondary Profile had the Profile as Deceased, a number of problems arise:
a) Info in the About section of the Death Timeline Event on the Secondary Profile has regularly been lost from the merged profile.
b) Any documents linked to the Death Event of the Secondary Profile have regularly been lost from the Death Event of the Merged Profile.
c) This one I just experienced: if Death Date and Location were entered in the Secondary Profile, the Merged Profile does not show Death Date or Place in the Edit Screen, but the Revision Tab shows them as both having been entered and as currently existing.
(Weirdly, it shows, and then (after an update, I think) does not show, the Death Information (date, place) on the Profile. Also, it did not create a Data Conflict for Living vs Deceased.)

Private User except for the part about "did not create a Data Conflict for Living vs Deceased" I believe all of these issues should be fixed with a change we released yesterday, which corrected the "direction" of the merge. [The importer was merging the existing Geni profile into the GEDCOM-imported profile; we fixed it to allow the system to choose the same as it does whenever you merge any two profiles on Geni, which gives precedence to Master Profiles, claimed profiles, profiles with multiple managers, profiles with more data, etc].
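The precedence described above can be sketched as a tuple comparison. The fields of `MergeCandidate` and the strict lexicographic ordering are assumptions made for illustration (the "etc." covers criteria not modeled here); the point is that the GEDCOM-imported profile no longer wins just by being the one the importer started from.

```python
from dataclasses import dataclass


@dataclass
class MergeCandidate:
    """Hypothetical summary of a profile being merged."""
    name: str
    is_master: bool = False
    is_claimed: bool = False
    manager_count: int = 1
    filled_fields: int = 0


def choose_primary(a: MergeCandidate, b: MergeCandidate) -> MergeCandidate:
    """Pick the profile that survives the merge; ties go to `a`."""
    def priority(p: MergeCandidate) -> tuple:
        # Earlier entries dominate later ones when tuples are compared.
        return (p.is_master, p.is_claimed, p.manager_count > 1, p.filled_fields)

    return a if priority(a) >= priority(b) else b
```

For example, a claimed existing profile with 3 filled fields would be kept as primary over an unclaimed GEDCOM profile with 10.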

As for no-data-conflict-on-living-vs-deceased, if you have an example of that I'd like to look into it.

Mike Stangel can you answer this user, thanks in advance
https://www.geni.com/discussions/192543?msg=1283647

Mike Stangel -
I am quite sure the problem cannot be solved by changing which Profile becomes the Main profile.

In the case of the merge I just did that had / has the problem - the previously existing profile was chosen as the Main Profile (despite being less complete). If it had been the other way - then the Created date would have been wrong in the other Profile in the View Merge screen - or have you corrected that?

The merge was done Yesterday at 2:18 PM [Central Time]

Sending you a PM with a Link to the profile - Subject: Primary was living, Secondary Deceased -- I will hold off making more changes to it until you tell me you have no more need of it.

"make the user resolve matches when they exist" it says.

Someone joined Geni yesterday and added 7000+ profiles via this gedcom load. Over several generations it took a curator to merge the profiles.

This uncovered other problems. If timeline notes are attached to an event (say, birth), and the merger selects the other profile, then the notes have gone forever. As far as I can see, no further data was added but existing data has disappeared.

I'm a basic member (like the vast proportion of members) and therefore cannot check for duplicates. Do I now have an extensive problem?

The data provided in the About by the utility is not helpful: what does it represent and where does it come from?

Ann Newport

@Elaine Tregear
Lol, was that me? I'm new to all of this and uploaded my mother's bulk GEDCOM file.

Mike Stangel -- When a merge is done, are Timeline Event notes supposed to be Concatenated - or is it supposed to happen, as Elaine says above it is happening -- "If timeline notes are attached to an event (say, birth), and the merger selects the other profile, then the notes have gone forever. "

I know in the past, this would happen if there were notes in the Death Event of the Secondary Profile, and the Primary Profile was Living (so did not have a Death Event) - but believe (but am no longer sure) if both were deceased, it did not happen -- and everyone has a Birth Event -
Private User - are you sure it is currently happening for Birth Events?
(also - FYI - if by any chance you tagged a Document to both a person and that Timeline Event -- you should be able to find the floating Timeline Event from the Profile of the person via the Document tagged to both - and then you can copy the info to the "new" timeline event before tagging the document to the new timeline event)

Definitely present - hence the About notes.

The revisions show a complicated sequence - I doubt that I could ever work it out. Some of my recent changes are not present - presumably on the other side of a merge.

It would appear that the matching and resolving is not up to the gedcom loader as anyone can have a stab at it, so the "make the user resolve matches when they exist" is not correct, as Ann Newport demonstrates.

A side note for translators: "race condition" is the same in every language. Of course you can translate "condition" into your local language. I am BofS this kind of things.

Please STOP GEDCOM import until the bugs have been resolved.

At present it is generating more problems than can be handled.
