re writing an actual dictionary


Sarah Herrmann

I OCRed an old Maya-to-Spanish dictionary (hundreds of years old and
thousands of pages; this is a hobby, I am not getting paid). Now I want to
copy it out of the PDF into Word, then paste it into Excel
(I want the word in one column and the definition in another, so I need a
tab between them, which is another problem). Each definition ends with a
period, but there are also periods after numbers. I want to put a pilcrow
after every period, except for the periods after numbers, which need
to be removed, and to remove all of the pilcrows in random places
throughout the text. How do I use find and replace to do this? I searched
for ^13(?.)^t and replaced with \1, but I am not experienced enough to get
it working. Please help.

Sarah Herrmann

Here is a sample:
K'ICH 2, 5: calentarse asentado al fuego 4: k'ichen: caliéntate 6:
calentarse el fuego 9: calentar a la lumbre
bajo la cama 2. K'IICH 1: calentarse a la lumbre; k'iichnene'ex:
calentaos a la lumbre, ítem: el fuego en que uno se calienta; in
k'iichi lo': éste es mi fuego donde me caliento 4: calentarse a la
lumbre 8: calentarse al fuego o al sol, refocilarse I3ddp: calentarse
en el sol o junto a las brasas en la época de frío 3. K ' I I C H ! B
I L 8: lumbre que se debe gozar calentándose a ella
4. K'IICHCHAHAL 8: vp ser aprovechada la lumbre para calentarse 5.
K'IICHINAH 1: calentarse así
6. K'IICHINTAH 1: ídem 7. K'IICH K'AAK' 1:
calentarse a la lumbre o al fuego 8. K'ICH K'IN 2, 6: calentarse al
sol 4 : k'ich k'in in ka'ah: caliénteme al sol 5: calentarse o
asentado al fuego 9. K'IICH K'IN 1, 8: calentarse al sol 10.
k'ichbil 11. K'IICHTAH 8: va calentarse a la lumbre, aprovecharse de
su calor 12. K'IICHTAHA'AN 8: pp de k'iichtah 13. k'ik'akuntal 11: vr
entibiarse 14.
p'up'ukhal 11: ídem.
K'ICHMAL 1: cubrirse de moho o de orín cosas de hierro 3: tomarse de
orín el cuchillo, tijeras o aguja; kich! Mi in puts': tomóse de orín
mi aguja.
^K'IK' 1, 2, 4-9, 11, 12: sangre 3, 4, 7, 8: k'ik' ni' och: flujo de
sangre por las narices o hemorragia 2. K'IIK' 8, 13cob: ídem 3. K'I'EL
12: ídem 4. K'IK'EL 1: sangre, denotando cuya; emel u ka'hilí k'ik'el:
sangre 4: u k'ik'el u lohkio'on: con su sangre nos redimió 7, 8, 12:
V. k'ik' 5. K'IIK'EL 8: V. k'ik'
6. K'IIK'ELIL 8: ídem 7. holom 9: ídem.
2 K'IK' 3: menstruo o regla de mujer 2. U'LA K'IK' 3: ídem 3. ILMA Ú
3: ídem.
3 K'IK' 6, 7: hule Yzabv: resina de hule 2. K'IK'CHE' 6: hule, una
resina de palo 3. K'IIK'CHE' 8: el árbol que produce la goma elástica
y otra especie de árbol.

I removed unnecessary tabs after the numbers. There are parts that I
know I will have to manually go through, and I am okay with that. Any
help will be appreciated.
ALSO!!!! The periods after V. need to be removed as well.
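
One possible way to apply the rules described above outside of Word is a short Python script run on the OCR text saved as plain text. This is only a sketch under several assumptions: every existing line break is unwanted, any period directly after a digit is sub-entry numbering, the abbreviation is always written as a standalone "V.", and the headword ends where the first source number begins. The function names `reflow_entries` and `to_tsv_row` are made up for this illustration. Garbled spots such as spaced-out headwords would still need a manual pass.

```python
import re


def reflow_entries(text: str) -> list[str]:
    """Return one string per dictionary entry, each ending in a period."""
    # 1. Drop every existing line break (the stray pilcrows) and
    #    collapse runs of whitespace into single spaces.
    flat = " ".join(text.split())
    # 2. Remove the period after a number: "2. K'IICH" -> "2 K'IICH".
    #    Assumption: a definition never legitimately ends in a digit.
    flat = re.sub(r"(\d)\.", r"\1", flat)
    # 3. Remove the period after the standalone abbreviation "V.".
    flat = re.sub(r"\bV\.", "V", flat)
    # 4. Every remaining period ends an entry, so split right after it.
    return [e for e in re.split(r"(?<=\.)\s*", flat) if e]


def to_tsv_row(entry: str) -> str:
    """Put a tab between the headword and the rest of the entry."""
    # Assumption: the headword is an optional leading homograph number
    # plus non-digit text, and the definition starts at the next digit.
    m = re.match(r"((?:\d+\s+)?\D+?)\s+(\d.*)", entry)
    if m is None:
        return entry  # leave unusual entries for manual review
    return m.group(1) + "\t" + m.group(2)


if __name__ == "__main__":
    sample = (
        "K'ICHMAL 1: cubrirse de\nmoho 2. K'IK'EL 8: V. k'ik'.\n"
        "2 K'IK' 3: menstruo 2. U'LA K'IK' 3: ídem."
    )
    for entry in reflow_entries(sample):
        print(to_tsv_row(entry))
```

Writing the printed rows to a file with a .tsv or .txt extension should let Excel open it with the headword in one column and the definition in the next. The order of the steps matters: the periods after numbers and after "V." have to be removed before splitting, otherwise they would be treated as the end of an entry.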
