This repository consists of tools for use with KTH's LMS and other systems to facilitate e-learning activities of faculty, students, and staff.
These tools are intended to be examples of how one can use the Canvas RESTful API and to provide some useful functionality (mainly for teachers).
Programs can be called with the option "-v" or "--verbose" to get lots of output, showing in detail the operations of the program.
Additionally, programs can be called with an alternative configuration file using the syntax: --config FILE
for example: --config config-test.json
See the default-config.json file for an example of the structure of this file. Replace the string xxx by your access token and replace the string yyy.instructure.com with the name of the server where your Canvas LMS is running.
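As a rough illustration of how a program might read such a configuration file, here is a minimal sketch. The key names ("canvas", "access_token", "host") and the helper name load_config are assumptions made for this example; check default-config.json for the actual structure.

```python
import json

def load_config(path):
    """Read a JSON configuration file and return (base_url, headers).

    Assumed (hypothetical) layout of the file:
    {"canvas": {"access_token": "xxx", "host": "yyy.instructure.com"}}
    """
    with open(path) as f:
        config = json.load(f)
    canvas = config["canvas"]
    # Canvas serves its REST API under /api/v1 and accepts a Bearer token
    base_url = "https://" + canvas["host"] + "/api/v1"
    headers = {"Authorization": "Bearer " + canvas["access_token"]}
    return base_url, headers
```

The returned pair could then be passed to an HTTP client for each API call.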
======================================================================
To set up a degree project course.
./setup-degree-project-course.py cycle_number course_id school_acronym
cycle_number is either 1 or 2 (1st or 2nd cycle)
"-m" or "--modules" set up the two basic modules (Gatekeeper module 1 and Gatekeeper protected module 1)
"-p" or "--page" set up the two basic pages for the course
"-s" or "--survey" set up the survey
"-S" or "--sections" set up the sections for the examiners and programs
"-c" or "--columns" set up the custom columns
"-p" or "--pages" set up the pages
"-a" or "--assignments" set up the assignments (proposal, alpha and beta drafts, active listener, self-assessment, etc.)
"-A" or "--all" set everything up (sets all of the above options to true)
with the option "-v" or "--verbose" you get lots of output - showing in detail the operations of the program
Can also be called with an alternative configuration file:
./setup-degree-project-course.py --config config-test.json 1 12683
Output: very limited unless in verbose mode
Note that the program can generate the course code list, course names, and examiner information for any of KTH's schools (as it takes the data from KOPPS) [However, I have only tried it thus far for SCI.]
Note that it is not designed to be run multiple times. If you want to run it again, you need to delete the things (modules, assignments, and quiz) that were created. Programs to help with this can be found at https://github.com/gqmaguirejr/Canvas-tools
For the survey, the code collects information about all of the exjobb courses owned by a given school and adds all of these to a pull-down menu for the student to select which course code they want to register for. Similarly the student can suggest an examiner from a pull-down that is generated from all of the examiners for exjobbs of a given level as specified in KOPPS for the relevant courses. Note that there is no automatic transfer (yet) of the material from the survey to the custom columns.
When generating sections, the code generates sections for each of the programs and each of the examiners to make it easy for PAs and examiners to keep track of the progress of their students.
Set up the modules:
./setup-degree-project-course.py --config config-test.json -m 1 12683
Set up the survey:
./setup-degree-project-course.py --config config-test.json -s 1 12683 EECS
Set up sections for the examiners and programs
./setup-degree-project-course.py --config config-test.json -S 2 12683 EECS
./setup-degree-project-course.py --config config-test.json -S 2 12683 SCI
The contents of the Introduction pages and assignments need to be worked over. The assignments could be added to one of the modules.
Missing yet are the updated template files for 2019 and any other files in the course.
Also missing is adding the examiners automatically to the course. However, perhaps this should be left to the normal Canvas course room creation scripts.
To collect data from KOPPS for later use by setup-degree-project-course-from-JSON-file.py to set up a course (these two programs are designed to be a replacement for setup-degree-project-course.py)
./setup-degree-project-course-from-JSON-file.py cycle_number course_id school_acronym
where cycle_number is either 1 or 2 (1st or 2nd cycle)
Outputs a file of the form course-data-{school_acronym}-cycle-{cycle_number}.json
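The file name pattern above can be built and read back as in the following minimal sketch. The function names are hypothetical, and nothing is assumed about the structure of the JSON content itself.

```python
import json

def course_data_filename(school_acronym, cycle_number):
    """Build the name course-data-{school_acronym}-cycle-{cycle_number}.json."""
    if cycle_number not in (1, 2):
        raise ValueError("cycle_number must be 1 or 2")
    return "course-data-{0}-cycle-{1}.json".format(school_acronym, cycle_number)

def load_course_data(school_acronym, cycle_number):
    """Load previously collected data from the file with the matching name."""
    with open(course_data_filename(school_acronym, cycle_number)) as f:
        return json.load(f)
```

For example, course_data_filename("EECS", 2) gives course-data-EECS-cycle-2.json.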
To set up a degree project course based upon collected data
Takes data from a file of the form course-data-{school_acronym}-cycle-{cycle_number}.json
./setup-degree-project-course-from-JSON-file.py cycle_number course_id school_acronym
cycle_number is either 1 or 2 (1st or 2nd cycle)
"-m" or "--modules" set up the two basic modules (Gatekeeper module 1 and Gatekeeper protected module 1)
"-p" or "--page" set up the two basic pages for the course
"-s" or "--survey" set up the survey
"-S" or "--sections" set up the sections for the examiners and programs
"-c" or "--columns" set up the custom columns
"-p" or "--pages" set up the pages
"-a" or "--assignments" set up the assignments (proposal, alpha and beta drafts, active listener, self-assessment, etc.)
"-A" or "--all" set everything up (sets all of the above options to true)
with the option "-v" or "--verbose" you get lots of output - showing in detail the operations of the program
Can also be called with an alternative configuration file:
./setup-degree-project-course.py --config config-test.json 1 12683
Output: very limited unless in verbose mode
Note that the program can generate the course code list, course names, and examiner information for any of KTH's schools (as it takes the data from KOPPS) [However, I have only tried it thus far for SCI.]
Note that it is not designed to be run multiple times. If you want to run it again, you need to delete the things (modules, assignments, and quiz) that were created. Programs to help with this can be found at https://github.com/gqmaguirejr/Canvas-tools
When generating sections, the code generates sections for each of the programs and each of the examiners to make it easy for PAs and examiners to keep track of the progress of their students.
Set up the modules:
./setup-degree-project-course-from-JSON-file.py --config config-test.json -m 1 12683
Set up the survey:
./setup-degree-project-course-from-JSON-file.py --config config-test.json -s 1 12683 EECS
Set up sections for the examiners and programs
./setup-degree-project-course-from-JSON-file.py --config config-test.json -S 2 12683 EECS
./setup-degree-project-course-from-JSON-file.py --config config-test.json -S 2 12683 SCI
The contents of the Introduction pages and assignments need to be worked over. The assignments could be added to one of the modules.
Missing yet are the updated template files for 2019 and any other files in the course.
To collect data via a dynamic quiz - uses data collected from KOPPS to build the content of many selections (courses and examiners)
The data is assumed to be in a file: course-data-{school_acronym}-cycle-{cycle_number}.json
The values collected are stored in the Canvas gradebook
Use the new KOPPS v2 API to get information about programs and specializations
Takes as a command line argument school_acronym, but only currently uses it to form the name of the output file
Outputs program acronyms and names in English and Swedish as well as the acronyms and names in English and Swedish of specializations in a file with a name in the format: progs-codes-etc-<program_code>.xlsx
To enable an examiner to generate an announcement for an oral presentation for a 1st or 2nd cycle degree project, make a cover, and set up a 10th month warning.
ruby announce-presentation.rb
(ideally) it will put an announcement into the Polopoly calendar for the school and insert an announcement into the Canvas course room for this degree project
To enable an examiner to generate an announcement for an oral presentation for a 1st or 2nd cycle degree project, make a cover, and set up a 10th month warning. Note that this version uses HTTPS, hence there is a need to set up certificates.
ruby s-announce-presentation.rb
(ideally) it will put an announcement into the Polopoly calendar for the school and insert an announcement into the Canvas course room for this degree project
To generate (for test) a cover from fixed information via the KTH cover generator
ruby generate_cover.rb
Creates a file test1.pdf that contains the front and back covers as generated
Connects to the trita database and lists each of the trita related tables
ruby list_trita_tables.rb
Output of the form:
ruby list_trita_tables.rb
{"schemaname"=>"public", "tablename"=>"eecs_trita_for_thesis_2019", "tableowner"=>"postgres", "tablespace"=>nil, "hasindexes"=>"t", "hasrules"=>"f", "hastriggers"=>"f", "rowsecurity"=>"f"}
{"id"=>"1", "authors"=>"James FakeStudent", "title"=>"A fake title for a fake thesis", "examiner"=>"Dejan Kostic"}
{"id"=>"2", "authors"=>"xxx", "title"=>"xxx", "examiner"=>"yyy"}
{"id"=>"3", "authors"=>"xx", "title"=>"xxx", "examiner"=>"yyy"}
...
Connects to the trita database and removes each of the trita related tables, listing them as they are deleted
ruby remove_trita_tables.rb
Output of the form (showing the tables being deleted):
ruby remove_trita_tables.rb
{"schemaname"=>"public", "tablename"=>"eecs_trita_for_thesis_2019", "tableowner"=>"postgres", "tablespace"=>nil, "hasindexes"=>"t", "hasrules"=>"f", "hastriggers"=>"f", "rowsecurity"=>"f"}
...
To scrape the number of downloads of a document in DiVA.
./get-downloads-for-diva-documents.py diva2_ids.xlsx
Outputs diva-downloads.xlsx, a spreadsheet of the number of downloads
The diva2_ids.xlsx must have a 'Sheet1'. The first column of this spreadsheet should have a column heading, such as "diva2 ids". The values in the subsequent rows of this column should be of the form: diva2:dddddd, for example: diva2:1221139
./get-downloads-for-diva-documents.py diva2_ids.xlsx
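Before running the program, the identifiers in the first column could be checked against the expected form. The following is a minimal sketch and not the program's own validation; the helper name valid_diva2_ids is hypothetical, and the pattern deliberately does not fix the number of digits, since the example id has seven.

```python
import re

# Matches identifiers such as "diva2:1221139"
DIVA2_ID = re.compile(r"^diva2:\d+$")

def valid_diva2_ids(values):
    """Return the stripped values that look like diva2 ids.

    Empty cells, headings, and any other text are skipped.
    """
    result = []
    for v in values:
        s = str(v).strip()
        if DIVA2_ID.match(s):
            result.append(s)
    return result
```

The values passed in would be the cells read from the first column of Sheet1 by whatever spreadsheet library is in use.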
To output custom data for each user in a course
./custom-data-for-users-in-course.py course_id
Prints the custom data for each user in a course
with the option '-C' or '--containers' use HTTP rather than HTTPS for access to Canvas
with the option '-t' or '--testing' testing mode
with the option "-v" or "--verbose" you get lots of output - showing in detail the operations of the program
Can also be called with an alternative configuration file: ./custom-data-for-users-in-course.py --config config-test.json
./custom-data-for-users-in-course.py 4
./custom-data-for-users-in-course.py --config config-test.json 4
./custom-data-for-users-in-course.py -C 5
Edit the text for an external tool for the given course_id
./edit-external-tool-for-course.py course_id tool_id 'navigation_text'
Outputs information about the external tool
with the option '-C' or '--containers' use HTTP rather than HTTPS for access to Canvas
with the option "-v" or "--verbose" you get lots of output - showing in detail the operations of the program
Can also be called with an alternative configuration file: ./create_fake_users-in-course.py --config config-test.json
./edit-external-tool-for-course.py 4 2 'TestTool'
./edit-external-tools-for-course.py --config config-test.json 4 2 'TestTool'
./edit-external-tools-for-course.py -C 5 2 'TestTool'
change the tool URL to https
./edit-external-tools-for-course.py -s -C 5 2 'TestTool'
To get the information needed for covers of degree project reports (i.e., theses)
./cover_data.py school_acronym
"-t" or "--testing" to enable small tests to be done
with the option "-v" or "--verbose" you get lots of output - showing in detail the operations of the program
Produces a spreadsheet containing all of the data about degree project courses. The file's name is of the form: exjobb_courses-{school_acronym}.xlsx
Can also be called with an alternative configuration file:
./setup-degree-project-course.py --config config-test.json 1 EECS
To list the custom column entries for a course
./list-all-custom-column-entries.py course_id
with the option '-C' or '--containers' use HTTP rather than HTTPS for access to Canvas
with the option "-v" or "--verbose" you get lots of output - showing in detail the operations of the program
Can also be called with an alternative configuration file: --config config-test.json
Outputs an xlsx file containing all of the custom columns, with a name of the form: custom-column-entries-course_id-column-column_all.xlsx
The first column of the output will be user_id.
To set up a single specific degree project course
./setup-a-degree-project-course-from-JSON-file.py cycle_number course_id school_acronym course_code program_code
cycle_number is either 1 or 2 (1st or 2nd cycle)
"-m" or "--modules" set up the two basic modules (does nothing in this program)
"-p" or "--page" set up the two basic pages for the course
"-s" or "--survey" set up the survey
"-S" or "--sections" set up the sections for the examiners and programs
"-c" or "--columns" set up the custom columns
"-p" or "--pages" set up the pages
"-a" or "--assignments" set up the assignments (proposal, alpha and beta drafts, active listener, self-assessment, etc.)
"-A" or "--all" set everything up (sets all of the above options to true)
with the option "-v" or "--verbose" you get lots of output - showing in detail the operations of the program
Can also be called with an alternative configuration file:
./setup-degree-project-course.py --config config-test.json 1 12683
Output: very limited unless in verbose mode
Note that it is not designed to be run multiple times. If you want to run it again, you need to delete the things (modules, assignments, and quiz) that were created. Programs to help with this can be found at https://github.com/gqmaguirejr/Canvas-tools
When generating sections, the code generates sections for each of the programs and each of the examiners to make it easy for PAs and examiners to keep track of the progress of their students.
# Create custom columns:
./setup-a-degree-project-course-from-JSON-file.py -c 1 19885 EECS IA150X CINTE
# Create sections for examiners and programs:
./setup-a-degree-project-course-from-JSON-file.py -S 1 19885 EECS IA150X CINTE
# Create assignments:
./setup-a-degree-project-course-from-JSON-file.py -a 1 19885 EECS IA150X CINTE
# Create pages for the course:
./setup-a-degree-project-course-from-JSON-file.py -p 1 19885 EECS IA150X CINTE
# Create objectives:
./setup-a-degree-project-course-from-JSON-file.py -o 1 19885 EECS IA150X CINTE
The contents of the Introduction pages and assignments need to be worked over. The assignments could be added to one of the modules.
Missing yet are the updated template files for 2020 and any other files in the course.
Also missing is adding the examiners automatically to the course. However, perhaps this should be left to the normal Canvas course room creation scripts.
To generate information for use in the KTH thesis template at https://gits-15.sys.kth.se/maguire/kthlatex/tree/master/kththesis
./get-school-acronyms-and-program-names-data.py
Produces a file containing the school acronyms and all of the program names, in the format for inclusion into the thesis template
To add the URL of a page to the URL being passed to an external tool. This code is to be added to an account in Canvas as custom Javascript code.
The document Adding_URL_to_call_to_external_tool.docx describes how to add the page where an external LTI tool is invoked to the URL passed to the LTI application (for the Javascript add-url-to-button-push-for-lti.js).
Purpose: To insert examiners' names into a grading scale for use with an assignment to keep track of who the examiner for a student is. The example code will be used as the name of the grading scale.
The documentation of this program is in Abusing_grading_schemes.docx.
./insert_teachers_grading_standards.py -a account_id cycle_number school_acronym course_code
./insert_teachers_grading_standards.py course_id cycle_number school_acronym course_code
./insert_teachers_grading_standards.py -v 11 2 EECS II246X
To insert a grading scale for use with a Yes/No result (the Yes or No "grade" is reported in the Gradebook by the teacher).
./insert_YesNo_grading_standards.py -a account_id
./insert_YesNo_grading_standards.py course_id
./insert_YesNo_grading_standards.py -v 11
To insert a menu item into the Canvas global navigation menu. Clicking on this button toggles the language between English ("en") and Swedish ("sv").
The details are documented in Better_language_support.docx
To get information about all of the degree project courses and their examiners from KOPPS
./get-all-degree-project-examiners.py cycle_number
cycle_number is either 1 or 2
Outputs a file named KTH_examiners-cycle-1.json or KTH_examiners-cycle-2.json
To check the examiner name against the list of degree project examiners
./check_degree_projects_from_DiVA.py diva_shreadsheet.xlsx
Outputs an updated spreadsheet
To get information about a KTH user based on their orcid
./get_user_by_orcid.py orcid_of_user
Outputs JSON
./get_user_by_orcid.py 0000-0002-6066-746X
user={'kthId': 'u1d13i2c', 'username': 'maguire', 'emailAddress': '[email protected]', 'firstName': 'Gerald Quentin', 'lastName': 'Maguire Jr'}
To get information about a KTH user based on their KTH ID
./get_user_by_kthid.py KTHID_of_user
Outputs JSON
./get_user_by_orcid.py 0000-0002-6066-746X
user={'defaultLanguage': 'en',
'acceptedTerms': True,
'isAdminHidden': False,
'avatar': {'visibility': 'public'},
'_id': 'u1d13i2c', 'kthId': 'u1d13i2c', 'username': 'maguire',
'homeDirectory': '\\\\ug.kth.se\\dfs\\home\\m\\a\\maguire',
'title': {'sv': 'PROFESSOR', 'en': 'PROFESSOR'},
'streetAddress': 'ISAFJORDSGATAN 26',
'emailAddress': '[email protected]',
'telephoneNumber': '',
'isStaff': True, 'isStudent': False,
'firstName': 'Gerald Quentin', 'lastName': 'Maguire Jr',
'city': 'Stockholm', 'postalCode': '10044',
'remark': 'COMPUTER COMMUNICATION LAB',
'lastSynced': '2020-10-28T13:36:56.000Z',
'researcher': {'researchGate': '', 'googleScholarId': 'HJgs_3YAAAAJ', 'scopusId': '8414298400', 'researcherId': 'G-4584-2011', 'orcid': '0000-0002-6066-746X'},
'courses': {
'visibility': 'public',
'codes': ['II2202',
...
],
'items': [{'title': {'sv': 'Forskningsmetodik och vetenskapligt skrivande', 'en': 'Research Methodology and Scientific Writing'}, 'roles': ['examiner', 'courseresponsible', 'teachers'], 'code': 'II2202', 'koppsUrl': 'https://www.kth.se/student/kurser/kurs/II2202', 'courseWebUrl': 'https://www.kth.se/social/course/II2202/'},
...
]},
'worksFor': {'items': [{'key': 'app.katalog3.J.JH', 'path': 'j/jh', 'location': '', 'name': 'CS DATAVETENSKAP', 'nameEn': 'DEPARTMENT OF COMPUTER SCIENCE'}, {'key': 'app.katalog3.J.JH.JHF', 'path': 'j/jh/jhf', 'location': 'KISTAGÅNGEN 16, 16440 KISTA', 'name': 'KOMMUNIKATIONSSYSTEM', 'nameEn': 'DIVISION OF COMMUNICATION SYSTEMS'}]},
'pages': [],
'links': {'visibility': 'public', 'items': [{'url': 'http://people.kth.se/~maguire/', 'name': 'Personal web page at KTH'}, {'url': 'https://www.ae-info.org/ae/Member/Maguire_Jr._Gerald_Quentin', 'name': 'page at Academia Europaea'}]}, 'description': {'visibility': 'public', 'sv': '<p>Om du verkligen vill kontakta mig eller hitta information om mig, se min hemsida:\xa0<a href="http://people.kth.se/~maguire/">http://people.kth.se/~maguire/</a></p>\r\n', 'en': '<p>If you actually want to contact me or find information related to me, see my web page:\xa0<a href="http://people.kth.se/~maguire/">http://people.kth.se/~maguire/</a></p>\r\n'},
'images': {'big': 'https://www.kth.se/social/files/576d7ae3f2765459470e7b0e/chip-identicon-52e6e0ae2260166c91cd528ba0c72263_large.png', 'visibility': 'public'},
'room': {'placesId': 'fad3809a-344b-4572-9795-5b423e0a9b2a', 'title': '4478'},
'socialId': '55564',
'createdAt': '2006-01-09T13:13:59.000Z',
'visibility': 'public'}
To get the school acronyms and the acronyms and names of the 3rd cycle programs to be used when making a 3rd cycle thesis/dissertation
get-school-acronyms-and-program-names-data-3rd-cycle.py
Outputs the LaTeX code on standard output and in a file schools_and_programs_3rd_cycle.ins
./get-school-acronyms-and-program-names-data-3rd-cycle.py
cmdp=\newcommand{\programcode}[1]{%
\ifinswedish
\IfEqCase{#1}{%
{KTHARK}{\programme{Arkitektur}}%
{KTHBIO}{\programme{Bioteknologi}}%
{KTHBYV}{\programme{Byggvetenskap}}%
{KTHDAT}{\programme{Datalogi }}%
{KTHEST}{\programme{Elektro- och systemteknik}}%
{KTHEGI}{\programme{Energiteknik och -system}}%
{KTHFTK}{\programme{Farkostteknik}}%
{KTHFYS}{\programme{Fysik}}%
{KTHGEO}{\programme{Geodesi och Geoinformatik}}%
{KTHHFL}{\programme{Hållfasthetslära}}%
{KTHIEO}{\programme{Industriell ekonomi och organisation}}%
{KTHIIP}{\programme{Industriell produktion}}%
{KTHIKT}{\programme{Informations- och kommunikationsteknik}}%
{KTHKEV}{\programme{Kemivetenskap}}%
{KTHKON}{\programme{Konst, teknik och design}}%
{KTHMAT}{\programme{Matematik}}%
{KTHKOM}{\programme{Medierad kommunikation }}%
{KTHPBA}{\programme{Planering och beslutsanalys}}%
{KTHSHB}{\programme{Samhällsbyggnad: Management, ekonomi och juridik}}%
{KTHTMV}{\programme{Teknisk materialvetenskap}}%
{KTHMEK}{\programme{Teknisk Mekanik}}%
{KTHTKB}{\programme{Teoretisk kemi och biologi}}%
}[\typeout{program's code not found}]
\else
\IfEqCase{#1}{%
{KTHARK}{\programme{Architecture}}%
{KTHBIO}{\programme{Biotechnology}}%
{KTHBYV}{\programme{Civil and Architectural Engineering}}%
{KTHDAT}{\programme{Computer Science}}%
{KTHEST}{\programme{Electrical Engineering}}%
{KTHEGI}{\programme{Energy Technology and Systems}}%
{KTHFTK}{\programme{Vehicle and Maritime Engineering}}%
{KTHFYS}{\programme{Physics}}%
{KTHGEO}{\programme{Geodesy and Geoinformatics}}%
{KTHHFL}{\programme{Solid Mechanics}}%
{KTHIEO}{\programme{Industrial Economics and Management}}%
{KTHIIP}{\programme{Production Engineering}}%
{KTHIKT}{\programme{Information and Communication Technology}}%
{KTHKEV}{\programme{Chemical Science and Engineering}}%
{KTHKON}{\programme{Art, Technology and Design }}%
{KTHMAT}{\programme{Mathematics}}%
{KTHKOM}{\programme{Mediated Communication}}%
{KTHPBA}{\programme{Planning and Decision Analysis}}%
{KTHSHB}{\programme{The Built Environment and Society: Management, Economics and Law}}%
{KTHTMV}{\programme{Engineering Materials Science}}%
{KTHMEK}{\programme{Engineering Mechanics}}%
{KTHTKB}{\programme{Theoretical Chemistry and Biology}}%
}[\typeout{program's code not found}]
\fi
}
The program creates an event entry: from a JSON file (input event type 0), from a MODS file (input event type 3), or from fixed data (input event type 2).
This event will be inserted into the KTH Cortina Calendar (unless the --nocortina flag is set or the user does not have a Cortina access key). The program also generates an announcement in the indicated Canvas course room and creates a calendar entry in the Canvas calendar for this course room.
It can also modify (using PUT) an existing Cortina Calendar entry.
./JSON_to_calendar.py -c course_id [--nocortina] --event 0|2|3 [--json file.json] [--mods file.mods]
Note that the initial version, which used a fixed entry (i.e., a built-in event), puts in an entry for a thesis and then gets it, then modifies the English language "lead" for the event and modifies the entry. Finally, it gets the entry and outputs it.
The program evolved to take in events from other sources and also to generate an announcement in a Canvas course room and also to generate a Canvas Calendar event for this course room.
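The command line described above could be parsed roughly as in the following sketch. This is not the program's actual argument handling: the destination name course_id, the default of 0 for --event, and the check that the matching input file is given are all assumptions made for illustration.

```python
import argparse

# Input event types as described above
EVENT_SOURCES = {0: "JSON file", 2: "fixed data", 3: "MODS file"}

def parse_args(argv):
    """Parse arguments in the style of the usage line (illustrative only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", dest="course_id", type=int, required=True)
    parser.add_argument("--nocortina", action="store_true")
    # Assumption: event type 0 (JSON file) when --event is not given
    parser.add_argument("--event", type=int, choices=sorted(EVENT_SOURCES), default=0)
    parser.add_argument("--json")
    parser.add_argument("--mods")
    args = parser.parse_args(argv)
    # Assumption: each file-based event type needs its input file
    if args.event == 0 and not args.json:
        parser.error("--event 0 needs --json file.json")
    if args.event == 3 and not args.mods:
        parser.error("--event 3 needs --mods file.mods")
    return args
```

For instance, parse_args(["-c", "11", "--nocortina", "--json", "oscar.json"]) selects the JSON file as the source and skips the Cortina calendar.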
Extract data from the end of a PDF file that has been put out by my LaTeX template for use when inserting a thesis into DiVA. The format of this data is pseudo JSON.
Use the Python package pdfminer to extract the data from the PDF file. See https://github.com/pdfminer/pdfminer.six
extract_pseudo_JSON-from_PDF.py
Outputs by default calendar_event.json. You can also specify another output file name.
Note that unless you specify the option "-l" or "--ligature", any of the common ligatures will be replaced by the corresponding letter combination, rather than left as a single code point. This is primarily to prevent problems later with ligatures in titles, subtitles, abstracts, etc.
./extract_pseudo_JSON-from_PDF.py --pdf test5.pdf
./extract_pseudo_JSON-from_PDF.py --pdf test5.pdf --json event.json
./extract_pseudo_JSON-from_PDF.py --pdf oscar.pdf --json event.json
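The ligature replacement mentioned in the note above can be pictured with the following minimal sketch. The exact set of ligatures the program handles is not stated here, so the table below (the five common Latin f-ligatures) and the function name are assumptions.

```python
# Common Latin ligature code points and their letter combinations
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

def replace_ligatures(text):
    """Replace each single-code-point ligature by its letter combination."""
    for ligature, letters in LIGATURES.items():
        text = text.replace(ligature, letters)
    return text
```

Text without ligatures passes through unchanged, so the function can be applied to every extracted string.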
The program creates a thesis cover using the information from the arguments and a JSON file. The JSON file can be produced by extract_pseudo_JSON-from_PDF.py
./JSON_to_cover.py [-c course_id] --json file.json [--cycle 1|2] [--credits 7.5|15.0|30.0|50.0] [--exam 1|2|3|4|5|6|7|8 or the name of the exam] [--area area_of_degree] [--area2 area_of_second_degree] [--trita trita_string] [--school ABE|CBH|EECS|ITM|SCI] [--file thesis_file.pdf] [--diva 1|2...]
Outputs the cover in a file: cover.pdf and splits the cover.pdf into two pages: cover_pages-1 and cover_pages-2
The file name, if given, must end in ".pdf". Still experimental.
./JSON_to_cover.py -c 11 --json event.json --testing --exam 4
For a file (oscar.pdf) without For DIVA pages:
./JSON_to_cover.py --json event.json --testing --exam 4 --file oscar.pdf
For a file (oscar.pdf) with two(2) For DIVA pages:
./JSON_to_cover.py --json event.json --testing --exam 4 --file oscar.pdf --diva 2
Assuming that a student has submitted a thesis with the information in the For DIVA pages (at the end of it) that include the information about the opponent(s) and presentation.
- Save the PDF file, for example: oscar.pdf
- Extract the For DIVA information as JSON
./extract_pseudo_JSON-from_PDF.py --pdf oscar.pdf --json oscar.json
- Make the announcement for a course (with course_id 11):
./JSON_to_calendar.py -c 11 --nocortina --json oscar.json
The --nocortina flag says not to put the event into the KTH calendar (even if you have permissions to do so). At the moment the Cortina functionality is not available in production.
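The two commands in the steps above could be chained from a small driver script. The following is only a sketch under the assumption that both programs are in the current directory; the function names are hypothetical.

```python
import subprocess

def announcement_commands(pdf_file, json_file, course_id, nocortina=True):
    """Build the argument lists for the extract and announce steps."""
    extract = ["./extract_pseudo_JSON-from_PDF.py", "--pdf", pdf_file, "--json", json_file]
    announce = ["./JSON_to_calendar.py", "-c", str(course_id)]
    if nocortina:
        announce.append("--nocortina")
    announce += ["--json", json_file]
    return [extract, announce]

def run_all(commands):
    """Run each command in order, stopping at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(announcement_commands("oscar.pdf", "oscar.json", 11)) would run the same two commands as shown in the steps.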
Assuming that a student has submitted a thesis with the information in the For DIVA pages (at the end of it) that includes the information for the DIVA entry and the examiner has approved the thesis.
- Save the PDF file, for example: oscar.pdf
- Extract the For DIVA information as JSON
./extract_pseudo_JSON-from_PDF.py --pdf oscar.pdf --json oscar.json
- Make the covers and apply them. For a file (oscar.pdf) with two(2) For DIVA pages:
./JSON_to_cover.py --json oscar.json --testing --exam 4 --file oscar.pdf --diva 2
It is also possible in the 3rd step to just make the cover.pdf, cover-pages-1 and cover-pages-2 files -- simply do not provide the --file and --diva arguments.
To fill in a KTH cover template with data from a JSON file
./fill_in_template.py --pdf template.pdf --json data.json
Outputs a pdf file named "output.pdf" (currently a fixed name)
./fill_in_template.py --pdf "KTH_Omslag_Exjobb_Formulär_Final_dummy_EN-20210623.pdf" --json jussi.json --trita "TRITA-EECS-EX-2021:330"
Note that the new template is not yet ready for prime time and this program is a simple hack to see if I can mechanically generate the new format of cover. Once both the template and the program are more mature, the code should get integrated into JSON_to_cover.py, with a new option to specify whether you want the "new" or "old" cover.
The program creates a MODS file using the information from the arguments and a JSON file.
The input JSON file can be produced by extract_pseudo_JSON-from_PDF.py
./JSON_to_MODS.py [-c course_id] --json file.json [--cycle 1|2] [--credits 7.5|15.0|30.0|50.0] [--exam 1|2|3|4|5|6|7|8 or the name of the exam] [--area area_of_degree] [--area2 area_of_second_degree] [--trita trita_string] [--school ABE|CBH|EECS|ITM|SCI]
Outputs the MODS file: MODS.pdf
./JSON_to_MODS.py -c 11 --json jussi.json --trita "TRITA-EECS-EX-2021:219" --testing
or
./JSON_to_MODS.py -c 11 --json test12.json --trita "TRITA-EECS-EX-2021:219" --testing
Note that currently the Canvas course information is not used.
The program makes an entry in LADOK for the indicated course_code and moment using the information from the arguments and a JSON file. The JSON file can be produced by extract_pseudo_JSON-from_PDF.py
./JSON_to_ladok.py [-c course_id] --json file.json --code course_code [--which 1|2] [--date YYYY-MM-DD] [--grade P|F|A|B|C|D|E|Fx|F] -gradeScale ["PF"|"AF"]
Note that which == 3 means both authors, while 1 is the first author only and 2 is the second author only. The default (0) is to report the result for both authors or the only author (if there is just one author).
If the exam date is not specified, it defaults to today.
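The selection of authors and the default exam date described above can be sketched as follows. This is not the program's code: the function names are hypothetical, and the behavior for a value of 2 when there is only one author (an empty selection) is an assumption.

```python
import datetime

def select_authors(authors, which=0):
    """Pick the authors to report on; authors is a list in document order."""
    if which in (0, 3):
        return list(authors)   # both authors, or the only author
    if which == 1:
        return authors[:1]     # first author only
    if which == 2:
        return authors[1:2]    # second author only (empty if there is none)
    raise ValueError("which must be 0, 1, 2, or 3")

def exam_date(date_arg=None):
    """Return the given YYYY-MM-DD string, or today's date when none was given."""
    if date_arg:
        return date_arg
    return datetime.date.today().isoformat()
```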
An assumption is that there is only one moment that requires a project title, i.e., 'KravPaProjekttitel' is True
Misc. messages - mostly an error message including "Hinder mot skapa resultat påträffat: Rapporteringsrättighet saknas" as I do not have permission to register these course results
./JSON_to_ladok.py -c 11 --json experiment.json --code DA213X
This is very much a work in progress, since I have not really been able to test it completely. It uses the ladok3 python library, but extends it with some features that are not (yet) in the library.
The program extracts the thesis title from LADOK for all the students in the specified canvas_course.
./thesis_titles.py -c course_id
An assumption is that there is only one moment that requires a project title, i.e., 'KravPaProjekttitel' is True
Outputs a spreadsheet with the data
./thesis_titles.py -c 25434
The program extracts the thesis title from LADOK for all the students in the canvas_course.
./thesis_titles_by_school.py -s school_acronym
An assumption is that there is only one moment that requires a project title, i.e., 'KravPaProjekttitel' is True
Output: spreadsheet with the data in a file with a name of the form: titles-all-school_acronym.xlsx
such as: titles-all-EECS.xlsx
./thesis_titles_by_school.py -s EECS
The program outputs a spreadsheet of titles and subtitles split by language from the input MODS file.
./MODS_to_titles_and_subtitles.py --mods file.mods
Outputs a file of the form: titles-from-{}.xlsx where {} is replaced by the input filename without extension
./MODS_to_titles_and_subtitles.py --mods file.mods
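The output file name can be derived from the input name as in this minimal sketch. The helper name is hypothetical, and dropping any directory part of the input path is an assumption.

```python
import os

def titles_output_name(mods_filename):
    """Return titles-from-{}.xlsx with {} set to the input name without extension."""
    base = os.path.splitext(os.path.basename(mods_filename))[0]
    return "titles-from-{}.xlsx".format(base)
```

With the example above, file.mods gives titles-from-file.xlsx.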
Extract document information and properties from a DOCX file to make a JSON output
./extract_customDocProperties.py filename.docx
Outputs JSON for the DOCX file, in the form used by the other programs. If the output file is not specified, the data is output to a file named output.json.
./extract_customDocProperties.py test.docx --json output.json
Pretty print the resulting JSON
./extract_customDocProperties.py Template-thesis-English-2021-with-for-DiVA.docx --pretty
force English as the language of the body of the document
./extract_customDocProperties.py Template-thesis-English-2021-with-for-DiVA.docx --English
force Swedish as the language of the body of the document
./extract_customDocProperties.py Template-thesis-English-2021-with-for-DiVA.docx --Swedish
Extract data from the pseudo JSON file that has been produced by my LaTeX template and clean it up, so that it can be used with my other programs (to create calendar entries, create a MODS file, and insert titles into LADOK).
./cleanup_pseudo_JSON-from_LaTeX.py --json fordiva.json [--acronyms acronyms.tex]
Outputs a new cleaned up JSON file in a file with the name augmented by "-cleaned"
./cleanup_pseudo_JSON-from_LaTeX.py --json fordiva.json [--acronyms acronyms.tex]
The program extracts the thesis title from LADOK for all the students in the canvas_course.
./degree_project_course_codes_by_school.py -s school_acronym
An assumption is that there is only one moment that requires a project title, i.e., 'KravPaProjekttitel' is True
Outputs a spreadsheet with the data
Outputs an XLSX spreadsheet of the teachers in the course and adds some KTH profile information
./teachers-in-course-kthid-and-other-profile-data.py -c course_id
Outputs a file with a name of the form teachers-COURSE_ID.xlsx
./teachers-in-course-kthid-and-other-profile-data.py --config config-test.json -c 25434
To collect the course moment information and number of students in course instances since a given starting year (by default 2020)
./courses_grades_by_school.py -s school_code
Outputs a spreadsheet with a name of the form: courses-in-XXXX.xlsx
./courses_grades_by_school.py -s EECS
To augment a spreadsheet produced by courses_grades_by_school.py
./augment_course_data.py -s school_acronym
reads in course data from courses-in-{}.xlsx
Outputs an updated spreadsheet courses-in-{}-augmented.xlsx
./augment_course_data.py -s EECS
dept_names=['EECS/Computer Science', 'EECS/Electrical Engineering', 'EECS/Human Centered Technology', 'EECS/Intelligent Systems']
dept_colors={'EECS/Computer Science': {'name': 'EECS/Computer Science', 'color': {'color': 'blue', 'transparency': 50}}, 'EECS/Electrical Engineering': {'name': 'EECS/Electrical Engineering', 'color': {'color': 'red', 'transparency': 50}}, 'EECS/Human Centered Technology': {'name': 'EECS/Human Centered Technology', 'color': {'color': 'green', 'transparency': 50}}, 'EECS/Intelligent Systems': {'name': 'EECS/Intelligent Systems', 'color': {'color': 'magenta', 'transparency': 50}}}
max_number_of_students_in_a_course=400
total_students=29817
max_row=8, cats=='cy1 degree name'!C2:C9, values=("='cy1 degree name'!$E2:$E9",)
Uses a Pie in Pie chart to show this data (sheetname=cy2 degree name)
max_row=33, cats=='cy2 degree name'!C2:C34, values=("='cy2 degree name'!$E2:$E34",)
The program creates a thesis cover using the information from the arguments and a JSON file. The JSON file can be produced by extract_pseudo_JSON-from_PDF.py
./JSON_to_DOCX_cover.py --json file.json [--cycle 1|2] [--credits 7.5|15.0|30.0|50.0] [--exam 1|2|3|4|5|6|7|8 or the name of the exam] [--area area_of_degree] [--area2 area_of_second_degree] [--trita trita_string] [--file cover_template.docx] [--picture]
Outputs the cover in a file: <input_filename>-modified.docx
Only one test json file has been run.
# enter data from a JSON file
./JSON_to_DOCX_cover.py --json event.json
./JSON_to_DOCX_cover.py --json event.json --testing --exam 4
./JSON_to_DOCX_cover.py --json fordiva-cleaned.json --file za5.docx
# produces za5-modified.docx with the optional picture removed
# Manually specifying the level and number of credits
./JSON_to_DOCX_cover.py --json fordiva-cleaned.json --file za5.docx --cycle 1 --credits 7.5
./JSON_to_DOCX_cover.py --json fordiva-cleaned.json --file za5.docx --cycle 1 --credits 10.0
./JSON_to_DOCX_cover.py --json fordiva-cleaned.json --file za5.docx --cycle 1 --credits 15.0
# it will even work with
./JSON_to_DOCX_cover.py --json fordiva-cleaned.json --file za5.docx --cycle 1 --credits 15
./JSON_to_DOCX_cover.py --json fordiva-cleaned.json --file za5.docx --cycle 2 --credits 15.0
./JSON_to_DOCX_cover.py --json fordiva-cleaned.json --file za5.docx --cycle 2 --credits 30.0
./JSON_to_DOCX_cover.py --json fordiva-cleaned.json --file za5.docx --cycle 2 --credits 60.0
The program creates an XLSX file of organization data based upon the DiVA cora API for Organisationsmetadata
./DiVA_organization_info.py [--orgid org_id] [--orgname organization_name] [--json filename.json] [--csv]
Output: outputs a file with a name of the form DiVA_org_id_date.xlsx
The columns of the spreadsheet are organisation_id, organisation_name_sv, organisation_name_en, organisation_type_code, organisation_type_name, organisation_parent_id, closed_date, organisation_code
The command has optional --verbose and --testing arguments: --verbose gives more information and --testing limits the number of records processed.
# get data from a JSON file
./DiVA_organization_info.py --orgid 177 --json UUB-20211210-get.json
# get data from a JSON file without specifying the orgid; it will take this from the topOrganisation
./DiVA_organization_info.py --json UUB-20211210-get.json
# get data via the organization name
./DiVA_organization_info.py --orgname kth
# output a CSV file rather than an XLSX file
./DiVA_organization_info.py --json UUB-20211210-get.json --csv
The program extracts the list of custom docproperties and their values from a DOCX file
./extract_custom_DOCX_properties.py [--file filename.docx]
Outputs the properties in a JSON file: <input_filename>-extracted.json
The custom DOCPROPERTIES are in a file (within the ZIP archive DOCX file) with the name docProps/custom.xml
./extract_custom_DOCX_properties.py --file zb1.docx
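Since a DOCX file is a ZIP archive, the custom properties can be read with the Python standard library alone. The following is a minimal sketch, not the code of extract_custom_DOCX_properties.py; the function name read_custom_properties is hypothetical, and it assumes the usual OOXML layout in which each property element has one typed child (such as vt:lpwstr) holding the value.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the custom-properties part of an OOXML package
CUSTOM_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"


def read_custom_properties(docx):
    """Return {name: value} for the custom properties of a DOCX file.

    docx may be a path or a binary file-like object.
    (Hypothetical helper, for illustration only.)
    """
    with zipfile.ZipFile(docx) as archive:
        if "docProps/custom.xml" not in archive.namelist():
            return {}  # the document has no custom properties
        root = ET.fromstring(archive.read("docProps/custom.xml"))

    properties = {}
    for prop in root.findall(f"{{{CUSTOM_NS}}}property"):
        # Assumption: the first child element carries the value as text
        value = next(iter(prop), None)
        properties[prop.get("name")] = value.text if value is not None else None
    return properties
```

Calling read_custom_properties("zb1.docx") would return a dictionary that could be written out with json.dump(); all values are returned as strings, regardless of their declared type.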
The program produces a customized DOCX by setting the custom DOCPROPERTIES to the values from the JSON file. The JSON file can be produced by extract_custom_DOCX_properties.py
./customize_DOCX_file.py --json file.json [--file cover_template.docx]
Outputs a customized DOCX file: <input_filename>-modified.docx
Use of the two programs (customize_DOCX_file.py and extract_custom_DOCX_properties.py) is explained in the document: Modifying_DOCX_properties.docx
./customize_DOCX_file.py --json custom_values.json --file za5.docx
# produces za5-modified.docx
The program produces a customized ZIP of a LaTeX project based upon the values in the JSON file
./customize_LaTeX_project.py --json file.json [--file latex_project.zip] [--initialize]
Outputs a customized LaTeX project ZIP file: <input_filename>-modified.zip
If the --initialize command line argument is given, then the existing custom content is ignored. Otherwise, if the length of the existing content is greater than 0, the new customization is added at the end of the existing customization.
Only limited testing has been done.
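The --initialize rule can be summarized in a few lines of Python. This is a sketch of the rule as described above, not the program's own code; the function name merge_customization and the newline used to join the two parts are assumptions.

```python
def merge_customization(existing: str, new: str, initialize: bool = False) -> str:
    """Combine existing custom content with a new customization.

    Hypothetical helper illustrating the --initialize behavior.
    """
    if initialize or len(existing) == 0:
        # Either --initialize was given (existing content is ignored)
        # or there is no existing content to keep
        return new
    # Otherwise the new customization is added at the end of the existing one
    # (joining with a newline is an assumption)
    return existing + "\n" + new
```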
The program creates a JSON file of customization information
./create_customized_JSON_file.py [-c CANVAS_COURSE_ID]
[-j JSON]
[--language LANGUAGE]
[--author AUTHOR]
[--author2 AUTHOR2]
[--school SCHOOL]
[--courseCode COURSECODE]
[--programCode PROGRAMCODE]
[--cycle CYCLE]
[--credits CREDITS]
[--area AREA]
[--area2 AREA2]
[--numberOfSupervisors NUMBEROFSUPERVISORS]
[--Supervisor SUPERVISOR]
[--Supervisor2 SUPERVISOR2]
[--Supervisor3 SUPERVISOR3]
[--Examiner EXAMINER]
Outputs a JSON file with customized content: by default: customize.json
The code assumes that students are in a section in the course with the course code in the section name. The code will also take advantage of students being in project groups, so you only have to give the user name for one of the students. If the Examiner and Supervisor "assignments" exist, the code will use the examiner's/supervisor's name from the "grade" of these assignments to get the data for the examiner and supervisor(s). Note that this code only supports getting information for KTH supervisors; for industrial supervisors you can just use a user name such as xxx - that does not exist as a KTH user name - and the code will generate fake information as a placeholder for the external supervisor.
The code uses the course code to guess what national subject category the thesis will fall into. Note that in some cases, the course name suggests multiple categories - so these are added and then there is a note about which category codes correspond to what - so that a human can edit the resulting JSON file to have a suitable list of category codes in it.
If you specify a value, such as --courseCode COURSECODE it will override the course code detected from the section that the student is in. This is both for testing purposes and can be used if the student is not yet in the Canvas course.
./create_customized_JSON_file.py --canvas_course_id 32733 --author vvvvv --language eng --programCode TCOMK --courseCode EA275X --Examiner maguire --Supervisor vastberg --Supervisor2 xxx
If the examiner and supervisor are known in the course, then the input could be as simple as:
./create_customized_JSON_file.py --canvas_course_id 22156 --author aaaaaa --language eng --programCode TCOMK
In the above case, the actual student behind the obscured user name 'aaaaaa' was in a two person first cycle degree project and the code will correctly find the other student (if they are in a project group together in the course).
Purpose to collect information about the subjects of the various degree project courses using the information from KOPPS.
./degree_project_courses_subjects.py
Outputs the result as an XLSX file with the name degree_project_courses_info.xlsx and a JSON file with the name degree_project_courses_info.json
The program modifies the KTH cover (saved as a DOCX file) by inserting drop-down menus and other configuration for a particular exam and main subject/field of technology/...
./add_dropdows_to_DOCX_file.py [--file cover_template.docx]
Outputs a modified DOCX file: <input_filename>-modified.docx. More specifically, the 'word/document.xml' within the DOCX file is modified.
Depends on the new KTH cover files not being changed.
If z6.docx contains an English cover:
./add_dropdows_to_DOCX_file.py --file z6.docx --exam kandidatexamen
If z7.docx contains a Swedish cover:
./add_dropdows_to_DOCX_file.py --file z7.docx --exam kandidatexamen --language sv
The various exams in English and Swedish
./add_dropdows_to_DOCX_file.py --file z6.docx --exam kandidatexamen
./add_dropdows_to_DOCX_file.py --file z7.docx --exam kandidatexamen --language sv
./add_dropdows_to_DOCX_file.py --file z6.docx --exam högskoleingenjörsexamen
./add_dropdows_to_DOCX_file.py --file z7.docx --exam högskoleingenjörsexamen --language sv
./add_dropdows_to_DOCX_file.py --file z6.docx --exam civilingenjörsexamen
./add_dropdows_to_DOCX_file.py --file z7.docx --exam civilingenjörsexamen --language sv
./add_dropdows_to_DOCX_file.py --file z6.docx --exam magisterexamen
./add_dropdows_to_DOCX_file.py --file z7.docx --exam magisterexamen --language sv
./add_dropdows_to_DOCX_file.py --file z6.docx --exam masterexamen
./add_dropdows_to_DOCX_file.py --file z7.docx --exam masterexamen --language sv
./add_dropdows_to_DOCX_file.py --file z6.docx --exam arkitektexamen
./add_dropdows_to_DOCX_file.py --file z7.docx --exam arkitektexamen --language sv
./add_dropdows_to_DOCX_file.py --file z6.docx --exam ämneslärarexamen
./add_dropdows_to_DOCX_file.py --file z7.docx --exam ämneslärarexamen --language sv
./add_dropdows_to_DOCX_file.py --file z6.docx --exam CLGYM
./add_dropdows_to_DOCX_file.py --file z7.docx --exam CLGYM --language sv
./add_dropdows_to_DOCX_file.py --file z6.docx --exam KPULU
./add_dropdows_to_DOCX_file.py --file z7.docx --exam KPULU --language sv
./add_dropdows_to_DOCX_file.py --file z6.docx --exam both
./add_dropdows_to_DOCX_file.py --file z7.docx --exam both --language sv
./add_dropdows_to_DOCX_file.py --file z6.docx --exam same
./add_dropdows_to_DOCX_file.py --file z7.docx --exam same --language sv
There is a script to create a directory (Some_examples) of examples:
create_some_dropdown_cover_examples.bash
Reads in data from an XLSX file of degree project courses with their subjects and computes overlaps. The end goal is to be able to cluster the degree project courses by subject.
./cluster_degree_projects.py --file xxx.xlsx
Various outputs, such as:
overlap_combinations=[{'ABE', 'CBH'}, {'EECS', 'ABE'}, {'ABE', 'ITM'}, {'SCI', 'ABE'}, {'EECS', 'CBH'}, {'ITM', 'CBH'}, {'SCI', 'CBH'}, {'STH', 'CBH'}, {'EECS', 'ITM'}, {'SCI', 'EECS'}, {'SCI', 'ITM'}]
overlapping_subjects={'Physics', 'Information and Communication Technology', 'Information Technology', 'Electrical Engineering', 'Computer Science and Engineering', 'Environmental Engineering', 'Technology and Economics', 'Engineering Physics', 'Materials Science and Engineering', 'Technology and Health', 'Mechanical Engineering', 'Industrial Management', 'Materials Science'}
ABE CBH EECS ITM SCI STH
Computer Science and Engineering X X
Electrical Engineering X X X
Engineering Physics X X X
Environmental Engineering X X
Industrial Management X X
Information Technology X X
Information and Communication Technology X X
Materials Science X X
Materials Science and Engineering X X
Mechanical Engineering X X X X
Physics X X
Technology and Economics X X
Technology and Health X X
This is very much a work in progress.
./cluster_degree_projects.py --file degree_project_courses_info-sorted.xlsx
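The overlap computation can be sketched as follows: a subject "overlaps" when more than one school offers degree project courses in it, and each such subject contributes the pairs of schools that share it. This is an illustration of the idea only; the function name find_overlaps and the input data are hypothetical and are not taken from the real spreadsheet.

```python
from itertools import combinations


def find_overlaps(subject_schools):
    """subject_schools maps a subject name to the set of schools offering it.

    Returns (overlapping_subjects, overlap_combinations).
    """
    overlapping_subjects = {
        subject for subject, schools in subject_schools.items() if len(schools) > 1
    }
    overlap_combinations = []
    for subject in sorted(overlapping_subjects):
        # Every pair of schools sharing this subject is one overlap combination
        for pair in combinations(sorted(subject_schools[subject]), 2):
            if set(pair) not in overlap_combinations:
                overlap_combinations.append(set(pair))
    return overlapping_subjects, overlap_combinations


# Hypothetical input for illustration
example = {
    "Subject A": {"ABE", "CBH"},
    "Subject B": {"EECS"},
    "Subject C": {"EECS", "ITM", "SCI"},
}
subjects, pairs = find_overlaps(example)
print(f"overlapping_subjects={subjects}")
print(f"overlap_combinations={pairs}")
```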
A document about using the DOCX templates and their associated programs.
Add the information from the JSON file to the cover (front and back).
./add_subject_credits_title_etc_to_cover.py --file filename.docx --json filename.json --exam examname
./add_subject_credits_title_etc_to_cover.py --file Omslag_Exjobb_Eng_en-20220325.docx --json calendar_event.json --exam kandidatexamen
Create a KTH back cover with a TRITA number
./backcover.py --school xxx [--year yyyy] --number 00 --pdf output.pdf
Outputs a PDF page as a back cover.
./backcover.py --school EECS --year 2022 --number 00 --pdf output.pdf
./check_for_new_cover.py [--pdf test.pdf] [-s spreadsheet.xlsx]
If given a pdf file name it outputs some
If given a spreadsheet it outputs some information about what PDFminer finds on the page and whether this is an old cover or an incorrect degree project major subject, and produces an updated spreadsheet augmented with the information it got from the PDF files.
If you make a directory to put the thesis PDFs into, such as
mkdir EECS_theses_in_DIVA
then you can get the spreadsheet with:
wget -O eecs-2022.csv 'https://kth.diva-portal.org/smash/export.jsf?format=csvall2&addFilename=true&aq=[[]]&aqe=[]&aq2=[[{"dateIssued":{"from":"2022","to":"2022"}},{"organisationId":"879223","organisationId-Xtra":true},{"publicationTypeCode":["studentThesis"]}]]&onlyFullText=false&noOfRows=5000&sortOrder=title_sort_asc&sortOrder2=title_sort_asc'
wget -O sci-2022.csv 'https://kth.diva-portal.org/smash/export.jsf?format=csvall2&addFilename=true&aq=[[]]&aqe=[]&aq2=[[{"dateIssued":{"from":"2022","to":"2022"}},{"organisationId":"6091","organisationId-Xtra":true},{"publicationTypeCode":["studentThesis"]}]]&onlyFullText=false&noOfRows=5000&sortOrder=title_sort_asc&sortOrder2=title_sort_asc'
wget -O itm-2022.csv 'https://kth.diva-portal.org/smash/export.jsf?format=csvall2&addFilename=true&aq=[[]]&aqe=[]&aq2=[[{"dateIssued":{"from":"2022","to":"2022"}},{"organisationId":"6023","organisationId-Xtra":true},{"publicationTypeCode":["studentThesis"]}]]&onlyFullText=false&noOfRows=5000&sortOrder=title_sort_asc&sortOrder2=title_sort_asc'
wget -O abe-2022.csv 'https://kth.diva-portal.org/smash/export.jsf?format=csvall2&addFilename=true&aq=[[]]&aqe=[]&aq2=[[{"dateIssued":{"from":"2022","to":"2022"}},{"organisationId":"5850","organisationId-Xtra":true},{"publicationTypeCode":["studentThesis"]}]]&onlyFullText=false&noOfRows=5000&sortOrder=title_sort_asc&sortOrder2=title_sort_asc'
wget -O cbh-2022.csv 'https://kth.diva-portal.org/smash/export.jsf?format=csvall2&addFilename=true&aq=[[]]&aqe=[]&aq2=[[{"dateIssued":{"from":"2022","to":"2022"}},{"organisationId":"879224","organisationId-Xtra":true},{"publicationTypeCode":["studentThesis"]}]]&onlyFullText=false&noOfRows=5000&sortOrder=title_sort_asc&sortOrder2=title_sort_asc'
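The export URLs above differ only in the organisationId (and could differ in the year), so they can be generated rather than typed by hand. The following sketch reproduces the same URL text as the commands above; the function name diva_export_url and its parameters are hypothetical, and the organisation ids are simply the ones that appear in the commands.

```python
import json


def diva_export_url(organisation_id: str, year: int, rows: int = 5000) -> str:
    """Build a DiVA CSV export URL for student theses of one organisation and year.

    Hypothetical helper; it only formats the query shown in the wget commands.
    """
    aq2 = [[
        {"dateIssued": {"from": str(year), "to": str(year)}},
        {"organisationId": organisation_id, "organisationId-Xtra": True},
        {"publicationTypeCode": ["studentThesis"]},
    ]]
    # Compact separators give the same text as in the commands above
    aq2_text = json.dumps(aq2, separators=(",", ":"))
    return (
        "https://kth.diva-portal.org/smash/export.jsf?format=csvall2&addFilename=true"
        "&aq=[[]]&aqe=[]&aq2=" + aq2_text
        + "&onlyFullText=false&noOfRows=" + str(rows)
        + "&sortOrder=title_sort_asc&sortOrder2=title_sort_asc"
    )


# Organisation id taken from the eecs-2022.csv command above
print(f"wget -O eecs-2022.csv '{diva_export_url('879223', 2022)}'")
```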
You have to convert the CSV file to an XLSX file.
Now you can run a script to get all the files from the spreadsheet:
get_full_text_from_diva.py eecs-2022.xlsx
Now that you have the files locally, you can run the program with the -s option and give the name of the spreadsheet, such as:
/z3/maguire/E-learning/check_for_new_cover.py -s eecs-2022.xlsx
This will produce a file: eecs-2022with_coverinfo.xlsx
Fetch the full text of theses from DiVA using the URL in the field FullTextLink in the spreadsheet.
./get_full_text_from_diva.py filename.xlsx
Outputs the files to the current directory with a name of the form PID-FULLTEXT.pdf, where PID is the publication ID from the first column of the spreadsheet.
Check for and determine the page within the PDF file where the "For DIVA" data begins
./find_For_DIVA_page.py [--pdf test.pdf] [--spreadsheet filename.xlsx]
If run on a single PDF file, it either outputs a line of the form:
Found for DIVA page at 117 in dddddddd-FULLTEXT01.pdf
or nothing
If run on a spreadsheet it outputs a new spreadsheet (whose name ends with 'with_forDIVA_info.xlsx') augmented with a column: 'For DIVA page(s) present'. It also outputs instances of found "For DiVA" pages saying:
Found for DIVA page at 96 in dddddddd-FULLTEXT01.pdf by author(s) X, Y (KTH [177], Skolan för elektroteknik och datavetenskap (EECS) [879223])
For a single PDF file:
./find_For_DIVA_page.py --pdf ddddddd-FULLTEXT01.pdf
For all the PDF files in the spreadsheet
./find_For_DIVA_page.py -s ../eecs-2022with_coverinfo.xlsx
Note that this can be run after updating the original spreadsheet with cover information
Check for and determine the page within the PDF file where the back cover is. Note that this also checks for old covers.
./find_back_cover_page.py [--pdf test.pdf] [--spreadsheet filename.xlsx]
Depending on the -v or --testing options there are various levels of output, and in the case of a spreadsheet, the program produces a spreadsheet augmented with data about the back cover. The column 'Back cover' will contain the page number of the back cover that was found, while the column 'Back cover version' will contain 'Old' or 'New' to indicate which version of the cover was found.
The new spreadsheet filename will end with 'with_back_cover_info.xlsx'.
For a single PDF file:
./find_back_cover_page.py --pdf ddddddd-FULLTEXT01.pdf
For all the PDF files in the spreadsheet:
# ./find_back_cover_page.py -s ../eecs-2022with_coverinfo.xlsx
Note that this can be run after updating the original spreadsheet with cover information
Make a front cover (in PDF) using the information in a JSON file.
./frontcover.py --json input.json --pdf output.pdf --year YYYY
The PDF file is generated in the specified output file (by default "test.pdf").
This is a work in progress.
./frontcover.py -v --json fordiva-example-cleaned.json --pdf output.pdf --year 2022
Try to make a number of different types of covers based on JSON files using frontpage.py
Needs a set of JSON files:
fordiva-example-högskoleexamen-tekink-swedish.json fordiva-example-högskoleingenjörsexamen-elektronik_och_datorteknik-swedish.json fordiva-example-kandidate-tekink-swedish.json fordiva-example-högskoleingenjörsexamen-elektronik_och_datorteknik-swedish.json fordiva-example-civilingenjörsexamen-elektrotekink-swedish.json fordiva-example-magisterexamen-swedish.json fordiva-example-masters-TCOMM.json fordiva-example-arkitektexamen-swedish.json fordiva-example-ämneslärarexamen-Technology_and_Learning.json fordiva-example-CLGYM-Technology_and_Learning.json fordiva-example-KPULU.json fordiva-example-both.json fordiva-example-same.json
Creates output file in a subdirectory "Some_examples_of_covers"
Find and extract references pages
./find_and_extract_references.py [--pdf test.pdf] [--spreadsheet filename.xlsx]
Outputs files with file names ending with "-refpages.pdf"
path_to_executable/find_and_extract_references.py -s ../eecs-2022.xlsx
Take the tex file produced by nbconvert and customize it
./customize_tex_from_nbconvert.py filename.tex [customization.tex]
Outputs a file with a name of the form filename-customized.tex
-
The user first produces LaTeX from a Jupyter notebook
jupyter nbconvert --to latex Notebook_5-EECS.ipynb
-
Customize the resulting Notebook_5-EECS.tex file
customize_tex_from_nbconvert.py --tex Notebook_5-EECS.tex
Compare the files in a local directory to the files in a OneDrive folder
The spreadsheet is obtained from the OneDrive folder by exporting it to Excel. To do this you go to OneDrive and change to the classic interface and then export to Excel - which gives me a query.iqy file (i.e., a Microsoft Internet Query file).
This query.iqy has to be opened in a desktop version of Excel (it did not seem possible to open it via https://www.office.com/). Once you do yet another login, it does give a spreadsheet of the files.
Oddly it does not say how large the files are but rather gives a "Huvudantal" column. Now I will just have to figure out how to compare this with the results of "ls" and then figure out which files are missing.
The spreadsheet is assumed to have the columns: 'Namn', 'Ändrat', 'Huvudantal', 'Ändrades av', 'Objekttyp', 'Sökväg'
With the option "-v" or "--verbose" you get lots of output - showing in detail the operations of the program
./compare_onedrive_folder_with_directory.py local_directory onedrive_spreadsheetFile
./compare_onedrive_folder_with_directory.py II2202-for-Wouter II2202-for-wouler-spreadsheet.xlsx
spreadsheetColumns=['Namn', 'Ändrat', 'Huvudantal', 'Ändrades av', 'Objekttyp', 'Sökväg']
skipping a file wihout a valid name
skipping a file wihout a valid name
Missing file: /z3/maguire/II2202-for-Wouter/#z9#
Due to a invalid charcter (:) in a OneDriver filename, missing file: II2202-for-Wouter/2020/Figures_for_Canvas_pages/Thumbs.db:encryptable
Due to a invalid charcter (:) in a OneDriver filename, missing file: II2202-for-Wouter/Green_networks/Thumbs.db:encryptable
Due to a invalid charcter (:) in a OneDriver filename, missing file: II2202-for-Wouter/Modules-2021/Modules/Quality_assurance/Thumbs.db:encryptable
...
Due to a invalid charcter (:) in a OneDriver filename, missing file: II2202-for-Wouter/Images/Thumbs.db:encryptable
Missing file: II2202-for-Wouter/Images/-topleve-sustainabilioty-quiz-Screenshot_20220324_182126.png
Missing file: II2202-for-Wouter/Images/-student-rights-Screenshot_20220222_152448.png
Due to a invalid charcter (:) in a OneDriver filename, missing file: /z3/maguire/II2202-for-Wouter/Images/KTH Library: Databases.jpg
Note that the two files: II2202-for-Wouter/Images/-topleve-sustainabilioty-quiz-Screenshot_20220324_182126.png II2202-for-Wouter/Images/-student-rights-Screenshot_20220222_152448.png correspond to the two files that were skipped, they are actually in OneDrive but there is a problem with the interpretation of the value in the "Namn" cell in the spreadsheet.
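The comparison itself can be sketched as a walk over the local directory, checking each file name against the names from the 'Namn' column. This is a simplified illustration and not the program's code: it compares by file name only (it ignores the 'Sökväg' column), and the function name missing_from_onedrive is hypothetical. Names containing a colon are reported separately, as in the output above.

```python
import os


def missing_from_onedrive(local_directory, onedrive_names):
    """Return (missing, invalid) lists of local file paths.

    onedrive_names is the set of file names from the 'Namn' column of the
    exported spreadsheet. Simplification: files are matched by name only.
    """
    missing, invalid = [], []
    for dirpath, _dirnames, filenames in os.walk(local_directory):
        for name in filenames:
            if name in onedrive_names:
                continue  # the file is present in OneDrive
            path = os.path.join(dirpath, name)
            if ":" in name:
                # A colon is reported above as an invalid character in a OneDrive filename
                invalid.append(path)
            else:
                missing.append(path)
    return sorted(missing), sorted(invalid)
```

The set of names could be obtained from the spreadsheet with any XLSX reader; that step is left out here to keep the sketch free of third-party dependencies.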
Using data that has been collected from a MODS output of DiVA entries of theses, collect information about the users who made those entries. This is indicated by a field in the MODS entry about who originated the record; this has been mapped to a column in the dataframe called 'recordInfo.recordOrigin'. The entry in this field is the kthid of the user who originated the record. The DiVA identifier of the thesis is in 'recordInfo.recordIdentifier'.
A goal was to understand the effects of KTH cleaning out information about people who are no longer employed.
./users_making_diva_entries.py pickel_filename
The output is a spreadsheet with information about the users who made entries in DiVA and the number of entries they made in total and by year
./users_making_diva_entries.py /z3/maguire/Jupyter/KTH-2022-MODS-pickle.gz
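The counting step (entries per user, in total and by year) can be sketched with the standard library. This is an illustration only: the real program works on a pickled dataframe, whereas this sketch takes a list of dictionaries, and the 'year' key is a hypothetical name for whatever column holds the year of the entry.

```python
from collections import Counter, defaultdict


def entries_per_user(records):
    """Count DiVA entries per originating user, in total and per year.

    Each record is a dict with the key 'recordInfo.recordOrigin' (the kthid of
    the user who originated the record) and a 'year' key (hypothetical name).
    """
    totals = Counter()
    by_year = defaultdict(Counter)
    for record in records:
        user = record["recordInfo.recordOrigin"]
        totals[user] += 1
        by_year[user][record["year"]] += 1
    return totals, by_year
```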
To take the result from users_making_diva_entries.py and add the user's first and last names from LDAP
./augment_users_making_diva_entries.py spreadsheet.xlsx
Outputs an updated spreadsheet with "-augmented" added to the base filename
You have to be on a machine where ldapsearch is available and have permissions to access the LDAP database.
The default input file name is "diva_admin_stats.xlsx"
./augment_users_making_diva_entries.py
To take the result from a Jupyter notebook matching of titles in LADOK with title in DiVA and augment with data from Canvas.
./augment_author_matches_with_canvas_info.py spreadsheet.xlsx
Outputs an updated spreadsheet with "-augmented" added to the base filename.
./augment_author_matches_with_canvas_info.py -t --file titles-all-EECS-df1-author-matches.xlsx
Creates KTH front cover - as per the new proposed simpler cover
./frontcover2023.py --json fordiva-example-cleaned.json --pdf output.pdf --year 2023
Overwrites the incoming pdf file with a new version with the cover inserted
This program is just for experiments with a proposed new simpler cover and is not approved (yet).
To force making a Swedish cover:
./frontcover2023.py -t --json test.json --pdf test.pdf
Extract author information (especially e-mail address) and working title from a degree project proposal that used my LaTeX proposal template.
The next step is to write this information to a spreadsheet (the name of which can be specified on the command line).
./process_degree_project_proposal.py [--pdf test.pdf] [--spreadsheet filename.xlsx]
Currently it outputs for each proposal something of the form (shown when running the command below):
proposals_for_testing/Degree_project_proposal_template_multiline_title.pdf
Total number of PDF pages=3
process page_index=0
extracted data for proposals_for_testing/Degree_project_proposal_template_multiline_title.pdf: {'authorname': 'FIRSTNAME LASTNAME', 'email': '[email protected]', 'working_title': 'A long multiline working title as an example to make sure that the extraction program can handle titles with multiple lines ', 'date': 'December 20, 2022', 'proposal length': 3}
proposals_for_testing/Degree_project_proposal_template_multiline_title-isodate.pdf
Total number of PDF pages=3
process page_index=0
extracted data for proposals_for_testing/Degree_project_proposal_template_multiline_title-isodate.pdf: {'authorname': 'FIRSTNAME LASTNAME', 'email': '[email protected]', 'working_title': 'A long multiline working title as an example to make sure that the extraction program can handle titles with multiple lines ', 'date': '2022-12-20', 'proposal length': 3}
proposals_for_testing/Degree_project_proposal_template-20221220.pdf
Total number of PDF pages=3
process page_index=0
extracted data for proposals_for_testing/Degree_project_proposal_template-20221220.pdf: {'authorname': 'FIRSTNAME LASTNAME', 'email': '[email protected]', 'working_title': 'Working title ', 'date': 'December 20, 2022', 'proposal length': 3}
Now writing the output spreadsheet with the date in the desired format
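The sample output above shows proposal dates in two forms ('December 20, 2022' and '2022-12-20'). A minimal sketch of normalizing both to a single form is given below; it assumes that the desired format is the ISO form YYYY-MM-DD and that month names are in English, and the function name normalize_date is hypothetical.

```python
from datetime import datetime

# Date formats seen in the sample output above
DATE_FORMATS = ("%Y-%m-%d", "%B %d, %Y")


def normalize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None
```

Returning None rather than raising lets the caller decide how to record a proposal whose date could not be parsed.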
For a single PDF file:
./process_degree_project_proposal.py --pdf ddddddd-FULLTEXT01.pdf
For all the PDF files in a directory
./process_degree_project_proposal.py --dir xxxxxx
You can xxxx, for example:
Extract author information (especially e-mail address), working title, keywords and other information
./process_degree_project_proposal-XMP.py [--pdf test.pdf] [--spreadsheet filename.xlsx]
Outputs a spreadsheet with the extracted data (sorted by the creation date).
For a single PDF file:
# ./process_degree_project_proposal-XMP.py -v -p proposals_for_testing/Degree_project_proposal_template_multiline_title.pdf -s proposals-processes-date-a.xlsx
For all the PDF files in a directory:
./process_degree_project_proposal-XMP.py --dir proposals_for_testing -s proposals-processes-date-a.xlsx
The program takes in a PPTX file or directory (and possibly subdirectories) and extracts the links
./GetPPTXLinks.py path [options]
outputs a spreadsheet
./GetPPTXLinks.py Lecture-2-2-dl-backprop1.pptx -v
./GetPPTXLinks.py /z3/maguire/Nvidia/DeepLearningKit/ -r -o DeepLearningKit_PPTX_Links.xlsx
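A PPTX file is a ZIP archive in which the external hyperlinks of each slide are recorded as relationships in ppt/slides/_rels/slideN.xml.rels. The sketch below lists them using only the standard library. It is not the code of GetPPTXLinks.py (which may well use a different approach, such as a PPTX library), and the function name pptx_links is hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def pptx_links(pptx):
    """Return a sorted list of (slide_rels_name, url) for external hyperlinks.

    pptx may be a path or a binary file-like object.
    """
    links = []
    with zipfile.ZipFile(pptx) as archive:
        for part in archive.namelist():
            # Only look at the relationship parts that belong to slides
            if not (part.startswith("ppt/slides/_rels/") and part.endswith(".rels")):
                continue
            root = ET.fromstring(archive.read(part))
            for rel in root.findall(f"{{{REL_NS}}}Relationship"):
                if rel.get("Type") == HYPERLINK_TYPE and rel.get("TargetMode") == "External":
                    links.append((posixpath.basename(part), rel.get("Target")))
    return sorted(links)
```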