Curriculum publishing & conversion with ePub 3.0 Workshops in New Delhi

One group taking part in a publishing process activity

During the first two weeks of December, we conducted a series of workshops on building and converting K-12 textbooks in ePub 3.0 at the Central Institute of Educational Technology (CIET) at NCERT in New Delhi, India. This continued the process assistance we have been providing to the government body, which is also the world's largest publisher, on the largest known textbook digitization effort in the world. Our goal with this post is to share an elementary understanding of where some of the challenges lie and, more so, of the workshops we chose to run to try to fill some of these gaps.

Here's some more context on our involvement: CIET has been spearheading really innovative initiatives at India's NCERT, such as NROER. NCERT is the body responsible for researching and publishing the national curriculum (here, textbooks) used in a large number of grades throughout the country in public and private schools. One of CIET's major undertakings is a project to convert every single textbook they have written, in 4-7 languages across all grade levels, into ePub 3.0. That's over 400 textbooks, each written in multiple languages. Due to the relationship between CIET and NCERT, this isn't a one-time effort; new non-ePub versions will continue to be released in 3-year cycles and the process will more or less be repeated.

The advantages of textbooks being in ePub 3.0 go well beyond the immediate utility of being mobile, web and e-reader-ready. The move to such a format is a move to openness and a move to the commons. This brings huge new possibilities for remixing and adaptation. Thus far, our work had been limited to recommending the best process to the local team, because we were unable to be on the ground for extended periods.

Now, we often use the phrase "conversion to digital" when we refer to such textbook conversion processes. The use of the word "digital" is interesting: aren't these mass-produced, print-laid-out textbooks in some digital publishing format to begin with? In this case, we are working with thousands of PageMaker chapter files and hundreds of thousands of linked and embedded assets. However, because print layouts are so fundamentally different from how the fluid web works, the best reusable digital artifacts that can be produced from these files are PDFs. Content in PDFs is very difficult to manipulate, even for technical folk. PageMaker, or now Adobe InDesign, is accessible to only a very small segment of designers and publishers. That is why conversion into something more open and reusable turns out to be essential for remixing. We believe this has big implications for teaching and learning in the decades ahead.

Where CIET was at

Our first goal was to identify what the local team (of about 12 people) was doing to begin with, and then understand where they were struggling the most. It was instantly clear that the existing process was largely manual, assisted by ready-made open source tools. That said, the amount of work the team had been able to pull off thus far was beyond impressive. Rigorous targets and the lack of any easy way to automate brought about a chaotic yet fulfilling process of getting a lot done in little time. They made the unscalable look scalable.

Budgets apart from team salaries were non-existent, so paying for custom development of automations was not an option. Fortunately, almost every team member had a lightweight IT background and had written a small amount of Java or PHP code in the past, so there was a small opportunity to automate. Without experienced software engineering talent to guide such efforts, however, this opportunity gets reduced to merely understanding how to debug recurring hairy issues and make minor tweaks using simple scripts. Even that, though, can save a great deal of time.

The Workshops

We ran the actual workshops all day throughout the second week of our time there. These were conceptualized and prepared AFTER the first week of needs analysis, leaving very little time for perfecting them. We split the sessions into two self-explanatory tracks: "Content Architecture" (CA) and "Content Engineering" (CE) (a way of splitting human resources in digital publishing that I learned at Inkling). Several sessions were for both tracks. Below are the names and purposes of these sessions.

Introductions and understanding publishing processes [both]
Goals: (a) To internalize the need for a well-defined process and task allocation in every conversion or publishing process undertaken, (b) To reflect on some basic guiding principles in the digitization process.
Notes: We covered a sample process that we used as a template for the rest of the sessions.
Relearning CSS and introduction to SASS [CA]
Goals: (a) To get acquainted with modern best-practices in CSS development and CSS methodologies in teams and discover current deviations from these, (b) To know about the presence of frameworks like Bootstrap, Foundation, and Materialize, (c) To be able to explain the need and use of SASS.
Notes: Inconsistent and unmanaged CSS had led to a lot of hair-pulling in the prior months. As everywhere in the web design world, most designers had picked up CSS through online tutorials and trial and error, but struggled to follow good practices and work in teams because things seem to look correct in the short run.
Python and ETL [CE]
Goals: (a) To get acquainted with programmatic means of manipulating an abundance of marked-up content with similar issues, (b) To be able to determine when to choose which programming language, (c) To be able to summarize most data management problems faced as Extract-Transform-Load problems, (d) To be able to list different ways of storing information and contrast them.
Notes: A culturally relevant food-based example to understand ETL was a real hit.
Guidelines and compliance development [both]
Goals: To summarize, organize and list guidelines in the process of conversion, both for teams working in the desktop publishing (DTP) world and for those building ePub files internally.
How ePub3 works [both]
Goals: (a) To build a strong mental model of the structure of an ePub file, (b) To internalize the idea that a pre-made ePub file is not just open-able and edit-able, but also hackable through automation, (c) To build an ePub file from scratch, (d) To be able to read an official specification document and solve a custom problem without needing to search on online forums.
Fundamentals of design & grid systems [CA]
Goals: (a) To be able to memorize and explain six of the gestalt principles, and identify multiple scenarios of their application to improve existing ePub designs, (b) To be able to articulate the use of palettes and how different colors come to form, in color theory, (c) To be able to build color harmony using analogous, complementary, triad, and shade color combinations, (d) To identify grid systems and build a grid for a new webpage.
Notes: Bad design has become a business problem for the team; too many uninformed opinions have led to bad design, which in turn has left users of the ePubs uninterested.
What's this IDML thing? [CE]
Goals: (a) To be able to explain the need for an open markup language that can be used to build container files with all the data necessary to programmatically reproduce an InDesign file, (b) To know the contents of an IDML directory, (c) To see a Python script in action that extracts meaning from the IDML file to produce another format, in this case HTML, (d) To know that content can be extracted from within InDesign through JavaScript as well.
Understanding Encoding [both]
Goals: (a) To believe that encoding is not as complicated as imagined, and that problems faced were addressable without the need of multiple software tools, (b) To be acquainted with the history of encoding and words like ASCII, Unicode, and UTF-8.
Intro to better tooling: Sublime, Atom, Chrome & more [both]
Goals: (a) To discover editing tools like Sublime and Atom, their flexibility, and their extensibility, (b) To get a stronger grasp of the capabilities inside Chrome DevTools, (c) To be able to explain how a simple text editor along with simple automation using system tools and the command line can eliminate the need for an ePub editor.
Notes: The top tools being used prior to this were Sigil, gedit and Notepad++. Sigil is mostly unmaintained now, and very buggy. The debugging techniques in place were very rudimentary. Another interesting thing happened: the session began digressing into a web performance session when we reached the Chrome DevTools part, because everyone was so intrigued by how well (or poorly) their existing web properties performed.
Preflight [CA]
Goals: (a) To remember some fundamental principles of what the preflight process aims to accomplish, (b) To discuss problems that occur post-conversion and how they can be dealt with in the preflight process.
Notes: One of the technical challenges in low-resourced settings like this is access to good technology. Only two computers had access to Adobe InDesign, which meant only a couple of people understood the challenges of this process from past experience.
The Terminal [CE]
Goals: (a) To become familiar with the most common UNIX commands and their functionality, (b) To learn how to execute popular command line tools with arguments, (c) To read a documentation file for a command line tool.
Notes: Thankfully, there was a very serious open source mandate and every computer had the same version of Ubuntu installed. There were also a couple of iMacs to run the Adobe suite.
Asset management [CA]
Goals: (a) To be able to explain what assets are and contrast assets with media, (b) To reflect on inadequacies of existing asset management processes (i.e. USB, email), (c) To comprehend the ethical issues associated with embedding unlicensed assets (especially fonts).
Responsive Web Design, and tables [both]
Goals: (a) To be able to articulate the challenges of trying to design books and websites for multiple screen sizes, (b) To list the basic techniques and principles of responsive web design, (c) To be able to use CSS media queries for several screen sizes to make a page responsive, (d) To build fluid tables, in contrast with using structures of block elements for the same purpose.
MathML & LaTeX [CA]
Goals: To practice converting formulae from existing textbooks into MathML and LaTeX, through the use of reference charts.
Scripting enhancements [CE]
Goals: To see the change of one webpage into another (after a series of transformations, repeated for several instances) through very simple scripting.
Trial runs & reflection [both]
Goals: (a) To run through a trial of the process covered, incorporating the newer practices, techniques, and tools introduced thus far, (b) To collectively tweak the template process based on issues faced technically and in coordination as a team.
Notes: This was the most useful session and generated the most learnings.
Design systems and frameworks [CA]
Goals: (a) To internalize the meaning of a design system, and how it is similar to and different from a branding guideline, (b) To walk through the popular design systems prevalent on the web, including Material Design and Atomic Design, (c) To become more familiar with how frameworks like Bootstrap and Foundation are laid out, and how to use portions (mostly components) of them.
Whitelisting vs Blacklisting in practice [CA]
Goals: To contrast, through an elaborate example, two approaches to converting a textbook from a DTP file: whitelisting (beginning with a blank slate and turning it into an ePub using a step-by-step process) and blacklisting (starting with the ePub export output from InDesign and refining it).
What is data science & machine learning? [CE]
Goals: To build a very elementary understanding of data science and machine learning, and of how both relate to data management and advanced statistics.
(recap) [both]
Self-explanatory.
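
To make the "How ePub3 works" goals a little more concrete, here is a minimal sketch (not the actual workshop material) of building a one-chapter ePub 3 file from scratch with nothing but Python's standard library. The function name build_epub, the OEBPS/ folder layout, the file names, the placeholder identifier and the fixed modification date are all our own illustrative assumptions; a real textbook would also need stylesheets, images, fonts and fuller metadata.

```python
import html
import io
import zipfile

# Points the reading system at the package document (the OPF file).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package document: metadata, manifest (every file) and spine (reading order).
OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">{identifier}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>{language}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

# Navigation document: the table of contents.
NAV_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>{title}</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="chapter1.xhtml">{title}</a></li></ol>
    </nav>
  </body>
</html>
"""

CHAPTER_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>{title}</title></head>
  <body>
    <h1>{title}</h1>
    <p>{text}</p>
  </body>
</html>
"""


def build_epub(title, text, language="en",
               identifier="urn:uuid:00000000-0000-0000-0000-000000000000",
               modified="2016-01-01T00:00:00Z"):
    """Return the bytes of a one-chapter ePub 3 file (illustrative only)."""
    # Escape plain text so characters like & and < stay valid inside XML.
    fields = {
        "title": html.escape(title),
        "text": html.escape(text),
        "language": html.escape(language),
        "identifier": html.escape(identifier),
        "modified": html.escape(modified),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry must come first and must be stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        parts = {
            "META-INF/container.xml": CONTAINER_XML,
            "OEBPS/content.opf": OPF_TEMPLATE.format(**fields),
            "OEBPS/nav.xhtml": NAV_TEMPLATE.format(**fields),
            "OEBPS/chapter1.xhtml": CHAPTER_TEMPLATE.format(**fields),
        }
        for name, content in parts.items():
            archive.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Writing the returned bytes to a file with an .epub extension should give something a reader application can open, and running it through a validator such as EPUBCheck would be a sensible next step. The same zipfile module can open an existing ePub, which is what makes pre-made files "hackable" through small scripts rather than only through an ePub editor.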

Reflections

Apart from picking up dozens of inter-cultural learnings, the sessions were a humbling reminder of the power automation and engineering can bring to these processes. They also reminded everyone that while curriculum building is something teachers do, in the digital world (in this case, ePubs) a fair amount of technical expertise is required, given the current state of technology, to get decently far (even for the technically savvy).

If you are interested in this problem and helping the CIET team with this initiative, please drop us a note at hello [at] opencurriculum [dot] org!