Installing Endeca 6.1 using Puppet
At Buzzillions, we’ve been Endeca users for almost 3 years. However, we haven’t automated the installation of the software. Each time we upgraded to the next version or upgraded the hardware on an existing version, we’d have to dust off our documentation, always wondering if we’d added those last-minute tweaks that *really* finish off the installs. Did we add the fact that we had to change the capitalization of “Pipeline.epx” to our internal wiki? Probably not. But is it really necessary to automate things if Charles and Kevin have it stored in their brains? Definitely, yes. In fact, one of our engineers wanted a “play” version of Endeca and 15 minutes later, he had his own sandbox to mess with.
This Endeca-centered blog post is the first in an occasional series where we’ll discuss how we’re using Endeca. One of our engineers mentioned that finding Endeca tips through Google isn’t very easy. So here goes.
We’ve got enough machines in various development and production environments that standardizing installations and being able to bring new machines online quickly has become a business necessity.
So during this upgrade to version 6.1, we opted to run the whole install from Puppet, an open source server provisioning software project. We already use it to provision most of the other major software in our infrastructure.
I generally tell people that it takes about 4 times as long to automate a process as it does to “just do it” manually. That’s a hard pill to swallow when time spent on infrastructure has to be balanced with time spent innovating. For us, Endeca has long been one of the installs that’s done by the software engineers and not the systems team. But upgrading from 5.1.3 to 6.1.1 was the right time to automate the whole thing. So we grabbed our wiki pages, our journal entries and telnet session logs and sent them to our brave but persistent systems admin. He figured out the test cases, asked questions and probably installed Endeca over 50 times on the way to success.
The good news is that he can get the whole thing from a blank machine to a machine running forge/dgidx and dgraphs that have yesterday’s data and config files in about 15 minutes using Puppet.
Note that it helps to have all necessary files in a code repository system that Puppet has access to. We use Subversion.
If you’ve ever installed Endeca, the steps are usually spread across 3 to 6 different PDF files in various “docs” directories, though we’re partial to the CommerceSuiteQuickStartGuide.pdf these days. The migration steps are where it gets complicated. You generally need to work with Support to find out which files are critical and which ones are unnecessary. The various Migration Guides are fairly incomplete for a real migration.
Here are our steps for migrating from 5.1.3 to 6.1. Note that your specific steps will differ slightly from ours. Really, the hardest part of the automated install is formalizing the exact ordering and dependencies of all the install elements and the migration elements. The way Puppet works is that at each step you need to specify a “test” so that Puppet doesn’t rerun a step it has already run (the test might be the existence of a directory or a file, or the response code from the emgr_update calls). I think the phrase is making each step an “idempotent operation”. Also note that we use Red Hat Linux 5.
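To make the “test” idea concrete, here is a minimal shell sketch of the guard pattern. It is an illustration only, not our actual Puppet manifest: the helper name run_unless_exists and the marker file are made up for the example. In Puppet itself, the exec resource’s creates, unless and onlyif attributes serve the same purpose.

```shell
#!/bin/sh
# Hypothetical helper: run a command only when a marker path is absent.
# This mimics guarding a provisioning step with an existence test.
run_unless_exists() {
    marker=$1
    shift
    if [ -e "$marker" ]; then
        echo "skip: $marker already exists"
        return 0
    fi
    "$@"
}

# Example: the first call creates the marker, the second call is skipped.
workdir=$(mktemp -d)
run_unless_exists "$workdir/installed" touch "$workdir/installed"
run_unless_exists "$workdir/installed" touch "$workdir/installed"
```

Running the guarded step twice leaves the machine in the same state as running it once, which is what lets Puppet apply the same configuration over and over safely.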
- Create a user account specific to the version of Endeca you’re installing.
- Don’t just use a generic username. Using one that is specific to each version allows you to upgrade on the same server, since you may want to add new software to a server where you’re already using Endeca. The environment variables differ between versions, so we opted to have different users, since our inittab scripts run as this version-specific Endeca user.
- Puppet creates the version-specific Endeca user along with its profile environment variables file.
- Standard install steps
- Create a shortcut UNIX alias to change directories to the base /endeca directory, call it $ENDECA.
- Create a shortcut alias for your application name, call it $APPNAME
- As the version-specific Endeca user:
- Install the MDEX software using the silent install option. Easy. No config file needed.
- Install the PlatformServices software using the silent install option. Here you need to decide whether it’s a FULL install (for Forge/Dgidx servers) or an AGENT ONLY install (MDEX/Dgraph only servers). Silent install file is simple.
- Install the Merchandising Workbench software using the silent install option (or your appropriate Workbench package). Silent install file is simple.
- Install the ContentAssembler API, basically by unzipping it into the $ENDECA directory.
- Install the DeploymentTemplate 3.0 using a silent install file. This silent install file can be quite complicated, but we just use the simple options (anybody know where the documentation for the advanced options is located?). NOTE that since Endeca uses relative paths during the deployment, we had to write a shell script that calls their shell script to get these relative paths to work in Puppet. The other installs seem to use the path that you pass in.
- Migration steps
- State files
- To keep your N values stable between instances, you need to grab the two files located in $ENDECA/$APPNAME/data/state and move them to the same directory in your new deployment application. We update these files in our Subversion repository after every Forge run.
- autogen_dimensions.xml.external.gz
- autogen_dimensions.xml
- Pipeline files
- The pipeline is divided generally into two parts: Dev Studio files and Web Studio files.
- The Dev Studio files are files that users DON’T change in Web Studio. They are generally located in $ENDECA/$APPNAME/config/pipeline and you can just copy them to the new deployment application in the same directory.
- The Web Studio files are kept in the “ether” so you have to extract them using the “emgr_update” application. You set which files are “Web Studio maintained” by changing the “ConfigUpdate” section of the AppConfig.xml. Make an empty directory $ENDECA/$APPNAME/temp and extract the files using the emgr_update command and then copy to the pipeline directory in the new deployment application.
- emgr_update --host localhost:8006 --action get_ws_settings --app_name $APPNAME --prefix $APPNAME --dir $ENDECA/$APPNAME/temp
- Once you’ve copied the Dev Studio files and then the Web Studio maintained files to the new pipeline then you can stuff the files into the new Web Studio.
- emgr_update --host localhost:8006 --action update_mgr_settings --app_name $APPNAME --prefix $APPNAME --dir $ENDECA/$APPNAME/config/pipeline
- If you have externally managed dimensions, you will need to ignore a warning like “warning, cannot find <file>, proceed anyway?”. Just say yes; it doesn’t actually need the externally managed dimension files.
- Data files
- In preparation for the baseline update, we move our data files to the $ENDECA/$APPNAME/test-data/baseline directory. We mainly use XML files as our pipeline source files.
- Web Studio usernames and preferences
- Copy the “webstudio” directory from the 5.1.3 $ENDECA_CONF/state directory.
- Remove the existing “webstudio” directory in your new deployment application in $ENDECA_TOOLS_CONF/state
- Turn off the Endeca processes. Paste the “webstudio” directory from above into $ENDECA_TOOLS_CONF/state. Turn Endeca processes back on.
- Before you baseline, here are some miscellany:
- By hand, migrate your AppConfig.xml from 5.1.3 to 6.1.1 (not many material changes)
- In AppConfig.xml, change “pipeline.epx” reference to “Pipeline.epx”
- Run $ENDECA/$APPNAME/control/initialize_services.sh to set up the servers and modules defined in AppConfig.xml
- Run $ENDECA/$APPNAME/control/load_baseline_test_data.sh
- Run $ENDECA/$APPNAME/control/baseline_update.sh
- At this point, via Puppet, with NO human intervention, we have installed a new Endeca master server (Forge/Dgidx). The Puppet installation for an MDEX/Dgraph server is just the user creation, the MDEX install, the PlatformServices AGENT ONLY install and an inittab script. Then you start the process and add the server and port to AppConfig.xml, which will configure the server the next time you run any EAC command.
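A few of the file-shuffling steps above can be sketched as small shell helpers that are safe to rerun. This is a rough sketch under our own assumptions, not the scripts our systems admin actually wrote: the function names, their arguments and the idea of passing in the old and new application directories are all hypothetical, and the path to AppConfig.xml is left for the caller to supply.

```shell
#!/bin/sh
# Illustrative helpers only; all names and arguments are hypothetical.

# Run a shell script from its own directory so that relative paths
# inside it resolve as if a person had changed into that directory first.
run_in_own_dir() {
    script=$1
    shift
    ( cd "$(dirname "$script")" && sh "./$(basename "$script")" "$@" )
}

# Copy the two dimension state files from the old application directory
# to the new one, skipping files that are already identical.
copy_state_files() {
    old_app=$1
    new_app=$2
    mkdir -p "$new_app/data/state" || return 1
    for f in autogen_dimensions.xml autogen_dimensions.xml.external.gz; do
        if cmp -s "$old_app/data/state/$f" "$new_app/data/state/$f" 2>/dev/null; then
            continue
        fi
        cp "$old_app/data/state/$f" "$new_app/data/state/$f" || return 1
    done
}

# Change the lowercase "pipeline.epx" reference to "Pipeline.epx" in the
# given AppConfig.xml, leaving the file untouched if nothing needs fixing.
fix_pipeline_case() {
    app_config=$1
    if grep -q 'pipeline\.epx' "$app_config"; then
        sed 's/pipeline\.epx/Pipeline.epx/g' "$app_config" > "$app_config.tmp" &&
            mv "$app_config.tmp" "$app_config"
    fi
}
```

Each helper checks the current state before changing anything, so calling it a second time on an already-migrated application does nothing, which is the property Puppet wants from every step.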
Send us your comments and hopefully other Endeca users will start to blog/write about the ways they make life easier on themselves or the cool things they do. The Encyclopedia (read “wiki”) part of the Endeca documentation system is also a great place to make updates and share tips.
Cheers, The Buzzillions Team




