Introduction: Automatic MSDS Finder / Online Data Retrieval
Hey,
This is a project I did for Freeside Atlanta, a Georgia nonprofit organization for coders, makers, artists, and researchers. Freeside is located in a warehouse, and among its tools it houses a lot of chemicals. Freeside needed to keep a database of the material safety data sheet (MSDS) for every one of those chemicals, but retrieving the MSDS for each of hundreds of chemicals by hand is impractical. They also wanted a system in place so that whenever anyone brought a new chemical to the space, its MSDS could easily be added. Lastly, Freeside wanted both an electronic copy of every MSDS and a physical copy in a binder.
Therefore, this project is a Java program that retrieves the MSDS for a whole list of chemicals.
If you are a programming or electronics enthusiast in the Atlanta region, you should look into joining Freeside. It's a very collaborative environment with lots of smart people and every tool under the sun.
learn more at: http://wiki.freesideatlanta.org/
see more cool projects at: http://blog.freesideatlanta.org/
Step 1: Plan
I don't want to simply upload the code and have people copy it without first understanding what is going on and how to use it. This section outlines what the program does and the steps we took to write it.
Basically, we give the program a list of chemical names, and it goes through each chemical and finds an MSDS for it. The output will look like the picture below.
So, before we started writing the program, we went through Freeside and documented every chemical. Then we made a text file listing every chemical to feed into the program. The program reads one line at a time, so each chemical goes on its own line.
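To make the one-chemical-per-line format concrete, here is a minimal standalone sketch of turning the file's lines into a clean list of names. The class and method names (ChemicalListSketch, parseChemicalList) are hypothetical, and trimming whitespace and skipping blank lines is an assumption about how messy a hand-typed list might be, not something the original program did.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChemicalListSketch {
    // Hypothetical helper: turns the raw lines of the input file into chemical names,
    // trimming stray whitespace and skipping blank lines so they never become empty queries.
    static List<String> parseChemicalList(List<String> lines) {
        List<String> chemicals = new ArrayList<>();
        for (String line : lines) {
            String name = line.trim();
            if (!name.isEmpty()) {
                chemicals.add(name);
            }
        }
        return chemicals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("acetone", "  sodium hydroxide ", "", "isopropyl alcohol");
        System.out.println(parseChemicalList(lines));
        // prints [acetone, sodium hydroxide, isopropyl alcohol]
    }
}
```

With a real file you would pass in Files.readAllLines(Paths.get("chemicals.txt")), where chemicals.txt is just a placeholder name for your list.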
Next, we had to find an online database of chemical MSDSs. We used http://hazard.com/msds/index.php. If no online database exists for what you are searching for, this approach won't work as-is and you will have a much harder time.
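Each lookup is just an HTTP GET on the site's search URL with the chemical name appended. Here is a minimal sketch, assuming the search URL used by MsdsCatalog still works (the site may well have changed since this was written). It builds the query URL with java.net.URLEncoder, which turns spaces into "+" like the original replaceBadCharacters() but also escapes commas, parentheses, and "&". QueryUrlSketch and buildQueryUrl are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryUrlSketch {
    // Base search URL copied from MsdsCatalog; it may no longer be valid.
    static final String BASE_URL = "http://hazard.com/msds/gn.cgi?query=";

    // URL-encodes the chemical name so spaces become '+' and characters such as
    // ',' '(' ')' '&' are percent-escaped instead of breaking the query string.
    static String buildQueryUrl(String chemical) {
        try {
            return BASE_URL + URLEncoder.encode(chemical.trim(), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQueryUrl("sodium hydroxide"));
        // prints http://hazard.com/msds/gn.cgi?query=sodium+hydroxide
        System.out.println(buildQueryUrl("acetic acid, glacial"));
        // prints http://hazard.com/msds/gn.cgi?query=acetic+acid%2C+glacial
    }
}
```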
The program finds each MSDS, but it also needs a way to output them. For the example posted here, we simply write every MSDS to a single text file. However, it's just as easy to send them to some other form of output.
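As one example of another output, here is a sketch that writes each MSDS to its own file in a sub-directory instead of appending everything to one file. PerFileWriterSketch, writeMsds, the "msds" directory name, and the file-name cleanup rule are all assumptions for illustration, not part of the original program.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PerFileWriterSketch {
    // Hypothetical alternative to MsdsWriter: writes one MSDS to <outputDir>/<name>.txt.
    static Path writeMsds(Path outputDir, String chemical, String text) throws IOException {
        Files.createDirectories(outputDir); // no-op if the directory already exists
        // Replace runs of characters that are awkward or illegal in file names with '_'.
        String safeName = chemical.trim().replaceAll("[^A-Za-z0-9._-]+", "_");
        Path file = outputDir.resolve(safeName + ".txt");
        String body = (text == null) ? "(no MSDS found)" : text;
        Files.write(file, body.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path written = writeMsds(Paths.get("msds"), "sodium hydroxide", "example MSDS text");
        System.out.println("Wrote " + written);
    }
}
```

The file-name cleanup keeps names like "acetic acid, glacial" from producing odd file names, but two chemicals that differ only in punctuation would overwrite each other, so a real version might add a counter.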
You need to import the necessary libraries for this code to work.
I used Maven, so I just had to copy the dependencies into the pom.xml file.
However, you can just as easily download the JAR files and add the libraries to your IDE.
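You need:
"JSoup Parser" -- which you can get from http://jsoup.org/
"Apache HttpClient" -- the code below uses the HttpClient 4.x API (the org.apache.http packages), so get a 4.x release from the Apache HttpComponents project; the legacy 3.x version at http://hc.apache.org/httpclient-3.x/ uses different packages and will not compile with this code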
Note: if you are using Maven, you will see errors in your code until you run it the first time and Maven downloads the libraries. You will probably see output like the attached picture.
Step 2: Code
I couldn't attach the code files because the formatting got messed up, so I've pasted the code below, followed by explanations.
App.java:
package org.freesideatlanta.msds;

import java.util.ArrayList;

/**
 * Reads a list of chemical names, looks up an MSDS for each one, and writes them all to output.txt.
 *
 * @author Praznav
 */
public class App {
    public static void main(String[] args) {
        // Check the argument explicitly instead of catching ArrayIndexOutOfBoundsException,
        // which would also hide unrelated bugs further down.
        if (args.length < 1) {
            System.out.println("Usage: App [filename]");
            return;
        }
        String filename = args[0];
        ChemicalReader reader = new ChemicalReader(filename);
        ArrayList<String> chemicals = reader.getChemicalNames();
        MsdsCatalog catalog = new MsdsCatalog();
        MsdsWriter writer = new MsdsWriter();
        for (String chemical : chemicals) {
            MSDS msds = catalog.query(chemical);
            writer.write(chemical, msds.getText());
        }
        writer.close();
    }
}
ChemicalReader.java:
package org.freesideatlanta.msds;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;

public class ChemicalReader {
    String name; // path of the input file
    ArrayList<String> chemicalList = new ArrayList<String>(); // one entry per non-blank line of the file

    public ChemicalReader(String filename) {
        name = filename;
    }

    public ArrayList<String> getChemicalNames() {
        // try-with-resources closes the file even if reading fails part-way through
        try (BufferedReader bufRead = new BufferedReader(new FileReader(name))) {
            System.out.println("Reading starts now ....");
            System.out.println("___________________________________________________________________________________");
            System.out.println();
            getAllChemicals(bufRead);
            replaceBadCharacters();
        } catch (IOException e) {
            System.out.println("Error reading chemical list " + name + ": " + e.getMessage());
        }
        return chemicalList;
    }

    private void replaceBadCharacters() throws UnsupportedEncodingException {
        // URL-encodes every name: spaces become "+" (as before), and characters such as
        // commas, parentheses, and "&" are percent-escaped so they cannot break the query URL.
        for (int i = 0; i < chemicalList.size(); i++) {
            chemicalList.set(i, URLEncoder.encode(chemicalList.get(i), "UTF-8"));
        }
    }

    private void getAllChemicals(BufferedReader bufRead) throws IOException {
        // Reads the file one line at a time; each non-blank line is one chemical name.
        String line;
        while ((line = bufRead.readLine()) != null) {
            line = line.trim(); // stray whitespace would otherwise end up in the query
            if (line.isEmpty()) {
                continue; // skip blank lines instead of querying an empty name
            }
            System.out.println("chemical: " + line);
            chemicalList.add(line);
        }
    }
}
MSDS.java:
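package org.freesideatlanta.msds;

/** One chemical's name plus the text of its MSDS (null until an MSDS is found). */
public class MSDS {
    String name;
    String MSDStext;

    public MSDS(String chemicalName) {
        name = chemicalName;
    }

    /** Returns the chemical name this MSDS was looked up for. */
    public String getChemicalName() {
        return name;
    }

    public String getText() {
        return MSDStext;
    }

    public void changeText(String text) {
        MSDStext = text;
    }
}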
MsdsCatalog.java:
package org.freesideatlanta.msds;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
import java.util.ArrayList;

public class MsdsCatalog {
    // Generic part of the search URL; the (already URL-encoded) chemical name is appended to it.
    static final String BASE_URL = "http://hazard.com/msds/gn.cgi?query=";

    HttpClient client;
    String URLhere;                 // full search URL for the current chemical
    ArrayList<String> errorsHere;   // chemicals that returned no usable result
    MSDS chemicalsMSDS;             // result being built for the current chemical
    String body;                    // raw HTML of the search results page

    public MsdsCatalog() {
        client = new DefaultHttpClient(); // HttpClient 4.x; deprecated since 4.3 but still works
        errorsHere = new ArrayList<String>();
    }

    /** Chemicals for which the database returned no usable result. */
    public ArrayList<String> getErrors() {
        return errorsHere;
    }

    public MSDS query(String chemical) {
        chemicalsMSDS = new MSDS(chemical);
        URLhere = BASE_URL + chemical;
        System.out.println("Next chemical" + "\n" + "Chemical: " + chemical);
        System.out.println(URLhere);
        try {
            body = fetch(URLhere);
            boolean hasJtBakerDb = body.contains("jtbaker.com");
            boolean hasSafetyCard = body.contains("mf/cards/file");
            boolean hasFileCard = body.contains("href=f");
            if (hasJtBakerDb || hasSafetyCard || hasFileCard) {
                System.out.println("No Errors");
                getMSDS();
            } else {
                System.out.println("ERROR! THIS CHEMICAL IS NOT FOUND ON THE DATABASE!");
                System.out.println("THIS WILL BE ADDED TO THE ERROR LIST");
                errorsHere.add(chemical);
            }
        } catch (IOException e) {
            System.out.println("Error querying " + URLhere + ": " + e.getMessage());
            errorsHere.add(chemical);
        }
        return chemicalsMSDS;
    }

    private void getMSDS() throws IOException {
        // Parse the page we already downloaded instead of requesting it a second time;
        // passing URLhere as the base URI lets attr("abs:href") resolve relative links.
        Document page = Jsoup.parse(body, URLhere);
        Elements links = page.select("a[href]");
        Element chosen;
        if (body.contains("mf/cards/file")) {
            chosen = firstLinkContaining(links, "Safety Card"); // preferred source
        } else if (body.contains("fscim")) {
            chosen = firstLinkContaining(links, "Fisher ");     // second choice
        } else {
            chosen = firstOtherLink(links);                     // any remaining MSDS link
        }
        if (chosen != null) {
            // Keep only the visible text of the MSDS page.
            String html = fetch(chosen.attr("abs:href"));
            chemicalsMSDS.changeText(Jsoup.parse(html).body().text());
        }
    }

    private Element firstLinkContaining(Elements links, String label) {
        for (Element link : links) {
            if (link.html().contains(label)) {
                return link;
            }
        }
        return null;
    }

    private Element firstOtherLink(Elements links) {
        // Skips Mallinckrodt Baker entries and the site's own navigation pages.
        for (Element link : links) {
            String href = link.attr("abs:href");
            boolean skip = link.html().contains("Mallinckrodt Baker ")
                    || href.contains("msds/errors.html")
                    || href.contains("msds/search.html")
                    || href.contains("msds/index.php");
            if (!skip) {
                return link;
            }
        }
        return null;
    }

    private String fetch(String url) throws IOException {
        HttpResponse response = client.execute(new HttpGet(url));
        HttpEntity entity = response.getEntity();
        // Reading the entity fully into a String also releases the connection for reuse.
        return EntityUtils.toString(entity);
    }
}
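The decision logic in query() and getMSDS() comes down to a few substring checks on the raw results page. Here is a minimal standalone sketch of that classification, with no network access or jsoup needed. ResultPageSketch, classify, and the Source enum are hypothetical names, and the marker strings are copied from MsdsCatalog, so they only hold as long as hazard.com keeps the same page layout.

```java
public class ResultPageSketch {
    // Which retrieval path MsdsCatalog would take for a results page.
    enum Source { NOT_FOUND, SAFETY_CARD, FISHER, OTHER }

    // Mirrors the substring checks in MsdsCatalog: the page counts as a hit if it mentions
    // jtbaker.com, mf/cards/file, or href=f; among hits, Safety Cards win over Fisher (fscim)
    // links, and anything else falls through to the generic link search.
    static Source classify(String body) {
        boolean hasJtBakerDb = body.contains("jtbaker.com");
        boolean hasSafetyCard = body.contains("mf/cards/file");
        boolean hasFileCard = body.contains("href=f");
        if (!(hasJtBakerDb || hasSafetyCard || hasFileCard)) {
            return Source.NOT_FOUND;
        }
        if (hasSafetyCard) {
            return Source.SAFETY_CARD;
        }
        if (body.contains("fscim")) {
            return Source.FISHER;
        }
        return Source.OTHER;
    }

    public static void main(String[] args) {
        // Made-up HTML fragments, not real hazard.com output.
        System.out.println(classify("<a href=\"/mf/cards/file/0001.html\">Safety Card</a>")); // SAFETY_CARD
        System.out.println(classify("<p>No matches</p>")); // NOT_FOUND
    }
}
```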
MsdsWriter.java:
package org.freesideatlanta.msds;

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

/*
 * Appends every chemical's MSDS to a single file, output.txt, in the working directory.
 *
 * @author praznav
 */
public class MsdsWriter {
    BufferedWriter out;

    public MsdsWriter() {
        try {
            out = new BufferedWriter(new FileWriter("output.txt"));
        } catch (IOException e) {
            System.out.println("Could not open output.txt: " + e.getMessage());
        }
    }

    public void write(String chemical, String text) {
        if (out == null) {
            return; // the output file could not be opened
        }
        try {
            out.write(chemical);
            out.newLine();
            out.write("__________");
            out.newLine();
            // query() leaves the text null when no MSDS was found; say so instead of writing "null".
            out.write(text == null ? "(no MSDS found)" : text);
            out.newLine();
            out.newLine();
            out.newLine();
        } catch (IOException e) {
            System.err.println("Error: " + e.getMessage());
        }
    }

    public void close() {
        if (out == null) {
            return;
        }
        try {
            out.close();
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
Here is a summary of each file:
-- App --
Contains the main method and creates every other class
-- ChemicalReader --
Reads the chemical list into an ArrayList, one name per line
Converts each chemical name with spaces into a URL-friendly query parameter
-- MsdsCatalog --
Executes a query on the "friendly" chemical name and gets a results page
Checks whether this result is valid for our purposes
In terms of priority: Safety Card URL is preferred, then Fisher URL, then the remaining links
-- MSDS --
Holds a chemical's name and its MSDS text, read with:
String name = msds.getChemicalName();
String text = msds.getText();
-- MsdsWriter --
Writes every MSDS to a single text file, output.txt, in the working directory
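The priority order in the MsdsCatalog summary can be sketched as a pure function over the links on a results page. This is a simplified, hypothetical version: Link stands in for a jsoup element (its inner HTML and absolute href), and unlike MsdsCatalog, which commits to one tier based on the page contents, this sketch falls back to the next tier whenever a tier has no matching link. The example.invalid URLs are placeholders.

```java
import java.util.Arrays;
import java.util.List;

public class LinkPickerSketch {
    // Hypothetical stand-in for a jsoup <a> element: its inner HTML and absolute href.
    static final class Link {
        final String html;
        final String href;

        Link(String html, String href) {
            this.html = html;
            this.href = href;
        }
    }

    // Returns the href to follow: the first "Safety Card" link, else the first "Fisher " link,
    // else the first link that is not a Mallinckrodt Baker entry or one of the site's own
    // errors/search/index pages. Returns null when nothing qualifies.
    static String pickLink(List<Link> links) {
        for (Link l : links) {
            if (l.html.contains("Safety Card")) return l.href;
        }
        for (Link l : links) {
            if (l.html.contains("Fisher ")) return l.href;
        }
        for (Link l : links) {
            boolean skip = l.html.contains("Mallinckrodt Baker ")
                    || l.href.contains("msds/errors.html")
                    || l.href.contains("msds/search.html")
                    || l.href.contains("msds/index.php");
            if (!skip) return l.href;
        }
        return null;
    }

    public static void main(String[] args) {
        List<Link> links = Arrays.asList(
                new Link("Archive", "http://hazard.com/msds/index.php"),
                new Link("Fisher Scientific", "http://example.invalid/fisher"),
                new Link("Acetone MSDS", "http://example.invalid/acetone"));
        System.out.println(pickLink(links)); // prints http://example.invalid/fisher
    }
}
```

Keeping the selection separate from the HTTP and parsing code makes the priority rules easy to check by hand without touching the network.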